Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Junio C Hamano
Junio C Hamano  writes:

> I do not know if it makes sense to have 3 and 5 separate; I suspect
> a single patch that does "clarify the warning, and allow those who
> have no choice in which version of Git to choose squelch it" would
> suffice.

I actually do not mind two patches for these, but I think the
separation presented in the series is wrong (first to kill the
canary completely, and then add it as if it were a completely
separate advice).  

It would make more sense, at least to me, if 

 - the earlier step is to clarify the warning on two points
   (i.e. this is safe to ignore, but you may want to know that you
   are using a stale Git when we see evidence that a newer one has
   already been used here) and then

 - the later step is to optionally make it possible to squelch it
   for those who do not have control over what version of Git they
   are allowed to run.

But again, a single patch to do all of that is also fine.


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Junio C Hamano
Junio C Hamano  writes:

> As the deployed versions of Git will keep sending the wrong message,
> I do not mind applying 1/5 and 2/5, given especially that Ben seems
> to be OK with the plan.  I however do not think 3 thru 5 is ready
> yet with this round---there were some discussions on phrasing in
> this thread.

I ran out of time looking at the surrounding code, but I think 1, 2
and 4 form a set that would give us immediate benefit to fast track
to the upcoming release.

I do not know if it makes sense to have 3 and 5 separate; I suspect
a single patch that does "clarify the warning, and allow those who
have no choice in which version of Git to choose squelch it" would
suffice.

The phrasing in 5 received a couple of good concrete suggestions
already, so it is not ready in its current form but need a bit of
wordsmithing.  I also do not think a new "trace_printf" would
particularly help.  If I stared it a lot longer, I may spot more
issues in it.

But what that step does primarily would help long after the upcoming
release and in that sense can wait a bit longer than 1, 2 & 4 (which
I am hoping can be merged in -rc1).


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> I don't *think* you intend to say "sure, you got user reports, but
>> (those users are wrong | those users are not real | you are not
>> interpreting those users correctly)", but that is what I am hearing.
>
> What I have been saying is "we are sending a wrong message to those
> users by not clearly saying 'optional' (i.e. it is OK for your Git
> not to understand this optional bits of information---you do not
> have to get alarmed immediately) and also not hinting where that
> optional thing comes from (i.e. if users realized they come from the
> future, the coalmine canary message will serve its purpose of
> reminding them that a newer Git is not just available but has been
> used already in their repository and help them to rectify the
> situation sooner)".
>
> As the deployed versions of Git will keep sending the wrong message,
> I do not mind applying 1/5 and 2/5, given especially that Ben seems
> to be OK with the plan.  I however do not think 3 thru 5 is ready
> yet with this round---there were some discussions on phrasing in
> this thread.

Thanks much --- that helps a lot.

Would you mind taking patch 4/5 as well?  (It's a tweak to the
configuration introduced in patches 1 and 2 that addresses a concern
Ben Peart had.)

As for patches 3 and 5, I agree.  In particular, patch 5 needs an
s/performance//, and it seems that the commit messages need some work
as well.

Sorry for getting the conversation in the wrong direction, and I'm
glad to hear we have a good way forward.

Sincerely,
Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Junio C Hamano
Jonathan Nieder  writes:

> I don't *think* you intend to say "sure, you got user reports, but
> (those users are wrong | those users are not real | you are not
> interpreting those users correctly)", but that is what I am hearing.

What I have been saying is "we are sending a wrong message to those
users by not clearly saying 'optional' (i.e. it is OK for your Git
not to understand this optional bits of information---you do not
have to get alarmed immediately) and also not hinting where that
optional thing comes from (i.e. if users realized they come from the
future, the coalmine canary message will serve its purpose of
reminding them that a newer Git is not just available but has been
used already in their repository and help them to rectify the
situation sooner)".

As the deployed versions of Git will keep sending the wrong message,
I do not mind applying 1/5 and 2/5, given especially that Ben seems
to be OK with the plan.  I however do not think 3 thru 5 is ready
yet with this round---there were some discussions on phrasing in
this thread.


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Now, a meta point.  Throughout this discussion, I have been hoping for
>> some acknowledgement of the problem --- e.g. an "I am sympathetic to
>> what you are trying to do, but ".  I wasn't able to find that, and
>> that is part of what contributed to the feeling of not being heard.
[...]
> And seeing the same "let's not enable the new extension" patch again
> without much improved justification contributed greatly to the
> feeling of not being heard.

Thanks for the pointer.  I had not understood before that you were
unhappy with those commit messages.

The commit message describes symptoms and the motivation for the
change.  I was confused at the original replies to patch 1 and 2 that
seemed to be more about patch 3; patch 1 and 2 are meaningful without
patch 3, so it would be odd to include a justification for patch 3 in
their commit message.

That said, it sounds like their commit messages are not adequate.  I'd
appreciate help from someone else to improve them.

>  The feeling is mutual.

I was trying to diagnose what was going wrong with the conversation so
as to move things forward on a better footing.  It seems I only
escalated things more. :(

Sorry about that, and I hope there's some way to move forward.

What is the best way to handle this?  I am feeling somewhat burnt by
this review process.  If Ben and I, working together, are able to come
up with a series that we both like, will you consider it for 2.20?  Is
there some other trusted contributor, such as Peff or Duy, that you
would trust to represent your wishes so I can pursue their Reviewed-by
without risking getting burnt in the same way again?

Jonathan


Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Junio C Hamano
Jeff King  writes:

> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
>   1. packlist_alloc() needs to store the Nth object, so it grows the
>  objects array to M, where M > N.
>
>   2. oe_set_tree_depth() wants to store a depth, so it allocates an
>  array of length N. Now we've violated our invariant.
>
>   3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
>  grow the objects array, since N <= M still holds. We try to assign
>  to tree_depth[N+1], which is out of bounds.

Ouch.  I see counting and allocationg is hard (I think I spotted a
bug in another area that comes from the same "count while filtering
and then allocate" pattern during this cycle).  Thanks for spotting.



Re: [PATCH 0/1] rebase: warn about the correct tree's OID

2018-11-20 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> A quick fix for a recent topic. Not overly critical, but I would deem this
> v2.20.0-rc1 material.
>
> Johannes Schindelin (1):
>   rebase: warn about the correct tree's OID
>

Yup, it is kind of embarrasing that nobody caught it, but at the
same time, this typo is at so tiny level that I would not be
surprised if it survived for many years.

Will apply.


>  builtin/rebase.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
>
> base-commit: bb75be6cb916297f271c846f2f9caa3daaaec718
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-85%2Fdscho%2Freset_head-typo-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-85/dscho/reset_head-typo-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/85


Re: [PATCH 1/1] legacy-rebase: backport -C and --whitespace= checks

2018-11-20 Thread Junio C Hamano
Carlo Arenas  writes:

> Tested-by: Carlo Marcelo Arenas Belón 
>
> the C version prepends: "fatal: " unlike the shell version for both
> error messages

In addition, "Invalid whitespace option" says 'bad', not
'--whitespace=bad', in the builtin version.

I think the following would address both issues.


 git-legacy-rebase.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-legacy-rebase.sh b/git-legacy-rebase.sh
index ced0635326..b97ffdc9dd 100755
--- a/git-legacy-rebase.sh
+++ b/git-legacy-rebase.sh
@@ -340,7 +340,7 @@ do
warn|nowarn|error|error-all)
;; # okay, known whitespace option
*)
-   die "Invalid whitespace option: '${1%*=}'"
+   die "fatal: Invalid whitespace option: '${1#*=}'"
;;
esac
;;
@@ -358,7 +358,7 @@ do
force_rebase=t
;;
-C*[!0-9]*)
-   die "switch \`C' expects a numerical value"
+   die "fatal: switch \`C' expects a numerical value"
;;
-C*)
git_am_opt="$git_am_opt $1"




Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Junio C Hamano
Jonathan Nieder  writes:

> Now, a meta point.  Throughout this discussion, I have been hoping for
> some acknowledgement of the problem --- e.g. an "I am sympathetic to
> what you are trying to do, but ".  I wasn't able to find that, and
> that is part of what contributed to the feeling of not being heard.

I had little sympathy for what you were trying to do, i.e. killing
the coalmine canary that warns users about using older version of
Git when there clearly is a sign that a newer one is available to
them and they have already used it; there was no problem to be
acknowledged.

Not before "why is it different this time?" question gets answered
anyway.

And seeing the same "let's not enable the new extension" patch again
without much improved justification contributed greatly to the
feeling of not being heard.  The feeling is mutual.



Re: [PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'

2018-11-20 Thread Junio C Hamano
SZEDER Gábor  writes:

> I rename these variables to 'num_large_edges', because the commit
> graph file format speaks about the 'Large Edge List' chunk.
>
> However, I do find that the term 'extra' makes much more sense and
> fits the concept better (i.e. extra commit graph edges resulting from
> the extra parents or octopus merges; after a s/extra/large/g the
> previous phrase would make no sense), and notice that the term 'large'
> doesn't come up in the file format itseld (the chunk's magic is {'E',
> 'D', 'G', 'E'}, there is no 'L' in there), but only in the
> specification text and a couple of variable and function names in the
> code.
>
> Would it make sense to do the rename in the other direction?

So edges that are involved in octopus merges are counted with
num_extra_edges and written to the large edges table?

I tend to agree that "large edge" is a misnomer.  These edges that
point at third and subsequent parents are no larger than the edges
that point at the first or the second parents---they are the same
size.  What is larger than usual is the size of the list of edges
(i.e. the number of parents), because the commit has extra (compared
to the majority of commits) number of edges.  So from the point of
view, I agree with you that "extra" makes a lot more sense than
"large".

And the magic number being "EDGE" without "L" is probably a good
thing, as a graph whose commits are all without any extra edge does
not need the "EDGE" chunk, so presence of the chunk by itself is a
sign that extra things are involved.  Which means that there isn't
any need to update the magic number, if we wanted to get rid of
"large" and replace it with "extra".  The only thing needed to
update the documentation, variable names and in-code comment.

And while at it, GRAPH_OCTOPUS_EDGES_NEEDED may also want to be
renamed with s/OCTOPUS/EXTRA/;

>  commit-graph.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 40c855f185..7b4e3a02cf 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>  {
>   struct commit **list = commits;
>   struct commit **last = commits + nr_commits;
> - uint32_t num_extra_edges = 0;
> + uint32_t num_large_edges = 0;
>  
>   while (list < last) {
>   struct commit_list *parent;
> @@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>   if (!parent)
>   edge_value = GRAPH_PARENT_NONE;
>   else if (parent->next)
> - edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
> num_extra_edges;
> + edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
> num_large_edges;
>   else {
>   edge_value = sha1_pos(parent->item->object.oid.hash,
> commits,
> @@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>  
>   if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
>   do {
> - num_extra_edges++;
> + num_large_edges++;
>   parent = parent->next;
>   } while (parent);
>   }
> @@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
>   uint32_t chunk_ids[5];
>   uint64_t chunk_offsets[5];
>   int num_chunks;
> - int num_extra_edges;
> + int num_large_edges;
>   struct commit_list *parent;
>   struct progress *progress = NULL;
>  
> @@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
>   commits.alloc = count_distinct;
>   ALLOC_ARRAY(commits.list, commits.alloc);
>  
> - num_extra_edges = 0;
> + num_large_edges = 0;
>   for (i = 0; i < oids.nr; i++) {
>   int num_parents = 0;
>   if (i > 0 && oideq([i - 1], [i]))
> @@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
>   num_parents++;
>  
>   if (num_parents > 2)
> - num_extra_edges += num_parents - 1;
> + num_large_edges += num_parents - 1;
>  
>   commits.nr++;
>   }
> - num_chunks = num_extra_edges ? 4 : 3;
> + num_chunks = num_large_edges ? 4 : 3;
>  
>   if (commits.nr >= GRAPH_PARENT_MISSING)
>   die(_("too many commits to write graph"));
> @@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
>   chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
>   chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
>   chunk_ids[2] = GRAPH_CHUNKID_DATA;
> - if (num_extra_edges)
> + if (num_large_edges)
>   chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
>   else
>   chunk_ids[3] = 0;
> @@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
>   

[PATCH v4 2/2] merge: add scissors line on merge conflict

2018-11-20 Thread Denton Liu
This fixes a bug where the scissors line is placed after the Conflicts:
section, in the case where a merge conflict occurs and
merge.cleanup = scissors.

Next, if commit.cleanup = scissors is specified, don't produce a
scissors line in commit if one already exists in the MERGE_MSG file.

Finally, we give pull the passthrough option of --cleanup so that it
can also take advantage of this change.

Signed-off-by: Denton Liu 
---
 Documentation/merge-options.txt |  6 +
 builtin/commit.c| 20 ++
 builtin/merge.c | 16 
 builtin/pull.c  |  6 +
 t/t7600-merge.sh| 46 +
 5 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 63a3fc0954..115e0ca6f0 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -27,6 +27,12 @@ they run `git merge`. To make it easier to adjust such 
scripts to the
 updated behaviour, the environment variable `GIT_MERGE_AUTOEDIT` can be
 set to `no` at the beginning of them.
 
+--cleanup=::
+   This option determines how the merge message will be cleaned up
+   before being passed on. Specifically, if the '' is given a
+   value of `scissors`, scissors will be prepended to the message in
+   the case of a merge conflict. See also linkgit:git-commit[1].
+
 --ff::
When the merge resolves as a fast-forward, only update the branch
pointer, without creating a merge commit.  This is the default
diff --git a/builtin/commit.c b/builtin/commit.c
index 0d9828e29e..7902645bc9 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -659,6 +659,7 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
const char *hook_arg2 = NULL;
int clean_message_contents = (cleanup_mode != COMMIT_MSG_CLEANUP_NONE);
int old_display_comment_prefix;
+   int merge_contains_scissors = 0;
 
/* This checks and barfs if author is badly specified */
determine_author_info(author_ident);
@@ -719,6 +720,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
strbuf_addbuf(, );
hook_arg1 = "message";
} else if (!stat(git_path_merge_msg(the_repository), )) {
+   size_t merge_msg_start;
+
/*
 * prepend SQUASH_MSG here if it exists and a
 * "merge --squash" was originally performed
@@ -729,8 +732,14 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
hook_arg1 = "squash";
} else
hook_arg1 = "merge";
+
+   merge_msg_start = sb.len;
if (strbuf_read_file(, git_path_merge_msg(the_repository), 
0) < 0)
die_errno(_("could not read MERGE_MSG"));
+
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   wt_status_locate_end(sb.buf + merge_msg_start, sb.len - 
merge_msg_start) < sb.len - merge_msg_start)
+   merge_contains_scissors = 1;
} else if (!stat(git_path_squash_msg(the_repository), )) {
if (strbuf_read_file(, git_path_squash_msg(the_repository), 
0) < 0)
die_errno(_("could not read SQUASH_MSG"));
@@ -798,7 +807,8 @@ static int prepare_to_commit(const char *index_file, const 
char *prefix,
struct ident_split ci, ai;
 
if (whence != FROM_COMMIT) {
-   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
+   if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
+   !merge_contains_scissors)
wt_status_add_cut_line(s->fp);
status_printf_ln(s, GIT_COLOR_NORMAL,
whence == FROM_MERGE
@@ -823,10 +833,10 @@ static int prepare_to_commit(const char *index_file, 
const char *prefix,
_("Please enter the commit message for your 
changes."
  " Lines starting\nwith '%c' will be ignored, 
and an empty"
  " message aborts the commit.\n"), 
comment_line_char);
-   else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
-whence == FROM_COMMIT)
-   wt_status_add_cut_line(s->fp);
-   else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
+   else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
+   if (whence == FROM_COMMIT && !merge_contains_scissors)
+   wt_status_add_cut_line(s->fp);
+   } else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
status_printf(s, GIT_COLOR_NORMAL,
_("Please enter the commit message for 

[PATCH v4 1/2] t7600: clean up 'merge --squash c3 with c7' test

2018-11-20 Thread Denton Liu
This cleans up the original test by removing some unnecessary braces and
removing a pipe.

Helped-by: SZEDER Gábor 
Signed-off-by: Denton Liu 
---
 t/t7600-merge.sh | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/t/t7600-merge.sh b/t/t7600-merge.sh
index 106148254d..d879efd330 100755
--- a/t/t7600-merge.sh
+++ b/t/t7600-merge.sh
@@ -233,8 +233,7 @@ test_expect_success 'merge --squash c3 with c7' '
cat result.9z >file &&
git commit --no-edit -a &&
 
-   {
-   cat <<-EOF
+   cat >expect <<-EOF &&
Squashed commit of the following:
 
$(git show -s c7)
@@ -242,8 +241,8 @@ test_expect_success 'merge --squash c3 with c7' '
# Conflicts:
#   file
EOF
-   } >expect &&
-   git cat-file commit HEAD | sed -e '1,/^$/d' >actual &&
+   git cat-file commit HEAD >tmp &&
+   sed -e '1,/^$/d' actual &&
test_cmp expect actual
 '
 
-- 
2.19.1



[PATCH v4 0/2] Fix scissors bug during merge conflict

2018-11-20 Thread Denton Liu
Thanks for catching that, Szeder. I tested this revised patch under
GETTEXT_POISON and it should be passing tests now.

Changes since V1:
* Only check MERGE_MSG for a scissors line instead of all prepended
  files
* Make a variable static in merge where appropriate
* Add passthrough options in pull
* Add documentation for the new option
* Add tests to ensure desired behaviour

Changes since V2:
* Merge both patches into one patch
* Fix bug in help message printing logic

Changes since V3:
* Add patch to cleanup 'merge --squash c3 with c7' test in t7600
* Use test_i18ncmp instead of test_cmp to pass GETTEXT_POISON tests

Denton Liu (2):
  t7600: clean up 'merge --squash c3 with c7' test
  merge: add scissors line on merge conflict

 Documentation/merge-options.txt |  6 
 builtin/commit.c| 20 +
 builtin/merge.c | 16 ++
 builtin/pull.c  |  6 
 t/t7600-merge.sh| 53 ++---
 5 files changed, 92 insertions(+), 9 deletions(-)

-- 
2.19.1



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> Jonathan Nieder wrote:

>> putting it in the commit message is a way to
>> experiment with the workflow without changing the object format
>
> As long as we're talking about a temporary state of affairs for users
> that have opted in, and we're explicit about the fact that future
> versions of git won't understand the change graphs that are produced
> during that temporary state of affairs, I'm fine with using the commit
> message. We can move it to the header prior to enabling the feature by
> default.

Yay!  I think that addresses both my and Ævar's concerns.  Also, if
you run into an issue that requires changing the object format
earlier, that's fine and we can handle the situation when it comes.

I don't have a strong opinion about whether this would go in the
design doc.  I suppose the doc could have an "implementation plan"
section describing temporary stopping points on the way to the final
result, but it's not necessary to include that.

Thanks for the quick and thoughtful replies.

Jonathan


[PATCH 2/2] commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

2018-11-20 Thread SZEDER Gábor
The optional 'Large Edge List' chunk of the commit graph file stores
parent information for commits with more than two parents.  Since the
chunk is optional, write_commit_graph() looks through all commits to
find those with more than two parents, and then writes the commit
graph file header accordingly, i.e. if there are no such commits, then
there won't be a 'Large Edge List' chunk written, only the three
mandatory chunks.

However, when it comes to writing chunk data, write_commit_graph()
unconditionally invokes write_graph_chunk_large_edges(), even when it
was decided earlier that that chunk won't be written.  Strictly
speaking there is no bug here, because write_graph_chunk_large_edges()
won't write anything because it won't find any commits with more than
two parents, but then it unnecessarily and in vain looks through all
commits once again in search for such commits.

Don't call write_graph_chunk_large_edges() when that chunk won't be
written to spare an unnecessary iteration over all commits.

Signed-off-by: SZEDER Gábor 
---
 commit-graph.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7b4e3a02cf..965eb23a7b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -940,7 +940,8 @@ void write_commit_graph(const char *obj_dir,
write_graph_chunk_fanout(f, commits.list, commits.nr);
write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_large_edges(f, commits.list, commits.nr);
+   if (num_large_edges)
+   write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
-- 
2.20.0.rc0.134.gf0022f8e60



[PATCH 1/2] commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'

2018-11-20 Thread SZEDER Gábor
The commit graph file format describes an optional 'Large Edge List'
chunk, and the function writing out this chunk is called
write_graph_chunk_large_edges().  Then there are two functions in
'commit-graph.c', namely write_graph_chunk_data() and
write_commit_graph(), which have a local variable called
'num_extra_edges'.

It can be confusing on first sight whether large edges and extra edges
refer to the same thing or not, but they do, so let's rename those
variables to 'num_large_edges'.

Signed-off-by: SZEDER Gábor 
---

I rename these variables to 'num_large_edges', because the commit
graph file format speaks about the 'Large Edge List' chunk.

However, I do find that the term 'extra' makes much more sense and
fits the concept better (i.e. extra commit graph edges resulting from
the extra parents or octopus merges; after a s/extra/large/g the
previous phrase would make no sense), and notice that the term 'large'
doesn't come up in the file format itseld (the chunk's magic is {'E',
'D', 'G', 'E'}, there is no 'L' in there), but only in the
specification text and a couple of variable and function names in the
code.

Would it make sense to do the rename in the other direction?

 commit-graph.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..7b4e3a02cf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -475,7 +475,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
-   uint32_t num_extra_edges = 0;
+   uint32_t num_large_edges = 0;
 
while (list < last) {
struct commit_list *parent;
@@ -507,7 +507,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
if (!parent)
edge_value = GRAPH_PARENT_NONE;
else if (parent->next)
-   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_extra_edges;
+   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | 
num_large_edges;
else {
edge_value = sha1_pos(parent->item->object.oid.hash,
  commits,
@@ -521,7 +521,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
do {
-   num_extra_edges++;
+   num_large_edges++;
parent = parent->next;
} while (parent);
}
@@ -761,7 +761,7 @@ void write_commit_graph(const char *obj_dir,
uint32_t chunk_ids[5];
uint64_t chunk_offsets[5];
int num_chunks;
-   int num_extra_edges;
+   int num_large_edges;
struct commit_list *parent;
struct progress *progress = NULL;
 
@@ -871,7 +871,7 @@ void write_commit_graph(const char *obj_dir,
commits.alloc = count_distinct;
ALLOC_ARRAY(commits.list, commits.alloc);
 
-   num_extra_edges = 0;
+   num_large_edges = 0;
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
if (i > 0 && oideq([i - 1], [i]))
@@ -885,11 +885,11 @@ void write_commit_graph(const char *obj_dir,
num_parents++;
 
if (num_parents > 2)
-   num_extra_edges += num_parents - 1;
+   num_large_edges += num_parents - 1;
 
commits.nr++;
}
-   num_chunks = num_extra_edges ? 4 : 3;
+   num_chunks = num_large_edges ? 4 : 3;
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
@@ -916,7 +916,7 @@ void write_commit_graph(const char *obj_dir,
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
chunk_ids[2] = GRAPH_CHUNKID_DATA;
-   if (num_extra_edges)
+   if (num_large_edges)
chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
else
chunk_ids[3] = 0;
@@ -926,7 +926,7 @@ void write_commit_graph(const char *obj_dir,
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
-   chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+   chunk_offsets[4] = chunk_offsets[3] + 4 * num_large_edges;
 
for (i = 0; i <= num_chunks; i++) {
uint32_t chunk_write[3];
-- 
2.20.0.rc0.134.gf0022f8e60



[no subject]

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 05:58:00PM +0100, SZEDER Gábor wrote:
> I saw a
> bit of weirdness while at it, and want to look into it, but now I've
> got to go...

So here are two simple patches that address the "Huh?!" moments I had
while looking at the progress output during writing the commit graph
file.  The first is a small cleanup to avoid confusion, but see the
notes attaches, while the second is a bit of an optimization.

SZEDER Gábor (2):
  commit-graph: rename 'num_extra_edges' variable to 'num_large_edges'
  commit-graph: don't call write_graph_chunk_large_edges() unnecessarily

 commit-graph.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

-- 
2.20.0.rc0.134.gf0022f8e60



Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:

>  This series has a strong smell of pushing back by the
> toolsmiths who refuse to promptly upgrade to help their users, and
> that is why I do not feel entirely happy with this series.

Last reply, I promise. :)

This sentence might have the key to the misunderstanding.  Let me say
a little more about where this showed up in the internal deployment
here, to clarify things a little.

At Google we deploy snapshots of the "next" branch approximately
weekly so that we can find problems early before they affect a
published release.  We rely on the ability to roll back quickly when a
problem is discovered, and we might care more about compatibility than
some others because of that.

A popular tool within Google has a bundled copy of Git (also a
snapshot of the "next" branch, but from a few weeks prior) and when we
deployed Git with the EOIE and IEOT extensions, users of that tool
very quickly reported the mysterious message.

That said, the maintainers of that tool did not complain at all, so
hopefully I can allay your worries about toolsmiths pushing back.
Once the problem reached my attention (a few days later than I would
have liked it to), the Git team at Google knew that we could not roll
back and were certainly alarmed about what that means about our
ability to cope with other problems should we need to.  But we were
able to quickly update that popular tool --- no issue.

Instead, we ran into a number of other users running into the same
problem, when sharing repositories between machines using sshfs, etc.
That, plus the aforementioned inability to roll back Git if we need
to, meant that this was a serious issue so we quickly addressed it in
the internal installation.

In general, we haven't had much trouble getting people to use Git
2.19.1 or newer.  So the problem here does not have to do with users
being slow to upgrade.

Instead, it's simply that upgrading Git should not cause the older,
widely deployed version of Git to complain about the repositories it
acts on.  That's a recipe for difficult debugging situations, it can
lead to people upgrading less quickly and reporting bugs later, and
all in all it's a bad situation to be in.  I've used tools like
Subversion that would upgrade repositories so they are unusable by the
previous version and experienced all of these problems.

So I consider it important *to Git upstream* to handle this well in
the Git 2.20 release.  We can flip the default soon after, even as
soon as 2.21.

Moreover, I am not the only one who ran into this --- e.g. from [1],
2018-10-19:

  17:10  jrnieder: Yes, I noticed that annoyance myself. ;)
  17:11  Yeah, I saw that message a few times and was slightly
 annoyed as well.

Now, a meta point.  Throughout this discussion, I have been hoping for
some acknowledgement of the problem --- e.g. an "I am sympathetic to
what you are trying to do, but ".  I wasn't able to find that, and
that is part of what contributed to the feeling of not being heard.

Thanks for your patient explanations, and hope that helps,
Jonathan

[1] https://colabti.org/irclogger/irclogger_log/git-devel?date=2018-10-19#l114


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
onathan Nieder wrote:
> Junio C Hamano wrote:

>> I am still puzzled by the insistence of 3/5 and this step that wants
>> to kill the coalmine canary.  But I am even more puzzled by the
>> first two steps that want to disable the two optional extensions.
[...]
> I acknowledge your puzzlement.  I'm not sure what to do about it.
>
> There are a few significant differences from the REUC case:
>
>  1. This happens whenever the index is refreshed.  REUC, as you
> mentioned, only affected resolutions of conflicted merges.  So
> users ran into it less often.
>
>  2. I never ran into the REUC case.  If I had, I would have sent the
> same patch then.
>
>  3. Time has passed and people's standards may have gone up.
>
> I wish I had been around when the message was added in the first
> place, so that I could have provided the same feedback about the
> message then.  But I do not think that that should be held against me.
> I'm describing a real user problem.

And to be clear, it is the first two patches that address the
immediate user problem.  Whatever improvements we make to the warning
message today, we cannot retroactively change the other versions of
Git that users are using that want to access the same repository.

Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:

> I am still puzzled by the insistence of 3/5 and this step that wants
> to kill the coalmine canary.  But I am even more puzzled by the
> first two steps that want to disable the two optional extensions.
>
> What's so different this time with the new optional extensions?
>
> The other early optional extensions like cache-tree or resolve-undo
> were added unconditionally and by definition appeared much earlier
> in git-core than any other Git reimplementations.  Everbody who
> recorded the fact that s/he resolved merge conflicts got REUC, and
> we would have given warning when an older Git did not understand
> these extensions [*1*].  We knudged users to more modern Git by
> preparing the old Gits to warn when there are unknown extensions,
> either by upgrading their Git themselves, or by bugging their
> toolsmiths.  Nobody complained to propose to rip the messages like
> this round.  This series has a strong smell of pushing back by the
> toolsmiths who refuse to promptly upgrade to help their users, and
> that is why I do not feel entirely happy with this series.

I acknowledge your puzzlement.  I'm not sure what to do about it.

There are a few significant differences from the REUC case:

 1. This happens whenever the index is refreshed.  REUC, as you
mentioned, only affected resolutions of conflicted merges.  So
users ran into it less often.

 2. I never ran into the REUC case.  If I had, I would have sent the
same patch then.

 3. Time has passed and people's standards may have gone up.

I wish I had been around when the message was added in the first
place, so that I could have provided the same feedback about the
message then.  But I do not think that that should be held against me.
I'm describing a real user problem.

Are the commit messages unclear?  Is there some missing use case that
this version of the patch misses?

I don't *think* you intend to say "sure, you got user reports, but
(those users are wrong | those users are not real | you are not
interpreting those users correctly)", but that is what I am hearing.
On the other hand, I don't want to discourage useful review feedback,
and I think adding the advise() call was a real improvement.  I'm just
getting confused about why I am still not being heard.

Jonathan


Re: [RFC PATCH 1/5] ref-filter: add objectsize:disk option

2018-11-20 Thread Junio C Hamano
Оля Тележная   writes:

>> I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
>> the corresponding size in this codepath, as long as we properly
>> handle negative oi.disk_size field, which may be telling some
>> "unusual" condition to us.
>
> Maybe we want to change the type (from off_t to unsigned) directly in
> struct object_info? That will help us not to make additional
> checkings. Or, at least, I suggest to add check to
> oid_object_info_extended() so that this function will give a guarantee
> that the size is non-negative.

I am fine with the approach.  The potential gain of allowing
oi.disk_size is it would allow the code to say "I'll give these
pieces of info about the object, but the on-disk size is unknown"
without failing the whole object_info_extended() request.  And I
personally do not think such an ability is not all that important
or useful.



Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Junio C Hamano
Ben Peart  writes:

>> This message should say something like "Index uses the mandatory %s
>> extension" to clarify and distinguish it from the below. We don't
>> understand the upper-case one either, but the important distinction is
>> that one is mandatory, and the other can be dropped. The two messages
>> should make this clear.
>>
>> Also, having advice() for that case is even more valuable since we have
>> a hard error at this point. So something like:
>>
>>  "This is likely due to the index having been written by a future
>>  version of Git. All-lowercase index extensions are mandatory, as
>>  opposed to optional all-uppercase ones which we'll drop with a
>>  warning if we see them".
>>
>
> I agree that we should have different messages for mandatory and
> optional extensions.  I don't think we should try and educate the end
> user on the implementation detail that git makes lower cases mandatory
> and upper case optional (ie drop the 'All-lowercase..." part).  They
> will never see the lower vs upper case difference and can't do
> anything about it anyway.

I agree that the "warn and continue" message should say "optional"
(meaning: safe to ignore but you would want to take note) while
"cannot continue" message should say something different.

I do not mind a more verbose error message when we saw unknown but
required extension, but unlike the "warn and continue" case, the
program will stop and die with such an error right there, so I am
not sure if it is worth allowing to tone it down by putting some
part of the verbosity behind the advise() mechanism.

>>> trace_printf("ignoring %.4s extension\n", ext);
>>> +   if (advice_unknown_index_extension) {
>>> +   warning(_("ignoring optional %.4s index extension"), 
>>> ext);

So from that point of view, the distinction between this message and this one

>>> return error("index uses %.4s extension, which we do 
>>> not understand",
>>>  ext);

is halfway there.  The message needs to anticipate and answer an
end-user reaction: "we do not understand" so what?

I am still puzzled by the insistence of 3/5 and this step that wants
to kill the coalmine canary.  But I am even more puzzled by the
first two steps that want to disable the two optional extensions.

What's so different this time with the new optional extensions?

The other early optional extensions like cache-tree or resolve-undo
were added unconditionally and by definition appeared much earlier
in git-core than any other Git reimplementations.  verbody who
recorded the fact that s/he resolved merge conflicts got REUC, and
we would have given warning when an older Git did not understand
these extensions [*1*].  We knudged users to more modern Git by
preparing the old Gits to warn when there are unknown extensions,
either by upgrading their Git themselves, or by bugging their
toolsmiths.  Nobody complained to propose to rip the messages like
this round.  This series has a strong smell of pushing back by the
toolsmiths who refuse to promptly upgrade to help their users, and
that is why I do not feel entirely happy with this series.


[Footnote]

 *1* A Git that did not understand TREE would have been silent, as
  it was the first extension and that was the first time we became
  aware of the need to warn unknown extensions.


Re: [PATCH] .mailmap: record canonical email address for Sven Strickroth

2018-11-20 Thread Junio C Hamano
Jonathan Nieder  writes:

>> +Sven Strickroth  
>
> Is the above line needed?  It's not clear to me from the commit message
> what it does.

Stare at it once again and you'll see ;-) FWIW I needed to do that
myself before you had the same issue.

Sven is trying to hide real-name name and have a bland token that
would leave the least amount of impression to the recipients there
instead, i.e. "em...@add.re.ss".


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> putting it in the commit message is a way to
> experiment with the workflow without changing the object format

As long as we're talking about a temporary state of affairs for users
that have opted in, and we're explicit about the fact that future
versions of git won't understand the change graphs that are produced
during that temporary state of affairs, I'm fine with using the commit
message. We can move it to the header prior to enabling the feature by
default.

- Stefan



On Tue, Nov 20, 2018 at 2:06 PM Jonathan Nieder  wrote:
>
> Stefan Xenos wrote:
> > On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
> >  wrote:
>
> >> I think it sounds better to just make it, in the header:
> >>
> >> x-evolve-pt content
> >> x-evolve-pt obsolete
> >> x-evolve-pt origin
> >>
> >> Where "pt = parent-type", we could of course spell that out too, but in
> >> this case it's "x-evolve-pt" is the exact same number of bytes as
> >> "parent-type", so nobody can object that it takes more space:)
> >>
> >> We'd then carry some documentation where we say everything except "x-*-"
> >> is reserved, and that we'd like to know about new "*" there before it's
> >> used, so it can be documented.
> [...]
> >  that should
> > probably be the subject of a separate proposal (who owns the content
> > of a namespace, what is the process for adding a new namespace or a
> > new attribute within a namespace, what order should the header
> > attributes appear in, what problem is namespacing there to solve, when
> > do we use a namespaced attribute versus a "reserved" attribute, etc.).
>
> Agreed.  There are reasons that I prefer not to go in this direction,
> but regardless, it would be the subject of a separate thread if you want
> to pursue it.
>
> >> Putting it in the commit message just sounds like a hack around not
> >> having namespaced headers. If we'd like to keep this then tools would
> >> need to parse both (potentially unpacking a lot of the commit message
> >> object, it can be quite big in some cases...).
>
> On the contrary: putting it in the commit message is a way to
> experiment with the workflow without changing the object format at
> all.
>
> I don't think we should underestimate the value of that ability.
>
> I don't understand what you're referring to by parsing both.  Are you
> saying that if the experiment proves successful, we wouldn't be able
> to migrate completely to a new format?  That sounds worrying to me ---
> I want the ability to experiment and to act on what we learn from an
> experiment, including when it touches on formats.
>
> Thanks,
> Jonathan


Re: [PATCH 1/1] legacy-rebase: backport -C and --whitespace= checks

2018-11-20 Thread Carlo Arenas
Tested-by: Carlo Marcelo Arenas Belón 

the C version prepends: "fatal: " unlike the shell version for both
error messages

Carlo


Re: [PATCH v2 2/6] commit-graph write: add more progress output

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 07:50:23PM +, Ævar Arnfjörð Bjarmason wrote:
> Add more progress output to the output already added in
> 7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
> 
> As noted in that commit most of the progress output isn't displayed on
> small repositories, but before this change we'd noticeably hang for
> 2-3 seconds at the end on medium sized repositories such as linux.git.
> 
> Now we'll instead show output like this, and have no human-observable
> point at which we're not producing progress output:
> 
> $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
> Finding commits for commit graph: 6365492, done.
> Computing commit graph generation numbers: 100% (797222/797222), done.
> Writing out commit graph: 2399912, done.
> 
> This "writing out" number is not meant to be meaningful to the user,
> but just to show that we're doing work and the command isn't
> hanging.
> 
> In the current implementation it's approximately 4x the number of
> commits.

"approximately" only, because the current implementation is buggy :)
If done right it's exactly 4x the number of commits.

> As noted in on-list discussion[1] we could add the loops up
> and show percentage progress here, but I don't think it's worth it. It
> would make the implementation more complex and harder to maintain for
> very little gain.

I think that if we can cheaply and accurately figure out the total,
then we should display it, so onlooking users can guesstimate how much
work is still left to be done.

> On a much larger in-house repository I have we'll show (note how we
> also say "Annotating[...]"):
> 
> $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
> Finding commits for commit graph: 50026015, done.
> Annotating commit graph: 21567407, done.
> Computing commit graph generation numbers: 100% (7144680/7144680), done.
> Writing out commit graph: 21434417, done.
> 
> 1. https://public-inbox.org/git/20181120165800.gb30...@szeder.dev/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  commit-graph.c | 41 -
>  1 file changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e6d0d7722b..6f6409b292 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository 
> *r, const struct commit
>  
>  static void write_graph_chunk_fanout(struct hashfile *f,
>struct commit **commits,
> -  int nr_commits)
> +  int nr_commits,
> +  struct progress *progress,
> +  uint64_t *progress_cnt)
>  {
>   int i, count = 0;
>   struct commit **list = commits;
> @@ -445,6 +447,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>*/
>   for (i = 0; i < 256; i++) {
>   while (count < nr_commits) {
> + display_progress(progress, ++*progress_cnt);

I think this display_progress() should be places after the condition,
so no one has to waste brain cycles on figuring out, why it always
counts 255 more than the number of commits.

>   if ((*list)->object.oid.hash[0] != i)
>   break;
>   count++;
> @@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  }
>  
>  static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
> -struct commit **commits, int nr_commits)
> +struct commit **commits, int nr_commits,
> +struct progress *progress,
> +uint64_t *progress_cnt)
>  {
>   struct commit **list = commits;
>   int count;
> - for (count = 0; count < nr_commits; count++, list++)
> + for (count = 0; count < nr_commits; count++, list++) {
> + display_progress(progress, ++*progress_cnt);
>   hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
> + }
>  }
>  
>  static const unsigned char *commit_to_sha1(size_t index, void *table)
> @@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
> void *table)
>  }
>  
>  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> -struct commit **commits, int nr_commits)
> +struct commit **commits, int nr_commits,
> +struct progress *progress,
> +uint64_t *progress_cnt)
>  {
>   struct commit **list = commits;
>   struct commit **last = commits + nr_commits;
> @@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>   struct commit_list *parent;
>   int edge_value;
>   uint32_t 

Re: Git for Windows v2.20.0-rc0, was Re: [ANNOUNCE] Git v2.20.0-rc0

2018-11-20 Thread Bryan Turner
On Tue, Nov 20, 2018 at 12:57 PM Johannes Schindelin
 wrote:
>
> Team,
>
> On Sun, 18 Nov 2018, Junio C Hamano wrote:
>
> > An early preview release Git v2.20.0-rc0 is now available for
> > testing at the usual places.  It is comprised of 887 non-merge
> > commits since v2.19.0, contributed by 71 people, 23 of which are
> > new faces.
>
> The "for Windows" flavor of Git v2.20.0-rc0 is available here:
>

Thanks again for publishing these release candidates! I greatly
appreciate the effort that's involved, and the opportunity it affords
to test new versions prior to their final release.

I've run 2.20.0-rc0 through the test matrix for Bitbucket Server on
both Linux and Windows, and the only failures were related to this
change:

* "git branch -l " used to be a way to ask a reflog to be
   created while creating a new branch, but that is no longer the
   case.  It is a short-hand for "git branch --list " now.

Since this is an intentional change I suspect there's nothing to do
for it but patch Bitbucket Server and move on, but I'll confess it's a
little frustrating that the option was deprecated in 2.19 and then
immediately removed in the next minor release. Such a
backwards-incompatible change seems far more apt for a major release,
a perspective that's reinforced by having the change follow such a
brief deprecation period--2.19.0 was only tagged September 10th (in my
timezone), so it's been less than 3 months. (Looking at the git branch
documentation for 2.18.0 [1] shows nothing about this deprecation; the
messaging first appears in 2.19.0 [2]. To be honest, I didn't even
realize it was deprecated until now, when it's gone--our test suite is
automated, so the deprecation warning was not visible.)

For what it's worth, I think having -l mean --list is a _good change_,
and one that will likely be more natural for both new and existing
users. It's the rapid changeover that hurts.

Best regards,
Bryan Turner

[1] https://git-scm.com/docs/git-branch/2.18.0
[2] https://git-scm.com/docs/git-branch/2.19.0


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
>  wrote:

>> I think it sounds better to just make it, in the header:
>>
>> x-evolve-pt content
>> x-evolve-pt obsolete
>> x-evolve-pt origin
>>
>> Where "pt = parent-type", we could of course spell that out too, but in
>> this case it's "x-evolve-pt" is the exact same number of bytes as
>> "parent-type", so nobody can object that it takes more space:)
>>
>> We'd then carry some documentation where we say everything except "x-*-"
>> is reserved, and that we'd like to know about new "*" there before it's
>> used, so it can be documented.
[...]
>  that should
> probably be the subject of a separate proposal (who owns the content
> of a namespace, what is the process for adding a new namespace or a
> new attribute within a namespace, what order should the header
> attributes appear in, what problem is namespacing there to solve, when
> do we use a namespaced attribute versus a "reserved" attribute, etc.).

Agreed.  There are reasons that I prefer not to go in this direction,
but regardless, it would be the subject of a separate thread if you want
to pursue it.

>> Putting it in the commit message just sounds like a hack around not
>> having namespaced headers. If we'd like to keep this then tools would
>> need to parse both (potentially unpacking a lot of the commit message
>> object, it can be quite big in some cases...).

On the contrary: putting it in the commit message is a way to
experiment with the workflow without changing the object format at
all.

I don't think we should underestimate the value of that ability.

I don't understand what you're referring to by parsing both.  Are you
saying that if the experiment proves successful, we wouldn't be able
to migrate completely to a new format?  That sounds worrying to me ---
I want the ability to experiment and to act on what we learn from an
experiment, including when it touches on formats.

Thanks,
Jonathan


Re: [PATCH] .mailmap: record canonical email address for Sven Strickroth

2018-11-20 Thread Jonathan Nieder
Hi,

Sven Strickroth wrote:

> Subject: .mailmap: record canonical email address for Sven Strickroth
>
> Signed-off-by: Sven Strickroth 
> ---
>  .mailmap | 2 ++
>  1 file changed, 2 insertions(+)

Thanks for taking care of it.

[...]
> --- a/.mailmap
> +++ b/.mailmap
> @@ -235,6 +235,8 @@ Steven Grimm  
> 
>  Steven Grimm  kor...@midwinter.com
>  Steven Walter  
>  Steven Walter  
> +Sven Strickroth  

Is the above line needed?  It's not clear to me from the commit message
what it does.

> +Sven Strickroth  

This line looks good.

>  Sven Verdoolaege  
>  Sven Verdoolaege  
>  SZEDER G??bor  

This line does not seem to have survived mailing.  Fortunately it's a
context line, but thought I should mention it for the future.

Thanks and hope that helps,
Jonathan


Git for Windows v2.20.0-rc0, was Re: [ANNOUNCE] Git v2.20.0-rc0

2018-11-20 Thread Johannes Schindelin
Team,

On Sun, 18 Nov 2018, Junio C Hamano wrote:

> An early preview release Git v2.20.0-rc0 is now available for
> testing at the usual places.  It is comprised of 887 non-merge
> commits since v2.19.0, contributed by 71 people, 23 of which are
> new faces.

The "for Windows" flavor of Git v2.20.0-rc0 is available here:

https://github.com/git-for-windows/git/releases/tag/v2.20.0-rc0.windows.1

The current change log for v2.20.0 reads like this:

Changes since Git for Windows v2.19.1 (Oct 5th 2018)

Please note: Git CMD is deprecated as of this Git for Windows version. The
default is to have git.exe in the PATH anyway, so there is no noticeable
difference between CMD and Git CMD. It is impossible to turn off CMD's
behavior where it picks up any git.exe in the current directory, so let's
discourage the use of Git CMD. Users who dislike Git Bash should switch to
Powershell instead.

New Features

  • Comes with OpenSSH v7.9p1.
  • The description of the editor option to choose Vim has been clarified
to state that this unsets core.editor.
  • Comes with cURL v7.62.0.
  • The type of symlinks to create (directory or file) can now be
specified via the .gitattributes.
  • The FSCache feature now uses a faster method to enumerate files,
making e.g. git status faster in large repositories.
  • Comes with Git Credential Manager v1.18.3.
  • Comes with Git LFS v2.6.0.
  • Comes with MSYS2 runtime (Git for Windows flavor) based on Cygwin
2.11.2.
  • The FSCache feature was optimized to become faster.

Bug Fixes

  • The 64-bit Portable Git no longer sets pack.packSizeLimit.

Thanks,
Johannes

Re: [PATCH] msvc: Directly use MS version (_stricmp) of strcasecmp

2018-11-20 Thread Johannes Schindelin
Hi Junio,

On Tue, 20 Nov 2018, Junio C Hamano wrote:

> Sven Strickroth  writes:
> 
> > This also removes an implicit conversion from size_t (unsigned) to int 
> > (signed).
> >
> > _stricmp as well as _strnicmp are both available since VS2012.

Looks good to me.

> > Signed-off-by: Sven Strickroth 
> > ---
> >  compat/msvc.h | 8 +---
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> Will apply, thanks.
> 
> The substition from ftello with _ftelli64 does not appear in our
> codebase yet, but it was easy enough to adjust the patch myself, so
> no need to resend this patch.

Indeed, that is only in Git for Windows' code base yet, AFAICT.

For the record: I am currently holding off from contributing those patches
(I am talking about the patch series to make Git compile with MS Visual
C++ on the command line, followed by the patch series to generate project
definitions ready for use with MS Visual Studio) because of the feature
freeze. I had hoped to be able to contribute them sooner, but it took Jeff
Hostetler and myself a combined gargantuan effort to reorder and
disentangle Git for Windows' branch thicket so that those patches apply
cleanly on top of git.git's `master`.

Happily, almost all of the prerequisites made it upstream (e.g. the
nanosecond support for Windows, the patches to require Windows Vista or
later, the patch to use CreateHardLink() directly, etc). By my counting,
only two, relatively small patch series are left, and both are already
under discussion (but on hold, due to the code freeze).

For interested parties: the current shape of the `visual-studio` patch
series can be seen here:
https://github.com/git-for-windows/git/compare/581eb5441089%5E...581eb5441089%5E2
and the current shape of the `msvc` patch series can be seen here:
https://github.com/git-for-windows/git/compare/e9e7bd2a2485%5E...e9e7bd2a2485%5E2

Ciao,
Dscho

> 
> > diff --git a/compat/msvc.h b/compat/msvc.h
> > index e6e1a6bbf7..2d558bae14 100644
> > --- a/compat/msvc.h
> > +++ b/compat/msvc.h
> > @@ -14,18 +14,12 @@
> >  #define inline __inline
> >  #define __inline__ __inline
> >  #define __attribute__(x)
> > +#define strcasecmp   _stricmp
> >  #define strncasecmp  _strnicmp
> >  #define ftruncate_chsize
> >  #define strtoull _strtoui64
> >  #define strtoll  _strtoi64
> >  
> > -static __inline int strcasecmp (const char *s1, const char *s2)
> > -{
> > -   int size1 = strlen(s1);
> > -   int sisz2 = strlen(s2);
> > -   return _strnicmp(s1, s2, sisz2 > size1 ? sisz2 : size1);
> > -}
> > -
> >  #undef ERROR
> >  
> >  #define ftello _ftelli64
> 


Re: Git Test Coverage Report (v2.20.0-rc0)

2018-11-20 Thread Johannes Schindelin
Hi Stolee,

On Mon, 19 Nov 2018, Derrick Stolee wrote:

> On 11/19/2018 1:33 PM, Ævar Arnfjörð Bjarmason wrote:
> > On Mon, Nov 19 2018, Derrick Stolee wrote:
> >
> > > [...]
> > > builtin/rebase.c
> > > 62c23938fa 55) return env;
> > > [...]
> > > Ævar Arnfjörð Bjarmason 62c23938f: tests: add a special setup
> > > where rebase.useBuiltin is off
> > This one would be covered with
> > GIT_TEST_REBASE_USE_BUILTIN=false. Obviously trivial, but I wonder if
> > the rest of the coverage would look different when passed through the
> > various GIT_TEST_* options.
> >
> 
> Thanks for pointing out this GIT_TEST_* variable to me. I had been running
> builds with some of them enabled, but didn't know about this one.
> 
> Unfortunately, t3406-rebase-message.sh fails with
> GIT_TEST_REBASE_USE_BUILTIN=false and it bisects to 4520c2337: Merge branch
> 'ab/rebase-in-c-escape-hatch'.
> 
> The issue is that the commit 04519d72 "rebase: validate -C and
> --whitespace= parameters early" introduced the following test that cares
> about error messages:
> 
> +test_expect_success 'error out early upon -C or --whitespace=' '
> +   test_must_fail git rebase -Cnot-a-number HEAD 2>err &&
> +   test_i18ngrep "numerical value" err &&
> +   test_must_fail git rebase --whitespace=bad HEAD 2>err &&
> +   test_i18ngrep "Invalid whitespace option" err
> +'
> 
> The merge commit then was the first place where this test could run with that
> variable.
> 
> What's the correct fix here? Force the builtin rebase in this test? Unify the
> error message in the non-builtin case?

I am obviously biased, but my take is that the correct fix is in
https://public-inbox.org/git/pull.86.git.gitgitgad...@gmail.com

Ciao,
Dscho

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> If a merge has been cherry-picked we cannot update it as we don't record
> which parent was used for the pick, however it is probably not a problem
> in practice - I think it is unusual to amend merges.

I've read and reread that sentence several times and don't fully
understand it. Could you elaborate?

It sounds scary, though. With the evolve command, amending merges will
need to be supported. If you create a merge and then amend one of its
parent commits, the evolve command will need to rebase the merge and
point one or both parents to the replacement instead.

  - Stefan
On Tue, Nov 20, 2018 at 5:03 AM Phillip Wood  wrote:
>
> On 15/11/2018 00:55, sxe...@google.com wrote:
> > From: Stefan Xenos 
> >
> > +Obsolescence across cherry-picks
> > +
> > +By default the evolve command will treat cherry-picks and squash merges as 
> > being
> > +completely separate from the original. Further amendments to the original 
> > commit
> > +will have no effect on the cherry-picked copy. However, this behavior may 
> > not be
> > +desirable in all circumstances.
> > +
> > +The evolve command may at some point support an option to look for cases 
> > where
> > +the source of a cherry-pick or squash merge has itself been amended, and
> > +automatically apply that same change to the cherry-picked copy. In such 
> > cases,
> > +it would traverse origin edges rather than ignoring them, and would treat a
> > +commit with origin edges as being obsolete if any of its origins were 
> > obsolete.
>
> If a merge has been cherry-picked we cannot update it as we don't record
> which parent was used for the pick, however it is probably not a problem
> in practice - I think it is unusual to amend merges.
>
> Best Wishes
>
> Phillip


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> This explains why we have 'origin' fields in the meta commits, it might
> be worth putting a forward reference or note earlier on to explain why
> recording the origin is useful. (I didn't find gerrit needs it very
> convincing on its own but it is actually more general than gerrit's
> specific use case)

I'll add the forward reference.

TBH, gerrit is the main reason I added it - so I'm interested in why
you didn't find the gerrit use-case convincing. Can you elaborate? (If
there's some other way around the gerrit requirement, we might not
need the origin parents)

> Should this be meta/mychange:refs/for/master or have I missed something?

It should be metas/mychange/ It's already fixed in the v2 patch.

I really wanted to use the namespace "changes", but gerrit is
squatting on that. I tried "change", but that brakes the plural naming
scheme and may get confused with gerrit's namespace, so I settled on
"metas".

> I think it would make sense to have this next to the sections on commit
> --amend and merge I was wondering what about rebase when I was reading
> those sections.

Will do.

> I'm a bit confused why it is creating a meta ref per commit rather than
> one for the current branch.

I tried to explain that later in the doc. meta refs serve two purposes
- they act as stable names for changes (or at least the commits at the
head of each change) and they point to the metacommits that are
currently in use. For both purposes, we need a ref per commit. For the
"stable name" case, this should be obvious - something that just
points to a branch couldn't provide different names for each commit on
that branch. The metacommit case is less obvious - the set of
metacommits for one change aren't connected to the metacommits for any
other change. The "parents" of a metacommit are older versions of the
same change. They don't point to the metacommits from the parent
change. That means that there is no single ref we could create for a
branch that would reach all the necessary metacommits.

> I got the impression they had put quite a lot of effort
> into having evolve automatically run and resolve divergences when
> pulling and rebasing, is there a long term plan for git to do the same?

IMO, we should add anything to the plan if doing so improves the
workflow of our users... but it sounds like you're referring to
mercurial features I've never used. Could you point me to specific
docs on the feature you want and/or make a concrete suggestion about
how it might work?

I never use pull so it slipped my mind. It would probably make sense
to have the option of doing an automatic evolve after pull (actually,
once the feature is stable, most users would probably want it to be
the default). How do you think it should be triggered? "git pull
--evolve"? or perhaps "git pull --rebase=evolve"? We should probably
also introduce a new "evolve" enum value to branch..rebase
config value. I'll use "--evolve" for now. If may make sense to add
"--evolve" to every git command that performs an automatic evolve when
done.

> What happens if the original commit are currently checked out with local
> changes?

For a start, I'll probably just display an error message if the
current working tree is dirty ("Please stash"). Long term, I'd like it
to work like rebase --autostash. It should stash your changes, do the
evolve, return to the evolved version of the original change, and
reapply the stash. I'll add this to the doc.

> Can I suggest using refs/remote//metas. I

Ooh! Great idea! I'll update the doc.

> I think this could be useful (although I guess you can get the branches
> you've been working on recently from HEAD's reflog quite easily).

The changes list is different from the reflog. It's a list of all your
unsubmitted patches - regardless of their age or what branch they're
on. They may not have corresponding branches: you may have been
working on them with a detached head, or there may be multiple changes
on the same branch. You might not have visited them recently, in which
case they wouldn't be in the reflog at all. You may have reset to an
older version of the change, in which case they'd be in the reflog but
the reflog and change point to different places. If you've used gerrit
before, the "changes" list will contain pretty much the same content
as the gerrit dashboard, except that it works locally.

>> +Much like a merge conflict, divergence is a situation that requires user
>> +intervention to resolve. The evolve command will stop when it encounters
>> +divergence and prompt the user to resolve the problem. Users can solve the
>> +problem in several ways:
>> +
>> +- Discard one of the changes (by deleting its change branch).
>> +- Merge the two changes (producing a single change branch).
>
>I assume this wont create merge commits for the actual commits though,
>just merge the meta branches and create some new commits that are each
>the result of something like 'merge-recursive original-commit
>our-new-version 

[PATCH 0/1] legacy-rebase: fix "regression"

2018-11-20 Thread Johannes Schindelin via GitGitGadget
This is a backport, really, to accommodate a new regression test that was
introduced when the built-in rebase learned to validate the -C and 
--whitespace= arguments early.

Johannes Schindelin (1):
  legacy-rebase: backport -C and --whitespace= checks

 git-legacy-rebase.sh | 8 
 1 file changed, 8 insertions(+)


base-commit: bb75be6cb916297f271c846f2f9caa3daaaec718
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-86%2Fdscho%2Fscripted-rebase-Cn-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-86/dscho/scripted-rebase-Cn-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/86
-- 
gitgitgadget


[PATCH 1/1] legacy-rebase: backport -C and --whitespace= checks

2018-11-20 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Since 04519d720114 (rebase: validate -C and --whitespace=
parameters early, 2018-11-14), the built-in rebase validates the -C and
--whitespace arguments early. As this commit also introduced a
regression test for this, and as a later commit introduced the
GIT_TEST_REBASE_USE_BUILTIN mode to run tests, we now have a
"regression" in the scripted version of `git rebase` on our hands.

Backport the validation to fix this.

Reported-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Johannes Schindelin 
---
 git-legacy-rebase.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/git-legacy-rebase.sh b/git-legacy-rebase.sh
index 75a08b2683..ced0635326 100755
--- a/git-legacy-rebase.sh
+++ b/git-legacy-rebase.sh
@@ -337,6 +337,11 @@ do
fix|strip)
force_rebase=t
;;
+   warn|nowarn|error|error-all)
+   ;; # okay, known whitespace option
+   *)
+   die "Invalid whitespace option: '${1%*=}'"
+   ;;
esac
;;
--ignore-whitespace)
@@ -352,6 +357,9 @@ do
git_am_opt="$git_am_opt $1"
force_rebase=t
;;
+   -C*[!0-9]*)
+   die "switch \`C' expects a numerical value"
+   ;;
-C*)
git_am_opt="$git_am_opt $1"
;;
-- 
gitgitgadget


Re: [PATCH] rebase: mark a test as failing with rebase.useBuiltin=false

2018-11-20 Thread Johannes Schindelin
Hi Ævar,

On Tue, 20 Nov 2018, Ævar Arnfjörð Bjarmason wrote:

> Mark a test added in 04519d7201 ("rebase: validate -C and
> --whitespace= parameters early", 2018-11-14) as only succeeding
> with the builtin version of rebase. It would be nice if the
> shellscript version had the same fix, but it's on its way out, and the
> author is not interested in fixing it[1].

It's not that I am not interested in fixing it. It's more like I hoped
that I could work on Git for Windows v2.20.0-rc0 and rely on you to fix
this bug.

Now that Git for Windows v2.20.0-rc0 is out (see
https://github.com/git-for-windows/git/releases/tag/v2.20.0-rc0.windows.1),
expect a patch soon (see https://github.com/gitgitgadget/git/pull/86).

Ciao,
Dscho

> 
> This makes the entire test suite pass again with the
> GIT_TEST_REBASE_USE_BUILTIN=false mode added in my 62c23938fa ("tests:
> add a special setup where rebase.useBuiltin is off", 2018-11-14).
> 
> 1. 
> https://public-inbox.org/git/nycvar.qro.7.76.6.1811201157170...@tvgsbejvaqbjf.bet/
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> 
> On Tue, Nov 20 2018, Johannes Schindelin wrote:
> 
> > [...]
> > Maybe you can concoct a prereq for this test? Something like
> >
> > test_lazy_prereq BUILTIN_REBASE '
> >   test true = "${GIT_TEST_REBASE_USE_BUILTIN:-true}"
> > '
> 
> It's better to just mark the test as needing the prereq turned
> off. E.g. this is what we do for the split index tests & now for the
> gettext tests. That way we always run the test, but just indicate that
> it relies on GIT_TEST_REBASE_USE_BUILTIN being unset.
> 
>  t/t3406-rebase-message.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t3406-rebase-message.sh b/t/t3406-rebase-message.sh
> index 38bd876cab..77e5bbb3d5 100755
> --- a/t/t3406-rebase-message.sh
> +++ b/t/t3406-rebase-message.sh
> @@ -84,7 +84,8 @@ test_expect_success 'rebase --onto outputs the invalid ref' 
> '
>   test_i18ngrep "invalid-ref" err
>  '
>  
> -test_expect_success 'error out early upon -C or --whitespace=' '
> +test_expect_success 'builtin rebase: error out early upon -C or 
> --whitespace=' '
> + sane_unset GIT_TEST_REBASE_USE_BUILTIN &&
>   test_must_fail git rebase -Cnot-a-number HEAD 2>err &&
>   test_i18ngrep "numerical value" err &&
>   test_must_fail git rebase --whitespace=bad HEAD 2>err &&
> -- 
> 2.20.0.rc0.387.gc7a69e6b6c
> 
> 

[PATCH v2 6/6] commit-graph write: add even more progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Add more progress output to sections of code that can collectively
take 5-10 seconds on a large enough repository. On a test repository
with I have with ~7 million commits and ~50 million objects we'll now
emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(50026015/50026015), done.
Annotating commit graph: 21567407, done.
Counting distinct commits in commit graph: 100% (7189147/7189147), done.
Finding extra edges in commit graph: 100% (7189147/7189147), done.
Computing commit graph generation numbers: 100% (7144680/7144680), done.
Writing out commit graph: 21434417, done.

Whereas on a medium-sized repository such as linux.git these new
progress bars won't have time to kick in and as before and we'll still
emit output like:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
Annotating commit graph: 2391666, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 2399912, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 0e98679bce..1ad960 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -887,12 +887,19 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable(, report_progress);
 
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Counting distinct commits in commit graph"),
+   oids.nr);
+   display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(oids.list, oids.nr, commit_compare);
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
+   display_progress(progress, i + 1);
if (!oideq([i - 1], [i]))
count_distinct++;
}
+   stop_progress();
 
if (count_distinct >= GRAPH_PARENT_MISSING)
die(_("the commit graph format cannot write %d commits"), 
count_distinct);
@@ -902,8 +909,13 @@ void write_commit_graph(const char *obj_dir,
ALLOC_ARRAY(commits.list, commits.alloc);
 
num_extra_edges = 0;
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Finding extra edges in commit graph"),
+   oids.nr);
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
+   display_progress(progress, i + 1);
if (i > 0 && oideq([i - 1], [i]))
continue;
 
@@ -920,6 +932,7 @@ void write_commit_graph(const char *obj_dir,
commits.nr++;
}
num_chunks = num_extra_edges ? 4 : 3;
+   stop_progress();
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 4/6] commit-graph write: add more describing progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

$ git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365492/6365492), done.
[...]

# Actually we don't emit this since this takes almost no time at
# all. But if we did (s/_delayed//) we'd show:
$ git for-each-ref --format='%(objectname)' | git commit-graph write 
--stdin-commits
Finding commits for commit graph from 584 refs: 100% (584/584), done.
[...]

$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 3 packs: 6365492, done.
[...]

The middle on of those is going to be the output users might see in
practice, since it'll be emitted when they get the commit graph via
gc.writeCommitGraph=true. But as noted above you need a really large
number of refs for this message to show. It'll show up on a test
repository I have with ~165k refs:

Finding commits for commit graph from 165203 refs: 100% (165203/165203), 
done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7c1afa4704..fd1fd61750 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -779,6 +779,7 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
unsigned long approx_nr_objects;
+   struct strbuf progress_title = STRBUF_INIT;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -815,8 +816,12 @@ void write_commit_graph(const char *obj_dir,
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
if (report_progress) {
-   oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph in %d 
pack",
+  "Finding commits for commit graph in %d 
packs",
+  pack_indexes->nr),
+   pack_indexes->nr);
+   oids.progress = 
start_delayed_progress(progress_title.buf, 0);
oids.progress_done = 0;
}
for (i = 0; i < pack_indexes->nr; i++) {
@@ -833,14 +838,20 @@ void write_commit_graph(const char *obj_dir,
free(p);
}
stop_progress();
+   strbuf_reset(_title);
strbuf_release();
}
 
if (commit_hex) {
-   if (report_progress)
-   progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
-   commit_hex->nr);
+   if (report_progress) {
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph from 
%d ref",
+  "Finding commits for commit graph from 
%d refs",
+  commit_hex->nr),
+   commit_hex->nr);
+   progress = start_delayed_progress(progress_title.buf,
+ commit_hex->nr);
+   }
for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
@@ -860,12 +871,13 @@ void write_commit_graph(const char *obj_dir,
}
}
stop_progress();
+   strbuf_reset(_title);
}
 
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
+   _("Finding commits for commit graph among 
packed objects"),
approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
if (oids.progress_done < approx_nr_objects)
@@ -970,6 +982,8 @@ void write_commit_graph(const char *obj_dir,
  _cnt);
stop_progress();
 
+   strbuf_release(_title);
+
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 3/6] commit-graph write: show progress for object search

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

Finding commits for commit graph: 6365492, done.
[...]

And now:

Finding commits for commit graph: 100% (6365492/6365492), done.
[...]

Since the commit graph only includes those commits that are packed
(via for_each_packed_object(...)) the approximate_object_count()
returns the actual number of objects we're going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6f6409b292..7c1afa4704 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -778,12 +778,14 @@ void write_commit_graph(const char *obj_dir,
struct commit_list *parent;
struct progress *progress = NULL;
uint64_t progress_cnt = 0;
+   unsigned long approx_nr_objects;
 
if (!commit_graph_compatible(the_repository))
return;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 32;
+   approx_nr_objects = approximate_object_count();
+   oids.alloc = approx_nr_objects / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
@@ -863,8 +865,11 @@ void write_commit_graph(const char *obj_dir,
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   _("Finding commits for commit graph"),
+   approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
+   if (oids.progress_done < approx_nr_objects)
+   display_progress(oids.progress, approx_nr_objects);
stop_progress();
}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 5/6] commit-graph write: remove empty line for readability

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index fd1fd61750..0e98679bce 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -888,7 +888,6 @@ void write_commit_graph(const char *obj_dir,
close_reachable(, report_progress);
 
QSORT(oids.list, oids.nr, commit_compare);
-
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
if (!oideq([i - 1], [i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 0/6] commit-graph write: progress output improvements

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Fixes issues SZEDER raised with v1, except displaying an accurate ETA
in write_graph_*(). As noted in 2/6 I don't think it's worth it, I
just adjusted the message instead.

Ævar Arnfjörð Bjarmason (6):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add more progress output
  commit-graph write: show progress for object search
  commit-graph write: add more describing progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add even more progress output

 commit-graph.c | 92 +++---
 1 file changed, 73 insertions(+), 19 deletions(-)

Range-diff:
1:  751d3a7561 ! 1:  093c63e99f commit-graph write: add more progress output
 a => b | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

@@ -13,22 +13,30 @@
 point at which we're not producing progress output:
 
 $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
-Finding commits for commit graph: 6418991, done.
-Computing commit graph generation numbers: 100% (797205/797205), 
done.
-Writing out commit graph chunks: 2399861, done.
+Finding commits for commit graph: 6365492, done.
+Computing commit graph generation numbers: 100% (797222/797222), 
done.
+Writing out commit graph: 2399912, done.
 
-This "graph chunks" number is not meant to be meaningful to the user,
+This "writing out" number is not meant to be meaningful to the user,
 but just to show that we're doing work and the command isn't
 hanging.
 
+In the current implementation it's approximately 4x the number of
+commits. As noted in on-list discussion[1] we could add the loops up
+and show percentage progress here, but I don't think it's worth it. It
+would make the implementation more complex and harder to maintain for
+very little gain.
+
 On a much larger in-house repository I have we'll show (note how we
 also say "Annotating[...]"):
 
 $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
-Finding commits for commit graph: 48271163, done.
-Annotating commit graph: 21424536, done.
-Computing commit graph generation numbers: 100% (7141512/7141512), 
done.
-Writing out commit graph chunks: 21424913, done.
+Finding commits for commit graph: 50026015, done.
+Annotating commit graph: 21567407, done.
+Computing commit graph generation numbers: 100% (7144680/7144680), 
done.
+Writing out commit graph: 21434417, done.
+
+1. https://public-inbox.org/git/20181120165800.gb30...@szeder.dev/
 
 Signed-off-by: Ævar Arnfjörð Bjarmason 
 
@@ -50,8 +58,7 @@
 */
for (i = 0; i < 256; i++) {
while (count < nr_commits) {
-+  if (progress)
-+  display_progress(progress, ++*progress_cnt);
++  display_progress(progress, ++*progress_cnt);
if ((*list)->object.oid.hash[0] != i)
break;
count++;
@@ -68,8 +75,7 @@
int count;
 -  for (count = 0; count < nr_commits; count++, list++)
 +  for (count = 0; count < nr_commits; count++, list++) {
-+  if (progress)
-+  display_progress(progress, ++*progress_cnt);
++  display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
 +  }
  }
@@ -87,15 +93,13 @@
struct commit **list = commits;
struct commit **last = commits + nr_commits;
 @@
+   struct commit_list *parent;
int edge_value;
uint32_t packedDate[2];
++  display_progress(progress, ++*progress_cnt);
  
-+  if (progress)
-+  display_progress(progress, ++*progress_cnt);
-+
parse_commit(*list);
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
- 
 @@
  
  static void write_graph_chunk_large_edges(struct hashfile *f,
@@ -108,20 +112,18 @@
struct commit **list = commits;
struct commit **last = commits + nr_commits;
 @@
+ commits,
  nr_commits,
  commit_to_sha1);
++  display_progress(progress, ++*progress_cnt);
  
-+  if (progress)
-+  display_progress(progress, ++*progress_cnt);
-+
if (edge_value < 0)
edge_value = GRAPH_PARENT_MISSING;
-   else if (!parent->next)
 @@

[PATCH v2 2/6] commit-graph write: add more progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Add more progress output to the output already added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6365492, done.
Computing commit graph generation numbers: 100% (797222/797222), done.
Writing out commit graph: 2399912, done.

This "writing out" number is not meant to be meaningful to the user,
but just to show that we're doing work and the command isn't
hanging.

In the current implementation it's approximately 4x the number of
commits. As noted in on-list discussion[1] we could add the loops up
and show percentage progress here, but I don't think it's worth it. It
would make the implementation more complex and harder to maintain for
very little gain.

On a much larger in-house repository I have we'll show (note how we
also say "Annotating[...]"):

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 50026015, done.
Annotating commit graph: 21567407, done.
Computing commit graph generation numbers: 100% (7144680/7144680), done.
Writing out commit graph: 21434417, done.

1. https://public-inbox.org/git/20181120165800.gb30...@szeder.dev/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 41 -
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e6d0d7722b..6f6409b292 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, 
const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 struct commit **commits,
-int nr_commits)
+int nr_commits,
+struct progress *progress,
+uint64_t *progress_cnt)
 {
int i, count = 0;
struct commit **list = commits;
@@ -445,6 +447,7 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 */
for (i = 0; i < 256; i++) {
while (count < nr_commits) {
+   display_progress(progress, ++*progress_cnt);
if ((*list)->object.oid.hash[0] != i)
break;
count++;
@@ -456,12 +459,16 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
int count;
-   for (count = 0; count < nr_commits; count++, list++)
+   for (count = 0; count < nr_commits; count++, list++) {
+   display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+   }
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +478,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -481,6 +490,7 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
struct commit_list *parent;
int edge_value;
uint32_t packedDate[2];
+   display_progress(progress, ++*progress_cnt);
 
parse_commit(*list);
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
@@ -542,7 +552,9 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
  struct commit **commits,
- int nr_commits)
+ int nr_commits,
+ struct progress *progress,
+ uint64_t *progress_cnt)
 {
struct 

[PATCH v2 1/6] commit-graph write: rephrase confusing progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just represents the work
we're doing in several for-loops to do various work before the graph
is written out. So let's just say "Annotating commit graph", that
title makes no such promises, and we can add other loops here in the
future and still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaching to solving this.

1. https://public-inbox.org/git/20181015165447.gh19...@szeder.dev/

Reported-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..e6d0d7722b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commits in commit graph"), 0);
+   _("Annotating commit graph"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH] .mailmap: record canonical email address for Sven Strickroth

2018-11-20 Thread Sven Strickroth
Signed-off-by: Sven Strickroth 
---
 .mailmap | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.mailmap b/.mailmap
index f165222a78..a30f409f39 100644
--- a/.mailmap
+++ b/.mailmap
@@ -235,6 +235,8 @@ Steven Grimm  

 Steven Grimm  kor...@midwinter.com
 Steven Walter  
 Steven Walter  
+Sven Strickroth  
+Sven Strickroth  
 Sven Verdoolaege  
 Sven Verdoolaege  
 SZEDER Gábor  
-- 
2.19.1.windows.1



Re: [PATCH] clone: fix colliding file detection on APFS

2018-11-20 Thread Duy Nguyen
On Tue, Nov 20, 2018 at 8:35 PM Carlo Arenas  wrote:
> IMHO it would be ideal if test would be enabled/validated for windows
> (native, not only cygwin) as it might even work without the override
> and if we are to see conflicts, that is probably where most users with
> file insensitive filesystems might be found

Yes but I can't test on Windows so I will not enable the test until I
got a report that it's working there.
-- 
Duy


Re: [PATCH] clone: fix colliding file detection on APFS

2018-11-20 Thread Carlo Arenas
Tested-by: Carlo Marcelo Arenas Belón 

in macOS 10.14.1 with APFS
in Linux using VFAT (for the lulz)

IMHO it would be ideal if test would be enabled/validated for windows
(native, not only cygwin) as it might even work without the override
and if we are to see conflicts, that is probably where most users with
file insensitive filesystems might be found

Carlo


Re: [PATCH v2 10/17] help: use command-list.txt for the source of guides

2018-11-20 Thread Ævar Arnfjörð Bjarmason


On Sun, May 20 2018, Nguyễn Thái Ngọc Duy wrote:

> The help command currently hard codes the list of guides and their
> summary in C. Let's move this list to command-list.txt. This lets us
> extract summary lines from Documentation/git*.txt. This also
> potentially lets us list guides in git.txt, but I'll leave that for
> now.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/gitattributes.txt|  2 +-
>  Documentation/gitmodules.txt   |  2 +-
>  Documentation/gitrevisions.txt |  2 +-
>  Makefile   |  2 +-
>  builtin/help.c | 32 --
>  command-list.txt   | 16 +
>  contrib/completion/git-completion.bash | 15 
>  help.c | 21 +
>  help.h |  1 +
>  t/t0012-help.sh|  6 +
>  10 files changed, 54 insertions(+), 45 deletions(-)
>
> diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
> index 1094fe2b5b..083c2f380d 100644
> --- a/Documentation/gitattributes.txt
> +++ b/Documentation/gitattributes.txt
> @@ -3,7 +3,7 @@ gitattributes(5)
>
>  NAME
>  
> -gitattributes - defining attributes per path
> +gitattributes - Defining attributes per path
>
>  SYNOPSIS
>  
> diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt
> index db5d47eb19..4d63def206 100644
> --- a/Documentation/gitmodules.txt
> +++ b/Documentation/gitmodules.txt
> @@ -3,7 +3,7 @@ gitmodules(5)
>
>  NAME
>  
> -gitmodules - defining submodule properties
> +gitmodules - Defining submodule properties
>
>  SYNOPSIS
>  
> diff --git a/Documentation/gitrevisions.txt b/Documentation/gitrevisions.txt
> index 27dec5b91d..1f6cceaefb 100644
> --- a/Documentation/gitrevisions.txt
> +++ b/Documentation/gitrevisions.txt
> @@ -3,7 +3,7 @@ gitrevisions(7)
>
>  NAME
>  
> -gitrevisions - specifying revisions and ranges for Git
> +gitrevisions - Specifying revisions and ranges for Git
>
>  SYNOPSIS
>  
> diff --git a/Makefile b/Makefile
> index a60a78ee67..1efb751e46 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1937,7 +1937,7 @@ $(BUILT_INS): git$X
>
>  command-list.h: generate-cmdlist.sh command-list.txt
>
> -command-list.h: $(wildcard Documentation/git-*.txt)
> +command-list.h: $(wildcard Documentation/git*.txt)
>   $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh command-list.txt >$@+ 
> && mv $@+ $@
>
>  SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
> diff --git a/builtin/help.c b/builtin/help.c
> index 0e0af8426a..5727fb5e51 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -402,38 +402,6 @@ static void show_html_page(const char *git_cmd)
>   open_html(page_path.buf);
>  }
>
> -static struct {
> - const char *name;
> - const char *help;
> -} common_guides[] = {
> - { "attributes", N_("Defining attributes per path") },
> - { "everyday", N_("Everyday Git With 20 Commands Or So") },
> - { "glossary", N_("A Git glossary") },
> - { "ignore", N_("Specifies intentionally untracked files to ignore") },
> - { "modules", N_("Defining submodule properties") },
> - { "revisions", N_("Specifying revisions and ranges for Git") },
> - { "tutorial", N_("A tutorial introduction to Git (for version 1.5.1 or 
> newer)") },
> - { "workflows", N_("An overview of recommended workflows with Git") },
> -};
> -
> -static void list_common_guides_help(void)
> -{
> - int i, longest = 0;
> -
> - for (i = 0; i < ARRAY_SIZE(common_guides); i++) {
> - if (longest < strlen(common_guides[i].name))
> - longest = strlen(common_guides[i].name);
> - }
> -
> - puts(_("The common Git guides are:\n"));
> - for (i = 0; i < ARRAY_SIZE(common_guides); i++) {
> - printf("   %s   ", common_guides[i].name);
> - mput_char(' ', longest - strlen(common_guides[i].name));
> - puts(_(common_guides[i].help));
> - }
> - putchar('\n');
> -}
> -
>  static const char *check_git_cmd(const char* cmd)
>  {
>   char *alias;
> diff --git a/command-list.txt b/command-list.txt
> index 3bd23201a6..99ddc231c1 100644
> --- a/command-list.txt
> +++ b/command-list.txt
> @@ -139,3 +139,19 @@ gitweb  
> ancillaryinterrogators
>  git-whatchanged ancillaryinterrogators
>  git-worktreemainporcelain
>  git-write-tree  plumbingmanipulators
> +gitattributes   guide
> +gitcli  guide
> +gitcore-tutorialguide
> +gitcvs-migrationguide
> +gitdiffcore guide
> +giteveryday guide
> +gitglossary guide
> +githooksguide

Re: [PATCH] clone: fix colliding file detection on APFS

2018-11-20 Thread Ramsay Jones



On 20/11/2018 16:28, Nguyễn Thái Ngọc Duy wrote:
> Commit b878579ae7 (clone: report duplicate entries on case-insensitive
> filesystems - 2018-08-17) adds a warning to user when cloning a repo
> with case-sensitive file names on a case-insensitive file system. The
> "find duplicate file" check was doing by comparing inode number (and
> only fall back to fspathcmp() when inode is known to be unreliable
> because fspathcmp() can't cover all case folding cases).
> 
> The inode check is very simple, and wrong. It compares between a
> 32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When
> an inode is larger than 2^32 (which seems to be the case for APFS), it
> will be truncated and stored in sd_ino, but comparing with itself will
> fail.
> 
> As a result, instead of showing a pair of files that have the same
> name, we show just one file (marked before the beginning of the
> loop). We fail to find the original one.
> 
> The fix could be just a simple type cast (*)
> 
> dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino
> 
> but this is no longer a reliable test, there are 4G possible inodes
> that can match sd_ino because we only match the lower 32 bits instead
> of full 64 bits.
> 
> There are two options to go. Either we ignore inode and go with
> fspathcmp() on Apple platform. This means we can't do accurate inode
> check on HFS anymore, or even on APFS when inode numbers are still
> below 2^32.
> 
> Or we just to to reduce the odds of matching a wrong file by checking
> more attributes, counting mostly on st_size because st_xtime is likely
> the same. This patch goes with this direction, hoping that false
> positive chances are too small to be seen in practice.
> 
> While at there, enable the test on Cygwin (verified working by Ramsay
> Jones)

Well, no, I tested the previous version of this patch. However, this
patch also passes the test. (Note _test_ singular - in order to check
that this patch doesn't cause a regression I would need to run the
whole test-suite - that takes 3.5 hours, if I'm not doing anything
else!)

> 
> (*) this is also already done inside match_stat_data()
> 
> Reported-by: Carlo Arenas 
> Helped-by: Ramsay Jones 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  So I'm going with match_stat_data(). But I don't know, perhaps just
>  ignoring inode (like Carlo's original patch) is safer/better?
> 
>  Tested on case-insensitive JFS on Linux. But I don't think it really
>  matters because I'm not even sure if I could push inode above 2^32
>  with this. Hacking JFS for this test sounds fun, but no time for that.
> 
>  entry.c  | 4 ++--
>  t/t5601-clone.sh | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..0a3c451f5f 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
> *state,
>  {
>   int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)

I was a little curious about this (but couldn't be bothered actually
read the code, post-application), so I removed this hunk from the
patch, rebuilt and ran the test again: it _passed_ the test. :-D

So, ...

ATB,
Ramsay Jones

>   trust_ino = 0;
>  #endif
>  
> @@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout 
> *state,
>   if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>   continue;
>  
> - if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> + if ((trust_ino && !match_stat_data(>ce_stat_data, st)) ||
>   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
>   dup->ce_flags |= CE_MATCHED;
>   break;
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index f1a49e94f5..c28d51bd59 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
>   )
>  '
>  
> -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
> detection' '
> +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
>   grep X icasefs/warning &&
>   grep x icasefs/warning &&
>   test_i18ngrep "the following paths have collided" icasefs/warning
> 


Re: [PATCH 1/1] revision.c: use new topo-order logic in tests

2018-11-20 Thread Derrick Stolee

On 11/20/2018 1:13 AM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:


@@ -3143,6 +3144,9 @@ int prepare_revision_walk(struct rev_info *revs)
commit_list_sort_by_date(>commits);
if (revs->no_walk)
return 0;
+   if (revs->limited &&
+   git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
+   revs->limited = 0;
if (revs->limited) {
if (limit_list(revs) < 0)
return -1;

That is equivalent to say

if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
revs->limited = 0;


Not exactly equivalent, because we can use short-circuiting to avoid the 
git_env_bool check, but I see what you mean.



Wouldn't that make the codepath that involves limit_list()
completely unreachable while testing, though?


Testing with GIT_TEST_COMMIT_GRAPH=0 would still hit limit_list(). Both 
modes are important to test (for instance, to ensure we still have 
correct behavior without a commit-graph file).



The title only mentions "topo-order" logic, but the topo-order is
not the only reason why limited bit can be set, is it?  When showing
merges, simplifying merges, or post-processing to show ancestry
path, or showing a bottom-limited revision range, the limited bit is
automatically set because all of these depend on first calling
limit_list() and then postprocessing its result.  Doesn't it hurt
these cases to unconditionally drop limited bit?


You're right that we only want to do this in the topo-order case, so 
perhaps the diff should instead be:


if (revs->no_walk)
return 0;
+   if (revs->topo_order &&
+   git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
+   revs->limited = 0;
if (revs->limited) {
if (limit_list(revs) < 0)
return -1;

Thanks,
-Stolee



Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines

2018-11-20 Thread Stefan Beller
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> When using --color-moved-ws=allow-indentation-change allow lines with
> the same indentation change to be grouped across blank lines. For now
> this only works if the blank lines have been moved as well, not for
> blocks that have just had their indentation changed.
>
> This completes the changes to the implementation of
> --color-moved=allow-indentation-change. Running
>
>   git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
>
> now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
> version of the previous implementation and 66% from the original which
> took 14.6s.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> Changes since rfc:
>  - Split these changes into a separate commit.
>  - Detect blank lines when processing the indentation rather than
>parsing each line twice.
>  - Tweaked the test to make it harder as suggested by Stefan.
>  - Added timing data to the commit message.
>
>  diff.c | 34 ---
>  t/t4015-diff-whitespace.sh | 41 ++
>  2 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 89559293e7..072b5bced6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
> memset(b, 0, sizeof(*b));
>  }
>
> +#define INDENT_BLANKLINE INT_MIN

Answering my question from the previous patch:
This is why we need to keep the indents signed.

This patch looks quite nice to read along.

The whole series looks good to me.
Do we need to update the docs in any way?

Thanks,
Stefan


[PATCH] Documentation: build technical/multi-pack-index

2018-11-20 Thread Todd Zullinger
The git-multi-pack-index doc links to technical/multi-pack-index.html.
Ensure it is built to prevent a broken link.

Signed-off-by: Todd Zullinger 
---
While building 2.20.0-rc0 I noticed the broken link from
git-multi-pack-index to technical/multi-pack-index.html.

 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 48d261dc2c..d5d936e6a7 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -73,6 +73,7 @@ TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
 TECH_DOCS += technical/long-running-process-protocol
+TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
-- 
2.19.1



[RFC] Introduce two new commands, switch-branch and restore-paths

2018-11-20 Thread Duy Nguyen
On Mon, Nov 19, 2018 at 04:19:53PM +0100, Duy Nguyen wrote:
> I promise to come back with something better (at least it still
> sounds better in my mind). If that idea does not work out, we can
> come back and see if we can improve this.

So this is it. The patch isn't pretty, mostly as a proof of
concept. Just look at the three functions at the bottom of checkout.c,
which is the main thing.

This patch tries to split "git checkout" command in two new ones:

- git switch-branch is all about switching branches
- git restore-paths (maybe restore-file is better) for checking out
  paths

The main idea is these two commands will co-exist with the good old
'git checkout', which will NOT be deprecated. Old timers will still
use "git checkout". But new people should be introduced to the new two
instead. And the new ones are just as capable as "git checkout".

Since the three commands will co-exist (with duplicate functionality),
maintenance cost must be kept to minimum. The way I did this is simply
split the command line options into three pieces: common,
switch-branch and checkout-paths. "git checkout" has all three, the
other two have common and another piece.

With this, a new option added to git checkout will be automatically
available in either switch-branch or checkout-paths. Bug fixes apply
to all relevant commands.

Later on, we could start to add a bit more stuff in, e.g. some form of
disambiguation is no longer needed when running as switch-branch, or
restore-paths.

So, what do you think?

-- 8< --
diff --git a/builtin.h b/builtin.h
index 6538932e99..6e321ec8a4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -214,6 +214,7 @@ extern int cmd_remote_fd(int argc, const char **argv, const 
char *prefix);
 extern int cmd_repack(int argc, const char **argv, const char *prefix);
 extern int cmd_rerere(int argc, const char **argv, const char *prefix);
 extern int cmd_reset(int argc, const char **argv, const char *prefix);
+extern int cmd_restore_paths(int argc, const char **argv, const char *prefix);
 extern int cmd_rev_list(int argc, const char **argv, const char *prefix);
 extern int cmd_rev_parse(int argc, const char **argv, const char *prefix);
 extern int cmd_revert(int argc, const char **argv, const char *prefix);
@@ -227,6 +228,7 @@ extern int cmd_show_index(int argc, const char **argv, 
const char *prefix);
 extern int cmd_status(int argc, const char **argv, const char *prefix);
 extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
 extern int cmd_submodule__helper(int argc, const char **argv, const char 
*prefix);
+extern int cmd_switch_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_tag(int argc, const char **argv, const char *prefix);
 extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index acdafc6e4c..868ca3c223 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -33,6 +33,16 @@ static const char * const checkout_usage[] = {
NULL,
 };
 
+static const char * const switch_branch_usage[] = {
+   N_("git switch-branch [] "),
+   NULL,
+};
+
+static const char * const restore_paths_usage[] = {
+   N_("git restore-paths [] [] -- ..."),
+   NULL,
+};
+
 struct checkout_opts {
int patch_mode;
int quiet;
@@ -44,6 +54,7 @@ struct checkout_opts {
int ignore_skipworktree;
int ignore_other_worktrees;
int show_progress;
+   int dwim_new_local_branch;
/*
 * If new checkout options are added, skip_merge_working_tree
 * should be updated accordingly.
@@ -55,6 +66,7 @@ struct checkout_opts {
int new_branch_log;
enum branch_track track;
struct diff_options diff_options;
+   char *conflict_style;
 
int branch_exists;
const char *prefix;
@@ -1223,78 +1235,105 @@ static int checkout_branch(struct checkout_opts *opts,
return switch_branches(opts, new_branch_info);
 }
 
-int cmd_checkout(int argc, const char **argv, const char *prefix)
+static struct option *add_common_options(struct checkout_opts *opts,
+struct option *prevopts)
 {
-   struct checkout_opts opts;
-   struct branch_info new_branch_info;
-   char *conflict_style = NULL;
-   int dwim_new_local_branch = 1;
-   int dwim_remotes_matched = 0;
struct option options[] = {
-   OPT__QUIET(, N_("suppress progress reporting")),
-   OPT_STRING('b', NULL, _branch, N_("branch"),
+   OPT__QUIET(>quiet, N_("suppress progress reporting")),
+   OPT_BOOL(0, "ignore-skip-worktree-bits", 
>ignore_skipworktree,
+N_("do not limit pathspecs to sparse entries only")),
+   { OPTION_CALLBACK, 0, "recurse-submodules", NULL,
+   "checkout", "control recursive updating 

Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
This sounds like a proposal for general namespacing. I like it - that
would pave the way for other header extensions - but that should
probably be the subject of a separate proposal (who owns the content
of a namespace, what is the process for adding a new namespace or a
new attribute within a namespace, what order should the header
attributes appear in, what problem is namespacing there to solve, when
do we use a namespaced attribute versus a "reserved" attribute, etc.).

x-evolve-pt seems reasonable to me. If you're keen on this and want to
document the namespacing proposal, I'll conform to it. However, if
don't have formal rules for namespaces in place yet it might be better
to avoid the use of an x- prefix for now, just in case I accidentally
squat on a name that breaks whatever namespacing rules we eventually
come up with.

Since we're talking bytes, a more compact representation of
parent-type could use single-letter codes:
x-evolve-pt c r o
(where c=content, r=replace/obsolete, o=origin)

  - Stefan
On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Tue, Nov 20 2018, Jonathan Nieder wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> On Thu, Nov 15 2018, sxe...@google.com wrote:
> >
> >>> +Parent-type
> >>> +---
> >>> +The “parent-type” field in the commit header identifies a commit as a
> >>> +meta-commit and indicates the meaning for each of its parents. It is 
> >>> never
> >>> +present for normal commits.
> > [...]
> >> I think it's worth pointing out for those that are rusty on commit
> >> object details (but I checked) is that the reason for it not being:
> >>
> >> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> >> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> >> parent-type content
> >> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> >> parent-type obsolete
> >> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> >> parent-type origin
> >> author Stefan Xenos  1540841596 -0700
> >> committer Stefan Xenos  1540841596 -0700
> >>
> >> Which would be easier to read, is that we're very sensitive to the order
> >> of the first few fields (tree -> parent -> author -> committer) and fsck
> >> will error out if we interjected a new field.
> >
> > By the way, in the spirit of limiting the initial scope, I wonder
> > whether the parent-type fields can be stored in the commit message
> > initially.
> >
> > Elsewhere in this thread it was mentioned that the parent-type is a
> > field to allow tools like "git fsck" to understand the meaning of
> > these parent relationships (for example, to forbid a commit
> > referencing a meta-commit).  The same could be done using special
> > commit message text, though.
> >
> > The advantage of such an approach would be that we could experiment
> > without changing the official object format at all.  If experiments
> > revealed a different set of information to store, we could update the
> > format without having to maintain the memory of the older format in
> > "git fsck"'s understanding of commit object fields.  So even though I
> > think that in the end we would want to put this information in the
> > commit object header, I'm tempted to suspect that the benefits of
> > putting it in the commit message to start outweigh the costs (in
> > particular, of having to migrate to another format later).
>
> I think it sounds better to just make it, in the header:
>
> x-evolve-pt content
> x-evolve-pt obsolete
> x-evolve-pt origin
>
> Where "pt = parent-type", we could of course spell that out too, but in
> this case it's "x-evolve-pt" is the exact same number of bytes as
> "parent-type", so nobody can object that it takes more space:)
>
> We'd then carry some documentation where we say everything except "x-*-"
> is reserved, and that we'd like to know about new "*" there before it's
> used, so it can be documented.
>
> Putting it in the commit message just sounds like a hack around not
> having namespaced headers. If we'd like to keep this then tools would
> need to parse both (potentially unpacking a lot of the commit message
> object, it can be quite big in some cases...).


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Stefan Xenos
> I was trying to see if this is something we can leave out to limit the 
> initial scope.

Oh, in that case, "yes". :-) If there's a need to cut something,
origin parents would be a viable candidate.

I was thinking that this file could document the final goal so that if
anyone else wanted to contribute to the implementation, we would be
heading in the same direction. It seems reasonable that an early
implementation may omit origin parents. Since the actual
implementation will lag behind the spec, I'll add a status section to
the top of the document where we can describe the delta between plan
and implementation.

Also, I'm now convinced we're talking about the same thing. :-)

> > Are you claiming that this is undesirable, or are you claiming that
> > this could be accomplished without origin parents?
>
> I was trying to see if this is something we can leave out to limit
> the initial scope.


Re: [PATCH 2/6] commit-graph write: add more progress output

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 03:04:39PM +, Ævar Arnfjörð Bjarmason wrote:
> Add more progress output to the output already added in
> 7b0f229222 ("commit-graph write: add progress output", 2018-09-17).
> 
> As noted in that commit most of the progress output isn't displayed on
> small repositories, but before this change we'd noticeably hang for
> 2-3 seconds at the end on medium sized repositories such as linux.git.
> 
> Now we'll instead show output like this, and have no human-observable
> point at which we're not producing progress output:
> 
> $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
> Finding commits for commit graph: 6418991, done.
> Computing commit graph generation numbers: 100% (797205/797205), done.
> Writing out commit graph chunks: 2399861, done.
> 
> This "graph chunks" number is not meant to be meaningful to the user,
> but just to show that we're doing work and the command isn't
> hanging.
> 
> On a much larger in-house repository I have we'll show (note how we
> also say "Annotating[...]"):
> 
> $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
> Finding commits for commit graph: 48271163, done.
> Annotating commit graph: 21424536, done.
> Computing commit graph generation numbers: 100% (7141512/7141512), done.
> Writing out commit graph chunks: 21424913, done.

That's a lot of chunks, but according to the specs, there are only 3
or 4 chunks in a commit-graph file.  More on this below.

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  commit-graph.c | 47 ++-
>  1 file changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index e6d0d7722b..afce20dd4d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository 
> *r, const struct commit
>  
>  static void write_graph_chunk_fanout(struct hashfile *f,
>struct commit **commits,
> -  int nr_commits)
> +  int nr_commits,
> +  struct progress *progress,
> +  uint64_t *progress_cnt)
>  {
>   int i, count = 0;
>   struct commit **list = commits;
> @@ -445,6 +447,8 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>*/
>   for (i = 0; i < 256; i++) {
>   while (count < nr_commits) {
> + if (progress)
> + display_progress(progress, ++*progress_cnt);

The condition is unnecessary, display_progress() is prepared to deal
with a NULL progress pointer.  The same applies to all such calls in
this patch.

>   if ((*list)->object.oid.hash[0] != i)
>   break;
>   count++;
> @@ -456,12 +460,17 @@ static void write_graph_chunk_fanout(struct hashfile *f,
>  }
>  
>  static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
> -struct commit **commits, int nr_commits)
> +struct commit **commits, int nr_commits,
> +struct progress *progress,
> +uint64_t *progress_cnt)
>  {
>   struct commit **list = commits;
>   int count;
> - for (count = 0; count < nr_commits; count++, list++)
> + for (count = 0; count < nr_commits; count++, list++) {
> + if (progress)
> + display_progress(progress, ++*progress_cnt);
>   hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
> + }
>  }
>  
>  static const unsigned char *commit_to_sha1(size_t index, void *table)
> @@ -471,7 +480,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
> void *table)
>  }
>  
>  static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> -struct commit **commits, int nr_commits)
> +struct commit **commits, int nr_commits,
> +struct progress *progress,
> +uint64_t *progress_cnt)
>  {
>   struct commit **list = commits;
>   struct commit **last = commits + nr_commits;
> @@ -482,6 +493,9 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>   int edge_value;
>   uint32_t packedDate[2];
>  
> + if (progress)
> + display_progress(progress, ++*progress_cnt);
> +
>   parse_commit(*list);
>   hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
>  
> @@ -542,7 +556,9 @@ static void write_graph_chunk_data(struct hashfile *f, 
> int hash_len,
>  
>  static void write_graph_chunk_large_edges(struct hashfile *f,
> struct commit **commits,
> -   int 

Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Duy Nguyen
On Tue, Nov 20, 2018 at 11:04 AM Jeff King  wrote:
>
> Commit 108f530385 (pack-objects: move tree_depth into 'struct
> packing_data', 2018-08-16) dynamically manages a tree_depth array in
> packing_data that maintains one of these invariants:
>
>   1. tree_depth is NULL (i.e., the requested options don't require us to
>  track tree depths)
>
>   2. tree_depth is non-NULL and has as many entries as the "objects"
>  array
>
> We maintain (2) by:
>
>   a. When the objects array grows, grow tree_depth to the same size
>  (unless it's NULL, in which case we can leave it).
>
>   b. When a caller asks to set a depth via oe_set_tree_depth(), if
>  tree_depth is NULL we allocate it.
>
> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
>   1. packlist_alloc() needs to store the Nth object, so it grows the
>  objects array to M, where M > N.
>
>   2. oe_set_tree_depth() wants to store a depth, so it allocates an
>  array of length N. Now we've violated our invariant.
>
>   3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
>  grow the objects array, since N <= M still holds. We try to assign
>  to tree_depth[N+1], which is out of bounds.

Do you think if this splitting data to packing_data is too fragile
that we should just scrape the whole thing and move all data back to
object_entry[]? We would use more memory of course but higher memory
usage is still better than more bugs (if these are likely to show up
again).
-- 
Duy


[PATCH] clone: fix colliding file detection on APFS

2018-11-20 Thread Nguyễn Thái Ngọc Duy
Commit b878579ae7 (clone: report duplicate entries on case-insensitive
filesystems - 2018-08-17) adds a warning to user when cloning a repo
with case-sensitive file names on a case-insensitive file system. The
"find duplicate file" check was doing by comparing inode number (and
only fall back to fspathcmp() when inode is known to be unreliable
because fspathcmp() can't cover all case folding cases).

The inode check is very simple, and wrong. It compares between a
32-bit number (sd_ino) and potentially a 64-bit number (st_ino). When
an inode is larger than 2^32 (which seems to be the case for APFS), it
will be truncated and stored in sd_ino, but comparing with itself will
fail.

As a result, instead of showing a pair of files that have the same
name, we show just one file (marked before the beginning of the
loop). We fail to find the original one.

The fix could be just a simple type cast (*)

dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino

but this is no longer a reliable test, there are 4G possible inodes
that can match sd_ino because we only match the lower 32 bits instead
of full 64 bits.

There are two options to go. Either we ignore inode and go with
fspathcmp() on Apple platform. This means we can't do accurate inode
check on HFS anymore, or even on APFS when inode numbers are still
below 2^32.

Or we just to to reduce the odds of matching a wrong file by checking
more attributes, counting mostly on st_size because st_xtime is likely
the same. This patch goes with this direction, hoping that false
positive chances are too small to be seen in practice.

While at there, enable the test on Cygwin (verified working by Ramsay
Jones)

(*) this is also already done inside match_stat_data()

Reported-by: Carlo Arenas 
Helped-by: Ramsay Jones 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 So I'm going with match_stat_data(). But I don't know, perhaps just
 ignoring inode (like Carlo's original patch) is safer/better?

 Tested on case-insensitive JFS on Linux. But I don't think it really
 matters because I'm not even sure if I could push inode above 2^32
 with this. Hacking JFS for this test sounds fun, but no time for that.

 entry.c  | 4 ++--
 t/t5601-clone.sh | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/entry.c b/entry.c
index 5d136c5d55..0a3c451f5f 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout 
*state,
 {
int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
trust_ino = 0;
 #endif
 
@@ -419,7 +419,7 @@ static void mark_colliding_entries(const struct checkout 
*state,
if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
continue;
 
-   if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
+   if ((trust_ino && !match_stat_data(>ce_stat_data, st)) ||
(!trust_ino && !fspathcmp(ce->name, dup->name))) {
dup->ce_flags |= CE_MATCHED;
break;
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..c28d51bd59 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
)
 '
 
-test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file 
detection' '
+test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
grep X icasefs/warning &&
grep x icasefs/warning &&
test_i18ngrep "the following paths have collided" icasefs/warning
-- 
2.19.1.1327.g328c130451.dirty



Re: [PATCH 0/3] delta-island fixes

2018-11-20 Thread Derrick Stolee

On 11/20/2018 4:44 AM, Jeff King wrote:

In cases like this I think it's often a good idea to have a perf test.
Those are expensive anyway, and we'd have the double benefit of
exercising the code and showing off the performance improvement. But the
delta-island code only makes sense on a very specialized repo: one where
you have multiple related but diverging histories.  You could simulate
that by picking two branches in a repository, but the effect is going to
be miniscule.


The changes in this series look correct. Too bad it is difficult to test.

Perhaps we should add a performance test for the --delta-islands check 
that would trigger the failure (and maybe a clone afterwards). There are 
lots of freely available forks of git.git that present interesting fork 
structure. Here are three that would suffice to make this interesting:


    https://github.com/git/git
    https://github.com/git-for-windows/git
    https://github.com/microsoft/git

Of course, running it on a specific repo is up to the person running the 
perf suite.


Thanks,
-Stolee


[PATCH 5/6] commit-graph write: remove empty line for readability

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq()
for-loop. This makes it clearer that the QSORT(...) is being done so
that we can run the oideq() loop on adjacent OIDs. Amends code added
in 08fd81c9b6 ("commit-graph: implement write_commit_graph()",
2018-04-02).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index a0aea850f1..d0961e89df 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -894,7 +894,6 @@ void write_commit_graph(const char *obj_dir,
close_reachable(, report_progress);
 
QSORT(oids.list, oids.nr, commit_compare);
-
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
if (!oideq([i - 1], [i]))
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 2/6] commit-graph write: add more progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Add more progress output to the output already added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

As noted in that commit most of the progress output isn't displayed on
small repositories, but before this change we'd noticeably hang for
2-3 seconds at the end on medium sized repositories such as linux.git.

Now we'll instead show output like this, and have no human-observable
point at which we're not producing progress output:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 6418991, done.
Computing commit graph generation numbers: 100% (797205/797205), done.
Writing out commit graph chunks: 2399861, done.

This "graph chunks" number is not meant to be meaningful to the user,
but just to show that we're doing work and the command isn't
hanging.

On a much larger in-house repository I have we'll show (note how we
also say "Annotating[...]"):

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph: 48271163, done.
Annotating commit graph: 21424536, done.
Computing commit graph generation numbers: 100% (7141512/7141512), done.
Writing out commit graph chunks: 21424913, done.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 47 ++-
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e6d0d7722b..afce20dd4d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -433,7 +433,9 @@ struct tree *get_commit_tree_in_graph(struct repository *r, 
const struct commit
 
 static void write_graph_chunk_fanout(struct hashfile *f,
 struct commit **commits,
-int nr_commits)
+int nr_commits,
+struct progress *progress,
+uint64_t *progress_cnt)
 {
int i, count = 0;
struct commit **list = commits;
@@ -445,6 +447,8 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 */
for (i = 0; i < 256; i++) {
while (count < nr_commits) {
+   if (progress)
+   display_progress(progress, ++*progress_cnt);
if ((*list)->object.oid.hash[0] != i)
break;
count++;
@@ -456,12 +460,17 @@ static void write_graph_chunk_fanout(struct hashfile *f,
 }
 
 static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
int count;
-   for (count = 0; count < nr_commits; count++, list++)
+   for (count = 0; count < nr_commits; count++, list++) {
+   if (progress)
+   display_progress(progress, ++*progress_cnt);
hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+   }
 }
 
 static const unsigned char *commit_to_sha1(size_t index, void *table)
@@ -471,7 +480,9 @@ static const unsigned char *commit_to_sha1(size_t index, 
void *table)
 }
 
 static void write_graph_chunk_data(struct hashfile *f, int hash_len,
-  struct commit **commits, int nr_commits)
+  struct commit **commits, int nr_commits,
+  struct progress *progress,
+  uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -482,6 +493,9 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
int edge_value;
uint32_t packedDate[2];
 
+   if (progress)
+   display_progress(progress, ++*progress_cnt);
+
parse_commit(*list);
hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
 
@@ -542,7 +556,9 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
 
 static void write_graph_chunk_large_edges(struct hashfile *f,
  struct commit **commits,
- int nr_commits)
+ int nr_commits,
+ struct progress *progress,
+ uint64_t *progress_cnt)
 {
struct commit **list = commits;
struct commit **last = commits + nr_commits;
@@ -566,6 +582,9 @@ static void write_graph_chunk_large_edges(struct hashfile 
*f,
  nr_commits,
   

[PATCH 1/6] commit-graph write: rephrase confusing progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Rephrase the title shown for the progress output emitted by
close_reachable(). The message I added in 7b0f229222 ("commit-graph
write: add progress output", 2018-09-17) gave the impression that it
would count up to the number of commit objects.

But that's not what the number means. It just represents the work
we're doing in several for-loops to do various work before the graph
is written out. So let's just say "Annotating commit graph", that
title makes no such promises, and we can add other loops here in the
future and still consistently show progress output.

See [1] for the initial bug report & subsequent discussion about other
approaching to solving this.

1. https://public-inbox.org/git/20181015165447.gh19...@szeder.dev/

Reported-by: SZEDER Gábor 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..e6d0d7722b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -648,7 +648,7 @@ static void close_reachable(struct packed_oid_list *oids, 
int report_progress)
 
if (report_progress)
progress = start_delayed_progress(
-   _("Annotating commits in commit graph"), 0);
+   _("Annotating commit graph"), 0);
for (i = 0; i < oids->nr; i++) {
display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 0/6] commit-graph write: progress output improvements

2018-11-20 Thread Ævar Arnfjörð Bjarmason
This replaces my "commit-graph: split up close_reachable() progress
output". We could still do something like that, but I think this makes
more sense, and also plugs some missing holes in the progress
output. See 6/6 for what the end-state is.

I believe this addresses SZEDER Gábor's concerns (thanks
b.t.w.!). I.e. now it should be clear to the user at each step if
we're counting objects, or just as in the case of close_reachable()
doing some X amount of work without any particular relation to the
number of objects or commits.

Ævar Arnfjörð Bjarmason (6):
  commit-graph write: rephrase confusing progress output
  commit-graph write: add more progress output
  commit-graph write: show progress for object search
  commit-graph write: add more describing progress output
  commit-graph write: remove empty line for readability
  commit-graph write: add even more progress output

 commit-graph.c | 98 --
 1 file changed, 79 insertions(+), 19 deletions(-)

-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 3/6] commit-graph write: show progress for object search

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit
graph" phase for the common case where we're operating on all packs in
the repository, as "commit-graph write" or "gc" will do.

Before we'd emit on e.g. linux.git with "commit-graph write":

Finding commits for commit graph: 6418991, done.
[...]

And now:

Finding commits for commit graph: 100% (6418991/6418991), done.
[...]

Since the commit graph only includes those commits that are
packed (via for_each_packed_object(...)) the
approximate_object_count() returns the actual number of objects we're
going to process.

Still, it is possible due to a race with "gc" or another process
maintaining packs that the number of objects we're going to process is
lower than what approximate_object_count() reported. In that case we
don't want to stop the progress bar short of 100%. So let's make sure
it snaps to 100% at the end.

The inverse case is also possible and more likely. I.e. that a new
pack has been added between approximate_object_count() and
for_each_packed_object(). In that case the percentage will go beyond
100%, and we'll do nothing to snap it back to 100% at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index afce20dd4d..4d03f8aa7f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -784,12 +784,14 @@ void write_commit_graph(const char *obj_dir,
struct commit_list *parent;
struct progress *progress = NULL;
uint64_t progress_cnt;
+   unsigned long approx_nr_objects;
 
if (!commit_graph_compatible(the_repository))
return;
 
oids.nr = 0;
-   oids.alloc = approximate_object_count() / 32;
+   approx_nr_objects = approximate_object_count();
+   oids.alloc = approx_nr_objects / 32;
oids.progress = NULL;
oids.progress_done = 0;
 
@@ -869,8 +871,11 @@ void write_commit_graph(const char *obj_dir,
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   _("Finding commits for commit graph"),
+   approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
+   if (oids.progress_done < approx_nr_objects)
+   display_progress(oids.progress, approx_nr_objects);
stop_progress();
}
 
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 6/6] commit-graph write: add even more progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Add more progress output to sections of code that can collectively
take 5-10 seconds on a large enough repository. On a test repository
with 7141512 commits (see earlier patches for details) we'll now emit:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(50009986/50009986), done.
Annotating commit graph: 21564240, done.
Counting distinct commits in commit graph: 100% (7188080/7188080), done.
Finding extra edges in commit graph: 100% (7188080/7188080), done.
Computing commit graph generation numbers: 100% (7143635/7143635), done.
Writing out commit graph chunks: 21431282, done.

Whereas on a medium-sized repository such as linux.git we'll still
emit output like:

$ ~/g/git/git --exec-path=$HOME/g/git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6365328/6365328), done.
Annotating commit graph: 2391621, done.
Computing commit graph generation numbers: 100% (797207/797207), done.
Writing out commit graph chunks: 2399867, done.

The "Counting distinct commits in commit graph" phase will spend most
of its time paused at "0/*" as we QSORT(...) the list. That's not
optimal, but at least we don't seem to be stalling anymore.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index d0961e89df..2e2eaa24ca 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -893,12 +893,19 @@ void write_commit_graph(const char *obj_dir,
 
close_reachable(, report_progress);
 
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Counting distinct commits in commit graph"),
+   oids.nr);
+   display_progress(progress, 0); /* TODO: Measure QSORT() progress */
QSORT(oids.list, oids.nr, commit_compare);
count_distinct = 1;
for (i = 1; i < oids.nr; i++) {
+   display_progress(progress, i + 1);
if (!oideq([i - 1], [i]))
count_distinct++;
}
+   stop_progress();
 
if (count_distinct >= GRAPH_PARENT_MISSING)
die(_("the commit graph format cannot write %d commits"), 
count_distinct);
@@ -908,8 +915,13 @@ void write_commit_graph(const char *obj_dir,
ALLOC_ARRAY(commits.list, commits.alloc);
 
num_extra_edges = 0;
+   if (report_progress)
+   progress = start_delayed_progress(
+   _("Finding extra edges in commit graph"),
+   oids.nr);
for (i = 0; i < oids.nr; i++) {
int num_parents = 0;
+   display_progress(progress, i + 1);
if (i > 0 && oideq([i - 1], [i]))
continue;
 
@@ -926,6 +938,7 @@ void write_commit_graph(const char *obj_dir,
commits.nr++;
}
num_chunks = num_extra_edges ? 4 : 3;
+   stop_progress();
 
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 4/6] commit-graph write: add more describing progress output

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to
include in the graph more descriptive. This amends code I added in
7b0f229222 ("commit-graph write: add progress output", 2018-09-17).

Now, on linux.git, we'll emit this sort of output in the various modes
we support:

$ git commit-graph write
Finding commits for commit graph among packed objects: 100% 
(6418991/6418991), done.
[...]
$ git for-each-ref --format='%(objectname)' | git commit-graph write 
--stdin-commits
Finding commits for commit graph from 584 ref tips: 100% (584/584), done.
[...]
$ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack
Finding commits for commit graph in 4 packs: 6418991, done.
[...]

The middle on of those is going to be the output users will most
commonly see, since it'll be emitted when they get the commit graph
via gc.writeCommitGraph=true.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 commit-graph.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 4d03f8aa7f..a0aea850f1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -785,6 +785,7 @@ void write_commit_graph(const char *obj_dir,
struct progress *progress = NULL;
uint64_t progress_cnt;
unsigned long approx_nr_objects;
+   struct strbuf progress_title = STRBUF_INIT;
 
if (!commit_graph_compatible(the_repository))
return;
@@ -821,8 +822,12 @@ void write_commit_graph(const char *obj_dir,
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
if (report_progress) {
-   oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"), 0);
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph in %d 
pack",
+  "Finding commits for commit graph in %d 
packs",
+  pack_indexes->nr),
+   pack_indexes->nr);
+   oids.progress = 
start_delayed_progress(progress_title.buf, 0);
oids.progress_done = 0;
}
for (i = 0; i < pack_indexes->nr; i++) {
@@ -839,14 +844,20 @@ void write_commit_graph(const char *obj_dir,
free(p);
}
stop_progress();
+   strbuf_reset(_title);
strbuf_release();
}
 
if (commit_hex) {
-   if (report_progress)
-   progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
-   commit_hex->nr);
+   if (report_progress) {
+   strbuf_addf(_title,
+   Q_("Finding commits for commit graph from 
%d ref tip",
+  "Finding commits for commit graph from 
%d ref tips",
+  commit_hex->nr),
+   commit_hex->nr);
+   progress = start_delayed_progress(progress_title.buf,
+ commit_hex->nr);
+   }
for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
@@ -866,12 +877,13 @@ void write_commit_graph(const char *obj_dir,
}
}
stop_progress();
+   strbuf_reset(_title);
}
 
if (!pack_indexes && !commit_hex) {
if (report_progress)
oids.progress = start_delayed_progress(
-   _("Finding commits for commit graph"),
+   _("Finding commits for commit graph among 
packed objects"),
approx_nr_objects);
for_each_packed_object(add_packed_commits, , 0);
if (oids.progress_done < approx_nr_objects)
@@ -976,6 +988,8 @@ void write_commit_graph(const char *obj_dir,
  _cnt);
stop_progress();
 
+   strbuf_release(_title);
+
close_commit_graph(the_repository);
finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
commit_lock_file();
-- 
2.20.0.rc0.387.gc7a69e6b6c



Re: [PATCH 1/5] eoie: default to not writing EOIE section

2018-11-20 Thread Ben Peart




On 11/20/2018 1:11 AM, Jonathan Nieder wrote:

Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
2018-10-10) Git defaults to writing the new EOIE section when writing
out an index file.  Usually that is a good thing because it improves
threaded performance, but when a Git repository is shared with older
versions of Git, it produces a confusing warning:

   $ git status
   ignoring EOIE extension
   HEAD detached at 371ed0defa
   nothing to commit, working tree clean

Let's introduce the new index extension more gently.  First we'll roll
out the new version of Git that understands it, and then once
sufficiently many users are using such a version, we can flip the
default to writing it by default.

Introduce a '[index] recordEndOfIndexEntries' configuration variable
to allow interested users to benefit from this index extension early.

Signed-off-by: Jonathan Nieder 
---
Rebased.  No other change from v1.

As Jonathan pointed out, it would be nice to have tests here.  Ben,
any advice for how I could write some in a followup change?  E.g. does
Derrick Stolee's tracing based testing trick apply here?



I suppose a 'test-dump-eoie' could be written along the lines of 
test-dump-fsmonitor or test-dump-untracked-cache.  Unlike those, there 
isn't much state to dump other than the existence of the extension and 
the offset.  That could be used to test that the new settings are 
working properly.




Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Ben Peart




On 11/20/2018 4:26 AM, Ævar Arnfjörð Bjarmason wrote:


On Tue, Nov 20 2018, Jonathan Nieder wrote:

Just commenting here on the end-state of this since it's easier than
each patch at a time:

First, do we still need to be doing %.4s instead of just %s? It would be
easier for translators / to understand what's going on if it were just
%s. I.e. "this is the extension name" v.s. "this is the first 4 bytes of
whatever it is...".


return error("index uses %.4s extension, which we do not 
understand",
 ext);


Missing _(). Not the fault of this series, but something to fix while
we're at it.

Also not the fault of this series, the "is this upper case" test is
unportable, but this is probably the tip of the iceberg for git not
working on EBCDIC systems.

This message should say something like "Index uses the mandatory %s
extension" to clarify and distinguish it from the below. We don't
understand the upper-case one either, but the important distinction is
that one is mandatory, and the other can be dropped. The two messages
should make this clear.

Also, having advice() for that case is even more valuable since we have
a hard error at this point. So something like:

 "This is likely due to the index having been written by a future
 version of Git. All-lowercase index extensions are mandatory, as
 opposed to optional all-uppercase ones which we'll drop with a
 warning if we see them".



I agree that we should have different messages for mandatory and 
optional extensions.  I don't think we should try and educate the end 
user on the implementation detail that git makes lower cases mandatory 
and upper case optional (ie drop the 'All-lowercase..." part).  They 
will never see the lower vs upper case difference and can't do anything 
about it anyway.



trace_printf("ignoring %.4s extension\n", ext);
+   if (advice_unknown_index_extension) {
+   warning(_("ignoring optional %.4s index extension"), 
ext);


Should start with upper-case. Good that it says "optional".


+   advise(_("This is likely due to the file having been written 
by a newer\n"
+"version of Git than is reading it. You can upgrade 
Git to\n"
+"take advantage of performance improvements from 
the updated\n"
+"file format.\n"


Let's not promise performance improvements with this extension in a
future version. We have no idea what the extension is, yeah right now
it's going to be true for the extension that prompted this patch series,
but may not be in the future. So just something like this for the last
sentence:

 You can try upgrading Git to use this new index format.


Agree - not all are guaranteed to be perf related.




+"\n"
+"Run \"%s\"\n"
+"to suppress this message."),
+  "git config advice.unknownIndexExtension false");


Somewhat of an aside, but if I grep:

 git grep -C10 'git config advice\..*false' -- '*.[ch]'

There's a few existing examples of this, but the majority of advice()
messages don't say in the message how you can turn these off. Do we
think this a message users would especially like to turn off? I have the
opposite impression, it's a one-off in most cases, although not in the
case where an editor has an embedded git.

I think it would make sense to add this sort of thing to the advice()
API, i.e.:

 advice_with_config_hint(_(""), "unknownIndexExtension");

Which would then know how to consistently print this advice about how to
turn off the warning.



I like this.  I personally never knew you could turn of the "spent xxx 
seconds finding untracked files" advice until I worked on this patch 
series. This would help make that feature more discoverable.


Re: [PATCH 4/5] index: make index.threads=true enable ieot and eoie

2018-11-20 Thread Ben Peart




On 11/20/2018 1:14 AM, Jonathan Nieder wrote:

If a user explicitly sets

[index]
threads = true

to read the index using multiple threads, ensure that index writes
include the offset table by default to make that possible.  This
ensures that the user's intent of turning on threading is respected.

In other words, permit the following configurations:

- index.threads and index.recordOffsetTable unspecified: do not write
   the offset table yet (to avoid alarming the user with "ignoring IEOT
   extension" messages when an older version of Git accesses the
   repository) but do make use of multiple threads to read the index if
   the supporting offset table is present.

   This can also be requested explicitly by setting index.threads=true,
   0, or >1 and index.recordOffsetTable=false.

- index.threads=false or 1: do not write the offset table, and do not
   make use of the offset table.

   One can set index.recordOffsetTable=false as well, to be more
   explicit.

- index.threads=true, 0, or >1 and index.recordOffsetTable unspecified:
   write the offset table and make use of threads at read time.

   This can also be requested by setting index.threads=true, 0, >1, or
   unspecified and index.recordOffsetTable=true.

Fortunately the complication is temporary: once most Git installations
have upgraded to a version with support for the IEOT and EOIE
extensions, we can flip the defaults for index.recordEndOfIndexEntries
and index.recordOffsetTable to true and eliminate the settings.



This looks good.  I think this provides good default behavior while 
enabling fine grained control to those who want/need it.


I'm looking forward to the day when we can turn it back on by default so 
that people can take advantage of the speed improvements.




Re: [PATCH 1/5] eoie: default to not writing EOIE section

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 08:06:16AM -0500, Ben Peart wrote:
> >diff --git a/read-cache.c b/read-cache.c
> >index 4ca81286c0..1e9c772603 100644
> >--- a/read-cache.c
> >+++ b/read-cache.c
> >@@ -2689,6 +2689,15 @@ void update_index_if_able(struct index_state *istate, 
> >struct lock_file *lockfile
> > rollback_lock_file(lockfile);
> >  }
> >+static int record_eoie(void)
> >+{
> >+int val;
> 
> I believe you are going to want to initialize val to 0 here as it is on the
> stack so is not guaranteed to be zero.

The git_config_get_bool() call below will initialize it anyway.

> >+
> >+if (!git_config_get_bool("index.recordendofindexentries", ))
> >+return val;
> >+return 0;
> >+}


Re: How to prepare patch for git am which remove a file ?

2018-11-20 Thread Mathieu Malaterre
On Tue, Nov 20, 2018 at 1:55 PM SZEDER Gábor  wrote:
>
> On Tue, Nov 20, 2018 at 01:39:40PM +0100, Mathieu Malaterre wrote:
> > Here is a simple setup:
> >
> >   cd /tmp
> >   mkdir g
> >   cd g
> >   git init .
> >   wget 
> > http://central.maven.org/maven2/org/apache/xmlgraphics/fop/2.1/fop-2.1.pom
> >   git add fop-2.1.pom
> >   git commit -m "My First Commit"
> >   git rm fop-2.1.pom
> >   git commit -m "Second Commit"
> >   git format-patch HEAD~
> >   git reset --hard HEAD~
> >   git am 0001-Second-Commit.patch
> > Applying: Second Commit
> > error: patch failed: fop-2.1.pom:1
> > error: fop-2.1.pom: patch does not apply
> > Patch failed at 0001 Second Commit
> > hint: Use 'git am --show-current-patch' to see the failed patch
> > When you have resolved this problem, run "git am --continue".
> > If you prefer to skip this patch, run "git am --skip" instead.
> > To restore the original branch and stop patching, run "git am --abort".
> >
> > What is the black magic to get `git am` to understand this patch ?
>
> The file in question uses CRLF line endings.

Ah right, I wasn't paying attention. Thanks.

>   $ git am --keep-cr 0001-Second-Commit.patch
>   Applying: Second Commit
>
> For explanation I quote ad2c928001 (git-am: Add command line parameter
> `--keep-cr` passing it to git-mailsplit, 2010-02-27):
>
>   c2ca1d7 (Allow mailsplit (and hence git-am) to handle mails with CRLF
>   line-endings, 2009-08-04) fixed "git mailsplit" to help people with
>   MUA whose output from save-as command uses CRLF as line terminators by
>   stripping CR at the end of lines.
>
>   However, when you know you are feeding output from "git format-patch"
>   directly to "git am", and especially when your contents have CR at the
>   end of line, such stripping is undesirable.  To help such a use case,
>   teach --keep-cr option to "git am" and pass that to "git mailinfo".
>


Re: [PATCH 2/5] ieot: default to not writing IEOT section

2018-11-20 Thread Ben Peart




On 11/20/2018 1:12 AM, Jonathan Nieder wrote:

As with EOIE, popular versions of Git do not support the new IEOT
extension yet.  When accessing a Git repository written by a more
modern version of Git, they correctly ignore the unrecognized section,
but in the process they loudly warn

ignoring IEOT extension

resulting in confusion for users.  Introduce the index extension more
gently by not writing it yet in this first version with support for
it.  Soon, once sufficiently many users are running a modern version
of Git, we can flip the default so users benefit from this index
extension by default.

Introduce a '[index] recordOffsetTable' configuration variable to
control whether the new index extension is written.

Signed-off-by: Jonathan Nieder 
---
As with patch 1/5, no change from v1 other than rebasing.

  Documentation/config/index.txt |  7 +++
  read-cache.c   | 11 ++-
  2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 8e138aba7a..de44183235 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -5,6 +5,13 @@ index.recordEndOfIndexEntries::
reading the index using Git versions before 2.20. Defaults to
'false'.
  
+index.recordOffsetTable::

+   Specifies whether the index file should include an "Index Entry
+   Offset Table" section. This reduces index load time on
+   multiprocessor machines but produces a message "ignoring IEOT
+   extension" when reading the index using Git versions before 2.20.
+   Defaults to 'false'.
+
  index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 1e9c772603..f3d5638d9e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2698,6 +2698,15 @@ static int record_eoie(void)
return 0;
  }
  
+static int record_ieot(void)

+{
+   int val;
+


Initialize stack val to zero to ensure proper default.


+   if (!git_config_get_bool("index.recordoffsettable", ))
+   return val;
+   return 0;
+}
+
  /*
   * On success, `tempfile` is closed. If it is the temporary file
   * of a `struct lock_file`, we will therefore effectively perform
@@ -2761,7 +2770,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
else
nr_threads = 1;
  
-	if (nr_threads != 1) {

+   if (nr_threads != 1 && record_ieot()) {
int ieot_blocks, cpus;
  
  		/*




Re: [PATCH 1/5] eoie: default to not writing EOIE section

2018-11-20 Thread Ben Peart




On 11/20/2018 1:11 AM, Jonathan Nieder wrote:

Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
2018-10-10) Git defaults to writing the new EOIE section when writing
out an index file.  Usually that is a good thing because it improves
threaded performance, but when a Git repository is shared with older
versions of Git, it produces a confusing warning:

   $ git status
   ignoring EOIE extension
   HEAD detached at 371ed0defa
   nothing to commit, working tree clean

Let's introduce the new index extension more gently.  First we'll roll
out the new version of Git that understands it, and then once
sufficiently many users are using such a version, we can flip the
default to writing it by default.

Introduce a '[index] recordEndOfIndexEntries' configuration variable
to allow interested users to benefit from this index extension early.

Signed-off-by: Jonathan Nieder 
---
Rebased.  No other change from v1.

As Jonathan pointed out, it would be nice to have tests here.  Ben,
any advice for how I could write some in a followup change?  E.g. does
Derrick Stolee's tracing based testing trick apply here?

  Documentation/config/index.txt |  7 +++
  read-cache.c   | 11 ++-
  t/t1700-split-index.sh | 11 +++
  3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 4b94b6bedc..8e138aba7a 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -1,3 +1,10 @@
+index.recordEndOfIndexEntries::
+   Specifies whether the index file should include an "End Of Index
+   Entry" section. This reduces index load time on multiprocessor
+   machines but produces a message "ignoring EOIE extension" when
+   reading the index using Git versions before 2.20. Defaults to
+   'false'.
+
  index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 4ca81286c0..1e9c772603 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2689,6 +2689,15 @@ void update_index_if_able(struct index_state *istate, 
struct lock_file *lockfile
rollback_lock_file(lockfile);
  }
  
+static int record_eoie(void)

+{
+   int val;


I believe you are going to want to initialize val to 0 here as it is on 
the stack so is not guaranteed to be zero.



+
+   if (!git_config_get_bool("index.recordendofindexentries", ))
+   return val;
+   return 0;
+}
+
  /*
   * On success, `tempfile` is closed. If it is the temporary file
   * of a `struct lock_file`, we will therefore effectively perform
@@ -2936,7 +2945,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 * read.  Write it out regardless of the strip_extensions parameter as 
we need it
 * when loading the shared index.
 */
-   if (offset) {
+   if (offset && record_eoie()) {
struct strbuf sb = STRBUF_INIT;
  
  		write_eoie_extension(, _c, offset);

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..0cbac64e28 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -25,14 +25,17 @@ test_expect_success 'enable split index' '
git update-index --split-index &&
test-tool dump-split-index .git/index >actual &&
indexversion=$(test-tool index-version <.git/index) &&
+
+   # NEEDSWORK: Stop hard-coding checksums.
if test "$indexversion" = "4"
then
-   own=3527df833c6c100d3d1d921a9a782d62a8be4b58
-   base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
+   own=432ef4b63f32193984f339431fd50ca796493569
+   base=508851a7f0dfa8691e9f69c7f055865389012491
else
-   own=5e9b60117ece18da410ddecc8b8d43766a0e4204
-   base=4370042739b31cd17a5c5cd6043a77c9a00df113
+   own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
+   base=39d890139ee5356c7ef572216cebcd27aa41f9df
fi &&
+
cat >expect <<-EOF &&
own $own
base $base



Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 

+Obsolescence across cherry-picks
+
+By default the evolve command will treat cherry-picks and squash merges as 
being
+completely separate from the original. Further amendments to the original 
commit
+will have no effect on the cherry-picked copy. However, this behavior may not 
be
+desirable in all circumstances.
+
+The evolve command may at some point support an option to look for cases where
+the source of a cherry-pick or squash merge has itself been amended, and
+automatically apply that same change to the cherry-picked copy. In such cases,
+it would traverse origin edges rather than ignoring them, and would treat a
+commit with origin edges as being obsolete if any of its origins were obsolete.


If a merge has been cherry-picked we cannot update it as we don't record 
which parent was used for the pick, however it is probably not a problem 
in practice - I think it is unusual to amend merges.


Best Wishes

Phillip


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

On 20/11/2018 12:18, Phillip Wood wrote:

On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 
+Divergence
+--
+From the user’s perspective, two changes are divergent if they both 
ask for
+different replacements to the same commit. More precisely, a target 
commit is
+considered divergent if there is more than one commit at the head of 
a change in
+refs/metas that leads to the target commit via an unbroken chain of 
“obsolete”

+edges.
+
+Much like a merge conflict, divergence is a situation that requires user
+intervention to resolve. The evolve command will stop when it encounters
+divergence and prompt the user to resolve the problem. Users can 
solve the

+problem in several ways:
+
+- Discard one of the changes (by deleting its change branch).
+- Merge the two changes (producing a single change branch).


I assume this wont create merge commits for the actual commits though, 
just merge the meta branches and create some new commits that are each 
the result of something like 'merge-recursive original-commit 
our-new-version their-new-version'


That should have been

merge-recursive original-commit^ -- our-new-version their-new-version

Best Wishes

Phillip



Re: How to prepare patch for git am which remove a file ?

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 01:39:40PM +0100, Mathieu Malaterre wrote:
> Here is a simple setup:
> 
>   cd /tmp
>   mkdir g
>   cd g
>   git init .
>   wget 
> http://central.maven.org/maven2/org/apache/xmlgraphics/fop/2.1/fop-2.1.pom
>   git add fop-2.1.pom
>   git commit -m "My First Commit"
>   git rm fop-2.1.pom
>   git commit -m "Second Commit"
>   git format-patch HEAD~
>   git reset --hard HEAD~
>   git am 0001-Second-Commit.patch
> Applying: Second Commit
> error: patch failed: fop-2.1.pom:1
> error: fop-2.1.pom: patch does not apply
> Patch failed at 0001 Second Commit
> hint: Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> What is the black magic to get `git am` to understand this patch ?

The file in question uses CRLF line endings.

  $ git am --keep-cr 0001-Second-Commit.patch
  Applying: Second Commit

For explanation I quote ad2c928001 (git-am: Add command line parameter
`--keep-cr` passing it to git-mailsplit, 2010-02-27):

  c2ca1d7 (Allow mailsplit (and hence git-am) to handle mails with CRLF
  line-endings, 2009-08-04) fixed "git mailsplit" to help people with
  MUA whose output from save-as command uses CRLF as line terminators by
  stripping CR at the end of lines.
  
  However, when you know you are feeding output from "git format-patch"
  directly to "git am", and especially when your contents have CR at the
  end of line, such stripping is undesirable.  To help such a use case,
  teach --keep-cr option to "git am" and pass that to "git mailinfo".



Re: Cygwin Git with Windows paths

2018-11-20 Thread Steven Penny
On Tue, Nov 20, 2018 at 4:36 AM Torsten Bögershausen wrote:
> Could you please post a "git diff" of your instrumented code,
> so that I/we can follow the debugging, especially what the printouts mean?
>
> I think we need to understand what is going on in abspath.c
>

diff --git a/abspath.c b/abspath.c
index 9857985..09548e5 100644
--- a/abspath.c
+++ b/abspath.c
@@ -14,6 +14,7 @@ int is_directory(const char *path)
 /* removes the last path component from 'path' except if 'path' is root */
 static void strip_last_component(struct strbuf *path)
 {
+   printf("strip_last_component, path, [%s]\n", path->buf);
size_t offset = offset_1st_component(path->buf);
size_t len = path->len;

@@ -30,6 +31,8 @@ static void strip_last_component(struct strbuf *path)
 /* get (and remove) the next component in 'remaining' and place it in 'next' */
 static void get_next_component(struct strbuf *next, struct strbuf *remaining)
 {
+   printf("get_next_component, next, [%s]\n", next->buf);
+   printf("get_next_component, remaining, [%s]\n", remaining->buf);
char *start = NULL;
char *end = NULL;


Re: Git Test Coverage Report (v2.20.0-rc0)

2018-11-20 Thread Jeff King
On Tue, Nov 20, 2018 at 07:17:46AM -0500, Derrick Stolee wrote:

> > And I think that's a pattern with the delta-island code. What we really
> > care about most is that if we throw a real fork-network repository at
> > it, it produces faster clones with fewer un-reusable deltas. So I think
> > a much more interesting approach here would be perf tests. But:
> > 
> >- we'd want to count those as coverage, and that likely makes your
> >  coverage tests prohibitively expensive
> > 
> >- it requires a specialized repo to demonstrate, which most people
> >  aren't going to have handy
> 
> Do you have regularly-running tests that check this in your infrastructure?
> As long as someone would notice if this code starts failing, that would be
> enough.

We do integration tests, and this feature gets exercised quite a bit in
production at GitHub. But none of that caught the breakage I fixed this
morning for the simple fact that we don't keep up with upstream master
in real-time. We're running the delta-island code I wrote years ago, and
the bug was introduced by the upstreaming. Probably in a month or three
I'll merge v2.20 into our fork, and we would have noticed then.

I've had dreams of following upstream more closely, but there are two
blockers:

  - we have enough custom bits that the merges are time-consuming (and
introduce the possibility of funny interactions or regressions). I'd
like to reduce our overall diff from upstream, but it's slow going
and time is finite. (And ironically, it's made harder in some ways
by the lag, because many of our topics are backports of things I
send upstream).

  - deploying the ".0" release from master can be scary. For instance, I
was really happy to have a fix for 9ac3f0e5b3 (pack-objects: fix
performance issues on packing large deltas, 2018-07-22) before
putting the bug into production.

> > Yeah, this triggers if your regex has more than one capture group (and
> > likewise, we almost certainly don't run the "you have too many groups"
> > warning).
> 
> Did you know that regexes are notoriously under-tested [1]? When looking at
> this code, I didn't even know regexes were involved (but I didn't look
> enough at the context).
> 
> [1] https://dl.acm.org/citation.cfm?id=3236072

Heh. Well, fortunately we are compiling regexes from the user's config,
so it's not Git's fault if the regexes are bad. ;)

(Of course on our production servers I'm on the hook for both sides!)

-Peff


How to prepare patch for git am which remove a file ?

2018-11-20 Thread Mathieu Malaterre
Here is a simple setup:

  cd /tmp
  mkdir g
  cd g
  git init .
  wget 
http://central.maven.org/maven2/org/apache/xmlgraphics/fop/2.1/fop-2.1.pom
  git add fop-2.1.pom
  git commit -m "My First Commit"
  git rm fop-2.1.pom
  git commit -m "Second Commit"
  git format-patch HEAD~
  git reset --hard HEAD~
  git am 0001-Second-Commit.patch
Applying: Second Commit
error: patch failed: fop-2.1.pom:1
error: fop-2.1.pom: patch does not apply
Patch failed at 0001 Second Commit
hint: Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

What is the black magic to get `git am` to understand this patch ?
Please note that:

$ patch --dry-run -p1 < 0001-Second-Commit.patch
checking file fop-2.1.pom

Thanks!


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Phillip Wood

Hi Stefan

Thanks for working on this, I think it could be a really useful addition 
to git. I'd echo Gábor's comments about making commands descriptive and 
easy for the user to find, git has aliases, accepts abbreviated option 
names and has shell completion so I don't think typing is really such a 
problem. From your reply it looks like you've taken those concerns on 
board. I've got some more comments below.


On 15/11/2018 00:55, sxe...@google.com wrote:

From: Stefan Xenos 

This document describes what an obsolescence graph for
git would look like, the behavior of the evolve command,
and the changes planned for other commands.

Signed-off-by: Stefan Xenos 
---
  Documentation/technical/evolve.txt | 885 +
  1 file changed, 885 insertions(+)
  create mode 100644 Documentation/technical/evolve.txt

diff --git a/Documentation/technical/evolve.txt 
b/Documentation/technical/evolve.txt
new file mode 100644
index 00..88470eada3
--- /dev/null
+++ b/Documentation/technical/evolve.txt
@@ -0,0 +1,885 @@
+Git Obsolescence Graph
+==
+
+Objective
+-
+Track the edits to a commit over time in an obsolescence graph.
+
+Background
+--
+Imagine you have three dependent changes up for review and you receive feedback
+that requires editing all three changes. While you're editing one, more 
feedback
+arrives on one of the others. What do you do?
+
+The evolve command is a convenient way to work with chains of commits that are
+under review. Whenever you rebase or amend a commit, the repository remembers
+that the old commit is obsolete and has been replaced by the new one. Then, at
+some point in the future, you can run "git evolve" and the correct sequence of
+rebases will occur in the correct order such that no commit has an obsolete
+parent.
+
+Part of making the "evolve" command work involves tracking the edits to a 
commit
+over time, which is why we need an obsolescence graph. However, the 
obsolescence
+graph will also bring other benefits:
+
+- Users can view the history of a commit directly (the sequence of amends and
+  rebases it has undergone, orthogonal to the history of the branch it is on).
+- It will be possible to quickly locate and list all the changes the user
+  currently has in progress.
+- It can be used as part of other high-level commands that combine or split
+  changes.
+- It can be used to decorate commits (in git log, gitk, etc) that are either
+  obsolete or are the tip of a work in progress.
+- By pushing and pulling the obsolescence graph, users can collaborate more
+  easily on changes-in-progress. This is better than pushing and pulling the
+  changes themselves since the obsolescence graph can be used to locate a more
+  specific merge base, allowing for better merges between different versions of
+  the same change.
+- It could be used to correctly rebase local changes and other local branches
+  after running git-filter-branch.
+- It can replace the change-id footer used by gerrit.
+
+Similar technologies
+
+There are some other technologies that address the same end-user problem.
+
+Rebase -i can be used to solve the same problem, but users can't easily switch
+tasks midway through an interactive rebase or have more than one interactive
+rebase going on at the same time. It can't handle the case where you have
+multiple changes sharing the same parent when that parent needs to be rebased
+and won't let you collaborate with others on resolving a complicated 
interactive
+rebase. You can think of rebase -i as a top-down approach and the evolve 
command
+as the bottom-up approach to the same problem.
+
+Several patch queue managers have been built on top of git (such as topgit,
+stgit, and quilt). They address the same user need. However they also rely on
+state managed outside git that needs to be kept in sync. Such state can be
+easily damaged when running a git native command that is unaware of the patch
+queue. They also typically require an explicit initialization step to be done 
by
+the user which creates workflow problems.
+
+Replacements (refs/replace) are superficially similar to obsolescences in that
+they describe that one commit should be replaced by another. However, they
+differ in both how they are created and how they are intended to be used.
+Obsolescences are created automatically by the commands a user runs, and they
+describe the user’s intent to perform a future rebase. Obsolete commits still
+appear in branches, logs, etc like normal commits (possibly with an extra
+decoration that marks them as obsolete). Replacements are typically created
+explicitly by the user, they are meant to be kept around for a long time, and
+they describe a replacement to be applied at read-time rather than as the input
+to a future operation. When a replaced commit is queried, it is typically 
hidden
+and swapped out with its replacement as though the replacement has already
+occurred.
+
+Goals

Re: Git Test Coverage Report (v2.20.0-rc0)

2018-11-20 Thread Derrick Stolee

On 11/20/2018 6:34 AM, Jeff King wrote:

On Mon, Nov 19, 2018 at 10:40:53AM -0500, Derrick Stolee wrote:


Since depth is never incremented, we are not covering this block. Is it
possible to test?
This should be covered by the fix in:

   https://public-inbox.org/git/20181120095053.gc22...@sigill.intra.peff.net/

because now entries at the top-level are depth "1". The "depth++" line
is still never executed in our test suite. I'm not sure how much that
matters.


Thanks! I'll go take a look at your patch.


delta-islands.c
c8d521faf7  53) memcpy(b, old, size);

This memcpy happens when the 'old' island_bitmap is passed to
island_bitmap_new(), but...


c8d521faf7 187) b->refcount--;
c8d521faf7 188) b = kh_value(island_marks, pos) = island_bitmap_new(b);

This block has the only non-NULL caller to island_bitmap_new().

This is another case where it triggers a lot for a reasonably-sized
repo, but it's hard to construct a small case. This code implements a
copy-on-write of the bitmap, which means the same objects have to be
accessible from two different paths through the reachability graph, each
with different island marks. And then a test would I guess make sure
that the correct subsets of objects never become deltas, which gets
complicated.

And I think that's a pattern with the delta-island code. What we really
care about most is that if we throw a real fork-network repository at
it, it produces faster clones with fewer un-reusable deltas. So I think
a much more interesting approach here would be perf tests. But:

   - we'd want to count those as coverage, and that likely makes your
 coverage tests prohibitively expensive

   - it requires a specialized repo to demonstrate, which most people
 aren't going to have handy


Do you have regularly-running tests that check this in your 
infrastructure? As long as someone would notice if this code starts 
failing, that would be enough.



c8d521faf7 212) obj = ((struct tag *)obj)->tagged;
c8d521faf7 213) if (obj) {
c8d521faf7 214) parse_object(the_repository, >oid);
c8d521faf7 215) marks = create_or_get_island_marks(obj);
c8d521faf7 216) island_bitmap_set(marks, island_counter);

It appears that this block would happen if we placed a tag in the delta
island.

Yep. Again, exercised by real repositories.

I'm not sure how far we want to go in the blind pursuit of coverage.
Certainly we could add a tag to the repo in t5320, and this code would
get executed. But verifying that it's doing the right thing is much
harder (and is more easily done with a perf test).


Blind coverage goals are definitely not worth the effort. My goal here 
was to re-check all of the new code (since last release) that is not 
covered, because it's easier to hide a bug there.





c8d521faf7 397) strbuf_addch(_name, '-');

This block is inside the following patch:
[...]

Yeah, this triggers if your regex has more than one capture group (and
likewise, we almost certainly don't run the "you have too many groups"
warning).


Did you know that regexes are notoriously under-tested [1]? When looking 
at this code, I didn't even know regexes were involved (but I didn't 
look enough at the context).


[1] https://dl.acm.org/citation.cfm?id=3236072




c8d521faf7 433) continue;
c8d521faf7 436) list[dst] = list[src];

These blocks are inside the following nested loop in deduplicate_islands():

+   for (ref = 0; ref + 1 < island_count; ref++) {
+   for (src = ref + 1, dst = src; src < island_count; src++) {
+   if (list[ref]->hash == list[src]->hash)
+   continue;
+
+   if (src != dst)
+   list[dst] = list[src];
+
+   dst++;
+   }
+   island_count = dst;
+   }

This means that our "deduplication" logic is never actually doing anything
meaningful.

Sorry, I don't even remember what this code is trying to do. The island
code is 5+ years old, and just recently ported to upstream Git by
Christian. And that's perhaps part of my laziness in the above tests; it
would be a significant effort to re-figure out all these corner cases.
It's a big part of why I hadn't been sending the patches upstream
myself.


Sure. Hopefully pointing out these blocks gives us more motivation to 
manually inspect them and avoid silly bugs.


Thanks,
-Stolee


Re: [PATCH] tests: send "bug in the test script" errors to the script's stderr

2018-11-20 Thread SZEDER Gábor
On Tue, Nov 20, 2018 at 05:58:25AM -0500, Jeff King wrote:
> On Tue, Nov 20, 2018 at 05:45:28AM -0500, Jeff King wrote:
> 
> > On Tue, Nov 20, 2018 at 12:34:04AM +0100, SZEDER Gábor wrote:
> > 
> > > > I do notice that many of the existing "FATAL:" errors use descriptor 5,
> > > > which goes to stdout. I'm not sure if those should actually be going to
> > > > stderr (or if there's some TAP significance to those lines).
> > > 
> > > I chose to send these messages to standard error, because they are,
> > > well, errors.  TAP only cares about stdout, but by aborting the test
> > > script in BUG(), error() or die() we are already violating TAP anyway,
> > > so I don't think it matters whether we send "bug in test script" or
> > > "FATAL" errors to stdout or stderr.
> > 
> > Yeah, I agree it probably doesn't matter. I was mostly suggesting to
> > make the existing ">&5" into ">&7" for consistency. But I don't think
> > that needs to block your patch.
> 
> By the way, I did check to see whether this might help the situation
> where running under "prove" we see only "Dubious..." and not the actual
> error() message produced by the test script. But no, it eats both stdout
> and stderr. You can sneak a line through by prepending it with "#", but
> only if "prove -o" is used (and even then, it's hard to associate it
> with a particular test when you're running many in parallel).

Just to be clear: I don't mind if in some combination of test
harnesses and test options a "bug in the test script" message doesn't
reach the terminal as long as I get a clearly visible error from
somewhere.

> So I guess the status quo is not too bad: prove says "dubious", and then
> you re-run the test with "./t1234-foo.sh -v -i" and you get to see the
> error.

And with '--verbose-log' the "bug in the test script" message goes to
the test's log as well, even when it has to go through fd 7 first, so
if you use 'prove' and your GIT_TEST_OPTS includes '--verbose-log'
then you can just look at the log, there's no need to re-run the test.



[PATCH] rebase: mark a test as failing with rebase.useBuiltin=false

2018-11-20 Thread Ævar Arnfjörð Bjarmason
Mark a test added in 04519d7201 ("rebase: validate -C and
--whitespace= parameters early", 2018-11-14) as only succeeding
with the builtin version of rebase. It would be nice if the
shellscript version had the same fix, but it's on its way out, and the
author is not interested in fixing it[1].

This makes the entire test suite pass again with the
GIT_TEST_REBASE_USE_BUILTIN=false mode added in my 62c23938fa ("tests:
add a special setup where rebase.useBuiltin is off", 2018-11-14).

1. 
https://public-inbox.org/git/nycvar.qro.7.76.6.1811201157170...@tvgsbejvaqbjf.bet/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

On Tue, Nov 20 2018, Johannes Schindelin wrote:

> [...]
> Maybe you can concoct a prereq for this test? Something like
>
> test_lazy_prereq BUILTIN_REBASE '
>   test true = "${GIT_TEST_REBASE_USE_BUILTIN:-true}"
> '

It's better to just mark the test as needing the prereq turned
off. E.g. this is what we do for the split index tests & now for the
gettext tests. That way we always run the test, but just indicate that
it relies on GIT_TEST_REBASE_USE_BUILTIN being unset.

 t/t3406-rebase-message.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t3406-rebase-message.sh b/t/t3406-rebase-message.sh
index 38bd876cab..77e5bbb3d5 100755
--- a/t/t3406-rebase-message.sh
+++ b/t/t3406-rebase-message.sh
@@ -84,7 +84,8 @@ test_expect_success 'rebase --onto outputs the invalid ref' '
test_i18ngrep "invalid-ref" err
 '
 
-test_expect_success 'error out early upon -C or --whitespace=' '
+test_expect_success 'builtin rebase: error out early upon -C or 
--whitespace=' '
+   sane_unset GIT_TEST_REBASE_USE_BUILTIN &&
test_must_fail git rebase -Cnot-a-number HEAD 2>err &&
test_i18ngrep "numerical value" err &&
test_must_fail git rebase --whitespace=bad HEAD 2>err &&
-- 
2.20.0.rc0.387.gc7a69e6b6c



WISH: set word-diff-regex for/in gitk

2018-11-20 Thread Anna Vernerova
Dear developers,

I am a daily use of gitk; I use the "color words" diff option most often.
I have found no way to set --word-diff-regex for gitk, or to make the command

git config --global diff.wordRegex '.'

have effect on the diffs shown in gitk.

I think the value of --word-diff-regex is effectively discarded in the block at 
line 215 of gitk-git/gitk,
but it seems that other changes are necessary for the parameter to take any 
effect.

Being able to configure word-diff-regex would be much appreciated!

Anša


Re: [PATCH] test-lib-functions: make 'test_cmp_rev' more informative on failure

2018-11-20 Thread Jeff King
On Tue, Nov 20, 2018 at 12:25:38PM +0100, SZEDER Gábor wrote:

> > but we do not
> > usually bother to do so for our helper functions (like test_cmp).
> 
> test_i18ngrep() does since your 03aa3783f2 (t: send verbose
> test-helper output to fd 4, 2018-02-22).

Oh, indeed. Usually I find myself forgetting about patches I worked on
from 5 years ago. Forgetting about one from 9 months ago is a new low.
:)

> > I don't think it would ever hurt,
> > though. Should we be doing that for all the others, too?
> 
> I think we should, and you agreed:
> 
> https://public-inbox.org/git/20180303065648.ga17...@sigill.intra.peff.net/
> 
> but then went travelling, and then the whole thing got forgotten.

Stupid vacation. :)

OK, yes, I reaffirm my agreement that this is the right direction, then.
Sorry for my lack of memory.

-Peff


Re: Git Test Coverage Report (v2.20.0-rc0)

2018-11-20 Thread Jeff King
On Mon, Nov 19, 2018 at 10:40:53AM -0500, Derrick Stolee wrote:

> > 28b8a73080 builtin/pack-objects.c 2793) depth++;
> > 108f530385 builtin/pack-objects.c 2797) oe_set_tree_depth(_pack, ent,
> > depth);
> 
> This 'depth' variable is incremented as part of a for loop in this patch:
> 
>  static void show_object(struct object *obj, const char *name, void *data)
> @@ -2686,6 +2706,19 @@ static void show_object(struct object *obj, const
> char *name, void *data)
>     add_preferred_base_object(name);
>     add_object_entry(>oid, obj->type, name, 0);
>     obj->flags |= OBJECT_ADDED;
> +
> +   if (use_delta_islands) {
> +   const char *p;
> +   unsigned depth = 0;
> +   struct object_entry *ent;
> +
> +   for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
> +   depth++;
> +
> +   ent = packlist_find(_pack, obj->oid.hash, NULL);
> +   if (ent && depth > ent->tree_depth)
> +   ent->tree_depth = depth;
> +   }
>  }
> 
> And that 'ent->tree_depth = depth;' line is replaced with the
> oe_set_tree_depth() call in the report.
> 
> Since depth is never incremented, we are not covering this block. Is it
> possible to test?

This should be covered by the fix in:

  https://public-inbox.org/git/20181120095053.gc22...@sigill.intra.peff.net/

because now entries at the top-level are depth "1". The "depth++" line
is still never executed in our test suite. I'm not sure how much that
matters.

> > delta-islands.c
> > c8d521faf7  53) memcpy(b, old, size);
> 
> This memcpy happens when the 'old' island_bitmap is passed to
> island_bitmap_new(), but...
> 
> > c8d521faf7 187) b->refcount--;
> > c8d521faf7 188) b = kh_value(island_marks, pos) = island_bitmap_new(b);
> 
> This block has the only non-NULL caller to island_bitmap_new().

This is another case where it triggers a lot for a reasonably-sized
repo, but it's hard to construct a small case. This code implements a
copy-on-write of the bitmap, which means the same objects have to be
accessible from two different paths through the reachability graph, each
with different island marks. And then a test would I guess make sure
that the correct subsets of objects never become deltas, which gets
complicated.

And I think that's a pattern with the delta-island code. What we really
care about most is that if we throw a real fork-network repository at
it, it produces faster clones with fewer un-reusable deltas. So I think
a much more interesting approach here would be perf tests. But:

  - we'd want to count those as coverage, and that likely makes your
coverage tests prohibitively expensive

  - it requires a specialized repo to demonstrate, which most people
aren't going to have handy

> > c8d521faf7 212) obj = ((struct tag *)obj)->tagged;
> > c8d521faf7 213) if (obj) {
> > c8d521faf7 214) parse_object(the_repository, >oid);
> > c8d521faf7 215) marks = create_or_get_island_marks(obj);
> > c8d521faf7 216) island_bitmap_set(marks, island_counter);
> 
> It appears that this block would happen if we placed a tag in the delta
> island.

Yep. Again, exercised by real repositories.

I'm not sure how far we want to go in the blind pursuit of coverage.
Certainly we could add a tag to the repo in t5320, and this code would
get executed. But verifying that it's doing the right thing is much
harder (and is more easily done with a perf test).

> > c8d521faf7 397) strbuf_addch(_name, '-');
> 
> This block is inside the following patch:
> [...]

Yeah, this triggers if your regex has more than one capture group (and
likewise, we almost certainly don't run the "you have too many groups"
warning).

> > c8d521faf7 433) continue;
> > c8d521faf7 436) list[dst] = list[src];
> 
> These blocks are inside the following nested loop in deduplicate_islands():
> 
> +   for (ref = 0; ref + 1 < island_count; ref++) {
> +   for (src = ref + 1, dst = src; src < island_count; src++) {
> +   if (list[ref]->hash == list[src]->hash)
> +   continue;
> +
> +   if (src != dst)
> +   list[dst] = list[src];
> +
> +   dst++;
> +   }
> +   island_count = dst;
> +   }
> 
> This means that our "deduplication" logic is never actually doing anything
> meaningful.

Sorry, I don't even remember what this code is trying to do. The island
code is 5+ years old, and just recently ported to upstream Git by
Christian. And that's perhaps part of my laziness in the above tests; it
would be a significant effort to re-figure out all these corner cases.
It's a big part of why I hadn't been sending the patches upstream
myself.

-Peff


Re: [PATCH] test-lib-functions: make 'test_cmp_rev' more informative on failure

2018-11-20 Thread SZEDER Gábor
On Mon, Nov 19, 2018 at 02:49:20PM -0500, Jeff King wrote:
> On Mon, Nov 19, 2018 at 02:28:18PM +0100, SZEDER Gábor wrote:
> 
> > The 'test_cmp_rev' helper is merely a wrapper around 'test_cmp'
> > checking the output of two 'git rev-parse' commands, which means that
> > its output on failure is not particularly informative, as it's
> > basically two OIDs with a bit of extra clutter of the diff header, but
> > without any indication of which two revisions have caused the failure:
> > 
> >   --- expect.rev  2018-11-17 14:02:11.569747033 +
> >   +++ actual.rev  2018-11-17 14:02:11.569747033 +
> >   @@ -1 +1 @@
> >   -d79ce1670bdcb76e6d1da2ae095e890ccb326ae9
> >   +139b20d8e6c5b496de61f033f642d0e3dbff528d
> > 
> > It also pollutes the test repo with these two intermediate files,
> > though that doesn't seem to cause any complications in our current
> > tests (meaning that I couldn't find any tests that have to work around
> > the presence of these files by explicitly removing or ignoring them).
> > 
> > Enhance 'test_cmp_rev' to provide a more useful output on failure with
> > less clutter:
> > 
> >   error: two revisions point to different objects:
> > 'HEAD^': d79ce1670bdcb76e6d1da2ae095e890ccb326ae9
> > 'extra': 139b20d8e6c5b496de61f033f642d0e3dbff528d
> > 
> > Doing so is more convenient when storing the OIDs outputted by 'git
> > rev-parse' in a local variable each, which, as a bonus, won't pollute
> > the repository with intermediate files.
> > 
> > While at it, also ensure that 'test_cmp_rev' is invoked with the right
> > number of parameters, namely two.
> 
> This is an improvement, in my opinion (and I agree that using your new
> BUG for this last part would be better still). It also saves a process
> in the common case.

But then it adds two subshells to the common case right away...  I'm
not sure which is worse on Windows, where it matters the most.

> One question:
> 
> > +   else
> > +   local r1 r2
> > +   r1=$(git rev-parse --verify "$1") &&
> > +   r2=$(git rev-parse --verify "$2") &&
> > +   if test "$r1" != "$r2"
> > +   then
> > +   cat >&4 <<-EOF
> > +   error: two revisions point to different objects:
> > + '$1': $r1
> > + '$2': $r2
> > +   EOF
> > +   return 1
> > +   fi
> 
> Why does this cat go to descriptor 4? I get why you'd want it to (it's
> meant for the user's eyes, and that's where 4 goes),

Exactly.

> but we do not
> usually bother to do so for our helper functions (like test_cmp).

test_i18ngrep() does since your 03aa3783f2 (t: send verbose
test-helper output to fd 4, 2018-02-22).

> I don't think it matters usually in practice, because nobody tries to
> capture the stderr of test_cmp, etc.

See 54ce2e9be9 (t9400-git-cvsserver-server: don't rely on the output
of 'test_cmp', 2018-03-08) for a fun example, I remember it caused a
spike in WTF/minutes :)


> I don't think it would ever hurt,
> though. Should we be doing that for all the others, too?

I think we should, and you agreed:

https://public-inbox.org/git/20180303065648.ga17...@sigill.intra.peff.net/

but then went travelling, and then the whole thing got forgotten.



Re: [PATCH v2 2/2] rebase: validate -C and --whitespace= parameters early

2018-11-20 Thread Johannes Schindelin
Hi Ævar,

On Mon, 19 Nov 2018, Ævar Arnfjörð Bjarmason wrote:

> 
> On Wed, Nov 14 2018, Johannes Schindelin via GitGitGadget wrote:
> 
> > From: Johannes Schindelin 
> >
> > It is a good idea to error out early upon seeing, say, `-Cbad`, rather
> > than starting the rebase only to have the `--am` backend complain later.
> >
> > Let's do this.
> >
> > The only options accepting parameters which we pass through to `git am`
> > (which may, or may not, forward them to `git apply`) are `-C` and
> > `--whitespace`. The other options we pass through do not accept
> > parameters, so we do not have to validate them here.
> >
> > Signed-off-by: Johannes Schindelin 
> > ---
> >  builtin/rebase.c  | 12 +++-
> >  t/t3406-rebase-message.sh |  7 +++
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index 96ffa80b71..571cf899d5 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -1064,12 +1064,22 @@ int cmd_rebase(int argc, const char **argv, const 
> > char *prefix)
> > }
> >
> > for (i = 0; i < options.git_am_opts.argc; i++) {
> > -   const char *option = options.git_am_opts.argv[i];
> > +   const char *option = options.git_am_opts.argv[i], *p;
> > if (!strcmp(option, "--committer-date-is-author-date") ||
> > !strcmp(option, "--ignore-date") ||
> > !strcmp(option, "--whitespace=fix") ||
> > !strcmp(option, "--whitespace=strip"))
> > options.flags |= REBASE_FORCE;
> > +   else if (skip_prefix(option, "-C", )) {
> > +   while (*p)
> > +   if (!isdigit(*(p++)))
> > +   die(_("switch `C' expects a "
> > + "numerical value"));
> > +   } else if (skip_prefix(option, "--whitespace=", )) {
> > +   if (*p && strcmp(p, "warn") && strcmp(p, "nowarn") &&
> > +   strcmp(p, "error") && strcmp(p, "error-all"))
> > +   die("Invalid whitespace option: '%s'", p);
> > +   }
> > }
> >
> > if (!(options.flags & REBASE_NO_QUIET))
> > diff --git a/t/t3406-rebase-message.sh b/t/t3406-rebase-message.sh
> > index 0392e36d23..2c79eed4fe 100755
> > --- a/t/t3406-rebase-message.sh
> > +++ b/t/t3406-rebase-message.sh
> > @@ -84,4 +84,11 @@ test_expect_success 'rebase --onto outputs the invalid 
> > ref' '
> > test_i18ngrep "invalid-ref" err
> >  '
> >
> > +test_expect_success 'error out early upon -C or --whitespace=' '
> > +   test_must_fail git rebase -Cnot-a-number HEAD 2>err &&
> > +   test_i18ngrep "numerical value" err &&
> > +   test_must_fail git rebase --whitespace=bad HEAD 2>err &&
> > +   test_i18ngrep "Invalid whitespace option" err
> > +'
> > +
> 
> The combination of this gitster/js/rebase-am-options and my
> gitster/ab/rebase-in-c-escape-hatch breaks tests under
> GIT_TEST_REBASE_USE_BUILTIN=false for obvious reasons. The C version is
> now more strict.

Maybe you can concoct a prereq for this test? Something like

test_lazy_prereq BUILTIN_REBASE '
test true = "${GIT_TEST_REBASE_USE_BUILTIN:-true}"
'

Ciao,
Dscho

Re: [PATCH] tests: send "bug in the test script" errors to the script's stderr

2018-11-20 Thread Jeff King
On Tue, Nov 20, 2018 at 05:45:28AM -0500, Jeff King wrote:

> On Tue, Nov 20, 2018 at 12:34:04AM +0100, SZEDER Gábor wrote:
> 
> > > I do notice that many of the existing "FATAL:" errors use descriptor 5,
> > > which goes to stdout. I'm not sure if those should actually be going to
> > > stderr (or if there's some TAP significance to those lines).
> > 
> > I chose to send these messages to standard error, because they are,
> > well, errors.  TAP only cares about stdout, but by aborting the test
> > script in BUG(), error() or die() we are already violating TAP anyway,
> > so I don't think it matters whether we send "bug in test script" or
> > "FATAL" errors to stdout or stderr.
> 
> Yeah, I agree it probably doesn't matter. I was mostly suggesting to
> make the existing ">&5" into ">&7" for consistency. But I don't think
> that needs to block your patch.

By the way, I did check to see whether this might help the situation
where running under "prove" we see only "Dubious..." and not the actual
error() message produced by the test script. But no, it eats both stdout
and stderr. You can sneak a line through by prepending it with "#", but
only if "prove -o" is used (and even then, it's hard to associate it
with a particular test when you're running many in parallel).

So I guess the status quo is not too bad: prove says "dubious", and then
you re-run the test with "./t1234-foo.sh -v -i" and you get to see the
error.

-Peff


Re: [PATCH] tests: send "bug in the test script" errors to the script's stderr

2018-11-20 Thread Jeff King
On Tue, Nov 20, 2018 at 12:34:04AM +0100, SZEDER Gábor wrote:

> > I do notice that many of the existing "FATAL:" errors use descriptor 5,
> > which goes to stdout. I'm not sure if those should actually be going to
> > stderr (or if there's some TAP significance to those lines).
> 
> I chose to send these messages to standard error, because they are,
> well, errors.  TAP only cares about stdout, but by aborting the test
> script in BUG(), error() or die() we are already violating TAP anyway,
> so I don't think it matters whether we send "bug in test script" or
> "FATAL" errors to stdout or stderr.

Yeah, I agree it probably doesn't matter. I was mostly suggesting to
make the existing ">&5" into ">&7" for consistency. But I don't think
that needs to block your patch.

> BTW, TAP understands "Bail out!" as bail out from the _entire_ test
> suite, but that's not what we want here, I'd think.
> 
> https://testanything.org/tap-specification.html#bail-out

Yes, I added the only existing "Bail out!" to test-lib.sh. :)

I agree that's not what we want here. I actually think the current
behavior (to exit non-zero) does what we want. A TAP harness will
realize that's an error, but not block other scripts.

-Peff


Re: Cygwin Git with Windows paths

2018-11-20 Thread Torsten Bögershausen
On 20.11.18 01:17, Steven Penny wrote:
> On Sun, Nov 18, 2018 at 11:21 PM Torsten Bögershausen wrote:
>> If nothing works,
>> it may help to add some fprintf(stderr,...) in the functions used
>> by 05b458c104708141d2f:
>>
>> strip_last_component(),
>> get_next_component()
>> real_path_internal()
> 
> I didnt see any "real_path_internal" in the current codebase - however i added
> some "printf" to the other 2 and got this:
> 
> $ git clone git://github.com/benhoyt/goawk 'C:\cygwin64\tmp\goawk'
> get_next_component, next, []
> get_next_component, remaining, [C:\cygwin64\tmp\goawk]
> Cloning into 'C:\cygwin64\tmp\goawk'...
> get_next_component, next, []
> get_next_component, remaining, [C:\cygwin64\tmp\goawk/.git]
> fatal: Invalid path '/usr/local/cache/git/C:\cygwin64\tmp\goawk': No such file
> or directory
> 

Could you please post a "git diff" of your instrumented code,
so that I/we can follow the debugging, especially what the printouts mean?

I think we need to understand what is going on in abspath.c



[PATCH 3/3] pack-objects: fix off-by-one in delta-island tree-depth computation

2018-11-20 Thread Jeff King
When delta-islands are in use, we need to record the deepest path at
which we find each tree and blob. Our loop to do so counts slashes, so
"foo" is depth 0, "foo/bar" is depth 1, and so on.

However, this neglects root trees, which are represented by the empty
string. Those also have depth 0, but are at a layer above "foo". Thus,
"foo" should be 1, "foo/bar" at 2, and so on. We use this depth to
topo-sort the trees in resolve_tree_islands(). As a result, we may fail
to visit a root tree before the sub-trees it contains, and therefore not
correctly pass down the island marks.

That in turn could lead to missing some delta opportunities (objects are
in the same island, but we didn't realize it) or creating unwanted
cross-island deltas (one object is in an island another isn't, but we
don't realize). In practice, it seems to have only a small effect.  Some
experiments on the real-world git/git fork network at GitHub showed an
improvement of only 0.14% in the resulting clone size.

Signed-off-by: Jeff King 
---
 builtin/pack-objects.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e7ea206c08..411aefd687 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2786,9 +2786,11 @@ static void show_object(struct object *obj, const char 
*name, void *data)
 
if (use_delta_islands) {
const char *p;
-   unsigned depth = 0;
+   unsigned depth;
struct object_entry *ent;
 
+   /* the empty string is a root tree, which is depth 0 */
+   depth = *name ? 1 : 0;
for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
depth++;
 
-- 
2.20.0.rc0.715.gf6b01ab3e1


[PATCH 2/3] pack-objects: zero-initialize tree_depth/layer arrays

2018-11-20 Thread Jeff King
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) started maintaining a tree_depth array that
matches the "objects" array. We extend the array when:

  1. The objects array is extended, in which case we use realloc to
 extend the tree_depth array.

  2. A caller asks to store a tree_depth for object N, and this is the
 first such request; we create the array from scratch and store the
 value for N.

In the latter case, though, we use regular xmalloc(), and the depth
values for any objects besides N is undefined. This happens to not
trigger a bug with the current code, but the reasons are quite subtle:

 - we never ask about the depth for any object with index i < N. This is
   because we store the depth immediately for all trees and blobs. So
   any such "i" must be a non-tree, and therefore we will never need to
   care about its depth (in fact, we really only care about the depth of
   trees).

 - there are no objects at this point with index i > N, because we
   always fill in the depth for a tree immediately after its object
   entry is created (we may still allocate uninitialized depth entries,
   but they'll be initialized by packlist_alloc() when it initializes
   the entry in the "objects" array).

So it works, but only by chance. To be defensive, let's zero the array,
which matches the "unset" values which would be handed out by
oe_tree_depth() already.

Signed-off-by: Jeff King 
---
 git-compat-util.h | 1 +
 pack-objects.h| 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f16058182f..09b0102cae 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -861,6 +861,7 @@ extern FILE *fopen_or_warn(const char *path, const char 
*mode);
 #define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
 
 #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc)))
+#define CALLOC_ARRAY(x, alloc) (x) = xcalloc((alloc), sizeof(*(x)));
 #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), 
(alloc)))
 
 #define COPY_ARRAY(dst, src, n) copy_array((dst), (src), (n), sizeof(*(dst)) + 
\
diff --git a/pack-objects.h b/pack-objects.h
index f31ac1c81c..dc869f26c2 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data 
*pack,
 unsigned int tree_depth)
 {
if (!pack->tree_depth)
-   ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
+   CALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
pack->tree_depth[e - pack->objects] = tree_depth;
 }
 
@@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack,
unsigned char layer)
 {
if (!pack->layer)
-   ALLOC_ARRAY(pack->layer, pack->nr_alloc);
+   CALLOC_ARRAY(pack->layer, pack->nr_alloc);
pack->layer[e - pack->objects] = layer;
 }
 
-- 
2.20.0.rc0.715.gf6b01ab3e1



[PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Jeff King
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) dynamically manages a tree_depth array in
packing_data that maintains one of these invariants:

  1. tree_depth is NULL (i.e., the requested options don't require us to
 track tree depths)

  2. tree_depth is non-NULL and has as many entries as the "objects"
 array

We maintain (2) by:

  a. When the objects array grows, grow tree_depth to the same size
 (unless it's NULL, in which case we can leave it).

  b. When a caller asks to set a depth via oe_set_tree_depth(), if
 tree_depth is NULL we allocate it.

But in (b), we use the number of stored objects, _not_ the allocated
size of the objects array. So we can run into a situation like this:

  1. packlist_alloc() needs to store the Nth object, so it grows the
 objects array to M, where M > N.

  2. oe_set_tree_depth() wants to store a depth, so it allocates an
 array of length N. Now we've violated our invariant.

  3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
 grow the objects array, since N <= M still holds. We try to assign
 to tree_depth[N+1], which is out of bounds.

That doesn't happen in our test scripts, because the repositories they
use are so small, but it's easy to trigger by running:

  echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null

in any reasonably-sized repo (like git.git).

We can fix it by always growing the array to match pack->nr_alloc, not
pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f
(pack-objects: move 'layer' into 'struct packing_data', 2018-08-16),
which has the same bug.

Signed-off-by: Jeff King 
---
 pack-objects.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pack-objects.h b/pack-objects.h
index feb6a6a05e..f31ac1c81c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data 
*pack,
 unsigned int tree_depth)
 {
if (!pack->tree_depth)
-   ALLOC_ARRAY(pack->tree_depth, pack->nr_objects);
+   ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
pack->tree_depth[e - pack->objects] = tree_depth;
 }
 
@@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack,
unsigned char layer)
 {
if (!pack->layer)
-   ALLOC_ARRAY(pack->layer, pack->nr_objects);
+   ALLOC_ARRAY(pack->layer, pack->nr_alloc);
pack->layer[e - pack->objects] = layer;
 }
 
-- 
2.20.0.rc0.715.gf6b01ab3e1



[PATCH 0/3] delta-island fixes

2018-11-20 Thread Jeff King
This fixes a few bugs in the cc/delta-islands topic: one major, and two
minor.

Sadly, the major one was not caught by our test suite, and I'm not sure
how to remedy that. The problem is a memory management one, and happens
only when you have a reasonably large number of objects. So it triggers
readily when run on a real repository, but not on the toy one in t5320.
Creating a much larger repository there would make the test suite more
expensive.

In cases like this I think it's often a good idea to have a perf test.
Those are expensive anyway, and we'd have the double benefit of
exercising the code and showing off the performance improvement. But the
delta-island code only makes sense on a very specialized repo: one where
you have multiple related but diverging histories.  You could simulate
that by picking two branches in a repository, but the effect is going to
be miniscule.

Anyway, here are the fixes without tests. We should at least apply these
before v2.20 ships with the bugs.

  [1/3]: pack-objects: fix tree_depth and layer invariants
  [2/3]: pack-objects: zero-initialize tree_depth/layer arrays
  [3/3]: pack-objects: fix off-by-one in delta-island tree-depth computation

 builtin/pack-objects.c | 4 +++-
 git-compat-util.h  | 1 +
 pack-objects.h | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)



[PATCH 1/1] rebase: warn about the correct tree's OID

2018-11-20 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

This was a simple copy/paste error, and an obvious one at that: if we
cannot fill the tree descriptor, we should show an error message about
*that* tree, not another one.

Signed-off-by: Johannes Schindelin 
---
 builtin/rebase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 1a2758756a..5b3e5baec8 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -582,7 +582,8 @@ static int reset_head(struct object_id *oid, const char 
*action,
}
 
if (!reset_hard && !fill_tree_descriptor([nr++], _oid)) {
-   ret = error(_("failed to find tree of %s"), oid_to_hex(oid));
+   ret = error(_("failed to find tree of %s"),
+   oid_to_hex(_oid));
goto leave_reset_head;
}
 
-- 
gitgitgadget


[PATCH 0/1] rebase: warn about the correct tree's OID

2018-11-20 Thread Johannes Schindelin via GitGitGadget
A quick fix for a recent topic. Not overly critical, but I would deem this
v2.20.0-rc1 material.

Johannes Schindelin (1):
  rebase: warn about the correct tree's OID

 builtin/rebase.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


base-commit: bb75be6cb916297f271c846f2f9caa3daaaec718
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-85%2Fdscho%2Freset_head-typo-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-85/dscho/reset_head-typo-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/85
-- 
gitgitgadget


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 20 2018, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Nov 15 2018, sxe...@google.com wrote:
>
>>> +Parent-type
>>> +---
>>> +The “parent-type” field in the commit header identifies a commit as a
>>> +meta-commit and indicates the meaning for each of its parents. It is never
>>> +present for normal commits.
> [...]
>> I think it's worth pointing out for those that are rusty on commit
>> object details (but I checked) is that the reason for it not being:
>>
>> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
>> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
>> parent-type content
>> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
>> parent-type obsolete
>> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
>> parent-type origin
>> author Stefan Xenos  1540841596 -0700
>> committer Stefan Xenos  1540841596 -0700
>>
>> Which would be easier to read, is that we're very sensitive to the order
>> of the first few fields (tree -> parent -> author -> committer) and fsck
>> will error out if we interjected a new field.
>
> By the way, in the spirit of limiting the initial scope, I wonder
> whether the parent-type fields can be stored in the commit message
> initially.
>
> Elsewhere in this thread it was mentioned that the parent-type is a
> field to allow tools like "git fsck" to understand the meaning of
> these parent relationships (for example, to forbid a commit
> referencing a meta-commit).  The same could be done using special
> commit message text, though.
>
> The advantage of such an approach would be that we could experiment
> without changing the official object format at all.  If experiments
> revealed a different set of information to store, we could update the
> format without having to maintain the memory of the older format in
> "git fsck"'s understanding of commit object fields.  So even though I
> think that in the end we would want to put this information in the
> commit object header, I'm tempted to suspect that the benefits of
> putting it in the commit message to start outweigh the costs (in
> particular, of having to migrate to another format later).

I think it sounds better to just make it, in the header:

x-evolve-pt content
x-evolve-pt obsolete
x-evolve-pt origin

Where "pt = parent-type", we could of course spell that out too, but in
this case it's "x-evolve-pt" is the exact same number of bytes as
"parent-type", so nobody can object that it takes more space:)

We'd then carry some documentation where we say everything except "x-*-"
is reserved, and that we'd like to know about new "*" there before it's
used, so it can be documented.

Putting it in the commit message just sounds like a hack around not
having namespaced headers. If we'd like to keep this then tools would
need to parse both (potentially unpacking a lot of the commit message
object, it can be quite big in some cases...).


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Ævar Arnfjörð Bjarmason


On Tue, Nov 20 2018, Jonathan Nieder wrote:

Just commenting here on the end-state of this since it's easier than
each patch at a time:

First, do we still need to be doing %.4s instead of just %s? It would be
easier for translators / to understand what's going on if it were just
%s. I.e. "this is the extension name" v.s. "this is the first 4 bytes of
whatever it is...".

>   return error("index uses %.4s extension, which we do 
> not understand",
>ext);

Missing _(). Not the fault of this series, but something to fix while
we're at it.

Also not the fault of this series, the "is this upper case" test is
unportable, but this is probably the tip of the iceberg for git not
working on EBCDIC systems.

This message should say something like "Index uses the mandatory %s
extension" to clarify and distinguish it from the below. We don't
understand the upper-case one either, but the important distinction is
that one is mandatory, and the other can be dropped. The two messages
should make this clear.

Also, having advice() for that case is even more valuable since we have
a hard error at this point. So something like:

"This is likely due to the index having been written by a future
version of Git. All-lowercase index extensions are mandatory, as
opposed to optional all-uppercase ones which we'll drop with a
warning if we see them".

>   trace_printf("ignoring %.4s extension\n", ext);
> + if (advice_unknown_index_extension) {
> + warning(_("ignoring optional %.4s index extension"), 
> ext);

Should start with upper-case. Good that it says "optional".

> + advise(_("This is likely due to the file having been 
> written by a newer\n"
> +  "version of Git than is reading it. You can 
> upgrade Git to\n"
> +  "take advantage of performance improvements 
> from the updated\n"
> +  "file format.\n"

Let's not promise performance improvements with this extension in a
future version. We have no idea what the extension is, yeah right now
it's going to be true for the extension that prompted this patch series,
but may not be in the future. So just something like this for the last
sentence:

You can try upgrading Git to use this new index format.

> +  "\n"
> +  "Run \"%s\"\n"
> +  "to suppress this message."),
> +"git config advice.unknownIndexExtension false");

Somewhat of an aside, but if I grep:

git grep -C10 'git config advice\..*false' -- '*.[ch]'

There's a few existing examples of this, but the majority of advice()
messages don't say in the message how you can turn these off. Do we
think this a message users would especially like to turn off? I have the
opposite impression, it's a one-off in most cases, although not in the
case where an editor has an embedded git.

I think it would make sense to add this sort of thing to the advice()
API, i.e.:

advice_with_config_hint(_(""), "unknownIndexExtension");

Which would then know how to consistently print this advice about how to
turn off the warning.


Re: [RFC PATCH 1/5] ref-filter: add objectsize:disk option

2018-11-20 Thread Оля Тележная
вт, 13 нояб. 2018 г. в 04:52, Junio C Hamano :
>
> Jeff King  writes:
>
> >> You mean something like
> >>
> >>  v->s = xstrfmt("%"PRIdMAX, (intmax_t)oi->disk_size);
> >
> > I think elsewhere we simply use PRIuMAX for printing large sizes via
> > off_t; we know this value isn't going to be negative.
> >
> > I'm not opposed to PRIdMAX, which _is_ more accurate, but...
> >
> >> P.S.: I wondered whether we have precedent for PRIdMAX, as we used to use
> >> only PRIuMAX, but yes: JeffH's json-writer uses PRIdMAX.
> >
> > That's pretty recent. I won't be surprised if we have to do some
> > preprocessor trickery to handle that at some point. We have a PRIuMAX
> > fallback already. That comes from c4001d92be (Use off_t when we really
> > mean a file offset., 2007-03-06), but it's not clear to me if that was
> > motivated by a real platform or an over-abundance of caution.
> >
> > I'm OK with just using PRIdMAX as appropriate for now. It will serve as
> > a weather-balloon, and we can #define our way out of it later if need
> > be.
>
> I am OK if we avoid PRIdMAX and use PRIuMAX instead with a cast to
> the corresponding size in this codepath, as long as we properly
> handle negative oi.disk_size field, which may be telling some
> "unusual" condition to us.

Maybe we want to change the type (from off_t to unsigned) directly in
struct object_info? That will help us not to make additional
checkings. Or, at least, I suggest to add check to
oid_object_info_extended() so that this function will give a guarantee
that the size is non-negative. That will make code cleaner (otherwise
we need to add checks everywhere after oid_object_info_extended()
usage).

Please, look at this one also [1]. Thanks a lot!

[1] 
https://public-inbox.org/git/CAL21BmnoZuRih3Ky66_Tk0PweD36eZ6=fbY3jGumRcSJ=bc...@mail.gmail.com/

>
>


  1   2   >