Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Jacob Keller
On Wed, Nov 8, 2017 at 4:10 PM, Stefan Beller  wrote:
>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>

> New type of symbolic refs
> =
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
>
> HEAD pointing at the superprojects index
> 
> Introduce a new symbolic ref that points at the superprojects
> index of the gitlink. The format is
>
>   "repo:"  '\0'  '\0'
>
> Just like existing symrefs, the content of the ref will be read and followed.
> On reading "repo:", the sha1 will be obtained equivalent to:
>
> git -C  ls-files -s  | awk '{ print $2}'
>
> Ref write operations driven by the submodule, affecting symrefs
>   e.g. git checkout  (in the submodule)
>
> In this scenario only the HEAD is optionally attached to the superproject,
> so we can rewrite the HEAD to be anything else, such as a branch just fine.
> Once the HEAD is not pointing at the superproject any more, we'll leave the
> submodule alone in operations driven by the superproject.
> To get back on the superproject branch, we’d need to invent new UX, such as
>git checkout --attach-superproject
> as that is similar to --detach
>

Some of the idea trimmed for brevity, but I like this aspect the most.
Currently, I work on several projects which have multiple
repositories, which are essentially submodules.

However, historically, we kept them separate. 99% of the time, you can
use all 3 projects on "master" and everything works. But if you go
back in time, there's no correlation to "what did the parent project
want this "COMMON" folder to be at?

I started promoting using submodules for this, since it seemed quite natural.

The core problem, is that several developers never quite understood or
grasped how submodules worked. There's problems like "but what if I
wanna work on master?" or people assume submodules need to be checked
out at master instead of in a detached HEAD state.

So we often get people who don't run git submodule update and thus are
confused about why their submodules are often out of date. (This can
be solved by recursive options to commands to more often recurse into
submodules and checkout and update them).

We also often get people who accidentally commit the old version of
the repository, or commit an update to the parent project pointing the
submodule at some commit which isn't yet in the upstream of the common
repository.

The proposal here seems to match the intuition about how submodules
should work, with the ability to "attach" or "detach" the submodule
when working on the submodule directly.

Ideally, I'd like for more ways to say "ignore what my submodule is
checked out at, since I will have something else checked out, and
don't intend to commit just yet."

Basically, a workflow where it's easier to have each submodule checked
out at master, and we can still keep track of historical relationship
of what commit was the submodule at some time ago, but without causing
some of these headaches.

I've often tried to use the "--skip-worktree" bit to have people set
their repository to ignore the submodule. Unfortunately, this is
pretty complex, and most of the time, developers never remember to do
this again on a fresh clone.

Thanks,
Jake


Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Junio C Hamano
Jonathan Tan  writes:

> What if, in the submodule, we have a new ref backend that mirrors the
> superproject? When initializing the submodule, its original refs are not
> cloned at all, but instead virtual refs are used.
> ...
> These rules seem straightforward to me (although I have been working
> with Git for a while, so perhaps I'm not the best judge), and I think
> leads to a good workflow, as discussed below.

Indeed this is intriguing.

> The above rules allow the following workflow:
>  - "checkout --recurse-submodules" the branch you want on the
>superproject
>  - make whatever changes you want in each submodule
>  - commit each individual submodule (which updates the index of the
>superproject), then commit the superproject (we can introduce a
>commit --recurse-submodules to make this more convenient)

The "recurse" option would also give users an extra atomicity, and
would not be merely for convenience; when a user wants to treat a
superproject and its two submodules as if the combined whole were a
single repository, there shouldn't be two separate commits in the
history of the superproject only because two submodules made one
commit each to work on a single theme that spans all of them.



Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Junio C Hamano
Stefan Beller  writes:

>> The relationship is indeed currently useful, but if the long term plan
>> is to strongly discourage detached submodule HEAD, then I would think
>> that these patches are in the wrong direction. (If the long term plan is
>> to end up supporting both detached and linked submodule HEAD, then these
>> patches are fine, of course.) So I think that the plan referenced in
>> Junio's email (that you linked above) still needs to be discussed.
>
> This email presents different approaches.
>
> Objective
> =
> This document should summarize the current situation of Git submodules
> and start a discussion of where it can be headed long term.
> Show different ways in which submodule refs could evolve.
>
> Background
> ==
> Submodules in Git are considered as an independet repository currently.
> This is okay for current workflows, such as utilizing a library that is
> rarely updated. Other workflows that require a tighter integration between
> submodule and superproject are possible, but cumbersome as there is an
> additional step that has to be performed, which is the update of the gitlink
> pointer in the superproject.

I do not think "rarely updaed" is an issue.

The problem is that we may want to make it easier to use a
superproject and its submodules as if the combined whole were a
single project, which currently is not easy, primarily because
submodules are separate entities with different set of branches that
can be checked out independently from what branch the superproject
is working on.

> Workflows
> =
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  
> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?
> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?
> * Working on a bug fix (Changing the feature that you currently work on, 
> branches)
>   -> How does switching branches in the superproject affect submodules

These are good starting points for copying such a combined whole to
your local machine and start working on it.  The more interesting,
important, and potentially difficult part is how the result of such
work is shared back to where you started from.  "push --recursive"
may be a simple phrase, but a sensible definition of how it should
work won't be that simple.

> Possible data models and workflow implications
> ==
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.
>
> Keep everything as is, superproject and submodule have their own refs
> -
> ...
> Cons:
>  * Current tools that manage multiple repositories (e.g. repo, git-slave)
>have "branches in parallel", i.e. each repo has a branch of the same
>name, instead of using a superproject to manage the state of all repos
>involved. So users of such tools may be confused by submodules.
>  * when using a detached HEAD in the submodule, we may run into git-gc issues.

We should make detached HEAD safe against gc if it is not,
regardless of the use of submodules.  I thought it already was made
safe long time ago.

> Use replicate refs in submodules
> 
> This approach will replicate the superproject refs into the submodule
> ref namespace, e.g. git-branch learns about --recurse-submodules, which
> creates a branch of a given name in all submodules. These (topic) branches
> should be kept in sync with the superproject
>
> Pros:
>  * This seemed intuitive to Gerrit users
>  * 'quick' to implement, most of the commands are already there,
>just git-branch is needed to have the workflows mentioned above complete.
> Cons:
>  * What does "git checkout -b A B" mean? (special case: B == HEAD)

The command ran at which level?  In the superproject, or in a single
submodule?

>Is the branch name replicated as a string into the submodule operation,
>or do we dereference the superprojects gitlink and walk from there?

If they are "kept in sync with the superproject", then there should
be no difference between the two, so I do not see any room for
wondering about that.  In other words, if there is need to worry
about the differences between the above two, then it probably is
fundamentally impossible to keep these in sync, and a design that
assumes it is possible would have to expose glitches to the end-user
experience.

I do not know if glitches resulting from there would be so severe to
be show-stoppers, though.  

Re: [PATCH/RFC] Replace Free Software Foundation address in license notices

2017-11-08 Thread Junio C Hamano
Todd Zullinger  writes:

> The mailing address for the FSF has changed over the years.  Rather than
> updating the address across all files, refer readers to gnu.org, as the
> GNU GPL documentation now suggests for license notices.  The mailing
> address is retained in the full license files (COPYING and LGPL-2.1).
>
> The old address is still present in t/diff-lib/COPYING.  This is
> intentional, as the file is used in tests and the contents are not
> expected to change.
>
> Signed-off-by: Todd Zullinger 
> ---

This change probably makes sense.  From here on after applying the
patch, we won't have to worry about updating these every time they
move---not that they have moved often, though ;-)

>  compat/obstack.c| 5 ++---
>  ...
>  ewah/ewok_rlw.h | 3 +--
>  git-gui/git-gui.sh  | 3 +--
>  imap-send.c | 3 +--
>  ...
>  44 files changed, 69 insertions(+), 103 deletions(-)

I've tried hard to keep the git-gui/ part as a separate project (it
indeed is managed separately).  I have been, and am still only
pulling from its "upstream" repository (Pat, who is its project
lead, Cc'ed), refaining from making changes that do not exist there
at git://repo.or.cz/git-gui.git/ to the tree I publish.

I'll separate the part from this patch that touches git-gui/* and
try to arrange the next pull from git-gui repository would have the
omitted part somehow.  Given that the "upstream" seems to be inactive
these days, we might want to change the way we manage that part of
the tree, though.

Thanks.


Re: [PATCH] rebase -i: fix comment typo

2017-11-08 Thread Junio C Hamano
Adam Dinwoodie  writes:

> Signed-off-by: Adam Dinwoodie 
> ---
>  git-rebase--interactive.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 2563dc52d..437815669 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -722,7 +722,7 @@ collapse_todo_ids() {
>   git rebase--helper --shorten-ids
>  }
>  
> -# Add commands after a pick or after a squash/fixup serie
> +# Add commands after a pick or after a squash/fixup series
>  # in the todo list.
>  add_exec_commands () {
>   {

Thanks.  I recall deliberately letting this slip through knowing
that it came from France ;-)

Will apply.


Re: Bug - Status - Space in Filename

2017-11-08 Thread Junio C Hamano
Joseph Strauss  writes:

> I believe I have found a bug in the way git status -s lists filenames.
>
> According to the documentation:
>   The fields (including the ->) are separated from each other by a single 
> space. If a filename contains whitespace or other nonprintable characters,   
> that field will be quoted in the manner of a C string literal: surrounded by 
> ASCII double quote (34) characters, and with interior special characters 
> backslash-escaped.
>
> While this is true in most situations, it does not seem to apply to merge 
> conflicts. When a file has merge conflicts I am getting the following:
>  $ git status -s
>  UU some/path/with space/in/the/name
>  M  "another/path/with space/in/the/name "
>
> I found the same problem for the following versions:
> . git version 2.15.0.windows.1
> . git version 2.10.0

Thanks.

wt_shortstatus_status() has this code:

if (s->null_termination) {
fprintf(stdout, "%s%c", it->string, 0);
if (d->head_path)
fprintf(stdout, "%s%c", d->head_path, 0);
} else {
struct strbuf onebuf = STRBUF_INIT;
const char *one;
if (d->head_path) {
one = quote_path(d->head_path, s->prefix, );
if (*one != '"' && strchr(one, ' ') != NULL) {
putchar('"');
strbuf_addch(, '"');
one = onebuf.buf;
}
printf("%s -> ", one);
strbuf_release();
}
one = quote_path(it->string, s->prefix, );
if (*one != '"' && strchr(one, ' ') != NULL) {
putchar('"');
strbuf_addch(, '"');
one = onebuf.buf;
}
printf("%s\n", one);
strbuf_release();
}

But the corresponding part in wt_shortstatus_unmerged() reads like
this:

if (s->null_termination) {
fprintf(stdout, " %s%c", it->string, 0);
} else {
struct strbuf onebuf = STRBUF_INIT;
const char *one;
one = quote_path(it->string, s->prefix, );
printf(" %s\n", one);
strbuf_release();
}

The difference in d->head_path part is dealing about renames, which
is irrelevant for showing unmerged paths, but the key difference is
that the _unmerged() forgets to add the enclosing "" around the
result of quote_path().

It seems that wt_shortstatus_other() shares the same issue.  Perhaps
this will "fix" it?

Having said all that, the whole "quote path using c-style" was
designed very deliberately to treat SP (but not other kind of
whitespaces like HT) as something we do *not* have to quote (and
more importantly, do not *want* to quote, as pathnames with SP in it
are quite common for those who are used to "My Documents/" etc.), so
it is arguably that what is broken and incorrect is the extra
quoting with dq done in wt_shortstatus_status().  It of course is
too late to change the rules this late in the game, though.

Perhaps a better approach is to refactor the extra quoting I moved
to this emit_with_optional_dq() helper into quote_path() which
currently is just aliased to quote_path_relative().  It also is
possible that we may want to extend the "no_dq" parameter that is
given to quote_c_style() helper from a boolean to a set of flag
bits, and allow callers to request "I want SP added to the set of
bytes that triggers the quoting".


 wt-status.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index bedef256ce..472d53d596 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1671,6 +1671,20 @@ static void wt_longstatus_print(struct wt_status *s)
wt_longstatus_print_stash_summary(s);
 }
 
+static const char *emit_with_optional_dq(const char *in, const char *prefix, 
+ struct strbuf *out)
+{
+   const char *one;
+
+   one = quote_path(in, prefix, out);
+   if (*one != '"' && strchr(one, ' ') != NULL) {
+   putchar('"');
+   strbuf_addch(out, '"');
+   one = out->buf;
+   }
+   return one;
+}
+
 static void wt_shortstatus_unmerged(struct string_list_item *it,
   struct wt_status *s)
 {
@@ -1692,8 +1706,9 @@ static void wt_shortstatus_unmerged(struct 
string_list_item *it,
} else {
struct strbuf onebuf = STRBUF_INIT;
const char *one;
-   one = quote_path(it->string, s->prefix, );
-   printf(" %s\n", one);
+   putchar(' ');
+   one = emit_with_optional_dq(it->string, s->prefix, );
+   printf("%s\n", one);

Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-11-08 Thread Junio C Hamano
Michael J Gruber  writes:

> It seems the consensus was that current functionality is as designed but
> not necessarily as expected, and another mode "--fork-base" (that does
> what I suggested as "fix") would meet these expectations. I would reuse
> the documentation of the current mode as a description of the new mode
> and add documentation for the existing mode ;)

If we are going to have two modes, it would be absolutely necessary
to explain in the documentation what they compute exactly in terms
that the end users understand, so that the users can choose which
one to use.

The current documentation is not that great because it can get away
with "we automatically by magic find the commit on top of which you
forked your branch", but when we have two different kinds of magic,
that would no longer be a useful description to help users to
choose.  We probably need to enhance the description in "Discussion
on fork-point mode" as a preliminary clean-up.  The attached is my
attempt.

I'd expect `--fork-base` to be explained in a similar way and to the
level of detail to help users pick which one of the three options
(i.e. without any option, with fork-point and with the new one) is
appropriate for their situation.  I've been thinking about what your
patch does, and I cannot come up with a reasonable way to explain
what it computes in the terms the users can get their heads around,
with history illustrations.

--- >8 ---
Subject: merge-base --fork-point doc: clarify the example and failure modes

The illustrated history used to explain the `--fork-point` mode
named three keypoint commits B3, B2 and B1 from the oldest to the
newest, which was hard to read.  Relabel them to B0, B1, B2.  Also
illustrate the history after the rebase using the `--fork-point`
facility was made.

The text already mentions use of reflog, but the description is not
clear what benefit we are trying to gain by using reflog.  Clarify
that it is to find the commits that were known to be at the tip of
the remote-tracking branch.  This in turn necessitates users to know
the ramifications of the underlying assumptions, namely, expiry of
reflog entries will make it impossible to determine which commits
were at the tip of the remote-tracking branches and we fail when in
doubt (instead of giving a random and incorrect result without even
warning).  Another limitation is that it won't be useful if you did
not fork from the tip of a remote-tracking branch but from in the
middle.

Describe them.

Signed-off-by: Junio C Hamano 
---
 Documentation/git-merge-base.txt | 64 +++-
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-merge-base.txt b/Documentation/git-merge-base.txt
index b968b64c38..a4859c8597 100644
--- a/Documentation/git-merge-base.txt
+++ b/Documentation/git-merge-base.txt
@@ -154,23 +154,71 @@ topic origin/master`, the history of remote-tracking 
branch
 `origin/master` may have been rewound and rebuilt, leading to a
 history of this shape:
 
-o---B1
+o---B2
/
-   ---o---o---B2--o---o---o---B (origin/master)
+   ---o---o---B1--o---o---o---B (origin/master)
\
-B3
+B0
  \
-  Derived (topic)
+  D0---D1---D (topic)
 
-where `origin/master` used to point at commits B3, B2, B1 and now it
+where `origin/master` used to point at commits B0, B1, B2 and now it
 points at B, and your `topic` branch was started on top of it back
-when `origin/master` was at B3. This mode uses the reflog of
-`origin/master` to find B3 as the fork point, so that the `topic`
-can be rebased on top of the updated `origin/master` by:
+when `origin/master` was at B0, and you built three commits, D0, D1,
+and D, on top of it.  Imagine that you now want to rebase the work
+you did on the topic on top of the updated origin/master.
+
+In such a case, `git merge-base origin/master topic` would return the
+parent of B0 in the above picture, but B0^..D is *not* the range of
+commits you would want to replay on top of B (it includes B0, which
+is not what you wrote; it is a commit the other side discarded when
+it moved its tip from B0 to B1).  
+
+`git merge-base --fork-point origin/master topic` is designed to
+help in such a case.  It takes not only B but also B0, B1, and B2
+(i.e. old tips of the remote-tracking branches your repository's
+reflog knows about) into account to see on which commit your topic
+branch was built and finds B0, allowing you to replay only the
+commits on your topic, excluding the commits the other side later
+discarded.
+
+Hence
 
 $ fork_point=$(git merge-base --fork-point origin/master topic)
+
+will find B0, and
+
 $ git rebase --onto origin/master $fork_point topic
 
+will replay D0, D1 and D on top of B to create a new history of this
+shape:
+
+

Re: [RFD] Long term plan with submodule refs?

2017-11-08 Thread Jonathan Tan
On Wed,  8 Nov 2017 16:10:07 -0800
Stefan Beller  wrote:

I thought of a possible alternative and how it would work.

> Possible data models and workflow implications
> ==
> In the following different data models are presented, which aid a submodule
> heavy workflow each giving pros and cons.

What if, in the submodule, we have a new ref backend that mirrors the
superproject? When initializing the submodule, its original refs are not
cloned at all, but instead virtual refs are used.

Creation of brand-new refs is forbidden in the submodule.

When reading a ref in the submodule, if that ref is the current branch
in the superproject, read the corresponding gitlink entry in the index
(which may be dirty); otherwise read the gitlink in the tree of the tip
commit.

When updating a ref in the submodule, if that ref is the current branch
in the superproject, update the index; otherwise, create a commit on top
of the tip and update the ref to point to the new tip.

No synchronicity is enforced between superproject and submodule in terms
of HEAD, though: If a submodule is currently checked out to a branch,
and the gitlink for that branch is updated through whatever means, that
is equivalent to a "git reset --soft" in the submodule.

These rules seem straightforward to me (although I have been working
with Git for a while, so perhaps I'm not the best judge), and I think
leads to a good workflow, as discussed below.

> Workflows
> =
> * Obtaining a copy of the Superproject tightly coupled with submodules
>   solved via git clone --recurse-submodules=
> * Changing the submodule selection
>   solved via submodule.active flags
> * Changing the remote / Interacting with a different remote for all submodules
>   -> need to be solved, not core issue of this discussion
> * Syncing to the latest upstream
>   solved via git pull --recurse  

(skipping the above, since they are either solved or not a core issue)

> * Working on a local feature in one submodule
>   -> How do refs work spanning superproject/submodule?

This is perhaps one weak point of my proposal - you can't work on a
submodule as if it were independent. You can checkout a branch and make
commits, but (i) they will automatically affect the superproject, and
(ii) the "origin/foo" etc. branches are those of the superproject. (But
if you checkout a detached HEAD, everything should still work.)

> * Working on a feature spanning multiple submodules
>   -> How do refs work spanning multiple repos?

The above rules allow the following workflow:
 - "checkout --recurse-submodules" the branch you want on the
   superproject
 - make whatever changes you want in each submodule
 - commit each individual submodule (which updates the index of the
   superproject), then commit the superproject (we can introduce a
   commit --recurse-submodules to make this more convenient)
 - a "push --recurse-submodules" can be implemented to push the
   superproject and its submodules independently (and the same refspec
   can be legitimately used both when pushing the superproject and when
   pushing a submodule, since the ref names are the same, and not by
   coincidence)

If the user insists on making changes on a non-current branch (i.e. by
creating commits in submodules then using "git update-ref" or
equivalent), possibly multiple commits would be created in the
superproject, but the user can still squash them later if desired.

> * Working on a bug fix (Changing the feature that you currently work on, 
> branches)
>   -> How does switching branches in the superproject affect submodules

You will have to stash or commit your changes. (Which reminds me...GC in
the subproject will need to consult the revlog of the superproject too.)

> New type of symbolic refs
> =
> A symbolic ref can currently only point at a ref or another symbolic ref.
> This proposal showcases different scenarios on how this could change in the
> future.
> 
> HEAD pointing at the superprojects index
> 

Assuming we don't need synchronicity, the existing HEAD format can be
retained. To clarify what happens during ref writes, I'll reuse the
scenarios Stefan described:

> Ref write operations driven by the submodule, affecting target ref
>   e.g. git commit, reset --hard, update-ref (in the submodule)
> 
> The HEAD stays the same, pointing at the superproject.
> The gitlink is changed to the target sha1, using
> 
>   git -C  update-index --add \
>   --cacheinfo 16,$SHA1,
> 
> This will affect the superprojects index, such that then a commit in
> the superproject is needed.

In this proposal, the HEAD also stays the same (pointing at the branch).

Either the index is updated or a commit is needed. If a commit is
needed, it is automatically performed.

> Ref write operations driven by the superproject, changing the gitlink
>   e.g. git checkout , git reset --hard (in the 

[RFD] Long term plan with submodule refs?

2017-11-08 Thread Stefan Beller
> The relationship is indeed currently useful, but if the long term plan
> is to strongly discourage detached submodule HEAD, then I would think
> that these patches are in the wrong direction. (If the long term plan is
> to end up supporting both detached and linked submodule HEAD, then these
> patches are fine, of course.) So I think that the plan referenced in
> Junio's email (that you linked above) still needs to be discussed.

This email presents different approaches.

Objective
=
This document should summarize the current situation of Git submodules
and start a discussion of where it can be headed long term.
Show different ways in which submodule refs could evolve.

Background
==
Submodules in Git are considered as an independet repository currently.
This is okay for current workflows, such as utilizing a library that is
rarely updated. Other workflows that require a tighter integration between
submodule and superproject are possible, but cumbersome as there is an
additional step that has to be performed, which is the update of the gitlink
pointer in the superproject.

Other discussions of the past:
"Re-attach HEAD?"
  https://public-inbox.org/git/20170501180058.8063-1-sbel...@google.com/
"Semantics of checkout --recursive for submodules on a branch"
  https://public-inbox.org/git/20170630003851.17288-1-sbel...@google.com/
"A new type of symref?"
  https://public-inbox.org/git/xmqqvamqg2fy@gitster.mtv.corp.google.com/

Workflows
=
* Obtaining a copy of the Superproject tightly coupled with submodules
  solved via git clone --recurse-submodules=
* Changing the submodule selection
  solved via submodule.active flags
* Changing the remote / Interacting with a different remote for all submodules
  -> need to be solved, not core issue of this discussion
* Syncing to the latest upstream
  solved via git pull --recurse  
* Working on a local feature in one submodule
  -> How do refs work spanning superproject/submodule?
* Working on a feature spanning multiple submodules
  -> How do refs work spanning multiple repos?
* Working on a bug fix (Changing the feature that you currently work on, 
branches)
  -> How does switching branches in the superproject affect submodules

This discussion should resolve around refs are handled in submodules in
relation to a superproject.

Possible data models and workflow implications
==
In the following different data models are presented, which aid a submodule
heavy workflow each giving pros and cons.

Keep everything as is, superproject and submodule have their own refs
-
In this alternative we'd just make existing commands nicer, e.g.
git-status, git-log would give information about the superprojects
gitlink similar as they give information about a remote branch.

We might want to introduce an option that triggers adding the submodule
to the superproject once a commit is done in the submodule.

Pros:
 * easiest to implement
 * easy to understand when having a git background already
 
Cons:
 * Current tools that manage multiple repositories (e.g. repo, git-slave)
   have "branches in parallel", i.e. each repo has a branch of the same
   name, instead of using a superproject to manage the state of all repos
   involved. So users of such tools may be confused by submodules.
 * when using a detached HEAD in the submodule, we may run into git-gc issues.
 

Use replicate refs in submodules

This approach will replicate the superproject refs into the submodule
ref namespace, e.g. git-branch learns about --recurse-submodules, which
creates a branch of a given name in all submodules. These (topic) branches
should be kept in sync with the superproject

Pros:
 * This seemed intuitive to Gerrit users
 * 'quick' to implement, most of the commands are already there,
   just git-branch is needed to have the workflows mentioned above complete.
Cons:
 * What does "git checkout -b A B" mean? (special case: B == HEAD)
   Is the branch name replicated as a string into the submodule operation,
   or do we dereference the superprojects gitlink and walk from there?
   When taking the superprojects gitlink, then why do we have the branches
   in the submodule in the first place? When taking the string as-is,
   then it might confuse users.
 * non-atomic of refs between superproject and submodule by design;
   This relies on superproject and submodule to stay in sync via hope.

No submodule refstore at all

Use refs and commits in the superproject to stitch submodule changes
together. Disallow branches in the submodule. This is only restricted
to the working tree inside the superproject, such that the output of git-branch
changes depending whether the working tree is in- or outside the superproject
working tree.

The messages of git-status inside the superproject working tree are changed
as "detached 

Urgent response !!!

2017-11-08 Thread Melisa Mehmet
Greetings,

I have a business proposal I would love to discuss with you. please reply me 
for more details
Yours Sincerely,
miss.melisa.mehmet


Re: Test failures on 'pu' branch

2017-11-08 Thread Ramsay Jones


On 08/11/17 22:34, Ramsay Jones wrote:
> 
> 
> On 08/11/17 20:36, Stefan Beller wrote:
>> On Wed, Nov 8, 2017 at 12:28 PM, Ramsay Jones
>>  wrote:
>>
>>> t5300-pack-object.sh (Wstat: 256 Tests: 40 
>>> Failed: 2)
>>
>>> t5500-fetch-pack.sh  (Wstat: 256 Tests: 355 
>>> Failed: 6)
>>
>> These are series
>>
>>> t5601-clone.sh   (Wstat: 256 Tests: 102 
>>> Failed: 4)
>>
>> This one is a spurious test. I had that flake on me once in the last weeks, 
>> too.
>> But upon investigation I could not reproduce.
>> See 
>> https://public-inbox.org/git/xmqq376ipdpx@gitster.mtv.corp.google.com/
>>
> 
> No, this is not related to that. In fact several tests start
> working if I change the '--filter=blobs:limit=0' to instead
> read '--filter=blob:limit=0' (ie. change blob_s_ to blob).
> 
> In fact t5601 now works with the following patch:

OK, so the patch given below fixes all tests except t5300.37.
All the patch does is change 'blobs' to 'blob' in the --filter
parameters.

The single failure looks like:

  $ ./t5300-pack-object.sh -i -v
  ...
  Initialized empty Git repository in /home/ramsay/git/t/trash   
directory.t5300-pack-object/server/.git/
  [master (root-commit) a72904c] x
   Author: A U Thor 
   3 files changed, 3 insertions(+)
   create mode 100644 .git-a
   create mode 100644 a
   create mode 100644 b
  Counting objects: 2, done.
  Compressing objects: 100% (2/2), done.
  Total 2 (delta 0), reused 0 (delta 0)
  ad26794874784493dafa81f1644b3dcfad05d843
  not ok 37 - filtering by size never excludes special files
  # 
  # rm -rf server &&
  # git init server &&
  # printf a-very-long-file > server/a &&
  # printf a-very-long-file > server/.git-a &&
  # printf b-very-long-file > server/b &&
  # git -C server add a .git-a b &&
  # git -C server commit -m x &&
  # 
  # git -C server rev-parse HEAD >objects &&
  # git -C server pack-objects --revs --stdout 
--filter=blob:limit=10 my.pack &&
  # 
  # # Ensure that the .git-a blob is in the packfile, despite also
  # # appearing as a non-.git file
  # git index-pack my.pack &&
  # git verify-pack -v my.idx >objectlist &&
  # grep $(git hash-object server/a) objectlist
  # 
  $ 

Hope that helps.

ATB,
Ramsay Jones

-- >8 --
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 0739a0796..17c9ffdca 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -472,7 +472,7 @@ test_expect_success 'filtering by size works with multiple 
excluded' '
git -C server commit -m x &&
 
git -C server rev-parse HEAD >objects &&
-   git -C server pack-objects --revs --stdout --filter=blobs:limit=10 
my.pack &&
+   git -C server pack-objects --revs --stdout --filter=blob:limit=10 
my.pack &&
 
# Ensure that only the small blobs are in the packfile
git index-pack my.pack &&
@@ -493,7 +493,7 @@ test_expect_success 'filtering by size never excludes 
special files' '
git -C server commit -m x &&
 
git -C server rev-parse HEAD >objects &&
-   git -C server pack-objects --revs --stdout --filter=blobs:limit=10 
my.pack &&
+   git -C server pack-objects --revs --stdout --filter=blob:limit=10 
my.pack &&
 
# Ensure that the .git-a blob is in the packfile, despite also
# appearing as a non-.git file
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 86cf65323..08b7a32c7 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -762,7 +762,7 @@ test_expect_success 'filtering by size' '
test_config -C server uploadpack.allowfilter 1 &&
 
test_create_repo client &&
-   git -C client fetch-pack --filter=blobs:limit=0 ../server HEAD &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD &&
 
# Ensure that object is not inadvertently fetched
test_must_fail git -C client cat-file -e $(git hash-object server/one.t)
@@ -774,7 +774,7 @@ test_expect_success 'filtering by size has no effect if 
support for it is not ad
test_commit -C server one &&
 
test_create_repo client &&
-   git -C client fetch-pack --filter=blobs:limit=0 ../server HEAD 2> err &&
+   git -C client fetch-pack --filter=blob:limit=0 ../server HEAD 2> err &&
 
# Ensure that object is fetched
git -C client cat-file -e $(git hash-object server/one.t) &&
@@ -800,7 +800,7 @@ setup_blob_max_bytes () {
 do_blob_max_bytes() {
SERVER="$1" &&
 
-   git -C client fetch --filter=blobs:limit=0 origin HEAD:somewhere &&
+   git -C client fetch --filter=blob:limit=0 origin HEAD:somewhere &&
 
# Ensure that commit is fetched, but blob is not
test_config -C client 

Re: [RFC PATCH 0/4] git-status reports relation to superproject

2017-11-08 Thread Jonathan Tan
On Wed,  8 Nov 2017 11:55:05 -0800
Stefan Beller  wrote:

>   $ git -c status.superprojectinfo status
>   HEAD detached at v2.15-rc2
>   superproject is 6 commits behind HEAD 7070ce2..5e6d0fb
>   nothing to commit, working tree clean
> 
> How cool is that?
> 
> This series side steps the questions raised in
> https://public-inbox.org/git/xmqq4lq6hmp2.fsf...@gitster.mtv.corp.google.com/
> which I am also putting together albeit slowly.
> 
> This series just reports the relationship between the superprojects gitlink
> (if any) to HEAD. I think that is useful information in the current
> world of submodules.

The relationship is indeed currently useful, but if the long term plan
is to strongly discourage detached submodule HEAD, then I would think
that these patches are in the wrong direction. (If the long term plan is
to end up supporting both detached and linked submodule HEAD, then these
patches are fine, of course.) So I think that the plan referenced in
Junio's email (that you linked above) still needs to be discussed.

About the patches themselves, they look OK to me. Some minor things off
the top of my head are to retain the "ours" and "theirs" (instead of
"one" and "two"), and to replicate the language in remote.c more closely
("This submodule is (ahead of/behind) the superproject by %d commit(s)")
instead of inventing your own.


Re: Test failures on 'pu' branch

2017-11-08 Thread Ramsay Jones


On 08/11/17 20:36, Stefan Beller wrote:
> On Wed, Nov 8, 2017 at 12:28 PM, Ramsay Jones
>  wrote:
> 
>> t5300-pack-object.sh (Wstat: 256 Tests: 40 
>> Failed: 2)
> 
>> t5500-fetch-pack.sh  (Wstat: 256 Tests: 355 
>> Failed: 6)
> 
> These are series
> 
>> t5601-clone.sh   (Wstat: 256 Tests: 102 
>> Failed: 4)
> 
> This one is a spurious test. I had that flake on me once in the last weeks, 
> too.
> But upon investigation I could not reproduce.
> See https://public-inbox.org/git/xmqq376ipdpx@gitster.mtv.corp.google.com/
> 

No, this is not related to that. In fact several tests start
working if I change the '--filter=blobs:limit=0' to instead
read '--filter=blob:limit=0' (ie. change blob_s_ to blob).

In fact t5601 now works with the following patch:

-- >8 --
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f18d9454a..0074690f7 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -630,7 +630,7 @@ partial_clone () {
test_config -C "$SERVER" uploadpack.allowfilter 1 &&
test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
 
-   git clone --filter=blobs:limit=0 "$URL" client &&
+   git clone --filter=blob:limit=0 "$URL" client &&
 
git -C client fsck &&
 
@@ -651,7 +651,7 @@ test_expect_success 'partial clone: warn if server does not 
support object filte
 test_create_repo server &&
 test_commit -C server one &&
 
-   git clone --filter=blobs:limit=0 "file://$(pwd)/server" client 2> err &&
+   git clone --filter=blob:limit=0 "file://$(pwd)/server" client 2> err &&
 
test_i18ngrep "filtering not recognized by server" err
 '
@@ -673,7 +673,7 @@ test_expect_success 'batch missing blob request during 
checkout' '
test_config -C server uploadpack.allowfilter 1 &&
test_config -C server uploadpack.allowanysha1inwant 1 &&
 
-   git clone --filter=blobs:limit=0 "file://$(pwd)/server" client &&
+   git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
 
# Ensure that there is only one negotiation by checking that there is
# only "done" line sent. ("done" marks the end of negotiation.)
@@ -705,7 +705,7 @@ test_expect_success 'batch missing blob request does not 
inadvertently try to fe
test_config -C server uploadpack.allowanysha1inwant 1 &&
 
# Make sure that it succeeds
-   git clone --filter=blobs:limit=0 "file://$(pwd)/server" client
+   git clone --filter=blob:limit=0 "file://$(pwd)/server" client
 '
 
 . "$TEST_DIRECTORY"/lib-httpd.sh
-- 8< --

A similar patch to 't/t5300-pack-object.sh' gets one of the two
failing tests working. I haven't looked at 't/t5500-fetch-pack.sh'
yet.

ATB,
Ramsay Jones




Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-08 Thread Jeff Hostetler



On 11/8/2017 4:51 PM, Jonathan Tan wrote:

On Wed, 8 Nov 2017 15:32:21 -0500
Jeff Hostetler  wrote:


Thanks Jonathan.

I moved my version of part 2 on top of yesterday's part 1.
There are a few changes between my version and yours. Could
you take a quick look at them and see if they make sense?
(I'll spare the mailing list another patch series until after
I attend to the feed back on part 1.)

https://github.com/jeffhostetler/git/commits/core/pc3_p2


Thanks - the differences are quite minor, and they generally make sense.
The main one is that finish_object() in builtin/rev-list.c now returns
int instead of void, but that makes sense.

Other than that:

  - I think you accidentally squashed the rev-list commit into
"sha1_file: support lazily fetching missing objects".
  - The documentation for --exclude-promisor-objects in
git-pack-objects.txt should be "Omit objects that are known to be in
the promisor remote". (This option has the purpose of operating only
on locally created objects, so that when we repack, we still maintain
a distinction between locally created objects [without .promisor] and
objects from the promisor remote [with .promisor].)
  - The transport options in gitremote-helpers.txt could have the same
documentation as in transport.h.



thanks for the quick turn around.  i'll get these into my next
version next week.

Jeff


Re: Test failures on 'pu' branch

2017-11-08 Thread Jeff Hostetler



On 11/8/2017 3:36 PM, Stefan Beller wrote:

On Wed, Nov 8, 2017 at 12:28 PM, Ramsay Jones
 wrote:


t5300-pack-object.sh (Wstat: 256 Tests: 40 Failed: 
2)



t5500-fetch-pack.sh  (Wstat: 256 Tests: 355 Failed: 
6)


These are series


t5601-clone.sh   (Wstat: 256 Tests: 102 Failed: 
4)


This one is a spurious test. I had that flake on me once in the last weeks, too.
But upon investigation I could not reproduce.
See https://public-inbox.org/git/xmqq376ipdpx@gitster.mtv.corp.google.com/



I suspect that the failures related to the jh/partial-clone-* branches
are probably due to slight differences between yesterday's part-1 and
last week's version of part-2 and part-3.

I'm going to be on vacation until Monday, so can we just pull those
parts out of 'pu' until I can get you new versions of parts 2 and 3 ?

Thanks
Jeff


Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-08 Thread Jonathan Tan
On Wed, 8 Nov 2017 15:32:21 -0500
Jeff Hostetler  wrote:

> Thanks Jonathan.
> 
> I moved my version of part 2 on top of yesterday's part 1.
> There are a few changes between my version and yours. Could
> you take a quick look at them and see if they make sense?
> (I'll spare the mailing list another patch series until after
> I attend to the feed back on part 1.)
> 
> https://github.com/jeffhostetler/git/commits/core/pc3_p2

Thanks - the differences are quite minor, and they generally make sense.
The main one is that finish_object() in builtin/rev-list.c now returns
int instead of void, but that makes sense.

Other than that:

 - I think you accidentally squashed the rev-list commit into
   "sha1_file: support lazily fetching missing objects".
 - The documentation for --exclude-promisor-objects in
   git-pack-objects.txt should be "Omit objects that are known to be in
   the promisor remote". (This option has the purpose of operating only
   on locally created objects, so that when we repack, we still maintain
   a distinction between locally created objects [without .promisor] and
   objects from the promisor remote [with .promisor].)
 - The transport options in gitremote-helpers.txt could have the same
   documentation as in transport.h.


Re: Test failures on 'pu' branch

2017-11-08 Thread Stefan Beller
On Wed, Nov 8, 2017 at 12:28 PM, Ramsay Jones
 wrote:

> t5300-pack-object.sh (Wstat: 256 Tests: 40 
> Failed: 2)

> t5500-fetch-pack.sh  (Wstat: 256 Tests: 355 
> Failed: 6)

These are series

> t5601-clone.sh   (Wstat: 256 Tests: 102 
> Failed: 4)

This one is a spurious test. I had that flake on me once in the last weeks, too.
But upon investigation I could not reproduce.
See https://public-inbox.org/git/xmqq376ipdpx@gitster.mtv.corp.google.com/


Re: [PATCH 1/9] extension.partialclone: introduce partial clone extension

2017-11-08 Thread Jeff Hostetler



On 11/6/2017 2:16 PM, Jonathan Tan wrote:

On Mon, 6 Nov 2017 12:32:45 -0500
Jeff Hostetler  wrote:


Yes, that is a point I wanted to ask about.  I renamed the
extensions.partialclone that you created and then I moved your
remote..blob-max-bytes setting to be in extensions too.
Moving it to core.partialclonefilter is fine.


OK - in that case, it might be easier to just reuse my first patch in
its entirety. "core.partialclonefilter" is not used until the
fetching/cloning part anyway.



Good point.  I'll take a look at refactoring that.
If it looks like the result will be mostly/effectively
your original patches, I'll let you know and hand part 2
back to you.


Sounds good. I uploaded the result of rebasing my part 2 patches on top
of your part 1 patches here, if you would like it as a reference:

https://github.com/jonathantanmy/git/commits/pc20171106



Thanks Jonathan.

I moved my version of part 2 on top of yesterday's part 1.
There are a few changes between my version and yours. Could
you take a quick look at them and see if they make sense?
(I'll spare the mailing list another patch series until after
I attend to the feed back on part 1.)

https://github.com/jeffhostetler/git/commits/core/pc3_p2

Thanks
Jeff



Test failures on 'pu' branch

2017-11-08 Thread Ramsay Jones
Hi Junio,

You are probably already aware, but just in case, the 'pu' branch
fails the testsuite for me as follows:

$ tail -18 ptest-out
Test Summary Report
---
t5300-pack-object.sh (Wstat: 256 Tests: 40 Failed: 
2)
  Failed tests:  36-37
  Non-zero exit status: 1
t5500-fetch-pack.sh  (Wstat: 256 Tests: 355 Failed: 
6)
  Failed tests:  350-355
  Non-zero exit status: 1
t5601-clone.sh   (Wstat: 256 Tests: 102 Failed: 
4)
  Failed tests:  99-102
  Non-zero exit status: 1
Files=800, Tests=16964, 462 wallclock secs ( 4.57 usr  0.78 sys + 57.93 cusr 
24.16 csys = 87.44 CPU)
Result: FAIL
Makefile:45: recipe for target 'prove' failed
make[1]: *** [prove] Error 1
make[1]: Leaving directory '/home/ramsay/git/t'
Makefile:2448: recipe for target 'test' failed
make: *** [test] Error 2
$ 

Looking at the first test failure in each file, the failures seems
to be related to the 'jh/partial-clone' branch.

Just FYI. Thanks.

ATB,
Ramsay Jones



[PATCH 2/4] submodule.c: factor start_ls_files_dot_dot out of get_superproject_working_tree

2017-11-08 Thread Stefan Beller
We'll reuse the code of the factored out function shortly, when exploring
the superproject for another aspect. Instead of knowing the root of the
superproject we'll find out about the gitlink value.

Signed-off-by: Stefan Beller 
---
 submodule.c | 53 ++---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/submodule.c b/submodule.c
index 239d94d539..4fcb64469e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1977,15 +1977,13 @@ void absorb_git_dir_into_superproject(const char 
*prefix,
}
 }
 
-const char *get_superproject_working_tree(void)
+/* Starts a child process `ls-files` one directory above the root of the repo. 
*/
+static int start_ls_files_dot_dot(struct child_process *cp, struct strbuf *out)
 {
-   struct child_process cp = CHILD_PROCESS_INIT;
-   struct strbuf sb = STRBUF_INIT;
const char *one_up = real_path_if_valid("../");
-   const char *cwd = xgetcwd();
-   const char *ret = NULL;
const char *subpath;
-   int code;
+   char *cwd = xgetcwd();
+   struct strbuf sb = STRBUF_INIT;
ssize_t len;
 
if (!is_inside_work_tree())
@@ -1994,31 +1992,48 @@ const char *get_superproject_working_tree(void)
 * We might have a superproject, but it is harder
 * to determine.
 */
-   return NULL;
+   return -1;
 
if (!one_up)
-   return NULL;
+   return -1;
 
subpath = relative_path(cwd, one_up, );
 
-   prepare_submodule_repo_env(_array);
-   argv_array_pop(_array);
+   prepare_submodule_repo_env(>env_array);
+   argv_array_pop(>env_array);
 
-   argv_array_pushl(, "--literal-pathspecs", "-C", "..",
+   argv_array_pushl(>args, "--literal-pathspecs", "-C", "..",
"ls-files", "-z", "--stage", "--full-name", "--",
subpath, NULL);
-   strbuf_reset();
 
-   cp.no_stdin = 1;
-   cp.no_stderr = 1;
-   cp.out = -1;
-   cp.git_cmd = 1;
+   cp->no_stdin = 1;
+   cp->no_stderr = 1;
+   cp->out = -1;
+   cp->git_cmd = 1;
 
-   if (start_command())
+   if (start_command(cp))
die(_("could not start ls-files in .."));
 
-   len = strbuf_read(, cp.out, PATH_MAX);
-   close(cp.out);
+   len = strbuf_read(out, cp->out, PATH_MAX);
+   close(cp->out);
+
+   strbuf_release();
+   free(cwd);
+   return len;
+}
+
+const char *get_superproject_working_tree(void)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf sb = STRBUF_INIT;
+   const char *cwd = xgetcwd();
+   const char *ret = NULL;
+   int code;
+   ssize_t len;
+
+   len = start_ls_files_dot_dot(, );
+   if (len < 0)
+   return NULL;
 
if (starts_with(sb.buf, "16")) {
int super_sub_len;
-- 
2.15.0.128.g40905b34bf.dirty



[PATCH 4/4] git-status: report reference to superproject

2017-11-08 Thread Stefan Beller
In a submodule the position of HEAD in relation to the gitlink pointer
in the superproject may be of interest.

Introduce a config option `status.superprojectInfo` that when enabled
will report the relation between HEAD and the commit pointed to by the
gitlink in the index of the superproject.

Signed-off-by: Stefan Beller 
---
 Documentation/config.txt |  5 +
 builtin/commit.c |  2 ++
 t/t7519-superproject.sh  | 57 
 wt-status.c  | 37 +++
 wt-status.h  |  1 +
 5 files changed, 102 insertions(+)
 create mode 100755 t/t7519-superproject.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5f0d62753d..7825a1a7be 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3097,6 +3097,11 @@ status.submoduleSummary::
submodule summary' command, which shows a similar output but does
not honor these settings.
 
+status.superprojectInfo
+   Defaults to false.
+   This shows the relation of the current HEAD to the commit pointed
+   to be the gitlink entry in the superprojects index.
+
 stash.showPatch::
If this is set to true, the `git stash show` command without an
option will show the stash entry in patch form.  Defaults to false.
diff --git a/builtin/commit.c b/builtin/commit.c
index c38542ee46..f937f6c6cf 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1286,6 +1286,8 @@ static int git_status_config(const char *k, const char 
*v, void *cb)
s->submodule_summary = -1;
return 0;
}
+   if (!strcmp(k, "status.superprojectinfo"))
+   s->superproject_info = git_config_bool(k, v);
if (!strcmp(k, "status.short")) {
if (git_config_bool(k, v))
status_deferred_config.status_format = 
STATUS_FORMAT_SHORT;
diff --git a/t/t7519-superproject.sh b/t/t7519-superproject.sh
new file mode 100755
index 00..ade2379a59
--- /dev/null
+++ b/t/t7519-superproject.sh
@@ -0,0 +1,57 @@
+
+
+test_description='git status for superproject relations'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit initial &&
+   test_create_repo sub &&
+   # this whole file tests superproject reporting, so set this config here
+   git -C sub config status.superprojectInfo true
+'
+
+test_expect_success 'repo on initial commit does not mention superproject' '
+   git -C sub status > actual &&
+   test_i18ngrep "No commits yet" actual &&
+   test_i18ngrep -v superproject actual
+'
+
+test_expect_success 'setup submodule' '
+   test_commit -C sub initial &&
+   git submodule add ./sub sub
+'
+
+test_expect_success 'submodule in sync with superproject index' '
+   git -C sub status >actual &&
+   test_i18ngrep "superproject points at HEAD" actual
+'
+
+test_expect_success 'submodule in sync with superproject' '
+   git commit -a -m "superproject adds submodule" &&
+   git -C sub status >actual &&
+   test_i18ngrep "superproject points at HEAD" actual
+'
+
+test_expect_success 'submodule advances two commits' '
+   git -C sub commit --allow-empty -m "test" &&
+   git -C sub commit --allow-empty -m "test2" &&
+   git -C sub status >actual &&
+   test_i18ngrep "superproject is 2 commits behind HEAD" actual
+'
+
+test_expect_success 'submodule behind superproject' '
+   git add sub &&
+   git commit -m "update sub" &&
+   git -C sub reset --hard HEAD^ &&
+   git -C sub status >actual &&
+   test_i18ngrep "superproject is ahead of HEAD by 1 commits" actual
+'
+
+test_expect_success 'submodule and superproject differ' '
+   git -C sub commit --allow-empty -m "test2b" &&
+   git -C sub status >actual &&
+   test_i18ngrep "superproject and HEAD differ by +1, -1 commits" actual
+'
+
+test_done
diff --git a/wt-status.c b/wt-status.c
index bedef256ce..3e8e27a550 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1027,6 +1027,41 @@ static void wt_longstatus_print_tracking(struct 
wt_status *s)
strbuf_release();
 }
 
+static void wt_longstatus_print_superproject_relation(struct wt_status *s)
+{
+   struct object_id oid;
+   struct commit *in_gitlink, *head;
+   int head_nr, gitlink_nr;
+
+   if (get_superproject_gitlink())
+   return;
+
+   in_gitlink = lookup_commit();
+
+   read_ref("HEAD", );
+   head = lookup_commit();
+
+   compare_commits(head, in_gitlink, _nr, _nr);
+
+   if (!head_nr && !gitlink_nr)
+   printf(_("superproject points at HEAD\n"));
+   else if (!head_nr)
+   printf(_("superproject is ahead of HEAD by %d commits 
%s..%s\n"),
+gitlink_nr,
+find_unique_abbrev(head->object.oid.hash, 
DEFAULT_ABBREV),
+find_unique_abbrev(in_gitlink->object.oid.hash, 

[PATCH 1/4] remote, revision: factor out exclusive counting between two commits

2017-11-08 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 remote.c   | 40 +---
 revision.c | 45 +
 revision.h |  7 +++
 3 files changed, 53 insertions(+), 39 deletions(-)

diff --git a/remote.c b/remote.c
index 685e776a65..60c689383a 100644
--- a/remote.c
+++ b/remote.c
@@ -1990,9 +1990,7 @@ int stat_tracking_info(struct branch *branch, int 
*num_ours, int *num_theirs,
 {
struct object_id oid;
struct commit *ours, *theirs;
-   struct rev_info revs;
const char *base;
-   struct argv_array argv = ARGV_ARRAY_INIT;
 
/* Cannot stat unless we are marked to build on top of somebody else. */
base = branch_get_upstream(branch, NULL);
@@ -2014,43 +2012,7 @@ int stat_tracking_info(struct branch *branch, int 
*num_ours, int *num_theirs,
if (!ours)
return -1;
 
-   /* are we the same? */
-   if (theirs == ours) {
-   *num_theirs = *num_ours = 0;
-   return 0;
-   }
-
-   /* Run "rev-list --left-right ours...theirs" internally... */
-   argv_array_push(, ""); /* ignored */
-   argv_array_push(, "--left-right");
-   argv_array_pushf(, "%s...%s",
-oid_to_hex(>object.oid),
-oid_to_hex(>object.oid));
-   argv_array_push(, "--");
-
-   init_revisions(, NULL);
-   setup_revisions(argv.argc, argv.argv, , NULL);
-   if (prepare_revision_walk())
-   die("revision walk setup failed");
-
-   /* ... and count the commits on each side. */
-   *num_ours = 0;
-   *num_theirs = 0;
-   while (1) {
-   struct commit *c = get_revision();
-   if (!c)
-   break;
-   if (c->object.flags & SYMMETRIC_LEFT)
-   (*num_ours)++;
-   else
-   (*num_theirs)++;
-   }
-
-   /* clear object flags smudged by the above traversal */
-   clear_commit_marks(ours, ALL_REV_FLAGS);
-   clear_commit_marks(theirs, ALL_REV_FLAGS);
-
-   argv_array_clear();
+   compare_commits(ours, theirs, num_ours, num_theirs);
return 0;
 }
 
diff --git a/revision.c b/revision.c
index 99c95c19b0..fe1faf2628 100644
--- a/revision.c
+++ b/revision.c
@@ -1159,6 +1159,51 @@ int ref_excluded(struct string_list *ref_excludes, const 
char *path)
return 0;
 }
 
+void compare_commits(struct commit *ours, struct commit *theirs,
+   int *num_ours, int *num_theirs)
+{
+   struct rev_info revs;
+   struct argv_array argv = ARGV_ARRAY_INIT;
+
+   /* are we the same? */
+   if (theirs == ours) {
+   *num_theirs = *num_ours = 0;
+   return;
+   }
+
+   /* Run "rev-list --left-right ours...theirs" internally... */
+   argv_array_push(, ""); /* ignored */
+   argv_array_push(, "--left-right");
+   argv_array_pushf(, "%s...%s",
+oid_to_hex(>object.oid),
+oid_to_hex(>object.oid));
+   argv_array_push(, "--");
+
+   init_revisions(, NULL);
+   setup_revisions(argv.argc, argv.argv, , NULL);
+   if (prepare_revision_walk())
+   die("revision walk setup failed");
+
+   /* ... and count the commits on each side. */
+   *num_ours = 0;
+   *num_theirs = 0;
+   while (1) {
+   struct commit *c = get_revision();
+   if (!c)
+   break;
+   if (c->object.flags & SYMMETRIC_LEFT)
+   (*num_ours)++;
+   else
+   (*num_theirs)++;
+   }
+
+   /* clear object flags smudged by the above traversal */
+   clear_commit_marks(ours, ALL_REV_FLAGS);
+   clear_commit_marks(theirs, ALL_REV_FLAGS);
+
+   argv_array_clear();
+}
+
 static int handle_one_ref(const char *path, const struct object_id *oid,
  int flag, void *cb_data)
 {
diff --git a/revision.h b/revision.h
index 54761200ad..3ff6a5190b 100644
--- a/revision.h
+++ b/revision.h
@@ -324,4 +324,11 @@ extern int rewrite_parents(struct rev_info *revs, struct 
commit *commit,
  */
 extern struct commit_list *get_saved_parents(struct rev_info *revs, const 
struct commit *commit);
 
+/*
+ * Compute the number of commits between 'one' and 'two' storing the number
+ * of commits in their parent DAG  ncluded in each but not the other.
+ */
+extern void compare_commits(struct commit *one, struct commit *two,
+   int *num_one, int *num_two);
+
 #endif
-- 
2.15.0.128.g40905b34bf.dirty



[PATCH 3/4] submodule.c: get superprojects gitlink value

2017-11-08 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 submodule.c | 27 +++
 submodule.h |  6 ++
 2 files changed, 33 insertions(+)

diff --git a/submodule.c b/submodule.c
index 4fcb64469e..68b123eb13 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2074,6 +2074,33 @@ const char *get_superproject_working_tree(void)
return ret;
 }
 
+/*
+ * Returns 0 when the gitlink is found in the superprojects index,
+ * the value will be found in `oid`. Otherwise return -1.
+ */
+int get_superproject_gitlink(struct object_id *oid)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf sb = STRBUF_INIT;
+   const char *hash;
+
+   if (start_ls_files_dot_dot(, ) < 0)
+   return -1;
+
+   if (!skip_prefix(sb.buf, "16 ", ))
+   /*
+* superproject doesn't have a gitlink at submodule position or
+* output is gibberish
+*/
+   return -1;
+
+   if (get_oid_hex(hash, oid))
+   /* could not parse the object name */
+   return -1;
+
+   return 0;
+}
+
 /*
  * Put the gitdir for a submodule (given relative to the main
  * repository worktree) into `buf`, or return -1 on error.
diff --git a/submodule.h b/submodule.h
index f0da0277a4..5fc602f0c7 100644
--- a/submodule.h
+++ b/submodule.h
@@ -137,4 +137,10 @@ extern void absorb_git_dir_into_superproject(const char 
*prefix,
  */
 extern const char *get_superproject_working_tree(void);
 
+/*
+ * Returns 0 when the gitlink is found in the superprojects index,
+ * the value will be found in `oid`. Otherwise return -1.
+ */
+extern int get_superproject_gitlink(struct object_id *oid);
+
 #endif
-- 
2.15.0.128.g40905b34bf.dirty



[RFC PATCH 0/4] git-status reports relation to superproject

2017-11-08 Thread Stefan Beller
  $ git -c status.superprojectinfo status
  HEAD detached at v2.15-rc2
  superproject is 6 commits behind HEAD 7070ce2..5e6d0fb
  nothing to commit, working tree clean

How cool is that?

This series side steps the questions raised in
https://public-inbox.org/git/xmqq4lq6hmp2.fsf...@gitster.mtv.corp.google.com/
which I am also putting together albeit slowly.

This series just reports the relationship between the superprojects gitlink
(if any) to HEAD. I think that is useful information in the current
world of submodules.

Stefan

Stefan Beller (4):
  remote, revision: factor out exclusive counting between two commits
  submodule.c: factor start_ls_files_dot_dot out of
get_superproject_working_tree
  submodule.c: get superprojects gitlink value
  git-status: report reference to superproject

 Documentation/config.txt |  5 +++
 builtin/commit.c |  2 ++
 remote.c | 40 +---
 revision.c   | 45 +++
 revision.h   |  7 +
 submodule.c  | 80 
 submodule.h  |  6 
 t/t7519-superproject.sh  | 57 ++
 wt-status.c  | 37 ++
 wt-status.h  |  1 +
 10 files changed, 222 insertions(+), 58 deletions(-)
 create mode 100755 t/t7519-superproject.sh

-- 
2.15.0.128.g40905b34bf.dirty



Re: What's cooking in git.git (Nov 2017, #03; Wed, 8)

2017-11-08 Thread Brandon Williams
On 11/08, Junio C Hamano wrote:
> Here are the topics that have been cooking.  Commits prefixed with
> '-' are only in 'pu' (proposed updates) while commits prefixed with
> '+' are in 'next'.  The ones marked with '.' do not appear in any of
> the integration branches, but I am still holding onto them.
> 
> The tip of 'next' has been rebuilt on top of v2.15, while kicking a
> few topics back to 'pu'.
> 
> You can find the changes described here in the integration branches
> of the repositories listed at
> 
> http://git-blame.blogspot.com/p/git-public-repositories.html
> 
> --
> [New Topics]
> 
> * bw/rebase-i-ignored-submodule-fix (2017-11-07) 1 commit
>  - wt-status: actually ignore submodules when requested
> 
>  "git rebase -i" recently started misbehaving when a submodule that
>  is configured with 'submodule..ignore' is dirty; this has
>  been corrected.
> 
>  Will merge to 'next'.
>  I've edited in the tweak brought up in the discussion.  Please
>  eyeball to sanity check.

Tweaks look good to me!

-- 
Brandon Williams


Re: [PATCH 02/14] clone, fetch-pack, index-pack, transport: partial clone

2017-11-08 Thread Adam Dinwoodie
On Friday 03 November 2017 at 01:32 pm -0700, Jonathan Tan wrote:
> On Thu,  2 Nov 2017 20:31:17 +
> Jeff Hostetler  wrote:
> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> > index a0a35e6..31cd5ba 100644
> > --- a/builtin/index-pack.c
> > +++ b/builtin/index-pack.c
> > @@ -222,6 +222,16 @@ static unsigned check_object(struct object *obj)
> > if (!(obj->flags & FLAG_CHECKED)) {
> > unsigned long size;
> > int type = sha1_object_info(obj->oid.hash, );
> > +
> > +   if (type <= 0) {
> > +   /*
> > +* TODO Use the promisor code to conditionally
> > +* try to fetch this object -or- assume it is ok.
> > +*/
> > +   obj->flags |= FLAG_CHECKED;
> > +   return 0;
> > +   }
> > +
> > if (type <= 0)
> > die(_("did not receive expected object %s"),
> >   oid_to_hex(>oid));
> 
> This causes some repo corruption tests to fail.

Confirmed: I see this patch, or at least f7e0dbc38 ("clone, fetch-pack,
index-pack, transport: partial clone", 2017-11-02), causing t5300.26 to
fail on 64-bit Cygwin.

For the sake of anyone trying to reproduce this, I needed to cherry pick
66d4c7a58 ("fixup! upload-pack: add object filtering for partial clone",
2017-11-08) onto that commit before I was able to get it to compile.

Adam


RE: Bug - Status - Space in Filename

2017-11-08 Thread Joseph Strauss
I believe I have found a bug in the way git status -s lists filenames.

According to the documentation:
  The fields (including the ->) are separated from each other by a single 
space. If a filename contains whitespace or other nonprintable characters,   
that field will be quoted in the manner of a C string literal: surrounded by 
ASCII double quote (34) characters, and with interior special characters 
backslash-escaped.

While this is true in most situations, it does not seem to apply to merge 
conflicts. When a file has merge conflicts I am getting the following:
 $ git status -s
 UU some/path/with space/in/the/name
 M  "another/path/with space/in/the/name "

I found the same problem for the following versions:
. git version 2.15.0.windows.1
. git version 2.10.0



Joseph Kalman Strauss
Lifecycle Management Engineer
B Photo
212-239-7500 x2212
josep...@bhphoto.com



Invalid memory access in `git apply`

2017-11-08 Thread mqudsi
**Resending as it seems that the attachments caused the last email to wind up
in a black hole**

There seems to be bug in the `git apply` that leads to out-of-bounds memory
access when --ignore-space-change is combined with --inaccurate-eof and
applying a patch.

On occasion, this can lead to error output like the following:

 mqudsi@ZBook ~> git apply --ignore-space-change --ignore-whitespace
 --allow-overlap --inaccurate-eof without_whitespace.diff
 *** Error in `git': malloc(): memory corruption: 0x02543530 ***
 === Backtrace: =
 /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fdda79c77e5]
 /lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7fdda79d213e]
 /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fdda79d4184]
 
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_doallocate+0x55)[0x7fdda79bd1d5]
 /lib/x86_64-linux-gnu/libc.so.6(_IO_doallocbuf+0x34)[0x7fdda79cb594]
 
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_overflow+0x1c8)[0x7fdda79ca8f8]
 /lib/x86_64-linux-gnu/libc.so.6(_IO_file_xsputn+0xad)[0x7fdda79c928d]
 /lib/x86_64-linux-gnu/libc.so.6(fputs+0x98)[0x7fdda79be0c8]
 git[0x5386cd]
 git[0x538714]
 git[0x538940]
 git[0x40e220]
 git[0x410a10]
 git[0x41256e]
 git[0x412df7]
 git[0x415935]
 git[0x406436]
 git[0x40555c]

The original file being patched (clipboard.vim) and the patch file that I had
attempted to apply (without_whitespace.diff) are attached, along with the
full, unabridged output of the memory map as a result of the out-of-bounds
access (memory_map.txt).

The memory map output was generated under git 2.7.4; repeated attempts to
reproduce the memory map dump with both 2.7.4 and 2.15 produce the following
output:

 mqudsi@ZBook ~/.c/nvim> git apply --ignore-space-change  
--inaccurate-eof
 --whitespace=fix without_whitespace.diff
 fatal: BUG: caller miscounted postlen: asked 248, orig = 251, used = 
249

Mahmoud Al-Qudsi
NeoSmart Technologies

--Attachments--

* clipboard.vim: http://termbin.com/u25t
* without_whitespace.diff: http://termbin.com/bu9y
* memory_map.txt: http://termbin.com/cboz




Re: [RFC PATCH] rebisect: add script for easier bisect log editing

2017-11-08 Thread Adam Dinwoodie
On Wednesday 08 November 2017 at 05:15 pm +0100, Christian Couder wrote:
> >> +git bisect replay "$GIT_BISECT_LOG_TMP"
> >> +rm -f "$GIT_BISECT_LOG_TMP"
> 
> While at it, is there a reason for the -f option above?

I was following the lead of git-bisect.sh, which has used `rm -f` for
such things ever since it was first introduced[^1], although it appears
that, since v2.15.0, all the `rm`s in that script have been moved to the
C code[^2].

Actually applying thought, rather than just following existing
precedent, I suspect having `-f` is useful because it means the command
will work even if the shell has picked up that `rm` should otherwise
have a `-i` argument from somewhere.

[^1]: 8cc6a0831 ("[PATCH] Making it easier to find which change introduced a 
bug", 2005-07-30)
[^2]: fb71a3299 ("bisect--helper: `bisect_clean_state` shell function in C", 
2017-09-29)


Re: [RFC PATCH] rebisect: add script for easier bisect log editing

2017-11-08 Thread Adam Dinwoodie
On Wednesday 08 November 2017 at 05:12 pm +0100, Christian Couder wrote:
> On Wed, Nov 8, 2017 at 2:59 PM, Adam Dinwoodie  wrote:
> > +git bisect reset HEAD
> 
> I guess that using "reset HEAD" could be cheaper than just "reset" and
> that's the reason you are using it.

Exactly that, yes.  I often use `reset HEAD` in my own workflows in the
name of speed, and I can't see any disadvantages of doing it here, too.

> > +git bisect start
> 
> Are you sure that this "start" is necessary? The doc says that "reset"
> followed by "replay that-file" should be enough.

It isn't necessary, in that the process works if you skip that command.
However, without it, the `git bisect replay` command prints "We are not
bisecting" before it does anything else, so having the `bisect start`
there explicitly removes that extraneous output.

If the script were integrated into git-bisect itself, it would probably
make sense to change that behaviour so the warning isn't printed.  (It
quite possibly makes sense to remove the warning when running `bisect
replay` regardless.)  But when writing the stand-alone script I wanted
things to work without any changes to the core Git code.


Re: [RFC PATCH] rebisect: add script for easier bisect log editing

2017-11-08 Thread Christian Couder
>> +git bisect replay "$GIT_BISECT_LOG_TMP"
>> +rm -f "$GIT_BISECT_LOG_TMP"

While at it, is there a reason for the -f option above?


Re: [RFC PATCH] rebisect: add script for easier bisect log editing

2017-11-08 Thread Christian Couder
On Wed, Nov 8, 2017 at 2:59 PM, Adam Dinwoodie  wrote:
> Add a short script, vaguely inspired by `git rebase --interactive`, to
> ease the process described in the `git bisect` documentation of saving
> off a bisect log, editing it, then replaying it.

Nice idea.

> Signed-off-by: Adam Dinwoodie 
> ---
>
> When I'm bisecting, I find I need to semi-regularly go back and change
> my good/bad/skip response for some commits.  The bisect documentation
> describes doing this by saving `git bisect log` output, editing it, then
> using `git bisect replay`.  Which is a perfectly fine technique, but
> automation is A Good Thing(TM).  The below script is a short proof of
> concept for changing this process to be a single command.
>
> Ideally (at least from my perspective), this function would be rolled
> into the main `git bisect` tool, as `git bisect edit` or similar.

I agree and I don't think it would be very difficult to convert to
such a sub command.

> Before I start working on that, however, I wanted to see what the list
> thought of the idea.
>
>  contrib/git-rebisect.sh | 12 
>  1 file changed, 12 insertions(+)
>  create mode 100755 contrib/git-rebisect.sh
>
> diff --git a/contrib/git-rebisect.sh b/contrib/git-rebisect.sh
> new file mode 100755
> index 0..60f20b278
> --- /dev/null
> +++ b/contrib/git-rebisect.sh
> @@ -0,0 +1,12 @@
> +#!/bin/sh
> +
> +GIT_EDITOR="$(git var GIT_EDITOR)"
> +GIT_DIR="$(git rev-parse --git-dir)"
> +GIT_BISECT_LOG_TMP="${GIT_DIR}/BISECT_LOG_EDIT"
> +
> +git bisect log >"$GIT_BISECT_LOG_TMP"
> +"$GIT_EDITOR" "$GIT_BISECT_LOG_TMP"
> +git bisect reset HEAD

I guess that using "reset HEAD" could be cheaper than just "reset" and
that's the reason you are using it.

> +git bisect start

Are you sure that this "start" is necessary? The doc says that "reset"
followed by "replay that-file" should be enough.

> +git bisect replay "$GIT_BISECT_LOG_TMP"
> +rm -f "$GIT_BISECT_LOG_TMP"

Thanks,
Christian.


Re: [RFC] fastindex: parallelize index load

2017-11-08 Thread Ben Peart

This is an RFC because it works but I've not done the code cleanup,
added tests, support in the update-index command to add/remove it, etc.
As a result, there is no reason to point out all the places I'm not
currently following the git coding style. :)

I wanted to get feedback on the concept first, especially as the way I'm
adding the TOC information via an extension that can be read before the
variable length section of cache entries and other extensions is a bit
of a clever hack, as is the resetting of the prefix encoding for V4
indexes.  They are, however, entirely backwards compatible with older
versions of git which can still properly read and use the index.

The effect can be seen using t/helper/test-read-cache:

fastindex
testcount   files   TRUEFALSE Savings

test-read-cache 500 100K6.398.33  23.36%
test-read-cache 100 1M  12.49   18.68 33.12%


On 11/8/2017 9:42 AM, Ben Peart wrote:

This patch will address the CPU cost of loading the index by adding
additional data to the index that will allow us to multi-thread the
loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  With
version 2, 3 or even 4 indexes, we can utilize the Index Entry Offset Table
(IEOT) to parallelize the loading and conversion of the cache entries
across all available CPU cores.

To make this work for V4 indexes, when writing the index, it periodically
"resets" the compression by encoding the current entry as if the path
name for the previous entry is completely different and saves the offset
of that entry in the IEOT.  Basically, with V4 indexes, it generates
offsets into blocks of prefix-compressed entries.

To enable reading the IEOT extension before reading all the variable
length cache entries and other extensions, the IEOT is written last,
right before the trailing SHA1.

The format of that extension has the signature bytes and size at the
beginning (like a normal extension) as well as at the end in reverse
order to enable parsing the extension by seeking back from the end of
the file.  See the diagram below for details.

During index load, read the index header then seek to the end of the
index, back up past the trailing SHA1 and look for the IEOT extension
signature bytes.  If they exist, read the 32-bit size and seek back to
the extension header and verify the leading header and size bits.  If
they all match, we can be assured we have a valid IEOT extension.

If the IEOT extension is available, create multiple threads to divide
the work of loading and converting the cache entries across all
available CPU cores.  Once the cache entries are loaded, the rest of the
extensions can be loaded and processed normally (skipping the IEOT entry
as it has already been processed).  If the IEOT extension is not
available then parsing the index will proceed as usual with a single thread.

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
IEOT signature bytes
32-bit size
32-bit version
32-bit Cache Entry Offset 1
32-bit Cache Entry count
32-bit Cache Entry Offset 2
32-bit Cache Entry count
.
.
32-bit version
32-bit size
IEOT signature bytes
SHA1

Signed-off-by: Ben Peart 
---

Notes:
 Base Ref: v2.14.3.windows.1
 Web-Diff: https://github.com/benpeart/git/commit/1e818c7835
 Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git 
checkout 1e818c7835

  Makefile  |   2 +
  cache.h   |  18 +++
  config.c  |  20 +++
  config.h  |   1 +
  environment.c |   3 +
  read-cache.c  | 340 +++---
  t/helper/test-dump-ieot.c |  78 +++
  t/helper/test-ieot.c  |  72 ++
  8 files changed, 513 insertions(+), 21 deletions(-)
  create mode 100644 t/helper/test-dump-ieot.c
  create mode 100644 t/helper/test-ieot.c

diff --git a/Makefile b/Makefile
index ebd0a75d87..99fa8dd8d3 100644
--- a/Makefile
+++ b/Makefile
@@ -640,12 +640,14 @@ TEST_PROGRAMS_NEED_X += test-config
  TEST_PROGRAMS_NEED_X += test-date
  TEST_PROGRAMS_NEED_X += test-delta
  TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-ieot
  TEST_PROGRAMS_NEED_X += test-dump-split-index
  TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
  TEST_PROGRAMS_NEED_X += test-fake-ssh
  TEST_PROGRAMS_NEED_X += test-genrandom
  TEST_PROGRAMS_NEED_X += test-hashmap
  TEST_PROGRAMS_NEED_X += test-helper
+TEST_PROGRAMS_NEED_X += test-ieot
  TEST_PROGRAMS_NEED_X += test-index-version
  TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
  TEST_PROGRAMS_NEED_X += test-line-buffer
diff --git 

Re: [RFC] fastindex: parallelize index load

2017-11-08 Thread Ben Peart

This is an RFC because it works but I've not done the code cleanup,
added tests, support in the update-index command to add/remove it, etc.
As a result, there is no reason to point out all the places I'm not
currently following the git coding style. :)

I wanted to get feedback on the concept first, especially as the way I'm
adding the TOC information via an extension that can be read before the
variable length section of cache entries and other extensions is a bit
of a clever hack, as is the resetting of the prefix encoding for V4
indexes.  They are, however, entirely backwards compatible with older
versions of git which can still properly read and use the index.

The effect can be seen using t/helper/test-read-cache:

fastindex
testcount   files   TRUEFALSE Savings

test-read-cache 500 100K6.398.33  23.36%
test-read-cache 100 1M  12.49   18.68 33.12%



On 11/8/2017 9:42 AM, Ben Peart wrote:

This patch will address the CPU cost of loading the index by adding
additional data to the index that will allow us to multi-thread the
loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  With
version 2, 3 or even 4 indexes, we can utilize the Index Entry Offset Table
(IEOT) to parallelize the loading and conversion of the cache entries
across all available CPU cores.

To make this work for V4 indexes, when writing the index, it periodically
"resets" the compression by encoding the current entry as if the path
name for the previous entry is completely different and saves the offset
of that entry in the IEOT.  Basically, with V4 indexes, it generates
offsets into blocks of prefix-compressed entries.

To enable reading the IEOT extension before reading all the variable
length cache entries and other extensions, the IEOT is written last,
right before the trailing SHA1.

The format of that extension has the signature bytes and size at the
beginning (like a normal extension) as well as at the end in reverse
order to enable parsing the extension by seeking back from the end of
the file.  See the diagram below for details.

During index load, read the index header then seek to the end of the
index, back up past the trailing SHA1 and look for the IEOT extension
signature bytes.  If they exist, read the 32-bit size and seek back to
the extension header and verify the leading header and size bits.  If
they all match, we can be assured we have a valid IEOT extension.

If the IEOT extension is available, create multiple threads to divide
the work of loading and converting the cache entries across all
available CPU cores.  Once the cache entries are loaded, the rest of the
extensions can be loaded and processed normally (skipping the IEOT entry
as it has already been processed).  If the IEOT extension is not
available then parsing the index will proceed as usual with a single thread.

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
IEOT signature bytes
32-bit size
32-bit version
32-bit Cache Entry Offset 1
32-bit Cache Entry count
32-bit Cache Entry Offset 2
32-bit Cache Entry count
.
.
32-bit version
32-bit size
IEOT signature bytes
SHA1

Signed-off-by: Ben Peart 
---

Notes:
 Base Ref: v2.14.3.windows.1
 Web-Diff: https://github.com/benpeart/git/commit/1e818c7835
 Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git 
checkout 1e818c7835

  Makefile  |   2 +
  cache.h   |  18 +++
  config.c  |  20 +++
  config.h  |   1 +
  environment.c |   3 +
  read-cache.c  | 340 +++---
  t/helper/test-dump-ieot.c |  78 +++
  t/helper/test-ieot.c  |  72 ++
  8 files changed, 513 insertions(+), 21 deletions(-)
  create mode 100644 t/helper/test-dump-ieot.c
  create mode 100644 t/helper/test-ieot.c

diff --git a/Makefile b/Makefile
index ebd0a75d87..99fa8dd8d3 100644
--- a/Makefile
+++ b/Makefile
@@ -640,12 +640,14 @@ TEST_PROGRAMS_NEED_X += test-config
  TEST_PROGRAMS_NEED_X += test-date
  TEST_PROGRAMS_NEED_X += test-delta
  TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-ieot
  TEST_PROGRAMS_NEED_X += test-dump-split-index
  TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
  TEST_PROGRAMS_NEED_X += test-fake-ssh
  TEST_PROGRAMS_NEED_X += test-genrandom
  TEST_PROGRAMS_NEED_X += test-hashmap
  TEST_PROGRAMS_NEED_X += test-helper
+TEST_PROGRAMS_NEED_X += test-ieot
  TEST_PROGRAMS_NEED_X += test-index-version
  TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
  TEST_PROGRAMS_NEED_X += test-line-buffer
diff --git 

[RFC] fastindex: parallelize index load

2017-11-08 Thread Ben Peart
This patch will address the CPU cost of loading the index by adding
additional data to the index that will allow us to multi-thread the
loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  With
version 2, 3 or even 4 indexes, we can utilize the Index Entry Offset Table
(IEOT) to parallelize the loading and conversion of the cache entries
across all available CPU cores.

To make this work for V4 indexes, when writing the index, it periodically
"resets" the compression by encoding the current entry as if the path
name for the previous entry is completely different and saves the offset
of that entry in the IEOT.  Basically, with V4 indexes, it generates
offsets into blocks of prefix-compressed entries.

To enable reading the IEOT extension before reading all the variable
length cache entries and other extensions, the IEOT is written last,
right before the trailing SHA1.

The format of that extension has the signature bytes and size at the
beginning (like a normal extension) as well as at the end in reverse
order to enable parsing the extension by seeking back from the end of
the file.  See the diagram below for details.

During index load, read the index header then seek to the end of the
index, back up past the trailing SHA1 and look for the IEOT extension
signature bytes.  If they exist, read the 32-bit size and seek back to
the extension header and verify the leading header and size bits.  If
they all match, we can be assured we have a valid IEOT extension.

If the IEOT extension is available, create multiple threads to divide
the work of loading and converting the cache entries across all
available CPU cores.  Once the cache entries are loaded, the rest of the
extensions can be loaded and processed normally (skipping the IEOT entry
as it has already been processed).  If the IEOT extension is not
available then parsing the index will proceed as usual with a single thread.

The on-disk format looks like this:

Index header
Cache entry 1
Cache entry 2
.
.
Extension 1
Extension 2
.
.
Index Entry Offset Table Extension (must be written last!)
IEOT signature bytes
32-bit size
32-bit version
32-bit Cache Entry Offset 1
32-bit Cache Entry count
32-bit Cache Entry Offset 2
32-bit Cache Entry count
.
.
32-bit version
32-bit size
IEOT signature bytes
SHA1

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: v2.14.3.windows.1
Web-Diff: https://github.com/benpeart/git/commit/1e818c7835
Checkout: git fetch https://github.com/benpeart/git fastindex-v1 && git 
checkout 1e818c7835

 Makefile  |   2 +
 cache.h   |  18 +++
 config.c  |  20 +++
 config.h  |   1 +
 environment.c |   3 +
 read-cache.c  | 340 +++---
 t/helper/test-dump-ieot.c |  78 +++
 t/helper/test-ieot.c  |  72 ++
 8 files changed, 513 insertions(+), 21 deletions(-)
 create mode 100644 t/helper/test-dump-ieot.c
 create mode 100644 t/helper/test-ieot.c

diff --git a/Makefile b/Makefile
index ebd0a75d87..99fa8dd8d3 100644
--- a/Makefile
+++ b/Makefile
@@ -640,12 +640,14 @@ TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-ieot
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
 TEST_PROGRAMS_NEED_X += test-fake-ssh
 TEST_PROGRAMS_NEED_X += test-genrandom
 TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-helper
+TEST_PROGRAMS_NEED_X += test-ieot
 TEST_PROGRAMS_NEED_X += test-index-version
 TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
 TEST_PROGRAMS_NEED_X += test-line-buffer
diff --git a/cache.h b/cache.h
index 45597732d8..e9ec1bf41a 100644
--- a/cache.h
+++ b/cache.h
@@ -326,6 +326,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define IEOT_CHANGED   (1 << 9)
 
 struct split_index;
 struct untracked_cache;
@@ -770,6 +771,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_fast_index;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
@@ -2025,4 +2027,20 @@ void sleep_millisec(int millisec);
  */
 void safe_create_dir(const char *dir, int share);
 
+
+#ifndef NO_PTHREADS
+struct index_entry_offset
+{  // starting byte offset into index file, count of index entries in this 
block
+   int offset, nr;
+};
+
+struct index_entry_offset_table
+{
+   int nr; // number of ieot entries in array
+   struct index_entry_offset entries[0];
+};
+
+struct 

Re: [PATCH v2 0/6] Partial clone part 1: object filtering

2017-11-08 Thread Jeff Hostetler



On 11/7/2017 7:54 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


I can see some use for this parameter - for example, when doing a report
for statistical purposes (percentage of objects missing, for example) or
for a background task that downloads missing objects into a cache. Also,
power users who know what they're doing (or normal users in an
emergency) can use this option when they have no network connection if
they really need to find something out from the local repo.

In these cases, the promisor check (after detecting that the object is
missing) is indeed not so useful, I think. (Or we can do the
--exclude=missing and --exclude=promisor idea that Jeff mentioned -
--exclude=missing now, and --exclude=promisor after we add promisor
support.)


This sounds like a reasonable thing to have in the endgame state to
me.


OK thanks, I'll change it to --exclude=missing in my next version.
 



Having said that, I would be OK if we didn't have tolerance (and/or
reporting) of missing objects right now. As far as I know, for the
initial implementation of partial clone, only the server performs any
filtering, and we assume that the server possesses all objects (so it
does not need to filter out any missing objects).


True.  It does not have to exist in an early part, but I do not
think we would terribly mind if it does, if only to help debugging
and development.

Thanks for thinking it through.



Right, it could come later, but having it here in part 1 as part
of the initial series completes the pre-promisor portion of these
commands.  Having a print-missing option lets rev-list -- as is --
be used in a bulk-fetch-object pre-checkout hook or as part of a
"give me what I need before I go offline" command.  This is useful
by itself.  It does augment the dynamic fetch-object code added in
part 2 and the unpack-trees changes in part 3 to call fetch-object.

Jeff




Re: [PATCH v2] doc/SubmittingPatches: correct subject guidance

2017-11-08 Thread Eric Sunshine
On Wed, Nov 8, 2017 at 8:47 AM, Adam Dinwoodie  wrote:
> The examples and common practice for adding markers such as "RFC" or
> "v2" to the subject of patch emails is to have them within the same
> brackets as the "PATCH" text, not after the closing bracket.  Further,
> the practice of `git format-patch` and the like, as well as what appears
> to be the more common pratice on the mailing list, is to use "[RFC
> PATCH]", not "[PATCH/RFC]".
>
> Update the SubmittingPatches article to match.
>
> Signed-off-by: Adam Dinwoodie 
> ---
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> @@ -184,12 +184,14 @@ lose tabs that way if you are not careful.
>  It is a common convention to prefix your subject line with
>  [PATCH].  This lets people easily distinguish patches from other
> -e-mail discussions.  Use of additional markers after PATCH and
> -the closing bracket to mark the nature of the patch is also
> -encouraged.  E.g. [PATCH/RFC] is often used when the patch is
> -not ready to be applied but it is for discussion, [PATCH v2],
> -[PATCH v3] etc. are often seen when you are sending an update to
> -what you have previously sent.
> +e-mail discussions.  Use of markers in addition to PATCH within
> +the brackets to describe the nature of the patch is also
> +encouraged.  E.g. [RFC PATCH] is often used when the patch is not
> +ready to be applied but it is for discussion, and can be added
> +with the `--rfc` argument to `git format-patch` or `git
> +send-email`, while [PATCH v2], [PATCH v3] etc.  are often seen

It has become a bit of a run-on sentence, but aside from that and the
unnecessary extra whitespace between "etc." and "are", it looks good
to me.

> +when you are sending an update to what you have previously sent,
> +and can be added with the `-v ` arguments to the same commands.


[RFC PATCH] rebisect: add script for easier bisect log editing

2017-11-08 Thread Adam Dinwoodie
Add a short script, vaguely inspired by `git rebase --interactive`, to
ease the process described in the `git bisect` documentation of saving
off a bisect log, editing it, then replaying it.

Signed-off-by: Adam Dinwoodie 
---

When I'm bisecting, I find I need to semi-regularly go back and change
my good/bad/skip response for some commits.  The bisect documentation
describes doing this by saving `git bisect log` output, editing it, then
using `git bisect replay`.  Which is a perfectly fine technique, but
automation is A Good Thing(TM).  The below script is a short proof of
concept for changing this process to be a single command.

Ideally (at least from my perspective), this function would be rolled
into the main `git bisect` tool, as `git bisect edit` or similar.
Before I start working on that, however, I wanted to see what the list
thought of the idea.

 contrib/git-rebisect.sh | 12 
 1 file changed, 12 insertions(+)
 create mode 100755 contrib/git-rebisect.sh

diff --git a/contrib/git-rebisect.sh b/contrib/git-rebisect.sh
new file mode 100755
index 0..60f20b278
--- /dev/null
+++ b/contrib/git-rebisect.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+GIT_EDITOR="$(git var GIT_EDITOR)"
+GIT_DIR="$(git rev-parse --git-dir)"
+GIT_BISECT_LOG_TMP="${GIT_DIR}/BISECT_LOG_EDIT"
+
+git bisect log >"$GIT_BISECT_LOG_TMP"
+"$GIT_EDITOR" "$GIT_BISECT_LOG_TMP"
+git bisect reset HEAD
+git bisect start
+git bisect replay "$GIT_BISECT_LOG_TMP"
+rm -f "$GIT_BISECT_LOG_TMP"
-- 
2.14.3



[PATCH v2] doc/SubmittingPatches: correct subject guidance

2017-11-08 Thread Adam Dinwoodie
The examples and common practice for adding markers such as "RFC" or
"v2" to the subject of patch emails is to have them within the same
brackets as the "PATCH" text, not after the closing bracket.  Further,
the practice of `git format-patch` and the like, as well as what appears
to be the more common pratice on the mailing list, is to use "[RFC
PATCH]", not "[PATCH/RFC]".

Update the SubmittingPatches article to match.

Signed-off-by: Adam Dinwoodie 
---

I'm re-rolling this patch with some more substantive changes, as a bit
more research points to (a) "RFC PATCH" being more common on this
mailing list than "PATCH/RFC", at least in recent usage, and (b) so the
instructions match the best practice according to `git format-patch` and
friends.

 Documentation/SubmittingPatches | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 558d465b6..95abf6084 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -184,12 +184,14 @@ lose tabs that way if you are not careful.
 
 It is a common convention to prefix your subject line with
 [PATCH].  This lets people easily distinguish patches from other
-e-mail discussions.  Use of additional markers after PATCH and
-the closing bracket to mark the nature of the patch is also
-encouraged.  E.g. [PATCH/RFC] is often used when the patch is
-not ready to be applied but it is for discussion, [PATCH v2],
-[PATCH v3] etc. are often seen when you are sending an update to
-what you have previously sent.
+e-mail discussions.  Use of markers in addition to PATCH within
+the brackets to describe the nature of the patch is also
+encouraged.  E.g. [RFC PATCH] is often used when the patch is not
+ready to be applied but it is for discussion, and can be added
+with the `--rfc` argument to `git format-patch` or `git
+send-email`, while [PATCH v2], [PATCH v3] etc.  are often seen
+when you are sending an update to what you have previously sent,
+and can be added with the `-v ` arguments to the same commands.
 
 "git format-patch" command follows the best current practice to
 format the body of an e-mail message.  At the beginning of the
-- 
2.14.3



[PATCH] rebase -i: fix comment typo

2017-11-08 Thread Adam Dinwoodie
Signed-off-by: Adam Dinwoodie 
---
 git-rebase--interactive.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2563dc52d..437815669 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -722,7 +722,7 @@ collapse_todo_ids() {
git rebase--helper --shorten-ids
 }
 
-# Add commands after a pick or after a squash/fixup serie
+# Add commands after a pick or after a squash/fixup series
 # in the todo list.
 add_exec_commands () {
{
-- 
2.14.3



[PATCH] doc/SubmittingPatches: correct subject guidance

2017-11-08 Thread Adam Dinwoodie
The examples and common practice for adding markers such as "RFC" or
"v2" to the subject of patch emails is to have them within the same
brackets as the "PATCH" text.  Update the description to match this
behaviour, rather than asserting such markers should be after the
closing bracket.

Signed-off-by: Adam Dinwoodie 
---
 Documentation/SubmittingPatches | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 558d465b6..e91ce8269 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -184,7 +184,7 @@ lose tabs that way if you are not careful.
 
 It is a common convention to prefix your subject line with
 [PATCH].  This lets people easily distinguish patches from other
-e-mail discussions.  Use of additional markers after PATCH and
+e-mail discussions.  Use of additional markers between PATCH and
 the closing bracket to mark the nature of the patch is also
 encouraged.  E.g. [PATCH/RFC] is often used when the patch is
 not ready to be applied but it is for discussion, [PATCH v2],
-- 
2.14.3



Re: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-11-08 Thread Michael J Gruber
Ekelhart Jakob venit, vidit, dixit 08.11.2017 09:52:
> Thank you for all the effort to fix this issue. Unfortunately, we are still 
> suffering from this and our workaround just stopped being sufficient.
> 
> We were wondering if there is any way to tell when this fix will be released?
> 
> BR Jakob

Soon (TM) :)

Term start kept me busy, but I'll try and resume dangling topics this
week or next.

It seems the consensus was that current functionality is as designed but
not necessarily as expected, and another mode "--fork-base" (that does
what I suggested as "fix") would meet these expectations. I would reuse
the documentation of the current mode as a description of the new mode
and add documentation for the existing mode ;)

Michael


RE: [PATCH 2/3] merge-base: return fork-point outside reflog

2017-11-08 Thread Ekelhart Jakob
Thank you for all the effort to fix this issue. Unfortunately, we are still 
suffering from this and our workaround just stopped being sufficient.

We were wondering if there is any way to tell when this fix will be released?

BR Jakob

-Original Message-
From: Junio C Hamano *EXTERN* [mailto:gits...@pobox.com] 
Sent: Dienstag, 3. Oktober 2017 08:06
To: Michael J Gruber 
Cc: git@vger.kernel.org; Ekelhart Jakob ; Jeff King 
; Johannes Schindelin 
Subject: Re: [PATCH 2/3] merge-base: return fork-point outside reflog

Junio C Hamano  writes:

> Michael J Gruber  writes:
>
>> I'm still trying to understand what the original intent was: If we 
>> abstract from the implementation (as we should, as you rightly
>> emphasize) and talk about historical tips then we have to ask ourselves:
>> - What is "historical"?
>> - What is tip?
>> - Tip of what, i.e. what is a "branch"?
>
> The feature was meant to be a solution for "upstream rebased the 
> branch I based my work on."
> ...

So, what is the status of this thing?

While I think 1/3 and 3/3 of these three definitely make sense, I do not think 
"fork-point outside reflog" does as-is If it is not even part of the commits 
that were known to be at the tip some time in the past (including "right 
now"---which is the fix you made with 3/3 is about), then the patch may make 
the command return something in more situations, and these extra things that it 
returns might even be improvements, but they are definitely not "fork-points".

To be quite honest, I am not convinced that the extra output you would get out 
of the command by removing the latter half of "which are the ancestors that 
were known to be at the tip?" would always give better commit to use as the 
beginning of the topic to be rebased, as I do not see any reasoning behind why, 
unlike the filtered case where there _is_ a strong reasoning (with explained
limitation) behind it.

As long as the code misidentifies and picks a commits deeper than necessary, 
which will force the user to say "rebase --skip", I think we are OK.  What we 
want to absolutely avoid is the opposite; somehow the code misidentifies a 
commit that is part of the work you want to rebase as a recommended fork-point, 
which would cause the rebase to silently lose changes.  I do not think I saw 
why it won't happen explained in the log message of 2/3 at all.