Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-14 Thread Duy Nguyen
On Thu, Mar 15, 2018 at 2:34 AM, Junio C Hamano  wrote:
> * nd/pack-objects-pack-struct (2018-03-05) 9 commits
>  - pack-objects: reorder 'hash' to pack struct object_entry
>  - pack-objects: refer to delta objects by index instead of pointer
>  - pack-objects: move in_pack out of struct object_entry
>  - pack-objects: move in_pack_pos out of struct object_entry
>  - pack-objects: note about in_pack_header_size
>  - pack-objects: use bitfield for object_entry::depth
>  - pack-objects: use bitfield for object_entry::dfs_state
>  - pack-objects: turn type and in_pack_type to bitfields
>  - pack-objects: document holes in struct object_entry.h
>
>  "git pack-objects" needs to allocate tons of "struct object_entry"
>  while doing its work, and shrinking its size helps the performance
>  quite a bit.
>
>  Will merge to 'next'.

Hold it. A reroll is coming. I'm a bit busy this week and can't really do much.
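
For readers following the series, the kind of saving described in the topic summary above can be illustrated with a small, self-contained sketch. Both structs are hypothetical stand-ins, not the real "struct object_entry"; the field names, the bit widths, and the "index 0 means no delta base" convention are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "before" layout: full-width fields and a pointer, so the
 * compiler has to insert padding between the members.
 */
struct entry_before {
	uint8_t type;
	uint64_t size;
	uint8_t in_pack_type;
	struct entry_before *delta;	/* delta base referenced by pointer */
	int depth;
	uint8_t dfs_state;
};

/*
 * Hypothetical "after" layout: the small values share one word as
 * bitfields, and the delta base is an index into the entry array.
 */
struct entry_after {
	uint64_t size;
	uint32_t delta_idx;	/* assumed convention: 0 means "no delta base" */
	unsigned type:3;
	unsigned in_pack_type:3;
	unsigned depth:12;
	unsigned dfs_state:2;
};

/* Bytes saved per entry on whatever ABI this is compiled for. */
size_t bytes_saved_per_entry(void)
{
	assert(sizeof(struct entry_after) <= sizeof(struct entry_before));
	return sizeof(struct entry_before) - sizeof(struct entry_after);
}
```

The exact sizes depend on the ABI, so no numbers are claimed here. The trade-off is that a bitfield caps the representable range (12 bits of depth holds at most 4095), so code using such a layout has to cope with values that do not fit.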

> * nd/worktree-prune (2018-03-06) 3 commits
>  - worktree prune: improve prune logic when worktree is moved
>  - worktree: delete dead code
>  - gc.txt: more details about what gc does
>
>  The way "git worktree prune" worked internally has been simplified,
>  by assuming how "git worktree move" moves an existing worktree to a
>  different place.
>
>  Will merge to 'next'.

Same.
-- 
Duy


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-14 Thread Sergey Organov
Hi Buga,

Igor Djordjevic  writes:

> On 14/03/2018 15:24, Sergey Organov wrote:
[...]
>> Thinking about it I've got an idea that what we actually need is
>> --no-flatten flag that, when used alone, will just tell "git rebase" to
>> stop flattening history, and which will be implicitly imposed by
>> --recreate-merges (and --preserve-merges).
>> 
>> Then the only thing the --recreate-merges will tune is to put 'merge'
>> directives into the todo list for merge commits, exactly according to
>> what its name suggests, while the default behavior will be to put 'pick'
>> with suitable syntax into the todo. And arguments to the
>> --recreate-merge will specify additional options for the 'merge'
>> directive, obviously.
>
> This seems to basically boil down to what I mentioned previously[2]
> through use of new `--rebase-merges` alongside `--recreate-merges`, just 
> that you named it `--no-flatten` here, but the point is the same - and 
> not something Johannes liked, "polluting" rebase option space further.

Not quite so. The problem with --XXX-merges flags is that they do two
things at once: they say _what_ to do and _how_ to do it. Clean UI
designs usually have these things separate, and that's what I propose.

The --[no-]flatten says _what_ (not) to do, and --recreate-merges says
_how_ exactly it will be performed. In this model --no-flatten could
have been called, say --preserve-shape, but not --rebase-merges.

To minimize pollution, the _how_ part could rather be made option value:

--no-flatten[=]

where  is 'rebase', 'remerge', etc.

In this case we will need separate option to specify strategy options,
if required, that will lead us to something similar to the set of merge
strategies options.

> I would agree with him, and settling onto `--rebase-merges` _instead_ of 
> `--recreate-merges` seems like a more appropriate name, indeed, now that
> default behavior is actually merge commit rebasing and not recreating 
> (recreating still being possible through user editing the todo list).

I hope he'd be pleased to be able to say --no-flatten=remerge and get
back his current mode of operation, that he obviously has a good use
for.

-- Sergey


What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-14 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/gc-auto-in-commit (2018-03-01) 1 commit
  (merged to 'next' on 2018-03-02 at 96a5a4d629)
 + commit: run git gc --auto just before the post-commit hook

 "git commit" used to run "gc --auto" near the end, which was lost
 by mistake when the command was reimplemented in C.


* ab/pre-auto-gc-battery (2018-02-28) 1 commit
  (merged to 'next' on 2018-03-06 at ca9cb273cb)
 + hooks/pre-auto-gc-battery: allow gc to run on non-laptops

 A sample auto-gc hook (in contrib/) to skip auto-gc while on
 battery has been updated to almost always allow running auto-gc
 unless on_ac_power command is absolutely sure that we are on
 battery power (earlier, it skipped unless the command is sure that
 we are on ac power).
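
The change in policy can be sketched as two small predicates. This is an illustration in C rather than the hook's actual shell code, and it assumes the exit statuses of on_ac_power(1) as found on Debian-like systems (0 for mains power, 1 for battery, 255 when the state cannot be determined); the function names are hypothetical.

```c
#include <assert.h>

/* Assumed exit statuses of on_ac_power(1); see the note above. */
enum power_status {
	POWER_ON_AC = 0,
	POWER_ON_BATTERY = 1,
	POWER_UNKNOWN = 255
};

/* Earlier behaviour as described: run gc only when sure we are on AC. */
int old_hook_allows_gc(int status)
{
	assert(status >= 0 && status <= 255);
	return status == POWER_ON_AC;
}

/* Updated behaviour: skip gc only when sure we are on battery. */
int new_hook_allows_gc(int status)
{
	assert(status >= 0 && status <= 255);
	return status != POWER_ON_BATTERY;
}
```

The two policies differ only for the "unknown" case, which is what a desktop or server without battery information would typically report.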


* ag/userdiff-go-funcname (2018-03-01) 1 commit
  (merged to 'next' on 2018-03-02 at ea404d1be9)
 + userdiff: add built-in pattern for golang

 "git diff" and friends learned funcname patterns for Go language
 source files.


* bp/untracked-cache-noflush (2018-02-28) 2 commits
  (merged to 'next' on 2018-03-02 at 709887971b)
 + untracked cache: use git_env_bool() not getenv() for customization
 + dir.c: don't flag the index as dirty for changes to the untracked cache

 Writing out the index file when the only thing that changed in it
 is the untracked cache information is often wasteful, and this has
 been optimized out.


* ds/find-unique-abbrev-optim (2018-02-27) 1 commit
  (merged to 'next' on 2018-03-02 at 0b6d4f9335)
 + sha1_name: fix uninitialized memory errors

 While finding unique object name abbreviation, the code may
 accidentally have read beyond the end of the array of object names
 in a pack.


* ds/mark-parents-uninteresting-optim (2018-02-27) 1 commit
  (merged to 'next' on 2018-03-02 at 5a42c79806)
 + revision.c: reduce object database queries

 Micro optimization in revision traversal code.


* jc/test-must-be-empty (2018-02-27) 1 commit
  (merged to 'next' on 2018-03-02 at ec129f1b97)
 + test_must_be_empty: make sure the file exists, not just empty

 Test framework tweak to catch developer thinko.


* jh/status-no-ahead-behind (2018-01-24) 4 commits
  (merged to 'next' on 2018-03-02 at 68bde8d571)
 + status: support --no-ahead-behind in long format
 + status: update short status to respect --no-ahead-behind
 + status: add --[no-]ahead-behind to status and commit for V2 format.
 + stat_tracking_info: return +1 when branches not equal

 "git status" can spend a lot of cycles to compute the relation
 between the current branch and its upstream, which can now be
 disabled with "--no-ahead-behind" option.


* jk/add-i-diff-filter (2018-03-05) 2 commits
  (merged to 'next' on 2018-03-08 at 6ef737add3)
 + add--interactive: detect bogus diffFilter output
 + t3701: add a test for interactive.diffFilter

 The "interactive.diffFilter" used by "git add -i" must retain
 one-to-one correspondence between its input and output, but it was
 not enforced and caused end-user confusion.  We now at least make
 sure the filtered result has the same number of lines as its input
 to detect a broken filter.
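
 A minimal sketch of such a line-count sanity check, written in C for illustration (the real check lives in the add--interactive script, and the function names below are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Count lines in a NUL-terminated buffer; a final line without a
 * trailing newline still counts as a line. */
size_t count_lines(const char *text)
{
	size_t n = 0;
	const char *p;

	assert(text != NULL);
	for (p = text; *p; p++)
		if (*p == '\n')
			n++;
	if (p != text && p[-1] != '\n')
		n++;
	return n;
}

/* Returns 1 when the filtered diff has as many lines as the original,
 * i.e. the filter at least did not merge, drop or add lines. */
int filter_kept_line_count(const char *original, const char *filtered)
{
	return count_lines(original) == count_lines(filtered);
}
```

 Equal line counts do not prove a true one-to-one correspondence; the check only catches filters that are obviously broken, which matches the "at least" wording above.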


* jk/smart-http-protocol-doc-fix (2018-03-05) 1 commit
  (merged to 'next' on 2018-03-08 at 599b1a7c42)
 + smart-http: document flush after "# service" line

 A doc update.


* ma/roll-back-lockfiles (2018-02-28) 5 commits
  (merged to 'next' on 2018-03-06 at be29bf891c)
 + sequencer: do not roll back lockfile unnecessarily
 + merge: always roll back lock in `checkout_fast_forward()`
 + merge-recursive: always roll back lock in `merge_recursive_generic()`
 + sequencer: always roll back lock in `do_recursive_merge()`
 + sequencer: make lockfiles non-static
 (this branch is used by ma/skip-writing-unchanged-index.)

 Some codepaths used to take a lockfile and did not roll it back;
 they are automatically rolled back at program exit, so there is no
 real "breakage", but it still is a good practice to roll back when
 you are done with a lockfile.


* mk/doc-pretty-fill (2018-02-27) 1 commit
  (merged to 'next' on 2018-03-02 at 623461b127)
 + docs/pretty-formats: fix typo '% <()' -> '%<|()'

 Docfix.


* nd/diff-stat-with-summary (2018-02-27) 2 commits
  (merged to 'next' on 2018-03-06 at d543f92f5e)
 + diff: add --compact-summary
 + diff.c: refactor pprint_rename() to use strbuf

 "git diff" and friends learned "--compact-summary" that shows the
 information usually given with the "--summary" option on the same
 line as the diffstat output of the "--stat" option (which saves
 vertical space and keeps info on a 

Re: [PATCH v3 00/36] object_id part 12

2018-03-14 Thread brian m. carlson
On Wed, Mar 14, 2018 at 09:48:30AM -0700, Junio C Hamano wrote:
> As always, thanks for working on this.  
> 
> After this series, what jumps at me out of output from
> 
> git grep -e '[^0-9A-Za-z_][24]0[^0-9A-Za-z_]' -- '*.[ch]' \
>   ':!*sha1*' ':!contrib/' ':!compat/'
> 
> are code that parses the incoming patch in apply.c (where the full
> blob object names used for binary patches are assumed to be in
> SHA-1), builtin/pack-objects.c (where it has to know the current
> file format of a packfile intimately) and diff.c (where it clips the
> length to which the blob object names on the "index" lines are
> abbreviated to).  Changing 40 in the last one to "the hex length of
> the currently deployed hash" should be relatively uncontroversial.

I have patches that hit several of these places, but not diff.c.  I'll
probably pick that piece up next.

By the way, thanks for a useful regex.  It's a little bit different
than what I've been using, but provides a nice overview.
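
To make the diff.c point concrete, here is a rough sketch of clipping an abbreviation length to the hex length of the hash in use instead of a literal 40. The struct and function names are made up for this example and only loosely modelled on the_hash_algo; they are not the actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a hash algorithm. */
struct hash_algo_sketch {
	const char *name;
	size_t rawsz;	/* size of the binary hash in bytes */
	size_t hexsz;	/* size of its hex representation */
};

static const struct hash_algo_sketch sketch_sha1 = { "sha1", 20, 40 };

/* Clip a requested abbreviation length to what the algorithm can give. */
size_t clip_abbrev_len(const struct hash_algo_sketch *algo, size_t requested)
{
	assert(algo->hexsz == 2 * algo->rawsz);
	return requested > algo->hexsz ? algo->hexsz : requested;
}
```

With the limit taken from the algorithm description, the same call keeps working for a hash whose hex form is longer than 40 characters.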
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: [PATCH v3 00/36] object_id part 12

2018-03-14 Thread brian m. carlson
On Wed, Mar 14, 2018 at 10:31:37AM -0700, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> > -+  buf += the_hash_algo->rawsz;
> > -+  size -= the_hash_algo->rawsz;
> > ++  memcpy(it->oid.hash, (const unsigned char*)buf, rawsz);
> > ++  buf += rawsz;
> > ++  size -= rawsz;
> > }
> 
> Using memcpy() to stuff the hash[] field of oid structure with a
> bare byte array of rawsz bytes appears twice as a pattern in these
> patches.  I wonder if this is something we want to abstract behind
> the API, e.g.
> 
>   size_t oidstuff_(struct object_id *oid, const unsigned char *buf)
>   {
>           size_t rawsz = the_hash_algo->rawsz;
>           memcpy(oid->hash, buf, rawsz);
>           return rawsz;
>   }
> 
> It just felt a bit uneven to be using a bare-metal memcpy() when
> oidcpy() abstraction relieves the callers from having to be aware of
> the rawsz all the time.

Duy suggested oidread and oidwrite, which I can certainly implement.
I'm also comfortable with just keeping hashcpy around for these cases if
we want.
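
For illustration, a pair of helpers along those lines might look roughly like the sketch below. The names carry a _sketch suffix because this is not the eventual implementation; the struct definitions are simplified stand-ins, and passing the algorithm explicitly (rather than reading a global) is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_RAWSZ 32

/* Simplified stand-ins for struct object_id and the_hash_algo. */
struct object_id_sketch {
	unsigned char hash[SKETCH_MAX_RAWSZ];
};

struct hash_algo_sketch {
	size_t rawsz;
};

/* Fill oid from a raw byte buffer; returns the number of bytes consumed
 * so the caller can advance its read pointer. */
size_t oidread_sketch(struct object_id_sketch *oid, const unsigned char *buf,
		      const struct hash_algo_sketch *algo)
{
	assert(algo->rawsz <= sizeof(oid->hash));
	memcpy(oid->hash, buf, algo->rawsz);
	return algo->rawsz;
}

/* Write oid into a raw byte buffer; returns the number of bytes written. */
size_t oidwrite_sketch(unsigned char *buf, const struct object_id_sketch *oid,
		       const struct hash_algo_sketch *algo)
{
	assert(algo->rawsz <= sizeof(oid->hash));
	memcpy(buf, oid->hash, algo->rawsz);
	return algo->rawsz;
}
```

Returning the byte count lets a parsing loop write "buf += n; size -= n;" without knowing rawsz itself, which is the unevenness pointed out in the quoted review.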
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: [PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Jeff King
On Wed, Mar 14, 2018 at 11:01:01PM +, Ramsay Jones wrote:

> >> The version of libcurl installed was 0x070f04. So, while it was fresh in my
> >> mind, I applied and tested this patch.
> > 
> > Makes sense. This #if would go away under my "do not support antique
> > curl versions" proposal. I haven't really pushed that forward since Tom
> > Christensen's patches to actually make the thing build (and presumably
> > since he is building on antique versions he can't turn on -Werror
> > anyway, since IIRC it tends to have some false positives).
> 
> Yes, I suspected this would be removed by your proposed update (hence
> the cc:), but I didn't know what the ETA on your patches was. Is this
> worth doing, or are you about to re-visit that series?

I wasn't about to revisit it. I wasn't sure if we actually wanted to
proceed or not, since Git _does_ actually build now with the old
versions (and there was some minor grumbling about dropping
compatibility).

> > I'm not sure whether our ordering of these variables actually means
> > much, but arguably it makes sense to keep the proxy-related variables
> > near each other, even if one of them has to be surrounded by an #if.
> > 
> > I guess you were going for ordering the #if's in increasing version
> > order. I'm not sure the existing code follows that pattern very well.
> 
> Yes, that was the idea, but I was already in two minds about that!
> In the end I went with this, because not all of the proxy variables
> are together anyway. ;-) (see, for example, 'proxy_auth' and 
> 'curl_proxyuserpwd' around line #113).
> 
> So, I don't mind placing the #ifdef around the current definition
> (rather than moving it up), if you would prefer that. (It will have
> to be tomorrow, since I have put that laptop away now!).

I'm OK with it either way, then. Thanks.

-Peff


Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-14 Thread Igor Djordjevic
Hi Sergey,

On 14/03/2018 08:21, Sergey Organov wrote:
> 
> There are still 2 issues about the implementation that need to be
> discussed though:
> 
> 1. Still inverted order of the second merge compared to RFC.
> 
> It'd be simple to "fix" again, except I'm not sure it'd be better, and
> as there is no existing experiences with this step to follow, it
> probably should be left as in the original, where it means "merge the
> changes made in B' (w.r.t B) into our intermediate version of the
> resulting merge".
> 
> The original Phillip's version seems to better fit the asymmetry between
> mainline and side-branch handling.
> 
> The actual difference will be only in the order of ours vs theirs in
> conflicts though, and thus it's not that critical.

Shouldn`t this be easy to solve just by changing the order of  
and , on passing to `git merge-recursive`, if needed? (or 
that`s what you meant by "simple to fix"?)

> 2. The U1' == U2' consistency check in RFC that I still think is worth
> to be implemented.

At the moment, I think we`d appreciate test cases where it actually 
proves useful, as the general consensus seems to be leaning towards 
it possibly being annoying (over-paranoid).

> In application to the method being discussed, we only need the check if
> the final merge went without conflicts, so the user was not already
> involved, and the check itself is then pretty simple:
> 
>  "proceed without stop only if $tree = $tree_U1'"
> 
> Its equivalence to the U1' == U2' test in the RFC follows from the fact
> that if M' is non-conflicting merge of U1' and U2', then M' == U1' if
> and only if U2' == U1'.

Nicely spotted! I`m glad there`s still (kind of) former U1' == U2' check
in this approach, too, in case it proves useful :)

> Finally, here is a sketch of the implementation that I'd suggest to
> use:
> 
> git-rebase-first-parent --onto A' M
> tree_U1'=$(git write-tree)
> git merge-recursive B -- $tree_U1' B'
> tree=$(git write-tree)
> M'=$(git log --pretty=%B -1 M | git commit-tree -pA' -pB')
> [ $conflicted_last_merge = "yes" ] ||
>   trees-match $tree_U1' $tree || 
>   stop-for-user-amendment

Yes, in case where we would want the "no-op merge" check (equivalent 
to U1' == U2' with original approach), this aligns with something I 
would expect.

Note that all the "rebase merge commit" steps leading to the check 
will/should probably be observed as a single one from user`s perspective 
(in worst case ending with nested conflicts we discussed), thus 
`$conflicted_last_merge` is not related to `merge-recursive` step(s) 
only, but `rebase-first-parent`, too (just in case this isn`t implied).

Might be easier to reason about simply as `[ $conflicts = "yes" ] || `
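
As a rough restatement of that last step, the decision could be written as the function below. It is only a sketch in C of the quoted shell logic: the names are hypothetical, tree identities are compared as plain strings, and "had_conflicts" is assumed to cover conflicts from any of the steps, as suggested above.

```c
#include <assert.h>
#include <string.h>

enum rebase_merge_outcome {
	NO_EXTRA_STOP,		/* continue, or the user was already involved */
	STOP_FOR_AMENDMENT	/* clean merge, but the result differs from U1' */
};

/*
 * Mirrors: [ $conflicts = "yes" ] || trees-match $tree_U1' $tree ||
 *          stop-for-user-amendment
 */
enum rebase_merge_outcome check_rebased_merge(int had_conflicts,
					      const char *tree,
					      const char *tree_u1)
{
	assert(tree != NULL && tree_u1 != NULL);
	if (had_conflicts)
		return NO_EXTRA_STOP;
	if (strcmp(tree, tree_u1) == 0)
		return NO_EXTRA_STOP;
	return STOP_FOR_AMENDMENT;
}
```

The extra stop happens only in the one case where nothing else would have drawn the user`s attention: a merge that applied cleanly yet changed the tree.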

> where 'git-rebase-first-parent' denotes whatever machinery is currently
> being used to rebase simple non-merge commit. Handy approximation of
> which for stand-alone scripting is:
> 
> git checkout --detach A' && git cherry-pick -m 1 M
> 
> [As an interesting note, observe how, after all, that original Johannes
> Sixt's idea of rebasing of merge commit by cherry-picking its first
> parent is back there.]

Heh ;) It`s always a bit enlightening when people start from different 
positions, opposing ones, even (or so it may seem, at least), but 
eventually end up in the same place, through means of open (minded) 
discussion.

Regards, Buga


Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-14 Thread Igor Djordjevic
On 14/03/2018 15:24, Sergey Organov wrote:
> 
> > > Second side note: if we can fast-forward, currently we prefer that, and I
> > > think we should keep that behavior with -R, too.
> >
> > I agree.
> 
> I'm admittedly somewhat lost in the discussion, but are you talking
> fast-forward on _rebasing_ existing merge? Where would it go in any of
> the suggested algorithms of rebasing and why?
> 
> I readily see how it can break merges. E.g., any "git merge --ff-only
> --no-ff" merge will magically disappear. So, even if somehow supported,
> fast-forward should not be performed by default during _rebasing_ of a
> merge.

Hmm, now that you brought this up, I can only agree, of course.

What I had in my mind was more similar to "no-rebase-cousins", like 
if we can get away without actually rebasing the merge but still 
using the original one, do it. But I guess that`s not what Johannes 
originally asked about.

This is another definitive difference between rebasing (`pick`?) and 
recreating (`merge`) a merge commit - in the case where we`re rebasing, 
of course it doesn`t make sense to drop commit this time (due to 
fast-forward). This does make sense in recreating the merge (only).

> > > If the user wants to force a new merge, they simply remove that -R
> > > flag.

And this sounds wrong now, too, because we actually have _three_
possible behaviors here - (1) rebase merge commit, which should 
always do what its told (so no fast-forwarding, otherwise the whole 
concept of rebasing a merge commit doesn`t make sense), and recreate 
merge commit, which should (2) by default use fast-forward where 
possible (or whatever the settings say), but (3) also be possible to 
force a new merge as well (through standard `--no-ff`, I guess, or 
something).

> Alternatively, they'd replace 'pick' with 'merge', as they already do
> for other actions. "A plurality is not to be posited without necessity".
> 
> Please, _please_, don't use 'merge' command to 'pick' merge commits!
> It's utterly confusing!

I agree here, as previously discussed[1], but let`s hear Johannes.

> Thinking about it I've got an idea that what we actually need is
> --no-flatten flag that, when used alone, will just tell "git rebase" to
> stop flattening history, and which will be implicitly imposed by
> --recreate-merges (and --preserve-merges).
> 
> Then the only thing the --recreate-merges will tune is to put 'merge'
> directives into the todo list for merge commits, exactly according to
> what its name suggests, while the default behavior will be to put 'pick'
> with suitable syntax into the todo. And arguments to the
> --recreate-merge will specify additional options for the 'merge'
> directive, obviously.

This seems to basically boil down to what I mentioned previously[2]
through use of new `--rebase-merges` alongside `--recreate-merges`, just 
that you named it `--no-flatten` here, but the point is the same - and 
not something Johannes liked, "polluting" rebase option space further.

I would agree with him, and settling onto `--rebase-merges` _instead_ of 
`--recreate-merges` seems like a more appropriate name, indeed, now that
default behavior is actually merge commit rebasing and not recreating 
(recreating still being possible through user editing the todo list).

Now, the only thing left seems to be agreeing on actual command to 
use to rebase the merge commit, to `pick` it, so to say... ;)

Regards, Buga

[1] https://public-inbox.org/git/77b695d0-7564-80d7-d9e6-70a531e66...@gmail.com/
[2] https://public-inbox.org/git/b329bb98-f9d6-3d51-2513-465aad2fa...@gmail.com/


Re: [PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Ramsay Jones


On 14/03/18 22:15, Jeff King wrote:
> On Wed, Mar 14, 2018 at 09:56:06PM +, Ramsay Jones wrote:
> 
>> Signed-off-by: Ramsay Jones 
>> ---
>>
>> Hi Junio,
>>
>> I happened to be building git on an _old_ laptop earlier this evening
>> and gcc complained, thus:
>>
>>   CC http.o
>>   http.c:77:20: warning: ‘curl_no_proxy’ defined but not used [-Wunused-variable]
>>    static const char *curl_no_proxy;
>>                       ^
>> The version of libcurl installed was 0x070f04. So, while it was fresh in my
>> mind, I applied and tested this patch.
> 
> Makes sense. This #if would go away under my "do not support antique
> curl versions" proposal. I haven't really pushed that forward since Tom
> Christensen's patches to actually make the thing build (and presumably
> since he is building on antique versions he can't turn on -Werror
> anyway, since IIRC it tends to have some false positives).

Yes, I suspected this would be removed by your proposed update (hence
the cc:), but I didn't know what the ETA on your patches was. Is this
worth doing, or are you about to re-visit that series?

> I agree with Jonathan that this explanation should be in the commit
> message. The patch itself looks OK, although:
> 
>> diff --git a/http.c b/http.c
>> index 8c11156ae..a5bd5d62c 100644
>> --- a/http.c
>> +++ b/http.c
>> @@ -69,6 +69,9 @@ static const char *ssl_key;
>>  #if LIBCURL_VERSION_NUM >= 0x070908
>>  static const char *ssl_capath;
>>  #endif
>> +#if LIBCURL_VERSION_NUM >= 0x071304
>> +static const char *curl_no_proxy;
>> +#endif
>>  #if LIBCURL_VERSION_NUM >= 0x072c00
>>  static const char *ssl_pinnedkey;
>>  #endif
>> @@ -77,7 +80,6 @@ static long curl_low_speed_limit = -1;
>>  static long curl_low_speed_time = -1;
>>  static int curl_ftp_no_epsv;
>>  static const char *curl_http_proxy;
>> -static const char *curl_no_proxy;
> 
> I'm not sure whether our ordering of these variables actually means
> much, but arguably it makes sense to keep the proxy-related variables
> near each other, even if one of them has to be surrounded by an #if.
> 
> I guess you were going for ordering the #if's in increasing version
> order. I'm not sure the existing code follows that pattern very well.

Yes, that was the idea, but I was already in two minds about that!
In the end I went with this, because not all of the proxy variables
are together anyway. ;-) (see, for example, 'proxy_auth' and 
'curl_proxyuserpwd' around line #113).

So, I don't mind placing the #ifdef around the current definition
(rather than moving it up), if you would prefer that. (It will have
to be tomorrow, since I have put that laptop away now!).
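
To spell out the option of guarding the definition in place, here is a stand-alone sketch. It does not use the real libcurl headers: SKETCH_CURL_VERSION_NUM is a hypothetical stand-in for LIBCURL_VERSION_NUM, defaulting to the old 0x070f04 from the laptop above, and the function is invented for the example. The threshold 0x071304 encodes version 7.19.4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for LIBCURL_VERSION_NUM from the curl headers. */
#ifndef SKETCH_CURL_VERSION_NUM
#define SKETCH_CURL_VERSION_NUM 0x070f04
#endif

static const char *sketch_http_proxy;
/* Define the variable under the same condition as its only users, so an
 * old library version does not leave it "defined but not used". */
#if SKETCH_CURL_VERSION_NUM >= 0x071304
static const char *sketch_no_proxy;
#endif

/* Record the proxy settings and return how many are actually honoured. */
int sketch_apply_proxy_settings(const char *proxy, const char *no_proxy)
{
	int honoured = 0;

	sketch_http_proxy = proxy;
	if (sketch_http_proxy)
		honoured++;
#if SKETCH_CURL_VERSION_NUM >= 0x071304
	sketch_no_proxy = no_proxy;
	if (sketch_no_proxy)
		honoured++;
#else
	(void)no_proxy;	/* not supported by this (old) version */
#endif
	assert(honoured >= 0 && honoured <= 2);
	return honoured;
}
```

This keeps the proxy-related variables next to each other at the cost of one more #if block in the middle of the list.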

ATB,
Ramsay Jones



Re: [PATCH 0/3] Switch the default PCRE from v1 to v2 + configure fixes

2018-03-14 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> This small series makes USE_LIBPCRE=YesPlease mean
> USE_LIBPCRE2=YesPlease, instead of USE_LIBPCRE1=YesPlease as it does
> now. Along the way I fixed a couple of minor issues in the PCRE
> detection in the autoconf script.
>
> Ævar Arnfjörð Bjarmason (3):
>   configure: fix a regression in PCRE v1 detection
>   configure: detect redundant --with-libpcre & --with-libpcre1
>   Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1
>
>  Makefile | 26 +-
>  configure.ac | 26 +++---
>  2 files changed, 28 insertions(+), 24 deletions(-)

Makes sense.  Will queue.


Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch

2018-03-14 Thread Jonathan Tan
On Wed, 14 Mar 2018 14:55:31 -0700
Junio C Hamano  wrote:

> Jonathan Tan  writes:
> 
> > When doing a partial clone or fetch with transfer.fsckobjects=1, use the
> > --fsck-objects instead of the --strict flag when invoking index-pack so
> > that links are not checked, only objects. This is because incomplete
> > links are expected when doing a partial clone or fetch.
> 
> It is expected that _some_ links are missing, but this makes me
> wonder if we can do better than disabling the connectivity check
> altogether.  Does "git fetch" lack sufficient information to attempt
> the connectivity check, and when (and only when) it hits a broken
> link, see if that is because the connectivity check traversal is
> crossing a "partial" fetch boundary, or something along that line?

Our only definition (currently) for the "partial" fetch boundary is
whether an object in a promisor packfile (a packfile obtained from the
promisor remote) references it, so I think that checking for crossing a
"partial" fetch boundary does not add anything. This is because by that
definition, any missing links observed from objects newly fetched from
the promisor remote cross a "partial" fetch boundary (since all objects
fetched in this way "promise" all objects that they refer to).

But it is true that we might be able to do better in checking, for
example, that a packfile fetched using a blob size limit contains all
referenced trees (that is, only blobs are allowed to be missing).


Re: How to debug a "git merge"?

2018-03-14 Thread Jeff King
On Wed, Mar 14, 2018 at 05:56:04PM +0100, Lars Schneider wrote:

> I am investigating a Git merge (a86dd40fe) in which an older version of 
> a file won over the newer version. I try to understand why this is the 
> case. I can reproduce the merge with the following commands:
> $ git checkout -b test a02fa3303
> $ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c
> 
> The merge actually generates a merge conflict but not for my
> problematic file. The common ancestor of the two parents (merge base) 
> is b91161554.
> 
> The merge graph is not pretty (the committers don't have a clean 
> branching scheme) but I cannot spot a problem between the merge commit
> and the common ancestor:
> $ git log --graph --oneline a86dd40fe
> 
> Can you give me a hint how to debug this merge further? How can I 
> understand why Git picked a certain version of a file in a merge?

Maybe a stupid question, but: did you make sure that the merge does
indeed pick the wrong version of the file? The other option is that
somebody mistakenly did a "checkout --ours" or similar while resolving
the conflict.

If the wrong file is indeed picked by the merge, then you may want to
try switching merge drivers. E.g., "-s resolve" is a bit simpler and
stupider than the default merge-recursive. If the problem goes away,
then we know it's a possible bug in merge-recursive (or maybe a
confusing implication of its strategy). If the problem is still there
with "resolve", then at least it may be easier to debug. ;)

-Peff


Re: [PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Jeff King
On Wed, Mar 14, 2018 at 09:56:06PM +, Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Junio,
> 
> I happened to be building git on an _old_ laptop earlier this evening
> and gcc complained, thus:
> 
>   CC http.o
>   http.c:77:20: warning: ‘curl_no_proxy’ defined but not used [-Wunused-variable]
>    static const char *curl_no_proxy;
>                       ^
> The version of libcurl installed was 0x070f04. So, while it was fresh in my
> mind, I applied and tested this patch.

Makes sense. This #if would go away under my "do not support antique
curl versions" proposal. I haven't really pushed that forward since Tom
Christensen's patches to actually make the thing build (and presumably
since he is building on antique versions he can't turn on -Werror
anyway, since IIRC it tends to have some false positives).

I agree with Jonathan that this explanation should be in the commit
message. The patch itself looks OK, although:

> diff --git a/http.c b/http.c
> index 8c11156ae..a5bd5d62c 100644
> --- a/http.c
> +++ b/http.c
> @@ -69,6 +69,9 @@ static const char *ssl_key;
>  #if LIBCURL_VERSION_NUM >= 0x070908
>  static const char *ssl_capath;
>  #endif
> +#if LIBCURL_VERSION_NUM >= 0x071304
> +static const char *curl_no_proxy;
> +#endif
>  #if LIBCURL_VERSION_NUM >= 0x072c00
>  static const char *ssl_pinnedkey;
>  #endif
> @@ -77,7 +80,6 @@ static long curl_low_speed_limit = -1;
>  static long curl_low_speed_time = -1;
>  static int curl_ftp_no_epsv;
>  static const char *curl_http_proxy;
> -static const char *curl_no_proxy;

I'm not sure whether our ordering of these variables actually means
much, but arguably it makes sense to keep the proxy-related variables
near each other, even if one of them has to be surrounded by an #if.

I guess you were going for ordering the #if's in increasing version
order. I'm not sure the existing code follows that pattern very well.

-Peff


Re: [PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Jonathan Nieder
Hi,

Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Junio,
>
> I happened to be building git on an _old_ laptop earlier this evening
> and gcc complained, thus:
>
>   CC http.o
>   http.c:77:20: warning: ‘curl_no_proxy’ defined but not used [-Wunused-variable]
>    static const char *curl_no_proxy;
>                       ^
> The version of libcurl installed was 0x070f04. So, while it was fresh in my
> mind, I applied and tested this patch.

Mind including this in the commit message?  Especially the error message
can be very useful.

With or without such a commit message tweak,
Reviewed-by: Jonathan Nieder 

This variable has been unused in the old-curl case since it was
introduced in v2.8.0-rc2~2^2 (http: honor no_http env variable to
bypass proxy, 2016-02-29).  Thanks for fixing it.

Sincerely,
Jonathan

> ATB,
> Ramsay Jones
> 
>  http.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/http.c b/http.c
> index 8c11156ae..a5bd5d62c 100644
> --- a/http.c
> +++ b/http.c
> @@ -69,6 +69,9 @@ static const char *ssl_key;
>  #if LIBCURL_VERSION_NUM >= 0x070908
>  static const char *ssl_capath;
>  #endif
> +#if LIBCURL_VERSION_NUM >= 0x071304
> +static const char *curl_no_proxy;
> +#endif
>  #if LIBCURL_VERSION_NUM >= 0x072c00
>  static const char *ssl_pinnedkey;
>  #endif
> @@ -77,7 +80,6 @@ static long curl_low_speed_limit = -1;
>  static long curl_low_speed_time = -1;
>  static int curl_ftp_no_epsv;
>  static const char *curl_http_proxy;
> -static const char *curl_no_proxy;
>  static const char *http_proxy_authmethod;
>  static struct {
>   const char *name;


Re: [PATCH v5 12/35] serve: introduce git-serve

2018-03-14 Thread Junio C Hamano
Brandon Williams  writes:

> Introduce git-serve, the base server for protocol version 2.
> ...
>  Documentation/Makefile  |   1 +
>  Documentation/technical/protocol-v2.txt | 174 +

asciidoc: ERROR: protocol-v2.txt: line 20: only book doctypes can contain level 0 sections
asciidoc: ERROR: protocol-v2.txt: line 374: [blockdef-listing] missing closing delimiter
Makefile:368: recipe for target 'technical/protocol-v2.html' failed

I'll redo today's integration cycle to see if I can move this topic
to a late part of 'pu', so that I can at least keep the part of 'pu'
that is beyond what is in 'next' and still usable a bit larger.  The
bw/protocol-v2 topic has been merged immediately above the topics
that are already in 'next' for the past week or so, but I cannot
afford to leave the build broken for majority of merges of 'pu'.



Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch

2018-03-14 Thread Junio C Hamano
Jonathan Tan  writes:

> When doing a partial clone or fetch with transfer.fsckobjects=1, use the
> --fsck-objects instead of the --strict flag when invoking index-pack so
> that links are not checked, only objects. This is because incomplete
> links are expected when doing a partial clone or fetch.

It is expected that _some_ links are missing, but this makes me
wonder if we can do better than disabling the connectivity check
altogether.  Does "git fetch" lack sufficient information to attempt
the connectivity check, and when (and only when) it hits a broken
link, see if that is because the connectivity check traversal is
crossing a "partial" fetch boundary, or something along that line?

>
> Signed-off-by: Jonathan Tan 
> ---
>  fetch-pack.c | 13 +++--
>  t/t5616-partial-clone.sh | 11 +++
>  2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index d97461296..1d6117565 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -886,8 +886,17 @@ static int get_pack(struct fetch_pack_args *args,
>   ? fetch_fsck_objects
>   : transfer_fsck_objects >= 0
>   ? transfer_fsck_objects
> - : 0)
> - argv_array_push(&cmd.args, "--strict");
> + : 0) {
> + if (args->from_promisor)
> + /*
> +  * We cannot use --strict in index-pack because it
> +  * checks both broken objects and links, but we only
> +  * want to check for broken objects.
> +  */
> + argv_array_push(&cmd.args, "--fsck-objects");
> + else
> + argv_array_push(&cmd.args, "--strict");
> + }
>  
>   cmd.in = demux.out;
>   cmd.git_cmd = 1;
> diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
> index 29d863118..cee556536 100755
> --- a/t/t5616-partial-clone.sh
> +++ b/t/t5616-partial-clone.sh
> @@ -143,4 +143,15 @@ test_expect_success 'manual prefetch of missing objects' '
>   test_line_count = 0 observed.oids
>  '
>  
> +test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack --fsck-objects' '
> + git init src &&
> + test_commit -C src x &&
> + test_config -C src uploadpack.allowfilter 1 &&
> + test_config -C src uploadpack.allowanysha1inwant 1 &&
> +
> + GIT_TRACE="$(pwd)/trace" git -c transfer.fsckobjects=1 \
> + clone --filter="blob:none" "file://$(pwd)/src" dst &&
> + grep "git index-pack.*--fsck-objects" trace
> +'
> +
>  test_done
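
The flag selection in the quoted hunk can be read as a small decision
function.  The following is a minimal sketch, not code from
fetch-pack.c: pick_fsck_flag() is a hypothetical helper, and the
convention that -1 means "unset" is an assumption based on the
">= 0" tests visible in the hunk (whose first line is cut off in the
quote).

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * fetch_fsck and transfer_fsck: -1 = unset, 0 = off, 1 = on (assumed).
 * Returns the index-pack flag to pass, or NULL when no check is wanted.
 */
static const char *pick_fsck_flag(int fetch_fsck, int transfer_fsck,
				  int from_promisor)
{
	/* The fetch-specific setting wins over the transfer-wide one. */
	int want_fsck = fetch_fsck >= 0 ? fetch_fsck
		      : transfer_fsck >= 0 ? transfer_fsck
		      : 0;

	if (!want_fsck)
		return NULL;

	/* A partial fetch expects missing links, so check objects only. */
	return from_promisor ? "--fsck-objects" : "--strict";
}
```

With this shape, a partial fetch with transfer.fsckobjects=1 and no
fetch-specific setting would pick "--fsck-objects", while a regular
fetch with the same configuration would still pick "--strict".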


[PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Ramsay Jones

Signed-off-by: Ramsay Jones 
---

Hi Junio,

I happened to be building git on an _old_ laptop earlier this evening
and gcc complained, thus:

  CC http.o
  http.c:77:20: warning: ‘curl_no_proxy’ defined but not used [-Wunused-variable]
   static const char *curl_no_proxy;
                      ^

The version of libcurl installed was 0x070f04. So, while it was fresh in my
mind, I applied and tested this patch.

ATB,
Ramsay Jones

 http.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/http.c b/http.c
index 8c11156ae..a5bd5d62c 100644
--- a/http.c
+++ b/http.c
@@ -69,6 +69,9 @@ static const char *ssl_key;
 #if LIBCURL_VERSION_NUM >= 0x070908
 static const char *ssl_capath;
 #endif
+#if LIBCURL_VERSION_NUM >= 0x071304
+static const char *curl_no_proxy;
+#endif
 #if LIBCURL_VERSION_NUM >= 0x072c00
 static const char *ssl_pinnedkey;
 #endif
@@ -77,7 +80,6 @@ static long curl_low_speed_limit = -1;
 static long curl_low_speed_time = -1;
 static int curl_ftp_no_epsv;
 static const char *curl_http_proxy;
-static const char *curl_no_proxy;
 static const char *http_proxy_authmethod;
 static struct {
const char *name;
-- 
2.16.0
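
For readers unfamiliar with the version constants: libcurl encodes its
version as 0xXXYYZZ (major, minor, patch), so 0x070f04 is 7.15.4 and
the 0x071304 guard corresponds to 7.19.4.  The sketch below decodes
such a number and shows the pattern the patch applies, declaring a
variable under the same condition as the code that reads it.  The
DEMO_ and demo_ names are made up for illustration and are not part
of http.c.

```c
#include <stdio.h>

/* Assumed for illustration: the libcurl version reported above. */
#define DEMO_CURL_VERSION_NUM 0x070f04

/* Declare the variable only when the code that reads it is compiled in. */
#if DEMO_CURL_VERSION_NUM >= 0x071304
static const char *demo_no_proxy = "localhost";
#endif

/* Split a 0xXXYYZZ version number into "major.minor.patch". */
static void format_curl_version(unsigned int num, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%u",
		 (num >> 16) & 0xff, (num >> 8) & 0xff, num & 0xff);
}

/* Returns 1 if the no-proxy code path is compiled in, 0 otherwise. */
static int no_proxy_supported(void)
{
#if DEMO_CURL_VERSION_NUM >= 0x071304
	return demo_no_proxy != NULL;
#else
	return 0;
#endif
}
```

Because the declaration and its only user share one guard, an old
libcurl never sees an unused static variable.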


[PATCH v2 1/2] stash push: avoid printing errors

2018-03-14 Thread Thomas Gummerer
Currently 'git stash push -u -- <pathspec>' prints the following errors
if <pathspec> only matches untracked files:

    fatal: pathspec 'untracked' did not match any files
    error: unrecognized input

This is because we first clean up the untracked files using 'git clean
<pathspec>', and then use a command chain involving 'git add -u
<pathspec>' and 'git apply' to clear the changes to files that are in
the index and were stashed.

As the <pathspec> only includes untracked files that were already
removed by 'git clean', the 'git add' call will barf, and so will 'git
apply', as there are no changes that need to be applied.

Fix this by making sure to only call this command chain if there are
still files that match <pathspec> after the call to 'git clean'.

Reported-by: Marc Strapetz 
Signed-off-by: Thomas Gummerer 
---

> Either way I'll try to address this as soon as I can get some
> time to look at it.

I finally got around to doing this.  The fix (in the second patch) turns
out to be fairly simple: I just forgot to pass the pathspec along to
one function when originally introducing the pathspec feature in git
stash push (more explanation in the commit message for the patch
itself).  Thanks Marc for reporting the two breakages!

v2 also fixes a couple of typos in the first patch which I failed to
notice when I sent it out last time.

 git-stash.sh | 2 +-
 t/t3903-stash.sh | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/git-stash.sh b/git-stash.sh
index fc8f8ae640..058ad0bed8 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -320,7 +320,7 @@ push_stash () {
git clean --force --quiet -d $CLEAN_X_OPTION -- "$@"
fi
 
-   if test $# != 0
+   if test $# != 0 && git ls-files --error-unmatch -- "$@" >/dev/null 2>/dev/null
then
git add -u -- "$@" |
git checkout-index -z --force --stdin
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index aefde7b172..fbfda4b243 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -1096,4 +1096,11 @@ test_expect_success 'stash --  works with binary files' '
test_path_is_file subdir/untracked
 '
 
+test_expect_success 'stash -u --  doesnt print error' '
+   >untracked &&
+   git stash push -u -- untracked 2>actual &&
+   test_path_is_missing untracked &&
+   test_line_count = 0 actual
+'
+
 test_done
-- 
2.16.2.804.g6dcf76e11



[PATCH v2 2/2] stash push -u: don't create empty stash

2018-03-14 Thread Thomas Gummerer
When introducing the stash push feature, and thus allowing users to pass
in a pathspec to limit the files that would get stashed in
df6bba0937 ("stash: teach 'push' (and 'create_stash') to honor
pathspec", 2017-02-28), this developer missed one place where the
pathspec should be passed in.

Namely in the call to the 'untracked_files()' function in the
'no_changes()' function.  This resulted in 'git stash push -u --
<pathspec>' creating an empty stash when there are untracked files
in the repository that don't match the pathspec.

As 'git stash' never creates empty stashes, this behaviour is wrong and
confusing for users.  Instead it should just show a message "No local
changes to save", and not create a stash.

Luckily the 'untracked_files()' function already correctly respects
pathspecs that are passed to it, so the fix is simply to pass the
pathspec along to the function.

Reported-by: Marc Strapetz 
Signed-off-by: Thomas Gummerer 
---
 git-stash.sh | 2 +-
 t/t3903-stash.sh | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/git-stash.sh b/git-stash.sh
index 058ad0bed8..7a4ec98f6b 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -39,7 +39,7 @@ fi
 no_changes () {
git diff-index --quiet --cached HEAD --ignore-submodules -- "$@" &&
git diff-files --quiet --ignore-submodules -- "$@" &&
-   (test -z "$untracked" || test -z "$(untracked_files)")
+   (test -z "$untracked" || test -z "$(untracked_files "$@")")
 }
 
 untracked_files () {
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index fbfda4b243..5e7078c083 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -1103,4 +1103,10 @@ test_expect_success 'stash -u --  doesnt print error' '
test_line_count = 0 actual
 '
 
+test_expect_success 'stash -u --  shows no changes when there are none' '
+   git stash push -u -- non-existant >actual &&
+   echo "No local changes to save" >expect &&
+   test_i18ncmp expect actual
+'
+
 test_done
-- 
2.16.2.804.g6dcf76e11



[PATCH v2 3/3] shortlog: disallow left-over arguments outside repo

2018-03-14 Thread Martin Ågren
If we are outside a repo and have any arguments left after
option-parsing, `setup_revisions()` will try to do its job and
something like this will happen:

$ git shortlog v2.16.0..
BUG: environment.c:183: git environment hasn't been setup
Aborted (core dumped)

The usage is wrong, but we could obviously handle this better. Note that
commit abe549e179 (shortlog: do not require to run from inside a git
repository, 2008-03-14) explicitly enabled `git shortlog` to run from
outside a repo, since we do not need a repo for parsing data from stdin.

Disallow left-over arguments when run from outside a repo.

Signed-off-by: Martin Ågren 
---
 t/t4201-shortlog.sh | 5 +
 builtin/shortlog.c  | 5 +
 2 files changed, 10 insertions(+)

diff --git a/t/t4201-shortlog.sh b/t/t4201-shortlog.sh
index da10478f59..ff6649ed9a 100755
--- a/t/t4201-shortlog.sh
+++ b/t/t4201-shortlog.sh
@@ -127,6 +127,11 @@ test_expect_success !MINGW 'shortlog can read --format=raw output' '
test_cmp expect out
 '
 
+test_expect_success 'shortlog from non-git directory refuses extra arguments' '
+   test_must_fail env GIT_DIR=non-existing git shortlog foo 2>out &&
+   test_i18ngrep "too many arguments" out
+'
+
 test_expect_success 'shortlog should add newline when input line matches wraplen' '
cat >expect <<\EOF &&
 A U Thor (2):
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index dc4af03fca..3a823b3128 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -293,6 +293,11 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 parse_done:
argc = parse_options_end(&ctx);
 
+   if (nongit && argc > 1) {
+   error(_("too many arguments given outside repository"));
+   usage_with_options(shortlog_usage, options);
+   }
+
if (setup_revisions(argc, argv, &rev, NULL) != 1) {
error(_("unrecognized argument: %s"), argv[1]);
usage_with_options(shortlog_usage, options);
-- 
2.16.2.246.ga4ee8f
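
The new condition can be isolated as a tiny predicate.  This is a
sketch under one assumption drawn from the hunk itself: argc still
counts argv[0] at this point (the existing error path prints argv[1]
as the first unrecognized argument), so a single left-over argument
shows up as argc == 2.  The name leftover_args_allowed() is
hypothetical.

```c
/*
 * Hypothetical stand-in for the check added to cmd_shortlog().
 * nongit: non-zero when running outside a repository.
 * argc:   argument count after option parsing, argv[0] included.
 * Returns 1 if processing may continue, 0 if it should be refused.
 */
static int leftover_args_allowed(int nongit, int argc)
{
	return !(nongit && argc > 1);
}
```

Inside a repository the predicate always passes, so revision and path
arguments keep working there; outside one, only the stdin-reading
form with no left-over arguments is accepted.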



[PATCH v2 2/3] shortlog: add usage-string for stdin-reading

2018-03-14 Thread Martin Ågren
This has been missing since we learned to print usage, way back in
4e27fb06f (add commit count options to git-shortlog, 2006-10-06).

While at it, drop the "[]" around "<path>...". This matches `git log -h`
and Documentation/git-{short}log.txt. It formally makes it look like we
do not allow `git shortlog --`, but we gain readability and consistency.

Signed-off-by: Martin Ågren 
Signed-off-by: Junio C Hamano 
---
 builtin/shortlog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index e29875b843..dc4af03fca 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -11,7 +11,8 @@
 #include "parse-options.h"
 
 static char const * const shortlog_usage[] = {
-   N_("git shortlog [] [] [[--] [...]]"),
+   N_("git shortlog [] [] [[--] ...]"),
+   N_("git log --pretty=short | git shortlog []"),
NULL
 };
 
-- 
2.16.2.246.ga4ee8f



[PATCH v2 1/3] git-shortlog.txt: reorder usages

2018-03-14 Thread Martin Ågren
The first usage we give is the original one where, e.g., `git log` is
piped through `git shortlog`. The description that follows reads the
other way round, by first focusing on the general behavior, then ending
with the behavior when reading from stdin.

It is also a tiny bit odd that what is probably the most common usage
and the one a reader is probably looking for is not at the top of the
list. Of course, it is only a two-item list, so it is not _that_ hard to
find... The next commit will add the original usage to the usage string
in builtin/shortlog.c, and it feels more natural to do so below the
most common usage. To avoid being inconsistent, reorder these two
usages here first.

Signed-off-by: Martin Ågren 
Signed-off-by: Junio C Hamano 
---
 Documentation/git-shortlog.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index ee6c5476c1..5e35ea18ac 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -8,8 +8,8 @@ git-shortlog - Summarize 'git log' output
 SYNOPSIS
 
 [verse]
-git log --pretty=short | 'git shortlog' []
 'git shortlog' [] [] [[\--] ...]
+git log --pretty=short | 'git shortlog' []
 
 DESCRIPTION
 ---
-- 
2.16.2.246.ga4ee8f



[PATCH v2 0/3] shortlog: disallow left-over arguments outside repo

2018-03-14 Thread Martin Ågren
This is v2 of my attempt at stopping shortlog from BUG-ing when it is
used incorrectly outside a repo. Thanks Jonathan and Junio for helpful
comments.

Patches 1 and 2 are identical to pu. The error message in patch 3 is now
more general. The error condition on the other hand is a bit more
specific, "argc > 1", to better match the intention and commit message.

Martin

Martin Ågren (3):
  git-shortlog.txt: reorder usages
  shortlog: add usage-string for stdin-reading
  shortlog: disallow left-over arguments when run outside repo

 Documentation/git-shortlog.txt | 2 +-
 t/t4201-shortlog.sh| 5 +
 builtin/shortlog.c | 8 +++-
 3 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.16.2.246.ga4ee8f



Re: [PATCH v5 01/35] pkt-line: introduce packet_read_with_status

2018-03-14 Thread Junio C Hamano
Brandon Williams  writes:

> +/*
> + * Read a packetized line into a buffer like the 'packet_read()' function but
> + * returns an 'enum packet_read_status' which indicates the status of the read.
> + * The number of bytes read will be assigned to *pktlen if the status of the
> + * read was 'PACKET_READ_NORMAL'.
> + */
> +enum packet_read_status {
> + PACKET_READ_EOF,
> + PACKET_READ_NORMAL,
> + PACKET_READ_FLUSH,
> +};

EOF was -1 and NORMAL was 0 in the previous round; do we need to
read through all the invocations of functions that return this type
and make sure there is no "while (such_a_function())" that used to see
if we read NORMAL that is left un-updated?

I have just gone through all the hits from

 $ git grep -n -e packet_read_with_status -e packet_reader_read -e packet_reader_peek

There are a few

switch (packet_reader_peek())

which by definition we do not have to worry about.  Then the majority
of what could be problematic are of the form:

while (packet_reader_read() == PACKET_READ_NORMAL)

and they were this way even in the previous version, so it seems
quite alright.

Will replace.  Thanks.
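
The concern about renumbered enum values can be illustrated with a
toy reader.  In the sketch below only the enum is taken from the
quoted patch; struct toy_reader and the toy_ and count_ functions are
hypothetical.  It shows why an explicit comparison against
PACKET_READ_NORMAL is robust, while a bare truthiness test silently
changes meaning once EOF becomes 0.

```c
enum packet_read_status {
	PACKET_READ_EOF,	/* 0 in this round */
	PACKET_READ_NORMAL,	/* 1 */
	PACKET_READ_FLUSH,	/* 2 */
};

/* Toy reader: yields `normal` NORMAL packets, one FLUSH, then EOF. */
struct toy_reader {
	int normal;
	int flushed;
};

static enum packet_read_status toy_read(struct toy_reader *r)
{
	if (r->normal > 0) {
		r->normal--;
		return PACKET_READ_NORMAL;
	}
	if (!r->flushed) {
		r->flushed = 1;
		return PACKET_READ_FLUSH;
	}
	return PACKET_READ_EOF;
}

/* Explicit comparison: stops at the first non-NORMAL packet. */
static int count_explicit(struct toy_reader *r)
{
	int n = 0;

	while (toy_read(r) == PACKET_READ_NORMAL)
		n++;
	return n;
}

/* Truthiness test: with EOF == 0 this also counts the FLUSH packet. */
static int count_truthy(struct toy_reader *r)
{
	int n = 0;

	while (toy_read(r))
		n++;
	return n;
}
```

For a reader holding two normal packets, the explicit loop counts
two, whereas the truthiness loop runs through the flush packet as
well and only stops at EOF.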


Re: How to debug a "git merge"?

2018-03-14 Thread Lars Schneider

> On 14 Mar 2018, at 18:02, Derrick Stolee  wrote:
> 
> On 3/14/2018 12:56 PM, Lars Schneider wrote:
>> Hi,
>> 
>> I am investigating a Git merge (a86dd40fe) in which an older version of
>> a file won over the newer version. I am trying to understand why this
>> is the case. I can reproduce the merge with the following commands:
>> $ git checkout -b test a02fa3303
>> $ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c
>> 
>> The merge actually generates a merge conflict but not for my
>> problematic file. The common ancestor of the two parents (merge base)
>> is b91161554.
>> 
>> The merge graph is not pretty (the committers don't have a clean
>> branching scheme) but I cannot spot a problem between the merge commit
>> and the common ancestor:
>> $ git log --graph --oneline a86dd40fe
> 
> Have you tried `git log --graph --oneline --simplify-merges -- path` to see 
> what changes and merges involved the file? I find that view to be very 
> helpful (while the default history simplification can hide things). In 
> particular, if there was a change that was reverted in one side and not 
> another, we could find out.

Thanks for this tip! Unfortunately, this only confirms my current view:

### First parent
$ git log --graph --oneline --simplify-merges a02fa3303 -- path/to/problem
* 4e47a10c7 <-- old version
* 01f01f61c 

### Second parent
$ git log --graph --oneline --simplify-merges c1b82995c -- path/to/problem
* 590e52ed1 <-- new version
* 8e598828d 
* ad4e9034b 
* 4e47a10c7 
* 01f01f61c 

### Merge
$ git log --graph --oneline --simplify-merges a86dd40fe -- path/to/problem
*   a86dd40fe <-- old version ?!?! That's the problem!
|\
| * 590e52ed1 <-- new version
| * 8e598828d
| * ad4e9034b
|/
* 4e47a10c7 <-- old version
* 01f01f61c


> You could also use the "A...B" to check your two commits for merging, and 
> maybe add "--boundary".

$ git diff --boundary a02fa3303...c1b82995c -- path/to/problem

This looks like the correct diff. The "new version" is marked as
+/add/green in the diff.

Does this make any sense to you?

Thank you,
Lars

Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-14 Thread Junio C Hamano
Derrick Stolee  writes:

> This v6 includes feedback around csum-file.c and the rename of hashclose()
> to finalize_hashfile(). These are the first two commits of the series, so
> they could be pulled out independently.
>
> The only other change since v5 is that I re-ran the performance numbers
> in "commit: integrate commit graph with commit parsing".

Thanks.

> Hopefully this version is ready to merge. I have several follow-up topics
> in mind to submit soon after, including:

A few patches add trailing blank lines and other whitespace
breakages, which will stop my "git merge" later to 'next' and down,
as I have a pre-commit hook to catch them.

Here is the output from my "git am -s" session.

Applying: csum-file: rename hashclose() to finalize_hashfile()
Applying: csum-file: refactor finalize_hashfile() method
.git/rebase-apply/patch:109: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: commit-graph: add format document
.git/rebase-apply/patch:175: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: graph: add commit graph design document
.git/rebase-apply/patch:42: new blank line at EOF.
+
.git/rebase-apply/patch:109: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
Applying: commit-graph: create git-commit-graph builtin
.git/rebase-apply/patch:323: space before tab in indent.
fd = hold_lock_file_for_update(&lk, graph_name, 0);
.git/rebase-apply/patch:334: space before tab in indent.
fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
.git/rebase-apply/patch:385: new blank line at EOF.
+
.git/rebase-apply/patch:398: new blank line at EOF.
+
warning: 2 lines applied after fixing whitespace errors.
Applying: commit-graph: implement write_commit_graph()
.git/rebase-apply/patch:138: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:144: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:154: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:160: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:197: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
warning: squelched 6 whitespace errors
warning: 10 lines applied after fixing whitespace errors.
Applying: commit-graph: implement 'git-commit-graph write'
Test number t5318 already taken
.git/rebase-apply/patch:346: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:356: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:366: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:374: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:384: indent with spaces.
cd "$TRASH_DIRECTORY/bare" &&
warning: 5 lines add whitespace errors.
Applying: commit-graph: implement git commit-graph read
Applying: commit-graph: add core.commitGraph setting
Applying: commit-graph: close under reachability
.git/rebase-apply/patch:302: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:310: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:321: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:331: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:341: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
warning: squelched 2 whitespace errors
warning: 7 lines add whitespace errors.
Applying: commit: integrate commit graph with commit parsing
.git/rebase-apply/patch:224: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:227: trailing whitespace.
graph_read_expect "9" "large_edges" 
.git/rebase-apply/patch:234: indent with spaces.
cd "$TRASH_DIRECTORY" &&
warning: 2 lines applied after fixing whitespace errors.
Applying: commit-graph: read only from specific pack-indexes
.git/rebase-apply/patch:196: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:209: indent with spaces.
cd "$TRASH_DIRECTORY" &&
warning: 1 line applied after fixing whitespace errors.
Applying: commit-graph: build graph from starting commits
.git/rebase-apply/patch:148: indent with spaces.
cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:158: indent with spaces.
cd "$TRASH_DIRECTORY" &&
warning: 1 line applied after fixing whitespace errors.
Applying: commit-graph: implement "--additive" option



Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-14 Thread Ramsay Jones


On 14/03/18 19:27, Derrick Stolee wrote:
> This v6 includes feedback around csum-file.c and the rename of hashclose()
> to finalize_hashfile(). These are the first two commits of the series, so
> they could be pulled out independently.
> 
> The only other change since v5 is that I re-ran the performance numbers
> in "commit: integrate commit graph with commit parsing".

I haven't looked at v6 (I will wait for it to hit pu), but v5 is
still causing sparse to complain.

The diff given below (on top of current pu @9e418c7c9) fixes it
for me. (Using a plain integer as a NULL pointer in
builtin/commit-graph.c, and the 'commit_graph' symbol should be
file-local in commit-graph.c.)

Thanks!

ATB,
Ramsay Jones

-- >8 --
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 62ac26e44..855df66bd 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -31,7 +31,7 @@ static struct opts_commit_graph {
 
 static int graph_read(int argc, const char **argv)
 {
-   struct commit_graph *graph = 0;
+   struct commit_graph *graph = NULL;
char *graph_name;
 
static struct option builtin_commit_graph_read_options[] = {
diff --git a/commit-graph.c b/commit-graph.c
index 631edac4c..7b45fe85d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -182,7 +182,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 }
 
 /* global storage */
-struct commit_graph *commit_graph = NULL;
+static struct commit_graph *commit_graph = NULL;
 
 static void prepare_commit_graph_one(const char *obj_dir)
 {
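
Both fixes follow common C practice and can be shown in isolation.
The sketch below is hypothetical: the one-field struct, the 'storage'
variable and prepare_graph() are invented for illustration and do not
reflect the real commit-graph.c.  It initializes the pointer with
NULL rather than 0 and keeps the cache variable file-local with
'static', which are the two points raised above.

```c
#include <stddef.h>

/* Invented one-field struct; the real one has many more members. */
struct commit_graph {
	int num_commits;
};

/*
 * File-local cache: 'static' keeps the symbol out of the global
 * namespace, and NULL (not a plain 0) marks it as a pointer value.
 */
static struct commit_graph *commit_graph = NULL;

static struct commit_graph storage;

/* Set up the cached graph on first use; later calls reuse it. */
static struct commit_graph *prepare_graph(int num_commits)
{
	if (!commit_graph) {
		storage.num_commits = num_commits;
		commit_graph = &storage;
	}
	return commit_graph;
}
```

A second call with a different count returns the already-prepared
graph unchanged, which is the usual behavior for a lazily initialized
file-local cache.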



Re: [PATCH v3 00/36] object_id part 12

2018-03-14 Thread Junio C Hamano
Junio C Hamano  writes:

> "brian m. carlson"  writes:
>
>> This is the twelfth in a series of patches to convert various parts of
>> the code to struct object_id.
>>
>> brian m. carlson (36):
>>   ...
> As always, thanks for working on this.  

There are a few topics that add new callsites to functions that are
updated (e.g. find_unique_abbrev()), so I'll need to do a bit of the
usual evil merging to coax this topic in.  Please holler if you spot
my screw-up in what I'll push out in a few hours.

Thanks.


[PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile()

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee 
---
 builtin/index-pack.c   | 2 +-
 builtin/pack-objects.c | 6 +++---
 bulk-checkin.c | 4 ++--
 csum-file.c| 2 +-
 csum-file.h| 4 ++--
 fast-import.c  | 2 +-
 pack-bitmap-write.c| 2 +-
 pack-write.c   | 4 ++--
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 59878e70b8..157bceb264 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1269,7 +1269,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
nr_objects - nr_objects_initial);
stop_progress_msg(&progress, msg.buf);
strbuf_release(&msg);
-   hashclose(f, tail_hash, 0);
+   finalize_hashfile(f, tail_hash, 0);
hashcpy(read_hash, pack_hash);
fixup_pack_header_footer(output_fd, pack_hash,
 curr_pack, nr_objects,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a197926eaa..84e9f57b7f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,11 +837,11 @@ static void write_pack_file(void)
 * If so, rewrite it like in fast-import
 */
if (pack_to_stdout) {
-   hashclose(f, oid.hash, CSUM_CLOSE);
+   finalize_hashfile(f, oid.hash, CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
-   hashclose(f, oid.hash, CSUM_FSYNC);
+   finalize_hashfile(f, oid.hash, CSUM_FSYNC);
} else {
-   int fd = hashclose(f, oid.hash, 0);
+   int fd = finalize_hashfile(f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
 nr_written, oid.hash, offset);
close(fd);
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9d87eac07b..227cc9f3b1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,9 +35,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
unlink(state->pack_tmp_name);
goto clear_exit;
} else if (state->nr_written == 1) {
-   hashclose(state->f, oid.hash, CSUM_FSYNC);
+   finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
} else {
-   int fd = hashclose(state->f, oid.hash, 0);
+   int fd = finalize_hashfile(state->f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
 state->nr_written, oid.hash,
 state->offset);
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..e6c95a6915 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
}
 }
 
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
 {
int fd;
 
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..9ba87f0a6c 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -26,14 +26,14 @@ struct hashfile_checkpoint {
 extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
-/* hashclose flags */
+/* finalize_hashfile flags */
 #define CSUM_CLOSE 1
 #define CSUM_FSYNC 2
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
 extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
 extern void hashwrite(struct hashfile *, const void *, unsigned int);
 extern void hashflush(struct hashfile *f);
 extern void crc32_begin(struct hashfile *);
diff --git a/fast-import.c b/fast-import.c
index 58ef360da4..2e5d17318d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1016,7 +1016,7 @@ static void end_packfile(void)
struct tag *t;
 
close_pack_windows(pack_data);
-   hashclose(pack_file, cur_pack_oid.hash, 0);
+   finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
pack_data->pack_name, object_count,
cur_pack_oid.hash, pack_size);
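
The flag-dependent behavior that motivates the rename can be modeled
without touching real files.  This is a toy sketch, not the
csum-file.c implementation: struct toy_hashfile and toy_finalize()
are hypothetical, and the choice to return 0 after closing is an
assumption.  What it takes from the diff is the two flag values and
the caller pattern, where flags == 0 hands back a descriptor that the
caller closes itself.

```c
#define CSUM_CLOSE 1
#define CSUM_FSYNC 2

/* Toy stand-in for struct hashfile; all fields are hypothetical. */
struct toy_hashfile {
	int fd;
	int closed;
	int synced;
};

/*
 * Toy model of the flag handling.  With CSUM_CLOSE or CSUM_FSYNC the
 * descriptor is marked closed and 0 is returned (assumed); with
 * flags == 0 the still-open descriptor is handed back to the caller.
 */
static int toy_finalize(struct toy_hashfile *f, unsigned int flags)
{
	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
		if (flags & CSUM_FSYNC)
			f->synced = 1;
		f->closed = 1;
		return 0;
	}
	return f->fd;
}
```

Seen this way, "close" in the old name only described some of the
code paths, which is the mismatch the commit message points out.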

[PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git-commit-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  39 
 builtin/commit-graph.c |  33 ++
 t/t5318-commit-graph.sh| 125 +
 3 files changed, 197 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 5913340fad..e688843808 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,45 @@ NAME
 
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+
+[verse]
+'git commit-graph write'  [--object-dir ]
+
+
+DESCRIPTION
+---
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+---
+--object-dir::
+   Use given directory for the location of packfiles and commit graph
+   file. The commit graph file is expected to be at /info/commit-graph
+   and the packfiles are expected to be in /pack.
+
+
+COMMANDS
+
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+
+$ git commit-graph write
+
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 8ff7336527..a9d61f649a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ]"),
+   NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+   N_("git commit-graph write [--object-dir ]"),
NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+   static struct option builtin_commit_graph_write_options[] = {
+   OPT_STRING(0, "object-dir", &opts.obj_dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_write_options,
+builtin_commit_graph_write_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   write_commit_graph(opts.obj_dir);
+   return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 builtin_commit_graph_usage,
 PARSE_OPT_STOP_AT_NON_OPTION);
 
+   if (argc > 0) {
+   if (!strcmp(argv[0], "write"))
+   return graph_write(argc, argv);
+   }
+
usage_with_options(builtin_commit_graph_usage,
   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 00..43707ce5bb
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,125 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+   mkdir full &&
+   cd "$TRASH_DIRECTORY/full" &&
+   git init &&
+   objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+cd "$TRASH_DIRECTORY/full" &&
+   git commit-graph write --object-dir . &&
+   test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+cd "$TRASH_DIRECTORY/full" &&
+   for i in $(test_seq 3)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git repack
+'
+
+test_expect_success 'write graph' '
+cd "$TRASH_DIRECTORY/full" &&
+   graph1=$(git commit-graph write) &&
+   test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+cd "$TRASH_DIRECTORY/full" &&
+   git reset --hard commits/1 &&
+   for i in $(test_seq 4 5)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git reset --hard commits/2 &&
+   for i in $(test_seq 6 7)
+   do
+   test_commit $i &&
+   git branch commits/$i
+   done &&
+   git reset --hard commits/2 &&
+   git merge commits/4 &&
+

[PATCH v6 05/14] commit-graph: create git-commit-graph builtin

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee 
---
 .gitignore |  1 +
 Documentation/git-commit-graph.txt | 11 ++
 Makefile   |  1 +
 builtin.h  |  1 +
 builtin/commit-graph.c | 37 ++
 command-list.txt   |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c  |  1 +
 8 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..e82f90184d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
new file mode 100644
index 00..5913340fad
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,11 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
+
diff --git a/Makefile b/Makefile
index de4b8f0c02..a928d4de66 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..079855b6d4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const 
char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 00..8ff7336527
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,37 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+   N_("git commit-graph [--object-dir ]"),
+   NULL
+};
+
+static struct opts_commit_graph {
+   const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+   static struct option builtin_commit_graph_options[] = {
+   OPT_STRING(0, "object-dir", &opts.obj_dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+
+   git_config(git_default_config, NULL);
+   argc = parse_options(argc, argv, prefix,
+builtin_commit_graph_options,
+builtin_commit_graph_usage,
+PARSE_OPT_STOP_AT_NON_OPTION);
+
+   usage_with_options(builtin_commit_graph_usage,
+  builtin_commit_graph_options);
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean   mainporcelain
 git-clone   mainporcelain   init
 git-column  purehelpers
 git-commit  mainporcelain   history
+git-commit-graphplumbingmanipulators
 git-commit-tree plumbingmanipulators
 git-config  ancillarymanipulators
 git-count-objects   ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 91536d831c..a24af902d8 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
check-ref-format) : plumbing;;
checkout-index)   : plumbing;;
column)   : internal helper;;
+   commit-graph) 

[PATCH v6 13/14] commit-graph: build graph from starting commits

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 14 +-
 builtin/commit-graph.c | 27 +--
 commit-graph.c | 27 +--
 commit-graph.h |  4 +++-
 t/t5318-commit-graph.sh| 13 +
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index b945510f0f..0710a68f2d 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified packfiles.
+walking objects only in the specified packfiles. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -58,6 +64,12 @@ $ git commit-graph write
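 $ echo  | git commit-graph write --stdin-packs
 ------------------------------------------------
 
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------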
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index eebca57e6f..1c7b7e72b0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
int stdin_packs;
+   int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
const char **pack_indexes = NULL;
int packs_nr = 0;
+   const char **commit_hex = NULL;
+   int commits_nr = 0;
const char **lines = NULL;
int lines_nr = 0;
int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
N_("The object directory to store the graph")),
OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
N_("scan packfiles listed by stdin for commits")),
+   OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+   N_("start walk at commits listed by stdin")),
OPT_END(),
};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
+   if (opts.stdin_packs && opts.stdin_commits)
+   die(_("cannot use both --stdin-commits and --stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
-   if (opts.stdin_packs) {
+   if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
lines_nr = 0;
lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
lines[lines_nr++] = strbuf_detach(&buf, NULL);
}
 
-   pack_indexes = lines;
-   packs_nr = lines_nr;
+   if (opts.stdin_packs) {
+   pack_indexes = lines;
+   packs_nr = lines_nr;
+   }
+   if (opts.stdin_commits) {
+   commit_hex = lines;
+   commits_nr = lines_nr;
+ 
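
A note on the input format: the documentation hunk above describes the
--stdin-commits input as a list of OIDs in hex, one OID per line. A
minimal sketch of validating one such line, assuming 40-digit SHA-1 hex
OIDs. The helper name is_hex_oid_line() is hypothetical and is not part
of this series.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: SHA-1 object IDs, i.e. 40 hex digits per line. */
#define HEX_OID_LEN 40

/*
 * Hypothetical helper (not from the patch): return 1 if 'line' is
 * exactly HEX_OID_LEN hex digits, optionally followed by one '\n',
 * and 0 otherwise.
 */
int is_hex_oid_line(const char *line)
{
    size_t len = strlen(line);
    size_t i;

    /* Tolerate the trailing newline left by line-based readers. */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len != HEX_OID_LEN)
        return 0;
    for (i = 0; i < len; i++)
        if (!isxdigit((unsigned char)line[i]))
            return 0;
    return 1;
}
```

A caller could run this check on each line read from stdin before
handing the list to the graph writer, and reject or skip lines that do
not look like full OIDs.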

[PATCH v6 14/14] commit-graph: implement "--additive" option

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 10 ++
 builtin/commit-graph.c | 10 +++---
 commit-graph.c | 17 -
 commit-graph.h |  3 ++-
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 0710a68f2d..ccf5e203ce 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -41,6 +41,9 @@ With the `--stdin-commits` option, generate the new commit 
graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--additive` option, include all commits that are present
+in the existing commit-graph file.
 
 'read'::
 
@@ -70,6 +73,13 @@ $ echo  | git commit-graph write --stdin-packs
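 $ git show-ref -s | git commit-graph write --stdin-commits
 ------------------------------------------------
 
+* Write a graph file containing all commits in the current
+  commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --additive
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------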
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 1c7b7e72b0..d26a6d6de3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--additive] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--additive] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
const char *obj_dir;
int stdin_packs;
int stdin_commits;
+   int additive;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
N_("scan packfiles listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
N_("start walk at commits listed by stdin")),
+   OPT_BOOL(0, "additive", &opts.additive,
+   N_("include all commits already in the commit-graph file")),
OPT_END(),
};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
   pack_indexes,
   packs_nr,
   commit_hex,
-  commits_nr);
+  commits_nr,
+  opts.additive);
 
return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index 9f1ba9bff6..6348bab82b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -533,7 +533,8 @@ void write_commit_graph(const char *obj_dir,
const char **pack_indexes,
int nr_packs,
const char **commit_hex,
-   int nr_commits)
+   int nr_commits,
+   int additive)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -551,10 +552,24 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
oids.alloc = approximate_object_count() / 4;
 
+   if (additive) {
+   prepare_commit_graph_one(obj_dir);
+   if (commit_graph)
+   oids.alloc += commit_graph->num_commits;
+   }
+
if (oids.alloc < 1024)
oids.alloc = 1024;
ALLOC_ARRAY(oids.list, oids.alloc);
 
+   if (additive && commit_graph) {
+   for (i = 0; i < commit_graph->num_commits; i++) {
+   const unsigned char *hash = commit_graph->chunk_oid_lookup +
+   commit_graph->hash_len * i;
+   hashcpy(oids.list[oids.nr++].hash, hash);
+   }
+   }
+
if (pack_indexes) {
struct strbuf pack
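
For readers following along: the --additive hunk above copies every OID
from the existing graph into oids.list before any new commits are
collected. A rough sketch of the resulting set union, assuming the
combined list is later sorted and de-duplicated (that step is not
visible in this excerpt, so it is an assumption here). The names
struct oid20, oid20_cmp() and union_oids() are hypothetical, and
20-byte hashes are assumed.

```c
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* assumption: SHA-1 sized hashes */

struct oid20 {
    unsigned char hash[HASH_LEN];
};

/* Order OIDs bytewise, which matches lexicographic hex order. */
static int oid20_cmp(const void *a, const void *b)
{
    return memcmp(((const struct oid20 *)a)->hash,
                  ((const struct oid20 *)b)->hash, HASH_LEN);
}

/*
 * Copy 'old' (nr_old entries) and 'fresh' (nr_fresh entries) into
 * 'out', sort the result and drop duplicates. 'out' must have room
 * for nr_old + nr_fresh entries. Returns the number of unique OIDs.
 */
size_t union_oids(struct oid20 *out,
                  const struct oid20 *old, size_t nr_old,
                  const struct oid20 *fresh, size_t nr_fresh)
{
    size_t nr = nr_old + nr_fresh;
    size_t i, kept = 0;

    memcpy(out, old, nr_old * sizeof(*out));
    memcpy(out + nr_old, fresh, nr_fresh * sizeof(*out));
    qsort(out, nr, sizeof(*out), oid20_cmp);

    /* Keep an entry only if it differs from the last one kept. */
    for (i = 0; i < nr; i++)
        if (!kept || oid20_cmp(&out[kept - 1], &out[i]))
            out[kept++] = out[i];
    return kept;
}
```

The point of the sketch is only that a commit present in both the old
graph and the new input ends up in the output once.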

[PATCH v6 12/14] commit-graph: read only from specific pack-indexes

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt | 11 ++-
 builtin/commit-graph.c | 33 ++---
 commit-graph.c | 26 --
 commit-graph.h |  4 +++-
 packfile.c |  4 ++--
 packfile.h |  2 ++
 t/t5318-commit-graph.sh| 10 ++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 51cb038f3d..b945510f0f 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -32,7 +32,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified packfiles.
 
 'read'::
 
@@ -49,6 +51,13 @@ EXAMPLES
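 $ git commit-graph write
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+  in .
++
+------------------------------------------------
+$ echo  | git commit-graph write --stdin-packs
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------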
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0e164becff..eebca57e6f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ]"),
+   N_("git commit-graph write [--object-dir ] [--stdin-packs]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+   const char **pack_indexes = NULL;
+   int packs_nr = 0;
+   const char **lines = NULL;
+   int lines_nr = 0;
+   int lines_alloc = 0;
+
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+   N_("scan packfiles listed by stdin for commits")),
OPT_END(),
};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
-   write_commit_graph(opts.obj_dir);
+   if (opts.stdin_packs) {
+   struct strbuf buf = STRBUF_INIT;
+   lines_nr = 0;
+   lines_alloc = 128;
+   ALLOC_ARRAY(lines, lines_alloc);
+
+   while (strbuf_getline(&buf, stdin) != EOF) {
+   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+   lines[lines_nr++] = strbuf_detach(&buf, NULL);
+   }
+
+   pack_indexes = lines;
+   packs_nr = lines_nr;
+   }
+
+   write_commit_graph(opts.obj_dir,
+  pack_indexes,
+  packs_nr);
+
return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 98e2b89b94..f0d7585ddb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -529,7 +529,9 @@ static void close_reachable(struct packed_oid_list *oids)
}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+   const char **pack_indexes,
+   int nr_packs)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -551,7 +553,27 @@ void write_commit_graph(const char *obj_dir)
oids.alloc = 1024;
ALLOC_ARRAY(oids.list, oids.alloc);
 
-   for_each_packed_object(add_packed_commits, &oids, 0);
+   if (pack_indexes) {
+   struct strbuf packname = STRBUF_INIT;
+   int dirlen;
+   strbuf_addf(&packname, "%s/pack/", obj_dir);
+   dirlen = packname.len;
+ 

[PATCH v6 11/14] commit: integrate commit graph with commit parsing

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee 
---
 alloc.c |   1 +
 commit-graph.c  | 141 +++-
 commit-graph.h  |  12 +
 commit.c|   3 ++
 commit.h|   3 ++
 t/t5318-commit-graph.sh |  47 +++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+   c->graph_pos = COMMIT_NOT_FROM_GRAPH;
return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index fc7b4fa622..98e2b89b94 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -182,6 +181,145 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
exit(1);
 }
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+   char *graph_name;
+
+   if (commit_graph)
+   return;
+
+   graph_name = get_commit_graph_filename(obj_dir);
+   commit_graph = load_commit_graph_one(graph_name);
+
+   FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+   struct alternate_object_database *alt;
+   char *obj_dir;
+
+   if (prepare_commit_graph_run_once)
+   return;
+   prepare_commit_graph_run_once = 1;
+
+   obj_dir = get_object_directory();
+   prepare_commit_graph_one(obj_dir);
+   prepare_alt_odb();
+   for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+   prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+   if (!commit_graph)
+   return;
+
+   if (commit_graph->graph_fd >= 0) {
+   munmap((void *)commit_graph->data, commit_graph->data_len);
+   commit_graph->data = NULL;
+   close(commit_graph->graph_fd);
+   }
+
+   FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+   return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+   g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+uint64_t pos,
+struct commit_list **pptr)
+{
+   struct commit *c;
+   struct object_id oid;
+   hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+   c = lookup_commit(&oid);
+   if (!c)
+   die("could not find commit %s", oid_to_hex(&oid));
+   c->graph_pos = pos;
+   return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+   struct object_id oid;
+   uint32_t edge_value;
+   uint32_t *parent_data_ptr;
+   uint64_t date_low, date_high;
+   struct commit_list **pptr;
+   const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+   item->object.parsed = 1;
+   item->graph_pos = pos;
+
+   hashcpy(oid.hash, commit_data);
+   item->tree = lookup_tree(&oid);
+
+   date_high = ntohl(*(uint32_t*)(commit_data + g->hash_len + 8)) & 0x3;
+   date_low = ntohl(*(uint32_t*)(commit_data + g->hash_len + 12));
+   item->date = (timestam
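
The hunk above is cut off while assembling the commit date from two
32-bit big-endian words. A minimal sketch of that kind of decoding,
assuming the low 2 bits of the first word are the high bits of a 34-bit
timestamp and the second word holds its low 32 bits. The combining
expression is truncated in this excerpt, so the shift-and-or below is an
assumption. The names get_be32_sketch() and decode_date() are
hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value without relying on alignment. */
uint32_t get_be32_sketch(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * 'words' points at two consecutive big-endian 32-bit words.
 * Assumed layout: low 2 bits of word 0 = date bits 33..32,
 * word 1 = date bits 31..0. Other bits of word 0 are ignored.
 */
uint64_t decode_date(const unsigned char *words)
{
    uint64_t date_high = get_be32_sketch(words) & 0x3;
    uint64_t date_low = get_be32_sketch(words + 4);

    return (date_high << 32) | date_low;
}
```

Reading the bytes one at a time avoids the unaligned uint32_t cast used
in the patch, at the cost of a few extra shifts.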

[PATCH v6 04/14] graph: add commit graph design document

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 164 +++
 1 file changed, 164 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
new file mode 100644
index 00..d11753ac6f
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,164 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
+
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked a
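
The recursive definition of generation numbers in the design notes
above can be sketched as follows. This is an illustration only, assuming
commits live in an array where every parent has a smaller index than its
children and at most two parents are recorded. The names
struct toy_commit, compute_generations() and cannot_reach() are
hypothetical and not part of the series.

```c
#include <stddef.h>

#define MAX_PARENTS 2 /* assumption for the toy model */

struct toy_commit {
    int parents[MAX_PARENTS]; /* indices into the array, -1 if unused */
    unsigned generation;
};

/*
 * Assumes parents always appear before their children in 'c'.
 * Roots get generation 1; every other commit gets one more than
 * the largest generation number among its parents.
 */
void compute_generations(struct toy_commit *c, size_t nr)
{
    size_t i;
    int p;

    for (i = 0; i < nr; i++) {
        unsigned max = 0;

        for (p = 0; p < MAX_PARENTS; p++) {
            int idx = c[i].parents[p];

            if (idx >= 0 && c[idx].generation > max)
                max = c[idx].generation;
        }
        c[i].generation = max + 1;
    }
}

/*
 * The property from the text: if gen(A) <= gen(B) and A != B, then
 * A cannot reach B. Returns 1 when reachability is ruled out without
 * walking; 0 means a walk is still needed.
 */
int cannot_reach(const struct toy_commit *c, size_t a, size_t b)
{
    return a != b && c[a].generation <= c[b].generation;
}
```

For a diamond-shaped history (one root, two children, one merge of
both), the root gets generation 1, the two children get 2, and the merge
gets 3, so neither child can reach the other and no walk is needed to
know that.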

[PATCH v6 09/14] commit-graph: add core.commitGraph setting

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.commitGraph is that a user can always stop checking
for or parsing commit graph files if core.commitGraph=0.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 3 +++
 cache.h  | 1 +
 config.c | 5 +
 environment.c| 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ce9102cea8..9e3da629b8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+   Enable the git commit graph feature. Allows reading from the commit-graph file.
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index d06932ed0b..e62569fbb1 100644
--- a/cache.h
+++ b/cache.h
@@ -801,6 +801,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6cb8..25ee4a676c 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.commitgraph")) {
+   core_commit_graph = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index d6dd64662c..8853e2f0dd 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = 
PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.14.1



[PATCH v6 00/14] Serialized Git Commit Graph

2018-03-14 Thread Derrick Stolee
This v6 includes feedback around csum-file.c and the rename of hashclose()
to finalize_hashfile(). These are the first two commits of the series, so
they could be pulled out independently.

The only other change since v5 is that I re-ran the performance numbers
in "commit: integrate commit graph with commit parsing".

Hopefully this version is ready to merge. I have several follow-up topics
in mind to submit soon after, including:

* Auto-generate the commit graph as the repo changes:
   i. teach git-commit-graph an "fsck" subcommand and integrate with git-fsck
  ii. teach git-repack to call git-commit-graph
* Generation numbers:
   i. teach git-commit-graph to compute generation numbers
  ii. consume generation numbers in paint_down_to_common()
* Move globals from commit-graph.c to the_repository

The three bullets (*) are relatively independent but have sub-items that
appear in priority order.

Derrick Stolee (14):
  csum-file: rename hashclose() to finalize_hashfile()
  csum-file: refactor finalize_hashfile() method
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement write_commit_graph()
  commit-graph: implement 'git-commit-graph write'
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

 .gitignore  |   1 +
 Documentation/config.txt|   3 +
 Documentation/git-commit-graph.txt  |  93 +++
 Documentation/technical/commit-graph-format.txt |  98 
 Documentation/technical/commit-graph.txt| 164 ++
 Makefile|   2 +
 alloc.c |   1 +
 builtin.h   |   1 +
 builtin/commit-graph.c  | 172 ++
 builtin/index-pack.c|   2 +-
 builtin/pack-objects.c  |   6 +-
 bulk-checkin.c  |   4 +-
 cache.h |   1 +
 command-list.txt|   1 +
 commit-graph.c  | 719 
 commit-graph.h  |  47 ++
 commit.c|   3 +
 commit.h|   3 +
 config.c|   5 +
 contrib/completion/git-completion.bash  |   2 +
 csum-file.c |  10 +-
 csum-file.h |   9 +-
 environment.c   |   1 +
 fast-import.c   |   2 +-
 git.c   |   1 +
 pack-bitmap-write.c |   2 +-
 pack-write.c|   5 +-
 packfile.c  |   4 +-
 packfile.h  |   2 +
 t/t5318-commit-graph.sh | 225 
 30 files changed, 1568 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh


base-commit: d0db9edba0050ada6f6eac68061599690d2a4333
-- 
2.14.1



[PATCH v6 08/14] commit-graph: implement git commit-graph read

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  12 
 builtin/commit-graph.c |  56 +++
 commit-graph.c | 140 -
 commit-graph.h |  23 ++
 t/t5318-commit-graph.sh|  32 +++--
 5 files changed, 257 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index e688843808..51cb038f3d 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 --------
 [verse]
+'git commit-graph read'  [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -33,6 +34,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read a graph file given by the commit-graph file and output basic
+details about the graph file. Used for debugging purposes.
+
 
 EXAMPLES
 
@@ -43,6 +49,12 @@ EXAMPLES
 $ git commit-graph write
 
 
+* Read basic information from the commit-graph file.
++
+
+$ git commit-graph read
+
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index a9d61f649a..0e164becff 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
+   N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph write [--object-dir ]"),
NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+   N_("git commit-graph read [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
N_("git commit-graph write [--object-dir ]"),
NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+   struct commit_graph *graph = 0;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_read_options[] = {
+   OPT_STRING(0, "object-dir", &opts.obj_dir,
+   N_("dir"),
+   N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_read_options,
+builtin_commit_graph_read_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+
+   if (!graph)
+   die("graph file %s does not exist", graph_name);
+   FREE_AND_NULL(graph_name);
+
+   printf("header: %08x %d %d %d %d\n",
+   ntohl(*(uint32_t*)graph->data),
+   *(unsigned char*)(graph->data + 4),
+   *(unsigned char*)(graph->data + 5),
+   *(unsigned char*)(graph->data + 6),
+   *(unsigned char*)(graph->data + 7));
+   printf("num_commits: %u\n", graph->num_commits);
+   printf("chunks:");
+
+   if (graph->chunk_oid_fanout)
+   printf(" oid_fanout");
+   if (graph->chunk_oid_lookup)
+   printf(" oid_lookup");
+   if (graph->chunk_commit_data)
+   printf(" commit_metadata");
+   if (graph->chunk_large_edges)
+   printf(" large_edges");
+   printf("\n");
+
+   return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
if (argc > 0) {
+   if (!strcmp(argv[0], "read"))
+   return graph_read(argc, argv);
if (!strcmp(argv[0], "write"))
return graph_write(argc, argv);
}
diff --git a/commit-graph.c b/commit-graph.c
index 9bef691d9b..2f2e2c7083 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,149 @@
GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+   struct commit_graph *g = xmalloc(sizeof(*g));
+ 

[PATCH v6 06/14] commit-graph: implement write_commit_graph()

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 commit-graph.c | 359 +
 commit-graph.h |   7 ++
 3 files changed, 367 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index a928d4de66..49492c3e1c 100644
--- a/Makefile
+++ b/Makefile
@@ -771,6 +771,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 00..9bef691d9b
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
+#define GRAPH_PARENT_MISSING 0x7fff
+#define GRAPH_EDGE_LAST_MASK 0x7fff
+#define GRAPH_PARENT_NONE 0x7000
+
+#define GRAPH_LAST_EDGE 0x8000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+   GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+   return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+struct commit **commits,
+int nr_commits)
+{
+   int i, count = 0;
+   struct commit **list = commits;
+
+   /*
+* Write the first-level table (the list is sorted,
+* but we use a 256-entry lookup to be able to avoid
+* having to do eight extra binary search iterations).
+*/
+   for (i = 0; i < 256; i++) {
+   while (count < nr_commits) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }
+
+   hashwrite_be32(f, count);
+   }
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   int count;
+   for (count = 0; count < nr_commits; count++, list++)
+   hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+   struct commit **commits = table;
+   return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+  struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_extra_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   int edge_value;
+   uint32_t packedDate[2];
+
+   parse_commit(*list);
+   hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+   parent = (*list)->parents;
+
+   if (!parent)
+   edge_value = GRAPH_PARENT_NONE;
+   else {
+   edge_value = sha1_pos(parent->item->object.oid.hash,
+ commits,
+ nr_commits,
+ commit_to_sha1);
+
+   if (edge_value < 0)
+   edge_value = GRAPH_PARENT_MISSING;
+   }
+
+   hashwrite_be32(f, edge_value);
+
+   if (parent)
+   parent = parent->next;
+
+   if (!parent)
+   edge_value = GRAPH_PARENT_NONE;
+   else if (parent->next)
+   edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+   else {
+   edg

[PATCH v6 10/14] commit-graph: close under reachability

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps caused by loose objects or
previously-missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.
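
The idea of closing a commit list under the parent relation can be sketched without Git's revision-walk machinery. This is a toy worklist version under stated assumptions, not the approach in the patch (which uses prepare_revision_walk() and get_revision()): commits are small integer indexes, each has at most two parents, and `toy_commit`, `close_under_parents`, `NO_PARENT` and `TOY_MAX` are hypothetical names.

```c
#define MAX_PARENTS 2      /* illustration only; real commits can have more */
#define NO_PARENT   (-1)
#define TOY_MAX     64     /* upper bound on commits in this toy model */

struct toy_commit {
	int parents[MAX_PARENTS];  /* indexes into the commit array, or NO_PARENT */
};

/*
 * Extend "set" (holding "nr" distinct commit indexes, room for "cap")
 * until it contains every ancestor of its members. The set doubles as
 * the worklist: newly found parents are appended and visited later.
 * Returns the new count, or -1 if the input is too large or "cap" is
 * exceeded.
 */
static int close_under_parents(const struct toy_commit *commits, int nr_commits,
			       int *set, int nr, int cap)
{
	unsigned char seen[TOY_MAX] = { 0 };
	int i, j;

	if (nr_commits > TOY_MAX)
		return -1;
	for (i = 0; i < nr; i++)
		seen[set[i]] = 1;

	for (i = 0; i < nr; i++) {
		const struct toy_commit *c = &commits[set[i]];

		for (j = 0; j < MAX_PARENTS; j++) {
			int p = c->parents[j];

			if (p == NO_PARENT || seen[p])
				continue;
			if (nr == cap)
				return -1;
			seen[p] = 1;
			set[nr++] = p;
		}
	}
	return nr;
}
```

Starting from a single merge commit, the set grows to include both parents and everything reachable from them, while unrelated roots stay out.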

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2f2e2c7083..fc7b4fa622 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -369,6 +369,28 @@ static int add_packed_commits(const struct object_id *oid,
return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+   int i;
+   struct rev_info revs;
+   struct commit *commit;
+   init_revisions(&revs, NULL);
+   for (i = 0; i < oids->nr; i++) {
+   commit = lookup_commit(&oids->list[i]);
+   if (commit && !parse_commit(commit))
+   revs.commits = commit_list_insert(commit, &revs.commits);
+   }
+
+   if (prepare_revision_walk(&revs))
+   die(_("revision walk setup failed"));
+
+   while ((commit = get_revision(&revs)) != NULL) {
+   ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+   oidcpy(&oids->list[oids->nr], &(commit->object.oid));
+   (oids->nr)++;
+   }
+}
+
 void write_commit_graph(const char *obj_dir)
 {
struct packed_oid_list oids;
@@ -392,6 +414,7 @@ void write_commit_graph(const char *obj_dir)
ALLOC_ARRAY(oids.list, oids.alloc);
 
for_each_packed_object(add_packed_commits, &oids, 0);
+   close_reachable(&oids);
 
QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.14.1



[PATCH v6 02/14] csum-file: refactor finalize_hashfile() method

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, where the checksum was written
if either CSUM_CLOSE or CSUM_FSYNC was provided.
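
The behavior change can be summarized as a mapping from flags to actions. The sketch below only models the control flow visible in the csum-file.c hunk of this patch; the `ACT_*` bits and the `actions_before`/`actions_after` helpers are hypothetical names used for illustration and perform no I/O.

```c
#define CSUM_CLOSE          1
#define CSUM_FSYNC          2
#define CSUM_HASH_IN_STREAM 4

/* Hypothetical action bits, used only by this sketch. */
#define ACT_WRITE_HASH 1   /* trailing checksum is written to the stream */
#define ACT_FSYNC      2   /* file is fsync'ed */
#define ACT_CLOSE      4   /* file descriptor is closed */

/* Old logic: either CSUM_CLOSE or CSUM_FSYNC implied write + close. */
static unsigned actions_before(unsigned flags)
{
	unsigned acts = 0;

	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
		acts |= ACT_WRITE_HASH;
		if (flags & CSUM_FSYNC)
			acts |= ACT_FSYNC;
		acts |= ACT_CLOSE;
	}
	return acts;
}

/* New logic: each flag controls exactly one action. */
static unsigned actions_after(unsigned flags)
{
	unsigned acts = 0;

	if (flags & CSUM_HASH_IN_STREAM)
		acts |= ACT_WRITE_HASH;
	if (flags & CSUM_FSYNC)
		acts |= ACT_FSYNC;
	if (flags & CSUM_CLOSE)
		acts |= ACT_CLOSE;
	return acts;
}
```

Under this model, an old call with CSUM_FSYNC matches a new call with CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE, which is how the callers are converted here, and CSUM_HASH_IN_STREAM alone writes the hash while leaving the descriptor open.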

Signed-off-by: Derrick Stolee 
---
 builtin/pack-objects.c | 4 ++--
 bulk-checkin.c | 2 +-
 csum-file.c| 8 
 csum-file.h| 5 +++--
 pack-bitmap-write.c| 2 +-
 pack-write.c   | 5 +++--
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 84e9f57b7f..2b15afd932 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,9 +837,9 @@ static void write_pack_file(void)
 * If so, rewrite it like in fast-import
 */
if (pack_to_stdout) {
-   finalize_hashfile(f, oid.hash, CSUM_CLOSE);
+   finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
-   finalize_hashfile(f, oid.hash, CSUM_FSYNC);
+   finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
int fd = finalize_hashfile(f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 227cc9f3b1..70b14fdf41 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,7 +35,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
unlink(state->pack_tmp_name);
goto clear_exit;
} else if (state->nr_written == 1) {
-   finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
+   finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
int fd = finalize_hashfile(state->f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
diff --git a/csum-file.c b/csum-file.c
index e6c95a6915..53ce37f7ca 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -61,11 +61,11 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
the_hash_algo->final_fn(f->buffer, &f->ctx);
if (result)
hashcpy(result, f->buffer);
-   if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-   /* write checksum and close fd */
+   if (flags & CSUM_HASH_IN_STREAM)
flush(f, f->buffer, the_hash_algo->rawsz);
-   if (flags & CSUM_FSYNC)
-   fsync_or_die(f->fd, f->name);
+   if (flags & CSUM_FSYNC)
+   fsync_or_die(f->fd, f->name);
+   if (flags & CSUM_CLOSE) {
if (close(f->fd))
die_errno("%s: sha1 file error on close", f->name);
fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 9ba87f0a6c..c5a2e335e7 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
 /* finalize_hashfile flags */
-#define CSUM_CLOSE 1
-#define CSUM_FSYNC 2
+#define CSUM_CLOSE 1
+#define CSUM_FSYNC 2
+#define CSUM_HASH_IN_STREAM 4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 662b44f97d..db4c832428 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
if (options & BITMAP_OPT_HASH_CACHE)
write_hash_cache(f, index, index_nr);
 
-   finalize_hashfile(f, NULL, CSUM_FSYNC);
+   finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 
if (adjust_shared_perm(tmp_file.buf))
die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index 044f427392..a9d46bc03f 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
}
 
hashwrite(f, sha1, the_hash_algo->rawsz);
-   finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-   ? CSUM_CLOSE : CSUM_FSYNC));
+   finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+   ((opts->flags & WRITE_IDX_VERIFY)
+   ? 0 : CSUM_FSYNC));
return index_name;
 }
 
-- 
2.14.1



[PATCH v6 03/14] commit-graph: add format document

2018-03-14 Thread Derrick Stolee
From: Derrick Stolee 

Add a document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)
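
The 256-entry fanout listed above can be illustrated with a small sketch. It assumes, as the format document in this patch states, that entry F[i] holds the number of OIDs whose first byte is at most i. The function names `build_fanout` and `fanout_range` are hypothetical, and only the first byte of each OID is modeled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build the fanout table from the first bytes of a sorted OID list.
 * After the call, fanout[i] is the number of OIDs with first byte <= i.
 */
static void build_fanout(const unsigned char *first_bytes, size_t n,
			 uint32_t fanout[256])
{
	size_t count = 0;
	int i;

	for (i = 0; i < 256; i++) {
		while (count < n && first_bytes[count] == i)
			count++;
		fanout[i] = (uint32_t)count;
	}
}

/*
 * Narrow a lookup to the half-open range [*lo, *hi) of list positions
 * whose OIDs start with byte "first". A binary search then only has to
 * cover that range.
 */
static void fanout_range(const uint32_t fanout[256], unsigned char first,
			 uint32_t *lo, uint32_t *hi)
{
	*lo = first ? fanout[first - 1] : 0;
	*hi = fanout[first];
}
```

An empty range (lo equal to hi) means no commit in the list starts with that byte, so the lookup can stop without touching the OID table.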

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph-format.txt | 98 +
 1 file changed, 98 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 00..4402baa131
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,98 @@
+Git commit graph format
+===
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of their parents. We
+  reserve zero as special; it can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as 32-bit integers corresponding to
+the array position within the list of commit OIDs. We use the most-significant
+bit for special purposes, so we can store at most (1 << 31) - 1 (around 2
+billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+  The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+  Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+  We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+ Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+  First 4 bytes describe the chunk id. Value 0 is a terminating label.
+  Other 8 bytes provide the byte-offset in current file for chunk to
+  start. (Chunks are ordered contiguously in the file, so you can infer
+  the length using the next chunk position if necessary.) Each chunk
+  type appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+  The ith entry, F[i], stores the number of OIDs with first
+  byte at most i. Thus F[255] stores the total
+  number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+  The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
+* The first H bytes are for the OID of the root tree.
+* The next 8 bytes are for the positions of the first two parents
+  of the ith commit. Stores value 0x if no parent in that
+  position. If there are more than two parents, the second value
+  has its most-significant bit on and the other bits store an array
+  position into the Large Edge List chunk.
+* The next 8 bytes store the generation number of the commit and
+  the commit time in seconds since EPOCH. The generation number
+  uses the higher 30 bits of the first 4 bytes, while the commit
+  time uses the 32 bits of the second 4 bytes, along with the lowest
+  2 bits of the lowest byte, storing the 33rd and 34th bit of the
+  commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+  This list of 4-byte values stores the second through nth parents for
+  all octopus merges. The second parent value in the commit data stores
+  an array position within this list along with the most-significant bit
+  on. Starting at that array position, iterate through this list of commit
+  positions for the parents until reaching a value with the most-significant
+  bit on. 

[PATCH v2 4/5] ref-filter: add return value to parsers

2018-03-14 Thread Olga Telezhnaya
Continue removing printing from the ref-filter formatting logic,
so that it can be used more generally.

Change the signature of the parsers by adding a return value and
a strbuf parameter for the error message.
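
The pattern of returning an error code and filling a caller-supplied message buffer, rather than calling die(), can be sketched as follows. This is an illustration only: `struct errbuf` is a hypothetical fixed-size stand-in for Git's strbuf, `parse_strip_count` is not a function from ref-filter.c, and plain strtol() is used instead of Git's strtol_i() helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a strbuf: a fixed-size message buffer. */
struct errbuf {
	char msg[128];
};

/*
 * Parse "arg" as a decimal integer into "*out".
 * On failure, leave "*out" untouched, describe the problem in "err"
 * and return -1; the caller decides whether to print, die or recover.
 */
static int parse_strip_count(const char *arg, int *out, struct errbuf *err)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end || v < INT_MIN || v > INT_MAX) {
		snprintf(err->msg, sizeof(err->msg),
			 "Integer value expected refname:lstrip=%s", arg);
		return -1;
	}
	*out = (int)v;
	return 0;
}
```

A command-line caller can still terminate on a non-zero return, while a library-style caller can surface the message some other way.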

Signed-off-by: Olga Telezhnaia 
---
 ref-filter.c | 177 +++
 1 file changed, 118 insertions(+), 59 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index dd83ef326511d..62ea4adcd0ff1 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -101,22 +101,28 @@ static struct used_atom {
 } *used_atom;
 static int used_atom_cnt, need_tagged, need_symref;
 
-static void color_atom_parser(const struct ref_format *format, struct used_atom *atom, const char *color_value)
+static int color_atom_parser(const struct ref_format *format, struct used_atom *atom,
+const char *color_value, struct strbuf *err)
 {
-   if (!color_value)
-   die(_("expected format: %%(color:)"));
-   if (color_parse(color_value, atom->u.color) < 0)
-   die(_("unrecognized color: %%(color:%s)"), color_value);
+   if (!color_value) {
+   strbuf_addstr(err, _("expected format: %(color:)"));
+   return -1;
+   }
+   if (color_parse(color_value, atom->u.color) < 0) {
+   strbuf_addf(err, _("unrecognized color: %%(color:%s)"), color_value);
+   return -1;
+   }
/*
 * We check this after we've parsed the color, which lets us complain
 * about syntactically bogus color names even if they won't be used.
 */
if (!want_color(format->use_color))
color_parse("", atom->u.color);
+   return 0;
 }
 
-static void refname_atom_parser_internal(struct refname_atom *atom,
-const char *arg, const char *name)
+static int refname_atom_parser_internal(struct refname_atom *atom, const char *arg,
+const char *name, struct strbuf *err)
 {
if (!arg)
atom->option = R_NORMAL;
@@ -125,17 +131,25 @@ static void refname_atom_parser_internal(struct refname_atom *atom,
else if (skip_prefix(arg, "lstrip=", &arg) ||
 skip_prefix(arg, "strip=", &arg)) {
atom->option = R_LSTRIP;
-   if (strtol_i(arg, 10, &atom->lstrip))
-   die(_("Integer value expected refname:lstrip=%s"), arg);
+   if (strtol_i(arg, 10, &atom->lstrip)) {
+   strbuf_addf(err, _("Integer value expected refname:lstrip=%s"), arg);
+   return -1;
+   }
} else if (skip_prefix(arg, "rstrip=", &arg)) {
atom->option = R_RSTRIP;
-   if (strtol_i(arg, 10, &atom->rstrip))
-   die(_("Integer value expected refname:rstrip=%s"), arg);
-   } else
-   die(_("unrecognized %%(%s) argument: %s"), name, arg);
+   if (strtol_i(arg, 10, &atom->rstrip)) {
+   strbuf_addf(err, _("Integer value expected refname:rstrip=%s"), arg);
+   return -1;
+   }
+   } else {
+   strbuf_addf(err, _("unrecognized %%(%s) argument: %s"), name, arg);
+   return -1;
+   }
+   return 0;
 }
 
-static void remote_ref_atom_parser(const struct ref_format *format, struct used_atom *atom, const char *arg)
+static int remote_ref_atom_parser(const struct ref_format *format, struct used_atom *atom,
+ const char *arg, struct strbuf *err)
 {
struct string_list params = STRING_LIST_INIT_DUP;
int i;
@@ -145,9 +159,8 @@ static void remote_ref_atom_parser(const struct ref_format *format, struct used_
 
if (!arg) {
atom->u.remote_ref.option = RR_REF;
-   refname_atom_parser_internal(&atom->u.remote_ref.refname,
-arg, atom->name);
-   return;
+   return refname_atom_parser_internal(&atom->u.remote_ref.refname,
+   arg, atom->name, err);
}
 
atom->u.remote_ref.nobracket = 0;
@@ -170,29 +183,40 @@ static void remote_ref_atom_parser(const struct ref_format *format, struct used_
atom->u.remote_ref.push_remote = 1;
} else {
atom->u.remote_ref.option = RR_REF;
-   refname_atom_parser_internal(&atom->u.remote_ref.refname,
-arg, atom->name);
+   if (refname_atom_parser_internal(&atom->u.remote_ref.refname,
+arg, atom->name, err))
+   return -1;
}
}
 
string_list_clear(&params, 0);
+   return 0;
 }
 
-static void body_atom_parser(const struct ref_format *format,

[PATCH v2 2/5] ref-filter: add return value && strbuf to handlers

2018-03-14 Thread Olga Telezhnaya
Continue removing printing from the ref-filter formatting logic,
so that it can be used more generally.

Change the signature of the handlers by adding a return value
and a strbuf parameter for errors.

Signed-off-by: Olga Telezhnaia 
---
 ref-filter.c | 71 
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 54fae00bdd410..d120360104806 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -387,7 +387,8 @@ struct ref_formatting_state {
 
 struct atom_value {
const char *s;
-   void (*handler)(struct atom_value *atomv, struct ref_formatting_state *state);
+   int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
+  struct strbuf *err);
uintmax_t value; /* used for sorting when not FIELD_STR */
struct used_atom *atom;
 };
@@ -481,7 +482,8 @@ static void quote_formatting(struct strbuf *s, const char *str, int quote_style)
}
 }
 
-static void append_atom(struct atom_value *v, struct ref_formatting_state *state)
+static int append_atom(struct atom_value *v, struct ref_formatting_state *state,
+  struct strbuf *unused_err)
 {
/*
 * Quote formatting is only done when the stack has a single
@@ -493,6 +495,7 @@ static void append_atom(struct atom_value *v, struct ref_formatting_state *state
quote_formatting(&state->stack->output, v->s, state->quote_style);
else
strbuf_addstr(&state->stack->output, v->s);
+   return 0;
 }
 
 static void push_stack_element(struct ref_formatting_stack **stack)
@@ -527,7 +530,8 @@ static void end_align_handler(struct ref_formatting_stack **stack)
strbuf_release(&s);
 }
 
-static void align_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state)
+static int align_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
+ struct strbuf *unused_err)
 {
struct ref_formatting_stack *new_stack;
 
@@ -535,6 +539,7 @@ static void align_atom_handler(struct atom_value *atomv, struct ref_formatting_s
new_stack = state->stack;
new_stack->at_end = end_align_handler;
new_stack->at_end_data = &atomv->atom->u.align;
+   return 0;
 }
 
 static void if_then_else_handler(struct ref_formatting_stack **stack)
@@ -572,7 +577,8 @@ static void if_then_else_handler(struct ref_formatting_stack **stack)
free(if_then_else);
 }
 
-static void if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state)
+static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
+  struct strbuf *unused_err)
 {
struct ref_formatting_stack *new_stack;
struct if_then_else *if_then_else = xcalloc(sizeof(struct if_then_else), 1);
@@ -584,6 +590,7 @@ static void if_atom_handler(struct atom_value *atomv, struct ref_formatting_stat
new_stack = state->stack;
new_stack->at_end = if_then_else_handler;
new_stack->at_end_data = if_then_else;
+   return 0;
 }
 
 static int is_empty(const char *s)
@@ -596,19 +603,24 @@ static int is_empty(const char *s)
return 1;
 }
 
-static void then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state)
+static int then_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
+struct strbuf *err)
 {
struct ref_formatting_stack *cur = state->stack;
struct if_then_else *if_then_else = NULL;
 
if (cur->at_end == if_then_else_handler)
if_then_else = (struct if_then_else *)cur->at_end_data;
-   if (!if_then_else)
-   die(_("format: %%(then) atom used without an %%(if) atom"));
-   if (if_then_else->then_atom_seen)
-   die(_("format: %%(then) atom used more than once"));
-   if (if_then_else->else_atom_seen)
-   die(_("format: %%(then) atom used after %%(else)"));
+   if (!if_then_else) {
+   strbuf_addstr(err, _("format: %(then) atom used without an %(if) atom"));
+   return -1;
+   } else if (if_then_else->then_atom_seen) {
+   strbuf_addstr(err, _("format: %(then) atom used more than once"));
+   return -1;
+   } else if (if_then_else->else_atom_seen) {
+   strbuf_addstr(err, _("format: %(then) atom used after %(else)"));
+   return -1;
+   }
if_then_else->then_atom_seen = 1;
/*
 * If the 'equals' or 'notequals' attribute is used then
@@ -624,34 +636,44 @@ static void then_atom_handler(struct atom_value *atomv, struct ref_formatting_st
} else if (cur->output.len && !is_empty(cur->output.buf))
if_then_else->condition_satisfied = 1;
strbuf_reset(&cur->output);
+   return 0;
 }
 
-static void else_atom_handler(stru

[PATCH v2 1/5] ref-filter: start adding strbufs with errors

2018-03-14 Thread Olga Telezhnaya
This is a first step in removing any printing from the
ref-filter formatting logic, so that it can be more general.
Nothing changes for show_ref_array_item() users.
But if you want to handle errors on your own, you can invoke
format_ref_array_item(). That means you need to print everything
(the result and errors) on your side.

This commit changes the signature of format_ref_array_item() by adding
a return value and a strbuf parameter for errors, and adjusts
its callers. While at it, reduce the scope of the out-variable.

Signed-off-by: Olga Telezhnaia 
---
 builtin/branch.c |  7 +--
 ref-filter.c | 17 -
 ref-filter.h |  7 ---
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058be6..f86709ca42d5e 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -391,7 +391,6 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
struct ref_array array;
int maxwidth = 0;
const char *remote_prefix = "";
-   struct strbuf out = STRBUF_INIT;
char *to_free = NULL;
 
/*
@@ -419,7 +418,10 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
ref_array_sort(sorting, &array);
 
for (i = 0; i < array.nr; i++) {
-   format_ref_array_item(array.items[i], format, &out);
+   struct strbuf out = STRBUF_INIT;
+   struct strbuf err = STRBUF_INIT;
+   if (format_ref_array_item(array.items[i], format, &out, &err))
+   die("%s", err.buf);
if (column_active(colopts)) {
assert(!filter->verbose && "--column and --verbose are incompatible");
 /* format to a string_list to let print_columns() do its job */
@@ -428,6 +430,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
fwrite(out.buf, 1, out.len, stdout);
putchar('\n');
}
+   strbuf_release(&err);
strbuf_release(&out);
}
 
diff --git a/ref-filter.c b/ref-filter.c
index 45fc56216aaa8..54fae00bdd410 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2118,9 +2118,10 @@ static void append_literal(const char *cp, const char *ep, struct ref_formatting
}
 }
 
-void format_ref_array_item(struct ref_array_item *info,
+int format_ref_array_item(struct ref_array_item *info,
   const struct ref_format *format,
-  struct strbuf *final_buf)
+  struct strbuf *final_buf,
+  struct strbuf *error_buf)
 {
const char *cp, *sp, *ep;
struct ref_formatting_state state = REF_FORMATTING_STATE_INIT;
@@ -2148,19 +2149,25 @@ void format_ref_array_item(struct ref_array_item *info,
resetv.s = GIT_COLOR_RESET;
append_atom(&resetv, &state);
}
-   if (state.stack->prev)
-   die(_("format: %%(end) atom missing"));
+   if (state.stack->prev) {
+   strbuf_addstr(error_buf, _("format: %(end) atom missing"));
+   return -1;
+   }
strbuf_addbuf(final_buf, &state.stack->output);
pop_stack_element(&state.stack);
+   return 0;
 }
 
 void show_ref_array_item(struct ref_array_item *info,
 const struct ref_format *format)
 {
struct strbuf final_buf = STRBUF_INIT;
+   struct strbuf error_buf = STRBUF_INIT;
 
-   format_ref_array_item(info, format, &final_buf);
+   if (format_ref_array_item(info, format, &final_buf, &error_buf))
+   die("%s", error_buf.buf);
fwrite(final_buf.buf, 1, final_buf.len, stdout);
+   strbuf_release(&error_buf);
strbuf_release(&final_buf);
putchar('\n');
 }
diff --git a/ref-filter.h b/ref-filter.h
index 0d98342b34319..e13f8e6f8721a 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -110,9 +110,10 @@ int verify_ref_format(struct ref_format *format);
 /*  Sort the given ref_array as per the ref_sorting provided */
 void ref_array_sort(struct ref_sorting *sort, struct ref_array *array);
 /*  Based on the given format and quote_style, fill the strbuf */
-void format_ref_array_item(struct ref_array_item *info,
-  const struct ref_format *format,
-  struct strbuf *final_buf);
+int format_ref_array_item(struct ref_array_item *info,
+ const struct ref_format *format,
+ struct strbuf *final_buf,
+ struct strbuf *error_buf);
 /*  Print the ref using the given format and quote_style */
 void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
 /*  Parse a single sort specifier and add it to the list */

--
https://github.com/git/git/pull/466


[PATCH v2 5/5] ref-filter: get_ref_atom_value() error handling

2018-03-14 Thread Olga Telezhnaya
Finish removing printing from the ref-filter formatting logic,
so that it can be used more generally.

Change the signature of get_ref_atom_value() and underlying functions
by adding a return value and a strbuf parameter for the error message.

Note that grab_objectname() used to return 1 if it was given an
objectname atom and 0 otherwise. That logic has changed: it now
returns 0 if there is no error and -1 otherwise. A caller that needs
to know whether it has an objectname atom can use the starts_with()
function. This duplicates the check, but the overhead should be
negligible.
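
The changed return convention can be shown with a toy model. Everything here is hypothetical and stands outside ref-filter.c: `has_prefix` mirrors what a starts_with()-style check does, and `grab_name` stands in for a function that now reports only success (0) or error (-1), so the return value no longer tells the caller whether the atom matched.

```c
#include <string.h>

/* Return 1 if "str" begins with "prefix", 0 otherwise. */
static int has_prefix(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Toy model of the new convention: 0 means "no error", whether or not
 * "name" is an objectname atom; -1 means an unknown option was seen.
 * "*value" is only set when the atom matched.
 */
static int grab_name(const char *name, int option_known, const char **value)
{
	if (has_prefix(name, "objectname")) {
		if (!option_known)
			return -1;
		*value = "filled";
	}
	return 0;
}
```

A caller that previously branched on a return value of 1 would now call has_prefix(name, "objectname") itself, at the cost of repeating one short string comparison.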

Signed-off-by: Olga Telezhnaia 
---
 ref-filter.c | 109 +--
 1 file changed, 69 insertions(+), 40 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 62ea4adcd0ff1..3f0c3924273d5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -831,26 +831,27 @@ static void *get_obj(const struct object_id *oid, struct object **obj, unsigned
 }
 
 static int grab_objectname(const char *name, const unsigned char *sha1,
-  struct atom_value *v, struct used_atom *atom)
+  struct atom_value *v, struct used_atom *atom,
+  struct strbuf *err)
 {
if (starts_with(name, "objectname")) {
if (atom->u.objectname.option == O_SHORT) {
v->s = xstrdup(find_unique_abbrev(sha1, 
DEFAULT_ABBREV));
-   return 1;
} else if (atom->u.objectname.option == O_FULL) {
v->s = xstrdup(sha1_to_hex(sha1));
-   return 1;
} else if (atom->u.objectname.option == O_LENGTH) {
v->s = xstrdup(find_unique_abbrev(sha1, 
atom->u.objectname.length));
-   return 1;
-   } else
-   die("BUG: unknown %%(objectname) option");
+   } else {
+   strbuf_addstr(err, "BUG: unknown %(objectname) option");
+   return -1;
+   }
}
return 0;
 }
 
 /* See grab_values */
-static void grab_common_values(struct atom_value *val, int deref, struct 
object *obj, void *buf, unsigned long sz)
+static int grab_common_values(struct atom_value *val, int deref, struct object 
*obj,
+ void *buf, unsigned long sz, struct strbuf *err)
 {
int i;
 
@@ -868,8 +869,10 @@ static void grab_common_values(struct atom_value *val, int 
deref, struct object
v->s = xstrfmt("%lu", sz);
}
else if (deref)
-   grab_objectname(name, obj->oid.hash, v, &used_atom[i]);
+   if (grab_objectname(name, obj->oid.hash, v, 
&used_atom[i], err))
+   return -1;
}
+   return 0;
 }
 
 /* See grab_values */
@@ -1225,9 +1228,11 @@ static void fill_missing_values(struct atom_value *val)
  * pointed at by the ref itself; otherwise it is the object the
  * ref (which is a tag) refers to.
  */
-static void grab_values(struct atom_value *val, int deref, struct object *obj, 
void *buf, unsigned long sz)
+static int grab_values(struct atom_value *val, int deref, struct object *obj,
+  void *buf, unsigned long sz, struct strbuf *err)
 {
-   grab_common_values(val, deref, obj, buf, sz);
+   if (grab_common_values(val, deref, obj, buf, sz, err))
+   return -1;
switch (obj->type) {
case OBJ_TAG:
grab_tag_values(val, deref, obj, buf, sz);
@@ -1247,8 +1252,10 @@ static void grab_values(struct atom_value *val, int 
deref, struct object *obj, v
/* grab_blob_values(val, deref, obj, buf, sz); */
break;
default:
-   die("Eh?  Object of type %d?", obj->type);
+   strbuf_addf(err, "Eh?  Object of type %d?", obj->type);
+   return -1;
}
+   return 0;
 }
 
 static inline char *copy_advance(char *dst, const char *src)
@@ -1335,8 +1342,9 @@ static const char *show_ref(struct refname_atom *atom, 
const char *refname)
return refname;
 }
 
-static void fill_remote_ref_details(struct used_atom *atom, const char 
*refname,
-   struct branch *branch, const char **s)
+static int fill_remote_ref_details(struct used_atom *atom, const char *refname,
+  struct branch *branch, const char **s,
+  struct strbuf *err)
 {
int num_ours, num_theirs;
if (atom->u.remote_ref.option == RR_REF)
@@ -1362,7 +1370,7 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
} else if (atom->u.remote_ref.option == RR_TRACKSHORT) {
if (stat_tracking_info(branch, &num_ours, &num_theirs,
   NULL, AHEAD_BEHIND_FULL) < 0)
-  

[PATCH v2 3/5] ref-filter: change parsing function error handling

2018-03-14 Thread Olga Telezhnaya
Continue removing all printing from the ref-filter formatting logic,
so that it can be used more generally.

Change the signature of parse_ref_filter_atom(): the return value now
reports success (0) or failure (-1), the previous return value (the
atom's index) is passed back through a new int pointer parameter, and
a new strbuf parameter receives the error message.
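
As a rough illustration of this signature change, here is a minimal
standalone sketch. The names find_atom_sketch() and known_atoms_sketch
are hypothetical, and a plain char buffer stands in for the strbuf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *known_atoms_sketch[] = {
	"refname", "objectname", "objecttype"
};

/*
 * Before: the index was the return value and errors called die().
 * After: the return value is the status (0 or -1), the index is
 * written to *res, and the message is written to err for the caller.
 */
static int find_atom_sketch(const char *atom, int *res,
			    char *err, size_t errlen)
{
	size_t i;
	size_t nr = sizeof(known_atoms_sketch) / sizeof(known_atoms_sketch[0]);

	assert(res && err);
	if (!*atom) {
		snprintf(err, errlen, "malformed field name: %s", atom);
		return -1;
	}
	for (i = 0; i < nr; i++) {
		if (!strcmp(known_atoms_sketch[i], atom)) {
			*res = (int)i;
			return 0;
		}
	}
	snprintf(err, errlen, "unknown field name: %s", atom);
	return -1;
}
```

On failure *res is left untouched, so a caller should read it only
after checking the return value.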

Signed-off-by: Olga Telezhnaia 
---
 ref-filter.c | 43 ++-
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index d120360104806..dd83ef326511d 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -397,7 +397,8 @@ struct atom_value {
  * Used to parse format string and sort specifiers
  */
 static int parse_ref_filter_atom(const struct ref_format *format,
-const char *atom, const char *ep)
+const char *atom, const char *ep, int *res,
+struct strbuf *err)
 {
const char *sp;
const char *arg;
@@ -406,14 +407,18 @@ static int parse_ref_filter_atom(const struct ref_format 
*format,
sp = atom;
if (*sp == '*' && sp < ep)
sp++; /* deref */
-   if (ep <= sp)
-   die(_("malformed field name: %.*s"), (int)(ep-atom), atom);
+   if (ep <= sp) {
+   strbuf_addf(err, _("malformed field name: %.*s"), 
(int)(ep-atom), atom);
+   return -1;
+   }
 
/* Do we have the atom already used elsewhere? */
for (i = 0; i < used_atom_cnt; i++) {
int len = strlen(used_atom[i].name);
-   if (len == ep - atom && !memcmp(used_atom[i].name, atom, len))
-   return i;
+   if (len == ep - atom && !memcmp(used_atom[i].name, atom, len)) {
+   *res = i;
+   return 0;
+   }
}
 
/*
@@ -432,8 +437,10 @@ static int parse_ref_filter_atom(const struct ref_format 
*format,
break;
}
 
-   if (ARRAY_SIZE(valid_atom) <= i)
-   die(_("unknown field name: %.*s"), (int)(ep-atom), atom);
+   if (ARRAY_SIZE(valid_atom) <= i) {
+   strbuf_addf(err, _("unknown field name: %.*s"), (int)(ep-atom), 
atom);
+   return -1;
+   }
 
/* Add it in, including the deref prefix */
at = used_atom_cnt;
@@ -458,7 +465,8 @@ static int parse_ref_filter_atom(const struct ref_format 
*format,
need_tagged = 1;
if (!strcmp(valid_atom[i].name, "symref"))
need_symref = 1;
-   return at;
+   *res = at;
+   return 0;
 }
 
 static void quote_formatting(struct strbuf *s, const char *str, int 
quote_style)
@@ -725,17 +733,20 @@ int verify_ref_format(struct ref_format *format)
 
format->need_color_reset_at_eol = 0;
for (cp = format->format; *cp && (sp = find_next(cp)); ) {
+   struct strbuf err = STRBUF_INIT;
const char *color, *ep = strchr(sp, ')');
int at;
 
if (!ep)
return error(_("malformed format string %s"), sp);
/* sp points at "%(" and ep points at the closing ")" */
-   at = parse_ref_filter_atom(format, sp + 2, ep);
+   if (parse_ref_filter_atom(format, sp + 2, ep, &at, &err))
+   die("%s", err.buf);
cp = ep + 1;
 
if (skip_prefix(used_atom[at].name, "color:", &color))
format->need_color_reset_at_eol = !!strcmp(color, 
"reset");
+   strbuf_release(&err);
}
if (format->need_color_reset_at_eol && !want_color(format->use_color))
format->need_color_reset_at_eol = 0;
@@ -2154,13 +2165,14 @@ int format_ref_array_item(struct ref_array_item *info,
 
for (cp = format->format; *cp && (sp = find_next(cp)); cp = ep + 1) {
struct atom_value *atomv;
+   int pos;
 
ep = strchr(sp, ')');
if (cp < sp)
append_literal(cp, sp, &state);
-   get_ref_atom_value(info,
-  parse_ref_filter_atom(format, sp + 2, ep),
-  &atomv);
+   if (parse_ref_filter_atom(format, sp + 2, ep, &pos, error_buf))
+   return -1;
+   get_ref_atom_value(info, pos, &atomv);
if (atomv->handler(atomv, &state, error_buf))
return -1;
}
@@ -2215,7 +2227,12 @@ static int parse_sorting_atom(const char *atom)
 */
struct ref_format dummy = REF_FORMAT_INIT;
const char *end = atom + strlen(atom);
-   return parse_ref_filter_atom(&dummy, atom, end);
+   struct strbuf err = STRBUF_INIT;
+   int res;
+   if (parse_ref_filter_atom(&dummy, atom, end, &res, &err))
+   die("%s", err.buf);
+   strbuf_release(&err);
+   return res;
 }
 
 /*  If no sorting o

[PATCH 2/2] fetch-pack: do not check links for partial fetch

2018-03-14 Thread Jonathan Tan
When doing a partial clone or fetch with transfer.fsckobjects=1, invoke
index-pack with --fsck-objects instead of --strict so that only objects
are checked, not links. Incomplete links are expected when doing a
partial clone or fetch.
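
The flag selection can be sketched as a small pure function. The names
below are hypothetical, and the precedence of the fetch setting over
the transfer setting is an assumption read off the conditional in the
hunk, where a negative value means "unset".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Resolve the effective fsck setting; a negative value means "unset".
 * Assumption: the fetch-specific setting wins over the transfer one.
 */
static int fsck_enabled_sketch(int fetch_fsck, int transfer_fsck)
{
	if (fetch_fsck >= 0)
		return fetch_fsck;
	if (transfer_fsck >= 0)
		return transfer_fsck;
	return 0;
}

/* Returns the index-pack flag to pass, or NULL when fsck is off. */
static const char *index_pack_flag_sketch(int fetch_fsck, int transfer_fsck,
					  int from_promisor)
{
	if (!fsck_enabled_sketch(fetch_fsck, transfer_fsck))
		return NULL;
	/* Partial fetches expect missing links, so check objects only. */
	return from_promisor ? "--fsck-objects" : "--strict";
}
```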

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c | 13 +++--
 t/t5616-partial-clone.sh | 11 +++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index d97461296..1d6117565 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -886,8 +886,17 @@ static int get_pack(struct fetch_pack_args *args,
? fetch_fsck_objects
: transfer_fsck_objects >= 0
? transfer_fsck_objects
-   : 0)
-   argv_array_push(&cmd.args, "--strict");
+   : 0) {
+   if (args->from_promisor)
+   /*
+* We cannot use --strict in index-pack because it
+* checks both broken objects and links, but we only
+* want to check for broken objects.
+*/
+   argv_array_push(&cmd.args, "--fsck-objects");
+   else
+   argv_array_push(&cmd.args, "--strict");
+   }
 
cmd.in = demux.out;
cmd.git_cmd = 1;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 29d863118..cee556536 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -143,4 +143,15 @@ test_expect_success 'manual prefetch of missing objects' '
test_line_count = 0 observed.oids
 '
 
+test_expect_success 'partial clone with transfer.fsckobjects=1 uses index-pack 
--fsck-objects' '
+   git init src &&
+   test_commit -C src x &&
+   test_config -C src uploadpack.allowfilter 1 &&
+   test_config -C src uploadpack.allowanysha1inwant 1 &&
+
+   GIT_TRACE="$(pwd)/trace" git -c transfer.fsckobjects=1 \
+   clone --filter="blob:none" "file://$(pwd)/src" dst &&
+   grep "git index-pack.*--fsck-objects" trace
+'
+
 test_done
-- 
2.16.2.520.gd0db9edba.dirty



[PATCH 0/2] Make partial clone/fetch work when transfer.fsckobjects=1

2018-03-14 Thread Jonathan Tan
One of my colleagues noticed that we obtain a "fatal: did not receive
expected object" error when partial-cloning (that is, with --filter set)
if transfer.fsckobjects is true. Here's a fix for that.

Jonathan Tan (2):
  index-pack: support checking objects but not links
  fetch-pack: do not check links for partial fetch

 Documentation/git-index-pack.txt |  3 +++
 builtin/index-pack.c |  6 --
 fetch-pack.c | 13 +++--
 t/t5302-pack-index.sh|  5 +
 t/t5616-partial-clone.sh | 11 +++
 5 files changed, 34 insertions(+), 4 deletions(-)

-- 
2.16.2.520.gd0db9edba.dirty



[PATCH 1/2] index-pack: support checking objects but not links

2018-03-14 Thread Jonathan Tan
The index-pack command currently supports the
--check-self-contained-and-connected argument, for internal use only,
that instructs it to only check for broken links and not broken objects.
For partial clones, we need the inverse, so add a --fsck-objects
argument that checks for broken objects and not broken links, also for
internal use only.

This will be used by fetch-pack in a subsequent patch.
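
To make the relationship between the options concrete, here is a
minimal sketch of which check each flag enables. The struct and
function names are hypothetical. The --strict branch is an assumption
based on the fetch-pack patch, which says --strict checks both broken
objects and links.

```c
#include <assert.h>
#include <string.h>

struct checks_sketch {
	int strict;         /* walk the links between objects */
	int do_fsck_object; /* validate each object's contents */
};

static struct checks_sketch parse_flag_sketch(const char *arg)
{
	struct checks_sketch c = { 0, 0 };

	assert(arg);
	if (!strcmp(arg, "--check-self-contained-and-connected")) {
		c.strict = 1;
	} else if (!strcmp(arg, "--fsck-objects")) {
		c.do_fsck_object = 1;
	} else if (!strcmp(arg, "--strict")) {
		/* Assumption: --strict turns on both checks. */
		c.strict = 1;
		c.do_fsck_object = 1;
	}
	return c;
}

/* The object must be parsed if either check is requested. */
static int needs_object_parse(struct checks_sketch c)
{
	return c.strict || c.do_fsck_object;
}

static int runs_object_check(struct checks_sketch c)
{
	return c.do_fsck_object;
}

static int runs_link_walk(struct checks_sketch c)
{
	return c.strict;
}
```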

Signed-off-by: Jonathan Tan 
---
 Documentation/git-index-pack.txt | 3 +++
 builtin/index-pack.c | 6 --
 t/t5302-pack-index.sh| 5 +
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index 1b4b65d66..138edb47b 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -77,6 +77,9 @@ OPTIONS
 --check-self-contained-and-connected::
Die if the pack contains broken links. For internal use only.
 
+--fsck-objects::
+   Die if the pack contains broken objects. For internal use only.
+
 --threads=::
Specifies the number of threads to spawn when resolving
deltas. This requires that index-pack be compiled with
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 59878e70b..f46cb5967 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -827,7 +827,7 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
free(has_data);
}
 
-   if (strict) {
+   if (strict || do_fsck_object) {
read_lock();
if (type == OBJ_BLOB) {
struct blob *blob = lookup_blob(oid);
@@ -853,7 +853,7 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
if (do_fsck_object &&
fsck_object(obj, buf, size, &fsck_options))
die(_("Error in object"));
-   if (fsck_walk(obj, NULL, &fsck_options))
+   if (strict && fsck_walk(obj, NULL, &fsck_options))
die(_("Not all child objects of %s are 
reachable"), oid_to_hex(&obj->oid));
 
if (obj->type == OBJ_TREE) {
@@ -1688,6 +1688,8 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
} else if (!strcmp(arg, 
"--check-self-contained-and-connected")) {
strict = 1;
check_self_contained_and_connected = 1;
+   } else if (!strcmp(arg, "--fsck-objects")) {
+   do_fsck_object = 1;
} else if (!strcmp(arg, "--verify")) {
verify = 1;
} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index c2fc584da..d695a6082 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -262,4 +262,9 @@ EOF
 grep "^warning:.* expected .tagger. line" err
 '
 
+test_expect_success 'index-pack --fsck-objects also warns upon missing tagger 
in tag' '
+git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
+grep "^warning:.* expected .tagger. line" err
+'
+
 test_done
-- 
2.16.2.520.gd0db9edba.dirty



[PATCH v5 06/35] transport: use get_refs_via_connect to get refs

2018-03-14 Thread Brandon Williams
Remove code duplication and use the existing 'get_refs_via_connect()'
function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
'git_transport_push()'.

Signed-off-by: Brandon Williams 
---
 transport.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/transport.c b/transport.c
index fc802260f6..8e87790962 100644
--- a/transport.c
+++ b/transport.c
@@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
args.cloning = transport->cloning;
args.update_shallow = data->options.update_shallow;
 
-   if (!data->got_remote_heads) {
-   connect_setup(transport, 0);
-   get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
-NULL, &data->shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   refs_tmp = get_refs_via_connect(transport, 0);
 
refs = fetch_pack(&args, data->fd, data->conn,
  refs_tmp ? refs_tmp : transport->remote_refs,
@@ -541,14 +537,8 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
struct send_pack_args args;
int ret;
 
-   if (!data->got_remote_heads) {
-   struct ref *tmp_refs;
-   connect_setup(transport, 1);
-
-   get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
-NULL, &data->shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   get_refs_via_connect(transport, 1);
 
memset(&args, 0, sizeof(args));
args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 05/35] upload-pack: factor out processing lines

2018-03-14 Thread Brandon Williams
Factor out the logic for processing shallow, deepen, deepen_since, and
deepen_not lines into their own functions. This simplifies
'receive_needs()' and makes it easier to reuse this logic when
implementing protocol_v2.
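
The shape of these helpers (return nonzero when the line was consumed,
zero when it belongs to someone else) can be sketched without git's
internals. skip_prefix_sketch() is a local stand-in for git's
skip_prefix(), and the -1 return is a substitute for the die() call in
the real code. Both are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for skip_prefix(): on match, *out points past prefix. */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/*
 * Returns 1 if "line" was a valid deepen line (and sets *depth),
 * 0 if it is some other kind of line, and -1 if it is a malformed
 * deepen line (where the patch calls die()).
 */
static int process_deepen_sketch(const char *line, int *depth)
{
	const char *arg;
	char *end = NULL;
	long v;

	assert(line && depth);
	if (!skip_prefix_sketch(line, "deepen ", &arg))
		return 0;
	v = strtol(arg, &end, 0);
	if (end == arg || *end || v <= 0)
		return -1;
	*depth = (int)v;
	return 1;
}
```

Because the prefix includes the trailing space, a "deepen-since" line
is not mistaken for a "deepen" line and falls through to the next
helper in the chain.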

Signed-off-by: Brandon Williams 
---
 upload-pack.c | 113 +-
 1 file changed, 74 insertions(+), 39 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 2ad73a98b1..1e8a9e1caf 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -724,6 +724,75 @@ static void deepen_by_rev_list(int ac, const char **av,
packet_flush(1);
 }
 
+static int process_shallow(const char *line, struct object_array *shallows)
+{
+   const char *arg;
+   if (skip_prefix(line, "shallow ", &arg)) {
+   struct object_id oid;
+   struct object *object;
+   if (get_oid_hex(arg, &oid))
+   die("invalid shallow line: %s", line);
+   object = parse_object(&oid);
+   if (!object)
+   return 1;
+   if (object->type != OBJ_COMMIT)
+   die("invalid shallow object %s", oid_to_hex(&oid));
+   if (!(object->flags & CLIENT_SHALLOW)) {
+   object->flags |= CLIENT_SHALLOW;
+   add_object_array(object, NULL, shallows);
+   }
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen(const char *line, int *depth)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen ", &arg)) {
+   char *end = NULL;
+   *depth = (int)strtol(arg, &end, 0);
+   if (!end || *end || *depth <= 0)
+   die("Invalid deepen: %s", line);
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen_since(const char *line, timestamp_t *deepen_since, 
int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-since ", &arg)) {
+   char *end = NULL;
+   *deepen_since = parse_timestamp(arg, &end, 0);
+   if (!end || *end || !*deepen_since ||
+   /* revisions.c's max_age -1 is special */
+   *deepen_since == -1)
+   die("Invalid deepen-since: %s", line);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
+static int process_deepen_not(const char *line, struct string_list 
*deepen_not, int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-not ", &arg)) {
+   char *ref = NULL;
+   struct object_id oid;
+   if (expand_ref(arg, strlen(arg), &oid, &ref) != 1)
+   die("git upload-pack: ambiguous deepen-not: %s", line);
+   string_list_append(deepen_not, ref);
+   free(ref);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
 static void receive_needs(void)
 {
struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -745,49 +814,15 @@ static void receive_needs(void)
if (!line)
break;
 
-   if (skip_prefix(line, "shallow ", &arg)) {
-   struct object_id oid;
-   struct object *object;
-   if (get_oid_hex(arg, &oid))
-   die("invalid shallow line: %s", line);
-   object = parse_object(&oid);
-   if (!object)
-   continue;
-   if (object->type != OBJ_COMMIT)
-   die("invalid shallow object %s", 
oid_to_hex(&oid));
-   if (!(object->flags & CLIENT_SHALLOW)) {
-   object->flags |= CLIENT_SHALLOW;
-   add_object_array(object, NULL, &shallows);
-   }
+   if (process_shallow(line, &shallows))
continue;
-   }
-   if (skip_prefix(line, "deepen ", &arg)) {
-   char *end = NULL;
-   depth = strtol(arg, &end, 0);
-   if (!end || *end || depth <= 0)
-   die("Invalid deepen: %s", line);
+   if (process_deepen(line, &depth))
continue;
-   }
-   if (skip_prefix(line, "deepen-since ", &arg)) {
-   char *end = NULL;
-   deepen_since = parse_timestamp(arg, &end, 0);
-   if (!end || *end || !deepen_since ||
-   /* revisions.c's max_age -1 is special */
-   deepen_since == -1)
-   die("Invalid deepen-since: %s", line);
-   deepen_rev_list = 1;
+   if (proc

[PATCH v5 08/35] connect: discover protocol version outside of get_remote_heads

2018-03-14 Thread Brandon Williams
In order to prepare for the addition of protocol_v2, move the protocol
version discovery out of 'get_remote_heads()'.  This allows the logic
for processing the reference advertisement for protocol_v1 and
protocol_v0 to be kept separate from the logic for protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c | 16 +++-
 builtin/send-pack.c  | 17 +++--
 connect.c| 27 ++-
 connect.h|  3 +++
 remote-curl.c| 20 ++--
 remote.h |  5 +++--
 transport.c  | 24 +++-
 7 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f9..85d4faf76c 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "protocol.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -52,6 +53,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct fetch_pack_args args;
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
+   struct packet_reader reader;
 
packet_trace_identity("fetch-pack");
 
@@ -193,7 +195,19 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
if (!conn)
return args.diag_url ? 0 : 1;
}
-   get_remote_heads(fd[0], NULL, 0, &ref, 0, NULL, &shallow);
+
+   packet_reader_init(&reader, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version(&reader)) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(&reader, &ref, 0, NULL, &shallow);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
ref = fetch_pack(&args, fd, conn, ref, dest, sought, nr_sought,
 &shallow, pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index fc4f0bb5fb..83cb125a68 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -14,6 +14,7 @@
 #include "sha1-array.h"
 #include "gpg-interface.h"
 #include "gettext.h"
+#include "protocol.h"
 
 static const char * const send_pack_usage[] = {
N_("git send-pack [--all | --mirror] [--dry-run] [--force] "
@@ -154,6 +155,7 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
int progress = -1;
int from_stdin = 0;
struct push_cas_option cas = {0};
+   struct packet_reader reader;
 
struct option options[] = {
OPT__VERBOSITY(&verbose),
@@ -256,8 +258,19 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
args.verbose ? CONNECT_VERBOSE : 0);
}
 
-   get_remote_heads(fd[0], NULL, 0, &remote_refs, REF_NORMAL,
-&extra_have, &shallow);
+   packet_reader_init(&reader, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version(&reader)) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(&reader, &remote_refs, REF_NORMAL,
+&extra_have, &shallow);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
transport_verify_remote_names(nr_refspecs, refspecs);
 
diff --git a/connect.c b/connect.c
index c82c90b7c3..0b111e62d7 100644
--- a/connect.c
+++ b/connect.c
@@ -62,7 +62,7 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
-static enum protocol_version discover_version(struct packet_reader *reader)
+enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
 
@@ -233,7 +233,7 @@ enum get_remote_heads_state {
 /*
  * Read all the refs from the other end
  */
-struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+struct ref **get_remote_heads(struct packet_reader *reader,
  struct ref **list, unsigned int flags,
  struct oid_array *extra_have,
  struct oid_array *shallow_points)
@@ -241,24 +241,17 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
struct ref **orig_list = list;
int len = 0;
enum get_remote_heads_state state = EXPECTING_FIRST_REF;
-   struct packet_reader reader;
const char *arg;
 
-   packet_reader_init(&reader, in, src_buf, src_len,
-  PACKET_READ_CHOMP_NEWLINE |
-

[PATCH v5 09/35] transport: store protocol version

2018-03-14 Thread Brandon Williams
Once protocol_v2 is introduced, requesting a fetch or a push will need to
be handled differently depending on the protocol version.  Store the
protocol version the server is speaking in 'struct git_transport_data'
and use it to determine what to do in the case of a fetch or a push.
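
The idea of discovering the version once and dispatching on the stored
value later can be sketched as below. All names are hypothetical, and
the discovery rule is a simplification assumed for illustration: a
first line of "version 1" selects v1, any other "version N" is treated
as unknown, and anything else is taken to be a v0 ref advertisement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum proto_sketch { PROTO_UNKNOWN = -1, PROTO_V0 = 0, PROTO_V1 = 1 };

struct conn_sketch {
	enum proto_sketch version; /* remembered after discovery */
};

/* Simplified discovery based on the first line sent by the server. */
static enum proto_sketch discover_sketch(const char *first_line)
{
	assert(first_line);
	if (!strncmp(first_line, "version ", 8)) {
		if (!strcmp(first_line + 8, "1"))
			return PROTO_V1;
		return PROTO_UNKNOWN;
	}
	return PROTO_V0;
}

/* Later operations switch on the stored version, not on the wire. */
static const char *fetch_path_sketch(const struct conn_sketch *c)
{
	switch (c->version) {
	case PROTO_V1:
	case PROTO_V0:
		return "fetch_pack";
	case PROTO_UNKNOWN:
		break;
	}
	return NULL; /* the real code reports a BUG here */
}
```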

Signed-off-by: Brandon Williams 
---
 transport.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/transport.c b/transport.c
index 63c3dbab94..2378dcb38c 100644
--- a/transport.c
+++ b/transport.c
@@ -118,6 +118,7 @@ struct git_transport_data {
struct child_process *conn;
int fd[2];
unsigned got_remote_heads : 1;
+   enum protocol_version version;
struct oid_array extra_have;
struct oid_array shallow;
 };
@@ -200,7 +201,8 @@ static struct ref *get_refs_via_connect(struct transport 
*transport, int for_pus
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version(&reader)) {
+   data->version = discover_version(&reader);
+   switch (data->version) {
case protocol_v1:
case protocol_v0:
get_remote_heads(&reader, &refs,
@@ -221,7 +223,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 {
int ret = 0;
struct git_transport_data *data = transport->data;
-   struct ref *refs;
+   struct ref *refs = NULL;
char *dest = xstrdup(transport->url);
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
@@ -247,10 +249,18 @@ static int fetch_refs_via_pack(struct transport 
*transport,
if (!data->got_remote_heads)
refs_tmp = get_refs_via_connect(transport, 0);
 
-   refs = fetch_pack(&args, data->fd, data->conn,
- refs_tmp ? refs_tmp : transport->remote_refs,
- dest, to_fetch, nr_heads, &data->shallow,
- &transport->pack_lockfile);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   refs = fetch_pack(&args, data->fd, data->conn,
+ refs_tmp ? refs_tmp : transport->remote_refs,
+ dest, to_fetch, nr_heads, &data->shallow,
+ &transport->pack_lockfile);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
close(data->fd[0]);
close(data->fd[1]);
if (finish_connect(data->conn))
@@ -549,7 +559,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
 {
struct git_transport_data *data = transport->data;
struct send_pack_args args;
-   int ret;
+   int ret = 0;
 
if (!data->got_remote_heads)
get_refs_via_connect(transport, 1);
@@ -574,8 +584,15 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
else
args.push_cert = SEND_PACK_PUSH_CERT_NEVER;
 
-   ret = send_pack(&args, data->fd, data->conn, remote_refs,
-   &data->extra_have);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   ret = send_pack(&args, data->fd, data->conn, remote_refs,
+   &data->extra_have);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
close(data->fd[1]);
close(data->fd[0]);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 16/35] transport: convert transport_get_remote_refs to take a list of ref prefixes

2018-03-14 Thread Brandon Williams
Teach transport_get_remote_refs() to accept a list of ref prefixes,
which will be sent to the server for use in filtering when using
protocol v2. (This list will be ignored when not using protocol v2.)
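
Since the prefixes are only a hint to the server, a caller still has
to filter the refs it gets back. A minimal sketch of the prefix match,
assuming a plain array of strings in place of git's argv_array (the
function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if refname starts with any of the given prefixes.
 * An empty or missing prefix list matches everything, mirroring
 * the "no filtering requested" case.
 */
static int ref_matches_prefixes_sketch(const char *refname,
					const char **prefixes, size_t nr)
{
	size_t i;

	assert(refname);
	if (!prefixes || !nr)
		return 1;
	for (i = 0; i < nr; i++) {
		if (!strncmp(refname, prefixes[i], strlen(prefixes[i])))
			return 1;
	}
	return 0;
}
```

A caller talking to a server that ignores the prefixes can run the
same check on the returned list, so the result is the same either way
and only the amount of data transferred differs.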

Signed-off-by: Brandon Williams 
---
 builtin/clone.c |  2 +-
 builtin/fetch.c |  4 ++--
 builtin/ls-remote.c |  2 +-
 builtin/remote.c|  2 +-
 transport.c |  7 +--
 transport.h | 12 +++-
 6 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 284651797e..6e77d993fa 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1121,7 +1121,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !deepen)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport);
+   refs = transport_get_remote_refs(transport, NULL);
 
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7bbcd26faf..850382f559 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -250,7 +250,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, &existing_refs);
-   for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
+   for (ref = transport_get_remote_refs(transport, NULL); ref; ref = 
ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -336,7 +336,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = &orefs;
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport);
+   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
 
if (refspec_count) {
struct refspec *fetch_refspec;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c4be98ab9e..c6e9847c5c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -96,7 +96,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport);
+   ref = transport_get_remote_refs(transport, NULL);
if (transport_disconnect(transport))
return 1;
 
diff --git a/builtin/remote.c b/builtin/remote.c
index d95bf904c3..d0b6ff6e29 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -862,7 +862,7 @@ static int get_remote_ref_states(const char *name,
if (query) {
transport = transport_get(states->remote, 
states->remote->url_nr > 0 ?
states->remote->url[0] : NULL);
-   remote_refs = transport_get_remote_refs(transport);
+   remote_refs = transport_get_remote_refs(transport, NULL);
transport_disconnect(transport);
 
states->queried = 1;
diff --git a/transport.c b/transport.c
index 2e68010dd0..3f130518d2 100644
--- a/transport.c
+++ b/transport.c
@@ -1138,10 +1138,13 @@ int transport_push(struct transport *transport,
return 1;
 }
 
-const struct ref *transport_get_remote_refs(struct transport *transport)
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_prefixes)
 {
if (!transport->got_remote_refs) {
-   transport->remote_refs = 
transport->vtable->get_refs_list(transport, 0, NULL);
+   transport->remote_refs =
+   transport->vtable->get_refs_list(transport, 0,
+ref_prefixes);
transport->got_remote_refs = 1;
}
 
diff --git a/transport.h b/transport.h
index 731c78b679..83992a4257 100644
--- a/transport.h
+++ b/transport.h
@@ -178,7 +178,17 @@ int transport_push(struct transport *connection,
   int refspec_nr, const char **refspec, int flags,
   unsigned int * reject_reasons);
 
-const struct ref *transport_get_remote_refs(struct transport *transport);
+/*
+ * Retrieve refs from a remote.
+ *
+ * Optionally a list of ref prefixes can be provided which can be sent to the
+ * server (when communicating using protocol v2) to enable it to limit the ref
+ * advertisement.  Since ref filtering is done on the server's end (and only
+ * when using protocol v2), this can return refs which don't match the provided
+ * ref_prefixes.
+ */
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_prefixes);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_p

[PATCH v5 15/35] transport: convert get_refs_list to take a list of ref prefixes

2018-03-14 Thread Brandon Williams
Convert the 'struct transport' virtual function 'get_refs_list()' to
optionally take an argv_array of ref prefixes.  When communicating with
a server using protocol v2, these ref prefixes can be sent when
requesting a listing of its refs, allowing the server to filter the
refs it sends based on those prefixes.  This list is ignored when not
using protocol v2.

Signed-off-by: Brandon Williams 
---
 transport-helper.c   |  5 +++--
 transport-internal.h | 11 ++-
 transport.c  | 18 +++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 5080150231..8774ab3013 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1026,7 +1026,8 @@ static int has_attribute(const char *attrs, const char 
*attr) {
}
 }
 
-static struct ref *get_refs_list(struct transport *transport, int for_push)
+static struct ref *get_refs_list(struct transport *transport, int for_push,
+const struct argv_array *ref_prefixes)
 {
struct helper_data *data = transport->data;
struct child_process *helper;
@@ -1039,7 +1040,7 @@ static struct ref *get_refs_list(struct transport 
*transport, int for_push)
 
if (process_connect(transport, for_push)) {
do_take_over(transport);
-   return transport->vtable->get_refs_list(transport, for_push);
+   return transport->vtable->get_refs_list(transport, for_push, 
ref_prefixes);
}
 
if (data->push && for_push)
diff --git a/transport-internal.h b/transport-internal.h
index 3c1a29d727..1cde6258a7 100644
--- a/transport-internal.h
+++ b/transport-internal.h
@@ -3,6 +3,7 @@
 
 struct ref;
 struct transport;
+struct argv_array;
 
 struct transport_vtable {
/**
@@ -17,11 +18,19 @@ struct transport_vtable {
 * the transport to try to share connections, for_push is a
 * hint as to whether the ultimate operation is a push or a fetch.
 *
+* If communicating using protocol v2 a list of prefixes can be
+* provided to be sent to the server to enable it to limit the ref
+* advertisement.  Since ref filtering is done on the server's end, and
+* only when using protocol v2, this list will be ignored when not
+* using protocol v2 meaning this function can return refs which don't
+* match the provided ref_prefixes.
+*
 * If the transport is able to determine the remote hash for
 * the ref without a huge amount of effort, it should store it
 * in the ref's old_sha1 field; otherwise it should be all 0.
 **/
-   struct ref *(*get_refs_list)(struct transport *transport, int for_push);
+   struct ref *(*get_refs_list)(struct transport *transport, int for_push,
+const struct argv_array *ref_prefixes);
 
/**
 * Fetch the objects for the given refs. Note that this gets
diff --git a/transport.c b/transport.c
index ffc6b2614f..2e68010dd0 100644
--- a/transport.c
+++ b/transport.c
@@ -72,7 +72,9 @@ struct bundle_transport_data {
struct bundle_header header;
 };
 
-static struct ref *get_refs_from_bundle(struct transport *transport, int for_push)
+static struct ref *get_refs_from_bundle(struct transport *transport,
+   int for_push,
+   const struct argv_array *ref_prefixes)
 {
struct bundle_transport_data *data = transport->data;
struct ref *result = NULL;
@@ -189,7 +191,8 @@ static int connect_setup(struct transport *transport, int for_push)
return 0;
 }
 
-static struct ref *get_refs_via_connect(struct transport *transport, int for_push)
+static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
+   const struct argv_array *ref_prefixes)
 {
struct git_transport_data *data = transport->data;
struct ref *refs = NULL;
@@ -204,7 +207,8 @@ static struct ref *get_refs_via_connect(struct transport *transport, int for_pus
data->version = discover_version(&reader);
switch (data->version) {
case protocol_v2:
-   get_remote_refs(data->fd[1], &reader, &refs, for_push, NULL);
+   get_remote_refs(data->fd[1], &reader, &refs, for_push,
+   ref_prefixes);
break;
case protocol_v1:
case protocol_v0:
@@ -250,7 +254,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.update_shallow = data->options.update_shallow;
 
if (!data->got_remote_heads)
-   refs_tmp = get_refs_via_connect(transport, 0);
+   refs_tmp = get_refs_via_connect(transport, 0, NULL);
 
switch (data->version) {
case protocol_v2:
@@ -568,7 +572,7 @@ static int git_transport_push(struct transport *transport, struct ref *remote_re
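
The commit message above says the server can filter its ref advertisement by the prefixes the client sends. A minimal sketch of that kind of prefix matching follows. The helper name `ref_matches_prefixes` and the rule that an empty list means "no filtering" are assumptions made for illustration; they are not taken from this patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return 1 if refname
 * starts with at least one of the n prefixes.  An empty prefix list
 * is treated as "no filtering", so every ref matches (assumption).
 */
static int ref_matches_prefixes(const char *refname,
				const char *const *prefixes, size_t n)
{
	size_t i;

	if (n == 0)
		return 1;
	for (i = 0; i < n; i++) {
		/* Compare only the first strlen(prefix) bytes of refname. */
		if (!strncmp(refname, prefixes[i], strlen(prefixes[i])))
			return 1;
	}
	return 0;
}
```

With prefixes "refs/heads/" and "refs/tags/v2", a ref such as "refs/heads/master" would be advertised while "refs/notes/commits" would be left out. A client talking to a non-v2 server still has to cope with unfiltered refs, as the vtable comment in the patch points out.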
 

[PATCH v5 24/35] connect: don't request v2 when pushing

2018-03-14 Thread Brandon Williams
In order to be able to ship protocol v2 with only supporting fetch, we
need clients to not issue a request to use protocol v2 when pushing
(since the client currently doesn't know how to push using protocol v2).
This allows a client to have protocol v2 configured in
`protocol.version` and take advantage of using v2 for fetch and falling
back to using v0 when pushing while v2 for push is being designed.

We could run into issues if we didn't fall back to protocol v0 when
pushing right now.  This is because currently a server will ignore a request to
use v2 when contacting the 'receive-pack' endpoint and fall back to
using v0, but when push v2 is rolled out to servers, the 'receive-pack'
endpoint will start responding using v2.  So we don't want to get into a
state where a client is requesting to push with v2 before they actually
know how to push using v2.

Signed-off-by: Brandon Williams 
---
 connect.c  |  8 
 t/t5702-protocol-v2.sh | 24 
 2 files changed, 32 insertions(+)

diff --git a/connect.c b/connect.c
index a57a060dc4..54971166ac 100644
--- a/connect.c
+++ b/connect.c
@@ -1218,6 +1218,14 @@ struct child_process *git_connect(int fd[2], const char *url,
enum protocol protocol;
enum protocol_version version = get_protocol_version_config();
 
+   /*
+* NEEDSWORK: If we are trying to use protocol v2 and we are planning
+* to perform a push, then fall back to v0 since the client doesn't yet
+* know how to push using v2.
+*/
+   if (version == protocol_v2 && !strcmp("git-receive-pack", prog))
+   version = protocol_v0;
+
/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
 */
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 4365ac2736..e3a7c09d4a 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -95,6 +95,30 @@ test_expect_success 'pull with git:// using protocol v2' '
grep "fetch< version 2" log
 '
 
+test_expect_success 'push with git:// and a config of v2 does not request v2' '
+   test_when_finished "rm -f log" &&
+
+   # Till v2 for push is designed, make sure that if a client has
+   # protocol.version configured to use v2, that the client instead falls
+   # back and uses v0.
+
+   test_commit -C daemon_child three &&
+
+   # Push to another branch, as the target repository has the
+   # master branch checked out and we cannot push into it.
+   GIT_TRACE_PACKET="$(pwd)/log" git -C daemon_child -c protocol.version=2 \
+   push origin HEAD:client_branch &&
+
+   git -C daemon_child log -1 --format=%s >actual &&
+   git -C "$daemon_parent" log -1 --format=%s client_branch >expect &&
+   test_cmp expect actual &&
+
+   # Client did not request to use protocol v2
+   ! grep "push> .*\\\0\\\0version=2\\\0$" log &&
+   # Server did not respond using protocol v2
+   ! grep "push< version 2" log
+'
+
 stop_git_daemon
 
 # Test protocol v2 with 'file://' transport
-- 
2.16.2.804.g6dcf76e118-goog
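
The fallback in the patch above can be sketched in isolation as a small pure function. The enum below is a stand-in for git's `enum protocol_version`, and `version_for_service` is a hypothetical name chosen for this sketch; the patch itself performs the check inline in `git_connect()`.

```c
#include <string.h>

/* Stand-in for git's enum protocol_version (values are illustrative). */
enum protocol_version {
	protocol_v0 = 0,
	protocol_v1 = 1,
	protocol_v2 = 2
};

/*
 * Hypothetical helper: pick the version to request for a service
 * program, downgrading v2 to v0 when the service is the push endpoint.
 */
static enum protocol_version version_for_service(enum protocol_version configured,
						 const char *prog)
{
	if (configured == protocol_v2 && !strcmp("git-receive-pack", prog))
		return protocol_v0;
	return configured;
}
```

Only the combination of v2 and 'git-receive-pack' is rewritten, so a fetch with `protocol.version=2` still requests v2 and a configured v1 is passed through unchanged.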



[PATCH v5 20/35] upload-pack: introduce fetch server command

2018-03-14 Thread Brandon Williams
Introduce the 'fetch' server command.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt | 127 +++
 serve.c |   2 +
 t/t5701-git-serve.sh|   1 +
 upload-pack.c   | 266 
 upload-pack.h   |   6 +
 5 files changed, 402 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 422edf870e..9ce91d7213 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -203,3 +203,130 @@ The output of ls-refs is as follows:
 ref-attribute = (symref | peeled)
 symref = "symref-target:" symref-target
 peeled = "peeled:" obj-id
+
+ fetch
+---
+
+`fetch` is the command used to fetch a packfile in v2.  It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "<command>=<feature 1> <feature 2>"
+
+A `fetch` request can take the following arguments:
+
+want <oid>
+   Indicates to the server an object which the client wants to
+   retrieve.  Wants can be anything and are not limited to
+   advertised objects.
+
+have <oid>
+   Indicates to the server an object which the client has locally.
+   This allows the server to make a packfile which only contains
+   the objects that the client needs. Multiple 'have' lines can be
+   supplied.
+
+done
+   Indicates to the server that negotiation should terminate (or
+   not even begin if performing a clone) and that the server should
+   use the information supplied in the request to construct the
+   packfile.
+
+thin-pack
+   Request that a thin pack be sent, which is a pack with deltas
+   which reference base objects not contained within the pack (but
+   are known to exist at the receiving end). This can reduce the
+   network traffic significantly, but it requires the receiving end
+   to know how to "thicken" these packs by adding the missing bases
+   to the pack.
+
+no-progress
+   Request that progress information that would normally be sent on
+   side-band channel 2, during the packfile transfer, should not be
+   sent.  However, the side-band channel 3 is still used for error
+   responses.
+
+include-tag
+   Request that annotated tags should be sent if the objects they
+   point to are being sent.
+
+ofs-delta
+   Indicate that the client understands PACKv2 with delta referring
+   to its base by position in pack rather than by an oid.  That is,
+   they can read OBJ_OFS_DELTA (aka type 6) in a packfile.
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header.
+
+output = *section
+section = (acknowledgments | packfile)
+ (flush-pkt | delim-pkt)
+
+acknowledgments = PKT-LINE("acknowledgments" LF)
+ (nak | *ack)
+ (ready)
+ready = PKT-LINE("ready" LF)
+nak = PKT-LINE("NAK" LF)
+ack = PKT-LINE("ACK" SP obj-id LF)
+
+packfile = PKT-LINE("packfile" LF)
+  *PKT-LINE(%x01-03 *%x00-ff)
+
+
+acknowledgments section
+   * If the client determines that it is finished with negotiations
+ by sending a "done" line, the acknowledgments sections MUST be
+ omitted from the server's response.
+
+   * Always begins with the section header "acknowledgments"
+
+   * The server will respond with "NAK" if none of the object ids sent
+ as have lines were common.
+
+   * The server will respond with "ACK obj-id" for all of the
+ object ids sent as have lines which are common.
+
+   * A response cannot have both "ACK" lines as well as a "NAK"
+ line.
+
+   * The server will respond with a "ready" line indicating that
+ the server has found an acceptable common base and is ready to
+ make and send a packfile (which will be found in the packfile
+ section of the same response)
+
+   * If the server has found a suitable cut point and has decided
+ to send a "ready" line, then the server can decide to (as an
+ optimization) omit any "ACK" lines it would have sent during
+ its response.  This is because the server will have already
+ determined the objects it plans to send to the client and no
+ further negotiation is needed.
+
+
+packfile section
+   * This section is only included if the client has sent 'want'
+   
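
The acknowledgments grammar quoted above ("NAK", "ACK" SP obj-id, "ready") can be illustrated with a small classifier. This is only a sketch: it assumes the pkt-line length header and the trailing LF have already been stripped from the payload, and the names `enum ack_line` and `classify_ack_line` are invented here rather than taken from upload-pack.c.

```c
#include <string.h>

enum ack_line {
	ACK_LINE_NAK,
	ACK_LINE_ACK,
	ACK_LINE_READY,
	ACK_LINE_UNKNOWN
};

/*
 * Hypothetical helper: classify one payload line from the
 * acknowledgments section.  For "ACK <obj-id>" lines, *oid is set to
 * point at the object id inside line; otherwise it is set to NULL.
 */
static enum ack_line classify_ack_line(const char *line, const char **oid)
{
	*oid = NULL;
	if (!strcmp(line, "NAK"))
		return ACK_LINE_NAK;
	if (!strcmp(line, "ready"))
		return ACK_LINE_READY;
	/* "ACK" must be followed by a space and a non-empty object id. */
	if (!strncmp(line, "ACK ", 4) && line[4] != '\0') {
		*oid = line + 4;
		return ACK_LINE_ACK;
	}
	return ACK_LINE_UNKNOWN;
}
```

A client loop built on this would stop negotiating once it sees ACK_LINE_READY, since the packfile section follows in the same response.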

[PATCH v5 30/35] remote-curl: store the protocol version the server responded with

2018-03-14 Thread Brandon Williams
Store the protocol version the server responded with when performing
discovery.  This will be used in a future patch to either change the
'Git-Protocol' header sent in subsequent requests or to determine if a
client needs to fall back to using a different protocol version.

Signed-off-by: Brandon Williams 
---
 remote-curl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 4086aa733b..c540358438 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -171,6 +171,7 @@ struct discovery {
size_t len;
struct ref *refs;
struct oid_array shallow;
+   enum protocol_version version;
unsigned proto_git : 1;
 };
 static struct discovery *last_discovery;
@@ -184,7 +185,8 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version(&reader)) {
+   heads->version = discover_version(&reader);
+   switch (heads->version) {
case protocol_v2:
die("support for protocol v2 not implemented yet");
break;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 25/35] transport-helper: remove name parameter

2018-03-14 Thread Brandon Williams
Commit 266f1fdfa (transport-helper: be quiet on read errors from
helpers, 2013-06-21) removed a call to 'die()' which printed the name of
the remote helper passed in to the 'recvline_fh()' function using the
'name' parameter.  Once the call to 'die()' was removed the parameter
was no longer necessary but wasn't removed.  Clean up the 'recvline_fh()'
parameter list by removing the 'name' parameter.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 8774ab3013..9677ead426 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -49,7 +49,7 @@ static void sendline(struct helper_data *helper, struct strbuf *buffer)
die_errno("Full write to remote helper failed");
 }
 
-static int recvline_fh(FILE *helper, struct strbuf *buffer, const char *name)
+static int recvline_fh(FILE *helper, struct strbuf *buffer)
 {
strbuf_reset(buffer);
if (debug)
@@ -67,7 +67,7 @@ static int recvline_fh(FILE *helper, struct strbuf *buffer, const char *name)
 
 static int recvline(struct helper_data *helper, struct strbuf *buffer)
 {
-   return recvline_fh(helper->out, buffer, helper->name);
+   return recvline_fh(helper->out, buffer);
 }
 
 static void write_constant(int fd, const char *str)
@@ -586,7 +586,7 @@ static int process_connect_service(struct transport *transport,
goto exit;
 
sendline(data, &cmdbuf);
-   if (recvline_fh(input, &cmdbuf, name))
+   if (recvline_fh(input, &cmdbuf))
exit(128);
 
if (!strcmp(cmdbuf.buf, "")) {
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 23/35] connect: refactor git_connect to only get the protocol version once

2018-03-14 Thread Brandon Williams
Instead of having each builtin transport ask which protocol
version the user has configured in 'protocol.version' by calling
`get_protocol_version_config()` multiple times, factor this logic out
so there is just a single call at the beginning of `git_connect()`.

This will be helpful in the next patch where we can have centralized
logic which determines if we need to request a different protocol
version than what the user has configured.

Signed-off-by: Brandon Williams 
---
 connect.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/connect.c b/connect.c
index 5bb9d34844..a57a060dc4 100644
--- a/connect.c
+++ b/connect.c
@@ -1035,6 +1035,7 @@ static enum ssh_variant determine_ssh_variant(const char *ssh_command,
  */
 static struct child_process *git_connect_git(int fd[2], char *hostandport,
 const char *path, const char *prog,
+enum protocol_version version,
 int flags)
 {
struct child_process *conn;
@@ -1073,10 +1074,10 @@ static struct child_process *git_connect_git(int fd[2], char *hostandport,
target_host, 0);
 
/* If using a new version put that stuff here after a second null byte */
-   if (get_protocol_version_config() > 0) {
+   if (version > 0) {
strbuf_addch(&request, '\0');
strbuf_addf(&request, "version=%d%c",
-   get_protocol_version_config(), '\0');
+   version, '\0');
}
 
packet_write(fd[1], request.buf, request.len);
@@ -1092,14 +1093,14 @@ static struct child_process *git_connect_git(int fd[2], char *hostandport,
  */
 static void push_ssh_options(struct argv_array *args, struct argv_array *env,
 enum ssh_variant variant, const char *port,
-int flags)
+enum protocol_version version, int flags)
 {
if (variant == VARIANT_SSH &&
-   get_protocol_version_config() > 0) {
+   version > 0) {
argv_array_push(args, "-o");
argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
-get_protocol_version_config());
+version);
}
 
if (flags & CONNECT_IPV4) {
@@ -1152,7 +1153,8 @@ static void push_ssh_options(struct argv_array *args, struct argv_array *env,
 
 /* Prepare a child_process for use by Git's SSH-tunneled transport. */
 static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
- const char *port, int flags)
+ const char *port, enum protocol_version version,
+ int flags)
 {
const char *ssh;
enum ssh_variant variant;
@@ -1186,14 +1188,14 @@ static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
argv_array_push(&detect.args, ssh);
argv_array_push(&detect.args, "-G");
push_ssh_options(&detect.args, &detect.env_array,
-VARIANT_SSH, port, flags);
+VARIANT_SSH, port, version, flags);
argv_array_push(&detect.args, ssh_host);
 
variant = run_command(&detect) ? VARIANT_SIMPLE : VARIANT_SSH;
}
 
argv_array_push(&conn->args, ssh);
-   push_ssh_options(&conn->args, &conn->env_array, variant, port, flags);
+   push_ssh_options(&conn->args, &conn->env_array, variant, port, version, flags);
argv_array_push(&conn->args, ssh_host);
 }
 
@@ -1214,6 +1216,7 @@ struct child_process *git_connect(int fd[2], const char *url,
char *hostandport, *path;
struct child_process *conn;
enum protocol protocol;
+   enum protocol_version version = get_protocol_version_config();
 
/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
@@ -1228,7 +1231,7 @@ struct child_process *git_connect(int fd[2], const char *url,
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
} else if (protocol == PROTO_GIT) {
-   conn = git_connect_git(fd, hostandport, path, prog, flags);
+   conn = git_connect_git(fd, hostandport, path, prog, version, flags);
} else {
struct strbuf cmd = STRBUF_INIT;
const char *const *var;
@@ -1271,12 +1274,12 @@ struct child_process *git_connect(int fd[2], const char *url,
strbuf_release(&cmd);
return NULL;
}
-   fill_ssh_args(conn, ssh_host, port, flags);
+   fill_ssh_args(conn
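
The `git_connect_git()` hunk above appends the version after a second NUL byte in the git:// request. A sketch of just that step on a plain byte buffer follows. The function name `append_version_param` and the convention of returning 0 on overflow are assumptions for illustration; git itself uses `struct strbuf` here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: buf[0..len) already holds a request that ends
 * in a NUL byte (after the "host=" parameter).  For version > 0,
 * append an extra NUL followed by "version=N" and a terminating NUL.
 * Returns the new length, len unchanged for version <= 0, or 0 if the
 * result would not fit in cap bytes.
 */
static size_t append_version_param(char *buf, size_t len, size_t cap,
				   int version)
{
	int n;

	if (version <= 0)
		return len;
	if (len >= cap)
		return 0;
	buf[len++] = '\0';	/* the "second null byte" */
	n = snprintf(buf + len, cap - len, "version=%d", version);
	if (n < 0 || (size_t)n >= cap - len)
		return 0;
	/* snprintf's terminating NUL doubles as the trailing NUL byte. */
	return len + (size_t)n + 1;
}
```

For version 2 this adds the 11 bytes "\0version=2\0", which is the pattern the t5702 tests look for in the packet trace.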

[PATCH v5 34/35] remote-curl: implement stateless-connect command

2018-03-14 Thread Brandon Williams
Teach remote-curl the 'stateless-connect' command which is used to
establish a stateless connection with servers which support protocol
version 2.  This allows remote-curl to act as a proxy, allowing the git
client to communicate natively with a remote end, simply using
remote-curl as a pass through to convert requests to http.

Signed-off-by: Brandon Williams 
---
 remote-curl.c  | 207 -
 t/t5702-protocol-v2.sh |  45 +
 2 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 66a53f74bb..87f5b77b29 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -188,7 +188,12 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
heads->version = discover_version(&reader);
switch (heads->version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   /*
+* Do nothing.  This isn't a list of refs but rather a
+* capability advertisement.  Client would have run
+* 'stateless-connect' so we'll dump this capability listing
+* and let them request the refs themselves.
+*/
break;
case protocol_v1:
case protocol_v0:
@@ -1085,6 +1090,202 @@ static void parse_push(struct strbuf *buf)
free(specs);
 }
 
+/*
+ * Used to represent the state of a connection to an HTTP server when
+ * communicating using git's wire-protocol version 2.
+ */
+struct proxy_state {
+   char *service_name;
+   char *service_url;
+   struct curl_slist *headers;
+   struct strbuf request_buffer;
+   int in;
+   int out;
+   struct packet_reader reader;
+   size_t pos;
+   int seen_flush;
+};
+
+static void proxy_state_init(struct proxy_state *p, const char *service_name,
+enum protocol_version version)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   memset(p, 0, sizeof(*p));
+   p->service_name = xstrdup(service_name);
+
+   p->in = 0;
+   p->out = 1;
+   strbuf_init(&p->request_buffer, 0);
+
+   strbuf_addf(&buf, "%s%s", url.buf, p->service_name);
+   p->service_url = strbuf_detach(&buf, NULL);
+
+   p->headers = http_copy_default_headers();
+
+   strbuf_addf(&buf, "Content-Type: application/x-%s-request", p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+   strbuf_reset(&buf);
+
+   strbuf_addf(&buf, "Accept: application/x-%s-result", p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+   strbuf_reset(&buf);
+
+   p->headers = curl_slist_append(p->headers, "Transfer-Encoding: chunked");
+
+   /* Add the Git-Protocol header */
+   if (get_protocol_http_header(version, &buf))
+   p->headers = curl_slist_append(p->headers, buf.buf);
+
+   packet_reader_init(&p->reader, p->in, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF);
+
+   strbuf_release(&buf);
+}
+
+static void proxy_state_clear(struct proxy_state *p)
+{
+   free(p->service_name);
+   free(p->service_url);
+   curl_slist_free_all(p->headers);
+   strbuf_release(&p->request_buffer);
+}
+
+/*
+ * CURLOPT_READFUNCTION callback function.
+ * Attempts to copy over a single packet-line at a time into the
+ * curl provided buffer.
+ */
+static size_t proxy_in(char *buffer, size_t eltsize,
+  size_t nmemb, void *userdata)
+{
+   size_t max;
+   struct proxy_state *p = userdata;
+   size_t avail = p->request_buffer.len - p->pos;
+
+
+   if (eltsize != 1)
+   BUG("curl read callback called with size = %"PRIuMAX" != 1",
+   (uintmax_t)eltsize);
+   max = nmemb;
+
+   if (!avail) {
+   if (p->seen_flush) {
+   p->seen_flush = 0;
+   return 0;
+   }
+
+   strbuf_reset(&p->request_buffer);
+   switch (packet_reader_read(&p->reader)) {
+   case PACKET_READ_EOF:
+   die("unexpected EOF when reading from parent process");
+   case PACKET_READ_NORMAL:
+   packet_buf_write_len(&p->request_buffer, p->reader.line,
+p->reader.pktlen);
+   break;
+   case PACKET_READ_DELIM:
+   packet_buf_delim(&p->request_buffer);
+   break;
+   case PACKET_READ_FLUSH:
+   packet_buf_flush(&p->request_buffer);
+   p->seen_flush = 1;
+   break;
+   }
+   p->pos = 0;
+   avail = p->request_buffer.len;
+   }
+
+   if (max < avail)
+   avail = max;
+   memcpy(buffer, p->request_buffer.buf + p->pos, avail);
+   p->pos += avail;
+   r
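
The tail of `proxy_in()` above hands curl at most `nmemb` bytes per call and remembers how far it got. That bookkeeping can be sketched without libcurl as follows. `struct chunk_source` and `copy_chunk` are hypothetical names, and the refill step that reads the next packet-line is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request_buffer/pos pair in proxy_state. */
struct chunk_source {
	const char *buf;	/* bytes waiting to be sent */
	size_t len;		/* total number of bytes in buf */
	size_t pos;		/* number of bytes already handed out */
};

/*
 * Copy up to max bytes of the remaining data into dst and advance the
 * read position.  Returns the number of bytes copied; 0 means the
 * source is exhausted.
 */
static size_t copy_chunk(char *dst, size_t max, struct chunk_source *src)
{
	size_t avail = src->len - src->pos;

	if (max < avail)
		avail = max;
	memcpy(dst, src->buf + src->pos, avail);
	src->pos += avail;
	return avail;
}
```

Called repeatedly with a 4-byte destination on the 9-byte packet "0009done\n", this yields 4, 4, 1 and then 0 bytes. In a read callback of this shape a return value of 0 signals the end of the request body, which is why the patch only returns 0 after a flush packet has been seen.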

[PATCH v5 35/35] remote-curl: don't request v2 when pushing

2018-03-14 Thread Brandon Williams
In order to be able to ship protocol v2 with only supporting fetch, we
need clients to not issue a request to use protocol v2 when pushing
(since the client currently doesn't know how to push using protocol v2).
This allows a client to have protocol v2 configured in
`protocol.version` and take advantage of using v2 for fetch and falling
back to using v0 when pushing while v2 for push is being designed.

We could run into issues if we didn't fall back to protocol v0 when
pushing right now.  This is because currently a server will ignore a request to
use v2 when contacting the 'receive-pack' endpoint and fall back to
using v0, but when push v2 is rolled out to servers, the 'receive-pack'
endpoint will start responding using v2.  So we don't want to get into a
state where a client is requesting to push with v2 before they actually
know how to push using v2.

Signed-off-by: Brandon Williams 
---
 remote-curl.c  | 11 ++-
 t/t5702-protocol-v2.sh | 24 
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 87f5b77b29..595447b16e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -322,6 +322,7 @@ static struct discovery *discover_refs(const char *service, int for_push)
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
+   enum protocol_version version = get_protocol_version_config();
 
if (last && !strcmp(service, last->service))
return last;
@@ -338,8 +339,16 @@ static struct discovery *discover_refs(const char *service, int for_push)
strbuf_addf(&refs_url, "service=%s", service);
}
 
+   /*
+* NEEDSWORK: If we are trying to use protocol v2 and we are planning
+* to perform a push, then fall back to v0 since the client doesn't yet
+* know how to push using v2.
+*/
+   if (version == protocol_v2 && !strcmp("git-receive-pack", service))
+   version = protocol_v0;
+
/* Add the extra Git-Protocol header */
-   if (get_protocol_http_header(get_protocol_version_config(), &protocol_header))
+   if (get_protocol_http_header(version, &protocol_header))
string_list_append(&extra_headers, protocol_header.buf);
 
memset(&http_options, 0, sizeof(http_options));
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 124063c2c4..56f7c3c326 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -244,6 +244,30 @@ test_expect_success 'fetch with http:// using protocol v2' '
grep "git< version 2" log
 '
 
+test_expect_success 'push with http:// and a config of v2 does not request v2' '
+   test_when_finished "rm -f log" &&
+   # Till v2 for push is designed, make sure that if a client has
+   # protocol.version configured to use v2, that the client instead falls
+   # back and uses v0.
+
+   test_commit -C http_child three &&
+
+   # Push to another branch, as the target repository has the
+   # master branch checked out and we cannot push into it.
+   GIT_TRACE_PACKET="$(pwd)/log" git -C http_child -c protocol.version=2 \
+   push origin HEAD:client_branch &&
+
+   git -C http_child log -1 --format=%s >actual &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s client_branch >expect &&
+   test_cmp expect actual &&
+
+   # Client did not request to use protocol v2
+   ! grep "Git-Protocol: version=2" log &&
+   # Server did not respond using protocol v2
+   ! grep "git< version 2" log
+'
+
+
 stop_httpd
 
 test_done
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 33/35] http: eliminate "# service" line when using protocol v2

2018-03-14 Thread Brandon Williams
When an http info/refs request is made, requesting that protocol v2 be
used, don't send a "# service" line since this line is not part of the
v2 spec.

Signed-off-by: Brandon Williams 
---
 http-backend.c | 8 ++--
 remote-curl.c  | 3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index f3dc218b2a..5d241e9109 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -10,6 +10,7 @@
 #include "url.h"
 #include "argv-array.h"
 #include "packfile.h"
+#include "protocol.h"
 
 static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
@@ -466,8 +467,11 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
hdr_str(hdr, content_type, buf.buf);
end_headers(hdr);
 
-   packet_write_fmt(1, "# service=git-%s\n", svc->name);
-   packet_flush(1);
+
+   if (determine_protocol_version_server() != protocol_v2) {
+   packet_write_fmt(1, "# service=git-%s\n", svc->name);
+   packet_flush(1);
+   }
 
argv[0] = svc->name;
run_service(argv, 0);
diff --git a/remote-curl.c b/remote-curl.c
index b4e9db85bb..66a53f74bb 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -396,6 +396,9 @@ static struct discovery *discover_refs(const char *service, int for_push)
;
 
last->proto_git = 1;
+   } else if (maybe_smart &&
+  last->len > 5 && starts_with(last->buf + 4, "version 2")) {
+   last->proto_git = 1;
}
 
if (last->proto_git)
-- 
2.16.2.804.g6dcf76e118-goog
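
The new branch in `discover_refs()` above recognizes a v2 response by skipping the 4-byte pkt-line length header and looking for "version 2". A standalone sketch of that check follows. The name `looks_like_v2_advertisement` is hypothetical, and unlike the patch (which tests `len > 5` on a NUL-terminated buffer) this version requires the whole marker to fit within `len`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if buf holds a pkt-line whose payload
 * starts with "version 2", i.e. the first line of a protocol v2
 * capability advertisement.
 */
static int looks_like_v2_advertisement(const char *buf, size_t len)
{
	static const char marker[] = "version 2";
	const size_t marker_len = sizeof(marker) - 1;

	/* 4 bytes of hex length header come before the payload. */
	if (len < 4 + marker_len)
		return 0;
	return !memcmp(buf + 4, marker, marker_len);
}
```

A v0/v1 smart response starts with the "# service=..." pkt-line instead, so it falls through to the older detection logic.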



[PATCH v5 32/35] http: don't always add Git-Protocol header

2018-03-14 Thread Brandon Williams
Instead of always sending the Git-Protocol header with the configured
version with every http request, explicitly send it when discovering
refs and then only send it on subsequent http requests if the server
understood the version requested.

Signed-off-by: Brandon Williams 
---
 http.c| 17 -
 remote-curl.c | 33 +
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/http.c b/http.c
index e1757d62b2..8f1129ac7c 100644
--- a/http.c
+++ b/http.c
@@ -904,21 +904,6 @@ static void set_from_env(const char **var, const char *envname)
*var = val;
 }
 
-static void protocol_http_header(void)
-{
-   if (get_protocol_version_config() > 0) {
-   struct strbuf protocol_header = STRBUF_INIT;
-
-   strbuf_addf(&protocol_header, GIT_PROTOCOL_HEADER ": version=%d",
-   get_protocol_version_config());
-
-
-   extra_http_headers = curl_slist_append(extra_http_headers,
-  protocol_header.buf);
-   strbuf_release(&protocol_header);
-   }
-}
-
 void http_init(struct remote *remote, const char *url, int proactive_auth)
 {
char *low_speed_limit;
@@ -949,8 +934,6 @@ void http_init(struct remote *remote, const char *url, int proactive_auth)
if (remote)
var_override(&http_proxy_authmethod, remote->http_proxy_authmethod);
 
-   protocol_http_header();
-
pragma_header = curl_slist_append(http_copy_default_headers(),
"Pragma: no-cache");
no_pragma_header = curl_slist_append(http_copy_default_headers(),
diff --git a/remote-curl.c b/remote-curl.c
index c540358438..b4e9db85bb 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -291,6 +291,19 @@ static int show_http_message(struct strbuf *type, struct strbuf *charset,
return 0;
 }
 
+static int get_protocol_http_header(enum protocol_version version,
+   struct strbuf *header)
+{
+   if (version > 0) {
+   strbuf_addf(header, GIT_PROTOCOL_HEADER ": version=%d",
+   version);
+
+   return 1;
+   }
+
+   return 0;
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
struct strbuf exp = STRBUF_INIT;
@@ -299,6 +312,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
struct strbuf buffer = STRBUF_INIT;
struct strbuf refs_url = STRBUF_INIT;
struct strbuf effective_url = STRBUF_INIT;
+   struct strbuf protocol_header = STRBUF_INIT;
+   struct string_list extra_headers = STRING_LIST_INIT_DUP;
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
@@ -318,11 +333,16 @@ static struct discovery *discover_refs(const char *service, int for_push)
strbuf_addf(&refs_url, "service=%s", service);
}
 
+   /* Add the extra Git-Protocol header */
+   if (get_protocol_http_header(get_protocol_version_config(), &protocol_header))
+   string_list_append(&extra_headers, protocol_header.buf);
+
memset(&http_options, 0, sizeof(http_options));
http_options.content_type = &type;
http_options.charset = &charset;
http_options.effective_url = &effective_url;
http_options.base_url = &url;
+   http_options.extra_headers = &extra_headers;
http_options.initial_request = 1;
http_options.no_cache = 1;
http_options.keep_error = 1;
@@ -389,6 +409,8 @@ static struct discovery *discover_refs(const char *service, int for_push)
strbuf_release(&charset);
strbuf_release(&effective_url);
strbuf_release(&buffer);
+   strbuf_release(&protocol_header);
+   string_list_clear(&extra_headers, 0);
last_discovery = last;
return last;
 }
@@ -425,6 +447,7 @@ struct rpc_state {
char *service_url;
char *hdr_content_type;
char *hdr_accept;
+   char *protocol_header;
char *buf;
size_t alloc;
size_t len;
@@ -611,6 +634,10 @@ static int post_rpc(struct rpc_state *rpc)
headers = curl_slist_append(headers, needs_100_continue ?
"Expect: 100-continue" : "Expect:");
 
+   /* Add the extra Git-Protocol header */
+   if (rpc->protocol_header)
+   headers = curl_slist_append(headers, rpc->protocol_header);
+
 retry:
slot = get_active_slot();
 
@@ -751,6 +778,11 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads)
strbuf_addf(&buf, "Accept: application/x-%s-result", svc);
rpc->hdr_accept = strbuf_detach(&buf, NULL);
 
+   if (get_protocol_http_header(heads->version, &buf))
+   rpc->protocol_header = strbuf_detach(&buf, NULL);
+   else
+   rpc->protocol_header = NULL;
+
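
The `get_protocol_http_header()` helper in this patch only produces a header for versions above 0. The same rule can be sketched with a fixed-size buffer in place of `struct strbuf`. The function name `format_protocol_header` is hypothetical, and the literal "Git-Protocol" is assumed to be the value of GIT_PROTOCOL_HEADER, which matches what the t5702 tests grep for.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "Git-Protocol: version=N" into out when
 * version > 0.  Returns 1 if a header was written in full, 0 if no
 * header is needed (version <= 0) or it did not fit in out_size bytes.
 */
static int format_protocol_header(int version, char *out, size_t out_size)
{
	int n;

	if (version <= 0)
		return 0;
	n = snprintf(out, out_size, "Git-Protocol: version=%d", version);
	return n > 0 && (size_t)n < out_size;
}
```

Because v0 yields no header at all, requests made after a server answered with v0 look exactly like requests from a client that never asked for a newer version.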

[PATCH v5 29/35] remote-curl: create copy of the service name

2018-03-14 Thread Brandon Williams
Make a copy of the service name being requested instead of relying on
the buffer pointed to by the passed in 'const char *' to remain
unchanged.

Currently, all service names are string constants, but a subsequent
patch will introduce service names from external sources.

Signed-off-by: Brandon Williams 
---
 remote-curl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index dae8a4a48d..4086aa733b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -165,7 +165,7 @@ static int set_option(const char *name, const char *value)
 }
 
 struct discovery {
-   const char *service;
+   char *service;
char *buf_alloc;
char *buf;
size_t len;
@@ -257,6 +257,7 @@ static void free_discovery(struct discovery *d)
free(d->shallow.oid);
free(d->buf_alloc);
free_refs(d->refs);
+   free(d->service);
free(d);
}
 }
@@ -343,7 +344,7 @@ static struct discovery *discover_refs(const char *service, int for_push)
warning(_("redirecting to %s"), url.buf);
 
last= xcalloc(1, sizeof(*last_discovery));
-   last->service = service;
+   last->service = xstrdup(service);
last->buf_alloc = strbuf_detach(&buffer, &last->len);
last->buf = last->buf_alloc;
 
-- 
2.16.2.804.g6dcf76e118-goog
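
The ownership change in the patch above (store a private copy, free it with the struct) can be sketched with plain libc calls. `struct discovery_sketch` and its two helpers are hypothetical stand-ins; git uses `xstrdup()` and `free_discovery()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct discovery. */
struct discovery_sketch {
	char *service;	/* owned copy, released by discovery_free() */
};

/* Allocate a struct holding its own copy of service; NULL on failure. */
static struct discovery_sketch *discovery_new(const char *service)
{
	size_t n = strlen(service) + 1;
	struct discovery_sketch *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->service = malloc(n);
	if (!d->service) {
		free(d);
		return NULL;
	}
	memcpy(d->service, service, n);
	return d;
}

/* Release the struct together with the service name it owns. */
static void discovery_free(struct discovery_sketch *d)
{
	if (!d)
		return;
	free(d->service);
	free(d);
}
```

Once the struct owns its copy, the caller is free to reuse or overwrite the buffer it passed in, which matters when the name comes from an external source such as a line read from the parent process.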



[PATCH v5 28/35] pkt-line: add packet_buf_write_len function

2018-03-14 Thread Brandon Williams
Add the 'packet_buf_write_len()' function which allows for writing an
arbitrary length buffer into a 'struct strbuf' and formatting it in
packet-line format.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 16 
 pkt-line.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 7296731cf3..555eb2a507 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -215,6 +215,22 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_end(args);
 }
 
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
+{
+   size_t orig_len, n;
+
+   orig_len = buf->len;
+   strbuf_addstr(buf, "0000");
+   strbuf_add(buf, data, len);
+   n = buf->len - orig_len;
+
+   if (n > LARGE_PACKET_MAX)
+   die("protocol error: impossibly long line");
+
+   set_packet_header(&buf->buf[orig_len], n);
+   packet_trace(data, len, 1);
+}
+
 int write_packetized_from_fd(int fd_in, int fd_out)
 {
static char buf[LARGE_PACKET_DATA_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 9570bd7a0a..5b28d43472 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -26,6 +26,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int write_packetized_from_fd(int fd_in, int fd_out);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 31/35] http: allow providing extra headers for http requests

2018-03-14 Thread Brandon Williams
Add a way for callers to request that extra headers be included when
making http requests.

Signed-off-by: Brandon Williams 
---
 http.c | 8 
 http.h | 7 +++
 2 files changed, 15 insertions(+)

diff --git a/http.c b/http.c
index 5977712712..e1757d62b2 100644
--- a/http.c
+++ b/http.c
@@ -1723,6 +1723,14 @@ static int http_request(const char *url,
 
headers = curl_slist_append(headers, buf.buf);
 
+   /* Add additional headers here */
+   if (options && options->extra_headers) {
+   const struct string_list_item *item;
+   for_each_string_list_item(item, options->extra_headers) {
+   headers = curl_slist_append(headers, item->string);
+   }
+   }
+
curl_easy_setopt(slot->curl, CURLOPT_URL, url);
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip");
diff --git a/http.h b/http.h
index f7bd3b26b0..4df4a25e1a 100644
--- a/http.h
+++ b/http.h
@@ -172,6 +172,13 @@ struct http_get_options {
 * for details.
 */
struct strbuf *base_url;
+
+   /*
+* If not NULL, contains additional HTTP headers to be sent with the
+* request. The strings in the list must not be freed until after the
+* request has completed.
+*/
+   struct string_list *extra_headers;
 };
 
 /* Return values for http_get_*() */
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 27/35] transport-helper: introduce stateless-connect

2018-03-14 Thread Brandon Williams
Introduce the transport-helper capability 'stateless-connect'.  This
capability indicates that the transport-helper can be requested to run
the 'stateless-connect' command which should attempt to make a
stateless connection with a remote end.  Once established, the
connection can be used by the git client to communicate with
the remote end natively in a stateless-rpc manner as supported by
protocol v2.  This means that the client must send everything the server
needs in a single request as the client must not assume any
state-storing on the part of the server or transport.

If a stateless connection cannot be established then the remote-helper
will respond in the same manner as the 'connect' command indicating that
the client should fall back to using the dumb remote-helper commands.

A future patch will implement the 'stateless-connect' capability in our
http remote-helper (remote-curl) so that protocol v2 can be used over
the http transport.
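
The three possible outcomes of the command can be sketched as a small classifier. This is only an illustration of the reply handling described above (empty line, 'fallback', anything else); the enum and function names are hypothetical and do not appear in transport-helper.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result codes, used only for this sketch. */
enum sketch_connect_result {
	SKETCH_CONNECTED,	/* empty line: connection established */
	SKETCH_FALLBACK,	/* "fallback": use the dumb helper commands */
	SKETCH_UNKNOWN		/* anything else: treated as an error */
};

/*
 * Classify one reply line from the helper.  'line' is expected to have
 * its trailing newline already stripped.
 */
static enum sketch_connect_result sketch_classify_reply(const char *line)
{
	assert(line != NULL);

	if (!strcmp(line, ""))
		return SKETCH_CONNECTED;
	if (!strcmp(line, "fallback"))
		return SKETCH_FALLBACK;
	return SKETCH_UNKNOWN;
}
```

In the actual helper code an unknown reply is fatal; the sketch returns a code instead so that it can be exercised in isolation.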

Signed-off-by: Brandon Williams 
---
 Documentation/gitremote-helpers.txt | 32 +
 transport-helper.c  | 11 ++
 transport.c |  1 +
 transport.h |  6 ++
 4 files changed, 50 insertions(+)

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 4a584f3c5d..cd9b34d230 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -102,6 +102,14 @@ Capabilities for Pushing
 +
 Supported commands: 'connect'.
 
+'stateless-connect'::
+   Experimental; for internal use only.
+   Can attempt to connect to a remote server for communication
+   using git's wire-protocol version 2.  See the documentation
+   for the stateless-connect command for more information.
++
+Supported commands: 'stateless-connect'.
+
 'push'::
Can discover remote refs and push local commits and the
history leading up to them to new or existing remote refs.
@@ -136,6 +144,14 @@ Capabilities for Fetching
 +
 Supported commands: 'connect'.
 
+'stateless-connect'::
+   Experimental; for internal use only.
+   Can attempt to connect to a remote server for communication
+   using git's wire-protocol version 2.  See the documentation
+   for the stateless-connect command for more information.
++
+Supported commands: 'stateless-connect'.
+
 'fetch'::
Can discover remote refs and transfer objects reachable from
them to the local object store.
@@ -375,6 +391,22 @@ Supported if the helper has the "export" capability.
 +
 Supported if the helper has the "connect" capability.
 
+'stateless-connect' ::
+   Experimental; for internal use only.
+   Connects to the given remote service for communication using
+   git's wire-protocol version 2.  Valid replies to this command
+   are empty line (connection established), 'fallback' (no smart
+   transport support, fall back to dumb transports) and just
+   exiting with error message printed (can't connect, don't bother
+   trying to fall back).  After line feed terminating the positive
+   (empty) response, the output of the service starts.  Messages
+   (both request and response) must consist of zero or more
+   PKT-LINEs, terminating in a flush packet. The client must not
+   expect the server to store any state in between request-response
+   pairs.  After the connection ends, the remote helper exits.
++
+Supported if the helper has the "stateless-connect" capability.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/transport-helper.c b/transport-helper.c
index 830f21f0a9..aecbc4a845 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -12,6 +12,7 @@
 #include "argv-array.h"
 #include "refs.h"
 #include "transport-internal.h"
+#include "protocol.h"
 
 static int debug;
 
@@ -26,6 +27,7 @@ struct helper_data {
option : 1,
push : 1,
connect : 1,
+   stateless_connect : 1,
signed_tags : 1,
check_connectivity : 1,
no_disconnect_req : 1,
@@ -188,6 +190,8 @@ static struct child_process *get_helper(struct transport 
*transport)
refspecs[refspec_nr++] = xstrdup(arg);
} else if (!strcmp(capname, "connect")) {
data->connect = 1;
+   } else if (!strcmp(capname, "stateless-connect")) {
+   data->stateless_connect = 1;
} else if (!strcmp(capname, "signed-tags")) {
data->signed_tags = 1;
} else if (skip_prefix(capname, "export-marks ", &arg)) {
@@ -612,6 +616,13 @@ static int process_connect_service(struct transport 
*transport,
if (data->connect) {
st

[PATCH v5 26/35] transport-helper: refactor process_connect_service

2018-03-14 Thread Brandon Williams
A future patch will need to take advantage of the logic which runs and
processes the response of the connect command on a remote helper so
factor out this logic from 'process_connect_service()' and place it into
a helper function 'run_connect()'.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 67 ++
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 9677ead426..830f21f0a9 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -545,14 +545,13 @@ static int fetch_with_import(struct transport *transport,
return 0;
 }
 
-static int process_connect_service(struct transport *transport,
-  const char *name, const char *exec)
+static int run_connect(struct transport *transport, struct strbuf *cmdbuf)
 {
struct helper_data *data = transport->data;
-   struct strbuf cmdbuf = STRBUF_INIT;
-   struct child_process *helper;
-   int r, duped, ret = 0;
+   int ret = 0;
+   int duped;
FILE *input;
+   struct child_process *helper;
 
helper = get_helper(transport);
 
@@ -568,44 +567,54 @@ static int process_connect_service(struct transport 
*transport,
input = xfdopen(duped, "r");
setvbuf(input, NULL, _IONBF, 0);
 
+   sendline(data, cmdbuf);
+   if (recvline_fh(input, cmdbuf))
+   exit(128);
+
+   if (!strcmp(cmdbuf->buf, "")) {
+   data->no_disconnect_req = 1;
+   if (debug)
+   fprintf(stderr, "Debug: Smart transport connection "
+   "ready.\n");
+   ret = 1;
+   } else if (!strcmp(cmdbuf->buf, "fallback")) {
+   if (debug)
+   fprintf(stderr, "Debug: Falling back to dumb "
+   "transport.\n");
+   } else {
+   die("Unknown response to connect: %s",
+   cmdbuf->buf);
+   }
+
+   fclose(input);
+   return ret;
+}
+
+static int process_connect_service(struct transport *transport,
+  const char *name, const char *exec)
+{
+   struct helper_data *data = transport->data;
+   struct strbuf cmdbuf = STRBUF_INIT;
+   int ret = 0;
+
/*
 * Handle --upload-pack and friends. This is fire and forget...
 * just warn if it fails.
 */
if (strcmp(name, exec)) {
-   r = set_helper_option(transport, "servpath", exec);
+   int r = set_helper_option(transport, "servpath", exec);
if (r > 0)
warning("Setting remote service path not supported by 
protocol.");
else if (r < 0)
warning("Invalid remote service path.");
}
 
-   if (data->connect)
+   if (data->connect) {
strbuf_addf(&cmdbuf, "connect %s\n", name);
-   else
-   goto exit;
-
-   sendline(data, &cmdbuf);
-   if (recvline_fh(input, &cmdbuf))
-   exit(128);
-
-   if (!strcmp(cmdbuf.buf, "")) {
-   data->no_disconnect_req = 1;
-   if (debug)
-   fprintf(stderr, "Debug: Smart transport connection "
-   "ready.\n");
-   ret = 1;
-   } else if (!strcmp(cmdbuf.buf, "fallback")) {
-   if (debug)
-   fprintf(stderr, "Debug: Falling back to dumb "
-   "transport.\n");
-   } else
-   die("Unknown response to connect: %s",
-   cmdbuf.buf);
+   ret = run_connect(transport, &cmdbuf);
+   }
 
-exit:
strbuf_release(&cmdbuf);
-   fclose(input);
return ret;
 }
 
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 21/35] fetch-pack: perform a fetch using v2

2018-03-14 Thread Brandon Williams
When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt |  68 +-
 builtin/fetch-pack.c|   2 +-
 fetch-pack.c| 270 +++-
 fetch-pack.h|   4 +-
 serve.c |   2 +-
 t/t5701-git-serve.sh|   2 +-
 t/t5702-protocol-v2.sh  |  97 +
 transport.c |   7 +-
 upload-pack.c   | 141 ++---
 upload-pack.h   |   4 +
 10 files changed, 549 insertions(+), 48 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 9ce91d7213..7b5a1d4000 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -259,12 +259,43 @@ A `fetch` request can take the following arguments:
to its base by position in pack rather than by an oid.  That is,
they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
 
+shallow 
+   A client must notify the server of all commits for which it only
+   has shallow copies (meaning that it doesn't have the parents of
+   a commit) by supplying a 'shallow ' line for each such
+   object so that the server is aware of the limitations of the
+   client's history.  This is so that the server is aware that the
+   client may not have all objects reachable from such commits.
+
+deepen 
+   Requests that the fetch/clone should be shallow having a commit
+   depth of  relative to the remote side.
+
+deepen-relative
+   Requests that the semantics of the "deepen" command be changed
+   to indicate that the depth requested is relative to the client's
+   current shallow boundary, instead of relative to the requested
+   commits.
+
+deepen-since 
+   Requests that the shallow clone/fetch should be cut at a
+   specific time, instead of depth.  Internally it's equivalent to
+   doing "git rev-list --max-age=". Cannot be used with
+   "deepen".
+
+deepen-not 
+   Requests that the shallow clone/fetch should be cut at a
+   specific revision specified by '', instead of a depth.
+   Internally it's equivalent to doing "git rev-list --not ".
+   Cannot be used with "deepen", but can be used with
+   "deepen-since".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header.
 
 output = *section
-section = (acknowledgments | packfile)
+section = (acknowledgments | shallow-info | packfile)
  (flush-pkt | delim-pkt)
 
 acknowledgments = PKT-LINE("acknowledgments" LF)
@@ -274,6 +305,11 @@ header.
 nak = PKT-LINE("NAK" LF)
 ack = PKT-LINE("ACK" SP obj-id LF)
 
+shallow-info = PKT-LINE("shallow-info" LF)
+  *PKT-LINE((shallow | unshallow) LF)
+shallow = "shallow" SP obj-id
+unshallow = "unshallow" SP obj-id
+
 packfile = PKT-LINE("packfile" LF)
   *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -306,6 +342,36 @@ header.
  determined the objects it plans to send to the client and no
  further negotiation is needed.
 
+
+shallow-info section
+   If the client has requested a shallow fetch/clone, a shallow
+   client requests a fetch or the server is shallow then the
+   server's response may include a shallow-info section.  The
+   shallow-info section will be included if (due to one of the
+   above conditions) the server needs to inform the client of any
+   shallow boundaries or adjustments to the client's already
+   existing shallow boundaries.
+
+   * Always begins with the section header "shallow-info"
+
+   * If a positive depth is requested, the server will compute the
+ set of commits which are no deeper than the desired depth.
+
+   * The server sends a "shallow obj-id" line for each commit whose
+ parents will not be sent in the following packfile.
+
+   * The server sends an "unshallow obj-id" line for each commit
+ which the client has indicated is shallow, but is no longer
+ shallow as a result of the fetch (due to its parents being
+ sent in the following packfile).
+
+   * The server MUST NOT send any "unshallow" lines for anything
+ which the client has not indicated was shallow as a part of
+ its request.
+
+   * This section is only included if a packfile section is also
+ included in the response.
+
 
 packfile section
* This section is only included if the client has sent 'want'
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index b2374ddbbf..f9d7d0b5a5 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@

[PATCH v5 22/35] fetch-pack: support shallow requests

2018-03-14 Thread Brandon Williams
Enable shallow clones and deepen requests using protocol version 2 if
the server 'fetch' command supports the 'shallow' feature.
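
To make the feature check concrete, here is a minimal sketch of looking up a feature in one advertised capability. It assumes the capability has the form "command=feature1 feature2" (a command name, '=', then a space-separated feature list). 'sketch_has_feature' is a hypothetical name; the patch itself does this with 'skip_prefix()' and 'parse_feature_request()'.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): return 1 if 'capability' is of
 * the form "<command>=<features>" and 'feature' is one of the
 * space-separated features, 0 otherwise.
 */
static int sketch_has_feature(const char *capability, const char *command,
			      const char *feature)
{
	size_t clen, flen;
	const char *p;

	assert(capability != NULL && command != NULL && feature != NULL);

	clen = strlen(command);
	flen = strlen(feature);
	if (!flen)
		return 0;

	/* The capability must be exactly "<command>=" followed by a list. */
	if (strncmp(capability, command, clen) || capability[clen] != '=')
		return 0;

	/* Walk the list one whitespace-delimited token at a time. */
	p = capability + clen + 1;
	while (*p) {
		size_t tok = strcspn(p, " ");

		if (tok == flen && !strncmp(p, feature, flen))
			return 1;
		p += tok;
		while (*p == ' ')
			p++;
	}
	return 0;
}
```

Comparing whole tokens avoids a false positive when one feature name is a prefix of another.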

Signed-off-by: Brandon Williams 
---
 connect.c| 22 
 connect.h|  2 ++
 fetch-pack.c | 71 +++-
 3 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/connect.c b/connect.c
index e42d779f71..5bb9d34844 100644
--- a/connect.c
+++ b/connect.c
@@ -82,6 +82,28 @@ int server_supports_v2(const char *c, int die_on_error)
return 0;
 }
 
+int server_supports_feature(const char *c, const char *feature,
+   int die_on_error)
+{
+   int i;
+
+   for (i = 0; i < server_capabilities_v2.argc; i++) {
+   const char *out;
+   if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+   (!*out || *(out++) == '=')) {
+   if (parse_feature_request(out, feature))
+   return 1;
+   else
+   break;
+   }
+   }
+
+   if (die_on_error)
+   die("server doesn't support feature '%s'", feature);
+
+   return 0;
+}
+
 static void process_capabilities_v2(struct packet_reader *reader)
 {
while (packet_reader_read(reader) == PACKET_READ_NORMAL)
diff --git a/connect.h b/connect.h
index 8898d44952..0e69c6709c 100644
--- a/connect.h
+++ b/connect.h
@@ -17,5 +17,7 @@ struct packet_reader;
 extern enum protocol_version discover_version(struct packet_reader *reader);
 
 extern int server_supports_v2(const char *c, int die_on_error);
+extern int server_supports_feature(const char *c, const char *feature,
+  int die_on_error);
 
 #endif
diff --git a/fetch-pack.c b/fetch-pack.c
index dffcfd66a5..837e1fd21d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1008,6 +1008,26 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
return ref;
 }
 
+static void add_shallow_requests(struct strbuf *req_buf,
+const struct fetch_pack_args *args)
+{
+   if (is_repository_shallow())
+   write_shallow_commits(req_buf, 1, NULL);
+   if (args->depth > 0)
+   packet_buf_write(req_buf, "deepen %d", args->depth);
+   if (args->deepen_since) {
+   timestamp_t max_age = approxidate(args->deepen_since);
+   packet_buf_write(req_buf, "deepen-since %"PRItime, max_age);
+   }
+   if (args->deepen_not) {
+   int i;
+   for (i = 0; i < args->deepen_not->nr; i++) {
+   struct string_list_item *s = args->deepen_not->items + 
i;
+   packet_buf_write(req_buf, "deepen-not %s", s->string);
+   }
+   }
+}
+
 static void add_wants(const struct ref *wants, struct strbuf *req_buf)
 {
for ( ; wants ; wants = wants->next) {
@@ -1093,6 +1113,12 @@ static int send_fetch_request(int fd_out, const struct 
fetch_pack_args *args,
if (prefer_ofs_delta)
packet_buf_write(&req_buf, "ofs-delta");
 
+   /* Add shallow-info and deepen request */
+   if (server_supports_feature("fetch", "shallow", 0))
+   add_shallow_requests(&req_buf, args);
+   else if (is_repository_shallow() || args->deepen)
+   die(_("Server does not support shallow requests"));
+
/* add wants */
add_wants(wants, &req_buf);
 
@@ -1122,7 +1148,7 @@ static int process_section_header(struct packet_reader 
*reader,
int ret;
 
if (packet_reader_peek(reader) != PACKET_READ_NORMAL)
-   die("error reading packet");
+   die("error reading section header '%s'", section);
 
ret = !strcmp(reader->line, section);
 
@@ -1177,6 +1203,43 @@ static int process_acks(struct packet_reader *reader, 
struct oidset *common)
return received_ready ? 2 : (received_ack ? 1 : 0);
 }
 
+static void receive_shallow_info(struct fetch_pack_args *args,
+struct packet_reader *reader)
+{
+   process_section_header(reader, "shallow-info", 0);
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+   const char *arg;
+   struct object_id oid;
+
+   if (skip_prefix(reader->line, "shallow ", &arg)) {
+   if (get_oid_hex(arg, &oid))
+   die(_("invalid shallow line: %s"), 
reader->line);
+   register_shallow(&oid);
+   continue;
+   }
+   if (skip_prefix(reader->line, "unshallow ", &arg)) {
+   if (get_oid_hex(arg, &oid))
+   die(_("invalid unshallow line: %s"), 
reader->line);
+   if (!lookup_object(oid.hash))
+   die(_("object not found: %s"), reader->line);
+   /* make 

[PATCH v5 19/35] push: pass ref prefixes when pushing

2018-03-14 Thread Brandon Williams
Construct a list of ref prefixes to be passed to 'get_refs_list()' from
the refspec to be used during the push.  This list of ref prefixes will
be used to allow the server to filter the ref advertisement when
communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 transport.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/transport.c b/transport.c
index 3f130518d2..57bdbb59bc 100644
--- a/transport.c
+++ b/transport.c
@@ -1028,11 +1028,38 @@ int transport_push(struct transport *transport,
int porcelain = flags & TRANSPORT_PUSH_PORCELAIN;
int pretend = flags & TRANSPORT_PUSH_DRY_RUN;
int push_ret, ret, err;
+   struct refspec *tmp_rs;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
+   int i;
 
if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
return -1;
 
-   remote_refs = transport->vtable->get_refs_list(transport, 1, 
NULL);
+   tmp_rs = parse_push_refspec(refspec_nr, refspec);
+   for (i = 0; i < refspec_nr; i++) {
+   const char *prefix = NULL;
+
+   if (tmp_rs[i].dst)
+   prefix = tmp_rs[i].dst;
+   else if (tmp_rs[i].src && !tmp_rs[i].exact_sha1)
+   prefix = tmp_rs[i].src;
+
+   if (prefix) {
+   const char *glob = strchr(prefix, '*');
+   if (glob)
+   argv_array_pushf(&ref_prefixes, "%.*s",
+(int)(glob - prefix),
+prefix);
+   else
+   expand_ref_prefix(&ref_prefixes, 
prefix);
+   }
+   }
+
+   remote_refs = transport->vtable->get_refs_list(transport, 1,
+  &ref_prefixes);
+
+   argv_array_clear(&ref_prefixes);
+   free_refspec(refspec_nr, tmp_rs);
 
if (flags & TRANSPORT_PUSH_ALL)
match_flags |= MATCH_REFS_ALL;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 17/35] ls-remote: pass ref prefixes when requesting a remote's refs

2018-03-14 Thread Brandon Williams
Construct an argv_array of ref prefixes based on the patterns supplied
via the command line and pass them to 'transport_get_remote_refs()' to
be used when communicating protocol v2 so that the server can limit the
ref advertisement based on those prefixes.
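
The glob handling can be sketched as follows: everything before the first '*' is used as the prefix. This is a simplified illustration only. 'sketch_glob_prefix' is a hypothetical helper, and for patterns without a glob the patch expands the name through 'ref_rev_parse_rules' via 'expand_ref_prefix()', which the sketch does not attempt.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): copy the part of 'pattern' that
 * precedes the first '*' into 'out'.  If there is no '*', the whole
 * pattern is copied.  Returns 1 if a glob was found, 0 if not, and -1
 * if 'out' is too small.
 */
static int sketch_glob_prefix(const char *pattern, char *out, size_t out_size)
{
	const char *glob;
	size_t n;

	assert(pattern != NULL && out != NULL);

	glob = strchr(pattern, '*');
	n = glob ? (size_t)(glob - pattern) : strlen(pattern);
	if (n + 1 > out_size)
		return -1;

	memcpy(out, pattern, n);
	out[n] = '\0';
	return glob != NULL;
}
```

For example, a pattern of "refs/heads/*" yields the prefix "refs/heads/", which the server can then match against the start of each ref name.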

Signed-off-by: Brandon Williams 
---
 builtin/ls-remote.c| 15 +--
 refs.c | 14 ++
 refs.h |  7 +++
 t/t5702-protocol-v2.sh | 26 ++
 4 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c6e9847c5c..4276bf97d5 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "transport.h"
 #include "remote.h"
+#include "refs.h"
 
 static const char * const ls_remote_usage[] = {
N_("git ls-remote [--heads] [--tags] [--refs] [--upload-pack=]\n"
@@ -43,6 +44,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
int show_symref_target = 0;
const char *uploadpack = NULL;
const char **pattern = NULL;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
struct remote *remote;
struct transport *transport;
@@ -74,8 +76,17 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (argc > 1) {
int i;
pattern = xcalloc(argc, sizeof(const char *));
-   for (i = 1; i < argc; i++)
+   for (i = 1; i < argc; i++) {
+   const char *glob;
pattern[i - 1] = xstrfmt("*/%s", argv[i]);
+
+   glob = strchr(argv[i], '*');
+   if (glob)
+   argv_array_pushf(&ref_prefixes, "%.*s",
+(int)(glob - argv[i]), 
argv[i]);
+   else
+   expand_ref_prefix(&ref_prefixes, argv[i]);
+   }
}
 
remote = remote_get(dest);
@@ -96,7 +107,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport, NULL);
+   ref = transport_get_remote_refs(transport, &ref_prefixes);
if (transport_disconnect(transport))
return 1;
 
diff --git a/refs.c b/refs.c
index 20ba82b434..cefbad2076 100644
--- a/refs.c
+++ b/refs.c
@@ -13,6 +13,7 @@
 #include "tag.h"
 #include "submodule.h"
 #include "worktree.h"
+#include "argv-array.h"
 
 /*
  * List of all available backends
@@ -501,6 +502,19 @@ int refname_match(const char *abbrev_name, const char 
*full_name)
return 0;
 }
 
+/*
+ * Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
+ * the results to 'prefixes'
+ */
+void expand_ref_prefix(struct argv_array *prefixes, const char *prefix)
+{
+   const char **p;
+   int len = strlen(prefix);
+
+   for (p = ref_rev_parse_rules; *p; p++)
+   argv_array_pushf(prefixes, *p, len, prefix);
+}
+
 /*
  * *string and *len will only be substituted, and *string returned (for
  * later free()ing) if the string passed in is a magic short-hand form
diff --git a/refs.h b/refs.h
index 01be5ae32f..93b6dce944 100644
--- a/refs.h
+++ b/refs.h
@@ -139,6 +139,13 @@ int resolve_gitlink_ref(const char *submodule, const char 
*refname,
  */
 int refname_match(const char *abbrev_name, const char *full_name);
 
+/*
+ * Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
+ * the results to 'prefixes'
+ */
+struct argv_array;
+void expand_ref_prefix(struct argv_array *prefixes, const char *prefix);
+
 int expand_ref(const char *str, int len, struct object_id *oid, char **ref);
 int dwim_ref(const char *str, int len, struct object_id *oid, char **ref);
 int dwim_log(const char *str, int len, struct object_id *oid, char **ref);
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index dc5f813beb..562610fd25 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -32,6 +32,19 @@ test_expect_success 'list refs with git:// using protocol 
v2' '
test_cmp actual expect
 '
 
+test_expect_success 'ref advertisment is filtered with ls-remote using 
protocol v2' '
+   test_when_finished "rm -f log" &&
+
+   GIT_TRACE_PACKET="$(pwd)/log" git -c protocol.version=2 \
+   ls-remote "$GIT_DAEMON_URL/parent" master >actual &&
+
+   cat >expect <<-EOF &&
+   $(git -C "$daemon_parent" rev-parse refs/heads/master)$(printf 
"\t")refs/heads/master
+   EOF
+
+   test_cmp actual expect
+'
+
 stop_git_daemon
 
 # Test protocol v2 with 'file://' transport
@@ -54,4 +67,17 @@ test_expect_success 'list refs with file:// using protocol 
v2' '
test_cmp actual expect
 '
 
+test_expect_success 'ref advertisment is filtered with ls-remote using 
protocol v2' '
+   tes

[PATCH v5 13/35] ls-refs: introduce ls-refs server command

2018-03-14 Thread Brandon Williams
Introduce the ls-refs server command.  In protocol v2, the ls-refs
command is used to request the ref advertisement from the server.  Since
it is a command which can be requested (as opposed to mandatory in v1),
a client can send a number of parameters in its request to limit the ref
advertisement based on provided ref-prefixes.
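
The server-side filtering amounts to a plain prefix test, sketched below with standard C types. 'sketch_ref_match' is a hypothetical stand-in for the 'ref_match()' function in this patch, which works on a 'struct argv_array' and uses 'starts_with()'.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'refname' starts with any of the
 * 'nr' entries in 'prefixes', or if no prefixes were given at all
 * (meaning no restriction).  Returns 0 otherwise.
 */
static int sketch_ref_match(const char **prefixes, size_t nr,
			    const char *refname)
{
	size_t i;

	assert(refname != NULL);

	if (!nr)
		return 1; /* no restriction */

	for (i = 0; i < nr; i++) {
		if (!strncmp(refname, prefixes[i], strlen(prefixes[i])))
			return 1;
	}
	return 0;
}
```

Note that an empty prefix list matches every ref, so a client that sends no ref-prefix arguments still gets the full advertisement.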

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt |  31 +++
 Makefile|   1 +
 ls-refs.c   |  96 
 ls-refs.h   |  10 +++
 serve.c |   8 ++
 t/t5701-git-serve.sh| 115 
 6 files changed, 261 insertions(+)
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 3a671497b2..422edf870e 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -172,3 +172,34 @@ printable ASCII characters except space (i.e., the byte 
range 32 < x <
 "git/1.8.3.1"). The agent strings are purely informative for statistics
 and debugging purposes, and MUST NOT be used to programmatically assume
 the presence or absence of particular features.
+
+ ls-refs
+---------
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in arguments
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "= "
+
+ls-refs takes in the following arguments:
+
+symrefs
+   In addition to the object pointed by it, show the underlying ref
+   pointed by it when showing a symbolic ref.
+peel
+   Show peeled tags.
+ref-prefix 
+   When specified, only references having a prefix matching one of
+   the provided prefixes are displayed.
+
+The output of ls-refs is as follows:
+
+output = *ref
+flush-pkt
+ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+ref-attribute = (symref | peeled)
+symref = "symref-target:" symref-target
+peeled = "peeled:" obj-id
diff --git a/Makefile b/Makefile
index 18c255428a..e50927cfb3 100644
--- a/Makefile
+++ b/Makefile
@@ -825,6 +825,7 @@ LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
diff --git a/ls-refs.c b/ls-refs.c
new file mode 100644
index 0000000000..a06f12eca8
--- /dev/null
+++ b/ls-refs.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "repository.h"
+#include "refs.h"
+#include "remote.h"
+#include "argv-array.h"
+#include "ls-refs.h"
+#include "pkt-line.h"
+
+/*
+ * Check if one of the prefixes is a prefix of the ref.
+ * If no prefixes were provided, all refs match.
+ */
+static int ref_match(const struct argv_array *prefixes, const char *refname)
+{
+   int i;
+
+   if (!prefixes->argc)
+   return 1; /* no restriction */
+
+   for (i = 0; i < prefixes->argc; i++) {
+   const char *prefix = prefixes->argv[i];
+
+   if (starts_with(refname, prefix))
+   return 1;
+   }
+
+   return 0;
+}
+
+struct ls_refs_data {
+   unsigned peel;
+   unsigned symrefs;
+   struct argv_array prefixes;
+};
+
+static int send_ref(const char *refname, const struct object_id *oid,
+   int flag, void *cb_data)
+{
+   struct ls_refs_data *data = cb_data;
+   const char *refname_nons = strip_namespace(refname);
+   struct strbuf refline = STRBUF_INIT;
+
+   if (!ref_match(&data->prefixes, refname))
+   return 0;
+
+   strbuf_addf(&refline, "%s %s", oid_to_hex(oid), refname_nons);
+   if (data->symrefs && flag & REF_ISSYMREF) {
+   struct object_id unused;
+   const char *symref_target = resolve_ref_unsafe(refname, 0,
+  &unused,
+  &flag);
+
+   if (!symref_target)
+   die("'%s' is a symref but it is not?", refname);
+
+   strbuf_addf(&refline, " symref-target:%s", symref_target);
+   }
+
+   if (data->peel) {
+   struct object_id peeled;
+   if (!peel_ref(refname, &peeled))
+   strbuf_addf(&refline, " peeled:%s", 
oid_to_hex(&peeled));
+   }
+
+   strbuf_addch(&refline, '\n');
+   packet_write(1, refline.buf, refline.len);
+
+   strbuf_release(&refline);
+   return 0;
+}
+
+int ls_refs(struct repository *r, struct argv_array *keys,
+   struct packet_reader *re

[PATCH v5 18/35] fetch: pass ref prefixes when fetching

2018-03-14 Thread Brandon Williams
Construct a list of ref prefixes to be passed to
'transport_get_remote_refs()' from the refspec to be used during the
fetch.  This list of ref prefixes will be used to allow the server to
filter the ref advertisement when communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 850382f559..8258bbf950 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -332,11 +332,28 @@ static struct ref *get_ref_map(struct transport 
*transport,
struct ref *rm;
struct ref *ref_map = NULL;
struct ref **tail = &ref_map;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = &orefs;
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
+   const struct ref *remote_refs;
+
+   for (i = 0; i < refspec_count; i++) {
+   if (!refspecs[i].exact_sha1) {
+   const char *glob = strchr(refspecs[i].src, '*');
+   if (glob)
+   argv_array_pushf(&ref_prefixes, "%.*s",
+(int)(glob - refspecs[i].src),
+refspecs[i].src);
+   else
+   expand_ref_prefix(&ref_prefixes, 
refspecs[i].src);
+   }
+   }
+
+   remote_refs = transport_get_remote_refs(transport, &ref_prefixes);
+
+   argv_array_clear(&ref_prefixes);
 
if (refspec_count) {
struct refspec *fetch_refspec;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 12/35] serve: introduce git-serve

2018-03-14 Thread Brandon Williams
Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.
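
To make the entry point concrete, here is a minimal sketch of how a client could frame the initial git:// request that asks for protocol v2, following the example line in protocol-v2.txt below.  The names `put` and `build_v2_request` are hypothetical; only the byte layout is taken from the documentation.

```c
#include <stdio.h>
#include <string.h>

#define PKT_MAX 65520	/* same value as LARGE_PACKET_MAX in pkt-line.h */

/* Append 'n' bytes (which may include NULs); return the new write position. */
static char *put(char *p, const void *src, size_t n)
{
	memcpy(p, src, n);
	return p + n;
}

/*
 * Hypothetical helper: write
 *   "<len>git-upload-pack <path>\0host=<host>\0\0version=2\0"
 * into 'out'.  The 4-digit hex length counts the header itself.
 * Returns the number of bytes written, or -1 if it does not fit.
 */
static int build_v2_request(char *out, size_t outlen,
			    const char *path, const char *host)
{
	size_t path_len = strlen(path), host_len = strlen(host);
	size_t total = 4 + 16 + path_len + 1 + 5 + host_len + 1 + 1 + 9 + 1;
	char hdr[5];
	char *p = out;

	if (total > outlen || total > PKT_MAX)
		return -1;

	snprintf(hdr, sizeof(hdr), "%04x", (unsigned)total);
	p = put(p, hdr, 4);
	p = put(p, "git-upload-pack ", 16);
	p = put(p, path, path_len + 1);	/* path and its NUL */
	p = put(p, "host=", 5);
	p = put(p, host, host_len + 1);	/* host and its NUL */
	p = put(p, "", 1);		/* extra NUL before the parameters */
	p = put(p, "version=2", 10);	/* parameter and its NUL */
	return (int)(p - out);
}
```

For the path "/project.git" and the host "myserver.com" the request is 62 bytes long, so the length header comes out as "003e", matching the documented example.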

Signed-off-by: Brandon Williams 
---
 .gitignore  |   1 +
 Documentation/Makefile  |   1 +
 Documentation/technical/protocol-v2.txt | 174 +
 Makefile|   2 +
 builtin.h   |   1 +
 builtin/serve.c |  30 +++
 git.c   |   1 +
 serve.c | 247 
 serve.h |  15 ++
 t/t5701-git-serve.sh|  60 ++
 10 files changed, 532 insertions(+)
 create mode 100644 Documentation/technical/protocol-v2.txt
 create mode 100644 builtin/serve.c
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100755 t/t5701-git-serve.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..2d0450c262 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-serve
 /git-sh-i18n
 /git-sh-i18n--envsubst
 /git-sh-setup
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 4ae9ba5c86..b105775acd 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
+TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0000000000..3a671497b2
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,174 @@
+ Git Wire Protocol, Version 2
+==============================
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service
+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+  * Designed with http and stateless-rpc in mind.  With clear flush
+semantics the http remote helper can simply act as a proxy
+
+ Detailed Design
+=================
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will be advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Packet-Line Framing
+---------------------
+
+All communication is done using packet-line framing, just as in v1.  See
+`Documentation/technical/pack-protocol.txt` and
+`Documentation/technical/protocol-common.txt` for more information.
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '0000' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Initial Client Request
+------------------------
+
+In general a client can request to speak protocol v2 by sending
+`version=2` through the respective side-channel for the transport being
+used which inevitably sets `GIT_PROTOCOL`.  More information can be
+found in `pack-protocol.txt` and `http-protocol.txt`.  In all cases the
+response from the server is the capability advertisement.
+
+ Git Transport
+~~~~~~~~~~~~~~~
+
+When using the git:// transport, you can request to use protocol v2 by
+sending "version=2" as an extra parameter:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
+
+ SSH and File Transport
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using either the ssh:// or file:// transport, the GIT_PROTOCOL
+environment variable must be set explicitly to include "versi

[PATCH v5 14/35] connect: request remote refs using v2

2018-03-14 Thread Brandon Williams
Teach the client to request a remote's refs using protocol v2.  This is
done by having a client issue an 'ls-refs' request to a v2 server.
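
Before issuing 'ls-refs' the client has to know that the server advertised it.  The matching rule used by server_supports_v2() in the diff below can be sketched in plain C as follows: an advertised entry matches when it is exactly the capability name or the name followed by '='.  The names `capability_matches` and `supports` are hypothetical, and strncmp() stands in for git's skip_prefix().

```c
#include <stddef.h>
#include <string.h>

/* Does the advertised 'entry' name the capability 'name'? */
static int capability_matches(const char *entry, const char *name)
{
	size_t n = strlen(name);

	if (strncmp(entry, name, n))
		return 0;
	/* Either the whole entry, or "name=value" */
	return entry[n] == '\0' || entry[n] == '=';
}

/* Scan a capability advertisement for 'name'; returns 1 if present. */
static int supports(const char *const *caps, size_t nr, const char *name)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (capability_matches(caps[i], name))
			return 1;
	return 0;
}
```

The check on the character after the prefix is what keeps a request for "ls" from matching an advertised "ls-refs".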

Signed-off-by: Brandon Williams 
---
 builtin/upload-pack.c  |  10 +--
 connect.c  | 138 +++--
 connect.h  |   2 +
 remote.h   |   6 ++
 t/t5702-protocol-v2.sh |  57 +
 transport.c|   2 +-
 6 files changed, 204 insertions(+), 11 deletions(-)
 create mode 100755 t/t5702-protocol-v2.sh

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 8d53e9794b..a757df8da0 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "protocol.h"
 #include "upload-pack.h"
+#include "serve.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [<options>] <dir>"),
@@ -16,6 +17,7 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
const char *dir;
int strict = 0;
struct upload_pack_options opts = { 0 };
+   struct serve_options serve_opts = SERVE_OPTIONS_INIT;
struct option options[] = {
OPT_BOOL(0, "stateless-rpc", &opts.stateless_rpc,
 N_("quit after a single request/response exchange")),
@@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
 
switch (determine_protocol_version_server()) {
case protocol_v2:
-   /*
-* fetch support for protocol v2 has not been implemented yet,
-* so ignore the request to use v2 and fallback to using v0.
-*/
-   upload_pack(&opts);
+   serve_opts.advertise_capabilities = opts.advertise_refs;
+   serve_opts.stateless_rpc = opts.stateless_rpc;
+   serve(&serve_opts);
break;
case protocol_v1:
/*
diff --git a/connect.c b/connect.c
index 4b89b984c4..e42d779f71 100644
--- a/connect.c
+++ b/connect.c
@@ -12,9 +12,11 @@
 #include "sha1-array.h"
 #include "transport.h"
 #include "strbuf.h"
+#include "version.h"
 #include "protocol.h"
 
-static char *server_capabilities;
+static char *server_capabilities_v1;
+static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
 static const char *parse_feature_value(const char *, const char *, int *);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+/* Checks if the server supports the capability 'c' */
+int server_supports_v2(const char *c, int die_on_error)
+{
+   int i;
+
+   for (i = 0; i < server_capabilities_v2.argc; i++) {
+   const char *out;
+   if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+   (!*out || *out == '='))
+   return 1;
+   }
+
+   if (die_on_error)
+   die("server doesn't support '%s'", c);
+
+   return 0;
+}
+
+static void process_capabilities_v2(struct packet_reader *reader)
+{
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL)
+   argv_array_push(&server_capabilities_v2, reader->line);
+
+   if (reader->status != PACKET_READ_FLUSH)
+   die("expected flush after capabilities");
+}
+
 enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
@@ -84,7 +113,7 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
 
switch (version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   process_capabilities_v2(reader);
break;
case protocol_v1:
/* Read the peeked version line */
@@ -128,7 +157,7 @@ static void parse_one_symref_info(struct string_list 
*symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
struct string_list symref = STRING_LIST_INIT_DUP;
-   const char *feature_list = server_capabilities;
+   const char *feature_list = server_capabilities_v1;
 
while (feature_list) {
int len;
@@ -157,7 +186,7 @@ static void process_capabilities(const char *line, int *len)
int nul_location = strlen(line);
if (nul_location == *len)
return;
-   server_capabilities = xstrdup(line + nul_location + 1);
+   server_capabilities_v1 = xstrdup(line + nul_location + 1);
*len = nul_location;
 }
 
@@ -292,6 +321,105 @@ struct ref **get_remote_heads(struct packet_reader 
*reader,
return list;
 }
 
+/* Returns 1 when a valid ref has been added to `list`, 0 otherwise */
+static int process_ref_v2(const char *line, struct ref ***list)
+{
+   int ret = 1;
+   int i = 0;
+   struct object_id old_oid;
+   str

[PATCH v5 11/35] test-pkt-line: introduce a packet-line test helper

2018-03-14 Thread Brandon Williams
Introduce a packet-line test helper which either packs an input stream
into packet-lines or unpacks packet-lines from it, and writes the result
to stdout.
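
The packing direction can be sketched as follows, assuming the usual pkt-line encoding of a 4-digit hex length (header included) followed by the payload.  The function names are hypothetical, and unlike the helper in this patch the sketch writes into a caller-supplied buffer instead of stdout.

```c
#include <stdio.h>
#include <string.h>

/* Is 'line' exactly 'tok', optionally followed by one trailing newline? */
static int is_token(const char *line, const char *tok)
{
	size_t n = strlen(tok);

	return !strncmp(line, tok, n) &&
	       (!line[n] || (line[n] == '\n' && !line[n + 1]));
}

/*
 * Pack one text line.  "0000" and "0001" become the flush and delim
 * packets; anything else is prefixed with its length.  Returns the
 * number of bytes written (excluding the NUL), or -1 if it does not fit.
 */
static int pack_line(const char *line, char *out, size_t outlen)
{
	size_t len = strlen(line);

	if (outlen < 5)
		return -1;
	if (is_token(line, "0000") || is_token(line, "0001")) {
		memcpy(out, line, 4);
		out[4] = '\0';
		return 4;
	}
	if (len + 4 > 65520 || len + 5 > outlen)
		return -1;
	snprintf(out, outlen, "%04x%s", (unsigned)(len + 4), line);
	return (int)(len + 4);
}
```

Packing "hello\n" gives "000ahello\n": six payload bytes plus the four header bytes make ten, or 000a in hex.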

Signed-off-by: Brandon Williams 
---
 Makefile |  1 +
 t/helper/test-pkt-line.c | 64 
 2 files changed, 65 insertions(+)
 create mode 100644 t/helper/test-pkt-line.c

diff --git a/Makefile b/Makefile
index b7ccc05fac..3b849c0607 100644
--- a/Makefile
+++ b/Makefile
@@ -669,6 +669,7 @@ TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-online-cpus
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-pkt-line
 TEST_PROGRAMS_NEED_X += test-prio-queue
 TEST_PROGRAMS_NEED_X += test-read-cache
 TEST_PROGRAMS_NEED_X += test-write-cache
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
new file mode 100644
index 0000000000..0f19e53c75
--- /dev/null
+++ b/t/helper/test-pkt-line.c
@@ -0,0 +1,64 @@
+#include "pkt-line.h"
+
+static void pack_line(const char *line)
+{
+   if (!strcmp(line, "0000") || !strcmp(line, "0000\n"))
+   packet_flush(1);
+   else if (!strcmp(line, "0001") || !strcmp(line, "0001\n"))
+   packet_delim(1);
+   else
+   packet_write_fmt(1, "%s", line);
+}
+
+static void pack(int argc, const char **argv)
+{
+   if (argc) { /* read from argv */
+   int i;
+   for (i = 0; i < argc; i++)
+   pack_line(argv[i]);
+   } else { /* read from stdin */
+   char line[LARGE_PACKET_MAX];
+   while (fgets(line, sizeof(line), stdin)) {
+   pack_line(line);
+   }
+   }
+}
+
+static void unpack(void)
+{
+   struct packet_reader reader;
+   packet_reader_init(&reader, 0, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF |
+  PACKET_READ_CHOMP_NEWLINE);
+
+   while (packet_reader_read(&reader) != PACKET_READ_EOF) {
+   switch (reader.status) {
+   case PACKET_READ_EOF:
+   break;
+   case PACKET_READ_NORMAL:
+   printf("%s\n", reader.line);
+   break;
+   case PACKET_READ_FLUSH:
+   printf("0000\n");
+   break;
+   case PACKET_READ_DELIM:
+   printf("0001\n");
+   break;
+   }
+   }
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   if (argc < 2)
+   die("too few arguments");
+
+   if (!strcmp(argv[1], "pack"))
+   pack(argc - 2, argv + 2);
+   else if (!strcmp(argv[1], "unpack"))
+   unpack();
+   else
+   die("invalid argument '%s'", argv[1]);
+
+   return 0;
+}
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 10/35] protocol: introduce enum protocol_version value protocol_v2

2018-03-14 Thread Brandon Williams
Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c   | 2 ++
 builtin/receive-pack.c | 6 ++
 builtin/send-pack.c| 3 +++
 builtin/upload-pack.c  | 7 +++
 connect.c  | 3 +++
 protocol.c | 2 ++
 protocol.h | 1 +
 remote-curl.c  | 3 +++
 transport.c| 9 +
 9 files changed, 36 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 85d4faf76c..b2374ddbbf 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -201,6 +201,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version(&reader)) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
case protocol_v1:
case protocol_v0:
get_remote_heads(&reader, &ref, 0, NULL, &shallow);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index b7ce7c7f52..3656e94fdb 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
unpack_limit = receive_unpack_limit;
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* push support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 83cb125a68..b5427f75e3 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -263,6 +263,9 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version(&reader)) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(&reader, &remote_refs, REF_NORMAL,
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 2cb5cb35b0..8d53e9794b 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -47,6 +47,13 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
die("'%s' does not appear to be a git repository", dir);
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* fetch support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   upload_pack(&opts);
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/connect.c b/connect.c
index 0b111e62d7..4b89b984c4 100644
--- a/connect.c
+++ b/connect.c
@@ -83,6 +83,9 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
}
 
switch (version) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
/* Read the peeked version line */
packet_reader_read(reader);
diff --git a/protocol.c b/protocol.c
index 43012b7eb6..5e636785d1 100644
--- a/protocol.c
+++ b/protocol.c
@@ -8,6 +8,8 @@ static enum protocol_version parse_protocol_version(const char 
*value)
return protocol_v0;
else if (!strcmp(value, "1"))
return protocol_v1;
+   else if (!strcmp(value, "2"))
+   return protocol_v2;
else
return protocol_unknown_version;
 }
diff --git a/protocol.h b/protocol.h
index 1b2bc94a8d..2ad35e433c 100644
--- a/protocol.h
+++ b/protocol.h
@@ -5,6 +5,7 @@ enum protocol_version {
protocol_unknown_version = -1,
protocol_v0 = 0,
protocol_v1 = 1,
+   protocol_v2 = 2,
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 9f6d07683d..dae8a4a48d 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -185,6 +185,9 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version(&reader)) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
diff --git a/transport.c b/transport.c
index 2378dcb38c..83d9dd1df6 100644
--- a/transport.c
+++ b/transport.c
@@ -203,6 +203,9 @@ static st

[PATCH v5 04/35] upload-pack: convert to a builtin

2018-03-14 Thread Brandon Williams
In order to allow for code sharing with the server-side of fetch in
protocol-v2, convert upload-pack to be a builtin.

Signed-off-by: Brandon Williams 
---
 Makefile  |   3 +-
 builtin.h |   1 +
 builtin/upload-pack.c |  67 ++
 git.c |   1 +
 upload-pack.c | 107 ++
 upload-pack.h |  13 +
 6 files changed, 109 insertions(+), 83 deletions(-)
 create mode 100644 builtin/upload-pack.c
 create mode 100644 upload-pack.h

diff --git a/Makefile b/Makefile
index 1a9b23b679..b7ccc05fac 100644
--- a/Makefile
+++ b/Makefile
@@ -639,7 +639,6 @@ PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
 PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
-PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
 
 # Binary suffix, set to .exe for Windows builds
@@ -909,6 +908,7 @@ LIB_OBJS += tree-diff.o
 LIB_OBJS += tree.o
 LIB_OBJS += tree-walk.o
 LIB_OBJS += unpack-trees.o
+LIB_OBJS += upload-pack.o
 LIB_OBJS += url.o
 LIB_OBJS += urlmatch.o
 LIB_OBJS += usage.o
@@ -1026,6 +1026,7 @@ BUILTIN_OBJS += builtin/update-index.o
 BUILTIN_OBJS += builtin/update-ref.o
 BUILTIN_OBJS += builtin/update-server-info.o
 BUILTIN_OBJS += builtin/upload-archive.o
+BUILTIN_OBJS += builtin/upload-pack.o
 BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..f332a12574 100644
--- a/builtin.h
+++ b/builtin.h
@@ -231,6 +231,7 @@ extern int cmd_update_ref(int argc, const char **argv, 
const char *prefix);
 extern int cmd_update_server_info(int argc, const char **argv, const char 
*prefix);
 extern int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive_writer(int argc, const char **argv, const char 
*prefix);
+extern int cmd_upload_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_var(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
new file mode 100644
index 0000000000..2cb5cb35b0
--- /dev/null
+++ b/builtin/upload-pack.c
@@ -0,0 +1,67 @@
+#include "cache.h"
+#include "builtin.h"
+#include "exec_cmd.h"
+#include "pkt-line.h"
+#include "parse-options.h"
+#include "protocol.h"
+#include "upload-pack.h"
+
+static const char * const upload_pack_usage[] = {
+   N_("git upload-pack [<options>] <dir>"),
+   NULL
+};
+
+int cmd_upload_pack(int argc, const char **argv, const char *prefix)
+{
+   const char *dir;
+   int strict = 0;
+   struct upload_pack_options opts = { 0 };
+   struct option options[] = {
+   OPT_BOOL(0, "stateless-rpc", &opts.stateless_rpc,
+N_("quit after a single request/response exchange")),
+   OPT_BOOL(0, "advertise-refs", &opts.advertise_refs,
+N_("exit immediately after initial ref advertisement")),
+   OPT_BOOL(0, "strict", &strict,
+N_("do not try <directory>/.git/ if <directory> is no Git directory")),
+   OPT_INTEGER(0, "timeout", &opts.timeout,
+   N_("interrupt transfer after <n> seconds of inactivity")),
+   OPT_END()
+   };
+
+   packet_trace_identity("upload-pack");
+   check_replace_refs = 0;
+
+   argc = parse_options(argc, argv, NULL, options, upload_pack_usage, 0);
+
+   if (argc != 1)
+   usage_with_options(upload_pack_usage, options);
+
+   if (opts.timeout)
+   opts.daemon_mode = 1;
+
+   setup_path();
+
+   dir = argv[0];
+
+   if (!enter_repo(dir, strict))
+   die("'%s' does not appear to be a git repository", dir);
+
+   switch (determine_protocol_version_server()) {
+   case protocol_v1:
+   /*
+* v1 is just the original protocol with a version string,
+* so just fall through after writing the version string.
+*/
+   if (opts.advertise_refs || !opts.stateless_rpc)
+   packet_write_fmt(1, "version 1\n");
+
+   /* fallthrough */
+   case protocol_v0:
+   upload_pack(&opts);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
+   return 0;
+}
diff --git a/git.c b/git.c
index c870b9719c..f71073dc8d 100644
--- a/git.c
+++ b/git.c
@@ -478,6 +478,7 @@ static struct cmd_struct commands[] = {
{ "update-server-info", cmd_update_server_info, RUN_SETUP },
{ "upload-archive", cmd_upload_archive },
{ "upload-archive--writer", cmd_upload_archive_writer },
+   { "upload-pack", cmd_upload_pack },
{ "var", cmd_var, RUN_SETUP_GENTLY },
{ "veri

[PATCH v5 07/35] connect: convert get_remote_heads to use struct packet_reader

2018-03-14 Thread Brandon Williams
In order to allow for better control flow when protocol_v2 is introduced,
convert 'get_remote_heads()' to use 'struct packet_reader' to read
packet lines.  This enables a client to peek the first line of a
server's response (without consuming it) in order to determine the
protocol version it's speaking and then pass control to the
appropriate handler.

This is needed because the initial response from a server speaking
protocol_v0 includes the first ref, while subsequent protocol versions
respond with a version line.  We want to be able to read this first line
without consuming the first ref sent in the protocol_v0 case so that the
protocol version the server is speaking can be determined outside of
'get_remote_heads()' in a future patch.
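
The decision made on the peeked line can be sketched like this.  It is a simplified stand-in for determine_protocol_version_client(), with hypothetical names, and it assumes the trailing newline has already been stripped from the line.  Only the versions known at this point in the series are handled.

```c
#include <string.h>

enum proto_version { PROTO_UNKNOWN = -1, PROTO_V0 = 0, PROTO_V1 = 1 };

/*
 * Classify the first line of a server response.  A v0 server starts
 * with a ref line, so anything that is not a version line means v0 and
 * must be left unconsumed for the ref parser.
 */
static enum proto_version version_from_first_line(const char *line)
{
	static const char prefix[] = "version ";
	const char *value;

	if (strncmp(line, prefix, sizeof(prefix) - 1))
		return PROTO_V0;
	value = line + sizeof(prefix) - 1;
	if (!strcmp(value, "1"))
		return PROTO_V1;
	return PROTO_UNKNOWN;
}
```

Because a v0 response is recognized only by the absence of a version line, the caller has to peek rather than read, otherwise the first ref would be lost.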

Signed-off-by: Brandon Williams 
---
 connect.c | 173 ++
 1 file changed, 95 insertions(+), 78 deletions(-)

diff --git a/connect.c b/connect.c
index c3a014c5ba..c82c90b7c3 100644
--- a/connect.c
+++ b/connect.c
@@ -48,6 +48,12 @@ int check_ref_type(const struct ref *ref, int flags)
 
 static void die_initial_contact(int unexpected)
 {
+   /*
+* A hang-up after seeing some response from the other end
+* means that it is unexpected, as we know the other end is
+* willing to talk to us.  A hang-up before seeing any
+* response does not necessarily mean an ACL problem, though.
+*/
if (unexpected)
die(_("The remote end hung up upon initial contact"));
else
@@ -56,6 +62,40 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+static enum protocol_version discover_version(struct packet_reader *reader)
+{
+   enum protocol_version version = protocol_unknown_version;
+
+   /*
+* Peek the first line of the server's response to
+* determine the protocol version the server is speaking.
+*/
+   switch (packet_reader_peek(reader)) {
+   case PACKET_READ_EOF:
+   die_initial_contact(0);
+   case PACKET_READ_FLUSH:
+   case PACKET_READ_DELIM:
+   version = protocol_v0;
+   break;
+   case PACKET_READ_NORMAL:
+   version = determine_protocol_version_client(reader->line);
+   break;
+   }
+
+   switch (version) {
+   case protocol_v1:
+   /* Read the peeked version line */
+   packet_reader_read(reader);
+   break;
+   case protocol_v0:
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
+   return version;
+}
+
 static void parse_one_symref_info(struct string_list *symref, const char *val, int len)
 {
char *sym, *target;
@@ -109,60 +149,21 @@ static void annotate_refs_with_symref_info(struct ref 
*ref)
string_list_clear(&symref, 0);
 }
 
-/*
- * Read one line of a server's ref advertisement into packet_buffer.
- */
-static int read_remote_ref(int in, char **src_buf, size_t *src_len,
-  int *responded)
+static void process_capabilities(const char *line, int *len)
 {
-   int len = packet_read(in, src_buf, src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   const char *arg;
-   if (len < 0)
-   die_initial_contact(*responded);
-   if (len > 4 && skip_prefix(packet_buffer, "ERR ", &arg))
-   die("remote error: %s", arg);
-
-   *responded = 1;
-
-   return len;
-}
-
-#define EXPECTING_PROTOCOL_VERSION 0
-#define EXPECTING_FIRST_REF 1
-#define EXPECTING_REF 2
-#define EXPECTING_SHALLOW 3
-
-/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
-static int process_protocol_version(void)
-{
-   switch (determine_protocol_version_client(packet_buffer)) {
-   case protocol_v1:
-   return 1;
-   case protocol_v0:
-   return 0;
-   default:
-   die("server is speaking an unknown protocol");
-   }
-}
-
-static void process_capabilities(int *len)
-{
-   int nul_location = strlen(packet_buffer);
+   int nul_location = strlen(line);
if (nul_location == *len)
return;
-   server_capabilities = xstrdup(packet_buffer + nul_location + 1);
+   server_capabilities = xstrdup(line + nul_location + 1);
*len = nul_location;
 }
 
-static int process_dummy_ref(void)
+static int process_dummy_ref(const char *line)
 {
struct object_id oid;
const char *name;
 
-   if (parse_oid_hex(packet_buffer, &oid, &name))
+   if (parse_oid_hex(line, &oid, &name))
return 0;
if (*name != ' ')
return 0;
@@ -171,20 +172,20 @@ static int process_dummy_ref(void)
return !oidcmp(&null_

[PATCH v5 02/35] pkt-line: allow peeking a packet line without consuming it

2018-03-14 Thread Brandon Williams
Sometimes it is advantageous to be able to peek the next packet line
without consuming it (e.g. to be able to determine the protocol version
a server is speaking).  In order to do that introduce 'struct
packet_reader' which is an abstraction around the normal packet reading
logic.  This enables a caller to peek a single line at a time using
'packet_reader_peek()' and to consume a line by calling
'packet_reader_read()'.
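
The one-line lookahead can be illustrated without any I/O.  The sketch below uses a hypothetical `struct line_reader` that walks an in-memory array of lines; it mirrors the 'line_peeked' flag of the patch but is not the patch's code.

```c
#include <stddef.h>

struct line_reader {
	const char *const *lines;	/* input lines */
	size_t nr, pos;			/* count and next index */
	const char *line;		/* the last line read, NULL at end */
	int line_peeked;		/* set while 'line' is still unconsumed */
};

/* Consume and return the next line, or the peeked one if there is one. */
static const char *reader_read(struct line_reader *r)
{
	if (r->line_peeked) {
		r->line_peeked = 0;
		return r->line;
	}
	r->line = r->pos < r->nr ? r->lines[r->pos++] : NULL;
	return r->line;
}

/* Return the next line without consuming it; repeated peeks agree. */
static const char *reader_peek(struct line_reader *r)
{
	if (!r->line_peeked) {
		reader_read(r);
		r->line_peeked = 1;
	}
	return r->line;
}
```

Peeking is implemented as a real read plus a flag, so the following read only clears the flag and returns the buffered line.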

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 50 ++
 pkt-line.h | 58 ++
 2 files changed, 108 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index db2fb29ac3..1881dc8813 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -400,3 +400,53 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf 
*sb_out)
}
return sb_out->len - orig_len;
 }
+
+/* Packet Reader Functions */
+void packet_reader_init(struct packet_reader *reader, int fd,
+   char *src_buffer, size_t src_len,
+   int options)
+{
+   memset(reader, 0, sizeof(*reader));
+
+   reader->fd = fd;
+   reader->src_buffer = src_buffer;
+   reader->src_len = src_len;
+   reader->buffer = packet_buffer;
+   reader->buffer_size = sizeof(packet_buffer);
+   reader->options = options;
+}
+
+enum packet_read_status packet_reader_read(struct packet_reader *reader)
+{
+   if (reader->line_peeked) {
+   reader->line_peeked = 0;
+   return reader->status;
+   }
+
+   reader->status = packet_read_with_status(reader->fd,
+&reader->src_buffer,
+&reader->src_len,
+reader->buffer,
+reader->buffer_size,
+&reader->pktlen,
+reader->options);
+
+   if (reader->status == PACKET_READ_NORMAL)
+   reader->line = reader->buffer;
+   else
+   reader->line = NULL;
+
+   return reader->status;
+}
+
+enum packet_read_status packet_reader_peek(struct packet_reader *reader)
+{
+   /* Only allow peeking a single line */
+   if (reader->line_peeked)
+   return reader->status;
+
+   /* Peek a line by reading it and setting peeked flag */
+   packet_reader_read(reader);
+   reader->line_peeked = 1;
+   return reader->status;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 099b26b95f..11b04f026f 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -112,6 +112,64 @@ char *packet_read_line_buf(char **src_buf, size_t 
*src_len, int *size);
  */
 ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
 
+struct packet_reader {
+   /* source file descriptor */
+   int fd;
+
+   /* source buffer and its size */
+   char *src_buffer;
+   size_t src_len;
+
+   /* buffer that pkt-lines are read into and its size */
+   char *buffer;
+   unsigned buffer_size;
+
+   /* options to be used during reads */
+   int options;
+
+   /* status of the last read */
+   enum packet_read_status status;
+
+   /* length of data read during the last read */
+   int pktlen;
+
+   /* the last line read */
+   const char *line;
+
+   /* indicates if a line has been peeked */
+   int line_peeked;
+};
+
+/*
+ * Initialize a 'struct packet_reader' object which is an
+ * abstraction around the 'packet_read_with_status()' function.
+ */
+extern void packet_reader_init(struct packet_reader *reader, int fd,
+  char *src_buffer, size_t src_len,
+  int options);
+
+/*
+ * Perform a packet read and return the status of the read.
+ * The values of 'pktlen' and 'line' are updated based on the status of the
+ * read as follows:
+ *
+ * PACKET_READ_EOF: 'pktlen' is set to '-1' and 'line' is set to NULL
+ * PACKET_READ_NORMAL: 'pktlen' is set to the number of bytes read
+ *'line' is set to point at the read line
+ * PACKET_READ_FLUSH: 'pktlen' is set to '0' and 'line' is set to NULL
+ */
+extern enum packet_read_status packet_reader_read(struct packet_reader *reader);
+
+/*
+ * Peek the next packet line without consuming it and return the status.
+ * The next call to 'packet_reader_read()' will perform a read of the same line
+ * that was peeked, consuming the line.
+ *
+ * Peeking multiple times without calling 'packet_reader_read()' will return
+ * the same result.
+ */
+extern enum packet_read_status packet_reader_peek(struct packet_reader *reader);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 01/35] pkt-line: introduce packet_read_with_status

2018-03-14 Thread Brandon Williams
The current pkt-line API encodes the status of a pkt-line read in the
length of the read content.  An error is indicated with '-1', a flush
with '0' (which can be confusing since a return value of '0' can also
indicate an empty pkt-line), and a positive integer for the length of
the read content otherwise.  This leaves little room for adding
further special packets in the future.

To solve this, introduce 'packet_read_with_status()' which reads a packet
and returns the status of the read encoded as an 'enum packet_read_status'
type.  This makes it easy to distinguish between special and normal
packets as well as errors.  It also makes it easy to add a new special
packet in the future.
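
The ambiguity between a flush packet and an empty pkt-line is easy to show on an in-memory buffer.  The sketch below is hypothetical and much simpler than the patch: it parses from a buffer instead of a file descriptor, and it folds malformed input into the EOF status where the real code calls die().

```c
#include <stddef.h>

enum read_status { READ_EOF, READ_NORMAL, READ_FLUSH };

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Classify the pkt-line at the start of buf[0..len).  On READ_NORMAL,
 * '*payload' points at the data and '*pktlen' is its size without the
 * 4-byte header.
 */
static enum read_status read_pkt(const char *buf, size_t len,
				 const char **payload, int *pktlen)
{
	int i, n = 0;

	*payload = NULL;
	*pktlen = -1;
	if (len < 4)
		return READ_EOF;
	for (i = 0; i < 4; i++) {
		int v = hexval(buf[i]);
		if (v < 0)
			return READ_EOF;
		n = n * 16 + v;
	}
	if (n == 0) {
		*pktlen = 0;
		return READ_FLUSH;
	}
	if (n < 4 || (size_t)n > len)
		return READ_EOF;
	*payload = buf + 4;
	*pktlen = n - 4;
	return READ_NORMAL;
}
```

Both "0000" and "0004" leave the length at 0, but the status tells them apart: the first is a flush, the second a normal packet with an empty payload.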

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 51 +--
 pkt-line.h | 16 
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 2827ca772a..db2fb29ac3 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -280,28 +280,39 @@ static int packet_length(const char *linelen)
return (val < 0) ? val : (val << 8) | hex2chr(linelen + 2);
 }
 
-int packet_read(int fd, char **src_buf, size_t *src_len,
-   char *buffer, unsigned size, int options)
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
+   size_t *src_len, char *buffer,
+   unsigned size, int *pktlen,
+   int options)
 {
-   int len, ret;
+   int len;
char linelen[4];
 
-   ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
-   if (ret < 0)
-   return ret;
+   if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
+   *pktlen = -1;
+   return PACKET_READ_EOF;
+   }
+
len = packet_length(linelen);
-   if (len < 0)
+
+   if (len < 0) {
die("protocol error: bad line length character: %.4s", linelen);
-   if (!len) {
+   } else if (!len) {
packet_trace("0000", 4, 0);
-   return 0;
+   *pktlen = 0;
+   return PACKET_READ_FLUSH;
+   } else if (len < 4) {
+   die("protocol error: bad line length %d", len);
}
+
len -= 4;
-   if (len >= size)
+   if ((unsigned)len >= size)
die("protocol error: bad line length %d", len);
-   ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
-   if (ret < 0)
-   return ret;
+
+   if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) {
+   *pktlen = -1;
+   return PACKET_READ_EOF;
+   }
 
if ((options & PACKET_READ_CHOMP_NEWLINE) &&
len && buffer[len-1] == '\n')
@@ -309,7 +320,19 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 
buffer[len] = 0;
packet_trace(buffer, len, 0);
-   return len;
+   *pktlen = len;
+   return PACKET_READ_NORMAL;
+}
+
+int packet_read(int fd, char **src_buffer, size_t *src_len,
+   char *buffer, unsigned size, int options)
+{
+   int pktlen = -1;
+
+   packet_read_with_status(fd, src_buffer, src_len, buffer, size,
+   &pktlen, options);
+
+   return pktlen;
 }
 
 static char *packet_read_line_generic(int fd,
diff --git a/pkt-line.h b/pkt-line.h
index 3dad583e2d..099b26b95f 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -65,6 +65,22 @@ int write_packetized_from_buf(const char *src_in, size_t 
len, int fd_out);
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
*buffer, unsigned size, int options);
 
+/*
+ * Read a packetized line into a buffer like the 'packet_read()' function but
+ * returns an 'enum packet_read_status' which indicates the status of the read.
+ * The number of bytes read will be assigned to *pktlen if the status of the
+ * read was 'PACKET_READ_NORMAL'.
+ */
+enum packet_read_status {
+   PACKET_READ_EOF,
+   PACKET_READ_NORMAL,
+   PACKET_READ_FLUSH,
+};
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
+   size_t *src_len, char *buffer,
+   unsigned size, int *pktlen,
+   int options);
+
 /*
  * Convenience wrapper for packet_read that is not gentle, and sets the
  * CHOMP_NEWLINE option. The return value is NULL for a flush packet,
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v5 03/35] pkt-line: add delim packet support

2018-03-14 Thread Brandon Williams
One of the design goals of protocol-v2 is to improve the semantics of
flush packets.  Currently in protocol-v1, flush packets are used both to
indicate a break in a list of packet lines as well as an indication that
one side has finished speaking.  This makes it particularly difficult
to implement proxies as a proxy would need to completely understand git
protocol instead of simply looking for a flush packet.

To do this, introduce the special delimiter packet '0001'.  A delim
packet can then be used as a delimiter between lists of packet lines
while flush packets can be reserved to indicate the end of a response.

Documentation for how this packet will be used in protocol v2 will be
included in a future patch.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 16 
 pkt-line.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 1881dc8813..7296731cf3 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
write_or_die(fd, "0000", 4);
 }
 
+void packet_delim(int fd)
+{
+   packet_trace("0001", 4, 1);
+   write_or_die(fd, "0001", 4);
+}
+
 int packet_flush_gently(int fd)
 {
packet_trace("0000", 4, 1);
@@ -105,6 +111,12 @@ void packet_buf_flush(struct strbuf *buf)
strbuf_add(buf, "0000", 4);
 }
 
+void packet_buf_delim(struct strbuf *buf)
+{
+   packet_trace("0001", 4, 1);
+   strbuf_add(buf, "0001", 4);
+}
+
 static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
@@ -301,6 +313,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
packet_trace("0000", 4, 0);
*pktlen = 0;
return PACKET_READ_FLUSH;
+   } else if (len == 1) {
+   packet_trace("0001", 4, 0);
+   *pktlen = 0;
+   return PACKET_READ_DELIM;
} else if (len < 4) {
die("protocol error: bad line length %d", len);
}
diff --git a/pkt-line.h b/pkt-line.h
index 11b04f026f..9570bd7a0a 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,8 +20,10 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
+void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
+void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
@@ -75,6 +77,7 @@ enum packet_read_status {
PACKET_READ_EOF,
PACKET_READ_NORMAL,
PACKET_READ_FLUSH,
+   PACKET_READ_DELIM,
 };
 enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
size_t *src_len, char *buffer,
-- 
2.16.2.804.g6dcf76e118-goog
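
The length-header handling that the patch above extends can be sketched in
isolation. This is only an illustration, not the code from pkt-line.c: the
names pkt_kind, hex_value and classify_pkt_header are made up for the sketch,
and it works on a 4-byte header already held in memory instead of reading
from a file descriptor.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; not the enum from the patch. */
enum pkt_kind {
	PKT_BAD,	/* malformed header or reserved length */
	PKT_FLUSH,	/* "0000": end of a message */
	PKT_DELIM,	/* "0001": separator between sections of a message */
	PKT_DATA	/* length >= 4: header followed by payload */
};

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Classify a 4-byte pkt-line header. For PKT_DATA, *payload_len receives
 * the number of payload bytes that follow (the total length minus the
 * 4-byte header). For every other kind it is set to 0.
 */
static enum pkt_kind classify_pkt_header(const char hdr[4], size_t *payload_len)
{
	unsigned len = 0;
	int i;

	*payload_len = 0;
	for (i = 0; i < 4; i++) {
		int v = hex_value(hdr[i]);
		if (v < 0)
			return PKT_BAD;
		len = (len << 4) | (unsigned)v;
	}
	if (len == 0)
		return PKT_FLUSH;
	if (len == 1)
		return PKT_DELIM;
	if (len < 4)
		return PKT_BAD;	/* 2 and 3 cannot describe a real packet */
	*payload_len = len - 4;
	return PKT_DATA;
}
```

A reader loop built on something like this can treat PKT_DELIM as "next
section of the same message" and PKT_FLUSH as "message complete", which is
the distinction the commit message is after.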



[PATCH v5 00/35] protocol version 2

2018-03-14 Thread Brandon Williams
Changes in v5:

 * Tweaked the API of the packet_read_with_status function so that it
   wrote the pktlen value even in the presence of non-normal reads
   (flush, delim, EOF).

 * Changed the format of ref-patterns for the ls-refs server command.
   They are now ref-prefixes and matching is done as a simple prefix
   match against refnames.

 * Tweaked the API for server commands.  Instead of the server code
   dispatching the command and passing it an argv_array of keys and
   args, I changed it so that it passes it an argv_array of keys and
   then passes off the packet_reader so that the command can do the
   reading of the remainder of the input itself.  This is to account for
   future commands (e.g. push) where the argument stream will include a
   packfile and doesn't make much sense to push into an argv_array.

 * Various documentation changes, one of which calls out that the
   protocol is stateless by default and if state needs to be introduced
   at a later point, it must be hidden behind a capability which is only
   advertised when using transports which support state on the server
   side.
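
The "simple prefix match against refnames" mentioned in the second item
could look roughly like the sketch below. The function names are
hypothetical, and treating an empty prefix list as "match everything" is an
assumption of this sketch, not something the cover letter states.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if refname begins with prefix, 0 otherwise. */
static int has_prefix(const char *refname, const char *prefix)
{
	return strncmp(refname, prefix, strlen(prefix)) == 0;
}

/*
 * Return 1 if refname matches any of the nr prefixes.
 * Assumption for this sketch: an empty prefix list means "no filtering",
 * so every refname matches.
 */
static int ref_matches(const char *refname, const char *const *prefixes,
		       size_t nr)
{
	size_t i;

	if (!nr)
		return 1;
	for (i = 0; i < nr; i++)
		if (has_prefix(refname, prefixes[i]))
			return 1;
	return 0;
}
```

With the prefixes "refs/heads/" and "refs/tags/v2", the name
"refs/tags/v2.16.2" matches while "refs/tags/v1.0" does not; no globbing or
pattern syntax is involved.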

v4 found at:
https://public-inbox.org/git/20180228232252.102167-1-bmw...@google.com/

Brandon Williams (35):
  pkt-line: introduce packet_read_with_status
  pkt-line: allow peeking a packet line without consuming it
  pkt-line: add delim packet support
  upload-pack: convert to a builtin
  upload-pack: factor out processing lines
  transport: use get_refs_via_connect to get refs
  connect: convert get_remote_heads to use struct packet_reader
  connect: discover protocol version outside of get_remote_heads
  transport: store protocol version
  protocol: introduce enum protocol_version value protocol_v2
  test-pkt-line: introduce a packet-line test helper
  serve: introduce git-serve
  ls-refs: introduce ls-refs server command
  connect: request remote refs using v2
  transport: convert get_refs_list to take a list of ref prefixes
  transport: convert transport_get_remote_refs to take a list of ref
prefixes
  ls-remote: pass ref prefixes when requesting a remote's refs
  fetch: pass ref prefixes when fetching
  push: pass ref prefixes when pushing
  upload-pack: introduce fetch server command
  fetch-pack: perform a fetch using v2
  fetch-pack: support shallow requests
  connect: refactor git_connect to only get the protocol version once
  connect: don't request v2 when pushing
  transport-helper: remove name parameter
  transport-helper: refactor process_connect_service
  transport-helper: introduce stateless-connect
  pkt-line: add packet_buf_write_len function
  remote-curl: create copy of the service name
  remote-curl: store the protocol version the server responded with
  http: allow providing extra headers for http requests
  http: don't always add Git-Protocol header
  http: eliminate "# service" line when using protocol v2
  remote-curl: implement stateless-connect command
  remote-curl: don't request v2 when pushing

 .gitignore                              |   1 +
 Documentation/Makefile                  |   1 +
 Documentation/gitremote-helpers.txt     |  32 ++
 Documentation/technical/protocol-v2.txt | 398 +++
 Makefile                                |   7 +-
 builtin.h                               |   2 +
 builtin/clone.c                         |   2 +-
 builtin/fetch-pack.c                    |  20 +-
 builtin/fetch.c                         |  21 +-
 builtin/ls-remote.c                     |  15 +-
 builtin/receive-pack.c                  |   6 +
 builtin/remote.c                        |   2 +-
 builtin/send-pack.c                     |  20 +-
 builtin/serve.c                         |  30 ++
 builtin/upload-pack.c                   |  74 +++
 connect.c                               | 364 ++
 connect.h                               |   7 +
 fetch-pack.c                            | 339 -
 fetch-pack.h                            |   4 +-
 git.c                                   |   2 +
 http-backend.c                          |   8 +-
 http.c                                  |  25 +-
 http.h                                  |   7 +
 ls-refs.c                               |  96
 ls-refs.h                               |  10 +
 pkt-line.c                              | 133 -
 pkt-line.h                              |  78 +++
 protocol.c                              |   2 +
 protocol.h                              |   1 +
 refs.c                                  |  14 +
 refs.h                                  |   7 +
 remote-curl.c                           | 280 ++-
 remote.h                                |  11 +-
 serve.c                                 | 257 ++
 serve.h                                 |  15 +
 t/helper/test-pkt-line.c                |  64 +++
 t/t5701-git-serve.sh                    | 176 +++
 t/t5702-protocol-v2.sh                  | 273 +++
 transport-helper.c                      |  87 ++--
 

Re: [GSoC] [PATCH] test: avoid pipes in git related commands for test suite

2018-03-14 Thread Eric Sunshine
On Wed, Mar 14, 2018 at 5:57 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Mar 14 2018, Eric Sunshine jotted:
>> On Tue, Mar 13, 2018 at 4:19 PM, Pratik Karki wrote:
>>> -'git diff-tree -r -M --name-status  HEAD^ HEAD | \
>>> - grep "^R100..*path0/COPYING..*path2/COPYING" &&
>>> - git diff-tree -r -M --name-status  HEAD^ HEAD | \
>>> - grep "^R100..*path0/README..*path2/README"'
>>> +'git diff-tree -r -M --name-status  HEAD^ HEAD >actual &&
>>> + grep "^R100..*path0/COPYING..*path2/COPYING" actual &&
>>> + git diff-tree -r -M --name-status  HEAD^ HEAD >actual &&
>>> + grep "^R100..*path0/README..*path2/README" actual'
>>
>> Although this "mechanical" transformation is technically correct, it
>> is nevertheless wasteful. The exact same "git diff-tree ..." command
>> is run twice, and both times output is captured to file 'actual',
>> which makes the second invocation superfluous. Instead, a better
>> transformation would be:
>>
>> git diff-tree ... >actual &&
>> grep ... actual &&
>> grep ... actual
>>
> I think we have to be careful to not be overly picky with rejecting
> mechanical transformations that fix bugs on the basis that while we're
> at it the test could also be rewritten.
>
> I.e. this bug was there before, maybe we should purely focus on just
> replacing the harmful pipe pattern that hides errors in this series and
> leave rewriting the actual test logic for a later patch.

Thanks for presenting an opposing opinion. While I understand your
position, the reason for my suggested transformation is that if the
patch already transformed the code in the way suggested, it would
increase my confidence, as a reviewer, that the patch author had
_studied_ and _understood_ the code. Increased confidence is
especially important for mechanical transformations since -- as seen
in the unsnipped review comment below -- blindly-applied mechanical
transformations can be suboptimal or outright incorrect.

It's also the sort of review comment I would make even to very
seasoned project participants[1].

[1]: 
https://public-inbox.org/git/capig+cqlmyqerhpxvzhmy7gapnbe25h_kosws-zjubo4bru...@mail.gmail.com/

>>> -   test $(git cat-file commit refs/remotes/glob | \
>>> -  grep "^parent " | wc -l) -eq 2
>>> +   test $(git cat-file commit refs/remotes/glob >actual &&
>>> +  grep "^parent " actual | wc -l) -eq 2
>>
>> This is not a great transformation. If "git cat-file" fails, then
>> neither 'grep' nor 'wc' will run, and the result will be as if 'test'
>> was called without an argument before "-eq". For example:
>>
>> % test $(false >actual && grep "^parent " actual | wc -l) -eq 2
>> test: -eq: unary operator expected
>>
>> It would be better to run "git cat-file" outside of "test $(...)". For 
>> instance:
>>
>> git cat-file ... >actual &&
>> test $(grep ... actual | wc -l) -eq 2
>>
>> Alternately, you could take advantage of the test_line_count() helper 
>> function:
>>
>> git cat-file ... >actual &&
>> grep ... actual >actual2 &&
>> test_line_count = 2 actual2
>
> In this case though as you rightly point out the rewrite is introducing
> a regression, which should definitely be fixed.


Re: [PATCH 1/2] rebase: support --signoff with implicit rebase

2018-03-14 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> This allows one to run 'git rebase --exec "make check" --signoff'
> which is useful when preparing a patch series for publication and is
> more convenient than doing the signoff with another --exec command.
> This change also allows --root without --onto to work with --signoff
> as well (--root with --onto was already supported). Note that the
> failing test is due to a bug in 'rebase --root' when the root commit
> is empty which will be fixed in the next commit.
>
> Signed-off-by: Phillip Wood 
> ---

How important is the word "implicit" in the title?  Is it your
intention to actively ignore --signoff when we fall into the
rebase--interactive codepath explicitly?

I offhand do not think of a strong reason why it is a bad idea to
run "git rebase -i --signoff", turn a few "pick" to either "reword"
or "edit", and then expect the editor that opens on the log messages
of these commits to already have your sign-off added when you start
editing them.  The "pick"s that are left as-is would also turn into
doing an otherwise no-op "commit --amend -s", I guess.

If you are teaching --signoff to the whole of "rebase--interactive",
then "git rebase --help" needs a bit of update.

--signoff::
This flag is passed to 'git am' to sign off all the rebased
commits (see linkgit:git-am[1]). Incompatible with the
--interactive option.



Re: [PATCH v4 27/35] transport-helper: introduce stateless-connect

2018-03-14 Thread Brandon Williams
On 03/13, Jonathan Tan wrote:
> On Wed, 28 Feb 2018 15:22:44 -0800
> Brandon Williams  wrote:
> 
> > +'stateless-connect'::
> > +   Experimental; for internal use only.
> > +   Can attempt to connect to a remote server for communication
> > +   using git's wire-protocol version 2.  This establishes a
> > +   stateless, half-duplex connection.
> > ++
> > +Supported commands: 'stateless-connect'.
> > +
> >  'push'::
> > Can discover remote refs and push local commits and the
> > history leading up to them to new or existing remote refs.
> > @@ -136,6 +144,14 @@ Capabilities for Fetching
> >  +
> >  Supported commands: 'connect'.
> >  
> > +'stateless-connect'::
> > +   Experimental; for internal use only.
> > +   Can attempt to connect to a remote server for communication
> > +   using git's wire-protocol version 2.  This establishes a
> > +   stateless, half-duplex connection.
> > ++
> > +Supported commands: 'stateless-connect'.
> 
> I don't think we should use the term "half-duplex" - from a search, it
> means that both parties can use the wire but not simultaneously, which
> is not strictly true. Might be better to just say "see the documentation
> for the stateless-connect command for more information".
> 
> > +'stateless-connect' ::
> > +   Experimental; for internal use only.
> > +   Connects to the given remote service for communication using
> > +   git's wire-protocol version 2.  This establishes a stateless,
> > +   half-duplex connection.  Valid replies to this command are empty
> > +   line (connection established), 'fallback' (no smart transport
> > +   support, fall back to dumb transports) and just exiting with
> > +   error message printed (can't connect, don't bother trying to
> > +   fall back).  After line feed terminating the positive (empty)
> > +   response, the output of the service starts.  Messages (both
> > +   request and response) must be terminated with a single flush
> > +   packet, allowing the remote helper to properly act as a proxy.
> > +   After the connection ends, the remote helper exits.
> > ++
> > +Supported if the helper has the "stateless-connect" capability.
> 
> I'm not sure of the relevance of "allowing the remote helper to properly
> act as a proxy" - this scheme does make it easier to implement proxies,
> not for any party to start acting as one instead. I would write that
> part as:
> 
> Messages (both request and response) must consist of zero or more
> PKT-LINEs, terminating in a flush packet. The client must not expect
> the server to store any state in between request-response pairs.
> 
> (This covers the so-called "half-duplex" part and the "stateless" part.)

Thanks for helping wordsmith this, I'll update the docs based on these
suggestions.

-- 
Brandon Williams
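
The rule suggested above ("zero or more PKT-LINEs, terminating in a flush
packet") can be illustrated by building such a message in a buffer. This is
a sketch only: append_pkt_line and append_flush are hypothetical helpers,
payloads are assumed to be NUL-terminated text, and the size limit used is
just what four hex digits can express, which may be higher than the limit a
real implementation enforces.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one pkt-line carrying payload to out (capacity cap, current
 * length *len). Returns 0 on success, -1 if the payload is too large for
 * a 4-hex-digit length or out is too small.
 */
static int append_pkt_line(char *out, size_t cap, size_t *len,
			   const char *payload)
{
	size_t n = strlen(payload);
	size_t total = n + 4;	/* the length field counts its own 4 bytes */

	if (total > 0xffff || *len + total + 1 > cap)
		return -1;
	snprintf(out + *len, cap - *len, "%04x%s", (unsigned)total, payload);
	*len += total;
	return 0;
}

/* Append the flush packet "0000" that terminates a message. */
static int append_flush(char *out, size_t cap, size_t *len)
{
	if (*len + 4 + 1 > cap)
		return -1;
	memcpy(out + *len, "0000", 5);	/* 5 also copies the trailing NUL */
	*len += 4;
	return 0;
}
```

A request is then any number of append_pkt_line() calls followed by exactly
one append_flush(); a proxy only has to look for the "0000" marker to know
where one request or response ends.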


Re: [PATCH v3 00/36] object_id part 12

2018-03-14 Thread Junio C Hamano
"brian m. carlson"  writes:

> -+buf += the_hash_algo->rawsz;
> -+size -= the_hash_algo->rawsz;
> ++memcpy(it->oid.hash, (const unsigned char*)buf, rawsz);
> ++buf += rawsz;
> ++size -= rawsz;
>   }

Using memcpy() to stuff the hash[] field of oid structure with a
bare byte array of rawsz bytes appears twice as a pattern in these
patches.  I wonder if this is something we want to abstract behind
the API, e.g.

size_t oidstuff_(struct object_id *oid, const unsigned char *buf)
{
size_t rawsz = the_hash_algo->rawsz;
memcpy(oid->hash, buf, rawsz);
return rawsz;
}

It just felt a bit uneven to be using a bare-metal memcpy() when
oidcpy() abstraction relieves the callers from having to be aware of
the rawsz all the time.
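
A self-contained variant of the helper idea, for illustration only. The
struct below is a simplified stand-in for git's struct object_id, the name
oid_from_raw and the MAX_RAWSZ bound are made up, and the hash size is
passed in as a parameter where the suggestion above would read it from
the_hash_algo->rawsz.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RAWSZ 32	/* assumption: large enough for this sketch */

/* Simplified stand-in for git's struct object_id. */
struct object_id {
	unsigned char hash[MAX_RAWSZ];
};

/*
 * Copy rawsz bytes of raw hash from buf into oid and return how many
 * bytes were consumed, so the caller can advance its cursor. Returns 0
 * without copying if rawsz does not fit.
 */
static size_t oid_from_raw(struct object_id *oid, const unsigned char *buf,
			   size_t rawsz)
{
	if (rawsz > MAX_RAWSZ)
		return 0;
	memcpy(oid->hash, buf, rawsz);
	return rawsz;
}
```

A caller parsing a buffer would then write "n = oid_from_raw(&oid, buf,
rawsz); buf += n; size -= n;" and no longer needs a bare memcpy() next to
the pointer arithmetic.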



Re: [PATCH 3/3] Makefile: optionally symlink libexec/git-core binaries to bin/git

2018-03-14 Thread Linus Torvalds
On Wed, Mar 14, 2018 at 3:14 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Mar 14 2018, Johannes Sixt jotted:
>>
>> It is important to leave the default at hard-linking the binaries,
>> because on Windows symbolic links are second class citizens (they
>> require special privileges and there is a distinction between link
>> targets being files or directories). Hard links work well.
>
> Yeah makes sense. I just want to add this as an option, and think if
> it's proven to be un-buggy we could probably turn it on by default on
> the *nix's if people prefer that, but yeah, we'll definitely need the
> uname detection.

I definitely would prefer to make symlinks the default on unix.

It's what we used to do (long long ago), and as you pointed out, it's
a lot clearer what's going on too when you don't have to look at inode
numbers and link counts.

Forcing hardlinking everywhere by default just because Windows
filesystems suck donkey ass through a straw is not the right thing
either.

Linus


Re: How to debug a "git merge"?

2018-03-14 Thread Derrick Stolee

On 3/14/2018 12:56 PM, Lars Schneider wrote:
> Hi,
>
> I am investigating a Git merge (a86dd40fe) in which an older version of
> a file won over the newer version. I try to understand why this is the
> case. I can reproduce the merge with the following commands:
> $ git checkout -b test a02fa3303
> $ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c
>
> The merge actually generates a merge conflict but not for my
> problematic file. The common ancestor of the two parents (merge base)
> is b91161554.
>
> The merge graph is not pretty (the committers don't have a clean
> branching scheme) but I cannot spot a problem between the merge commit
> and the common ancestor:
> $ git log --graph --oneline a86dd40fe

Have you tried `git log --graph --oneline --simplify-merges -- path` to
see what changes and merges involved the file? I find that view to be
very helpful (while the default history simplification can hide things).
In particular, if there was a change that was reverted in one side and
not another, we could find out.

You could also use the "A...B" to check your two commits for merging,
and maybe add "--boundary".

> Can you give me a hint how to debug this merge further? How can I
> understand why Git picked a certain version of a file in a merge?
>
> I am using Git 2.16.2 on Linux.
>
> Thanks,
> Lars



Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag

2018-03-14 Thread Junio C Hamano
Derrick Stolee  writes:

>>  close_commit_graph();
>>
>> And after writing all data out (oh by the way, why aren't we passing
>> commit_graph instance around and instead relying on a file-scope
>> static global?)...
>
> Yeah, we should remove the global dependence. Is this a blocker for
> the series?

I do not think it is such a big deal.  It was just that I found it a
bit curious while reading it through, knowing that you are already
familiar with the work being done in that "the_repository" area.

>> I _think_ the word "close" in the name hashclose() is about closing
>> the (virtual) stream for the hashing that is overlayed on top of the
>> underlying file descriptor, and being able to choose between closing
>> and not closing the underlying file descriptor when "closing" the
>> hashing layer sort of makes sense.  So I won't complain too much
>> about hashclose() that takes optional CSUM_CLOSE flag.
>
> I agree this "close" word is incorrect. We really want
> "finalize_hashfile()" which may include closing the file.

Yeah, that is much better.  I do not think I'd mind seeing a prelim
step at the very beginning of the series to just rename the function
before the series starts to change anything else (there aren't that
many callers and I do not think there is any topic in flight that
changes these existing callsites).  Or we can leave it for clean-up
after the dust settles.  Either is fine as long as we know that we
eventually get there.

> My new solution works this way. The only caveat is that existing
> callers end up with this diff:
>
> - hashclose(f, _, CSUM_FSYNC);
> + hashclose(f, _, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);

I think I am fine with that.  It feels a bit nonsensical for a
caller to ask fsync when it is not asking fd to be closed, as I'd
imagine that the typical reason why the caller wants the fd left
open is because the caller still wants to do something to it
(e.g. write some more things into it) and a caller who would care
about fsync would want to do so _after_ finishing its own writing,
but that may be just me.

>> And then we can keep the "FSYNC means fsync and then close" the
>> current set of callers rely on.  I dunno if that is a major issue,
>> but I do think "close this, or no, keep it open" is far worse than
>> "do we want the resulting hash in the stream?"
>
> I'm not happy with this solution of needing an extra call like this
> in-between, especially since hashclose() knows how to FSYNC.

I guess we are repeating the same as above ;-)  As I said, I do not
care too deeply either way.

>> An alternative design of the above is without making
>> CSUM_HASH_IN_STREAM a new flag bit.  I highly suspect that the
>> calling codepath _knows_ whether the resulting final hash will be
>> written out at the end of the stream or not when it wraps an fd with
>> a hashfile structure, so "struct hashfile" could gain a bit to tell
>> hashclose() whether the resulting hash need to be written (or not).
>> That would be a bit larger change than what I outlined above, and I
>> do not know if it is worth doing, though.
>
> This certainly seems trickier to get right, but if we think it is the
> right solution I'll spend the time pairing struct creations with
> stream closings.

I still do not think of a compelling reason why such an alternative
approach would be worth taking, and do prefer the approach to let
the caller choose when finalize function is called via a flag bit.

Thanks.
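
To make the "let the caller choose via flag bits" design concrete, here is
a minimal sketch. The flag names follow the discussion above, but their
values, the finalize_plan struct and plan_finalize() are invented for the
illustration; the sketch only records which steps a finalize call would
take and does not touch a real file.

```c
/* Flag names follow the thread; the values are made up for this sketch. */
#define CSUM_CLOSE		1u
#define CSUM_FSYNC		2u
#define CSUM_HASH_IN_STREAM	4u

/* Records what a finalize call would do, instead of doing it. */
struct finalize_plan {
	int write_hash;	/* append the trailing checksum to the stream */
	int do_fsync;	/* flush file contents to stable storage */
	int do_close;	/* close the underlying descriptor */
};

/* Translate caller-supplied flags into the actions to perform. */
static struct finalize_plan plan_finalize(unsigned flags)
{
	struct finalize_plan p;

	p.write_hash = (flags & CSUM_HASH_IN_STREAM) != 0;
	p.do_fsync = (flags & CSUM_FSYNC) != 0;
	p.do_close = (flags & CSUM_CLOSE) != 0;
	return p;
}
```

Under this scheme the existing callers would pass all three bits, as in the
quoted diff, while a caller that wants to keep writing to the descriptor
would pass CSUM_HASH_IN_STREAM alone.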



How to debug a "git merge"?

2018-03-14 Thread Lars Schneider
Hi,

I am investigating a Git merge (a86dd40fe) in which an older version of 
a file won over the newer version. I try to understand why this is the 
case. I can reproduce the merge with the following commands:
$ git checkout -b test a02fa3303
$ GIT_MERGE_VERBOSITY=5 git merge --verbose c1b82995c

The merge actually generates a merge conflict but not for my
problematic file. The common ancestor of the two parents (merge base) 
is b91161554.

The merge graph is not pretty (the committers don't have a clean 
branching scheme) but I cannot spot a problem between the merge commit
and the common ancestor:
$ git log --graph --oneline a86dd40fe

Can you give me a hint how to debug this merge further? How can I 
understand why Git picked a certain version of a file in a merge?

I am using Git 2.16.2 on Linux.

Thanks,
Lars


Re: [PATCH v3 00/36] object_id part 12

2018-03-14 Thread Junio C Hamano
"brian m. carlson"  writes:

> This is the twelfth in a series of patches to convert various parts of
> the code to struct object_id.
>
> Changes from v2:
> * Rebase onto master (to fix "typename" → "type_name" changes).
> * Replace some uses of hashcpy with memcpy.
> * Replace some instances of "20" with references to the_hash_algo.
>
> Changes from v1:
> * Rebase onto master.
>
> tbdiff output below.
>
> brian m. carlson (36):
>   bulk-checkin: convert index_bulk_checkin to struct object_id
>   builtin/write-tree: convert to struct object_id
>   cache-tree: convert write_*_as_tree to object_id
>   cache-tree: convert remnants to struct object_id
>   resolve-undo: convert struct resolve_undo_info to object_id
>   tree: convert read_tree_recursive to struct object_id
>   ref-filter: convert grab_objectname to struct object_id
>   strbuf: convert strbuf_add_unique_abbrev to use struct object_id
>   wt-status: convert struct wt_status_state to object_id
>   Convert find_unique_abbrev* to struct object_id
>   http-walker: convert struct object_request to use struct object_id
>   send-pack: convert remaining functions to struct object_id
>   replace_object: convert struct replace_object to object_id
>   builtin/mktag: convert to struct object_id
>   archive: convert write_archive_entry_fn_t to object_id
>   archive: convert sha1_file_to_archive to struct object_id
>   builtin/index-pack: convert struct ref_delta_entry to object_id
>   sha1_file: convert read_loose_object to use struct object_id
>   sha1_file: convert check_sha1_signature to struct object_id
>   streaming: convert open_istream to use struct object_id
>   builtin/mktree: convert to struct object_id
>   sha1_file: convert assert_sha1_type to object_id
>   sha1_file: convert retry_bad_packed_offset to struct object_id
>   packfile: convert unpack_entry to struct object_id
>   Convert remaining callers of sha1_object_info_extended to object_id
>   sha1_file: convert sha1_object_info* to object_id
>   builtin/fmt-merge-msg: convert remaining code to object_id
>   builtin/notes: convert static functions to object_id
>   tree-walk: convert get_tree_entry_follow_symlinks internals to
> object_id
>   streaming: convert istream internals to struct object_id
>   tree-walk: convert tree entry functions to object_id
>   sha1_file: convert read_object_with_reference to object_id
>   sha1_file: convert read_sha1_file to struct object_id
>   Convert lookup_replace_object to struct object_id
>   sha1_file: introduce a constant for max header length
>   convert: convert to struct object_id

As always, thanks for working on this.  

After this series, what jumps at me out of output from

git grep -e '[^0-9A-Za-z_][24]0[^0-9A-Za-z_]' -- '*.[ch]' \
':!*sha1*' ':!contrib/' ':!compat/'

are code that parses the incoming patch in apply.c (where the full
blob object names used for binary patches are assumed to be in
SHA-1), builtin/pack-objects.c (where it has to know the current
file format of a packfile intimately) and diff.c (where it clips the
length to which the blob object names on the "index" lines are
abbreviated to).  Changing 40 in the last one to "the hex length of
the currently deployed hash" should be relatively uncontroversial.


Re: [PATCH/RFC v3 08/12] pack-objects: refer to delta objects by index instead of pointer

2018-03-14 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Notice that packing_data::nr_objects is uint32_t, we could only handle
> maximum 4G objects and can address all of them with an uint32_t. If we
> use a pointer here, we waste 4 bytes on 64 bit architecture.

Some things are left unsaid or left unclear and make readers stutter
a bit while reading this paragraph.  We can address them with
uint32_t only because we happen to have a linear array of all
objects involved already, i.e. the pack->objects[] array.  The
readers are forced to rephrase the above in their mind

... and each of them can be identified with an uint32_t.
Because we have all of these objects in pack->objects[], we
can replace the "delta" field in each object entry that
points at its delta base object with uint32_t index into
this array to save memory (on 64-bit arch, 8-byte pointer
gets shrunk to 4-byte uint).

or something like that before understanding why this is a valid
memory footprint optimization.
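
The optimization described above can be sketched as follows. The struct and
function names are hypothetical, and storing "index + 1" so that 0 can mean
"no delta base" is an assumption of this sketch, not necessarily what the
patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-based link, the layout the patch moves away from. */
struct entry_with_ptr {
	uint32_t size;
	struct entry_with_ptr *delta;	/* 8 bytes on typical 64-bit targets */
};

/* Index-based link: 0 means "no delta base", otherwise index + 1. */
struct entry_with_idx {
	uint32_t size;
	uint32_t delta_idx;
};

/* Resolve the delta base of e within objects[], or NULL if it has none. */
static struct entry_with_idx *delta_base(struct entry_with_idx *objects,
					 const struct entry_with_idx *e)
{
	return e->delta_idx ? &objects[e->delta_idx - 1] : NULL;
}

/* Record base as the delta base of e; both must live in objects[]. */
static void set_delta_base(struct entry_with_idx *objects,
			   struct entry_with_idx *e,
			   const struct entry_with_idx *base)
{
	e->delta_idx = base ? (uint32_t)(base - objects) + 1 : 0;
}
```

This only works because every entry lives in one linear array, which is the
point the review asks the commit message to spell out. The price of the
sentinel is that one index value is lost, so slightly fewer than 2^32
entries can be addressed.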


