Shipment delivery

2018-03-15 Thread Fedex Express Service
See attached file for details


[ANNOUNCE] Git v2.17.0-rc0

2018-03-15 Thread Junio C Hamano
An early preview release Git v2.17.0-rc0 is now available for
testing at the usual places.  It is comprised of 474 non-merge
commits since v2.16.0, contributed by 60 people, 18 of which are
new faces.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/testing/

The following public repositories all have a copy of the
'v2.17.0-rc0' tag and the 'master' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

New contributors whose contributions weren't in v2.16.0 are as follows.
Welcome to the Git development community!

  Adam Borowski, Alban Gruin, Andreas G. Schacker, Bernhard
  M. Wiedemann, Christian Ludwig, Gargi Sharma, Genki Sky,
  Gregory Herrero, Jon Simons, Juan F. Codagnone, Kim Gybels,
  Lucas Werkmeister, Mathias Rav, Motoki Seki, Stefan Moch,
  Stephen R Guglielmo, Tatyana Krasnukha, and Thomas Levesque.

Returning contributors who helped this release are as follows.
Thanks for your continued support.

  Ævar Arnfjörð Bjarmason, Alexander Shopov, Alex Bennée,
  Ben Peart, Brandon Williams, brian m. carlson, Christian
  Couder, Daniel Knittl-Frank, Derrick Stolee, Elijah Newren,
  Eric Sunshine, Eric Wong, Jason Merrill, Jeff Hostetler, Jeff
  King, Johannes Schindelin, Jonathan Nieder, Jonathan Tan, Junio
  C Hamano, Kaartic Sivaraam, Mårten Kongstad, Martin Ågren,
  Matthieu Moy, Michael Haggerty, Nathan Payre, Nguyễn Thái
  Ngọc Duy, Nicolas Morey-Chaisemartin, Olga Telezhnaya, Patryk
  Obara, Phillip Wood, Prathamesh Chavan, Ramsay Jones, Randall
  S. Becker, Rasmus Villemoes, René Scharfe, Robert P. J. Day,
  Stefan Beller, SZEDER Gábor, Thomas Gummerer, Todd Zullinger,
  Torsten Bögershausen, and Yasushi SHOJI.



Git 2.17 Release Notes (draft)
==

Updates since v2.16
---

UI, Workflows & Features

 * "diff" family of commands learned "--find-object=" option
   to limit the findings to changes that involve the named object.

 * "git format-patch" learned to give 72-cols to diffstat, which is
   consistent with other line length limits the subcommand uses for
   its output meant for e-mails.

 * The log from "git daemon" can be redirected with a new option; one
   relevant use case is to send the log to standard error (instead of
   syslog) when running it from inetd.

 * "git rebase" learned to take "--allow-empty-message" option.

 * "git am" has learned the "--quit" option, in addition to the
   existing "--abort" option; having the pair mirrors a few other
   commands like "rebase" and "cherry-pick".

 * "git worktree add" learned to run the post-checkout hook, just like
   "git clone" runs it upon the initial checkout.

 * "git tag" learned an explicit "--edit" option that allows the
   message given via "-m" and "-F" to be further edited.

 * "git fetch --prune-tags" may be used as a handy short-hand for
   getting rid of stale tags that are locally held.

 * The new "--show-current-patch" option gives an end-user facing way
   to get the diff being applied when "git rebase" (and "git am")
   stops with a conflict.

 * "git add -p" used to offer "/" (look for a matching hunk) as a
   choice, even there was only one hunk, which has been corrected.
   Also the single-key help is now given only for keys that are
   enabled (e.g. help for '/' won't be shown when there is only one
   hunk).

 * Since Git 1.7.9, "git merge" defaulted to --no-ff (i.e. even when
   the side branch being merged is a descendant of the current commit,
   create a merge commit instead of fast-forwarding) when merging a
   tag object.  This was appropriate default for integrators who pull
   signed tags from their downstream contributors, but caused an
   unnecessary merges when used by downstream contributors who
   habitually "catch up" their topic branches with tagged releases
   from the upstream.  Update "git merge" to default to --no-ff only
   when merging a tag object that does *not* sit at its usual place in
   refs/tags/ hierarchy, and allow fast-forwarding otherwise, to
   mitigate the problem.

 * "git status" can spend a lot of cycles to compute the relation
   between the current branch and its upstream, which can now be
   disabled with "--no-ahead-behind" option.

 * "git diff" and friends learned funcname patterns for Go language
   source files.

 * "git send-email" learned "--reply-to=" option.

 * Funcname pattern used for C# now recognizes "async" keyword.


Performance, Internal Implementation, Development Support etc.

 * More perf tests for threaded grep

 * "perf" test output can be sent to codespeed server.

 * The build procedure for perl/ part has been greatly simplified by
   weaning ourselves off of MakeMaker.

 * In preparation for implementing narrow/partial clone, the machinery
   for checking object connectivity used by gc and fsck has been

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Junio C Hamano
Lars Schneider  writes:

> If I hurry up (IOW: send a reroll tonight), would this topic
> have a chance for 2.17-rc1 or is it too late?

That depends on how well the reroll is made, but for an
average-sized and average-importance topic like this one that is not
yet in 'next' yet, even if it is very close, I'd have to say that it
is cutting a bit too close.




Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread Eric Sunshine
On Thu, Mar 15, 2018 at 7:35 PM, Lars Schneider
 wrote:
>> On 16 Mar 2018, at 00:25, Eric Sunshine  wrote:
>>>if (is_encoding_utf8(src) && is_encoding_utf8(dst))
>>>return 1;
>>> +   if (same_utf_encoding(src, dst))
>>> +   return 1;
>>>return !strcasecmp(src, dst);
>>> }
>>
>> This seems odd. I would have expected the newly-added generalized
>> conditional to replace the original UTF-8-specific conditional, not
>> supplement it. That is, shouldn't the entire function body be:
>>
>>if (same_utf_encoding(src, dst))
>>return 1;
>>return !strcasecmp(src, dst);
>
> No, because is_encoding_utf8() returns "true" (=1) if the encoding
> is NULL. That is not the case for UTF-16 et al. The caller of
> same_encoding() might expect that behavior.
> I could have moved the "UTF-8" == NULL assumption into
> same_utf_encoding() but that did not feel right.
> Does this make sense?

Not particularly.

Looking at 677cfed56a (commit-tree: cope with different ways "utf-8"
can be spelled., 2006-12-30), which introduced is_encoding_utf8()
along with its builtin NULL-check, I can see why that function wants
to treat NULL the same as UTF-8.

However, I'm having a tough time imagining cases in which callers
would want same_encoding() to return true if both arguments are NULL,
but outright crash if only one is NULL (which is the behavior even
before this patch). In other words, same_encoding() takes advantage of
is_encoding_utf8() for its convenience, not for its NULL-handling.
Given that view, the two explicit is_encoding_utf8() calls in
same_encoding() seem redundant once the same_utf_encoding() call is
added.


[L10N] Kickoff of translation for Git 2.17.0 round 1

2018-03-15 Thread Jiang Xin
Hi,

Git v2.17.0-rc0 has been released, and it's time to start new round of git l10n.
This time there are 130+ updated messages need to be translated since last
update:

l10n: git.pot: v2.17.0 round 1 (132 new, 44 removed)

Generate po/git.pot from v2.17.0-rc0 for git v2.17.0 l10n round 1.

Signed-off-by: Jiang Xin 

You can get it from the usual place:

https://github.com/git-l10n/git-po/

As how to update your XX.po and help to translate Git, please see
"Updating a XX.po file" and other sections in “po/README" file.

--
Jiang Xin


Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread Lars Schneider

> On 16 Mar 2018, at 00:25, Eric Sunshine  wrote:
> 
> On Thu, Mar 15, 2018 at 6:57 PM,   wrote:
>> The function same_encoding() checked only for alternative UTF-8 encoding
>> names. Teach it to check for all kinds of alternative UTF encoding
>> names.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
>> diff --git a/utf8.c b/utf8.c
>> @@ -401,11 +401,27 @@ void strbuf_utf8_replace(struct strbuf *sb_src, int 
>> pos, int width,
>> +static int same_utf_encoding(const char *src, const char *dst)
>> +{
>> +   if (istarts_with(src, "utf") && istarts_with(dst, "utf")) {
>> +   /* src[3] or dst[3] might be '\0' */
>> +   int i = (src[3] == '-' ? 4 : 3);
>> +   int j = (dst[3] == '-' ? 4 : 3);
>> +   return !strcasecmp(src+i, dst+j);
>> +   }
>> +   return 0;
>> +}
> 
> Alternate implementation without magic numbers:
> 
>if (iskip_prefix(src, "utf", ) &&
>iskip_prefix(dst, "utf", )) {
>if (*src == '-')
>src++;
>if (*dst == '-')
>dst++;
>return !strcasecmp(src, dst);
>}
>return 0;
> 
> ... assuming you add an iskip_prefix() function (patterned after 
> skip_prefix()).

If a reroll is necessary, then I can do this.


>> int is_encoding_utf8(const char *name)
>> {
>>if (!name)
>>return 1;
>> -   if (!strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8"))
>> +   if (same_utf_encoding("utf-8", name))
>>return 1;
>>return 0;
>> }
>> @@ -414,6 +430,8 @@ int same_encoding(const char *src, const char *dst)
>> {
>>if (is_encoding_utf8(src) && is_encoding_utf8(dst))
>>return 1;
>> +   if (same_utf_encoding(src, dst))
>> +   return 1;
>>return !strcasecmp(src, dst);
>> }
> 
> This seems odd. I would have expected the newly-added generalized
> conditional to replace the original UTF-8-specific conditional, not
> supplement it. That is, shouldn't the entire function body be:
> 
>if (same_utf_encoding(src, dst))
>return 1;
>return !strcasecmp(src, dst);

No, because is_encoding_utf8() returns "true" (=1) if the encoding
is NULL. That is not the case for UTF-16 et al. The caller of
same_encoding() might expect that behavior.

I could have moved the "UTF-8" == NULL assumption into 
same_utf_encoding() but that did not feel right.

Does this make sense?

- Lars

Re: [PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread Eric Sunshine
On Thu, Mar 15, 2018 at 6:57 PM,   wrote:
> The function same_encoding() checked only for alternative UTF-8 encoding
> names. Teach it to check for all kinds of alternative UTF encoding
> names.
>
> Signed-off-by: Lars Schneider 
> ---
> diff --git a/utf8.c b/utf8.c
> @@ -401,11 +401,27 @@ void strbuf_utf8_replace(struct strbuf *sb_src, int 
> pos, int width,
> +static int same_utf_encoding(const char *src, const char *dst)
> +{
> +   if (istarts_with(src, "utf") && istarts_with(dst, "utf")) {
> +   /* src[3] or dst[3] might be '\0' */
> +   int i = (src[3] == '-' ? 4 : 3);
> +   int j = (dst[3] == '-' ? 4 : 3);
> +   return !strcasecmp(src+i, dst+j);
> +   }
> +   return 0;
> +}

Alternate implementation without magic numbers:

if (iskip_prefix(src, "utf", ) &&
iskip_prefix(dst, "utf", )) {
if (*src == '-')
src++;
if (*dst == '-')
dst++;
return !strcasecmp(src, dst);
}
return 0;

... assuming you add an iskip_prefix() function (patterned after skip_prefix()).

>  int is_encoding_utf8(const char *name)
>  {
> if (!name)
> return 1;
> -   if (!strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8"))
> +   if (same_utf_encoding("utf-8", name))
> return 1;
> return 0;
>  }
> @@ -414,6 +430,8 @@ int same_encoding(const char *src, const char *dst)
>  {
> if (is_encoding_utf8(src) && is_encoding_utf8(dst))
> return 1;
> +   if (same_utf_encoding(src, dst))
> +   return 1;
> return !strcasecmp(src, dst);
>  }

This seems odd. I would have expected the newly-added generalized
conditional to replace the original UTF-8-specific conditional, not
supplement it. That is, shouldn't the entire function body be:

if (same_utf_encoding(src, dst))
return 1;
return !strcasecmp(src, dst);

?


Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)

2018-03-15 Thread Igor Djordjevic
Hi Sergey,

On 15/03/2018 08:52, Sergey Organov wrote:
> 
> > > 2. The U1' == U2' consistency check in RFC that I still think is worth
> > > to be implemented.
> >
> > At the moment, I think we`d appreciate test cases where it actually 
> > proves useful, as the general consensus seems to be leaning towards 
> > it possibly being annoying (over-paranoid).
> 
> As we now have a simple way to actually check it even in this algorithm,
> I'd suggest command-line option to either relax or enforce the check,
> whatever the default is. For the default, I'd still opt for safety, as
> without it we will gather little experience with this new matter.
> 
> Honestly, without this check available, I'd likely vote for at least an
> option for stopping on every rebased merge, on the ground that if
> rebasing a non-merge could be a trouble, rebasing a merge is at least
> double-trouble, and it's not that frequent anyway. So the check we
> discuss is actually a way to make all the process much less paranoid,
> not more.
> 
> By the way, nobody yet commented about "rerere" behavior that basically
> stops rebasing every time it fires. Do you consider it over-paranoid?

I wouldn`t really know, my workflows are usually/still rather simple, I 
don`t think I`ve ever used it on purpose, and I don`t really remember 
I`ve triggered it by accident, not having it stop for amendment, at least.

You did say you find it annoying yourself, though ;) But also accepting 
it as something that probably has a good reason, though (thus not 
considering it over-paranoid, even if annoying).

> As for test cases, I have none myself, but "-s ours" merge may be an
> example of an actual trouble.
> 
> If we don't treat it specially, then changes to side branch will be
> silently propagated over the merge, that's obviously not what is needed,
> provided user keeps his intention to leave the merge "-s ours".
> 
> If we do treat it specially, it could be the case that the merge in
> question only looks like "-s ours" by pure accident, and thus changes to
> the side branch should be propagated.
> 
> I don't see how we can safely proceed without stop for user assistance.
> Had we already achieved some consensus on this issue?

I don`t know, from what Johannes said in the past, I got an 
impression that this is to be expected ("by design"), and not worth 
bothering to stop for. And he is one of the heaviest users of (merge) 
rebasing I know.

Personally, I still feel it would make sense to stop in case like 
this, indeed, but it`s just my humble (and not necessarily much 
educated) opinion.

> > > Finally, here is a sketch of the implementation that I'd suggest to
> > > use:
> > >
> > > git-rebase-first-parent --onto A' M
> > > tree_U1'=$(git write-tree)
> > > git merge-recursive B -- $tree_U1' B'
> > > tree=$(git write-tree)
> > > M'=$(git log --pretty=%B -1 M | git commit-tree -pA' -pB')
> > > [ $conflicted_last_merge = "yes" ] ||
> > >   trees-match $tree_U1' $tree || 
> > >   stop-for-user-amendment
> >
> > Yes, in case where we would want the "no-op merge" check (equivalent 
> > to U1' == U2' with original approach), this aligns with something I 
> > would expect.
> >
> > Note that all the "rebase merge commit" steps leading to the check 
> > will/should probably be observed as a single one from user`s perspective 
> > (in worst case ending with nested conflicts we discussed), thus 
> > `$conflicted_last_merge` is not related to `merge-recursive` step(s) 
> > only, but `rebase-first-parent`, too (just in case this isn`t implied).
> >
> > Might be easier to reason about simply as `[ $conflicts = "yes" ] || `
> 
> No. For this check it's essential to ensure that no tweaking of the
> content has been performed under the hood after the user has resolved
> conflicts, i.e., after he has been involved last time.
> 
> If all this is done in one "huge merge" step from user point of view,
> then the check belongs to this merge, as this is the last (and the only)
> one. If it's done in steps (and I vote for it), only the last merge
> status is essential for the check, preceding merges don't matter.

"Huge merge" step (from user point of view) is exactly how I perceived 
Johannes` opinion on it, describing it`s already part of Git user 
experience (with possible nested conflicts), while otherwise possibly 
hard to explain where we are precisely at in the moment of stopping for 
(intermediate) conflict resolution.

Thus only `$conflicts`, meaning anything in the "huge merge", as no user 
action/tweaking/involvement can happen until the "huge merge" is done.

> As I said, putting myself on the user side, I'd prefer entirely separate
> first step of the algorithm, exactly as written, with its own conflict
> resolution, all running entirely the same way as it does with non-merge
> commits. I'm used to it and don't want to learn something new without
> necessity. I.e., I'd prefer to actually see it in two separate stages,
> like this:
> 
> Rebasing mainline of the 

Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider

> On 15 Mar 2018, at 20:18, Lars Schneider  wrote:
> 
> 
>> On 15 Mar 2018, at 02:34, Junio C Hamano  wrote:
>> 
>> ...
>> 
>> * ls/checkout-encoding (2018-03-09) 10 commits
>> - convert: add round trip check based on 'core.checkRoundtripEncoding'
>> - convert: add tracing for 'working-tree-encoding' attribute
>> - convert: advise canonical UTF encoding names
>> - convert: check for detectable errors in UTF encodings
>> - convert: add 'working-tree-encoding' attribute
>> - utf8: add function to detect a missing UTF-16/32 BOM
>> - utf8: add function to detect prohibited UTF-16/32 BOM
>> - strbuf: add a case insensitive starts_with()
>> - strbuf: add xstrdup_toupper()
>> - strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
>> 
>> The new "checkout-encoding" attribute can ask Git to convert the
>> contents to the specified encoding when checking out to the working
>> tree (and the other way around when checking in).
>> 
>> Expecting a reroll.
>> cf. <66370a41-a048-44e7-9bf8-4631c50aa...@gmail.com>
>> Modulo minor design decision corrections, the series is almost there.
> 
> If I hurry up (IOW: send a reroll tonight), would this topic
> have a chance for 2.17-rc1 or is it too late?

I just send out a reroll in case the series is still relevant
for 2.17-rc1.

https://public-inbox.org/git/20180315225746.18119-1-lars.schnei...@autodesk.com/

Thanks,
Lars

[PATCH v12 08/10] convert: check for detectable errors in UTF encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.

Signed-off-by: Lars Schneider 
---
 convert.c| 61 +++
 t/t0028-working-tree-encoding.sh | 62 
 2 files changed, 123 insertions(+)

diff --git a/convert.c b/convert.c
index 85e49741af..3cab4fa907 100644
--- a/convert.c
+++ b/convert.c
@@ -266,6 +266,64 @@ static int will_convert_lf_to_crlf(size_t len, struct 
text_stat *stats,
 
 }
 
+static int validate_encoding(const char *path, const char *enc,
+ const char *data, size_t len, int die_on_error)
+{
+   /* We only check for UTF here as UTF?? can be an alias for UTF-?? */
+   if (istarts_with(enc, "UTF")) {
+   /*
+* Check for detectable errors in UTF encodings
+*/
+   if (has_prohibited_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is prohibited in '%s' if encoded as %s");
+   /*
+* This advice is shown for UTF-??BE and UTF-??LE 
encodings.
+* We cut off the last two characters of the encoding 
name
+* to generate the encoding name suitable for BOMs.
+*/
+   const char *advise_msg = _(
+   "The file '%s' contains a byte order "
+   "mark (BOM). Please use UTF-%s as "
+   "working-tree-encoding.");
+   const char *stripped = NULL;
+   char *upper = xstrdup_toupper(enc);
+   upper[strlen(upper)-2] = '\0';
+   if (!skip_prefix(upper, "UTF-", ))
+   skip_prefix(stripped, "UTF", );
+   advise(advise_msg, path, stripped);
+   free(upper);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+
+   } else if (is_missing_required_utf_bom(enc, data, len)) {
+   const char *error_msg = _(
+   "BOM is required in '%s' if encoded as %s");
+   const char *advise_msg = _(
+   "The file '%s' is missing a byte order "
+   "mark (BOM). Please use UTF-%sBE or UTF-%sLE "
+   "(depending on the byte order) as "
+   "working-tree-encoding.");
+   const char *stripped = NULL;
+   char *upper = xstrdup_toupper(enc);
+   if (!skip_prefix(upper, "UTF-", ))
+   skip_prefix(stripped, "UTF", );
+   advise(advise_msg, path, stripped, stripped);
+   free(upper);
+   if (die_on_error)
+   die(error_msg, path, enc);
+   else {
+   return error(error_msg, path, enc);
+   }
+   }
+
+   }
+   return 0;
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -291,6 +349,9 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
if (!buf && !src)
return 1;
 
+   if (validate_encoding(path, enc, src, src_len, die_on_error))
+   return 0;
+
dst = reencode_string_len(src, src_len, default_encoding, enc,
  _len);
if (!dst) {
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index d67dbde1d4..1bb528b339 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -62,6 +62,52 @@ test_expect_success 'check $GIT_DIR/info/attributes support' 
'
 
 for i in 16 32
 do
+   test_expect_success "check prohibited UTF-${i} BOM" '
+   test_when_finished "git reset --hard HEAD" &&
+
+   echo "*.utf${i}be text working-tree-encoding=utf-${i}be" 
>>.gitattributes &&
+   echo "*.utf${i}le text working-tree-encoding=utf-${i}LE" 
>>.gitattributes &&
+
+   # Here we add a UTF-16 (resp. UTF-32) files with BOM 
(big/little-endian)
+   # but we tell Git to treat it as UTF-16BE/UTF-16LE (resp. 
UTF-32).
+   # In these cases the BOM is prohibited.
+   cp bebom.utf${i}be.raw bebom.utf${i}be &&
+   test_must_fail git add bebom.utf${i}be 2>err.out &&
+   test_i18ngrep 

[PATCH v12 09/10] convert: add tracing for 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.

Signed-off-by: Lars Schneider 
---
 convert.c| 25 +
 t/t0028-working-tree-encoding.sh |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/convert.c b/convert.c
index 3cab4fa907..ba6f2019a3 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,29 @@ static int validate_encoding(const char *path, const char 
*enc,
return 0;
 }
 
+static void trace_encoding(const char *context, const char *path,
+  const char *encoding, const char *buf, size_t len)
+{
+   static struct trace_key coe = TRACE_KEY_INIT(WORKING_TREE_ENCODING);
+   struct strbuf trace = STRBUF_INIT;
+   int i;
+
+   strbuf_addf(, "%s (%s, considered %s):\n", context, path, 
encoding);
+   for (i = 0; i < len && buf; ++i) {
+   strbuf_addf(
+   ,"| \e[2m%2i:\e[0m %2x \e[2m%c\e[0m%c",
+   i,
+   (unsigned char) buf[i],
+   (buf[i] > 32 && buf[i] < 127 ? buf[i] : ' '),
+   ((i+1) % 8 && (i+1) < len ? ' ' : '\n')
+   );
+   }
+   strbuf_addchars(, '\n', 1);
+
+   trace_strbuf(, );
+   strbuf_release();
+}
+
 static const char *default_encoding = "UTF-8";
 
 static int encode_to_git(const char *path, const char *src, size_t src_len,
@@ -352,6 +375,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
if (validate_encoding(path, enc, src, src_len, die_on_error))
return 0;
 
+   trace_encoding("source", path, enc, src, src_len);
dst = reencode_string_len(src, src_len, default_encoding, enc,
  _len);
if (!dst) {
@@ -369,6 +393,7 @@ static int encode_to_git(const char *path, const char *src, 
size_t src_len,
return 0;
}
}
+   trace_encoding("destination", path, default_encoding, dst, dst_len);
 
strbuf_attach(buf, dst, dst_len, dst_len + 1);
return 1;
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 1bb528b339..2ff7541b34 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -4,6 +4,8 @@ test_description='working-tree-encoding conversion via 
gitattributes'
 
 . ./test-lib.sh
 
+GIT_TRACE_WORKING_TREE_ENCODING=1 && export GIT_TRACE_WORKING_TREE_ENCODING
+
 test_expect_success 'setup test files' '
git config core.eol lf &&
 
-- 
2.16.2



[PATCH v12 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding'

2018-03-15 Thread lars . schneider
From: Lars Schneider 

UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].

Add 'core.checkRoundtripEncoding', which contains a comma separated
list of encodings, to define for what encodings Git should check the
conversion round trip if they are used in the 'working-tree-encoding'
attribute.

Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'.

[1] 
https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

Signed-off-by: Lars Schneider 
---
 Documentation/config.txt |  6 
 Documentation/gitattributes.txt  |  8 +
 config.c |  5 +++
 convert.c| 77 
 convert.h|  1 +
 environment.c|  1 +
 t/t0028-working-tree-encoding.sh | 39 
 7 files changed, 137 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..7dcac9b540 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -530,6 +530,12 @@ core.autocrlf::
This variable can be set to 'input',
in which case no output conversion is performed.
 
+core.checkRoundtripEncoding::
+   A comma and/or whitespace separated list of encodings that Git
+   performs UTF-8 round trip checks on if they are used in an
+   `working-tree-encoding` attribute (see linkgit:gitattributes[5]).
+   The default value is `SHIFT-JIS`.
+
 core.symlinks::
If false, symbolic links are checked out as small plain files that
contain the link text. linkgit:git-update-index[1] and
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 31a4f92840..aa3deae392 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -312,6 +312,14 @@ number of pitfalls:
   internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
   That operation will fail and cause an error.
 
+- Reencoding content to non-UTF encodings can cause errors as the
+  conversion might not be UTF-8 round trip safe. If you suspect your
+  encoding to not be round trip safe, then add it to
+  `core.checkRoundtripEncoding` to make Git check the round trip
+  encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+  set) is known to have round trip issues with UTF-8 and is checked by
+  default.
+
 - Reencoding content requires resources that might slow down certain
   Git operations (e.g 'git checkout' or 'git add').
 
diff --git a/config.c b/config.c
index 1f003fbb90..d0ada9fcd4 100644
--- a/config.c
+++ b/config.c
@@ -1172,6 +1172,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.checkroundtripencoding")) {
+   check_roundtrip_encoding = xstrdup(value);
+   return 0;
+   }
+
if (!strcmp(var, "core.notesref")) {
notes_ref_name = xstrdup(value);
return 0;
diff --git a/convert.c b/convert.c
index ba6f2019a3..2a002af66d 100644
--- a/convert.c
+++ b/convert.c
@@ -347,6 +347,42 @@ static void trace_encoding(const char *context, const char 
*path,
strbuf_release();
 }
 
+static int check_roundtrip(const char *enc_name)
+{
+   /*
+* check_roundtrip_encoding contains a string of comma and/or
+* space separated encodings (eg. "UTF-16, ASCII, CP1125").
+* Search for the given encoding in that string.
+*/
+   const char *found = strcasestr(check_roundtrip_encoding, enc_name);
+   const char *next;
+   int len;
+   if (!found)
+   return 0;
+   next = found + strlen(enc_name);
+   len = strlen(check_roundtrip_encoding);
+   return (found && (
+   /*
+* check that the found encoding is at the
+* beginning of check_roundtrip_encoding or
+* that it is prefixed with a space or comma
+*/
+   found == check_roundtrip_encoding || (
+   (isspace(found[-1]) || found[-1] == ',')
+   )
+   ) && (
+   /*
+* check that the found encoding is at the
+* end of check_roundtrip_encoding or
+* that it is suffixed with a space or comma
+*/
+   next == check_roundtrip_encoding + len || (
+   next < check_roundtrip_encoding + len &&
+   (isspace(next[0]) || next[0] == 

[PATCH v12 04/10] utf8: teach same_encoding() alternative UTF encoding names

2018-03-15 Thread lars . schneider
From: Lars Schneider 

The function same_encoding() checked only for alternative UTF-8 encoding
names. Teach it to check for all kinds of alternative UTF encoding
names.

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 utf8.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/utf8.c b/utf8.c
index 2c27ce0137..c30daf4d34 100644
--- a/utf8.c
+++ b/utf8.c
@@ -401,11 +401,27 @@ void strbuf_utf8_replace(struct strbuf *sb_src, int pos, 
int width,
strbuf_release(_dst);
 }
 
+/*
+ * Returns true (1) if the src encoding name matches the dst encoding
+ * name directly or one of its alternative names. E.g. UTF-16BE is the
+ * same as UTF16BE.
+ */
+static int same_utf_encoding(const char *src, const char *dst)
+{
+   if (istarts_with(src, "utf") && istarts_with(dst, "utf")) {
+   /* src[3] or dst[3] might be '\0' */
+   int i = (src[3] == '-' ? 4 : 3);
+   int j = (dst[3] == '-' ? 4 : 3);
+   return !strcasecmp(src+i, dst+j);
+   }
+   return 0;
+}
+
 int is_encoding_utf8(const char *name)
 {
if (!name)
return 1;
-   if (!strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8"))
+   if (same_utf_encoding("utf-8", name))
return 1;
return 0;
 }
@@ -414,6 +430,8 @@ int same_encoding(const char *src, const char *dst)
 {
if (is_encoding_utf8(src) && is_encoding_utf8(dst))
return 1;
+   if (same_utf_encoding(src, dst))
+   return 1;
return !strcasecmp(src, dst);
 }
 
-- 
2.16.2



[PATCH v12 07/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.

Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git converts the
content to a canonical UTF-8 representation. On checkout Git will
reverse the conversion.

Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt  |  80 ++
 convert.c| 113 ++-
 convert.h|   1 +
 sha1_file.c  |   2 +-
 t/t0028-working-tree-encoding.sh | 142 +++
 5 files changed, 336 insertions(+), 2 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81a..31a4f92840 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -272,6 +272,86 @@ few exceptions.  Even though...
   catch potential problems early, safety triggers.
 
 
+`working-tree-encoding`
+^^^
+
+Git recognizes files encoded in ASCII or one of its supersets (e.g.
+UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
+encodings (e.g. UTF-16) are interpreted as binary and consequently
+built-in Git text processing tools (e.g. 'git diff') as well as most Git
+web front ends do not visualize the contents of these files by default.
+
+In these cases you can tell Git the encoding of a file in the working
+directory with the `working-tree-encoding` attribute. If a file with this
+attribute is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
+content in its internal data structure (called "the index"). On checkout
+the content is reencoded back to the specified encoding.
+
+Please note that using the `working-tree-encoding` attribute may have a
+number of pitfalls:
+
+- Alternative Git implementations (e.g. JGit or libgit2) and older Git
+  versions (as of March 2018) do not support the `working-tree-encoding`
+  attribute. If you decide to use the `working-tree-encoding` attribute
+  in your repository, then it is strongly recommended to ensure that all
+  clients working with the repository support it.
+
+  For example, Microsoft Visual Studio resources files (`*.rc`) or
+  PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
+  If you declare `*.ps1` as files as UTF-16 and you add `foo.ps1` with
+  a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
+  stored as UTF-8 internally. A client without `working-tree-encoding`
+  support will checkout `foo.ps1` as UTF-8 encoded file. This will
+  typically cause trouble for the users of this file.
+
+  If a Git client, that does not support the `working-tree-encoding`
+  attribute, adds a new file `bar.ps1`, then `bar.ps1` will be
+  stored "as-is" internally (in this example probably as UTF-16).
+  A client with `working-tree-encoding` support will interpret the
+  internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
+  That operation will fail and cause an error.
+
+- Reencoding content requires resources that might slow down certain
+  Git operations (e.g 'git checkout' or 'git add').
+
+Use the `working-tree-encoding` attribute only if you cannot store a file
+in UTF-8 encoding and if you want Git to be able to process the content
+as text.
+
+As an example, use the following attributes if your '*.ps1' files are
+UTF-16 encoded with byte order mark (BOM) and you want Git to perform
+automatic line ending conversion based on your platform.
+
+
+*.ps1  text working-tree-encoding=UTF-16
+
+
+Use the following attributes if your '*.ps1' files are UTF-16 little
+endian encoded without BOM and you want Git to use Windows line endings
+in the working directory. Please note, it is highly recommended to
+explicitly define the line endings with `eol` if the `working-tree-encoding`
+attribute is used to avoid ambiguity.
+
+
+*.ps1  text working-tree-encoding=UTF-16LE eol=CRLF
+
+
+You can get a list of all available encodings on your platform with the
+following command:
+
+
+iconv --list
+
+
+If you do not know the encoding of a file, then you can use the `file`
+command to guess the encoding:
+
+
+file foo.ps1
+
+
+
 `ident`
 ^^^
 
diff --git a/convert.c b/convert.c
index b976eb968c..85e49741af 100644
--- a/convert.c
+++ b/convert.c
@@ -7,6 +7,7 @@

[PATCH v12 06/10] utf8: add function to detect a missing UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider 

If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.

The Unicode standard instructs to assume big-endian if there in no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#gen6
[2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
 Section 3.10, D98, page 132
[3] https://encoding.spec.whatwg.org/#utf-16le

Signed-off-by: Lars Schneider 
---
 utf8.c | 13 +
 utf8.h | 19 +++
 2 files changed, 32 insertions(+)

diff --git a/utf8.c b/utf8.c
index d16dc1f244..2d8821d36e 100644
--- a/utf8.c
+++ b/utf8.c
@@ -582,6 +582,19 @@ int has_prohibited_utf_bom(const char *enc, const char 
*data, size_t len)
);
 }
 
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+  (same_utf_encoding(enc, "UTF-16")) &&
+  !(has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+  (same_utf_encoding(enc, "UTF-32")) &&
+  !(has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 0db1db4519..cce654a64a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -79,4 +79,23 @@ void strbuf_utf8_align(struct strbuf *buf, align_type 
position, unsigned int wid
  */
 int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
 
+/*
+ * If the endianness is not defined in the encoding name, then we
+ * require a BOM. The function returns true if a required BOM is missing.
+ *
+ * The Unicode standard instructs to assume big-endian if there in no
+ * BOM for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard
+ * used in HTML5 recommends to assume little-endian to "deal with
+ * deployed content" [3].
+ *
+ * Therefore, strictly requiring a BOM seems to be the safest option for
+ * content in Git.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#gen6
+ * [2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
+ * Section 3.10, D98, page 132
+ * [3] https://encoding.spec.whatwg.org/#utf-16le
+ */
+int is_missing_required_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v12 02/10] strbuf: add xstrdup_toupper()

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Create a copy of an existing string and make all characters upper case.
Similar xstrdup_tolower().

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 12 
 strbuf.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index 55b7daeb35..b635f0bdc4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -784,6 +784,18 @@ char *xstrdup_tolower(const char *string)
return result;
 }
 
+char *xstrdup_toupper(const char *string)
+{
+   char *result;
+   size_t len, i;
+
+   len = strlen(string);
+   result = xmallocz(len);
+   for (i = 0; i < len; i++)
+   result[i] = toupper(string[i]);
+   return result;
+}
+
 char *xstrvfmt(const char *fmt, va_list ap)
 {
struct strbuf buf = STRBUF_INIT;
diff --git a/strbuf.h b/strbuf.h
index 14c8c10d66..df7ced53ed 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -607,6 +607,7 @@ __attribute__((format (printf,2,3)))
 extern int fprintf_ln(FILE *fp, const char *fmt, ...);
 
 char *xstrdup_tolower(const char *);
+char *xstrdup_toupper(const char *);
 
 /**
  * Create a newly allocated string using printf format. You can do this easily
-- 
2.16.2



[PATCH v12 00/10] convert: add support for different encodings

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Hi,

Patches 1-6,9 are preparation and helper functions. Patch 4 is new.
Patch 7,8,10 are the actual change.

This series depends on Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13) which is
already in master.

Changes since v11:

* die if w-t-e is configured with a true/false (=undefined!)
  value (Junio)
* improve same_encoding to detect all alternatives for
  UTF encodings (new commit, Junio)
* squash in "advise canonical UTF encoding names" and
  remove commit (Junio)
* fix erroneous # in comment (Junio)
* force segv for non-UTF encodings in validate_encoding() (Junio)

Thanks,
Lars


  RFC: 
https://public-inbox.org/git/bdb9b884-6d17-4be3-a83c-f67e2afa2...@gmail.com/
   v1: 
https://public-inbox.org/git/20171211155023.1405-1-lars.schnei...@autodesk.com/
   v2: 
https://public-inbox.org/git/2017122915.39680-1-lars.schnei...@autodesk.com/
   v3: 
https://public-inbox.org/git/20180106004808.77513-1-lars.schnei...@autodesk.com/
   v4: 
https://public-inbox.org/git/20180120152418.52859-1-lars.schnei...@autodesk.com/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tbo...@web.de/
   v6: 
https://public-inbox.org/git/20180209132830.55385-1-lars.schnei...@autodesk.com/
   v7: 
https://public-inbox.org/git/20180215152711.158-1-lars.schnei...@autodesk.com/
   v8: 
https://public-inbox.org/git/20180224162801.98860-1-lars.schnei...@autodesk.com/
   v9: 
https://public-inbox.org/git/20180304201418.60958-1-lars.schnei...@autodesk.com/
  v10: 
https://public-inbox.org/git/20180307173026.30058-1-lars.schnei...@autodesk.com/
  v11: 
https://public-inbox.org/git/20180309173536.62012-1-lars.schnei...@autodesk.com/

Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/0daedbbd76
Checkout: git fetch https://github.com/larsxschneider/git encoding-v12 && git 
checkout 0daedbbd76


### Interdiff (v11..v12):

diff --git a/convert.c b/convert.c
index c2d24882c1..2a002af66d 100644
--- a/convert.c
+++ b/convert.c
@@ -280,13 +280,13 @@ static int validate_encoding(const char *path, const char 
*enc,
/*
 * This advice is shown for UTF-??BE and UTF-??LE 
encodings.
 * We cut off the last two characters of the encoding 
name
-# to generate the encoding name suitable for BOMs.
+* to generate the encoding name suitable for BOMs.
 */
const char *advise_msg = _(
"The file '%s' contains a byte order "
"mark (BOM). Please use UTF-%s as "
"working-tree-encoding.");
-   const char *stripped = "";
+   const char *stripped = NULL;
char *upper = xstrdup_toupper(enc);
upper[strlen(upper)-2] = '\0';
if (!skip_prefix(upper, "UTF-", ))
@@ -307,7 +307,7 @@ static int validate_encoding(const char *path, const char 
*enc,
"mark (BOM). Please use UTF-%sBE or UTF-%sLE "
"(depending on the byte order) as "
"working-tree-encoding.");
-   const char *stripped = "";
+   const char *stripped = NULL;
char *upper = xstrdup_toupper(enc);
if (!skip_prefix(upper, "UTF-", ))
skip_prefix(stripped, "UTF", );
@@ -1222,12 +1222,11 @@ static const char *git_path_check_encoding(struct 
attr_check_item *check)
return NULL;

if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
-   error(_("working-tree-encoding attribute requires a value"));
-   return NULL;
+   die(_("working-tree-encoding attribute requires a value"));
}

/* Don't encode to the default encoding */
-   if (is_encoding_utf8(value) && is_encoding_utf8(default_encoding))
+   if (same_encoding(value, default_encoding))
return NULL;

return value;
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 07089bba2e..884f0878b1 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -149,25 +149,23 @@ done
 test_expect_success 'check unsupported encodings' '
test_when_finished "git reset --hard HEAD" &&

-   echo "*.set text working-tree-encoding" >>.gitattributes &&
+   echo "*.set text working-tree-encoding" >.gitattributes &&
printf "set" >t.set &&
-   git add t.set 2>err.out &&
-   test_i18ngrep "error: working-tree-encoding attribute requires a value" 
err.out &&
+   test_must_fail git add t.set 2>err.out &&
+   test_i18ngrep "working-tree-encoding attribute requires a value" 
err.out &&

-   echo 

[PATCH v12 05/10] utf8: add function to detect prohibited UTF-16/32 BOM

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#bom10

Signed-off-by: Lars Schneider 
---
 utf8.c | 26 ++
 utf8.h |  9 +
 2 files changed, 35 insertions(+)

diff --git a/utf8.c b/utf8.c
index c30daf4d34..d16dc1f244 100644
--- a/utf8.c
+++ b/utf8.c
@@ -556,6 +556,32 @@ char *reencode_string_len(const char *in, int insz,
 }
 #endif
 
+static int has_bom_prefix(const char *data, size_t len,
+ const char *bom, size_t bom_len)
+{
+   return (len >= bom_len) && !memcmp(data, bom, bom_len);
+}
+
+static const char utf16_be_bom[] = {0xFE, 0xFF};
+static const char utf16_le_bom[] = {0xFF, 0xFE};
+static const char utf32_be_bom[] = {0x00, 0x00, 0xFE, 0xFF};
+static const char utf32_le_bom[] = {0xFF, 0xFE, 0x00, 0x00};
+
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len)
+{
+   return (
+ (same_utf_encoding("UTF-16BE", enc) ||
+  same_utf_encoding("UTF-16LE", enc)) &&
+ (has_bom_prefix(data, len, utf16_be_bom, sizeof(utf16_be_bom)) ||
+  has_bom_prefix(data, len, utf16_le_bom, sizeof(utf16_le_bom)))
+   ) || (
+ (same_utf_encoding("UTF-32BE",  enc) ||
+  same_utf_encoding("UTF-32LE", enc)) &&
+ (has_bom_prefix(data, len, utf32_be_bom, sizeof(utf32_be_bom)) ||
+  has_bom_prefix(data, len, utf32_le_bom, sizeof(utf32_le_bom)))
+   );
+}
+
 /*
  * Returns first character length in bytes for multi-byte `text` according to
  * `encoding`.
diff --git a/utf8.h b/utf8.h
index 6bbcf31a83..0db1db4519 100644
--- a/utf8.h
+++ b/utf8.h
@@ -70,4 +70,13 @@ typedef enum {
 void strbuf_utf8_align(struct strbuf *buf, align_type position, unsigned int 
width,
   const char *s);
 
+/*
+ * If a data stream is declared as UTF-16BE or UTF-16LE, then a UTF-16
+ * BOM must not be used [1]. The same applies for the UTF-32 equivalents.
+ * The function returns true if this rule is violated.
+ *
+ * [1] http://unicode.org/faq/utf_bom.html#bom10
+ */
+int has_prohibited_utf_bom(const char *enc, const char *data, size_t len);
+
 #endif
-- 
2.16.2



[PATCH v12 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Since 3733e69464 (use xmallocz to avoid size arithmetic, 2016-02-22) we
allocate the buffer for the lower case string with xmallocz(). This
already ensures a NUL at the end of the allocated buffer.

Remove the unnecessary assignment.

Signed-off-by: Lars Schneider 
---
 strbuf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 1df674e919..55b7daeb35 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -781,7 +781,6 @@ char *xstrdup_tolower(const char *string)
result = xmallocz(len);
for (i = 0; i < len; i++)
result[i] = tolower(string[i]);
-   result[i] = '\0';
return result;
 }
 
-- 
2.16.2



[PATCH v12 03/10] strbuf: add a case insensitive starts_with()

2018-03-15 Thread lars . schneider
From: Lars Schneider 

Check in a case insensitive manner if one string is a prefix of another
string.

This function is used in a subsequent commit.

Signed-off-by: Lars Schneider 
---
 git-compat-util.h | 1 +
 strbuf.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 68b2ad531e..95c9b34832 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -455,6 +455,7 @@ extern void (*get_warn_routine(void))(const char *warn, 
va_list params);
 extern void set_die_is_recursing_routine(int (*routine)(void));
 
 extern int starts_with(const char *str, const char *prefix);
+extern int istarts_with(const char *str, const char *prefix);
 
 /*
  * If the string "str" begins with the string found in "prefix", return 1.
diff --git a/strbuf.c b/strbuf.c
index b635f0bdc4..99812b8488 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -11,6 +11,15 @@ int starts_with(const char *str, const char *prefix)
return 0;
 }
 
+int istarts_with(const char *str, const char *prefix)
+{
+   for (; ; str++, prefix++)
+   if (!*prefix)
+   return 1;
+   else if (tolower(*str) != tolower(*prefix))
+   return 0;
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 const char **arg, const char *def)
 {
-- 
2.16.2



Re: [PATCH v6 12/14] commit-graph: read only from specific pack-indexes

2018-03-15 Thread SZEDER Gábor
On Wed, Mar 14, 2018 at 8:27 PM, Derrick Stolee  wrote:
> From: Derrick Stolee 
>
> Teach git-commit-graph to inspect the objects only in a certain list
> of pack-indexes within the given pack directory. This allows updating
> the commit graph iteratively.

This commit message, and indeed the code itself talk about pack
indexes ...

> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/git-commit-graph.txt | 11 ++-
>  builtin/commit-graph.c | 33 ++---
>  commit-graph.c | 26 --
>  commit-graph.h |  4 +++-
>  packfile.c |  4 ++--
>  packfile.h |  2 ++
>  t/t5318-commit-graph.sh| 10 ++
>  7 files changed, 81 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-commit-graph.txt 
> b/Documentation/git-commit-graph.txt
> index 51cb038f3d..b945510f0f 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -32,7 +32,9 @@ COMMANDS
>  'write'::
>
>  Write a commit graph file based on the commits found in packfiles.
> -Includes all commits from the existing commit graph file.
> ++
> +With the `--stdin-packs` option, generate the new commit graph by
> +walking objects only in the specified packfiles.

... but this piece of documentation ...

> +   OPT_BOOL(0, "stdin-packs", _packs,
> +   N_("scan packfiles listed by stdin for commits")),

... and this help text, and even the name of the option talk about
packfiles.


Re: [PATCH v2 3/5] ref-filter: change parsing function error handling

2018-03-15 Thread Junio C Hamano
Olga Telezhnaya  writes:

> Continue removing any printing from ref-filter formatting logic,
> so that it could be more general.

Hmm.

> Change the signature of parse_ref_filter_atom() by changing return value,
> adding previous return value to function parameter and also adding
> strbuf parameter for error message.

This says what the patch changes, but it does not explain why it is
a good idea to return something with different meaning to the
callers (which of course forces us to update all callers so that
they pass  pointer and check the value returned in the
variable), or more importantly what meaning the return value has and
how the callers are expected to use it.  While at it, it probably is
a good idea to explain what the original return value means.

The return value from parse_ref_filter_atom() used to be the
position the atom is found in the used_atom[] array; that
information is now returned in an integer pointed at by the
*res parameter.  The function now returns 0 for success and
-1 for failure.

or something like that.

Having said that, I wonder if a calling convention that does not
force callers to pass in a pointer-to-int may make more sense.
Because the original return value is an index into an array, we know
the normal return values are not negative.  An updated caller could
become like this instead:

pos = parse_ref_filter_atom(format, atom, ep, );
if (pos < 0)
die("%s", err.buf);
... original code that used 'pos' can stay as before ...



Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names

2018-03-15 Thread Lars Schneider

> On 09 Mar 2018, at 20:11, Junio C Hamano  wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> From: Lars Schneider 
>> 
>> The canonical name of an UTF encoding has the format UTF, dash, number,
>> and an optionally byte order in upper case (e.g. UTF-8 or UTF-16BE).
>> Some iconv versions support alternative names without a dash or with
>> lower case characters.
>> 
>> To avoid problems between different iconv version always suggest the
>> canonical UTF names in advise messages.
>> 
>> Signed-off-by: Lars Schneider 
>> ---
> 
> I think it is probably better to squash this to earlier step,
> i.e. jumping straight to the endgame solution.

ok!


>> diff --git a/convert.c b/convert.c
>> index b80d666a6b..9a3ae7cce1 100644
>> --- a/convert.c
>> +++ b/convert.c
>> @@ -279,12 +279,20 @@ static int validate_encoding(const char *path, const 
>> char *enc,
>>  "BOM is prohibited in '%s' if encoded as %s");
>>  /*
>>   * This advice is shown for UTF-??BE and UTF-??LE 
>> encodings.
>> + * We cut off the last two characters of the encoding 
>> name
>> + # to generate the encoding name suitable for BOMs.
>>   */
> 
> I somehow thought that I saw "s/#/*/" in somebody's response during
> the previous round?

Oops. Will fix!


>>  const char *advise_msg = _(
>>  "The file '%s' contains a byte order "
>> -"mark (BOM). Please use %.6s as "
>> +"mark (BOM). Please use UTF-%s as "
>>  "working-tree-encoding.");
>> -advise(advise_msg, path, enc);
>> +const char *stripped = "";
>> +char *upper = xstrdup_toupper(enc);
>> +upper[strlen(upper)-2] = '\0';
>> +if (!skip_prefix(upper, "UTF-", ))
>> +skip_prefix(stripped, "UTF", );
>> +advise(advise_msg, path, stripped);
>> +free(upper);
> 
> If this codepath is ever entered with "enc" that does not begin with
> "UTF" (e.g. "Shift_JIS", which is impossible in the current code,
> but I'll talk about future-proofing here), then neither of these
> skip_prefix will trigger, and then you'd end up suggesting to use
> "UTF-" that is nonsense.  Perhaps initialize stripped to NULL and
> force advise to segv to catch such a programmer error?

Agreed!


Thanks,
Lars



Re: [RFC] Rebasing merges: a jorney to the ultimate solution(RoadClear)

2018-03-15 Thread Igor Djordjevic
Hi Sergey,

On 15/03/2018 07:00, Sergey Organov wrote:
> 
> > > Thinking about it I've got an idea that what we actually need is
> > > --no-flatten flag that, when used alone, will just tell "git rebase" to
> > > stop flattening history, and which will be implicitly imposed by
> > > --recreate-merges (and --preserve-merges).
> > >
> > > Then the only thing the --recreate-merges will tune is to put 'merge'
> > > directives into the todo list for merge commits, exactly according to
> > > what its name suggests, while the default behavior will be to put 'pick'
> > > with suitable syntax into the todo. And arguments to the
> > > --recreate-merge will specify additional options for the 'merge'
> > > directive, obviously.
> >
> > This seem to basically boil down to what I mentioned previously[2] 
> > through use of new `--rebase-merges` alongside `--recreate-merges`, just 
> > that you named it `--no-flatten` here, but the point is the same - and 
> > not something Johannes liked, "polluting" rebase option space further.
> 
> Not quite so. The problem with --XXX-merges flags is that they do two
> things at once: they say _what_ to do and _how_ to do it. Clean UI
> designs usually have these things separate, and that's what I propose.
> 
> The --[no-]flatten says _what_ (not) to do, and --recreate-merges says
> _how_ exactly it will be performed. In this model --no-flatten could
> have been called, say --preserve-shape, but not --rebase-merges.
> 
> To minimize pollution, the _how_ part could rather be made option value:
> 
> --no-flatten[=]
> 
> where  is 'rebase', 'remerge', etc.
> 
> In this case we will need separate option to specify strategy options,
> if required, that will lead us to something similar to the set of merge
> strategies options.
> 
> > I would agree with him, and settling onto `--rebase-merges` _instead_ of 
> > `--recreate-merges` seems as a more appropriate name, indeed, now that 
> > default behavior is actually merge commit rebasing and not recreating 
> > (recreating still being possible through user editing the todo list).
> 
> I hope he'd be pleased to be able to say --no-flatten=remerge and get
> back his current mode of operation, that he obviously has a good use
> for.

Makes sense, I like it, thanks for elaborating. [ Especially that you 
used "(no) flatten" phrasing, where original `--preserve-merges` 
documentation says it`s used "not to flatten the history", nice touch ;) ]

Not sure if I would prefer original `recreate` instead of `remerge` 
as that particular strategy name, though, but might be I`m just more 
used to the former one at the moment.

But if I think about it more, "recreate" seems straightforward enough - 
create something (again), kind of making it more obvious that it is a 
new thing (that we are now creating, again, based on some old thing).

On the other hand, "remerge" communicates "merge something again", 
which doesn`t necessarily mean creating a new thing (based on, but 
not attached to old thing), and could also be interpreted as "merge 
existing thing again" (and leaves me wondering if it would better 
suite some other strategy possible in the future).

Not sure if that explanation suffices, but it comes to "recreate a 
merge" having more clear meaning than "remerge a merge", being 
somewhat ambiguous, thus confusing (to me, at least).

I don`t know, thinking too much?

Regards, Buga


Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-15 Thread Ramsay Jones


On 15/03/18 18:41, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
>> Stolee, you definitely want to inspect those changes (`git log --check`
>> was introduced to show you whitespace problems). If all of those
>> whitespace issues are unintentional, you can fix them using `git rebase
>> --whitespace=fix` in the most efficient way.
> 
> Another way that may be easier (depending on the way Derrick works)
> is to fetch from me and start working from there, as if they were
> the last set of commits that were sent to the list.  "git log
> --first-parent --oneline master..pu" would show where the tip of the
> topic is.

BTW, thanks for adding the 'SQUASH??? sparse fixes' on top of that
branch - sparse is now quiet on the 'pu' branch. (The same can't
be said of static-check.pl, but that is a different issue. ;-) ).

Thanks!

ATB,
Ramsay Jones



CONTACT DHL OFFICE IMMEDIATELY FOR DELIVERY OF YOUR ATM MASTERCARD

2018-03-15 Thread MR Paul Ogie
Attention; Beneficiary,

This is to official inform you that we have been having meetings for the past 
three (3) weeks which ended two days ago with MR. JIM YONG KIM the Former world 
bank president and other seven continent presidents on the congress we treated 
on solution to scam victim problems.

Note: we have decided to contact you following the reports we received from 
anti-fraud international monitoring group your name/email has been submitted to 
us therefore the united nations have agreed to compensate you with the sum of 
(USD$1.5 Million) this compensation is also including international business 
that failed you in past due to government problems etc.

We have arranged your payment through our ATM MasterCard and deposited it in 
DHL Office to deliver it to you which is the latest instruction from the Former 
World Bank president MR. JIM YONG KIM, For your information’s, the delivery 
charges already paid by U.N treasury, the only money you will send to DHL 
office Benin Republic is $225 dollars for security keeping fee, U.N coordinator 
already paid for others charges fees for delivery except the security keeping 
fee, the director of DHL refused to collect the security keeping fee from U.N 
treasury because Direct of DHL office said they don’t know exactly time you 
will contact them to reconfirm your details to avoid counting demurrage.

Therefore be advice to contact DHL Office agent Benin. Rev:Tony harries who is 
in position to post your ATM MasterCard, contact DHL Office  immediately with 
the bellow email & phone number  as listed below.

Contact name: Rev:Tony  HARRIES
Email:(post.dhlbenin.serv...@outlook.com )
Tell: +229-98787436

Make sure you reconfirmed DHL Office your details ASAP as stated below to avoid 
wrong delivery.

Your full name..
Home address:.
Your country...
Your city..
Telephone..
Occupation:...
Age:……..


Let us know as soon as possible you receive your ATM MasterCard for proper 
verifications.

Regards,
MR Paul Ogie.
DEPUTY SECRETARY-GENERAL (U.N)
Benin Republic.


Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

2018-03-15 Thread Lars Schneider

> On 09 Mar 2018, at 20:10, Junio C Hamano  wrote:
> 
> lars.schnei...@autodesk.com writes:
> 
>> +static const char *default_encoding = "UTF-8";
>> +
>> ...
>> +static const char *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +const char *value = check->value;
>> +
>> +if (ATTR_UNSET(value) || !strlen(value))
>> +return NULL;
>> +
>> +if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
>> +error(_("working-tree-encoding attribute requires a value"));
>> +return NULL;
>> +}
> 
> Hmph, so we decide to be loud but otherwise ignore an undefined
> configuration?  Shouldn't we rather die instead to avoid touching
> the user data in unexpected ways?

OK.


>> +
>> +/* Don't encode to the default encoding */
>> +if (!strcasecmp(value, default_encoding))
>> +return NULL;
> 
> Is this an optimization to avoid "recode one encoding to the same
> encoding" no-op overhead?

Correct.

>  We already have the optimization in the
> same spirit in may existing codepaths that has nothing to do with
> w-t-e, and I think we should share the code.  Two pieces of thought
> comes to mind.
> 
> One is a lot smaller in scale: Is same_encoding() sufficient for
> this callsite instead of strcasecmp()?

Yes!


> The other one is a lot bigger: Looking at all the existing callers
> of same_encoding() that call reencode_string() when it returns false,
> would it make sense to drop same_encoding() and move the optimization
> to reencode_string() instead?
> 
> I suspect that the answer to the smaller one is "yes, and even if
> not, it should be easy to enhance/extend same_encoding() to make it
> do what we want it to, and such a change will benefit even existing
> callers."  The answer to the larger one is likely "the optimization
> is not about skipping only reencode_string() call but other things
> are subtly different among callers of same_encoding(), so such a
> refactoring would not be all that useful."

I agree. reencode_string() would need to signal 3 cases:
1. reencode performed
2. reencode not necessary
3. reencode failed

We could model "reencode not necessary" as "char *in == char *return".
However, I think this should be tackled in a separate series.

Thanks
Lars


CONTACT DHL OFFICE IMMEDIATELY FOR DELIVERY OF YOUR ATM MASTERCARD

2018-03-15 Thread MR Paul Ogie
Attention; Beneficiary,

This is to official inform you that we have been having meetings for the past 
three (3) weeks which ended two days ago with MR. JIM YONG KIM the Former world 
bank president and other seven continent presidents on the congress we treated 
on solution to scam victim problems.

Note: we have decided to contact you following the reports we received from 
anti-fraud international monitoring group your name/email has been submitted to 
us therefore the united nations have agreed to compensate you with the sum of 
(USD$1.5 Million) this compensation is also including international business 
that failed you in past due to government problems etc.

We have arranged your payment through our ATM MasterCard and deposited it in 
DHL Office to deliver it to you which is the latest instruction from the Former 
World Bank president MR. JIM YONG KIM, For your information’s, the delivery 
charges already paid by U.N treasury, the only money you will send to DHL 
office Benin Republic is $225 dollars for security keeping fee, U.N coordinator 
already paid for others charges fees for delivery except the security keeping 
fee, the director of DHL refused to collect the security keeping fee from U.N 
treasury because Direct of DHL office said they don’t know exactly time you 
will contact them to reconfirm your details to avoid counting demurrage.

Therefore be advice to contact DHL Office agent Benin. Rev:Tony harries who is 
in position to post your ATM MasterCard, contact DHL Office  immediately with 
the bellow email & phone number  as listed below.

Contact name: Rev:Tony  HARRIES
Email:(post.dhlbenin.serv...@outlook.com )
Tell: +229-98787436

Make sure you reconfirmed DHL Office your details ASAP as stated below to avoid 
wrong delivery.

Your full name..
Home address:.
Your country...
Your city..
Telephone..
Occupation:...
Age:……..


Let us know as soon as possible you receive your ATM MasterCard for proper 
verifications.

Regards,
MR Paul Ogie.
DEPUTY SECRETARY-GENERAL (U.N)
Benin Republic.


Re: [PATCH v2 5/5] ref-filter: get_ref_atom_value() error handling

2018-03-15 Thread Junio C Hamano
Martin Ågren  writes:

>>  static int grab_objectname(const char *name, const unsigned char *sha1,
>> -  struct atom_value *v, struct used_atom *atom)
>> +  struct atom_value *v, struct used_atom *atom,
>> +  struct strbuf *err)
>>  {
>> ...
>> +   } else {
>> +   strbuf_addstr(err, "BUG: unknown %(objectname) 
>> option");
>> +   return -1;
>> +   }
>> }
>> return 0;
>>  }
>
> This is interesting. This die() is never ever supposed to actually
> trigger, except to allow a developer adding some new O_xxx-value to
> quickly notice that they have forgotten to add code here.

Yup, BUG() is meant for a case like this; the original code predates
the BUG("message") which in turn predates the die("BUG: message")
convention, I think.

>> default:
>> -   die("Eh?  Object of type %d?", obj->type);
>> +   strbuf_addf(err, "Eh?  Object of type %d?", obj->type);
>> +   return -1;
>> }
>> +   return 0;
>>  }
>
> This seems similar. The string here is quite sloppy, and I do not
> believe that the author intended this to be user-visible. I believe this
> is more like a very short way of saying "how could we possibly get
> here??". It could also be written as die("BUG: unknown object type %d",
> obj->type), or even better: BUG(...).

Likewise.


Re: [PATCH v2 5/5] ref-filter: get_ref_atom_value() error handling

2018-03-15 Thread Eric Sunshine
On Thu, Mar 15, 2018 at 4:47 PM, Martin Ågren  wrote:
> These are "real" errors and yield several more changes in the remainder.
> Ignoring those BUG-type messages at the beginning of this patch would
> give a patch like the one below.
>
> +static int get_object(struct ref_array_item *ref, const struct object_id 
> *oid,
> +  int deref, struct object **obj, struct strbuf *err)
>  {
> void *buf = get_obj(oid, obj, , );
> -   if (!buf)
> -   die(_("missing object %s for %s"),
> -   oid_to_hex(oid), ref->refname);
> -   if (!*obj)
> -   die(_("parse_object_buffer failed on %s for %s"),
> -   oid_to_hex(oid), ref->refname);
> -
> +   if (!buf) {
> +   strbuf_addf(err, _("missing object %s for %s"), 
> oid_to_hex(oid),
> +   ref->refname);
> +   return -1;
> +   }
> +   if (!*obj) {
> +   strbuf_addf(err, _("parse_object_buffer failed on %s for %s"),
> +   oid_to_hex(oid), ref->refname);
> +   return -1;

Doesn't this leak 'buf'?

> +   }
> grab_values(ref->value, deref, *obj, buf, size);
> if (!eaten)
> free(buf);
> +   return 0;
>  }


Re: [PATCH v2 5/5] ref-filter: get_ref_atom_value() error handling

2018-03-15 Thread Martin Ågren
I skimmed the first four patches of this v2. It seems that patches 1 and
4 are identical to v2. Patches 2 and 3 have very straightforward changes
based on my earlier comments. Let's see what this patch is about. :-)

On 14 March 2018 at 20:04, Olga Telezhnaya  wrote:
> Finish removing any printing from ref-filter formatting logic,
> so that it could be more general.
>
> Change the signature of get_ref_atom_value() and underlying functions
> by adding return value and strbuf parameter for error message.
>
> It's important to mention that grab_objectname() returned 1 if
> it gets objectname atom and 0 otherwise. Now this logic changed:
> we return 0 if we have no error, -1 otherwise. If someone needs to
> know whether it's objectname atom or not, he/she could use
> starts_with() function. It duplicates this checking but it does not
> sound like a really big overhead.
>
> Signed-off-by: Olga Telezhnaia 
> ---
>  ref-filter.c | 109 
> +--
>  1 file changed, 69 insertions(+), 40 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 62ea4adcd0ff1..3f0c3924273d5 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -831,26 +831,27 @@ static void *get_obj(const struct object_id *oid, 
> struct object **obj, unsigned
>  }
>
>  static int grab_objectname(const char *name, const unsigned char *sha1,
> -  struct atom_value *v, struct used_atom *atom)
> +  struct atom_value *v, struct used_atom *atom,
> +  struct strbuf *err)
>  {
> if (starts_with(name, "objectname")) {
> if (atom->u.objectname.option == O_SHORT) {
> v->s = xstrdup(find_unique_abbrev(sha1, 
> DEFAULT_ABBREV));
> -   return 1;
> } else if (atom->u.objectname.option == O_FULL) {
> v->s = xstrdup(sha1_to_hex(sha1));
> -   return 1;
> } else if (atom->u.objectname.option == O_LENGTH) {
> v->s = xstrdup(find_unique_abbrev(sha1, 
> atom->u.objectname.length));
> -   return 1;
> -   } else
> -   die("BUG: unknown %%(objectname) option");
> +   } else {
> +   strbuf_addstr(err, "BUG: unknown %(objectname) 
> option");
> +   return -1;
> +   }
> }
> return 0;
>  }

This is interesting. This die() is never ever supposed to actually
trigger, except to allow a developer adding some new O_xxx-value to
quickly notice that they have forgotten to add code here.

>  /* See grab_values */
> -static void grab_common_values(struct atom_value *val, int deref, struct 
> object *obj, void *buf, unsigned long sz)
> +static int grab_common_values(struct atom_value *val, int deref, struct 
> object *obj,
> + void *buf, unsigned long sz, struct strbuf *err)
>  {
> int i;
>
> @@ -868,8 +869,10 @@ static void grab_common_values(struct atom_value *val, 
> int deref, struct object
> v->s = xstrfmt("%lu", sz);
> }
> else if (deref)
> -   grab_objectname(name, obj->oid.hash, v, 
> _atom[i]);
> +   if (grab_objectname(name, obj->oid.hash, v, 
> _atom[i], err))
> +   return -1;
> }
> +   return 0;
>  }

So if that conversion I commented on above had not happened, this would
not have been necessary. Let's read on...

>  /* See grab_values */
> @@ -1225,9 +1228,11 @@ static void fill_missing_values(struct atom_value *val)
>   * pointed at by the ref itself; otherwise it is the object the
>   * ref (which is a tag) refers to.
>   */
> -static void grab_values(struct atom_value *val, int deref, struct object 
> *obj, void *buf, unsigned long sz)
> +static int grab_values(struct atom_value *val, int deref, struct object *obj,
> +  void *buf, unsigned long sz, struct strbuf *err)
>  {
> -   grab_common_values(val, deref, obj, buf, sz);
> +   if (grab_common_values(val, deref, obj, buf, sz, err))
> +   return -1;
> switch (obj->type) {
> case OBJ_TAG:
> grab_tag_values(val, deref, obj, buf, sz);
> @@ -1247,8 +1252,10 @@ static void grab_values(struct atom_value *val, int 
> deref, struct object *obj, v
> /* grab_blob_values(val, deref, obj, buf, sz); */
> break;
> default:
> -   die("Eh?  Object of type %d?", obj->type);
> +   strbuf_addf(err, "Eh?  Object of type %d?", obj->type);
> +   return -1;
> }
> +   return 0;
>  }

This seems similar. The string here is quite sloppy, and I do not
believe that the author intended this to be user-visible. I believe this
is more like a very 

Re: [PATCH v4 3/3] t/t3200: fix a typo in a test description

2018-03-15 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Signed-off-by: Kaartic Sivaraam 
> ---
>  t/t3200-branch.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks.  It is very nice to see unrelated issues nearby found while
working on something else.  Will queue.

>
> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 503a88d02..6c0b7ea4a 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -528,7 +528,7 @@ test_expect_success 'git branch -c -f o/q o/p should work 
> when o/p exists' '
>   git branch -c -f o/q o/p
>  '
>  
> -test_expect_success 'git branch -c qq rr/qq should fail when r exists' '
> +test_expect_success 'git branch -c qq rr/qq should fail when rr exists' '
>   git branch qq &&
>   git branch rr &&
>   test_must_fail git branch -c qq rr/qq


Re: [bug] git stash push {dir-pathspec} wipes untracked files

2018-03-15 Thread Igor Djordjevic
On 15/03/2018 17:52, Junio C Hamano wrote:
> 
> > Hi, I ran into what I believe is a bug today.  I’m using primarily Git
> > for Windows 2.16.2 and also reproduced the behavior on Git for Windows
> > 2.15.1 and Git 2.14.1 on Ubuntu:
> >
> > Given any repository with at least one subdirectory:
> >
> > 1.   Create some untracked files in the subdir
> > 2.   Modify a tracked file in the subdir
> > 3.   Execute `git stash push subdir`
> > 4.   The untracked files will be removed, without warning.
> >
> > `git stash push` behaves as-expcted and does not touch untracked
> > files.  It’s only when a directory tree is specified as [pathspec]
> > that the problem occurs.
> 
> I wonder if this is the same as the topic on this thread.
> 
>   
> https://public-inbox.org/git/ca+hnv10i7avwxjrqjxxy1lnjtmhr7le4twxhhuybiwtmjco...@mail.gmail.com/
> 
> What is curious is that the fix bba067d2 ("stash: don't delete
> untracked files that match pathspec", 2018-01-06) appeared first in
> 2.16.2, on which Windows 2.16.2 is supposed to be built upon.
> 
> > Here's the precise reproduction case executed on a linux box:
> 
> This does not reproduce for me with v2.16.2-17-g38e79b1fda (the tip
> of 'maint'); I do not have an  install of vanilla v2.16.2 handy, but
> I suspect v2.16.2 would work just fine, too.
> 
> > jake@jake-VirtualBox:~/woot$ git --version
> > git version 2.14.1
> > ...
> > The expected result is that when I do `ls subdir` the file
> > "untracked.txt" still exists.  Alternatively, git stash should warn me
> > before destroying my untracked files, and require I specify --force or
> > similar to invoke destructive behavior.

I can't seem to reproduce this on 2.16.2.windows.1, either:

+ git --version
git version 2.16.2.windows.1
+ git init woot
Initialized empty Git repository in /woot/.git/
+ cd woot
+ mkdir subdir
+ echo test
+ echo test
+ git add meh.txt subdir/meh2.txt
+ git commit '--message=stash bug testing'
[master (root-commit) afec47d] stash bug testing
 2 files changed, 2 insertions(+)
 create mode 100644 meh.txt
 create mode 100644 subdir/meh2.txt
+ git commit '--message=stash bug testing'
On branch master
nothing to commit, working tree clean
+ echo test
+ echo append
+ git stash push subdir
Saved working directory and index state WIP on master: afec47d stash bug testing
+ ls subdir
meh2.txt  untracked.txt

Regards, Buga


Re: [PATCH v4 2/3] builtin/branch: give more useful error messages when renaming

2018-03-15 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> +static void get_error_msg(struct strbuf* error_msg,
> +   const char* oldname, enum 
> old_branch_validation_result old_branch_name_res,
> +   const char* newname, enum branch_validation_result 
> new_branch_name_res)
> +{
> + const char* connector_string = "; ";
> + unsigned append_connector = 0;
> +
> + switch (old_branch_name_res) {
> + case VALIDATION_1_FATAL_INVALID_OLD_BRANCH_NAME:
> + strbuf_addf(error_msg,
> + _("old branch name '%s' is invalid"), oldname);
> + append_connector = 1;
> + break;
> + case VALIDATION_1_FATAL_OLD_BRANCH_DOESNT_EXIST:
> + strbuf_addf(error_msg,
> + _("branch '%s' doesn't exist"), oldname);
> + append_connector = 1;
> + break;
> +
> + /* not necessary to handle nonfatal cases */
> + case VALIDATION_1_PASS_OLD_BRANCH_EXISTS:
> + case VALIDATION_1_WARN_BAD_OLD_BRANCH_NAME:
> + break;
> + }
> +
> + switch (new_branch_name_res) {
> + case VALIDATION_FATAL_BRANCH_EXISTS_NO_FORCE:
> + strbuf_addf(error_msg, "%s",
> + (append_connector) ? connector_string : "");
> + strbuf_addf(error_msg,
> + _("branch '%s' already exists"), newname);
> + break;
> + case VALIDATION_FATAL_CANNOT_FORCE_UPDATE_CURRENT_BRANCH:
> + strbuf_addf(error_msg, "%s",
> + (append_connector) ? connector_string : "");
> + strbuf_addstr(error_msg,
> + _("cannot force update the current branch"));
> + break;
> + case VALIDATION_FATAL_INVALID_BRANCH_NAME:
> + strbuf_addf(error_msg, "%s",
> + (append_connector) ? connector_string : "");
> + strbuf_addf(error_msg,
> + _("new branch name '%s' is invalid"), newname);
> + break;
> +
> + /* not necessary to handle nonfatal cases */
> + case VALIDATION_PASS_BRANCH_DOESNT_EXIST:
> + case VALIDATION_PASS_BRANCH_EXISTS:
> + case VALIDATION_WARN_BRANCH_EXISTS:
> + break;
> + }
> +}

Quite honestly, I am not sure if this amount of new code that
results in sentence lego is really worth it.  Is it so wrong for
"branch -m tset master" to complain that master already exists so no
branch can be renamed to it?


Re: [PATCH v4 1/3] branch: introduce dont_fail parameter for branchname validation

2018-03-15 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> This parameter allows the branchname validation functions to
> optionally return a flag specifying the reason for failure, when
> requested. This allows the caller to know why it was about to die.
> This allows more useful error messages to be given to the user when
> trying to rename a branch.
>
> The flags are specified in the form of an enum and values for success
> flags have been assigned explicitly to clearly express that certain
> callers rely on those values and they cannot be arbitrary.
>
> Only the logic has been added but no caller has been made to use
> it, yet. So, no functional changes.
>
> Signed-off-by: Kaartic Sivaraam 
> ---

So... I am not finding dont_fail that was mentioned on the title
anywhere else in the patch.  Such lack of attention to detail is
a bit off-putting.

The change itself overall looks OK.  One minor thing that made me
wonder was this bit:

> +enum branch_validation_result {
> + /* Flags that convey there are fatal errors */
> + VALIDATION_FATAL_BRANCH_EXISTS_NO_FORCE = -3,
> + VALIDATION_FATAL_CANNOT_FORCE_UPDATE_CURRENT_BRANCH,
> + VALIDATION_FATAL_INVALID_BRANCH_NAME,
> + /* Flags that convey there are no fatal errors */
> + VALIDATION_PASS_BRANCH_DOESNT_EXIST = 0,
> + VALIDATION_PASS_BRANCH_EXISTS = 1,
> + VALIDATION_WARN_BRANCH_EXISTS = 2
> +};

where adding new error types will force us to touch _two_ lines
(i.e. either you add a new error before NO_FORCE with value -4 and
then remove the "= -3" from NO_FORCE, or you add a new error after
INVALID, and update NO_FORCE to -4), which can easily be screwed up
by a careless developer.  The current code is not wrong per-se, but
I wonder if it can be made less error prone.



Re: [PATCH v2 2/2] stash push -u: don't create empty stash

2018-03-15 Thread Junio C Hamano
Thomas Gummerer  writes:

>  no_changes () {
>   git diff-index --quiet --cached HEAD --ignore-submodules -- "$@" &&
>   git diff-files --quiet --ignore-submodules -- "$@" &&
> - (test -z "$untracked" || test -z "$(untracked_files)")
> + (test -z "$untracked" || test -z "$(untracked_files $@)")

Don't you need a pair of double-quotes around that $@?


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-15 Thread Ævar Arnfjörð Bjarmason

On Thu, Mar 15 2018, Duy Nguyen jotted:

> On Mon, Mar 12, 2018 at 8:30 PM, Ævar Arnfjörð Bjarmason
>  wrote:
>> We already have pack.packSizeLimit, perhaps we could call this
>> e.g. gc.keepPacksSize=2GB?
>
> I'm OK either way. The "base pack" concept comes from the
> "--keep-base-pack" option where we do keep _one_ base pack. But gc
> config var has a slightly different semantics when it can keep
> multiple packs.

I see, yeah it would be great to generalize it to N packs.

>> Finally I wonder if there should be something equivalent to
>> gc.autoPackLimit for this. I.e. with my proposed semantics above it's
>> possible that we end up growing forever, i.e. I could have 1000 2GB
>> packs and then 50 very small packs per gc.autoPackLimit.
>>
>> Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
>> gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick the
>> two smallest one and not issue a --keep-pack for those, although then
>> maybe our memory use would spike past the limit.
>>
>> I don't know, maybe we can leave that for later, but I'm quite keen to
>> turn the top-level config variable into something that just considers
>> size instead of "base" if possible, and it seems we're >95% of the way
>> to that already with this patch.
>
> At least I will try to ignore gc.keepPacksSize if all packs are kept
> because of it. That repack run will hurt. But then we're back to one
> giant pack and plenty of small packs that will take some time to grow
> up to 2GB again.

I think that semantic really should have its own option. The usefulness
of this is significantly diminished if it's not a guarantee on the
resource use of git-gc.

Consider a very large repo where we clone and get a 4GB pack. Then as
time goes on we end up with lots of loose objects and small packs from
pulling, and eventually end up with say 4GB + 2x 500MB packs (if our
limit is 500MB).

If I understand what you're saying correctly if we ever match the gc
--auto requirements because we have *just* the big packs and then a
bunch of loose objects (say we rebased a lot) then we'll try to create a
giant 5GB pack (+ loose objects).

>> Finally, I don't like the way the current implementation conflates a
>> "size" variable with auto detecting the size from memory, leaving no way
>> to fallback to the auto-detection if you set it manually.
>>
>> I think we should split out the auto-memory behavior into another
>> variable, and also make the currently hardcoded 50% of memory
>> configurable.
>>
>> That way you could e.g. say you'd always like to keep 2GB packs, but if
>> you happen to have ended up with a 1GB pack and it's time to repack, and
>> you only have 500MB free memory on that system, it would keep the 1GB
>> one until such time as we have more memory.
>
> I don't calculate based on free memory (it's tricky to get that right
> on linux) but physical memory. If you don't have enough memory
> according to this formula, you won't until you add more memory sticks.

Ah, thanks for the clarification.

>>
>> Actually maybe that should be a "if we're that low on memory, forget
>> about GC for now" config, but urgh, there's a lot of potential
>> complexity to be handled here...
>
> Yeah I think what you want is a hook. You can make custom rules then.
> We already have pre-auto-gc hook and could pretty much do what you
> want without pack-objects memory estimation. But if you want it, maybe
> we can export the info to the hook somehow.

I can do away with that particular thing, but I'd really like to do
without the hook. I can automate it on some machines, but then we also
have un-managed laptops run by users who clone big repos. It's much
easier to tell them to set a few git config variables than have them
install & keep some hook up-to-date.


Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Lars Schneider

> On 15 Mar 2018, at 02:34, Junio C Hamano  wrote:
> 
> ...
> 
> * ls/checkout-encoding (2018-03-09) 10 commits
> - convert: add round trip check based on 'core.checkRoundtripEncoding'
> - convert: add tracing for 'working-tree-encoding' attribute
> - convert: advise canonical UTF encoding names
> - convert: check for detectable errors in UTF encodings
> - convert: add 'working-tree-encoding' attribute
> - utf8: add function to detect a missing UTF-16/32 BOM
> - utf8: add function to detect prohibited UTF-16/32 BOM
> - strbuf: add a case insensitive starts_with()
> - strbuf: add xstrdup_toupper()
> - strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
> 
> The new "checkout-encoding" attribute can ask Git to convert the
> contents to the specified encoding when checking out to the working
> tree (and the other way around when checking in).
> 
> Expecting a reroll.
> cf. <66370a41-a048-44e7-9bf8-4631c50aa...@gmail.com>
> Modulo minor design decision corrections, the series is almost there.

If I hurry up (IOW: send a reroll tonight), would this topic
have a chance for 2.17-rc1 or is it too late?

Thanks,
Lars




Attention

2018-03-15 Thread Webmail Service
Dear eMail User,

Your email account is due for upgrade. Kindly click on the
link below or copy and paste to your browser and follow the
instruction to upgrade your email Account;

https://www.zipsurvey.com/LaunchSurvey.aspx?suid=85067=BF43451A

Our webmail Technical Team will update your account. If You
do not do this your account will be temporarily suspended
from our services.

Warning!! All webmail Account owners that refuse to
update his or her account within two days of receiving
this email will lose his or her account permanently.

Thank you for your cooperation!

Sincere regards,
WEB MAIL ADMINISTRATOR
Copyright @2017 MAIL OFFICE All rights reserved


Re: [PATCH v1] Fix bugs preventing adding updated cache entries to the name hash

2018-03-15 Thread Junio C Hamano
Ben Peart  writes:

> This 2nd part of the patch was more for code cleanliness.

Thanks.


Re: [PATCH v1] Fix bugs preventing adding updated cache entries to the name hash

2018-03-15 Thread Ben Peart



On 3/15/2018 1:58 PM, Junio C Hamano wrote:

Ben Peart  writes:


Update replace_index_entry() to clear the CE_HASHED flag from the new cache
entry so that it can add it to the name hash in set_index_entry()


OK.


diff --git a/read-cache.c b/read-cache.c
index 977921d90c..bdfa552861 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -62,6 +62,7 @@ static void replace_index_entry(struct index_state *istate, 
int nr, struct cache
replace_index_entry_in_base(istate, old, ce);
remove_name_hash(istate, old);
free(old);
+   ce->ce_flags &= ~CE_HASHED;
set_index_entry(istate, nr, ce);
ce->ce_flags |= CE_UPDATE_IN_BASE;
mark_fsmonitor_invalid(istate, ce);


As we are removing "old" that is not "ce", an earlier call to
remove_name_hash() that clears the CE_HASHED bit from the cache
entry does not help us at all.  We need to clear the bit from "ce"
ourselves before calling set_index_entry() on it, otherwise the call
would become a no-op wrt the name hash.  Makes sense.



Correct.  As you note below, this one line is sufficient to fix the 
actual bug.



Makes me wonder why "ce" which is a replacement for what is in the
index already has the hashed bit, though.  Is that the failure to
use copy_cache_entry() in the caller the other part of this patch
fixes?  To me it looks like copy_cache_entry() is designed for
copying an entry's data to another one that has a different name,
but in the refresh codepath, we _know_ we are replacing an old entry
with an entry with the same name, so it somehow feels a bit strange
to use copy_cache_entry(), instead of doing memcpy() (and possibly
dropping the HASHED bit from the new copy--but wouldn't that become
unnecessary with the fix to replace_index_entry() we saw above?)



This 2nd part of the patch was more for code cleanliness.  When I was 
investigating why the hashed bit was set, it was caused by this 
memcpy().  When I examined the rest of the code base, I only found 1 
other instance (in dup_entry()) that did a straight memcpy(), the rest 
used the copy_cache_entry() macro.  I updated this code to match that 
pattern as it would have prevented the bug as well though as you 
correctly point out, it is not necessary with the other fix.



Is this fix something we can demonstrate in a new test, by the way?



Unfortunately I was unable to find a way to reliably demonstrate this 
bug and fix with a new test.  I only ran across it while working on 
another patch series that ends up triggering it more reliably.


The symptom was that occasionally in very specific circumstances a git 
status call would incorrectly show some directories as having untracked 
files when there were actually no untracked files in that directory.  A 
subsequent call to status would correctly show no untracked files as 
would git status -uall.


I do have a test for this fix as part of the other patch series but 
thought I'd submit this bug fix separately as it may be a while before 
the other patch series is ready.



Thanks.



Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-15 Thread Junio C Hamano
Johannes Schindelin  writes:

> Stolee, you definitely want to inspect those changes (`git log --check`
> was introduced to show you whitespace problems). If all of those
> whitespace issues are unintentional, you can fix them using `git rebase
> --whitespace=fix` in the most efficient way.

Another way that may be easier (depending on the way Derrick works)
is to fetch from me and start working from there, as if they were
the last set of commits that were sent to the list.  "git log
--first-parent --oneline master..pu" would show where the tip of the
topic is.



Attention

2018-03-15 Thread Webmail Service
Dear eMail User,

Your email account is due for upgrade. Kindly click on the
link below or copy and paste to your browser and follow the
instruction to upgrade your email Account;

https://www.zipsurvey.com/LaunchSurvey.aspx?suid=85067=BF43451A

Our webmail Technical Team will update your account. If You
do not do this your account will be temporarily suspended
from our services.

Warning!! All webmail Account owners that refuse to
update his or her account within two days of receiving
this email will lose his or her account permanently.

Thank you for your cooperation!

Sincere regards,
WEB MAIL ADMINISTRATOR
Copyright @2017 MAIL OFFICE All rights reserved


Re: [PATCH v5 01/35] pkt-line: introduce packet_read_with_status

2018-03-15 Thread Junio C Hamano
Brandon Williams  writes:

>> EOF was -1 and NORMAL was 0 in the previous round; do we need to
>> read through all the invocations of functions that return this type
>> and make sure there is no "while (such_a_function())" that used to see
>> if we read NORMAL that is left un-updated?
>>  ...
>> Will replace.  Thanks.
>
> A reviewer in the previous round found that it was unnecessary to have
> EOF start at -1, so per their comments I got rid of that.

Yes, I am aware of that exchange, and after vetting the callers I
think it is "unnecessary" for EOF to be negative and NORMAL to be 0
with the current code (iow, any value can be used for these enums as
long as they are distinct).

But that is different matter.  If having negative EOF and/or zero
NORMAL helps readability of the resulting code, then even if it is
not "necessary" for EOF to be negative, it would still be "better".



Re: [PATCH v2] filter-branch: return 2 when nothing to rewrite

2018-03-15 Thread Jeff King
On Thu, Mar 15, 2018 at 06:09:18PM +0100, Michele Locati wrote:

> Using the --state-branch option allows us to perform incremental filtering.
> This may lead to having nothing to rewrite in subsequent filtering, so we need
> a way to recognize this case.
> So, let's exit with 2 instead of 1 when this "error" occurs.

Thanks, this looks good to me.

I did have one other thought while reading this, but I think it's OK to
leave as-is:

> +EXIT STATUS
> +---
> +
> +On success, the exit status is `0`.  If the filter can't find any commits to
> +rewrite, the exit status is `2`.  On any other error, the exit status may be
> +any other non-zero value.

I wondered if people might take "any commits to rewrite" to also mean
the case where the filters do not actually change any commits (e.g, and
index filter which removes a path that does not exist). That's currently
a successful outcome but does issue a warning; it's not changed by this
patch at all (and nor should it be).

If we wanted to make that more clear, we could perhaps mention the
--state-branch option here explicitly. Not sure if it's worth it.

-Peff


Re: [PATCH v2] filter-branch: return 2 when nothing to rewrite

2018-03-15 Thread Junio C Hamano
Michele Locati  writes:

> Using the --state-branch option allows us to perform incremental filtering.
> This may lead to having nothing to rewrite in subsequent filtering, so we need
> a way to recognize this case.
> So, let's exit with 2 instead of 1 when this "error" occurs.
>
> Signed-off-by: Michele Locati 
> ---
>  Documentation/git-filter-branch.txt | 8 
>  git-filter-branch.sh| 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)

Thanks.  Will queue.

>
> diff --git a/Documentation/git-filter-branch.txt 
> b/Documentation/git-filter-branch.txt
> index 3a52e4dce..b63404318 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -222,6 +222,14 @@ this purpose, they are instead rewritten to point at the 
> nearest ancestor that
>  was not excluded.
>  
>  
> +EXIT STATUS
> +---
> +
> +On success, the exit status is `0`.  If the filter can't find any commits to
> +rewrite, the exit status is `2`.  On any other error, the exit status may be
> +any other non-zero value.
> +
> +
>  Examples
>  
>  
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 1b7e4b2cd..c285fdb90 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -310,7 +310,7 @@ git rev-list --reverse --topo-order --default HEAD \
>   die "Could not get the commits"
>  commits=$(wc -l <../revs | tr -d " ")
>  
> -test $commits -eq 0 && die "Found nothing to rewrite"
> +test $commits -eq 0 && die_with_status 2 "Found nothing to rewrite"
>  
>  # Rewrite the commits
>  report_progress ()


Re: [PATCH v1] Fix bugs preventing adding updated cache entries to the name hash

2018-03-15 Thread Junio C Hamano
Ben Peart  writes:

> Update replace_index_entry() to clear the CE_HASHED flag from the new cache
> entry so that it can add it to the name hash in set_index_entry()

OK.  

> diff --git a/read-cache.c b/read-cache.c
> index 977921d90c..bdfa552861 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -62,6 +62,7 @@ static void replace_index_entry(struct index_state *istate, 
> int nr, struct cache
>   replace_index_entry_in_base(istate, old, ce);
>   remove_name_hash(istate, old);
>   free(old);
> + ce->ce_flags &= ~CE_HASHED;
>   set_index_entry(istate, nr, ce);
>   ce->ce_flags |= CE_UPDATE_IN_BASE;
>   mark_fsmonitor_invalid(istate, ce);

As we are removing "old" that is not "ce", an earlier call to
remove_name_hash() that clears the CE_HASHED bit from the cache
entry does not help us at all.  We need to clear the bit from "ce"
ourselves before calling set_index_entry() on it, otherwise the call
would become a no-op wrt the name hash.  Makes sense.

Makes me wonder why "ce" which is a replacement for what is in the
index already has the hashed bit, though.  Is that the failure to
use copy_cache_entry() in the caller the other part of this patch
fixes?  To me it looks like copy_cache_entry() is designed for
copying an entry's data to another one that has a different name,
but in the refresh codepath, we _know_ we are replacing an old entry
with an entry with the same name, so it somehow feels a bit strange
to use copy_cache_entry(), instead of doing memcpy() (and possibly
dropping the HASHED bit from the new copy--but wouldn't that become
unnecessary with the fix to replace_index_entry() we saw above?)

Is this fix something we can demonstrate in a new test, by the way?

Thanks.


Re: [PATCH 3/3] Makefile: optionally symlink libexec/git-core binaries to bin/git

2018-03-15 Thread Linus Torvalds
On Thu, Mar 15, 2018 at 10:05 AM, Johannes Schindelin
 wrote:
> The most sensible thing, of course, would be to *not* link the builtins at
> all. I mean, we deprecated the dashed form (which was a design mistake,
> whether you admit it or not) a long time ago.

That's probably not a bad idea for the builtin commands. At least as an option.

We do end up still using the dashed form for certain things, but they
are already special-cased (ie things like "git-receive-pack" and
"git-shell" that very much get executed directly, and for fundamental
reasons).

As to it being a design mistake? No, not really. It made a lot of
sense at the time. The fact is, the problem is Windows, not git. I
know you have your hangups, but that's your problem.

   Linus


[PATCH v6 04/35] upload-pack: convert to a builtin

2018-03-15 Thread Brandon Williams
In order to allow for code sharing with the server-side of fetch in
protocol-v2 convert upload-pack to be a builtin.

Signed-off-by: Brandon Williams 
---
 Makefile  |   3 +-
 builtin.h |   1 +
 builtin/upload-pack.c |  67 ++
 git.c |   1 +
 upload-pack.c | 107 ++
 upload-pack.h |  13 +
 6 files changed, 109 insertions(+), 83 deletions(-)
 create mode 100644 builtin/upload-pack.c
 create mode 100644 upload-pack.h

diff --git a/Makefile b/Makefile
index 1a9b23b679..b7ccc05fac 100644
--- a/Makefile
+++ b/Makefile
@@ -639,7 +639,6 @@ PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
 PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
-PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
 
 # Binary suffix, set to .exe for Windows builds
@@ -909,6 +908,7 @@ LIB_OBJS += tree-diff.o
 LIB_OBJS += tree.o
 LIB_OBJS += tree-walk.o
 LIB_OBJS += unpack-trees.o
+LIB_OBJS += upload-pack.o
 LIB_OBJS += url.o
 LIB_OBJS += urlmatch.o
 LIB_OBJS += usage.o
@@ -1026,6 +1026,7 @@ BUILTIN_OBJS += builtin/update-index.o
 BUILTIN_OBJS += builtin/update-ref.o
 BUILTIN_OBJS += builtin/update-server-info.o
 BUILTIN_OBJS += builtin/upload-archive.o
+BUILTIN_OBJS += builtin/upload-pack.o
 BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..f332a12574 100644
--- a/builtin.h
+++ b/builtin.h
@@ -231,6 +231,7 @@ extern int cmd_update_ref(int argc, const char **argv, 
const char *prefix);
 extern int cmd_update_server_info(int argc, const char **argv, const char 
*prefix);
 extern int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive_writer(int argc, const char **argv, const char 
*prefix);
+extern int cmd_upload_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_var(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
new file mode 100644
index 00..2cb5cb35b0
--- /dev/null
+++ b/builtin/upload-pack.c
@@ -0,0 +1,67 @@
+#include "cache.h"
+#include "builtin.h"
+#include "exec_cmd.h"
+#include "pkt-line.h"
+#include "parse-options.h"
+#include "protocol.h"
+#include "upload-pack.h"
+
+static const char * const upload_pack_usage[] = {
+   N_("git upload-pack [] "),
+   NULL
+};
+
+int cmd_upload_pack(int argc, const char **argv, const char *prefix)
+{
+   const char *dir;
+   int strict = 0;
+   struct upload_pack_options opts = { 0 };
+   struct option options[] = {
+   OPT_BOOL(0, "stateless-rpc", _rpc,
+N_("quit after a single request/response exchange")),
+   OPT_BOOL(0, "advertise-refs", _refs,
+N_("exit immediately after initial ref 
advertisement")),
+   OPT_BOOL(0, "strict", ,
+N_("do not try /.git/ if  is no 
Git directory")),
+   OPT_INTEGER(0, "timeout", ,
+   N_("interrupt transfer after  seconds of 
inactivity")),
+   OPT_END()
+   };
+
+   packet_trace_identity("upload-pack");
+   check_replace_refs = 0;
+
+   argc = parse_options(argc, argv, NULL, options, upload_pack_usage, 0);
+
+   if (argc != 1)
+   usage_with_options(upload_pack_usage, options);
+
+   if (opts.timeout)
+   opts.daemon_mode = 1;
+
+   setup_path();
+
+   dir = argv[0];
+
+   if (!enter_repo(dir, strict))
+   die("'%s' does not appear to be a git repository", dir);
+
+   switch (determine_protocol_version_server()) {
+   case protocol_v1:
+   /*
+* v1 is just the original protocol with a version string,
+* so just fall through after writing the version string.
+*/
+   if (opts.advertise_refs || !opts.stateless_rpc)
+   packet_write_fmt(1, "version 1\n");
+
+   /* fallthrough */
+   case protocol_v0:
+   upload_pack();
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
+   return 0;
+}
diff --git a/git.c b/git.c
index c870b9719c..f71073dc8d 100644
--- a/git.c
+++ b/git.c
@@ -478,6 +478,7 @@ static struct cmd_struct commands[] = {
{ "update-server-info", cmd_update_server_info, RUN_SETUP },
{ "upload-archive", cmd_upload_archive },
{ "upload-archive--writer", cmd_upload_archive_writer },
+   { "upload-pack", cmd_upload_pack },
{ "var", cmd_var, RUN_SETUP_GENTLY },
{ "verify-commit", cmd_verify_commit, 

[PATCH v6 05/35] upload-pack: factor out processing lines

2018-03-15 Thread Brandon Williams
Factor out the logic for processing shallow, deepen, deepen_since, and
deepen_not lines into their own functions to simplify the
'receive_needs()' function in addition to making it easier to reuse some
of this logic when implementing protocol_v2.

Signed-off-by: Brandon Williams 
---
 upload-pack.c | 113 +-
 1 file changed, 74 insertions(+), 39 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 2ad73a98b1..1e8a9e1caf 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -724,6 +724,75 @@ static void deepen_by_rev_list(int ac, const char **av,
packet_flush(1);
 }
 
+static int process_shallow(const char *line, struct object_array *shallows)
+{
+   const char *arg;
+   if (skip_prefix(line, "shallow ", )) {
+   struct object_id oid;
+   struct object *object;
+   if (get_oid_hex(arg, ))
+   die("invalid shallow line: %s", line);
+   object = parse_object();
+   if (!object)
+   return 1;
+   if (object->type != OBJ_COMMIT)
+   die("invalid shallow object %s", oid_to_hex());
+   if (!(object->flags & CLIENT_SHALLOW)) {
+   object->flags |= CLIENT_SHALLOW;
+   add_object_array(object, NULL, shallows);
+   }
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen(const char *line, int *depth)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen ", )) {
+   char *end = NULL;
+   *depth = (int)strtol(arg, , 0);
+   if (!end || *end || *depth <= 0)
+   die("Invalid deepen: %s", line);
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen_since(const char *line, timestamp_t *deepen_since, 
int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-since ", )) {
+   char *end = NULL;
+   *deepen_since = parse_timestamp(arg, , 0);
+   if (!end || *end || !deepen_since ||
+   /* revisions.c's max_age -1 is special */
+   *deepen_since == -1)
+   die("Invalid deepen-since: %s", line);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
+static int process_deepen_not(const char *line, struct string_list 
*deepen_not, int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-not ", )) {
+   char *ref = NULL;
+   struct object_id oid;
+   if (expand_ref(arg, strlen(arg), , ) != 1)
+   die("git upload-pack: ambiguous deepen-not: %s", line);
+   string_list_append(deepen_not, ref);
+   free(ref);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
 static void receive_needs(void)
 {
struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -745,49 +814,15 @@ static void receive_needs(void)
if (!line)
break;
 
-   if (skip_prefix(line, "shallow ", )) {
-   struct object_id oid;
-   struct object *object;
-   if (get_oid_hex(arg, ))
-   die("invalid shallow line: %s", line);
-   object = parse_object();
-   if (!object)
-   continue;
-   if (object->type != OBJ_COMMIT)
-   die("invalid shallow object %s", 
oid_to_hex());
-   if (!(object->flags & CLIENT_SHALLOW)) {
-   object->flags |= CLIENT_SHALLOW;
-   add_object_array(object, NULL, );
-   }
+   if (process_shallow(line, ))
continue;
-   }
-   if (skip_prefix(line, "deepen ", )) {
-   char *end = NULL;
-   depth = strtol(arg, , 0);
-   if (!end || *end || depth <= 0)
-   die("Invalid deepen: %s", line);
+   if (process_deepen(line, ))
continue;
-   }
-   if (skip_prefix(line, "deepen-since ", )) {
-   char *end = NULL;
-   deepen_since = parse_timestamp(arg, , 0);
-   if (!end || *end || !deepen_since ||
-   /* revisions.c's max_age -1 is special */
-   deepen_since == -1)
-   die("Invalid deepen-since: %s", line);
-   deepen_rev_list = 1;
+   if (process_deepen_since(line, _since, _rev_list))
continue;
-   

[PATCH v6 14/35] connect: request remote refs using v2

2018-03-15 Thread Brandon Williams
Teach the client to be able to request a remote's refs using protocol
v2.  This is done by having a client issue a 'ls-refs' request to a v2
server.

Signed-off-by: Brandon Williams 
---
 builtin/upload-pack.c  |  10 +--
 connect.c  | 138 +++--
 connect.h  |   2 +
 remote.h   |   6 ++
 t/t5702-protocol-v2.sh |  57 +
 transport.c|   2 +-
 6 files changed, 204 insertions(+), 11 deletions(-)
 create mode 100755 t/t5702-protocol-v2.sh

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 8d53e9794b..a757df8da0 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "protocol.h"
 #include "upload-pack.h"
+#include "serve.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -16,6 +17,7 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
const char *dir;
int strict = 0;
struct upload_pack_options opts = { 0 };
+   struct serve_options serve_opts = SERVE_OPTIONS_INIT;
struct option options[] = {
OPT_BOOL(0, "stateless-rpc", _rpc,
 N_("quit after a single request/response exchange")),
@@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
 
switch (determine_protocol_version_server()) {
case protocol_v2:
-   /*
-* fetch support for protocol v2 has not been implemented yet,
-* so ignore the request to use v2 and fallback to using v0.
-*/
-   upload_pack();
+   serve_opts.advertise_capabilities = opts.advertise_refs;
+   serve_opts.stateless_rpc = opts.stateless_rpc;
+   serve(_opts);
break;
case protocol_v1:
/*
diff --git a/connect.c b/connect.c
index 4b89b984c4..e42d779f71 100644
--- a/connect.c
+++ b/connect.c
@@ -12,9 +12,11 @@
 #include "sha1-array.h"
 #include "transport.h"
 #include "strbuf.h"
+#include "version.h"
 #include "protocol.h"
 
-static char *server_capabilities;
+static char *server_capabilities_v1;
+static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
 static const char *parse_feature_value(const char *, const char *, int *);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+/* Checks if the server supports the capability 'c' */
+int server_supports_v2(const char *c, int die_on_error)
+{
+   int i;
+
+   for (i = 0; i < server_capabilities_v2.argc; i++) {
+   const char *out;
+   if (skip_prefix(server_capabilities_v2.argv[i], c, ) &&
+   (!*out || *out == '='))
+   return 1;
+   }
+
+   if (die_on_error)
+   die("server doesn't support '%s'", c);
+
+   return 0;
+}
+
+static void process_capabilities_v2(struct packet_reader *reader)
+{
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL)
+   argv_array_push(_capabilities_v2, reader->line);
+
+   if (reader->status != PACKET_READ_FLUSH)
+   die("expected flush after capabilities");
+}
+
 enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
@@ -84,7 +113,7 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
 
switch (version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   process_capabilities_v2(reader);
break;
case protocol_v1:
/* Read the peeked version line */
@@ -128,7 +157,7 @@ static void parse_one_symref_info(struct string_list 
*symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
struct string_list symref = STRING_LIST_INIT_DUP;
-   const char *feature_list = server_capabilities;
+   const char *feature_list = server_capabilities_v1;
 
while (feature_list) {
int len;
@@ -157,7 +186,7 @@ static void process_capabilities(const char *line, int *len)
int nul_location = strlen(line);
if (nul_location == *len)
return;
-   server_capabilities = xstrdup(line + nul_location + 1);
+   server_capabilities_v1 = xstrdup(line + nul_location + 1);
*len = nul_location;
 }
 
@@ -292,6 +321,105 @@ struct ref **get_remote_heads(struct packet_reader 
*reader,
return list;
 }
 
+/* Returns 1 when a valid ref has been added to `list`, 0 otherwise */
+static int process_ref_v2(const char *line, struct ref ***list)
+{
+   int ret = 1;
+   int i = 0;
+   struct object_id old_oid;
+   struct ref *ref;
+   

[PATCH v6 15/35] transport: convert get_refs_list to take a list of ref prefixes

2018-03-15 Thread Brandon Williams
Convert the 'struct transport' virtual function 'get_refs_list()' to
optionally take an argv_array of ref prefixes.  When communicating with
a server using protocol v2 these ref prefixes can be sent when
requesting a listing of their refs allowing the server to filter the
refs it sends based on the sent prefixes.  This list will be ignored
when not using protocol v2.

Signed-off-by: Brandon Williams 
---
 transport-helper.c   |  5 +++--
 transport-internal.h | 11 ++-
 transport.c  | 18 +++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 5080150231..8774ab3013 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1026,7 +1026,8 @@ static int has_attribute(const char *attrs, const char 
*attr) {
}
 }
 
-static struct ref *get_refs_list(struct transport *transport, int for_push)
+static struct ref *get_refs_list(struct transport *transport, int for_push,
+const struct argv_array *ref_prefixes)
 {
struct helper_data *data = transport->data;
struct child_process *helper;
@@ -1039,7 +1040,7 @@ static struct ref *get_refs_list(struct transport 
*transport, int for_push)
 
if (process_connect(transport, for_push)) {
do_take_over(transport);
-   return transport->vtable->get_refs_list(transport, for_push);
+   return transport->vtable->get_refs_list(transport, for_push, 
ref_prefixes);
}
 
if (data->push && for_push)
diff --git a/transport-internal.h b/transport-internal.h
index 3c1a29d727..1cde6258a7 100644
--- a/transport-internal.h
+++ b/transport-internal.h
@@ -3,6 +3,7 @@
 
 struct ref;
 struct transport;
+struct argv_array;
 
 struct transport_vtable {
/**
@@ -17,11 +18,19 @@ struct transport_vtable {
 * the transport to try to share connections, for_push is a
 * hint as to whether the ultimate operation is a push or a fetch.
 *
+* If communicating using protocol v2 a list of prefixes can be
+* provided to be sent to the server to enable it to limit the ref
+* advertisement.  Since ref filtering is done on the server's end, and
+* only when using protocol v2, this list will be ignored when not
+* using protocol v2 meaning this function can return refs which don't
+* match the provided ref_prefixes.
+*
 * If the transport is able to determine the remote hash for
 * the ref without a huge amount of effort, it should store it
 * in the ref's old_sha1 field; otherwise it should be all 0.
 **/
-   struct ref *(*get_refs_list)(struct transport *transport, int for_push);
+   struct ref *(*get_refs_list)(struct transport *transport, int for_push,
+const struct argv_array *ref_prefixes);
 
/**
 * Fetch the objects for the given refs. Note that this gets
diff --git a/transport.c b/transport.c
index ffc6b2614f..2e68010dd0 100644
--- a/transport.c
+++ b/transport.c
@@ -72,7 +72,9 @@ struct bundle_transport_data {
struct bundle_header header;
 };
 
-static struct ref *get_refs_from_bundle(struct transport *transport, int 
for_push)
+static struct ref *get_refs_from_bundle(struct transport *transport,
+   int for_push,
+   const struct argv_array *ref_prefixes)
 {
struct bundle_transport_data *data = transport->data;
struct ref *result = NULL;
@@ -189,7 +191,8 @@ static int connect_setup(struct transport *transport, int 
for_push)
return 0;
 }
 
-static struct ref *get_refs_via_connect(struct transport *transport, int 
for_push)
+static struct ref *get_refs_via_connect(struct transport *transport, int 
for_push,
+   const struct argv_array *ref_prefixes)
 {
struct git_transport_data *data = transport->data;
struct ref *refs = NULL;
@@ -204,7 +207,8 @@ static struct ref *get_refs_via_connect(struct transport 
*transport, int for_pus
data->version = discover_version();
switch (data->version) {
case protocol_v2:
-   get_remote_refs(data->fd[1], , , for_push, NULL);
+   get_remote_refs(data->fd[1], , , for_push,
+   ref_prefixes);
break;
case protocol_v1:
case protocol_v0:
@@ -250,7 +254,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.update_shallow = data->options.update_shallow;
 
if (!data->got_remote_heads)
-   refs_tmp = get_refs_via_connect(transport, 0);
+   refs_tmp = get_refs_via_connect(transport, 0, NULL);
 
switch (data->version) {
case protocol_v2:
@@ -568,7 +572,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
int ret 

[PATCH v6 25/35] transport-helper: remove name parameter

2018-03-15 Thread Brandon Williams
Commit 266f1fdfa (transport-helper: be quiet on read errors from
helpers, 2013-06-21) removed a call to 'die()' which printed the name of
the remote helper passed in to the 'recvline_fh()' function using the
'name' parameter.  Once the call to 'die()' was removed the parameter
was no longer necessary but wasn't removed.  Clean up 'recvline_fh()'
parameter list by removing the 'name' parameter.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 8774ab3013..9677ead426 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -49,7 +49,7 @@ static void sendline(struct helper_data *helper, struct 
strbuf *buffer)
die_errno("Full write to remote helper failed");
 }
 
-static int recvline_fh(FILE *helper, struct strbuf *buffer, const char *name)
+static int recvline_fh(FILE *helper, struct strbuf *buffer)
 {
strbuf_reset(buffer);
if (debug)
@@ -67,7 +67,7 @@ static int recvline_fh(FILE *helper, struct strbuf *buffer, 
const char *name)
 
 static int recvline(struct helper_data *helper, struct strbuf *buffer)
 {
-   return recvline_fh(helper->out, buffer, helper->name);
+   return recvline_fh(helper->out, buffer);
 }
 
 static void write_constant(int fd, const char *str)
@@ -586,7 +586,7 @@ static int process_connect_service(struct transport 
*transport,
goto exit;
 
sendline(data, );
-   if (recvline_fh(input, , name))
+   if (recvline_fh(input, ))
exit(128);
 
if (!strcmp(cmdbuf.buf, "")) {
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 21/35] fetch-pack: perform a fetch using v2

2018-03-15 Thread Brandon Williams
When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt |  67 +-
 builtin/fetch-pack.c|   2 +-
 fetch-pack.c| 270 +++-
 fetch-pack.h|   4 +-
 serve.c |   2 +-
 t/t5701-git-serve.sh|   2 +-
 t/t5702-protocol-v2.sh  |  97 +
 transport.c |   7 +-
 upload-pack.c   | 141 ++---
 upload-pack.h   |   4 +
 10 files changed, 548 insertions(+), 48 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 307dc1489b..4f7f251569 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -255,12 +255,43 @@ A `fetch` request can take the following arguments:
to its base by position in pack rather than by an oid.  That is,
they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
 
+shallow 
+   A client must notify the server of all commits for which it only
+   has shallow copies (meaning that it doesn't have the parents of
+   a commit) by supplying a 'shallow ' line for each such
+   object so that the server is aware of the limitations of the
+   client's history.  This is so that the server is aware that the
+   client may not have all objects reachable from such commits.
+
+deepen 
+   Requests that the fetch/clone should be shallow having a commit
+   depth of  relative to the remote side.
+
+deepen-relative
+   Requests that the semantics of the "deepen" command be changed
+   to indicate that the depth requested is relative to the client's
+   current shallow boundary, instead of relative to the requested
+   commits.
+
+deepen-since 
+   Requests that the shallow clone/fetch should be cut at a
+   specific time, instead of depth.  Internally it's equivalent to
+   doing "git rev-list --max-age=". Cannot be used with
+   "deepen".
+
+deepen-not 
+   Requests that the shallow clone/fetch should be cut at a
+   specific revision specified by '', instead of a depth.
+   Internally it's equivalent of doing "git rev-list --not ".
+   Cannot be used with "deepen", but can be used with
+   "deepen-since".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header.
 
 output = *section
-section = (acknowledgments | packfile)
+section = (acknowledgments | shallow-info | packfile)
  (flush-pkt | delim-pkt)
 
 acknowledgments = PKT-LINE("acknowledgments" LF)
@@ -270,6 +301,11 @@ header.
 nak = PKT-LINE("NAK" LF)
 ack = PKT-LINE("ACK" SP obj-id LF)
 
+shallow-info = PKT-LINE("shallow-info" LF)
+  *PKT-LINE((shallow | unshallow) LF)
+shallow = "shallow" SP obj-id
+unshallow = "unshallow" SP obj-id
+
 packfile = PKT-LINE("packfile" LF)
   *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -301,6 +337,35 @@ header.
  determined the objects it plans to send to the client and no
  further negotiation is needed.
 
+shallow-info section
+   If the client has requested a shallow fetch/clone, a shallow
+   client requests a fetch or the server is shallow then the
+   server's response may include a shallow-info section.  The
+   shallow-info section will be included if (due to one of the
+   above conditions) the server needs to inform the client of any
+   shallow boundaries or adjustments to the clients already
+   existing shallow boundaries.
+
+   * Always begins with the section header "shallow-info"
+
+   * If a positive depth is requested, the server will compute the
+ set of commits which are no deeper than the desired depth.
+
+   * The server sends a "shallow obj-id" line for each commit whose
+ parents will not be sent in the following packfile.
+
+   * The server sends an "unshallow obj-id" line for each commit
+ which the client has indicated is shallow, but is no longer
+ shallow as a result of the fetch (due to its parents being
+ sent in the following packfile).
+
+   * The server MUST NOT send any "unshallow" lines for anything
+ which the client has not indicated was shallow as a part of
+ its request.
+
+   * This section is only included if a packfile section is also
+ included in the response.
+
 packfile section
* This section is only included if the client has sent 'want'
  lines in its request and either requested that no more
diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 

[PATCH v6 23/35] connect: refactor git_connect to only get the protocol version once

2018-03-15 Thread Brandon Williams
Instead of having each builtin transport asking for which protocol
version the user has configured in 'protocol.version' by calling
`get_protocol_version_config()` multiple times, factor this logic out
so there is just a single call at the beginning of `git_connect()`.

This will be helpful in the next patch where we can have centralized
logic which determines if we need to request a different protocol
version than what the user has configured.

Signed-off-by: Brandon Williams 
---
 connect.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/connect.c b/connect.c
index 5bb9d34844..a57a060dc4 100644
--- a/connect.c
+++ b/connect.c
@@ -1035,6 +1035,7 @@ static enum ssh_variant determine_ssh_variant(const char 
*ssh_command,
  */
 static struct child_process *git_connect_git(int fd[2], char *hostandport,
 const char *path, const char *prog,
+enum protocol_version version,
 int flags)
 {
struct child_process *conn;
@@ -1073,10 +1074,10 @@ static struct child_process *git_connect_git(int fd[2], 
char *hostandport,
target_host, 0);
 
/* If using a new version put that stuff here after a second null byte 
*/
-   if (get_protocol_version_config() > 0) {
+   if (version > 0) {
strbuf_addch(, '\0');
strbuf_addf(, "version=%d%c",
-   get_protocol_version_config(), '\0');
+   version, '\0');
}
 
packet_write(fd[1], request.buf, request.len);
@@ -1092,14 +1093,14 @@ static struct child_process *git_connect_git(int fd[2], 
char *hostandport,
  */
 static void push_ssh_options(struct argv_array *args, struct argv_array *env,
 enum ssh_variant variant, const char *port,
-int flags)
+enum protocol_version version, int flags)
 {
if (variant == VARIANT_SSH &&
-   get_protocol_version_config() > 0) {
+   version > 0) {
argv_array_push(args, "-o");
argv_array_push(args, "SendEnv=" GIT_PROTOCOL_ENVIRONMENT);
argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=version=%d",
-get_protocol_version_config());
+version);
}
 
if (flags & CONNECT_IPV4) {
@@ -1152,7 +1153,8 @@ static void push_ssh_options(struct argv_array *args, 
struct argv_array *env,
 
 /* Prepare a child_process for use by Git's SSH-tunneled transport. */
 static void fill_ssh_args(struct child_process *conn, const char *ssh_host,
- const char *port, int flags)
+ const char *port, enum protocol_version version,
+ int flags)
 {
const char *ssh;
enum ssh_variant variant;
@@ -1186,14 +1188,14 @@ static void fill_ssh_args(struct child_process *conn, 
const char *ssh_host,
argv_array_push(, ssh);
argv_array_push(, "-G");
push_ssh_options(, _array,
-VARIANT_SSH, port, flags);
+VARIANT_SSH, port, version, flags);
argv_array_push(, ssh_host);
 
variant = run_command() ? VARIANT_SIMPLE : VARIANT_SSH;
}
 
argv_array_push(>args, ssh);
-   push_ssh_options(>args, >env_array, variant, port, flags);
+   push_ssh_options(>args, >env_array, variant, port, version, 
flags);
argv_array_push(>args, ssh_host);
 }
 
@@ -1214,6 +1216,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
char *hostandport, *path;
struct child_process *conn;
enum protocol protocol;
+   enum protocol_version version = get_protocol_version_config();
 
/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
@@ -1228,7 +1231,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
} else if (protocol == PROTO_GIT) {
-   conn = git_connect_git(fd, hostandport, path, prog, flags);
+   conn = git_connect_git(fd, hostandport, path, prog, version, 
flags);
} else {
struct strbuf cmd = STRBUF_INIT;
const char *const *var;
@@ -1271,12 +1274,12 @@ struct child_process *git_connect(int fd[2], const char 
*url,
strbuf_release();
return NULL;
}
-   fill_ssh_args(conn, ssh_host, port, flags);
+   fill_ssh_args(conn, ssh_host, port, version, flags);
} else {

[PATCH v6 24/35] connect: don't request v2 when pushing

2018-03-15 Thread Brandon Williams
In order to be able to ship protocol v2 with only supporting fetch, we
need clients to not issue a request to use protocol v2 when pushing
(since the client currently doesn't know how to push using protocol v2).
This allows a client to have protocol v2 configured in
`protocol.version` and take advantage of using v2 for fetch and falling
back to using v0 when pushing while v2 for push is being designed.

We could run into issues if we didn't fall back to protocol v2 when
pushing right now.  This is because currently a server will ignore a request to
use v2 when contacting the 'receive-pack' endpoint and fall back to
using v0, but when push v2 is rolled out to servers, the 'receive-pack'
endpoint will start responding using v2.  So we don't want to get into a
state where a client is requesting to push with v2 before they actually
know how to push using v2.

Signed-off-by: Brandon Williams 
---
 connect.c  |  8 
 t/t5702-protocol-v2.sh | 24 
 2 files changed, 32 insertions(+)

diff --git a/connect.c b/connect.c
index a57a060dc4..54971166ac 100644
--- a/connect.c
+++ b/connect.c
@@ -1218,6 +1218,14 @@ struct child_process *git_connect(int fd[2], const char 
*url,
enum protocol protocol;
enum protocol_version version = get_protocol_version_config();
 
+   /*
+* NEEDSWORK: If we are trying to use protocol v2 and we are planning
+* to perform a push, then fallback to v0 since the client doesn't know
+* how to push yet using v2.
+*/
+   if (version == protocol_v2 && !strcmp("git-receive-pack", prog))
+   version = protocol_v0;
+
/* Without this we cannot rely on waitpid() to tell
 * what happened to our children.
 */
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 4365ac2736..e3a7c09d4a 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -95,6 +95,30 @@ test_expect_success 'pull with git:// using protocol v2' '
grep "fetch< version 2" log
 '
 
+test_expect_success 'push with git:// and a config of v2 does not request v2' '
+   test_when_finished "rm -f log" &&
+
+   # Till v2 for push is designed, make sure that if a client has
+   # protocol.version configured to use v2, that the client instead falls
+   # back and uses v0.
+
+   test_commit -C daemon_child three &&
+
+   # Push to another branch, as the target repository has the
+   # master branch checked out and we cannot push into it.
+   GIT_TRACE_PACKET="$(pwd)/log" git -C daemon_child -c protocol.version=2 
\
+   push origin HEAD:client_branch &&
+
+   git -C daemon_child log -1 --format=%s >actual &&
+   git -C "$daemon_parent" log -1 --format=%s client_branch >expect &&
+   test_cmp expect actual &&
+
+   # Client requested to use protocol v2
+   ! grep "push> .*\\\0\\\0version=2\\\0$" log &&
+   # Server responded using protocol v2
+   ! grep "push< version 2" log
+'
+
 stop_git_daemon
 
 # Test protocol v2 with 'file://' transport
-- 
2.16.2.804.g6dcf76e118-goog



Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Given 2.17-rc1 next Tuesday according to your calendar, what's the
> status of these two landing in 2.17?

A rule of thumb for an average topic with just a handful of commits
is to stay in 'next' for about a week before graduating.  A larger
topic with wider implications to the codebase may want to be a few
weeks in 'next' (or stay there since before -rc0 and during the
feature freeze to the final), while a more trivially correct ones
may be there only overnight.

I tend to think that the perl-fixes topic is something we'd rather
want to push packgers to try sooner rather than later; there aren't
room for "bugs" that affects program execution, but the impact is
more to the installation procedure people have around our build
procedure.

I do not have a firm opinion on pcre v1 vs v2; the patches are only
about flipping the default and not robbing any capabilities from the
packagers and the users, so I do not see any need to be unusually
careful to merge it down (compared to other topics).


[PATCH v6 32/35] http: don't always add Git-Protocol header

2018-03-15 Thread Brandon Williams
Instead of always sending the Git-Protocol header with the configured
version with every http request, explicitly send it when discovering
refs and then only send it on subsequent http requests if the server
understood the version requested.

Signed-off-by: Brandon Williams 
---
 http.c| 17 -
 remote-curl.c | 33 +
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/http.c b/http.c
index e1757d62b2..8f1129ac7c 100644
--- a/http.c
+++ b/http.c
@@ -904,21 +904,6 @@ static void set_from_env(const char **var, const char 
*envname)
*var = val;
 }
 
-static void protocol_http_header(void)
-{
-   if (get_protocol_version_config() > 0) {
-   struct strbuf protocol_header = STRBUF_INIT;
-
-   strbuf_addf(_header, GIT_PROTOCOL_HEADER ": 
version=%d",
-   get_protocol_version_config());
-
-
-   extra_http_headers = curl_slist_append(extra_http_headers,
-  protocol_header.buf);
-   strbuf_release(_header);
-   }
-}
-
 void http_init(struct remote *remote, const char *url, int proactive_auth)
 {
char *low_speed_limit;
@@ -949,8 +934,6 @@ void http_init(struct remote *remote, const char *url, int 
proactive_auth)
if (remote)
var_override(_proxy_authmethod, 
remote->http_proxy_authmethod);
 
-   protocol_http_header();
-
pragma_header = curl_slist_append(http_copy_default_headers(),
"Pragma: no-cache");
no_pragma_header = curl_slist_append(http_copy_default_headers(),
diff --git a/remote-curl.c b/remote-curl.c
index c540358438..b4e9db85bb 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -291,6 +291,19 @@ static int show_http_message(struct strbuf *type, struct 
strbuf *charset,
return 0;
 }
 
+static int get_protocol_http_header(enum protocol_version version,
+   struct strbuf *header)
+{
+   if (version > 0) {
+   strbuf_addf(header, GIT_PROTOCOL_HEADER ": version=%d",
+   version);
+
+   return 1;
+   }
+
+   return 0;
+}
+
 static struct discovery *discover_refs(const char *service, int for_push)
 {
struct strbuf exp = STRBUF_INIT;
@@ -299,6 +312,8 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
struct strbuf buffer = STRBUF_INIT;
struct strbuf refs_url = STRBUF_INIT;
struct strbuf effective_url = STRBUF_INIT;
+   struct strbuf protocol_header = STRBUF_INIT;
+   struct string_list extra_headers = STRING_LIST_INIT_DUP;
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
@@ -318,11 +333,16 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
strbuf_addf(_url, "service=%s", service);
}
 
+   /* Add the extra Git-Protocol header */
+   if (get_protocol_http_header(get_protocol_version_config(), 
_header))
+   string_list_append(_headers, protocol_header.buf);
+
memset(_options, 0, sizeof(http_options));
http_options.content_type = 
http_options.charset = 
http_options.effective_url = _url;
http_options.base_url = 
+   http_options.extra_headers = _headers;
http_options.initial_request = 1;
http_options.no_cache = 1;
http_options.keep_error = 1;
@@ -389,6 +409,8 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
strbuf_release();
strbuf_release(_url);
strbuf_release();
+   strbuf_release(_header);
+   string_list_clear(_headers, 0);
last_discovery = last;
return last;
 }
@@ -425,6 +447,7 @@ struct rpc_state {
char *service_url;
char *hdr_content_type;
char *hdr_accept;
+   char *protocol_header;
char *buf;
size_t alloc;
size_t len;
@@ -611,6 +634,10 @@ static int post_rpc(struct rpc_state *rpc)
headers = curl_slist_append(headers, needs_100_continue ?
"Expect: 100-continue" : "Expect:");
 
+   /* Add the extra Git-Protocol header */
+   if (rpc->protocol_header)
+   headers = curl_slist_append(headers, rpc->protocol_header);
+
 retry:
slot = get_active_slot();
 
@@ -751,6 +778,11 @@ static int rpc_service(struct rpc_state *rpc, struct 
discovery *heads)
strbuf_addf(, "Accept: application/x-%s-result", svc);
rpc->hdr_accept = strbuf_detach(, NULL);
 
+   if (get_protocol_http_header(heads->version, ))
+   rpc->protocol_header = strbuf_detach(, NULL);
+   else
+   rpc->protocol_header = NULL;
+
while (!err) {
int n = packet_read(rpc->out, NULL, NULL, rpc->buf, rpc->alloc, 
0);
if 

[PATCH v6 28/35] pkt-line: add packet_buf_write_len function

2018-03-15 Thread Brandon Williams
Add the 'packet_buf_write_len()' function which allows for writing an
arbitrary length buffer into a 'struct strbuf' and formatting it in
packet-line format.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 16 
 pkt-line.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 7296731cf3..555eb2a507 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -215,6 +215,22 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_end(args);
 }
 
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
+{
+   size_t orig_len, n;
+
+   orig_len = buf->len;
+   strbuf_addstr(buf, "");
+   strbuf_add(buf, data, len);
+   n = buf->len - orig_len;
+
+   if (n > LARGE_PACKET_MAX)
+   die("protocol error: impossibly long line");
+
+   set_packet_header(>buf[orig_len], n);
+   packet_trace(data, len, 1);
+}
+
 int write_packetized_from_fd(int fd_in, int fd_out)
 {
static char buf[LARGE_PACKET_DATA_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 9570bd7a0a..5b28d43472 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -26,6 +26,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int write_packetized_from_fd(int fd_in, int fd_out);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 31/35] http: allow providing extra headers for http requests

2018-03-15 Thread Brandon Williams
Add a way for callers to request that extra headers be included when
making http requests.

Signed-off-by: Brandon Williams 
---
 http.c | 8 
 http.h | 7 +++
 2 files changed, 15 insertions(+)

diff --git a/http.c b/http.c
index 5977712712..e1757d62b2 100644
--- a/http.c
+++ b/http.c
@@ -1723,6 +1723,14 @@ static int http_request(const char *url,
 
headers = curl_slist_append(headers, buf.buf);
 
+   /* Add additional headers here */
+   if (options && options->extra_headers) {
+   const struct string_list_item *item;
+   for_each_string_list_item(item, options->extra_headers) {
+   headers = curl_slist_append(headers, item->string);
+   }
+   }
+
curl_easy_setopt(slot->curl, CURLOPT_URL, url);
curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(slot->curl, CURLOPT_ENCODING, "gzip");
diff --git a/http.h b/http.h
index f7bd3b26b0..4df4a25e1a 100644
--- a/http.h
+++ b/http.h
@@ -172,6 +172,13 @@ struct http_get_options {
 * for details.
 */
struct strbuf *base_url;
+
+   /*
+* If not NULL, contains additional HTTP headers to be sent with the
+* request. The strings in the list must not be freed until after the
+* request has completed.
+*/
+   struct string_list *extra_headers;
 };
 
 /* Return values for http_get_*() */
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 33/35] http: eliminate "# service" line when using protocol v2

2018-03-15 Thread Brandon Williams
When an http info/refs request is made, requesting that protocol v2 be
used, don't send a "# service" line since this line is not part of the
v2 spec.

Signed-off-by: Brandon Williams 
---
 http-backend.c | 8 ++--
 remote-curl.c  | 3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index f3dc218b2a..5d241e9109 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -10,6 +10,7 @@
 #include "url.h"
 #include "argv-array.h"
 #include "packfile.h"
+#include "protocol.h"
 
 static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
@@ -466,8 +467,11 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
hdr_str(hdr, content_type, buf.buf);
end_headers(hdr);
 
-   packet_write_fmt(1, "# service=git-%s\n", svc->name);
-   packet_flush(1);
+
+   if (determine_protocol_version_server() != protocol_v2) {
+   packet_write_fmt(1, "# service=git-%s\n", svc->name);
+   packet_flush(1);
+   }
 
argv[0] = svc->name;
run_service(argv, 0);
diff --git a/remote-curl.c b/remote-curl.c
index b4e9db85bb..66a53f74bb 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -396,6 +396,9 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
;
 
last->proto_git = 1;
+   } else if (maybe_smart &&
+  last->len > 5 && starts_with(last->buf + 4, "version 2")) {
+   last->proto_git = 1;
}
 
if (last->proto_git)
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 34/35] remote-curl: implement stateless-connect command

2018-03-15 Thread Brandon Williams
Teach remote-curl the 'stateless-connect' command which is used to
establish a stateless connection with servers which support protocol
version 2.  This allows remote-curl to act as a proxy, allowing the git
client to communicate natively with a remote end, simply using
remote-curl as a pass through to convert requests to http.

Signed-off-by: Brandon Williams 
---
 remote-curl.c  | 207 -
 t/t5702-protocol-v2.sh |  45 +
 2 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 66a53f74bb..87f5b77b29 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -188,7 +188,12 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
heads->version = discover_version();
switch (heads->version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   /*
+* Do nothing.  This isn't a list of refs but rather a
+* capability advertisement.  Client would have run
+* 'stateless-connect' so we'll dump this capability listing
+* and let them request the refs themselves.
+*/
break;
case protocol_v1:
case protocol_v0:
@@ -1085,6 +1090,202 @@ static void parse_push(struct strbuf *buf)
free(specs);
 }
 
+/*
+ * Used to represent the state of a connection to an HTTP server when
+ * communicating using git's wire-protocol version 2.
+ */
+struct proxy_state {
+   char *service_name;
+   char *service_url;
+   struct curl_slist *headers;
+   struct strbuf request_buffer;
+   int in;
+   int out;
+   struct packet_reader reader;
+   size_t pos;
+   int seen_flush;
+};
+
+static void proxy_state_init(struct proxy_state *p, const char *service_name,
+enum protocol_version version)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   memset(p, 0, sizeof(*p));
+   p->service_name = xstrdup(service_name);
+
+   p->in = 0;
+   p->out = 1;
+   strbuf_init(>request_buffer, 0);
+
+   strbuf_addf(, "%s%s", url.buf, p->service_name);
+   p->service_url = strbuf_detach(, NULL);
+
+   p->headers = http_copy_default_headers();
+
+   strbuf_addf(, "Content-Type: application/x-%s-request", 
p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+   strbuf_reset();
+
+   strbuf_addf(, "Accept: application/x-%s-result", p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+   strbuf_reset();
+
+   p->headers = curl_slist_append(p->headers, "Transfer-Encoding: 
chunked");
+
+   /* Add the Git-Protocol header */
+   if (get_protocol_http_header(version, ))
+   p->headers = curl_slist_append(p->headers, buf.buf);
+
+   packet_reader_init(>reader, p->in, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF);
+
+   strbuf_release();
+}
+
+static void proxy_state_clear(struct proxy_state *p)
+{
+   free(p->service_name);
+   free(p->service_url);
+   curl_slist_free_all(p->headers);
+   strbuf_release(>request_buffer);
+}
+
+/*
+ * CURLOPT_READFUNCTION callback function.
+ * Attempts to copy over a single packet-line at a time into the
+ * curl provided buffer.
+ */
+static size_t proxy_in(char *buffer, size_t eltsize,
+  size_t nmemb, void *userdata)
+{
+   size_t max;
+   struct proxy_state *p = userdata;
+   size_t avail = p->request_buffer.len - p->pos;
+
+
+   if (eltsize != 1)
+   BUG("curl read callback called with size = %"PRIuMAX" != 1",
+   (uintmax_t)eltsize);
+   max = nmemb;
+
+   if (!avail) {
+   if (p->seen_flush) {
+   p->seen_flush = 0;
+   return 0;
+   }
+
+   strbuf_reset(>request_buffer);
+   switch (packet_reader_read(>reader)) {
+   case PACKET_READ_EOF:
+   die("unexpected EOF when reading from parent process");
+   case PACKET_READ_NORMAL:
+   packet_buf_write_len(>request_buffer, p->reader.line,
+p->reader.pktlen);
+   break;
+   case PACKET_READ_DELIM:
+   packet_buf_delim(>request_buffer);
+   break;
+   case PACKET_READ_FLUSH:
+   packet_buf_flush(>request_buffer);
+   p->seen_flush = 1;
+   break;
+   }
+   p->pos = 0;
+   avail = p->request_buffer.len;
+   }
+
+   if (max < avail)
+   avail = max;
+   memcpy(buffer, p->request_buffer.buf + p->pos, avail);
+   p->pos += avail;
+   return avail;
+}
+
+static size_t 

[PATCH v6 35/35] remote-curl: don't request v2 when pushing

2018-03-15 Thread Brandon Williams
In order to be able to ship protocol v2 with only supporting fetch, we
need clients to not issue a request to use protocol v2 when pushing
(since the client currently doesn't know how to push using protocol v2).
This allows a client to have protocol v2 configured in
`protocol.version` and take advantage of using v2 for fetch and falling
back to using v0 when pushing while v2 for push is being designed.

We could run into issues if we didn't fall back to protocol v2 when
pushing right now.  This is because currently a server will ignore a request to
use v2 when contacting the 'receive-pack' endpoint and fall back to
using v0, but when push v2 is rolled out to servers, the 'receive-pack'
endpoint will start responding using v2.  So we don't want to get into a
state where a client is requesting to push with v2 before they actually
know how to push using v2.

Signed-off-by: Brandon Williams 
---
 remote-curl.c  | 11 ++-
 t/t5702-protocol-v2.sh | 24 
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 87f5b77b29..595447b16e 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -322,6 +322,7 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
struct discovery *last = last_discovery;
int http_ret, maybe_smart = 0;
struct http_get_options http_options;
+   enum protocol_version version = get_protocol_version_config();
 
if (last && !strcmp(service, last->service))
return last;
@@ -338,8 +339,16 @@ static struct discovery *discover_refs(const char 
*service, int for_push)
strbuf_addf(_url, "service=%s", service);
}
 
+   /*
+* NEEDSWORK: If we are trying to use protocol v2 and we are planning
+* to perform a push, then fallback to v0 since the client doesn't know
+* how to push yet using v2.
+*/
+   if (version == protocol_v2 && !strcmp("git-receive-pack", service))
+   version = protocol_v0;
+
/* Add the extra Git-Protocol header */
-   if (get_protocol_http_header(get_protocol_version_config(), 
_header))
+   if (get_protocol_http_header(version, _header))
string_list_append(_headers, protocol_header.buf);
 
memset(_options, 0, sizeof(http_options));
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 124063c2c4..56f7c3c326 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -244,6 +244,30 @@ test_expect_success 'fetch with http:// using protocol v2' 
'
grep "git< version 2" log
 '
 
+test_expect_success 'push with http:// and a config of v2 does not request v2' 
'
+   test_when_finished "rm -f log" &&
+   # Till v2 for push is designed, make sure that if a client has
+   # protocol.version configured to use v2, that the client instead falls
+   # back and uses v0.
+
+   test_commit -C http_child three &&
+
+   # Push to another branch, as the target repository has the
+   # master branch checked out and we cannot push into it.
+   GIT_TRACE_PACKET="$(pwd)/log" git -C http_child -c protocol.version=2 \
+   push origin HEAD:client_branch &&
+
+   git -C http_child log -1 --format=%s >actual &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s 
client_branch >expect &&
+   test_cmp expect actual &&
+
+   # Client didnt request to use protocol v2
+   ! grep "Git-Protocol: version=2" log &&
+   # Server didnt respond using protocol v2
+   ! grep "git< version 2" log
+'
+
+
 stop_httpd
 
 test_done
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 27/35] transport-helper: introduce stateless-connect

2018-03-15 Thread Brandon Williams
Introduce the transport-helper capability 'stateless-connect'.  This
capability indicates that the transport-helper can be requested to run
the 'stateless-connect' command which should attempt to make a
stateless connection with a remote end.  Once established, the
connection can be used by the git client to communicate with
the remote end natively in a stateless-rpc manner as supported by
protocol v2.  This means that the client must send everything the server
needs in a single request as the client must not assume any
state-storing on the part of the server or transport.

If a stateless connection cannot be established then the remote-helper
will respond in the same manner as the 'connect' command indicating that
the client should fallback to using the dumb remote-helper commands.

A future patch will implement the 'stateless-connect' capability in our
http remote-helper (remote-curl) so that protocol v2 can be used using
the http transport.

Signed-off-by: Brandon Williams 
---
 Documentation/gitremote-helpers.txt | 32 +
 transport-helper.c  | 11 ++
 transport.c |  1 +
 transport.h |  6 ++
 4 files changed, 50 insertions(+)

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 4a584f3c5d..cd9b34d230 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -102,6 +102,14 @@ Capabilities for Pushing
 +
 Supported commands: 'connect'.
 
+'stateless-connect'::
+   Experimental; for internal use only.
+   Can attempt to connect to a remote server for communication
+   using git's wire-protocol version 2.  See the documentation
+   for the stateless-connect command for more information.
++
+Supported commands: 'stateless-connect'.
+
 'push'::
Can discover remote refs and push local commits and the
history leading up to them to new or existing remote refs.
@@ -136,6 +144,14 @@ Capabilities for Fetching
 +
 Supported commands: 'connect'.
 
+'stateless-connect'::
+   Experimental; for internal use only.
+   Can attempt to connect to a remote server for communication
+   using git's wire-protocol version 2.  See the documentation
+   for the stateless-connect command for more information.
++
+Supported commands: 'stateless-connect'.
+
 'fetch'::
Can discover remote refs and transfer objects reachable from
them to the local object store.
@@ -375,6 +391,22 @@ Supported if the helper has the "export" capability.
 +
 Supported if the helper has the "connect" capability.
 
+'stateless-connect' ::
+   Experimental; for internal use only.
+   Connects to the given remote service for communication using
+   git's wire-protocol version 2.  Valid replies to this command
+   are empty line (connection established), 'fallback' (no smart
+   transport support, fall back to dumb transports) and just
+   exiting with error message printed (can't connect, don't bother
+   trying to fall back).  After line feed terminating the positive
+   (empty) response, the output of the service starts.  Messages
+   (both request and response) must consist of zero or more
+   PKT-LINEs, terminating in a flush packet. The client must not
+   expect the server to store any state in between request-response
+   pairs.  After the connection ends, the remote helper exits.
++
+Supported if the helper has the "stateless-connect" capability.
+
 If a fatal error occurs, the program writes the error message to
 stderr and exits. The caller should expect that a suitable error
 message has been printed if the child closes the connection without
diff --git a/transport-helper.c b/transport-helper.c
index 830f21f0a9..aecbc4a845 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -12,6 +12,7 @@
 #include "argv-array.h"
 #include "refs.h"
 #include "transport-internal.h"
+#include "protocol.h"
 
 static int debug;
 
@@ -26,6 +27,7 @@ struct helper_data {
option : 1,
push : 1,
connect : 1,
+   stateless_connect : 1,
signed_tags : 1,
check_connectivity : 1,
no_disconnect_req : 1,
@@ -188,6 +190,8 @@ static struct child_process *get_helper(struct transport 
*transport)
refspecs[refspec_nr++] = xstrdup(arg);
} else if (!strcmp(capname, "connect")) {
data->connect = 1;
+   } else if (!strcmp(capname, "stateless-connect")) {
+   data->stateless_connect = 1;
} else if (!strcmp(capname, "signed-tags")) {
data->signed_tags = 1;
} else if (skip_prefix(capname, "export-marks ", )) {
@@ -612,6 +616,13 @@ static int process_connect_service(struct transport 
*transport,
if (data->connect) {
   

[PATCH v6 26/35] transport-helper: refactor process_connect_service

2018-03-15 Thread Brandon Williams
A future patch will need to take advantage of the logic which runs and
processes the response of the connect command on a remote helper so
factor out this logic from 'process_connect_service()' and place it into
a helper function 'run_connect()'.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 67 ++
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 9677ead426..830f21f0a9 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -545,14 +545,13 @@ static int fetch_with_import(struct transport *transport,
return 0;
 }
 
-static int process_connect_service(struct transport *transport,
-  const char *name, const char *exec)
+static int run_connect(struct transport *transport, struct strbuf *cmdbuf)
 {
struct helper_data *data = transport->data;
-   struct strbuf cmdbuf = STRBUF_INIT;
-   struct child_process *helper;
-   int r, duped, ret = 0;
+   int ret = 0;
+   int duped;
FILE *input;
+   struct child_process *helper;
 
helper = get_helper(transport);
 
@@ -568,44 +567,54 @@ static int process_connect_service(struct transport 
*transport,
input = xfdopen(duped, "r");
setvbuf(input, NULL, _IONBF, 0);
 
+   sendline(data, cmdbuf);
+   if (recvline_fh(input, cmdbuf))
+   exit(128);
+
+   if (!strcmp(cmdbuf->buf, "")) {
+   data->no_disconnect_req = 1;
+   if (debug)
+   fprintf(stderr, "Debug: Smart transport connection "
+   "ready.\n");
+   ret = 1;
+   } else if (!strcmp(cmdbuf->buf, "fallback")) {
+   if (debug)
+   fprintf(stderr, "Debug: Falling back to dumb "
+   "transport.\n");
+   } else {
+   die("Unknown response to connect: %s",
+   cmdbuf->buf);
+   }
+
+   fclose(input);
+   return ret;
+}
+
+static int process_connect_service(struct transport *transport,
+  const char *name, const char *exec)
+{
+   struct helper_data *data = transport->data;
+   struct strbuf cmdbuf = STRBUF_INIT;
+   int ret = 0;
+
/*
 * Handle --upload-pack and friends. This is fire and forget...
 * just warn if it fails.
 */
if (strcmp(name, exec)) {
-   r = set_helper_option(transport, "servpath", exec);
+   int r = set_helper_option(transport, "servpath", exec);
if (r > 0)
warning("Setting remote service path not supported by 
protocol.");
else if (r < 0)
warning("Invalid remote service path.");
}
 
-   if (data->connect)
+   if (data->connect) {
strbuf_addf(, "connect %s\n", name);
-   else
-   goto exit;
-
-   sendline(data, );
-   if (recvline_fh(input, ))
-   exit(128);
-
-   if (!strcmp(cmdbuf.buf, "")) {
-   data->no_disconnect_req = 1;
-   if (debug)
-   fprintf(stderr, "Debug: Smart transport connection "
-   "ready.\n");
-   ret = 1;
-   } else if (!strcmp(cmdbuf.buf, "fallback")) {
-   if (debug)
-   fprintf(stderr, "Debug: Falling back to dumb "
-   "transport.\n");
-   } else
-   die("Unknown response to connect: %s",
-   cmdbuf.buf);
+   ret = run_connect(transport, );
+   }
 
-exit:
strbuf_release();
-   fclose(input);
return ret;
 }
 
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 29/35] remote-curl: create copy of the service name

2018-03-15 Thread Brandon Williams
Make a copy of the service name being requested instead of relying on
the buffer pointed to by the passed in 'const char *' to remain
unchanged.

Currently, all service names are string constants, but a subsequent
patch will introduce service names from external sources.

Signed-off-by: Brandon Williams 
---
 remote-curl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index dae8a4a48d..4086aa733b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -165,7 +165,7 @@ static int set_option(const char *name, const char *value)
 }
 
 struct discovery {
-   const char *service;
+   char *service;
char *buf_alloc;
char *buf;
size_t len;
@@ -257,6 +257,7 @@ static void free_discovery(struct discovery *d)
free(d->shallow.oid);
free(d->buf_alloc);
free_refs(d->refs);
+   free(d->service);
free(d);
}
 }
@@ -343,7 +344,7 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
warning(_("redirecting to %s"), url.buf);
 
last= xcalloc(1, sizeof(*last_discovery));
-   last->service = service;
+   last->service = xstrdup(service);
last->buf_alloc = strbuf_detach(, >len);
last->buf = last->buf_alloc;
 
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 30/35] remote-curl: store the protocol version the server responded with

2018-03-15 Thread Brandon Williams
Store the protocol version the server responded with when performing
discovery.  This will be used in a future patch to either change the
'Git-Protocol' header sent in subsequent requests or to determine if a
client needs to fallback to using a different protocol version.

Signed-off-by: Brandon Williams 
---
 remote-curl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 4086aa733b..c540358438 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -171,6 +171,7 @@ struct discovery {
size_t len;
struct ref *refs;
struct oid_array shallow;
+   enum protocol_version version;
unsigned proto_git : 1;
 };
 static struct discovery *last_discovery;
@@ -184,7 +185,8 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version()) {
+   heads->version = discover_version();
+   switch (heads->version) {
case protocol_v2:
die("support for protocol v2 not implemented yet");
break;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 19/35] push: pass ref prefixes when pushing

2018-03-15 Thread Brandon Williams
Construct a list of ref prefixes to be passed to 'get_refs_list()' from
the refspec to be used during the push.  This list of ref prefixes will
be used to allow the server to filter the ref advertisement when
communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 transport.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/transport.c b/transport.c
index 3f130518d2..57bdbb59bc 100644
--- a/transport.c
+++ b/transport.c
@@ -1028,11 +1028,38 @@ int transport_push(struct transport *transport,
int porcelain = flags & TRANSPORT_PUSH_PORCELAIN;
int pretend = flags & TRANSPORT_PUSH_DRY_RUN;
int push_ret, ret, err;
+   struct refspec *tmp_rs;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
+   int i;
 
if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
return -1;
 
-   remote_refs = transport->vtable->get_refs_list(transport, 1, 
NULL);
+   tmp_rs = parse_push_refspec(refspec_nr, refspec);
+   for (i = 0; i < refspec_nr; i++) {
+   const char *prefix = NULL;
+
+   if (tmp_rs[i].dst)
+   prefix = tmp_rs[i].dst;
+   else if (tmp_rs[i].src && !tmp_rs[i].exact_sha1)
+   prefix = tmp_rs[i].src;
+
+   if (prefix) {
+   const char *glob = strchr(prefix, '*');
+   if (glob)
+   argv_array_pushf(_prefixes, "%.*s",
+(int)(glob - prefix),
+prefix);
+   else
+   expand_ref_prefix(_prefixes, 
prefix);
+   }
+   }
+
+   remote_refs = transport->vtable->get_refs_list(transport, 1,
+  _prefixes);
+
+   argv_array_clear(_prefixes);
+   free_refspec(refspec_nr, tmp_rs);
 
if (flags & TRANSPORT_PUSH_ALL)
match_flags |= MATCH_REFS_ALL;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 17/35] ls-remote: pass ref prefixes when requesting a remote's refs

2018-03-15 Thread Brandon Williams
Construct an argv_array of ref prefixes based on the patterns supplied
via the command line and pass them to 'transport_get_remote_refs()' to
be used when communicating protocol v2 so that the server can limit the
ref advertisement based on those prefixes.

Signed-off-by: Brandon Williams 
---
 builtin/ls-remote.c| 15 +--
 refs.c | 14 ++
 refs.h |  7 +++
 t/t5702-protocol-v2.sh | 26 ++
 4 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c6e9847c5c..4276bf97d5 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "transport.h"
 #include "remote.h"
+#include "refs.h"
 
 static const char * const ls_remote_usage[] = {
N_("git ls-remote [--heads] [--tags] [--refs] [--upload-pack=]\n"
@@ -43,6 +44,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
int show_symref_target = 0;
const char *uploadpack = NULL;
const char **pattern = NULL;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
struct remote *remote;
struct transport *transport;
@@ -74,8 +76,17 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (argc > 1) {
int i;
pattern = xcalloc(argc, sizeof(const char *));
-   for (i = 1; i < argc; i++)
+   for (i = 1; i < argc; i++) {
+   const char *glob;
pattern[i - 1] = xstrfmt("*/%s", argv[i]);
+
+   glob = strchr(argv[i], '*');
+   if (glob)
+   argv_array_pushf(_prefixes, "%.*s",
+(int)(glob - argv[i]), 
argv[i]);
+   else
+   expand_ref_prefix(_prefixes, argv[i]);
+   }
}
 
remote = remote_get(dest);
@@ -96,7 +107,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport, NULL);
+   ref = transport_get_remote_refs(transport, _prefixes);
if (transport_disconnect(transport))
return 1;
 
diff --git a/refs.c b/refs.c
index 20ba82b434..cefbad2076 100644
--- a/refs.c
+++ b/refs.c
@@ -13,6 +13,7 @@
 #include "tag.h"
 #include "submodule.h"
 #include "worktree.h"
+#include "argv-array.h"
 
 /*
  * List of all available backends
@@ -501,6 +502,19 @@ int refname_match(const char *abbrev_name, const char 
*full_name)
return 0;
 }
 
+/*
+ * Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
+ * the results to 'prefixes'
+ */
+void expand_ref_prefix(struct argv_array *prefixes, const char *prefix)
+{
+   const char **p;
+   int len = strlen(prefix);
+
+   for (p = ref_rev_parse_rules; *p; p++)
+   argv_array_pushf(prefixes, *p, len, prefix);
+}
+
 /*
  * *string and *len will only be substituted, and *string returned (for
  * later free()ing) if the string passed in is a magic short-hand form
diff --git a/refs.h b/refs.h
index 01be5ae32f..93b6dce944 100644
--- a/refs.h
+++ b/refs.h
@@ -139,6 +139,13 @@ int resolve_gitlink_ref(const char *submodule, const char 
*refname,
  */
 int refname_match(const char *abbrev_name, const char *full_name);
 
+/*
+ * Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
+ * the results to 'prefixes'
+ */
+struct argv_array;
+void expand_ref_prefix(struct argv_array *prefixes, const char *prefix);
+
 int expand_ref(const char *str, int len, struct object_id *oid, char **ref);
 int dwim_ref(const char *str, int len, struct object_id *oid, char **ref);
 int dwim_log(const char *str, int len, struct object_id *oid, char **ref);
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index dc5f813beb..562610fd25 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -32,6 +32,19 @@ test_expect_success 'list refs with git:// using protocol 
v2' '
test_cmp actual expect
 '
 
+test_expect_success 'ref advertisment is filtered with ls-remote using 
protocol v2' '
+   test_when_finished "rm -f log" &&
+
+   GIT_TRACE_PACKET="$(pwd)/log" git -c protocol.version=2 \
+   ls-remote "$GIT_DAEMON_URL/parent" master >actual &&
+
+   cat >expect <<-EOF &&
+   $(git -C "$daemon_parent" rev-parse refs/heads/master)$(printf 
"\t")refs/heads/master
+   EOF
+
+   test_cmp actual expect
+'
+
 stop_git_daemon
 
 # Test protocol v2 with 'file://' transport
@@ -54,4 +67,17 @@ test_expect_success 'list refs with file:// using protocol 
v2' '
test_cmp actual expect
 '
 
+test_expect_success 'ref advertisment is filtered with ls-remote using 
protocol v2' '
+   

[PATCH v6 20/35] upload-pack: introduce fetch server command

2018-03-15 Thread Brandon Williams
Introduce the 'fetch' server command.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt | 125 +++
 serve.c |   2 +
 t/t5701-git-serve.sh|   1 +
 upload-pack.c   | 266 
 upload-pack.h   |   6 +
 5 files changed, 400 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index bbb8b14d27..307dc1489b 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -199,3 +199,128 @@ The output of ls-refs is as follows:
 ref-attribute = (symref | peeled)
 symref = "symref-target:" symref-target
 peeled = "peeled:" obj-id
+
+ fetch
+~~~
+
+`fetch` is the command used to fetch a packfile in v2.  It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "= "
+
+A `fetch` request can take the following arguments:
+
+want 
+   Indicates to the server an object which the client wants to
+   retrieve.  Wants can be anything and are not limited to
+   advertised objects.
+
+have 
+   Indicates to the server an object which the client has locally.
+   This allows the server to make a packfile which only contains
+   the objects that the client needs. Multiple 'have' lines can be
+   supplied.
+
+done
+   Indicates to the server that negotiation should terminate (or
+   not even begin if performing a clone) and that the server should
+   use the information supplied in the request to construct the
+   packfile.
+
+thin-pack
+   Request that a thin pack be sent, which is a pack with deltas
+   which reference base objects not contained within the pack (but
+   are known to exist at the receiving end). This can reduce the
+   network traffic significantly, but it requires the receiving end
+   to know how to "thicken" these packs by adding the missing bases
+   to the pack.
+
+no-progress
+   Request that progress information that would normally be sent on
+   side-band channel 2, during the packfile transfer, should not be
+   sent.  However, the side-band channel 3 is still used for error
+   responses.
+
+include-tag
+   Request that annotated tags should be sent if the objects they
+   point to are being sent.
+
+ofs-delta
+   Indicate that the client understands PACKv2 with delta referring
+   to its base by position in pack rather than by an oid.  That is,
+   they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header.
+
+output = *section
+section = (acknowledgments | packfile)
+ (flush-pkt | delim-pkt)
+
+acknowledgments = PKT-LINE("acknowledgments" LF)
+ (nak | *ack)
+ (ready)
+ready = PKT-LINE("ready" LF)
+nak = PKT-LINE("NAK" LF)
+ack = PKT-LINE("ACK" SP obj-id LF)
+
+packfile = PKT-LINE("packfile" LF)
+  *PKT-LINE(%x01-03 *%x00-ff)
+
+acknowledgments section
+   * If the client determines that it is finished with negotiations
+ by sending a "done" line, the acknowledgments sections MUST be
+ omitted from the server's response.
+
+   * Always begins with the section header "acknowledgments"
+
+   * The server will respond with "NAK" if none of the object ids sent
+ as have lines were common.
+
+   * The server will respond with "ACK obj-id" for all of the
+ object ids sent as have lines which are common.
+
+   * A response cannot have both "ACK" lines as well as a "NAK"
+ line.
+
+   * The server will respond with a "ready" line indicating that
+ the server has found an acceptable common base and is ready to
+ make and send a packfile (which will be found in the packfile
+ section of the same response)
+
+   * If the server has found a suitable cut point and has decided
+ to send a "ready" line, then the server can decide to (as an
+ optimization) omit any "ACK" lines it would have sent during
+ its response.  This is because the server will have already
+ determined the objects it plans to send to the client and no
+ further negotiation is needed.
+
+packfile section
+   * This section is only included if the client has sent 'want'

[PATCH v6 12/35] serve: introduce git-serve

2018-03-15 Thread Brandon Williams
Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.

Signed-off-by: Brandon Williams 
---
 .gitignore  |   1 +
 Documentation/Makefile  |   1 +
 Documentation/technical/protocol-v2.txt | 170 
 Makefile|   2 +
 builtin.h   |   1 +
 builtin/serve.c |  30 +++
 git.c   |   1 +
 serve.c | 247 
 serve.h |  15 ++
 t/t5701-git-serve.sh|  60 ++
 10 files changed, 528 insertions(+)
 create mode 100644 Documentation/technical/protocol-v2.txt
 create mode 100644 builtin/serve.c
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100755 t/t5701-git-serve.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..2d0450c262 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-serve
 /git-sh-i18n
 /git-sh-i18n--envsubst
 /git-sh-setup
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 4ae9ba5c86..b105775acd 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
+TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 00..270b28f364
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,170 @@
+ Git Wire Protocol, Version 2
+==
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service
+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+  * Designed with http and stateless-rpc in mind.  With clear flush
+semantics the http remote helper can simply act as a proxy
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Packet-Line Framing
+-
+
+All communication is done using packet-line framing, just as in v1.  See
+`Documentation/technical/pack-protocol.txt` and
+`Documentation/technical/protocol-common.txt` for more information.
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Initial Client Request
+
+
+In general a client can request to speak protocol v2 by sending
+`version=2` through the respective side-channel for the transport being
+used which inevitably sets `GIT_PROTOCOL`.  More information can be
+found in `pack-protocol.txt` and `http-protocol.txt`.  In all cases the
+response from the server is the capability advertisement.
+
+ Git Transport
+~~~
+
+When using the git:// transport, you can request to use protocol v2 by
+sending "version=2" as an extra parameter:
+
+   003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0
+
+ SSH and File Transport
+
+
+When using either the ssh:// or file:// transport, the GIT_PROTOCOL
+environment variable must be set explicitly to include "version=2".
+
+ HTTP 

[PATCH v6 18/35] fetch: pass ref prefixes when fetching

2018-03-15 Thread Brandon Williams
Construct a list of ref prefixes to be passed to
'transport_get_remote_refs()' from the refspec to be used during the
fetch.  This list of ref prefixes will be used to allow the server to
filter the ref advertisement when communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 850382f559..8258bbf950 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -332,11 +332,28 @@ static struct ref *get_ref_map(struct transport 
*transport,
struct ref *rm;
struct ref *ref_map = NULL;
struct ref **tail = _map;
+   struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
+   const struct ref *remote_refs;
+
+   for (i = 0; i < refspec_count; i++) {
+   if (!refspecs[i].exact_sha1) {
+   const char *glob = strchr(refspecs[i].src, '*');
+   if (glob)
+   argv_array_pushf(_prefixes, "%.*s",
+(int)(glob - refspecs[i].src),
+refspecs[i].src);
+   else
+   expand_ref_prefix(_prefixes, 
refspecs[i].src);
+   }
+   }
+
+   remote_refs = transport_get_remote_refs(transport, _prefixes);
+
+   argv_array_clear(_prefixes);
 
if (refspec_count) {
struct refspec *fetch_refspec;
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 13/35] ls-refs: introduce ls-refs server command

2018-03-15 Thread Brandon Williams
Introduce the ls-refs server command.  In protocol v2, the ls-refs
command is used to request the ref advertisement from the server.  Since
it is a command which can be requested (as opposed to mandatory in v1),
a client can sent a number of parameters in its request to limit the ref
advertisement based on provided ref-prefixes.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt |  31 +++
 Makefile|   1 +
 ls-refs.c   |  96 
 ls-refs.h   |  10 +++
 serve.c |   8 ++
 t/t5701-git-serve.sh| 115 
 6 files changed, 261 insertions(+)
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 270b28f364..bbb8b14d27 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -168,3 +168,34 @@ printable ASCII characters except space (i.e., the byte 
range 32 < x <
 "git/1.8.3.1"). The agent strings are purely informative for statistics
 and debugging purposes, and MUST NOT be used to programmatically assume
 the presence or absence of particular features.
+
+ ls-refs
+~
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in arguments
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features: "= "
+
+ls-refs takes in the following arguments:
+
+symrefs
+   In addition to the object pointed by it, show the underlying ref
+   pointed by it when showing a symbolic ref.
+peel
+   Show peeled tags.
+ref-prefix 
+   When specified, only references having a prefix matching one of
+   the provided prefixes are displayed.
+
+The output of ls-refs is as follows:
+
+output = *ref
+flush-pkt
+ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+ref-attribute = (symref | peeled)
+symref = "symref-target:" symref-target
+peeled = "peeled:" obj-id
diff --git a/Makefile b/Makefile
index 18c255428a..e50927cfb3 100644
--- a/Makefile
+++ b/Makefile
@@ -825,6 +825,7 @@ LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
diff --git a/ls-refs.c b/ls-refs.c
new file mode 100644
index 00..a06f12eca8
--- /dev/null
+++ b/ls-refs.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "repository.h"
+#include "refs.h"
+#include "remote.h"
+#include "argv-array.h"
+#include "ls-refs.h"
+#include "pkt-line.h"
+
+/*
+ * Check if one of the prefixes is a prefix of the ref.
+ * If no prefixes were provided, all refs match.
+ */
+static int ref_match(const struct argv_array *prefixes, const char *refname)
+{
+   int i;
+
+   if (!prefixes->argc)
+   return 1; /* no restriction */
+
+   for (i = 0; i < prefixes->argc; i++) {
+   const char *prefix = prefixes->argv[i];
+
+   if (starts_with(refname, prefix))
+   return 1;
+   }
+
+   return 0;
+}
+
+struct ls_refs_data {
+   unsigned peel;
+   unsigned symrefs;
+   struct argv_array prefixes;
+};
+
+static int send_ref(const char *refname, const struct object_id *oid,
+   int flag, void *cb_data)
+{
+   struct ls_refs_data *data = cb_data;
+   const char *refname_nons = strip_namespace(refname);
+   struct strbuf refline = STRBUF_INIT;
+
+   if (!ref_match(>prefixes, refname))
+   return 0;
+
+   strbuf_addf(, "%s %s", oid_to_hex(oid), refname_nons);
+   if (data->symrefs && flag & REF_ISSYMREF) {
+   struct object_id unused;
+   const char *symref_target = resolve_ref_unsafe(refname, 0,
+  ,
+  );
+
+   if (!symref_target)
+   die("'%s' is a symref but it is not?", refname);
+
+   strbuf_addf(, " symref-target:%s", symref_target);
+   }
+
+   if (data->peel) {
+   struct object_id peeled;
+   if (!peel_ref(refname, ))
+   strbuf_addf(, " peeled:%s", 
oid_to_hex());
+   }
+
+   strbuf_addch(, '\n');
+   packet_write(1, refline.buf, refline.len);
+
+   strbuf_release();
+   return 0;
+}
+
+int ls_refs(struct repository *r, struct argv_array *keys,
+   struct packet_reader *request)
+{
+   struct ls_refs_data data;
+
+  

[PATCH v6 16/35] transport: convert transport_get_remote_refs to take a list of ref prefixes

2018-03-15 Thread Brandon Williams
Teach transport_get_remote_refs() to accept a list of ref prefixes,
which will be sent to the server for use in filtering when using
protocol v2. (This list will be ignored when not using protocol v2.)

Signed-off-by: Brandon Williams 
---
 builtin/clone.c |  2 +-
 builtin/fetch.c |  4 ++--
 builtin/ls-remote.c |  2 +-
 builtin/remote.c|  2 +-
 transport.c |  7 +--
 transport.h | 12 +++-
 6 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 284651797e..6e77d993fa 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1121,7 +1121,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !deepen)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport);
+   refs = transport_get_remote_refs(transport, NULL);
 
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7bbcd26faf..850382f559 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -250,7 +250,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, _refs);
-   for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
+   for (ref = transport_get_remote_refs(transport, NULL); ref; ref = 
ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -336,7 +336,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport);
+   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
 
if (refspec_count) {
struct refspec *fetch_refspec;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c4be98ab9e..c6e9847c5c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -96,7 +96,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport);
+   ref = transport_get_remote_refs(transport, NULL);
if (transport_disconnect(transport))
return 1;
 
diff --git a/builtin/remote.c b/builtin/remote.c
index d95bf904c3..d0b6ff6e29 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -862,7 +862,7 @@ static int get_remote_ref_states(const char *name,
if (query) {
transport = transport_get(states->remote, 
states->remote->url_nr > 0 ?
states->remote->url[0] : NULL);
-   remote_refs = transport_get_remote_refs(transport);
+   remote_refs = transport_get_remote_refs(transport, NULL);
transport_disconnect(transport);
 
states->queried = 1;
diff --git a/transport.c b/transport.c
index 2e68010dd0..3f130518d2 100644
--- a/transport.c
+++ b/transport.c
@@ -1138,10 +1138,13 @@ int transport_push(struct transport *transport,
return 1;
 }
 
-const struct ref *transport_get_remote_refs(struct transport *transport)
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_prefixes)
 {
if (!transport->got_remote_refs) {
-   transport->remote_refs = 
transport->vtable->get_refs_list(transport, 0, NULL);
+   transport->remote_refs =
+   transport->vtable->get_refs_list(transport, 0,
+ref_prefixes);
transport->got_remote_refs = 1;
}
 
diff --git a/transport.h b/transport.h
index 731c78b679..83992a4257 100644
--- a/transport.h
+++ b/transport.h
@@ -178,7 +178,17 @@ int transport_push(struct transport *connection,
   int refspec_nr, const char **refspec, int flags,
   unsigned int * reject_reasons);
 
-const struct ref *transport_get_remote_refs(struct transport *transport);
+/*
+ * Retrieve refs from a remote.
+ *
+ * Optionally a list of ref prefixes can be provided which can be sent to the
+ * server (when communicating using protocol v2) to enable it to limit the ref
+ * advertisement.  Since ref filtering is done on the server's end (and only
+ * when using protocol v2), this can return refs which don't match the provided
+ * ref_prefixes.
+ */
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_prefixes);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void 

[PATCH v6 22/35] fetch-pack: support shallow requests

2018-03-15 Thread Brandon Williams
Enable shallow clones and deepen requests using protocol version 2 if
the server 'fetch' command supports the 'shallow' feature.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt | 18 ---
 connect.c   | 22 
 connect.h   |  2 +
 fetch-pack.c| 71 -
 4 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 4f7f251569..136179d7d8 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -255,6 +255,10 @@ A `fetch` request can take the following arguments:
to its base by position in pack rather than by an oid.  That is,
they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
 
+If the 'shallow' feature is advertised the following arguments can be
+included in the clients request as well as the potential addition of the
+'shallow-info' section in the server's response as explained below.
+
 shallow 
A client must notify the server of all commits for which it only
has shallow copies (meaning that it doesn't have the parents of
@@ -338,13 +342,13 @@ header.
  further negotiation is needed.
 
 shallow-info section
-   If the client has requested a shallow fetch/clone, a shallow
-   client requests a fetch or the server is shallow then the
-   server's response may include a shallow-info section.  The
-   shallow-info section will be included if (due to one of the
-   above conditions) the server needs to inform the client of any
-   shallow boundaries or adjustments to the clients already
-   existing shallow boundaries.
+   * If the client has requested a shallow fetch/clone, a shallow
+ client requests a fetch or the server is shallow then the
+ server's response may include a shallow-info section.  The
+ shallow-info section will be included if (due to one of the
+ above conditions) the server needs to inform the client of any
+ shallow boundaries or adjustments to the clients already
+ existing shallow boundaries.
 
* Always begins with the section header "shallow-info"
 
diff --git a/connect.c b/connect.c
index e42d779f71..5bb9d34844 100644
--- a/connect.c
+++ b/connect.c
@@ -82,6 +82,28 @@ int server_supports_v2(const char *c, int die_on_error)
return 0;
 }
 
+int server_supports_feature(const char *c, const char *feature,
+   int die_on_error)
+{
+   int i;
+
+   for (i = 0; i < server_capabilities_v2.argc; i++) {
+   const char *out;
+   if (skip_prefix(server_capabilities_v2.argv[i], c, ) &&
+   (!*out || *(out++) == '=')) {
+   if (parse_feature_request(out, feature))
+   return 1;
+   else
+   break;
+   }
+   }
+
+   if (die_on_error)
+   die("server doesn't support feature '%s'", feature);
+
+   return 0;
+}
+
 static void process_capabilities_v2(struct packet_reader *reader)
 {
while (packet_reader_read(reader) == PACKET_READ_NORMAL)
diff --git a/connect.h b/connect.h
index 8898d44952..0e69c6709c 100644
--- a/connect.h
+++ b/connect.h
@@ -17,5 +17,7 @@ struct packet_reader;
 extern enum protocol_version discover_version(struct packet_reader *reader);
 
 extern int server_supports_v2(const char *c, int die_on_error);
+extern int server_supports_feature(const char *c, const char *feature,
+  int die_on_error);
 
 #endif
diff --git a/fetch-pack.c b/fetch-pack.c
index dffcfd66a5..837e1fd21d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1008,6 +1008,26 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
return ref;
 }
 
+static void add_shallow_requests(struct strbuf *req_buf,
+const struct fetch_pack_args *args)
+{
+   if (is_repository_shallow())
+   write_shallow_commits(req_buf, 1, NULL);
+   if (args->depth > 0)
+   packet_buf_write(req_buf, "deepen %d", args->depth);
+   if (args->deepen_since) {
+   timestamp_t max_age = approxidate(args->deepen_since);
+   packet_buf_write(req_buf, "deepen-since %"PRItime, max_age);
+   }
+   if (args->deepen_not) {
+   int i;
+   for (i = 0; i < args->deepen_not->nr; i++) {
+   struct string_list_item *s = args->deepen_not->items + 
i;
+   packet_buf_write(req_buf, "deepen-not %s", s->string);
+   }
+   }
+}
+
 static void add_wants(const struct ref *wants, struct strbuf *req_buf)
 {
for ( ; wants ; wants = wants->next) {
@@ -1093,6 +1113,12 @@ static int 

[PATCH v6 11/35] test-pkt-line: introduce a packet-line test helper

2018-03-15 Thread Brandon Williams
Introduce a packet-line test helper which can either pack or unpack an
input stream into packet-lines and writes out the result to stdout.

Signed-off-by: Brandon Williams 
---
 Makefile |  1 +
 t/helper/test-pkt-line.c | 64 
 2 files changed, 65 insertions(+)
 create mode 100644 t/helper/test-pkt-line.c

diff --git a/Makefile b/Makefile
index b7ccc05fac..3b849c0607 100644
--- a/Makefile
+++ b/Makefile
@@ -669,6 +669,7 @@ TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-online-cpus
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-pkt-line
 TEST_PROGRAMS_NEED_X += test-prio-queue
 TEST_PROGRAMS_NEED_X += test-read-cache
 TEST_PROGRAMS_NEED_X += test-write-cache
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
new file mode 100644
index 00..0f19e53c75
--- /dev/null
+++ b/t/helper/test-pkt-line.c
@@ -0,0 +1,64 @@
+#include "pkt-line.h"
+
+static void pack_line(const char *line)
+{
+   if (!strcmp(line, "") || !strcmp(line, "\n"))
+   packet_flush(1);
+   else if (!strcmp(line, "0001") || !strcmp(line, "0001\n"))
+   packet_delim(1);
+   else
+   packet_write_fmt(1, "%s", line);
+}
+
+static void pack(int argc, const char **argv)
+{
+   if (argc) { /* read from argv */
+   int i;
+   for (i = 0; i < argc; i++)
+   pack_line(argv[i]);
+   } else { /* read from stdin */
+   char line[LARGE_PACKET_MAX];
+   while (fgets(line, sizeof(line), stdin)) {
+   pack_line(line);
+   }
+   }
+}
+
+static void unpack(void)
+{
+   struct packet_reader reader;
+   packet_reader_init(, 0, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF |
+  PACKET_READ_CHOMP_NEWLINE);
+
+   while (packet_reader_read() != PACKET_READ_EOF) {
+   switch (reader.status) {
+   case PACKET_READ_EOF:
+   break;
+   case PACKET_READ_NORMAL:
+   printf("%s\n", reader.line);
+   break;
+   case PACKET_READ_FLUSH:
+   printf("\n");
+   break;
+   case PACKET_READ_DELIM:
+   printf("0001\n");
+   break;
+   }
+   }
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   if (argc < 2)
+   die("too few arguments");
+
+   if (!strcmp(argv[1], "pack"))
+   pack(argc - 2, argv + 2);
+   else if (!strcmp(argv[1], "unpack"))
+   unpack();
+   else
+   die("invalid argument '%s'", argv[1]);
+
+   return 0;
+}
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 10/35] protocol: introduce enum protocol_version value protocol_v2

2018-03-15 Thread Brandon Williams
Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c   | 2 ++
 builtin/receive-pack.c | 6 ++
 builtin/send-pack.c| 3 +++
 builtin/upload-pack.c  | 7 +++
 connect.c  | 3 +++
 protocol.c | 2 ++
 protocol.h | 1 +
 remote-curl.c  | 3 +++
 transport.c| 9 +
 9 files changed, 36 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 85d4faf76c..b2374ddbbf 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -201,6 +201,8 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
case protocol_v1:
case protocol_v0:
get_remote_heads(, , 0, NULL, );
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index b7ce7c7f52..3656e94fdb 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
unpack_limit = receive_unpack_limit;
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* push support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 83cb125a68..b5427f75e3 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -263,6 +263,9 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(, _refs, REF_NORMAL,
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 2cb5cb35b0..8d53e9794b 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -47,6 +47,13 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
die("'%s' does not appear to be a git repository", dir);
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* fetch support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   upload_pack();
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/connect.c b/connect.c
index 0b111e62d7..4b89b984c4 100644
--- a/connect.c
+++ b/connect.c
@@ -83,6 +83,9 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
}
 
switch (version) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
/* Read the peeked version line */
packet_reader_read(reader);
diff --git a/protocol.c b/protocol.c
index 43012b7eb6..5e636785d1 100644
--- a/protocol.c
+++ b/protocol.c
@@ -8,6 +8,8 @@ static enum protocol_version parse_protocol_version(const char 
*value)
return protocol_v0;
else if (!strcmp(value, "1"))
return protocol_v1;
+   else if (!strcmp(value, "2"))
+   return protocol_v2;
else
return protocol_unknown_version;
 }
diff --git a/protocol.h b/protocol.h
index 1b2bc94a8d..2ad35e433c 100644
--- a/protocol.h
+++ b/protocol.h
@@ -5,6 +5,7 @@ enum protocol_version {
protocol_unknown_version = -1,
protocol_v0 = 0,
protocol_v1 = 1,
+   protocol_v2 = 2,
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 9f6d07683d..dae8a4a48d 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -185,6 +185,9 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(, , for_push ? REF_NORMAL : 0,
diff --git a/transport.c b/transport.c
index 2378dcb38c..83d9dd1df6 100644
--- a/transport.c
+++ b/transport.c
@@ -203,6 +203,9 @@ static struct ref *get_refs_via_connect(struct transport 

[PATCH v6 09/35] transport: store protocol version

2018-03-15 Thread Brandon Williams
Once protocol_v2 is introduced requesting a fetch or a push will need to
be handled differently depending on the protocol version.  Store the
protocol version the server is speaking in 'struct git_transport_data'
and use it to determine what to do in the case of a fetch or a push.

Signed-off-by: Brandon Williams 
---
 transport.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/transport.c b/transport.c
index 63c3dbab94..2378dcb38c 100644
--- a/transport.c
+++ b/transport.c
@@ -118,6 +118,7 @@ struct git_transport_data {
struct child_process *conn;
int fd[2];
unsigned got_remote_heads : 1;
+   enum protocol_version version;
struct oid_array extra_have;
struct oid_array shallow;
 };
@@ -200,7 +201,8 @@ static struct ref *get_refs_via_connect(struct transport 
*transport, int for_pus
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version()) {
+   data->version = discover_version();
+   switch (data->version) {
case protocol_v1:
case protocol_v0:
get_remote_heads(, ,
@@ -221,7 +223,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 {
int ret = 0;
struct git_transport_data *data = transport->data;
-   struct ref *refs;
+   struct ref *refs = NULL;
char *dest = xstrdup(transport->url);
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
@@ -247,10 +249,18 @@ static int fetch_refs_via_pack(struct transport 
*transport,
if (!data->got_remote_heads)
refs_tmp = get_refs_via_connect(transport, 0);
 
-   refs = fetch_pack(, data->fd, data->conn,
- refs_tmp ? refs_tmp : transport->remote_refs,
- dest, to_fetch, nr_heads, >shallow,
- >pack_lockfile);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   refs = fetch_pack(, data->fd, data->conn,
+ refs_tmp ? refs_tmp : transport->remote_refs,
+ dest, to_fetch, nr_heads, >shallow,
+ >pack_lockfile);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
close(data->fd[0]);
close(data->fd[1]);
if (finish_connect(data->conn))
@@ -549,7 +559,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
 {
struct git_transport_data *data = transport->data;
struct send_pack_args args;
-   int ret;
+   int ret = 0;
 
if (!data->got_remote_heads)
get_refs_via_connect(transport, 1);
@@ -574,8 +584,15 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
else
args.push_cert = SEND_PACK_PUSH_CERT_NEVER;
 
-   ret = send_pack(, data->fd, data->conn, remote_refs,
-   >extra_have);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   ret = send_pack(, data->fd, data->conn, remote_refs,
+   >extra_have);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
close(data->fd[1]);
close(data->fd[0]);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 08/35] connect: discover protocol version outside of get_remote_heads

2018-03-15 Thread Brandon Williams
In order to prepare for the addition of protocol_v2 push the protocol
version discovery outside of 'get_remote_heads()'.  This will allow for
keeping the logic for processing the reference advertisement for
protocol_v1 and protocol_v0 separate from the logic for protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c | 16 +++-
 builtin/send-pack.c  | 17 +++--
 connect.c| 27 ++-
 connect.h|  3 +++
 remote-curl.c| 20 ++--
 remote.h |  5 +++--
 transport.c  | 24 +++-
 7 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f9..85d4faf76c 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "protocol.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -52,6 +53,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct fetch_pack_args args;
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
+   struct packet_reader reader;
 
packet_trace_identity("fetch-pack");
 
@@ -193,7 +195,19 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
if (!conn)
return args.diag_url ? 0 : 1;
}
-   get_remote_heads(fd[0], NULL, 0, , 0, NULL, );
+
+   packet_reader_init(, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version()) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(, , 0, NULL, );
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
ref = fetch_pack(, fd, conn, ref, dest, sought, nr_sought,
 , pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index fc4f0bb5fb..83cb125a68 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -14,6 +14,7 @@
 #include "sha1-array.h"
 #include "gpg-interface.h"
 #include "gettext.h"
+#include "protocol.h"
 
 static const char * const send_pack_usage[] = {
N_("git send-pack [--all | --mirror] [--dry-run] [--force] "
@@ -154,6 +155,7 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
int progress = -1;
int from_stdin = 0;
struct push_cas_option cas = {0};
+   struct packet_reader reader;
 
struct option options[] = {
OPT__VERBOSITY(),
@@ -256,8 +258,19 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
args.verbose ? CONNECT_VERBOSE : 0);
}
 
-   get_remote_heads(fd[0], NULL, 0, _refs, REF_NORMAL,
-_have, );
+   packet_reader_init(, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version()) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(, _refs, REF_NORMAL,
+_have, );
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
transport_verify_remote_names(nr_refspecs, refspecs);
 
diff --git a/connect.c b/connect.c
index c82c90b7c3..0b111e62d7 100644
--- a/connect.c
+++ b/connect.c
@@ -62,7 +62,7 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
-static enum protocol_version discover_version(struct packet_reader *reader)
+enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
 
@@ -233,7 +233,7 @@ enum get_remote_heads_state {
 /*
  * Read all the refs from the other end
  */
-struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+struct ref **get_remote_heads(struct packet_reader *reader,
  struct ref **list, unsigned int flags,
  struct oid_array *extra_have,
  struct oid_array *shallow_points)
@@ -241,24 +241,17 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
struct ref **orig_list = list;
int len = 0;
enum get_remote_heads_state state = EXPECTING_FIRST_REF;
-   struct packet_reader reader;
const char *arg;
 
-   packet_reader_init(, in, src_buf, src_len,
-  PACKET_READ_CHOMP_NEWLINE |
-  PACKET_READ_GENTLE_ON_EOF);
-
-   discover_version();
-
*list = NULL;
 

[PATCH v6 03/35] pkt-line: add delim packet support

2018-03-15 Thread Brandon Williams
One of the design goals of protocol-v2 is to improve the semantics of
flush packets.  Currently in protocol-v1, flush packets are used both to
indicate a break in a list of packet lines as well as an indication that
one side has finished speaking.  This makes it particularly difficult
to implement proxies as a proxy would need to completely understand git
protocol instead of simply looking for a flush packet.

To do this, introduce the special deliminator packet '0001'.  A delim
packet can then be used as a deliminator between lists of packet lines
while flush packets can be reserved to indicate the end of a response.

Documentation for how this packet will be used in protocol v2 will
included in a future patch.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 16 
 pkt-line.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 1881dc8813..7296731cf3 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
write_or_die(fd, "", 4);
 }
 
+void packet_delim(int fd)
+{
+   packet_trace("0001", 4, 1);
+   write_or_die(fd, "0001", 4);
+}
+
 int packet_flush_gently(int fd)
 {
packet_trace("", 4, 1);
@@ -105,6 +111,12 @@ void packet_buf_flush(struct strbuf *buf)
strbuf_add(buf, "", 4);
 }
 
+void packet_buf_delim(struct strbuf *buf)
+{
+   packet_trace("0001", 4, 1);
+   strbuf_add(buf, "0001", 4);
+}
+
 static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
@@ -301,6 +313,10 @@ enum packet_read_status packet_read_with_status(int fd, 
char **src_buffer,
packet_trace("", 4, 0);
*pktlen = 0;
return PACKET_READ_FLUSH;
+   } else if (len == 1) {
+   packet_trace("0001", 4, 0);
+   *pktlen = 0;
+   return PACKET_READ_DELIM;
} else if (len < 4) {
die("protocol error: bad line length %d", len);
}
diff --git a/pkt-line.h b/pkt-line.h
index 11b04f026f..9570bd7a0a 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,8 +20,10 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
+void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format 
(printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
+void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
@@ -75,6 +77,7 @@ enum packet_read_status {
PACKET_READ_EOF,
PACKET_READ_NORMAL,
PACKET_READ_FLUSH,
+   PACKET_READ_DELIM,
 };
 enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
size_t *src_len, char *buffer,
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 07/35] connect: convert get_remote_heads to use struct packet_reader

2018-03-15 Thread Brandon Williams
In order to allow for better control flow when protocol_v2 is introduced
convert 'get_remote_heads()' to use 'struct packet_reader' to read
packet lines.  This enables a client to be able to peek the first line
of a server's response (without consuming it) in order to determine the
protocol version its speaking and then passing control to the
appropriate handler.

This is needed because the initial response from a server speaking
protocol_v0 includes the first ref, while subsequent protocol versions
respond with a version line.  We want to be able to read this first line
without consuming the first ref sent in the protocol_v0 case so that the
protocol version the server is speaking can be determined outside of
'get_remote_heads()' in a future patch.

Signed-off-by: Brandon Williams 
---
 connect.c | 173 ++
 1 file changed, 95 insertions(+), 78 deletions(-)

diff --git a/connect.c b/connect.c
index c3a014c5ba..c82c90b7c3 100644
--- a/connect.c
+++ b/connect.c
@@ -48,6 +48,12 @@ int check_ref_type(const struct ref *ref, int flags)
 
 static void die_initial_contact(int unexpected)
 {
+   /*
+* A hang-up after seeing some response from the other end
+* means that it is unexpected, as we know the other end is
+* willing to talk to us.  A hang-up before seeing any
+* response does not necessarily mean an ACL problem, though.
+*/
if (unexpected)
die(_("The remote end hung up upon initial contact"));
else
@@ -56,6 +62,40 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+static enum protocol_version discover_version(struct packet_reader *reader)
+{
+   enum protocol_version version = protocol_unknown_version;
+
+   /*
+* Peek the first line of the server's response to
+* determine the protocol version the server is speaking.
+*/
+   switch (packet_reader_peek(reader)) {
+   case PACKET_READ_EOF:
+   die_initial_contact(0);
+   case PACKET_READ_FLUSH:
+   case PACKET_READ_DELIM:
+   version = protocol_v0;
+   break;
+   case PACKET_READ_NORMAL:
+   version = determine_protocol_version_client(reader->line);
+   break;
+   }
+
+   switch (version) {
+   case protocol_v1:
+   /* Read the peeked version line */
+   packet_reader_read(reader);
+   break;
+   case protocol_v0:
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
+   return version;
+}
+
 static void parse_one_symref_info(struct string_list *symref, const char *val, 
int len)
 {
char *sym, *target;
@@ -109,60 +149,21 @@ static void annotate_refs_with_symref_info(struct ref 
*ref)
string_list_clear(, 0);
 }
 
-/*
- * Read one line of a server's ref advertisement into packet_buffer.
- */
-static int read_remote_ref(int in, char **src_buf, size_t *src_len,
-  int *responded)
+static void process_capabilities(const char *line, int *len)
 {
-   int len = packet_read(in, src_buf, src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   const char *arg;
-   if (len < 0)
-   die_initial_contact(*responded);
-   if (len > 4 && skip_prefix(packet_buffer, "ERR ", ))
-   die("remote error: %s", arg);
-
-   *responded = 1;
-
-   return len;
-}
-
-#define EXPECTING_PROTOCOL_VERSION 0
-#define EXPECTING_FIRST_REF 1
-#define EXPECTING_REF 2
-#define EXPECTING_SHALLOW 3
-
-/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
-static int process_protocol_version(void)
-{
-   switch (determine_protocol_version_client(packet_buffer)) {
-   case protocol_v1:
-   return 1;
-   case protocol_v0:
-   return 0;
-   default:
-   die("server is speaking an unknown protocol");
-   }
-}
-
-static void process_capabilities(int *len)
-{
-   int nul_location = strlen(packet_buffer);
+   int nul_location = strlen(line);
if (nul_location == *len)
return;
-   server_capabilities = xstrdup(packet_buffer + nul_location + 1);
+   server_capabilities = xstrdup(line + nul_location + 1);
*len = nul_location;
 }
 
-static int process_dummy_ref(void)
+static int process_dummy_ref(const char *line)
 {
struct object_id oid;
const char *name;
 
-   if (parse_oid_hex(packet_buffer, , ))
+   if (parse_oid_hex(line, , ))
return 0;
if (*name != ' ')
return 0;
@@ -171,20 +172,20 @@ static int process_dummy_ref(void)
return !oidcmp(_oid, ) && 

[PATCH v6 02/35] pkt-line: allow peeking a packet line without consuming it

2018-03-15 Thread Brandon Williams
Sometimes it is advantageous to be able to peek the next packet line
without consuming it (e.g. to be able to determine the protocol version
a server is speaking).  In order to do that introduce 'struct
packet_reader' which is an abstraction around the normal packet reading
logic.  This enables a caller to be able to peek a single line at a time
using 'packet_reader_peek()' and having a caller consume a line by
calling 'packet_reader_read()'.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 50 ++
 pkt-line.h | 58 ++
 2 files changed, 108 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index db2fb29ac3..1881dc8813 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -400,3 +400,53 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf 
*sb_out)
}
return sb_out->len - orig_len;
 }
+
+/* Packet Reader Functions */
+void packet_reader_init(struct packet_reader *reader, int fd,
+   char *src_buffer, size_t src_len,
+   int options)
+{
+   memset(reader, 0, sizeof(*reader));
+
+   reader->fd = fd;
+   reader->src_buffer = src_buffer;
+   reader->src_len = src_len;
+   reader->buffer = packet_buffer;
+   reader->buffer_size = sizeof(packet_buffer);
+   reader->options = options;
+}
+
+enum packet_read_status packet_reader_read(struct packet_reader *reader)
+{
+   if (reader->line_peeked) {
+   reader->line_peeked = 0;
+   return reader->status;
+   }
+
+   reader->status = packet_read_with_status(reader->fd,
+>src_buffer,
+>src_len,
+reader->buffer,
+reader->buffer_size,
+>pktlen,
+reader->options);
+
+   if (reader->status == PACKET_READ_NORMAL)
+   reader->line = reader->buffer;
+   else
+   reader->line = NULL;
+
+   return reader->status;
+}
+
+enum packet_read_status packet_reader_peek(struct packet_reader *reader)
+{
+   /* Only allow peeking a single line */
+   if (reader->line_peeked)
+   return reader->status;
+
+   /* Peek a line by reading it and setting peeked flag */
+   packet_reader_read(reader);
+   reader->line_peeked = 1;
+   return reader->status;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 099b26b95f..11b04f026f 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -112,6 +112,64 @@ char *packet_read_line_buf(char **src_buf, size_t 
*src_len, int *size);
  */
 ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
 
+struct packet_reader {
+   /* source file descriptor */
+   int fd;
+
+   /* source buffer and its size */
+   char *src_buffer;
+   size_t src_len;
+
+   /* buffer that pkt-lines are read into and its size */
+   char *buffer;
+   unsigned buffer_size;
+
+   /* options to be used during reads */
+   int options;
+
+   /* status of the last read */
+   enum packet_read_status status;
+
+   /* length of data read during the last read */
+   int pktlen;
+
+   /* the last line read */
+   const char *line;
+
+   /* indicates if a line has been peeked */
+   int line_peeked;
+};
+
+/*
+ * Initialize a 'struct packet_reader' object which is an
+ * abstraction around the 'packet_read_with_status()' function.
+ */
+extern void packet_reader_init(struct packet_reader *reader, int fd,
+  char *src_buffer, size_t src_len,
+  int options);
+
+/*
+ * Perform a packet read and return the status of the read.
+ * The values of 'pktlen' and 'line' are updated based on the status of the
+ * read as follows:
+ *
+ * PACKET_READ_ERROR: 'pktlen' is set to '-1' and 'line' is set to NULL
+ * PACKET_READ_NORMAL: 'pktlen' is set to the number of bytes read
+ *'line' is set to point at the read line
+ * PACKET_READ_FLUSH: 'pktlen' is set to '0' and 'line' is set to NULL
+ */
+extern enum packet_read_status packet_reader_read(struct packet_reader 
*reader);
+
+/*
+ * Peek the next packet line without consuming it and return the status.
+ * The next call to 'packet_reader_read()' will perform a read of the same line
+ * that was peeked, consuming the line.
+ *
+ * Peeking multiple times without calling 'packet_reader_read()' will return
+ * the same result.
+ */
+extern enum packet_read_status packet_reader_peek(struct packet_reader 
*reader);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 06/35] transport: use get_refs_via_connect to get refs

2018-03-15 Thread Brandon Williams
Remove code duplication and use the existing 'get_refs_via_connect()'
function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
'git_transport_push()'.

Signed-off-by: Brandon Williams 
---
 transport.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/transport.c b/transport.c
index fc802260f6..8e87790962 100644
--- a/transport.c
+++ b/transport.c
@@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
args.cloning = transport->cloning;
args.update_shallow = data->options.update_shallow;
 
-   if (!data->got_remote_heads) {
-   connect_setup(transport, 0);
-   get_remote_heads(data->fd[0], NULL, 0, _tmp, 0,
-NULL, >shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   refs_tmp = get_refs_via_connect(transport, 0);
 
refs = fetch_pack(, data->fd, data->conn,
  refs_tmp ? refs_tmp : transport->remote_refs,
@@ -541,14 +537,8 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
struct send_pack_args args;
int ret;
 
-   if (!data->got_remote_heads) {
-   struct ref *tmp_refs;
-   connect_setup(transport, 1);
-
-   get_remote_heads(data->fd[0], NULL, 0, _refs, REF_NORMAL,
-NULL, >shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   get_refs_via_connect(transport, 1);
 
memset(, 0, sizeof(args));
args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 01/35] pkt-line: introduce packet_read_with_status

2018-03-15 Thread Brandon Williams
The current pkt-line API encodes the status of a pkt-line read in the
length of the read content.  An error is indicated with '-1', a flush
with '0' (which can be confusing since a return value of '0' can also
indicate an empty pkt-line), and a positive integer for the length of
the read content otherwise.  This doesn't leave much room for allowing
the addition of additional special packets in the future.

To solve this introduce 'packet_read_with_status()' which reads a packet
and returns the status of the read encoded as an 'enum packet_status'
type.  This allows for easily identifying between special and normal
packets as well as errors.  It also enables easily adding a new special
packet in the future.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 51 +--
 pkt-line.h | 16 
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 2827ca772a..db2fb29ac3 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -280,28 +280,39 @@ static int packet_length(const char *linelen)
return (val < 0) ? val : (val << 8) | hex2chr(linelen + 2);
 }
 
-int packet_read(int fd, char **src_buf, size_t *src_len,
-   char *buffer, unsigned size, int options)
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
+   size_t *src_len, char *buffer,
+   unsigned size, int *pktlen,
+   int options)
 {
-   int len, ret;
+   int len;
char linelen[4];
 
-   ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
-   if (ret < 0)
-   return ret;
+   if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0) {
+   *pktlen = -1;
+   return PACKET_READ_EOF;
+   }
+
len = packet_length(linelen);
-   if (len < 0)
+
+   if (len < 0) {
die("protocol error: bad line length character: %.4s", linelen);
-   if (!len) {
+   } else if (!len) {
packet_trace("", 4, 0);
-   return 0;
+   *pktlen = 0;
+   return PACKET_READ_FLUSH;
+   } else if (len < 4) {
+   die("protocol error: bad line length %d", len);
}
+
len -= 4;
-   if (len >= size)
+   if ((unsigned)len >= size)
die("protocol error: bad line length %d", len);
-   ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
-   if (ret < 0)
-   return ret;
+
+   if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) 
{
+   *pktlen = -1;
+   return PACKET_READ_EOF;
+   }
 
if ((options & PACKET_READ_CHOMP_NEWLINE) &&
len && buffer[len-1] == '\n')
@@ -309,7 +320,19 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 
buffer[len] = 0;
packet_trace(buffer, len, 0);
-   return len;
+   *pktlen = len;
+   return PACKET_READ_NORMAL;
+}
+
+int packet_read(int fd, char **src_buffer, size_t *src_len,
+   char *buffer, unsigned size, int options)
+{
+   int pktlen = -1;
+
+   packet_read_with_status(fd, src_buffer, src_len, buffer, size,
+   , options);
+
+   return pktlen;
 }
 
 static char *packet_read_line_generic(int fd,
diff --git a/pkt-line.h b/pkt-line.h
index 3dad583e2d..099b26b95f 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -65,6 +65,22 @@ int write_packetized_from_buf(const char *src_in, size_t 
len, int fd_out);
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
*buffer, unsigned size, int options);
 
+/*
+ * Read a packetized line into a buffer like the 'packet_read()' function but
+ * returns an 'enum packet_read_status' which indicates the status of the read.
+ * The number of bytes read will be assigined to *pktlen if the status of the
+ * read was 'PACKET_READ_NORMAL'.
+ */
+enum packet_read_status {
+   PACKET_READ_EOF,
+   PACKET_READ_NORMAL,
+   PACKET_READ_FLUSH,
+};
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
+   size_t *src_len, char *buffer,
+   unsigned size, int *pktlen,
+   int options);
+
 /*
  * Convenience wrapper for packet_read that is not gentle, and sets the
  * CHOMP_NEWLINE option. The return value is NULL for a flush packet,
-- 
2.16.2.804.g6dcf76e118-goog



[PATCH v6 00/35] protocol version 2

2018-03-15 Thread Brandon Williams
Fixed the protocol-v2.txt technical document so that it builds when
running "make doc".

Brandon Williams (35):
  pkt-line: introduce packet_read_with_status
  pkt-line: allow peeking a packet line without consuming it
  pkt-line: add delim packet support
  upload-pack: convert to a builtin
  upload-pack: factor out processing lines
  transport: use get_refs_via_connect to get refs
  connect: convert get_remote_heads to use struct packet_reader
  connect: discover protocol version outside of get_remote_heads
  transport: store protocol version
  protocol: introduce enum protocol_version value protocol_v2
  test-pkt-line: introduce a packet-line test helper
  serve: introduce git-serve
  ls-refs: introduce ls-refs server command
  connect: request remote refs using v2
  transport: convert get_refs_list to take a list of ref prefixes
  transport: convert transport_get_remote_refs to take a list of ref
prefixes
  ls-remote: pass ref prefixes when requesting a remote's refs
  fetch: pass ref prefixes when fetching
  push: pass ref prefixes when pushing
  upload-pack: introduce fetch server command
  fetch-pack: perform a fetch using v2
  fetch-pack: support shallow requests
  connect: refactor git_connect to only get the protocol version once
  connect: don't request v2 when pushing
  transport-helper: remove name parameter
  transport-helper: refactor process_connect_service
  transport-helper: introduce stateless-connect
  pkt-line: add packet_buf_write_len function
  remote-curl: create copy of the service name
  remote-curl: store the protocol version the server responded with
  http: allow providing extra headers for http requests
  http: don't always add Git-Protocol header
  http: eliminate "# service" line when using protocol v2
  remote-curl: implement stateless-connect command
  remote-curl: don't request v2 when pushing

 .gitignore  |   1 +
 Documentation/Makefile  |   1 +
 Documentation/gitremote-helpers.txt |  32 ++
 Documentation/technical/protocol-v2.txt | 395 +++
 Makefile|   7 +-
 builtin.h   |   2 +
 builtin/clone.c |   2 +-
 builtin/fetch-pack.c|  20 +-
 builtin/fetch.c |  21 +-
 builtin/ls-remote.c |  15 +-
 builtin/receive-pack.c  |   6 +
 builtin/remote.c|   2 +-
 builtin/send-pack.c |  20 +-
 builtin/serve.c |  30 ++
 builtin/upload-pack.c   |  74 +++
 connect.c   | 364 ++
 connect.h   |   7 +
 fetch-pack.c| 339 -
 fetch-pack.h|   4 +-
 git.c   |   2 +
 http-backend.c  |   8 +-
 http.c  |  25 +-
 http.h  |   7 +
 ls-refs.c   |  96 
 ls-refs.h   |  10 +
 pkt-line.c  | 133 -
 pkt-line.h  |  78 +++
 protocol.c  |   2 +
 protocol.h  |   1 +
 refs.c  |  14 +
 refs.h  |   7 +
 remote-curl.c   | 280 ++-
 remote.h|  11 +-
 serve.c | 257 ++
 serve.h |  15 +
 t/helper/test-pkt-line.c|  64 +++
 t/t5701-git-serve.sh| 176 +++
 t/t5702-protocol-v2.sh  | 273 +++
 transport-helper.c  |  87 ++--
 transport-internal.h|  11 +-
 transport.c | 130 +++--
 transport.h |  18 +-
 upload-pack.c   | 615 ++--
 upload-pack.h   |  23 +
 44 files changed, 3317 insertions(+), 368 deletions(-)
 create mode 100644 Documentation/technical/protocol-v2.txt
 create mode 100644 builtin/serve.c
 create mode 100644 builtin/upload-pack.c
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100644 t/helper/test-pkt-line.c
 create mode 100755 t/t5701-git-serve.sh
 create mode 100755 t/t5702-protocol-v2.sh
 create mode 100644 upload-pack.h


base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
-- 
2.16.2.804.g6dcf76e118-goog



Re: [PATCH v5 01/35] pkt-line: introduce packet_read_with_status

2018-03-15 Thread Brandon Williams
On 03/14, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > +/*
> > + * Read a packetized line into a buffer like the 'packet_read()' function 
> > but
> > + * returns an 'enum packet_read_status' which indicates the status of the 
> > read.
> > + * The number of bytes read will be assigined to *pktlen if the status of 
> > the
> > + * read was 'PACKET_READ_NORMAL'.
> > + */
> > +enum packet_read_status {
> > +   PACKET_READ_EOF,
> > +   PACKET_READ_NORMAL,
> > +   PACKET_READ_FLUSH,
> > +};
> 
> EOF was -1 and NORMAL was 0 in the previous round; do we need to
> read through all the invocations of functions that return this type
> and make sure there is no "while (such_a_function())" that used to see
> if we read NORMAL that is left un-updated?
> 
> I just have gone thru all the hits from
> 
>  $ git grep -n -e packet_erad_with_status -e packet_reader_read -e 
> packet_reader_peek
> 
> There are a few
> 
>   switch (packet_reader_peek())
> 
> which by definition we do not have to worry about.  Then majority of
> what could be problematic are of the form:
> 
>   while (packet_reader_read() == PACKET_READ_NORMAL)
> 
> and they were this way even in the previous version, so it seems
> quite alright.
> 
> Will replace.  Thanks.

A reviewer in the previous round found that it was unnecessary to have
EOF start at -1, so per their comments I got rid of that.

-- 
Brandon Williams


Re: [PATCH v5 12/35] serve: introduce git-serve

2018-03-15 Thread Brandon Williams
On 03/14, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Introduce git-serve, the base server for protocol version 2.
> > ...
> >  Documentation/Makefile  |   1 +
> >  Documentation/technical/protocol-v2.txt | 174 +
> 
> asciidoc: ERROR: protocol-v2.txt: line 20: only book doctypes can contain 
> level 0 sections
> asciidoc: ERROR: protocol-v2.txt: line 374: [blockdef-listing] missing 
> closing delimiter
> Makefile:368: recipe for target 'technical/protocol-v2.html' failed
> 
> I'll redo today's integration cycle to see if I can move this topic
> to a late part of 'pu', so that I can at least keep the part of 'pu'
> that is beyond what is in 'next' and still usable a bit larger.  The
> bw/protocol-v2 topic has been merged immediately above the topics
> that are already in 'next' for the past week or so, but I cannot
> afford to leave the build broken for majority of merges of 'pu'.
> 

Turns out I don't know how to write properly formed asciidocs, sorry
about that.  I fixed up the docs in my series locally and I'll send out
a v5 in just a bit.

-- 
Brandon Williams


Re: [PATCH v6 00/14] Serialized Git Commit Graph

2018-03-15 Thread Johannes Schindelin
Hi Junio,

On Wed, 14 Mar 2018, Junio C Hamano wrote:

> A few patches add trailing blank lines and other whitespace
> breakages, which will stop my "git merge" later to 'next' and down,
> as I have a pre-commit hook to catch them.

I wonder how you cope with the intentional "whitespace breakage" caused by
a TAB after HS in my recreate-merges patch series...

> Here is the output from my "git am -s" session.
> 
> Applying: csum-file: rename hashclose() to finalize_hashfile()
> Applying: csum-file: refactor finalize_hashfile() method
> .git/rebase-apply/patch:109: new blank line at EOF.

Stolee, you definitely want to inspect those changes (`git log --check`
was introduced to show you whitespace problems). If all of those
whitespace issues are unintentional, you can fix them using `git rebase
--whitespace=fix` in the most efficient way.

Ciao,
Dscho


Re: Why don't we symlink libexec/git-core/* to bin/git?

2018-03-15 Thread Johannes Schindelin
Hi Junio,

On Wed, 14 Mar 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
> 
> > Is the only reason we're still installing these binaries like git-add in
> > libexec for compatibility with some old installation where that was
> > added to the $PATH, shouldn't we (and I can write this patch) also have
> > a toggle for "I want the modern install method" which would not install
> > any of these binaries like git-add at all?
> >
> > Then the libexec/ dir would only contain things that we really do need
> > the bin/git to dispatch to, like git-svn, git-bisect etc.
> 
> Removing them by default was proposed and failed; see this thread
> for example:
> 
>   https://public-inbox.org/git/7vr68b8q9p@gitster.siamese.dyndns.org/#t

Let's add a very, very important piece of information that was missing:
this thread is from late August 2008. We had deprecated the dashed form
"only for a couple of months" by then (we removed the dashed form from the
completions end of April 2008 in 799596a5d06 (completion: remove use of
dashed git commands, 2008-04-20) for example).

> If a packager ships Git without these copies in libexec, that is not
> the Git that promised users that prepending the $(git --exec-path)
> aka GIT_EXEC_PATH to your $PATH is a valid way to preserve their
> older script.
> 
> I do not think anybody actually minds to have an option to omit them
> as long as the users understand the consequence (i.e. old promises
> broken) and know they are not affected (i.e. they do not have
> scripts that rely on the old promise).

I am glad that you changed your stance from "without dashed builtins, your
Git is broken" to this much more tenable position to state that it may
break super-old promises whose use we discouraged already a full decade
ago.

To add some interesting information to this: in MinGit (the light-weight
"Git for applications" we bundle to avoid adding a hefty 230MB to any
application that wants to bundle Git for Windows), we simply ignored that
old promise. We do support hooks written as Unix shell scripts in MinGit,
and we have not had a single report since offering MinGit with v2.9.2 on
July 16th, 2016, that it broke anybody's scripts, so it seems that users
are more sensible than our promises ;-)

Not requiring Git to install any type of link makes it even possible to
bundle it as .zip file (which, let's face it, is the de facto standard for
cross-platform archiving).

Ciao,
Dscho

Re: [PATCH 2/2] fetch-pack: do not check links for partial fetch

2018-03-15 Thread Junio C Hamano
Jonathan Tan  writes:

> Our only definition (currently) for the "partial" fetch boundary is
> whether an object in a promisor packfile (a packfile obtained from the
> promisor remote) references it, so I think that checking for crossing a
> "partial" fetch boundary does not add anything.

Ah, that's a perfect answer. When we find a link that leads to a
missing object in a pack after receiving it from a remote repository
that gives us promises, the missing object must be available from
the remote repository by definition.  Makes perfect sense ;-)

Thanks.


[PATCH v2] filter-branch: return 2 when nothing to rewrite

2018-03-15 Thread Michele Locati
Using the --state-branch option allows us to perform incremental filtering.
This may lead to having nothing to rewrite in subsequent filtering, so we need
a way to recognize this case.
So, let's exit with 2 instead of 1 when this "error" occurs.

Signed-off-by: Michele Locati 
---
 Documentation/git-filter-branch.txt | 8 
 git-filter-branch.sh| 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-filter-branch.txt 
b/Documentation/git-filter-branch.txt
index 3a52e4dce..b63404318 100644
--- a/Documentation/git-filter-branch.txt
+++ b/Documentation/git-filter-branch.txt
@@ -222,6 +222,14 @@ this purpose, they are instead rewritten to point at the 
nearest ancestor that
 was not excluded.
 
 
+EXIT STATUS
+---
+
+On success, the exit status is `0`.  If the filter can't find any commits to
+rewrite, the exit status is `2`.  On any other error, the exit status may be
+any other non-zero value.
+
+
 Examples
 
 
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 1b7e4b2cd..c285fdb90 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -310,7 +310,7 @@ git rev-list --reverse --topo-order --default HEAD \
die "Could not get the commits"
 commits=$(wc -l <../revs | tr -d " ")
 
-test $commits -eq 0 && die "Found nothing to rewrite"
+test $commits -eq 0 && die_with_status 2 "Found nothing to rewrite"
 
 # Rewrite the commits
 report_progress ()
-- 
2.16.2.windows.1



Re: [PATCH 3/3] Makefile: optionally symlink libexec/git-core binaries to bin/git

2018-03-15 Thread Johannes Schindelin
Hi Linus,

On Wed, 14 Mar 2018, Linus Torvalds wrote:

> On Wed, Mar 14, 2018 at 3:14 AM, Ævar Arnfjörð Bjarmason
>  wrote:
> > On Wed, Mar 14 2018, Johannes Sixt jotted:
> >>
> >> It is important to leave the default at hard-linking the binaries,
> >> because on Windows symbolic links are second class citizens (they
> >> require special privileges and there is a distinction between link
> >> targets being files or directories). Hard links work well.
> >
> > Yeah makes sense. I just want to add this as an option, and think if
> > it's proven to be un-buggy we could probably turn it on by default on
> > the *nix's if people prefer that, but yeah, we'll definitely need the
> > uname detection.
> 
> I definitely would prefer to make symlinks the default on unix.
> 
> It's what we used to do (long long ago), and as you pointed out, it's
> a lot clearer what's going on too when you don't have to look at inode
> numbers and link counts.
> 
> Forcing hardlinking everywhere by default just because Windows
> filesystems suck donkey ass through a straw is not the right thing
> either.

The most sensible thing, of course, would be to *not* link the builtins at
all. I mean, we deprecated the dashed form (which was a design mistake,
whether you admit it or not) a long time ago.

Ciao,
Johannes

Re: [GSoC] [PATCH] test: avoid pipes in git related commands for test suite

2018-03-15 Thread Junio C Hamano
Eric Sunshine  writes:

> Thanks for presenting an opposing opinion. While I understand your
> position, the reason for my suggested transformation is that if the
> patch already transformed the code in the way suggested, it would
> increase my confidence, as a reviewer, that the patch author had
> _studied_ and _understood_ the code. Increased confidence is
> especially important for mechanical transformations since -- as seen
> in the unsnipped review comment below -- blindly-applied mechanical
> transformations can be suboptimal or outright incorrect.
>
> It's also the sort of review comment I would make even to very
> seasoned project participants[1].
>
> [1]: 
> https://public-inbox.org/git/capig+cqlmyqerhpxvzhmy7gapnbe25h_kosws-zjubo4bru...@mail.gmail.com/

Yes, it is a good example that mechanical conversions are often
mind-numbing and make even seasoned participants miss trivially
obvious improvement opportunities ;-)

It however is OK to be more lenient to newer participants and allow
deferring such "while at it, make it right" on top of "minimally
required for correctness", in order to encourage them by getting
something to the tree early ;-)


Re: [PATCH 3/3] Makefile: optionally symlink libexec/git-core binaries to bin/git

2018-03-15 Thread Johannes Schindelin
Hi,

On Wed, 14 Mar 2018, Johannes Sixt wrote:

> Am 13.03.2018 um 21:39 schrieb Ævar Arnfjörð Bjarmason:
> > Add a INSTALL_SYMLINKS option which if enabled, changes the default
> > hardlink installation method to one where the relevant binaries in
> > libexec/git-core are symlinked back to ../../bin, instead of being
> > hardlinked.
> > 
> > This new option also overrides the behavior of the existing
> > NO_*_HARDLINKS variables which in some cases would produce symlinks
> > within to libexec/, e.g. "git-add" symlinked to "git" which would be
> > copy of the "git" found in bin/, now "git-add" in libexec/ is always
> > going to be symlinked to the "git" found in the bin/ directory.
> 
> It is important to leave the default at hard-linking the binaries, because on
> Windows symbolic links are second class citizens (they require special
> privileges and there is a distinction between link targets being files or
> directories). Hard links work well.

To clarify: symbolic links do not exist in Windows Vista and earlier.
(There exists a concept called Junction points, but it has subtly
different semantics than symbolic links, different enough that we cannot
emulate symbolic links via Junctions).

Windows 7 and later do have symbolic links, but they require elevated
privileges to be created, as Hannes pointed out.

Since Windows 10 version 1703 (Creators Update), enabling Developer Mode
will disable this restriction and allow creating symlinks without UAC
elevation. See
https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/ for
details.

In Git for Windows, I originally missed the memo and forgot to add support
for the special flag, but since Git for Windows v2.13.1, users can create
symbolic links without administrators' privileges on Windows 10 (Creators
Update or later) in Developer Mode.

Of course, we still support Windows all the way back to Vista, so the
default is still: no symbolic links.

Thanks for your attention,
Dscho

RE: [bug] git stash push {dir-pathspec} wipes untracked files

2018-03-15 Thread Randall S. Becker


> -Original Message-
> From: git-ow...@vger.kernel.org  On Behalf
> Of Junio C Hamano
> Sent: March 15, 2018 12:52 PM
> To: Jake Stine 
> Cc: git@vger.kernel.org
> Subject: Re: [bug] git stash push {dir-pathspec} wipes untracked files
> 
> Jake Stine  writes:
> 
> > Hi, I ran into what I believe is a bug today.  I’m using primarily Git
> > for Windows 2.16.2 and also reproduced the behavior on Git for Windows
> > 2.15.1 and Git 2.14.1 on Ubuntu:
> >
> > Given any repository with at least one subdirectory:
> >
> > 1.   Create some untracked files in the subdir
> > 2.   Modify a tracked file in the subdir
> > 3.   Execute `git stash push subdir`
> > 4.   The untracked files will be removed, without warning.
> >
> > `git stash push` behaves as-expcted and does not touch untracked
> > files.  It’s only when a directory tree is specified as [pathspec]
> > that the problem occurs.
> 
> I wonder if this is the same as the topic on this thread.
> 
>   https://public-
> inbox.org/git/CA+HNv10i7AvWXjrQjxxy1LNJTmhr7LE4TwxhHUYBiWtmJCOf_
> a...@mail.gmail.com/
> 
> What is curious is that the fix bba067d2 ("stash: don't delete untracked files
> that match pathspec", 2018-01-06) appeared first in 2.16.2, on which
> Windows 2.16.2 is supposed to be built upon.
> 
> > Here's the precise reproduction case executed on a linux box:
> 
> This does not reproduce for me with v2.16.2-17-g38e79b1fda (the tip of
> 'maint'); I do not have an  install of vanilla v2.16.2 handy, but I suspect 
> v2.16.2
> would work just fine, too.

This does not reproduce for me either in v2.16.2.9-g3dbadef9f (which is the 
NonStop port of 2.16.2)

On branch master
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

modified:   subdir/a

Untracked files:
  (use "git add ..." to include in what will be committed)

subdir/b

no changes added to commit (use "git add" and/or "git commit -a")
/home/randall/foot: git stash push subdir
Saved working directory and index state WIP on master: b772270 i
/home/randall/foot: ls subdir
a  b
/home/randall/foot: git status
On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

subdir/b

nothing added to commit but untracked files present (use "git add" to track)



Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-15 Thread Duy Nguyen
On Mon, Mar 12, 2018 at 8:30 PM, Ævar Arnfjörð Bjarmason
 wrote:
> We already have pack.packSizeLimit, perhaps we could call this
> e.g. gc.keepPacksSize=2GB?

I'm OK either way. The "base pack" concept comes from the
"--keep-base-pack" option where we do keep _one_ base pack. But gc
config var has a slightly different semantics when it can keep
multiple packs.

> Finally I wonder if there should be something equivalent to
> gc.autoPackLimit for this. I.e. with my proposed semantics above it's
> possible that we end up growing forever, i.e. I could have 1000 2GB
> packs and then 50 very small packs per gc.autoPackLimit.
>
> Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
> gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick the
> two smallest one and not issue a --keep-pack for those, although then
> maybe our memory use would spike past the limit.
>
> I don't know, maybe we can leave that for later, but I'm quite keen to
> turn the top-level config variable into something that just considers
> size instead of "base" if possible, and it seems we're >95% of the way
> to that already with this patch.

At least I will try to ignore gc.keepPacksSize if all packs are kept
because of it. That repack run will hurt. But then we're back to one
giant pack and plenty of small packs that will take some time to grow
up to 2GB again.

> Finally, I don't like the way the current implementation conflates a
> "size" variable with auto detecting the size from memory, leaving no way
> to fallback to the auto-detection if you set it manually.
>
> I think we should split out the auto-memory behavior into another
> variable, and also make the currently hardcoded 50% of memory
> configurable.
>
> That way you could e.g. say you'd always like to keep 2GB packs, but if
> you happen to have ended up with a 1GB pack and it's time to repack, and
> you only have 500MB free memory on that system, it would keep the 1GB
> one until such time as we have more memory.

I don't calculate based on free memory (it's tricky to get that right
on linux) but physical memory. If you don't have enough memory
according to this formula, you won't until you add more memory sticks.

>
> Actually maybe that should be a "if we're that low on memory, forget
> about GC for now" config, but urgh, there's a lot of potential
> complexity to be handled here...

Yeah I think what you want is a hook. You can make custom rules then.
We already have pre-auto-gc hook and could pretty much do what you
want without pack-objects memory estimation. But if you want it, maybe
we can export the info to the hook somehow.
-- 
Duy


Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-15 Thread Junio C Hamano
Duy Nguyen  writes:

> On Thu, Mar 15, 2018 at 2:34 AM, Junio C Hamano  wrote:
>> * nd/pack-objects-pack-struct (2018-03-05) 9 commits
>> ...
>>  Will merge to 'next'.
>
> Hold it. A reroll is coming. I'm a bit busy this week and can't really do 
> much.
>
>> * nd/worktree-prune (2018-03-06) 3 commits
>> ...
> Same.

Thanks, quick responses like this message really helps.


Re: [bug] git stash push {dir-pathspec} wipes untracked files

2018-03-15 Thread Junio C Hamano
Jake Stine  writes:

> Hi, I ran into what I believe is a bug today.  I’m using primarily Git
> for Windows 2.16.2 and also reproduced the behavior on Git for Windows
> 2.15.1 and Git 2.14.1 on Ubuntu:
>
> Given any repository with at least one subdirectory:
>
> 1.   Create some untracked files in the subdir
> 2.   Modify a tracked file in the subdir
> 3.   Execute `git stash push subdir`
> 4.   The untracked files will be removed, without warning.
>
> `git stash push` behaves as-expcted and does not touch untracked
> files.  It’s only when a directory tree is specified as [pathspec]
> that the problem occurs.

I wonder if this is the same as the topic on this thread.

  
https://public-inbox.org/git/ca+hnv10i7avwxjrqjxxy1lnjtmhr7le4twxhhuybiwtmjco...@mail.gmail.com/

What is curious is that the fix bba067d2 ("stash: don't delete
untracked files that match pathspec", 2018-01-06) appeared first in
2.16.2, on which Windows 2.16.2 is supposed to be built upon.

> Here's the precise reproduction case executed on a linux box:

This does not reproduce for me with v2.16.2-17-g38e79b1fda (the tip
of 'maint'); I do not have an  install of vanilla v2.16.2 handy, but
I suspect v2.16.2 would work just fine, too.

> jake@jake-VirtualBox:~/woot$ git --version
> git version 2.14.1
> ...
> The expected result is that when I do `ls subdir` the file
> "untracked.txt" still exists.  Alternatively, git stash should warn me
> before destroying my untracked files, and require I specify --force or
> similar to invoke destructive behavior.
>
>
> Thanks!
> Jake Stine


Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-03-15 Thread Duy Nguyen
On Mon, Mar 12, 2018 at 7:56 PM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Wed, Mar 07 2018, Junio C. Hamano jotted:
>
>> Duy Nguyen  writes:
 But to those who say "packs larger than this value is too big" via
 configuration, keeping only the largest of these above-threshold
 packs would look counter-intuitive, wouldn't it, I wonder?
>>>
>>> I think I'll just clarify this in the document. There may be a use
>>> case for keeping multiple large packs, but I don't see it (*). We can
>>> deal with it when it comes.
>>
>> When the project's history grows too much, a large pack that holds
>> its first 10 years of stuff, together with another one that holds
>> its second 20 years of stuff, may both be larger than the threshold
>> and want to be kept.  If we pick only the largest one, we would
>> explode the other one and repack together with loose objects.
>>
>> But realistically, those who would want to control the way in which
>> their repository is packed to such a degree are very likely to add
>> ".keep" files to these two packfiles themselves, so the above would
>> probably not a concern.  Perhaps we shouldn't do the "automatically
>> pick the largest one and exclude from repacking" when there is a
>> packfile that is marked with ".keep"?
>
> As someone who expects to use this (although hopefully in slightly
> modified form), it's very useful if we can keep the useful semantics in
> gc.* config values without needing some external job finding repos andis is
> creating *.keep files to get custom behavior.
>
> E.g. I have the use-case of wanting to set this on servers that I know
> are going to be used for cloning some big repos in user's ~ directory
> manually, so if I can set something sensible in /etc/gitconfig that's
> great, but it sucks a lot more to need to write some cronjob that goes
> hunting for repos in those ~ directories and tweaks *.keep files.

If this is about .gc.bigBasePackThreshold keeping all packs larger
than the threshold, then yes it will be so in the reroll.
-- 
Duy


[PATCH v2 2/3] worktree: delete dead code

2018-03-15 Thread Nguyễn Thái Ngọc Duy
This "link" was a feature in early iterations of multiple worktree
functionality for some reason it was dropped [1]. Since nobody creates
this "link", there's no need to check it.

This is mostly used to let the user moves a worktree manually [2]. If
you move a worktree within the same file system, this hard link count
lets us know the worktree is still there even if we don't know where it
is.

We support 'worktree move' now and don't need this anymore.

[1] last appearance in v4 message-id:
1393675983-3232-25-git-send-email-pclo...@gmail.com
and the reason in v5 was "revisit later", message-id:
1394246900-31535-1-git-send-email-pclo...@gmail.com
[2] 23af91d102 (prune: strategies for linked checkouts - 2014-11-30)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/gitrepository-layout.txt | 5 -
 builtin/worktree.c | 8 
 2 files changed, 13 deletions(-)

diff --git a/Documentation/gitrepository-layout.txt 
b/Documentation/gitrepository-layout.txt
index c60bcad44a..e85148f05e 100644
--- a/Documentation/gitrepository-layout.txt
+++ b/Documentation/gitrepository-layout.txt
@@ -275,11 +275,6 @@ worktrees//locked::
or manually by `git worktree prune`. The file may contain a string
explaining why the repository is locked.
 
-worktrees//link::
-   If this file exists, it is a hard link to the linked .git
-   file. It is used to detect if the linked repository is
-   manually removed.
-
 SEE ALSO
 
 linkgit:git-init[1],
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 670555dedd..e55edf2aa5 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -101,15 +101,7 @@ static int prune_worktree(const char *id, struct strbuf 
*reason)
}
path[len] = '\0';
if (!file_exists(path)) {
-   struct stat st_link;
free(path);
-   /*
-* the repo is moved manually and has not been
-* accessed since?
-*/
-   if (!stat(git_path("worktrees/%s/link", id), _link) &&
-   st_link.st_nlink > 1)
-   return 0;
if (st.st_mtime <= expire) {
strbuf_addf(reason, _("Removing worktrees/%s: gitdir 
file points to non-existent location"), id);
return 1;
-- 
2.16.2.903.gd04caf5039



  1   2   >