Re: [PATCH v2] add -p: fix 2.17.0-rc* regression due to moved code

2018-03-31 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index d190469cd8..c1f52e457f 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -1561,13 +1561,13 @@ sub patch_update_file {
>   elsif ($line =~ m|^/(.*)|) {
>   my $regex = $1;
>   unless ($other =~ m|/|) {
>   error_msg __("No other hunks to 
> search\n");
>   next;
>   }
> - if ($1 eq "") {
> + if ($regex eq "") {
>   print colored $prompt_color, __("search 
> for regex? ");

Ah.  That "unless ... { }" thing is what was inserted recently, and
the patch is an obvious fix once the problem is pointed out.  Thanks
for a careful reading.

It makes me wonder what the original author, who already captured $1
in $regex, was thinking when he wrote the comparison with $1 there,
but that is OK---it happened a long time ago, back in early 2009.

Will apply.  Thanks.

>   $regex = <STDIN>;
>   if (defined $regex) {
>   chomp $regex;
>   }
>   }
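
For reference, the underlying Perl behaviour here: every successful match
resets the capture variables, so a later capture-less match leaves $1 undef,
while a lexical copy taken right after the first match survives. A quick
illustration (not from the thread):

$ perl -wE '"abc" =~ /(b)/; my $c = $1; "abc" =~ /c/; say $1 // "undef"; say $c'
undef
b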


Re: Git Question

2018-03-31 Thread Taylor Blau
On Sat, Mar 31, 2018 at 09:06:53PM -0500, Mark Wartman wrote:
> I saw this pretty exhaustive .gitignore list that a GitHub Help page
> linked to: 9257657.  Are
> these configurations from the list something one should install on the
> User/global level, or do you recommend managing it in each and every
> project repository?

If those appear helpful for your project(s), then sure; but this is up
to you.
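
If you want one ignore list that applies to everything you work on, the
usual route is a per-user excludes file, while project-specific patterns
live in the repository's own .gitignore. Roughly (the file name below is
just an example):

$ git config --global core.excludesFile ~/.gitignore_global
$ $EDITOR .gitignore && git add .gitignore   # per-repo, versioned with the project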

> Concerning these items below, when I said I get these when running a:
> git config --list
>
> credential.helper=osxkeychain
> filter.lfs.clean=git-lfs clean -- %f
> filter.lfs.smudge=git-lfs smudge -- %f
> filter.lfs.process=git-lfs filter-process
> filter.lfs.required=true
>
> I suspect that the GitHub GUI tool has installed them.  Do you find
> that possible?  I ask because I did not write them.

Yes, I believe that GitHub Desktop runs 'git lfs install', which adds
these to your configuration.
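
You can double-check where they ended up (they normally land in your
per-user config) with something like:

$ git config --global --get-regexp '^filter\.lfs\.'
filter.lfs.clean git-lfs clean -- %f
filter.lfs.smudge git-lfs smudge -- %f
filter.lfs.process git-lfs filter-process
filter.lfs.required true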


Thanks,
Taylor


Re: [OT] how does "git review --setup" figure out my username?

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 31 2018, Robert P. J. Day wrote:

> On Sat, 31 Mar 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Sat, Mar 31, 2018 at 9:04 PM, Robert P. J. Day  
>> wrote:
>> >
>> >   (technically not a git question, but i kind of need to know the
>> > answer to this quickly as i'm writing some documentation and this is
>> > something i have to explain.)
>> >
>> >   i cloned a repository (hyperledger fabric) which has a top-level
>> > .gitreview file:
>> >
>> >   [gerrit]
>> >   host=gerrit.hyperledger.org
>> >   port=29418
>> >   project=fabric
>> >
>> > and, as i read it, if i want to configure to use gerrit, an initial
>> > invocation of "git review --setup" should do that for me, which it
>> > appears to do, as it adds the following to .git/config:
>> >
>> >   [remote "gerrit"]
>> > url = ssh://rpj...@gerrit.hyperledger.org:29418/fabric
>> > fetch = +refs/heads/*:refs/remotes/gerrit/*
>> >
>> > and copies over the commit-msg hook. so far, so good.
>> >
>> >   but from where does it figure out the username (rpjday) to use when
>> > configuring that remote? i have no gerrit configuration in my
>> > .gitconfig file. however, i have configured gerrit at the hyperledger
>> > end to use my SSH key, which is associated with my linux foundation ID
>> > (rpjday) that i registered to start using that repo.
>> >
>> >   is that where it gets the username from?
>>
>> I've never used gerrit, but from my skimming of
>> https://www.mediawiki.org/wiki/Gerrit/git-review#Setting_up_git-review
>> and
>> https://www.mediawiki.org/wiki/Gerrit/Tutorial#Configuring_git-review
>> it seems (to me) to say that it simply tries if your local loginname
>> works on the remote. Is rpjday your loginname on this system?
>
>   yes, but it's just a fluke that i used the same user name in both
> places ... what if i hadn't? which one would it have selected?

Seems like a fairly common case, especially as gerrit tends to be used
in Big Corp setups where your username is the same.

According to the docs I linked to it'll ask you if it doesn't happen to
match, but as I've said I've never used it, so that may be wrong.
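
For what it's worth, git-review can also be told explicitly which username
to use instead of guessing from the local login name; as far as I recall
from its documentation that's:

$ git config --global gitreview.username rpjday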


Re: [OT] how does "git review --setup" figure out my username?

2018-03-31 Thread Robert P. J. Day
On Sat, 31 Mar 2018, Ævar Arnfjörð Bjarmason wrote:

> On Sat, Mar 31, 2018 at 9:04 PM, Robert P. J. Day  
> wrote:
> >
> >   (technically not a git question, but i kind of need to know the
> > answer to this quickly as i'm writing some documentation and this is
> > something i have to explain.)
> >
> >   i cloned a repository (hyperledger fabric) which has a top-level
> > .gitreview file:
> >
> >   [gerrit]
> >   host=gerrit.hyperledger.org
> >   port=29418
> >   project=fabric
> >
> > and, as i read it, if i want to configure to use gerrit, an initial
> > invocation of "git review --setup" should do that for me, which it
> > appears to do, as it adds the following to .git/config:
> >
> >   [remote "gerrit"]
> > url = ssh://rpj...@gerrit.hyperledger.org:29418/fabric
> > fetch = +refs/heads/*:refs/remotes/gerrit/*
> >
> > and copies over the commit-msg hook. so far, so good.
> >
> >   but from where does it figure out the username (rpjday) to use when
> > configuring that remote? i have no gerrit configuration in my
> > .gitconfig file. however, i have configured gerrit at the hyperledger
> > end to use my SSH key, which is associated with my linux foundation ID
> > (rpjday) that i registered to start using that repo.
> >
> >   is that where it gets the username from?
>
> I've never used gerrit, but from my skimming of
> https://www.mediawiki.org/wiki/Gerrit/git-review#Setting_up_git-review
> and
> https://www.mediawiki.org/wiki/Gerrit/Tutorial#Configuring_git-review
> it seems (to me) to say that it simply tries if your local loginname
> works on the remote. Is rpjday your loginname on this system?

  yes, but it's just a fluke that i used the same user name in both
places ... what if i hadn't? which one would it have selected?

rday

Re: [OT] how does "git review --setup" figure out my username?

2018-03-31 Thread Ævar Arnfjörð Bjarmason
On Sat, Mar 31, 2018 at 9:04 PM, Robert P. J. Day  wrote:
>
>   (technically not a git question, but i kind of need to know the
> answer to this quickly as i'm writing some documentation and this is
> something i have to explain.)
>
>   i cloned a repository (hyperledger fabric) which has a top-level
> .gitreview file:
>
>   [gerrit]
>   host=gerrit.hyperledger.org
>   port=29418
>   project=fabric
>
> and, as i read it, if i want to configure to use gerrit, an initial
> invocation of "git review --setup" should do that for me, which it
> appears to do, as it adds the following to .git/config:
>
>   [remote "gerrit"]
> url = ssh://rpj...@gerrit.hyperledger.org:29418/fabric
> fetch = +refs/heads/*:refs/remotes/gerrit/*
>
> and copies over the commit-msg hook. so far, so good.
>
>   but from where does it figure out the username (rpjday) to use when
> configuring that remote? i have no gerrit configuration in my
> .gitconfig file. however, i have configured gerrit at the hyperledger
> end to use my SSH key, which is associated with my linux foundation ID
> (rpjday) that i registered to start using that repo.
>
>   is that where it gets the username from?

I've never used gerrit, but from my skimming of
https://www.mediawiki.org/wiki/Gerrit/git-review#Setting_up_git-review
and https://www.mediawiki.org/wiki/Gerrit/Tutorial#Configuring_git-review
it seems (to me) to say that it simply tries if your local loginname
works on the remote. Is rpjday your loginname on this system?


[OT] how does "git review --setup" figure out my username?

2018-03-31 Thread Robert P. J. Day

  (technically not a git question, but i kind of need to know the
answer to this quickly as i'm writing some documentation and this is
something i have to explain.)

  i cloned a repository (hyperledger fabric) which has a top-level
.gitreview file:

  [gerrit]
  host=gerrit.hyperledger.org
  port=29418
  project=fabric

and, as i read it, if i want to configure to use gerrit, an initial
invocation of "git review --setup" should do that for me, which it
appears to do, as it adds the following to .git/config:

  [remote "gerrit"]
url = ssh://rpj...@gerrit.hyperledger.org:29418/fabric
fetch = +refs/heads/*:refs/remotes/gerrit/*

and copies over the commit-msg hook. so far, so good.

  but from where does it figure out the username (rpjday) to use when
configuring that remote? i have no gerrit configuration in my
.gitconfig file. however, i have configured gerrit at the hyperledger
end to use my SSH key, which is associated with my linux foundation ID
(rpjday) that i registered to start using that repo.

  is that where it gets the username from?

rday


Re: [PATCH v3 0/3] add -p: select individual hunk lines

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 30 2018, Phillip Wood wrote:

> On 29/03/18 19:32, Junio C Hamano wrote:
>> Phillip Wood  writes:
>>
>>> From: Phillip Wood 
>>>
>>> Since v2 I've updated the patches to use '-' instead of '^' to invert
>>> the selection to match the rest of add -i and clean -i.
>>>
>>> These patches build on top of the recount fixes in [1]. The commit
>>> message for the first patch describes the motivation:
>>>
>>> "When I end up editing hunks it is almost always because I want to
>>> stage a subset of the lines in the hunk. Doing this by editing the
>>> hunk is inconvenient and error prone (especially so if the patch is
>>> going to be reversed before being applied). Instead offer an option
>>> for add -p to stage individual lines. When the user presses 'l' the
>>> hunk is redrawn with labels by the insertions and deletions and they
>>> are prompted to enter a list of the lines they wish to stage. Ranges
>>> of lines may be specified using 'a-b' where either 'a' or 'b' may be
>>> omitted to mean all lines from 'a' to the end of the hunk or all lines
>>> from 1 upto and including 'b'."
>>
>> I haven't seen any review comments on this round, and as I am not a
>> heavy user of "add -i" interface (even though I admit that I
>> originally wrote it), I haven't had a chance to exercise the code
>> myself in the two weeks since the patches have been queued in my
>> tree.
>>
>> I am inclined to merge them to 'next' soonish, but please stop me
>> if anybody (including the original author) has further comments.
>>
>> Thanks.
>>
> Hi Junio, if no one else has any comments, then I think it's ready for
> next. I've not used this latest incarnation much but I've used the
> previous versions quite a bit.

First of all, thanks for working on this. Something like this is a
feature I've long wanted to have; until now I've just been manually using edit.

As for the code, one comment: For reasons of avoiding something like the
2.17.0-rc* bug I just sent a patch for, I think you should change your
use of the implicit $_ to something where you explicitly create lexical
variables instead.

It's bad style in Perl to use $_ for anything except a one-liner, and
similar to the $1 bug with your other patch, you'll get buggy code
(regardless of your use of local $_) if one of the functions you're
calling in these >10 line for-loops starts doing something to set $_
itself, as demonstrated by:

$ perl -wE 'sub foo { local $_; for (1..3) { bar(); say } } sub bar { $_ = $_ ** 2; } foo()'
1
4
9

Let's just name these variables, even if it wasn't for that caveat it
would still be a good idea, since for any non-trivial use of $_ you've
got to mentally keep track of what set $_ where, so it's hard to read.
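
E.g. the same shape with a named lexical loop variable is immune to the
callee clobbering $_ (an illustrative one-liner, not from the patch):

$ perl -wE 'sub foo { for my $n (1..3) { bar($n); say $n } } sub bar { $_ = $_[0] ** 2 } foo()'
1
2
3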

As for the implementation, I *want* to love this, but it seems the way
it works is fatally flawed. Consider: *the* use-case I've had for
something like this (maybe yours differs?) is something where I do e.g.:

$ perl -pi -e 's/git/Git/g' README.md

Which gives me (among other things):

-See [Documentation/gittutorial.txt][] to get started, then see
-[Documentation/giteveryday.txt][] for a useful minimum set of commands, and
-Documentation/git-.txt for documentation of each command.
-If git has been correctly installed, then the tutorial can also be
-read with `man gittutorial` or `git help tutorial`, and the
-documentation of each command with `man git-` or `git help
+See [Documentation/Gittutorial.txt][] to get started, then see
+[Documentation/Giteveryday.txt][] for a useful minimum set of commands, and
+Documentation/Git-.txt for documentation of each command.
+If Git has been correctly installed, then the tutorial can also be
+read with `man Gittutorial` or `Git help tutorial`, and the
+documentation of each command with `man Git-` or `Git help

Which, to me, is a perfect use-case for this feature. Here I
hypothetically want to change "git" to "Git" in prose, so I only want to
change that "If git has been" line; the rest are all references to
filenames or command names.

So I would manually edit the hunk via "e" to:

 See [Documentation/gittutorial.txt][] to get started, then see
 [Documentation/giteveryday.txt][] for a useful minimum set of commands, and
 Documentation/git-.txt for documentation of each command.
-If git has been correctly installed, then the tutorial can also be
+If Git has been correctly installed, then the tutorial can also be
 read with `man gittutorial` or `git help tutorial`, and the
 documentation of each command with `man git-` or `git help
 `.

Yay, but very tedious. Now let's use your feature to do this:

 1 -See [Documentation/gittutorial.txt][] to get started, then see
 2 -[Documentation/giteveryday.txt][] for a useful minimum set of commands, 
and
 3 -Documentation/git-.txt for documentation of each command.
 4 -If git has been correctly 

Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Fri, Mar 30 2018, Junio C. Hamano wrote:

> [...]
> * jk/drop-ancient-curl (2017-08-09) 5 commits
>  - http: #error on too-old curl
>  - curl: remove ifdef'd code never used with curl >=7.19.4
>  - http: drop support for curl < 7.19.4
>  - http: drop support for curl < 7.16.0
>  - http: drop support for curl < 7.11.1
>
>  Some code in http.c that has bitrot is being removed.
>
>  Expecting a reroll.

This has been idle for a long time. Peff: What's left to be done for it?

> [...]
> * dj/runtime-prefix (2018-03-25) 3 commits
>  - exec_cmd: RUNTIME_PREFIX on some POSIX systems
>  - Makefile: add Perl runtime prefix support
>  - Makefile: generate Perl header from template file
>  (this branch is used by js/runtime-prefix-windows.)
>
>  A build-time option has been added to allow Git to be told to refer
>  to its associated files relative to the main binary, in the same
>  way that has been possible on Windows for quite some time, for
>  Linux, BSDs and Darwin.
>
>  Is this, together with the js/runtime-prefix-windows topic, ready
>  for 'next'?

dj/runtime-prefix looks good to me (and I seem to have been reviewing it
the most). I don't see any problems with js/runtime-prefix-windows from
peeking at it, but I haven't run it for obvious reasons.


Re: [PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 6:40 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
> which (approximately) enables -Wextra so that any combination of them
> and -Werror can be set.
>
> I've long wanted to use DEVELOPER=1 in my production builds, but on
> some old systems I still get warnings, and thus the build would
> fail. However if the build/tests fail for some other reason, it would
> still be useful to scroll up and see what the relevant code is warning
> about.
>
> This change allows for that. Now setting DEVELOPER will set -Werror as
> before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
> but without -Werror.
>
> I've renamed the newly added EAGER_DEVELOPER flag to
> DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
> and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
> inventing some new name of our own (VERY_EAGER_DEVELOPER?).

Before we go with zillions of *DEVELOPER* maybe we can have something
like DEVOPTS where you can give multiple keywords to a single variable
to influence config.mak.dev. This is similar to COMPILER_FEATURES we
already have in there, but now it's driven by the dev instead of the
compiler. So you can have keywords like "gentle" (no -Werror), "extra"
(-Wextra with no suppression), and so on.
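
From the command line that might look something like this (keyword names
as suggested above; purely hypothetical at this point):

$ make DEVELOPER=1                         # today: extra warnings, -Werror
$ make DEVELOPER=1 DEVOPTS=gentle          # same warnings, but not fatal
$ make DEVELOPER=1 DEVOPTS="gentle extra"  # add -Wextra, still not fatal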
-- 
Duy


[PATCH] send-email: report host and port separately when calling git credential

2018-03-31 Thread Michal Nazarewicz
When git-send-email uses git-credential to get the SMTP password, it will
communicate the SMTP host and port (if both are provided) as a single entry,
‘host=<host>:<port>’.  This trips up the ‘git-credential-store’ helper,
which expects those values as separate keys (‘host’ and ‘port’).

Send the values as separate pieces of information so things work
smoothly.

Signed-off-by: Michał Nazarewicz 
---
 git-send-email.perl | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 2fa7818ca..2a9f89a58 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1229,14 +1229,6 @@ sub maildomain {
return maildomain_net() || maildomain_mta() || 'localhost.localdomain';
 }
 
-sub smtp_host_string {
-   if (defined $smtp_server_port) {
-   return "$smtp_server:$smtp_server_port";
-   } else {
-   return $smtp_server;
-   }
-}
-
 # Returns 1 if authentication succeeded or was not necessary
 # (smtp_user was not specified), and 0 otherwise.
 
@@ -1263,7 +1255,8 @@ sub smtp_auth_maybe {
# reject credentials.
$auth = Git::credential({
'protocol' => 'smtp',
-   'host' => smtp_host_string(),
+   'host' => $smtp_server,
+   'port' => $smtp_server_port,
'username' => $smtp_authuser,
# if there's no password, "git credential fill" will
# give us one, otherwise it'll just pass this one.
-- 
2.16.2



Re: [PATCH v2] add -p: fix 2.17.0-rc* regression due to moved code

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 31 2018, Phillip Wood wrote:

> On 31/03/18 13:50, Ævar Arnfjörð Bjarmason wrote:
>> Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
>> there's more than one hunk", 2018-02-13) which is present in
>> 2.17.0-rc*, but not 2.16.0.
>>
>> In Perl, regex variables like $1 always refer to the last regex
>> match. When the aforementioned change added a new regex match between
>> the old match and the corresponding code that was expecting $1, the $1
>> variable would always be undef, since the newly inserted regex match
>> doesn't have any captures.
>>
>> As a result the "/" feature to search for a string in a hunk by regex
>> completely broke, on git.git:
>
> Good catch, I could have sworn I'd tested my patch but I obviously
> didn't notice the warning (I've got interactive.singlekey set so it
> prints the warning and then prompts as it always has done). Calling it
> completely broken is perhaps a little harsh as it does work if you
> enter the regex again and with interactive.singlekey set you only have
> to enter the regex once.

To clarify, by "completely broken" I mean the "/" feature itself; but
yeah, you can still search by regex, since we just so happen to have the
fallback codepath intended to catch "/" without an accompanying string.
It kicks in and asks you for the regex, since $1 will be undef at that
point and will thus coerce stringwise to "".
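
That coercion is also why this degrades to a prompt instead of failing
outright: under warnings an undef $1 compared with "" warns and then
stringifies to the empty string, e.g.:

$ perl -wE 'say "empty" if $1 eq ""'
Use of uninitialized value $1 in string eq at -e line 1.
empty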

> Thanks for fixing it
>
> Phillip
>>
>>  $ perl -pi -e 's/Git/Tig/g' README.md
>>  $ ./git --exec-path=$PWD add -p
>>  [..]
>>  Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
>>  Split into 4 hunks.
>>  [...]
>>  Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
>>  Use of uninitialized value $1 in string eq at 
>> /home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
>>  search for regex? Many
>>
>> I.e. the initial "/regex" command wouldn't work, and would always emit
>> a warning and ask again for a regex, now it works as intended again.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>
>> Of course I just noticed the grammar errors in the commit message
>> after sending. Here's a v2 with that fixed, also generated the patch
>> with -U6 to make it clear what's going on.
>>
>>   git-add--interactive.perl | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
>> index d190469cd8..c1f52e457f 100755
>> --- a/git-add--interactive.perl
>> +++ b/git-add--interactive.perl
>> @@ -1561,13 +1561,13 @@ sub patch_update_file {
>>  elsif ($line =~ m|^/(.*)|) {
>>  my $regex = $1;
>>  unless ($other =~ m|/|) {
>>  error_msg __("No other hunks to 
>> search\n");
>>  next;
>>  }
>> -if ($1 eq "") {
>> +if ($regex eq "") {
>>  print colored $prompt_color, __("search 
>> for regex? ");
>>  $regex = <STDIN>;
>>  if (defined $regex) {
>>  chomp $regex;
>>  }
>>  }
>>


Re: [PATCH v4 2/5] stash: convert apply to builtin

2018-03-31 Thread Joel Teichroeb
On Thu, Mar 29, 2018 at 1:07 PM, Junio C Hamano  wrote:
> Joel Teichroeb  writes:
>
>> +static int get_stash_info(struct stash_info *info, int argc, const char 
>> **argv)
>> +{
>
> So, this roughly corresponds to parse_flags_and_rev function, it seems.
>
>> + struct strbuf w_commit_rev = STRBUF_INIT;
>> + struct strbuf b_commit_rev = STRBUF_INIT;
>> + struct strbuf w_tree_rev = STRBUF_INIT;
>> + struct strbuf b_tree_rev = STRBUF_INIT;
>> + struct strbuf i_tree_rev = STRBUF_INIT;
>> + struct strbuf u_tree_rev = STRBUF_INIT;
>> + struct strbuf symbolic = STRBUF_INIT;
>> + struct strbuf out = STRBUF_INIT;
>> + int ret;
>> + const char *revision;
>> + const char *commit = NULL;
>> + char *end_of_rev;
>> + info->is_stash_ref = 0;
>> +
>> + if (argc > 1) {
>> + int i;
>> + struct strbuf refs_msg = STRBUF_INIT;
>> + for (i = 0; i < argc; ++i)
>> + strbuf_addf(&refs_msg, " '%s'", argv[i]);
>> +
>> + fprintf_ln(stderr, _("Too many revisions specified:%s"), 
>> refs_msg.buf);
>> + strbuf_release(&refs_msg);
>> +
>> + return -1;
>> + }
>> +
>> + if (argc == 1)
>> + commit = argv[0];
>> +
>> + strbuf_init(&info->revision, 0);
>> + if (commit == NULL) {
>> + if (have_stash()) {
>> + free_stash_info(info);
>> + return error(_("No stash entries found."));
>> + }
>> +
>> + strbuf_addf(&info->revision, "%s@{0}", ref_stash);
>> + } else if (strspn(commit, "0123456789") == strlen(commit)) {
>> + strbuf_addf(&info->revision, "%s@{%s}", ref_stash, commit);
>> + } else {
>> + strbuf_addstr(&info->revision, commit);
>> + }
>> +
>> + revision = info->revision.buf;
>> + strbuf_addstr(&w_commit_rev, revision);
>> + ret = !get_oid(w_commit_rev.buf, &info->w_commit);
>> + strbuf_release(&w_commit_rev);
>
> Use of strbuf w_commit_rev looks completely pointless here.  Am I
> mistaken to say that the above three lines are equivalent to:
>
> ret = !get_oid(revision, &info->w_commit);
>

Right, it was refactored to this in a previous version, but I didn't
quite think it through.

>> +
>> + if (!ret) {
>> + error(_("%s is not a valid reference"), revision);
>> + free_stash_info(info);
>> + return -1;
>> + }
>> +
>> + strbuf_addf(&b_commit_rev, "%s^1", revision);
>> + strbuf_addf(&w_tree_rev, "%s:", revision);
>> + strbuf_addf(&b_tree_rev, "%s^1:", revision);
>> + strbuf_addf(&i_tree_rev, "%s^2:", revision);
>> +
>> + ret = !get_oid(b_commit_rev.buf, &info->b_commit) &&
>> + !get_oid(w_tree_rev.buf, &info->w_tree) &&
>> + !get_oid(b_tree_rev.buf, &info->b_tree) &&
>> + !get_oid(i_tree_rev.buf, &info->i_tree);
>> +
>> + strbuf_release(&b_commit_rev);
>> + strbuf_release(&w_tree_rev);
>> + strbuf_release(&b_tree_rev);
>> + strbuf_release(&i_tree_rev);
>
> For the same reason, these strbuf's look pretty much pointless.  I
> wonder if a private helper
>
> static int grab_oid(struct oid *oid, const char *fmt, const char *rev)
> {
> struct strbuf buf = STRBUF_INIT;
> int ret;
>
> strbuf_addf(&buf, fmt, rev);
> ret = get_oid(buf.buf, oid);
> strbuf_release(&buf);
> return ret;
> }
>
> would help here?  Then you wouldn't be writing something like the
> above, and instead you'd grab four object names like so:
>
> if (grab_oid(&info->b_commit, "%s^1", revision) ||
> grab_oid(&info->w_tree, "%s:", revision) ||
> grab_oid(&info->b_tree, "%s^1:", revision) ||
> grab_oid(&info->i_tree, "%s^2:", revision)) {
> ... we found an error ...
> return -1;
> }
>
> which would be a lot easier to follow, no?

Very much agreed! I felt like that part of the code was the weakest
part of my patch before. I'm very happy to have it cleaned up like
this!

>
>> +int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>> +{
>> + int result = 0;
>> + pid_t pid = getpid();
>> + const char *index_file;
>> +
>> + struct option options[] = {
>> + OPT_END()
>> + };
>> +
>> + git_config(git_default_config, NULL);
>> +
>> + argc = parse_options(argc, argv, prefix, options, 
>> git_stash_helper_usage,
>> + PARSE_OPT_KEEP_UNKNOWN|PARSE_OPT_KEEP_DASHDASH);
>> +
>> + index_file = get_index_file();
>> + xsnprintf(stash_index_path, PATH_MAX, "%s.stash.%"PRIuMAX, index_file, 
>> (uintmax_t)pid);
>
> Wouldn't it make more sense to get rid of PATH_MAX and hold it in a
> strbuf instead?  I.e.
>
> static struct strbuf stash_index_path = STRBUF_INIT;
> ...
> strbuf_addf(&stash_index_path, "%s.stash.%" PRIuMAX, index_file, 
> (uintmax_t)pid);
>

That makes it a lot cleaner, 

[PATCH 4/3] Makefile: untangle DEVELOPER and -Werror

2018-03-31 Thread Ævar Arnfjörð Bjarmason
Change the DEVELOPER flag, and the newly added EAGER_DEVELOPER flag
which (approximately) enables -Wextra so that any combination of them
and -Werror can be set.

I've long wanted to use DEVELOPER=1 in my production builds, but on
some old systems I still get warnings, and thus the build would
fail. However if the build/tests fail for some other reason, it would
still be useful to scroll up and see what the relevant code is warning
about.

This change allows for that. Now setting DEVELOPER will set -Werror as
before, but if DEVELOPER_NONFATAL is set you'll get the same warnings,
but without -Werror.

I've renamed the newly added EAGER_DEVELOPER flag to
DEVELOPER_EXTRA. The reason is that it approximately turns on -Wextra,
and it'll be more consistent to add e.g. DEVELOPER_PEDANTIC later than
inventing some new name of our own (VERY_EAGER_DEVELOPER?).

The DEVELOPER_EXTRA flag implicitly means DEVELOPER_NONFATAL, but just
so that this change doesn't introduce yet another arbitrary limitation
it's possible to combine it with DEVELOPER_FATAL, which will turn on
-Werror for it as well.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

I really like where this is going, but as noted I think this on top
would be even better.

 Makefile   | 17 +++--
 config.mak.dev | 10 --
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index e4f04ce1cb..641461a569 100644
--- a/Makefile
+++ b/Makefile
@@ -432,11 +432,16 @@ all::
 # When cross-compiling, define HOST_CPU as the canonical name of the CPU on
 # which the built Git will run (for instance "x86_64").
 #
-# Define DEVELOPER to enable more compiler warnings. Compiler version
-# and faimily are auto detected, but could be overridden by defining
-# COMPILER_FEATURES (see config.mak.dev).
-# Define EAGER_DEVELOPER keeps compiler warnings non-fatal, but no warning
-# class is suppressed anymore.
+# Define DEVELOPER to enable more compiler warnings. We'll also enable
+# -Werror unless DEVELOPER_NONFATAL is defined. To enable even more
+# pedantic warnings that'll flag some potential existing issues in the
+# codebase turn on DEVELOPER_EXTRA, which implicitly sets DEVELOPER as
+# well. This is -Wextra with a whitelist of disabled warnings. Unless
+# DEVELOPER_NONFATAL is set DEVELOPER_EXTRA will turn it on
+# implicitly, so if you for some reason want both DEVELOPER and
+# DEVELOPER_EXTRA with fatal warnings, you need to set
+# DEVELOPER_FATAL=1 to force -Werror. See config.mak.dev for how this
+# all works.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1043,7 +1048,7 @@ include config.mak.uname
 -include config.mak.autogen
 -include config.mak
 
-ifdef EAGER_DEVELOPER
+ifdef DEVELOPER_EXTRA
 DEVELOPER = Yes
 endif
 ifdef DEVELOPER
diff --git a/config.mak.dev b/config.mak.dev
index 13883410b3..40f3365729 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -1,6 +1,12 @@
-ifndef EAGER_DEVELOPER
+ifndef DEVELOPER_NONFATAL
+ifndef DEVELOPER_EXTRA
 CFLAGS += -Werror
 endif
+endif
+ifdef DEVELOPER_FATAL
+CFLAGS += -Werror
+endif
+
 CFLAGS += -Wdeclaration-after-statement
 CFLAGS += -Wno-format-zero-length
 CFLAGS += -Wold-style-definition
@@ -23,7 +29,7 @@ CFLAGS += -Wextra
 # if a function is public, there should be a prototype and the right
 # header file should be included. If not, it should be static.
 CFLAGS += -Wmissing-prototypes
-ifndef EAGER_DEVELOPER
+ifndef DEVELOPER_EXTRA
 # These are disabled because we have these all over the place.
 CFLAGS += -Wno-empty-body
 CFLAGS += -Wno-missing-field-initializers
-- 
2.17.0.rc1.321.gba9d0f2565



Re: [PATCH v2] add -p: fix 2.17.0-rc* regression due to moved code

2018-03-31 Thread Phillip Wood

On 31/03/18 13:50, Ævar Arnfjörð Bjarmason wrote:

Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.

In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.

As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:


Good catch, I could have sworn I'd tested my patch but I obviously 
didn't notice the warning (I've got interactive.singlekey set so it 
prints the warning and then prompts as it always has done). Calling it 
completely broken is perhaps a little harsh as it does work if you enter 
the regex again and with interactive.singlekey set you only have to 
enter the regex once.


Thanks for fixing it

Phillip


 $ perl -pi -e 's/Git/Tig/g' README.md
 $ ./git --exec-path=$PWD add -p
 [..]
 Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
 Split into 4 hunks.
 [...]
 Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
 Use of uninitialized value $1 in string eq at 
/home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
 search for regex? Many

I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Of course I just noticed the grammar errors in the commit message
after sending. Here's a v2 with that fixed, also generated the patch
with -U6 to make it clear what's going on.

  git-add--interactive.perl | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index d190469cd8..c1f52e457f 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1561,13 +1561,13 @@ sub patch_update_file {
elsif ($line =~ m|^/(.*)|) {
my $regex = $1;
unless ($other =~ m|/|) {
error_msg __("No other hunks to 
search\n");
next;
}
-   if ($1 eq "") {
+   if ($regex eq "") {
print colored $prompt_color, __("search for 
regex? ");
$regex = <STDIN>;
if (defined $regex) {
chomp $regex;
}
}





Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 31 2018, Duy Nguyen wrote:

> On Sat, Mar 31, 2018 at 1:36 PM, Ævar Arnfjörð Bjarmason
>  wrote:
>>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>>> +
>>>  GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>>>  path where there are more than 1024 packs even if the actual number of
>>>  packs in repository is below this limit.
>>>
>>> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
>>> -code path where we do not cache objecct size in memory and read it
>>> -from existing packs on demand. This normally only happens when the
>>> -object size is over 2GB. This variable forces the code path on any
>>> -object larger than 2^ bytes.
>>
>> The docs here say set these env variables, but actually
>> GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to
>> set a bool value.
>>
>> I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just
>> copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error
>> out since it's expecting bool, not the env variable to be set.
>>
>> I really don't care which we use, but let's use either if(getenv()) or
>> if(git_env_bool()) consistently, and then have the docs either say "if
>> set" or "if set to a boolean value (see git-config(1))".
>
> I'll change GIT_TEST_SPLIT_INDEX to boolean too since I document it
> here anyway. Will wait for a while though to see if anything else
> should be part of v9.

Sounds good, FWIW (since I spied your forced push to your private branch
on Github) I mean something like this on top of what you just pushed:

diff --git a/t/README b/t/README
index 65dee935c0..583bede192 100644
--- a/t/README
+++ b/t/README
@@ -298,7 +298,8 @@ Running tests with special setups
 The whole test suite could be run to test some special features
 that cannot be easily covered by a few specific test cases. These
 could be enabled by running the test suite with correct GIT_TEST_
-environment set.
+environment variable set to a boolean value, as documented in the
+"Values" section of git-config(1).

 GIT_TEST_SPLIT_INDEX= forces split-index mode on the whole
 test suite.

I.e. the part above where we just say it has to be set should be changed
to indicate it's a boolean as understood by git, since in shell/*nix
idiom saying something has to be set just means ensure getenv() won't
return NULL.
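
Concretely (the test script name is just an example):

$ GIT_TEST_FULL_IN_PACK_ARRAY=true ./t5300-pack-object.sh       # git-style boolean, works
$ GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease ./t5300-pack-object.sh  # errors out: not a valid boolean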


[PATCH v6 5/6] worktree: factor out dwim_branch function

2018-03-31 Thread Thomas Gummerer
Factor out a dwim_branch function, which takes care of the dwim'ery in
'git worktree add '.  It's not too much code currently, but we're
adding a new kind of dwim in a subsequent patch, at which point it makes
more sense to have it as a separate function.

Factor it out now to reduce the patch noise in the next patch.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index f686ee1440..47189d50dd 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -363,6 +363,20 @@ static int add_worktree(const char *path, const char 
*refname,
return ret;
 }
 
+static const char *dwim_branch(const char *path, const char **new_branch)
+{
+   int n;
+   const char *s = worktree_basename(path, &n);
+   *new_branch = xstrndup(s, n);
+   if (guess_remote) {
+   struct object_id oid;
+   const char *remote =
+   unique_tracking_name(*new_branch, &oid);
+   return remote;
+   }
+   return NULL;
+}
+
 static int add(int ac, const char **av, const char *prefix)
 {
struct add_opts opts;
@@ -415,16 +429,9 @@ static int add(int ac, const char **av, const char *prefix)
}
 
if (ac < 2 && !new_branch && !opts.detach) {
-   int n;
-   const char *s = worktree_basename(path, &n);
-   new_branch = xstrndup(s, n);
-   if (guess_remote) {
-   struct object_id oid;
-   const char *remote =
-   unique_tracking_name(new_branch, &oid);
-   if (remote)
-   branch = remote;
-   }
+   const char *s = dwim_branch(path, &new_branch);
+   if (s)
+   branch = s;
}
 
if (ac == 2 && !new_branch && !opts.detach) {
-- 
2.16.1.78.ga2082135d8



[PATCH v6 6/6] worktree: teach "add" to check out existing branches

2018-03-31 Thread Thomas Gummerer
Currently 'git worktree add <path>' creates a new branch named after the
basename of the path by default.  If a branch with that name already
exists, the command refuses to do anything, unless the '--force' option
is given.

However we can do a little better than that, and check the branch out if
it is not checked out anywhere else.  This will help users who just want
to check an existing branch out into a new worktree, and save a few
keystrokes.

As the current behaviour is to simply 'die()' when a branch with the name
of the basename of the path already exists, there are no backwards
compatibility worries here.

We will still 'die()' if the branch is checked out in another worktree,
unless the --force flag is passed.

Signed-off-by: Thomas Gummerer 
---
 Documentation/git-worktree.txt |  9 +++--
 builtin/worktree.c | 22 +++---
 t/t2025-worktree-add.sh| 25 ++---
 3 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 41585f535d..eaa6bf713f 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -61,8 +61,13 @@ $ git worktree add --track -b <branch> <path> <remote>/<branch>
 
 +
 If `<commit-ish>` is omitted and neither `-b` nor `-B` nor `--detach` used,
-then, as a convenience, a new branch based at HEAD is created automatically,
-as if `-b $(basename <path>)` was specified.
+then, as a convenience, a worktree with a branch named after
+`$(basename <path>)` (call it `<branch>`) is created.  If `<branch>`
+doesn't exist, a new branch based on HEAD is automatically created as
+if `-b <branch>` was given.  If `<branch>` exists in the repository,
+it will be checked out in the new worktree, if it's not checked out
+anywhere else, otherwise the command will refuse to create the
+worktree (unless `--force` is used).
 
 list::
 
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 47189d50dd..511d0aa370 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -363,11 +363,23 @@ static int add_worktree(const char *path, const char 
*refname,
return ret;
 }
 
-static const char *dwim_branch(const char *path, const char **new_branch)
+static const char *dwim_branch(const char *path, const char **new_branch,
+  int *checkout_existing_branch)
 {
int n;
const char *s = worktree_basename(path, &n);
-   *new_branch = xstrndup(s, n);
+   const char *branchname = xstrndup(s, n);
+   struct strbuf ref = STRBUF_INIT;
+
+   if (!strbuf_check_branch_ref(&ref, branchname) &&
+   ref_exists(ref.buf)) {
+   *checkout_existing_branch = 1;
+   strbuf_release(&ref);
+   UNLEAK(branchname);
+   return branchname;
+   }
+
+   *new_branch = branchname;
if (guess_remote) {
struct object_id oid;
const char *remote =
@@ -385,6 +397,7 @@ static int add(int ac, const char **av, const char *prefix)
const char *branch;
const char *new_branch = NULL;
const char *opt_track = NULL;
+   int checkout_existing_branch = 0;
struct option options[] = {
OPT__FORCE(&opts.force, N_("checkout <branch> even if already 
checked out in other worktree")),
OPT_STRING('b', NULL, &new_branch, N_("branch"),
@@ -429,7 +442,8 @@ static int add(int ac, const char **av, const char *prefix)
}
 
if (ac < 2 && !new_branch && !opts.detach) {
-   const char *s = dwim_branch(path, &new_branch);
+   const char *s = dwim_branch(path, &new_branch,
+   &checkout_existing_branch);
if (s)
branch = s;
}
@@ -464,6 +478,8 @@ static int add(int ac, const char **av, const char *prefix)
if (run_command(&cp))
return -1;
branch = new_branch;
+   } else if (checkout_existing_branch) {
+ fprintf_ln(stderr, _("Checking out branch '%s'"), branch);
} else if (opt_track) {
die(_("--[no-]track can only be used if a new branch is 
created"));
}
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 2b95944973..f72cb0eb0b 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -198,13 +198,24 @@ test_expect_success '"add" with <branch> omitted' '
test_cmp_rev HEAD bat
 '
 
-test_expect_success '"add" auto-vivify does not clobber existing branch' '
-   test_commit c1 &&
-   test_commit c2 &&
-   git branch precious HEAD~1 &&
-   test_must_fail git worktree add precious &&
-   test_cmp_rev HEAD~1 precious &&
-   test_path_is_missing precious
+test_expect_success '"add" checks out existing branch of dwimd name' '
+   git branch dwim HEAD~1 &&
+   git worktree add dwim &&
+   test_cmp_rev HEAD~1 dwim &&
+   (
+   cd dwim &&
+   test_cmp_rev dwim HEAD
+   )
+'
+

[PATCH v6 4/6] worktree: be clearer when "add" dwim-ery kicks in

2018-03-31 Thread Thomas Gummerer
Currently there is no indication in the "git worktree add" output that
a new branch was created.  This would be especially useful information
in the case where the dwim of "git worktree add <path>" kicks in, as the
user didn't explicitly ask for a new branch, but we create one for them.

Print some additional output showing that a branch was created and the
branch name to help the user.

This will also be useful in a subsequent commit, which introduces a new
kind of dwim-ery of checking out the branch if it exists instead of
refusing to create a new worktree in that case, and where it's nice to
tell the user which kind of dwim-ery kicked in.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index cc94325886..f686ee1440 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -444,6 +444,8 @@ static int add(int ac, const char **av, const char *prefix)
 
if (new_branch) {
struct child_process cp = CHILD_PROCESS_INIT;
+
+   fprintf_ln(stderr, _("Creating branch '%s'"), new_branch);
cp.git_cmd = 1;
argv_array_push(&cp.args, "branch");
if (new_branch_force)
-- 
2.16.1.78.ga2082135d8



[PATCH v6 0/6] worktree: teach "add" to check out existing branches

2018-03-31 Thread Thomas Gummerer
Thanks Eric for the review of the last round.

Previous rounds are at <20180121120208.12760-1-t.gumme...@gmail.com>,
<20180204221305.28300-1-t.gumme...@gmail.com>,
<20180317220830.30963-1-t.gumme...@gmail.com>,
<2018031719.4940-1-t.gumme...@gmail.com> and
20180325134947.25828-1-t.gumme...@gmail.com.

This round should fix all the UI issues Eric found in the last round.

The changes I made in a bit more detail:

- in addition to removing the 'force_new_branch' flag from 'struct
  add_opts', also remove the 'new_branch' member, which is local to
  the 'add()' function.  The other four members are needed in the
  'worktree_add()' function, so I left them there.   This patch is now
  the first patch in the series.

- added a new commit introducing a new hidden --show-new-head-line
  flag in 'git reset'.  This is used to suppress the "HEAD is now at
  ..."  line that 'git reset --hard' usually prints, so we can replace
  it with our own "New worktree HEAD is now at ..." line instead,
  while keeping the progress indicator for larger repositories.

  I guess this may be the most controversial bit at this point, as
  we'd be adding an option for internal use only.  Not sure how we
  feel about that. But short of going back to the old output format, I
  don't see a good option to make this work otherwise.  I tried
  re-using internal functions for this, but until we have 'struct
  repository' everywhere, that's going to be quite hard.

- Print the "Creating branch ..." and "Checking out branch ..."
  messages earlier in the process.  This is mainly to avoid the out of
  order "Branch '...' now set up to track ..." message.

- Various commit message and style cleanups

Note that these fixes are quite differently executed than I had
imagined them in the reply to the review comments, as things didn't
work as I had imagined.  The UI problems should be fixed now
nonetheless :)

Some examples of the new UI behaviour here for reference:

 - guess-remote mode

$ git worktree add --guess-remote ../next
Creating branch 'next'
Branch 'next' set up to track remote branch 'next' from 'origin'.
New worktree HEAD is now at caa68db14 Merge branch 
'sb/packfiles-in-repository' into next

 - original dwim (create a branch based on the current HEAD)

$ git worktree add ../test
Creating branch 'test'
New worktree HEAD is now at c2a499e6c Merge branch 'jh/partial-clone'

 - new dwim (check out existing branch)

$ git worktree add ../test
Checking out branch 'test'
New worktree HEAD is now at c2a499e6c Merge branch 'jh/partial-clone'

 - no new branch created
 
$ git worktree add ../test2 origin/master
New worktree HEAD is now at c2a499e6c Merge branch 'jh/partial-clone'

 - large repository (with original dwim)

$ g worktree add ../test
Creating branch 'test'
Checking out files: 100% (62915/62915), done.
New worktree HEAD is now at c2a9838452a4 Merge tag 'for-4.16/dm-fixes-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Compare this to the old UI (new dwim omitted, as there's no old
version of that):

 - guess-remote mode

$ git worktree add --guess-remote ../next
Branch 'next' set up to track remote branch 'next' from 'origin'.
Preparing ../next (identifier next)
HEAD is now at caa68db14 Merge branch 'sb/packfiles-in-repository' into next

 - original dwim (create a branch based on the current HEAD)

$ git worktree add ../test
Preparing ../test (identifier test)
HEAD is now at c2a499e6c Merge branch 'jh/partial-clone'

 - no new branch created

$ git worktree add ../test2 origin/master
Preparing ../test2 (identifier test2)
HEAD is now at c2a499e6c Merge branch 'jh/partial-clone'

 - large repository

$ git worktree add ../test
Preparing ../test (identifier test)
Checking out files: 100% (62915/62915), done.
HEAD is now at c2a9838452a4 Merge tag 'for-4.16/dm-fixes-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

The one thing we are losing is a context line before "Checking out
files:", if no new branch is created.  Personally I feel like that's
acceptable, as the user just used the 'git worktree add' command, so
it should be intuitive where those files are being checked out.

We could also print "Preparing worktree <path>" as a line in the
beginning (without mentioning the identifier, so we can print it in
the 'add()' function), but I don't feel like that's worth spending the
extra screen estate.  I don't feel strongly about that though, so if
someone has a moderately strong preference for that line being there,
I'm happy to add it.

Interdiff below:

diff --git a/builtin/reset.c b/builtin/reset.c
index e15f595799..54b2576449 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -288,6 +288,7 @@ static int git_reset_config(const char *var, const char 
*value, void *cb)
 int cmd_reset(int argc, const char **argv, const char *prefix)
 {
int reset_type = 

[PATCH v6 2/6] reset: introduce show-new-head-line option

2018-03-31 Thread Thomas Gummerer
Introduce a new --show-new-head-line command line option, that
determines whether the "HEAD is now at ..." message is printed or not.
It is enabled by default to preserve the current behaviour.

It will be used in a subsequent commit to disable printing the "HEAD is
now at ..." line in 'git worktree add', so it can be replaced with a
custom one, that explains the behaviour better.

We cannot just pass the --quiet flag from 'git worktree add', as that
would also hide progress output when checking out large working trees,
which is undesirable.

As this option is only for internal use, which we probably don't want
to advertise to our users, at least until there's a need for it, make
it a hidden flag.

Signed-off-by: Thomas Gummerer 
---
 builtin/reset.c  | 5 -
 t/t7102-reset.sh | 5 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index e15f595799..54b2576449 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -288,6 +288,7 @@ static int git_reset_config(const char *var, const char 
*value, void *cb)
 int cmd_reset(int argc, const char **argv, const char *prefix)
 {
int reset_type = NONE, update_ref_status = 0, quiet = 0;
+   int show_new_head_line = 1;
int patch_mode = 0, unborn;
const char *rev;
struct object_id oid;
@@ -310,6 +311,7 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
OPT_BOOL('p', "patch", &patch_mode, N_("select hunks 
interactively")),
OPT_BOOL('N', "intent-to-add", &intent_to_add,
N_("record only the fact that removed paths 
will be added later")),
+   OPT_HIDDEN_BOOL(0, "show-new-head-line", &show_new_head_line, 
N_("show information on the new HEAD")),
OPT_END()
};
 
@@ -403,7 +405,8 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
 * switched to, saving the previous head in ORIG_HEAD before. */
update_ref_status = reset_refs(rev, &oid);
 
-   if (reset_type == HARD && !update_ref_status && !quiet)
+   if (reset_type == HARD && !update_ref_status && !quiet &&
+   show_new_head_line)
print_new_head_line(lookup_commit_reference(&oid));
}
if (!pathspec.nr)
diff --git a/t/t7102-reset.sh b/t/t7102-reset.sh
index 95653a08ca..a160f78aba 100755
--- a/t/t7102-reset.sh
+++ b/t/t7102-reset.sh
@@ -568,4 +568,9 @@ test_expect_success 'reset --mixed sets up work tree' '
test_cmp expect actual
 '
 
+test_expect_success 'reset --no-show-new-head-line suppresses "HEAD is now at" 
output' '
+   git reset --hard --no-show-new-head-line HEAD >actual &&
+   ! grep "HEAD is now at" 

[PATCH v6 1/6] worktree: remove extra members from struct add_opts

2018-03-31 Thread Thomas Gummerer
There are two members of 'struct add_opts', which are only used inside
the 'add()' function, but being part of 'struct add_opts' they are
needlessly also passed to the 'add_worktree' function.

Make them local to the 'add()' function to make it clearer where they
are used.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 7cef5b120b..4d96a21a45 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -27,8 +27,6 @@ struct add_opts {
int detach;
int checkout;
int keep_locked;
-   const char *new_branch;
-   int force_new_branch;
 };
 
 static int show_only;
@@ -363,10 +361,11 @@ static int add(int ac, const char **av, const char 
*prefix)
const char *new_branch_force = NULL;
char *path;
const char *branch;
+   const char *new_branch = NULL;
const char *opt_track = NULL;
struct option options[] = {
-   OPT__FORCE(&opts.force, N_("checkout <branch> even if already 
checked out in other worktree")),
-   OPT_STRING('b', NULL, &opts.new_branch, N_("branch"),
+   OPT_STRING('b', NULL, &new_branch, N_("branch"),
   N_("create a new branch")),
OPT_STRING('B', NULL, &new_branch_force, N_("branch"),
   N_("create or reset a branch")),
@@ -384,7 +383,7 @@ static int add(int ac, const char **av, const char *prefix)
memset(, 0, sizeof(opts));
opts.checkout = 1;
ac = parse_options(ac, av, prefix, options, worktree_usage, 0);
-   if (!!opts.detach + !!opts.new_branch + !!new_branch_force > 1)
+   if (!!opts.detach + !!new_branch + !!new_branch_force > 1)
die(_("-b, -B, and --detach are mutually exclusive"));
if (ac < 1 || ac > 2)
usage_with_options(worktree_usage, options);
@@ -395,33 +394,32 @@ static int add(int ac, const char **av, const char 
*prefix)
if (!strcmp(branch, "-"))
branch = "@{-1}";
 
-   opts.force_new_branch = !!new_branch_force;
-   if (opts.force_new_branch) {
+   if (new_branch_force) {
struct strbuf symref = STRBUF_INIT;
 
-   opts.new_branch = new_branch_force;
+   new_branch = new_branch_force;
 
if (!opts.force &&
-   !strbuf_check_branch_ref(&symref, opts.new_branch) &&
+   !strbuf_check_branch_ref(&symref, new_branch) &&
ref_exists(symref.buf))
die_if_checked_out(symref.buf, 0);
strbuf_release(&symref);
}
 
-   if (ac < 2 && !opts.new_branch && !opts.detach) {
+   if (ac < 2 && !new_branch && !opts.detach) {
int n;
const char *s = worktree_basename(path, &n);
-   opts.new_branch = xstrndup(s, n);
+   new_branch = xstrndup(s, n);
if (guess_remote) {
struct object_id oid;
const char *remote =
-   unique_tracking_name(opts.new_branch, &oid);
+   unique_tracking_name(new_branch, &oid);
if (remote)
branch = remote;
}
}
 
-   if (ac == 2 && !opts.new_branch && !opts.detach) {
+   if (ac == 2 && !new_branch && !opts.detach) {
struct object_id oid;
struct commit *commit;
const char *remote;
@@ -430,25 +428,25 @@ static int add(int ac, const char **av, const char 
*prefix)
if (!commit) {
remote = unique_tracking_name(branch, &oid);
if (remote) {
-   opts.new_branch = branch;
+   new_branch = branch;
branch = remote;
}
}
}
 
-   if (opts.new_branch) {
+   if (new_branch) {
struct child_process cp = CHILD_PROCESS_INIT;
cp.git_cmd = 1;
argv_array_push(&cp.args, "branch");
-   if (opts.force_new_branch)
+   if (new_branch_force)
argv_array_push(&cp.args, "--force");
-   argv_array_push(&cp.args, opts.new_branch);
+   argv_array_push(&cp.args, new_branch);
argv_array_push(&cp.args, branch);
if (opt_track)
argv_array_push(&cp.args, opt_track);
if (run_command(&cp))
return -1;
-   branch = opts.new_branch;
+   branch = new_branch;
} else if (opt_track) {
die(_("--[no-]track can only be used if a new branch is 
created"));
}
-- 
2.16.1.78.ga2082135d8



[PATCH v6 3/6] worktree: improve message when creating a new worktree

2018-03-31 Thread Thomas Gummerer
Currently 'git worktree add' produces output like the following, when
'--no-checkout' is not given:

Preparing foo (identifier foo)
HEAD is now at 26da330922 

where the first line is written to stderr, and the second line coming
from 'git reset --hard' is written to stdout, even though both lines are
supposed to tell the user what has happened.  In addition to someone not
familiar with 'git worktree', this might seem as if the current HEAD was
modified, not the HEAD in the new working tree.

If the '--no-checkout' flag is given, the output of 'git worktree add'
is just:

Preparing foo (identifier foo)

even though the HEAD is set to a commit, which is just not checked out.

The identifier is also not particularly relevant for the user at the
moment, as it's only used internally to distinguish between different
worktrees that have the same $(basename ).

Fix these inconsistencies, and no longer show the identifier by making
the 'git reset --hard' call not print the "HEAD is now at ..." line
using the newly introduced flag from the previous commit, and printing
the message directly from the builtin command instead.

Signed-off-by: Thomas Gummerer 
---
 builtin/worktree.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 4d96a21a45..cc94325886 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -301,8 +301,6 @@ static int add_worktree(const char *path, const char 
*refname,
strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
write_file(sb.buf, "../..");
 
-   fprintf_ln(stderr, _("Preparing %s (identifier %s)"), path, name);
-
argv_array_pushf(&child_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
argv_array_pushf(&child_env, "%s=%s", GIT_WORK_TREE_ENVIRONMENT, path);
cp.git_cmd = 1;
@@ -322,12 +320,22 @@ static int add_worktree(const char *path, const char 
*refname,
cp.argv = NULL;
argv_array_clear(&cp.args);
argv_array_pushl(&cp.args, "reset", "--hard", NULL);
+   argv_array_push(&cp.args, "--no-show-new-head-line");
cp.env = child_env.argv;
ret = run_command(&cp);
if (ret)
goto done;
}
 
+   fprintf(stderr, _("New worktree HEAD is now at %s"),
+   find_unique_abbrev(commit->object.oid.hash, DEFAULT_ABBREV));
+
+   strbuf_reset(&sb);
+   pp_commit_easy(CMIT_FMT_ONELINE, commit, &sb);
+   if (sb.len > 0)
+   fprintf(stderr, " %s", sb.buf);
+   fputc('\n', stderr);
+
is_junk = 0;
FREE_AND_NULL(junk_work_tree);
FREE_AND_NULL(junk_git_dir);
-- 
2.16.1.78.ga2082135d8



[PATCH v2] add -p: fix 2.17.0-rc* regression due to moved code

2018-03-31 Thread Ævar Arnfjörð Bjarmason
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.

In Perl, regex variables like $1 always refer to the last regex
match. When the aforementioned change added a new regex match between
the old match and the corresponding code that was expecting $1, the $1
variable would always be undef, since the newly inserted regex match
doesn't have any captures.

As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:

$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at 
/home/avar/g/git/git-add--interactive line 1568, <STDIN> line 1.
search for regex? Many

I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex, now it works as intended again.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Of course I just noticed the grammar errors in the commit message
after sending. Here's a v2 with that fixed, also generated the patch
with -U6 to make it clear what's going on.

 git-add--interactive.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index d190469cd8..c1f52e457f 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1561,13 +1561,13 @@ sub patch_update_file {
elsif ($line =~ m|^/(.*)|) {
my $regex = $1;
unless ($other =~ m|/|) {
error_msg __("No other hunks to 
search\n");
next;
}
-   if ($1 eq "") {
+   if ($regex eq "") {
print colored $prompt_color, __("search 
for regex? ");
$regex = ;
if (defined $regex) {
chomp $regex;
}
}
-- 
2.17.0.rc1.321.gba9d0f2565



[PATCH] add -p: fix 2.17.0-rc* regression due to moved code

2018-03-31 Thread Ævar Arnfjörð Bjarmason
Fix a regression in 88f6ffc1c2 ("add -p: only bind search key if
there's more than one hunk", 2018-02-13) which is present in
2.17.0-rc*, but not 2.16.0.

In Perl, regex variables like $1 always refer to the last regex
match. When a recent change added a new regex match between the old
match and the code that was expecting its $1, the $1 variable would
always become undef.

As a result the "/" feature to search for a string in a hunk by regex
completely broke, on git.git:

$ perl -pi -e 's/Git/Tig/g' README.md
$ ./git --exec-path=$PWD add -p
[..]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? s
Split into 4 hunks.
[...]
Stage this hunk [y,n,q,a,d,j,J,g,/,s,e,?]? /Many
Use of uninitialized value $1 in string eq at 
/home/avar/g/git/git-add--interactive line 1568,  line 1.
search for regex? Many

I.e. the initial "/regex" command wouldn't work, and would always emit
a warning and ask again for a regex; now it works as intended again.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

Junio: You'll want this for 2.17.0 final. I didn't have time to add
tests for this, but looking over t3701-add-interactive.sh it seems
doable. Phillip: Maybe something you'd be interested in?

 git-add--interactive.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index d190469cd8..c1f52e457f 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1564,7 +1564,7 @@ sub patch_update_file {
error_msg __("No other hunks to 
search\n");
next;
}
-   if ($1 eq "") {
+   if ($regex eq "") {
print colored $prompt_color, __("search 
for regex? ");
$regex = ;
if (defined $regex) {
-- 
2.17.0.rc1.321.gba9d0f2565



Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 1:36 PM, Ævar Arnfjörð Bjarmason
 wrote:
>> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
>> +
>>  GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>>  path where there are more than 1024 packs even if the actual number of
>>  packs in repository is below this limit.
>>
>> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
>> -code path where we do not cache objecct size in memory and read it
>> -from existing packs on demand. This normally only happens when the
>> -object size is over 2GB. This variable forces the code path on any
>> -object larger than 2^ bytes.
>
> The docs here say set these env variables, but actually
> GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to
> set a bool value.
>
> I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just
> copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error
> out since it's expecting bool, not the env variable to be set.
>
> I really don't care which we use, but let's use either if(getenv()) or
> if(git_env_bool()) consistently, and then have the docs either say "if
> set" or "if set to a boolean value (see git-config(1))".

I'll change GIT_TEST_SPLIT_INDEX to boolean too since I document it
here anyway. Will wait for a while though to see if anything else
should be part of v9.
-- 
Duy


Re: [PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 31 2018, Nguyễn Thái Ngọc Duy wrote:

I'm testing this and it looks good to me so far, aside from this:

> - use git_env_*() instead of manually handling getenv() values
> [...]
>   struct packed_git **mapping, *p;
> - int cnt = 0, nr = 1 << OE_IN_PACK_BITS;
> -
> - if (getenv("GIT_TEST_FULL_IN_PACK_ARRAY")) {
> - /*
> -  * leave in_pack_by_idx NULL to force in_pack[] to be
> -  * used instead
> -  */
> - return;
> - }
> [...]
>
> + if (git_env_bool("GIT_TEST_FULL_IN_PACK_ARRAY", 0)) {
> + /*
> +  * do not initialize in_pack_by_idx[] to force the
> +  * slow path in oe_in_pack()
> +  */
> + } else {
> + prepare_in_pack_by_idx(pdata);
> + }
> [...]
> diff --git a/t/README b/t/README
> index 02bfb3fed5..c01d210c15 100644
> --- a/t/README
> +++ b/t/README
> @@ -291,16 +291,26 @@ expect the rest to function correctly.
>  and know what setup is needed for it.  Or when you want to run
>  everything up to a certain test.
>
> +
> +Running tests with special setups
> +-
> +
> +The whole test suite could be run to test some special features
> +that cannot be easily covered by a few specific test cases. These
> +could be enabled by running the test suite with correct GIT_TEST_
> +environment set.
> +
> +GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
> +
>  GIT_TEST_FULL_IN_PACK_ARRAY exercises the uncommon pack-objects code
>  path where there are more than 1024 packs even if the actual number of
>  packs in repository is below this limit.
>
> -GIT_TEST_OE_SIZE_BITS= exercises the uncommon pack-objects
> -code path where we do not cache objecct size in memory and read it
> -from existing packs on demand. This normally only happens when the
> -object size is over 2GB. This variable forces the code path on any
> -object larger than 2^ bytes.

The docs here say set these env variables, but actually
GIT_TEST_FULL_IN_PACK_ARRAY is a special snowflake in requiring you to
set a bool value.

I'd set GIT_TEST_SPLIT_INDEX=YesPlease already in my test setup & just
copied that as GIT_TEST_FULL_IN_PACK_ARRAY=YesPlease, but that'll error
out since it's expecting bool, not the env variable to be set.

I really don't care which we use, but let's use either if(getenv()) or
if(git_env_bool()) consistently, and then have the docs either say "if
set" or "if set to a boolean value (see git-config(1))".


Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Duy Nguyen
On Sat, Mar 31, 2018 at 06:20:07AM -0400, Jeff King wrote:
> On Sat, Mar 31, 2018 at 06:51:10AM +0200, Duy Nguyen wrote:
> 
> > >> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
> > >
> > > How come this one gets a macro, but the earlier conversions don't?
> > >
> > > I guess the problem is that oe_in_pack() is defined in the generic
> > > pack-objects.h, but _pack is only in builtin/pack-objects.c?
> > >
> > > I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> > > everywhere. It's longer, but it makes the code slightly less magical to
> > > read.
> > 
> > Longer was exactly why I added these macros (with the hope that the
> > macro upper case names already ring a "it's magical" bell). Should I
> > drop all these macros? Some code becomes a lot more verbose though.
> 
> I'm on the fence. I agree that the macro screams "magical". I just
> sometimes see a macro and think something really weird and
> unfunction-like is going on. But really we're just replacing a default
> parameter.
> 
> So I dunno. If you get rid of the macros and I look at it, I give even
> odds that I'll say "yech, put them back!". :)

It would look like this (on top of v8). I think the "_pack" part is
most distracting when it's used as part of an expression (or a
function argument). I probably went overboard with SET_ macros though.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b5bba2c228..dec849b755 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,18 +29,6 @@
 #include "list.h"
 #include "packfile.h"
 
-#define IN_PACK(obj) oe_in_pack(_pack, obj)
-#define SIZE(obj) oe_size(_pack, obj)
-#define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
-#define DELTA_SIZE(obj) oe_delta_size(_pack, obj)
-#define DELTA(obj) oe_delta(_pack, obj)
-#define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
-#define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
-#define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
-#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(_pack, obj, val)
-#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
-#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
-
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
N_("git pack-objects [...]  [<  | < 
]"),
@@ -137,14 +125,14 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, , );
if (!buf)
die("unable to read %s", oid_to_hex(>idx.oid));
-   base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, ,
+   base_buf = read_sha1_file(oe_delta(_pack, entry)->idx.oid.hash, 
,
  _size);
if (!base_buf)
die("unable to read %s",
-   oid_to_hex((entry)->idx.oid));
+   oid_to_hex(_delta(_pack, entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
-   if (!delta_buf || delta_size != DELTA_SIZE(entry))
+   if (!delta_buf || delta_size != oe_delta_size(_pack, entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -295,15 +283,15 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
-   size = DELTA_SIZE(entry);
+   size = oe_delta_size(_pack, entry);
buf = entry->delta_data;
entry->delta_data = NULL;
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   type = (allow_ofs_delta && oe_delta(_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
-   size = DELTA_SIZE(entry);
-   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
+   size = oe_delta_size(_pack, entry);
+   type = (allow_ofs_delta && oe_delta(_pack, 
entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
 
@@ -327,7 +315,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 * encoding of the relative offset for the delta
 * base from this object's position in the pack.
 */
-   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
+   off_t ofs = entry->idx.offset - oe_delta(_pack, 
entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -353,7 +341,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
-   hashwrite(f, 

Re: [PATCH v7 06/13] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:51:10AM +0200, Duy Nguyen wrote:

> >> +#define IN_PACK(obj) oe_in_pack(_pack, obj)
> >
> > How come this one gets a macro, but the earlier conversions don't?
> >
> > I guess the problem is that oe_in_pack() is defined in the generic
> > pack-objects.h, but _pack is only in builtin/pack-objects.c?
> >
> > I wonder if it would be that bad to just say oe_in_pack(_pack, obj)
> > everywhere. It's longer, but it makes the code slightly less magical to
> > read.
> 
> Longer was exactly why I added these macros (with the hope that the
> macro upper case names already ring a "it's magical" bell). Should I
> drop all these macros? Some code becomes a lot more verbose though.

I'm on the fence. I agree that the macro screams "magical". I just
sometimes see a macro and think something really weird and
unfunction-like is going on. But really we're just replacing a default
parameter.

So I dunno. If you get rid of the macros and I look at it, I give even
odds that I'll say "yech, put them back!". :)

-Peff


Re: [PATCH v7 08/13] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:40:23AM +0200, Duy Nguyen wrote:

> > Unlike the depth, I don't think there's any _inherent_ reason you
> > couldn't throw, say, 1MB deltas into the cache (if you sized it large
> > enough). But I doubt such deltas are really all that common. Here are
> > the top 10 in linux.git:
> >
> >   $ git cat-file --batch-all-objects --batch-check='%(deltabase) 
> > %(objectsize:disk)' |
> > grep -v ^0 | sort -k 2nr | head
> >   a02b6794337286bc12c907c33d5d75537c240bd0 769103
> >   b28d4b64c05da02c5e8c684dcb9422876225ebdc 327116
> >   1e98ce86ed19aff9ba721d13a749ff08088c9922 325257
> >   a02b6794337286bc12c907c33d5d75537c240bd0 240647
> >   c550d99286c01867dfb26e432417f3106acf8611 177896
> >   5977795854f852c2b95dd023fd03cace023ee41c 119737
> >   4ccf9681c45d01d17376f7e0d266532a4460f5f8 112671
> >   b39fb6821faa9e7bc36de738152a2817b4bf3654 112657
> >   2645d6239b74bebd661436762e819b831095b084 103980
> >   b8ce7fe5d8def58dc63b7ae099eff7bd07e4e845 101014
> >
> > It's possible some weird workload would want to tweak this. Say you were
> > storing a ton of delta-capable files that were big and always differed
> > by a megabyte. And it was somehow really important to you to tradeoff
> > memory for CPU during the write phase of a pack.
> 
> We're not short on spare bits so I will try to raise this limit to 1MB
> (not because you mentioned 1MB, but because the largest size in your
> output is close to 1MB).

I doubt it matters much. Unless somebody has been tweaking the config
themselves, this has been limited to 1000 for everybody running
linux.git and nobody has ever noticed.

So I think it would only be an issue if:

  1. you had an oddball repo with gigantic deltas

AND

  2. you for some reason really cared about caching the deltas between
 phases

AND

  3. you had done enough homework to even figure out that this knob
 existed

I was thinking that you might care about (2) for serving fetches of your
oddball repository. But really, if you care about minimizing work, you
want to be reusing on-disk deltas anyway, which would skip this cache
entirely. So any work we do to reproduce the delta would probably be
dwarfed by the finding of this giant delta in the first place.

So raise the limit if you want, but I'd be surprised if anybody was even
doing (3) in the first place.

-Peff


Re: [PATCH v7 10/13] pack-objects: clarify the use of object_entry::size

2018-03-31 Thread Jeff King
On Sat, Mar 31, 2018 at 06:35:40AM +0200, Duy Nguyen wrote:

> On Fri, Mar 30, 2018 at 11:04 PM, Jeff King  wrote:
> > The subject says "clarify" so I was a little surprised to see code
> > changes. It looks like we're just avoiding reassigning on top of the
> > value repeatedly, which is part of that clarification. It looks like a
> > noop to me.
> 
> Oh well... I was counting on the new name (in_pack_size, which follows
> in_pack_type naming convention) to emphasize it (and the new "delta
> size" comment to point out where in_pack_size contains a delta size.

Just to be clear, my final "it looks like a noop" means "good, it looks
like it is a pure cosmetic change and no change to the behavior."

-Peff


[PATCH v8 14/15] pack-objects: reorder members to shrink struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).
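To see why reordering by itself buys anything, here is a tiny
stand-alone illustration (not git code, just a sketch; exact sizes
depend on the ABI, and the comments assume a typical 64-bit system
with 8-byte pointers and 8-byte pointer alignment):

#include <stdio.h>
#include <stdint.h>

struct scattered {
	uint32_t a;	/* 4 bytes, then 4 bytes of padding before the pointer */
	void *p;	/* 8 bytes, needs 8-byte alignment */
	uint32_t b;	/* 4 bytes, then 4 bytes of tail padding */
};			/* typically 24 bytes */

struct grouped {
	void *p;	/* pointers first ... */
	uint32_t a;	/* ... then the 32-bit members sit next to each other */
	uint32_t b;
};			/* typically 16 bytes */

int main(void)
{
	printf("scattered=%zu grouped=%zu\n",
	       sizeof(struct scattered), sizeof(struct grouped));
	return 0;
}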

This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).

Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.

See 87po42cwql@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pack-objects.h | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/pack-objects.h b/pack-objects.h
index b5114a70a7..60192cce1f 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -26,6 +26,10 @@ enum dfs_state {
 };
 
 /*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
  * basic object info
  * -
  * idx.oid is filled up before delta searching starts. idx.crc32 is
@@ -74,34 +78,44 @@ enum dfs_state {
  */
 struct object_entry {
struct pack_idx_entry idx;
+   void *delta_data;   /* cached delta (uncompressed) */
+   off_t in_pack_offset;
+   uint32_t hash;  /* name hint hash */
unsigned size_:OE_SIZE_BITS;
unsigned size_valid:1;
-   unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
-   off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
 * uses the same base as me
 */
-   void *delta_data;   /* cached delta (uncompressed) */
unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size 
(uncompressed) */
unsigned delta_size_valid:1;
+   unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
unsigned z_delta_size:OE_Z_DELTA_BITS;
+   unsigned type_valid:1;
unsigned type_:TYPE_BITS;
+   unsigned no_try_delta:1;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
-   unsigned type_valid:1;
-   uint32_t hash;  /* name hint hash */
-   unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
-   unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
+   unsigned char in_pack_header_size;
unsigned depth:OE_DEPTH_BITS;
+
+   /*
+* pahole results on 64-bit linux (gcc and clang)
+*
+*   size: 80, bit_padding: 20 bits, holes: 8 bits
+*
+* and on 32-bit (gcc)
+*
+*   size: 76, bit_padding: 20 bits, holes: 8 bits
+*/
 };
 
 struct packing_data {
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 08/15] pack-objects: refer to delta objects by index instead of pointer

2018-03-31 Thread Nguyễn Thái Ngọc Duy
These delta pointers always point to elements in the objects[] array
in the packing_data struct. We can only hold a maximum of 4G of those
objects because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On a
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.

Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].

[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
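
The accessor pattern this implies looks roughly like the following
(a simplified sketch of the helpers this patch adds to pack-objects.h,
not the exact code; index 0 means "no delta"):

/* sketch: objects[idx - 1] is the base, 0 means no delta */
static inline struct object_entry *oe_delta(const struct packing_data *pack,
					    const struct object_entry *e)
{
	return e->delta_idx ? &pack->objects[e->delta_idx - 1] : NULL;
}

static inline void oe_set_delta(struct packing_data *pack,
				struct object_entry *e,
				struct object_entry *delta)
{
	e->delta_idx = delta ? (uint32_t)(delta - pack->objects) + 1 : 0;
}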

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 116 ++---
 pack-objects.h |  67 ++--
 2 files changed, 124 insertions(+), 59 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4e5812a053..118c8fd993 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,12 @@
 #include "packfile.h"
 
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
+#define DELTA(obj) oe_delta(_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -127,11 +133,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_sha1_file(entry->idx.oid.hash, , );
if (!buf)
die("unable to read %s", oid_to_hex(>idx.oid));
-   base_buf = read_sha1_file(entry->delta->idx.oid.hash, ,
+   base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, ,
  _size);
if (!base_buf)
die("unable to read %s",
-   oid_to_hex(>delta->idx.oid));
+   oid_to_hex((entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +294,12 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
 
@@ -317,7 +323,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 * encoding of the relative offset for the delta
 * base from this object's position in the pack.
 */
-   off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +349,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
-   hashwrite(f, entry->delta->idx.oid.hash, 20);
+   hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +385,8 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
  dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
 
-   if (entry->delta)
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   if (DELTA(entry))
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
  type, entry->size);
@@ -408,7 +414,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
}
 
if (type == OBJ_OFS_DELTA) {
-   off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
 

[PATCH v8 00/15] nd/pack-objects-pack-struct updates

2018-03-31 Thread Nguyễn Thái Ngọc Duy
v8 changes

- prefer BUG() over die()
- do "1U <<" instead of "1 << " to avoid undefined behavior with
  signed shifting.
- add more comments based on Jeff's feedback
- plug a leak in try_delta() when delta_size is too large
- be kind and set depth/cache_max_small_delta_size to max limit
  instead of dying when the user gives a value over limit
- make travis execute pack-objects uncommon code
- use git_env_*() instead of manually handling getenv() values
- fallback code for when a new pack is added when pack-objects is
  running
- Compressed cached delta size limit is increased from 64k to 1MB
- Cached delta size limit is decreased from 2G to 1MB
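
As a side note on the "1U <<" item above: the difference only matters
once the shift can reach the sign bit of an int, e.g. a 31-bit width
like the size_ field later in this series. A small compilable
illustration (not git code):

#include <stdint.h>

#define EXAMPLE_BITS 31

/* fine: unsigned shift, the value is 2^31 */
static const uint32_t example_limit = 1U << EXAMPLE_BITS;

/*
 * By contrast, (1 << EXAMPLE_BITS) would shift a signed int into its
 * sign bit, which is undefined behavior in C; hence the "1U <<" change.
 */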

Interdiff

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c774821930..b5bba2c228 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1439,7 +1439,7 @@ static void check_object(struct object_entry *entry)
goto give_up;
 
if (type < 0)
-   die("BUG: invalid type %d", type);
+   BUG("invalid type %d", type);
entry->in_pack_type = type;
 
/*
@@ -1861,6 +1861,11 @@ static pthread_mutex_t progress_mutex;
 
 #endif
 
+/*
+ * Return the size of the object without doing any delta
+ * reconstruction (so non-deltas are true object sizes, but deltas
+ * return the size of the delta data).
+ */
 unsigned long oe_get_size_slow(struct packing_data *pack,
   const struct object_entry *e)
 {
@@ -1881,7 +1886,7 @@ unsigned long oe_get_size_slow(struct packing_data *pack,
 
p = oe_in_pack(pack, e);
if (!p)
-   die("BUG: when e->type is a delta, it must belong to a pack");
+   BUG("when e->type is a delta, it must belong to a pack");
 
read_lock();
w_curs = NULL;
@@ -2006,8 +2011,10 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
+   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
+   free(delta_buf);
return 0;
+   }
 
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
@@ -2163,7 +2170,7 @@ static void find_deltas(struct object_entry **list, 
unsigned *list_size,
unsigned long size;
 
size = do_compress(>delta_data, 
DELTA_SIZE(entry));
-   if (size < (1 << OE_Z_DELTA_BITS)) {
+   if (size < (1U << OE_Z_DELTA_BITS)) {
entry->z_delta_size = size;
cache_lock();
delta_cache_size -= DELTA_SIZE(entry);
@@ -3131,7 +3138,7 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
};
 
if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
-   die("BUG: too many dfs states, increase OE_DFS_STATE_BITS");
+   BUG("too many dfs states, increase OE_DFS_STATE_BITS");
 
check_replace_refs = 0;
 
@@ -3149,12 +3156,16 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
 
-   if (depth >= (1 << OE_DEPTH_BITS))
-   die(_("delta chain depth %d is greater than maximum limit %d"),
-   depth, (1 << OE_DEPTH_BITS) - 1);
-   if (cache_max_small_delta_size >= (1 << OE_Z_DELTA_BITS))
-   die(_("pack.deltaCacheLimit is greater than maximum limit %d"),
-   (1 << OE_Z_DELTA_BITS) - 1);
+   if (depth >= (1 << OE_DEPTH_BITS)) {
+   warning(_("delta chain depth %d is too deep, forcing %d"),
+   depth, (1 << OE_DEPTH_BITS) - 1);
+   depth = (1 << OE_DEPTH_BITS) - 1;
+   }
+   if (cache_max_small_delta_size >= (1U << OE_Z_DELTA_BITS)) {
+   warning(_("pack.deltaCacheLimit is too high, forcing %d"),
+   (1U << OE_Z_DELTA_BITS) - 1);
+   cache_max_small_delta_size = (1U << OE_Z_DELTA_BITS) - 1;
+   }
 
argv_array_push(, "pack-objects");
if (thin) {
@@ -3274,6 +3285,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   prepare_packing_data(_pack);
+
if (progress)
progress_state = start_progress(_("Counting objects"), 0);
if (!use_internal_rev_list)
diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index 73e273fac7..857d144ee8 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -10,7 +10,10 @@ ln -s "$cache_dir/.prove" t/.prove
 make --quiet test
 if test "$jobname" = "linux-gcc"
 then
-   GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+   export 

[PATCH v8 09/15] pack-objects: shrink z_delta_size field in struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
We only cache a delta when it is smaller than a certain limit. This
limit defaults to 1000, but we save its compressed length in a 64-bit
field. Shrink that field down to 20 bits, so we can only cache deltas
up to 1MB. Larger deltas must be recomputed when the pack is written
out.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt |  3 ++-
 builtin/pack-objects.c   | 24 ++--
 pack-objects.h   |  3 ++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9bd3f5a789..00fa824448 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2449,7 +2449,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
-   result once the best match for all objects is found. Defaults to 1000.
+   result once the best match for all objects is found.
+   Defaults to 1000. Maximum value is 65535.
 
 pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 118c8fd993..34e285a1b7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2100,12 +2100,19 @@ static void find_deltas(struct object_entry **list, 
unsigned *list_size,
 * between writes at that moment.
 */
if (entry->delta_data && !pack_to_stdout) {
-   entry->z_delta_size = do_compress(>delta_data,
- entry->delta_size);
-   cache_lock();
-   delta_cache_size -= entry->delta_size;
-   delta_cache_size += entry->z_delta_size;
-   cache_unlock();
+   unsigned long size;
+
+   size = do_compress(>delta_data, 
entry->delta_size);
+   if (size < (1U << OE_Z_DELTA_BITS)) {
+   entry->z_delta_size = size;
+   cache_lock();
+   delta_cache_size -= entry->delta_size;
+   delta_cache_size += entry->z_delta_size;
+   cache_unlock();
+   } else {
+   FREE_AND_NULL(entry->delta_data);
+   entry->z_delta_size = 0;
+   }
}
 
/* if we made n a delta, and if n is already at max
@@ -3086,6 +3093,11 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
depth, (1 << OE_DEPTH_BITS) - 1);
depth = (1 << OE_DEPTH_BITS) - 1;
}
+   if (cache_max_small_delta_size >= (1U << OE_Z_DELTA_BITS)) {
+   warning(_("pack.deltaCacheLimit is too high, forcing %d"),
+   (1U << OE_Z_DELTA_BITS) - 1);
+   cache_max_small_delta_size = (1U << OE_Z_DELTA_BITS) - 1;
+   }
 
argv_array_push(, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index 272ddeeedb..5613a3d040 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -4,6 +4,7 @@
 #define OE_DFS_STATE_BITS  2
 #define OE_DEPTH_BITS  12
 #define OE_IN_PACK_BITS10
+#define OE_Z_DELTA_BITS20
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -75,7 +76,7 @@ struct object_entry {
 */
void *delta_data;   /* cached delta (uncompressed) */
unsigned long delta_size;   /* delta data size (uncompressed) */
-   unsigned long z_delta_size; /* delta data size (compressed) */
+   unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 13/15] pack-objects: shrink delta_size field in struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.

If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.

Note: since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this, so it is
thread-safe.
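
A sketch of the accessor that makes the fallback work (simplified, not
the exact code; the field names delta_size_ and delta_size_valid are
the ones used in this series):

/* sketch: fall back to the size lookup when the bitfield overflowed */
static inline unsigned long oe_delta_size(struct packing_data *pack,
					  const struct object_entry *e)
{
	if (e->delta_size_valid)
		return e->delta_size_;
	/*
	 * Too big to fit in the bitfield: for a reused in-pack delta the
	 * "size" slot holds the delta size, so the (possibly disk-backed)
	 * size lookup returns the right value.
	 */
	return oe_size(pack, e);
}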

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 26 --
 pack-objects.h | 23 ++-
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b4ea6290f9..b5bba2c228 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -32,10 +32,12 @@
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
 #define SIZE(obj) oe_size(_pack, obj)
 #define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
+#define DELTA_SIZE(obj) oe_delta_size(_pack, obj)
 #define DELTA(obj) oe_delta(_pack, obj)
 #define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
 #define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
 #define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(_pack, obj, val)
 #define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
 #define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
 
@@ -142,7 +144,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex((entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
-   if (!delta_buf || delta_size != entry->delta_size)
+   if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -293,14 +295,14 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
-   size = entry->delta_size;
+   size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
-   size = entry->delta_size;
+   size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1508,7 +1510,7 @@ static void check_object(struct object_entry *entry)
oe_set_type(entry, entry->in_pack_type);
SET_SIZE(entry, in_pack_size); /* delta size */
SET_DELTA(entry, base_entry);
-   entry->delta_size = in_pack_size;
+   SET_DELTA_SIZE(entry, in_pack_size);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(_curs);
@@ -1938,7 +1940,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
-   max_size = trg_entry->delta_size;
+   max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -2009,10 +2011,14 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
max_size);
if (!delta_buf)
return 0;
+   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
+   free(delta_buf);
+   return 0;
+   }
 
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
-   if (delta_size == trg_entry->delta_size &&
+   if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -2027,7 +2033,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
-   delta_cache_size -= trg_entry->delta_size;
+   delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -2040,7 +2046,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
}
 
SET_DELTA(trg_entry, src_entry);
-   trg_entry->delta_size = delta_size;
+   SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = 

[PATCH v8 15/15] ci: exercise the whole test suite with uncommon code in pack-objects

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Some recent optimizations have been added to pack-objects to reduce
memory usage, and some code paths are split into two: one for common
use cases and one for rare ones. Make sure the rare cases are tested
with Travis, since they require manual test configuration that is
unlikely to be done by developers.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 ci/run-tests.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ci/run-tests.sh b/ci/run-tests.sh
index 73e273fac7..857d144ee8 100755
--- a/ci/run-tests.sh
+++ b/ci/run-tests.sh
@@ -10,7 +10,10 @@ ln -s "$cache_dir/.prove" t/.prove
 make --quiet test
 if test "$jobname" = "linux-gcc"
 then
-   GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+   export GIT_TEST_SPLIT_INDEX=YesPlease
+   export GIT_TEST_FULL_IN_PACK_ARRAY=true
+   export GIT_TEST_OE_SIZE=10
+   make --quiet test
 fi
 
 check_unignored_build_artifacts
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 11/15] pack-objects: clarify the use of object_entry::size

2018-03-31 Thread Nguyễn Thái Ngọc Duy
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.

"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 11 ---
 pack-objects.h |  4 +++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 481b55c746..7a84c3f59a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1417,6 +1417,7 @@ static void check_object(struct object_entry *entry)
off_t ofs;
unsigned char *buf, c;
enum object_type type;
+   unsigned long in_pack_size;
 
buf = use_pack(p, _curs, entry->in_pack_offset, );
 
@@ -1426,7 +1427,7 @@ static void check_object(struct object_entry *entry)
 */
used = unpack_object_header_buffer(buf, avail,
   ,
-  >size);
+  _pack_size);
if (used == 0)
goto give_up;
 
@@ -1443,6 +1444,7 @@ static void check_object(struct object_entry *entry)
default:
/* Not a delta hence we've already got all we need. */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size;
entry->in_pack_header_size = used;
if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > 
OBJ_BLOB)
goto give_up;
@@ -1499,6 +1501,7 @@ static void check_object(struct object_entry *entry)
 * circular deltas.
 */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size; /* delta size */
SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
entry->delta_sibling_idx = base_entry->delta_child_idx;
@@ -1508,13 +1511,15 @@ static void check_object(struct object_entry *entry)
}
 
if (oe_type(entry)) {
+   off_t delta_pos;
+
/*
 * This must be a delta and we already know what the
 * final object type is.  Let's extract the actual
 * object size from the delta header.
 */
-   entry->size = get_size_from_delta(p, _curs,
-   entry->in_pack_offset + 
entry->in_pack_header_size);
+   delta_pos = entry->in_pack_offset + 
entry->in_pack_header_size;
+   entry->size = get_size_from_delta(p, _curs, 
delta_pos);
if (entry->size == 0)
goto give_up;
unuse_pack(_curs);
diff --git a/pack-objects.h b/pack-objects.h
index 5613a3d040..534f5a5e4d 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -30,7 +30,9 @@ enum dfs_state {
  *
  * "size" is the uncompressed object size. Compressed size of the raw
  * data for an object in a pack is not stored anywhere but is computed
- * and made available when reverse .idx is made.
+ * and made available when reverse .idx is made. Note that when a
+ * delta is reused, "size" is the uncompressed _delta_ size, not the
+ * canonical one after the delta has been applied.
  *
  * "hash" contains a path name hash which is used for sorting the
  * delta list and also during delta searching. Once prepare_pack()
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 10/15] pack-objects: don't check size when the object is bad

2018-03-31 Thread Nguyễn Thái Ngọc Duy
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.

This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 34e285a1b7..481b55c746 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1741,7 +1741,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
-   if (big_file_threshold < entry->size)
+   if (entry->type_valid && big_file_threshold < entry->size)
entry->no_try_delta = 1;
}
 
@@ -2454,7 +2454,7 @@ static void prepare_pack(int window, int depth)
 */
continue;
 
-   if (entry->size < 50)
+   if (!entry->type_valid || entry->size < 50)
continue;
 
if (entry->no_try_delta)
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 03/15] pack-objects: turn type and in_pack_type to bitfields

2018-03-31 Thread Nguyễn Thái Ngọc Duy
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
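
The accessors that implement this look roughly like the following (a
simplified sketch of the helpers this patch adds; it relies only on
OBJ_NONE being 0 and OBJ_BAD being -1 in enum object_type):

/* sketch of oe_type()/oe_set_type(), simplified */
static inline enum object_type oe_type(const struct object_entry *e)
{
	return e->type_valid ? (enum object_type)e->type_ : OBJ_BAD;
}

static inline void oe_set_type(struct object_entry *e, enum object_type type)
{
	e->type_valid = type >= OBJ_NONE;	/* OBJ_BAD stays "invalid" */
	e->type_ = e->type_valid ? type : 0;
}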

A note about accepting OBJ_NONE as a "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry(), where the current code skips
setting the "type" field if the incoming type is zero. This does not
have any bad side effects because the "type" field should be
memset()'d anyway.

But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger

fatal: unable to get type of object ...

Accepting OBJ_NONE [2] does sound wrong, but this is how it has been
for a very long time and I haven't had time to dig in further.

[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)

[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 60 --
 cache.h|  2 ++
 object.h   |  1 -
 pack-bitmap-write.c|  6 ++---
 pack-objects.h | 20 --
 5 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c674b2843..7133baa63f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -265,7 +265,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
struct git_istream *st = NULL;
 
if (!usable_delta) {
-   if (entry->type == OBJ_BLOB &&
+   if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(entry->idx.oid.hash, , , 
NULL)) != NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
-   enum object_type type = entry->type;
+   enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
  dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0;   /* explicit */
else if (!entry->in_pack)
to_reuse = 0;   /* can't reuse what we don't have */
-   else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+   else if (oe_type(entry) == OBJ_REF_DELTA ||
+oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
-   else if (entry->type != entry->in_pack_type)
+   else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0;   /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0;   /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
 * And then all remaining commits and tags.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_COMMIT &&
-   objects[i].type != OBJ_TAG)
+   if (oe_type([i]) != OBJ_COMMIT &&
+   oe_type([i]) != OBJ_TAG)
continue;
add_to_write_order(wo, _end, [i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
 * And then all the trees.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_TREE)
+   if (oe_type([i]) != OBJ_TREE)
continue;
add_to_write_order(wo, _end, [i]);
}
@@ -1066,8 +1067,7 @@ static void create_object_entry(const struct object_id 
*oid,
 
entry = packlist_alloc(_pack, oid->hash, index_pos);
entry->hash = hash;
-   if (type)
-   entry->type = type;
+   oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+   enum object_type type;
 
buf = use_pack(p, _curs, entry->in_pack_offset, );
 
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
 * since non-delta representations could still be reused.
  

[PATCH v8 01/15] t/README: mention about running the test suite in special modes

2018-03-31 Thread Nguyễn Thái Ngọc Duy
From: Duy Nguyen 

There are features that would benefit from running the whole test
suite instead of just a few test cases written specifically for
them. Split-index mode is one of them. Document it.

This step is required because a few patches later, we will
introduce more test modes like this to test some corner cases of
pack-objects as much as possible.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/README | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/README b/t/README
index 1a1361a806..09eb2b9768 100644
--- a/t/README
+++ b/t/README
@@ -292,6 +292,16 @@ and know what setup is needed for it.  Or when you want to 
run
 everything up to a certain test.
 
 
+Running tests with special setups
+-
+
+The whole test suite could be run to test some special features
+that cannot be easily covered by a few specific test cases. These
+could be enabled by running the test suite with correct GIT_TEST_
+environment set.
+
+GIT_TEST_SPLIT_INDEX forces split-index mode on the whole test suite.
+
 Naming Tests
 
 
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 12/15] pack-objects: shrink size field in struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.

Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.

Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
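
One of those helpers, roughly (a simplified sketch; the field names
size_ and size_valid are from this series, while oe_size_limit is an
assumed per-pack field holding 2^OE_SIZE_BITS, i.e. the first value
the bitfield cannot represent):

/* sketch: answer "is lhs larger than rhs?" without always reading the size */
static inline int oe_size_greater_than(struct packing_data *pack,
				       const struct object_entry *lhs,
				       unsigned long rhs)
{
	if (lhs->size_valid)
		return lhs->size_ > rhs;
	if (rhs < pack->oe_size_limit)	/* rhs < 2^x <= lhs ? */
		return 1;
	return oe_get_size_slow(pack, lhs) > rhs;
}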

Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like

make test GIT_TEST_OE_SIZE=4

which forces this code on all objects larger than 3 bytes.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 105 +++--
 pack-objects.c |   3 ++
 pack-objects.h |  57 +-
 t/README   |   6 +++
 4 files changed, 146 insertions(+), 25 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7a84c3f59a..b4ea6290f9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,8 @@
 #include "packfile.h"
 
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
+#define SIZE(obj) oe_size(_pack, obj)
+#define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
 #define DELTA(obj) oe_delta(_pack, obj)
 #define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
 #define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
@@ -274,7 +276,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 
if (!usable_delta) {
if (oe_type(entry) == OBJ_BLOB &&
-   entry->size > big_file_threshold &&
+   oe_size_greater_than(_pack, entry, big_file_threshold) &&
(st = open_istream(entry->idx.oid.hash, , , 
NULL)) != NULL)
buf = NULL;
else {
@@ -384,12 +386,13 @@ static off_t write_reuse_object(struct hashfile *f, 
struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
  dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+   unsigned long entry_size = SIZE(entry);
 
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
 
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -406,7 +409,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
datalen -= entry->in_pack_header_size;
 
if (!pack_to_stdout && p->index_version == 1 &&
-   check_pack_inflate(p, _curs, offset, datalen, entry->size)) {
+   check_pack_inflate(p, _curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
  oid_to_hex(>idx.oid));
unuse_pack(_curs);
@@ -1407,6 +1410,8 @@ static void cleanup_preferred_base(void)
 
 static void check_object(struct object_entry *entry)
 {
+   unsigned long canonical_size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1444,7 +1449,7 @@ static void check_object(struct object_entry *entry)
default:
/* Not a delta hence we've already got all we need. */
oe_set_type(entry, entry->in_pack_type);
-   entry->size = in_pack_size;
+   SET_SIZE(entry, in_pack_size);
entry->in_pack_header_size = used;
if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > 
OBJ_BLOB)
goto give_up;
@@ -1501,9 +1506,9 @@ static void check_object(struct object_entry *entry)
 * circular deltas.
 */
oe_set_type(entry, entry->in_pack_type);
-   entry->size = in_pack_size; /* delta size */
+   SET_SIZE(entry, in_pack_size); /* delta size */
SET_DELTA(entry, base_entry);
-   entry->delta_size = entry->size;
+   entry->delta_size = in_pack_size;
   

[PATCH v8 02/15] pack-objects: a bit of document about struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pack-objects.h | 45 +
 1 file changed, 45 insertions(+)

diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..c0a1f61aac 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,51 @@
 #ifndef PACK_OBJECTS_H
 #define PACK_OBJECTS_H
 
+/*
+ * basic object info
+ * -
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * 
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * --
+ * Delta links (delta, delta_child and delta_sibling) are created to
+ * reflect that delta graph from the source pack then updated or added
+ * during delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed, delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
 struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 07/15] pack-objects: move in_pack out of struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Instead of using 8 bytes (on a 64-bit arch) to store a pointer to a
pack, use an index, since the number of packs should be relatively
small.

This limits the number of packs we can handle to 1k. Since we can't
be sure people will never run into a situation where they have more
than 1k pack files, provide a fallback route for it.

If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
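
The lookup described above boils down to something like this (a
simplified sketch of the oe_in_pack() helper, not the exact code):

/* sketch: small index table when possible, per-object array otherwise */
static inline struct packed_git *oe_in_pack(const struct packing_data *pack,
					    const struct object_entry *e)
{
	if (pack->in_pack_by_idx)		/* common case: at most 1k packs */
		return pack->in_pack_by_idx[e->in_pack_idx];
	else					/* fallback: one pointer per object */
		return pack->in_pack[e - pack->objects];
}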

The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running

make test GIT_TEST_FULL_IN_PACK_ARRAY=1

to take advantage of other special case tests.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 28 +++---
 cache.h|  1 +
 pack-objects.c | 66 ++
 pack-objects.h | 36 ++-
 t/README   |  4 +++
 t/t5300-pack-object.sh |  5 
 6 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7a672366bf..4e5812a053 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -29,6 +29,8 @@
 #include "list.h"
 #include "packfile.h"
 
+#define IN_PACK(obj) oe_in_pack(_pack, obj)
+
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
N_("git pack-objects [...]  [<  | < 
]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
 {
-   struct packed_git *p = entry->in_pack;
+   struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
 
if (!reuse_object)
to_reuse = 0;   /* explicit */
-   else if (!entry->in_pack)
+   else if (!IN_PACK(entry))
to_reuse = 0;   /* can't reuse what we don't have */
else if (oe_type(entry) == OBJ_REF_DELTA ||
 oe_type(entry) == OBJ_OFS_DELTA)
@@ -1074,7 +1076,7 @@ static void create_object_entry(const struct object_id 
*oid,
else
nr_result++;
if (found_pack) {
-   entry->in_pack = found_pack;
+   oe_set_in_pack(_pack, entry, found_pack);
entry->in_pack_offset = found_offset;
}
 
@@ -1399,8 +1401,8 @@ static void cleanup_preferred_base(void)
 
 static void check_object(struct object_entry *entry)
 {
-   if (entry->in_pack) {
-   struct packed_git *p = entry->in_pack;
+   if (IN_PACK(entry)) {
+   struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1537,16 @@ static int pack_offset_sort(const void *_a, const void 
*_b)
 {
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+   const struct packed_git *a_in_pack = IN_PACK(a);
+   const struct packed_git *b_in_pack = IN_PACK(b);
 
/* avoid filesystem trashing with loose objects */
-   if (!a->in_pack && !b->in_pack)
+   if (!a_in_pack && !b_in_pack)
return oidcmp(>idx.oid, >idx.oid);
 
-   if (a->in_pack < b->in_pack)
+   if (a_in_pack < b_in_pack)
return -1;
-   if (a->in_pack > b->in_pack)
+   if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1582,7 @@ static void drop_reused_delta(struct object_entry *entry)
 
oi.sizep = >size;
oi.typep = 
-   if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) 
{
+   if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, ) < 0) 
{
/*
 * We failed to get the info from this pack for some reason;
 * fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1852,8 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
 * it, we will still save the transfer cost, as we already know
 * the other side has it and we won't send src_entry at all.
 */
-   if (reuse_delta && trg_entry->in_pack &&
-   trg_entry->in_pack == src_entry->in_pack &&
+   if (reuse_delta && IN_PACK(trg_entry) &&
+   IN_PACK(trg_entry) == IN_PACK(src_entry) &&

[PATCH v8 05/15] pack-objects: use bitfield for object_entry::depth

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Because of struct packing, from now on we can only handle a max
depth of 4095 (or even lower when new booleans are added to this
struct). This should be OK since a long delta chain will cause a
significant slowdown anyway.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
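Note for reviewers: a tiny standalone C sketch of why OE_DEPTH_BITS = 12
caps the depth at 4095, and how an over-large --depth value is clamped.
It mirrors the cmd_pack_objects() hunk below; demo_entry and main() are
made up for illustration only and are not part of the patch:

#include <stdio.h>

#define OE_DEPTH_BITS 12	/* same width as object_entry::depth below */

struct demo_entry {
	unsigned depth:OE_DEPTH_BITS;	/* can hold 0..4095 */
};

int main(void)
{
	int depth = 5000;	/* e.g. a user-supplied --depth value */

	/* same clamp as the new check in cmd_pack_objects() */
	if (depth >= (1 << OE_DEPTH_BITS)) {
		fprintf(stderr, "delta chain depth %d is too deep, forcing %d\n",
			depth, (1 << OE_DEPTH_BITS) - 1);
		depth = (1 << OE_DEPTH_BITS) - 1;
	}

	struct demo_entry e;
	e.depth = depth;
	printf("stored depth: %u\n", (unsigned)e.depth);	/* prints 4095 */
	return 0;
}
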
 Documentation/config.txt   | 1 +
 Documentation/git-pack-objects.txt | 4 +++-
 Documentation/git-repack.txt   | 4 +++-
 builtin/pack-objects.c | 6 ++
 pack-objects.h | 5 ++---
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf10c..9bd3f5a789 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2412,6 +2412,7 @@ pack.window::
 pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+   Maximum value is 4095.
 
 pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
-   The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
--window-memory=<n>::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
-   The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
--threads=<n>::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebb6e034cb..2ce05626d2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,12 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
 
+   if (depth >= (1 << OE_DEPTH_BITS)) {
+   warning(_("delta chain depth %d is too deep, forcing %d"),
+   depth, (1 << OE_DEPTH_BITS) - 1);
+   depth = (1 << OE_DEPTH_BITS) - 1;
+   }
+
argv_array_push(&rp, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 080ef62d31..cdce1648de 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
 #define PACK_OBJECTS_H
 
 #define OE_DFS_STATE_BITS  2
+#define OE_DEPTH_BITS  12
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -89,9 +90,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
-
-   int depth;
-
+   unsigned depth:OE_DEPTH_BITS;
 };
 
 struct packing_data {
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 04/15] pack-objects: use bitfield for object_entry::dfs_state

2018-03-31 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c |  3 +++
 pack-objects.h | 28 +---
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7133baa63f..ebb6e034cb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
OPT_END(),
};
 
+   if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+   BUG("too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
 
reset_pack_idx_option(_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index b4a83a6123..080ef62d31 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
 #ifndef PACK_OBJECTS_H
 #define PACK_OBJECTS_H
 
+#define OE_DFS_STATE_BITS  2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+   DFS_NONE = 0,
+   DFS_ACTIVE,
+   DFS_DONE,
+   DFS_NUM_STATES
+};
+
 /*
  * basic object info
  * -
@@ -73,19 +88,10 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+   unsigned dfs_state:OE_DFS_STATE_BITS;
 
-   /*
-* State flags for depth-first search used for analyzing delta cycles.
-*
-* The depth is measured in delta-links to the base (so if A is a delta
-* against B, then A has a depth of 1, and B a depth of 0).
-*/
-   enum {
-   DFS_NONE = 0,
-   DFS_ACTIVE,
-   DFS_DONE
-   } dfs_state;
int depth;
+
 };
 
 struct packing_data {
-- 
2.17.0.rc2.515.g4feb9b7923



[PATCH v8 06/15] pack-objects: move in_pack_pos out of struct object_entry

2018-03-31 Thread Nguyễn Thái Ngọc Duy
This field is only needed for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process).

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
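Note for reviewers: the accessor pattern is easier to see in isolation.
Below is a minimal standalone sketch (the struct and function names are
invented stand-ins, not the real packing_data API) of keeping a
per-object value in a side array indexed by the entry's offset into
objects[], which is what oe_in_pack_pos()/oe_set_in_pack_pos() in this
patch do:

#include <stdlib.h>

struct entry { int dummy; };		/* stand-in for object_entry */

struct pack_data {			/* stand-in for packing_data */
	struct entry *objects;
	unsigned int nr_objects;
	unsigned int *in_pack_pos;	/* allocated only when actually needed */
};

/* an entry's index is its offset into objects[] */
static unsigned int get_pos(const struct pack_data *p, const struct entry *e)
{
	return p->in_pack_pos[e - p->objects];
}

static void set_pos(struct pack_data *p, const struct entry *e, unsigned int pos)
{
	p->in_pack_pos[e - p->objects] = pos;
}

int main(void)
{
	struct pack_data p;

	p.nr_objects = 4;
	p.objects = calloc(p.nr_objects, sizeof(*p.objects));
	p.in_pack_pos = calloc(p.nr_objects, sizeof(*p.in_pack_pos));

	set_pos(&p, &p.objects[2], 7);
	return get_pos(&p, &p.objects[2]) == 7 ? 0 : 1;
}
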
 builtin/pack-objects.c |  3 ++-
 pack-bitmap-write.c|  8 +---
 pack-bitmap.c  |  2 +-
 pack-bitmap.h  |  4 +++-
 pack-objects.h | 16 +++-
 5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2ce05626d2..7a672366bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,8 @@ static void write_pack_file(void)
 
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
-   bitmap_writer_build_type_index(written_list, nr_written);
+   bitmap_writer_build_type_index(
+   &to_pack, written_list, nr_written);
}
 
finish_tmp_packfile(, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index fd11f08940..f7c897515b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
 /**
  * Build the initial type index for the packfile
  */
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+   struct pack_idx_entry **index,
uint32_t index_nr)
 {
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry 
**index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+   ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
 
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
 
-   entry->in_pack_pos = i;
+   oe_set_in_pack_pos(to_pack, entry, i);
 
switch (oe_type(entry)) {
case OBJ_COMMIT:
@@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
 
-   return entry->in_pack_pos;
+   return oe_in_pack_pos(writer.to_pack, entry);
 }
 
 static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 9270983e5f..865d9ecc4e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
 
if (oe)
-   reposition[i] = oe->in_pack_pos + 1;
+   reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
 
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, 
khash_sha1 *reused_bi
 
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t 
index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+   struct pack_idx_entry **index,
+   uint32_t index_nr);
 void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index cdce1648de..71ea992c3c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -79,7 +79,6 @@ struct object_entry {
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
uint32_t hash;  /* name hint hash */
-   unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -99,6 +98,8 @@ struct packing_data {
 
int32_t *index;
uint32_t index_size;
+
+   unsigned int *in_pack_pos;
 };
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -144,4 +145,17 @@ static inline void oe_set_type(struct object_entry *e,
e->type_ = (unsigned)type;
 }
 
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+   return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct 

Re: What's cooking in git.git (Mar 2018, #06; Fri, 30)

2018-03-31 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 10:38 PM, Junio C Hamano  wrote:
> * nd/repack-keep-pack (2018-03-26) 7 commits
>  - pack-objects: show some progress when counting kept objects
>  - gc --auto: exclude base pack if not enough mem to "repack -ad"
>  - gc: handle a corner case in gc.bigPackThreshold
>  - gc: add gc.bigPackThreshold config
>  - gc: add --keep-largest-pack option
>  - repack: add --keep-pack option
>  - t7700: have closing quote of a test at the beginning of line
>
>  "git gc" in a large repository takes a lot of time as it considers
>  to repack all objects into one pack by default.  The command has
>  been taught to pretend as if the largest existing packfile is
>  marked with ".keep" so that it is left untouched while objects in
>  other packs and loose ones are repacked.
>
>  Reroll exists, but it seems to be still slushy.
>  cf. <20180316192745.19557-1-pclo...@gmail.com>

The next one v4 [1] should be less slushy.

[1] https://public-inbox.org/git/20180324072507.21059-1-pclo...@gmail.com/#t
-- 
Duy


Re: [PATCH v7 12/13] pack-objects: shrink delta_size field in struct object_entry

2018-03-31 Thread Duy Nguyen
On Fri, Mar 30, 2018 at 11:24 PM, Jeff King  wrote:
> How come this doesn't get a pdata->oe_delta_size_limit like we have
> pdata->oe_size_limit? Would we want a matching
> $GIT_TEST_OE_DELTA_SIZE_BITS to test it, too?

Nope. This changes how the delta chain is formed (e.g. produces
shorter chains) and apparently some tests rely on that, like t5303.
-- 
Duy


Re: Bad refspec messes up bundle.

2018-03-31 Thread Luciano Joublanc
Hi Johannes,

With such a comprehensive reply, I would feel guilty not making a
contribution now :) Be forewarned, though: it's been about ten years
since I wrote anything in `C`!

I've cloned the `maint` branch, built the project, and added the test
as you suggested - it's failing as expected.

I'm somewhat confused, though. I think it's my limited understanding of
'ref' and 'commit'.

On 30 March 2018 at 11:20, Johannes Schindelin
 wrote:
>
>
>   However, this would be incorrect, as the flags are stored with the
>   *commit*, not with the ref. So if two refs point to the same commit,
>   that new code would skip the second one by mistake!


Isn't that the point here, to deduplicate commits? My limited
understanding is that a 'ref' is like an alias or pointer to a
commit.

>
>   By the way, this makes me think that there is another very real bug in
>   the bundle code, in the part I showed above. Suppose you have a `master`
>   and a `next` ref, and both point at the same commit, then you would want
>   `git bundle create next.bundle master..next` to list `next`, don't you
>   think?


Doesn't this contradict what you just said, that we don't want to skip
refs with the same commit #?

In fact, if you look in the calling function, there is an
`object_array_remove_duplicates();` which, to the best of my
understanding, removes duplicate refs (not commits). However, I suspect
this doesn't cover the `--all` case, as it's a switch rather than a
revspec? Would that be right?

>
>
> - most likely, the best way to avoid duplicate refs entries is to use the
>   actual ref name and put it into a hash set. Luckily, we do have code
>   for this, and examples how to use it, too. See e.g. fc65b00da7e
>   (merge-recursive: change current file dir string_lists to hashmap,
>   2017-09-07). So you would define something like
>

Separately, if I do end up including the hashmap code, it should be
refactored out into its own file, right?
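
To check my understanding before touching bundle.c, here is a rough
standalone sketch of de-duplicating by ref *name* rather than by a flag
stored on the commit. It deliberately uses plain C with a fixed-size
linear scan instead of git's hashmap, so none of the names below are
the real API, just the shape of the check:

#include <stdio.h>
#include <string.h>

static const char *seen[64];	/* names of refs we have already written */
static int nr_seen;

/* returns 1 if refname was seen before, otherwise records it */
static int already_written(const char *refname)
{
	int i;

	for (i = 0; i < nr_seen; i++)
		if (!strcmp(seen[i], refname))
			return 1;
	seen[nr_seen++] = refname;
	return 0;
}

int main(void)
{
	const char *refs[] = {
		"refs/heads/master", "refs/heads/next", "refs/heads/master"
	};
	int i;

	for (i = 0; i < 3; i++)
		if (!already_written(refs[i]))
			printf("writing %s\n", refs[i]);
	return 0;	/* prints master and next once each */
}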

Thanks again,

Luciano


