Re: Bug report: "Use of uninitialized value $_ in print"
Hi,

Sam Kuper wrote:

> First, background. I encountered a bug on Debian Stretch, using this
> git version:
>
>     $ git --version
>     git version 2.11.0
>
> The bug is that in the midst of running
>
>     git -c interactive.diffFilter="git diff --word-diff --color" add --patch
>
> and after having answered "y" to some patches and "n" to others, git
> suddenly spewed the following output:
>
>     Use of uninitialized value $_ in print at
>     /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 74.
>     Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
>     Use of uninitialized value $_ in print at
>     /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 75.
> [...]

Strange.  The relevant line, for reference:

    $ dpkg-deb --fsys-tarfile git_2.11.0-3+deb9u2_amd64.deb |
        tar Oxf - ./usr/lib/git-core/git-add--interactive |
        sed -n '1370,1372 p'
            for (@{$hunk[$ix]{DISPLAY}}) {
                    print;   <- this one
            }

This is a foreach loop, so it's supposed to have set $_ to each value
in the list @{$hunk[$ix]{DISPLAY}}.  So why is Perl considering it
uninitialized?

Is this reproducible for you?  Do you have more details about how I
can reproduce it?  What arch are you on?  What perl version do you
use?

Can you report this using "reportbug git"?

Thanks,
Jonathan
Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)
Hi Igor,

Igor Djordjevic writes:
> Hi Sergey,
>
> On 01/03/2018 06:39, Sergey Organov wrote:
[...]
>>
>> Yeah, I now see it myself. I'm sorry for being lazy and not inspecting
>> this more carefully in the first place.
>
> No problem, that`s why we`re discussing it, and I`m glad we`re
> aligned now, so we can move forward :)
>
>>> So while your original proposal currently seems like it could be
>>> working nicely for non-interactive rebase (and might be some simpler
>>> interactive ones), now hitting/acknowledging its first real use
>>> limit, my additional quick attempt[1] just tries to aid pretty
>>> interesting case of complicated interactive rebase, too, where we
>>> might be able to do better as well, still using your original proposal
>>> as a base idea :)
>>
>> Yes, thank you for pushing me back to reality! :-) The work and thoughts
>> you are putting into solving the puzzle are greatly appreciated!
>
> You`re welcome, and I am enjoying it :)
>
>> Thinking about it overnight, I now suspect that original proposal had a
>> mistake in the final merge step. I think that what you did is a way to
>> fix it, and I want to try to figure what exactly was wrong in the
>> original proposal and to find simpler way of doing it right.
>>
>> The likely solution is to use original UM as a merge-base for final
>> 3-way merge of U1' and U2', but I'm not sure yet. Sounds pretty natural
>> though, as that's exactly UM from which both U1' and U2' have diverged
>> due to rebasing and other history editing.
>
> Yes, this might be it...! ;)
>
> To prove myself it works, I`ve assembled a pretty crazy `-s ours`
> merge interactive rebase scenario, and it seems this passes the test,
> ticking all the check boxes (I could think of) :P

I must admit it's quite a relief to hear this.  What we now have is so
simple and obvious that it'd be a huge disappointment if it didn't
work!
> Here, merge commit M is done with `-s ours` (obsoleting branch "B"),
> plus amended to make it an "evil merge", where a commit B2 from
> obsoleted branch "B" is cherry picked to "master".

[...]

> There, I hope I didn`t miss any expectation. And, it _seems_ to work
> exactly as expected :D

That's very nice, to the level of being even suspect! :-)

To avoid falling into euphoria though, we need to keep in mind that
"expectations" is a rather vague concept, and we likely still need to
stop for user amendment unless we are absolutely sure nothing
surprising happens.  I.e., we'd better require the U1'==U2' test to
succeed to proceed non-stop automatically.  Besides, that would be
somewhat in line with what 'rerere' does.

> In real life, except for usual possibility for conflicts during
> commit rebasing, we might experience _three_ possible conflict
> situations once "rebased" merge itself is to be created - two when
> rebasing each of temporary merge helper commits, and one on the
> "rebased" merge itself. This is something where we might think about
> user experience, not introducing (too much) confusion...

Yeah, this is a terribly important issue to take care of!  The
relative simplicity of the concept itself raises the chances of
finding a suitable solution, I hope.

-- Sergey
Re: [RFC] Contributing to Git (on Windows)
Hi,

Derrick Stolee wrote:

> Now, we'd like to make that document publicly available. These steps are
> focused on a Windows user, so we propose putting them in the
> git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open
> for feedback [1]. I'll read comments on that PR or in this thread.

Thanks!  I wonder if we can put this in the standard Documentation/
directory as well.  E.g. the Windows CONTRIBUTING.md could say "See
Documentation/Contributing-On-Windows.md" so that the bulk of the text
would not have to be git-for-windows specific.

[...]
> +++ b/CONTRIBUTING.md
> @@ -0,0 +1,401 @@
> +How to Contribute to Git for Windows
> +

Would it make sense for this to be checked in with LF instead of CRLF
line endings, so that clones use the user's chosen / platform native
line ending?  The .gitattributes file could include

	/CONTRIBUTING.md text

so that line ending conversion takes place even if the user hasn't
enabled core.autocrlf.

[...]
> +The SDK uses a different credential manager, so you may still want to use
> +Visual Studio or normal Git Bash for interacting with your remotes.
> +Alternatively, use SSH rather than HTTPS and avoid credential manager
> +problems.

Where can I read more about the problems in question?

[...]
> +Many new contributors ask: What should I start working on?
> +
> +One way to win big with the maintainer of an open-source project is to
> +look at the [issues page](https://github.com/git-for-windows/git/issues)
> +and see if there are any issues that you can fix quickly, or if anything
> +catches your eye.

You can also look at https://crbug.com/git for non Windows-specific
issues.  And you can look at recent user questions on the mailing
list: https://public-inbox.org/git

[...]
> +If you are adding new functionality, you may need to create low-level
> +tests by creating helper commands that test a very limited action.
> +These commands are stored in `t/helpers`.
> +When adding a helper, be sure to add a line to `t/Makefile` and to the
> +`.gitignore` for the binary file you add. The Git community prefers
> +functional tests using the full `git` executable, so be sure the test
> +helper is required.

Maybe s/low-level/unit/?

[...]
> +Read [`t/README`](https://github.com/git/git/blob/master/t/README) for
> +more details.

Forgive my ignorance: does github flavored markdown allow relative
links?  (I.e., could this say [`t/README`](t/README)?)

[...]
> +You can also set certain environment variables to help test the
> +performance on different repositories or with more repetitions. The full
> +list is available in
> +[the `t/perf/README` file](https://github.com/git/git/blob/master/t/perf/README),

Likewise.

[...]
> +Test Your Changes on Linux
> +--------------------------
> +
> +It can be important to work directly on the
> +[core Git codebase](https://github.com/git/git), such as a recent commit
> +into the `master` or `next` branch that has not been incorporated into
> +Git for Windows. Also, it can help to run functional and performance
> +tests on your code in Linux before submitting patches to the
> +Linux-focused mailing list.

I'm surprised at this advice.  Does it actually come up?  When I was
on Mac I never worried about this, nor when I was on Windows.  I'm
personally happy to review patches that haven't been tested on Linux,
though it's of course even nicer if the patch author mentions what
platforms they've tested on.

Maybe this can be reframed to refer specifically to cases where you've
modified some Linux platform-specific code (e.g. to add a new feature
to run-command.c)?

[...]
> +When preparing your patch, it is important to put yourself in the shoes
> +of the maintainer.

... and in the shoes of other users and developers working with Git
down the line who will interact with the patch later!  Actually, the
maintainer is one of the least important audiences for a commit
message.  But maybe 'the maintainer' is a stand-in for 'the project'
here?
[...]
> +* Make sure the commits are signed off using `git commit (-s|--signoff)`.
> +  This means that you testify to be allowed to contribute your changes,
> +  in particular that if you did this on company time, said company
> +  approved to releasing your work as Open Source under the GNU Public
> +  License v2.

Can this either point to or quote the relevant legal text from
Documentation/SubmittingPatches?  It feels dangerous (in the sense of,
potentially leading to misunderstandings and major future pain) to ask
people to sign off without them knowing exactly what that means.

The rest also looks nice.  Thanks for working on this.

Sincerely,
Jonathan
Re: [PATCH 2/2] parse-options: remove the unused parse_opt_commits() function
Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty
> if <id> is invalid', 2018-02-23) removed the last use of the
> callback function parse_opt_commits(). Remove this function
> declaration and definition, since it is now dead code.
>
> Signed-off-by: Ramsay Jones
> ---
>  parse-options-cb.c | 16 ----------------
>  parse-options.h    |  1 -
>  2 files changed, 17 deletions(-)

Reviewed-by: Jonathan Nieder

Thanks.
Re: [PATCH 1/2] ref-filter: mark a file-local symbol as static
Hi,

Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty
> if <id> is invalid', 2018-02-23) added the add_str_to_commit_list()
> function, which causes sparse to issue a "... not declared. Should it
> be static?" warning for that symbol.

Thanks for catching it!

> In order to suppress the warning, mark that function as static.

Isn't this closer to

	Indeed, the function is only used in this one compilation
	unit.  Mark it static.

?  In other words, sparse's warning is accurate, and this is not about
trying to quiet a false positive but about addressing a true positive.

> Signed-off-by: Ramsay Jones
> ---
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,
Jonathan
[PATCH 2/2] parse-options: remove the unused parse_opt_commits() function
Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
<id> is invalid', 2018-02-23) removed the last use of the callback
function parse_opt_commits(). Remove this function declaration and
definition, since it is now dead code.

Signed-off-by: Ramsay Jones
---
 parse-options-cb.c | 16 ----------------
 parse-options.h    |  1 -
 2 files changed, 17 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index c6679cb2c..c7320a73f 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -78,22 +78,6 @@ int parse_opt_verbosity_cb(const struct option *opt, const char *arg,
 	return 0;
 }
 
-int parse_opt_commits(const struct option *opt, const char *arg, int unset)
-{
-	struct object_id oid;
-	struct commit *commit;
-
-	if (!arg)
-		return -1;
-	if (get_oid(arg, &oid))
-		return error("malformed object name %s", arg);
-	commit = lookup_commit_reference(&oid);
-	if (!commit)
-		return error("no such commit %s", arg);
-	commit_list_insert(commit, opt->value);
-	return 0;
-}
-
 int parse_opt_object_name(const struct option *opt, const char *arg, int unset)
 {
 	struct object_id oid;

diff --git a/parse-options.h b/parse-options.h
index 4b4734f2e..2b8378ac1 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -224,7 +224,6 @@ extern int parse_opt_expiry_date_cb(const struct option *, const char *, int);
 extern int parse_opt_color_flag_cb(const struct option *, const char *, int);
 extern int parse_opt_verbosity_cb(const struct option *, const char *, int);
 extern int parse_opt_object_name(const struct option *, const char *, int);
-extern int parse_opt_commits(const struct option *, const char *, int);
 extern int parse_opt_tertiary(const struct option *, const char *, int);
 extern int parse_opt_string_list(const struct option *, const char *, int);
 extern int parse_opt_noop_cb(const struct option *, const char *, int);
-- 
2.16.0
[PATCH 1/2] ref-filter: mark a file-local symbol as static
Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
<id> is invalid', 2018-02-23) added the add_str_to_commit_list()
function, which causes sparse to issue a "... not declared. Should it
be static?" warning for that symbol.

In order to suppress the warning, mark that function as static.

Signed-off-by: Ramsay Jones
---
 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index f375e7670..69bf7b587 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2000,7 +2000,7 @@ static void do_merge_filter(struct ref_filter_cbdata *ref_cbdata)
 	free(to_clear);
 }
 
-int add_str_to_commit_list(struct string_list_item *item, void *commit_list)
+static int add_str_to_commit_list(struct string_list_item *item, void *commit_list)
 {
 	struct object_id oid;
 	struct commit *commit;
-- 
2.16.0
[PATCH 0/2] sparse warning on next branch
Tonight's build had a sparse warning and an additional
./static-check.pl symbol appear on the 'next' branch.  As you know, I
like to catch these on the 'pu' branch before they graduate to 'next',
but it seems that I missed these! :( (The 'pu' branch is quite noisy
with regard to sparse and static-check.pl at the moment.)

These patches were developed on top of 'next'; however, I also tested
them on top of commit fcfba37337 directly.  (Note, this is branch
'ps/contains-id-error-message' merged at commit 9623d6817b.)

Ramsay Jones (2):
  ref-filter: mark a file-local symbol as static
  parse-options: remove the unused parse_opt_commits() function

 parse-options-cb.c | 16 ----------------
 parse-options.h    |  1 -
 ref-filter.c       |  2 +-
 3 files changed, 1 insertion(+), 18 deletions(-)

-- 
2.16.0
Bug report: "Use of uninitialized value $_ in print"
First, background. I encountered a bug on Debian Stretch, using this
git version:

    $ git --version
    git version 2.11.0

The bug is that in the midst of running

    git -c interactive.diffFilter="git diff --word-diff --color" add --patch

and after having answered "y" to some patches and "n" to others, git
suddenly spewed the following output:

    Use of uninitialized value $_ in print at
    /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 74.
    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
    Use of uninitialized value $_ in print at
    /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 75.

    [... the same warning repeated many more times ...]

I hope that this bug report can help the git core maintainers to
either fix the problem upstream, or to co-ordinate a fix with the
Debian git maintainer(s) if the bug does not exist upstream.

Thanks for the great DVCS :)

P.S.
I am not subscribed to this mailing list, so please CC me in your reply if you want me to see it.
Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)
Hi Sergey,

On 01/03/2018 06:39, Sergey Organov wrote:
>>> (3) ---X1---o---o---o---o---o---X2
>>>        |\                       |\
>>>        | A1---A2---A3---U1      | A1'--A2'--A3'--U1'
>>>        |              \         |
>>>        |               M        |
>>>        |              /         |
>>>        \-B1---B2---B3---U2      \-B1'--B2'--B3'--U2'
>>>
>>
>> Meh, I hope I`m rushing it now, but for example, if we had decided to
>> drop commit A2 during an interactive rebase (so losing A2' from
>> diagram above), wouldn`t U2' still introduce those changes back, once
>> U1' and U2' are merged, being incorrect/unwanted behavior...? :/
>>
>> [...]
>
> Yeah, I now see it myself. I'm sorry for being lazy and not inspecting
> this more carefully in the first place.

No problem, that`s why we`re discussing it, and I`m glad we`re aligned
now, so we can move forward :)

>> So while your original proposal currently seems like it could be
>> working nicely for non-interactive rebase (and might be some simpler
>> interactive ones), now hitting/acknowledging its first real use
>> limit, my additional quick attempt[1] just tries to aid pretty
>> interesting case of complicated interactive rebase, too, where we
>> might be able to do better as well, still using your original proposal
>> as a base idea :)
>
> Yes, thank you for pushing me back to reality! :-) The work and thoughts
> you are putting into solving the puzzle are greatly appreciated!

You`re welcome, and I am enjoying it :)

> Thinking about it overnight, I now suspect that original proposal had a
> mistake in the final merge step. I think that what you did is a way to
> fix it, and I want to try to figure what exactly was wrong in the
> original proposal and to find simpler way of doing it right.
>
> The likely solution is to use original UM as a merge-base for final
> 3-way merge of U1' and U2', but I'm not sure yet. Sounds pretty natural
> though, as that's exactly UM from which both U1' and U2' have diverged
> due to rebasing and other history editing.

Yes, this might be it...!
;)

To prove myself it works, I`ve assembled a pretty crazy `-s ours`
merge interactive rebase scenario, and it seems this passes the test,
ticking all the check boxes (I could think of) :P

Let`s see our starting situation:

(0) ---X8--B2'--X9 (master)
       |\
       | A1---A2---A3 (A)
       |            \
       |             M (topic)
       |            /
       \-B1---B2---B3 (B)

Here, merge commit M is done with `-s ours` (obsoleting branch "B"),
plus amended to make it an "evil merge", where a commit B2 from
obsoleted branch "B" is cherry picked to "master".

Now, we want to rebase "topic" (M) onto updated "master" (X9), but to
make things more interesting, we`ll do it interactively, with some
amendments, drops, additions and even more cherry-picks!

This is what the final result looks like:

(1) ---X8--B2'--X9 (master)
                |\
                | A12--A2'---B3' (A)
                |             \
                |              M' (topic)
                |             /
                \-B1'--B3'---B4 (B)

During interactive rebase, on branch "A", we amended A1 into A12,
dropped A3 and cherry-picked B3.  On branch "B", B4 is added, B2'
being omitted automatically as already present in "master".

So... In comparison to original merge commit M, rebased merge commit
M' is expected to:

 - Add X9, from updated "master"
 - Have A1 changed to A12, due to A12 commit amendment
 - Keep A2, rebased as A2'
 - Remove A3, due to dropped A3 commit
 - Keep amendment from original (evil) merge commit M
 - Miss B1' like M does B1, due to original `-s ours` merge strategy
 - Add B2, cherry-picked as B2' into "master"
 - Add B3, cherry-picked as B3' into "A"
 - Add B4, added to "B"
 - Most important, provide safety mechanism to "fail loud", being
   aware of non-trivial things going on, allowing to stop for user
   inspection/decision

There, I hope I didn`t miss any expectation. And, it _seems_ to work
exactly as expected :D

Not to leave this to imagination only, and hopefully helping others to
get up to speed and possibly discuss this, pointing to still possible
flaws, I`m adding a demo script[1], showing how this exact example
works.
Note that the script _is_ coined to avoid rebase conflicts, as they`re
not currently important for the point to be made here.

In real life, except for usual possibility for conflicts during
commit rebasing, we might experience _three_ possible conflict
situations once "rebased" merge itself is to be created - two when
rebasing each of temporary merge helper commits, and one on the
"rebased" merge itself.  This is something where we might think about
user experience, not introducing (too much) confusion...

Regards, Buga

[1] Demonstration script:

-- >8 --
#!/bin/sh

# rm -rf ./.git
# rm -f ./test.txt

git init
touch ./test.txt
git add -- test.txt

#
Re: [PATCH 00/11] Reduce pack-objects memory footprint
On Thu, Mar 1, 2018 at 8:33 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:
>
>> The array of object_entry in pack-objects can take a lot of memory
>> when pack-objects is run in "pack everything" mode. On linux-2.6.git,
>> this array alone takes roughly 800MB.
>>
>> This series reorders some fields and reduces field size... to keep
>> this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on
>> 64-bit linux and saves 260MB on linux-2.6.git.
>
> I'm very interested in this patch series. I don't have time to test this
> one right now (have to run), but with your previous RFC patch memory use
> (in the ~4GB range) on a big in-house repo went down by a bit over 3%,
> and it's ~5% faster.
>
> Before/after RSS 4440812 / 429 & runtime 172.73 / 162.45. This is
> after having already done a full git gc before, data via /usr/bin/time
> -v.

Jeff correctly pointed out elsewhere in this thread that RSS covers
both heap (this is what I try to reduce) and some file cache (we mmap
the whole pack file just to ease the reading), so RSS might not be a
good indicator of memory reduction.  Any newly freed memory should be
used for cache, which raises RSS back up.  I think the RssAnon field
in /proc/<pid>/status shows it better.

> So not huge, but respectable.
>
> We have a big repo, and this gets repacked on 6-8GB of memory on dev
> KVMs, so we're under a fair bit of memory pressure. git-gc slows things
> down a lot.
>
> It would be really nice to have something that made it use drastically
> less memory at the cost of less efficient packs. Is the property that

Ahh.. less efficient.  You may be more interested in [1] then.  It
avoids rewriting the base pack.  Without the base pack, book keeping
becomes much much cheaper.  We still read every single byte in all
packs though (I think, unless you use pack-bitmap) and this amount of
I/O affects the rest of the system too.
Perhaps reducing core.packedgitwindowsize might make it friendlier to
the OS, I don't know.

> you need to spend give or take the size of .git/objects in memory
> something inherent, or just a limitation of the current implementation?
> I.e. could we do a first pass to pick some objects based on some
> heuristic, then repack them N at a time, and finally delete the
> now-obsolete packs?
>
> Another thing I've dealt with is that on these machines their
> NFS-mounted storage gets exhausted (I'm told) due to some pathological
> operations git does during repack, I/O tends to get 5-6x slower. Of
> course ionice doesn't help because the local kernel doesn't know
> anything about how harmful it is.

[1] https://public-inbox.org/git/20180301092046.2769-1-pclo...@gmail.com/T/#u
-- 
Duy
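The figures quoted above can be sanity-checked with a quick
back-of-envelope sketch; note the object count below is inferred from
the series' own numbers (136 -> 96 bytes per entry, ~260MB saved on
linux-2.6.git), not something stated in the thread:

```python
# Shrinking struct object_entry from 136 to 96 bytes saves 40 bytes
# per object, a ~29% reduction, matching the cover letter's claim.
entry_before = 136
entry_after = 96
saved_per_entry = entry_before - entry_after
reduction_pct = round((1 - entry_after / entry_before) * 100)

# For that to add up to ~260MB, linux-2.6.git must hold roughly this
# many objects (an inferred figure, not quoted in the thread).
objects = 260 * 1024 ** 2 // saved_per_entry

print(saved_per_entry, reduction_pct, objects)  # 40 29 6815744
```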
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 1, 2018 at 9:49 PM, Jeff King wrote:
> On Thu, Mar 01, 2018 at 04:10:48PM +0700, Nguyễn Thái Ngọc Duy wrote:
>
>> Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
>> pack, use an index instead, since the number of packs should be
>> relatively small.
>>
>> This limits the number of packs we can handle to 256 (still
>> unreasonably high for a repo to work well). If you have more than 256
>> packs, you'll need an older version of Git to repack first.
>
> I overall like the direction of this series, but I think this one is
> just too much. While you definitely shouldn't have a ton of packs, this
> leaves the user with no real escape hatch. And 256 isn't actually that
> many packs.

It was raised back to 4096 at the end (I didn't know how many spare
bits we had at this point).  Agreed on the escape hatch though.  I
think we could do better: if there are more than X packs, we repack X
packs into one and leave the rest alone.  The _next_ pack-objects will
pick another X packs to combine.  Repeat until you only have one pack
left.
-- 
Duy
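For scale, the two limits discussed above follow directly from the
width of the index that replaces the 8-byte pack pointer; a tiny
sketch (the helper name is made up for illustration):

```python
# Replacing a pack pointer with a small integer index into a table of
# packs caps how many packs one repository may have: 2^bits entries.
def max_packs(index_bits):
    return 2 ** index_bits

print(max_packs(8))   # 256  -- the limit in the patch as posted
print(max_packs(12))  # 4096 -- the limit after spare bits were found
```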
Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
On Fri, Mar 2, 2018 at 1:14 AM, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy writes:
>
>> pack-objects could be a big memory hog especially on large repos,
>> everybody knows that. The suggestion to stick a .keep file on the
>> largest pack to avoid this problem is also known for a long time.
>
> Yup, but note that it is not "largest" per-se. The thing being large
> is a mere consequence that it is the base pack that holds the bulk
> of older parts of the history (e.g. the one that you obtained via
> the initial clone).

Thanks, "base pack" sounds much better.  I was having trouble with
wording because I didn't nail this one down.

>> Let's do the suggestion automatically instead of waiting for people to
>> come to the Git mailing list and get the advice. When a certain
>> condition is met, gc --auto creates a .keep file temporarily before
>> repack is run, then removes it afterward.
>>
>> gc --auto does this based on an estimation of pack-objects memory
>> usage and whether that fits in one third of system memory (the
>> assumption here is for desktop environment where there are many other
>> applications running).
>>
>> Since the estimation may be inaccurate and that 1/3 threshold is
>> arbitrary, give the user a finer control over this mechanism as well:
>> if the largest pack is larger than gc.bigPackThreshold, it's kept.
>
> If this is a transient mechanism during a single gc session, it
> would be far more preferable if we can find a way to do this
> without actually having a .keep file on the filesystem.

That was my first attempt, manipulating packed_git::pack_keep inside
pack-objects.  Then my whole git.git was gone.  I was scared off, so I
did this instead.  I've learned my lesson though (never test dangerous
operations on your worktree!) and will do that pack_keep again _if_
this gc --auto still sounds like a good direction to go.
-- 
Duy
Re: Worktree silently deleted on git fetch/gc/log
On Fri, Mar 2, 2018 at 3:14 AM, Eric Sunshine wrote:
> As far as I know, there isn't any code in Git which would
> automatically remove the .git/worktrees/B directory, so it's not clear
> how that happened.

"git worktree prune" does, which is called by "git gc", and that was
actually executed from the command output in the original mail.

Дилян Палаузов, did you move /git/B away manually?  For example, you
originally created "B" with

    git worktree add /somewhere/B g

then you do this at some point

    mv /somewhere/B /git/B
-- 
Duy
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Brandon Williams writes:

> On 03/01, Junio C Hamano wrote:
> ...
>> Hmph, strictly speaking, the "delim" does not have to be a part of
>> how packetized stream is defined. As long as we stop abusing flush
>> as "This is merely an end of one segment of what I say." and make it
>> always mean "I am done speaking, it is your turn.", the application
>> payload can define its own syntax to separate groups of packets.
>
> That's actually a good point. We could just as easily have the delim
> packet be an empty packet-line "0004" or something like that.

Yes.  As long as there is an easy and obvious "cannot be a value"
constant, you can use it as a delimiter defined at the application
level.  For example, your command-request uses delim, like so:

    +request = empty-request | command-request
    +empty-request = flush-pkt
    +command-request = command
    +                  capability-list
    +                  (command-args)
    +                  flush-pkt
    +command = PKT-LINE("command=" key LF)
    +command-args = delim-pkt
    +               *PKT-Line(arg LF)

to mark the end of the cap list, but if an empty packet does not make
sense as a member of a cap list and a command args list, then an empty
packet between the cap list and the command args can be used instead.
A protocol-ignorant proxy can still work just fine.

Having a defined delim at the protocol level is often convenient, of
course, but once the application starts calling for multi-level
delimiters (i.e. maybe there are chapters and sections inside each
chapter in a single request message), it would not be sufficient to
define a single delim packet type.  The application layer needs to
define its own convention (e.g. if no "empty" section is allowed, then
"two consecutive delims is a chapter break; one delim is a section
break" can become a viable way to emulate multi-level delimiters).
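To make the framing under discussion concrete, here is a rough sketch
(not git's implementation): a pkt-line carries its total length as
four hex digits counting the header itself, so an empty payload
encodes as "0004", while "0000" (flush-pkt) and "0001" (delim-pkt) are
special values with no payload at all.

```python
def pkt_line(payload: bytes) -> bytes:
    # Four hex digits of total length (the 4-byte header included),
    # followed by the payload bytes.
    return b"%04x" % (len(payload) + 4) + payload

FLUSH_PKT = b"0000"  # "I am done speaking, it is your turn."
DELIM_PKT = b"0001"  # protocol-v2 section delimiter
EMPTY_PKT = b"0004"  # empty payload: the alternative delimiter floated above

print(pkt_line(b"command=ls-refs\n"))  # b'0014command=ls-refs\n'
assert pkt_line(b"") == EMPTY_PKT
```

This is exactly why an empty pkt-line can stand in for a delimiter
whenever "" is not a valid value at the application level: it is
unambiguous both to the endpoints and to a protocol-ignorant proxy.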
Re: [PATCH v4 12/35] serve: introduce git-serve
Brandon Williams writes:

>  Documentation/technical/protocol-v2.txt | 171

Unlike other things in Documentation/technical/, this is not listed on
the TECH_DOCS list in Documentation/Makefile.  Shouldn't it be?
Re: [PATCH v4 03/35] pkt-line: add delim packet support
On 03/01, Junio C Hamano wrote:
> Junio C Hamano writes:
>
>> Brandon Williams writes:
>>
>>> One of the design goals of protocol-v2 is to improve the semantics of
>>> flush packets. Currently in protocol-v1, flush packets are used both to
>>> indicate a break in a list of packet lines as well as an indication that
>>> one side has finished speaking. This makes it particularly difficult
>>> to implement proxies as a proxy would need to completely understand git
>>> protocol instead of simply looking for a flush packet.
>>
>> Good ;-) Yes, this has been one of the largest gripes about the
>> smart-http support code we have.
>
> Hmph, strictly speaking, the "delim" does not have to be a part of
> how packetized stream is defined. As long as we stop abusing flush
> as "This is merely an end of one segment of what I say." and make it
> always mean "I am done speaking, it is your turn.", the application
> payload can define its own syntax to separate groups of packets.

That's actually a good point.  We could just as easily have the delim
packet be an empty packet-line "0004" or something like that.

> I do not mind having this "delim" thing defined at the protocol
> level too much, though.

-- 
Brandon Williams
What's cooking in git.git (Mar 2018, #01; Thu, 1)
Here are the topics that have been cooking.  Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.  The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches of the repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/untracked-cache-invalidation-docs (2018-02-09) 2 commits
  (merged to 'next' on 2018-02-14 at 11d2d07c4a)
 + update-index doc: note the caveat with "could not open..."
 + update-index doc: note a fixed bug in the untracked cache
 (this branch uses nd/fix-untracked-cache-invalidation.)

 Doc update to warn against remaining bugs in untracked cache.

* as/ll-i18n (2018-02-13) 1 commit
  (merged to 'next' on 2018-02-14 at b30154a04c)
 + Mark messages for translations

 Some messages in low level start-up codepath have been i18n-ized.

* bc/doc-interpret-trailers-grammofix (2018-02-13) 1 commit
  (merged to 'next' on 2018-02-14 at 940e6dc7a5)
 + docs/interpret-trailers: fix agreement error

 Docfix.

* bp/fsmonitor (2018-02-14) 1 commit
  (merged to 'next' on 2018-02-14 at 5c508858fb)
 + fsmonitor: update documentation to remove reference to invalid
   config settings

 Doc update for a recently added feature.

* bp/name-hash-dirname-fix (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at 2f564fb4b3)
 + name-hash: properly fold directory names in adjust_dirname_case()

 "git add" files in the same directory, but spelling the directory
 path in different cases on case insensitive filesystem, corrupted
 the name hash data structure and led to unexpected results.  This
 has been corrected.
* es/worktree-add-post-checkout-hook (2018-02-15) 1 commit
  (merged to 'next' on 2018-02-21 at 6ef6a130bf)
 + worktree: add: fix 'post-checkout' not knowing new worktree location

 "git worktree add" learned to run the post-checkout hook, just like
 "git clone" runs it upon the initial checkout.

* gs/test-unset-xdg-cache-home (2018-02-16) 1 commit
  (merged to 'next' on 2018-02-21 at 9aec46d404)
 + test-lib.sh: unset XDG_CACHE_HOME

 Test update.

* jc/blame-missing-path (2018-02-07) 1 commit
  (merged to 'next' on 2018-02-14 at 883d266e1e)
 + blame: tighten command line parser

 "git blame HEAD COPYING" in a bare repository failed to run, while
 "git blame HEAD -- COPYING" run just fine.  This has been corrected.

* jk/doc-do-not-write-extern (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at e55b5127de)
 + CodingGuidelines: mention "static" and "extern"

 Devdoc update.

* jk/gettext-poison (2018-02-08) 2 commits
  (merged to 'next' on 2018-02-14 at cca3719a59)
 + git-sh-i18n: check GETTEXT_POISON before USE_GETTEXT_SCHEME
 + t0205: drop redundant test

 Test updates.

* jk/push-options-via-transport-fix (2018-02-20) 2 commits
  (merged to 'next' on 2018-02-21 at a037cbfa2b)
 + remote-curl: unquote incoming push-options
 + t5545: factor out http repository setup

 "git push" over http transport did not unquote the push-options
 correctly.

* jk/sq-dequote-on-bogus-input (2018-02-14) 1 commit
  (merged to 'next' on 2018-02-14 at 75d4f1eaf8)
 + sq_dequote: fix extra consumption of source string

 Code to unquote single-quoted string (used in the parser for
 configuration files, etc.) did not diagnose bogus input correctly
 and produced bogus results instead.

* jk/t0002-simplify (2018-02-12) 1 commit
  (merged to 'next' on 2018-02-14 at a7a24f5f29)
 + t0002: simplify error checking

 Code cleanup.
* jk/test-hashmap-updates (2018-02-14) 6 commits
  (merged to 'next' on 2018-02-14 at a61a9bd8f0)
 + test-hashmap: use "unsigned int" for hash storage
 + test-hashmap: simplify alloc_test_entry
 + test-hashmap: use strbuf_getline rather than fgets
 + test-hashmap: use xsnprintf rather than snprintf
 + test-hashmap: check allocation computation for overflow
 + test-hashmap: use ALLOC_ARRAY rather than bare malloc

 Code clean-up.

* js/fix-merge-arg-quoting-in-rebase-p (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at 27ebf001a1)
 + rebase -p: fix incorrect commit message when calling `git merge`.

 "git rebase -p" mangled log messages of a merge commit, which is
 now fixed.

* js/packet-read-line-check-null (2018-02-08) 2 commits
  (merged to 'next' on 2018-02-14 at 6ba237b284)
 + always check for NULL return from packet_read_line()
 + correct error messages for NULL packet_read_line()

 Some low level protocol codepath could crash when they get an
 unexpected flush packet, which is now fixed.

* jt/binsearch-with-fanout (2018-02-15) 2 commits
  (merged to 'next' on 2018-02-15 at 7648891022)
 + packfile: refactor hash search with fanout table
 + packfile: remove GIT_DEBUG_LOOKUP log statements
 (this branch is used by ds/commit-graph.)

 Refactor the code to binary search starting from a
Re: Obsolete instruction in SubmittingPatches?
Andrei Rybak writes:

> On 01.03.2018 0:54, Junio C Hamano wrote:
>> Andrei Rybak writes:
>>
>>> Is this part of guidelines obsolete, or am I not understanding this
>>> correctly?
>>
>> I am merely being nice (but only on "time-permitting" basis).
>
> Does that mean that the integration of a series is easier, when there is
> a re-send?

When I am not interested in a topic, or there are other reviewers who are more qualified and more interested in the topic than I am, the protocol to require the final "Here is a verbatim final resend, the only difference from what was reviewed is the reviewed-by's from the reviewers." will let me completely ignore the topic while it is being discussed (as long as I trust the judgment of these reviewers).  Otherwise I'd need to keep track of the progress of the discussion on each and every topic, which is often impossible.
Re: [PATCH v4 06/35] transport: use get_refs_via_connect to get refs
Brandon Williams writes:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
>
> Signed-off-by: Brandon Williams
> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

Nice ;-)
Re: [PATCH v4 05/35] upload-pack: factor out processing lines
Brandon Williams writes:

> Factor out the logic for processing shallow, deepen, deepen_since, and
> deepen_not lines into their own functions to simplify the
> 'receive_needs()' function in addition to making it easier to reuse some
> of this logic when implementing protocol_v2.

These little functions that still require their incoming data to begin with fixed prefixes feels a bit strange way to refactor the logic for later reuse (when I imagine "reuse", the first use case that comes to my mind is "this data source our new code reads from gives the same data as the old 'shallow' packet used to give, but in a different syntax"---so I'd restructure the code in such a way that the caller figures out the syntax part and the called helper just groks the "information contents" unwrapped from the surface syntax; the syntax may be different in the new codepath but once unwrapped, the "information contents" to be processed would not be different hence we can reuse the helper).

IOW, I would have expected the caller to be not like this:

> -		if (skip_prefix(line, "shallow ", &arg)) {
> -			struct object_id oid;
> -			struct object *object;
> -			if (get_oid_hex(arg, &oid))
> -				die("invalid shallow line: %s", line);
> -			object = parse_object(&oid);
> -			if (!object)
> -				continue;
> -			if (object->type != OBJ_COMMIT)
> -				die("invalid shallow object %s",
> -				    oid_to_hex(&oid));
> -			if (!(object->flags & CLIENT_SHALLOW)) {
> -				object->flags |= CLIENT_SHALLOW;
> -				add_object_array(object, NULL, &shallows);
> -			}
> +		if (process_shallow(line, &shallows))
> 			continue;
> +		if (process_deepen(line, &depth))
> 			continue;

... but more like

	if (skip_prefix(line, "shallow ", &arg)) {
		process_shallow(arg, &shallows);
		continue;
	}
	if (skip_prefix(line, "deepen ", &arg)) {
		process_deepen(arg, &depth);
		continue;
	}
	...

I need to defer the final judgment until I see how they are used, though.  It's not too big a deal either way---it just felt "not quite right" to me.
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Junio C Hamano writes:

> Brandon Williams writes:
>
>> One of the design goals of protocol-v2 is to improve the semantics of
>> flush packets. Currently in protocol-v1, flush packets are used both to
>> indicate a break in a list of packet lines as well as an indication that
>> one side has finished speaking. This makes it particularly difficult
>> to implement proxies as a proxy would need to completely understand git
>> protocol instead of simply looking for a flush packet.
>
> Good ;-)  Yes, this has been one of the largest gripes about the
> smart-http support code we have.

Hmph, strictly speaking, the "delim" does not have to be a part of how packetized stream is defined.  As long as we stop abusing flush as "This is merely an end of one segment of what I say." and make it always mean "I am done speaking, it is your turn.", the application payload can define its own syntax to separate groups of packets.

I do not mind having this "delim" thing defined at the protocol level too much, though.
Re: [PATCH v4 00/35] protocol version 2
Brandon Williams writes:

> I've tried to keep building on the same base that I started with when
> sending out a new version of series, mostly because I thought it was
> easier to see what was different between rounds.

Yes.  It indeed is easier to see the evolution if the series does not get rebased needlessly.

> I can, in the future, try to remember to put the commit it's based on.
> Do we have any sort of guidance about the best practice here?

I recall we taught a new "--base" option to "format-patch" not too long ago, so one way to do so may be:

    $ git format-patch --cover-letter --base=v2.16.0-rc0 master..bw/protocol-v2
    $ tail -4 -cover*.txt
    base-commit: 1eaabe34fc6f486367a176207420378f587d3b48
    --
    2.16.2-345-g7e31236f65

perhaps?
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Brandon Williams writes:

> One of the design goals of protocol-v2 is to improve the semantics of
> flush packets. Currently in protocol-v1, flush packets are used both to
> indicate a break in a list of packet lines as well as an indication that
> one side has finished speaking. This makes it particularly difficult
> to implement proxies as a proxy would need to completely understand git
> protocol instead of simply looking for a flush packet.

Good ;-)  Yes, this has been one of the largest gripes about the smart-http support code we have.
Re: [PATCH v4 02/35] pkt-line: allow peeking a packet line without consuming it
Brandon Williams writes:

> +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> +{
> +	if (reader->line_peeked) {
> +		reader->line_peeked = 0;
> +		return reader->status;
> +	}
> +
> +	reader->status = packet_read_with_status(reader->fd,
> +						 &reader->src_buffer,
> +						 &reader->src_len,
> +						 reader->buffer,
> +						 reader->buffer_size,
> +						 &reader->pktlen,
> +						 reader->options);
> +
> +	switch (reader->status) {
> +	case PACKET_READ_EOF:
> +		reader->pktlen = -1;
> +		reader->line = NULL;
> +		break;
> +	case PACKET_READ_NORMAL:
> +		reader->line = reader->buffer;
> +		break;
> +	case PACKET_READ_FLUSH:
> +		reader->pktlen = 0;
> +		reader->line = NULL;
> +		break;
> +	}
> +
> +	return reader->status;
> +}

With the way the _peek() interface interacts with the reader instance (which by the way I find is well designed), it is understandable that we want almost everything available in reader's fields, but having to manually clear the pktlen field upon non-NORMAL status feels a bit strange.  Perhaps that is because the underlying packet_read_with_status() does not set *pktlen in these cases?  Shouldn't it be doing that so the caller does not have to?

A similar comment applies to the reader's line field.  In principle, as the status field is part of a reader, it does not have to exist as a separate field, i.e.

	#define line_of(reader) \
		((reader).status == PACKET_READ_NORMAL ? \
		 (reader).buffer : NULL)

can be used as a substitute for it.  I guess it depends on how the actual callers want to use this interface.
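The peek/read interaction discussed above can be modeled with a toy reader over an in-memory list of lines.  All names here are illustrative, not the actual pkt-line API; the point is only that a peeked line is cached and handed out by the next read, so input is never consumed twice:

```c
#include <assert.h>
#include <stddef.h>

/* Toy reader: read consumes a line unless one was peeked first. */
struct toy_reader {
	const char **input;	/* NULL-terminated list of lines */
	int pos;
	const char *line;	/* most recently delivered line */
	int line_peeked;
};

static const char *toy_read(struct toy_reader *r)
{
	if (r->line_peeked) {
		r->line_peeked = 0;	/* hand out the cached line */
		return r->line;
	}
	r->line = r->input[r->pos];
	if (r->line)
		r->pos++;
	return r->line;
}

static const char *toy_peek(struct toy_reader *r)
{
	toy_read(r);		/* do a real read ... */
	r->line_peeked = 1;	/* ... but mark it as not yet consumed */
	return r->line;
}
```

Calling peek twice in a row returns the same line, and the following read returns it one final time before the stream advances.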
Re: [PATCH v2 0/5] roll back locks in various code paths
On 1 March 2018 at 20:24, Junio C Hamano wrote:
> Jeff King writes:
>
>> IMHO the result is easier to follow. Except for one case:
>>
>>> [...]
>>
>> where I think the logic just ends up repeating itself.
>
> Yup, this one I also had a bit of trouble with.

Thanks for your feedback, both of you.  It's much appreciated.

After thinking about it, I tend to agree.  That rewrite loses an indentation level and makes it a bit clearer that we have two steps, "maybe bail" and "write".  But at the cost of duplicating logic -- after all, those two steps are very closely related, so there's no need to artificially separate them.

Here it is again, without that hunk, and without the commit message claim that it'd be a good thing to have just a few uses of "active_cache_changed" remaining.  This could go as a patch 6/5 onto ma/roll-back-lockfiles.  I keep debating myself whether this would do better earlier in such a six-pack, where it would be "also, this releases a lock where we used to forget to".  Right now, there is overlap between patch 3/5 (which adds a line) and this patch (which removes it).  This patch doesn't entirely negate the need for patch 3/5 though.

Martin

-- >8 --
Subject: write_locked_index(): add flag to avoid writing unchanged index

We have several callers like

	if (active_cache_changed && write_locked_index(...))
		handle_error();
	rollback_lock_file(...);

where the final rollback is needed because "!active_cache_changed" shortcuts the if-expression.  There are also a few variants of this, including some if-else constructs that make it more clear when the explicit rollback is really needed.

Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and simplify the callers.  Leave the most complicated of the callers (in builtin/update-index.c) unchanged.  Rewriting it to use this new flag would end up duplicating logic.

We could have made the new flag behave the other way round ("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach.  We can still migrate existing callers to use our new flag.  Later we might even be able to flip the default, possibly without entirely ignoring the risk to in-flight or out-of-tree topics.

Suggested-by: Jeff King
Signed-off-by: Martin Ågren
---
 cache.h           |  4
 builtin/add.c     |  7 +++
 builtin/commit.c  | 10 +++---
 builtin/merge.c   | 15 ++-
 builtin/mv.c      |  4 ++--
 builtin/rm.c      |  7 +++
 merge-recursive.c |  5 ++---
 read-cache.c      |  6 ++
 rerere.c          |  8 +++-
 sequencer.c       | 11 +--
 10 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index d8b975a571..905d8eb6cc 100644
--- a/cache.h
+++ b/cache.h
@@ -622,6 +622,7 @@ extern int read_index_unmerged(struct index_state *);
 
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
+#define SKIP_IF_UNCHANGED	(1 << 1)
 
 /*
  * Write the index while holding an already-taken lock. Close the lock,
@@ -638,6 +639,9 @@ extern int read_index_unmerged(struct index_state *);
  * With `COMMIT_LOCK`, the lock is always committed or rolled back.
  * Without it, the lock is closed, but neither committed nor rolled
  * back.
+ *
+ * If `SKIP_IF_UNCHANGED` is given and the index is unchanged, nothing
+ * is written (and the lock is rolled back if `COMMIT_LOCK` is given).
  */
 extern int write_locked_index(struct index_state *, struct lock_file *lock,
			       unsigned flags);

diff --git a/builtin/add.c b/builtin/add.c
index bf01d89e28..9e5a80cc6f 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -534,10 +534,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	unplug_bulk_checkin();
 
 finish:
-	if (active_cache_changed) {
-		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
-			die(_("Unable to write new index file"));
-	}
+	if (write_locked_index(&the_index, &lock_file,
+			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
+		die(_("Unable to write new index file"));
 
 	UNLEAK(pathspec);
 	UNLEAK(dir);

diff --git a/builtin/commit.c b/builtin/commit.c
index 8a87701414..8d9b4081c3 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -423,13 +423,9 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 	if (active_cache_changed ||
 	    !cache_tree_fully_valid(active_cache_tree))
 		update_main_cache_tree(WRITE_TREE_SILENT);
-	if (active_cache_changed) {
-		if (write_locked_index(&the_index, &index_lock,
-				       COMMIT_LOCK))
-			die(_("unable to write new_index file"));
-	} else {
-		rollback_lock_file(&index_lock);
-
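The intended semantics of the flag in the patch above can be condensed into a small model.  The enum and function below are illustrative only (not git's actual implementation), showing when a write happens versus a rollback:

```c
#include <assert.h>

/* Illustrative constants mirroring the proposed patch. */
#define COMMIT_LOCK		(1 << 0)
#define SKIP_IF_UNCHANGED	(1 << 1)

enum outcome { WROTE, ROLLED_BACK, CLOSED };

/*
 * With SKIP_IF_UNCHANGED and a clean index, nothing is written;
 * under COMMIT_LOCK the lock is rolled back, otherwise just closed.
 * In every other case the index is written.
 */
static enum outcome write_index_model(int index_changed, unsigned flags)
{
	if ((flags & SKIP_IF_UNCHANGED) && !index_changed)
		return (flags & COMMIT_LOCK) ? ROLLED_BACK : CLOSED;
	return WROTE;
}
```

This captures why the callers lose their explicit `rollback_lock_file()` call: the unchanged-index case is handled inside the one function.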
Re: [PATCH v4 9/9] add -p: don't rely on apply's '--recount' option
Phillip Wood writes:

> From: Phillip Wood
>
> Now that add -p counts patches properly it should be possible to turn
> off the '--recount' option when invoking 'git apply'

Sounds good ;-)
Re: [PATCH v4 8/9] add -p: fix counting when splitting and coalescing
Phillip Wood writes:

> @@ -887,8 +892,8 @@ sub merge_hunk {
> 	$o_cnt = $n_cnt = 0;
> 	for ($i = 1; $i < @{$prev->{TEXT}}; $i++) {
> 		my $line = $prev->{TEXT}[$i];
> -		if ($line =~ /^\+/) {
> -			$n_cnt++;
> +		if ($line =~ /^[+\\]/) {
> +			$n_cnt++ if ($line =~ /^\+/);
> 			push @line, $line;
> 			next;
> 		}

Hmm, the logic may be correct, but this looks like a result of attempting to minimize the number of changed lines and ending up with a less-than-readable code.  "If the line begins with a plus or backslash, do these things, the first of which is done only when the line begins with a plus."

The same comment for the other hunk that counts the $this side.
Re: Worktree silently deleted on git fetch/gc/log
On Thu, Mar 1, 2018 at 2:24 PM, Дилян Палаузов wrote:
> /git/A/.git/worktrees/b/ is missing - that is the point.
> /git/B/.git wasn't modified since the worktree was created, cat:
> gitdir: /git/A/.git/worktrees/b

I'll assume that the lowercase 'b' was a typo in your email (as it was in mine), and that the real content is "gitdir: /git/A/.git/worktrees/B".

You should be able to get back to a working state like this:

    % cd /git/A/
    % mkdir -p .git/worktrees/B
    % echo 'ref: refs/heads/g' >.git/worktrees/B/HEAD
    % echo ../.. >.git/worktrees/B/commondir
    % echo /git/B/.git >.git/worktrees/B/gitdir
    % cd /git/B/
    % git reset

As far as I know, there isn't any code in Git which would automatically remove the .git/worktrees/B directory, so it's not clear how that happened.
Re: [PATCH v4 7/9] add -p: calculate offset delta for edited patches
Phillip Wood writes:

> From: Phillip Wood
>
> Recount the number of preimage and postimage lines in a hunk after it
> has been edited so any change in the number of insertions or deletions
> can be used to adjust the offsets of subsequent hunks. If an edited
> hunk is subsequently split then the offset correction will be lost. It
> would be possible to fix this if it is a problem, however the code
> here is still an improvement on the status quo for the common case
> where an edited hunk is applied without being split.
>
> This is also a necessary step to removing '--recount' and
> '--allow-overlap' from the invocation of 'git apply'. Before
> '--recount' can be removed the splitting and coalescing counting needs
> to be fixed to handle a missing newline at the end of a file. In order
> to remove '--allow-overlap' there needs to be i) some way of verifying
> the offset data in the edited hunk (probably by correlating the
> preimage (or postimage if the patch is going to be applied in reverse)
> lines of the edited and unedited versions to see if they are offset or
> if any leading/trailing context lines have been removed) and ii) a way
> of dealing with edited hunks that change context lines that are shared
> with neighbouring hunks.
>
> Signed-off-by: Phillip Wood
> ---

Thanks for the clear description of what is going on in the series.

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 7a0a5896bb..0df0c2aa06 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -938,13 +938,19 @@ sub coalesce_overlapping_hunks {
> 			parse_hunk_header($text->[0]);
> 		unless ($_->{USE}) {
> 			$ofs_delta += $o_cnt - $n_cnt;
> +			# If this hunk has been edited then subtract
> +			# the delta that is due to the edit.
> +			$_->{OFS_DELTA} and $ofs_delta -= $_->{OFS_DELTA};

The pattern

	<condition> and <expression>;

is something you are newly introducing to this script.  I am not sure if we want to see them.
I somehow find them harder to read than the more straight-forward and naïve

	if (<condition>) {
		<expression>;
	}

> +			# If this hunk was edited then adjust the offset delta
> +			# to reflect the edit.
> +			$_->{OFS_DELTA} and $ofs_delta += $_->{OFS_DELTA};

Likewise.

> +sub recount_edited_hunk {
> +	local $_;
> +	my ($oldtext, $newtext) = @_;
> +	my ($o_cnt, $n_cnt) = (0, 0);
> +	for (@{$newtext}[1..$#{$newtext}]) {
> +		my $mode = substr($_, 0, 1);
> +		if ($mode eq '-') {
> +			$o_cnt++;
> +		} elsif ($mode eq '+') {
> +			$n_cnt++;
> +		} elsif ($mode eq ' ') {
> +			$o_cnt++;
> +			$n_cnt++;
> +		}
> +	}
> +	my ($o_ofs, undef, $n_ofs, undef) =
> +		parse_hunk_header($newtext->[0]);
> +	$newtext->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt);
> +	my (undef, $orig_o_cnt, undef, $orig_n_cnt) =
> +		parse_hunk_header($oldtext->[0]);
> +	# Return the change in the number of lines inserted by this hunk
> +	return $orig_o_cnt - $orig_n_cnt - $o_cnt + $n_cnt;
> +}

OK.

> @@ -1114,25 +1144,32 @@ sub prompt_yesno {
> 	}
>
> sub edit_hunk_loop {
> -	my ($head, $hunk, $ix) = @_;
> -	my $text = $hunk->[$ix]->{TEXT};
> +	my ($head, $hunks, $ix) = @_;
> +	my $hunk = $hunks->[$ix];
> +	my $text = $hunk->{TEXT};
> ...
> +		$newhunk->{OFS_DELTA} = recount_edited_hunk($text, $newtext);
> +		# If this hunk has already been edited then add the
> +		# offset delta of the previous edit to get the real
> +		# delta from the original unedited hunk.
> +		$hunk->{OFS_DELTA} and
> +			$newhunk->{OFS_DELTA} += $hunk->{OFS_DELTA};

Ahh, good point.
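The recount performed by the Perl code in the patch above (tallying preimage and postimage lines from the first character of each hunk body line) can be sketched in C.  This is an illustrative model, not git code; note that '\' lines such as "\ No newline at end of file" deliberately count for neither side, which is the very case the series fixes:

```c
#include <assert.h>

/* Count old/new side lines of a hunk body (the hunk header excluded). */
static void recount_hunk(const char **body, int n, int *o_cnt, int *n_cnt)
{
	int i;

	*o_cnt = *n_cnt = 0;
	for (i = 0; i < n; i++) {
		switch (body[i][0]) {
		case '-': (*o_cnt)++; break;		/* preimage only */
		case '+': (*n_cnt)++; break;		/* postimage only */
		case ' ': (*o_cnt)++; (*n_cnt)++; break;	/* context */
		/* '\' (no-newline marker) counts for neither side */
		}
	}
}
```

The difference between these counts before and after editing is exactly the offset delta that gets propagated to later hunks.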
Re: Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order) 2
I just wanted to follow up -- what do you think?

On Thu, Feb 22, 2018 at 9:21 PM, Josh Tepper wrote:
> My use case is that when combined with --graph, the ordering is
> necessary to render a reasonable looking commit graph; by placing the
> commits at the end, each boundary commit essentially produces another
> vertical line all the way to the bottom where the commits reside.
> More specifically, the case where I noticed this was:
>
>     git log --boundary --graph ^master feature
>
> which will have one vertical line going all the way to the bottom for
> each merge from master into feature.
>
> Another issue that I've noticed is that if one is using
> --show-linear-break (instead of --graph), the linear breaks are
> missing between the boundary commits and the other commits.
>
> Regarding the documentation, while it doesn't explicitly say where the
> boundary commits will be ordered with the other commits (nor does it
> say that they'll have accurate linear breaks), the documentation for
> the order flags (and for the linear break flag) makes no mention that
> certain commits are excluded. The implicit understanding of the
> documentation for --date-order is that it will apply to all of the
> commits. Certainly, if you add the flag --full-history, one expects
> the ordering to apply to the additionally traversed commits. I
> suppose boundary commits are a little different since they're not
> explicitly part of the range -- nonetheless I expected them to be
> treated similarly.
>
> Maybe this could at least be documented?
>
> Best,
> ~Josh
>
> On Thu, Feb 22, 2018 at 2:29 PM, Derrick Stolee wrote:
>> On 2/21/2018 6:57 PM, Josh Tepper wrote:
>>>
>>> When using git log, boundary commits (ie, those commits added by
>>> specifying --boundary) do not respect the order (e.g., --date-order,
>>> --topo-order).
>>> Consider the following commit history, where number
>>> indicates the order of the commit timestamps:
>>>
>>>     0---1---2---5  <--A
>>>          \       \
>>>           3---4---6  <--B
>>>
>>> Executing the following command:
>>>
>>>     $ git log --boundary --date-order ^A B
>>>
>>> Should produce the following order (boundary commits shown with dashes):
>>> 6 -5 4 3 -1
>>>
>>> However, it in fact produces:
>>> 6 4 3 -5 -1
>>>
>>> Please advise.
>>
>> Hi Josh,
>>
>> Looking at the docs [1], I don't see any specifics on how the boundary
>> commits should be ordered.
>>
>> Clearly, the implementation specifies that the boundary is written after all
>> other commits. For a full discussion of this, see the commit message for
>> 86ab4906a7c "revision walker: Fix --boundary when limited". Here is an
>> excerpt:
>>
>>  - After get_revision() finishes giving out all the positive
>>    commits, if we are doing the boundary processing, we look at
>>    the parents that we marked as potential boundaries earlier,
>>    see if they are really boundaries, and give them out.
>>
>> The boundary commits are correctly sorted by topo-order among themselves as
>> of commit 4603ec0f960 "get_revision(): honor the topo_order flag for
>> boundary commits".
>>
>> So, I'm not sure that this is a bug (it is working "as designed") but it
>> certainly is non-obvious behavior.
>>
>> In what use case do you need these boundary commits to appear earlier?
>>
>> Thanks,
>> -Stolee
>>
>> [1] https://git-scm.com/docs/git-log
Re: Worktree silently deleted on git fetch/gc/log
Hello,

/git/A/.git/worktrees/b/ is missing - that is the point.
/git/B/.git wasn't modified since the worktree was created, cat:
gitdir: /git/A/.git/worktrees/b

Regards
Дилян

On 03/01/18 19:09, Eric Sunshine wrote:
> On Wed, Feb 28, 2018 at 7:44 AM, Дилян Палаузов wrote:
>> A (branch master) and B (branch g) which is a worktree of the first.
>>
>> /git/B g>$ git fetch
>> [...]
>> From https://...
>>    13e4c55a0..02655d5fb  g      -> origin/g
>>    c37a3ca25..bc7888511  master -> origin/master
>> Auto packing the repository in background for optimum performance.
>> See "git help gc" for manual housekeeping.
>>
>> /git/B g<>$ git log -p origin/g
>> fatal: Not a git repository: /git/A/.git/worktrees/B
>> /git/B$
>>
>> Please note that on the second last prompt there is <>, so that
>> git-prompt has found the necessary information and this was later
>> deleted - by 'gc' or 'log'.
>>
>> What would be the procedure to restore the /git/A/.git/worktrees/B
>> structure?
>
> Can you show us (via 'cat') the content of the following files?
>
>     /git/B/.git
>     /git/A/.git/worktrees/b/HEAD
>     /git/A/.git/worktrees/b/commondir
>     /git/A/.git/worktrees/b/gitdir
Re: [PATCH v2 0/5] roll back locks in various code paths
Jeff King writes:

> IMHO the result is easier to follow. Except for one case:
>
>> -	if (active_cache_changed || force_write) {
>> -		if (newfd < 0) {
>> -			if (refresh_args.flags & REFRESH_QUIET)
>> -				exit(128);
>> -			unable_to_lock_die(get_index_file(), lock_error);
>> -		}
>> -		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>> -			die("Unable to write new index file");
>> +	if (newfd < 0 && (active_cache_changed || force_write)) {
>> +		if (refresh_args.flags & REFRESH_QUIET)
>> +			exit(128);
>> +		unable_to_lock_die(get_index_file(), lock_error);
>> 	}
>>
>> -	rollback_lock_file(&lock_file);
>> +	if (write_locked_index(&the_index, &lock_file,
>> +			       COMMIT_LOCK | (force_write ? 0 : SKIP_IF_UNCHANGED)))
>> +		die("Unable to write new index file");
>
> where I think the logic just ends up repeating itself.

Yup, this one I also had a bit of trouble with.
Re: [PATCH v4 00/35] protocol version 2
On 03/01, Junio C Hamano wrote:
> Brandon Williams writes:
>
>> Lots of changes since v3 (well more than between v2 and v3). Thanks for
>> all of the reviews on the last round, the series is getting more
>> polished.
>>
>>  * Eliminated the "# service" line from the response from an HTTP
>>    server. This means that the response to a v2 request is exactly the
>>    same regardless of which transport you use! Docs for this have been
>>    added as well.
>>  * Changed how ref-patterns work with the `ls-refs` command. Instead of
>>    using wildmatch all patterns must either match exactly or they can
>>    contain a single '*' character at the end to mean that the prefix
>>    must match. Docs for this have also been added.
>>  * Lots of updates to the docs. Including documenting the
>>    `stateless-connect` remote-helper command used by remote-curl to
>>    handle the http transport.
>>  * Fixed a number of bugs with the `fetch` command, one of which didn't
>>    use objects from configured alternates.
>
> I noticed that this round is built on top of v2.16.0-rc0. It
> certainly makes it easier to compare against the previous round
> which was built on top of that old commit and it is very much
> appreciated that a reroll does not involve pointless rebases.
>
> For those who are helping from the sidelines, it may be helpful to
> mention where in the history this was developed on, though, as
> applying these on the current 'master' has a handful of small
> conflicts.
>
> Thanks, will replace and will comment on individual patches as
> needed.

I've tried to keep building on the same base that I started with when sending out a new version of the series, mostly because I thought it was easier to see what was different between rounds.  I can, in the future, try to remember to put the commit it's based on.  Do we have any sort of guidance about the best practice here?

--
Brandon Williams
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Leah Neukirchen writes:

> You are right in calling this out as an emerging new thing, but the
> second list of that page proves that it will be useful to settle on a
> common configuration, and my hope is that by getting a few popular
> projects on board, others will soon follow. It certainly is easy to
> implement, and rather unintrusive. Users who don't know about this
> feature are completely unaffected.

There certainly is a chicken-and-egg problem there.  Even though I personally prefer not to see overuse of colors, I am not sure if we the Git community as a whole would want to be involved until it gets mainstream.

>>>	if (color_stdout_is_tty < 0)
>>>		color_stdout_is_tty = isatty(1);
>>>	if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) {
>>
>> According to no-color.org's FAQ #2, NO_COLOR should affect only the
>> "default" behaviour, and should stay back if there is an explicit
>> end-user configuration (or command line override).  And this helper
>> function is called only from want_color() when there is no such
>> higher precedence setting, which is in line with the recommendation.
>>
>> Which is good.
>
> Yes, I took care of that. Should this also be tested? It doesn't
> quite fit into the setting of t4026-color.sh I think.

It probably fits much better in t7006, I would suspect.  Earlier, setting color.ui to auto meant the output is colored when run under test_terminal, but with this new environment set, the output will have to be bland.
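The precedence described above (NO_COLOR affects only the default, never an explicit user setting) can be sketched as a tiny model.  The function and the -1/0/1 encoding of "unset / explicitly off / explicitly on" are assumptions for illustration, not git's actual want_color() implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* explicit_cfg: 1 = user forced color on, 0 = forced off, -1 = unset. */
static int want_color_model(int explicit_cfg, int is_tty)
{
	if (explicit_cfg >= 0)
		return explicit_cfg;	/* explicit config wins over NO_COLOR */
	if (getenv("NO_COLOR"))
		return 0;		/* only the *default* is affected */
	return is_tty;			/* default: color when on a terminal */
}
```

A test along these lines (set NO_COLOR, expect bland output unless color was explicitly requested) is exactly the kind of check that would fit the test script mentioned above.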
Re: The case for two trees in a commit ("How to make rebase less modal")
Hi,

Stefan Beller wrote:

> $ git hash-object --stdin -w -t commit <<EOF
> tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller 1519859216 -0800
> committer Stefan Beller 1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
>
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck | grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

As others mentioned, this is essentially a fancy way to experiment with adding a new header (with the same name as an existing header) to a commit.  It is kind of a scary thing to do because anyone trying to parse commits, including old versions of git, is likely to get confused by the multiple trees.

It doesn't affect the reachability calculation in the way that it should, so this ends up being something that should be straightforward to do with a message in the commit body instead.  To affect reachability, you could use multiple parent lines instead.  You'd need synthetic commits to hang the trees on.  This is similar to how "git stash" stores the index state.

In other words, I think what you are trying to do is feasible, but not in the exact way you described.

[...]
> * porcelain to modify the repo "at larger scale", such as rebase,
>   cherrypicking, reverting involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in their nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later. Deferring a
> conflict is currently impossible, because precisely one tree is recorded.
Junio mentions you'd want to record: - stages of the index, to re-create a conflicted index - working tree files, with conflict markers In addition you may also want to record: - state (todo list) from .git/rebase-merge, to allow picking up where you left off in such a larger operation - similar state for other commands --- e.g. MERGE_MSG Recording this work-in-progress state is in the spirit of what "git stash" does. People also sometimes like to record their state in progress with a "wip commit" at the tip of a branch. Both of those workflows would benefit from something like this, I'd think. So I kind of like this. Maybe a "git save-wip" command that is like "git stash" but records state to the current branch? And similarly improving "git stash" to record the same richer state. And in the spirit of "git stash" I think it is possible without even modifying the commit object format. [...] > I'd be advocating for having multiple trees in a commit > possible locally; it might be a bad idea to publish such trees. I think such "WIP state" may also be useful for publishing, to allow collaborating on a thorny rebase or merge. Thanks and hope that helps, Jonathan
Re: The case for two trees in a commit ("How to make rebase less modal")
Jacob Keller writes: > How does this let you defer a conflict? A future commit which modified > blobs in that tree wouldn't know what version of the trees/blobs to > actually use? Clearly future commits could record their own trees, but > how would they generate the "correct" tree? > > Maybe I am missing something here? If you write four trees, one out of each stage in the index, and record them, you could in theory have a new command that reads them and recreates the conflicted index. Oh, and then you would need the fifth tree that records what the working-tree files (with conflict markers) looked like, in order to reproduce the state seen by the person who ran "git merge", attempted to resolve and gave up halfway in the middle. As a local operation, extending "git stash" somehow so that it can stash even in a conflicted working tree may be a better approach, and it does not need cruft headers in commit objects, I would think.
Re: The case for two trees in a commit ("How to make rebase less modal")
Stefan Beller writes: > $ git hash-object --stdin -w -t commit <<EOF > tree c70b4a33a0089f15eb3b38092832388d75293e86 > parent 105d5b91138ced892765a84e771a061ede8d63b8 > author Stefan Beller 1519859216 -0800 > committer Stefan Beller 1519859216 -0800 > tree 5495266479afc9a4bd9560e9feac465ed43fa63a > test commit > EOF > 19abfc3bf1c5d782045acf23abdf7eed81e16669 > $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669 > $ > > So it is technically possible to create a commit with two tree entries > and fsck is not complaining. The second one is merely a random unauthorized header that is not interpreted in any way by Git. It is merely being confusing by starting with "tree " and having 40-hex after it, but the 40-hex does not get interpreted as an object name, and does not participate in reachability computation (i.e. packing, pruning and fsck). There is not much difference between that and a line of trailer in the commit log message (other than this one is less discoverable).
Re: [PATCH v4 00/35] protocol version 2
Brandon Williams writes: > Lots of changes since v3 (well more than between v2 and v3). Thanks for > all of the reviews on the last round, the series is getting more > polished. > > * Eliminated the "# service" line from the response from an HTTP > server. This means that the response to a v2 request is exactly the > same regardless of which transport you use! Docs for this have been > added as well. > * Changed how ref-patterns work with the `ls-refs` command. Instead of > using wildmatch all patterns must either match exactly or they can > contain a single '*' character at the end to mean that the prefix > must match. Docs for this have also been added. > * Lots of updates to the docs. Including documenting the > `stateless-connect` remote-helper command used by remote-curl to > handle the http transport. > * Fixed a number of bugs with the `fetch` command, one of which didn't > use objects from configured alternates. I noticed that this round is built on top of v2.16.0-rc0. It certainly makes it easier to compare against the previous round which was built on top of that old commit and it is very much appreciated that a reroll does not involve pointless rebases. For those who are helping from the sidelines, it may be helpful to mention where in the history this was developed, though, as applying these on the current 'master' has a handful of small conflicts. Thanks, will replace and will comment on individual patches as needed.
Re: [PATCH] git-gui: bind CTRL/CMD+numpad ENTER to do_commit
On Thu, Mar 1, 2018 at 9:39 AM, Birger Skogeng Pedersen wrote: > --- Please sign-off on your patch. See Documentation/SubmittingPatches. Also, it would be helpful to write at least a short commit message justifying the change. The reason you gave in your lead-in email[1] might be sufficient: In git-gui, we can hit CTRL/CMD+ENTER to create a commit. However, using the numpad ENTER does not invoke the same command. (assuming people don't argue that numpad ENTER should be saved for some other function). Thanks. [1]: https://public-inbox.org/git/CAGr--=lxmtz5rrp4742u3vsradrsware2sitcsowatyson2...@mail.gmail.com/ > diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh > index 91c00e648..6de74ce63 100755 > --- a/git-gui/git-gui.sh > +++ b/git-gui/git-gui.sh > @@ -3867,6 +3867,7 @@ bind . <$M1B-Key-equal> {show_more_context;break} > bind . <$M1B-Key-plus> {show_more_context;break} > bind . <$M1B-Key-KP_Add> {show_more_context;break} > bind . <$M1B-Key-Return> do_commit > +bind . <$M1B-Key-KP_Enter> do_commit > foreach i [list $ui_index $ui_workdir] { > bind $i <Button-1> { toggle_or_diff click %W %x %y; break } > bind $i <$M1B-Button-1> { add_one_to_selection %W %x %y; break }
Re: [GSoC][PATCH v2] userdiff: add built-in pattern for golang
On Thu, Mar 1, 2018 at 6:19 AM, Alban Gruin wrote: > This adds xfuncname and word_regex patterns for golang, a quite > popular programming language. It also includes test cases for the > xfuncname regex (t4018) and updated documentation. > > The xfuncname regex finds functions, structs and interfaces. Although > the Go language prohibits the opening brace from being on its own > line, the regex does not makes it mandatory, to be able to match > `func` statements like this: > > func foo(bar int, > baz int) { > } > > This is covered by the test case t4018/golang-long-func. A possible suggested rewrite to make it flow a bit better and to mention the loose whitespace matching: The xfuncname regex finds functions, structs and interfaces. Although the Go language prohibits the opening brace of a 'func' from being on its own line, the regex makes the brace optional so it can match function declarations wrapped over multiple lines (covered by new test case t4018/golang-long-func): func foo(bar int, baz int) { } Whitespace matching is also a bit lax in order to handle non-standard formatting of method declarations. For instance: func(x *X) foo() { versus typical 'gofmt' formatted: func (x *x) foo() { (Not necessarily worth a re-roll; perhaps Junio can pick it up when queueing if he considers it an improvement.) Thanks. > The word_regex pattern finds identifiers, integers, floats, complex > numbers and operators, according to the go specification. > > Signed-off-by: Alban Gruin
Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
Nguyễn Thái Ngọc Duy writes: > pack-objects could be a big memory hog especially on large repos, > everybody knows that. The suggestion to stick a .keep file on the > largest pack to avoid this problem is also known for a long time. Yup, but note that it is not "largest" per se. The thing being large is a mere consequence that it is the base pack that holds the bulk of older parts of the history (e.g. the one that you obtained via the initial clone). > Let's do the suggestion automatically instead of waiting for people to > come to Git mailing list and get the advice. When a certain condition > is met, gc --auto create a .keep file temporary before repack is run, > then remove it afterward. > > gc --auto does this based on an estimation of pack-objects memory > usage and whether that fits in one third of system memory (the > assumption here is for desktop environment where there are many other > applications running). > > Since the estimation may be inaccurate and that 1/3 threshold is arbitrary, give the user a finer control over this mechanism as well: > if the largest pack is larger than gc.bigPackThreshold, it's kept. If this is a transient mechanism during a single gc session, it would be far more preferable if we can find a way to do this without actually having a .keep file on the filesystem.
Re: Worktree silently deleted on git fetch/gc/log
On Wed, Feb 28, 2018 at 7:44 AM, Дилян Палаузов wrote: > A (branch master) and > B (branch g) which is a worktree of the first. > > /git/B g>$ git fetch > [...] > From https://... > 13e4c55a0..02655d5fb g -> origin/g > c37a3ca25..bc7888511 master -> origin/master > Auto packing the repository in background for optimum performance. > See "git help gc" for manual housekeeping. > /git/B g<>$ git log -p origin/g > fatal: Not a git repository: /git/A/.git/worktrees/B > /git/B$ > > Please note that on the second last prompt there is <>, so that git-prompt > has found the necessary information and this was later deleted - by 'gc' or 'log'. > > What would be the procedure to restore the /git/A/.git/worktrees/B > structure? Can you show us (via 'cat') the content of the following files? /git/B/.git /git/A/.git/worktrees/b/HEAD /git/A/.git/worktrees/b/commondir /git/A/.git/worktrees/b/gitdir
Re: [PATCH 09/11] pack-objects: refer to delta objects by index instead of pointer
Nguyễn Thái Ngọc Duy writes: > Notice that packing_data::nr_objects is uint32_t, we could only handle > maximum 4G objects and can address all of them with an uint32_t. If we > use a pointer here, we waste 4 bytes on 64 bit architecture. > > Convert these delta pointers to indexes. Since we need to handle NULL > pointers as well, the index is shifted by one [1]. Makes perfect sense. I do not think losing 1 slot out of possible 4G is a regression, unlike the 256 packfile limit 07/11 imposes.
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
Nguyễn Thái Ngọc Duy writes: > Instead of using 8 bytes (on 64 bit arch) to store a pointer to a > pack. Use an index isntead since the number of packs should be > relatively small. > > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. I can tell without looking at the rest of the thread that people had reasonable objection against this stance ;-) This will not fly well.
Re: [PATCH 04/11] pack-objects: use bitfield for object_entry::depth
Nguyễn Thái Ngọc Duy writes: > This does not give us any saving due to padding. But we will be able > to save once we cut 4 bytes out of this struct in a subsequent patch. > > Because of struct packing from now on we can only handle max depth > 4095 (or even lower when new booleans are added in this struct). This > should be ok since long delta chain will cause significant slow down > anyway. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- This should be marked as RFC/PATCH; I do not have objection to limiting the delta depth to some reasonable length (rather than the current 1<<31 or worse 1<<63), and 4k may be such a reasonable limit (I'd actually think anything more than a few hundreds is probably a bad idea), but it needs to be documented. > builtin/pack-objects.c | 4 > pack-objects.h | 6 ++ > 2 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index a4dbb40824..cfd97da7db 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, > const char *prefix) > if (pack_to_stdout != !base_name || argc) > usage_with_options(pack_usage, pack_objects_options); > > + if (depth > (1 << OE_DEPTH_BITS)) > + die(_("delta chain depth %d is greater than maximum limit %d"), > + depth, (1 << OE_DEPTH_BITS)); > + > argv_array_push(&rp, "pack-objects"); > if (thin) { > use_internal_rev_list = 1; > diff --git a/pack-objects.h b/pack-objects.h > index fca334ab4d..3941e6c9a6 100644 > --- a/pack-objects.h > +++ b/pack-objects.h > @@ -2,6 +2,7 @@ > #define PACK_OBJECTS_H > > #define OE_DFS_STATE_BITS 2 > +#define OE_DEPTH_BITS 12 > > /* > * State flags for depth-first search used for analyzing delta cycles.
> @@ -43,10 +44,7 @@ struct object_entry { > unsigned tagged:1; /* near the very tip of refs */ > unsigned filled:1; /* assigned write-order */ > unsigned dfs_state:OE_DFS_STATE_BITS; > - > - /* XXX 20 bits hole, try to pack */ > - > - int depth; > + unsigned depth:OE_DEPTH_BITS; > > /* size: 120, padding: 4 */ > };
[RFC] Contributing to Git (on Windows)
We (Git devs at Microsoft) have had several people start contributing to Git over the past few years (I'm the most-recent addition). As we on-boarded to Git development on our Windows machines, we collected our setup steps on an internal wiki page. Now, we'd like to make that document publicly available. These steps are focused on a Windows user, so we propose putting them in the git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open for feedback [1]. I'll read comments on that PR or in this thread. If you've ever done Git development on a Windows machine, I would love to hear your feedback on this document. Any other advice you have is greatly appreciated. For anyone interested, there is also a discussion about submitting patches upstream. The document links to Documentation/CodingGuidelines and Documentation/SubmittingPatches, but tries to elaborate with additional advice. Thanks, -Stolee [1] https://github.com/git-for-windows/git/pull/1529
Re: Obsolete instruction in SubmittingPatches?
On 01.03.2018 0:54, Junio C Hamano wrote: > Andrei Rybak writes: > >> Is this part of guidelines obsolete, or am I not understanding this >> correctly? > > I am merely being nice (but only on a "time-permitting" basis). > Does that mean that the integration of a series is easier when there is a re-send?
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Junio C Hamano writes: > Leah Neukirchen writes: > >> NO_COLOR (http://no-color.org/) is a comprehensive approach to disable >> colors by default for all tools: > > The list of software that supports that "convention" is, eh, > respectable. Is it really a "convention" yet, or yet another thing > the user needs to worry about? You are right in calling this out an emerging new thing, but the second list of that page proves that it will be useful to settle on a common configuration, and my hope is that by getting a few popular projects on board, others will soon follow. It certainly is easy to implement, and rather unintrusive. Users who don't know about this feature are completely unaffected. >>> if (color_stdout_is_tty < 0) >>> color_stdout_is_tty = isatty(1); >>> if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { >> >> According to no-color.org's FAQ #2, NO_COLOR should affect only the >> "default" behaviour, and should stay back if there is an explicit >> end-user configuration (or command line override). And this helper >> function is called only from want_color() when there is no such >> higher precedence setting, which is in line with the recommendation. >> >> Which is good. Yes, I took care of that. Should this also be tested? It doesn't quite fit into the setting of t4026-color.sh I think. Thanks, -- Leah Neukirchen http://leah.zone
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Leah Neukirchen writes: > NO_COLOR (http://no-color.org/) is a comprehensive approach to disable > colors by default for all tools: The list of software that supports that "convention" is, eh, respectable. Is it really a "convention" yet, or yet another thing the user needs to worry about? > diff --git a/color.c b/color.c > index d48dd947c..59e9c2459 100644 > --- a/color.c > +++ b/color.c > @@ -326,6 +326,8 @@ int git_config_colorbool(const char *var, const char > *value) > > static int check_auto_color(void) > { > + if (getenv("NO_COLOR")) > + return 0; Our convention often calls for CONFIG_VAR=false to mean "I do not want to see what CONFIG_VAR wants to do done", i.e. NO_COLOR=false git show would show colored output if there are no other settings. But this code contradicts the convention, deliberately because that is what no-color.org wants. Makes me wonder if that convention is worth following in the first place. > if (color_stdout_is_tty < 0) > color_stdout_is_tty = isatty(1); > if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { According to no-color.org's FAQ #2, NO_COLOR should affect only the "default" behaviour, and should stay back if there is an explicit end-user configuration (or command line override). And this helper function is called only from want_color() when there is no such higher precedence setting, which is in line with the recommendation. Which is good.
[RFC PATCH] color: respect the $NO_COLOR convention
When the NO_COLOR environment variable is set to any value, default to disabling color, i.e. resolve 'auto' to false. NO_COLOR (http://no-color.org/) is a comprehensive approach to disable colors by default for all tools: > All command-line software which outputs text with ANSI color added > should check for the presence of a NO_COLOR environment variable that, > when present (regardless of its value), prevents the addition of ANSI > color. Signed-off-by: Leah Neukirchen --- This is a first stab at implementing NO_COLOR for git, effectively making it behave as it did before colors were enabled by default. I feel this should be documented somewhere, but I'm not sure where the best place is. Perhaps in config.ui, or the Git environment variables (but they all start with GIT_). color.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/color.c b/color.c index d48dd947c..59e9c2459 100644 --- a/color.c +++ b/color.c @@ -326,6 +326,8 @@ int git_config_colorbool(const char *var, const char *value) static int check_auto_color(void) { + if (getenv("NO_COLOR")) + return 0; if (color_stdout_is_tty < 0) color_stdout_is_tty = isatty(1); if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { -- 2.16.2
Re: [PATCH v3 2/9] t3701: indent here documents
Phillip Wood writes: > Thanks for the tips, tbdiff looks useful (I just need to learn to read > diffs of diffs!). I also find rebasing them on a common ancestor useful > but it's a bit tedious. Yes, comparing two versions of a series is somewhat unusual and needs getting used to before one can do so efficiently. You do not have to always rebase (tbdiff is fairly good at what it does even when two ranges are far apart), but it helps not to begin developing on a codebase that is newer than needed (e.g. a bugfix on 'next' is unneeded unless you are fixing a bug introduced by a topic that is still on 'next'---in which case the best fix is one that is on that topic, without depending on the rest of 'next'). In any case, thank _you_ for contributing.
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On 1 March 2018 at 16:08, Jeff King wrote: > On Thu, Mar 01, 2018 at 09:28:31AM -0500, Randall S. Becker wrote: > >> > It's not clear to me though if we just want to tweak the programs run in >> > the >> > test scripts in order to get test_must_fail to stop complaining, or if we >> > consider the unusual exit codes from our perl-based Git programs to be an >> > error that should be fixed for real use, too. >> >> I'm living unusual exit code IRL all the time. So "fixed for real", is >> what I'm looking for. So if we were to do that, where is the best >> place to insert a fix - my original question - that would be permanent >> in the main git test code. Or perhaps this needs to be in the main >> code itself. > > If it's fixed in the real world, then it needs to be in the main code > itself. It looks like git-svn already does this to some degree itself > (most of the work happens in an eval, and it calls the "fatal" function > if that throws an exception via 'die'). > > So I think git-send-email.perl (and maybe others) needs to learn the > same trick (by pushing the main bits of the script into an eval). Or it > needs to include the SIG{__DIE__} trickery at the beginning of the > script. > > I think the SIG{__DIE__} stuff could go into Git/PredictableDie.pm or > something, and then any scripts that need it could just "use > Git::PredictableDie". > > Does that make sense? To me yes. By putting it in a module and 'use'ing it early you guarantee it will be set up before any code following it is even compiled. If there is an existing module that the git perl code always uses then it could go in there in a BEGIN{} block instead of adding the new module. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On Thu, Mar 01, 2018 at 09:28:31AM -0500, Randall S. Becker wrote: > > It's not clear to me though if we just want to tweak the programs run in the > > test scripts in order to get test_must_fail to stop complaining, or if we > > consider the unusual exit codes from our perl-based Git programs to be an > > error that should be fixed for real use, too. > > I'm living unusual exit code IRL all the time. So "fixed for real", is > what I'm looking for. So if we were to do that, where is the best > place to insert a fix - my original question - that would be permanent > in the main git test code. Or perhaps this needs to be in the main > code itself. If it's fixed in the real world, then it needs to be in the main code itself. It looks like git-svn already does this to some degree itself (most of the work happens in an eval, and it calls the "fatal" function if that throws an exception via 'die'). So I think git-send-email.perl (and maybe others) needs to learn the same trick (by pushing the main bits of the script into an eval). Or it needs to include the SIG{__DIE__} trickery at the beginning of the script. I think the SIG{__DIE__} stuff could go into Git/PredictableDie.pm or something, and then any scripts that need it could just "use Git::PredictableDie". Does that make sense? -Peff
Re: ref-filter: how to improve the code
On Thu, Mar 01, 2018 at 02:17:09PM +0300, Оля Тележная wrote: > >> I tried to replace all die("...") with `return error("...")` and >> finally exit(), but actual problem is that we print "error:..." >> instead of "fatal:...", and it looks funny. > > > > If you do that, then format_ref_array_item() is still going to print > > things, even if it doesn't die(). But for "cat-file --batch", we usually > > do not print errors at all, but instead just say "... missing" (although > > it depends on the error; syntactic errors in the format string would > > still cause us to write to stderr). > > Not sure if you catch my idea. format_ref_array_item() will not print > anything, it will just return an error code. And if there was an error > - we will print it in show_ref_array_item() (or go back to cat-file > and print what we want). OK, I think I misunderstood. It seems like there are three possible strategies on the table: - low-level functions call error() and return -1, that gets passed up through mid-level functions like format_ref_array_item(), and then higher-level functions like show_ref_array_item() act on the error code and call die(). The user sees something like: error: unable to parse object 1234abcd fatal: unable to format object - low-level functions return a numeric error code, which is then formatted by higher-level functions like show_ref_array_item() to produce a specific message - low-level functions stuff an error code into a strbuf and return -1, and then higher-level functions like show_ref_array_item() will feed that message to die("%s", err.buf). I think the first one, besides changing the output, is going to produce error() messages even for cases where we're calling format_ref_array_item() directly, because error() writes its output immediately. The second is a pain in practice, because it doubles the work: you have to come up with a list of error codes, and then translate them into strings.
And there's no room to mention variable strings (like the name of the object). So I think the third is really the only viable option. -Peff
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 01, 2018 at 04:10:48PM +0700, Nguyễn Thái Ngọc Duy wrote: > Instead of using 8 bytes (on 64 bit arch) to store a pointer to a > pack. Use an index isntead since the number of packs should be > relatively small. > > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. I overall like the direction of this series, but I think this one is just too much. While you definitely shouldn't have a ton of packs, this leaves the user with no real escape hatch. And 256 isn't actually that many packs. -Peff
[PATCH] git-gui: bind CTRL/CMD+numpad ENTER to do_commit
--- git-gui/git-gui.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh index 91c00e648..6de74ce63 100755 --- a/git-gui/git-gui.sh +++ b/git-gui/git-gui.sh @@ -3867,6 +3867,7 @@ bind . <$M1B-Key-equal> {show_more_context;break} bind . <$M1B-Key-plus> {show_more_context;break} bind . <$M1B-Key-KP_Add> {show_more_context;break} bind . <$M1B-Key-Return> do_commit +bind . <$M1B-Key-KP_Enter> do_commit foreach i [list $ui_index $ui_workdir] { bind $i <Button-1> { toggle_or_diff click %W %x %y; break } bind $i <$M1B-Button-1> { add_one_to_selection %W %x %y; break } -- 2.16.2.268.g7f9c27f2f.dirty
RE: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On March 1, 2018 2:36 AM, Jeff King wrote: > On Wed, Feb 28, 2018 at 05:51:14PM +0100, demerphq wrote: > > > I would look into putting it into a module and then using the PERL5OPT > > environment var to have it loaded automagically in any of your perl > > scripts. > > > > For instance if you put that code into a module called Git/DieTrap.pm > > > > then you could do: > > > > PERL5OPT=-MGit::DieTrap > > > > In your test setup code assuming you have some. Then you don't need to > > change any of your scripts just the test runner framework. > > That's a clever trick. > > It's not clear to me though if we just want to tweak the programs run in the > test scripts in order to get test_must_fail to stop complaining, or if we > consider the unusual exit codes from our perl-based Git programs to be an > error that should be fixed for real use, too. I'm living unusual exit code IRL all the time. So "fixed for real", is what I'm looking for. So if we were to do that, where is the best place to insert a fix - my original question - that would be permanent in the main git test code. Or perhaps this needs to be in the main code itself. Cheers, Randall
git-gui: CTRL/CMD + numpad ENTER does not invoke same command as "regular" ENTER
In git-gui, we can hit CTRL/CMD+ENTER to create a commit. However, using the numpad ENTER does not invoke the same command. I propose that both numpad ENTER and "regular" ENTER should invoke the same command.
Re: [PATCH 00/11] Reduce pack-objects memory footprint
On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted: > The array of object_entry in pack-objects can take a lot of memory > when pack-objects is run in "pack everything" mode. On linux-2.6.git, > this array alone takes roughly 800MB. > > This series reorders some fields and reduces field size... to keep > this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on > 64-bit linux and saves 260MB on linux-2.6.git. I'm very interested in this patch series. I don't have time to test this one right now (have to run), but with your previous RFC patch memory use (in the ~4GB range) on a big in-house repo went down by a bit over 3%, and it's ~5% faster. Before/after RSS 4440812 / 429 & runtime 172.73 / 162.45. This is after having already done a full git gc before, data via /usr/bin/time -v. So not huge, but respectable. We have a big repo, and this gets repacked on 6-8GB of memory on dev KVMs, so we're under a fair bit of memory pressure. git-gc slows things down a lot. It would be really nice to have something that made it use drastically less memory at the cost of less efficient packs. Is the property that you need to spend give or take the size of .git/objects in memory something inherent, or just a limitation of the current implementation? I.e. could we do a first pass to pick some objects based on some heuristic, then repack them N at a time, and finally delete the now-obsolete packs? Another thing I've dealt with is that on these machines their NFS-mounted storage gets exhausted (I'm told) due to some pathological operations git does during repack, I/O tends to get 5-6x slower. Of course ionice doesn't help because the local kernel doesn't know anything about how harmful it is.
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted: > pack. Use an index isntead since the number of packs should be s/isntead/instead/ > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. So if you have gc.autoPackLimit=300 this will break, how does it break? Should we also make (& document) setting that variable higher than 256 an error?
Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
On 3/1/2018 2:42 AM, Jeff King wrote: On Wed, Feb 28, 2018 at 01:27:03PM -0800, Junio C Hamano wrote: Somehow this topic has been hanging without getting further attention for too long. It's time to wrap it up and move it forward. How about this? -- >8 -- From: Junio C Hamano Date: Wed, 28 Feb 2018 13:21:09 -0800 Subject: [PATCH] untracked cache: use git_env_bool() not getenv() for customization [...] This looks good to me. Thanks for tying up the loose end. -Peff Looks good to me as well. Thanks for fixing the environment variables. I'm having send-email issues so hope this one gets through. Please ignore my [PATCH V2] if it ever makes it through. Ben
[GSoC][PATCH v2] userdiff: add built-in pattern for golang
This adds xfuncname and word_regex patterns for golang, a quite popular programming language. It also includes test cases for the xfuncname regex (t4018) and updated documentation. The xfuncname regex finds functions, structs and interfaces. Although the Go language prohibits the opening brace from being on its own line, the regex does not makes it mandatory, to be able to match `func` statements like this: func foo(bar int, baz int) { } This is covered by the test case t4018/golang-long-func. The word_regex pattern finds identifiers, integers, floats, complex numbers and operators, according to the go specification. Signed-off-by: Alban Gruin --- Documentation/gitattributes.txt | 2 ++ t/t4018-diff-funcname.sh | 1 + t/t4018/golang-complex-function | 8 ++++++++ t/t4018/golang-func | 4 ++++ t/t4018/golang-interface | 4 ++++ t/t4018/golang-long-func | 5 +++++ t/t4018/golang-struct | 4 ++++ userdiff.c | 9 +++++++++ 8 files changed, 37 insertions(+) create mode 100644 t/t4018/golang-complex-function create mode 100644 t/t4018/golang-func create mode 100644 t/t4018/golang-interface create mode 100644 t/t4018/golang-long-func create mode 100644 t/t4018/golang-struct diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index c21f5ca10..d52b254a2 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -714,6 +714,8 @@ patterns are available: - `fountain` suitable for Fountain documents. +- `golang` suitable for source code in the Go language. + - `html` suitable for HTML/XHTML documents. - `java` suitable for source code in the Java language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 1795ffc3a..22f9f88f0 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -33,6 +33,7 @@ diffpatterns=" css fortran fountain + golang html java matlab diff --git a/t/t4018/golang-complex-function b/t/t4018/golang-complex-function new file mode 100644 index 0..e057dcefe --- /dev/null +++ b/t/t4018/golang-complex-function @@ -0,0 +1,8 @@ +type Test struct { + a Type +} + +func (t *Test) RIGHT(a Type) (Type, error) { + t.a = a + return ChangeMe, nil +} diff --git a/t/t4018/golang-func b/t/t4018/golang-func new file mode 100644 index 0..8e9c9ac7c --- /dev/null +++ b/t/t4018/golang-func @@ -0,0 +1,4 @@ +func RIGHT() { + a := 5 + b := ChangeMe +} diff --git a/t/t4018/golang-interface b/t/t4018/golang-interface new file mode 100644 index 0..553bedec9 --- /dev/null +++ b/t/t4018/golang-interface @@ -0,0 +1,4 @@ +type RIGHT interface { + a() Type + b() ChangeMe +} diff --git a/t/t4018/golang-long-func b/t/t4018/golang-long-func new file mode 100644 index 0..ac3a77b5c --- /dev/null +++ b/t/t4018/golang-long-func @@ -0,0 +1,5 @@ +func RIGHT(aVeryVeryVeryLongVariableName AVeryVeryVeryLongType, + anotherLongVariableName AnotherLongType) { + a := 5 + b := ChangeMe +} diff --git a/t/t4018/golang-struct b/t/t4018/golang-struct new file mode 100644 index 0..5deda77fe --- /dev/null +++ b/t/t4018/golang-struct @@ -0,0 +1,4 @@ +type RIGHT struct { + a Type + b ChangeMe +} diff --git a/userdiff.c b/userdiff.c index dbfb4e13c..8f5028f6b 100644 --- a/userdiff.c +++ b/userdiff.c @@ -38,6 +38,15 @@ IPATTERN("fortran", "|//|\\*\\*|::|[/<>=]="), IPATTERN("fountain", "^((\\.[^.]|(int|ext|est|int\\.?/ext|i/e)[. ]).*)$", "[^ \t-]+"), +PATTERNS("golang", +/* Functions */ +"^[ \t]*(func[ \t]*.*(\\{[ \t]*)?)\n" +/* Structs and interfaces */ +"^[ \t]*(type[ \t].*(struct|interface)[ \t]*(\\{[ \t]*)?)", +/* -- */ +"[a-zA-Z_][a-zA-Z0-9_]*" +"|[-+0-9.eE]+i?|0[xX]?[0-9a-fA-F]+i?" 
+"|[-+*/<>%&^|=!:]=|--|\\+\\+|<<=?|>>=?|&\\^=?|&&|\\|\\||<-|\\.{3}"), PATTERNS("html", "^[ \t]*(<[Hh][1-6]([ \t].*)?>.*)$", "[^<>= \t]+"), PATTERNS("java", -- 2.16.1
Re: ref-filter: how to improve the code
2018-02-28 16:25 GMT+03:00 Jeff King: > On Sun, Feb 25, 2018 at 09:28:25PM +0300, Оля Тележная wrote: > >> I am trying to remove cat-file formatting part and reuse same >> functionality from ref-filter. >> I have a problem that cat-file sometimes needs to continue running >> even if the request is broken, while in ref-filter we invoke die() in >> many places everywhere during formatting process. I write this email >> because I want to discuss how to implement the solution better. >> >> ref-filter has 2 functions which could be interesting for us: >> format_ref_array_item() and show_ref_array_item(). I guess it's a good >> idea to print everything in show_ref_array_item(), including all >> errors. In that case all current users of ref-filter will invoke >> show_ref_array_item() (as they did it before), and cat-file could use >> format_ref_array_item() and work with the result in its own logic. > > Yes, that arrangement makes sense to me. > >> I tried to replace all die("...") with `return error("...")` and >> finally exit(), but actual problem is that we print "error:..." >> instead of "fatal:...", and it looks funny. > > If you do that, then format_ref_array_item() is still going to print > things, even if it doesn't die(). But for "cat-file --batch", we usually > do not print errors at all, but instead just say "... missing" (although > it depends on the error; syntactic errors in the format string would > still cause us to write to stderr). Not sure if you catch my idea. format_ref_array_item() will not print anything, it will just return an error code. And if there was an error - we will print it in show_ref_array_item() (or go back to cat-file and print what we want). > >> One of the easiest solutions is to add strbuf parameter for errors to >> all functions that we use during formatting process, fill it in and >> print in show_ref_array_item() if necessary. What do you think about >> this idea? 
>> >> Another way is to change the resulting error message, print current >> message with "error" prefix and then print something like "fatal: >> could not format the output". It is easier but I am not sure that it's >> a good idea to change the interface. > > For reference, the first one is what we've been switching to in the refs > code. And I think it's been fairly successful there. > > -Peff
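The split being discussed — a format function that reports errors without printing, and a show function that keeps the old die() behaviour — can be sketched abstractly. This is a hypothetical Python model of the control flow, not ref-filter's actual C API; the function names mirror the ones in the thread, and the field-lookup "format language" is a stand-in:

```python
# Hypothetical model of the proposed arrangement: formatting never
# prints, it returns (output, error); the caller decides what to do.
def format_ref_array_item(item, fmt):
    """Return (output, error); exactly one of the two is None."""
    try:
        return fmt.format(**item), None
    except KeyError as e:
        return None, "unknown field %s" % e

def show_ref_array_item(item, fmt):
    """Old-style caller: print the result, die() on error."""
    out, err = format_ref_array_item(item, fmt)
    if err is not None:
        raise SystemExit("fatal: " + err)   # die() equivalent
    print(out)

# A cat-file-like caller could instead inspect err itself and keep
# going, which is exactly the flexibility being asked for.
show_ref_array_item({"refname": "refs/heads/master"}, "{refname}")
```

The point of the design is that the "fatal:" vs "error:" vs "... missing" decision moves out of the formatting code and into each caller.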
Re: [PATCH v3 2/9] t3701: indent here documents
Hi Junio On 28/02/18 15:37, Junio C Hamano wrote: > Phillip Wood writes: > >> Is there an easy way for contributors to compare the branch they post to >> what ends up in pu? > > Distributed work is pretty much symmetric, so it can be done the > same way as one would review a rerolled series by another co-worker. > > $ git log --oneline --first-parent origin/master..origin/pu > > would show merges of topic branches, so you can find the tip of the > topic of your earlier submission (it would be one $commit^2; call > that $topic). origin/master..$topic would be the one branch > (i.e. what is in 'pu') to be compared. > > The other branch to be compared is what you sent the previous one > out of, or the new version of the patches. > > To compare two branches, git://github.com/trast/tbdiff is one of the > easier ways. > > Before I learned about the tool, I used to "format-patch --stdout" > on both branches, and ran "diff -u" between them, as a crude measure; > it was more useful for spotting typofixes in the log messages than > code changes, before I got good at reading diff of diffs ;-). > > Also, tentatively rebasing the two branches on a common base, and > then doing "git diff $oldtopic~$N $newtopic~$N" or something like > that for varying values of $N (and N==0 is a good way for final > sanity checks). Thanks for the tips, tbdiff looks useful (I just need to learn to read diffs of diffs!). I also find rebasing them on a common ancestor useful, but it's a bit tedious. Thanks again Phillip
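The "format-patch --stdout on both branches, then diff -u between them" trick Junio describes produces a diff of diffs. Here is a toy Python illustration of what that comparison looks like, with `difflib` standing in for `diff -u` and two made-up miniature patch texts (the patch contents are invented for illustration):

```python
import difflib

# Two versions of the same tiny patch, as format-patch might emit them.
v1 = ["Subject: [PATCH] fix typo", "-teh thing", "+the thing"]
v2 = ["Subject: [PATCH v2] fix typo", "-teh thing", "+the thing!"]

# diff -u equivalent: context lines get a leading space, so unchanged
# diff lines appear as " -..." / " +..." -- the "diff of diffs" one
# has to learn to read.
delta = list(difflib.unified_diff(v1, v2, "v1", "v2", lineterm=""))
print("\n".join(delta))
```

Unchanged hunk lines show up prefixed with a space, while lines that differ between the two series are prefixed with `-`/`+`, which is why reading these takes some practice.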
[PATCH v4 6/9] add -p: adjust offsets of subsequent hunks when one is skipped
From: Phillip Wood
Since commit 8cbd431082 ("git-add--interactive: replace hunk recounting with apply --recount", 2008-7-2) if a hunk is skipped then we rely on the context lines to apply subsequent hunks in the right place. While this works most of the time, it is possible for hunks to end up being applied in the wrong place. To fix this, adjust the offset of subsequent hunks to correct for any change in the number of insertions or deletions due to the skipped hunk. The change in offset due to edited hunks that have the number of insertions or deletions changed is ignored here; it will be fixed in the next commit. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 15 +-- t/t3701-add-interactive.sh | 2 +- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 8ababa6453..7a0a5896bb 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -926,14 +926,25 @@ sub coalesce_overlapping_hunks { my @out = (); my ($last_o_ctx, $last_was_dirty); + my $ofs_delta = 0; - for (grep { $_->{USE} } @in) { + for (@in) { if ($_->{TYPE} ne 'hunk') { push @out, $_; next; } my $text = $_->{TEXT}; - my ($o_ofs) = parse_hunk_header($text->[0]); + my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) = + parse_hunk_header($text->[0]); + unless ($_->{USE}) { + $ofs_delta += $o_cnt - $n_cnt; + next; + } + if ($ofs_delta) { + $n_ofs += $ofs_delta; + $_->{TEXT}->[0] = format_hunk_header($o_ofs, $o_cnt, +$n_ofs, $n_cnt); + } if (defined $last_o_ctx && $o_ofs <= $last_o_ctx && !$_->{DIRTY} && diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 094eeca405..e3150a4e07 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -544,7 +544,7 @@ test_expect_success 'set up pathological context' ' test_write_lines +b " a" >patch ' -test_expect_failure 'add -p works with pathological context lines' ' +test_expect_success 'add -p works with pathological context lines' ' git reset && printf 
"%s\n" n y | git add -p && -- 2.16.1
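The bookkeeping this patch adds can be modelled in a few lines. The following is an illustrative Python sketch (not the Perl code itself): skipping a hunk means its insertions and deletions never happen, so every later hunk's new-file offset must shift by `o_cnt - n_cnt` of each skipped hunk:

```python
def adjust_offsets(hunks):
    """Toy model of the ofs_delta logic added to
    coalesce_overlapping_hunks: each hunk is a dict with old/new
    offsets and counts plus a 'use' flag; skipped hunks accumulate
    an offset delta applied to all later kept hunks."""
    ofs_delta = 0
    kept = []
    for h in hunks:
        if not h["use"]:
            # The skipped hunk's net line change never happens.
            ofs_delta += h["o_cnt"] - h["n_cnt"]
            continue
        kept.append(dict(h, n_ofs=h["n_ofs"] + ofs_delta))
    return kept

# First hunk (3 old lines -> 5 new lines, i.e. +2) is skipped, so the
# second hunk's new-file offset must move back by 2: 12 -> 10.
hunks = [
    {"o_ofs": 1, "o_cnt": 3, "n_ofs": 1, "n_cnt": 5, "use": False},
    {"o_ofs": 10, "o_cnt": 2, "n_ofs": 12, "n_cnt": 3, "use": True},
]
print(adjust_offsets(hunks))
```

Without this correction, `git apply` has to rely purely on context-line matching, which is exactly what the pathological-context test shows going wrong.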
[PATCH v4 5/9] t3701: add failing test for pathological context lines
From: Phillip Wood
When a hunk is skipped by add -i the offsets of subsequent hunks are not adjusted to account for any missing insertions due to the skipped hunk. Most of the time this does not matter, as apply uses the context lines to apply the subsequent hunks in the correct place. However, in pathological cases the context lines will match at the now incorrect offset and the hunk will be applied in the wrong place. The offsets of hunks following an edited hunk that has had the number of insertions or deletions changed also need to be updated in the same way. Add failing tests to demonstrate this. Signed-off-by: Phillip Wood --- Notes: changes since v2: - removed test_set_editor as it is already set changes since v1: - changed edit test to use the existing fake editor and to strip the hunk header and some context lines as well as deleting an insertion - style fixes t/t3701-add-interactive.sh | 30 ++ 1 file changed, 30 insertions(+) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index f95714230b..094eeca405 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -531,4 +531,34 @@ test_expect_success 'status ignores dirty submodules (except HEAD)' ' ! 
grep dirty-otherwise output ' +test_expect_success 'set up pathological context' ' + git reset --hard && + test_write_lines a a a a a a a a a a a >a && + git add a && + git commit -m a && + test_write_lines c b a a a a a a a b a a a a >a && + test_write_lines a a a a a a a b a a a a >expected-1 && + test_write_lines b a a a a a a a b a a a a >expected-2 && + # check editing can cope with missing header and deleted context lines + # as well as changes to other lines + test_write_lines +b " a" >patch +' + +test_expect_failure 'add -p works with pathological context lines' ' + git reset && + printf "%s\n" n y | + git add -p && + git cat-file blob :a >actual && + test_cmp expected-1 actual +' + +test_expect_failure 'add -p patch editing works with pathological context lines' ' + git reset && + # n q q below is in case edit fails + printf "%s\n" e y n q q | + git add -p && + git cat-file blob :a >actual && + test_cmp expected-2 actual +' + test_done -- 2.16.1
[PATCH v4 2/9] t3701: indent here documents
From: Phillip Wood
Indent here documents in line with the current style for tests. While at it, quote the end marker of here-docs that do not use variable interpolation. Signed-off-by: Phillip Wood Signed-off-by: Junio C Hamano --- Notes: changes since v3: - updated to match what was in pu t/t3701-add-interactive.sh | 174 ++--- 1 file changed, 87 insertions(+), 87 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 058698df6a..3130dafcf0 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -22,14 +22,14 @@ test_expect_success 'status works (initial)' ' ' test_expect_success 'setup expected' ' -cat >expected expected expected patch fake_editor.sh && - cat >>fake_editor.sh <<\EOF && -mv -f "$1" oldpatch && -mv -f patch "$1" -EOF + cat >>fake_editor.sh <<-\EOF && + mv -f "$1" oldpatch && + mv -f patch "$1" + EOF chmod a+x fake_editor.sh && test_set_editor "$(pwd)/fake_editor.sh" ' @@ -126,10 +126,10 @@ test_expect_success 'bad edit rejected' ' ' test_expect_success 'setup patch' ' -cat >patch patch expected patch expected expected expected
[PATCH v4 8/9] add -p: fix counting when splitting and coalescing
From: Phillip Wood
When a file has no trailing newline at the end, diff records this by appending "\ No newline at end of file" below the last line of the file. This line should not be counted in the hunk header. Fix the splitting and coalescing code to count files without a trailing newline properly and change one of the tests to test splitting without a trailing newline. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 13 + t/t3701-add-interactive.sh | 31 +++ 2 files changed, 32 insertions(+), 12 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 0df0c2aa06..3226c2c4f0 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -793,6 +793,11 @@ sub split_hunk { while (++$i < @$text) { my $line = $text->[$i]; my $display = $display->[$i]; + if ($line =~ /^\\/) { + push @{$this->{TEXT}}, $line; + push @{$this->{DISPLAY}}, $display; + next; + } if ($line =~ /^ /) { if ($this->{ADDDEL} && !defined $next_hunk_start) { @@ -887,8 +892,8 @@ sub merge_hunk { $o_cnt = $n_cnt = 0; for ($i = 1; $i < @{$prev->{TEXT}}; $i++) { my $line = $prev->{TEXT}[$i]; - if ($line =~ /^\+/) { - $n_cnt++; + if ($line =~ /^[+\\]/) { + $n_cnt++ if ($line =~ /^\+/); push @line, $line; next; } @@ -905,8 +910,8 @@ sub merge_hunk { for ($i = 1; $i < @{$this->{TEXT}}; $i++) { my $line = $this->{TEXT}[$i]; - if ($line =~ /^\+/) { - $n_cnt++; + if ($line =~ /^[+\\]/) { + $n_cnt++ if ($line =~ /^\+/); push @line, $line; next; } diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 4cc9517eda..a9a9478a29 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -237,31 +237,46 @@ test_expect_success 'setup patch' ' baseline content +lastline + \ No newline at end of file EOF ' -# Expected output, similar to the patch but w/ diff at the top +# Expected output, diff is similar to the patch but w/ diff at the top test_expect_success 'setup expected' ' - cat >expected <<-\EOF - diff --git a/file b/file - 
index b6f2c08..61b9053 100755 + echo diff --git a/file b/file >expected && + cat patch |sed "/^index/s/ 100644/ 100755/" >>expected && + cat >expected-output <<-\EOF --- a/file +++ b/file @@ -1,2 +1,4 @@ +firstline baseline content +lastline + \ No newline at end of file + @@ -1,2 +1,3 @@ + +firstline +baseline +content + @@ -1,2 +2,3 @@ +baseline +content + +lastline + \ No newline at end of file EOF ' # Test splitting the first patch, then adding both -test_expect_success 'add first line works' ' +test_expect_success C_LOCALE_OUTPUT 'add first line works' ' git commit -am "clear local changes" && git apply patch && - (echo s; echo y; echo y) | git add -p file && - git diff --cached > diff && - diff_cmp expected diff + printf "%s\n" s y y | git add -p file 2>error | + sed -n -e "s/^Stage this hunk[^@]*\(@@ .*\)/\1/" \ + -e "/^[-+@ ]/p" >output && + test_must_be_empty error && + git diff --cached >diff && + diff_cmp expected diff && + test_cmp expected-output output ' test_expect_success 'setup expected' ' -- 2.16.1
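The counting rule this patch fixes — the `\ No newline at end of file` marker must be carried along but never counted as a pre- or post-image line — is easy to state as a model. An illustrative Python sketch (not the Perl code itself):

```python
def count_hunk(lines):
    """Count pre-/post-image lines of a unified-diff hunk body.
    Lines starting with a backslash are the 'No newline at end of
    file' marker: keep them with the hunk but don't count them,
    mirroring the fixed splitting/coalescing logic."""
    o_cnt = n_cnt = 0
    for line in lines:
        if line.startswith("\\"):
            continue            # marker, not a file line
        elif line.startswith("+"):
            n_cnt += 1
        elif line.startswith("-"):
            o_cnt += 1
        elif line.startswith(" "):
            o_cnt += 1
            n_cnt += 1
    return o_cnt, n_cnt

body = [" context", "-old line", "+new line", "+lastline",
        "\\ No newline at end of file"]
print(count_hunk(body))
```

Counting the marker would produce a hunk header whose counts disagree with the body, which is precisely the kind of inconsistency `--recount` used to paper over.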
[PATCH v4 4/9] t3701: don't hard code sha1 hash values
From: Phillip Wood
Use a filter when comparing diffs to fix the value of non-zero hashes in diff index lines so that we're not hard-coding sha1 hash values in the expected output. This makes it easier to change the expected output if a test is edited, as we don't need to worry about the exact hash value, and means the tests will work when the hash algorithm is transitioned away from sha1. Thanks-to: Junio C Hamano Signed-off-by: Phillip Wood --- Notes: changes since v3: - fix zero hash values to seven digits changes since v2: - fix hash values in index lines rather than removing the line - reworded commit message t/t3701-add-interactive.sh | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 836ce346ed..f95714230b 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -10,6 +10,19 @@ then test_done fi +diff_cmp () { + for x + do + sed -e '/^index/s/[0-9a-f]*[1-9a-f][0-9a-f]*\.\./1234567../' \ +-e '/^index/s/\.\.[0-9a-f]*[1-9a-f][0-9a-f]*/..9abcdef/' \ +-e '/^index/s/ 00*\.\./ 000../' \ +-e '/^index/s/\.\.00*$/..000/' \ +-e '/^index/s/\.\.00* /..000 /' \ +"$x" >"$x.filtered" + done + test_cmp "$1.filtered" "$2.filtered" +} + test_expect_success 'setup (initial)' ' echo content >file && git add file && @@ -35,7 +48,7 @@ test_expect_success 'setup expected' ' test_expect_success 'diff works (initial)' ' (echo d; echo 1) | git add -i >output && sed -ne "/new file/,/content/p" diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'revert works (initial)' ' git add file && @@ -72,7 +85,7 @@ test_expect_success 'setup expected' ' test_expect_success 'diff works (commit)' ' (echo d; echo 1) | git add -i >output && sed -ne "/^index/,/content/p" diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'revert works (commit)' ' git add file && @@ -91,7 +104,7 @@ test_expect_success 'dummy edit works' ' test_set_editor : && (echo e; 
echo a) | git add -p && git diff > diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup patch' ' @@ -159,7 +172,7 @@ test_expect_success 'setup expected' ' test_expect_success 'real edit works' ' (echo e; echo n; echo d) | git add -p && git diff >output && - test_cmp expected output + diff_cmp expected output ' test_expect_success 'skip files similarly as commit -a' ' @@ -171,7 +184,7 @@ test_expect_success 'skip files similarly as commit -a' ' git reset && git commit -am commit && git diff >expected && - test_cmp expected output && + diff_cmp expected output && git reset --hard HEAD^ ' rm -f .gitignore @@ -248,7 +261,7 @@ test_expect_success 'add first line works' ' git apply patch && (echo s; echo y; echo y) | git add -p file && git diff --cached > diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup expected' ' @@ -271,7 +284,7 @@ test_expect_success 'deleting a non-empty file' ' rm non-empty && echo y | git add -p non-empty && git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup expected' ' @@ -290,7 +303,7 @@ test_expect_success 'deleting an empty file' ' rm empty && echo y | git add -p empty && git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'split hunk setup' ' @@ -355,7 +368,7 @@ test_expect_success 'patch mode ignores unmerged entries' ' +changed EOF git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success TTY 'diffs can be colorized' ' @@ -384,7 +397,7 @@ test_expect_success 'patch-mode via -i prompts for files' ' echo test >expect && git diff --cached --name-only >actual && - test_cmp expect actual + diff_cmp expect actual ' test_expect_success 'add -p handles globs' ' -- 2.16.1
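The effect of the `diff_cmp` sed filter — masking real blob hashes in `index` lines so expected output need not hard-code them — can be shown with a small Python analogue. This is only a loose model: the real filter is the sed script above and also special-cases all-zero hashes, which this sketch omits:

```python
import re

def normalize_index_line(line):
    """Mask non-zero blob hashes in a diff 'index <old>..<new> <mode>'
    line, loosely mirroring the diff_cmp sed filter in t3701
    (the all-zero-hash cases handled there are omitted here)."""
    if not line.startswith("index "):
        return line
    # Hash before '..' -> fixed 1234567, hash after '..' -> 9abcdef.
    line = re.sub(r"(?<=index )[0-9a-f]*[1-9a-f][0-9a-f]*(?=\.\.)",
                  "1234567", line)
    line = re.sub(r"(?<=\.\.)[0-9a-f]*[1-9a-f][0-9a-f]*",
                  "9abcdef", line)
    return line

print(normalize_index_line("index b6f2c08..61b9053 100755"))
```

After filtering, two diffs compare equal whenever everything except the exact blob hashes matches, which is what makes the expected files stable across hash-algorithm changes.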
[PATCH v4 0/9] Correct offsets of hunks when one is skipped
From: Phillip Wood
I've fixed the second patch to match what was in pu and added some extra code to patch 4 to handle zero sha1 hashes where the length varies. Cover letter to v1: While working on a patch series to stage selected lines from a hunk without having to edit it, I got worried that subsequent patches would be applied in the wrong place, which led to this series to correct the offsets of hunks following those that are skipped or edited. Interdiff to v3: diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 05540ee9ef..a9a9478a29 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -15,6 +15,9 @@ diff_cmp () { do sed -e '/^index/s/[0-9a-f]*[1-9a-f][0-9a-f]*\.\./1234567../' \ -e '/^index/s/\.\.[0-9a-f]*[1-9a-f][0-9a-f]*/..9abcdef/' \ +-e '/^index/s/ 00*\.\./ 000../' \ +-e '/^index/s/\.\.00*$/..000/' \ +-e '/^index/s/\.\.00* /..000 /' \ "$x" >"$x.filtered" done test_cmp "$1.filtered" "$2.filtered" @@ -32,7 +35,7 @@ test_expect_success 'status works (initial)' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF new file mode 100644 index 000..d95f3ad --- /dev/null @@ -69,7 +72,7 @@ test_expect_success 'status works (commit)' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF index 180b47c..b6f2c08 100644 --- a/file +++ b/file @@ -93,7 +96,7 @@ test_expect_success 'revert works (commit)' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF EOF ' @@ -105,7 +108,7 @@ test_expect_success 'dummy edit works' ' ' test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF @@ -1,1 +1,4 @@ this +patch @@ -129,7 +132,7 @@ test_expect_success 'bad edit rejected' ' ' test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF this patch is garbage EOF @@ -142,7 +145,7 @@ test_expect_success 'garbage edit rejected' ' ' test_expect_success 'setup patch' ' - cat >patch 
<<-\EOF @@ -1,0 +1,0 @@ baseline +content @@ -152,7 +155,7 @@ test_expect_success 'setup patch' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/file b/file index b5dd6c9..f910ae9 100644 --- a/file @@ -225,7 +228,7 @@ test_expect_success 'setup again' ' # Write the patch file with a new line at the top and bottom test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF index 180b47c..b6f2c08 100644 --- a/file +++ b/file @@ -242,7 +245,7 @@ test_expect_success 'setup patch' ' test_expect_success 'setup expected' ' echo diff --git a/file b/file >expected && cat patch |sed "/^index/s/ 100644/ 100755/" >>expected && - cat >expected-output <<-EOF + cat >expected-output <<-\EOF --- a/file +++ b/file @@ -1,2 +1,4 @@ @@ -277,7 +280,7 @@ test_expect_success C_LOCALE_OUTPUT 'add first line works' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/non-empty b/non-empty deleted file mode 100644 index d95f3ad..000 @@ -300,7 +303,7 @@ test_expect_success 'deleting a non-empty file' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/empty b/empty deleted file mode 100644 index e69de29..000 Phillip Wood (9): add -i: add function to format hunk header t3701: indent here documents t3701: use test_write_lines and write_script t3701: don't hard code sha1 hash values t3701: add failing test for pathological context lines add -p: adjust offsets of subsequent hunks when one is skipped add -p: calculate offset delta for edited patches add -p: fix counting when splitting and coalescing add -p: don't rely on apply's '--recount' option git-add--interactive.perl | 106 + t/t3701-add-interactive.sh | 291 + 2 files changed, 243 insertions(+), 154 deletions(-) -- 2.16.1
[PATCH v4 1/9] add -i: add function to format hunk header
From: Phillip Wood
This code is duplicated in a couple of places, so make it into a function, as we're going to add some more callers shortly. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 964c3a7542..8ababa6453 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -751,6 +751,15 @@ sub parse_hunk_header { return ($o_ofs, $o_cnt, $n_ofs, $n_cnt); } +sub format_hunk_header { + my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) = @_; + return ("@@ -$o_ofs" . + (($o_cnt != 1) ? ",$o_cnt" : '') . + " +$n_ofs" . + (($n_cnt != 1) ? ",$n_cnt" : '') . + " @@\n"); +} + sub split_hunk { my ($text, $display) = @_; my @split = (); @@ -838,11 +847,7 @@ sub split_hunk { my $o_cnt = $hunk->{OCNT}; my $n_cnt = $hunk->{NCNT}; - my $head = ("@@ -$o_ofs" . - (($o_cnt != 1) ? ",$o_cnt" : '') . - " +$n_ofs" . - (($n_cnt != 1) ? ",$n_cnt" : '') . - " @@\n"); + my $head = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); my $display_head = $head; unshift @{$hunk->{TEXT}}, $head; if ($diff_use_color) { @@ -912,11 +917,7 @@ sub merge_hunk { } push @line, $line; } - my $head = ("@@ -$o0_ofs" . - (($o_cnt != 1) ? ",$o_cnt" : '') . - " +$n0_ofs" . - (($n_cnt != 1) ? ",$n_cnt" : '') . - " @@\n"); + my $head = format_hunk_header($o0_ofs, $o_cnt, $n0_ofs, $n_cnt); @{$prev->{TEXT}} = ($head, @line); } -- 2.16.1
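The convention the Perl helper encodes — omit a count from `@@ -o,c +o,c @@` when it is exactly 1 — round-trips with the existing parse_hunk_header. A Python sketch of the pair, for illustration only (the real code is the Perl above):

```python
import re

def parse_hunk_header(line):
    """Parse '@@ -o_ofs[,o_cnt] +n_ofs[,n_cnt] @@ ...'; a missing
    count defaults to 1, per the unified-diff convention."""
    m = re.match(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", line)
    o_ofs, o_cnt, n_ofs, n_cnt = m.groups()
    return (int(o_ofs), int(o_cnt or 1), int(n_ofs), int(n_cnt or 1))

def format_hunk_header(o_ofs, o_cnt, n_ofs, n_cnt):
    """Inverse of the above: emit a count only when it is not 1,
    as the new Perl format_hunk_header does."""
    old = str(o_ofs) if o_cnt == 1 else "%d,%d" % (o_ofs, o_cnt)
    new = str(n_ofs) if n_cnt == 1 else "%d,%d" % (n_ofs, n_cnt)
    return "@@ -%s +%s @@\n" % (old, new)

# Round trip: '+1' (no count) parses as count 1 and formats back
# without a count.
print(format_hunk_header(*parse_hunk_header("@@ -1,3 +1 @@ context")))
```

Having a single formatter matters for the later patches in the series, which rewrite hunk headers after adjusting offsets.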
[PATCH v4 7/9] add -p: calculate offset delta for edited patches
From: Phillip Wood
Recount the number of preimage and postimage lines in a hunk after it has been edited so any change in the number of insertions or deletions can be used to adjust the offsets of subsequent hunks. If an edited hunk is subsequently split then the offset correction will be lost. It would be possible to fix this if it is a problem; however, the code here is still an improvement on the status quo for the common case where an edited hunk is applied without being split. This is also a necessary step to removing '--recount' and '--allow-overlap' from the invocation of 'git apply'. Before '--recount' can be removed, the splitting and coalescing counting needs to be fixed to handle a missing newline at the end of a file. In order to remove '--allow-overlap' there needs to be i) some way of verifying the offset data in the edited hunk (probably by correlating the preimage (or postimage if the patch is going to be applied in reverse) lines of the edited and unedited versions to see if they are offset or if any leading/trailing context lines have been removed) and ii) a way of dealing with edited hunks that change context lines that are shared with neighbouring hunks. Signed-off-by: Phillip Wood --- Notes: changes since v1 - edited hunks are now recounted before seeing if they apply in preparation for removing '--recount' when invoking 'git apply' - added sentence to commit message about the offset data being lost if an edited hunk is split git-add--interactive.perl | 55 ++ t/t3701-add-interactive.sh | 2 +- 2 files changed, 47 insertions(+), 10 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 7a0a5896bb..0df0c2aa06 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -938,13 +938,19 @@ sub coalesce_overlapping_hunks { parse_hunk_header($text->[0]); unless ($_->{USE}) { $ofs_delta += $o_cnt - $n_cnt; + # If this hunk has been edited then subtract + # the delta that is due to the edit. 
+ $_->{OFS_DELTA} and $ofs_delta -= $_->{OFS_DELTA}; next; } if ($ofs_delta) { $n_ofs += $ofs_delta; $_->{TEXT}->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); } + # If this hunk was edited then adjust the offset delta + # to reflect the edit. + $_->{OFS_DELTA} and $ofs_delta += $_->{OFS_DELTA}; if (defined $last_o_ctx && $o_ofs <= $last_o_ctx && !$_->{DIRTY} && @@ -1016,6 +1022,30 @@ marked for discarding."), marked for applying."), ); +sub recount_edited_hunk { + local $_; + my ($oldtext, $newtext) = @_; + my ($o_cnt, $n_cnt) = (0, 0); + for (@{$newtext}[1..$#{$newtext}]) { + my $mode = substr($_, 0, 1); + if ($mode eq '-') { + $o_cnt++; + } elsif ($mode eq '+') { + $n_cnt++; + } elsif ($mode eq ' ') { + $o_cnt++; + $n_cnt++; + } + } + my ($o_ofs, undef, $n_ofs, undef) = + parse_hunk_header($newtext->[0]); + $newtext->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); + my (undef, $orig_o_cnt, undef, $orig_n_cnt) = + parse_hunk_header($oldtext->[0]); + # Return the change in the number of lines inserted by this hunk + return $orig_o_cnt - $orig_n_cnt - $o_cnt + $n_cnt; +} + sub edit_hunk_manually { my ($oldtext) = @_; @@ -1114,25 +1144,32 @@ sub prompt_yesno { } sub edit_hunk_loop { - my ($head, $hunk, $ix) = @_; - my $text = $hunk->[$ix]->{TEXT}; + my ($head, $hunks, $ix) = @_; + my $hunk = $hunks->[$ix]; + my $text = $hunk->{TEXT}; while (1) { - $text = edit_hunk_manually($text); - if (!defined $text) { + my $newtext = edit_hunk_manually($text); + if (!defined $newtext) { return undef; } my $newhunk = { - TEXT => $text, - TYPE => $hunk->[$ix]->{TYPE}, + TEXT => $newtext, + TYPE => $hunk->{TYPE}, USE => 1, DIRTY => 1, }; + $newhunk->{OFS_DELTA} = recount_edited_hunk($text, $newtext); + # If this hunk has already been edited then add the +
[PATCH v4 9/9] add -p: don't rely on apply's '--recount' option
From: Phillip Wood
Now that add -p counts patches properly, it should be possible to turn off the '--recount' option when invoking 'git apply'. Signed-off-by: Phillip Wood --- Notes: I can't think of a reason why this shouldn't be OK but I can't help feeling slightly nervous about it. I've made it a separate patch so it can be easily dropped or reverted if I've missed something. git-add--interactive.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 3226c2c4f0..a64c0db57d 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -677,7 +677,7 @@ sub add_untracked_cmd { sub run_git_apply { my $cmd = shift; my $fh; - open $fh, '| git ' . $cmd . " --recount --allow-overlap"; + open $fh, '| git ' . $cmd . " --allow-overlap"; print $fh @_; return close $fh; } -- 2.16.1
[PATCH v4 3/9] t3701: use test_write_lines and write_script
From: Phillip Wood
Simplify things slightly by using the above helpers. Signed-off-by: Phillip Wood --- Notes: changes since v2 - fixed use of test_set_editor to match what was in pu t/t3701-add-interactive.sh | 33 + 1 file changed, 5 insertions(+), 28 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 3130dafcf0..836ce346ed 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -87,13 +87,8 @@ test_expect_success 'setup expected' ' EOF ' -test_expect_success 'setup fake editor' ' - >fake_editor.sh && - chmod a+x fake_editor.sh && - test_set_editor "$(pwd)/fake_editor.sh" -' - test_expect_success 'dummy edit works' ' + test_set_editor : && (echo e; echo a) | git add -p && git diff > diff && test_cmp expected diff @@ -110,12 +105,10 @@ test_expect_success 'setup patch' ' ' test_expect_success 'setup fake editor' ' - echo "#!$SHELL_PATH" >fake_editor.sh && - cat >>fake_editor.sh <<-\EOF && + write_script "fake_editor.sh" <<-\EOF && mv -f "$1" oldpatch && mv -f patch "$1" EOF - chmod a+x fake_editor.sh && test_set_editor "$(pwd)/fake_editor.sh" ' @@ -302,18 +295,12 @@ test_expect_success 'deleting an empty file' ' test_expect_success 'split hunk setup' ' git reset --hard && - for i in 10 20 30 40 50 60 - do - echo $i - done >test && + test_write_lines 10 20 30 40 50 60 >test && git add test && test_tick && git commit -m test && - for i in 10 15 20 21 22 23 24 30 40 50 60 - do - echo $i - done >test + test_write_lines 10 15 20 21 22 23 24 30 40 50 60 >test ' test_expect_success 'split hunk "add -p (edit)"' ' @@ -334,17 +321,7 @@ test_expect_success 'split hunk "add -p (edit)"' ' ' test_expect_failure 'split hunk "add -p (no, yes, edit)"' ' - cat >test <<-\EOF && - 5 - 10 - 20 - 21 - 30 - 31 - 40 - 50 - 60 - EOF + test_write_lines 5 10 20 21 30 31 40 50 60 >test && git reset && # test sequence is s(plit), n(o), y(es), e(dit) # q n q q is there to make sure we exit at the end. -- 2.16.1
[PATCH/RFC 0/1] Avoid expensive 'repack -ad' in gc --auto
The series [1] I just sent helps reduce pack-objects memory footprint a bit. But even then it's still a huge memory hog. So this patch adds a special treatment for gc --auto: avoid it completely. The trick here is not new (pinning the largest pack with a .keep file). It's just never done automatically. I think this is a good thing to do, provided that gc --auto estimates memory usage more or less correctly. And "git gc --auto" should run even on weak machines because it's part of regular repo maintenance. You can't tell people "You can't work on the linux-2.6.git repository because your machine has too little memory". The only thing left I think I should do is to use an external rev-list to free up some more memory. But let's see how the first patch goes (documentation and tests are missing, I know). [1] https://public-inbox.org/git/%3c20180301091052.32267-1-pclo...@gmail.com%3E/ Nguyễn Thái Ngọc Duy (1): gc --auto: exclude the largest giant pack in low-memory config builtin/gc.c | 125 +++-- builtin/pack-objects.c | 2 +- config.mak.uname | 1 + git-compat-util.h | 4 ++ pack-objects.h | 2 + 5 files changed, 128 insertions(+), 6 deletions(-) -- 2.16.1.435.g8f24da2e1a
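The decision the cover letter describes — keep (don't repack) the largest pack when a configured threshold is exceeded, or when estimated repack memory would blow past a fraction of system RAM — can be sketched as a toy model. Everything here is an assumption for illustration: the function name is invented, and the real estimate in the patch is far more involved than summing pack sizes:

```python
def should_keep_largest_pack(pack_sizes, mem_total, big_pack_threshold=0):
    """Toy model of the proposed gc --auto heuristic (not the actual
    C code): pin the largest pack with a .keep file if it exceeds a
    gc.bigPackThreshold-like setting, or, absent that, if a crude
    memory estimate exceeds one third of system RAM. Sizes in bytes."""
    if not pack_sizes:
        return False
    largest = max(pack_sizes)
    if big_pack_threshold:
        return largest >= big_pack_threshold
    # Crude stand-in for the patch's pack-objects memory estimate.
    estimated_usage = sum(pack_sizes)
    return estimated_usage > mem_total // 3

# A 4 GB pack on an 8 GB desktop machine: the estimate exceeds 1/3 of
# RAM, so the largest pack would be excluded from the repack.
print(should_keep_largest_pack([4_000_000_000, 50_000_000],
                               8_000_000_000))
```

The 1/3-of-RAM fraction mirrors the desktop assumption stated in the patch; an explicit threshold bypasses the estimate entirely, which is the finer control gc.bigPackThreshold is meant to give.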
[PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
pack-objects could be a big memory hog, especially on large repos; everybody knows that. The suggestion to stick a .keep file on the largest pack to avoid this problem has also been known for a long time. Let's apply the suggestion automatically instead of waiting for people to come to the Git mailing list and get the advice. When a certain condition is met, gc --auto creates a .keep file temporarily before repack is run, then removes it afterward.

gc --auto does this based on an estimation of pack-objects memory usage and whether that fits in one third of system memory (the assumption here is a desktop environment where many other applications are running). Since the estimation may be inaccurate and that 1/3 threshold is arbitrary, give the user finer control over this mechanism as well: if the largest pack is larger than gc.bigPackThreshold, it's kept.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/gc.c           | 125 +++--
 builtin/pack-objects.c |   2 +-
 config.mak.uname       |   1 +
 git-compat-util.h      |   4 ++
 pack-objects.h         |   2 +
 5 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c index 77fa720bd0..2d9965bcdf 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,10 @@ #include "argv-array.h" #include "commit.h" #include "packfile.h" +#include "pack.h" +#include "pack-objects.h" +#include "blob.h" +#include "tree.h" #define FAILED_RUN "failed to run %s" @@ -39,6 +43,8 @@ static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; static const char *prune_expire = "2.weeks.ago"; static const char *prune_worktrees_expire = "3.months.ago"; +static unsigned long big_pack_threshold = 0; +static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE; static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT; static struct argv_array reflog = ARGV_ARRAY_INIT; @@ -49,6 +55,7 @@ static struct argv_array rerere = ARGV_ARRAY_INIT; static struct tempfile *pidfile; static struct lock_file log_lock; +static struct strbuf
temp_keep_file = STRBUF_INIT; static struct string_list pack_garbage = STRING_LIST_INIT_DUP; @@ -93,6 +100,18 @@ static void process_log_file(void) } } +static void delete_temp_keep_file(void) +{ + unlink(temp_keep_file.buf); +} + +static void delete_temp_keep_file_on_signal(int signo) +{ + delete_temp_keep_file(); + sigchain_pop(signo); + raise(signo); +} + static void process_log_file_at_exit(void) { fflush(stderr); @@ -126,6 +145,9 @@ static void gc_config(void) git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire); git_config_get_expiry("gc.logexpiry", &gc_log_expire); + git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold); + git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size); + git_config(git_default_config, NULL); } @@ -164,7 +186,7 @@ static int too_many_loose_objects(void) return needed; } -static int too_many_packs(void) +static int too_many_packs(struct packed_git **largest_pack) { struct packed_git *p; int cnt; @@ -173,22 +195,104 @@ static int too_many_packs(void) return 0; prepare_packed_git(); + *largest_pack = NULL; for (cnt = 0, p = packed_git; p; p = p->next) { if (!p->pack_local) continue; if (p->pack_keep) continue; + if (!*largest_pack || (*largest_pack)->pack_size < p->pack_size) + *largest_pack = p; /* * Perhaps check the size of the pack and count only * very small ones here?
*/ cnt++; } + return gc_auto_pack_limit < cnt; } -static void add_repack_all_option(void) +static inline unsigned long total_ram(void) +{ + unsigned long default_ram = 4; +#ifdef HAVE_SYSINFO + struct sysinfo si; + + if (!sysinfo(&si)) + return si.totalram; +#elif defined(HAVE_BSD_SYSCTL) && defined(HW_MEMSIZE) + int64_t physical_memory; + int mib[2]; + int length; + + mib[0] = CTL_HW; + mib[1] = HW_MEMSIZE; + length = sizeof(int64_t); + if (!sysctl(mib, 2, &physical_memory, &length, NULL, 0)) + return physical_memory; +#elif defined(GIT_WINDOWS_NATIVE) + MEMORYSTATUSEX memInfo; + + memInfo.dwLength = sizeof(MEMORYSTATUSEX); + if (GlobalMemoryStatusEx(&memInfo)) + return memInfo.ullTotalPhys; +#else + fprintf(stderr, _("unrecognized platform, assuming %lu GB RAM\n"), + default_ram); +#endif + return default_ram * 1024 * 1024 * 1024; +} + +static int pack_objects_uses_too_much_memory(struct packed_git *pack) +{ + unsigned long nr_objects = approximate_object_count(); + size_t mem_want, mem_have; +
[PATCH 00/11] Reduce pack-objects memory footprint
The array of object_entry in pack-objects can take a lot of memory when pack-objects is run in "pack everything" mode. On linux-2.6.git, this array alone takes roughly 800MB. This series reorders some fields and reduces field sizes to keep this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on 64-bit Linux and saves 260MB on linux-2.6.git.

Now the bad side:

- the number of pack files pack-objects can handle is reduced to 4096 (previously unlimited)
- max delta chain is also limited to 4096 (previously practically unlimited)
- some patches are quite invasive (e.g. replacing a pointer with a uint32_t) and reduce readability a bit
- it may be trickier to add more data in object_entry in the future

Nguyễn Thái Ngọc Duy (11):
  pack-objects: document holes in struct object_entry.h
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: use bitfield for object_entry::depth
  pack-objects: note about in_pack_header_size
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: move in_pack out of struct object_entry
  pack-objects: faster reverse packed_git lookup
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: reorder 'hash' to pack struct object_entry
  pack-objects: increase pack file limit to 4096

 builtin/pack-objects.c | 189 ++---
 cache.h                |   3 +
 object.h               |   1 -
 pack-bitmap-write.c    |   8 +-
 pack-bitmap.c          |   2 +-
 pack-bitmap.h          |   4 +-
 pack-objects.h         |  70 ++-
 7 files changed, 180 insertions(+), 97 deletions(-)

-- 
2.16.1.435.g8f24da2e1a
[PATCH 11/11] pack-objects: increase pack file limit to 4096
When OE_IN_PACK_BITS was added, we didn't have many bits left to spare so the max number of packs that pack-objects could handle was limited to 256. Now we have more spare bits, let's increase it to 4096 to be on the safe side. If you have more than this many packs, you may need to reconsider if you're still sane. Increasing this also increases memory a bit because in_pack[] array in packing_data is bigger, roughly 32kb, which is insignificant in pack-objects context. Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/pack-objects.h b/pack-objects.h index 52087b32e5..ec4eba4ee4 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -3,7 +3,7 @@ #define OE_DFS_STATE_BITS 2 #define OE_DEPTH_BITS 12 -#define OE_IN_PACK_BITS 8 +#define OE_IN_PACK_BITS 12 #define IN_PACK_POS(to_pack, obj) \ (to_pack)->in_pack_pos[(struct object_entry *)(obj) - (to_pack)->objects] @@ -24,6 +24,11 @@ enum dfs_state { DFS_NUM_STATES }; +/* + * The size of struct nearly determines pack-objects's memory + * consumption. This struct is packed tight for that reason. When you + * add or reorder something in this struct, think a bit about this. + */ struct object_entry { struct pack_idx_entry idx; unsigned long size; /* uncompressed size */ @@ -51,7 +56,7 @@ struct object_entry { unsigned filled:1; /* assigned write-order */ unsigned dfs_state:OE_DFS_STATE_BITS; - /* XXX 12 bits hole, try to pack */ + /* XXX 8 bits hole, try to pack */ unsigned depth:OE_DEPTH_BITS; -- 2.16.1.435.g8f24da2e1a
[PATCH 10/11] pack-objects: reorder 'hash' to pack struct object_entry
Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/pack-objects.h b/pack-objects.h index f339f0411a..52087b32e5 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -33,12 +33,10 @@ struct object_entry { uint32_t delta_sibling_idx; /* other deltified objects who * uses the same base as me */ - /* XXX 4 bytes hole, try to pack */ - + uint32_t hash; /* name hint hash */ void *delta_data; /* cached delta (uncompressed) */ unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ - uint32_t hash; /* name hint hash */ unsigned char in_pack_header_size; /* note: spare bits available! */ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */ unsigned type:TYPE_BITS; @@ -57,7 +55,7 @@ struct object_entry { unsigned depth:OE_DEPTH_BITS; - /* size: 104, padding: 4 */ + /* size: 96 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 09/11] pack-objects: refer to delta objects by index instead of pointer
Notice that packing_data::nr_objects is a uint32_t: we can handle at most 4G objects, and all of them can be addressed with a uint32_t. If we use a pointer here instead, we waste 4 bytes per field on a 64-bit architecture. Convert these delta pointers to indexes. Since we need to represent NULL pointers as well, the index is shifted by one [1].

There are holes in this struct, but this patch is already big; struct packing can be done separately. Even with holes, we save 8 bytes per object_entry.

[1] This means we can only index 2^32-2 objects even though nr_objects could contain 2^32-1 objects. It should not be a problem in practice because when we grow objects[], nr_alloc would probably blow up long before nr_objects hits the wall.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 124 +++--
 pack-objects.h         |  14 +++--
 2 files changed, 78 insertions(+), 60 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 1fdb85ebb5..45076f2523 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -29,6 +29,20 @@ #include "list.h" #include "packfile.h" +#define DELTA(obj) \ + ((obj)->delta_idx ? &to_pack.objects[(obj)->delta_idx - 1] : NULL) +#define DELTA_CHILD(obj) \ + ((obj)->delta_child_idx ? &to_pack.objects[(obj)->delta_child_idx - 1] : NULL) +#define DELTA_SIBLING(obj) \ + ((obj)->delta_sibling_idx ? &to_pack.objects[(obj)->delta_sibling_idx - 1] : NULL) + +#define CLEAR_DELTA(obj) (obj)->delta_idx = 0 +#define CLEAR_DELTA_CHILD(obj) (obj)->delta_child_idx = 0 +#define CLEAR_DELTA_SIBLING(obj) (obj)->delta_sibling_idx = 0 + +#define SET_DELTA(obj, val) (obj)->delta_idx = ((val) - to_pack.objects) + 1 +#define SET_DELTA_CHILD(obj, val) (obj)->delta_child_idx = ((val) - to_pack.objects) + 1 + static const char *pack_usage[] = { N_("git pack-objects --stdout [...] [< | < ]"), N_("git pack-objects [...] 
[< | < ]"), @@ -125,11 +139,11 @@ static void *get_delta(struct object_entry *entry) buf = read_sha1_file(entry->idx.oid.hash, , ); if (!buf) die("unable to read %s", oid_to_hex(>idx.oid)); - base_buf = read_sha1_file(entry->delta->idx.oid.hash, , + base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, , _size); if (!base_buf) die("unable to read %s", - oid_to_hex(>delta->idx.oid)); + oid_to_hex((entry)->idx.oid)); delta_buf = diff_delta(base_buf, base_size, buf, size, _size, 0); if (!delta_buf || delta_size != entry->delta_size) @@ -286,12 +300,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent size = entry->delta_size; buf = entry->delta_data; entry->delta_data = NULL; - type = (allow_ofs_delta && entry->delta->idx.offset) ? + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; } else { buf = get_delta(entry); size = entry->delta_size; - type = (allow_ofs_delta && entry->delta->idx.offset) ? + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; } @@ -315,7 +329,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent * encoding of the relative offset for the delta * base from this object's position in the pack. 
*/ - off_t ofs = entry->idx.offset - entry->delta->idx.offset; + off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset; unsigned pos = sizeof(dheader) - 1; dheader[pos] = ofs & 127; while (ofs >>= 7) @@ -341,7 +355,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent return 0; } hashwrite(f, header, hdrlen); - hashwrite(f, entry->delta->idx.oid.hash, 20); + hashwrite(f, DELTA(entry)->idx.oid.hash, 20); hdrlen += 20; } else { if (limit && hdrlen + datalen + 20 >= limit) { @@ -377,8 +391,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, dheader[MAX_PACK_OBJECT_HEADER]; unsigned hdrlen; - if (entry->delta) - type = (allow_ofs_delta && entry->delta->idx.offset) ? + if (DELTA(entry)) + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; hdrlen = encode_in_pack_object_header(header, sizeof(header), type, entry->size); @@ -406,7 +420,7 @@ static off_t
[PATCH 07/11] pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on a 64-bit arch) to store a pointer to a pack, use an index instead, since the number of packs should be relatively small. This limits the number of packs we can handle to 256 (still unreasonably many for a healthy repo). If you have more than 256 packs, you'll need an older version of Git to repack first.

This technically saves 7 bytes. But we don't see any of that in practice due to padding. The saving becomes real when we pack this struct tighter later.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 48 --
 pack-objects.h         | 12 +--
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 7bb5544883..d0d371714a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -367,7 +367,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, unsigned long limit, int usable_delta) { - struct packed_git *p = entry->in_pack; + struct packed_git *p = IN_PACK(&to_pack, entry); struct pack_window *w_curs = NULL; struct revindex_entry *revidx; off_t offset; @@ -478,7 +478,7 @@ static off_t write_object(struct hashfile *f, if (!reuse_object) to_reuse = 0; /* explicit */ - else if (!entry->in_pack) + else if (!IN_PACK(&to_pack, entry)) to_reuse = 0; /* can't reuse what we don't have */ else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA) /* check_object() decided it for us ...
*/ @@ -1074,7 +1074,15 @@ static void create_object_entry(const struct object_id *oid, else nr_result++; if (found_pack) { - entry->in_pack = found_pack; + int i; + + for (i = 0; i < (1 << OE_IN_PACK_BITS); i++) + if (to_pack.in_pack[i] == found_pack) { + entry->in_pack_idx = i; + break; + } + if (i == (1 << OE_IN_PACK_BITS)) + die("BUG: pack not found!"); entry->in_pack_offset = found_offset; } @@ -1399,8 +1407,8 @@ static void cleanup_preferred_base(void) static void check_object(struct object_entry *entry) { - if (entry->in_pack) { - struct packed_git *p = entry->in_pack; + if (IN_PACK(_pack, entry)) { + struct packed_git *p = IN_PACK(_pack, entry); struct pack_window *w_curs = NULL; const unsigned char *base_ref = NULL; struct object_entry *base_entry; @@ -1535,14 +1543,16 @@ static int pack_offset_sort(const void *_a, const void *_b) { const struct object_entry *a = *(struct object_entry **)_a; const struct object_entry *b = *(struct object_entry **)_b; + const struct packed_git *a_in_pack = IN_PACK(_pack, a); + const struct packed_git *b_in_pack = IN_PACK(_pack, b); /* avoid filesystem trashing with loose objects */ - if (!a->in_pack && !b->in_pack) + if (!a_in_pack && !b_in_pack) return oidcmp(>idx.oid, >idx.oid); - if (a->in_pack < b->in_pack) + if (a_in_pack < b_in_pack) return -1; - if (a->in_pack > b->in_pack) + if (a_in_pack > b_in_pack) return 1; return a->in_pack_offset < b->in_pack_offset ? -1 : (a->in_pack_offset > b->in_pack_offset); @@ -1578,7 +1588,7 @@ static void drop_reused_delta(struct object_entry *entry) oi.sizep = >size; oi.typep = - if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) { + if (packed_object_info(IN_PACK(_pack, entry), entry->in_pack_offset, ) < 0) { /* * We failed to get the info from this pack for some reason; * fall back to sha1_object_info, which may find another copy. 
@@ -1848,8 +1858,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, * it, we will still save the transfer cost, as we already know * the other side has it and we won't send src_entry at all. */ - if (reuse_delta && trg_entry->in_pack && - trg_entry->in_pack == src_entry->in_pack && + if (reuse_delta && IN_PACK(_pack, trg_entry) && + IN_PACK(_pack, trg_entry) == IN_PACK(_pack, src_entry) && !src_entry->preferred_base && trg_entry->in_pack_type != OBJ_REF_DELTA && trg_entry->in_pack_type != OBJ_OFS_DELTA) @@ -2958,6 +2968,21 @@ static int option_parse_unpack_unreachable(const struct option *opt, return 0; } +static void init_in_pack_mapping(struct
[PATCH 04/11] pack-objects: use bitfield for object_entry::depth
This does not give us any saving due to padding. But we will be able to save once we cut 4 bytes out of this struct in a subsequent patch. Because of struct packing from now on we can only handle max depth 4095 (or even lower when new booleans are added in this struct). This should be ok since long delta chain will cause significant slow down anyway. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 4 pack-objects.h | 6 ++ 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a4dbb40824..cfd97da7db 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (pack_to_stdout != !base_name || argc) usage_with_options(pack_usage, pack_objects_options); + if (depth > (1 << OE_DEPTH_BITS)) + die(_("delta chain depth %d is greater than maximum limit %d"), + depth, (1 << OE_DEPTH_BITS)); + argv_array_push(, "pack-objects"); if (thin) { use_internal_rev_list = 1; diff --git a/pack-objects.h b/pack-objects.h index fca334ab4d..3941e6c9a6 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -2,6 +2,7 @@ #define PACK_OBJECTS_H #define OE_DFS_STATE_BITS 2 +#define OE_DEPTH_BITS 12 /* * State flags for depth-first search used for analyzing delta cycles. @@ -43,10 +44,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ unsigned dfs_state:OE_DFS_STATE_BITS; - - /* XXX 20 bits hole, try to pack */ - - int depth; + unsigned depth:OE_DEPTH_BITS; /* size: 120, padding: 4 */ }; -- 2.16.1.435.g8f24da2e1a
[PATCH 06/11] pack-objects: move in_pack_pos out of struct object_entry
This field is only needed for pack-bitmap, which is an optional feature. Move it to a separate array that is only allocated when pack-bitmap is used (it is never freed, in the same way that objects[] is not). This saves us 8 bytes in struct object_entry.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 3 ++-
 pack-bitmap-write.c    | 8 +---
 pack-bitmap.c          | 2 +-
 pack-bitmap.h          | 4 +++-
 pack-objects.h         | 8 ++--
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index cfd97da7db..7bb5544883 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -878,7 +878,8 @@ static void write_pack_file(void) if (write_bitmap_index) { bitmap_writer_set_checksum(oid.hash); - bitmap_writer_build_type_index(written_list, nr_written); + bitmap_writer_build_type_index( + &to_pack, written_list, nr_written); } finish_tmp_packfile(&tmpname, pack_tmp_name, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index e01f992884..1360a93311 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show) /** * Build the initial type index for the packfile */ -void bitmap_writer_build_type_index(struct pack_idx_entry **index, +void bitmap_writer_build_type_index(struct packing_data *to_pack, + struct pack_idx_entry **index, uint32_t index_nr) { uint32_t i; @@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index, writer.trees = ewah_new(); writer.blobs = ewah_new(); writer.tags = ewah_new(); + ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects); for (i = 0; i < index_nr; ++i) { struct object_entry *entry = (struct object_entry *)index[i]; enum object_type real_type; - entry->in_pack_pos = i; + IN_PACK_POS(to_pack, entry) = i; switch (entry->type) { case OBJ_COMMIT: @@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1) "(object %s is missing)", sha1_to_hex(sha1)); } - return entry->in_pack_pos; + return
IN_PACK_POS(writer.to_pack, entry); } static void show_object(struct object *object, const char *name, void *data) diff --git a/pack-bitmap.c b/pack-bitmap.c index 9270983e5f..f21479fe16 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, oe = packlist_find(mapping, sha1, NULL); if (oe) - reposition[i] = oe->in_pack_pos + 1; + reposition[i] = IN_PACK_POS(mapping, oe) + 1; } rebuild = bitmap_new(); diff --git a/pack-bitmap.h b/pack-bitmap.h index 3742a00e14..5ded2f139a 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); -void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr); +void bitmap_writer_build_type_index(struct packing_data *to_pack, + struct pack_idx_entry **index, + uint32_t index_nr); void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); diff --git a/pack-objects.h b/pack-objects.h index 017cc3425f..3bef28196c 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -4,6 +4,9 @@ #define OE_DFS_STATE_BITS 2 #define OE_DEPTH_BITS 12 +#define IN_PACK_POS(to_pack, obj) \ + (to_pack)->in_pack_pos[(struct object_entry *)(obj) - (to_pack)->objects] + /* * State flags for depth-first search used for analyzing delta cycles. * @@ -31,7 +34,6 @@ struct object_entry { unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ uint32_t hash; /* name hint hash */ - unsigned int in_pack_pos; unsigned char in_pack_header_size; /* note: spare bits available! 
*/ unsigned type:TYPE_BITS; unsigned in_pack_type:TYPE_BITS; /* could be delta */ @@ -46,7 +48,7 @@ struct object_entry { unsigned dfs_state:OE_DFS_STATE_BITS; unsigned depth:OE_DEPTH_BITS; - /* size: 120, padding: 4 */ + /* size: 112 */ }; struct packing_data { @@ -55,6 +57,8 @@ struct
[PATCH 02/11] pack-objects: turn type and in_pack_type to bitfields
This saves 8 bytes in sizeof(struct object_entry). On a large repository like linux-2.6.git (6.5M objects), this saves us 52MB memory. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 14 -- cache.h| 2 ++ object.h | 1 - pack-objects.h | 8 4 files changed, 18 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5c674b2843..fd217cb51f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry) unsigned long avail; off_t ofs; unsigned char *buf, c; + enum object_type type; buf = use_pack(p, _curs, entry->in_pack_offset, ); @@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry) * since non-delta representations could still be reused. */ used = unpack_object_header_buffer(buf, avail, - >in_pack_type, + , >size); if (used == 0) goto give_up; + if (type < 0) + die("BUG: invalid type %d", type); + entry->in_pack_type = type; + /* * Determine if this is a delta and if so whether we can * reuse it or not. 
Otherwise let's find out as cheaply as @@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry) { struct object_entry **p = >delta->delta_child; struct object_info oi = OBJECT_INFO_INIT; + enum object_type type; while (*p) { if (*p == entry) @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry) entry->depth = 0; oi.sizep = >size; - oi.typep = >type; + oi.typep = if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) { /* * We failed to get the info from this pack for some reason; @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry) */ entry->type = sha1_object_info(entry->idx.oid.hash, >size); + } else { + if (type < 0) + die("BUG: invalid type %d", type); + entry->type = type; } } diff --git a/cache.h b/cache.h index 21fbcc2414..862bdff83a 100644 --- a/cache.h +++ b/cache.h @@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate); #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(_index, (path), (sz)) #endif +#define TYPE_BITS 3 + enum object_type { OBJ_BAD = -1, OBJ_NONE = 0, diff --git a/object.h b/object.h index 87563d9056..8ce294d6ec 100644 --- a/object.h +++ b/object.h @@ -25,7 +25,6 @@ struct object_array { #define OBJECT_ARRAY_INIT { 0, 0, NULL } -#define TYPE_BITS 3 /* * object flag allocation: * revision.h: 0-1026 diff --git a/pack-objects.h b/pack-objects.h index 720a8e8756..f8b06e2521 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -14,11 +14,11 @@ struct object_entry { void *delta_data; /* cached delta (uncompressed) */ unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ - enum object_type type; - enum object_type in_pack_type; /* could be delta */ uint32_t hash; /* name hint hash */ unsigned int in_pack_pos; unsigned char in_pack_header_size; + unsigned type:TYPE_BITS; + unsigned in_pack_type:TYPE_BITS; /* could be delta */ unsigned 
preferred_base:1; /* * we do not pack this, but is available * to be used as the base object to delta @@ -28,7 +28,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ - /* XXX 28 bits hole, try to pack */ + /* XXX 22 bits hole, try to pack */ /* * State flags for depth-first search used for analyzing delta cycles. * @@ -41,7 +41,7 @@ struct object_entry { DFS_DONE } dfs_state; int depth; - /* size: 136, padding: 4 */ + /* size: 128, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 01/11] pack-objects: document holes in struct object_entry.h
Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pack-objects.h b/pack-objects.h index 03f1191659..720a8e8756 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -28,6 +28,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ + /* XXX 28 bits hole, try to pack */ /* * State flags for depth-first search used for analyzing delta cycles. * @@ -40,6 +41,7 @@ struct object_entry { DFS_DONE } dfs_state; int depth; + /* size: 136, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 05/11] pack-objects: note about in_pack_header_size
Object header in a pack is packed really tight (see pack-format.txt). Even with 8 bytes length, we need 9-10 bytes most, plus a hash (20 bytes). Which means this field only needs to store a number as big as 32 (5 bits). This is trickier to pack tight though since a new hash algorithm is coming, the number of bits needed may quickly increase. So leave it for now. Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-objects.h b/pack-objects.h index 3941e6c9a6..017cc3425f 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -32,7 +32,7 @@ struct object_entry { unsigned long z_delta_size; /* delta data size (compressed) */ uint32_t hash; /* name hint hash */ unsigned int in_pack_pos; - unsigned char in_pack_header_size; + unsigned char in_pack_header_size; /* note: spare bits available! */ unsigned type:TYPE_BITS; unsigned in_pack_type:TYPE_BITS; /* could be delta */ unsigned preferred_base:1; /* -- 2.16.1.435.g8f24da2e1a
[PATCH 08/11] pack-objects: faster reverse packed_git lookup
We do a linear search for in_pack index in create_object_entry(). This function is called for every available object in the worst case (and on linux-2.6.git, that's about 6.5M). Try to avoid that by saving the index in packed_git. Since we should not have zillions of packs, this extra space should not be a big deal. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 11 ++- cache.h| 1 + 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d0d371714a..1fdb85ebb5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1074,15 +1074,7 @@ static void create_object_entry(const struct object_id *oid, else nr_result++; if (found_pack) { - int i; - - for (i = 0; i < (1 << OE_IN_PACK_BITS); i++) - if (to_pack.in_pack[i] == found_pack) { - entry->in_pack_idx = i; - break; - } - if (i == (1 << OE_IN_PACK_BITS)) - die("BUG: pack not found!"); + entry->in_pack_idx = found_pack->index; entry->in_pack_offset = found_offset; } @@ -2980,6 +2972,7 @@ static void init_in_pack_mapping(struct packing_data *to_pack) if (i >= (1 << OE_IN_PACK_BITS)) die("BUG: too many packs to handle!"); to_pack->in_pack[i] = p; + p->index = i; } } diff --git a/cache.h b/cache.h index 862bdff83a..b90feb3802 100644 --- a/cache.h +++ b/cache.h @@ -1635,6 +1635,7 @@ extern struct packed_git { int index_version; time_t mtime; int pack_fd; + int index; /* for builtin/pack-objects.c */ unsigned pack_local:1, pack_keep:1, freshened:1, -- 2.16.1.435.g8f24da2e1a
[PATCH 03/11] pack-objects: use bitfield for object_entry::dfs_state
Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 3 +++ pack-objects.h | 33 - 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index fd217cb51f..a4dbb40824 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) OPT_END(), }; + if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS)) + die("BUG: too many dfs states, increase OE_DFS_STATE_BITS"); + check_replace_refs = 0; reset_pack_idx_option(_idx_opts); diff --git a/pack-objects.h b/pack-objects.h index f8b06e2521..fca334ab4d 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -1,6 +1,21 @@ #ifndef PACK_OBJECTS_H #define PACK_OBJECTS_H +#define OE_DFS_STATE_BITS 2 + +/* + * State flags for depth-first search used for analyzing delta cycles. + * + * The depth is measured in delta-links to the base (so if A is a delta + * against B, then A has a depth of 1, and B a depth of 0). + */ +enum dfs_state { + DFS_NONE = 0, + DFS_ACTIVE, + DFS_DONE, + DFS_NUM_STATES +}; + struct object_entry { struct pack_idx_entry idx; unsigned long size; /* uncompressed size */ @@ -27,21 +42,13 @@ struct object_entry { unsigned no_try_delta:1; unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ + unsigned dfs_state:OE_DFS_STATE_BITS; + + /* XXX 20 bits hole, try to pack */ - /* XXX 22 bits hole, try to pack */ - /* -* State flags for depth-first search used for analyzing delta cycles. -* -* The depth is measured in delta-links to the base (so if A is a delta -* against B, then A has a depth of 1, and B a depth of 0). -*/ - enum { - DFS_NONE = 0, - DFS_ACTIVE, - DFS_DONE - } dfs_state; int depth; - /* size: 128, padding: 4 */ + + /* size: 120, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
Re: Reduce pack-objects memory footprint?
On Wed, Feb 28, 2018 at 06:22:33PM +0000, Eric Wong wrote:
> Duy Nguyen wrote:
> > which saves 12 bytes (or another 74 MB). 222 MB total is plenty of
> > space to keep some file cache from being evicted.
>
> Nice! I can definitely benefit from lower memory usage when
> packing. Fwiw, I use pahole with other projects to help find
> packing opportunities:
>
> git://git.kernel.org/pub/scm/devel/pahole/pahole.git

Yes, it's a wonderful tool.

> > @@ -14,11 +26,10 @@ struct object_entry {
> >  	void *delta_data;	/* cached delta (uncompressed) */
> >  	unsigned long delta_size;	/* delta data size (uncompressed) */
> >  	unsigned long z_delta_size;	/* delta data size (compressed) */
> > -	enum object_type type;
> > -	enum object_type in_pack_type;	/* could be delta */
> >  	uint32_t hash;	/* name hint hash */
> > -	unsigned int in_pack_pos;
> >  	unsigned char in_pack_header_size;
> > +	unsigned type:3; /* enum object_type */
> > +	unsigned in_pack_type:3; /* enum object_type - could be delta */
>
> For C99 compilers, enums can be bitfields. I introduced the
> following macro into Ruby a few weeks ago to remain compatible
> with non-C99 compilers:
>
> /*
>  * For declaring bitfields out of non-unsigned int types:
>  *    struct date {
>  *       BITFIELD(enum months) month:4;
>  *       ...
>  *    };
>  */
> #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
> # define BITFIELD(type) type
> #else
> # define BITFIELD(type) unsigned int
> #endif

I tried this and got

    In file included from builtin/pack-objects.c:20:0:
    ./pack-objects.h:49:19: error: 'type' is narrower than values of its type [-Werror]
      enum object_type type:TYPE_BITS;
                       ^~~~

The compiler is not wrong. What it does not realize is that pack-objects code never uses the out-of-range values (OBJ_BAD and OBJ_ANY), but I don't see how I could suppress this warning. So I went back to non-enum bitfields.
--
Duy
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On 1 March 2018 at 08:36, Jeff King wrote:
> On Wed, Feb 28, 2018 at 05:51:14PM +0100, demerphq wrote:
>
>> I would look into putting it into a module and then using the PERL5OPT
>> environment var to have it loaded automagically in any of your perl
>> scripts.
>>
>> For instance if you put that code into a module called Git/DieTrap.pm
>>
>> then you could do:
>>
>> PERL5OPT=-MGit::DieTrap
>>
>> in your test setup code, assuming you have some. Then you don't need to
>> change any of your scripts, just the test runner framework.
>
> That's a clever trick.
>
> It's not clear to me though if we just want to tweak the programs run in
> the test scripts in order to get test_must_fail to stop complaining, or
> if we consider the unusual exit codes from our perl-based Git programs
> to be an error that should be fixed for real use, too.

Yeah, that is a decision you guys need to make; I am not familiar
enough with the issues to make any useful comment.

But I wanted to say that I will bring this subject up on perl5porters.
The exit code triggered by a die is a regular cause of trouble for more
than just you guys, and maybe we can get it changed for the future.
Nevertheless, even if there is consensus that it can be changed, it will
take years before the change is widely distributed enough to be useful
to you. :-(

I'll be bold and say sorry on behalf of the perl committers for this.
Perl is so old that sometimes things that used to make sense don't make
sense anymore.

cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"