Re: Bug report: "Use of uninitialized value $_ in print"
Hi,

Sam Kuper wrote:

> First, background. I encountered a bug on Debian Stretch, using this
> git version:
>
>     $ git --version
>     git version 2.11.0
>
> The bug is that in the midst of running
>
>     git -c interactive.diffFilter="git diff --word-diff --color" add --patch
>
> and after having answered "y" to some patches and "n" to others, git
> suddenly spewed the following output:
>
>     Use of uninitialized value $_ in print at
>     /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 74.
>     Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
>     Use of uninitialized value $_ in print at
>     /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 75.
> [...]

Strange.  The relevant line, for reference:

    $ dpkg-deb --fsys-tarfile git_2.11.0-3+deb9u2_amd64.deb |
        tar Oxf - ./usr/lib/git-core/git-add--interactive |
        sed -n '1370,1372 p'
            for (@{$hunk[$ix]{DISPLAY}}) {
                    print;   <- this one
            }

This is a foreach loop, so it's supposed to have set $_ to each value
in the list @{$hunk[$ix]{DISPLAY}}.  So why is Perl considering it
uninitialized?

Is this reproducible for you?  Do you have more details about how I
can reproduce it?  What arch are you on?  What perl version do you
use?

Can you report this using "reportbug git"?

Thanks,
Jonathan
Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)
Hi Igor,

Igor Djordjevic writes:
> Hi Sergey,
>
> On 01/03/2018 06:39, Sergey Organov wrote:
[...]
>>
>> Yeah, I now see it myself. I'm sorry for being lazy and not inspecting
>> this more carefully in the first place.
>
> No problem, that`s why we`re discussing it, and I`m glad we`re
> aligned now, so we can move forward :)
>
>>> So while your original proposal currently seems like it could be
>>> working nicely for non-interactive rebase (and might be some simpler
>>> interactive ones), now hitting/acknowledging its first real use
>>> limit, my additional quick attempt[1] just tries to aid pretty
>>> interesting case of complicated interactive rebase, too, where we
>>> might be able to do better as well, still using your original proposal
>>> as a base idea :)
>>
>> Yes, thank you for pushing me back to reality! :-) The work and thoughts
>> you are putting into solving the puzzle are greatly appreciated!
>
> You`re welcome, and I am enjoying it :)
>
>> Thinking about it overnight, I now suspect that original proposal had a
>> mistake in the final merge step. I think that what you did is a way to
>> fix it, and I want to try to figure what exactly was wrong in the
>> original proposal and to find simpler way of doing it right.
>>
>> The likely solution is to use original UM as a merge-base for final
>> 3-way merge of U1' and U2', but I'm not sure yet. Sounds pretty natural
>> though, as that's exactly UM from which both U1' and U2' have diverged
>> due to rebasing and other history editing.
>
> Yes, this might be it...! ;)
>
> To prove myself it works, I`ve assembled a pretty crazy `-s ours`
> merge interactive rebase scenario, and it seems this passes the test,
> ticking all the check boxes (I could think of) :P

I must admit it's quite a relief to hear this.  What we now have is so
simple and obvious that it'd be a huge disappointment if it didn't
work!
> Here, merge commit M is done with `-s ours` (obsoleting branch "B"),
> plus amended to make it an "evil merge", where a commit B2 from
> obsoleted branch "B" is cherry picked to "master".

[...]

> There, I hope I didn`t miss any expectation. And, it _seems_ to work
> exactly as expected :D

That's very nice, to the level of being even suspect! :-)

To avoid falling into euphoria though, we need to keep in mind that
"expectations" is a rather vague concept, and we likely still need to
stop for user amendment unless we are absolutely sure nothing
surprising happens.  I.e., we'd better require the U1'==U2' test to
succeed to proceed non-stop automatically.  Besides, that would be
somewhat in line with what 'rerere' does.

> In real life, except for usual possibility for conflicts during
> commit rebasing, we might experience _three_ possible conflict
> situations once "rebased" merge itself is to be created - two when
> rebasing each of temporary merge helper commits, and one on the
> "rebased" merge itself. This is something where we might think about
> user experience, not introducing (too much) confusion...

Yeah, this is a terribly important issue to take care of!  The
relative simplicity of the concept itself raises the chances of
finding a suitable solution, I hope.

-- Sergey
Re: [RFC] Contributing to Git (on Windows)
Hi,

Derrick Stolee wrote:

> Now, we'd like to make that document publicly available. These steps are
> focused on a Windows user, so we propose putting them in the
> git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open
> for feedback [1]. I'll read comments on that PR or in this thread.

Thanks!  I wonder if we can put this in the standard Documentation/
directory as well.  E.g. the Windows CONTRIBUTING.md could say "See
Documentation/Contributing-On-Windows.md" so that the bulk of the text
would not have to be git-for-windows specific.

[...]
> +++ b/CONTRIBUTING.md
> @@ -0,0 +1,401 @@
> +How to Contribute to Git for Windows
> +

Would it make sense for this to be checked in with LF instead of CRLF
line endings, so that clones use the user's chosen / platform native
line ending?  The .gitattributes file could include

	/CONTRIBUTING.md text

so that line ending conversion takes place even if the user hasn't
enabled core.autocrlf.

[...]
> +The SDK uses a different credential manager, so you may still want to use
> +Visual Studio or normal Git Bash for interacting with your remotes.
> +Alternatively, use SSH rather than HTTPS and avoid credential manager
> +problems.

Where can I read more about the problems in question?

[...]
> +Many new contributors ask: What should I start working on?
> +
> +One way to win big with the maintainer of an open-source project is to
> +look at the [issues page](https://github.com/git-for-windows/git/issues)
> +and see if there are any issues that you can fix quickly, or if anything
> +catches your eye.

You can also look at https://crbug.com/git for non Windows-specific
issues.  And you can look at recent user questions on the mailing
list: https://public-inbox.org/git

[...]
> +If you are adding new functionality, you may need to create low-level
> +tests by creating helper commands that test a very limited action.
> +These commands are stored in `t/helpers`.
> +When adding a helper, be sure to add a line to `t/Makefile` and to the
> +`.gitignore` for the binary file you add. The Git community prefers
> +functional tests using the full `git` executable, so be sure the test
> +helper is required.

Maybe s/low-level/unit/?

[...]
> +Read [`t/README`](https://github.com/git/git/blob/master/t/README) for
> +more details.

Forgive my ignorance: does github flavored markdown allow relative
links?  (I.e., could this say [`t/README`](t/README)?)

[...]
> +You can also set certain environment variables to help test the
> +performance on different repositories or with more repetitions. The full
> +list is available in
> +[the `t/perf/README` file](https://github.com/git/git/blob/master/t/perf/README),

Likewise.

[...]
> +Test Your Changes on Linux
> +--------------------------
> +
> +It can be important to work directly on the
> +[core Git codebase](https://github.com/git/git), such as a recent commit
> +into the `master` or `next` branch that has not been incorporated into
> +Git for Windows. Also, it can help to run functional and performance
> +tests on your code in Linux before submitting patches to the
> +Linux-focused mailing list.

I'm surprised at this advice.  Does it actually come up?  When I was
on Mac I never worried about this, nor when I was on Windows.  I'm
personally happy to review patches that haven't been tested on Linux,
though it's of course even nicer if the patch author mentions what
platforms they've tested on.

Maybe this can be reframed to refer specifically to cases where you've
modified some Linux platform-specific code (e.g. to add a new feature
to run-command.c)?

[...]
> +When preparing your patch, it is important to put yourself in the shoes
> +of the maintainer.

... and in the shoes of other users and developers working with Git
down the line who will interact with the patch later!  Actually, the
maintainer is one of the least important audiences for a commit
message.  But maybe 'the maintainer' is a stand-in for 'the project'
here?
[...]
> +* Make sure the commits are signed off using `git commit (-s|--signoff)`.
> +  This means that you testify to be allowed to contribute your changes,
> +  in particular that if you did this on company time, said company
> +  approved to releasing your work as Open Source under the GNU Public
> +  License v2.

Can this either point to or quote the relevant legal text from
Documentation/SubmittingPatches?  It feels dangerous (in the sense of,
potentially leading to misunderstandings and major future pain) to ask
people to sign off without them knowing exactly what that means.

The rest also looks nice.  Thanks for working on this.

Sincerely,
Jonathan
Re: [PATCH 2/2] parse-options: remove the unused parse_opt_commits() function
Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty
> if <id> is invalid', 2018-02-23) removed the last use of the
> callback function parse_opt_commits(). Remove this function
> declaration and definition, since it is now dead code.
>
> Signed-off-by: Ramsay Jones
> ---
>  parse-options-cb.c | 16 ----------------
>  parse-options.h    |  1 -
>  2 files changed, 17 deletions(-)

Reviewed-by: Jonathan Nieder

Thanks.
Re: [PATCH 1/2] ref-filter: mark a file-local symbol as static
Hi,

Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty
> if <id> is invalid', 2018-02-23) added the add_str_to_commit_list()
> function, which causes sparse to issue a "... not declared. Should it
> be static?" warning for that symbol.

Thanks for catching it!

> In order to suppress the warning, mark that function as static.

Isn't this closer to

	Indeed, the function is only used in this one compilation
	unit.  Mark it static.

?  In other words, sparse's warning is accurate, and this is not about
trying to quiet a false positive but about addressing a true positive.

> Signed-off-by: Ramsay Jones
> ---
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,
Jonathan
[PATCH 2/2] parse-options: remove the unused parse_opt_commits() function
Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
<id> is invalid', 2018-02-23) removed the last use of the callback
function parse_opt_commits(). Remove this function declaration and
definition, since it is now dead code.

Signed-off-by: Ramsay Jones
---
 parse-options-cb.c | 16 ----------------
 parse-options.h    |  1 -
 2 files changed, 17 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index c6679cb2c..c7320a73f 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -78,22 +78,6 @@ int parse_opt_verbosity_cb(const struct option *opt, const char *arg,
 	return 0;
 }
 
-int parse_opt_commits(const struct option *opt, const char *arg, int unset)
-{
-	struct object_id oid;
-	struct commit *commit;
-
-	if (!arg)
-		return -1;
-	if (get_oid(arg, &oid))
-		return error("malformed object name %s", arg);
-	commit = lookup_commit_reference(&oid);
-	if (!commit)
-		return error("no such commit %s", arg);
-	commit_list_insert(commit, opt->value);
-	return 0;
-}
-
 int parse_opt_object_name(const struct option *opt, const char *arg, int unset)
 {
 	struct object_id oid;

diff --git a/parse-options.h b/parse-options.h
index 4b4734f2e..2b8378ac1 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -224,7 +224,6 @@ extern int parse_opt_expiry_date_cb(const struct option *, const char *, int);
 extern int parse_opt_color_flag_cb(const struct option *, const char *, int);
 extern int parse_opt_verbosity_cb(const struct option *, const char *, int);
 extern int parse_opt_object_name(const struct option *, const char *, int);
-extern int parse_opt_commits(const struct option *, const char *, int);
 extern int parse_opt_tertiary(const struct option *, const char *, int);
 extern int parse_opt_string_list(const struct option *, const char *, int);
 extern int parse_opt_noop_cb(const struct option *, const char *, int);
-- 
2.16.0
[PATCH 1/2] ref-filter: mark a file-local symbol as static
Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
<id> is invalid', 2018-02-23) added the add_str_to_commit_list()
function, which causes sparse to issue a "... not declared. Should it
be static?" warning for that symbol.

In order to suppress the warning, mark that function as static.

Signed-off-by: Ramsay Jones
---
 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index f375e7670..69bf7b587 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2000,7 +2000,7 @@ static void do_merge_filter(struct ref_filter_cbdata *ref_cbdata)
 	free(to_clear);
 }
 
-int add_str_to_commit_list(struct string_list_item *item, void *commit_list)
+static int add_str_to_commit_list(struct string_list_item *item, void *commit_list)
 {
 	struct object_id oid;
 	struct commit *commit;
-- 
2.16.0
[PATCH 0/2] sparse warning on next branch
Tonight's build had a sparse warning and an additional
./static-check.pl symbol appear on the 'next' branch.  As you know, I
like to catch these on the 'pu' branch before they graduate to 'next',
but it seems that I missed these! :( (The 'pu' branch is quite noisy
with regard to sparse and static-check.pl at the moment.)

These patches were developed on top of 'next'; however, I also tested
them on top of commit fcfba37337 directly.  (Note, this is branch
'ps/contains-id-error-message' merged at commit 9623d6817b.)

Ramsay Jones (2):
  ref-filter: mark a file-local symbol as static
  parse-options: remove the unused parse_opt_commits() function

 parse-options-cb.c | 16 ----------------
 parse-options.h    |  1 -
 ref-filter.c       |  2 +-
 3 files changed, 1 insertion(+), 18 deletions(-)

-- 
2.16.0
Bug report: "Use of uninitialized value $_ in print"
First, background. I encountered a bug on Debian Stretch, using this
git version:

    $ git --version
    git version 2.11.0

The bug is that in the midst of running

    git -c interactive.diffFilter="git diff --word-diff --color" add --patch

and after having answered "y" to some patches and "n" to others, git
suddenly spewed the following output:

    Use of uninitialized value $_ in print at
    /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 74.
    Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
    Use of uninitialized value $_ in print at
    /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 75.

    [... the same warning repeated many more times ...]

I hope that this bug report can help the git core maintainers to
either fix the problem upstream, or to co-ordinate a fix with the
Debian git maintainer(s) if the bug does not exist upstream.

Thanks for the great DVCS :)

P.S.
I am not subscribed to this mailing list, so please CC me in your reply if you want me to see it.
Re: [RFC] Rebasing merges: a jorney to the ultimate solution (Road Clear)
Hi Sergey,

On 01/03/2018 06:39, Sergey Organov wrote:
>>> (3) ---X1---o---o---o---o---o---X2
>>>        |\                       |\
>>>        | A1---A2---A3---U1      | A1'--A2'--A3'--U1'
>>>        |              \         |
>>>        |               M        |
>>>        |              /         |
>>>        \-B1---B2---B3---U2      \-B1'--B2'--B3'--U2'
>>>
>>
>> Meh, I hope I`m rushing it now, but for example, if we had decided to
>> drop commit A2 during an interactive rebase (so losing A2' from
>> diagram above), wouldn`t U2' still introduce those changes back, once
>> U1' and U2' are merged, being incorrect/unwanted behavior...? :/
>>
>> [...]
>
> Yeah, I now see it myself. I'm sorry for being lazy and not inspecting
> this more carefully in the first place.

No problem, that`s why we`re discussing it, and I`m glad we`re aligned
now, so we can move forward :)

>> So while your original proposal currently seems like it could be
>> working nicely for non-interactive rebase (and might be some simpler
>> interactive ones), now hitting/acknowledging its first real use
>> limit, my additional quick attempt[1] just tries to aid pretty
>> interesting case of complicated interactive rebase, too, where we
>> might be able to do better as well, still using your original proposal
>> as a base idea :)
>
> Yes, thank you for pushing me back to reality! :-) The work and thoughts
> you are putting into solving the puzzle are greatly appreciated!

You`re welcome, and I am enjoying it :)

> Thinking about it overnight, I now suspect that original proposal had a
> mistake in the final merge step. I think that what you did is a way to
> fix it, and I want to try to figure what exactly was wrong in the
> original proposal and to find simpler way of doing it right.
>
> The likely solution is to use original UM as a merge-base for final
> 3-way merge of U1' and U2', but I'm not sure yet. Sounds pretty natural
> though, as that's exactly UM from which both U1' and U2' have diverged
> due to rebasing and other history editing.

Yes, this might be it...!
;)

To prove myself it works, I`ve assembled a pretty crazy `-s ours`
merge interactive rebase scenario, and it seems this passes the test,
ticking all the check boxes (I could think of) :P

Let`s see our starting situation:

(0) ---X8--B2'--X9 (master)
       |\
       | A1---A2---A3 (A)
       |            \
       |             M (topic)
       |            /
       \-B1---B2---B3 (B)

Here, merge commit M is done with `-s ours` (obsoleting branch "B"),
plus amended to make it an "evil merge", where a commit B2 from
obsoleted branch "B" is cherry picked to "master".

Now, we want to rebase "topic" (M) onto updated "master" (X9), but to
make things more interesting, we`ll do it interactively, with some
amendments, drops, additions and even more cherry-picks!

This is what the final result looks like:

(1) ---X8--B2'--X9 (master)
                |\
                | A12--A2'---B3' (A)
                |             \
                |              M' (topic)
                |             /
                \-B1'--B3'---B4 (B)

During interactive rebase, on branch "A", we amended A1 into A12,
dropped A3 and cherry-picked B3.  On branch "B", B4 is added, B2'
being omitted automatically as already present in "master".

So... In comparison to original merge commit M, rebased merge commit
M' is expected to:

 - Add X9, from updated "master"
 - Have A1 changed to A12, due to A12 commit amendment
 - Keep A2, rebased as A2'
 - Remove A3, due to dropped A3 commit
 - Keep amendment from original (evil) merge commit M
 - Miss B1' like M does B1, due to original `-s ours` merge strategy
 - Add B2, cherry-picked as B2' into "master"
 - Add B3, cherry-picked as B3' into "A"
 - Add B4, added to "B"
 - Most important, provide safety mechanism to "fail loud", being
   aware of non-trivial things going on, allowing to stop for user
   inspection/decision

There, I hope I didn`t miss any expectation. And, it _seems_ to work
exactly as expected :D

Not to leave this to imagination only, and hopefully helping others to
get up to speed and possibly discuss this, pointing to still possible
flaws, I`m adding a demo script[1], showing how this exact example
works.
Note that the script _is_ coined to avoid rebase conflicts, as they`re
not currently important for the point to be made here.

In real life, except for usual possibility for conflicts during
commit rebasing, we might experience _three_ possible conflict
situations once "rebased" merge itself is to be created - two when
rebasing each of temporary merge helper commits, and one on the
"rebased" merge itself.  This is something where we might think about
user experience, not introducing (too much) confusion...

Regards, Buga

[1] Demonstration script:

-- >8 --
#!/bin/sh

# rm -rf ./.git
# rm -f ./test.txt

git init
touch ./test.txt
git add -- test.txt

#
Re: [PATCH 00/11] Reduce pack-objects memory footprint
On Thu, Mar 1, 2018 at 8:33 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:
>
>> The array of object_entry in pack-objects can take a lot of memory
>> when pack-objects is run in "pack everything" mode. On linux-2.6.git,
>> this array alone takes roughly 800MB.
>>
>> This series reorders some fields and reduces field size... to keep
>> this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on
>> 64-bit linux and saves 260MB on linux-2.6.git.
>
> I'm very interested in this patch series. I don't have time to test this
> one right now (have to run), but with your previous RFC patch memory use
> (in the ~4GB range) on a big in-house repo went down by a bit over 3%,
> and it's ~5% faster.
>
> Before/after RSS 4440812 / 429 & runtime 172.73 / 162.45. This is
> after having already done a full git gc before, data via /usr/bin/time
> -v.

Jeff correctly pointed out elsewhere in this thread that RSS covers
both heap (this is what I try to reduce) and some file cache (we mmap
the whole pack file just to ease the reading), so RSS might not be a
good indicator of memory reduction.  Any newly freed memory should be
used for cache, which raises RSS back up.  I think the RssAnon field
in /proc/<pid>/status shows it better.

> So not huge, but respectable.
>
> We have a big repo, and this gets repacked on 6-8GB of memory on dev
> KVMs, so we're under a fair bit of memory pressure. git-gc slows things
> down a lot.
>
> It would be really nice to have something that made it use drastically
> less memory at the cost of less efficient packs. Is the property that

Ahh.. less efficient.  You may be more interested in [1] then.  It
avoids rewriting the base pack.  Without the base pack, book keeping
becomes much much cheaper.  We still read every single byte in all
packs though (I think, unless you use pack-bitmap) and this amount of
I/O affects the rest of the system too.
Perhaps reducing core.packedgitwindowsize might make it friendlier to
the OS, I don't know.

> you need to spend give or take the size of .git/objects in memory
> something inherent, or just a limitation of the current implementation?
> I.e. could we do a first pass to pick some objects based on some
> heuristic, then repack them N at a time, and finally delete the
> now-obsolete packs?
>
> Another thing I've dealt with is that on these machines their
> NFS-mounted storage gets exhausted (I'm told) due to some pathological
> operations git does during repack, I/O tends to get 5-6x slower. Of
> course ionice doesn't help because the local kernel doesn't know
> anything about how harmful it is.

[1] https://public-inbox.org/git/20180301092046.2769-1-pclo...@gmail.com/T/#u
-- 
Duy
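The figures quoted above can be sanity-checked with a quick
back-of-envelope sketch; note the object count below is inferred from
the series' own numbers (136 -> 96 bytes per entry, ~260MB saved on
linux-2.6.git), not something stated in the thread:

```python
# Shrinking struct object_entry from 136 to 96 bytes saves 40 bytes
# per object, a ~29% reduction, matching the cover letter's claim.
entry_before = 136
entry_after = 96
saved_per_entry = entry_before - entry_after
reduction_pct = round((1 - entry_after / entry_before) * 100)

# For that to add up to ~260MB, linux-2.6.git must hold roughly this
# many objects (an inferred figure, not quoted in the thread).
objects = 260 * 1024 ** 2 // saved_per_entry

print(saved_per_entry, reduction_pct, objects)  # 40 29 6815744
```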
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 1, 2018 at 9:49 PM, Jeff King wrote:
> On Thu, Mar 01, 2018 at 04:10:48PM +0700, Nguyễn Thái Ngọc Duy wrote:
>
>> Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
>> pack, use an index instead, since the number of packs should be
>> relatively small.
>>
>> This limits the number of packs we can handle to 256 (still
>> unreasonably high for a repo to work well). If you have more than 256
>> packs, you'll need an older version of Git to repack first.
>
> I overall like the direction of this series, but I think this one is
> just too much. While you definitely shouldn't have a ton of packs, this
> leaves the user with no real escape hatch. And 256 isn't actually that
> many packs.

It was raised back to 4096 at the end (I didn't know how many spare
bits we had at this point).  Agreed on the escape hatch though.  I
think we could do better: if there are more than X packs, we repack X
packs into one and leave the rest alone.  The _next_ pack-objects will
pick another X packs to combine.  Repeat until you only have one pack
left.
-- 
Duy
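For scale, the two limits discussed above follow directly from the
width of the index that replaces the 8-byte pack pointer; a tiny
sketch (the helper name is made up for illustration):

```python
# Replacing a pack pointer with a small integer index into a table of
# packs caps how many packs one repository may have: 2^bits entries.
def max_packs(index_bits):
    return 2 ** index_bits

print(max_packs(8))   # 256  -- the limit in the patch as posted
print(max_packs(12))  # 4096 -- the limit after spare bits were found
```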
Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
On Fri, Mar 2, 2018 at 1:14 AM, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy writes:
>
>> pack-objects could be a big memory hog especially on large repos,
>> everybody knows that. The suggestion to stick a .keep file on the
>> largest pack to avoid this problem is also known for a long time.
>
> Yup, but note that it is not "largest" per-se. The thing being large
> is a mere consequence that it is the base pack that holds the bulk
> of older parts of the history (e.g. the one that you obtained via
> the initial clone).

Thanks, "base pack" sounds much better.  I was having trouble with
wording because I didn't nail this one down.

>> Let's do the suggestion automatically instead of waiting for people to
>> come to the Git mailing list and get the advice. When a certain
>> condition is met, gc --auto creates a .keep file temporarily before
>> repack is run, then removes it afterward.
>>
>> gc --auto does this based on an estimation of pack-objects memory
>> usage and whether that fits in one third of system memory (the
>> assumption here is for desktop environment where there are many other
>> applications running).
>>
>> Since the estimation may be inaccurate and that 1/3 threshold is
>> arbitrary, give the user a finer control over this mechanism as well:
>> if the largest pack is larger than gc.bigPackThreshold, it's kept.
>
> If this is a transient mechanism during a single gc session, it
> would be far more preferable if we can find a way to do this
> without actually having a .keep file on the filesystem.

That was my first attempt, manipulating packed_git::pack_keep inside
pack-objects.  Then my whole git.git was gone.  I was scared off, so I
did this instead.  I've learned my lesson though (never test dangerous
operations on your worktree!) and will do that pack_keep again _if_
this gc --auto still sounds like a good direction to go.
-- 
Duy
Re: Worktree silently deleted on git fetch/gc/log
On Fri, Mar 2, 2018 at 3:14 AM, Eric Sunshine wrote:
> As far as I know, there isn't any code in Git which would
> automatically remove the .git/worktrees/B directory, so it's not clear
> how that happened.

"git worktree prune" does, which is called by "git gc", and that was
actually executed from the command output in the original mail.

Дилян Палаузов, did you move /git/B away manually?  For example, you
originally created "B" with

    git worktree add /somewhere/B g

then you do this at some point

    mv /somewhere/B /git/B
-- 
Duy
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Brandon Williams writes:

> On 03/01, Junio C Hamano wrote:
> ...
>> Hmph, strictly speaking, the "delim" does not have to be a part of
>> how packetized stream is defined. As long as we stop abusing flush
>> as "This is merely an end of one segment of what I say." and make it
>> always mean "I am done speaking, it is your turn.", the application
>> payload can define its own syntax to separate groups of packets.
>
> That's actually a good point. We could just as easily have the delim
> packet be an empty packet-line "0004" or something like that.

Yes.  As long as there is an easy and obvious "cannot be a value"
constant, you can use it as a delimiter defined at the application
level.  For example, your command-request uses delim, like so:

    +request = empty-request | command-request
    +empty-request = flush-pkt
    +command-request = command
    +                  capability-list
    +                  (command-args)
    +                  flush-pkt
    +command = PKT-LINE("command=" key LF)
    +command-args = delim-pkt
    +               *PKT-Line(arg LF)

to mark the end of the cap list, but if an empty packet does not make
sense as a member of a cap list and a command args list, then an empty
packet between the cap list and the command args can be used instead.
A protocol-ignorant proxy can still work just fine.

Having a defined delim at the protocol level is often convenient, of
course, but once the application starts calling for multi-level
delimiters (i.e. maybe there are chapters and sections inside each
chapter in a single request message), it would not be sufficient to
define a single delim packet type.  The application layer needs to
define its own convention (e.g. if no "empty" section is allowed, then
"two consecutive delims is a chapter break; one delim is a section
break" can become a viable way to emulate multi-level delimiters).
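To make the framing under discussion concrete, here is a rough sketch
(not git's implementation): a pkt-line carries its total length as
four hex digits counting the header itself, so an empty payload
encodes as "0004", while "0000" (flush-pkt) and "0001" (delim-pkt) are
special values with no payload at all.

```python
def pkt_line(payload: bytes) -> bytes:
    # Four hex digits of total length (the 4-byte header included),
    # followed by the payload bytes.
    return b"%04x" % (len(payload) + 4) + payload

FLUSH_PKT = b"0000"  # "I am done speaking, it is your turn."
DELIM_PKT = b"0001"  # protocol-v2 section delimiter
EMPTY_PKT = b"0004"  # empty payload: the alternative delimiter floated above

print(pkt_line(b"command=ls-refs\n"))  # b'0014command=ls-refs\n'
assert pkt_line(b"") == EMPTY_PKT
```

This is exactly why an empty pkt-line can stand in for a delimiter
whenever "" is not a valid value at the application level: it is
unambiguous both to the endpoints and to a protocol-ignorant proxy.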
Re: [PATCH v4 12/35] serve: introduce git-serve
Brandon Williams writes:

>  Documentation/technical/protocol-v2.txt | 171

Unlike other things in Documentation/technical/, this is not listed on
the TECH_DOCS list in Documentation/Makefile.  Shouldn't it be?
Re: [PATCH v4 03/35] pkt-line: add delim packet support
On 03/01, Junio C Hamano wrote:
> Junio C Hamano writes:
>
>> Brandon Williams writes:
>>
>>> One of the design goals of protocol-v2 is to improve the semantics of
>>> flush packets. Currently in protocol-v1, flush packets are used both to
>>> indicate a break in a list of packet lines as well as an indication that
>>> one side has finished speaking. This makes it particularly difficult
>>> to implement proxies as a proxy would need to completely understand git
>>> protocol instead of simply looking for a flush packet.
>>
>> Good ;-) Yes, this has been one of the largest gripes about the
>> smart-http support code we have.
>
> Hmph, strictly speaking, the "delim" does not have to be a part of
> how packetized stream is defined. As long as we stop abusing flush
> as "This is merely an end of one segment of what I say." and make it
> always mean "I am done speaking, it is your turn.", the application
> payload can define its own syntax to separate groups of packets.

That's actually a good point.  We could just as easily have the delim
packet be an empty packet-line "0004" or something like that.

> I do not mind having this "delim" thing defined at the protocol
> level too much, though.

-- 
Brandon Williams
What's cooking in git.git (Mar 2018, #01; Thu, 1)
Here are the topics that have been cooking.  Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'.  The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches of the repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ab/untracked-cache-invalidation-docs (2018-02-09) 2 commits
  (merged to 'next' on 2018-02-14 at 11d2d07c4a)
 + update-index doc: note the caveat with "could not open..."
 + update-index doc: note a fixed bug in the untracked cache
 (this branch uses nd/fix-untracked-cache-invalidation.)

 Doc update to warn against remaining bugs in untracked cache.

* as/ll-i18n (2018-02-13) 1 commit
  (merged to 'next' on 2018-02-14 at b30154a04c)
 + Mark messages for translations

 Some messages in low level start-up codepath have been i18n-ized.

* bc/doc-interpret-trailers-grammofix (2018-02-13) 1 commit
  (merged to 'next' on 2018-02-14 at 940e6dc7a5)
 + docs/interpret-trailers: fix agreement error

 Docfix.

* bp/fsmonitor (2018-02-14) 1 commit
  (merged to 'next' on 2018-02-14 at 5c508858fb)
 + fsmonitor: update documentation to remove reference to invalid
   config settings

 Doc update for a recently added feature.

* bp/name-hash-dirname-fix (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at 2f564fb4b3)
 + name-hash: properly fold directory names in adjust_dirname_case()

 "git add" files in the same directory, but spelling the directory
 path in different cases on case insensitive filesystem, corrupted
 the name hash data structure and led to unexpected results.  This
 has been corrected.
* es/worktree-add-post-checkout-hook (2018-02-15) 1 commit
  (merged to 'next' on 2018-02-21 at 6ef6a130bf)
 + worktree: add: fix 'post-checkout' not knowing new worktree location

 "git worktree add" learned to run the post-checkout hook, just like
 "git clone" runs it upon the initial checkout.

* gs/test-unset-xdg-cache-home (2018-02-16) 1 commit
  (merged to 'next' on 2018-02-21 at 9aec46d404)
 + test-lib.sh: unset XDG_CACHE_HOME

 Test update.

* jc/blame-missing-path (2018-02-07) 1 commit
  (merged to 'next' on 2018-02-14 at 883d266e1e)
 + blame: tighten command line parser

 "git blame HEAD COPYING" in a bare repository failed to run, while
 "git blame HEAD -- COPYING" run just fine.  This has been corrected.

* jk/doc-do-not-write-extern (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at e55b5127de)
 + CodingGuidelines: mention "static" and "extern"

 Devdoc update.

* jk/gettext-poison (2018-02-08) 2 commits
  (merged to 'next' on 2018-02-14 at cca3719a59)
 + git-sh-i18n: check GETTEXT_POISON before USE_GETTEXT_SCHEME
 + t0205: drop redundant test

 Test updates.

* jk/push-options-via-transport-fix (2018-02-20) 2 commits
  (merged to 'next' on 2018-02-21 at a037cbfa2b)
 + remote-curl: unquote incoming push-options
 + t5545: factor out http repository setup

 "git push" over http transport did not unquote the push-options
 correctly.

* jk/sq-dequote-on-bogus-input (2018-02-14) 1 commit
  (merged to 'next' on 2018-02-14 at 75d4f1eaf8)
 + sq_dequote: fix extra consumption of source string

 Code to unquote single-quoted string (used in the parser for
 configuration files, etc.) did not diagnose bogus input correctly
 and produced bogus results instead.

* jk/t0002-simplify (2018-02-12) 1 commit
  (merged to 'next' on 2018-02-14 at a7a24f5f29)
 + t0002: simplify error checking

 Code cleanup.
* jk/test-hashmap-updates (2018-02-14) 6 commits
  (merged to 'next' on 2018-02-14 at a61a9bd8f0)
 + test-hashmap: use "unsigned int" for hash storage
 + test-hashmap: simplify alloc_test_entry
 + test-hashmap: use strbuf_getline rather than fgets
 + test-hashmap: use xsnprintf rather than snprintf
 + test-hashmap: check allocation computation for overflow
 + test-hashmap: use ALLOC_ARRAY rather than bare malloc

 Code clean-up.

* js/fix-merge-arg-quoting-in-rebase-p (2018-02-08) 1 commit
  (merged to 'next' on 2018-02-14 at 27ebf001a1)
 + rebase -p: fix incorrect commit message when calling `git merge`.

 "git rebase -p" mangled log messages of a merge commit, which is
 now fixed.

* js/packet-read-line-check-null (2018-02-08) 2 commits
  (merged to 'next' on 2018-02-14 at 6ba237b284)
 + always check for NULL return from packet_read_line()
 + correct error messages for NULL packet_read_line()

 Some low level protocol codepath could crash when they get an
 unexpected flush packet, which is now fixed.

* jt/binsearch-with-fanout (2018-02-15) 2 commits
  (merged to 'next' on 2018-02-15 at 7648891022)
 + packfile: refactor hash search with fanout table
 + packfile: remove GIT_DEBUG_LOOKUP log statements
 (this branch is used by ds/commit-graph.)

 Refactor the code to binary search starting from a
Re: Obsolete instruction in SubmittingPatches?
Andrei Rybak writes:

> On 01.03.2018 0:54, Junio C Hamano wrote:
>> Andrei Rybak writes:
>>
>>> Is this part of guidelines obsolete, or am I not understanding this
>>> correctly?
>>
>> I am merely being nice (but only on "time-permitting" basis).
>
> Does that mean that the integration of a series is easier, when there is
> a re-send?

When I am not interested in a topic, or there are other reviewers who are more qualified and more interested in the topic than I am, the protocol to require the final "Here is a verbatim final resend, the only difference from what was reviewed is the reviewed-by's from the reviewers." will let me completely ignore the topic while it is being discussed (as long as I trust the judgment of these reviewers).  Otherwise I'd need to keep track of the progress of the discussion on each and every topic, which is often impossible.
Re: [PATCH v4 06/35] transport: use get_refs_via_connect to get refs
Brandon Williams writes:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
>
> Signed-off-by: Brandon Williams
> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

Nice ;-)
Re: [PATCH v4 05/35] upload-pack: factor out processing lines
Brandon Williams writes:

> Factor out the logic for processing shallow, deepen, deepen_since, and
> deepen_not lines into their own functions to simplify the
> 'receive_needs()' function in addition to making it easier to reuse some
> of this logic when implementing protocol_v2.

These little functions that still require their incoming data to begin with fixed prefixes feels a bit strange way to refactor the logic for later reuse (when I imagine "reuse", the first use case that comes to my mind is "this data source our new code reads from gives the same data as the old 'shallow' packet used to give, but in a different syntax"---so I'd restructure the code in such a way that the caller figures out the syntax part and the called helper just groks the "information contents" unwrapped from the surface syntax; the syntax may be different in the new codepath but once unwrapped, the "information contents" to be processed would not be different hence we can reuse the helper).

IOW, I would have expected the caller to be not like this:

> -		if (skip_prefix(line, "shallow ", &arg)) {
> -			struct object_id oid;
> -			struct object *object;
> -			if (get_oid_hex(arg, &oid))
> -				die("invalid shallow line: %s", line);
> -			object = parse_object(&oid);
> -			if (!object)
> -				continue;
> -			if (object->type != OBJ_COMMIT)
> -				die("invalid shallow object %s",
> -				    oid_to_hex(&oid));
> -			if (!(object->flags & CLIENT_SHALLOW)) {
> -				object->flags |= CLIENT_SHALLOW;
> -				add_object_array(object, NULL, &shallows);
> -			}
> +		if (process_shallow(line, &shallows))
> 			continue;
> +		if (process_deepen(line, &depth))
> 			continue;

... but more like

	if (skip_prefix(line, "shallow ", &arg)) {
		process_shallow(arg, &shallows);
		continue;
	}
	if (skip_prefix(line, "deepen ", &arg)) {
		process_deepen(arg, &depth);
		continue;
	}
	...

I need to defer the final judgment until I see how they are used, though.  It's not too big a deal either way---it just felt "not quite right" to me.
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Junio C Hamano writes:

> Brandon Williams writes:
>
>> One of the design goals of protocol-v2 is to improve the semantics of
>> flush packets. Currently in protocol-v1, flush packets are used both to
>> indicate a break in a list of packet lines as well as an indication that
>> one side has finished speaking. This makes it particularly difficult
>> to implement proxies as a proxy would need to completely understand git
>> protocol instead of simply looking for a flush packet.
>
> Good ;-)  Yes, this has been one of the largest gripes about the
> smart-http support code we have.

Hmph, strictly speaking, the "delim" does not have to be a part of how packetized stream is defined.  As long as we stop abusing flush as "This is merely an end of one segment of what I say." and make it always mean "I am done speaking, it is your turn.", the application payload can define its own syntax to separate groups of packets.

I do not mind having this "delim" thing defined at the protocol level too much, though.
Re: [PATCH v4 00/35] protocol version 2
Brandon Williams writes:

> I've tried to keep building on the same base that I started with when
> sending out a new version of series, mostly because I thought it was
> easier to see what was different between rounds.

Yes.  It indeed is easier to see the evolution if the series does not get rebased needlessly.

> I can, in the future, try to remember to put the commit it's based on.
> Do we have any sort of guidance about the best practice here?

I recall we taught a new "--base" option to "format-patch" not too long ago, so one way to do so may be:

    $ git format-patch --cover-letter --base=v2.16.0-rc0 master..bw/protocol-v2
    $ tail -4 -cover*.txt
    base-commit: 1eaabe34fc6f486367a176207420378f587d3b48
    --
    2.16.2-345-g7e31236f65

perhaps?
Re: [PATCH v4 03/35] pkt-line: add delim packet support
Brandon Williams writes:

> One of the design goals of protocol-v2 is to improve the semantics of
> flush packets. Currently in protocol-v1, flush packets are used both to
> indicate a break in a list of packet lines as well as an indication that
> one side has finished speaking. This makes it particularly difficult
> to implement proxies as a proxy would need to completely understand git
> protocol instead of simply looking for a flush packet.

Good ;-)  Yes, this has been one of the largest gripes about the smart-http support code we have.
Re: [PATCH v4 02/35] pkt-line: allow peeking a packet line without consuming it
Brandon Williams writes:

> +enum packet_read_status packet_reader_read(struct packet_reader *reader)
> +{
> +	if (reader->line_peeked) {
> +		reader->line_peeked = 0;
> +		return reader->status;
> +	}
> +
> +	reader->status = packet_read_with_status(reader->fd,
> +						 &reader->src_buffer,
> +						 &reader->src_len,
> +						 reader->buffer,
> +						 reader->buffer_size,
> +						 &reader->pktlen,
> +						 reader->options);
> +
> +	switch (reader->status) {
> +	case PACKET_READ_EOF:
> +		reader->pktlen = -1;
> +		reader->line = NULL;
> +		break;
> +	case PACKET_READ_NORMAL:
> +		reader->line = reader->buffer;
> +		break;
> +	case PACKET_READ_FLUSH:
> +		reader->pktlen = 0;
> +		reader->line = NULL;
> +		break;
> +	}
> +
> +	return reader->status;
> +}

With the way the _peek() interface interacts with the reader instance (which by the way I find is well designed), it is understandable that we want almost everything available in reader's fields, but having to manually clear the pktlen field upon non-NORMAL status feels a bit strange.  Perhaps that is because the underlying packet_read_with_status() does not set *pktlen in these cases?  Shouldn't it be doing that so the caller does not have to?

A similar comment applies to the reader's line field.  In principle, as the status field is part of a reader, it does not have to exist as a separate field, i.e.

	#define line_of(reader) \
		((reader).status == PACKET_READ_NORMAL ? \
		 (reader).buffer : NULL)

can be used as a substitute for it.  I guess it depends on how the actual callers want to use this interface.
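The peek/read interaction discussed above can be modeled with a toy reader over an in-memory list of lines.  All names here are illustrative, not the actual pkt-line API; the point is only that a peeked line is cached and handed out by the next read, so input is never consumed twice:

```c
#include <assert.h>
#include <stddef.h>

/* Toy reader: read consumes a line unless one was peeked first. */
struct toy_reader {
	const char **input;	/* NULL-terminated list of lines */
	int pos;
	const char *line;	/* most recently delivered line */
	int line_peeked;
};

static const char *toy_read(struct toy_reader *r)
{
	if (r->line_peeked) {
		r->line_peeked = 0;	/* hand out the cached line */
		return r->line;
	}
	r->line = r->input[r->pos];
	if (r->line)
		r->pos++;
	return r->line;
}

static const char *toy_peek(struct toy_reader *r)
{
	toy_read(r);		/* do a real read ... */
	r->line_peeked = 1;	/* ... but mark it as not yet consumed */
	return r->line;
}
```

Calling peek twice in a row returns the same line, and the following read returns it one final time before the stream advances.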
Re: [PATCH v2 0/5] roll back locks in various code paths
On 1 March 2018 at 20:24, Junio C Hamano wrote:
> Jeff King writes:
>
>> IMHO the result is easier to follow. Except for one case:
>>
>>> [...]
>>
>> where I think the logic just ends up repeating itself.
>
> Yup, this one I also had a bit of trouble with.

Thanks for your feedback, both of you.  It's much appreciated.

After thinking about it, I tend to agree.  That rewrite loses an indentation level and makes it a bit clearer that we have two steps, "maybe bail" and "write".  But at the cost of duplicating logic -- after all, those two steps are very closely related, so there's no need to artificially separate them.

Here it is again, without that hunk, and without the commit message claim that it'd be a good thing to have just a few uses of "active_cache_changed" remaining.  This could go as a patch 6/5 onto ma/roll-back-lockfiles.  I keep debating myself whether this would do better earlier in such a six-pack, where it would be "also, this releases a lock where we used to forget to".  Right now, there is overlap between patch 3/5 (which adds a line) and this patch (which removes it).  This patch doesn't entirely negate the need for patch 3/5 though.

Martin

-- >8 --
Subject: write_locked_index(): add flag to avoid writing unchanged index

We have several callers like

	if (active_cache_changed && write_locked_index(...))
		handle_error();
	rollback_lock_file(...);

where the final rollback is needed because "!active_cache_changed" shortcuts the if-expression.  There are also a few variants of this, including some if-else constructs that make it more clear when the explicit rollback is really needed.

Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and simplify the callers.  Leave the most complicated of the callers (in builtin/update-index.c) unchanged.  Rewriting it to use this new flag would end up duplicating logic.

We could have made the new flag behave the other way round ("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach.  We can still migrate existing callers to use our new flag.  Later we might even be able to flip the default, possibly without entirely ignoring the risk to in-flight or out-of-tree topics.

Suggested-by: Jeff King
Signed-off-by: Martin Ågren
---
 cache.h           |  4
 builtin/add.c     |  7 +++
 builtin/commit.c  | 10 +++---
 builtin/merge.c   | 15 ++-
 builtin/mv.c      |  4 ++--
 builtin/rm.c      |  7 +++
 merge-recursive.c |  5 ++---
 read-cache.c      |  6 ++
 rerere.c          |  8 +++-
 sequencer.c       | 11 +--
 10 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index d8b975a571..905d8eb6cc 100644
--- a/cache.h
+++ b/cache.h
@@ -622,6 +622,7 @@ extern int read_index_unmerged(struct index_state *);
 
 /* For use with `write_locked_index()`. */
 #define COMMIT_LOCK		(1 << 0)
+#define SKIP_IF_UNCHANGED	(1 << 1)
 
 /*
  * Write the index while holding an already-taken lock. Close the lock,
@@ -638,6 +639,9 @@ extern int read_index_unmerged(struct index_state *);
  * With `COMMIT_LOCK`, the lock is always committed or rolled back.
  * Without it, the lock is closed, but neither committed nor rolled
  * back.
+ *
+ * If `SKIP_IF_UNCHANGED` is given and the index is unchanged, nothing
+ * is written (and the lock is rolled back if `COMMIT_LOCK` is given).
  */
 extern int write_locked_index(struct index_state *, struct lock_file *lock,
			       unsigned flags);

diff --git a/builtin/add.c b/builtin/add.c
index bf01d89e28..9e5a80cc6f 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -534,10 +534,9 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	unplug_bulk_checkin();
 
 finish:
-	if (active_cache_changed) {
-		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
-			die(_("Unable to write new index file"));
-	}
+	if (write_locked_index(&the_index, &lock_file,
+			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
+		die(_("Unable to write new index file"));
 
 	UNLEAK(pathspec);
 	UNLEAK(dir);

diff --git a/builtin/commit.c b/builtin/commit.c
index 8a87701414..8d9b4081c3 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -423,13 +423,9 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 	if (active_cache_changed ||
 	    !cache_tree_fully_valid(active_cache_tree))
 		update_main_cache_tree(WRITE_TREE_SILENT);
-	if (active_cache_changed) {
-		if (write_locked_index(&the_index, &index_lock,
-				       COMMIT_LOCK))
-			die(_("unable to write new_index file"));
-	} else {
-		rollback_lock_file(&index_lock);
-
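The intended semantics of the flag in the patch above can be condensed into a small model.  The enum and function below are illustrative only (not git's actual implementation), showing when a write happens versus a rollback:

```c
#include <assert.h>

/* Illustrative constants mirroring the proposed patch. */
#define COMMIT_LOCK		(1 << 0)
#define SKIP_IF_UNCHANGED	(1 << 1)

enum outcome { WROTE, ROLLED_BACK, CLOSED };

/*
 * With SKIP_IF_UNCHANGED and a clean index, nothing is written;
 * under COMMIT_LOCK the lock is rolled back, otherwise just closed.
 * In every other case the index is written.
 */
static enum outcome write_index_model(int index_changed, unsigned flags)
{
	if ((flags & SKIP_IF_UNCHANGED) && !index_changed)
		return (flags & COMMIT_LOCK) ? ROLLED_BACK : CLOSED;
	return WROTE;
}
```

This captures why the callers lose their explicit `rollback_lock_file()` call: the unchanged-index case is handled inside the one function.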
Re: [PATCH v4 9/9] add -p: don't rely on apply's '--recount' option
Phillip Wood writes:

> From: Phillip Wood
>
> Now that add -p counts patches properly it should be possible to turn
> off the '--recount' option when invoking 'git apply'

Sounds good ;-)
Re: [PATCH v4 8/9] add -p: fix counting when splitting and coalescing
Phillip Wood writes:

> @@ -887,8 +892,8 @@ sub merge_hunk {
> 	$o_cnt = $n_cnt = 0;
> 	for ($i = 1; $i < @{$prev->{TEXT}}; $i++) {
> 		my $line = $prev->{TEXT}[$i];
> -		if ($line =~ /^\+/) {
> -			$n_cnt++;
> +		if ($line =~ /^[+\\]/) {
> +			$n_cnt++ if ($line =~ /^\+/);
> 			push @line, $line;
> 			next;
> 		}

Hmm, the logic may be correct, but this looks like a result of attempting to minimize the number of changed lines and ending up with a less-than-readable code.  "If the line begins with a plus or backslash, do these things, the first of which is done only when the line begins with a plus."

The same comment for the other hunk that counts the $this side.
Re: Worktree silently deleted on git fetch/gc/log
On Thu, Mar 1, 2018 at 2:24 PM, Дилян Палаузов wrote:
> /git/A/.git/worktrees/b/ is missing - that is the point.
> /git/B/.git wasn't modified since the worktree was created, cat:
> gitdir: /git/A/.git/worktrees/b

I'll assume that the lowercase 'b' was a typo in your email (as it was in mine), and that the real content is "gitdir: /git/A/.git/worktrees/B".

You should be able to get back to a working state like this:

    % cd /git/A/
    % mkdir -p .git/worktrees/B
    % echo 'ref: refs/heads/g' >.git/worktrees/B/HEAD
    % echo ../.. >.git/worktrees/B/commondir
    % echo /git/B/.git >.git/worktrees/B/gitdir
    % cd /git/B/
    % git reset

As far as I know, there isn't any code in Git which would automatically remove the .git/worktrees/B directory, so it's not clear how that happened.
Re: [PATCH v4 7/9] add -p: calculate offset delta for edited patches
Phillip Wood writes:

> From: Phillip Wood
>
> Recount the number of preimage and postimage lines in a hunk after it
> has been edited so any change in the number of insertions or deletions
> can be used to adjust the offsets of subsequent hunks. If an edited
> hunk is subsequently split then the offset correction will be lost. It
> would be possible to fix this if it is a problem, however the code
> here is still an improvement on the status quo for the common case
> where an edited hunk is applied without being split.
>
> This is also a necessary step to removing '--recount' and
> '--allow-overlap' from the invocation of 'git apply'. Before
> '--recount' can be removed the splitting and coalescing counting needs
> to be fixed to handle a missing newline at the end of a file. In order
> to remove '--allow-overlap' there needs to be i) some way of verifying
> the offset data in the edited hunk (probably by correlating the
> preimage (or postimage if the patch is going to be applied in reverse)
> lines of the edited and unedited versions to see if they are offset or
> if any leading/trailing context lines have been removed) and ii) a way
> of dealing with edited hunks that change context lines that are shared
> with neighbouring hunks.
>
> Signed-off-by: Phillip Wood
> ---

Thanks for the clear description of what is going on in the series.

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 7a0a5896bb..0df0c2aa06 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -938,13 +938,19 @@ sub coalesce_overlapping_hunks {
> 			parse_hunk_header($text->[0]);
> 		unless ($_->{USE}) {
> 			$ofs_delta += $o_cnt - $n_cnt;
> +			# If this hunk has been edited then subtract
> +			# the delta that is due to the edit.
> +			$_->{OFS_DELTA} and $ofs_delta -= $_->{OFS_DELTA};

The pattern

	<condition> and <expression>;

is something you are newly introducing to this script.  I am not sure if we want to see them.
I somehow find them harder to read than the more straight-forward and naïve

	if (<condition>) {
		<expression>;
	}

> +			# If this hunk was edited then adjust the offset delta
> +			# to reflect the edit.
> +			$_->{OFS_DELTA} and $ofs_delta += $_->{OFS_DELTA};

Likewise.

> +sub recount_edited_hunk {
> +	local $_;
> +	my ($oldtext, $newtext) = @_;
> +	my ($o_cnt, $n_cnt) = (0, 0);
> +	for (@{$newtext}[1..$#{$newtext}]) {
> +		my $mode = substr($_, 0, 1);
> +		if ($mode eq '-') {
> +			$o_cnt++;
> +		} elsif ($mode eq '+') {
> +			$n_cnt++;
> +		} elsif ($mode eq ' ') {
> +			$o_cnt++;
> +			$n_cnt++;
> +		}
> +	}
> +	my ($o_ofs, undef, $n_ofs, undef) =
> +		parse_hunk_header($newtext->[0]);
> +	$newtext->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt);
> +	my (undef, $orig_o_cnt, undef, $orig_n_cnt) =
> +		parse_hunk_header($oldtext->[0]);
> +	# Return the change in the number of lines inserted by this hunk
> +	return $orig_o_cnt - $orig_n_cnt - $o_cnt + $n_cnt;
> +}

OK.

> @@ -1114,25 +1144,32 @@ sub prompt_yesno {
> 	}
>
> sub edit_hunk_loop {
> -	my ($head, $hunk, $ix) = @_;
> -	my $text = $hunk->[$ix]->{TEXT};
> +	my ($head, $hunks, $ix) = @_;
> +	my $hunk = $hunks->[$ix];
> +	my $text = $hunk->{TEXT};
> ...
> +		$newhunk->{OFS_DELTA} = recount_edited_hunk($text, $newtext);
> +		# If this hunk has already been edited then add the
> +		# offset delta of the previous edit to get the real
> +		# delta from the original unedited hunk.
> +		$hunk->{OFS_DELTA} and
> +			$newhunk->{OFS_DELTA} += $hunk->{OFS_DELTA};

Ahh, good point.
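The recount performed by the Perl code in the patch above (tallying preimage and postimage lines from the first character of each hunk body line) can be sketched in C.  This is an illustrative model, not git code; note that '\' lines such as "\ No newline at end of file" deliberately count for neither side, which is the very case the series fixes:

```c
#include <assert.h>

/* Count old/new side lines of a hunk body (the hunk header excluded). */
static void recount_hunk(const char **body, int n, int *o_cnt, int *n_cnt)
{
	int i;

	*o_cnt = *n_cnt = 0;
	for (i = 0; i < n; i++) {
		switch (body[i][0]) {
		case '-': (*o_cnt)++; break;		/* preimage only */
		case '+': (*n_cnt)++; break;		/* postimage only */
		case ' ': (*o_cnt)++; (*n_cnt)++; break;	/* context */
		/* '\' (no-newline marker) counts for neither side */
		}
	}
}
```

The difference between these counts before and after editing is exactly the offset delta that gets propagated to later hunks.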
Re: Bug: git log: boundary commits do not respect order (e.g. date-order, topo-order) 2
I just wanted to follow up -- what do you think?

On Thu, Feb 22, 2018 at 9:21 PM, Josh Tepper wrote:
> My use case is that when combined with --graph, the ordering is
> necessary to render a reasonable looking commit graph; by placing the
> commits at the end, each boundary commit essentially produces another
> vertical line all the way to the bottom where the commits reside.
> More specifically, the case where I noticed this was:
>
>     git log --boundary --graph ^master feature
>
> which will have one vertical line going all the way to the bottom for
> each merge from master into feature.
>
> Another issue that I've noticed is that if one is using
> --show-linear-break (instead of --graph), the linear breaks are
> missing between the boundary commits and the other commits.
>
> Regarding the documentation, while it doesn't explicitly say where the
> boundary commits will be ordered with the other commits (nor does it
> say that they'll have accurate linear breaks), the documentation for
> the order flags (and for the linear break flag) makes no mention that
> certain commits are excluded. The implicit understanding of the
> documentation for --date-order is that it will apply to all of the
> commits. Certainly, if you add the flag --full-history, one expects
> the ordering to apply to the additionally traversed commits. I
> suppose boundary commits are a little different since they're not
> explicitly part of the range -- nonetheless I expected them to be
> treated similarly.
>
> Maybe this could at least be documented?
>
> Best,
> ~Josh
>
> On Thu, Feb 22, 2018 at 2:29 PM, Derrick Stolee wrote:
>> On 2/21/2018 6:57 PM, Josh Tepper wrote:
>>>
>>> When using git log, boundary commits (ie, those commits added by
>>> specifying --boundary) do not respect the order (e.g., --date-order,
>>> --topo-order).
>>> Consider the following commit history, where number
>>> indicates the order of the commit timestamps:
>>>
>>>     0---1---2---5  <--A
>>>          \       \
>>>           3---4---6  <--B
>>>
>>> Executing the following command:
>>>
>>>     $ git log --boundary --date-order ^A B
>>>
>>> Should produce the following order (boundary commits shown with dashes):
>>> 6 -5 4 3 -1
>>>
>>> However, it in fact produces:
>>> 6 4 3 -5 -1
>>>
>>> Please advise.
>>
>> Hi Josh,
>>
>> Looking at the docs [1], I don't see any specifics on how the boundary
>> commits should be ordered.
>>
>> Clearly, the implementation specifies that the boundary is written after all
>> other commits. For a full discussion of this, see the commit message for
>> 86ab4906a7c "revision walker: Fix --boundary when limited". Here is an
>> excerpt:
>>
>>  - After get_revision() finishes giving out all the positive
>>    commits, if we are doing the boundary processing, we look at
>>    the parents that we marked as potential boundaries earlier,
>>    see if they are really boundaries, and give them out.
>>
>> The boundary commits are correctly sorted by topo-order among themselves as
>> of commit 4603ec0f960 "get_revision(): honor the topo_order flag for
>> boundary commits".
>>
>> So, I'm not sure that this is a bug (it is working "as designed") but it
>> certainly is non-obvious behavior.
>>
>> In what use case do you need these boundary commits to appear earlier?
>>
>> Thanks,
>> -Stolee
>>
>> [1] https://git-scm.com/docs/git-log
Re: Worktree silently deleted on git fetch/gc/log
Hello,

/git/A/.git/worktrees/b/ is missing - that is the point.
/git/B/.git wasn't modified since the worktree was created, cat:
gitdir: /git/A/.git/worktrees/b

Regards
Дилян

On 03/01/18 19:09, Eric Sunshine wrote:
> On Wed, Feb 28, 2018 at 7:44 AM, Дилян Палаузов wrote:
>> A (branch master) and B (branch g) which is a worktree of the first.
>>
>> /git/B g>$ git fetch
>> [...]
>> From https://...
>>    13e4c55a0..02655d5fb  g      -> origin/g
>>    c37a3ca25..bc7888511  master -> origin/master
>> Auto packing the repository in background for optimum performance.
>> See "git help gc" for manual housekeeping.
>>
>> /git/B g<>$ git log -p origin/g
>> fatal: Not a git repository: /git/A/.git/worktrees/B
>> /git/B$
>>
>> Please note that on the second last prompt there is <>, so that
>> git-prompt has found the necessary information and this was later
>> deleted - by 'gc' or 'log'.
>>
>> What would be the procedure to restore the /git/A/.git/worktrees/B
>> structure?
>
> Can you show us (via 'cat') the content of the following files?
>
>     /git/B/.git
>     /git/A/.git/worktrees/b/HEAD
>     /git/A/.git/worktrees/b/commondir
>     /git/A/.git/worktrees/b/gitdir
Re: [PATCH v2 0/5] roll back locks in various code paths
Jeff King writes:

> IMHO the result is easier to follow. Except for one case:
>
>> -	if (active_cache_changed || force_write) {
>> -		if (newfd < 0) {
>> -			if (refresh_args.flags & REFRESH_QUIET)
>> -				exit(128);
>> -			unable_to_lock_die(get_index_file(), lock_error);
>> -		}
>> -		if (write_locked_index(&the_index, &lock_file, COMMIT_LOCK))
>> -			die("Unable to write new index file");
>> +	if (newfd < 0 && (active_cache_changed || force_write)) {
>> +		if (refresh_args.flags & REFRESH_QUIET)
>> +			exit(128);
>> +		unable_to_lock_die(get_index_file(), lock_error);
>> 	}
>>
>> -	rollback_lock_file(&lock_file);
>> +	if (write_locked_index(&the_index, &lock_file,
>> +			       COMMIT_LOCK | (force_write ? 0 : SKIP_IF_UNCHANGED)))
>> +		die("Unable to write new index file");
>
> where I think the logic just ends up repeating itself.

Yup, this one I also had a bit of trouble with.
Re: [PATCH v4 00/35] protocol version 2
On 03/01, Junio C Hamano wrote:
> Brandon Williams writes:
>
>> Lots of changes since v3 (well more than between v2 and v3). Thanks for
>> all of the reviews on the last round, the series is getting more
>> polished.
>>
>>  * Eliminated the "# service" line from the response from an HTTP
>>    server. This means that the response to a v2 request is exactly the
>>    same regardless of which transport you use! Docs for this have been
>>    added as well.
>>  * Changed how ref-patterns work with the `ls-refs` command. Instead of
>>    using wildmatch all patterns must either match exactly or they can
>>    contain a single '*' character at the end to mean that the prefix
>>    must match. Docs for this have also been added.
>>  * Lots of updates to the docs. Including documenting the
>>    `stateless-connect` remote-helper command used by remote-curl to
>>    handle the http transport.
>>  * Fixed a number of bugs with the `fetch` command, one of which didn't
>>    use objects from configured alternates.
>
> I noticed that this round is built on top of v2.16.0-rc0. It
> certainly makes it easier to compare against the previous round
> which was built on top of that old commit and it is very much
> appreciated that a reroll does not involve pointless rebases.
>
> For those who are helping from the sidelines, it may be helpful to
> mention where in the history this was developed on, though, as
> applying these on the current 'master' has a handful of small
> conflicts.
>
> Thanks, will replace and will comment on individual patches as
> needed.

I've tried to keep building on the same base that I started with when sending out a new version of the series, mostly because I thought it was easier to see what was different between rounds.  I can, in the future, try to remember to put the commit it's based on.  Do we have any sort of guidance about the best practice here?

--
Brandon Williams
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Leah Neukirchen writes:

> You are right in calling this out as an emerging new thing, but the
> second list of that page proves that it will be useful to settle on a
> common configuration, and my hope is that by getting a few popular
> projects on board, others will soon follow. It certainly is easy to
> implement, and rather unintrusive. Users who don't know about this
> feature are completely unaffected.

There certainly is a chicken-and-egg problem there.  Even though I personally prefer not to see overuse of colors, I am not sure if we the Git community as a whole would want to be involved until it gets mainstream.

>>>	if (color_stdout_is_tty < 0)
>>>		color_stdout_is_tty = isatty(1);
>>>	if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) {
>>
>> According to no-color.org's FAQ #2, NO_COLOR should affect only the
>> "default" behaviour, and should stay back if there is an explicit
>> end-user configuration (or command line override).  And this helper
>> function is called only from want_color() when there is no such
>> higher precedence setting, which is in line with the recommendation.
>>
>> Which is good.
>
> Yes, I took care of that. Should this also be tested? It doesn't
> quite fit into the setting of t4026-color.sh I think.

It probably fits much better in t7006, I would suspect.  Earlier, setting color.ui to auto meant the output is colored when run under test_terminal, but with this new environment set, the output will have to be bland.
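The precedence described above (NO_COLOR affects only the default, never an explicit user setting) can be sketched as a tiny model.  The function and the -1/0/1 encoding of "unset / explicitly off / explicitly on" are assumptions for illustration, not git's actual want_color() implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* explicit_cfg: 1 = user forced color on, 0 = forced off, -1 = unset. */
static int want_color_model(int explicit_cfg, int is_tty)
{
	if (explicit_cfg >= 0)
		return explicit_cfg;	/* explicit config wins over NO_COLOR */
	if (getenv("NO_COLOR"))
		return 0;		/* only the *default* is affected */
	return is_tty;			/* default: color when on a terminal */
}
```

A test along these lines (set NO_COLOR, expect bland output unless color was explicitly requested) is exactly the kind of check that would fit the test script mentioned above.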
Re: The case for two trees in a commit ("How to make rebase less modal")
Hi,

Stefan Beller wrote:

> $ git hash-object --stdin -w -t commit <<EOF
> tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller 1519859216 -0800
> committer Stefan Beller 1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
>
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck | grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

As others mentioned, this is essentially a fancy way to experiment with adding a new header (with the same name as an existing header) to a commit.  It is kind of a scary thing to do because anyone trying to parse commits, including old versions of git, is likely to get confused by the multiple trees.

It doesn't affect the reachability calculation in the way that it should, so this ends up being something that should be straightforward to do with a message in the commit body instead.  To affect reachability, you could use multiple parent lines instead.  You'd need synthetic commits to hang the trees on.  This is similar to how "git stash" stores the index state.

In other words, I think what you are trying to do is feasible, but not in the exact way you described.

[...]
> * porcelain to modify the repo "at larger scale", such as rebase,
>   cherrypicking, reverting involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in their nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later. Deferring a
> conflict is currently impossible, because precisely one tree is recorded.
Junio mentions you'd want to record: - stages of the index, to re-create a conflicted index - working tree files, with conflict markers In addition you may also want to record: - state (todo list) from .git/rebase-merge, to allow picking up where you left off in such a larger operation - similar state for other commands --- e.g. MERGE_MSG Recording this work-in-progress state is in the spirit of what "git stash" does. People also sometimes like to record their state in progress with a "wip commit" at the tip of a branch. Both of those workflows would benefit from something like this, I'd think. So I kind of like this. Maybe a "git save-wip" command that is like "git stash" but records state to the current branch? And similarly improving "git stash" to record the same richer state. And in the spirit of "git stash" I think it is possible without even modifying the commit object format. [...] > I'd be advocating for having multiple trees in a commit > possible locally; it might be a bad idea to publish such trees. I think such "WIP state" may also be useful for publishing, to allow collaborating on a thorny rebase or merge. Thanks and hope that helps, Jonathan
Re: The case for two trees in a commit ("How to make rebase less modal")
Jacob Keller writes: > How does this let you defer a conflict? A future commit which modified > blobs in that tree wouldn't know what version of the trees/blobs to > actually use? Clearly future commits could record their own trees, but > how would they generate the "correct" tree? > > Maybe I am missing something here? If you write four trees, one out of each stage in the index, and record them, you could in theory have a new command that reads them and recreates the conflicted index. Oh, and then you would need the fifth tree that records what the working-tree files (with conflict markers) looked like, in order to reproduce the state seen by the person who ran "git merge", attempted to resolve and gave up halfway in the middle. As a local operation, extending "git stash" somehow so that it can stash even in a conflicted working tree may be a better approach, and it does not need cruft headers in commit objects, I would think.
Re: The case for two trees in a commit ("How to make rebase less modal")
Stefan Beller writes: > $ git hash-object --stdin -w -t commit <<EOF > tree c70b4a33a0089f15eb3b38092832388d75293e86 > parent 105d5b91138ced892765a84e771a061ede8d63b8 > author Stefan Beller 1519859216 -0800 > committer Stefan Beller 1519859216 -0800 > tree 5495266479afc9a4bd9560e9feac465ed43fa63a > test commit > EOF > 19abfc3bf1c5d782045acf23abdf7eed81e16669 > $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669 > $ > > So it is technically possible to create a commit with two tree entries > and fsck is not complaining. The second one is merely a random unauthorized header that is not interpreted in any way by Git. It is merely being confusing by starting with "tree " and having 40-hex after it, but the 40-hex does not get interpreted as an object name, and does not participate in reachability computation (i.e. packing, pruning and fsck). There is not much difference between that and a line of trailer in the commit log message (other than this one is less discoverable).
Re: [PATCH v4 00/35] protocol version 2
Brandon Williams writes: > Lots of changes since v3 (well more than between v2 and v3). Thanks for > all of the reviews on the last round, the series is getting more > polished. > > * Eliminated the "# service" line from the response from an HTTP > server. This means that the response to a v2 request is exactly the > same regardless of which transport you use! Docs for this have been > added as well. > * Changed how ref-patterns work with the `ls-refs` command. Instead of > using wildmatch all patterns must either match exactly or they can > contain a single '*' character at the end to mean that the prefix > must match. Docs for this have also been added. > * Lots of updates to the docs. Including documenting the > `stateless-connect` remote-helper command used by remote-curl to > handle the http transport. > * Fixed a number of bugs with the `fetch` command, one of which didn't > use objects from configured alternates. I noticed that this round is built on top of v2.16.0-rc0. It certainly makes it easier to compare against the previous round which was built on top of that old commit and it is very much appreciated that a reroll does not involve pointless rebases. For those who are helping from the sidelines, it may be helpful to mention where in the history this was developed, though, as applying these on the current 'master' has a handful of small conflicts. Thanks, will replace and will comment on individual patches as needed.
Re: [PATCH] git-gui: bind CTRL/CMD+numpad ENTER to do_commit
On Thu, Mar 1, 2018 at 9:39 AM, Birger Skogeng Pedersen wrote: > --- Please sign-off on your patch. See Documentation/SubmittingPatches. Also, it would be helpful to write at least a short commit message justifying the change. The reason you gave in your lead-in email[1] might be sufficient: In git-gui, we can hit CTRL/CMD+ENTER to create a commit. However, using the numpad ENTER does not invoke the same command. (assuming people don't argue that numpad ENTER should be saved for some other function). Thanks. [1]: https://public-inbox.org/git/CAGr--=lxmtz5rrp4742u3vsradrsware2sitcsowatyson2...@mail.gmail.com/ > diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh > index 91c00e648..6de74ce63 100755 > --- a/git-gui/git-gui.sh > +++ b/git-gui/git-gui.sh > @@ -3867,6 +3867,7 @@ bind . <$M1B-Key-equal> {show_more_context;break} > bind . <$M1B-Key-plus> {show_more_context;break} > bind . <$M1B-Key-KP_Add> {show_more_context;break} > bind . <$M1B-Key-Return> do_commit > +bind . <$M1B-Key-KP_Enter> do_commit > foreach i [list $ui_index $ui_workdir] { > bind $i <Button-1> { toggle_or_diff click %W %x %y; break } > bind $i <$M1B-Button-1> { add_one_to_selection %W %x %y; break }
Re: [GSoC][PATCH v2] userdiff: add built-in pattern for golang
On Thu, Mar 1, 2018 at 6:19 AM, Alban Gruin wrote: > This adds xfuncname and word_regex patterns for golang, a quite > popular programming language. It also includes test cases for the > xfuncname regex (t4018) and updated documentation. > > The xfuncname regex finds functions, structs and interfaces. Although > the Go language prohibits the opening brace from being on its own > line, the regex does not makes it mandatory, to be able to match > `func` statements like this: > > func foo(bar int, > baz int) { > } > > This is covered by the test case t4018/golang-long-func. A possible suggested rewrite to make it flow a bit better and to mention the loose whitespace matching: The xfuncname regex finds functions, structs and interfaces. Although the Go language prohibits the opening brace of a 'func' from being on its own line, the regex makes the brace optional so it can match function declarations wrapped over multiple lines (covered by new test case t4018/golang-long-func): func foo(bar int, baz int) { } Whitespace matching is also a bit lax in order to handle non-standard formatting of method declarations. For instance: func(x *X) foo() { versus typical 'gofmt' formatted: func (x *x) foo() { (Not necessarily worth a re-roll; perhaps Junio can pick it up when queueing if he considers it an improvement.) Thanks. > The word_regex pattern finds identifiers, integers, floats, complex > numbers and operators, according to the go specification. > > Signed-off-by: Alban Gruin
Re: [PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
Nguyễn Thái Ngọc Duy writes: > pack-objects could be a big memory hog especially on large repos, > everybody knows that. The suggestion to stick a .keep file on the > largest pack to avoid this problem is also known for a long time. Yup, but note that it is not "largest" per se. The thing being large is a mere consequence that it is the base pack that holds the bulk of older parts of the history (e.g. the one that you obtained via the initial clone). > Let's do the suggestion automatically instead of waiting for people to > come to Git mailing list and get the advice. When a certain condition > is met, gc --auto create a .keep file temporary before repack is run, > then remove it afterward. > > gc --auto does this based on an estimation of pack-objects memory > usage and whether that fits in one third of system memory (the > assumption here is for desktop environment where there are many other > applications running). > > Since the estimation may be inaccurate and that 1/3 threshold is arbitrary, give the user a finer control over this mechanism as well: > if the largest pack is larger than gc.bigPackThreshold, it's kept. If this is a transient mechanism during a single gc session, it would be far more preferable if we can find a way to do this without actually having a .keep file on the filesystem.
Re: Worktree silently deleted on git fetch/gc/log
On Wed, Feb 28, 2018 at 7:44 AM, Дилян Палаузов wrote: > A (branch master) and > B (branch g) which is a worktree of the first. > > /git/B g>$ git fetch > [...] > From https://... > 13e4c55a0..02655d5fb g -> origin/g > c37a3ca25..bc7888511 master -> origin/master > Auto packing the repository in background for optimum performance. > See "git help gc" for manual housekeeping. > /git/B g<>$ git log -p origin/g > fatal: Not a git repository: /git/A/.git/worktrees/B > /git/B$ > > Please note that on the second last prompt there is <>, so that git-prompt > has found the necessary information and this was later deleted - by 'gc' or 'log'. > > What would be the procedure to restore the /git/A/.git/worktrees/B > structure? Can you show us (via 'cat') the content of the following files? /git/B/.git /git/A/.git/worktrees/b/HEAD /git/A/.git/worktrees/b/commondir /git/A/.git/worktrees/b/gitdir
Re: [PATCH 09/11] pack-objects: refer to delta objects by index instead of pointer
Nguyễn Thái Ngọc Duy writes: > Notice that packing_data::nr_objects is uint32_t, we could only handle > maximum 4G objects and can address all of them with an uint32_t. If we > use a pointer here, we waste 4 bytes on 64 bit architecture. > > Convert these delta pointers to indexes. Since we need to handle NULL > pointers as well, the index is shifted by one [1]. Makes perfect sense. I do not think losing 1 slot out of possible 4G is a regression, unlike the 256 packfile limit 07/11 imposes.
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
Nguyễn Thái Ngọc Duy writes: > Instead of using 8 bytes (on 64 bit arch) to store a pointer to a > pack. Use an index isntead since the number of packs should be > relatively small. > > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. I can tell without looking at the rest of the thread that people had reasonable objection against this stance ;-) This will not fly well.
Re: [PATCH 04/11] pack-objects: use bitfield for object_entry::depth
Nguyễn Thái Ngọc Duy writes: > This does not give us any saving due to padding. But we will be able > to save once we cut 4 bytes out of this struct in a subsequent patch. > > Because of struct packing from now on we can only handle max depth > 4095 (or even lower when new booleans are added in this struct). This > should be ok since long delta chain will cause significant slow down > anyway. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- This should be marked as RFC/PATCH; I do not have objection to limiting the delta depth to some reasonable length (rather than the current 1<<31 or worse 1<<63), and 4k may be such a reasonable limit (I'd actually think anything more than a few hundreds is probably a bad idea), but it needs to be documented. > builtin/pack-objects.c | 4 > pack-objects.h | 6 ++ > 2 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index a4dbb40824..cfd97da7db 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, > const char *prefix) > if (pack_to_stdout != !base_name || argc) > usage_with_options(pack_usage, pack_objects_options); > > + if (depth > (1 << OE_DEPTH_BITS)) > + die(_("delta chain depth %d is greater than maximum limit %d"), > + depth, (1 << OE_DEPTH_BITS)); > + > argv_array_push(&rp, "pack-objects"); > if (thin) { > use_internal_rev_list = 1; > diff --git a/pack-objects.h b/pack-objects.h > index fca334ab4d..3941e6c9a6 100644 > --- a/pack-objects.h > +++ b/pack-objects.h > @@ -2,6 +2,7 @@ > #define PACK_OBJECTS_H > > #define OE_DFS_STATE_BITS 2 > +#define OE_DEPTH_BITS 12 > > /* > * State flags for depth-first search used for analyzing delta cycles.
> @@ -43,10 +44,7 @@ struct object_entry { > unsigned tagged:1; /* near the very tip of refs */ > unsigned filled:1; /* assigned write-order */ > unsigned dfs_state:OE_DFS_STATE_BITS; > - > - /* XXX 20 bits hole, try to pack */ > - > - int depth; > + unsigned depth:OE_DEPTH_BITS; > > /* size: 120, padding: 4 */ > };
[RFC] Contributing to Git (on Windows)
We (Git devs at Microsoft) have had several people start contributing to Git over the past few years (I'm the most-recent addition). As we on-boarded to Git development on our Windows machines, we collected our setup steps on an internal wiki page. Now, we'd like to make that document publicly available. These steps are focused on a Windows user, so we propose putting them in the git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open for feedback [1]. I'll read comments on that PR or in this thread. If you've ever done Git development on a Windows machine, I would love to hear your feedback on this document. Any other advice you have is greatly appreciated. For anyone interested, there is also a discussion about submitting patches upstream. The document links to Documentation/CodingGuidelines and Documentation/SubmittingPatches, but tries to elaborate with additional advice. Thanks, -Stolee [1] https://github.com/git-for-windows/git/pull/1529
Re: Obsolete instruction in SubmittingPatches?
On 01.03.2018 0:54, Junio C Hamano wrote: > Andrei Rybak writes: > >> Is this part of guidelines obsolete, or am I not understanding this >> correctly? > > I am merely being nice (but only on a "time-permitting" basis). > Does that mean that the integration of a series is easier when there is a re-send?
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Junio C Hamano writes: > Leah Neukirchen writes: > >> NO_COLOR (http://no-color.org/) is a comprehensive approach to disable >> colors by default for all tools: > > The list of software that supports that "convention" is, eh, > respectable. Is it really a "convention" yet, or yet another thing > the user needs to worry about? You are right in calling this out an emerging new thing, but the second list of that page proves that it will be useful to settle on a common configuration, and my hope is that by getting a few popular projects on board, others will soon follow. It certainly is easy to implement, and rather unintrusive. Users who don't know about this feature are completely unaffected. >>> if (color_stdout_is_tty < 0) >>> color_stdout_is_tty = isatty(1); >>> if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { >> >> According to no-color.org's FAQ #2, NO_COLOR should affect only the >> "default" behaviour, and should stay back if there is an explicit >> end-user configuration (or command line override). And this helper >> function is called only from want_color() when there is no such >> higher precedence setting, which is in line with the recommendation. >> >> Which is good. Yes, I took care of that. Should this also be tested? It doesn't quite fit into the setting of t4026-color.sh I think. Thanks, -- Leah Neukirchen http://leah.zone
Re: [RFC PATCH] color: respect the $NO_COLOR convention
Leah Neukirchen writes: > NO_COLOR (http://no-color.org/) is a comprehensive approach to disable > colors by default for all tools: The list of software that supports that "convention" is, eh, respectable. Is it really a "convention" yet, or yet another thing the user needs to worry about? > diff --git a/color.c b/color.c > index d48dd947c..59e9c2459 100644 > --- a/color.c > +++ b/color.c > @@ -326,6 +326,8 @@ int git_config_colorbool(const char *var, const char > *value) > > static int check_auto_color(void) > { > + if (getenv("NO_COLOR")) > + return 0; Our convention often calls for CONFIG_VAR=false to mean "I do not want to see what CONFIG_VAR wants to do done", i.e. NO_COLOR=false git show would show colored output if there are no other settings. But this code contradicts the convention, deliberately because that is what no-color.org wants. Makes me wonder if that convention is worth following in the first place. > if (color_stdout_is_tty < 0) > color_stdout_is_tty = isatty(1); > if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { According to no-color.org's FAQ #2, NO_COLOR should affect only the "default" behaviour, and should stay back if there is an explicit end-user configuration (or command line override). And this helper function is called only from want_color() when there is no such higher precedence setting, which is in line with the recommendation. Which is good.
[RFC PATCH] color: respect the $NO_COLOR convention
When the NO_COLOR environment variable is set to any value, default to disabling color, i.e. resolve 'auto' to false. NO_COLOR (http://no-color.org/) is a comprehensive approach to disable colors by default for all tools: > All command-line software which outputs text with ANSI color added > should check for the presence of a NO_COLOR environment variable that, > when present (regardless of its value), prevents the addition of ANSI > color. Signed-off-by: Leah Neukirchen --- This is a first stab at implementing NO_COLOR for git, effectively making it behave as it did before colors were enabled by default. I feel this should be documented somewhere, but I'm not sure where the best place is. Perhaps in config.ui, or the Git environment variables (but they all start with GIT_). color.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/color.c b/color.c index d48dd947c..59e9c2459 100644 --- a/color.c +++ b/color.c @@ -326,6 +326,8 @@ int git_config_colorbool(const char *var, const char *value) static int check_auto_color(void) { + if (getenv("NO_COLOR")) + return 0; if (color_stdout_is_tty < 0) color_stdout_is_tty = isatty(1); if (color_stdout_is_tty || (pager_in_use() && pager_use_color)) { -- 2.16.2
Re: [PATCH v3 2/9] t3701: indent here documents
Phillip Wood writes: > Thanks for the tips, tbdiff looks useful (I just need to learn to read > diffs of diffs!). I also find rebasing them on a common ancestor useful > but it's a bit tedious. Yes, comparing two versions of a series is somewhat unusual and needs getting used to before one can do so efficiently. You do not have to always rebase (tbdiff is fairly good at what it does even when two ranges are far apart), but it helps not to begin developing on a codebase that is newer than needed (e.g. a bugfix on 'next' is unneeded unless you are fixing a bug introduced by a topic that is still on 'next'---in which case the best fix is one that is on that topic, without depending on the rest of 'next'). In any case, thank _you_ for contributing.
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On 1 March 2018 at 16:08, Jeff King wrote: > On Thu, Mar 01, 2018 at 09:28:31AM -0500, Randall S. Becker wrote: > >> > It's not clear to me though if we just want to tweak the programs run in >> > the >> > test scripts in order to get test_must_fail to stop complaining, or if we >> > consider the unusual exit codes from our perl-based Git programs to be an >> > error that should be fixed for real use, too. >> >> I'm living unusual exit code IRL all the time. So "fixed for real", is >> what I'm looking for. So if we were to do that, where is the best >> place to insert a fix - my original question - that would be permanent >> in the main git test code. Or perhaps this needs to be in the main >> code itself. > > If it's fixed in the real world, then it needs to be in the main code > itself. It looks like git-svn already does this to some degree itself > (most of the work happens in an eval, and it calls the "fatal" function > if that throws an exception via 'die'). > > So I think git-send-email.perl (and maybe others) needs to learn the > same trick (by pushing the main bits of the script into an eval). Or it > needs to include the SIG{__DIE__} trickery at the beginning of the > script. > > I think the SIG{__DIE__} stuff could go into Git/PredictableDie.pm or > something, and then any scripts that need it could just "use > Git::PredictableDie". > > Does that make sense? To me yes. By putting it in a module and 'use'ing it early you guarantee it will be set up before any code following it is even compiled. If there is an existing module that the git perl code always uses then it could go in there in a BEGIN{} block instead of adding the new module. Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On Thu, Mar 01, 2018 at 09:28:31AM -0500, Randall S. Becker wrote: > > It's not clear to me though if we just want to tweak the programs run in the > > test scripts in order to get test_must_fail to stop complaining, or if we > > consider the unusual exit codes from our perl-based Git programs to be an > > error that should be fixed for real use, too. > > I'm living unusual exit code IRL all the time. So "fixed for real", is > what I'm looking for. So if we were to do that, where is the best > place to insert a fix - my original question - that would be permanent > in the main git test code. Or perhaps this needs to be in the main > code itself. If it's fixed in the real world, then it needs to be in the main code itself. It looks like git-svn already does this to some degree itself (most of the work happens in an eval, and it calls the "fatal" function if that throws an exception via 'die'). So I think git-send-email.perl (and maybe others) needs to learn the same trick (by pushing the main bits of the script into an eval). Or it needs to include the SIG{__DIE__} trickery at the beginning of the script. I think the SIG{__DIE__} stuff could go into Git/PredictableDie.pm or something, and then any scripts that need it could just "use Git::PredictableDie". Does that make sense? -Peff
Re: ref-filter: how to improve the code
On Thu, Mar 01, 2018 at 02:17:09PM +0300, Оля Тележная wrote: > >> I tried to replace all die("...") with `return error("...")` and >> finally exit(), but actual problem is that we print "error:..." >> instead of "fatal:...", and it looks funny. > > > > If you do that, then format_ref_array_item() is still going to print > > things, even if it doesn't die(). But for "cat-file --batch", we usually > > do not print errors at all, but instead just say "... missing" (although > > it depends on the error; syntactic errors in the format string would > > still cause us to write to stderr). > > Not sure if you catch my idea. format_ref_array_item() will not print > anything, it will just return an error code. And if there was an error > - we will print it in show_ref_array_item() (or go back to cat-file > and print what we want). OK, I think I misunderstood. It seems like there are three possible strategies on the table: - low-level functions call error() and return -1, that gets passed up through mid-level functions like format_ref_array_item(), and then higher-level functions like show_ref_array_item() act on the error code and call die(). The user sees something like: error: unable to parse object 1234abcd fatal: unable to format object - low-level functions return a numeric error code, which is then formatted by higher-level functions like show_ref_array_item() to produce a specific message - low-level functions stuff an error code into a strbuf and return -1, and then higher-level functions like show_ref_array_item() will feed that message to die("%s", err.buf). I think the first one, besides changing the output, is going to produce error() messages even for cases where we're calling format_ref_array_item() directly, because error() writes its output immediately. The second is a pain in practice, because it doubles the work: you have to come up with a list of error codes, and then translate them into strings.
And there's no room to mention variable strings (like the name of the object). So I think the third is really the only viable option. -Peff
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 01, 2018 at 04:10:48PM +0700, Nguyễn Thái Ngọc Duy wrote: > Instead of using 8 bytes (on 64 bit arch) to store a pointer to a > pack. Use an index isntead since the number of packs should be > relatively small. > > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. I overall like the direction of this series, but I think this one is just too much. While you definitely shouldn't have a ton of packs, this leaves the user with no real escape hatch. And 256 isn't actually that many packs. -Peff
[PATCH] git-gui: bind CTRL/CMD+numpad ENTER to do_commit
--- git-gui/git-gui.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/git-gui/git-gui.sh b/git-gui/git-gui.sh index 91c00e648..6de74ce63 100755 --- a/git-gui/git-gui.sh +++ b/git-gui/git-gui.sh @@ -3867,6 +3867,7 @@ bind . <$M1B-Key-equal> {show_more_context;break} bind . <$M1B-Key-plus> {show_more_context;break} bind . <$M1B-Key-KP_Add> {show_more_context;break} bind . <$M1B-Key-Return> do_commit +bind . <$M1B-Key-KP_Enter> do_commit foreach i [list $ui_index $ui_workdir] { bind $i <Button-1> { toggle_or_diff click %W %x %y; break } bind $i <$M1B-Button-1> { add_one_to_selection %W %x %y; break } -- 2.16.2.268.g7f9c27f2f.dirty
RE: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On March 1, 2018 2:36 AM, Jeff King wrote: > On Wed, Feb 28, 2018 at 05:51:14PM +0100, demerphq wrote: > > > I would look into putting it into a module and then using the PERL5OPT > > environment var to have it loaded automagically in any of your perl > > scripts. > > > > For instance if you put that code into a module called Git/DieTrap.pm > > > > then you could do: > > > > PERL5OPT=-MGit::DieTrap > > > > In your test setup code assuming you have some. Then you don't need to > > change any of your scripts just the test runner framework. > > That's a clever trick. > > It's not clear to me though if we just want to tweak the programs run in the > test scripts in order to get test_must_fail to stop complaining, or if we > consider the unusual exit codes from our perl-based Git programs to be an > error that should be fixed for real use, too. I'm living unusual exit code IRL all the time. So "fixed for real", is what I'm looking for. So if we were to do that, where is the best place to insert a fix - my original question - that would be permanent in the main git test code. Or perhaps this needs to be in the main code itself. Cheers, Randall
git-gui: CTRL/CMD + numpad ENTER does not invoke same command as "regular" ENTER
In git-gui, we can hit CTRL/CMD+ENTER to create a commit. However, using the numpad ENTER does not invoke the same command. I propose that both numpad ENTER and "regular" ENTER should invoke the same command.
Re: [PATCH 00/11] Reduce pack-objects memory footprint
On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted: > The array of object_entry in pack-objects can take a lot of memory > when pack-objects is run in "pack everything" mode. On linux-2.6.git, > this array alone takes roughly 800MB. > > This series reorders some fields and reduces field size... to keep > this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on > 64-bit linux and saves 260MB on linux-2.6.git. I'm very interested in this patch series. I don't have time to test this one right now (have to run), but with your previous RFC patch memory use (in the ~4GB range) on a big in-house repo went down by a bit over 3%, and it's ~5% faster. Before/after RSS 4440812 / 429 & runtime 172.73 / 162.45. This is after having already done a full git gc before, data via /usr/bin/time -v. So not huge, but respectable. We have a big repo, and this gets repacked on 6-8GB of memory on dev KVMs, so we're under a fair bit of memory pressure. git-gc slows things down a lot. It would be really nice to have something that made it use drastically less memory at the cost of less efficient packs. Is the property that you need to spend give or take the size of .git/objects in memory something inherent, or just a limitation of the current implementation? I.e. could we do a first pass to pick some objects based on some heuristic, then repack them N at a time, and finally delete the now-obsolete packs? Another thing I've dealt with is that on these machines their NFS-mounted storage gets exhausted (I'm told) due to some pathological operations git does during repack, I/O tends to get 5-6x slower. Of course ionice doesn't help because the local kernel doesn't know anything about how harmful it is.
Re: [PATCH 07/11] pack-objects: move in_pack out of struct object_entry
On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted: > pack. Use an index isntead since the number of packs should be s/isntead/instead/ > This limits the number of packs we can handle to 256 (still > unreasonably high for a repo to work well). If you have more than 256 > packs, you'll need an older version of Git to repack first. So if you have gc.autoPackLimit=300 this will break, how does it break? Should we also make (& document) setting that variable higher than 256 an error?
Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
On 3/1/2018 2:42 AM, Jeff King wrote: On Wed, Feb 28, 2018 at 01:27:03PM -0800, Junio C Hamano wrote: Somehow this topic has been hanging without getting further attention for too long. It's time to wrap it up and move it forward. How about this? -- >8 -- From: Junio C Hamano Date: Wed, 28 Feb 2018 13:21:09 -0800 Subject: [PATCH] untracked cache: use git_env_bool() not getenv() for customization [...] This looks good to me. Thanks for tying up the loose end. -Peff Looks good to me as well. Thanks for fixing the environment variables. I'm having send-email issues so hope this one gets through. Please ignore my [PATCH V2] if it ever makes it through. Ben
[GSoC][PATCH v2] userdiff: add built-in pattern for golang
This adds xfuncname and word_regex patterns for golang, a quite popular programming language. It also includes test cases for the xfuncname regex (t4018) and updated documentation. The xfuncname regex finds functions, structs and interfaces. Although the Go language prohibits the opening brace from being on its own line, the regex does not makes it mandatory, to be able to match `func` statements like this: func foo(bar int, baz int) { } This is covered by the test case t4018/golang-long-func. The word_regex pattern finds identifiers, integers, floats, complex numbers and operators, according to the go specification. Signed-off-by: Alban Gruin --- Documentation/gitattributes.txt | 2 ++ t/t4018-diff-funcname.sh | 1 + t/t4018/golang-complex-function | 8 ++++++++ t/t4018/golang-func | 4 ++++ t/t4018/golang-interface | 4 ++++ t/t4018/golang-long-func | 5 +++++ t/t4018/golang-struct | 4 ++++ userdiff.c | 9 +++++++++ 8 files changed, 37 insertions(+) create mode 100644 t/t4018/golang-complex-function create mode 100644 t/t4018/golang-func create mode 100644 t/t4018/golang-interface create mode 100644 t/t4018/golang-long-func create mode 100644 t/t4018/golang-struct diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index c21f5ca10..d52b254a2 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -714,6 +714,8 @@ patterns are available: - `fountain` suitable for Fountain documents. +- `golang` suitable for source code in the Go language. + - `html` suitable for HTML/XHTML documents. - `java` suitable for source code in the Java language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh index 1795ffc3a..22f9f88f0 100755 --- a/t/t4018-diff-funcname.sh +++ b/t/t4018-diff-funcname.sh @@ -33,6 +33,7 @@ diffpatterns=" css fortran fountain + golang html java matlab diff --git a/t/t4018/golang-complex-function b/t/t4018/golang-complex-function new file mode 100644 index 0..e057dcefe --- /dev/null +++ b/t/t4018/golang-complex-function @@ -0,0 +1,8 @@ +type Test struct { + a Type +} + +func (t *Test) RIGHT(a Type) (Type, error) { + t.a = a + return ChangeMe, nil +} diff --git a/t/t4018/golang-func b/t/t4018/golang-func new file mode 100644 index 0..8e9c9ac7c --- /dev/null +++ b/t/t4018/golang-func @@ -0,0 +1,4 @@ +func RIGHT() { + a := 5 + b := ChangeMe +} diff --git a/t/t4018/golang-interface b/t/t4018/golang-interface new file mode 100644 index 0..553bedec9 --- /dev/null +++ b/t/t4018/golang-interface @@ -0,0 +1,4 @@ +type RIGHT interface { + a() Type + b() ChangeMe +} diff --git a/t/t4018/golang-long-func b/t/t4018/golang-long-func new file mode 100644 index 0..ac3a77b5c --- /dev/null +++ b/t/t4018/golang-long-func @@ -0,0 +1,5 @@ +func RIGHT(aVeryVeryVeryLongVariableName AVeryVeryVeryLongType, + anotherLongVariableName AnotherLongType) { + a := 5 + b := ChangeMe +} diff --git a/t/t4018/golang-struct b/t/t4018/golang-struct new file mode 100644 index 0..5deda77fe --- /dev/null +++ b/t/t4018/golang-struct @@ -0,0 +1,4 @@ +type RIGHT struct { + a Type + b ChangeMe +} diff --git a/userdiff.c b/userdiff.c index dbfb4e13c..8f5028f6b 100644 --- a/userdiff.c +++ b/userdiff.c @@ -38,6 +38,15 @@ IPATTERN("fortran", "|//|\\*\\*|::|[/<>=]="), IPATTERN("fountain", "^((\\.[^.]|(int|ext|est|int\\.?/ext|i/e)[. ]).*)$", "[^ \t-]+"), +PATTERNS("golang", +/* Functions */ +"^[ \t]*(func[ \t]*.*(\\{[ \t]*)?)\n" +/* Structs and interfaces */ +"^[ \t]*(type[ \t].*(struct|interface)[ \t]*(\\{[ \t]*)?)", +/* -- */ +"[a-zA-Z_][a-zA-Z0-9_]*" +"|[-+0-9.eE]+i?|0[xX]?[0-9a-fA-F]+i?" 
+"|[-+*/<>%&^|=!:]=|--|\\+\\+|<<=?|>>=?|&\\^=?|&&|\\|\\||<-|\\.{3}"), PATTERNS("html", "^[ \t]*(<[Hh][1-6]([ \t].*)?>.*)$", "[^<>= \t]+"), PATTERNS("java", -- 2.16.1
Re: ref-filter: how to improve the code
2018-02-28 16:25 GMT+03:00 Jeff King: > On Sun, Feb 25, 2018 at 09:28:25PM +0300, Оля Тележная wrote: > >> I am trying to remove cat-file formatting part and reuse same >> functionality from ref-filter. >> I have a problem that cat-file sometimes needs to continue running >> even if the request is broken, while in ref-filter we invoke die() in >> many places everywhere during formatting process. I write this email >> because I want to discuss how to implement the solution better. >> >> ref-filter has 2 functions which could be interesting for us: >> format_ref_array_item() and show_ref_array_item(). I guess it's a good >> idea to print everything in show_ref_array_item(), including all >> errors. In that case all current users of ref-filter will invoke >> show_ref_array_item() (as they did it before), and cat-file could use >> format_ref_array_item() and work with the result in its own logic. > > Yes, that arrangement makes sense to me. > >> I tried to replace all die("...") with `return error("...")` and >> finally exit(), but actual problem is that we print "error:..." >> instead of "fatal:...", and it looks funny. > > If you do that, then format_ref_array_item() is still going to print > things, even if it doesn't die(). But for "cat-file --batch", we usually > do not print errors at all, but instead just say "... missing" (although > it depends on the error; syntactic errors in the format string would > still cause us to write to stderr). Not sure if you catch my idea. format_ref_array_item() will not print anything, it will just return an error code. And if there was an error - we will print it in show_ref_array_item() (or go back to cat-file and print what we want). > >> One of the easiest solutions is to add strbuf parameter for errors to >> all functions that we use during formatting process, fill it in and >> print in show_ref_array_item() if necessary. What do you think about >> this idea? 
>> >> Another way is to change the resulting error message, print current >> message with "error" prefix and then print something like "fatal: >> could not format the output". It is easier but I am not sure that it's >> a good idea to change the interface. > > For reference, the first one is what we've been switching to in the refs > code. And I think it's been fairly successful there. > > -Peff
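The split being discussed — a format function that reports errors without printing, and a show function that keeps the old die() behaviour — can be sketched abstractly. This is a hypothetical Python model of the control flow, not ref-filter's actual C API; the function names mirror the ones in the thread, and the field-lookup "format language" is a stand-in:

```python
# Hypothetical model of the proposed arrangement: formatting never
# prints, it returns (output, error); the caller decides what to do.
def format_ref_array_item(item, fmt):
    """Return (output, error); exactly one of the two is None."""
    try:
        return fmt.format(**item), None
    except KeyError as e:
        return None, "unknown field %s" % e

def show_ref_array_item(item, fmt):
    """Old-style caller: print the result, die() on error."""
    out, err = format_ref_array_item(item, fmt)
    if err is not None:
        raise SystemExit("fatal: " + err)   # die() equivalent
    print(out)

# A cat-file-like caller could instead inspect err itself and keep
# going, which is exactly the flexibility being asked for.
show_ref_array_item({"refname": "refs/heads/master"}, "{refname}")
```

The point of the design is that the "fatal:" vs "error:" vs "... missing" decision moves out of the formatting code and into each caller.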
Re: [PATCH v3 2/9] t3701: indent here documents
Hi Junio On 28/02/18 15:37, Junio C Hamano wrote: > Phillip Wood writes: > >> Is there an easy way for contributors to compare the branch they post to >> what ends up in pu? > > Distributed work is pretty much symmetric, so it can be done the > same way as one would review a rerolled series by another co-worker. > > $ git log --oneline --first-parent origin/master..origin/pu > > would show merges of topic branches, so you can find the tip of the > topic of your earlier submission (it would be one $commit^2; call > that $topic). origin/master..$topic would be the one branch > (i.e. what is in 'pu') to be compared. > > The other branch to be compared is what you sent the previous one > out of, or the new version of the patches. > > To compare two branches, git://github.com/trast/tbdiff is one of the > easier ways. > > Before I learned about the tool, I used to "format-patch --stdout" > on both branches, and ran "diff -u" between them, as a crude measure; > it was more useful for spotting typofixes in the log messages than > code changes, before I got good at reading diff of diffs ;-). > > Also, tentatively rebasing the two branches on a common base, and > then doing "git diff $oldtopic~$N $newtopic~$N" or something like > that for varying values of $N (and N==0 is a good way for final > sanity checks). Thanks for the tips, tbdiff looks useful (I just need to learn to read diffs of diffs!). I also find rebasing them on a common ancestor useful, but it's a bit tedious. Thanks again Phillip
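The "format-patch --stdout on both branches, then diff -u between them" trick Junio describes produces a diff of diffs. Here is a toy Python illustration of what that comparison looks like, with `difflib` standing in for `diff -u` and two made-up miniature patch texts (the patch contents are invented for illustration):

```python
import difflib

# Two versions of the same tiny patch, as format-patch might emit them.
v1 = ["Subject: [PATCH] fix typo", "-teh thing", "+the thing"]
v2 = ["Subject: [PATCH v2] fix typo", "-teh thing", "+the thing!"]

# diff -u equivalent: context lines get a leading space, so unchanged
# diff lines appear as " -..." / " +..." -- the "diff of diffs" one
# has to learn to read.
delta = list(difflib.unified_diff(v1, v2, "v1", "v2", lineterm=""))
print("\n".join(delta))
```

Unchanged hunk lines show up prefixed with a space, while lines that differ between the two series are prefixed with `-`/`+`, which is why reading these takes some practice.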
[PATCH v4 6/9] add -p: adjust offsets of subsequent hunks when one is skipped
From: Phillip Wood
Since commit 8cbd431082 ("git-add--interactive: replace hunk recounting with apply --recount", 2008-7-2) if a hunk is skipped then we rely on the context lines to apply subsequent hunks in the right place. While this works most of the time, it is possible for hunks to end up being applied in the wrong place. To fix this, adjust the offset of subsequent hunks to correct for any change in the number of insertions or deletions due to the skipped hunk. The change in offset due to edited hunks that have the number of insertions or deletions changed is ignored here; it will be fixed in the next commit. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 15 +-- t/t3701-add-interactive.sh | 2 +- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 8ababa6453..7a0a5896bb 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -926,14 +926,25 @@ sub coalesce_overlapping_hunks { my @out = (); my ($last_o_ctx, $last_was_dirty); + my $ofs_delta = 0; - for (grep { $_->{USE} } @in) { + for (@in) { if ($_->{TYPE} ne 'hunk') { push @out, $_; next; } my $text = $_->{TEXT}; - my ($o_ofs) = parse_hunk_header($text->[0]); + my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) = + parse_hunk_header($text->[0]); + unless ($_->{USE}) { + $ofs_delta += $o_cnt - $n_cnt; + next; + } + if ($ofs_delta) { + $n_ofs += $ofs_delta; + $_->{TEXT}->[0] = format_hunk_header($o_ofs, $o_cnt, +$n_ofs, $n_cnt); + } if (defined $last_o_ctx && $o_ofs <= $last_o_ctx && !$_->{DIRTY} && diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 094eeca405..e3150a4e07 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -544,7 +544,7 @@ test_expect_success 'set up pathological context' ' test_write_lines +b " a" >patch ' -test_expect_failure 'add -p works with pathological context lines' ' +test_expect_success 'add -p works with pathological context lines' ' git reset && printf 
"%s\n" n y | git add -p && -- 2.16.1
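The bookkeeping this patch adds can be modelled in a few lines. The following is an illustrative Python sketch (not the Perl code itself): skipping a hunk means its insertions and deletions never happen, so every later hunk's new-file offset must shift by `o_cnt - n_cnt` of each skipped hunk:

```python
def adjust_offsets(hunks):
    """Toy model of the ofs_delta logic added to
    coalesce_overlapping_hunks: each hunk is a dict with old/new
    offsets and counts plus a 'use' flag; skipped hunks accumulate
    an offset delta applied to all later kept hunks."""
    ofs_delta = 0
    kept = []
    for h in hunks:
        if not h["use"]:
            # The skipped hunk's net line change never happens.
            ofs_delta += h["o_cnt"] - h["n_cnt"]
            continue
        kept.append(dict(h, n_ofs=h["n_ofs"] + ofs_delta))
    return kept

# First hunk (3 old lines -> 5 new lines, i.e. +2) is skipped, so the
# second hunk's new-file offset must move back by 2: 12 -> 10.
hunks = [
    {"o_ofs": 1, "o_cnt": 3, "n_ofs": 1, "n_cnt": 5, "use": False},
    {"o_ofs": 10, "o_cnt": 2, "n_ofs": 12, "n_cnt": 3, "use": True},
]
print(adjust_offsets(hunks))
```

Without this correction, `git apply` has to rely purely on context-line matching, which is exactly what the pathological-context test shows going wrong.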
[PATCH v4 5/9] t3701: add failing test for pathological context lines
From: Phillip Wood
When a hunk is skipped by add -i the offsets of subsequent hunks are not adjusted to account for any missing insertions due to the skipped hunk. Most of the time this does not matter, as apply uses the context lines to apply the subsequent hunks in the correct place. However, in pathological cases the context lines will match at the now incorrect offset and the hunk will be applied in the wrong place. The offsets of hunks following an edited hunk that has had the number of insertions or deletions changed also need to be updated in the same way. Add failing tests to demonstrate this. Signed-off-by: Phillip Wood --- Notes: changes since v2: - removed test_set_editor as it is already set changes since v1: - changed edit test to use the existing fake editor and to strip the hunk header and some context lines as well as deleting an insertion - style fixes t/t3701-add-interactive.sh | 30 ++ 1 file changed, 30 insertions(+) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index f95714230b..094eeca405 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -531,4 +531,34 @@ test_expect_success 'status ignores dirty submodules (except HEAD)' ' ! 
grep dirty-otherwise output ' +test_expect_success 'set up pathological context' ' + git reset --hard && + test_write_lines a a a a a a a a a a a >a && + git add a && + git commit -m a && + test_write_lines c b a a a a a a a b a a a a >a && + test_write_lines a a a a a a a b a a a a >expected-1 && + test_write_lines b a a a a a a a b a a a a >expected-2 && + # check editing can cope with missing header and deleted context lines + # as well as changes to other lines + test_write_lines +b " a" >patch +' + +test_expect_failure 'add -p works with pathological context lines' ' + git reset && + printf "%s\n" n y | + git add -p && + git cat-file blob :a >actual && + test_cmp expected-1 actual +' + +test_expect_failure 'add -p patch editing works with pathological context lines' ' + git reset && + # n q q below is in case edit fails + printf "%s\n" e y n q q | + git add -p && + git cat-file blob :a >actual && + test_cmp expected-2 actual +' + test_done -- 2.16.1
[PATCH v4 2/9] t3701: indent here documents
From: Phillip Wood
Indent here documents in line with the current style for tests. While at it, quote the end marker of here-docs that do not use variable interpolation. Signed-off-by: Phillip Wood Signed-off-by: Junio C Hamano --- Notes: changes since v3: - updated to match what was in pu t/t3701-add-interactive.sh | 174 ++--- 1 file changed, 87 insertions(+), 87 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 058698df6a..3130dafcf0 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -22,14 +22,14 @@ test_expect_success 'status works (initial)' ' ' test_expect_success 'setup expected' ' -cat >expected expected expected patch fake_editor.sh && - cat >>fake_editor.sh <<\EOF && -mv -f "$1" oldpatch && -mv -f patch "$1" -EOF + cat >>fake_editor.sh <<-\EOF && + mv -f "$1" oldpatch && + mv -f patch "$1" + EOF chmod a+x fake_editor.sh && test_set_editor "$(pwd)/fake_editor.sh" ' @@ -126,10 +126,10 @@ test_expect_success 'bad edit rejected' ' ' test_expect_success 'setup patch' ' -cat >patch patch expected patch expected expected expected
[PATCH v4 8/9] add -p: fix counting when splitting and coalescing
From: Phillip Wood
When a file has no trailing newline at the end, diff records this by appending "\ No newline at end of file" below the last line of the file. This line should not be counted in the hunk header. Fix the splitting and coalescing code to count files without a trailing newline properly and change one of the tests to test splitting without a trailing newline. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 13 + t/t3701-add-interactive.sh | 31 +++ 2 files changed, 32 insertions(+), 12 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 0df0c2aa06..3226c2c4f0 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -793,6 +793,11 @@ sub split_hunk { while (++$i < @$text) { my $line = $text->[$i]; my $display = $display->[$i]; + if ($line =~ /^\\/) { + push @{$this->{TEXT}}, $line; + push @{$this->{DISPLAY}}, $display; + next; + } if ($line =~ /^ /) { if ($this->{ADDDEL} && !defined $next_hunk_start) { @@ -887,8 +892,8 @@ sub merge_hunk { $o_cnt = $n_cnt = 0; for ($i = 1; $i < @{$prev->{TEXT}}; $i++) { my $line = $prev->{TEXT}[$i]; - if ($line =~ /^\+/) { - $n_cnt++; + if ($line =~ /^[+\\]/) { + $n_cnt++ if ($line =~ /^\+/); push @line, $line; next; } @@ -905,8 +910,8 @@ sub merge_hunk { for ($i = 1; $i < @{$this->{TEXT}}; $i++) { my $line = $this->{TEXT}[$i]; - if ($line =~ /^\+/) { - $n_cnt++; + if ($line =~ /^[+\\]/) { + $n_cnt++ if ($line =~ /^\+/); push @line, $line; next; } diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 4cc9517eda..a9a9478a29 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -237,31 +237,46 @@ test_expect_success 'setup patch' ' baseline content +lastline + \ No newline at end of file EOF ' -# Expected output, similar to the patch but w/ diff at the top +# Expected output, diff is similar to the patch but w/ diff at the top test_expect_success 'setup expected' ' - cat >expected <<-\EOF - diff --git a/file b/file - 
index b6f2c08..61b9053 100755 + echo diff --git a/file b/file >expected && + cat patch |sed "/^index/s/ 100644/ 100755/" >>expected && + cat >expected-output <<-\EOF --- a/file +++ b/file @@ -1,2 +1,4 @@ +firstline baseline content +lastline + \ No newline at end of file + @@ -1,2 +1,3 @@ + +firstline +baseline +content + @@ -1,2 +2,3 @@ +baseline +content + +lastline + \ No newline at end of file EOF ' # Test splitting the first patch, then adding both -test_expect_success 'add first line works' ' +test_expect_success C_LOCALE_OUTPUT 'add first line works' ' git commit -am "clear local changes" && git apply patch && - (echo s; echo y; echo y) | git add -p file && - git diff --cached > diff && - diff_cmp expected diff + printf "%s\n" s y y | git add -p file 2>error | + sed -n -e "s/^Stage this hunk[^@]*\(@@ .*\)/\1/" \ + -e "/^[-+@ ]/p" >output && + test_must_be_empty error && + git diff --cached >diff && + diff_cmp expected diff && + test_cmp expected-output output ' test_expect_success 'setup expected' ' -- 2.16.1
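The counting rule this patch fixes — the `\ No newline at end of file` marker must be carried along but never counted as a pre- or post-image line — is easy to state as a model. An illustrative Python sketch (not the Perl code itself):

```python
def count_hunk(lines):
    """Count pre-/post-image lines of a unified-diff hunk body.
    Lines starting with a backslash are the 'No newline at end of
    file' marker: keep them with the hunk but don't count them,
    mirroring the fixed splitting/coalescing logic."""
    o_cnt = n_cnt = 0
    for line in lines:
        if line.startswith("\\"):
            continue            # marker, not a file line
        elif line.startswith("+"):
            n_cnt += 1
        elif line.startswith("-"):
            o_cnt += 1
        elif line.startswith(" "):
            o_cnt += 1
            n_cnt += 1
    return o_cnt, n_cnt

body = [" context", "-old line", "+new line", "+lastline",
        "\\ No newline at end of file"]
print(count_hunk(body))
```

Counting the marker would produce a hunk header whose counts disagree with the body, which is precisely the kind of inconsistency `--recount` used to paper over.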
[PATCH v4 4/9] t3701: don't hard code sha1 hash values
From: Phillip Wood
Use a filter when comparing diffs to fix the value of non-zero hashes in diff index lines so that we're not hard-coding sha1 hash values in the expected output. This makes it easier to change the expected output if a test is edited, as we don't need to worry about the exact hash value, and means the tests will work when the hash algorithm is transitioned away from sha1. Thanks-to: Junio C Hamano Signed-off-by: Phillip Wood --- Notes: changes since v3: - fix zero hash values to seven digits changes since v2: - fix hash values in index lines rather than removing the line - reworded commit message t/t3701-add-interactive.sh | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 836ce346ed..f95714230b 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -10,6 +10,19 @@ then test_done fi +diff_cmp () { + for x + do + sed -e '/^index/s/[0-9a-f]*[1-9a-f][0-9a-f]*\.\./1234567../' \ +-e '/^index/s/\.\.[0-9a-f]*[1-9a-f][0-9a-f]*/..9abcdef/' \ +-e '/^index/s/ 00*\.\./ 000../' \ +-e '/^index/s/\.\.00*$/..000/' \ +-e '/^index/s/\.\.00* /..000 /' \ +"$x" >"$x.filtered" + done + test_cmp "$1.filtered" "$2.filtered" +} + test_expect_success 'setup (initial)' ' echo content >file && git add file && @@ -35,7 +48,7 @@ test_expect_success 'setup expected' ' test_expect_success 'diff works (initial)' ' (echo d; echo 1) | git add -i >output && sed -ne "/new file/,/content/p" diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'revert works (initial)' ' git add file && @@ -72,7 +85,7 @@ test_expect_success 'setup expected' ' test_expect_success 'diff works (commit)' ' (echo d; echo 1) | git add -i >output && sed -ne "/^index/,/content/p" diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'revert works (commit)' ' git add file && @@ -91,7 +104,7 @@ test_expect_success 'dummy edit works' ' test_set_editor : && (echo e; 
echo a) | git add -p && git diff > diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup patch' ' @@ -159,7 +172,7 @@ test_expect_success 'setup expected' ' test_expect_success 'real edit works' ' (echo e; echo n; echo d) | git add -p && git diff >output && - test_cmp expected output + diff_cmp expected output ' test_expect_success 'skip files similarly as commit -a' ' @@ -171,7 +184,7 @@ test_expect_success 'skip files similarly as commit -a' ' git reset && git commit -am commit && git diff >expected && - test_cmp expected output && + diff_cmp expected output && git reset --hard HEAD^ ' rm -f .gitignore @@ -248,7 +261,7 @@ test_expect_success 'add first line works' ' git apply patch && (echo s; echo y; echo y) | git add -p file && git diff --cached > diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup expected' ' @@ -271,7 +284,7 @@ test_expect_success 'deleting a non-empty file' ' rm non-empty && echo y | git add -p non-empty && git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'setup expected' ' @@ -290,7 +303,7 @@ test_expect_success 'deleting an empty file' ' rm empty && echo y | git add -p empty && git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success 'split hunk setup' ' @@ -355,7 +368,7 @@ test_expect_success 'patch mode ignores unmerged entries' ' +changed EOF git diff --cached >diff && - test_cmp expected diff + diff_cmp expected diff ' test_expect_success TTY 'diffs can be colorized' ' @@ -384,7 +397,7 @@ test_expect_success 'patch-mode via -i prompts for files' ' echo test >expect && git diff --cached --name-only >actual && - test_cmp expect actual + diff_cmp expect actual ' test_expect_success 'add -p handles globs' ' -- 2.16.1
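The effect of the `diff_cmp` sed filter — masking real blob hashes in `index` lines so expected output need not hard-code them — can be shown with a small Python analogue. This is only a loose model: the real filter is the sed script above and also special-cases all-zero hashes, which this sketch omits:

```python
import re

def normalize_index_line(line):
    """Mask non-zero blob hashes in a diff 'index <old>..<new> <mode>'
    line, loosely mirroring the diff_cmp sed filter in t3701
    (the all-zero-hash cases handled there are omitted here)."""
    if not line.startswith("index "):
        return line
    # Hash before '..' -> fixed 1234567, hash after '..' -> 9abcdef.
    line = re.sub(r"(?<=index )[0-9a-f]*[1-9a-f][0-9a-f]*(?=\.\.)",
                  "1234567", line)
    line = re.sub(r"(?<=\.\.)[0-9a-f]*[1-9a-f][0-9a-f]*",
                  "9abcdef", line)
    return line

print(normalize_index_line("index b6f2c08..61b9053 100755"))
```

After filtering, two diffs compare equal whenever everything except the exact blob hashes matches, which is what makes the expected files stable across hash-algorithm changes.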
[PATCH v4 0/9] Correct offsets of hunks when one is skipped
From: Phillip Wood
I've fixed the second patch to match what was in pu and added some extra code to patch 4 to handle zero sha1 hashes where the length varies. Cover letter to v1: While working on a patch series to stage selected lines from a hunk without having to edit it, I got worried that subsequent patches would be applied in the wrong place, which led to this series to correct the offsets of hunks following those that are skipped or edited. Interdiff to v3: diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 05540ee9ef..a9a9478a29 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -15,6 +15,9 @@ diff_cmp () { do sed -e '/^index/s/[0-9a-f]*[1-9a-f][0-9a-f]*\.\./1234567../' \ -e '/^index/s/\.\.[0-9a-f]*[1-9a-f][0-9a-f]*/..9abcdef/' \ +-e '/^index/s/ 00*\.\./ 000../' \ +-e '/^index/s/\.\.00*$/..000/' \ +-e '/^index/s/\.\.00* /..000 /' \ "$x" >"$x.filtered" done test_cmp "$1.filtered" "$2.filtered" @@ -32,7 +35,7 @@ test_expect_success 'status works (initial)' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF new file mode 100644 index 000..d95f3ad --- /dev/null @@ -69,7 +72,7 @@ test_expect_success 'status works (commit)' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF index 180b47c..b6f2c08 100644 --- a/file +++ b/file @@ -93,7 +96,7 @@ test_expect_success 'revert works (commit)' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF EOF ' @@ -105,7 +108,7 @@ test_expect_success 'dummy edit works' ' ' test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF @@ -1,1 +1,4 @@ this +patch @@ -129,7 +132,7 @@ test_expect_success 'bad edit rejected' ' ' test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF this patch is garbage EOF @@ -142,7 +145,7 @@ test_expect_success 'garbage edit rejected' ' ' test_expect_success 'setup patch' ' - cat >patch 
<<-\EOF @@ -1,0 +1,0 @@ baseline +content @@ -152,7 +155,7 @@ test_expect_success 'setup patch' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/file b/file index b5dd6c9..f910ae9 100644 --- a/file @@ -225,7 +228,7 @@ test_expect_success 'setup again' ' # Write the patch file with a new line at the top and bottom test_expect_success 'setup patch' ' - cat >patch <<-EOF + cat >patch <<-\EOF index 180b47c..b6f2c08 100644 --- a/file +++ b/file @@ -242,7 +245,7 @@ test_expect_success 'setup patch' ' test_expect_success 'setup expected' ' echo diff --git a/file b/file >expected && cat patch |sed "/^index/s/ 100644/ 100755/" >>expected && - cat >expected-output <<-EOF + cat >expected-output <<-\EOF --- a/file +++ b/file @@ -1,2 +1,4 @@ @@ -277,7 +280,7 @@ test_expect_success C_LOCALE_OUTPUT 'add first line works' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/non-empty b/non-empty deleted file mode 100644 index d95f3ad..000 @@ -300,7 +303,7 @@ test_expect_success 'deleting a non-empty file' ' ' test_expect_success 'setup expected' ' - cat >expected <<-EOF + cat >expected <<-\EOF diff --git a/empty b/empty deleted file mode 100644 index e69de29..000 Phillip Wood (9): add -i: add function to format hunk header t3701: indent here documents t3701: use test_write_lines and write_script t3701: don't hard code sha1 hash values t3701: add failing test for pathological context lines add -p: adjust offsets of subsequent hunks when one is skipped add -p: calculate offset delta for edited patches add -p: fix counting when splitting and coalescing add -p: don't rely on apply's '--recount' option git-add--interactive.perl | 106 + t/t3701-add-interactive.sh | 291 + 2 files changed, 243 insertions(+), 154 deletions(-) -- 2.16.1
[PATCH v4 1/9] add -i: add function to format hunk header
From: Phillip Wood
This code is duplicated in a couple of places, so make it into a function, as we're going to add some more callers shortly. Signed-off-by: Phillip Wood --- git-add--interactive.perl | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 964c3a7542..8ababa6453 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -751,6 +751,15 @@ sub parse_hunk_header { return ($o_ofs, $o_cnt, $n_ofs, $n_cnt); } +sub format_hunk_header { + my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) = @_; + return ("@@ -$o_ofs" . + (($o_cnt != 1) ? ",$o_cnt" : '') . + " +$n_ofs" . + (($n_cnt != 1) ? ",$n_cnt" : '') . + " @@\n"); +} + sub split_hunk { my ($text, $display) = @_; my @split = (); @@ -838,11 +847,7 @@ sub split_hunk { my $o_cnt = $hunk->{OCNT}; my $n_cnt = $hunk->{NCNT}; - my $head = ("@@ -$o_ofs" . - (($o_cnt != 1) ? ",$o_cnt" : '') . - " +$n_ofs" . - (($n_cnt != 1) ? ",$n_cnt" : '') . - " @@\n"); + my $head = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); my $display_head = $head; unshift @{$hunk->{TEXT}}, $head; if ($diff_use_color) { @@ -912,11 +917,7 @@ sub merge_hunk { } push @line, $line; } - my $head = ("@@ -$o0_ofs" . - (($o_cnt != 1) ? ",$o_cnt" : '') . - " +$n0_ofs" . - (($n_cnt != 1) ? ",$n_cnt" : '') . - " @@\n"); + my $head = format_hunk_header($o0_ofs, $o_cnt, $n0_ofs, $n_cnt); @{$prev->{TEXT}} = ($head, @line); } -- 2.16.1
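The convention the Perl helper encodes — omit a count from `@@ -o,c +o,c @@` when it is exactly 1 — round-trips with the existing parse_hunk_header. A Python sketch of the pair, for illustration only (the real code is the Perl above):

```python
import re

def parse_hunk_header(line):
    """Parse '@@ -o_ofs[,o_cnt] +n_ofs[,n_cnt] @@ ...'; a missing
    count defaults to 1, per the unified-diff convention."""
    m = re.match(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", line)
    o_ofs, o_cnt, n_ofs, n_cnt = m.groups()
    return (int(o_ofs), int(o_cnt or 1), int(n_ofs), int(n_cnt or 1))

def format_hunk_header(o_ofs, o_cnt, n_ofs, n_cnt):
    """Inverse of the above: emit a count only when it is not 1,
    as the new Perl format_hunk_header does."""
    old = str(o_ofs) if o_cnt == 1 else "%d,%d" % (o_ofs, o_cnt)
    new = str(n_ofs) if n_cnt == 1 else "%d,%d" % (n_ofs, n_cnt)
    return "@@ -%s +%s @@\n" % (old, new)

# Round trip: '+1' (no count) parses as count 1 and formats back
# without a count.
print(format_hunk_header(*parse_hunk_header("@@ -1,3 +1 @@ context")))
```

Having a single formatter matters for the later patches in the series, which rewrite hunk headers after adjusting offsets.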
[PATCH v4 7/9] add -p: calculate offset delta for edited patches
From: Phillip Wood
Recount the number of preimage and postimage lines in a hunk after it has been edited so any change in the number of insertions or deletions can be used to adjust the offsets of subsequent hunks. If an edited hunk is subsequently split then the offset correction will be lost. It would be possible to fix this if it is a problem; however, the code here is still an improvement on the status quo for the common case where an edited hunk is applied without being split. This is also a necessary step to removing '--recount' and '--allow-overlap' from the invocation of 'git apply'. Before '--recount' can be removed, the splitting and coalescing counting needs to be fixed to handle a missing newline at the end of a file. In order to remove '--allow-overlap' there needs to be i) some way of verifying the offset data in the edited hunk (probably by correlating the preimage (or postimage if the patch is going to be applied in reverse) lines of the edited and unedited versions to see if they are offset or if any leading/trailing context lines have been removed) and ii) a way of dealing with edited hunks that change context lines that are shared with neighbouring hunks. Signed-off-by: Phillip Wood --- Notes: changes since v1 - edited hunks are now recounted before seeing if they apply in preparation for removing '--recount' when invoking 'git apply' - added sentence to commit message about the offset data being lost if an edited hunk is split git-add--interactive.perl | 55 ++ t/t3701-add-interactive.sh | 2 +- 2 files changed, 47 insertions(+), 10 deletions(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 7a0a5896bb..0df0c2aa06 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -938,13 +938,19 @@ sub coalesce_overlapping_hunks { parse_hunk_header($text->[0]); unless ($_->{USE}) { $ofs_delta += $o_cnt - $n_cnt; + # If this hunk has been edited then subtract + # the delta that is due to the edit. 
+ $_->{OFS_DELTA} and $ofs_delta -= $_->{OFS_DELTA}; next; } if ($ofs_delta) { $n_ofs += $ofs_delta; $_->{TEXT}->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); } + # If this hunk was edited then adjust the offset delta + # to reflect the edit. + $_->{OFS_DELTA} and $ofs_delta += $_->{OFS_DELTA}; if (defined $last_o_ctx && $o_ofs <= $last_o_ctx && !$_->{DIRTY} && @@ -1016,6 +1022,30 @@ marked for discarding."), marked for applying."), ); +sub recount_edited_hunk { + local $_; + my ($oldtext, $newtext) = @_; + my ($o_cnt, $n_cnt) = (0, 0); + for (@{$newtext}[1..$#{$newtext}]) { + my $mode = substr($_, 0, 1); + if ($mode eq '-') { + $o_cnt++; + } elsif ($mode eq '+') { + $n_cnt++; + } elsif ($mode eq ' ') { + $o_cnt++; + $n_cnt++; + } + } + my ($o_ofs, undef, $n_ofs, undef) = + parse_hunk_header($newtext->[0]); + $newtext->[0] = format_hunk_header($o_ofs, $o_cnt, $n_ofs, $n_cnt); + my (undef, $orig_o_cnt, undef, $orig_n_cnt) = + parse_hunk_header($oldtext->[0]); + # Return the change in the number of lines inserted by this hunk + return $orig_o_cnt - $orig_n_cnt - $o_cnt + $n_cnt; +} + sub edit_hunk_manually { my ($oldtext) = @_; @@ -1114,25 +1144,32 @@ sub prompt_yesno { } sub edit_hunk_loop { - my ($head, $hunk, $ix) = @_; - my $text = $hunk->[$ix]->{TEXT}; + my ($head, $hunks, $ix) = @_; + my $hunk = $hunks->[$ix]; + my $text = $hunk->{TEXT}; while (1) { - $text = edit_hunk_manually($text); - if (!defined $text) { + my $newtext = edit_hunk_manually($text); + if (!defined $newtext) { return undef; } my $newhunk = { - TEXT => $text, - TYPE => $hunk->[$ix]->{TYPE}, + TEXT => $newtext, + TYPE => $hunk->{TYPE}, USE => 1, DIRTY => 1, }; + $newhunk->{OFS_DELTA} = recount_edited_hunk($text, $newtext); + # If this hunk has already been edited then add the +
[PATCH v4 9/9] add -p: don't rely on apply's '--recount' option
From: Phillip Wood
Now that add -p counts patches properly, it should be possible to turn off the '--recount' option when invoking 'git apply'. Signed-off-by: Phillip Wood --- Notes: I can't think of a reason why this shouldn't be OK but I can't help feeling slightly nervous about it. I've made it a separate patch so it can be easily dropped or reverted if I've missed something. git-add--interactive.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-add--interactive.perl b/git-add--interactive.perl index 3226c2c4f0..a64c0db57d 100755 --- a/git-add--interactive.perl +++ b/git-add--interactive.perl @@ -677,7 +677,7 @@ sub add_untracked_cmd { sub run_git_apply { my $cmd = shift; my $fh; - open $fh, '| git ' . $cmd . " --recount --allow-overlap"; + open $fh, '| git ' . $cmd . " --allow-overlap"; print $fh @_; return close $fh; } -- 2.16.1
[PATCH v4 3/9] t3701: use test_write_lines and write_script
From: Phillip Wood
Simplify things slightly by using the above helpers. Signed-off-by: Phillip Wood --- Notes: changes since v2 - fixed use of test_set_editor to match what was in pu t/t3701-add-interactive.sh | 33 + 1 file changed, 5 insertions(+), 28 deletions(-) diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 3130dafcf0..836ce346ed 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -87,13 +87,8 @@ test_expect_success 'setup expected' ' EOF ' -test_expect_success 'setup fake editor' ' - >fake_editor.sh && - chmod a+x fake_editor.sh && - test_set_editor "$(pwd)/fake_editor.sh" -' - test_expect_success 'dummy edit works' ' + test_set_editor : && (echo e; echo a) | git add -p && git diff > diff && test_cmp expected diff @@ -110,12 +105,10 @@ test_expect_success 'setup patch' ' ' test_expect_success 'setup fake editor' ' - echo "#!$SHELL_PATH" >fake_editor.sh && - cat >>fake_editor.sh <<-\EOF && + write_script "fake_editor.sh" <<-\EOF && mv -f "$1" oldpatch && mv -f patch "$1" EOF - chmod a+x fake_editor.sh && test_set_editor "$(pwd)/fake_editor.sh" ' @@ -302,18 +295,12 @@ test_expect_success 'deleting an empty file' ' test_expect_success 'split hunk setup' ' git reset --hard && - for i in 10 20 30 40 50 60 - do - echo $i - done >test && + test_write_lines 10 20 30 40 50 60 >test && git add test && test_tick && git commit -m test && - for i in 10 15 20 21 22 23 24 30 40 50 60 - do - echo $i - done >test + test_write_lines 10 15 20 21 22 23 24 30 40 50 60 >test ' test_expect_success 'split hunk "add -p (edit)"' ' @@ -334,17 +321,7 @@ test_expect_success 'split hunk "add -p (edit)"' ' ' test_expect_failure 'split hunk "add -p (no, yes, edit)"' ' - cat >test <<-\EOF && - 5 - 10 - 20 - 21 - 30 - 31 - 40 - 50 - 60 - EOF + test_write_lines 5 10 20 21 30 31 40 50 60 >test && git reset && # test sequence is s(plit), n(o), y(es), e(dit) # q n q q is there to make sure we exit at the end. -- 2.16.1
[PATCH/RFC 0/1] Avoid expensive 'repack -ad' in gc --auto
The series [1] I just sent helps reduce pack-objects memory footprint a bit. But even then it's still a huge memory hog. So this patch adds a special treatment for gc --auto: avoid it completely. The trick here is not new (pinning the largest pack with a .keep file). It's just never done automatically. I think this is a good thing to do, provided that gc --auto estimates memory usage more or less correctly. And "git gc --auto" should run even on weak machines because it's part of regular repo maintenance. You can't tell people "You can't work on the linux-2.6.git repository because your machine has too little memory". The only thing left I think I should do is to use an external rev-list to free up some more memory. But let's see how the first patch goes (documentation and tests are missing, I know). [1] https://public-inbox.org/git/%3c20180301091052.32267-1-pclo...@gmail.com%3E/ Nguyễn Thái Ngọc Duy (1): gc --auto: exclude the largest giant pack in low-memory config builtin/gc.c | 125 +++-- builtin/pack-objects.c | 2 +- config.mak.uname | 1 + git-compat-util.h | 4 ++ pack-objects.h | 2 + 5 files changed, 128 insertions(+), 6 deletions(-) -- 2.16.1.435.g8f24da2e1a
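The decision the cover letter describes — keep (don't repack) the largest pack when a configured threshold is exceeded, or when estimated repack memory would blow past a fraction of system RAM — can be sketched as a toy model. Everything here is an assumption for illustration: the function name is invented, and the real estimate in the patch is far more involved than summing pack sizes:

```python
def should_keep_largest_pack(pack_sizes, mem_total, big_pack_threshold=0):
    """Toy model of the proposed gc --auto heuristic (not the actual
    C code): pin the largest pack with a .keep file if it exceeds a
    gc.bigPackThreshold-like setting, or, absent that, if a crude
    memory estimate exceeds one third of system RAM. Sizes in bytes."""
    if not pack_sizes:
        return False
    largest = max(pack_sizes)
    if big_pack_threshold:
        return largest >= big_pack_threshold
    # Crude stand-in for the patch's pack-objects memory estimate.
    estimated_usage = sum(pack_sizes)
    return estimated_usage > mem_total // 3

# A 4 GB pack on an 8 GB desktop machine: the estimate exceeds 1/3 of
# RAM, so the largest pack would be excluded from the repack.
print(should_keep_largest_pack([4_000_000_000, 50_000_000],
                               8_000_000_000))
```

The 1/3-of-RAM fraction mirrors the desktop assumption stated in the patch; an explicit threshold bypasses the estimate entirely, which is the finer control gc.bigPackThreshold is meant to give.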
[PATCH/RFC 1/1] gc --auto: exclude the largest giant pack in low-memory config
pack-objects could be a big memory hog, especially on large repos; everybody knows that. The suggestion to stick a .keep file on the largest pack to avoid this problem has also been known for a long time. Let's apply the suggestion automatically instead of waiting for people to come to the Git mailing list and get the advice. When a certain condition is met, gc --auto creates a .keep file temporarily before repack is run, then removes it afterward.

gc --auto does this based on an estimation of pack-objects memory usage and whether that fits in one third of system memory (the assumption here is a desktop environment where many other applications are running). Since the estimation may be inaccurate and that 1/3 threshold is arbitrary, give the user finer control over this mechanism as well: if the largest pack is larger than gc.bigPackThreshold, it's kept.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/gc.c           | 125 +++--
 builtin/pack-objects.c |   2 +-
 config.mak.uname       |   1 +
 git-compat-util.h      |   4 ++
 pack-objects.h         |   2 +
 5 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c index 77fa720bd0..2d9965bcdf 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -20,6 +20,10 @@ #include "argv-array.h" #include "commit.h" #include "packfile.h" +#include "pack.h" +#include "pack-objects.h" +#include "blob.h" +#include "tree.h" #define FAILED_RUN "failed to run %s" @@ -39,6 +43,8 @@ static timestamp_t gc_log_expire_time; static const char *gc_log_expire = "1.day.ago"; static const char *prune_expire = "2.weeks.ago"; static const char *prune_worktrees_expire = "3.months.ago"; +static unsigned long big_pack_threshold = 0; +static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE; static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT; static struct argv_array reflog = ARGV_ARRAY_INIT; @@ -49,6 +55,7 @@ static struct argv_array rerere = ARGV_ARRAY_INIT; static struct tempfile *pidfile; static struct lock_file log_lock; +static struct strbuf
temp_keep_file = STRBUF_INIT; static struct string_list pack_garbage = STRING_LIST_INIT_DUP; @@ -93,6 +100,18 @@ static void process_log_file(void) } } +static void delete_temp_keep_file(void) +{ + unlink(temp_keep_file.buf); +} + +static void delete_temp_keep_file_on_signal(int signo) +{ + delete_temp_keep_file(); + sigchain_pop(signo); + raise(signo); +} + static void process_log_file_at_exit(void) { fflush(stderr); @@ -126,6 +145,9 @@ static void gc_config(void) git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire); git_config_get_expiry("gc.logexpiry", &gc_log_expire); + git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold); + git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size); + git_config(git_default_config, NULL); } @@ -164,7 +186,7 @@ static int too_many_loose_objects(void) return needed; } -static int too_many_packs(void) +static int too_many_packs(struct packed_git **largest_pack) { struct packed_git *p; int cnt; @@ -173,22 +195,104 @@ static int too_many_packs(void) return 0; prepare_packed_git(); + *largest_pack = NULL; for (cnt = 0, p = packed_git; p; p = p->next) { if (!p->pack_local) continue; if (p->pack_keep) continue; + if (!*largest_pack || (*largest_pack)->pack_size < p->pack_size) + *largest_pack = p; /* * Perhaps check the size of the pack and count only * very small ones here?
*/ cnt++; } + return gc_auto_pack_limit < cnt; } -static void add_repack_all_option(void) +static inline unsigned long total_ram(void) +{ + unsigned long default_ram = 4; +#ifdef HAVE_SYSINFO + struct sysinfo si; + + if (!sysinfo(&si)) + return si.totalram; +#elif defined(HAVE_BSD_SYSCTL) && defined(HW_MEMSIZE) + int64_t physical_memory; + int mib[2]; + int length; + + mib[0] = CTL_HW; + mib[1] = HW_MEMSIZE; + length = sizeof(int64_t); + if (!sysctl(mib, 2, &physical_memory, &length, NULL, 0)) + return physical_memory; +#elif defined(GIT_WINDOWS_NATIVE) + MEMORYSTATUSEX memInfo; + + memInfo.dwLength = sizeof(MEMORYSTATUSEX); + if (GlobalMemoryStatusEx(&memInfo)) + return memInfo.ullTotalPhys; +#else + fprintf(stderr, _("unrecognized platform, assuming %lu GB RAM\n"), + default_ram); +#endif + return default_ram * 1024 * 1024 * 1024; +} + +static int pack_objects_uses_too_much_memory(struct packed_git *pack) +{ + unsigned long nr_objects = approximate_object_count(); + size_t mem_want, mem_have; +
[PATCH 00/11] Reduce pack-objects memory footprint
The array of object_entry in pack-objects can take a lot of memory when pack-objects is run in "pack everything" mode. On linux-2.6.git, this array alone takes roughly 800MB. This series reorders some fields and reduces field sizes to keep this struct smaller. Its size goes from 136 bytes to 96 bytes (29%) on 64-bit Linux and saves 260MB on linux-2.6.git.

Now the bad side:

- the number of pack files pack-objects can handle is reduced to 4096 (previously unlimited)
- max delta chain is also limited to 4096 (previously practically unlimited)
- some patches are quite invasive (e.g. replacing a pointer with a uint32_t) and reduce readability a bit
- it may be trickier to add more data in object_entry in the future

Nguyễn Thái Ngọc Duy (11):
  pack-objects: document holes in struct object_entry.h
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: use bitfield for object_entry::depth
  pack-objects: note about in_pack_header_size
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: move in_pack out of struct object_entry
  pack-objects: faster reverse packed_git lookup
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: reorder 'hash' to pack struct object_entry
  pack-objects: increase pack file limit to 4096

 builtin/pack-objects.c | 189 ++---
 cache.h                |   3 +
 object.h               |   1 -
 pack-bitmap-write.c    |   8 +-
 pack-bitmap.c          |   2 +-
 pack-bitmap.h          |   4 +-
 pack-objects.h         |  70 ++-
 7 files changed, 180 insertions(+), 97 deletions(-)

-- 
2.16.1.435.g8f24da2e1a
[PATCH 11/11] pack-objects: increase pack file limit to 4096
When OE_IN_PACK_BITS was added, we didn't have many bits left to spare so the max number of packs that pack-objects could handle was limited to 256. Now we have more spare bits, let's increase it to 4096 to be on the safe side. If you have more than this many packs, you may need to reconsider if you're still sane. Increasing this also increases memory a bit because in_pack[] array in packing_data is bigger, roughly 32kb, which is insignificant in pack-objects context. Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/pack-objects.h b/pack-objects.h index 52087b32e5..ec4eba4ee4 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -3,7 +3,7 @@ #define OE_DFS_STATE_BITS 2 #define OE_DEPTH_BITS 12 -#define OE_IN_PACK_BITS 8 +#define OE_IN_PACK_BITS 12 #define IN_PACK_POS(to_pack, obj) \ (to_pack)->in_pack_pos[(struct object_entry *)(obj) - (to_pack)->objects] @@ -24,6 +24,11 @@ enum dfs_state { DFS_NUM_STATES }; +/* + * The size of struct nearly determines pack-objects's memory + * consumption. This struct is packed tight for that reason. When you + * add or reorder something in this struct, think a bit about this. + */ struct object_entry { struct pack_idx_entry idx; unsigned long size; /* uncompressed size */ @@ -51,7 +56,7 @@ struct object_entry { unsigned filled:1; /* assigned write-order */ unsigned dfs_state:OE_DFS_STATE_BITS; - /* XXX 12 bits hole, try to pack */ + /* XXX 8 bits hole, try to pack */ unsigned depth:OE_DEPTH_BITS; -- 2.16.1.435.g8f24da2e1a
[PATCH 10/11] pack-objects: reorder 'hash' to pack struct object_entry
Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/pack-objects.h b/pack-objects.h index f339f0411a..52087b32e5 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -33,12 +33,10 @@ struct object_entry { uint32_t delta_sibling_idx; /* other deltified objects who * uses the same base as me */ - /* XXX 4 bytes hole, try to pack */ - + uint32_t hash; /* name hint hash */ void *delta_data; /* cached delta (uncompressed) */ unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ - uint32_t hash; /* name hint hash */ unsigned char in_pack_header_size; /* note: spare bits available! */ unsigned in_pack_idx:OE_IN_PACK_BITS; /* already in pack */ unsigned type:TYPE_BITS; @@ -57,7 +55,7 @@ struct object_entry { unsigned depth:OE_DEPTH_BITS; - /* size: 104, padding: 4 */ + /* size: 96 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 09/11] pack-objects: refer to delta objects by index instead of pointer
Notice that packing_data::nr_objects is a uint32_t: we can handle at most 4G objects, and all of them can be addressed with a uint32_t. If we use a pointer here instead, we waste 4 bytes per field on a 64-bit architecture. Convert these delta pointers to indexes. Since we need to represent NULL pointers as well, the index is shifted by one [1].

There are holes in this struct, but this patch is already big; struct packing can be done separately. Even with holes, we save 8 bytes per object_entry.

[1] This means we can only index 2^32-2 objects even though nr_objects could contain 2^32-1 objects. It should not be a problem in practice because when we grow objects[], nr_alloc would probably blow up long before nr_objects hits the wall.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 124 +++--
 pack-objects.h         |  14 +++--
 2 files changed, 78 insertions(+), 60 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 1fdb85ebb5..45076f2523 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -29,6 +29,20 @@ #include "list.h" #include "packfile.h" +#define DELTA(obj) \ + ((obj)->delta_idx ? &to_pack.objects[(obj)->delta_idx - 1] : NULL) +#define DELTA_CHILD(obj) \ + ((obj)->delta_child_idx ? &to_pack.objects[(obj)->delta_child_idx - 1] : NULL) +#define DELTA_SIBLING(obj) \ + ((obj)->delta_sibling_idx ? &to_pack.objects[(obj)->delta_sibling_idx - 1] : NULL) + +#define CLEAR_DELTA(obj) (obj)->delta_idx = 0 +#define CLEAR_DELTA_CHILD(obj) (obj)->delta_child_idx = 0 +#define CLEAR_DELTA_SIBLING(obj) (obj)->delta_sibling_idx = 0 + +#define SET_DELTA(obj, val) (obj)->delta_idx = ((val) - to_pack.objects) + 1 +#define SET_DELTA_CHILD(obj, val) (obj)->delta_child_idx = ((val) - to_pack.objects) + 1 + static const char *pack_usage[] = { N_("git pack-objects --stdout [...] [< | < ]"), N_("git pack-objects [...] 
[< | < ]"), @@ -125,11 +139,11 @@ static void *get_delta(struct object_entry *entry) buf = read_sha1_file(entry->idx.oid.hash, , ); if (!buf) die("unable to read %s", oid_to_hex(>idx.oid)); - base_buf = read_sha1_file(entry->delta->idx.oid.hash, , + base_buf = read_sha1_file(DELTA(entry)->idx.oid.hash, , _size); if (!base_buf) die("unable to read %s", - oid_to_hex(>delta->idx.oid)); + oid_to_hex((entry)->idx.oid)); delta_buf = diff_delta(base_buf, base_size, buf, size, _size, 0); if (!delta_buf || delta_size != entry->delta_size) @@ -286,12 +300,12 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent size = entry->delta_size; buf = entry->delta_data; entry->delta_data = NULL; - type = (allow_ofs_delta && entry->delta->idx.offset) ? + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; } else { buf = get_delta(entry); size = entry->delta_size; - type = (allow_ofs_delta && entry->delta->idx.offset) ? + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; } @@ -315,7 +329,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent * encoding of the relative offset for the delta * base from this object's position in the pack. 
*/ - off_t ofs = entry->idx.offset - entry->delta->idx.offset; + off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset; unsigned pos = sizeof(dheader) - 1; dheader[pos] = ofs & 127; while (ofs >>= 7) @@ -341,7 +355,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent return 0; } hashwrite(f, header, hdrlen); - hashwrite(f, entry->delta->idx.oid.hash, 20); + hashwrite(f, DELTA(entry)->idx.oid.hash, 20); hdrlen += 20; } else { if (limit && hdrlen + datalen + 20 >= limit) { @@ -377,8 +391,8 @@ static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, dheader[MAX_PACK_OBJECT_HEADER]; unsigned hdrlen; - if (entry->delta) - type = (allow_ofs_delta && entry->delta->idx.offset) ? + if (DELTA(entry)) + type = (allow_ofs_delta && DELTA(entry)->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; hdrlen = encode_in_pack_object_header(header, sizeof(header), type, entry->size); @@ -406,7 +420,7 @@ static off_t
[PATCH 07/11] pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on a 64-bit arch) to store a pointer to a pack, use an index instead, since the number of packs should be relatively small. This limits the number of packs we can handle to 256 (still unreasonably many for a healthy repo). If you have more than 256 packs, you'll need an older version of Git to repack first.

This technically saves 7 bytes. But we don't see any of that in practice due to padding. The saving becomes real when we pack this struct tighter later.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 48 --
 pack-objects.h         | 12 +--
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 7bb5544883..d0d371714a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -367,7 +367,7 @@ static unsigned long write_no_reuse_object(struct hashfile *f, struct object_ent static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry, unsigned long limit, int usable_delta) { - struct packed_git *p = entry->in_pack; + struct packed_git *p = IN_PACK(&to_pack, entry); struct pack_window *w_curs = NULL; struct revindex_entry *revidx; off_t offset; @@ -478,7 +478,7 @@ static off_t write_object(struct hashfile *f, if (!reuse_object) to_reuse = 0; /* explicit */ - else if (!entry->in_pack) + else if (!IN_PACK(&to_pack, entry)) to_reuse = 0; /* can't reuse what we don't have */ else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA) /* check_object() decided it for us ...
*/ @@ -1074,7 +1074,15 @@ static void create_object_entry(const struct object_id *oid, else nr_result++; if (found_pack) { - entry->in_pack = found_pack; + int i; + + for (i = 0; i < (1 << OE_IN_PACK_BITS); i++) + if (to_pack.in_pack[i] == found_pack) { + entry->in_pack_idx = i; + break; + } + if (i == (1 << OE_IN_PACK_BITS)) + die("BUG: pack not found!"); entry->in_pack_offset = found_offset; } @@ -1399,8 +1407,8 @@ static void cleanup_preferred_base(void) static void check_object(struct object_entry *entry) { - if (entry->in_pack) { - struct packed_git *p = entry->in_pack; + if (IN_PACK(_pack, entry)) { + struct packed_git *p = IN_PACK(_pack, entry); struct pack_window *w_curs = NULL; const unsigned char *base_ref = NULL; struct object_entry *base_entry; @@ -1535,14 +1543,16 @@ static int pack_offset_sort(const void *_a, const void *_b) { const struct object_entry *a = *(struct object_entry **)_a; const struct object_entry *b = *(struct object_entry **)_b; + const struct packed_git *a_in_pack = IN_PACK(_pack, a); + const struct packed_git *b_in_pack = IN_PACK(_pack, b); /* avoid filesystem trashing with loose objects */ - if (!a->in_pack && !b->in_pack) + if (!a_in_pack && !b_in_pack) return oidcmp(>idx.oid, >idx.oid); - if (a->in_pack < b->in_pack) + if (a_in_pack < b_in_pack) return -1; - if (a->in_pack > b->in_pack) + if (a_in_pack > b_in_pack) return 1; return a->in_pack_offset < b->in_pack_offset ? -1 : (a->in_pack_offset > b->in_pack_offset); @@ -1578,7 +1588,7 @@ static void drop_reused_delta(struct object_entry *entry) oi.sizep = >size; oi.typep = - if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) { + if (packed_object_info(IN_PACK(_pack, entry), entry->in_pack_offset, ) < 0) { /* * We failed to get the info from this pack for some reason; * fall back to sha1_object_info, which may find another copy. 
@@ -1848,8 +1858,8 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, * it, we will still save the transfer cost, as we already know * the other side has it and we won't send src_entry at all. */ - if (reuse_delta && trg_entry->in_pack && - trg_entry->in_pack == src_entry->in_pack && + if (reuse_delta && IN_PACK(_pack, trg_entry) && + IN_PACK(_pack, trg_entry) == IN_PACK(_pack, src_entry) && !src_entry->preferred_base && trg_entry->in_pack_type != OBJ_REF_DELTA && trg_entry->in_pack_type != OBJ_OFS_DELTA) @@ -2958,6 +2968,21 @@ static int option_parse_unpack_unreachable(const struct option *opt, return 0; } +static void init_in_pack_mapping(struct
[PATCH 04/11] pack-objects: use bitfield for object_entry::depth
This does not give us any saving due to padding. But we will be able to save once we cut 4 bytes out of this struct in a subsequent patch. Because of struct packing from now on we can only handle max depth 4095 (or even lower when new booleans are added in this struct). This should be ok since long delta chain will cause significant slow down anyway. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 4 pack-objects.h | 6 ++ 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a4dbb40824..cfd97da7db 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3068,6 +3068,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (pack_to_stdout != !base_name || argc) usage_with_options(pack_usage, pack_objects_options); + if (depth > (1 << OE_DEPTH_BITS)) + die(_("delta chain depth %d is greater than maximum limit %d"), + depth, (1 << OE_DEPTH_BITS)); + argv_array_push(, "pack-objects"); if (thin) { use_internal_rev_list = 1; diff --git a/pack-objects.h b/pack-objects.h index fca334ab4d..3941e6c9a6 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -2,6 +2,7 @@ #define PACK_OBJECTS_H #define OE_DFS_STATE_BITS 2 +#define OE_DEPTH_BITS 12 /* * State flags for depth-first search used for analyzing delta cycles. @@ -43,10 +44,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ unsigned dfs_state:OE_DFS_STATE_BITS; - - /* XXX 20 bits hole, try to pack */ - - int depth; + unsigned depth:OE_DEPTH_BITS; /* size: 120, padding: 4 */ }; -- 2.16.1.435.g8f24da2e1a
[PATCH 06/11] pack-objects: move in_pack_pos out of struct object_entry
This field is only needed for pack-bitmap, which is an optional feature. Move it to a separate array that is only allocated when pack-bitmap is used (it is never freed, in the same way that objects[] is not). This saves us 8 bytes in struct object_entry.

Signed-off-by: Nguyễn Thái Ngọc Duy
---
 builtin/pack-objects.c | 3 ++-
 pack-bitmap-write.c    | 8 +---
 pack-bitmap.c          | 2 +-
 pack-bitmap.h          | 4 +++-
 pack-objects.h         | 8 ++--
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index cfd97da7db..7bb5544883 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -878,7 +878,8 @@ static void write_pack_file(void) if (write_bitmap_index) { bitmap_writer_set_checksum(oid.hash); - bitmap_writer_build_type_index(written_list, nr_written); + bitmap_writer_build_type_index( + &to_pack, written_list, nr_written); } finish_tmp_packfile(&tmpname, pack_tmp_name, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index e01f992884..1360a93311 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show) /** * Build the initial type index for the packfile */ -void bitmap_writer_build_type_index(struct pack_idx_entry **index, +void bitmap_writer_build_type_index(struct packing_data *to_pack, + struct pack_idx_entry **index, uint32_t index_nr) { uint32_t i; @@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry **index, writer.trees = ewah_new(); writer.blobs = ewah_new(); writer.tags = ewah_new(); + ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects); for (i = 0; i < index_nr; ++i) { struct object_entry *entry = (struct object_entry *)index[i]; enum object_type real_type; - entry->in_pack_pos = i; + IN_PACK_POS(to_pack, entry) = i; switch (entry->type) { case OBJ_COMMIT: @@ -147,7 +149,7 @@ static uint32_t find_object_pos(const unsigned char *sha1) "(object %s is missing)", sha1_to_hex(sha1)); } - return entry->in_pack_pos; + return
IN_PACK_POS(writer.to_pack, entry); } static void show_object(struct object *object, const char *name, void *data) diff --git a/pack-bitmap.c b/pack-bitmap.c index 9270983e5f..f21479fe16 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1032,7 +1032,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, oe = packlist_find(mapping, sha1, NULL); if (oe) - reposition[i] = oe->in_pack_pos + 1; + reposition[i] = IN_PACK_POS(mapping, oe) + 1; } rebuild = bitmap_new(); diff --git a/pack-bitmap.h b/pack-bitmap.h index 3742a00e14..5ded2f139a 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bi void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(unsigned char *sha1); -void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t index_nr); +void bitmap_writer_build_type_index(struct packing_data *to_pack, + struct pack_idx_entry **index, + uint32_t index_nr); void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); diff --git a/pack-objects.h b/pack-objects.h index 017cc3425f..3bef28196c 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -4,6 +4,9 @@ #define OE_DFS_STATE_BITS 2 #define OE_DEPTH_BITS 12 +#define IN_PACK_POS(to_pack, obj) \ + (to_pack)->in_pack_pos[(struct object_entry *)(obj) - (to_pack)->objects] + /* * State flags for depth-first search used for analyzing delta cycles. * @@ -31,7 +34,6 @@ struct object_entry { unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ uint32_t hash; /* name hint hash */ - unsigned int in_pack_pos; unsigned char in_pack_header_size; /* note: spare bits available! 
*/ unsigned type:TYPE_BITS; unsigned in_pack_type:TYPE_BITS; /* could be delta */ @@ -46,7 +48,7 @@ struct object_entry { unsigned dfs_state:OE_DFS_STATE_BITS; unsigned depth:OE_DEPTH_BITS; - /* size: 120, padding: 4 */ + /* size: 112 */ }; struct packing_data { @@ -55,6 +57,8 @@ struct
[PATCH 02/11] pack-objects: turn type and in_pack_type to bitfields
This saves 8 bytes in sizeof(struct object_entry). On a large repository like linux-2.6.git (6.5M objects), this saves us 52MB memory. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 14 -- cache.h| 2 ++ object.h | 1 - pack-objects.h | 8 4 files changed, 18 insertions(+), 7 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5c674b2843..fd217cb51f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry) unsigned long avail; off_t ofs; unsigned char *buf, c; + enum object_type type; buf = use_pack(p, _curs, entry->in_pack_offset, ); @@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry) * since non-delta representations could still be reused. */ used = unpack_object_header_buffer(buf, avail, - >in_pack_type, + , >size); if (used == 0) goto give_up; + if (type < 0) + die("BUG: invalid type %d", type); + entry->in_pack_type = type; + /* * Determine if this is a delta and if so whether we can * reuse it or not. 
Otherwise let's find out as cheaply as @@ -1559,6 +1564,7 @@ static void drop_reused_delta(struct object_entry *entry) { struct object_entry **p = >delta->delta_child; struct object_info oi = OBJECT_INFO_INIT; + enum object_type type; while (*p) { if (*p == entry) @@ -1570,7 +1576,7 @@ static void drop_reused_delta(struct object_entry *entry) entry->depth = 0; oi.sizep = >size; - oi.typep = >type; + oi.typep = if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) { /* * We failed to get the info from this pack for some reason; @@ -1580,6 +1586,10 @@ static void drop_reused_delta(struct object_entry *entry) */ entry->type = sha1_object_info(entry->idx.oid.hash, >size); + } else { + if (type < 0) + die("BUG: invalid type %d", type); + entry->type = type; } } diff --git a/cache.h b/cache.h index 21fbcc2414..862bdff83a 100644 --- a/cache.h +++ b/cache.h @@ -373,6 +373,8 @@ extern void free_name_hash(struct index_state *istate); #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(_index, (path), (sz)) #endif +#define TYPE_BITS 3 + enum object_type { OBJ_BAD = -1, OBJ_NONE = 0, diff --git a/object.h b/object.h index 87563d9056..8ce294d6ec 100644 --- a/object.h +++ b/object.h @@ -25,7 +25,6 @@ struct object_array { #define OBJECT_ARRAY_INIT { 0, 0, NULL } -#define TYPE_BITS 3 /* * object flag allocation: * revision.h: 0-1026 diff --git a/pack-objects.h b/pack-objects.h index 720a8e8756..f8b06e2521 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -14,11 +14,11 @@ struct object_entry { void *delta_data; /* cached delta (uncompressed) */ unsigned long delta_size; /* delta data size (uncompressed) */ unsigned long z_delta_size; /* delta data size (compressed) */ - enum object_type type; - enum object_type in_pack_type; /* could be delta */ uint32_t hash; /* name hint hash */ unsigned int in_pack_pos; unsigned char in_pack_header_size; + unsigned type:TYPE_BITS; + unsigned in_pack_type:TYPE_BITS; /* could be delta */ unsigned 
preferred_base:1; /* * we do not pack this, but is available * to be used as the base object to delta @@ -28,7 +28,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ - /* XXX 28 bits hole, try to pack */ + /* XXX 22 bits hole, try to pack */ /* * State flags for depth-first search used for analyzing delta cycles. * @@ -41,7 +41,7 @@ struct object_entry { DFS_DONE } dfs_state; int depth; - /* size: 136, padding: 4 */ + /* size: 128, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 01/11] pack-objects: document holes in struct object_entry.h
Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pack-objects.h b/pack-objects.h index 03f1191659..720a8e8756 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -28,6 +28,7 @@ struct object_entry { unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ + /* XXX 28 bits hole, try to pack */ /* * State flags for depth-first search used for analyzing delta cycles. * @@ -40,6 +41,7 @@ struct object_entry { DFS_DONE } dfs_state; int depth; + /* size: 136, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
[PATCH 05/11] pack-objects: note about in_pack_header_size
Object header in a pack is packed really tight (see pack-format.txt). Even with 8 bytes length, we need 9-10 bytes most, plus a hash (20 bytes). Which means this field only needs to store a number as big as 32 (5 bits). This is trickier to pack tight though since a new hash algorithm is coming, the number of bits needed may quickly increase. So leave it for now. Signed-off-by: Nguyễn Thái Ngọc Duy--- pack-objects.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-objects.h b/pack-objects.h index 3941e6c9a6..017cc3425f 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -32,7 +32,7 @@ struct object_entry { unsigned long z_delta_size; /* delta data size (compressed) */ uint32_t hash; /* name hint hash */ unsigned int in_pack_pos; - unsigned char in_pack_header_size; + unsigned char in_pack_header_size; /* note: spare bits available! */ unsigned type:TYPE_BITS; unsigned in_pack_type:TYPE_BITS; /* could be delta */ unsigned preferred_base:1; /* -- 2.16.1.435.g8f24da2e1a
[PATCH 08/11] pack-objects: faster reverse packed_git lookup
We do a linear search for in_pack index in create_object_entry(). This function is called for every available object in the worst case (and on linux-2.6.git, that's about 6.5M). Try to avoid that by saving the index in packed_git. Since we should not have zillions of packs, this extra space should not be a big deal. Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 11 ++- cache.h| 1 + 2 files changed, 3 insertions(+), 9 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index d0d371714a..1fdb85ebb5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1074,15 +1074,7 @@ static void create_object_entry(const struct object_id *oid, else nr_result++; if (found_pack) { - int i; - - for (i = 0; i < (1 << OE_IN_PACK_BITS); i++) - if (to_pack.in_pack[i] == found_pack) { - entry->in_pack_idx = i; - break; - } - if (i == (1 << OE_IN_PACK_BITS)) - die("BUG: pack not found!"); + entry->in_pack_idx = found_pack->index; entry->in_pack_offset = found_offset; } @@ -2980,6 +2972,7 @@ static void init_in_pack_mapping(struct packing_data *to_pack) if (i >= (1 << OE_IN_PACK_BITS)) die("BUG: too many packs to handle!"); to_pack->in_pack[i] = p; + p->index = i; } } diff --git a/cache.h b/cache.h index 862bdff83a..b90feb3802 100644 --- a/cache.h +++ b/cache.h @@ -1635,6 +1635,7 @@ extern struct packed_git { int index_version; time_t mtime; int pack_fd; + int index; /* for builtin/pack-objects.c */ unsigned pack_local:1, pack_keep:1, freshened:1, -- 2.16.1.435.g8f24da2e1a
[PATCH 03/11] pack-objects: use bitfield for object_entry::dfs_state
Signed-off-by: Nguyễn Thái Ngọc Duy--- builtin/pack-objects.c | 3 +++ pack-objects.h | 33 - 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index fd217cb51f..a4dbb40824 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) OPT_END(), }; + if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS)) + die("BUG: too many dfs states, increase OE_DFS_STATE_BITS"); + check_replace_refs = 0; reset_pack_idx_option(_idx_opts); diff --git a/pack-objects.h b/pack-objects.h index f8b06e2521..fca334ab4d 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -1,6 +1,21 @@ #ifndef PACK_OBJECTS_H #define PACK_OBJECTS_H +#define OE_DFS_STATE_BITS 2 + +/* + * State flags for depth-first search used for analyzing delta cycles. + * + * The depth is measured in delta-links to the base (so if A is a delta + * against B, then A has a depth of 1, and B a depth of 0). + */ +enum dfs_state { + DFS_NONE = 0, + DFS_ACTIVE, + DFS_DONE, + DFS_NUM_STATES +}; + struct object_entry { struct pack_idx_entry idx; unsigned long size; /* uncompressed size */ @@ -27,21 +42,13 @@ struct object_entry { unsigned no_try_delta:1; unsigned tagged:1; /* near the very tip of refs */ unsigned filled:1; /* assigned write-order */ + unsigned dfs_state:OE_DFS_STATE_BITS; + + /* XXX 20 bits hole, try to pack */ - /* XXX 22 bits hole, try to pack */ - /* -* State flags for depth-first search used for analyzing delta cycles. -* -* The depth is measured in delta-links to the base (so if A is a delta -* against B, then A has a depth of 1, and B a depth of 0). -*/ - enum { - DFS_NONE = 0, - DFS_ACTIVE, - DFS_DONE - } dfs_state; int depth; - /* size: 128, padding: 4 */ + + /* size: 120, padding: 4 */ }; struct packing_data { -- 2.16.1.435.g8f24da2e1a
Re: Reduce pack-objects memory footprint?
On Wed, Feb 28, 2018 at 06:22:33PM +0000, Eric Wong wrote:
> Duy Nguyen wrote:
> > which saves 12 bytes (or another 74 MB). 222 MB total is plenty of
> > space to keep some file cache from being evicted.
>
> Nice! I can definitely benefit from lower memory usage when
> packing. Fwiw, I use pahole with other projects to help find
> packing opportunities:
>
> git://git.kernel.org/pub/scm/devel/pahole/pahole.git

Yes, it's a wonderful tool.

> > @@ -14,11 +26,10 @@ struct object_entry {
> >  	void *delta_data;	/* cached delta (uncompressed) */
> >  	unsigned long delta_size;	/* delta data size (uncompressed) */
> >  	unsigned long z_delta_size;	/* delta data size (compressed) */
> > -	enum object_type type;
> > -	enum object_type in_pack_type;	/* could be delta */
> >  	uint32_t hash;	/* name hint hash */
> > -	unsigned int in_pack_pos;
> >  	unsigned char in_pack_header_size;
> > +	unsigned type:3; /* enum object_type */
> > +	unsigned in_pack_type:3; /* enum object_type - could be delta */
>
> For C99 compilers, enums can be bitfields. I introduced the
> following macro into Ruby a few weeks ago to remain compatible
> with non-C99 compilers:
>
> /*
>  * For declaring bitfields out of non-unsigned int types:
>  *    struct date {
>  *       BITFIELD(enum months) month:4;
>  *       ...
>  *    };
>  */
> #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
> # define BITFIELD(type) type
> #else
> # define BITFIELD(type) unsigned int
> #endif

I tried this and got

    In file included from builtin/pack-objects.c:20:0:
    ./pack-objects.h:49:19: error: 'type' is narrower than values of its type [-Werror]
      enum object_type type:TYPE_BITS;
                       ^~~~

The compiler is not wrong. What it does not realize is that pack-objects code never uses the out-of-range values (OBJ_BAD and OBJ_ANY), but I don't see how I could suppress this warning. So I went back to non-enum bitfields.
--
Duy
Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.
On 1 March 2018 at 08:36, Jeff King wrote:
> On Wed, Feb 28, 2018 at 05:51:14PM +0100, demerphq wrote:
>
>> I would look into putting it into a module and then using the PERL5OPT
>> environment var to have it loaded automagically in any of your perl
>> scripts.
>>
>> For instance if you put that code into a module called Git/DieTrap.pm
>>
>> then you could do:
>>
>> PERL5OPT=-MGit::DieTrap
>>
>> in your test setup code, assuming you have some. Then you don't need to
>> change any of your scripts, just the test runner framework.
>
> That's a clever trick.
>
> It's not clear to me though if we just want to tweak the programs run in
> the test scripts in order to get test_must_fail to stop complaining, or
> if we consider the unusual exit codes from our perl-based Git programs
> to be an error that should be fixed for real use, too.

Yeah, that is a decision you guys need to make; I am not familiar
enough with the issues to make any useful comment.

But I wanted to say that I will bring this subject up on perl5porters.
The exit code triggered by a die is a regular cause of trouble for more
than just you guys, and maybe we can get it changed for the future.
Nevertheless, even if there is consensus that it can be changed, it will
take years before the change is widely distributed enough to be useful
to you. :-(

I'll be bold and say sorry on behalf of the perl committers for this.
Perl is so old that sometimes things that used to make sense don't make
sense anymore.

cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"