RE: cherry-pick very slow on big repository
Since this is happening during a merge, you might need to use merge.renameLimit or the merge strategy option of -Xno-renames. Although the code does fallback to use the diff.renameLimit but there is still a lot that is done before even checking the rename limit so I would first try getting renames turned off. Thanks, Kevin > -Original Message- > From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf > Of Peter Krefting > Sent: Friday, November 10, 2017 7:05 AM > To: Derrick Stolee > Cc: Jeff King ; Git Mailing List > Subject: Re: cherry-pick very slow on big repository > > Derrick Stolee: > > > Git is spending time detecting renames, which implies you probably > > renamed a folder or added and deleted a large number of files. This > > rename detection is quadratic (# adds times # deletes). > > Yes, a couple of directories with a lot of template files have been > renamed (and some removed, some added) between the current development > branch and this old maintenance branch. I get the "Performing inexact > rename detection" a lot when merging changes in the other direction. > > However, none of them applies to these particular commits, which only > touches files that are in the exact same location on both branches. > > > You can remove this rename detection by running your cherry-pick > > with `git -c diff.renameLimit=1 cherry-pick ...` > > That didn't work, actually it failed to finish with this setting in > effect, it hangs in such a way that I can't stop it with Ctrl+C > (neither when running from the command line, nor when running inside > gdb). It didn't finish in the 20 minutes I gave it. > > I also tried with diff.renames=false, which also seemed to fail. > > -- > \\// Peter - > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.softw > olves.pp.se%2F&data=02%7C01%7Ckewillf%40microsoft.com%7C6b831a75739e4 > 0428d3808d52844106c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636 > 459195209466999&sdata=kJtNLAs1LSoPy%2B%2BNADJkuEBPMZVcxkSkKzOEEeIG > VpM%3D&reserved=0
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
From: Junio C Hamano [mailto:gits...@pobox.com] Sent: Thursday, September 14, 2017 11:00 PM > > Kevin Willford writes: > > > 1. Does this statement, "I only care about the files in this > > sparse checkout, and do not concern me with anything else", mean > > that git should not change files outside the sparse-checkout whether > > that be in the working directory or in the index? Or does that only > > apply to the working directory and the index version of files can > > change to whatever git the git command would do without using > > sparse? For example if I am doing a 'git reset HEAD~1' should the > > version in the index of files outside the sparse not be changed or > > change to the HEAD~1 version with the skip-worktree bit on? > > My understanding of the purpose of "sparse checkout" thing is that > the user still wants to create correct whole-tree commit even the > user does not have the whole-tree checkout. The object names for > blobs recorded in the index that are meant to be included in the > next commit MUST be the same as those that would be in the index > when the "sparse" feature is not in use. "reset HEAD~1" should > match the index entries to the tree entries in HEAD~1. So, the > latter, I would think, among your two alternatives. > > IOW, after "git reset HEAD~", if you drop the skip-worktree bit from > all index entries, "git diff --cached HEAD" must say "there is no > changes". > > The only difference between the "sparse" and normal would be that, > because the "sparse" user does not intend to change anything outside > the "sparse" area, these paths outside her "sparse" area would not > materialize on the filesystem. For the next "write-tree" out of the > index to still write the correct tree out, the entries outside her > "sparse" area in the index MUST match the tree of the commit she > started working from. > Makes sense. And even though the reset might only change entries outside the sparse area and the next status will report "nothing to commit, working tree clean", that's okay because the user hasn't changed anything in their sparse area and intended to roll back the index to whatever they specified in their reset command. > > 2. How will this work with other git commands like merge, rebase, > > cherry-pick, etc.? > > 3. What about when there is a merge conflict with a file that is outside > > the sparse checkout? > > I would say, rather than forbidding such a merge, it should let her > see and deal with the conflict by dropping the "this is outside the > sparse area, so do not bother materializing it to the filesystem" > bit, but tell her loudly what it did ("I checked out a half-merged > file outside your sparse-checkout area because you'd need it while > resolving the conflict"). By doing things that way, the user can > decide if she wants to go ahead and complete the merge, even if the > conflict is outside the area she is currently interested in, or > postpone the merge and continue working on what she has been working > on inside the narrowed-down area first. > > I do not have a strong opinion whether the sparse-checkout > configuration file should be adjusted to match when the command must > tentatively bust the sparse checkout area; I'd imagine it can be > argued either way. > > Note that "sparse" is not my itch, and I would not be surprised if > those who designed it may want it to work differently from my > knee-jerk reaction in the previous two paragraphs, and I may even > find such an alternative solution preferable. > > But it is highly unlikely for any sensible solution would violate > the basic premise, i.e. "the indexed contents will stay the same as > the case without any 'sparse', so the next write-tree will do the > right thing". There was one other case that I thought about while implementing this approach and it is when the user creates a file that is outside their sparse definition. From your explanation above I will attempt to explain how I think it should work and please correct me if you see it working differently. The user creates a file that is outside the sparse area and it will show up as untracked. No problem here since the untracked are outside the scope of using sparse. Next the user adds the untracked file to the index. The skip-worktree bit should be off and stay off since the user could make additional changes and want to add them. Once the user commits the newly created file, I could see turning on the skip-worktree bit and removing the file from the working tree after the co
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
> From: Junio C Hamano [mailto:gits...@pobox.com] > Sent: Wednesday, September 13, 2017 4:18 PM > > Jacob Keller writes: > > > By definition if you do a sparse checkout, you're telling git "I only > > care about the files in this sparse checkout, and do not concern me > > with anything else"... So the proposed fix is "since git cleared the > > skip-worktree flag, we should actually also copy the file out again." > > but I think the real problem is that we're clearing skip worktree to > > begin with? > > As this 100% agree with what I've been saying, I do not have > anything to add to the discussion at the moment, so I'll stop > speaking now but will continue to monitor the thread so that I may > hear a new argument and datapoint that would make me change my mind. > > Thanks for a healthy discussion. Hi Junio, Thanks for the feedback. I will give this implementation a try. I still have the following unanswered questions, if you wouldn't mind answering. 1. Does this statement, "I only care about the files in this sparse checkout, and do not concern me with anything else", mean that git should not change files outside the sparse-checkout whether that be in the working directory or in the index? Or does that only apply to the working directory and the index version of files can change to whatever git the git command would do without using sparse? For example if I am doing a 'git reset HEAD~1' should the version in the index of files outside the sparse not be changed or change to the HEAD~1 version with the skip-worktree bit on? 2. How will this work with other git commands like merge, rebase, cherry-pick, etc.? 3. What about when there is a merge conflict with a file that is outside the sparse checkout? Should there not be a conflict because git shouldn't be trying to merge the file since it is outside the sparse checkout area? Should the conflicted file get written outside the sparse checkout so the user can resolve it? Thanks, Kevin
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
> From: Jacob Keller [mailto:jacob.kel...@gmail.com] > Sent: Tuesday, September 12, 2017 7:39 PM > > On Tue, Sep 12, 2017 at 4:30 PM, Kevin Willford wrote: > > > > Sorry. It was not in the sparse-checkout file. > > $ git config core.sparsecheckout true > > $ echo /i > .git/info/sparse-checkout > > $ git checkout -f > > was ran in the modified file case as well and "i" was the only file in the > > working directory before reset. > > > > > I'm assuming then that you mean that some file "m" is modified by the > commit, and do not mean to say that it has local modifications in the > working tree? That is not what I had inferred from earlier, so I was > very much confused. > Yes > In this example, the only file actually checked out is "i", as > everything else will have the skip-worktree bit set..? > Yes > So doing git reset HEAD~1 will reset the branch back one commit, and > now because of this reset is clearing the skip worktree flag, and > since you never had a copy of it checked out it is notified as > deleted, because it's no longer marked as skip-worktree? > Correct > > I think the real fix is to stop having reset clear skip-worktree, no? > > By definition if you do a sparse checkout, you're telling git "I only > care about the files in this sparse checkout, and do not concern me > with anything else"... So the proposed fix is "since git cleared the > skip-worktree flag, we should actually also copy the file out again." > but I think the real problem is that we're clearing skip worktree to > begin with? This certainly is an option but I would have some questions with this approach. Does this statement, "I only care about the files in this sparse checkout, and do not concern me with anything else", mean that git should not change files outside the sparse-checkout whether they are in the working directory or the index? Or does that only apply to the working directory and the index version of files can change to whatever git decides? So in the example above would "m" be the HEAD~1 version of the file in the index or the HEAD version before the reset? Does this apply to all git commands, merge, rebase, cherry-pick, etc? What about when there is a merge conflict with a file that is outside the sparse checkout? Seems to me it is a lot more complex than only caring about the files in the sparse checkout and no concern for anything else. Personally I would like to error on the side of letting the user decide what they want to do, even if that means turning off the skip-worktree bit and putting the working directory into an expected state.
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
> From: Jacob Keller [mailto:jacob.kel...@gmail.com] > Sent: Tuesday, September 12, 2017 4:29 PM > > On Tue, Sep 12, 2017 at 1:20 PM, Kevin Willford wrote: > > > > I think this is where I need to do a better job of explaining so here is a > > simple example. > > > > I have a file "a" that was added in the latest commit. > > $ git log --oneline > > c1fa646 (HEAD -> reset, master) add file a > > 40b342c Initial commit with file i > > > > Running the reset without using a sparse-checkout file > > > > $ git reset HEAD~1 > > $ git status > > On branch reset > > Untracked files: > > (use "git add ..." to include in what will be committed) > > > > a > > > > nothing added to commit but untracked files present (use "git add" to track) > > > > Turning on sparse-checkout and running checkout to make my working > > directory sparse > > > > $ git config core.sparsecheckout true > > $ echo /i > .git/info/sparse-checkout > > $ git checkout -f > > > > Running reset gives me > > $ git reset HEAD~1 > > $ git status > > On branch reset > > nothing to commit, working tree clean > > $ git ls-files > > i > > > > file a is gone. Not in the index and not in the working directory. > > Nothing to let the user know that anything changed. > > > > With a modified file no sparse-checkout > > $ git log --oneline > > 6fbd34a (HEAD -> reset, modified) modified file m > > c734d72 Initial commit with file i and m > > $ git reset HEAD~1 > > Unstaged changes after reset: > > M m > > $ git status > > On branch reset > > Changes not staged for commit: > > (use "git add ..." to update what will be committed) > > (use "git checkout -- ..." to discard changes in working directory) > > > > modified: m > > > > no changes added to commit (use "git add" and/or "git commit -a") > > > > With sparse-checkout > > $ git reset HEAD~1 > > Unstaged changes after reset: > > D m > > $ git status > > On branch reset > > Changes not staged for commit: > > (use "git add/rm ..." to update what will be committed) > > (use "git checkout -- ..." to discard changes in working directory) > > > > deleted:m > > > > no changes added to commit (use "git add" and/or "git commit -a") > > > > Wasn't "m" outside the sparse checkout? Or was it a file in the sparse > checkout? I mean to say, the file after setting up sparse checkout was > one of the "interesting" files that sparse checked out? > > Or was it in fact a separate file which wasn't there? Sorry. It was not in the sparse-checkout file. $ git config core.sparsecheckout true $ echo /i > .git/info/sparse-checkout $ git checkout -f was ran in the modified file case as well and "i" was the only file in the working directory before reset. > > I would think that in sparse-checkout world, you should only *ever* > have the files you list in sparse. > > So files outside sparse world should be ignored, not shown and not > show up in status, but they should absolutely not show up in the > working tree either. Sparse checkout uses the skip-worktree bit which has the following documentation: https://git-scm.com/docs/git-update-index "Skip-worktree bit can be defined in one (long) sentence: When reading an entry, if it is marked as skip-worktree, then Git pretends its working directory version is up to date and read the index version instead. To elaborate, "reading" means checking for file existence, reading file attributes or file content. The working directory version may be present or absent. If present, its content may match against the index version or not. Writing is not affected by this bit, content safety is still first priority. Note that Git can update working directory file, that is marked skip-worktree, if it is safe to do so (i.e. working directory version matches index version)" I'm not saying that is the right behavior, just pointing out the documentation. And from my experience git will turn on and off the skip-worktree flag depending on the command as we see happening with reset. > > You're not "changing" any commits, because the status of the file at > HEAD~1 is exactly what HEAD~1 says it is, but you just don't have a > checked out copy of it. In the case of reset yes and it matches HEAD and if the sparse flag was on, the user does not know that the file was changed
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
> From: Junio C Hamano [mailto:gits...@pobox.com] > Sent: Monday, September 11, 2017 9:57 PM > > Let's imagine a path P that is outside the sparse checkout area. > And we check out a commit that has P. P would still be recorded in > the index but it would not materialize in the working tree. "git > status" and friends are asked not to treat this as "locally removed", > to prevent "commit -a" from recording a removal, of course. > > Now, let's further imagine that you have a checkout of the same > project but at a commit that does not have P. Then you reset to > another commit that does have P. My understanding of what Kevin's > first test wants to demonstrate is that the index is populated with > P (because you did reset to a commit with that path) but it does not > materialize in the working tree (perhaps because that is outside the > sparse checkout area?), yet there is something missing compared to > the earlier case where "git status" and friends are asked not to > treat P as "locally removed". They instead show P as locally removed, > and "commit -a" would record a removal---that is indeed a problem. > > Am I reading the problem description correctly so far? If so, then > my answer to my first question (are we solving a right problem?) is > yes. > I think this is where I need to do a better job of explaining so here is a simple example. I have a file "a" that was added in the latest commit. $ git log --oneline c1fa646 (HEAD -> reset, master) add file a 40b342c Initial commit with file i Running the reset without using a sparse-checkout file $ git reset HEAD~1 $ git status On branch reset Untracked files: (use "git add ..." to include in what will be committed) a nothing added to commit but untracked files present (use "git add" to track) Turning on sparse-checkout and running checkout to make my working directory sparse $ git config core.sparsecheckout true $ echo /i > .git/info/sparse-checkout $ git checkout -f Running reset gives me $ git reset HEAD~1 $ git status On branch reset nothing to commit, working tree clean $ git ls-files i file a is gone. Not in the index and not in the working directory. Nothing to let the user know that anything changed. With a modified file no sparse-checkout $ git log --oneline 6fbd34a (HEAD -> reset, modified) modified file m c734d72 Initial commit with file i and m $ git reset HEAD~1 Unstaged changes after reset: M m $ git status On branch reset Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) modified: m no changes added to commit (use "git add" and/or "git commit -a") With sparse-checkout $ git reset HEAD~1 Unstaged changes after reset: D m $ git status On branch reset Changes not staged for commit: (use "git add/rm ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) deleted:m no changes added to commit (use "git add" and/or "git commit -a") I think we can agree that this is not the correct behavior. > But this time, I am trying to see if the approach is good. I am not > so sure if the approach taken by this patch is so obviously good as > you seem to think. A logical consequence of the approach "git > status thinks that P appears in the index and missing in the working > tree is a local removal, so let's squelch it by creating the file in > the working tree" is that we will end up fully populating the > working tree with paths that are clearly outside the area the user > designated as the sparse checkout area---surely that may squelch > "status", but to me, it looks like it is breaking what the user > wanted to achieve with "sparse checkout" in the first place. > I don't think that we are trying to "squelch" status so much as make it consistent with what the user would expect to happen. If that means not resetting entries with the skip-worktree bit or resetting the entries but keeping the skip-worktree bit on, okay, but I would argue that is not what the user wants because if you are now saying that sparse means git will not change files outside the sparse-checkout entries, what about merge, rebase, cherry-pick, apply? Should those only change the files that are in the sparse definition? If so we would be changing the commits from the original, i.e. cherry-pick 123 would create a different commit depending on whether or not you are using sparse as well as a different commit depending on what is in your sparse-checkout. I see reset being a similar scenario in that if everything is clean, after I "reset HEAD~1" I should be able to run "add ." + "commit" and have the same commit as before the reset. If this is changed to only reset the sparse entries, there will be staged changes after the reset because HEAD has changed but we didn't update the index versions of the files. If we do update the index with the "HEAD~1"
RE: What's cooking in git.git (Sep 2017, #02; Mon, 11)
> > * kw/write-index-reduce-alloc (2017-09-08) 2 commits > - read-cache: fix index corruption with index v4 > - Add t/helper/test-write-cache to .gitignore > > Expecting a reroll. > cf. q...@mail.gmail.com> > I didn't submit these patches so what would you like me to do? The reroll for read-cache fix was submitted here https://public-inbox.org/git/20170907192412.8085-1-t.gumme...@gmail.com/ - Kevin
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
> From: Junio C Hamano [mailto:gits...@pobox.com] > Sent: Friday, September 8, 2017 9:18 PM > > Kevin Willford writes: > > > 1. reset mixed when there were files that were added > > > > In this case the index will no longer have the entry at all because > > the reset is making the index look like before the file was added > > which didn't have it. When not using the sparse-checkout this is fine > > because the file is in the working directory and the reset didn't touch > > it. But using the sparse-checkout on that file and if the file was not > > in the working directory, the index gets reset and the entry for that > > file is gone and if we don't put the index version of the file before the > > reset into the working directory, then we have lost the content for > > that file > > I do not quite understand this argument. If you do > > edit $path > git add $path > rm $path > git reset > > for a $path that is not involved in the sparse thing, the version > that was previously indexed will be lost, but that is fine---the > user said that version is expendable by saying "reset". > > How would that be different when the $path were not to be > materialized in the working tree due to sparseness? Where did that > "blob" object in the index immediately before you called "reset" > came from, and why do you say that the user does *not* consider that > one expendable, unlike the case for non-sparse path example above? > I guess that I should have said files that were newly added, meaning they are new files that were created and added in the previous commit. I think that the difference is that the user explicitly removed the file. When using sparse it is git that is causing the removal of the file. For example if I have /file in my spare-checkout file so that I am only working on the one file. The previous commit had new file2 added and I run a git reset HEAD~1. I as the user do not expect that file2 just disappear, but yet that is what happens. So from you example above if I do. create $path git add $path git commit git checkout // where $path is not in the sparse-checkout git reset HEAD~1 $path will be gone yet I the user did not remove it. I guess you could argue that the user did when they specified their sparse-checkout and ran checkout but I wouldn't know what would go missing unless I ran a diff before the reset to see. There could have been X number of files created and added in the previous commit(s) and status after the reset would not report them and they are gone. So I could clone, setup sparse-checkout, checkout, reset HEAD~X and possibly lose data I didn't expect to. In the modified case where the previous commits have modifications to files outside the sparse-checkout at least the status after the reset reports the file as deleted so the user sees that something has happened to it. I suppose the entry could stay in the index with the skip-worktree bit on and not removed like it is now so that a git reset will only apply to the entries in the sparse-checkout? That seems like it would be changing the meaning of reset. > I suspect that a similar reasoning would apply to your 2., but I > didn't think it through. > > The possible misconception, which I perceive in both of these, is > that you are somehow disagreeing with this basic assumption: by > saying "git reset []", the user is telling us that the > version in the index, even if that is different from HEAD, > , or the file in the working tree, is *unwanted* and be > replaced with the one in HEAD (or when given). Touching > the working tree files upon "git reset" is the last thing the user > expects to happen. > I agree with this when you are not dealing with a sparse-checkout. When using a sparse-checkout I expect git not to touch things outside of what I have specified in my sparse-checkout file. If it does, it should let me know or put my working directory in a state that is expected. Especially when it is changing the skip-worktree bits causing files outside the sparse-checkout to be reported incorrectly by status.
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
From: Junio C Hamano [mailto:gits...@pobox.com] Sent: Friday, September 8, 2017 1:02 PM > Kevin Willford writes: > > > diff --git a/builtin/reset.c b/builtin/reset.c > > index d72c7d1c96..1b8bb45989 100644 > > --- a/builtin/reset.c > > +++ b/builtin/reset.c > > @@ -24,6 +24,7 @@ > > #include "cache-tree.h" > > #include "submodule.h" > > #include "submodule-config.h" > > +#include "dir.h" > > > > static const char * const git_reset_usage[] = { > > N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] > []"), > > @@ -124,8 +125,32 @@ static void update_index_from_diff(struct > diff_queue_struct *q, > > > > for (i = 0; i < q->nr; i++) { > > struct diff_filespec *one = q->queue[i]->one; > > + struct diff_filespec *two = q->queue[i]->two; > > + int pos; > > int is_missing = !(one->mode && !is_null_oid(&one->oid)); > > + int was_missing = !two->mode && is_null_oid(&two->oid); > > struct cache_entry *ce; > > + struct cache_entry *ceBefore; > > + struct checkout state = CHECKOUT_INIT; > > The five new variables are only used in the new block, so it > probably is better to limit their scope to the "we do something > unusual when sparse checkout is in effect" block as well. The scope > for the latter two can further be narrowed down to "we do need to > force a checkout of this entry" block. > > We prefer to name our variables with underscore (e.g. ce_before) > over camelCase (e.g. ceBefore) unless there is a compelling reason > (e.g. a platform specific code in compat/ layer to match platform > convention). > Will update. > > + > > + if (core_apply_sparse_checkout && !file_exists(two->path)) { > > + pos = cache_name_pos(two->path, strlen(two->path)); > > + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) > && > > + (is_missing || !was_missing)) > > + { > > + state.force = 1; > > + state.refresh_cache = 1; > > + state.istate = &the_index; > > + ceBefore = make_cache_entry(two->mode, > > + two->oid.hash, > > + two->path, 0, 0); > > + if (!ceBefore) > > + die(_("make_cache_entry failed for path > '%s'"), > > + two->path); > > + > > + checkout_entry(ceBefore, &state, NULL); > > + } > > + } > > Can we tell between the case where the reason why the path was not > there in the working tree was due to the path being excluded by the > sparse checkout and the path being removed manually by the end user? > > I guess ce_skip_worktree() check is sufficient; we force checkout only > when the path is marked to be skipped due to "sparse" thing. > > Do we have to worry about the reverse case, in which file_exists(two->path) > is true (i.e. the user created a file there manually) even though > the path is marked to be skipped due to "sparse" setting? > I don't believe so because if the user has a file there whether they modified it or not, it is what the user did and we just leave it there and a diff with what the index gets reset to will show how the file is different from what the index got reset to. > Other than that, the patch looks quite cleanly done and well explained. > > Thanks. >
RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
From: Junio C Hamano [mailto:gits...@pobox.com] Sent: Friday, September 8, 2017 1:12 PM > Kevin Willford writes: > > > When using the sparse checkout feature the git reset command will add > > entries to the index that will have the skip-worktree bit off but will > > leave the working directory empty. File data is lost because the index > > version of the files have been changed but there is nothing that is in the > > working directory. This will cause the next status call to show either > > deleted for files modified or deleting or nothing for files added. > > Getting rid of sparseness may of course be one way to correct the > discrepancy, but it is unclear to me if that is the best way to do > so. An alternative may be to add entries to the index that does > have the bit on for paths that are meant to be skipped due to > "sparse" thing, no? Am I being naive, missing some reason why that > would give us a worse result? There are two cases that this is trying to solve. 1. reset mixed when there were files that were added In this case the index will no longer have the entry at all because the reset is making the index look like before the file was added which didn't have it. When not using the sparse-checkout this is fine because the file is in the working directory and the reset didn't touch it. But using the sparse-checkout on that file and if the file was not in the working directory, the index gets reset and the entry for that file is gone and if we don't put the index version of the file before the reset into the working directory, then we have lost the content for that file 2. reset mixed when there were files modified This case is similar but with modified files there is an entry in the index but it is getting changed to a previous version of the file. If we don't get the file on disk then the version of the file that, in the non sparse-checkout case, would be on disk is lost and cannot be recovered. So even if we turn on the skip-worktree bit for this entry and it doesn't show up in status calls, we lost the previous version of the file.
[PATCH 0/1] reset: fix mixed reset when using sparse-checkout
Original discussion is here https://public-inbox.org/git/20170407192357.948-4-kewi...@microsoft.com/ When running a reset mixed and using the sparse-checkout the working directory needs to be updated so that there is not data loss when the index is updated. This is because the index is getting updated potentially removing entries without changing the working directory. When using the sparse-checkout feature the entries removed might not be on disk and are lost. This patch writes the before version of the file to disk if the mixed reset is going to change the index of a file that had the skip-wortree bit so that the file contents before the reset is preserved on disk and status will reports the correct results. Kevin Willford (1): reset: fix reset when using the sparse-checkout feature. builtin/reset.c | 25 + t/t7114-reset-sparse-checkout.sh | 60 2 files changed, 85 insertions(+) create mode 100755 t/t7114-reset-sparse-checkout.sh -- 2.14.1.474.g0558484247.dirty
[PATCH 1/1] reset: fix reset when using the sparse-checkout feature.
When using the sparse checkout feature the git reset command will add entries to the index that will have the skip-worktree bit off but will leave the working directory empty. File data is lost because the index version of the files have been changed but there is nothing that is in the working directory. This will cause the next status call to show either deleted for files modified or deleting or nothing for files added. The added files should be shown as untracked and modified files should be shown as modified. To fix this when the reset is running if there is not a file in the working directory and if it will be missing with the new index entry or was not missing in the previous version, we create the previous index version of the file in the working directory so that status will report correctly and the files will be availble for the user to deal with. Signed-off-by: Kevin Willford --- builtin/reset.c | 25 + t/t7114-reset-sparse-checkout.sh | 60 2 files changed, 85 insertions(+) create mode 100755 t/t7114-reset-sparse-checkout.sh diff --git a/builtin/reset.c b/builtin/reset.c index d72c7d1c96..1b8bb45989 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -24,6 +24,7 @@ #include "cache-tree.h" #include "submodule.h" #include "submodule-config.h" +#include "dir.h" static const char * const git_reset_usage[] = { N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] []"), @@ -124,8 +125,32 @@ static void update_index_from_diff(struct diff_queue_struct *q, for (i = 0; i < q->nr; i++) { struct diff_filespec *one = q->queue[i]->one; + struct diff_filespec *two = q->queue[i]->two; + int pos; int is_missing = !(one->mode && !is_null_oid(&one->oid)); + int was_missing = !two->mode && is_null_oid(&two->oid); struct cache_entry *ce; + struct cache_entry *ceBefore; + struct checkout state = CHECKOUT_INIT; + + if (core_apply_sparse_checkout && !file_exists(two->path)) { + pos = cache_name_pos(two->path, strlen(two->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && + (is_missing || !was_missing)) + { + state.force = 1; + state.refresh_cache = 1; + state.istate = &the_index; + ceBefore = make_cache_entry(two->mode, + two->oid.hash, + two->path, 0, 0); + if (!ceBefore) + die(_("make_cache_entry failed for path '%s'"), + two->path); + + checkout_entry(ceBefore, &state, NULL); + } + } if (is_missing && !intent_to_add) { remove_file_from_cache(one->path); diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh new file mode 100755 index 00..f2a5426847 --- /dev/null +++ b/t/t7114-reset-sparse-checkout.sh @@ -0,0 +1,60 @@ +#!/bin/sh + +test_description='reset when using a sparse-checkout' + +. ./test-lib.sh + +# reset using a sparse-checkout file + +test_expect_success 'setup' ' + test_tick && + echo "checkout file" >c && + echo "modify file" >m && + echo "delete file" >d && + git add . && + git commit -m "initial commit" && + echo "added file" >a && + echo "modification of a file" >m && + git rm d && + git add . && + git commit -m "second commit" && + git checkout -b endCommit +' + +test_expect_success 'reset when there is a sparse-checkout' ' + echo "/c" >.git/info/sparse-checkout && + test_config core.sparsecheckout true && + git checkout -b resetBranch && + test_path_is_missing m && + test_path_is_missing a && + test_path_is_missing d && + git reset HEAD~1 && + test "checkout file" = "$(cat c)" && + test "modification of a file" = "$(cat m)" && + test "added file" = "$(cat a)" && + test_path
[PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap
The code was using two string_lists, one for the directories and one for the files. The code never checks the lists independently so we should be able to only use one list. The string_list also is a O(log n) for lookup and insertion. Switching this to use a hashmap will give O(1) which will save some time when there are millions of paths that will be checked. Signed-off-by: Kevin Willford --- merge-recursive.c | 56 --- merge-recursive.h | 3 +-- 2 files changed, 46 insertions(+), 13 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index d47f40ea81..35af3761ba 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -24,6 +24,31 @@ #include "dir.h" #include "submodule.h" +struct path_hashmap_entry { + struct hashmap_entry e; + char path[FLEX_ARRAY]; +}; + +static int path_hashmap_cmp(const void *cmp_data, + const void *entry, + const void *entry_or_key, + const void *keydata) +{ + const struct path_hashmap_entry *a = entry; + const struct path_hashmap_entry *b = entry_or_key; + const char *key = keydata; + + if (ignore_case) + return strcasecmp(a->path, key ? key : b->path); + else + return strcmp(a->path, key ? key : b->path); +} + +static unsigned int path_hash(const char *path) +{ + return ignore_case ? strihash(path) : strhash(path); +} + static void flush_output(struct merge_options *o) { if (o->buffer_output < 2 && o->obuf.len) { @@ -314,15 +339,15 @@ static int save_files_dirs(const unsigned char *sha1, struct strbuf *base, const char *path, unsigned int mode, int stage, void *context) { + struct path_hashmap_entry *entry; int baselen = base->len; struct merge_options *o = context; strbuf_addstr(base, path); - if (S_ISDIR(mode)) - string_list_insert(&o->current_directory_set, base->buf); - else - string_list_insert(&o->current_file_set, base->buf); + FLEX_ALLOC_MEM(entry, path, base->buf, base->len); + hashmap_entry_init(entry, path_hash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); strbuf_setlen(base, baselen); return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); @@ -642,6 +667,7 @@ static void add_flattened_path(struct strbuf *out, const char *s) static char *unique_path(struct merge_options *o, const char *path, const char *branch) { + struct path_hashmap_entry *entry; struct strbuf newpath = STRBUF_INIT; int suffix = 0; size_t base_len; @@ -650,14 +676,16 @@ static char *unique_path(struct merge_options *o, const char *path, const char * add_flattened_path(&newpath, branch); base_len = newpath.len; - while (string_list_has_string(&o->current_file_set, newpath.buf) || - string_list_has_string(&o->current_directory_set, newpath.buf) || + while (hashmap_get_from_hash(&o->current_file_dir_set, +path_hash(newpath.buf), newpath.buf) || (!o->call_depth && file_exists(newpath.buf))) { strbuf_setlen(&newpath, base_len); strbuf_addf(&newpath, "_%d", suffix++); } - string_list_insert(&o->current_file_set, newpath.buf); + FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len); + hashmap_entry_init(entry, path_hash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); return strbuf_detach(&newpath, NULL); } @@ -1941,8 +1969,14 @@ int merge_trees(struct merge_options *o, if (unmerged_cache()) { struct string_list *entries, *re_head, *re_merge; int i; - string_list_clear(&o->current_file_set, 1); - string_list_clear(&o->current_directory_set, 1); + /* +* Only need the hashmap while processing entries, so +* initialize it here and free it when we are done running +* through the entries. Keeping it in the merge_options as +* opposed to decaring a local hashmap is for convenience +* so that we don't have to pass it to around. +*/ + hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512); get_files_dirs(o, head); get_files_dirs(o, merge); @@ -1978,6 +2012,8 @@ int merge_trees(struct merge_options *o, string_list_clear(re_head, 0); string_list_clear(entries, 1); + hashmap_free(&o->current_file_dir_set, 1); + f
[PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs
The return value of the get_files_dirs call is never being used. Looking at the history of the file and it was originally only being used for debug output statements. Also when read_tree_recursive return value is non zero it is changed to zero. This leads me to believe that it doesn't matter if read_tree_recursive gets an error. Since the debug output has been removed and the caller isn't checking the return value there is no reason to keep calculating and returning a value. Signed-off-by: Kevin Willford --- merge-recursive.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 033d7cd406..d47f40ea81 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1, return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); } -static int get_files_dirs(struct merge_options *o, struct tree *tree) +static void get_files_dirs(struct merge_options *o, struct tree *tree) { - int n; struct pathspec match_all; memset(&match_all, 0, sizeof(match_all)); - if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o)) - return 0; - n = o->current_file_set.nr + o->current_directory_set.nr; - return n; + read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o); } /* -- 2.14.1.329.gcdd497e120.dirty
[PATCH v3 0/3] merge-recursive: replace string_list with hashmap
Code was using two string_lists, one for the directories and one for the files. The code never checks the lists independently so we should be able to only use one list. The string_list also is a O(log n) for lookup and insertion. Switching this to use a hashmap will give O(1) which will save some time when there are millions of paths that will be checked. Also cleaned up a memory leak and method where the return was not being used. Changes since last version: 1. Removed the function pointers and just check the ignore_case in the compare and hash methods. 2. Added a comment about the hashmap and why it is getting initialized and freed but not a local. 3. Use hashmap_get_from_hash and remove the dummy entry Kevin Willford (3): merge-recursive: fix memory leak merge-recursive: remove return value from get_files_dirs merge-recursive: change current file dir string_lists to hashmap merge-recursive.c | 76 --- merge-recursive.h | 3 +-- 2 files changed, 57 insertions(+), 22 deletions(-) -- 2.14.1.329.gcdd497e120.dirty
[PATCH v3 1/3] merge-recursive: fix memory leak
In merge_trees if process_renames or process_entry returns less than zero, the method will just return and not free re_merge, re_head, or entries. This change cleans up the allocated variables before returning to the caller. Signed-off-by: Kevin Willford --- merge-recursive.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1494ffdb82..033d7cd406 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o, re_merge = get_renames(o, merge, common, head, merge, entries); clean = process_renames(o, re_head, re_merge); if (clean < 0) - return clean; + goto cleanup; for (i = entries->nr-1; 0 <= i; i--) { const char *path = entries->items[i].string; struct stage_data *e = entries->items[i].util; @@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o, int ret = process_entry(o, path, e); if (!ret) clean = 0; - else if (ret < 0) - return ret; + else if (ret < 0) { + clean = ret; + goto cleanup; + } } } for (i = 0; i < entries->nr; i++) { @@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o, entries->items[i].string); } +cleanup: string_list_clear(re_merge, 0); string_list_clear(re_head, 0); string_list_clear(entries, 1); @@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o, free(re_merge); free(re_head); free(entries); + + if (clean < 0) + return clean; } else clean = 1; -- 2.14.1.329.gcdd497e120.dirty
[PATCH v2] merge-recursive: change current file dir string_lists to hashmap
The code was using two string_lists, one for the directories and one for the files. The code never checks the lists independently so we should be able to only use one list. The string_list also is a O(log n) for lookup and insertion. Switching this to use a hashmap will give O(1) which will save some time when there are millions of paths that will be checked. Also cleaned up a memory leak and method where the return was not being used. Signed-off-by: Kevin Willford --- merge-recursive.c | 76 --- merge-recursive.h | 3 +-- 2 files changed, 57 insertions(+), 22 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1494ffdb82..ebfe01017f 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -24,6 +24,31 @@ #include "dir.h" #include "submodule.h" +struct path_hashmap_entry { + struct hashmap_entry; + char path[FLEX_ARRAY]; +}; + +static int path_hashmap_cmp(const void *cmp_data, + const void *entry, + const void *entry_or_key, + const void *keydata) +{ + const struct path_hashmap_entry *a = entry; + const struct path_hashmap_entry *b = entry_or_key; + const char *key = keydata; + + if (ignore_case) + return strcasecmp(a->path, key ? key : b->path); + else + return strcmp(a->path, key ? key : b->path); +} + +static unsigned int path_hash(const char *path) +{ + return ignore_case ? strihash(path) : strhash(path); +} + static void flush_output(struct merge_options *o) { if (o->buffer_output < 2 && o->obuf.len) { @@ -314,29 +339,25 @@ static int save_files_dirs(const unsigned char *sha1, struct strbuf *base, const char *path, unsigned int mode, int stage, void *context) { + struct path_hashmap_entry *entry; int baselen = base->len; struct merge_options *o = context; strbuf_addstr(base, path); - if (S_ISDIR(mode)) - string_list_insert(&o->current_directory_set, base->buf); - else - string_list_insert(&o->current_file_set, base->buf); + FLEX_ALLOC_MEM(entry, path, base->buf, base->len); + hashmap_entry_init(entry, path_hash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); strbuf_setlen(base, baselen); return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); } -static int get_files_dirs(struct merge_options *o, struct tree *tree) +static void get_files_dirs(struct merge_options *o, struct tree *tree) { - int n; struct pathspec match_all; memset(&match_all, 0, sizeof(match_all)); - if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o)) - return 0; - n = o->current_file_set.nr + o->current_directory_set.nr; - return n; + read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o); } /* @@ -646,6 +667,7 @@ static void add_flattened_path(struct strbuf *out, const char *s) static char *unique_path(struct merge_options *o, const char *path, const char *branch) { + struct path_hashmap_entry *entry; struct strbuf newpath = STRBUF_INIT; int suffix = 0; size_t base_len; @@ -654,14 +676,16 @@ static char *unique_path(struct merge_options *o, const char *path, const char * add_flattened_path(&newpath, branch); base_len = newpath.len; - while (string_list_has_string(&o->current_file_set, newpath.buf) || - string_list_has_string(&o->current_directory_set, newpath.buf) || + while (hashmap_get_from_hash(&o->current_file_dir_set, +path_hash(newpath.buf), newpath.buf) || (!o->call_depth && file_exists(newpath.buf))) { strbuf_setlen(&newpath, base_len); strbuf_addf(&newpath, "_%d", suffix++); } - string_list_insert(&o->current_file_set, newpath.buf); + FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len); + hashmap_entry_init(entry, path_hash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); return strbuf_detach(&newpath, NULL); } @@ -1945,8 +1969,14 @@ int merge_trees(struct merge_options *o, if (unmerged_cache()) { struct string_list *entries, *re_head, *re_merge; int i; - string_list_clear(&o->current_file_set, 1); - string_list_clear(&o->current_directory_set, 1); + /* +* Only need the hashmap while processing entries, so +* initialize it here and free it when we are done run
RE: [PATCH] read-cache: fix index corruption with index v4
> From: Thomas Gummerer [mailto:t.gumme...@gmail.com] > Sent: Monday, September 4, 2017 4:58 PM > > ce012deb98 ("read-cache: avoid allocating every ondisk entry when > writing", 2017-08-21) changed the way cache entries are written to the > index file. While previously it wrote the name to an struct that was > allocated using xcalloc(), it now uses ce_write() directly. Previously > ce_namelen - common bytes were written to the cache entry, which would > automatically make it nul terminated, as it was allocated using calloc. > > Now we are writing ce_namelen - common + 1 bytes directly from the > ce->name to the index. As ce->name is however not nul terminated, and > index-v4 needs the nul terminator to split between one index entry and > the next, this would end up in a corrupted index. > > Fix that by only writing ce_namelen - common bytes directly from > ce->name to the index, and adding the nul terminator in an extra call to > ce_write. > > This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in > config.mak and running the test suite (t1700 specifically broke). > > Signed-off-by: Thomas Gummerer > --- > > I unfortunately didn't have more time to dig so > > > As ce->name is however not nul terminated > > just comes from my memory and from the patch below actually fixing the > corruption, so it's really the most likely cause. Would be great if > someone who can remember more about the index could confirm that this > is indeed the case. > Digging into this and ce->name IS nul terminated. The issue comes in when the CE_STRIP_NAME is set, which is only set when using a split index. This sets the ce->ce_namelen = 0 without changing the actual ce->name buffer. When writing the entry for the split index version 4 it was using the first character in the ce->name buffer because of the + 1, which obviously isn't correct. Before it was using a newly allocated name buffer from the ondisk struct which was allocated based on the ce_namelen of zero. > read-cache.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/read-cache.c b/read-cache.c > index 40da87ea71..80830ddcfc 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2103,7 +2103,9 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct > cache_entry *ce, > if (!result) > result = ce_write(c, fd, to_remove_vi, prefix_size); > if (!result) > - result = ce_write(c, fd, ce->name + common, > ce_namelen(ce) - common + 1); > + result = ce_write(c, fd, ce->name + common, > ce_namelen(ce) - common); > + if (!result) > + result = ce_write(c, fd, "\0", 1); You could use the padding variable here as well which is used in the < version 4 ce_write. > > strbuf_splice(previous_name, common, to_remove, > ce->name + common, ce_namelen(ce) - common); > -- > 2.14.1.480.gb18f417b89 While looking at the code I was wondering if we could get around the whole setting ce->ce_namelen to zero bit but that would be much bigger patch and possibly introduce other bugs so this seems the appropriate fix for now. Thanks for finding this! Kevin
RE: [PATCH 2/3] merge-recursive: remove return value from get_files_dirs
> > On Mon, Aug 28, 2017 at 02:28:28PM -0600, Kevin Willford wrote: > > > The return value of the get_files_dirs call is never being used. > > Looking at the history of the file and it was originally only > > being used for debug output statements. Also when > > read_tree_recursive return value is non zero it is changed to > > zero. This leads me to believe that it doesn't matter if > > read_tree_recursive gets an error. > > Or that the function is buggy. :) That was one of my questions as well. Should read_tree_recursive be propagating a -1 and merge_trees be checking for that and bail when the call to get_files_dirs return is < 0? I made a commit with this change and ran the tests and they all still passed so either this return really doesn't matter or there are not sufficient tests covering it. I went with this change because it was not changing any of the current functionality and if we find a case where it matters that read_tree_recursive fails due to bad tree or something else we can address it then. > > I'm tempted to say that we should probably die() when > read_tree_recursive fails. This should only happen if we fail to parse > the tree, or if our callback (save_files_dirs here) returns failure, and > the latter looks like it never happens. > > > Since the debug output has been removed and the caller isn't > > checking the return value there is no reason to keep calulating > > and returning a value. > > Agreed, and I'm happy to see dead code go. > > Minor nit: s/calulating/calculating/ in your commit message. When will that spell checker for git messages be ready? ;) > > -Peff
[PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap
The code was using two string_lists, one for the directories and one for the files. The code never checks the lists independently so we should be able to only use one list. The string_list also is a O(log n) for lookup and insertion. Switching this to use a hashmap will give O(1) which will save some time when there are millions of paths that will be checked. Signed-off-by: Kevin Willford --- merge-recursive.c | 48 +--- merge-recursive.h | 3 +-- 2 files changed, 38 insertions(+), 13 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index d47f40ea81..ef4fe5f09f 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -24,6 +24,26 @@ #include "dir.h" #include "submodule.h" +struct path_hashmap_entry { + struct hashmap_entry; + char path[FLEX_ARRAY]; +}; + +static unsigned int (*pathhash)(const char *path); +static int (*pathcmp)(const char *a, const char *b); + +static int path_hashmap_cmp(const void *cmp_data, + const void *entry, + const void *entry_or_key, + const void *keydata) +{ + const struct path_hashmap_entry *a = entry; + const struct path_hashmap_entry *b = entry_or_key; + const char *key = keydata; + + return pathcmp(a->path, key ? key : b->path); +} + static void flush_output(struct merge_options *o) { if (o->buffer_output < 2 && o->obuf.len) { @@ -314,15 +334,15 @@ static int save_files_dirs(const unsigned char *sha1, struct strbuf *base, const char *path, unsigned int mode, int stage, void *context) { + struct path_hashmap_entry *entry; int baselen = base->len; struct merge_options *o = context; strbuf_addstr(base, path); - if (S_ISDIR(mode)) - string_list_insert(&o->current_directory_set, base->buf); - else - string_list_insert(&o->current_file_set, base->buf); + FLEX_ALLOC_MEM(entry, path, base->buf, base->len); + hashmap_entry_init(entry, pathhash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); strbuf_setlen(base, baselen); return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); @@ -642,6 +662,8 @@ static void add_flattened_path(struct strbuf *out, const char *s) static char *unique_path(struct merge_options *o, const char *path, const char *branch) { + struct path_hashmap_entry dummy; + struct path_hashmap_entry *entry; struct strbuf newpath = STRBUF_INIT; int suffix = 0; size_t base_len; @@ -650,14 +672,17 @@ static char *unique_path(struct merge_options *o, const char *path, const char * add_flattened_path(&newpath, branch); base_len = newpath.len; - while (string_list_has_string(&o->current_file_set, newpath.buf) || - string_list_has_string(&o->current_directory_set, newpath.buf) || + hashmap_entry_init(&dummy, pathhash(newpath.buf)); + while (hashmap_get(&o->current_file_dir_set, &dummy, newpath.buf) || (!o->call_depth && file_exists(newpath.buf))) { strbuf_setlen(&newpath, base_len); strbuf_addf(&newpath, "_%d", suffix++); + hashmap_entry_init(&dummy, pathhash(newpath.buf)); } - string_list_insert(&o->current_file_set, newpath.buf); + FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len); + hashmap_entry_init(entry, pathhash(entry->path)); + hashmap_add(&o->current_file_dir_set, entry); return strbuf_detach(&newpath, NULL); } @@ -1941,8 +1966,7 @@ int merge_trees(struct merge_options *o, if (unmerged_cache()) { struct string_list *entries, *re_head, *re_merge; int i; - string_list_clear(&o->current_file_set, 1); - string_list_clear(&o->current_directory_set, 1); + hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512); get_files_dirs(o, head); get_files_dirs(o, merge); @@ -1978,6 +2002,8 @@ int merge_trees(struct merge_options *o, string_list_clear(re_head, 0); string_list_clear(entries, 1); + hashmap_free(&o->current_file_dir_set, 1); + free(re_merge); free(re_head); free(entries); @@ -2179,8 +2205,8 @@ void init_merge_options(struct merge_options *o) if (o->verbosity >= 5) o->buffer_output = 0; strbuf_init(&o->obuf, 0); - string_list_init(&o->current_file_set, 1); - string_list_init(&o->current_directory_set, 1); + pathhash = ignore_case ? stri
[PATCH 2/3] merge-recursive: remove return value from get_files_dirs
The return value of the get_files_dirs call is never being used. Looking at the history of the file and it was originally only being used for debug output statements. Also when read_tree_recursive return value is non zero it is changed to zero. This leads me to believe that it doesn't matter if read_tree_recursive gets an error. Since the debug output has been removed and the caller isn't checking the return value there is no reason to keep calulating and returning a value. Signed-off-by: Kevin Willford --- merge-recursive.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 033d7cd406..d47f40ea81 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1, return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0); } -static int get_files_dirs(struct merge_options *o, struct tree *tree) +static void get_files_dirs(struct merge_options *o, struct tree *tree) { - int n; struct pathspec match_all; memset(&match_all, 0, sizeof(match_all)); - if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o)) - return 0; - n = o->current_file_set.nr + o->current_directory_set.nr; - return n; + read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o); } /* -- 2.14.1.329.g6edf0add19
[PATCH 0/3] merge-recursive: replace string_list with hashmap
Code was using two string_lists, one for the directories and one for the files. The code never checks the lists independently so we should be able to only use one list. The string_list also is a O(log n) for lookup and insertion. Switching this to use a hashmap will give O(1) which will save some time when there are millions of paths that will be checked. Also cleaned up a memory leak and method where the return was not being used. Kevin Willford (3): merge-recursive: fix memory leak merge-recursive: remove return value from get_files_dirs merge-recursive: change current file dir string_lists to hashmap merge-recursive.c | 68 +++ merge-recursive.h | 3 +-- 2 files changed, 49 insertions(+), 22 deletions(-) -- 2.14.1.329.g6edf0add19
[PATCH 1/3] merge-recursive: fix memory leak
In merge_trees if process_renames or process_entry returns less than zero, the method will just return and not free re_merge, re_head, or entries. This change cleans up the allocated variables before returning to the caller. Signed-off-by: Kevin Willford --- merge-recursive.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1494ffdb82..033d7cd406 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o, re_merge = get_renames(o, merge, common, head, merge, entries); clean = process_renames(o, re_head, re_merge); if (clean < 0) - return clean; + goto cleanup; for (i = entries->nr-1; 0 <= i; i--) { const char *path = entries->items[i].string; struct stage_data *e = entries->items[i].util; @@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o, int ret = process_entry(o, path, e); if (!ret) clean = 0; - else if (ret < 0) - return ret; + else if (ret < 0) { + clean = ret; + goto cleanup; + } } } for (i = 0; i < entries->nr; i++) { @@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o, entries->items[i].string); } +cleanup: string_list_clear(re_merge, 0); string_list_clear(re_head, 0); string_list_clear(entries, 1); @@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o, free(re_merge); free(re_head); free(entries); + + if (clean < 0) + return clean; } else clean = 1; -- 2.14.1.329.g6edf0add19
[PATCH 1/3] perf: add test for writing the index
A performance test for writing the index to be able to determine if changes to allocating ondisk structure help. Signed-off-by: Kevin Willford --- Makefile| 1 + t/helper/test-write-cache.c | 23 +++ t/perf/p0007-write-cache.sh | 29 + 3 files changed, 53 insertions(+) create mode 100644 t/helper/test-write-cache.c create mode 100755 t/perf/p0007-write-cache.sh diff --git a/Makefile b/Makefile index 86ec29202b..c6b061086f 100644 --- a/Makefile +++ b/Makefile @@ -655,6 +655,7 @@ TEST_PROGRAMS_NEED_X += test-parse-options TEST_PROGRAMS_NEED_X += test-path-utils TEST_PROGRAMS_NEED_X += test-prio-queue TEST_PROGRAMS_NEED_X += test-read-cache +TEST_PROGRAMS_NEED_X += test-write-cache TEST_PROGRAMS_NEED_X += test-ref-store TEST_PROGRAMS_NEED_X += test-regex TEST_PROGRAMS_NEED_X += test-revision-walking diff --git a/t/helper/test-write-cache.c b/t/helper/test-write-cache.c new file mode 100644 index 00..b7ee039669 --- /dev/null +++ b/t/helper/test-write-cache.c @@ -0,0 +1,23 @@ +#include "cache.h" +#include "lockfile.h" + +static struct lock_file index_lock; + +int cmd_main(int argc, const char **argv) +{ + int i, cnt = 1, lockfd; + if (argc == 2) + cnt = strtol(argv[1], NULL, 0); + setup_git_directory(); + read_cache(); + for (i = 0; i < cnt; i++) { + lockfd = hold_locked_index(&index_lock, LOCK_DIE_ON_ERROR); + if (0 <= lockfd) { + write_locked_index(&the_index, &index_lock, COMMIT_LOCK); + } else { + rollback_lock_file(&index_lock); + } + } + + return 0; +} diff --git a/t/perf/p0007-write-cache.sh b/t/perf/p0007-write-cache.sh new file mode 100755 index 00..261fe92fd9 --- /dev/null +++ b/t/perf/p0007-write-cache.sh @@ -0,0 +1,29 @@ +#!/bin/sh + +test_description="Tests performance of writing the index" + +. ./perf-lib.sh + +test_perf_default_repo + +test_expect_success "setup repo" ' + if git rev-parse --verify refs/heads/p0006-ballast^{commit} + then + echo Assuming synthetic repo from many-files.sh + git config --local core.sparsecheckout 1 + cat >.git/info/sparse-checkout <<-EOF + /* + !ballast/* + EOF + else + echo Assuming non-synthetic repo... + fi && + nr_files=$(git ls-files | wc -l) +' + +count=3 +test_perf "write_locked_index $count times ($nr_files files)" " + test-write-cache $count +" + +test_done -- 2.14.1.205.g2812f3410d
[PATCH 2/3] read-cache: fix memory leak in do_write_index
The previous_name_buf was never getting released when there was an error in ce_write_entry or allow was false and execution was returned to the caller. Signed-off-by: Kevin Willford --- read-cache.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/read-cache.c b/read-cache.c index acfb028f48..47220cc30d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2192,7 +2192,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, int newfd = tempfile->fd; git_SHA_CTX c; struct cache_header hdr; - int i, err, removed, extended, hdr_version; + int i, err = 0, removed, extended, hdr_version; struct cache_entry **cache = istate->cache; int entries = istate->cache_nr; struct stat st; @@ -2247,15 +2247,21 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (allow) warning(msg, ce->name); else - return error(msg, ce->name); + err = error(msg, ce->name); drop_cache_tree = 1; } if (ce_write_entry(&c, newfd, ce, previous_name) < 0) - return -1; + err = -1; + + if (err) + break; } strbuf_release(&previous_name_buf); + if (err) + return err; + /* Write extension data here */ if (!strip_extensions && istate->split_index) { struct strbuf sb = STRBUF_INIT; -- 2.14.1.205.g2812f3410d
[PATCH 3/3] read-cache: avoid allocating every ondisk entry when writing
When writing the index for each entry an ondisk struct will be allocated and freed in ce_write_entry. We can do better by using a ondisk struct on the stack for each entry. This is accomplished by using a stack ondisk_cache_entry_extended outside looping through the entries in do_write_index. Only the fixed fields of this struct are used when writing and depending on whether it is extended or not the flags2 field will be written. The name field is not used and instead the cache_entry name field is used directly when writing out the name. Because ce_write is using a buffer and memcpy to fill the buffer before flushing to disk, we don't have to worry about doing multiple ce_write calls. Running the p0007-write-cache.sh tests would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos. Signed-off-by: Kevin Willford --- read-cache.c | 50 +- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/read-cache.c b/read-cache.c index 47220cc30d..694bed8d82 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1499,6 +1499,7 @@ struct ondisk_cache_entry_extended { }; /* These are only used for v3 or lower */ +#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len) #define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7) #define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len) #define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len) @@ -2032,7 +2033,7 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce) } /* Copy miscellaneous fields but not the name */ -static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, +static void copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, struct cache_entry *ce) { short flags; @@ -2056,32 +2057,35 @@ static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk, struct ondisk_cache_entry_extended *ondisk2; ondisk2 = (struct ondisk_cache_entry_extended *)ondisk; ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16); - return ondisk2->name; - } - else { - return ondisk->name; } } static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, - struct strbuf *previous_name) + struct strbuf *previous_name, struct ondisk_cache_entry *ondisk) { int size; - struct ondisk_cache_entry *ondisk; int saved_namelen = saved_namelen; /* compiler workaround */ - char *name; int result; + static unsigned char padding[8] = { 0x00 }; if (ce->ce_flags & CE_STRIP_NAME) { saved_namelen = ce_namelen(ce); ce->ce_namelen = 0; } + if (ce->ce_flags & CE_EXTENDED) + size = offsetof(struct ondisk_cache_entry_extended, name); + else + size = offsetof(struct ondisk_cache_entry, name); + if (!previous_name) { - size = ondisk_ce_size(ce); - ondisk = xcalloc(1, size); - name = copy_cache_entry_to_ondisk(ondisk, ce); - memcpy(name, ce->name, ce_namelen(ce)); + int len = ce_namelen(ce); + copy_cache_entry_to_ondisk(ondisk, ce); + result = ce_write(c, fd, ondisk, size); + if (!result) + result = ce_write(c, fd, ce->name, len); + if (!result) + result = ce_write(c, fd, padding, align_padding_size(size, len)); } else { int common, to_remove, prefix_size; unsigned char to_remove_vi[16]; @@ -2094,16 +2098,12 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, to_remove = previous_name->len - common; prefix_size = encode_varint(to_remove, to_remove_vi); - if (ce->ce_flags & CE_EXTENDED) - size = offsetof(struct ondisk_cache_entry_extended, name); - else - size = offsetof(struct ondisk_cache_entry, name); - size += prefix_size + (ce_namelen(ce) - common + 1); - - ondisk = xcalloc(1, size); - name = copy_cache_entry_to_ondisk(ondisk, ce); - memcpy(name, to_remove_vi, prefix_size); - memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common); + copy_cache_entry_to_ondisk(ondisk, ce); + result = ce_write(c, fd, ondisk, size); + if (!result) + result = ce_write(c, fd, to_remove_vi, prefix_size); + if (!res
[PATCH 0/3] read-cache: use stack ondisk struct when writing index
When writing out the index, the ondisk struct was being allocated and freed within the loop of cache entries. A better way would be to use a ondisk struct on the stack and reuse it avoiding the alloc and free calls. A test has been added to measure the performance of writing the index (p0007-write-cache). Running this test on smaller repos showed no degradation in performance. On larger repos there was ~3-7% improvement. 0007.2: write_locked_index 10 times (101 files) 5.98(0.00+0.04) 5.75(0.01+0.04) -3.8% 0007.2: write_locked_index 10 times (101 files) 6.20(0.00+0.06) 5.86(0.01+0.03) -5.5% 0007.2: write_locked_index 3 times (4394531 files) 10.29(0.04+0.03) 9.75(0.04+0.01) -5.2% 0007.2: write_locked_index 3 times (4394531 files) 10.52(0.00+0.04) 9.79(0.03+0.03) -6.9% Kevin Willford (3): perf: add test for writing the index read-cache: fix memory leak in do_write_index read-cache: avoid allocating every ondisk entry when writing Makefile| 1 + read-cache.c| 62 + t/helper/test-write-cache.c | 23 + t/perf/p0007-write-cache.sh | 29 + 4 files changed, 87 insertions(+), 28 deletions(-) create mode 100644 t/helper/test-write-cache.c create mode 100755 t/perf/p0007-write-cache.sh -- 2.14.1.205.g2812f3410d
RE: [PATCH v2] commit: skip discarding the index if there is no pre-commit hook
> On Mon, Aug 14, 2017 at 03:54:25PM -0600, Kevin Willford wrote: > > > If there is not a pre-commit hook, there is no reason to discard > > the index and reread it. > > > > This change checks to presence of a pre-commit hook and then only > > discards the index if there was one. > > > > Signed-off-by: Kevin Willford > > --- > > builtin/commit.c | 15 +-- > > 1 file changed, 9 insertions(+), 6 deletions(-) > > Thanks, this looks nice and simple. > > > diff --git a/builtin/commit.c b/builtin/commit.c > > index e7a2cb6285..ab71b93518 100644 > > --- a/builtin/commit.c > > +++ b/builtin/commit.c > > @@ -940,12 +940,15 @@ static int prepare_to_commit(const char *index_file, > const char *prefix, > > return 0; > > } > > > > - /* > > -* Re-read the index as pre-commit hook could have updated it, > > -* and write it out as a tree. We must do this before we invoke > > -* the editor and after we invoke run_status above. > > -*/ > > - discard_cache(); > > + if (!no_verify && find_hook("pre-commit")) { > > + /* > > +* Re-read the index as pre-commit hook could have updated it, > > +* and write it out as a tree. We must do this before we invoke > > +* the editor and after we invoke run_status above. > > +*/ > > + discard_cache(); > > + } > > + > > read_cache_from(index_file); > > This read_cache_from() should be a noop, right, because it immediately > sees istate->initialized is set? So it shouldn't matter that it is not > in the conditional with discard_cache(). Though if its only purpose is > to re-read the just-discarded contents, perhaps it makes sense to put it > there for readability. > > -Peff I thought about that and didn't know if there were cases when this would be called and the cache has not been loaded. It didn't look like it since it is only called from cmd_commit and prepare_index is called before it. Also if in the future this call would be made when it had not read the index yet so thought it was safest just to leave this as always being called since it is basically a noop if the istate->initialized is set. Also based on this commit https://github.com/git/git/commit/2888605c649ccd423232161186d72c0e6c458a48 it looked like the discard_cache was added independent of the read_cache_from call, which made me think that the two were not tied together. Kevin
[PATCH v2] commit: skip discarding the index if there is no pre-commit hook
If there is not a pre-commit hook, there is no reason to discard the index and reread it. This change checks to presence of a pre-commit hook and then only discards the index if there was one. Signed-off-by: Kevin Willford --- builtin/commit.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index e7a2cb6285..ab71b93518 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -940,12 +940,15 @@ static int prepare_to_commit(const char *index_file, const char *prefix, return 0; } - /* -* Re-read the index as pre-commit hook could have updated it, -* and write it out as a tree. We must do this before we invoke -* the editor and after we invoke run_status above. -*/ - discard_cache(); + if (!no_verify && find_hook("pre-commit")) { + /* +* Re-read the index as pre-commit hook could have updated it, +* and write it out as a tree. We must do this before we invoke +* the editor and after we invoke run_status above. +*/ + discard_cache(); + } + read_cache_from(index_file); if (update_main_cache_tree(0)) { error(_("Error building trees")); -- 2.14.1.481.g785c1dc9ae
Re: [PATCH] cache-tree: remove use of strbuf_addf in update_one
On 8/10/2017 3:03 PM, Jeff King wrote: On Thu, Aug 10, 2017 at 11:58:34AM -0700, Stefan Beller wrote: On Thu, Aug 10, 2017 at 11:47 AM, Kevin Willford wrote: String formatting can be a performance issue when there are hundreds of thousands of trees. When changing this for the sake of performance, could you give an example (which kind of repository you need for this to become a bottleneck? I presume the large Windows repo? Or can I reproduce it with a small repo such as linux.git or even git.git?) and some numbers how this improves the performance? I was about to say the same thing. Normally I don't mind a small optimization without numbers if the result is obviously an improvement. But in this case the result is a lot less readable, and it's not entirely clear to me that it would always be an improvement (we now always run 3 strbuf calls instead of one, and have to check the length for each one). What I'm wondering specifically is if vsnprintf() on Kevin's platform (which I'll assume is Windows) is slow, and we would do better to replace it with a faster compat/ routine. -Peff The strbuf_add call is essentially only having to do a memcpy whereas the strbuf_addf will have to parse the string, determine the types, convert the data, and then get it in the buffer. That could be made faster with a better compat/ routine but I fear still far from the length check and memcpy. void strbuf_add(struct strbuf *sb, const void *data, size_t len) { strbuf_grow(sb, len); memcpy(sb->buf + sb->len, data, len); strbuf_setlen(sb, sb->len + len); } Here are some of the performance numbers from the windows repo. I will work on writing a perf test for this change so that we have a better idea on smaller repo what the impact of this change is on them. | w/o | with fix | --- git checkout | 36.08 s | 33.34 s | --- git checkout | 32.54 s | 28.26 s | --- git checkout | 44.10 s | 38.13 s | --- git merge| 32.90 s | 30.56 s | --- git rebase | 46.14 s | 42.18 s |
[PATCH] commit: skip discarding the index if there is no pre-commit hook
If there is not a pre-commit hook, there is no reason to discard the index and reread it. This change checks to presence of a pre-commit hook and then only discards the index if there was one. Signed-off-by: Kevin Willford --- builtin/commit.c | 29 + 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index e7a2cb6285..443949d87b 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -671,12 +671,22 @@ static int prepare_to_commit(const char *index_file, const char *prefix, const char *hook_arg2 = NULL; int clean_message_contents = (cleanup_mode != CLEANUP_NONE); int old_display_comment_prefix; + const char *precommit_hook = NULL; /* This checks and barfs if author is badly specified */ determine_author_info(author_ident); - if (!no_verify && run_commit_hook(use_editor, index_file, "pre-commit", NULL)) - return 0; + + if (!no_verify) { + /* +* Check to see if there is a pre-commit hook +* If there not one we can skip discarding the index later on +*/ + precommit_hook = find_hook("pre-commit"); + if (precommit_hook && + run_commit_hook(use_editor, index_file, "pre-commit", NULL)) + return 0; + } if (squash_message) { /* @@ -940,12 +950,15 @@ static int prepare_to_commit(const char *index_file, const char *prefix, return 0; } - /* -* Re-read the index as pre-commit hook could have updated it, -* and write it out as a tree. We must do this before we invoke -* the editor and after we invoke run_status above. -*/ - discard_cache(); + if (!no_verify && precommit_hook) { + /* +* Re-read the index as pre-commit hook could have updated it, +* and write it out as a tree. We must do this before we invoke +* the editor and after we invoke run_status above. +*/ + discard_cache(); + } + read_cache_from(index_file); if (update_main_cache_tree(0)) { error(_("Error building trees")); -- 2.14.0.rc0.286.g44127d70e4
[PATCH] cache-tree: remove use of strbuf_addf in update_one
String formatting can be a performance issue when there are hundreds of thousands of trees. Change to stop using the strbuf_addf and just add the strings or characters individually. There are a limited number of modes so added a switch for the known ones and a default case if something comes through that are not a known one for git. Signed-off-by: Kevin Willford --- cache-tree.c | 24 +++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 2440d1dc89..41744b3db7 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -390,7 +390,29 @@ static int update_one(struct cache_tree *it, continue; strbuf_grow(&buffer, entlen + 100); - strbuf_addf(&buffer, "%o %.*s%c", mode, entlen, path + baselen, '\0'); + + switch (mode) { + case 0100644: + strbuf_add(&buffer, "100644 ", 7); + break; + case 0100664: + strbuf_add(&buffer, "100664 ", 7); + break; + case 0100755: + strbuf_add(&buffer, "100755 ", 7); + break; + case 012: + strbuf_add(&buffer, "12 ", 7); + break; + case 016: + strbuf_add(&buffer, "16 ", 7); + break; + default: + strbuf_addf(&buffer, "%o ", mode); + break; + } + strbuf_add(&buffer, path + baselen, entlen); + strbuf_addch(&buffer, '\0'); strbuf_add(&buffer, sha1, 20); #if DEBUG -- 2.14.0.rc0.286.g44127d70e4
[PATCH v2 0/2] Add progress for format-patch and rebase
Changes since last patch: 1. Use start_progress_delay so progress isn't shown if generating the patches is fast enough 2. Updated to have text of "Generating patches" 3. Only show progress when the --progress flag is passed 4. In the rebase script check stderr and the quiet option is not set before propagating the progress flag to format-patch Kevin Willford (2): format-patch: have progress option while generating patches rebase: turn on progress option by default for format-patch Documentation/git-format-patch.txt | 4 builtin/log.c | 10 ++ git-rebase--am.sh | 1 + git-rebase.sh | 6 ++ 4 files changed, 21 insertions(+) -- 2.14.0.rc0.286.g44127d70e4
[PATCH v2 1/2] format-patch: have progress option while generating patches
When generating patches for the rebase command if the user does not realize the branch they are rebasing onto is thousands of commits different there is no progress indication after initial rewinding message. The progress meter as presented in this patch assumes the thousands of patches to have a fine granularity as well as assuming to require all the same amount of work/time for each, such that a steady progress bar is achieved. We do not want to estimate the time for each patch based e.g. on their size or number of touched files (or parents) as that is too expensive for just a progress meter. This patch allows a progress option to be passed to format-patch so that the user can be informed the progress of generating the patch. This option is then used by the rebase command when calling format-patch. Signed-off-by: Kevin Willford --- Documentation/git-format-patch.txt | 4 builtin/log.c | 10 ++ 2 files changed, 14 insertions(+) diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt index c890328b02..6cbe462a77 100644 --- a/Documentation/git-format-patch.txt +++ b/Documentation/git-format-patch.txt @@ -23,6 +23,7 @@ SYNOPSIS [(--reroll-count|-v) ] [--to=] [--cc=] [--[no-]cover-letter] [--quiet] [--notes[=]] + [--progress] [] [ | ] @@ -283,6 +284,9 @@ you can use `--suffix=-patch` to get `0001-description-of-my-change-patch`. range are always formatted as creation patches, independently of this flag. +--progress:: + Show progress reports on stderr as patches are generated. + CONFIGURATION - You can specify extra mail header lines to be added to each message, diff --git a/builtin/log.c b/builtin/log.c index 725c7b8a1a..b07a5529c2 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -27,6 +27,7 @@ #include "version.h" #include "mailmap.h" #include "gpg-interface.h" +#include "progress.h" /* Set a default date-time format for git log ("log.date" config variable) */ static const char *default_date_mode = NULL; @@ -1422,6 +1423,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) char *branch_name = NULL; char *base_commit = NULL; struct base_tree_info bases; + int show_progress = 0; + struct progress *progress = NULL; const struct option builtin_format_patch_options[] = { { OPTION_CALLBACK, 'n', "numbered", &numbered, NULL, @@ -1493,6 +1496,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) OPT_FILENAME(0, "signature-file", &signature_file, N_("add a signature from a file")), OPT__QUIET(&quiet, N_("don't print the patch filenames")), + OPT_BOOL(0, "progress", &show_progress, +N_("show progress while generating patches")), OPT_END() }; @@ -1752,8 +1757,12 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) start_number--; } rev.add_signoff = do_signoff; + + if (show_progress) + progress = start_progress_delay(_("Generating patches"), total, 0, 1); while (0 <= --nr) { int shown; + display_progress(progress, total - nr); commit = list[nr]; rev.nr = total - nr + (start_number - 1); /* Make the second and subsequent mails replies to the first */ @@ -1818,6 +1827,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) if (!use_stdout) fclose(rev.diffopt.file); } + stop_progress(&progress); free(list); free(branch_name); string_list_clear(&extra_to, 0); -- 2.14.0.rc0.286.g44127d70e4
[PATCH v2 2/2] rebase: turn on progress option by default for format-patch
This change passes the progress option of format-patch checking that stderr is attached and rebase is not being run in quiet mode. Signed-off-by: Kevin Willford --- git-rebase--am.sh | 1 + git-rebase.sh | 6 ++ 2 files changed, 7 insertions(+) diff --git a/git-rebase--am.sh b/git-rebase--am.sh index 375239341f..ff98fe3a73 100644 --- a/git-rebase--am.sh +++ b/git-rebase--am.sh @@ -53,6 +53,7 @@ else git format-patch -k --stdout --full-index --cherry-pick --right-only \ --src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \ + $git_format_patch_opt \ "$revisions" ${restrict_revision+^$restrict_revision} \ >"$GIT_DIR/rebased-patches" ret=$? diff --git a/git-rebase.sh b/git-rebase.sh index f8b3d1fd97..ad8415e3cf 100755 --- a/git-rebase.sh +++ b/git-rebase.sh @@ -74,6 +74,7 @@ test "$(git config --bool rebase.stat)" = true && diffstat=t autostash="$(git config --bool rebase.autostash || echo false)" fork_point=auto git_am_opt= +git_format_patch_opt= rebase_root= force_rebase= allow_rerere_autoupdate= @@ -445,6 +446,11 @@ else state_dir="$apply_dir" fi +if test -t 2 && test -z "$GIT_QUIET" +then + git_format_patch_opt="$git_format_patch_opt --progress" +fi + if test -z "$rebase_root" then case "$#" in -- 2.14.0.rc0.286.g44127d70e4
RE: [PATCH 2/2] rebase: turn on progress option by default for format-patch
> -Original Message- > From: Stefan Beller [mailto:sbel...@google.com] > Sent: Wednesday, May 31, 2017 1:09 PM > To: Kevin Willford > Cc: git@vger.kernel.org; Junio C Hamano ; Kevin > Willford > Subject: Re: [PATCH 2/2] rebase: turn on progress option by default for > format-patch > > On Wed, May 31, 2017 at 8:04 AM, Kevin Willford > wrote: > > This change passes the progress option of format-patch by default and > > passes the -q --quiet option through to the format-patch call so that > > it is respected as well. > > This is not conflicting with Johannes rewrite of rebase in C? > (rebase is a huge beast IIUC) I will check with Johannes and see what possible conflicts there could be. Since these are flags that get passed to the format-patch code, it shouldn't take much to put it in the C code as well. > > When passing the progress option by default to formatting patches, maybe > we should use start_progress_delay in the previous patch instead to omit > the progress for short lived patch formatting sessions? > (say a delay of one second?) > I thought about that and certainly could do it but I have found it nice to have the number of patches that are generated in the output even for a small number or commits. For example when I run a `git rebase master` and expect there to be only 2 commits, the message "Generating patch: 100% (2/2), done." Gives me that good feeling that I did it right and didn't mess something up. I'm good either way though. > Thanks, > Stefan > > > > > Signed-off-by: Kevin Willford > > --- > > git-rebase--am.sh | 5 +++-- > > git-rebase.sh | 2 ++ > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/git-rebase--am.sh b/git-rebase--am.sh index > > 375239341f..ab2be30abf 100644 > > --- a/git-rebase--am.sh > > +++ b/git-rebase--am.sh > > @@ -51,8 +51,9 @@ then > > else > > rm -f "$GIT_DIR/rebased-patches" > > > > - git format-patch -k --stdout --full-index --cherry-pick > > --right-only \ > > - --src-prefix=a/ --dst-prefix=b/ --no-renames > > --no-cover-letter \ > > + git format-patch $git_format_patch_opt -k --stdout --full-index \ > > + --cherry-pick --right-only --src-prefix=a/ --dst-prefix=b/ \ > > + --no-renames --no-cover-letter --progress \ > > "$revisions" ${restrict_revision+^$restrict_revision} \ > > >"$GIT_DIR/rebased-patches" > > ret=$? > > diff --git a/git-rebase.sh b/git-rebase.sh index > > db1deed846..b72e319ac9 100755 > > --- a/git-rebase.sh > > +++ b/git-rebase.sh > > @@ -73,6 +73,7 @@ test "$(git config --bool rebase.stat)" = true && > > diffstat=t autostash="$(git config --bool rebase.autostash || echo false)" > > fork_point=auto > > git_am_opt= > > +git_format_patch_opt= > > rebase_root= > > force_rebase= > > allow_rerere_autoupdate= > > @@ -308,6 +309,7 @@ do > > --quiet) > > GIT_QUIET=t > > git_am_opt="$git_am_opt -q" > > + git_format_patch_opt="$git_format_patch_opt -q" > > verbose= > > diffstat= > > ;; > > -- > > 2.13.0.92.g73a4ce6a77 > >
RE: [PATCH 1/2] format-patch: have progress option while generating patches
> -Original Message- > From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On > Behalf Of Stefan Beller > Sent: Wednesday, May 31, 2017 12:40 PM > To: Kevin Willford > Cc: git@vger.kernel.org; Junio C Hamano ; Kevin > Willford > Subject: Re: [PATCH 1/2] format-patch: have progress option while > generating patches > > On Wed, May 31, 2017 at 8:04 AM, Kevin Willford > wrote: > > When generating patches for the rebase command if the user does not > > realize the branch they are rebasing onto is thousands of commits > > different there is no progress indication after initial rewinding > > message. > > > > This patch allows a progress option to be passed to format-patch so > > that the user can be informed the progress of generating the patch. > > This option will then be used by the rebase command when calling > > format-patch. > > After reading the code, I was looking for some explanation on the underlying > assumptions, such as: > > The progress meter as presented in this patch assumes the thousands of > patches to have a fine granularity as well as assuming to require all the > same amount of work/time for each, such that a steady progress bar > is achieved. > > We do not want to estimate the time for each patch based e.g. > on their size or number of touched files (or parents) as that is too > expensive for just a progress meter. > Sounds good. I will add some explanation to the commit message. > > > > > Signed-off-by: Kevin Willford > > --- > > Documentation/git-format-patch.txt | 8 > > builtin/log.c | 10 ++ > > 2 files changed, 18 insertions(+) > > > > diff --git a/Documentation/git-format-patch.txt > > b/Documentation/git-format-patch.txt > > index c890328b02..ee5f99f606 100644 > > --- a/Documentation/git-format-patch.txt > > +++ b/Documentation/git-format-patch.txt > > @@ -23,6 +23,7 @@ SYNOPSIS > >[(--reroll-count|-v) ] > >[--to=] [--cc=] > >[--[no-]cover-letter] [--quiet] [--notes[=]] > > + [--progress] > >[] > >[ | ] > > > > @@ -260,6 +261,7 @@ you can use `--suffix=-patch` to get `0001- > description-of-my-change-patch`. > > -q:: > > --quiet:: > > Do not print the names of the generated files to standard output. > > + Progress is not reported to the standard error stream. > > > > --no-binary:: > > Do not output contents of changes in binary files, instead @@ > > -283,6 +285,12 @@ you can use `--suffix=-patch` to get `0001-description- > of-my-change-patch`. > > range are always formatted as creation patches, independently > > of this flag. > > > > +--progress:: > > + Progress status is reported on the standard error stream > > + by default when it is attached to a terminal, unless -q > > + is specified. This flag forces progress status even if the > > + standard error stream is not directed to a terminal. > > + > > CONFIGURATION > > - > > You can specify extra mail header lines to be added to each message, > > diff --git a/builtin/log.c b/builtin/log.c index > > 631fbc984f..02c50431b6 100644 > > --- a/builtin/log.c > > +++ b/builtin/log.c > > @@ -26,6 +26,7 @@ > > #include "version.h" > > #include "mailmap.h" > > #include "gpg-interface.h" > > +#include "progress.h" > > > > /* Set a default date-time format for git log ("log.date" config > > variable) */ static const char *default_date_mode = NULL; @@ -1409,6 > > +1410,8 @@ int cmd_format_patch(int argc, const char **argv, const char > *prefix) > > char *branch_name = NULL; > > char *base_commit = NULL; > > struct base_tree_info bases; > > + int show_progress = 0; > > + struct progress *progress = NULL; > > > > const struct option builtin_format_patch_options[] = { > > { OPTION_CALLBACK, 'n', "numbered", &numbered, NULL, > > @@ -1480,6 +1483,8 @@ int cmd_format_patch(int argc, const char **argv, > const char *prefix) > > OPT_FILENAME(0, "signature-file", &signature_file, > > N_("add a signature from a file")), > > OPT__QUIET(&quiet, N_("don't print the patch > > filenames")), > &
[PATCH 0/2] Add progress to format-patch and rebase
This patch series is to add progress for the user when generating patches in format-patch. This is to aid the user when performing large rebases and have some indication to the user that progress is being made. These patches just add the normal progress indication when generating patches and makes it the default when running rebase. It will also respect the -q[uiet] flag and not show when it is present. Kevin Willford (2): format-patch: have progress option while generating patches rebase: turn on progress option by default for format-patch Documentation/git-format-patch.txt | 8 builtin/log.c | 10 ++ git-rebase--am.sh | 5 +++-- git-rebase.sh | 2 ++ 4 files changed, 23 insertions(+), 2 deletions(-) -- 2.13.0.92.g73a4ce6a77
[PATCH 1/2] format-patch: have progress option while generating patches
When generating patches for the rebase command if the user does not realize the branch they are rebasing onto is thousands of commits different there is no progress indication after initial rewinding message. This patch allows a progress option to be passed to format-patch so that the user can be informed the progress of generating the patch. This option will then be used by the rebase command when calling format-patch. Signed-off-by: Kevin Willford --- Documentation/git-format-patch.txt | 8 builtin/log.c | 10 ++ 2 files changed, 18 insertions(+) diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt index c890328b02..ee5f99f606 100644 --- a/Documentation/git-format-patch.txt +++ b/Documentation/git-format-patch.txt @@ -23,6 +23,7 @@ SYNOPSIS [(--reroll-count|-v) ] [--to=] [--cc=] [--[no-]cover-letter] [--quiet] [--notes[=]] + [--progress] [] [ | ] @@ -260,6 +261,7 @@ you can use `--suffix=-patch` to get `0001-description-of-my-change-patch`. -q:: --quiet:: Do not print the names of the generated files to standard output. + Progress is not reported to the standard error stream. --no-binary:: Do not output contents of changes in binary files, instead @@ -283,6 +285,12 @@ you can use `--suffix=-patch` to get `0001-description-of-my-change-patch`. range are always formatted as creation patches, independently of this flag. +--progress:: + Progress status is reported on the standard error stream + by default when it is attached to a terminal, unless -q + is specified. This flag forces progress status even if the + standard error stream is not directed to a terminal. + CONFIGURATION - You can specify extra mail header lines to be added to each message, diff --git a/builtin/log.c b/builtin/log.c index 631fbc984f..02c50431b6 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -26,6 +26,7 @@ #include "version.h" #include "mailmap.h" #include "gpg-interface.h" +#include "progress.h" /* Set a default date-time format for git log ("log.date" config variable) */ static const char *default_date_mode = NULL; @@ -1409,6 +1410,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) char *branch_name = NULL; char *base_commit = NULL; struct base_tree_info bases; + int show_progress = 0; + struct progress *progress = NULL; const struct option builtin_format_patch_options[] = { { OPTION_CALLBACK, 'n', "numbered", &numbered, NULL, @@ -1480,6 +1483,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) OPT_FILENAME(0, "signature-file", &signature_file, N_("add a signature from a file")), OPT__QUIET(&quiet, N_("don't print the patch filenames")), + OPT_BOOL(0, "progress", &show_progress, +N_("show progress")), OPT_END() }; @@ -1739,8 +1744,12 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) start_number--; } rev.add_signoff = do_signoff; + + if (show_progress && !quiet) + progress = start_progress(_("Generating patch"), total); while (0 <= --nr) { int shown; + display_progress(progress, total - nr); commit = list[nr]; rev.nr = total - nr + (start_number - 1); /* Make the second and subsequent mails replies to the first */ @@ -1805,6 +1814,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) if (!use_stdout) fclose(rev.diffopt.file); } + stop_progress(&progress); free(list); free(branch_name); string_list_clear(&extra_to, 0); -- 2.13.0.92.g73a4ce6a77
[PATCH 2/2] rebase: turn on progress option by default for format-patch
This change passes the progress option of format-patch by default and passes the -q --quiet option through to the format-patch call so that it is respected as well. Signed-off-by: Kevin Willford --- git-rebase--am.sh | 5 +++-- git-rebase.sh | 2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/git-rebase--am.sh b/git-rebase--am.sh index 375239341f..ab2be30abf 100644 --- a/git-rebase--am.sh +++ b/git-rebase--am.sh @@ -51,8 +51,9 @@ then else rm -f "$GIT_DIR/rebased-patches" - git format-patch -k --stdout --full-index --cherry-pick --right-only \ - --src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \ + git format-patch $git_format_patch_opt -k --stdout --full-index \ + --cherry-pick --right-only --src-prefix=a/ --dst-prefix=b/ \ + --no-renames --no-cover-letter --progress \ "$revisions" ${restrict_revision+^$restrict_revision} \ >"$GIT_DIR/rebased-patches" ret=$? diff --git a/git-rebase.sh b/git-rebase.sh index db1deed846..b72e319ac9 100755 --- a/git-rebase.sh +++ b/git-rebase.sh @@ -73,6 +73,7 @@ test "$(git config --bool rebase.stat)" = true && diffstat=t autostash="$(git config --bool rebase.autostash || echo false)" fork_point=auto git_am_opt= +git_format_patch_opt= rebase_root= force_rebase= allow_rerere_autoupdate= @@ -308,6 +309,7 @@ do --quiet) GIT_QUIET=t git_am_opt="$git_am_opt -q" + git_format_patch_opt="$git_format_patch_opt -q" verbose= diffstat= ;; -- 2.13.0.92.g73a4ce6a77
RE: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.
> -Original Message- > From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On > Behalf Of Duy Nguyen > Sent: Wednesday, April 12, 2017 7:21 AM > To: Kevin Willford > Cc: Kevin Willford ; git@vger.kernel.org; > gits...@pobox.com; p...@peff.net > Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid > data loss. > > On Wed, Apr 12, 2017 at 5:30 AM, Kevin Willford > wrote: > > The loss of the skip-worktree bits is part of the problem if you are > > talking about modified files. The other issue that I was having is > > when running a reset and there were files added in the commit that is > > being reset, there will not be an entry in the index and not a file on > > disk so the data for that file is completely lost at that point. > > "status" also doesn't include anything about this loss of data. On > > modified files status will at least have the file as deleted since > > there is still an index entry but again the previous version of the file > > and it's > data is lost. > > Well, we could have "deleted" index entries, those marked with > CE_REMOVE. They are never written down to file though, so 'status' > won't benefit from that. Hopefully we can restore the file before the index > file is written down and we really lose skip-worktree bits? Because this is a reset --mixed it will never run through unpack_trees and The entries are never marked with CE_REMOVE. > > > To me this is totally unexpected behavior, for example if I am on a > > commit where there are only files that where added and run a reset > > HEAD~1 and then a status, it will show a clean working directory. > > Regardless of skip-worktree bits the user needs to have the data in > > the working directory after the reset or data is lost which is always bad. > > I agree we no longer have a place to save the skip-worktree bit, we have to > restore the state back as if skip-worktree bit does not exist. > It would be good if we could keep the logic in unpack_trees() though. > For example, if the file is present on disk even if skip-worktree bit is on, > unpack_trees() would abort instead of silently overwriting it. > This is a difference between skip-worktree and assume-unchanged bits. > If you do explicit checkout_entry() you might have to add more checks to > keep behavior consistent. > -- > Duy Because this is a reset --mixed it will follow the code path calling read_from_tree and ends up calling update_index_from_diff in the format_callback of the diff, so unpack_trees() is never called in the --mixed case. This code change also only applies when the file does not exist and the skip-worktree bit is on and the updated index entry either will be missing (covers the added scenario) or was not missing before (covers the modified scenario). If there is a better way to get the previous index entry to disk than what I am doing, I am happy to implement it correctly. Thanks, Kevin
RE: [PATCH 3/3] reset.c: update files when using sparse to avoid data loss.
> -Original Message- > From: Duy Nguyen [mailto:pclo...@gmail.com] > Sent: Monday, April 10, 2017 4:24 AM > To: Kevin Willford > Cc: git@vger.kernel.org; gits...@pobox.com; p...@peff.net; Kevin Willford > > Subject: Re: [PATCH 3/3] reset.c: update files when using sparse to avoid > data loss. > > On Fri, Apr 07, 2017 at 12:23:57PM -0700, Kevin Willford wrote: > > When using the sparse checkout feature the git reset command will add > > "git reset" has three different modes. It would be good if you mention what > mode is affected here. The tests are for --mixed only. I wonder if we need to > do anything for --hard and --soft? > > --soft touches branch SHA-1 index only, not worktree, so probably not. > > --hard should be handled by unpack_trees(), I think. > > But it would be good to cover these in the commit message as well to stop > readers from wondering. Sounds good. > > > entries to the index that will have the skip-worktree bit off but will > > leave the working directory empty. File data is lost because the > > index version of the files has been changed but there is nothing that > > is in the working directory. This will cause the next status call to > > show either deleted for files modified or deleting or nothing for files > added. > > The added files should be shown as untracked and modified files should > > be shown as modified. > > Hmm.. reading --mixed documentation again ("Resets the index but not > working tree"), I think the current behavior is expected regardless of skip- > worktree bits. > > Perhaps the problem is the loss of skip-worktree bits on entries added by > update_index_from_diff()? If the bits are at the right place, then it should > not matter if the same version exists on worktree or not and "status" or > "commit" should work as expected, I think. > > -- > Duy The loss of the skip-worktree bits is part of the problem if you are talking about modified files. The other issue that I was having is when running a reset and there were files added in the commit that is being reset, there will not be an entry in the index and not a file on disk so the data for that file is completely lost at that point. "status" also doesn't include anything about this loss of data. On modified files status will at least have the file as deleted since there is still an index entry but again the previous version of the file and it's data is lost. To me this is totally unexpected behavior, for example if I am on a commit where there are only files that where added and run a reset HEAD~1 and then a status, it will show a clean working directory. Regardless of skip-worktree bits the user needs to have the data in the working directory after the reset or data is lost which is always bad. Kevin
[PATCH 2/3] apply.c: do not checkout file when skip-worktree bit set
When using the sparse-checkout feature git should not write to the working directory for files with the skip-worktree bit on. With the skip-worktree bit on the file may or may not be in the working directory and if it is not we don't want or need to create it by calling checkout_entry. There are two callers of checkout_target. Both of which check that the file does not exist before calling checkout_target. load_current which make a call to lstat right before calling checkout_target and check_preimage which will only run checkout_taret it stat_ret is less than zero. It sets stat_ret to zero and only if !stat->cached will it lstat the file and set stat_ret to something other than zero. This patch checks if skip-worktree bit is on in checkout_target and just returns so that the entry doesn't not end up in the working directory. This is so that apply will not create a file in the working directory, then update the index but not keep the working directory up to date with the changes that happened in the index. Signed-off-by: Kevin Willford --- apply.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/apply.c b/apply.c index 0e2caeab9c..0da5d0b7c9 100644 --- a/apply.c +++ b/apply.c @@ -3327,6 +3327,24 @@ static int checkout_target(struct index_state *istate, { struct checkout costate = CHECKOUT_INIT; + /* +* Do not checkout the entry if the skipworktree bit is set +* +* Both callers of this method (check_preimage and load_current) +* check for the existance of the file before calling this +* method so we know that the file doesn't exist at this point +* and we don't need to perform that check again here. +* We just need to check the skip-worktree and return. +* +* This is to prevent git from creating a file in the +* working directory that has the skip-worktree bit on, +* then updating the index from the patch and not keeping +* the working directory version up to date with what it +* changed the index version to be. +*/ + if (ce_skip_worktree(ce)) + return 0; + costate.refresh_cache = 1; costate.istate = istate; if (checkout_entry(ce, &costate, NULL) || lstat(ce->name, st)) -- 2.12.2.windows.2.1.g7df5db8d31
[PATCH 3/3] reset.c: update files when using sparse to avoid data loss.
When using the sparse checkout feature the git reset command will add entries to the index that will have the skip-worktree bit off but will leave the working directory empty. File data is lost because the index version of the files has been changed but there is nothing that is in the working directory. This will cause the next status call to show either deleted for files modified or deleting or nothing for files added. The added files should be shown as untracked and modified files should be shown as modified. To fix this when the reset is running if there is not a file in the working directory and if it will be missing with the new index entry or was not missing in the previous version, we create the previous index version of the file in the working directory so that status will report correctly and the files will be availble for the user to deal with. Signed-off-by: Kevin Willford --- builtin/reset.c | 34 +++ t/t7114-reset-sparse-checkout.sh | 58 2 files changed, 92 insertions(+) create mode 100755 t/t7114-reset-sparse-checkout.sh diff --git a/builtin/reset.c b/builtin/reset.c index 8ab915bfcb..8ba97999f4 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -21,6 +21,7 @@ #include "parse-options.h" #include "unpack-trees.h" #include "cache-tree.h" +#include "dir.h" static const char * const git_reset_usage[] = { N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] []"), @@ -117,12 +118,45 @@ static void update_index_from_diff(struct diff_queue_struct *q, struct diff_options *opt, void *data) { int i; + int pos; int intent_to_add = *(int *)data; for (i = 0; i < q->nr; i++) { struct diff_filespec *one = q->queue[i]->one; + struct diff_filespec *two = q->queue[i]->two; int is_missing = !(one->mode && !is_null_oid(&one->oid)); + int was_missing = !two->mode && is_null_oid(&two->oid); struct cache_entry *ce; + struct cache_entry *ceBefore; + struct checkout state = CHECKOUT_INIT; + + /* +* When using the sparse-checkout feature the cache entries that are +* added here will not have the skip-worktree bit set. +* Without this code there is data that is lost because the files that +* would normally be in the working directory are not there and show as +* deleted for the next status or in the case of added files just disappear. +* We need to create the previous version of the files in the working +* directory so that they will have the right content and the next +* status call will show modified or untracked files correctly. +*/ + if (core_apply_sparse_checkout && !file_exists(two->path)) + { + pos = cache_name_pos(two->path, strlen(two->path)); + if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) && (is_missing || !was_missing)) + { + state.force = 1; + state.refresh_cache = 1; + state.istate = &the_index; + ceBefore = make_cache_entry(two->mode, two->oid.hash, two->path, + 0, 0); + if (!ceBefore) + die(_("make_cache_entry failed for path '%s'"), + two->path); + + checkout_entry(ceBefore, &state, NULL); + } + } if (is_missing && !intent_to_add) { remove_file_from_cache(one->path); diff --git a/t/t7114-reset-sparse-checkout.sh b/t/t7114-reset-sparse-checkout.sh new file mode 100755 index 00..8dd88fd46d --- /dev/null +++ b/t/t7114-reset-sparse-checkout.sh @@ -0,0 +1,58 @@ +#!/bin/sh + +test_description='reset when using a sparse-checkout' + +. ./test-lib.sh + +# reset using a sparse-checkout file + +test_expect_success 'setup' ' + test_tick && + echo "checkout file" >c && + echo "modify file" >m && + echo "delete file" >d && + git add . && + git commit -m "initial commit" && + echo "added file" >a && + echo "modification of a file" >m && + git rm d && + git add . && +
[PATCH 0/3] fix working directory file issues while using sparse-checkout
While using the sparse-checkout feature there are scenarios where the working directory should and should not be updated. This patch series addresses issues found in reset, apply, and merge conflicts. Kevin Willford (3): merge-recursive.c: conflict using sparse should update file apply.c: do not checkout file when skip-worktree bit set reset.c: update files when using sparse to avoid data loss. apply.c | 18 + builtin/reset.c | 34 +++ merge-recursive.c| 8 ++ t/t7114-reset-sparse-checkout.sh | 58 t/t7614-merge-sparse-checkout.sh | 27 +++ 5 files changed, 145 insertions(+) create mode 100755 t/t7114-reset-sparse-checkout.sh create mode 100755 t/t7614-merge-sparse-checkout.sh -- 2.12.2.windows.2.1.g7df5db8d31
[PATCH 1/3] merge-recursive.c: conflict using sparse should update file
Update the file when there is a conflict with a modify/delete scenario when using the sparse-checkout feature since the file might not be on disk because the skip-worktree bit is on and the user will need the file and content to determine how to resolve the conflict. Signed-off-by: Kevin Willford --- merge-recursive.c| 8 t/t7614-merge-sparse-checkout.sh | 27 +++ 2 files changed, 35 insertions(+) create mode 100755 t/t7614-merge-sparse-checkout.sh diff --git a/merge-recursive.c b/merge-recursive.c index b7ff1ada3c..54fce93dae 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1103,6 +1103,14 @@ static int handle_change_delete(struct merge_options *o, "and %s in %s. Version %s of %s left in tree."), change, path, o->branch2, change_past, o->branch1, o->branch1, path); + /* +* In a sparse checkout the file may not exist. Sadly, +* the CE_SKIP_WORKTREE flag is not preserved in the +* case of conflicts, therefore we do the next best +* thing: verify that the file exists. +*/ + if (!file_exists(path)) + ret = update_file(o, 0, a_oid, a_mode, path); } else { output(o, 1, _("CONFLICT (%s/delete): %s deleted in %s " "and %s in %s. Version %s of %s left in tree at %s."), diff --git a/t/t7614-merge-sparse-checkout.sh b/t/t7614-merge-sparse-checkout.sh new file mode 100755 index 00..6922f0244f --- /dev/null +++ b/t/t7614-merge-sparse-checkout.sh @@ -0,0 +1,27 @@ +#!/bin/sh + +test_description='merge can handle sparse-checkout' + +. ./test-lib.sh + +# merges with conflicts + +test_expect_success 'setup' ' + test_commit a && + test_commit file && + git checkout -b delete-file && + git rm file.t && + test_tick && + git commit -m "remove file" && + git checkout master && + test_commit modify file.t changed +' + +test_expect_success 'merge conflict deleted file and modified' ' + echo "/a" >.git/info/sparse-checkout && + test_config core.sparsecheckout true && + test_must_fail git merge delete-file && + test_path_is_file file.t +' + +test_done -- 2.12.2.windows.2.1.g7df5db8d31
[[PATCH v2] 2/4] patch-ids: replace the seen indicator with a commit pointer
From: Kevin Willford The cherry_pick_list was looping through the original side checking the seen indicator and setting the cherry_flag on the commit. If we save off the commit in the patch_id we can set the cherry_flag on the correct commit when running through the other side when a patch_id match is found. Signed-off-by: Kevin Willford --- patch-ids.c | 1 + patch-ids.h | 2 +- revision.c | 18 +++--- 3 files changed, 5 insertions(+), 16 deletions(-) diff --git a/patch-ids.c b/patch-ids.c index db31fa6..bafaae2 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -43,6 +43,7 @@ static int init_patch_id_entry(struct patch_id *patch, struct commit *commit, struct patch_ids *ids) { + patch->commit = commit; if (commit_patch_id(commit, &ids->diffopts, patch->patch_id)) return -1; diff --git a/patch-ids.h b/patch-ids.h index 9569ee0..dea1ecd 100644 --- a/patch-ids.h +++ b/patch-ids.h @@ -4,7 +4,7 @@ struct patch_id { struct hashmap_entry ent; unsigned char patch_id[GIT_SHA1_RAWSZ]; - char seen; + struct commit *commit; }; struct patch_ids { diff --git a/revision.c b/revision.c index edba5b7..19cc067 100644 --- a/revision.c +++ b/revision.c @@ -846,7 +846,7 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) */ if (left_first != !!(flags & SYMMETRIC_LEFT)) continue; - commit->util = add_commit_patch_id(commit, &ids); + add_commit_patch_id(commit, &ids); } /* either cherry_mark or cherry_pick are true */ @@ -873,21 +873,9 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs) id = has_commit_patch_id(commit, &ids); if (!id) continue; - id->seen = 1; - commit->object.flags |= cherry_flag; - } - /* Now check the original side for seen ones */ - for (p = list; p; p = p->next) { - struct commit *commit = p->item; - struct patch_id *ent; - - ent = commit->util; - if (!ent) - continue; - if (ent->seen) - commit->object.flags |= cherry_flag; - commit->util = NULL; + commit->object.flags |= cherry_flag; + id->commit->object.flags |= cherry_flag; } free_patch_ids(&ids); -- 2.9.2.gvfs.2.42.gb7633a3 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[[PATCH v2] 3/4] patch-ids: add flag to create the diff patch id using header only data
From: Kevin Willford This will allow a diff patch id to be created using only the header data so that the contents of the file will not have to be loaded. Signed-off-by: Kevin Willford --- diff.c | 16 ++-- diff.h | 2 +- patch-ids.c | 2 +- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 7d03419..28a4190 100644 --- a/diff.c +++ b/diff.c @@ -4455,7 +4455,7 @@ static void patch_id_consume(void *priv, char *line, unsigned long len) } /* returns 0 upon success, and writes result into sha1 */ -static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) +static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only) { struct diff_queue_struct *q = &diff_queued_diff; int i; @@ -4490,9 +4490,6 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) diff_fill_sha1_info(p->one); diff_fill_sha1_info(p->two); - if (fill_mmfile(&mf1, p->one) < 0 || - fill_mmfile(&mf2, p->two) < 0) - return error("unable to read files to diff"); len1 = remove_space(p->one->path, strlen(p->one->path)); len2 = remove_space(p->two->path, strlen(p->two->path)); @@ -4527,6 +4524,13 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) len2, p->two->path); git_SHA1_Update(&ctx, buffer, len1); + if (diff_header_only) + continue; + + if (fill_mmfile(&mf1, p->one) < 0 || + fill_mmfile(&mf2, p->two) < 0) + return error("unable to read files to diff"); + if (diff_filespec_is_binary(p->one) || diff_filespec_is_binary(p->two)) { git_SHA1_Update(&ctx, oid_to_hex(&p->one->oid), @@ -4549,11 +4553,11 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) return 0; } -int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1) +int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1, int diff_header_only) { struct diff_queue_struct *q = &diff_queued_diff; int i; - int result = diff_get_patch_id(options, sha1); + int result = diff_get_patch_id(options, sha1, diff_header_only); for (i = 0; i < q->nr; i++) diff_free_filepair(q->queue[i]); diff --git a/diff.h b/diff.h index 125447b..7883729 100644 --- a/diff.h +++ b/diff.h @@ -342,7 +342,7 @@ extern int run_diff_files(struct rev_info *revs, unsigned int option); extern int run_diff_index(struct rev_info *revs, int cached); extern int do_diff_cache(const unsigned char *, struct diff_options *); -extern int diff_flush_patch_id(struct diff_options *, unsigned char *); +extern int diff_flush_patch_id(struct diff_options *, unsigned char *, int); extern int diff_result_code(struct diff_options *, int); diff --git a/patch-ids.c b/patch-ids.c index bafaae2..69a14a3 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -13,7 +13,7 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, else diff_root_tree_sha1(commit->object.oid.hash, "", options); diffcore_std(options); - return diff_flush_patch_id(options, sha1); + return diff_flush_patch_id(options, sha1, 0); } static int patch_id_cmp(struct patch_id *a, -- 2.9.2.gvfs.2.42.gb7633a3 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[[PATCH v2] 4/4] rebase: avoid computing unnecessary patch IDs
From: Kevin Willford The `rebase` family of Git commands avoid applying patches that were already integrated upstream. They do that by using the revision walking option that computes the patch IDs of the two sides of the rebase (local-only patches vs upstream-only ones) and skipping those local patches whose patch ID matches one of the upstream ones. In many cases, this causes unnecessary churn, as already the set of paths touched by a given commit would suffice to determine that an upstream patch has no local equivalent. This hurts performance in particular when there are a lot of upstream patches, and/or large ones. Therefore, let's introduce the concept of a "diff-header-only" patch ID, compare those first, and only evaluate the "full" patch ID lazily. Please note that in contrast to the "full" patch IDs, those "diff-header-only" patch IDs are prone to collide with one another, as adjacent commits frequently touch the very same files. Hence we now have to be careful to allow multiple hash entries with the same hash. We accomplish that by using the hashmap_add() function that does not even test for hash collisions. This also allows us to evaluate the full patch ID lazily, i.e. only when we found commits with matching diff-header-only patch IDs. We add a performance test that demonstrates ~1-6% improvement. In practice this will depend on various factors such as how many upstream changes and how big those changes are along with whether file system caches are cold or warm. As Git's test suite has no way of catching performance regressions, we also add a regression test that verifies that the full patch ID computation is skipped when the diff-header-only computation suffices. Signed-off-by: Kevin Willford --- builtin/log.c| 2 +- patch-ids.c | 22 -- patch-ids.h | 2 +- t/perf/p3400-rebase.sh | 36 t/t6007-rev-list-cherry-pick-file.sh | 17 + 5 files changed, 71 insertions(+), 8 deletions(-) create mode 100644 t/perf/p3400-rebase.sh diff --git a/builtin/log.c b/builtin/log.c index fd1652f..b076993 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1331,7 +1331,7 @@ static void prepare_bases(struct base_tree_info *bases, struct object_id *patch_id; if (commit->util) continue; - if (commit_patch_id(commit, &diffopt, sha1)) + if (commit_patch_id(commit, &diffopt, sha1, 0)) die(_("cannot get patch id")); ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, bases->alloc_patch_id); patch_id = bases->patch_id + bases->nr_patch_id; diff --git a/patch-ids.c b/patch-ids.c index 69a14a3..0a4828a 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -5,7 +5,7 @@ #include "patch-ids.h" int commit_patch_id(struct commit *commit, struct diff_options *options, - unsigned char *sha1) + unsigned char *sha1, int diff_header_only) { if (commit->parents) diff_tree_sha1(commit->parents->item->object.oid.hash, @@ -13,13 +13,21 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, else diff_root_tree_sha1(commit->object.oid.hash, "", options); diffcore_std(options); - return diff_flush_patch_id(options, sha1, 0); + return diff_flush_patch_id(options, sha1, diff_header_only); } static int patch_id_cmp(struct patch_id *a, struct patch_id *b, - void *keydata) + struct diff_options *opt) { + if (is_null_sha1(a->patch_id) && + commit_patch_id(a->commit, opt, a->patch_id, 0)) + return error("Could not get patch ID for %s", + oid_to_hex(&a->commit->object.oid)); + if (is_null_sha1(b->patch_id) && + commit_patch_id(b->commit, opt, b->patch_id, 0)) + return error("Could not get patch ID for %s", + oid_to_hex(&b->commit->object.oid)); return hashcmp(a->patch_id, b->patch_id); } @@ -43,11 +51,13 @@ static int init_patch_id_entry(struct patch_id *patch, struct commit *commit, struct patch_ids *ids) { + unsigned char header_only_patch_id[GIT_SHA1_RAWSZ]; + patch->commit = commit; - if (commit_patch_id(commit, &ids->diffopts, patch->patch_id)) + if (commit_patch_id(commit, &ids->diffopts, header_only_patch_id, 1)) return -1; - hashmap_entry_init(patch, sha1hash(patch->patch_i
[[PATCH v2] 1/4] patch-ids: stop using a hand-rolled hashmap implementation
From: Kevin Willford This change will use the hashmap from the hashmap.h to keep track of the patch_ids that have been encountered instead of using an internal implementation. This simplifies the implementation of the patch ids. Signed-off-by: Kevin Willford --- patch-ids.c | 86 + patch-ids.h | 7 +++-- 2 files changed, 32 insertions(+), 61 deletions(-) diff --git a/patch-ids.c b/patch-ids.c index a4d0016..db31fa6 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -16,90 +16,62 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, return diff_flush_patch_id(options, sha1); } -static const unsigned char *patch_id_access(size_t index, void *table) +static int patch_id_cmp(struct patch_id *a, + struct patch_id *b, + void *keydata) { - struct patch_id **id_table = table; - return id_table[index]->patch_id; + return hashcmp(a->patch_id, b->patch_id); } -static int patch_pos(struct patch_id **table, int nr, const unsigned char *id) -{ - return sha1_pos(id, table, nr, patch_id_access); -} - -#define BUCKET_SIZE 190 /* 190 * 21 = 3990, with slop close enough to 4K */ -struct patch_id_bucket { - struct patch_id_bucket *next; - int nr; - struct patch_id bucket[BUCKET_SIZE]; -}; - int init_patch_ids(struct patch_ids *ids) { memset(ids, 0, sizeof(*ids)); diff_setup(&ids->diffopts); DIFF_OPT_SET(&ids->diffopts, RECURSIVE); diff_setup_done(&ids->diffopts); + hashmap_init(&ids->patches, (hashmap_cmp_fn)patch_id_cmp, 256); return 0; } int free_patch_ids(struct patch_ids *ids) { - struct patch_id_bucket *next, *patches; - - free(ids->table); - for (patches = ids->patches; patches; patches = next) { - next = patches->next; - free(patches); - } + hashmap_free(&ids->patches, 1); return 0; } -static struct patch_id *add_commit(struct commit *commit, - struct patch_ids *ids, - int no_add) +static int init_patch_id_entry(struct patch_id *patch, + struct commit *commit, + struct patch_ids *ids) { - struct patch_id_bucket *bucket; - struct patch_id *ent; - unsigned char sha1[20]; - int pos; - - if (commit_patch_id(commit, &ids->diffopts, sha1)) - return NULL; - pos = patch_pos(ids->table, ids->nr, sha1); - if (0 <= pos) - return ids->table[pos]; - if (no_add) - return NULL; - - pos = -1 - pos; + if (commit_patch_id(commit, &ids->diffopts, patch->patch_id)) + return -1; - bucket = ids->patches; - if (!bucket || (BUCKET_SIZE <= bucket->nr)) { - bucket = xcalloc(1, sizeof(*bucket)); - bucket->next = ids->patches; - ids->patches = bucket; - } - ent = &bucket->bucket[bucket->nr++]; - hashcpy(ent->patch_id, sha1); - - ALLOC_GROW(ids->table, ids->nr + 1, ids->alloc); - if (pos < ids->nr) - memmove(ids->table + pos + 1, ids->table + pos, - sizeof(ent) * (ids->nr - pos)); - ids->nr++; - ids->table[pos] = ent; - return ids->table[pos]; + hashmap_entry_init(patch, sha1hash(patch->patch_id)); + return 0; } struct patch_id *has_commit_patch_id(struct commit *commit, struct patch_ids *ids) { - return add_commit(commit, ids, 1); + struct patch_id patch; + + memset(&patch, 0, sizeof(patch)); + if (init_patch_id_entry(&patch, commit, ids)) + return NULL; + + return hashmap_get(&ids->patches, &patch, NULL); } struct patch_id *add_commit_patch_id(struct commit *commit, struct patch_ids *ids) { - return add_commit(commit, ids, 0); + struct patch_id *key = xcalloc(1, sizeof(*key)); + + if (init_patch_id_entry(key, commit, ids)) { + free(key); + return NULL; + } + + hashmap_add(&ids->patches, key); + return key; } diff --git a/patch-ids.h b/patch-ids.h index eeb56b3..9569ee0 100644 --- a/patch-ids.h +++ b/patch-ids.h @@ -2,15 +2,14 @@ #define PATCH_IDS_H struct patch_id { - unsigned char patch_id[20]; + struct hashmap_entry ent; + unsigned char patch_id[GIT_SHA1_RAWSZ]; char seen; }; struct patch_ids { + struct hashmap patches; struct diff_options diffopts; - int nr, alloc; - struct patch_id **table; - struct patch_id_bucket *patches;
[[PATCH v2] 0/4] Use header data patch ids for rebase to avoid loading file content
This patch series is to remove the hand rolled hashmap in the patch_ids and use the hashmap.h implementation. It also introduces the idea of having a header only patch id so that the contents of the files do not have to be loaded in order to determine if two commits are different. Kevin Willford (4): patch-ids: stop using a hand-rolled hashmap implementation patch-ids: replace the seen indicator with a commit pointer patch-ids: add flag to create the diff patch id using header only data rebase: avoid computing unnecessary patch IDs builtin/log.c| 2 +- diff.c | 16 +++--- diff.h | 2 +- patch-ids.c | 99 +++- patch-ids.h | 11 ++-- revision.c | 18 ++- t/perf/p3400-rebase.sh | 36 + t/t6007-rev-list-cherry-pick-file.sh | 17 +++ 8 files changed, 114 insertions(+), 87 deletions(-) create mode 100644 t/perf/p3400-rebase.sh -- 2.9.2.windows.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Use path comparison for patch ids before the file content
When limiting the list in a revision walk using cherry pick, patch ids are calculated by producing the diff of the content of the files. This would be more efficent by using a patch id looking at the paths that were changed in the commits and only if all the file changed are the same fall back to getting the content of the files in the commits to determine if the commits are the same. This change uses a hashmap to store entries with a hash of the patch id calculated by just using the file paths. The entries constist of the commit and the patch id calculated using file contents which is initially empty. If there are commits that are found in the hashmap it means that the same files were changed in the commits and the file contents need to be checked in order to determine if the commits are truly the same. The patch id that is calcuated based on the file contents is then stored in the hashmap entry if needed in later comparisons. If the commits are determined to be the same the cherry_flag is set on the commit being checked as well as the commit in the hashmap entry saving running though the original list of commits checking a seen flag. This will speed up a rebase where the upstream has many changes but none of them have been pulled into the current branch. --- diff.c | 16 + diff.h | 2 +- patch-ids.c | 114 +--- patch-ids.h | 7 ++-- revision.c | 19 ++ 5 files changed, 73 insertions(+), 85 deletions(-) diff --git a/diff.c b/diff.c index fa78fc1..f38b663 100644 --- a/diff.c +++ b/diff.c @@ -4449,7 +4449,7 @@ static void patch_id_consume(void *priv, char *line, unsigned long len) } /* returns 0 upon success, and writes result into sha1 */ -static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) +static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1, int use_path_only) { struct diff_queue_struct *q = &diff_queued_diff; int i; @@ -4484,9 +4484,6 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) diff_fill_sha1_info(p->one); diff_fill_sha1_info(p->two); - if (fill_mmfile(&mf1, p->one) < 0 || - fill_mmfile(&mf2, p->two) < 0) - return error("unable to read files to diff"); len1 = remove_space(p->one->path, strlen(p->one->path)); len2 = remove_space(p->two->path, strlen(p->two->path)); @@ -4521,6 +4518,13 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) len2, p->two->path); git_SHA1_Update(&ctx, buffer, len1); + if (use_path_only) + continue; + + if (fill_mmfile(&mf1, p->one) < 0 || + fill_mmfile(&mf2, p->two) < 0) + return error("unable to read files to diff"); + if (diff_filespec_is_binary(p->one) || diff_filespec_is_binary(p->two)) { git_SHA1_Update(&ctx, sha1_to_hex(p->one->sha1), 40); @@ -4541,11 +4545,11 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1) return 0; } -int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1) +int diff_flush_patch_id(struct diff_options *options, unsigned char *sha1, int use_path_only) { struct diff_queue_struct *q = &diff_queued_diff; int i; - int result = diff_get_patch_id(options, sha1); + int result = diff_get_patch_id(options, sha1, use_path_only); for (i = 0; i < q->nr; i++) diff_free_filepair(q->queue[i]); diff --git a/diff.h b/diff.h index 125447b..7883729 100644 --- a/diff.h +++ b/diff.h @@ -342,7 +342,7 @@ extern int run_diff_files(struct rev_info *revs, unsigned int option); extern int run_diff_index(struct rev_info *revs, int cached); extern int do_diff_cache(const unsigned char *, struct diff_options *); -extern int diff_flush_patch_id(struct diff_options *, unsigned char *); +extern int diff_flush_patch_id(struct diff_options *, unsigned char *, int); extern int diff_result_code(struct diff_options *, int); diff --git a/patch-ids.c b/patch-ids.c index a4d0016..f0262ce 100644 --- a/patch-ids.c +++ b/patch-ids.c @@ -4,8 +4,9 @@ #include "sha1-lookup.h" #include "patch-ids.h" -int commit_patch_id(struct commit *commit, struct diff_options *options, - unsigned char *sha1) + +static int ll_commit_patch_id(struct commit *commit, struct diff_options *options, + unsigned char *sha1, int use_path_only) { if (commit->parents) diff_tree_sha1(commit->parents->item->object.oid.hash, @@ -13,93 +14,90 @@ int commit_patch_id(struct commit *commit, struct diff_options *options, else diff_root_tree_sha1