Re: [PATCH 01/24] Makefile: add pending semantic patches
Stefan Beller writes: > [Missing: SZEDERs sign off, so I also do not sign off] At least to me, based on my reading of DCO in Documentation/SubmittingPatches, this reasoning does not make much sense.
Re: [PATCHv2 00/24] Bring more repository handles into our code base]
Stefan Beller writes: > I also picked up the patch for pending semantic patches, as the > first patch, can I have your sign off, please? I find this step quite lacking. What was posted would have been perfectly fine as a "how about doing it this way" weatherbaloon patch, but as a part of real series, it needs to explain to the developers what the distinctions between two classes are, and who is to use the cocci patch at what point in the development cycle, etc., in an accompanying documentation update. So can we have both sign-off and write-up (perhaps cut&paste from the original e-mail discussion)? > Stefan Beller (23): > sha1_file: allow read_object to read objects in arbitrary repositories > packfile: allow has_packed_and_bad to handle arbitrary repositories > object-store: allow read_object_file_extended to read from arbitrary > repositories > object-store: prepare read_object_file to deal with arbitrary > repositories > object-store: prepare has_{sha1, object}_file[_with_flags] to handle > arbitrary repositories > object: parse_object to honor its repository argument > commit: allow parse_commit* to handle arbitrary repositories > commit-reach.c: allow paint_down_to_common to handle arbitrary > repositories > commit-reach.c: allow merge_bases_many to handle arbitrary > repositories > commit-reach.c: allow remove_redundant to handle arbitrary > repositories > commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary > repositories > commit-reach: prepare get_merge_bases to handle arbitrary repositories > commit-reach: prepare in_merge_bases[_many] to handle arbitrary > repositories > commit: prepare get_commit_buffer to handle arbitrary repositories > commit: prepare repo_unuse_commit_buffer to handle arbitrary > repositories > commit: prepare logmsg_reencode to handle arbitrary repositories > pretty: prepare format_commit_message to handle arbitrary repositories > submodule: use submodule repos for object lookup > submodule: don't add submodule as odb for push > commit-graph: convert remaining function to handle arbitrary > repositories > commit: make free_commit_buffer and release_commit_memory repository > agnostic > path.h: make REPO_GIT_PATH_FUNC repository agnostic > t/helper/test-repository: celebrate independence from the_repository It seems that this topic is full of commits with overly long title. > > Makefile | 6 +- > builtin/fsck.c| 3 +- > builtin/log.c | 6 +- > builtin/rev-list.c| 3 +- > cache.h | 2 + > commit-graph.c| 40 +++-- > commit-reach.c| 73 + > commit-reach.h| 38 +++-- > commit.c | 41 ++--- > commit.h | 43 +- > .../coccinelle/the_repository.pending.cocci | 144 ++ > object-store.h| 35 - > object.c | 8 +- > packfile.c| 5 +- > packfile.h| 2 +- > path.h| 2 +- > pretty.c | 28 ++-- > pretty.h | 7 +- > sha1-file.c | 34 +++-- > streaming.c | 2 +- > submodule.c | 79 +++--- > t/helper/test-repository.c| 10 ++ > 22 files changed, 459 insertions(+), 152 deletions(-) > create mode 100644 contrib/coccinelle/the_repository.pending.cocci > > git range-diff origin/sb/more-repo-in-api... > [...] # rebased to origin/master I offhand do not recall what happened during these 100+ pacthes. DId we acquire something significantly new that would have conflicted with this new round of the topic? I do not mind at all seeing that a series gets adjusted to the updated codebase, but I do dislike seeing it done without explanation---an unexplained rebase to a new base is a wasted opportunity to learn what areas of the codebase are so hot that multiple and separate topics would want to touch them. > -: -- > 116: 4ede3d42df Seventh batch for 2.20 > -: -- > 117: b1de196491 Makefile: add pending semantic patches > 1: 1b9b5c695e = 118: f1be5eb487 sha1_file: allow read_object to read > objects in arbitrary repositories > 2: 33b94066f2 = 119: 9100a5705d packfile: allow has_packed_and_bad to > handle arbitrary repositories > 3: 5217b6b1e1 = 120: a4ad7791da object-store: allow > read_object_file_extended to read from arbitrary repositories > 4: 2b7239b55b ! 121: 5d7b3a6dd9 objec
Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote
Hi Jonathan, On Tue, Oct 16, 2018 at 7:43 PM Jonathan Nieder wrote: > > Hi Christian, > > On Tue, Sep 25, 2018, Christian Couder wrote: > > > In the cover letter there is a "Discussion" section which is about > > this, but I agree that it might not be very clear. > > > > The main issue that this patch series tries to solve is that > > extensions.partialclone config option limits the partial clone and > > promisor features to only one remote. One related issue is that it > > also prevents to have other kind of promisor/partial clone/odb > > remotes. By other kind I mean remotes that would not necessarily be > > git repos, but that could store objects (that's where ODB, for Object > > DataBase, comes from) and could provide those objects to Git through a > > helper (or driver) script or program. > > Thanks for this explanation. I took the opportunity to learn more > while you were in the bay area for the google summer of code mentor > summit and learned a little more, which was very helpful to me. Thanks for inviting me at the Google offices in Sunnyvale and San Francisco to discuss about this and other issues. > The broader picture is that this is meant to make Git natively handle > large blobs in a nicer way. The design in this series has a few > components: > > 1. Teaching partial clone to attempt to fetch missing objects from > multiple remotes instead of only one. This is useful because you > can have a server that is nearby and cheaper to serve from (some > kind of local cache server) that you make requests to first before > falling back to the canonical source of objects. > > 2. Simplifying the protocol for fetching missing objects so that it > can be satisfied by a lighter weight object storage system than > a full Git server. The ODB helpers introduced in this series are > meant to speak such a simpler protocol since they are only used > for one-off requests of a collection of missing objects instead of > needing to understand refs, Git's negotiation, etc. > > 3. (possibly, though not in this series) Making the criteria for what > objects can be missing more aggressive, so that I can "git add" > a large file and work with it using Git without even having a > second copy of that object in my local object store. Yeah, I think this is a good summary of the issues I have been trying to address. > For (2), I would like to see us improve the remote helper > infrastructure instead of introducing a new ODB helper. Remote > helpers are already permitted to fetch some objects without listing > refs --- perhaps we will want to > > i. split listing refs to a separate capability, so that a remote > helper can advertise that it doesn't support that. (Alternatively > the remote could advertise that it has no refs.) > > ii. Use the "long-running process" mechanism to improve how Git > communicates with a remote helper. Yeah, I agree that improving the remote helper infrastructure is probably better than what I have been trying to add. And I agree with the above 2 steps you propose. > For (1), things get more tricky. In an object store from a partial > clone today, we relax the ordinary "closure under reachability" > invariant but in a minor way. We'll need to work out how this works > with multiple promisor remotes. > > The idea today is that there are two kinds of packs: promisor packs > (from the promisor remote) and non-promisor packs. Promisor packs are > allowed to have reachability edges (for example a tree->blob edge) > that point to a missing object, since the promisor remote has promised > that we will be able to access that object on demand. Non-promisor > packs are also allowed to have reachability edges that point to a > missing object, as long as there is a reachability edge from an object > in a promisor pack to the same object (because of the same promise). > See "Handling Missing Objects" in Documentation/technical/partial-clone.txt > for more details. > > To prevent older versions of Git from being confused by partial clone > repositories, they use the repositoryFormatVersion mechanism: > > [core] > repositoryFormatVersion = 1 > [extensions] > partialClone = ... > > If we change the invariant, we will need to use a new extensions.* key > to ensure that versions of Git that are not aware of the new invariant > do not operate on the repository. Maybe the versions of Git that are not aware of the new invariant could still work using only the specified remote while the new versions would know that they can use other remotes by looking at other config variables. > A promisor pack is indicated by there being a .promisor file next to > the usual .pack file. Currently the .promisor file is empty. The > previous idea was that once we want more metadata (e.g. for the sake of > multiple promisor remotes), we could write it in that file. For > example, remotes could be associ
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
Chris Webster writes: >>> > Use File::Spec->devnull() for output redirection to avoid messages >>> > when Windows version of Perl is first in path. The message 'The >>> >>> Dscho, "Windows version of Perl is first in path" somehow feels >>> contradicting with what one of the topics I saw from you were trying >>> to enforce (or, at least, "set as the supported configuration"). >>> >>> I am guessing that the Perl you are building and shipping with Git >>> for Windows would yield what the shell that ends up running the >>> scriptlet `git config --get-color $key` prefers when asked for >>> File::Spec->devnull(), and nothing will break with this patch even >>> if that is "/dev/null", but I thought I'd double check. >>> >>> Thanks. >>> > This problem originally showed up in the > https://github.com/so-fancy/diff-so-fancy project, which has a copy of > DiffHighlight.pm. That project allows diffsofancy (perl) to be run > from the command line without requiring the bash environment ((well , > sort of) including the associated perl). Thanks for additional comments. In any case, Windows is not my bailiwick, so I'll hope that the above comments from you would help Dscho in his response and wait. I know use of File::Spec->devnull() won't hurt POSIX folks so making sure this won't break Git for Windows is the primary thing I woudl worry about this patch.
Re: [PATCH 1/3] commit-reach: implement get_reachable_subset
On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget wrote: > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct > commit_list *to, > object_array_clear(&from_objs); > return result; > } > + > +struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > +struct commit **to, int nr_to, > +int reachable_flag) > +{ > + struct commit **item; > + struct commit *current; > + struct commit_list *found_commits = NULL; > + struct commit **to_last = to + nr_to; > + struct commit **from_last = from + nr_from; > + uint32_t min_generation = GENERATION_NUMBER_INFINITY; > + int num_to_find = 0; > + > + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; > + > + for (item = to; item < to_last; item++) { > + struct commit *c = *item; > + > + parse_commit(c); > + if (c->generation < min_generation) > + min_generation = c->generation; So, when we don't have a commit-graph, is c->generation just going to be 0, making min_generation also be 0? (meaning we get no possible speed benefit from the commit-graph, since we just don't have that information available)? > + > + if (!(c->object.flags & PARENT1)) { > + c->object.flags |= PARENT1; > + num_to_find++; > + } > + } > + > + for (item = from; item < from_last; item++) { > + struct commit *c = *item; > + if (!(c->object.flags & PARENT2)) { > + c->object.flags |= PARENT2; > + parse_commit(c); > + > + prio_queue_put(&queue, *item); > + } > + } > + > + while (num_to_find && (current = prio_queue_get(&queue)) != NULL) { > + struct commit_list *parents; > + > + if (current->object.flags & PARENT1) { > + current->object.flags &= ~PARENT1; > + current->object.flags |= reachable_flag; > + commit_list_insert(current, &found_commits); > + num_to_find--; > + } > + > + for (parents = current->parents; parents; parents = > parents->next) { > + struct commit *p = parents->item; > + > + parse_commit(p); > + > + if (p->generation < min_generation) > + continue; > + > + if (p->object.flags & PARENT2) > + continue; > + > + p->object.flags |= PARENT2; > + prio_queue_put(&queue, p); > + } > + } > + > + clear_commit_marks_many(nr_to, to, PARENT1); > + clear_commit_marks_many(nr_from, from, PARENT2); > + > + return found_commits; > +} > + > diff --git a/commit-reach.h b/commit-reach.h > index 7d313e297..43bd50a70 100644 > --- a/commit-reach.h > +++ b/commit-reach.h > @@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from, > int can_all_from_reach(struct commit_list *from, struct commit_list *to, >int commit_date_cutoff); > > + > +/* > + * Return a list of commits containing the commits in the 'to' array > + * that are reachable from at least one commit in the 'from' array. > + * Also add the given 'flag' to each of the commits in the returned list. > + */ > +struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > +struct commit **to, int nr_to, > +int reachable_flag); > + > #endif > -- > gitgitgadget >
Re: [PATCH 0/3] Make add_missing_tags() linear
On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget wrote: > > As reported earlier [1], the add_missing_tags() method in remote.c has > quadratic performance. Some of that performance is curbed due to the > generation-number cutoff in in_merge_bases_many(). However, that fix doesn't > help users without a commit-graph, and it can still be painful if that > cutoff is sufficiently low compared to the tags we are using for > reachability testing. > > Add a new method in commit-reach.c called get_reachable_subset() which does > a many-to-many reachability test. Starting at the 'from' commits, walk until > the generation is below the smallest generation in the 'to' commits, or all > 'to' commits have been discovered. This performs only one commit walk for > the entire add_missing_tags() method, giving linear performance in the worst > case. > > Tests are added in t6600-test-reach.sh to ensure get_reachable_subset() > works independently of its application in add_missing_tags(). On the original repo where the topic was brought up, with commit-graph NOT turned on and using origin/master, I see: $ time git push --dry-run --follow-tags /home/newren/repo-mirror To /home/newren/repo-mirror * [new branch] test5 -> test5 real 1m20.081s user 1m19.688s sys 0m0.292s Merging this series in, I now get: $ time git push --dry-run --follow-tags /home/newren/repo-mirror To /home/newren/repo-mirror * [new branch] test5 -> test5 real 0m2.857s user 0m2.580s sys 0m0.328s which provides a very nice speedup. Oddly enough, if I _also_ do the following: $ git config core.commitgraph true $ git config gc.writecommitgraph true $ git gc then my timing actually slows down just slightly: $ time git push --follow-tags --dry-run /home/newren/repo-mirror To /home/newren/repo-mirror * [new branch] test5 -> test5 real 0m3.027s user 0m2.696s sys 0m0.400s (run-to-run variation seems pretty consistent, < .1s variation, so this difference is just enough to notice.) I wouldn't be that surprised if that means there's some really old tags with very small generation numbers, meaning it's not gaining anything in this special case from the commit-graph, but it does pay the cost of loading the commit-graph. Anyway, looks good in my testing. Thanks much for working on this!
Re: [PATCH v7 09/10] submodule: support reading .gitmodules when it's not in the working tree
Antonio Ospite writes: > I see, this is also mentioned in t/README, I had overlooked that part. > Thank you for reporting. > >> Without this fix, your new test case will fail on Windows all the time, >> see e.g. >> https://git-for-windows.visualstudio.com/git/_build/results?buildId=22913&view=logs >> > > Junio, what is the plan for 'ao/submodule-wo-gitmodules-checked-out'? > I did not and do not have a specific plan ;-) If the only remaining issue in the previous round of the topic were what you said in <20181010205645.e1529eff9099805029b1d...@ao2.it>, which you addressed in this round, and given that Stefan (who is likely to be the person who would need to work with you if there is any issues later found in this topic) seemed to be happy with it in , I'd say with Dscho's bug fixed, it should be ready for 'next'. > I see it's not in next yet; do you want me to resend the whole series > with this fixup in or would it be less overhead for you to apply it > directly to patch 9/10 from v7 of the series? In either way, this involves rebuilding ao/* topic and then redoing sb/submodule-recursive-fetch-gets-the-tip topic on top, before I can do the 'pu' with them, so I cannot promise it will happen today, but let's see. I think I have enough material to do the fix-up locally without any additional thing sent from you. Thanks. > P.S. I was wondering if it is worth having patchset versions mentioned > somewhere in pu/, maybe in merge commits if not in branch names? No, not in branch names. The tip date published in the "What's cooking" report is taken from the committer date but we may want to use the author date instead, which may help (and encourage people to be careful _before_ sending things out, to avoid doing many rerolls in a day).
Re: commit-graph is cool (overcoming add_missing_tags() perf issues)
On Tue, Oct 30, 2018 at 9:23 AM Ævar Arnfjörð Bjarmason wrote: > > On Wed, Oct 17, 2018 at 8:41 PM Elijah Newren wrote: > > (And in the mean time I gave the user a one-liner to nuke his > > local-only tags that I suspect he doesn't need.) > > Just a note that you can usually set 'fetch.pruneTags=true' these days > to make that happen. TIL. Thanks for the heads up.
Re: commit-graph is cool (overcoming add_missing_tags() perf issues)
On Tue, Oct 30, 2018 at 7:22 AM Derrick Stolee wrote: > > On 10/17/2018 2:00 PM, Elijah Newren wrote: > > Hi, > > > > Just wanted to give a shout-out for the commit-graph work and how > > impressive it is. I had an internal report from a user that git > > pushes containing only one new tiny commit were taking over a minute > > (in a moderate size repo with good network connectivity). After > > digging for a while, I noticed three unusual things about the repo[1]: > >* he had push.followTags set to true > >* upstream repo had about 20k tags (despite only 55k commits) > >* his repo had an additional 2.5k tags, but none of these were in > > the history of the branches he was pushing and thus would not be > > included in any pushes. > > > > Digging in, almost all the time was CPU-bound and spent in > > add_missing_tags()[2]. If I'm reading the code correctly, it appears > > that function loops over each tag, calling in_merge_bases_many() once > > per tag. Thus, for his case, we were potentially walking all of > > history of the main branch 2.5k times. That seemed rather suboptimal. > > Elijah, > > Do you still have this repo around? Could you by chance test the > performance with the new algorithm for add_missing_tags() in [1]? > Specifically, please test it without a commit-graph file, since your > data shape already makes use of generation numbers pretty well. > > Thanks, > -Stolee I nuked it, but turns out I had a backup that I found after digging around for a bit. I'll post a comment on the series with the results. By the way, I've been running pu for about a week with this tweak: diff --git a/revision.c b/revision.c index b5108b75ab..d20c687e71 100644 --- a/revision.c +++ b/revision.c @@ -1457,6 +1457,7 @@ void repo_init_revisions(struct repository *r, revs->pruning.change = file_change; revs->pruning.change_fn_data = revs; revs->sort_order = REV_SORT_IN_GRAPH_ORDER; + revs->topo_order = 1; revs->dense = 1; revs->prefix = prefix; revs->max_age = -1; Only ran into one small problem once, and it wasn't commit-graph related; rather it was related to my above patch and needing to not have topo_order be set. (I just bailed and used my older system-installed git from /usr/bin/ in that one case.) So, I think the commit-graph stuff is looking pretty good, and I find the recent thread on further improvements with corrected commit date (among other possibilities) very intriguing...even if I haven't had much time to comment or test recently.
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
Resending in text mode. On Tue, Oct 30, 2018 at 10:20 PM Chris Webster wrote: > > On Tue, Oct 30, 2018 at 9:54 PM Junio C Hamano wrote: >> >> "chris via GitGitGadget" writes: >> >> > From: chris >> > >> > Use File::Spec->devnull() for output redirection to avoid messages >> > when Windows version of Perl is first in path. The message 'The >> >> Dscho, "Windows version of Perl is first in path" somehow feels >> contradicting with what one of the topics I saw from you were trying >> to enforce (or, at least, "set as the supported configuration"). >> >> I am guessing that the Perl you are building and shipping with Git >> for Windows would yield what the shell that ends up running the >> scriptlet `git config --get-color $key` prefers when asked for >> File::Spec->devnull(), and nothing will break with this patch even >> if that is "/dev/null", but I thought I'd double check. >> >> Thanks. >> This problem originally showed up in the https://github.com/so-fancy/diff-so-fancy project, which has a copy of DiffHighlight.pm. That project allows diffsofancy (perl) to be run from the command line without requiring the bash environment ((well , sort of) including the associated perl). > >> > system cannot find the path specified.' is displayed each time git is >> > run to get colors. >> > >> > Signed-off-by: Chris. Webster >> > --- >> > contrib/diff-highlight/DiffHighlight.pm | 7 ++- >> > 1 file changed, 6 insertions(+), 1 deletion(-) >> > >> > diff --git a/contrib/diff-highlight/DiffHighlight.pm >> > b/contrib/diff-highlight/DiffHighlight.pm >> > index 536754583..7440aa1c4 100644 >> > --- a/contrib/diff-highlight/DiffHighlight.pm >> > +++ b/contrib/diff-highlight/DiffHighlight.pm >> > @@ -4,6 +4,11 @@ use 5.008; >> > use warnings FATAL => 'all'; >> > use strict; >> > >> > +# Use the correct value for both UNIX and Windows (/dev/null vs nul) >> > +use File::Spec; >> > + >> > +my $NULL = File::Spec->devnull(); >> > + >> > # Highlight by reversing foreground and background. You could do >> > # other things like bold or underline if you prefer. >> > my @OLD_HIGHLIGHT = ( >> > @@ -134,7 +139,7 @@ sub highlight_stdin { >> > # fallback, which means we will work even if git can't be run. >> > sub color_config { >> > my ($key, $default) = @_; >> > - my $s = `git config --get-color $key 2>/dev/null`; >> > + my $s = `git config --get-color $key 2>$NULL`; >> > return length($s) ? $s : $default; >> > }
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
Jeff King writes: > On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote: > >> Phew. I almost just deleted all of the above, because now I think I'm >> ready to write that comment you asked for. ;) But I left it since maybe >> it makes sense to follow my thought process. > > So here it is in a more succinct form. Thanks. > + > + /* > + * Unlike the loose object case, we do not have to worry here > + * about running out of input bytes and spinning infinitely. If > + * we get Z_BUF_ERROR due to too few input bytes, then we'll > + * replenish them in the next use_pack() call when we loop. If > + * we truly hit the end of the pack (i.e., because it's corrupt > + * or truncated), then use_pack() catches that and will die(). > + */ > if (status != Z_OK && status != Z_BUF_ERROR) { > git_inflate_end(&st->z); > st->z_state = z_error; Reads well. Will apply.
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote: > Phew. I almost just deleted all of the above, because now I think I'm > ready to write that comment you asked for. ;) But I left it since maybe > it makes sense to follow my thought process. So here it is in a more succinct form. -Peff -- >8 -- Subject: [PATCH] read_istream_pack_non_delta(): document input handling Twice now we have scratched our heads about why the loose streaming code needs the protection added by 692f0bc7ae (avoid infinite loop in read_istream_loose, 2013-03-25), but the similar code in its pack counterpart does not. The short answer is that use_pack() will die before it lets us run out of bytes. Note that this could mean reading garbage (including the trailing hash) from the packfile in some cases of corruption, but that's OK. zlib will notice and complain (and if not, certainly the end result will not match the object hash we expect). Let's leave a comment this time to document our findings. Signed-off-by: Jeff King --- streaming.c | 9 + 1 file changed, 9 insertions(+) diff --git a/streaming.c b/streaming.c index d1e6b2dce6..ac7c7a22f9 100644 --- a/streaming.c +++ b/streaming.c @@ -408,6 +408,15 @@ static read_method_decl(pack_non_delta) st->z_state = z_done; break; } + + /* +* Unlike the loose object case, we do not have to worry here +* about running out of input bytes and spinning infinitely. If +* we get Z_BUF_ERROR due to too few input bytes, then we'll +* replenish them in the next use_pack() call when we loop. If +* we truly hit the end of the pack (i.e., because it's corrupt +* or truncated), then use_pack() catches that and will die(). +*/ if (status != Z_OK && status != Z_BUF_ERROR) { git_inflate_end(&st->z); st->z_state = z_error; -- 2.19.1.1298.g19f18f2a22
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
> Subject: Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows As this is only about contrib/diff-highlight, please make it clear that it is the area the patch affects on its title, i.e. Subject: diff-highlight: use File::Spec->devnull(), not /dev/null or something like that. > From: chris Please make this line read like From: Chris Webster i.e. the author should be the person who is signing off that patch. > Use File::Spec->devnull() for output redirection to avoid messages > when Windows version of Perl is first in path. The message 'The > system cannot find the path specified.' is displayed each time git is > run to get colors. > > Signed-off-by: Chris. Webster > --- > contrib/diff-highlight/DiffHighlight.pm | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) There are a handful more instances of /dev/null found if you do $ git grep /dev/null -- \*.pl \*.pm The one in perl/Git.pm must be shared by scripts written in Perl, so it may be worth giving the same tweak to it, like this patch does to the highlight script. > diff --git a/contrib/diff-highlight/DiffHighlight.pm > b/contrib/diff-highlight/DiffHighlight.pm > index 536754583..7440aa1c4 100644 > --- a/contrib/diff-highlight/DiffHighlight.pm > +++ b/contrib/diff-highlight/DiffHighlight.pm > @@ -4,6 +4,11 @@ use 5.008; > use warnings FATAL => 'all'; > use strict; > > +# Use the correct value for both UNIX and Windows (/dev/null vs nul) > +use File::Spec; > + > +my $NULL = File::Spec->devnull(); > + > # Highlight by reversing foreground and background. You could do > # other things like bold or underline if you prefer. > my @OLD_HIGHLIGHT = ( > @@ -134,7 +139,7 @@ sub highlight_stdin { > # fallback, which means we will work even if git can't be run. > sub color_config { > my ($key, $default) = @_; > - my $s = `git config --get-color $key 2>/dev/null`; > + my $s = `git config --get-color $key 2>$NULL`; > return length($s) ? $s : $default; > }
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
On Wed, Oct 31, 2018 at 01:44:25PM +0900, Junio C Hamano wrote: > Jeff King writes: > > >> See 692f0bc7 to find who did the fix you stole from, and for what > >> kind of breakage the original fix was made. > > > > Heh. I did not dig into it, but actually thought "I'll bet Junio had to > > get this right when he wrote the streaming code. No wonder he spotted my > > mistake so quickly!". > > > >> By the way, a very similar loop for pack_non_delta istream iterates > >> while total_read is smaller than sz, but it does not have the same > >> check upon BUF_ERROR to see if we've read everything. > > > > Indeed. Did you find that one by inspection, or did you peek at: > > > > https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/ > > I looked for BUF_ERROR in the streaming.c and found two instances in > a very similar looking loop with a subtle differnce, and the > difference was due to one of them getting fixed in the past while > the other one was left intact as written at its inception. > > I should have looked for that message to read the part below > three-dash mark. Or we may want to transplant that comment somehow > to the function so next person will not be puzzled like I did? Hmm. Reading that function, I am not sure if it actually might need fixing. Might we actually get Z_BUF_ERROR asking for more input if zlib reads to the end of the pack window? That is probably quite unlikely in practice, but in theory you could feed a very large buffer for the output and use a very small pack window. So I do not think we can use the same logic in that loop. But at the same time, what prevents use_pack() from getting to the very end of the pack and saying "I have no bytes left for you"? And then we'd loop infinitely, feeding zlib nothing. I'm not sure what the solution is. I do not think this works: diff --git a/streaming.c b/streaming.c index d1e6b2dce6..a92a85ed10 100644 --- a/streaming.c +++ b/streaming.c @@ -394,6 +394,9 @@ static read_method_decl(pack_non_delta) mapped = use_pack(st->u.in_pack.pack, &window, st->u.in_pack.pos, &st->z.avail_in); + if (!st->z.avail_in) + break; + st->z.next_out = (unsigned char *)buf + total_read; st->z.avail_out = sz - total_read; st->z.next_in = mapped; because we may have read to the very end but still have bytes to output. Though hrm. I think use_pack() will always tell us about the trailing 20-byte hash in the "avail" window. Which means we should never legitimately get to 0 there, because it means that either: 1. We're reading the trailing hash, which cannot possibly be right (in most cases I'd expect zlib to barf at that point anyway, but of course it's possible to have a hash that is valid zlib data ;) ). 2. We're truncated _before_ the hash, so we really did read to EOF, and there are no more bytes. I suspect we may actually detect this case upon opening the pack (since we do peek at the trailer then), but again that could be fooled by coincidence. I guess that's not the whole story, though. use_pack() tries to promise at least 20 bytes (to simplify some of the other parsing routines). So we shouldn't actually ever get "0" here. If we really are that close to the end of the pack, we'd hit this logic in use_pack: if (offset > (p->pack_size - the_hash_algo->rawsz)) die("offset beyond end of packfile (truncated pack?)"); So actually, I think this code is OK as-is. We will always have at least 20 bytes of input, or use_pack() will die. Phew. I almost just deleted all of the above, because now I think I'm ready to write that comment you asked for. ;) But I left it since maybe it makes sense to follow my thought process. -Peff
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
"chris via GitGitGadget" writes: > From: chris > > Use File::Spec->devnull() for output redirection to avoid messages > when Windows version of Perl is first in path. The message 'The Dscho, "Windows version of Perl is first in path" somehow feels contradicting with what one of the topics I saw from you were trying to enforce (or, at least, "set as the supported configuration"). I am guessing that the Perl you are building and shipping with Git for Windows would yield what the shell that ends up running the scriptlet `git config --get-color $key` prefers when asked for File::Spec->devnull(), and nothing will break with this patch even if that is "/dev/null", but I thought I'd double check. Thanks. > system cannot find the path specified.' is displayed each time git is > run to get colors. > > Signed-off-by: Chris. Webster > --- > contrib/diff-highlight/DiffHighlight.pm | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/contrib/diff-highlight/DiffHighlight.pm > b/contrib/diff-highlight/DiffHighlight.pm > index 536754583..7440aa1c4 100644 > --- a/contrib/diff-highlight/DiffHighlight.pm > +++ b/contrib/diff-highlight/DiffHighlight.pm > @@ -4,6 +4,11 @@ use 5.008; > use warnings FATAL => 'all'; > use strict; > > +# Use the correct value for both UNIX and Windows (/dev/null vs nul) > +use File::Spec; > + > +my $NULL = File::Spec->devnull(); > + > # Highlight by reversing foreground and background. You could do > # other things like bold or underline if you prefer. > my @OLD_HIGHLIGHT = ( > @@ -134,7 +139,7 @@ sub highlight_stdin { > # fallback, which means we will work even if git can't be run. > sub color_config { > my ($key, $default) = @_; > - my $s = `git config --get-color $key 2>/dev/null`; > + my $s = `git config --get-color $key 2>$NULL`; > return length($s) ? $s : $default; > }
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
Jeff King writes: > On Tue, Oct 30, 2018 at 11:26:36AM -0700, chris via GitGitGadget wrote: > >> From: chris > > You might want to adjust your user.name. :) Yes, absolutely. We'd want to see that the From: line and one of the Signed-off-by: lines are idential. >> Use File::Spec->devnull() for output redirection to avoid messages >> when Windows version of Perl is first in path. The message 'The >> system cannot find the path specified.' is displayed each time git is >> run to get colors. > > Thanks, makes sense, and the patch looks good to me. Yup, and we already use File::Spec everywhere anyway, so this is not a new dependency, either. Which is very good.
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
Jeff King writes: >> See 692f0bc7 to find who did the fix you stole from, and for what >> kind of breakage the original fix was made. > > Heh. I did not dig into it, but actually thought "I'll bet Junio had to > get this right when he wrote the streaming code. No wonder he spotted my > mistake so quickly!". > >> By the way, a very similar loop for pack_non_delta istream iterates >> while total_read is smaller than sz, but it does not have the same >> check upon BUF_ERROR to see if we've read everything. > > Indeed. Did you find that one by inspection, or did you peek at: > > https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/ I looked for BUF_ERROR in the streaming.c and found two instances in a very similar looking loop with a subtle differnce, and the difference was due to one of them getting fixed in the past while the other one was left intact as written at its inception. I should have looked for that message to read the part below three-dash mark. Or we may want to transplant that comment somehow to the function so next person will not be puzzled like I did?
Re: [PATCH 1/2] ls-remote: do not send ref prefixes for patterns
Jeff King writes: > Since b4be74105f (ls-remote: pass ref prefixes when requesting a > remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo", > "refs/tags/foo", etc to the transport code in an attempt to let the > other side reduce the size of its advertisement. Jonathan, seeing 2b554353 ("fetch: send "refs/tags/" prefix upon CLI refspecs", 2018-06-05), I am guessing that you are doing the proto v2 work inherited from Brandon? Having to undo this is unfortunate, but I agree with this patch that we need to do this until ref prefix learns to grok wildcards. > Unfortunately this is not correct, as ls-remote patterns do not follow > the usual ref lookup rules, and are in fact tail-matched. So we could > find "refs/heads/foo" or "refs/heads/a/much/deeper/foo" or even > "refs/another/hierarchy/foo". > > Since we can't pass a prefix and there's not yet a v2 extension for > matching wildcards, we must disable this feature to keep the same > behavior as v1. > > Reported-by: Jon Simons > Signed-off-by: Jeff King > --- > builtin/ls-remote.c | 8 > t/t5512-ls-remote.sh | 9 + > 2 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c > index 6a0cdec30d..5faa8459d9 100644 > --- a/builtin/ls-remote.c > +++ b/builtin/ls-remote.c > @@ -88,15 +88,7 @@ int cmd_ls_remote(int argc, const char **argv, const char > *prefix) > int i; > pattern = xcalloc(argc, sizeof(const char *)); > for (i = 1; i < argc; i++) { > - const char *glob; > pattern[i - 1] = xstrfmt("*/%s", argv[i]); > - > - glob = strchr(argv[i], '*'); > - if (glob) > - argv_array_pushf(&ref_prefixes, "%.*s", > - (int)(glob - argv[i]), > argv[i]); > - else > - expand_ref_prefix(&ref_prefixes, argv[i]); > } > } > > diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh > index bc5703ff9b..ca1b7e5bc1 100755 > --- a/t/t5512-ls-remote.sh > +++ b/t/t5512-ls-remote.sh > @@ -302,4 +302,13 @@ test_expect_success 'ls-remote works outside repository' > ' > nongit git ls-remote dst.git > ' > > +test_expect_success 'ls-remote patterns work with all protocol versions' ' > + git for-each-ref --format="%(objectname)%(refname)" \ > + refs/heads/master refs/remotes/origin/master >expect && > + git -c protocol.version=1 ls-remote . master >actual.v1 && > + test_cmp expect actual.v1 && > + git -c protocol.version=2 ls-remote . master >actual.v2 && > + test_cmp expect actual.v2 > +' > + > test_done
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
On Wed, Oct 31, 2018 at 01:23:54PM +0900, Junio C Hamano wrote: > Jeff King writes: > > > The bug comes from commit f6371f9210 (sha1_file: add > > read_loose_object() function, 2017-01-13), which > > reimplemented some of the existing loose object functions. > > So it's worth checking if this bug was inherited from any of > > those. The answers seems to be no. The two obvious > > candidates are both OK: > > > > 1. unpack_sha1_rest(); this doesn't need to loop on > > Z_BUF_ERROR at all, since it allocates the expected > > output buffer in advance (which we can't do since we're > > explicitly streaming here) > > > > 2. check_object_signature(); the streaming path relies on > > the istream interface, which uses read_istream_loose() > > for this case. That function uses a similar "is our > > output buffer full" check with Z_BUF_ERROR (which is > > where I stole it from for this patch!) > > See 692f0bc7 to find who did the fix you stole from, and for what > kind of breakage the original fix was made. Heh. I did not dig into it, but actually thought "I'll bet Junio had to get this right when he wrote the streaming code. No wonder he spotted my mistake so quickly!". > By the way, a very similar loop for pack_non_delta istream iterates > while total_read is smaller than sz, but it does not have the same > check upon BUF_ERROR to see if we've read everything. Indeed. Did you find that one by inspection, or did you peek at: https://public-inbox.org/git/20130325202114.gd16...@sigill.intra.peff.net/ ? :) -Peff
Re: Using --word-diff breaks --color-moved
james harvey writes: > If you use both "--word-diff" and "--color-moved", regardless of the > order of arguments, "--word-diff" takes precedence and "--color-moved" > isn't allowed to do anything. > > I think "--color-moved" should have precedence over "--word-diff". I > cannot think of a scenario where a user would supply both options, and > actually want "--word-diff" to take precedence. I am not sure if I follow. If these two cannot work well together, then we should just reject the request as asking for incompatible combination of options while we are parsing the command line arguments, rather than arguing which one should trump the other one---that would simply lead to "in my opinion, word-diff is more important" vs "in mine, color-moved is more important", no?
[PATCH 2/2] ls-remote: pass heads/tags prefixes to transport
Unlike its arbitrary text patterns, the --heads and --tags options to ls-remote are true prefixes. We can pass this information to the transport code. If the v2 protocol is in use, that will reduce the size of the ref advertisement. Note that the test added here succeeds both before and after the patch. This is an optimization, not a bug-fix; it's just making sure we didn't break anything. Signed-off-by: Jeff King --- builtin/ls-remote.c | 5 + t/t5512-ls-remote.sh | 9 + 2 files changed, 14 insertions(+) diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 5faa8459d9..1d7f1f5ce2 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -92,6 +92,11 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix) } } + if (flags & REF_TAGS) + argv_array_push(&ref_prefixes, "refs/tags/"); + if (flags & REF_HEADS) + argv_array_push(&ref_prefixes, "refs/heads/"); + remote = remote_get(dest); if (!remote) { if (dest) diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh index ca1b7e5bc1..91ee6841c1 100755 --- a/t/t5512-ls-remote.sh +++ b/t/t5512-ls-remote.sh @@ -311,4 +311,13 @@ test_expect_success 'ls-remote patterns work with all protocol versions' ' test_cmp expect actual.v2 ' +test_expect_success 'ls-remote prefixes work with all protocol versions' ' + git for-each-ref --format="%(objectname)%(refname)" \ + refs/heads/ refs/tags/ >expect && + git -c protocol.version=1 ls-remote --heads --tags . >actual.v1 && + test_cmp expect actual.v1 && + git -c protocol.version=2 ls-remote --heads --tags . >actual.v2 && + test_cmp expect actual.v2 +' + test_done -- 2.19.1.1298.g19f18f2a22
[PATCH 1/2] ls-remote: do not send ref prefixes for patterns
Since b4be74105f (ls-remote: pass ref prefixes when requesting a remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo", "refs/tags/foo", etc to the transport code in an attempt to let the other side reduce the size of its advertisement. Unfortunately this is not correct, as ls-remote patterns do not follow the usual ref lookup rules, and are in fact tail-matched. So we could find "refs/heads/foo" or "refs/heads/a/much/deeper/foo" or even "refs/another/hierarchy/foo". Since we can't pass a prefix and there's not yet a v2 extension for matching wildcards, we must disable this feature to keep the same behavior as v1. Reported-by: Jon Simons Signed-off-by: Jeff King --- builtin/ls-remote.c | 8 t/t5512-ls-remote.sh | 9 + 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c index 6a0cdec30d..5faa8459d9 100644 --- a/builtin/ls-remote.c +++ b/builtin/ls-remote.c @@ -88,15 +88,7 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix) int i; pattern = xcalloc(argc, sizeof(const char *)); for (i = 1; i < argc; i++) { - const char *glob; pattern[i - 1] = xstrfmt("*/%s", argv[i]); - - glob = strchr(argv[i], '*'); - if (glob) - argv_array_pushf(&ref_prefixes, "%.*s", -(int)(glob - argv[i]), argv[i]); - else - expand_ref_prefix(&ref_prefixes, argv[i]); } } diff --git a/t/t5512-ls-remote.sh b/t/t5512-ls-remote.sh index bc5703ff9b..ca1b7e5bc1 100755 --- a/t/t5512-ls-remote.sh +++ b/t/t5512-ls-remote.sh @@ -302,4 +302,13 @@ test_expect_success 'ls-remote works outside repository' ' nongit git ls-remote dst.git ' +test_expect_success 'ls-remote patterns work with all protocol versions' ' + git for-each-ref --format="%(objectname)%(refname)" \ + refs/heads/master refs/remotes/origin/master >expect && + git -c protocol.version=1 ls-remote . master >actual.v1 && + test_cmp expect actual.v1 && + git -c protocol.version=2 ls-remote . master >actual.v2 && + test_cmp expect actual.v2 +' + test_done -- 2.19.1.1298.g19f18f2a22
Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
Jeff King writes: > The bug comes from commit f6371f9210 (sha1_file: add > read_loose_object() function, 2017-01-13), which > reimplemented some of the existing loose object functions. > So it's worth checking if this bug was inherited from any of > those. The answers seems to be no. The two obvious > candidates are both OK: > > 1. unpack_sha1_rest(); this doesn't need to loop on > Z_BUF_ERROR at all, since it allocates the expected > output buffer in advance (which we can't do since we're > explicitly streaming here) > > 2. check_object_signature(); the streaming path relies on > the istream interface, which uses read_istream_loose() > for this case. That function uses a similar "is our > output buffer full" check with Z_BUF_ERROR (which is > where I stole it from for this patch!) See 692f0bc7 to find who did the fix you stole from, and for what kind of breakage the original fix was made. By the way, a very similar loop for pack_non_delta istream iterates while total_read is smaller than sz, but it does not have the same check upon BUF_ERROR to see if we've read everything.
[PATCH 0/2] ls-remote and v2 ref prefixes
This series fixes a bug where ls-remote sends a ref-advertisement prefix when it shouldn't, and then optimizes a spot where it doesn't send one but could. [1/2]: ls-remote: do not send ref prefixes for patterns [2/2]: ls-remote: pass heads/tags prefixes to transport builtin/ls-remote.c | 13 + t/t5512-ls-remote.sh | 18 ++ 2 files changed, 23 insertions(+), 8 deletions(-) -Peff
Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows
On Tue, Oct 30, 2018 at 11:26:36AM -0700, chris via GitGitGadget wrote: > From: chris You might want to adjust your user.name. :) > Use File::Spec->devnull() for output redirection to avoid messages > when Windows version of Perl is first in path. The message 'The > system cannot find the path specified.' is displayed each time git is > run to get colors. Thanks, makes sense, and the patch looks good to me. -Peff
Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering
"Johannes Schindelin via GitGitGadget" writes: > An alternative approach which was rejected at the time (because it > interfered with the then-ongoing work to compile Git for Windows using MS > Visual C++) would patch the make_environment_block() function to skip the > specified environment variables before handing down the environment block to > the spawned process. Currently it would interfere with the mingw-utf-8-env > patch series I sent earlier today > [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com]. So the rejected approach that was not used in this series would interfere with the other one, but I do not have to worry about it because this series does not use that approach? I had to read the six lines above twice to figure out that it essentially is saying "I shouldn't care", but please let me know if I misread the paragraph and I need to care ;-) > While at it, this patch series also cleans up house and moves the > Windows-specific core.* variable handling to compat/mingw.c rather than > cluttering environment.c and config.c with things that e.g. developers on > Linux do not want to care about. Or Macs. When I skimmed this series earlier, I found that patches 2 & 3 sensibly implemented to achieve this goal. > > Johannes Schindelin (4): > config: rename `dummy` parameter to `cb` in git_default_config() > Allow for platform-specific core.* config settings > Move Windows-specific config settings into compat/mingw.c > mingw: unset PERL5LIB by default > > Documentation/config.txt | 6 > cache.h | 8 - > compat/mingw.c | 58 +++- > compat/mingw.h | 3 ++ > config.c | 18 --- > environment.c| 1 - > git-compat-util.h| 8 + > t/t0029-core-unsetenvvars.sh | 30 +++ > 8 files changed, 109 insertions(+), 23 deletions(-) > create mode 100755 t/t0029-core-unsetenvvars.sh > > > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2 > Published-As: > https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git > pr-62/dscho/perl5lib-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/62
Re: [PATCH 1/3] commit-reach: implement get_reachable_subset
"Derrick Stolee via GitGitGadget" writes: > +struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > + struct commit **to, int nr_to, > + int reachable_flag) This is OR'ed into object.flags, and I somoehow expected to see it as 'unsigned int', not a signed one. > +{ > + struct commit **item; > + struct commit *current; > + struct commit_list *found_commits = NULL; > + struct commit **to_last = to + nr_to; > + struct commit **from_last = from + nr_from; > + uint32_t min_generation = GENERATION_NUMBER_INFINITY; > + int num_to_find = 0; > + > + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; > + > + for (item = to; item < to_last; item++) { > + struct commit *c = *item; > + > + parse_commit(c); > + if (c->generation < min_generation) > + min_generation = c->generation; > + > + if (!(c->object.flags & PARENT1)) { > + c->object.flags |= PARENT1; > + num_to_find++; > + } > + } > + > + for (item = from; item < from_last; item++) { > + struct commit *c = *item; > + if (!(c->object.flags & PARENT2)) { > + c->object.flags |= PARENT2; > + parse_commit(c); > + > + prio_queue_put(&queue, *item); > + } > + } OK, we marked "to" with PARENT1 and counted them in num_to_find without dups. We also marked "from" with PARENT2 and threw them in the "queue" without dups. Mental note: the caller must guarantee that everybody reachable from "to" and "from" have PARENT1 and PARENT2 clear. This might deserve to be in the comment before the function. > + while (num_to_find && (current = prio_queue_get(&queue)) != NULL) { > + struct commit_list *parents; > + > + if (current->object.flags & PARENT1) { > + current->object.flags &= ~PARENT1; > + current->object.flags |= reachable_flag; > + commit_list_insert(current, &found_commits); > + num_to_find--; What is in "queue" are all reachable from "from" so we found this object in the "to" set is reachable. One down. > + } > + > + for (parents = current->parents; parents; parents = > parents->next) { > + struct commit *p = parents->item; > + > + parse_commit(p); > + > + if (p->generation < min_generation) > + continue; Any commit reachable from "from" whose generation is too small (i.e. old) can never reach any of "to", because all "to" are at generation "min_generation" or greater. Makes sense. > + if (p->object.flags & PARENT2) > + continue; If the object is painted with PARENT2, we know we have it in queue and traversing to find more of its ancestors. Avoid recursing into it twice. Makes sense. > + p->object.flags |= PARENT2; > + prio_queue_put(&queue, p); Otherwise, explore this parent. > + } > + } > + > + clear_commit_marks_many(nr_to, to, PARENT1); Among "to", the ones that were not found still have PARENT1. Because we do not propagate this flag down the ancestry chain like merge-base discovery, this only has to clear just one level. > + clear_commit_marks_many(nr_from, from, PARENT2); And then we smudged everything that are reachable from "from" with this flag. In the worst case, i.e. when num_to_find is still positive here, we would have painted all commits down to the root, and we need to clear all of them. > + return found_commits; > +} OK, this all makes sense. Unlike merge-base traversals, this does not have to traverse from the "to" side at all, which makes it quite simpler and straight-forward. I do wonder if we can now reimplement in_merge_bases_many() in terms of this helper, and if that gives us a better performance. It asks "is 'commit', i.e. a single 'to', an ancestor of, i.e. reachable from, one of the 'references', i.e. 'from'"? > diff --git a/commit-reach.h b/commit-reach.h > index 7d313e297..43bd50a70 100644 > --- a/commit-reach.h > +++ b/commit-reach.h > @@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from, > int can_all_from_reach(struct commit_list *from, struct commit_list *to, > int commit_date_cutoff); > > + > +/* > + * Return a list of commits containing the commits in the 'to' array > + * that are reachable from at least one commit in the 'from' array. > + * Also add the given 'flag' to each of the commits in the returned list. > + */ > +struct commit_list *get_reachable_subset(struct commit **from, int nr_from, > + struct com
Re: [PATCH 0/3] Make add_missing_tags() linear
"Derrick Stolee via GitGitGadget" writes: > Add a new method in commit-reach.c called get_reachable_subset() which does > a many-to-many reachability test. Starting at the 'from' commits, walk until > the generation is below the smallest generation in the 'to' commits, or all > 'to' commits have been discovered. This performs only one commit walk for > the entire add_missing_tags() method, giving linear performance in the worst > case. ;-) I think in_merge_bases_many() was an attempt to do one half of this (i.e. it checks only one 'to' against main 'from' to see if any of them reach). I wonder why we didn't extend it to multiple 'to' back when we invented it. In any case, good to see this code optimized. Thanks.
Re: [PATCH v3 0/5] am/rebase: share read_author_script()
Phillip Wood writes: > From: Phillip Wood > > Thanks to Junio for the feedback on v2. I've updated patch 4 based on > those comments, the rest are unchanged. Hmph, all these five patches seem to be identical to what I have in 'pu'. Did you send the right version? > v1 cover letter: > > This is a follow up to pw/rebase-i-author-script-fix, it reduces code > duplication and improves rebase's parsing of the author script. After > this I'll do another series to share the code to write the author > script. > > Phillip Wood (5): > am: don't die in read_author_script() > am: improve author-script error reporting > am: rename read_author_script() > add read_author_script() to libgit > sequencer: use read_author_script() > > builtin/am.c | 60 ++-- > sequencer.c | 192 --- > sequencer.h | 3 + > 3 files changed, 128 insertions(+), 127 deletions(-)
Re: [PATCH v3 2/2] worktree: rename is_worktree_locked to worktree_lock_reason
nbelakov...@gmail.com writes: > From: Nickolai Belakovski > > A function prefixed with 'is_' would be expected to return a boolean, > however this function returns a string. > > Signed-off-by: Nickolai Belakovski > --- Given that there is a clear documentation in worktree.h, and a pointer that is not NULL is true in "if/while(ptr)", I'd say this change is a borderline "meh". I'll queue this on top of 1/2 and try merging to the integration topics to see if it interacts with any other topics in flight. Since the patch has already been written, let's not waste the effort if there is no conflict with anybody else (otherwise I'd discard this one---a "meh" patch is not worth having to worry about conflict resolution). Thanks. > builtin/worktree.c | 10 +- > worktree.c | 2 +- > worktree.h | 4 ++-- > 3 files changed, 8 insertions(+), 8 deletions(-) > > diff --git a/builtin/worktree.c b/builtin/worktree.c > index c4abbde2b..5e8402617 100644 > --- a/builtin/worktree.c > +++ b/builtin/worktree.c > @@ -245,7 +245,7 @@ static void validate_worktree_add(const char *path, const > struct add_opts *opts) > if (!wt) > goto done; > > - locked = !!is_worktree_locked(wt); > + locked = !!worktree_lock_reason(wt); > if ((!locked && opts->force) || (locked && opts->force > 1)) { > if (delete_git_dir(wt->id)) > die(_("unable to re-add worktree '%s'"), path); > @@ -682,7 +682,7 @@ static int lock_worktree(int ac, const char **av, const > char *prefix) > if (is_main_worktree(wt)) > die(_("The main working tree cannot be locked or unlocked")); > > - old_reason = is_worktree_locked(wt); > + old_reason = worktree_lock_reason(wt); > if (old_reason) { > if (*old_reason) > die(_("'%s' is already locked, reason: %s"), > @@ -714,7 +714,7 @@ static int unlock_worktree(int ac, const char **av, const > char *prefix) > die(_("'%s' is not a working tree"), av[0]); > if (is_main_worktree(wt)) > die(_("The main working tree cannot be locked or unlocked")); > - if (!is_worktree_locked(wt)) > + if (!worktree_lock_reason(wt)) > die(_("'%s' is not locked"), av[0]); > ret = unlink_or_warn(git_common_path("worktrees/%s/locked", wt->id)); > free_worktrees(worktrees); > @@ -787,7 +787,7 @@ static int move_worktree(int ac, const char **av, const > char *prefix) > validate_no_submodules(wt); > > if (force < 2) > - reason = is_worktree_locked(wt); > + reason = worktree_lock_reason(wt); > if (reason) { > if (*reason) > die(_("cannot move a locked working tree, lock reason: > %s\nuse 'move -f -f' to override or unlock first"), > @@ -900,7 +900,7 @@ static int remove_worktree(int ac, const char **av, const > char *prefix) > if (is_main_worktree(wt)) > die(_("'%s' is a main working tree"), av[0]); > if (force < 2) > - reason = is_worktree_locked(wt); > + reason = worktree_lock_reason(wt); > if (reason) { > if (*reason) > die(_("cannot remove a locked working tree, lock > reason: %s\nuse 'remove -f -f' to override or unlock first"), > diff --git a/worktree.c b/worktree.c > index b0d0b5426..befdbe7fa 100644 > --- a/worktree.c > +++ b/worktree.c > @@ -235,7 +235,7 @@ int is_main_worktree(const struct worktree *wt) > return !wt->id; > } > > -const char *is_worktree_locked(struct worktree *wt) > +const char *worktree_lock_reason(struct worktree *wt) > { > assert(!is_main_worktree(wt)); > > diff --git a/worktree.h b/worktree.h > index 6b12a3cf6..55d449b6a 100644 > --- a/worktree.h > +++ b/worktree.h > @@ -10,7 +10,7 @@ struct worktree { > char *path; > char *id; > char *head_ref; /* NULL if HEAD is broken or detached */ > - char *lock_reason; /* private - use is_worktree_locked */ > + char *lock_reason; /* private - use worktree_lock_reason */ > struct object_id head_oid; > int is_detached; > int is_bare; > @@ -60,7 +60,7 @@ extern int is_main_worktree(const struct worktree *wt); > * Return the reason string if the given worktree is locked or NULL > * otherwise. > */ > -extern const char *is_worktree_locked(struct worktree *wt); > +extern const char *worktree_lock_reason(struct worktree *wt); > > #define WT_VALIDATE_WORKTREE_MISSING_OK (1 << 0)
Re: [PATCH v3 1/2] worktree: update documentation for lock_reason and lock_reason_valid
nbelakov...@gmail.com writes: > From: Nickolai Belakovski > > Clarify that these fields are to be considered implementation details > and direct the reader to use the is_worktree_locked function to retrieve > said information. > > Signed-off-by: Nickolai Belakovski > --- > worktree.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/worktree.h b/worktree.h > index df3fc30f7..6b12a3cf6 100644 > --- a/worktree.h > +++ b/worktree.h > @@ -10,12 +10,12 @@ struct worktree { > char *path; > char *id; > char *head_ref; /* NULL if HEAD is broken or detached */ > - char *lock_reason; /* internal use */ > + char *lock_reason; /* private - use is_worktree_locked */ s/use /used by/, probably. > struct object_id head_oid; > int is_detached; > int is_bare; > int is_current; > - int lock_reason_valid; > + int lock_reason_valid; /* private */ > }; These annotations to the two fields are not wrong per-se, but I have a feeling that it would equally be important to document what the other "non-private" fields mean, if peeking them *is* the API this subsystem offers. Thanks.
Using --word-diff breaks --color-moved
If you use both "--word-diff" and "--color-moved", regardless of the order of arguments, "--word-diff" takes precedence and "--color-moved" isn't allowed to do anything. I think "--color-moved" should have precedence over "--word-diff". I cannot think of a scenario where a user would supply both options, and actually want "--word-diff" to take precedence. If I'm not thinking of a scenario where this wouldn't be desired, perhaps whichever is first as an argument could take precedence. (The same behavior happens if 4+ lines are moved and "--color-moved{default=zebra}" is used, but below "--color-moved=plain" is used to be a smaller testcase.) Given the following setup: $ cat << EOF > file > a > b > c > EOF $ git add file $ git commit -m "Added file." $ cat << EOF > file > b > c > a > EOF You can have moved lines colorization: $ git diff --color-moved=plain ... [oldMovedColor]-a b c [newMovedColor]+a You can diff based on words: $ git diff --word-diff ... [oldColor][-a-] b c [newColor][+a+} But, you cannot diff based on words, and have moved lines colorization: $ git diff --color-moved=plain --word-diff $ git diff --word-diff ... [oldColor][-a-] b c [newColor][+a+}
Re: [PATCH 12/12] fsck: mark strings for translation
Hi, Ævar Arnfjörð Bjarmason wrote: >> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano wrote: >>> SZEDER Gábor writes: Nguyễn Thái Ngọc Duy wrote: > -fprintf(stderr, "%s in %s %s: %s\n", > -msg_type, printable_type(obj), describe_object(obj), err); > +fprintf_ln(stderr, _("%s in %s %s: %s"), Are the (f)printf() -> (f)printf_ln() changes all over 'builtin/fsck.c' really necessary to mark strings for translation? >>> >>> It is beyond absolute minimum but I saw it argued here that this >>> makes it easier to manage the .po and .pot files if your message >>> strings do not end with LF, a you are much less likely to _add_ >>> unneeded LF to the translated string than _lose_ LF at the end of >>> translated string. [...] > As Jonathan pointed out in the follow-up message[1] this sort of thing > is checked for in msgfmt, so sure, let's strip the \n, but it's really > not something we need to worry about. Likewise with translators turning > "%s" into "%d" (or "% s") or whatever. IMHO the advantage of leaving the \n out in the message is not so much that we don't trust translators but more that where we can make their lives easier, we should. In other words, I'm glad the patch does that, and Ævar, I agree. Thanks, both. Jonathan > 1. > https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/
[PATCH 2/3] check_stream_sha1(): handle input underflow
This commit fixes an infinite loop when fscking large truncated loose objects. The check_stream_sha1() function takes an mmap'd loose object buffer and streams 4k of output at a time, checking its sha1. The loop quits when we've output enough bytes (we know the size from the object header), or when zlib tells us anything except Z_OK or Z_BUF_ERROR. The latter is expected because zlib may run out of room in our 4k buffer, and that is how it tells us to process the output and loop again. But Z_BUF_ERROR also covers another case: one in which zlib cannot make forward progress because it needs more _input_. This should never happen in this loop, because though we're streaming the output, we have the entire deflated input available in the mmap'd buffer. But since we don't check this case, we'll just loop infinitely if we do see a truncated object, thinking that zlib is asking for more output space. It's tempting to fix this by checking stream->avail_in as part of the loop condition (and quitting if all of our bytes have been consumed). But that assumes that once zlib has consumed the input, there is nothing left to do. That's not necessarily the case: it may have read our input into its internal state, but still have bytes to output. Instead, let's continue on Z_BUF_ERROR only when we see the case we're expecting: the previous round filled our output buffer completely. If it didn't (and we still saw Z_BUF_ERROR), we know something is wrong and should break out of the loop. The bug comes from commit f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13), which reimplemented some of the existing loose object functions. So it's worth checking if this bug was inherited from any of those. The answers seems to be no. The two obvious candidates are both OK: 1. unpack_sha1_rest(); this doesn't need to loop on Z_BUF_ERROR at all, since it allocates the expected output buffer in advance (which we can't do since we're explicitly streaming here) 2. check_object_signature(); the streaming path relies on the istream interface, which uses read_istream_loose() for this case. That function uses a similar "is our output buffer full" check with Z_BUF_ERROR (which is where I stole it from for this patch!) Reported-by: Ævar Arnfjörð Bjarmason Helped-by: Junio C Hamano Signed-off-by: Jeff King --- sha1-file.c | 3 ++- t/t1450-fsck.sh | 19 +++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/sha1-file.c b/sha1-file.c index dd0b6aa873..2daf7d9935 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -2199,7 +2199,8 @@ static int check_stream_sha1(git_zstream *stream, * see the comment in unpack_sha1_rest for details. */ while (total_read <= size && - (status == Z_OK || status == Z_BUF_ERROR)) { + (status == Z_OK || + (status == Z_BUF_ERROR && !stream->avail_out))) { stream->next_out = buf; stream->avail_out = sizeof(buf); if (size - total_read < stream->avail_out) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 3421f12e8a..b5677d26a4 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -683,6 +683,25 @@ test_expect_success 'fsck detects trailing loose garbage (large blob)' ' test_i18ngrep "garbage.*$blob" out ' +test_expect_success 'fsck detects truncated loose object' ' + # make it big enough that we know we will truncate in the data + # portion, not the header + test-tool genrandom truncate 4096 >file && + blob=$(git hash-object -w file) && + file=$(sha1_file $blob) && + test_when_finished "remove_object $blob" && + test_copy_bytes 1024 <"$file" >tmp && + rm "$file" && + mv -f tmp "$file" && + + # check both regular and streaming code paths + test_must_fail git fsck 2>out && + test_i18ngrep corrupt.*$blob out && + + test_must_fail git -c core.bigfilethreshold=128 fsck 2>out && + test_i18ngrep corrupt.*$blob out +' + # for each of type, we have one version which is referenced by another object # (and so while unreachable, not dangling), and another variant which really is # dangling. -- 2.19.1.1235.g6b27db57c2
[PATCH 3/3] cat-file: handle streaming failures consistently
There are three ways to convince cat-file to stream a blob: - cat-file -p $blob - cat-file blob $blob - echo $batch | cat-file --batch In the first two, we simply exit with the error code of streaw_blob_to_fd(). That means that an error will cause us to exit with "-1" (which we try to avoid) without printing any kind of error message (which is confusing to the user). Instead, let's match the third case, which calls die() on an error. Unfortunately we cannot be more specific, as stream_blob_to_fd() does not tell us whether the problem was on reading (e.g., a corrupt object) or on writing (e.g., ENOSPC). That might be an opportunity for future work, but for now we will at least exit with a sane message and exit code. Signed-off-by: Jeff King --- builtin/cat-file.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 8d97c84725..0d403eb77d 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode, return 0; } +static int stream_blob(const struct object_id *oid) +{ + if (stream_blob_to_fd(1, oid, NULL, 0)) + die("unable to stream %s to stdout", oid_to_hex(oid)); + return 0; +} + static int cat_one_file(int opt, const char *exp_type, const char *obj_name, int unknown_type) { @@ -132,7 +139,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, } if (type == OBJ_BLOB) - return stream_blob_to_fd(1, &oid, NULL, 0); + return stream_blob(&oid); buf = read_object_file(&oid, &type, &size); if (!buf) die("Cannot read object %s", obj_name); @@ -155,7 +162,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name, oidcpy(&blob_oid, &oid); if (oid_object_info(the_repository, &blob_oid, NULL) == OBJ_BLOB) - return stream_blob_to_fd(1, &blob_oid, NULL, 0); + return stream_blob(&blob_oid); /* * we attempted to dereference a tag to a blob * and failed; there may be new dereference @@ -319,8 +326,9 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d BUG("invalid cmdmode: %c", opt->cmdmode); batch_write(opt, contents, size); free(contents); - } else if (stream_blob_to_fd(1, oid, NULL, 0) < 0) - die("unable to stream %s to stdout", oid_to_hex(oid)); + } else { + stream_blob(oid); + } } else { enum object_type type; -- 2.19.1.1235.g6b27db57c2
[PATCH 1/3] t1450: check large blob in trailing-garbage test
Commit cce044df7f (fsck: detect trailing garbage in all object types, 2017-01-13) added two tests of trailing garbage in a loose object file: one with a commit and one with a blob. The point of having two is that blobs would follow a different code path that streamed the contents, instead of loading it into a buffer as usual. At the time, merely being a blob was enough to trigger the streaming code path. But since 7ac4f3a007 (fsck: actually fsck blob data, 2018-05-02), we now only stream blobs that are actually large. So since then, the streaming code path is not tested at all for this case. We can restore the original intent of the test by tweaking core.bigFileThreshold to make our small blob seem large. There's no easy way to externally verify that we followed the streaming code path, but I did check before/after using a temporary debug statement. Signed-off-by: Jeff King --- I prepared this series on master, but it occurs to me you may want to apply patch 2 on top of f6371f9210 or thereabouts, which introduced the bug it fixes. If so, then obviously this one doesn't make sense back then, and should go on top of 7ac4f3a007. It should be semantically independent, though there may be a minor text conflict. t/t1450-fsck.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 0f2dd26f74..3421f12e8a 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -673,13 +673,13 @@ test_expect_success 'fsck detects trailing loose garbage (commit)' ' test_i18ngrep "garbage.*$commit" out ' -test_expect_success 'fsck detects trailing loose garbage (blob)' ' +test_expect_success 'fsck detects trailing loose garbage (large blob)' ' blob=$(echo trailing | git hash-object -w --stdin) && file=$(sha1_file $blob) && test_when_finished "remove_object $blob" && chmod +w "$file" && echo garbage >>"$file" && - test_must_fail git fsck 2>out && + test_must_fail git -c core.bigfilethreshold=5 fsck 2>out && test_i18ngrep "garbage.*$blob" out ' -- 2.19.1.1235.g6b27db57c2
Re: Infinite loop regression in git-fsck in v2.12.0
On Tue, Oct 30, 2018 at 06:56:03PM -0400, Jeff King wrote: > > > while (total_read <= size && > > > +stream->avail_in > 0 && > > > (status == Z_OK || status == Z_BUF_ERROR)) { > > > stream->next_out = buf; > > > stream->avail_out = sizeof(buf); > > > > Hmph. If the last round consumed the final input byte and needed > > output space of N bytes, but only M (< N) bytes of the output space > > was available, then it would have reduced both avail_in and > > avail_out down to zero and yielded Z_BUF_ERROR, no? Or would zlib > > refrain from consuming that final byte (leaving avail_in to at least > > one) and give us Z_BUF_ERROR in such a case? > > Hmm, yeah, good thinking. I think zlib could consume that final byte > into its internal buffer. > > As part of my digging, I looked at how the loose streaming code handles > this. It checks that when we see Z_BUF_ERROR, we actually did run out of > output bytes (so if we didn't, then we know it's not the case we > expected to be looping on). > > I have some patches almost ready to send; I'll use that technique. And here they are. [1/3]: t1450: check large blob in trailing-garbage test [2/3]: check_stream_sha1(): handle input underflow [3/3]: cat-file: handle streaming failures consistently builtin/cat-file.c | 16 sha1-file.c| 3 ++- t/t1450-fsck.sh| 23 +-- 3 files changed, 35 insertions(+), 7 deletions(-) -Peff
Re: Infinite loop regression in git-fsck in v2.12.0
On Tue, Oct 30, 2018 at 10:56:22PM +0100, Ævar Arnfjörð Bjarmason wrote: > > So maybe a good approach would be that we'd annotate all those test > > whose fsck fails with "this is how it should fail", and run those tests > > under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting > > that no tests other than those marked as failing the fsck check at the > > end fail it. > [...] > Jeff: Gotta turn in for the night, but maybe Something you're maybe > interested in carrying forward for this fix? It's not that much work to > mark up the failing tests, there's 10-20 of them from some quick > eyeballing. For this fix, I'd much rather add a specific test to the existing fsck tests. Otherwise, we're relying on what a bunch of other tests happen to be doing now, but there's little hope that they won't get refactored in a way that puts a gap in our test coverage. IOW, I think of things like GIT_TEST_FSCK as a kind of shotgun approach. They may find things, and we should fix them and make sure it runs clean. But ultimately, specific cases of interest should get their own tests. -Peff
Re: [RFC v1] Add virtual file system settings and hook proc
Ben Peart writes: > diff --git a/config.c b/config.c > index 4051e38823..96e05ee0f1 100644 > --- a/config.c > +++ b/config.c > ... > @@ -2307,6 +2311,37 @@ int git_config_get_index_threads(void) > return 0; /* auto */ > } > > +int git_config_get_virtualfilesystem(void) > +{ > + if (git_config_get_pathname("core.virtualfilesystem", > &core_virtualfilesystem)) > + core_virtualfilesystem = getenv("GIT_TEST_VIRTUALFILESYSTEM"); > + > + if (core_virtualfilesystem && !*core_virtualfilesystem) > + core_virtualfilesystem = NULL; > + > + if (core_virtualfilesystem) { > + /* > + * Some git commands spawn helpers and redirect the index to a > different > + * location. These include "difftool -d" and the sequencer > + * (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) > and others. > + * In those instances we don't want to update their temporary > index with > + * our virtualization data. > + */ > + char *default_index_file = xstrfmt("%s/%s", > the_repository->gitdir, "index"); > + int should_run_hook = !strcmp(default_index_file, > the_repository->index_file); > + > + free(default_index_file); > + if (should_run_hook) { > + /* virtual file system relies on the sparse checkout > logic so force it on */ > + core_apply_sparse_checkout = 1; > + return 1; > + } > + core_virtualfilesystem = NULL; It would be a small leak if this came from config_get_pathname(), but if it came from $GIT_TEST_VFS env, we cannot free it X-<. A helper function called *_get_X() that does not return X as its return value or updating the location pointed by its *dst parameter, and instead only stores its finding in a global variable feels somewhat odd. It smells more like "find out", "probe", "check", etc. > diff --git a/dir.c b/dir.c > index 47c2fca8dc..3097b0e446 100644 > --- a/dir.c > +++ b/dir.c > @@ -21,6 +21,7 @@ > #include "ewah/ewok.h" > #include "fsmonitor.h" > #include "submodule-config.h" > +#include "virtualfilesystem.h" > > /* > * Tells read_directory_recursive how a file or directory should be treated. > @@ -1109,6 +1110,18 @@ int is_excluded_from_list(const char *pathname, > struct exclude_list *el, struct index_state *istate) > { > struct exclude *exclude; > + > + /* > + * The virtual file system data is used to prevent git from traversing > + * any part of the tree that is not in the virtual file system. Return > + * 1 to exclude the entry if it is not found in the virtual file system, > + * else fall through to the regular excludes logic as it may further > exclude. > + */ This comment will sit better immediately in front of the call to "is excluded from vfs?" helper function. > + if (*dtype == DT_UNKNOWN) > + *dtype = get_dtype(NULL, istate, pathname, pathlen); We try to defer paying cost to determine unknown *dtype as late as possible by having this call in last_exclude_matching_from_list(), and not here. If we are doing this, we probably should update the callpaths that call last_exclude_matching_from_list() to make the caller responsible for doing get_dtype() and drop the lazy finding of dtype from the callee. Alternatively, the new "is excluded from vfs" helper can learn to do the lazy get_dtype() just like the existing last_exclude_matching_from_list() does. I suspect the latter may be simpler. > + if (is_excluded_from_virtualfilesystem(pathname, pathlen, *dtype) > 0) > + return 1; > + > exclude = last_exclude_matching_from_list(pathname, pathlen, basename, > dtype, el, istate); > if (exclude) > @@ -1324,8 +1337,20 @@ struct exclude *last_exclude_matching(struct > dir_struct *dir, > int is_excluded(struct dir_struct *dir, struct index_state *istate, > const char *pathname, int *dtype_p) > { > - struct exclude *exclude = > - last_exclude_matching(dir, istate, pathname, dtype_p); > + struct exclude *exclude; > + > + /* > + * The virtual file system data is used to prevent git from traversing > + * any part of the tree that is not in the virtual file system. Return > + * 1 to exclude the entry if it is not found in the virtual file system, > + * else fall through to the regular excludes logic as it may further > exclude. > + */ > + if (*dtype_p == DT_UNKNOWN) > + *dtype_p = get_dtype(NULL, istate, pathname, strlen(pathname)); Exactly the same comment as above. > + if (is_excluded_from_virtualfilesystem(pathname, strlen(pathname), > *dtype_p) > 0) > + return 1; > + > + exclude = last_exclude_matching(dir, istate, pathname, dtype_p); > if (exclude) > r
Re: Infinite loop regression in git-fsck in v2.12.0
On Wed, Oct 31, 2018 at 07:28:00AM +0900, Junio C Hamano wrote: > > So we need to distinguish those cases. I think this is the simplest fix: > > > > diff --git a/sha1-file.c b/sha1-file.c > > index dd0b6aa873..a7ff5fe25d 100644 > > --- a/sha1-file.c > > +++ b/sha1-file.c > > @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream, > > * see the comment in unpack_sha1_rest for details. > > */ > > while (total_read <= size && > > + stream->avail_in > 0 && > >(status == Z_OK || status == Z_BUF_ERROR)) { > > stream->next_out = buf; > > stream->avail_out = sizeof(buf); > > Hmph. If the last round consumed the final input byte and needed > output space of N bytes, but only M (< N) bytes of the output space > was available, then it would have reduced both avail_in and > avail_out down to zero and yielded Z_BUF_ERROR, no? Or would zlib > refrain from consuming that final byte (leaving avail_in to at least > one) and give us Z_BUF_ERROR in such a case? Hmm, yeah, good thinking. I think zlib could consume that final byte into its internal buffer. As part of my digging, I looked at how the loose streaming code handles this. It checks that when we see Z_BUF_ERROR, we actually did run out of output bytes (so if we didn't, then we know it's not the case we expected to be looping on). I have some patches almost ready to send; I'll use that technique. -Peff
Re: Infinite loop regression in git-fsck in v2.12.0
Jeff King writes: > The problem isn't actually a sha1 mismatch, though that's what > parse_object() will report. The issue is actually that the file is > truncated. So zlib does not say "this is corrupt", but rather "I need > more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both > for "I need more bytes" (in which we know we are truncated, because we > fed the whole mmap'd file in the first place) as well as "I need more > output buffer space" (which just means we should keep looping!). > > So we need to distinguish those cases. I think this is the simplest fix: > > diff --git a/sha1-file.c b/sha1-file.c > index dd0b6aa873..a7ff5fe25d 100644 > --- a/sha1-file.c > +++ b/sha1-file.c > @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream, >* see the comment in unpack_sha1_rest for details. >*/ > while (total_read <= size && > +stream->avail_in > 0 && > (status == Z_OK || status == Z_BUF_ERROR)) { > stream->next_out = buf; > stream->avail_out = sizeof(buf); Hmph. If the last round consumed the final input byte and needed output space of N bytes, but only M (< N) bytes of the output space was available, then it would have reduced both avail_in and avail_out down to zero and yielded Z_BUF_ERROR, no? Or would zlib refrain from consuming that final byte (leaving avail_in to at least one) and give us Z_BUF_ERROR in such a case? > This works just by checking that we are making forward progress in the > output buffer. I think that would _probably_ be OK for this case, since > we know we have all of the input available. But in a case where we're > feeding the input in a stream, it would not be. It's possible there that > we would not create any output in one round, but would do so after > feeding more input bytes. Yes, exactly. > I think the patch I showed above addresses the root cause more directly. > I'll wrap that up in a real commit, but I think there may be some > related work: > > - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't > strictly correct, but is probably good enough). However, "git > cat-file blob 19f9c827" exits non-zero without printing anything. It > probably should complain more loudly. > > - the offending loop comes from f6371f9210. But that commit was mostly > cargo-culting other parts of sha1-file.c. I'm worried that this bug > exists elsewhere, too. I'll dig around to see if I can find other > instances. Thanks.
[PATCH 21/24] commit-graph: convert remaining function to handle arbitrary repositories
Convert all functions to handle arbitrary repositories in commit-graph.c that are used by functions taking a repository argument already. Notable exclusion is write_commit_graph and its local functions as that only works on the_repository. Signed-off-by: Stefan Beller --- commit-graph.c | 40 1 file changed, 24 insertions(+), 16 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 40c855f185..f78a8e96b5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -292,7 +292,8 @@ static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t g->chunk_oid_lookup, g->hash_len, pos); } -static struct commit_list **insert_parent_or_die(struct commit_graph *g, +static struct commit_list **insert_parent_or_die(struct repository *r, +struct commit_graph *g, uint64_t pos, struct commit_list **pptr) { @@ -303,7 +304,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g, die("invalid parent position %"PRIu64, pos); hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); - c = lookup_commit(the_repository, &oid); + c = lookup_commit(r, &oid); if (!c) die(_("could not find commit %s"), oid_to_hex(&oid)); c->graph_pos = pos; @@ -317,7 +318,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } -static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos) +static int fill_commit_in_graph(struct repository *r, + struct commit *item, + struct commit_graph *g, uint32_t pos) { uint32_t edge_value; uint32_t *parent_data_ptr; @@ -341,13 +344,13 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin edge_value = get_be32(commit_data + g->hash_len); if (edge_value == GRAPH_PARENT_NONE) return 1; - pptr = insert_parent_or_die(g, edge_value, pptr); + pptr = insert_parent_or_die(r, g, edge_value, pptr); edge_value = get_be32(commit_data + g->hash_len + 4); if (edge_value == GRAPH_PARENT_NONE) return 1; if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) { - pptr = insert_parent_or_die(g, edge_value, pptr); + pptr = insert_parent_or_die(r, g, edge_value, pptr); return 1; } @@ -355,7 +358,7 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin 4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK)); do { edge_value = get_be32(parent_data_ptr); - pptr = insert_parent_or_die(g, + pptr = insert_parent_or_die(r, g, edge_value & GRAPH_EDGE_LAST_MASK, pptr); parent_data_ptr++; @@ -374,7 +377,9 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin } } -static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item) +static int parse_commit_in_graph_one(struct repository *r, +struct commit_graph *g, +struct commit *item) { uint32_t pos; @@ -382,7 +387,7 @@ static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item return 1; if (find_commit_in_graph(item, g, &pos)) - return fill_commit_in_graph(item, g, pos); + return fill_commit_in_graph(r, item, g, pos); return 0; } @@ -391,7 +396,7 @@ int parse_commit_in_graph(struct repository *r, struct commit *item) { if (!prepare_commit_graph(r)) return 0; - return parse_commit_in_graph_one(r->objects->commit_graph, item); + return parse_commit_in_graph_one(r, r->objects->commit_graph, item); } void load_commit_graph_info(struct repository *r, struct commit *item) @@ -403,19 +408,22 @@ void load_commit_graph_info(struct repository *r, struct commit *item) fill_commit_graph_info(item, r->objects->commit_graph, pos); } -static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *c) +static struct tree *load_tree_for_commit(struct repository *r, +struct commit_graph *g, +struct commit *c) { struct object_id oid; const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * (c->graph_pos); hashcpy(oid.hash, commit_data
[PATCH 22/24] commit: make free_commit_buffer and release_commit_memory repository agnostic
Pass the object pool to free_commit_buffer and release_commit_memory, such that we can eliminate access to 'the_repository'. Also remove the TODO in release_commit_memory, as commit->util was removed in 9d2c97016f (commit.h: delete 'util' field in struct commit, 2018-05-19) Signed-off-by: Stefan Beller --- builtin/fsck.c | 3 ++- builtin/log.c | 6 -- builtin/rev-list.c | 3 ++- commit.c | 9 - commit.h | 4 ++-- object.c | 2 +- 6 files changed, 15 insertions(+), 12 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 06eb421720..c476ac6983 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -382,7 +382,8 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size) if (obj->type == OBJ_TREE) free_tree_buffer((struct tree *)obj); if (obj->type == OBJ_COMMIT) - free_commit_buffer((struct commit *)obj); + free_commit_buffer(the_repository->parsed_objects, + (struct commit *)obj); return err; } diff --git a/builtin/log.c b/builtin/log.c index 061d4fd864..64c2649c7c 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -395,7 +395,8 @@ static int cmd_log_walk(struct rev_info *rev) * We may show a given commit multiple times when * walking the reflogs. */ - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); free_commit_list(commit->parents); commit->parents = NULL; } @@ -1922,7 +1923,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) open_next_file(rev.numbered_files ? NULL : commit, NULL, &rev, quiet)) die(_("Failed to create output files")); shown = log_tree_commit(&rev, commit); - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); /* We put one extra blank line between formatted * patches and this flag is used by log-tree code diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 5064d08e1b..a8867f0c4b 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -197,7 +197,8 @@ static void finish_commit(struct commit *commit, void *data) free_commit_list(commit->parents); commit->parents = NULL; } - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); } static inline void finish_object__ma(struct object *obj) diff --git a/commit.c b/commit.c index 7d2f3a9a93..4fe74aa4bc 100644 --- a/commit.c +++ b/commit.c @@ -328,10 +328,10 @@ void repo_unuse_commit_buffer(struct repository *r, free((void *)buffer); } -void free_commit_buffer(struct commit *commit) +void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit) { struct commit_buffer *v = buffer_slab_peek( - the_repository->parsed_objects->buffer_slab, commit); + pool->buffer_slab, commit); if (v) { FREE_AND_NULL(v->buffer); v->size = 0; @@ -354,13 +354,12 @@ struct object_id *get_commit_tree_oid(const struct commit *commit) return &get_commit_tree(commit)->object.oid; } -void release_commit_memory(struct commit *c) +void release_commit_memory(struct parsed_object_pool *pool, struct commit *c) { c->maybe_tree = NULL; c->index = 0; - free_commit_buffer(c); + free_commit_buffer(pool, c); free_commit_list(c->parents); - /* TODO: what about commit->util? */ c->object.parsed = 0; } diff --git a/commit.h b/commit.h index 2e6b799b26..d2779a23f6 100644 --- a/commit.h +++ b/commit.h @@ -140,7 +140,7 @@ void repo_unuse_commit_buffer(struct repository *r, /* * Free any cached object buffer associated with the commit. */ -void free_commit_buffer(struct commit *); +void free_commit_buffer(struct parsed_object_pool *pool, struct commit *); struct tree *get_commit_tree(const struct commit *); struct object_id *get_commit_tree_oid(const struct commit *); @@ -149,7 +149,7 @@ struct object_id *get_commit_tree_oid(const struct commit *); * Release memory related to a commit, including the parent list and * any cached object buffer. */ -void release_commit_memory(struct commit *c); +void release_commit_memory(struct parsed_object_pool *pool, struct commit *c); /* * Disassociate any cached object buffer from the commit, but do not free it. diff --git a/object.c b/object.c index 003f870484..c4170d2d0c 100644 --- a/object.c +++ b/object.c @@ -540,7 +540,7 @@ void parsed_object_pool_clear(struct parsed_
[PATCH 19/24] submodule: use submodule repos for object lookup
This converts the 'show_submodule_header' function to use the repository API properly, such that the submodule objects are not added to the main object store. Signed-off-by: Stefan Beller --- submodule.c | 76 ++--- 1 file changed, 61 insertions(+), 15 deletions(-) diff --git a/submodule.c b/submodule.c index d9d3046689..0fdda45252 100644 --- a/submodule.c +++ b/submodule.c @@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path, return prepare_revision_walk(rev); } -static void print_submodule_summary(struct rev_info *rev, struct diff_options *o) +static void print_submodule_summary(struct repository *r, struct rev_info *rev, struct diff_options *o) { static const char format[] = " %m %s"; struct strbuf sb = STRBUF_INIT; @@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, struct diff_options *o ctx.date_mode = rev->date_mode; ctx.output_encoding = get_log_output_encoding(); strbuf_setlen(&sb, 0); - format_commit_message(commit, format, &sb, &ctx); + repo_format_commit_message(r, commit, format, &sb, + &ctx); strbuf_addch(&sb, '\n'); if (commit->object.flags & SYMMETRIC_LEFT) diff_emit_submodule_del(o, sb.buf); @@ -481,14 +482,47 @@ void prepare_submodule_repo_env(struct argv_array *out) DEFAULT_GIT_DIR_ENVIRONMENT); } -/* Helper function to display the submodule header line prior to the full - * summary output. If it can locate the submodule objects directory it will - * attempt to lookup both the left and right commits and put them into the - * left and right pointers. +/* + * Initialize 'out' based on the provided submodule path. + * + * Unlike repo_submodule_init, this tolerates submodules not present + * in .gitmodules. This function exists only to preserve historical behavior, + * + * Returns 0 on success, -1 when the submodule is not present. */ -static void show_submodule_header(struct diff_options *o, const char *path, +static struct repository *open_submodule(const char *path) +{ + struct strbuf sb = STRBUF_INIT; + struct repository *out = xmalloc(sizeof(*out)); + + if (submodule_to_gitdir(&sb, path) || repo_init(out, sb.buf, NULL)) { + strbuf_release(&sb); + free(out); + return NULL; + } + + out->submodule_prefix = xstrfmt("%s%s/", + the_repository->submodule_prefix ? + the_repository->submodule_prefix : + "", path); + + strbuf_release(&sb); + return out; +} + +/* + * Helper function to display the submodule header line prior to the full + * summary output. + * + * If it can locate the submodule git directory it will create a repository + * handle for the submodule and lookup both the left and right commits and + * put them into the left and right pointers. + */ +static void show_submodule_header(struct diff_options *o, + const char *path, struct object_id *one, struct object_id *two, unsigned dirty_submodule, + struct repository *sub, struct commit **left, struct commit **right, struct commit_list **merge_bases) { @@ -507,7 +541,7 @@ static void show_submodule_header(struct diff_options *o, const char *path, else if (is_null_oid(two)) message = "(submodule deleted)"; - if (add_submodule_odb(path)) { + if (!sub) { if (!message) message = "(commits not present)"; goto output_header; @@ -517,8 +551,8 @@ static void show_submodule_header(struct diff_options *o, const char *path, * Attempt to lookup the commit references, and determine if this is * a fast forward or fast backwards update. */ - *left = lookup_commit_reference(the_repository, one); - *right = lookup_commit_reference(the_repository, two); + *left = lookup_commit_reference(sub, one); + *right = lookup_commit_reference(sub, two); /* * Warn about missing commits in the submodule project, but only if @@ -528,7 +562,7 @@ static void show_submodule_header(struct diff_options *o, const char *path, (!is_null_oid(two) && !*right)) message = "(commits not present)"; - *merge_bases = get_merge_bases(*left, *right); + *merge_bases = repo_get_merge_bases(sub, *left, *right); if (*merge_bases) { if ((*merge_bases)->item == *left) fast_forward = 1; @@ -562,16 +596,18 @@ void show_submodule_summary(struct diff_options *o, const char *path, struct rev_info rev;
[PATCH 12/24] commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit-reach.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index ab2bb1e5d5..bf7a513991 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -216,7 +216,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt return filled; } -static struct commit_list *get_merge_bases_many_0(struct commit *one, +static struct commit_list *get_merge_bases_many_0(struct repository *r, + struct commit *one, int n, struct commit **twos, int cleanup) @@ -226,7 +227,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, struct commit_list *result; int cnt, i; - result = merge_bases_many(the_repository, one, n, twos); + result = merge_bases_many(r, one, n, twos); for (i = 0; i < n; i++) { if (one == twos[i]) return result; @@ -249,7 +250,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, clear_commit_marks(one, all_flags); clear_commit_marks_many(n, twos, all_flags); - cnt = remove_redundant(the_repository, rslt, cnt); + cnt = remove_redundant(r, rslt, cnt); result = NULL; for (i = 0; i < cnt; i++) commit_list_insert_by_date(rslt[i], &result); @@ -261,19 +262,19 @@ struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos) { - return get_merge_bases_many_0(one, n, twos, 1); + return get_merge_bases_many_0(the_repository, one, n, twos, 1); } struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos) { - return get_merge_bases_many_0(one, n, twos, 0); + return get_merge_bases_many_0(the_repository, one, n, twos, 0); } struct commit_list *get_merge_bases(struct commit *one, struct commit *two) { - return get_merge_bases_many_0(one, 1, &two, 1); + return get_merge_bases_many_0(the_repository, one, 1, &two, 1); } /* -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 14/24] commit-reach: prepare in_merge_bases[_many] to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit-reach.c | 15 +-- commit-reach.h | 12 ++-- contrib/coccinelle/the_repository.pending.cocci | 16 3 files changed, 35 insertions(+), 8 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 3be5526957..88a65f854f 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -312,16 +312,17 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit) /* * Is "commit" an ancestor of one of the "references"? */ -int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference) +int repo_in_merge_bases_many(struct repository *r, struct commit *commit, +int nr_reference, struct commit **reference) { struct commit_list *bases; int ret = 0, i; uint32_t min_generation = GENERATION_NUMBER_INFINITY; - if (parse_commit(commit)) + if (repo_parse_commit(r, commit)) return ret; for (i = 0; i < nr_reference; i++) { - if (parse_commit(reference[i])) + if (repo_parse_commit(r, reference[i])) return ret; if (reference[i]->generation < min_generation) min_generation = reference[i]->generation; @@ -330,7 +331,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * if (commit->generation > min_generation) return ret; - bases = paint_down_to_common(the_repository, commit, + bases = paint_down_to_common(r, commit, nr_reference, reference, commit->generation); if (commit->object.flags & PARENT2) @@ -344,9 +345,11 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * /* * Is "commit" an ancestor of (i.e. reachable from) the "reference"? */ -int in_merge_bases(struct commit *commit, struct commit *reference) +int repo_in_merge_bases(struct repository *r, + struct commit *commit, + struct commit *reference) { - return in_merge_bases_many(commit, 1, &reference); + return repo_in_merge_bases_many(r, commit, 1, &reference); } struct commit_list *reduce_heads(struct commit_list *heads) diff --git a/commit-reach.h b/commit-reach.h index c647447263..6b16f12467 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, struct commit_list *get_octopus_merge_bases(struct commit_list *in); int is_descendant_of(struct commit *commit, struct commit_list *with_commit); -int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference); -int in_merge_bases(struct commit *commit, struct commit *reference); +int repo_in_merge_bases(struct repository *r, + struct commit *commit, + struct commit *reference); +int repo_in_merge_bases_many(struct repository *r, +struct commit *commit, +int nr_reference, struct commit **reference); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2) +#define in_merge_bases_many(c1, n, cs) repo_in_merge_bases_many(the_repository, c1, n, cs) +#endif /* * Takes a list of commits and returns a new list where those diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 5e037fe428..8c6a71bf64 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -91,3 +91,19 @@ expression G; + repo_get_merge_bases_many_dirty(the_repository, E, F, G); +@@ +expression E; +expression F; +@@ +- in_merge_bases( ++ repo_in_merge_bases(the_repository, + E, F); + +@@ +expression E; +expression F; +expression G; +@@ +- in_merge_bases_many( ++ repo_in_merge_bases_many(the_repository, + E, F, G); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 13/24] commit-reach: prepare get_merge_bases to handle arbitrary repositories
Similarly to previous patches, the get_merge_base functions are used often in the code base, which makes migrating them hard. Implement the new functions, prefixed with 'repo_' and hide the old functions behind a wrapper macro. Signed-off-by: Stefan Beller --- commit-reach.c| 24 ++--- commit-reach.h| 26 +++--- .../coccinelle/the_repository.pending.cocci | 27 +++ 3 files changed, 57 insertions(+), 20 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index bf7a513991..3be5526957 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -258,23 +258,27 @@ static struct commit_list *get_merge_bases_many_0(struct repository *r, return result; } -struct commit_list *get_merge_bases_many(struct commit *one, -int n, -struct commit **twos) +struct commit_list *repo_get_merge_bases_many(struct repository *r, + struct commit *one, + int n, + struct commit **twos) { - return get_merge_bases_many_0(the_repository, one, n, twos, 1); + return get_merge_bases_many_0(r, one, n, twos, 1); } -struct commit_list *get_merge_bases_many_dirty(struct commit *one, - int n, - struct commit **twos) +struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, + struct commit *one, + int n, + struct commit **twos) { - return get_merge_bases_many_0(the_repository, one, n, twos, 0); + return get_merge_bases_many_0(r, one, n, twos, 0); } -struct commit_list *get_merge_bases(struct commit *one, struct commit *two) +struct commit_list *repo_get_merge_bases(struct repository *r, +struct commit *one, +struct commit *two) { - return get_merge_bases_many_0(the_repository, one, 1, &two, 1); + return get_merge_bases_many_0(r, one, 1, &two, 1); } /* diff --git a/commit-reach.h b/commit-reach.h index 122a23a24d..c647447263 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -8,17 +8,23 @@ struct commit_list; struct contains_cache; struct ref_filter; -struct commit_list *get_merge_bases_many(struct commit *one, -int n, -struct commit **twos); -struct commit_list *get_merge_bases_many_dirty(struct commit *one, - int n, - struct commit **twos); -struct commit_list *get_merge_bases(struct commit *one, struct commit *two); -struct commit_list *get_octopus_merge_bases(struct commit_list *in); - +struct commit_list *repo_get_merge_bases(struct repository *r, +struct commit *rev1, +struct commit *rev2); +struct commit_list *repo_get_merge_bases_many(struct repository *r, + struct commit *one, int n, + struct commit **twos); /* To be used only when object flags after this call no longer matter */ -struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos); +struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, + struct commit *one, int n, + struct commit **twos); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define get_merge_bases(r1, r2) repo_get_merge_bases(the_repository, r1, r2) +#define get_merge_bases_many(one, n, two) repo_get_merge_bases_many(the_repository, one, n, two) +#define get_merge_bases_many_dirty(one, n, twos) repo_get_merge_bases_many_dirty(the_repository, one, n, twos) +#endif + +struct commit_list *get_octopus_merge_bases(struct commit_list *in); int is_descendant_of(struct commit *commit, struct commit_list *with_commit); int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference); diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index b185fe0a1d..5e037fe428 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -64,3 +64,30 @@ expression E; - parse_commit( + repo_parse_commit(the_repository, E) + +@@ +expression E; +expression F; +@@ +- get_merge_bases( ++ repo_get_merge_bases(the_repository, + E, F); + +@@ +expression E; +expression F; +expression G;
[PATCH 20/24] submodule: don't add submodule as odb for push
In push_submodule(), because we do not actually need access to objects in the submodule, do not invoke add_submodule_odb(). (for_each_remote_ref_submodule() does not require access to those objects, and the actual push is done by spawning another process, which handles object access by itself.) This code of push_submodule() is exercised in t5531 and continues to work, showing that the submodule odbc is not needed. Signed-off-by: Stefan Beller --- submodule.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/submodule.c b/submodule.c index 0fdda45252..da1ac1e6f8 100644 --- a/submodule.c +++ b/submodule.c @@ -1024,9 +1024,6 @@ static int push_submodule(const char *path, const struct string_list *push_options, int dry_run) { - if (add_submodule_odb(path)) - return 1; - if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; argv_array_push(&cp.args, "push"); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 17/24] commit: prepare logmsg_reencode to handle arbitrary repositories
Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- commit.h| 8 contrib/coccinelle/the_repository.pending.cocci | 9 + pretty.c| 13 +++-- 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/commit.h b/commit.h index 57375e3239..2e6b799b26 100644 --- a/commit.h +++ b/commit.h @@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text); extern const char *logmsg_reencode(const struct commit *commit, char **commit_encoding, const char *output_encoding); +const char *repo_logmsg_reencode(struct repository *r, +const struct commit *commit, +char **commit_encoding, +const char *output_encoding); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, enc, out) +#endif + extern const char *skip_blank_lines(const char *msg); /** Removes the first commit from a list sorted by date, and adds all diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 516f19ffee..f5b42cfc62 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -123,3 +123,12 @@ expression F; - unuse_commit_buffer( + repo_unuse_commit_buffer(the_repository, E, F); + +@@ +expression E; +expression F; +expression G; +@@ +- logmsg_reencode( ++ repo_logmsg_reencode(the_repository, + E, F, G); diff --git a/pretty.c b/pretty.c index 8ca29e9281..b359b68750 100644 --- a/pretty.c +++ b/pretty.c @@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const char *encoding) return strbuf_detach(&tmp, NULL); } -const char *logmsg_reencode(const struct commit *commit, - char **commit_encoding, - const char *output_encoding) +const char *repo_logmsg_reencode(struct repository *r, +const struct commit *commit, +char **commit_encoding, +const char *output_encoding) { static const char *utf8 = "UTF-8"; const char *use_encoding; char *encoding; - const char *msg = get_commit_buffer(commit, NULL); + const char *msg = repo_get_commit_buffer(r, commit, NULL); char *out; if (!output_encoding || !*output_encoding) { @@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit, * the cached copy from get_commit_buffer, we need to duplicate it * to avoid munging the cached copy. */ - if (msg == get_cached_commit_buffer(the_repository, commit, NULL)) + if (msg == get_cached_commit_buffer(r, commit, NULL)) out = xstrdup(msg); else out = (char *)msg; @@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit, */ out = reencode_string(msg, output_encoding, use_encoding); if (out) - unuse_commit_buffer(commit, msg); + repo_unuse_commit_buffer(r, commit, msg); } /* -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 18/24] pretty: prepare format_commit_message to handle arbitrary repositories
Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- contrib/coccinelle/the_repository.pending.cocci | 10 ++ pretty.c| 15 --- pretty.h| 7 ++- 3 files changed, 24 insertions(+), 8 deletions(-) diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index f5b42cfc62..2ee702ecf7 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -132,3 +132,13 @@ expression G; - logmsg_reencode( + repo_logmsg_reencode(the_repository, E, F, G); + +@@ +expression E; +expression F; +expression G; +expression H; +@@ +- format_commit_message( ++ repo_format_commit_message(the_repository, + E, F, G, H); diff --git a/pretty.c b/pretty.c index b359b68750..3240495308 100644 --- a/pretty.c +++ b/pretty.c @@ -1508,9 +1508,10 @@ void userformat_find_requirements(const char *fmt, struct userformat_want *w) strbuf_release(&dummy); } -void format_commit_message(const struct commit *commit, - const char *format, struct strbuf *sb, - const struct pretty_print_context *pretty_ctx) +void repo_format_commit_message(struct repository *r, + const struct commit *commit, + const char *format, struct strbuf *sb, + const struct pretty_print_context *pretty_ctx) { struct format_commit_context context; const char *output_enc = pretty_ctx->output_encoding; @@ -1524,9 +1525,9 @@ void format_commit_message(const struct commit *commit, * convert a commit message to UTF-8 first * as far as 'format_commit_item' assumes it in UTF-8 */ - context.message = logmsg_reencode(commit, - &context.commit_encoding, - utf8); + context.message = repo_logmsg_reencode(r, commit, + &context.commit_encoding, + utf8); strbuf_expand(sb, format, format_commit_item, &context); rewrap_message_tail(sb, &context, 0, 0, 0); @@ -1550,7 +1551,7 @@ void format_commit_message(const struct commit *commit, } free(context.commit_encoding); - unuse_commit_buffer(commit, context.message); + repo_unuse_commit_buffer(r, commit, context.message); } static void pp_header(struct pretty_print_context *pp, diff --git a/pretty.h b/pretty.h index 7359d318a9..e6625269cf 100644 --- a/pretty.h +++ b/pretty.h @@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const char **msg_p, * Put the result to "sb". * Please use this function for custom formats. */ -void format_commit_message(const struct commit *commit, +void repo_format_commit_message(struct repository *r, + const struct commit *commit, const char *format, struct strbuf *sb, const struct pretty_print_context *context); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define format_commit_message(c, f, s, con) \ + repo_format_commit_message(the_repository, c, f, s, con) +#endif /* * Parse given arguments from "arg", check it for correctness and -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 16/24] commit: prepare repo_unuse_commit_buffer to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit.c| 6 -- commit.h| 7 ++- contrib/coccinelle/the_repository.pending.cocci | 8 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/commit.c b/commit.c index 4034def16c..7d2f3a9a93 100644 --- a/commit.c +++ b/commit.c @@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r, return ret; } -void unuse_commit_buffer(const struct commit *commit, const void *buffer) +void repo_unuse_commit_buffer(struct repository *r, + const struct commit *commit, + const void *buffer) { struct commit_buffer *v = buffer_slab_peek( - the_repository->parsed_objects->buffer_slab, commit); + r->parsed_objects->buffer_slab, commit); if (!(v && v->buffer == buffer)) free((void *)buffer); } diff --git a/commit.h b/commit.h index 591a77a5bb..57375e3239 100644 --- a/commit.h +++ b/commit.h @@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r, * from an earlier call to get_commit_buffer. The buffer may or may not be * freed by this call; callers should not access the memory afterwards. */ -void unuse_commit_buffer(const struct commit *, const void *buffer); +void repo_unuse_commit_buffer(struct repository *r, + const struct commit *, + const void *buffer); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, b) +#endif /* * Free any cached object buffer associated with the commit. diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 4018e6eaf7..516f19ffee 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -115,3 +115,11 @@ expression F; - get_commit_buffer( + repo_get_commit_buffer(the_repository, E, F); + +@@ +expression E; +expression F; +@@ +- unuse_commit_buffer( ++ repo_unuse_commit_buffer(the_repository, + E, F); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 24/24] t/helper/test-repository: celebrate independence from the_repository
dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11) brought more independence from the_repository to the commit graph, however it was not completely independent of the_repository, as the previous patches show. To ensure we're not accessing the_repository by accident, we'd ideally assign NULL to the_repository to trigger a segfault on access. We currently have a temporary hack in cache.h, which relies on the_hash_algo (which is a short form of the_repository->hash_algo) to be set, so we cannot do that. The next best thing is to set all fields of the_repository to 0, so any accidental access is more likely to be found. Signed-off-by: Stefan Beller --- cache.h| 2 ++ t/helper/test-repository.c | 10 ++ 2 files changed, 12 insertions(+) diff --git a/cache.h b/cache.h index f7fabdde8f..9f535040af 100644 --- a/cache.h +++ b/cache.h @@ -1035,6 +1035,8 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2) * * This will need to be extended or ripped out when we learn about * hashes of different sizes. +* +* When ripping this out, see TODO in test-repository.c. */ if (the_hash_algo->rawsz != 20) BUG("hash size not yet supported by hashcmp"); diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c index 6a84a53efb..f7f8618445 100644 --- a/t/helper/test-repository.c +++ b/t/helper/test-repository.c @@ -17,6 +17,11 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree, setup_git_env(gitdir); + memset(the_repository, 0, sizeof(*the_repository)); + + /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + if (repo_init(&r, gitdir, worktree)) die("Couldn't init repo"); @@ -43,6 +48,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir, setup_git_env(gitdir); + memset(the_repository, 0, sizeof(*the_repository)); + + /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + if (repo_init(&r, gitdir, worktree)) die("Couldn't init repo"); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 23/24] path.h: make REPO_GIT_PATH_FUNC repository agnostic
git_pathdup uses the_repository internally, but the macro REPO_GIT_PATH_FUNC is specifically made for arbitrary repositories. Switch to repo_git_path which works on arbitrary repositories. Signed-off-by: Stefan Beller --- path.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.h b/path.h index b654ea8ff5..651e6157fc 100644 --- a/path.h +++ b/path.h @@ -165,7 +165,7 @@ extern void report_linked_checkout_garbage(void); const char *git_path_##var(struct repository *r) \ { \ if (!r->cached_paths.var) \ - r->cached_paths.var = git_pathdup(filename); \ + r->cached_paths.var = repo_git_path(r, filename); \ return r->cached_paths.var; \ } -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 15/24] commit: prepare get_commit_buffer to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit.c| 8 +--- commit.h| 7 ++- contrib/coccinelle/the_repository.pending.cocci | 8 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/commit.c b/commit.c index 7a931d7fd4..4034def16c 100644 --- a/commit.c +++ b/commit.c @@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository *r, const struct commit * return v->buffer; } -const void *get_commit_buffer(const struct commit *commit, unsigned long *sizep) +const void *repo_get_commit_buffer(struct repository *r, + const struct commit *commit, + unsigned long *sizep) { - const void *ret = get_cached_commit_buffer(the_repository, commit, sizep); + const void *ret = get_cached_commit_buffer(r, commit, sizep); if (!ret) { enum object_type type; unsigned long size; - ret = read_object_file(&commit->object.oid, &type, &size); + ret = repo_read_object_file(r, &commit->object.oid, &type, &size); if (!ret) die("cannot read commit object %s", oid_to_hex(&commit->object.oid)); diff --git a/commit.h b/commit.h index 08935f9a19..591a77a5bb 100644 --- a/commit.h +++ b/commit.h @@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, const struct commit *, * from disk. The resulting memory should not be modified, and must be given * to unuse_commit_buffer when the caller is done. */ -const void *get_commit_buffer(const struct commit *, unsigned long *size); +const void *repo_get_commit_buffer(struct repository *r, + const struct commit *, + unsigned long *size); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s) +#endif /* * Tell the commit subsytem that we are done with a particular commit buffer. diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 8c6a71bf64..4018e6eaf7 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -107,3 +107,11 @@ expression G; - in_merge_bases_many( + repo_in_merge_bases_many(the_repository, E, F, G); + +@@ +expression E; +expression F; +@@ +- get_commit_buffer( ++ repo_get_commit_buffer(the_repository, + E, F); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 10/24] commit-reach.c: allow merge_bases_many to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit-reach.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 080ae0a758..4cf471bfaf 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -95,7 +95,9 @@ static struct commit_list *paint_down_to_common(struct repository *r, return result; } -static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos) +static struct commit_list *merge_bases_many(struct repository *r, + struct commit *one, int n, + struct commit **twos) { struct commit_list *list = NULL; struct commit_list *result = NULL; @@ -110,14 +112,14 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co return commit_list_insert(one, &result); } - if (parse_commit(one)) + if (repo_parse_commit(r, one)) return NULL; for (i = 0; i < n; i++) { - if (parse_commit(twos[i])) + if (repo_parse_commit(r, twos[i])) return NULL; } - list = paint_down_to_common(the_repository, one, n, twos, 0); + list = paint_down_to_common(r, one, n, twos, 0); while (list) { struct commit *commit = pop_commit(&list); @@ -224,7 +226,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, struct commit_list *result; int cnt, i; - result = merge_bases_many(one, n, twos); + result = merge_bases_many(the_repository, one, n, twos); for (i = 0; i < n; i++) { if (one == twos[i]) return result; -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 04/24] object-store: allow read_object_file_extended to read from arbitrary repositories
read_object_file_extended is not widely used, so migrate it all at once. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- object-store.h | 5 +++-- sha1-file.c| 11 ++- streaming.c| 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/object-store.h b/object-store.h index 63b7605a3e..3d98a682b2 100644 --- a/object-store.h +++ b/object-store.h @@ -161,12 +161,13 @@ void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned cha void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size); -extern void *read_object_file_extended(const struct object_id *oid, +extern void *read_object_file_extended(struct repository *r, + const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace); static inline void *read_object_file(const struct object_id *oid, enum object_type *type, unsigned long *size) { - return read_object_file_extended(oid, type, size, 1); + return read_object_file_extended(the_repository, oid, type, size, 1); } /* Read and unpack an object file into memory, write memory to an object file */ diff --git a/sha1-file.c b/sha1-file.c index 856e000ee1..c5b704aec5 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type, * deal with them should arrange to call read_object() and give error * messages themselves. */ -void *read_object_file_extended(const struct object_id *oid, +void *read_object_file_extended(struct repository *r, + const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace) @@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id *oid, const char *path; struct stat st; const struct object_id *repl = lookup_replace ? - lookup_replace_object(the_repository, oid) : oid; + lookup_replace_object(r, oid) : oid; errno = 0; - data = read_object(the_repository, repl->hash, type, size); + data = read_object(r, repl->hash, type, size); if (data) return data; @@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id *oid, die(_("replacement %s not found for %s"), oid_to_hex(repl), oid_to_hex(oid)); - if (!stat_sha1_file(the_repository, repl->hash, &st, &path)) + if (!stat_sha1_file(r, repl->hash, &st, &path)) die(_("loose object %s (stored in %s) is corrupt"), oid_to_hex(repl), path); - if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL) + if ((p = has_packed_and_bad(r, repl->hash)) != NULL) die(_("packed object %s (stored in %s) is corrupt"), oid_to_hex(repl), p->pack_name); diff --git a/streaming.c b/streaming.c index d1e6b2dce6..c843a1230f 100644 --- a/streaming.c +++ b/streaming.c @@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = { static open_method_decl(incore) { - st->u.incore.buf = read_object_file_extended(oid, type, &st->size, 0); + st->u.incore.buf = read_object_file_extended(the_repository, oid, type, &st->size, 0); st->u.incore.read_ptr = 0; st->vtbl = &incore_vtbl; -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 03/24] packfile: allow has_packed_and_bad to handle arbitrary repositories
has_packed_and_bad is not widely used, so just migrate it all at once. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- packfile.c | 5 +++-- packfile.h | 2 +- sha1-file.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/packfile.c b/packfile.c index f2850a00b5..63f8dc7b98 100644 --- a/packfile.c +++ b/packfile.c @@ -1138,12 +1138,13 @@ void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1) p->num_bad_objects++; } -const struct packed_git *has_packed_and_bad(const unsigned char *sha1) +const struct packed_git *has_packed_and_bad(struct repository *r, + const unsigned char *sha1) { struct packed_git *p; unsigned i; - for (p = the_repository->objects->packed_git; p; p = p->next) + for (p = r->objects->packed_git; p; p = p->next) for (i = 0; i < p->num_bad_objects; i++) if (hasheq(sha1, p->bad_object_sha1 + the_hash_algo->rawsz * i)) diff --git a/packfile.h b/packfile.h index 6c4037605d..d70c6d9afb 100644 --- a/packfile.h +++ b/packfile.h @@ -146,7 +146,7 @@ extern int packed_object_info(struct repository *r, off_t offset, struct object_info *); extern void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); -extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); +extern const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1); /* * Iff a pack file in the given repository contains the object named by sha1, diff --git a/sha1-file.c b/sha1-file.c index b8ce21cbaf..856e000ee1 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id *oid, die(_("loose object %s (stored in %s) is corrupt"), oid_to_hex(repl), path); - if ((p = has_packed_and_bad(repl->hash)) != NULL) + if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL) die(_("packed object %s (stored in %s) is corrupt"), oid_to_hex(repl), p->pack_name); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 02/24] sha1_file: allow read_object to read objects in arbitrary repositories
Allow read_object (a file local functon in sha1_file) to handle arbitrary repositories by passing the repository down to oid_object_info_extended. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- sha1-file.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sha1-file.c b/sha1-file.c index dd0b6aa873..b8ce21cbaf 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r, return type; } -static void *read_object(const unsigned char *sha1, enum object_type *type, +static void *read_object(struct repository *r, +const unsigned char *sha1, +enum object_type *type, unsigned long *size) { struct object_id oid; @@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type, hashcpy(oid.hash, sha1); - if (oid_object_info_extended(the_repository, &oid, &oi, 0) < 0) + if (oid_object_info_extended(r, &oid, &oi, 0) < 0) return NULL; return content; } @@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id *oid, lookup_replace_object(the_repository, oid) : oid; errno = 0; - data = read_object(repl->hash, type, size); + data = read_object(the_repository, repl->hash, type, size); if (data) return data; @@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime) if (has_loose_object(oid)) return 0; - buf = read_object(oid->hash, &type, &len); + buf = read_object(the_repository, oid->hash, &type, &len); if (!buf) return error(_("cannot read sha1_file for %s"), oid_to_hex(oid)); hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 1; -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 11/24] commit-reach.c: allow remove_redundant to handle arbitrary repositories
Signed-off-by: Stefan Beller --- commit-reach.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 4cf471bfaf..ab2bb1e5d5 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -156,7 +156,7 @@ struct commit_list *get_octopus_merge_bases(struct commit_list *in) return ret; } -static int remove_redundant(struct commit **array, int cnt) +static int remove_redundant(struct repository *r, struct commit **array, int cnt) { /* * Some commit in the array may be an ancestor of @@ -174,7 +174,7 @@ static int remove_redundant(struct commit **array, int cnt) ALLOC_ARRAY(filled_index, cnt - 1); for (i = 0; i < cnt; i++) - parse_commit(array[i]); + repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; uint32_t min_generation = array[i]->generation; @@ -190,7 +190,7 @@ static int remove_redundant(struct commit **array, int cnt) if (array[j]->generation < min_generation) min_generation = array[j]->generation; } - common = paint_down_to_common(the_repository, array[i], filled, + common = paint_down_to_common(r, array[i], filled, work, min_generation); if (array[i]->object.flags & PARENT2) redundant[i] = 1; @@ -249,7 +249,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, clear_commit_marks(one, all_flags); clear_commit_marks_many(n, twos, all_flags); - cnt = remove_redundant(rslt, cnt); + cnt = remove_redundant(the_repository, rslt, cnt); result = NULL; for (i = 0; i < cnt; i++) commit_list_insert_by_date(rslt[i], &result); @@ -370,7 +370,7 @@ struct commit_list *reduce_heads(struct commit_list *heads) p->item->object.flags &= ~STALE; } } - num_head = remove_redundant(array, num_head); + num_head = remove_redundant(the_repository, array, num_head); for (i = 0; i < num_head; i++) tail = &commit_list_insert(array[i], tail)->next; free(array); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 07/24] object: parse_object to honor its repository argument
In 8e4b0b6047 (object.c: allow parse_object to handle arbitrary repositories, 2018-06-28), we forgot to pass the repository down to the read_object_file. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- object.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/object.c b/object.c index e54160550c..003f870484 100644 --- a/object.c +++ b/object.c @@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if (obj && obj->parsed) return obj; - if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) || - (!obj && has_object_file(oid) && + if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || + (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (check_object_signature(repl, NULL, 0, NULL) < 0) { error(_("sha1 mismatch %s"), oid_to_hex(oid)); @@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) return lookup_object(r, oid->hash); } - buffer = read_object_file(oid, &type, &size); + buffer = repo_read_object_file(r, oid, &type, &size); if (buffer) { if (check_object_signature(repl, buffer, size, type_name(type)) < 0) { free(buffer); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 08/24] commit: allow parse_commit* to handle arbitrary repositories
Just like the previous commit, parse_commit and friends are used a lot and are found in new patches, so we cannot change their signature easily. Re-introduce these function prefixed with 'repo_' that take a repository argument and keep the original as a shallow macro. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- commit.c | 18 -- commit.h | 17 + .../coccinelle/the_repository.pending.cocci | 24 +++ 3 files changed, 48 insertions(+), 11 deletions(-) diff --git a/commit.c b/commit.c index dc8a39d52a..7a931d7fd4 100644 --- a/commit.c +++ b/commit.c @@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b return 0; } -int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph) +int repo_parse_commit_internal(struct repository *r, + struct commit *item, + int quiet_on_missing, + int use_commit_graph) { enum object_type type; void *buffer; @@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com return -1; if (item->object.parsed) return 0; - if (use_commit_graph && parse_commit_in_graph(the_repository, item)) + if (use_commit_graph && parse_commit_in_graph(r, item)) return 0; - buffer = read_object_file(&item->object.oid, &type, &size); + buffer = repo_read_object_file(r, &item->object.oid, &type, &size); if (!buffer) return quiet_on_missing ? -1 : error("Could not read %s", @@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com oid_to_hex(&item->object.oid)); } - ret = parse_commit_buffer(the_repository, item, buffer, size, 0); + ret = parse_commit_buffer(r, item, buffer, size, 0); if (save_commit_buffer && !ret) { - set_commit_buffer(the_repository, item, buffer, size); + set_commit_buffer(r, item, buffer, size); return 0; } free(buffer); return ret; } -int parse_commit_gently(struct commit *item, int quiet_on_missing) +int repo_parse_commit_gently(struct repository *r, +struct commit *item, int quiet_on_missing) { - return parse_commit_internal(item, quiet_on_missing, 1); + return repo_parse_commit_internal(r, item, quiet_on_missing, 1); } void parse_commit_or_die(struct commit *item) diff --git a/commit.h b/commit.h index 1d260d62f5..08935f9a19 100644 --- a/commit.h +++ b/commit.h @@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char *name); struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name); int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph); -int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph); -int parse_commit_gently(struct commit *item, int quiet_on_missing); -static inline int parse_commit(struct commit *item) +int repo_parse_commit_internal(struct repository *r, struct commit *item, + int quiet_on_missing, int use_commit_graph); +int repo_parse_commit_gently(struct repository *r, +struct commit *item, +int quiet_on_missing); +static inline int repo_parse_commit(struct repository *r, struct commit *item) { - return parse_commit_gently(item, 0); + return repo_parse_commit_gently(r, item, 0); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define parse_commit_internal(item, quiet, use) repo_parse_commit_internal(the_repository, item, quiet, use) +#define parse_commit_gently(item, quiet) repo_parse_commit_gently(the_repository, item, quiet) +#define parse_commit(item) repo_parse_commit(the_repository, item) +#endif + void parse_commit_or_die(struct commit *item); struct buffer_slab; diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 46f3a1b23a..b185fe0a1d 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -40,3 +40,27 @@ expression F; - has_object_file_with_flags( + repo_has_object_file_with_flags(the_repository, E) + +@@ +expression E; +expression F; +expression G; +@@ +- parse_commit_internal( ++ repo_parse_commit_internal(the_repository, + E, F, G) + +@@ +expression E; +expression F; +@@ +- parse_commit_gently( ++ repo_parse_commit_gently(the_repository, + E, F) + +@@ +expression E; +@@ +- parse_commit( ++ repo_parse_commit(the_repository, + E) -- 2.19.1.930.g4
[PATCH 06/24] object-store: prepare has_{sha1, object}_file[_with_flags] to handle arbitrary repositories
Signed-off-by: Stefan Beller --- .../coccinelle/the_repository.pending.cocci | 29 +++ object-store.h| 22 ++ sha1-file.c | 15 ++ 3 files changed, 55 insertions(+), 11 deletions(-) diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 3c7fa70502..46f3a1b23a 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -11,3 +11,32 @@ expression G; + repo_read_object_file(the_repository, E, F, G) +@@ +expression E; +@@ +- has_sha1_file( ++ repo_has_sha1_file(the_repository, + E) + +@@ +expression E; +expression F; +@@ +- has_sha1_file_with_flags( ++ repo_has_sha1_file_with_flags(the_repository, + E) + +@@ +expression E; +@@ +- has_object_file( ++ repo_has_object_file(the_repository, + E) + +@@ +expression E; +expression F; +@@ +- has_object_file_with_flags( ++ repo_has_object_file_with_flags(the_repository, + E) diff --git a/object-store.h b/object-store.h index 00a64622e6..2b5e6ff1ed 100644 --- a/object-store.h +++ b/object-store.h @@ -212,15 +212,27 @@ int read_loose_object(const char *path, * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass * nonzero flags to also set other flags. */ -extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags); -static inline int has_sha1_file(const unsigned char *sha1) +int repo_has_sha1_file_with_flags(struct repository *r, + const unsigned char *sha1, int flags); +static inline int repo_has_sha1_file(struct repository *r, +const unsigned char *sha1) { - return has_sha1_file_with_flags(sha1, 0); + return repo_has_sha1_file_with_flags(r, sha1, 0); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define has_sha1_file_with_flags(sha1, flags) repo_has_sha1_file_with_flags(the_repository, sha1, flags) +#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1) +#endif + /* Same as the above, except for struct object_id. */ -extern int has_object_file(const struct object_id *oid); -extern int has_object_file_with_flags(const struct object_id *oid, int flags); +int repo_has_object_file(struct repository *r, const struct object_id *oid); +int repo_has_object_file_with_flags(struct repository *r, + const struct object_id *oid, int flags); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define has_object_file(oid) repo_has_object_file(the_repository, oid) +#define has_object_file_with_flags(oid, flags) repo_has_object_file_with_flags(the_repository, oid, flags) +#endif /* * Return true iff an alternate object database has a loose object diff --git a/sha1-file.c b/sha1-file.c index c5b704aec5..e77273ccfd 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, time_t mtime) return ret; } -int has_sha1_file_with_flags(const unsigned char *sha1, int flags) +int repo_has_sha1_file_with_flags(struct repository *r, + const unsigned char *sha1, int flags) { struct object_id oid; if (!startup_info->have_repository) return 0; hashcpy(oid.hash, sha1); - return oid_object_info_extended(the_repository, &oid, NULL, + return oid_object_info_extended(r, &oid, NULL, flags | OBJECT_INFO_SKIP_CACHED) >= 0; } -int has_object_file(const struct object_id *oid) +int repo_has_object_file(struct repository *r, +const struct object_id *oid) { - return has_sha1_file(oid->hash); + return repo_has_sha1_file(r, oid->hash); } -int has_object_file_with_flags(const struct object_id *oid, int flags) +int repo_has_object_file_with_flags(struct repository *r, + const struct object_id *oid, int flags) { - return has_sha1_file_with_flags(oid->hash, flags); + return repo_has_sha1_file_with_flags(r, oid->hash, flags); } static void check_tree(const void *buf, size_t size) -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 09/24] commit-reach.c: allow paint_down_to_common to handle arbitrary repositories
As the function is file local and not widely used, migrate it all at once. Signed-off-by: Stefan Beller --- commit-reach.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index a9da65c462..080ae0a758 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue) } /* all input commits in one and twos[] must have been parsed! */ -static struct commit_list *paint_down_to_common(struct commit *one, int n, +static struct commit_list *paint_down_to_common(struct repository *r, + struct commit *one, int n, struct commit **twos, int min_generation) { @@ -83,7 +84,7 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, parents = parents->next; if ((p->object.flags & flags) == flags) continue; - if (parse_commit(p)) + if (repo_parse_commit(r, p)) return NULL; p->object.flags |= flags; prio_queue_put(&queue, p); @@ -116,7 +117,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co return NULL; } - list = paint_down_to_common(one, n, twos, 0); + list = paint_down_to_common(the_repository, one, n, twos, 0); while (list) { struct commit *commit = pop_commit(&list); @@ -187,8 +188,8 @@ static int remove_redundant(struct commit **array, int cnt) if (array[j]->generation < min_generation) min_generation = array[j]->generation; } - common = paint_down_to_common(array[i], filled, work, - min_generation); + common = paint_down_to_common(the_repository, array[i], filled, + work, min_generation); if (array[i]->object.flags & PARENT2) redundant[i] = 1; for (j = 0; j < filled; j++) @@ -322,7 +323,9 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * if (commit->generation > min_generation) return ret; - bases = paint_down_to_common(commit, nr_reference, reference, commit->generation); + bases = paint_down_to_common(the_repository, commit, +nr_reference, reference, +commit->generation); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); -- 2.19.1.930.g4563a0d9d0-goog
[PATCH 05/24] object-store: prepare read_object_file to deal with arbitrary repositories
As read_object_file is a widely used function (which is also regularly used in new code in flight between master..pu), changing its signature is painful is hard, as other series in flight rely on the original signature. It would burden the maintainer if we'd just change the signature. Introduce repo_read_object_file which takes the repository argument, and hide the original read_object_file as a macro behind NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21) Add a coccinelle patch to convert existing callers, but do not apply the resulting patch from 'make coccicheck' to keep the diff of this patch small. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- contrib/coccinelle/the_repository.pending.cocci | 13 + object-store.h | 10 -- 2 files changed, 21 insertions(+), 2 deletions(-) create mode 100644 contrib/coccinelle/the_repository.pending.cocci diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci new file mode 100644 index 00..3c7fa70502 --- /dev/null +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -0,0 +1,13 @@ +// This file is used for the ongoing refactoring of +// bringing the index or repository struct in all of +// our code base. + +@@ +expression E; +expression F; +expression G; +@@ +- read_object_file( ++ repo_read_object_file(the_repository, + E, F, G) + diff --git a/object-store.h b/object-store.h index 3d98a682b2..00a64622e6 100644 --- a/object-store.h +++ b/object-store.h @@ -165,10 +165,16 @@ extern void *read_object_file_extended(struct repository *r, const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace); -static inline void *read_object_file(const struct object_id *oid, enum object_type *type, unsigned long *size) +static inline void *repo_read_object_file(struct repository *r, + const struct object_id *oid, + enum object_type *type, + unsigned long *size) { - return read_object_file_extended(the_repository, oid, type, size, 1); + return read_object_file_extended(r, oid, type, size, 1); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define read_object_file(oid, type, size) repo_read_object_file(the_repository, oid, type, size) +#endif /* Read and unpack an object file into memory, write memory to an object file */ int oid_object_info(struct repository *r, const struct object_id *, unsigned long *); -- 2.19.1.930.g4563a0d9d0-goog
[PATCHv2 00/24] Bring more repository handles into our code base]
This resends sb/more-repo-in-api On Thu, Oct 25, 2018 at 2:14 AM SZEDER Gábor wrote: > On Tue, Oct 16, 2018 at 04:35:49PM -0700, Stefan Beller wrote: > > This converts the 'show_submodule_header' function to use > > the repository API properly, such that the submodule objects > > are not added to the main object store. > > This patch breaks the test suite with 'GIT_TEST_COMMIT_GRAPH=1', in > particular 't4041-diff-submodule-option.sh' fails with: > [...] I debugged into this and the last four patches will fix it. I also picked up the patch for pending semantic patches, as the first patch, can I have your sign off, please? This also addresses Jonathans feedback in https://public-inbox.org/git/20181019203750.110741-1-jonathanta...@google.com/ A range-diff is below. Thanks, Stefan SZEDER Gábor (1): Makefile: add pending semantic patches Stefan Beller (23): sha1_file: allow read_object to read objects in arbitrary repositories packfile: allow has_packed_and_bad to handle arbitrary repositories object-store: allow read_object_file_extended to read from arbitrary repositories object-store: prepare read_object_file to deal with arbitrary repositories object-store: prepare has_{sha1, object}_file[_with_flags] to handle arbitrary repositories object: parse_object to honor its repository argument commit: allow parse_commit* to handle arbitrary repositories commit-reach.c: allow paint_down_to_common to handle arbitrary repositories commit-reach.c: allow merge_bases_many to handle arbitrary repositories commit-reach.c: allow remove_redundant to handle arbitrary repositories commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary repositories commit-reach: prepare get_merge_bases to handle arbitrary repositories commit-reach: prepare in_merge_bases[_many] to handle arbitrary repositories commit: prepare get_commit_buffer to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories pretty: prepare format_commit_message to handle arbitrary repositories submodule: use submodule repos for object lookup submodule: don't add submodule as odb for push commit-graph: convert remaining function to handle arbitrary repositories commit: make free_commit_buffer and release_commit_memory repository agnostic path.h: make REPO_GIT_PATH_FUNC repository agnostic t/helper/test-repository: celebrate independence from the_repository Makefile | 6 +- builtin/fsck.c| 3 +- builtin/log.c | 6 +- builtin/rev-list.c| 3 +- cache.h | 2 + commit-graph.c| 40 +++-- commit-reach.c| 73 + commit-reach.h| 38 +++-- commit.c | 41 ++--- commit.h | 43 +- .../coccinelle/the_repository.pending.cocci | 144 ++ object-store.h| 35 - object.c | 8 +- packfile.c| 5 +- packfile.h| 2 +- path.h| 2 +- pretty.c | 28 ++-- pretty.h | 7 +- sha1-file.c | 34 +++-- streaming.c | 2 +- submodule.c | 79 +++--- t/helper/test-repository.c| 10 ++ 22 files changed, 459 insertions(+), 152 deletions(-) create mode 100644 contrib/coccinelle/the_repository.pending.cocci git range-diff origin/sb/more-repo-in-api... [...] # rebased to origin/master -: -- > 116: 4ede3d42df Seventh batch for 2.20 -: -- > 117: b1de196491 Makefile: add pending semantic patches 1: 1b9b5c695e = 118: f1be5eb487 sha1_file: allow read_object to read objects in arbitrary repositories 2: 33b94066f2 = 119: 9100a5705d packfile: allow has_packed_and_bad to handle arbitrary repositories 3: 5217b6b1e1 = 120: a4ad7791da object-store: allow read_object_file_extended to read from arbitrary repositories 4: 2b7239b55b ! 121: 5d7b3a6dd9 object-store: prepare read_object_file to deal with arbitrary repositories @@ -19,10 +19,10 @@ Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano - diff --git a/contrib/coccinelle/the_repository.cocci b/contrib/coccinelle/the_repository.cocci + diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci new file mode 100644 --- /dev/nu
[PATCH 01/24] Makefile: add pending semantic patches
From: SZEDER Gábor There are basically two main use cases for semantic patches: - To avoid undesirable code patterns, e.g. we should not use sha1_to_hex(oid.hash) or strbuf_addf(&sb, "fixed string"), but use oid_to_hex(&oid) or strbuf_addstr(&sb, "fixed string") instead. Note that in these cases we don't remove the functions sha1_to_hex() or strbuf_addf(), because there are good reasons to use them in other scenarios. Our semantic patches under 'contrib/coccinelle/' fall into this category, and we have 'make coccicheck' and the static analysis build job on Travis CI to catch these undesirable code patterns preferably early, and to prevent them from entering our codebase. - To perform one-off code transformations, e.g. to modify a function's name and/or signature and convert all its callsites; see e.g. commits abef9020e3 (sha1_file: convert sha1_object_info* to object_id, 2018-03-12) and b4f5aca40e (sha1_file: convert read_sha1_file to struct object_id, 2018-03-12). To allows semantic patches of the second category, we'll introduce the concept of "pending" semantic patches, stored in 'contrib/coccinelle/.pending.cocci' files, modifying 'make coccicheck' to skip them, and adding the new 'make coccicheck-pending' target to make it convenient to apply them. [Missing: SZEDERs sign off, so I also do not sign off] --- Makefile | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index b08d5ea258..6ea2bbd7f5 100644 --- a/Makefile +++ b/Makefile @@ -2739,9 +2739,11 @@ endif then \ echo '' SPATCH result: $@; \ fi -coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci)) +coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard contrib/coccinelle/*.cocci))) -.PHONY: coccicheck +coccicheck-pending: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.pending.cocci)) + +.PHONY: coccicheck coccicheck-pending ### Installation rules -- 2.19.1.930.g4563a0d9d0-goog
Re: Infinite loop regression in git-fsck in v2.12.0
On Tue, Oct 30 2018, Ævar Arnfjörð Bjarmason wrote: > The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh, > but more generally it seems having something like GIT_TEST_FSCK=true is > a good idea. We do a bunch of stress testing of the object store in the > test suite that we're unlikely to encounter in the wild. > > Of course my idea of how to do that in my > <20181030184331.27264-3-ava...@gmail.com> would be counterproductive, > i.e. it seems we want to catch all the cases where there's a bad fsck, > just that it returns in a certain way. > > So maybe a good approach would be that we'd annotate all those test > whose fsck fails with "this is how it should fail", and run those tests > under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting > that no tests other than those marked as failing the fsck check at the > end fail it. WIP patch for doing that: diff --git a/Makefile b/Makefile index b08d5ea258..ca624c381f 100644 --- a/Makefile +++ b/Makefile @@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-dump-fsmonitor.o TEST_BUILTINS_OBJS += test-dump-split-index.o TEST_BUILTINS_OBJS += test-dump-untracked-cache.o TEST_BUILTINS_OBJS += test-example-decorate.o +TEST_BUILTINS_OBJS += test-env-bool.o TEST_BUILTINS_OBJS += test-genrandom.o TEST_BUILTINS_OBJS += test-hashmap.o TEST_BUILTINS_OBJS += test-index-version.o diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 5df8b682aa..c4481085c4 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = { { "dump-fsmonitor", cmd__dump_fsmonitor }, { "dump-split-index", cmd__dump_split_index }, { "dump-untracked-cache", cmd__dump_untracked_cache }, + { "env-bool", cmd__env_bool }, { "example-decorate", cmd__example_decorate }, { "genrandom", cmd__genrandom }, { "hashmap", cmd__hashmap }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 71f470b871..f7845fbc56 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -13,6 +13,7 @@ int cmd__dump_cache_tree(int argc, const char **argv); int cmd__dump_fsmonitor(int argc, const char **argv); int cmd__dump_split_index(int argc, const char **argv); int cmd__dump_untracked_cache(int argc, const char **argv); +int cmd__env_bool(int argc, const char **argv); int cmd__example_decorate(int argc, const char **argv); int cmd__genrandom(int argc, const char **argv); int cmd__hashmap(int argc, const char **argv); diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh index 635918505d..92fbce2920 100755 --- a/t/t1305-config-include.sh +++ b/t/t1305-config-include.sh @@ -313,4 +313,8 @@ test_expect_success 'include cycles are detected' ' test_i18ngrep "exceeded maximum include depth" stderr ' +GIT_FSCK_FAILS=true +GIT_FSCK_FAILS_TEST=' + test_i18ngrep "exceeded maximum include depth" fsck.err +' test_done diff --git a/t/t3103-ls-tree-misc.sh b/t/t3103-ls-tree-misc.sh index 14520913af..06abf84ef4 100755 --- a/t/t3103-ls-tree-misc.sh +++ b/t/t3103-ls-tree-misc.sh @@ -22,4 +22,10 @@ test_expect_success 'ls-tree fails with non-zero exit code on broken tree' ' test_must_fail git ls-tree -r HEAD ' +GIT_FSCK_FAILS=true +GIT_FSCK_FAILS_TEST=' + test_i18ngrep "invalid sha1 pointer in cache-tree" fsck.err && + test_i18ngrep "broken link from" fsck.out && + test_i18ngrep "missing tree" fsck.out +' test_done diff --git a/t/test-lib.sh b/t/test-lib.sh index 897e6fcc94..d4ebb94998 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -454,6 +454,8 @@ GIT_EXIT_OK= trap 'die' EXIT trap 'exit $?' INT +GIT_FSCK_FAILS= + # The user-facing functions are loaded from a separate file so that # test_perf subshells can have them too . "$TEST_DIRECTORY/test-lib-functions.sh" @@ -790,6 +792,25 @@ test_at_end_hook_ () { } test_done () { + if test_have_prereq TEST_FSCK + then + desc='git fsck at end (due to GIT_TEST_FSCK)' + if test -n "$GIT_FSCK_FAILS" + then + test_expect_success "$desc (expected to fail)" ' + test_must_fail git fsck 2>fsck.err >fsck.out + ' + test_expect_success "$descriptor (expected to fail) -- assert failure mode" " + test_path_exists fsck.err && + test_path_exists fsck.out && + $GIT_FSCK_FAILS_TEST + " + else + test_expect_success "$desc" ' + git fsck + ' + fi + fi GIT_EXIT
Re: Infinite loop regression in git-fsck in v2.12.0
On Tue, Oct 30, 2018 at 09:03:24PM +0100, Ævar Arnfjörð Bjarmason wrote: > While playing around with having a GIT_TEST_FSCK=true as I suggested in > https://public-inbox.org/git/20181030184331.27264-3-ava...@gmail.com/ I > found that we've had an infinite loop in git-fsck since c68b489e56 > ("fsck: parse loose object paths directly", 2017-01-13) > > In particular in the while() loop added by f6371f9210 ("sha1_file: add > read_loose_object() function", 2017-01-13) in the check_stream_sha1() > function. > > To reproduce just: > > ( > cd t && > ./t5000-tar-tree.sh -d && > git -C trash\ directory.t5000-tar-tree/ fsck > ) Thanks, I was easily able to reproduce. > Before we'd print: > > error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a > error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing > Checking object directories: 100% (256/256), done. > missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a The problem isn't actually a sha1 mismatch, though that's what parse_object() will report. The issue is actually that the file is truncated. So zlib does not say "this is corrupt", but rather "I need more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both for "I need more bytes" (in which we know we are truncated, because we fed the whole mmap'd file in the first place) as well as "I need more output buffer space" (which just means we should keep looping!). So we need to distinguish those cases. I think this is the simplest fix: diff --git a/sha1-file.c b/sha1-file.c index dd0b6aa873..a7ff5fe25d 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream, * see the comment in unpack_sha1_rest for details. */ while (total_read <= size && + stream->avail_in > 0 && (status == Z_OK || status == Z_BUF_ERROR)) { stream->next_out = buf; stream->avail_out = sizeof(buf); > I have no idea if this makes sense, but this fixes it and we pass all > the fsck tests with it: > > diff --git a/sha1-file.c b/sha1-file.c > index dd0b6aa873..fffc31458e 100644 > --- a/sha1-file.c > +++ b/sha1-file.c > @@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream, > git_hash_ctx c; > unsigned char real_sha1[GIT_MAX_RAWSZ]; > unsigned char buf[4096]; > - unsigned long total_read; > + unsigned long total_read, last_total_read; > int status = Z_OK; > > the_hash_algo->init_fn(&c); > @@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream, >* do not count against the object's content size. >*/ > total_read = stream->total_out - strlen(hdr) - 1; > + last_total_read = total_read; This works just by checking that we are making forward progress in the output buffer. I think that would _probably_ be OK for this case, since we know we have all of the input available. But in a case where we're feeding the input in a stream, it would not be. It's possible there that we would not create any output in one round, but would do so after feeding more input bytes. I think the patch I showed above addresses the root cause more directly. I'll wrap that up in a real commit, but I think there may be some related work: - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't strictly correct, but is probably good enough). However, "git cat-file blob 19f9c827" exits non-zero without printing anything. It probably should complain more loudly. - the offending loop comes from f6371f9210. But that commit was mostly cargo-culting other parts of sha1-file.c. I'm worried that this bug exists elsewhere, too. I'll dig around to see if I can find other instances. -Peff
Infinite loop regression in git-fsck in v2.12.0
While playing around with having a GIT_TEST_FSCK=true as I suggested in https://public-inbox.org/git/20181030184331.27264-3-ava...@gmail.com/ I found that we've had an infinite loop in git-fsck since c68b489e56 ("fsck: parse loose object paths directly", 2017-01-13) In particular in the while() loop added by f6371f9210 ("sha1_file: add read_loose_object() function", 2017-01-13) in the check_stream_sha1() function. To reproduce just: ( cd t && ./t5000-tar-tree.sh -d && git -C trash\ directory.t5000-tar-tree/ fsck ) Before we'd print: error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing Checking object directories: 100% (256/256), done. missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a Now we just hang on: Checking object directories: 9% (24/256) I have no idea if this makes sense, but this fixes it and we pass all the fsck tests with it: diff --git a/sha1-file.c b/sha1-file.c index dd0b6aa873..fffc31458e 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream, git_hash_ctx c; unsigned char real_sha1[GIT_MAX_RAWSZ]; unsigned char buf[4096]; - unsigned long total_read; + unsigned long total_read, last_total_read; int status = Z_OK; the_hash_algo->init_fn(&c); @@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream, * do not count against the object's content size. */ total_read = stream->total_out - strlen(hdr) - 1; + last_total_read = total_read; /* * This size comparison must be "<=" to read the final zlib packets; @@ -2207,6 +2208,9 @@ static int check_stream_sha1(git_zstream *stream, status = git_inflate(stream, Z_FINISH); the_hash_algo->update_fn(&c, buf, stream->next_out - buf); total_read += stream->next_out - buf; + if (last_total_read == total_read) + return -1; + last_total_read = total_read; } git_inflate_end(stream); I.e. we get into a loop where total_read isn't increasing. We no longer print "sha1 mismatch" but maybe that's an emergent effect of something else. Haven't checked. The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh, but more generally it seems having something like GIT_TEST_FSCK=true is a good idea. We do a bunch of stress testing of the object store in the test suite that we're unlikely to encounter in the wild. Of course my idea of how to do that in my <20181030184331.27264-3-ava...@gmail.com> would be counterproductive, i.e. it seems we want to catch all the cases where there's a bad fsck, just that it returns in a certain way. So maybe a good approach would be that we'd annotate all those test whose fsck fails with "this is how it should fail", and run those tests under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting that no tests other than those marked as failing the fsck check at the end fail it.
[RFC v1] Add virtual file system settings and hook proc
From: Ben Peart On index load, clear/set the skip worktree bits based on the virtual file system data. Use virtual file system data to update skip-worktree bit in unpack-trees. Use virtual file system data to exclude files and folders not explicitly requested. Signed-off-by: Ben Peart --- We have taken several steps to make git perform well on very large repos. Some of those steps include: improving underlying algorithms, utilizing multi-threading where possible, and simplifying the behavior of some commands. These changes typically benefit all git repos to varying degrees. While these optimizations all help, they are insufficient to provide adequate performance on the very large repos we often work with. To make git perform well on the very largest repos, we had to make more significant changes. The biggest performance win by far is the work we have done to make git operations O(modified) instead of O(size of repo). This takes advantage of the fact that the number of files a developer has modified is a tiny fraction of the overall repo size. We accomplished this by utilizing the existing internal logic for the skip worktree bit and excludes to tell git to ignore all files and folders other than those that have been modified. This logic is driven by an external process that monitors writes to the repo and communicates the list of files and folders with changes to git via the virtual file system hook in this patch. The external process maintains a list of files and folders that have been modified. When git runs, it requests the list of files and folders that have been modified via the virtual file system hook. Git then sets/clears the skip-worktree bit on the cache entries and builds a hashmap of the modified files/folders that is used by the excludes logic to avoid scanning the entire repo looking for changes and untracked files. With this system, we have been able to make local git command performance on extremely large repos (millions of files, 1/2 million folders) entirely manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second commit, etc). Our desire is to eliminate all custom patches in our fork of git. To that end, I'm submitting this as an RFC to see how much interest there is and how much willingness to take this type of change into git. Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/c06b290d2f Checkout: git fetch https://github.com/benpeart/git virtual-filesystem-v1 && git checkout c06b290d2f Documentation/config.txt | 8 + Documentation/githooks.txt | 20 ++ Makefile | 1 + cache.h | 1 + config.c | 37 +++- config.h | 1 + dir.c| 34 +++- environment.c| 1 + read-cache.c | 2 + t/README | 3 + t/t1092-virtualfilesystem.sh | 354 +++ t/t1092/virtualfilesystem| 23 +++ unpack-trees.c | 23 ++- virtualfilesystem.c | 308 ++ virtualfilesystem.h | 25 +++ 15 files changed, 833 insertions(+), 8 deletions(-) create mode 100755 t/t1092-virtualfilesystem.sh create mode 100755 t/t1092/virtualfilesystem create mode 100644 virtualfilesystem.c create mode 100644 virtualfilesystem.h diff --git a/Documentation/config.txt b/Documentation/config.txt index 09e95e9e98..dd4b834375 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -441,6 +441,14 @@ core.fsmonitor:: avoiding unnecessary processing of files that have not changed. See the "fsmonitor-watchman" section of linkgit:githooks[5]. +core.virtualFilesystem:: + If set, the value of this variable is used as a command which + will identify all files and directories that are present in + the working directory. Git will only track and update files + listed in the virtual file system. Using the virtual file system + will supersede the sparse-checkout settings which will be ignored. + See the "virtual file system" section of linkgit:githooks[6]. + core.trustctime:: If false, the ctime differences between the index and the working tree are ignored; useful when the inode change time diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 959044347e..87f9ad2a77 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -492,6 +492,26 @@ This hook is invoked by `git-p4 submit`. It takes no parameters and nothing from standard input. Exiting with non-zero status from this script prevent `git-p4 submit` from launching. Run `git-p4 submit --help` for details. +virtualFilesystem +~~ + +"Virtual File System" allows populating the working directory sparsely. +The projection data is typically automatically generated by an external +process. Git will limit what files it che
[PATCH 0/1] mingw: fix isatty() after dup2()
This patch has been looong in the waiting (well over a year) for being sent to the Git mailing list; it is a fixup for a change of isatty() on Windows that had long made it into git.git's master branch. Johannes Schindelin (1): mingw: fix isatty() after dup2() compat/mingw.h | 3 +++ compat/winansi.c | 12 2 files changed, 15 insertions(+) base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-61%2Fdscho%2Fmingw-isatty-and-dup2-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-61/dscho/mingw-isatty-and-dup2-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/61 -- gitgitgadget
[PATCH 1/1] mingw: fix isatty() after dup2()
From: Johannes Schindelin Since a9b8a09c3c30 (mingw: replace isatty() hack, 2016-12-22), we handle isatty() by special-casing the stdin/stdout/stderr file descriptors, caching the return value. However, we missed the case where dup2() overrides the respective file descriptor. That poses a problem e.g. where the `show` builtin asks for a pager very early, the `setup_pager()` function sets the pager depending on the return value of `isatty()` and then redirects stdout. Subsequently, `cmd_log_init_finish()` calls `setup_pager()` *again*. What should happen now is that `isatty()` reports that stdout is *not* a TTY and consequently stdout should be left alone. Let's override dup2() to handle this appropriately. This fixes https://github.com/git-for-windows/git/issues/1077 Signed-off-by: Johannes Schindelin --- compat/mingw.h | 3 +++ compat/winansi.c | 12 2 files changed, 15 insertions(+) diff --git a/compat/mingw.h b/compat/mingw.h index f31dcff2b..b04556ce0 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -390,6 +390,9 @@ int mingw_raise(int sig); int winansi_isatty(int fd); #define isatty winansi_isatty +int winansi_dup2(int oldfd, int newfd); +#define dup2 winansi_dup2 + void winansi_init(void); HANDLE winansi_get_osfhandle(int fd); diff --git a/compat/winansi.c b/compat/winansi.c index a11a0f16d..f4f08237f 100644 --- a/compat/winansi.c +++ b/compat/winansi.c @@ -474,6 +474,18 @@ static void die_lasterr(const char *fmt, ...) va_end(params); } +#undef dup2 +int winansi_dup2(int oldfd, int newfd) +{ + int ret = dup2(oldfd, newfd); + + if (!ret && newfd >= 0 && newfd <= 2) + fd_is_interactive[newfd] = oldfd < 0 || oldfd > 2 ? + 0 : fd_is_interactive[oldfd]; + + return ret; +} + static HANDLE duplicate_handle(HANDLE hnd) { HANDLE hresult, hproc = GetCurrentProcess(); -- gitgitgadget
[PATCH v2 0/3] index-pack: test updates
I'd still probalby like to have core.checkCollisions as a config knob to be able to turn it off, but let's see what Jeff comes up with once he finishes his WIP cache patch. In the meantime 1-3/4 of my series is obviously correct test fixes which I'd like queued up first. Ævar Arnfjörð Bjarmason (3): pack-objects test: modernize style pack-objects tests: don't leave test .git corrupt at end index-pack tests: don't leave test repo dirty at end t/t1060-object-corruption.sh | 4 ++- t/t5300-pack-object.sh | 47 +++- 2 files changed, 28 insertions(+), 23 deletions(-) -- 2.19.1.899.g0250525e69
[PATCH v2 1/3] pack-objects test: modernize style
Modernize the quoting and indentation style of two tests added in 8685da4256 ("don't ever allow SHA1 collisions to exist by fetching a pack", 2007-03-20), and of a subsequent one added in 4614043c8f ("index-pack: use streaming interface for collision test on large blobs", 2012-05-24) which had copied the style of the first two. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t5300-pack-object.sh | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 6c620cd540..a0309e4bab 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -475,22 +475,22 @@ test_expect_success 'pack-objects in too-many-packs mode' ' # two tests at the end of this file. # -test_expect_success \ -'fake a SHA1 hash collision' \ -'long_a=$(git hash-object a | sed -e "s!^..!&/!") && - long_b=$(git hash-object b | sed -e "s!^..!&/!") && - test -f .git/objects/$long_b && - cp -f .git/objects/$long_a \ - .git/objects/$long_b' +test_expect_success 'fake a SHA1 hash collision' ' + long_a=$(git hash-object a | sed -e "s!^..!&/!") && + long_b=$(git hash-object b | sed -e "s!^..!&/!") && + test -f .git/objects/$long_b && + cp -f .git/objects/$long_a \ + .git/objects/$long_b +' -test_expect_success \ -'make sure index-pack detects the SHA1 collision' \ -'test_must_fail git index-pack -o bad.idx test-3.pack 2>msg && - test_i18ngrep "SHA1 COLLISION FOUND" msg' +test_expect_success 'make sure index-pack detects the SHA1 collision' ' + test_must_fail git index-pack -o bad.idx test-3.pack 2>msg && + test_i18ngrep "SHA1 COLLISION FOUND" msg +' -test_expect_success \ -'make sure index-pack detects the SHA1 collision (large blobs)' \ -'test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg && - test_i18ngrep "SHA1 COLLISION FOUND" msg' +test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' ' + test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg && + test_i18ngrep "SHA1 COLLISION FOUND" msg +' test_done -- 2.19.1.899.g0250525e69
[PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end
Change the pack-objects tests to not leave their .git directory corrupt and the end. In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment was added warning against adding any subsequent tests, but since 4614043c8f ("index-pack: use streaming interface for collision test on large blobs", 2012-05-24) the comment has drifted away from the code, mentioning two test, when we actually have three. Instead of having this warning let's just create a new .git directory specifically for these tests. As an aside, it would be interesting to instrument the test suite to run a "git fsck" at the very end (in "test_done"). That would have errored before this change, and may find other issues #leftoverbits. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t5300-pack-object.sh | 37 - 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index a0309e4bab..410a09b0dd 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' ' git fsck ' -# -# WARNING! -# -# The following test is destructive. Please keep the next -# two tests at the end of this file. -# - -test_expect_success 'fake a SHA1 hash collision' ' - long_a=$(git hash-object a | sed -e "s!^..!&/!") && - long_b=$(git hash-object b | sed -e "s!^..!&/!") && - test -f .git/objects/$long_b && - cp -f .git/objects/$long_a \ - .git/objects/$long_b +test_expect_success 'setup: fake a SHA1 hash collision' ' + git init corrupt && + ( + cd corrupt && + long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") && + long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") && + test -f .git/objects/$long_b && + cp -f .git/objects/$long_a \ + .git/objects/$long_b + ) ' test_expect_success 'make sure index-pack detects the SHA1 collision' ' - test_must_fail git index-pack -o bad.idx test-3.pack 2>msg && - test_i18ngrep "SHA1 COLLISION FOUND" msg + ( + cd corrupt && + test_must_fail git index-pack -o ../bad.idx ../test-3.pack 2>msg && + test_i18ngrep "SHA1 COLLISION FOUND" msg + ) ' test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' ' - test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg && - test_i18ngrep "SHA1 COLLISION FOUND" msg + ( + cd corrupt && + test_must_fail git -c core.bigfilethreshold=1 index-pack -o ../bad.idx ../test-3.pack 2>msg && + test_i18ngrep "SHA1 COLLISION FOUND" msg + ) ' test_done -- 2.19.1.899.g0250525e69
[PATCH v2 3/3] index-pack tests: don't leave test repo dirty at end
Change a test added in 51054177b3 ("index-pack: detect local corruption in collision check", 2017-04-01) so that the repository isn't left dirty at the end. Due to the caveats explained in 720dae5a19 ("config doc: elaborate on fetch.fsckObjects security", 2018-07-27) even a "fetch" that fails will write to the local object store, so let's copy the bit-error test directory before running this test. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t1060-object-corruption.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh index ac1f189fd2..4feb65157d 100755 --- a/t/t1060-object-corruption.sh +++ b/t/t1060-object-corruption.sh @@ -117,8 +117,10 @@ test_expect_failure 'clone --local detects misnamed objects' ' ' test_expect_success 'fetch into corrupted repo with index-pack' ' + cp -R bit-error bit-error-cp && + test_when_finished "rm -rf bit-error-cp" && ( - cd bit-error && + cd bit-error-cp && test_must_fail git -c transfer.unpackLimit=1 \ fetch ../no-bit-error 2>stderr && test_i18ngrep ! -i collision stderr -- 2.19.1.899.g0250525e69
[PATCH 4/4] mingw: unset PERL5LIB by default
From: Johannes Schindelin Git for Windows ships with its own Perl interpreter, and insists on using it, so it will most likely wreak havoc if PERL5LIB is set before launching Git. Let's just unset that environment variables when spawning processes. To make this feature extensible (and overrideable), there is a new config setting `core.unsetenvvars` that allows specifying a comma-separated list of names to unset before spawning processes. Reported by Gabriel Fuhrmann. Signed-off-by: Johannes Schindelin --- Documentation/config.txt | 6 ++ compat/mingw.c | 35 ++- t/t0029-core-unsetenvvars.sh | 30 ++ 3 files changed, 70 insertions(+), 1 deletion(-) create mode 100755 t/t0029-core-unsetenvvars.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index 09e95e9e9..f338f0b2c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -921,6 +921,12 @@ relatively high IO latencies. When enabled, Git will do the index comparison to the filesystem data in parallel, allowing overlapping IO's. Defaults to true. +core.unsetenvvars:: + Windows-only: comma-separated list of environment variables' + names that need to be unset before spawning any other process. + Defaults to `PERL5LIB` to account for the fact that Git for + Windows insists on using its own Perl interpreter. + core.createObject:: You can set this to 'link', in which case a hardlink followed by a delete of the source are used to make sure that object creation diff --git a/compat/mingw.c b/compat/mingw.c index 272d5e11e..181e74c23 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -212,6 +212,7 @@ enum hide_dotfiles_type { }; static enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY; +static char *unset_environment_variables; int mingw_core_config(const char *var, const char *value, void *cb) { @@ -223,6 +224,12 @@ int mingw_core_config(const char *var, const char *value, void *cb) return 0; } + if (!strcmp(var, "core.unsetenvvars")) { + free(unset_environment_variables); + unset_environment_variables = xstrdup(value); + return 0; + } + return 0; } @@ -1180,6 +1187,27 @@ static wchar_t *make_environment_block(char **deltaenv) return wenvblk; } +static void do_unset_environment_variables(void) +{ + static int done; + char *p = unset_environment_variables; + + if (done || !p) + return; + done = 1; + + for (;;) { + char *comma = strchr(p, ','); + + if (comma) + *comma = '\0'; + unsetenv(p); + if (!comma) + break; + p = comma + 1; + } +} + struct pinfo_t { struct pinfo_t *next; pid_t pid; @@ -1198,9 +1226,12 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen wchar_t wcmd[MAX_PATH], wdir[MAX_PATH], *wargs, *wenvblk = NULL; unsigned flags = CREATE_UNICODE_ENVIRONMENT; BOOL ret; + HANDLE cons; + + do_unset_environment_variables(); /* Determine whether or not we are associated to a console */ - HANDLE cons = CreateFile("CONOUT$", GENERIC_WRITE, + cons = CreateFile("CONOUT$", GENERIC_WRITE, FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); if (cons == INVALID_HANDLE_VALUE) { @@ -2437,6 +2468,8 @@ void mingw_startup(void) /* fix Windows specific environment settings */ setup_windows_environment(); + unset_environment_variables = xstrdup("PERL5LIB"); + /* initialize critical section for waitpid pinfo_t list */ InitializeCriticalSection(&pinfo_cs); diff --git a/t/t0029-core-unsetenvvars.sh b/t/t0029-core-unsetenvvars.sh new file mode 100755 index 0..24ce46a6e --- /dev/null +++ b/t/t0029-core-unsetenvvars.sh @@ -0,0 +1,30 @@ +#!/bin/sh + +test_description='test the Windows-only core.unsetenvvars setting' + +. ./test-lib.sh + +if ! test_have_prereq MINGW +then + skip_all='skipping Windows-specific tests' + test_done +fi + +test_expect_success 'setup' ' + mkdir -p "$TRASH_DIRECTORY/.git/hooks" && + write_script "$TRASH_DIRECTORY/.git/hooks/pre-commit" <<-\EOF + echo $HOBBES >&2 + EOF +' + +test_expect_success 'core.unsetenvvars works' ' + HOBBES=Calvin && + export HOBBES && + git commit --allow-empty -m with 2>err && + grep Calvin err && + git -c core.unsetenvvars=FINDUS,HOBBES,CALVIN \ + commit --allow-empty -m without 2>err && + ! grep Calvin err +' + +test_done -- gitgitgadget
[PATCH 1/4] config: rename `dummy` parameter to `cb` in git_default_config()
From: Johannes Schindelin This is the convention elsewhere (and prepares for the case where we may need to pass callback data). Signed-off-by: Johannes Schindelin --- config.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/config.c b/config.c index 4051e3882..3687c6783 100644 --- a/config.c +++ b/config.c @@ -1448,13 +1448,13 @@ static int git_default_mailmap_config(const char *var, const char *value) return 0; } -int git_default_config(const char *var, const char *value, void *dummy) +int git_default_config(const char *var, const char *value, void *cb) { if (starts_with(var, "core.")) return git_default_core_config(var, value); if (starts_with(var, "user.")) - return git_ident_config(var, value, dummy); + return git_ident_config(var, value, cb); if (starts_with(var, "i18n.")) return git_default_i18n_config(var, value); -- gitgitgadget
[PATCH 3/4] Move Windows-specific config settings into compat/mingw.c
From: Johannes Schindelin Signed-off-by: Johannes Schindelin --- cache.h| 8 compat/mingw.c | 18 ++ config.c | 8 environment.c | 1 - 4 files changed, 18 insertions(+), 17 deletions(-) diff --git a/cache.h b/cache.h index f7fabdde8..0ce95c5a8 100644 --- a/cache.h +++ b/cache.h @@ -906,14 +906,6 @@ int use_optional_locks(void); extern char comment_line_char; extern int auto_comment_line_char; -/* Windows only */ -enum hide_dotfiles_type { - HIDE_DOTFILES_FALSE = 0, - HIDE_DOTFILES_TRUE, - HIDE_DOTFILES_DOTGITONLY -}; -extern enum hide_dotfiles_type hide_dotfiles; - enum log_refs_config { LOG_REFS_UNSET = -1, LOG_REFS_NONE = 0, diff --git a/compat/mingw.c b/compat/mingw.c index 293f286af..272d5e11e 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -6,6 +6,7 @@ #include "../run-command.h" #include "../cache.h" #include "win32/lazyload.h" +#include "../config.h" #define HCAST(type, handle) ((type)(intptr_t)handle) @@ -203,8 +204,25 @@ static int ask_yes_no_if_possible(const char *format, ...) } } +/* Windows only */ +enum hide_dotfiles_type { + HIDE_DOTFILES_FALSE = 0, + HIDE_DOTFILES_TRUE, + HIDE_DOTFILES_DOTGITONLY +}; + +static enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY; + int mingw_core_config(const char *var, const char *value, void *cb) { + if (!strcmp(var, "core.hidedotfiles")) { + if (value && !strcasecmp(value, "dotgitonly")) + hide_dotfiles = HIDE_DOTFILES_DOTGITONLY; + else + hide_dotfiles = git_config_bool(var, value); + return 0; + } + return 0; } diff --git a/config.c b/config.c index 646b6cca9..5bf19a23c 100644 --- a/config.c +++ b/config.c @@ -1344,14 +1344,6 @@ static int git_default_core_config(const char *var, const char *value, void *cb) return 0; } - if (!strcmp(var, "core.hidedotfiles")) { - if (value && !strcasecmp(value, "dotgitonly")) - hide_dotfiles = HIDE_DOTFILES_DOTGITONLY; - else - hide_dotfiles = git_config_bool(var, value); - return 0; - } - if (!strcmp(var, "core.partialclonefilter")) { return git_config_string(&core_partial_clone_filter_default, var, value); diff --git a/environment.c b/environment.c index 3f3c8746c..30ecd4e78 100644 --- a/environment.c +++ b/environment.c @@ -71,7 +71,6 @@ int core_apply_sparse_checkout; int merge_log_config = -1; int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ unsigned long pack_size_limit_cfg; -enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY; enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET; #ifndef PROTECT_HFS_DEFAULT -- gitgitgadget
[PATCH 0/4] mingw: prevent external PERL5LIB from interfering
In Git for Windows, we do not support running the Perl scripts with any random Perl interpreter. Instead, we insist on using the shipped one (except for MinGit, where we do not even ship the Perl scripts, to save on space). However, if Git is called from, say, a Perl script running in a different Perl interpreter with appropriately configured PERL5LIB, it messes with Git's ability to run its Perl scripts. For that reason, we devised the presented method of defining a list of environment variables (via core.unsetEnvVars) that would then be unset before spawning any process, defaulting to PERL5LIB. An alternative approach which was rejected at the time (because it interfered with the then-ongoing work to compile Git for Windows using MS Visual C++) would patch the make_environment_block() function to skip the specified environment variables before handing down the environment block to the spawned process. Currently it would interfere with the mingw-utf-8-env patch series I sent earlier today [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com]. While at it, this patch series also cleans up house and moves the Windows-specific core.* variable handling to compat/mingw.c rather than cluttering environment.c and config.c with things that e.g. developers on Linux do not want to care about. Johannes Schindelin (4): config: rename `dummy` parameter to `cb` in git_default_config() Allow for platform-specific core.* config settings Move Windows-specific config settings into compat/mingw.c mingw: unset PERL5LIB by default Documentation/config.txt | 6 cache.h | 8 - compat/mingw.c | 58 +++- compat/mingw.h | 3 ++ config.c | 18 --- environment.c| 1 - git-compat-util.h| 8 + t/t0029-core-unsetenvvars.sh | 30 +++ 8 files changed, 109 insertions(+), 23 deletions(-) create mode 100755 t/t0029-core-unsetenvvars.sh base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/62 -- gitgitgadget
[PATCH 2/4] Allow for platform-specific core.* config settings
From: Johannes Schindelin In the Git for Windows project, we have ample precendent for config settings that apply to Windows, and to Windows only. Let's formalize this concept by introducing a platform_core_config() function that can be #define'd in a platform-specific manner. This will allow us to contain platform-specific code better, as the corresponding variables no longer need to be exported so that they can be defined in environment.c and be set in config.c Signed-off-by: Johannes Schindelin --- compat/mingw.c| 5 + compat/mingw.h| 3 +++ config.c | 6 +++--- git-compat-util.h | 8 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/compat/mingw.c b/compat/mingw.c index 81ef24286..293f286af 100644 --- a/compat/mingw.c +++ b/compat/mingw.c @@ -203,6 +203,11 @@ static int ask_yes_no_if_possible(const char *format, ...) } } +int mingw_core_config(const char *var, const char *value, void *cb) +{ + return 0; +} + /* Normalizes NT paths as returned by some low-level APIs. */ static wchar_t *normalize_ntpath(wchar_t *wbuf) { diff --git a/compat/mingw.h b/compat/mingw.h index f31dcff2b..e9d2b9cdd 100644 --- a/compat/mingw.h +++ b/compat/mingw.h @@ -11,6 +11,9 @@ typedef _sigset_t sigset_t; #undef _POSIX_THREAD_SAFE_FUNCTIONS #endif +extern int mingw_core_config(const char *var, const char *value, void *cb); +#define platform_core_config mingw_core_config + /* * things that are not available in header files */ diff --git a/config.c b/config.c index 3687c6783..646b6cca9 100644 --- a/config.c +++ b/config.c @@ -1093,7 +1093,7 @@ int git_config_color(char *dest, const char *var, const char *value) return 0; } -static int git_default_core_config(const char *var, const char *value) +static int git_default_core_config(const char *var, const char *value, void *cb) { /* This needs a better name */ if (!strcmp(var, "core.filemode")) { @@ -1363,7 +1363,7 @@ static int git_default_core_config(const char *var, const char *value) } /* Add other config variables here and to Documentation/config.txt. */ - return 0; + return platform_core_config(var, value, cb); } static int git_default_i18n_config(const char *var, const char *value) @@ -1451,7 +1451,7 @@ static int git_default_mailmap_config(const char *var, const char *value) int git_default_config(const char *var, const char *value, void *cb) { if (starts_with(var, "core.")) - return git_default_core_config(var, value); + return git_default_core_config(var, value, cb); if (starts_with(var, "user.")) return git_ident_config(var, value, cb); diff --git a/git-compat-util.h b/git-compat-util.h index 96a3f86d8..3a08d9916 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -342,6 +342,14 @@ typedef uintmax_t timestamp_t; #define _PATH_DEFPATH "/usr/local/bin:/usr/bin:/bin" #endif +#ifndef platform_core_config +static inline int noop_core_config(const char *var, const char *value, void *cb) +{ + return 0; +} +#define platform_core_config noop_core_config +#endif + #ifndef has_dos_drive_prefix static inline int git_has_dos_drive_prefix(const char *path) { -- gitgitgadget
[PATCH 0/1] DiffHighlight.pm: Use correct /dev/null for UNIX and Windows
Use File::Spec->devnull() for output redirection to avoid messages when Windows version of Perl is first in path. The message 'The system cannot find the path specified.' is displayed each time git is run to get colors. chris (1): Use correct /dev/null for UNIX and Windows contrib/diff-highlight/DiffHighlight.pm | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-59%2Fwebstech%2Fmaster-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-59/webstech/master-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/59 -- gitgitgadget
[PATCH 1/1] Use correct /dev/null for UNIX and Windows
From: chris Use File::Spec->devnull() for output redirection to avoid messages when Windows version of Perl is first in path. The message 'The system cannot find the path specified.' is displayed each time git is run to get colors. Signed-off-by: Chris. Webster --- contrib/diff-highlight/DiffHighlight.pm | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm index 536754583..7440aa1c4 100644 --- a/contrib/diff-highlight/DiffHighlight.pm +++ b/contrib/diff-highlight/DiffHighlight.pm @@ -4,6 +4,11 @@ use 5.008; use warnings FATAL => 'all'; use strict; +# Use the correct value for both UNIX and Windows (/dev/null vs nul) +use File::Spec; + +my $NULL = File::Spec->devnull(); + # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. my @OLD_HIGHLIGHT = ( @@ -134,7 +139,7 @@ sub highlight_stdin { # fallback, which means we will work even if git can't be run. sub color_config { my ($key, $default) = @_; - my $s = `git config --get-color $key 2>/dev/null`; + my $s = `git config --get-color $key 2>$NULL`; return length($s) ? $s : $default; } -- gitgitgadget
Re: [PATCH v2 06/16] sequencer: refactor sequencer_add_exec_commands() to work on a todo_list
On 27/10/2018 22:29, Alban Gruin wrote: This refactors sequencer_add_exec_commands() to work on a todo_list to avoid redundant reads and writes to the disk. An obvious way to do this would be to insert the `exec' command between the other commands, and reparse it once this is done. This is not what is done here. Instead, the command is appended to the buffer once, and a new list of items is created. Items from the old list are copied to it, and new `exec' items are appended when necessary. This eliminates the need to reparse the todo list, but this also means its buffer cannot be directly written to the disk, hence todo_list_write_to_disk(). I'd reword this slightly, maybe Instead of just inserting the `exec' command between the other commands, and re-parsing the buffer at the end the exec command is appended to the buffer once, and a new list of items is created. Items from the old list are copied across and new `exec' items are appended when necessary. This eliminates the need to reparse the buffer, but this also means we have to use todo_list_write_to_disk() to write the file. sequencer_add_exec_commands() still reads the todo list from the disk, as it is needed by rebase -p. todo_list_add_exec_commands() works on a todo_list structure, and reparses it at the end. I think the saying 'reparses' is confusing as that is what we're trying to avoid. complete_action() still uses sequencer_add_exec_commands() for now. This will be changed in a future commit. Signed-off-by: Alban Gruin --- sequencer.c | 69 + 1 file changed, 49 insertions(+), 20 deletions(-) diff --git a/sequencer.c b/sequencer.c index e12860c047..12a3efeca8 100644 --- a/sequencer.c +++ b/sequencer.c @@ -4216,6 +4216,50 @@ int sequencer_make_script(FILE *out, int argc, const char **argv, return 0; } +static void todo_list_add_exec_commands(struct todo_list *todo_list, + const char *commands) +{ + struct strbuf *buf = &todo_list->buf; + const char *old_buf = buf->buf; + size_t commands_len = strlen(commands + strlen("exec ")) - 1; + int i, first = 1, nr = 0, alloc = 0; Minor nit pick, I think it is clearer if first is initialized just before the loop as it is in the deleted code below. + struct todo_item *items = NULL, + base_item = {TODO_EXEC, NULL, 0, 0, commands_len, 0}; + + strbuf_addstr(buf, commands); + base_item.offset_in_buf = buf->len - commands_len - 1; + base_item.arg = buf->buf + base_item.offset_in_buf; I think if the user gives --exec more than once on the command line then commands will contain more than one exec command so this needs to parse commands and create one todo_item for each command. + + /* +* Insert after every pick. Here, fixup/squash chains +* are considered part of the pick, so we insert the commands *after* +* those chains if there are any. +*/ + for (i = 0; i < todo_list->nr; i++) { + enum todo_command command = todo_list->items[i].command; + if (todo_list->items[i].arg) + todo_list->items[i].arg = todo_list->items[i].arg - old_buf + buf->buf; + + if (command == TODO_PICK && !first) { + ALLOC_GROW(items, nr + 1, alloc); + memcpy(items + nr++, &base_item, sizeof(struct todo_item)); I think it would be clearer to say items[nr++] = base_item; rather than using memcpy. This applies below and to some of the other patches as well. Also this needs to loop over all the base_items if the user gave --exec more than once on the command line. Best Wishes Phillip + } + + ALLOC_GROW(items, nr + 1, alloc); + memcpy(items + nr++, todo_list->items + i, sizeof(struct todo_item)); + first = 0; + } + + /* insert or append final */ + ALLOC_GROW(items, nr + 1, alloc); + memcpy(items + nr++, &base_item, sizeof(struct todo_item)); + + FREE_AND_NULL(todo_list->items); + todo_list->items = items; + todo_list->nr = nr; + todo_list->alloc = alloc; +} + /* * Add commands after pick and (series of) squash/fixup commands * in the todo list. @@ -4224,10 +4268,7 @@ int sequencer_add_exec_commands(const char *commands) { const char *todo_file = rebase_path_todo(); struct todo_list todo_list = TODO_LIST_INIT; - struct todo_item *item; - struct strbuf *buf = &todo_list.buf; - size_t offset = 0, commands_len = strlen(commands); - int i, first; + int res; if (strbuf_read_file(&todo_list.buf, todo_file, 0) < 0) return error(_("could not read '%s'."), todo_file); @@ -4237,23 +4278,11 @@ int sequencer_add_exec_commands(const char *commands) return error(_("unusable todo list: '%s
Re: [PATCH v2 04/16] sequencer: introduce todo_list_write_to_file()
Hi Alban I like the direction this is going, it is an improvement on re-scanning the list at the end of each function. On 27/10/2018 22:29, Alban Gruin wrote: This introduce a new function to recreate the text of a todo list from its commands, and then to write it to the disk. This will be useful in the future, the buffer of a todo list won’t be treated as a strict mirror of the todo file by some of its functions once they will be refactored. I'd suggest rewording this slightly, maybe something like This introduces a new function to recreate the text of a todo list from its commands and write it to a file. This will be useful as the next few commits will change the use of the buffer in struct todo_list so it will no-longer be a mirror of the file on disk. This functionnality can already be found in todo_list_transform(), but s/functionnality/functionality/ it is specifically made to replace the buffer of a todo list, which is not the desired behaviour. Thus, the part of todo_list_transform() that actually creates the buffer is moved to a new function, todo_list_to_strbuf(). The rest is unused, and so is dropped. todo_list_write_to_file() can also take care to append the help text to s/care to append/care of appending/ the buffer before writing it to the disk, or to write only the first n items of the list. Why/when do we only want to write a subset of the items? Signed-off-by: Alban Gruin --- sequencer.c | 59 - sequencer.h | 4 +++- 2 files changed, 43 insertions(+), 20 deletions(-) diff --git a/sequencer.c b/sequencer.c index 07296f1f57..0dd45677b1 100644 --- a/sequencer.c +++ b/sequencer.c @@ -4256,24 +4256,27 @@ int sequencer_add_exec_commands(const char *commands) return i; } -void todo_list_transform(struct todo_list *todo_list, unsigned flags) +static void todo_list_to_strbuf(struct todo_list *todo_list, struct strbuf *buf, + int num, unsigned flags) { - struct strbuf buf = STRBUF_INIT; struct todo_item *item; - int i; + int i, max = todo_list->nr; - for (item = todo_list->items, i = 0; i < todo_list->nr; i++, item++) { + if (num > 0 && num < max) + max = num; + + for (item = todo_list->items, i = 0; i < max; i++, item++) { /* if the item is not a command write it and continue */ if (item->command >= TODO_COMMENT) { - strbuf_addf(&buf, "%.*s\n", item->arg_len, item->arg); + strbuf_addf(buf, "%.*s\n", item->arg_len, item->arg); continue; } /* add command to the buffer */ if (flags & TODO_LIST_ABBREVIATE_CMDS) - strbuf_addch(&buf, command_to_char(item->command)); + strbuf_addch(buf, command_to_char(item->command)); else - strbuf_addstr(&buf, command_to_string(item->command)); + strbuf_addstr(buf, command_to_string(item->command)); /* add commit id */ if (item->commit) { @@ -4283,27 +4286,46 @@ void todo_list_transform(struct todo_list *todo_list, unsigned flags) if (item->command == TODO_MERGE) { if (item->flags & TODO_EDIT_MERGE_MSG) - strbuf_addstr(&buf, " -c"); + strbuf_addstr(buf, " -c"); else - strbuf_addstr(&buf, " -C"); + strbuf_addstr(buf, " -C"); } - strbuf_addf(&buf, " %s", oid); + strbuf_addf(buf, " %s", oid); } /* add all the rest */ if (!item->arg_len) - strbuf_addch(&buf, '\n'); + strbuf_addch(buf, '\n'); else - strbuf_addf(&buf, " %.*s\n", item->arg_len, item->arg); + strbuf_addf(buf, " %.*s\n", item->arg_len, item->arg); } +} - strbuf_reset(&todo_list->buf); - strbuf_add(&todo_list->buf, buf.buf, buf.len); +int todo_list_write_to_file(struct todo_list *todo_list, const char *file, + const char *shortrevisions, const char *shortonto, + int command_count, int append_help, int num, unsigned flags) This is a really long argument list which makes it easy for callers to get the parameters in the wrong order. I think append_help could probably be folded into the flags, I'm not sure what the command_count is used for but I've only read the first few patches. Maybe it would be better to pass a struct so we have named fields. Best Wishes Phillip +{ + int edit_todo = !(shortrevisions && shortonto), res; + struct strbuf buf = STRBUF_INIT; +
Re: commit-graph is cool (overcoming add_missing_tags() perf issues)
On Wed, Oct 17, 2018 at 8:41 PM Elijah Newren wrote: > (And in the mean time I gave the user a one-liner to nuke his > local-only tags that I suspect he doesn't need.) Just a note that you can usually set 'fetch.pruneTags=true' these days to make that happen.
Re: [PATCH v2] completion: use builtin completion for format-patch
On Tue, Oct 30, 2018 at 7:39 AM Denton Liu wrote: > > This patch offloads completion functionality for format-patch to > __gitcomp_builtin. In addition to this, send-email was borrowing some > completion options from format-patch so those options are moved into > send-email's completions. > > Signed-off-by: Denton Liu > --- > > I ran t9902-completion.sh on this patch and it seems to pass. Thus, this > should address the concerns about losing some completion options in > send-email. > > --- > contrib/completion/git-completion.bash | 34 +++--- > 1 file changed, 14 insertions(+), 20 deletions(-) > > diff --git a/contrib/completion/git-completion.bash > b/contrib/completion/git-completion.bash > index d63d2dffd..cb4ef6723 100644 > --- a/contrib/completion/git-completion.bash > +++ b/contrib/completion/git-completion.bash > @@ -1531,15 +1531,6 @@ _git_fetch () > __git_complete_remote_or_refspec > } > > -__git_format_patch_options=" > - --stdout --attach --no-attach --thread --thread= --no-thread > - --numbered --start-number --numbered-files --keep-subject --signoff > - --signature --no-signature --in-reply-to= --cc= --full-index --binary > - --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix= > - --inline --suffix= --ignore-if-in-upstream --subject-prefix= > - --output-directory --reroll-count --to= --quiet --notes > -" > - > _git_format_patch () > { > case "$cur" in > @@ -1550,7 +1541,7 @@ _git_format_patch () > return > ;; > --*) > - __gitcomp "$__git_format_patch_options" > + __gitcomp_builtin format-patch We do want to keep some extra options back because __gitcomp_builtin cannot show all supported options of format-patch. The reason is a subset of options is handled by setup_revisions() which does not have --git-completion-helper support. > @@ -2080,16 +2071,19 @@ _git_send_email () > return > ;; > --*) > - __gitcomp "--annotate --bcc --cc --cc-cmd --chain-reply-to > - --compose --confirm= --dry-run --envelope-sender > - --from --identity > - --in-reply-to --no-chain-reply-to > --no-signed-off-by-cc > - --no-suppress-from --no-thread --quiet --reply-to > - --signed-off-by-cc --smtp-pass --smtp-server > - --smtp-server-port --smtp-encryption= --smtp-user > - --subject --suppress-cc= --suppress-from --thread --to > - --validate --no-validate > - $__git_format_patch_options" > + __gitcomp "--all --annotate --attach --bcc --binary --cc > --cc= --cc-cmd > + --chain-reply-to --compose --confirm= --cover-letter > --dry-run > + --dst-prefix= --envelope-sender --from --full-index > --identity > + --ignore-if-in-upstream --inline --in-reply-to > --in-reply-to= > + --keep-subject --no-attach --no-chain-reply-to > --no-prefix > + --no-signature --no-signed-off-by-cc > --no-suppress-from --not --notes > + --no-thread --no-validate --numbered --numbered-files > + --output-directory --quiet --reply-to --reroll-count > --signature > + --signed-off-by-cc --signoff --smtp-encryption= > --smtp-pass > + --smtp-server --smtp-server-port --smtp-user > --src-prefix= > + --start-number --stdout --subject --subject-prefix= > --suffix= > + --suppress-cc= --suppress-from --thread --thread= > --to --to= > + --validate" I have no comment about this. In an ideal world, sendemail.perl could be taught to support --git-completion-helper but I don't think my little remaining Perl knowledge (or time) is enough to do it. Perhaps this will do. I don't know. -- Duy
Re: Lost changes after merge
The commits you mentioned are not present on the new pastes. On Tue, Oct 30, 2018 at 03:46:28AM +0100, Gray King wrote: > Sorry, seems the link has been expired, here is the new one: > * Before merge run `git log --format="%h %p %d" -n 20 --all --graph`: > One thing I noticed, is that you're using %p with --graph. And --graph enables parent rewriting. Which may surprise you if you don't know what it does. But apart from that, and assuming you only did `git merge f087081868` everything looks normal between those two pastes. Cheers, Rafael Ascensão
Re: commit-graph is cool (overcoming add_missing_tags() perf issues)
On 10/17/2018 2:00 PM, Elijah Newren wrote: Hi, Just wanted to give a shout-out for the commit-graph work and how impressive it is. I had an internal report from a user that git pushes containing only one new tiny commit were taking over a minute (in a moderate size repo with good network connectivity). After digging for a while, I noticed three unusual things about the repo[1]: * he had push.followTags set to true * upstream repo had about 20k tags (despite only 55k commits) * his repo had an additional 2.5k tags, but none of these were in the history of the branches he was pushing and thus would not be included in any pushes. Digging in, almost all the time was CPU-bound and spent in add_missing_tags()[2]. If I'm reading the code correctly, it appears that function loops over each tag, calling in_merge_bases_many() once per tag. Thus, for his case, we were potentially walking all of history of the main branch 2.5k times. That seemed rather suboptimal. Elijah, Do you still have this repo around? Could you by chance test the performance with the new algorithm for add_missing_tags() in [1]? Specifically, please test it without a commit-graph file, since your data shape already makes use of generation numbers pretty well. Thanks, -Stolee [1] https://public-inbox.org/git/pull.60.git.gitgitgad...@gmail.com/T/#t
[PATCH 2/3] test-reach: test get_reachable_subset
From: Derrick Stolee The get_reachable_subset() method returns the list of commits in the 'to' array that are reachable from at least one commit in the 'from' array. Add tests that check this method works in a few cases: 1. All commits in the 'to' list are reachable. This exercises the early-termination condition. 2. Some commits in the 'to' list are reachable. This exercises the loop-termination condition. 3. No commits in the 'to' list are reachable. This exercises the NULL return condition. Signed-off-by: Derrick Stolee --- t/helper/test-reach.c | 34 t/t6600-test-reach.sh | 52 +++ 2 files changed, 82 insertions(+), 4 deletions(-) diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 08d2ea68e..a0272178b 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -32,8 +32,8 @@ int cmd__reach(int ac, const char **av) struct commit *A, *B; struct commit_list *X, *Y; struct object_array X_obj = OBJECT_ARRAY_INIT; - struct commit **X_array; - int X_nr, X_alloc; + struct commit **X_array, **Y_array; + int X_nr, X_alloc, Y_nr, Y_alloc; struct strbuf buf = STRBUF_INIT; struct repository *r = the_repository; @@ -44,9 +44,10 @@ int cmd__reach(int ac, const char **av) A = B = NULL; X = Y = NULL; - X_nr = 0; - X_alloc = 16; + X_nr = Y_nr = 0; + X_alloc = Y_alloc = 16; ALLOC_ARRAY(X_array, X_alloc); + ALLOC_ARRAY(Y_array, Y_alloc); while (strbuf_getline(&buf, stdin) != EOF) { struct object_id oid; @@ -92,6 +93,8 @@ int cmd__reach(int ac, const char **av) case 'Y': commit_list_insert(c, &Y); + ALLOC_GROW(Y_array, Y_nr + 1, Y_alloc); + Y_array[Y_nr++] = c; break; default: @@ -136,6 +139,29 @@ int cmd__reach(int ac, const char **av) filter.with_commit_tag_algo = 0; printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache)); + } else if (!strcmp(av[1], "get_reachable_subset")) { + const int reachable_flag = 1; + int i, count = 0; + struct commit_list *current; + struct commit_list *list = get_reachable_subset(X_array, X_nr, + Y_array, Y_nr, + reachable_flag); + printf("get_reachable_subset(X,Y)\n"); + for (current = list; current; current = current->next) { + if (!(list->item->object.flags & reachable_flag)) + die(_("commit %s is not marked reachable"), + oid_to_hex(&list->item->object.oid)); + count++; + } + for (i = 0; i < Y_nr; i++) { + if (Y_array[i]->object.flags & reachable_flag) + count--; + } + + if (count < 0) + die(_("too many commits marked reachable")); + + print_sorted_commit_ids(list); } exit(0); diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index ae94b27f7..a0c64e617 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -265,4 +265,56 @@ test_expect_success 'commit_contains:miss' ' test_three_modes commit_contains --tag ' +test_expect_success 'get_reachable_subset:all' ' + cat >input <<-\EOF && + X:commit-9-1 + X:commit-8-3 + X:commit-7-5 + X:commit-6-6 + X:commit-1-7 + Y:commit-3-3 + Y:commit-1-7 + Y:commit-5-6 + EOF + ( + echo "get_reachable_subset(X,Y)" && + git rev-parse commit-3-3 \ + commit-1-7 \ + commit-5-6 | sort + ) >expect && + test_three_modes get_reachable_subset +' + +test_expect_success 'get_reachable_subset:some' ' + cat >input <<-\EOF && + X:commit-9-1 + X:commit-8-3 + X:commit-7-5 + X:commit-1-7 + Y:commit-3-3 + Y:commit-1-7 + Y:commit-5-6 + EOF + ( + echo "get_reachable_subset(X,Y)" && + git rev-parse commit-3-3 \ + commit-1-7 | sort + ) >expect && + test_three_modes get_reachable_subset +' + +test_expect_success 'get_reachable_subset:none' ' + cat >input <<-\EOF && + X:commit-9-1 + X:commit-8-3 + X:commit-7-5 + X:commit-1-7 + Y:commit-9-3 + Y:commit-7-6 + Y:commit-2-8 + EOF + echo "get_reachable_subset(X,Y)" >expect && + test_three_modes get_reach
[PATCH 3/3] remote: make add_missing_tags() linear
From: Derrick Stolee The add_missing_tags() method currently has quadratic behavior. This is due to a linear number (based on number of tags T) of calls to in_merge_bases_many, which has linear performance (based on number of commits C in the repository). Replace this O(T * C) algorithm with an O(T + C) algorithm by using get_reachable_subset(). We ignore the return list and focus instead on the reachable_flag assigned to the commits we care about, because we need to interact with the tag ref and not just the commit object. Signed-off-by: Derrick Stolee --- remote.c | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/remote.c b/remote.c index 81f4f01b0..b850f2feb 100644 --- a/remote.c +++ b/remote.c @@ -1205,9 +1205,36 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds * sent to the other side. */ if (sent_tips.nr) { + const int reachable_flag = 1; + struct commit_list *found_commits; + struct commit **src_commits; + int nr_src_commits = 0, alloc_src_commits = 16; + ALLOC_ARRAY(src_commits, alloc_src_commits); + for_each_string_list_item(item, &src_tag) { struct ref *ref = item->util; + struct commit *commit; + + if (is_null_oid(&ref->new_oid)) + continue; + commit = lookup_commit_reference_gently(the_repository, + &ref->new_oid, + 1); + if (!commit) + /* not pushing a commit, which is not an error */ + continue; + + ALLOC_GROW(src_commits, nr_src_commits + 1, alloc_src_commits); + src_commits[nr_src_commits++] = commit; + } + + found_commits = get_reachable_subset(sent_tips.tip, sent_tips.nr, +src_commits, nr_src_commits, +reachable_flag); + + for_each_string_list_item(item, &src_tag) { struct ref *dst_ref; + struct ref *ref = item->util; struct commit *commit; if (is_null_oid(&ref->new_oid)) @@ -1223,7 +1250,7 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds * Is this tag, which they do not have, reachable from * any of the commits we are sending? */ - if (!in_merge_bases_many(commit, sent_tips.nr, sent_tips.tip)) + if (!(commit->object.flags & reachable_flag)) continue; /* Add it in */ @@ -1231,7 +1258,12 @@ static void add_missing_tags(struct ref *src, struct ref **dst, struct ref ***ds oidcpy(&dst_ref->new_oid, &ref->new_oid); dst_ref->peer_ref = copy_ref(ref); } + + clear_commit_marks_many(nr_src_commits, src_commits, reachable_flag); + free(src_commits); + free_commit_list(found_commits); } + string_list_clear(&src_tag, 0); free(sent_tips.tip); } -- gitgitgadget
[PATCH 0/3] Make add_missing_tags() linear
As reported earlier [1], the add_missing_tags() method in remote.c has quadratic performance. Some of that performance is curbed due to the generation-number cutoff in in_merge_bases_many(). However, that fix doesn't help users without a commit-graph, and it can still be painful if that cutoff is sufficiently low compared to the tags we are using for reachability testing. Add a new method in commit-reach.c called get_reachable_subset() which does a many-to-many reachability test. Starting at the 'from' commits, walk until the generation is below the smallest generation in the 'to' commits, or all 'to' commits have been discovered. This performs only one commit walk for the entire add_missing_tags() method, giving linear performance in the worst case. Tests are added in t6600-test-reach.sh to ensure get_reachable_subset() works independently of its application in add_missing_tags(). Thanks, -Stolee [1] https://public-inbox.org/git/cabpp-becpsoxudovjbdg_3w9wus102rw+e+qpmd4g3qyd-q...@mail.gmail.com/ Derrick Stolee (3): commit-reach: implement get_reachable_subset test-reach: test get_reachable_subset remote: make add_missing_tags() linear commit-reach.c| 70 +++ commit-reach.h| 10 +++ remote.c | 34 - t/helper/test-reach.c | 34 ++--- t/t6600-test-reach.sh | 52 5 files changed, 195 insertions(+), 5 deletions(-) base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-60%2Fderrickstolee%2Fadd-missing-tags-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-60/derrickstolee/add-missing-tags-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/60 -- gitgitgadget
[PATCH 1/3] commit-reach: implement get_reachable_subset
From: Derrick Stolee The existing reachability algorithms in commit-reach.c focus on finding merge-bases or determining if all commits in a set X can reach at least one commit in a set Y. However, for two commits sets X and Y, we may also care about which commits in Y are reachable from at least one commit in X. Implement get_reachable_subset() which answers this question. Given two arrays of commits, 'from' and 'to', return a commit_list with every commit from the 'to' array that is reachable from at least one commit in the 'from' array. The algorithm is a simple walk starting at the 'from' commits, using the PARENT2 flag to indicate "this commit has already been added to the walk queue". By marking the 'to' commits with the PARENT1 flag, we can determine when we see a commit from the 'to' array. We remove the PARENT1 flag as we add that commit to the result list to avoid duplicates. The order of the resulting list is a reverse of the order that the commits are discovered in the walk. There are a couple shortcuts to avoid walking more than we need: 1. We determine the minimum generation number of commits in the 'to' array. We do not walk commits with generation number below this minimum. 2. We count how many distinct commits are in the 'to' array, and decrement this count when we discover a 'to' commit during the walk. If this number reaches zero, then we can terminate the walk. Tests will be added using the 'test-tool reach' helper in a subsequent commit. Signed-off-by: Derrick Stolee --- commit-reach.c | 70 ++ commit-reach.h | 10 2 files changed, 80 insertions(+) diff --git a/commit-reach.c b/commit-reach.c index 9f79ce0a2..a98532ecc 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to, object_array_clear(&from_objs); return result; } + +struct commit_list *get_reachable_subset(struct commit **from, int nr_from, +struct commit **to, int nr_to, +int reachable_flag) +{ + struct commit **item; + struct commit *current; + struct commit_list *found_commits = NULL; + struct commit **to_last = to + nr_to; + struct commit **from_last = from + nr_from; + uint32_t min_generation = GENERATION_NUMBER_INFINITY; + int num_to_find = 0; + + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + + for (item = to; item < to_last; item++) { + struct commit *c = *item; + + parse_commit(c); + if (c->generation < min_generation) + min_generation = c->generation; + + if (!(c->object.flags & PARENT1)) { + c->object.flags |= PARENT1; + num_to_find++; + } + } + + for (item = from; item < from_last; item++) { + struct commit *c = *item; + if (!(c->object.flags & PARENT2)) { + c->object.flags |= PARENT2; + parse_commit(c); + + prio_queue_put(&queue, *item); + } + } + + while (num_to_find && (current = prio_queue_get(&queue)) != NULL) { + struct commit_list *parents; + + if (current->object.flags & PARENT1) { + current->object.flags &= ~PARENT1; + current->object.flags |= reachable_flag; + commit_list_insert(current, &found_commits); + num_to_find--; + } + + for (parents = current->parents; parents; parents = parents->next) { + struct commit *p = parents->item; + + parse_commit(p); + + if (p->generation < min_generation) + continue; + + if (p->object.flags & PARENT2) + continue; + + p->object.flags |= PARENT2; + prio_queue_put(&queue, p); + } + } + + clear_commit_marks_many(nr_to, to, PARENT1); + clear_commit_marks_many(nr_from, from, PARENT2); + + return found_commits; +} + diff --git a/commit-reach.h b/commit-reach.h index 7d313e297..43bd50a70 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from, int can_all_from_reach(struct commit_list *from, struct commit_list *to, int commit_date_cutoff); + +/* + * Return a list of commits containing the commits in the 'to' array + * that are reachable from at least one commit in the 'from' array. + * Also add the given 'flag' to each of the commits in the returned list. + */ +struct commit_list *get_reachable
Git Test Coverage Report (Tuesday, Oct 30)
This coverage report uses a new build definition that runs the four suites in parallel, then combines the results into the report below. You can see the build that generated this report at [1]. Thanks, -Stolee [1] https://git.visualstudio.com/git/_build/results?buildId=235&view=logs --- pu: 7f50cad4af91f508e14338e87c5911f990d0a938 jch: 13c38c20c0c9afee77ede86d423027eb79c51ced next: 53cd27f688b0bae2a5f2d8f5cea6ba63c6e4a781 master: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2 master@{1}: c670b1f876521c9f7cd40184bf7ed05aad843433 Uncovered code in 'pu' not in 'jch' -- builtin/blame.c 7f50cad4af builtin/blame.c 200) repo_unuse_commit_buffer(the_repository, commit, message); 74e8221b52 builtin/blame.c 924) blame_date_width = sizeof("Thu Oct 19 16:00"); 74e8221b52 builtin/blame.c 925) break; builtin/describe.c 7f50cad4af builtin/describe.c 257) repo_parse_commit(the_repository, p); builtin/fsck.c 372738935a builtin/fsck.c 622) fprintf_ln(stderr, _("Checking %s link"), head_ref_name); 372738935a builtin/fsck.c 627) return error(_("invalid %s"), head_ref_name); builtin/grep.c c22b820141 builtin/grep.c 429) grep_read_unlock(); builtin/pack-objects.c 7f50cad4af builtin/pack-objects.c 2832) if (!repo_has_object_file(the_repository, &obj->oid) && is_promisor_object(&obj->oid)) builtin/reflog.c c9ef0d95eb builtin/reflog.c 585) all_worktrees = 0; c9ef0d95eb builtin/reflog.c 621) continue; date.c 74e8221b52 113) die("Timestamp too large for this system: %"PRItime, time); 74e8221b52 216) if (tm->tm_mon == human_tm->tm_mon) { 74e8221b52 217) if (tm->tm_mday > human_tm->tm_mday) { 74e8221b52 219) } else if (tm->tm_mday == human_tm->tm_mday) { 74e8221b52 220) hide.date = hide.wday = 1; 74e8221b52 221) } else if (tm->tm_mday + 5 > human_tm->tm_mday) { 74e8221b52 223) hide.date = 1; 74e8221b52 231) gettimeofday(&now, NULL); 74e8221b52 232) show_date_relative(time, tz, &now, buf); 74e8221b52 233) return; 74e8221b52 246) hide.seconds = 1; 74e8221b52 247) hide.tz |= !hide.date; 74e8221b52 248) hide.wday = hide.time = !hide.year; 74e8221b52 262) strbuf_rtrim(buf); 74e8221b52 287) gettimeofday(&now, NULL); 74e8221b52 290) human_tz = local_time_tzoffset(now.tv_sec, &human_tm); 74e8221b52 886) static int auto_date_style(void) 74e8221b52 888) return (isatty(1) || pager_in_use()) ? DATE_HUMAN : DATE_NORMAL; 74e8221b52 909) return DATE_HUMAN; 74e8221b52 911) return auto_date_style(); fsck.c 7f50cad4af 858) repo_unuse_commit_buffer(the_repository, commit, buffer); 7f50cad4af 878) repo_read_object_file(the_repository, 7f50cad4af 879) &tag->object.oid, &type, &size); http-push.c 7f50cad4af 1635) if (!repo_has_object_file(the_repository, &head_oid)) 7f50cad4af 1642) if (!repo_has_object_file(the_repository, &remote_ref->old_oid)) merge-recursive.c 4cdc48e412 1585) return -1; 4cdc48e412 1588) return -1; 4cdc48e412 1594) return -1; 4cdc48e412 1596) if (update_file(o, 1, b_oid, b_mode, collide_path)) 4cdc48e412 1597) return -1; 4cdc48e412 1664) return -1; 4cdc48e412 1667) return -1; 4cdc48e412 1670) return -1; b58ae691c0 1703) return -1; 387361a6a7 1738) return -1; 387361a6a7 1786) return -1; 387361a6a7 1795) new_path = unique_path(o, a->path, ci->branch1); 387361a6a7 1796) output(o, 1, _("Refusing to lose untracked file" 387361a6a7 1802) return -1; 387361a6a7 1805) return -1; 387361a6a7 1815) return -1; 387361a6a7 1831) return -1; 387361a6a7 1834) return -1; midx.c 1dcd9f2043 184) return; negotiator/default.c 7f50cad4af 71) if (repo_parse_commit(the_repository, commit)) packfile.c dc7d664335 350) close_midx(o->multi_pack_index); dc7d664335 351) o->multi_pack_index = NULL; refs.c 3a3b9d8cde 657) return 0; refs/files-backend.c remote.c 879b6a9e6f 1140) return error(_("dst ref %s receives from more than one src."), revision.c 7f50cad4af 726) if (repo_parse_commit(the_repository, p) < 0) sequencer.c 7f50cad4af 1624) repo_unuse_commit_buffer(the_repository, head_commit, 7f50cad4af 3865) repo_unuse_commit_buffer(the_repository, sha1-array.c 7fdf90d541 91) oidcpy(&oids[dst], &oids[src]); submodule-config.c bcbc780d14 740) return CONFIG_INVALID_KEY; 45f5ef3d77 755) warning(_("Could not update .gitmodules entry %s"), key); submodule.c b303ef65e7 524) the_repository->submodule_prefix : 060675d4fc 1378) strbuf_release(&gitdir); 183be9660a 1501) struct get_next_submodule_task *task = task_cb; 183be9660a 1505) get_next_submodule_task_release(task); 183be9660a 1532) return 0; 183be9660a 1536) goto out; 183be9660a 1551) return 0; tree.c 7f50cad4af 108) if (repo_parse_commit(the_repository, commit)) worktree.c 3a3b9d8cde 495) return -1; 3a3b9d8cde 508) return -1; 3a3b9d8cde 517) return -1; ab3e1f78ae 537) break; wrapper.c 5efde212fc 70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", 5efde212fc 73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", Commits introducing uncovered co
Re: [PATCH v3 1/3] [Outreachy] t3903-stash: test without configured user name
On 26-Oct-18 3:13 AM, Junio C Hamano wrote: Slavica Djukic writes: From: Slavica Please make sure this matches your sign-off below. This is part of enhancement request that ask for 'git stash' to work even if 'user.name' and 'user.email' are not configured. Due to an implementation detail, git-stash undesirably requires 'user.name' and 'user.email' to be set, but shouldn't. The issue is discussed here: https://public-inbox.org/git/87o9debty4@evledraar.gmail.com/T/#u. As the four lines above summarize the issue being highlighted by the expect-failure rather well, the last two lines are unnecessary. Please remove them. Alternatively, you can place them after the three-dash lines we see below. Signed-off-by: Slavica Djukic --- t/t3903-stash.sh | 14 ++ 1 file changed, 14 insertions(+) diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh index 9e06494ba0..ae2c905343 100755 --- a/t/t3903-stash.sh +++ b/t/t3903-stash.sh @@ -1156,4 +1156,18 @@ test_expect_success 'stash -- works with binary files' ' test_path_is_file subdir/untracked ' +test_expect_failure 'stash works when user.name and user.email are not set' ' +test_commit 1 && Just being curious, but do we need a fresh commit created at this point in the test? Many tests before this one begin with "git reset" and then run "git stash" without ever creating commit themselves, instead relying on the fact that there already is at least one commit created in the "setup" phase of the test that a "stash" created can be made relative to. I do not think this test is all that special in that regard to require its own commit. No, we don't need fresh commit here. Thank you for this and all other suggestions. I've changed test according to them. +test_config user.useconfigonly true && +test_config stash.usebuiltin true && +sane_unset GIT_AUTHOR_NAME && +sane_unset GIT_AUTHOR_EMAIL && +sane_unset GIT_COMMITTER_NAME && +sane_unset GIT_COMMITTER_EMAIL && +test_unconfig user.email && There are trailing whitespaces on the line above. Please remove. Also, Don't be original in the form alone---all other tests in this file indent with a leading HT, not four SPs. Please match the style of surrounding code. +test_unconfig user.name && +echo changed >1.t && +git stash +' + test_done Thanks. Please do not reroll the next round at too rapid a pace. I've taken my time for next round, I am working on 2/3 and 3/3 parts as well. I wouldn't have sent this patch if I understood you well in previous reply. Thank you, Slavica.
Re: [PATCH v7 09/10] submodule: support reading .gitmodules when it's not in the working tree
On Tue, 30 Oct 2018 10:57:09 +0100 (STD) Johannes Schindelin wrote: > Hi Antonio, > Hi Johannes, > On Thu, 25 Oct 2018, Antonio Ospite wrote: > > > diff --git a/t/t7418-submodule-sparse-gitmodules.sh > > b/t/t7418-submodule-sparse-gitmodules.sh > > new file mode 100755 [...] > > + echo "$PWD/submodule" >expect && > > Would you mind squashing this fixup in? > > -- snip -- > diff --git a/t/t7418-submodule-sparse-gitmodules.sh > b/t/t7418-submodule-sparse-gitmodules.sh > index 21a86b89c6cb..3f7f27188313 100755 > --- a/t/t7418-submodule-sparse-gitmodules.sh > +++ b/t/t7418-submodule-sparse-gitmodules.sh > @@ -55,7 +55,7 @@ test_expect_success 'initialising submodule when the > gitmodules config is not ch > test_must_fail git -C super config submodule.submodule.url && > git -C super submodule init && > git -C super config submodule.submodule.url >actual && > - echo "$PWD/submodule" >expect && > + echo "$(pwd)/submodule" >expect && > test_cmp expect actual > ' > > -- snap -- > > On Windows, `$PWD` and `$(pwd)` are *not* synonymous. The former > reflects the "Unix path" which is understood by the Bash script (and > only by the Bash script, *not* by `git.exe`!) while the latter refers to > the actual Windows path. > I see, this is also mentioned in t/README, I had overlooked that part. Thank you for reporting. > Without this fix, your new test case will fail on Windows all the time, > see e.g. > https://git-for-windows.visualstudio.com/git/_build/results?buildId=22913&view=logs > Junio, what is the plan for 'ao/submodule-wo-gitmodules-checked-out'? I see it's not in next yet; do you want me to resend the whole series with this fixup in or would it be less overhead for you to apply it directly to patch 9/10 from v7 of the series? Thank you, Antonio P.S. I was wondering if it is worth having patchset versions mentioned somewhere in pu/, maybe in merge commits if not in branch names? Or at least in whats-cooking.txt next to the date. -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
[PATCH v3 3/5] am: rename read_author_script()
From: Phillip Wood Rename read_author_script() in preparation for adding a shared read_author_script() function to libgit. Signed-off-by: Phillip Wood --- builtin/am.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index d42b725273..991d13f9a2 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list) * script, and thus if the file differs from what this function expects, it is * better to bail out than to do something that the user does not expect. */ -static int read_author_script(struct am_state *state) +static int read_am_author_script(struct am_state *state) { const char *filename = am_path(state, "author-script"); struct strbuf buf = STRBUF_INIT; @@ -441,7 +441,7 @@ static void am_load(struct am_state *state) BUG("state file 'last' does not exist"); state->last = strtol(sb.buf, NULL, 10); - if (read_author_script(state) < 0) + if (read_am_author_script(state) < 0) die(_("could not parse author script")); read_commit_msg(state); -- 2.19.1
[PATCH v3 4/5] add read_author_script() to libgit
From: Phillip Wood Add read_author_script() to sequencer.c based on the implementation in builtin/am.c and update read_am_author_script() to use read_author_script(). The sequencer code that reads the author script will be updated in the next commit. Signed-off-by: Phillip Wood --- Notes: changes since v1: - added comment above read_author_script() - rebased to reflect changes added in patch 2 builtin/am.c | 86 + sequencer.c | 105 +++ sequencer.h | 3 ++ 3 files changed, 110 insertions(+), 84 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index 991d13f9a2..c5373158c0 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state, die_errno(_("could not read '%s'"), am_path(state, file)); } -/** - * Take a series of KEY='VALUE' lines where VALUE part is - * sq-quoted, and append at the end of the string list - */ -static int parse_key_value_squoted(char *buf, struct string_list *list) -{ - while (*buf) { - struct string_list_item *item; - char *np; - char *cp = strchr(buf, '='); - if (!cp) { - np = strchrnul(buf, '\n'); - return error(_("unable to parse '%.*s'"), -(int) (np - buf), buf); - } - np = strchrnul(cp, '\n'); - *cp++ = '\0'; - item = string_list_append(list, buf); - - buf = np + (*np == '\n'); - *np = '\0'; - cp = sq_dequote(cp); - if (!cp) - return error(_("unable to dequote value of '%s'"), -item->string); - item->util = xstrdup(cp); - } - return 0; -} - /** * Reads and parses the state directory's "author-script" file, and sets * state->author_name, state->author_email and state->author_date accordingly. @@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list) static int read_am_author_script(struct am_state *state) { const char *filename = am_path(state, "author-script"); - struct strbuf buf = STRBUF_INIT; - struct string_list kv = STRING_LIST_INIT_DUP; - int retval = -1; /* assume failure */ - int i, name_i = -2, email_i = -2, date_i = -2, err = 0; - int fd; assert(!state->author_name); assert(!state->author_email); assert(!state->author_date); - fd = open(filename, O_RDONLY); - if (fd < 0) { - if (errno == ENOENT) - return 0; - return error_errno(_("could not open '%s' for reading"), - filename); - } - strbuf_read(&buf, fd, 0); - close(fd); - if (parse_key_value_squoted(buf.buf, &kv)) - goto finish; - - for (i = 0; i < kv.nr; i++) { - if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) { - if (name_i >= 0) - name_i = error(_("'GIT_AUTHOR_NAME' already given")); - else - name_i = i; - } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) { - if (email_i >= 0) - email_i = error(_("'GIT_AUTHOR_EMAIL' already given")); - else - email_i = i; - } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) { - if (date_i >= 0) - date_i = error(_("'GIT_AUTHOR_DATE' already given")); - else - date_i = i; - } else { - err = error(_("unknown variable '%s'"), - kv.items[i].string); - } - } - if (name_i == -2) - error(_("missing 'GIT_AUTHOR_NAME'")); - if (email_i == -2) - error(_("missing 'GIT_AUTHOR_EMAIL'")); - if (date_i == -2) - error(_("missing 'GIT_AUTHOR_DATE'")); - if (date_i < 0 || email_i < 0 || date_i < 0 || err) - goto finish; - state->author_name = kv.items[name_i].util; - state->author_email = kv.items[email_i].util; - state->author_date = kv.items[date_i].util; - retval = 0; -finish: - string_list_clear(&kv, !!retval); - strbuf_release(&buf); - return retval; + return read_author_script(filename, &state->author_name, + &state->author_email, &state->author_date, 1); } /** diff --git a/sequencer.c b/sequencer.c index dc2c58d464..3530dbeb6c 100644 --- a/sequencer.c +++ b/sequencer.c @@ -660,6 +660,111
[PATCH v3 1/5] am: don't die in read_author_script()
From: Phillip Wood The caller is already prepared to handle errors returned from this function so there is no need for it to die if it cannot read the file. Suggested-by: Eric Sunshine Signed-off-by: Phillip Wood --- builtin/am.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/builtin/am.c b/builtin/am.c index 5e866d17c7..b68578bc3f 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state) if (fd < 0) { if (errno == ENOENT) return 0; - die_errno(_("could not open '%s' for reading"), filename); + return error_errno(_("could not open '%s' for reading"), + filename); } strbuf_read(&buf, fd, 0); close(fd); -- 2.19.1
[PATCH v3 5/5] sequencer: use read_author_script()
From: Phillip Wood Use the new function added in the last commit to read the author script, updating read_env_script() and read_author_ident(). We now have a single code path that reads the author script for am and all flavors of rebase. This changes the behavior of read_env_script() as previously it would set any environment variables that were in the author-script file. Now it is an error if the file contains other variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and GIT_AUTHOR_DATE are missing. This is what am and the non interactive version of rebase have been doing for several years so hopefully it will not cause a problem for interactive rebase users. The advantage is that we are reusing existing code from am which uses sq_dequote() to properly dequote variables. This fixes potential problems with user edited scripts as read_env_script() which did not track quotes properly. This commit also removes the fallback code for checking for a broken author script after git is upgraded when a rebase is stopped. Now that the parsing uses sq_dequote() it will reliably return an error if the quoting is broken and the user will have to abort the rebase and restart. This isn't ideal but it's a corner case and the detection of the broken quoting could be confused by user edited author scripts. Signed-off-by: Phillip Wood --- Notes: changes since v1 - use argv_array_pushf() as suggested by Eric - fixed strbuf handling as suggested by Eric - fix comments and commit message to reflect changed behavior of read_env_script() sequencer.c | 97 - 1 file changed, 21 insertions(+), 76 deletions(-) diff --git a/sequencer.c b/sequencer.c index 3530dbeb6c..987542f67c 100644 --- a/sequencer.c +++ b/sequencer.c @@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, char **email, char **date, } /* - * write_author_script() used to fail to terminate the last line with a "'" and - * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for - * the terminating "'" on the last line to see how "'" has been escaped in case - * git was upgraded while rebase was stopped. - */ -static int quoting_is_broken(const char *s, size_t n) -{ - /* Skip any empty lines in case the file was hand edited */ - while (n > 0 && s[--n] == '\n') - ; /* empty */ - if (n > 0 && s[n] != '\'') - return 1; - - return 0; -} - -/* - * Read a list of environment variable assignments (such as the author-script - * file) into an environment block. Returns -1 on error, 0 otherwise. + * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a + * file with shell quoting into struct argv_array. Returns -1 on + * error, 0 otherwise. */ static int read_env_script(struct argv_array *env) { - struct strbuf script = STRBUF_INIT; - int i, count = 0, sq_bug; - const char *p2; - char *p; + char *name, *email, *date; - if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0) + if (read_author_script(rebase_path_author_script(), + &name, &email, &date, 0)) return -1; - /* write_author_script() used to quote incorrectly */ - sq_bug = quoting_is_broken(script.buf, script.len); - for (p = script.buf; *p; p++) - if (sq_bug && skip_prefix(p, "'''", &p2)) - strbuf_splice(&script, p - script.buf, p2 - p, "'", 1); - else if (skip_prefix(p, "'\\''", &p2)) - strbuf_splice(&script, p - script.buf, p2 - p, "'", 1); - else if (*p == '\'') - strbuf_splice(&script, p-- - script.buf, 1, "", 0); - else if (*p == '\n') { - *p = '\0'; - count++; - } - for (i = 0, p = script.buf; i < count; i++) { - argv_array_push(env, p); - p += strlen(p) + 1; - } + argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name); + argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email); + argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date); + free(name); + free(email); + free(date); return 0; } @@ -833,54 +804,28 @@ static char *get_author(const char *message) /* Read author-script and return an ident line (author timestamp) */ static const char *read_author_ident(struct strbuf *buf) { - const char *keys[] = { - "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE=" - }; struct strbuf out = STRBUF_INIT; - char *in, *eol; - const char *val[3]; - int i = 0; + char *name, *email, *date; - if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0) + if (read_author_script(rebase_path_author_script(), + &name, &email, &date, 0))