Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules
Am 26.09.2017 um 00:50 schrieb Stefan Beller: submodule..update can be assigned an arbitrary command via setting it to "!command". When this command is found in the regular config, Git ought to just run that command instead of other update mechanisms. However if that command is just found in the .gitmodules file, it is potentially untrusted, which is why we do not run it. Add a test confirming the behavior. Suggested-by: Jonathan NiederSigned-off-by: Stefan Beller --- updated to use the super robust script. Thanks Jonathan, Stefan t/t7406-submodule-update.sh | 14 ++ 1 file changed, 14 insertions(+) diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh index 034914a14f..d718cb00e7 100755 --- a/t/t7406-submodule-update.sh +++ b/t/t7406-submodule-update.sh @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in .git/config' ' ) ' +test_expect_success 'submodule update - command in .gitmodules is ignored' ' + test_when_finished "git -C super reset --hard HEAD^" && + + write_script must_not_run.sh <<-EOF && + >$TEST_DIRECTORY/bad + EOF I am pretty confident that this does not test what you intend to test. Notice that $TEST_DIRECTORY is expanded when the script is written. But that path contains a blank, and we have something like this in the test script: #!/bin/sh >/the/build/directory/t/trash directory.t7406/bad If you inject the bug against which this test protects into git-submodule, you should find a file "trash" in your t directory, and the file "bad" still absent. Not to mention that the script fails because it cannot run "directory.t7406/bad". To fix that, you should use and exported variable and access that from the test script, for example: write_script must_not_run.sh <<-\EOF && >"$TEST_DIRECTORY"/bad EOF ... ( export TEST_DIRECTORY && git -C super submodule update submodule ) && test_path_is_missing bad + + git -C super config -f .gitmodules submodule.submodule.update "!$TEST_DIRECTORY/must_not_run.sh" && + git -C super commit -a -m "add command to .gitmodules file" && + git -C super/submodule reset --hard $submodulesha1^ && + git -C super submodule update submodule && + test_path_is_missing bad +' + cat << EOF >expect Execution of 'false $submodulesha1' failed in submodule path 'submodule' EOF -- Hannes
Re: [PATCH 4/4] Move documentation of string_list into string-list.h
Junio C Hamanowrites: > > Thanks. I am not sure if you can safely reorder the contents of the > header files in general, but I trust that you made sure that this > does not introduce problems (e.g. referrals before definition). Alas, this time it seems my trust was grossly misplaced. Discarding this patch and redoing the integration cycle for the day.
Re: '--shallow-since' option is not available for git-pull in Git version 2.14.1
On Mon, Sep 25, 2017 at 04:31:10PM +0900, Sanggyu Nam wrote: > I’ve found that some subcommands such as git-clone, git-fetch, and > git-pull support an option named ‘--shallow-since’, as of Git version > 2.11. This option is documented in the man page of each subcommand. In > Git 2.14.1, I’ve checked that the option is available for git-clone > and git-fetch so that the history of a shallow repository is updated. > However, running git-pull with this option shows an error as follows: > > error: unknown option `shallow-since=2017-09-10T00:00:00+00:00' > > usage: git pull [] [ [...]] > ... > > I found that this option is not available in Git 2.14.1 on macOS and > Ubuntu 16.04.1. It seems the option handling of git-pull does not > match with what’s described in the man page. Since ‘git pull’ is a > shorthand for ‘git fetch’ followed by ‘git merge FETCH_HEAD’ in its > default mode, we can run these two commands in this order as a > workaround. > > This does not only count for --shallow-since, but also --deepen and --shallow-exclude.
Re: [PATCH 4/4] Move documentation of string_list into string-list.h
Han-Wen Nienhuyswrites: > This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did > the same for strbuf.h: > > * API documentation uses /** */ to set it apart from other comments. > > * Function names were stripped from the comments. > > * Ordering of the header was adjusted to follow the one from the text > file. > > * Edited some existing comments to follow the new standard. > > Signed-off-by: Han-Wen Nienhuys > --- Thanks. I am not sure if you can safely reorder the contents of the header files in general, but I trust that you made sure that this does not introduce problems (e.g. referrals before definition). Also, I am not sure what "the new standard" refers to. Have we established a new standard somewhere? Did we have discussion on it? Puzzled. Thanks. > Documentation/technical/api-string-list.txt | 209 > > string-list.h | 187 + > 2 files changed, 159 insertions(+), 237 deletions(-) > delete mode 100644 Documentation/technical/api-string-list.txt > > diff --git a/Documentation/technical/api-string-list.txt > b/Documentation/technical/api-string-list.txt > deleted file mode 100644 > index c08402b12..0 > --- a/Documentation/technical/api-string-list.txt > +++ /dev/null > @@ -1,209 +0,0 @@ > -string-list API > -=== > - > -The string_list API offers a data structure and functions to handle > -sorted and unsorted string lists. A "sorted" list is one whose > -entries are sorted by string value in `strcmp()` order. > - > -The 'string_list' struct used to be called 'path_list', but was renamed > -because it is not specific to paths. > - > -The caller: > - > -. Allocates and clears a `struct string_list` variable. > - > -. Initializes the members. You might want to set the flag `strdup_strings` > - if the strings should be strdup()ed. For example, this is necessary > - when you add something like git_path("..."), since that function returns > - a static buffer that will change with the next call to git_path(). > -+ > -If you need something advanced, you can manually malloc() the `items` > -member (you need this if you add things later) and you should set the > -`nr` and `alloc` members in that case, too. > - > -. Adds new items to the list, using `string_list_append`, > - `string_list_append_nodup`, `string_list_insert`, > - `string_list_split`, and/or `string_list_split_in_place`. > - > -. Can check if a string is in the list using `string_list_has_string` or > - `unsorted_string_list_has_string` and get it from the list using > - `string_list_lookup` for sorted lists. > - > -. Can sort an unsorted list using `string_list_sort`. > - > -. Can remove duplicate items from a sorted list using > - `string_list_remove_duplicates`. > - > -. Can remove individual items of an unsorted list using > - `unsorted_string_list_delete_item`. > - > -. Can remove items not matching a criterion from a sorted or unsorted > - list using `filter_string_list`, or remove empty strings using > - `string_list_remove_empty_items`. > - > -. Finally it should free the list using `string_list_clear`. > - > -Example: > - > - > -struct string_list list = STRING_LIST_INIT_NODUP; > -int i; > - > -string_list_append(, "foo"); > -string_list_append(, "bar"); > -for (i = 0; i < list.nr; i++) > - printf("%s\n", list.items[i].string) > - > - > -NOTE: It is more efficient to build an unsorted list and sort it > -afterwards, instead of building a sorted list (`O(n log n)` instead of > -`O(n^2)`). > -+ > -However, if you use the list to check if a certain string was added > -already, you should not do that (using unsorted_string_list_has_string()), > -because the complexity would be quadratic again (but with a worse factor). > - > -Functions > -- > - > -* General ones (works with sorted and unsorted lists as well) > - > -`string_list_init`:: > - > - Initialize the members of the string_list, set `strdup_strings` > - member according to the value of the second parameter. > - > -`filter_string_list`:: > - > - Apply a function to each item in a list, retaining only the > - items for which the function returns true. If free_util is > - true, call free() on the util members of any items that have > - to be deleted. Preserve the order of the items that are > - retained. > - > -`string_list_remove_empty_items`:: > - > - Remove any empty strings from the list. If free_util is true, > - call free() on the util members of any items that have to be > - deleted. Preserve the order of the items that are retained. > - > -`print_string_list`:: > - > - Dump a string_list to stdout, useful mainly for debugging purposes. It > - can take an optional header argument and it writes out the > - string-pointer pairs of the string_list, each one in its own line. > - > -`string_list_clear`:: > - > - Free a
Re: [PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently
Han-Wen Nienhuyswrites: >>Subject: Re: [PATCH 2/4] Clarify return value ownership of real_path and >>read_gitfile_gently We try to make "shortlog --no-merges" output consistently say what area the change is about, followed by a colon, followed by a short description. > Signed-off-by: Han-Wen Nienhuys > --- > abspath.c | 3 +++ > setup.c | 3 ++- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/abspath.c b/abspath.c > index 708aff8d4..792a2d533 100644 > --- a/abspath.c > +++ b/abspath.c > @@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char > *path, > return retval; > } > > +/* Resolve `path` into an absolute, cleaned-up path. The return value > + * comes from a global shared buffer and should not be freed. > + */ > const char *real_path(const char *path) > { > static struct strbuf realpath = STRBUF_INIT; The part about "what it does" is a good thing to have here, but I do not think the second sentence adds much if it stays here (if the comment were in a header file far from implementation, then it is a different matter). Besides, "should not be freed" is not the only important thing---it is already implied by the function returning "const char *" (i.e. the caller/receiver does not own it). It will stay valid only until the next call, so the caller needs to xstrdup() it if it wants to keep the value. That is equally important. But quite honestly, that pattern appears very often in our codebase (all users of get_pathname() share the same characteristics) that these details (i.e. do not free, expect the buffer to be recycled) probaly is not worth it. Mentioning that the returned value is in a shared buffer for short-term use would be sufficient. > diff --git a/setup.c b/setup.c > index 6d8380acd..31853724c 100644 > --- a/setup.c > +++ b/setup.c > @@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char > *path, const char *dir) > > /* > * Try to read the location of the git directory from the .git file, > - * return path to git directory if found. > + * return path to git directory if found. The return value comes from > + * a shared pool and should not be freed. OK, the returned value comes from the return value of real_path(), so the early half of the new warning is worth having. "should not be freed" is both extraneous (for those who are already aware of the common pattern in our codebase) and insufficient (for those who are not yet). Thanks. > * > * On failure, if return_error_code is not NULL, return_error_code > * will be set to an error code and NULL will be returned. If
Re: [PATCH] fast-import: checkpoint: dump branches/tags/marks even if object_count==0
"Eric Rannaud"writes: > The checkpoint command cycles packfiles if object_count != 0, a sensible > test or there would be no pack files to write. Since 820b931012, the > command also dumps branches, tags and marks, but still conditionally. > However, it is possible for a command stream to modify refs or create > marks without creating any new objects. That reasoning sounds sensible. Especially given the discussion of "checkpoint" and "progress" we can see in "git fast-import --help" documentation. E.g. Placing a progress command immediately after a checkpoint will inform the reader when the checkpoint has been completed and it can safely access the refs that fast-import updated. would not be true without this change, I suspect. Can we also add a new test or two that protect this from future breakages? Thanks.
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff Kingwrites: >> #ifndef EUNDERFLOW >> # ifdef ENODATA >> # define EUNDERFLOW ENODATA >> # else >> # define EUNDERFLOW ESPIPE >> # endif >> #endif > > Right, I think our mails just crossed but I'm leaning in this direction. Hmph, I may be slow (or may be skimming the exchanges too fast), but what exactly is wrong with "0"? As long as we do not have to tell two or more "not exactly an error from syscall in errno" cases, I would think "0" is the best value to use. If the syserror message _is_ the issue, then we'd need to either pick an existing errno that is available everywhere (with possibly suboptimal message), or pick something and prepare a fallback for platforms that lack the errno, so picking "0" as the value and use whatever logic we would have used for the "fallback" would not sound too bad. I.e. if (read_in_full(..., size) != size) if (errno) die_errno("oops"); else die("short read"); If the callsite were too numerous, #define die_errno_or(msg1, msg2) if (errno) die_errno(msg1); else die(msg2) perhaps?
Re: -s theirs use-case(s) Was: BUG: merge -s theirs is not in effect
Yaroslav Halchenkowrites: > 1. As a workaround for absence of -m theirs I using mtheirs git alias: > (I believe provided to me awhile back here on the list): > > mtheirs = !sh -c 'git merge -s ours --no-commit $1 && git read-tree -m -u > $1' - > > and it worked fine for my usecases > > 2. I think that if there is a reason for -s ours to exist, so there for -s > theirs > since it is just the directionality of merges which changes between the two Just on this point. They are not exactly symmetric. Imagine there are some undesirable changes you want to vanquish from the world, but they have already built on useful changes on top of the undesirable changes. A hypothetical history might look like this: B---C / X---X---A / ---o---o your mainline where 'X' denotes those unwanted changes. With a "-s ours" merge, you can declare that changes on the other branch will never be merged to your branch, i.e. B---C / X---X---A / \ ---o---o---M your mainline and then you can safely merge A and C into your branch, without having to worry about them bringing the unwanted changes to your tree state. B---C / \ X---X---A \ / \ \ \ ---o---o---M---N---O your mainline That is the primary reason why "-s ours" exists, i.e. you do not control the branch where mistakes X were made because that is somebody else's history. The symmetiric case where _you_ have wrong changes do not need "-s theirs". These mistakes X are yours, so are the changes depend on them: B---C / X---X---A / ---o---o their mainline and you can just rebase A, B and C on top of their mainline while getting rid of Xs yourself before publishing. B'--C' / ---o---o---A' The reason why ours and theirs are not symmetric is because you are you and not them---the control and ownership of our history and their history is not symmetric. There may be valid workflows that benefit from "-s theirs", and I would not be surprised at all if we found more of them in the past 9 years since we had the "why -s theirs does not exist" discussion in 2008. But "because -s ours can be used in reverse to emulate" is not a valid excuse to add "-s theirs". It can be used a rationale against adding it (e.g. "-s theirs generally is discouraged because it forsters a bad workflow, but in a very rare case where it might be useful, you can always check out their branch and merge yours using '-s ours' to emulate it, so we do not lose any functionality even if we did not add it"), though.
[PATCH] fast-import: checkpoint: dump branches/tags/marks even if object_count==0
The checkpoint command cycles packfiles if object_count != 0, a sensible test or there would be no pack files to write. Since 820b931012, the command also dumps branches, tags and marks, but still conditionally. However, it is possible for a command stream to modify refs or create marks without creating any new objects. For example, reset a branch: $ git fast-import reset refs/heads/master from refs/heads/master^ checkpoint but refs/heads/master remains unchanged. Other example: a commit command that re-creates an object that already exists in the object database. This fix unconditionally calls dump_{branches,tags,marks}() for all checkpoint commands. dump_branches() and dump_tags() are cheap to call in the case of a no-op. Signed-off-by: Eric Rannaud--- fast-import.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fast-import.c b/fast-import.c index 35bf671f12c4..d5e4cf0bad41 100644 --- a/fast-import.c +++ b/fast-import.c @@ -3189,10 +3189,10 @@ static void checkpoint(void) checkpoint_requested = 0; if (object_count) { cycle_packfile(); - dump_branches(); - dump_tags(); - dump_marks(); } + dump_branches(); + dump_tags(); + dump_marks(); } static void parse_checkpoint(void) -- 2.14.1
Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect
Junio C Hamanowrites: > Junio C Hamano writes: > >> I do not recall people talking about symbolic links but the case of >> binary files has been on the wishlist for a long time, and I do not >> know of anybody who is working on (or is planning to work on) it. > > Ah, I misremembered. > > We've addressed the "binary files" case back in 2012 with a944af1d > ("merge: teach -Xours/-Xtheirs to binary ll-merge driver", > 2012-09-08). I do not know offhand if it is just as easy to plumb > the MERGE_FAVOR_{OURS,THEIRS} bits thru the symbolic link codepath, > like that patch did to the binary file codepath. Perhaps the attached (totally untested) patch might be a good starting point. I do not know if you are interested in hacking on Git, and I do not feel offended if you are not, but perhaps somebody else might get interested in seeing if this #leftoverbits is a good direction to go in, and finishing it with docs and tests if it is ;-) merge-recursive.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1d3f8f0d22..3605275ca3 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1026,10 +1026,19 @@ static int merge_file_1(struct merge_options *o, >oid, !o->call_depth); } else if (S_ISLNK(a->mode)) { - oidcpy(>oid, >oid); - - if (!oid_eq(>oid, >oid)) - result->clean = 0; + switch (o->recursive_variant) { + case MERGE_RECURSIVE_NORMAL: + oidcpy(>oid, >oid); + if (!oid_eq(>oid, >oid)) + result->clean = 0; + break; + case MERGE_RECURSIVE_OURS: + oidcpy(>oid, >oid); + break; + case MERGE_RECURSIVE_THEIRS: + oidcpy(>oid, >oid); + break; + } } else die("BUG: unsupported object type in the tree"); }
Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect
Junio C Hamanowrites: > I do not recall people talking about symbolic links but the case of > binary files has been on the wishlist for a long time, and I do not > know of anybody who is working on (or is planning to work on) it. Ah, I misremembered. We've addressed the "binary files" case back in 2012 with a944af1d ("merge: teach -Xours/-Xtheirs to binary ll-merge driver", 2012-09-08). I do not know offhand if it is just as easy to plumb the MERGE_FAVOR_{OURS,THEIRS} bits thru the symbolic link codepath, like that patch did to the binary file codepath.
Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect
Yaroslav Halchenkowrites: > yes it does. Thanks. And that is where I realized that I should have used -X > theirs (not -s theirs), as the instruction on the option for the > (recursive) merge. And now problem is more specific: > > - conflict within file content editing was resolved as instructed > (taking "theirs" version) > > - BUT symlink was not taken from "theirs" and left as unresolved conflict: I wouldn't call it working-as-intended, but this unfortunately is expected. You'd encounter exactly the same behaviour when changes to a binary file conflicts. It is because -X _ONLY_ kicks in (i.e. that is how it is defined) when we would otherwise throw the half-merged result: <<< our version looks like this === their version looks like this >>> and ask you to edit that to a correct resolution. Because you would not normally be given something like the above when merging conflicted changes to symbolic links or to binary files, -X has no chance of affecting the outcome. I do not recall people talking about symbolic links but the case of binary files has been on the wishlist for a long time, and I do not know of anybody who is working on (or is planning to work on) it.
Re: [PATCH] Documentation: consolidate submodule..update
Brandon Williamswrites: >> +The method by which a submodule is updated by 'git submodule update', >> +which is the only affected command, others such as >> +'git checkout --recurse-submodules' are unaffected. It exists for >> +historical reasons, when 'git submodule' was the only command to >> +interact with submodules; settings like `submodule.active` >> +and `pull.rebase` are more specific. It is copied to the config >> +by `git submodule init` from the .gitmodules file. >> +Allowed values here are 'checkout', 'rebase', 'merge' or 'none'. >> +See description of 'update' command in linkgit:git-submodule[1] >> +for their meaning. Note that the '!command' form is intentionally >> +ignored here for security reasons. > > This probably needs to be tweaked a bit to say that the '!command' form > is ignored by submodule init, in that it isn't copied over from the > .gitmodules file, but if it is configured in your config it will be > respected by 'submodule update'. I do not think gitmodules.txt is the place to say anything more than what Stefan's patch says above. Perhaps config.txt should mention that in addition to what the variable with the same in .gitmodules can take, it is allowed to use the !command form. IOW, in config.txt submodule..update:: Specifies how 'git submodule update' should work on the named submodule. In addition to the values that can be specified in (and copied from) `.gitmodules` (see linkgit:gitmodules[5]), `!command` form can also be used. or something, perhaps?
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > What do you think of patch 7 in light of this? If the short-read case > gives us a sane errno, should we even bother trying to consistently > handle its error separately? I like the read_exactly_or_die variant because it makes callers more concise, but on the other hand a caller handling the error can write a more meaningful error message with the right amount of context. So I think you're right that it's better to drop patch 7. Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 05:20:20PM -0700, Jonathan Nieder wrote: > > What do you think of ENODATA? The glibc text for it is pretty > > appropriate. If it's not available everywhere, we'd have to fallback to > > something else (EIO? 0?). I don't know how esoteric it is. The likely > > candidate to be lacking it is Windows. > > ENODATA with a fallback to ESPIPE sounds fine to me. > > read() would never produce ESPIPE because it doesn't seek. > > So that would be: > > /* errno value to use for early EOF */ > #ifndef ENODATA > #define ENODATA ESPIPE > #endif Thanks, I'll re-roll with that (and thank you Stefan for mentioning ENODATA again; I came across it early in my search but discounted it as not quite right semantically. I should have been looking at the strerror() text, which is what really matters). What do you think of patch 7 in light of this? If the short-read case gives us a sane errno, should we even bother trying to consistently handle its error separately? -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote: >> #ifndef EUNDERFLOW >> # ifdef ENODATA >> # define EUNDERFLOW ENODATA >> # else >> # define EUNDERFLOW ESPIPE >> # endif >> #endif > > Right, I think our mails just crossed but I'm leaning in this direction. > I'd prefer to call it SHORT_READ_ERRNO or something, though. Your > "#ifndef EUNDERFLOW" had me thinking that this was something that a real > platform might have, but AFAICT you just made it up. Agreed. Two of the risks of replying too quickly. :) Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 05:17:32PM -0700, Jonathan Nieder wrote: > Jonathan Nieder wrote: > > On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote: > > >> ENODATA is not too bad. On my glibc system it yields "No data available" > >> from strerror(), which is at least comprehensible. > >> > >> We're still left with the question of whether it is defined everywhere > >> (and what to fallback to when it isn't). > > > > So, > > > > #ifndef EUNDERFLOW > > #ifdef ENODATA > > #define ENODATA EUNDERFLOW > > #else > > #define ENODATA ESPIPE > > #endif > > #endif > > > > ? > > Uh, pretend I said > > #ifndef EUNDERFLOW > # ifdef ENODATA > # define EUNDERFLOW ENODATA > # else > # define EUNDERFLOW ESPIPE > # endif > #endif Right, I think our mails just crossed but I'm leaning in this direction. I'd prefer to call it SHORT_READ_ERRNO or something, though. Your "#ifndef EUNDERFLOW" had me thinking that this was something that a real platform might have, but AFAICT you just made it up. -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > I definitely would prefer to avoid EIO, or anything that an actual > read() might return. > > What do you think of ENODATA? The glibc text for it is pretty > appropriate. If it's not available everywhere, we'd have to fallback to > something else (EIO? 0?). I don't know how esoteric it is. The likely > candidate to be lacking it is Windows. ENODATA with a fallback to ESPIPE sounds fine to me. read() would never produce ESPIPE because it doesn't seek. So that would be: /* errno value to use for early EOF */ #ifndef ENODATA #define ENODATA ESPIPE #endif Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jonathan Nieder wrote: > On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote: >> ENODATA is not too bad. On my glibc system it yields "No data available" >> from strerror(), which is at least comprehensible. >> >> We're still left with the question of whether it is defined everywhere >> (and what to fallback to when it isn't). > > So, > > #ifndef EUNDERFLOW > #ifdef ENODATA > #define ENODATA EUNDERFLOW > #else > #define ENODATA ESPIPE > #endif > #endif > > ? Uh, pretend I said #ifndef EUNDERFLOW # ifdef ENODATA # define EUNDERFLOW ENODATA # else # define EUNDERFLOW ESPIPE # endif #endif Sorry for the nonsense. Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 05:13:54PM -0700, Jonathan Nieder wrote: > > I actually prefer "0" of the all of the options discussed. At least it > > is immediately clear that it is not a syscall error. > > I have a basic aversion to ": Success" in error messages. Whenever I > see such an error message, I stop trusting the program that produced it > not to be seriously buggy. Maybe I'm the only one? That's a fair criticism, I think. > If no actual errno works, we could make a custom strerror-like function > with our own custom errors (making them negative to avoid clashing with > standard errno values), but this feels like overkill. Yes, I also thought about that. It feels pretty invasive (I guess we'd wrap strerror() with our own custom function so that callers don't have to remember which one to use). > In the same spirit of misleadingly confusing too-long and too-short, > there is also ERANGE ("Result too large"), which doesn't work here. > There's also EPROTO "Protocol error", but that's about protocols, not > file formats. More vague and therefor maybe more applicable is EBADMSG > "Bad message". There's also ENOMSG "No message of the desired type". > > If the goal is to support debugging, an error like EPIPE "Broken pipe" > or ESPIPE "Invalid seek" would be likely to lead me in the right > direction (wrong file size), even though it is misleading about how > the error surfaced. > > We could also avoid trying to be cute and use something generic like > EIO "Input/output error". I definitely would prefer to avoid EIO, or anything that an actual read() might return. What do you think of ENODATA? The glibc text for it is pretty appropriate. If it's not available everywhere, we'd have to fallback to something else (EIO? 0?). I don't know how esoteric it is. The likely candidate to be lacking it is Windows. -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 08:09:13PM -0400, Jeff King wrote: > On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote: >> After reading this discussion from the sideline, maybe >> >> ENODATA No message is available on the STREAM head read >> queue (POSIX.1) >> >> is not too bad after all. Or >> >> ESPIPE Invalid seek (POSIX.1) >> >> Technically we do not seek, but one could imagine we'd seek there >> to read? > > ENODATA is not too bad. On my glibc system it yields "No data available" > from strerror(), which is at least comprehensible. > > We're still left with the question of whether it is defined everywhere > (and what to fallback to when it isn't). So, #ifndef EUNDERFLOW #ifdef ENODATA #define ENODATA EUNDERFLOW #else #define ENODATA ESPIPE #endif #endif ? Windows has ESPIPE, and I'm not sure what other non-POSIX platform we need to worry about. Thanks, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote: >> All that really matters is the strerror() that the user would see. > > Agreed, though I didn't think those were necessarily standardized. The standard allows them to vary for the sake of internationalization, but they are de facto constants (probably because of application authors like us abusing them ;-)). [...] >> Of course, it's even >> better if we fix the callers and don't try to wedge this into errno. > > Yes. This patch is just a stop-gap. Perhaps we should abandon it > entirely. But I couldn't find a way to fix all the callers. If you have > a function that just returns "-1" when it sees a read_in_full() error, > how does _its_ caller tell the difference? > > I guess the answer is that the interface of the sub-function calling > read_in_full() needs to change. Yes. :/ >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then >> there is EMSGSIZE: >> >> Message too long > > I don't really like that one either. > > I actually prefer "0" of the all of the options discussed. At least it > is immediately clear that it is not a syscall error. I have a basic aversion to ": Success" in error messages. Whenever I see such an error message, I stop trusting the program that produced it not to be seriously buggy. Maybe I'm the only one? If no actual errno works, we could make a custom strerror-like function with our own custom errors (making them negative to avoid clashing with standard errno values), but this feels like overkill. In the same spirit of misleadingly confusing too-long and too-short, there is also ERANGE ("Result too large"), which doesn't work here. There's also EPROTO "Protocol error", but that's about protocols, not file formats. More vague and therefor maybe more applicable is EBADMSG "Bad message". There's also ENOMSG "No message of the desired type". If the goal is to support debugging, an error like EPIPE "Broken pipe" or ESPIPE "Invalid seek" would be likely to lead me in the right direction (wrong file size), even though it is misleading about how the error surfaced. We could also avoid trying to be cute and use something generic like EIO "Input/output error". Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 05:06:58PM -0700, Stefan Beller wrote: > >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then > >> there is EMSGSIZE: > >> > >> Message too long > > > > I don't really like that one either. > > > > I actually prefer "0" of the all of the options discussed. At least it > > is immediately clear that it is not a syscall error. > > > > -Peff > > After reading this discussion from the sideline, maybe > > ENODATA No message is available on the STREAM head read > queue (POSIX.1) > > is not too bad after all. Or > > ESPIPE Invalid seek (POSIX.1) > > Technically we do not seek, but one could imagine we'd seek there > to read? ENODATA is not too bad. On my glibc system it yields "No data available" from strerror(), which is at least comprehensible. We're still left with the question of whether it is defined everywhere (and what to fallback to when it isn't). -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 5:01 PM, Jeff Kingwrote: > >> If you are okay with the too-long/too-short confusion in EOVERFLOW, then >> there is EMSGSIZE: >> >> Message too long > > I don't really like that one either. > > I actually prefer "0" of the all of the options discussed. At least it > is immediately clear that it is not a syscall error. > > -Peff After reading this discussion from the sideline, maybe ENODATA No message is available on the STREAM head read queue (POSIX.1) is not too bad after all. Or ESPIPE Invalid seek (POSIX.1) Technically we do not seek, but one could imagine we'd seek there to read?
Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules
Stefan Beller wrote: > submodule..update can be assigned an arbitrary command via setting > it to "!command". When this command is found in the regular config, Git > ought to just run that command instead of other update mechanisms. > > However if that command is just found in the .gitmodules file, it is > potentially untrusted, which is why we do not run it. Add a test > confirming the behavior. > > Suggested-by: Jonathan Nieder> Signed-off-by: Stefan Beller > --- > t/t7406-submodule-update.sh | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh > index 034914a14f..d718cb00e7 100755 > --- a/t/t7406-submodule-update.sh > +++ b/t/t7406-submodule-update.sh > @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in > .git/config' ' > ) > ' > > +test_expect_success 'submodule update - command in .gitmodules is ignored' ' > + test_when_finished "git -C super reset --hard HEAD^" && > + > + write_script must_not_run.sh <<-EOF && > + >$TEST_DIRECTORY/bad > + EOF > + > + git -C super config -f .gitmodules submodule.submodule.update > "!$TEST_DIRECTORY/must_not_run.sh" && Long line, but I don't think I care. I wish there were a tool like "make style" to format shell scripts. > + git -C super commit -a -m "add command to .gitmodules file" && > + git -C super/submodule reset --hard $submodulesha1^ && > + git -C super submodule update submodule && > + test_path_is_missing bad > +' Per offline discussion, you tested that this fails when you use .git/config instead of .gitmodules, so there aren't any subtle typos here. :) Reviewed-by: Jonathan Nieder Thanks for writing it.
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 04:55:10PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > [EOVERFLOW] > > The file is a regular file, nbyte is greater than 0, the starting > > position is before the end-of-file, and the starting position is > > greater than or equal to the offset maximum established in the open > > file description associated with fildes. > > > > That's not _exactly_ what's going on here, but it's pretty close. And is > > what you would get if you implemented read_exactly() in terms of > > something like pread(). > > All that really matters is the strerror() that the user would see. Agreed, though I didn't think those were necessarily standardized. > For EOVERFLOW, that is > > Value too large to be stored in data type Interestingly that does not seem to be a good description for the case POSIX describes above. > For EILSEQ, it is > > Illegal byte sequence > > Neither is perfect, but the latter seems like a better fit for the kind > of file format errors we're describing here. I find that one actively misleading. It's not the bytes we saw, it's the lack of them. > Of course, it's even > better if we fix the callers and don't try to wedge this into errno. Yes. This patch is just a stop-gap. Perhaps we should abandon it entirely. But I couldn't find a way to fix all the callers. If you have a function that just returns "-1" when it sees a read_in_full() error, how does _its_ caller tell the difference? I guess the answer is that the interface of the sub-function calling read_in_full() needs to change. > If you are okay with the too-long/too-short confusion in EOVERFLOW, then > there is EMSGSIZE: > > Message too long I don't really like that one either. I actually prefer "0" of the all of the options discussed. At least it is immediately clear that it is not a syscall error. -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > [EOVERFLOW] > The file is a regular file, nbyte is greater than 0, the starting > position is before the end-of-file, and the starting position is > greater than or equal to the offset maximum established in the open > file description associated with fildes. > > That's not _exactly_ what's going on here, but it's pretty close. And is > what you would get if you implemented read_exactly() in terms of > something like pread(). All that really matters is the strerror() that the user would see. For EOVERFLOW, that is Value too large to be stored in data type For EILSEQ, it is Illegal byte sequence Neither is perfect, but the latter seems like a better fit for the kind of file format errors we're describing here. Of course, it's even better if we fix the callers and don't try to wedge this into errno. If you are okay with the too-long/too-short confusion in EOVERFLOW, then there is EMSGSIZE: Message too long Hope that helps, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote: >> Jef King wrote: >>> errno = 0; >>> read_in_full(fd, buf, sizeof(buf)); >>> if (errno) >>> die_errno("oops"); [...] >>> in general we frown on it for calls like >>> read(). >> >> Correct. Actually more than "frown on": except for with the few calls >> like strtoul that are advertised to work this way, POSIX does not make >> the guarantee the above code would rely on, at all. >> >> So it's not just frowned upon: it's so unportable that the standard >> calls it out as something that won't work. > > Is it unportable? Certainly read() is free reset errno to zero on > success. But is it allowed to set it to another random value? > > I think we're getting pretty academic here, but I'm curious if you have > a good reference. Yes, it is allowed to set it to another random value. It's a common thing for implementations to do when they hit a recoverable error, which they sometimes do (e.g. think EFAULT, EINTR, ETIMEDOUT, or an implementation calling strtoul and getting EINVAL or ERANGE). glibc only recently started trying to avoid this kind of behavior, because even though the standard allows it, users hate it. POSIX.1-2008 XSI 2.3 "Error Numbers" says Some functions provide the error number in a variable accessed through the symbol errno, defined by including the header. The value of errno should only be examined when it is indicated to be valid by a function's return value. See http://austingroupbugs.net/view.php?id=374 for an example where someone wanted to add a new guarantee of that kind to a function. Hope that helps, Jonathan
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 07:37:32PM -0400, Jeff King wrote: > > Correct. Actually more than "frown on": except for with the few calls > > like strtoul that are advertised to work this way, POSIX does not make > > the guarantee the above code would rely on, at all. > > > > So it's not just frowned upon: it's so unportable that the standard > > calls it out as something that won't work. > > Is it unportable? Certainly read() is free reset errno to zero on > success. But is it allowed to set it to another random value? > > I think we're getting pretty academic here, but I'm curious if you have > a good reference. Answering my own question. POSIX says: No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0. The setting of errno after a successful call to a function is unspecified unless the description of that function specifies that errno shall not be modified. So that does seem to outlaw errno-only checks for most functions. It makes me wonder if the recent getdelim() fix is technically violating this. It should instead be explicitly checking for feof(). > IMHO as long as it _is_ deterministic and recognize as not an error from > read(), that's the best we can do. Which is why I went with "0" in the > first place. Seeing "read error: success" is a common-ish idiom (to me > anyway) for "read didn't fail, but some user-space logic did", if only > because it often happens accidentally. Another option I ran across from POSIX: [EOVERFLOW] The file is a regular file, nbyte is greater than 0, the starting position is before the end-of-file, and the starting position is greater than or equal to the offset maximum established in the open file description associated with fildes. That's not _exactly_ what's going on here, but it's pretty close. And is what you would get if you implemented read_exactly() in terms of something like pread(). -Peff
Re: [PATCH] Documentation: consolidate submodule..update
Stefan Beller wrote: > By as-is I refer to origin/pu. Ah, okay. I'm not in that habit because pu frequently loses topics --- it's mostly useful as a tool to (1) distribute topic branches and (2) check whether they are compatible with each other. [...] > On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Niederwrote: >> Maybe we should work on first wordsmithing the doc and then figuring >> out where it goes? The current state of the doc with (?) three >> different texts, > > such that each different text highlights each locations purpose. > >> all wrong, > > Care to spell out this bold claim? In the state of "master", "man git-config" tells me that [submodule ""] update = ... sets the "default update procedure for a submodule". It suggests that I read about "git submodule update" in git-submodule(1) for more details. This is problematic because 1. It doesn't tell me when I should and should not set it. 2. I would expect "git checkout --recurse-submodules" to respect the default update procedure. 3. It doesn't tell me what commands like "git fetch --recurse-submodules" will do. 4. It doesn't point me to better alternatives, when this configuration doesn't fit my use case. "man git-submodule" doesn't have a section on submodule..update. It has references scattered throughout the manpage. It only tells me how the setting affects the "git submodule update" command and doesn't address the problems (1)-(4). "man gitmodules" also refers to this setting as the "default update procedure for the named submodule". It doesn't answer the questions (1)-(4) either. Thanks and hope that helps, Jonathan
Re: [PATCH v2 4/5] sha1_name: Parse less while finding common prefix
On Mon, Sep 25, 2017 at 2:54 AM, Derrick Stoleewrote: > Create get_hex_char_from_oid() to parse oids one hex character at a > time. This prevents unnecessary copying of hex characters in > extend_abbrev_len() when finding the length of a common prefix. > > p0008.1: find_unique_abbrev() for existing objects > -- > > For 10 repeated tests, each checking 100,000 known objects, we find the > following results when running in a Linux VM: > > | | Pack | Packed | Loose | Base | New|| > | Repo | Files | Objects | Objects| Time | Time | Rel% | > |---|---|-||||| > | Git | 1 | 230078 | 0 | 0.08 s | 0.08 s | 0.0% | > | Git | 5 | 230162 | 0 | 0.17 s | 0.16 s | - 5.9% | > | Git | 4 | 154310 | 75852 | 0.14 s | 0.12 s | -14.3% | > | Linux | 1 | 5606645 | 0 | 0.50 s | 0.25 s | -50.0% | > | Linux |24 | 5606645 | 0 | 2.41 s | 2.08 s | -13.7% | > | Linux |23 | 5283204 | 323441 | 1.99 s | 1.69 s | -15.1% | > | VSTS | 1 | 4355923 | 0 | 0.40 s | 0.22 s | -45.0% | > | VSTS |32 | 4355923 | 0 | 2.09 s | 1.99 s | - 4.8% | > | VSTS |31 | 4276829 | 79094 | 3.60 s | 3.20 s | -11.1% | > > For the Windows repo running in Windows Subsystem for Linux: > > Pack Files: 50 > Packed Objects: 22,385,898 > Loose Objects: 492 > Base Time: 4.61 s > New Time: 4.61 s > Rel %: 0.0% > > p0008.2: find_unique_abbrev() for missing objects > - > > For 10 repeated tests, each checking 100,000 missing objects, we find > the following results when running in a Linux VM: > > | | Pack | Packed | Loose | Base | New|| > | Repo | Files | Objects | Objects| Time | Time | Rel% | > |---|---|-||||| > | Git | 1 | 230078 | 0 | 0.06 s | 0.05 s | -16.7% | > | Git | 5 | 230162 | 0 | 0.14 s | 0.15 s | + 7.1% | > | Git | 4 | 154310 | 75852 | 0.12 s | 0.12 s | 0.0% | > | Linux | 1 | 5606645 | 0 | 0.40 s | 0.17 s | -57.5% | > | Linux |24 | 5606645 | 0 | 1.59 s | 1.30 s | -18.2% | > | Linux |23 | 5283204 | 323441 | 1.23 s | 1.10 s | -10.6% | > | VSTS | 1 | 4355923 | 0 | 0.25 s | 0.12 s | -52.0% | > | VSTS |32 | 4355923 | 0 | 1.45 s | 1.34 s | - 7.6% | > | VSTS |31 | 4276829 | 79094 | 1.59 s | 1.34 s | -15.7% | > > For the Windows repo running in Windows Subsystem for Linux: > > Pack Files: 50 > Packed Objects: 22,385,898 > Loose Objects: 492 > Base Time: 3.91 s > New Time: 3.08 s > Rel %: -21.1% > These number look pretty cool! > Signed-off-by: Derrick Stolee > > Signed-off-by: Derrick Stolee double signoff? > --- > sha1_name.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/sha1_name.c b/sha1_name.c > index f2a1ebe49..bb47b6702 100644 > --- a/sha1_name.c > +++ b/sha1_name.c > @@ -480,13 +480,22 @@ struct min_abbrev_data { > char *hex; > }; > > +static inline char get_hex_char_from_oid(const struct object_id *oid, int i) 'i' is not very descriptive, maybe add a comment? (I realize it is just walking through the char*s one by one) Maybe this function (together with the change in the while below) could go into hex.c as "int progressively_cmp_oids" that returns the position at which the given oids differ? > +{ > + static const char hex[] = "0123456789abcdef"; > + > + if ((i & 1) == 0) > + return hex[oid->hash[i >> 1] >> 4]; > + else > + return hex[oid->hash[i >> 1] & 0xf]; > +} sha1_to_hex_r has very similar code, though it iterates less and covers both cases in the loop body. That is the actual reason I propose moving this function (or a variant thereof) to hex.c as there we can share code.
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 04:33:16PM -0700, Jonathan Nieder wrote: > > If I do this: > > > > errno = 0; > > read_in_full(fd, buf, sizeof(buf)); > > if (errno) > > die_errno("oops"); > > > > then we'll claim an error, even though there was none (remember that > > it's only an error for _some_ callers to not read the whole length). > > > > This may be sufficiently odd that we don't need to care about it. There > > are some calls (like strtoul) which require this kind of clear-and-check > > strategy with errno, but in general we frown on it for calls like > > read(). > > Correct. Actually more than "frown on": except for with the few calls > like strtoul that are advertised to work this way, POSIX does not make > the guarantee the above code would rely on, at all. > > So it's not just frowned upon: it's so unportable that the standard > calls it out as something that won't work. Is it unportable? Certainly read() is free reset errno to zero on success. But is it allowed to set it to another random value? I think we're getting pretty academic here, but I'm curious if you have a good reference. > > We could also introduce a better helper like this: > > > > int read_exactly(int fd, void *buf, size_t count) > > { > > ssize_t ret = read_in_full(fd, buf, count); > > if (ret < 0) > > return -1; > > if (ret != count) { > > errno = EILSEQ; > > return -1; > > } > > return 0; > > } > > > > Then we know that touching errno always coincides with an error return. > > And it's shorter to check "< 0" compared to "!= count" in the caller. > > But of course a caller which wants to distinguish the two cases for its > > error messages then has to look at errno: > > That sounds nice, but it doesn't solve the original problem of callers > using read_in_full that way. Right. None of the options discussed in this patch can do so. They can only take a caller which doesn't distinguish between the cases, and give it a deterministic value in errno for the short-read case. IMHO as long as it _is_ deterministic and recognize as not an error from read(), that's the best we can do. Which is why I went with "0" in the first place. Seeing "read error: success" is a common-ish idiom (to me anyway) for "read didn't fail, but some user-space logic did", if only because it often happens accidentally. -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: >> --- a/wrapper.c >> +++ b/wrapper.c >> @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) >> ssize_t loaded = xread(fd, p, count); >> if (loaded < 0) >> return -1; >> -if (loaded == 0) >> +if (loaded == 0) { >> +errno = EILSEQ; >> return total; >> +} > > If I do this: > > errno = 0; > read_in_full(fd, buf, sizeof(buf)); > if (errno) > die_errno("oops"); > > then we'll claim an error, even though there was none (remember that > it's only an error for _some_ callers to not read the whole length). > > This may be sufficiently odd that we don't need to care about it. There > are some calls (like strtoul) which require this kind of clear-and-check > strategy with errno, but in general we frown on it for calls like > read(). Correct. Actually more than "frown on": except for with the few calls like strtoul that are advertised to work this way, POSIX does not make the guarantee the above code would rely on, at all. So it's not just frowned upon: it's so unportable that the standard calls it out as something that won't work. > We could also introduce a better helper like this: > > int read_exactly(int fd, void *buf, size_t count) > { > ssize_t ret = read_in_full(fd, buf, count); > if (ret < 0) > return -1; > if (ret != count) { > errno = EILSEQ; > return -1; > } > return 0; > } > > Then we know that touching errno always coincides with an error return. > And it's shorter to check "< 0" compared to "!= count" in the caller. > But of course a caller which wants to distinguish the two cases for its > error messages then has to look at errno: That sounds nice, but it doesn't solve the original problem of callers using read_in_full that way. Thanks, Jonathan
Re: [PATCH 7/7] add xread_in_full() helper
On Mon, Sep 25, 2017 at 03:16:08PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > Many callers of read_in_full() expect to see exactly "len" > > bytes, and die if that isn't the case. > > micronit: Can this be named read_in_full_or_die? > > Otherwise it's too easy to mistake for a function like xread, which > has different semantics. > > I realize that xmalloc, xmemdupz, etc use a different convention. > That's yet another reason to be explicit, IMHO. Yeah, we've definitely misused the "x" prefix for different things. I agree that being explicit probably can't hurt. I wonder if it's worth calling it "read_exactly_or_die()" to emphasize that not reading enough bytes is a die-able offense. -Peff
Re: [PATCH 6/7] worktree: check the result of read_in_full()
On Mon, Sep 25, 2017 at 03:14:26PM -0700, Jonathan Nieder wrote: > > diff --git a/builtin/worktree.c b/builtin/worktree.c > > index 2f4a4ef9cd..87b3d70b0b 100644 > > --- a/builtin/worktree.c > > +++ b/builtin/worktree.c > > @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf > > *reason) > > } > > len = xsize_t(st.st_size); > > path = xmallocz(len); > > - read_in_full(fd, path, len); > > + if (read_in_full(fd, path, len) != len) { > > + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did > > not match stat (%s)"), > > + id, strerror(errno)); > > I'm a little confused. The 'if' condition checks for a read error but > the message says something about 'stat'. > > If we're trying to double-check the 'stat' result, shouldn't we read > all the way to EOF in case the file got longer? If you wanted to really double-check the stat result, yes you'd want to make sure there aren't extra bytes either. But really we're just making sure we were able to read _enough_ bytes. To be honest, I'm tempted to rip out the whole thing and replace it with strbuf_read_file(), which seems more straightforward. The function does seem to rely on the stat() to get the mtime, so we'd have to leave that part in. -Peff
Re: [PATCH 3/7] read_in_full: reset errno before reading
On Mon, Sep 25, 2017 at 03:09:14PM -0700, Jonathan Nieder wrote: > Jeff King wrote: > > > In an ideal world, callers would always distinguish between > > these cases and give a useful message for each. But as an > > easy way to make our imperfect world better, let's reset > > errno to a known value. The best we can do is "0", which > > will yield something like: > > > > unable to read: Success > > > > That's not great, but at least it's deterministic and makes > > it clear that we didn't see an error from read(). > > Yuck. Can we set errno to something more specific instead? Yes, but what? You've suggested EILSEQ here, but that feels a bit...funny. I guess at least it's unlikely that read() would ever set errno to that itself, so we can be reasonably sure that seeing it is a sign of a short read. I thought ERANGE might be a possible candidate, but it doesn't quite fit. Ditto for ENODATA. I _would_ like to have something meaningful, as it would mean the whole xread_in_full() nonsense from the final patch would be unnecessary. It would simply be reasonable to show the errno after read_in_full. Another question is whether EILSEQ (or the others) is actually defined everywhere we compile. If it isn't, we'll have to define it ourselves (but to what? strerror() won't return anything useful for it). > read(2) also doesn't promise not to clobber errno on success. Yes, though it's only a problem if you're using something other than 0. Speaking of funny errno clobbering, with your patch: > diff --git a/wrapper.c b/wrapper.c > index 61aba0b5c1..1842a99b87 100644 > --- a/wrapper.c > +++ b/wrapper.c > @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) > ssize_t loaded = xread(fd, p, count); > if (loaded < 0) > return -1; > - if (loaded == 0) > + if (loaded == 0) { > + errno = EILSEQ; > return total; > + } If I do this: errno = 0; read_in_full(fd, buf, sizeof(buf)); if (errno) die_errno("oops"); then we'll claim an error, even though there was none (remember that it's only an error for _some_ callers to not read the whole length). This may be sufficiently odd that we don't need to care about it. There are some calls (like strtoul) which require this kind of clear-and-check strategy with errno, but in general we frown on it for calls like read(). We could also introduce a better helper like this: int read_exactly(int fd, void *buf, size_t count) { ssize_t ret = read_in_full(fd, buf, count); if (ret < 0) return -1; if (ret != count) { errno = EILSEQ; return -1; } return 0; } Then we know that touching errno always coincides with an error return. And it's shorter to check "< 0" compared to "!= count" in the caller. But of course a caller which wants to distinguish the two cases for its error messages then has to look at errno: if (read_exactly(fd, buf, len) < 0) { if (errno == EILSEQ) /* eek, now this abstraction is leaky */ die("short read"); else die_errno("read error"); } I dunno. All options seem pretty gross to me. -Peff
Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
On Mon, Sep 25, 2017 at 02:59:27PM -0700, Jonathan Nieder wrote: > > But during the conflict resolution in c50424a6f0 (Merge > > branch 'jk/write-in-full-fix', 2017-09-25), this morphed > > into > [...] > Good eyes. Thanks. Sort of. :) I usually continually rebase my topics until they end up empty. So I notice bits like this either during conflict resolution, or when the topic is left unexpectedly non-empty. It's not foolproof, though. Sometimes rebasing a topic on itself is so painful that I "rebase --skip" liberally in the early parts, and would miss any merge funniness. > By the way, the description from that merge commit > > Many codepaths did not diagnose write failures correctly when disks > go full, due to their misuse of write_in_full() helper function, > which have been corrected. > > does not look accurate to me. At least the "Many codepaths" part. > All of those changes except for three should be no-ops. The scariest > one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c, > which is about overflowing a "long" on Windows instead of error > handling. The ones in config.c are definitely wrong, too. Though I'd agree that "many" might be overstating it. -Peff
[PATCH] t7406: submodule..update command must not be run from .gitmodules
submodule..update can be assigned an arbitrary command via setting it to "!command". When this command is found in the regular config, Git ought to just run that command instead of other update mechanisms. However if that command is just found in the .gitmodules file, it is potentially untrusted, which is why we do not run it. Add a test confirming the behavior. Suggested-by: Jonathan NiederSigned-off-by: Stefan Beller --- updated to use the super robust script. Thanks Jonathan, Stefan t/t7406-submodule-update.sh | 14 ++ 1 file changed, 14 insertions(+) diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh index 034914a14f..d718cb00e7 100755 --- a/t/t7406-submodule-update.sh +++ b/t/t7406-submodule-update.sh @@ -406,6 +406,20 @@ test_expect_success 'submodule update - command in .git/config' ' ) ' +test_expect_success 'submodule update - command in .gitmodules is ignored' ' + test_when_finished "git -C super reset --hard HEAD^" && + + write_script must_not_run.sh <<-EOF && + >$TEST_DIRECTORY/bad + EOF + + git -C super config -f .gitmodules submodule.submodule.update "!$TEST_DIRECTORY/must_not_run.sh" && + git -C super commit -a -m "add command to .gitmodules file" && + git -C super/submodule reset --hard $submodulesha1^ && + git -C super submodule update submodule && + test_path_is_missing bad +' + cat << EOF >expect Execution of 'false $submodulesha1' failed in submodule path 'submodule' EOF -- 2.14.0.rc0.3.g6c2e499285
Re: [PATCH] Documentation: consolidate submodule..update
On Mon, Sep 25, 2017 at 3:22 PM, Jonathan Niederwrote: > Stefan Beller wrote: >> On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williams wrote: >>> On 09/25, Stefan Beller wrote: > Have one place to explain the effects of setting submodule..update instead of two. Signed-off-by: Stefan Beller --- >> I disagree. Actually, I think the git-config(1) blurb could just >> point here, and that the text here ought to be clear about what >> commands it affects and how an end user would use it. > > I tend to agree with the consolidation. Something like this? >>> >>> I like the consolidation, its easier to keep up to date when its only in >>> one place. >> >> After thinking about it further, I'd like to withdraw this patch >> for now. >> >> The reason is that this would also forward point to >> git-submodule.txt, such that we'd still have 2 places >> in which it is explained. The current situation with explaining >> it in 3 places is not too bad as each place hints at their specific >> circumstance: >> git-submodule.txt explains the values to set itself. >> gitmodules.txt explains what the possible fields in that file are, >> and regarding this field it points out it is ignored in the init call. >> config.txt: actually describe the effects of the setting. >> >> I think we can keep it as-is for now. > > Sorry, I got lost. Which state is as-is? By as-is I refer to origin/pu. > As a user, how do I find out what submodule.*.update is going to do > and which commands will respect it? The user would discover it via 'man git-config' I would assume, which covers any config variable? It also directs to git-submodule which is more detailed, but the text there is suitable for the casual reader. (pu has origin/sb/doc-config-submodule-update) > Maybe we should work on first wordsmithing the doc and then figuring > out where it goes? The current state of the doc with (?) three > different texts, such that each different text highlights each locations purpose. > all wrong, Care to spell out this bold claim? Thanks, Stefan
Re: git ls-tree -d doesn't work if the specified path is the repository root?
Hi, Roy Wellington wrote: > When I run `git ls-tree -d HEAD -- subdir` from the root of my > repository, where `subdir` is a subdirectory in that root, I get the > treehash of that subdirectory. This is what I expect. > > However, if I merely replace `subdir` with `.` (the root of the > repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns > the treehashes of the /children/ of the root, instead of the root > itself, contrary to the documented behavior of -d. Can you be more specific? What documentation led you to this expectation? I don't have a strong opinion about this behavior, but in any case if the documentation and command disagree about the correct behavior then one of them needs to be fixed. :) > Is there some reason for this? This behavior seems like a bug to me: > it means that prior to calling ls-tree I need to check if the > referenced path happens to be the root, and if so, find some other > means (rev-parse?) of converting it to a treehash. Does git rev-parse HEAD:subdir work better for what you're trying to accomplish? To get the root of the repository, you can use git rev-parse HEAD: Thanks and hope that helps, Jonathan
Re: [PATCH] Documentation: consolidate submodule..update
Stefan Beller wrote: > On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williamswrote: >> On 09/25, Stefan Beller wrote: >>> Have one place to explain the effects of setting submodule..update >>> instead of two. >>> >>> Signed-off-by: Stefan Beller >>> --- > I disagree. Actually, I think the git-config(1) blurb could just > point here, and that the text here ought to be clear about what > commands it affects and how an end user would use it. I tend to agree with the consolidation. >>> >>> Something like this? >> >> I like the consolidation, its easier to keep up to date when its only in >> one place. > > After thinking about it further, I'd like to withdraw this patch > for now. > > The reason is that this would also forward point to > git-submodule.txt, such that we'd still have 2 places > in which it is explained. The current situation with explaining > it in 3 places is not too bad as each place hints at their specific > circumstance: > git-submodule.txt explains the values to set itself. > gitmodules.txt explains what the possible fields in that file are, > and regarding this field it points out it is ignored in the init call. > config.txt: actually describe the effects of the setting. > > I think we can keep it as-is for now. Sorry, I got lost. Which state is as-is? As a user, how do I find out what submodule.*.update is going to do and which commands will respect it? Maybe we should work on first wordsmithing the doc and then figuring out where it goes? The current state of the doc with (?) three different texts, all wrong, doesn't seem like a good state to preserve. Thanks, Jonathan
Re: [PATCH] Documentation: consolidate submodule..update
On Mon, Sep 25, 2017 at 12:17 PM, Brandon Williamswrote: > On 09/25, Stefan Beller wrote: >> Have one place to explain the effects of setting submodule..update >> instead of two. >> >> Signed-off-by: Stefan Beller >> --- >> >> I disagree. Actually, I think the git-config(1) blurb could just >> >> point here, and that the text here ought to be clear about what >> >> commands it affects and how an end user would use it. >> > >> > I tend to agree with the consolidation. >> >> Something like this? > > I like the consolidation, its easier to keep up to date when its only in > one place. After thinking about it further, I'd like to withdraw this patch for now. The reason is that this would also forward point to git-submodule.txt, such that we'd still have 2 places in which it is explained. The current situation with explaining it in 3 places is not too bad as each place hints at their specific circumstance: git-submodule.txt explains the values to set itself. gitmodules.txt explains what the possible fields in that file are, and regarding this field it points out it is ignored in the init call. config.txt: actually describe the effects of the setting. I think we can keep it as-is for now. Thanks, Stefan
Re: [PATCH 7/7] add xread_in_full() helper
Jeff King wrote: > Many callers of read_in_full() expect to see exactly "len" > bytes, and die if that isn't the case. micronit: Can this be named read_in_full_or_die? Otherwise it's too easy to mistake for a function like xread, which has different semantics. I realize that xmalloc, xmemdupz, etc use a different convention. That's yet another reason to be explicit, IMHO. Thanks, Jonathan
Re: [PATCH 6/7] worktree: check the result of read_in_full()
Jeff King wrote: > We try to read "len" bytes into a buffer and just assume > that it happened correctly. In practice this should usually > be the case, since we just stat'd the file to get the > length. But we could be fooled by transient errors or by > other processes racily truncating the file. > > Let's be more careful. There's a slim chance this could > catch a real error, but it also prevents people and tools > from getting worried while reading the code. > > Signed-off-by: Jeff King> --- > builtin/worktree.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/builtin/worktree.c b/builtin/worktree.c > index 2f4a4ef9cd..87b3d70b0b 100644 > --- a/builtin/worktree.c > +++ b/builtin/worktree.c > @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf > *reason) > } > len = xsize_t(st.st_size); > path = xmallocz(len); > - read_in_full(fd, path, len); > + if (read_in_full(fd, path, len) != len) { > + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did > not match stat (%s)"), > + id, strerror(errno)); I'm a little confused. The 'if' condition checks for a read error but the message says something about 'stat'. If we're trying to double-check the 'stat' result, shouldn't we read all the way to EOF in case the file got longer? Puzzled, Jonathan
Re: [PATCH 5/7] worktree: use xsize_t to access file size
Jeff King wrote: > To read the "gitdir" file into memory, we stat the file and > allocate a buffer. But we store the size in an "int", which > may be truncated. We should use a size_t and xsize_t(), > which will detect truncation. > > An overflow is unlikely for a "gitdir" file, but it's a good > practice to model. > > Signed-off-by: Jeff King> --- > builtin/worktree.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check
Jeff King wrote: > Suggested-by: Jonathan Nieder> Signed-off-by: Jeff King > --- > builtin/get-tar-commit-id.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) Reviewed-by: Jonathan Nieder
Re: [PATCH 3/7] read_in_full: reset errno before reading
Jeff King wrote: > In an ideal world, callers would always distinguish between > these cases and give a useful message for each. But as an > easy way to make our imperfect world better, let's reset > errno to a known value. The best we can do is "0", which > will yield something like: > > unable to read: Success > > That's not great, but at least it's deterministic and makes > it clear that we didn't see an error from read(). Yuck. Can we set errno to something more specific instead? read(2) also doesn't promise not to clobber errno on success. How about something like the following? -- >8 -- Subject: read_in_full: set errno for partial reads Many callers of read_in_full() complain when we do not read their full byte-count. But a check like: if (read_in_full(fd, buf, len) != len) return error_errno("unable to read"); conflates two problem conditions: 1. A real error from read(). 2. There were fewer than "len" bytes available. In the first case, showing the user strerror(errno) is useful. In the second, we may see a random errno that was set by some previous system call. In an ideal world, callers would always distinguish between these cases and give a useful message for each. But as an easy way to make our imperfect world better, let's set errno in the second case to a known value. There is no standard "Unexpected end of file" errno value, so instead use EILSEQ, which yields a message like unable to read: Illegal byte sequence That's not great, but at least it's deterministic and makes it possible to reverse-engineer what went wrong. Change-Id: I48138052f71b7c6cf35c2d00a10dc8febaca4f10 Signed-off-by: Jonathan Nieder--- wrapper.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/wrapper.c b/wrapper.c index 61aba0b5c1..1842a99b87 100644 --- a/wrapper.c +++ b/wrapper.c @@ -318,8 +318,10 @@ ssize_t read_in_full(int fd, void *buf, size_t count) ssize_t loaded = xread(fd, p, count); if (loaded < 0) return -1; - if (loaded == 0) + if (loaded == 0) { + errno = EILSEQ; return total; + } count -= loaded; p += loaded; total += loaded; -- 2.14.1.821.g8fa685d3b7
Re: [PATCH 2/7] notes-merge: drop dead zero-write code
Jeff King wrote: > We call write_in_full() with a size that we know is greater > than zero. The return value can never be zero, then, since > write_in_full() converts such a failed write() into ENOSPC > and returns -1. We can just drop this branch of the error > handling entirely. > > Suggested-by: Jonathan Nieder> Signed-off-by: Jeff King > --- > notes-merge.c | 2 -- > 1 file changed, 2 deletions(-) Reviewed-by: Jonathan Nieder Thanks for tying up this loose end.
Re: [PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
Jeff King wrote: > Commit 06f46f237a (avoid "write_in_full(fd, buf, len) != > len" pattern, 2017-09-13) converted this callsite from: > > write_in_full(...) != 1 > > to > > write_in_full(...) < 0 > > But during the conflict resolution in c50424a6f0 (Merge > branch 'jk/write-in-full-fix', 2017-09-25), this morphed > into > > write_in_full(...) < 1 > > This behaves as we want, but we prefer to avoid modeling the > "less than length" error-check which can be subtly buggy, as > shown in efacf609c8 (config: avoid "write_in_full(fd, buf, > len) < len" pattern, 2017-09-13). > > Signed-off-by: Jeff King> --- > refs/files-backend.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Good eyes. Thanks. Reviewed-by: Jonathan Nieder By the way, the description from that merge commit Many codepaths did not diagnose write failures correctly when disks go full, due to their misuse of write_in_full() helper function, which have been corrected. does not look accurate to me. At least the "Many codepaths" part. All of those changes except for three should be no-ops. The scariest one is the 'long ret = write_in_full(fd, buf, size)' in notes-merge.c, which is about overflowing a "long" on Windows instead of error handling. Thanks, Jonathan
macOS Git installer on SourceForge
Hi, one of my colleagues made me aware today that the macOS Git installer is hosted on SourceForge and advertised on the official git-scm page: https://git-scm.com/download/mac SourceForge had some bad press as they apparently bundled junkware in their downloads: https://www.howtogeek.com/218764/warning-don%E2%80%99t-download-software-from-sourceforge-if-you-can-help-it/ Is this a reason for concern? Thanks, Lars
[PATCH 7/7] add xread_in_full() helper
Many callers of read_in_full() expect to see exactly "len" bytes, and die if that isn't the case. Since there are two interesting error cases: 1. We saw a read() error. 2. We saw EOF with fewer bytes than expected. we would ideally distinguish between them when reporting to the user. Let's add a helper to make this easier. We can convert cases that currently lump the two conditions together (improving their error messages) and ones that already distinguish (shortening their code). There are a number of read_in_full() calls left that we can't (or shouldn't) convert, for one or more of these reasons: - Some are not trying to read an exact number in the first place. So they only care about case 1, and are fine. - Some call error() instead of die(). We could possibly accommodate these, at the expense of making our helper a bit more clunky. - Some print nothing and return a failing code up the stack, in which case somebody eventually looks at errno. Handling these with a helper would be hard. Ideally there would be some errno value that we could set for short read, but there's no standard one (ENODATA is kind-of accurate, but it's not really "no data"; it's "not enough data", and we wouldn't want to get confused with a real ENODATA error). Signed-off-by: Jeff King--- builtin/get-tar-commit-id.c | 3 +-- bulk-checkin.c | 4 +--- cache.h | 7 +++ combine-diff.c | 8 +--- csum-file.c | 6 +- pack-write.c| 3 +-- wrapper.c | 11 +++ 7 files changed, 23 insertions(+), 19 deletions(-) diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c index 259ad40339..0b97196afc 100644 --- a/builtin/get-tar-commit-id.c +++ b/builtin/get-tar-commit-id.c @@ -24,8 +24,7 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix) if (argc != 1) usage(builtin_get_tar_commit_id_usage); - if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE) - die_errno("git get-tar-commit-id: read error"); + xread_in_full(0, buffer, HEADERSIZE, "stdin"); if (header->typeflag[0] != 'g') return 1; if (!skip_prefix(content, "52 comment=", )) diff --git a/bulk-checkin.c b/bulk-checkin.c index 9a1f6c49ab..a9743785fb 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -115,9 +115,7 @@ static int stream_to_pack(struct bulk_checkin_state *state, if (size && !s.avail_in) { ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf); - if (read_in_full(fd, ibuf, rsize) != rsize) - die("failed to read %d bytes from '%s'", - (int)rsize, path); + xread_in_full(fd, ibuf, rsize, path); offset += rsize; if (*already_hashed_to < offset) { size_t hsize = offset - *already_hashed_to; diff --git a/cache.h b/cache.h index 49b083ee0a..deec532228 100644 --- a/cache.h +++ b/cache.h @@ -1757,6 +1757,13 @@ extern ssize_t read_in_full(int fd, void *buf, size_t count); extern ssize_t write_in_full(int fd, const void *buf, size_t count); extern ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset); +/* + * Like read_in_full(), but die on errors or a failure to read exactly "count" + * bytes. The "desc" string will be inserted into the error message as "reading + * %s". + */ +extern void xread_in_full(int fd, void *buf, size_t count, const char *desc); + static inline ssize_t write_str_in_full(int fd, const char *str) { return write_in_full(fd, str, strlen(str)); diff --git a/combine-diff.c b/combine-diff.c index 9e163d5ada..4bdf797a10 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -1027,7 +1027,6 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, free_filespec(df); } else if (0 <= (fd = open(elem->path, O_RDONLY))) { size_t len = xsize_t(st.st_size); - ssize_t done; int is_file, i; elem->mode = canon_mode(st.st_mode); @@ -1042,12 +1041,7 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent, result_size = len; result = xmallocz(len); - - done = read_in_full(fd, result, len); - if (done < 0) - die_errno("read error '%s'", elem->path); - else if (done < len) - die("early EOF '%s'", elem->path); + xread_in_full(fd, result, len, elem->path); /* If not a fake symlink, apply filters, e.g. autocrlf */
[PATCH 6/7] worktree: check the result of read_in_full()
We try to read "len" bytes into a buffer and just assume that it happened correctly. In practice this should usually be the case, since we just stat'd the file to get the length. But we could be fooled by transient errors or by other processes racily truncating the file. Let's be more careful. There's a slim chance this could catch a real error, but it also prevents people and tools from getting worried while reading the code. Signed-off-by: Jeff King--- builtin/worktree.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/builtin/worktree.c b/builtin/worktree.c index 2f4a4ef9cd..87b3d70b0b 100644 --- a/builtin/worktree.c +++ b/builtin/worktree.c @@ -59,7 +59,11 @@ static int prune_worktree(const char *id, struct strbuf *reason) } len = xsize_t(st.st_size); path = xmallocz(len); - read_in_full(fd, path, len); + if (read_in_full(fd, path, len) != len) { + strbuf_addf(reason, _("Removing worktrees/%s: gitdir read did not match stat (%s)"), + id, strerror(errno)); + return 1; + } close(fd); while (len && (path[len - 1] == '\n' || path[len - 1] == '\r')) len--; -- 2.14.1.1148.ga2561536a1
[PATCH 5/7] worktree: use xsize_t to access file size
To read the "gitdir" file into memory, we stat the file and allocate a buffer. But we store the size in an "int", which may be truncated. We should use a size_t and xsize_t(), which will detect truncation. An overflow is unlikely for a "gitdir" file, but it's a good practice to model. Signed-off-by: Jeff King--- builtin/worktree.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/builtin/worktree.c b/builtin/worktree.c index de26849f55..2f4a4ef9cd 100644 --- a/builtin/worktree.c +++ b/builtin/worktree.c @@ -38,7 +38,8 @@ static int prune_worktree(const char *id, struct strbuf *reason) { struct stat st; char *path; - int fd, len; + int fd; + size_t len; if (!is_directory(git_path("worktrees/%s", id))) { strbuf_addf(reason, _("Removing worktrees/%s: not a valid directory"), id); @@ -56,7 +57,7 @@ static int prune_worktree(const char *id, struct strbuf *reason) id, strerror(errno)); return 1; } - len = st.st_size; + len = xsize_t(st.st_size); path = xmallocz(len); read_in_full(fd, path, len); close(fd); -- 2.14.1.1148.ga2561536a1
[PATCH 4/7] get-tar-commit-id: prefer "!=" for read_in_full() error check
Comparing the result of read_in_full() using less-than is potentially dangerous, as discussed in 561598cfcf (read_pack_header: handle signed/unsigned comparison in read result, 2017-09-13). The instance in get-tar-commit-id is OK, because the HEADERSIZE macro expands to a signed integer. But if it were switched to an unsigned type like: size_t HEADERSIZE = ...; this would be a bug. Let's use the more robust "!=" construct. We can also drop the useless "n" variable while we're at it. Suggested-by: Jonathan NiederSigned-off-by: Jeff King --- builtin/get-tar-commit-id.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/builtin/get-tar-commit-id.c b/builtin/get-tar-commit-id.c index 6d9a79f9b3..259ad40339 100644 --- a/builtin/get-tar-commit-id.c +++ b/builtin/get-tar-commit-id.c @@ -20,14 +20,12 @@ int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix) struct ustar_header *header = (struct ustar_header *)buffer; char *content = buffer + RECORDSIZE; const char *comment; - ssize_t n; if (argc != 1) usage(builtin_get_tar_commit_id_usage); - n = read_in_full(0, buffer, HEADERSIZE); - if (n < HEADERSIZE) - die("git get-tar-commit-id: read error"); + if (read_in_full(0, buffer, HEADERSIZE) != HEADERSIZE) + die_errno("git get-tar-commit-id: read error"); if (header->typeflag[0] != 'g') return 1; if (!skip_prefix(content, "52 comment=", )) -- 2.14.1.1148.ga2561536a1
[PATCH 3/7] read_in_full: reset errno before reading
Many callers of read_in_full() complain when we do not read their full byte-count. But a check like: if (read_in_full(fd, buf, len) != len) return error_errno("unable to read"); conflates two problem conditions: 1. A real error from read(). 2. There were fewer than "len" bytes available. In the first case, showing the user strerror(errno) is useful. But in the second, we may see a random errno that was set by some previous system call. In an ideal world, callers would always distinguish between these cases and give a useful message for each. But as an easy way to make our imperfect world better, let's reset errno to a known value. The best we can do is "0", which will yield something like: unable to read: Success That's not great, but at least it's deterministic and makes it clear that we didn't see an error from read(). Signed-off-by: Jeff King--- wrapper.c | 1 + 1 file changed, 1 insertion(+) diff --git a/wrapper.c b/wrapper.c index 61aba0b5c1..f55debc92d 100644 --- a/wrapper.c +++ b/wrapper.c @@ -314,6 +314,7 @@ ssize_t read_in_full(int fd, void *buf, size_t count) char *p = buf; ssize_t total = 0; + errno = 0; while (count > 0) { ssize_t loaded = xread(fd, p, count); if (loaded < 0) -- 2.14.1.1148.ga2561536a1
[PATCH 2/7] notes-merge: drop dead zero-write code
We call write_in_full() with a size that we know is greater than zero. The return value can never be zero, then, since write_in_full() converts such a failed write() into ENOSPC and returns -1. We can just drop this branch of the error handling entirely. Suggested-by: Jonathan NiederSigned-off-by: Jeff King --- notes-merge.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/notes-merge.c b/notes-merge.c index 597d43f65c..4352c34a6e 100644 --- a/notes-merge.c +++ b/notes-merge.c @@ -308,8 +308,6 @@ static void write_buf_to_worktree(const struct object_id *obj, if (errno == EPIPE) break; die_errno("notes-merge"); - } else if (!ret) { - die("notes-merge: disk full?"); } size -= ret; buf += ret; -- 2.14.1.1148.ga2561536a1
[PATCH 1/7] files-backend: prefer "0" for write_in_full() error check
Commit 06f46f237a (avoid "write_in_full(fd, buf, len) != len" pattern, 2017-09-13) converted this callsite from: write_in_full(...) != 1 to write_in_full(...) < 0 But during the conflict resolution in c50424a6f0 (Merge branch 'jk/write-in-full-fix', 2017-09-25), this morphed into write_in_full(...) < 1 This behaves as we want, but we prefer to avoid modeling the "less than length" error-check which can be subtly buggy, as shown in efacf609c8 (config: avoid "write_in_full(fd, buf, len) < len" pattern, 2017-09-13). Signed-off-by: Jeff King--- refs/files-backend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index dac33628b3..e35c64c001 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3007,7 +3007,7 @@ static int files_reflog_expire(struct ref_store *ref_store, } else if (update && (write_in_full(get_lock_file_fd(>lk), oid_to_hex(_kept_oid), GIT_SHA1_HEXSZ) < 0 || - write_str_in_full(get_lock_file_fd(>lk), "\n") < 1 || + write_str_in_full(get_lock_file_fd(>lk), "\n") < 0 || close_ref_gently(lock) < 0)) { status |= error("couldn't write %s", get_lock_file_path(>lk)); -- 2.14.1.1148.ga2561536a1
[PATCH 0/7] read/write_in_full leftovers
This series addresses the bits leftover from the discussion two weeks ago in: https://public-inbox.org/git/20170913170807.cyx7rrpoyhaau...@sigill.intra.peff.net/ and its subthread. I don't think any of these is a real problem in practice, so this can be considered as a cleanup. I'm on the fence over whether the final one is a good idea. See the commit message for details. [1/7]: files-backend: prefer "0" for write_in_full() error check [2/7]: notes-merge: drop dead zero-write code [3/7]: read_in_full: reset errno before reading [4/7]: get-tar-commit-id: prefer "!=" for read_in_full() error check [5/7]: worktree: use xsize_t to access file size [6/7]: worktree: check the result of read_in_full() [7/7]: add xread_in_full() helper builtin/get-tar-commit-id.c | 5 + builtin/worktree.c | 11 --- bulk-checkin.c | 4 +--- cache.h | 7 +++ combine-diff.c | 8 +--- csum-file.c | 6 +- notes-merge.c | 2 -- pack-write.c| 3 +-- refs/files-backend.c| 2 +- wrapper.c | 12 10 files changed, 33 insertions(+), 27 deletions(-) -Peff
Re: [PATCH 4/4] Move documentation of string_list into string-list.h
On Mon, Sep 25, 2017 at 10:40 AM, Brandon Williamswrote: > On 09/25, Han-Wen Nienhuys wrote: >> This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did > > Not really important but most times we reference another commit from a > commit msg we include the one line summary like: > 'bdfdaa497 (strbuf.h: integrate api-strbuf.txt documentation, > 2015-01-16)' I have alias.gcs=show --date=short -s --pretty='format:%h (%s, %ad)' in the config file for that. Thanks for converting documentation into header files! Stefan
Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules
Stefan Beller wrote: > submodule..update can be assigned an arbitrary command via setting > it to "!command". When this command is found in the regular config, Git > ought to just run that command instead of other update mechanisms. > > However if that command is just found in the .gitmodules file, it is > potentially untrusted, which is why we do not run it. Add a test > confirming the behavior. > > Suggested-by: Jonathan Nieder> Signed-off-by: Stefan Beller > --- > t/t7406-submodule-update.sh | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh > index 034914a14f..780af4e6f5 100755 > --- a/t/t7406-submodule-update.sh > +++ b/t/t7406-submodule-update.sh > @@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in > .git/config' ' > ) > ' > > +test_expect_success 'submodule update - command in .gitmodules is ignored' ' > + test_when_finished "git -C super reset --hard HEAD^" && > + > + git -C super config -f .gitmodules submodule.submodule.update "!false > || echo >bad" && What does the '!false || echo >bad' do? Ideally we want this test to be super robust: e.g. if it runs the command but from a different directory, we still want the test to fail, and if it runs the command but using exec instead of a shell, we still want the test to fail. Maybe write_script would help with this. E.g. would something like test_when_finished ... && write_script must_not_run.sh <<-EOF && >$TEST_DIRECTORY/bad EOF git -C super config -f .gitmodules submodule.submodule.update \ "!$TEST_DIRECTORY/must_not_run.sh" && ... work? Thanks, Jonathan
[PATCH] t7406: submodule..update command must not be run from .gitmodules
submodule..update can be assigned an arbitrary command via setting it to "!command". When this command is found in the regular config, Git ought to just run that command instead of other update mechanisms. However if that command is just found in the .gitmodules file, it is potentially untrusted, which is why we do not run it. Add a test confirming the behavior. Suggested-by: Jonathan NiederSigned-off-by: Stefan Beller --- t/t7406-submodule-update.sh | 10 ++ 1 file changed, 10 insertions(+) diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh index 034914a14f..780af4e6f5 100755 --- a/t/t7406-submodule-update.sh +++ b/t/t7406-submodule-update.sh @@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in .git/config' ' ) ' +test_expect_success 'submodule update - command in .gitmodules is ignored' ' + test_when_finished "git -C super reset --hard HEAD^" && + + git -C super config -f .gitmodules submodule.submodule.update "!false || echo >bad" && + git -C super commit -a -m "add command to .gitmodules file" && + git -C super/submodule reset --hard $submodulesha1^ && + git -C super submodule update submodule 2>../actual && + test_path_is_missing super/bad +' + cat << EOF >expect Execution of 'false $submodulesha1' failed in submodule path 'submodule' EOF -- 2.14.0.rc0.3.g6c2e499285
Re: git ls-tree -d doesn't work if the specified path is the repository root?
> Is there some reason for this? This behavior seems like a bug to me: > it means that prior to calling ls-tree I need to check if the > referenced path happens to be the root, and if so, find some other > means (rev-parse?) of converting it to a treehash. Hmm, interesting. I see these four behaviors: [santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- src 04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src [santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- src/ 04 tree 767beaaf0927f89e630c52830b6fbac138ec039asrc/bg_daemon [santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- . 04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src 04 tree de12675d1023b1e743f7e11c2276fc417b3650a6tests [santiago@LykOS bg_daemon]$ git ls-tree -d HEAD -- ./ 04 tree 238a62ca62527423fd3190d00917ddfef0d254a3src 04 tree de12675d1023b1e743f7e11c2276fc417b3650a6tests IMHO, I think it'd be more consistent if . returned the root treehash and ./ returned the treehash of the directories in root. I don't personally know if there is a reason for this... Thanks, -Santiago. > > Thanks, > —Roy signature.asc Description: PGP signature
Re: [PATCH] Documentation: consolidate submodule..update
On 09/25, Stefan Beller wrote: > Have one place to explain the effects of setting submodule..update > instead of two. > > Signed-off-by: Stefan Beller> --- > >> I disagree. Actually, I think the git-config(1) blurb could just > >> point here, and that the text here ought to be clear about what > >> commands it affects and how an end user would use it. > > > > I tend to agree with the consolidation. > > Something like this? I like the consolidation, its easier to keep up to date when its only in one place. > > Documentation/config.txt | 9 + > Documentation/gitmodules.txt | 20 +++- > 2 files changed, 12 insertions(+), 17 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 1ac0ae6adb..0d5a296b6c 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -3085,14 +3085,7 @@ submodule..url:: > See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details. > > submodule..update:: > - The method by which a submodule is updated by 'git submodule update', > - which is the only affected command, others such as > - 'git checkout --recurse-submodules' are unaffected. It exists for > - historical reasons, when 'git submodule' was the only command to > - interact with submodules; settings like `submodule.active` > - and `pull.rebase` are more specific. It is populated by > - `git submodule init` from the linkgit:gitmodules[5] file. > - See description of 'update' command in linkgit:git-submodule[1]. > + See `submodule..update` in linkgit:gitmodules[5]. > > submodule..branch:: > The remote branch name for a submodule, used by `git submodule > diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt > index db5d47eb19..d156dee387 100644 > --- a/Documentation/gitmodules.txt > +++ b/Documentation/gitmodules.txt > @@ -38,15 +38,17 @@ submodule..url:: > In addition, there are a number of optional keys: > > submodule..update:: > - Defines the default update procedure for the named submodule, > - i.e. how the submodule is updated by "git submodule update" > - command in the superproject. This is only used by `git > - submodule init` to initialize the configuration variable of > - the same name. Allowed values here are 'checkout', 'rebase', > - 'merge' or 'none'. See description of 'update' command in > - linkgit:git-submodule[1] for their meaning. Note that the > - '!command' form is intentionally ignored here for security > - reasons. > + The method by which a submodule is updated by 'git submodule update', > + which is the only affected command, others such as > + 'git checkout --recurse-submodules' are unaffected. It exists for > + historical reasons, when 'git submodule' was the only command to > + interact with submodules; settings like `submodule.active` > + and `pull.rebase` are more specific. It is copied to the config > + by `git submodule init` from the .gitmodules file. > + Allowed values here are 'checkout', 'rebase', 'merge' or 'none'. > + See description of 'update' command in linkgit:git-submodule[1] > + for their meaning. Note that the '!command' form is intentionally > + ignored here for security reasons. This probably needs to be tweaked a bit to say that the '!command' form is ignored by submodule init, in that it isn't copied over from the .gitmodules file, but if it is configured in your config it will be respected by 'submodule update'. > > submodule..branch:: > A remote branch name for tracking updates in the upstream submodule. > -- > 2.14.0.rc0.3.g6c2e499285 > -- Brandon Williams
git ls-tree -d doesn't work if the specified path is the repository root?
Hi, When I run `git ls-tree -d HEAD -- subdir` from the root of my repository, where `subdir` is a subdirectory in that root, I get the treehash of that subdirectory. This is what I expect. However, if I merely replace `subdir` with `.` (the root of the repository), (i.e., `git ls-tree -d HEAD -- .`) git ls-tree returns the treehashes of the /children/ of the root, instead of the root itself, contrary to the documented behavior of -d. Is there some reason for this? This behavior seems like a bug to me: it means that prior to calling ls-tree I need to check if the referenced path happens to be the root, and if so, find some other means (rev-parse?) of converting it to a treehash. Thanks, —Roy
[PATCH] Documentation: consolidate submodule..update
Have one place to explain the effects of setting submodule..update instead of two. Signed-off-by: Stefan Beller--- >> I disagree. Actually, I think the git-config(1) blurb could just >> point here, and that the text here ought to be clear about what >> commands it affects and how an end user would use it. > > I tend to agree with the consolidation. Something like this? Documentation/config.txt | 9 + Documentation/gitmodules.txt | 20 +++- 2 files changed, 12 insertions(+), 17 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 1ac0ae6adb..0d5a296b6c 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3085,14 +3085,7 @@ submodule..url:: See linkgit:git-submodule[1] and linkgit:gitmodules[5] for details. submodule..update:: - The method by which a submodule is updated by 'git submodule update', - which is the only affected command, others such as - 'git checkout --recurse-submodules' are unaffected. It exists for - historical reasons, when 'git submodule' was the only command to - interact with submodules; settings like `submodule.active` - and `pull.rebase` are more specific. It is populated by - `git submodule init` from the linkgit:gitmodules[5] file. - See description of 'update' command in linkgit:git-submodule[1]. + See `submodule..update` in linkgit:gitmodules[5]. submodule..branch:: The remote branch name for a submodule, used by `git submodule diff --git a/Documentation/gitmodules.txt b/Documentation/gitmodules.txt index db5d47eb19..d156dee387 100644 --- a/Documentation/gitmodules.txt +++ b/Documentation/gitmodules.txt @@ -38,15 +38,17 @@ submodule..url:: In addition, there are a number of optional keys: submodule..update:: - Defines the default update procedure for the named submodule, - i.e. how the submodule is updated by "git submodule update" - command in the superproject. This is only used by `git - submodule init` to initialize the configuration variable of - the same name. Allowed values here are 'checkout', 'rebase', - 'merge' or 'none'. See description of 'update' command in - linkgit:git-submodule[1] for their meaning. Note that the - '!command' form is intentionally ignored here for security - reasons. + The method by which a submodule is updated by 'git submodule update', + which is the only affected command, others such as + 'git checkout --recurse-submodules' are unaffected. It exists for + historical reasons, when 'git submodule' was the only command to + interact with submodules; settings like `submodule.active` + and `pull.rebase` are more specific. It is copied to the config + by `git submodule init` from the .gitmodules file. + Allowed values here are 'checkout', 'rebase', 'merge' or 'none'. + See description of 'update' command in linkgit:git-submodule[1] + for their meaning. Note that the '!command' form is intentionally + ignored here for security reasons. submodule..branch:: A remote branch name for tracking updates in the upstream submodule. -- 2.14.0.rc0.3.g6c2e499285
Re: [PATCH] git: add --no-optional-locks option
>> There might be another option to cope with the situation: >> >> 4. Teach all commands to spinlock / busywait shortly for important >> locks instead of giving up. In that case git-status rewriting >> the index ought to be fine? > > We do have all the infrastructure in place to do a reasonable busywait > with backoff. I think the patch is roughly just: > > diff --git a/read-cache.c b/read-cache.c > index 65f4fe8375..fc1ba122a3 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state > *istate, > > int hold_locked_index(struct lock_file *lk, int lock_flags) > { > - return hold_lock_file_for_update(lk, get_index_file(), lock_flags); > + return hold_lock_file_for_update_timeout(lk, get_index_file(), > +lock_flags, 500); > } > > int read_index(struct index_state *istate) > > though I think there are a few sites which manually call > hold_lock_file_for_update() on the index that would need similar > treatment. uh, too bad. The patch above looks really promising, though. :) > > I suspect it would work OK in practice, unless your index is so big that > 500ms isn't enough. The user may also see minor stalls when there's lock > contention. I'm not sure how annoying that would be. There is only one way to find out. Though we don't want to volunteer all users into this experiment, I'd presume. Regarding larger indexes, I wonder if we can adapt the 500ms to the repo size. At first I thought the abbreviation length could be a good proxy to determine the maximum waiting time, but now I am not so sure any more. Stefan
Re: What's cooking in git.git (Sep 2017, #04; Mon, 25)
> > * sb/parse-options-blank-line-before-option-list (2017-08-25) 1 commit > - usage_with_options: omit double new line on empty option list > > "git worktree" with no option and no subcommand showed too many > blank lines in its help text, which has been reduced. > > Superseded by bc/rev-parse-parseopt-fix? yes it is, please retract this one. > * sb/submodule-recursive-checkout-detach-head (2017-07-28) 2 commits > - Documentation/checkout: clarify submodule HEADs to be detached > - recursive submodules: detach HEAD from new state > > "git checkout --recursive" may overwrite and rewind the history of > the branch that happens to be checked out in submodule > repositories, which might not be desirable. Detach the HEAD but > still allow the recursive checkout to succeed in such a case. > > Undecided. > This needs justification in a larger picture; it is unclear why > this is better than rejecting recursive checkout, for example. I'll revisit this with fresh eyes, the original plan was to get jn/per-repo-obj-store done to have more fundamental operations available to do the right thing, whatever it is, though. > > * jn/per-repo-obj-store (2017-09-07) 39 commits We intend to get this one rolling again.
Re: [PATCH 4/4] Move documentation of string_list into string-list.h
On 09/25, Han-Wen Nienhuys wrote: > This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did Not really important but most times we reference another commit from a commit msg we include the one line summary like: 'bdfdaa497 (strbuf.h: integrate api-strbuf.txt documentation, 2015-01-16)' > the same for strbuf.h: > > * API documentation uses /** */ to set it apart from other comments. > > * Function names were stripped from the comments. > > * Ordering of the header was adjusted to follow the one from the text > file. > > * Edited some existing comments to follow the new standard. > > Signed-off-by: Han-Wen Nienhuys> --- > Documentation/technical/api-string-list.txt | 209 > > string-list.h | 187 + > 2 files changed, 159 insertions(+), 237 deletions(-) > delete mode 100644 Documentation/technical/api-string-list.txt > > diff --git a/Documentation/technical/api-string-list.txt > b/Documentation/technical/api-string-list.txt > deleted file mode 100644 > index c08402b12..0 > --- a/Documentation/technical/api-string-list.txt > +++ /dev/null > @@ -1,209 +0,0 @@ > -string-list API > -=== > - > -The string_list API offers a data structure and functions to handle > -sorted and unsorted string lists. A "sorted" list is one whose > -entries are sorted by string value in `strcmp()` order. > - > -The 'string_list' struct used to be called 'path_list', but was renamed > -because it is not specific to paths. > - > -The caller: > - > -. Allocates and clears a `struct string_list` variable. > - > -. Initializes the members. You might want to set the flag `strdup_strings` > - if the strings should be strdup()ed. For example, this is necessary > - when you add something like git_path("..."), since that function returns > - a static buffer that will change with the next call to git_path(). > -+ > -If you need something advanced, you can manually malloc() the `items` > -member (you need this if you add things later) and you should set the > -`nr` and `alloc` members in that case, too. > - > -. Adds new items to the list, using `string_list_append`, > - `string_list_append_nodup`, `string_list_insert`, > - `string_list_split`, and/or `string_list_split_in_place`. > - > -. Can check if a string is in the list using `string_list_has_string` or > - `unsorted_string_list_has_string` and get it from the list using > - `string_list_lookup` for sorted lists. > - > -. Can sort an unsorted list using `string_list_sort`. > - > -. Can remove duplicate items from a sorted list using > - `string_list_remove_duplicates`. > - > -. Can remove individual items of an unsorted list using > - `unsorted_string_list_delete_item`. > - > -. Can remove items not matching a criterion from a sorted or unsorted > - list using `filter_string_list`, or remove empty strings using > - `string_list_remove_empty_items`. > - > -. Finally it should free the list using `string_list_clear`. > - > -Example: > - > - > -struct string_list list = STRING_LIST_INIT_NODUP; > -int i; > - > -string_list_append(, "foo"); > -string_list_append(, "bar"); > -for (i = 0; i < list.nr; i++) > - printf("%s\n", list.items[i].string) > - > - > -NOTE: It is more efficient to build an unsorted list and sort it > -afterwards, instead of building a sorted list (`O(n log n)` instead of > -`O(n^2)`). > -+ > -However, if you use the list to check if a certain string was added > -already, you should not do that (using unsorted_string_list_has_string()), > -because the complexity would be quadratic again (but with a worse factor). > - > -Functions > -- > - > -* General ones (works with sorted and unsorted lists as well) > - > -`string_list_init`:: > - > - Initialize the members of the string_list, set `strdup_strings` > - member according to the value of the second parameter. > - > -`filter_string_list`:: > - > - Apply a function to each item in a list, retaining only the > - items for which the function returns true. If free_util is > - true, call free() on the util members of any items that have > - to be deleted. Preserve the order of the items that are > - retained. > - > -`string_list_remove_empty_items`:: > - > - Remove any empty strings from the list. If free_util is true, > - call free() on the util members of any items that have to be > - deleted. Preserve the order of the items that are retained. > - > -`print_string_list`:: > - > - Dump a string_list to stdout, useful mainly for debugging purposes. It > - can take an optional header argument and it writes out the > - string-pointer pairs of the string_list, each one in its own line. > - > -`string_list_clear`:: > - > - Free a string_list. The `string` pointer of the items will be freed in > - case the `strdup_strings` member of the string_list is set. The second > - parameter
Re: [PATCH 3/4] Document submodule_to_gitdir
On 09/25, Han-Wen Nienhuys wrote: > Signed-off-by: Han-Wen Nienhuys> --- > submodule.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/submodule.c b/submodule.c > index b12600fc7..b66c23f5d 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -1997,6 +1997,9 @@ const char *get_superproject_working_tree(void) > return ret; > } > > +/* Put the gitdir for a submodule (given relative to the main repository > worktree) > + * into `buf`, or return -1 on error. > + */ Same nit as in patch [2/4] > int submodule_to_gitdir(struct strbuf *buf, const char *submodule) > { > const struct submodule *sub; > -- > 2.14.1.821.g8fa685d3b7-goog > -- Brandon Williams
Re: [PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently
On 09/25, Han-Wen Nienhuys wrote: > Signed-off-by: Han-Wen Nienhuys> --- > abspath.c | 3 +++ > setup.c | 3 ++- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/abspath.c b/abspath.c > index 708aff8d4..792a2d533 100644 > --- a/abspath.c > +++ b/abspath.c > @@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char > *path, > return retval; > } > > +/* Resolve `path` into an absolute, cleaned-up path. The return value > + * comes from a global shared buffer and should not be freed. > + */ nit: Our comment style requires the opening '/*' which starts a comment block to be on its own line. > const char *real_path(const char *path) > { > static struct strbuf realpath = STRBUF_INIT; > diff --git a/setup.c b/setup.c > index 6d8380acd..31853724c 100644 > --- a/setup.c > +++ b/setup.c > @@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char > *path, const char *dir) > > /* > * Try to read the location of the git directory from the .git file, > - * return path to git directory if found. > + * return path to git directory if found. The return value comes from > + * a shared pool and should not be freed. > * > * On failure, if return_error_code is not NULL, return_error_code > * will be set to an error code and NULL will be returned. If > -- > 2.14.1.821.g8fa685d3b7-goog > -- Brandon Williams
Re: [PATCH] git: add --no-optional-locks option
On Mon, Sep 25, 2017 at 06:10:12PM +0200, Johannes Schindelin wrote: > On Thu, 21 Sep 2017, Jeff King wrote: > > > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not* > > to lock the index and update it, 2016-08-12). Folks working on GitHub > > Desktop complained to me that it's only available on Windows. :) > > Okay, so this is trying to help me by upstreaming a patch from Git for > Windows? > > If so: thanks! Yes, in a round-about way. :) > The changes, in particular the different semantics, will guarantee that I > have to work on the consumer's side here, though, leaving me even less > time for the Git mailing list, so I will need a lot more help with > upstreaming patches. I think you can get away with doing nothing for the time being. The two patches can co-exist in the GfW codebase[1], and people using the existing option can switch over at their leisure. Eventually you may decide to revert 67e5ce7f63. -Peff [1] Modulo some trivial conflict resolution when the two are merged.
Re: [PATCH v2 1/1] Win32: simplify loading of DLL functions
Johannes Schindelin wrote: > Dynamic loading of DLL functions is duplicated in several places in Git > for Windows' source code. > > This patch adds a pair of macros to simplify the process: the > DECLARE_PROC_ADDR(, , , > ..) macro to be used at the beginning of a > code block, and the INIT_PROC_ADDR() macro to call before > using the declared function. The return value of the INIT_PROC_ADDR() > call has to be checked; If it is NULL, the function was not found in the > specified DLL. > > Example: > > DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, > LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); > > if (!INIT_PROC_ADDR(CreateHardLinkW)) > return error("Could not find CreateHardLinkW() function"; > > if (!CreateHardLinkW(source, target, NULL)) > return error("could not create hardlink from %S to %S", >source, target); > return 0; > > Signed-off-by: Karsten Blees> Signed-off-by: Johannes Schindelin > Reviewed-by: Jonathan Nieder Yep, this is indeed Reviewed-by: Jonathan Nieder > --- > compat/win32/lazyload.h | 57 > + > 1 file changed, 57 insertions(+) > create mode 100644 compat/win32/lazyload.h Thanks, Jonathan
VKTUR'un SİZE ÖZEL KAMPANYALARI DEVAM EDİYOR!!
Re: [PATCH] git: add --no-optional-locks option
Hi Kaartic, On Sun, 24 Sep 2017, Kaartic Sivaraam wrote: > On Thursday 21 September 2017 10:02 AM, Jeff King wrote: > > Some tools like IDEs or fancy editors may periodically run commands > > like "git status" in the background to keep track of the state of the > > repository. > > I might be missing something, shouldn't the IDEs be encouraged to use > libgit2 instead? I thought it was meant for these use cases. There are pros and cons. Visual Studio moved away from libgit2 e.g. to support SSH (back then, libgit2 did not support spawning processes, I have no idea whether that changed in the meantime). Ciao, Johannes
Re: [PATCH] git: add --no-optional-locks option
Hi Peff, On Thu, 21 Sep 2017, Jeff King wrote: > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not* > to lock the index and update it, 2016-08-12). Folks working on GitHub > Desktop complained to me that it's only available on Windows. :) Okay, so this is trying to help me by upstreaming a patch from Git for Windows? If so: thanks! The changes, in particular the different semantics, will guarantee that I have to work on the consumer's side here, though, leaving me even less time for the Git mailing list, so I will need a lot more help with upstreaming patches. Ciao, Dscho
Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
On Sun, Sep 24, 2017 at 09:59:28PM +0200, Martin Ågren wrote: > > Anyway, doing: > > > > > > ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=0:log_path=/tmp/lsan/output > > \ > > make SANITIZE=address,leak test > > > > should pass the whole suite and give you a host of files to analyze. > > Thanks. My reading of the documentation was off. Turns out exitcode=0 > does not set the exit code to 0, but rather turns it off. Duh. Actually, the docs are quite confusing. The LSan "exitcode" option defaults to 23, and that is the exit code you get. So I think it probably is interpreted as the code to exit with, or 0 for "use the original exit code". > > I'm not sure of the best way to count things. > > Right. It's a tricky problem. And in the end, all we find out is where > we allocate and how we got there. Exactly where we lose/leak that piece > of allocated memory is another question... For hunting a particular trace, I think you have to walk up the list of called functions and see where the pointers go out of scope. I'm not sure how to make that easier (in theory a compiler-instrumentation like LSan could do it by performing a leak-check when pointers go out of scope. But it would also need to know about copies you've made of pointers, so I imagine it would be extremely slow to run). But at least on the topic of "how many unique leaks are there", I wrote the script below to try to give some basic answers. It just finds the first non-boring entry in each stack trace and reports that. Where "boring" is really "this function is not expected to free, but hands off memory ownership to somebody else". You can use it to do: perl leaks.pl /tmp/lsan/output.* | sort | uniq -c | sort -rn | head to see places that leak a lot. These are either boring calls that need to be annotated, or are high-value targets for de-leaking. I notice ref-filter.c has quite a few high entries on the list. I'm not sure yet which case it falls into. :) The other interesting thing is seeing how many "unique" leaks there are: perl leaks.pl /tmp/lsan/output.* | sort -u | wc -l I get a bit over 800 with a run of the test suite. Which is a lot, but fewer than I expected. And I'm sure quite a few of them are really "duplicates" that can be eliminated in chunks. So I don't know how useful any of that will be, but it at least should give _some_ metric that should be diminishing as we fix leaks. -- >8 -- #!/usr/bin/perl my $boring = join('|', # These are allocation functions that get called from a lot of places. qw( __interceptor_strdup __interceptor_calloc realloc malloc xstrdup xcalloc strbuf_ xmemdupz xstrvfmt xstrfmt xstrndup ), # These are really just the revision machinery not getting cleaned up; # for many we'd probably want to just UNLEAK() at the apex caller qw( add_rev_cmdline add_object_array_with_path add_pending_object_with_path add_pending_object_with_mode add_pending_object handle_revision_arg setup_revisions ), # More allocators that drop memory ownership qw( alloc_ref_with_prefix alloc_ref copy_ref commit_list_insert copy_pathspec ), ); my $boring_re = qr/^$boring/; my $skipping; while (<>) { if (/^\s*#[0-9]+ 0x[0-9a-f]+ in (.*)/) { next if $skipping; # we already reported this trace next if $1 =~ $boring_re; print $1, "\n"; $skipping = 1; } else { $skipping = 0; } }
[PATCH v2 0/1] Simplify loading of DLL functions on Windows
Changes since v1: - repeated the example from the commit message in the .h file - mentioned that the function is not thread-safe. - added Jonathan's Reviewed-by: footer Johannes Schindelin (1): Win32: simplify loading of DLL functions compat/win32/lazyload.h | 57 + 1 file changed, 57 insertions(+) create mode 100644 compat/win32/lazyload.h base-commit: 28996cec80690d2322359d3650a57e8de6e01eb6 Published-As: https://github.com/dscho/git/releases/tag/lazyload-v2 Fetch-It-Via: git fetch https://github.com/dscho/git lazyload-v2 Interdiff vs v1: diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h index 91c10dad2fb..9e631c8593f 100644 --- a/compat/win32/lazyload.h +++ b/compat/win32/lazyload.h @@ -1,7 +1,19 @@ #ifndef LAZYLOAD_H #define LAZYLOAD_H -/* simplify loading of DLL functions */ +/* + * A pair of macros to simplify loading of DLL functions. Example: + * + * DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, + * LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); + * + * if (!INIT_PROC_ADDR(CreateHardLinkW)) + * return error("Could not find CreateHardLinkW() function"; + * + * if (!CreateHardLinkW(source, target, NULL)) + * return error("could not create hardlink from %S to %S", + *source, target); + */ struct proc_addr { const char *const dll; @@ -20,6 +32,7 @@ struct proc_addr { * Loads a function from a DLL (once-only). * Returns non-NULL function pointer on success. * Returns NULL + errno == ENOSYS on failure. + * This function is not thread-safe. */ #define INIT_PROC_ADDR(function) \ (function = get_proc_addr(_addr_##function)) -- 2.14.1.windows.1.1024.gf2dea585d74.dirty
[PATCH v2 1/1] Win32: simplify loading of DLL functions
Dynamic loading of DLL functions is duplicated in several places in Git for Windows' source code. This patch adds a pair of macros to simplify the process: the DECLARE_PROC_ADDR(, , , ..) macro to be used at the beginning of a code block, and the INIT_PROC_ADDR() macro to call before using the declared function. The return value of the INIT_PROC_ADDR() call has to be checked; If it is NULL, the function was not found in the specified DLL. Example: DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); if (!INIT_PROC_ADDR(CreateHardLinkW)) return error("Could not find CreateHardLinkW() function"; if (!CreateHardLinkW(source, target, NULL)) return error("could not create hardlink from %S to %S", source, target); return 0; Signed-off-by: Karsten BleesSigned-off-by: Johannes Schindelin Reviewed-by: Jonathan Nieder Signed-off-by: Johannes Schindelin --- compat/win32/lazyload.h | 57 + 1 file changed, 57 insertions(+) create mode 100644 compat/win32/lazyload.h diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h new file mode 100644 index 000..9e631c8593f --- /dev/null +++ b/compat/win32/lazyload.h @@ -0,0 +1,57 @@ +#ifndef LAZYLOAD_H +#define LAZYLOAD_H + +/* + * A pair of macros to simplify loading of DLL functions. Example: + * + * DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, + * LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); + * + * if (!INIT_PROC_ADDR(CreateHardLinkW)) + * return error("Could not find CreateHardLinkW() function"; + * + * if (!CreateHardLinkW(source, target, NULL)) + * return error("could not create hardlink from %S to %S", + *source, target); + */ + +struct proc_addr { + const char *const dll; + const char *const function; + FARPROC pfunction; + unsigned initialized : 1; +}; + +/* Declares a function to be loaded dynamically from a DLL. */ +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ + static struct proc_addr proc_addr_##function = \ + { #dll, #function, NULL, 0 }; \ + static rettype (WINAPI *function)(__VA_ARGS__) + +/* + * Loads a function from a DLL (once-only). + * Returns non-NULL function pointer on success. + * Returns NULL + errno == ENOSYS on failure. + * This function is not thread-safe. + */ +#define INIT_PROC_ADDR(function) \ + (function = get_proc_addr(_addr_##function)) + +static inline void *get_proc_addr(struct proc_addr *proc) +{ + /* only do this once */ + if (!proc->initialized) { + HANDLE hnd; + proc->initialized = 1; + hnd = LoadLibraryExA(proc->dll, NULL, +LOAD_LIBRARY_SEARCH_SYSTEM32); + if (hnd) + proc->pfunction = GetProcAddress(hnd, proc->function); + } + /* set ENOSYS if DLL or function was not found */ + if (!proc->pfunction) + errno = ENOSYS; + return proc->pfunction; +} + +#endif -- 2.14.1.windows.1.1024.gf2dea585d74.dirty
[PATCH 1/4] Fix typo in submodule.h
Signed-off-by: Han-Wen Nienhuys--- submodule.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/submodule.h b/submodule.h index 6b52133c8..f0da0277a 100644 --- a/submodule.h +++ b/submodule.h @@ -120,7 +120,7 @@ extern int submodule_move_head(const char *path, /* * Prepare the "env_array" parameter of a "struct child_process" for executing - * a submodule by clearing any repo-specific envirionment variables, but + * a submodule by clearing any repo-specific environment variables, but * retaining any config in the environment. */ extern void prepare_submodule_repo_env(struct argv_array *out); -- 2.14.1.821.g8fa685d3b7-goog
[PATCH 3/4] Document submodule_to_gitdir
Signed-off-by: Han-Wen Nienhuys--- submodule.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/submodule.c b/submodule.c index b12600fc7..b66c23f5d 100644 --- a/submodule.c +++ b/submodule.c @@ -1997,6 +1997,9 @@ const char *get_superproject_working_tree(void) return ret; } +/* Put the gitdir for a submodule (given relative to the main repository worktree) + * into `buf`, or return -1 on error. + */ int submodule_to_gitdir(struct strbuf *buf, const char *submodule) { const struct submodule *sub; -- 2.14.1.821.g8fa685d3b7-goog
[PATCH 2/4] Clarify return value ownership of real_path and read_gitfile_gently
Signed-off-by: Han-Wen Nienhuys--- abspath.c | 3 +++ setup.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/abspath.c b/abspath.c index 708aff8d4..792a2d533 100644 --- a/abspath.c +++ b/abspath.c @@ -202,6 +202,9 @@ char *strbuf_realpath(struct strbuf *resolved, const char *path, return retval; } +/* Resolve `path` into an absolute, cleaned-up path. The return value + * comes from a global shared buffer and should not be freed. + */ const char *real_path(const char *path) { static struct strbuf realpath = STRBUF_INIT; diff --git a/setup.c b/setup.c index 6d8380acd..31853724c 100644 --- a/setup.c +++ b/setup.c @@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char *path, const char *dir) /* * Try to read the location of the git directory from the .git file, - * return path to git directory if found. + * return path to git directory if found. The return value comes from + * a shared pool and should not be freed. * * On failure, if return_error_code is not NULL, return_error_code * will be set to an error code and NULL will be returned. If -- 2.14.1.821.g8fa685d3b7-goog
[PATCH 4/4] Move documentation of string_list into string-list.h
This mirrors commit bdfdaa4978dd92992737e662f25adc01d32d0774 which did the same for strbuf.h: * API documentation uses /** */ to set it apart from other comments. * Function names were stripped from the comments. * Ordering of the header was adjusted to follow the one from the text file. * Edited some existing comments to follow the new standard. Signed-off-by: Han-Wen Nienhuys--- Documentation/technical/api-string-list.txt | 209 string-list.h | 187 + 2 files changed, 159 insertions(+), 237 deletions(-) delete mode 100644 Documentation/technical/api-string-list.txt diff --git a/Documentation/technical/api-string-list.txt b/Documentation/technical/api-string-list.txt deleted file mode 100644 index c08402b12..0 --- a/Documentation/technical/api-string-list.txt +++ /dev/null @@ -1,209 +0,0 @@ -string-list API -=== - -The string_list API offers a data structure and functions to handle -sorted and unsorted string lists. A "sorted" list is one whose -entries are sorted by string value in `strcmp()` order. - -The 'string_list' struct used to be called 'path_list', but was renamed -because it is not specific to paths. - -The caller: - -. Allocates and clears a `struct string_list` variable. - -. Initializes the members. You might want to set the flag `strdup_strings` - if the strings should be strdup()ed. For example, this is necessary - when you add something like git_path("..."), since that function returns - a static buffer that will change with the next call to git_path(). -+ -If you need something advanced, you can manually malloc() the `items` -member (you need this if you add things later) and you should set the -`nr` and `alloc` members in that case, too. - -. Adds new items to the list, using `string_list_append`, - `string_list_append_nodup`, `string_list_insert`, - `string_list_split`, and/or `string_list_split_in_place`. - -. Can check if a string is in the list using `string_list_has_string` or - `unsorted_string_list_has_string` and get it from the list using - `string_list_lookup` for sorted lists. - -. Can sort an unsorted list using `string_list_sort`. - -. Can remove duplicate items from a sorted list using - `string_list_remove_duplicates`. - -. Can remove individual items of an unsorted list using - `unsorted_string_list_delete_item`. - -. Can remove items not matching a criterion from a sorted or unsorted - list using `filter_string_list`, or remove empty strings using - `string_list_remove_empty_items`. - -. Finally it should free the list using `string_list_clear`. - -Example: - - -struct string_list list = STRING_LIST_INIT_NODUP; -int i; - -string_list_append(, "foo"); -string_list_append(, "bar"); -for (i = 0; i < list.nr; i++) - printf("%s\n", list.items[i].string) - - -NOTE: It is more efficient to build an unsorted list and sort it -afterwards, instead of building a sorted list (`O(n log n)` instead of -`O(n^2)`). -+ -However, if you use the list to check if a certain string was added -already, you should not do that (using unsorted_string_list_has_string()), -because the complexity would be quadratic again (but with a worse factor). - -Functions -- - -* General ones (works with sorted and unsorted lists as well) - -`string_list_init`:: - - Initialize the members of the string_list, set `strdup_strings` - member according to the value of the second parameter. - -`filter_string_list`:: - - Apply a function to each item in a list, retaining only the - items for which the function returns true. If free_util is - true, call free() on the util members of any items that have - to be deleted. Preserve the order of the items that are - retained. - -`string_list_remove_empty_items`:: - - Remove any empty strings from the list. If free_util is true, - call free() on the util members of any items that have to be - deleted. Preserve the order of the items that are retained. - -`print_string_list`:: - - Dump a string_list to stdout, useful mainly for debugging purposes. It - can take an optional header argument and it writes out the - string-pointer pairs of the string_list, each one in its own line. - -`string_list_clear`:: - - Free a string_list. The `string` pointer of the items will be freed in - case the `strdup_strings` member of the string_list is set. The second - parameter controls if the `util` pointer of the items should be freed - or not. - -* Functions for sorted lists only - -`string_list_has_string`:: - - Determine if the string_list has a given string or not. - -`string_list_insert`:: - - Insert a new element to the string_list. The returned pointer can be - handy if you want to write something to the `util` pointer of the - string_list_item containing the just added string. If the given
[PATCH 0/4] Assorted comment fixes
I followed Peff's advice for string-list.h comments. Han-Wen Nienhuys (4): Fix typo in submodule.h Clarify return value ownership of real_path and read_gitfile_gently Document submodule_to_gitdir Move documentation of string_list into string-list.h Documentation/technical/api-string-list.txt | 209 abspath.c | 3 + setup.c | 3 +- string-list.h | 187 + submodule.c | 3 + submodule.h | 2 +- 6 files changed, 168 insertions(+), 239 deletions(-) delete mode 100644 Documentation/technical/api-string-list.txt -- 2.14.1.821.g8fa685d3b7-goog
Re: [PATCH v2 00/21] Read `packed-refs` using mmap()
Hi Peff, On Wed, 20 Sep 2017, Jeff King wrote: > On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote: > > > This branch is also available from my fork on GitHub as branch > > `mmap-packed-refs`. > > > > My main uncertainties are: > > > > 1. Does this code actually work on Windows? > > > > I can't really answer (1) due to lack of knowledge. Sorry, I have only a couple of minutes per day to look through the flood of emails from the Git mailing list and to answer them, so I forgot to say that I had a VM build and test Michael's patches, and they worked on Windows. Ciao, Dscho
Re: [PATCH v2 14/21] read_packed_refs(): ensure that references are ordered when read
Hi Michael, On Thu, 21 Sep 2017, Michael Haggerty wrote: > On 09/20/2017 08:50 PM, Jeff King wrote: > > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote: > >> If `packed-refs` was already sorted, then (if the system allows it) > >> we can use the mmapped file contents directly. But if the system > >> doesn't allow a file that is currently mmapped to be replaced using > >> `rename()`, then it would be bad for us to keep the file mmapped for > >> any longer than necessary. So, on such systems, always make a copy of > >> the file contents, either as part of the sorting process, or > >> afterwards. > > > > So this sort-of answers my question from the previous commit (why we > > care about the distinction between NONE and TEMPORARY), since we now > > start treating them differently. > > > > But I'm a bit confused why there's any advantage in the TEMPORARY case > > to doing the mmap-and-copy versus just treating it like NONE and reading > > it directly. > > > > Hmm, I guess it comes down to the double-allocation thing again? Let me > > see if I have this right: > > > > 1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If > > not, then we temporarily hold it in memory twice while we copy it > > into the new sorted buffer. > > > > 2. For TEMPORARY, we mmap once. If it's sorted, then we make a single > > copy. If it's not sorted, then we do the copy+sort as a single > > step. > > > > 3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the > > single copy. > > > > So this is really there to help the TEMPORARY case reading an old > > unsorted file avoid needing to use double-the-ram during the copy? > > > > That seems like a good reason (and it does not seem to add too much > > complexity). > > Yes, that's correct. A deeper question is whether it's worth this extra > effort to optimize the conversion from "unsorted" to "known-sorted", > which it seems should only happen once per repository. But > > * Many more Git commands read the `packed-refs` file than write it. > So an "unsorted" file might have to be read multiple times before > the first time it is rewritten with the "sorted" trait. > > * Users might have multiple Git clients writing their `packed-refs` > file (e.g., the git CLI plus JGit in Eclipse), and they might not > upgrade both clients at the same time to versions that write the > "sorted" trait. So a single repository might go back and forth > between "unsorted" and "known-sorted" multiple times in its > life. > > * Theoretically, somebody might have a `packed-refs` file that is so > big that it takes up more than half of memory. I think there are > scenarios where such a file could be converted to "sorted" form > in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode. > > On the downside, in my benchmarking on Linux, there were hints that the > `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK` > branch when operating on a known-sorted `packed-refs` file. But the > speed difference was barely detectable (and might be illusory). And > anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK` > mode will usually be used. It *would* be interesting to know if there is > a significant speed difference on Windows. Dscho? Sadly, I lack the time to test this thoroughly, but my experience is that those things are dominated by I/O on Windows, i.e. that the malloc() and copy won't matter all that much. In any case, the same argument you make for Linux holds, in reverse, for Windows: we cannot use MMAP_OK on Windows, so the point is moot ;-) Ciao, Dscho
-s theirs use-case(s) Was: BUG: merge -s theirs is not in effect
On Mon, 25 Sep 2017, Junio C Hamano wrote: >It is a different matter to resurrect the age old discussion that >happend in the summer of 2008 if '-s theirs' should or should not >exist. In short, the previous discussion can be summarised to >"we don't want '-s theirs' as it encourages the wrong workflow". > > https://public-inbox.org/git/alpine.DEB.1.00.0807290123300.2725@eeepc-johanness/ >https://public-inbox.org/git/7vtzen7bul@gitster.siamese.dyndns.org/ >https://public-inbox.org/git/20080720192130.6...@nanako3.lavabit.com/ >It is OK for people to come with new perspective and bring new >ideas to the table. We learned from experience while using Git >for longer and are wiser than what we were back then, and might >be able to make a better decision ;-) FWIW 1. As a workaround for absence of -m theirs I using mtheirs git alias: (I believe provided to me awhile back here on the list): mtheirs = !sh -c 'git merge -s ours --no-commit $1 && git read-tree -m -u $1' - and it worked fine for my usecases 2. I think that if there is a reason for -s ours to exist, so there for -s theirs since it is just the directionality of merges which changes between the two 3. My most frequently used use-case for -m theirs strategy is repositories such as http://datasets.datalad.org/openfmri/ds01/.git where we construct "datalad dataset" by crawling the web resource(s), and workflow consists of 3 branches: incoming -- content from the web "as is" incoming-processed -- content from the web "processed" (fully automatically), e.g. tarballs extracted etc master -- the "final" result, delivered to public incoming-processed is formed by -s theirs --no-commit incoming, then all content needed to be extracted/processed (since last such merge point) is processed and commit is done. Such "merge" allows us to establish a point of previous "processing state" so we could react appropriately whenever anything in "incoming" branch changes (so that there is a new commit). And then incoming-processed is merged (regular recursive) into the master branch, which might have further "manual" tune ups. PS thanks for CCing replies -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
-X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect
On Mon, 25 Sep 2017, Junio C Hamano wrote: > Yaroslav Halchenkowrites: > > d'oh, indeed there is no git-merge-theirs neither in debian pkg or a > > freshly > > built git and I found a rogue script in the PATH (which did nothing > > apparently, sorry!). BUT I was originally mislead by the --help/manpage: > Ahh, you're right. The text does make readers expect "-s theirs" to > exist. > ... > * I hope this should help things a bit. yes it does. Thanks. And that is where I realized that I should have used -X theirs (not -s theirs), as the instruction on the option for the (recursive) merge. And now problem is more specific: - conflict within file content editing was resolved as instructed (taking "theirs" version) - BUT symlink was not taken from "theirs" and left as unresolved conflict: $> rm -rf /tmp/repo1; mkdir /tmp/repo1; cd /tmp/repo1; git init .; ln -s sym1 link; echo 1 > file; git add file link; git commit -m 'common'; git co -b b1 ; ln -sf b1link link; echo "b1 file" >| file; git commit -m 'b2 changes' -a; git co master; ln -sf masterlink link; echo "master file" >| file; git commit -m 'also modified in master' -a; git merge -X theirs --no-edit b1; ls -l link; cat file warning: templates not found /home/yoh/share/git-core/templates Initialized empty Git repository in /tmp/repo1/.git/ [master (root-commit) f0b75bc] common 2 files changed, 2 insertions(+) create mode 100644 file create mode 12 link Switched to a new branch 'b1' [b1 45c93ca] b2 changes 2 files changed, 2 insertions(+), 2 deletions(-) Switched to branch 'master' [master 0ee6db2] also modified in master 2 files changed, 2 insertions(+), 2 deletions(-) Auto-merging link CONFLICT (content): Merge conflict in link Auto-merging file Automatic merge failed; fix conflicts and then commit the result. lrwxrwxrwx 1 yoh yoh 10 Sep 25 10:21 link -> masterlink b1 file changes on filesystem: link | Unmerged cached/staged changes: file | 2 +- link | Unmerged PS I will followup on -s theirs in a split thread PSS Thanks for CCing me your replies -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
Re: [PATCH] mailinfo: don't decode invalid =XY quoted-printable sequences
On Sat, Sep 23, 2017 at 08:04:40PM +0200, René Scharfe wrote: > Decode =XY in quoted-printable segments only if X and Y are hexadecimal > digits, otherwise just copy them. That's at least better than > interpreting negative results from hexval() as a character. Thanks, this looks good to me overall. I wondered if we should die() here, but walking over cruft may be more friendly. The base64 case does the same, though it actually ignores the bytes rather than copying them. Since this is never _supposed_ to happen, it's hard to say what behavior would be preferable without seeing a real-world broken case. -Peff
Re: [PATCH v3 00/21] Read `packed-refs` using mmap()
On Mon, Sep 25, 2017 at 09:59:57AM +0200, Michael Haggerty wrote: > This is v3 of a patch series that changes the reading and caching of > the `packed-refs` file to use `mmap()`. Thanks to Stefan, Peff, Dscho, > and Junio for their comments about v2. I think I have addressed all of > the feedback from v1 [1] and v2 [2]. > > This version has only minor changes relative to v2: > > * Fixed a trivial error in the commit message for patch 08. > > * In patch 13: > > * In the commit message, explain the appearance of `MMAP_TEMPORARY` > even though it is not yet treated differently than `MMAP_NONE`. > > * In `Makefile`, don't make `USE_WIN32_MMAP` imply > `MMAP_PREVENTS_DELETE`. > > * Correct the type of a local variable from `size_t` to `ssize_t`. Thanks, this version addresses all my nits. -Peff
Re: [PATCH 3/4] merge: --no-verify to bypass pre-merge hook
Junio C Hamano venit, vidit, dixit 24.09.2017 01:48: > Michael J Gruberwrites: > >> From: Michael J Gruber >> >> Analogous to commit, introduce a '--no-verify' option which bypasses the >> pre-merge hook. The shorthand '-n' is taken by the (non-existing) >> '--no-stat' already. >> >> Signed-off-by: Michael J Gruber >> --- > > It appears that some of the pieces in this patch, namely, > D/git-merge.txt and D/merge-options.txt, belong to 2/4, where we are > fixing up an earlier addition of --[no-]verify option to the > command, to be updated to only add mention of pre-merge hook in this > step? Oh, sorry, rebaser error. I should also rewrite all commits to my current e-mail. > The end result looks good regardless, I would think, though. Pending restructuring and attending to the other comments...
[RFC PATCH 8/8] sequencer: try to commit without forking 'git commit'
From: Phillip WoodIf the commit message does not need to be edited then create the commit without forking 'git commit'. Taking the best time of ten runs with a warm cache this reduces the time taken to cherry-pick 10 commits by 26% (from 319ms to 235ms), and the time taken by 'git rebase --continue' to pick 10 commits by 44% (from 376ms to 211ms) on my computer running linux. The greater saving for rebase is because it longer wastes time creating the commit summary just to throw it away. Even when not forking 'git commit' the commit message is written to a file and CHERRY_PICK_HEAD/REVERT_HEAD are created unnecessarily. This could be eliminated in future. I hacked up a version that does not write these files and just passed an strbuf (with the wrong message for fixup and squash commands) to do_commit() but I couldn't measure any significant time difference when running cherry-pick or rebase. I think eliminating the writes properly for rebase would require a bit of effort as the code would need to be restructured. Signed-off-by: Phillip Wood --- I wonder if this should reparse the author it gets from the existing commit rather than just reusing it. sequencer.c | 149 +++- 1 file changed, 147 insertions(+), 2 deletions(-) diff --git a/sequencer.c b/sequencer.c index 230bdb8535a422b1263429e5894e3f8b1733e844..ebba53583727e947ed767d6181b14254e23fc146 100644 --- a/sequencer.c +++ b/sequencer.c @@ -591,6 +591,18 @@ static int read_env_script(struct argv_array *env) return 0; } +static char *get_author(const char* message) +{ + size_t len; + const char *a; + + a = find_commit_header(message, "author", ); + if (a) + return xmemdupz(a, len); + + return NULL; +} + static const char staged_changes_advice[] = N_("you have staged changes in your working tree\n" "If these changes are meant to be squashed into the previous commit, run:\n" @@ -973,6 +985,130 @@ void print_commit_summary(const char *prefix, const struct object_id *oid, strbuf_release(); } +static int try_to_commit(struct strbuf *msgbuf, const char *author, +struct replay_opts *opts, unsigned int flags) +{ + struct object_id tree; + struct object_id oid; + struct commit *current_head; + struct commit_list *parents = NULL; + struct commit_extra_header *extra = NULL; + struct strbuf err = STRBUF_INIT; + struct strbuf amend_msg = STRBUF_INIT; + struct strbuf *msg; + char *amend_author = NULL; + const char *gpg_sign; + enum cleanup_mode cleanup; + int res = 0; + + msg = msgbuf; + if (flags & EDIT_MSG || flags & VERIFY_MSG) + return 1; + + if (get_oid("HEAD", )) { + current_head = NULL; + } else { + current_head = lookup_commit_or_die(, "HEAD"); + if (parse_commit(current_head)) + return error(_("could not parse HEAD commit")); + } + + if (flags & AMEND_MSG) { + const char *body; + const char *exclude_gpgsig[2] = { "gpgsig", NULL }; + const char *out_enc = get_commit_output_encoding(); + const char *message = logmsg_reencode(current_head, NULL, + out_enc); + + msg = _msg; + if (find_commit_subject(message, )) + strbuf_addstr(_msg, body); + author = amend_author = get_author (message); + unuse_commit_buffer(current_head, message); + if (!author) { + res = error(_("unable to parse commit author")); + goto out; + } + parents = copy_commit_list(current_head->parents); + extra = read_commit_extra_headers(current_head, exclude_gpgsig); + } else if (current_head) { + commit_list_insert(current_head, ); + } + + cleanup = (flags & CLEANUP_MSG) ? CLEANUP_ALL : default_msg_cleanup; + if (cleanup != CLEANUP_NONE) + strbuf_stripspace(msg, cleanup == CLEANUP_ALL); + if (!opts->allow_empty_message && message_is_empty(msg, cleanup)) { + res = 1; + goto out; + } + + gpg_sign = (opts->gpg_sign) ? opts->gpg_sign : default_gpg_sign; + + if (write_cache_as_tree(tree.hash, 0, NULL)) { + res = error(_("git write-tree failed to write a tree")); + goto out; + } + + if (!(flags & ALLOW_EMPTY) && !oidcmp(current_head ? + _head->tree->object.oid : + _tree_oid, )) { + res = 1; + goto out; + } + + res = commit_tree_extended(msg->buf, msg->len,
[RFC PATCH 2/8] commit: move code to update HEAD to libgit
From: Phillip WoodSigned-off-by: Phillip Wood --- builtin/commit.c | 31 +++ sequencer.c | 39 ++- sequencer.h | 2 ++ 3 files changed, 47 insertions(+), 25 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 0b8c1ef6f57cfed328d12255e6834adb4bda4137..497778ba2c02afdd4a337969a27ca781e8389040 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1578,13 +1578,11 @@ int cmd_commit(int argc, const char **argv, const char *prefix) struct strbuf sb = STRBUF_INIT; struct strbuf author_ident = STRBUF_INIT; const char *index_file, *reflog_msg; - char *nl; struct object_id oid; struct commit_list *parents = NULL; struct stat statbuf; struct commit *current_head = NULL; struct commit_extra_header *extra = NULL; - struct ref_transaction *transaction; struct strbuf err = STRBUF_INIT; if (argc == 2 && !strcmp(argv[1], "-h")) @@ -1625,10 +1623,10 @@ int cmd_commit(int argc, const char **argv, const char *prefix) reflog_msg = getenv("GIT_REFLOG_ACTION"); if (!current_head) { if (!reflog_msg) - reflog_msg = "commit (initial)"; + setenv ("GIT_REFLOG_ACTION", "commit (initial)", 1); } else if (amend) { if (!reflog_msg) - reflog_msg = "commit (amend)"; + setenv("GIT_REFLOG_ACTION", "commit (amend)", 1); parents = copy_commit_list(current_head->parents); } else if (whence == FROM_MERGE) { struct strbuf m = STRBUF_INIT; @@ -1637,7 +1635,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) struct commit_list **pptr = if (!reflog_msg) - reflog_msg = "commit (merge)"; + setenv("GIT_REFLOG_ACTION", "commit (merge)", 1); pptr = commit_list_append(current_head, pptr); fp = xfopen(git_path_merge_head(), "r"); while (strbuf_getline_lf(, fp) != EOF) { @@ -1660,9 +1658,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix) parents = reduce_heads(parents); } else { if (!reflog_msg) - reflog_msg = (whence == FROM_CHERRY_PICK) - ? "commit (cherry-pick)" - : "commit"; + setenv("GIT_REFLOG_ACTION", (whence == FROM_CHERRY_PICK) + ? "commit (cherry-pick)" + : "commit", 1); commit_list_insert(current_head, ); } @@ -1707,25 +1705,10 @@ int cmd_commit(int argc, const char **argv, const char *prefix) strbuf_release(_ident); free_commit_extra_headers(extra); - nl = strchr(sb.buf, '\n'); - if (nl) - strbuf_setlen(, nl + 1 - sb.buf); - else - strbuf_addch(, '\n'); - strbuf_insert(, 0, reflog_msg, strlen(reflog_msg)); - strbuf_insert(, strlen(reflog_msg), ": ", 2); - - transaction = ref_transaction_begin(); - if (!transaction || - ref_transaction_update(transaction, "HEAD", oid.hash, - current_head - ? current_head->object.oid.hash : null_sha1, - 0, sb.buf, ) || - ref_transaction_commit(transaction, )) { + if (update_head (current_head, , , )) { rollback_index_files(); die("%s", err.buf); } - ref_transaction_free(transaction); unlink(git_path_cherry_pick_head()); unlink(git_path_revert_head()); diff --git a/sequencer.c b/sequencer.c index 319208afb3de36c97b6c62d4ecf6e641245e7a54..917ad4a16216b30adb2c2c9650217926d8db8ba7 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1,10 +1,10 @@ #include "cache.h" #include "config.h" #include "lockfile.h" -#include "sequencer.h" #include "dir.h" #include "object.h" #include "commit.h" +#include "sequencer.h" #include "tag.h" #include "run-command.h" #include "exec_cmd.h" @@ -750,6 +750,43 @@ int template_untouched(const struct strbuf *sb, const char *template_file, return rest_is_empty(sb, start - sb->buf); } +int update_head(const struct commit *old_head, const struct object_id *new_head, + const struct strbuf *msg, struct strbuf *err) +{ + struct ref_transaction *transaction; + struct strbuf sb = STRBUF_INIT; + const char *nl, *reflog_msg; + int ret = 0; + + reflog_msg = getenv("GIT_REFLOG_ACTION"); + if (!reflog_msg) + reflog_msg=""; + + nl = strchr(msg->buf, '\n'); + if (nl) {
[RFC PATCH 0/8] sequencer: dont't fork git commit
From: Phillip WoodThese patches teach the sequencer to create commits without forking git commit when the commit message does not need to be edited. This speeds up cherry picking 10 commits by 26% and picking 10 commits with rebase --continue by 44%. The first few patches move bits of builtin/commit.c to sequencer.c. The last two patches actually implement creating commits in sequencer.c. Phillip Wood (8): commit: move empty message checks to libgit commit: move code to update HEAD to libgit sequencer: refactor update_head() commit: move post-rewrite code to libgit commit: move print_commit_summary() to libgit sequencer: simplify adding Signed-off-by: trailer sequencer: load commit related config sequencer: try to commit without forking 'git commit' builtin/commit.c | 269 ++-- builtin/rebase--helper.c | 13 +- builtin/revert.c | 15 +- sequencer.c | 447 ++- sequencer.h | 20 +++ 5 files changed, 503 insertions(+), 261 deletions(-) -- 2.14.1
[RFC PATCH 1/8] commit: move empty message checks to libgit
From: Phillip WoodSigned-off-by: Phillip Wood --- builtin/commit.c | 70 +++- sequencer.c | 60 sequencer.h | 10 3 files changed, 73 insertions(+), 67 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index b3b04f5dd3a94d1661e877c5019cc56ac46854ef..0b8c1ef6f57cfed328d12255e6834adb4bda4137 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -128,12 +128,7 @@ static char *sign_commit; * if editor is used, and only the whitespaces if the message * is specified explicitly. */ -static enum { - CLEANUP_SPACE, - CLEANUP_NONE, - CLEANUP_SCISSORS, - CLEANUP_ALL -} cleanup_mode; +static enum cleanup_mode cleanup_mode; static const char *cleanup_arg; static enum commit_whence whence; @@ -978,65 +973,6 @@ static int prepare_to_commit(const char *index_file, const char *prefix, return 1; } -static int rest_is_empty(struct strbuf *sb, int start) -{ - int i, eol; - const char *nl; - - /* Check if the rest is just whitespace and Signed-off-by's. */ - for (i = start; i < sb->len; i++) { - nl = memchr(sb->buf + i, '\n', sb->len - i); - if (nl) - eol = nl - sb->buf; - else - eol = sb->len; - - if (strlen(sign_off_header) <= eol - i && - starts_with(sb->buf + i, sign_off_header)) { - i = eol; - continue; - } - while (i < eol) - if (!isspace(sb->buf[i++])) - return 0; - } - - return 1; -} - -/* - * Find out if the message in the strbuf contains only whitespace and - * Signed-off-by lines. - */ -static int message_is_empty(struct strbuf *sb) -{ - if (cleanup_mode == CLEANUP_NONE && sb->len) - return 0; - return rest_is_empty(sb, 0); -} - -/* - * See if the user edited the message in the editor or left what - * was in the template intact - */ -static int template_untouched(struct strbuf *sb) -{ - struct strbuf tmpl = STRBUF_INIT; - const char *start; - - if (cleanup_mode == CLEANUP_NONE && sb->len) - return 0; - - if (!template_file || strbuf_read_file(, template_file, 0) <= 0) - return 0; - - strbuf_stripspace(, cleanup_mode == CLEANUP_ALL); - if (!skip_prefix(sb->buf, tmpl.buf, )) - start = sb->buf; - strbuf_release(); - return rest_is_empty(sb, start - sb->buf); -} - static const char *find_author_by_nickname(const char *name) { struct rev_info revs; @@ -1744,12 +1680,12 @@ int cmd_commit(int argc, const char **argv, const char *prefix) if (cleanup_mode != CLEANUP_NONE) strbuf_stripspace(, cleanup_mode == CLEANUP_ALL); - if (message_is_empty() && !allow_empty_message) { + if (message_is_empty(, cleanup_mode) && !allow_empty_message) { rollback_index_files(); fprintf(stderr, _("Aborting commit due to empty commit message.\n")); exit(1); } - if (template_untouched() && !allow_empty_message) { + if (template_untouched(, template_file, cleanup_mode) && !allow_empty_message) { rollback_index_files(); fprintf(stderr, _("Aborting commit; you did not edit the message.\n")); exit(1); diff --git a/sequencer.c b/sequencer.c index fcceabb80f4261006cdd65bc0ec95ac54ea42e7c..319208afb3de36c97b6c62d4ecf6e641245e7a54 100644 --- a/sequencer.c +++ b/sequencer.c @@ -690,6 +690,66 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts, return run_command(); } +static int rest_is_empty(const struct strbuf *sb, int start) +{ + int i, eol; + const char *nl; + + /* Check if the rest is just whitespace and Signed-off-by's. */ + for (i = start; i < sb->len; i++) { + nl = memchr(sb->buf + i, '\n', sb->len - i); + if (nl) + eol = nl - sb->buf; + else + eol = sb->len; + + if (strlen(sign_off_header) <= eol - i && + starts_with(sb->buf + i, sign_off_header)) { + i = eol; + continue; + } + while (i < eol) + if (!isspace(sb->buf[i++])) + return 0; + } + + return 1; +} + +/* + * Find out if the message in the strbuf contains only whitespace and + * Signed-off-by lines. + */ +int message_is_empty(const struct strbuf *sb, enum cleanup_mode cleanup_mode) +{ + if (cleanup_mode == CLEANUP_NONE && sb->len) + return 0; + return
[RFC PATCH 4/8] commit: move post-rewrite code to libgit
From: Phillip WoodSigned-off-by: Phillip Wood --- builtin/commit.c | 42 +- sequencer.c | 46 ++ sequencer.h | 2 ++ 3 files changed, 49 insertions(+), 41 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 497778ba2c02afdd4a337969a27ca781e8389040..9d621098823d196643180226491e43c806154c13 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -31,9 +31,7 @@ #include "gpg-interface.h" #include "column.h" #include "sequencer.h" -#include "notes-utils.h" #include "mailmap.h" -#include "sigchain.h" static const char * const builtin_commit_usage[] = { N_("git commit [] [--] ..."), @@ -1465,37 +1463,6 @@ static int git_commit_config(const char *k, const char *v, void *cb) return git_status_config(k, v, s); } -static int run_rewrite_hook(const struct object_id *oldoid, - const struct object_id *newoid) -{ - struct child_process proc = CHILD_PROCESS_INIT; - const char *argv[3]; - int code; - struct strbuf sb = STRBUF_INIT; - - argv[0] = find_hook("post-rewrite"); - if (!argv[0]) - return 0; - - argv[1] = "amend"; - argv[2] = NULL; - - proc.argv = argv; - proc.in = -1; - proc.stdout_to_stderr = 1; - - code = start_command(); - if (code) - return code; - strbuf_addf(, "%s %s\n", oid_to_hex(oldoid), oid_to_hex(newoid)); - sigchain_push(SIGPIPE, SIG_IGN); - write_in_full(proc.in, sb.buf, sb.len); - close(proc.in); - strbuf_release(); - sigchain_pop(SIGPIPE); - return finish_command(); -} - int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...) { struct argv_array hook_env = ARGV_ARRAY_INIT; @@ -1725,14 +1692,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) rerere(0); run_commit_hook(use_editor, get_index_file(), "post-commit", NULL); if (amend && !no_post_rewrite) { - struct notes_rewrite_cfg *cfg; - cfg = init_copy_notes_for_rewrite("amend"); - if (cfg) { - /* we are amending, so current_head is not NULL */ - copy_note_for_rewrite(cfg, _head->object.oid, ); - finish_copy_notes_for_rewrite(cfg, "Notes added by 'git commit --amend'"); - } - run_rewrite_hook(_head->object.oid, ); + commit_post_rewrite(current_head, ); } if (!quiet) print_summary(prefix, , !current_head); diff --git a/sequencer.c b/sequencer.c index 1795a4df2a0021b2419d941c6083e49cd6647314..81bd0810df6bcf2e078abc220bb8984345bf467f 100644 --- a/sequencer.c +++ b/sequencer.c @@ -20,6 +20,8 @@ #include "trailer.h" #include "log-tree.h" #include "wt-status.h" +#include "notes-utils.h" +#include "sigchain.h" #define GIT_REFLOG_ACTION "GIT_REFLOG_ACTION" @@ -786,6 +788,50 @@ int update_head(const struct commit *old_head, const struct object_id *new_head, return ret; } +static int run_rewrite_hook(const struct object_id *oldoid, + const struct object_id *newoid) +{ + struct child_process proc = CHILD_PROCESS_INIT; + const char *argv[3]; + int code; + struct strbuf sb = STRBUF_INIT; + + argv[0] = find_hook("post-rewrite"); + if (!argv[0]) + return 0; + + argv[1] = "amend"; + argv[2] = NULL; + + proc.argv = argv; + proc.in = -1; + proc.stdout_to_stderr = 1; + + code = start_command(); + if (code) + return code; + strbuf_addf(, "%s %s\n", oid_to_hex(oldoid), oid_to_hex(newoid)); + sigchain_push(SIGPIPE, SIG_IGN); + write_in_full(proc.in, sb.buf, sb.len); + close(proc.in); + strbuf_release(); + sigchain_pop(SIGPIPE); + return finish_command(); +} + +void commit_post_rewrite(const struct commit *old_head, +const struct object_id *new_head) +{ + struct notes_rewrite_cfg *cfg; + cfg = init_copy_notes_for_rewrite("amend"); + if (cfg) { + /* we are amending, so current_head is not NULL */ + copy_note_for_rewrite(cfg, _head->object.oid, new_head); + finish_copy_notes_for_rewrite(cfg, "Notes added by 'git commit --amend'"); + } + run_rewrite_hook(_head->object.oid, new_head); +} + static int is_original_commit_empty(struct commit *commit) { const struct object_id *ptree_oid; diff --git a/sequencer.h b/sequencer.h index 87edf40e5274d59f48d5af57678100ea220d2c8a..45def684ad751d0b8dc62b6cfdfb819ddf183c89 100644 --- a/sequencer.h +++ b/sequencer.h @@ -62,4 +62,6 @@ int template_untouched(const struct strbuf *sb, const char *template_file,
[RFC PATCH 5/8] commit: move print_commit_summary() to libgit
From: Phillip WoodSigned-off-by: Phillip Wood --- builtin/commit.c | 126 --- sequencer.c | 116 ++ sequencer.h | 5 +++ 3 files changed, 129 insertions(+), 118 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 9d621098823d196643180226491e43c806154c13..fac392216c4dc4d4a2b9fbee1d9ee14da0a14209 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -43,31 +43,6 @@ static const char * const builtin_status_usage[] = { NULL }; -static const char implicit_ident_advice_noconfig[] = -N_("Your name and email address were configured automatically based\n" -"on your username and hostname. Please check that they are accurate.\n" -"You can suppress this message by setting them explicitly. Run the\n" -"following command and follow the instructions in your editor to edit\n" -"your configuration file:\n" -"\n" -"git config --global --edit\n" -"\n" -"After doing this, you may fix the identity used for this commit with:\n" -"\n" -"git commit --amend --reset-author\n"); - -static const char implicit_ident_advice_config[] = -N_("Your name and email address were configured automatically based\n" -"on your username and hostname. Please check that they are accurate.\n" -"You can suppress this message by setting them explicitly:\n" -"\n" -"git config --global user.name \"Your Name\"\n" -"git config --global user.email y...@example.com\n" -"\n" -"After doing this, you may fix the identity used for this commit with:\n" -"\n" -"git commit --amend --reset-author\n"); - static const char empty_amend_advice[] = N_("You asked to amend the most recent commit, but doing so would make\n" "it empty. You can repeat your command with --allow-empty, or you can\n" @@ -1343,97 +1318,6 @@ int cmd_status(int argc, const char **argv, const char *prefix) return 0; } -static const char *implicit_ident_advice(void) -{ - char *user_config = expand_user_path("~/.gitconfig", 0); - char *xdg_config = xdg_config_home("config"); - int config_exists = file_exists(user_config) || file_exists(xdg_config); - - free(user_config); - free(xdg_config); - - if (config_exists) - return _(implicit_ident_advice_config); - else - return _(implicit_ident_advice_noconfig); - -} - -static void print_summary(const char *prefix, const struct object_id *oid, - int initial_commit) -{ - struct rev_info rev; - struct commit *commit; - struct strbuf format = STRBUF_INIT; - struct object_id junk_oid; - const char *head; - struct pretty_print_context pctx = {0}; - struct strbuf author_ident = STRBUF_INIT; - struct strbuf committer_ident = STRBUF_INIT; - - commit = lookup_commit(oid); - if (!commit) - die(_("couldn't look up newly created commit")); - if (parse_commit(commit)) - die(_("could not parse newly created commit")); - - strbuf_addstr(, "format:%h] %s"); - - format_commit_message(commit, "%an <%ae>", _ident, ); - format_commit_message(commit, "%cn <%ce>", _ident, ); - if (strbuf_cmp(_ident, _ident)) { - strbuf_addstr(, "\n Author: "); - strbuf_addbuf_percentquote(, _ident); - } - if (author_date_is_interesting()) { - struct strbuf date = STRBUF_INIT; - format_commit_message(commit, "%ad", , ); - strbuf_addstr(, "\n Date: "); - strbuf_addbuf_percentquote(, ); - strbuf_release(); - } - if (!committer_ident_sufficiently_given()) { - strbuf_addstr(, "\n Committer: "); - strbuf_addbuf_percentquote(, _ident); - if (advice_implicit_identity) { - strbuf_addch(, '\n'); - strbuf_addstr(, implicit_ident_advice()); - } - } - strbuf_release(_ident); - strbuf_release(_ident); - - init_revisions(, prefix); - setup_revisions(0, NULL, , NULL); - - rev.diff = 1; - rev.diffopt.output_format = - DIFF_FORMAT_SHORTSTAT | DIFF_FORMAT_SUMMARY; - - rev.verbose_header = 1; - rev.show_root_diff = 1; - get_commit_format(format.buf, ); - rev.always_show_header = 0; - rev.diffopt.detect_rename = 1; - rev.diffopt.break_opt = 0; - diff_setup_done(); - - head = resolve_ref_unsafe("HEAD", 0, junk_oid.hash, NULL); - if (!strcmp(head, "HEAD")) - head = _("detached HEAD"); - else - skip_prefix(head, "refs/heads/", ); - printf("[%s%s ", head, initial_commit ? _(" (root-commit)") : ""); - - if (!log_tree_commit(, commit)) { - rev.always_show_header = 1; -
[RFC PATCH 6/8] sequencer: simplify adding Signed-off-by: trailer
From: Phillip WoodAdd the Signed-off-by: trailer in one place rather than adding it to the message when doing a recursive merge and then again when committing. This means that if there are conflicts when merging with a strategy other than 'recursive' the Signed-off-by: trailer will be added if the user commits the resolution themselves without passing '--signoff' to 'git commit'. It also simplifies the in-process commit that is about to be added to the sequencer. Signed-off-by: Phillip Wood --- sequencer.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/sequencer.c b/sequencer.c index 65031353f01d75624951222c2de280c427e26ac5..e9060e3ca50777687c578ff09c62cd901efcfb0e 100644 --- a/sequencer.c +++ b/sequencer.c @@ -476,9 +476,6 @@ static int do_recursive_merge(struct commit *base, struct commit *next, _(action_name(opts))); rollback_lock_file(_lock); - if (opts->signoff) - append_signoff(msgbuf, 0, 0); - if (!clean) append_conflicts_hint(msgbuf); @@ -656,8 +653,6 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts, argv_array_push(, "--amend"); if (opts->gpg_sign) argv_array_pushf(, "-S%s", opts->gpg_sign); - if (opts->signoff) - argv_array_push(, "-s"); if (defmsg) argv_array_pushl(, "-F", defmsg, NULL); if ((flags & CLEANUP_MSG)) @@ -1339,6 +1334,9 @@ static int do_pick_commit(enum todo_command command, struct commit *commit, } } + if (opts->signoff) + append_signoff(, 0, 0); + if (is_rebase_i(opts) && write_author_script(msg.message) < 0) res = -1; else if (!opts->strategy || !strcmp(opts->strategy, "recursive") || command == TODO_REVERT) { -- 2.14.1
[RFC PATCH 7/8] sequencer: load commit related config
From: Phillip WoodLoad default values for message cleanup and gpg signing of commits in preparation for committing without forking 'git commit'. Signed-off-by: Phillip Wood --- builtin/rebase--helper.c | 13 - builtin/revert.c | 15 +-- sequencer.c | 30 ++ sequencer.h | 1 + 4 files changed, 56 insertions(+), 3 deletions(-) diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c index c82b4dce6838fa039e9c2cfc769c7cd17a8ef562..3a99f9f8f1396ee5a040c7c4e9367688fe04c9a4 100644 --- a/builtin/rebase--helper.c +++ b/builtin/rebase--helper.c @@ -9,6 +9,17 @@ static const char * const builtin_rebase_helper_usage[] = { NULL }; +static int git_rebase_helper_config (const char *k, const char *v, void *cb) +{ + int status; + + status = git_sequencer_config (k, v, NULL); + if (status) + return status; + + return git_default_config(k, v, NULL); +} + int cmd_rebase__helper(int argc, const char **argv, const char *prefix) { struct replay_opts opts = REPLAY_OPTS_INIT; @@ -24,7 +35,7 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix) OPT_END() }; - git_config(git_default_config, NULL); + git_config(git_rebase_helper_config, NULL); opts.action = REPLAY_INTERACTIVE_REBASE; opts.allow_ff = 1; diff --git a/builtin/revert.c b/builtin/revert.c index b9d927eb09c9ed87c84681df1396f4e6d9b13c97..2883f9eb71be1c2b205fd0f1d962748438f4c4b8 100644 --- a/builtin/revert.c +++ b/builtin/revert.c @@ -31,6 +31,17 @@ static const char * const cherry_pick_usage[] = { NULL }; +static int git_revert_config (const char *k, const char *v, void *cb) +{ + int status; + + status = git_sequencer_config (k, v, NULL); + if (status) + return status; + + return git_default_config(k, v, NULL); +} + static const char *action_name(const struct replay_opts *opts) { return opts->action == REPLAY_REVERT ? "revert" : "cherry-pick"; @@ -208,7 +219,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix) if (isatty(0)) opts.edit = 1; opts.action = REPLAY_REVERT; - git_config(git_default_config, NULL); + git_config(git_revert_config, NULL); res = run_sequencer(argc, argv, ); if (res < 0) die(_("revert failed")); @@ -221,7 +232,7 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix) int res; opts.action = REPLAY_PICK; - git_config(git_default_config, NULL); + git_config(git_revert_config, NULL); res = run_sequencer(argc, argv, ); if (res < 0) die(_("cherry-pick failed")); diff --git a/sequencer.c b/sequencer.c index e9060e3ca50777687c578ff09c62cd901efcfb0e..230bdb8535a422b1263429e5894e3f8b1733e844 100644 --- a/sequencer.c +++ b/sequencer.c @@ -687,6 +687,36 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts, return run_command(); } +static enum cleanup_mode default_msg_cleanup = CLEANUP_NONE; +static char *default_gpg_sign; + +int git_sequencer_config(const char *k, const char *v, void *cb) +{ + if (!strcmp(k, "commit.cleanup")) { + int status; + const char *s; + + status = git_config_string(, k, v); + if (!strcmp(s, "verbatim")) + default_msg_cleanup = CLEANUP_NONE; + else if (!strcmp(s, "whitespace")) + default_msg_cleanup = CLEANUP_SPACE; + else if (!strcmp(s, "strip")) + default_msg_cleanup = CLEANUP_ALL; + else if (!strcmp(s, "scissors")) + default_msg_cleanup = CLEANUP_NONE; + + return status; + } + + if (!strcmp(k, "commit.gpgsign")) { + default_gpg_sign = git_config_bool(k, v) ? "" : NULL; + return 0; + } + + return git_gpg_config(k, v, NULL); +} + static int rest_is_empty(const struct strbuf *sb, int start) { int i, eol; diff --git a/sequencer.h b/sequencer.h index d491f5cae2e6152859d114a08bbd7c935d1e80a6..2759ca2f7d1aba1268c88dc8bd94eb3e593897fe 100644 --- a/sequencer.h +++ b/sequencer.h @@ -49,6 +49,7 @@ extern const char sign_off_header[]; void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag); void append_conflicts_hint(struct strbuf *msgbuf); +int git_sequencer_config(const char *k, const char *v, void *cb); enum cleanup_mode { CLEANUP_SPACE, -- 2.14.1
[RFC PATCH 3/8] sequencer: refactor update_head()
From: Phillip WoodThe previous commit was a mechanical translation of the code from builtin/commit.c. Now that it uses its own strbuf for the reflog message it can be simplified slightly. Signed-off-by: Phillip Wood --- sequencer.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/sequencer.c b/sequencer.c index 917ad4a16216b30adb2c2c9650217926d8db8ba7..1795a4df2a0021b2419d941c6083e49cd6647314 100644 --- a/sequencer.c +++ b/sequencer.c @@ -759,8 +759,9 @@ int update_head(const struct commit *old_head, const struct object_id *new_head, int ret = 0; reflog_msg = getenv("GIT_REFLOG_ACTION"); - if (!reflog_msg) - reflog_msg=""; + if (reflog_msg) + strbuf_addstr(, reflog_msg); + strbuf_addstr(, ": "); nl = strchr(msg->buf, '\n'); if (nl) { @@ -769,8 +770,6 @@ int update_head(const struct commit *old_head, const struct object_id *new_head, strbuf_addbuf(, msg); strbuf_addch(, '\n'); } - strbuf_insert(, 0, reflog_msg, strlen(reflog_msg)); - strbuf_insert(, strlen(reflog_msg), ": ", 2); transaction = ref_transaction_begin(err); if (!transaction || -- 2.14.1
[PATCH v2 0/5] Improve abbreviation disambiguation
Thanks for the feedback on my v1 patch. Thanks also to Jeff Hostetler for helping me with this v2 patch, which includes an extra performance improvement in commit 5. Changes since last version: * Added helper program test-list-objects to construct a list of existing object ids. * test-abbrev now disambiguates a list of OIDs from stdin. * p0008-abbrev.sh now has two tests: * 0008.1 tests 100,000 known OIDs * 0008.2 tests 100,000 missing OIDs * Added a third performance improvement that uses binary search within packfiles to inspect at most two object ids per packfile. Thanks, Stolee --- When displaying object ids, we frequently want to see an abbreviation for easier typing. That abbreviation must be unambiguous among all object ids. The current implementation of find_unique_abbrev() performs a loop checking if each abbreviation length is unambiguous until finding one that works. This causes multiple round-trips to the disk when starting with the default abbreviation length (usually 7) but needing up to 12 characters for an unambiguous short-sha. For very large repos, this effect is pronounced and causes issues with several commands, from obvious consumers `status` and `log` to less obvious commands such as `fetch` and `push`. This patch improves performance by iterating over objects matching the short abbreviation only once, inspecting each object id, and reporting the minimum length of an unambiguous abbreviation. A helper program `test-list-objects` outputs a sampling of object ids, which we reorder using `sort -R` before using them as input to a performance test. A performance helper `test-abbrev` and performance test `p0008-abbrev.sh` are added to demonstrate performance improvements in this area. I include performance test numbers in the commit messages for each change, but I also include the difference between the baseline and the final change here: p0008.1: find_unique_abbrev() for existing objects -- For 10 repeated tests, each checking 100,000 known objects, we find the following results when running in a Linux VM: | | Pack | Packed | Loose | Base | New|| | Repo | Files | Objects | Objects| Time | Time | Rel% | |---|---|-||||| | Git | 1 | 230078 | 0 | 0.12 s | 0.05 s | -58.3% | | Git | 5 | 230162 | 0 | 0.25 s | 0.15 s | -40.0% | | Git | 4 | 154310 | 75852 | 0.18 s | 0.11 s | -38.9% | | Linux | 1 | 5606645 | 0 | 0.32 s | 0.10 s | -68.8% | | Linux |24 | 5606645 | 0 | 2.77 s | 2.00 s | -27.8% | | Linux |23 | 5283204 | 323441 | 2.86 s | 1.62 s | -43.4% | | VSTS | 1 | 4355923 | 0 | 0.27 s | 0.09 s | -66.7% | | VSTS |32 | 4355923 | 0 | 2.41 s | 2.01 s | -16.6% | | VSTS |31 | 4276829 | 79094 | 4.22 s | 3.02 s | -28.4% | For the Windows repo running in Windows Subsystem for Linux: Pack Files: 50 Packed Objects: 22,385,898 Loose Objects: 492 Base Time: 5.69 s New Time: 4.09 s Rel %: -28.1% p0008.2: find_unique_abbrev() for missing objects - For 10 repeated tests, each checking 100,000 missing objects, we find the following results when running in a Linux VM: | | Pack | Packed | Loose | Base | New|| | Repo | Files | Objects | Objects| Time | Time | Rel% | |---|---|-||||| | Git | 1 | 230078 | 0 | 0.61 s | 0.04 s | -93.4% | | Git | 5 | 230162 | 0 | 1.30 s | 0.15 s | -88.5% | | Git | 4 | 154310 | 75852 | 1.07 s | 0.12 s | -88.8% | | Linux | 1 | 5606645 | 0 | 0.65 s | 0.05 s | -92.3% | | Linux |24 | 5606645 | 0 | 7.12 s | 1.28 s | -82.0% | | Linux |23 | 5283204 | 323441 | 6.33 s | 0.96 s | -84.8% | | VSTS | 1 | 4355923 | 0 | 0.64 s | 0.05 s | -92.2% | | VSTS |32 | 4355923 | 0 | 7.77 s | 1.36 s | -82.5% | | VSTS |31 | 4276829 | 79094 | 7.76 s | 1.36 s | -82.5% | For the Windows repo running in Windows Subsystem for Linux: Pack Files: 50 Packed Objects: 22,385,898 Loose Objects: 492 Base Time: 38.9 s New Time: 2.7 s Rel %: -93.1% Derrick Stolee (5): test-list-objects: List a subset of object ids p0008-abbrev.sh: Test find_unique_abbrev() perf sha1_name: Unroll len loop in find_unique_abbrev_r sha1_name: Parse less while finding common prefix sha1_name: Minimize OID comparisons during disambiguation Makefile | 2 + sha1_name.c | 128 ++- t/helper/.gitignore | 2 + t/helper/test-abbrev.c | 19 +++ t/helper/test-list-objects.c | 85 t/perf/p0008-abbrev.sh | 22 6 files changed, 243 insertions(+), 15 deletions(-) create mode 100644 t/helper/test-abbrev.c create mode