Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Ben Peart writes: > @@ -344,6 +346,7 @@ struct index_state { > struct hashmap dir_hash; > unsigned char sha1[20]; > struct untracked_cache *untracked; > + uint64_t fsmonitor_last_update; This field being zero has more significance than just "we haven't got any update yet", right? The way I am reading the code is that setting it 0 is a way to signal that fsmon has been inactivated. It also made me wonder if add_fsmonitor() that silently returns without doing anything when this field is already non-zero is a bug (in other words, I couldn't tell what the right answer would be to a question "shouldn't the caller be avoiding duplicate calls?"). > diff --git a/fsmonitor.c b/fsmonitor.c > new file mode 100644 > index 00..b8b2d88fe1 > --- /dev/null > +++ b/fsmonitor.c > ... This part was a pleasant read. Thanks.
Re: [PATCH v7 03/12] update-index: add a new --force-write-index option
Ben Peart writes: > + OPT_SET_INT(0, "force-write-index", &force_write, > + N_("write out the index even if is not flagged as > changed"), 1), Hmph. The only time this makes difference is when the code forgets to mark active_cache_changed even when it actually made a change to the index, no? I do understand the wish to be able to observe what _would_ be written if such a bug did not exist in order to debug the other aspects of the change in this series, but at the same time I fear that we may end up sweeping the problem under the rug by running the tests with this option. > OPT_END() > }; > > @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const > char *prefix) > die("BUG: bad untracked_cache value: %d", untracked_cache); > } > > - if (active_cache_changed) { > + if (active_cache_changed || force_write) { > if (newfd < 0) { > if (refresh_args.flags & REFRESH_QUIET) > exit(128);
Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
Jonathan Nieder writes: > D. Eliminate for_each_string_list_item and let callers just do > > unsigned int i; > for (i = 0; i < list->nr; i++) { > struct string_list_item *item = list->items[i]; > ... > } > >Having to declare item is unnecessarily verbose, decreasing the >appeal of this option. I think I like it anyway, but I wasn't able >to convince coccinelle to do it. When using the macro, item still needs to be declared outside by the user, so it's not all that unpleasant, though. > E. Use subtraction instead of addition: >#define for_each_string_list_item(item, list) \ > for (item = ...; \ >(item == list->items ? 0 : item - list->items) < nr; \ >item++) > >I expected the compiler to figure out that this is a long way of writing >(item - list->items), but at least with gcc 4.8.4 -O2, no such >luck. This generates uglier assembly than the NULL check. Yuck. You cannot easily unsee such an ugliness X-<. The patch and explanation above --- looked quite nicely written. Will queue. Thanks. > diff --git a/string-list.h b/string-list.h > index 29bfb7ae45..79ae567cbc 100644 > --- a/string-list.h > +++ b/string-list.h > @@ -32,8 +32,10 @@ void string_list_clear_func(struct string_list *list, > string_list_clear_func_t c > typedef int (*string_list_each_func_t)(struct string_list_item *, void *); > int for_each_string_list(struct string_list *list, >string_list_each_func_t, void *cb_data); > -#define for_each_string_list_item(item,list) \ > - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) > +#define for_each_string_list_item(item,list)\ > + for (item = (list)->items; \ > + item && item < (list)->items + (list)->nr; \ > + ++item) > > /* > * Apply want to each item in list, retaining only the ones for which
Re: Behaviour of 'git stash show' when a stash has untracked files
Kaartic Sivaraam writes: > Some time ago, I stashed a few changes along with untracked files. I > almost forgot it until recently. Then I wanted to see what I change I > had in the stash. So I did a 'git stash show '. It worked fine but > didn't say anything about the untracked files in that stash. That made > me wonder where the untracked files I added went. I then applied the > stash to see that they were still there but weren't listed in show. > > I understand that they aren't listed because 'git stash show' is > typically a "diff between the stashed state and its original parent" as > the documentation says but shouldn't there be at least a message that > the stash contains untracked files? Those untracked files are "part of > the stash" and I see no way to get information about their presence > currently. > > So, should this behaviour be changed? Hmm, crickets tell us that nobody is all that interested in this, it seems. I do not think I'd be against a new feature that lets users ask what untracked paths are in the stash (and even their contents), but I do think it is a bad idea to change a vanilla "stash show" to show that information in addition to what is currently shown. Two things need to be designed carefuly. One is the UI to _invoke_ the new feature, the other is the output from the new feature. As to the invocation, an obvious pair of choices are: - "git stash show-untracked stash@{0} [ [--] ]"? - "git stash show --untracked stash@{0} [ [--] ]"? I'd personally vote for the former, if only because the latter makes the design more complicated. For one thing, tying the feature to "show" means the output _must_ be in the form of "diff" output in order to be consistent with the normal output from the subcommand, but a whole-file creation diff may not be the best way to show the entire contents of an untracked file. Also by adding it as an option to the existing "show" command, it makes it debatable if the output should show the contents of untracked files in addition to the stashed changes of tracked paths, or in place of them. Because I suspect that viewing contents of the untracked files and the changes to tracked paths may serve quite different purposes from the point of view of expected use cases, I am leaning towards saying that it is a bad idea to show contents of untracked paths in addition to changes to tracked paths. There probably are other reasons why we should prefer the former, i.e. a separate subcommand to "git stash", independent of the existing "git stash show". Assuming that we choose to go with a separate command, the output format from the command does not have to be in the form of patch, so perhaps "git stash show-untracked --list" may be a way to list the paths (and we may want -z to show the list NUL-terminated like "ls-files" does)? There may be other operations to help those who want a way to learn about untracked paths in a stash. But this is not exactly my itch so I'll let people who do have the itch to work on designing the details out.
[PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list
From: Michael Haggerty If you pass a newly initialized or newly cleared `string_list` to `for_each_string_list_item()`, then the latter does for ( item = (list)->items; /* NULL */ item < (list)->items + (list)->nr; /* NULL + 0 */ ++item) Even though this probably works almost everywhere, it is undefined behavior, and it could plausibly cause highly-optimizing compilers to misbehave. C99 section 6.5.6 paragraph 8 explains: If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. and (6.3.2.3.3) a null pointer does not point to anything. Guard the loop with a NULL check to make the intent crystal clear to even the most pedantic compiler. A suitably clever compiler could let the NULL check only run in the first iteration, but regardless, this overhead is likely to be dwarfed by the work to be done on each item. This problem was noticed by Coverity. [jn: using a NULL check instead of a placeholder empty list; fleshed out the commit message based on mailing list discussion] Signed-off-by: Michael Haggerty Signed-off-by: Jonathan Nieder --- string-list.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) Junio C Hamano wrote: > Jonathan Nieder writes: >> ... But a quick test with gcc 4.8.4 >> -O2 finds that at least this compiler does not contain such an >> optimization. The overhead Michael Haggerty mentioned is real. > > Still, I have a feeling that users of string_list wouldn't care > the overhead of single pointer NULL-ness check. > > - apply.c collects conflicted paths and reports them with fprintf(). > > - builtin/clean.c uses the function to walk the list of paths to be >removed, and either does a human interaction (for "-i" codepath) >or goes to the filesystem to remove things. > > - builtin/config.c uses it in get_urlmatch() in preparation for >doing network-y things. > > - builtin/describe.c walks the list of exclude and include patterns >to run wildmatch on the candidate reference name to filter it out. > > ... > > In all of these examples, what happens for each item in the loop > seems to me far heavier than the overhead this macro adds. Yes, agreed. As a small tweak, #define for_each_string_list_item(item, list) \ for (item = ...; item && ...; ...) produces nicer assembly than #define for_each_string_list_item(item, list) \ for (item = ...; list->items && ...; ...) (By the way, the potential optimization I described isn't valid: we know that when item == NULL and list->items == NULL, list->nr is always zero, but the compiler has no way to know that. So it can't eliminate the NULL test. For comparison, a suitably smart compiler should be able to eliminate a 'list->nr != 0 &&' guard if 'list' doesn't escape in the loop body.) Recapping the other proposed fixes: A. Make it an invariant of string_list that items is never NULL and update string_list_init et al to use an empty array. This is pretty painless until you notice some other structs that embed string_list without using STRING_LIST_INIT. Updating all those would be too painful. B. #define for_each_string_list_item(item, list) \ if (list->items) \ for (item = ...; ...; ... ) This breaks a caller like if (foo) for_each_string_list_item(item, list) ... else ... making it a non-starter. C. As Gábor suggested, #define for_each_string_list_item(item, list) \ if (!list->items) \ ; /* nothing to do */ \ else \ for (item = ...; ...; ...) This handles the caller from (B) correctly. But it produces compiler warnings for a caller like if (foo) for_each_string_list_item(item, list) ... There is only one instance of that construct in git today. It looks nicer anyway with braces, so this approach would also be promising. D. Eliminate for_each_string_list_item and let callers just do unsigned int i; for (i = 0; i < list->nr; i++) { struct string_list_item *item = list->items[i]; ... } Having to declare item is unnecessarily verbose, decreasing the appeal of this option. I think I like it anyway, but I wasn't able to convince coccinelle to do it. E. Use subtraction instead of addition: #define for_each_string_list_item(item, list) \ for (item = ...; \ (item == list->items ? 0 : item - list->items) < nr; \ item++) I expected the compiler to figure out that this is a long way of writing (item - list->items), but at least with gcc 4.8.4 -O2, no such luck. This generates uglier assembly than the NULL check. diff --git
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Jonathan Nieder writes: > I'll send a patch with a commit message saying so to try to close out > this discussion. Thanks. One less thing we have to worry about ;-)
Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives
Jeff King writes: > At any rate, I think Jonathan's point is that writing: > > UNLEAK(foo) > > will silently pass in a normal build, and only much later will somebody > run a leak-checking build and see the compile error. Yeah, I think I understand that concern. #if SOME_CONDITION #define X(y) do_x(y) #else #define X(y) /* nothing */ #endif is something we quite often use, and I just found it a bit unusual that all of a sudden we start to be extra careful only with this macro.
Re: [PATCH] Win32: simplify loading of DLL functions
Jonathan Nieder writes: > ... > strerror(ENOSYS) is "Function not implemented". Cute. > > Reviewed-by: Jonathan Nieder Thanks, both. Will queue.
Re: [RFC PATCH 3/5] branch: cleanup branch name validation
Kaartic Sivaraam writes: > -int validate_new_branchname(const char *name, struct strbuf *ref, > - int force, int attr_only) > +int validate_branch_update(const char *name, struct strbuf *ref, > +int could_exist, int clobber_head) "update" to me means something already exists and the caller is asking this function if it is OK to update it, but is that what this function is used for? I do not find the original name too bad, but if I were renaming it, I'd call it ok_to_create_branch(), with the understanding that forcing a recreation of an existing branch falls into the wider definition of "create". Also I'd avoid "could", which can be taken as an optimization hint (i.e. "you usually do not have to worry about this thing to already exist, but I am telling you that for this one call that is not the case and you need to be a bit more careful by spending extra cycles to see if it is and deal with the situation accordingly if it indeed is"), and use "ok" as part of the name for the parameter (or flip the meaning of it and say "create_only" or something).
Re: [RFC PATCH 2/5] branch: document the usage of certain parameters
Kaartic Sivaraam writes: > Documentation for a certain function was incomplete as it didn't say > what certain parameters were used for. > > So, document them for the sake of completeness and easy reference. Thanks. > @@ -15,6 +15,11 @@ > * > * - reflog creates a reflog for the branch > * > + * - if 'force' is true, clobber_head indicates whether the branch could be > + * the current branch; else it has no effect Everybody else in this list begins with what it is describing for easy eyeballing. Can you make this match that pattern? Also, what does "could be" mean in that sentence? Is the caller telling the function "how, I do not exactly know if that is the case, but the branch I am asking to create you might be the same branch as what is currently checked out, so be extra careful"? Or is the comment telling a potential caller that it can pass true to signal that create_branch() is allowed to (re)"create" the branch that is already checked out (hence it already exists)? I think the confusing statement above arises because an assumption is unstated there. If the reader knows "Even with force, calling create_branch() on the currently checked out branch is normally forbidden", then the reader can guess your "could" mean the latter. - clobber_head_ok allows the currently checked out (hence existing) branch to be overwritten; without force, it has no effect. perhaps? As the underlying helper calls it clobber_head_ok, and that name is more clearly expresses that this is a permission than the current name, I chose to add _ok to the above example, but if you are to take the suggestion, you'd need to also update the names in the declaration, too. > + * > + * - quiet suppresses tracking information > + * > * - track causes the new branch to be configured to merge the remote > branch > * that start_name is a tracking branch for (if any). > */
Re: [RFC PATCH 1/5] builtin/checkout: avoid usage of '!!'
Kaartic Sivaraam writes: > There was a usage for which there's no compelling reason.So, replace > such a usage as with something else that expresses the intent more > clearly. I actually think this is a good example of the exception-rule. The function wants to take true or false in "int", and the caller has a pointer. And !!ptr is a shorter and more established way than ptr != NULL to turn non-NULL ness into an int boolean, without having to either repeat or introducing an otherwise unnecessary temporary.
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Jonathan Nieder writes: > ... But a quick test with gcc 4.8.4 > -O2 finds that at least this compiler does not contain such an > optimization. The overhead Michael Haggerty mentioned is real. Still, I have a feeling that users of string_list wouldn't care the overhead of single pointer NULL-ness check. - apply.c collects conflicted paths and reports them with fprintf(). - builtin/clean.c uses the function to walk the list of paths to be removed, and either does a human interaction (for "-i" codepath) or goes to the filesystem to remove things. - builtin/config.c uses it in get_urlmatch() in preparation for doing network-y things. - builtin/describe.c walks the list of exclude and include patterns to run wildmatch on the candidate reference name to filter it out. ... In all of these examples, what happens for each item in the loop seems to me far heavier than the overhead this macro adds. So...
Re: [PATCH] t4014: strengthen search patterns
Kaartic Sivaraam writes: > The regex patterns for some failing test cases were a bit loose > giving way for a few incorrect outputs being accepted as correct > outputs. If these were part of scripted Porcelain that needs to take any end-user input, then having '.' that are meant to match only a dot is a bug, but I personally do not think it is worth the patch noise to quote them, when we _know_ (after all, we are in control of the data we use for these tests) there is no other lines that would match these patterns. > To avoid such incorrect outputs from being flagged as correct ones > use fixed string matches when possible and strengthen regex when > it's not. > > Signed-off-by: Kaartic Sivaraam > --- > t/t4014-format-patch.sh | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh > index 482112ca339f0..7dff7996c9e1f 100755 > --- a/t/t4014-format-patch.sh > +++ b/t/t4014-format-patch.sh > @@ -163,7 +163,7 @@ test_expect_failure 'additional command line cc (rfc822)' > ' > git config --replace-all format.headers "Cc: R E Cipient > " && > git format-patch --cc="S. E. Cipient " --stdout > master..side | sed -e "/^\$/q" >patch5 && > grep "^Cc: R E Cipient ,\$" patch5 && > - grep "^ *\"S. E. Cipient\" \$" patch5 > + grep "^ *\"S\. E\. Cipient\" \$" patch5 > ' > > test_expect_success 'command line headers' ' > @@ -191,13 +191,13 @@ test_expect_success 'command line To: header (ascii)' ' > test_expect_failure 'command line To: header (rfc822)' ' > > git format-patch --to="R. E. Cipient " --stdout > master..side | sed -e "/^\$/q" >patch8 && > - grep "^To: \"R. E. Cipient\" \$" patch8 > + grep -F "To: \"R. E. Cipient\" " patch8 > ' > > test_expect_failure 'command line To: header (rfc2047)' ' > > git format-patch --to="R Ä Cipient " --stdout > master..side | sed -e "/^\$/q" >patch8 && > - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" > patch8 > + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" > patch8 > ' > > test_expect_success 'configuration To: header (ascii)' ' > @@ -211,14 +211,14 @@ test_expect_failure 'configuration To: header (rfc822)' > ' > > git config format.to "R. E. Cipient " && > git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 && > - grep "^To: \"R. E. Cipient\" \$" patch9 > + grep -F "To: \"R. E. Cipient\" " patch9 > ' > > test_expect_failure 'configuration To: header (rfc2047)' ' > > git config format.to "R Ä Cipient " && > git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 && > - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" > patch9 > + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" > patch9 > ' > > # check_patch : Verify that looks like a half-sane > > -- > https://github.com/git/git/pull/406
Re: [PATCH v2 0/2] fix read past end of array in alternates files
Jeff King writes: > On Mon, Sep 18, 2017 at 11:51:00AM -0400, Jeff King wrote: > >> This series fixes a regression in v2.11.1 where we might read past the >> end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't >> base the patch on there, for a few reasons: > > Here's a v2 that fixes a minor leak in the first patch (I carefully > remembered to free() the path buffer on the error path, but forgot to do > it in the normal one. Oops). Thanks. > I also tried to address Jonathan's "should this be in the commit > message" comment. The information above _is_ in there, but maybe putting > it at the top as a sort of tl;dr makes it easier to find? Probably. > The second patch is unchanged. > > Junio, I see you ended up back-porting it to v2.11. Would you prefer me > to have done it that way in the first place? I was trying to reduce your > work by basing it on "maint" (figuring that we wouldn't bother making a > v2.11.x release anyway, and that skips you having to apply the second > patch separately after the merge). Upon seeing that this dated back to 2.11, because I am lazy and do not assess how much the fix needs to go to older tracks when I am queuing (remember: my attention span during patch queueing is measured in minutes, as people send changes to different areas), I tend to first try to see what's the oldest maintenance track we can practically apply the patch to. It turned out that the conflict resolution to apply on maint-2.11 wasn't that painful, so I took the lazy route all the way---the real fix on the oldest, even though I do not know (because I refused to think and decide due to laziness) if a next v2.11.x release is necessary, followed by a nice-to-have warning that uses newer features on the maintenance track. That way, when we decide that the fix won't be a big deal to require a new v2.11.x, but it is nice to have in v2.13.x, we could merge the first one, without having to cherry-pick. All of the above is part of how the daily maintainer workflow goes, and there is no strong preference on my side, if the original is on the theoretically oldest (i.e. maint-2.11) or on the oldest practical (i.e. maint), as long as the conflicts are not too painful.
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Hi, Kaartic Sivaraam wrote: > On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: >> Jonathan Nieder wrote: >>> Does the following alternate fix work? I think I prefer it because >>> it doesn't require introducing a new global. [...] >>> #define for_each_string_list_item(item,list) \ >>> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >>> + for (item = (list)->items; \ >>> +(list)->items && item < (list)->items + (list)->nr; \ >>> +++item) >> >> This is the possibility that I was referring to as "add[ing] overhead to >> each iteration of the loop". I'd rather not add an extra test-and-branch >> to every iteration of a loop in which `list->items` is *not* NULL, which >> your solution appears to do. Or are compilers routinely able to optimize >> the check out? > > I t seems at least 'gcc' is able to optimize this out even with a -O1 > and 'clang' optimizes this out with a -O2. Taking a sneak peek at > the 'Makefile' shows that our default is -O2. > > For a proof, see https://godbolt.org/g/CPt73L >From that link: for ( ;valid_int && *valid_int < 10; (*valid_int)++) { printf("Valid instance"); } Both gcc and clang are able to optimize out the 'valid_int &&' because it is dereferenced on the RHS of the &&. For comparison, 'item < (list)->items + (list)->nr' does not dereference (list)->items. So that optimization doesn't apply here. A smart compiler could be able to take advantage of there being no object pointed to by a null pointer, which means item < (list)->items + (list)->nr is always false when (list)->items is NULL, which in turn makes a '(list)->items &&' test redundant. But a quick test with gcc 4.8.4 -O2 finds that at least this compiler does not contain such an optimization. The overhead Michael Haggerty mentioned is real. Thanks and hope that helps, Jonathan
Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
Ben Peart writes: > +/* do stat comparison even if CE_FSMONITOR_VALID is true */ > +#define CE_MATCH_IGNORE_FSMONITOR 0X20 Hmm, when should a programmer use this flag? > +int git_config_get_fsmonitor(void) > +{ > + if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor)) > + core_fsmonitor = getenv("GIT_FSMONITOR_TEST"); Will the environment be part of the published API, or is it a remnant of a useful tool for debugging while developing the feature? If it is the former (and I'd say why not, even though "git -c core.fsmontor=..." may be easy enough), we might want to rename it to replace _TEST with _PROGRAM or something and document it. > diff --git a/diff-lib.c b/diff-lib.c > index 2a52b07954..23c6d03ca9 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -12,6 +12,7 @@ > #include "refs.h" > #include "submodule.h" > #include "dir.h" > +#include "fsmonitor.h" > > /* > * diff-files > @@ -228,6 +229,7 @@ int run_diff_files(struct rev_info *revs, unsigned int > option) > > if (!changed && !dirty_submodule) { > ce_mark_uptodate(ce); > + mark_fsmonitor_valid(ce); I notice all calls to mark_fsmonitor_valid(ce) always follow a call to ce_mark_uptodate(ce). Should the call to the former be made as part of the latter instead? Are there cases where we want to call ce_mark_uptodate(ce) to mark the ce up-to-date, yet we do not want to call mark_fsmonitor_valid(ce)? How does a programmer tell when s/he wants to call ce_mark_uptodate(ce) if s/he also should call mark_fsmonitor_valid(ce)? Together with "when to pass IGNORE_FSMONITOR" question, is there a summary that guides new programmers to answer these questions in the new documentation? > diff --git a/dir.h b/dir.h > index e3717055d1..fab8fc1561 100644 > --- a/dir.h > +++ b/dir.h > @@ -139,6 +139,8 @@ struct untracked_cache { > int gitignore_invalidated; > int dir_invalidated; > int dir_opened; > + /* fsmonitor invalidation data */ > + unsigned int use_fsmonitor : 1; This makes it look like we will add a bit more fields in later patches that are about "invalidation" around fsmonitor, but it appears that such an addition never happens til the end of the series. And use_fsmonitor boolean does not seem to be what the comment says---it just tells us if fsmonitor is in use in the operation of the untracked cache, no? > diff --git a/entry.c b/entry.c > index cb291aa88b..5e6794f9fc 100644 > --- a/entry.c > +++ b/entry.c > @@ -4,6 +4,7 @@ > #include "streaming.h" > #include "submodule.h" > #include "progress.h" > +#include "fsmonitor.h" > > static void create_directories(const char *path, int path_len, > const struct checkout *state) > @@ -357,6 +358,7 @@ static int write_entry(struct cache_entry *ce, > lstat(ce->name, &st); > fill_stat_cache_info(ce, &st); > ce->ce_flags |= CE_UPDATE_IN_BASE; > + mark_fsmonitor_invalid(state->istate, ce); > state->istate->cache_changed |= CE_ENTRY_CHANGED; Similar to "how does mark_fsmonitor_valid() relate to ce_mark_uptodate()?" question earlier, mark_fsmonitor_invalid() seems to often appear in pairs with the update to cache_changed. Are there cases where we need CE_ENTRY_CHANGED bit added to the index state yet we do not want to call mark_fsmonitor_invalid()? I am wondering if there should be a single helper function that lets callers say "I changed this ce in this istate this way. Please update CE_VALID, CE_UPDATE_IN_BASE and CE_FSMONITOR_VALID accordingly". That "changed _this way_" is deliberately vague in my question above, because it is not yet clear to me when mark-valid and mark-invalid should and should not be called from the series. > + /* a fsmonitor process can return '*' to indicate all entries are > invalid */ > + if (query_success && query_result.buf[0] != '/') { > + /* Mark all entries returned by the monitor as dirty */ The comment talks about '*' and code checks if it is not '/'? The else clause of this if/else handles the "invalidate all" case, so shouldn't we be comparing with '*' instead here? Ah, fsmonitor-watchman section of the doc tells us to write the hook in a way to return slash, so the comment above the code is stale and the comparison with '/' is what we want, I guess.
Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives
On Wed, Sep 20, 2017 at 10:45:05AM +0900, Junio C Hamano wrote: > >> > +#else > >> > +#define UNLEAK(var) > >> > >> I think this should be defined to be something (for example, "do {} > >> while (0)"), at least so that we have compiler errors when UNLEAK(var) > >> is used incorrectly (for example, without the semicolon) when > >> SUPPRESS_ANNOTATED_LEAKS is not defined. > > > > Seems reasonable. > > Hmph, I am not so sure about this one. But I agree that the > semicolon must go. I thought we had run into some issues with a compiler or linter complaining about null statements before. But they _are_ pretty common, so maybe I'm mis-remembering (or maybe it was something like the if/else conditional you showed earlier). At any rate, I think Jonathan's point is that writing: UNLEAK(foo) will silently pass in a normal build, and only much later will somebody run a leak-checking build and see the compile error. Of course people adding UNLEAK()s are likely to be compiling leak-check builds to confirm that they've silenced the warning, so maybe it's not that important a case to catch. -Peff
[no subject]
<>
Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives
Jeff King writes: > On Tue, Sep 19, 2017 at 01:45:52PM -0700, Jonathan Tan wrote: > >> The following comments are assuming that we're going to standardize on >> UNLEAK(var); (with the semicolon). > > Yeah, I assumed we would. We don't have to, since this really is sort-of > magical, but I think the code looks better with it. > >> On Fri, 8 Sep 2017 02:38:41 -0400 >> Jeff King wrote: >> >> > +#ifdef SUPPRESS_ANNOTATED_LEAKS >> > +extern void unleak_memory(const void *ptr, size_t len); >> > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var)); >> >> I would feel better if the semicolon was omitted. I don't think it >> matters in this particular case, though. > > You end up with a double semi-colon. Some linters might complain. Yeah, and it makes if (some condition) UNLEAK(var); else do_something_else_to(var); a syntax error. Should have spotted during the review; sorry. > >> > +#else >> > +#define UNLEAK(var) >> >> I think this should be defined to be something (for example, "do {} >> while (0)"), at least so that we have compiler errors when UNLEAK(var) >> is used incorrectly (for example, without the semicolon) when >> SUPPRESS_ANNOTATED_LEAKS is not defined. > > Seems reasonable. Hmph, I am not so sure about this one. But I agree that the semicolon must go. > > I think both are worth doing, but the patch is already in next so we > need to do it on top. Do you want to prepare a patch? > > -Peff
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Hi, Junio C Hamano wrote: > Kaartic Sivaraam writes: >> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: >>> Jonathan Nieder wrote: Does the following alternate fix work? I think I prefer it because it doesn't require introducing a new global. [...] #define for_each_string_list_item(item,list) \ - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) + for (item = (list)->items; \ + (list)->items && item < (list)->items + (list)->nr; \ + ++item) >>> >>> This is the possibility that I was referring to as "add[ing] overhead to >>> each iteration of the loop". I'd rather not add an extra test-and-branch >>> to every iteration of a loop in which `list->items` is *not* NULL, which >>> your solution appears to do. Or are compilers routinely able to optimize >>> the check out? >> >> It seems at least 'gcc' is able to optimize this out even with a -O1 >> and 'clang' optimizes this out with a -O2. Taking a sneak peek at >> the 'Makefile' shows that our default is -O2. > > But doesn't the versions of gcc and clang currently available do the > right thing with the current code without this change anyway? I've > been operating under the assumption that this is to future-proof the > code even when the compilers change to use the "NULL+0 is undefined" > as an excuse to make demons fly out of your nose, so unfortunately I > do not think it is not so huge a plus to find that the current > compilers do the right thing to the code with proposed updates. I think you and Kaartic are talking about different things. Kaartic was checking that this wouldn't introduce a performance regression (thanks!). You are concerned about whether the C standard and common practice treat the resulting code as exhibiting undefined behavior. Fortunately the C standard is pretty clear about this. The undefined behavior here is at run time, not compile time. As you suggested in an earlier reply, the 'list->items &&' effectively guards the 'list->items + list->nr' to prevent that undefined behavior. I'll send a patch with a commit message saying so to try to close out this discussion. Thanks, Jonathan
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
Kaartic Sivaraam writes: > On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: >>> Does the following alternate fix work? I think I prefer it because >>> it doesn't require introducing a new global. [...] >>> #define for_each_string_list_item(item,list) \ >>> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >>> + for (item = (list)->items; \ >>> +(list)->items && item < (list)->items + (list)->nr; \ >>> +++item) >> This is the possibility that I was referring to as "add[ing] overhead to >> each iteration of the loop". I'd rather not add an extra test-and-branch >> to every iteration of a loop in which `list->items` is *not* NULL, which >> your solution appears to do. Or are compilers routinely able to optimize >> the check out? > > It seems at least 'gcc' is able to optimize this out even with a -O1 > and 'clang' optimizes this out with a -O2. Taking a sneak peek at > the 'Makefile' shows that our default is -O2. But doesn't the versions of gcc and clang currently available do the right thing with the current code without this change anyway? I've been operating under the assumption that this is to future-proof the code even when the compilers change to use the "NULL+0 is undefined" as an excuse to make demons fly out of your nose, so unfortunately I do not think it is not so huge a plus to find that the current compilers do the right thing to the code with proposed updates.
[PATCH v2] describe: teach --match to handle branches and remotes
When `git describe` uses `--match`, it matches only tags, basically ignoring the `--all` argument even when it is specified. Fix it by also matching branch name and $remote_name/$remote_branch_name, for remote-tracking references, with the specified patterns. Update documentation accordingly and add tests. Signed-off-by: Max Kirillov --- Changed to use skip_prefix(). Calculate path_to_match only once. Add case of discarding unknown type with exclude Documentation/git-describe.txt | 24 ++-- builtin/describe.c | 29 + t/t6120-describe.sh| 27 +++ 3 files changed, 58 insertions(+), 22 deletions(-) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index 26f19d3b07..c924c945ba 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -87,19 +87,23 @@ OPTIONS --match :: Only consider tags matching the given `glob(7)` pattern, - excluding the "refs/tags/" prefix. This can be used to avoid - leaking private tags from the repository. If given multiple times, a - list of patterns will be accumulated, and tags matching any of the - patterns will be considered. Use `--no-match` to clear and reset the - list of patterns. + excluding the "refs/tags/" prefix. If used with `--all`, it also + considers local branches and remote-tracking references matching the + pattern, excluding respectively "refs/heads/" and "refs/remotes/" + prefix; references of other types are never considered. If given + multiple times, a list of patterns will be accumulated, and tags + matching any of the patterns will be considered. Use `--no-match` to + clear and reset the list of patterns. --exclude :: Do not consider tags matching the given `glob(7)` pattern, excluding - the "refs/tags/" prefix. This can be used to narrow the tag space and - find only tags matching some meaningful criteria. If given multiple - times, a list of patterns will be accumulated and tags matching any - of the patterns will be excluded. When combined with --match a tag will - be considered when it matches at least one --match pattern and does not + the "refs/tags/" prefix. If used with `--all`, it also does not consider + local branches and remote-tracking references matching the pattern, + excluding respectively "refs/heads/" and "refs/remotes/" prefix; + references of other types are never considered. If given multiple times, + a list of patterns will be accumulated and tags matching any of the + patterns will be excluded. When combined with --match a tag will be + considered when it matches at least one --match pattern and does not match any of the --exclude patterns. Use `--no-exclude` to clear and reset the list of patterns. diff --git a/builtin/describe.c b/builtin/describe.c index 42afa1e244..f15b6e531d 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -129,13 +129,24 @@ static void add_to_known_names(const char *path, static int get_name(const char *path, const struct object_id *oid, int flag, void *cb_data) { - int is_tag = starts_with(path, "refs/tags/"); + int is_tag = 0; struct object_id peeled; int is_annotated, prio; - - /* Reject anything outside refs/tags/ unless --all */ - if (!all && !is_tag) + const char *path_to_match = NULL; + + if (skip_prefix(path, "refs/tags/", &path_to_match)) { + is_tag = 1; + } else if (all) { + if ((exclude_patterns.nr || patterns.nr) && + !skip_prefix(path, "refs/heads/", &path_to_match) && + !skip_prefix(path, "refs/remotes/", &path_to_match)) { + /* Only accept reference of known type if there are match/exclude patterns */ + return 0; + } + } else { + /* Reject anything outside refs/tags/ unless --all */ return 0; + } /* * If we're given exclude patterns, first exclude any tag which match @@ -144,11 +155,8 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi if (exclude_patterns.nr) { struct string_list_item *item; - if (!is_tag) - return 0; - for_each_string_list_item(item, &exclude_patterns) { - if (!wildmatch(item->string, path + 10, 0)) + if (!wildmatch(item->string, path_to_match, 0)) return 0; } } @@ -161,11 +169,8 @@ static int get_name(const char *path, const struct object_id *oid, int flag, voi int found = 0; struct string_list_item *item; - if (!is_tag) -
Re: [PATCH] describe: teach --match to handle branches and remotes
On Tue, Sep 19, 2017 at 08:52:24AM +0900, Junio C Hamano wrote: > I think you can use skip_prefix() to avoid counting the length of > the prefix yourself, i.e. Thanks, will use it. > The hardcoded +10 for "is_tag" case assumes that anything other than > "refs/tags/something" would ever be used to call into this function > when is_tag is true, and that may well be true in the current code > and have been so ever since the original code was written, but it > still smells like an invitation for future bugs. is_tag is used later. I'll chance it so that it does not rely on it to match, but it still has to produce it. > Was there a reason why A and c are in different cases? Are we > worried about case insensitive filesystems or something? The tags have been there of different case already. I don't know why. I'll change the branch names but I'm reluctant to touch existing tests. -- Max
RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit
> -Original Message- > From: David Turner [mailto:david.tur...@twosigma.com] > Sent: Tuesday, September 19, 2017 5:27 PM > To: 'Ben Peart' ; Ben Peart > > Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; > gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; > p...@peff.net > Subject: RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the > fsmonitor valid bit > > > > -Original Message- > > From: Ben Peart [mailto:peart...@gmail.com] > > Sent: Tuesday, September 19, 2017 4:45 PM > > To: David Turner ; 'Ben Peart' > > > > Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; > > gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; > > p...@peff.net > > Subject: Re: [PATCH v7 06/12] ls-files: Add support in ls-files to > > display the fsmonitor valid bit > > > > > > > > On 9/19/2017 3:46 PM, David Turner wrote: > > >> -Original Message- > > >> From: Ben Peart [mailto:benpe...@microsoft.com] > > >> Sent: Tuesday, September 19, 2017 3:28 PM > > >> To: benpe...@microsoft.com > > >> Cc: David Turner ; ava...@gmail.com; > > >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; > > >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > > >> Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to > > >> display the fsmonitor valid bit > > >> > > >> Add a new command line option (-f) to ls-files to have it use > > >> lowercase letters for 'fsmonitor valid' files > > >> > > >> Signed-off-by: Ben Peart > > >> --- > > >> builtin/ls-files.c | 8 ++-- > > >> 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > This is still missing the corresponding documentation patch. > > > > Sorry for the confusion. > > Thanks for following up. > > > > 10/12 (no reply, haven't checked whether same issue but I assume > > > same issue since the new case I mentioned isn't added) > > > > It wasn't a bug so I didn't "fix" it. I just sent an explanation and > > patch demonstrating why. You can find it here: > > I was concerned about the case of an untracked file inside a directory that > contains no tracked files. Your patch in this mail treats dir3 just like > dir1 and > dir2. I think you ought to test the case of a dir with no tracked files. > In the case where there is an untracked file inside a directory that contains no tracked files, git will (as shown by the "failing" test) actually find the untracked file. This is the correct/expected behavior. The test failure is just indicating that the optimization of not searching that directory for untracked files was not able to occur (because there was no entry in the untracked cache for that directory). > After more careful checking, it looks like this case does work, but it's still > worth testing.
[PATCH for jk/leak-checkers v2] git-compat-util: make UNLEAK less error-prone
Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false positives", 2017-09-08) introduced an UNLEAK macro to be used as "UNLEAK(var);", but its existing definitions leave semicolons that act as empty statements, which may cause some tools to complain. Therefore, modify its definitions to not leave these semicolons. Signed-off-by: Jonathan Tan --- OK, here is the same patch with an updated commit message. --- git-compat-util.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 003e444c4..9bc15b036 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1184,9 +1184,9 @@ extern int cmd_main(int, const char **); */ #ifdef SUPPRESS_ANNOTATED_LEAKS extern void unleak_memory(const void *ptr, size_t len); -#define UNLEAK(var) unleak_memory(&(var), sizeof(var)); +#define UNLEAK(var) unleak_memory(&(var), sizeof(var)) #else -#define UNLEAK(var) +#define UNLEAK(var) do {} while (0) #endif #endif -- 2.14.1.821.g8fa685d3b7-goog
Re: [PATCH for jk/leak-checkers] git-compat-util: make UNLEAK less error-prone
On Tue, Sep 19, 2017 at 02:34:56PM -0700, Jonathan Tan wrote: > Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false > positives", 2017-09-08) introduced an UNLEAK macro to be used as > "UNLEAK(var);", but its existing definitions make it possible to be > invoked as "UNLEAK(var)" (without the trailing semicolon) too. > > Therefore, modify its definitions to cause a compile-time error if > invoked without the trailing semicolon. The patch itself looks good, but I think your commit message dances around the real motivation: some parsers may complain about doubled semicolons, or semicolons without an associated line of code (which is really just a doubled semicolon, too, I guess). -Peff
[PATCH for jk/leak-checkers] git-compat-util: make UNLEAK less error-prone
Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false positives", 2017-09-08) introduced an UNLEAK macro to be used as "UNLEAK(var);", but its existing definitions make it possible to be invoked as "UNLEAK(var)" (without the trailing semicolon) too. Therefore, modify its definitions to cause a compile-time error if invoked without the trailing semicolon. Signed-off-by: Jonathan Tan --- Sure, here is the patch. --- git-compat-util.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 003e444c4..9bc15b036 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1184,9 +1184,9 @@ extern int cmd_main(int, const char **); */ #ifdef SUPPRESS_ANNOTATED_LEAKS extern void unleak_memory(const void *ptr, size_t len); -#define UNLEAK(var) unleak_memory(&(var), sizeof(var)); +#define UNLEAK(var) unleak_memory(&(var), sizeof(var)) #else -#define UNLEAK(var) +#define UNLEAK(var) do {} while (0) #endif #endif -- 2.14.1.821.g8fa685d3b7-goog
RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit
> -Original Message- > From: Ben Peart [mailto:peart...@gmail.com] > Sent: Tuesday, September 19, 2017 4:45 PM > To: David Turner ; 'Ben Peart' > > Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; > gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; > p...@peff.net > Subject: Re: [PATCH v7 06/12] ls-files: Add support in ls-files to display the > fsmonitor valid bit > > > > On 9/19/2017 3:46 PM, David Turner wrote: > >> -Original Message- > >> From: Ben Peart [mailto:benpe...@microsoft.com] > >> Sent: Tuesday, September 19, 2017 3:28 PM > >> To: benpe...@microsoft.com > >> Cc: David Turner ; ava...@gmail.com; > >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; > >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > >> Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to > >> display the fsmonitor valid bit > >> > >> Add a new command line option (-f) to ls-files to have it use > >> lowercase letters for 'fsmonitor valid' files > >> > >> Signed-off-by: Ben Peart > >> --- > >> builtin/ls-files.c | 8 ++-- > >> 1 file changed, 6 insertions(+), 2 deletions(-) > > > > This is still missing the corresponding documentation patch. > > Sorry for the confusion. Thanks for following up. > > 10/12 (no reply, haven't checked whether same issue but I assume same > > issue since the new case I mentioned isn't added) > > It wasn't a bug so I didn't "fix" it. I just sent an explanation and patch > demonstrating why. You can find it here: I was concerned about the case of an untracked file inside a directory that contains no tracked files. Your patch in this mail treats dir3 just like dir1 and dir2. I think you ought to test the case of a dir with no tracked files. After more careful checking, it looks like this case does work, but it's still worth testing.
Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives
On Tue, Sep 19, 2017 at 01:45:52PM -0700, Jonathan Tan wrote: > The following comments are assuming that we're going to standardize on > UNLEAK(var); (with the semicolon). Yeah, I assumed we would. We don't have to, since this really is sort-of magical, but I think the code looks better with it. > On Fri, 8 Sep 2017 02:38:41 -0400 > Jeff King wrote: > > > +#ifdef SUPPRESS_ANNOTATED_LEAKS > > +extern void unleak_memory(const void *ptr, size_t len); > > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var)); > > I would feel better if the semicolon was omitted. I don't think it > matters in this particular case, though. You end up with a double semi-colon. Some linters might complain. > > +#else > > +#define UNLEAK(var) > > I think this should be defined to be something (for example, "do {} > while (0)"), at least so that we have compiler errors when UNLEAK(var) > is used incorrectly (for example, without the semicolon) when > SUPPRESS_ANNOTATED_LEAKS is not defined. Seems reasonable. I think both are worth doing, but the patch is already in next so we need to do it on top. Do you want to prepare a patch? -Peff
Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives
Thanks - this does look like a good thing to have. Sorry for the late comments. The following comments are assuming that we're going to standardize on UNLEAK(var); (with the semicolon). On Fri, 8 Sep 2017 02:38:41 -0400 Jeff King wrote: > +#ifdef SUPPRESS_ANNOTATED_LEAKS > +extern void unleak_memory(const void *ptr, size_t len); > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var)); I would feel better if the semicolon was omitted. I don't think it matters in this particular case, though. > +#else > +#define UNLEAK(var) I think this should be defined to be something (for example, "do {} while (0)"), at least so that we have compiler errors when UNLEAK(var) is used incorrectly (for example, without the semicolon) when SUPPRESS_ANNOTATED_LEAKS is not defined.
Re: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit
On 9/19/2017 3:46 PM, David Turner wrote: -Original Message- From: Ben Peart [mailto:benpe...@microsoft.com] Sent: Tuesday, September 19, 2017 3:28 PM To: benpe...@microsoft.com Cc: David Turner ; ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit Add a new command line option (-f) to ls-files to have it use lowercase letters for 'fsmonitor valid' files Signed-off-by: Ben Peart --- builtin/ls-files.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) This is still missing the corresponding documentation patch. Sorry for the confusion. The documentation is all in a patch together as they all have links to each other. You can find it here: https://public-inbox.org/git/20170919192744.19224-6-benpe...@microsoft.com/T/#u I can see from replies that at least some of my messages got through. In total, I sent messages about: 04/12 (I see replies) 05/12 (I see replies) 06/12 (no reply, issue not fixed) The documentation is all in a patch together as they all have links to each other. You can find it here: https://public-inbox.org/git/20170919192744.19224-6-benpe...@microsoft.com/T/#u 10/12 (no reply, haven't checked whether same issue but I assume same issue since the new case I mentioned isn't added) It wasn't a bug so I didn't "fix" it. I just sent an explanation and patch demonstrating why. You can find it here: https://public-inbox.org/git/84981984-02c1-f322-a617-57dfe1d87...@gmail.com/T/#u 12/12 (no reply, typo fixed -- no reply required) Hope this helps.
Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test
Hi, Ben Peart wrote: > The split index test t1700-split-index.sh has hard coded SHA values for > the index. Currently it supports index V4 and V3 but assumes there are > no index extensions loaded. > > When manually forcing the fsmonitor extension to be turned on when > running the test suite, the SHA values no longer match which causes the > test to fail. > > The potential matrix of index extensions and index versions can is quite > large so instead disable the extension before attempting to run the test. Thanks for finding and diagnosing this problem. This feels to me like the wrong fix. Wouldn't it be better for the test not to depend on the precise object ids? See the "Tips for Writing Tests" section in t/README: And such drastic changes to the core GIT that even changes these otherwise supposedly stable object IDs should be accompanied by an update to t-basic.sh. However, other tests that simply rely on basic parts of the core GIT working properly should not have that level of intimate knowledge of the core GIT internals. If all the test scripts hardcoded the object IDs like t-basic.sh does, that defeats the purpose of t-basic.sh, which is to isolate that level of validation in one place. Your test also ends up needing updating when such a change to the internal happens, so do _not_ do it and leave the low level of validation to t-basic.sh. Worse, t1700-split-index.sh doesn't explain where the object ids it uses comes from so it is not even obvious to a casual reader like me how to fix it. See t/diff-lib.sh for some examples of one way to avoid depending on the object id computation. Another way that is often preferable is to come up with commands to compute the expected hash values, like $(git rev-parse HEAD^{tree}), and use those instead of hard-coded values. Thanks and hope that helps, Jonathan
Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable
Torsten Bögershausen wrote: > Some implementations of `echo` support the '-e' option to enable > backslash interpretation of the following string. > As an addition, they support '-E' to turn it off. nit: please wrap the commit message to a consistent line width. > However, none of these are portable, POSIX doesn't even mention them, > and many implementations don't support them. > > A check for '-n' is already done in check-non-portable-shell.pl, > extend it to cover '-n', '-e' or '-E-' > > Signed-off-by: Torsten Bögershausen > --- > t/check-non-portable-shell.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) An excellent change. Thanks for noticing and fixing this. Reviewed-by: Jonathan Nieder
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
Ben Peart wrote: > Some stats on these same coding style errors in the current bash scripts: > > 298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space) > 140 instances of "if \[ .* \]" (not using the preferred "test") > 293 instances of "if .*; then" > > Wouldn't it be great not to have to write up style feedback for when > these all get copy/pasted into new scripts? Agreed. Care to write patches for these? :) (I think three patches, one for each issue, would do the trick.) Thanks, Jonathan
Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
On 9/19/2017 3:32 PM, David Turner wrote: I think my comment here might have gotten lost, and I don't want it to because it's something I'm really worried about: You have to update the test completely to ensure it passes. If you run the test with the '-v' option you will see the cause of the failure: t7519-status-fsmonitor.sh: line 27: dir3/untracked: No such file or directory To fix this, you will also need to update the 'setup' test to create the directory for the new untracked file to get created into. Then you will need to drop at least one file in it so that the directory is preserved through the 'git reset --hard' Then you have to update the several 'cat >expect' blocks to expect the new file. In addition, the ability to avoid scanning for untracked files relies on the untracked cache. If you don't have another file that git is aware of in that directory then there won't be a cache entry and git will do the required scan and detect the new untracked file (by design). Here is a patch to the test that updates it to meet all these requirements. I hope this helps. diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh index c6df85af5e..29ae4e284f 100755 --- a/t/t7519-status-fsmonitor.sh +++ b/t/t7519-status-fsmonitor.sh @@ -24,12 +24,14 @@ dirty_repo () { : >untracked && : >dir1/untracked && : >dir2/untracked && + : >dir3/untracked && echo 1 >modified && echo 2 >dir1/modified && echo 3 >dir2/modified && echo 4 >new && echo 5 >dir1/new && echo 6 >dir2/new + echo 7 >dir3/new } write_integration_script () { @@ -47,12 +49,14 @@ write_integration_script () { printf "untracked\0" printf "dir1/untracked\0" printf "dir2/untracked\0" + printf "dir3/untracked\0" printf "modified\0" printf "dir1/modified\0" printf "dir2/modified\0" printf "new\0" printf "dir1/new\0" printf "dir2/new\0" + printf "dir3/new\0" EOF } @@ -71,6 +75,8 @@ test_expect_success 'setup' ' mkdir dir2 && : >dir2/tracked && : >dir2/modified && + mkdir dir3 && + : >dir3/tracked && git -c core.fsmonitor= add . && git -c core.fsmonitor= commit -m initial && git config core.fsmonitor .git/hooks/fsmonitor-test && @@ -107,6 +113,7 @@ h dir1/modified H dir1/tracked h dir2/modified H dir2/tracked +H dir3/tracked h modified H tracked EOF @@ -126,6 +133,7 @@ H dir1/modified H dir1/tracked H dir2/modified H dir2/tracked +H dir3/tracked H modified H tracked EOF @@ -144,6 +152,7 @@ H dir1/modified H dir1/tracked H dir2/modified H dir2/tracked +H dir3/tracked H modified H tracked EOF @@ -164,6 +173,8 @@ H dir1/tracked H dir2/modified h dir2/new H dir2/tracked +h dir3/new +H dir3/tracked H modified h new H tracked @@ -174,6 +185,7 @@ test_expect_success 'newly added files are marked valid' ' git add new && git add dir1/new && git add dir2/new && + git add dir3/new && git ls-files -f >actual && test_cmp expect actual ' @@ -185,6 +197,8 @@ h dir1/tracked H dir2/modified h dir2/new h dir2/tracked +h dir3/new +h dir3/tracked H modified h new h tracked @@ -203,6 +217,7 @@ H dir1/modified h dir1/tracked h dir2/modified h dir2/tracked +h dir3/tracked h modified h tracked EOF @@ -269,6 +284,7 @@ do git add new && git add dir1/new && git add dir2/new && + git add dir3/new && git status >actual && git -c core.fsmonitor= status >expect && test_i18ncmp expect actual -Original Message- From: David Turner Sent: Friday, September 15, 2017 6:00 PM To: 'Ben Peart' Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net Subject: RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension -Original Message- +dirty_repo () { + : >untracked && + : >dir1/untracked && + : >dir2/untracked && + echo 1 >modified && + echo 2 >dir1/modified && + echo 3 >dir2/modified && + echo 4 >new && + echo 5 >dir1/new && + echo 6 >dir2/new If I add an untracked file named dir3/untracked to dirty_repo (and write_integration_script), then "status doesn't detect unreported modifications", below, fails. Did I do something wrong, or does this turn up a bug?
Re: [PATCH] doc: camelCase the config variables to improve readability
Hi, Kaartic Sivaraam wrote: > The config variable used weren't readable as they were in the > crude form of how git stores/uses it's config variables. nit: Git's commit messages describe the current behavior of Git in the present tense. Something like: These manpages' references to config variables are hard to read because they use all-lowercase names, without indicating where each word ends and begins in the configuration variable names. Improve readability by using camelCase instead. Git treats these names case-insensitively so this does not affect functionality. > Improve it's readability by replacing them with camelCased versions > of config variables as it doesn't have any impact on it's usage. nit: s/it's/its/ (in both places) > Signed-off-by: Kaartic Sivaraam > --- > Documentation/git-branch.txt | 4 ++-- > Documentation/git-tag.txt| 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) This also is just more consistent with the rest of the docs. FWIW, with or without the commit message tweaks mentioned above, Reviewed-by: Jonathan Nieder Thanks for your attention to detail.
Re: [PATCH v2 00/21] Read `packed-refs` using mmap()
Hi Michael, On Tue, 19 Sep 2017, Michael Haggerty wrote: > This is v2 of a patch series that changes the reading and caching of the > `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and > Johannes for their comments about v1 [1]. Thank you for the new iteration. > The main change since v1 is to accommodate Windows, which doesn't let > you replace a file using `rename()` if the file is currently mmapped. > This is unfortunate, because it means that Windows will never get the > O(N) → O(lg N) improvement for reading single references that more > capable systems can now enjoy. Triggered by your enquiry, I looked into passing the FILE_SHARE_DELETE flag which I hoped would let us delete the file even if it still is open (and mapped). In my tests, this did not work. If anybody wants to have a look at what I did (and whether they can make it work): https://github.com/dscho/git/tree/replace-wopen > The background was discussed on the mailing list [2]. The bottom line > is that on Windows, keeping the `packed-refs` lock mmapped would be > tantamount to holding reader lock on that file, preventing anybody > (even unrelated processes) from changing the `packed-refs` file while > it is mmapped. This is even worse than the situation for packfiles > (which is solved using `close_all_packs()`), because a packfile, once > created, never needs to be replaced—every packfile has a filename that > is determined from its contents. The worst that can happen if a > packfile is locked is that another process cannot remove it, but that > is not critical for correctness. The `packed-refs` file, on the other > hand, always has the same filename and needs to be overwritten for > correctness. > > So the approach taken here is that a new compile-time option, > `MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then > the `packed-refs` file is read quickly into memory then closed. Another approach would be to imitate close_all_packs() and rely on the Windows-specific code that retries renames in a staggered fashion, waiting a little longer and longer before retrying, and finally telling the user that some file cannot be overwritten: https://github.com/git-for-windows/git/blob/v2.14.1.windows.1/compat/mingw.c#L2439-L2441 This is not a new problem, by the way. If a file is in use while you try to run `git checkout` with a different version of that file, we have the exact same problem on Windows. And we deal with it using that retry_ask_yes_no() function. For this to work, the current process really would need to be able to release all snapshots in one go (for simplicity, I would not even check the filename but simply blow them all away when we want to overwrite packed-refs). I guess I should set aside some time to implement that on top of your series (I *really* want our in-house users to benefit from that O(lg n) improvement). In the meantime, I think this can go forward with the current design. Ciao, Dscho
RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit
> -Original Message- > From: Ben Peart [mailto:benpe...@microsoft.com] > Sent: Tuesday, September 19, 2017 3:28 PM > To: benpe...@microsoft.com > Cc: David Turner ; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com; > johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to display the > fsmonitor valid bit > > Add a new command line option (-f) to ls-files to have it use lowercase > letters for 'fsmonitor valid' files > > Signed-off-by: Ben Peart > --- > builtin/ls-files.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) This is still missing the corresponding documentation patch. I can see from replies that at least some of my messages got through. In total, I sent messages about: 04/12 (I see replies) 05/12 (I see replies) 06/12 (no reply, issue not fixed) 10/12 (no reply, haven't checked whether same issue but I assume same issue since the new case I mentioned isn't added) 12/12 (no reply, typo fixed -- no reply required)
[PATCH v2 2/2] read_info_alternates: warn on non-trivial errors
When we fail to open $GIT_DIR/info/alternates, we silently assume there are no alternates. This is the right thing to do for ENOENT, but not for other errors. A hard error is probably overkill here. If we fail to read an alternates file then either we'll complete our operation anyway, or we'll fail to find some needed object. Either way, a warning is good idea. And we already have a helper function to handle this pattern; let's just call warn_on_fopen_error(). Note that technically the errno from strbuf_read_file() might be from a read() error, not open(). But since read() would never return ENOENT or ENOTDIR, and since it produces a generic "unable to access" error, it's suitable for handling errors from either. Signed-off-by: Jeff King --- sha1_file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sha1_file.c b/sha1_file.c index b682b7ec06..1477ea7b50 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -432,6 +432,7 @@ static void read_info_alternates(const char * relative_base, int depth) path = xstrfmt("%s/info/alternates", relative_base); if (strbuf_read_file(&buf, path, 1024) < 0) { + warn_on_fopen_errors(path); free(path); return; } -- 2.14.1.1014.g252e627ae0
[PATCH v2 1/2] read_info_alternates: read contents into strbuf
This patch fixes a regression in v2.11.1 where we might read past the end of an mmap'd buffer. It was introduced in cf3c635210. The link_alt_odb_entries() function has always taken a ptr/len pair as input. Until cf3c635210 (alternates: accept double-quoted paths, 2016-12-12), we made a copy of those bytes in a string. But after that commit, we switched to parsing the input left-to-right, and we ignore "len" totally, instead reading until we hit a NUL. This has mostly gone unnoticed for a few reasons: 1. All but one caller passes a NUL-terminated string, with "len" pointing to the NUL. 2. The remaining caller, read_info_alternates(), passes in an mmap'd file. Unless the file is an exact multiple of the page size, it will generally be followed by NUL padding to the end of the page, which just works. The easiest way to demonstrate the problem is to build with: make SANITIZE=address NO_MMAP=Nope test Any test which involves $GIT_DIR/info/alternates will fail, as the mmap emulation (correctly) does not add an extra NUL, and ASAN complains about reading past the end of the buffer. One solution would be to teach link_alt_odb_entries() to respect "len". But it's actually a bit tricky, since we depend on unquote_c_style() under the hood, and it has no ptr/len variant. We could also just make a NUL-terminated copy of the input bytes and operate on that. But since all but one caller already is passing a string, instead let's just fix that caller to provide NUL-terminated input in the first place, by swapping out mmap for strbuf_read_file(). There's no advantage to using mmap on the alternates file. It's not expected to be large (and anyway, we're copying its contents into an in-memory linked list). Nor is using git_open() buying us anything here, since we don't keep the descriptor open for a long period of time. Let's also drop the "len" parameter entirely from link_alt_odb_entries(), since it's completely ignored. That will avoid any new callers re-introducing a similar bug. Reported-by: Michael Haggerty Signed-off-by: Jeff King --- sha1_file.c | 30 ++ 1 file changed, 10 insertions(+), 20 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index b4a67bb838..b682b7ec06 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string, return end; } -static void link_alt_odb_entries(const char *alt, int len, int sep, +static void link_alt_odb_entries(const char *alt, int sep, const char *relative_base, int depth) { struct strbuf objdirbuf = STRBUF_INIT; @@ -427,28 +427,18 @@ static void link_alt_odb_entries(const char *alt, int len, int sep, static void read_info_alternates(const char * relative_base, int depth) { - char *map; - size_t mapsz; - struct stat st; char *path; - int fd; + struct strbuf buf = STRBUF_INIT; path = xstrfmt("%s/info/alternates", relative_base); - fd = git_open(path); - free(path); - if (fd < 0) - return; - if (fstat(fd, &st) || (st.st_size == 0)) { - close(fd); + if (strbuf_read_file(&buf, path, 1024) < 0) { + free(path); return; } - mapsz = xsize_t(st.st_size); - map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0); - close(fd); - - link_alt_odb_entries(map, mapsz, '\n', relative_base, depth); - munmap(map, mapsz); + link_alt_odb_entries(buf.buf, '\n', relative_base, depth); + strbuf_release(&buf); + free(path); } struct alternate_object_database *alloc_alt_odb(const char *dir) @@ -503,7 +493,7 @@ void add_to_alternates_file(const char *reference) if (commit_lock_file(lock)) die_errno("unable to move new alternates file into place"); if (alt_odb_tail) - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + link_alt_odb_entries(reference, '\n', NULL, 0); } free(alts); } @@ -516,7 +506,7 @@ void add_to_alternates_memory(const char *reference) */ prepare_alt_odb(); - link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0); + link_alt_odb_entries(reference, '\n', NULL, 0); } /* @@ -619,7 +609,7 @@ void prepare_alt_odb(void) if (!alt) alt = ""; alt_odb_tail = &alt_odb_list; - link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0); + link_alt_odb_entries(alt, PATH_SEP, NULL, 0); read_info_alternates(get_object_directory(), 0); } -- 2.14.1.1014.g252e627ae0
[PATCH v2 0/2] fix read past end of array in alternates files
On Mon, Sep 18, 2017 at 11:51:00AM -0400, Jeff King wrote: > This series fixes a regression in v2.11.1 where we might read past the > end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't > base the patch on there, for a few reasons: Here's a v2 that fixes a minor leak in the first patch (I carefully remembered to free() the path buffer on the error path, but forgot to do it in the normal one. Oops). I also tried to address Jonathan's "should this be in the commit message" comment. The information above _is_ in there, but maybe putting it at the top as a sort of tl;dr makes it easier to find? The second patch is unchanged. Junio, I see you ended up back-porting it to v2.11. Would you prefer me to have done it that way in the first place? I was trying to reduce your work by basing it on "maint" (figuring that we wouldn't bother making a v2.11.x release anyway, and that skips you having to apply the second patch separately after the merge). [1/2]: read_info_alternates: read contents into strbuf [2/2]: read_info_alternates: warn on non-trivial errors sha1_file.c | 31 +++ 1 file changed, 11 insertions(+), 20 deletions(-) -Peff
RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension
I think my comment here might have gotten lost, and I don't want it to because it's something I'm really worried about: > -Original Message- > From: David Turner > Sent: Friday, September 15, 2017 6:00 PM > To: 'Ben Peart' > Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org; > gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com; > p...@peff.net > Subject: RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor > extension > > > -Original Message- > > +dirty_repo () { > > + : >untracked && > > + : >dir1/untracked && > > + : >dir2/untracked && > > + echo 1 >modified && > > + echo 2 >dir1/modified && > > + echo 3 >dir2/modified && > > + echo 4 >new && > > + echo 5 >dir1/new && > > + echo 6 >dir2/new > > If I add an untracked file named dir3/untracked to dirty_repo (and > write_integration_script), then "status doesn't detect unreported > modifications", below, fails. Did I do something wrong, or does this turn up > a > bug?
[PATCH v7 08/12] fsmonitor: add a test tool to dump the index extension
Add a test utility (test-dump-fsmonitor) that will dump the fsmonitor index extension. Signed-off-by: Ben Peart --- Makefile | 1 + t/helper/test-dump-fsmonitor.c | 21 + 2 files changed, 22 insertions(+) create mode 100644 t/helper/test-dump-fsmonitor.c diff --git a/Makefile b/Makefile index 9d6ec9c1e9..d970cd00e9 100644 --- a/Makefile +++ b/Makefile @@ -639,6 +639,7 @@ TEST_PROGRAMS_NEED_X += test-config TEST_PROGRAMS_NEED_X += test-date TEST_PROGRAMS_NEED_X += test-delta TEST_PROGRAMS_NEED_X += test-dump-cache-tree +TEST_PROGRAMS_NEED_X += test-dump-fsmonitor TEST_PROGRAMS_NEED_X += test-dump-split-index TEST_PROGRAMS_NEED_X += test-dump-untracked-cache TEST_PROGRAMS_NEED_X += test-fake-ssh diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c new file mode 100644 index 00..ad452707e8 --- /dev/null +++ b/t/helper/test-dump-fsmonitor.c @@ -0,0 +1,21 @@ +#include "cache.h" + +int cmd_main(int ac, const char **av) +{ + struct index_state *istate = &the_index; + int i; + + setup_git_directory(); + if (do_read_index(istate, get_index_file(), 0) < 0) + die("unable to read index file"); + if (!istate->fsmonitor_last_update) { + printf("no fsmonitor\n"); + return 0; + } + printf("fsmonitor last update %"PRIuMAX"\n", (uintmax_t)istate->fsmonitor_last_update); + + for (i = 0; i < istate->cache_nr; i++) + printf((istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) ? "+" : "-"); + + return 0; +} -- 2.14.1.windows.1
[PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the fsmonitor index extension is used to flag the last known potentially dirty index entries. The registered core.fsmonitor command is called with the time the index was last updated and returns the list of files changed since that time. This list is used to flag any additional dirty cache entries and untracked cache directories. We can then use this valid state to speed up preload_index(), ie_match_stat(), and refresh_cache_ent() as they do not need to lstat() files to detect potential changes for those entries marked CE_FSMONITOR_VALID. In addition, if the untracked cache is turned on valid_cached_dir() can skip checking directories for new or changed files as fsmonitor will invalidate the cache only for those directories that have been identified as having potential changes. To keep the CE_FSMONITOR_VALID state accurate during git operations; when git updates a cache entry to match the current state on disk, it will now set the CE_FSMONITOR_VALID bit. Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit is cleared and the corresponding untracked cache directory is marked invalid. Signed-off-by: Ben Peart --- Makefile | 1 + apply.c| 2 +- builtin/update-index.c | 2 + cache.h| 10 +- config.c | 14 +++ config.h | 1 + diff-lib.c | 2 + dir.c | 27 -- dir.h | 2 + entry.c| 4 +- environment.c | 1 + fsmonitor.c| 253 + fsmonitor.h| 61 preload-index.c| 6 +- read-cache.c | 49 -- submodule.c| 2 +- unpack-trees.c | 8 +- 17 files changed, 419 insertions(+), 26 deletions(-) create mode 100644 fsmonitor.c create mode 100644 fsmonitor.h diff --git a/Makefile b/Makefile index f2bb7f2f63..9d6ec9c1e9 100644 --- a/Makefile +++ b/Makefile @@ -786,6 +786,7 @@ LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o LIB_OBJS += fetch-pack.o LIB_OBJS += fsck.o +LIB_OBJS += fsmonitor.o LIB_OBJS += gettext.o LIB_OBJS += gpg-interface.o LIB_OBJS += graph.o diff --git a/apply.c b/apply.c index 71cbbd141c..9061cc5f15 100644 --- a/apply.c +++ b/apply.c @@ -3399,7 +3399,7 @@ static int verify_index_match(const struct cache_entry *ce, struct stat *st) return -1; return 0; } - return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE); + return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR); } #define SUBMODULE_PATCH_WITHOUT_INDEX 1 diff --git a/builtin/update-index.c b/builtin/update-index.c index e1ca0759d5..6f39ee9274 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -16,6 +16,7 @@ #include "pathspec.h" #include "dir.h" #include "split-index.h" +#include "fsmonitor.h" /* * Default to not allowing changes to the list of files. The @@ -233,6 +234,7 @@ static int mark_ce_flags(const char *path, int flag, int mark) else active_cache[pos]->ce_flags &= ~flag; active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; + mark_fsmonitor_invalid(&the_index, active_cache[pos]); cache_tree_invalidate_path(&the_index, path); active_cache_changed |= CE_ENTRY_CHANGED; return 0; diff --git a/cache.h b/cache.h index a916bc79e3..eccab968bd 100644 --- a/cache.h +++ b/cache.h @@ -203,6 +203,7 @@ struct cache_entry { #define CE_ADDED (1 << 19) #define CE_HASHED(1 << 20) +#define CE_FSMONITOR_VALID (1 << 21) #define CE_WT_REMOVE (1 << 22) /* remove in work directory */ #define CE_CONFLICTED(1 << 23) @@ -326,6 +327,7 @@ static inline unsigned int canon_mode(unsigned int mode) #define CACHE_TREE_CHANGED (1 << 5) #define SPLIT_INDEX_ORDERED(1 << 6) #define UNTRACKED_CHANGED (1 << 7) +#define FSMONITOR_CHANGED (1 << 8) struct split_index; struct untracked_cache; @@ -344,6 +346,7 @@ struct index_state { struct hashmap dir_hash; unsigned char sha1[20]; struct untracked_cache *untracked; + uint64_t fsmonitor_last_update; }; extern struct index_state the_index; @@ -679,8 +682,10 @@ extern void *read_blob_data_from_index(const struct index_state *, const char *, #define CE_MATCH_IGNORE_MISSING0x08 /* enable stat refresh */ #define CE_MATCH_REFRESH 0x10 -extern int ie_match_stat(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int); -extern int ie_modified(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int); +/* do stat comparison even if CE_FSMONITOR_VALID is true */ +#define CE_MATCH_IGNORE_FSMONITOR 0X20 +
[PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit
Add a new command line option (-f) to ls-files to have it use lowercase letters for 'fsmonitor valid' files Signed-off-by: Ben Peart --- builtin/ls-files.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index e1339e6d17..313962a0c1 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -31,6 +31,7 @@ static int show_resolve_undo; static int show_modified; static int show_killed; static int show_valid_bit; +static int show_fsmonitor_bit; static int line_terminator = '\n'; static int debug_mode; static int show_eol; @@ -86,7 +87,8 @@ static const char *get_tag(const struct cache_entry *ce, const char *tag) { static char alttag[4]; - if (tag && *tag && show_valid_bit && (ce->ce_flags & CE_VALID)) { + if (tag && *tag && ((show_valid_bit && (ce->ce_flags & CE_VALID)) || + (show_fsmonitor_bit && (ce->ce_flags & CE_FSMONITOR_VALID { memcpy(alttag, tag, 3); if (isalpha(tag[0])) { @@ -515,6 +517,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) N_("identify the file status with tags")), OPT_BOOL('v', NULL, &show_valid_bit, N_("use lowercase letters for 'assume unchanged' files")), + OPT_BOOL('f', NULL, &show_fsmonitor_bit, + N_("use lowercase letters for 'fsmonitor clean' files")), OPT_BOOL('c', "cached", &show_cached, N_("show cached files in the output (default)")), OPT_BOOL('d', "deleted", &show_deleted, @@ -584,7 +588,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) for (i = 0; i < exclude_list.nr; i++) { add_exclude(exclude_list.items[i].string, "", 0, el, --exclude_args); } - if (show_tag || show_valid_bit) { + if (show_tag || show_valid_bit || show_fsmonitor_bit) { tag_cached = "H "; tag_unmerged = "M "; tag_removed = "R "; -- 2.14.1.windows.1
[PATCH v7 10/12] fsmonitor: add test cases for fsmonitor extension
Test the ability to add/remove the fsmonitor index extension via update-index. Test that dirty files returned from the integration script are properly represented in the index extension and verify that ls-files correctly reports their state. Test that ensure status results are correct when using the new fsmonitor extension. Test untracked, modified, and new files by ensuring the results are identical to when not using the extension. Test that if the fsmonitor extension doesn't tell git about a change, it doesn't discover it on its own. This ensures git is honoring the extension and that we get the performance benefits desired. Three test integration scripts are provided: fsmonitor-all - marks all files as dirty fsmonitor-none - marks no files as dirty fsmonitor-watchman - integrates with Watchman with debug logging To run tests in the test suite while utilizing fsmonitor: First copy t/t7519/fsmonitor-all to a location in your path and then set GIT_FORCE_PRELOAD_TEST=true and GIT_FSMONITOR_TEST=fsmonitor-all and run your tests. Note: currently when using the test script fsmonitor-watchman on Windows, many tests fail due to a reported but not yet fixed bug in Watchman where it holds on to handles for directories and files which prevents the test directory from being cleaned up properly. Signed-off-by: Ben Peart --- t/t7519-status-fsmonitor.sh | 304 t/t7519/fsmonitor-all | 24 t/t7519/fsmonitor-none | 22 t/t7519/fsmonitor-watchman | 140 4 files changed, 490 insertions(+) create mode 100755 t/t7519-status-fsmonitor.sh create mode 100755 t/t7519/fsmonitor-all create mode 100755 t/t7519/fsmonitor-none create mode 100755 t/t7519/fsmonitor-watchman diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh new file mode 100755 index 00..c6df85af5e --- /dev/null +++ b/t/t7519-status-fsmonitor.sh @@ -0,0 +1,304 @@ +#!/bin/sh + +test_description='git status with file system watcher' + +. ./test-lib.sh + +# +# To run the entire git test suite using fsmonitor: +# +# copy t/t7519/fsmonitor-all to a location in your path and then set +# GIT_FSMONITOR_TEST=fsmonitor-all and run your tests. +# + +# Note, after "git reset --hard HEAD" no extensions exist other than 'TREE' +# "git update-index --fsmonitor" can be used to get the extension written +# before testing the results. + +clean_repo () { + git reset --hard HEAD && + git clean -fd +} + +dirty_repo () { + : >untracked && + : >dir1/untracked && + : >dir2/untracked && + echo 1 >modified && + echo 2 >dir1/modified && + echo 3 >dir2/modified && + echo 4 >new && + echo 5 >dir1/new && + echo 6 >dir2/new +} + +write_integration_script () { + write_script .git/hooks/fsmonitor-test<<-\EOF + if test "$#" -ne 2 + then + echo "$0: exactly 2 arguments expected" + exit 2 + fi + if test "$1" != 1 + then + echo "Unsupported core.fsmonitor hook version." >&2 + exit 1 + fi + printf "untracked\0" + printf "dir1/untracked\0" + printf "dir2/untracked\0" + printf "modified\0" + printf "dir1/modified\0" + printf "dir2/modified\0" + printf "new\0" + printf "dir1/new\0" + printf "dir2/new\0" + EOF +} + +test_lazy_prereq UNTRACKED_CACHE ' + { git update-index --test-untracked-cache; ret=$?; } && + test $ret -ne 1 +' + +test_expect_success 'setup' ' + mkdir -p .git/hooks && + : >tracked && + : >modified && + mkdir dir1 && + : >dir1/tracked && + : >dir1/modified && + mkdir dir2 && + : >dir2/tracked && + : >dir2/modified && + git -c core.fsmonitor= add . && + git -c core.fsmonitor= commit -m initial && + git config core.fsmonitor .git/hooks/fsmonitor-test && + cat >.gitignore <<-\EOF + .gitignore + expect* + actual* + marker* + EOF +' + +# test that the fsmonitor extension is off by default +test_expect_success 'fsmonitor extension is off by default' ' + test-dump-fsmonitor >actual && + grep "^no fsmonitor" actual +' + +# test that "update-index --fsmonitor" adds the fsmonitor extension +test_expect_success 'update-index --fsmonitor" adds the fsmonitor extension' ' + git update-index --fsmonitor && + test-dump-fsmonitor >actual && + grep "^fsmonitor last update" actual +' + +# test that "update-index --no-fsmonitor" removes the fsmonitor extension +test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor extension' ' + git update-index --no-fsmonitor && + test-dump-fsmonitor >actual && + grep "^no fsmonitor" actual +' + +cat >expectexpect expect
[PATCH v7 09/12] split-index: disable the fsmonitor extension when running the split index test
The split index test t1700-split-index.sh has hard coded SHA values for the index. Currently it supports index V4 and V3 but assumes there are no index extensions loaded. When manually forcing the fsmonitor extension to be turned on when running the test suite, the SHA values no longer match which causes the test to fail. The potential matrix of index extensions and index versions can is quite large so instead disable the extension before attempting to run the test. Signed-off-by: Ben Peart --- t/t1700-split-index.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 22f69a410b..af9b847761 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -6,6 +6,7 @@ test_description='split index mode tests' # We need total control of index splitting here sane_unset GIT_TEST_SPLIT_INDEX +sane_unset GIT_FSMONITOR_TEST test_expect_success 'enable split index' ' git config splitIndex.maxPercentChange 100 && -- 2.14.1.windows.1
[PATCH v7 01/12] bswap: add 64 bit endianness helper get_be64
Add a new get_be64 macro to enable 64 bit endian conversions on memory that may or may not be aligned. Signed-off-by: Ben Peart --- compat/bswap.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/compat/bswap.h b/compat/bswap.h index 7d063e9e40..6b22c46214 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -158,7 +158,9 @@ static inline uint64_t git_bswap64(uint64_t x) #define get_be16(p)ntohs(*(unsigned short *)(p)) #define get_be32(p)ntohl(*(unsigned int *)(p)) +#define get_be64(p)ntohll(*(uint64_t *)(p)) #define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0) +#define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0) #else @@ -178,6 +180,13 @@ static inline uint32_t get_be32(const void *ptr) (uint32_t)p[3] << 0; } +static inline uint64_t get_be64(const void *ptr) +{ + const unsigned char *p = ptr; + return (uint64_t)get_be32(p[0]) << 32 | + (uint64_t)get_be32(p[4]) << 0; +} + static inline void put_be32(void *ptr, uint32_t value) { unsigned char *p = ptr; @@ -187,4 +196,17 @@ static inline void put_be32(void *ptr, uint32_t value) p[3] = value >> 0; } +static inline void put_be64(void *ptr, uint64_t value) +{ + unsigned char *p = ptr; + p[0] = value >> 56; + p[1] = value >> 48; + p[2] = value >> 40; + p[3] = value >> 32; + p[4] = value >> 24; + p[5] = value >> 16; + p[6] = value >> 8; + p[7] = value >> 0; +} + #endif -- 2.14.1.windows.1
[PATCH v7 05/12] fsmonitor: add documentation for the fsmonitor extension.
This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension. Signed-off-by: Ben Peart --- Documentation/config.txt | 7 + Documentation/git-ls-files.txt | 7 - Documentation/git-update-index.txt | 45 Documentation/githooks.txt | 28 Documentation/technical/index-format.txt | 19 ++ 5 files changed, 105 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index dc4e3f58a2..db52645cb4 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -413,6 +413,13 @@ core.protectNTFS:: 8.3 "short" names. Defaults to `true` on Windows, and `false` elsewhere. +core.fsmonitor:: + If set, the value of this variable is used as a command which + will identify all files that may have changed since the + requested date/time. This information is used to speed up git by + avoiding unnecessary processing of files that have not changed. + See the "fsmonitor-watchman" section of linkgit:githooks[5]. + core.trustctime:: If false, the ctime differences between the index and the working tree are ignored; useful when the inode change time diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index d153c17e06..3ac3e3a77d 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -9,7 +9,7 @@ git-ls-files - Show information about files in the index and the working tree SYNOPSIS [verse] -'git ls-files' [-z] [-t] [-v] +'git ls-files' [-z] [-t] [-v] [-f] (--[cached|deleted|others|ignored|stage|unmerged|killed|modified])* (-[c|d|o|i|s|u|k|m])* [--eol] @@ -133,6 +133,11 @@ a space) at the start of each line: that are marked as 'assume unchanged' (see linkgit:git-update-index[1]). +-f:: + Similar to `-t`, but use lowercase letters for files + that are marked as 'fsmonitor valid' (see + linkgit:git-update-index[1]). + --full-name:: When run from a subdirectory, the command usually outputs paths relative to the current directory. This diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index e19eba62cd..95231dbfcb 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -16,9 +16,11 @@ SYNOPSIS [--chmod=(+|-)x] [--[no-]assume-unchanged] [--[no-]skip-worktree] +[--[no-]fsmonitor-valid] [--ignore-submodules] [--[no-]split-index] [--[no-|test-|force-]untracked-cache] +[--[no-]fsmonitor] [--really-refresh] [--unresolve] [--again | -g] [--info-only] [--index-info] [-z] [--stdin] [--index-version ] @@ -111,6 +113,12 @@ you will need to handle the situation manually. set and unset the "skip-worktree" bit for the paths. See section "Skip-worktree bit" below for more information. +--[no-]fsmonitor-valid:: + When one of these flags is specified, the object name recorded + for the paths are not updated. Instead, these options + set and unset the "fsmonitor valid" bit for the paths. See + section "File System Monitor" below for more information. + -g:: --again:: Runs 'git update-index' itself on the paths whose index @@ -201,6 +209,15 @@ will remove the intended effect of the option. `--untracked-cache` used to imply `--test-untracked-cache` but this option would enable the extension unconditionally. +--fsmonitor:: +--no-fsmonitor:: + Enable or disable files system monitor feature. These options + take effect whatever the value of the `core.fsmonitor` + configuration variable (see linkgit:git-config[1]). But a warning + is emitted when the change goes against the configured value, as + the configured value will take effect next time the index is + read and this will remove the intended effect of the option. + \--:: Do not interpret any more arguments as options. @@ -447,6 +464,34 @@ command reads the index; while when `--[no-|force-]untracked-cache` are used, the untracked cache is immediately added to or removed from the index. +File System Monitor +--- + +This feature is intended to speed up git operations for repos that have +large working directories. + +It enables git to work together with a file system monitor (see the +"fsmonitor-watchman" section of linkgit:githooks[5]) that can +inform it as to what files have been modified. This enables git to avoid +having to lstat() every file to find modified files. + +When used in conjunction with the untracked cache, it can further improve +performance by avoiding the cost of scaning the entire working directory +looking
[PATCH v7 11/12] fsmonitor: add a sample integration script for Watchman
This script integrates the new fsmonitor capabilities of git with the cross platform Watchman file watching service. To use the script: Download and install Watchman from https://facebook.github.io/watchman/. Rename the sample integration hook from fsmonitor-watchman.sample to fsmonitor-watchman. Configure git to use the extension: git config core.fsmonitor .git/hooks/fsmonitor-watchman Optionally turn on the untracked cache for optimal performance. Signed-off-by: Ben Peart Signed-off-by: Johannes Schindelin Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Christian Couder --- templates/hooks--fsmonitor-watchman.sample | 122 + 1 file changed, 122 insertions(+) create mode 100755 templates/hooks--fsmonitor-watchman.sample diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample new file mode 100755 index 00..870a59d237 --- /dev/null +++ b/templates/hooks--fsmonitor-watchman.sample @@ -0,0 +1,122 @@ +#!/usr/bin/perl + +use strict; +use warnings; +use IPC::Open2; + +# An example hook script to integrate Watchman +# (https://facebook.github.io/watchman/) with git to speed up detecting +# new and modified files. +# +# The hook is passed a version (currently 1) and a time in nanoseconds +# formatted as a string and outputs to stdout all files that have been +# modified since the given time. Paths must be relative to the root of +# the working tree and separated by a single NUL. +# +# To enable this hook, rename this file to "query-watchman" and set +# 'git config core.fsmonitor .git/hooks/query-watchman' +# +my ($version, $time) = @ARGV; + +# Check the hook interface version + +if ($version == 1) { + # convert nanoseconds to seconds + $time = int $time / 10; +} else { + die "Unsupported query-fsmonitor hook version '$version'.\n" . + "Falling back to scanning...\n"; +} + +# Convert unix style paths to escaped Windows style paths when running +# in Windows command prompt + +my $system = `uname -s`; +$system =~ s/[\r\n]+//g; +my $git_work_tree; + +if ($system =~ m/^MSYS_NT/) { + $git_work_tree = `cygpath -aw "\$PWD"`; + $git_work_tree =~ s/[\r\n]+//g; + $git_work_tree =~ s,\\,/,g; +} else { + $git_work_tree = $ENV{'PWD'}; +} + +my $retry = 1; + +launch_watchman(); + +sub launch_watchman { + + # Set input record separator + local $/ = 0666; + + my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j') + or die "open2() failed: $!\n" . + "Falling back to scanning...\n"; + + # In the query expression below we're asking for names of files that + # changed since $time but were not transient (ie created after + # $time but no longer exist). + # + # To accomplish this, we're using the "since" generator to use the + # recency index to select candidate nodes and "fields" to limit the + # output to file names only. Then we're using the "expression" term to + # further constrain the results. + # + # The category of transient files that we want to ignore will have a + # creation clock (cclock) newer than $time_t value and will also not + # currently exist. + + my $query = <<" END"; + ["query", "$git_work_tree", { + "since": $time, + "fields": ["name"], + "expression": ["not", ["allof", ["since", $time, "cclock"], ["not", "exists"]]] + }] + END + + print CHLD_IN $query; + my $response = ; + + die "Watchman: command returned no output.\n" . + "Falling back to scanning...\n" if $response eq ""; + die "Watchman: command returned invalid output: $response\n" . + "Falling back to scanning...\n" unless $response =~ /^\{/; + + my $json_pkg; + eval { + require JSON::XS; + $json_pkg = "JSON::XS"; + 1; + } or do { + require JSON::PP; + $json_pkg = "JSON::PP"; + }; + + my $o = $json_pkg->new->utf8->decode($response); + + if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve root .* directory (.*) is not watched/) { + print STDERR "Adding '$git_work_tree' to watchman's watch list.\n"; + $retry--; + qx/watchman watch "$git_work_tree"/; + die "Failed to make watchman watch '$git_work_tree'.\n" . + "Falling back to scanning...\n" if $? != 0; + + # Watchman will always return all files on the first query so + # return the fast "everything is dirty" flag to git and do the + # Watchman query just to get it over with now so we won't pay + # the cost in git to look up each individual file. + print "/\0"; + eval { launch_watchman() }; + exit 0; +
[PATCH v7 02/12] preload-index: add override to enable testing preload-index
Preload index doesn't run unless it has a minimum number of 1000 files. To enable running tests with fewer files, add an environment variable (GIT_FORCE_PRELOAD_TEST) which will override that minimum and set it to 2. Signed-off-by: Ben Peart --- preload-index.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/preload-index.c b/preload-index.c index 70a4c80878..75564c497a 100644 --- a/preload-index.c +++ b/preload-index.c @@ -79,6 +79,8 @@ static void preload_index(struct index_state *index, return; threads = index->cache_nr / THREAD_COST; + if ((index->cache_nr > 1) && (threads < 2) && getenv("GIT_FORCE_PRELOAD_TEST")) + threads = 2; if (threads < 2) return; if (threads > MAX_PARALLEL) -- 2.14.1.windows.1
[PATCH v7 03/12] update-index: add a new --force-write-index option
At times, it makes sense to avoid the cost of writing out the index when the only changes can easily be recomputed on demand. This causes problems when trying to write test cases to verify that state as they can't guarantee the state has been persisted to disk. Add a new option (--force-write-index) to update-index that will ensure the index is written out even if the cache_changed flag is not set. Signed-off-by: Ben Peart --- builtin/update-index.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index d562f2ec69..e1ca0759d5 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -915,6 +915,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) struct refresh_params refresh_args = {0, &has_errors}; int lock_error = 0; int split_index = -1; + int force_write = 0; struct lock_file *lock_file; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; @@ -1006,6 +1007,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) N_("test if the filesystem supports untracked cache"), UC_TEST), OPT_SET_INT(0, "force-untracked-cache", &untracked_cache, N_("enable untracked cache without testing the filesystem"), UC_FORCE), + OPT_SET_INT(0, "force-write-index", &force_write, + N_("write out the index even if is not flagged as changed"), 1), OPT_END() }; @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) die("BUG: bad untracked_cache value: %d", untracked_cache); } - if (active_cache_changed) { + if (active_cache_changed || force_write) { if (newfd < 0) { if (refresh_args.flags & REFRESH_QUIET) exit(128); -- 2.14.1.windows.1
[PATCH v7 07/12] update-index: add fsmonitor support to update-index
Add support in update-index to manually add/remove the fsmonitor extension via --[no-]fsmonitor flags. Add support in update-index to manually set/clear the fsmonitor valid bit via --[no-]fsmonitor-valid flags. Signed-off-by: Ben Peart --- builtin/update-index.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 6f39ee9274..41618db098 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -33,6 +33,7 @@ static int force_remove; static int verbose; static int mark_valid_only; static int mark_skip_worktree_only; +static int mark_fsmonitor_only; #define MARK_FLAG 1 #define UNMARK_FLAG 2 static struct strbuf mtime_dir = STRBUF_INIT; @@ -229,12 +230,12 @@ static int mark_ce_flags(const char *path, int flag, int mark) int namelen = strlen(path); int pos = cache_name_pos(path, namelen); if (0 <= pos) { + mark_fsmonitor_invalid(&the_index, active_cache[pos]); if (mark) active_cache[pos]->ce_flags |= flag; else active_cache[pos]->ce_flags &= ~flag; active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; - mark_fsmonitor_invalid(&the_index, active_cache[pos]); cache_tree_invalidate_path(&the_index, path); active_cache_changed |= CE_ENTRY_CHANGED; return 0; @@ -460,6 +461,11 @@ static void update_one(const char *path) die("Unable to mark file %s", path); return; } + if (mark_fsmonitor_only) { + if (mark_ce_flags(path, CE_FSMONITOR_VALID, mark_fsmonitor_only == MARK_FLAG)) + die("Unable to mark file %s", path); + return; + } if (force_remove) { if (remove_file_from_cache(path)) @@ -918,6 +924,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) int lock_error = 0; int split_index = -1; int force_write = 0; + int fsmonitor = -1; struct lock_file *lock_file; struct parse_opt_ctx_t ctx; strbuf_getline_fn getline_fn; @@ -1011,6 +1018,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) N_("enable untracked cache without testing the filesystem"), UC_FORCE), OPT_SET_INT(0, "force-write-index", &force_write, N_("write out the index even if is not flagged as changed"), 1), + OPT_BOOL(0, "fsmonitor", &fsmonitor, + N_("enable or disable file system monitor")), + {OPTION_SET_INT, 0, "fsmonitor-valid", &mark_fsmonitor_only, NULL, + N_("mark files as fsmonitor valid"), + PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, MARK_FLAG}, + {OPTION_SET_INT, 0, "no-fsmonitor-valid", &mark_fsmonitor_only, NULL, + N_("clear fsmonitor valid bit"), + PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG}, OPT_END() }; @@ -1152,6 +1167,22 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) die("BUG: bad untracked_cache value: %d", untracked_cache); } + if (fsmonitor > 0) { + if (git_config_get_fsmonitor() == 0) + warning(_("core.fsmonitor is unset; " + "set it if you really want to " + "enable fsmonitor")); + add_fsmonitor(&the_index); + report(_("fsmonitor enabled")); + } else if (!fsmonitor) { + if (git_config_get_fsmonitor() == 1) + warning(_("core.fsmonitor is set; " + "remove it if you really want to " + "disable fsmonitor")); + remove_fsmonitor(&the_index); + report(_("fsmonitor disabled")); + } + if (active_cache_changed || force_write) { if (newfd < 0) { if (refresh_args.flags & REFRESH_QUIET) -- 2.14.1.windows.1
[PATCH v7 12/12] fsmonitor: add a performance test
Add a test utility (test-drop-caches) that flushes all changes to disk then drops file system cache on Windows, Linux, and OSX. Add a perf test (p7519-fsmonitor.sh) for fsmonitor. By default, the performance test will utilize the Watchman file system monitor if it is installed. If Watchman is not installed, it will use a dummy integration script that does not report any new or modified files. The dummy script has very little overhead which provides optimistic results. The performance test will also use the untracked cache feature if it is available as fsmonitor uses it to speed up scanning for untracked files. There are 3 environment variables that can be used to alter the default behavior of the performance test: GIT_PERF_7519_UNTRACKED_CACHE: used to configure core.untrackedCache GIT_PERF_7519_SPLIT_INDEX: used to configure core.splitIndex GIT_PERF_7519_FSMONITOR: used to configure core.fsMonitor The big win for using fsmonitor is the elimination of the need to scan the working directory looking for changed and untracked files. If the file information is all cached in RAM, the benefits are reduced. GIT_PERF_7519_DROP_CACHE: if set, the OS caches are dropped between tests Signed-off-by: Ben Peart Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile| 1 + t/helper/.gitignore | 1 + t/helper/test-drop-caches.c | 162 ++ t/perf/p7519-fsmonitor.sh | 184 4 files changed, 348 insertions(+) create mode 100644 t/helper/test-drop-caches.c create mode 100755 t/perf/p7519-fsmonitor.sh diff --git a/Makefile b/Makefile index d970cd00e9..b2653ee64f 100644 --- a/Makefile +++ b/Makefile @@ -638,6 +638,7 @@ TEST_PROGRAMS_NEED_X += test-ctype TEST_PROGRAMS_NEED_X += test-config TEST_PROGRAMS_NEED_X += test-date TEST_PROGRAMS_NEED_X += test-delta +TEST_PROGRAMS_NEED_X += test-drop-caches TEST_PROGRAMS_NEED_X += test-dump-cache-tree TEST_PROGRAMS_NEED_X += test-dump-fsmonitor TEST_PROGRAMS_NEED_X += test-dump-split-index diff --git a/t/helper/.gitignore b/t/helper/.gitignore index 721650256e..f9328eebdd 100644 --- a/t/helper/.gitignore +++ b/t/helper/.gitignore @@ -3,6 +3,7 @@ /test-config /test-date /test-delta +/test-drop-caches /test-dump-cache-tree /test-dump-split-index /test-dump-untracked-cache diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c new file mode 100644 index 00..4e5ca8f397 --- /dev/null +++ b/t/helper/test-drop-caches.c @@ -0,0 +1,162 @@ +#include "git-compat-util.h" + +#if defined(GIT_WINDOWS_NATIVE) + +static int cmd_sync(void) +{ + char Buffer[MAX_PATH]; + DWORD dwRet; + char szVolumeAccessPath[] = ".\\X:"; + HANDLE hVolWrite; + int success = 0; + + dwRet = GetCurrentDirectory(MAX_PATH, Buffer); + if ((0 == dwRet) || (dwRet > MAX_PATH)) + return error("Error getting current directory"); + + if ((Buffer[0] < 'A') || (Buffer[0] > 'Z')) + return error("Invalid drive letter '%c'", Buffer[0]); + + szVolumeAccessPath[4] = Buffer[0]; + hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL); + if (INVALID_HANDLE_VALUE == hVolWrite) + return error("Unable to open volume for writing, need admin access"); + + success = FlushFileBuffers(hVolWrite); + if (!success) + error("Unable to flush volume"); + + CloseHandle(hVolWrite); + + return !success; +} + +#define STATUS_SUCCESS (0xL) +#define STATUS_PRIVILEGE_NOT_HELD (0xC061L) + +typedef enum _SYSTEM_INFORMATION_CLASS { + SystemMemoryListInformation = 80, +} SYSTEM_INFORMATION_CLASS; + +typedef enum _SYSTEM_MEMORY_LIST_COMMAND { + MemoryCaptureAccessedBits, + MemoryCaptureAndResetAccessedBits, + MemoryEmptyWorkingSets, + MemoryFlushModifiedList, + MemoryPurgeStandbyList, + MemoryPurgeLowPriorityStandbyList, + MemoryCommandMax +} SYSTEM_MEMORY_LIST_COMMAND; + +static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags) +{ + BOOL bResult; + DWORD dwBufferLength; + LUID luid; + TOKEN_PRIVILEGES tpPreviousState; + TOKEN_PRIVILEGES tpNewState; + + dwBufferLength = 16; + bResult = LookupPrivilegeValueA(0, lpName, &luid); + if (bResult) { + tpNewState.PrivilegeCount = 1; + tpNewState.Privileges[0].Luid = luid; + tpNewState.Privileges[0].Attributes = 0; + bResult = AdjustTokenPrivileges(TokenHandle, 0, &tpNewState, + (DWORD)((LPBYTE)&(tpNewState.Privileges[1]) - (LPBYTE)&tpNewState), + &tpPreviousState, &dwBufferLength); + if (bResult) { + tpPreviousState.PrivilegeCount = 1; +
[PATCH v7 00/12] Fast git status via a file system watcher
Subject: Fast git status via a file system watcher Thanks to everyone who provided feedback. There are lots of minor style changes, documentation updates and a fixed leak. The only functional change is the addition of support to set/clear the fsmonitor valid bit via 'git update-index --[no-]fsmonitor-valid' with the associated documentation and tests. Interdiff between V6 and V7: diff --git a/Documentation/config.txt b/Documentation/config.txt index c196007a27..db52645cb4 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -418,6 +418,7 @@ core.fsmonitor:: will identify all files that may have changed since the requested date/time. This information is used to speed up git by avoiding unnecessary processing of files that have not changed. + See the "fsmonitor-watchman" section of linkgit:githooks[5]. core.trustctime:: If false, the ctime differences between the index and the diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index d153c17e06..3ac3e3a77d 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -9,7 +9,7 @@ git-ls-files - Show information about files in the index and the working tree SYNOPSIS [verse] -'git ls-files' [-z] [-t] [-v] +'git ls-files' [-z] [-t] [-v] [-f] (--[cached|deleted|others|ignored|stage|unmerged|killed|modified])* (-[c|d|o|i|s|u|k|m])* [--eol] @@ -133,6 +133,11 @@ a space) at the start of each line: that are marked as 'assume unchanged' (see linkgit:git-update-index[1]). +-f:: + Similar to `-t`, but use lowercase letters for files + that are marked as 'fsmonitor valid' (see + linkgit:git-update-index[1]). + --full-name:: When run from a subdirectory, the command usually outputs paths relative to the current directory. This diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index e19eba62cd..95231dbfcb 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -16,9 +16,11 @@ SYNOPSIS [--chmod=(+|-)x] [--[no-]assume-unchanged] [--[no-]skip-worktree] +[--[no-]fsmonitor-valid] [--ignore-submodules] [--[no-]split-index] [--[no-|test-|force-]untracked-cache] +[--[no-]fsmonitor] [--really-refresh] [--unresolve] [--again | -g] [--info-only] [--index-info] [-z] [--stdin] [--index-version ] @@ -111,6 +113,12 @@ you will need to handle the situation manually. set and unset the "skip-worktree" bit for the paths. See section "Skip-worktree bit" below for more information. +--[no-]fsmonitor-valid:: + When one of these flags is specified, the object name recorded + for the paths are not updated. Instead, these options + set and unset the "fsmonitor valid" bit for the paths. See + section "File System Monitor" below for more information. + -g:: --again:: Runs 'git update-index' itself on the paths whose index @@ -201,6 +209,15 @@ will remove the intended effect of the option. `--untracked-cache` used to imply `--test-untracked-cache` but this option would enable the extension unconditionally. +--fsmonitor:: +--no-fsmonitor:: + Enable or disable files system monitor feature. These options + take effect whatever the value of the `core.fsmonitor` + configuration variable (see linkgit:git-config[1]). But a warning + is emitted when the change goes against the configured value, as + the configured value will take effect next time the index is + read and this will remove the intended effect of the option. + \--:: Do not interpret any more arguments as options. @@ -447,6 +464,34 @@ command reads the index; while when `--[no-|force-]untracked-cache` are used, the untracked cache is immediately added to or removed from the index. +File System Monitor +--- + +This feature is intended to speed up git operations for repos that have +large working directories. + +It enables git to work together with a file system monitor (see the +"fsmonitor-watchman" section of linkgit:githooks[5]) that can +inform it as to what files have been modified. This enables git to avoid +having to lstat() every file to find modified files. + +When used in conjunction with the untracked cache, it can further improve +performance by avoiding the cost of scaning the entire working directory +looking for new files. + +If you want to enable (or disable) this feature, it is easier to use +the `core.fsmonitor` configuration variable (see +linkgit:git-config[1]) than using the `--fsmonitor` option to +`git update-index` in each repository, especially if you want to do so +across all repositories you use, because you can set the configuration +variable to `true`
Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository
> Hm, can you say more about the context? From a certain point of view, > it might make sense for that command to succeed instead: if the repo > is already unshallow, then why should't "fetch --unshallow" complain > instead of declaring victory? A fellow in #git on Freenode was writing a script for automation and encountered this error, and asked how to find out whether a repo was shallow. My *first instinct* was to check if rev-parse had a flag for it; I wouldn't have been surprised if it did. I agree that treating it as a fatal error is a bit much in the first place, but I also think having a way to check can be useful. I also wonder if a lot of the stuff rev-parse is used for now should be moved to some sort of `git misc` command, but that's a different can of worms, so into rev-parse a new flag went. > What does git-path mean here? I wonder if it's a copy/paste error. > ... > Reviewed-by: Jonathan Nieder Yeah, the titles were copy-pasted without adjusting, thanks for fixing, Jonathan! ;) > I agree with the fixes to the test titles suggested, so I'll queue the > patch with the fixes squashed in. Hearing "yeah, the titles were > copy-pasted without adjusting, thanks for fixing, Jonathan!" sent by > =C3=98ystein would be super nice. Sounds good. Thanks for queueing my patch. My fourth! �sse
Re: [PATCH] Win32: simplify loading of DLL functions
Hi, Johannes Schindelin wrote: > Dynamic loading of DLL functions is duplicated in several places in Git > for Windows' source code. > > This patch adds a pair of macros to simplify the process: the > DECLARE_PROC_ADDR(, , , > ..) macro to be used at the beginning of a > code block, and the INIT_PROC_ADDR() macro to call before > using the declared function. The return value of the INIT_PROC_ADDR() > call has to be checked; If it is NULL, the function was not found in the > specified DLL. > > Example: > > DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, > LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); > > if (!INIT_PROC_ADDR(CreateHardLinkW)) > return error("Could not find CreateHardLinkW() function"; > > if (!CreateHardLinkW(source, target, NULL)) > return error("could not create hardlink from %S to %S", >source, target); > return 0; nit: whitespace is a bit strange here (mixture of tabs and spaces). Could this example go near the top of the header instead? That way, it's easier for people reading the header to see how to use it. > Signed-off-by: Karsten Blees > Signed-off-by: Johannes Schindelin > --- Just curious: what was Karsten's contribution? (I ask mostly because I'm interested in kinds of collaboration git metadata is failing to capture correctly --- e.g. pair programming.) > So far, there are no users (except in Git for Windows). Ben > promised to make use of it in his fsmonitor patch series. > > compat/win32/lazyload.h | 44 > 1 file changed, 44 insertions(+) > create mode 100644 compat/win32/lazyload.h Are any of the Git for Windows users something that could go upstream along with this patch? That would help illustrate what a good caller looks like, which should help with reviewing future patches that use this code. > --- /dev/null > +++ b/compat/win32/lazyload.h > @@ -0,0 +1,44 @@ [...] > +/* Declares a function to be loaded dynamically from a DLL. */ > +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ > + static struct proc_addr proc_addr_##function = \ > + { #dll, #function, NULL, 0 }; \ > + static rettype (WINAPI *function)(__VA_ARGS__) > + > +/* > + * Loads a function from a DLL (once-only). > + * Returns non-NULL function pointer on success. > + * Returns NULL + errno == ENOSYS on failure. > + */ > +#define INIT_PROC_ADDR(function) \ > + (function = get_proc_addr(&proc_addr_##function)) Probably worth mentioning in the doc comment that this is not thread safe, so a caller that wants to lazy-init in a threaded context is responsible for doing their own locking. > + > +static inline void *get_proc_addr(struct proc_addr *proc) > +{ > + /* only do this once */ > + if (!proc->initialized) { > + HANDLE hnd; > + proc->initialized = 1; > + hnd = LoadLibraryExA(proc->dll, NULL, > + LOAD_LIBRARY_SEARCH_SYSTEM32); > + if (hnd) > + proc->pfunction = GetProcAddress(hnd, proc->function); > + } > + /* set ENOSYS if DLL or function was not found */ > + if (!proc->pfunction) > + errno = ENOSYS; > + return proc->pfunction; > +} strerror(ENOSYS) is "Function not implemented". Cute. Reviewed-by: Jonathan Nieder Thanks, Jonathan
Re: [PATCH v2] Improve performance of git status --ignored
On 09/18, Jameson Miller wrote: > Improve the performance of the directory listing logic when it wants to list > non-empty ignored directories. In order to show non-empty ignored directories, > the existing logic will recursively iterate through all contents of an ignored > directory. This change introduces the optimization to stop iterating through > the contents once it finds the first file. This can have a significant > improvement in 'git status --ignored' performance in repositories with a large > number of files in ignored directories. > > For an example of the performance difference on an example repository with > 196,000 files in 400 ignored directories: > > | Command| Time (s) | > | -- | - | > | git status | 1.2 | > | git status --ignored (old) | 3.9 | > | git status --ignored (new) | 1.4 | > > Signed-off-by: Jameson Miller Everything looks good to me. My only nit (and no need to change it for this patch) is that the first line of the commit msg usually has the form: : So that its easy to tell which part of the code a commit is changing. Seriously, great patch. Thanks! > --- > dir.c | 47 +-- > 1 file changed, 41 insertions(+), 6 deletions(-) > > diff --git a/dir.c b/dir.c > index 1c55dc3..1d17b80 100644 > --- a/dir.c > +++ b/dir.c > @@ -49,7 +49,7 @@ struct cached_dir { > static enum path_treatment read_directory_recursive(struct dir_struct *dir, > struct index_state *istate, const char *path, int len, > struct untracked_cache_dir *untracked, > - int check_only, const struct pathspec *pathspec); > + int check_only, int stop_at_first_file, const struct pathspec > *pathspec); > static int get_dtype(struct dirent *de, struct index_state *istate, >const char *path, int len); > > @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct > dir_struct *dir, > > untracked = lookup_untracked(dir->untracked, untracked, >dirname + baselen, len - baselen); > + > + /* > + * If this is an excluded directory, then we only need to check if > + * the directory contains any files. > + */ > return read_directory_recursive(dir, istate, dirname, len, > - untracked, 1, pathspec); > + untracked, 1, exclude, pathspec); > } > > /* > @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct > dir_struct *dir, >* with check_only set. >*/ > return read_directory_recursive(dir, istate, path->buf, > path->len, > - cdir->ucd, 1, pathspec); > + cdir->ucd, 1, 0, pathspec); > /* >* We get path_recurse in the first run when >* directory_exists_in_index() returns index_nonexistent. We > @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir) > * Also, we ignore the name ".git" (even if it is not a directory). > * That likely will not change. > * > + * If 'stop_at_first_file' is specified, 'path_excluded' is returned > + * to signal that a file was found. This is the least significant value that > + * indicates that a file was encountered that does not depend on the order of > + * whether an untracked or exluded path was encountered first. > + * > * Returns the most significant path_treatment value encountered in the scan. > + * If 'stop_at_first_file' is specified, `path_excluded` is the most > + * significant path_treatment value that will be returned. > */ > + > static enum path_treatment read_directory_recursive(struct dir_struct *dir, > struct index_state *istate, const char *base, int baselen, > struct untracked_cache_dir *untracked, int check_only, > - const struct pathspec *pathspec) > + int stop_at_first_file, const struct pathspec *pathspec) > { > struct cached_dir cdir; > enum path_treatment state, subdir_state, dir_state = path_none; > @@ -1832,12 +1845,34 @@ static enum path_treatment > read_directory_recursive(struct dir_struct *dir, > subdir_state = > read_directory_recursive(dir, istate, path.buf, >path.len, ud, > - check_only, pathspec); > + check_only, > stop_at_first_file, pathspec); > if (subdir_state > dir_state) > dir_state = subdir_state; > } > > if (check_only) { > + if (stop_at_first_file) { > + /* > + * If stopping at first file, then > +
Re: [PATCH v6 09/40] Add initial external odb support
I wonder if it's better to get a change like this (PATCH v6 09/40 and any of the previous patches that this depends on) in and then build on it rather than to review the whole patch set at a time. This would reduce ripple effects (of needing to change later patches in a patch set multiple times unnecessarily) and help collaboration (in that multiple people can write patches, since the foundation would already be laid). The same concerns about fsck apply, but that shouldn't be a problem, since this patch provides the same internal API as mine ("get" function taking in a single hash, and "have" function taking in a single hash) so it shouldn't be too difficult to adapt my fsck and gc patches [1]. (I can do that if necessary.) [1] https://public-inbox.org/git/20170915134343.3814d...@twelve2.svl.corp.google.com/ One possible issue (with committing something early) is that later work (for example, a fast long-running process protocol) will make the earlier work (for example, here, a simple single-shot protocol) obsolete, while saddling us with the necessity of maintaining the earlier one. To that end, if we want to start with the support for a hook, a better approach might be to only code the fast long-running process protocol, and put a wrapper script in contrib/ that can wrap a single-shot process in a long-running process. And another possible issue is that we design ourselves into a corner. Thinking about the use cases that I know about (the Android use case and the Microsoft GVFS use case), I don't think we are doing that - for Android, this means that large blob metadata needs to be part of the design (and this patch series does provide for that), and for Microsoft GVFS, "get" is relatively cheap, so a configuration option to not invoke "have" first when loading a missing object might be sufficient. As for the design itself (including fetch and clone), it differs from my patches (linked above as [1]) in that mine is self-contained (requiring only an updated Git server and Git client) whereas this, as far as I can tell, requires an external process and some measure of coordination between the administrator of the server and the client user (for example, the client must have the same ODB mechanism as the server, if not, the server might omit certain blobs that the client does not know how to fetch). And I think that my design can be extended to support a use case in which, for example, blobs corresponding to a certain type of filename (defined by a glob like in gitattributes) can be excluded during fetch/clone, much like --blob-max-bytes, and they can be fetched either through the built-in mechanism or through a custom hook. For those reasons, I still lean towards my design, but if we do want to go with this design, here are my comments about this patch... First of all: - You'll probably need to add a repository extension. - I get compile errors when I "git am" these onto master. I think '#include "config.h"' is needed in some places. On Sat, 16 Sep 2017 10:07:00 +0200 Christian Couder wrote: > The external-odb.{c,h} files contains the functions that are > called by the rest of Git from "sha1_file.c". > > The odb-helper.{c,h} files contains the functions to > actually implement communication with the external scripts or > processes that will manage external git objects. > > For now only script mode is supported, and only the 'have' and > 'get_git_obj' instructions are supported. This "have", as I see from this commit, is more like a "list" command in that it lists all hashes that it knows about, and does not check if a given hash exists. > +static struct odb_helper *helpers; > +static struct odb_helper **helpers_tail = &helpers; This could be done with the helpers in list.h instead. > +int external_odb_get_object(const unsigned char *sha1) > +{ > + struct odb_helper *o; > + const char *path; > + > + if (!external_odb_has_object(sha1)) > + return -1; > + > + path = sha1_file_name_alt(external_odb_root(), sha1); If the purpose of making these functions global in the previous patch is just for temporary names, I don't think it's necessary for them to be global. Just concatenate the hex SHA1 to external_odb_root()? > /* Returns 1 if we have successfully freshened the file, 0 otherwise. */ > @@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char > *sha1, int freshen) > if (check_and_freshen_file(path, freshen)) > return 1; > } > - return 0; > + return external_odb_has_object(sha1); > } > > static int check_and_freshen(const unsigned char *sha1, int freshen) > @@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, > struct stat *st, > return 0; > } > > + if (!external_odb_get_object(sha1) && !lstat(*path, st)) > + return 0; > + > return -1; > } > > @@ -859,7 +879,14 @@ static int open_sha1_file(const unsigned
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Le 19/09/2017 à 17:43, Johannes Schindelin a écrit : > > C'mon, don't *try* to misunderstand me. > > Of course there need to be updates as to the state of patch series. > > It's just that mails only go *so* far when you need to connect and > aggregate information. You need the connection between the original patch > series, the latest unaddressed reviews, links to the branches, history of > the patch series' iterations, and ideally links to the repositories of the > contributors with *their* branch names. And then, of course, your verdict > as to the state of the patch series and your expectation what happens > next. > > To relate that, you are using a plain text format that is not well defined > and not structured, and certainly not machine-readable, for information > that is crucial for project management. > > What you need is a tool to aggregate this information, to help working > with it, to manage the project, and to be updated automatically. And to > publish this information continuously, without costing you extra effort. > > I understand that you started before GitHub existed, and before GitHub was > an option, the script-generated What's cooking mail was the best you could > do. > > Ciao, > Dscho Hi, Would something like patchwork fix your need ? They now seems to have a REST API, which means it could probably be pluggeg into Junio's scripts and work seemlessly for him (or any other happy ML user) while other people can browse through the web interface. I used to work with this one: http://patches.opendataplane.org/project/lng-odp/list/ It is not the best example as the patch status are pretty much never updated on this one. But it would solve most of the points you raised, while keeping fully compatible with the way people actually work (including Junio). Nicolas
Re: RFC v3: Another proposed hash function transition plan
Hi Johannes, Thanks for your feedback. On 19/09/17 00:16, Johannes Schindelin wrote: >>> SHA-256 got much more cryptanalysis than SHA3-256 […]. >> >> I do not think this is true. > > Please read what I said again: SHA-256 got much more cryptanalysis > than SHA3-256. Indeed. What I meant is that SHA3-256 got at least as much cryptanalysis as SHA-256. :-) > I never said that SHA3-256 got little cryptanalysis. Personally, I > think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that > SHA-256 *still* got more cryptanalysis. But my opinion does not count, > really. However, the two experts I pestered with questions over > questions left me with that strong impression, and their opinion does > count. OK, I respect your opinion and that of your two experts. Yet, the "much more" part of your statement, in particular, is something that may require a bit more explanations. Kind regards, Gilles
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Johannes Schindelin wrote: > What you need is a tool to aggregate this information, to help working > with it, to manage the project, and to be updated automatically. And to > publish this information continuously, without costing you extra effort. > > I understand that you started before GitHub existed, and before GitHub was > an option, the script-generated What's cooking mail was the best you could > do. On second reading, I think you're talking about GitHub's code review ("pull request") feature, not a bug tracker. There's been some active work (some that you've been involved in, I believe) on getting information from a GitHub pull request to the mailing list. One possibility would be to get information to also go in the other direction, so people have information about what happened to their patch in one place. I can also see why you are directing your attention to the maintainer for this one, since if the entire project adopts the GitHub Pull Request workflow, then the complexity and other flaws of such a two-way sync could be avoided. Unfortunately the maintainer is not the only person you'd need to convince. You'd also need to convince patch authors and reviewers to move to the new workflow. There are likely some potential contributors that this would bring in, that would like to get involved in the Git project but had been deterred by e.g. the mailing list's aggressive filtering of any email with an HTML part. Meanwhile it would also be a large adjustment for existing contributors, so it's not risk free. I personally like how Bazel[1] handles this. They have two ways to contribute: A. People used to GitHub can use a pull request like they are accustomed to. B. People preferring a more structured code review can use Gerrit. Having only two ways to contribute means that the code review information is still easy to archive and search, just like our mailing list archive. (Actually, an example I like even more is Akaros[2], which also appears to accept patch contributions by email.) Adding new ways for a contributor to submit a patch would mean that we can experiment with new workflows without getting in the way of people using an existing one. Thoughts? Jonathan [1] https://bazel.build/contributing.html [2] https://groups.google.com/forum/#!forum/akaros
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi, Johannes Schindelin wrote: > To relate that, you are using a plain text format that is not well defined > and not structured, and certainly not machine-readable, for information > that is crucial for project management. > > What you need is a tool to aggregate this information, to help working > with it, to manage the project, and to be updated automatically. And to > publish this information continuously, without costing you extra effort. > > I understand that you started before GitHub existed, and before GitHub was > an option, the script-generated What's cooking mail was the best you could > do. I think you are subtly (but not directly, for some reason?) advocating moving project management for the Git project to GitHub Issues. For what it's worth: 1. I would be happy to see Git adopt a bug tracker. As we've discussed on the list before, I suspect the only way that this is going to happen is if some contributors start using a bug tracker and keep up with bugs there, without requiring everyone to use it. That is how the Linux Kernel project started using bugzilla.kernel.org, for example. 2. GitHub Issues is one of my least favorite bug trackers, for what it's worth. If some sub-project of Git chooses to use it, then that's great and I won't get in their way. I'm just providing this single data point that approximately any other tracker (Bugzilla, JIRA, debbugs, patchwork) is something I'd be more likely to use. 3. This advice might feel hopeless, because if the maintainer is not involved in the initial pilot, then how does the bug tracker get notified when a patch has been accepted? But fortunately this is a problem other people have solved: e.g. most bug trackers have an API that can be used to automatically notify the bug when a patch with a certain subject line appears on a certain branch. Thanks and hope that helps, Jonathan
Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)
Hi Junio, On Tue, 19 Sep 2017, Junio C Hamano wrote: > Johannes Schindelin writes: > > >> Do you have a concrete suggestion to make these individual entries > >> more helpful for people who may want go back to the original thread > >> in doing so? In-reply-to: or References: fields of the "What's > >> cooking" report would not help. I often have the message IDs that > >> made/helped me make these individual comments in the description; > >> they alone would not react to mouse clicks, though. > > > > Oh gawd, not even more stuff piled onto the mail format. Please stop. > > ... > > It probably tries to serve too many purposes at the same time, and > > thereby none. > > Well, this was started as my attempt to give a public service that shows > a summary of what is happening in the entire integration tree, as there > was nothing like that before (and going to github.com and looking at > 'pu' branch would not give you an easy overview). As many people > contribute many topics to the project, complaining that it talks about > too many topics would not get you anywhere. > > If you find "What's cooking" report not serving your needs, and if no > one finds it not serving his or her needs, then I can stop sending these > out, of course, but I am not getting the impression that we are at that > point, at least not yet. C'mon, don't *try* to misunderstand me. Of course there need to be updates as to the state of patch series. It's just that mails only go *so* far when you need to connect and aggregate information. You need the connection between the original patch series, the latest unaddressed reviews, links to the branches, history of the patch series' iterations, and ideally links to the repositories of the contributors with *their* branch names. And then, of course, your verdict as to the state of the patch series and your expectation what happens next. To relate that, you are using a plain text format that is not well defined and not structured, and certainly not machine-readable, for information that is crucial for project management. What you need is a tool to aggregate this information, to help working with it, to manage the project, and to be updated automatically. And to publish this information continuously, without costing you extra effort. I understand that you started before GitHub existed, and before GitHub was an option, the script-generated What's cooking mail was the best you could do. Ciao, Dscho
RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
> -Original Message- > From: Torsten Bögershausen [mailto:tbo...@web.de] > Sent: Tuesday, September 19, 2017 10:16 AM > To: Ben Peart > Cc: Ben Peart ; Junio C Hamano > ; david.tur...@twosigma.com; ava...@gmail.com; > christian.cou...@gmail.com; git@vger.kernel.org; > johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net > Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index > extension > > > > > Should I just make the variable type itself uintmax_t and then just > > skip the cast altogether? I went with uint64_t because that is what > > getnanotime returned. > > > > That is a bit of taste question (or answer) > > Typically you declare the variables in the type you need, and this is > uint64_t. > > Let's step back a bit: > To print e.g a variable of type uint32_t, you use PRIu32 in the format > string, > like this: > > fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),", > > In theory (it is in the later specs, and it exists on many platforms), there > is a > PRIu64 as well. > > We don't seem to use it in Git, probably because uintmax_t is (more) > portable and understood by all platforms which support Git. > (And beside that, on most platforms uintmax_t is 64 bit). > > So my suggestion would be to keep uint64_t and cast the variable into > uintmax_t whenever it is printed. > Great! That is the way I have it.
RE: [PATCH v6 12/12] fsmonitor: add a performance test
Hi Ben, On Mon, 18 Sep 2017, Ben Peart wrote: > > From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de] > > On Fri, 15 Sep 2017, Ben Peart wrote: > > > > > + DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) = > > > + (DWORD(WINAPI *)(INT, PVOID, > > ULONG))GetProcAddress(ntdll, "NtSetSystemInformation"); > > > + if (!NtSetSystemInformation) > > > + return error("Can't get function addresses, wrong Windows > > > +version?"); > > > > It turns out that we have seen this plenty of times in Git for Windows' > > fork, so much so that we came up with a nice helper to make this all a bit > > more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and > > INIT_PROC_ADDR helpers in compat/win32/lazyload.h. > > > > Maybe this would be the perfect excuse to integrate this patch into upstream > > Git? > > This patch is pretty hefty already. How about you push this capability > upstream and I take advantage of it in a later patch. :) Deal: https://public-inbox.org/git/f5a3add27206df3e7f39efeac8a3c3b47f2b79f2.1505834586.git.johannes.schinde...@gmx.de Ciao, Johannes
[PATCH] Win32: simplify loading of DLL functions
Dynamic loading of DLL functions is duplicated in several places in Git for Windows' source code. This patch adds a pair of macros to simplify the process: the DECLARE_PROC_ADDR(, , , ..) macro to be used at the beginning of a code block, and the INIT_PROC_ADDR() macro to call before using the declared function. The return value of the INIT_PROC_ADDR() call has to be checked; If it is NULL, the function was not found in the specified DLL. Example: DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW, LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES); if (!INIT_PROC_ADDR(CreateHardLinkW)) return error("Could not find CreateHardLinkW() function"; if (!CreateHardLinkW(source, target, NULL)) return error("could not create hardlink from %S to %S", source, target); return 0; Signed-off-by: Karsten Blees Signed-off-by: Johannes Schindelin --- So far, there are no users (except in Git for Windows). Ben promised to make use of it in his fsmonitor patch series. Published-As: https://github.com/dscho/git/releases/tag/lazyload-v1 Fetch-It-Via: git fetch https://github.com/dscho/git lazyload-v1 compat/win32/lazyload.h | 44 1 file changed, 44 insertions(+) create mode 100644 compat/win32/lazyload.h diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h new file mode 100644 index 000..91c10dad2fb --- /dev/null +++ b/compat/win32/lazyload.h @@ -0,0 +1,44 @@ +#ifndef LAZYLOAD_H +#define LAZYLOAD_H + +/* simplify loading of DLL functions */ + +struct proc_addr { + const char *const dll; + const char *const function; + FARPROC pfunction; + unsigned initialized : 1; +}; + +/* Declares a function to be loaded dynamically from a DLL. */ +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \ + static struct proc_addr proc_addr_##function = \ + { #dll, #function, NULL, 0 }; \ + static rettype (WINAPI *function)(__VA_ARGS__) + +/* + * Loads a function from a DLL (once-only). + * Returns non-NULL function pointer on success. + * Returns NULL + errno == ENOSYS on failure. + */ +#define INIT_PROC_ADDR(function) \ + (function = get_proc_addr(&proc_addr_##function)) + +static inline void *get_proc_addr(struct proc_addr *proc) +{ + /* only do this once */ + if (!proc->initialized) { + HANDLE hnd; + proc->initialized = 1; + hnd = LoadLibraryExA(proc->dll, NULL, +LOAD_LIBRARY_SEARCH_SYSTEM32); + if (hnd) + proc->pfunction = GetProcAddress(hnd, proc->function); + } + /* set ENOSYS if DLL or function was not found */ + if (!proc->pfunction) + errno = ENOSYS; + return proc->pfunction; +} + +#endif base-commit: 9ddaf86b06a8078420f59aec8cab6daa93cf1a91 -- 2.14.1.windows.1.1020.g03faabc8bc8.dirty
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote: Does the following alternate fix work? I think I prefer it because it doesn't require introducing a new global. [...] #define for_each_string_list_item(item,list) \ - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) + for (item = (list)->items; \ +(list)->items && item < (list)->items + (list)->nr; \ +++item) This is the possibility that I was referring to as "add[ing] overhead to each iteration of the loop". I'd rather not add an extra test-and-branch to every iteration of a loop in which `list->items` is *not* NULL, which your solution appears to do. Or are compilers routinely able to optimize the check out? It seems at least 'gcc' is able to optimize this out even with a -O1 and 'clang' optimizes this out with a -O2. Taking a sneak peek at the 'Makefile' shows that our default is -O2. For a proof, see https://godbolt.org/g/CPt73L --- Kaartic
Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension
> > Should I just make the variable type itself uintmax_t and then just skip > the cast altogether? I went with uint64_t because that is what > getnanotime returned. > That is a bit of taste question (or answer) Typically you declare the variables in the type you need, and this is uint64_t. Let's step back a bit: To print e.g a variable of type uint32_t, you use PRIu32 in the format string, like this: fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),", In theory (it is in the later specs, and it exists on many platforms), there is a PRIu64 as well. We don't seem to use it in Git, probably because uintmax_t is (more) portable and understood by all platforms which support Git. (And beside that, on most platforms uintmax_t is 64 bit). So my suggestion would be to keep uint64_t and cast the variable into uintmax_t whenever it is printed.
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
On Tue, Sep 19, 2017 at 3:38 PM, SZEDER Gábor wrote: > A bit "futuristic" option along these lines could be something like > this, using a scoped loop variable in the outer loop to ensure that > it's executed at most once: > > #define for_each_string_list_item(item,list) \ > for (int f_e_s_l_i = 1; (list)->items && f_e_s_l_i; f_e_s_l_i = 0) \ > for (item = (list)->items; item < (list)->items + (list)->nr; > ++item) > > The high number of underscores are an attempt to make reasonably sure > that the macro's loop variable doesn't shadow any variable in its > callers or isn't being shadowed in the loop body, which might(?) > trigger warnings in some compilers. Well, and a poor attempt at that, because, of course, the loop variable would still be shadowed in nested for_each_string_list_item loops... And our codebase has these loops nested in entry.c:finish_delayed_checkout(). OTOH, we don't seem to care too much about shadowed variables, since building with -Wshadow gives 91 warnings in current master...
Re: [PATCH] for_each_string_list_item(): behave correctly for empty list
On Tue, Sep 19, 2017 at 8:51 AM, Michael Haggerty wrote: > On 09/19/2017 02:08 AM, Stefan Beller wrote: >>> I am hoping that this last one is not allowed and we can use the >>> "same condition is checked every time we loop" version that hides >>> the uglyness inside the macro. >> >> By which you are referring to Jonathans solution posted. >> Maybe we can combine the two solutions (checking for thelist >> to not be NULL once, by Jonathan) and using an outer structure >> (SZEDERs solution) by replacing the condition by a for loop, >> roughly (untested): >> >> #define for_each_string_list_item(item,list) \ >> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item) >> +for (; list; list = NULL) >> +for (item = (list)->items; item < (list)->items + (list)->nr; >> ++item) >> >> as that would not mingle with any dangling else clause. >> It is also just one statement, such that >> >> if (bla) >> for_each_string_list_item { >> baz(item); >> } >> else >> foo; >> >> still works. >> >> Are there downsides to this combined approach? > > On the plus side, it's pleasantly devious; I wouldn't have thought of > using a `for` loop for the initial test. But it doesn't work as written, > because (1) we don't need to guard against `list` being NULL, but rather > `list->items`; and (2) we don't have the liberty to set `list = NULL` > (or `list->items = NULL`, because `list` is owned by the caller and we > shouldn't modify it. > > The following is a bit closer: > > #define for_each_string_list_item(item,list) \ > for (item = (list)->items; item; item = NULL) \ > for (; item < (list)->items + (list)->nr; ++item) > > But I think that also fails, because a callsite that does > > for_each_string_list_item(myitem, mylist) > if (myitem.util) > break; > > would expect that `myitem` is still set after breaking out of the loop, > whereas the outer `for` loop would reset it to NULL. > > If `break` were an expression we could do something like > > #define for_each_string_list_item(item,list) \ > for (item = (list)->items; item; break) \ > for (; item < (list)->items + (list)->nr; ++item) A bit "futuristic" option along these lines could be something like this, using a scoped loop variable in the outer loop to ensure that it's executed at most once: #define for_each_string_list_item(item,list) \ for (int f_e_s_l_i = 1; (list)->items && f_e_s_l_i; f_e_s_l_i = 0) \ for (item = (list)->items; item < (list)->items + (list)->nr; ++item) The high number of underscores are an attempt to make reasonably sure that the macro's loop variable doesn't shadow any variable in its callers or isn't being shadowed in the loop body, which might(?) trigger warnings in some compilers. Alas we don't allow scoping the loop variable in for loops, and even a test balloon patch didn't make it into git.git. https://public-inbox.org/git/20170719181956.15845-1-sbel...@google.com/T/#u > So I think we're still left with the suggestions of Jonathan or Gábor. > Or the bigger change of initializing `string_list::items` to point at an > empty sentinal array (similar to `strbuf_slopbuf`) rather than NULL. > Personally, I think that Jonathan's approach makes the most sense, > unless somebody wants to jump in an implement a `string_list_slopbuf`. > > By the way, I wonder if any open-coded loops over `string_lists` make > the same mistake as the macro?
[PATCH] doc: camelCase the config variables to improve readability
The config variable used weren't readable as they were in the crude form of how git stores/uses it's config variables. Improve it's readability by replacing them with camelCased versions of config variables as it doesn't have any impact on it's usage. Signed-off-by: Kaartic Sivaraam --- Documentation/git-branch.txt | 4 ++-- Documentation/git-tag.txt| 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt index e292737b9c5ab..58f1e5c9c74e1 100644 --- a/Documentation/git-branch.txt +++ b/Documentation/git-branch.txt @@ -92,10 +92,10 @@ OPTIONS all changes made to the branch ref, enabling use of date based sha1 expressions such as "@\{yesterday}". Note that in non-bare repositories, reflogs are usually - enabled by default by the `core.logallrefupdates` config option. + enabled by default by the `core.logAllRefUpdates` config option. The negated form `--no-create-reflog` only overrides an earlier `--create-reflog`, but currently does not negate the setting of - `core.logallrefupdates`. + `core.logAllRefUpdates`. -f:: --force:: diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt index 543fb425ee7c1..95e9f391d88fc 100644 --- a/Documentation/git-tag.txt +++ b/Documentation/git-tag.txt @@ -174,7 +174,7 @@ This option is only applicable when listing tags without annotation lines. `core.logAllRefUpdates` in linkgit:git-config[1]. The negated form `--no-create-reflog` only overrides an earlier `--create-reflog`, but currently does not negate the setting of - `core.logallrefupdates`. + `core.logAllRefUpdates`. :: The name of the tag to create, delete, or describe. -- https://github.com/git/git/pull/407
[PATCH] t4014: strengthen search patterns
The regex patterns for some failing test cases were a bit loose giving way for a few incorrect outputs being accepted as correct outputs. To avoid such incorrect outputs from being flagged as correct ones use fixed string matches when possible and strengthen regex when it's not. Signed-off-by: Kaartic Sivaraam --- t/t4014-format-patch.sh | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 482112ca339f0..7dff7996c9e1f 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -163,7 +163,7 @@ test_expect_failure 'additional command line cc (rfc822)' ' git config --replace-all format.headers "Cc: R E Cipient " && git format-patch --cc="S. E. Cipient " --stdout master..side | sed -e "/^\$/q" >patch5 && grep "^Cc: R E Cipient ,\$" patch5 && - grep "^ *\"S. E. Cipient\" \$" patch5 + grep "^ *\"S\. E\. Cipient\" \$" patch5 ' test_expect_success 'command line headers' ' @@ -191,13 +191,13 @@ test_expect_success 'command line To: header (ascii)' ' test_expect_failure 'command line To: header (rfc822)' ' git format-patch --to="R. E. Cipient " --stdout master..side | sed -e "/^\$/q" >patch8 && - grep "^To: \"R. E. Cipient\" \$" patch8 + grep -F "To: \"R. E. Cipient\" " patch8 ' test_expect_failure 'command line To: header (rfc2047)' ' git format-patch --to="R Ä Cipient " --stdout master..side | sed -e "/^\$/q" >patch8 && - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" patch8 + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" patch8 ' test_expect_success 'configuration To: header (ascii)' ' @@ -211,14 +211,14 @@ test_expect_failure 'configuration To: header (rfc822)' ' git config format.to "R. E. Cipient " && git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 && - grep "^To: \"R. E. Cipient\" \$" patch9 + grep -F "To: \"R. E. Cipient\" " patch9 ' test_expect_failure 'configuration To: header (rfc2047)' ' git config format.to "R Ä Cipient " && git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 && - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" patch9 + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" patch9 ' # check_patch : Verify that looks like a half-sane -- https://github.com/git/git/pull/406
Re: [PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible
On 09/19/2017 08:22 AM, Michael Haggerty wrote: > Keep a copy of the `packed-refs` file contents in memory for as long > as a `packed_ref_cache` object is in use: > > * If the system allows it, keep the `packed-refs` file mmapped. > > * If not (either because the system doesn't support `mmap()` at all, > or because a file that is currently mmapped cannot be replaced via > `rename()`), then make a copy of the file's contents in > heap-allocated space, and keep that around instead. > > We base the choice of behavior on a new build-time switch, > `MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows > variants. > > This whole change is still pointless, because we only read the > `packed-refs` file contents immediately after instantiating the > `packed_ref_cache`. But that will soon change. > > Signed-off-by: Michael Haggerty > --- > Makefile | 10 +++ > config.mak.uname | 3 + > refs/packed-backend.c | 184 > ++ > 3 files changed, 155 insertions(+), 42 deletions(-) > > diff --git a/Makefile b/Makefile > index f2bb7f2f63..7a49f99c4f 100644 > [...] > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 0fe41a7203..4981516f1e 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > [...] > @@ -304,6 +371,61 @@ struct ref_iterator *mmapped_ref_iterator_begin( > return ref_iterator; > } > > +/* > + * Depending on `mmap_strategy`, either mmap or read the contents of > + * the `packed-refs` file into the `packed_refs` instance. Return 1 if > + * the file existed and was read, or 0 if the file was absent. Die on > + * errors. > + */ > +static int load_contents(struct packed_ref_cache *packed_refs) > +{ > + int fd; > + struct stat st; > + size_t size, bytes_read; Coverity helpfully pointed out that `bytes_read` has to be signed: `ssize_t`. I'll fix that in the next round after waiting for other comments. > [...] Michael
I Want To Know Your Opinion In Doing This Project.
My Dear, >From Mrs. Loveth Konnia. I belief that you can help in setting up a charity foundation for the benefit of mankind, I wish to establish a charity foundation to help the poor in your country under your care, Can you help to build this project in your country? Together We can make the world a better place when we help one another, Waiting to read from you today and know your opinion in doing this project, Remain Blessed, Mrs. Loveth
Re: [RFC SAMPLE] builtin/branch: give more useful error messages when renaming
Sorry, the email client seems to have crapped up the formatting. In case it looks difficult to follow, let me know so that I could send a better version. --- Kaartic
Re: [PATCH] doc: update information about windows build
On Monday 18 September 2017 12:32 AM, Phillip Wood wrote: May be the Windows build exit with failure on other repos rather than saying it passes? I'm not quite sure what you're asking. If the tests aren't run it needs to look like a pass or everyone's branches would be marked as failing on github and more importantly it wouldn't be clear when there was a real failure on linux/macos. If the tests are run then it will pass/fail depending on the test result. I thought failing wasn't a big issue because the macOS build failed most of the time due to time-outs, at least for me. Most of the failed builds in my build history were just the macOS builds timing out. That was some time ago. They seem to be passing quite often recently. --- Kaartic
[RFC SAMPLE] builtin/branch: give more useful error messages when renaming
The patch series results in a change in output as specified below. Only few cases have been shown here to keep it short. The output for other cases are similar. $ git branch * master foo bar Before patch, $ # Trying to rename non-existent branch $ git branch -m hypothet no_such_branch error: refname refs/heads/hypothet not found fatal: Branch rename failed $ # Trying to rename non-existent branch into existing one $ git branch -m hypothet master error: refname refs/heads/hypothet not found fatal: Branch rename failed $ # Trying to force update current branch $ git branch -M foo master fatal: Cannot force update the current branch. $ # Trying to force rename an in-existent branch with an invalid name $ git branch -M hypothet ?123 fatal: '?123' is not a valid branch name. After patch, $ # Trying to rename non-existent branch $ git branch -m hypothet no_such_branch fatal: branch 'hypothet' doesn't exist $ # Trying to rename non-existent branch into existing one $ git branch -m hypothet master fatal: branch 'hypothet' doesn't exist, and branch 'master' already exists $ # Trying to force update current branch $ git branch -M foo master fatal: cannot force update the current branch $ # Trying to force rename an in-existent branch with an invalid name $ git branch -M hypothet ?123 fatal: branch 'hypothet' doesn't exist, and branch name '?123' is invalid
Re: My first contribution
Hi Olga, On Tue, Sep 19, 2017 at 8:44 AM, Оля Тележная wrote: > Hello Jeff, > I want to try myself into new role of a Git contributor. Welcome to Git! > All of the projects sound super interesting for me and I am ready to take > part in all of them, but the project "Unifying Git's format languages" will > be super useful for me myself, so my dream is to try solve that task first. Great that you found a project you like! > I need help to choose my first task in the project, because first steps are > usually the most frightening. > I am familiar enough with C and there's no problem to read any docs, but > please help me choosing the first task. If you have any guidance like "how > to start" or "how to choose tasks", please send that links also. You can try to work first on the project you are interested in or you can find and work on a small project first. One way to find a small project is to see what people are talking about on the mailing list[1]. Quite often there are bugs that can be fixed, and more experienced people may help sketch out a solution. You can also find small items good for newcomers marked with the "leftoverbits" tag, which you can search for in the mailing list[2]. We don't have a written guide specifically downloading git, getting it built, running the tests, and so forth, but you might start with: git clone https://github.com/git/git and reading the INSTALL file. As the mailing list can be a bit intimidating at first, we don't mind working with you one-on-one a bit during the application period. About the mailing list, please add [Outreachy] in the subject to make it clear that you are applying for the Outreachy program. While at it on the Git mailing list we are used to replying to parts of message after those parts. We don't usually reply before the message. In other words we are used to "inline replying", not "top posting" (see https://en.wikipedia.org/wiki/Posting_style). Please try to use the same posting style as us, it will help keep discussions more standard and easier to understand. Also we feel free to remove parts of the messages we are quoting that are not relevant anymore. For getting in touch with us, direct email is our preferred method. We could also meet on IRC if you like, but it looks like our timezones might not overlap much. Still, we can probably set up a time. Let us know if you have any questions at all. This is our first time mentoring for Outreachy, so we've surely forgotten to mention some obvious thing you would want to know. :) Thanks, Christian. [1] There are details of the list at https://git-scm.com/community, but you may want to just browse the archive at: https://public-inbox.org/git [2] https://public-inbox.org/git/?q=leftoverbits
[RFC PATCH 4/5] branch: introduce dont_fail parameter for update validation
This parameter allows the branch update validation function to optionally return a flag specifying the reason for failure, when requested. This allows the caller to know why it was about to die. This allows more useful error messages to be given to the user when trying to rename a branch. The flags are specified in the form of an enum and values for success flags have been assigned explicitly to clearly express that certain callers rely those values and they cannot be arbitrary. Only the logic has been added but no caller has been made to use it, yet. So, no functional changes. Signed-off-by: Kaartic Sivaraam --- branch.c | 34 +++--- branch.h | 23 +-- builtin/branch.c | 2 +- builtin/checkout.c | 2 +- 4 files changed, 46 insertions(+), 15 deletions(-) diff --git a/branch.c b/branch.c index 2020dedf6..9dda336a0 100644 --- a/branch.c +++ b/branch.c @@ -178,28 +178,40 @@ int read_branch_desc(struct strbuf *buf, const char *branch_name) return 0; } -int validate_branch_update(const char *name, struct strbuf *ref, - int could_exist, int clobber_head) +int validate_branch_update(const char *name, struct strbuf *ref, int could_exist, + int clobber_head, unsigned dont_fail) { - if (strbuf_check_branch_ref(ref, name)) - die(_("'%s' is not a valid branch name."), name); + if (strbuf_check_branch_ref(ref, name)) { + if (dont_fail) + return INVALID_BRANCH_NAME; + else + die(_("'%s' is not a valid branch name."), name); + } if (!ref_exists(ref->buf)) - return 0; + return VALID_BRANCH_NAME; - if (!could_exist) - die(_("A branch named '%s' already exists."), ref->buf + strlen("refs/heads/")); + if (!could_exist) { + if (dont_fail) + return BRANCH_EXISTS; + else + die(_("A branch named '%s' already exists."), ref->buf + strlen("refs/heads/")); + } if (!clobber_head) { const char *head; struct object_id oid; head = resolve_ref_unsafe("HEAD", 0, oid.hash, NULL); - if (!is_bare_repository() && head && !strcmp(head, ref->buf)) - die(_("Cannot force update the current branch.")); + if (!is_bare_repository() && head && !strcmp(head, ref->buf)) { + if (dont_fail) + return CANNOT_FORCE_UPDATE_CURRENT_BRANCH; + else + die(_("Cannot force update the current branch.")); + } } - return 1; + return FORCE_UPDATING_BRANCH; } /* @@ -268,7 +280,7 @@ void create_branch(const char *name, const char *start_name, validate_existing_branch(name, &ref); dont_change_ref = 1; } else { - forcing = validate_branch_update(name, &ref, force, clobber_head); + forcing = validate_branch_update(name, &ref, force, clobber_head, 0); } real_ref = NULL; diff --git a/branch.h b/branch.h index 6ada7af59..c6a8a75bb 100644 --- a/branch.h +++ b/branch.h @@ -27,6 +27,16 @@ void create_branch(const char *name, const char *start_name, int force, int reflog, int clobber_head, int quiet, enum branch_track track); +enum branch_validation_result { + /* Flags that say it's NOT OK to update */ + BRANCH_EXISTS = -3, + CANNOT_FORCE_UPDATE_CURRENT_BRANCH, + INVALID_BRANCH_NAME, + /* Flags that say it's OK to update */ + VALID_BRANCH_NAME = 0, + FORCE_UPDATING_BRANCH = 1 +}; + /* * Validates whether the branch with the given name may be updated (created, renamed etc.,) * with respect to the given conditions. It returns the interpreted ref in ref. @@ -36,10 +46,19 @@ void create_branch(const char *name, const char *start_name, * if 'could_exist' is true, clobber_head indicates whether the branch could be the * current branch else it has no effect. * - * A non-zero return value indicates that a branch already exists and can be force updated. + * The return values have the following meaning, + * + * - If dont_fail is 0, the function dies in case of failure and returns flags of + * 'validate_result' that specify it is OK to update the branch. The positive + * non-zero flag implies that the branch can be force updated. + * + * - If dont_fail is 1, the function doesn't die in case of failure but returns flags + * of 'validate_result' that specify the reason for failure. The behaviour in case of + * success is same as above. * */ -int validate_branch_update(const char *name, struct strbuf *ref, int could_exist, int clobber_head); +int va
[RFC PATCH 2/5] branch: document the usage of certain parameters
Documentation for a certain function was incomplete as it didn't say what certain parameters were used for. So, document them for the sake of completeness and easy reference. Signed-off-by: Kaartic Sivaraam --- branch.h | 5 + 1 file changed, 5 insertions(+) diff --git a/branch.h b/branch.h index b07788558..33b7f5d88 100644 --- a/branch.h +++ b/branch.h @@ -15,6 +15,11 @@ * * - reflog creates a reflog for the branch * + * - if 'force' is true, clobber_head indicates whether the branch could be + * the current branch; else it has no effect + * + * - quiet suppresses tracking information + * * - track causes the new branch to be configured to merge the remote branch * that start_name is a tracking branch for (if any). */ -- 2.14.1.1006.g90ad9a07c
[RFC PATCH 0/5] branch: improve error messages of branch renaming
In builtin/branch, the error messages weren't handled directly by the branch renaming function and was left to the other function. Though this avoids redundancy this gave unclear error messages in some cases. So, make builtin/branch give more useful error messages. The first two patches are preparatory/cleanup patches. The third patch refactors a function to make it more usable/understandable(?). This results only in one functional change as noted there. I've tried my best not to screw anything up as a consequence of that refactor[note 1]. In case I missed something, let me know. The fourth patch introduces part of the logic needed to improve error messages. It's kept separate to keep things reviewable. The fifth patch is the main one which does the improvement of error messages. These patches apply on top of 'master' and be found in my fork[2]. Note: [1]: The Travis CI build did succeed but I don't think we can rely on that a lot because the test aren't exhaustive (I guess). https://travis-ci.org/sivaraam/git/builds/277146416 [2]: https://github.com/sivaraam/git/tree/work/branch-move-revamp Kaartic Sivaraam (5): builtin/checkout: avoid usage of '!!' for expressions branch: document the usage of certain parameters branch: cleanup branch name validation branch: introduce dont_fail parameter for update validation builtin/branch: give more useful error messages when renaming branch.c | 67 +++--- branch.h | 44 +-- builtin/branch.c | 48 +- builtin/checkout.c | 7 +++--- t/t3200-branch.sh | 4 5 files changed, 130 insertions(+), 40 deletions(-) -- 2.14.1.868.g66c78774b
[RFC PATCH 3/5] branch: cleanup branch name validation
The function that validates a new branch name was clumsy because, 1. It did more than just validating the branch name 2. Despite it's name, it is more often than not used to validate existing branch names through the 'force' and 'attr_only' parameters (whose names by the way weren't communicative). 3. The 'attr_only' parameter should have been "true" only when callers like to update some attribute of the branch and "false" when they were updating where the branch points to. It was misused by callers by setting it to "true" (to skip tests) even though they were force updating where the current branch was pointing to! This is an example of spoiling code clarity by making the caller rely on how the function is implemented thus making it hard to modify/maintain. This makes it unclear what the function does at all and would confuse the people who would ever want to it for the first time. So, refactor it into a function that validates whether the branch could be updated. This doesn't bear the uncommunicative 'new'. Further replace the 'force' parameter with a 'could_exist' parameter which specifies whether the given branch name could exist or not (it's just a better name for 'force'). Also replace the 'attr_only' with 'clobber_head' which is a more communicative way of seeing "If the branch could exist, it's OK if it is the current branch". Separate the validation of an existing branch into another function. This (at last!) addresses the NEEDSWORK that was added in fa7993767 (branch --set-upstream: regression fix, 2011-09-16) This refactor has only one functional change. It enforces strictly that an existing branch should be updated only with the 'force' switch. So, it's no more possible to do, $ git branch -m master master (which doesn't seem that useful). This strongly enforces the following statement of the 'git branch' documentation, "If exists, -M must be used to force the rename to happen." Signed-off-by: Kaartic Sivaraam --- branch.c | 41 ++--- branch.h | 20 builtin/branch.c | 2 +- builtin/checkout.c | 4 ++-- t/t3200-branch.sh | 4 5 files changed, 45 insertions(+), 26 deletions(-) diff --git a/branch.c b/branch.c index 703ded69c..2020dedf6 100644 --- a/branch.c +++ b/branch.c @@ -178,18 +178,19 @@ int read_branch_desc(struct strbuf *buf, const char *branch_name) return 0; } -int validate_new_branchname(const char *name, struct strbuf *ref, - int force, int attr_only) +int validate_branch_update(const char *name, struct strbuf *ref, + int could_exist, int clobber_head) { if (strbuf_check_branch_ref(ref, name)) die(_("'%s' is not a valid branch name."), name); if (!ref_exists(ref->buf)) return 0; - else if (!force && !attr_only) + + if (!could_exist) die(_("A branch named '%s' already exists."), ref->buf + strlen("refs/heads/")); - if (!attr_only) { + if (!clobber_head) { const char *head; struct object_id oid; @@ -197,9 +198,29 @@ int validate_new_branchname(const char *name, struct strbuf *ref, if (!is_bare_repository() && head && !strcmp(head, ref->buf)) die(_("Cannot force update the current branch.")); } + return 1; } +/* + * Validates whether the branch with the given name exists, returning the + * interpreted ref in ref. + * + * This method is invoked if the caller merely wants to know if it is OK + * to change some attribute for the named branch (e.g. tracking upstream). + * + */ +static void validate_existing_branch(const char *name, struct strbuf *ref) +{ + if (strbuf_check_branch_ref(ref, name)) + die(_("'%s' is not a valid branch name."), name); + + if (ref_exists(ref->buf)) + return; + else + die(_("branch '%s' doesn't exist"), name); +} + static int check_tracking_branch(struct remote *remote, void *cb_data) { char *tracking_branch = cb_data; @@ -243,13 +264,11 @@ void create_branch(const char *name, const char *start_name, if (track == BRANCH_TRACK_EXPLICIT || track == BRANCH_TRACK_OVERRIDE) explicit_tracking = 1; - if (validate_new_branchname(name, &ref, force, - track == BRANCH_TRACK_OVERRIDE || - clobber_head)) { - if (!force) - dont_change_ref = 1; - else - forcing = 1; + if (track == BRANCH_TRACK_OVERRIDE) { + validate_existing_branch(name, &ref); + dont_change_ref = 1; + } else { + forcing = validate_branch_update(name, &ref, force, clobber_head); }
[RFC PATCH 5/5] builtin/branch: give more useful error messages when renaming
When trying to rename an inexistent branch to an existing branch the rename failed specifying the new branch name exists rather than specifying that the branch trying to be renamed doesn't exist. $ git branch -m tset master fatal: A branch named 'master' already exists. It's conventional to report that 'tset' doesn't exist rather than reporting that 'master' exists, the same way the 'mv' command does. $ git branch -m tset master fatal: branch 'tset' doesn't exist. That has the problem that the error about an existing branch is shown only after the user corrects the error about inexistent branch. $ git branch -m test master fatal: A branch named 'master' already exists. This isn't useful either because the user would have corrected this error in a single go if he had been told this alongside the first error. So, give more useful error messages by giving errors about old branch name and new branch name at the same time. This is possible as the branch update validation function now returns the reason it was about to die, when requested. $ git branch -m tset master fatal: branch 'tset' doesn't exist, and branch 'master' already exists Note: Thanks to the strbuf API that made it possible to easily construct the composite error message strings! Signed-off-by: Kaartic Sivaraam --- builtin/branch.c | 48 ++-- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/builtin/branch.c b/builtin/branch.c index 205c12a11..27d24e83d 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -456,25 +456,56 @@ static void reject_rebase_or_bisect_branch(const char *target) free_worktrees(worktrees); } +static void get_error_msg(struct strbuf* error_msg, const char* oldname, unsigned old_branch_exists, + const char* newname, int new_branch_validation_result) +{ + const char* connector_string = ", and "; + const unsigned connector_length = 6; + unsigned connector_added = 0; + + if (!old_branch_exists) { + strbuf_addf(error_msg, _("branch '%s' doesn't exist"), oldname); + + /* add the 'connector_string' and remove it later if it's not needed */ + strbuf_addstr(error_msg, connector_string); + connector_added = 1; + } + + switch (new_branch_validation_result) { + case BRANCH_EXISTS: + strbuf_addf(error_msg, _("branch '%s' already exists"), newname); + break; + case CANNOT_FORCE_UPDATE_CURRENT_BRANCH: + strbuf_addstr(error_msg, _("cannot force update the current branch")); + break; + case INVALID_BRANCH_NAME: + strbuf_addf(error_msg, _("branch name '%s' is invalid"), newname); + break; + case VALID_BRANCH_NAME: + case FORCE_UPDATING_BRANCH: + if(connector_added) + strbuf_remove(error_msg, error_msg->len-connector_length, connector_length); + } +} + static void rename_branch(const char *oldname, const char *newname, int force) { struct strbuf oldref = STRBUF_INIT, newref = STRBUF_INIT, logmsg = STRBUF_INIT; struct strbuf oldsection = STRBUF_INIT, newsection = STRBUF_INIT; int recovery = 0; int clobber_head_ok; + struct strbuf error_msg = STRBUF_INIT, empty = STRBUF_INIT; if (!oldname) die(_("cannot rename the current branch while not on any.")); - if (strbuf_check_branch_ref(&oldref, oldname)) { + if (strbuf_check_branch_ref(&oldref, oldname) && ref_exists(oldref.buf)) + { /* * Bad name --- this could be an attempt to rename a * ref that we used to allow to be created by accident. */ - if (ref_exists(oldref.buf)) - recovery = 1; - else - die(_("Invalid branch name: '%s'"), oldname); + recovery = 1; } /* @@ -483,7 +514,10 @@ static void rename_branch(const char *oldname, const char *newname, int force) */ clobber_head_ok = !strcmp(oldname, newname); - validate_branch_update(newname, &newref, force, clobber_head_ok, 0); + get_error_msg(&error_msg, oldname, ref_exists(oldref.buf), + newname, validate_branch_update(newname, &newref, force, clobber_head_ok, 1)); + if (strbuf_cmp(&error_msg, &empty)) + die("%s", error_msg.buf); reject_rebase_or_bisect_branch(oldref.buf); @@ -509,6 +543,8 @@ static void rename_branch(const char *oldname, const char *newname, int force) die(_("Branch is renamed, but update of config-file failed")); strbuf_release(&oldsection); strbuf_release(&newsection); +
[RFC PATCH 1/5] builtin/checkout: avoid usage of '!!'
Documentation/CodingGuidelines says, "Some clever tricks, like using the !! operator with arithmetic constructs, can be extremely confusing to others. Avoid them, unless there is a compelling reason to use them." There was a usage for which there's no compelling reason.So, replace such a usage as with something else that expresses the intent more clearly. Signed-off-by: Kaartic Sivaraam --- I think the expression, !!opts.new_branch_force is equivalent to, opts.new_branch_force != NULL in all cases. If it's not, let me know. builtin/checkout.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 5c202b7af..76859da9d 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -1281,11 +1281,10 @@ int cmd_checkout(int argc, const char **argv, const char *prefix) if (opts.new_branch) { struct strbuf buf = STRBUF_INIT; + int force = opts.new_branch_force != NULL; - opts.branch_exists = - validate_new_branchname(opts.new_branch, &buf, - !!opts.new_branch_force, - !!opts.new_branch_force); + opts.branch_exists = validate_new_branchname(opts.new_branch, &buf, +force, force); strbuf_release(&buf); } -- 2.14.1.868.g66c78774b