Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano wrote: > OK, thanks for working well together. So does this (1) build on > Jonathan's fsck-squelching series, or (2) ignores that and builds > filtering first, potentially leaving the codebase to a broken state > where it can create fsck-unclean repository until Jonathan's series > is rebased on top of this, or (3) something else? [*1*] Excluding the partialclone patch (patch 9), I think that the answer is (2), but I don't think that it leaves the codebase in a broken state. In particular, none of the code modifies the repo, so it can't create a fsck-unclean one. Maybe one could say that this leaves the codebase with features that no one will ever use in the absence of partial clone, but I don't think that's true - rev-list with blob-size/sparse-specification filter seems independently useful, at least (for example, when desiring to operate on a repo in a sparse way without going through a workdir), and if we're planning to allow listing of objects, we probably should allow packing as well (especially since this doesn't add much code). The above is relevant only if we can exclude the partialclone patch, but I think that we can and we should, as I wrote in my reply to Jeff Hostetler [1]. As for how this patch set (excluding the partialclone patch) interacts with my fsck series, they are relatively independent, as far as I can tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not the fetch and clone parts, which we plan to instead adapt from Jeff Hostetler's patches, as far as I know) on top of these and resend those out once discussion on this has settled. [1] https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/ > I also saw a patch marked as "this is from Jonathan's earlier work", > taking the authorship (which to me implies that the changes were > extensive enough), so I am a bit at loss envisioning how this piece > fits in the bigger picture together with the other piece. The patch you mentioned is the partialclone patch, which I think can be considered separately from the rest (as I said above).
Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file
Jeff Hostetler writes: > +static int add_excludes_from_buffer(char *buf, size_t size, > + const char *base, int baselen, > + struct exclude_list *el); > + > /* > * Given a file with name "fname", read it (either from disk, or from > * an index if 'istate' is non-null), parse it and store the > @@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char > *base, int baselen, > struct sha1_stat *sha1_stat) > { > struct stat st; > - int fd, i, lineno = 1; > + int fd; > size_t size = 0; > - char *buf, *entry; > + char *buf; > > fd = open(fname, O_RDONLY); > if (fd < 0 || fstat(fd, &st) < 0) { The post-context of this hunk is quite interesting in that there is a call to read_skip_worktree_file_from_index(); which essentially pretends as if we read from the filesystem but in fact it grabs the blob object name registered in the index and reads from it. The reason why it is interesting is because this patch adds yet nother "let's instead read from a blob object" function and there is no sign to make the existing one take advantage of the new function seen in this patch.
Re: [PATCH v2 1/1] completion: add remaining flags to checkout
Junio C Hamano writes: > Johannes Sixt writes: > >>> The flags --force and --ignore-other-worktrees are not added as they are >>> potentially dangerous. >>> >>> The flags --progress and --no-progress are only useful for scripting and are >>> therefore also not included. >>> >>> Signed-off-by: Thomas Braun >>> --- >>> contrib/completion/git-completion.bash | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/contrib/completion/git-completion.bash >>> b/contrib/completion/git-completion.bash >>> index d934417475..eb6ade6974 100644 >>> --- a/contrib/completion/git-completion.bash >>> +++ b/contrib/completion/git-completion.bash >>> @@ -1250,7 +1250,8 @@ _git_checkout () >>> --*) >>> __gitcomp " >>> --quiet --ours --theirs --track --no-track --merge >>> - --conflict= --orphan --patch >>> + --conflict= --orphan --patch --detach >>> --ignore-skip-worktree-bits >>> + --recurse-submodules --no-recurse-submodules >>> " >>> ;; >>> *) >>> >> >> Looks good to me. Thanks, >> -- Hannes > > Doesn't quite. This breaks t9902, doesn't it? I've queued it with the following squashed in. t/t9902-completion.sh | 4 1 file changed, 4 insertions(+) diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh index 2cb999ecfa..fc614dcbfa 100755 --- a/t/t9902-completion.sh +++ b/t/t9902-completion.sh @@ -1245,6 +1245,10 @@ test_expect_success 'double dash "git checkout"' ' --conflict= --orphan Z --patch Z + --detach Z + --ignore-skip-worktree-bits Z + --recurse-submodules Z + --no-recurse-submodules Z EOF '
Re: [PATCH v2 1/1] completion: add remaining flags to checkout
Johannes Sixt writes: >> The flags --force and --ignore-other-worktrees are not added as they are >> potentially dangerous. >> >> The flags --progress and --no-progress are only useful for scripting and are >> therefore also not included. >> >> Signed-off-by: Thomas Braun >> --- >> contrib/completion/git-completion.bash | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/contrib/completion/git-completion.bash >> b/contrib/completion/git-completion.bash >> index d934417475..eb6ade6974 100644 >> --- a/contrib/completion/git-completion.bash >> +++ b/contrib/completion/git-completion.bash >> @@ -1250,7 +1250,8 @@ _git_checkout () >> --*) >> __gitcomp " >> --quiet --ours --theirs --track --no-track --merge >> ---conflict= --orphan --patch >> +--conflict= --orphan --patch --detach >> --ignore-skip-worktree-bits >> +--recurse-submodules --no-recurse-submodules >> " >> ;; >> *) >> > > Looks good to me. Thanks, > -- Hannes Doesn't quite. This breaks t9902, doesn't it?
Re: [PATCHv2 0/2] (x)diff cleanup: remove duplicate code
Stefan Beller writes: > v2: > * I realized that we don't have to change the hashing function of xdiff; > we can rather change the hashing function for the move detection, > which is less fundamental. > (That way I can shrink the series down to 2 patches) > * commented and renamed function parameters in the exposed xdiff functions. > * applies on top of jk/diff-color-moved-fix. Yes, by reusing the line hashing and comparison from xdiff/ we can ensure that we will use consistent comparison function, and the thing we need to focus on will become how correctly the caller uses the xdiff interface. This looks much better than the previous one. Eric's comment on the function parameters is right. We keep them in sync with the naming convention of xdiff/ as long as they are still part of xdiff layer, and the convention there is that the lines being compared are l1[] and l2[] whose lengths are s1 and s2, if I am not mistaken (well, I am not, as I just touched the function there during my lunch break ;-).
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
Jeff Hostetler writes: > From: Jeff Hostetler > > I've been working with Jonathan Tan to combine our partial clone > proposals. This patch series represents a first step in that effort > and introduces an object filtering mechanism to select unwanted > objects. > > [1] traverse_commit_list and list-objects is extended to allow > various filters. > [2] rev-list is extended to expose filtering. This allows testing > of the filtering options. And can be used later to predict > missing objects before commands like checkout or merge. > [3] pack-objects is extended to use filtering parameters and build > packfiles that omit unwanted objects. > > This patch series lays the ground work for subsequent parts which > will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. OK, thanks for working well together. So does this (1) build on Jonathan's fsck-squelching series, or (2) ignores that and builds filtering first, potentially leaving the codebase to a broken state where it can create fsck-unclean repository until Jonathan's series is rebased on top of this, or (3) something else? [*1*] I also saw a patch marked as "this is from Jonathan's earlier work", taking the authorship (which to me implies that the changes were extensive enough), so I am a bit at loss envisioning how this piece fits in the bigger picture together with the other piece. [Footnote] *1* Not having the answer to this question does bother me, but it is perfectly fine if the answer is (2), especially while the series is in a WIP state.
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > From: Jeff Hostetler > > I've been working with Jonathan Tan to combine our partial clone > proposals. This patch series represents a first step in that effort > and introduces an object filtering mechanism to select unwanted > objects. > > [1] traverse_commit_list and list-objects is extended to allow > various filters. > [2] rev-list is extended to expose filtering. This allows testing > of the filtering options. And can be used later to predict > missing objects before commands like checkout or merge. > [3] pack-objects is extended to use filtering parameters and build > packfiles that omit unwanted objects. > > This patch series lays the ground work for subsequent parts which > will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. Thanks - I've taken a look at all of them except for the partialclone extension one, which I've only glanced over briefly. Apart from some style issues (indentation and typedef-ing of enums) I think that they generally look all right. One possible issue with using a formatted filter string (e.g. blob:none) directly passed to the server as opposed to actual client-interpreted flags (e.g. --blob-byte-limit=0) is that a user may be confused if the version of Git they're using supports fancier filters, which will work if they're running rev-list locally, but not when they're fetching from a not-so-fancy Git server. Maybe that is fine, though - something we've discussed internally is an ability to offload some calculations (e.g. git log -S) to Git servers, and if we end up doing something similar to that, users will need to deal with this problem anyway. The reason why I only glanced over the partialclone patch is because I think that that change needs more discussion than the rest, and it would be good to get the others in first. I know that you included the partialclone patch because it is needed by the rev-list one, but as I commented in the rev-list one, I think that it does not need to be aware of partial clones, at least for now. The overall ideas in the partialclone patch seem good, though - in particular, one conceptual difference from my patch [1] is that the filter setting is a property of the repository as opposed to the remote, which does seem like an improvement, because it makes it easier to make and explain e.g. a "git log --use-repo-filter -S" command that uses the preconfigured config. [1] https://public-inbox.org/git/407a298b52a9e0a2ee4135fe844e35b9a14c0f7b.1506714999.git.jonathanta...@google.com/
Re: Consequences of CRLF in index?
Junio C Hamano writes: > Jonathan Nieder writes: > >> I'd be interested to hear what happens when diff-ing across a line >> ending fixup commit. Is this an area where Git needs some >> improvement? "git merge" knows an -Xrenormalize option to deal with a >> related problem --- it's possible that "git diff" needs to learn a >> similar trick. > > My knee-jerk reaction is that (1) the end user definitely wants to > see preimage and postimage lines are different in such a commit by > default, one side has and the other side lacks ^M at the end., and > (2) one of the "whitespace ignoring" options (namely, "ignore space > at eol") may suffice, but if not, it should be easy to invent a new > one "ignore CR at eol". Here is a lunch-time hack to add that option. As you can see from the lack of docs, tests and a proper log message, I haven't played with it long enough to know how buggy it is, though. I am not all that interested in polishing it to completion myself and prefer to leave it as #leftoverbits ;-) Also I didn't bother teaching this to Stefan's color-moved thing, as the line comparator it uses will hopefully be unified with the one I am touching in xdiff/ with this patch. Have fun. diff.c| 5 - merge-recursive.c | 2 ++ xdiff/xdiff.h | 3 ++- xdiff/xutils.c| 34 -- 4 files changed, 40 insertions(+), 4 deletions(-) diff --git a/diff.c b/diff.c index 6fd288420b..eeca0fd3b8 100644 --- a/diff.c +++ b/diff.c @@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options) if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) || DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) || - DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL)) + DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) || + DIFF_XDL_TST(options, IGNORE_CR_AT_EOL)) DIFF_OPT_SET(options, DIFF_FROM_CONTENTS); else DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS); @@ -4659,6 +4660,8 @@ int diff_opt_parse(struct diff_options *options, DIFF_XDL_SET(options, IGNORE_WHITESPACE_CHANGE); else if (!strcmp(arg, "--ignore-space-at-eol")) DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); + else if (!strcmp(arg, "--ignore-cr-at-eol")) + DIFF_XDL_SET(options, IGNORE_CR_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); else if (!strcmp(arg, "--indent-heuristic")) diff --git a/merge-recursive.c b/merge-recursive.c index 1d3f8f0d22..4a49ec93b1 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -2251,6 +2251,8 @@ int parse_merge_opt(struct merge_options *o, const char *s) DIFF_XDL_SET(o, IGNORE_WHITESPACE); else if (!strcmp(s, "ignore-space-at-eol")) DIFF_XDL_SET(o, IGNORE_WHITESPACE_AT_EOL); + else if (!strcmp(s, "ignore-cr-at-eol")) + DIFF_XDL_SET(o, IGNORE_CR_AT_EOL); else if (!strcmp(s, "renormalize")) o->renormalize = 1; else if (!strcmp(s, "no-renormalize")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index b090ad8eac..ff16243da9 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -32,7 +32,8 @@ extern "C" { #define XDF_IGNORE_WHITESPACE (1 << 2) #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3) #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4) -#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL) +#define XDF_IGNORE_CR_AT_EOL (1 << 5) +#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | XDF_IGNORE_CR_AT_EOL) #define XDF_PATIENCE_DIFF (1 << 5) #define XDF_HISTOGRAM_DIFF (1 << 6) diff --git a/xdiff/xutils.c b/xdiff/xutils.c index 04d7b32e4e..8720e272b9 100644 --- a/xdiff/xutils.c +++ b/xdiff/xutils.c @@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags) return (i == size); } +/* + * Have we eaten everything on the line, except for an optional + * CR at the very end? + */ +static int ends_with_optional_cr(const char *l, long s, long i) +{ + if (s && l[s-1] == '\n') + s--; + if (s == i) + return 1; + if (s == i + 1 && l[i] == '\r') + return 1; + return 0; +} + int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags) { int i1, i2; @@ -170,7 +185,8 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags) /* * -w matches everything that matches with -b, and -b in turn -* matches everything that matches with --ignore-space-at-eol. +* matches everything that matches with --ignore-space-at-eol, +* which in turn matches everything that matches with --ignore-cr-at-eol. * * Each flavor of ignoring needs different logic to skip whitespaces *
Re: [PATCH 10/13] rev-list: add list-objects filtering support
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > static void finish_object(struct object *obj, const char *name, void > *cb_data) > { > struct rev_list_info *info = cb_data; > - if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) > + if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) { > + if (arg_print_missing) { > + list_objects_filter_map_insert( > + &missing_objects, &obj->oid, name, obj->type); > + return; > + } > + > + /* > +* Relax consistency checks when we expect missing > +* objects because of partial-clone or a previous > +* partial-fetch. > +* > +* Note that this is independent of any filtering that > +* we are doing in this run. > +*/ > + if (is_partial_clone_registered()) > + return; > + > die("missing blob object '%s'", oid_to_hex(&obj->oid)); I'm fine with arg_print_missing suppressing lazy fetching (when I rebase my patches on this, I'll have to ensure that fetch_if_missing is set to 0 if arg_print_missing is true), but I think that the behavior when arg_print_missing is false should be the opposite - we should let has_object_file() perform the lazy fetching, and die if it returns false (that is, if the fetching failed). > + } > if (info->revs->verify_objects && !obj->parsed && obj->type != > OBJ_COMMIT) > parse_object(&obj->oid); > }
Re: [PATCH 1/2] xdiff-interface: export comparing and hashing strings
On Tue, Oct 24, 2017 at 7:42 PM, Stefan Beller wrote: > This will turn out to be useful in a later patch. > > xdl_recmatch is exported in xdiff/xutils.h, to be used by various > xdiff/*.c files, but not outside of xdiff/. This one makes it available > to the outside, too. > > While at it, add documentation. > > Signed-off-by: Stefan Beller > --- > diff --git a/xdiff-interface.h b/xdiff-interface.h > @@ -29,4 +29,20 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg); > +/* > + * Compare the strings l1 with l2 which are of size s1 and s2 respectively. > + * Returns 1 if the strings are deemed equal, 0 otherwise. > + * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces > + * are treated for the comparision. > + */ > +extern int xdiff_compare_lines(const char *a, long len_a, > + const char *b, long len_b, long flags); The comment block talks about 'l1', 'l2', 's1', and 's2', but the declaration uses 'a', 'b, 'len_a', 'len_b'. Confusing.
Re: [PATCH 08/13] list-objects: add traverse_commit_list_filtered method
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > +void traverse_commit_list_filtered( > + struct list_objects_filter_options *filter_options, > + struct rev_info *revs, > + show_commit_fn show_commit, > + show_object_fn show_object, > + list_objects_filter_map_foreach_cb print_omitted_object, > + void *show_data); So the function call chain, if we wanted a filtered traversal, is: traverse_commit_list_filtered -> traverse_commit_list__sparse_path (and friends, and each algorithm is in its own file) -> traverse_commit_list_worker This makes the implementation of each algorithm more easily understood (since they are all in their own files), but also increases the number of global functions and code files. I personally would combine the traverse_commit_list__* functions into one file (list-objects-filtered.c), make them static, and also put traverse_commit_list_filtered in there, but I understand that other people in the Git project may differ on this.
Re: [PATCH 07/13] list-objects-filter-options: common argument parsing
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > + * ::= blob:none > + * blob:limit:[kmg] > + * sparse:oid: > + * sparse:path: I notice in the code below that there are some usages of "=" instead of ":" - could you clarify which one it is? (Ideally this would point to one point of documentation which serves as both user and technical documentation.) > + */ > +int parse_list_objects_filter(struct list_objects_filter_options > *filter_options, > + const char *arg) > +{ > + struct object_context oc; > + struct object_id sparse_oid; > + const char *v0; > + const char *v1; > + > + if (filter_options->choice) > + die(_("multiple object filter types cannot be combined")); > + > + /* > +* TODO consider rejecting 'arg' if it contains any > +* TODO injection characters (since we might send this > +* TODO to a sub-command or to the server and we don't > +* TODO want to deal with legacy quoting/escaping for > +* TODO a new feature). > +*/ > + > + filter_options->raw_value = strdup(arg); > + > + if (skip_prefix(arg, "blob:", &v0) || skip_prefix(arg, "blobs:", > &v0)) { I know that some people prefer leniency, but I think it's better to standardize on one form ("blob" instead of both "blob" and "blobs"). > + if (!strcmp(v0, "none")) { > + filter_options->choice = LOFC_BLOB_NONE; > + return 0; > + } > + > + if (skip_prefix(v0, "limit=", &v1) && > + git_parse_ulong(v1, &filter_options->blob_limit_value)) { > + filter_options->choice = LOFC_BLOB_LIMIT; > + return 0; > + } > + } > + else if (skip_prefix(arg, "sparse:", &v0)) { > + if (skip_prefix(v0, "oid=", &v1)) { > + filter_options->choice = LOFC_SPARSE_OID; > + if (!get_oid_with_context(v1, GET_OID_BLOB, > + &sparse_oid, &oc)) { > + /* > +* We successfully converted the > +* into an actual OID. Rewrite the raw_value > +* in canonoical form with just the OID. > +* (If we send this request to the server, we > +* want an absolute expression rather than a > +* local-ref-relative expression.) > +*/ > + free((char *)filter_options->raw_value); > + filter_options->raw_value = > + xstrfmt("sparse:oid=%s", > + oid_to_hex(&sparse_oid)); > + filter_options->sparse_oid_value = > + oiddup(&sparse_oid); > + } else { > + /* > +* We could not turn the into an > +* OID. Leave the raw_value as is in case > +* the server can parse it. (It may refer to > +* a branch, commit, or blob we don't have.) > +*/ > + } > + return 0; > + } > + > + if (skip_prefix(v0, "path=", &v1)) { > + filter_options->choice = LOFC_SPARSE_PATH; > + filter_options->sparse_path_value = strdup(v1); > + return 0; > + } > + } > + > + die(_("invalid filter expression '%s'"), arg); > + return 0; > +} > + > +int opt_parse_list_objects_filter(const struct option *opt, > + const char *arg, int unset) > +{ > + struct list_objects_filter_options *filter_options = opt->value; > + > + assert(arg); > + assert(!unset); > + > + return parse_list_objects_filter(filter_options, arg); > +} > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h > new file mode 100644 > index 000..23bd68e > --- /dev/null > +++ b/list-objects-filter-options.h > @@ -0,0 +1,50 @@ > +#ifndef LIST_OBJECTS_FILTER_OPTIONS_H > +#define LIST_OBJECTS_FILTER_OPTIONS_H > + > +#include "parse-options.h" > + > +/* > + * Common declarations and utilities for filtering objects (such as omitting > + * large blobs) in list_objects:traverse_commit_list() and git-rev-list. > + */ > + > +enum list_objects_filter_choice { > + LOFC_DISABLED = 0, > + LOFC_BLOB_NONE, > + LOFC_BLOB_LIMIT, > + LOFC_SPARSE_OID, > + LOFC_SPARSE_PATH, > +}; > + > +struct list_objects_filter_optio
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano wrote: > Jeff King writes: >> I definitely agree with the sentiment that as few things as possible >> should happen between calling getenv() and using its result. I've seen >> real bugs there from unexpected invalidation of the static buffer. > > Sure. I do not think we mind xstrdup()ing the result and freeing > after we are done, though ;-) More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.
Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file
On Tue, Oct 24, 2017 at 2:53 PM, Jeff Hostetler wrote: > Refactor add_excludes() to separate the reading of the > exclude file into a buffer and the parsing of the buffer > into exclude_list items. > > Add add_excludes_from_blob_to_list() to allow an exclude > file be specified with an OID. > > Signed-off-by: Jeff Hostetler > --- > diff --git a/dir.c b/dir.c > @@ -841,6 +856,38 @@ int add_excludes_from_file_to_list(const char *fname, > const char *base, > +int add_excludes_from_blob_to_list( > + struct object_id *oid, > + const char *base, int baselen, > + struct exclude_list *el) > +{ > + char *buf; > + unsigned long size; > + enum object_type type; > + > + buf = read_sha1_file(oid->hash, &type, &size); > + if (!buf) > + return -1; > + > + if (type != OBJ_BLOB) { > + free(buf); > + return -1; > + } > + > + if (size == 0) { > + free(buf); > + return 0; > + } > + > + if (buf[size - 1] != '\n') { > + buf = xrealloc(buf, st_add(size, 1)); > + buf[size++] = '\n'; > + } > + > + add_excludes_from_buffer(buf, size, base, baselen, el); Seeing all the free()'s above, at first glance, one wonders why 'buf' isn't freed here after add_excludes_from_buffer(), however an examination of that function shows that 'buf' is assigned to el->filebuf and later freed by clear_exclude_list(). Okay. > + return 0; Should this be returning the result of add_excludes_from_buffer() rather than unconditionally returning 0? > +}
Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > +enum list_objects_filter_result { > + LOFR_ZERO = 0, > + LOFR_MARK_SEEN = 1<<0, Probably worth documenting, something like /* Mark this object so that it is skipped for the rest of the traversal. */ > + LOFR_SHOW = 1<<1, And something like /* Invoke show_object_fn on this object. This object may be revisited unless LOFR_MARK_SEEN is also set. */ > +}; > + > +/* See object.h and revision.h */ > +#define FILTER_REVISIT (1<<25) I think this should be declared closer to its use - in the sparse filter code or in the file that uses it. Wherever it is, also update the chart in object.h to indicate that we're using this 25th bit. > + > +enum list_objects_filter_type { > + LOFT_BEGIN_TREE, > + LOFT_END_TREE, > + LOFT_BLOB > +}; > + > +typedef enum list_objects_filter_result list_objects_filter_result; > +typedef enum list_objects_filter_type list_objects_filter_type; I don't think we typedef enums in Git code. > + > +typedef list_objects_filter_result (*filter_object_fn)( > + list_objects_filter_type filter_type, > + struct object *obj, > + const char *pathname, > + const char *filename, > + void *filter_data); > + > +void traverse_commit_list_worker( > + struct rev_info *, > + show_commit_fn, show_object_fn, void *show_data, > + filter_object_fn filter, void *filter_data); I think things would be much clearer if a default filter was declared (matching the behavior as of this patch when filter == NULL), say: static inline default_filter(args) { switch(filter_type) { case LOFT_BEGIN_TREE: return LOFR_MARK_SEEN | LOFR_SHOW; case LOFT_END_TREE: return LOFT_ZERO; ... And inline traverse_commit_list() instead of putting it in the .c file. This would reduce or eliminate the need to document traverse_commit_list_worker, including what happens if filter is NULL, and explain how a user would make their own filter_object_fn. > + > +#endif /* LIST_OBJECTS_H */ > -- > 2.9.3 >
Re: [PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure
Michael Haggerty writes: > ... But dc39e09942 added another blurb of code between > the loop and the cleanup. That blurb sometimes resets `ret` to zero, > making the cleanup code think that the locking was successful. > ... > The fix is simple: instead of just breaking out of the loop, jump > directly to the cleanup code. This fixes some tests in t1404 that were > added in the previous commit. OK. Now because we do not break and start packed_transaction but instead jump over that if statement, we'll leave packed_transation instance that we got from transaction_begin() that we called add_update() on, but haven't called transaction_prepare() on behind. That instance is pointed by backend_data pointer which is part of transaction, so presumably transaction_cleanup() called on it in the section labelled "cleanup:" should take care of it? Thanks for catching the issue and fixing it quickly. Will queue.
Re: Consequences of CRLF in index?
Jonathan Nieder writes: > I'd be interested to hear what happens when diff-ing across a line > ending fixup commit. Is this an area where Git needs some > improvement? "git merge" knows an -Xrenormalize option to deal with a > related problem --- it's possible that "git diff" needs to learn a > similar trick. My knee-jerk reaction is that (1) the end user definitely wants to see preimage and postimage lines are different in such a commit by default, one side has and the other side lacks ^M at the end., and (2) one of the "whitespace ignoring" options (namely, "ignore space at eol") may suffice, but if not, it should be easy to invent a new one "ignore CR at eol".
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
Jeff King writes: > I definitely agree with the sentiment that as few things as possible > should happen between calling getenv() and using its result. I've seen > real bugs there from unexpected invalidation of the static buffer. Sure. I do not think we mind xstrdup()ing the result and freeing after we are done, though ;-)
Re: [PATCH] status: do not get confused by submodules in excluded directories
Heiko Voigt writes: > On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote: >> Johannes Schindelin writes: >> >> > We meticulously pass the `exclude` flag to the `treat_directory()` >> > function so that we can indicate that files in it are excluded rather >> > than untracked when recursing. >> > >> > But we did not yet treat submodules the same way. >> >> ... "because of that, we ended up showing <> what situation>>" would be a nice thing to have here, so that it can >> be copied to the release notes for the bugfix. > > Yes I agree that would be nice here. It was not immediately obvious that > this only applies when using both flags: -u and --ignored. Does any of you care to fill in the <> then? ;-) > Looks good to me. > > Cheers Heiko
Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only
Junio C Hamano writes: > Brandon Williams writes: > >> One simple idea would be to convert the single 'flag' into various bit >> fields themselves, that way if you need to add a new flag you would just >> make a new bit field. I'm unaware of any downsides of doing so (though >> i may be missing something) but doing so would probably cause a bit of >> code churn. > > The reason why people want to have their own bit in the flags word > is because they want to use DIFF_OPT_{SET,CLR,TST,TOUCHED} but they > do not want to do the work to extend them beyond a single word. > > I think it is doable by making everything a 1-bit wide bitfield > without affecting existing users. ... but the "touched" thing may be harder---I haven't thought it through.
Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only
Brandon Williams writes: > One simple idea would be to convert the single 'flag' into various bit > fields themselves, that way if you need to add a new flag you would just > make a new bit field. I'm unaware of any downsides of doing so (though > i may be missing something) but doing so would probably cause a bit of > code churn. The reason why people want to have their own bit in the flags word is because they want to use DIFF_OPT_{SET,CLR,TST,TOUCHED} but they do not want to do the work to extend them beyond a single word. I think it is doable by making everything a 1-bit wide bitfield without affecting existing users.
Re: [RFC] protocol version 2
Brandon Williams writes: >> I actually have a reasonable guess why you want to have a separate >> delimiter (which has nothing to do with "optional delim can be >> omitted"), but I want to see it explained in this document clearly >> by its designer(s). > > Jonathan Tan suggested that we tighten flush semantics in a newer > protocol so that proxies are easier to work with. Currently proxies > need to understand the protocol instead of simply waiting for a flush. > > Also I've been told the smart http code is more complex because of the > current semantics of flush packets. I think the above two are the same thing ;-) but yes, "flush" in the original protocol were used for both "I am truly finished talking; now it is your turn" and "I am done with one section of what I need to say, and a different section now begins; it is still my turn to speak". The need to handle the latter makes smart-http quite ugly. Thanks.
[PATCH 1/5] fnv: migrate code from hashmap to fnv
The functions regarding the Fowler–Noll–Vo hash function are put in a separate compilation unit, as it is logically different from the hashmap functionality. Signed-off-by: Stefan Beller --- This may still be an interesting cleanup on its own. Thanks, Stefan Makefile| 1 + attr.c | 1 + builtin/fast-export.c | 1 + diff.c | 1 + fnv.c | 64 + fnv.h | 20 hashmap.c | 64 + hashmap.h | 15 merge-recursive.c | 1 + remote.c| 1 + submodule-config.c | 1 + t/helper/test-hashmap.c | 1 + 12 files changed, 93 insertions(+), 78 deletions(-) create mode 100644 fnv.c create mode 100644 fnv.h diff --git a/Makefile b/Makefile index cd75985991..8e1c5988f3 100644 --- a/Makefile +++ b/Makefile @@ -793,6 +793,7 @@ LIB_OBJS += ewah/ewah_io.o LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o LIB_OBJS += fetch-pack.o +LIB_OBJS += fnv.o LIB_OBJS += fsck.o LIB_OBJS += gettext.o LIB_OBJS += gpg-interface.o diff --git a/attr.c b/attr.c index dfc3a558d8..2e4217c4f1 100644 --- a/attr.c +++ b/attr.c @@ -16,6 +16,7 @@ #include "utf8.h" #include "quote.h" #include "thread-utils.h" +#include "fnv.h" const char git_attr__true[] = "(builtin)true"; const char git_attr__false[] = "\0(builtin)false"; diff --git a/builtin/fast-export.c b/builtin/fast-export.c index 2fb60d6d48..62f4010510 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -21,6 +21,7 @@ #include "quote.h" #include "remote.h" #include "blob.h" +#include "fnv.h" static const char *fast_export_usage[] = { N_("git fast-export [rev-list-opts]"), diff --git a/diff.c b/diff.c index c4a669ffa8..a23f4521fb 100644 --- a/diff.c +++ b/diff.c @@ -22,6 +22,7 @@ #include "argv-array.h" #include "graph.h" #include "packfile.h" +#include "fnv.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 diff --git a/fnv.c b/fnv.c new file mode 100644 index 00..b4cbf39f0a --- /dev/null +++ b/fnv.c @@ -0,0 +1,64 @@ +#include "fnv.h" + +#define FNV32_BASE ((unsigned int) 0x811c9dc5) +#define FNV32_PRIME ((unsigned int) 0x01000193) + +unsigned int strhash(const char *str) +{ + unsigned int c, hash = FNV32_BASE; + while ((c = (unsigned char) *str++)) + hash = (hash * FNV32_PRIME) ^ c; + return hash; +} + +unsigned int strihash(const char *str) +{ + unsigned int c, hash = FNV32_BASE; + while ((c = (unsigned char) *str++)) { + if (c >= 'a' && c <= 'z') + c -= 'a' - 'A'; + hash = (hash * FNV32_PRIME) ^ c; + } + return hash; +} + +unsigned int memhash(const void *buf, size_t len) +{ + unsigned int hash = FNV32_BASE; + unsigned char *ucbuf = (unsigned char *) buf; + while (len--) { + unsigned int c = *ucbuf++; + hash = (hash * FNV32_PRIME) ^ c; + } + return hash; +} + +unsigned int memihash(const void *buf, size_t len) +{ + unsigned int hash = FNV32_BASE; + unsigned char *ucbuf = (unsigned char *) buf; + while (len--) { + unsigned int c = *ucbuf++; + if (c >= 'a' && c <= 'z') + c -= 'a' - 'A'; + hash = (hash * FNV32_PRIME) ^ c; + } + return hash; +} + +/* + * Incoporate another chunk of data into a memihash + * computation. + */ +unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len) +{ + unsigned int hash = hash_seed; + unsigned char *ucbuf = (unsigned char *) buf; + while (len--) { + unsigned int c = *ucbuf++; + if (c >= 'a' && c <= 'z') + c -= 'a' - 'A'; + hash = (hash * FNV32_PRIME) ^ c; + } + return hash; +} diff --git a/fnv.h b/fnv.h new file mode 100644 index 00..b425c85c66 --- /dev/null +++ b/fnv.h @@ -0,0 +1,20 @@ +#ifndef FNV_H +#define FNV_H + +#include +/* + * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see + * http://www.isthe.com/chongo/tech/comp/fnv). + * `strhash` and `strihash` take 0-terminated strings, while `memhash` and + * `memihash` operate on arbitrary-length memory. + * `strihash` and `memihash` are case insensitive versions. + * `memihash_cont` is a variant of `memihash` that allows a computation to be + * continued with another chunk of data. + */ +extern unsigned int strhash(const char *buf); +extern unsigned int strihash(const char *buf); +extern unsigned int memhash(const void *buf, size_t len); +extern unsigned int memihash(const void *buf, size_t len); +extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len); + +#endif diff --git a/hashmap.c b/hashmap.c index d42f01ff5a..1605edbbc3 100644 --
[PATCH 2/2] diff.c: get rid of duplicate implementation
The implementations in diff.c to detect moved lines needs to compare strings and hash strings, which is implemented in that file, as well as in the xdiff library. Remove the rather recent implementation in diff.c and rely on the well exercised code in the xdiff lib. With this change the hash used for bucketing the strings for the moved line detection changes from FNV32 (that is provided via the hashmaps memhash) to DJB2 (which is used internally in xdiff). Benchmarks found on the web[1] do not indicate that these hashes are different in performance for readable strings. [1] https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed Signed-off-by: Stefan Beller --- diff.c | 82 -- 1 file changed, 4 insertions(+), 78 deletions(-) diff --git a/diff.c b/diff.c index c4a669ffa8..e6814b9e9c 100644 --- a/diff.c +++ b/diff.c @@ -707,88 +707,14 @@ struct moved_entry { struct moved_entry *next_line; }; -static int next_byte(const char **cp, const char **endp, -const struct diff_options *diffopt) -{ - int retval; - - if (*cp >= *endp) - return -1; - - if (isspace(**cp)) { - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) { - while (*cp < *endp && isspace(**cp)) - (*cp)++; - /* -* After skipping a couple of whitespaces, -* we still have to account for one space. -*/ - return (int)' '; - } - - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) { - while (*cp < *endp && isspace(**cp)) - (*cp)++; - /* -* return the first non-ws character via the usual -* below, unless we ate all of the bytes -*/ - if (*cp >= *endp) - return -1; - } - } - - retval = (unsigned char)(**cp); - (*cp)++; - return retval; -} - static int moved_entry_cmp(const struct diff_options *diffopt, const struct moved_entry *a, const struct moved_entry *b, const void *keydata) { - const char *ap = a->es->line, *ae = a->es->line + a->es->len; - const char *bp = b->es->line, *be = b->es->line + b->es->len; - - if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS)) - return a->es->len != b->es->len || memcmp(ap, bp, a->es->len); - - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) { - while (ae > ap && isspace(ae[-1])) - ae--; - while (be > bp && isspace(be[-1])) - be--; - } - - while (1) { - int ca, cb; - ca = next_byte(&ap, &ae, diffopt); - cb = next_byte(&bp, &be, diffopt); - if (ca != cb) - return 1; - if (ca < 0) - return 0; - } -} - -static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_options *o) -{ - if (o->xdl_opts & XDF_WHITESPACE_FLAGS) { - static struct strbuf sb = STRBUF_INIT; - const char *ap = es->line, *ae = es->line + es->len; - int c; - - strbuf_reset(&sb); - while (ae > ap && isspace(ae[-1])) - ae--; - while ((c = next_byte(&ap, &ae, o)) >= 0) - strbuf_addch(&sb, c); - - return memhash(sb.buf, sb.len); - } else { - return memhash(es->line, es->len); - } + return !xdiff_compare_lines(a->es->line, a->es->len, + b->es->line, b->es->len, + diffopt->xdl_opts); } static struct moved_entry *prepare_entry(struct diff_options *o, @@ -797,7 +723,7 @@ static struct moved_entry *prepare_entry(struct diff_options *o, struct moved_entry *ret = xmalloc(sizeof(*ret)); struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no]; - ret->ent.hash = get_string_hash(l, o); + ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts); ret->es = l; ret->next_line = NULL; -- 2.15.0.rc2.6.g953226eb5f
[PATCH 1/2] xdiff-interface: export comparing and hashing strings
This will turn out to be useful in a later patch. xdl_recmatch is exported in xdiff/xutils.h, to be used by various xdiff/*.c files, but not outside of xdiff/. This one makes it available to the outside, too. While at it, add documentation. Signed-off-by: Stefan Beller --- xdiff-interface.c | 11 +++ xdiff-interface.h | 16 2 files changed, 27 insertions(+) diff --git a/xdiff-interface.c b/xdiff-interface.c index 018e033089..9b35af2455 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -5,6 +5,7 @@ #include "xdiff/xdiffi.h" #include "xdiff/xemit.h" #include "xdiff/xmacros.h" +#include "xdiff/xutils.h" struct xdiff_emit_state { xdiff_emit_consume_fn consume; @@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg) } } +unsigned long xdiff_hash_string(const char *s, size_t len, long flags) +{ + return xdl_hash_record(&s, s + len, flags); +} + +int xdiff_compare_lines(const char *a, long len_a, const char *b, long len_b, long flags) +{ + return xdl_recmatch(a, len_a, b, len_b, flags); +} + int git_xmerge_style = -1; int git_xmerge_config(const char *var, const char *value, void *cb) diff --git a/xdiff-interface.h b/xdiff-interface.h index 6f6ba9095d..6f5abaf8d3 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -29,4 +29,20 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg); extern int git_xmerge_config(const char *var, const char *value, void *cb); extern int git_xmerge_style; +/* + * Compare the strings l1 with l2 which are of size s1 and s2 respectively. + * Returns 1 if the strings are deemed equal, 0 otherwise. + * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces + * are treated for the comparision. + */ +extern int xdiff_compare_lines(const char *a, long len_a, + const char *b, long len_b, long flags); + +/* + * Returns a hash of the string s of length len. + * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces + * are treated for the hash. + */ +extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags); + #endif -- 2.15.0.rc2.6.g953226eb5f
[PATCHv2 0/2] (x)diff cleanup: remove duplicate code
v2: * I realized that we don't have to change the hashing function of xdiff; we can rather change the hashing function for the move detection, which is less fundamental. (That way I can shrink the series down to 2 patches) * commented and renamed function parameters in the exposed xdiff functions. * applies on top of jk/diff-color-moved-fix. Thanks, Stefan v1: Junio wrote: > * I was hoping that the next_byte() and string_hash() thing, once >they are cleaned up, will eventually be shared with the xdiff/ >code at the lower layer, which needs to do pretty much the same >in order to implement various whitespace ignoring options. I am >not sure how well the approach taken by the WIP patch meshes with >the needs of the lower layer. This series does exactly this; although I chose to reuse the code in xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that code has seen more exercise already (hence I assume it has fewer bugs if any as well as its performance implications are well understood). However to do so, we need to pollute xdiff/xutils.c and include hashmap.h there (which also requires cache.h as that header has an inline function using BUG()), which I find a bit unfortunate but worth the trade off of using better tested code. Thanks, Stefan Stefan Beller (2): xdiff-interface: export comparing and hashing strings diff.c: get rid of duplicate implementation diff.c| 82 +++ xdiff-interface.c | 11 xdiff-interface.h | 16 +++ 3 files changed, 31 insertions(+), 78 deletions(-) -- 2.15.0.rc2.6.g953226eb5f
Re: [PATCH 3/4] xdiff: use stronger hash function internally
Stefan Beller wrote: > --- a/xdiff/xutils.c > +++ b/xdiff/xutils.c > @@ -24,7 +24,8 @@ > #include > #include "xinclude.h" > > - > +#include "cache.h" > +#include "hashmap.h" #includes like "git-compat-util.h" and "cache.h" can only be used as the *first* #include. Otherwise the feature test macros they set do not take effect, causing lots of unpleasant platform-specific breakage. Thanks, Jonathan
Re: [GIT PULL] l10n updates for 2.15.0 round 2
El mar, 24-10-2017 a las 11:48 +0900, Junio C Hamano escribió: > > Thanks, will pull. Nice to see a new language added to the > repertoire. > Thank you, I am very happy to be able to help the community. And bring Git a little closer to Latin America and Spain. Regards -- Christopher Díaz Riveros Gentoo Developer
Re: Consequences of CRLF in index?
Am 24.10.2017 um 19:48 schrieb Lars Schneider: I've migrated a large repo (110k+ files) with a lot of history (177k commits) and a lot of users (200+) to Git. Unfortunately, all text files in the index of the repo have CRLF line endings. In general this seems not to be a problem as the project is developed exclusively on Windows. However, I wonder if there are any "hidden consequences" of this setup? I've been working on a project with CRLF in every source file for a decade now. It's C++ source, and it isn't even Windows-only: when checked out on Linux, there are CRs in the files, with no bad consequences so far. GCC is happy with them. -- Hannes
Re: [PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash
>> +unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next) > > Why is the second parameter const and the first one isn't? (We tend > not to bother with const for value types.) will do. >> + hash = memhash_feed(hash, c); > > I guess compilers inline a copy of the function here with -O2. My > knee-jerk reaction, however, is horror in the face of adding a function > call to the inner loop of a hash function. Do you have performance > test results, ideally also with -O0? And why not make memhash_feed() > an inline function or macro to sidestep that issue? My gut feeling was similar, but then I assumed compilers of today would be smart. As per the discussion on a later patch, we could migrate all memhash* functions to fnv32.c except for the _feed, which will be static in its header fnv32.h Thanks, Stefan
Re: [PATCH 3/4] xdiff: use stronger hash function internally
On Tue, Oct 24, 2017 at 1:23 PM, René Scharfe wrote: > Am 24.10.2017 um 20:59 schrieb Stefan Beller: >> Instead of using the hash seeded with 5381, and updated via >> `(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by >> hashmap.h, which is seeded with 0x811c9dc5 and computed as >> `(hash * 0x01000193) ^ new_byte`. > > The hash function you're replacing is called DJB2; I think that's worth > mentioning. I was not aware of the name. I'll look it up; thanks! > > Performance test results would be nice. No idea how to find edge cases, > though, or better: demonstrate a lack thereof. My reasoning, though not in the commit message, is that the operations are essentially equal, just with different numeric values, hence no impact. I can look at the assembly and measure, too. > >> >> Signed-off-by: Stefan Beller >> --- >> xdiff/xutils.c | 19 --- >> 1 file changed, 8 insertions(+), 11 deletions(-) >> >> diff --git a/xdiff/xutils.c b/xdiff/xutils.c >> index 04d7b32e4e..a58a28c687 100644 >> --- a/xdiff/xutils.c >> +++ b/xdiff/xutils.c >> @@ -24,7 +24,8 @@ >> #include >> #include "xinclude.h" >> >> - >> +#include "cache.h" >> +#include "hashmap.h" > > Ouch. Defining FNV32_BASE and FNV32_PRIME here would be much easier > overall. And if that's too much duplication then those definitions > could be extracted into a new header file (fnv32.h?) included by both > hashmap.h and xutils.c. I guess fnv32.h would do; it would contain the defines as well as the static inline function to be used in the inner loop of patch 1. > >> >> >> long xdl_bogosqrt(long n) { >> @@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char >> *l2, long s2, long flags) >> >> static unsigned long xdl_hash_record_with_whitespace(char const **data, >> char const *top, long flags) { >> - unsigned long ha = 5381; >> + unsigned long ha = memhash(NULL, 0); >> char const *ptr = *data; >> >> for (; ptr < top && *ptr != '\n'; ptr++) { >> @@ -243,21 +244,18 @@ static unsigned long >> xdl_hash_record_with_whitespace(char const **data, >> ; /* already handled */ >> else if (flags & XDF_IGNORE_WHITESPACE_CHANGE >>&& !at_eol) { >> - ha += (ha << 5); >> - ha ^= (unsigned long) ' '; >> + ha = memhash_feed(ha, (unsigned char) ' '); > > All the memhash_feed() callers in this file cast to unsigned char. A > macro or a function (possibly inline) defined at the top could do > that for them. That would go away when using fnv32.h Thanks for the review! Stefan
Re: [PATCH 2/4] xdiff-interface: export comparing and hashing strings
On Tue, Oct 24, 2017 at 1:23 PM, René Scharfe wrote: > xdl_recmatch() is already exported; why not use it without this > wrapper? It is exported in xdiff/xutils.h, to be used by various xdiff/*.c files, but not outside of xdiff/. This one makes it available to the outside, too. >> +extern unsigned long xdiff_hash_string(const char *s, size_t len, long >> flags); > > Documenting the meaning of their parameters would be nice. I'll do that.
Re: [PATCH 3/4] xdiff: use stronger hash function internally
Am 24.10.2017 um 20:59 schrieb Stefan Beller: > Instead of using the hash seeded with 5381, and updated via > `(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by > hashmap.h, which is seeded with 0x811c9dc5 and computed as > `(hash * 0x01000193) ^ new_byte`. The hash function you're replacing is called DJB2; I think that's worth mentioning. Performance test results would be nice. No idea how to find edge cases, though, or better: demonstrate a lack thereof. > > Signed-off-by: Stefan Beller > --- > xdiff/xutils.c | 19 --- > 1 file changed, 8 insertions(+), 11 deletions(-) > > diff --git a/xdiff/xutils.c b/xdiff/xutils.c > index 04d7b32e4e..a58a28c687 100644 > --- a/xdiff/xutils.c > +++ b/xdiff/xutils.c > @@ -24,7 +24,8 @@ > #include > #include "xinclude.h" > > - > +#include "cache.h" > +#include "hashmap.h" Ouch. Defining FNV32_BASE and FNV32_PRIME here would be much easier overall. And if that's too much duplication then those definitions could be extracted into a new header file (fnv32.h?) included by both hashmap.h and xutils.c. > > > long xdl_bogosqrt(long n) { > @@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, > long s2, long flags) > > static unsigned long xdl_hash_record_with_whitespace(char const **data, > char const *top, long flags) { > - unsigned long ha = 5381; > + unsigned long ha = memhash(NULL, 0); > char const *ptr = *data; > > for (; ptr < top && *ptr != '\n'; ptr++) { > @@ -243,21 +244,18 @@ static unsigned long > xdl_hash_record_with_whitespace(char const **data, > ; /* already handled */ > else if (flags & XDF_IGNORE_WHITESPACE_CHANGE >&& !at_eol) { > - ha += (ha << 5); > - ha ^= (unsigned long) ' '; > + ha = memhash_feed(ha, (unsigned char) ' '); All the memhash_feed() callers in this file cast to unsigned char. A macro or a function (possibly inline) defined at the top could do that for them. > } > else if (flags & XDF_IGNORE_WHITESPACE_AT_EOL >&& !at_eol) { > while (ptr2 != ptr + 1) { > - ha += (ha << 5); > - ha ^= (unsigned long) *ptr2; > + ha = memhash_feed(ha, (unsigned char) > *ptr2); > ptr2++; > } > } > continue; > } > - ha += (ha << 5); > - ha ^= (unsigned long) *ptr; > + ha = memhash_feed(ha, (unsigned char) *ptr); > } > *data = ptr < top ? ptr + 1: ptr; > > @@ -265,15 +263,14 @@ static unsigned long > xdl_hash_record_with_whitespace(char const **data, > } > > unsigned long xdl_hash_record(char const **data, char const *top, long > flags) { > - unsigned long ha = 5381; > + unsigned long ha = memhash(NULL, 0); > char const *ptr = *data; > > if (flags & XDF_WHITESPACE_FLAGS) > return xdl_hash_record_with_whitespace(data, top, flags); > > for (; ptr < top && *ptr != '\n'; ptr++) { > - ha += (ha << 5); > - ha ^= (unsigned long) *ptr; > + ha = memhash_feed(ha, (unsigned char) *ptr); > } > *data = ptr < top ? ptr + 1: ptr; > >
Re: [PATCH 2/4] xdiff-interface: export comparing and hashing strings
Am 24.10.2017 um 20:59 schrieb Stefan Beller: > This will turn out to be useful in a later patch > > Signed-off-by: Stefan Beller > --- > xdiff-interface.c | 11 +++ > xdiff-interface.h | 5 + > 2 files changed, 16 insertions(+) > > diff --git a/xdiff-interface.c b/xdiff-interface.c > index 018e033089..fd002ebbc2 100644 > --- a/xdiff-interface.c > +++ b/xdiff-interface.c > @@ -5,6 +5,7 @@ > #include "xdiff/xdiffi.h" > #include "xdiff/xemit.h" > #include "xdiff/xmacros.h" > +#include "xdiff/xutils.h" > > struct xdiff_emit_state { > xdiff_emit_consume_fn consume; > @@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg) > } > } > > +unsigned long xdiff_hash_string(const char *s, size_t len, long flags) > +{ > + return xdl_hash_record(&s, s + len, flags); > +} > + > +int xdiff_compare_lines(const char *l1, long s1, const char *l2, long s2, > long flags) > +{ > + return xdl_recmatch(l1, s1, l2, s2, flags); > +} xdl_recmatch() is already exported; why not use it without this wrapper? > + > int git_xmerge_style = -1; > > int git_xmerge_config(const char *var, const char *value, void *cb) > diff --git a/xdiff-interface.h b/xdiff-interface.h > index 6f6ba9095d..d3cb9285c5 100644 > --- a/xdiff-interface.h > +++ b/xdiff-interface.h > @@ -29,4 +29,9 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg); > extern int git_xmerge_config(const char *var, const char *value, void *cb); > extern int git_xmerge_style; > > +extern int xdiff_compare_lines(const char *l1, long s1, > +const char *l2, long s2, long flags); > + > +extern unsigned long xdiff_hash_string(const char *s, size_t len, long > flags); Documenting the meaning of their parameters would be nice. s and len are easy enough to guess, but which flags can be used? At least a pointer to their definition in xdiff/xdiff.h would be helpful. And renaming l1, s1, l2, s2 to a, alen, b, blen or line1, len1, line2, len2 or similar would leave me less confused, but perhaps that's just me. > + > #endif >
Re: [PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash
Am 24.10.2017 um 20:59 schrieb Stefan Beller: > This will be useful shortly. > > Signed-off-by: Stefan Beller > --- > hashmap.c | 7 ++- > hashmap.h | 3 +++ > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/hashmap.c b/hashmap.c > index d42f01ff5a..d103eb1fd2 100644 > --- a/hashmap.c > +++ b/hashmap.c > @@ -26,13 +26,18 @@ unsigned int strihash(const char *str) > return hash; > } > > +unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next) Why is the second parameter const and the first one isn't? (We tend not to bother with const for value types.) > +{ > + return (hash_seed * FNV32_PRIME) ^ next; > +} > + > unsigned int memhash(const void *buf, size_t len) > { > unsigned int hash = FNV32_BASE; > unsigned char *ucbuf = (unsigned char *) buf; > while (len--) { > unsigned int c = *ucbuf++; > - hash = (hash * FNV32_PRIME) ^ c; > + hash = memhash_feed(hash, c); I guess compilers inline a copy of the function here with -O2. My knee-jerk reaction, however, is horror in the face of adding a function call to the inner loop of a hash function. Do you have performance test results, ideally also with -O0? And why not make memhash_feed() an inline function or macro to sidestep that issue? > } > return hash; > } > diff --git a/hashmap.h b/hashmap.h > index 7cb29a6aed..c2464385ed 100644 > --- a/hashmap.h > +++ b/hashmap.h > @@ -105,10 +105,13 @@ >* `strihash` and `memihash` are case insensitive versions. >* `memihash_cont` is a variant of `memihash` that allows a computation to > be >* continued with another chunk of data. > + * `memhash_feed` takes just one character and returns the hash based off > + * a previous hash. >*/ > extern unsigned int strhash(const char *buf); > extern unsigned int strihash(const char *buf); > extern unsigned int memhash(const void *buf, size_t len); > +extern unsigned int memhash_feed(unsigned int hash_seed, const unsigned char > next); > extern unsigned int memihash(const void *buf, size_t len); > extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, > size_t len); > >
Re: Multiple paths in GIT_EXEC_PATH
On Tue, 2017-10-17 at 18:21 +0300, Nikolay Yakimov wrote: > For why I need that is another matter. Long story short, I need git to > look for '.gitignore' in a particular non-standard location, since I > have multiple git repositories in the same workdir (that workdir being > $HOME and git repositories being stores for my different configs) That is solvable without needing multiple directories in $GIT_EXEC_PATH. vcsh, which also solves the 'multiple repos with same workdir' problem in a very thin wrapper around git, does this by setting core.excludesfile in the per repo config to a unique path. hurricane:~$ vcsh dotfiles config core.excludesfile .gitignore.d/dotfiles hurricane:~$ vcsh secrets config core.excludesfile .gitignore.d/secrets D.
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote: > On 24 October 2017 at 18:45, Eric Sunshine wrote: > > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: > >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin > >> wrote: > >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in > >>> init_merge_options(). > >>> Consequential call of getenv() may return NULL pointer and strtol() > >>> crashes. > >>> However the stored pointer to the obtained getenv() result may be > >>> invalidated > >>> by some other getenv() call from another thread as getenv() is not > >>> thread-safe. > > I'm having trouble wrapping my head around this. Under which > circumstances could the second call in the current code return NULL, but > the code after your patch behave in a well-defined (and correct) way? Yeah, it's not at all clear to me this is solving a real problem. I know Andrey mentioned playing around with fault injection in an earlier thread, so I'm wondering if there is an artificial fault being injected into the second getenv() call. Which does not seem like something that should be possible in the real world. I definitely agree with the sentiment that as few things as possible should happen between calling getenv() and using its result. I've seen real bugs there from unexpected invalidation of the static buffer. -Peff
Re: [PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure
On Tue, Oct 24, 2017 at 05:16:25PM +0200, Michael Haggerty wrote: > The fix is simple: instead of just breaking out of the loop, jump > directly to the cleanup code. This fixes some tests in t1404 that were > added in the previous commit. Nicely explained. I think that fix makes sense, and matches how the rest of the function behaved. The other option would be to switch the recently-added code to: if (!ret && packed_transaction) ... but it seems like the "goto" means one less thing for new code to remember if it gets added. -Peff
Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts
On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote: > On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty > wrote: > > diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh > > @@ -34,6 +34,86 @@ test_update_rejected () { > > +# Test adding and deleting D/F-conflicting references in a single > > +# transaction. > > +df_test() { > > + local prefix="$1" > > + shift > > + local pack=: > > Isn't "local" a Bash-ism we want to keep out of the test scripts? Yeah. It's supported by dash and many other shells, but we do try to avoid it[1]. I think in this case we could just drop it (but keep setting the "local foo" ones to empty with "foo=". -Peff [1] There's some more discussion in the thread at: https://public-inbox.org/git/20160601163747.ga10...@sigill.intra.peff.net/
Re: [PATCH 0/2] Fix an error-handling path when locking refs
On Tue, Oct 24, 2017 at 05:16:23PM +0200, Michael Haggerty wrote: > But in fact the problem is more general; it can happen whenever a > reference deletion within a transaction is processed successfully, and > then another reference update in the same transaction fails in > `lock_ref_for_update()`. In such a case the failed update and any > subsequent ones could be silently ignored. Thanks for digging this down to the root cause. It sounds like this might not be all that hard to trigger for users of "push --atomic", then (most of the rest of the code is saved by the fact that it still uses one ref per transaction, which I think can't trigger this). > There is a longer explanation in the second patch's commit message. > > The tests that I added probe a bunch of D/F update scenarios, which I > think should be characteristic of the scenarios that would trigger > this bug. It would be nice to have tests that examine other types of > failures (e.g., wrong `old_oid` values). What you added makes sense to me for now. I do admit I was a little surprised that we didn't have better test coverage for partial transaction failures. I think in general our test suite is weak on checking failures, and covering more cases (like old_oid) would be welcome. Those will be valuable especially as we started playing with more storage backends. I also agree that those tests can come post-release. > While looking at this code again, I realized that the new code > rewrites the `packed-refs` file more often than did the old code. > Specifically, after dc39e09942 (files_ref_store: use a transaction to > update packed refs, 2017-09-08), the `packed-refs` file is overwritten > for any transaction that includes a reference delete. Prior to that > commit, `packed-refs` was only overwritten if a deleted reference was > actually present in the existing `packed-refs` file. This is even the > case if there was previously no `packed-refs` file at all; now any > reference deletion causes an empty `packed-refs` file to be created. > > I think this is a conscionable regression, since deleting references > that are purely loose is probably not all that common, and the old > code had to read the whole `packed-refs` file anyway. Nevertheless, it > is obviously something that I would like to fix (though maybe not for > 2.15? I'm open to input about its urgency.) I agree it's not nearly as urgent as the fix you have here. It might show up on a busy system as increased lock contention over packed-refs. Or if you have really gigantic packed-refs files, a possible performance regression. But as you say, it should be a pretty rare case. So I'd be OK with leaving it to post-2.15. OTOH, I suspect it's a pretty small patch. If you happen to produce it quickly, it might be worth seeing before evaluating. -Peff
Re: [PATCH] path: use xmalloc in add_to_trie
On Tue, Oct 24, 2017 at 09:13:28AM -0700, Stefan Beller wrote: > On Tue, Oct 24, 2017 at 8:15 AM, Andrey Okoshkin > wrote: > > Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps > > and checks memory allocation result. > > > > Signed-off-by: Andrey Okoshkin > > --- > > Hello, > > I'm not sure but it looks like there is a missing check of the malloc > > result. > > memcpy() may crash with SIGSEGV due to the memory allocation failure. > > make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie() > > also should use it. > > Good catch! Thanks for spotting. > > Trying to find similar occurrences via git grep "= malloc" did not > yield other places that need the same fix. Don't forget realloc and calloc (though I don't think there are any cases that need touching there; just a hint for anybody doing auditing). -Peff
Re: v2.15.0-rc2 ref deletion bug
On Tue, Oct 24, 2017 at 01:05:00PM +0200, Michael Haggerty wrote: > > I'd expect one of: > > > > 1. We delete "foo" before updating "foo/bar", and we end up with a > > single ref. > > I don't think that this is possible in the general case in a single > transaction. The problem is that the code would need to take locks > > refs/tags/foo.lock > refs/tags/foo/bar.lock > > But the latter couldn't coexist with the loose reference file > > refs/tags/foo > > , which might already exist. Yeah, you're right. I was thinking of the opposite case (where you could create "foo.lock" even though "foo/bar" exists), but this one is impossible with filesystem semantics. > It is only imaginable to do this in a single transaction if you pack and > prune `refs/tags/foo` first, to get the loose reference out of the way, > before executing the transaction. Even then, you would have to beware of > a race where another process writes a loose version of `refs/tags/foo` > between the time that `pack-refs` prunes it and the time that the > transaction obtains the lock again. Yeah, it's probably better to avoid playing games here. Moving to a non-filesystem storage backend would just make the problem go away. -Peff
Re: Consequences of CRLF in index?
On Tue, Oct 24, 2017 at 11:14:15AM -0700, Jonathan Nieder wrote: > Hi, > > Lars Schneider wrote: > > > I've migrated a large repo (110k+ files) with a lot of history (177k > > commits) > > and a lot of users (200+) to Git. Unfortunately, all text files in the index > > of the repo have CRLF line endings. In general this seems not to be a > > problem > > as the project is developed exclusively on Windows. > > Sounds good. > > > However, I wonder if there are any "hidden consequences" of this setup? > > If there are consequences, then I see two options. Either I rebase the repo > > and fix the line endings for all commits or I add a single commit that fixes > > the line endings for all files. Both actions require coordination with the > > users to avoid repo trouble/merge conflicts. The "single fixup commit" > > options > > would also make diffs into the past look bad. Would a single large commit > > have > > any impact on the performance of standard Git operations? > > There are no hidden consequences that I'm aware of. If you later > decide that you want to become a cross-platform project, then you may > want to switch to LF endings, in which case I suggest the "single > fixup commit" strategy. > > In any event, you also probably want to declare what you're doing > using .gitattributes. By checking in the files as CRLF, you are > declaring that you do *not* want Git to treat them as text files > (i.e., you do not want Git to change the line endings), so something > as simple as > > * -text > > should do the trick. See gitattributes(5) for details. Exactly. The "hidden consequence" may be the other way around: If you don't specify .gitattributes, then all people who have core.autocrlf=true will suffer from a runtime penalty. core.autocrlf=true means "auto". At each checkout Git needs to figure out that the file has CRLF in the repo, so that there is no conversion done. The same happens on "git add" or "git commit" (and sometimes on "git status"). The penalty may be low, but Dscho once reported that it is measurable & painful on a "big repo". The other advantage of "* -text" in .gitattributes is that new files are handled consistantly accross developers. Those who have "core.autocrlf=false" would produce commits with CRLF for new files, and those developpers who have core.autocrlf=true would produce files with LF in the index and CRLF in the worktree. This may (most probably will) cause confusion later, when things are pushed and pulled. > > I'd be interested to hear what happens when diff-ing across a line > ending fixup commit. Is this an area where Git needs some > improvement? "git merge" knows an -Xrenormalize option to deal with a > related problem --- it's possible that "git diff" needs to learn a > similar trick. That is a tricky thing. Sometimes you want to see the CLRF - LF as a diff, (represented as "^M"), and sometimes not. All in all "* -text" is a good choice for Windows-only projects. > > Thanks and hope that helps, > Jonathan
Re: [PATCH 0/4] (x)diff cleanup: remove duplicate code
On Tue, Oct 24, 2017 at 11:59 AM, Stefan Beller wrote: > Junio wrote: > >> * I was hoping that the next_byte() and string_hash() thing, once >>they are cleaned up, will eventually be shared with the xdiff/ >>code at the lower layer, which needs to do pretty much the same >>in order to implement various whitespace ignoring options. I am >>not sure how well the approach taken by the WIP patch meshes with >>the needs of the lower layer. > > This series does exactly this; although I chose to reuse the code in > xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that > code has seen more exercise already (hence I assume it has fewer bugs > if any as well as its performance implications are well understood). > > However to do so, we need to pollute xdiff/xutils.c and include > hashmap.h there (which also requires cache.h as that header has > an inline function using BUG()), which I find a bit unfortunate but > worth the trade off of using better tested code. > This, of course, applies on top of jk/diff-color-moved-fix.
[PATCH 3/4] xdiff: use stronger hash function internally
Instead of using the hash seeded with 5381, and updated via `(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by hashmap.h, which is seeded with 0x811c9dc5 and computed as `(hash * 0x01000193) ^ new_byte`. Signed-off-by: Stefan Beller --- xdiff/xutils.c | 19 --- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/xdiff/xutils.c b/xdiff/xutils.c index 04d7b32e4e..a58a28c687 100644 --- a/xdiff/xutils.c +++ b/xdiff/xutils.c @@ -24,7 +24,8 @@ #include #include "xinclude.h" - +#include "cache.h" +#include "hashmap.h" long xdl_bogosqrt(long n) { @@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags) static unsigned long xdl_hash_record_with_whitespace(char const **data, char const *top, long flags) { - unsigned long ha = 5381; + unsigned long ha = memhash(NULL, 0); char const *ptr = *data; for (; ptr < top && *ptr != '\n'; ptr++) { @@ -243,21 +244,18 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data, ; /* already handled */ else if (flags & XDF_IGNORE_WHITESPACE_CHANGE && !at_eol) { - ha += (ha << 5); - ha ^= (unsigned long) ' '; + ha = memhash_feed(ha, (unsigned char) ' '); } else if (flags & XDF_IGNORE_WHITESPACE_AT_EOL && !at_eol) { while (ptr2 != ptr + 1) { - ha += (ha << 5); - ha ^= (unsigned long) *ptr2; + ha = memhash_feed(ha, (unsigned char) *ptr2); ptr2++; } } continue; } - ha += (ha << 5); - ha ^= (unsigned long) *ptr; + ha = memhash_feed(ha, (unsigned char) *ptr); } *data = ptr < top ? ptr + 1: ptr; @@ -265,15 +263,14 @@ static unsigned long xdl_hash_record_with_whitespace(char const **data, } unsigned long xdl_hash_record(char const **data, char const *top, long flags) { - unsigned long ha = 5381; + unsigned long ha = memhash(NULL, 0); char const *ptr = *data; if (flags & XDF_WHITESPACE_FLAGS) return xdl_hash_record_with_whitespace(data, top, flags); for (; ptr < top && *ptr != '\n'; ptr++) { - ha += (ha << 5); - ha ^= (unsigned long) *ptr; + ha = memhash_feed(ha, (unsigned char) *ptr); } *data = ptr < top ? ptr + 1: ptr; -- 2.15.0.rc2.6.g953226eb5f
[PATCH 4/4] diff.c: get rid of duplicate implementation
The implementations in diff.c to detect moved lines needs to compare strings and hash strings, which is implemented in that file, as well as in the xdiff library. Remove the rather recent implementation in diff.c and rely on the well exercised code in the xdiff lib. Signed-off-by: Stefan Beller --- diff.c | 82 -- 1 file changed, 4 insertions(+), 78 deletions(-) diff --git a/diff.c b/diff.c index c4a669ffa8..e6814b9e9c 100644 --- a/diff.c +++ b/diff.c @@ -707,88 +707,14 @@ struct moved_entry { struct moved_entry *next_line; }; -static int next_byte(const char **cp, const char **endp, -const struct diff_options *diffopt) -{ - int retval; - - if (*cp >= *endp) - return -1; - - if (isspace(**cp)) { - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) { - while (*cp < *endp && isspace(**cp)) - (*cp)++; - /* -* After skipping a couple of whitespaces, -* we still have to account for one space. -*/ - return (int)' '; - } - - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) { - while (*cp < *endp && isspace(**cp)) - (*cp)++; - /* -* return the first non-ws character via the usual -* below, unless we ate all of the bytes -*/ - if (*cp >= *endp) - return -1; - } - } - - retval = (unsigned char)(**cp); - (*cp)++; - return retval; -} - static int moved_entry_cmp(const struct diff_options *diffopt, const struct moved_entry *a, const struct moved_entry *b, const void *keydata) { - const char *ap = a->es->line, *ae = a->es->line + a->es->len; - const char *bp = b->es->line, *be = b->es->line + b->es->len; - - if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS)) - return a->es->len != b->es->len || memcmp(ap, bp, a->es->len); - - if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) { - while (ae > ap && isspace(ae[-1])) - ae--; - while (be > bp && isspace(be[-1])) - be--; - } - - while (1) { - int ca, cb; - ca = next_byte(&ap, &ae, diffopt); - cb = next_byte(&bp, &be, diffopt); - if (ca != cb) - return 1; - if (ca < 0) - return 0; - } -} - -static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_options *o) -{ - if (o->xdl_opts & XDF_WHITESPACE_FLAGS) { - static struct strbuf sb = STRBUF_INIT; - const char *ap = es->line, *ae = es->line + es->len; - int c; - - strbuf_reset(&sb); - while (ae > ap && isspace(ae[-1])) - ae--; - while ((c = next_byte(&ap, &ae, o)) >= 0) - strbuf_addch(&sb, c); - - return memhash(sb.buf, sb.len); - } else { - return memhash(es->line, es->len); - } + return !xdiff_compare_lines(a->es->line, a->es->len, + b->es->line, b->es->len, + diffopt->xdl_opts); } static struct moved_entry *prepare_entry(struct diff_options *o, @@ -797,7 +723,7 @@ static struct moved_entry *prepare_entry(struct diff_options *o, struct moved_entry *ret = xmalloc(sizeof(*ret)); struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no]; - ret->ent.hash = get_string_hash(l, o); + ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts); ret->es = l; ret->next_line = NULL; -- 2.15.0.rc2.6.g953226eb5f
[PATCH 2/4] xdiff-interface: export comparing and hashing strings
This will turn out to be useful in a later patch Signed-off-by: Stefan Beller --- xdiff-interface.c | 11 +++ xdiff-interface.h | 5 + 2 files changed, 16 insertions(+) diff --git a/xdiff-interface.c b/xdiff-interface.c index 018e033089..fd002ebbc2 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -5,6 +5,7 @@ #include "xdiff/xdiffi.h" #include "xdiff/xemit.h" #include "xdiff/xmacros.h" +#include "xdiff/xutils.h" struct xdiff_emit_state { xdiff_emit_consume_fn consume; @@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg) } } +unsigned long xdiff_hash_string(const char *s, size_t len, long flags) +{ + return xdl_hash_record(&s, s + len, flags); +} + +int xdiff_compare_lines(const char *l1, long s1, const char *l2, long s2, long flags) +{ + return xdl_recmatch(l1, s1, l2, s2, flags); +} + int git_xmerge_style = -1; int git_xmerge_config(const char *var, const char *value, void *cb) diff --git a/xdiff-interface.h b/xdiff-interface.h index 6f6ba9095d..d3cb9285c5 100644 --- a/xdiff-interface.h +++ b/xdiff-interface.h @@ -29,4 +29,9 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg); extern int git_xmerge_config(const char *var, const char *value, void *cb); extern int git_xmerge_style; +extern int xdiff_compare_lines(const char *l1, long s1, + const char *l2, long s2, long flags); + +extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags); + #endif -- 2.15.0.rc2.6.g953226eb5f
[PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash
This will be useful shortly. Signed-off-by: Stefan Beller --- hashmap.c | 7 ++- hashmap.h | 3 +++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/hashmap.c b/hashmap.c index d42f01ff5a..d103eb1fd2 100644 --- a/hashmap.c +++ b/hashmap.c @@ -26,13 +26,18 @@ unsigned int strihash(const char *str) return hash; } +unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next) +{ + return (hash_seed * FNV32_PRIME) ^ next; +} + unsigned int memhash(const void *buf, size_t len) { unsigned int hash = FNV32_BASE; unsigned char *ucbuf = (unsigned char *) buf; while (len--) { unsigned int c = *ucbuf++; - hash = (hash * FNV32_PRIME) ^ c; + hash = memhash_feed(hash, c); } return hash; } diff --git a/hashmap.h b/hashmap.h index 7cb29a6aed..c2464385ed 100644 --- a/hashmap.h +++ b/hashmap.h @@ -105,10 +105,13 @@ * `strihash` and `memihash` are case insensitive versions. * `memihash_cont` is a variant of `memihash` that allows a computation to be * continued with another chunk of data. + * `memhash_feed` takes just one character and returns the hash based off + * a previous hash. */ extern unsigned int strhash(const char *buf); extern unsigned int strihash(const char *buf); extern unsigned int memhash(const void *buf, size_t len); +extern unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next); extern unsigned int memihash(const void *buf, size_t len); extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len); -- 2.15.0.rc2.6.g953226eb5f
[PATCH 0/4] (x)diff cleanup: remove duplicate code
Junio wrote: > * I was hoping that the next_byte() and string_hash() thing, once >they are cleaned up, will eventually be shared with the xdiff/ >code at the lower layer, which needs to do pretty much the same >in order to implement various whitespace ignoring options. I am >not sure how well the approach taken by the WIP patch meshes with >the needs of the lower layer. This series does exactly this; although I chose to reuse the code in xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that code has seen more exercise already (hence I assume it has fewer bugs if any as well as its performance implications are well understood). However to do so, we need to pollute xdiff/xutils.c and include hashmap.h there (which also requires cache.h as that header has an inline function using BUG()), which I find a bit unfortunate but worth the trade off of using better tested code. Thanks, Stefan Stefan Beller (4): hashmap: introduce memhash_feed to access the internals of FNV-1 hash xdiff-interface: export comparing and hashing strings xdiff: use stronger hash function internally diff.c: get rid of duplicate implementation diff.c| 82 +++ hashmap.c | 7 - hashmap.h | 3 ++ xdiff-interface.c | 11 xdiff-interface.h | 5 xdiff/xutils.c| 19 ++--- 6 files changed, 37 insertions(+), 90 deletions(-) -- 2.15.0.rc2.6.g953226eb5f
[PATCH 10/13] rev-list: add list-objects filtering support
From: Jeff Hostetler Teach rev-list to use the filtering provided by the traverse_commit_list_filtered() interface to omit unwanted objects from the result. This feature is only enabled when one of the "--objects*" options are used. Furthermore, when the "--filter-print-omitted" option is used, the omitted objects are printed at the end. These are marked with a "~". This option can be combined with "--quiet" to get a list of just the omitted objects. Signed-off-by: Jeff Hostetler --- Documentation/git-rev-list.txt | 5 ++- Documentation/rev-list-options.txt | 30 ++ builtin/rev-list.c | 84 +- 3 files changed, 116 insertions(+), 3 deletions(-) diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt index ef22f17..6d2e60d 100644 --- a/Documentation/git-rev-list.txt +++ b/Documentation/git-rev-list.txt @@ -47,7 +47,10 @@ SYNOPSIS [ --fixed-strings | -F ] [ --date=] [ [ --objects | --objects-edge | --objects-edge-aggressive ] - [ --unpacked ] ] + [ --unpacked ] + [ --filter= ] ] +[ --filter-print-missing ] +[ --filter-print-omitted ] [ --pretty | --header ] [ --bisect ] [ --bisect-vars ] diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 7d860bf..88f8878 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -706,6 +706,36 @@ ifdef::git-rev-list[] --unpacked:: Only useful with `--objects`; print the object IDs that are not in packs. + +--filter=:: + Only useful with one of the `--objects*`; omits objects (usually + blobs) from the list of printed objects. The '' + may be one of the following: ++ +The form '--filter=blob:none' omits all blobs. ++ +The form '--filter=blob:limit=[kmg]' omits blobs larger than n bytes +or units. The value may be zero. Special files matching '.git*' are +alwayse included, regardless of size. ++ +The form '--filter=sparse:oid=' uses a sparse-checkout +specification contained in the object (or the object that the expression +evaluates to) to omit blobs not required by the corresponding sparse +checkout. ++ +The form '--filter=sparse:path=' similarly uses a sparse-checkout +specification contained in . + +--filter-print-missing:: + Prints a list of the missing objects for the requested traversal. + Object IDs are prefixed with a ``?'' character. The object type + is printed after the ID. This may be used with or without any of + the above filtering options. + +--filter-print-omitted:: + Only useful with one of the above `--filter*`; prints a list + of the omitted objects. Object IDs are prefixed with a ``~'' + character. endif::git-rev-list[] --no-walk[=(sorted|unsorted)]:: diff --git a/builtin/rev-list.c b/builtin/rev-list.c index c1c74d4..7a0353f 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -12,6 +12,7 @@ #include "bisect.h" #include "progress.h" #include "reflog-walk.h" +#include "partial-clone-utils.h" static const char rev_list_usage[] = "git rev-list [OPTION] ... [ -- paths... ]\n" @@ -54,6 +55,11 @@ static const char rev_list_usage[] = static struct progress *progress; static unsigned progress_counter; +static struct list_objects_filter_options filter_options; +static struct oidmap missing_objects; +static int arg_print_missing; +static int arg_print_omitted; +#define DEFAULT_MAP_SIZE (16*1024) static void finish_commit(struct commit *commit, void *data); static void show_commit(struct commit *commit, void *data) @@ -181,8 +187,26 @@ static void finish_commit(struct commit *commit, void *data) static void finish_object(struct object *obj, const char *name, void *cb_data) { struct rev_list_info *info = cb_data; - if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) + if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) { + if (arg_print_missing) { + list_objects_filter_map_insert( + &missing_objects, &obj->oid, name, obj->type); + return; + } + + /* +* Relax consistency checks when we expect missing +* objects because of partial-clone or a previous +* partial-fetch. +* +* Note that this is independent of any filtering that +* we are doing in this run. +*/ + if (is_partial_clone_registered()) + return; + die("missing blob object '%s'", oid_to_hex(&obj->oid)); + } if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT) parse_object(&obj->oid); } @@ -202,6 +226,22 @@ static void show_edge(struct
[PATCH 12/13] pack-objects: add list-objects filtering
From: Jeff Hostetler Teach pack-objects to use the filtering provided by the traverse_commit_list_filtered() interface to omit unwanted objects from the resulting packfile. This feature is intended for partial clone/fetch. Filtering requires the use of the "--stdout" option. Signed-off-by: Jeff Hostetler --- Documentation/git-pack-objects.txt | 8 +++- builtin/pack-objects.c | 18 +- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index 473a161..8b4a223 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -12,7 +12,8 @@ SYNOPSIS 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied] [--no-reuse-delta] [--delta-base-offset] [--non-empty] [--local] [--incremental] [--window=] [--depth=] - [--revs [--unpacked | --all]] [--stdout | base-name] + [--revs [--unpacked | --all]] + [--stdout [--filter=] | base-name] [--shallow] [--keep-true-parents] < object-list @@ -236,6 +237,11 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle. With this option, parents that are hidden by grafts are packed nevertheless. +--filter=:: + Requires `--stdout`. Omits certain objects (usually blobs) from + the resulting packfile. See linkgit:git-rev-list[1] for valid + `` forms. + SEE ALSO linkgit:git-rev-list[1] diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6e77dfd..a251850 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -79,6 +79,8 @@ static unsigned long cache_max_small_delta_size = 1000; static unsigned long window_memory_limit = 0; +static struct list_objects_filter_options filter_options; + /* * stats */ @@ -2816,7 +2818,12 @@ static void get_object_list(int ac, const char **av) if (prepare_revision_walk(&revs)) die("revision walk setup failed"); mark_edges_uninteresting(&revs, show_edge); - traverse_commit_list(&revs, show_commit, show_object, NULL); + if (filter_options.choice) + traverse_commit_list_filtered(&filter_options, &revs, + show_commit, show_object, + NULL, NULL); + else + traverse_commit_list(&revs, show_commit, show_object, NULL); if (unpack_unreachable_expiration) { revs.ignore_missing_links = 1; @@ -2952,6 +2959,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) N_("use a bitmap index if available to speed up counting objects")), OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index, N_("write a bitmap index together with the pack index")), + + OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options), + OPT_END(), }; @@ -3028,6 +3038,12 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!rev_list_all || !rev_list_reflog || !rev_list_index) unpack_unreachable_expiration = 0; + if (filter_options.choice) { + if (!pack_to_stdout) + die("cannot use filtering with an indexable pack."); + use_bitmap_index = 0; + } + /* * "soft" reasons not to use bitmaps - for on-disk repack by default we want * -- 2.9.3
[PATCH 09/13] extension.partialclone: introduce partial clone extension
From: Jeff Hostetler Introduce the ability to have missing objects in a repo. This functionality is guarded by new repository extension options: `extensions.partialcloneremote` and `extensions.partialclonefilter`. See the update to Documentation/technical/repository-version.txt in this patch for more information. This patch is part of a patch originally authored by: Jonathan Tan Signed-off-by: Jeff Hostetler --- Documentation/technical/repository-version.txt | 22 ++ Makefile | 1 + cache.h| 4 ++ config.h | 3 + environment.c | 2 + partial-clone-utils.c | 99 ++ partial-clone-utils.h | 34 + setup.c| 15 8 files changed, 180 insertions(+) create mode 100644 partial-clone-utils.c create mode 100644 partial-clone-utils.h diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt index 00ad379..9d488db 100644 --- a/Documentation/technical/repository-version.txt +++ b/Documentation/technical/repository-version.txt @@ -86,3 +86,25 @@ for testing format-1 compatibility. When the config key `extensions.preciousObjects` is set to `true`, objects in the repository MUST NOT be deleted (e.g., by `git-prune` or `git repack -d`). + +`partialcloneremote` + + +When the config key `extensions.partialcloneremote` is set, it indicates +that the repo was created with a partial clone (or later performed +a partial fetch) and that the remote may have omitted sending +certain unwanted objects. Such a remote is called a "promisor remote" +and it promises that all such omitted objects can be fetched from it +in the future. + +The value of this key is the name of the promisor remote. + +`partialclonefilter` + + +When the config key `extensions.partialclonefilter` is set, it gives +the initial filter expression used to create the partial clone. +This value becomed the default filter expression for subsequent +fetches (called "partial fetches") from the promisor remote. This +value may also be set by the first explicit partial fetch following a +normal clone. diff --git a/Makefile b/Makefile index b9ff0b4..38632fb 100644 --- a/Makefile +++ b/Makefile @@ -841,6 +841,7 @@ LIB_OBJS += pack-write.o LIB_OBJS += pager.o LIB_OBJS += parse-options.o LIB_OBJS += parse-options-cb.o +LIB_OBJS += partial-clone-utils.o LIB_OBJS += patch-delta.o LIB_OBJS += patch-ids.o LIB_OBJS += path.o diff --git a/cache.h b/cache.h index 6440e2b..4b785c0 100644 --- a/cache.h +++ b/cache.h @@ -860,12 +860,16 @@ extern int grafts_replace_parents; #define GIT_REPO_VERSION 0 #define GIT_REPO_VERSION_READ 1 extern int repository_format_precious_objects; +extern char *repository_format_partial_clone_remote; +extern char *repository_format_partial_clone_filter; struct repository_format { int version; int precious_objects; int is_bare; char *work_tree; + char *partial_clone_remote; /* value of extensions.partialcloneremote */ + char *partial_clone_filter; /* value of extensions.partialclonefilter */ struct string_list unknown_extensions; }; diff --git a/config.h b/config.h index a49d264..90544ef 100644 --- a/config.h +++ b/config.h @@ -34,6 +34,9 @@ struct config_options { const char *git_dir; }; +#define KEY_PARTIALCLONEREMOTE "partialcloneremote" +#define KEY_PARTIALCLONEFILTER "partialclonefilter" + typedef int (*config_fn_t)(const char *, const char *, void *); extern int git_default_config(const char *, const char *, void *); extern int git_config_from_file(config_fn_t fn, const char *, void *); diff --git a/environment.c b/environment.c index 8289c25..2fcf9bb 100644 --- a/environment.c +++ b/environment.c @@ -27,6 +27,8 @@ int warn_ambiguous_refs = 1; int warn_on_object_refname_ambiguity = 1; int ref_paranoia = -1; int repository_format_precious_objects; +char *repository_format_partial_clone_remote; +char *repository_format_partial_clone_filter; const char *git_commit_encoding; const char *git_log_output_encoding; const char *apply_default_whitespace; diff --git a/partial-clone-utils.c b/partial-clone-utils.c new file mode 100644 index 000..8c925ae --- /dev/null +++ b/partial-clone-utils.c @@ -0,0 +1,99 @@ +#include "cache.h" +#include "config.h" +#include "partial-clone-utils.h" + +int is_partial_clone_registered(void) +{ + if (repository_format_partial_clone_remote || + repository_format_partial_clone_filter) + return 1; + + return 0; +} + +void partial_clone_utils_register( + const struct list_objects_filter_options *filter_options, + const char *remote, + const char *cmd_name) +{ + struct strbuf buf = STRBUF_INIT;
[PATCH 05/13] list-objects-filter-blobs-limit: add large blob filtering
From: Jeff Hostetler Create a filter for traverse_commit_list_worker() to omit blobs larger than a requested size from the result, but always include ".git*" special files. Signed-off-by: Jeff Hostetler --- Makefile | 1 + list-objects-filter-blobs-limit.c | 146 ++ list-objects-filter-blobs-limit.h | 18 + 3 files changed, 165 insertions(+) create mode 100644 list-objects-filter-blobs-limit.c create mode 100644 list-objects-filter-blobs-limit.h diff --git a/Makefile b/Makefile index 7e9d1f4..0fdeabb 100644 --- a/Makefile +++ b/Makefile @@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o LIB_OBJS += line-range.o LIB_OBJS += list-objects.o +LIB_OBJS += list-objects-filter-blobs-limit.o LIB_OBJS += list-objects-filter-blobs-none.o LIB_OBJS += list-objects-filter-map.o LIB_OBJS += ll-merge.o diff --git a/list-objects-filter-blobs-limit.c b/list-objects-filter-blobs-limit.c new file mode 100644 index 000..f68963d --- /dev/null +++ b/list-objects-filter-blobs-limit.c @@ -0,0 +1,146 @@ +#include "cache.h" +#include "dir.h" +#include "tag.h" +#include "commit.h" +#include "tree.h" +#include "blob.h" +#include "diff.h" +#include "tree-walk.h" +#include "revision.h" +#include "list-objects.h" +#include "list-objects-filter-blobs-limit.h" + +#define DEFAULT_MAP_SIZE (16*1024) + +/* + * A filter for list-objects to omit large blobs, + * but always include ".git*" special files. + * And to OPTIONALLY collect a list of the omitted OIDs. + */ +struct filter_blobs_limit_data { + struct oidmap *omits; + unsigned long max_bytes; +}; + +static list_objects_filter_result filter_blobs_limit( + list_objects_filter_type filter_type, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_blobs_limit_data *filter_data = filter_data_; + struct list_objects_filter_data_entry *entry; + unsigned long object_length; + enum object_type t; + int is_special_filename; + + switch (filter_type) { + default: + die("unkown filter_type"); + return LOFR_ZERO; + + case LOFT_BEGIN_TREE: + assert(obj->type == OBJ_TREE); + /* always include all tree objects */ + return LOFR_MARK_SEEN | LOFR_SHOW; + + case LOFT_END_TREE: + assert(obj->type == OBJ_TREE); + return LOFR_ZERO; + + case LOFT_BLOB: + assert(obj->type == OBJ_BLOB); + assert((obj->flags & SEEN) == 0); + + is_special_filename = ((strncmp(filename, ".git", 4) == 0) && + filename[4]); + + /* +* If we are keeping a list of the omitted objects +* for the caller *AND* we previously "provisionally" +* omitted this object (because of size) *AND* it now +* has a special filename, make it not-omitted. +* Otherwise, continue to provisionally omit it. +*/ + if (filter_data->omits && + oidmap_get(filter_data->omits, &obj->oid)) { + if (!is_special_filename) + return LOFR_ZERO; + entry = oidmap_remove(filter_data->omits, &obj->oid); + free(entry); + return LOFR_MARK_SEEN | LOFR_SHOW; + } + + /* +* If filename matches ".git*", always include it (regardless +* of size). (This may include blobs that we do not have +* locally.) +*/ + if (is_special_filename) + return LOFR_MARK_SEEN | LOFR_SHOW; + + t = sha1_object_info(obj->oid.hash, &object_length); + if (t != OBJ_BLOB) { /* probably OBJ_NONE */ + /* +* We DO NOT have the blob locally, so we cannot +* apply the size filter criteria. Be conservative +* and force show it (and let the caller deal with +* the ambiguity). (This matches the behavior above +* when the special filename matches.) +*/ + return LOFR_MARK_SEEN | LOFR_SHOW; + } + + if (object_length < filter_data->max_bytes) + return LOFR_MARK_SEEN | LOFR_SHOW; + + /* +* Provisionally omit it. We've already established +* that this blob is too big and doesn't have a special +* filename, so we *WANT* to omit it. However, there +* may be a special file elsewhere in the tree that +* references this same blob, so we cannot reject it +
[PATCH 04/13] list-objects-filter-blobs-none: add filter to omit all blobs
From: Jeff Hostetler Create a simple filter for traverse_commit_list_worker() to omit all blobs from the result. This filter will be used in a future commit by rev-list and pack-objects to create a "commits and trees" result. This is intended for partial clone and fetch support. Signed-off-by: Jeff Hostetler --- Makefile | 1 + list-objects-filter-blobs-none.c | 83 list-objects-filter-blobs-none.h | 18 + 3 files changed, 102 insertions(+) create mode 100644 list-objects-filter-blobs-none.c create mode 100644 list-objects-filter-blobs-none.h diff --git a/Makefile b/Makefile index e59f12d..7e9d1f4 100644 --- a/Makefile +++ b/Makefile @@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o LIB_OBJS += line-range.o LIB_OBJS += list-objects.o +LIB_OBJS += list-objects-filter-blobs-none.o LIB_OBJS += list-objects-filter-map.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o diff --git a/list-objects-filter-blobs-none.c b/list-objects-filter-blobs-none.c new file mode 100644 index 000..1b548b9 --- /dev/null +++ b/list-objects-filter-blobs-none.c @@ -0,0 +1,83 @@ +#include "cache.h" +#include "dir.h" +#include "tag.h" +#include "commit.h" +#include "tree.h" +#include "blob.h" +#include "diff.h" +#include "tree-walk.h" +#include "revision.h" +#include "list-objects.h" +#include "list-objects-filter-blobs-none.h" + +#define DEFAULT_MAP_SIZE (16*1024) + +/* + * A filter for list-objects to omit ALL blobs from the traversal. + * And to OPTIONALLY collect a list of the omitted OIDs. + */ +struct filter_blobs_none_data { + struct oidmap *omits; +}; + +static list_objects_filter_result filter_blobs_none( + list_objects_filter_type filter_type, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_blobs_none_data *filter_data = filter_data_; + + switch (filter_type) { + default: + die("unkown filter_type"); + return LOFR_ZERO; + + case LOFT_BEGIN_TREE: + assert(obj->type == OBJ_TREE); + /* always include all tree objects */ + return LOFR_MARK_SEEN | LOFR_SHOW; + + case LOFT_END_TREE: + assert(obj->type == OBJ_TREE); + return LOFR_ZERO; + + case LOFT_BLOB: + assert(obj->type == OBJ_BLOB); + assert((obj->flags & SEEN) == 0); + + if (filter_data->omits) + list_objects_filter_map_insert( + filter_data->omits, &obj->oid, pathname, + obj->type); + + return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */ + } +} + +void traverse_commit_list__blobs_none( + struct rev_info *revs, + show_commit_fn show_commit, + show_object_fn show_object, + list_objects_filter_map_foreach_cb print_omitted_object, + void *ctx_data) +{ + struct filter_blobs_none_data d; + + memset(&d, 0, sizeof(d)); + if (print_omitted_object) { + d.omits = xcalloc(1, sizeof(*d.omits)); + oidmap_init(d.omits, DEFAULT_MAP_SIZE); + } + + traverse_commit_list_worker(revs, show_commit, show_object, ctx_data, + filter_blobs_none, &d); + + if (print_omitted_object) { + list_objects_filter_map_foreach(d.omits, + print_omitted_object, + ctx_data); + oidmap_free(d.omits, 1); + } +} diff --git a/list-objects-filter-blobs-none.h b/list-objects-filter-blobs-none.h new file mode 100644 index 000..363c9de --- /dev/null +++ b/list-objects-filter-blobs-none.h @@ -0,0 +1,18 @@ +#ifndef LIST_OBJECTS_FILTER_BLOBS_NONE_H +#define LIST_OBJECTS_FILTER_BLOBS_NONE_H + +#include "list-objects-filter-map.h" + +/* + * A filter for list-objects to omit ALL blobs + * from the traversal. + */ +void traverse_commit_list__blobs_none( + struct rev_info *revs, + show_commit_fn show_commit, + show_object_fn show_object, + list_objects_filter_map_foreach_cb print_omitted_object, + void *ctx_data); + +#endif /* LIST_OBJECTS_FILTER_BLOBS_NONE_H */ + -- 2.9.3
[PATCH 13/13] t5317: pack-objects object filtering test
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- t/t5317-pack-objects-filter-objects.sh | 384 + 1 file changed, 384 insertions(+) create mode 100755 t/t5317-pack-objects-filter-objects.sh diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh new file mode 100755 index 000..ef7a8f6 --- /dev/null +++ b/t/t5317-pack-objects-filter-objects.sh @@ -0,0 +1,384 @@ +#!/bin/sh + +test_description='git pack-objects with object filtering for partial clone' + +. ./test-lib.sh + +# Test blob:none filter. + +test_expect_success 'setup r1' ' + echo "{print \$1}" >print_1.awk && + echo "{print \$2}" >print_2.awk && + + git init r1 && + for n in 1 2 3 4 5 + do + echo "This is file: $n" > r1/file.$n + git -C r1 add file.$n + git -C r1 commit -m "$n" + done +' + +test_expect_success 'verify blob count in normal packfile' ' + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r1 pack-objects --rev --stdout >all.pack <<-EOF && + HEAD + EOF + git -C r1 index-pack ../all.pack && + git -C r1 verify-pack -v ../all.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:none packfile has no blobs' ' + git -C r1 pack-objects --rev --stdout --filter=blob:none >filter.pack <<-EOF && + HEAD + EOF + git -C r1 index-pack ../filter.pack && + git -C r1 verify-pack -v ../filter.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + nr=$(wc -l expected && + git -C r1 verify-pack -v ../filter.pack \ + | grep -E "commit|tree" \ + | awk -f print_1.awk \ + | sort >observed && + test_cmp observed expected +' + +# Test blob:limit=[kmg] filter. +# We boundary test around the size parameter. The filter is strictly less than +# the value, so size 500 and 1000 should have the same results, but 1001 should +# filter more. + +test_expect_success 'setup r2' ' + git init r2 && + for n in 1000 1 + do + printf "%"$n"s" X > r2/large.$n + git -C r2 add large.$n + git -C r2 commit -m "$n" + done +' + +test_expect_success 'verify blob count in normal packfile' ' + git -C r2 ls-files -s large.1000 large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 pack-objects --rev --stdout >all.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../all.pack && + git -C r2 verify-pack -v ../all.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=500 omits all blobs' ' + git -C r2 pack-objects --rev --stdout --filter=blob:limit=500 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + git -C r2 verify-pack -v ../filter.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + nr=$(wc -l filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + git -C r2 verify-pack -v ../filter.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + nr=$(wc -l expected && + git -C r2 pack-objects --rev --stdout --filter=blob:limit=1001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + git -C r2 verify-pack -v ../filter.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=10001' ' + git -C r2 ls-files -s large.1000 large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 pack-objects --rev --stdout --filter=blob:limit=10001 >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + git -C r2 verify-pack -v ../filter.pack \ + | grep blob \ + | awk -f print_1.awk \ + | sort >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=1k' ' + git -C r2 ls-files -s large.1000 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 pack-objects --rev --stdout --filter=blob:limit=1k >filter.pack <<-EOF && + HEAD + EOF + git -C r2 index-pack ../filter.pack && + git -C r2 verify-pack -v ../filt
[PATCH 01/13] dir: allow exclusions from blob in addition to file
From: Jeff Hostetler Refactor add_excludes() to separate the reading of the exclude file into a buffer and the parsing of the buffer into exclude_list items. Add add_excludes_from_blob_to_list() to allow an exclude file be specified with an OID. Signed-off-by: Jeff Hostetler --- dir.c | 51 +-- dir.h | 3 +++ 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/dir.c b/dir.c index 1d17b80..d848f2b 100644 --- a/dir.c +++ b/dir.c @@ -739,6 +739,10 @@ static void invalidate_directory(struct untracked_cache *uc, dir->dirs[i]->recurse = 0; } +static int add_excludes_from_buffer(char *buf, size_t size, + const char *base, int baselen, + struct exclude_list *el); + /* * Given a file with name "fname", read it (either from disk, or from * an index if 'istate' is non-null), parse it and store the @@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char *base, int baselen, struct sha1_stat *sha1_stat) { struct stat st; - int fd, i, lineno = 1; + int fd; size_t size = 0; - char *buf, *entry; + char *buf; fd = open(fname, O_RDONLY); if (fd < 0 || fstat(fd, &st) < 0) { @@ -813,6 +817,17 @@ static int add_excludes(const char *fname, const char *base, int baselen, } } + add_excludes_from_buffer(buf, size, base, baselen, el); + return 0; +} + +static int add_excludes_from_buffer(char *buf, size_t size, + const char *base, int baselen, + struct exclude_list *el) +{ + int i, lineno = 1; + char *entry; + el->filebuf = buf; if (skip_utf8_bom(&buf, size)) @@ -841,6 +856,38 @@ int add_excludes_from_file_to_list(const char *fname, const char *base, return add_excludes(fname, base, baselen, el, istate, NULL); } +int add_excludes_from_blob_to_list( + struct object_id *oid, + const char *base, int baselen, + struct exclude_list *el) +{ + char *buf; + unsigned long size; + enum object_type type; + + buf = read_sha1_file(oid->hash, &type, &size); + if (!buf) + return -1; + + if (type != OBJ_BLOB) { + free(buf); + return -1; + } + + if (size == 0) { + free(buf); + return 0; + } + + if (buf[size - 1] != '\n') { + buf = xrealloc(buf, st_add(size, 1)); + buf[size++] = '\n'; + } + + add_excludes_from_buffer(buf, size, base, baselen, el); + return 0; +} + struct exclude_list *add_exclude_list(struct dir_struct *dir, int group_type, const char *src) { diff --git a/dir.h b/dir.h index e371705..1bcf391 100644 --- a/dir.h +++ b/dir.h @@ -256,6 +256,9 @@ extern struct exclude_list *add_exclude_list(struct dir_struct *dir, extern int add_excludes_from_file_to_list(const char *fname, const char *base, int baselen, struct exclude_list *el, struct index_state *istate); extern void add_excludes_from_file(struct dir_struct *, const char *fname); +extern int add_excludes_from_blob_to_list(struct object_id *oid, + const char *base, int baselen, + struct exclude_list *el); extern void parse_exclude_pattern(const char **string, int *patternlen, unsigned *flags, int *nowildcardlen); extern void add_exclude(const char *string, const char *base, int baselen, struct exclude_list *el, int srcpos); -- 2.9.3
[PATCH 03/13] list-objects: filter objects in traverse_commit_list
From: Jeff Hostetler Create traverse_commit_list_filtered() and add filtering interface to allow certain objects to be omitted (not shown) during a traversal. Update traverse_commit_list() to be a wrapper for the above. Filtering will be used in a future commit by rev-list and pack-objects for narrow/partial clone/fetch to omit certain blobs from the output. traverse_bitmap_commit_list() does not work with filtering. If a packfile bitmap is present, it will not be used. Signed-off-by: Jeff Hostetler --- list-objects.c | 66 -- list-objects.h | 32 +++- 2 files changed, 81 insertions(+), 17 deletions(-) diff --git a/list-objects.c b/list-objects.c index b3931fa..3e86008 100644 --- a/list-objects.c +++ b/list-objects.c @@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs, show_object_fn show, struct strbuf *path, const char *name, -void *cb_data) +void *cb_data, +filter_object_fn filter, +void *filter_data) { struct object *obj = &blob->object; size_t pathlen; + list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW; if (!revs->blob_objects) return; @@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs, die("bad blob object"); if (obj->flags & (UNINTERESTING | SEEN)) return; - obj->flags |= SEEN; pathlen = path->len; strbuf_addstr(path, name); - show(obj, path->buf, cb_data); + if (filter) + r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], filter_data); + if (r & LOFR_MARK_SEEN) + obj->flags |= SEEN; + if (r & LOFR_SHOW) + show(obj, path->buf, cb_data); strbuf_setlen(path, pathlen); } @@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs, show_object_fn show, struct strbuf *base, const char *name, -void *cb_data) +void *cb_data, +filter_object_fn filter, +void *filter_data) { struct object *obj = &tree->object; struct tree_desc desc; @@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs, enum interesting match = revs->diffopt.pathspec.nr == 0 ? all_entries_interesting: entry_not_interesting; int baselen = base->len; + list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW; if (!revs->tree_objects) return; @@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs, die("bad tree object %s", oid_to_hex(&obj->oid)); } - obj->flags |= SEEN; strbuf_addstr(base, name); - show(obj, base->buf, cb_data); + if (filter) + r = filter(LOFT_BEGIN_TREE, obj, base->buf, &base->buf[baselen], filter_data); + if (r & LOFR_MARK_SEEN) + obj->flags |= SEEN; + if (r & LOFR_SHOW) + show(obj, base->buf, cb_data); if (base->len) strbuf_addch(base, '/'); @@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs, process_tree(revs, lookup_tree(entry.oid), show, base, entry.path, -cb_data); +cb_data, filter, filter_data); else if (S_ISGITLINK(entry.mode)) process_gitlink(revs, entry.oid->hash, show, base, entry.path, @@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs, process_blob(revs, lookup_blob(entry.oid), show, base, entry.path, -cb_data); +cb_data, filter, filter_data); } + + if (filter) { + r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], filter_data); + if (r & LOFR_MARK_SEEN) + obj->flags |= SEEN; + if (r & LOFR_SHOW) + show(obj, base->buf, cb_data); + } + strbuf_setlen(base, baselen); free_tree_buffer(tree); } @@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, struct tree *tree) add_pending_object(revs, &tree->object, ""); } -void traverse_commit_list(struct rev_info *revs, - show_commit_fn show_commit, - show_object_fn show_object, - void *data) +void trave
[PATCH 07/13] list-objects-filter-options: common argument parsing
From: Jeff Hostetler Create common routines and defines for parsing list-objects-filter-related command line arguments and pack-protocol fields. Signed-off-by: Jeff Hostetler --- Makefile | 1 + list-objects-filter-options.c | 101 ++ list-objects-filter-options.h | 50 + 3 files changed, 152 insertions(+) create mode 100644 list-objects-filter-options.c create mode 100644 list-objects-filter-options.h diff --git a/Makefile b/Makefile index fc82664..b9ff0b4 100644 --- a/Makefile +++ b/Makefile @@ -810,6 +810,7 @@ LIB_OBJS += list-objects.o LIB_OBJS += list-objects-filter-blobs-limit.o LIB_OBJS += list-objects-filter-blobs-none.o LIB_OBJS += list-objects-filter-map.o +LIB_OBJS += list-objects-filter-options.o LIB_OBJS += list-objects-filter-sparse.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c new file mode 100644 index 000..40f48ac --- /dev/null +++ b/list-objects-filter-options.c @@ -0,0 +1,101 @@ +#include "cache.h" +#include "commit.h" +#include "config.h" +#include "revision.h" +#include "list-objects.h" +#include "list-objects-filter-options.h" + +/* + * Parse value of the argument to the "filter" keword. + * On the command line this looks like: --filter= + * and in the pack protocol as: filter + * + * ::= blob:none + * blob:limit:[kmg] + * sparse:oid: + * sparse:path: + */ +int parse_list_objects_filter(struct list_objects_filter_options *filter_options, + const char *arg) +{ + struct object_context oc; + struct object_id sparse_oid; + const char *v0; + const char *v1; + + if (filter_options->choice) + die(_("multiple object filter types cannot be combined")); + + /* +* TODO consider rejecting 'arg' if it contains any +* TODO injection characters (since we might send this +* TODO to a sub-command or to the server and we don't +* TODO want to deal with legacy quoting/escaping for +* TODO a new feature). +*/ + + filter_options->raw_value = strdup(arg); + + if (skip_prefix(arg, "blob:", &v0) || skip_prefix(arg, "blobs:", &v0)) { + if (!strcmp(v0, "none")) { + filter_options->choice = LOFC_BLOB_NONE; + return 0; + } + + if (skip_prefix(v0, "limit=", &v1) && + git_parse_ulong(v1, &filter_options->blob_limit_value)) { + filter_options->choice = LOFC_BLOB_LIMIT; + return 0; + } + } + else if (skip_prefix(arg, "sparse:", &v0)) { + if (skip_prefix(v0, "oid=", &v1)) { + filter_options->choice = LOFC_SPARSE_OID; + if (!get_oid_with_context(v1, GET_OID_BLOB, + &sparse_oid, &oc)) { + /* +* We successfully converted the +* into an actual OID. Rewrite the raw_value +* in canonoical form with just the OID. +* (If we send this request to the server, we +* want an absolute expression rather than a +* local-ref-relative expression.) +*/ + free((char *)filter_options->raw_value); + filter_options->raw_value = + xstrfmt("sparse:oid=%s", + oid_to_hex(&sparse_oid)); + filter_options->sparse_oid_value = + oiddup(&sparse_oid); + } else { + /* +* We could not turn the into an +* OID. Leave the raw_value as is in case +* the server can parse it. (It may refer to +* a branch, commit, or blob we don't have.) +*/ + } + return 0; + } + + if (skip_prefix(v0, "path=", &v1)) { + filter_options->choice = LOFC_SPARSE_PATH; + filter_options->sparse_path_value = strdup(v1); + return 0; + } + } + + die(_("invalid filter expression '%s'"), arg); + return 0; +} + +int opt_parse_list_objects_filter(const struct option *opt, + const char *arg, int unset) +{ + struct list_objects_filter_options *filter_options = opt->value; + + assert(arg); +
[PATCH 11/13] t6112: rev-list object filtering test
From: Jeff Hostetler Signed-off-by: Jeff Hostetler --- t/t6112-rev-list-filters-objects.sh | 223 1 file changed, 223 insertions(+) create mode 100755 t/t6112-rev-list-filters-objects.sh diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh new file mode 100755 index 000..26fa12f --- /dev/null +++ b/t/t6112-rev-list-filters-objects.sh @@ -0,0 +1,223 @@ +#!/bin/sh + +test_description='git rev-list with object filtering for partial clone' + +. ./test-lib.sh + +# Test the blob:none filter. + +test_expect_success 'setup r1' ' + echo "{print \$1}" >print_1.awk && + echo "{print \$2}" >print_2.awk && + + git init r1 && + for n in 1 2 3 4 5 + do + echo "This is file: $n" > r1/file.$n + git -C r1 add file.$n + git -C r1 commit -m "$n" + done +' + +test_expect_success 'verify blob:none omits all 5 blobs' ' + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r1 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:none \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +test_expect_success 'verify emitted+omitted == all' ' + git -C r1 rev-list HEAD --objects \ + | awk -f print_1.awk \ + | sort >expected && + git -C r1 rev-list HEAD --objects --filter-print-omitted --filter=blob:none \ + | awk -f print_1.awk \ + | sed "s/~//" \ + | sort >observed && + test_cmp observed expected +' + + +# Test blob:limit=[kmg] filter. +# We boundary test around the size parameter. The filter is strictly less than +# the value, so size 500 and 1000 should have the same results, but 1001 should +# filter more. + +test_expect_success 'setup r2' ' + git init r2 && + for n in 1000 1 + do + printf "%"$n"s" X > r2/large.$n + git -C r2 add large.$n + git -C r2 commit -m "$n" + done +' + +test_expect_success 'verify blob:limit=500 omits all blobs' ' + git -C r2 ls-files -s large.1000 large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:limit=500 \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +test_expect_success 'verify emitted+omitted == all' ' + git -C r2 rev-list HEAD --objects \ + | awk -f print_1.awk \ + | sort >expected && + git -C r2 rev-list HEAD --objects --filter-print-omitted --filter=blob:limit=500 \ + | awk -f print_1.awk \ + | sed "s/~//" \ + | sort >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=1000' ' + git -C r2 ls-files -s large.1000 large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:limit=1000 \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=1001' ' + git -C r2 ls-files -s large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:limit=1001 \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=1k' ' + git -C r2 ls-files -s large.1 \ + | awk -f print_2.awk \ + | sort >expected && + git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:limit=1k \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +test_expect_success 'verify blob:limit=1m' ' + cat expected && + git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted --filter=blob:limit=1m \ + | awk -f print_1.awk \ + | sed "s/~//" >observed && + test_cmp observed expected +' + +# Test sparse:path= filter. +# Use a local file containing a sparse-checkout specification to filter +# out blobs not required for the corresponding sparse-checkout. We do not +# require sparse-checkout to actually be enabled. + +test_expect_success 'setup r3' ' + git init r3 && + mkdir r3/dir1 && + for n in sparse1 sparse2 + do + echo "This is file: $n" > r3/$n + git -C r3 add $n + echo "This is file: dir1/$n" > r3/dir1/$n +
[PATCH 06/13] list-objects-filter-sparse: add sparse filter
From: Jeff Hostetler Create a filter for traverse_commit_list_worker() to only include the blobs the would be referenced by a sparse-checkout using the given specification. Signed-off-by: Jeff Hostetler --- Makefile | 1 + list-objects-filter-sparse.c | 241 +++ list-objects-filter-sparse.h | 30 ++ 3 files changed, 272 insertions(+) create mode 100644 list-objects-filter-sparse.c create mode 100644 list-objects-filter-sparse.h diff --git a/Makefile b/Makefile index 0fdeabb..fc82664 100644 --- a/Makefile +++ b/Makefile @@ -810,6 +810,7 @@ LIB_OBJS += list-objects.o LIB_OBJS += list-objects-filter-blobs-limit.o LIB_OBJS += list-objects-filter-blobs-none.o LIB_OBJS += list-objects-filter-map.o +LIB_OBJS += list-objects-filter-sparse.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o diff --git a/list-objects-filter-sparse.c b/list-objects-filter-sparse.c new file mode 100644 index 000..386b667 --- /dev/null +++ b/list-objects-filter-sparse.c @@ -0,0 +1,241 @@ +#include "cache.h" +#include "dir.h" +#include "tag.h" +#include "commit.h" +#include "tree.h" +#include "blob.h" +#include "diff.h" +#include "tree-walk.h" +#include "revision.h" +#include "list-objects.h" +#include "list-objects-filter-sparse.h" + +#define DEFAULT_MAP_SIZE (16*1024) + +/* + * A filter driven by a sparse-checkout specification to only + * include blobs that a sparse checkout would populate. + * + * The sparse-checkout spec can be loaded from a blob with the + * given OID or from a local pathname. We allow an OID because + * the repo may be bare or we may be doing the filtering on the + * server. + */ +struct frame { + int defval; + int child_prov_omit : 1; +}; + +struct filter_use_sparse_data { + struct oidmap *omits; + struct exclude_list el; + + size_t nr, alloc; + struct frame *array_frame; +}; + +static list_objects_filter_result filter_use_sparse( + list_objects_filter_type filter_type, + struct object *obj, + const char *pathname, + const char *filename, + void *filter_data_) +{ + struct filter_use_sparse_data *filter_data = filter_data_; + struct list_objects_filter_map_entry *entry_prev = NULL; + int val, dtype; + struct frame *frame; + + switch (filter_type) { + default: + die("unkown filter_type"); + return LOFR_ZERO; + + case LOFT_BEGIN_TREE: + assert(obj->type == OBJ_TREE); + dtype = DT_DIR; + val = is_excluded_from_list(pathname, strlen(pathname), + filename, &dtype, &filter_data->el, + &the_index); + if (val < 0) + val = filter_data->array_frame[filter_data->nr].defval; + + ALLOC_GROW(filter_data->array_frame, filter_data->nr + 1, + filter_data->alloc); + filter_data->nr++; + filter_data->array_frame[filter_data->nr].defval = val; + filter_data->array_frame[filter_data->nr].child_prov_omit = 0; + + /* +* A directory with this tree OID may appear in multiple +* places in the tree. (Think of a directory move, with +* no other changes.) And with a different pathname, the +* is_excluded...() results for this directory and items +* contained within it may be different. So we cannot +* mark it SEEN (yet), since that will prevent process_tree() +* from revisiting this tree object with other pathnames. +* +* Only SHOW the tree object the first time we visit this +* tree object. +* +* We always show all tree objects. A future optimization +* may want to attempt to narrow this. +*/ + if (obj->flags & FILTER_REVISIT) + return LOFR_ZERO; + obj->flags |= FILTER_REVISIT; + return LOFR_SHOW; + + case LOFT_END_TREE: + assert(obj->type == OBJ_TREE); + assert(filter_data->nr > 0); + + frame = &filter_data->array_frame[filter_data->nr]; + filter_data->nr--; + + /* +* Tell our parent directory if any of our children were +* provisionally omitted. +*/ + filter_data->array_frame[filter_data->nr].child_prov_omit |= + frame->child_prov_omit; + + /* +* If there are NO provisionally omitted child objects (ALL child +* objects in this folder were INCLUDED), then we can mark the +* folder as SEEN (so we will not have to revisit it again). +
[PATCH 08/13] list-objects: add traverse_commit_list_filtered method
From: Jeff Hostetler Add traverse_commit_list_filtered() wrapper around the various filter methods using common data in object_filter_options. Signed-off-by: Jeff Hostetler --- list-objects.c | 45 + list-objects.h | 11 +++ 2 files changed, 56 insertions(+) diff --git a/list-objects.c b/list-objects.c index 3e86008..4ce2593 100644 --- a/list-objects.c +++ b/list-objects.c @@ -7,6 +7,9 @@ #include "tree-walk.h" #include "revision.h" #include "list-objects.h" +#include "list-objects-filter-blobs-none.h" +#include "list-objects-filter-blobs-limit.h" +#include "list-objects-filter-sparse.h" static void process_blob(struct rev_info *revs, struct blob *blob, @@ -266,3 +269,45 @@ void traverse_commit_list(struct rev_info *revs, show_commit, show_object, show_data, NULL, NULL); } + +void traverse_commit_list_filtered( + struct list_objects_filter_options *filter_options, + struct rev_info *revs, + show_commit_fn show_commit, + show_object_fn show_object, + list_objects_filter_map_foreach_cb print_omitted_object, + void *show_data) +{ + switch (filter_options->choice) { + case LOFC_DISABLED: + traverse_commit_list(revs, show_commit, show_object, show_data); + return; + + case LOFC_BLOB_NONE: + traverse_commit_list__blobs_none( + revs, show_commit, show_object, print_omitted_object, + show_data); + return; + + case LOFC_BLOB_LIMIT: + traverse_commit_list__blobs_limit( + revs, show_commit, show_object, print_omitted_object, + show_data, filter_options->blob_limit_value); + return; + + case LOFC_SPARSE_OID: + traverse_commit_list__sparse_oid( + revs, show_commit, show_object, print_omitted_object, + show_data, filter_options->sparse_oid_value); + return; + + case LOFC_SPARSE_PATH: + traverse_commit_list__sparse_path( + revs, show_commit, show_object, print_omitted_object, + show_data, filter_options->sparse_path_value); + return; + + default: + die("unspecified list-objects filter"); + } +} diff --git a/list-objects.h b/list-objects.h index 43a06fb..d14b0e0 100644 --- a/list-objects.h +++ b/list-objects.h @@ -1,6 +1,9 @@ #ifndef LIST_OBJECTS_H #define LIST_OBJECTS_H +#include "list-objects-filter-map.h" +#include "list-objects-filter-options.h" + typedef void (*show_commit_fn)(struct commit *, void *); typedef void (*show_object_fn)(struct object *, const char *, void *); void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, void *); @@ -38,4 +41,12 @@ void traverse_commit_list_worker( show_commit_fn, show_object_fn, void *show_data, filter_object_fn filter, void *filter_data); +void traverse_commit_list_filtered( + struct list_objects_filter_options *filter_options, + struct rev_info *revs, + show_commit_fn show_commit, + show_object_fn show_object, + list_objects_filter_map_foreach_cb print_omitted_object, + void *show_data); + #endif /* LIST_OBJECTS_H */ -- 2.9.3
[PATCH 00/13] WIP Partial clone part 1: object filtering
From: Jeff Hostetler I've been working with Jonathan Tan to combine our partial clone proposals. This patch series represents a first step in that effort and introduces an object filtering mechanism to select unwanted objects. [1] traverse_commit_list and list-objects is extended to allow various filters. [2] rev-list is extended to expose filtering. This allows testing of the filtering options. And can be used later to predict missing objects before commands like checkout or merge. [3] pack-objects is extended to use filtering parameters and build packfiles that omit unwanted objects. This patch series lays the ground work for subsequent parts which will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. Jeff Hostetler (13): dir: allow exclusions from blob in addition to file list-objects-filter-map: extend oidmap to collect omitted objects list-objects: filter objects in traverse_commit_list list-objects-filter-blobs-none: add filter to omit all blobs list-objects-filter-blobs-limit: add large blob filtering list-objects-filter-sparse: add sparse filter list-objects-filter-options: common argument parsing list-objects: add traverse_commit_list_filtered method extension.partialclone: introduce partial clone extension rev-list: add list-objects filtering support t6112: rev-list object filtering test pack-objects: add list-objects filtering t5317: pack-objects object filtering test Documentation/git-pack-objects.txt | 8 +- Documentation/git-rev-list.txt | 5 +- Documentation/rev-list-options.txt | 30 ++ Documentation/technical/repository-version.txt | 22 ++ Makefile | 6 + builtin/pack-objects.c | 18 +- builtin/rev-list.c | 84 +- cache.h| 4 + config.h | 3 + dir.c | 51 +++- dir.h | 3 + environment.c | 2 + list-objects-filter-blobs-limit.c | 146 ++ list-objects-filter-blobs-limit.h | 18 ++ list-objects-filter-blobs-none.c | 83 ++ list-objects-filter-blobs-none.h | 18 ++ list-objects-filter-map.c | 63 list-objects-filter-map.h | 26 ++ list-objects-filter-options.c | 101 +++ list-objects-filter-options.h | 50 list-objects-filter-sparse.c | 241 list-objects-filter-sparse.h | 30 ++ list-objects.c | 111 +-- list-objects.h | 43 ++- partial-clone-utils.c | 99 +++ partial-clone-utils.h | 34 +++ setup.c| 15 + t/t5317-pack-objects-filter-objects.sh | 384 + t/t6112-rev-list-filters-objects.sh| 223 ++ 29 files changed, 1897 insertions(+), 24 deletions(-) create mode 100644 list-objects-filter-blobs-limit.c create mode 100644 list-objects-filter-blobs-limit.h create mode 100644 list-objects-filter-blobs-none.c create mode 100644 list-objects-filter-blobs-none.h create mode 100644 list-objects-filter-map.c create mode 100644 list-objects-filter-map.h create mode 100644 list-objects-filter-options.c create mode 100644 list-objects-filter-options.h create mode 100644 list-objects-filter-sparse.c create mode 100644 list-objects-filter-sparse.h create mode 100644 partial-clone-utils.c create mode 100644 partial-clone-utils.h create mode 100755 t/t5317-pack-objects-filter-objects.sh create mode 100755 t/t6112-rev-list-filters-objects.sh -- 2.9.3
[PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects
From: Jeff Hostetler Create helper class to extend oidmap to collect a list of omitted or missing objects during traversal. This will be used in a later commit by the list-object filtering code. Signed-off-by: Jeff Hostetler --- Makefile | 1 + list-objects-filter-map.c | 63 +++ list-objects-filter-map.h | 26 +++ 3 files changed, 90 insertions(+) create mode 100644 list-objects-filter-map.c create mode 100644 list-objects-filter-map.h diff --git a/Makefile b/Makefile index cd75985..e59f12d 100644 --- a/Makefile +++ b/Makefile @@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o LIB_OBJS += line-range.o LIB_OBJS += list-objects.o +LIB_OBJS += list-objects-filter-map.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o diff --git a/list-objects-filter-map.c b/list-objects-filter-map.c new file mode 100644 index 000..7e496b3 --- /dev/null +++ b/list-objects-filter-map.c @@ -0,0 +1,63 @@ +#include "cache.h" +#include "list-objects-filter-map.h" + +int list_objects_filter_map_insert(struct oidmap *map, + const struct object_id *oid, + const char *pathname, enum object_type type) +{ + size_t len, size; + struct list_objects_filter_map_entry *e; + + if (oidmap_get(map, oid)) + return 1; + + len = ((pathname && *pathname) ? strlen(pathname) : 0); + size = (offsetof(struct list_objects_filter_map_entry, pathname) + len + 1); + e = xcalloc(1, size); + + oidcpy(&e->entry.oid, oid); + e->type = type; + if (pathname && *pathname) + strcpy(e->pathname, pathname); + + oidmap_put(map, e); + return 0; +} + +static int my_cmp(const void *a, const void *b) +{ + const struct oidmap_entry *ea, *eb; + + ea = *(const struct oidmap_entry **)a; + eb = *(const struct oidmap_entry **)b; + + return oidcmp(&ea->oid, &eb->oid); +} + +void list_objects_filter_map_foreach(struct oidmap *map, +list_objects_filter_map_foreach_cb cb, +void *cb_data) +{ + struct hashmap_iter iter; + struct list_objects_filter_map_entry **array; + struct list_objects_filter_map_entry *e; + int k, nr; + + nr = hashmap_get_size(&map->map); + if (!nr) + return; + + array = xcalloc(nr, sizeof(*e)); + + k = 0; + hashmap_iter_init(&map->map, &iter); + while ((e = hashmap_iter_next(&iter))) + array[k++] = e; + + QSORT(array, nr, my_cmp); + + for (k = 0; k < nr; k++) + cb(k, nr, array[k], cb_data); + + free(array); +} diff --git a/list-objects-filter-map.h b/list-objects-filter-map.h new file mode 100644 index 000..794fc81 --- /dev/null +++ b/list-objects-filter-map.h @@ -0,0 +1,26 @@ +#ifndef LIST_OBJECTS_FILTER_MAP_H +#define LIST_OBJECTS_FILTER_MAP_H + +#include "oidmap.h" + +struct list_objects_filter_map_entry { + struct oidmap_entry entry; /* must be first */ + enum object_type type; + char pathname[FLEX_ARRAY]; +}; + +extern int list_objects_filter_map_insert( + struct oidmap *map, + const struct object_id *oid, + const char *pathname, enum object_type type); + +typedef void (*list_objects_filter_map_foreach_cb)( + int i, int i_limit, + struct list_objects_filter_map_entry *e, void *cb_data); + +extern void list_objects_filter_map_foreach( + struct oidmap *map, + list_objects_filter_map_foreach_cb cb, + void *cb_data); + +#endif /* LIST_OBJECTS_FILTER_MAP_H */ -- 2.9.3
Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only
On 10/23, Stefan Beller wrote: > Signed-off-by: Stefan Beller > --- > > diff.c | 10 ++-- > diff.h | 1 + > t/t4015-diff-whitespace.sh | 114 > - > > See, only 10 lines of code! (and a few more for tests) > > We have run out of space in diff_options.flags,touched_flags. > as we 1< unused (removed in 882749a04f (diff: add --word-diff option that > generalizes --color-words, 2010-04-14)). But that postpones the > real fix for only a short amount of time. > > Ideas welcome how to extend the flag space. (We cannot just make it > a long either, as some arcane architecures have 32 bit longs.) One simple idea would be to convert the single 'flag' into various bit fields themselves, that way if you need to add a new flag you would just make a new bit field. I'm unaware of any downsides of doing so (though i may be missing something) but doing so would probably cause a bit of code churn. -- Brandon Williams
Re: [RFC] protocol version 2
On 10/24, Junio C Hamano wrote: > Brandon Williams writes: > > > * Capabilities were implemented as a hack and are hidden behind a NUL > > byte after the first ref sent from the server during the ref > > advertisement: > > ... > > > > * Various other technical debt (e.g. abusing capabilities to > > communicate agent and symref data, service name set using a query > > parameter). > > This sounds like a duplicate of the above. You're right, it mostly is a duplication of that. > > > Special Packets > > - > > > > In protocol v2 these special packets will have the following semantics: > > > > * '' Flush Packet (flush-pkt) - indicates the end of a message > > * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list > > After reading the remainder of the document twice, I do not think > the reason why we want to introduce a new type delim-pkt is > explained and justified well enough. If a command request takes a > command packet, zero or more capability packets, a mandatory > delimiter packet, zero or more parameter packets and a mandatory > flush packet, then you can use the same "flush" as delimiter in the > middle. The delimiter will of course become useful if you can omit > it when not necessary (e.g. after seeing capabilities, you may see a > flush and you will know there is no parameters and save the need to > send one "delim"). > > I actually have a reasonable guess why you want to have a separate > delimiter (which has nothing to do with "optional delim can be > omitted"), but I want to see it explained in this document clearly > by its designer(s). Jonathan Tan suggested that we tighten flush semantics in a newer protocol so that proxies are easier to work with. Currently proxies need to understand the protocol instead of simply waiting for a flush. Also I've been told the smart http code is more complex because of the current semantics of flush packets. > > > command-request = command > > capability-list > > delim-pkt > > (command specific parameters) > > flush-pkt > > command = PKT-LINE("command=" key LF) > > > > The server will then acknowledge the command and requested capabilities > > by echoing them back to the client and then launch into the command. > > > > acknowledge-request = command > > capability-list > > delim-pkt > > execute-command > > execute-command = > > It is not quite clear what the value of "echoing them back" is, > especially if that is done by always echoing verbatim. A reader may > naturally expect, when capabilities are exchanged between two > parties, these are negotiated so that the only ones that are > commonly supported by both ends would be used, or something like > that. The echoing back of the command and requested capabilities may or may not be needed. A client should only ever issue a command-request using advertised capabilities and commands so there really isn't much negotiation happening, just the server saying "here's what's on the menu" and the client picking only from that menu. Where the echoing back may be useful is if we wanted to (at some point) eliminate this initial round trip of doing the capability advertisement and then subsequent selection of capabilities and instead stuff that information into the GIT_PROTOCOL side channel in the initial request. That way the client could optimistically send capabilities that it doesn't yet know if the server supports. I thought this might be an interesting idea if we really really didn't want to live with the extra round trip. > > > A particular command can last for as many rounds as are required to > > complete the service (multiple for negotiation during fetch and push or > > no additional trips in the case of ls-refs). > > OK. > > > Commands in v2 > > > > > > Services are the core actions that a client wants to perform (fetch, > > push, etc). Each service has its own set of capabilities and its own > > language of commands (think 'want' lines in fetch). Optionally a > > service can take in initial parameters or data when a client sends it > > service request. > > So a service (like "fetch") employ a set of "command"s (like "want", > "have, etc)? In the earlier part of the document, we did not see > any mention of "service" and instead saw only "command" mentioned. > Is the state machine on both ends and the transition between states > implicit? E.g. when one side throws "want" command and the other > side acknowledges, they understand implicitly that they are now in a > "fetch" service session, even though there is nothing that said over > the wire that they are doing so? One reason I am wondering about > this is what we should do if a command verb may be applicable in > multiple services. Looks like I missed changing the word
Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads
Am 24.10.2017 um 02:52 schrieb Junio C Hamano: > René Scharfe writes: > >> Am 21.10.2017 um 14:18 schrieb Junio C Hamano: >>> René Scharfe writes: >>> FWIW, I use "-?" for that everywhere. I have yet to find a command or environment where it does something dangerous. >>> >>> Yeah, it would have made the world a better place if we made that >>> choice back in 2008. If we start a transition to make it so right >>> now, we might be able to make the world a better place by 2022, >>> perhaps. I am not sure if the pain during the transition is worth >>> it, though. >> >> "-?" works fine with builtins already -- they complain that the option >> is unknown and then show the short help text. > > Ah, I misunderstood what you meant, then. I thought you were > advocating to move the built-in short-help support to know about and > explicitly react to "-?", and somehow forgot that it "works" more or > less already. I don't mean to advocate here -- it's more like trying to get the accounting right. A little bit of OCD perhaps? > The fact that "-?" already works for most things is good, but the > transition pain still remains, as what's costly is to transition > people's expectation (i.e. "'-?' and not '-h' is the way to get > short help from any subcommand"), not the implementation to fill the > gaps for those that do not yet support '-?', I am afraid. A minor cost for help-seeking users would be the extra error message at the top of the short help text. The main change would cause "git ls-remote -h" to show remote heads and "git show-ref -h" to show HEAD. Confused users would have to resort to e.g. man, -help, --help, their search engine of choice, or -?. I feel this could be justified. It would be different if e.g. reset started to take -h as shorthand for --hard, as this would cause data loss. The hard part would probably be allowing the execution of commands with unknown arguments outside of repositories to show the help text, even if they'd normally (with correct arguments) require one. Converting all commands from RUN_SETUP to RUN_SETUP_GENTLY seems painful. Showing help when a required repo is not found might be possible somehow. With that I'm going to shut up, as I don't even use the affected commands, nor can I imagine being in the place of someone impacted by "git -h" not showing a help text. René
Re: Consequences of CRLF in index?
Hi, Lars Schneider wrote: > I've migrated a large repo (110k+ files) with a lot of history (177k commits) > and a lot of users (200+) to Git. Unfortunately, all text files in the index > of the repo have CRLF line endings. In general this seems not to be a problem > as the project is developed exclusively on Windows. Sounds good. > However, I wonder if there are any "hidden consequences" of this setup? > If there are consequences, then I see two options. Either I rebase the repo > and fix the line endings for all commits or I add a single commit that fixes > the line endings for all files. Both actions require coordination with the > users to avoid repo trouble/merge conflicts. The "single fixup commit" options > would also make diffs into the past look bad. Would a single large commit have > any impact on the performance of standard Git operations? There are no hidden consequences that I'm aware of. If you later decide that you want to become a cross-platform project, then you may want to switch to LF endings, in which case I suggest the "single fixup commit" strategy. In any event, you also probably want to declare what you're doing using .gitattributes. By checking in the files as CRLF, you are declaring that you do *not* want Git to treat them as text files (i.e., you do not want Git to change the line endings), so something as simple as * -text should do the trick. See gitattributes(5) for details. I'd be interested to hear what happens when diff-ing across a line ending fixup commit. Is this an area where Git needs some improvement? "git merge" knows an -Xrenormalize option to deal with a related problem --- it's possible that "git diff" needs to learn a similar trick. Thanks and hope that helps, Jonathan
Consequences of CRLF in index?
Hi, I've migrated a large repo (110k+ files) with a lot of history (177k commits) and a lot of users (200+) to Git. Unfortunately, all text files in the index of the repo have CRLF line endings. In general this seems not to be a problem as the project is developed exclusively on Windows. However, I wonder if there are any "hidden consequences" of this setup? If there are consequences, then I see two options. Either I rebase the repo and fix the line endings for all commits or I add a single commit that fixes the line endings for all files. Both actions require coordination with the users to avoid repo trouble/merge conflicts. The "single fixup commit" options would also make diffs into the past look bad. Would a single large commit have any impact on the performance of standard Git operations? Thanks, Lars
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On 24 October 2017 at 18:45, Eric Sunshine wrote: > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin >> wrote: >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in >>> init_merge_options(). >>> Consequential call of getenv() may return NULL pointer and strtol() crashes. >>> However the stored pointer to the obtained getenv() result may be >>> invalidated >>> by some other getenv() call from another thread as getenv() is not >>> thread-safe. I'm having trouble wrapping my head around this. Under which circumstances could the second call in the current code return NULL, but the code after your patch behave in a well-defined (and correct) way? > The distance between getenv() and the point where the value is > actually used is a big concern due to not knowing what is or might be > going on in called functions between the two points. According to [1], > the value returned by getenv() could be invalidated by another call to > getenv() (or setenv() or unsetenv() or putenv()), and we don't have > guarantee that we're safe from such invalidation considering that this > function calls out to others. For instance, after getenv() but before > the value is used, init_merge_options() calls merge_recursive_config() > which calls git_config() which calls git_xmerge_config(), and so on. > > For this reason, I have difficulty endorsing this change as-is. Yeah. The call should be immediately before `merge_verbosity` is used. Then, if a compiler wants to move the call, it has to do the work and prove that it's ok.
[PATCH] add a test for "describe --contains" with mixed tags
From: Orgad Shaneh If a repository has early lightweight tags and annotated tags later, running git describe --contains for an early commit used the annotated tag for reference, instead of the lightweight tag which was closer. This has been fixed in ef1e74065c19cc427c4a1b322154fd55d7a3588b, and regression was tweaked in 5554451de61cb90e530f30b96e62d455e1eff6a1. Add a test for that to avoid further regressions. Signed-off-by: Orgad Shaneh --- t/t6120-describe.sh | 16 1 file changed, 16 insertions(+) diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 1c0e8659d..08427f4b5 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -340,4 +340,20 @@ test_expect_success ULIMIT_STACK_SIZE 'describe works in a deep repo' ' test_cmp expect actual ' +test_expect_success 'describe --contains for light before annotated' ' + test_tick && + git commit --allow-empty -m First && + test_tick && + git commit --allow-empty -m Second && + test_tick && + git commit --allow-empty -m Third && + test_tick && + git tag light-before-annotated HEAD^ && + test_tick && + git tag -a -m annotated annotated-after-light + +' + +check_describe light-before-annotated~1 --contains light-before-annotated~1 + test_done -- 2.15.0.rc2
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller wrote: > On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin > wrote: >> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in >> init_merge_options(). >> Consequential call of getenv() may return NULL pointer and strtol() crashes. >> However the stored pointer to the obtained getenv() result may be invalidated >> by some other getenv() call from another thread as getenv() is not >> thread-safe. > > But do we have other threads running at the time? I feel uncomfortable about this change, not due to lack of thread safety, but due to the distance between the getenv() invocation and its point of use. See below for more detail. >> Signed-off-by: Andrey Okoshkin >> --- >> diff --git a/merge-recursive.c b/merge-recursive.c >> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct >> merge_options *o) >> void init_merge_options(struct merge_options *o) >> { >> + const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY"); > > Despite not being in a threaded environment, I wonder if we want to > minimize the time between calling getenv and the use of the result, > i.e. declare merge_verbosity here, but assign it later, just before the > condition? > > With or without this change: > Reviewed-by: Stefan Beller The distance between getenv() and the point where the value is actually used is a big concern due to not knowing what is or might be going on in called functions between the two points. According to [1], the value returned by getenv() could be invalidated by another call to getenv() (or setenv() or unsetenv() or putenv()), and we don't have guarantee that we're safe from such invalidation considering that this function calls out to others. For instance, after getenv() but before the value is used, init_merge_options() calls merge_recursive_config() which calls git_config() which calls git_xmerge_config(), and so on. For this reason, I have difficulty endorsing this change as-is. [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html >> memset(o, 0, sizeof(struct merge_options)); >> o->verbosity = 2; >> o->buffer_output = 1; >> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o) >> o->renormalize = 0; >> o->detect_rename = 1; >> merge_recursive_config(o); >> - if (getenv("GIT_MERGE_VERBOSITY")) >> - o->verbosity = >> - strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10); >> + if (merge_verbosity) >> + o->verbosity = strtol(merge_verbosity, NULL, 10); >> if (o->verbosity >= 5) >> o->buffer_output = 0; >> strbuf_init(&o->obuf, 0); >> -- >> 2.14.3
Re: [PATCH v2 1/1] completion: add remaining flags to checkout
Am 24.10.2017 um 15:19 schrieb Thomas Braun: In the commits 1fc458d9 (builtin/checkout: add --recurse-submodules switch, 2017-03-14), 08d595dc (checkout: add --ignore-skip-worktree-bits in sparse checkout mode, 2013-04-13) and 32669671 (checkout: introduce --detach synonym for "git checkout foo^{commit}", 2011-02-08) checkout gained new flags but the completion was not updated, although these flags are useful completions. Add them. The flags --force and --ignore-other-worktrees are not added as they are potentially dangerous. The flags --progress and --no-progress are only useful for scripting and are therefore also not included. Signed-off-by: Thomas Braun --- contrib/completion/git-completion.bash | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index d934417475..eb6ade6974 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1250,7 +1250,8 @@ _git_checkout () --*) __gitcomp " --quiet --ours --theirs --track --no-track --merge - --conflict= --orphan --patch + --conflict= --orphan --patch --detach --ignore-skip-worktree-bits + --recurse-submodules --no-recurse-submodules " ;; *) Looks good to me. Thanks, -- Hannes
Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin wrote: > Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in > init_merge_options(). > Consequential call of getenv() may return NULL pointer and strtol() crashes. > However the stored pointer to the obtained getenv() result may be invalidated > by some other getenv() call from another thread as getenv() is not > thread-safe. But do we have other threads running at the time? Inspecting the four callsites: * sequencer.c: The prior lines to hold the index lock suggests we're not in a threaded environment * builtin/merge-recursive.c: In cmd_merge_recursive and we're fiddling with argv/argc, which suggests we in a main function, not having threads around * builtin/am.c: fall_back_threeway called by am_run. (am is not threaded) * builtin/merge: In try_merge_strategy called from the main function, also index locking * builtin/checkout.c: merge_working_tree also locks the index. So I think this function is never called from within a threaded environment in git. > > Signed-off-by: Andrey Okoshkin > --- > merge-recursive.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/merge-recursive.c b/merge-recursive.c > index 1494ffdb8..eaac98145 100644 > --- a/merge-recursive.c > +++ b/merge-recursive.c > @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options > *o) > > void init_merge_options(struct merge_options *o) > { > + const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY"); Despite not being in a threaded environment, I wonder if we want to minimize the time between calling getenv and the use of the result, i.e. declare merge_verbosity here, but assign it later, just before the condition? (The compiler may shuffle stuff around anyway, so this is a moot suggestion; It gears mostly towards making the code more readable/maintainable when presenting this part of the code to the user.) With or without this change: Reviewed-by: Stefan Beller > memset(o, 0, sizeof(struct merge_options)); > o->verbosity = 2; > o->buffer_output = 1; > @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o) > o->renormalize = 0; > o->detect_rename = 1; > merge_recursive_config(o); > - if (getenv("GIT_MERGE_VERBOSITY")) > - o->verbosity = > - strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10); > + if (merge_verbosity) > + o->verbosity = strtol(merge_verbosity, NULL, 10); > if (o->verbosity >= 5) > o->buffer_output = 0; > strbuf_init(&o->obuf, 0); > -- > 2.14.3
Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts
On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty wrote: > diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh > @@ -34,6 +34,86 @@ test_update_rejected () { > +# Test adding and deleting D/F-conflicting references in a single > +# transaction. > +df_test() { > + local prefix="$1" > + shift > + local pack=: Isn't "local" a Bash-ism we want to keep out of the test scripts? > + local symadd=false > + local symdel=false > + local add_del=false > + local addref > + local delref > + while test $# -gt 0 > + do
Re: [PATCH] path: use xmalloc in add_to_trie
On Tue, Oct 24, 2017 at 8:15 AM, Andrey Okoshkin wrote: > Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps > and checks memory allocation result. > > Signed-off-by: Andrey Okoshkin > --- > Hello, > I'm not sure but it looks like there is a missing check of the malloc result. > memcpy() may crash with SIGSEGV due to the memory allocation failure. > make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie() > also should use it. Good catch! Thanks for spotting. Trying to find similar occurrences via git grep "= malloc" did not yield other places that need the same fix. Thanks, Stefan
Re: [PATCH] status: do not get confused by submodules in excluded directories
On Tue, Oct 24, 2017 at 5:15 AM, Heiko Voigt wrote: > Looks good to me. Same here, Thanks, Stefan
Re: No log --no-decorate completion?
On Tue, Oct 24, 2017 at 8:15 AM, Max Rothman wrote: > Just re-discovered this in my inbox. So is this worth fixing? I could > (probably) figure out a patch. > > Thanks, > Max > $ git log -- Display all 100 possibilities? (y or n) I guess adding one more option is no big deal, so go for it! We also have a couple --no-options already (as you said earlier), which I think makes it even more viable. Tangent: I wonder if we can cascade the completion by common prefixes, e.g. --no- or --ignore- occurs a couple of times, such that we could only show these --no- once and only if you try autocompleting thereafter you get all --no- options. Thanks for reviving this thread! Stefan
Re: No log --no-decorate completion?
Just re-discovered this in my inbox. So is this worth fixing? I could (probably) figure out a patch. Thanks, Max On Thu, Oct 12, 2017 at 1:41 PM, Max Rothman wrote: > To be fair, other --no* options complete, it's just --no-decorate, > --no-walk, --no-abbrev-commit, --no-expand-tabs, --no-patch, > --no-indent-heuristic, and --no-textconv that's missing. > > For example: > $ git log --no > --no-color --no-max-parents --no-min-parents --no-prefix > --not > --no-ext-diff --no-merges--no-notes --no-renames > --notes > > Thanks, > Max > > On Wed, Oct 11, 2017 at 2:09 PM, Stefan Beller wrote: >> On Wed, Oct 11, 2017 at 7:47 AM, Max Rothman wrote: >>> I recently noticed that in the git-completion script, there's >>> completion for --decorate={full,yes,no} for git log and family, but >>> not for --no-decorate. Is that intentional? If not, I *think* I see >>> how it could be added. >>> >>> Thanks, >>> Max >> >> Using git-blame, I found af4e9e8c87 (completion: update am, commit, and log, >> 2009-10-07) as well as af16bdaa3f (completion: fix and update 'git log >> --decorate=' >> options, 2015-05-01), both of their commit messages do not discuss leaving >> out >> --no-decorate intentionally. >> >> If you give --no- you'd get more than just the completion to >> --no-decorate, >> but all the negated options, I would assume? >> >> So maybe that is why no one added the negated options, yet? >> >> Thanks, >> Stefan
[PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once
Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in init_merge_options(). Consequential call of getenv() may return NULL pointer and strtol() crashes. However the stored pointer to the obtained getenv() result may be invalidated by some other getenv() call from another thread as getenv() is not thread-safe. Signed-off-by: Andrey Okoshkin --- merge-recursive.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 1494ffdb8..eaac98145 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options *o) void init_merge_options(struct merge_options *o) { + const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY"); memset(o, 0, sizeof(struct merge_options)); o->verbosity = 2; o->buffer_output = 1; @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o) o->renormalize = 0; o->detect_rename = 1; merge_recursive_config(o); - if (getenv("GIT_MERGE_VERBOSITY")) - o->verbosity = - strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10); + if (merge_verbosity) + o->verbosity = strtol(merge_verbosity, NULL, 10); if (o->verbosity >= 5) o->buffer_output = 0; strbuf_init(&o->obuf, 0); -- 2.14.3
[PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure
Since dc39e09942 (files_ref_store: use a transaction to update packed refs, 2017-09-08), failure to lock a reference has been handled incorrectly by `files_transaction_prepare()`. If `lock_ref_for_update()` fails in the lock-acquisition loop of that function, it sets `ret` then breaks out of that loop. Prior to dc39e09942, that was OK, because the only thing following the loop was the cleanup code. But dc39e09942 added another blurb of code between the loop and the cleanup. That blurb sometimes resets `ret` to zero, making the cleanup code think that the locking was successful. Specifically, whenever * One or more reference deletions have been processed successfully in the lock-acquisition loop. (Processing the first such reference causes a packed-ref transaction to be initialized.) * Then `lock_ref_for_update()` fails for a subsequent reference. Such a failure can happen for a number of reasons, such as the old SHA-1 not being correct, lock contention, etc. This causes a `break` out of the lock-acquisition loop. * The `packed-refs` lock is acquired successfully and `ref_transaction_prepare()` succeeds for the packed-ref transaction. This has the effect of resetting `ret` back to 0, and making the cleanup code think that lock acquisition was successful. In that case, any reference updates that were processed prior to breaking out of the loop would be carried out (loose and packed), but the reference that couldn't be locked and any subsequent references would silently be ignored. This can easily cause data loss if, for example, the user was trying to push a new name for an existing branch while deleting the old name. After the push, the branch could be left unreachable, and could even subsequently be garbage-collected. This problem was noticed in the context of deleting one reference and creating another in a single transaction, when the two references D/F conflict with each other, like git update-ref --stdin < --- refs/files-backend.c | 2 +- t/t1404-update-ref-errors.sh | 16 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 014dabb0bf..8cc1e07fdb 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2570,7 +2570,7 @@ static int files_transaction_prepare(struct ref_store *ref_store, ret = lock_ref_for_update(refs, update, transaction, head_ref, &affected_refnames, err); if (ret) - break; + goto cleanup; if (update->flags & REF_DELETING && !(update->flags & REF_LOG_ONLY) && diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh index 8b5e9a83c5..b7838967b8 100755 --- a/t/t1404-update-ref-errors.sh +++ b/t/t1404-update-ref-errors.sh @@ -276,11 +276,11 @@ test_expect_success 'D/F conflict prevents add short + delete long' ' df_test refs/df-as-dl --add-del foo foo/bar ' -test_expect_failure 'D/F conflict prevents delete long + add short' ' +test_expect_success 'D/F conflict prevents delete long + add short' ' df_test refs/df-dl-as --del-add foo/bar foo ' -test_expect_failure 'D/F conflict prevents delete short + add long' ' +test_expect_success 'D/F conflict prevents delete short + add long' ' df_test refs/df-ds-al --del-add foo foo/bar ' @@ -292,17 +292,17 @@ test_expect_success 'D/F conflict prevents add short + delete long packed' ' df_test refs/df-as-dlp --pack --add-del foo foo/bar ' -test_expect_failure 'D/F conflict prevents delete long packed + add short' ' +test_expect_success 'D/F conflict prevents delete long packed + add short' ' df_test refs/df-dlp-as --pack --del-add foo/bar foo ' -test_expect_failure 'D/F conflict prevents delete short packed + add long' ' +test_expect_success 'D/F conflict prevents delete short packed + add long' ' df_test refs/df-dsp-al --pack --del-add foo foo/bar ' # Try some combinations involving symbolic refs... -test_expect_failure 'D/F conflict prevents indirect add long + delete short' ' +test_expect_success 'D/F conflict prevents indirect add long + delete short' ' df_test refs/df-ial-ds --sym-add --add-del foo/bar foo ' @@ -314,11 +314,11 @@ test_expect_success 'D/F conflict prevents indirect add short + indirect delete df_test refs/df-ias-idl --sym-add --sym-del --add-del foo foo/bar ' -test_expect_failure 'D/F conflict prevents indirect delete long + indirect add short' ' +test_expect_success 'D/F conflict prevents indirect delete long + indirect add short' ' df_test refs/df-idl-ias --sym-add --sym-del --del-add foo/bar foo ' -test_expect_failure 'D/F conflict prevents indirect add long + delete short packed' ' +test_expect_success 'D/F conflict prevents indirect add long + delete short packed' ' df_test refs/df-ial-dsp --sym-add --pack --add-del foo/
[PATCH 1/2] t1404: add a bunch of tests of D/F conflicts
It is currently not allowed, in a single transaction, to add one reference and delete another reference if the two reference names D/F conflict with each other (e.g., like `refs/foo/bar` and `refs/foo`). The reason is that the code would need to take locks $GIT_DIR/refs/foo.lock $GIT_DIR/refs/foo/bar.lock But the latter lock couldn't coexist with the loose reference file $GIT_DIR/refs/foo , because `$GIT_DIR/refs/foo` cannot be both a directory and a file at the same time (hence the name "D/F conflict). Add a bunch of tests that we cleanly reject such transactions. In fact, many of the new tests currently fail. They will be fixed in the next commit along with an explanation. Reported-by: Jeff King Signed-off-by: Michael Haggerty --- t/t1404-update-ref-errors.sh | 146 +++ 1 file changed, 146 insertions(+) diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh index 100d50e362..8b5e9a83c5 100755 --- a/t/t1404-update-ref-errors.sh +++ b/t/t1404-update-ref-errors.sh @@ -34,6 +34,86 @@ test_update_rejected () { Q="'" +# Test adding and deleting D/F-conflicting references in a single +# transaction. +df_test() { + local prefix="$1" + shift + local pack=: + local symadd=false + local symdel=false + local add_del=false + local addref + local delref + while test $# -gt 0 + do + case "$1" in + --pack) + pack="git pack-refs --all" + shift + ;; + --sym-add) + # Perform the add via a symbolic reference + symadd=true + shift + ;; + --sym-del) + # Perform the del via a symbolic reference + symdel=true + shift + ;; + --del-add) + # Delete first reference then add second + add_del=false + delref="$prefix/r/$2" + addref="$prefix/r/$3" + shift 3 + ;; + --add-del) + # Add first reference then delete second + add_del=true + addref="$prefix/r/$2" + delref="$prefix/r/$3" + shift 3 + ;; + *) + echo 1>&2 "Extra args to df_test: $*" + return 1 + ;; + esac + done + git update-ref "$delref" $C && + if $symadd + then + addname="$prefix/s/symadd" && + git symbolic-ref "$addname" "$addref" + else + addname="$addref" + fi && + if $symdel + then + delname="$prefix/s/symdel" && + git symbolic-ref "$delname" "$delref" + else + delname="$delref" + fi && + cat >expected-err <<-EOF && + fatal: cannot lock ref $Q$addname$Q: $Q$delref$Q exists; cannot create $Q$addref$Q + EOF + $pack && + if $add_del + then + printf "%s\n" "create $addname $D" "delete $delname" + else + printf "%s\n" "delete $delname" "create $addname $D" + fi >commands && + test_must_fail git update-ref --stdin output.err && + test_cmp expected-err output.err && + printf "%s\n" "$C $delref" >expected-refs && + git for-each-ref --format="%(objectname) %(refname)" $prefix/r >actual-refs && + test_cmp expected-refs actual-refs +} + test_expect_success 'setup' ' git commit --allow-empty -m Initial && @@ -188,6 +268,72 @@ test_expect_success 'empty directory should not fool 1-arg delete' ' git update-ref --stdin ' +test_expect_success 'D/F conflict prevents add long + delete short' ' + df_test refs/df-al-ds --add-del foo/bar foo +' + +test_expect_success 'D/F conflict prevents add short + delete long' ' + df_test refs/df-as-dl --add-del foo foo/bar +' + +test_expect_failure 'D/F conflict prevents delete long + add short' ' + df_test refs/df-dl-as --del-add foo/bar foo +' + +test_expect_failure 'D/F conflict prevents delete short + add long' ' + df_test refs/df-ds-al --del-add foo foo/bar +' + +test_expect_success 'D/F conflict prevents add long + delete short packed' ' + df_test refs/df-al-dsp --pack --add-del foo/bar foo +' + +test_expect_success 'D/F conflict prevents add short + delete long packed' ' + df_test refs/df-as-dlp --pack --add-del foo foo/bar +' + +test_expect_failure 'D/F conflict prevents delete long packed + add short' ' + df_test refs/df-dlp-as --pack --del-add foo/bar foo +' + +test_expect_failure 'D/F conflict prevents delete short packe
[PATCH 0/2] Fix an error-handling path when locking refs
In [1], Peff described a bug that he found that could cause a reference transaction to be partly carried-out and partly not, with a successful return. The scenario that he discussed was the deletion of one reference and creation of another, where the two references D/F conflict with each other. But in fact the problem is more general; it can happen whenever a reference deletion within a transaction is processed successfully, and then another reference update in the same transaction fails in `lock_ref_for_update()`. In such a case the failed update and any subsequent ones could be silently ignored. There is a longer explanation in the second patch's commit message. The tests that I added probe a bunch of D/F update scenarios, which I think should be characteristic of the scenarios that would trigger this bug. It would be nice to have tests that examine other types of failures (e.g., wrong `old_oid` values). Bit since the fix is obviously a strict improvement and can prevent data loss, and since the release process is already pretty far along, I wanted to send this out ASAP. We can follow it up later with additional tests. These patches apply to current master. They are also available from my GitHub fork [2] as branch `ref-locking-fix`. While looking at this code again, I realized that the new code rewrites the `packed-refs` file more often than did the old code. Specifically, after dc39e09942 (files_ref_store: use a transaction to update packed refs, 2017-09-08), the `packed-refs` file is overwritten for any transaction that includes a reference delete. Prior to that commit, `packed-refs` was only overwritten if a deleted reference was actually present in the existing `packed-refs` file. This is even the case if there was previously no `packed-refs` file at all; now any reference deletion causes an empty `packed-refs` file to be created. I think this is a conscionable regression, since deleting references that are purely loose is probably not all that common, and the old code had to read the whole `packed-refs` file anyway. Nevertheless, it is obviously something that I would like to fix (though maybe not for 2.15? I'm open to input about its urgency.) [1] https://public-inbox.org/git/20171024082409.smwsd6pla64jj...@sigill.intra.peff.net/ [2] https://github.com/mhagger/git Michael Haggerty (2): t1404: add a bunch of tests of D/F conflicts files_transaction_prepare(): fix handling of ref lock failure refs/files-backend.c | 2 +- t/t1404-update-ref-errors.sh | 146 +++ 2 files changed, 147 insertions(+), 1 deletion(-) -- 2.14.1
[PATCH] path: use xmalloc in add_to_trie
Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps and checks memory allocation result. Signed-off-by: Andrey Okoshkin --- Hello, I'm not sure but it looks like there is a missing check of the malloc result. memcpy() may crash with SIGSEGV due to the memory allocation failure. make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie() also should use it. Best regards, Andrey path.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.c b/path.c index 335d4dd87..69b23ef17 100644 --- a/path.c +++ b/path.c @@ -191,7 +191,7 @@ static void *add_to_trie(struct trie *root, const char *key, void *value) * Split this node: child will contain this node's * existing children. */ - child = malloc(sizeof(*child)); + child = xmalloc(sizeof(*child)); memcpy(child->children, root->children, sizeof(root->children)); child->len = root->len - i - 1; -- 2.14.3
[PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading
There is code in post_read_index_from() to detect out of order cache entries when reading an index file. This order verification adds cost to read_index_from() that grows with the size of the index. Put this on-disk data-structure validation code behind an #ifdef DEBUG so only debug builds have to pay the cost. The effect can be seen using t/perf/p0002-read-cache.sh: Test w/git repo HEADHEAD~1 read_cache/discard_cache 1000 times 0.42(0.01+0.09) 0.48(0.01+0.09) +14.3% read_cache/discard_cache 1000 times 0.41(0.03+0.04) 0.49(0.00+0.10) +19.5% read_cache/discard_cache 1000 times 0.42(0.03+0.06) 0.49(0.06+0.04) +16.7% Test w/10K files HEADHEAD~1 --- read_cache/discard_cache 1000 times 1.58(0.04+0.00) 1.71(0.00+0.07) +8.2% read_cache/discard_cache 1000 times 1.64(0.01+0.07) 1.76(0.01+0.09) +7.3% read_cache/discard_cache 1000 times 1.62(0.03+0.04) 1.71(0.00+0.04) +5.6% Test w/100K files HEAD HEAD~1 - read_cache/discard_cache 1000 times 25.85(0.00+0.06) 27.35(0.01+0.06) +5.8% read_cache/discard_cache 1000 times 25.82(0.01+0.07) 27.25(0.01+0.07) +5.5% read_cache/discard_cache 1000 times 26.00(0.01+0.07) 27.36(0.06+0.03) +5.2% Test with 1,000K filesHEAD HEAD~1 --- read_cache/discard_cache 1000 times 200.61(0.01+0.07) 218.23(0.03+0.06) +8.8% read_cache/discard_cache 1000 times 201.62(0.03+0.06) 217.86(0.03+0.06) +8.1% read_cache/discard_cache 1000 times 201.64(0.01+0.09) 217.89(0.03+0.07) +8.1% Signed-off-by: Ben Peart Signed-off-by: Johannes Schindelin --- Notes: Base Ref: master Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git checkout 95e20f17ff ### Interdiff (v1..v2): ### Patches read-cache.c | 4 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 65f4fe8375..fc90ec0fce 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, return ce; } +#ifdef DEBUG static void check_ce_order(struct index_state *istate) { unsigned int i; @@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate) } } } +#endif static void tweak_untracked_cache(struct index_state *istate) { @@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate) static void post_read_index_from(struct index_state *istate) { +#ifdef DEBUG check_ce_order(istate); +#endif tweak_untracked_cache(istate); tweak_split_index(istate); } base-commit: c52ca88430e6ec7c834af38720295070d8a1e330 -- 2.14.1.windows.1.1034.g0776750557
Re: Git diff --no-index and file descriptor inputs
On Thu, 2017-09-07 at 15:55 -0400, Jason Pyeron wrote: > > -Original Message- > > From: Martin Ågren > > Sent: Thursday, September 7, 2017 1:48 PM > > > > On 7 September 2017 at 18:00, Jason Pyeron wrote: > > > > > > I was hoping to leverage --word-diff-regex, but the > > > > --no-index option > > > does not seem to work with <(...) notation. > > > > > > I am I doing something wrong or is this a bug? > > > > There were some patches floating around half a year ago, I > > don't know what happened to them. > > > > From: Dennis Kaarsemaker > > Sent: Friday, January 13, 2017 5:20 AM > > Subject: [PATCH v2 0/2] diff --no-index: support symlinks and pipes > > https://public-inbox.org/git/20170324213110.4331-1-den...@kaarsemaker.net/ > > I see, it goes back to 2016... > > > From: Dennis Kaarsemaker > > Subject: [RFC/PATCH 0/2] git diff <(command1) <(command2) > > Date: Fri, 11 Nov 2016 21:19:56 +0100 > > https://public-inbox.org/git/2016201958.2175-1-den...@kaarsemaker.net/ > > I will read up on those threads weekend, then try to apply the patches and > see what happens. > > Thanks for the google fu help. I hope to send a new version of this soon. I had almost no time to do anything git related in the last year, trying to catch up with mailing list traffic now. D.
[PATCH v2 1/1] completion: add remaining flags to checkout
In the commits 1fc458d9 (builtin/checkout: add --recurse-submodules switch, 2017-03-14), 08d595dc (checkout: add --ignore-skip-worktree-bits in sparse checkout mode, 2013-04-13) and 32669671 (checkout: introduce --detach synonym for "git checkout foo^{commit}", 2011-02-08) checkout gained new flags but the completion was not updated, although these flags are useful completions. Add them. The flags --force and --ignore-other-worktrees are not added as they are potentially dangerous. The flags --progress and --no-progress are only useful for scripting and are therefore also not included. Signed-off-by: Thomas Braun --- contrib/completion/git-completion.bash | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index d934417475..eb6ade6974 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1250,7 +1250,8 @@ _git_checkout () --*) __gitcomp " --quiet --ours --theirs --track --no-track --merge - --conflict= --orphan --patch + --conflict= --orphan --patch --detach --ignore-skip-worktree-bits + --recurse-submodules --no-recurse-submodules " ;; *) -- 2.15.0.rc0.245.g6d586db062
Re: [PATCH v1 1/1] completion: add remaining flags to checkout
> Johannes Sixt hat am 12. Oktober 2017 um 18:50 geschrieben: > > > Am 12.10.2017 um 14:20 schrieb Thomas Braun: > > In the commits 1d0fa898 (checkout: add --ignore-other-wortrees, > > 2015-01-03), 1fc458d9 (builtin/checkout: add --recurse-submodules switch, > > 2017-03-14), 870ebdb9 (checkout: add --progress option, 2015-11-01), > > 08d595dc (checkout: add --ignore-skip-worktree-bits in sparse checkout > > mode, 2013-04-13), 1d0fa898 (checkout: add --ignore-other-wortrees, > > 2015-01-03), 32669671 (checkout: introduce --detach synonym for "git > > checkout foo^{commit}", 2011-02-08) and db941099 (checkout -f: allow > > ignoring unmerged paths when checking out of the index, 2008-08-30) > > checkout gained new flags but the completion was not updated, although > > these flags are useful completions. Add them. > > > > Signed-off-by: Thomas Braun > > --- > > contrib/completion/git-completion.bash | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/contrib/completion/git-completion.bash > > b/contrib/completion/git-completion.bash > > index d934417475..393d4ae230 100644 > > --- a/contrib/completion/git-completion.bash > > +++ b/contrib/completion/git-completion.bash > > @@ -1250,7 +1250,9 @@ _git_checkout () > > --*) > > __gitcomp " > > --quiet --ours --theirs --track --no-track --merge > > - --conflict= --orphan --patch > > + --conflict= --orphan --patch --detach --progress > > --no-progress > > + --force --ignore-skip-worktree-bits > > --ignore-other-worktrees > > Destructive and dangerous options are typically not offered by command > completion. You should omit all three in the line above, IMO. > > Furthermore, --progress and --no-progress are not useful in daily work > on the command line, I think. By offering them, --p would not > complete to --patch anymore, you would need --pa. You should omit > them, too. Thanks for the review. I've fixed that for the next reroll.
Re: v2.15.0-rc2 ref deletion bug
On 10/24/2017 01:05 PM, Michael Haggerty wrote: > On 10/24/2017 10:24 AM, Jeff King wrote: >> I found a potentially serious bug in v2.15.0-rc2 (and earlier release >> candidates, too) that we may want to deal with before the release. >> >> If I do: >> [...] >> then at the end we have no refs at all! > > That's a serious bug. I'm looking into it right now. The fix is trivial (see below). But let me add some tests and make sure that there are no similar breakages in the area, then submit a full patch. Michael - refs/files-backend.c - index 29eb5e826f..fc3f2abcc6 100644 @@ -2523,15 +2523,15 @@ static int files_transaction_prepare(struct ref_store *ref_store, */ for (i = 0; i < transaction->nr; i++) { struct ref_update *update = transaction->updates[i]; ret = lock_ref_for_update(refs, update, transaction, head_ref, &affected_refnames, err); if (ret) - break; + goto cleanup; if (update->flags & REF_DELETING && !(update->flags & REF_LOG_ONLY) && !(update->flags & REF_ISPRUNING)) { /* * This reference has to be deleted from * packed-refs if it exists there.
Re: [RFC PATCH 2/8] commit: move code to update HEAD to libgit
Phillip Wood writes: >> I suspect that it would be sufficient to make update_head() helper >> function take the reflog action message as another parameter >> instead to fix the above, but there may be other reasons why you >> chose to do it this way---I cannot read it in your empty log >> message, though. > > Good point, sorry I should have added some explanation about that. I > went with using setenv() rather than passing a reflog message to > update_head() as it meant there were no changes needed on the sequencer > side as it already sets GIT_REFLOG_ACTION. As the sequencer already sets > GIT_REFLOG_ACTION, and git-commit does not fork any subprocesses I don't Doesn't "git commit" run number of hooks? Is it just the current code does not run any hooks (by chance) after these new setenv() calls are made and we happen to be safe?
Re: [PATCH] status: do not get confused by submodules in excluded directories
On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote: > Johannes Schindelin writes: > > > We meticulously pass the `exclude` flag to the `treat_directory()` > > function so that we can indicate that files in it are excluded rather > > than untracked when recursing. > > > > But we did not yet treat submodules the same way. > > ... "because of that, we ended up showing < what situation>>" would be a nice thing to have here, so that it can > be copied to the release notes for the bugfix. Yes I agree that would be nice here. It was not immediately obvious that this only applies when using both flags: -u and --ignored. Seems to be a corner that not many people are using. At first I thought a plain 'git status' would show that behavior... > How far back a release do we want to make this fix applicable? It > seems that it applies cleanly to maint-2.13 without breaking from my > quick test, so that is probably where I'll queue this, even though > we may no longer issue further maintenance releases on that track. > > Any comment from submodule folks? > > Sorry that I didn't notice this was left unattended by anybody til > now. Will queue while waiting for those who are into submodules to > respond. Looks good to me. Cheers Heiko
Re: v2.15.0-rc2 ref deletion bug
On 10/24/2017 10:24 AM, Jeff King wrote: > I found a potentially serious bug in v2.15.0-rc2 (and earlier release > candidates, too) that we may want to deal with before the release. > > If I do: > > git init -q repo > cd repo > obj=$(git hash-object -w /dev/null) > git update-ref refs/tags/foo $obj > git update-ref --stdin <<-EOF > delete refs/tags/foo > update refs/tags/foo/bar $obj > EOF > git for-each-ref > > then at the end we have no refs at all! That's a serious bug. I'm looking into it right now. > I'd expect one of: > > 1. We delete "foo" before updating "foo/bar", and we end up with a > single ref. I don't think that this is possible in the general case in a single transaction. The problem is that the code would need to take locks refs/tags/foo.lock refs/tags/foo/bar.lock But the latter couldn't coexist with the loose reference file refs/tags/foo , which might already exist. It is only imaginable to do this in a single transaction if you pack and prune `refs/tags/foo` first, to get the loose reference out of the way, before executing the transaction. Even then, you would have to beware of a race where another process writes a loose version of `refs/tags/foo` between the time that `pack-refs` prunes it and the time that the transaction obtains the lock again. > [...] Michael
Docker image for git
Hi, Want to check if there is a docker image/Dockerfile for git which is officially supported? I could not find one under dockerhub official images https://hub.docker.com/explore/ If not, want to check if there is any plan to add an official git image to dockerhub? https://docs.docker.com/docker-hub/official_repos/#how-do-i-create-a-new-official-repository DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: [ANNOUNCE] Git for Windows 2.14.3
Dear Git users, On Tue, 24 Oct 2017, Johannes Schindelin wrote: > Dear Git users, > > It is my pleasure to announce that Git for Windows 2.14.3 is available from: > > https://git-for-windows.github.io/ ... and at this point, usually the release notes are repeated. In this case, they were not, for a simple reason: I am working very hard on automating as much of the release engineering of Git for Windows (first, because it is no fun to do it manually, second, it costs a lot of my time, third, it is not robust if I can easily forget a step, fourth, it is an awfully low bus number, and fifth, it hogs my computer for hours during which I could do other work otherwise, when I have a perfectly fine cloud available at my fingertips). And despite working on this for the entire day (and until almost 4am), a couple of things did not work correctly, including the generation of this mail. I will make sure to write a blog post about this endeavor, in case anybody is interested in the background story. But in the short run: here are the missing release notes: Changes since Git for Windows v2.14.2(3) (October 12th 2017) New Features * Comes with Git v2.14.3. * Git for Windows now ships with a diff helper for OpenOffice documents. * Comes with Git LFS v2.3.4. * Comes with cURL v7.56.1. Bug Fixes * Git for Windows now handles worktrees at the top-level of a UNC share correctly. > Filename | SHA-256 > | --- > Git-2.14.3-64-bit.exe | > 9610e082b823beb7f0da91a98d9f73e1f3f2430c21b2c4e15517dea4f981be3f > Git-2.14.3-32-bit.exe | > 6e5a8a939f3014b396f58622954ab394d7982d036c84571394118f2360bdca96 > PortableGit-2.14.3-64-bit.7z.exe | > 2b1d952078795117a8c4549f6384275e047ebd75c10bea77e675f8b672e6d87a > PortableGit-2.14.3-32-bit.7z.exe | > f2dcb32c3133188d0b7a2c3683adcbebcc10b054467e1754d1b8b7e534a34494 > MinGit-2.14.3-64-bit.zip | > 538294d2b1472e561493b67855f92380d8139011c74be6bf3cdc5b5d321b1345 > MinGit-2.14.3-32-bit.zip | > a91385acb1da220612790807c41d0f304b41093c474b9d7342230ec194a3398e > MinGit-2.14.3-busybox-64-bit.zip | > b7710c7668d7ad3f1f5f7530b601d9bafbe66fcef5563c8ab74d442ac9478d8e > MinGit-2.14.3-busybox-32-bit.zip | > 8982fd12c60a9edd1b6f5f8465354534920bae351d38c867a2f4034a807d8231 > Git-2.14.3-64-bit.tar.bz2 | > a5f09850334d5069afa0013249cc6678a7cde52c673823e5386d5cad9df41f10 > Git-2.14.3-32-bit.tar.bz2 | > 644b7d7593e675f68a5a011d19a0a917430b79fb815f6260b807c00651696fa2 Ciao, Johannes
Re: [RFC PATCH 2/8] commit: move code to update HEAD to libgit
On 07/10/17 10:54, Junio C Hamano wrote: > Phillip Wood writes: > >> From: Phillip Wood >> >> Signed-off-by: Phillip Wood >> --- > > This seems to do a lot more than just moving code, most notably, it > uses setenv() to affect what happens in any subprocesses we may > spawn, and it is unclear if it was verified that this patch is free > of unwanted consequences due to that change (and any others I may > have missed while reading this patch, if any). > > I suspect that it would be sufficient to make update_head() helper > function take the reflog action message as another parameter > instead to fix the above, but there may be other reasons why you > chose to do it this way---I cannot read it in your empty log > message, though. Good point, sorry I should have added some explanation about that. I went with using setenv() rather than passing a reflog message to update_head() as it meant there were no changes needed on the sequencer side as it already sets GIT_REFLOG_ACTION. As the sequencer already sets GIT_REFLOG_ACTION, and git-commit does not fork any subprocesses I don't think this change has any unwanted consequences (I pushed a branch to github before submitting the patches and the test suite passes on travis). It would however be clearer to add a parameter to update_head() for the reflog message as you suggested. > > I will not give line-by-line style nitpick but in general we do not > leave a SP between function name and the open parenthesis that > starts its argument list. New code in this patch seems to use > mixture of styles. Sorry I should have spotted those before I posted this series, I go though all the patches and fix them (this would be a good opportunity for me to try using git-clang-format from next) Thanks for looking at this, did you have time to look at the other changes in this series or did this patch put you off looking further? I'll update and repost probably towards the end of next week. If I continue to base these patches on master then I think the patch that moves the code to print the commit summary will have (trivial) conflicts with the changes in ao/check-resolve-ref-unsafe-result in pu do you want the new patches based pu or are you happy with them based on master? Best Wishes Phillip >> diff --git a/builtin/commit.c b/builtin/commit.c >> index >> 0b8c1ef6f57cfed328d12255e6834adb4bda4137..497778ba2c02afdd4a337969a27ca781e8389040 >> 100644 >> --- a/builtin/commit.c >> +++ b/builtin/commit.c >> @@ -1578,13 +1578,11 @@ int cmd_commit(int argc, const char **argv, const >> char *prefix) >> struct strbuf sb = STRBUF_INIT; >> struct strbuf author_ident = STRBUF_INIT; >> const char *index_file, *reflog_msg; >> -char *nl; >> struct object_id oid; >> struct commit_list *parents = NULL; >> struct stat statbuf; >> struct commit *current_head = NULL; >> struct commit_extra_header *extra = NULL; >> -struct ref_transaction *transaction; >> struct strbuf err = STRBUF_INIT; >> >> if (argc == 2 && !strcmp(argv[1], "-h")) >> @@ -1625,10 +1623,10 @@ int cmd_commit(int argc, const char **argv, const >> char *prefix) >> reflog_msg = getenv("GIT_REFLOG_ACTION"); >> if (!current_head) { >> if (!reflog_msg) >> -reflog_msg = "commit (initial)"; >> +setenv ("GIT_REFLOG_ACTION", "commit (initial)", 1); >> } else if (amend) { >> if (!reflog_msg) >> -reflog_msg = "commit (amend)"; >> +setenv("GIT_REFLOG_ACTION", "commit (amend)", 1); >> parents = copy_commit_list(current_head->parents); >> } else if (whence == FROM_MERGE) { >> struct strbuf m = STRBUF_INIT; >> @@ -1637,7 +1635,7 @@ int cmd_commit(int argc, const char **argv, const char >> *prefix) >> struct commit_list **pptr = &parents; >> >> if (!reflog_msg) >> -reflog_msg = "commit (merge)"; >> +setenv("GIT_REFLOG_ACTION", "commit (merge)", 1); >> pptr = commit_list_append(current_head, pptr); >> fp = xfopen(git_path_merge_head(), "r"); >> while (strbuf_getline_lf(&m, fp) != EOF) { >> @@ -1660,9 +1658,9 @@ int cmd_commit(int argc, const char **argv, const char >> *prefix) >> parents = reduce_heads(parents); >> } else { >> if (!reflog_msg) >> -reflog_msg = (whence == FROM_CHERRY_PICK) >> -? "commit (cherry-pick)" >> -: "commit"; >> +setenv("GIT_REFLOG_ACTION", (whence == FROM_CHERRY_PICK) >> +? "commit (cherry-pick)" >> +: "commit", 1); >> commit_list_insert(current_head, &parents); >> } >> >> @@ -1707,25 +1705,10 @@ int cmd_commit(int argc, const char **argv, const
v2.15.0-rc2 ref deletion bug
I found a potentially serious bug in v2.15.0-rc2 (and earlier release candidates, too) that we may want to deal with before the release. If I do: git init -q repo cd repo obj=$(git hash-object -w /dev/null) git update-ref refs/tags/foo $obj git update-ref --stdin <<-EOF delete refs/tags/foo update refs/tags/foo/bar $obj EOF git for-each-ref then at the end we have no refs at all! I'd expect one of: 1. We delete "foo" before updating "foo/bar", and we end up with a single ref. 2. We complain that we cannot update "foo/bar" because "foo" still exists. I was hoping for (1). But in earlier releases we did (2). That makes sense because it's safer to do all updates in a transaction before doing any deletes (since if there's a simultaneous prune we'd rather see both refs present for a moment rather than neither). But the v2.15 behavior is just buggy, and may lead to data loss (we silently lose the refs, and then a subsequent prune may lose the objects). This bisects to Michael's dc39e09942 (files_ref_store: use a transaction to update packed refs, 2017-09-08). Curiously, it doesn't happen if you reverse the order of the entries in the transaction (which _shouldn't_ matter, since we try to process it atomically, but obviously it just tickles this bug in a funny way). I haven't figured out if the deletion has to be a prefix of the update to trigger the bug, or if the problem is more widespread. -Peff
Re: [PATCH] status: do not get confused by submodules in excluded directories
On Tue, Oct 17, 2017 at 03:10:11PM +0200, Johannes Schindelin wrote: > We meticulously pass the `exclude` flag to the `treat_directory()` > function so that we can indicate that files in it are excluded rather > than untracked when recursing. > > But we did not yet treat submodules the same way. > > Signed-off-by: Johannes Schindelin > --- > Published-As: > https://github.com/dscho/git/releases/tag/submodule-in-excluded-v1 > Fetch-It-Via: git fetch https://github.com/dscho/git submodule-in-excluded-v1 > dir.c | 2 +- > t/t7061-wtstatus-ignore.sh | 14 ++ > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/dir.c b/dir.c > index 1d17b800cf3..9987011da57 100644 > --- a/dir.c > +++ b/dir.c > @@ -1392,7 +1392,7 @@ static enum path_treatment treat_directory(struct > dir_struct *dir, > if (!(dir->flags & DIR_NO_GITLINKS)) { > unsigned char sha1[20]; > if (resolve_gitlink_ref(dirname, "HEAD", sha1) == 0) > - return path_untracked; > + return exclude ? path_excluded : path_untracked; > } > return path_recurse; > } > diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh > index fc6013ba3c8..8c849a4cd2f 100755 > --- a/t/t7061-wtstatus-ignore.sh > +++ b/t/t7061-wtstatus-ignore.sh > @@ -272,4 +272,18 @@ test_expect_success 'status ignored tracked directory > with uncommitted file in t > test_cmp expected actual > ' > > +cat >expected <<\EOF > +!! tracked/submodule/ > +EOF > + > +test_expect_success 'status ignores submodule in excluded directory' ' > + git init tracked/submodule && > + ( > + cd tracked/submodule && > + test_commit initial > + ) && Could this use test_commit -C tracked/submodule initial? > + git status --porcelain --ignored -u tracked/submodule >actual && > + test_cmp expected actual > +' > + > test_done > > base-commit: 111ef79afe185f8731920569450f6a65320f5d5f > -- > 2.14.2.windows.3
git describe returns tags including 'refs/tags/
Hi I am versioning my python package using git tags and the python-versioneer package. Basically versioneer gets the version number from git tags using the command "git describe --tags --dirty --always --long --match 'tag_prefix*'". I do not apply a tag prefix, hence tag_prefix=''. Now to the problem. My build breaks because the command above returns tags like this "refs/tags/2.12.2" which contains invalid characters. Actually, it seems that executing the command above reconfigured the way git describe behaves. because, before running the above command "git describe --tags --dirty" returned 2.12.2-17-gd8e62f9 After, "git describe --tags --dirty" returns warning: tag 'refs/tags/2.12.2' is really '2.12.2' here refs/tags/2.12.2-17-gd8e62f9 Is it so that "git describe --tags --dirty --always --long --match 'tag_prefix*'" changes the way git describe behave or am I missing something? If it does, how can I reset it? Thanks Per ** This e-mail and any attachments thereto may contain confidential information and/or information protected by intellectual property rights for the exclusive attention of the intended addressees named above. If you have received this transmission in error, please immediately notify the sender by return e-mail and delete this message and its attachments. Unauthorized use, copying or further full or partial distribution of this e-mail or its contents is prohibited. **
Re: [RFE] Add minimal universal release management capabilities to GIT
On October 21, 2017 6:56:51 AM PDT, nicolas.mail...@laposte.net wrote: > > >- Mail original - >De: "Stefan Beller" > >> git tags ? > >Too loosely defined to be relied on by project-agnostic tools. That's >what most tools won't ever try to use those. Anything you will define >around tags as they stand is unlikely to work on the project of someone >else I think that this could easily be built by a separate script which provides git release command line and uses tags under the hood in a well formed way. I wouldn't say that the method outlined here works for all projects but I do agree it's fairly common and could work for many projects I think most large projects already use annotated tags and tho they have their own format it works pretty well. Showing a tool that could help projects create more standardized release tags would be helpful. I think such a tool could already be built, scripted to create annotated tags with a well formed name. I don't think you necessarily need to have this in core git, tho I do see that your main goal is to piggyback on git itselfs popularity Thanks Jake -- Sent from my Android device with K-9 Mail. Please excuse my brevity.