Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
> OK, thanks for working well together.  So does this (1) build on
> Jonathan's fsck-squelching series, or (2) ignores that and builds
> filtering first, potentially leaving the codebase to a broken state
> where it can create fsck-unclean repository until Jonathan's series
> is rebased on top of this, or (3) something else?  [*1*]

Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.

Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.

[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/

> I also saw a patch marked as "this is from Jonathan's earlier work",
> taking the authorship (which to me implies that the changes were
> extensive enough), so I am a bit at loss envisioning how this piece
> fits in the bigger picture together with the other piece.

The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-24 Thread Junio C Hamano
Jeff Hostetler  writes:

> +static int add_excludes_from_buffer(char *buf, size_t size,
> + const char *base, int baselen,
> + struct exclude_list *el);
> +
>  /*
>   * Given a file with name "fname", read it (either from disk, or from
>   * an index if 'istate' is non-null), parse it and store the
> @@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char 
> *base, int baselen,
>   struct sha1_stat *sha1_stat)
>  {
>   struct stat st;
> - int fd, i, lineno = 1;
> + int fd;
>   size_t size = 0;
> - char *buf, *entry;
> + char *buf;
>  
>   fd = open(fname, O_RDONLY);
>   if (fd < 0 || fstat(fd, &st) < 0) {

The post-context of this hunk is quite interesting in that there is
a call to read_skip_worktree_file_from_index(); which essentially 
pretends as if we read from the filesystem but in fact it grabs the
blob object name registered in the index and reads from it.

The reason why it is interesting is because this patch adds yet
nother "let's instead read from a blob object" function and there is
no sign to make the existing one take advantage of the new function
seen in this patch.


Re: [PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Sixt  writes:
>
>>> The flags --force and --ignore-other-worktrees are not added as they are
>>> potentially dangerous.
>>>
>>> The flags --progress and --no-progress are only useful for scripting and are
>>> therefore also not included.
>>>
>>> Signed-off-by: Thomas Braun 
>>> ---
>>>   contrib/completion/git-completion.bash | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/contrib/completion/git-completion.bash 
>>> b/contrib/completion/git-completion.bash
>>> index d934417475..eb6ade6974 100644
>>> --- a/contrib/completion/git-completion.bash
>>> +++ b/contrib/completion/git-completion.bash
>>> @@ -1250,7 +1250,8 @@ _git_checkout ()
>>> --*)
>>> __gitcomp "
>>> --quiet --ours --theirs --track --no-track --merge
>>> -   --conflict= --orphan --patch
>>> +   --conflict= --orphan --patch --detach 
>>> --ignore-skip-worktree-bits
>>> +   --recurse-submodules --no-recurse-submodules
>>> "
>>> ;;
>>> *)
>>>
>>
>> Looks good to me. Thanks,
>> -- Hannes
>
> Doesn't quite.  This breaks t9902, doesn't it?

I've queued it with the following squashed in.

 t/t9902-completion.sh | 4 
 1 file changed, 4 insertions(+)

diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 2cb999ecfa..fc614dcbfa 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -1245,6 +1245,10 @@ test_expect_success 'double dash "git checkout"' '
--conflict=
--orphan Z
--patch Z
+   --detach Z
+   --ignore-skip-worktree-bits Z
+   --recurse-submodules Z
+   --no-recurse-submodules Z
EOF
 '
 


Re: [PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-24 Thread Junio C Hamano
Johannes Sixt  writes:

>> The flags --force and --ignore-other-worktrees are not added as they are
>> potentially dangerous.
>>
>> The flags --progress and --no-progress are only useful for scripting and are
>> therefore also not included.
>>
>> Signed-off-by: Thomas Braun 
>> ---
>>   contrib/completion/git-completion.bash | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/completion/git-completion.bash 
>> b/contrib/completion/git-completion.bash
>> index d934417475..eb6ade6974 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -1250,7 +1250,8 @@ _git_checkout ()
>>  --*)
>>  __gitcomp "
>>  --quiet --ours --theirs --track --no-track --merge
>> ---conflict= --orphan --patch
>> +--conflict= --orphan --patch --detach 
>> --ignore-skip-worktree-bits
>> +--recurse-submodules --no-recurse-submodules
>>  "
>>  ;;
>>  *)
>>
>
> Looks good to me. Thanks,
> -- Hannes

Doesn't quite.  This breaks t9902, doesn't it?


Re: [PATCHv2 0/2] (x)diff cleanup: remove duplicate code

2017-10-24 Thread Junio C Hamano
Stefan Beller  writes:

> v2:
> * I realized that we don't have to change the hashing function of xdiff;
>   we can rather change the hashing function for the move detection,
>   which is less fundamental.
>   (That way I can shrink the series down to 2 patches)
> * commented and renamed function parameters in the exposed xdiff functions.
> * applies on top of jk/diff-color-moved-fix.

Yes, by reusing the line hashing and comparison from xdiff/ we can
ensure that we will use consistent comparison function, and the
thing we need to focus on will become how correctly the caller uses
the xdiff interface.  This looks much better than the previous one.

Eric's comment on the function parameters is right.  We keep them in
sync with the naming convention of xdiff/ as long as they are still
part of xdiff layer, and the convention there is that the lines
being compared are l1[] and l2[] whose lengths are s1 and s2, if I
am not mistaken (well, I am not, as I just touched the function
there during my lunch break ;-).




Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> I've been working with Jonathan Tan to combine our partial clone
> proposals.  This patch series represents a first step in that effort
> and introduces an object filtering mechanism to select unwanted
> objects.
>
> [1] traverse_commit_list and list-objects is extended to allow
> various filters.
> [2] rev-list is extended to expose filtering.  This allows testing
> of the filtering options.  And can be used later to predict
> missing objects before commands like checkout or merge.
> [3] pack-objects is extended to use filtering parameters and build
> packfiles that omit unwanted objects.
>
> This patch series lays the ground work for subsequent parts which
> will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.

OK, thanks for working well together.  So does this (1) build on
Jonathan's fsck-squelching series, or (2) ignores that and builds
filtering first, potentially leaving the codebase to a broken state
where it can create fsck-unclean repository until Jonathan's series
is rebased on top of this, or (3) something else?  [*1*]

I also saw a patch marked as "this is from Jonathan's earlier work",
taking the authorship (which to me implies that the changes were
extensive enough), so I am a bit at loss envisioning how this piece
fits in the bigger picture together with the other piece.


[Footnote]

*1* Not having the answer to this question does bother me, but it is
perfectly fine if the answer is (2), especially while the series
is in a WIP state.



Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> From: Jeff Hostetler 
>
> I've been working with Jonathan Tan to combine our partial clone
> proposals.  This patch series represents a first step in that effort
> and introduces an object filtering mechanism to select unwanted
> objects.
>
> [1] traverse_commit_list and list-objects is extended to allow
> various filters.
> [2] rev-list is extended to expose filtering.  This allows testing
> of the filtering options.  And can be used later to predict
> missing objects before commands like checkout or merge.
> [3] pack-objects is extended to use filtering parameters and build
> packfiles that omit unwanted objects.
>
> This patch series lays the ground work for subsequent parts which
> will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.

Thanks - I've taken a look at all of them except for the partialclone
extension one, which I've only glanced over briefly. Apart from some
style issues (indentation and typedef-ing of enums) I think that they
generally look all right.

One possible issue with using a formatted filter string (e.g.
blob:none) directly passed to the server as opposed to actual
client-interpreted flags (e.g. --blob-byte-limit=0) is that a user may
be confused if the version of Git they're using supports fancier
filters, which will work if they're running rev-list locally, but not
when they're fetching from a not-so-fancy Git server. Maybe that is
fine, though - something we've discussed internally is an ability to
offload some calculations (e.g. git log -S) to Git servers, and if we
end up doing something similar to that, users will need to deal with
this problem anyway.

The reason why I only glanced over the partialclone patch is because I
think that that change needs more discussion than the rest, and it
would be good to get the others in first. I know that you included the
partialclone patch because it is needed by the rev-list one, but as I
commented in the rev-list one, I think that it does not need to be
aware of partial clones, at least for now. The overall ideas in the
partialclone patch seem good, though - in particular, one conceptual
difference from my patch [1] is that the filter setting is a property
of the repository as opposed to the remote, which does seem like an
improvement, because it makes it easier to make and explain e.g. a
"git log --use-repo-filter -S" command that uses the preconfigured
config.

[1] 
https://public-inbox.org/git/407a298b52a9e0a2ee4135fe844e35b9a14c0f7b.1506714999.git.jonathanta...@google.com/


Re: Consequences of CRLF in index?

2017-10-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Nieder  writes:
>
>> I'd be interested to hear what happens when diff-ing across a line
>> ending fixup commit.  Is this an area where Git needs some
>> improvement?  "git merge" knows an -Xrenormalize option to deal with a
>> related problem --- it's possible that "git diff" needs to learn a
>> similar trick.
>
> My knee-jerk reaction is that (1) the end user definitely wants to
> see preimage and postimage lines are different in such a commit by
> default, one side has and the other side lacks ^M at the end., and
> (2) one of the "whitespace ignoring" options (namely, "ignore space
> at eol") may suffice, but if not, it should be easy to invent a new
> one "ignore CR at eol".

Here is a lunch-time hack to add that option.  As you can see from
the lack of docs, tests and a proper log message, I haven't played
with it long enough to know how buggy it is, though.  I am not all
that interested in polishing it to completion myself and prefer to
leave it as #leftoverbits ;-)

Also I didn't bother teaching this to Stefan's color-moved thing, as
the line comparator it uses will hopefully be unified with the one I
am touching in xdiff/ with this patch.

Have fun.

 diff.c|  5 -
 merge-recursive.c |  2 ++
 xdiff/xdiff.h |  3 ++-
 xdiff/xutils.c| 34 --
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..eeca0fd3b8 100644
--- a/diff.c
+++ b/diff.c
@@ -4202,7 +4202,8 @@ void diff_setup_done(struct diff_options *options)
 
if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+   DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
+   DIFF_XDL_TST(options, IGNORE_CR_AT_EOL))
DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
else
DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);
@@ -4659,6 +4660,8 @@ int diff_opt_parse(struct diff_options *options,
DIFF_XDL_SET(options, IGNORE_WHITESPACE_CHANGE);
else if (!strcmp(arg, "--ignore-space-at-eol"))
DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
+   else if (!strcmp(arg, "--ignore-cr-at-eol"))
+   DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
else if (!strcmp(arg, "--ignore-blank-lines"))
DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
else if (!strcmp(arg, "--indent-heuristic"))
diff --git a/merge-recursive.c b/merge-recursive.c
index 1d3f8f0d22..4a49ec93b1 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2251,6 +2251,8 @@ int parse_merge_opt(struct merge_options *o, const char 
*s)
DIFF_XDL_SET(o, IGNORE_WHITESPACE);
else if (!strcmp(s, "ignore-space-at-eol"))
DIFF_XDL_SET(o, IGNORE_WHITESPACE_AT_EOL);
+   else if (!strcmp(s, "ignore-cr-at-eol"))
+   DIFF_XDL_SET(o, IGNORE_CR_AT_EOL);
else if (!strcmp(s, "renormalize"))
o->renormalize = 1;
else if (!strcmp(s, "no-renormalize"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index b090ad8eac..ff16243da9 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -32,7 +32,8 @@ extern "C" {
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
-#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL)
+#define XDF_IGNORE_CR_AT_EOL (1 << 5)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | 
XDF_IGNORE_WHITESPACE_CHANGE | XDF_IGNORE_WHITESPACE_AT_EOL | 
XDF_IGNORE_CR_AT_EOL)
 
 #define XDF_PATIENCE_DIFF (1 << 5)
 #define XDF_HISTOGRAM_DIFF (1 << 6)
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 04d7b32e4e..8720e272b9 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -156,6 +156,21 @@ int xdl_blankline(const char *line, long size, long flags)
return (i == size);
 }
 
+/*
+ * Have we eaten everything on the line, except for an optional
+ * CR at the very end?
+ */
+static int ends_with_optional_cr(const char *l, long s, long i)
+{
+   if (s && l[s-1] == '\n')
+   s--;
+   if (s == i)
+   return 1;
+   if (s == i + 1 && l[i] == '\r')
+   return 1;
+   return 0;
+}
+
 int xdl_recmatch(const char *l1, long s1, const char *l2, long s2, long flags)
 {
int i1, i2;
@@ -170,7 +185,8 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, 
long s2, long flags)
 
/*
 * -w matches everything that matches with -b, and -b in turn
-* matches everything that matches with --ignore-space-at-eol.
+* matches everything that matches with --ignore-space-at-eol,
+* which in turn matches everything that matches with 
--ignore-cr-at-eol.
 *
 * Each flavor of ignoring needs different logic to skip whitespaces
 *

Re: [PATCH 10/13] rev-list: add list-objects filtering support

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
>  static void finish_object(struct object *obj, const char *name, void 
> *cb_data)
>  {
> struct rev_list_info *info = cb_data;
> -   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
> +   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
> +   if (arg_print_missing) {
> +   list_objects_filter_map_insert(
> +   &missing_objects, &obj->oid, name, obj->type);
> +   return;
> +   }
> +
> +   /*
> +* Relax consistency checks when we expect missing
> +* objects because of partial-clone or a previous
> +* partial-fetch.
> +*
> +* Note that this is independent of any filtering that
> +* we are doing in this run.
> +*/
> +   if (is_partial_clone_registered())
> +   return;
> +
> die("missing blob object '%s'", oid_to_hex(&obj->oid));

I'm fine with arg_print_missing suppressing lazy fetching (when I
rebase my patches on this, I'll have to ensure that fetch_if_missing
is set to 0 if arg_print_missing is true), but I think that the
behavior when arg_print_missing is false should be the opposite - we
should let has_object_file() perform the lazy fetching, and die if it
returns false (that is, if the fetching failed).

> +   }
> if (info->revs->verify_objects && !obj->parsed && obj->type != 
> OBJ_COMMIT)
> parse_object(&obj->oid);
>  }


Re: [PATCH 1/2] xdiff-interface: export comparing and hashing strings

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 7:42 PM, Stefan Beller  wrote:
> This will turn out to be useful in a later patch.
>
> xdl_recmatch is exported in xdiff/xutils.h, to be used by various
> xdiff/*.c files, but not outside of xdiff/. This one makes it available
> to the outside, too.
>
> While at it, add documentation.
>
> Signed-off-by: Stefan Beller 
> ---
> diff --git a/xdiff-interface.h b/xdiff-interface.h
> @@ -29,4 +29,20 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
> +/*
> + * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
> + * Returns 1 if the strings are deemed equal, 0 otherwise.
> + * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
> + * are treated for the comparision.
> + */
> +extern int xdiff_compare_lines(const char *a, long len_a,
> +  const char *b, long len_b, long flags);

The comment block talks about 'l1', 'l2', 's1', and 's2', but the
declaration uses 'a', 'b, 'len_a', 'len_b'. Confusing.


Re: [PATCH 08/13] list-objects: add traverse_commit_list_filtered method

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> +void traverse_commit_list_filtered(
> +   struct list_objects_filter_options *filter_options,
> +   struct rev_info *revs,
> +   show_commit_fn show_commit,
> +   show_object_fn show_object,
> +   list_objects_filter_map_foreach_cb print_omitted_object,
> +   void *show_data);

So the function call chain, if we wanted a filtered traversal, is:
traverse_commit_list_filtered -> traverse_commit_list__sparse_path
(and friends, and each algorithm is in its own file) ->
traverse_commit_list_worker

This makes the implementation of each algorithm more easily understood
(since they are all in their own files), but also increases the number
of global functions and code files. I personally would combine the
traverse_commit_list__* functions into one file
(list-objects-filtered.c), make them static, and also put
traverse_commit_list_filtered in there, but I understand that other
people in the Git project may differ on this.


Re: [PATCH 07/13] list-objects-filter-options: common argument parsing

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> + *  ::= blob:none
> + *   blob:limit:[kmg]
> + *   sparse:oid:
> + *   sparse:path:

I notice in the code below that there are some usages of "=" instead
of ":" - could you clarify which one it is? (Ideally this would point
to one point of documentation which serves as both user and technical
documentation.)

> + */
> +int parse_list_objects_filter(struct list_objects_filter_options 
> *filter_options,
> + const char *arg)
> +{
> +   struct object_context oc;
> +   struct object_id sparse_oid;
> +   const char *v0;
> +   const char *v1;
> +
> +   if (filter_options->choice)
> +   die(_("multiple object filter types cannot be combined"));
> +
> +   /*
> +* TODO consider rejecting 'arg' if it contains any
> +* TODO injection characters (since we might send this
> +* TODO to a sub-command or to the server and we don't
> +* TODO want to deal with legacy quoting/escaping for
> +* TODO a new feature).
> +*/
> +
> +   filter_options->raw_value = strdup(arg);
> +
> +   if (skip_prefix(arg, "blob:", &v0) || skip_prefix(arg, "blobs:", 
> &v0)) {

I know that some people prefer leniency, but I think it's better to
standardize on one form ("blob" instead of both "blob" and "blobs").

> +   if (!strcmp(v0, "none")) {
> +   filter_options->choice = LOFC_BLOB_NONE;
> +   return 0;
> +   }
> +
> +   if (skip_prefix(v0, "limit=", &v1) &&
> +   git_parse_ulong(v1, &filter_options->blob_limit_value)) {
> +   filter_options->choice = LOFC_BLOB_LIMIT;
> +   return 0;
> +   }
> +   }
> +   else if (skip_prefix(arg, "sparse:", &v0)) {
> +   if (skip_prefix(v0, "oid=", &v1)) {
> +   filter_options->choice = LOFC_SPARSE_OID;
> +   if (!get_oid_with_context(v1, GET_OID_BLOB,
> + &sparse_oid, &oc)) {
> +   /*
> +* We successfully converted the 
> +* into an actual OID.  Rewrite the raw_value
> +* in canonoical form with just the OID.
> +* (If we send this request to the server, we
> +* want an absolute expression rather than a
> +* local-ref-relative expression.)
> +*/
> +   free((char *)filter_options->raw_value);
> +   filter_options->raw_value =
> +   xstrfmt("sparse:oid=%s",
> +   oid_to_hex(&sparse_oid));
> +   filter_options->sparse_oid_value =
> +   oiddup(&sparse_oid);
> +   } else {
> +   /*
> +* We could not turn the  into an
> +* OID.  Leave the raw_value as is in case
> +* the server can parse it.  (It may refer to
> +* a branch, commit, or blob we don't have.)
> +*/
> +   }
> +   return 0;
> +   }
> +
> +   if (skip_prefix(v0, "path=", &v1)) {
> +   filter_options->choice = LOFC_SPARSE_PATH;
> +   filter_options->sparse_path_value = strdup(v1);
> +   return 0;
> +   }
> +   }
> +
> +   die(_("invalid filter expression '%s'"), arg);
> +   return 0;
> +}
> +
> +int opt_parse_list_objects_filter(const struct option *opt,
> + const char *arg, int unset)
> +{
> +   struct list_objects_filter_options *filter_options = opt->value;
> +
> +   assert(arg);
> +   assert(!unset);
> +
> +   return parse_list_objects_filter(filter_options, arg);
> +}
> diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
> new file mode 100644
> index 000..23bd68e
> --- /dev/null
> +++ b/list-objects-filter-options.h
> @@ -0,0 +1,50 @@
> +#ifndef LIST_OBJECTS_FILTER_OPTIONS_H
> +#define LIST_OBJECTS_FILTER_OPTIONS_H
> +
> +#include "parse-options.h"
> +
> +/*
> + * Common declarations and utilities for filtering objects (such as omitting
> + * large blobs) in list_objects:traverse_commit_list() and git-rev-list.
> + */
> +
> +enum list_objects_filter_choice {
> +   LOFC_DISABLED = 0,
> +   LOFC_BLOB_NONE,
> +   LOFC_BLOB_LIMIT,
> +   LOFC_SPARSE_OID,
> +   LOFC_SPARSE_PATH,
> +};
> +
> +struct list_objects_filter_optio

Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 9:48 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>> I definitely agree with the sentiment that as few things as possible
>> should happen between calling getenv() and using its result. I've seen
>> real bugs there from unexpected invalidation of the static buffer.
>
> Sure.  I do not think we mind xstrdup()ing the result and freeing
> after we are done, though ;-)

More specifically, xstrdup_or_null(getenv(...)), if that route is chosen.


Re: [PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 2:53 PM, Jeff Hostetler  wrote:
> Refactor add_excludes() to separate the reading of the
> exclude file into a buffer and the parsing of the buffer
> into exclude_list items.
>
> Add add_excludes_from_blob_to_list() to allow an exclude
> file be specified with an OID.
>
> Signed-off-by: Jeff Hostetler 
> ---
> diff --git a/dir.c b/dir.c
> @@ -841,6 +856,38 @@ int add_excludes_from_file_to_list(const char *fname, 
> const char *base,
> +int add_excludes_from_blob_to_list(
> +   struct object_id *oid,
> +   const char *base, int baselen,
> +   struct exclude_list *el)
> +{
> +   char *buf;
> +   unsigned long size;
> +   enum object_type type;
> +
> +   buf = read_sha1_file(oid->hash, &type, &size);
> +   if (!buf)
> +   return -1;
> +
> +   if (type != OBJ_BLOB) {
> +   free(buf);
> +   return -1;
> +   }
> +
> +   if (size == 0) {
> +   free(buf);
> +   return 0;
> +   }
> +
> +   if (buf[size - 1] != '\n') {
> +   buf = xrealloc(buf, st_add(size, 1));
> +   buf[size++] = '\n';
> +   }
> +
> +   add_excludes_from_buffer(buf, size, base, baselen, el);

Seeing all the free()'s above, at first glance, one wonders why 'buf'
isn't freed here after add_excludes_from_buffer(), however an
examination of that function shows that 'buf' is assigned to
el->filebuf and later freed by clear_exclude_list(). Okay.

> +   return 0;

Should this be returning the result of add_excludes_from_buffer()
rather than unconditionally returning 0?

> +}


Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:

> +enum list_objects_filter_result {
> +   LOFR_ZERO  = 0,
> +   LOFR_MARK_SEEN = 1<<0,

Probably worth documenting, something like /* Mark this object so that
it is skipped for the rest of the traversal. */

> +   LOFR_SHOW  = 1<<1,

And something like /* Invoke show_object_fn on this object. This
object may be revisited unless LOFR_MARK_SEEN is also set. */

> +};
> +
> +/* See object.h and revision.h */
> +#define FILTER_REVISIT (1<<25)

I think this should be declared closer to its use - in the sparse
filter code or in the file that uses it. Wherever it is, also update
the chart in object.h to indicate that we're using this 25th bit.

> +
> +enum list_objects_filter_type {
> +   LOFT_BEGIN_TREE,
> +   LOFT_END_TREE,
> +   LOFT_BLOB
> +};
> +
> +typedef enum list_objects_filter_result list_objects_filter_result;
> +typedef enum list_objects_filter_type list_objects_filter_type;

I don't think we typedef enums in Git code.

> +
> +typedef list_objects_filter_result (*filter_object_fn)(
> +   list_objects_filter_type filter_type,
> +   struct object *obj,
> +   const char *pathname,
> +   const char *filename,
> +   void *filter_data);
> +
> +void traverse_commit_list_worker(
> +   struct rev_info *,
> +   show_commit_fn, show_object_fn, void *show_data,
> +   filter_object_fn filter, void *filter_data);

I think things would be much clearer if a default filter was declared
(matching the behavior as of this patch when filter == NULL), say:
static inline default_filter(args) { switch(filter_type) { case
LOFT_BEGIN_TREE: return LOFR_MARK_SEEN | LOFR_SHOW; case
LOFT_END_TREE: return LOFT_ZERO; ...

And inline traverse_commit_list() instead of putting it in the .c file.

This would reduce or eliminate the need to document
traverse_commit_list_worker, including what happens if filter is NULL,
and explain how a user would make their own filter_object_fn.

> +
> +#endif /* LIST_OBJECTS_H */
> --
> 2.9.3
>


Re: [PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure

2017-10-24 Thread Junio C Hamano
Michael Haggerty  writes:

> ... But dc39e09942 added another blurb of code between
> the loop and the cleanup. That blurb sometimes resets `ret` to zero,
> making the cleanup code think that the locking was successful.
> ...
> The fix is simple: instead of just breaking out of the loop, jump
> directly to the cleanup code. This fixes some tests in t1404 that were
> added in the previous commit.

OK.  Now because we do not break and start packed_transaction but
instead jump over that if statement, we'll leave packed_transation
instance that we got from transaction_begin() that we called
add_update() on, but haven't called transaction_prepare() on
behind.  That instance is pointed by backend_data pointer which is
part of transaction, so presumably transaction_cleanup() called on
it in the section labelled "cleanup:" should take care of it?

Thanks for catching the issue and fixing it quickly.  Will queue.


Re: Consequences of CRLF in index?

2017-10-24 Thread Junio C Hamano
Jonathan Nieder  writes:

> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

My knee-jerk reaction is that (1) the end user definitely wants to
see preimage and postimage lines are different in such a commit by
default, one side has and the other side lacks ^M at the end., and
(2) one of the "whitespace ignoring" options (namely, "ignore space
at eol") may suffice, but if not, it should be easy to invent a new
one "ignore CR at eol".


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Junio C Hamano
Jeff King  writes:

> I definitely agree with the sentiment that as few things as possible
> should happen between calling getenv() and using its result. I've seen
> real bugs there from unexpected invalidation of the static buffer.

Sure.  I do not think we mind xstrdup()ing the result and freeing
after we are done, though ;-)


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Junio C Hamano
Heiko Voigt  writes:

> On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
>> Johannes Schindelin  writes:
>> 
>> > We meticulously pass the `exclude` flag to the `treat_directory()`
>> > function so that we can indicate that files in it are excluded rather
>> > than untracked when recursing.
>> >
>> > But we did not yet treat submodules the same way.
>> 
>> ... "because of that, we ended up showing <> what situation>>" would be a nice thing to have here, so that it can
>> be copied to the release notes for the bugfix.  
>
> Yes I agree that would be nice here. It was not immediately obvious that
> this only applies when using both flags: -u and --ignored.

Does any of you care to fill in the <> then? ;-)

> Looks good to me.
>
> Cheers Heiko


Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only

2017-10-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Brandon Williams  writes:
>
>> One simple idea would be to convert the single 'flag' into various bit
>> fields themselves, that way if you need to add a new flag you would just
>> make a new bit field.  I'm unaware of any downsides of doing so (though
>> i may be missing something) but doing so would probably cause a bit of
>> code churn.
>
> The reason why people want to have their own bit in the flags word
> is because they want to use DIFF_OPT_{SET,CLR,TST,TOUCHED} but they
> do not want to do the work to extend them beyond a single word.  
>
> I think it is doable by making everything a 1-bit wide bitfield
> without affecting existing users.

... but the "touched" thing may be harder---I haven't thought it
through.


Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only

2017-10-24 Thread Junio C Hamano
Brandon Williams  writes:

> One simple idea would be to convert the single 'flag' into various bit
> fields themselves, that way if you need to add a new flag you would just
> make a new bit field.  I'm unaware of any downsides of doing so (though
> i may be missing something) but doing so would probably cause a bit of
> code churn.

The reason why people want to have their own bit in the flags word
is because they want to use DIFF_OPT_{SET,CLR,TST,TOUCHED} but they
do not want to do the work to extend them beyond a single word.  

I think it is doable by making everything a 1-bit wide bitfield
without affecting existing users.


Re: [RFC] protocol version 2

2017-10-24 Thread Junio C Hamano
Brandon Williams  writes:

>> I actually have a reasonable guess why you want to have a separate
>> delimiter (which has nothing to do with "optional delim can be
>> omitted"), but I want to see it explained in this document clearly
>> by its designer(s).
>
> Jonathan Tan suggested that we tighten flush semantics in a newer
> protocol so that proxies are easier to work with.  Currently proxies
> need to understand the protocol instead of simply waiting for a flush.
>
> Also I've been told the smart http code is more complex because of the
> current semantics of flush packets.

I think the above two are the same thing ;-) but yes, "flush" in the
original protocol were used for both "I am truly finished talking;
now it is your turn" and "I am done with one section of what I need
to say, and a different section now begins; it is still my turn to
speak".  The need to handle the latter makes smart-http quite ugly.

Thanks.


[PATCH 1/5] fnv: migrate code from hashmap to fnv

2017-10-24 Thread Stefan Beller
The functions regarding the Fowler–Noll–Vo hash function are put in a
separate compilation unit, as it is logically different from the hashmap
functionality.

Signed-off-by: Stefan Beller 
---

This may still be an interesting cleanup on its own.

Thanks,
Stefan

 Makefile|  1 +
 attr.c  |  1 +
 builtin/fast-export.c   |  1 +
 diff.c  |  1 +
 fnv.c   | 64 +
 fnv.h   | 20 
 hashmap.c   | 64 +
 hashmap.h   | 15 
 merge-recursive.c   |  1 +
 remote.c|  1 +
 submodule-config.c  |  1 +
 t/helper/test-hashmap.c |  1 +
 12 files changed, 93 insertions(+), 78 deletions(-)
 create mode 100644 fnv.c
 create mode 100644 fnv.h

diff --git a/Makefile b/Makefile
index cd75985991..8e1c5988f3 100644
--- a/Makefile
+++ b/Makefile
@@ -793,6 +793,7 @@ LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
+LIB_OBJS += fnv.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/attr.c b/attr.c
index dfc3a558d8..2e4217c4f1 100644
--- a/attr.c
+++ b/attr.c
@@ -16,6 +16,7 @@
 #include "utf8.h"
 #include "quote.h"
 #include "thread-utils.h"
+#include "fnv.h"
 
 const char git_attr__true[] = "(builtin)true";
 const char git_attr__false[] = "\0(builtin)false";
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 2fb60d6d48..62f4010510 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -21,6 +21,7 @@
 #include "quote.h"
 #include "remote.h"
 #include "blob.h"
+#include "fnv.h"
 
 static const char *fast_export_usage[] = {
N_("git fast-export [rev-list-opts]"),
diff --git a/diff.c b/diff.c
index c4a669ffa8..a23f4521fb 100644
--- a/diff.c
+++ b/diff.c
@@ -22,6 +22,7 @@
 #include "argv-array.h"
 #include "graph.h"
 #include "packfile.h"
+#include "fnv.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
diff --git a/fnv.c b/fnv.c
new file mode 100644
index 00..b4cbf39f0a
--- /dev/null
+++ b/fnv.c
@@ -0,0 +1,64 @@
+#include "fnv.h"
+
+#define FNV32_BASE ((unsigned int) 0x811c9dc5)
+#define FNV32_PRIME ((unsigned int) 0x01000193)
+
+unsigned int strhash(const char *str)
+{
+   unsigned int c, hash = FNV32_BASE;
+   while ((c = (unsigned char) *str++))
+   hash = (hash * FNV32_PRIME) ^ c;
+   return hash;
+}
+
+unsigned int strihash(const char *str)
+{
+   unsigned int c, hash = FNV32_BASE;
+   while ((c = (unsigned char) *str++)) {
+   if (c >= 'a' && c <= 'z')
+   c -= 'a' - 'A';
+   hash = (hash * FNV32_PRIME) ^ c;
+   }
+   return hash;
+}
+
+unsigned int memhash(const void *buf, size_t len)
+{
+   unsigned int hash = FNV32_BASE;
+   unsigned char *ucbuf = (unsigned char *) buf;
+   while (len--) {
+   unsigned int c = *ucbuf++;
+   hash = (hash * FNV32_PRIME) ^ c;
+   }
+   return hash;
+}
+
+unsigned int memihash(const void *buf, size_t len)
+{
+   unsigned int hash = FNV32_BASE;
+   unsigned char *ucbuf = (unsigned char *) buf;
+   while (len--) {
+   unsigned int c = *ucbuf++;
+   if (c >= 'a' && c <= 'z')
+   c -= 'a' - 'A';
+   hash = (hash * FNV32_PRIME) ^ c;
+   }
+   return hash;
+}
+
+/*
+ * Incoporate another chunk of data into a memihash
+ * computation.
+ */
+unsigned int memihash_cont(unsigned int hash_seed, const void *buf, size_t len)
+{
+   unsigned int hash = hash_seed;
+   unsigned char *ucbuf = (unsigned char *) buf;
+   while (len--) {
+   unsigned int c = *ucbuf++;
+   if (c >= 'a' && c <= 'z')
+   c -= 'a' - 'A';
+   hash = (hash * FNV32_PRIME) ^ c;
+   }
+   return hash;
+}
diff --git a/fnv.h b/fnv.h
new file mode 100644
index 00..b425c85c66
--- /dev/null
+++ b/fnv.h
@@ -0,0 +1,20 @@
+#ifndef FNV_H
+#define FNV_H
+
+#include 
+/*
+ * Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ * http://www.isthe.com/chongo/tech/comp/fnv).
+ * `strhash` and `strihash` take 0-terminated strings, while `memhash` and
+ * `memihash` operate on arbitrary-length memory.
+ * `strihash` and `memihash` are case insensitive versions.
+ * `memihash_cont` is a variant of `memihash` that allows a computation to be
+ * continued with another chunk of data.
+ */
+extern unsigned int strhash(const char *buf);
+extern unsigned int strihash(const char *buf);
+extern unsigned int memhash(const void *buf, size_t len);
+extern unsigned int memihash(const void *buf, size_t len);
+extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, 
size_t len);
+
+#endif
diff --git a/hashmap.c b/hashmap.c
index d42f01ff5a..1605edbbc3 100644
--

[PATCH 2/2] diff.c: get rid of duplicate implementation

2017-10-24 Thread Stefan Beller
The implementations in diff.c to detect moved lines needs to compare
strings and hash strings, which is implemented in that file, as well
as in the xdiff library.

Remove the rather recent implementation in diff.c and rely on the well
exercised code in the xdiff lib.

With this change the hash used for bucketing the strings for the moved
line detection changes from FNV32 (that is provided via the hashmaps
memhash) to DJB2 (which is used internally in xdiff).  Benchmarks found
on the web[1] do not indicate that these hashes are different in
performance for readable strings.

[1] 
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Signed-off-by: Stefan Beller 
---
 diff.c | 82 --
 1 file changed, 4 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index c4a669ffa8..e6814b9e9c 100644
--- a/diff.c
+++ b/diff.c
@@ -707,88 +707,14 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-static int next_byte(const char **cp, const char **endp,
-const struct diff_options *diffopt)
-{
-   int retval;
-
-   if (*cp >= *endp)
-   return -1;
-
-   if (isspace(**cp)) {
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* After skipping a couple of whitespaces,
-* we still have to account for one space.
-*/
-   return (int)' ';
-   }
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* return the first non-ws character via the usual
-* below, unless we ate all of the bytes
-*/
-   if (*cp >= *endp)
-   return -1;
-   }
-   }
-
-   retval = (unsigned char)(**cp);
-   (*cp)++;
-   return retval;
-}
-
 static int moved_entry_cmp(const struct diff_options *diffopt,
   const struct moved_entry *a,
   const struct moved_entry *b,
   const void *keydata)
 {
-   const char *ap = a->es->line, *ae = a->es->line + a->es->len;
-   const char *bp = b->es->line, *be = b->es->line + b->es->len;
-
-   if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS))
-   return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while (be > bp && isspace(be[-1]))
-   be--;
-   }
-
-   while (1) {
-   int ca, cb;
-   ca = next_byte(&ap, &ae, diffopt);
-   cb = next_byte(&bp, &be, diffopt);
-   if (ca != cb)
-   return 1;
-   if (ca < 0)
-   return 0;
-   }
-}
-
-static unsigned get_string_hash(struct emitted_diff_symbol *es, struct 
diff_options *o)
-{
-   if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
-   static struct strbuf sb = STRBUF_INIT;
-   const char *ap = es->line, *ae = es->line + es->len;
-   int c;
-
-   strbuf_reset(&sb);
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while ((c = next_byte(&ap, &ae, o)) >= 0)
-   strbuf_addch(&sb, c);
-
-   return memhash(sb.buf, sb.len);
-   } else {
-   return memhash(es->line, es->len);
-   }
+   return !xdiff_compare_lines(a->es->line, a->es->len,
+   b->es->line, b->es->len,
+   diffopt->xdl_opts);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -797,7 +723,7 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
struct moved_entry *ret = xmalloc(sizeof(*ret));
struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 
-   ret->ent.hash = get_string_hash(l, o);
+   ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
ret->es = l;
ret->next_line = NULL;
 
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 1/2] xdiff-interface: export comparing and hashing strings

2017-10-24 Thread Stefan Beller
This will turn out to be useful in a later patch.

xdl_recmatch is exported in xdiff/xutils.h, to be used by various
xdiff/*.c files, but not outside of xdiff/. This one makes it available
to the outside, too.

While at it, add documentation.

Signed-off-by: Stefan Beller 
---
 xdiff-interface.c | 11 +++
 xdiff-interface.h | 16 
 2 files changed, 27 insertions(+)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 018e033089..9b35af2455 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -5,6 +5,7 @@
 #include "xdiff/xdiffi.h"
 #include "xdiff/xemit.h"
 #include "xdiff/xmacros.h"
+#include "xdiff/xutils.h"
 
 struct xdiff_emit_state {
xdiff_emit_consume_fn consume;
@@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg)
}
 }
 
+unsigned long xdiff_hash_string(const char *s, size_t len, long flags)
+{
+   return xdl_hash_record(&s, s + len, flags);
+}
+
+int xdiff_compare_lines(const char *a, long len_a, const char *b, long len_b, 
long flags)
+{
+   return xdl_recmatch(a, len_a, b, len_b, flags);
+}
+
 int git_xmerge_style = -1;
 
 int git_xmerge_config(const char *var, const char *value, void *cb)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 6f6ba9095d..6f5abaf8d3 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -29,4 +29,20 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
 extern int git_xmerge_config(const char *var, const char *value, void *cb);
 extern int git_xmerge_style;
 
+/*
+ * Compare the strings l1 with l2 which are of size s1 and s2 respectively.
+ * Returns 1 if the strings are deemed equal, 0 otherwise.
+ * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
+ * are treated for the comparision.
+ */
+extern int xdiff_compare_lines(const char *a, long len_a,
+  const char *b, long len_b, long flags);
+
+/*
+ * Returns a hash of the string s of length len.
+ * The `flags` given as XDF_WHITESPACE_FLAGS determine how white spaces
+ * are treated for the hash.
+ */
+extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
+
 #endif
-- 
2.15.0.rc2.6.g953226eb5f



[PATCHv2 0/2] (x)diff cleanup: remove duplicate code

2017-10-24 Thread Stefan Beller
v2:
* I realized that we don't have to change the hashing function of xdiff;
  we can rather change the hashing function for the move detection,
  which is less fundamental.
  (That way I can shrink the series down to 2 patches)
* commented and renamed function parameters in the exposed xdiff functions.
* applies on top of jk/diff-color-moved-fix.

Thanks,
Stefan

v1:
Junio wrote:

>  * I was hoping that the next_byte() and string_hash() thing, once
>they are cleaned up, will eventually be shared with the xdiff/
>code at the lower layer, which needs to do pretty much the same
>in order to implement various whitespace ignoring options.  I am
>not sure how well the approach taken by the WIP patch meshes with
>the needs of the lower layer.

This series does exactly this; although I chose to reuse the code in
xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that
code has seen more exercise already (hence I assume it has fewer bugs
if any as well as its performance implications are well understood).

However to do so, we need to pollute xdiff/xutils.c and include
hashmap.h there (which also requires cache.h as that header has
an inline function using BUG()), which I find a bit unfortunate but
worth the trade off of using better tested code.

Thanks,
Stefan


Stefan Beller (2):
  xdiff-interface: export comparing and hashing strings
  diff.c: get rid of duplicate implementation

 diff.c| 82 +++
 xdiff-interface.c | 11 
 xdiff-interface.h | 16 +++
 3 files changed, 31 insertions(+), 78 deletions(-)

-- 
2.15.0.rc2.6.g953226eb5f



Re: [PATCH 3/4] xdiff: use stronger hash function internally

2017-10-24 Thread Jonathan Nieder
Stefan Beller wrote:

> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -24,7 +24,8 @@
>  #include 
>  #include "xinclude.h"
>  
> -
> +#include "cache.h"
> +#include "hashmap.h"

#includes like "git-compat-util.h" and "cache.h" can only be used as
the *first* #include.

Otherwise the feature test macros they set do not take effect, causing
lots of unpleasant platform-specific breakage.

Thanks,
Jonathan


Re: [GIT PULL] l10n updates for 2.15.0 round 2

2017-10-24 Thread Christopher Díaz Riveros
El mar, 24-10-2017 a las 11:48 +0900, Junio C Hamano escribió:
> 
> Thanks, will pull.  Nice to see a new language added to the
> repertoire.
> 

Thank you, I am very happy to be able to help the community. And bring
Git a little closer to Latin America and Spain.

Regards

-- 
Christopher Díaz Riveros
Gentoo Developer


Re: Consequences of CRLF in index?

2017-10-24 Thread Johannes Sixt

Am 24.10.2017 um 19:48 schrieb Lars Schneider:

I've migrated a large repo (110k+ files) with a lot of history (177k commits)
and a lot of users (200+) to Git. Unfortunately, all text files in the index
of the repo have CRLF line endings. In general this seems not to be a problem
as the project is developed exclusively on Windows.

However, I wonder if there are any "hidden consequences" of this setup?


I've been working on a project with CRLF in every source file for a 
decade now. It's C++ source, and it isn't even Windows-only: when 
checked out on Linux, there are CRs in the files, with no bad 
consequences so far. GCC is happy with them.


-- Hannes


Re: [PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash

2017-10-24 Thread Stefan Beller
>> +unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next)
>
> Why is the second parameter const and the first one isn't?  (We tend
> not to bother with const for value types.)

will do.

>> + hash = memhash_feed(hash, c);
>
> I guess compilers inline a copy of the function here with -O2.  My
> knee-jerk reaction, however, is horror in the face of adding a function
> call to the inner loop of a hash function.  Do you have performance
> test results, ideally also with -O0?  And why not make memhash_feed()
> an inline function or macro to sidestep that issue?

My gut feeling was similar, but then I assumed compilers of today would
be smart.

As per the discussion on a later patch, we could migrate all memhash* functions
to fnv32.c except for the _feed, which will be static in its header fnv32.h

Thanks,
Stefan


Re: [PATCH 3/4] xdiff: use stronger hash function internally

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 1:23 PM, René Scharfe  wrote:
> Am 24.10.2017 um 20:59 schrieb Stefan Beller:
>> Instead of using the hash seeded with 5381, and updated via
>> `(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by
>> hashmap.h, which is seeded with 0x811c9dc5 and computed as
>> `(hash * 0x01000193) ^ new_byte`.
>
> The hash function you're replacing is called DJB2; I think that's worth
> mentioning.

I was not aware of the name. I'll look it up; thanks!

>
> Performance test results would be nice.  No idea how to find edge cases,
> though, or better: demonstrate a lack thereof.

My reasoning, though not in the commit message, is that the operations
are essentially equal, just with different numeric values, hence no impact.

I can look at the assembly and measure, too.

>
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>   xdiff/xutils.c | 19 ---
>>   1 file changed, 8 insertions(+), 11 deletions(-)
>>
>> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
>> index 04d7b32e4e..a58a28c687 100644
>> --- a/xdiff/xutils.c
>> +++ b/xdiff/xutils.c
>> @@ -24,7 +24,8 @@
>>   #include 
>>   #include "xinclude.h"
>>
>> -
>> +#include "cache.h"
>> +#include "hashmap.h"
>
> Ouch.  Defining FNV32_BASE and FNV32_PRIME here would be much easier
> overall.  And if that's too much duplication then those definitions
> could be extracted into a new header file (fnv32.h?) included by both
> hashmap.h and xutils.c.

I guess fnv32.h would do; it would contain the defines as well as the
static inline
function to be used in the inner loop of patch 1.

>
>>
>>
>>   long xdl_bogosqrt(long n) {
>> @@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char 
>> *l2, long s2, long flags)
>>
>>   static unsigned long xdl_hash_record_with_whitespace(char const **data,
>>   char const *top, long flags) {
>> - unsigned long ha = 5381;
>> + unsigned long ha = memhash(NULL, 0);
>>   char const *ptr = *data;
>>
>>   for (; ptr < top && *ptr != '\n'; ptr++) {
>> @@ -243,21 +244,18 @@ static unsigned long 
>> xdl_hash_record_with_whitespace(char const **data,
>>   ; /* already handled */
>>   else if (flags & XDF_IGNORE_WHITESPACE_CHANGE
>>&& !at_eol) {
>> - ha += (ha << 5);
>> - ha ^= (unsigned long) ' ';
>> + ha = memhash_feed(ha, (unsigned char) ' ');
>
> All the memhash_feed() callers in this file cast to unsigned char.  A
> macro or a function (possibly inline) defined at the top could do
> that for them.

That would go away when using fnv32.h

Thanks for the review!
Stefan


Re: [PATCH 2/4] xdiff-interface: export comparing and hashing strings

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 1:23 PM, René Scharfe  wrote:

> xdl_recmatch() is already exported; why not use it without this
> wrapper?

It is exported in xdiff/xutils.h, to be used by various xdiff/*.c files, but
not outside of xdiff/. This one makes it available to the outside, too.

>> +extern unsigned long xdiff_hash_string(const char *s, size_t len, long 
>> flags);
>
> Documenting the meaning of their parameters would be nice.

I'll do that.


Re: [PATCH 3/4] xdiff: use stronger hash function internally

2017-10-24 Thread René Scharfe
Am 24.10.2017 um 20:59 schrieb Stefan Beller:
> Instead of using the hash seeded with 5381, and updated via
> `(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by
> hashmap.h, which is seeded with 0x811c9dc5 and computed as
> `(hash * 0x01000193) ^ new_byte`.

The hash function you're replacing is called DJB2; I think that's worth
mentioning.

Performance test results would be nice.  No idea how to find edge cases,
though, or better: demonstrate a lack thereof.

> 
> Signed-off-by: Stefan Beller 
> ---
>   xdiff/xutils.c | 19 ---
>   1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/xdiff/xutils.c b/xdiff/xutils.c
> index 04d7b32e4e..a58a28c687 100644
> --- a/xdiff/xutils.c
> +++ b/xdiff/xutils.c
> @@ -24,7 +24,8 @@
>   #include 
>   #include "xinclude.h"
>   
> -
> +#include "cache.h"
> +#include "hashmap.h"

Ouch.  Defining FNV32_BASE and FNV32_PRIME here would be much easier
overall.  And if that's too much duplication then those definitions
could be extracted into a new header file (fnv32.h?) included by both
hashmap.h and xutils.c.

>   
>   
>   long xdl_bogosqrt(long n) {
> @@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, 
> long s2, long flags)
>   
>   static unsigned long xdl_hash_record_with_whitespace(char const **data,
>   char const *top, long flags) {
> - unsigned long ha = 5381;
> + unsigned long ha = memhash(NULL, 0);
>   char const *ptr = *data;
>   
>   for (; ptr < top && *ptr != '\n'; ptr++) {
> @@ -243,21 +244,18 @@ static unsigned long 
> xdl_hash_record_with_whitespace(char const **data,
>   ; /* already handled */
>   else if (flags & XDF_IGNORE_WHITESPACE_CHANGE
>&& !at_eol) {
> - ha += (ha << 5);
> - ha ^= (unsigned long) ' ';
> + ha = memhash_feed(ha, (unsigned char) ' ');

All the memhash_feed() callers in this file cast to unsigned char.  A
macro or a function (possibly inline) defined at the top could do
that for them.

>   }
>   else if (flags & XDF_IGNORE_WHITESPACE_AT_EOL
>&& !at_eol) {
>   while (ptr2 != ptr + 1) {
> - ha += (ha << 5);
> - ha ^= (unsigned long) *ptr2;
> + ha = memhash_feed(ha, (unsigned char) 
> *ptr2);
>   ptr2++;
>   }
>   }
>   continue;
>   }
> - ha += (ha << 5);
> - ha ^= (unsigned long) *ptr;
> + ha = memhash_feed(ha, (unsigned char) *ptr);
>   }
>   *data = ptr < top ? ptr + 1: ptr;
>   
> @@ -265,15 +263,14 @@ static unsigned long 
> xdl_hash_record_with_whitespace(char const **data,
>   }
>   
>   unsigned long xdl_hash_record(char const **data, char const *top, long 
> flags) {
> - unsigned long ha = 5381;
> + unsigned long ha = memhash(NULL, 0);
>   char const *ptr = *data;
>   
>   if (flags & XDF_WHITESPACE_FLAGS)
>   return xdl_hash_record_with_whitespace(data, top, flags);
>   
>   for (; ptr < top && *ptr != '\n'; ptr++) {
> - ha += (ha << 5);
> - ha ^= (unsigned long) *ptr;
> + ha = memhash_feed(ha, (unsigned char) *ptr);
>   }
>   *data = ptr < top ? ptr + 1: ptr;
>   
> 


Re: [PATCH 2/4] xdiff-interface: export comparing and hashing strings

2017-10-24 Thread René Scharfe
Am 24.10.2017 um 20:59 schrieb Stefan Beller:
> This will turn out to be useful in a later patch
> 
> Signed-off-by: Stefan Beller 
> ---
>   xdiff-interface.c | 11 +++
>   xdiff-interface.h |  5 +
>   2 files changed, 16 insertions(+)
> 
> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index 018e033089..fd002ebbc2 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -5,6 +5,7 @@
>   #include "xdiff/xdiffi.h"
>   #include "xdiff/xemit.h"
>   #include "xdiff/xmacros.h"
> +#include "xdiff/xutils.h"
>   
>   struct xdiff_emit_state {
>   xdiff_emit_consume_fn consume;
> @@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg)
>   }
>   }
>   
> +unsigned long xdiff_hash_string(const char *s, size_t len, long flags)
> +{
> + return xdl_hash_record(&s, s + len, flags);
> +}
> +
> +int xdiff_compare_lines(const char *l1, long s1, const char *l2, long s2, 
> long flags)
> +{
> + return xdl_recmatch(l1, s1, l2, s2, flags);
> +}

xdl_recmatch() is already exported; why not use it without this
wrapper?

> +
>   int git_xmerge_style = -1;
>   
>   int git_xmerge_config(const char *var, const char *value, void *cb)
> diff --git a/xdiff-interface.h b/xdiff-interface.h
> index 6f6ba9095d..d3cb9285c5 100644
> --- a/xdiff-interface.h
> +++ b/xdiff-interface.h
> @@ -29,4 +29,9 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
>   extern int git_xmerge_config(const char *var, const char *value, void *cb);
>   extern int git_xmerge_style;
>   
> +extern int xdiff_compare_lines(const char *l1, long s1,
> +const char *l2, long s2, long flags);
> +
> +extern unsigned long xdiff_hash_string(const char *s, size_t len, long 
> flags);

Documenting the meaning of their parameters would be nice.  s and len
are easy enough to guess, but which flags can be used?  At least a
pointer to their definition in xdiff/xdiff.h would be helpful.  And
renaming l1, s1, l2, s2 to a, alen, b, blen or line1, len1, line2, len2
or similar would leave me less confused, but perhaps that's just me.

> +
>   #endif
> 


Re: [PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash

2017-10-24 Thread René Scharfe
Am 24.10.2017 um 20:59 schrieb Stefan Beller:
> This will be useful shortly.
> 
> Signed-off-by: Stefan Beller 
> ---
>   hashmap.c | 7 ++-
>   hashmap.h | 3 +++
>   2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/hashmap.c b/hashmap.c
> index d42f01ff5a..d103eb1fd2 100644
> --- a/hashmap.c
> +++ b/hashmap.c
> @@ -26,13 +26,18 @@ unsigned int strihash(const char *str)
>   return hash;
>   }
>   
> +unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next)

Why is the second parameter const and the first one isn't?  (We tend
not to bother with const for value types.)

> +{
> + return (hash_seed * FNV32_PRIME) ^ next;
> +}
> +
>   unsigned int memhash(const void *buf, size_t len)
>   {
>   unsigned int hash = FNV32_BASE;
>   unsigned char *ucbuf = (unsigned char *) buf;
>   while (len--) {
>   unsigned int c = *ucbuf++;
> - hash = (hash * FNV32_PRIME) ^ c;
> + hash = memhash_feed(hash, c);

I guess compilers inline a copy of the function here with -O2.  My
knee-jerk reaction, however, is horror in the face of adding a function
call to the inner loop of a hash function.  Do you have performance
test results, ideally also with -O0?  And why not make memhash_feed()
an inline function or macro to sidestep that issue?

>   }
>   return hash;
>   }
> diff --git a/hashmap.h b/hashmap.h
> index 7cb29a6aed..c2464385ed 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -105,10 +105,13 @@
>* `strihash` and `memihash` are case insensitive versions.
>* `memihash_cont` is a variant of `memihash` that allows a computation to 
> be
>* continued with another chunk of data.
> + * `memhash_feed` takes just one character and returns the hash based off
> + * a previous hash.
>*/
>   extern unsigned int strhash(const char *buf);
>   extern unsigned int strihash(const char *buf);
>   extern unsigned int memhash(const void *buf, size_t len);
> +extern unsigned int memhash_feed(unsigned int hash_seed, const unsigned char 
> next);
>   extern unsigned int memihash(const void *buf, size_t len);
>   extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, 
> size_t len);
>   
> 


Re: Multiple paths in GIT_EXEC_PATH

2017-10-24 Thread Dennis Kaarsemaker
On Tue, 2017-10-17 at 18:21 +0300, Nikolay Yakimov wrote:
> For why I need that is another matter. Long story short, I need git to
> look for '.gitignore' in a particular non-standard location, since I
> have multiple git repositories in the same workdir (that workdir being
> $HOME and git repositories being stores for my different configs)

That is solvable without needing multiple directories in
$GIT_EXEC_PATH. vcsh, which also solves the 'multiple repos with same
workdir' problem in a very thin wrapper around git, does this by
setting core.excludesfile in the per repo config to a unique path.

hurricane:~$ vcsh dotfiles config core.excludesfile
.gitignore.d/dotfiles
hurricane:~$ vcsh secrets config core.excludesfile
.gitignore.d/secrets

D.


Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 07:11:24PM +0200, Martin Ågren wrote:

> On 24 October 2017 at 18:45, Eric Sunshine  wrote:
> > On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
> >> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
> >> wrote:
> >>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> >>> init_merge_options().
> >>> Consequential call of getenv() may return NULL pointer and strtol() 
> >>> crashes.
> >>> However the stored pointer to the obtained getenv() result may be 
> >>> invalidated
> >>> by some other getenv() call from another thread as getenv() is not 
> >>> thread-safe.
> 
> I'm having trouble wrapping my head around this. Under which
> circumstances could the second call in the current code return NULL, but
> the code after your patch behave in a well-defined (and correct) way?

Yeah, it's not at all clear to me this is solving a real problem. I know
Andrey mentioned playing around with fault injection in an earlier
thread, so I'm wondering if there is an artificial fault being injected
into the second getenv() call. Which does not seem like something that
should be possible in the real world.

I definitely agree with the sentiment that as few things as possible
should happen between calling getenv() and using its result. I've seen
real bugs there from unexpected invalidation of the static buffer.

-Peff


Re: [PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 05:16:25PM +0200, Michael Haggerty wrote:

> The fix is simple: instead of just breaking out of the loop, jump
> directly to the cleanup code. This fixes some tests in t1404 that were
> added in the previous commit.

Nicely explained.

I think that fix makes sense, and matches how the rest of the function
behaved. The other option would be to switch the recently-added code to:

  if (!ret && packed_transaction)
 ...

but it seems like the "goto" means one less thing for new code to
remember if it gets added.

-Peff


Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 12:19:26PM -0400, Eric Sunshine wrote:

> On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty  
> wrote:
> > diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
> > @@ -34,6 +34,86 @@ test_update_rejected () {
> > +# Test adding and deleting D/F-conflicting references in a single
> > +# transaction.
> > +df_test() {
> > +   local prefix="$1"
> > +   shift
> > +   local pack=:
> 
> Isn't "local" a Bash-ism we want to keep out of the test scripts?

Yeah. It's supported by dash and many other shells, but we do try to
avoid it[1]. I think in this case we could just drop it (but keep
setting the "local foo" ones to empty with "foo=".

-Peff

[1] There's some more discussion in the thread at:

  https://public-inbox.org/git/20160601163747.ga10...@sigill.intra.peff.net/


Re: [PATCH 0/2] Fix an error-handling path when locking refs

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 05:16:23PM +0200, Michael Haggerty wrote:

> But in fact the problem is more general; it can happen whenever a
> reference deletion within a transaction is processed successfully, and
> then another reference update in the same transaction fails in
> `lock_ref_for_update()`. In such a case the failed update and any
> subsequent ones could be silently ignored.

Thanks for digging this down to the root cause. It sounds like this
might not be all that hard to trigger for users of "push --atomic", then
(most of the rest of the code is saved by the fact that it still uses
one ref per transaction, which I think can't trigger this).

> There is a longer explanation in the second patch's commit message.
> 
> The tests that I added probe a bunch of D/F update scenarios, which I
> think should be characteristic of the scenarios that would trigger
> this bug. It would be nice to have tests that examine other types of
> failures (e.g., wrong `old_oid` values).

What you added makes sense to me for now. I do admit I was a little
surprised that we didn't have better test coverage for partial
transaction failures. I think in general our test suite is weak on
checking failures, and covering more cases (like old_oid) would be
welcome. Those will be valuable especially as we started playing with
more storage backends.

I also agree that those tests can come post-release.

> While looking at this code again, I realized that the new code
> rewrites the `packed-refs` file more often than did the old code.
> Specifically, after dc39e09942 (files_ref_store: use a transaction to
> update packed refs, 2017-09-08), the `packed-refs` file is overwritten
> for any transaction that includes a reference delete. Prior to that
> commit, `packed-refs` was only overwritten if a deleted reference was
> actually present in the existing `packed-refs` file. This is even the
> case if there was previously no `packed-refs` file at all; now any
> reference deletion causes an empty `packed-refs` file to be created.
> 
> I think this is a conscionable regression, since deleting references
> that are purely loose is probably not all that common, and the old
> code had to read the whole `packed-refs` file anyway. Nevertheless, it
> is obviously something that I would like to fix (though maybe not for
> 2.15? I'm open to input about its urgency.)

I agree it's not nearly as urgent as the fix you have here. It might
show up on a busy system as increased lock contention over packed-refs.
Or if you have really gigantic packed-refs files, a possible performance
regression. But as you say, it should be a pretty rare case.

So I'd be OK with leaving it to post-2.15. OTOH, I suspect it's a pretty
small patch. If you happen to produce it quickly, it might be worth
seeing before evaluating.

-Peff


Re: [PATCH] path: use xmalloc in add_to_trie

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 09:13:28AM -0700, Stefan Beller wrote:

> On Tue, Oct 24, 2017 at 8:15 AM, Andrey Okoshkin  
> wrote:
> > Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps
> > and checks memory allocation result.
> >
> > Signed-off-by: Andrey Okoshkin 
> > ---
> > Hello,
> > I'm not sure but it looks like there is a missing check of the malloc 
> > result.
> > memcpy() may crash with SIGSEGV due to the memory allocation failure.
> > make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie()
> > also should use it.
> 
> Good catch! Thanks for spotting.
> 
> Trying to find similar occurrences via git grep "= malloc" did not
> yield other places that need the same fix.

Don't forget realloc and calloc (though I don't think there are any
cases that need touching there; just a hint for anybody doing auditing).

-Peff


Re: v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 01:05:00PM +0200, Michael Haggerty wrote:

> > I'd expect one of:
> > 
> >   1. We delete "foo" before updating "foo/bar", and we end up with a
> >  single ref.
> 
> I don't think that this is possible in the general case in a single
> transaction. The problem is that the code would need to take locks
> 
> refs/tags/foo.lock
> refs/tags/foo/bar.lock
> 
> But the latter couldn't coexist with the loose reference file
> 
> refs/tags/foo
> 
> , which might already exist.

Yeah, you're right. I was thinking of the opposite case (where you
could create "foo.lock" even though "foo/bar" exists), but this one is
impossible with filesystem semantics.

> It is only imaginable to do this in a single transaction if you pack and
> prune `refs/tags/foo` first, to get the loose reference out of the way,
> before executing the transaction. Even then, you would have to beware of
> a race where another process writes a loose version of `refs/tags/foo`
> between the time that `pack-refs` prunes it and the time that the
> transaction obtains the lock again.

Yeah, it's probably better to avoid playing games here. Moving to a
non-filesystem storage backend would just make the problem go away.

-Peff


Re: Consequences of CRLF in index?

2017-10-24 Thread Torsten Bögershausen
On Tue, Oct 24, 2017 at 11:14:15AM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Lars Schneider wrote:
> 
> > I've migrated a large repo (110k+ files) with a lot of history (177k 
> > commits)
> > and a lot of users (200+) to Git. Unfortunately, all text files in the index
> > of the repo have CRLF line endings. In general this seems not to be a 
> > problem
> > as the project is developed exclusively on Windows.
> 
> Sounds good.
> 
> > However, I wonder if there are any "hidden consequences" of this setup?
> > If there are consequences, then I see two options. Either I rebase the repo
> > and fix the line endings for all commits or I add a single commit that fixes
> > the line endings for all files. Both actions require coordination with the
> > users to avoid repo trouble/merge conflicts. The "single fixup commit" 
> > options
> > would also make diffs into the past look bad. Would a single large commit 
> > have
> > any impact on the performance of standard Git operations?
> 
> There are no hidden consequences that I'm aware of.  If you later
> decide that you want to become a cross-platform project, then you may
> want to switch to LF endings, in which case I suggest the "single
> fixup commit" strategy.
> 
> In any event, you also probably want to declare what you're doing
> using .gitattributes.  By checking in the files as CRLF, you are
> declaring that you do *not* want Git to treat them as text files
> (i.e., you do not want Git to change the line endings), so something
> as simple as
> 
>   * -text
> 
> should do the trick.  See gitattributes(5) for details.

Exactly.
The "hidden consequence" may be the other way around:
If you don't specify .gitattributes, then all people who have core.autocrlf=true
will suffer from a runtime penalty.
core.autocrlf=true means "auto".
At each checkout Git needs to figure out that the file has CRLF in the repo,
so that there is no conversion done.
The same happens on "git add" or "git commit" (and sometimes on "git status").

The penalty may be low, but Dscho once reported that it is measurable & painful
on a "big repo".

The other advantage of "* -text" in .gitattributes is that new files are handled
consistantly accross developers.

Those who have "core.autocrlf=false" would produce commits with CRLF for new
files, and those developpers who have core.autocrlf=true would produce files
with LF in the index and CRLF in the worktree.
This may (most probably will) cause confusion later, when things are pushed and
pulled.

> 
> I'd be interested to hear what happens when diff-ing across a line
> ending fixup commit.  Is this an area where Git needs some
> improvement?  "git merge" knows an -Xrenormalize option to deal with a
> related problem --- it's possible that "git diff" needs to learn a
> similar trick.

That is a tricky thing.
Sometimes you want to see the CLRF - LF as a diff, (represented as "^M"),
and sometimes not.

All in all "* -text" is a good choice for Windows-only projects.

> 
> Thanks and hope that helps,
> Jonathan


Re: [PATCH 0/4] (x)diff cleanup: remove duplicate code

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 11:59 AM, Stefan Beller  wrote:
> Junio wrote:
>
>>  * I was hoping that the next_byte() and string_hash() thing, once
>>they are cleaned up, will eventually be shared with the xdiff/
>>code at the lower layer, which needs to do pretty much the same
>>in order to implement various whitespace ignoring options.  I am
>>not sure how well the approach taken by the WIP patch meshes with
>>the needs of the lower layer.
>
> This series does exactly this; although I chose to reuse the code in
> xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that
> code has seen more exercise already (hence I assume it has fewer bugs
> if any as well as its performance implications are well understood).
>
> However to do so, we need to pollute xdiff/xutils.c and include
> hashmap.h there (which also requires cache.h as that header has
> an inline function using BUG()), which I find a bit unfortunate but
> worth the trade off of using better tested code.
>

This, of course, applies on top of jk/diff-color-moved-fix.


[PATCH 3/4] xdiff: use stronger hash function internally

2017-10-24 Thread Stefan Beller
Instead of using the hash seeded with 5381, and updated via
`(hash << 5) ^ new_byte`, use the FNV-1 primitives as offered by
hashmap.h, which is seeded with 0x811c9dc5 and computed as
`(hash * 0x01000193) ^ new_byte`.

Signed-off-by: Stefan Beller 
---
 xdiff/xutils.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 04d7b32e4e..a58a28c687 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -24,7 +24,8 @@
 #include 
 #include "xinclude.h"
 
-
+#include "cache.h"
+#include "hashmap.h"
 
 
 long xdl_bogosqrt(long n) {
@@ -228,7 +229,7 @@ int xdl_recmatch(const char *l1, long s1, const char *l2, 
long s2, long flags)
 
 static unsigned long xdl_hash_record_with_whitespace(char const **data,
char const *top, long flags) {
-   unsigned long ha = 5381;
+   unsigned long ha = memhash(NULL, 0);
char const *ptr = *data;
 
for (; ptr < top && *ptr != '\n'; ptr++) {
@@ -243,21 +244,18 @@ static unsigned long xdl_hash_record_with_whitespace(char 
const **data,
; /* already handled */
else if (flags & XDF_IGNORE_WHITESPACE_CHANGE
 && !at_eol) {
-   ha += (ha << 5);
-   ha ^= (unsigned long) ' ';
+   ha = memhash_feed(ha, (unsigned char) ' ');
}
else if (flags & XDF_IGNORE_WHITESPACE_AT_EOL
 && !at_eol) {
while (ptr2 != ptr + 1) {
-   ha += (ha << 5);
-   ha ^= (unsigned long) *ptr2;
+   ha = memhash_feed(ha, (unsigned char) 
*ptr2);
ptr2++;
}
}
continue;
}
-   ha += (ha << 5);
-   ha ^= (unsigned long) *ptr;
+   ha = memhash_feed(ha, (unsigned char) *ptr);
}
*data = ptr < top ? ptr + 1: ptr;
 
@@ -265,15 +263,14 @@ static unsigned long xdl_hash_record_with_whitespace(char 
const **data,
 }
 
 unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
-   unsigned long ha = 5381;
+   unsigned long ha = memhash(NULL, 0);
char const *ptr = *data;
 
if (flags & XDF_WHITESPACE_FLAGS)
return xdl_hash_record_with_whitespace(data, top, flags);
 
for (; ptr < top && *ptr != '\n'; ptr++) {
-   ha += (ha << 5);
-   ha ^= (unsigned long) *ptr;
+   ha = memhash_feed(ha, (unsigned char) *ptr);
}
*data = ptr < top ? ptr + 1: ptr;
 
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 4/4] diff.c: get rid of duplicate implementation

2017-10-24 Thread Stefan Beller
The implementations in diff.c to detect moved lines
needs to compare strings and hash strings, which is
implemented in that file, as well as in the xdiff
library.

Remove the rather recent implementation in diff.c
and rely on the well exercised code in the xdiff lib.

Signed-off-by: Stefan Beller 
---
 diff.c | 82 --
 1 file changed, 4 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index c4a669ffa8..e6814b9e9c 100644
--- a/diff.c
+++ b/diff.c
@@ -707,88 +707,14 @@ struct moved_entry {
struct moved_entry *next_line;
 };
 
-static int next_byte(const char **cp, const char **endp,
-const struct diff_options *diffopt)
-{
-   int retval;
-
-   if (*cp >= *endp)
-   return -1;
-
-   if (isspace(**cp)) {
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* After skipping a couple of whitespaces,
-* we still have to account for one space.
-*/
-   return (int)' ';
-   }
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
-   while (*cp < *endp && isspace(**cp))
-   (*cp)++;
-   /*
-* return the first non-ws character via the usual
-* below, unless we ate all of the bytes
-*/
-   if (*cp >= *endp)
-   return -1;
-   }
-   }
-
-   retval = (unsigned char)(**cp);
-   (*cp)++;
-   return retval;
-}
-
 static int moved_entry_cmp(const struct diff_options *diffopt,
   const struct moved_entry *a,
   const struct moved_entry *b,
   const void *keydata)
 {
-   const char *ap = a->es->line, *ae = a->es->line + a->es->len;
-   const char *bp = b->es->line, *be = b->es->line + b->es->len;
-
-   if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS))
-   return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
-
-   if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while (be > bp && isspace(be[-1]))
-   be--;
-   }
-
-   while (1) {
-   int ca, cb;
-   ca = next_byte(&ap, &ae, diffopt);
-   cb = next_byte(&bp, &be, diffopt);
-   if (ca != cb)
-   return 1;
-   if (ca < 0)
-   return 0;
-   }
-}
-
-static unsigned get_string_hash(struct emitted_diff_symbol *es, struct 
diff_options *o)
-{
-   if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
-   static struct strbuf sb = STRBUF_INIT;
-   const char *ap = es->line, *ae = es->line + es->len;
-   int c;
-
-   strbuf_reset(&sb);
-   while (ae > ap && isspace(ae[-1]))
-   ae--;
-   while ((c = next_byte(&ap, &ae, o)) >= 0)
-   strbuf_addch(&sb, c);
-
-   return memhash(sb.buf, sb.len);
-   } else {
-   return memhash(es->line, es->len);
-   }
+   return !xdiff_compare_lines(a->es->line, a->es->len,
+   b->es->line, b->es->len,
+   diffopt->xdl_opts);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -797,7 +723,7 @@ static struct moved_entry *prepare_entry(struct 
diff_options *o,
struct moved_entry *ret = xmalloc(sizeof(*ret));
struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 
-   ret->ent.hash = get_string_hash(l, o);
+   ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
ret->es = l;
ret->next_line = NULL;
 
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 2/4] xdiff-interface: export comparing and hashing strings

2017-10-24 Thread Stefan Beller
This will turn out to be useful in a later patch

Signed-off-by: Stefan Beller 
---
 xdiff-interface.c | 11 +++
 xdiff-interface.h |  5 +
 2 files changed, 16 insertions(+)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 018e033089..fd002ebbc2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -5,6 +5,7 @@
 #include "xdiff/xdiffi.h"
 #include "xdiff/xemit.h"
 #include "xdiff/xmacros.h"
+#include "xdiff/xutils.h"
 
 struct xdiff_emit_state {
xdiff_emit_consume_fn consume;
@@ -296,6 +297,16 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg)
}
 }
 
+unsigned long xdiff_hash_string(const char *s, size_t len, long flags)
+{
+   return xdl_hash_record(&s, s + len, flags);
+}
+
+int xdiff_compare_lines(const char *l1, long s1, const char *l2, long s2, long 
flags)
+{
+   return xdl_recmatch(l1, s1, l2, s2, flags);
+}
+
 int git_xmerge_style = -1;
 
 int git_xmerge_config(const char *var, const char *value, void *cb)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 6f6ba9095d..d3cb9285c5 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -29,4 +29,9 @@ extern void xdiff_clear_find_func(xdemitconf_t *xecfg);
 extern int git_xmerge_config(const char *var, const char *value, void *cb);
 extern int git_xmerge_style;
 
+extern int xdiff_compare_lines(const char *l1, long s1,
+  const char *l2, long s2, long flags);
+
+extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
+
 #endif
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 1/4] hashmap: introduce memhash_feed to access the internals of FNV-1 hash

2017-10-24 Thread Stefan Beller
This will be useful shortly.

Signed-off-by: Stefan Beller 
---
 hashmap.c | 7 ++-
 hashmap.h | 3 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/hashmap.c b/hashmap.c
index d42f01ff5a..d103eb1fd2 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -26,13 +26,18 @@ unsigned int strihash(const char *str)
return hash;
 }
 
+unsigned int memhash_feed(unsigned int hash_seed, const unsigned char next)
+{
+   return (hash_seed * FNV32_PRIME) ^ next;
+}
+
 unsigned int memhash(const void *buf, size_t len)
 {
unsigned int hash = FNV32_BASE;
unsigned char *ucbuf = (unsigned char *) buf;
while (len--) {
unsigned int c = *ucbuf++;
-   hash = (hash * FNV32_PRIME) ^ c;
+   hash = memhash_feed(hash, c);
}
return hash;
 }
diff --git a/hashmap.h b/hashmap.h
index 7cb29a6aed..c2464385ed 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -105,10 +105,13 @@
  * `strihash` and `memihash` are case insensitive versions.
  * `memihash_cont` is a variant of `memihash` that allows a computation to be
  * continued with another chunk of data.
+ * `memhash_feed` takes just one character and returns the hash based off
+ * a previous hash.
  */
 extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
+extern unsigned int memhash_feed(unsigned int hash_seed, const unsigned char 
next);
 extern unsigned int memihash(const void *buf, size_t len);
 extern unsigned int memihash_cont(unsigned int hash_seed, const void *buf, 
size_t len);
 
-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 0/4] (x)diff cleanup: remove duplicate code

2017-10-24 Thread Stefan Beller
Junio wrote:

>  * I was hoping that the next_byte() and string_hash() thing, once
>they are cleaned up, will eventually be shared with the xdiff/
>code at the lower layer, which needs to do pretty much the same
>in order to implement various whitespace ignoring options.  I am
>not sure how well the approach taken by the WIP patch meshes with
>the needs of the lower layer.

This series does exactly this; although I chose to reuse the code in
xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that
code has seen more exercise already (hence I assume it has fewer bugs
if any as well as its performance implications are well understood).

However to do so, we need to pollute xdiff/xutils.c and include
hashmap.h there (which also requires cache.h as that header has
an inline function using BUG()), which I find a bit unfortunate but
worth the trade off of using better tested code.

Thanks,
Stefan

Stefan Beller (4):
  hashmap: introduce memhash_feed to access the internals of FNV-1 hash
  xdiff-interface: export comparing and hashing strings
  xdiff: use stronger hash function internally
  diff.c: get rid of duplicate implementation

 diff.c| 82 +++
 hashmap.c |  7 -
 hashmap.h |  3 ++
 xdiff-interface.c | 11 
 xdiff-interface.h |  5 
 xdiff/xutils.c| 19 ++---
 6 files changed, 37 insertions(+), 90 deletions(-)

-- 
2.15.0.rc2.6.g953226eb5f



[PATCH 10/13] rev-list: add list-objects filtering support

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach rev-list to use the filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

This feature is only enabled when one of the "--objects*"
options are used.

Furthermore, when the "--filter-print-omitted" option is
used, the omitted objects are printed at the end.  These
are marked with a "~".  This option can be combined with
"--quiet" to get a list of just the omitted objects.

Signed-off-by: Jeff Hostetler 
---
 Documentation/git-rev-list.txt |  5 ++-
 Documentation/rev-list-options.txt | 30 ++
 builtin/rev-list.c | 84 +-
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index ef22f17..6d2e60d 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -47,7 +47,10 @@ SYNOPSIS
 [ --fixed-strings | -F ]
 [ --date=]
 [ [ --objects | --objects-edge | --objects-edge-aggressive ]
-  [ --unpacked ] ]
+  [ --unpacked ]
+  [ --filter= ] ]
+[ --filter-print-missing ]
+[ --filter-print-omitted ]
 [ --pretty | --header ]
 [ --bisect ]
 [ --bisect-vars ]
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index 7d860bf..88f8878 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -706,6 +706,36 @@ ifdef::git-rev-list[]
 --unpacked::
Only useful with `--objects`; print the object IDs that are not
in packs.
+
+--filter=::
+   Only useful with one of the `--objects*`; omits objects (usually
+   blobs) from the list of printed objects.  The ''
+   may be one of the following:
++
+The form '--filter=blob:none' omits all blobs.
++
+The form '--filter=blob:limit=[kmg]' omits blobs larger than n bytes
+or units.  The value may be zero.  Special files matching '.git*' are
+alwayse included, regardless of size.
++
+The form '--filter=sparse:oid=' uses a sparse-checkout
+specification contained in the object (or the object that the expression
+evaluates to) to omit blobs not required by the corresponding sparse
+checkout.
++
+The form '--filter=sparse:path=' similarly uses a sparse-checkout
+specification contained in .
+
+--filter-print-missing::
+   Prints a list of the missing objects for the requested traversal.
+   Object IDs are prefixed with a ``?'' character.  The object type
+   is printed after the ID.  This may be used with or without any of
+   the above filtering options.
+
+--filter-print-omitted::
+   Only useful with one of the above `--filter*`; prints a list
+   of the omitted objects.  Object IDs are prefixed with a ``~''
+   character.
 endif::git-rev-list[]
 
 --no-walk[=(sorted|unsorted)]::
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index c1c74d4..7a0353f 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -12,6 +12,7 @@
 #include "bisect.h"
 #include "progress.h"
 #include "reflog-walk.h"
+#include "partial-clone-utils.h"
 
 static const char rev_list_usage[] =
 "git rev-list [OPTION] ... [ -- paths... ]\n"
@@ -54,6 +55,11 @@ static const char rev_list_usage[] =
 
 static struct progress *progress;
 static unsigned progress_counter;
+static struct list_objects_filter_options filter_options;
+static struct oidmap missing_objects;
+static int arg_print_missing;
+static int arg_print_omitted;
+#define DEFAULT_MAP_SIZE (16*1024)
 
 static void finish_commit(struct commit *commit, void *data);
 static void show_commit(struct commit *commit, void *data)
@@ -181,8 +187,26 @@ static void finish_commit(struct commit *commit, void 
*data)
 static void finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid))
+   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+   if (arg_print_missing) {
+   list_objects_filter_map_insert(
+   &missing_objects, &obj->oid, name, obj->type);
+   return;
+   }
+
+   /*
+* Relax consistency checks when we expect missing
+* objects because of partial-clone or a previous
+* partial-fetch.
+*
+* Note that this is independent of any filtering that
+* we are doing in this run.
+*/
+   if (is_partial_clone_registered())
+   return;
+
die("missing blob object '%s'", oid_to_hex(&obj->oid));
+   }
if (info->revs->verify_objects && !obj->parsed && obj->type != 
OBJ_COMMIT)
parse_object(&obj->oid);
 }
@@ -202,6 +226,22 @@ static void show_edge(struct 

[PATCH 12/13] pack-objects: add list-objects filtering

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Teach pack-objects to use the filtering provided by the
traverse_commit_list_filtered() interface to omit unwanted
objects from the resulting packfile.

This feature is intended for partial clone/fetch.

Filtering requires the use of the "--stdout" option.

Signed-off-by: Jeff Hostetler 
---
 Documentation/git-pack-objects.txt |  8 +++-
 builtin/pack-objects.c | 18 +-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index 473a161..8b4a223 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -12,7 +12,8 @@ SYNOPSIS
 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=] [--depth=]
-   [--revs [--unpacked | --all]] [--stdout | base-name]
+   [--revs [--unpacked | --all]]
+   [--stdout [--filter=] | base-name]
[--shallow] [--keep-true-parents] < object-list
 
 
@@ -236,6 +237,11 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it 
creates a bundle.
With this option, parents that are hidden by grafts are packed
nevertheless.
 
+--filter=::
+   Requires `--stdout`.  Omits certain objects (usually blobs) from
+   the resulting packfile.  See linkgit:git-rev-list[1] for valid
+   `` forms.
+
 SEE ALSO
 
 linkgit:git-rev-list[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6e77dfd..a251850 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -79,6 +79,8 @@ static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+static struct list_objects_filter_options filter_options;
+
 /*
  * stats
  */
@@ -2816,7 +2818,12 @@ static void get_object_list(int ac, const char **av)
if (prepare_revision_walk(&revs))
die("revision walk setup failed");
mark_edges_uninteresting(&revs, show_edge);
-   traverse_commit_list(&revs, show_commit, show_object, NULL);
+   if (filter_options.choice)
+   traverse_commit_list_filtered(&filter_options, &revs,
+ show_commit, show_object,
+ NULL, NULL);
+   else
+   traverse_commit_list(&revs, show_commit, show_object, NULL);
 
if (unpack_unreachable_expiration) {
revs.ignore_missing_links = 1;
@@ -2952,6 +2959,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
 N_("use a bitmap index if available to speed up 
counting objects")),
OPT_BOOL(0, "write-bitmap-index", &write_bitmap_index,
 N_("write a bitmap index together with the pack 
index")),
+
+   OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+
OPT_END(),
};
 
@@ -3028,6 +3038,12 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (!rev_list_all || !rev_list_reflog || !rev_list_index)
unpack_unreachable_expiration = 0;
 
+   if (filter_options.choice) {
+   if (!pack_to_stdout)
+   die("cannot use filtering with an indexable pack.");
+   use_bitmap_index = 0;
+   }
+
/*
 * "soft" reasons not to use bitmaps - for on-disk repack by default we 
want
 *
-- 
2.9.3



[PATCH 09/13] extension.partialclone: introduce partial clone extension

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Introduce the ability to have missing objects in a repo.  This
functionality is guarded by new repository extension options:
`extensions.partialcloneremote` and
`extensions.partialclonefilter`.

See the update to Documentation/technical/repository-version.txt
in this patch for more information.

This patch is part of a patch originally authored by:
Jonathan Tan 

Signed-off-by: Jeff Hostetler 
---
 Documentation/technical/repository-version.txt | 22 ++
 Makefile   |  1 +
 cache.h|  4 ++
 config.h   |  3 +
 environment.c  |  2 +
 partial-clone-utils.c  | 99 ++
 partial-clone-utils.h  | 34 +
 setup.c| 15 
 8 files changed, 180 insertions(+)
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h

diff --git a/Documentation/technical/repository-version.txt 
b/Documentation/technical/repository-version.txt
index 00ad379..9d488db 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,25 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`partialcloneremote`
+
+
+When the config key `extensions.partialcloneremote` is set, it indicates
+that the repo was created with a partial clone (or later performed
+a partial fetch) and that the remote may have omitted sending
+certain unwanted objects.  Such a remote is called a "promisor remote"
+and it promises that all such omitted objects can be fetched from it
+in the future.
+
+The value of this key is the name of the promisor remote.
+
+`partialclonefilter`
+
+
+When the config key `extensions.partialclonefilter` is set, it gives
+the initial filter expression used to create the partial clone.
+This value becomed the default filter expression for subsequent
+fetches (called "partial fetches") from the promisor remote.  This
+value may also be set by the first explicit partial fetch following a
+normal clone.
diff --git a/Makefile b/Makefile
index b9ff0b4..38632fb 100644
--- a/Makefile
+++ b/Makefile
@@ -841,6 +841,7 @@ LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
 LIB_OBJS += parse-options.o
 LIB_OBJS += parse-options-cb.o
+LIB_OBJS += partial-clone-utils.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
diff --git a/cache.h b/cache.h
index 6440e2b..4b785c0 100644
--- a/cache.h
+++ b/cache.h
@@ -860,12 +860,16 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_partial_clone_remote;
+extern char *repository_format_partial_clone_filter;
 
 struct repository_format {
int version;
int precious_objects;
int is_bare;
char *work_tree;
+   char *partial_clone_remote; /* value of extensions.partialcloneremote */
+   char *partial_clone_filter; /* value of extensions.partialclonefilter */
struct string_list unknown_extensions;
 };
 
diff --git a/config.h b/config.h
index a49d264..90544ef 100644
--- a/config.h
+++ b/config.h
@@ -34,6 +34,9 @@ struct config_options {
const char *git_dir;
 };
 
+#define KEY_PARTIALCLONEREMOTE "partialcloneremote"
+#define KEY_PARTIALCLONEFILTER "partialclonefilter"
+
 typedef int (*config_fn_t)(const char *, const char *, void *);
 extern int git_default_config(const char *, const char *, void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
diff --git a/environment.c b/environment.c
index 8289c25..2fcf9bb 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,8 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_partial_clone_remote;
+char *repository_format_partial_clone_filter;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/partial-clone-utils.c b/partial-clone-utils.c
new file mode 100644
index 000..8c925ae
--- /dev/null
+++ b/partial-clone-utils.c
@@ -0,0 +1,99 @@
+#include "cache.h"
+#include "config.h"
+#include "partial-clone-utils.h"
+
+int is_partial_clone_registered(void)
+{
+   if (repository_format_partial_clone_remote ||
+   repository_format_partial_clone_filter)
+   return 1;
+
+   return 0;
+}
+
+void partial_clone_utils_register(
+   const struct list_objects_filter_options *filter_options,
+   const char *remote,
+   const char *cmd_name)
+{
+   struct strbuf buf = STRBUF_INIT;

[PATCH 05/13] list-objects-filter-blobs-limit: add large blob filtering

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a filter for traverse_commit_list_worker() to omit blobs
larger than a requested size from the result, but always include
".git*" special files.

Signed-off-by: Jeff Hostetler 
---
 Makefile  |   1 +
 list-objects-filter-blobs-limit.c | 146 ++
 list-objects-filter-blobs-limit.h |  18 +
 3 files changed, 165 insertions(+)
 create mode 100644 list-objects-filter-blobs-limit.c
 create mode 100644 list-objects-filter-blobs-limit.h

diff --git a/Makefile b/Makefile
index 7e9d1f4..0fdeabb 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filter-blobs-limit.o
 LIB_OBJS += list-objects-filter-blobs-none.o
 LIB_OBJS += list-objects-filter-map.o
 LIB_OBJS += ll-merge.o
diff --git a/list-objects-filter-blobs-limit.c 
b/list-objects-filter-blobs-limit.c
new file mode 100644
index 000..f68963d
--- /dev/null
+++ b/list-objects-filter-blobs-limit.c
@@ -0,0 +1,146 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filter-blobs-limit.h"
+
+#define DEFAULT_MAP_SIZE (16*1024)
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ * And to OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_blobs_limit_data {
+   struct oidmap *omits;
+   unsigned long max_bytes;
+};
+
+static list_objects_filter_result filter_blobs_limit(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_blobs_limit_data *filter_data = filter_data_;
+   struct list_objects_filter_data_entry *entry;
+   unsigned long object_length;
+   enum object_type t;
+   int is_special_filename;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   /* always include all tree objects */
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   case LOFT_BLOB:
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+
+   is_special_filename = ((strncmp(filename, ".git", 4) == 0) &&
+  filename[4]);
+
+   /*
+* If we are keeping a list of the omitted objects
+* for the caller *AND* we previously "provisionally"
+* omitted this object (because of size) *AND* it now
+* has a special filename, make it not-omitted.
+* Otherwise, continue to provisionally omit it.
+*/
+   if (filter_data->omits &&
+   oidmap_get(filter_data->omits, &obj->oid)) {
+   if (!is_special_filename)
+   return LOFR_ZERO;
+   entry = oidmap_remove(filter_data->omits, &obj->oid);
+   free(entry);
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+   }
+
+   /*
+* If filename matches ".git*", always include it (regardless
+* of size).  (This may include blobs that we do not have
+* locally.)
+*/
+   if (is_special_filename)
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   t = sha1_object_info(obj->oid.hash, &object_length);
+   if (t != OBJ_BLOB) { /* probably OBJ_NONE */
+   /*
+* We DO NOT have the blob locally, so we cannot
+* apply the size filter criteria.  Be conservative
+* and force show it (and let the caller deal with
+* the ambiguity).  (This matches the behavior above
+* when the special filename matches.)
+*/
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+   }
+
+   if (object_length < filter_data->max_bytes)
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   /*
+* Provisionally omit it.  We've already established
+* that this blob is too big and doesn't have a special
+* filename, so we *WANT* to omit it.  However, there
+* may be a special file elsewhere in the tree that
+* references this same blob, so we cannot reject it
+

[PATCH 04/13] list-objects-filter-blobs-none: add filter to omit all blobs

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a simple filter for traverse_commit_list_worker() to omit
all blobs from the result.

This filter will be used in a future commit by rev-list and pack-objects
to create a "commits and trees" result.  This is intended for partial
clone and fetch support.

Signed-off-by: Jeff Hostetler 
---
 Makefile |  1 +
 list-objects-filter-blobs-none.c | 83 
 list-objects-filter-blobs-none.h | 18 +
 3 files changed, 102 insertions(+)
 create mode 100644 list-objects-filter-blobs-none.c
 create mode 100644 list-objects-filter-blobs-none.h

diff --git a/Makefile b/Makefile
index e59f12d..7e9d1f4 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filter-blobs-none.o
 LIB_OBJS += list-objects-filter-map.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
diff --git a/list-objects-filter-blobs-none.c b/list-objects-filter-blobs-none.c
new file mode 100644
index 000..1b548b9
--- /dev/null
+++ b/list-objects-filter-blobs-none.c
@@ -0,0 +1,83 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filter-blobs-none.h"
+
+#define DEFAULT_MAP_SIZE (16*1024)
+
+/*
+ * A filter for list-objects to omit ALL blobs from the traversal.
+ * And to OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_blobs_none_data {
+   struct oidmap *omits;
+};
+
+static list_objects_filter_result filter_blobs_none(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_blobs_none_data *filter_data = filter_data_;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   /* always include all tree objects */
+   return LOFR_MARK_SEEN | LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   case LOFT_BLOB:
+   assert(obj->type == OBJ_BLOB);
+   assert((obj->flags & SEEN) == 0);
+
+   if (filter_data->omits)
+   list_objects_filter_map_insert(
+   filter_data->omits, &obj->oid, pathname,
+   obj->type);
+
+   return LOFR_MARK_SEEN; /* but not LOFR_SHOW (hard omit) */
+   }
+}
+
+void traverse_commit_list__blobs_none(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   list_objects_filter_map_foreach_cb print_omitted_object,
+   void *ctx_data)
+{
+   struct filter_blobs_none_data d;
+
+   memset(&d, 0, sizeof(d));
+   if (print_omitted_object) {
+   d.omits = xcalloc(1, sizeof(*d.omits));
+   oidmap_init(d.omits, DEFAULT_MAP_SIZE);
+   }
+
+   traverse_commit_list_worker(revs, show_commit, show_object, ctx_data,
+   filter_blobs_none, &d);
+
+   if (print_omitted_object) {
+   list_objects_filter_map_foreach(d.omits,
+   print_omitted_object,
+   ctx_data);
+   oidmap_free(d.omits, 1);
+   }
+}
diff --git a/list-objects-filter-blobs-none.h b/list-objects-filter-blobs-none.h
new file mode 100644
index 000..363c9de
--- /dev/null
+++ b/list-objects-filter-blobs-none.h
@@ -0,0 +1,18 @@
+#ifndef LIST_OBJECTS_FILTER_BLOBS_NONE_H
+#define LIST_OBJECTS_FILTER_BLOBS_NONE_H
+
+#include "list-objects-filter-map.h"
+
+/*
+ * A filter for list-objects to omit ALL blobs
+ * from the traversal.
+ */
+void traverse_commit_list__blobs_none(
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   list_objects_filter_map_foreach_cb print_omitted_object,
+   void *ctx_data);
+
+#endif /* LIST_OBJECTS_FILTER_BLOBS_NONE_H */
+
-- 
2.9.3



[PATCH 13/13] t5317: pack-objects object filtering test

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 t/t5317-pack-objects-filter-objects.sh | 384 +
 1 file changed, 384 insertions(+)
 create mode 100755 t/t5317-pack-objects-filter-objects.sh

diff --git a/t/t5317-pack-objects-filter-objects.sh 
b/t/t5317-pack-objects-filter-objects.sh
new file mode 100755
index 000..ef7a8f6
--- /dev/null
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -0,0 +1,384 @@
+#!/bin/sh
+
+test_description='git pack-objects with object filtering for partial clone'
+
+. ./test-lib.sh
+
+# Test blob:none filter.
+
+test_expect_success 'setup r1' '
+   echo "{print \$1}" >print_1.awk &&
+   echo "{print \$2}" >print_2.awk &&
+
+   git init r1 &&
+   for n in 1 2 3 4 5
+   do
+   echo "This is file: $n" > r1/file.$n
+   git -C r1 add file.$n
+   git -C r1 commit -m "$n"
+   done
+'
+
+test_expect_success 'verify blob count in normal packfile' '
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r1 pack-objects --rev --stdout >all.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r1 index-pack ../all.pack &&
+   git -C r1 verify-pack -v ../all.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:none packfile has no blobs' '
+   git -C r1 pack-objects --rev --stdout --filter=blob:none >filter.pack 
<<-EOF &&
+   HEAD
+   EOF
+   git -C r1 index-pack ../filter.pack &&
+   git -C r1 verify-pack -v ../filter.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   nr=$(wc -l expected &&
+   git -C r1 verify-pack -v ../filter.pack \
+   | grep -E "commit|tree" \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+# Test blob:limit=[kmg] filter.
+# We boundary test around the size parameter.  The filter is strictly less than
+# the value, so size 500 and 1000 should have the same results, but 1001 should
+# filter more.
+
+test_expect_success 'setup r2' '
+   git init r2 &&
+   for n in 1000 1
+   do
+   printf "%"$n"s" X > r2/large.$n
+   git -C r2 add large.$n
+   git -C r2 commit -m "$n"
+   done
+'
+
+test_expect_success 'verify blob count in normal packfile' '
+   git -C r2 ls-files -s large.1000 large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 pack-objects --rev --stdout >all.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../all.pack &&
+   git -C r2 verify-pack -v ../all.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=500 omits all blobs' '
+   git -C r2 pack-objects --rev --stdout --filter=blob:limit=500 
>filter.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../filter.pack &&
+   git -C r2 verify-pack -v ../filter.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   nr=$(wc -l filter.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../filter.pack &&
+   git -C r2 verify-pack -v ../filter.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   nr=$(wc -l expected &&
+   git -C r2 pack-objects --rev --stdout --filter=blob:limit=1001 
>filter.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../filter.pack &&
+   git -C r2 verify-pack -v ../filter.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=10001' '
+   git -C r2 ls-files -s large.1000 large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 pack-objects --rev --stdout --filter=blob:limit=10001 
>filter.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../filter.pack &&
+   git -C r2 verify-pack -v ../filter.pack \
+   | grep blob \
+   | awk -f print_1.awk \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=1k' '
+   git -C r2 ls-files -s large.1000 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 pack-objects --rev --stdout --filter=blob:limit=1k 
>filter.pack <<-EOF &&
+   HEAD
+   EOF
+   git -C r2 index-pack ../filter.pack &&
+   git -C r2 verify-pack -v ../filt

[PATCH 01/13] dir: allow exclusions from blob in addition to file

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.

Add add_excludes_from_blob_to_list() to allow an exclude
file be specified with an OID.

Signed-off-by: Jeff Hostetler 
---
 dir.c | 51 +--
 dir.h |  3 +++
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 1d17b80..d848f2b 100644
--- a/dir.c
+++ b/dir.c
@@ -739,6 +739,10 @@ static void invalidate_directory(struct untracked_cache 
*uc,
dir->dirs[i]->recurse = 0;
 }
 
+static int add_excludes_from_buffer(char *buf, size_t size,
+   const char *base, int baselen,
+   struct exclude_list *el);
+
 /*
  * Given a file with name "fname", read it (either from disk, or from
  * an index if 'istate' is non-null), parse it and store the
@@ -754,9 +758,9 @@ static int add_excludes(const char *fname, const char 
*base, int baselen,
struct sha1_stat *sha1_stat)
 {
struct stat st;
-   int fd, i, lineno = 1;
+   int fd;
size_t size = 0;
-   char *buf, *entry;
+   char *buf;
 
fd = open(fname, O_RDONLY);
if (fd < 0 || fstat(fd, &st) < 0) {
@@ -813,6 +817,17 @@ static int add_excludes(const char *fname, const char 
*base, int baselen,
}
}
 
+   add_excludes_from_buffer(buf, size, base, baselen, el);
+   return 0;
+}
+
+static int add_excludes_from_buffer(char *buf, size_t size,
+   const char *base, int baselen,
+   struct exclude_list *el)
+{
+   int i, lineno = 1;
+   char *entry;
+
el->filebuf = buf;
 
if (skip_utf8_bom(&buf, size))
@@ -841,6 +856,38 @@ int add_excludes_from_file_to_list(const char *fname, 
const char *base,
return add_excludes(fname, base, baselen, el, istate, NULL);
 }
 
+int add_excludes_from_blob_to_list(
+   struct object_id *oid,
+   const char *base, int baselen,
+   struct exclude_list *el)
+{
+   char *buf;
+   unsigned long size;
+   enum object_type type;
+
+   buf = read_sha1_file(oid->hash, &type, &size);
+   if (!buf)
+   return -1;
+
+   if (type != OBJ_BLOB) {
+   free(buf);
+   return -1;
+   }
+
+   if (size == 0) {
+   free(buf);
+   return 0;
+   }
+
+   if (buf[size - 1] != '\n') {
+   buf = xrealloc(buf, st_add(size, 1));
+   buf[size++] = '\n';
+   }
+
+   add_excludes_from_buffer(buf, size, base, baselen, el);
+   return 0;
+}
+
 struct exclude_list *add_exclude_list(struct dir_struct *dir,
  int group_type, const char *src)
 {
diff --git a/dir.h b/dir.h
index e371705..1bcf391 100644
--- a/dir.h
+++ b/dir.h
@@ -256,6 +256,9 @@ extern struct exclude_list *add_exclude_list(struct 
dir_struct *dir,
 extern int add_excludes_from_file_to_list(const char *fname, const char *base, 
int baselen,
  struct exclude_list *el, struct  
index_state *istate);
 extern void add_excludes_from_file(struct dir_struct *, const char *fname);
+extern int add_excludes_from_blob_to_list(struct object_id *oid,
+ const char *base, int baselen,
+ struct exclude_list *el);
 extern void parse_exclude_pattern(const char **string, int *patternlen, 
unsigned *flags, int *nowildcardlen);
 extern void add_exclude(const char *string, const char *base,
int baselen, struct exclude_list *el, int srcpos);
-- 
2.9.3



[PATCH 03/13] list-objects: filter objects in traverse_commit_list

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted (not shown)
during a traversal.

Update traverse_commit_list() to be a wrapper for the above.

Filtering will be used in a future commit by rev-list and
pack-objects for narrow/partial clone/fetch to omit certain
blobs from the output.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.

Signed-off-by: Jeff Hostetler 
---
 list-objects.c | 66 --
 list-objects.h | 32 +++-
 2 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index b3931fa..3e86008 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
 show_object_fn show,
 struct strbuf *path,
 const char *name,
-void *cb_data)
+void *cb_data,
+filter_object_fn filter,
+void *filter_data)
 {
struct object *obj = &blob->object;
size_t pathlen;
+   list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
if (!revs->blob_objects)
return;
@@ -24,11 +27,15 @@ static void process_blob(struct rev_info *revs,
die("bad blob object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   obj->flags |= SEEN;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   show(obj, path->buf, cb_data);
+   if (filter)
+   r = filter(LOFT_BLOB, obj, path->buf, &path->buf[pathlen], 
filter_data);
+   if (r & LOFR_MARK_SEEN)
+   obj->flags |= SEEN;
+   if (r & LOFR_SHOW)
+   show(obj, path->buf, cb_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -69,7 +76,9 @@ static void process_tree(struct rev_info *revs,
 show_object_fn show,
 struct strbuf *base,
 const char *name,
-void *cb_data)
+void *cb_data,
+filter_object_fn filter,
+void *filter_data)
 {
struct object *obj = &tree->object;
struct tree_desc desc;
@@ -77,6 +86,7 @@ static void process_tree(struct rev_info *revs,
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
all_entries_interesting: entry_not_interesting;
int baselen = base->len;
+   list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_SHOW;
 
if (!revs->tree_objects)
return;
@@ -90,9 +100,13 @@ static void process_tree(struct rev_info *revs,
die("bad tree object %s", oid_to_hex(&obj->oid));
}
 
-   obj->flags |= SEEN;
strbuf_addstr(base, name);
-   show(obj, base->buf, cb_data);
+   if (filter)
+   r = filter(LOFT_BEGIN_TREE, obj, base->buf, 
&base->buf[baselen], filter_data);
+   if (r & LOFR_MARK_SEEN)
+   obj->flags |= SEEN;
+   if (r & LOFR_SHOW)
+   show(obj, base->buf, cb_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -112,7 +126,7 @@ static void process_tree(struct rev_info *revs,
process_tree(revs,
 lookup_tree(entry.oid),
 show, base, entry.path,
-cb_data);
+cb_data, filter, filter_data);
else if (S_ISGITLINK(entry.mode))
process_gitlink(revs, entry.oid->hash,
show, base, entry.path,
@@ -121,8 +135,17 @@ static void process_tree(struct rev_info *revs,
process_blob(revs,
 lookup_blob(entry.oid),
 show, base, entry.path,
-cb_data);
+cb_data, filter, filter_data);
}
+
+   if (filter) {
+   r = filter(LOFT_END_TREE, obj, base->buf, &base->buf[baselen], 
filter_data);
+   if (r & LOFR_MARK_SEEN)
+   obj->flags |= SEEN;
+   if (r & LOFR_SHOW)
+   show(obj, base->buf, cb_data);
+   }
+
strbuf_setlen(base, baselen);
free_tree_buffer(tree);
 }
@@ -183,10 +206,10 @@ static void add_pending_tree(struct rev_info *revs, 
struct tree *tree)
add_pending_object(revs, &tree->object, "");
 }
 
-void traverse_commit_list(struct rev_info *revs,
- show_commit_fn show_commit,
- show_object_fn show_object,
- void *data)
+void trave

[PATCH 07/13] list-objects-filter-options: common argument parsing

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create common routines and defines for parsing
list-objects-filter-related command line arguments and
pack-protocol fields.

Signed-off-by: Jeff Hostetler 
---
 Makefile  |   1 +
 list-objects-filter-options.c | 101 ++
 list-objects-filter-options.h |  50 +
 3 files changed, 152 insertions(+)
 create mode 100644 list-objects-filter-options.c
 create mode 100644 list-objects-filter-options.h

diff --git a/Makefile b/Makefile
index fc82664..b9ff0b4 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ LIB_OBJS += list-objects.o
 LIB_OBJS += list-objects-filter-blobs-limit.o
 LIB_OBJS += list-objects-filter-blobs-none.o
 LIB_OBJS += list-objects-filter-map.o
+LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += list-objects-filter-sparse.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
new file mode 100644
index 000..40f48ac
--- /dev/null
+++ b/list-objects-filter-options.c
@@ -0,0 +1,101 @@
+#include "cache.h"
+#include "commit.h"
+#include "config.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filter-options.h"
+
+/*
+ * Parse value of the argument to the "filter" keword.
+ * On the command line this looks like: --filter=
+ * and in the pack protocol as: filter 
+ *
+ *  ::= blob:none
+ *   blob:limit:[kmg]
+ *   sparse:oid:
+ *   sparse:path:
+ */
+int parse_list_objects_filter(struct list_objects_filter_options 
*filter_options,
+ const char *arg)
+{
+   struct object_context oc;
+   struct object_id sparse_oid;
+   const char *v0;
+   const char *v1;
+
+   if (filter_options->choice)
+   die(_("multiple object filter types cannot be combined"));
+
+   /*
+* TODO consider rejecting 'arg' if it contains any
+* TODO injection characters (since we might send this
+* TODO to a sub-command or to the server and we don't
+* TODO want to deal with legacy quoting/escaping for
+* TODO a new feature).
+*/
+
+   filter_options->raw_value = strdup(arg);
+
+   if (skip_prefix(arg, "blob:", &v0) || skip_prefix(arg, "blobs:", &v0)) {
+   if (!strcmp(v0, "none")) {
+   filter_options->choice = LOFC_BLOB_NONE;
+   return 0;
+   }
+
+   if (skip_prefix(v0, "limit=", &v1) &&
+   git_parse_ulong(v1, &filter_options->blob_limit_value)) {
+   filter_options->choice = LOFC_BLOB_LIMIT;
+   return 0;
+   }
+   }
+   else if (skip_prefix(arg, "sparse:", &v0)) {
+   if (skip_prefix(v0, "oid=", &v1)) {
+   filter_options->choice = LOFC_SPARSE_OID;
+   if (!get_oid_with_context(v1, GET_OID_BLOB,
+ &sparse_oid, &oc)) {
+   /*
+* We successfully converted the 
+* into an actual OID.  Rewrite the raw_value
+* in canonoical form with just the OID.
+* (If we send this request to the server, we
+* want an absolute expression rather than a
+* local-ref-relative expression.)
+*/
+   free((char *)filter_options->raw_value);
+   filter_options->raw_value =
+   xstrfmt("sparse:oid=%s",
+   oid_to_hex(&sparse_oid));
+   filter_options->sparse_oid_value =
+   oiddup(&sparse_oid);
+   } else {
+   /*
+* We could not turn the  into an
+* OID.  Leave the raw_value as is in case
+* the server can parse it.  (It may refer to
+* a branch, commit, or blob we don't have.)
+*/
+   }
+   return 0;
+   }
+
+   if (skip_prefix(v0, "path=", &v1)) {
+   filter_options->choice = LOFC_SPARSE_PATH;
+   filter_options->sparse_path_value = strdup(v1);
+   return 0;
+   }
+   }
+
+   die(_("invalid filter expression '%s'"), arg);
+   return 0;
+}
+
+int opt_parse_list_objects_filter(const struct option *opt,
+ const char *arg, int unset)
+{
+   struct list_objects_filter_options *filter_options = opt->value;
+
+   assert(arg);
+ 

[PATCH 11/13] t6112: rev-list object filtering test

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Signed-off-by: Jeff Hostetler 
---
 t/t6112-rev-list-filters-objects.sh | 223 
 1 file changed, 223 insertions(+)
 create mode 100755 t/t6112-rev-list-filters-objects.sh

diff --git a/t/t6112-rev-list-filters-objects.sh 
b/t/t6112-rev-list-filters-objects.sh
new file mode 100755
index 000..26fa12f
--- /dev/null
+++ b/t/t6112-rev-list-filters-objects.sh
@@ -0,0 +1,223 @@
+#!/bin/sh
+
+test_description='git rev-list with object filtering for partial clone'
+
+. ./test-lib.sh
+
+# Test the blob:none filter.
+
+test_expect_success 'setup r1' '
+   echo "{print \$1}" >print_1.awk &&
+   echo "{print \$2}" >print_2.awk &&
+
+   git init r1 &&
+   for n in 1 2 3 4 5
+   do
+   echo "This is file: $n" > r1/file.$n
+   git -C r1 add file.$n
+   git -C r1 commit -m "$n"
+   done
+'
+
+test_expect_success 'verify blob:none omits all 5 blobs' '
+   git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r1 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:none \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify emitted+omitted == all' '
+   git -C r1 rev-list HEAD --objects \
+   | awk -f print_1.awk \
+   | sort >expected &&
+   git -C r1 rev-list HEAD --objects --filter-print-omitted 
--filter=blob:none \
+   | awk -f print_1.awk \
+   | sed "s/~//" \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+
+# Test blob:limit=[kmg] filter.
+# We boundary test around the size parameter.  The filter is strictly less than
+# the value, so size 500 and 1000 should have the same results, but 1001 should
+# filter more.
+
+test_expect_success 'setup r2' '
+   git init r2 &&
+   for n in 1000 1
+   do
+   printf "%"$n"s" X > r2/large.$n
+   git -C r2 add large.$n
+   git -C r2 commit -m "$n"
+   done
+'
+
+test_expect_success 'verify blob:limit=500 omits all blobs' '
+   git -C r2 ls-files -s large.1000 large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:limit=500 \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify emitted+omitted == all' '
+   git -C r2 rev-list HEAD --objects \
+   | awk -f print_1.awk \
+   | sort >expected &&
+   git -C r2 rev-list HEAD --objects --filter-print-omitted 
--filter=blob:limit=500 \
+   | awk -f print_1.awk \
+   | sed "s/~//" \
+   | sort >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=1000' '
+   git -C r2 ls-files -s large.1000 large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:limit=1000 \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=1001' '
+   git -C r2 ls-files -s large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:limit=1001 \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=1k' '
+   git -C r2 ls-files -s large.1 \
+   | awk -f print_2.awk \
+   | sort >expected &&
+   git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:limit=1k \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+test_expect_success 'verify blob:limit=1m' '
+   cat expected &&
+   git -C r2 rev-list HEAD --quiet --objects --filter-print-omitted 
--filter=blob:limit=1m \
+   | awk -f print_1.awk \
+   | sed "s/~//" >observed &&
+   test_cmp observed expected
+'
+
+# Test sparse:path= filter.
+# Use a local file containing a sparse-checkout specification to filter
+# out blobs not required for the corresponding sparse-checkout.  We do not
+# require sparse-checkout to actually be enabled.
+
+test_expect_success 'setup r3' '
+   git init r3 &&
+   mkdir r3/dir1 &&
+   for n in sparse1 sparse2
+   do
+   echo "This is file: $n" > r3/$n
+   git -C r3 add $n
+   echo "This is file: dir1/$n" > r3/dir1/$n
+  

[PATCH 06/13] list-objects-filter-sparse: add sparse filter

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create a filter for traverse_commit_list_worker() to only include
the blobs the would be referenced by a sparse-checkout using the
given specification.

Signed-off-by: Jeff Hostetler 
---
 Makefile |   1 +
 list-objects-filter-sparse.c | 241 +++
 list-objects-filter-sparse.h |  30 ++
 3 files changed, 272 insertions(+)
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h

diff --git a/Makefile b/Makefile
index 0fdeabb..fc82664 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ LIB_OBJS += list-objects.o
 LIB_OBJS += list-objects-filter-blobs-limit.o
 LIB_OBJS += list-objects-filter-blobs-none.o
 LIB_OBJS += list-objects-filter-map.o
+LIB_OBJS += list-objects-filter-sparse.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filter-sparse.c b/list-objects-filter-sparse.c
new file mode 100644
index 000..386b667
--- /dev/null
+++ b/list-objects-filter-sparse.c
@@ -0,0 +1,241 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filter-sparse.h"
+
+#define DEFAULT_MAP_SIZE (16*1024)
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec can be loaded from a blob with the
+ * given OID or from a local pathname.  We allow an OID because
+ * the repo may be bare or we may be doing the filtering on the
+ * server.
+ */
+struct frame {
+   int defval;
+   int child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+   struct oidmap *omits;
+   struct exclude_list el;
+
+   size_t nr, alloc;
+   struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+   list_objects_filter_type filter_type,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_use_sparse_data *filter_data = filter_data_;
+   struct list_objects_filter_map_entry *entry_prev = NULL;
+   int val, dtype;
+   struct frame *frame;
+
+   switch (filter_type) {
+   default:
+   die("unkown filter_type");
+   return LOFR_ZERO;
+
+   case LOFT_BEGIN_TREE:
+   assert(obj->type == OBJ_TREE);
+   dtype = DT_DIR;
+   val = is_excluded_from_list(pathname, strlen(pathname),
+   filename, &dtype, &filter_data->el,
+   &the_index);
+   if (val < 0)
+   val = filter_data->array_frame[filter_data->nr].defval;
+
+   ALLOC_GROW(filter_data->array_frame, filter_data->nr + 1,
+  filter_data->alloc);
+   filter_data->nr++;
+   filter_data->array_frame[filter_data->nr].defval = val;
+   filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+   /*
+* A directory with this tree OID may appear in multiple
+* places in the tree. (Think of a directory move, with
+* no other changes.)  And with a different pathname, the
+* is_excluded...() results for this directory and items
+* contained within it may be different.  So we cannot
+* mark it SEEN (yet), since that will prevent process_tree()
+* from revisiting this tree object with other pathnames.
+*
+* Only SHOW the tree object the first time we visit this
+* tree object.
+*
+* We always show all tree objects.  A future optimization
+* may want to attempt to narrow this.
+*/
+   if (obj->flags & FILTER_REVISIT)
+   return LOFR_ZERO;
+   obj->flags |= FILTER_REVISIT;
+   return LOFR_SHOW;
+
+   case LOFT_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   assert(filter_data->nr > 0);
+
+   frame = &filter_data->array_frame[filter_data->nr];
+   filter_data->nr--;
+
+   /*
+* Tell our parent directory if any of our children were
+* provisionally omitted.
+*/
+   filter_data->array_frame[filter_data->nr].child_prov_omit |=
+   frame->child_prov_omit;
+
+   /*
+* If there are NO provisionally omitted child objects (ALL 
child
+* objects in this folder were INCLUDED), then we can mark the
+* folder as SEEN (so we will not have to revisit it again).
+   

[PATCH 08/13] list-objects: add traverse_commit_list_filtered method

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Add traverse_commit_list_filtered() wrapper around the various
filter methods using common data in object_filter_options.

Signed-off-by: Jeff Hostetler 
---
 list-objects.c | 45 +
 list-objects.h | 11 +++
 2 files changed, 56 insertions(+)

diff --git a/list-objects.c b/list-objects.c
index 3e86008..4ce2593 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -7,6 +7,9 @@
 #include "tree-walk.h"
 #include "revision.h"
 #include "list-objects.h"
+#include "list-objects-filter-blobs-none.h"
+#include "list-objects-filter-blobs-limit.h"
+#include "list-objects-filter-sparse.h"
 
 static void process_blob(struct rev_info *revs,
 struct blob *blob,
@@ -266,3 +269,45 @@ void traverse_commit_list(struct rev_info *revs,
show_commit, show_object, show_data,
NULL, NULL);
 }
+
+void traverse_commit_list_filtered(
+   struct list_objects_filter_options *filter_options,
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   list_objects_filter_map_foreach_cb print_omitted_object,
+   void *show_data)
+{
+   switch (filter_options->choice) {
+   case LOFC_DISABLED:
+   traverse_commit_list(revs, show_commit, show_object, show_data);
+   return;
+
+   case LOFC_BLOB_NONE:
+   traverse_commit_list__blobs_none(
+   revs, show_commit, show_object, print_omitted_object,
+   show_data);
+   return;
+
+   case LOFC_BLOB_LIMIT:
+   traverse_commit_list__blobs_limit(
+   revs, show_commit, show_object, print_omitted_object,
+   show_data, filter_options->blob_limit_value);
+   return;
+
+   case LOFC_SPARSE_OID:
+   traverse_commit_list__sparse_oid(
+   revs, show_commit, show_object, print_omitted_object,
+   show_data, filter_options->sparse_oid_value);
+   return;
+
+   case LOFC_SPARSE_PATH:
+   traverse_commit_list__sparse_path(
+   revs, show_commit, show_object, print_omitted_object,
+   show_data, filter_options->sparse_path_value);
+   return;
+
+   default:
+   die("unspecified list-objects filter");
+   }
+}
diff --git a/list-objects.h b/list-objects.h
index 43a06fb..d14b0e0 100644
--- a/list-objects.h
+++ b/list-objects.h
@@ -1,6 +1,9 @@
 #ifndef LIST_OBJECTS_H
 #define LIST_OBJECTS_H
 
+#include "list-objects-filter-map.h"
+#include "list-objects-filter-options.h"
+
 typedef void (*show_commit_fn)(struct commit *, void *);
 typedef void (*show_object_fn)(struct object *, const char *, void *);
 void traverse_commit_list(struct rev_info *, show_commit_fn, show_object_fn, 
void *);
@@ -38,4 +41,12 @@ void traverse_commit_list_worker(
show_commit_fn, show_object_fn, void *show_data,
filter_object_fn filter, void *filter_data);
 
+void traverse_commit_list_filtered(
+   struct list_objects_filter_options *filter_options,
+   struct rev_info *revs,
+   show_commit_fn show_commit,
+   show_object_fn show_object,
+   list_objects_filter_map_foreach_cb print_omitted_object,
+   void *show_data);
+
 #endif /* LIST_OBJECTS_H */
-- 
2.9.3



[PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

I've been working with Jonathan Tan to combine our partial clone
proposals.  This patch series represents a first step in that effort
and introduces an object filtering mechanism to select unwanted
objects.

[1] traverse_commit_list and list-objects is extended to allow
various filters.
[2] rev-list is extended to expose filtering.  This allows testing
of the filtering options.  And can be used later to predict
missing objects before commands like checkout or merge.
[3] pack-objects is extended to use filtering parameters and build
packfiles that omit unwanted objects.

This patch series lays the ground work for subsequent parts which
will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.


Jeff Hostetler (13):
  dir: allow exclusions from blob in addition to file
  list-objects-filter-map: extend oidmap to collect omitted objects
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-blobs-none: add filter to omit all blobs
  list-objects-filter-blobs-limit: add large blob filtering
  list-objects-filter-sparse: add sparse filter
  list-objects-filter-options: common argument parsing
  list-objects: add traverse_commit_list_filtered method
  extension.partialclone: introduce partial clone extension
  rev-list: add list-objects filtering support
  t6112: rev-list object filtering test
  pack-objects: add list-objects filtering
  t5317: pack-objects object filtering test

 Documentation/git-pack-objects.txt |   8 +-
 Documentation/git-rev-list.txt |   5 +-
 Documentation/rev-list-options.txt |  30 ++
 Documentation/technical/repository-version.txt |  22 ++
 Makefile   |   6 +
 builtin/pack-objects.c |  18 +-
 builtin/rev-list.c |  84 +-
 cache.h|   4 +
 config.h   |   3 +
 dir.c  |  51 +++-
 dir.h  |   3 +
 environment.c  |   2 +
 list-objects-filter-blobs-limit.c  | 146 ++
 list-objects-filter-blobs-limit.h  |  18 ++
 list-objects-filter-blobs-none.c   |  83 ++
 list-objects-filter-blobs-none.h   |  18 ++
 list-objects-filter-map.c  |  63 
 list-objects-filter-map.h  |  26 ++
 list-objects-filter-options.c  | 101 +++
 list-objects-filter-options.h  |  50 
 list-objects-filter-sparse.c   | 241 
 list-objects-filter-sparse.h   |  30 ++
 list-objects.c | 111 +--
 list-objects.h |  43 ++-
 partial-clone-utils.c  |  99 +++
 partial-clone-utils.h  |  34 +++
 setup.c|  15 +
 t/t5317-pack-objects-filter-objects.sh | 384 +
 t/t6112-rev-list-filters-objects.sh| 223 ++
 29 files changed, 1897 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filter-blobs-limit.c
 create mode 100644 list-objects-filter-blobs-limit.h
 create mode 100644 list-objects-filter-blobs-none.c
 create mode 100644 list-objects-filter-blobs-none.h
 create mode 100644 list-objects-filter-map.c
 create mode 100644 list-objects-filter-map.h
 create mode 100644 list-objects-filter-options.c
 create mode 100644 list-objects-filter-options.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h
 create mode 100755 t/t5317-pack-objects-filter-objects.sh
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3



[PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

Create helper class to extend oidmap to collect a list of
omitted or missing objects during traversal.

This will be used in a later commit by the list-object filtering
code.

Signed-off-by: Jeff Hostetler 
---
 Makefile  |  1 +
 list-objects-filter-map.c | 63 +++
 list-objects-filter-map.h | 26 +++
 3 files changed, 90 insertions(+)
 create mode 100644 list-objects-filter-map.c
 create mode 100644 list-objects-filter-map.h

diff --git a/Makefile b/Makefile
index cd75985..e59f12d 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ LIB_OBJS += levenshtein.o
 LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
+LIB_OBJS += list-objects-filter-map.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filter-map.c b/list-objects-filter-map.c
new file mode 100644
index 000..7e496b3
--- /dev/null
+++ b/list-objects-filter-map.c
@@ -0,0 +1,63 @@
+#include "cache.h"
+#include "list-objects-filter-map.h"
+
+int list_objects_filter_map_insert(struct oidmap *map,
+  const struct object_id *oid,
+  const char *pathname, enum object_type type)
+{
+   size_t len, size;
+   struct list_objects_filter_map_entry *e;
+
+   if (oidmap_get(map, oid))
+   return 1;
+
+   len = ((pathname && *pathname) ? strlen(pathname) : 0);
+   size = (offsetof(struct list_objects_filter_map_entry, pathname) + len 
+ 1);
+   e = xcalloc(1, size);
+
+   oidcpy(&e->entry.oid, oid);
+   e->type = type;
+   if (pathname && *pathname)
+   strcpy(e->pathname, pathname);
+
+   oidmap_put(map, e);
+   return 0;
+}
+
+static int my_cmp(const void *a, const void *b)
+{
+   const struct oidmap_entry *ea, *eb;
+
+   ea = *(const struct oidmap_entry **)a;
+   eb = *(const struct oidmap_entry **)b;
+
+   return oidcmp(&ea->oid, &eb->oid);
+}
+
+void list_objects_filter_map_foreach(struct oidmap *map,
+list_objects_filter_map_foreach_cb cb,
+void *cb_data)
+{
+   struct hashmap_iter iter;
+   struct list_objects_filter_map_entry **array;
+   struct list_objects_filter_map_entry *e;
+   int k, nr;
+
+   nr = hashmap_get_size(&map->map);
+   if (!nr)
+   return;
+
+   array = xcalloc(nr, sizeof(*e));
+
+   k = 0;
+   hashmap_iter_init(&map->map, &iter);
+   while ((e = hashmap_iter_next(&iter)))
+   array[k++] = e;
+
+   QSORT(array, nr, my_cmp);
+
+   for (k = 0; k < nr; k++)
+   cb(k, nr, array[k], cb_data);
+
+   free(array);
+}
diff --git a/list-objects-filter-map.h b/list-objects-filter-map.h
new file mode 100644
index 000..794fc81
--- /dev/null
+++ b/list-objects-filter-map.h
@@ -0,0 +1,26 @@
+#ifndef LIST_OBJECTS_FILTER_MAP_H
+#define LIST_OBJECTS_FILTER_MAP_H
+
+#include "oidmap.h"
+
+struct list_objects_filter_map_entry {
+   struct oidmap_entry entry; /* must be first */
+   enum object_type type;
+   char pathname[FLEX_ARRAY];
+};
+
+extern int list_objects_filter_map_insert(
+   struct oidmap *map,
+   const struct object_id *oid,
+   const char *pathname, enum object_type type);
+
+typedef void (*list_objects_filter_map_foreach_cb)(
+   int i, int i_limit,
+   struct list_objects_filter_map_entry *e, void *cb_data);
+
+extern void list_objects_filter_map_foreach(
+   struct oidmap *map,
+   list_objects_filter_map_foreach_cb cb,
+   void *cb_data);
+
+#endif /* LIST_OBJECTS_FILTER_MAP_H */
-- 
2.9.3



Re: [WIP PATCH] diff: add option to ignore whitespaces for move detection only

2017-10-24 Thread Brandon Williams
On 10/23, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
> 
>  diff.c |  10 ++--
>  diff.h |   1 +
>  t/t4015-diff-whitespace.sh | 114 
> -
>  
> See, only 10 lines of code! (and a few more for tests)
> 
> We have run out of space in diff_options.flags,touched_flags.
> as we 1< unused (removed in 882749a04f (diff: add --word-diff option that
> generalizes --color-words, 2010-04-14)). But that postpones the
> real fix for only a short amount of time.
> 
> Ideas welcome how to extend the flag space. (We cannot just make it
> a long either, as some arcane architecures have 32 bit longs.)

One simple idea would be to convert the single 'flag' into various bit
fields themselves, that way if you need to add a new flag you would just
make a new bit field.  I'm unaware of any downsides of doing so (though
i may be missing something) but doing so would probably cause a bit of
code churn.


-- 
Brandon Williams


Re: [RFC] protocol version 2

2017-10-24 Thread Brandon Williams
On 10/24, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> >   * Capabilities were implemented as a hack and are hidden behind a NUL
> > byte after the first ref sent from the server during the ref
> > advertisement:
> > ...
> >
> >   * Various other technical debt (e.g. abusing capabilities to
> > communicate agent and symref data, service name set using a query
> > parameter).
> 
> This sounds like a duplicate of the above.

You're right, it mostly is a duplication of that.

> 
> >  Special Packets
> > -
> >
> > In protocol v2 these special packets will have the following semantics:
> >
> >   * '' Flush Packet (flush-pkt) - indicates the end of a message
> >   * '0001' End-of-List delimiter (delim-pkt) - indicates the end of a list
> 
> After reading the remainder of the document twice, I do not think
> the reason why we want to introduce a new type delim-pkt is
> explained and justified well enough.  If a command request takes a
> command packet, zero or more capability packets, a mandatory
> delimiter packet, zero or more parameter packets and a mandatory
> flush packet, then you can use the same "flush" as delimiter in the
> middle.  The delimiter will of course become useful if you can omit
> it when not necessary (e.g. after seeing capabilities, you may see a
> flush and you will know there is no parameters and save the need to
> send one "delim").
> 
> I actually have a reasonable guess why you want to have a separate
> delimiter (which has nothing to do with "optional delim can be
> omitted"), but I want to see it explained in this document clearly
> by its designer(s).

Jonathan Tan suggested that we tighten flush semantics in a newer
protocol so that proxies are easier to work with.  Currently proxies
need to understand the protocol instead of simply waiting for a flush.

Also I've been told the smart http code is more complex because of the
current semantics of flush packets.

> 
> > command-request = command
> >   capability-list
> >   delim-pkt
> >   (command specific parameters)
> >   flush-pkt
> > command = PKT-LINE("command=" key LF)
> >
> > The server will then acknowledge the command and requested capabilities
> > by echoing them back to the client and then launch into the command.
> >
> > acknowledge-request = command
> >   capability-list
> >   delim-pkt
> >   execute-command
> > execute-command = 
> 
> It is not quite clear what the value of "echoing them back" is,
> especially if that is done by always echoing verbatim.  A reader may
> naturally expect, when capabilities are exchanged between two
> parties, these are negotiated so that the only ones that are
> commonly supported by both ends would be used, or something like
> that.

The echoing back of the command and requested capabilities may or may
not be needed.  A client should only ever issue a command-request using
advertised capabilities and commands so there really isn't much
negotiation happening, just the server saying "here's what's on the
menu" and the client picking only from that menu.

Where the echoing back may be useful is if we wanted to (at some point)
eliminate this initial round trip of doing the capability advertisement
and then subsequent selection of capabilities and instead stuff that
information into the GIT_PROTOCOL side channel in the initial request.
That way the client could optimistically send capabilities that it
doesn't yet know if the server supports.  I thought this might be an
interesting idea if we really really didn't want to live with the extra
round trip.

> 
> > A particular command can last for as many rounds as are required to
> > complete the service (multiple for negotiation during fetch and push or
> > no additional trips in the case of ls-refs).
> 
> OK.
> 
> >  Commands in v2
> > 
> >
> > Services are the core actions that a client wants to perform (fetch,
> > push, etc).  Each service has its own set of capabilities and its own
> > language of commands (think 'want' lines in fetch).  Optionally a
> > service can take in initial parameters or data when a client sends it
> > service request.
> 
> So a service (like "fetch") employ a set of "command"s (like "want",
> "have, etc)?  In the earlier part of the document, we did not see
> any mention of "service" and instead saw only "command" mentioned.
> Is the state machine on both ends and the transition between states
> implicit?  E.g. when one side throws "want" command and the other
> side acknowledges, they understand implicitly that they are now in a
> "fetch" service session, even though there is nothing that said over
> the wire that they are doing so?  One reason I am wondering about
> this is what we should do if a command verb may be applicable in
> multiple services.

Looks like I missed changing the word 

Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-24 Thread René Scharfe
Am 24.10.2017 um 02:52 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 21.10.2017 um 14:18 schrieb Junio C Hamano:
>>> René Scharfe  writes:
>>>
 FWIW, I use "-?" for that everywhere.  I have yet to find a command or
 environment where it does something dangerous.
>>>
>>> Yeah, it would have made the world a better place if we made that
>>> choice back in 2008.  If we start a transition to make it so right
>>> now, we might be able to make the world a better place by 2022,
>>> perhaps.  I am not sure if the pain during the transition is worth
>>> it, though.
>>
>> "-?" works fine with builtins already -- they complain that the option
>> is unknown and then show the short help text.
> 
> Ah, I misunderstood what you meant, then.  I thought you were
> advocating to move the built-in short-help support to know about and
> explicitly react to "-?", and somehow forgot that it "works" more or
> less already.

I don't mean to advocate here -- it's more like trying to get the
accounting right.  A little bit of OCD perhaps?

> The fact that "-?" already works for most things is good, but the
> transition pain still remains, as what's costly is to transition
> people's expectation (i.e. "'-?' and not '-h' is the way to get
> short help from any subcommand"), not the implementation to fill the
> gaps for those that do not yet support '-?', I am afraid.

A minor cost for help-seeking users would be the extra error message at
the top of the short help text.

The main change would cause "git ls-remote -h" to show remote heads and
"git show-ref -h" to show HEAD.  Confused users would have to resort to
e.g. man, -help, --help, their search engine of choice, or -?.  I feel
this could be justified.  It would be different if e.g. reset started
to take -h as shorthand for --hard, as this would cause data loss.

The hard part would probably be allowing the execution of commands with
unknown arguments outside of repositories to show the help text, even
if they'd normally (with correct arguments) require one.  Converting
all commands from RUN_SETUP to RUN_SETUP_GENTLY seems painful.  Showing
help when a required repo is not found might be possible somehow.

With that I'm going to shut up, as I don't even use the affected
commands, nor can I imagine being in the place of someone impacted by
"git  -h" not showing a help text.

René


Re: Consequences of CRLF in index?

2017-10-24 Thread Jonathan Nieder
Hi,

Lars Schneider wrote:

> I've migrated a large repo (110k+ files) with a lot of history (177k commits)
> and a lot of users (200+) to Git. Unfortunately, all text files in the index
> of the repo have CRLF line endings. In general this seems not to be a problem
> as the project is developed exclusively on Windows.

Sounds good.

> However, I wonder if there are any "hidden consequences" of this setup?
> If there are consequences, then I see two options. Either I rebase the repo
> and fix the line endings for all commits or I add a single commit that fixes
> the line endings for all files. Both actions require coordination with the
> users to avoid repo trouble/merge conflicts. The "single fixup commit" options
> would also make diffs into the past look bad. Would a single large commit have
> any impact on the performance of standard Git operations?

There are no hidden consequences that I'm aware of.  If you later
decide that you want to become a cross-platform project, then you may
want to switch to LF endings, in which case I suggest the "single
fixup commit" strategy.

In any event, you also probably want to declare what you're doing
using .gitattributes.  By checking in the files as CRLF, you are
declaring that you do *not* want Git to treat them as text files
(i.e., you do not want Git to change the line endings), so something
as simple as

* -text

should do the trick.  See gitattributes(5) for details.

I'd be interested to hear what happens when diff-ing across a line
ending fixup commit.  Is this an area where Git needs some
improvement?  "git merge" knows an -Xrenormalize option to deal with a
related problem --- it's possible that "git diff" needs to learn a
similar trick.

Thanks and hope that helps,
Jonathan


Consequences of CRLF in index?

2017-10-24 Thread Lars Schneider
Hi,

I've migrated a large repo (110k+ files) with a lot of history (177k commits) 
and a lot of users (200+) to Git. Unfortunately, all text files in the index
of the repo have CRLF line endings. In general this seems not to be a problem 
as the project is developed exclusively on Windows.

However, I wonder if there are any "hidden consequences" of this setup?
If there are consequences, then I see two options. Either I rebase the repo 
and fix the line endings for all commits or I add a single commit that fixes 
the line endings for all files. Both actions require coordination with the 
users to avoid repo trouble/merge conflicts. The "single fixup commit" options 
would also make diffs into the past look bad. Would a single large commit have
any impact on the performance of standard Git operations?

Thanks,
Lars




Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Martin Ågren
On 24 October 2017 at 18:45, Eric Sunshine  wrote:
> On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
>> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
>> wrote:
>>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
>>> init_merge_options().
>>> Consequential call of getenv() may return NULL pointer and strtol() crashes.
>>> However the stored pointer to the obtained getenv() result may be 
>>> invalidated
>>> by some other getenv() call from another thread as getenv() is not 
>>> thread-safe.

I'm having trouble wrapping my head around this. Under which
circumstances could the second call in the current code return NULL, but
the code after your patch behave in a well-defined (and correct) way?

> The distance between getenv() and the point where the value is
> actually used is a big concern due to not knowing what is or might be
> going on in called functions between the two points. According to [1],
> the value returned by getenv() could be invalidated by another call to
> getenv() (or setenv() or unsetenv() or putenv()), and we don't have
> guarantee that we're safe from such invalidation considering that this
> function calls out to others. For instance, after getenv() but before
> the value is used, init_merge_options() calls merge_recursive_config()
> which calls git_config() which calls git_xmerge_config(), and so on.
>
> For this reason, I have difficulty endorsing this change as-is.

Yeah. The call should be immediately before `merge_verbosity` is used.
Then, if a compiler wants to move the call, it has to do the work and
prove that it's ok.


[PATCH] add a test for "describe --contains" with mixed tags

2017-10-24 Thread orgads
From: Orgad Shaneh 

If a repository has early lightweight tags and annotated tags later,
running git describe --contains for an early commit used the annotated
tag for reference, instead of the lightweight tag which was closer.

This has been fixed in ef1e74065c19cc427c4a1b322154fd55d7a3588b,
and regression was tweaked in 5554451de61cb90e530f30b96e62d455e1eff6a1.

Add a test for that to avoid further regressions.

Signed-off-by: Orgad Shaneh 
---
 t/t6120-describe.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1c0e8659d..08427f4b5 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -340,4 +340,20 @@ test_expect_success ULIMIT_STACK_SIZE 'describe works in a 
deep repo' '
test_cmp expect actual
 '
 
+test_expect_success 'describe --contains for light before annotated' '
+   test_tick &&
+   git commit --allow-empty -m First &&
+   test_tick &&
+   git commit --allow-empty -m Second &&
+   test_tick &&
+   git commit --allow-empty -m Third &&
+   test_tick &&
+   git tag light-before-annotated HEAD^ &&
+   test_tick &&
+   git tag -a -m annotated annotated-after-light
+
+'
+
+check_describe light-before-annotated~1 --contains light-before-annotated~1
+
 test_done
-- 
2.15.0.rc2



Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 12:28 PM, Stefan Beller  wrote:
> On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  
> wrote:
>> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
>> init_merge_options().
>> Consequential call of getenv() may return NULL pointer and strtol() crashes.
>> However the stored pointer to the obtained getenv() result may be invalidated
>> by some other getenv() call from another thread as getenv() is not 
>> thread-safe.
>
> But do we have other threads running at the time?

I feel uncomfortable about this change, not due to lack of thread
safety, but due to the distance between the getenv() invocation and
its point of use. See below for more detail.

>> Signed-off-by: Andrey Okoshkin 
>> ---
>> diff --git a/merge-recursive.c b/merge-recursive.c
>> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct 
>> merge_options *o)
>>  void init_merge_options(struct merge_options *o)
>>  {
>> +   const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
>
> Despite not being in a threaded environment, I wonder if we want to
> minimize the time between  calling getenv and the use of the result,
> i.e. declare merge_verbosity here, but assign it later, just before the
> condition?
>
> With or without this change:
> Reviewed-by: Stefan Beller 

The distance between getenv() and the point where the value is
actually used is a big concern due to not knowing what is or might be
going on in called functions between the two points. According to [1],
the value returned by getenv() could be invalidated by another call to
getenv() (or setenv() or unsetenv() or putenv()), and we don't have
guarantee that we're safe from such invalidation considering that this
function calls out to others. For instance, after getenv() but before
the value is used, init_merge_options() calls merge_recursive_config()
which calls git_config() which calls git_xmerge_config(), and so on.

For this reason, I have difficulty endorsing this change as-is.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html

>> memset(o, 0, sizeof(struct merge_options));
>> o->verbosity = 2;
>> o->buffer_output = 1;
>> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o)
>> o->renormalize = 0;
>> o->detect_rename = 1;
>> merge_recursive_config(o);
>> -   if (getenv("GIT_MERGE_VERBOSITY"))
>> -   o->verbosity =
>> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
>> +   if (merge_verbosity)
>> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
>> if (o->verbosity >= 5)
>> o->buffer_output = 0;
>> strbuf_init(&o->obuf, 0);
>> --
>> 2.14.3


Re: [PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-24 Thread Johannes Sixt

Am 24.10.2017 um 15:19 schrieb Thomas Braun:

In the commits 1fc458d9 (builtin/checkout: add --recurse-submodules
switch, 2017-03-14), 08d595dc (checkout: add --ignore-skip-worktree-bits
in sparse checkout mode, 2013-04-13) and 32669671 (checkout: introduce
--detach synonym for "git checkout foo^{commit}", 2011-02-08) checkout
gained new flags but the completion was not updated, although these flags
are useful completions. Add them.

The flags --force and --ignore-other-worktrees are not added as they are
potentially dangerous.

The flags --progress and --no-progress are only useful for scripting and are
therefore also not included.

Signed-off-by: Thomas Braun 
---
  contrib/completion/git-completion.bash | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index d934417475..eb6ade6974 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1250,7 +1250,8 @@ _git_checkout ()
--*)
__gitcomp "
--quiet --ours --theirs --track --no-track --merge
-   --conflict= --orphan --patch
+   --conflict= --orphan --patch --detach 
--ignore-skip-worktree-bits
+   --recurse-submodules --no-recurse-submodules
"
;;
*)



Looks good to me. Thanks,
-- Hannes



Re: [PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 8:27 AM, Andrey Okoshkin  wrote:
> Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
> init_merge_options().
> Consequential call of getenv() may return NULL pointer and strtol() crashes.
> However the stored pointer to the obtained getenv() result may be invalidated
> by some other getenv() call from another thread as getenv() is not 
> thread-safe.

But do we have other threads running at the time?
Inspecting the four callsites:
* sequencer.c:
  The prior lines to hold the index lock suggests we're not in a threaded
  environment
* builtin/merge-recursive.c:
  In cmd_merge_recursive and we're fiddling with argv/argc, which
  suggests we in a main function, not having threads around
* builtin/am.c: fall_back_threeway called by am_run.
  (am is not threaded)
* builtin/merge:
  In try_merge_strategy called from the main function, also index locking
* builtin/checkout.c:
  merge_working_tree also locks the index.

So I think this function is never called from within a threaded environment
in git.

>
> Signed-off-by: Andrey Okoshkin 
> ---
>  merge-recursive.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 1494ffdb8..eaac98145 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
> *o)
>
>  void init_merge_options(struct merge_options *o)
>  {
> +   const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY");

Despite not being in a threaded environment, I wonder if we want to
minimize the time between  calling getenv and the use of the result,
i.e. declare merge_verbosity here, but assign it later, just before the
condition?

(The compiler may shuffle stuff around anyway, so this is a
moot suggestion; It gears mostly towards making the code more
readable/maintainable when presenting this part of the code
to the user.)

With or without this change:
Reviewed-by: Stefan Beller 


> memset(o, 0, sizeof(struct merge_options));
> o->verbosity = 2;
> o->buffer_output = 1;
> @@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o)
> o->renormalize = 0;
> o->detect_rename = 1;
> merge_recursive_config(o);
> -   if (getenv("GIT_MERGE_VERBOSITY"))
> -   o->verbosity =
> -   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
> +   if (merge_verbosity)
> +   o->verbosity = strtol(merge_verbosity, NULL, 10);
> if (o->verbosity >= 5)
> o->buffer_output = 0;
> strbuf_init(&o->obuf, 0);
> --
> 2.14.3


Re: [PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-24 Thread Eric Sunshine
On Tue, Oct 24, 2017 at 11:16 AM, Michael Haggerty  wrote:
> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
> @@ -34,6 +34,86 @@ test_update_rejected () {
> +# Test adding and deleting D/F-conflicting references in a single
> +# transaction.
> +df_test() {
> +   local prefix="$1"
> +   shift
> +   local pack=:

Isn't "local" a Bash-ism we want to keep out of the test scripts?

> +   local symadd=false
> +   local symdel=false
> +   local add_del=false
> +   local addref
> +   local delref
> +   while test $# -gt 0
> +   do


Re: [PATCH] path: use xmalloc in add_to_trie

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 8:15 AM, Andrey Okoshkin  wrote:
> Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps
> and checks memory allocation result.
>
> Signed-off-by: Andrey Okoshkin 
> ---
> Hello,
> I'm not sure but it looks like there is a missing check of the malloc result.
> memcpy() may crash with SIGSEGV due to the memory allocation failure.
> make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie()
> also should use it.

Good catch! Thanks for spotting.

Trying to find similar occurrences via git grep "= malloc" did not
yield other places that need the same fix.

Thanks,
Stefan


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 5:15 AM, Heiko Voigt  wrote:

> Looks good to me.

Same here,

Thanks,
Stefan


Re: No log --no-decorate completion?

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 8:15 AM, Max Rothman  wrote:
> Just re-discovered this in my inbox. So is this worth fixing? I could
> (probably) figure out a patch.
>
> Thanks,
> Max
>

$ git log -- 
Display all 100 possibilities? (y or n)

I guess adding one more option is no big deal, so go for it!
We also have a couple --no-options already (as you said earlier),
which I think makes it even more viable.

Tangent:
I wonder if we can cascade the completion by common
prefixes, e.g. --no- or --ignore- occurs a couple of times,
such that we could only show these --no- once and only
if you try autocompleting thereafter you get all --no- options.

Thanks for reviving this thread!
Stefan


Re: No log --no-decorate completion?

2017-10-24 Thread Max Rothman
Just re-discovered this in my inbox. So is this worth fixing? I could
(probably) figure out a patch.

Thanks,
Max

On Thu, Oct 12, 2017 at 1:41 PM, Max Rothman  wrote:
> To be fair, other --no* options complete, it's just --no-decorate,
> --no-walk, --no-abbrev-commit, --no-expand-tabs, --no-patch,
> --no-indent-heuristic, and --no-textconv that's missing.
>
> For example:
> $ git log --no
> --no-color --no-max-parents   --no-min-parents   --no-prefix
>  --not
> --no-ext-diff  --no-merges--no-notes --no-renames
>  --notes
>
> Thanks,
> Max
>
> On Wed, Oct 11, 2017 at 2:09 PM, Stefan Beller  wrote:
>> On Wed, Oct 11, 2017 at 7:47 AM, Max Rothman  wrote:
>>> I recently noticed that in the git-completion script, there's
>>> completion for --decorate={full,yes,no} for git log and family, but
>>> not for --no-decorate. Is that intentional? If not, I *think* I see
>>> how it could be added.
>>>
>>> Thanks,
>>> Max
>>
>> Using git-blame, I found af4e9e8c87 (completion: update am, commit, and log,
>> 2009-10-07) as well as af16bdaa3f (completion: fix and update 'git log
>> --decorate='
>> options, 2015-05-01), both of their commit messages do not discuss leaving 
>> out
>> --no-decorate intentionally.
>>
>> If you give --no- you'd get more than just the completion to 
>> --no-decorate,
>> but all the negated options, I would assume?
>>
>> So maybe that is why no one added the negated options, yet?
>>
>> Thanks,
>> Stefan


[PATCH] merge-recursive: check GIT_MERGE_VERBOSITY only once

2017-10-24 Thread Andrey Okoshkin
Add check of 'GIT_MERGE_VERBOSITY' environment variable only once in
init_merge_options().
Consequential call of getenv() may return NULL pointer and strtol() crashes.
However the stored pointer to the obtained getenv() result may be invalidated
by some other getenv() call from another thread as getenv() is not thread-safe.

Signed-off-by: Andrey Okoshkin 
---
 merge-recursive.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb8..eaac98145 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2163,6 +2163,7 @@ static void merge_recursive_config(struct merge_options 
*o)
 
 void init_merge_options(struct merge_options *o)
 {
+   const char *merge_verbosity = getenv("GIT_MERGE_VERBOSITY");
memset(o, 0, sizeof(struct merge_options));
o->verbosity = 2;
o->buffer_output = 1;
@@ -2171,9 +2172,8 @@ void init_merge_options(struct merge_options *o)
o->renormalize = 0;
o->detect_rename = 1;
merge_recursive_config(o);
-   if (getenv("GIT_MERGE_VERBOSITY"))
-   o->verbosity =
-   strtol(getenv("GIT_MERGE_VERBOSITY"), NULL, 10);
+   if (merge_verbosity)
+   o->verbosity = strtol(merge_verbosity, NULL, 10);
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(&o->obuf, 0);
-- 
2.14.3


[PATCH 2/2] files_transaction_prepare(): fix handling of ref lock failure

2017-10-24 Thread Michael Haggerty
Since dc39e09942 (files_ref_store: use a transaction to update packed
refs, 2017-09-08), failure to lock a reference has been handled
incorrectly by `files_transaction_prepare()`. If
`lock_ref_for_update()` fails in the lock-acquisition loop of that
function, it sets `ret` then breaks out of that loop. Prior to
dc39e09942, that was OK, because the only thing following the loop was
the cleanup code. But dc39e09942 added another blurb of code between
the loop and the cleanup. That blurb sometimes resets `ret` to zero,
making the cleanup code think that the locking was successful.

Specifically, whenever

* One or more reference deletions have been processed successfully in
  the lock-acquisition loop. (Processing the first such reference
  causes a packed-ref transaction to be initialized.)

* Then `lock_ref_for_update()` fails for a subsequent reference. Such
  a failure can happen for a number of reasons, such as the old SHA-1
  not being correct, lock contention, etc. This causes a `break` out
  of the lock-acquisition loop.

* The `packed-refs` lock is acquired successfully and
  `ref_transaction_prepare()` succeeds for the packed-ref transaction.
  This has the effect of resetting `ret` back to 0, and making the
  cleanup code think that lock acquisition was successful.

In that case, any reference updates that were processed prior to
breaking out of the loop would be carried out (loose and packed), but
the reference that couldn't be locked and any subsequent references
would silently be ignored.

This can easily cause data loss if, for example, the user was trying
to push a new name for an existing branch while deleting the old name.
After the push, the branch could be left unreachable, and could even
subsequently be garbage-collected.

This problem was noticed in the context of deleting one reference and
creating another in a single transaction, when the two references D/F
conflict with each other, like

git update-ref --stdin <
---
 refs/files-backend.c |  2 +-
 t/t1404-update-ref-errors.sh | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 014dabb0bf..8cc1e07fdb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2570,7 +2570,7 @@ static int files_transaction_prepare(struct ref_store 
*ref_store,
ret = lock_ref_for_update(refs, update, transaction,
  head_ref, &affected_refnames, err);
if (ret)
-   break;
+   goto cleanup;
 
if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 8b5e9a83c5..b7838967b8 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -276,11 +276,11 @@ test_expect_success 'D/F conflict prevents add short + 
delete long' '
df_test refs/df-as-dl --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents delete long + add short' '
+test_expect_success 'D/F conflict prevents delete long + add short' '
df_test refs/df-dl-as --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents delete short + add long' '
+test_expect_success 'D/F conflict prevents delete short + add long' '
df_test refs/df-ds-al --del-add foo foo/bar
 '
 
@@ -292,17 +292,17 @@ test_expect_success 'D/F conflict prevents add short + 
delete long packed' '
df_test refs/df-as-dlp --pack --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents delete long packed + add short' '
+test_expect_success 'D/F conflict prevents delete long packed + add short' '
df_test refs/df-dlp-as --pack --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents delete short packed + add long' '
+test_expect_success 'D/F conflict prevents delete short packed + add long' '
df_test refs/df-dsp-al --pack --del-add foo foo/bar
 '
 
 # Try some combinations involving symbolic refs...
 
-test_expect_failure 'D/F conflict prevents indirect add long + delete short' '
+test_expect_success 'D/F conflict prevents indirect add long + delete short' '
df_test refs/df-ial-ds --sym-add --add-del foo/bar foo
 '
 
@@ -314,11 +314,11 @@ test_expect_success 'D/F conflict prevents indirect add 
short + indirect delete
df_test refs/df-ias-idl --sym-add --sym-del --add-del foo foo/bar
 '
 
-test_expect_failure 'D/F conflict prevents indirect delete long + indirect add 
short' '
+test_expect_success 'D/F conflict prevents indirect delete long + indirect add 
short' '
df_test refs/df-idl-ias --sym-add --sym-del --del-add foo/bar foo
 '
 
-test_expect_failure 'D/F conflict prevents indirect add long + delete short 
packed' '
+test_expect_success 'D/F conflict prevents indirect add long + delete short 
packed' '
df_test refs/df-ial-dsp --sym-add --pack --add-del foo/

[PATCH 1/2] t1404: add a bunch of tests of D/F conflicts

2017-10-24 Thread Michael Haggerty
It is currently not allowed, in a single transaction, to add one
reference and delete another reference if the two reference names D/F
conflict with each other (e.g., like `refs/foo/bar` and `refs/foo`).
The reason is that the code would need to take locks

$GIT_DIR/refs/foo.lock
$GIT_DIR/refs/foo/bar.lock

But the latter lock couldn't coexist with the loose reference file

$GIT_DIR/refs/foo

, because `$GIT_DIR/refs/foo` cannot be both a directory and a file at
the same time (hence the name "D/F conflict).

Add a bunch of tests that we cleanly reject such transactions.

In fact, many of the new tests currently fail. They will be fixed in
the next commit along with an explanation.

Reported-by: Jeff King 
Signed-off-by: Michael Haggerty 
---
 t/t1404-update-ref-errors.sh | 146 +++
 1 file changed, 146 insertions(+)

diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 100d50e362..8b5e9a83c5 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -34,6 +34,86 @@ test_update_rejected () {
 
 Q="'"
 
+# Test adding and deleting D/F-conflicting references in a single
+# transaction.
+df_test() {
+   local prefix="$1"
+   shift
+   local pack=:
+   local symadd=false
+   local symdel=false
+   local add_del=false
+   local addref
+   local delref
+   while test $# -gt 0
+   do
+   case "$1" in
+   --pack)
+   pack="git pack-refs --all"
+   shift
+   ;;
+   --sym-add)
+   # Perform the add via a symbolic reference
+   symadd=true
+   shift
+   ;;
+   --sym-del)
+   # Perform the del via a symbolic reference
+   symdel=true
+   shift
+   ;;
+   --del-add)
+   # Delete first reference then add second
+   add_del=false
+   delref="$prefix/r/$2"
+   addref="$prefix/r/$3"
+   shift 3
+   ;;
+   --add-del)
+   # Add first reference then delete second
+   add_del=true
+   addref="$prefix/r/$2"
+   delref="$prefix/r/$3"
+   shift 3
+   ;;
+   *)
+   echo 1>&2 "Extra args to df_test: $*"
+   return 1
+   ;;
+   esac
+   done
+   git update-ref "$delref" $C &&
+   if $symadd
+   then
+   addname="$prefix/s/symadd" &&
+   git symbolic-ref "$addname" "$addref"
+   else
+   addname="$addref"
+   fi &&
+   if $symdel
+   then
+   delname="$prefix/s/symdel" &&
+   git symbolic-ref "$delname" "$delref"
+   else
+   delname="$delref"
+   fi &&
+   cat >expected-err <<-EOF &&
+   fatal: cannot lock ref $Q$addname$Q: $Q$delref$Q exists; cannot create 
$Q$addref$Q
+   EOF
+   $pack &&
+   if $add_del
+   then
+   printf "%s\n" "create $addname $D" "delete $delname"
+   else
+   printf "%s\n" "delete $delname" "create $addname $D"
+   fi >commands &&
+   test_must_fail git update-ref --stdin output.err &&
+   test_cmp expected-err output.err &&
+   printf "%s\n" "$C $delref" >expected-refs &&
+   git for-each-ref --format="%(objectname) %(refname)" $prefix/r 
>actual-refs &&
+   test_cmp expected-refs actual-refs
+}
+
 test_expect_success 'setup' '
 
git commit --allow-empty -m Initial &&
@@ -188,6 +268,72 @@ test_expect_success 'empty directory should not fool 1-arg 
delete' '
git update-ref --stdin
 '
 
+test_expect_success 'D/F conflict prevents add long + delete short' '
+   df_test refs/df-al-ds --add-del foo/bar foo
+'
+
+test_expect_success 'D/F conflict prevents add short + delete long' '
+   df_test refs/df-as-dl --add-del foo foo/bar
+'
+
+test_expect_failure 'D/F conflict prevents delete long + add short' '
+   df_test refs/df-dl-as --del-add foo/bar foo
+'
+
+test_expect_failure 'D/F conflict prevents delete short + add long' '
+   df_test refs/df-ds-al --del-add foo foo/bar
+'
+
+test_expect_success 'D/F conflict prevents add long + delete short packed' '
+   df_test refs/df-al-dsp --pack --add-del foo/bar foo
+'
+
+test_expect_success 'D/F conflict prevents add short + delete long packed' '
+   df_test refs/df-as-dlp --pack --add-del foo foo/bar
+'
+
+test_expect_failure 'D/F conflict prevents delete long packed + add short' '
+   df_test refs/df-dlp-as --pack --del-add foo/bar foo
+'
+
+test_expect_failure 'D/F conflict prevents delete short packe

[PATCH 0/2] Fix an error-handling path when locking refs

2017-10-24 Thread Michael Haggerty
In [1], Peff described a bug that he found that could cause a
reference transaction to be partly carried-out and partly not, with a
successful return. The scenario that he discussed was the deletion of
one reference and creation of another, where the two references D/F
conflict with each other.

But in fact the problem is more general; it can happen whenever a
reference deletion within a transaction is processed successfully, and
then another reference update in the same transaction fails in
`lock_ref_for_update()`. In such a case the failed update and any
subsequent ones could be silently ignored.

There is a longer explanation in the second patch's commit message.

The tests that I added probe a bunch of D/F update scenarios, which I
think should be characteristic of the scenarios that would trigger
this bug. It would be nice to have tests that examine other types of
failures (e.g., wrong `old_oid` values).

Bit since the fix is obviously a strict improvement and can prevent
data loss, and since the release process is already pretty far along,
I wanted to send this out ASAP. We can follow it up later with
additional tests.

These patches apply to current master. They are also available from my
GitHub fork [2] as branch `ref-locking-fix`.

While looking at this code again, I realized that the new code
rewrites the `packed-refs` file more often than did the old code.
Specifically, after dc39e09942 (files_ref_store: use a transaction to
update packed refs, 2017-09-08), the `packed-refs` file is overwritten
for any transaction that includes a reference delete. Prior to that
commit, `packed-refs` was only overwritten if a deleted reference was
actually present in the existing `packed-refs` file. This is even the
case if there was previously no `packed-refs` file at all; now any
reference deletion causes an empty `packed-refs` file to be created.

I think this is a conscionable regression, since deleting references
that are purely loose is probably not all that common, and the old
code had to read the whole `packed-refs` file anyway. Nevertheless, it
is obviously something that I would like to fix (though maybe not for
2.15? I'm open to input about its urgency.)

[1] 
https://public-inbox.org/git/20171024082409.smwsd6pla64jj...@sigill.intra.peff.net/
[2] https://github.com/mhagger/git

Michael Haggerty (2):
  t1404: add a bunch of tests of D/F conflicts
  files_transaction_prepare(): fix handling of ref lock failure

 refs/files-backend.c |   2 +-
 t/t1404-update-ref-errors.sh | 146 +++
 2 files changed, 147 insertions(+), 1 deletion(-)

-- 
2.14.1



[PATCH] path: use xmalloc in add_to_trie

2017-10-24 Thread Andrey Okoshkin
Add usage of xmalloc() instead of malloc() in add_to_trie() as xmalloc wraps
and checks memory allocation result.

Signed-off-by: Andrey Okoshkin 
---
Hello,
I'm not sure but it looks like there is a missing check of the malloc result.
memcpy() may crash with SIGSEGV due to the memory allocation failure.
make_trie_node() uses xmalloc() and xcalloc() - so I believe add_to_trie()
also should use it.

Best regards,
Andrey

 path.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.c b/path.c
index 335d4dd87..69b23ef17 100644
--- a/path.c
+++ b/path.c
@@ -191,7 +191,7 @@ static void *add_to_trie(struct trie *root, const char 
*key, void *value)
 * Split this node: child will contain this node's
 * existing children.
 */
-   child = malloc(sizeof(*child));
+   child = xmalloc(sizeof(*child));
memcpy(child->children, root->children, sizeof(root->children));
 
child->len = root->len - i - 1;
-- 
2.14.3


[PATCH v2] read_index_from(): Skip verification of the cache entry order to speed index loading

2017-10-24 Thread Ben Peart
There is code in post_read_index_from() to detect out of order cache
entries when reading an index file.  This order verification adds cost
to read_index_from() that grows with the size of the index.

Put this on-disk data-structure validation code behind an #ifdef DEBUG
so only debug builds have to pay the cost.

The effect can be seen using t/perf/p0002-read-cache.sh:

Test w/git repo   HEADHEAD~1

read_cache/discard_cache 1000 times   0.42(0.01+0.09) 0.48(0.01+0.09) +14.3%
read_cache/discard_cache 1000 times   0.41(0.03+0.04) 0.49(0.00+0.10) +19.5%
read_cache/discard_cache 1000 times   0.42(0.03+0.06) 0.49(0.06+0.04) +16.7%

Test w/10K files  HEADHEAD~1
---
read_cache/discard_cache 1000 times   1.58(0.04+0.00) 1.71(0.00+0.07) +8.2%
read_cache/discard_cache 1000 times   1.64(0.01+0.07) 1.76(0.01+0.09) +7.3%
read_cache/discard_cache 1000 times   1.62(0.03+0.04) 1.71(0.00+0.04) +5.6%

Test w/100K files HEAD HEAD~1
-
read_cache/discard_cache 1000 times   25.85(0.00+0.06) 27.35(0.01+0.06) +5.8%
read_cache/discard_cache 1000 times   25.82(0.01+0.07) 27.25(0.01+0.07) +5.5%
read_cache/discard_cache 1000 times   26.00(0.01+0.07) 27.36(0.06+0.03) +5.2%

Test with 1,000K filesHEAD  HEAD~1
---
read_cache/discard_cache 1000 times   200.61(0.01+0.07) 218.23(0.03+0.06) +8.8%
read_cache/discard_cache 1000 times   201.62(0.03+0.06) 217.86(0.03+0.06) +8.1%
read_cache/discard_cache 1000 times   201.64(0.01+0.09) 217.89(0.03+0.07) +8.1%

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/95e20f17ff
Checkout: git fetch https://github.com/benpeart/git no_ce_order-v2 && git 
checkout 95e20f17ff

### Interdiff (v1..v2):

### Patches

 read-cache.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..fc90ec0fce 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1664,6 +1664,7 @@ static struct cache_entry *create_from_disk(struct 
ondisk_cache_entry *ondisk,
return ce;
 }
 
+#ifdef DEBUG
 static void check_ce_order(struct index_state *istate)
 {
unsigned int i;
@@ -1685,6 +1686,7 @@ static void check_ce_order(struct index_state *istate)
}
}
 }
+#endif
 
 static void tweak_untracked_cache(struct index_state *istate)
 {
@@ -1720,7 +1722,9 @@ static void tweak_split_index(struct index_state *istate)
 
 static void post_read_index_from(struct index_state *istate)
 {
+#ifdef DEBUG
check_ce_order(istate);
+#endif
tweak_untracked_cache(istate);
tweak_split_index(istate);
 }

base-commit: c52ca88430e6ec7c834af38720295070d8a1e330
-- 
2.14.1.windows.1.1034.g0776750557



Re: Git diff --no-index and file descriptor inputs

2017-10-24 Thread Dennis Kaarsemaker
On Thu, 2017-09-07 at 15:55 -0400, Jason Pyeron wrote:
> > -Original Message-
> > From: Martin Ågren
> > Sent: Thursday, September 7, 2017 1:48 PM
> > 
> > On 7 September 2017 at 18:00, Jason Pyeron  wrote:
> > > 
> > > I was hoping to leverage --word-diff-regex, but the 
> > 
> > --no-index option 
> > > does not seem to work with <(...) notation.
> > > 
> > > I am I doing something wrong or is this a bug?
> > 
> > There were some patches floating around half a year ago, I 
> > don't know what happened to them.
> > 
> > From: Dennis Kaarsemaker
> > Sent: Friday, January 13, 2017 5:20 AM
> > Subject: [PATCH v2 0/2] diff --no-index: support symlinks and pipes
> > https://public-inbox.org/git/20170324213110.4331-1-den...@kaarsemaker.net/
> 
> I see, it goes back to 2016...
> 
> > From: Dennis Kaarsemaker
> > Subject: [RFC/PATCH 0/2] git diff <(command1) <(command2)
> > Date: Fri, 11 Nov 2016 21:19:56 +0100
> 
> https://public-inbox.org/git/2016201958.2175-1-den...@kaarsemaker.net/
> 
> I will read up on those threads weekend, then try to apply the patches and
> see what happens.
> 
> Thanks for the google fu help.

I hope to send a new version of this soon. I had almost no time to do
anything git related in the last year, trying to catch up with mailing
list traffic now.

D.


[PATCH v2 1/1] completion: add remaining flags to checkout

2017-10-24 Thread Thomas Braun
In the commits 1fc458d9 (builtin/checkout: add --recurse-submodules
switch, 2017-03-14), 08d595dc (checkout: add --ignore-skip-worktree-bits
in sparse checkout mode, 2013-04-13) and 32669671 (checkout: introduce
--detach synonym for "git checkout foo^{commit}", 2011-02-08) checkout
gained new flags but the completion was not updated, although these flags
are useful completions. Add them.

The flags --force and --ignore-other-worktrees are not added as they are
potentially dangerous.

The flags --progress and --no-progress are only useful for scripting and are
therefore also not included.

Signed-off-by: Thomas Braun 
---
 contrib/completion/git-completion.bash | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index d934417475..eb6ade6974 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1250,7 +1250,8 @@ _git_checkout ()
--*)
__gitcomp "
--quiet --ours --theirs --track --no-track --merge
-   --conflict= --orphan --patch
+   --conflict= --orphan --patch --detach 
--ignore-skip-worktree-bits
+   --recurse-submodules --no-recurse-submodules
"
;;
*)
-- 
2.15.0.rc0.245.g6d586db062



Re: [PATCH v1 1/1] completion: add remaining flags to checkout

2017-10-24 Thread Thomas Braun

> Johannes Sixt  hat am 12. Oktober 2017 um 18:50 geschrieben:
> 
> 
> Am 12.10.2017 um 14:20 schrieb Thomas Braun:
> > In the commits 1d0fa898 (checkout: add --ignore-other-wortrees,
> > 2015-01-03), 1fc458d9 (builtin/checkout: add --recurse-submodules switch,
> > 2017-03-14), 870ebdb9 (checkout: add --progress option, 2015-11-01),
> > 08d595dc (checkout: add --ignore-skip-worktree-bits in sparse checkout
> > mode, 2013-04-13), 1d0fa898 (checkout: add --ignore-other-wortrees,
> > 2015-01-03), 32669671 (checkout: introduce --detach synonym for "git
> > checkout foo^{commit}", 2011-02-08) and db941099 (checkout -f: allow
> > ignoring unmerged paths when checking out of the index, 2008-08-30)
> > checkout gained new flags but the completion was not updated, although
> > these flags are useful completions. Add them.
> > 
> > Signed-off-by: Thomas Braun 
> > ---
> >   contrib/completion/git-completion.bash | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/contrib/completion/git-completion.bash 
> > b/contrib/completion/git-completion.bash
> > index d934417475..393d4ae230 100644
> > --- a/contrib/completion/git-completion.bash
> > +++ b/contrib/completion/git-completion.bash
> > @@ -1250,7 +1250,9 @@ _git_checkout ()
> > --*)
> > __gitcomp "
> > --quiet --ours --theirs --track --no-track --merge
> > -   --conflict= --orphan --patch
> > +   --conflict= --orphan --patch --detach --progress 
> > --no-progress
> > +   --force --ignore-skip-worktree-bits 
> > --ignore-other-worktrees
> 
> Destructive and dangerous options are typically not offered by command 
> completion. You should omit all three in the line above, IMO.
> 
> Furthermore, --progress and --no-progress are not useful in daily work 
> on the command line, I think. By offering them, --p would not 
> complete to --patch anymore, you would need --pa. You should omit 
> them, too.

Thanks for the review.

I've fixed that for the next reroll.


Re: v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Michael Haggerty
On 10/24/2017 01:05 PM, Michael Haggerty wrote:
> On 10/24/2017 10:24 AM, Jeff King wrote:
>> I found a potentially serious bug in v2.15.0-rc2 (and earlier release
>> candidates, too) that we may want to deal with before the release.
>>
>> If I do:
>> [...]
>> then at the end we have no refs at all!
> 
> That's a serious bug. I'm looking into it right now.

The fix is trivial (see below). But let me add some tests and make sure
that there are no similar breakages in the area, then submit a full patch.

Michael

- refs/files-backend.c
-
index 29eb5e826f..fc3f2abcc6 100644
@@ -2523,15 +2523,15 @@ static int files_transaction_prepare(struct
ref_store *ref_store,
 */
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];

ret = lock_ref_for_update(refs, update, transaction,
  head_ref, &affected_refnames, err);
if (ret)
-   break;
+   goto cleanup;

if (update->flags & REF_DELETING &&
!(update->flags & REF_LOG_ONLY) &&
!(update->flags & REF_ISPRUNING)) {
/*
 * This reference has to be deleted from
 * packed-refs if it exists there.


Re: [RFC PATCH 2/8] commit: move code to update HEAD to libgit

2017-10-24 Thread Junio C Hamano
Phillip Wood  writes:

>> I suspect that it would be sufficient to make update_head() helper
>> function take the reflog action message as another parameter
>> instead to fix the above, but there may be other reasons why you
>> chose to do it this way---I cannot read it in your empty log
>> message, though.
>
> Good point, sorry I should have added some explanation about that. I
> went with using setenv() rather than passing a reflog message to
> update_head() as it meant there were no changes needed on the sequencer
> side as it already sets GIT_REFLOG_ACTION. As the sequencer already sets
> GIT_REFLOG_ACTION, and git-commit does not fork any subprocesses I don't

Doesn't "git commit" run number of hooks?  Is it just the current
code does not run any hooks (by chance) after these new setenv()
calls are made and we happen to be safe?


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Heiko Voigt
On Tue, Oct 24, 2017 at 02:18:49PM +0900, Junio C Hamano wrote:
> Johannes Schindelin  writes:
> 
> > We meticulously pass the `exclude` flag to the `treat_directory()`
> > function so that we can indicate that files in it are excluded rather
> > than untracked when recursing.
> >
> > But we did not yet treat submodules the same way.
> 
> ... "because of that, we ended up showing < what situation>>" would be a nice thing to have here, so that it can
> be copied to the release notes for the bugfix.  

Yes I agree that would be nice here. It was not immediately obvious that
this only applies when using both flags: -u and --ignored.

Seems to be a corner that not many people are using. At first I thought
a plain 'git status' would show that behavior...

> How far back a release do we want to make this fix applicable?  It
> seems that it applies cleanly to maint-2.13 without breaking from my
> quick test, so that is probably where I'll queue this, even though
> we may no longer issue further maintenance releases on that track.
> 
> Any comment from submodule folks?
> 
> Sorry that I didn't notice this was left unattended by anybody til
> now.  Will queue while waiting for those who are into submodules to
> respond.

Looks good to me.

Cheers Heiko


Re: v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Michael Haggerty
On 10/24/2017 10:24 AM, Jeff King wrote:
> I found a potentially serious bug in v2.15.0-rc2 (and earlier release
> candidates, too) that we may want to deal with before the release.
> 
> If I do:
> 
> git init -q repo
> cd repo
> obj=$(git hash-object -w /dev/null)
> git update-ref refs/tags/foo $obj
> git update-ref --stdin <<-EOF
> delete refs/tags/foo
> update refs/tags/foo/bar $obj
> EOF
> git for-each-ref
> 
> then at the end we have no refs at all!

That's a serious bug. I'm looking into it right now.

> I'd expect one of:
> 
>   1. We delete "foo" before updating "foo/bar", and we end up with a
>  single ref.

I don't think that this is possible in the general case in a single
transaction. The problem is that the code would need to take locks

refs/tags/foo.lock
refs/tags/foo/bar.lock

But the latter couldn't coexist with the loose reference file

refs/tags/foo

, which might already exist.

It is only imaginable to do this in a single transaction if you pack and
prune `refs/tags/foo` first, to get the loose reference out of the way,
before executing the transaction. Even then, you would have to beware of
a race where another process writes a loose version of `refs/tags/foo`
between the time that `pack-refs` prunes it and the time that the
transaction obtains the lock again.

> [...]

Michael


Docker image for git

2017-10-24 Thread Vaibhav Sood
Hi,

Want to check if there is a docker image/Dockerfile for git which is officially 
supported? I could not find one under dockerhub official images 
https://hub.docker.com/explore/

If not, want to check if there is any plan to add an official git image to 
dockerhub? 
https://docs.docker.com/docker-hub/official_repos/#how-do-i-create-a-new-official-repository

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.



Re: [ANNOUNCE] Git for Windows 2.14.3

2017-10-24 Thread Johannes Schindelin
Dear Git users,

On Tue, 24 Oct 2017, Johannes Schindelin wrote:

> Dear Git users,
> 
> It is my pleasure to announce that Git for Windows 2.14.3 is available from:
> 
>   https://git-for-windows.github.io/

... and at this point, usually the release notes are repeated. In this
case, they were not, for a simple reason: I am working very hard on
automating as much of the release engineering of Git for Windows (first,
because it is no fun to do it manually, second, it costs a lot of my time,
third, it is not robust if I can easily forget a step, fourth, it is an
awfully low bus number, and fifth, it hogs my computer for hours during
which I could do other work otherwise, when I have a perfectly fine cloud
available at my fingertips). And despite working on this for the entire
day (and until almost 4am), a couple of things did not work correctly,
including the generation of this mail. I will make sure to write a blog
post about this endeavor, in case anybody is interested in the background
story.

But in the short run: here are the missing release notes:

Changes since Git for Windows v2.14.2(3) (October 12th 2017)

New Features

  * Comes with Git v2.14.3.
  * Git for Windows now ships with a diff helper for OpenOffice
documents.
  * Comes with Git LFS v2.3.4.
  * Comes with cURL v7.56.1.

Bug Fixes

  * Git for Windows now handles worktrees at the top-level of a UNC
share correctly.

> Filename | SHA-256
>  | ---
> Git-2.14.3-64-bit.exe | 
> 9610e082b823beb7f0da91a98d9f73e1f3f2430c21b2c4e15517dea4f981be3f
> Git-2.14.3-32-bit.exe | 
> 6e5a8a939f3014b396f58622954ab394d7982d036c84571394118f2360bdca96
> PortableGit-2.14.3-64-bit.7z.exe | 
> 2b1d952078795117a8c4549f6384275e047ebd75c10bea77e675f8b672e6d87a
> PortableGit-2.14.3-32-bit.7z.exe | 
> f2dcb32c3133188d0b7a2c3683adcbebcc10b054467e1754d1b8b7e534a34494
> MinGit-2.14.3-64-bit.zip | 
> 538294d2b1472e561493b67855f92380d8139011c74be6bf3cdc5b5d321b1345
> MinGit-2.14.3-32-bit.zip | 
> a91385acb1da220612790807c41d0f304b41093c474b9d7342230ec194a3398e
> MinGit-2.14.3-busybox-64-bit.zip | 
> b7710c7668d7ad3f1f5f7530b601d9bafbe66fcef5563c8ab74d442ac9478d8e
> MinGit-2.14.3-busybox-32-bit.zip | 
> 8982fd12c60a9edd1b6f5f8465354534920bae351d38c867a2f4034a807d8231
> Git-2.14.3-64-bit.tar.bz2 | 
> a5f09850334d5069afa0013249cc6678a7cde52c673823e5386d5cad9df41f10
> Git-2.14.3-32-bit.tar.bz2 | 
> 644b7d7593e675f68a5a011d19a0a917430b79fb815f6260b807c00651696fa2

Ciao,
Johannes


Re: [RFC PATCH 2/8] commit: move code to update HEAD to libgit

2017-10-24 Thread Phillip Wood
On 07/10/17 10:54, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> Signed-off-by: Phillip Wood 
>> ---
> 
> This seems to do a lot more than just moving code, most notably, it
> uses setenv() to affect what happens in any subprocesses we may
> spawn, and it is unclear if it was verified that this patch is free
> of unwanted consequences due to that change (and any others I may
> have missed while reading this patch, if any).
>
> I suspect that it would be sufficient to make update_head() helper
> function take the reflog action message as another parameter
> instead to fix the above, but there may be other reasons why you
> chose to do it this way---I cannot read it in your empty log
> message, though.

Good point, sorry I should have added some explanation about that. I
went with using setenv() rather than passing a reflog message to
update_head() as it meant there were no changes needed on the sequencer
side as it already sets GIT_REFLOG_ACTION. As the sequencer already sets
GIT_REFLOG_ACTION, and git-commit does not fork any subprocesses I don't
think this change has any unwanted consequences (I pushed a branch to
github before submitting the patches and the test suite passes on
travis). It would however be clearer to add a parameter to update_head()
for the reflog message as you suggested.

> 
> I will not give line-by-line style nitpick but in general we do not
> leave a SP between function name and the open parenthesis that
> starts its argument list.  New code in this patch seems to use
> mixture of styles.

Sorry I should have spotted those before I posted this series, I go
though all the patches and fix them (this would be a good opportunity
for me to try using git-clang-format from next)

Thanks for looking at this, did you have time to look at the other
changes in this series or did this patch put you off looking further?
I'll update and repost probably towards the end of next week. If I
continue to base these patches on master then I think the patch that
moves the code to print the commit summary will have (trivial) conflicts
with the changes in ao/check-resolve-ref-unsafe-result in pu do you want
the new patches based pu or are you happy with them based on master?

Best Wishes

Phillip

>> diff --git a/builtin/commit.c b/builtin/commit.c
>> index 
>> 0b8c1ef6f57cfed328d12255e6834adb4bda4137..497778ba2c02afdd4a337969a27ca781e8389040
>>  100644
>> --- a/builtin/commit.c
>> +++ b/builtin/commit.c
>> @@ -1578,13 +1578,11 @@ int cmd_commit(int argc, const char **argv, const 
>> char *prefix)
>>  struct strbuf sb = STRBUF_INIT;
>>  struct strbuf author_ident = STRBUF_INIT;
>>  const char *index_file, *reflog_msg;
>> -char *nl;
>>  struct object_id oid;
>>  struct commit_list *parents = NULL;
>>  struct stat statbuf;
>>  struct commit *current_head = NULL;
>>  struct commit_extra_header *extra = NULL;
>> -struct ref_transaction *transaction;
>>  struct strbuf err = STRBUF_INIT;
>>  
>>  if (argc == 2 && !strcmp(argv[1], "-h"))
>> @@ -1625,10 +1623,10 @@ int cmd_commit(int argc, const char **argv, const 
>> char *prefix)
>>  reflog_msg = getenv("GIT_REFLOG_ACTION");
>>  if (!current_head) {
>>  if (!reflog_msg)
>> -reflog_msg = "commit (initial)";
>> +setenv ("GIT_REFLOG_ACTION", "commit (initial)", 1);
>>  } else if (amend) {
>>  if (!reflog_msg)
>> -reflog_msg = "commit (amend)";
>> +setenv("GIT_REFLOG_ACTION", "commit (amend)", 1);
>>  parents = copy_commit_list(current_head->parents);
>>  } else if (whence == FROM_MERGE) {
>>  struct strbuf m = STRBUF_INIT;
>> @@ -1637,7 +1635,7 @@ int cmd_commit(int argc, const char **argv, const char 
>> *prefix)
>>  struct commit_list **pptr = &parents;
>>  
>>  if (!reflog_msg)
>> -reflog_msg = "commit (merge)";
>> +setenv("GIT_REFLOG_ACTION", "commit (merge)", 1);
>>  pptr = commit_list_append(current_head, pptr);
>>  fp = xfopen(git_path_merge_head(), "r");
>>  while (strbuf_getline_lf(&m, fp) != EOF) {
>> @@ -1660,9 +1658,9 @@ int cmd_commit(int argc, const char **argv, const char 
>> *prefix)
>>  parents = reduce_heads(parents);
>>  } else {
>>  if (!reflog_msg)
>> -reflog_msg = (whence == FROM_CHERRY_PICK)
>> -? "commit (cherry-pick)"
>> -: "commit";
>> +setenv("GIT_REFLOG_ACTION", (whence == FROM_CHERRY_PICK)
>> +? "commit (cherry-pick)"
>> +: "commit", 1);
>>  commit_list_insert(current_head, &parents);
>>  }
>>  
>> @@ -1707,25 +1705,10 @@ int cmd_commit(int argc, const char **argv, const 

v2.15.0-rc2 ref deletion bug

2017-10-24 Thread Jeff King
I found a potentially serious bug in v2.15.0-rc2 (and earlier release
candidates, too) that we may want to deal with before the release.

If I do:

git init -q repo
cd repo
obj=$(git hash-object -w /dev/null)
git update-ref refs/tags/foo $obj
git update-ref --stdin <<-EOF
delete refs/tags/foo
update refs/tags/foo/bar $obj
EOF
git for-each-ref

then at the end we have no refs at all!

I'd expect one of:

  1. We delete "foo" before updating "foo/bar", and we end up with a
 single ref.

  2. We complain that we cannot update "foo/bar" because "foo" still
 exists.

I was hoping for (1). But in earlier releases we did (2). That makes
sense because it's safer to do all updates in a transaction before doing
any deletes (since if there's a simultaneous prune we'd rather see both
refs present for a moment rather than neither).

But the v2.15 behavior is just buggy, and may lead to data loss (we
silently lose the refs, and then a subsequent prune may lose the
objects). This bisects to Michael's dc39e09942 (files_ref_store: use a
transaction to update packed refs, 2017-09-08).

Curiously, it doesn't happen if you reverse the order of the entries in
the transaction (which _shouldn't_ matter, since we try to process it
atomically, but obviously it just tickles this bug in a funny way).

I haven't figured out if the deletion has to be a prefix of the update
to trigger the bug, or if the problem is more widespread.

-Peff


Re: [PATCH] status: do not get confused by submodules in excluded directories

2017-10-24 Thread Kevin Daudt
On Tue, Oct 17, 2017 at 03:10:11PM +0200, Johannes Schindelin wrote:
> We meticulously pass the `exclude` flag to the `treat_directory()`
> function so that we can indicate that files in it are excluded rather
> than untracked when recursing.
> 
> But we did not yet treat submodules the same way.
> 
> Signed-off-by: Johannes Schindelin 
> ---
> Published-As: 
> https://github.com/dscho/git/releases/tag/submodule-in-excluded-v1
> Fetch-It-Via: git fetch https://github.com/dscho/git submodule-in-excluded-v1
>  dir.c  |  2 +-
>  t/t7061-wtstatus-ignore.sh | 14 ++
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 1d17b800cf3..9987011da57 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1392,7 +1392,7 @@ static enum path_treatment treat_directory(struct 
> dir_struct *dir,
>   if (!(dir->flags & DIR_NO_GITLINKS)) {
>   unsigned char sha1[20];
>   if (resolve_gitlink_ref(dirname, "HEAD", sha1) == 0)
> - return path_untracked;
> + return exclude ? path_excluded : path_untracked;
>   }
>   return path_recurse;
>   }
> diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
> index fc6013ba3c8..8c849a4cd2f 100755
> --- a/t/t7061-wtstatus-ignore.sh
> +++ b/t/t7061-wtstatus-ignore.sh
> @@ -272,4 +272,18 @@ test_expect_success 'status ignored tracked directory 
> with uncommitted file in t
>   test_cmp expected actual
>  '
>  
> +cat >expected <<\EOF
> +!! tracked/submodule/
> +EOF
> +
> +test_expect_success 'status ignores submodule in excluded directory' '
> + git init tracked/submodule &&
> + (
> + cd tracked/submodule &&
> + test_commit initial
> + ) &&

Could this use test_commit -C tracked/submodule initial?

> + git status --porcelain --ignored -u tracked/submodule >actual &&
> + test_cmp expected actual
> +'
> +
>  test_done
> 
> base-commit: 111ef79afe185f8731920569450f6a65320f5d5f
> -- 
> 2.14.2.windows.3


git describe returns tags including 'refs/tags/

2017-10-24 Thread Voie, Per Erlend Torbergsen
Hi

I am versioning my python package using git tags and the python-versioneer 
package.

Basically versioneer gets the version number from git tags using the command 
"git describe --tags --dirty --always --long --match 'tag_prefix*'". I do not 
apply a tag prefix, hence tag_prefix=''.

Now to the problem. My build breaks because the command above returns tags like 
this "refs/tags/2.12.2" which contains invalid characters. Actually, it seems 
that executing the command above reconfigured the way git describe behaves.

because, before running the above command "git describe --tags --dirty" returned

2.12.2-17-gd8e62f9

After, "git describe --tags --dirty" returns

warning: tag 'refs/tags/2.12.2' is really '2.12.2' here
refs/tags/2.12.2-17-gd8e62f9

Is it so that "git describe --tags --dirty --always --long --match 
'tag_prefix*'" changes the way git describe behave or am I missing something? 
If it does, how can I reset it?

Thanks
Per

**
This e-mail and any attachments thereto may contain confidential information 
and/or information protected by intellectual property rights for the exclusive 
attention of the intended addressees named above. If you have received this 
transmission in error, please immediately notify the sender by return e-mail 
and delete this message and its attachments. Unauthorized use, copying or 
further full or partial distribution of this e-mail or its contents is 
prohibited.
**


Re: [RFE] Add minimal universal release management capabilities to GIT

2017-10-24 Thread Jacob Keller


On October 21, 2017 6:56:51 AM PDT, nicolas.mail...@laposte.net wrote:
>
>
>- Mail original -
>De: "Stefan Beller" 
>

>> git tags ?
>
>Too loosely defined to be relied on by project-agnostic tools. That's
>what most tools won't ever try to use those. Anything you will define
>around tags as they stand is unlikely to work on the project of someone
>else

I think that this could easily be built by a separate script which provides git 
release command line and uses tags under the hood in a well formed way. I 
wouldn't say that the method outlined here works for all projects but I do 
agree it's fairly common and could work for many projects

I think most large projects already use annotated tags and tho they have their 
own format it works pretty well. 

Showing a tool that could help projects create more standardized release tags 
would be helpful.

I think such a tool could already be built, scripted to create annotated tags 
with a well formed name. I don't think you necessarily need to have this in 
core git, tho I do see that your main goal is to piggyback on git itselfs 
popularity

Thanks
Jake
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.