Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2

2018-09-13 Thread Jonathan Nieder
Hi,

Josh Steadmon wrote:

> Subject: archive: allow archive over HTTP(S) with proto v2

It's interesting how little this has to touch the client.

> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c  |  8 +++-
>  http-backend.c | 10 +-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
[...]
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
>   status = packet_reader_read();
>   if (status != PACKET_READ_FLUSH)
>   die(_("git archive: expected a flush"));
> - }
> + } else if (version == protocol_v2 &&
> +starts_with(transport->url, "http"))

As Stefan noticed, this starts_with test seems a bit too loose.  For
example, what happens if I try an scp-style SSH URL like
http.example.com:path/to/repo, a local path like http/foo/bar, or a
custom protocol like httplikebutbetter://path/to/repo (honest
question: I haven't tried)?

> + /*
> +  * Commands over HTTP require two requests, so there's an
> +  * additional server response to parse.
> +  */
> + discover_version();

Can this be made consistent with the non-http case?  The original
capabilities (/info/refs) response told us what protocol version the
server wants to use, which means that a hypothetical protocol v3 could
use a completely different request format for the followup commands:
so could the server omit the protocol version in the v2
/git-upload-archive response?  Alternatively, if we want to include
the protocol version again, could we do that in stateful protocols as
well?

Related question: what should happen if the two responses declare
different protocol versions?  Should we diagnose that as a protocol
error?

[...]
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
>   { "upload-pack", "uploadpack", 1, 1 },
>   { "receive-pack", "receivepack", 0, -1 },
> + { "upload-archive", "uploadarchive", 1, 1 },

shell.c orders these in almost-alphabetical order (receive-pack,
upload-pack, upload-archive).  I guess they should both use actual
alphabetical order?  (If you agree, then please feel free to do that
in a separate patch.)

[...]
> @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
>   struct rpc_service *svc = select_service(hdr, service_name);
>   struct strbuf buf = STRBUF_INIT;
>  
> + if (!strcmp(service_name, "git-upload-archive")) {
> + /* git-upload-archive doesn't need --stateless-rpc */

This comment doesn't seem actionable.  Can it say why?  E.g. "[...]
because an upload-archive command always involves a single
round-trip".  Or alternatively, I think it's fine to omit the comment.

> + argv[1] = ".";
> + argv[2] = NULL;
> + }
[...]
> @@ -713,7 +720,8 @@ static struct service_cmd {
>   {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
>  
>   {"POST", "/git-upload-pack$", service_rpc},
> - {"POST", "/git-receive-pack$", service_rpc}
> + {"POST", "/git-receive-pack$", service_rpc},
> + {"POST", "/git-upload-archive$", service_rpc},

Same comment about services seeming to be in a randomish order.

[...]
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport,
>   ret = run_connect(transport, );
>   } else if (data->stateless_connect &&
>  (get_protocol_version_config() == protocol_v2) &&

(not about this patch) These parens don't help --- they make the
condition harder for me to read, especially now that there are new
parens to match them up with.

> -!strcmp("git-upload-pack", name)) {
> +(!strcmp("git-upload-pack", name) ||
> + !strcmp("git-upload-archive", name))) {

A part of me wonders about the wasted cycles comparing to
"git-upload-" twice, but (1) it is tiny relative to actually serving
the request and (2) if we're lucky, the compiler (or a compiler of the
future) inlines the strcmp call and could optimize it out.

[...]
> @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, const char *name,
>  
>   /* Get_helper so connect is inited. */
>   get_helper(transport);
> - if (!data->connect)
> + if (!data->connect && !data->stateless_connect)
>   die(_("operation not supported by protocol"));

I don't understand this part.  Can you explain it further (possibly by
putting it in its own patch)?

Thanks for a pleasant read,
Jonathan


Re: [PATCH 2/3] archive: implement protocol v2 archive command

2018-09-13 Thread Jonathan Nieder
Hi,

Josh Steadmon wrote:

> This adds a new archive command for protocol v2. The command expects
> arguments in the form "argument X" which are passed unmodified to
> git-upload-archive--writer.
>
> This command works over the file://, Git, and SSH transports. HTTP
> support will be added in a separate patch.
>
> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c| 45 +++-
>  builtin/upload-archive.c | 44 ---
>  t/t5000-tar-tree.sh  |  5 +
>  3 files changed, 77 insertions(+), 17 deletions(-)

I like the diffstat. :)

Can this include some docs in Documentation/technical/ as well, to
help other implementors to understand the protocol so they can
interoperate?

Thanks,
Jonathan


Re: Add proto v2 archive command with HTTP support

2018-09-13 Thread Jonathan Nieder
Hi,

Josh Steadmon wrote:

> This series adds a new protocol v2 command for archiving, and allows
> this command to work over HTTP(S). This was previously discussed in [1].
> I've CCed everyone who participated in that discussion.

Yay!  Getting ready to read it now.

For the future, "git format-patch --cover-letter" does a few things
that can be nice for this kind of opening message:

- it lists the patches in the series, and a diffstat
- it puts [PATCH 0/3] in the subject line so people know what to expect

Thanks,
Jonathan

> [1]: 
> https://public-inbox.org/git/CANq=j3tk7qebjoc7vnwkh4+wbnibmjjp5yukd9te5naywuk...@mail.gmail.com/


[PATCH v8 7/7] list-objects-filter: implement filter tree:0

2018-09-13 Thread Matthew DeVore
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore 
---
 Documentation/rev-list-options.txt |  5 +++
 list-objects-filter-options.c  | 14 
 list-objects-filter-options.h  |  1 +
 list-objects-filter.c  | 49 ++
 t/t5317-pack-objects-filter-objects.sh | 28 +++
 t/t5616-partial-clone.sh   | 38 
 t/t6112-rev-list-filters-objects.sh| 12 +++
 7 files changed, 147 insertions(+)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 7b273635d..5f1672913 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -731,6 +731,11 @@ the requested refs.
 +
The form '--filter=sparse:path=<path>' similarly uses a sparse-checkout
specification contained in <path>.
++
+The form '--filter=tree:<depth>' omits all blobs and trees whose depth
+from the root tree is >= <depth> (minimum depth if an object is located
+at multiple depths in the commits traversed). Currently, only <depth>=0
+is supported, which omits all blobs and trees.
 
 --no-filter::
Turn off any previous `--filter=` argument.
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index c0e2bd6a0..14f251de4 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -50,6 +50,20 @@ static int gently_parse_list_objects_filter(
return 0;
}
 
+   } else if (skip_prefix(arg, "tree:", &v0)) {
+   unsigned long depth;
+   if (!git_parse_ulong(v0, &depth) || depth != 0) {
+   if (errbuf) {
+   strbuf_init(errbuf, 0);
+   strbuf_addstr(
+   errbuf,
+   _("only 'tree:0' is supported"));
+   }
+   return 1;
+   }
+   filter_options->choice = LOFC_TREE_NONE;
+   return 0;
+
	} else if (skip_prefix(arg, "sparse:oid=", &v0)) {
struct object_context oc;
struct object_id sparse_oid;
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index a61f8..af64e5c66 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -10,6 +10,7 @@ enum list_objects_filter_choice {
LOFC_DISABLED = 0,
LOFC_BLOB_NONE,
LOFC_BLOB_LIMIT,
+   LOFC_TREE_NONE,
LOFC_SPARSE_OID,
LOFC_SPARSE_PATH,
LOFC__COUNT /* must be last */
diff --git a/list-objects-filter.c b/list-objects-filter.c
index 5f8b1a002..09b2b05d5 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -79,6 +79,54 @@ static void *filter_blobs_none__init(
return d;
 }
 
+/*
+ * A filter for list-objects to omit ALL trees and blobs from the traversal.
+ * Can OPTIONALLY collect a list of the omitted OIDs.
+ */
+struct filter_trees_none_data {
+   struct oidset *omits;
+};
+
+static enum list_objects_filter_result filter_trees_none(
+   enum list_objects_filter_situation filter_situation,
+   struct object *obj,
+   const char *pathname,
+   const char *filename,
+   void *filter_data_)
+{
+   struct filter_trees_none_data *filter_data = filter_data_;
+
+   switch (filter_situation) {
+   default:
+   BUG("unknown filter_situation: %d", filter_situation);
+
+   case LOFS_BEGIN_TREE:
+   case LOFS_BLOB:
+   if (filter_data->omits)
+   oidset_insert(filter_data->omits, &obj->oid);
+   return LOFR_MARK_SEEN; /* but not LOFR_DO_SHOW (hard omit) */
+
+   case LOFS_END_TREE:
+   assert(obj->type == OBJ_TREE);
+   return LOFR_ZERO;
+
+   }
+}
+
+static void* filter_trees_none__init(
+   struct oidset *omitted,
+   struct list_objects_filter_options *filter_options,
+   filter_object_fn *filter_fn,
+   filter_free_fn *filter_free_fn)
+{
+   struct filter_trees_none_data *d = xcalloc(1, sizeof(*d));
+   d->omits = omitted;
+

[PATCH v8 1/7] list-objects: store common func args in struct

2018-09-13 Thread Matthew DeVore
This will make utility functions easier to create, as done by the next
patch.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 158 +++--
 1 file changed, 74 insertions(+), 84 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index c99c47ac1..584518a3f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -12,20 +12,25 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static void process_blob(struct rev_info *revs,
+struct traversal_context {
+   struct rev_info *revs;
+   show_object_fn show_object;
+   show_commit_fn show_commit;
+   void *show_data;
+   filter_object_fn filter_fn;
+   void *filter_data;
+};
+
+static void process_blob(struct traversal_context *ctx,
 struct blob *blob,
-show_object_fn show,
 struct strbuf *path,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
	struct object *obj = &blob->object;
size_t pathlen;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
 
-   if (!revs->blob_objects)
+   if (!ctx->revs->blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -41,21 +46,21 @@ static void process_blob(struct rev_info *revs,
 * may cause the actual filter to report an incomplete list
 * of missing objects.
 */
-   if (revs->exclude_promisor_objects &&
+   if (ctx->revs->exclude_promisor_objects &&
	    !has_object_file(&obj->oid) &&
	    is_promisor_object(&obj->oid))
return;
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BLOB, obj,
- path->buf, &path->buf[pathlen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BLOB, obj,
+  path->buf, &path->buf[pathlen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, path->buf, cb_data);
+   ctx->show_object(obj, path->buf, ctx->show_data);
strbuf_setlen(path, pathlen);
 }
 
@@ -81,26 +86,21 @@ static void process_blob(struct rev_info *revs,
  * the link, and how to do it. Whether it necessarily makes
  * any sense what-so-ever to ever do that is another issue.
  */
-static void process_gitlink(struct rev_info *revs,
+static void process_gitlink(struct traversal_context *ctx,
const unsigned char *sha1,
-   show_object_fn show,
struct strbuf *path,
-   const char *name,
-   void *cb_data)
+   const char *name)
 {
/* Nothing to do */
 }
 
-static void process_tree(struct rev_info *revs,
+static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
-show_object_fn show,
 struct strbuf *base,
-const char *name,
-void *cb_data,
-filter_object_fn filter_fn,
-void *filter_data)
+const char *name)
 {
	struct object *obj = &tree->object;
+   struct rev_info *revs = ctx->revs;
struct tree_desc desc;
struct name_entry entry;
enum interesting match = revs->diffopt.pathspec.nr == 0 ?
@@ -133,14 +133,14 @@ static void process_tree(struct rev_info *revs,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && filter_fn)
-   r = filter_fn(LOFS_BEGIN_TREE, obj,
- base->buf, &base->buf[baselen],
- filter_data);
+   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
+  base->buf, &base->buf[baselen],
+  ctx->filter_data);
if (r & LOFR_MARK_SEEN)
obj->flags |= SEEN;
if (r & LOFR_DO_SHOW)
-   show(obj, base->buf, cb_data);
+   ctx->show_object(obj, base->buf, ctx->show_data);
if (base->len)
strbuf_addch(base, '/');
 
@@ -157,29 +157,25 @@ static void process_tree(struct rev_info *revs,
}
 
if (S_ISDIR(entry.mode))
-   process_tree(revs,
+   process_tree(ctx,
 lookup_tree(the_repository, entry.oid),
-   

[PATCH v8 3/7] list-objects: always parse trees gently

2018-09-13 Thread Matthew DeVore
If parsing fails when revs->ignore_missing_links and
revs->exclude_promisor_objects are both false, we print the OID anyway
in the die("bad tree object...") call, so any message printed by
parse_tree_gently() is superfluous.

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ccc529e5e..f9b51db7a 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,8 +143,6 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
-   int gently = revs->ignore_missing_links ||
-revs->exclude_promisor_objects;
 
if (!revs->tree_objects)
return;
@@ -152,7 +150,7 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, gently) < 0) {
+   if (parse_tree_gently(tree, 1) < 0) {
if (revs->ignore_missing_links)
return;
 
-- 
2.19.0.397.gdd90340f6a-goog



[PATCH v8 5/7] revision: mark non-user-given objects instead

2018-09-13 Thread Matthew DeVore
Currently, list-objects.c incorrectly treats all root trees of commits
as USER_GIVEN. Also, it would be easier to mark objects that are
non-user-given instead of user-given, since the places in the code
where we access an object through a reference are more obvious than
the places where we access an object that was given by the user.

Resolve these two problems by introducing a flag NOT_USER_GIVEN that
marks blobs and trees that are non-user-given, replacing USER_GIVEN.
(Only blobs and trees are marked because this mark is only used when
filtering objects, and filtering of other types of objects is not
supported yet.)

This fixes a bug in that git rev-list behaved differently from git
pack-objects. pack-objects would *not* filter objects given explicitly
on the command line and rev-list would filter. This was because the two
commands used a different function to add objects to the rev_info
struct. This seems to have been an oversight, and pack-objects has the
correct behavior, so I added a test to make sure that rev-list now
behaves properly.

Signed-off-by: Matthew DeVore 

fixup of 6defd70de
---
 list-objects.c  | 31 +
 revision.c  |  1 -
 revision.h  | 11 --
 t/t6112-rev-list-filters-objects.sh | 10 ++
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 243192af5..7a1a0929d 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -53,7 +53,7 @@ static void process_blob(struct traversal_context *ctx,
 
pathlen = path->len;
strbuf_addstr(path, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BLOB, obj,
   path->buf, &path->buf[pathlen],
   ctx->filter_data);
@@ -120,17 +120,19 @@ static void process_tree_contents(struct traversal_context *ctx,
continue;
}
 
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
+   if (S_ISDIR(entry.mode)) {
+   struct tree *t = lookup_tree(the_repository, entry.oid);
+   t->object.flags |= NOT_USER_GIVEN;
+   process_tree(ctx, t, base, entry.path);
+   }
else if (S_ISGITLINK(entry.mode))
process_gitlink(ctx, entry.oid->hash,
base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
+   else {
+   struct blob *b = lookup_blob(the_repository, entry.oid);
+   b->object.flags |= NOT_USER_GIVEN;
+   process_blob(ctx, b, base, entry.path);
+   }
}
 }
 
@@ -171,7 +173,7 @@ static void process_tree(struct traversal_context *ctx,
}
 
strbuf_addstr(base, name);
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn)
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn)
r = ctx->filter_fn(LOFS_BEGIN_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -185,7 +187,7 @@ static void process_tree(struct traversal_context *ctx,
if (!failed_parse)
process_tree_contents(ctx, tree, base);
 
-   if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
+   if ((obj->flags & NOT_USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
   base->buf, &base->buf[baselen],
   ctx->filter_data);
@@ -301,8 +303,11 @@ static void do_traverse(struct traversal_context *ctx)
 * an uninteresting boundary commit may not have its tree
 * parsed yet, but we are not going to show them anyway
 */
-   if (get_commit_tree(commit))
-   add_pending_tree(ctx->revs, get_commit_tree(commit));
+   if (get_commit_tree(commit)) {
+   struct tree *tree = get_commit_tree(commit);
+   tree->object.flags |= NOT_USER_GIVEN;
+   add_pending_tree(ctx->revs, tree);
+   }
ctx->show_commit(commit, ctx->show_data);
 
if (ctx->revs->tree_blobs_in_commit_order)
diff --git a/revision.c b/revision.c
index de4dce600..72d48a17f 100644
--- a/revision.c
+++ b/revision.c
@@ -175,7 +175,6 @@ static void add_pending_object_with_path(struct rev_info *revs,

[PATCH v8 4/7] rev-list: handle missing tree objects properly

2018-09-13 Thread Matthew DeVore
Previously, we assumed only blob objects could be missing. This patch
makes rev-list handle missing trees like missing blobs. The --missing=*
and --exclude-promisor-objects flags now work for trees as they already
do for blobs. This is demonstrated in t6112.

Signed-off-by: Matthew DeVore 
---
 builtin/rev-list.c | 11 ---
 list-objects.c | 11 +--
 revision.h | 15 +
 t/t0410-partial-clone.sh   | 45 ++
 t/t5317-pack-objects-filter-objects.sh | 13 
 t/t6112-rev-list-filters-objects.sh| 17 ++
 6 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 5b07f3f4a..49d6deed7 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -6,6 +6,7 @@
 #include "list-objects.h"
 #include "list-objects-filter.h"
 #include "list-objects-filter-options.h"
+#include "object.h"
 #include "object-store.h"
 #include "pack.h"
 #include "pack-bitmap.h"
@@ -209,7 +210,8 @@ static inline void finish_object__ma(struct object *obj)
 */
switch (arg_missing_action) {
case MA_ERROR:
-   die("missing blob object '%s'", oid_to_hex(&obj->oid));
+   die("missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
case MA_ALLOW_ANY:
@@ -222,8 +224,8 @@ static inline void finish_object__ma(struct object *obj)
case MA_ALLOW_PROMISOR:
	if (is_promisor_object(&obj->oid))
return;
-   die("unexpected missing blob object '%s'",
-   oid_to_hex(&obj->oid));
+   die("unexpected missing %s object '%s'",
+   type_name(obj->type), oid_to_hex(&obj->oid));
return;
 
default:
@@ -235,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
 static int finish_object(struct object *obj, const char *name, void *cb_data)
 {
struct rev_list_info *info = cb_data;
-   if (obj->type == OBJ_BLOB && !has_object_file(&obj->oid)) {
+   if (!has_object_file(&obj->oid)) {
finish_object__ma(obj);
return 1;
}
@@ -373,6 +375,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
	init_revisions(&revs, prefix);
revs.abbrev = DEFAULT_ABBREV;
revs.commit_format = CMIT_FMT_UNSPECIFIED;
+   revs.do_not_die_on_missing_tree = 1;
 
/*
 * Scan the argument list before invoking setup_revisions(), so that we
diff --git a/list-objects.c b/list-objects.c
index f9b51db7a..243192af5 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -143,6 +143,7 @@ static void process_tree(struct traversal_context *ctx,
struct rev_info *revs = ctx->revs;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
+   int failed_parse;
 
if (!revs->tree_objects)
return;
@@ -150,7 +151,9 @@ static void process_tree(struct traversal_context *ctx,
die("bad tree object");
if (obj->flags & (UNINTERESTING | SEEN))
return;
-   if (parse_tree_gently(tree, 1) < 0) {
+
+   failed_parse = parse_tree_gently(tree, 1);
+   if (failed_parse) {
if (revs->ignore_missing_links)
return;
 
@@ -163,7 +166,8 @@ static void process_tree(struct traversal_context *ctx,
	    is_promisor_object(&obj->oid))
return;
 
-   die("bad tree object %s", oid_to_hex(&obj->oid));
+   if (!revs->do_not_die_on_missing_tree)
+   die("bad tree object %s", oid_to_hex(&obj->oid));
}
 
strbuf_addstr(base, name);
@@ -178,7 +182,8 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   process_tree_contents(ctx, tree, base);
+   if (!failed_parse)
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
diff --git a/revision.h b/revision.h
index 007278cc1..5910613cb 100644
--- a/revision.h
+++ b/revision.h
@@ -126,6 +126,21 @@ struct rev_info {
line_level_traverse:1,
tree_blobs_in_commit_order:1,
 
+   /*
+* Blobs are shown without regard for their existence.
+* But not so for trees: unless exclude_promisor_objects
+* is set and the tree in question is a promisor object;
+* OR ignore_missing_links is set, the revision walker
+* dies with a "bad tree object HASH" message when
+* encountering a missing tree. For callers that can
+* handle missing trees 

[PATCH v8 2/7] list-objects: refactor to process_tree_contents

2018-09-13 Thread Matthew DeVore
This will be used in a follow-up patch to reduce indentation needed when
invoking the logic conditionally. i.e. rather than:

if (foo) {
while (...) {
/* this is very indented */
}
}

we will have:

if (foo)
process_tree_contents(...);

Signed-off-by: Matthew DeVore 
---
 list-objects.c | 68 ++
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index 584518a3f..ccc529e5e 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -94,6 +94,46 @@ static void process_gitlink(struct traversal_context *ctx,
/* Nothing to do */
 }
 
+static void process_tree(struct traversal_context *ctx,
+struct tree *tree,
+struct strbuf *base,
+const char *name);
+
+static void process_tree_contents(struct traversal_context *ctx,
+ struct tree *tree,
+ struct strbuf *base)
+{
+   struct tree_desc desc;
+   struct name_entry entry;
+   enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
+   all_entries_interesting : entry_not_interesting;
+
+   init_tree_desc(&desc, tree->buffer, tree->size);
+
+   while (tree_entry(&desc, &entry)) {
+   if (match != all_entries_interesting) {
+   match = tree_entry_interesting(&entry, base, 0,
+  &ctx->revs->diffopt.pathspec);
+   if (match == all_entries_not_interesting)
+   break;
+   if (match == entry_not_interesting)
+   continue;
+   }
+
+   if (S_ISDIR(entry.mode))
+   process_tree(ctx,
+lookup_tree(the_repository, entry.oid),
+base, entry.path);
+   else if (S_ISGITLINK(entry.mode))
+   process_gitlink(ctx, entry.oid->hash,
+   base, entry.path);
+   else
+   process_blob(ctx,
+lookup_blob(the_repository, entry.oid),
+base, entry.path);
+   }
+}
+
 static void process_tree(struct traversal_context *ctx,
 struct tree *tree,
 struct strbuf *base,
@@ -101,10 +141,6 @@ static void process_tree(struct traversal_context *ctx,
 {
	struct object *obj = &tree->object;
struct rev_info *revs = ctx->revs;
-   struct tree_desc desc;
-   struct name_entry entry;
-   enum interesting match = revs->diffopt.pathspec.nr == 0 ?
-   all_entries_interesting: entry_not_interesting;
int baselen = base->len;
enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
int gently = revs->ignore_missing_links ||
@@ -144,29 +180,7 @@ static void process_tree(struct traversal_context *ctx,
if (base->len)
strbuf_addch(base, '/');
 
-   init_tree_desc(&desc, tree->buffer, tree->size);
-
-   while (tree_entry(&desc, &entry)) {
-   if (match != all_entries_interesting) {
-   match = tree_entry_interesting(&entry, base, 0,
-  &revs->diffopt.pathspec);
-   if (match == all_entries_not_interesting)
-   break;
-   if (match == entry_not_interesting)
-   continue;
-   }
-
-   if (S_ISDIR(entry.mode))
-   process_tree(ctx,
-lookup_tree(the_repository, entry.oid),
-base, entry.path);
-   else if (S_ISGITLINK(entry.mode))
-   process_gitlink(ctx, entry.oid->hash, base, entry.path);
-   else
-   process_blob(ctx,
-lookup_blob(the_repository, entry.oid),
-base, entry.path);
-   }
+   process_tree_contents(ctx, tree, base);
 
if (!(obj->flags & USER_GIVEN) && ctx->filter_fn) {
r = ctx->filter_fn(LOFS_END_TREE, obj,
-- 
2.19.0.397.gdd90340f6a-goog



[PATCH v8 6/7] list-objects-filter: use BUG rather than die

2018-09-13 Thread Matthew DeVore
In some cases in this file, BUG makes more sense than die. In such
cases, we get there from a coding error rather than a user error.

'return' has been removed following some instances of BUG since BUG does
not return.

Signed-off-by: Matthew DeVore 
---
 list-objects-filter.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/list-objects-filter.c b/list-objects-filter.c
index a0ba78b20..5f8b1a002 100644
--- a/list-objects-filter.c
+++ b/list-objects-filter.c
@@ -44,8 +44,7 @@ static enum list_objects_filter_result filter_blobs_none(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -102,8 +101,7 @@ static enum list_objects_filter_result filter_blobs_limit(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -208,8 +206,7 @@ static enum list_objects_filter_result filter_sparse(
 
switch (filter_situation) {
default:
-   die("unknown filter_situation");
-   return LOFR_ZERO;
+   BUG("unknown filter_situation: %d", filter_situation);
 
case LOFS_BEGIN_TREE:
assert(obj->type == OBJ_TREE);
@@ -389,7 +386,7 @@ void *list_objects_filter__init(
assert((sizeof(s_filters) / sizeof(s_filters[0])) == LOFC__COUNT);
 
if (filter_options->choice >= LOFC__COUNT)
-   die("invalid list-objects filter choice: %d",
+   BUG("invalid list-objects filter choice: %d",
filter_options->choice);
 
init_fn = s_filters[filter_options->choice];
-- 
2.19.0.397.gdd90340f6a-goog



[PATCH v8 0/7] filter: support for excluding all trees and blobs

2018-09-13 Thread Matthew DeVore
Things seem to have settled down in terms of responses, so here is a re-roll,
some of the changes being Junio's suggestions:
 - show a more helpful error if a positive integer is given after "tree:"
 - added a test for an issue that this patchset inadvertently fixes:
   git rev-list would filter objects given explicitly on the command-line,
   but it should not.
 - improved documentation on the NOT_USER_GIVEN flag in revision.h

Matthew DeVore (7):
  list-objects: store common func args in struct
  list-objects: refactor to process_tree_contents
  list-objects: always parse trees gently
  rev-list: handle missing tree objects properly
  revision: mark non-user-given objects instead
  list-objects-filter: use BUG rather than die
  list-objects-filter: implement filter tree:0

 Documentation/rev-list-options.txt |   5 +
 builtin/rev-list.c |  11 +-
 list-objects-filter-options.c  |  14 ++
 list-objects-filter-options.h  |   1 +
 list-objects-filter.c  |  60 ++-
 list-objects.c | 232 +
 revision.c |   1 -
 revision.h |  26 ++-
 t/t0410-partial-clone.sh   |  45 +
 t/t5317-pack-objects-filter-objects.sh |  41 +
 t/t5616-partial-clone.sh   |  38 
 t/t6112-rev-list-filters-objects.sh|  39 +
 12 files changed, 389 insertions(+), 124 deletions(-)

-- 
2.19.0.397.gdd90340f6a-goog



Re: [PATCH 3/3] sequencer: use read_author_script()

2018-09-13 Thread Eric Sunshine
On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood  wrote:
> Use the new function to read the author script, updating
> read_env_script() and read_author_ident(). This means we now have a
> single code path that reads the author script and uses sq_dequote() to
> dequote it. This fixes potential problems with user-edited scripts, as
> read_env_script() did not track quotes properly.
> [...]
> Signed-off-by: Phillip Wood 
> ---
>  /*
>   * Read a list of environment variable assignments (such as the author-script
>   * file) into an environment block. Returns -1 on error, 0 otherwise.
>   */

According to this comment, this function is capable of parsing a file
of arbitrary "NAME=Value" lines, and indeed the original code does
just that, but...

>  static int read_env_script(struct argv_array *env)
>  {
> +   char *name, *email, *date;
>
> -   if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
> +   if (read_author_script(rebase_path_author_script(),
> +  &name, &email, &date, 0))

...the new implementation is able to handle only GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE, in exactly that order.

This seems like a pretty serious (and possibly buggy) change of
behavior, and makes the function much less useful (in general). Is it
true that it will only ever be used for files containing that limited
set of names? If so, the behavior change deserves mention in the
commit message, the function comment needs updating, and the function
itself probably ought to be renamed.

> +   strbuf_addstr(&script, "GIT_AUTHOR_NAME=");
> +   strbuf_addstr(&script, name);
> +   argv_array_push(env, script.buf);
> +   strbuf_reset(&script);
> +   strbuf_addstr(&script, "GIT_AUTHOR_EMAIL=");
> +   strbuf_addstr(&script, email);
> +   argv_array_push(env, script.buf);
> +   strbuf_reset(&script);
> +   strbuf_addstr(&script, "GIT_AUTHOR_DATE=");
> +   strbuf_addstr(&script, date);
> +   argv_array_push(env, script.buf);
> +   strbuf_release(&script);

Mentioned earlier[1], this can all collapse down to:

argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);

However, it's unfortunate that this manual and hard-coded
reconstruction is needed at all. If you restructure the factoring of
this patch series, such ugliness can be avoided altogether. For
instance, the series could be structured like this:

1. Introduce a general-purpose function for reading a file containing
arbitrary "NAME=Value" lines (not caring about specific key names or
their order) and returning them in some data structure (perhaps via
'string_list' as parse_key_value_squoted() in patch 2/3 does).

2. Build read_author_script() atop #1, making it expect and extract
GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE (possibly in
that order, or possibly not if we don't care).

3. Retrofit existing parsers to call one of those two functions (this
step may happen over several patches). So, for instance,
read_env_script() would call the generic parser from #1, whereas
sequencer.c:read_author_ident() would call the more specific parser
from #2.

More below...

> @@ -790,54 +771,25 @@ static char *get_author(const char *message)
>  /* Read author-script and return an ident line (author  timestamp) */
>  static const char *read_author_ident(struct strbuf *buf)
>  {
> -   const char *keys[] = {
> -   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
> -   };
> -   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
> +   if (read_author_script(rebase_path_author_script(),
> +  &name, &email, &date, 0))
> return NULL;
> -   /* dequote values and construct ident line in-place */
> -   for (in = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
> -   if (!skip_prefix(in, keys[i], (const char **)&in)) {
> -   warning(_("could not parse '%s' (looking for '%s')"),
> -   rebase_path_author_script(), keys[i]);
> -   return NULL;
> -   }
> -   if (!sq_dequote(in)) {
> -   warning(_("bad quoting on %s value in '%s'"),
> -   keys[i], rebase_path_author_script());
> -   return NULL;
> -   }
> -   if (i < 3) {
> -   warning(_("could not parse '%s' (looking for '%s')"),
> -   rebase_path_author_script(), keys[i]);
> -   return NULL;
> -   }

The parsing code being thrown away here does a better job of
diagnosing problems (thus helping the user figure out what went wrong)
than the new shared parser introduced by patch 2/3. The shared
function only ever reports a generic "unable to parse", whereas the
above code gets specific, saying that it was looking for a particular
key or that quoting was broken. I'd have expected the new shared

Re: [PATCH 2/3] add read_author_script() to libgit

2018-09-13 Thread Eric Sunshine
On Wed, Sep 12, 2018 at 6:11 AM Phillip Wood  wrote:
> Add read_author_script() to sequencer.c based on the implementation in
> builtin/am.c and update read_am_author_script() to use
> read_author_script(). The sequencer code that reads the author script
> will be updated in the next commit.
>
> Signed-off-by: Phillip Wood 
> ---
> diff --git a/builtin/am.c b/builtin/am.c
> @@ -305,39 +279,16 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
>  static int read_am_author_script(struct am_state *state)
>  {

This function returns 'int'...

> const char *filename = am_path(state, "author-script");
> +   if (read_author_script(filename, &state->author_name,
> +  &state->author_email, &state->author_date, 1))
> +   exit(128);

...but only ever exit()s...

> +   return 0;

... or returns 0.

>  }

It's a little disturbing that it now invokes exit() directly rather
than calling die() since die() may involve extra functionality before
actually exiting.

What if, instead of exit()ing directly, you drop the conditional and
instead return the value of read_author_script():

return read_author_script(...);

and let the caller deal with the zero or non-zero return value as
usual? (True, you'd get two error messages upon failure rather than
just one, but that's not necessarily a bad thing.)

A possibly valid response is that such a change is outside the scope
of this series since the original code shared this odd architecture of
sometimes returning 0, sometimes -1, and sometimes die()ing.


Re: [PATCH v5 1/5] eoie: add End of Index Entry (EOIE) extension

2018-09-13 Thread Junio C Hamano
Ben Peart  writes:

> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 39133bcbc8..f613dd72e3 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -7,6 +7,7 @@ test_description='split index mode tests'
>  # We need total control of index splitting here
>  sane_unset GIT_TEST_SPLIT_INDEX
>  sane_unset GIT_FSMONITOR_TEST
> +export GIT_TEST_DISABLE_EOIE=true
>  
>  test_expect_success 'enable split index' '
>   git config splitIndex.maxPercentChange 100 &&

It is safer to squash the following in; we may want to revisit the
decision test-lint makes on this issue later, though.

-- >8 --
Subject: [PATCH] SQUASH???

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#export

specifies how "export name[=word]" ought to work, but writing
"name=word; export name" is not much more cumbersome, and some older
shells that do not understand the former do grok the latter.
test-lint also recommends spelling it this way.
---
 t/t1700-split-index.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index f613dd72e3..dab97c2187 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -7,7 +7,7 @@ test_description='split index mode tests'
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
-export GIT_TEST_DISABLE_EOIE=true
+GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
-- 
2.19.0



[PATCH v2] linear-assignment: fix potential out of bounds memory access

2018-09-13 Thread Thomas Gummerer
Currently the 'compute_assignment()' function may read memory out
of bounds, even if used correctly.  Namely this happens when we only
have one column.  In that case we try to calculate the initial
minimum cost using '!j1' as column in the reduction transfer code.
That in turn causes us to try and get the cost from column 1 in the
cost matrix, which does not exist, and thus results in an out of
bounds memory read.

In the original paper [1], the example code initializes that minimum
cost to "infinite".  We could emulate something similar by setting the
minimum cost to INT_MAX, which would result in the same minimum cost
as the current algorithm, as we'd always go into the if condition at
least once, except when we only have one column, and column_count thus
equals 1.

If column_count does equal 1, the condition in the loop would always
be false, and we'd end up with a minimum of INT_MAX, which may lead to
integer overflows later in the algorithm.

For a column count of 1, however, we do not even need to go through
the whole algorithm.  A column count of 1 means that there are no
possible assignments; we can just zero out the column2row and
row2column arrays and return early from the function, while keeping
the reduction transfer part of the function the same as it is
currently.

Another solution would be to just not call the 'compute_assignment()'
function from the range diff code in this case, however it's better to
make the compute_assignment function more robust, so future callers
don't run into this potential problem.

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
 algorithm for dense and sparse linear assignment
 problems. Computing, 38(4), 325–340.

Reported-by: ryenus 
Helped-by: Derrick Stolee 
Signed-off-by: Thomas Gummerer 
---
 linear-assignment.c   | 6 ++
 t/t3206-range-diff.sh | 5 +
 2 files changed, 11 insertions(+)

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..ecffc09be6 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -19,6 +19,12 @@ void compute_assignment(int column_count, int row_count, int *cost,
int *free_row, free_count = 0, saved_free_count, *pred, *col;
int i, j, phase;
 
+   if (column_count < 2) {
+   memset(column2row, 0, sizeof(int) * column_count);
+   memset(row2column, 0, sizeof(int) * row_count);
+   return;
+   }
+
memset(column2row, -1, sizeof(int) * column_count);
memset(row2column, -1, sizeof(int) * row_count);
ALLOC_ARRAY(v, column_count);
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..fb4c13a84a 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,4 +142,9 @@ test_expect_success 'changed message' '
test_cmp expected actual
 '
 
+test_expect_success 'no commits on one side' '
+   git commit --amend -m "new message" &&
+   git range-diff master HEAD@{1} HEAD
+'
+
 test_done
-- 
2.19.0.397.gdd90340f6a



Re: [PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-13 Thread Thomas Gummerer
On 09/12, Johannes Schindelin wrote:
> Hi Thomas,
> 
> [quickly, as I will go back to a proper vacation after this]

Sorry about interrupting your vacation, enjoy wherever you are! :)

> On Wed, 12 Sep 2018, Thomas Gummerer wrote:
> 
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int *cost,
> > else if (j1 < -1)
> > row2column[i] = -2 - j1;
> > else {
> > -   int min = COST(!j1, i) - v[!j1];
> > -   for (j = 1; j < column_count; j++)
> > +   int min = INT_MAX;
> 
> I am worried about this, as I tried very hard to avoid integer overruns.

Ah fair enough, now I think I understand where the calculation of the
initial value of min comes from, thanks!

> Wouldn't it be possible to replace the `else {` by an appropriate `else if
> (...) { ... } else {`? E.g. `else if (column_count < 2)` or some such?

Yes, I think that would be possible.  However if we're already special
casing "column_count < 2", I think we might as well just exit early
before running through the whole algorithm in that case.  If there's
only one column, there are no commits that can be assigned to
each other, as there is only the one.

We could also just not call 'compute_assignment' in the first
place if column_count == 1, however I'd rather make the function safer
to call, just in case we find it useful for something else in the
future.

Will send an updated patch in a bit.

> Ciao,
> Dscho
> 
> > +   for (j = 0; j < column_count; j++)
> > if (j != j1 && min > COST(j, i) - v[j])
> > min = COST(j, i) - v[j];
> > v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
> > -- 
> > 2.19.0.397.gdd90340f6a
> > 
> > 


Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Junio C Hamano
Thomas Gummerer  writes:

> Thanks, I do think this is a good idea.  I do however share Ævar's
> concern in https://public-inbox.org/git/87h8itkz2h@evledraar.gmail.com/.
> I have had TEST_GIT_INDEX_VERSION=4 set in my config.mak for quite a
> long time, and had I missed this thread, I would all of a sudden no
> longer run the test suite with index format 4, without any notice.
>
> I think the suggestion of erroring out if TEST_GIT_INDEX_VERSION is
> set would be useful in this case (and probably others in which you're
> renaming these variables).

I am not enthused by "you are using an old variable---we fail your
build/test".  The assumption is that people leave config.mak lying
around regardless of how new/old the vintage of Git they are
building and testing.  I do not think you'd want to adjust your
config.mak as you switch between 'maint' and 'next'.

I think it is OK to make it error only if the old one is set without
the new one.  Then people can have _both_ set to the same value
during the period in which the topic sails through 'pu' down to
'next' down to 'master', after seeing a failure once while building
and testing 'pu'.

> Btw, I think it would be nice to have all these renaming/documenting
> variables for the test suite patches in one series, so they are easier
> to look at with more context.

Yeah, even though these three were posted as independent changes,
their additions to t/README inevitably conflicted with each other.





Re: [PATCH v2] builtin/remote: quote remote name on error to display empty name

2018-09-13 Thread Junio C Hamano
Junio C Hamano  writes:

> Have you run "make test" with this change?  
>
> I expect at least 5505.10 to fail without adjustment.

For now, I'll queue this on top, and if this turns out to be
sufficient, I will squash it in.

 t/t5505-remote.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index 241e6a319d..d2a2cdd453 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -145,7 +145,7 @@ test_expect_success 'remove remote protects local branches' '
 test_expect_success 'remove errors out early when deleting non-existent branch' '
(
cd test &&
-   echo "fatal: No such remote: foo" >expect &&
+   echo "fatal: No such remote: '\''foo'\''" >expect &&
test_must_fail git remote rm foo 2>actual &&
test_i18ncmp expect actual
)
@@ -173,7 +173,7 @@ test_expect_success 'remove remote with a branch without configured merge' '
 test_expect_success 'rename errors out early when deleting non-existent branch' '
(
cd test &&
-   echo "fatal: No such remote: foo" >expect &&
+   echo "fatal: No such remote: '\''foo'\''" >expect &&
test_must_fail git remote rename foo bar 2>actual &&
test_i18ncmp expect actual
)


Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Thomas Gummerer
On 09/13, Ben Peart wrote:
> 
> 
> On 9/12/2018 6:31 PM, Thomas Gummerer wrote:
> > On 09/12, Ben Peart wrote:
> > > Teach get_index_format_default() to support running the test suite
> > > with specific index versions.  In particular, this enables the test suite
> > > to be run using index version 4 which is not the default so gets less 
> > > testing.
> > 
> > I found this commit message slightly misleading.  Running the test
> > suite with specific index versions is already supported, by defining
> > TEST_GIT_INDEX_VERSION in 'config.mak'.  What we're doing here is
> > introduce an additional environment variable that can also be used to
> > set the index format in tests.
> > 
> > Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the
> > test suite with index-v4.  Admittedly the name is a bit strange
> > compared to our usual GIT_TEST_* environment variable names, and it
> > should probably be documented better (it's only documented in the
> > Makefile currently), but I'm not sure we should introduce another
> > environment variable for this purpose?
> 
> TEST_GIT_INDEX_VERSION enables the testing I was looking for but you're
> right, it isn't well documented and the atypical naming and implementation
> don't help either.
> 
> I checked the documentation and code but didn't see any way to test the V4
> index code path so wrote this patch.  I wonder if we can improve the
> discoverability of TEST_GIT_INDEX_VERSION through better naming and
> documentation.
> 
> How about this as an alternative?

Thanks, I do think this is a good idea.  I do however share Ævar's
concern in https://public-inbox.org/git/87h8itkz2h@evledraar.gmail.com/.
I have had TEST_GIT_INDEX_VERSION=4 set in my config.mak for quite a
long time, and had I missed this thread, I would all of a sudden no
longer run the test suite with index format 4, without any notice.

I think the suggestion of erroring out if TEST_GIT_INDEX_VERSION is
set would be useful in this case (and probably others in which you're
renaming these variables).  Not sure how many people it would affect
(and most of those would probably read the mailing list), but it's not
a big change either.

Btw, I think it would be nice to have all these renaming/documenting
variables for the test suite patches in one series, so they are easier
to look at with more context.

> 
> diff --git a/Makefile b/Makefile
> index 5a969f5830..9e84ef02f7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -400,7 +400,7 @@ all::
>  # (defaults to "man") if you want to have a different default when
>  # "git help" is called without a parameter specifying the format.
>  #
> -# Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
> +# Define GIT_TEST_INDEX_VERSION to 2, 3 or 4 to run the test suite
>  # with a different indexfile format version.  If it isn't set the index
>  # file format used is index-v[23].
>  #
> @@ -2599,8 +2599,8 @@ endif
>  ifdef GIT_INTEROP_MAKE_OPTS
> @echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
>  endif
> -ifdef TEST_GIT_INDEX_VERSION
> -   @echo TEST_GIT_INDEX_VERSION=\''$(subst ','\'',$(subst ','\'',$(TEST_GIT_INDEX_VERSION)))'\' >>$@+
> +ifdef GIT_TEST_INDEX_VERSION
> +   @echo GIT_TEST_INDEX_VERSION=\''$(subst ','\'',$(subst ','\'',$(GIT_TEST_INDEX_VERSION)))'\' >>$@+
> 
>  endif
> @if cmp $@+ $@ >/dev/null 2>&1; then $(RM) $@+; else mv $@+ $@; fi
> 
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 44288cbb59..31698c01c4 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -134,9 +134,9 @@ export EDITOR
>  GIT_TRACE_BARE=1
>  export GIT_TRACE_BARE
> 
> -if test -n "${TEST_GIT_INDEX_VERSION:+isset}"
> +if test -n "${GIT_TEST_INDEX_VERSION:+isset}"
>  then
> -   GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
> +   GIT_INDEX_VERSION="$GIT_TEST_INDEX_VERSION"
> export GIT_INDEX_VERSION
>  fi
> 
> diff --git a/t/README b/t/README
> index 9028b47d92..f872638a78 100644
> --- a/t/README
> +++ b/t/README
> @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the object size is
>   over 2GB. This variable forces the code path on any object larger than
>bytes.
> 
> -GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE= exercises the uncommon pack-objects code
>   path where deltas larger than this limit require extra memory
>   allocation for bookkeeping.
> 
> +GIT_TEST_INDEX_VERSION= exercises the index read/write code path
> +for the index version specified.  Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +
>   Naming Tests
>   
> 


Re: [PATCH v2] builtin/remote: quote remote name on error to display empty name

2018-09-13 Thread Junio C Hamano
Shulhan  writes:

> When adding a new remote name with an empty string, git will print the
> following error message:
>
>   fatal: '' is not a valid remote name\n
>
> But when removing a remote name with an empty string as input, git
> shows the empty string without quotes:
>
>   fatal: No such remote: \n
>
> To make these error messages consistent, quote the name of the remote
> that we tried and failed to find.
>
> Signed-off-by: Shulhan 
> Reviewed-by: Junio C Hamano 
> ---
>  builtin/remote.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Have you run "make test" with this change?  

I expect at least 5505.10 to fail without adjustment.


Re: [PATCH v1] preload-index: update GIT_FORCE_PRELOAD_TEST support

2018-09-13 Thread Junio C Hamano
Ben Peart  writes:

> Rename GIT_FORCE_PRELOAD_TEST to GIT_TEST_PRELOAD for consistency with the
> other GIT_TEST_ special setups and properly document its use.
>
> Signed-off-by: Ben Peart 
> ---
>

Among the two unrelated changes that are not mentioned in the
proposed log message, the change to the test to use sane_unset is
probably OK, but replacing a call to getenv() with git_env_bool()
without making sure necessary header file(s) are included would
break the build.

I _think_ config.h is the header to include; I didn't try it,
though.

Thanks.


> Notes:
> Base Ref: v2.19.0
> Web-Diff: https://github.com/benpeart/git/commit/dcd201b920
> Checkout: git fetch https://github.com/benpeart/git git-test-preload-v1 
> && git checkout dcd201b920
>
>  preload-index.c | 2 +-
>  t/README| 3 +++
>  t/t7519-status-fsmonitor.sh | 4 ++--
>  3 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/preload-index.c b/preload-index.c
> index 71cd2437a3..cc8a7333c2 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -84,7 +84,7 @@ static void preload_index(struct index_state *index,
>   return;
>  
>   threads = index->cache_nr / THREAD_COST;
> - if ((index->cache_nr > 1) && (threads < 2) && getenv("GIT_FORCE_PRELOAD_TEST"))
> + if ((index->cache_nr > 1) && (threads < 2) && git_env_bool("GIT_TEST_PRELOAD", 0))
>   threads = 2;
>   if (threads < 2)
>   return;
> diff --git a/t/README b/t/README
> index 9028b47d92..73fb09560f 100644
> --- a/t/README
> +++ b/t/README
> @@ -319,6 +319,9 @@ GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.
>  
> +GIT_TEST_PRELOAD= exercises the preload-index code path by
> +overriding the minimum number of cache entries required per thread.
> +
>  Naming Tests
>  
>  
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index 756beb0d8e..9b703d49b5 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -245,9 +245,9 @@ do
>   git config core.preloadIndex $preload_val &&
>   if test $preload_val = true
>   then
> - GIT_FORCE_PRELOAD_TEST=$preload_val; export GIT_FORCE_PRELOAD_TEST
> + GIT_TEST_PRELOAD=$preload_val; export GIT_TEST_PRELOAD
>   else
> - unset GIT_FORCE_PRELOAD_TEST
> + sane_unset GIT_TEST_PRELOAD
>   fi
>   '
>  
>
> base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670


Re: [PATCH v2 1/1] commit-reach: properly peel tags

2018-09-13 Thread Junio C Hamano
Derrick Stolee  writes:

>> +if (!from_one || from_one->type != OBJ_COMMIT) {
>> +/* no way to tell if this is reachable by
>> + * looking at the ancestry chain alone, so
>> + * leave a note to ourselves not to worry about
>> + * this object anymore.
>> + */
>> +from->objects[i].item->flags |= assign_flag;
>> +continue;
>> +}
>> +
>> +list[nr_commits] = (struct commit *)from_one;
>> +if (parse_commit(list[nr_commits]) ||
>> +list[nr_commits]->generation < min_generation)
>> +return 0; /* is this a leak? */
>
> Of course, after sending v2, I see this comment. This is a leak of
> 'list' and should be fixed.
>
> Not only is it a leak here, it is also a leak in the 'cleanup'
> section. I'll squash the following into v3, but I'll let v2 simmer for
> review before rerolling.
>
> diff --git a/commit-reach.c b/commit-reach.c
> index 4048a2132a..c457d8d85f 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -569,8 +569,11 @@ int can_all_from_reach_with_flag(struct
> object_array *from,
>
>     list[nr_commits] = (struct commit *)from_one;
>     if (parse_commit(list[nr_commits]) ||
> -   list[nr_commits]->generation < min_generation)
> -   return 0; /* is this a leak? */
> +   list[nr_commits]->generation < min_generation) {
> +   result = 0;
> +   goto cleanup;
> +   }
> +
>     nr_commits++;
>     }
>
> @@ -623,6 +626,7 @@ int can_all_from_reach_with_flag(struct
> object_array *from,
>     clear_commit_marks(list[i], RESULT);
>     clear_commit_marks(list[i], assign_flag);
>     }
> +   free(list);
>     return result;
>  }

With this, commit marks are cleared even when we do the "early
return", but only for the objects that appear in the resulting
list[].  Because the for() loop in the last hunk iterates over
list[], those non-commit objects that are smudged with assign_flag
but never made it into list[] will be left smudged when this function
returns (this is true even when there is no early return).

Is that intended?




Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Wed, Sep 12 2018, Ben Peart wrote:
>
>> -GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
>> +GIT_TEST_OE_DELTA_SIZE= exercises the uncommon pack-objects code
>>  path where deltas larger than this limit require extra memory
>>  allocation for bookkeeping.
>
> If re-rolled maybe better as a leading "fix a typo" patch.

Yeah, an easy to miss change that is unrelated to the topic is
better made as a separate patch.



Re: [PATCH 1/2] fetch-object: provide only one fetching function

2018-09-13 Thread Junio C Hamano
Jonathan Tan  writes:

>> Instead of explaining why the new convention is better to justify
>> (2), the above three lines handwave by saying "more flexible"
>> twice.  We should do better.
>> 
>>  fetch-object: unify fetch_object[s] functions
>> 
>>  There are fetch_object() and fetch_objects() helpers in
>>  fetch-object.h; as the latter takes "struct oid_array",
>>  the former cannot be made into a thin wrapper around the
>>  latter without an extra allocation and set-up cost.
>> 
>>  Update fetch_objects() to take an array of "struct
>>  object_id" and number of elements in it as separate
>>  parameters, remove fetch_object(), and adjust all existing
>>  callers of these functions to use the new fetch_objects().
>> 
>> perhaps?
>
> Thanks - your explanation is much clearer than mine. Let me know if you
> want a reroll (or if you can update the commit message yourself, that's
> fine too).

If there is no other change needed for either of the patches, I do
not mind rewording the 1/2 myself to save a round-trip.

Thanks.


Re: [PATCH] sequencer: fix --allow-empty-message behavior, make it smarter

2018-09-13 Thread Junio C Hamano
Elijah Newren  writes:

> This patch cleanly applies to both 2.19.0 and pu.  There are some related
> code cleanups that I'd like to make, but doing that cleanup conflicts with
> the various rewrite-rebase-in-C topics sitting in pu; since those are
> fairly lengthy, I really don't want to cause problems there, but I think
> SZEDER really wants this 2.19.0 regression fix before 2.20.0 and thus
> before those other topics.

Oh absolutely.  Materials for 2.19.x maintenance track can and
should jump over other topics for 2.20 and later.  Thanks for being
considerate.

> @@ -899,7 +899,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
>   if ((flags & ALLOW_EMPTY))
>   argv_array_push(, "--allow-empty");
>  
> - if (opts->allow_empty_message)
> + if (!(flags & EDIT_MSG))
>   argv_array_push(, "--allow-empty-message");

Hmph.  I briefly wondered if an alternative logic would be better:

If and only the original commit being rewritten is empty,
then we allow the result to be empty.

But looking at EDIT_MSG would be more explicit and probably is a
better idea.  That would allow you to abort a reword of a commit
whose message is empty.

The reason why I thought about the alt logic is because I am worried
about a use case where

$ GIT_EDITOR=: git rebase|cherry-pick ...

is used to say "I do not want you to go interactive, when the only
interaction needed from me is to edit the message---I am perfectly
happy with the messages of commits being replayed or ones you come
up with as the default".  Because "--allow-empty-message" tends to
make things _less_ interactive, the worry is unfounded.



Re: [PATCH 7/9] submodule: fetch in submodules git directory instead of in worktree

2018-09-13 Thread Stefan Beller
On Wed, Sep 12, 2018 at 11:36 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > This patch started as a refactoring to make 'get_next_submodule' more
> > readable, but upon doing so, I realized that git-fetch actually doesn't
> > need to be run in the worktree. So let's run it in the git dir instead.
>
> It may be clear to the author but not clear to the reader of the
> above paragraph that "worktree", "fetch" and "git dir" all refer to
> the recursively invoked operation that updates the submodules
> repository.  s/git-fetch/"git fetch" for the submodule/ should be
> sufficient to help the readers.
>
> > That should pave the way towards fetching submodules that are currently
> > not checked out.
>
> Very good.
>
> > +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
> > +{
> > + prepare_submodule_repo_env_no_git_dir(out);
> > + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
> > +}
> > +
> >  /* Helper function to display the submodule header line prior to the full
> >   * summary output. If it can locate the submodule objects directory it will
> >   * attempt to lookup both the left and right commits and put them into the
> > @@ -1227,6 +1233,27 @@ static int get_fetch_recurse_config(const struct submodule *submodule,
> >   return spf->default_option;
> >  }
> >
> > +static const char *get_submodule_git_dir(struct repository *r, const char *path)
> > +{
> > + struct repository subrepo;
> > + const char *ret;
> > +
> > + if (repo_submodule_init(&subrepo, r, path)) {
> > + /* no entry in .gitmodules? */
> > + struct strbuf gitdir = STRBUF_INIT;
> > + strbuf_repo_worktree_path(&gitdir, r, "%s/.git", path);
> > + if (repo_init(&subrepo, gitdir.buf, NULL)) {
> > + strbuf_release(&gitdir);
> > + return NULL;
> > + }
>
> This is for the modern "absorbed" layout?  Do we get a notice and
> encouragement to migrate from the historical layout, or there is no
> need to (e.g. the migration happens automatically in some other
> codepaths)?

No, the absorbed would also be handled by repo_submodule_init.
I wrote a patch once to migrate repo_submodule_init to take a
"struct *submodule" instead of a path as the third argument, which
would fall in line with this patch as well, I'll dig it up.

Historically git-fetch supported repositories that are not submodules
(but have a gitlink and a working tree in place) as well. That is covered
here. (see comment /* no entry in .gitmodules? */)

> > - strbuf_release(_path);
> > - strbuf_release(_git_dir);
>
> But if it is a leak, it is easily plugged by freeing git_dir here, I
> think.

Thanks.


Re: [PATCH v5 11/12] t1407: make hash size independent

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Thu, Sep 13 2018, brian m. carlson wrote:

> - $RWT for-each-reflog | cut -c 42- | sort >actual &&
> + $RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&

Aside from hash size issues, this just makes the tests easier to
read. Thanks!


Re: [PATCH v2 10/11] multi-pack-index: report progress during 'verify'

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Thu, Sep 13 2018, Derrick Stolee via GitGitGadget wrote:

> + progress = start_progress(_("Verifying object offsets"), m->num_objects);

I think in the spirit of my "commit-graph {write,verify}: add progress
output" it would be better to say:

"Verifying multi-pack-index object offsets"

I.e. both to make it clearer what's actually going on (without much
added verbosity), and also so that when the user turns on this feature
it's clear what gc / fsck etc. are doing for that feature in particular,
as with the commit-graph.


Re: [PATCH v1] fsmonitor: update GIT_TEST_FSMONITOR support

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Thu, Sep 13 2018, Ben Peart wrote:

> diff --git a/config.c b/config.c
> index 3461993f0a..3555c63f28 100644
> --- a/config.c
> +++ b/config.c
> @@ -2278,7 +2278,7 @@ int git_config_get_max_percent_split_change(void)
>  int git_config_get_fsmonitor(void)
>  {
>   if (git_config_get_pathname("core.fsmonitor", _fsmonitor))
> - core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> + core_fsmonitor = getenv("GIT_TEST_FSMONITOR");
>
>   if (core_fsmonitor && !*core_fsmonitor)
>   core_fsmonitor = NULL;
> diff --git a/t/README b/t/README
> index 9028b47d92..545438c820 100644
> --- a/t/README
> +++ b/t/README
> @@ -319,6 +319,10 @@ GIT_TEST_OE_DELTA_SIZE= exercises the uncomon 
> pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.
>
> +GIT_TEST_FSMONITOR=$PWD/t7519/fsmonitor-all exercises the fsmonitor
> +code path for utilizing a file system monitor to speed up detecting
> +new or changed files.
> +
>  Naming Tests
>  

I've seen this & will watch out for it, but still, when I'm updating to
"next" in a couple of months I may not be tracking the exact state of
the integration of this patch, and then running with
GIT_FSMONITOR_TEST=... will suddenly be a no-op.

So maybe something like this to test-lib.sh as well (or directly in
config.c):

if test -n "$GIT_FSMONITOR_TEST"
then
	echo "The GIT_FSMONITOR_TEST variable has been renamed to GIT_TEST_FSMONITOR"
	exit 1
fi

Maybe I'm being too nitpicky and there are only two of us who run the
tests with this anyway, and we can deal with it.

It just rubs me the wrong way that we have a test mode that silently
stops being picked up because a command-line option or env variable got
renamed, especially since we've had it for 4 stable releases, especially
since it's so easy for us to avoid that confusion (just die),
vs. potential time wasted downstream (wondering why fsmonitor stuff
broke on $SOME_OS even though we're testing for it during package
build...).


Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 12 2018, Ben Peart wrote:

> -GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE= exercises the uncommon pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.

If re-rolled maybe better as a leading "fix a typo" patch.

I was reading this in a mobile client earlier and couldn't see at a
glance (small screen+wrapping) what the change was here, so that sort of
nitpicking isn't purely theoretical :)

> +GIT_TEST_INDEX_VERSION= exercises the index read/write code path
> +for the index version specified.  Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +

I'm not familiar with this and haven't dug into it, so I don't know:
are any values of 1..4 OK? 0..4? It would be better to say that here...
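For what it's worth, the Makefile comment later in this series says
"2, 3 or 4"; a minimal sketch of the fallback behavior (bounds and
default assumed to mirror INDEX_FORMAT_LB/UB and INDEX_FORMAT_DEFAULT
in read-cache.c) would be:

```shell
# Sketch of the range check: versions outside [2,4] fall back to the
# default (3). Constants are assumed from read-cache.c, not quoted.
version="${GIT_TEST_INDEX_VERSION:-3}"
if test "$version" -lt 2 || test "$version" -gt 4
then
	version=3
fi
echo "using index version $version"
```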


Re: [PATCH 2/3] archive: implement protocol v2 archive command

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 12 2018, Stefan Beller wrote:

> On Tue, Sep 11, 2018 at 10:36 PM Josh Steadmon  wrote:
>> +*/
>> +   status = packet_reader_read();
>> +   }
>> +   if (status != PACKET_READ_DELIM)
>> +   die(_("upload-archive: expected delim packet"));
>
> This is upload-archive, which is a low level plumbing command
> (see the main man page of git for an explanation of that category),
> so we do not translate the error/die() calls. Besides, this is executed
> on the server, which might have a different locale than the requesting
> client?
>
> Would asking for a setlocale() on the server side be an unreasonable
> feature request for the capabilities (in a follow up patch, and then not
> just for archive but also fetch/push, etc.)?

This would be very nice to have, but as you suggest in some follow-up
change.

I think though that instead of doing setlocale() it would be better to
pass some flag saying we're operating in a machine-readable mode, and
then we'd (as part of the protocol definition) say we're going to emit
GIT_ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET or whatever.

Advantages of doing that over a server-side setlocale():

 1) Purely for translation purposes, users can update to a newer client
to get new translations, even though they're talking to an old
server.

 2) Again, only for translation purposes, servers may not have the
appropriate locales generated and/or linked to libgettext.

 3) Ditto, some clients that aren't git.git may want/need to emit
different translation messages to their consumers than what we have,
think some GUI client / Emacs magit etc. whose UI is different from
ours.

 4) Aside from translation purposes, getting a machine-readable
"push/pull" etc. mode would be very handy. E.g. now you need to
parse stderr to see why exactly your push failed (hook denied, or
non-fast-forward, or non-fast-forward where there was a lock race
condition? ...).

I also wonder if something like #4 wouldn't complement something like
the proposed structured logging[1]. I.e. even though we'd like to run
git.git, and present exactly the message to the user we do now, we might
want to run in such a machine-readable mode under the hood when talking
to the server so we can log exactly how the push went / how it failed
for the purposes of aggregation.

1. https://public-inbox.org/git/20180713165621.52017-2-...@jeffhostetler.com/
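A tiny sketch of what #4 could buy a client (the error tokens below
are invented for illustration, not existing protocol elements): a
wrapper could dispatch on a stable code instead of grepping localized
stderr.

```shell
# Hypothetical machine-readable error token (names invented); with
# such a mode a client branches on the code rather than parsing prose.
err="ERR_UPLOAD_ARCHIVE_EXPECTED_DELIM_PACKET"
case "$err" in
ERR_*_EXPECTED_DELIM_PACKET)
	echo "protocol error: missing delim packet" ;;
ERR_PUSH_NON_FAST_FORWARD)
	echo "push rejected: non-fast-forward" ;;
*)
	echo "unknown error: $err" ;;
esac
```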


[PATCH v2] read-cache: update TEST_GIT_INDEX_VERSION support

2018-09-13 Thread Ben Peart
Rename TEST_GIT_INDEX_VERSION to GIT_TEST_INDEX_VERSION for consistency with
the other GIT_TEST_ special setups and properly document its use.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: v2.19.0
Web-Diff: https://github.com/benpeart/git/commit/e26ccb9004
Checkout: git fetch https://github.com/benpeart/git 
git-test-index-version-v2 && git checkout e26ccb9004

### Interdiff (v1..v2):

diff --git a/Makefile b/Makefile
index 5a969f5830..9e84ef02f7 100644
--- a/Makefile
+++ b/Makefile
@@ -400,7 +400,7 @@ all::
 # (defaults to "man") if you want to have a different default when
 # "git help" is called without a parameter specifying the format.
 #
-# Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
+# Define GIT_TEST_INDEX_VERSION to 2, 3 or 4 to run the test suite
 # with a different indexfile format version.  If it isn't set the index
 # file format used is index-v[23].
 #
@@ -2599,8 +2599,8 @@ endif
 ifdef GIT_INTEROP_MAKE_OPTS
@echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst 
','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
 endif
-ifdef TEST_GIT_INDEX_VERSION
-   @echo TEST_GIT_INDEX_VERSION=\''$(subst ','\'',$(subst 
','\'',$(TEST_GIT_INDEX_VERSION)))'\' >>$@+
+ifdef GIT_TEST_INDEX_VERSION
+   @echo GIT_TEST_INDEX_VERSION=\''$(subst ','\'',$(subst 
','\'',$(GIT_TEST_INDEX_VERSION)))'\' >>$@+
 endif
@if cmp $@+ $@ >/dev/null 2>&1; then $(RM) $@+; else mv $@+ $@; fi

diff --git a/read-cache.c b/read-cache.c
index d140ce9989..7b1354d759 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1570,45 +1570,26 @@ static unsigned int get_index_format_default(void)
char *envversion = getenv("GIT_INDEX_VERSION");
char *endp;
int value;
-   unsigned int version = -1;
+   unsigned int version = INDEX_FORMAT_DEFAULT;

-   if (envversion) {
-   version = strtoul(envversion, , 10);
-   if (*endp ||
-   version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) 
{
-   warning(_("GIT_INDEX_VERSION set, but the value is 
invalid.\n"
-   "Using version %i"), INDEX_FORMAT_DEFAULT);
-   version = INDEX_FORMAT_DEFAULT;
-   }
-   }
-
-   if (version == -1) {
-   if (!git_config_get_int("index.version", )) {
+   if (!envversion) {
+   if (!git_config_get_int("index.version", ))
version = value;
if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
warning(_("index.version set, but the value is 
invalid.\n"
  "Using version %i"), INDEX_FORMAT_DEFAULT);
-   version = INDEX_FORMAT_DEFAULT;
-   }
+   return INDEX_FORMAT_DEFAULT;
}
+   return version;
}

-   if (version == -1) {
-   envversion = getenv("GIT_TEST_INDEX_VERSION");
-   if (envversion) {
version = strtoul(envversion, , 10);
if (*endp ||
version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
-   warning(_("GIT_TEST_INDEX_VERSION set, but the 
value is invalid.\n"
+   warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
  "Using version %i"), INDEX_FORMAT_DEFAULT);
version = INDEX_FORMAT_DEFAULT;
}
-   }
-   }
-
-   if (version == -1)
-   version = INDEX_FORMAT_DEFAULT;
-
return version;
 }

diff --git a/t/README b/t/README
index f872638a78..fb6359b17b 100644
--- a/t/README
+++ b/t/README
@@ -320,8 +320,7 @@ path where deltas larger than this limit require extra 
memory
 allocation for bookkeeping.

 GIT_TEST_INDEX_VERSION= exercises the index read/write code path
-for the index version specified.  Can be set to any valid version
-but the non-default version 4 is probably the most beneficial.
+for the index version specified.  Can be set to any valid version.

 Naming Tests
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 44288cbb59..31698c01c4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -134,9 +134,9 @@ export EDITOR
 GIT_TRACE_BARE=1
 export GIT_TRACE_BARE

-if test -n "${TEST_GIT_INDEX_VERSION:+isset}"
+if test -n "${GIT_TEST_INDEX_VERSION:+isset}"
 then
-   GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
+   GIT_INDEX_VERSION="$GIT_TEST_INDEX_VERSION"
export GIT_INDEX_VERSION
 fi

### Patches

 Makefile  | 6 +++---
 t/README  | 5 -
 t/test-lib.sh | 4 ++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile

Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee

On 9/13/2018 1:40 PM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:

+   then
+   line_num=$(($line_num + 1))
+   fi
+   fi
+   done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
my $line_num;
while (<>) {
# Hunk header?  Grab the beginning in postimage.
if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
$line_num = $1;
next;
}

# Have we seen a hunk?  Ignore "diff --git" etc.
next unless defined $line_num;

# Deleted line? Ignore.
if (/^-/) {
next;
}

# Show only the line number of added lines.
if (/^\+/) {
print "$line_num\n";
}
# Either common context or added line appear in
# the postimage.  Count it.
$line_num++;
}
'

or something like that, given that you seem to only need line
numbers in new_lines.txt anyway?


Thanks for the deep dive here, especially with the perl assistance. I've 
never written any perl, but it seems like the right tool here. I'll have 
time to revisit this next week.


Thanks,

-Stolee



Re: [PATCH 1/2] fetch-object: provide only one fetching function

2018-09-13 Thread Jonathan Tan
> Instead of explaining why the new convention is better to justify
> (2), the above three lines handwave by saying "more flexible"
> twice.  We should do better.
> 
>   fetch-object: unify fetch_object[s] functions
> 
>   There are fetch_object() and fetch_objects() helpers in
>   fetch-object.h; as the latter takes "struct oid_array",
>   the former cannot be made into a thin wrapper around the
>   latter without an extra allocation and set-up cost.
> 
>   Update fetch_objects() to take an array of "struct
>   object_id" and number of elements in it as separate
>   parameters, remove fetch_object(), and adjust all existing
>   callers of these functions to use the new fetch_objects().
> 
> perhaps?

Thanks - your explanation is much clearer than mine. Let me know if you
want a reroll (or if you can update the commit message yourself, that's
fine too).


Re: [PATCH v1] fsmonitor: update GIT_TEST_FSMONITOR support

2018-09-13 Thread Junio C Hamano
Ben Peart  writes:

> Rename GIT_FSMONITOR_TEST to GIT_TEST_FSMONITOR for consistency with the
> other GIT_TEST_ special setups and properly document its use.

Makes sense.

Thanks for such an attention to detail.

>
> Signed-off-by: Ben Peart 
> ---
>
> Notes:
> Base Ref: v2.19.0
> Web-Diff: https://github.com/benpeart/git/commit/311484a684
> Checkout: git fetch https://github.com/benpeart/git git-test-fsmonitor-v1 
> && git checkout 311484a684
>
>  config.c| 2 +-
>  t/README| 4 
>  t/t1700-split-index.sh  | 2 +-
>  t/t7519-status-fsmonitor.sh | 2 +-
>  4 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/config.c b/config.c
> index 3461993f0a..3555c63f28 100644
> --- a/config.c
> +++ b/config.c
> @@ -2278,7 +2278,7 @@ int git_config_get_max_percent_split_change(void)
>  int git_config_get_fsmonitor(void)
>  {
>   if (git_config_get_pathname("core.fsmonitor", _fsmonitor))
> - core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
> + core_fsmonitor = getenv("GIT_TEST_FSMONITOR");
>  
>   if (core_fsmonitor && !*core_fsmonitor)
>   core_fsmonitor = NULL;
> diff --git a/t/README b/t/README
> index 9028b47d92..545438c820 100644
> --- a/t/README
> +++ b/t/README
> @@ -319,6 +319,10 @@ GIT_TEST_OE_DELTA_SIZE= exercises the uncomon 
> pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.
>  
> +GIT_TEST_FSMONITOR=$PWD/t7519/fsmonitor-all exercises the fsmonitor
> +code path for utilizing a file system monitor to speed up detecting
> +new or changed files.
> +
>  Naming Tests
>  
>  
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index b3b4d83eaf..f6a856f24c 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -6,7 +6,7 @@ test_description='split index mode tests'
>  
>  # We need total control of index splitting here
>  sane_unset GIT_TEST_SPLIT_INDEX
> -sane_unset GIT_FSMONITOR_TEST
> +sane_unset GIT_TEST_FSMONITOR
>  
>  test_expect_success 'enable split index' '
>   git config splitIndex.maxPercentChange 100 &&
> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index 756beb0d8e..d77012ea6d 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -8,7 +8,7 @@ test_description='git status with file system watcher'
>  # To run the entire git test suite using fsmonitor:
>  #
>  # copy t/t7519/fsmonitor-all to a location in your path and then set
> -# GIT_FSMONITOR_TEST=fsmonitor-all and run your tests.
> +# GIT_TEST_FSMONITOR=fsmonitor-all and run your tests.
>  #
>  
>  # Note, after "git reset --hard HEAD" no extensions exist other than 'TREE'
>
> base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670


[PATCH v2 01/11] multi-pack-index: add 'verify' verb

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The multi-pack-index builtin writes multi-pack-index files, and
uses a 'write' verb to do so. Add a 'verify' verb that checks this
file matches the contents of the pack-indexes it replaces.

The current implementation is a no-op, but will be extended in
small increments in later commits.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 10 ++
 builtin/multi-pack-index.c |  4 +++-
 midx.c | 13 +
 midx.h |  1 +
 t/t5319-multi-pack-index.sh|  8 
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 1f97e79912..f7778a2c85 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -27,6 +27,10 @@ write::
When given as the verb, write a new MIDX file to
`/packs/multi-pack-index`.
 
+verify::
+   When given as the verb, verify the contents of the MIDX file
+   at `/packs/multi-pack-index`.
+
 
 EXAMPLES
 
@@ -43,6 +47,12 @@ $ git multi-pack-index write
 $ git multi-pack-index --object-dir  write
 ---
 
+* Verify the MIDX file for the packfiles in the current .git folder.
++
+---
+$ git multi-pack-index verify
+---
+
 
 SEE ALSO
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2633efd95d..fca70f8e4f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=] write"),
+   N_("git multi-pack-index [--object-dir=] (write|verify)"),
NULL
 };
 
@@ -42,6 +42,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
+   if (!strcmp(argv[0], "verify"))
+   return verify_midx_file(opts.object_dir);
 
die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index f3e8dbc108..b253bed517 100644
--- a/midx.c
+++ b/midx.c
@@ -928,3 +928,16 @@ void clear_midx_file(const char *object_dir)
 
free(midx);
 }
+
+int verify_midx_error;
+
+int verify_midx_file(const char *object_dir)
+{
+   struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+   verify_midx_error = 0;
+
+   if (!m)
+   return 0;
+
+   return verify_midx_error;
+}
diff --git a/midx.h b/midx.h
index a210f1af2a..ce80b91c68 100644
--- a/midx.h
+++ b/midx.h
@@ -43,5 +43,6 @@ int prepare_multi_pack_index_one(struct repository *r, const 
char *object_dir, i
 
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
+int verify_midx_file(const char *object_dir);
 
 #endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6f56b38674..1c4e0e6d31 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -150,6 +150,10 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'verify multi-pack-index success' '
+   git multi-pack-index verify --object-dir=$objdir
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
@@ -214,4 +218,8 @@ test_expect_success 'force some 64-bit offsets with 
pack-objects' '
midx_read_expect 1 63 5 objects64 " large-offsets"
 '
 
+test_expect_success 'verify multi-pack-index with 64-bit offsets' '
+   git multi-pack-index verify --object-dir=objects64
+'
+
 test_done
-- 
gitgitgadget



[PATCH v2 10/11] multi-pack-index: report progress during 'verify'

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When verifying a multi-pack-index, the only action that takes
significant time is checking the object offsets. For example,
to verify a multi-pack-index containing 6.2 million objects in
the Linux kernel repository takes 1.3 seconds on my machine.
99% of that time is spent looking up object offsets in each of
the packfiles and comparing them to the multi-pack-index offset.

Add a progress indicator for that section of the 'verify' verb.

Signed-off-by: Derrick Stolee 
---
 midx.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/midx.c b/midx.c
index 47e7e6113a..4d4c930522 100644
--- a/midx.c
+++ b/midx.c
@@ -7,6 +7,7 @@
 #include "object-store.h"
 #include "sha1-lookup.h"
 #include "midx.h"
+#include "progress.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -940,6 +941,7 @@ static void midx_report(const char *fmt, ...)
 int verify_midx_file(const char *object_dir)
 {
uint32_t i;
+   struct progress *progress = NULL;
struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
verify_midx_error = 0;
 
@@ -971,6 +973,7 @@ int verify_midx_file(const char *object_dir)
i, oid_to_hex(), oid_to_hex(), i 
+ 1);
}
 
+   progress = start_progress(_("Verifying object offsets"), 
m->num_objects);
for (i = 0; i < m->num_objects; i++) {
struct object_id oid;
struct pack_entry e;
@@ -995,7 +998,10 @@ int verify_midx_file(const char *object_dir)
if (m_offset != p_offset)
midx_report(_("incorrect object offset for oid[%d] = 
%s: %"PRIx64" != %"PRIx64),
i, oid_to_hex(), m_offset, p_offset);
+
+   display_progress(progress, i + 1);
}
+   stop_progress();
 
return verify_midx_error;
 }
-- 
gitgitgadget



[PATCH v2 04/11] multi-pack-index: verify packname order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The final check we make while loading a multi-pack-index is that
the packfile names are in lexicographical order. Make this error
be a die() instead.

In order to test this condition, we need multiple packfiles.
Earlier in t5319-multi-pack-index.sh, we tested the interaction with
'git repack' but this limits us to one packfile in our object dir.
Move these repack tests until after the 'verify' tests.

Signed-off-by: Derrick Stolee 
---
 midx.c  |  6 ++
 t/t5319-multi-pack-index.sh | 10 ++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 8b054b39ab..e655a15aed 100644
--- a/midx.c
+++ b/midx.c
@@ -157,12 +157,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir, int local
 
cur_pack_name += strlen(cur_pack_name) + 1;
 
-   if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
-   error(_("multi-pack-index pack names out of order: '%s' 
before '%s'"),
+   if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
+   die(_("multi-pack-index pack names out of order: '%s' 
before '%s'"),
  m->pack_names[i - 1],
  m->pack_names[i]);
-   goto cleanup_fail;
-   }
}
 
return m;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index c54b6e7188..01a3cd6b00 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -181,6 +181,11 @@ MIDX_BYTE_CHUNK_COUNT=6
 MIDX_HEADER_SIZE=12
 MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
 MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
+MIDX_NUM_CHUNKS=5
+MIDX_CHUNK_LOOKUP_WIDTH=12
+MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
+$MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
+MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 
 test_expect_success 'verify bad version' '
corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -212,6 +217,11 @@ test_expect_success 'verify invalid chunk offset' '
"invalid chunk offset (too large)"
 '
 
+test_expect_success 'verify packnames out of order' '
+   corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
+   "pack names out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget



[PATCH v2 09/11] multi-pack-index: verify object offsets

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The 'git multi-pack-index verify' command must verify the object
offsets stored in the multi-pack-index are correct. There are two
ways the offset chunk can be incorrect: the pack-int-id and the
object offset.

Replace the BUG() statement with a die() statement, now that we
may hit a bad pack-int-id during a 'verify' command on a corrupt
multi-pack-index, and it is covered by a test.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 29 -
 t/t5319-multi-pack-index.sh | 27 +++
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 80094c02a7..47e7e6113a 100644
--- a/midx.c
+++ b/midx.c
@@ -197,7 +197,8 @@ int prepare_midx_pack(struct multi_pack_index *m, uint32_t 
pack_int_id)
struct strbuf pack_name = STRBUF_INIT;
 
if (pack_int_id >= m->num_packs)
-   BUG("bad pack-int-id");
+   die(_("bad pack-int-id: %u (%u total packs"),
+   pack_int_id, m->num_packs);
 
if (m->packs[pack_int_id])
return 0;
@@ -970,5 +971,31 @@ int verify_midx_file(const char *object_dir)
i, oid_to_hex(), oid_to_hex(), i 
+ 1);
}
 
+   for (i = 0; i < m->num_objects; i++) {
+   struct object_id oid;
+   struct pack_entry e;
+   off_t m_offset, p_offset;
+
+   nth_midxed_object_oid(, m, i);
+   if (!fill_midx_entry(, , m)) {
+   midx_report(_("failed to load pack entry for oid[%d] = 
%s"),
+   i, oid_to_hex());
+   continue;
+   }
+
+   if (open_pack_index(e.p)) {
+   midx_report(_("failed to load pack-index for packfile 
%s"),
+   e.p->pack_name);
+   break;
+   }
+
+   m_offset = e.offset;
+   p_offset = find_pack_entry_one(oid.hash, e.p);
+
+   if (m_offset != p_offset)
+   midx_report(_("incorrect object offset for oid[%d] = 
%s: %"PRIx64" != %"PRIx64),
+   i, oid_to_hex(), m_offset, p_offset);
+   }
+
return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index a968b9a959..828c240389 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -176,6 +176,7 @@ test_expect_success 'verify bad signature' '
 '
 
 HASH_LEN=20
+NUM_OBJECTS=74
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -192,6 +193,10 @@ MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * 
$MIDX_OID_FANOUT_WIDTH + 1))
 MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * 
$MIDX_OID_FANOUT_WIDTH))
 MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * 
$HASH_LEN))
+MIDX_OFFSET_WIDTH=8
+MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH 
+ 2))
+MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))
 
 test_expect_success 'verify bad version' '
corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -243,6 +248,16 @@ test_expect_success 'verify oid lookup out of order' '
"oid lookup out of order"
 '
 
+test_expect_success 'verify incorrect pack-int-id' '
+   corrupt_midx_and_verify $MIDX_BYTE_PACK_INT_ID "\07" $objdir \
+   "bad pack-int-id"
+'
+
+test_expect_success 'verify incorrect offset' '
+   corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+   "incorrect object offset"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
@@ -310,4 +325,16 @@ test_expect_success 'verify multi-pack-index with 64-bit 
offsets' '
git multi-pack-index verify --object-dir=objects64
 '
 
+NUM_OBJECTS=63
+MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * 
$MIDX_OID_FANOUT_WIDTH))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * 
$HASH_LEN))
+MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * 
$MIDX_OFFSET_WIDTH))
+MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
+
+test_expect_success 'verify incorrect 64-bit offset' '
+   corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
+   "incorrect object offset"
+'
+
 test_done
-- 
gitgitgadget



[PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When loading a 64-bit offset, we intend to check that off_t can store
the resulting offset. However, the condition accidentally checks the
32-bit offset to see if it is smaller than a 64-bit value. Fix it,
and this will be covered by a test in the 'git multi-pack-index verify'
command in a later commit.

Signed-off-by: Derrick Stolee 
---
 midx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 06d5cfc826..80094c02a7 100644
--- a/midx.c
+++ b/midx.c
@@ -236,7 +236,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, 
uint32_t pos)
offset32 = get_be32(offset_data + sizeof(uint32_t));
 
if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
-   if (sizeof(offset32) < sizeof(uint64_t))
+   if (sizeof(off_t) < sizeof(uint64_t))
die(_("multi-pack-index stores a 64-bit offset, but 
off_t is too small"));
 
offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
-- 
gitgitgadget



[PATCH v2 05/11] multi-pack-index: verify missing pack

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Signed-off-by: Derrick Stolee 
---
 midx.c  | 16 
 t/t5319-multi-pack-index.sh |  5 +
 2 files changed, 21 insertions(+)

diff --git a/midx.c b/midx.c
index e655a15aed..a02b19efc1 100644
--- a/midx.c
+++ b/midx.c
@@ -926,13 +926,29 @@ void clear_midx_file(const char *object_dir)
 
 int verify_midx_error;
 
+static void midx_report(const char *fmt, ...)
+{
+   va_list ap;
+   verify_midx_error = 1;
+   va_start(ap, fmt);
+   vfprintf(stderr, fmt, ap);
+   fprintf(stderr, "\n");
+   va_end(ap);
+}
+
 int verify_midx_file(const char *object_dir)
 {
+   uint32_t i;
struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
verify_midx_error = 0;
 
if (!m)
return 0;
 
+   for (i = 0; i < m->num_packs; i++) {
+   if (prepare_midx_pack(m, i))
+   midx_report("failed to load pack in position %d", i);
+   }
+
return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 01a3cd6b00..0a566afb05 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -222,6 +222,11 @@ test_expect_success 'verify packnames out of order' '
"pack names out of order"
 '
 
+test_expect_success 'verify packnames out of order' '
+   corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
+   "failed to load pack"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget



[PATCH v2 11/11] fsck: verify multi-pack-index

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When core.multiPackIndex is true, we may have a multi-pack-index
in our object directory. Add calls to 'git multi-pack-index verify'
at the end of 'git fsck' if so.

Signed-off-by: Derrick Stolee 
---
 builtin/fsck.c  | 18 ++
 t/t5319-multi-pack-index.sh | 13 -
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 63c8578cc1..06eb421720 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -848,5 +848,23 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
}
 
+   if (!git_config_get_bool("core.multipackindex", ) && i) {
+   struct child_process midx_verify = CHILD_PROCESS_INIT;
+   const char *midx_argv[] = { "multi-pack-index", "verify", NULL, 
NULL, NULL };
+
+   midx_verify.argv = midx_argv;
+   midx_verify.git_cmd = 1;
+   if (run_command(_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+
+   prepare_alt_odb(the_repository);
+   for (alt =  the_repository->objects->alt_odb_list; alt; alt = 
alt->next) {
+   midx_argv[2] = "--object-dir";
+   midx_argv[3] = alt->path;
+   if (run_command(_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+   }
+
return errors_found;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 828c240389..bd8e841b81 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -160,12 +160,17 @@ corrupt_midx_and_verify() {
DATA="${2:-\0}" &&
OBJDIR=$3 &&
GREPSTR="$4" &&
+   COMMAND="$5" &&
+   if test -z "$COMMAND"
+   then
+   COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
+   fi &&
FILE=$OBJDIR/pack/multi-pack-index &&
chmod a+w $FILE &&
test_when_finished mv midx-backup $FILE &&
cp $FILE midx-backup &&
printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
-   test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 
2>test_err &&
+   test_must_fail $COMMAND 2>test_err &&
grep -v "^+" test_err >err &&
test_i18ngrep "$GREPSTR" err
 }
@@ -258,6 +263,12 @@ test_expect_success 'verify incorrect offset' '
"incorrect object offset"
 '
 
+test_expect_success 'git-fsck incorrect offset' '
+   corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+   "incorrect object offset" \
+   "git -c core.multipackindex=true fsck"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget


[PATCH v2 07/11] multi-pack-index: verify oid lookup order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Signed-off-by: Derrick Stolee 
---
 midx.c  | 11 +++
 t/t5319-multi-pack-index.sh |  8 
 2 files changed, 19 insertions(+)

diff --git a/midx.c b/midx.c
index dfd26b4d74..06d5cfc826 100644
--- a/midx.c
+++ b/midx.c
@@ -959,5 +959,16 @@ int verify_midx_file(const char *object_dir)
i, oid_fanout1, oid_fanout2, i + 1);
}
 
+   for (i = 0; i < m->num_objects - 1; i++) {
+   struct object_id oid1, oid2;
+
+   nth_midxed_object_oid(&oid1, m, i);
+   nth_midxed_object_oid(&oid2, m, i + 1);
+
+   if (oidcmp(&oid1, &oid2) >= 0)
+   midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+   i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+   }
+
return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 47a54e138d..a968b9a959 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -175,6 +175,7 @@ test_expect_success 'verify bad signature' '
"multi-pack-index signature"
 '
 
+HASH_LEN=20
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -189,6 +190,8 @@ MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
 MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
+MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
+MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
 
 test_expect_success 'verify bad version' '
corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -235,6 +238,11 @@ test_expect_success 'verify oid fanout out of order' '
"oid fanout out of order"
 '
 
+test_expect_success 'verify oid lookup out of order' '
+   corrupt_midx_and_verify $MIDX_BYTE_OID_LOOKUP "\00" $objdir \
+   "oid lookup out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget



[PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Signed-off-by: Derrick Stolee 
---
 midx.c  |  3 +++
 t/t5319-multi-pack-index.sh | 13 +
 2 files changed, 16 insertions(+)

diff --git a/midx.c b/midx.c
index ec78254bb6..8b054b39ab 100644
--- a/midx.c
+++ b/midx.c
@@ -100,6 +100,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
						 MIDX_CHUNKLOOKUP_WIDTH * i);
 
+   if (chunk_offset >= m->data_len)
+   die(_("invalid chunk offset (too large)"));
+
switch (chunk_id) {
case MIDX_CHUNKID_PACKNAMES:
m->chunk_pack_names = m->data + chunk_offset;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index e04b5f43a2..c54b6e7188 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -178,6 +178,9 @@ test_expect_success 'verify bad signature' '
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
+MIDX_HEADER_SIZE=12
+MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
+MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
 
 test_expect_success 'verify bad version' '
corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -199,6 +202,16 @@ test_expect_success 'verify extended chunk count' '
	"terminating multi-pack-index chunk id appears earlier than expected"
 '
 
+test_expect_success 'verify missing required chunk' '
+   corrupt_midx_and_verify $MIDX_BYTE_CHUNK_ID "\01" $objdir \
+   "missing required"
+'
+
+test_expect_success 'verify invalid chunk offset' '
+   corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
+   "invalid chunk offset (too large)"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget



[PATCH v2 02/11] multi-pack-index: verify bad header

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When verifying if a multi-pack-index file is valid, we want the
command to fail to signal an invalid file. Previously, we wrote
an error to stderr and continued as if we had no multi-pack-index.
Now, die() instead of error().

Add tests that check corrupted headers in a few ways:

* Bad signature
* Bad file version
* Bad hash version
* Truncated hash count
* Extended hash count

Signed-off-by: Derrick Stolee 
---
 midx.c  | 18 +--
 t/t5319-multi-pack-index.sh | 46 -
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index b253bed517..ec78254bb6 100644
--- a/midx.c
+++ b/midx.c
@@ -76,24 +76,18 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
m->local = local;
 
m->signature = get_be32(m->data);
-   if (m->signature != MIDX_SIGNATURE) {
-   error(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
+   if (m->signature != MIDX_SIGNATURE)
+   die(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
  m->signature, MIDX_SIGNATURE);
-   goto cleanup_fail;
-   }
 
m->version = m->data[MIDX_BYTE_FILE_VERSION];
-   if (m->version != MIDX_VERSION) {
-   error(_("multi-pack-index version %d not recognized"),
+   if (m->version != MIDX_VERSION)
+   die(_("multi-pack-index version %d not recognized"),
  m->version);
-   goto cleanup_fail;
-   }
 
hash_version = m->data[MIDX_BYTE_HASH_VERSION];
-   if (hash_version != MIDX_HASH_VERSION) {
-   error(_("hash version %u does not match"), hash_version);
-   goto cleanup_fail;
-   }
+   if (hash_version != MIDX_HASH_VERSION)
+   die(_("hash version %u does not match"), hash_version);
m->hash_len = MIDX_HASH_LEN;
 
m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1c4e0e6d31..e04b5f43a2 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -154,6 +154,51 @@ test_expect_success 'verify multi-pack-index success' '
git multi-pack-index verify --object-dir=$objdir
 '
 
+# usage: corrupt_midx_and_verify <pos> <data> <objdir> <string>
+corrupt_midx_and_verify() {
+   POS=$1 &&
+   DATA="${2:-\0}" &&
+   OBJDIR=$3 &&
+   GREPSTR="$4" &&
+   FILE=$OBJDIR/pack/multi-pack-index &&
+   chmod a+w $FILE &&
+   test_when_finished mv midx-backup $FILE &&
+   cp $FILE midx-backup &&
+   printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
+   test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
+   grep -v "^+" test_err >err &&
+   test_i18ngrep "$GREPSTR" err
+}
+
+test_expect_success 'verify bad signature' '
+   corrupt_midx_and_verify 0 "\00" $objdir \
+   "multi-pack-index signature"
+'
+
+MIDX_BYTE_VERSION=4
+MIDX_BYTE_OID_VERSION=5
+MIDX_BYTE_CHUNK_COUNT=6
+
+test_expect_success 'verify bad version' '
+   corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
+   "multi-pack-index version"
+'
+
+test_expect_success 'verify bad OID version' '
+   corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\02" $objdir \
+   "hash version"
+'
+
+test_expect_success 'verify truncated chunk count' '
+   corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
+   "missing required"
+'
+
+test_expect_success 'verify extended chunk count' '
+   corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
+   "terminating multi-pack-index chunk id appears earlier than expected"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
@@ -191,7 +236,6 @@ test_expect_success 'multi-pack-index in an alternate' '
 
 compare_results_with_midx "with alternate (remote midx)"
 
-
 # usage: corrupt_data <file> <pos> [<data>]
 corrupt_data () {
file=$1
-- 
gitgitgadget
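[The corrupt_midx_and_verify helper above leans on `dd conv=notrunc` to overwrite a single byte in place without truncating the file. A minimal standalone sketch of just that trick, outside the test suite (`sample.bin` is an illustrative scratch file):

```shell
# Corrupt byte 2 (0-indexed) of a 6-byte file, leaving the rest intact.
printf 'ABCDEF' >sample.bin
printf '\000' | dd of=sample.bin bs=1 seek=2 conv=notrunc 2>/dev/null
od -An -tx1 sample.bin
# -> 41 42 00 44 45 46
```

In the helper the same pattern is bracketed by cp/mv via test_when_finished so the pristine multi-pack-index is restored after each corruption test.]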



[PATCH v2 06/11] multi-pack-index: verify oid fanout order

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Signed-off-by: Derrick Stolee 
---
 midx.c  | 9 +
 t/t5319-multi-pack-index.sh | 8 
 2 files changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index a02b19efc1..dfd26b4d74 100644
--- a/midx.c
+++ b/midx.c
@@ -950,5 +950,14 @@ int verify_midx_file(const char *object_dir)
midx_report("failed to load pack in position %d", i);
}
 
+   for (i = 0; i < 255; i++) {
+   uint32_t oid_fanout1 = ntohl(m->chunk_oid_fanout[i]);
+   uint32_t oid_fanout2 = ntohl(m->chunk_oid_fanout[i + 1]);
+
+   if (oid_fanout1 > oid_fanout2)
+   midx_report(_("oid fanout out of order: fanout[%d] = %"PRIx32" > %"PRIx32" = fanout[%d]"),
+   i, oid_fanout1, oid_fanout2, i + 1);
+   }
+
return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 0a566afb05..47a54e138d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -186,6 +186,9 @@ MIDX_CHUNK_LOOKUP_WIDTH=12
 MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
 MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
+MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
+MIDX_OID_FANOUT_WIDTH=4
+MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
 
 test_expect_success 'verify bad version' '
corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -227,6 +230,11 @@ test_expect_success 'verify packnames out of order' '
"failed to load pack"
 '
 
+test_expect_success 'verify oid fanout out of order' '
+   corrupt_midx_and_verify $MIDX_BYTE_OID_FANOUT_ORDER "\01" $objdir \
+   "oid fanout out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
test_path_is_file $objdir/pack/multi-pack-index &&
git repack -adf &&
-- 
gitgitgadget



[PATCH v2 00/11] Add 'git multi-pack-index verify' command

2018-09-13 Thread Derrick Stolee via GitGitGadget
The multi-pack-index file provides faster lookups in repos with many
packfiles by duplicating the information from multiple pack-indexes into a
single file. This series allows us to verify a multi-pack-index using 'git
multi-pack-index verify' and 'git fsck' (when core.multiPackIndex is true).

The pattern for the tests is similar to that found in t5318-commit-graph.sh.

During testing, I found a bug in how we check for the size of off_t (we are
not actually checking off_t, but instead uint32_t). See "multi-pack-index:
fix 32-bit vs 64-bit size check".

Thanks to Ævar [1], I added a commit that provides progress updates when
checking object offsets.

Based on ds/multi-pack-index

[1] 
https://public-inbox.org/git/20180904202729.13900-1-ava...@gmail.com/T/#u

Derrick Stolee (11):
  multi-pack-index: add 'verify' verb
  multi-pack-index: verify bad header
  multi-pack-index: verify corrupt chunk lookup table
  multi-pack-index: verify packname order
  multi-pack-index: verify missing pack
  multi-pack-index: verify oid fanout order
  multi-pack-index: verify oid lookup order
  multi-pack-index: fix 32-bit vs 64-bit size check
  multi-pack-index: verify object offsets
  multi-pack-index: report progress during 'verify'
  fsck: verify multi-pack-index

 Documentation/git-multi-pack-index.txt |  10 ++
 builtin/fsck.c |  18 
 builtin/multi-pack-index.c |   4 +-
 midx.c | 113 
 midx.h |   1 +
 t/t5319-multi-pack-index.sh| 136 -
 6 files changed, 262 insertions(+), 20 deletions(-)


base-commit: 6a22d521260f86dff8fe6f23ab329cebb62ba4f0
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-34%2Fderrickstolee%2Fmidx%2Fverify-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-34/derrickstolee/midx/verify-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/34

Range-diff vs v1:

  1:  8dc38afe2b !  1:  d8ffd84d67 multi-pack-index: add 'verify' verb
 @@ -47,7 +47,7 @@
   
   static char const * const builtin_multi_pack_index_usage[] = {
  - N_("git multi-pack-index [--object-dir=] write"),
 -+ N_("git multi-pack-index [--object-dir=] [write|verify]"),
 ++ N_("git multi-pack-index [--object-dir=] (write|verify)"),
NULL
   };
   
  2:  787e1fb616 !  2:  9590895830 multi-pack-index: verify bad header
 @@ -61,10 +61,10 @@
   
  +# usage: corrupt_midx_and_verify
  +corrupt_midx_and_verify() {
 -+ POS=$1
 -+ DATA="${2:-\0}"
 -+ OBJDIR=$3
 -+ GREPSTR="$4"
 ++ POS=$1 &&
 ++ DATA="${2:-\0}" &&
 ++ OBJDIR=$3 &&
 ++ GREPSTR="$4" &&
  + FILE=$OBJDIR/pack/multi-pack-index &&
  + chmod a+w $FILE &&
  + test_when_finished mv midx-backup $FILE &&
  3:  b385aa2abf =  3:  2448173844 multi-pack-index: verify corrupt chunk 
lookup table
  4:  37ee24c82b =  4:  947241bfdc multi-pack-index: verify packname order
  5:  b747da415c =  5:  4058867380 multi-pack-index: verify missing pack
  6:  58e5c09468 =  6:  ea1c522702 multi-pack-index: verify oid fanout order
  7:  b21772d054 =  7:  511791de91 multi-pack-index: verify oid lookup order
  8:  b08d3f0055 =  8:  210649bf83 multi-pack-index: fix 32-bit vs 64-bit size 
check
  9:  e1498aea45 !  9:  ef20193d59 multi-pack-index: verify object offsets
 @@ -21,7 +21,8 @@
   
if (pack_int_id >= m->num_packs)
  - BUG("bad pack-int-id");
 -+ die(_("bad pack-int-id"));
 ++ die(_("bad pack-int-id: %u (%u total packs"),
 ++ pack_int_id, m->num_packs);
   
if (m->packs[pack_int_id])
return 0;
 10:  acf8cfd632 = 10:  29ebc17161 multi-pack-index: report progress during 
'verify'
 11:  09d16aff20 ! 11:  406c88b456 fsck: verify multi-pack-index
 @@ -40,14 +40,14 @@
  --- a/t/t5319-multi-pack-index.sh
  +++ b/t/t5319-multi-pack-index.sh
  @@
 -  DATA="${2:-\0}"
 -  OBJDIR=$3
 -  GREPSTR="$4"
 -+ COMMAND="$5"
 +  DATA="${2:-\0}" &&
 +  OBJDIR=$3 &&
 +  GREPSTR="$4" &&
 ++ COMMAND="$5" &&
  + if test -z "$COMMAND"
  + then
  + COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
 -+ fi
 ++ fi &&
FILE=$OBJDIR/pack/multi-pack-index &&
chmod a+w $FILE &&
test_when_finished mv midx-backup $FILE &&

-- 
gitgitgadget


[PATCH v1] preload-index: update GIT_FORCE_PRELOAD_TEST support

2018-09-13 Thread Ben Peart
Rename GIT_FORCE_PRELOAD_TEST to GIT_TEST_PRELOAD for consistency with the
other GIT_TEST_ special setups and properly document its use.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: v2.19.0
Web-Diff: https://github.com/benpeart/git/commit/dcd201b920
Checkout: git fetch https://github.com/benpeart/git git-test-preload-v1 && 
git checkout dcd201b920

 preload-index.c | 2 +-
 t/README| 3 +++
 t/t7519-status-fsmonitor.sh | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/preload-index.c b/preload-index.c
index 71cd2437a3..cc8a7333c2 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -84,7 +84,7 @@ static void preload_index(struct index_state *index,
return;
 
threads = index->cache_nr / THREAD_COST;
-   if ((index->cache_nr > 1) && (threads < 2) && getenv("GIT_FORCE_PRELOAD_TEST"))
+   if ((index->cache_nr > 1) && (threads < 2) && git_env_bool("GIT_TEST_PRELOAD", 0))
threads = 2;
if (threads < 2)
return;
diff --git a/t/README b/t/README
index 9028b47d92..73fb09560f 100644
--- a/t/README
+++ b/t/README
@@ -319,6 +319,9 @@ GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
 path where deltas larger than this limit require extra memory
 allocation for bookkeeping.
 
+GIT_TEST_PRELOAD= exercises the preload-index code path by
+overriding the minimum number of cache entries required per thread.
+
 Naming Tests
 
 
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 756beb0d8e..9b703d49b5 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -245,9 +245,9 @@ do
git config core.preloadIndex $preload_val &&
if test $preload_val = true
then
-   GIT_FORCE_PRELOAD_TEST=$preload_val; export GIT_FORCE_PRELOAD_TEST
+   GIT_TEST_PRELOAD=$preload_val; export GIT_TEST_PRELOAD
else
-   unset GIT_FORCE_PRELOAD_TEST
+   sane_unset GIT_TEST_PRELOAD
fi
'
 

base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670
-- 
2.18.0.windows.1



RE: [Question] Signature calculation ignoring parts of binary files

2018-09-13 Thread Randall S. Becker
On September 13, 2018 1:52 PM, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > "Randall S. Becker"  writes:
> >
> >> The scenario is slightly different.
> >> 1. Person A gives me a new binary file-1 with fingerprint A1. This
> >> goes into git unchanged.
> >> 2. Person B gives me binary file-2 with fingerprint B2. This does not
> >> go into git yet.
> >> 3. We attempt a git diff between the committed file-1 and uncommitted
> >> file-2 using a textconv implementation that strips what we don't need
> >> to compare.
> >> 4. If file-1 and file-2 have no difference when textconv is used,
> >> file-2 is not added and not committed. It is discarded with impunity,
> >> never to be seen again, although we might whine a lot at the user for
> >> attempting to put
> >> file-2 in - but that's not git's issue.
> >
> > You are forgetting that Git is a distributed version control system,
> > aren't you?  Person A and B can introduce their "moral equivalent but
> > bytewise different" copies to their repository under the same object
> > name, and you can pull from them--what happens?
> >
> > It is fundamental that one object name given to Git identifies one
> > specific byte sequence contained in an object uniquely.  Once you
> > broke that, you no longer have Git.
> 
> Having said all that, if you want to keep the original with frills
> but somehow give these bytewise different things that reduce to the
> same essence (e.g. when passed thru a filter like textconv), I
> suspect a better approach might be to store both the "original" and
> the result of passing the "original" through the filter in the
> object database.  In the above example, you'll get two "original"
> objects from person A and person B, plus one "canonical" object that
> are bytewise different from either of these two originals, but what
> they reduce to when you use the filter on them.  Then you record the
> fact that to derive the "essence" object, you can reduce either
> person A's or person B's "original" through the filter, perhaps by
> using "git notes" attached to the "essence" object, recording the
> object names of these originals (the reason why using notes in this
> direction is because you can mechanically determine which "essence"
> object any given "original" object reduces to---it is just the
> matter of passing it through the filter.  But there can be more than
> one "original" that reduces to the same "essence").

I like that idea. It turns the reduced object into a contract. Thanks.



Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Eric Sunshine
On Thu, Sep 13, 2018 at 10:56 AM Derrick Stolee via GitGitGadget wrote:
> There have been a few bugs in recent patches that would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.

The bit about die() blocks is perhaps a bit too general. While it's
true that some die()'s signal very unlikely (or near-impossible)
conditions, others are merely reporting invalid user or other input to
the program. The latter category is often very much worth testing, as
the number of test_must_fail() invocations in the test suite shows.
68a6b3a1bd (worktree: teach 'move' to override lock when --force given
twice, 2018-08-28), which was highlighted in your cover letter,
provides a good example of legitimately testing that a die() is
covered. So, perhaps the above can be toned-down a bit by saying
something like:

...but die() blocks covering very unlikely (or near-impossible)
situations may not warrant coverage.

> It is important to not measure the coverage of the codebase by what old code
> is not covered.


Re: [Question] Signature calculation ignoring parts of binary files

2018-09-13 Thread Junio C Hamano
Junio C Hamano  writes:

> "Randall S. Becker"  writes:
>
>> The scenario is slightly different.
>> 1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
>> git unchanged.
>> 2. Person B gives me binary file-2 with fingerprint B2. This does not go
>> into git yet.
>> 3. We attempt a git diff between the committed file-1 and uncommitted file-2
>> using a textconv implementation that strips what we don't need to compare.
>> 4. If file-1 and file-2 have no difference when textconv is used, file-2 is
>> not added and not committed. It is discarded with impunity, never to be seen
>> again, although we might whine a lot at the user for attempting to put
>> file-2 in - but that's not git's issue.
>
> You are forgetting that Git is a distributed version control system,
> aren't you?  Person A and B can introduce their "moral equivalent
> but bytewise different" copies to their repository under the same
> object name, and you can pull from them--what happens?
>
> It is fundamental that one object name given to Git identifies one
> specific byte sequence contained in an object uniquely.  Once you
> broke that, you no longer have Git.

Having said all that, if you want to keep the original with frills
but somehow give these bytewise different things that reduce to the
same essence (e.g. when passed thru a filter like textconv), I
suspect a better approach might be to store both the "original" and
the result of passing the "original" through the filter in the
object database.  In the above example, you'll get two "original"
objects from person A and person B, plus one "canonical" object that
are bytewise different from either of these two originals, but what
they reduce to when you use the filter on them.  Then you record the
fact that to derive the "essence" object, you can reduce either
person A's or person B's "original" through the filter, perhaps by
using "git notes" attached to the "essence" object, recording the
object names of these originals (the reason why using notes in this
direction is because you can mechanically determine which "essence"
object any given "original" object reduces to---it is just the
matter of passing it through the filter.  But there can be more than
one "original" that reduces to the same "essence").
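[As a rough illustration only -- this is a proposed scheme, not an existing git feature -- the recording Junio sketches can be prototyped with hash-object and notes. Here `tr -d '0-9'` stands in for the hypothetical textconv-style filter, and a note on the "essence" blob records which original reduces to it; the identities exported below are placeholders:

```shell
demo=$(mktemp -d) && cd "$demo" && git init -q .
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com

printf 'payload 123\n' >file-1
orig=$(git hash-object -w file-1)                            # person A's original
essence=$(tr -d '0-9' <file-1 | git hash-object -w --stdin)  # filtered "essence"
# attach the original's object name as a note on the essence object
git notes --ref=refs/notes/originals add -m "$orig" "$essence"
git notes --ref=refs/notes/originals show "$essence"
```

The last command prints the recorded original's object name; more originals reducing to the same essence could be recorded with `git notes append`.]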



[PATCH v1] fsmonitor: update GIT_TEST_FSMONITOR support

2018-09-13 Thread Ben Peart
Rename GIT_FSMONITOR_TEST to GIT_TEST_FSMONITOR for consistency with the
other GIT_TEST_ special setups and properly document its use.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: v2.19.0
Web-Diff: https://github.com/benpeart/git/commit/311484a684
Checkout: git fetch https://github.com/benpeart/git git-test-fsmonitor-v1 
&& git checkout 311484a684

 config.c| 2 +-
 t/README| 4 
 t/t1700-split-index.sh  | 2 +-
 t/t7519-status-fsmonitor.sh | 2 +-
 4 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/config.c b/config.c
index 3461993f0a..3555c63f28 100644
--- a/config.c
+++ b/config.c
@@ -2278,7 +2278,7 @@ int git_config_get_max_percent_split_change(void)
 int git_config_get_fsmonitor(void)
 {
	if (git_config_get_pathname("core.fsmonitor", &core_fsmonitor))
-   core_fsmonitor = getenv("GIT_FSMONITOR_TEST");
+   core_fsmonitor = getenv("GIT_TEST_FSMONITOR");
 
if (core_fsmonitor && !*core_fsmonitor)
core_fsmonitor = NULL;
diff --git a/t/README b/t/README
index 9028b47d92..545438c820 100644
--- a/t/README
+++ b/t/README
@@ -319,6 +319,10 @@ GIT_TEST_OE_DELTA_SIZE= exercises the uncomon pack-objects code
 path where deltas larger than this limit require extra memory
 allocation for bookkeeping.
 
+GIT_TEST_FSMONITOR=$PWD/t7519/fsmonitor-all exercises the fsmonitor
+code path for utilizing a file system monitor to speed up detecting
+new or changed files.
+
 Naming Tests
 
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index b3b4d83eaf..f6a856f24c 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -6,7 +6,7 @@ test_description='split index mode tests'
 
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
-sane_unset GIT_FSMONITOR_TEST
+sane_unset GIT_TEST_FSMONITOR
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index 756beb0d8e..d77012ea6d 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -8,7 +8,7 @@ test_description='git status with file system watcher'
 # To run the entire git test suite using fsmonitor:
 #
 # copy t/t7519/fsmonitor-all to a location in your path and then set
-# GIT_FSMONITOR_TEST=fsmonitor-all and run your tests.
+# GIT_TEST_FSMONITOR=fsmonitor-all and run your tests.
 #
 
 # Note, after "git reset --hard HEAD" no extensions exist other than 'TREE'

base-commit: 1d4361b0f344188ab5eec6dcea01f61a3a3a1670
-- 
2.18.0.windows.1



Re: [PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> From: Derrick Stolee 
>
> We have coverage targets in our Makefile for using gcov to display line
> coverage based on our test suite. The way I like to do it is to run:
>
> make coverage-test
> make coverage-report
>
> This leaves the repo in a state where every X.c file that was covered has
> an X.c.gcov file containing the coverage counts for every line, and "#"
> at every uncovered line.
>
> There have been a few bugs in recent patches that would have been caught
> if the test suite covered those blocks (including a few of mine). I want
> to work towards a "sensible" amount of coverage on new topics. In my opinion,
> this means that any logic should be covered, but the 'die()' blocks in error
> cases do not need to be covered.
>
> It is important to not measure the coverage of the codebase by what old code
> is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
> After creating the coverage statistics at a version (say, 'topic') you can
> then run
>
> contrib/coverage-diff.sh base topic
>
> to see the lines added between 'base' and 'topic' that are not covered by the
> test suite. The output uses 'git blame -c' format so you can find the commits
> responsible and view the line numbers for quick access to the context.
>
> Signed-off-by: Derrick Stolee 
> ---
>  contrib/coverage-diff.sh | 63 
>  1 file changed, 63 insertions(+)
>  create mode 100755 contrib/coverage-diff.sh
>
> diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
> new file mode 100755
> index 00..0f226f038c
> --- /dev/null
> +++ b/contrib/coverage-diff.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +# Usage: 'contrib/coverage-diff.sh <version1> <version2>'
> +# Outputs a list of new lines in version2 compared to version1 that are
> +# not covered by the test suite. Assumes you ran
> +# 'make coverage-test coverage-report' from root first, so .gcov files exist.

I presume that it is "from root first while you have a checkout of
version2, so .gcov files for version2 exist in the working tree"?

> +V1=$1
> +V2=$2
> +
> +diff_lines () {
> + while read line
> + do
> + if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*"

As you know you are reading from diff output, you do not have to be
so strict in this if condition.  It's not like you try to carefully
reject a line that began with "@@" but did not match this pattern in
the corresponding else block.

"read line" does funny things to backslashes and whitespaces in the
payload ("read -r line" sometimes helps), and echoing $line without
quoting will destroy its contents here and in the line below (but
you do not care because all you care about is if it begins with a
dash).
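[For illustration, the `read` pitfall described above is easy to see in isolation:

```shell
# POSIX `read` treats backslash as an escape character and drops it;
# `read -r` takes the input literally.
printf '%s\n' 'foo\tbar' | { read line && printf '%s\n' "$line"; }
# -> footbar
printf '%s\n' 'foo\tbar' | { read -r line && printf '%s\n' "$line"; }
# -> foo\tbar
```
]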

> + then
> + line_num=$(echo $line \
> + | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? \\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
> + else
> + echo "$line_num:$line"
> + if ! echo $line | grep -q -e "^-"

If a patch removes a line with a solitary 'n' on it, possibly
followed by a SP and something else, such a line would say "-n
something else", and some implementation of "echo -n something else"
would do what this line does not expect.  A safer way to do this,
as you only care if it is a deletion line, is to do

case "$line" in -*) ;; *) line_num=$(( $line_num + 1 ));; esac

Also you can make the echoing of "$line_num:$line" above
conditional, which will allow you to lose "grep ':+'" in the
pipeline that consumes output from this thing, e.g.

case "$line" in +*) echo "$line_num:$line";; esac

iff you must write this in shell (but see below).
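[Putting the two `case` suggestions above together, a toy but runnable shell sketch of such a diff_lines (fed a fabricated hunk; IFS is cleared so context lines keep their leading space, and the parameter expansions are one way to pull the postimage start out of the hunk header):

```shell
diff_lines () {
	line_num=0
	while IFS= read -r line
	do
		case "$line" in
		@@*)
			# grab the postimage start from "@@ -a,b +c,d @@"
			line_num=${line#@@ -* +}
			line_num=${line_num%%[ ,]*}
			;;
		-*)
			;;	# deletion: not part of the postimage
		+*)
			echo "$line_num:$line"
			line_num=$((line_num + 1))
			;;
		*)
			line_num=$((line_num + 1))
			;;
		esac
	done
}

printf '%s\n' '@@ -1,2 +10,3 @@' ' ctx' '-old' '+new1' ' ctx2' '+new2' |
	diff_lines
# -> 11:+new1
# -> 13:+new2
```
]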

> + then
> + line_num=$(($line_num + 1))
> + fi
> + fi
> + done

I have a feeling that a single Perl script instead of a shell loop
that runs many grep and awk as subprocesses performs better even on
Windows, and it would be more readable and maintainable.

perl -e '
my $line_num;
while (<>) {
# Hunk header?  Grab the beginning in postimage.
if (/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
$line_num = $1;
next;
}

# Have we seen a hunk?  Ignore "diff --git" etc.
next unless defined $line_num;

# Deleted line? Ignore.
if (/^-/) {
next;
}

# Show only the line number of added lines.
if (/^\+/) {
print "$line_num\n";
}
# Either common context or added line appear in
# the postimage.  Count it.
$line_num++;
}
'

or something like that, given that you seem to only need line

Re: [PATCH v5 00/12] Hash-independent tests (part 3)

2018-09-13 Thread Derrick Stolee

On 9/13/2018 1:17 AM, brian m. carlson wrote:

This is the next in the series of improvements to make tests
hash-independent.

A range-diff is below.

Changes from v4:
* Add local statements to the &&-chain.
* Fix a typo in the local statement.
* Add a helpful comment about why test_detect_hash is hard-coded to
   SHA-1.


The range-diff and these changes since v4 are reasonable. I'm happy with 
the current version.


I'm looking forward to this change getting merged down, as I also want 
to use 'test_oid rawsz' for the 'git multi-pack-index verify' tests that 
are currently under review (v2 imminent).


Thanks,

-Stolee



Re: [PATCH 3/3] archive: allow archive over HTTP(S) with proto v2

2018-09-13 Thread Junio C Hamano
Josh Steadmon  writes:

> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c  |  8 +++-
>  http-backend.c | 10 +-
>  transport-helper.c |  5 +++--
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 73831887d..5fa75b3f7 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -87,7 +87,13 @@ static int run_remote_archiver(int argc, const char **argv,
>   status = packet_reader_read(&reader);
>   if (status != PACKET_READ_FLUSH)
>   die(_("git archive: expected a flush"));
> - }
> + } else if (version == protocol_v2 &&
> +starts_with(transport->url, "http"))
> + /*
> +  * Commands over HTTP require two requests, so there's an
> +  * additional server response to parse.
> +  */
> + discover_version(&reader);

What should happen if the version discovered here is different from
v2 or the capabilities offered are different from what we saw
before?  Perhaps we need some sanity checks, as we on this side of
the connection know we are making two requests, and may even end up
talking with an instance of "upload-archive" that is different from
the one we talked with earlier.

> diff --git a/http-backend.c b/http-backend.c
> index 458642ef7..d62d583c7 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -32,6 +32,7 @@ struct rpc_service {
>  static struct rpc_service rpc_service[] = {
>   { "upload-pack", "uploadpack", 1, 1 },
>   { "receive-pack", "receivepack", 0, -1 },
> + { "upload-archive", "uploadarchive", 1, 1 },
>  };
>  
>  static struct string_list *get_parameters(void)
> @@ -637,6 +638,12 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
>   struct rpc_service *svc = select_service(hdr, service_name);
>   struct strbuf buf = STRBUF_INIT;
>  
> + if (!strcmp(service_name, "git-upload-archive")) {
> + /* git-upload-archive doesn't need --stateless-rpc */
> + argv[1] = ".";
> + argv[2] = NULL;
> + }
> +
>   strbuf_reset(&buf);
>   strbuf_addf(&buf, "application/x-git-%s-request", svc->name);
>   check_content_type(hdr, buf.buf);
> @@ -713,7 +720,8 @@ static struct service_cmd {
>   {"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
>  
>   {"POST", "/git-upload-pack$", service_rpc},
> - {"POST", "/git-receive-pack$", service_rpc}
> + {"POST", "/git-receive-pack$", service_rpc},
> + {"POST", "/git-upload-archive$", service_rpc},
>  };
>  
>  static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
> diff --git a/transport-helper.c b/transport-helper.c
> index 143ca008c..b4b96fc89 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -605,7 +605,8 @@ static int process_connect_service(struct transport *transport,
>   ret = run_connect(transport, &cmdbuf);
>   } else if (data->stateless_connect &&
>  (get_protocol_version_config() == protocol_v2) &&
> -!strcmp("git-upload-pack", name)) {
> +(!strcmp("git-upload-pack", name) ||
> + !strcmp("git-upload-archive", name))) {
>   strbuf_addf(, "stateless-connect %s\n", name);
>   ret = run_connect(transport, );
>   if (ret)
> @@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, 
> const char *name,
>  
>   /* Get_helper so connect is inited. */
>   get_helper(transport);
> - if (!data->connect)
> + if (!data->connect && !data->stateless_connect)
>   die(_("operation not supported by protocol"));

This is somewhat curious.  The upload-pack going over HTTP is also
triggered by the same "stateless-connect" remote helper command, as
we just saw in the previous hunk, and that support is not new.  Why

do we need this change then?  What's different between the "data"
that is obtained by get_helper(transport) for driving upload-pack
and upload-archive?  Presumably upload-pack was working without this
change because it also sets the connect bit on, and upload-archive
does not work without this change because it does not?  Why do these
two behave differently?

>   if (!process_connect_service(transport, name, exec))


Re: [PATCH v2 1/1] commit-reach: properly peel tags

2018-09-13 Thread Derrick Stolee

On 9/13/2018 12:10 PM, Derrick Stolee via GitGitGadget wrote:

From: Derrick Stolee 

The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
"commit-reach: make can_all_from_reach... linear" but incorrectly
assumed that all objects provided were commits. During a fetch
negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
to the 'from' array. The current code creates a segfault.

Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
and add a test in t6600-test-reach.sh that demonstrates this segfault.

Correct the issue by peeling tags when investigating the initial list
of objects in the 'from' array.

Signed-off-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
  commit-reach.c| 33 ++---
  t/helper/test-reach.c | 22 +-
  t/t6600-test-reach.sh | 30 --
  3 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 86715c103c..4048a2132a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -544,20 +544,39 @@ int can_all_from_reach_with_flag(struct object_array 
*from,
  {
struct commit **list = NULL;
int i;
+   int nr_commits;
int result = 1;
  
  	ALLOC_ARRAY(list, from->nr);

+   nr_commits = 0;
for (i = 0; i < from->nr; i++) {
-   list[i] = (struct commit *)from->objects[i].item;
+   struct object *from_one = from->objects[i].item;
  
-		if (parse_commit(list[i]) ||

-   list[i]->generation < min_generation)
-   return 0;
+   if (!from_one || from_one->flags & assign_flag)
+   continue;
+
+   from_one = deref_tag(the_repository, from_one,
+"a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   /* no way to tell if this is reachable by
+* looking at the ancestry chain alone, so
+* leave a note to ourselves not to worry about
+* this object anymore.
+*/
+   from->objects[i].item->flags |= assign_flag;
+   continue;
+   }
+
+   list[nr_commits] = (struct commit *)from_one;
+   if (parse_commit(list[nr_commits]) ||
+   list[nr_commits]->generation < min_generation)
+   return 0; /* is this a leak? */


Of course, after sending v2, I see this comment. This is a leak of 
'list' and should be fixed.


Not only is it a leak here, it is also a leak in the 'cleanup' section. 
I'll squash the following into v3, but I'll let v2 simmer for review 
before rerolling.



diff --git a/commit-reach.c b/commit-reach.c
index 4048a2132a..c457d8d85f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -569,8 +569,11 @@ int can_all_from_reach_with_flag(struct 
object_array *from,


    list[nr_commits] = (struct commit *)from_one;
    if (parse_commit(list[nr_commits]) ||
-   list[nr_commits]->generation < min_generation)
-   return 0; /* is this a leak? */
+   list[nr_commits]->generation < min_generation) {
+   result = 0;
+   goto cleanup;
+   }
+
    nr_commits++;
    }

@@ -623,6 +626,7 @@ int can_all_from_reach_with_flag(struct object_array 
*from,

    clear_commit_marks(list[i], RESULT);
    clear_commit_marks(list[i], assign_flag);
    }
+   free(list);
    return result;
 }



Re: [PATCH 2/3] archive: implement protocol v2 archive command

2018-09-13 Thread Junio C Hamano
Josh Steadmon  writes:

> +static int do_v2_command_and_cap(int out)
> +{
> + packet_write_fmt(out, "command=archive\n");
> + /* Capability list would go here, if we had any. */
> + packet_delim(out);
> +}
> +
>  static int run_remote_archiver(int argc, const char **argv,
>  const char *remote, const char *exec,
>  const char *name_hint)
> @@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
>   struct remote *_remote;
>   struct packet_reader reader;
>   enum packet_read_status status;
> + enum protocol_version version;
>  
>   _remote = remote_get(remote);
>   if (!_remote->url[0])
> @@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
>  
>   packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
>  
> + version = discover_version();

The original version of upload-archive that is correctly running on
the other end sends either NACK (unable to spawn) or ACK (ready to
serve) to us without waiting for us to speak first, so peeking at it
with this discover_version() is a safe thing to do.

> + if (version == protocol_v2)
> + do_v2_command_and_cap(fd[1]);
> +

With proto v2, "server capabilities" have already been collected in
server_capabilities_v2 array in discover_version().  We are to pick
and ask the capabilities in that function and respond.  Right now we
do not need to do much, as we saw that very thin implementation of
that function above.

>   /*
>* Inject a fake --format field at the beginning of the
>* arguments, with the format inferred from our output

And then after that, both the original and updated protocol lets us
send the archive format and arguments (like revs and pathspecs),
followed by a flush packet...

> @@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char 
> **argv,
>   packet_write_fmt(fd[1], "argument %s\n", argv[i]);
>   packet_flush(fd[1]);

... which is a piece of code shared between the protocol versions
that ends here.

> - status = packet_reader_read();
> -
> - if (status == PACKET_READ_FLUSH)
> - die(_("git archive: expected ACK/NAK, got a flush packet"));
> - if (strcmp(reader.buffer, "ACK")) {
> - if (starts_with(reader.buffer, "NACK "))
> - die(_("git archive: NACK %s"), reader.buffer + 5);
> - if (starts_with(reader.buffer, "ERR "))
> - die(_("remote error: %s"), reader.buffer + 4);
> - die(_("git archive: protocol error"));
> + if (version == protocol_v0) {
> + status = packet_reader_read();
> +
> + if (status == PACKET_READ_FLUSH)
> + die(_("git archive: expected ACK/NAK, got a flush 
> packet"));
> + if (strcmp(reader.buffer, "ACK")) {
> + if (starts_with(reader.buffer, "NACK "))
> + die(_("git archive: NACK %s"), reader.buffer + 
> 5);
> + if (starts_with(reader.buffer, "ERR "))
> + die(_("remote error: %s"), reader.buffer + 4);
> + die(_("git archive: protocol error"));
> + }
> +
> + status = packet_reader_read();
> + if (status != PACKET_READ_FLUSH)
> + die(_("git archive: expected a flush"));
>   }

The original protocol lets upload-archive report failure to spawn
the writer backend process and lets us act on it.  We do not need
similar support in the updated protocol and instead can jump right
into receiving the archive stream because...?

> - status = packet_reader_read();
> - if (status != PACKET_READ_FLUSH)
> - die(_("git archive: expected a flush"));
> -

>   /* Now, start reading from fd[0] and spit it out to stdout */
>   rv = recv_sideband("archive", fd[0], 1);
>   rv |= transport_disconnect(transport);



> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 25d911635..534e8fd56 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -5,6 +5,7 @@
>  #include "builtin.h"
>  #include "archive.h"
>  #include "pkt-line.h"
> +#include "protocol.h"
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "argv-array.h"
> @@ -73,13 +74,53 @@ static ssize_t process_input(int child_fd, int band)
>   return sz;
>  }
>  
> +static int handle_v2_command_and_cap(void)
> +{
> + struct packet_reader reader;
> + enum packet_read_status status;
> +
> + packet_reader_init(, 0, NULL, 0, PACKET_READ_CHOMP_NEWLINE);
> +
> + packet_write_fmt(1, "version 2\n");

This lets the discover_version() on the other side notice that we
are speaking version 2.

> + /*
> +  * We don't currently send any capabilities, but maybe we could list
> +  * supported archival formats?
> +  */
> + packet_flush(1);


[PATCH v2 1/1] commit-reach: properly peel tags

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The can_all_from_reach_with_flag() algorithm was refactored in 4fbcca4e
"commit-reach: make can_all_from_reach... linear" but incorrectly
assumed that all objects provided were commits. During a fetch
negotiation, ok_to_give_up() in upload-pack.c may provide unpeeled tags
to the 'from' array. The current code creates a segfault.

Add a direct call to can_all_from_reach_with_flag() in 'test-tool reach'
and add a test in t6600-test-reach.sh that demonstrates this segfault.

Correct the issue by peeling tags when investigating the initial list
of objects in the 'from' array.

Signed-off-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 commit-reach.c| 33 ++---
 t/helper/test-reach.c | 22 +-
 t/t6600-test-reach.sh | 30 --
 3 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 86715c103c..4048a2132a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -544,20 +544,39 @@ int can_all_from_reach_with_flag(struct object_array 
*from,
 {
struct commit **list = NULL;
int i;
+   int nr_commits;
int result = 1;
 
ALLOC_ARRAY(list, from->nr);
+   nr_commits = 0;
for (i = 0; i < from->nr; i++) {
-   list[i] = (struct commit *)from->objects[i].item;
+   struct object *from_one = from->objects[i].item;
 
-   if (parse_commit(list[i]) ||
-   list[i]->generation < min_generation)
-   return 0;
+   if (!from_one || from_one->flags & assign_flag)
+   continue;
+
+   from_one = deref_tag(the_repository, from_one,
+"a from object", 0);
+   if (!from_one || from_one->type != OBJ_COMMIT) {
+   /* no way to tell if this is reachable by
+* looking at the ancestry chain alone, so
+* leave a note to ourselves not to worry about
+* this object anymore.
+*/
+   from->objects[i].item->flags |= assign_flag;
+   continue;
+   }
+
+   list[nr_commits] = (struct commit *)from_one;
+   if (parse_commit(list[nr_commits]) ||
+   list[nr_commits]->generation < min_generation)
+   return 0; /* is this a leak? */
+   nr_commits++;
}
 
-   QSORT(list, from->nr, compare_commits_by_gen);
+   QSORT(list, nr_commits, compare_commits_by_gen);
 
-   for (i = 0; i < from->nr; i++) {
+   for (i = 0; i < nr_commits; i++) {
/* DFS from list[i] */
struct commit_list *stack = NULL;
 
@@ -600,7 +619,7 @@ int can_all_from_reach_with_flag(struct object_array *from,
}
 
 cleanup:
-   for (i = 0; i < from->nr; i++) {
+   for (i = 0; i < nr_commits; i++) {
clear_commit_marks(list[i], RESULT);
clear_commit_marks(list[i], assign_flag);
}
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index eb21103998..08d2ea68e8 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -31,6 +31,7 @@ int cmd__reach(int ac, const char **av)
struct object_id oid_A, oid_B;
struct commit *A, *B;
struct commit_list *X, *Y;
+   struct object_array X_obj = OBJECT_ARRAY_INIT;
struct commit **X_array;
int X_nr, X_alloc;
struct strbuf buf = STRBUF_INIT;
@@ -49,7 +50,8 @@ int cmd__reach(int ac, const char **av)
 
while (strbuf_getline(, stdin) != EOF) {
struct object_id oid;
-   struct object *o;
+   struct object *orig;
+   struct object *peeled;
struct commit *c;
if (buf.len < 3)
continue;
@@ -57,14 +59,14 @@ int cmd__reach(int ac, const char **av)
if (get_oid_committish(buf.buf + 2, ))
die("failed to resolve %s", buf.buf + 2);
 
-   o = parse_object(r, );
-   o = deref_tag_noverify(o);
+   orig = parse_object(r, );
+   peeled = deref_tag_noverify(orig);
 
-   if (!o)
+   if (!peeled)
die("failed to load commit for input %s resulting in 
oid %s\n",
buf.buf, oid_to_hex());
 
-   c = object_as_type(r, o, OBJ_COMMIT, 0);
+   c = object_as_type(r, peeled, OBJ_COMMIT, 0);
 
if (!c)
die("failed to load commit for input %s resulting in 
oid %s\n",
@@ -85,6 +87,7 @@ int cmd__reach(int ac, const char **av)
commit_list_insert(c, );
ALLOC_GROW(X_array, X_nr + 1, X_alloc);
X_array[X_nr++] = c;
+  

[PATCH v2 0/1] Properly peel tags in can_all_from_reach_with_flags()

2018-09-13 Thread Derrick Stolee via GitGitGadget
As Peff reported [1], the refactored can_all_from_reach_with_flags() method
does not properly peel tags. Since the helper method can_all_from_reach()
and code in t/helper/test-reach.c all peel tags before getting to this
method, it is not super-simple to create a test that demonstrates this.

I modified t/helper/test-reach.c to allow calling
can_all_from_reach_with_flags() directly, and added a test in
t6600-test-reach.sh that demonstrates the segfault without the fix.

For V2, I compared the loop that inspects the 'from' commits in commit
ba3ca1edce "commit-reach: move can_all_from_reach_with_flags" to the version
here and got the following diff:

3c3
< if (from_one->flags & assign_flag)
---
> if (!from_one || from_one->flags & assign_flag)
5c5,7
< from_one = deref_tag(the_repository, from_one, "a from 
object", 0);
---
>
> from_one = deref_tag(the_repository, from_one,
>  "a from object", 0);
14a17,22
>
> list[nr_commits] = (struct commit *)from_one;
> if (parse_commit(list[nr_commits]) ||
> list[nr_commits]->generation < min_generation)
> return 0; /* is this a leak? */
> nr_commits++;

This diff includes the early termination we had before 'deref_tag' and the
comment for why we can ignore non-commit objects.

[1] 
https://public-inbox.org/git/0bf9103c-9377-506b-7ad7-e5273d8e9...@gmail.com/T/#u

Derrick Stolee (1):
  commit-reach: properly peel tags

 commit-reach.c| 33 ++---
 t/helper/test-reach.c | 22 +-
 t/t6600-test-reach.sh | 30 --
 3 files changed, 71 insertions(+), 14 deletions(-)


base-commit: 6621c838743812aaba96e55cfec8524ea1144c2d
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-39%2Fderrickstolee%2Ftag-fix-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-39/derrickstolee/tag-fix-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/39

Range-diff vs v1:

 1:  948e28 ! 1:  4bf21204dd commit-reach: properly peel tags
 @@ -36,9 +36,17 @@
  - if (parse_commit(list[i]) ||
  - list[i]->generation < min_generation)
  - return 0;
 ++ if (!from_one || from_one->flags & assign_flag)
 ++ continue;
 ++
  + from_one = deref_tag(the_repository, from_one,
  +  "a from object", 0);
  + if (!from_one || from_one->type != OBJ_COMMIT) {
 ++ /* no way to tell if this is reachable by
 ++  * looking at the ancestry chain alone, so
 ++  * leave a note to ourselves not to worry about
 ++  * this object anymore.
 ++  */
  + from->objects[i].item->flags |= assign_flag;
  + continue;
  + }
 @@ -187,7 +195,7 @@
  + echo "can_all_from_reach_with_flag(X,_,_,0,0):1" >expect &&
  + test_three_modes can_all_from_reach_with_flag
  +'
 -+ 
 ++
   test_expect_success 'commit_contains:hit' '
cat >input <<-\EOF &&
A:commit-7-7

-- 
gitgitgadget


RE: [Question] Signature calculation ignoring parts of binary files

2018-09-13 Thread Randall S. Becker
On September 13, 2018 11:03 AM, Junio C Hamano wrote:
> "Randall S. Becker"  writes:
> 
> > The scenario is slightly different.
> > 1. Person A gives me a new binary file-1 with fingerprint A1. This
> > goes into git unchanged.
> > 2. Person B gives me binary file-2 with fingerprint B2. This does not
> > go into git yet.
> > 3. We attempt a git diff between the committed file-1 and uncommitted
> > file-2 using a textconv implementation that strips what we don't need to
> compare.
> > 4. If file-1 and file-2 have no difference when textconv is used,
> > file-2 is not added and not committed. It is discarded with impunity,
> > never to be seen again, although we might whine a lot at the user for
> > attempting to put
> > file-2 in - but that's not git's issue.
> 
> You are forgetting that Git is a distributed version control system,
aren't you?
> Person A and B can introduce their "moral equivalent but bytewise
different"
> copies to their repository under the same object name, and you can pull
from
> them--what happens?
> 
> It is fundamental that one object name given to Git identifies one
specific
> byte sequence contained in an object uniquely.  Once you broke that, you
no
> longer have Git.

At that point I have a morally questionable situation, agreed. However,
both objects are permitted to exist in the underlying tree without
conflict in git, which I consider a legitimately possible situation that
will not break the application at all, although there is a semantic
conflict in the application (not in git) that requires a human decision
to resolve. The fact that both objects can exist in git with different
fingerprints is a good thing, because it provides immutable evidence,
and attribution, when someone bypasses the intent of the application.

So, rather than using textconv, I shall implement this rule in the
application rather than trying to configure git to do it. If two conflicting
objects enter the commit history, the application will have the
responsibility to resolve the semantic/legal conflict.

Thanks,
Randall




Re: [PATCH 1/3] archive: use packet_reader for communications

2018-09-13 Thread Junio C Hamano
Junio C Hamano  writes:

>> -if (packet_read_line(fd[0], NULL))
>> +status = packet_reader_read();
>> +if (status != PACKET_READ_FLUSH)
>>  die(_("git archive: expected a flush"));
>
> This makes me wonder what happens if we got an EOF instead.  We fail
> to notice protocol error here, but do the code after this part
> correctly handle the situation?

Sorry, this part of my comment is completely backwards.

We require that they send a flush, not a 0-length data packet of length
4, and otherwise we die, even though we used to treat a 0-length data
packet of length 4 just like a flush.

So this is making the code more strict than the original.  As long
as all the existing implementations correctly use flush here, there
would be no unintended regression, but it bothers me that we even
have to worry about whether these behaviour changes affect already
deployed software negatively.

>
>>  /* Now, start reading from fd[0] and spit it out to stdout */


Re: [Question] Signature calculation ignoring parts of binary files

2018-09-13 Thread Junio C Hamano
"Randall S. Becker"  writes:

> The scenario is slightly different.
> 1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
> git unchanged.
> 2. Person B gives me binary file-2 with fingerprint B2. This does not go
> into git yet.
> 3. We attempt a git diff between the committed file-1 and uncommitted file-2
> using a textconv implementation that strips what we don't need to compare.
> 4. If file-1 and file-2 have no difference when textconv is used, file-2 is
> not added and not committed. It is discarded with impunity, never to be seen
> again, although we might whine a lot at the user for attempting to put
> file-2 in - but that's not git's issue.

You are forgetting that Git is a distributed version control system,
aren't you?  Person A and B can introduce their "moral equivalent
but bytewise different" copies to their repository under the same
object name, and you can pull from them--what happens?

It is fundamental that one object name given to Git identifies one
specific byte sequence contained in an object uniquely.  Once you
broke that, you no longer have Git.



Re: [PATCH 1/1] contrib: add coverage-diff script

2018-09-13 Thread Junio C Hamano
Derrick Stolee  writes:

> On 9/12/2018 6:54 PM, Junio C Hamano wrote:
>> Junio C Hamano  writes:
>>
>>> "Derrick Stolee via GitGitGadget"  writes:
>>>
   contrib/coverage-diff.sh | 70 
   1 file changed, 70 insertions(+)
   create mode 100755 contrib/coverage-diff.sh
>>> I fully appreciate the motivation.  But it is a bit sad that this
>>> begins with "#!/bin/bash" but it seems that the script is full of
>>> bash-isms.  I haven't gone through the script to see if these are
>>> inevitable or gratuitous yet, but I'd assume it made it easier for
>>> you to write it to step outside the pure POSIX shell?
>
> I completely forgot to avoid bash, as I wrote this first as an experiment.
>
>>> ...
 +  elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
 +  path=${BASH_REMATCH[2]}
>>> OK, it probably is easier to write in bash than using expr if you
>>> want to do regexp.
>> Just to clarify. I am saying that it is OK to give up writing in
>> pure POSIX and relying on bash-isms after seeing these lines.
>
> I'll try rewriting it using POSIX shell and see how hard it is.

Thanks.  Don't waste too much time on it trying to bend over
backwards too far, though.


Re: [PATCH 1/3] archive: use packet_reader for communications

2018-09-13 Thread Junio C Hamano
Josh Steadmon  writes:

> Using packet_reader will simplify version detection and capability
> handling, which will make implementation of protocol v2 support in
> git-archive easier.

Is this meant as a change in implementation without any change in
behaviour?

> Signed-off-by: Josh Steadmon 
> ---
>  builtin/archive.c | 23 ++-
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/archive.c b/builtin/archive.c
> index e74f67539..e54fc39ad 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char 
> **argv,
>  const char *remote, const char *exec,
>  const char *name_hint)
>  {
> - char *buf;
>   int fd[2], i, rv;
>   struct transport *transport;
>   struct remote *_remote;
> + struct packet_reader reader;
> + enum packet_read_status status;
>  
>   _remote = remote_get(remote);
>   if (!_remote->url[0])
> @@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
>   transport = transport_get(_remote, _remote->url[0]);
>   transport_connect(transport, "git-upload-archive", exec, fd);
>  
> + packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
> +
>   /*
>* Inject a fake --format field at the beginning of the
>* arguments, with the format inferred from our output
> @@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char 
> **argv,
>   packet_write_fmt(fd[1], "argument %s\n", argv[i]);
>   packet_flush(fd[1]);
>  
> - buf = packet_read_line(fd[0], NULL);
> - if (!buf)
> + status = packet_reader_read();
> +
> + if (status == PACKET_READ_FLUSH)
>   die(_("git archive: expected ACK/NAK, got a flush packet"));

It is true that packet_read_line() returns a NULL on flush, but the
function also returns NULL on other conditions for which underlying
packet_read() returns 0 (or negative) length.  EOF, normal data with
zero length (i.e. packet length itself is 4), and DELIM packets
would all have led to this die() in the original code.  We fail to
notice that we got something unexpected when we were expecting to
get a normal packet with ACK or NACK on it.

> - if (strcmp(buf, "ACK")) {
> - if (starts_with(buf, "NACK "))
> - die(_("git archive: NACK %s"), buf + 5);
> - if (starts_with(buf, "ERR "))
> - die(_("remote error: %s"), buf + 4);
> + if (strcmp(reader.buffer, "ACK")) {

The way I read packet_reader_read()'s implementation and existing
callers (like the ones in fetch-pack.c) tells me that consumers
should not be looking at "reader.buffer"; they should instead be
reading from "reader.line".

> + if (starts_with(reader.buffer, "NACK "))
> + die(_("git archive: NACK %s"), reader.buffer + 5);
> + if (starts_with(reader.buffer, "ERR "))
> + die(_("remote error: %s"), reader.buffer + 4);
>   die(_("git archive: protocol error"));
>   }
>  
> - if (packet_read_line(fd[0], NULL))
> + status = packet_reader_read();
> + if (status != PACKET_READ_FLUSH)
>   die(_("git archive: expected a flush"));

This makes me wonder what happens if we got an EOF instead.  We fail
to notice protocol error here, but do the code after this part
correctly handle the situation?

>   /* Now, start reading from fd[0] and spit it out to stdout */


[PATCH v2 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has
an X.c.gcov file containing the coverage counts for every line, and "#"
at every uncovered line.

There have been a few bugs in recent patches that would have been caught
if the test suite covered those blocks (including a few of mine). I want
to work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by the
test suite. The output uses 'git blame -c' format so you can find the commits
responsible and view the line numbers for quick access to the context.

Signed-off-by: Derrick Stolee 
---
 contrib/coverage-diff.sh | 63 
 1 file changed, 63 insertions(+)
 create mode 100755 contrib/coverage-diff.sh

diff --git a/contrib/coverage-diff.sh b/contrib/coverage-diff.sh
new file mode 100755
index 00..0f226f038c
--- /dev/null
+++ b/contrib/coverage-diff.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+# Usage: 'contrib/coverage-diff.sh <version1> <version2>'
+# Outputs a list of new lines in version2 compared to version1 that are
+# not covered by the test suite. Assumes you ran
+# 'make coverage-test coverage-report' from root first, so .gcov files exist.
+
+V1=$1
+V2=$2
+
+diff_lines () {
+   while read line
+   do
+   if echo $line | grep -q -e "^@@ -([0-9]+)(,[0-9]+)? 
\\+([0-9]+)(,[0-9]+)? @@.*"
+   then
+   line_num=$(echo $line \
+   | awk 'match($0, "@@ -([0-9]+)(,[0-9]+)? 
\\+([0-9]+)(,[0-9]+)? @@.*", m) { print m[3] }')
+   else
+   echo "$line_num:$line"
+   if ! echo $line | grep -q -e "^-"
+   then
+   line_num=$(($line_num + 1))
+   fi
+   fi
+   done
+}
+
+files=$(git diff --raw $V1 $V2 \
+   | grep \.c$ \
+   | awk 'NF>1{print $NF}')
+
+for file in $files
+do
+   git diff $V1 $V2 -- $file \
+   | diff_lines \
+   | grep ":+" \
+   | sed 's/:/ /g' \
+   | awk '{print $1}' \
+   | sort \
+   >new_lines.txt
+
+   hash_file=$(echo $file | sed "s/\//\#/")
+   cat "$hash_file.gcov" \
+   | grep \#\#\#\#\# \
+   | sed 's/#: //g' \
+   | sed 's/\:/ /g' \
+   | awk '{print $1}' \
+   | sort \
+   >uncovered_lines.txt
+
+   comm -12 uncovered_lines.txt new_lines.txt \
+   | sed -e 's/$/\)/' \
+   | sed -e 's/^/\t/' \
+   >uncovered_new_lines.txt
+
+   grep -q '[^[:space:]]' < uncovered_new_lines.txt && \
+   echo $file && \
+   git blame -c $file \
+   | grep -f uncovered_new_lines.txt
+
+   rm -f new_lines.txt uncovered_lines.txt uncovered_new_lines.txt
+done
+
-- 
gitgitgadget


[PATCH v2 0/1] contrib: Add script to show uncovered "new" lines

2018-09-13 Thread Derrick Stolee via GitGitGadget
We have coverage targets in our Makefile for using gcov to display line
coverage based on our test suite. The way I like to do it is to run:

make coverage-test
make coverage-report

This leaves the repo in a state where every X.c file that was covered has an
X.c.gcov file containing the coverage counts for every line, and "#" at
every uncovered line.

There have been a few bugs in recent patches that would have been caught if
the test suite covered those blocks (including a few of mine). I want to
work towards a "sensible" amount of coverage on new topics. In my opinion,
this means that any logic should be covered, but the 'die()' blocks in error
cases do not need to be covered.

It is important to not measure the coverage of the codebase by what old code
is not covered. To help, I created the 'contrib/coverage-diff.sh' script.
After creating the coverage statistics at a version (say, 'topic') you can
then run

contrib/coverage-diff.sh base topic

to see the lines added between 'base' and 'topic' that are not covered by
the test suite. For example, I ran this against the 'jch' branch (d3c0046)
versus 'next' (dd90340) and got the following output:

builtin/commit.c
859fdc0c3cf (Derrick Stolee 2018-08-29 05:49:04 -0700   1657)   
write_commit_graph_reachable(get_object_directory(), 0);
builtin/rev-list.c
250edfa8c87 (Harald Nordgren2018-04-18 23:05:35 +0200   431)
bisect_flags |= BISECT_FIND_ALL;
builtin/worktree.c
e5353bef550 (Eric Sunshine  2018-08-28 17:20:19 -0400   60) 
error_errno(_("failed to delete '%s'"), sb.buf);
e19831c94f9 (Eric Sunshine  2018-08-28 17:20:23 -0400   251)
die(_("unable to re-add worktree '%s'"), path);
68a6b3a1bd4 (Eric Sunshine  2018-08-28 17:20:24 -0400   793)
die(_("cannot move a locked working tree, lock reason: %s\nuse 'move -f 
-f' to override or unlock first"),
f4143101cbb (Eric Sunshine  2018-08-28 17:20:25 -0400   906)
die(_("cannot remove a locked working tree, lock reason: %s\nuse 
'remove -f -f' to override or unlock first"),
read-cache.c
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1754)   
const unsigned char *cp = (const unsigned char *)name;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1757)   
previous_len = previous_ce ? previous_ce->ce_namelen : 0;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1758)   
strip_len = decode_varint();
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1759)   
if (previous_len < strip_len) {
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1760)   
if (previous_ce)
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1761)   
die(_("malformed name field in the index, near path 
'%s'"),
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1762)   
previous_ce->name);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1764)   
die(_("malformed name field in the index in the first 
path"));
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1766)   
copy_len = previous_len - strip_len;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1767)   
name = (const char *)cp;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1773)   
len += copy_len;
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1794)   
if (copy_len)
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1795)   
memcpy(ce->name, previous_ce->name, copy_len);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1796)   
memcpy(ce->name + copy_len, name, len + 1 - copy_len);
67922a3 (Nguyễn Thái Ngọc Duy   2018-09-02 15:19:33 +0200   1797)   
*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
remote-curl.c
c3b9bc94b9b (Elijah Newren  2018-09-05 10:03:07 -0700   181)
options.filter = xstrdup(value);

Using this 'git blame' output, we can quickly inspect whether the uncovered
lines are appropriate. For instance:

 1. The line in builtin/commit.c is due to writing the commit-graph file
when GIT_TEST_COMMIT_GRAPH is enabled, which is not on by default in the
test suite. Being uncovered is expected here.


 2. The lines in builtin/worktree.c are all related to error conditions.
This is acceptable.


 3. The line in builtin/rev-list.c is a flag replacement in a block that is
otherwise unchanged, so it is normally not covered by the test suite.
It could be worth adding a test to ensure the new logic maintains the old
behavior.


 4. The lines in 
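The blame-based triage shown above can be scripted; here is a minimal
sketch (the `blame_uncovered` name and the "path lineno" input format are
assumptions for illustration, not part of the contrib script):

```shell
# Print a git-blame annotation for each uncovered line, so a reviewer
# can see which commit introduced it.  Reads "path lineno" pairs on stdin.
blame_uncovered () {
	while read -r path lineno
	do
		git blame -L "$lineno,$lineno" -- "$path"
	done
}
```

Feeding it the uncovered-line list produced by the coverage run yields a
listing like the one above, ready for manual inspection.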

Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Ben Peart




On 9/12/2018 6:31 PM, Thomas Gummerer wrote:

On 09/12, Ben Peart wrote:

Teach get_index_format_default() to support running the test suite
with specific index versions.  In particular, this enables the test suite
to be run using index version 4 which is not the default so gets less testing.


I found this commit message slightly misleading.  Running the test
suite with specific index versions is already supported, by defining
TEST_GIT_INDEX_VERSION in 'config.mak'.  What we're doing here is
introducing an additional environment variable that can also be used to
set the index format in tests.

Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the
test suite with index-v4.  Admittedly the name is a bit strange
compared to our usual GIT_TEST_* environment variable names, and it
should probably be documented better (it's only documented in the
Makefile currently), but I'm not sure we should introduce another
environment variable for this purpose?


TEST_GIT_INDEX_VERSION enables the testing I was looking for but you're 
right, it isn't well documented and the atypical naming and 
implementation don't help either.


I checked the documentation and code but didn't see any way to test the 
V4 index code path so wrote this patch.  I wonder if we can improve the 
discoverability of TEST_GIT_INDEX_VERSION through better naming and 
documentation.


How about this as an alternative?




diff --git a/Makefile b/Makefile
index 5a969f5830..9e84ef02f7 100644
--- a/Makefile
+++ b/Makefile
@@ -400,7 +400,7 @@ all::
 # (defaults to "man") if you want to have a different default when
 # "git help" is called without a parameter specifying the format.
 #
-# Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
+# Define GIT_TEST_INDEX_VERSION to 2, 3 or 4 to run the test suite
 # with a different indexfile format version.  If it isn't set the index
 # file format used is index-v[23].
 #
@@ -2599,8 +2599,8 @@ endif
 ifdef GIT_INTEROP_MAKE_OPTS
	@echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
 endif
-ifdef TEST_GIT_INDEX_VERSION
-	@echo TEST_GIT_INDEX_VERSION=\''$(subst ','\'',$(subst ','\'',$(TEST_GIT_INDEX_VERSION)))'\' >>$@+
+ifdef GIT_TEST_INDEX_VERSION
+	@echo GIT_TEST_INDEX_VERSION=\''$(subst ','\'',$(subst ','\'',$(GIT_TEST_INDEX_VERSION)))'\' >>$@+
 endif
@if cmp $@+ $@ >/dev/null 2>&1; then $(RM) $@+; else mv $@+ $@; fi

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 44288cbb59..31698c01c4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -134,9 +134,9 @@ export EDITOR
 GIT_TRACE_BARE=1
 export GIT_TRACE_BARE

-if test -n "${TEST_GIT_INDEX_VERSION:+isset}"
+if test -n "${GIT_TEST_INDEX_VERSION:+isset}"
 then
-   GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
+   GIT_INDEX_VERSION="$GIT_TEST_INDEX_VERSION"
export GIT_INDEX_VERSION
 fi

diff --git a/t/README b/t/README
index 9028b47d92..f872638a78 100644
--- a/t/README
+++ b/t/README
@@ -315,10 +315,14 @@ packs on demand. This normally only happens when the object size is
 over 2GB. This variable forces the code path on any object larger than
 <n> bytes.

-GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncomon pack-objects code
+GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncommon pack-objects code
 path where deltas larger than this limit require extra memory
 allocation for bookkeeping.

+GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
+for the index version specified.  Can be set to any valid version
+but the non-default version 4 is probably the most beneficial.
+
  Naming Tests
  


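As a quick sanity check that the variable actually reaches the index
writer (test-lib.sh ultimately exports it as GIT_INDEX_VERSION, which
read-cache.c honors), one can inspect the version field of a freshly
written index file — a throwaway sketch, not part of the patch:

```shell
# The index file begins with the signature "DIRC" followed by a 4-byte
# big-endian version number, so byte 7 holds the version for small values.
dir=$(mktemp -d) && cd "$dir" && git init -q
>file
GIT_INDEX_VERSION=4 git add file
version=$(od -An -t u1 -j 7 -N 1 .git/index | tr -d ' ')
echo "$version"
```

If the environment variable is wired through correctly, this prints 4
instead of the default 2 or 3.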

[PATCH v2] builtin/remote: quote remote name on error to display empty name

2018-09-13 Thread Shulhan
When adding new remote name with empty string, git will print the
following error message,

  fatal: '' is not a valid remote name\n

But when removing remote name with empty string as input, git shows the
empty string without quote,

  fatal: No such remote: \n

To make these error messages consistent, quote the name of the remote
that we tried and failed to find.

Signed-off-by: Shulhan 
Reviewed-by: Junio C Hamano 
---
 builtin/remote.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 40c6f8a1b..f7edf7f2c 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -626,7 +626,7 @@ static int mv(int argc, const char **argv)
 
oldremote = remote_get(rename.old_name);
if (!remote_is_configured(oldremote, 1))
-   die(_("No such remote: %s"), rename.old_name);
+   die(_("No such remote: '%s'"), rename.old_name);
 
if (!strcmp(rename.old_name, rename.new_name) && oldremote->origin != 
REMOTE_CONFIG)
return migrate_file(oldremote);
@@ -762,7 +762,7 @@ static int rm(int argc, const char **argv)
 
remote = remote_get(argv[1]);
if (!remote_is_configured(remote, 1))
-   die(_("No such remote: %s"), argv[1]);
+   die(_("No such remote: '%s'"), argv[1]);
 
known_remotes.to_delete = remote;
for_each_remote(add_known_remote, &known_remotes);
@@ -861,7 +861,7 @@ static int get_remote_ref_states(const char *name,
 
states->remote = remote_get(name);
if (!states->remote)
-   return error(_("No such remote: %s"), name);
+   return error(_("No such remote: '%s'"), name);
 
read_branches();
 
-- 
2.19.0.400.ge8095fc63



Re: [PATCH 1/1] contrib: add coverage-diff script

2018-09-13 Thread Derrick Stolee

On 9/12/2018 6:54 PM, Junio C Hamano wrote:

Junio C Hamano  writes:


"Derrick Stolee via GitGitGadget"  writes:


  contrib/coverage-diff.sh | 70 
  1 file changed, 70 insertions(+)
  create mode 100755 contrib/coverage-diff.sh

I fully appreciate the motivation.  But it is a bit sad that this
begins with "#!/bin/bash" but it seems that the script is full of
bash-isms.  I haven't gone through the script to see if these are
inevitable or gratuitous yet, but I'd assume it made it easier for
you to write it to step outside the pure POSIX shell?


I completely forgot to avoid bash, as I wrote this first as an experiment.


...

+   elif [[ $REPLY =~ \+\+\+\ (b/)?([^[:blank:]$esc]+).* ]]; then
+   path=${BASH_REMATCH[2]}

OK, it probably is easier to write in bash than using expr if you
want to do regexp.

Just to clarify. I am saying that it is OK to give up writing in
pure POSIX and relying on bash-isms after seeing these lines.


I'll try rewriting it using POSIX shell and see how hard it is.

Thanks,

-Stolee
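For what it's worth, the `BASH_REMATCH` extraction quoted above can be
approximated in POSIX sh with parameter expansion — a sketch under the
assumption that paths in `+++` headers contain no blanks, not the
eventual patch:

```shell
# Extract the path from a "+++ b/<path>" diff header without bash regexes.
REPLY='+++ b/contrib/coverage-diff.sh'   # sample input line
case "$REPLY" in
"+++ "*)
	path=${REPLY#"+++ "}   # drop the "+++ " marker
	path=${path#b/}        # drop the optional b/ prefix
	;;
esac
echo "$path"
```

`case` patterns plus `${var#pattern}` stripping cover most of what the
script needs, at the cost of a little verbosity.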



RE: [Question] Signature calculation ignoring parts of binary files

2018-09-13 Thread Randall S. Becker
On September 12, 2018 7:00 PM, Junio C Hamano wrote:
> "Randall S. Becker"  writes:
> 
> >> author is important to our process. My objective is to keep the
> >> original file 100% exact as supplied and then ignore any changes to
> >> the metadata that I don't care about (like Creator) if the remainder
> >> of the file is the same.
> 
> That will *not* work.  If person A gave you a version of original, which
> hashes to X after you strip the cruft you do not care about, you would
> register that original with person A's fingerprint on under the name of X.
> What happens when person B gives you another version, which is not
> byte-for-byte identical to the one you got earlier from person A, but
> does hash to the same X after you strip the cruft?  If you are going to
> store it in Git, and if by SHA-1 you are calling what we perceive as
> "object name" in Git land, you must store that one with person B's
> fingerprint on it also under the name of X.  Now which version will you
> get from Git when you ask it to give you the object that hashes to X?

The scenario is slightly different.
1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
git unchanged.
2. Person B gives me binary file-2 with fingerprint B2. This does not go
into git yet.
3. We attempt a git diff between the committed file-1 and uncommitted file-2
using a textconv implementation that strips what we don't need to compare.
4. If file-1 and file-2 have no difference when textconv is used, file-2 is
not added and not committed. It is discarded with impunity, never to be seen
again, although we might whine a lot at the user for attempting to put
file-2 in - but that's not git's issue.
5. If file-1 and file-2 have differences when textconv is used, file-2 is
committed with fingerprint B2.
6. Even if an error is made by the user and they commit file-2 with B2
regardless of textconv, there will be a human who complains about it, but
git has two unambiguous fingerprints that happen to have no diffs after
textconv is applied.
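For reference, the textconv wiring for step 3 might look like the
following (the `*.img` pattern and the `strip-metadata` helper are
hypothetical placeholders for whatever binary format and stripping tool
are actually in use):

```ini
# .gitattributes — route the binary files through a custom diff driver
*.img diff=stripmeta

# .git/config — the driver prints each file with volatile metadata
# (e.g. Creator fields) removed before diffing
[diff "stripmeta"]
	textconv = strip-metadata
```

With this in place, a plain `git diff` between committed file-1 and the
candidate file-2 compares the stripped forms, while the committed blobs
themselves stay byte-for-byte original.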

My original hope was that textconv could be used to influence the
fingerprint, but I do not think that is the case, so I went with an
alternative. In the application, I am not allowed to strip any cruft off
file-1 when it is stored - it must be byte-for-byte the original file. This
application is marginally related to a DRM-like situation where we only care
about the original image provided by a user, but any copies that are
provided by another user with modified metadata will be disallowed from
the repository.

Does that make more sense? 

Cheers,
Randall



Re: with git 1.8.3.1 get only merged tags

2018-09-13 Thread Ævar Arnfjörð Bjarmason


On Tue, Sep 11 2018, Michal Novotny wrote:

> I need to emulate git tag --merged with very old git 1.8.3.1. Is that
> somehow possible?
> I am looking for a bash function that would take what git 1.8.3.1
> offers and return only the tags accessible from the current branch

Jeff answered the question you had, but I just have one of my own: is
RedHat stuck on 1.8-era git in some release it's still maintaining? Does
this mean that, e.g., you're still backporting security fixes to this
2012-era release?


Re: [PATCH] add -p: coalesce hunks before testing applicability

2018-09-13 Thread Phillip Wood
Hi Jochen

On 03/09/2018 20:01, Jochen Sprickerhof wrote:
> Hi Phillip,
> 
> * Phillip Wood  [2018-08-30 14:47]:
>> When $newhunk is created it is marked as dirty to prevent
>> coalesce_overlapping_hunks() from coalescing it. This patch does not
>> change that. What is happening is that by calling
>> coalesce_overlapping_hunks() the hunks that are not currently selected
>> are filtered out and any hunks that can be coalesced are (I think that
>> in the test that starts passing with this patch the only change is the
>> filtering as there's only a single hunk selected).
> 
> Agreed here. It would be enough to include the first hunk in the test to
> make it fail again. Still I would see the patch as going in the right
> direction as we need something like coalesce_overlapping_hunks() to make
> the hunks applicable after the edit.

Yes in the long term we want to be able to coalesce edited hunks, but I
think it is confusing to call coalesce_overlapping_hunks() at the moment
as it will not coalesce the edited hunks.

>> This is a subtle change to the test for the applicability of an edited
>> hunk. Previously when all the hunks were used to create the test patch
>> we could be certain that if the test patch applied then if the user
>> later selected any unselected hunk or deselected any selected hunk
>> then that operation would succeed. I'm not sure that is true now (but
>> I haven't thought about it for very long).
> 
> I'm not sure here. If we use the same test from t3701, do s(plit),
> y(es), e(dit), it would fail later on. Can you come up with an example?

I think that if you split a hunk, edit the first subhunk, transforming a
trailing context line to a deletion then try if you try to stage the
second subhunk it will fail. With your patch the edit will succeed as
the second subhunk is skipped when testing the edited patch. Then when
you try to stage the second subhunk it will fail as it's leading context
will contradict the trailing lines of the edited subhunk. With the old
method the edit failed but didn't store up trouble for the future.

>> We could restore the old test condition and coalesce the hunks by
>> copying all the hunks and setting $hunk->{USE}=1 when creating the
>> test patch if that turns out to be useful (it would be interesting to
>> see if the test still passes with that change).
> 
> We set USE=1 for $newhunk already, or where would you set it?

To match the old test it needs to be set on the hunks we've skipped or
haven't got to yet so they're all in the patch that's tested after
editing a hunk.

Best Wishes

Phillip

> 
> Cheers Jochen



Re: [PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-13 Thread Eric Sunshine
On Wed, Sep 12, 2018 at 3:01 PM Thomas Gummerer  wrote:
> Subject: [PATCH] linear-assignment: fix potential out of bounds memory access
>
> Currently the 'compute_assignment()' function can may read memory out

"can may"?

> of bounds, even if used correctly.  Namely this happens when we only
> have one column.  In that case we try to calculate the initial
> minimum cost using '!j1' as column in the reduction transfer code.
> That in turn causes us to try and get the cost from column 1 in the
> cost matrix, which does not exist, and thus results in an out of
> bounds memory read.


git-credential-libsecret not prompting to unlock keyring

2018-09-13 Thread Paul Jolly
> Apologies, forgot the crucial details post that log:

This turned out to be an error unrelated to git or git-credential-libsecret.

Apologies for the noise (and the badly threaded reply earlier).


Paul


Re: with git 1.8.3.1 get only merged tags

2018-09-13 Thread Michal Novotny
On Tue, Sep 11, 2018 at 9:05 PM Jeff King  wrote:
>
> On Tue, Sep 11, 2018 at 12:43:15PM +0200, Michal Novotny wrote:
>
> > I need to emulate git tag --merged with very old git 1.8.3.1. Is that
> > somehow possible?
> > I am looking for a bash function that would take what git 1.8.3.1
> > offers and return only the tags accessible from the current branch
> > tip. Could you, please, give me at least a hint how this could be
> > done?
>
> This is not particularly fast, but it should work:
>
>   git for-each-ref refs/tags |
>   cut -f2 |
>   while read tag; do
> test "$(git merge-base $tag HEAD)" = \
>  "$(git rev-parse $tag^{commit})" && echo $tag
>   done

That works for me. Thank you a lot for help!

clime

>
> -Peff
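As an aside, Peff's pipeline quoted above wraps up naturally as the bash
function the original question asked for — a sketch, assuming tag names
without embedded whitespace:

```shell
# Emulate newer git's "git tag --merged HEAD" on old versions such as
# 1.8.3.1: a tag is "merged" iff its commit is an ancestor of HEAD,
# i.e. the merge base of the tag and HEAD is the tag's own commit.
tags_merged () {
	git for-each-ref --format='%(refname:short)' refs/tags |
	while read -r tag
	do
		test "$(git merge-base "$tag" HEAD)" = \
		     "$(git rev-parse "$tag^{commit}")" && echo "$tag"
	done
}
```

Calling `tags_merged` on a branch then prints only the tags reachable
from its tip, one per line.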