date:20170523

Re: [PATCH 0/2] Fix warnings on access of a remote with Windows paths

2017-05-23 Thread Johannes Sixt


Am 24.05.2017 um 00:08 schrieb Junio C Hamano:

So in short:

  (1) Hannes's patches are good, but they solve a problem that is
  different from what their log messages say; the log message
  needs to be updated;

  (2) In nd/fopen-errors topic, we need a new patch that deals with
  EINVAL on Windows.


Will reroll the patches. Good analysis, BTW. (It was a late discovery of 
mine that nd/fopen-errors is actually the source of the warnings.)


-- Hannes

[PATCH 08/29] blame: rename ent_score function

2017-05-23 Thread Jeff Smith

Functions that will be publicly exposed should have names that better
reflect what they are a part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 7c493d2..129ef28 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -959,7 +959,7 @@ static void pass_blame_to_parent(struct blame_scoreboard 
*sb,
  *
  * Compute how trivial the lines in the blame_entry are.
  */
-static unsigned ent_score(struct blame_scoreboard *sb, struct blame_entry *e)
+static unsigned blame_entry_score(struct blame_scoreboard *sb, struct 
blame_entry *e)
 {
unsigned score;
const char *cp, *ep;
@@ -995,7 +995,7 @@ static void copy_split_if_better(struct blame_scoreboard 
*sb,
if (!this[1].suspect)
return;
if (best_so_far[1].suspect) {
-   if (ent_score(sb, [1]) < ent_score(sb, _so_far[1]))
+   if (blame_entry_score(sb, [1]) < blame_entry_score(sb, 
_so_far[1]))
return;
}
 
@@ -1107,7 +1107,7 @@ static struct blame_entry **filter_small(struct 
blame_scoreboard *sb,
struct blame_entry *p = *source;
struct blame_entry *oldsmall = *small;
while (p) {
-   if (ent_score(sb, p) <= score_min) {
+   if (blame_entry_score(sb, p) <= score_min) {
*small = p;
small = >next;
p = *small;
@@ -1156,7 +1156,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
next = e->next;
find_copy_in_blob(sb, e, parent, split, _p);
if (split[1].suspect &&
-   blame_move_score < ent_score(sb, [1])) {
+   blame_move_score < blame_entry_score(sb, 
[1])) {
split_blame(blamed, , split, e);
} else {
e->next = leftover;
@@ -1286,7 +1286,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
for (j = 0; j < num_ents; j++) {
struct blame_entry *split = blame_list[j].split;
if (split[1].suspect &&
-   blame_copy_score < ent_score(sb, [1])) {
+   blame_copy_score < blame_entry_score(sb, 
[1])) {
split_blame(blamed, , split,
blame_list[j].ent);
} else {
@@ -2104,8 +2104,8 @@ static void find_alignment(struct blame_scoreboard *sb, 
int *option)
num = e->lno + e->num_lines;
if (longest_dst_lines < num)
longest_dst_lines = num;
-   if (largest_score < ent_score(sb, e))
-   largest_score = ent_score(sb, e);
+   if (largest_score < blame_entry_score(sb, e))
+   largest_score = blame_entry_score(sb, e);
}
max_orig_digits = decimal_width(longest_src_lines);
max_digits = decimal_width(longest_dst_lines);
-- 
2.9.3

Re: [PATCH v2 4/7] grep: add support for the PCRE v1 JIT API

2017-05-23 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> diff --git a/grep.c b/grep.c
> index 1157529115..49e9aed457 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, 
> const struct grep_opt *opt)
>   const char *error;
>   int erroffset;
>   int options = PCRE_MULTILINE;
> +#ifdef PCRE_CONFIG_JIT
> + int canjit;
> +#endif

Is "canjit" a property purely of the library (e.g. version and
compilation option), or of combination of that and nature of the
pattern, or something else like the memory pressure?

I am wondering if it is worth doing something like this:

static int canjit = -1;
if (canjit < 0) {
pcre_config(PCRE_CONFIG_JIT, );
}

if it depends purely on the library linked to the process.

> @@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, 
> const struct grep_opt *opt)
>   if (!p->pcre1_regexp)
>   compile_regexp_failed(p, error);
>  
> - p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, );
> + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
> PCRE_STUDY_JIT_COMPILE, );
>   if (!p->pcre1_extra_info && error)
>   die("%s", error);
> +
> +#ifdef PCRE_CONFIG_JIT
> + pcre_config(PCRE_CONFIG_JIT, );
> + if (canjit == 1) {
> + p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
> + if (!p->pcre1_jit_stack)
> + die("BUG: Couldn't allocate PCRE JIT stack");

I agree that dying is OK, but as far as I can tell, this is not a
BUG (there is no error a programmer can correct by a follow-up
patch); please do not mark it as such (it is likely that we'll later
do a tree-wide s/die("BUG:/BUG("/ and this will interfere).

> + pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
> p->pcre1_jit_stack);
> + p->pcre1_jit_on = 1;

Contrary to what I wondered about "canjit" above, I think it makes
tons of sense to contain the "is JIT in use?" information in "struct
grep_pat" and not rely on any global state.  Not that we are likely
to want to be able to JIT some patterns while not doing others.  So
I agree with the design choice of adding pcre1_jit_on field to the
structure.

But then, wouldn't it make more sense to do all of the above without
the canjit variable at all?  i.e. something like...

#ifdef PCRE_CONFIG_JIT
pcre_config(PCRE_CONFIG_JIT, >pcre1_jit_on);
if (p->pcre1_jit_on)
... stack thing ...
#else
p->pcre1_jit_on = 0;
#endif

> +#ifdef PCRE_CONFIG_JIT
> + if (p->pcre1_jit_on) {
> + pcre_free_study(p->pcre1_extra_info);
> + pcre_jit_stack_free(p->pcre1_jit_stack);
> + } else
> +#endif
> + /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
>   pcre_free(p->pcre1_extra_info);
> +
>   pcre_free((void *)p->pcre1_tables);

It is very thoughtful to add a blank line here (and you did the same
in another similar hunk), but I have a feeling that it is still a
bit too subtle a hint to signal to the readers that these two
pcre_free()s fire differently, i.e. the former does not fire if jit
is on but the latter always fires.

Would this be a bit safer while being not too ugly to live, I wonder?

#ifdef PCRE_CONFIG_JIT
if (p->pcre1_jit_on) {
pcre_free_study(p->pcre1_extra_info);
pcre_jit_stack_free(p->pcre1_jit_stack);
} else
#endif
{
/* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
pcre_free(p->pcre1_extra_info);
}
pcre_free((void *)p->pcre1_tables);

Thanks.

[PATCH 15/29] blame: move xdl_opts flags to scoreboard

2017-05-23 Thread Jeff Smith

The xdl_opts flags are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index fdd41b4..8e676fb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -125,7 +125,7 @@ struct progress_info {
 };
 
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
- xdl_emit_hunk_consume_func_t hunk_func, void *cb_data)
+ xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, 
int xdl_opts)
 {
xpparam_t xpp = {0};
xdemitconf_t xecfg = {0};
@@ -385,6 +385,7 @@ struct blame_scoreboard {
/* flags */
int reverse;
int show_root;
+   int xdl_opts;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -948,7 +949,7 @@ static void pass_blame_to_parent(struct blame_scoreboard 
*sb,
fill_origin_blob(>revs->diffopt, target, _o, 
>num_read_blob);
sb->num_get_patch++;
 
-   if (diff_hunks(_p, _o, blame_chunk_cb, ))
+   if (diff_hunks(_p, _o, blame_chunk_cb, , sb->xdl_opts))
die("unable to generate diff (%s -> %s)",
oid_to_hex(>commit->object.oid),
oid_to_hex(>commit->object.oid));
@@ -1097,7 +1098,7 @@ static void find_copy_in_blob(struct blame_scoreboard *sb,
 * file_p partially may match that image.
 */
memset(split, 0, sizeof(struct blame_entry [3]));
-   if (diff_hunks(file_p, _o, handle_split_cb, ))
+   if (diff_hunks(file_p, _o, handle_split_cb, , sb->xdl_opts))
die("unable to generate diff (%s)",
oid_to_hex(>commit->object.oid));
/* remainder, if any, all match the preimage */
@@ -2887,6 +2888,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.copy_score = blame_copy_score;
 
sb.show_root = show_root;
+   sb.xdl_opts = xdl_opts;
 
read_mailmap(, NULL);
 
-- 
2.9.3

[PATCH 29/29] blame: move entry prepend to libgit

2017-05-23 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 16 
 blame.h |  2 ++
 builtin/blame.c | 14 --
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/blame.c b/blame.c
index f6c9cb7..00404b9 100644
--- a/blame.c
+++ b/blame.c
@@ -1845,3 +1845,19 @@ void setup_scoreboard(struct blame_scoreboard *sb, const 
char *path, struct blam
if (orig)
*orig = o;
 }
+
+
+
+struct blame_entry *blame_entry_prepend(struct blame_entry *head,
+   long start, long end,
+   struct blame_origin *o)
+{
+   struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry));
+   new_head->lno = start;
+   new_head->num_lines = end - start;
+   new_head->suspect = o;
+   new_head->s_lno = start;
+   new_head->next = head;
+   blame_origin_incref(o);
+   return new_head;
+}
diff --git a/blame.h b/blame.h
index 76fd8ef..a6c915c 100644
--- a/blame.h
+++ b/blame.h
@@ -170,4 +170,6 @@ extern const char *blame_nth_line(struct blame_scoreboard 
*sb, long lno);
 extern void init_scoreboard(struct blame_scoreboard *sb);
 extern void setup_scoreboard(struct blame_scoreboard *sb, const char *path, 
struct blame_origin **orig);
 
+extern struct blame_entry *blame_entry_prepend(struct blame_entry *head, long 
start, long end, struct blame_origin *o);
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 7d9e322..08f35bd 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -648,20 +648,6 @@ static int blame_move_callback(const struct option 
*option, const char *arg, int
return 0;
 }
 
-struct blame_entry *blame_entry_prepend(struct blame_entry *head,
-   long start, long end,
-   struct blame_origin *o)
-{
-   struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry));
-   new_head->lno = start;
-   new_head->num_lines = end - start;
-   new_head->suspect = o;
-   new_head->s_lno = start;
-   new_head->next = head;
-   blame_origin_incref(o);
-   return new_head;
-}
-
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
-- 
2.9.3

[PATCH 28/29] blame: move scoreboard setup to libgit

2017-05-23 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 279 +++-
 blame.h |  10 +-
 builtin/blame.c | 276 ---
 3 files changed, 281 insertions(+), 284 deletions(-)

diff --git a/blame.c b/blame.c
index 798e61b..f6c9cb7 100644
--- a/blame.c
+++ b/blame.c
@@ -4,6 +4,7 @@
 #include "mergesort.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "tag.h"
 #include "blame.h"
 
 void blame_origin_decref(struct blame_origin *o)
@@ -49,7 +50,7 @@ static struct blame_origin *make_origin(struct commit 
*commit, const char *path)
  * Locate an existing origin or create a new one.
  * This moves the origin to front position in the commit util list.
  */
-struct blame_origin *get_origin(struct commit *commit, const char *path)
+static struct blame_origin *get_origin(struct commit *commit, const char *path)
 {
struct blame_origin *o, *l;
 
@@ -142,9 +143,9 @@ static void set_commit_buffer_from_strbuf(struct commit *c, 
struct strbuf *sb)
  * Prepare a dummy commit that represents the work tree (or staged) item.
  * Note that annotating work tree item never works in the reverse.
  */
-struct commit *fake_working_tree_commit(struct diff_options *opt,
-   const char *path,
-   const char *contents_from)
+static struct commit *fake_working_tree_commit(struct diff_options *opt,
+  const char *path,
+  const char *contents_from)
 {
struct commit *commit;
struct blame_origin *origin;
@@ -410,6 +411,13 @@ void blame_sort_final(struct blame_scoreboard *sb)
  compare_blame_final);
 }
 
+static int compare_commits_by_reverse_commit_date(const void *a,
+ const void *b,
+ void *c)
+{
+   return -compare_commits_by_commit_date(a, b, c);
+}
+
 /*
  * For debugging -- origin is refcounted, and this asserts that
  * we do not underflow.
@@ -483,6 +491,32 @@ static void queue_blames(struct blame_scoreboard *sb, 
struct blame_origin *porig
 }
 
 /*
+ * Fill the blob_sha1 field of an origin if it hasn't, so that later
+ * call to fill_origin_blob() can use it to locate the data.  blob_sha1
+ * for an origin is also used to pass the blame for the entire file to
+ * the parent to detect the case where a child's blob is identical to
+ * that of its parent's.
+ *
+ * This also fills origin->mode for corresponding tree path.
+ */
+static int fill_blob_sha1_and_mode(struct blame_origin *origin)
+{
+   if (!is_null_oid(>blob_oid))
+   return 0;
+   if (get_tree_entry(origin->commit->object.oid.hash,
+  origin->path,
+  origin->blob_oid.hash, >mode))
+   goto error_out;
+   if (sha1_object_info(origin->blob_oid.hash, NULL) != OBJ_BLOB)
+   goto error_out;
+   return 0;
+ error_out:
+   oidclr(>blob_oid);
+   origin->mode = S_IFINVALID;
+   return -1;
+}
+
+/*
  * We have an origin -- check if the same path exists in the
  * parent and return an origin structure to represent it.
  */
@@ -1574,3 +1608,240 @@ void assign_blame(struct blame_scoreboard *sb, int opt)
sanity_check_refcnt(sb);
}
 }
+
+static const char *get_next_line(const char *start, const char *end)
+{
+   const char *nl = memchr(start, '\n', end - start);
+   return nl ? nl + 1 : end;
+}
+
+/*
+ * To allow quick access to the contents of nth line in the
+ * final image, prepare an index in the scoreboard.
+ */
+static int prepare_lines(struct blame_scoreboard *sb)
+{
+   const char *buf = sb->final_buf;
+   unsigned long len = sb->final_buf_size;
+   const char *end = buf + len;
+   const char *p;
+   int *lineno;
+   int num = 0;
+
+   for (p = buf; p < end; p = get_next_line(p, end))
+   num++;
+
+   ALLOC_ARRAY(sb->lineno, num + 1);
+   lineno = sb->lineno;
+
+   for (p = buf; p < end; p = get_next_line(p, end))
+   *lineno++ = p - buf;
+
+   *lineno = len;
+
+   sb->num_lines = num;
+   return sb->num_lines;
+}
+
+static struct commit *find_single_final(struct rev_info *revs,
+   const char **name_p)
+{
+   int i;
+   struct commit *found = NULL;
+   const char *name = NULL;
+
+   for (i = 0; i < revs->pending.nr; i++) {
+   struct object *obj = revs->pending.objects[i].item;
+   if (obj->flags & UNINTERESTING)
+   continue;
+   obj = deref_tag(obj, NULL, 0);
+   if (obj->type != OBJ_COMMIT)
+   die("Non commit %s?", revs->pending.objects[i].name);
+   if (found)
+

[PATCH 27/29] blame: move scoreboard-related methods to libgit

2017-05-23 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 1313 ++
 blame.h |   11 +
 builtin/blame.c | 1318 ---
 3 files changed, 1324 insertions(+), 1318 deletions(-)

diff --git a/blame.c b/blame.c
index 8c025bc..798e61b 100644
--- a/blame.c
+++ b/blame.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "refs.h"
 #include "cache-tree.h"
+#include "mergesort.h"
+#include "diff.h"
+#include "diffcore.h"
 #include "blame.h"
 
 void blame_origin_decref(struct blame_origin *o)
@@ -261,3 +264,1313 @@ struct commit *fake_working_tree_commit(struct 
diff_options *opt,
 
return commit;
 }
+
+
+
+static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+ xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, 
int xdl_opts)
+{
+   xpparam_t xpp = {0};
+   xdemitconf_t xecfg = {0};
+   xdemitcb_t ecb = {NULL};
+
+   xpp.flags = xdl_opts;
+   xecfg.hunk_func = hunk_func;
+   ecb.priv = cb_data;
+   return xdi_diff(file_a, file_b, , , );
+}
+
+/*
+ * Given an origin, prepare mmfile_t structure to be used by the
+ * diff machinery
+ */
+static void fill_origin_blob(struct diff_options *opt,
+struct blame_origin *o, mmfile_t *file, int 
*num_read_blob)
+{
+   if (!o->file.ptr) {
+   enum object_type type;
+   unsigned long file_size;
+
+   (*num_read_blob)++;
+   if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
+   textconv_object(o->path, o->mode, >blob_oid, 1, 
>ptr, _size))
+   ;
+   else
+   file->ptr = read_sha1_file(o->blob_oid.hash, ,
+  _size);
+   file->size = file_size;
+
+   if (!file->ptr)
+   die("Cannot read blob %s for path %s",
+   oid_to_hex(>blob_oid),
+   o->path);
+   o->file = *file;
+   }
+   else
+   *file = o->file;
+}
+
+static void drop_origin_blob(struct blame_origin *o)
+{
+   if (o->file.ptr) {
+   free(o->file.ptr);
+   o->file.ptr = NULL;
+   }
+}
+
+/*
+ * Any merge of blames happens on lists of blames that arrived via
+ * different parents in a single suspect.  In this case, we want to
+ * sort according to the suspect line numbers as opposed to the final
+ * image line numbers.  The function body is somewhat longish because
+ * it avoids unnecessary writes.
+ */
+
+static struct blame_entry *blame_merge(struct blame_entry *list1,
+  struct blame_entry *list2)
+{
+   struct blame_entry *p1 = list1, *p2 = list2,
+   **tail = 
+
+   if (!p1)
+   return p2;
+   if (!p2)
+   return p1;
+
+   if (p1->s_lno <= p2->s_lno) {
+   do {
+   tail = >next;
+   if ((p1 = *tail) == NULL) {
+   *tail = p2;
+   return list1;
+   }
+   } while (p1->s_lno <= p2->s_lno);
+   }
+   for (;;) {
+   *tail = p2;
+   do {
+   tail = >next;
+   if ((p2 = *tail) == NULL)  {
+   *tail = p1;
+   return list1;
+   }
+   } while (p1->s_lno > p2->s_lno);
+   *tail = p1;
+   do {
+   tail = >next;
+   if ((p1 = *tail) == NULL) {
+   *tail = p2;
+   return list1;
+   }
+   } while (p1->s_lno <= p2->s_lno);
+   }
+}
+
+static void *get_next_blame(const void *p)
+{
+   return ((struct blame_entry *)p)->next;
+}
+
+static void set_next_blame(void *p1, void *p2)
+{
+   ((struct blame_entry *)p1)->next = p2;
+}
+
+/*
+ * Final image line numbers are all different, so we don't need a
+ * three-way comparison here.
+ */
+
+static int compare_blame_final(const void *p1, const void *p2)
+{
+   return ((struct blame_entry *)p1)->lno > ((struct blame_entry *)p2)->lno
+   ? 1 : -1;
+}
+
+static int compare_blame_suspect(const void *p1, const void *p2)
+{
+   const struct blame_entry *s1 = p1, *s2 = p2;
+   /*
+* to allow for collating suspects, we sort according to the
+* respective pointer value as the primary sorting criterion.
+* The actual relation is pretty unimportant as long as it
+* establishes a total order.  Comparing as integers gives us
+* that.
+*/
+   if (s1->suspect != s2->suspect)
+   return (intptr_t)s1->suspect > (intptr_t)s2->suspect ? 1 : -1;
+   if (s1->s_lno ==

[PATCH 21/29] blame: create scoreboard init function

2017-05-23 Thread Jeff Smith

Create function that initializes blame_scoreboard to default values.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index e343520..f839571 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2537,6 +2537,13 @@ static int blame_move_callback(const struct option 
*option, const char *arg, int
return 0;
 }
 
+void init_scoreboard(struct blame_scoreboard *sb)
+{
+   memset(sb, 0, sizeof(struct blame_scoreboard));
+   sb->move_score = BLAME_DEFAULT_MOVE_SCORE;
+   sb->copy_score = BLAME_DEFAULT_COPY_SCORE;
+}
+
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
@@ -2747,10 +2754,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
revs.disable_stdin = 1;
setup_revisions(argc, argv, , NULL);
-   memset(, 0, sizeof(sb));
-   sb.move_score = BLAME_DEFAULT_MOVE_SCORE;
-   sb.copy_score = BLAME_DEFAULT_COPY_SCORE;
 
+   init_scoreboard();
sb.revs = 
sb.contents_from = contents_from;
sb.reverse = reverse;
-- 
2.9.3

[PATCH 17/29] blame: make sanity_check use a callback in scoreboard

2017-05-23 Thread Jeff Smith

Allow the interface user to decide how to handle a failed sanity check,
whether that be to output with the current state or to do nothing.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 90c643c..1b53325 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -387,6 +387,10 @@ struct blame_scoreboard {
int show_root;
int xdl_opts;
int no_whole_file_rename;
+   int debug;
+
+   /* callbacks */
+   void(*on_sanity_fail)(struct blame_scoreboard *, int);
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -412,7 +416,7 @@ static void blame_coalesce(struct blame_scoreboard *sb)
}
}
 
-   if (DEBUG) /* sanity */
+   if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
 }
 
@@ -1809,7 +1813,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
}
blame_origin_decref(suspect);
 
-   if (DEBUG) /* sanity */
+   if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
}
 
@@ -2148,12 +2152,16 @@ static void sanity_check_refcnt(struct blame_scoreboard 
*sb)
baa = 1;
}
}
-   if (baa) {
-   int opt = 0160;
-   find_alignment(sb, );
-   output(sb, opt);
-   die("Baa %d!", baa);
-   }
+   if (baa)
+   sb->on_sanity_fail(sb, baa);
+}
+
+static void sanity_check_on_fail(struct blame_scoreboard *sb, int baa)
+{
+   int opt = OUTPUT_SHOW_SCORE | OUTPUT_SHOW_NUMBER | OUTPUT_SHOW_NAME;
+   find_alignment(sb, );
+   output(sb, opt);
+   die("Baa %d!", baa);
 }
 
 static unsigned parse_score(const char *arg)
@@ -2888,6 +2896,9 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (blame_copy_score)
sb.copy_score = blame_copy_score;
 
+   sb.debug = DEBUG;
+   sb.on_sanity_fail = _check_on_fail;
+
sb.show_root = show_root;
sb.xdl_opts = xdl_opts;
sb.no_whole_file_rename = no_whole_file_rename;
-- 
2.9.3

[PATCH 20/29] blame: rework methods that determine 'final' commit

2017-05-23 Thread Jeff Smith

Either prepare_initial or prepare_final is used to determine which
commit is marked as 'final'. Call the underlying methods directly to
make this more clear.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 49 +++--
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 61fd5b4..e343520 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2431,14 +2431,8 @@ static struct commit *find_single_final(struct rev_info 
*revs,
return found;
 }
 
-static char *prepare_final(struct blame_scoreboard *sb)
-{
-   const char *name;
-   sb->final = find_single_final(sb->revs, );
-   return xstrdup_or_null(name);
-}
-
-static const char *dwim_reverse_initial(struct blame_scoreboard *sb)
+static struct commit *dwim_reverse_initial(struct rev_info *revs,
+  const char **name_p)
 {
/*
 * DWIM "git blame --reverse ONE -- PATH" as
@@ -2449,11 +2443,11 @@ static const char *dwim_reverse_initial(struct 
blame_scoreboard *sb)
struct commit *head_commit;
unsigned char head_sha1[20];
 
-   if (sb->revs->pending.nr != 1)
+   if (revs->pending.nr != 1)
return NULL;
 
/* Is that sole rev a committish? */
-   obj = sb->revs->pending.objects[0].item;
+   obj = revs->pending.objects[0].item;
obj = deref_tag(obj, NULL, 0);
if (obj->type != OBJ_COMMIT)
return NULL;
@@ -2467,17 +2461,19 @@ static const char *dwim_reverse_initial(struct 
blame_scoreboard *sb)
 
/* Turn "ONE" into "ONE..HEAD" then */
obj->flags |= UNINTERESTING;
-   add_pending_object(sb->revs, _commit->object, "HEAD");
+   add_pending_object(revs, _commit->object, "HEAD");
 
-   sb->final = (struct commit *)obj;
-   return sb->revs->pending.objects[0].name;
+   if (name_p)
+   *name_p = revs->pending.objects[0].name;
+   return (struct commit *)obj;
 }
 
-static char *prepare_initial(struct blame_scoreboard *sb)
+static struct commit *find_single_initial(struct rev_info *revs,
+ const char **name_p)
 {
int i;
const char *final_commit_name = NULL;
-   struct rev_info *revs = sb->revs;
+   struct commit *found = NULL;
 
/*
 * There must be one and only one negative commit, and it must be
@@ -2490,19 +2486,22 @@ static char *prepare_initial(struct blame_scoreboard 
*sb)
obj = deref_tag(obj, NULL, 0);
if (obj->type != OBJ_COMMIT)
die("Non commit %s?", revs->pending.objects[i].name);
-   if (sb->final)
+   if (found)
die("More than one commit to dig up from, %s and %s?",
revs->pending.objects[i].name,
final_commit_name);
-   sb->final = (struct commit *) obj;
+   found = (struct commit *) obj;
final_commit_name = revs->pending.objects[i].name;
}
 
if (!final_commit_name)
-   final_commit_name = dwim_reverse_initial(sb);
+   found = dwim_reverse_initial(revs, _commit_name);
if (!final_commit_name)
die("No commit to dig up from?");
-   return xstrdup(final_commit_name);
+
+   if (name_p)
+   *name_p = final_commit_name;
+   return found;
 }
 
 static int blame_copy_callback(const struct option *option, const char *arg, 
int unset)
@@ -2546,7 +2545,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
struct blame_origin *o;
struct blame_entry *ent = NULL;
long dashdash_pos, lno;
-   char *final_commit_name = NULL;
+   const char *final_commit_name = NULL;
enum object_type type;
struct commit *final_commit = NULL;
struct progress_info pi = { NULL, 0 };
@@ -2755,14 +2754,15 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.revs = 
sb.contents_from = contents_from;
sb.reverse = reverse;
+
if (!reverse) {
-   final_commit_name = prepare_final();
+   sb.final = find_single_final(, _commit_name);
sb.commits.compare = compare_commits_by_commit_date;
}
else if (contents_from)
die(_("--contents and --reverse do not blend well."));
else {
-   final_commit_name = prepare_initial();
+   sb.final = find_single_initial(, _commit_name);
sb.commits.compare = compare_commits_by_reverse_commit_date;
if (revs.first_parent_only)
revs.children.name = NULL;
@@ -2917,10 +2917,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
if (!incremental)
setup_pager();
-
-   free(final_commit_name);

[PATCH 25/29] blame: move origin-related methods to libgit

2017-05-23 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 Makefile|  1 +
 blame.c | 62 +
 blame.h | 15 
 builtin/blame.c | 72 -
 4 files changed, 78 insertions(+), 72 deletions(-)
 create mode 100644 blame.c

diff --git a/Makefile b/Makefile
index e35542e..2d795ed 100644
--- a/Makefile
+++ b/Makefile
@@ -718,6 +718,7 @@ LIB_OBJS += argv-array.o
 LIB_OBJS += attr.o
 LIB_OBJS += base85.o
 LIB_OBJS += bisect.o
+LIB_OBJS += blame.o
 LIB_OBJS += blob.o
 LIB_OBJS += branch.o
 LIB_OBJS += bulk-checkin.o
diff --git a/blame.c b/blame.c
new file mode 100644
index 000..4855d6d
--- /dev/null
+++ b/blame.c
@@ -0,0 +1,62 @@
+#include "blame.h"
+
+void blame_origin_decref(struct blame_origin *o)
+{
+   if (o && --o->refcnt <= 0) {
+   struct blame_origin *p, *l = NULL;
+   if (o->previous)
+   blame_origin_decref(o->previous);
+   free(o->file.ptr);
+   /* Should be present exactly once in commit chain */
+   for (p = o->commit->util; p; l = p, p = p->next) {
+   if (p == o) {
+   if (l)
+   l->next = p->next;
+   else
+   o->commit->util = p->next;
+   free(o);
+   return;
+   }
+   }
+   die("internal error in blame_origin_decref");
+   }
+}
+
+/*
+ * Given a commit and a path in it, create a new origin structure.
+ * The callers that add blame to the scoreboard should use
+ * get_origin() to obtain shared, refcounted copy instead of calling
+ * this function directly.
+ */
+struct blame_origin *make_origin(struct commit *commit, const char *path)
+{
+   struct blame_origin *o;
+   FLEX_ALLOC_STR(o, path, path);
+   o->commit = commit;
+   o->refcnt = 1;
+   o->next = commit->util;
+   commit->util = o;
+   return o;
+}
+
+/*
+ * Locate an existing origin or create a new one.
+ * This moves the origin to front position in the commit util list.
+ */
+struct blame_origin *get_origin(struct commit *commit, const char *path)
+{
+   struct blame_origin *o, *l;
+
+   for (o = commit->util, l = NULL; o; l = o, o = o->next) {
+   if (!strcmp(o->path, path)) {
+   /* bump to front */
+   if (l) {
+   l->next = o->next;
+   o->next = commit->util;
+   commit->util = o;
+   }
+   return blame_origin_incref(o);
+   }
+   }
+   return make_origin(commit, path);
+}
diff --git a/blame.h b/blame.h
index c064d92..49b685e 100644
--- a/blame.h
+++ b/blame.h
@@ -140,4 +140,19 @@ struct blame_scoreboard {
void *found_guilty_entry_data;
 };
 
+/*
+ * Origin is refcounted and usually we keep the blob contents to be
+ * reused.
+ */
+static inline struct blame_origin *blame_origin_incref(struct blame_origin *o)
+{
+   if (o)
+   o->refcnt++;
+   return o;
+}
+extern void blame_origin_decref(struct blame_origin *o);
+
+extern struct blame_origin *make_origin(struct commit *commit, const char 
*path);
+extern struct blame_origin *get_origin(struct commit *commit, const char 
*path);
+
 #endif /* BLAME_H */
diff --git a/builtin/blame.c b/builtin/blame.c
index 07b1a76..2d6d834 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -124,39 +124,6 @@ static void fill_origin_blob(struct diff_options *opt,
*file = o->file;
 }
 
-/*
- * Origin is refcounted and usually we keep the blob contents to be
- * reused.
- */
-static inline struct blame_origin *blame_origin_incref(struct blame_origin *o)
-{
-   if (o)
-   o->refcnt++;
-   return o;
-}
-
-static void blame_origin_decref(struct blame_origin *o)
-{
-   if (o && --o->refcnt <= 0) {
-   struct blame_origin *p, *l = NULL;
-   if (o->previous)
-   blame_origin_decref(o->previous);
-   free(o->file.ptr);
-   /* Should be present exactly once in commit chain */
-   for (p = o->commit->util; p; l = p, p = p->next) {
-   if (p == o) {
-   if (l)
-   l->next = p->next;
-   else
-   o->commit->util = p->next;
-   free(o);
-   return;
-   }
-   }
-   die("internal error in blame_origin_decref");
-   }
-}
-
 static void drop_origin_blob(struct blame_origin *o)
 {
if (o->file.ptr) {
@@ -316,45 +283,6 @@ static void

[PATCH 18/29] blame: move progess updates to a scoreboard callback

2017-05-23 Thread Jeff Smith

Allow the interface user to decide how to handle a progress update.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 1b53325..d05907b 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -391,6 +391,9 @@ struct blame_scoreboard {
 
/* callbacks */
void(*on_sanity_fail)(struct blame_scoreboard *, int);
+   void(*found_guilty_entry)(struct blame_entry *, void *);
+
+   void *found_guilty_entry_data;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -1729,9 +1732,10 @@ static int emit_one_suspect_detail(struct blame_origin 
*suspect, int repeat)
  * The blame_entry is found to be guilty for the range.
  * Show it in incremental output.
  */
-static void found_guilty_entry(struct blame_entry *ent,
-  struct progress_info *pi)
+static void found_guilty_entry(struct blame_entry *ent, void *data)
 {
+   struct progress_info *pi = (struct progress_info *)data;
+
if (incremental) {
struct blame_origin *suspect = ent->suspect;
 
@@ -1754,11 +1758,6 @@ static void assign_blame(struct blame_scoreboard *sb, 
int opt)
 {
struct rev_info *revs = sb->revs;
struct commit *commit = prio_queue_get(>commits);
-   struct progress_info pi = { NULL, 0 };
-
-   if (show_progress)
-   pi.progress = start_progress_delay(_("Blaming lines"),
-  sb->num_lines, 50, 1);
 
while (commit) {
struct blame_entry *ent;
@@ -1800,7 +1799,8 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
suspect->guilty = 1;
for (;;) {
struct blame_entry *next = ent->next;
-   found_guilty_entry(ent, );
+   if (sb->found_guilty_entry)
+   sb->found_guilty_entry(ent, 
sb->found_guilty_entry_data);
if (next) {
ent = next;
continue;
@@ -1816,8 +1816,6 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
if (sb->debug) /* sanity */
sanity_check_refcnt(sb);
}
-
-   stop_progress();
 }
 
 static const char *format_time(timestamp_t time, const char *tz_str,
@@ -2550,6 +2548,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
char *final_commit_name = NULL;
enum object_type type;
struct commit *final_commit = NULL;
+   struct progress_info pi = { NULL, 0 };
 
struct string_list range_list = STRING_LIST_INIT_NODUP;
int output_option = 0, opt = 0;
@@ -2905,8 +2904,16 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
read_mailmap(, NULL);
 
+   sb.found_guilty_entry = _guilty_entry;
+   sb.found_guilty_entry_data = 
+   if (show_progress)
+   pi.progress = start_progress_delay(_("Blaming lines"),
+  sb.num_lines, 50, 1);
+
assign_blame(, opt);
 
+   stop_progress();
+
if (!incremental)
setup_pager();
 
-- 
2.9.3

[PATCH 24/29] blame: move core structures to header

2017-05-23 Thread Jeff Smith

The origin, entry, and scoreboard structures are core to the blame
interface and need to be exposed for blame functionality.

Signed-off-by: Jeff Smith 
---
 blame.h | 143 
 builtin/blame.c | 134 +---
 2 files changed, 144 insertions(+), 133 deletions(-)
 create mode 100644 blame.h

diff --git a/blame.h b/blame.h
new file mode 100644
index 000..c064d92
--- /dev/null
+++ b/blame.h
@@ -0,0 +1,143 @@
+#ifndef BLAME_H
+#define BLAME_H
+
+#include "cache.h"
+#include "commit.h"
+#include "xdiff-interface.h"
+#include "revision.h"
+#include "prio-queue.h"
+
+/*
+ * One blob in a commit that is being suspected
+ */
+struct blame_origin {
+   int refcnt;
+   /* Record preceding blame record for this blob */
+   struct blame_origin *previous;
+   /* origins are put in a list linked via `next' hanging off the
+* corresponding commit's util field in order to make finding
+* them fast.  The presence in this chain does not count
+* towards the origin's reference count.  It is tempting to
+* let it count as long as the commit is pending examination,
+* but even under circumstances where the commit will be
+* present multiple times in the priority queue of unexamined
+* commits, processing the first instance will not leave any
+* work requiring the origin data for the second instance.  An
+* interspersed commit changing that would have to be
+* preexisting with a different ancestry and with the same
+* commit date in order to wedge itself between two instances
+* of the same commit in the priority queue _and_ produce
+* blame entries relevant for it.  While we don't want to let
+* us get tripped up by this case, it certainly does not seem
+* worth optimizing for.
+*/
+   struct blame_origin *next;
+   struct commit *commit;
+   /* `suspects' contains blame entries that may be attributed to
+* this origin's commit or to parent commits.  When a commit
+* is being processed, all suspects will be moved, either by
+* assigning them to an origin in a different commit, or by
+* shipping them to the scoreboard's ent list because they
+* cannot be attributed to a different commit.
+*/
+   struct blame_entry *suspects;
+   mmfile_t file;
+   struct object_id blob_oid;
+   unsigned mode;
+   /* guilty gets set when shipping any suspects to the final
+* blame list instead of other commits
+*/
+   char guilty;
+   char path[FLEX_ARRAY];
+};
+
+/*
+ * Each group of lines is described by a blame_entry; it can be split
+ * as we pass blame to the parents.  They are arranged in linked lists
+ * kept as `suspects' of some unprocessed origin, or entered (when the
+ * blame origin has been finalized) into the scoreboard structure.
+ * While the scoreboard structure is only sorted at the end of
+ * processing (according to final image line number), the lists
+ * attached to an origin are sorted by the target line number.
+ */
+struct blame_entry {
+   struct blame_entry *next;
+
+   /* the first line of this group in the final image;
+* internally all line numbers are 0 based.
+*/
+   int lno;
+
+   /* how many lines this group has */
+   int num_lines;
+
+   /* the commit that introduced this group into the final image */
+   struct blame_origin *suspect;
+
+   /* the line number of the first line of this group in the
+* suspect's file; internally all line numbers are 0 based.
+*/
+   int s_lno;
+
+   /* how significant this entry is -- cached to avoid
+* scanning the lines over and over.
+*/
+   unsigned score;
+};
+
+/*
+ * The current state of the blame assignment.
+ */
+struct blame_scoreboard {
+   /* the final commit (i.e. where we started digging from) */
+   struct commit *final;
+   /* Priority queue for commits with unassigned blame records */
+   struct prio_queue commits;
+   struct rev_info *revs;
+   const char *path;
+
+   /*
+* The contents in the final image.
+* Used by many functions to obtain contents of the nth line,
+* indexed with scoreboard.lineno[blame_entry.lno].
+*/
+   const char *final_buf;
+   unsigned long final_buf_size;
+
+   /* linked list of blames */
+   struct blame_entry *ent;
+
+   /* look-up a line in the final buffer */
+   int num_lines;
+   int *lineno;
+
+   /* stats */
+   int num_read_blob;
+   int num_get_patch;
+   int num_commits;
+
+   /*
+* blame for a blame_entry with score lower than these thresholds
+* is not passed to the parent using move/copy logic.
+*/
+   unsigned move_score;
+

[PATCH 26/29] blame: move fake-commit-related methods to libgit

2017-05-23 Thread Jeff Smith

Signed-off-by: Jeff Smith 
---
 blame.c | 203 +++-
 blame.h |   4 +-
 builtin/blame.c | 197 --
 3 files changed, 205 insertions(+), 199 deletions(-)

diff --git a/blame.c b/blame.c
index 4855d6d..8c025bc 100644
--- a/blame.c
+++ b/blame.c
@@ -1,3 +1,6 @@
+#include "cache.h"
+#include "refs.h"
+#include "cache-tree.h"
 #include "blame.h"
 
 void blame_origin_decref(struct blame_origin *o)
@@ -28,7 +31,7 @@ void blame_origin_decref(struct blame_origin *o)
  * get_origin() to obtain shared, refcounted copy instead of calling
  * this function directly.
  */
-struct blame_origin *make_origin(struct commit *commit, const char *path)
+static struct blame_origin *make_origin(struct commit *commit, const char 
*path)
 {
struct blame_origin *o;
FLEX_ALLOC_STR(o, path, path);
@@ -60,3 +63,201 @@ struct blame_origin *get_origin(struct commit *commit, 
const char *path)
}
return make_origin(commit, path);
 }
+
+
+
+static void verify_working_tree_path(struct commit *work_tree, const char 
*path)
+{
+   struct commit_list *parents;
+   int pos;
+
+   for (parents = work_tree->parents; parents; parents = parents->next) {
+   const struct object_id *commit_oid = >item->object.oid;
+   struct object_id blob_oid;
+   unsigned mode;
+
+   if (!get_tree_entry(commit_oid->hash, path, blob_oid.hash, 
) &&
+   sha1_object_info(blob_oid.hash, NULL) == OBJ_BLOB)
+   return;
+   }
+
+   pos = cache_name_pos(path, strlen(path));
+   if (pos >= 0)
+   ; /* path is in the index */
+   else if (-1 - pos < active_nr &&
+!strcmp(active_cache[-1 - pos]->name, path))
+   ; /* path is in the index, unmerged */
+   else
+   die("no such path '%s' in HEAD", path);
+}
+
+static struct commit_list **append_parent(struct commit_list **tail, const 
struct object_id *oid)
+{
+   struct commit *parent;
+
+   parent = lookup_commit_reference(oid->hash);
+   if (!parent)
+   die("no such commit %s", oid_to_hex(oid));
+   return _list_insert(parent, tail)->next;
+}
+
+static void append_merge_parents(struct commit_list **tail)
+{
+   int merge_head;
+   struct strbuf line = STRBUF_INIT;
+
+   merge_head = open(git_path_merge_head(), O_RDONLY);
+   if (merge_head < 0) {
+   if (errno == ENOENT)
+   return;
+   die("cannot open '%s' for reading", git_path_merge_head());
+   }
+
+   while (!strbuf_getwholeline_fd(, merge_head, '\n')) {
+   struct object_id oid;
+   if (line.len < GIT_SHA1_HEXSZ || get_oid_hex(line.buf, ))
+   die("unknown line in '%s': %s", git_path_merge_head(), 
line.buf);
+   tail = append_parent(tail, );
+   }
+   close(merge_head);
+   strbuf_release();
+}
+
+/*
+ * This isn't as simple as passing sb->buf and sb->len, because we
+ * want to transfer ownership of the buffer to the commit (so we
+ * must use detach).
+ */
+static void set_commit_buffer_from_strbuf(struct commit *c, struct strbuf *sb)
+{
+   size_t len;
+   void *buf = strbuf_detach(sb, );
+   set_commit_buffer(c, buf, len);
+}
+
+/*
+ * Prepare a dummy commit that represents the work tree (or staged) item.
+ * Note that annotating work tree item never works in the reverse.
+ */
+struct commit *fake_working_tree_commit(struct diff_options *opt,
+   const char *path,
+   const char *contents_from)
+{
+   struct commit *commit;
+   struct blame_origin *origin;
+   struct commit_list **parent_tail, *parent;
+   struct object_id head_oid;
+   struct strbuf buf = STRBUF_INIT;
+   const char *ident;
+   time_t now;
+   int size, len;
+   struct cache_entry *ce;
+   unsigned mode;
+   struct strbuf msg = STRBUF_INIT;
+
+   read_cache();
+   time();
+   commit = alloc_commit_node();
+   commit->object.parsed = 1;
+   commit->date = now;
+   parent_tail = >parents;
+
+   if (!resolve_ref_unsafe("HEAD", RESOLVE_REF_READING, head_oid.hash, 
NULL))
+   die("no such ref: HEAD");
+
+   parent_tail = append_parent(parent_tail, _oid);
+   append_merge_parents(parent_tail);
+   verify_working_tree_path(commit, path);
+
+   origin = make_origin(commit, path);
+
+   ident = fmt_ident("Not Committed Yet", "not.committed.yet", NULL, 0);
+   strbuf_addstr(, "tree \n");
+   for (parent = commit->parents; parent; parent = parent->next)
+   strbuf_addf(, "parent %s\n",
+   oid_to_hex(>item->object.oid));
+   strbuf_addf(,
+

[PATCH 22/29] blame: create scoreboard setup function

2017-05-23 Thread Jeff Smith

Create function that completes setting up blame_scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 190 ++--
 1 file changed, 101 insertions(+), 89 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index f839571..fd41551 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2544,6 +2544,105 @@ void init_scoreboard(struct blame_scoreboard *sb)
sb->copy_score = BLAME_DEFAULT_COPY_SCORE;
 }
 
+void setup_scoreboard(struct blame_scoreboard *sb, const char *path, struct 
blame_origin **orig)
+{
+   const char *final_commit_name = NULL;
+   struct blame_origin *o;
+   struct commit *final_commit = NULL;
+   enum object_type type;
+
+   if (sb->reverse && sb->contents_from)
+   die(_("--contents and --reverse do not blend well."));
+
+   if (!sb->reverse) {
+   sb->final = find_single_final(sb->revs, _commit_name);
+   sb->commits.compare = compare_commits_by_commit_date;
+   } else {
+   sb->final = find_single_initial(sb->revs, _commit_name);
+   sb->commits.compare = compare_commits_by_reverse_commit_date;
+   }
+
+   if (sb->final && sb->contents_from)
+   die(_("cannot use --contents with final commit object name"));
+
+   if (sb->reverse && sb->revs->first_parent_only)
+   sb->revs->children.name = NULL;
+
+   if (!sb->final) {
+   /*
+* "--not A B -- path" without anything positive;
+* do not default to HEAD, but use the working tree
+* or "--contents".
+*/
+   setup_work_tree();
+   sb->final = fake_working_tree_commit(>revs->diffopt,
+path, sb->contents_from);
+   add_pending_object(sb->revs, &(sb->final->object), ":");
+   }
+
+   if (sb->reverse && sb->revs->first_parent_only) {
+   final_commit = find_single_final(sb->revs, NULL);
+   if (!final_commit)
+   die(_("--reverse and --first-parent together require 
specified latest commit"));
+   }
+
+   /*
+* If we have bottom, this will mark the ancestors of the
+* bottom commits we would reach while traversing as
+* uninteresting.
+*/
+   if (prepare_revision_walk(sb->revs))
+   die(_("revision walk setup failed"));
+
+   if (sb->reverse && sb->revs->first_parent_only) {
+   struct commit *c = final_commit;
+
+   sb->revs->children.name = "children";
+   while (c->parents &&
+  oidcmp(>object.oid, >final->object.oid)) {
+   struct commit_list *l = xcalloc(1, sizeof(*l));
+
+   l->item = c;
+   if (add_decoration(>revs->children,
+  >parents->item->object, l))
+   die("BUG: not unique item in first-parent 
chain");
+   c = c->parents->item;
+   }
+
+   if (oidcmp(>object.oid, >final->object.oid))
+   die(_("--reverse --first-parent together require range 
along first-parent chain"));
+   }
+
+   if (is_null_oid(>final->object.oid)) {
+   o = sb->final->util;
+   sb->final_buf = xmemdupz(o->file.ptr, o->file.size);
+   sb->final_buf_size = o->file.size;
+   }
+   else {
+   o = get_origin(sb->final, path);
+   if (fill_blob_sha1_and_mode(o))
+   die(_("no such path %s in %s"), path, 
final_commit_name);
+
+   if (DIFF_OPT_TST(>revs->diffopt, ALLOW_TEXTCONV) &&
+   textconv_object(path, o->mode, >blob_oid, 1, (char **) 
>final_buf,
+   >final_buf_size))
+   ;
+   else
+   sb->final_buf = read_sha1_file(o->blob_oid.hash, ,
+  >final_buf_size);
+
+   if (!sb->final_buf)
+   die(_("cannot read blob %s for path %s"),
+   oid_to_hex(>blob_oid),
+   path);
+   }
+   sb->num_read_blob++;
+   prepare_lines(sb);
+
+   if (orig)
+   *orig = o;
+}
+
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
@@ -2552,9 +2651,6 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
struct blame_origin *o;
struct blame_entry *ent = NULL;
long dashdash_pos, lno;
-   const char *final_commit_name = NULL;
-   enum object_type type;
-   struct commit *final_commit = NULL;
struct progress_info pi = { NULL, 0 };
 
struct string_list range_list =

[PATCH 07/29] blame: rename coalesce function

2017-05-23 Thread Jeff Smith

Functions that will be publicly exposed should have names that better
reflect what they are a part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 7854770..7c493d2 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -384,7 +384,7 @@ static void sanity_check_refcnt(struct blame_scoreboard *);
  * contiguous lines in the same origin (i.e.  pair),
  * merge them together.
  */
-static void coalesce(struct blame_scoreboard *sb)
+static void blame_coalesce(struct blame_scoreboard *sb)
 {
struct blame_entry *ent, *next;
 
@@ -2885,7 +2885,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
sb.ent = blame_sort(sb.ent, compare_blame_final);
 
-   coalesce();
+   blame_coalesce();
 
if (!(output_option & OUTPUT_PORCELAIN))
find_alignment(, _option);
-- 
2.9.3

[PATCH 12/29] blame: move contents_from to scoreboard

2017-05-23 Thread Jeff Smith

The argument from --contents is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/blame.c b/builtin/blame.c
index 643f847..0955fc1 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -378,6 +378,9 @@ struct blame_scoreboard {
 */
unsigned move_score;
unsigned copy_score;
+
+   /* use this file's contents as the final image */
+   const char *contents_from;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -2735,6 +2738,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.copy_score = BLAME_DEFAULT_COPY_SCORE;
 
sb.revs = 
+   sb.contents_from = contents_from;
if (!reverse) {
final_commit_name = prepare_final();
sb.commits.compare = compare_commits_by_commit_date;
-- 
2.9.3

[PATCH 09/29] blame: rename nth_line function

2017-05-23 Thread Jeff Smith

Functions that will be publicly exposed should have names that better
reflect what they are a part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 129ef28..5082543 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -653,14 +653,14 @@ static void dup_entry(struct blame_entry ***queue,
*queue = >next;
 }
 
-static const char *nth_line(struct blame_scoreboard *sb, long lno)
+static const char *blame_nth_line(struct blame_scoreboard *sb, long lno)
 {
return sb->final_buf + sb->lineno[lno];
 }
 
 static const char *nth_line_cb(void *data, long lno)
 {
-   return nth_line((struct blame_scoreboard *)data, lno);
+   return blame_nth_line((struct blame_scoreboard *)data, lno);
 }
 
 /*
@@ -968,8 +968,8 @@ static unsigned blame_entry_score(struct blame_scoreboard 
*sb, struct blame_entr
return e->score;
 
score = 1;
-   cp = nth_line(sb, e->lno);
-   ep = nth_line(sb, e->lno + e->num_lines);
+   cp = blame_nth_line(sb, e->lno);
+   ep = blame_nth_line(sb, e->lno + e->num_lines);
while (cp < ep) {
unsigned ch = *((unsigned char *)cp);
if (isalnum(ch))
@@ -1078,9 +1078,9 @@ static void find_copy_in_blob(struct blame_scoreboard *sb,
/*
 * Prepare mmfile that contains only the lines in ent.
 */
-   cp = nth_line(sb, ent->lno);
+   cp = blame_nth_line(sb, ent->lno);
file_o.ptr = (char *) cp;
-   file_o.size = nth_line(sb, ent->lno + ent->num_lines) - cp;
+   file_o.size = blame_nth_line(sb, ent->lno + ent->num_lines) - cp;
 
/*
 * file_o is a part of final image we are annotating.
@@ -1866,7 +1866,7 @@ static void emit_porcelain(struct blame_scoreboard *sb, 
struct blame_entry *ent,
   ent->num_lines);
emit_porcelain_details(suspect, repeat);
 
-   cp = nth_line(sb, ent->lno);
+   cp = blame_nth_line(sb, ent->lno);
for (cnt = 0; cnt < ent->num_lines; cnt++) {
char ch;
if (cnt) {
@@ -1900,7 +1900,7 @@ static void emit_other(struct blame_scoreboard *sb, 
struct blame_entry *ent, int
get_commit_info(suspect->commit, , 1);
oid_to_hex_r(hex, >commit->object.oid);
 
-   cp = nth_line(sb, ent->lno);
+   cp = blame_nth_line(sb, ent->lno);
for (cnt = 0; cnt < ent->num_lines; cnt++) {
char ch;
int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ : 
abbrev;
-- 
2.9.3

[PATCH 14/29] blame: move show_root flag to scoreboard

2017-05-23 Thread Jeff Smith

The show_root flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 161d15c..fdd41b4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -384,6 +384,7 @@ struct blame_scoreboard {
 
/* flags */
int reverse;
+   int show_root;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -1784,7 +1785,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
mark_parents_uninteresting(commit);
}
/* treat root commit as boundary */
-   if (!commit->parents && !show_root)
+   if (!commit->parents && !sb->show_root)
commit->object.flags |= UNINTERESTING;
 
/* Take responsibility for the remaining entries */
@@ -2885,6 +2886,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (blame_copy_score)
sb.copy_score = blame_copy_score;
 
+   sb.show_root = show_root;
+
read_mailmap(, NULL);
 
assign_blame(, opt);
-- 
2.9.3

[PATCH 16/29] blame: move no_whole_file_rename flag to scoreboard

2017-05-23 Thread Jeff Smith

The no_whole_file_rename flag is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 8e676fb..90c643c 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -386,6 +386,7 @@ struct blame_scoreboard {
int reverse;
int show_root;
int xdl_opts;
+   int no_whole_file_rename;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -1411,7 +1412,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * The first pass looks for unrenamed path to optimize for
 * common cases, then we look for renames in the second pass.
 */
-   for (pass = 0; pass < 2 - no_whole_file_rename; pass++) {
+   for (pass = 0; pass < 2 - sb->no_whole_file_rename; pass++) {
struct blame_origin *(*find)(struct commit *, struct 
blame_origin *);
find = pass ? find_rename : find_origin;
 
@@ -2889,6 +2890,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
sb.show_root = show_root;
sb.xdl_opts = xdl_opts;
+   sb.no_whole_file_rename = no_whole_file_rename;
 
read_mailmap(, NULL);
 
-- 
2.9.3

[PATCH 23/29] blame: create entry prepend function

2017-05-23 Thread Jeff Smith

Create function that populates a blame_entry and prepends it to a list.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index fd41551..29771b7 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2643,6 +2643,20 @@ void setup_scoreboard(struct blame_scoreboard *sb, const 
char *path, struct blam
*orig = o;
 }
 
+struct blame_entry *blame_entry_prepend(struct blame_entry *head,
+   long start, long end,
+   struct blame_origin *o)
+{
+   struct blame_entry *new_head = xcalloc(1, sizeof(struct blame_entry));
+   new_head->lno = start;
+   new_head->num_lines = end - start;
+   new_head->suspect = o;
+   new_head->s_lno = start;
+   new_head->next = head;
+   blame_origin_incref(o);
+   return new_head;
+}
+
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
@@ -2885,16 +2899,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
for (range_i = ranges.nr; range_i > 0; --range_i) {
const struct range *r = [range_i - 1];
-   long bottom = r->start;
-   long top = r->end;
-   struct blame_entry *next = ent;
-   ent = xcalloc(1, sizeof(*ent));
-   ent->lno = bottom;
-   ent->num_lines = top - bottom;
-   ent->suspect = o;
-   ent->s_lno = bottom;
-   ent->next = next;
-   blame_origin_incref(o);
+   ent = blame_entry_prepend(ent, r->start, r->end, o);
}
 
o->suspects = ent;
-- 
2.9.3

[PATCH 13/29] blame: move reverse flag to scoreboard

2017-05-23 Thread Jeff Smith

The reverse flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 0955fc1..161d15c 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -381,6 +381,9 @@ struct blame_scoreboard {
 
/* use this file's contents as the final image */
const char *contents_from;
+
+   /* flags */
+   int reverse;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -1339,7 +1342,8 @@ static void pass_whole_blame(struct blame_scoreboard *sb,
  * "parent" (and "porigin"), but what we mean is to find scapegoat to
  * exonerate ourselves.
  */
-static struct commit_list *first_scapegoat(struct rev_info *revs, struct 
commit *commit)
+static struct commit_list *first_scapegoat(struct rev_info *revs, struct 
commit *commit,
+  int reverse)
 {
if (!reverse) {
if (revs->first_parent_only &&
@@ -1353,9 +1357,9 @@ static struct commit_list *first_scapegoat(struct 
rev_info *revs, struct commit
return lookup_decoration(>children, >object);
 }
 
-static int num_scapegoats(struct rev_info *revs, struct commit *commit)
+static int num_scapegoats(struct rev_info *revs, struct commit *commit, int 
reverse)
 {
-   struct commit_list *l = first_scapegoat(revs, commit);
+   struct commit_list *l = first_scapegoat(revs, commit, reverse);
return commit_list_count(l);
 }
 
@@ -1393,7 +1397,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
struct blame_entry *toosmall = NULL;
struct blame_entry *blames, **blametail = 
 
-   num_sg = num_scapegoats(revs, commit);
+   num_sg = num_scapegoats(revs, commit, sb->reverse);
if (!num_sg)
goto finish;
else if (num_sg < ARRAY_SIZE(sg_buf))
@@ -1409,7 +1413,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
struct blame_origin *(*find)(struct commit *, struct 
blame_origin *);
find = pass ? find_rename : find_origin;
 
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct commit *p = sg->item;
@@ -1441,7 +1445,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
}
 
sb->num_commits++;
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1462,7 +1466,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
if (opt & PICKAXE_BLAME_MOVE) {
filter_small(sb, , >suspects, sb->move_score);
if (origin->suspects) {
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, 
sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1489,7 +1493,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
if (!origin->suspects)
goto finish;
 
-   for (i = 0, sg = first_scapegoat(revs, commit);
+   for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
 i < num_sg && sg;
 sg = sg->next, i++) {
struct blame_origin *porigin = sg_origin[i];
@@ -1770,7 +1774,7 @@ static void assign_blame(struct blame_scoreboard *sb, int 
opt)
 */
blame_origin_incref(suspect);
parse_commit(commit);
-   if (reverse ||
+   if (sb->reverse ||
(!(commit->object.flags & UNINTERESTING) &&
 !(revs->max_age != -1 && commit->date < revs->max_age)))
pass_blame(sb, suspect, opt);
@@ -2739,6 +2743,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
 
sb.revs = 
sb.contents_from = contents_from;
+   sb.reverse = reverse;
if (!reverse) {
final_commit_name = prepare_final();
sb.commits.compare = compare_commits_by_commit_date;
-- 
2.9.3

[PATCH 19/29] blame: wrap blame_sort and compare_blame_final

2017-05-23 Thread Jeff Smith

The new method's interface is marginally cleaner than blame_sort, and
will avoid the need to expose the compare_blame_final method.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index d05907b..61fd5b4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -328,12 +328,6 @@ static int compare_blame_suspect(const void *p1, const 
void *p2)
return s1->s_lno > s2->s_lno ? 1 : -1;
 }
 
-static struct blame_entry *blame_sort(struct blame_entry *head,
- int (*compare_fn)(const void *, const 
void *))
-{
-   return llist_mergesort (head, get_next_blame, set_next_blame, 
compare_fn);
-}
-
 static int compare_commits_by_reverse_commit_date(const void *a,
  const void *b,
  void *c)
@@ -396,6 +390,12 @@ struct blame_scoreboard {
void *found_guilty_entry_data;
 };
 
+static void blame_sort_final(struct blame_scoreboard *sb)
+{
+   sb->ent = llist_mergesort(sb->ent, get_next_blame, set_next_blame,
+ compare_blame_final);
+}
+
 static void sanity_check_refcnt(struct blame_scoreboard *);
 
 /*
@@ -1378,7 +1378,8 @@ static int num_scapegoats(struct rev_info *revs, struct 
commit *commit, int reve
  */
 static void distribute_blame(struct blame_scoreboard *sb, struct blame_entry 
*blamed)
 {
-   blamed = blame_sort(blamed, compare_blame_suspect);
+   blamed = llist_mergesort(blamed, get_next_blame, set_next_blame,
+compare_blame_suspect);
while (blamed)
{
struct blame_origin *porigin = blamed->suspect;
@@ -2922,7 +2923,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (incremental)
return 0;
 
-   sb.ent = blame_sort(sb.ent, compare_blame_final);
+   blame_sort_final();
 
blame_coalesce();
 
-- 
2.9.3

[PATCH 03/29] blame: remove unused parameters

2017-05-23 Thread Jeff Smith

Clean up blame code before moving it into libgit

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index fbd757e..3529f01 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -449,9 +449,7 @@ static struct origin *make_origin(struct commit *commit, 
const char *path)
  * Locate an existing origin or create a new one.
  * This moves the origin to front position in the commit util list.
  */
-static struct origin *get_origin(struct scoreboard *sb,
-struct commit *commit,
-const char *path)
+static struct origin *get_origin(struct commit *commit, const char *path)
 {
struct origin *o, *l;
 
@@ -499,8 +497,7 @@ static int fill_blob_sha1_and_mode(struct origin *origin)
  * We have an origin -- check if the same path exists in the
  * parent and return an origin structure to represent it.
  */
-static struct origin *find_origin(struct scoreboard *sb,
- struct commit *parent,
+static struct origin *find_origin(struct commit *parent,
  struct origin *origin)
 {
struct origin *porigin;
@@ -543,7 +540,7 @@ static struct origin *find_origin(struct scoreboard *sb,
 
if (!diff_queued_diff.nr) {
/* The path is the same as parent */
-   porigin = get_origin(sb, parent, origin->path);
+   porigin = get_origin(parent, origin->path);
oidcpy(>blob_oid, >blob_oid);
porigin->mode = origin->mode;
} else {
@@ -569,7 +566,7 @@ static struct origin *find_origin(struct scoreboard *sb,
die("internal error in blame::find_origin (%c)",
p->status);
case 'M':
-   porigin = get_origin(sb, parent, origin->path);
+   porigin = get_origin(parent, origin->path);
oidcpy(>blob_oid, >one->oid);
porigin->mode = p->one->mode;
break;
@@ -588,8 +585,7 @@ static struct origin *find_origin(struct scoreboard *sb,
  * We have an origin -- find the path that corresponds to it in its
  * parent and return an origin structure to represent it.
  */
-static struct origin *find_rename(struct scoreboard *sb,
- struct commit *parent,
+static struct origin *find_rename(struct commit *parent,
  struct origin *origin)
 {
struct origin *porigin = NULL;
@@ -615,7 +611,7 @@ static struct origin *find_rename(struct scoreboard *sb,
struct diff_filepair *p = diff_queued_diff.queue[i];
if ((p->status == 'R' || p->status == 'C') &&
!strcmp(p->two->path, origin->path)) {
-   porigin = get_origin(sb, parent, p->one->path);
+   porigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
porigin->mode = p->one->mode;
break;
@@ -1270,7 +1266,7 @@ static void find_copy_in_parent(struct scoreboard *sb,
/* find_move already dealt with this path */
continue;
 
-   norigin = get_origin(sb, parent, p->one->path);
+   norigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
norigin->mode = p->one->mode;
fill_origin_blob(>revs->diffopt, norigin, _p);
@@ -1404,8 +1400,7 @@ static void pass_blame(struct scoreboard *sb, struct 
origin *origin, int opt)
 * common cases, then we look for renames in the second pass.
 */
for (pass = 0; pass < 2 - no_whole_file_rename; pass++) {
-   struct origin *(*find)(struct scoreboard *,
-  struct commit *, struct origin *);
+   struct origin *(*find)(struct commit *, struct origin *);
find = pass ? find_rename : find_origin;
 
for (i = 0, sg = first_scapegoat(revs, commit);
@@ -1418,7 +1413,7 @@ static void pass_blame(struct scoreboard *sb, struct 
origin *origin, int opt)
continue;
if (parse_commit(p))
continue;
-   porigin = find(sb, p, origin);
+   porigin = find(p, origin);
if (!porigin)
continue;
if (!oidcmp(>blob_oid, >blob_oid)) {
@@ -2806,7 +2801,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
sb.final_buf_size = o->file.size;
}
else {
-   o = get_origin(, sb.final, path);
+

[PATCH 10/29] blame: move stat counters to scoreboard

2017-05-23 Thread Jeff Smith

Statistic counters are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 5082543..f0c9ab8 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -61,11 +61,6 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 #define DEBUG 0
 #endif
 
-/* stats */
-static int num_read_blob;
-static int num_get_patch;
-static int num_commits;
-
 #define PICKAXE_BLAME_MOVE 01
 #define PICKAXE_BLAME_COPY 02
 #define PICKAXE_BLAME_COPY_HARDER  04
@@ -151,13 +146,13 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  * diff machinery
  */
 static void fill_origin_blob(struct diff_options *opt,
-struct blame_origin *o, mmfile_t *file)
+struct blame_origin *o, mmfile_t *file, int 
*num_read_blob)
 {
if (!o->file.ptr) {
enum object_type type;
unsigned long file_size;
 
-   num_read_blob++;
+   (*num_read_blob)++;
if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
textconv_object(o->path, o->mode, >blob_oid, 1, 
>ptr, _size))
;
@@ -375,6 +370,11 @@ struct blame_scoreboard {
/* look-up a line in the final buffer */
int num_lines;
int *lineno;
+
+   /* stats */
+   int num_read_blob;
+   int num_get_patch;
+   int num_commits;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -934,9 +934,9 @@ static void pass_blame_to_parent(struct blame_scoreboard 
*sb,
d.offset = 0;
d.dstq =  d.srcq = >suspects;
 
-   fill_origin_blob(>revs->diffopt, parent, _p);
-   fill_origin_blob(>revs->diffopt, target, _o);
-   num_get_patch++;
+   fill_origin_blob(>revs->diffopt, parent, _p, 
>num_read_blob);
+   fill_origin_blob(>revs->diffopt, target, _o, 
>num_read_blob);
+   sb->num_get_patch++;
 
if (diff_hunks(_p, _o, blame_chunk_cb, ))
die("unable to generate diff (%s -> %s)",
@@ -1140,7 +1140,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
if (!unblamed)
return; /* nothing remains for this target */
 
-   fill_origin_blob(>revs->diffopt, parent, _p);
+   fill_origin_blob(>revs->diffopt, parent, _p, 
>num_read_blob);
if (!file_p.ptr)
return;
 
@@ -1269,7 +1269,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
norigin = get_origin(parent, p->one->path);
oidcpy(>blob_oid, >one->oid);
norigin->mode = p->one->mode;
-   fill_origin_blob(>revs->diffopt, norigin, _p);
+   fill_origin_blob(>revs->diffopt, norigin, _p, 
>num_read_blob);
if (!file_p.ptr)
continue;
 
@@ -1434,7 +1434,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
}
}
 
-   num_commits++;
+   sb->num_commits++;
for (i = 0, sg = first_scapegoat(revs, commit);
 i < num_sg && sg;
 sg = sg->next, i++) {
@@ -2818,7 +2818,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
oid_to_hex(>blob_oid),
path);
}
-   num_read_blob++;
+   sb.num_read_blob++;
lno = prepare_lines();
 
if (lno && !range_list.nr)
@@ -2899,9 +2899,9 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
}
 
if (show_stats) {
-   printf("num read blob: %d\n", num_read_blob);
-   printf("num get patch: %d\n", num_get_patch);
-   printf("num commits: %d\n", num_commits);
+   printf("num read blob: %d\n", sb.num_read_blob);
+   printf("num get patch: %d\n", sb.num_get_patch);
+   printf("num commits: %d\n", sb.num_commits);
}
return 0;
 }
-- 
2.9.3

[PATCH 01/29] blame: remove unneeded dependency on blob.h

2017-05-23 Thread Jeff Smith

With commit 21666f1 ("convert object type handling from a string to a
number", 2007-02-26), there was no longer a need for blame.c to include
blob.h but it was not removed.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index f00eda1..d39f6af 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -8,7 +8,6 @@
 #include "cache.h"
 #include "refs.h"
 #include "builtin.h"
-#include "blob.h"
 #include "commit.h"
 #include "tag.h"
 #include "tree-walk.h"
-- 
2.9.3

[PATCH 00/29] Add blame to libgit

2017-05-23 Thread Jeff Smith

Rather than duplicate large portions of builtin/blame.c in cgit, it
would be better to shift its core functionality into libgit.a.  The
functionality left in builtin/blame.c mostly relates to terminal
presentation.

Since RFC v2 patchset:
  Rebased (merged in timestamp_t changes)
  Reorganized to split out and group 'mechanical' changes

Jeff Smith (29):
  blame: remove unneeded dependency on blob.h
  blame: move textconv_object with related functions
  blame: remove unused parameters
  blame: rename origin structure to blame_origin
  blame: rename scoreboard structure to blame_scoreboard
  blame: rename origin-related functions
  blame: rename coalesce function
  blame: rename ent_score function
  blame: rename nth_line function
  blame: move stat counters to scoreboard
  blame: move copy/move thresholds to scoreboard
  blame: move contents_from to scoreboard
  blame: move reverse flag to scoreboard
  blame: move show_root flag to scoreboard
  blame: move xdl_opts flags to scoreboard
  blame: move no_whole_file_rename flag to scoreboard
  blame: make sanity_check use a callback in scoreboard
  blame: move progess updates to a scoreboard callback
  blame: wrap blame_sort and compare_blame_final
  blame: rework methods that determine 'final' commit
  blame: create scoreboard init function
  blame: create scoreboard setup function
  blame: create entry prepend function
  blame: move core structures to header
  blame: move origin-related methods to libgit
  blame: move fake-commit-related methods to libgit
  blame: move scoreboard-related methods to libgit
  blame: move scoreboard setup to libgit
  blame: move entry prepend to libgit

 Makefile   |1 +
 blame.c| 1863 +
 blame.h|  175 +
 builtin.h  |2 -
 builtin/blame.c| 2130 ++--
 builtin/cat-file.c |1 +
 diff.c |   23 +
 diff.h |7 +
 8 files changed, 2143 insertions(+), 2059 deletions(-)
 create mode 100644 blame.c
 create mode 100644 blame.h

-- 
2.9.3

[PATCH 11/29] blame: move copy/move thresholds to scoreboard

2017-05-23 Thread Jeff Smith

Copy and move score thresholds are used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 41 +++--
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index f0c9ab8..643f847 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -66,10 +66,6 @@ static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 #define PICKAXE_BLAME_COPY_HARDER  04
 #define PICKAXE_BLAME_COPY_HARDEST 010
 
-/*
- * blame for a blame_entry with score lower than these thresholds
- * is not passed to the parent using move/copy logic.
- */
 static unsigned blame_move_score;
 static unsigned blame_copy_score;
 #define BLAME_DEFAULT_MOVE_SCORE   20
@@ -375,6 +371,13 @@ struct blame_scoreboard {
int num_read_blob;
int num_get_patch;
int num_commits;
+
+   /*
+* blame for a blame_entry with score lower than these thresholds
+* is not passed to the parent using move/copy logic.
+*/
+   unsigned move_score;
+   unsigned copy_score;
 };
 
 static void sanity_check_refcnt(struct blame_scoreboard *);
@@ -1156,7 +1159,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
next = e->next;
find_copy_in_blob(sb, e, parent, split, _p);
if (split[1].suspect &&
-   blame_move_score < blame_entry_score(sb, 
[1])) {
+   sb->move_score < blame_entry_score(sb, [1])) {
split_blame(blamed, , split, e);
} else {
e->next = leftover;
@@ -1165,7 +1168,7 @@ static void find_move_in_parent(struct blame_scoreboard 
*sb,
decref_split(split);
}
*unblamedtail = NULL;
-   toosmall = filter_small(sb, toosmall, , 
blame_move_score);
+   toosmall = filter_small(sb, toosmall, , 
sb->move_score);
} while (unblamed);
target->suspects = reverse_blame(leftover, NULL);
 }
@@ -1286,7 +1289,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
for (j = 0; j < num_ents; j++) {
struct blame_entry *split = blame_list[j].split;
if (split[1].suspect &&
-   blame_copy_score < blame_entry_score(sb, 
[1])) {
+   sb->copy_score < blame_entry_score(sb, [1])) {
split_blame(blamed, , split,
blame_list[j].ent);
} else {
@@ -1297,7 +1300,7 @@ static void find_copy_in_parent(struct blame_scoreboard 
*sb,
}
free(blame_list);
*unblamedtail = NULL;
-   toosmall = filter_small(sb, toosmall, , 
blame_copy_score);
+   toosmall = filter_small(sb, toosmall, , 
sb->copy_score);
} while (unblamed);
target->suspects = reverse_blame(leftover, NULL);
diff_flush(_opts);
@@ -1454,7 +1457,7 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * Optionally find moves in parents' files.
 */
if (opt & PICKAXE_BLAME_MOVE) {
-   filter_small(sb, , >suspects, 
blame_move_score);
+   filter_small(sb, , >suspects, sb->move_score);
if (origin->suspects) {
for (i = 0, sg = first_scapegoat(revs, commit);
 i < num_sg && sg;
@@ -1473,12 +1476,12 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
 * Optionally find copies from parents' files.
 */
if (opt & PICKAXE_BLAME_COPY) {
-   if (blame_copy_score > blame_move_score)
-   filter_small(sb, , >suspects, 
blame_copy_score);
-   else if (blame_copy_score < blame_move_score) {
+   if (sb->copy_score > sb->move_score)
+   filter_small(sb, , >suspects, 
sb->copy_score);
+   else if (sb->copy_score < sb->move_score) {
origin->suspects = blame_merge(origin->suspects, 
toosmall);
toosmall = NULL;
-   filter_small(sb, , >suspects, 
blame_copy_score);
+   filter_small(sb, , >suspects, 
sb->copy_score);
}
if (!origin->suspects)
goto finish;
@@ -2675,11 +2678,6 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
opt |= (PICKAXE_BLAME_COPY | PICKAXE_BLAME_MOVE |
PICKAXE_BLAME_COPY_HARDER);
 
-   if (!blame_move_score)
-   blame_move_score = BLAME_DEFAULT_MOVE_SCORE;
-   if

[PATCH 05/29] blame: rename scoreboard structure to blame_scoreboard

2017-05-23 Thread Jeff Smith

The scoreboard structure is core to the blame interface. Since
scoreboard will become more exposed, rename it to blame_scoreboard to
clarify what it is a part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 58 -
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 3e8aaa4..717868e 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -353,7 +353,7 @@ static int compare_commits_by_reverse_commit_date(const 
void *a,
 /*
  * The current state of the blame assignment.
  */
-struct scoreboard {
+struct blame_scoreboard {
/* the final commit (i.e. where we started digging from) */
struct commit *final;
/* Priority queue for commits with unassigned blame records */
@@ -377,14 +377,14 @@ struct scoreboard {
int *lineno;
 };
 
-static void sanity_check_refcnt(struct scoreboard *);
+static void sanity_check_refcnt(struct blame_scoreboard *);
 
 /*
  * If two blame entries that are next to each other came from
  * contiguous lines in the same origin (i.e.  pair),
  * merge them together.
  */
-static void coalesce(struct scoreboard *sb)
+static void coalesce(struct blame_scoreboard *sb)
 {
struct blame_entry *ent, *next;
 
@@ -410,7 +410,7 @@ static void coalesce(struct scoreboard *sb)
  * the commit priority queue of the score board.
  */
 
-static void queue_blames(struct scoreboard *sb, struct blame_origin *porigin,
+static void queue_blames(struct blame_scoreboard *sb, struct blame_origin 
*porigin,
 struct blame_entry *sorted)
 {
if (porigin->suspects)
@@ -653,14 +653,14 @@ static void dup_entry(struct blame_entry ***queue,
*queue = >next;
 }
 
-static const char *nth_line(struct scoreboard *sb, long lno)
+static const char *nth_line(struct blame_scoreboard *sb, long lno)
 {
return sb->final_buf + sb->lineno[lno];
 }
 
 static const char *nth_line_cb(void *data, long lno)
 {
-   return nth_line((struct scoreboard *)data, lno);
+   return nth_line((struct blame_scoreboard *)data, lno);
 }
 
 /*
@@ -919,7 +919,7 @@ static int blame_chunk_cb(long start_a, long count_a,
  * for the lines it is suspected to its parent.  Run diff to find
  * which lines came from parent and pass blame for them.
  */
-static void pass_blame_to_parent(struct scoreboard *sb,
+static void pass_blame_to_parent(struct blame_scoreboard *sb,
 struct blame_origin *target,
 struct blame_origin *parent)
 {
@@ -959,7 +959,7 @@ static void pass_blame_to_parent(struct scoreboard *sb,
  *
  * Compute how trivial the lines in the blame_entry are.
  */
-static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e)
+static unsigned ent_score(struct blame_scoreboard *sb, struct blame_entry *e)
 {
unsigned score;
const char *cp, *ep;
@@ -986,7 +986,7 @@ static unsigned ent_score(struct scoreboard *sb, struct 
blame_entry *e)
  * so far, by comparing this and best_so_far and copying this into
  * bst_so_far as needed.
  */
-static void copy_split_if_better(struct scoreboard *sb,
+static void copy_split_if_better(struct blame_scoreboard *sb,
 struct blame_entry *best_so_far,
 struct blame_entry *this)
 {
@@ -1020,7 +1020,7 @@ static void copy_split_if_better(struct scoreboard *sb,
  *
  * All line numbers are 0-based.
  */
-static void handle_split(struct scoreboard *sb,
+static void handle_split(struct blame_scoreboard *sb,
 struct blame_entry *ent,
 int tlno, int plno, int same,
 struct blame_origin *parent,
@@ -1039,7 +1039,7 @@ static void handle_split(struct scoreboard *sb,
 }
 
 struct handle_split_cb_data {
-   struct scoreboard *sb;
+   struct blame_scoreboard *sb;
struct blame_entry *ent;
struct blame_origin *parent;
struct blame_entry *split;
@@ -1063,7 +1063,7 @@ static int handle_split_cb(long start_a, long count_a,
  * we can pass blames to it.  file_p has the blob contents for
  * the parent.
  */
-static void find_copy_in_blob(struct scoreboard *sb,
+static void find_copy_in_blob(struct blame_scoreboard *sb,
  struct blame_entry *ent,
  struct blame_origin *parent,
  struct blame_entry *split,
@@ -1099,7 +1099,7 @@ static void find_copy_in_blob(struct scoreboard *sb,
  * Returns a pointer to the link pointing to the old head of the small list.
  */
 
-static struct blame_entry **filter_small(struct scoreboard *sb,
+static struct blame_entry **filter_small(struct blame_scoreboard *sb,
 struct blame_entry **small,
 struct blame_entry **source,

[PATCH 04/29] blame: rename origin structure to blame_origin

2017-05-23 Thread Jeff Smith

The origin structure is core to the blame interface.  Since origin will
become more exposed, rename it to blame_origin to clarify what it is a
part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 114 
 1 file changed, 57 insertions(+), 57 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 3529f01..3e8aaa4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -87,10 +87,10 @@ static unsigned blame_copy_score;
 /*
  * One blob in a commit that is being suspected
  */
-struct origin {
+struct blame_origin {
int refcnt;
/* Record preceding blame record for this blob */
-   struct origin *previous;
+   struct blame_origin *previous;
/* origins are put in a list linked via `next' hanging off the
 * corresponding commit's util field in order to make finding
 * them fast.  The presence in this chain does not count
@@ -108,7 +108,7 @@ struct origin {
 * us get tripped up by this case, it certainly does not seem
 * worth optimizing for.
 */
-   struct origin *next;
+   struct blame_origin *next;
struct commit *commit;
/* `suspects' contains blame entries that may be attributed to
 * this origin's commit or to parent commits.  When a commit
@@ -151,7 +151,7 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  * diff machinery
  */
 static void fill_origin_blob(struct diff_options *opt,
-struct origin *o, mmfile_t *file)
+struct blame_origin *o, mmfile_t *file)
 {
if (!o->file.ptr) {
enum object_type type;
@@ -180,17 +180,17 @@ static void fill_origin_blob(struct diff_options *opt,
  * Origin is refcounted and usually we keep the blob contents to be
  * reused.
  */
-static inline struct origin *origin_incref(struct origin *o)
+static inline struct blame_origin *origin_incref(struct blame_origin *o)
 {
if (o)
o->refcnt++;
return o;
 }
 
-static void origin_decref(struct origin *o)
+static void origin_decref(struct blame_origin *o)
 {
if (o && --o->refcnt <= 0) {
-   struct origin *p, *l = NULL;
+   struct blame_origin *p, *l = NULL;
if (o->previous)
origin_decref(o->previous);
free(o->file.ptr);
@@ -209,7 +209,7 @@ static void origin_decref(struct origin *o)
}
 }
 
-static void drop_origin_blob(struct origin *o)
+static void drop_origin_blob(struct blame_origin *o)
 {
if (o->file.ptr) {
free(o->file.ptr);
@@ -238,7 +238,7 @@ struct blame_entry {
int num_lines;
 
/* the commit that introduced this group into the final image */
-   struct origin *suspect;
+   struct blame_origin *suspect;
 
/* the line number of the first line of this group in the
 * suspect's file; internally all line numbers are 0 based.
@@ -410,13 +410,13 @@ static void coalesce(struct scoreboard *sb)
  * the commit priority queue of the score board.
  */
 
-static void queue_blames(struct scoreboard *sb, struct origin *porigin,
+static void queue_blames(struct scoreboard *sb, struct blame_origin *porigin,
 struct blame_entry *sorted)
 {
if (porigin->suspects)
porigin->suspects = blame_merge(porigin->suspects, sorted);
else {
-   struct origin *o;
+   struct blame_origin *o;
for (o = porigin->commit->util; o; o = o->next) {
if (o->suspects) {
porigin->suspects = sorted;
@@ -434,9 +434,9 @@ static void queue_blames(struct scoreboard *sb, struct 
origin *porigin,
  * get_origin() to obtain shared, refcounted copy instead of calling
  * this function directly.
  */
-static struct origin *make_origin(struct commit *commit, const char *path)
+static struct blame_origin *make_origin(struct commit *commit, const char 
*path)
 {
-   struct origin *o;
+   struct blame_origin *o;
FLEX_ALLOC_STR(o, path, path);
o->commit = commit;
o->refcnt = 1;
@@ -449,9 +449,9 @@ static struct origin *make_origin(struct commit *commit, 
const char *path)
  * Locate an existing origin or create a new one.
  * This moves the origin to front position in the commit util list.
  */
-static struct origin *get_origin(struct commit *commit, const char *path)
+static struct blame_origin *get_origin(struct commit *commit, const char *path)
 {
-   struct origin *o, *l;
+   struct blame_origin *o, *l;
 
for (o = commit->util, l = NULL; o; l = o, o = o->next) {
if (!strcmp(o->path, path)) {
@@ -476,7 +476,7 @@ static struct origin *get_origin(struct commit *commit, 
const char *path)
  *
  * This also fills origin->mode for corresponding tree path.
  */
-static int fill_blob_sha1_and_mode(struct

[PATCH 06/29] blame: rename origin-related functions

2017-05-23 Thread Jeff Smith

Functions related to blame_origin that will be publicly exposed should
have names that better reflect what they are a part of.

Signed-off-by: Jeff Smith 
---
 builtin/blame.c | 58 -
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 717868e..7854770 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -180,19 +180,19 @@ static void fill_origin_blob(struct diff_options *opt,
  * Origin is refcounted and usually we keep the blob contents to be
  * reused.
  */
-static inline struct blame_origin *origin_incref(struct blame_origin *o)
+static inline struct blame_origin *blame_origin_incref(struct blame_origin *o)
 {
if (o)
o->refcnt++;
return o;
 }
 
-static void origin_decref(struct blame_origin *o)
+static void blame_origin_decref(struct blame_origin *o)
 {
if (o && --o->refcnt <= 0) {
struct blame_origin *p, *l = NULL;
if (o->previous)
-   origin_decref(o->previous);
+   blame_origin_decref(o->previous);
free(o->file.ptr);
/* Should be present exactly once in commit chain */
for (p = o->commit->util; p; l = p, p = p->next) {
@@ -205,7 +205,7 @@ static void origin_decref(struct blame_origin *o)
return;
}
}
-   die("internal error in blame::origin_decref");
+   die("internal error in blame_origin_decref");
}
 }
 
@@ -393,7 +393,7 @@ static void coalesce(struct blame_scoreboard *sb)
ent->s_lno + ent->num_lines == next->s_lno) {
ent->num_lines += next->num_lines;
ent->next = next->next;
-   origin_decref(next->suspect);
+   blame_origin_decref(next->suspect);
free(next);
ent->score = 0;
next = ent; /* again */
@@ -461,7 +461,7 @@ static struct blame_origin *get_origin(struct commit 
*commit, const char *path)
o->next = commit->util;
commit->util = o;
}
-   return origin_incref(o);
+   return blame_origin_incref(o);
}
}
return make_origin(commit, path);
@@ -511,7 +511,7 @@ static struct blame_origin *find_origin(struct commit 
*parent,
 * The same path between origin and its parent
 * without renaming -- the most common case.
 */
-   return origin_incref (porigin);
+   return blame_origin_incref (porigin);
}
 
/* See if the origin->path is different between parent
@@ -630,7 +630,7 @@ static void add_blame_entry(struct blame_entry ***queue,
 {
struct blame_entry *e = xmalloc(sizeof(*e));
memcpy(e, src, sizeof(*e));
-   origin_incref(e->suspect);
+   blame_origin_incref(e->suspect);
 
e->next = **queue;
**queue = e;
@@ -645,8 +645,8 @@ static void add_blame_entry(struct blame_entry ***queue,
 static void dup_entry(struct blame_entry ***queue,
  struct blame_entry *dst, struct blame_entry *src)
 {
-   origin_incref(src->suspect);
-   origin_decref(dst->suspect);
+   blame_origin_incref(src->suspect);
+   blame_origin_decref(dst->suspect);
memcpy(dst, src, sizeof(*src));
dst->next = **queue;
**queue = dst;
@@ -687,7 +687,7 @@ static void split_overlap(struct blame_entry *split,
 
if (e->s_lno < tlno) {
/* there is a pre-chunk part not blamed on parent */
-   split[0].suspect = origin_incref(e->suspect);
+   split[0].suspect = blame_origin_incref(e->suspect);
split[0].lno = e->lno;
split[0].s_lno = e->s_lno;
split[0].num_lines = tlno - e->s_lno;
@@ -701,7 +701,7 @@ static void split_overlap(struct blame_entry *split,
 
if (same < e->s_lno + e->num_lines) {
/* there is a post-chunk part not blamed on parent */
-   split[2].suspect = origin_incref(e->suspect);
+   split[2].suspect = blame_origin_incref(e->suspect);
split[2].lno = e->lno + (same - e->s_lno);
split[2].s_lno = e->s_lno + (same - e->s_lno);
split[2].num_lines = e->s_lno + e->num_lines - same;
@@ -717,7 +717,7 @@ static void split_overlap(struct blame_entry *split,
 */
if (split[1].num_lines < 1)
return;
-   split[1].suspect = origin_incref(parent);
+   split[1].suspect = blame_origin_incref(parent);
 }
 
 /*
@@ -767,7 +767,7 @@ static void decref_split(struct blame_entry

[PATCH 02/29] blame: move textconv_object with related functions

2017-05-23 Thread Jeff Smith

textconv_object is used in places other than blame.c and should be moved
to a more appropriate location.  Other textconv related functions are
located in diff.c so that seems as good a place as any.

Signed-off-by: Jeff Smith 
---
 builtin.h  |  2 --
 builtin/blame.c| 28 
 builtin/cat-file.c |  1 +
 diff.c | 23 +++
 diff.h |  7 +++
 5 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/builtin.h b/builtin.h
index 9e4a898..498ac80 100644
--- a/builtin.h
+++ b/builtin.h
@@ -25,8 +25,6 @@ struct fmt_merge_msg_opts {
 extern int fmt_merge_msg(struct strbuf *in, struct strbuf *out,
 struct fmt_merge_msg_opts *);
 
-extern int textconv_object(const char *path, unsigned mode, const struct 
object_id *oid, int oid_valid, char **buf, unsigned long *buf_size);
-
 extern int is_builtin(const char *s);
 
 extern int cmd_add(int argc, const char **argv, const char *prefix);
diff --git a/builtin/blame.c b/builtin/blame.c
index d39f6af..fbd757e 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -147,34 +147,6 @@ static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
 }
 
 /*
- * Prepare diff_filespec and convert it using diff textconv API
- * if the textconv driver exists.
- * Return 1 if the conversion succeeds, 0 otherwise.
- */
-int textconv_object(const char *path,
-   unsigned mode,
-   const struct object_id *oid,
-   int oid_valid,
-   char **buf,
-   unsigned long *buf_size)
-{
-   struct diff_filespec *df;
-   struct userdiff_driver *textconv;
-
-   df = alloc_filespec(path);
-   fill_filespec(df, oid->hash, oid_valid, mode);
-   textconv = get_textconv(df);
-   if (!textconv) {
-   free_filespec(df);
-   return 0;
-   }
-
-   *buf_size = fill_textconv(textconv, df, buf);
-   free_filespec(df);
-   return 1;
-}
-
-/*
  * Given an origin, prepare mmfile_t structure to be used by the
  * diff machinery
  */
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 1890d7a..79a2c82 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -5,6 +5,7 @@
  */
 #include "cache.h"
 #include "builtin.h"
+#include "diff.h"
 #include "parse-options.h"
 #include "userdiff.h"
 #include "streaming.h"
diff --git a/diff.c b/diff.c
index 74283d9..040fb25 100644
--- a/diff.c
+++ b/diff.c
@@ -5270,6 +5270,29 @@ size_t fill_textconv(struct userdiff_driver *driver,
return size;
 }
 
+int textconv_object(const char *path,
+   unsigned mode,
+   const struct object_id *oid,
+   int oid_valid,
+   char **buf,
+   unsigned long *buf_size)
+{
+   struct diff_filespec *df;
+   struct userdiff_driver *textconv;
+
+   df = alloc_filespec(path);
+   fill_filespec(df, oid->hash, oid_valid, mode);
+   textconv = get_textconv(df);
+   if (!textconv) {
+   free_filespec(df);
+   return 0;
+   }
+
+   *buf_size = fill_textconv(textconv, df, buf);
+   free_filespec(df);
+   return 1;
+}
+
 void setup_diff_pager(struct diff_options *opt)
 {
/*
diff --git a/diff.h b/diff.h
index 5be1ee7..52ebd54 100644
--- a/diff.h
+++ b/diff.h
@@ -385,6 +385,13 @@ extern size_t fill_textconv(struct userdiff_driver *driver,
  */
 extern struct userdiff_driver *get_textconv(struct diff_filespec *one);
 
+/*
+ * Prepare diff_filespec and convert it using diff textconv API
+ * if the textconv driver exists.
+ * Return 1 if the conversion succeeds, 0 otherwise.
+ */
+extern int textconv_object(const char *path, unsigned mode, const struct 
object_id *oid, int oid_valid, char **buf, unsigned long *buf_size);
+
 extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
-- 
2.9.3

Re: [PATCH v2 2/7] grep: skip pthreads overhead when using one thread

2017-05-23 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Skip the administrative overhead of using pthreads when only using one
> thread. Instead take the non-threaded path which would be taken under
> NO_PTHREADS.
>
> The threading support was initially added in commit
> 5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
> number of 8 threads. Later the number of threads was made configurable
> in commit 89f09dd34e ("grep: add --threads= option and
> grep.threads configuration", 2015-12-15).
>
> That change did not add any special handling for --threads=1. Now we
> take a slightly faster path by skipping thread handling entirely when
> 1 thread is requested.

OK, this is what Peff and you were discussing in the earlier round,
having the controller do the work himself, instead of sitting and
waiting for a sole worker to finish the work.  Looks good.

Thanks.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  builtin/grep.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 12e62fcbf3..bd008cb100 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
>   num_threads = GREP_NUM_THREADS_DEFAULT;
>   else if (num_threads < 0)
>   die(_("invalid number of threads specified (%d)"), num_threads);
> + if (num_threads == 1)
> + num_threads = 0;
>  #else
>   if (num_threads)
>   warning(_("no threads support, ignoring --threads"));

Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

2017-05-23 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason   writes:

> Rather, it's just to make the code easier to reason about. It's
> confusing to debug this under threading & non-threading when the
> threading codepaths redundantly compile a pattern which is never used.
>
> The reason the patterns are recompiled is as a side-effect of
> duplicating the whole grep_opt structure, which is not thread safe,
> writable, and munged during execution. The grep_opt structure then
> points to the grep_pat structure where pattern or patterns are stored.
>
> I looked into e.g. splitting the API into some "do & alloc threadsafe
> stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
> execution time of grep_opt_dup() & pattern compilation is trivial
> compared to actually executing the grep, so there was no point. Even
> with the more expensive JIT changes to follow the most expensive PCRE
> patterns take something like 0.0X milliseconds to compile at most[1].

OK.

> The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
> --debug option to dump the parse tree", 2012-09-13) still works
> properly with this change. It only emits debugging info during pattern
> compilation, which is now dumped by the pattern compiled just before
> the first thread is started.

When opt is passed to run(), opt->debug is still true for the first
worker thread.  As long as opt->debug never makes difference after
compile_grep_patterns(opt) returns, I think the change in this patch
safe.  I do not know if we want to rely on it, but we can explain it
away by saying "we'll only debug the runtime behaviour for the first
worker only", or something, so it is not a big deal either way.

Thanks.

Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Torsten Bögershausen




diff --git a/builtin/clean.c b/builtin/clean.c
index d861f836a..937eb17b6 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -857,6 +857,38 @@ static void interactive_main_loop(void)
}
  }
  
+static void correct_untracked_entries(struct dir_struct *dir)

+{
+   int src, dst, ign;
+
+   for (src = dst = ign = 0; src < dir->nr; src++) {
+   /* skip paths in ignored[] that cannot be inside entries[src] */
+   while (ign < dir->ignored_nr &&
+  0 <= cmp_dir_entry(>entries[src], 
>ignored[ign]))
+   ign++;
+
+   if (ign < dir->ignored_nr &&
+   check_dir_entry_contains(dir->entries[src], 
dir->ignored[ign])) {
+   /* entries[src] contains an ignored path, so we drop it 
*/
+   free(dir->entries[src]);
+   } else {
+   struct dir_entry *ent = dir->entries[src++];
+
+   /* entries[src] does not contain an ignored path, so we 
keep it */
+   dir->entries[dst++] = ent;
+
+   /* then discard paths in entries[] contained inside 
entries[src] */
+   while (src < dir->nr &&
+  check_dir_entry_contains(ent, dir->entries[src]))
+   free(dir->entries[src++]);
+
+   /* compensate for the outer loop's loop control */
+   src--;
+   }
+   }
+   dir->nr = dst;
+}
+


Looking what the function does, would
drop_or_keep_untracked_entries()
make more sense ?

Re: [PATCH 03/15] handle_revision_arg: stop using "dotdot" as a generic pointer

2017-05-23 Thread Junio C Hamano

Jeff King  writes:

> The handle_revision_arg() function has a "dotdot" variable
> that it uses to find a ".." or "..." in the argument. If we
> don't find one, we look for other marks, like "^!". But we
> just keep re-using the "dotdot" variable, which is
> confusing.
>
> Let's introduce a separate "mark" variable that can be used
> for these other marks. They still reuse the same variable,
> but at least the name is no longer actively misleading.
>
> Signed-off-by: Jeff King 
> ---
> It may make sense to pull each of these into its own helper. I didn't
> really look because they're so small, and because the return semantics
> seemed confusing to me. Some of them return, and some of them keep
> parsing. Some of them restore the NUL they overwrite, and some do not.
>
> I didn't dig in to see if there are weird corner cases where they
> misbehave.

I do not quite know what corner cases you meant, but I agree that
many places in the codepath we forget to restore "^" we temporarily
overwrite.  I suspect that none of them is deliberately leaving "^"
unrestored and they are just being careless (or they truly do not
care because they assume nobody will look at arg later).

And I think not restoring cannot be a correct thing to do.  After
all of these parsing, add_rev_cmdline() wants to see arg_ intact.

But let's keep reading; perhaps they are addressed in a later patch,
or they are left as-is, and either is OK.

Re: [PATCH v2 00/25] Prepare to separate out a packed_ref_store

2017-05-23 Thread Junio C Hamano

Michael Haggerty  writes:

> * Since v1, branch `bc/object-id` has been merged to `next`, and it
>   has lots of conflicts with these changes. So I rebased this branch
>   onto a merge of `master` and `bc/object-id`. (I hope this makes
>   Junio's job easier.) This unfortunately causes a bit of tbdiff noise
>   between v1 and v2.

Heh, that gives me an excellent excuse to procrastinate, as I am
planning to merge that topic to 'master' by the end of this week ;-)

Jokes aside...

> * Patch [01/25]: in t3600, register the `test_when_finished` command
>   before executing `chmod a-w`.
>
> * Patch [04/25] (new patch): convert a few `die("internal error: ...")`
>   to `die("BUG: ...")`.
>
> * Patch [05/25]: Use `strlen()` rather than `memchr()` to check the
>   trim length, and `die()` rather than skipping if it is longer than
>   the reference name.
>
> * Patch [08/25]: Name the log message arguments `msg` for consistency
>   with existing code.
>
> * Patch [10/25]: Rename the new member from `packlock` to
>   `packed_refs_lock`.
>
> * Patch [13/25] (new patch): Move the check for valid
>   `transaction->state` from `files_transaction_commit()` to
>   `ref_transaction_commit()`.

All of these feel familiar ;-)

> * Patch [14/25]:
>
>   * Add more sanity checks of `transaction->state`.
>
>   * Don't add `ref_transaction_finish()` to the public API. Instead,
> teach `ref_transaction_commit()` to do the right thing whether or
> not `ref_transaction_prepare()` has been called.
>
>   * Add and improve docstrings.
>
>   * Allow `ref_transaction_abort()` to be called even before
> `ref_transaction_prepare()` (in which case it just calls
> `ref_transaction_free()`).
>
> * Lots of improvements to commit messages and comments, mostly to
>   clarify points that reviewers asked about.

Overall this looked like quite a quality work to me.

> These changes (along with the merge commit that they are based on) are
> also available as branch `packed-ref-store-prep` in my GitHub fork
> [2]. If you'd like to see a preview of the rest of the changes (which
> works but is not yet polished), checkout the `mmap-packed-refs` branch
> from the same place.

Thanks.

Re: [PATCH 0/15] retain blob info for git diff HEAD:foo HEAD:bar

2017-05-23 Thread Junio C Hamano

Patches around [PATCH 06-08/15] made some unexpected (at least to
me) turns but the series told a coherent story, building on top of
what has been achieved in the previous steps.

Thanks for a pleasant read.

Re: [PATCH 05/15] handle_revision_arg: add handle_dotdot() helper

2017-05-23 Thread Junio C Hamano

Jeff King  writes:

> The handle_revision_arg function is rather long, and a big
> chunk of it is handling the range operators. Let's pull that
> out to a separate helper. While we're doing so, we can clean
> up a few of the rough edges that made the flow hard to
> follow:
>
>   - instead of manually restoring *dotdot (that we overwrote
> with a NUL), do the real work in a sub-helper, which
> makes it clear that the munge/restore lines are a
> matched pair
>
>   - eliminate a goto which wasn't actually used for control
> flow, but only to avoid duplicating a few lines
> (instead, those lines are pushed into another helper
> function)
>
>   - use early returns instead of deep nesting
>
>   - consistently name all variables for the left-hand side
> of the range as "a" (rather than "this" or "from") and
> the right-hand side as "b" (rather than "next", or using
> the unadorned "sha1" or "flags" from the main function).
>
> Signed-off-by: Jeff King 
> ---
>  revision.c | 177 
> +++--
>  1 file changed, 102 insertions(+), 75 deletions(-)

Nicely done.  Together with [PATCH 04/15] the resulting code wrt ".."
is much easier to follow.

> diff --git a/revision.c b/revision.c
> index dec06ff8b..eb45501fd 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1429,10 +1429,109 @@ static void prepare_show_merge(struct rev_info *revs)
>   revs->limited = 1;
>  }
>  
> +static int dotdot_missing(const char *arg, char *dotdot,
> +   struct rev_info *revs, int symmetric)
> +{
> + if (revs->ignore_missing)
> + return 0;
> + /* de-munge so we report the full argument */
> + *dotdot = '.';
> + die(symmetric
> + ? "Invalid symmetric difference expression %s"
> + : "Invalid revision range %s", arg);
> +}
> +
> +static int handle_dotdot_1(const char *arg, char *dotdot,
> +struct rev_info *revs, int flags,
> +int cant_be_filename)
> +{
> + const char *a_name, *b_name;
> + struct object_id a_oid, b_oid;
> + struct object *a_obj, *b_obj;
> + unsigned int a_flags, b_flags;
> + int symmetric = 0;
> + unsigned int flags_exclude = flags ^ (UNINTERESTING | BOTTOM);
> +
> + a_name = arg;
> + if (!*a_name)
> + a_name = "HEAD";
> +
> + b_name = dotdot + 2;
> + if (*b_name == '.') {
> + symmetric = 1;
> + b_name++;
> + }
> + if (!*b_name)
> + b_name = "HEAD";
> +
> + if (get_sha1_committish(a_name, a_oid.hash) ||
> + get_sha1_committish(b_name, b_oid.hash))
> + return -1;
> +
> + if (!cant_be_filename) {
> + *dotdot = '.';
> + verify_non_filename(revs->prefix, arg);
> + *dotdot = '\0';
> + }
> +
> + a_obj = parse_object(a_oid.hash);
> + b_obj = parse_object(b_oid.hash);
> + if (!a_obj || !b_obj)
> + return dotdot_missing(arg, dotdot, revs, symmetric);
> +
> + if (!symmetric) {
> + /* just A..B */
> + b_flags = flags;
> + a_flags = flags_exclude;
> + } else {
> + /* A...B -- find merge bases between the two */
> + struct commit *a, *b;
> + struct commit_list *exclude;
> +
> + a = lookup_commit_reference(a_obj->oid.hash);
> + b = lookup_commit_reference(b_obj->oid.hash);
> + if (!a || !b)
> + return dotdot_missing(arg, dotdot, revs, symmetric);
> +
> + exclude = get_merge_bases(a, b);
> + add_rev_cmdline_list(revs, exclude, REV_CMD_MERGE_BASE,
> +  flags_exclude);
> + add_pending_commit_list(revs, exclude, flags_exclude);
> + free_commit_list(exclude);
> +
> + b_flags = flags;
> + a_flags = flags | SYMMETRIC_LEFT;
> + }
> +
> + a_obj->flags |= a_flags;
> + b_obj->flags |= b_flags;
> + add_rev_cmdline(revs, a_obj, a_name, REV_CMD_LEFT, a_flags);
> + add_rev_cmdline(revs, b_obj, b_name, REV_CMD_RIGHT, b_flags);
> + add_pending_object(revs, a_obj, a_name);
> + add_pending_object(revs, b_obj, b_name);
> + return 0;
> +}
> +
> +static int handle_dotdot(const char *arg,
> +  struct rev_info *revs, int flags,
> +  int cant_be_filename)
> +{
> + char *dotdot = strstr(arg, "..");
> + int ret;
> +
> + if (!dotdot)
> + return -1;
> +
> + *dotdot = '\0';
> + ret = handle_dotdot_1(arg, dotdot, revs, flags, cant_be_filename);
> + *dotdot = '.';
> +
> + return ret;
> +}
> +
>  int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, 
> unsigned revarg_opt)
>  {
>   struct object_context oc;
> - char *dotdot;
>   char *mark;
>   struct object *object;
>

Re: [WIP/RFC 00/23] repository object

2017-05-23 Thread Junio C Hamano

Brandon Williams  writes:

> On 05/22, Jeff King wrote:
>> That said, even if we never reached the point where we could handle all
>> submodule requests in-process, I think sticking the repo-related global
>> state in a struct certainly could not hurt general readability. So it's
>> a good direction regardless of whether we take it all the way.
>
> Glad you think so!

Yup, what Jeff said ;-)

Re: Passing revs to git-bundle-create via stdin

2017-05-23 Thread Junio C Hamano

Jeff King  writes:

> I think what's happening is that git-bundle actually runs _two_
> traversals using the command-line arguments. ...
> ... It was just a way of confirming my
> guess about the double-read.
>
> The real solutions I can think of are:
>
>   1. Teach git-bundle not to require the double-read. I'm not sure why
>  it's written the way it is, but presumably it would be tricky to
>  undo it (or we would have just written it the other way in the
>  first place!)

If I remember correctly, the reason why it does the double-read is
because it wants to cope with things like "--since".  There is no
explicit bottom of the DAG specified on the command line, and the
first one (without "--objects") is done to find "prerequisites" that
are written in the header.

Then the packdata is generated, which does another traversal (this
time with "--objects" option).

So perhaps the right way to fix it is to keep the first traversal
as-is, but update the second one (I think write-bundle-refs is the
culprit) so that it does not use the user-supplied command line
as-is; instead it should use the positive end of the history from
the command line with the negative end set to these "prerequisites"
commits.

I said "command line" in the above, but read that as "end user
input"; the list of rev-list command line arguments given from the
standard input is exactly the same deal.

RE: git submodule update --remote fails on non-master branches with shallow clones enabled

2017-05-23 Thread Maxim Chuvilyaev

Hi,

I am experiencing a problem with git submodules:
git cannot update submodules from remote when using a non-master branch and 
shallow clones.

Tested git versions

client: 2.13 (Windows)
server: 2.10 (CentOS 7)

Behaviour

when I have a submodule that is configured to get updates from a non-master 
branch and, at the same time, shallow clones are enabled, "git submodule update 
--remote" fails.

Example configuration in .gitmodules:

[submodule "webapi"]
   path = webapi
   url = ../../api/web-api.git
   branch = release/2.1
   shallow = true
   
The error message is:

fatal: Needed a single revision
Unable to find current origin/release/2.1 revision in submodule path 'webapi'

I think that the cause of this problem is the following:
When this submodule is initialised initially, in ".git/modules/webapi/config" I 
can see that fetch is configured this way:

fetch = +refs/heads/master:refs/remotes/origin/master

That's probably why git cannot find my release/2.1 branch. The question is - 
why this happens despite the submodule is explicitly configured to use a 
non-master branch in .gitmodules?... 

NB 
For the remote git, I enabled both uploadpack.allowReachableSHA1InWant & 
uploadpack.allowTipSHA1InWant.

Kind regards,
Maxim


This e-mail and any attachment(s), is confidential and may be legally 
privileged. It is intended solely for the addressee. If you are not the 
addressee, dissemination, copying or use of this e-mail or any of its content 
is prohibited and may be unlawful. If you are not the intended recipient please 
inform the sender immediately and destroy the e-mail, any attachment(s) and any 
copies. All liability for viruses is excluded to the fullest extent permitted 
by law. It is your responsibility to scan or otherwise check this email and any 
attachment(s). Unless otherwise stated (i) views expressed in this message are 
those of the individual sender (ii) no contract may be construed by this 
e-mail. Emails may be monitored and you are taken to consent to this monitoring.
Civica Pty Limited  ABN 83 003 691 718Incorporated in New South Wales

Re: Another git repo at kernel.org?

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 4:32 PM, Junio C Hamano  wrote:
> Theodore Ts'o  writes:
>
>> So Junio owns the pub/scm/git/git.git tree on kernel.org, and he may
>> already have access to create new repo's under the pub/scm/git
>> hierarchy.  In which case we might not need to bug the kernel.org
>> administrators at all.
>
> Yes, sorry for a premature inquiry.

Thanks for bearing with my premature questions.
I just went off from https://www.kernel.org/category/contact-us.html
to see what the workflow would look like. Knowing that Junio
owns the complete pub/scm/git namespace is a good data point for
our discussion.

Sorry,
Stefan

Re: [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently

2017-05-23 Thread Philip Oakley

From: "Jeff King" 

On Sat, May 20, 2017 at 03:56:32PM +0100, Philip Oakley wrote:

> That means we do report the correct name for "a" in the
> pending array. But some code paths try to show the whole
> "a..b" name in error messages, and these erroneously show
> only "a" instead of "a..b". E.g.:
>
>  $ git cherry-pick HEAD:foo..HEAD:foo

shouldn't this be three dots? Also the para above uses two dot examples 
in
its description but the paras before that start by describing the three 
dot

case.

Yeah, it should be. The problem happens with the two-dot case, too (it's
the same code) but to provoke cherry-pick to actually show the error in
question, you need to use three-dots. There's probably a way to provoke
a broken error message with the two-dot case, but I didn't dig further
after finding this one.

Thanks for being a careful reader. Here's the patch with a modified
commit message. I don't think this series otherwise needs a re-roll so
far.

Agreed, it's only a small point. It is difficult to cover both the two and 
three dot cases mid flow. Perhaps show the two options very early

-- >8 --
Subject: [PATCH] handle_revision_arg: reset "dotdot" consistently

When we are parsing a range like "a..b", we write a

Perhaps this early?   s/"a..b"/"a..b" or "a...b"/

temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.

For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename().  This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:

This now looks 'correct' relative to the initial multi-dot options.

 git log --source a...b

where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).

To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.

That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a...b" name in error messages, and these erroneously show
only "a" instead of "a...b". E.g.:

 $ git cherry-pick HEAD:foo...HEAD:foo
 error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
commit
 error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a 
commit

 fatal: Invalid symmetric difference expression HEAD:foo

(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).

As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.

This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.

Signed-off-by: Jeff King 

--
Philip

---
revision.c | 3 +++
t/t4202-log.sh | 9 +
2 files changed, 12 insertions(+)

diff --git a/revision.c b/revision.c
index 8a8c1789c..014bf52e3 100644
--- a/revision.c
+++ b/revision.c
@@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi

 if (!cant_be_filename) {
 *dotdot = '.';
 verify_non_filename(revs->prefix, arg);
+ *dotdot = '\0';
 }

 a_obj = parse_object(from_sha1);
 b_obj = parse_object(sha1);
 if (!a_obj || !b_obj) {
 missing:
+ *dotdot = '.';
 if (revs->ignore_missing)
 return 0;
 die(symmetric
@@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi

 REV_CMD_RIGHT, flags);
 add_pending_object(revs, a_obj, this);
 add_pending_object(revs, b_obj, next);
+ *dotdot = '.';
 return 0;
 }
 *dotdot = '.';
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 1c7d6729c..76c511973 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1392,4 +1392,13 @@ test_expect_success 'log --source paints tag names' 
'

 test_cmp expect actual
'

+test_expect_success 'log --source paints symmetric ranges' '
+ cat >expect <<-\EOF &&
+ 09e12a9 source-b three
+ 8e393e1 source-a two
+ EOF
+ git log --oneline --source source-a...source-b >actual &&
+ test_cmp expect actual
+'
+
test_done
--
2.13.0.387.gec0afcebb.dirty

Re: Another git repo at kernel.org?

2017-05-23 Thread Junio C Hamano

Theodore Ts'o  writes:

> So Junio owns the pub/scm/git/git.git tree on kernel.org, and he may
> already have access to create new repo's under the pub/scm/git
> hierarchy.  In which case we might not need to bug the kernel.org
> administrators at all.

Yes, sorry for a premature inquiry.

Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-23 Thread Philip Oakley


From: "Junio C Hamano" 

Félix Saparelli  writes:


I created a git repository that, for joke reasons, has a single branch
called MASTER (in uppercase). Upon cloning this repo, git attempts to
checkout the master branch (in lowercase), which does not exist.


See what Git told you carefully and you can guess, I think.


guessing is rarely a good user experience ;-) however...



$ git clone g...@github.com:passcod/UPPERCASE-NPM.git
Cloning into 'UPPERCASE-NPM'...
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 14 (delta 3), reused 14 (delta 3), pack-reused 0
Receiving objects: 100% (14/14), done.
Resolving deltas: 100% (3/3), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.


Perhaps here the warning message could include the value of the ref provided 
by the remote's HEAD which would then more clearly indicate to the user what 
was expected.


I haven't had chance to look at how easy that maybe in the code - perhaps a 
bit of low hanging fruit for someone?




So you have MASTER but not master, but your HEAD still is pointing
at non-existing master.  As HEAD is the way the remote tells the
clone what the default branch to be checked out is, the command
reports "we cannot do a checkout of the default branch."


--
Philip

Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-23 Thread Junio C Hamano

Jeff King  writes:

> ...we can probably restrict it to when autostash is in use, like:
>
>   /*
>* If this is a fast-forward, we can skip calling rebase and
>* just do the merge ourselves. But we don't know about
>* autostash, so use the real rebase command when it's in effect.
>*/
>   if (!autostash && is_descendant_of(merge_head, list)) {
>   opt_ff = "--ff-only";
>   return run_merge();
>   }
>
> AFAICT from the commit introducing this code (33b842a1e9), and the
> surrounding discussion:
>
>   
> http://public-inbox.org/git/of95d98cb6.47969c1c-onc1257fe1.0058d980-c1257fe1.00599...@dakosy.de/T/#u

Sounds like a sensible solution.

> But I notice on the run_merge() code path that we do still invoke
> git-merge.

... wouldn't that what we want even when the merge happens to be a
fast-forward one?  I am not sure what you meant by this, but...

> And rebase has been getting faster as it is moved to C code
> itself. So is this optimization even worth doing anymore?

...that might be something worth thinking about---my gut feeling
tells me something but we should go by a measurement, not by gut
feeling of a random somebody.

Thanks.

Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Junio C Hamano

Samuel Lijin  writes:

> There is an implicit assumption that a directory containing only
> untracked and ignored paths should itself be considered untracked. This
> makes sense in use cases where we're asking if a directory should be
> added to the git database, but not when we're asking if a directory can
> be safely removed from the working tree; as a result, clean -d would
> assume that an "untracked" directory containing ignored paths could be
> deleted, even though doing so would also remove the ignored paths.
>
> To get around this, we teach clean -d to collect ignored paths and skip
> an untracked directory if it contained an ignored path, instead just
> removing the untracked contents thereof. To achieve this, cmd_clean()
> has to collect all untracked contents of untracked directories, in
> addition to all ignored paths, to determine which untracked dirs must be
> skipped (because they contain ignored paths) and which ones should *not*
> be skipped.
>
> For this purpose, correct_untracked_entries() is introduced to prune a
> given dir_struct of untracked entries containing ignored paths and those
> untracked entries encompassed by the untracked entries which are not
> pruned away.
>
> A memory leak is also fixed in cmd_clean().
>
> This also fixes the known breakage in t7300, since clean -d now skips
> untracked directories containing ignored paths.

Nicely explained.  Will replace the previous 6/6 and my squash on
top that were queued on 'pu'.

Thanks.

Re: [PATCH 0/2] Fix warnings on access of a remote with Windows paths

2017-05-23 Thread Junio C Hamano

Johannes Schindelin  writes:

>> In this case, the warning occurs because I build with nd/fopen-errors.
>
> Ah. So the base commit Junio chose for your v1 is completely
> inappropriate. It should be nd/fopen-errors instead.

Actuallly, Hannes's patch text and problem description are
confusingly inconsistent, and I have to say that the above statement
is wrong---do not react to this statement just yet, because I'll
also say "but you are right" real soon.

Because "git fetch a/b" your UNIXy friends run will not consider
"a/b" as a remote nickname, "a\b" in "git fetch a\b" Windows folks
run must not be taken as a remote nickname either.  That is a
justification enough for Hannes's patch text, and that does not
change whether we take nd/fopen-errors or discard it.  So in that
sense, the patch text does not have anything to do with the other
topic.

But you are right (as promised ;-) to say that nd/fopen-errors needs
a bit more polishing to work correctly on Windows.  I unfortunately
do not think Hannes's patch addresses the real issue.  

Isn't the root cause of "Invalid argument" a colon in a (mistakenly
identified) remote nickname, not a backslash?  I doubt that it would
help to loosen valid_remote_nick() to pay attention to backslashes.
Surely, it may hide the issue when the path also has a backslash,
but wouldn't Windows folks see the same warning for "git fetch c:c"
for example even with Hannes's patch?

We use the pattern "try to open a path, and treat a failure to open
as indicating that the path is missing (and take information we
would have obtained from the contents of the path, if existed, from
elsewhere, as necessary)" pretty much everywhere, not just when
accessing a remote.  And nd/fopen-errors tries to tweak the "treat a
failure to open as a sign of missing path" to be a bit more careful
by warning unless the error we get is ENOENT or ENOTDIR (otherwise
we may treat a file that exists but cannot be read as if it is
missing without any indication).

The solution for the problem Hannes's proposed log message describes
lies in warn_on_fopen_errors() function that was introduced in the
nd/fopen-errors topic.  It appears that in Windows port, open() and
fopen() return EINVAL for a file that does not exist and whose name
Windows does not like, so we probably should do something like the
attached to work around the EINVAL (I do not know about the symbol
checked for #ifdef---there may be a more appropriate one).

I am not entirely happy with the workaround, because I do not know
if EINVAL can come _ONLY_ due to a colon in the pathname on Windows,
or if there are some other cases that deserve to be warned that also
yield EINVAL, and the attached will sweep the problem under the rug
for the "other cases", partially undoing what nd/fopen-errors topic
wanted to do.  But it does not make it worse than before the topic
happened [*1*].

And that kind of solution does belong to nd/fopen-errors.

So in short:

 (1) Hannes's patches are good, but they solve a problem that is
 different from what their log messages say; the log message
 needs to be updated;

 (2) In nd/fopen-errors topic, we need a new patch that deals with
 EINVAL on Windows.

Thanks.

[Footnote]

*1* An alternative that may allow us to avoid sweeping the issue
under the rug may be to have a more precise logic to see if the
open failure is due to missing file in the implementation of
open() and fopen() emulation---at that level, the logic can be
built with more platform knowledge (e.g. EINVAL?  Let's see what
path we got---ah, there is a colon)---and turn EINVAL into
ENOENT, or something like that.  I do not have a strong opinion
if that is a better solution or the attached workaround is good
enough.

 wrapper.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index 6e513c904a..7b6d3e3f94 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -418,9 +418,18 @@ FILE *fopen_for_writing(const char *path)
return ret;
 }

+static int is_an_error_for_missing_path(int no)
+{
+#if WINDOWS
+   if (errno == EINVAL)
+   return 1;
+#endif
+   return (errno == ENOENT || errno == ENOTDIR);
+}
+
 int warn_on_fopen_errors(const char *path)
 {
-   if (errno != ENOENT && errno != ENOTDIR) {
+   if (!is_error_for_missing_path(errno)) {
warn_on_inaccessible(path);
return -1;
}

Re: [PATCH v3 22/30] grep: factor test for \0 in grep patterns into a function

2017-05-23 Thread Brandon Williams

On 05/20, Ævar Arnfjörð Bjarmason wrote:
> Factor the test for \0 in grep patterns into a function. Since commit
> 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
> \0 is considered fixed as regcomp() can't handle it.
> 
> This change makes later changes that make use of either has_null() or
> is_fixed() (but not both) smaller.
> 
> While I'm at it make the comment conform to the style guide, i.e. add
> an opening "/*\n".
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  grep.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/grep.c b/grep.c
> index bf6c2494fd..79eb681c6e 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct 
> grep_pat *p,
>   die("%s'%s': %s", where, p->pattern, error);
>  }
>  
> +static int has_null(const char *s, size_t len)
> +{
> + /*
> +  * regcomp cannot accept patterns with NULs so when using it
> +  * we consider any pattern containing a NUL fixed.
> +  */
> + if (memchr(s, 0, len))
> + return 1;
> +
> + return 0;
> +}
> +
>  #ifdef USE_LIBPCRE
>  static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt 
> *opt)
>  {
> @@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len)
>  {
>   size_t i;
>  
> - /* regcomp cannot accept patterns with NULs so we
> -  * consider any pattern containing a NUL fixed.
> -  */
> - if (memchr(s, 0, len))
> - return 1;
> -
>   for (i = 0; i < len; i++) {
>   if (is_regex_special(s[i]))
>   return 0;
> @@ -451,7 +457,7 @@ static void compile_regexp(struct grep_pat *p, struct 
> grep_opt *opt)
>* simple string match using kws.  p->fixed tells us if we
>* want to use kws.
>*/
> - if (opt->fixed || is_fixed(p->pattern, p->patternlen))
> + if (opt->fixed || has_null(p->pattern, p->patternlen) || 
> is_fixed(p->pattern, p->patternlen))

small nit: longer than 80 char

>   p->fixed = !icase || ascii_only;
>   else
>   p->fixed = 0;
> -- 
> 2.13.0.303.g4ebf302169
> 

-- 
Brandon Williams

Re: [GSoC][PATCH v4 2/2] submodule: port subcommand foreach from shell to C

2017-05-23 Thread Brandon Williams

On 05/23, Stefan Beller wrote:
> On Tue, May 23, 2017 at 12:36 PM, Brandon Williams  wrote:
> >
> > You can set .git_cmd = 1 instead.
> >
> >> + cpr.dir = list_item->name;
> >> + prepare_submodule_repo_env(_array);
> >> +
> >> + argv_array_pushl(, "git", "--super-prefix", 
> >> displaypath,
> >
> > And then you don't need to include "git" here.
> 
> even if git_cmd = 1 is set, you'd need a first dummy argument?
> cf. find_unpushed_submodules, See comment in 9cfa1c260f
> (serialize collection of refs that contain submodule changes, 2016-11-16)

Different subsystem, you don't need a dummy first argument.  The
revision walking code does (for some reason) need a dummy first
argument.

> 
> >> +
> >> + info.argc = argc;
> >> + info.argv = argv;
> >> + info.prefix = prefix;
> >> + info.quiet = !!quiet;
> >> + info.recursive = !!recursive;
> >
> > If these values are boolean why do we need to do the extra '!!'?
> 
> Actually that was my advice. As we only have a limited space in a single
> bit, strange things happen when you were to do:
> 
> quiet = 2; /* be extra quiet */
> info.quiet = quiet;
> 
> This is not the case here, but other commands have evolved over time
> to first take a OPT_BOOL, and then in a later patch an OPT_INT.
> (some commands take a "-v -v -v")
> 
> And by having the double negative we'd have some defensive programming
> right here. (To prove I am not telling crazy stories, $ git log -S \!\!)

All good, I didn't notice that they were bit fields.

-- 
Brandon Williams

Re: [GSoC][PATCH v4 2/2] submodule: port subcommand foreach from shell to C

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 12:36 PM, Brandon Williams  wrote:
>
> You can set .git_cmd = 1 instead.
>
>> + cpr.dir = list_item->name;
>> + prepare_submodule_repo_env(_array);
>> +
>> + argv_array_pushl(, "git", "--super-prefix", 
>> displaypath,
>
> And then you don't need to include "git" here.

even if git_cmd = 1 is set, you'd need a first dummy argument?
cf. find_unpushed_submodules, See comment in 9cfa1c260f
(serialize collection of refs that contain submodule changes, 2016-11-16)

>> +
>> + info.argc = argc;
>> + info.argv = argv;
>> + info.prefix = prefix;
>> + info.quiet = !!quiet;
>> + info.recursive = !!recursive;
>
> If these values are boolean why do we need to do the extra '!!'?

Actually that was my advice. As we only have a limited space in a single
bit, strange things happen when you were to do:

quiet = 2; /* be extra quiet */
info.quiet = quiet;

This is not the case here, but other commands have evolved over time
to first take a OPT_BOOL, and then in a later patch an OPT_INT.
(some commands take a "-v -v -v")

And by having the double negative we'd have some defensive programming
right here. (To prove I am not telling crazy stories, $ git log -S \!\!)

Re: [PATCH] usage: add NORETURN to BUG() function definitions

2017-05-23 Thread Ramsay Jones

On 23/05/17 04:32, Junio C Hamano wrote:
> Interesting.  One thing that I found somewhat suboptimal is that we
> do not get signalled by non-zero exit.

Warnings don't lead to non-zero exit, but similarly to -Werror, you can
provide a -Wsparse-error to turn warnings into errors:

  $ make builtin/worktree.sp
  SP builtin/worktree.c
  builtin/worktree.c:539:38: warning: Using plain integer as NULL pointer
  $ 
  $ make SPARSE_FLAGS=-Wsparse-error builtin/worktree.sp
  SP builtin/worktree.c
  builtin/worktree.c:539:38: error: Using plain integer as NULL pointer
  Makefile:2370: recipe for target 'builtin/worktree.sp' failed
  make: *** [builtin/worktree.sp] Error 1
  $ 

Unfortunately, that does not help too much because, as I mentioned before,
one warning is actually a sparse problem (and you can't turn it off):

  $ make pack-revindex.sp
  SP pack-revindex.c
  pack-revindex.c:64:23: warning: memset with byte count of 262144
  $ 

This is caused by sparse _unconditionally_ complaining about the byte count
used in calls to memset(), memcpy(), copy_to_user() and copy_from_user().
In addition, the byte count limits are hard-coded (v <= 0 || v > 10).
About a decade ago, I wrote a patch to enable/set the limit value from the
command line, but didn't get around to sending the patch upstream. :-D

[There is actually another problem warning, if you build with NO_REGEX=1].

Since cgcc was intended to be used as proxy for gcc, you might think you
could use CC=cgcc on a regular build, but that has problems of it's own:

  $ make clean >/dev/null 2>&1  # on 'pu' branch, build output in 'pout'
  $ make CC=cgcc >pout1 2>&1
  $ diff pout pout1
  99a100
  > pack-revindex.c:64:23: warning: memset with byte count of 262144
  199a201
  > imap-send.c:1439:9: warning: expression using sizeof on a function
  200a203,207
  > http.c:675:9: warning: expression using sizeof on a function
  > http.c:1676:25: warning: expression using sizeof on a function
  > http.c:1681:25: warning: expression using sizeof on a function
  > http.c:2082:9: warning: expression using sizeof on a function
  > http.c:2249:9: warning: expression using sizeof on a function
  219a227
  > http-walker.c:377:9: warning: expression using sizeof on a function
  222a231,233
  > http-push.c:189:9: warning: expression using sizeof on a function
  > http-push.c:200:9: warning: expression using sizeof on a function
  > http-push.c:202:9: warning: expression using sizeof on a function
  228a240,243
  > remote-curl.c:524:9: warning: expression using sizeof on a function
  > remote-curl.c:605:17: warning: expression using sizeof on a function
  > remote-curl.c:608:17: warning: expression using sizeof on a function
  > remote-curl.c:676:9: warning: expression using sizeof on a function
  374a390
  > builtin/worktree.c:539:38: warning: Using plain integer as NULL pointer

  ...

  $ 

See commit 9371322a60 (sparse: suppress some "using sizeof on a function"
warnings, 06-10-2013) for an explanation of the additional warnings.
I chose the SPARSE_FLAGS method to suppress those warnings, precisely
because I don't build git that way. (git grep -n SPARSE_FLAGS).

So, using CC='cgcc -Wsparse-error' as it stands isn't much help:

  $ make clean >/dev/null 2>&1
  $ make CC='cgcc -Wsparse-error'
  GIT_VERSION = 2.13.0.530.g896b4ae59
  * new build flags
  CC credential-store.o
  * new link flags
  CC common-main.o

  ...

  CC pack-objects.o
  CC pack-revindex.o
  pack-revindex.c:64:23: error: memset with byte count of 262144
  Makefile:2036: recipe for target 'pack-revindex.o' failed
  make: *** [pack-revindex.o] Error 1
  $ 

> Otherwise it would make a
> good addition to the "Static Analysis" task in .travis.yml file.

Unfortunately, some additional work required. :-P

ATB,
Ramsay Jones

Re: [PATCH 01/15] handle_revision_arg: reset "dotdot" consistently

2017-05-23 Thread Jeff King

On Sat, May 20, 2017 at 03:56:32PM +0100, Philip Oakley wrote:

> > That means we do report the correct name for "a" in the
> > pending array. But some code paths try to show the whole
> > "a..b" name in error messages, and these erroneously show
> > only "a" instead of "a..b". E.g.:
> > 
> >  $ git cherry-pick HEAD:foo..HEAD:foo
> 
> shouldn't this be three dots? Also the para above uses two dot examples in
> its description but the paras before that start by describing the three dot
> case.

Yeah, it should be. The problem happens with the two-dot case, too (it's
the same code) but to provoke cherry-pick to actually show the error in
question, you need to use three-dots. There's probably a way to provoke
a broken error message with the two-dot case, but I didn't dig further
after finding this one.

Thanks for being a careful reader. Here's the patch with a modified
commit message. I don't think this series otherwise needs a re-roll so
far.

-- >8 --
Subject: [PATCH] handle_revision_arg: reset "dotdot" consistently

When we are parsing a range like "a..b", we write a
temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.

For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename().  This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:

  git log --source a...b

where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).

To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.

That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a...b" name in error messages, and these erroneously show
only "a" instead of "a...b". E.g.:

  $ git cherry-pick HEAD:foo...HEAD:foo
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
  fatal: Invalid symmetric difference expression HEAD:foo

(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).

As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.

This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.

Signed-off-by: Jeff King 
---
 revision.c | 3 +++
 t/t4202-log.sh | 9 +
 2 files changed, 12 insertions(+)

diff --git a/revision.c b/revision.c
index 8a8c1789c..014bf52e3 100644
--- a/revision.c
+++ b/revision.c
@@ -1477,12 +1477,14 @@ int handle_revision_arg(const char *arg_, struct 
rev_info *revs, int flags, unsi
if (!cant_be_filename) {
*dotdot = '.';
verify_non_filename(revs->prefix, arg);
+   *dotdot = '\0';
}

a_obj = parse_object(from_sha1);
b_obj = parse_object(sha1);
if (!a_obj || !b_obj) {
missing:
+   *dotdot = '.';
if (revs->ignore_missing)
return 0;
die(symmetric
@@ -1525,6 +1527,7 @@ int handle_revision_arg(const char *arg_, struct rev_info 
*revs, int flags, unsi
REV_CMD_RIGHT, flags);
add_pending_object(revs, a_obj, this);
add_pending_object(revs, b_obj, next);
+   *dotdot = '.';
return 0;
}
*dotdot = '.';
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 1c7d6729c..76c511973 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1392,4 +1392,13 @@ test_expect_success 'log --source paints tag names' '
test_cmp expect actual
 '

+test_expect_success 'log --source paints symmetric ranges' '
+   cat >expect <<-\EOF &&
+   09e12a9 source-b three
+   8e393e1 source-a two
+   EOF
+   git log --oneline --source source-a...source-b >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.13.0.387.gec0afcebb.dirty

Re: What's cooking in git.git (May 2017, #07; Tue, 23)

2017-05-23 Thread Ævar Arnfjörð Bjarmason

On Tue, May 23, 2017 at 9:38 PM, Stefan Beller  wrote:
> On Tue, May 23, 2017 at 12:08 PM, Stefan Beller  wrote:
>> On Tue, May 23, 2017 at 1:08 AM, Junio C Hamano  wrote:
>>
>>> * sb/submodule-blanket-recursive (2017-05-23) 6 commits
>>>  . builtin/push.c: respect 'submodule.recurse' option
>>>  . builtin/grep.c: respect 'submodule.recurse' option
>>>  . builtin/fetch.c: respect 'submodule.recurse' option
>>>  . Introduce submodule.recurse option for worktree manipulators
>>>  . submodule test invocation: only pass additional arguments
>>>  . submodule.c: add has_submodules to check if we have any submodules
>>>  (this branch uses sb/reset-recurse-submodules.)
>>>
>>>  A new configuration variable "submodule.recurse" can be set to true
>>>  to force various commands run at the top-level superproject to
>>>  behave as if they were invoked with the "--recurse-submodules"
>>>  option.
>>>
>>>  Seems to break t7814 when merged to 'pu'.
>>
>> I will investigate! (It passes on its own, so I guess it is some
>> interference with a recent grep series)
>
> And the winner is 5d52a30eda (grep: amend submodule recursion
> test for regex engine testing, 2017-05-20, by Ævar)
>
> The tests added by grep rely on the old content of
> test 2 'grep correctly finds patterns in a submodule'.

Sorry about the fallout.

> The (whitespace broken) diff below fixes it.
> I think the best way forward is that my series relies on
> that series as a foundation then, and writes correct tests based
> on the file contents at that version.
>
> ---8<---
> diff --git a/t/t7814-grep-recurse-submodules.sh
> b/t/t7814-grep-recurse-submodules.sh
> index 14eeb54b4b..ce9fbbc1f6 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -36,18 +36,18 @@ test_expect_success 'grep correctly finds patterns
> in a submodule' '
>  test_expect_success 'grep finds patterns in a submodule via config' '
> test_config submodule.recurse true &&
> # expect from previous test
> -   git grep -e "bar" >actual &&
> +   git grep -e3 >actual &&
> test_cmp expect actual
>  '
>
>  test_expect_success 'grep --no-recurse-submodules overrides config' '
> test_config submodule.recurse true &&
> cat >expect <<-\EOF &&
> -   a:foobar
> -   b/b:bar
> +   a:(1|2)d(3|4)
> +   b/b:(3|4)
> EOF
>
> -   git grep -e "bar" --no-recurse-submodules >actual &&
> +   git grep -e4 --no-recurse-submodules >actual &&

The rest of my changed just did:

foobar -> (1|2)d(3|4)
foo-> (1|2)
bar-> (3|4)

While this works might want to do e.g. `-e "(3|4)"` here like the
rest. This works, but probably confusing going forward when it's the
only exception.

> test_cmp expect actual
>  '
> ---8<---
>
> Thanks,
> Stefan

Re: [PATCH v2 25/25] cache_ref_iterator_begin(): avoid priming unneeded directories

2017-05-23 Thread Jeff King

On Mon, May 22, 2017 at 04:17:55PM +0200, Michael Haggerty wrote:

> So:
> 
> * Move the responsibility for doing the prefix check directly to
>   `cache_ref_iterator`. This means that `cache_ref_iterator_begin()`
>   never has to wrap its return value in a `prefix_ref_iterator`.
> 
> * Teach `cache_ref_iterator_begin()` (and `prime_ref_dir()`) to be
>   stricter about what they iterate over and what directories they
>   prime.
> 
> * Teach `cache_ref_iterator` to keep track of whether the current
>   `cache_ref_iterator_level` is fully within the prefix. If so, skip
>   the prefix checks entirely.

As promised, I came back to this one with a more careful eye. These
changes all make sense to me, and the implementation matches.

My only question is:

> @@ -511,9 +582,12 @@ struct ref_iterator *cache_ref_iterator_begin(struct 
> ref_cache *cache,
>   level->index = -1;
>   level->dir = dir;
>  
> - if (prefix && *prefix)
> - ref_iterator = prefix_ref_iterator_begin(ref_iterator,
> -  prefix, 0);
> + if (prefix && *prefix) {
> + iter->prefix = xstrdup(prefix);
> + level->prefix_state = PREFIX_WITHIN_DIR;
> + } else {
> + level->prefix_state = PREFIX_CONTAINS_DIR;
> + }

Who frees the prefix? Does this need:

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index fda3942db..a3efc5c51 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -542,6 +542,7 @@ static int cache_ref_iterator_abort(struct ref_iterator 
*ref_iterator)
struct cache_ref_iterator *iter =
(struct cache_ref_iterator *)ref_iterator;
 
+   free(iter->prefix);
free(iter->levels);
base_ref_iterator_free(ref_iterator);
return ITER_DONE;

-Peff

Re: What's cooking in git.git (May 2017, #07; Tue, 23)

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 12:08 PM, Stefan Beller  wrote:
> On Tue, May 23, 2017 at 1:08 AM, Junio C Hamano  wrote:
>
>> * sb/submodule-blanket-recursive (2017-05-23) 6 commits
>>  . builtin/push.c: respect 'submodule.recurse' option
>>  . builtin/grep.c: respect 'submodule.recurse' option
>>  . builtin/fetch.c: respect 'submodule.recurse' option
>>  . Introduce submodule.recurse option for worktree manipulators
>>  . submodule test invocation: only pass additional arguments
>>  . submodule.c: add has_submodules to check if we have any submodules
>>  (this branch uses sb/reset-recurse-submodules.)
>>
>>  A new configuration variable "submodule.recurse" can be set to true
>>  to force various commands run at the top-level superproject to
>>  behave as if they were invoked with the "--recurse-submodules"
>>  option.
>>
>>  Seems to break t7814 when merged to 'pu'.
>
> I will investigate! (It passes on its own, so I guess it is some
> interference with a recent grep series)

And the winner is 5d52a30eda (grep: amend submodule recursion
test for regex engine testing, 2017-05-20, by Ævar)

The tests added by grep rely on the old content of
test 2 'grep correctly finds patterns in a submodule'.

The (whitespace broken) diff below fixes it.
I think the best way forward is that my series relies on
that series as a foundation then, and writes correct tests based
on the file contents at that version.

---8<---
diff --git a/t/t7814-grep-recurse-submodules.sh
b/t/t7814-grep-recurse-submodules.sh
index 14eeb54b4b..ce9fbbc1f6 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -36,18 +36,18 @@ test_expect_success 'grep correctly finds patterns
in a submodule' '
 test_expect_success 'grep finds patterns in a submodule via config' '
test_config submodule.recurse true &&
# expect from previous test
-   git grep -e "bar" >actual &&
+   git grep -e3 >actual &&
test_cmp expect actual
 '

 test_expect_success 'grep --no-recurse-submodules overrides config' '
test_config submodule.recurse true &&
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
EOF

-   git grep -e "bar" --no-recurse-submodules >actual &&
+   git grep -e4 --no-recurse-submodules >actual &&
test_cmp expect actual
 '

---8<---

Thanks,
Stefan

Re: [GSoC][PATCH v4 2/2] submodule: port subcommand foreach from shell to C

2017-05-23 Thread Brandon Williams

On 05/21, Prathamesh Chavan wrote:
> This aims to make git-submodule foreach a builtin. This is the very
> first step taken in this direction. Hence, 'foreach' is ported to
> submodule--helper, and submodule--helper is called from git-submodule.sh.
> The code is split up to have one function to obtain all the list of
> submodules. This function acts as the front-end of git-submodule foreach
> subcommand. It calls the function for_each_submodule_list, which basically
> loops through the list and calls function fn, which in this case is
> runcommand_in_submodule. This third function is a calling function that
> takes care of running the command in that submodule, and recursively
> perform the same when --recursive is flagged.
> 
> The first function module_foreach first parses the options present in
> argv, and then with the help of module_list_compute, generates the list of
> submodules present in the current working tree.
> 
> The second function for_each_submodule_list traverses through the
> list, and calls function fn (which in case of submodule subcommand
> foreach is runcommand_in_submodule) is called for each entry.
> 
> The third function runcommand_in_submodule, generates a submodule struct sub
> for $name, value and then later prepends name=sub->name; and other
> value assignment to the env argv_array structure of a child_process.
> Also the  of submodule-foreach is push to args argv_array
> structure and finally, using run_command the commands are executed
> using a shell.
> 
> The third function also takes care of the recursive flag, by creating
> a separate child_process structure and prepending "--super-prefix 
> displaypath",
> to the args argv_array structure. Other required arguments and the
> input  of submodule-foreach is also appended to this argv_array.
> 
> The commit 1c4fb136db (submodule foreach: skip eval for more than one
> argument, 2013-09-27), which explains that why for the case when argc>1,
> we do not use eval. But since in this patch, we are calling the
> command in a separate shell itself for all values of argc, this case
> is not considered separately.
> 
> Both env variable $path and $sm_path were added since both are used in
> tests in t7407.
> 
> Helped-by: Brandon Williams 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> This series of patch is based on gitster/jk/bug-to-abort for untilizing its
> BUG() macro.
> 
> In this new version of patch, a new function
> get_submodule_displaypath is introduced, which is the same one
> as that in the patch series for porting of submodule subcommand
> status. I had to again introduce this in this patch as well as
> I am working on two separate branches for parting of each function.
> Also, the function for_each_submodule_list repeats for the same
> reason.
> 
> I have pushed this work on Github at:
> https://github.com/pratham-pc/git/commits/foreach
> 
> Its build report is available at:
> https://travis-ci.org/pratham-pc/git/builds/
> Branch: foreach
> Build #67
> 
> I have also made some changes in git-submodule.sh for correcting
> the $path variable. And hence made the corresponding changes in
> the new test introduced in t7407-submodule-foreach as well.
> I have push this work at:
> https://github.com/pratham-pc/git/commits/foreach-bug-fixed
> 
> Its build report is available at:
> https://travis-ci.org/pratham-pc/git/builds/
> Branch: foreach-bug-fixed
> Build #66
> 
>  builtin/submodule--helper.c | 142 
> 
>  git-submodule.sh|  39 +---
>  2 files changed, 143 insertions(+), 38 deletions(-)
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 566a5b6a6..4e19beaff 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -13,6 +13,8 @@
>  #include "refs.h"
>  #include "connect.h"
>  
> +typedef void (*submodule_list_func_t)(const struct cache_entry *list_item, 
> void *cb_data);
> +
>  static char *get_default_remote(void)
>  {
>   char *dest = NULL, *ret;
> @@ -219,6 +221,23 @@ static int resolve_relative_url_test(int argc, const 
> char **argv, const char *pr
>   return 0;
>  }
>  
> +static char *get_submodule_displaypath(const char *path, const char *prefix)
> +{
> + const char *super_prefix = get_super_prefix();
> +
> + if (prefix && super_prefix) {
> + BUG("cannot have prefix '%s' and superprefix '%s'",
> + prefix, super_prefix);
> + } else if (prefix) {
> + struct strbuf sb = STRBUF_INIT;
> + return xstrdup(relative_path(path, prefix, ));

You have a potential memory leak here, you need to release the strbuf
before returning.

> + } else if (super_prefix) {
> + return xstrfmt("%s/%s", super_prefix, path);
> + } else {
> + return xstrdup(path);
> + }
>

[PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Samuel Lijin

There is an implicit assumption that a directory containing only
untracked and ignored paths should itself be considered untracked. This
makes sense in use cases where we're asking if a directory should be
added to the git database, but not when we're asking if a directory can
be safely removed from the working tree; as a result, clean -d would
assume that an "untracked" directory containing ignored paths could be
deleted, even though doing so would also remove the ignored paths.

To get around this, we teach clean -d to collect ignored paths and skip
an untracked directory if it contained an ignored path, instead just
removing the untracked contents thereof. To achieve this, cmd_clean()
has to collect all untracked contents of untracked directories, in
addition to all ignored paths, to determine which untracked dirs must be
skipped (because they contain ignored paths) and which ones should *not*
be skipped.

For this purpose, correct_untracked_entries() is introduced to prune a
given dir_struct of untracked entries containing ignored paths and those
untracked entries encompassed by the untracked entries which are not
pruned away.

A memory leak is also fixed in cmd_clean().

This also fixes the known breakage in t7300, since clean -d now skips
untracked directories containing ignored paths.

Signed-off-by: Samuel Lijin 
---
 builtin/clean.c  | 42 ++
 t/t7300-clean.sh |  2 +-
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index d861f836a..937eb17b6 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -857,6 +857,38 @@ static void interactive_main_loop(void)
}
 }
 
+static void correct_untracked_entries(struct dir_struct *dir)
+{
+   int src, dst, ign;
+
+   for (src = dst = ign = 0; src < dir->nr; src++) {
+   /* skip paths in ignored[] that cannot be inside entries[src] */
+   while (ign < dir->ignored_nr &&
+  0 <= cmp_dir_entry(>entries[src], 
>ignored[ign]))
+   ign++;
+
+   if (ign < dir->ignored_nr &&
+   check_dir_entry_contains(dir->entries[src], 
dir->ignored[ign])) {
+   /* entries[src] contains an ignored path, so we drop it 
*/
+   free(dir->entries[src]);
+   } else {
+   struct dir_entry *ent = dir->entries[src++];
+
+   /* entries[src] does not contain an ignored path, so we 
keep it */
+   dir->entries[dst++] = ent;
+
+   /* then discard paths in entries[] contained inside 
entries[src] */
+   while (src < dir->nr &&
+  check_dir_entry_contains(ent, dir->entries[src]))
+   free(dir->entries[src++]);
+
+   /* compensate for the outer loop's loop control */
+   src--;
+   }
+   }
+   dir->nr = dst;
+}
+
 int cmd_clean(int argc, const char **argv, const char *prefix)
 {
int i, res;
@@ -916,6 +948,9 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
 
dir.flags |= DIR_SHOW_OTHER_DIRECTORIES;
 
+   if (remove_directories)
+   dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
+
if (read_cache() < 0)
die(_("index file corrupt"));
 
@@ -931,6 +966,7 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
   prefix, argv);
 
fill_directory(, );
+   correct_untracked_entries();
 
for (i = 0; i < dir.nr; i++) {
struct dir_entry *ent = dir.entries[i];
@@ -958,6 +994,12 @@ int cmd_clean(int argc, const char **argv, const char 
*prefix)
string_list_append(_list, rel);
}
 
+   for (i = 0; i < dir.nr; i++)
+   free(dir.entries[i]);
+
+   for (i = 0; i < dir.ignored_nr; i++)
+   free(dir.ignored[i]);
+
if (interactive && del_list.nr > 0)
interactive_main_loop();
 
diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index 3a2d709c2..7b36954d6 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -653,7 +653,7 @@ test_expect_success 'git clean -d respects pathspecs 
(pathspec is prefix of dir)
test_path_is_dir foobar
 '
 
-test_expect_failure 'git clean -d skips untracked dirs containing ignored 
files' '
+test_expect_success 'git clean -d skips untracked dirs containing ignored 
files' '
echo /foo/bar >.gitignore &&
echo ignoreme >>.gitignore &&
rm -rf foo &&
-- 
2.13.0

[PATCH v6 3/6] dir: recurse into untracked dirs for ignored files

2017-05-23 Thread Samuel Lijin

We consider directories containing only untracked and ignored files to
be themselves untracked, which in the usual case means we don't have to
search these directories. This is problematic when we want to collect
ignored files with DIR_SHOW_IGNORED_TOO, though, so we teach
read_directory_recursive() to recurse into untracked directories to find
the ignored files they contain when DIR_SHOW_IGNORED_TOO is set. This
has the side effect of also collecting all untracked files in untracked
directories as well.

Signed-off-by: Samuel Lijin 
---
 dir.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index f451bfa48..68cf6e47c 100644
--- a/dir.c
+++ b/dir.c
@@ -1784,7 +1784,10 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
dir_state = state;
 
/* recurse into subdir if instructed by treat_path */
-   if (state == path_recurse) {
+   if ((state == path_recurse) ||
+   ((state == path_untracked) &&
+(dir->flags & DIR_SHOW_IGNORED_TOO) &&
+(get_dtype(cdir.de, path.buf, path.len) == DT_DIR))) {
struct untracked_cache_dir *ud;
ud = lookup_untracked(dir->untracked, untracked,
  path.buf + baselen,
-- 
2.13.0

[PATCH v6 2/6] t7061: status --ignored should search untracked dirs

2017-05-23 Thread Samuel Lijin

Per eb8c5b87, `status --ignored` by design does not list ignored files
if they are in a directory which contains only ignored and untracked
files (which is itself considered to be untracked) without `-uall`. This
does not make sense for `--ignored`, which claims to "Show ignored files
as well."

Thus we revisit eb8c5b87 and decide that for such directories, `status
--ignored` will list the directory as untracked *and* list all ignored
files within said directory even without `-uall`.

Signed-off-by: Samuel Lijin 
---
 t/t7061-wtstatus-ignore.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index cdc0747bf..15e7592b6 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -9,9 +9,10 @@ cat >expected <<\EOF
 ?? actual
 ?? expected
 ?? untracked/
+!! untracked/ignored
 EOF
 
-test_expect_success 'status untracked directory with --ignored' '
+test_expect_failure 'status untracked directory with --ignored' '
echo "ignored" >.gitignore &&
mkdir untracked &&
: >untracked/ignored &&
@@ -20,7 +21,7 @@ test_expect_success 'status untracked directory with 
--ignored' '
test_cmp expected actual
 '
 
-test_expect_success 'same with gitignore starting with BOM' '
+test_expect_failure 'same with gitignore starting with BOM' '
printf "\357\273\277ignored\n" >.gitignore &&
mkdir -p untracked &&
: >untracked/ignored &&
-- 
2.13.0

[PATCH v6 4/6] dir: hide untracked contents of untracked dirs

2017-05-23 Thread Samuel Lijin

When we taught read_directory_recursive() to recurse into untracked
directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that
had the side effect of teaching it to collect the untracked contents of
untracked directories. It doesn't always make sense to return these,
though (we do need them for `clean -d`), so we introduce a flag
(DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory()
strips dir->entries of the untracked contents of untracked dirs.

We also introduce check_contains() to check if one dir_entry corresponds
to a path which contains the path corresponding to another dir_entry.

This also fixes known breakages in t7061, since status --ignored now
searches untracked directories for ignored files.

Signed-off-by: Samuel Lijin 
---
 Documentation/technical/api-directory-listing.txt |  6 +
 dir.c | 31 +++
 dir.h |  3 ++-
 t/t7061-wtstatus-ignore.sh|  4 +--
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/api-directory-listing.txt 
b/Documentation/technical/api-directory-listing.txt
index 7f8e78d91..6c77b4920 100644
--- a/Documentation/technical/api-directory-listing.txt
+++ b/Documentation/technical/api-directory-listing.txt
@@ -33,6 +33,12 @@ The notable options are:
Similar to `DIR_SHOW_IGNORED`, but return ignored files in `ignored[]`
in addition to untracked files in `entries[]`.
 
+`DIR_KEEP_UNTRACKED_CONTENTS`:::
+
+   Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, 
the
+   untracked contents of untracked directories are also returned in
+   `entries[]`.
+
 `DIR_COLLECT_IGNORED`:::
 
Special mode for git-add. Return ignored files in `ignored[]` and
diff --git a/dir.c b/dir.c
index 68cf6e47c..ba5eadeda 100644
--- a/dir.c
+++ b/dir.c
@@ -1850,6 +1850,14 @@ static int cmp_name(const void *p1, const void *p2)
return name_compare(e1->name, e1->len, e2->name, e2->len);
 }
 
+/* check if *out lexically strictly contains *in */
+static int check_contains(const struct dir_entry *out, const struct dir_entry 
*in)
+{
+   return (out->len < in->len) &&
+   (out->name[out->len - 1] == '/') &&
+   !memcmp(out->name, in->name, out->len);
+}
+
 static int treat_leading_path(struct dir_struct *dir,
  const char *path, int len,
  const struct pathspec *pathspec)
@@ -2065,6 +2073,29 @@ int read_directory(struct dir_struct *dir, const char 
*path,
read_directory_recursive(dir, path, len, untracked, 0, 
pathspec);
QSORT(dir->entries, dir->nr, cmp_name);
QSORT(dir->ignored, dir->ignored_nr, cmp_name);
+
+   /* 
+* if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick
+* up untracked contents of untracked dirs; by default we discard these,
+* but given DIR_KEEP_UNTRACKED_CONTENTS we do not
+*/
+   if ((dir->flags & DIR_SHOW_IGNORED_TOO) &&
+!(dir->flags & DIR_KEEP_UNTRACKED_CONTENTS)) {
+   int i, j;
+
+   /* remove from dir->entries untracked contents of untracked 
dirs */
+   for (i = j = 0; j < dir->nr; j++) {
+   if (i && check_contains(dir->entries[i - 1], 
dir->entries[j])) {
+   free(dir->entries[j]);
+   dir->entries[j] = NULL;
+   } else {
+   dir->entries[i++] = dir->entries[j];
+   }
+   }
+
+   dir->nr = i;
+   }
+
if (dir->untracked) {
static struct trace_key trace_untracked_stats = 
TRACE_KEY_INIT(UNTRACKED_STATS);
trace_printf_key(_untracked_stats,
diff --git a/dir.h b/dir.h
index bf23a470a..650e54bdf 100644
--- a/dir.h
+++ b/dir.h
@@ -151,7 +151,8 @@ struct dir_struct {
DIR_NO_GITLINKS = 1<<3,
DIR_COLLECT_IGNORED = 1<<4,
DIR_SHOW_IGNORED_TOO = 1<<5,
-   DIR_COLLECT_KILLED_ONLY = 1<<6
+   DIR_COLLECT_KILLED_ONLY = 1<<6,
+   DIR_KEEP_UNTRACKED_CONTENTS = 1<<7
} flags;
struct dir_entry **entries;
struct dir_entry **ignored;
diff --git a/t/t7061-wtstatus-ignore.sh b/t/t7061-wtstatus-ignore.sh
index 15e7592b6..fc6013ba3 100755
--- a/t/t7061-wtstatus-ignore.sh
+++ b/t/t7061-wtstatus-ignore.sh
@@ -12,7 +12,7 @@ cat >expected <<\EOF
 !! untracked/ignored
 EOF
 
-test_expect_failure 'status untracked directory with --ignored' '
+test_expect_success 'status untracked directory with --ignored' '
echo "ignored" >.gitignore &&
mkdir untracked &&
: >untracked/ignored &&
@@ -21,7 +21,7 @@ test_expect_failure 'status untracked directory with 
--ignored' '

[PATCH v6 5/6] dir: expose cmp_name() and check_contains()

2017-05-23 Thread Samuel Lijin

We want to use cmp_name() and check_contains() (which both compare
`struct dir_entry`s, the former in terms of the sort order, the latter
in terms of whether one lexically contains another) outside of dir.c,
so we have to (1) change their linkage and (2) rename them as
appropriate for the global namespace. The second is achieved by
renaming cmp_name() to cmp_dir_entry() and check_contains() to
check_dir_entry_contains().

Signed-off-by: Samuel Lijin 
---
 dir.c | 11 ++-
 dir.h |  3 +++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/dir.c b/dir.c
index ba5eadeda..aae6d00b4 100644
--- a/dir.c
+++ b/dir.c
@@ -1842,7 +1842,7 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
return dir_state;
 }
 
-static int cmp_name(const void *p1, const void *p2)
+int cmp_dir_entry(const void *p1, const void *p2)
 {
const struct dir_entry *e1 = *(const struct dir_entry **)p1;
const struct dir_entry *e2 = *(const struct dir_entry **)p2;
@@ -1851,7 +1851,7 @@ static int cmp_name(const void *p1, const void *p2)
 }
 
 /* check if *out lexically strictly contains *in */
-static int check_contains(const struct dir_entry *out, const struct dir_entry 
*in)
+int check_dir_entry_contains(const struct dir_entry *out, const struct 
dir_entry *in)
 {
return (out->len < in->len) &&
(out->name[out->len - 1] == '/') &&
@@ -2071,8 +2071,8 @@ int read_directory(struct dir_struct *dir, const char 
*path,
dir->untracked = NULL;
if (!len || treat_leading_path(dir, path, len, pathspec))
read_directory_recursive(dir, path, len, untracked, 0, 
pathspec);
-   QSORT(dir->entries, dir->nr, cmp_name);
-   QSORT(dir->ignored, dir->ignored_nr, cmp_name);
+   QSORT(dir->entries, dir->nr, cmp_dir_entry);
+   QSORT(dir->ignored, dir->ignored_nr, cmp_dir_entry);
 
/* 
 * if DIR_SHOW_IGNORED_TOO, read_directory_recursive() will also pick
@@ -2085,7 +2085,8 @@ int read_directory(struct dir_struct *dir, const char 
*path,
 
/* remove from dir->entries untracked contents of untracked 
dirs */
for (i = j = 0; j < dir->nr; j++) {
-   if (i && check_contains(dir->entries[i - 1], 
dir->entries[j])) {
+   if (i &&
+   check_dir_entry_contains(dir->entries[i - 1], 
dir->entries[j])) {
free(dir->entries[j]);
dir->entries[j] = NULL;
} else {
diff --git a/dir.h b/dir.h
index 650e54bdf..edb5fda58 100644
--- a/dir.h
+++ b/dir.h
@@ -327,6 +327,9 @@ static inline int dir_path_match(const struct dir_entry 
*ent,
  has_trailing_dir);
 }
 
+int cmp_dir_entry(const void *p1, const void *p2);
+int check_dir_entry_contains(const struct dir_entry *out, const struct 
dir_entry *in);
+
 void untracked_cache_invalidate_path(struct index_state *, const char *);
 void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
-- 
2.13.0

[PATCH v6 1/6] t7300: clean -d should skip dirs with ignored files

2017-05-23 Thread Samuel Lijin

If git sees a directory which contains only untracked and ignored
files, clean -d should not remove that directory. It was recently
discovered that this is *not* true of git clean -d, and it's possible
that this has never worked correctly; this test and its accompanying
patch series aims to fix that.

Signed-off-by: Samuel Lijin 
---
 t/t7300-clean.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
index b89fd2a6a..3a2d709c2 100755
--- a/t/t7300-clean.sh
+++ b/t/t7300-clean.sh
@@ -653,4 +653,20 @@ test_expect_success 'git clean -d respects pathspecs 
(pathspec is prefix of dir)
test_path_is_dir foobar
 '
 
+test_expect_failure 'git clean -d skips untracked dirs containing ignored 
files' '
+   echo /foo/bar >.gitignore &&
+   echo ignoreme >>.gitignore &&
+   rm -rf foo &&
+   mkdir -p foo/a/aa/aaa foo/b/bb/bbb &&
+   touch foo/bar foo/baz foo/a/aa/ignoreme foo/b/ignoreme foo/b/bb/1 
foo/b/bb/2 &&
+   git clean -df &&
+   test_path_is_dir foo &&
+   test_path_is_file foo/bar &&
+   test_path_is_missing foo/baz &&
+   test_path_is_file foo/a/aa/ignoreme &&
+   test_path_is_missing foo/a/aa/aaa &&
+   test_path_is_file foo/b/ignoreme &&
+   test_path_is_missing foo/b/bb
+'
+
 test_done
-- 
2.13.0

[PATCH v6 0/6] Fix clean -d and status --ignored

2017-05-23 Thread Samuel Lijin

Messed up on 6/6 in v5, forgot to include changes from earlier versions (karma
for not running tests before I send-email'd the patch series).

Samuel Lijin (6):
  t7300: clean -d should skip dirs with ignored files
  t7061: status --ignored should search untracked dirs
  dir: recurse into untracked dirs for ignored files
  dir: hide untracked contents of untracked dirs
  dir: expose cmp_name() and check_contains()
  clean: teach clean -d to preserve ignored paths

 Documentation/technical/api-directory-listing.txt |  6 
 builtin/clean.c   | 42 ++
 dir.c | 43 ---
 dir.h |  6 +++-
 t/t7061-wtstatus-ignore.sh|  1 +
 t/t7300-clean.sh  | 16 +
 6 files changed, 109 insertions(+), 5 deletions(-)

-- 
2.13.0

[PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.

This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).

There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwy...@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.

That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.

This change is not meant as an optimization, any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.

Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.

The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].

The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.

1. http://sljit.sourceforge.net/pcre.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index b1095362fb..12e62fcbf3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt)
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
-   o->debug = 0;
+   if (i)
+   o->debug = 0;
compile_grep_patterns(o);
err = pthread_create([i], NULL, run, o);
 
@@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!opt.fixed && opt.ignore_case)
opt.regflags |= REG_ICASE;
 
-   compile_grep_patterns();
-
/*
 * We have to find "--" in a separate pass, because its presence
 * influences how we will parse arguments that come before it.
@@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = 0;
 #endif
 
+   if (!num_threads)
+   /*
+* The compiled patterns on the main path are only
+* used when not using threading. Otherwise
+* start_threads() below calls compile_grep_patterns()
+* for each thread.
+*/
+   compile_grep_patterns();
+
 #ifndef NO_PTHREADS
if (num_threads) {
if (!(opt.name_only || opt.unmatch_name_only || opt.count)
-- 
2.13.0.303.g4ebf302169

[PATCH v2 7/7] grep: add support for PCRE v2

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease 
CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
p7820-grep-engines.sh
[...]
Test   HEAD~2HEAD~  
  HEAD


7820.3: perl grep 'how.to'  0.22(0.40+0.48)   
0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
7820.7: perl grep '^how to' 0.27(0.62+0.50)   
0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
7820.11: perl grep '[how] to'   0.33(0.92+0.47)   
0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.35(1.08+0.46)   
0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.30(0.78+0.51)   
0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:

Test   HEAD~ HEAD

---
7820.3: perl grep 'how.to'  0.23(0.41+0.47)   
0.23(0.26+0.59) +0.0%
7820.7: perl grep '^how to' 0.27(0.64+0.47)   
0.23(0.28+0.56) -14.8%
7820.11: perl grep '[how] to'   0.34(0.95+0.44)   
0.25(0.38+0.56) -26.5%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.34(1.07+0.46)   
0.24(0.52+0.49) -29.4%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.30(0.81+0.46)   
0.22(0.33+0.54) -26.7%

I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  30 +---
 configure.ac  |  77 ++-
 grep.c| 143 ++
 grep.h|  17 +++
 t/test-lib.sh |   2 +-
 5 files changed, 250 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile

[PATCH v2 4/7] grep: add support for the PCRE v1 JIT API

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD 
p7820-grep-engines.sh
Test   HEAD~ HEAD

---
7820.3: perl grep 'how.to'  0.35(1.11+0.43)   
0.23(0.42+0.46) -34.3%
7820.7: perl grep '^how to' 0.64(2.71+0.36)   
0.27(0.66+0.44) -57.8%
7820.11: perl grep '[how] to'   0.63(2.51+0.42)   
0.33(0.98+0.39) -47.6%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   1.17(5.61+0.35)   
0.34(1.08+0.46) -70.9%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.43(1.52+0.44)   
0.30(0.88+0.42) -30.2%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 34 +-
 grep.h |  6 ++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1157529115..49e9aed457 100644
--- a/grep.c
+++ b/grep.c
@@ -351,6 +351,9 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
const char *error;
int erroffset;
int options = PCRE_MULTILINE;
+#ifdef PCRE_CONFIG_JIT
+   int canjit;
+#endif
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
@@ -365,9 +368,20 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, );
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
PCRE_STUDY_JIT_COMPILE, );
if (!p->pcre1_extra_info && error)
die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+   pcre_config(PCRE_CONFIG_JIT, );
+   if (canjit == 1) {
+   p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+   if (!p->pcre1_jit_stack)
+   die("BUG: Couldn't allocate PCRE JIT stack");
+   pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
p->pcre1_jit_stack);
+   p->pcre1_jit_on = 1;
+   }
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -378,8 +392,17 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
+#ifdef PCRE_CONFIG_JIT
+   if (p->pcre1_jit_on)
+   ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+   eol - line, 0, flags, ovector,
+   ARRAY_SIZE(ovector), p->pcre1_jit_stack);
+   else
+#endif
+   /* PCRE_CONFIG_JIT !p->pcre1_jit_on else branch */
ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovector, ARRAY_SIZE(ovector));
+
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
die("pcre_exec failed with error code %d", ret);
if (ret > 0) {
@@ -394,7 +417,16 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
pcre_free(p->pcre1_regexp);
+
+#ifdef PCRE_CONFIG_JIT
+   if (p->pcre1_jit_on) {
+   pcre_free_study(p->pcre1_extra_info);
+

[PATCH v2 3/7] log: add -P as a synonym for --perl-regexp

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Make it consistent with "grep" rather than to keep it open for future
use, and to avoid the confusion of -P meaning different things for
grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/rev-list-options.txt |  1 +
 revision.c |  2 +-
 t/t4202-log.sh | 12 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a46f70c2b1..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
Consider the limiting patterns to be fixed strings (don't interpret
pattern as a regular expression).
 
+-P::
 --perl-regexp::
Consider the limiting patterns to be Perl-compatible regular
expressions.
diff --git a/revision.c b/revision.c
index 4883cdd2d0..60329da1bd 100644
--- a/revision.c
+++ b/revision.c
@@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
DIFF_OPT_SET(>diffopt, PICKAXE_IGNORE_CASE);
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-   } else if (!strcmp(arg, "--perl-regexp")) {
+   } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
} else if (!strcmp(arg, "--all-match")) {
revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index dbed3efeee..2b07d1c0c2 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -404,8 +404,20 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(1|2)" >actual.fixed.short-arg &&
git log --pretty=tformat:%s -E \
--grep="\|2" >actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   git log --pretty=tformat:%s -P \
+   --grep="[\d]\|" >actual.perl.short-arg
+   else
+   test_must_fail git log -P \
+   --grep="[\d]\|"
+   fi &&
test_cmp expect.fixed actual.fixed.short-arg &&
test_cmp expect.extended actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   test_cmp expect.perl actual.perl.short-arg
+   fi &&
 
git log --pretty=tformat:%s --fixed-strings \
--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.13.0.303.g4ebf302169

[PATCH v2 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes

2017-05-23 Thread Ævar Arnfjörð Bjarmason

On Sun, May 21, 2017 at 1:50 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Easy to review? 29 (I mean 30) patches? Are you kidding me?!
>>
>> As noted in v1 (<20170511091829.5634-1-ava...@gmail.com>;
>> https://public-inbox.org/git/20170511091829.5634-1-ava...@gmail.com/)
>> these are all doc, test, refactoring etc. changes needed by the
>> subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.
>>
>> Since Junio hasn't been picking it I'm no longer sending updates to
>> that patch series & waiting for this one to cook first.
>
> I actually do not mind a reroll that goes together with this.  The
> only reason why I skipped the earlier one was because I looked at
> the original one, and the discussion on the reroll of this 'easy to
> review' part indicated that it will be rerolled, before I got to
> look at these upper layer patches.

Great, now that the base of this is migrating to next, here's the
second part of this.

For v1 see <20170513234535.12749-1-ava...@gmail.com>
(https://public-inbox.org/git/20170513234535.12749-1-ava...@gmail.com/).

The only changes to the content are better if/else branching around
conditional macros (but no functional changes) in the PCRE v1 JIT API
patch in response to a comment by Simon Ruderich.

The only other changes are trivial updates to the commit messages to
account for t/perf changes made in the series this builds on.

Ævar Arnfjörð Bjarmason (7):
  grep: don't redundantly compile throwaway patterns under threading
  grep: skip pthreads overhead when using one thread
  log: add -P as a synonym for --perl-regexp
  grep: add support for the PCRE v1 JIT API
  grep: un-break building with PCRE < 8.32
  grep: un-break building with PCRE < 8.20
  grep: add support for PCRE v2

 Documentation/rev-list-options.txt |   1 +
 Makefile   |  30 +--
 builtin/grep.c |  16 +++-
 configure.ac   |  77 +---
 grep.c | 177 -
 grep.h |  31 +++
 revision.c |   2 +-
 t/t4202-log.sh |  12 +++
 t/test-lib.sh  |   2 +-
 9 files changed, 324 insertions(+), 24 deletions(-)

-- 
2.13.0.303.g4ebf302169

[PATCH v2 6/7] grep: un-break building with PCRE < 8.20

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.20.

The 8.20 release was the first release to have JIT & pcre_jit_stack in
the headers, so a mock type needs to be provided for it on those
releases.

Now git should compile with all PCRE versions that it supported before
my JIT change.

I've tested it as far back as version 7.5 released on 2008-01-10, once
I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1,
and I couldn't be bothered to anything older than 7.5 as I'm confident
that if the build breaks on those older versions it's not because of
my JIT change.

See the "un-break" change in this series ("grep: un-break building
with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the
main PCRE JIT commit.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grep.h b/grep.h
index 73ef0ef8ec..b7b9d487b0 100644
--- a/grep.h
+++ b/grep.h
@@ -11,6 +11,9 @@
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
+#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
+typedef int pcre_jit_stack;
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-- 
2.13.0.303.g4ebf302169

[PATCH v2 2/7] grep: skip pthreads overhead when using one thread

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.

The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads= option and
grep.threads configuration", 2015-12-15).

That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12e62fcbf3..bd008cb100 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = GREP_NUM_THREADS_DEFAULT;
else if (num_threads < 0)
die(_("invalid number of threads specified (%d)"), num_threads);
+   if (num_threads == 1)
+   num_threads = 0;
 #else
if (num_threads)
warning(_("no threads support, ignoring --threads"));
-- 
2.13.0.303.g4ebf302169

[PATCH v2 5/7] grep: un-break building with PCRE < 8.32

2017-05-23 Thread Ævar Arnfjörð Bjarmason

Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.

The reason this change isn't part of the initial change PCRE JIT
support is because possibly slightly annoying someone who's bisecting
with an ancient PCRE is worth it to have a cleaner history showing
which parts of the implementation are only used for ancient PCRE
versions. This also makes it easier to revert this change if we ever
decide to stop supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 8 
 grep.h | 5 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 49e9aed457..3c0c30f033 100644
--- a/grep.c
+++ b/grep.c
@@ -351,7 +351,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
const char *error;
int erroffset;
int options = PCRE_MULTILINE;
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
int canjit;
 #endif
 
@@ -372,7 +372,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_extra_info && error)
die("%s", error);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
pcre_config(PCRE_CONFIG_JIT, );
if (canjit == 1) {
p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
@@ -392,7 +392,7 @@ static int pcre1match(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
if (p->pcre1_jit_on)
ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
eol - line, 0, flags, ovector,
@@ -418,7 +418,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
 {
pcre_free(p->pcre1_regexp);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_CAN_DO_MODERN_JIT
if (p->pcre1_jit_on) {
pcre_free_study(p->pcre1_extra_info);
pcre_jit_stack_free(p->pcre1_jit_stack);
diff --git a/grep.h b/grep.h
index 14f47189f9..73ef0ef8ec 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,11 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include 
+#ifdef PCRE_CONFIG_JIT
+#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+#define GIT_PCRE1_CAN_DO_MODERN_JIT
+#endif
+#endif
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
-- 
2.13.0.303.g4ebf302169

Re: [PATCH v5 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Samuel Lijin

On Tue, May 23, 2017 at 8:52 AM, Junio C Hamano  wrote:
> Samuel Lijin  writes:
>
>> @@ -931,6 +961,7 @@ int cmd_clean(int argc, const char **argv, const char 
>> *prefix)
>>  prefix, argv);
>>
>>   fill_directory(, );
>> + correct_untracked_entries();
>>
>>   for (i = 0; i < dir.nr; i++) {
>>   struct dir_entry *ent = dir.entries[i];
>
> You used to set SHOW_IGNORED_TOO and KEEP_UNTRACKED_CONTENTS in
> dir.flags early in the function, and then free dir.entries[] and
> dir.ignored[] after we are done.  They are gone in this version.
>
> Intended?

Embarrassingly, no. Rerolling.

>> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
>> index 3a2d709c2..7b36954d6 100755
>> --- a/t/t7300-clean.sh
>> +++ b/t/t7300-clean.sh
>> @@ -653,7 +653,7 @@ test_expect_success 'git clean -d respects pathspecs 
>> (pathspec is prefix of dir)
>>   test_path_is_dir foobar
>>  '
>>
>> -test_expect_failure 'git clean -d skips untracked dirs containing ignored 
>> files' '
>> +test_expect_success 'git clean -d skips untracked dirs containing ignored 
>> files' '
>>   echo /foo/bar >.gitignore &&
>>   echo ignoreme >>.gitignore &&
>>   rm -rf foo &&

Re: [GSoC][PATCH v4 2/2] submodule: port subcommand foreach from shell to C

2017-05-23 Thread Brandon Williams

On 05/22, Stefan Beller wrote:
> On Sun, May 21, 2017 at 5:58 AM, Prathamesh Chavan  wrote:
> 
> > I have also made some changes in git-submodule.sh for correcting
> > the $path variable. And hence made the corresponding changes in
> > the new test introduced in t7407-submodule-foreach as well.
> > I have push this work at:
> > https://github.com/pratham-pc/git/commits/foreach-bug-fixed
> 
> This one seems to pass the test suite by having the bug fixed.
> (The patches posted here seems to be
> https://github.com/pratham-pc/git/commits/foreach
> which does not pass tests? These two series seem to only differ in
> the bug fix commit, which I think is a good idea to include, as then we
> have a bug fixed and the tests pass.)
> 
> > +static void for_each_submodule_list(const struct module_list list, 
> > submodule_list_func_t fn, void *cb_data)
> ..
> > +   return;
> 
> no need for an explicit return in a void function.
> 
> > +struct cb_foreach {
> > +   int argc;
> > +   const char **argv;
> > +   const char *prefix;
> > +   unsigned int quiet: 1;
> > +   unsigned int recursive: 1;
> > +};
> > +#define CB_FOREACH_INIT { 0, NULL, 0, 0 }
> 
> This static initializer doesn't quite match the struct,
> (I would expect two NULLs as we have two const char pointers).

If we ever move to a new version of C, these initializers would be much
more readable as we could assign values to the fields themselves.  But
that is unrelated to this change.

> 
> > +
> > +   info.argc = argc;
> > +   info.argv = argv;
> > +   info.prefix = prefix;
> > +   info.quiet = !!quiet;
> > +   info.recursive = !!recursive;
> 
> as you assign all fields of the struct yourself, you could also omit the
> static initialization via _INIT above.
> 
> 
> Apart from these two minor nits the code looks good to me.
> However we'd really want to have the bug fix patch as well.
> (At the time of submission of a patch we should not be aware
> of any tests failing, which we are without said bug fix patch)
> 
> Thanks,
> Stefan

-- 
Brandon Williams

Re: What's cooking in git.git (May 2017, #07; Tue, 23)

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 1:08 AM, Junio C Hamano  wrote:

> * sb/submodule-blanket-recursive (2017-05-23) 6 commits
>  . builtin/push.c: respect 'submodule.recurse' option
>  . builtin/grep.c: respect 'submodule.recurse' option
>  . builtin/fetch.c: respect 'submodule.recurse' option
>  . Introduce submodule.recurse option for worktree manipulators
>  . submodule test invocation: only pass additional arguments
>  . submodule.c: add has_submodules to check if we have any submodules
>  (this branch uses sb/reset-recurse-submodules.)
>
>  A new configuration variable "submodule.recurse" can be set to true
>  to force various commands run at the top-level superproject to
>  behave as if they were invoked with the "--recurse-submodules"
>  option.
>
>  Seems to break t7814 when merged to 'pu'.

I will investigate! (It passes on its own, so I guess it is some
interference with a recent grep series)


> * sb/diff-color-move (2017-05-23) 17 commits
>  . diff.c: color moved lines differently
>  . diff: buffer all output if asked to
>  . diff.c: emit_line includes whitespace highlighting
>  . diff.c: convert diff_summary to use emit_line_*
>  . diff.c: convert diff_flush to use emit_line_*
>  . diff.c: convert word diffing to use emit_line_*
>  . diff.c: convert show_stats to use emit_line_*
>  . diff.c: convert emit_binary_diff_body to use emit_line_*
>  . submodule.c: convert show_submodule_summary to use emit_line_fmt
>  . diff.c: convert emit_rewrite_lines to use emit_line_*
>  . diff.c: convert emit_rewrite_diff to use emit_line_*
>  . diff.c: convert builtin_diff to use emit_line_*
>  . diff.c: convert fn_out_consume to use emit_line
>  . diff: introduce more flexible emit function
>  . diff.c: factor out diff_flush_patch_all_file_pairs
>  . diff: move line ending check into emit_hunk_header
>  . diff: readability fix
>
>  "git diff" has been taught to optionally paint new lines that are
>  the same as deleted lines elsewhere differently from genuinely new
>  lines.
>
>  Seems to break t4060 when merged to 'next'.

It breaks own its own, but when merged to next it breaks, too. :(

The reason for this is the submodule color thing that I added
last minute as manual inspection of submodule diffs seemed
odd to me.

It turns out submodule diffs were never colored appropriately,
so I'll resend with this interdiff (that let's test pass again),
once the discussion settles:

diff --git a/submodule.c b/submodule.c
index 428c996c97..19c63197fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -550,8 +550,6 @@ void show_submodule_inline_diff(struct
diff_options *o, const char *path,

/* TODO: other options may need to be passed here. */
argv_array_push(, "diff");
-   if (o->use_color)
-   argv_array_push(, "--color=always");
argv_array_pushf(, "--line-prefix=%s", diff_line_prefix(o));
if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
argv_array_pushf(, "--src-prefix=%s%s/",

Re: [GSoC][PATCH v4 1/2] t7407: test "submodule foreach --recursive" from subdirectory added

2017-05-23 Thread Brandon Williams

On 05/21, Prathamesh Chavan wrote:
> Additional test cases added to the submodule-foreach test suite
> to check the submodule foreach --recursive behavior from a
> subdirectory as this was missing from the test suite.
> 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
> It was observed that after porting the submodule subcommand to
> C, it passed all the test from the existing test-suite.
> But since there was some observation made, where the output of
> the orignal submodule foreach subcommand wasn't matching to that
> of the newly ported function, this test has been added.
> 
> After which, it can been seen that the patch fails in test #9
> of t7407-submodule-foreach, which is the newly added
> test to that suite. The main reason of adding this test
> was to bring the behavior of $path for the submodule
> foreach --recursive case.
> 
> The observation made was as follows:
> 
> For a project - super containing dir (not a submodule)
> and a submodule sub which contains another submodule
> subsub. When we run a command from super/dir:
> 
> git submodule foreach "echo \$path-\$sm_path"
> 
> actual results:
> Entering '../sub'
> ../sub-../sub
> Entering '../sub/subsub'
> ../subsub-../subsub
> 
> ported function's result:
> Entering '../sub'
> sub-../sub
> Entering '../sub/subsub'
> subsub-../sub/subsub
> 
> This is occurring since in cmd_foreach of git-submodule.sh
> when we use to recurse, we call cmd_foreach
> and hence the process ran in the same shell.
> Because of this, the variable $wt_prefix is set only once
> which is at the beginning of the submodule foreach execution.
> wt_prefix=$(git rev-parse --show-prefix)
> 
> And since sm_path and path are set using $wt_prefix as :
> sm_path=$(git submodule--helper relative-path "$sm_path" "$wt_prefix") &&
> path=$sm_path
> It differs with the value of displaypath as well.
> 
> This make the value of $path confusing and I also feel it
> deviates from its documentation:
> $path is the name of the submodule directory relative
> to the superproject.
> 
> But since in refactoring the code, we wish to maintain the
> code in same way, we need to pass wt_prefix on every
> recursive call, which may result in complex C code.
> Another option could be to first correct the $path value
> in git-submodule.sh and then port the updated cmd_foreach.
> 
>  t/t7407-submodule-foreach.sh | 35 ++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t7407-submodule-foreach.sh b/t/t7407-submodule-foreach.sh
> index 6ba5daf42..58a890e31 100755
> --- a/t/t7407-submodule-foreach.sh
> +++ b/t/t7407-submodule-foreach.sh
> @@ -79,7 +79,6 @@ test_expect_success 'test basic "submodule foreach" usage' '
>   ) &&
>   test_i18ncmp expect actual
>  '
> -

The removal of this line seems unrelated to the rest of this patch.  Was
this intended?

>  cat >expect <  Entering '../sub1'
>  $pwd/clone-foo1-../sub1-$sub1sha1
> @@ -197,6 +196,40 @@ test_expect_success 'test messages from "foreach 
> --recursive" from subdirectory'
>   test_i18ncmp expect actual
>  '
>  
> +sub1sha1=$(cd clone2/sub1 && git rev-parse HEAD)
> +sub2sha1=$(cd clone2/sub2 && git rev-parse HEAD)
> +sub3sha1=$(cd clone2/sub3 && git rev-parse HEAD)
> +nested1sha1=$(cd clone2/nested1 && git rev-parse HEAD)
> +nested2sha1=$(cd clone2/nested1/nested2 && git rev-parse HEAD)
> +nested3sha1=$(cd clone2/nested1/nested2/nested3 && git rev-parse HEAD)
> +submodulesha1=$(cd clone2/nested1/nested2/nested3/submodule && git rev-parse 
> HEAD)
> +
> +cat >expect < +Entering '../nested1'
> +$pwd/clone2-nested1-../nested1-$nested1sha1
> +Entering '../nested1/nested2'
> +$pwd/clone2/nested1-nested2-../nested2-$nested2sha1
> +Entering '../nested1/nested2/nested3'
> +$pwd/clone2/nested1/nested2-nested3-../nested3-$nested3sha1
> +Entering '../nested1/nested2/nested3/submodule'
> +$pwd/clone2/nested1/nested2/nested3-submodule-../submodule-$submodulesha1
> +Entering '../sub1'
> +$pwd/clone2-foo1-../sub1-$sub1sha1
> +Entering '../sub2'
> +$pwd/clone2-foo2-../sub2-$sub2sha1
> +Entering '../sub3'
> +$pwd/clone2-foo3-../sub3-$sub3sha1
> +EOF
> +
> +test_expect_success 'test "submodule foreach --recursive" from subdirectory' 
> '
> + (
> + cd clone2 &&
> + cd untracked &&
> + git submodule foreach --recursive "echo 
> \$toplevel-\$name-\$sm_path-\$sha1" >../../actual
> + ) &&
> + test_i18ncmp expect actual
> +'
> +
>  cat > expect <  nested1-nested1
>  nested2-nested2
> -- 
> 2.11.0
> 

-- 
Brandon Williams

Re: [PATCHv2 1/6] submodule.c: add has_submodules to check if we have any submodules

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 11:40 AM, Brandon Williams  wrote:
> It doesn't look like any patches actually use this helper, is this
> intended?

It was needed for
https://public-inbox.org/git/20170411194616.4963-1-sbel...@google.com/
which we do not have in this series any more. Will drop this patch.

Re: [PATCHv2 1/6] submodule.c: add has_submodules to check if we have any submodules

2017-05-23 Thread Brandon Williams

On 05/22, Stefan Beller wrote:
> When submodules are involved, it often slows down the process, as most
> submodule related handling is either done via a child process or by
> iterating over the index finding all gitlinks.
> 
> For most commands that may interact with submodules, we need have a
> quick check if we do have any submodules at all, such that we can
> be fast in the case when no submodules are in use.  For this quick
> check introduce a function that checks with different heuristics if
> we do have submodules around, checking if
> * anything related to submodules is configured,
> * absorbed git dirs for submodules are present,
> * the '.gitmodules' file exists
> * gitlinks are recorded in the index.
> 
> Each heuristic has advantages and disadvantages.
> For example in a later patch, when we first use this function in
> git-clone, we'll just check for the existence of the '.gitmodules'
> file, because at the time of running the clone command there will
> be no absorbed git dirs or submodule configuration around.
> 
> Checking for any configuration related to submodules would be useful
> in a later stage (after cloning) to see if the submodules are actually
> in use.
> 
> Checking for absorbed git directories is good to see if the user has
> actually cloned submodules already (i.e. not just initialized them by
> configuring them).
> 
> The heuristic for checking the configuration requires this patch
> to have have a global state, whether the submodule config has already
> been read, and if there were any submodule related keys. Make
> 'submodule_config' private to the submodule code, and introduce
> 'load_submodule_config' that will take care of this global state.

It doesn't look like any patches actually use this helper, is this
intended?

> 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/checkout.c  |  2 +-
>  builtin/fetch.c |  3 +-
>  builtin/read-tree.c |  3 +-
>  builtin/reset.c |  3 +-
>  builtin/submodule--helper.c | 10 ++
>  submodule.c | 80 
> -
>  submodule.h |  8 -
>  unpack-trees.c  |  3 +-
>  8 files changed, 79 insertions(+), 33 deletions(-)
> 
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index bfa5419f33..2787b343b1 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -1215,7 +1215,7 @@ int cmd_checkout(int argc, const char **argv, const 
> char *prefix)
>   }
>  
>   if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
> - git_config(submodule_config, NULL);
> + load_submodule_config();
>   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
>   
> set_config_update_recurse_submodules(recurse_submodules);
>   }
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 4ef7a08afc..4b5f172623 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1343,8 +1343,7 @@ int cmd_fetch(int argc, const char **argv, const char 
> *prefix)
>   int arg = 
> parse_fetch_recurse_submodules_arg("--recurse-submodules-default", 
> recurse_submodules_default);
>   set_config_fetch_recurse_submodules(arg);
>   }
> - gitmodules_config();
> - git_config(submodule_config, NULL);
> + load_submodule_config();
>   }
>  
>   if (all) {
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 23e212ee8c..2f7f085b82 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -176,8 +176,7 @@ int cmd_read_tree(int argc, const char **argv, const char 
> *unused_prefix)
>   hold_locked_index(_file, LOCK_DIE_ON_ERROR);
>  
>   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
> - gitmodules_config();
> - git_config(submodule_config, NULL);
> + load_submodule_config();
>   set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
>   }
>  
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 5ce27fcaed..319d8c1201 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -320,8 +320,7 @@ int cmd_reset(int argc, const char **argv, const char 
> *prefix)
>   parse_args(, argv, prefix, patch_mode, );
>  
>   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
> - gitmodules_config();
> - git_config(submodule_config, NULL);
> + load_submodule_config();
>   set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
>   }
>  
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 85aafe46a4..92e13abe2d 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1013,9 +1013,7 @@ static int update_clone(int argc, const char **argv, 
> const char *prefix)
>   if (pathspec.nr)
>   suc.warn_if_uninitialized = 1;
>  
> - /* Overlay the parsed

Re: [PATCH 0/2] Fix warnings on access of a remote with Windows paths

2017-05-23 Thread Johannes Sixt


Am 23.05.2017 um 12:53 schrieb Johannes Schindelin:

Hi Hannes (& Junio, see below),

On Mon, 22 May 2017, Johannes Sixt wrote:


Am 22.05.2017 um 13:59 schrieb Johannes Schindelin:

On Sat, 20 May 2017, Johannes Sixt wrote:

This small series fixes these warnings on Windows:

C:\Temp\gittest>git fetch C:\Temp\gittest
warning: unable to access '.git/remotes/C:\Temp\gittest': Invalid argument
warning: unable to access '.git/branches/C:\Temp\gittest': Invalid
warning: argument
  From C:\Temp\gittest
   * branchHEAD   -> FETCH_HEAD

The fix is in the second patch; the first patch is a
preparation that allows to write the fix in my preferred style.


Thank you!

Maybe you want to accompany these patches with a simple test case that
uses e.g. ls-remote on $(pwd | tr / )?


Actually, no, I don't want to. It would have to be protected by MINGW, and I
don't want to burden us (and here I mean Windows folks) with a check for a
cosmetic deficiency. (Shell scripts, slow forks, yadda, yadda...)


Actually, yes, I want to.

Yes, it would have to be protected by MINGW, and yes, it would put a
burden on us, but also yes: it would put an even higher burden on me to
check this manually before releasing Git for Windows, or even worse: this
regression would be the kind of bug that triggers many bug reports,
addressing which would take a lot more time than writing this test case
and executing it as part of our test suite.


Fair enough. The patch looks good. I'll be able to test no earlier than 
Monday, though.




So here goes:

-- snipsnap --
From: Johannes Schindelin 
Date: Tue, 23 May 2017 12:42:13 +0200
Subject: [PATCH] mingw: verify that paths are not mistaken for remote nicknames

This added test case simply verifies that users will not be bothered
with bogus complaints à la

warning: unable to access '.git/remotes/D:\repo': Invalid argument

when fetching from a Windows path (in this case, D:\repo).

Signed-off-by: Johannes Schindelin 
---
  t/t5580-clone-push-unc.sh | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/t5580-clone-push-unc.sh b/t/t5580-clone-push-unc.sh
index b195f71ea98..93ce99ba3c6 100755
--- a/t/t5580-clone-push-unc.sh
+++ b/t/t5580-clone-push-unc.sh
@@ -1,13 +1,19 @@
  #!/bin/sh
  
-test_description='various UNC path tests (Windows-only)'

+test_description='various Windows-only path tests'
  . ./test-lib.sh
  
  if ! test_have_prereq MINGW; then

-   skip_all='skipping UNC path tests, requires Windows'
+   skip_all='skipping Windows-only path tests'
test_done
  fi
  
+test_expect_success 'remote nick cannot contain backslashes' '

+   BACKSLASHED="$(pwd | tr / )" &&
+   git ls-remote "$BACKSLASHED" >out 2>err &&
+   ! grep "unable to access" err
+'
+
  UNCPATH="$(pwd)"
  case "$UNCPATH" in
  [A-Z]:*)

Re: [PATCHv2 2/6] submodule test invocation: only pass additional arguments

2017-05-23 Thread Stefan Beller

On Mon, May 22, 2017 at 11:26 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
>> index e8f70b806f..2672f104cf 100755
>> --- a/t/t2013-checkout-submodule.sh
>> +++ b/t/t2013-checkout-submodule.sh
>> @@ -65,9 +65,9 @@ test_expect_success '"checkout " honors 
>> submodule.*.ignore from .git/
>>
>>  KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
>>  KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED=1
>> -test_submodule_switch_recursing "git checkout --recurse-submodules"
>> +test_submodule_switch_recursing "checkout"
>>
>> -test_submodule_forced_switch_recursing "git checkout -f 
>> --recurse-submodules"
>> +test_submodule_forced_switch_recursing "checkout -f"
>>
>>  test_submodule_switch "git checkout"
>
> Doesn't the above look crazy to you?

Oh well. The commit message doesn't explain why the craziness is
required here (really!).

submodule test invocation: only pass additional arguments

In a later patch we want to introduce a config option to trigger
the submodule recursing by default. As this option should be
available and uniform across all commands that deal with submodules
we'd want to test for this option in the submodule update library.

So instead of calling the whole test set again for
"git -c submodule.recurse foo" instead of
"git foo --recurse-submodules", we'd only want to introduce one
basic test that tests if the option is recognized and respected
to not overload the test suite.

>
> It is hostile to other people (and those who need to make merges)
> who have to work with test_submodule_switch_recursing that older one
> used to take the full command but its definition suddenly changes so
> that the caller now must omit the leading "git".

I am not aware of other people (or other series in flight by myself) that use
one of the switches currently.

>  Even worse,
> another helper with a similar-sounding name, test_submodule_switch,
> still must be called with the leading "git".

Oh, yeah that is a real issue. I will migrate all of them.

>
> The same comment applies to the one we can see below.

An alternative would be to come up with a slightly different name
to ensure we do not have issues with other series in flight. The function
name is already pretty long, so encoding even more information in it
may be not a good idea. But the argument is shorter, so maybe:

- test_submodule_switch_recursing "git reset --hard --recurse-submodules"
+ test_submodule_switch_recursing_args_only  "reset --hard"

Thanks,
Stefan

Re: [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt

2017-05-23 Thread Stefan Beller

On Mon, May 22, 2017 at 10:59 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/submodule.c b/submodule.c
>> index d3299e29c0..428c996c97 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> ...
>> @@ -547,15 +543,16 @@ void show_submodule_inline_diff(FILE *f, const char 
>> *path,
>>   if (right)
>>   new = two;
>>
>> - fflush(f);
>>   cp.git_cmd = 1;
>>   cp.dir = path;
>> - cp.out = dup(fileno(f));
>> + cp.out = -1;
>>   cp.no_stdin = 1;
>>
>>   /* TODO: other options may need to be passed here. */
>>   argv_array_push(, "diff");
>> - argv_array_pushf(, "--line-prefix=%s", line_prefix);
>> + if (o->use_color)
>> + argv_array_push(, "--color=always");
>> + argv_array_pushf(, "--line-prefix=%s", diff_line_prefix(o));
>
> This makes me wonder if we also need to explicitly decline coloring
> when o->use_color is not set.  After all, even if configuration in
> the submodule's config file says diff.color=never, we will enable
> the color with this codepath (because the user explicitly asked to
> use the color in the top-level), so we should do the same for the
> opposite case where the config says yes/auto if the user said no at
> the top-level, no?

That makes sense, so instead we'd do

 argv_array_push(, "--color=%s", o->use_color ?
"always" : "never");

to override the submodule config in all cases.

However that changes from current behavior.

You could imagine that you want to see the superproject colored
and the submodule non-colored to easily spot that it is a submodule change.
Currently this can be made to work via setting color=never in the
submodule and then run the diff from the superproject.

What we really want here is a switch that influences the automatic detection
and say: pretend "dup(fileno(f));" was your stdout, now run your auto-detection
to decide for yourself.

I am not sure if it worth the effort to fix this hypothetical situation, though.

Thanks,
Stefan

Re: BUG: The .gitignore rules can't be made to cross submodule boundaries

2017-05-23 Thread Stefan Beller

On Tue, May 23, 2017 at 2:55 AM, Ævar Arnfjörð Bjarmason
 wrote:
> On Tue, May 23, 2017 at 11:17 AM, Johannes Schindelin
>  wrote:
>> Hi Ævar,
>>
>> On Mon, 22 May 2017, Ævar Arnfjörð Bjarmason wrote:
>>
>>> When I was adding the sha1collisiondetection submodule to git.git I
>>> noticed that building git would dirty the submodule.
>>>
>>> This is because our own Makefile adds .depend/ directories. I hacked
>>> around it by just getting the upstream project accept carrying an ignore
>>> rule for that around:
>>> https://github.com/cr-marcstevens/sha1collisiondetection/commit/e8397b26
>>>
>>> A workaround for this is to have the Makefile add such a rule to
>>> .git/modules/sha1collisiondetection/info/exclude, but that's less
>>> convenient than being able to distribute it as a normal .gitignore rule.
>>>
>>> The submodule..ignore config provides an overly big hammer to
>>> solve this, it would be better if we had something like
>>> submodule..gitignore=. Then we could have e.g.
>>> .gitignore.sha1collisiondetection which would be added to whatever rules
>>> the repo's own .gitignore provides.
>>
>> While I have nothing but the utmost respect for Stefan and Brandon for
>> trying to improve submodules, maybe it would be a wiser idea to imitate
>> the same strategy with sha1dc as we use with git-gui and gitk, i.e.
>> perform a subtree merge instead of adding it as a submodule. It's not like
>> 570kB will kill us.

Actually that is a very valid bug report outside that series for the
behavior of submodules.

In a world where you use a submodule to track say a third party
library, the current behavior of .gitignore applying to each repo makes
sense.

When it is no third party, but a first party lib, then it is sensible to expect
that the building/testing infrastructure works across the whole repo set,
and the user wants just one central place to specify things, such as
ignoring certain files or applying .gitattributes.

This topic came up in various forms on the mailing list, most often for
config that ought to be applied across all repos[1].

That said I have no good idea yet how to fix this issue without introducing
the ultimate user confusion.

The conditional include of config files (by Duy as part of 2.13) seems like
an interesting approach, which we could build on top of.
We currently have a main config and a per-working-tree config, so I would
expect we'd introduce another config file that is included by all submodules
by default. It could be located in the superproject at ".git/config.super".
This config file could then specify

[submodule]
recursiveIgnore = [yes/no]
recursiveAttributes = [yes/no]

In that way commands run from within the submodule as well as from
the superproject would realize that the submodule needs to lookup
the superproject and use the attribute/ignore/config settings from there.

[1] Here the example for URL.insteadOf
https://public-inbox.org/git/capz477mcsbsfbqkzp69mt_brwz-0aes6twjofqrhizubv7z...@mail.gmail.com/

>
> The submodule/.gitignore bug/feature-request being reported here isn't
> something that impacts the ab/sha1dc series in practice.
>
> It was something I noticed while working with an earlier commit in
> that repo, but that's a commit that'll never be pinned by the
> git.git:sha1collisiondetection submodule.

Thanks for the bug report. As outlined above, we'd still need to bikeshed
how to fix it properly I'd think.

Thanks,
Stefan

Re: [WIP/RFC 17/23] repo: introduce new repository object

2017-05-23 Thread Brandon Williams

On 05/20, Stefan Beller wrote:
> On Thu, May 18, 2017 at 4:21 PM, Brandon Williams  wrote:
> > Introduce 'struct repo' an object used to represent a repository.
> 
> Is this the right place to outline what you expect from a repo object?
> Are you planning to use it everywhere?
> Is it lazy-init'd and it takes care of it itself, or would the caller
> have to take
> care of the state of the repo? ("the repo object is just a place to put the
> current globals")

Those are all great questions, questions that I don't think I have all
the answers for right now.  Since this is still in the idea phase I'm
hoping to hear what other people think this would look like in an ideal
world.  I don't think everything would need to be lazy-init'd or can
easily be done that way.  At least the index stuff isn't set up to
be able to do that.  I can see the config being lazy-init'd in one way
or another, though that would require passing in a repo object instead
of a config-set when you want to look-up a config value (which is
probably very reasonable) that way the config-set stored in the repo
object can be properly initialized.

> 
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  Makefile |  1 +
> >  repo.c   | 42 ++
> >  repo.h   | 15 +++
> >  3 files changed, 58 insertions(+)
> >  create mode 100644 repo.c
> >  create mode 100644 repo.h
> >
> > diff --git a/Makefile b/Makefile
> > index e35542e63..a49d2f96a 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -821,6 +821,7 @@ LIB_OBJS += refs/ref-cache.o
> >  LIB_OBJS += ref-filter.o
> >  LIB_OBJS += remote.o
> >  LIB_OBJS += replace_object.o
> > +LIB_OBJS += repo.o
> >  LIB_OBJS += rerere.o
> >  LIB_OBJS += resolve-undo.o
> >  LIB_OBJS += revision.o
> > diff --git a/repo.c b/repo.c
> > new file mode 100644
> > index 0..d47e98d95
> > --- /dev/null
> > +++ b/repo.c
> > @@ -0,0 +1,42 @@
> > +#include "cache.h"
> > +#include "repo.h"
> > +
> > +int
> > +repo_init(struct repo *repo, const char *gitdir, const char *worktree)
> 
> style ;)

Darn, I forgot to change this before sending out.  An easy fix, though I
still like this style better :P

> 
> 
> > +   /* Maybe need a check to verify that a worktree is indeed a 
> > worktree? */
> 
> add NEEDSWORK/FIXME prefix to comment?
> 
> > +void
> > +repo_clear(struct repo *repo)
> 
> style ;)

-- 
Brandon Williams

Re: [WIP/RFC 00/23] repository object

2017-05-23 Thread Brandon Williams

On 05/19, Ben Peart wrote:
> Glad to see you tackling this.  This is definitely a step in the
> right direction.
> 
> I realize that it will take a lot of work and that intermediate
> steps may just be pushing it the global state one level higher but
> eventually it would be great to see an entire code path global state
> free!
> 
> I'm personally interested because reducing the reliance on global
> state also helps us in our performance work as it makes it more
> possible to use threading to scale up the performance.

I'm glad that more people than just me are interesting in moving in this
direction.  It's definitely going to take some work and hopefully I can
convince some more people to help achieve this goal :D

> 
> Ben
> 
> On 5/18/2017 7:21 PM, Brandon Williams wrote:
> >When I first started working on the git project I found it very difficult to
> >understand parts of the code base because of the inherently global nature of
> >our code.  It also made working on submodules very difficult.  Since we can
> >only open up a single repository per process, you need to launch a child
> >process in order to process a submodule.  But you also need to be able to
> >communicate other stateful information to the children processes so that the
> >submodules know how best to format their output or match against a
> >pathspec...it ends up feeling like layering on hack after hack.  What I would
> >really like to do, is to have the ability to have a repository object so 
> >that I
> >can open a submodule in-process.
> >
> >Before this becomes a reality for all commands, much of the library code 
> >would
> >need to be refactored in order to work purely on handles instead of global
> >state.  As it turned out, ls-files is a pretty simple command and doesn't 
> >have
> >*too* many dependencies.  The biggest thing that needed to be changed was
> >piping through an index into a couple library routines so that they don't
> >inherently rely on 'the_index'.  A few of these changes I've sent out and can
> >be found at 'origin/bw/pathspec-sans-the-index' and
> >'origin/bw/dir-c-stops-relying-on-the-index' which this series is based on.
> >
> >Patches 1-16 are refactorings to prepare either library code or ls-files 
> >itself
> >to be ready to handle passing around an index struct.  Patches 17-22 
> >introduce
> >a repository struct and change a couple of things about how submodule caches
> >work (getting submodule information from .gitmodules).  And Patch 23 converts
> >ls-files to use a repository struct.
> >
> >The most interesting part of the series is from 17-23.  And 1-16 could be 
> >taken
> >as is without the rest of the series.
> >
> >This is still very much in a WIP state, though it does pass all tests.  What
> >I'm hoping for here is to get a discussion started about the feasibility of a
> >change like this and hopefully to get the ball rolling.  Is this a direction 
> >we
> >want to move in?  Is it worth the pain?
> >
> >Thanks for taking the time to look at this and entertain my insane ideas :)
> >
> >Brandon Williams (23):
> >  convert: convert get_cached_convert_stats_ascii to take an index
> >  convert: convert crlf_to_git to take an index
> >  convert: convert convert_to_git_filter_fd to take an index
> >  convert: convert convert_to_git to take an index
> >  convert: convert renormalize_buffer to take an index
> >  tree: convert read_tree to take an index parameter
> >  ls-files: convert overlay_tree_on_cache to take an index
> >  ls-files: convert write_eolinfo to take an index
> >  ls-files: convert show_killed_files to take an index
> >  ls-files: convert show_other_files to take an index
> >  ls-files: convert show_ru_info to take an index
> >  ls-files: convert ce_excluded to take an index
> >  ls-files: convert prune_cache to take an index
> >  ls-files: convert show_files to take an index
> >  ls-files: factor out debug info into a function
> >  ls-files: factor out tag calculation
> >  repo: introduce new repository object
> >  repo: add index_state to struct repo
> >  repo: add per repo config
> >  submodule-config: refactor to allow for multiple submodule_cache's
> >  repo: add repo_read_gitmodules
> >  submodule: add is_submodule_active
> >  ls-files: use repository object
> >
> > Makefile   |   1 +
> > apply.c|   2 +-
> > builtin/blame.c|   2 +-
> > builtin/commit.c   |   3 +-
> > builtin/ls-files.c | 348 
> > -
> > cache.h|   4 +-
> > combine-diff.c |   2 +-
> > config.c   |   2 +-
> > convert.c  |  31 +--
> > convert.h  |  19 +-
> > diff.c |   6 +-
> > dir.c  |   2 +-
> > git.c  |   2 +-
> > ll-merge.c

Re: [WIP/RFC 00/23] repository object

2017-05-23 Thread Brandon Williams

On 05/22, Jeff King wrote:
> On Thu, May 18, 2017 at 04:21:11PM -0700, Brandon Williams wrote:
> 
> > When I first started working on the git project I found it very difficult to
> > understand parts of the code base because of the inherently global nature of
> > our code.  It also made working on submodules very difficult.  Since we can
> > only open up a single repository per process, you need to launch a child
> > process in order to process a submodule.  But you also need to be able to
> > communicate other stateful information to the children processes so that the
> > submodules know how best to format their output or match against a
> > pathspec...it ends up feeling like layering on hack after hack.  What I 
> > would
> > really like to do, is to have the ability to have a repository object so 
> > that I
> > can open a submodule in-process.
> 
> We could always buy in fully to the multi-process model and just
> implement a generic RPC protocol between the parent and submodule gits.
> Does CORBA still exist?
> 
> (No, I am not serious about any of that).
> 
> > This is still very much in a WIP state, though it does pass all tests.  What
> > I'm hoping for here is to get a discussion started about the feasibility of 
> > a
> > change like this and hopefully to get the ball rolling.  Is this a 
> > direction we
> > want to move in?  Is it worth the pain?
> 
> I think the really painful part is going to be all of the system calls
> that rely on global state provided by the OS. Like, say, every
> filesystem call that expects to find working tree files without
> prepending the working tree path.

Yeah that's going to be one of the more challenging things to deal
with...

> 
> That said, even if we never reached the point where we could handle all
> submodule requests in-process, I think sticking the repo-related global
> state in a struct certainly could not hurt general readability. So it's
> a good direction regardless of whether we take it all the way.

Glad you think so!

-- 
Brandon Williams

Re: Passing revs to git-bundle-create via stdin

2017-05-23 Thread Jeff King

On Tue, May 23, 2017 at 01:44:55AM +0200, ch wrote:

> I'm using git bundles to create (incremental) backups of my local 
> repositories.
> This works quite well but for certain repositories I'm getting unexpectedly 
> big
> incremental bundles. I did some testing and from what I can tell it seems
> git-bundle-create has issues processing revs passed via stdin. To illustrate
> the problem I have included a small bash script below.
> 
> I'm using Git for Windows 2.13.0.windows.1 (64-bit). Unfortunately I don't 
> have
> access to a non-Windows box to check whether it's a problem specific to the
> Windows port.

Thanks for an easy reproduction recipe. I see the problem on Linux, too.

I think what's happening is that git-bundle actually runs _two_
traversals using the command-line arguments. It kicks off an external
rev-list via compute_and_write_prerequisites(), and then feeds the
arguments again to setup_revisions(). The first one eats all of stdin,
and the second just sees an empty input.

You can see it working if you do:

  $ git bundle create from-terminal.git --all --stdin
  ^feature
  ^master^
  [press ^D, i.e., ctrl-d]
  ^feature
  ^master^
  [press ^D again]

Hitting ^D tells the terminal driver to send an EOF; the first one goes
to the child rev-list, and then we repeat the input to get read by the
second traversal. The result is identical to your command-line-only
output. I have no idea if the ^D thing works at all on Windows, but I
don't mean it even as a workaround. It was just a way of confirming my
guess about the double-read.

The real solutions I can think of are:

  1. Teach git-bundle not to require the double-read. I'm not sure why
 it's written the way it is, but presumably it would be tricky to
 undo it (or we would have just written it the other way in the
 first place!)

  2. Git-bundle could buffer stdin and feed it to the two traversals. I
 think this actually ends up a little tricky, because the second
 traversal is done in-process (so we'd have to actually re-feed the
 buffer to our stdin via a "struct async", which feels pretty
 hacky).

  3. git-bundle could natively support --stdin, reading each line and
 convert it into traversal arguments. This is the quickest way to
 make your example work, but I suspect there will be funky corner
 cases (because we'd have to replicate the same rules that
 revision.c uses to read its input).

None of those are incredibly appealing.

-Peff

Re: Another git repo at kernel.org?

2017-05-23 Thread Theodore Ts'o

So Junio owns the pub/scm/git/git.git tree on kernel.org, and he may
already have access to create new repo's under the pub/scm/git
hierarchy.  In which case we might not need to bug the kernel.org
administrators at all.

Also, I'll note that it is possible to set up some repo's such that a
group of people have access to push to them.  You'll see for example
on git.kernel.org that certain repositories have as their owner "XFS
FS Group", or "ARM64 Group" or "Intel Wireless Group".

Cheers,

- Ted

Re:Re: [PATCH v5] send-email: --batch-size to work around some SMTP server limit

2017-05-23 Thread 赵小强



At 2017-05-22 17:26:41, "Ævar Arnfjörð Bjarmason"  wrote:
>On Sun, May 21, 2017 at 2:59 PM, xiaoqiang zhao  wrote:
>> Some email servers (e.g. smtp.163.com) limit the number emails to be
>> sent per session(connection) and this will lead to a faliure when
>> sending many messages.
>
>This OK to me, the nits I had are addressed by Junio's reply.
>
>Looking at this the Nth time now though I wonder about this approach
>in general. In all your E-Mails I don't think you ever said /what/
>sort of error you had from the SMTP server, you just said you had a
>failure or an error, I assume you hit one of the die's in the
>send_message() function. Can you paste the actual error you get
>without this patch?
>

When I send a patch series which has 13 (plus cover) messages as a test, I got 
errors as follows and send-email quit when sending the 11th message:

MI:DMC 163 smtp14,EsCowAD3o71TKyRZTBlJHw--.20496S13 1495542613 
http://mail.163.com/help/help_spam_16.htm?ip=1.203.183.150=smtp14=1495542613

Follow the link above, I find two error code:

•450 MI:DMC 当前连接发送的邮件数量超出限制。请减少每次连接中投递的邮件数量
•451 MI:DMC 当前连接发送的邮件数量超出限制。请控制每次连接中投递的邮件数量

Translate  into English:
•450 MI:DMC The number of messages sent execeeds the limits. Please reduce the 
number of messages  to be sent  per connection.
•451 MI:DMC The number of messages sent execeeds the limits. Please control the 
number of messages to be sent per connection.

Although has different error code, but  says similar reason. Testing with 
--smtp-debug option produce the same error.

>I wonder if something like this would Just Work for this case without
>any configuration or command-line options, with the added benefit of
>just working for anyone with transitory SMTP issues as well (patch
>posted with -w, full version at
>https://github.com/avar/git/commit/acb60c4bde50bdcb62b71ed46f49617e2caef84e.patch):
>
>diff --git a/git-send-email.perl b/git-send-email.perl
>index 8a1ee0f0d4..c2d85236d1 100755
>--- a/git-send-email.perl
>+++ b/git-send-email.perl
>@@ -1363,6 +1363,10 @@ EOF
>die __("The required SMTP server is not
>properly defined.")
>}
>
>+   my $num_tries = 0;
>+   my $max_tries = 5;
>+   smtp_again:
>+   eval {
>if ($smtp_encryption eq 'ssl') {
>$smtp_server_port ||= 465; # ssmtp
>require Net::SMTP::SSL;
>@@ -1429,6 +1433,22 @@ EOF
>}
>$smtp->dataend() or die $smtp->message;
>$smtp->code =~ /250|200/ or die
>sprintf(__("Failed to send %s\n"), $subject).$smtp->message;
>+   1;
>+   } or do {
>+   my $error = $@ || "Zombie Error";
>+
>+   warn sprintf(__("Failed to send %s due to
>error: %s"), $subject, $error);
>+   if ($num_tries++ < $max_tries) {
>+   $smtp->quit if defined $smtp;
>+   $smtp = undef;
>+   $auth = undef;
>+   my $sleep = $num_tries * 3; # 3, 6, 9, ...
>+   warn sprintf(__("This is retry %d/%d.
>Sleeping %d before trying again"),
>+$num_tries, $max_tries, $sleep);
>+   sleep($sleep);
>+   goto smtp_again;
>+   }
>+   };
>}
>if ($quiet) {
>printf($dry_run ? __("Dry-Sent %s\n") : __("Sent
>%s\n"), $subject);
>
>Now that's very much a WIP and I don't have a server like that to test against.
>
>Having worked with SMTP a lot in a past life/job, I'd say it's *very*
>likely that you're just getting a /^4/ error code from 163.com,
>probably 421, which would make this logic even simpler. I.e. we could
>just adjust this to back-off for /^4/ instead of trying to handle
>arbitrary errors.
>
>Anyway, I'm not interested in pursuing that WIP patch, and I don't
>think perfect should be the enemy of the good here. Your patch works
>for you, doesn't really damage anything else, so if you're not
>interested in hacking up something like the above I think we should
>just take it.
>
>But I do think it would be very good to get a reply to you / details
>in the commit message about what error you get exactly in this
>scenario, see if you get better details with --smtp-debug, and if so
>paste that (sans any secret info like user/password you don't want to
>share).
>
>Then if we're poking at this code in the future we can maybe just fix
>this in some more general fashion while keeping this use-case in mind.

Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-23 Thread Jeff King

[+cc Junio, whose code this is touching]

On Sun, May 21, 2017 at 12:17:06AM -0500, Tyler Brazier wrote:

> This script explains and tests what's going on:
> https://gist.github.com/tylerbrazier/4478e76fe44bf6657d4d3da6c789531d
> 
> pull is failing because it shortcuts to --ff-only then calls
> run_merge(), which does not know how to autostash. Removing the
> shortcut fixes the problem:

So I guess the ideal solution would be for us to do the autostash
ourselves, run the fast-forward merge, and then pop the stash. In theory
that's just "git stash push" followed by "git stash pop", but looking at
the implementation in git-rebase.sh, it looks like it's a little more
complicated than that.

Disabling the optimization sounds like a reasonable interim workaround,
but...

> diff --git a/builtin/pull.c b/builtin/pull.c
> index dd1a4a94e..225a59f5f 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -868,11 +868,6 @@ int cmd_pull(int argc, const char **argv, const
> char *prefix)
>   head = lookup_commit_reference(orig_head.hash);
>   commit_list_insert(head, );
>   merge_head = lookup_commit_reference(merge_heads.oid[0].hash);
> - if (is_descendant_of(merge_head, list)) {
> - /* we can fast-forward this without invoking rebase */
> - opt_ff = "--ff-only";
> - return run_merge();
> - }

...we can probably restrict it to when autostash is in use, like:

  /*
   * If this is a fast-forward, we can skip calling rebase and
   * just do the merge ourselves. But we don't know about
   * autostash, so use the real rebase command when it's in effect.
   */
  if (!autostash && is_descendant_of(merge_head, list)) {
opt_ff = "--ff-only";
return run_merge();
  }

AFAICT from the commit introducing this code (33b842a1e9), and the
surrounding discussion:

http://public-inbox.org/git/of95d98cb6.47969c1c-onc1257fe1.0058d980-c1257fe1.00599...@dakosy.de/T/#u

this is purely an optimization to avoid invoking rebase, so it's OK to
skip (and interestingly, the autostash question was raised in that
thread but not resolved).

But I notice on the run_merge() code path that we do still invoke
git-merge. And rebase has been getting faster as it is moved to C code
itself. So is this optimization even worth doing anymore?

-Peff

Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-23 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> Seems like it would be useful to have a way to ex-post-facto say "past
> history should use these URLs". i.e. if all git.git mirrors go down
> and we have to re-host, then you can just clone git.git and off you
> go, but the same isn't true of past submodule urls, or is it?

I do not know how heavily you are used to use submodules, but I
think submodule's URL is copied to the config of the superproject,
and that URL is what will be used from there on, so "past history or
future history will use that URL" is already the case, no?

Re: [PATCH v5 6/6] clean: teach clean -d to preserve ignored paths

2017-05-23 Thread Junio C Hamano

Samuel Lijin  writes:

> @@ -931,6 +961,7 @@ int cmd_clean(int argc, const char **argv, const char 
> *prefix)
>  prefix, argv);
>  
>   fill_directory(, );
> + correct_untracked_entries();
>  
>   for (i = 0; i < dir.nr; i++) {
>   struct dir_entry *ent = dir.entries[i];

You used to set SHOW_IGNORED_TOO and KEEP_UNTRACKED_CONTENTS in
dir.flags early in the function, and then free dir.entries[] and
dir.ignored[] after we are done.  They are gone in this version.

Intended?

> diff --git a/t/t7300-clean.sh b/t/t7300-clean.sh
> index 3a2d709c2..7b36954d6 100755
> --- a/t/t7300-clean.sh
> +++ b/t/t7300-clean.sh
> @@ -653,7 +653,7 @@ test_expect_success 'git clean -d respects pathspecs 
> (pathspec is prefix of dir)
>   test_path_is_dir foobar
>  '
>  
> -test_expect_failure 'git clean -d skips untracked dirs containing ignored 
> files' '
> +test_expect_success 'git clean -d skips untracked dirs containing ignored 
> files' '
>   echo /foo/bar >.gitignore &&
>   echo ignoreme >>.gitignore &&
>   rm -rf foo &&

Re: How do I see “merge events” in history?

2017-05-23 Thread Jeff King

On Tue, May 23, 2017 at 03:07:40PM +0300, Stefan Monov wrote:

> I use the GitHub web interface and the git cli. Answers for either or
> both are appreciated.
> 
> Sometimes, when I merge a branch into another branch, I see a commit
> with a message like "Merge branch 'master' into other_branch" in the
> GitHub history. But not always. So how do I see all "merge events",
> inside the history?

The simple answer is that to see all merges, you can run "git log
--merges". But I think that's not quite what you're asking.

Your "but not always" makes me think you are wondering why sometimes
when you run "git merge" (or "git pull), it results in a merge commit
and sometimes not, and whether you can see evidence of the times when it
was "not".

When a merge is a "fast forward", i.e., when the thing you are merging
is strictly a descendent of your current branch, then Git omits the
merge commit and simply updates your current branch tip to the thing
you're merging. This can happen if you forked a branch, and then when it
came time to merge it back, nothing had happened on the original branch.

In a fast forward merge, there's no evidence at all of the merge command
in the resulting history graph. So there's nothing to find via "git
log".

For some workflows you'd rather see a commit for such merges, even if it
could fast-forward. For example, when merging a topic branch, you may
want the graph to reflect that the work was done on a side branch, even
if nothing happened on "master" in the interim. You can use "git merge
--no-ff" for this.

> Even better if I can get a view with vertical lines showing branches
> and merges (like a graph).

Try "git log --oneline --decorate --graph". It will show you the graph
structure and annotate the tips of any branches.

-Peff

Re: [Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-23 Thread Jeff King

On Tue, May 23, 2017 at 04:01:06AM -0400, Samuel Lijin wrote:

> For some reason the repo on GH does not have a HEAD pointer:
> 
> $ git ls-remote https://github.com/passcod/UPPERCASE-NPM.git
> efc7dbfd6ca155d5d19ce67eb98603896062f35arefs/heads/MASTER
> e60ea8e6ec45ec45ff44ac8939cb4105b16477darefs/pull/1/head
> f35a73dcb151d336dc3d30c9a2c7423ecdb7bd1crefs/pull/2/head
> 0d9b3a1268ff39350e04a7183af0add912b686e6refs/tags/V1.0.0
> efc7dbfd6ca155d5d19ce67eb98603896062f35arefs/tags/V1.0.1
> 
> I'm not sure how you managed to do that, since GH rejects attempts to
> delete the current branch, but I believe if you set the default branch
> to MASTER it will work correctly.

The HEAD branch on the server side is still pointing at
refs/heads/master. But since that doesn't exist, upload-pack doesn't
advertise HEAD at all (it has no sha1).

You can recreate this situation with:

  cd /some/repo
  git init --bare dst.git
  git push dst.git HEAD:refs/heads/MASTER
  git ls-remote dst.git

No current-branch deletion required.

Félix should be able to re-point the server-side HEAD to MASTER via
GitHub's web interface.

GitHub's post-receive hook usually does that automatically if you push a
branch to a repository with an unborn HEAD. But I think there may be
corner cases where it gets confused.

-Peff

How do I see “merge events” in history?

2017-05-23 Thread Stefan Monov

I use the GitHub web interface and the git cli. Answers for either or
both are appreciated.

Sometimes, when I merge a branch into another branch, I see a commit
with a message like "Merge branch 'master' into other_branch" in the
GitHub history. But not always. So how do I see all "merge events",
inside the history?

Even better if I can get a view with vertical lines showing branches
and merges (like a graph).

Re: [PATCHv3 4/4] clone: use free_refspec() to free refspec list

2017-05-23 Thread SZEDER Gábor

On Tue, May 23, 2017 at 9:38 AM, Junio C Hamano  wrote:

> I was sifting entries in the draft "What's cooking" report to find
> topics to merge to 'next'.  I read the series over and as Peff said
> in his <20170515224615.f6hnnfngwpier...@sigill.intra.peff.net>, I
> think the series overall is good.
>
> Do you want to update the proposed log message for this one before
> the entire thing is merged to 'next'?

I followed Peff's advice on adding the default fetch refspec to this
temporary configuration environment (what's the proper term for
that?), which resulted in the simplest and cleanest code yet, and, as
a by product made this last patch moot, so I will drop it in the next
version.  The code is ready, but the change requires updates to the
first patch's commit message, maybe I can get around to it in the
evening.

Re: git --merge and option parsing

2017-05-23 Thread Jeff King

On Tue, May 23, 2017 at 10:17:18AM +, Holst, Henrik wrote:

> I am not sure if this is a bug but it was surprising to me so I
> thought I'd report it here.
> 
> I added `ui.column=auto` to my gitconfig and that does not work so
> well with pipes so I want to use `--no-column` option. I was a bit
> surprised that this does not work? It seems that `--merged` picks up
> `--no-column` as a commit reference instead it being parsed as an
> option.

That's behaving as expected. The --merged option has an optional
argument, which defaults to "HEAD". If you provide another argument
(even if it looks like an option to you), then it gets attached to
--merged.

As you saw, "git branch --no-column --merged" works, but so would "git
branch --merged HEAD --no-column".

-Peff

Re: [PATCHv3 4/4] clone: use free_refspec() to free refspec list

2017-05-23 Thread Jeff King

On Tue, May 23, 2017 at 04:38:07PM +0900, Junio C Hamano wrote:

> > First, the unchanged commit message is now (i.e. by using the parsed
> > refspecs returned by remote_get()) completely outdated.
> > Second, while it properly frees those refspecs, i.e. the array and all
> > its string fields, it will now leak the memory pointed by the
> > 'refspec' variable.  However, why free just that one field of the
> > 'struct *remote'?  Alas, we don't seem to have a free_remote()
> > function...
> 
> I was sifting entries in the draft "What's cooking" report to find
> topics to merge to 'next'.  I read the series over and as Peff said
> in his <20170515224615.f6hnnfngwpier...@sigill.intra.peff.net>, I
> think the series overall is good.
> 
> Do you want to update the proposed log message for this one before
> the entire thing is merged to 'next'?

I didn't quite say that it was good. I think the overall direction is
good, but I had some comments on patch 1. Maybe we want to accept the
ugliness there, but I had some suggestions that might make it less bad.

-Peff

Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-23 Thread Ævar Arnfjörð Bjarmason

On Tue, May 23, 2017 at 12:27 AM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> I liked the suggestion to make the URL a relative path, but this would
>> require you to maintain a mirror in the same places you push git.git
>> to, is that something you'd be willing to do?
>
> After thinking about this a bit more, I know what I think we want a
> bit better.
>
> Relative URL (e.g. ../sha1collisiondetection that sits next to the
> copy of git.git) may be a good way to go.  I can arrange to create
> necessary repository next to git.git on k.org and github.com but I
> need to double check about other places

If the URL doesn't point to the canonical upstream how do we review a
patch to update sha1dc here on list? Doesn't change from "'git am'
this and it'll work" to "'git am' this and it'll fail, then do this
submodule config modification dance, and run some other command".

I haven't tried repointing a submodule temporarily (and locally) to
another URL for such a use, how is that even done?

> Whether the submodule is referenced by a relative URL from the main
> project, the submodule should not come directly from the upstream,
> and various mirrors that sit next to git.git should not be blind and
> automated "mirrors".  This is because I do not want us to trust the
> security measures of https://github.com/cr-marcstevens/ repository.
> The consumers already need to trust k.org/pub/scm/git/git.git and by
> ensuring k.org/pub/scm/git/sha1dc is managed the same way, they do
> not have to trust anything extra.

I had the same comments Stefan pointed out below about this. So I
won't repeat any of that...

> Another reason is that we want to make sure all commits in the
> submodule that we bind to the superproject (i.e. git.git) are always
> in the submodule, regardless of what our upstream does, and one way
> to do so is to have control over _our_ canonical repository for the
> submodule.  In normal times, it will faithfully follow the upstream
> without doing anything else, but we'd keep the option of anchoring a
> submodule commit that is referenced by the superproject history with
> our own tag, if it is ever rewound away in the upstream history for
> whatever reason.

If we were talking about any other project but git.git I'd say "yeah
this makes sense".

But I think in our case we should keep in mind the main point of this
exercise is for us to dogfood submodule usage, not just so we get
whatever trivial benefits from updating the sha1collision/ directory
from upstream, but so that we run into issues with submodules that we
solve for all our users.

In this case you're basically concerned with:

 * We have N mirrors, but the upstream submodule URL is just one URL,
so let's not point to that, but to our N mirrors

Could also be addressed with a combination of 'pullUrl' for submodules
(inverse of pushUrl for push) to list the canonical one & list of
mirrors (or use the relative URL).

 * What if upstream say 5 years in the future rewinds their history,
github shuts down or whatever, can we check out and work with older
versions of git.git?

Seems like it would be useful to have a way to ex-post-facto say "past
history should use these URLs". i.e. if all git.git mirrors go down
and we have to re-host, then you can just clone git.git and off you
go, but the same isn't true of past submodule urls, or is it?

Re: [PATCH 0/2] Fix warnings on access of a remote with Windows paths

2017-05-23 Thread Johannes Schindelin

Hi Hannes (& Junio, see below),

On Mon, 22 May 2017, Johannes Sixt wrote:

> Am 22.05.2017 um 13:59 schrieb Johannes Schindelin:
> > On Sat, 20 May 2017, Johannes Sixt wrote:
> > > This small series fixes these warnings on Windows:
> > >
> > > C:\Temp\gittest>git fetch C:\Temp\gittest
> > > warning: unable to access '.git/remotes/C:\Temp\gittest': Invalid argument
> > > warning: unable to access '.git/branches/C:\Temp\gittest': Invalid
> > > warning: argument
> > >  From C:\Temp\gittest
> > >   * branchHEAD   -> FETCH_HEAD
> > >
> > > The fix is in the second patch; the first patch is a
> > > preparation that allows to write the fix in my preferred style.
> > 
> > Thank you!
> > 
> > Maybe you want to accompany these patches with a simple test case that
> > uses e.g. ls-remote on $(pwd | tr / )?
> 
> Actually, no, I don't want to. It would have to be protected by MINGW, and I
> don't want to burden us (and here I mean Windows folks) with a check for a
> cosmetic deficiency. (Shell scripts, slow forks, yadda, yadda...)

Actually, yes, I want to.

Yes, it would have to be protected by MINGW, and yes, it would put a
burden on us, but also yes: it would put an even higher burden on me to
check this manually before releasing Git for Windows, or even worse: this
regression would be the kind of bug that triggers many bug reports,
addressing which would take a lot more time than writing this test case
and executing it as part of our test suite.

So here goes:

-- snipsnap --
From: Johannes Schindelin 
Date: Tue, 23 May 2017 12:42:13 +0200
Subject: [PATCH] mingw: verify that paths are not mistaken for remote nicknames

This added test case simply verifies that users will not be bothered
with bogus complaints à la

warning: unable to access '.git/remotes/D:\repo': Invalid argument

when fetching from a Windows path (in this case, D:\repo).

Signed-off-by: Johannes Schindelin 
---
 t/t5580-clone-push-unc.sh | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/t/t5580-clone-push-unc.sh b/t/t5580-clone-push-unc.sh
index b195f71ea98..93ce99ba3c6 100755
--- a/t/t5580-clone-push-unc.sh
+++ b/t/t5580-clone-push-unc.sh
@@ -1,13 +1,19 @@
 #!/bin/sh
 
-test_description='various UNC path tests (Windows-only)'
+test_description='various Windows-only path tests'
 . ./test-lib.sh
 
 if ! test_have_prereq MINGW; then
-   skip_all='skipping UNC path tests, requires Windows'
+   skip_all='skipping Windows-only path tests'
test_done
 fi
 
+test_expect_success 'remote nick cannot contain backslashes' '
+   BACKSLASHED="$(pwd | tr / )" &&
+   git ls-remote "$BACKSLASHED" >out 2>err &&
+   ! grep "unable to access" err
+'
+
 UNCPATH="$(pwd)"
 case "$UNCPATH" in
 [A-Z]:*)
-- 
2.13.0.windows.1

1 2 >

1 - 100 of 124 matches

Mail list logo