Re: [PATCH] revision.c: Correctly dereference interesting_cache

2015-06-19 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

 This was introduced at b6e8a3b5 (2015-04-17, limit_list: avoid
 quadratic behavior from still_interesting), which
 also introduced the check a few lines before, which already dereferences
 `interesting_cache`. So at this point `interesting_cache` is guaranteed to
 be not NULL.

The above is the rationale for the coverity warning, but it does not
explain why this change is safe.

The code is called referencing the address of a local
 variable, so `interesting_cache` can actually never be NULL and trigger a
 segmentation fault by dereferencing it a few lines before this.

I'm having trouble parsing this sentence.  Do you mean that limit_list()
only calls still_interesting() (and thus, indirectly,
everybody_uninteresting()), with the second parameter equal to the
address of the local interesting_cache variable, so it can never be
NULL?

That makes sense, but I had to look at the code and reread the above
sentence a few times before I understood.

Do you know what this code is trying to check for?  What does it mean
for *interesting_cache to be NULL?

Should there be

if (!interesting_cache)
die(BUG: interesting_cache == NULL);

checks at the top of still_interesting and everybody_uninteresting to
futureproof this?

What does the *interesting_cache variable represent, anyway?

This code seems to be underdocumented.

Thanks and hope that helps,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 06/19] fsck: Report the ID of the error/warning

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 21:28, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 Some legacy code has objects with non-fatal fsck issues; To enable the
 user to ignore those issues, let's print out the ID (e.g. when
 encountering missingemail, the user might want to call `git config
 --add receive.fsck.missingemail=warn`).

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 16 
  t/t1450-fsck.sh |  4 ++--
  2 files changed, 18 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 8e6faa8..0b3e18f 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -190,6 +190,20 @@ void fsck_set_msg_types(struct fsck_options *options, 
 const char *values)
  }
  }

 +static void append_msg_id(struct strbuf *sb, const char *msg_id)
 +{
 +for (;;) {
 +char c = *(msg_id)++;
 +
 +if (!c)
 +break;
 +if (c != '_')
 +strbuf_addch(sb, tolower(c));
 +}
 +
 +strbuf_addstr(sb, : );
 +}
 +
  __attribute__((format (printf, 4, 5)))
  static int report(struct fsck_options *options, struct object *object,
  enum fsck_msg_id id, const char *fmt, ...)
 @@ -198,6 +212,8 @@ static int report(struct fsck_options *options, struct 
 object *object,
  struct strbuf sb = STRBUF_INIT;
  int msg_type = fsck_msg_type(id, options), result;

 +append_msg_id(sb, msg_id_info[id].id_string);
 
 
 Nice.  The append function can be made a bit more context sensitive
 to upcase a char immediately after _ to make it easier to cut and
 paste into git config and keep the result readable, I think.
 
   git config --add receive.fsck.missingEmail=warn

Okay. I camelCased the IDs; it is a bit sore on my eyes in the command-line 
output, and the config variables are case-insensitive, anyway, but your wish is 
my command... I changed it locally, it will be part of v7.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 10/19] fsck: Make fsck_tag() warn-friendly

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 When fsck_tag() identifies a problem with the commit, it should try
 to make it possible to continue checking the commit object, in case the
 user wants to demote the detected errors to mere warnings.

I agree with that.  But if FSCK_MSG_BAD_OBJECT_SHA1 is an ignorable
error, why should we still have a conditional goto done here?

Shouldn't we be parsing the object the same way regardless?


 Just like fsck_commit(), there are certain problems that could hide other
 issues with the same tag object. For example, if the 'type' line is not
 encountered in the correct position, the 'tag' line – if there is any –
 would not be handled at all.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/fsck.c b/fsck.c
 index 0cfa4d0..21e3052 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -640,7 +640,8 @@ static int fsck_tag_buffer(struct tag *tag, const char 
 *data,
   }
   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
   ret = report(options, tag-object, FSCK_MSG_BAD_OBJECT_SHA1, 
 invalid 'object' line format - bad sha1);
 - goto done;
 + if (ret)
 + goto done;
   }
   buffer += 41;
--
To unsubscribe from this list: send the line unsubscribe git in


Re: co-authoring commits

2015-06-19 Thread Tuncer Ayaz
On Fri, Jun 19, 2015 at 8:18 PM, Jakub Narębski jna...@gmail.com wrote:
 On 2015-06-18 at 23:25, Tuncer Ayaz wrote:
  On Thu, Jun 18, 2015 at 12:52 AM, Theodore Ts'o wrote:
   On Wed, Jun 17, 2015 at 10:26:32PM +0200, Tuncer Ayaz wrote:
 [...]
   One could imagine some frankly, quite rare example where there
   is a team of people who votes on each commit before it gets sent
   out and where everyone is equal and there is no hierarchy. In
   that case, perhaps you could set the from field to a mailing
   list address. But honestly, how often is that *all* of the
   authors are completely equal[1]?
 
  For that case something like patchwork, phabricator, or gerrit
  seems to be the logical tool to use, and should ideally leave a
  trace of approvals and such in the resulting commit message(s). If
  the patch management tool takes care of merging the commit(s), it
  can be harder to misattribute signed-off/reviewed-by/etc, which is
  a good thing.

 Doesn't Gerrit (at least) use trailer-like structured *notes* in the
 'reviews' category (i.e. refs/notes/reviews ref) to store
 information about review process?

Don't remember if it does specifically, but I'm sure it can be
configured to. I know Phabricator appends a lot upon doing the
commit. I'll have to check it the next time I happen to use
Gerrit.

  You could of course use multiple (everybody makes their own)
  commits, where you risk breaking bisectability and avoid the need
  for equal co-authorship support. In pair programming such
  intermediate commits will quite often be fixups, and when you
  attempt to squash the fixups for bisectability's sake, you may get
  a desire for co-authorship of the resulting commit.

 Hmmm... I didn't think about the problem of attributing authorship
 for squashed commits. Though here multiple 'author' headers, or

Strictly speaking, in a live pair programming session usually the one
currently typing will be corrected by the other dev, and the roles
will switch when the typist changes.

 multiline 'author' header would be a better match than 'coauthor'
 header (which itself doesn't need, I think, the date filed, or does
 it?)

What makes sense to me is that the date is decoupled from the list of
authors and there is one date only. I think of it as an expanded
version of the previously single-entry author field, where the limit
has been lifted. With my purely git-user hat on, that is.

So, yes, a multiline author header sounds like a solution to me. Maybe
it should be sorted alphabetically upon commit regardless of the order
of --author ONE --author TWO.

 [This is sent from Thunderbird news, so it should be all right]

This is fine, the other one was broken. Out of curiosity what's the
difference between Thunderbird email and news?
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Scott Schmit
On Fri, Jun 19, 2015 at 01:53:01PM -0700, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
  Can you think of a name for the option that is as short as `--quick`
  but means the same as `--connectivity-only`?
 
 No I can't.  I think `--connectivity-only` is a very good name that
 is unfortunately a mouthful, I agree that we need a name that is as
 short as `--x` that means the same as `--connectivity-only`.  I
 do not think `--quick` is that word; it does not mean such a thing.

How about `--linkage` or `--links`?

-- 
Scott Schmit
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 10/19] fsck: Make fsck_tag() warn-friendly

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 22:18, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 When fsck_tag() identifies a problem with the commit, it should try
 to make it possible to continue checking the commit object, in case the
 user wants to demote the detected errors to mere warnings.
 
 I agree with that.  But if FSCK_MSG_BAD_OBJECT_SHA1 is an ignorable
 error, why should we still have a conditional goto done here?
 
 Shouldn't we be parsing the object the same way regardless?

Same reason as I mentioned before: in `git receive-pack --strict` we want to 
fail early, there is no sense to keep going when we know that nothing of the 
rest will make it to the server.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: co-authoring commits

2015-06-19 Thread Jakub Narębski
On Fri, Jun 19, 2015 at 11:11 PM, Tuncer Ayaz tuncer.a...@gmail.com wrote:
 On Fri, Jun 19, 2015 at 8:18 PM, Jakub Narębski jna...@gmail.com wrote:

 [This is sent from Thunderbird news, so it should be all right]

 This is fine, the other one was broken. Out of curiosity what's the
 difference between Thunderbird email and news?

One was sent as Reply All from the news interface (nntp://news.gmane.org),
one was sent as Reply All from the email interface (Gmail account).

Damned if I know why the difference...
-- 
Jakub Narebski
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-19 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 What would be the end-user experience if you stopped at the first
 error?  You see an error, add an fsck.msg-id = ignore and rerun,
 only to find another error and rinse and repeat?  Wouldn't you
 rather see all of them and add the ignore to cover them in one go?

 I actually see a really good reason to *keep* the current behavior:
 one of the most prominent users of this code path is `git receive-pack
 --strict`. It is used heavily by GitHub to ensure at least a certain
 level of validity of pushed objects. Now, for this use case it is easy
 to see that you want to stop *as soon as an error was
 encountered*. And as GitHub sponsors my work on this patch series, my
 main aim is to support their use case.

 While I understand that use case, I do not think stopping after
 showing three more errors in a single commit would make much
 difference in the bigger picture.

I actually changed my mind.  The above talks about the value given
to the end user by noticing as many errors in a single object, but
I'd think fsck.msg-id is pretty much useless as a tool to keep
using a repository with malformed object in its history.  When you
are told object d6602ec is bad (that's the v0.99 tag that does not
have tagger field), you would never want to say in this repository,
any tag without tagger is allowed, because you would still want to
catch and prevent future breakages of the same kind in new tags.

And the way to do so is to say I know the object d6602ec is bad, so
do not report breakage you find to me by using the skip-list.  For
that use case, showing _all_ errors (or warnings for that matter)
does not add any value.

So let's stop at the first error as your patch did.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 14/19] fsck: Allow upgrading fsck warnings to errors

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 22:22, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
  #define FSCK_FATAL -1
 +#define FSCK_INFO -2

  #define FOREACH_MSG_ID(FUNC) \
  /* fatal errors */ \
 @@ -50,15 +51,16 @@
  FUNC(ZERO_PADDED_DATE, ERROR) \
  /* warnings */ \
  FUNC(BAD_FILEMODE, WARN) \
 -FUNC(BAD_TAG_NAME, WARN) \
  FUNC(EMPTY_NAME, WARN) \
  FUNC(FULL_PATHNAME, WARN) \
  FUNC(HAS_DOT, WARN) \
  FUNC(HAS_DOTDOT, WARN) \
  FUNC(HAS_DOTGIT, WARN) \
 -FUNC(MISSING_TAGGER_ENTRY, WARN) \
  FUNC(NULL_SHA1, WARN) \
 -FUNC(ZERO_PADDED_FILEMODE, WARN)
 +FUNC(ZERO_PADDED_FILEMODE, WARN) \
 +/* infos (reported as warnings, but ignored by default) */ \
 +FUNC(BAD_TAG_NAME, INFO) \
 +FUNC(MISSING_TAGGER_ENTRY, INFO)
 
 Exactly the same comment as 12/19 applies to this change; not only
 complaints but also result makes sense part.

And my explanation is the same ;-) At 02/19 time, it would just puzzle me, as a 
reader, to see special treatment without any good reason.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 12/19] fsck: Disallow demoting grave fsck errors to warnings

2015-06-19 Thread Junio C Hamano
Ahh, I didn't see that they were not grouped by object types, features
or any meaningful axis.
That explains it (i.e. I can now understand why the original list was
ordered differently from the final order).


On Fri, Jun 19, 2015 at 2:09 PM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Junio,

 On 2015-06-19 22:21, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:

 Some kinds of errors are intrinsically unrecoverable (e.g. errors while
 uncompressing objects). It does not make sense to allow demoting them to
 mere warnings.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 14 --
  t/t5504-fetch-receive-strict.sh | 11 +++
  2 files changed, 23 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 21e3052..a4fbce3 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -9,7 +9,12 @@
  #include refs.h
  #include utf8.h

 +#define FSCK_FATAL -1
 +
  #define FOREACH_MSG_ID(FUNC) \
 +/* fatal errors */ \
 +FUNC(NUL_IN_HEADER, FATAL) \
 +FUNC(UNTERMINATED_HEADER, FATAL) \
  /* errors */ \
  FUNC(BAD_DATE, ERROR) \
  FUNC(BAD_DATE_OVERFLOW, ERROR) \
 @@ -39,11 +44,9 @@
  FUNC(MISSING_TYPE, ERROR) \
  FUNC(MISSING_TYPE_ENTRY, ERROR) \
  FUNC(MULTIPLE_AUTHORS, ERROR) \
 -FUNC(NUL_IN_HEADER, ERROR) \
  FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
  FUNC(TREE_NOT_SORTED, ERROR) \
  FUNC(UNKNOWN_TYPE, ERROR) \
 -FUNC(UNTERMINATED_HEADER, ERROR) \

 I think the end result very much makes a good sense, but why didn't
 this list enumerate the errors in the above final order from the
 beginning in 02/19?

 Because they are alphabetically ordered, within message type categories, that 
 is; this helped me develop with more ease (you do not want to know how many 
 hundreds of times I ran an interactive rebase on all of these patches...).

 And from the point of a development story (which a patch series is), it would 
 puzzle me, as a reader, if those two out of all the others were in front in 
 02/19, when they are no different from the others at that stage.

 Ciao,
 Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 12/19] fsck: Disallow demoting grave fsck errors to warnings

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Some kinds of errors are intrinsically unrecoverable (e.g. errors while
 uncompressing objects). It does not make sense to allow demoting them to
 mere warnings.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 14 --
  t/t5504-fetch-receive-strict.sh | 11 +++
  2 files changed, 23 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 21e3052..a4fbce3 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -9,7 +9,12 @@
  #include refs.h
  #include utf8.h
  
 +#define FSCK_FATAL -1
 +
  #define FOREACH_MSG_ID(FUNC) \
 + /* fatal errors */ \
 + FUNC(NUL_IN_HEADER, FATAL) \
 + FUNC(UNTERMINATED_HEADER, FATAL) \
   /* errors */ \
   FUNC(BAD_DATE, ERROR) \
   FUNC(BAD_DATE_OVERFLOW, ERROR) \
 @@ -39,11 +44,9 @@
   FUNC(MISSING_TYPE, ERROR) \
   FUNC(MISSING_TYPE_ENTRY, ERROR) \
   FUNC(MULTIPLE_AUTHORS, ERROR) \
 - FUNC(NUL_IN_HEADER, ERROR) \
   FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
   FUNC(TREE_NOT_SORTED, ERROR) \
   FUNC(UNKNOWN_TYPE, ERROR) \
 - FUNC(UNTERMINATED_HEADER, ERROR) \

I think the end result very much makes a good sense, but why didn't
this list enumerate the errors in the above final order from the
beginning in 02/19?
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 09/19] fsck: Handle multiple authors in commits specially

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 22:16, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
  err = fsck_ident(buffer, commit-object, options);
  if (err)
  return err;
 +while (skip_prefix(buffer, author , buffer)) {
 +err = report(options, commit-object, 
 FSCK_MSG_MULTIPLE_AUTHORS, invalid format - multiple 'author' lines);
 +if (err)
 +return err;
 +err = fsck_ident(buffer, commit-object, options);
 +if (err)
 +return err;
 +}
 
 Hmph, naively I would have expected that you wouldn't need an
 extra call to fsck_ident() here, and instead would see something
 like this:
 
   author_count = 0;
   while (skip_prefix(author )) {
   author_count++;
 ... do the existing check as-is ...
   }
 if (author_count  1)
   err |= report(missing author);
   else if (author_count  1)
   err |= report(multiple authors);

Good idea! I fixed this in my branch and it will be part of v7.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 18/19] fsck: git receive-pack: support excluding objects from fsck'ing

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 + if (strcmp(var, receive.fsck.skiplist) == 0) {
 + const char *path = is_absolute_path(value) ?
 + value : git_path(%s, value);

This either absolute or inside $GIT_DIR looks somewhat strange to
me.  Shouldn't we mimick what git config --path does instead?
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 12/19] fsck: Disallow demoting grave fsck errors to warnings

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 22:21, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 Some kinds of errors are intrinsically unrecoverable (e.g. errors while
 uncompressing objects). It does not make sense to allow demoting them to
 mere warnings.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 14 --
  t/t5504-fetch-receive-strict.sh | 11 +++
  2 files changed, 23 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 21e3052..a4fbce3 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -9,7 +9,12 @@
  #include refs.h
  #include utf8.h

 +#define FSCK_FATAL -1
 +
  #define FOREACH_MSG_ID(FUNC) \
 +/* fatal errors */ \
 +FUNC(NUL_IN_HEADER, FATAL) \
 +FUNC(UNTERMINATED_HEADER, FATAL) \
  /* errors */ \
  FUNC(BAD_DATE, ERROR) \
  FUNC(BAD_DATE_OVERFLOW, ERROR) \
 @@ -39,11 +44,9 @@
  FUNC(MISSING_TYPE, ERROR) \
  FUNC(MISSING_TYPE_ENTRY, ERROR) \
  FUNC(MULTIPLE_AUTHORS, ERROR) \
 -FUNC(NUL_IN_HEADER, ERROR) \
  FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
  FUNC(TREE_NOT_SORTED, ERROR) \
  FUNC(UNKNOWN_TYPE, ERROR) \
 -FUNC(UNTERMINATED_HEADER, ERROR) \
 
 I think the end result very much makes a good sense, but why didn't
 this list enumerate the errors in the above final order from the
 beginning in 02/19?

Because they are alphabetically ordered, within message type categories, that 
is; this helped me develop with more ease (you do not want to know how many 
hundreds of times I ran an interactive rebase on all of these patches...).

And from the point of a development story (which a patch series is), it would 
puzzle me, as a reader, if those two out of all the others were in front in 
02/19, when they are no different from the others at that stage.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 06/19] fsck: Report the ID of the error/warning

2015-06-19 Thread Junio C Hamano
That ... can be made ... was not my wish but more like the way the
code is structured it is possible for somebody to do such a thing
easily, well done compliment ;-)

The message names will have to be shown somewhere in the
documentation, and in Documentation/ we try to use camelCase to show
the word boundary; it would be better to match that, as this output is
meant to be used there.

On Fri, Jun 19, 2015 at 2:34 PM, Johannes Schindelin
johannes.schinde...@gmx.de wrote:
 Hi Junio,

 On 2015-06-19 21:28, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:

 Some legacy code has objects with non-fatal fsck issues; To enable the
 user to ignore those issues, let's print out the ID (e.g. when
 encountering missingemail, the user might want to call `git config
 --add receive.fsck.missingemail=warn`).

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 16 
  t/t1450-fsck.sh |  4 ++--
  2 files changed, 18 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 8e6faa8..0b3e18f 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -190,6 +190,20 @@ void fsck_set_msg_types(struct fsck_options *options, 
 const char *values)
  }
  }

 +static void append_msg_id(struct strbuf *sb, const char *msg_id)
 +{
 +for (;;) {
 +char c = *(msg_id)++;
 +
 +if (!c)
 +break;
 +if (c != '_')
 +strbuf_addch(sb, tolower(c));
 +}
 +
 +strbuf_addstr(sb, : );
 +}
 +
  __attribute__((format (printf, 4, 5)))
  static int report(struct fsck_options *options, struct object *object,
  enum fsck_msg_id id, const char *fmt, ...)
 @@ -198,6 +212,8 @@ static int report(struct fsck_options *options, struct 
 object *object,
  struct strbuf sb = STRBUF_INIT;
  int msg_type = fsck_msg_type(id, options), result;

 +append_msg_id(sb, msg_id_info[id].id_string);


 Nice.  The append function can be made a bit more context sensitive
 to upcase a char immediately after _ to make it easier to cut and
 paste into git config and keep the result readable, I think.

   git config --add receive.fsck.missingEmail=warn

 Okay. I camelCased the IDs; it is a bit sore on my eyes in the command-line 
 output, and the config variables are case-insensitive, anyway, but your wish 
 is my command... I changed it locally, it will be part of v7.

 Ciao,
 Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 When fsck_commit() identifies a problem with the commit, it should try
 to make it possible to continue checking the commit object, in case the
 user wants to demote the detected errors to mere warnings.

That makes sense.

 Note that some problems are too problematic to simply ignore. For
 example, when the header lines are mixed up, we punt after encountering
 an incorrect line. Therefore, demoting certain warnings to errors can
 hide other problems. Example: demoting the missingauthor error to
 a warning would hide a problematic committer line.

Is this a warning to end-users (which should be better in the doc),
or because some of them are too problematic to ignore that forgot
to add the explanation hence we do not keep going in this code
(which should be in the log message if that is what is going on)?

I notice that there are many instances of

if (object does not pass some test)
return report(...);

that do not do err = report(); if (err) return; in this function
after applying this patch.

I think that answers the above question.  The answer is because
some are too problematic, even after this patch, we give up parsing
the remainder of the commit object once we hit certain errors,
leaving some other errors that appear later in the object
undetected.

I think that is a sensible design decision, but the proposed log
message forgets to say so.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c | 28 
  1 file changed, 20 insertions(+), 8 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 9faaf53..9fe9f48 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -534,12 +534,18 @@ static int fsck_commit_buffer(struct commit *commit, 
 const char *buffer,
  
   if (!skip_prefix(buffer, tree , buffer))
   return report(options, commit-object, FSCK_MSG_MISSING_TREE, 
 invalid format - expected 'tree' line);
 - if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
 - return report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
 invalid 'tree' line format - bad sha1);
 + if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') {
 + err = report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
 invalid 'tree' line format - bad sha1);
 + if (err)
 + return err;
 + }

I do not think this if (err) return err; that uses the return
value of report(), makes sense.

As all the errors that use this pattern are isolated ones that does
not break parsing of the remainder (e.g. author ident had an extra 
in it may break author  but that does not prevent us from checking
committer ).

Your report() switches its return value based on the user setting;
specifically, it returns 0 if the user tells us to ignore/skip or
warn.  Which means that the user will see all warnings, but we stop
at the first error.

Shouldn't we continue regardless of the end-user setting in order to
show errors on other fields, too?
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Johannes Schindelin
Hi Paul,

On 2015-06-19 18:15, Paul Tan wrote:
 On Fri, Jun 19, 2015 at 11:09 PM, Junio C Hamano gits...@pobox.com wrote:
 
 The primary thing I care about is to discourage callers of the API element
 am_state from touching these fields with strbuf functions. If it is char * 
 then
 the would think twice before modifying (which would involve allocating the
 new buffer, forming the new value and assigning into it). With the fields 
 left
 as strbuf after they are read or formed by the API (by reading by the API
 implementation from $GIT_DIR/rebase-apply/), it is too tempting to do
 strbuf_add(), strbuf_trim(), etc., without restoring the value to the 
 original
 saying I'm the last user of it anyway--that is the sloppiness we can 
 prevent
 by not leaving it as strbuf.
 
 I don't think this is a good deterrent. If the code wanted to, it
 could just use strbuf_attach()/strbuf_detach() as well, no?

Sadly, I am a bit too busy with some loose Git for Windows ends, so I haven't 
had the chance to look at your code.

But I still wanted to throw this idea out there: would it maybe possible to 
avoid having those values as global variables, and instead pass them as const 
char * parameters to the respective functions? That should help resolve the 
concerns of both sides because it would allow us to keep the strbuf instances, 
just as local variables.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 14/19] fsck: Allow upgrading fsck warnings to errors

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

  #define FSCK_FATAL -1
 +#define FSCK_INFO -2
  
  #define FOREACH_MSG_ID(FUNC) \
   /* fatal errors */ \
 @@ -50,15 +51,16 @@
   FUNC(ZERO_PADDED_DATE, ERROR) \
   /* warnings */ \
   FUNC(BAD_FILEMODE, WARN) \
 - FUNC(BAD_TAG_NAME, WARN) \
   FUNC(EMPTY_NAME, WARN) \
   FUNC(FULL_PATHNAME, WARN) \
   FUNC(HAS_DOT, WARN) \
   FUNC(HAS_DOTDOT, WARN) \
   FUNC(HAS_DOTGIT, WARN) \
 - FUNC(MISSING_TAGGER_ENTRY, WARN) \
   FUNC(NULL_SHA1, WARN) \
 - FUNC(ZERO_PADDED_FILEMODE, WARN)
 + FUNC(ZERO_PADDED_FILEMODE, WARN) \
 + /* infos (reported as warnings, but ignored by default) */ \
 + FUNC(BAD_TAG_NAME, INFO) \
 + FUNC(MISSING_TAGGER_ENTRY, INFO)

Exactly the same comment as 12/19 applies to this change; not only
complaints but also result makes sense part.

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 This option avoids unpacking each and all objects, and just verifies the
 connectivity.

That sounds like marketing ;-)

Wow this does not unpack unnecessarily, wait, it needs to unpack
and parse 3 out of 4 kinds of objects?

Jokes aside, given that you should regularly repack your repository
anyway, I do not think it is such a big downside that this mode
misses a corrupt objects, and the 1 out of 4 kinds of objects,
i.e. blobs, occupy major part of the repository storage, so this new
mode probably makes sense.

 diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
 index 922c346..2863a8a 100755
 --- a/t/t1450-fsck.sh
 +++ b/t/t1450-fsck.sh
 @@ -431,4 +431,26 @@ test_expect_success 'fsck notices ref pointing to 
 missing tag' '
   test_must_fail git -C missing fsck
  '
  
 +test_expect_success 'fsck --quick' '
 + rm -rf quick 
 + git init quick 
 + (
 + cd quick 
 + touch empty 
 + git add empty 
 + test_commit empty 
 + empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 
 + rm -f $empty 
 + echo invalid $empty 
 + test_must_fail git fsck --strict 
 + git fsck --strict --quick 
 + tree=$(git rev-parse HEAD:) 
 + suffix=${tree#??} 
 + tree=.git/objects/${tree%$suffix}/$suffix 
 + rm -f $tree 
 + echo invalid $tree 
 + test_must_fail git fsck --strict --quick
 + )
 +'
 +
  test_done

I see a few impedance mismatch here.  For --quick, I would have
expected that the addition would be in t/perf/, not here.

Also the fact that quickness comes by cheating on blobs is an
implementation detail; in the future, perhaps somebody may come up
with a way to do a quick fsck while making sure blob corruption is
also detected.  The new test that expects --quick to ignore a
corrupt blob forbids such a progress.

If the option name was --ignore-corrupt-blob, then the above
change is 100% justified, though.

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-19 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 22:12, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 Note that some problems are too problematic to simply ignore. For
 example, when the header lines are mixed up, we punt after encountering
 an incorrect line. Therefore, demoting certain warnings to errors can
 hide other problems. Example: demoting the missingauthor error to
 a warning would hide a problematic committer line.
 
 Is this a warning to end-users (which should be better in the doc),
 or because some of them are too problematic to ignore that forgot
 to add the explanation hence we do not keep going in this code
 (which should be in the log message if that is what is going on)?

It was intended to offer the explanation for the design decision you commented 
on later:

 I notice that there are many instances of
 
   if (object does not pass some test)
   return report(...);
 
 that do not do err = report(); if (err) return; in this function
 after applying this patch.
 
 I think that answers the above question.  The answer is because
 some are too problematic, even after this patch, we give up parsing
 the remainder of the commit object once we hit certain errors,
 leaving some other errors that appear later in the object
 undetected.
 
 I think that is a sensible design decision, but the proposed log
 message forgets to say so.
 
 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c | 28 
  1 file changed, 20 insertions(+), 8 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 9faaf53..9fe9f48 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -534,12 +534,18 @@ static int fsck_commit_buffer(struct commit *commit, 
 const char *buffer,

  if (!skip_prefix(buffer, tree , buffer))
  return report(options, commit-object, FSCK_MSG_MISSING_TREE, 
 invalid format - expected 'tree' line);
 -if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
 -return report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
 invalid 'tree' line format - bad sha1);
 +if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') {
 +err = report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
 invalid 'tree' line format - bad sha1);
 +if (err)
 +return err;
 +}
 
 I do not think this if (err) return err; that uses the return
 value of report(), makes sense.
 
 As all the errors that use this pattern are isolated ones that does
 not break parsing of the remainder (e.g. author ident had an extra 
 in it may break author  but that does not prevent us from checking
 committer ).
 
 Your report() switches its return value based on the user setting;
 specifically, it returns 0 if the user tells us to ignore/skip or
 warn.  Which means that the user will see all warnings, but we stop
 at the first error.
 
 Shouldn't we continue regardless of the end-user setting in order to
 show errors on other fields, too?

I can make that happen, but please note that this is a change of behavior: we 
always stopped upon the first error.

It was my intention not to change behavior in that way without a proper reason, 
and I saw none.

I actually see a really good reason to *keep* the current behavior: one of the 
most prominent users of this code path is `git receive-pack --strict`. It is 
used heavily by GitHub to ensure at least a certain level of validity of pushed 
objects. Now, for this use case it is easy to see that you want to stop *as 
soon as an error was encountered*. And as GitHub sponsors my work on this patch 
series, my main aim is to support their use case.

Having said that, I agree that it could actually make sense for `git fsck` to 
show all errors, or at least to have an option to do so.

But that is a story for another night ;-)
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Can you think of a name for the option that is as short as `--quick`
 but means the same as `--connectivity-only`?

No I can't.  I think `--connectivity-only` is a very good name that
is unfortunately a mouthful, I agree that we need a name that is as
short as `--x` that means the same as `--connectivity-only`.  I
do not think `--quick` is that word; it does not mean such a thing.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 I do not think this if (err) return err; that uses the return
 value of report(), makes sense.
 
 As all the errors that use this pattern are isolated ones that does
 not break parsing of the remainder (e.g. author ident had an extra 
 in it may break author  but that does not prevent us from checking
 committer ).
 
 Your report() switches its return value based on the user setting;
 specifically, it returns 0 if the user tells us to ignore/skip or
 warn.  Which means that the user will see all warnings, but we stop
 at the first error.
 
 Shouldn't we continue regardless of the end-user setting in order to
 show errors on other fields, too?

 I can make that happen, but please note that this is a change of
 behavior: we always stopped upon the first error.

Yeah, and we always died when we saw error, without giving users an
option to turn it down.  So?

 It was my intention not to change behavior in that way without a
 proper reason, and I saw none.

What would be the end-user experience if you stopped at the first
error?  You see an error, add an fsck.msg-id = ignore and rerun,
only to find another error and rinse and repeat?  Wouldn't you
rather see all of them and add the ignore to cover them in one go?

 I actually see a really good reason to *keep* the current behavior:
 one of the most prominent users of this code path is `git receive-pack
 --strict`. It is used heavily by GitHub to ensure at least a certain
 level of validity of pushed objects. Now, for this use case it is easy
 to see that you want to stop *as soon as an error was
 encountered*. And as GitHub sponsors my work on this patch series, my
 main aim is to support their use case.

While I understand that use case, I do not think stopping after
showing three more errors in a single commit would make much
difference in the bigger picture.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 09/19] fsck: Handle multiple authors in commits specially

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

   err = fsck_ident(buffer, commit-object, options);
   if (err)
   return err;
 + while (skip_prefix(buffer, author , buffer)) {
 + err = report(options, commit-object, 
 FSCK_MSG_MULTIPLE_AUTHORS, invalid format - multiple 'author' lines);
 + if (err)
 + return err;
 + err = fsck_ident(buffer, commit-object, options);
 + if (err)
 + return err;
 + }

Hmph, naively I would have expected that you wouldn't need an
extra call to fsck_ident() here, and instead would see something
like this:

author_count = 0;
while (skip_prefix(author )) {
author_count++;
... do the existing check as-is ...
}
if (author_count  1)
err |= report(missing author);
else if (author_count  1)
err |= report(multiple authors);

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] revision.c: Correctly dereference interesting_cache

2015-06-19 Thread Jeff King
On Fri, Jun 19, 2015 at 12:01:23PM -0700, Stefan Beller wrote:

 This was introduced at b6e8a3b5 (2015-04-17, limit_list: avoid
 quadratic behavior from still_interesting), which
 also introduced the check a few lines before, which already dereferences
 `interesting_cache`. So at this point `interesting_cache` is guaranteed to
 be not NULL. The code is called referencing the address of a local
 variable, so `interesting_cache` can actually never be NULL and trigger a
 segmentation fault by dereferencing it a few lines before this.

Yeah, I agree it can never be NULL here or we would have already
segfaulted. Thanks for digging into it.

 I think the right thing is to check for `*interesting_cache` as that
 can become NULL actually.

I don't think this is right. We have found the interesting commit, so we
want to write it into the cache unconditionally, not only if there is
nothing else in the cache (we know if we got here that either there was
no current cache item, or it already became UNINTERESTING).

I think it is simply a misguided defensive measure to make sure that the
caller passed in a cache slot to us. But there is only one caller, and
they always pass a cache, so the first part of the function was lazy and
not defensive.

   while (list) {
   struct commit *commit = list-item;
   list = list-next;
   if (commit-object.flags  UNINTERESTING)
   continue;
 - if (interesting_cache)
 + if (*interesting_cache)
   *interesting_cache = commit;

So I think the right solution is just to drop the conditional entirely.
The current code is not wrong (it is always a noop). What you have here
actually misbehaves; it does not update the cache slot when it has
become UNINTERESTING. That does not produce wrong results, but it loses
the benefit of the cache in some cases.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v2 12/12] refs: move the remaining ref module declarations to refs.h

2015-06-19 Thread Michael Haggerty
On 06/15/2015 08:35 PM, Jeff King wrote:
 On Mon, Jun 15, 2015 at 11:13:22AM -0700, Junio C Hamano wrote:
 
 @@ -78,15 +170,15 @@ typedef int each_ref_fn(const char *refname,
   * modifies the reference also returns a nonzero value to immediately
   * stop the iteration.
   */
 -extern int head_ref(each_ref_fn, void *);
 +extern int head_ref(each_ref_fn fn, void *cb_data);

 For example, between these two, what did we gain?

 Because of their types, it already was clear what the two parameters
 are in the original, without noisewords like fn (which obviously
 stands for a function, but that is clear due to each_ref_fn).
 
 I think the real benefit of naming parameters is that you can talk about
 fn and cb_data by name in the docstring[1]. Of course we do not do
 that here, so we could clearly wait until if-and-when we do so. But I
 do not think it is a big deal for our style guide to say we always name
 parameters in declarations, and to bring things in line there (though
 perhaps it should be a separate patch in that case).

I also like the idea that all parameters in declarations should be
named. It is true that sometimes the purpose of a parameter is easy to
guess from its type without a name. But giving them names make them
easier to talk about in docstrings, in commit messages, or even on the
mailing list when reviewing patches or discussing bugs.

Moreover, I don't see the value of wasting mental energy wondering
hmmm, in this case is the meaning sufficiently obvious from the type?
every time I write a function declaration. It is easier to have a
consistent policy of putting them in.

Finally, when I'm inventing new functions (which I admit isn't the case
here), I usually write the declaration first and then copy-paste it to
the C file as the first step in writing the definition. If I name the
parameters in the declaration then (a) I don't have to go back and edit
them at the definition and (b) if parameter names are needed later at
the declaration (e.g., for a docstring), I don't have to edit the
declaration again, cross-referencing back to the C file to make sure I
name them consistently.

I will split the add names in declarations changes into a separate
patch. Also, that way Junio can ignore the patch if he still disagrees :-)

 -extern int for_each_glob_ref_in(each_ref_fn, const char *pattern, const 
 char* prefix, void *);
 +extern int for_each_glob_ref_in(each_ref_fn fn, const char *pattern, const 
 char* prefix, void *cb_data);

 Likewise for addition of fn and cb_data.

 If you really want to make unrelated changes to this file, what you
 should fix is to update const char* prefix to const char *prefix
 ;-)
 
 IMHO they are in the same boat (style fixes), and I would be happy to
 see both improved. :)

I will fix this too.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v2 08/12] initial_ref_transaction_commit(): function for initial ref creation

2015-06-19 Thread Michael Haggerty
On 06/15/2015 08:53 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 +struct ref_transaction *t;
 +struct strbuf err = STRBUF_INIT;
 +
 +t = ref_transaction_begin(err);
 +if (!t)
 +die(err.buf);
 
 Yikes, and sorry for sending three messages without consolidating
 against the same patch, but
 
   die(%s, err.buf);
 
 because
 
 extern NORETURN void die(const char *err, ...) __attribute__((format (printf, 
 1, 2)));
 
 in its declaration causes -Werror=format-security to barf.
 
 Likewise for a few other instances of the same construct all in 
 the same file.

Thanks for catching this. I'll fix it in the re-roll (and also add that
gcc option to my config.mak for the future).

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Jokes aside, given that you should regularly repack your repository
 anyway, I do not think it is such a big downside that this mode
 misses a corrupt objects, and the 1 out of 4 kinds of objects,
 i.e. blobs, occupy major part of the repository storage, so this new
 mode probably makes sense.

 It actually makes a ton of sense as a kind of light-weight check ;-)

Yes, didn't I agree that it makes sense already?

 The meaning of quick that I was thinking of was not the same as
 fast, but more like just a quick check. As in quick  dirty ;-)

Sure.

 The point is not to ignore corrupt blobs, by the way, it is to check
 the connectivity only, and save substantial amounts of time doing so.

Yeah, I understand that; after all, that is exactly why I said make
sure it does not notice corrupt blobs is not a good test.

On the other hand, you are also checking that it notices a broken
tree (which makes it impossible to complete connectivity check),
which is a good test.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Junio C Hamano
Scott Schmit i.g...@comcast.net writes:

 On Fri, Jun 19, 2015 at 01:53:01PM -0700, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
  Can you think of a name for the option that is as short as `--quick`
  but means the same as `--connectivity-only`?
 
 No I can't.  I think `--connectivity-only` is a very good name that
 is unfortunately a mouthful, I agree that we need a name that is as
 short as `--x` that means the same as `--connectivity-only`.  I
 do not think `--quick` is that word; it does not mean such a thing.

 How about `--linkage` or `--links`?

Even though link may be shorter than connectivity, the real
difficulty is to come up with a phrase that conveys the -only part
of the fully spelled name, which is more important, without spending
5 letters that take to say -only.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Alexander Kuleshov
Hello Jeff and Junio,

Thank you for feedback and help. I think also I need to add yet another test
which tests case when configuration option is set and -o passed.

I'll make changes and resend the patch.

Thank you.


2015-06-19 10:14 GMT+06:00 Jeff King p...@peff.net:
 On Thu, Jun 18, 2015 at 02:46:36PM -0700, Junio C Hamano wrote:

  If I were designing from scratch, I would consider making -o - output
  to stdout, and letting it override a previous -o (or vice versa). We
  could still do that (and make --stdout an alias for that), but I don't
  know if it is worth the trouble (it does change the behavior for anybody
  who wanted a directory called -, but IMHO it is more likely to save
  somebody a headache than create one).

 I agree with later -o should override an earlier one, but I do not
 necessarily agree with '-o -' should be --stdout, for a simple
 reason that -o foo is not --stdout foo.

 Good point. At any rate, that was all in my designing from scratch
 hypothetical, so it is doubly not worth considering.

 Perhaps something like this to replace builtin/ part of Alexander's
 patch?
 [...]
 @@ -1337,6 +1342,9 @@ int cmd_format_patch(int argc, const char **argv, 
 const char *prefix)
   die (_(--subject-prefix and -k are mutually exclusive.));
   rev.preserve_subject = keep_subject;

 + if (!output_directory  !use_stdout)
 + output_directory = config_output_directory;
 +

 Yeah, I think that is the sanest way to do it given the constraints.

 -Peff

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-19 Thread Jeff King
On Fri, Jun 19, 2015 at 01:32:23AM -0400, Jeff King wrote:

 The least-work thing may actually be teaching the separate
 diff-highlight script to strip out the colorizing and re-add it by
 offset.

OK, here is that patch. It seems to work OK, and should preserve
existing colors produced by git anywhere within the input.

It should also serve as a stepping stone to using the same technique
inside git itself. I am not planning to do that anytime soon, though. I
took a look at the word-diff code earlier today, and my brain partially
melted. I'm sure it's possible, but I've already spent enough time and
brain cells on this for now.

I think this patch does need some more polishing, or more heuristics,
though. Some of the output is really fantastic. Here's a simple one from
69f9a6e54:

  -* link:v2.4.2/git.html[documentation for release 2.4.2]
  +* link:v2.4.3/git.html[documentation for release 2.4.3]

The angle brackets show what is highlighted, but that really doesn't do
it justice. I encourage people to apply the patch and

  git show --color $commit |
  contrib/diff-highlight/diff-highlight

to see for yourselves (I'm hesitant to assume that sending ANSI color
codes via email will end up readable for anyone).

What's interesting about that example (that didn't work before) is that
we can find multiple changes and highlight them independently (whereas
before, we would have highlighted everything between them, too).

Here's a better code example from 9cc2b07a7:

  -int parse_commit(struct commit *item)
  +int parse_commit_gently(struct commit *item, int quiet_on_missing)

It highlights the change in the function name and the matching change to
the parameter list. And from a few lines below in the same diff:

  -   return error(Could not read %s,
  +   return quiet_on_missing ? -1 :
  +   error(Could not read %s,

This demonstrates it working on multiple lines, and in particular one
where the matching content actually crosses a line boundary.

So that's all the good news. Here's some bad news, from f0e7f11d054:

  -   return offset1 - offset2;
  +   return offset1  offset2 ? -1 :
  +  offset1  offset2 ?  1 :
  +  0;

I don't think adding angle brackets to this one would do it justice, but
you can look for yourself. What happens is that we match offset1, but
then the - from the pre-image matches the first character of -1 from
the post-image. And all of  offset2 ?  is highlighted, then not -,
then 1 :, and so on. The result is rather confusing to read.

I think that can probably be fixed by smarter tokenizing of what we feed
to the per-hunk diff (i.e., -1 should be a full token).

It also does a weird thing where the offset1 from the second
post-image line is _not_ highlighted, and it should be (even though it's
ugly, it is the correct thing to do). I think what is happening is that
git's existing colorizing is getting in the way. It does not say turn
on green; OK, now turn off green. It says turn on green; OK, now
reset all colors. So at the beginning of each post-image line, we will
accidentally turn off any highlighting carried over from the previous
line. That's probably OK in general, because highlighting across lines
is a good indication that it's not helpful. But it is a little hacky.

And here's some more bad news. If you look at the diff for this
patch itself, it's terribly unreadable (the regular diff already is
pretty bad, but the highlights make it much worse). There are big chunks
where we take away 5 or 10 lines from the old code, and replace them
with totally unrelated lines. We end up highlighting almost the entire
thing, except for spaces and punctuation.

We might be able to solve this with a percentage heuristic similar to
the one Patrick proposed. It's not really interesting to highlight
unless we're doing it on probably 20% or less of the diff (where 20% is
a number I just made up).

I also have a feeling that we might get better results if we don't feed
the delimiters into the diff algorithm. I think the word-diff code
actually tokenizes foo bar baz into the list (foo, bar, baz) and diffs
only that. I think it then _loses_ the original whitespace, but here we
would need to record and restore it. We might be able to use the same
mechanism as for stripping and restoring the colors. I'm not sure.

Anyway, here's the patch. It's rather messy, so I'd recommend just
reading the resulting code; I wish there was a good way to convince the
diff engine to produce larger hunks.

---
 contrib/diff-highlight/diff-highlight | 202 +++---
 1 file changed, 86 insertions(+), 116 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index ffefc31..1525ccc 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -3,6 +3,7 @@
 use 5.008;
 use warnings FATAL = 'all';
 use strict;
+use Algorithm::Diff;
 
 # Highlight 

Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-19 Thread Matthieu Moy
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 Matthieu Moy matthieu@grenoble-inp.fr writes:

 Cool. Then almost all the work is done to get an automated test. Next
 step would be to add the tests itself in the code. I would do that by
 adding a hidden --selfcheck option to git send-email that would compare
 Mail::Address-parse($string); and split_addrs($string); for all your
 testcases, and die if they do not match. Then calling it from the
 testsuite would be trivial.

 Ok, are there such --selfcheck options elsewhere?

Not as far as I know.

 If I understand it right, you want to put the tests inside the
 git-send-email script. I don't feel really good about that but I guess
 it's hard to test it otherwise...

Hmm, actually there is, I didn't look at the right places yesterday.
git-send-email.perl already does 'use Git;', and there's already a set
of unit-tests for Git.pm: t9700-perl-git.sh, which calls perl
$TEST_DIRECTORY/t9700/test.pl.

So, you can just add your code as a function in Git.pm and unit-tests in
t/t9700/test.pl.

 Also what will we do with the failing tests? Just discard them? I
 think there's two sort of failing test:

  - When output provided by parse_address_ without Mail::Address
is better or has no impact at all on the code. Such as:

I'm not sure we can be better as long as we do use Mail::Address when
available. Any difference is potentially harmfull for the user because
it means that Git will have different behavior on different machines.

Perhaps this is an argument to use your version unconditionally and drop
Mail::Address actually.

But you can still test that with

  is(parse_address_(...), Doe, Jane, description);

(possibly not calling Mail::Address)

http://search.cpan.org/~exodist/Test-Simple-1.001014/lib/Test/More.pm

The cases where Mail::Address and your version give the same result can
be tested with a foreach loop calling

  is(parse_address_(...), Mail::Address(...), ...);

  - When we don't really care about the output, because the user entry
is wrong, and we just expect the script to be aborted somehow... We
don't need to test that.

... but if you already have the tests, you can keep them as known
failure.

See the TODO: BLOCK section of the doc of Test::More.

 I can do that on top of your series if you don't have time.

 Time will become a problem soon, but I think I can handle it unless
 you really want to do it !

If you have time, just do it.

Thanks,

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report: Unhandled Exception

2015-06-19 Thread Johannes Schindelin
Hi Gary,

On 2015-06-11 21:21, Gary England wrote:

 Using git version 1.9.2-preview20140411, in Git Bash for Windows,
 performing a git pull --rebase, received an unhandled exception.

Please note that the newest 1.9.x release is 1.9.5. Could you re-test with that 
version, please? Or maybe you want to give the 2nd release candidate of Git for 
Windows 2.x a try: https://git-for-windows.github.io/#download

Ciao,
Johannes

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 09/19] fsck: Handle multiple authors in commits specially

2015-06-19 Thread Johannes Schindelin
This problem has been detected in the wild, and is the primary reason
to introduce an option to demote certain fsck errors to warnings. Let's
offer to ignore this particular problem specifically.

Technically, we could handle such repositories by setting
receive.fsck.msg-id to missingcommitter=warn, but that could hide
missing tree objects in the same commit because we cannot continue
verifying any commit object after encountering a missing committer line,
while we can continue in the case of multiple author lines.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/fsck.c b/fsck.c
index 9fe9f48..0cfa4d0 100644
--- a/fsck.c
+++ b/fsck.c
@@ -38,6 +38,7 @@
FUNC(MISSING_TREE, ERROR) \
FUNC(MISSING_TYPE, ERROR) \
FUNC(MISSING_TYPE_ENTRY, ERROR) \
+   FUNC(MULTIPLE_AUTHORS, ERROR) \
FUNC(NUL_IN_HEADER, ERROR) \
FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
FUNC(TREE_NOT_SORTED, ERROR) \
@@ -571,6 +572,14 @@ static int fsck_commit_buffer(struct commit *commit, const 
char *buffer,
err = fsck_ident(buffer, commit-object, options);
if (err)
return err;
+   while (skip_prefix(buffer, author , buffer)) {
+   err = report(options, commit-object, 
FSCK_MSG_MULTIPLE_AUTHORS, invalid format - multiple 'author' lines);
+   if (err)
+   return err;
+   err = fsck_ident(buffer, commit-object, options);
+   if (err)
+   return err;
+   }
if (!skip_prefix(buffer, committer , buffer))
return report(options, commit-object, 
FSCK_MSG_MISSING_COMMITTER, invalid format - expected 'committer' line);
err = fsck_ident(buffer, commit-object, options);
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 13/19] fsck: Optionally ignore specific fsck issues completely

2015-06-19 Thread Johannes Schindelin
An fsck issue in a legacy repository might be so common that one would
like not to bother the user with mentioning it at all. With this change,
that is possible by setting the respective message type to ignore.

This change abuses the missingemail=warn test to verify that ignore
is also accepted and works correctly. And while at it, it makes sure
that multiple options work, too (they are passed to unpack-objects or
index-pack as a comma-separated list via the --strict=... command-line
option).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 5 +
 fsck.h  | 1 +
 t/t5504-fetch-receive-strict.sh | 9 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index a4fbce3..b728646 100644
--- a/fsck.c
+++ b/fsck.c
@@ -137,6 +137,8 @@ static int parse_msg_type(const char *str, int len)
return FSCK_ERROR;
else if (!substrcmp(str, len, warn))
return FSCK_WARN;
+   else if (!substrcmp(str, len, ignore))
+   return FSCK_IGNORE;
else
die(Unknown fsck message type: '%.*s',
len, str);
@@ -220,6 +222,9 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   if (msg_type == FSCK_IGNORE)
+   return 0;
+
if (msg_type == FSCK_FATAL)
msg_type = FSCK_ERROR;
 
diff --git a/fsck.h b/fsck.h
index 738c9df..7e49372 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,6 +3,7 @@
 
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
+#define FSCK_IGNORE 3
 
 struct fsck_options;
 
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 0d64229..cb077b7 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,7 +133,14 @@ test_expect_success 'push with 
receive.fsck.missingemail=warn' '
git --git-dir=dst/.git config \
receive.fsck.missingemail warn 
git push --porcelain dst bogus act 21 
-   grep missingemail act
+   grep missingemail act 
+   git --git-dir=dst/.git branch -D bogus 
+   git  --git-dir=dst/.git config --add \
+   receive.fsck.missingemail ignore 
+   git  --git-dir=dst/.git config --add \
+   receive.fsck.baddate warn 
+   git push --porcelain dst bogus act 21 
+   test_must_fail grep missingemail act
 '
 
 test_expect_success \
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 10/19] fsck: Make fsck_tag() warn-friendly

2015-06-19 Thread Johannes Schindelin
When fsck_tag() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Just like fsck_commit(), there are certain problems that could hide other
issues with the same tag object. For example, if the 'type' line is not
encountered in the correct position, the 'tag' line – if there is any –
would not be handled at all.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 0cfa4d0..21e3052 100644
--- a/fsck.c
+++ b/fsck.c
@@ -640,7 +640,8 @@ static int fsck_tag_buffer(struct tag *tag, const char 
*data,
}
if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
ret = report(options, tag-object, FSCK_MSG_BAD_OBJECT_SHA1, 
invalid 'object' line format - bad sha1);
-   goto done;
+   if (ret)
+   goto done;
}
buffer += 41;
 
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 3/3] Add filter-objects command

2015-06-19 Thread Charles Bailey
From: Charles Bailey cbaile...@bloomberg.net

filter-objects is a command to scan all objects in the object database
for the repository and print the ids of those which match the given
criteria.

The current supported criteria are object type and the minimum size of
the object.

The guiding use case is to scan repositories quickly for large objects
which may cause performance issues for users. The list of objects can
then be used to guide some future remediating action.

Signed-off-by: Charles Bailey cbaile...@bloomberg.net
---
 Documentation/git-filter-objects.txt | 38 +++
 Makefile |  1 +
 builtin.h|  1 +
 builtin/filter-objects.c | 73 
 git.c|  1 +
 t/t8100-filter-objects.sh| 67 +
 6 files changed, 181 insertions(+)
 create mode 100644 Documentation/git-filter-objects.txt
 create mode 100644 builtin/filter-objects.c
 create mode 100755 t/t8100-filter-objects.sh

diff --git a/Documentation/git-filter-objects.txt 
b/Documentation/git-filter-objects.txt
new file mode 100644
index 000..c10ca01
--- /dev/null
+++ b/Documentation/git-filter-objects.txt
@@ -0,0 +1,38 @@
+git-filter-objects(1)
+=
+
+NAME
+
+git-filter-objects - Scan through all objects in the repository and print those
+matching a given filter
+
+
+SYNOPSIS
+
+[verse]
+'git filter-objects' [-t type | --type=type] [--min-size=size]
+   [-v|--verbose]
+
+DESCRIPTION
+---
+Scans all objects in a repository - including any unreachable objects - and
+print out the ids of all matching objects.  If `--verbose` is specified then
+the object type and size is printed out as well as its id.
+
+OPTIONS
+---
+-t::
+--type::
+   Only list objects whose type matches type.
+
+--min-size::
+   Only list objects whose size exceeds size bytes.
+
+-v::
+--verbose::
+   Output in the followin format instead of just printing object ids:
+   sha1 SP type SP size
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 149f1c7..a7c017f 100644
--- a/Makefile
+++ b/Makefile
@@ -842,6 +842,7 @@ BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/fast-export.o
 BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
+BUILTIN_OBJS += builtin/filter-objects.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
 BUILTIN_OBJS += builtin/fsck.o
diff --git a/builtin.h b/builtin.h
index b87df70..5a15693 100644
--- a/builtin.h
+++ b/builtin.h
@@ -62,6 +62,7 @@ extern int cmd_diff_tree(int argc, const char **argv, const 
char *prefix);
 extern int cmd_fast_export(int argc, const char **argv, const char *prefix);
 extern int cmd_fetch(int argc, const char **argv, const char *prefix);
 extern int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
+extern int cmd_filter_objects(int argc, const char **argv, const char *prefix);
 extern int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 extern int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
 extern int cmd_format_patch(int argc, const char **argv, const char *prefix);
diff --git a/builtin/filter-objects.c b/builtin/filter-objects.c
new file mode 100644
index 000..c40d621
--- /dev/null
+++ b/builtin/filter-objects.c
@@ -0,0 +1,73 @@
+#include cache.h
+#include builtin.h
+#include revision.h
+#include parse-options.h
+
+#include stdio.h
+
+static int req_type;
+static unsigned long min_size;
+static int verbose;
+
+static int check_object(const unsigned char *sha1)
+{
+   unsigned long size;
+   int type = sha1_object_info(sha1, size);
+
+   if (type  0)
+   return -1;
+
+   if (size = min_size  (!req_type || type == req_type)) {
+   if (verbose)
+   printf(%s %s %lu\n, sha1_to_hex(sha1), 
typename(type), size);
+   else
+   printf(%s\n, sha1_to_hex(sha1));
+   }
+
+   return 0;
+}
+
+static int check_loose_object(const unsigned char *sha1,
+ const char *path,
+ void *data)
+{
+   return check_object(sha1);
+}
+
+static int check_packed_object(const unsigned char *sha1,
+  struct packed_git *pack,
+  uint32_t pos,
+  void *data)
+{
+   return check_object(sha1);
+}
+
+static char *opt_type;
+static struct option builtin_filter_objects_options[] = {
+   OPT_ULONG(0, min-size, min_size, minimum size of object to show),
+   OPT_STRING('t', type, opt_type, NULL, type of objects to show),
+   OPT__VERBOSE(verbose, show object type and size),
+   OPT_END()
+};
+
+int cmd_filter_objects(int argc, const char **argv, const char *prefix)
+{
+   struct packed_git *p;
+
+   argc = 

Re: [PATCH/WIP v3 03/31] am: implement skeletal builtin am

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 4:26 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:
 @@ -0,0 +1,28 @@
 +/*
 + * Builtin git am
 + *
 + * Based on git-am.sh by Junio C Hamano.
 + */
 +#include cache.h
 +#include builtin.h
 +#include exec_cmd.h
 +
 +int cmd_am(int argc, const char **argv, const char *prefix)
 +{
 + /*
 +  * FIXME: Once all the features of git-am.sh have been re-implemented
 +  * in builtin/am.c, this preamble can be removed.
 +  */

 It's not broken, so FIXME is not quite appropriate (and that is
 why I sent you NEEDSWORK).

OK.

 Also mention that the entry in the
 commands[] array needs RUN_SETUP | NEED_WORK_TREE added, I think.

OK.

 + if (!getenv(_GIT_USE_BUILTIN_AM)) {
 + const char *path = mkpath(%s/git-am, git_exec_path());
 +
 + if (sane_execvp(path, (char **)argv)  0)
 + die_errno(could not exec %s, path);
 + } else {
 + prefix = setup_git_directory();
 + trace_repo_setup(prefix);
 + setup_work_tree();
 + }
 +
 + return 0;
 +}
 diff --git a/git.c b/git.c
 index e7a7713..a671535 100644
 --- a/git.c
 +++ b/git.c
 @@ -370,6 +370,7 @@ static int run_builtin(struct cmd_struct *p, int argc, 
 const char **argv)

  static struct cmd_struct commands[] = {
   { add, cmd_add, RUN_SETUP | NEED_WORK_TREE },
 + { am, cmd_am, NO_SETUP },

 NO_SETUP is for things like init and clone that start without a
 repository and then work in the one that they create.  I think
 imitating archive or diff is more appropriate.

Ah OK. I thought that handle_alias() in git.c would chdir() and set
GIT_DIR because it called setup_git_directory_gently(), and thus
requires NO_SETUP to restore the initial environment, but turns out it
chdir()s back to the original directory, and sets GIT_DIR
appropriately, so we're OK.

Thanks,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 00/19] Make git-pull a builtin

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 4:13 AM, Junio C Hamano gits...@pobox.com wrote:
 I didn't look carefully, but does that mean 04/19 has the what if
 you start from a subdirectory and are still using the scripted one?
 issue we discussed recently for am?

It does, but git-pull.sh does not care about the original working directory, no?

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Alexander Kuleshov
2015-06-19 3:46 GMT+06:00 Junio C Hamano gits...@pobox.com:
 I agree with later -o should override an earlier one, but I do not
 necessarily agree with '-o -' should be --stdout, for a simple
 reason that -o foo is not --stdout foo.

 Perhaps something like this to replace builtin/ part of Alexander's
 patch?

 @@ -1337,6 +1342,9 @@ int cmd_format_patch(int argc, const char **argv, const 
 char *prefix)
 die (_(--subject-prefix and -k are mutually exclusive.));
 rev.preserve_subject = keep_subject;

 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;
 +


But there is following condition above:

 if (!use_stdout)
  output_directory = set_outdir(prefix, output_directory);

After which output_directory will be ./ everytime and


 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;
 +


will not work here. What if we remove if (output_directory) {}
and update it as:

if (!use_stdout) {
if (!config_output_directory  !output_directory)
output_directory = set_outdir(prefix, output_directory);
else if (config_output_directory)
output_directory = config_output_directory;

if (mkdir(output_directory, 0777)  0  errno != EEXIST)
die_errno(_(Could not create directory '%s'),
  output_directory);
}
else
setup_pager();

?
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 03/19] fsck: Provide a function to parse fsck message IDs

2015-06-19 Thread Johannes Schindelin
These functions will be used in the next commits to allow the user to
ask fsck to handle specific problems differently, e.g. demoting certain
errors to warnings. The upcoming `fsck_set_msg_types()` function has to
handle partial strings because we would like to be able to parse, say,
'missingemail=warn,missingtaggerentry=warn' command line parameters
(which will be passed by receive-pack to index-pack and unpack-objects).

To make the parsing robust, we generate strings from the enum keys, and
using these keys, we match up strings without dashes case-insensitively
to the corresponding enum values.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 30 --
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index ab24618..da5717c 100644
--- a/fsck.c
+++ b/fsck.c
@@ -63,15 +63,41 @@ enum fsck_msg_id {
 };
 #undef MSG_ID
 
-#define MSG_ID(id, msg_type) { FSCK_##msg_type },
+#define STR(x) #x
+#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type },
 static struct {
+   const char *id_string;
int msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
FOREACH_MSG_ID(MSG_ID)
-   { -1 }
+   { NULL, -1 }
 };
 #undef MSG_ID
 
+static int parse_msg_id(const char *text, int len)
+{
+   int i, j;
+
+   if (len  0)
+   len = strlen(text);
+
+   for (i = 0; i  FSCK_MSG_MAX; i++) {
+   const char *key = msg_id_info[i].id_string;
+   /* match id_string case-insensitively, without underscores. */
+   for (j = 0; j  len; j++) {
+   char c = *(key++);
+   if (c == '_')
+   c = *(key++);
+   if (toupper(text[j]) != c)
+   break;
+   }
+   if (j == len  !*key)
+   return i;
+   }
+
+   return -1;
+}
+
 static int fsck_msg_type(enum fsck_msg_id msg_id,
struct fsck_options *options)
 {
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-19 Thread Johannes Schindelin
There are legacy repositories out there whose older commits and tags
have issues that prevent pushing them when 'receive.fsckObjects' is set.
One real-life example is a commit object that has been hand-crafted to
list two authors.

Often, it is not possible to fix those issues without disrupting the
work with said repositories, yet it is still desirable to perform checks
by setting `receive.fsckObjects = true`. This commit is the first step
to allow demoting specific fsck issues to mere warnings.

The `fsck_set_msg_types()` function added by this commit parses a list
of settings in the form:

missingemail=warn,badname=warn,...

Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by
git fsck so far, but other call paths (e.g. git index-pack --strict)
error out *always* no matter what type was specified. Therefore, we need
to take extra care to set all message types to FSCK_ERROR by default in
those cases.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 82 +++---
 fsck.h | 10 ++--
 2 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/fsck.c b/fsck.c
index da5717c..8c3caff 100644
--- a/fsck.c
+++ b/fsck.c
@@ -103,13 +103,85 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 {
int msg_type;
 
-   msg_type = msg_id_info[msg_id].msg_type;
-   if (options-strict  msg_type == FSCK_WARN)
-   msg_type = FSCK_ERROR;
+   assert(msg_id = 0  msg_id  FSCK_MSG_MAX);
+
+   if (options-msg_type)
+   msg_type = options-msg_type[msg_id];
+   else {
+   msg_type = msg_id_info[msg_id].msg_type;
+   if (options-strict  msg_type == FSCK_WARN)
+   msg_type = FSCK_ERROR;
+   }
 
return msg_type;
 }
 
+static inline int substrcmp(const char *string, int len, const char *match)
+{
+   int match_len = strlen(match);
+   if (match_len != len)
+   return -1;
+   return memcmp(string, match, len);
+}
+
+static int parse_msg_type(const char *str, int len)
+{
+   if (len  0)
+   len = strlen(str);
+
+   if (!substrcmp(str, len, error))
+   return FSCK_ERROR;
+   else if (!substrcmp(str, len, warn))
+   return FSCK_WARN;
+   else
+   die(Unknown fsck message type: '%.*s',
+   len, str);
+}
+
+void fsck_set_msg_type(struct fsck_options *options,
+   const char *msg_id, int msg_id_len,
+   const char *msg_type, int msg_type_len)
+{
+   int id = parse_msg_id(msg_id, msg_id_len), type;
+
+   if (id  0)
+   die(Unhandled message id: %.*s, msg_id_len, msg_id);
+   type = parse_msg_type(msg_type, msg_type_len);
+
+   if (!options-msg_type) {
+   int i;
+   int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX);
+   for (i = 0; i  FSCK_MSG_MAX; i++)
+   msg_type[i] = fsck_msg_type(i, options);
+   options-msg_type = msg_type;
+   }
+
+   options-msg_type[id] = type;
+}
+
+void fsck_set_msg_types(struct fsck_options *options, const char *values)
+{
+   while (*values) {
+   int len = strcspn(values,  ,|), equal;
+
+   if (!len) {
+   values++;
+   continue;
+   }
+
+   for (equal = 0; equal  len; equal++)
+   if (values[equal] == '=' || values[equal] == ':')
+   break;
+
+   if (equal == len)
+   die(Missing '=': '%.*s', len, values);
+
+   fsck_set_msg_type(options, values, equal,
+   values + equal + 1, len - equal - 1);
+   values += len;
+   }
+}
+
 __attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options, struct object *object,
enum fsck_msg_id id, const char *fmt, ...)
@@ -599,6 +671,10 @@ int fsck_object(struct object *obj, void *data, unsigned 
long size,
 
 int fsck_error_function(struct object *obj, int msg_type, const char *message)
 {
+   if (msg_type == FSCK_WARN) {
+   warning(object %s: %s, sha1_to_hex(obj-sha1), message);
+   return 0;
+   }
error(object %s: %s, sha1_to_hex(obj-sha1), message);
return 1;
 }
diff --git a/fsck.h b/fsck.h
index f6f268a..edb4540 100644
--- a/fsck.h
+++ b/fsck.h
@@ -6,6 +6,11 @@
 
 struct fsck_options;
 
+void fsck_set_msg_type(struct fsck_options *options,
+   const char *msg_id, int msg_id_len,
+   const char *msg_type, int msg_type_len);
+void fsck_set_msg_types(struct fsck_options *options, const char *values);
+
 /*
  * callback function for fsck_walk
  * type is the expected type of the object or OBJ_ANY
@@ -25,10 +30,11 @@ struct fsck_options {
fsck_walk_func walk;
fsck_error 

[PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-19 Thread Johannes Schindelin
This option avoids unpacking each and all objects, and just verifies the
connectivity. In particular with large repositories, this speeds up the
operation, at the expense of missing corrupt blobs and ignoring
unreachable objects, if any.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/git-fsck.txt |  7 ++-
 builtin/fsck.c |  7 ++-
 t/t1450-fsck.sh| 22 ++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index 25c431d..b98fb43 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 
 [verse]
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
-[--[no-]full] [--strict] [--verbose] [--lost-found]
+[--[no-]full] [--quick] [--strict] [--verbose] [--lost-found]
 [--[no-]dangling] [--[no-]progress] [object*]
 
 DESCRIPTION
@@ -60,6 +60,11 @@ index file, all SHA-1 references in `refs` namespace, and 
all reflogs
object pools.  This is now default; you can turn it off
with --no-full.
 
+--quick::
+   Check only the connectivity of tags, commits and tree objects. By
+   avoiding to unpack blobs, this speeds up the operation, at the
+   expense of missing corrupt objects.
+
 --strict::
Enable more strict checking, namely to catch a file mode
recorded with g+w bit set, which was created by older
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 6de9f3e..75fcb5f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -23,6 +23,7 @@ static int show_tags;
 static int show_unreachable;
 static int include_reflogs = 1;
 static int check_full = 1;
+static int quick;
 static int check_strict;
 static int keep_cache_objects;
 static struct fsck_options fsck_walk_options = FSCK_OPTIONS_DEFAULT;
@@ -181,6 +182,8 @@ static void check_reachable_object(struct object *obj)
if (!(obj-flags  HAS_OBJ)) {
if (has_sha1_pack(obj-sha1))
return; /* it is in pack - forget about it */
+   if (quick  has_sha1_file(obj-sha1))
+   return;
printf(missing %s %s\n, typename(obj-type), 
sha1_to_hex(obj-sha1));
errors_found |= ERROR_REACHABLE;
return;
@@ -623,6 +626,7 @@ static struct option fsck_opts[] = {
OPT_BOOL(0, cache, keep_cache_objects, N_(make index objects head 
nodes)),
OPT_BOOL(0, reflogs, include_reflogs, N_(make reflogs head nodes 
(default))),
OPT_BOOL(0, full, check_full, N_(also consider packs and alternate 
objects)),
+   OPT_BOOL(0, quick, quick, N_(check only connectivity)),
OPT_BOOL(0, strict, check_strict, N_(enable more strict checking)),
OPT_BOOL(0, lost-found, write_lost_and_found,
N_(write dangling objects in 
.git/lost-found)),
@@ -659,7 +663,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
git_config(fsck_config, NULL);
 
fsck_head_link();
-   fsck_object_dir(get_object_directory());
+   if (!quick)
+   fsck_object_dir(get_object_directory());
 
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt-next) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 922c346..2863a8a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -431,4 +431,26 @@ test_expect_success 'fsck notices ref pointing to missing 
tag' '
test_must_fail git -C missing fsck
 '
 
+test_expect_success 'fsck --quick' '
+   rm -rf quick 
+   git init quick 
+   (
+   cd quick 
+   touch empty 
+   git add empty 
+   test_commit empty 
+   empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 
+   rm -f $empty 
+   echo invalid $empty 
+   test_must_fail git fsck --strict 
+   git fsck --strict --quick 
+   tree=$(git rev-parse HEAD:) 
+   suffix=${tree#??} 
+   tree=.git/objects/${tree%$suffix}/$suffix 
+   rm -f $tree 
+   echo invalid $tree 
+   test_must_fail git fsck --strict --quick
+   )
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 15/19] fsck: Document the new receive.fsck.msg-id options

2015-06-19 Thread Johannes Schindelin
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3e37b93..306ab7a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2205,6 +2205,20 @@ receive.fsckObjects::
Defaults to false. If not set, the value of `transfer.fsckObjects`
is used instead.
 
+receive.fsck.msg-id::
+   When `receive.fsckObjects` is set to true, errors can be switched
+   to warnings and vice versa by configuring the `receive.fsck.msg-id`
+   setting where the `msg-id` is the fsck message ID and the value
+   is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
+   the error/warning with the message ID, e.g. missingemail: invalid
+   author/committer line - missing email means that setting
+   `receive.fsck.missingemail = ignore` will hide that issue.
++
+This feature is intended to support working with legacy repositories
+which would not pass pushing when `receive.fsckObjects = true`, allowing
+the host to accept repositories with certain known issues but still catch
+other issues.
+
 receive.unpackLimit::
If the number of objects received in a push is below this
limit then the objects will be unpacked into loose object
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 19/19] fsck: support ignoring objects in `git fsck` via fsck.skiplist

2015-06-19 Thread Johannes Schindelin
Identical to support in `git receive-pack for the config option
`receive.fsck.skiplist`, we now support ignoring given objects in
`git fsck` via `fsck.skiplist` altogether.

This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt |  7 +++
 builtin/fsck.c   | 10 ++
 2 files changed, 17 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5f45115..5aba63a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1261,6 +1261,13 @@ that setting `fsck.missingemail = ignore` will hide that 
issue.
 This feature is intended to support working with legacy repositories
 which cannot be repaired without disruptive changes.
 
+fsck.skipList::
+   The path to a sorted list of object names (i.e. one SHA-1 per
+   line) that are known to be broken in a non-fatal way and should
+   be ignored. This feature is useful when an established project
+   should be accepted despite early commits containing errors that
+   can be safely ignored such as invalid committer email addresses.
+
 gc.aggressiveDepth::
The depth parameter used in the delta compression
algorithm used by 'git gc --aggressive'.  This defaults
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 75fcb5f..ce538ac 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -54,6 +54,16 @@ static int fsck_config(const char *var, const char *value, 
void *cb)
return 0;
}
 
+   if (strcmp(var, fsck.skiplist) == 0) {
+   const char *path = is_absolute_path(value) ?
+   value : git_path(%s, value);
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(sb, skiplist=%s, path);
+   fsck_set_msg_types(fsck_obj_options, sb.buf);
+   strbuf_release(sb);
+   return 0;
+   }
+
return git_default_config(var, value, cb);
 }
 
-- 
2.3.1.windows.1.9.g8c01ab4


--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 18/19] fsck: git receive-pack: support excluding objects from fsck'ing

2015-06-19 Thread Johannes Schindelin
The optional new config option `receive.fsck.skiplist` specifies the path
to a file listing the names, i.e. SHA-1s, one per line, of objects that
are to be ignored by `git receive-pack` when `receive.fsckObjects = true`.

This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).

The intended use case is for server administrators to inspect objects
that are reported by `git push` as being too problematic to enter the
repository, and to add the objects' SHA-1 to a (preferably sorted) file
when the objects are legitimate, i.e. when it is determined that those
problematic objects should be allowed to enter the server.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt|  7 ++
 builtin/receive-pack.c  |  8 ++
 fsck.c  | 54 +
 fsck.h  |  1 +
 t/t5504-fetch-receive-strict.sh | 12 +
 5 files changed, 82 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 41fd460..5f45115 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2230,6 +2230,13 @@ which would not pass pushing when `receive.fsckObjects = 
true`, allowing
 the host to accept repositories with certain known issues but still catch
 other issues.
 
+receive.fsck.skipList::
+   The path to a sorted list of object names (i.e. one SHA-1 per
+   line) that are known to be broken in a non-fatal way and should
+   be ignored. This feature is useful when an established project
+   should be accepted despite early commits containing errors that
+   can be safely ignored such as invalid committer email addresses.
+
 receive.unpackLimit::
If the number of objects received in a push is below this
limit then the objects will be unpacked into loose object
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 3afe8f8..80574f9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -117,6 +117,14 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
+   if (strcmp(var, receive.fsck.skiplist) == 0) {
+   const char *path = is_absolute_path(value) ?
+   value : git_path(%s, value);
+   strbuf_addf(fsck_msg_types, %cskiplist=%s,
+   fsck_msg_types.len ? ',' : '=', path);
+   return 0;
+   }
+
if (skip_prefix(var, receive.fsck., var)) {
if (is_valid_msg_type(var, value))
strbuf_addf(fsck_msg_types, %c%s=%s,
diff --git a/fsck.c b/fsck.c
index dedad01..f80b508 100644
--- a/fsck.c
+++ b/fsck.c
@@ -8,6 +8,7 @@
 #include fsck.h
 #include refs.h
 #include utf8.h
+#include sha1-array.h
 
 #define FSCK_FATAL -1
 #define FSCK_INFO -2
@@ -122,6 +123,43 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
return msg_type;
 }
 
+static void init_skiplist(struct fsck_options *options, const char *path)
+{
+   static struct sha1_array skiplist = SHA1_ARRAY_INIT;
+   int sorted, fd;
+   char buffer[41];
+   unsigned char sha1[20];
+
+   if (options-skiplist)
+   sorted = options-skiplist-sorted;
+   else {
+   sorted = 1;
+   options-skiplist = skiplist;
+   }
+
+   fd = open(path, O_RDONLY);
+   if (fd  0)
+   die(Could not open skip list: %s, path);
+   for (;;) {
+   int result = read_in_full(fd, buffer, sizeof(buffer));
+   if (result  0)
+   die_errno(Could not read '%s', path);
+   if (!result)
+   break;
+   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n')
+   die(Invalid SHA-1: %s, buffer);
+   sha1_array_append(skiplist, sha1);
+   if (sorted  skiplist.nr  1 
+   hashcmp(skiplist.sha1[skiplist.nr - 2],
+   sha1)  0)
+   sorted = 0;
+   }
+   close(fd);
+
+   if (sorted)
+   skiplist.sorted = 1;
+}
+
 static inline int substrcmp(const char *string, int len, const char *match)
 {
int match_len = strlen(match);
@@ -193,6 +231,18 @@ void fsck_set_msg_types(struct fsck_options *options, 
const char *values)
if (values[equal] == '=' || values[equal] == ':')
break;
 
+   if (!substrcmp(values, equal, skiplist)) {
+   char *path = xstrndup(values + equal + 1,
+   len - equal - 1);
+
+   if (equal == len)
+   die(skiplist requires a path);
+   init_skiplist(options, 

Re: [msysGit] crlf issues

2015-06-19 Thread Johannes Schindelin
Hi Rusi,

Cc:ing the main Git list, as some of this has been discussed at the GitMerge in 
Paris, and some participants of that discussion might want to join this thread.
 
On 2015-06-19 14:55, Rusi Mody wrote:
 Trying to setup git.
 Majority folks are MS VS users but also some *nixers.
 
 My current understanding/analysis of the situation:
 
 1. autocrlf is broken -- keep it false 
 2. safecrlf is a cure (to some extent) -- keep it true
 3. core.eol can be left native (or is crlf on windows/lf on *nix 
 preferred??)
 
 And of course most crucial: 
 4. Make sure ALL file types are covered in .gitattributes; eg
 
 *.txt text
 *.cs text
 :
 :
 *.pdf binary
 etc
 
 Still when we get a merge conflict the file (created by git with conflict 
 markers) shows ^M in vi
 ie I assume its unclean line endings.
 
 Any tips what to do?

The most important preliminary question is: how cross platform is your project? 
If there is no risk of any developer working on a non-Windows platform, you 
could simply commit CRLF and be done with it.

As to autocrlf: I actually had some good experience with setting it to `input` 
when working with other developers.

Ciao,
Johannes
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Bug report: Unhandled Exception

2015-06-19 Thread Johannes Schindelin
Hi Gary,

On 2015-06-19 16:09, Gary England wrote:

 Will do. It's unlikely that I'll be able to reproduce the error. But,
 in the event that it happens again, I'll let you know.

For the record, the problem usually lies with a certain implementation detail 
on 32-bit only (if you know the `rebase.exe` tool -- confusingly completely 
unrelated to `git rebase`! -- you will know the root cause). That's why it is 
typically a sporadic problem only.

Ciao,
Johannes
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-19 Thread Johannes Schindelin
At the moment, the git-fsck's integrity checks are targeted toward the
end user, i.e. the error messages are really just messages, intended for
human consumption.

Under certain circumstances, some of those errors should be allowed to
be turned into mere warnings, though, because the cost of fixing the
issues might well be larger than the cost of carrying those flawed
objects. For example, when an already-public repository contains a
commit object with two authors for years, it does not make sense to
force the maintainer to rewrite the history, affecting all contributors
negatively by forcing them to update.

This branch introduces an internal fsck API to be able to turn some of
the errors into warnings, and to make it easier to call the fsck
machinery from elsewhere in general.

I am proud to report that this work has been sponsored by GitHub.

Changes since v5:

- The BAD_* names have been renamed to INVALID_* names

Interdiff below the diffstat.

Johannes Schindelin (19):
  fsck: Introduce fsck options
  fsck: Introduce identifiers for fsck messages
  fsck: Provide a function to parse fsck message IDs
  fsck: Offer a function to demote fsck errors to warnings
  fsck (receive-pack): Allow demoting errors to warnings
  fsck: Report the ID of the error/warning
  fsck: Make fsck_ident() warn-friendly
  fsck: Make fsck_commit() warn-friendly
  fsck: Handle multiple authors in commits specially
  fsck: Make fsck_tag() warn-friendly
  fsck: Add a simple test for receive.fsck.msg-id
  fsck: Disallow demoting grave fsck errors to warnings
  fsck: Optionally ignore specific fsck issues completely
  fsck: Allow upgrading fsck warnings to errors
  fsck: Document the new receive.fsck.msg-id options
  fsck: Support demoting errors to warnings
  fsck: Introduce `git fsck --quick`
  fsck: git receive-pack: support excluding objects from fsck'ing
  fsck: support ignoring objects in `git fsck` via fsck.skiplist

 Documentation/config.txt|  39 +++
 Documentation/git-fsck.txt  |   7 +-
 builtin/fsck.c  |  75 --
 builtin/index-pack.c|  13 +-
 builtin/receive-pack.c  |  25 +-
 builtin/unpack-objects.c|  16 +-
 fsck.c  | 553 +++-
 fsck.h  |  31 ++-
 t/t1450-fsck.sh |  37 ++-
 t/t5302-pack-index.sh   |   2 +-
 t/t5504-fetch-receive-strict.sh |  51 
 11 files changed, 686 insertions(+), 163 deletions(-)

diff --git a/fsck.c b/fsck.c
index 9b8981e..f80b508 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,17 +19,17 @@
FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
FUNC(BAD_DATE, ERROR) \
+   FUNC(BAD_DATE_OVERFLOW, ERROR) \
FUNC(BAD_EMAIL, ERROR) \
FUNC(BAD_NAME, ERROR) \
+   FUNC(BAD_OBJECT_SHA1, ERROR) \
FUNC(BAD_PARENT_SHA1, ERROR) \
+   FUNC(BAD_TAG_OBJECT, ERROR) \
FUNC(BAD_TIMEZONE, ERROR) \
+   FUNC(BAD_TREE, ERROR) \
FUNC(BAD_TREE_SHA1, ERROR) \
-   FUNC(DATE_OVERFLOW, ERROR) \
+   FUNC(BAD_TYPE, ERROR) \
FUNC(DUPLICATE_ENTRIES, ERROR) \
-   FUNC(INVALID_OBJECT_SHA1, ERROR) \
-   FUNC(INVALID_TAG_OBJECT, ERROR) \
-   FUNC(INVALID_TREE, ERROR) \
-   FUNC(INVALID_TYPE, ERROR) \
FUNC(MISSING_AUTHOR, ERROR) \
FUNC(MISSING_COMMITTER, ERROR) \
FUNC(MISSING_EMAIL, ERROR) \
@@ -46,8 +46,8 @@
FUNC(MISSING_TYPE, ERROR) \
FUNC(MISSING_TYPE_ENTRY, ERROR) \
FUNC(MULTIPLE_AUTHORS, ERROR) \
-   FUNC(NOT_SORTED, ERROR) \
FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
+   FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
@@ -60,7 +60,7 @@
FUNC(NULL_SHA1, WARN) \
FUNC(ZERO_PADDED_FILEMODE, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
-   FUNC(INVALID_TAG_NAME, INFO) \
+   FUNC(BAD_TAG_NAME, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
@@ -525,7 +525,7 @@ static int fsck_tree(struct tree *item, struct fsck_options 
*options)
if (has_dup_entries)
retval += report(options, item-object, 
FSCK_MSG_DUPLICATE_ENTRIES, contains duplicate file entries);
if (not_properly_sorted)
-   retval += report(options, item-object, FSCK_MSG_NOT_SORTED, 
not properly sorted);
+   retval += report(options, item-object, 
FSCK_MSG_TREE_NOT_SORTED, not properly sorted);
return retval;
 }
 
@@ -580,7 +580,7 @@ static int fsck_ident(const char **ident, struct object 
*obj, struct fsck_option
if (*p == '0'  p[1] != ' ')
return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, invalid 
author/committer line - zero-padded date);
if (date_overflows(strtoul(p, end, 10)))
-   return report(options, obj, FSCK_MSG_DATE_OVERFLOW, invalid 
author/committer line - 

Re: [PATCH v2 05/12] delete_refs(): improve error message

2015-06-19 Thread Michael Haggerty
On 06/15/2015 08:29 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 Subject: Re: [PATCH v2 05/12] delete_refs(): improve error message
 
 I'd call this make error message more generic.
 
 Change the error message from

 Could not remove branch %s

 to

 could not remove reference %s

 * s/branch/reference/. This change makes sense even for the existing
   caller, which uses the function to delete remote-tracking
   branches.
 
 and replace this bullet with something like:
 
 * Originally 'branch -d' was the only caller of this, but as part
   of the refs API, we will allow it to be called on refs that is not
   a branch or a remote-tracking branch.
 
 as calling a remote-tracking branch a 'branch' is not incorrect
 per-se.

True, but branch refs/remotes/origin/foo is not how we would usually
refer to that reference. Usually we would say reference
refs/remotes/origin/foo or remote-tracking branch origin/foo. But I
will make approximately the change that you suggest.

 What would count as true improvement is ...
 
 * Convert it to lower case, as per our usual convention.
 
 ... this one.  If somebody eventually chooses to make the message
 finer grained by switching on the prefix refs/{what}, so that the
 user can differentiate between branches, remote-tracking branches,
 tags, etc., that would also count as improvement as well.

I thought about proposing a function that changes a reference name into
its prose equivalent, but I'm pretty sure that the result would not be
internationalizable. Probably such a function would have to look like

enum refname_kind { REFNAME_KIND_BRANCH, REFNAME_KIND_TAG,
REFNAME_KIND_REMOTE, ...(?), REFNAME_KIND_OTHER};

enum refname_kind shorten_refname(const char *refname, const char
**short_refname);

and callers would still need logic to pick the correct
(internationalized) formatting string based on the return value of the
function.

Someday...

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 02/19] fsck: Introduce identifiers for fsck messages

2015-06-19 Thread Johannes Schindelin
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.

In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.

We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 builtin/fsck.c |  26 +++-
 fsck.c | 201 +
 fsck.h |   5 +-
 3 files changed, 154 insertions(+), 78 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 981dca5..fff38fe 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -46,33 +46,23 @@ static int show_dangling = 1;
 #define DIRENT_SORT_HINT(de) ((de)-d_ino)
 #endif
 
-static void objreport(struct object *obj, const char *severity,
-  const char *err, va_list params)
+static void objreport(struct object *obj, const char *msg_type,
+   const char *err)
 {
-   fprintf(stderr, %s in %s %s: ,
-   severity, typename(obj-type), sha1_to_hex(obj-sha1));
-   vfprintf(stderr, err, params);
-   fputs(\n, stderr);
+   fprintf(stderr, %s in %s %s: %s\n,
+   msg_type, typename(obj-type), sha1_to_hex(obj-sha1), err);
 }
 
-__attribute__((format (printf, 2, 3)))
-static int objerror(struct object *obj, const char *err, ...)
+static int objerror(struct object *obj, const char *err)
 {
-   va_list params;
-   va_start(params, err);
errors_found |= ERROR_OBJECT;
-   objreport(obj, error, err, params);
-   va_end(params);
+   objreport(obj, error, err);
return -1;
 }
 
-__attribute__((format (printf, 3, 4)))
-static int fsck_error_func(struct object *obj, int type, const char *err, ...)
+static int fsck_error_func(struct object *obj, int type, const char *message)
 {
-   va_list params;
-   va_start(params, err);
-   objreport(obj, (type == FSCK_WARN) ? warning : error, err, params);
-   va_end(params);
+   objreport(obj, (type == FSCK_WARN) ? warning : error, message);
return (type == FSCK_WARN) ? 0 : 1;
 }
 
diff --git a/fsck.c b/fsck.c
index d83b811..ab24618 100644
--- a/fsck.c
+++ b/fsck.c
@@ -9,6 +9,98 @@
 #include refs.h
 #include utf8.h
 
+#define FOREACH_MSG_ID(FUNC) \
+   /* errors */ \
+   FUNC(BAD_DATE, ERROR) \
+   FUNC(BAD_DATE_OVERFLOW, ERROR) \
+   FUNC(BAD_EMAIL, ERROR) \
+   FUNC(BAD_NAME, ERROR) \
+   FUNC(BAD_OBJECT_SHA1, ERROR) \
+   FUNC(BAD_PARENT_SHA1, ERROR) \
+   FUNC(BAD_TAG_OBJECT, ERROR) \
+   FUNC(BAD_TIMEZONE, ERROR) \
+   FUNC(BAD_TREE, ERROR) \
+   FUNC(BAD_TREE_SHA1, ERROR) \
+   FUNC(BAD_TYPE, ERROR) \
+   FUNC(DUPLICATE_ENTRIES, ERROR) \
+   FUNC(MISSING_AUTHOR, ERROR) \
+   FUNC(MISSING_COMMITTER, ERROR) \
+   FUNC(MISSING_EMAIL, ERROR) \
+   FUNC(MISSING_GRAFT, ERROR) \
+   FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+   FUNC(MISSING_OBJECT, ERROR) \
+   FUNC(MISSING_PARENT, ERROR) \
+   FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+   FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+   FUNC(MISSING_TAG, ERROR) \
+   FUNC(MISSING_TAG_ENTRY, ERROR) \
+   FUNC(MISSING_TAG_OBJECT, ERROR) \
+   FUNC(MISSING_TREE, ERROR) \
+   FUNC(MISSING_TYPE, ERROR) \
+   FUNC(MISSING_TYPE_ENTRY, ERROR) \
+   FUNC(NUL_IN_HEADER, ERROR) \
+   FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
+   FUNC(TREE_NOT_SORTED, ERROR) \
+   FUNC(UNKNOWN_TYPE, ERROR) \
+   FUNC(UNTERMINATED_HEADER, ERROR) \
+   FUNC(ZERO_PADDED_DATE, ERROR) \
+   /* warnings */ \
+   FUNC(BAD_FILEMODE, WARN) \
+   FUNC(BAD_TAG_NAME, WARN) \
+   FUNC(EMPTY_NAME, WARN) \
+   FUNC(FULL_PATHNAME, WARN) \
+   FUNC(HAS_DOT, WARN) \
+   FUNC(HAS_DOTDOT, WARN) \
+   FUNC(HAS_DOTGIT, WARN) \
+   FUNC(MISSING_TAGGER_ENTRY, WARN) \
+   FUNC(NULL_SHA1, WARN) \
+   FUNC(ZERO_PADDED_FILEMODE, WARN)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+   FOREACH_MSG_ID(MSG_ID)
+   FSCK_MSG_MAX
+};
+#undef MSG_ID
+
+#define MSG_ID(id, msg_type) { FSCK_##msg_type },
+static struct {
+   int msg_type;
+} msg_id_info[FSCK_MSG_MAX + 1] = {
+   FOREACH_MSG_ID(MSG_ID)
+   { -1 }
+};
+#undef MSG_ID
+
+static int fsck_msg_type(enum fsck_msg_id msg_id,
+   struct fsck_options *options)
+{
+   

[PATCH v6 05/19] fsck (receive-pack): Allow demoting errors to warnings

2015-06-19 Thread Johannes Schindelin
For example, missing emails in commit and tag objects can be demoted to
mere warnings with

git config receive.fsck.missingemail=warn

The value is actually a comma-separated list.

In case that the same key is listed in multiple receive.fsck.msg-id
lines in the config, the latter configuration wins (this can happen for
example when both $HOME/.gitconfig and .git/config contain message type
settings).

As git receive-pack does not actually perform the checks, it hands off
the setting to index-pack or unpack-objects in the form of an optional
argument to the --strict option.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 builtin/index-pack.c |  4 
 builtin/receive-pack.c   | 17 +++--
 builtin/unpack-objects.c |  5 +
 fsck.c   |  8 
 fsck.h   |  1 +
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 87ae9ba..98e14fe 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1633,6 +1633,10 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
} else if (!strcmp(arg, --strict)) {
strict = 1;
do_fsck_object = 1;
+   } else if (skip_prefix(arg, --strict=, arg)) {
+   strict = 1;
+   do_fsck_object = 1;
+   fsck_set_msg_types(fsck_options, arg);
} else if (!strcmp(arg, 
--check-self-contained-and-connected)) {
strict = 1;
check_self_contained_and_connected = 1;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 94d0571..3afe8f8 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -19,6 +19,7 @@
 #include tag.h
 #include gpg-interface.h
 #include sigchain.h
+#include fsck.h
 
 static const char receive_pack_usage[] = git receive-pack git-dir;
 
@@ -36,6 +37,7 @@ static enum deny_action deny_current_branch = 
DENY_UNCONFIGURED;
 static enum deny_action deny_delete_current = DENY_UNCONFIGURED;
 static int receive_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
+static struct strbuf fsck_msg_types = STRBUF_INIT;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int advertise_atomic_push = 1;
@@ -115,6 +117,15 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
+   if (skip_prefix(var, receive.fsck., var)) {
+   if (is_valid_msg_type(var, value))
+   strbuf_addf(fsck_msg_types, %c%s=%s,
+   fsck_msg_types.len ? ',' : '=', var, value);
+   else
+   warning(Skipping unknown msg id '%s', var);
+   return 0;
+   }
+
if (strcmp(var, receive.fsckobjects) == 0) {
receive_fsck_objects = git_config_bool(var, value);
return 0;
@@ -1490,7 +1501,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
if (quiet)
argv_array_push(child.args, -q);
if (fsck_objects)
-   argv_array_push(child.args, --strict);
+   argv_array_pushf(child.args, --strict%s,
+   fsck_msg_types.buf);
child.no_stdout = 1;
child.err = err_fd;
child.git_cmd = 1;
@@ -1508,7 +1520,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
argv_array_pushl(child.args, index-pack,
 --stdin, hdr_arg, keep_arg, NULL);
if (fsck_objects)
-   argv_array_push(child.args, --strict);
+   argv_array_pushf(child.args, --strict%s,
+   fsck_msg_types.buf);
if (fix_thin)
argv_array_push(child.args, --fix-thin);
child.out = -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 6d17040..7cc086f 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -530,6 +530,11 @@ int cmd_unpack_objects(int argc, const char **argv, const 
char *prefix)
strict = 1;
continue;
}
+   if (skip_prefix(arg, --strict=, arg)) {
+   strict = 1;
+   fsck_set_msg_types(fsck_options, arg);
+   continue;
+   }
if (starts_with(arg, --pack_header=)) {
struct pack_header *hdr;
char *c;
diff --git a/fsck.c b/fsck.c
index 8c3caff..8e6faa8 100644
--- a/fsck.c

[PATCH v6 06/19] fsck: Report the ID of the error/warning

2015-06-19 Thread Johannes Schindelin
Some legacy code has objects with non-fatal fsck issues; To enable the
user to ignore those issues, let's print out the ID (e.g. when
encountering missingemail, the user might want to call `git config
--add receive.fsck.missingemail=warn`).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 16 
 t/t1450-fsck.sh |  4 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 8e6faa8..0b3e18f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -190,6 +190,20 @@ void fsck_set_msg_types(struct fsck_options *options, 
const char *values)
}
 }
 
+static void append_msg_id(struct strbuf *sb, const char *msg_id)
+{
+   for (;;) {
+   char c = *(msg_id)++;
+
+   if (!c)
+   break;
+   if (c != '_')
+   strbuf_addch(sb, tolower(c));
+   }
+
+   strbuf_addstr(sb, : );
+}
+
 __attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options, struct object *object,
enum fsck_msg_id id, const char *fmt, ...)
@@ -198,6 +212,8 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   append_msg_id(sb, msg_id_info[id].id_string);
+
va_start(ap, fmt);
strbuf_vaddf(sb, fmt, ap);
result = options-error_func(object, msg_type, sb.buf);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index cfb32b6..d6d3b13 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -231,8 +231,8 @@ test_expect_success 'tag with incorrect tag name  missing 
tagger' '
git fsck --tags 2out 
 
cat expect -EOF 
-   warning in tag $tag: invalid '\''tag'\'' name: wrong name format
-   warning in tag $tag: invalid format - expected '\''tagger'\'' line
+   warning in tag $tag: badtagname: invalid '\''tag'\'' name: wrong name 
format
+   warning in tag $tag: missingtaggerentry: invalid format - expected 
'\''tagger'\'' line
EOF
test_cmp expect out
 '
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 07/19] fsck: Make fsck_ident() warn-friendly

2015-06-19 Thread Johannes Schindelin
When fsck_ident() identifies a problem with the ident, it should still
advance the pointer to the next line so that fsck can continue in the
case of a mere warning.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 49 +++--
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index 0b3e18f..9faaf53 100644
--- a/fsck.c
+++ b/fsck.c
@@ -479,40 +479,45 @@ static int require_end_of_header(const void *data, 
unsigned long size,
 
 static int fsck_ident(const char **ident, struct object *obj, struct 
fsck_options *options)
 {
+   const char *p = *ident;
char *end;
 
-   if (**ident == '')
+   *ident = strchrnul(*ident, '\n');
+   if (**ident == '\n')
+   (*ident)++;
+
+   if (*p == '')
return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, 
invalid author/committer line - missing space before email);
-   *ident += strcspn(*ident, \n);
-   if (**ident == '')
+   p += strcspn(p, \n);
+   if (*p == '')
return report(options, obj, FSCK_MSG_BAD_NAME, invalid 
author/committer line - bad name);
-   if (**ident != '')
+   if (*p != '')
return report(options, obj, FSCK_MSG_MISSING_EMAIL, invalid 
author/committer line - missing email);
-   if ((*ident)[-1] != ' ')
+   if (p[-1] != ' ')
return report(options, obj, 
FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, invalid author/committer line - missing 
space before email);
-   (*ident)++;
-   *ident += strcspn(*ident, \n);
-   if (**ident != '')
+   p++;
+   p += strcspn(p, \n);
+   if (*p != '')
return report(options, obj, FSCK_MSG_BAD_EMAIL, invalid 
author/committer line - bad email);
-   (*ident)++;
-   if (**ident != ' ')
+   p++;
+   if (*p != ' ')
return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, 
invalid author/committer line - missing space before date);
-   (*ident)++;
-   if (**ident == '0'  (*ident)[1] != ' ')
+   p++;
+   if (*p == '0'  p[1] != ' ')
return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, invalid 
author/committer line - zero-padded date);
-   if (date_overflows(strtoul(*ident, end, 10)))
+   if (date_overflows(strtoul(p, end, 10)))
return report(options, obj, FSCK_MSG_BAD_DATE_OVERFLOW, 
invalid author/committer line - date causes integer overflow);
-   if (end == *ident || *end != ' ')
+   if ((end == p || *end != ' '))
return report(options, obj, FSCK_MSG_BAD_DATE, invalid 
author/committer line - bad date);
-   *ident = end + 1;
-   if ((**ident != '+'  **ident != '-') ||
-   !isdigit((*ident)[1]) ||
-   !isdigit((*ident)[2]) ||
-   !isdigit((*ident)[3]) ||
-   !isdigit((*ident)[4]) ||
-   ((*ident)[5] != '\n'))
+   p = end + 1;
+   if ((*p != '+'  *p != '-') ||
+   !isdigit(p[1]) ||
+   !isdigit(p[2]) ||
+   !isdigit(p[3]) ||
+   !isdigit(p[4]) ||
+   (p[5] != '\n'))
return report(options, obj, FSCK_MSG_BAD_TIMEZONE, invalid 
author/committer line - bad time zone);
-   (*ident) += 6;
+   p += 6;
return 0;
 }
 
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-19 Thread Johannes Schindelin
When fsck_commit() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Note that some problems are too problematic to simply ignore. For
example, when the header lines are mixed up, we punt after encountering
an incorrect line. Therefore, demoting certain warnings to errors can
hide other problems. Example: demoting the missingauthor error to
a warning would hide a problematic committer line.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index 9faaf53..9fe9f48 100644
--- a/fsck.c
+++ b/fsck.c
@@ -534,12 +534,18 @@ static int fsck_commit_buffer(struct commit *commit, 
const char *buffer,
 
if (!skip_prefix(buffer, tree , buffer))
return report(options, commit-object, FSCK_MSG_MISSING_TREE, 
invalid format - expected 'tree' line);
-   if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-   return report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
invalid 'tree' line format - bad sha1);
+   if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') {
+   err = report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
invalid 'tree' line format - bad sha1);
+   if (err)
+   return err;
+   }
buffer += 41;
while (skip_prefix(buffer, parent , buffer)) {
-   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n')
-   return report(options, commit-object, 
FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1);
+   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
+   err = report(options, commit-object, 
FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1);
+   if (err)
+   return err;
+   }
buffer += 41;
parent_line_count++;
}
@@ -548,11 +554,17 @@ static int fsck_commit_buffer(struct commit *commit, 
const char *buffer,
if (graft) {
if (graft-nr_parent == -1  !parent_count)
; /* shallow commit */
-   else if (graft-nr_parent != parent_count)
-   return report(options, commit-object, 
FSCK_MSG_MISSING_GRAFT, graft objects missing);
+   else if (graft-nr_parent != parent_count) {
+   err = report(options, commit-object, 
FSCK_MSG_MISSING_GRAFT, graft objects missing);
+   if (err)
+   return err;
+   }
} else {
-   if (parent_count != parent_line_count)
-   return report(options, commit-object, 
FSCK_MSG_MISSING_PARENT, parent objects missing);
+   if (parent_count != parent_line_count) {
+   err = report(options, commit-object, 
FSCK_MSG_MISSING_PARENT, parent objects missing);
+   if (err)
+   return err;
+   }
}
if (!skip_prefix(buffer, author , buffer))
return report(options, commit-object, 
FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line);
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 11/19] fsck: Add a simple test for receive.fsck.msg-id

2015-06-19 Thread Johannes Schindelin
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 t/t5504-fetch-receive-strict.sh | 21 +
 1 file changed, 21 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 69ee13c..3f7e96a 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -115,4 +115,25 @@ test_expect_success 'push with transfer.fsckobjects' '
test_cmp exp act
 '
 
+cat bogus-commit \EOF
+tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
+author Bugs Bunny 1234567890 +
+committer Bugs Bunny b...@bun.ni 1234567890 +
+
+This commit object intentionally broken
+EOF
+
+test_expect_success 'push with receive.fsck.missingemail=warn' '
+   commit=$(git hash-object -t commit -w --stdin bogus-commit) 
+   git push . $commit:refs/heads/bogus 
+   rm -rf dst 
+   git init dst 
+   git --git-dir=dst/.git config receive.fsckobjects true 
+   test_must_fail git push --porcelain dst bogus 
+   git --git-dir=dst/.git config \
+   receive.fsck.missingemail warn 
+   git push --porcelain dst bogus act 21 
+   grep missingemail act
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 12/19] fsck: Disallow demoting grave fsck errors to warnings

2015-06-19 Thread Johannes Schindelin
Some kinds of errors are intrinsically unrecoverable (e.g. errors while
uncompressing objects). It does not make sense to allow demoting them to
mere warnings.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 14 --
 t/t5504-fetch-receive-strict.sh | 11 +++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 21e3052..a4fbce3 100644
--- a/fsck.c
+++ b/fsck.c
@@ -9,7 +9,12 @@
 #include refs.h
 #include utf8.h
 
+#define FSCK_FATAL -1
+
 #define FOREACH_MSG_ID(FUNC) \
+   /* fatal errors */ \
+   FUNC(NUL_IN_HEADER, FATAL) \
+   FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
@@ -39,11 +44,9 @@
FUNC(MISSING_TYPE, ERROR) \
FUNC(MISSING_TYPE_ENTRY, ERROR) \
FUNC(MULTIPLE_AUTHORS, ERROR) \
-   FUNC(NUL_IN_HEADER, ERROR) \
FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
-   FUNC(UNTERMINATED_HEADER, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
FUNC(BAD_FILEMODE, WARN) \
@@ -157,6 +160,10 @@ void fsck_set_msg_type(struct fsck_options *options,
die(Unhandled message id: %.*s, msg_id_len, msg_id);
type = parse_msg_type(msg_type, msg_type_len);
 
+   if (type != FSCK_ERROR  msg_id_info[id].msg_type == FSCK_FATAL)
+   die(Cannot demote %.*s to %.*s, msg_id_len, msg_id,
+   msg_type_len, msg_type);
+
if (!options-msg_type) {
int i;
int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX);
@@ -213,6 +220,9 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   if (msg_type == FSCK_FATAL)
+   msg_type = FSCK_ERROR;
+
append_msg_id(sb, msg_id_info[id].id_string);
 
va_start(ap, fmt);
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 3f7e96a..0d64229 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -136,4 +136,15 @@ test_expect_success 'push with 
receive.fsck.missingemail=warn' '
grep missingemail act
 '
 
+test_expect_success \
+   'receive.fsck.unterminatedheader=warn triggers error' '
+   rm -rf dst 
+   git init dst 
+   git --git-dir=dst/.git config receive.fsckobjects true 
+   git --git-dir=dst/.git config \
+   receive.fsck.unterminatedheader warn 
+   test_must_fail git push --porcelain dst HEAD act 21 
+   grep Cannot demote unterminatedheader act
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 14/19] fsck: Allow upgrading fsck warnings to errors

2015-06-19 Thread Johannes Schindelin
The 'invalid tag name' and 'missing tagger entry' warnings can now be
upgraded to errors by specifying `invalidtagname` and
`missingtaggerentry` in the receive.fsck.msg-id config setting.

Incidentally, the missing tagger warning is now really shown as a warning
(as opposed to being reported with the error: prefix, as it used to be
the case before this commit).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c| 24 +---
 t/t5302-pack-index.sh |  2 +-
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index b728646..dedad01 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,6 +10,7 @@
 #include utf8.h
 
 #define FSCK_FATAL -1
+#define FSCK_INFO -2
 
 #define FOREACH_MSG_ID(FUNC) \
/* fatal errors */ \
@@ -50,15 +51,16 @@
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
FUNC(BAD_FILEMODE, WARN) \
-   FUNC(BAD_TAG_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
-   FUNC(MISSING_TAGGER_ENTRY, WARN) \
FUNC(NULL_SHA1, WARN) \
-   FUNC(ZERO_PADDED_FILEMODE, WARN)
+   FUNC(ZERO_PADDED_FILEMODE, WARN) \
+   /* infos (reported as warnings, but ignored by default) */ \
+   FUNC(BAD_TAG_NAME, INFO) \
+   FUNC(MISSING_TAGGER_ENTRY, INFO)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
@@ -227,6 +229,8 @@ static int report(struct fsck_options *options, struct 
object *object,
 
if (msg_type == FSCK_FATAL)
msg_type = FSCK_ERROR;
+   else if (msg_type == FSCK_INFO)
+   msg_type = FSCK_WARN;
 
append_msg_id(sb, msg_id_info[id].id_string);
 
@@ -685,15 +689,21 @@ static int fsck_tag_buffer(struct tag *tag, const char 
*data,
goto done;
}
strbuf_addf(sb, refs/tags/%.*s, (int)(eol - buffer), buffer);
-   if (check_refname_format(sb.buf, 0))
-   report(options, tag-object, FSCK_MSG_BAD_TAG_NAME,
+   if (check_refname_format(sb.buf, 0)) {
+   ret = report(options, tag-object, FSCK_MSG_BAD_TAG_NAME,
   invalid 'tag' name: %.*s,
   (int)(eol - buffer), buffer);
+   if (ret)
+   goto done;
+   }
buffer = eol + 1;
 
-   if (!skip_prefix(buffer, tagger , buffer))
+   if (!skip_prefix(buffer, tagger , buffer)) {
/* early tags do not contain 'tagger' lines; warn only */
-   report(options, tag-object, FSCK_MSG_MISSING_TAGGER_ENTRY, 
invalid format - expected 'tagger' line);
+   ret = report(options, tag-object, 
FSCK_MSG_MISSING_TAGGER_ENTRY, invalid format - expected 'tagger' line);
+   if (ret)
+   goto done;
+   }
else
ret = fsck_ident(buffer, tag-object, options);
 
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 61bc8da..3dc5ec4 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -259,7 +259,7 @@ EOF
 thirtyeight=${tag#??} 
 rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight 
 git index-pack --strict tag-test-${pack1}.pack 2err 
-grep ^error:.* expected .tagger. line err
+grep ^warning:.* expected .tagger. line err
 '
 
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v6 16/19] fsck: Support demoting errors to warnings

2015-06-19 Thread Johannes Schindelin
We already have support in `git receive-pack` to deal with some legacy
repositories which have non-fatal issues.

Let's make `git fsck` itself useful with such repositories, too, by
allowing users to ignore known issues, or at least demote those issues
to mere warnings.

Example: `git -c fsck.missingemail=ignore fsck` would hide
problems with missing emails in author, committer and tagger lines.

In the same spirit that `git receive-pack`'s usage of the fsck machinery
differs from `git fsck`'s – some of the non-fatal warnings in `git fsck`
are fatal with `git receive-pack` when receive.fsckObjects = true, for
example – we strictly separate the fsck.msg-id from the
receive.fsck.msg-id settings.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt | 11 +++
 builtin/fsck.c   | 12 
 t/t1450-fsck.sh  | 11 +++
 3 files changed, 34 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 306ab7a..41fd460 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1250,6 +1250,17 @@ filter.driver.smudge::
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+fsck.msg-id::
+   Allows overriding the message type (error, warn or ignore) of a
+   specific message ID such as `missingemail`.
++
+For convenience, fsck prefixes the error/warning with the message ID,
+e.g.  missingemail: invalid author/committer line - missing email means
+that setting `fsck.missingemail = ignore` will hide that issue.
++
+This feature is intended to support working with legacy repositories
+which cannot be repaired without disruptive changes.
+
 gc.aggressiveDepth::
The depth parameter used in the delta compression
algorithm used by 'git gc --aggressive'.  This defaults
diff --git a/builtin/fsck.c b/builtin/fsck.c
index fff38fe..6de9f3e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -46,6 +46,16 @@ static int show_dangling = 1;
 #define DIRENT_SORT_HINT(de) ((de)-d_ino)
 #endif
 
+static int fsck_config(const char *var, const char *value, void *cb)
+{
+   if (skip_prefix(var, fsck., var)) {
+   fsck_set_msg_type(fsck_obj_options, var, -1, value, -1);
+   return 0;
+   }
+
+   return git_default_config(var, value, cb);
+}
+
 static void objreport(struct object *obj, const char *msg_type,
const char *err)
 {
@@ -646,6 +656,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
include_reflogs = 0;
}
 
+   git_config(fsck_config, NULL);
+
fsck_head_link();
fsck_object_dir(get_object_directory());
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d6d3b13..922c346 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -287,6 +287,17 @@ test_expect_success 'rev-list --verify-objects with bad 
sha1' '
grep -q error: sha1 mismatch 63ff 
out
 '
 
+test_expect_success 'force fsck to ignore double author' '
+   git cat-file commit HEAD basis 
+   sed s/^author .*/,/ basis | tr , \\n multiple-authors 
+   new=$(git hash-object -t commit -w --stdin multiple-authors) 
+   test_when_finished remove_object $new 
+   git update-ref refs/heads/bogus $new 
+   test_when_finished git update-ref -d refs/heads/bogus 
+   test_must_fail git fsck 
+   git -c fsck.multipleauthors=ignore fsck
+'
+
 _bz='\0'
 _bz5=$_bz$_bz$_bz$_bz$_bz
 _bz20=$_bz5$_bz5$_bz5$_bz5
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


Re: Bug report: Unhandled Exception

2015-06-19 Thread Gary England
Hi Johannes,

Will do. It's unlikely that I'll be able to reproduce the error. But, in the 
event that it happens again, I'll let you know.

Thanks,
Gary


From: Johannes Schindelin johannes.schinde...@gmx.de
Sent: Friday, June 19, 2015 6:58 AM
To: Gary England
Cc: git@vger.kernel.org
Subject: Re: Bug report: Unhandled Exception

Hi Gary,

On 2015-06-11 21:21, Gary England wrote:

 Using git version 1.9.2-preview20140411, in Git Bash for Windows,
 performing a git pull --rebase, received an unhandled exception.

Please note that the newest 1.9.x release is 1.9.5. Could you re-test with that 
version, please? Or maybe you want to give the 2nd release candidate of Git for 
Windows 2.x a try: https://git-for-windows.github.io/#download

Ciao,
Johannes


--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] strbuf: stop out-of-boundary warnings from Coverity

2015-06-19 Thread Duy Nguyen
On Fri, Jun 19, 2015 at 5:50 PM, Remi Galan Alfonso
remi.galan-alfo...@ensimag.grenoble-inp.fr wrote:
 Duy Nguyen pclo...@gmail.com writes:

 + sb.buf[0] = 'Z';
 + printf(%c\n, strbuf_slopbuf[0]);
 + return 0;
 startup_info = git_startup_info;

 I might be wrong, but I definitely think that this
 printf and return 0 are some debug lines that you
 forgot to remove.

Yes it's debug code. I hoped that if I made a mistake forcing the
segfault, people would notice.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 3/3] Add filter-objects command

2015-06-19 Thread Jeff King
On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:

  Obviously I've glossed over the how to get a list of objects part.
  If you truly want all objects (not just reachable ones), or if rev-list
  --objects is too slow [...]
 
 So, yes, performance is definitely an issue and I could have called this
 command git magically-generate-all-object-for-scripts but then, as it
 was so easy to provide exactly the filtering that I was looking for in
 the C code, I thought I would do that as well and then filter-objects
 (filter-all-objects?) seemed like a better name.

Right, my point was only that it works for _your_ particular filter, but
it would be nice to have something more general. And we already have
cat-file --batch-check. IOW, I think I would prefer the magical form
because it's a better scripting building block. As you note,
filter-objects without any filters is exactly that. Your 10 extra
lines of C code are not exactly bloat, but I just wonder if other people
will find it all that useful.

 It's about an order of magnitude faster on the systems I've checked to
 do a parameterless filter-objects then rev-list --all --objects,
 although I understand they do different things.

Right, it's the object-opening and hash lookups that kill you in
rev-list, because it's actually walking the graph.

 I am also thinking about another piece that answers the question: which
 commits introduce any of (or the first of) this list of objects?. This
 can be done by parseing a diff --raw for commits but I think it should
 be possible to do this faster, too.

If you care about introduce, I think you have to traverse and do the
diffs. If you only care about contains (for example, because you want
to know which path the blob is found at), you can find trees which
mention it, then trees which mention that tree, and so on. I think that
ends up slower in practice, though.

I have patches that implement a rev-list --find=$sha1, which sets a
bit on $sha1 and then traverses with --objects until we find it (or
them; you can specify multiple). It's pretty straightforward, but it
does cost as much as git rev-list --objects in the worst case. Let me
know if you're interested and I can clean it up and post it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 3/3] Add filter-objects command

2015-06-19 Thread John Keeping
On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:
 So, yes, performance is definitely an issue and I could have called this
 command git magically-generate-all-object-for-scripts but then, as it
 was so easy to provide exactly the filtering that I was looking for in
 the C code, I thought I would do that as well and then filter-objects
 (filter-all-objects?) seemed like a better name.

By analogy with git filter-branch, I don't think filter-objects is a
good name here.  My preference would be ls-objects.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Using clean/smudge filters with difftool

2015-06-19 Thread John Keeping
On Fri, Jun 19, 2015 at 10:57:55AM +0200, Michael J Gruber wrote:
 Junio C Hamano venit, vidit, dixit 19.06.2015 00:55:
  John Keeping j...@keeping.me.uk writes:
  
  I think the summary is that there are some scenarios where the external
  diff tool should see the smudged version and others where the clean
  version is more appropriate and Git should support both options.  It
  seems this is a property of the filter, so I wonder if the best solution
  is a new filter.name.extdiff = [clean|smudge] configuration
  variable (there's probably a better name for the variable than
  extdiff).
  
  Not just the external diff, but the textconv filter obeys the same
  rule.  The setting should be done the same way for both, if we are
  going to go in that direction.
  
 
 textconv is a one-way filter from blob to readable blob. External
 diffs may prefer to work on blob rather than readable blob, but the
 currect setup does not seem to produce surprises.
 
 clean and smudge are two-way filters: clean from worktree blob (aka
 file) to repo blob, smudge the other way round.
 
 Typically, the user perceives these as inverse to each other. But we
 only require clean to be a left-inverse of smudge, i.e. (cat-file then)
 smudge then clean should give the same repo blob (as cat-file).
 
 We don't require that the other way round, i.e. we don't require smudge
 to be a left-inverse of clean, and in most setups (like the current one)
 it is not: smudge does not recreate what clean has cleaned out. It is a
 no-op (the identity, while clean is a projection).
 
 Now, since external diff runs on smudged blobs, it appears as if we
 mixed cleaned and smudged blobs when feeding external diffs; whereas
 really, we mix worktree blobs and smudged repo blobs, which is okay
 as per our definition of clean/smudge: the difference is irrelevant by
 definition.

I agree with this.

But I was wrong that should diff clean/should diff smudged is a
property of the filter.  I can also imagine a situation where a more
intelligent external diff tool wants to see the smudged version where a
naïve tool would want the clean version.

For example, some of the big file stores (e.g. git-lfs [1]) use
clean/smudge filters and I can imagine a diff utility that avoids
needing to fetch the data for large files and instead shows the diff on
the server when both blobs are available there.  In that case we
generally want to use the smudged copy for external diff, so the filter
would use that setting, but the diff utility knows better and would want
to override that.

[1] https://github.com/github/git-lfs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Add filter-objects command

2015-06-19 Thread Charles Bailey
On Fri, Jun 19, 2015 at 06:10:10AM -0400, Jeff King wrote:
 On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
 
  filter-objects is a command to scan all objects in the object database
  for the repository and print the ids of those which match the given
  criteria.
  
  The current supported criteria are object type and the minimum size of
  the object.
  
  The guiding use case is to scan repositories quickly for large objects
  which may cause performance issues for users. The list of objects can
  then be used to guide some future remediating action.
 
 I've had to perform this exact same task. You can already do the
 filtering part pretty easily and efficiently with cat-file and a perl
 script, like:
 
   magically_generate_all_objects |
   git cat-file --batch-check='%(objectsize) %(objectname)' |
   perl -alne 'print $F[1] if $F[0]  1234'
 
 That's not as friendly as your filter-objects, but it's a lot more
 flexible (since you can ask cat-file for all sorts of information).
 
 Obviously I've glossed over the how to get a list of objects part.
 If you truly want all objects (not just reachable ones), or if rev-list
 --objects is too slow [...]

So, yes, performance is definitely an issue and I could have called this
command git magically-generate-all-object-for-scripts but then, as it
was so easy to provide exactly the filtering that I was looking for in
the C code, I thought I would do that as well and then filter-objects
(filter-all-objects?) seemed like a better name.

It's about an order of magnitude faster on the systems I've checked to
do a parameterless filter-objects then rev-list --all --objects,
although I understand they do different things.

I am also thinking about another piece that answers the question: which
commits introduce any of (or the first of) this list of objects?. This
can be done by parseing a diff --raw for commits but I think it should
be possible to do this faster, too.

Charles.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

2015-06-19 Thread Remi Galan Alfonso
Charles Bailey cbaile...@bloomberg.net writes:
 test_expect_success 'long options' '
 - test-parse-options --boolean --integer 1729 --boolean --string2=321 \
 - --verbose --verbose --no-dry-run --abbrev=10 --file fi.le\
 - --obsolete  output 2 output.err 
 + test-parse-options --boolean --integer 1729 --unsigned-long 16k \
 + --boolean --string2=321 --verbose --verbose --no-dry-run \
 + --abbrev=10 --file fi.le --obsolete \
 +  output 2 output.err 
 test_must_be_empty output.err 
 test_cmp expect output
 '

It's trivial matter but the line:
 +  output 2 output.err 
should be written:
 + output 2output.err 

It was incorrectly written before but since 
you are modifying the line, it might be a 
good thing to change it now.

Rémi
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

2015-06-19 Thread Charles Bailey
On Fri, Jun 19, 2015 at 01:03:25PM +0200, Remi Galan Alfonso wrote:
 
 It's trivial matter but the line:
  +  output 2 output.err 
 should be written:
  + output 2output.err 
 
 It was incorrectly written before but since 
 you are modifying the line, it might be a 
 good thing to change it now.

Yes, I can fold this in. I just changed the wrapping and didn't spot
this style error.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 04/31] am: implement patch queue mechanism

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 4:43 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 diff --git a/builtin/am.c b/builtin/am.c
 index dbc8836..af68c51 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -6,6 +6,158 @@
  #include cache.h
  #include builtin.h
  #include exec_cmd.h
 +#include parse-options.h
 +#include dir.h
 +
 +struct am_state {
 + /* state directory path */
 + struct strbuf dir;

 Is this a temporary variable you will append /patch, etc. to form
 a different string to use for fopen() etc., or do you use separate
 strbufs for things like that and this is only used to initialize
 them?

  - If the former then dir is a misnomer.

  - If the latter, then it probably does not have to be a strbuf;
rather, it should probably be a const char *.  Unless you pass
this directly to functions that take a strbuf, such as
remove_dir_recursively(), that is.

It's the latter, and yes it does not need to be an strbuf since it
will not usually change.

However, I don't think it should be a const char*, as it means that
the API user has to keep track of the lifetime of the dir string.
Note that we will have to git_pathdup() as git_path() returns a static
buffer:

char *dir = git_pathdup(rebase-apply);
struct am_state state;
am_state_init(state);
state.dir = dir;
...stuff...
if (foo) {
 ... separate code path ...
 am_state_release(state);
 free(dir);
 return 0;
}
... separate code path ...
am_state_release(state);
free(dir);
return 0;

Note the additional memory bookkeeping.

Instead, I would rather the am_state struct take ownership of the
dir string and free it in am_state_release(). So dir will be a
char*:

struct am_state state;
am_state_init(state, git_path(rebase-apply));
... stuff ...
if (foo) {
 ... separate code path ...
 am_state_release(state);
 return 0;
}
... separate code path ...
am_state_release(state);
return 0;

 +/**
 + * Release memory allocated by an am_state.
 + */

 Everybody else in this file seems to say Initializes, Returns,
 Reads, etc.  While I personally prefer to use imperative
 (i.e. give command to this function to release memory allocated),
 you would want to be consistent throughout the file; Releases
 memory would make it so.

OK

 +/**
 + * Setup a new am session for applying patches
 + */
 +static void am_setup(struct am_state *state)
 +{
 + if (mkdir(state-dir.buf, 0777)  0  errno != EEXIST)
 + die_errno(_(failed to create directory '%s'), 
 state-dir.buf);
 +
 + write_file(am_path(state, next), 1, %d, state-cur);
 +
 + write_file(am_path(state, last), 1, %d, state-last);

 These two lines are closely related pair; drop the blank in between.

 I am tno sure if write_file() is an appropriate thing to use,
 though.  What happens when you get interrupted after opening the
 file but before you manage to write and close?  Shouldn't we be
 doing the usual write to temp, close and then rename to final
 dance?  This comment applies to all the other use of write_file().

We could, although I don't think it would help us much. The state
directory is made up of many different files, so even if we made
modifications to single files atomic, there's no point if we terminate
unexpectedly in the middle of writing multiple files to the state
directory as the state, as a whole, is corrupted anyway.

So, we must be able to make updates to the entire state directory as a
single transaction, which is a more difficult problem...

Furthermore, while applying patches, we may face abnormal termination
between e.g. the patch application and commit, so I think it is safe
to say that if abnormal termination occurs, we will not be able to
reliably --resume or --skip anyway, and the user should just run
--abort to go back to the last known state.

However, it would be an improvement if we wrote the next and last
files last, as we use their presence to determine if we have an am
session in progress or not, so if we terminate in am_setup() we will
just have a stray directory. I will make this change in the next
iteration.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-19 Thread Jeff King
On Fri, Jun 19, 2015 at 03:34:55AM -0400, Jeff King wrote:

 And here's some more bad news. If you look at the diff for this
 patch itself, it's terribly unreadable (the regular diff already is
 pretty bad, but the highlights make it much worse). There are big chunks
 where we take away 5 or 10 lines from the old code, and replace them
 with totally unrelated lines. We end up highlighting almost the entire
 thing, except for spaces and punctuation.
 
 We might be able to solve this with a percentage heuristic similar to
 the one Patrick proposed. It's not really interesting to highlight
 unless we're doing it on probably 20% or less of the diff (where 20% is
 a number I just made up).

That turned out to be pretty easy; patch is below (on top of what I sent
earlier). I set the percentage at 50% based on eyeballing git log -p
in git.git, and it seems to give good results.

So I think the big remaining issue is improved tokenizing. Maybe Patrick
will want to take a stab at it.

---
diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index 1525ccc..9454446 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -114,12 +114,32 @@ sub show_hunk {
if $bits  2;
}
 
+   my $highlighted = count_highlight(@highlight_a) +
+ count_highlight(@highlight_b);
+   my $total = length($a) + length($b);
+   my $pct = $highlighted / $total;
+
+   if ($pct  0.5) {
+   @highlight_a = ();
+   @highlight_b = ();
+   }
+
# And now show the output both with the original stripped annotations,
# as well as our new highlights.
show_image($a, [merge_annotations(\@stripped_a, \@highlight_a)]);
show_image($b, [merge_annotations(\@stripped_b, \@highlight_b)]);
 }
 
+sub count_highlight {
+   my $total = 0;
+   while (@_) {
+   my $from = shift;
+   my $to = shift;
+   $total += $to-[0] - $from-[0];
+   }
+   return $total;
+}
+
 # Strip out any diff syntax (i.e., leading +/-), along with any ANSI color
 # codes from the pre- or post-image of a hunk. The result is a string of text
 # suitable for diffing against the other side of the hunk.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 10/31] am: refresh the index at start

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 5:28 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 If a file is unchanged but stat-dirty, git-apply may erroneously fail to
 apply patches, thinking that they conflict with a dirty working tree.

 As such, since 2a6f08a (am: refresh the index at start and --resolved,
 2011-08-15), git-am will refresh the index before applying patches.
 Re-implement this behavior.

 Good.

 I would actually have expected to see this as part of 08/31, though.

Ah right, makes sense since this patch is primarily about git-apply.

I've squashed this patch into 08/31.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git svn die signal 6

2015-06-19 Thread Олег Кохтенко
I even found Pull Request on GitHub with useful changes, but looks
like there is too hard way to commit changes and author gave up.
https://github.com/git/git/pull/62 Probably you can take a short look
into it and include this changes in next release.

2015-06-15 16:54 GMT+03:00 Олег Кохтенко kohte...@gmail.com:
 Hi great git crew member :)
 I found small, but annoying bug in 'git svn'. If you would try to make
 any commit with new files (or delete/modify files) which contain in
 name @ sign - you'll see crash with minimum information about what's
 just happened. Such file names are very useful for iOS projects when
 you can add same pictures with different dimensions with suffix @2x or
 @3x. In guts, svn uses @ sign for pin version of file and it's
 probably confused if this sign presented in file name. For committing
 such changes thru svn you would use svn add 'im...@3x.png'@. This
 will prevent such problem.

 I hope you can easily find out how to reproduce and fix this issue.
 Please, feel free to contact me for any details about this.

 Best regards
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 5:10 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 + /* commit message and metadata */
 + struct strbuf author_name;
 + struct strbuf author_email;
 + struct strbuf author_date;
 + struct strbuf msg;

 Same comment as dir in the earlier patch applies to these.  If the
 fields are read or computed and then kept constant, use a temporary
 variable that is a strbuf to read/compute the final value, and then
 detach to a const char * field.  If they are constantly changing
 and in-place updates are vital, then they can and should be strbufs,
 but I do not think that is the case for these.

I do think it is the case here. The commit message and metadata fields
change for every patch we process, and we could be processing a large
volume of patches, so we must be careful to not leak any memory.

We could use a char* field, but then to prevent memory leaks we would
then have to free() it and malloc() a new string for every patch we
process, which will lead to lots of malloc()s and free()s over a large
volume of patches that can lead to memory fragmentation.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Visualizing merge conflicts after the fact (using kdiff3)

2015-06-19 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 18.06.2015 17:57:
 Michael J Gruber g...@drmicha.warpmail.net writes:
 
 This type of request comes up often (for a reason). I'm wondering
 whether we could support it more systematically, either by exposing the
 steps above as a command, or by storing the unresolved merge somewhere
 (leveraging stash or rerere).
 
 Perhaps 'tr/remerge-diff' (on 'pu') is of interest?
 

Ingenious!

To me, this seems to be the most useful view if you want to understand a
merge just from the parents and the merge commit. Since you would use
that for individual commits only, the cpu cycles are well spent.

As and added benefit, tr/remerge-diff merges to current next with
conflicts (oid...) so that you get to test it on its own merge!

I haven't reviewed remerge-diff but merged it on top of my own local
additions and ran the full test suite successfully. Any big blocker to
watch out for?

Michael
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mergetool-lib: fix default tool selection

2015-06-19 Thread Michael J Gruber
When no diff nor merge tool is specified (config, option), mergetool-lib
is supposed to choose a default tool from a set of tools. That set is
constructed dynamically depending on the environment (graphical, editor
setting) as a space separated string of tool names.

719518f (mergetool--lib: set IFS for difftool and mergetool, 2015-05-20)
introduced a newline as IFS which breaks the parsing of the space
separated list into items, resulting in a failed search for an available
tool.

Set IFS to a space locally for the tool search.

Signed-off-by: Michael J Gruber g...@drmicha.warpmail.net
---
 git-mergetool--lib.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/git-mergetool--lib.sh b/git-mergetool--lib.sh
index 14b039d..54ac8e4 100644
--- a/git-mergetool--lib.sh
+++ b/git-mergetool--lib.sh
@@ -305,6 +305,7 @@ guess_merge_tool () {
EOF
 
# Loop over each candidate and stop when a valid merge tool is found.
+   IFS=' '
for tool in $tools
do
is_available $tool  echo $tool  return 0
-- 
2.4.4.610.gea30ab1.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

2015-06-19 Thread Charles Bailey
From: Charles Bailey cbaile...@bloomberg.net

The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing)
is more widely applicable. Add support for OPT_ULONG to parse-options.h
and change pack-objects.c use this support.

Signed-off-by: Charles Bailey cbaile...@bloomberg.net
---
 builtin/pack-objects.c   | 17 -
 parse-options.c  | 15 +++
 parse-options.h  |  5 -
 t/t0040-parse-options.sh | 46 ++
 test-parse-options.c |  3 +++
 5 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 80fe8c7..5de76db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2588,23 +2588,6 @@ static int option_parse_unpack_unreachable(const struct 
option *opt,
return 0;
 }
 
-static int option_parse_ulong(const struct option *opt,
- const char *arg, int unset)
-{
-   if (unset)
-   die(_(option %s does not accept negative form),
-   opt-long_name);
-
-   if (!git_parse_ulong(arg, opt-value))
-   die(_(unable to parse value '%s' for option %s),
-   arg, opt-long_name);
-   return 0;
-}
-
-#define OPT_ULONG(s, l, v, h) \
-   { OPTION_CALLBACK, (s), (l), (v), n, (h), \
- PARSE_OPT_NONEG, option_parse_ulong }
-
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
int use_internal_rev_list = 0;
diff --git a/parse-options.c b/parse-options.c
index 80106c0..76a5c3e 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -180,6 +180,21 @@ static int get_value(struct parse_opt_ctx_t *p,
return opterror(opt, expects a numerical value, 
flags);
return 0;
 
+   case OPTION_ULONG:
+   if (unset) {
+   *(unsigned long *)opt-value = 0;
+   return 0;
+   }
+   if (opt-flags  PARSE_OPT_OPTARG  !p-opt) {
+   *(unsigned long *)opt-value = opt-defval;
+   return 0;
+   }
+   if (get_arg(p, opt, flags, arg))
+   return -1;
+   if (!git_parse_ulong(arg, opt-value))
+   return opterror(opt, expects a numerical value, 
flags);
+   return 0;
+
default:
die(should not happen, someone must be hit on the forehead);
}
diff --git a/parse-options.h b/parse-options.h
index c71e9da..2ddb26f 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -18,7 +18,8 @@ enum parse_opt_type {
OPTION_INTEGER,
OPTION_CALLBACK,
OPTION_LOWLEVEL_CALLBACK,
-   OPTION_FILENAME
+   OPTION_FILENAME,
+   OPTION_ULONG
 };
 
 enum parse_opt_flags {
@@ -129,6 +130,8 @@ struct option {
 #define OPT_CMDMODE(s, l, v, h, i) { OPTION_CMDMODE, (s), (l), (v), NULL, \
  (h), PARSE_OPT_NOARG|PARSE_OPT_NONEG, 
NULL, (i) }
 #define OPT_INTEGER(s, l, v, h) { OPTION_INTEGER, (s), (l), (v), N_(n), 
(h) }
+#define OPT_ULONG(s, l, v, h)   { OPTION_ULONG, (s), (l), (v), N_(n), \
+ (h), PARSE_OPT_NONEG }
 #define OPT_STRING(s, l, v, a, h)   { OPTION_STRING,  (s), (l), (v), (a), (h) }
 #define OPT_STRING_LIST(s, l, v, a, h) \
{ OPTION_CALLBACK, (s), (l), (v), (a), \
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index ecb7417..55b3dba 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -19,6 +19,8 @@ usage: test-parse-options options
 
 -i, --integer n get a integer
 -j nget a integer, too
+-u, --unsigned-long n
+  get an unsigned long
 --set23   set integer to 23
 -t time get timestamp of time
 -L, --length strget length of str
@@ -58,6 +60,7 @@ mv expect expect.err
 cat expect.template EOF
 boolean: 0
 integer: 0
+unsigned long: 0
 timestamp: 0
 string: (not set)
 abbrev: 7
@@ -132,9 +135,32 @@ test_expect_success 'OPT_BOOL() no negation #2' 
'check_unknown_i18n --no-no-fear
 
 test_expect_success 'OPT_BOOL() positivation' 'check boolean: 0 -D --doubt'
 
+test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
+
+test_expect_success 'OPT_ULONG() simple' '
+   check unsigned long: 2345678 -u 2345678
+'
+
+test_expect_success 'OPT_ULONG() kilo' '
+   check unsigned long: 239616 -u 234k
+'
+
+test_expect_success 'OPT_ULONG() mega' '
+   check unsigned long: 104857600 -u 100m
+'
+
+test_expect_success 'OPT_ULONG() giga' '
+   check unsigned long: 1073741824 -u 1g
+'
+
+test_expect_success 'OPT_ULONG() 3giga' '
+   check unsigned long: 3221225472 -u 3g
+'
+
 cat  expect  EOF
 boolean: 2
 integer: 1729
+unsigned long: 16384
 timestamp: 0
 string: 123
 abbrev: 7
@@ -145,7 +171,7 @@ file: 

Improvements to parse-options and a new filter-objects command

2015-06-19 Thread Charles Bailey
In my team we've been looking for a fast way to check a large number of
repositories for large files, which are typically unintentionally checked in
binaries, so that we can warn repository owners and help them tidy up as
desired.

There seem to be two main approaches to scripting this. The first is to do
something revision-walk based such as `log --numstat` and the second is to scan
pack files using `verify-pack -v` and either to ensure that everything is packed
or scan loose objects separately.

The revision walking tends to be slow and parsing verify-pack -v is awkward
not only because of the need to take account of multiple packs and loose
objects, but also because it is porcelainish. For example, at some point it
gained a delta chain summary which needs to be snipped before the list of
packed objects can be sorted and used.

The third patch in this series adds a new built in which makes this simple and
fast. While implementing it, I found a couple of other improvements which I
think stand alone.

[PATCH 1/3] Correct test-parse-options to handle negative ints

I noticed that a printf in test-parse-options was using %u instead of %d for an
int with the consequence that it wouldn't ever print a negative value correctly.
I don't know that we do ever parse a negative integer as an option, but there's
no reason that it shouldn't work so I fixed it and added a trivial test.

[PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

I wanted to be able to parse options like --min-size=500k in my new command so I
started to add OPT_ULONG, only to realise that it already existed but was
private to pack-objects. I added OPT_ULONG support to parse-options based on the
existing OPT_INTEGER code, added new tests and changed pack-objects to use this
instead.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Correct test-parse-options to handle negative ints

2015-06-19 Thread Charles Bailey
From: Charles Bailey cbaile...@bloomberg.net

Fix the printf specification to treat 'integer' as the signed type that
it is and add a test that checks that we parse negative option
arguments.

Signed-off-by: Charles Bailey cbaile...@bloomberg.net
---
 t/t0040-parse-options.sh | 2 ++
 test-parse-options.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index b044785..ecb7417 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -151,6 +151,8 @@ test_expect_success 'short options' '
test_must_be_empty output.err
 '
 
+test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
+
 cat  expect  EOF
 boolean: 2
 integer: 1729
diff --git a/test-parse-options.c b/test-parse-options.c
index 5dabce6..7c492cf 100644
--- a/test-parse-options.c
+++ b/test-parse-options.c
@@ -82,7 +82,7 @@ int main(int argc, char **argv)
argc = parse_options(argc, (const char **)argv, prefix, options, usage, 
0);
 
printf(boolean: %d\n, boolean);
-   printf(integer: %u\n, integer);
+   printf(integer: %d\n, integer);
printf(timestamp: %lu\n, timestamp);
printf(string: %s\n, string ? string : (not set));
printf(abbrev: %d\n, abbrev);
-- 
2.4.0.53.g8440f74

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Using clean/smudge filters with difftool

2015-06-19 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 19.06.2015 00:55:
 John Keeping j...@keeping.me.uk writes:
 
 I think the summary is that there are some scenarios where the external
 diff tool should see the smudged version and others where the clean
 version is more appropriate and Git should support both options.  It
 seems this is a property of the filter, so I wonder if the best solution
 is a new filter.name.extdiff = [clean|smudge] configuration
 variable (there's probably a better name for the variable than
 extdiff).
 
 Not just the external diff, but the textconv filter obeys the same
 rule.  The setting should be done the same way for both, if we are
 going to go in that direction.
 

textconv is a one-way filter from blob to readable blob. External
diffs may prefer to work on blob rather than readable blob, but the
currect setup does not seem to produce surprises.

clean and smudge are two-way filters: clean from worktree blob (aka
file) to repo blob, smudge the other way round.

Typically, the user perceives these as inverse to each other. But we
only require clean to be a left-inverse of smudge, i.e. (cat-file then)
smudge then clean should give the same repo blob (as cat-file).

We don't require that the other way round, i.e. we don't require smudge
to be a left-inverse of clean, and in most setups (like the current one)
it is not: smudge does not recreate what clean has cleaned out. It is a
no-op (the identity, while clean is a projection).

Now, since external diff runs on smudged blobs, it appears as if we
mixed cleaned and smudged blobs when feeding external diffs; whereas
really, we mix worktree blobs and smudged repo blobs, which is okay
as per our definition of clean/smudge: the difference is irrelevant by
definition.

I still think that feeding cleaned blobs to external diff would be less
surprising (and should be the default, but maybe can't be changed any
more) and feeding smudged blobs should be the special case requiring a
special config. Because otherwise, the external diff would have to know
which parts of the diff are irrelevant - if it display the complete
(uncleaned) diff, it shows differences (what will be committed) that
will not end up in the commit (because they will get cleaned out before).

As a guiding principle, a worktree-HEAD diff and an index-HEAD diff
should be previews of the result of commit -a resp. commit, and
therefore should diff cleaned versions. textconv, on the other hand, is
a setting by which you tell git: Don't show me the 'proper'
diff/commit-preview but a readable version.

Michael
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] strbuf: stop out-of-boundary warnings from Coverity

2015-06-19 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 Anyway it does not put strbuf_slopbuf in .rodata.

That's sad.  I wa hoping that it would behave the same as this,
which does give me SEGV:

#include stdio.h

static const char x = '\0';
static char *y = (char *)x;

int main (void) {
*y = 1;
}
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Using clean/smudge filters with difftool

2015-06-19 Thread Florian Aspart
2015-06-19 11:32 GMT+02:00 John Keeping j...@keeping.me.uk:
 On Fri, Jun 19, 2015 at 10:57:55AM +0200, Michael J Gruber wrote:
 Junio C Hamano venit, vidit, dixit 19.06.2015 00:55:
  John Keeping j...@keeping.me.uk writes:
 
  I think the summary is that there are some scenarios where the external
  diff tool should see the smudged version and others where the clean
  version is more appropriate and Git should support both options.  It
  seems this is a property of the filter, so I wonder if the best solution
  is a new filter.name.extdiff = [clean|smudge] configuration
  variable (there's probably a better name for the variable than
  extdiff).
 
  Not just the external diff, but the textconv filter obeys the same
  rule.  The setting should be done the same way for both, if we are
  going to go in that direction.
 

 textconv is a one-way filter from blob to readable blob. External
 diffs may prefer to work on blob rather than readable blob, but the
 currect setup does not seem to produce surprises.

 clean and smudge are two-way filters: clean from worktree blob (aka
 file) to repo blob, smudge the other way round.

 Typically, the user perceives these as inverse to each other. But we
 only require clean to be a left-inverse of smudge, i.e. (cat-file then)
 smudge then clean should give the same repo blob (as cat-file).

 We don't require that the other way round, i.e. we don't require smudge
 to be a left-inverse of clean, and in most setups (like the current one)
 it is not: smudge does not recreate what clean has cleaned out. It is a
 no-op (the identity, while clean is a projection).

 Now, since external diff runs on smudged blobs, it appears as if we
 mixed cleaned and smudged blobs when feeding external diffs; whereas
 really, we mix worktree blobs and smudged repo blobs, which is okay
 as per our definition of clean/smudge: the difference is irrelevant by
 definition.

 I agree with this.

 But I was wrong that should diff clean/should diff smudged is a
 property of the filter.  I can also imagine a situation where a more
 intelligent external diff tool wants to see the smudged version where a
 naïve tool would want the clean version.

 For example, some of the big file stores (e.g. git-lfs [1]) use
 clean/smudge filters and I can imagine a diff utility that avoids
 needing to fetch the data for large files and instead shows the diff on
 the server when both blobs are available there.  In that case we
 generally want to use the smudged copy for external diff, so the filter
 would use that setting, but the diff utility knows better and would want
 to override that.

 [1] https://github.com/github/git-lfs

I can understand why they are not fed with the clean copy by default.
Since some external diff tool enable modifying the working copy file,
this would correspond to apply the cleaning filter to the working copy
version.

Nevertheless, in my case it would be really helpful if there were an
option to feed the external diff tool with the cleaned version.
Otherwise, I'll probably write a custom script which does this.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Paul Tan
On Fri, Jun 19, 2015 at 11:09 PM, Junio C Hamano gits...@pobox.com wrote:
 You do realize that strbuf internally does alloc/free so as a solution to
 fragmentation issue you are at the mercy of the same alloc/free, don't you?

Yes, of course, but it has the alloc variable to keep track of the
size of the allocated buffer, and provided that ALLOC_GROW() uses a
sensible factor to grow the buffer, it won't be calling malloc/free
that much to be a problem.

Of course, we could do the same and just add a alloc variable as
well and then use it with strbuf_attach()/strbuf_detach(), but at this
point why not just store it in an strbuf then and avoid the extra
alloc struct member?

 The primary thing I care about is to discourage callers of the API element
 am_state from touching these fields with strbuf functions. If it is char * 
 then
 the would think twice before modifying (which would involve allocating the
 new buffer, forming the new value and assigning into it). With the fields left
 as strbuf after they are read or formed by the API (by reading by the API
 implementation from $GIT_DIR/rebase-apply/), it is too tempting to do
 strbuf_add(), strbuf_trim(), etc., without restoring the value to the original
 saying I'm the last user of it anyway--that is the sloppiness we can prevent
 by not leaving it as strbuf.

I don't think this is a good deterrent. If the code wanted to, it
could just use strbuf_attach()/strbuf_detach() as well, no?

I believe this kind of issue would be better solved with a big WARNING
comment and code review.

Also, there may be use cases where the user may wish to modify the
commit message/author fields. E.g., user may wish to modify the
message of the commit that results from am --continue to take into
account the conflicts that caused the am to fail.

Regards,
Paul
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Using clean/smudge filters with difftool

2015-06-19 Thread Junio C Hamano
Michael J Gruber g...@drmicha.warpmail.net writes:

 Now, since external diff runs on smudged blobs, it appears as if we
 mixed cleaned and smudged blobs when feeding external diffs; whereas
 really, we mix worktree blobs and smudged repo blobs, which is okay
 as per our definition of clean/smudge: the difference is irrelevant by
 definition.

It does not appear to mix cleaned and smudged to me (even though
before Dscho's commit that John pointed out, we did mix by mistake)
to me, but you arrived at the correct conclusion in the rest of your
sentence.

We treat worktree files and smudged repo blobs as comparable
because by definition the latter is what you get if you did a
checkout of the blob.  Indeed, when we know a worktree file is an
unmodified checkout from a blob and we want to have a read-only
temporary file for a smudged repo blob, we allow that worktree
file to be used as such.

So in that sense, the commit by Dscho that John pointed out earlier
was not something that changed the semantics; it merely made things
consistent (before that commit, we used to use clean version if we
do not have a usable worktree file).

It is a separate question which of clean or smudged an external diff
tool should be given to work on.

 I still think that feeding cleaned blobs to external diff would be less
 surprising (and should be the default, but maybe can't be changed any
 more) and feeding smudged blobs should be the special case requiring a
 special config.

Go back six years and make a review comment before 4e218f54 (Smudge
the files fed to external diff and textconv, 2009-03-21) was taken
;-).  The argument against that commit may have gone like this:

 * The current (that is, current as of 4e218f54^) code is
   inconsistent, and your patch has a side effect of making it
   consistent by always feeding smudged version.

 * We however could make it consistent by always feeding clean
   version (i.e. disable borrow-from-working-tree codepath when
   driving external diff).  And that gives us cleaner semantics; the
   internal diff and external diff will both work on clean, not
   smudged data.

 * Of course, going the clean way would not help your cause of
   allowing external diff to work on smudged version, so you would
   need a separate patch on top of that consistently feed 'clean'
   version fix to optionally allow consistently feed 'smudge'
   version mode to help msysGit issue 177.

And I would have bought such an argument with 97% chance [*1*].

I do not think 6 years have changed things very much with respect to
the above three-bullet point argument, except that it would be too
late to set the default to 'clean' all of a sudden.  So a plausible
way forward would be to

 * introduce an option to feed 'clean' versions to external diff
   drivers, perhaps with --ext-diff-clean=driver command line
   option and GIT_EXTERNAL_DIFF_CLEAN environment variable, both of
   which take precedence over existing --ext-diff/GIT_EXTERNAL_DIFF

 * optionally add a configuration variable diff.feedCleanToExternal
   that makes --ext-diff/GIT_EXTERNAL_DIFF behave as if their
   'clean' siblings were given.  Default it to false.

My gut feeling is that textconv should need a similar treatment for
consistency (after all, it goes through the same prepare_temp_file()
infrastructure).


[Footnote]

*1* The 3% reservation is that I am not entirely convinced that
both internal and external get to work on the same 'clean'
representation gives us cleaner semantics is always true.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Junio C Hamano
Alexander Kuleshov kuleshovm...@gmail.com writes:

 2015-06-19 3:46 GMT+06:00 Junio C Hamano gits...@pobox.com:
 I agree with later -o should override an earlier one, but I do not
 necessarily agree with '-o -' should be --stdout, for a simple
 reason that -o foo is not --stdout foo.

 Perhaps something like this to replace builtin/ part of Alexander's
 patch?

 @@ -1337,6 +1342,9 @@ int cmd_format_patch(int argc, const char **argv, 
 const char *prefix)
 die (_(--subject-prefix and -k are mutually exclusive.));
 rev.preserve_subject = keep_subject;

 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;
 +


 But there is following condition above:

  if (!use_stdout)
   output_directory = set_outdir(prefix, output_directory);

 After which output_directory will be ./ everytime and


 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;
 +


 will not work here.

I thought I made that if we did not see '-o dir' on the command
line, initialize output_directory to what we read from the config
before we make a call to set_outdir().

What I am missing?  

Puzzled...  FWIW, IIRC, the patch you are responding to passed the
test you added.

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH/WIP v3 07/31] am: extract patch, message and authorship with git-mailinfo

2015-06-19 Thread Junio C Hamano
Paul Tan pyoka...@gmail.com writes:

 On Fri, Jun 19, 2015 at 5:10 AM, Junio C Hamano gits...@pobox.com wrote:
 Paul Tan pyoka...@gmail.com writes:

 + /* commit message and metadata */
 + struct strbuf author_name;
 + struct strbuf author_email;
 + struct strbuf author_date;
 + struct strbuf msg;

 Same comment as dir in the earlier patch applies to these.  If the
 fields are read or computed and then kept constant, use a temporary
 variable that is a strbuf to read/compute the final value, and then
 detach to a const char * field.  If they are constantly changing
 and in-place updates are vital, then they can and should be strbufs,
 but I do not think that is the case for these.

 I do think it is the case here. The commit message and metadata fields
 change for every patch we process, and we could be processing a large
 volume of patches, so we must be careful to not leak any memory.

With the above fields, it is clear that the above fields are
per-message thing.  So the loop to process multiple messages is
conceptually:


set up the entire am_state (for things like cur=1, last=N)
for each message {
update per-message fields like cur, author, msg, etc.
process the single message
clean up per-message fileds like cur++, free(msg), etc.
}
tear down the entire am_state

Reusing the same strbuf and rely on them to clean up after
themselves is simply a bad taste.

More importantly, we'd want to make sure that people who will be
touching the process the single message part in the above loop to
think twice before mucking with read-only fields like author.

If they are char *, they would need to allocate new storage
themselves, format a new value into there, free the original, and
replace the field with the new value.  It takes a conscious effort
and very much visible code and would be clear to reviewers what is
going on, and gives them a chance to question if it is a good idea
to update things in-place in the first place.

If you left it in strbuf, that invites let's extend it temporarily
with strbuf_add() and then return to the original once I am done
with this single step, which is an entry to a slippery slope to
let's extend it with strbuf_add() for my purpose, and I do not even
bother to clean up because I know I am the last person to touch
this.

So, no, please don't leave a field that won't change during the bulk
of the processing in strbuf, unless you have a compelling reason to
do so (and compelling reasons are pretty much limited to after
all, that field we originally thought it won't change is something
we need to change very often).

Thanks.


--
To unsubscribe from this list: send the line unsubscribe git in


Re: Visualizing merge conflicts after the fact (using kdiff3)

2015-06-19 Thread Junio C Hamano
Michael J Gruber g...@drmicha.warpmail.net writes:

 Junio C Hamano venit, vidit, dixit 18.06.2015 17:57:
 
 Perhaps 'tr/remerge-diff' (on 'pu') is of interest?

 I haven't reviewed remerge-diff but merged it on top of my own local
 additions and ran the full test suite successfully. Any big blocker to
 watch out for?

What's cooking report marks it as Waiting for a
reroll. ($gmane/256591).

So 

  http://thread.gmane.org/gmane.comp.version-control.git/256591

would be the first place to revisit.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Alexander Kuleshov
 I thought I made that if we did not see '-o dir' on the command
 line, initialize output_directory to what we read from the config
 before we make a call to set_outdir().

 What I am missing?

 Puzzled...  FWIW, IIRC, the patch you are responding to passed the
 test you added.

Ok, Now we have:

if (!use_stdout)
output_directory = set_outdir(prefix, output_directory);
else
setup_pager();

and

if (output_directory) {
// test that we did not pass use_stdout and mkdir than
}

If we didn't pass --stdout and -o the set_outdir will be called
and there is

static const char *set_outdir(const char *prefix, const char *output_directory)
{
//printf(is_absoulte_path %d\n, is_absolute_path(output_directory));
if (output_directory  is_absolute_path(output_directory))
return output_directory;

if (!prefix || !*prefix) {
if (output_directory)
return output_directory;
return ./;
}

}

So it returns ./, output_directory will not be null. After this

 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;

clause will not be executed never. Or I've missed something?

Thank you.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Improve contrib/diff-highlight to highlight unevenly-sized hunks

2015-06-19 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I do strip it off, so it is OK for it to be different in both the
 pre-image and post-image. But what I can't tolerate is the
 intermingling with actual data:

   +\t\t\x1b[32m;foo
   +\t\x1b[32m;bar

I think that depends on the definition of strip it off ;-)  What I
meant was that whitespace coloring can appear anywhere on the line,
e.g.

echo COPYING whitespace=tab-in-indent,-space-before-tab .gitattributes
printf  \tHeh\n COPYING
git diff

will give you

new + reset SP wserror HT reset new Heh reset LF

Obviously, stripping + painted in new is not sufficient.

Side note: hmm, shouldn't that SP painted in new, though?

 I think this array of spans is the only way to go. Otherwise whichever
 markup scheme processes the hunk first ruins the data for the next
 processor.

Yes, I agree with that 100%.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Junio C Hamano
Alexander Kuleshov kuleshovm...@gmail.com writes:

 Ah, you mean to put this check before.

I am fuzzy what you mean before (or after); the how about doing
it this way instead? patch we are discussing is to replace the
change you did in your original, so if you apply it you would know
what addition goes to where ;-)
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 3/3] Add filter-objects command

2015-06-19 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Right, my point was only that it works for _your_ particular
 filter, but it would be nice to have something more general. And
 we already have cat-file --batch-check. IOW, I think I would
 prefer the magical form because it's a better scripting building
 block. As you note, filter-objects without any filters is
 exactly that. Your 10 extra lines of C code are not exactly bloat,
 but I just wonder if other people will find it all that useful.

Yup.  I do not mind a fast enumerate all objects but I suspect
that making all fast may turn out to be not so great a trade-off
after all, as you would need more work on the now we have all
coming from our input, let's filter with this and that criteria
downstream in general cases.  Graph-based filtering e.g. Oops, here
is our whole customer database committed by mistake--which branch
should we rewrite to nuke? inherently is much more costly to do in
the downstream that essentially has to reconstruct the graph around
interesting parts of the history, and is better done by enumerate
phase spending time to do the actual graph traversal.

And filter-anything should not be the name for enumerate command
that comes on the upstream of a pipe.  You usually call what is
downstream of a pipe a filter.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v5 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 I basically made up names on the go, based on the messages.

 Some of the questionable groups are:
 
 BAD_DATE DATE_OVERFLOW

 I guess it should be BAD_DATE_OVERFLOW to be more consistent?

I am not sure about consistency, but surely a common prefix would
help readers to group things.  But for this particular group, I was
wondering if singling out integer overflow, zero stuffed
timestamp, etc. into such a finer sub-errors of you have a bad
timestamp was beneficial.

 BAD_TREE_SHA1 INVALID_OBJECT_SHA1 INVALID_TREE
 
 BAD_PARENT_SHA1 INVALID_OBJECT_SHA1

 So how about s/INVALID_/BAD_/g?

It is not just about distinction between INVAID and BAD.

I was basically wondering what rule decides which one among
BAD_TREE_SHA1, INVALID_OBJECT_SHA1 and INVALID_TREE I would get when
I have a random non-hexdigit string in various places, e.g. after
'tree ' in the object header of a commit object, after 'tag ' in a
tag object that says 'type tree', etc.

 Also it is unclear if NOT_SORTED is to be used ever for any error
 other than a tree object sorted incorrectly, or if we start noticing
 a new error that something is not sorted, we will reuse this one.

 s/NOT_SORTED/TREE_/ maybe?

If that error is specific to tree sorting order, then that would be
a definite improvement.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] mergetool-lib: fix default tool selection

2015-06-19 Thread Junio C Hamano
Michael J Gruber g...@drmicha.warpmail.net writes:

 When no diff nor merge tool is specified (config, option), mergetool-lib
 is supposed to choose a default tool from a set of tools. That set is
 constructed dynamically depending on the environment (graphical, editor
 setting) as a space separated string of tool names.

 719518f (mergetool--lib: set IFS for difftool and mergetool, 2015-05-20)
 introduced a newline as IFS which breaks the parsing of the space
 separated list into items, resulting in a failed search for an available
 tool.

 Set IFS to a space locally for the tool search.

I wondered where this locally is ensured; it turns out that this
shell function is supposed to be always called inside $() to return
its result via its standard output ;-)

So I think this change makes sense.

 Signed-off-by: Michael J Gruber g...@drmicha.warpmail.net
 ---
  git-mergetool--lib.sh | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/git-mergetool--lib.sh b/git-mergetool--lib.sh
 index 14b039d..54ac8e4 100644
 --- a/git-mergetool--lib.sh
 +++ b/git-mergetool--lib.sh
 @@ -305,6 +305,7 @@ guess_merge_tool () {
   EOF
  
   # Loop over each candidate and stop when a valid merge tool is found.
 + IFS=' '
   for tool in $tools
   do
   is_available $tool  echo $tool  return 0
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Alexander Kuleshov
Ah, you mean to put this check before. Just tested it and
many tests are broken. Will look on it now

2015-06-19 23:19 GMT+06:00 Alexander Kuleshov kuleshovm...@gmail.com:
 I thought I made that if we did not see '-o dir' on the command
 line, initialize output_directory to what we read from the config
 before we make a call to set_outdir().

 What I am missing?

 Puzzled...  FWIW, IIRC, the patch you are responding to passed the
 test you added.

 Ok, Now we have:

 if (!use_stdout)
 output_directory = set_outdir(prefix, output_directory);
 else
 setup_pager();

 and

 if (output_directory) {
 // test that we did not pass use_stdout and mkdir than
 }

 If we didn't pass --stdout and -o the set_outdir will be called
 and there is

 static const char *set_outdir(const char *prefix, const char 
 *output_directory)
 {
 //printf(is_absoulte_path %d\n, is_absolute_path(output_directory));
 if (output_directory  is_absolute_path(output_directory))
 return output_directory;

 if (!prefix || !*prefix) {
 if (output_directory)
 return output_directory;
 return ./;
 }
 
 }

 So it returns ./, output_directory will not be null. After this

 +   if (!output_directory  !use_stdout)
 +   output_directory = config_output_directory;

 clause will not be executed never. Or I've missed something?

 Thank you.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: co-authoring commits

2015-06-19 Thread Jakub Narębski
On 2015-06-18 at 23:25, Tuncer Ayaz wrote:
 On Thu, Jun 18, 2015 at 12:52 AM, Theodore Ts'o wrote:
 On Wed, Jun 17, 2015 at 10:26:32PM +0200, Tuncer Ayaz wrote:
[...]
 One could imagine some frankly, quite rare example where there is a
 team of people who votes on each commit before it gets sent out and
 where everyone is equal and there is no hierarchy. In that case,
 perhaps you could set the from field to a mailing list address. But
 honestly, how often is that *all* of the authors are completely
 equal[1]?
 
 For that case something like patchwork, phabricator, or gerrit seems
 to be the logical tool to use, and should ideally leave a trace of
 approvals and such in the resulting commit message(s). If the patch
 management tool takes care of merging the commit(s), it can be harder
 to misattribute signed-off/reviewed-by/etc, which is a good thing.

Doesn't Gerrit (at least) use trailer-like structured *notes* in the
'reviews' category (i.e. refs/notes/reviews ref) to store information
about review process?

 You could of course use multiple (everybody makes their own) commits,
 where you risk breaking bisectability and avoid the need for equal
 co-authorship support. In pair programming such intermediate commits
 will quite often be fixups, and when you attempt to squash the fixups
 for bisectability's sake, you may get a desire for co-authorship of
 the resulting commit.

Hmmm... I didn't think about the problem of attributing authorship
for squashed commits.  Though here multiple 'author' headers, or
multiline 'author' header would be a better match than 'coauthor'
header (which itself doesn't need, I think, the date filed, or does it?)

[This is sent from Thunderbird news, so it should be all right]
-- 
Jakub Narębski
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 1/3] Correct test-parse-options to handle negative ints

2015-06-19 Thread Junio C Hamano
Charles Bailey char...@hashpling.org writes:

 From: Charles Bailey cbaile...@bloomberg.net

 Fix the printf specification to treat 'integer' as the signed type that
 it is and add a test that checks that we parse negative option
 arguments.

 Signed-off-by: Charles Bailey cbaile...@bloomberg.net
 ---

Makes sense.  Will queue.

  t/t0040-parse-options.sh | 2 ++
  test-parse-options.c | 2 +-
  2 files changed, 3 insertions(+), 1 deletion(-)

 diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
 index b044785..ecb7417 100755
 --- a/t/t0040-parse-options.sh
 +++ b/t/t0040-parse-options.sh
 @@ -151,6 +151,8 @@ test_expect_success 'short options' '
   test_must_be_empty output.err
  '
  
 +test_expect_success 'OPT_INT() negative' 'check integer: -2345 -i -2345'
 +
  cat  expect  EOF
  boolean: 2
  integer: 1729
 diff --git a/test-parse-options.c b/test-parse-options.c
 index 5dabce6..7c492cf 100644
 --- a/test-parse-options.c
 +++ b/test-parse-options.c
 @@ -82,7 +82,7 @@ int main(int argc, char **argv)
   argc = parse_options(argc, (const char **)argv, prefix, options, usage, 
 0);
  
   printf(boolean: %d\n, boolean);
 - printf(integer: %u\n, integer);
 + printf(integer: %d\n, integer);
   printf(timestamp: %lu\n, timestamp);
   printf(string: %s\n, string ? string : (not set));
   printf(abbrev: %d\n, abbrev);
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v2] format-patch: introduce format.outputDirectory configuration

2015-06-19 Thread Alexander Kuleshov
We can pass -o/--output-directory to the format-patch command to
store patches not in the working directory. This patch introduces
format.outputDirectory configuration option for same purpose.

The case of usage of this configuration option can be convinience
to not pass everytime -o/--output-directory if an user has pattern
to store all patches in the /patches directory for example.

The format.outputDirectory has lower priority than command line
option, so if user will set format.outputDirectory and pass the
command line option, a result will be stored in a directory that
passed to command line option.

Signed-off-by: Alexander Kuleshov kuleshovm...@gmail.com
---
 Documentation/config.txt   |  4 
 Documentation/git-format-patch.txt |  6 +-
 builtin/log.c  |  8 
 t/t4014-format-patch.sh| 18 ++
 4 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e159fe5..4f991b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1262,6 +1262,10 @@ format.coverLetter::
format-patch is invoked, but in addition can be set to auto, to
generate a cover-letter only when there's more than one patch.
 
+format.outputDirectory::
+   Set a custom directory to store the resulting files instead of the
+   current working directory.
+
 filter.driver.clean::
The command which is used to convert the content of a worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 0dac4e9..38ddd76 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -57,7 +57,11 @@ The names of the output files are printed to standard
 output, unless the `--stdout` option is specified.
 
 If `-o` is specified, output files are created in dir.  Otherwise
-they are created in the current working directory.
+they are created in the current working directory. The default path
+can be set with the seting 'format.outputDirectory' configuration option.
+If `-o` is specified and 'format.outputDirectory' is set, output files
+will be stored in a dir that passed to `-o`. When 'format.outputDirectory'
+is set to get default behaviour back is to pass './' to the `-o`.
 
 By default, the subject of a single patch is [PATCH]  followed by
 the concatenation of lines from the commit message up to the first blank
diff --git a/builtin/log.c b/builtin/log.c
index 78b3e2c..fc26360 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -688,6 +688,8 @@ enum {
COVER_AUTO
 };
 
+static const char *config_output_directory = NULL;
+
 static int git_format_config(const char *var, const char *value, void *cb)
 {
if (!strcmp(var, format.headers)) {
@@ -758,6 +760,9 @@ static int git_format_config(const char *var, const char 
*value, void *cb)
config_cover_letter = git_config_bool(var, value) ? COVER_ON : 
COVER_OFF;
return 0;
}
+   if (!strcmp(var, format.outputdirectory)) {
+   return git_config_string(config_output_directory, var, value);
+   }
 
return git_log_config(var, value, cb);
 }
@@ -1368,6 +1373,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (rev.show_notes)
init_display_notes(rev.notes_opt);
 
+   if (!output_directory  !use_stdout)
+   output_directory = config_output_directory;
+
if (!use_stdout)
output_directory = set_outdir(prefix, output_directory);
else
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 890db11..613e2cc 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -40,6 +40,24 @@ test_expect_success setup '
 
 '
 
+test_expect_success format-patch format.outputDirectory option '
+   git config format.outputDirectory patches/ 
+   git format-patch master..side 
+   cnt=$(ls | wc -l) 
+   test $cnt = 3 
+   test_config format.outputDirectory patches/ 
+   git config --unset format.outputDirectory
+'
+
+test_expect_success format-patch format.outputDirectory overwritten with -o '
+   rm -rf patches 
+   git config format.outputDirectory patches/ 
+   git format-patch master..side -o . 
+   test_must_fail ls patches/ 
+   test_config format.outputDirectory patches/ 
+   git config --unset format.outputDirectory
+'
+
 test_expect_success format-patch --ignore-if-in-upstream '
 
git format-patch --stdout master..side patch0 
-- 
2.4.4.727.g5c3049e.dirty

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 diff --git a/fsck.c b/fsck.c
 index da5717c..8c3caff 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -103,13 +103,85 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
  {
   int msg_type;
  
 - msg_type = msg_id_info[msg_id].msg_type;
 - if (options-strict  msg_type == FSCK_WARN)
 - msg_type = FSCK_ERROR;
 + assert(msg_id = 0  msg_id  FSCK_MSG_MAX);
 +
 + if (options-msg_type)
 + msg_type = options-msg_type[msg_id];
 + else {
 + msg_type = msg_id_info[msg_id].msg_type;
 + if (options-strict  msg_type == FSCK_WARN)
 + msg_type = FSCK_ERROR;
 + }
  
   return msg_type;
  }

Nice.

 +static inline int substrcmp(const char *string, int len, const char *match)
 +{
 + int match_len = strlen(match);
 + if (match_len != len)
 + return -1;
 + return memcmp(string, match, len);
 +}

What this does looks suspiciously like starts_with(), but its name
substrcmp() does not give any hint that this is about the beginnig
part of string; if anything, it gives a wrong hint that it may be
any substring.  prefixcmp() might be a better name but that was the
old name for !starts_with() so we cannot use it here.  It is a
mouthful, but starts_with_counted() may be.

But the whole thing may be moot.

If we take the why not upcase the end-user string upfront
suggestion from the previous review, fsck_set_msg_types() would have
an upcased copy of the end-user string that it can muck with; it can
turn badfoo=error,poorbar=... into BADFOO=error,POORBAR=...
that is stored in its own writable memory (possibly a strbuf), and
at that point it can afford to NUL-terminate BADFOO=error after
finding where one specification ends with strcspn() before calling
fsck_set_msg_type(), which in turn calls parse_msg_type().

So all parse_msg_type() needs to do is just !strcmp().

 +
 +static int parse_msg_type(const char *str, int len)
 +{
 + if (len  0)
 + len = strlen(str);
 +
 + if (!substrcmp(str, len, error))
 + return FSCK_ERROR;
 + else if (!substrcmp(str, len, warn))
 + return FSCK_WARN;

 + else
 + die(Unknown fsck message type: '%.*s',
 + len, str);
 +}
 +
 +void fsck_set_msg_type(struct fsck_options *options,
 + const char *msg_id, int msg_id_len,
 + const char *msg_type, int msg_type_len)
 +{
 + int id = parse_msg_id(msg_id, msg_id_len), type;
 +
 + if (id  0)
 + die(Unhandled message id: %.*s, msg_id_len, msg_id);
 + type = parse_msg_type(msg_type, msg_type_len);
 +
 + if (!options-msg_type) {
 + int i;
 + int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX);
 + for (i = 0; i  FSCK_MSG_MAX; i++)
 + msg_type[i] = fsck_msg_type(i, options);
 + options-msg_type = msg_type;
 + }
 +
 + options-msg_type[id] = type;
 +}
 +
 +void fsck_set_msg_types(struct fsck_options *options, const char *values)
 +{
 + while (*values) {
 + int len = strcspn(values,  ,|), equal;
 +
 + if (!len) {
 + values++;
 + continue;
 + }
 +
 + for (equal = 0; equal  len; equal++)
 + if (values[equal] == '=' || values[equal] == ':')
 + break;
 +
 + if (equal == len)
 + die(Missing '=': '%.*s', len, values);
 +
 + fsck_set_msg_type(options, values, equal,
 + values + equal + 1, len - equal - 1);
 + values += len;
 + }
 +}
 +
  __attribute__((format (printf, 4, 5)))
  static int report(struct fsck_options *options, struct object *object,
   enum fsck_msg_id id, const char *fmt, ...)
 @@ -599,6 +671,10 @@ int fsck_object(struct object *obj, void *data, unsigned 
 long size,
  
  int fsck_error_function(struct object *obj, int msg_type, const char 
 *message)
  {
 + if (msg_type == FSCK_WARN) {
 + warning(object %s: %s, sha1_to_hex(obj-sha1), message);
 + return 0;
 + }
   error(object %s: %s, sha1_to_hex(obj-sha1), message);
   return 1;
  }
 diff --git a/fsck.h b/fsck.h
 index f6f268a..edb4540 100644
 --- a/fsck.h
 +++ b/fsck.h
 @@ -6,6 +6,11 @@
  
  struct fsck_options;
  
 +void fsck_set_msg_type(struct fsck_options *options,
 + const char *msg_id, int msg_id_len,
 + const char *msg_type, int msg_type_len);
 +void fsck_set_msg_types(struct fsck_options *options, const char *values);
 +
  /*
   * callback function for fsck_walk
   * type is the expected type of the object or OBJ_ANY
 @@ -25,10 +30,11 @@ struct fsck_options {
   fsck_walk_func walk;
   fsck_error error_func;
   unsigned strict:1;
 + int *msg_type;
  };
  
 -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0 

Re: [PATCH v6 06/19] fsck: Report the ID of the error/warning

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Some legacy code has objects with non-fatal fsck issues; To enable the
 user to ignore those issues, let's print out the ID (e.g. when
 encountering missingemail, the user might want to call `git config
 --add receive.fsck.missingemail=warn`).

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---
  fsck.c  | 16 
  t/t1450-fsck.sh |  4 ++--
  2 files changed, 18 insertions(+), 2 deletions(-)

 diff --git a/fsck.c b/fsck.c
 index 8e6faa8..0b3e18f 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -190,6 +190,20 @@ void fsck_set_msg_types(struct fsck_options *options, 
 const char *values)
   }
  }
  
 +static void append_msg_id(struct strbuf *sb, const char *msg_id)
 +{
 + for (;;) {
 + char c = *(msg_id)++;
 +
 + if (!c)
 + break;
 + if (c != '_')
 + strbuf_addch(sb, tolower(c));
 + }
 +
 + strbuf_addstr(sb, : );
 +}
 +
  __attribute__((format (printf, 4, 5)))
  static int report(struct fsck_options *options, struct object *object,
   enum fsck_msg_id id, const char *fmt, ...)
 @@ -198,6 +212,8 @@ static int report(struct fsck_options *options, struct 
 object *object,
   struct strbuf sb = STRBUF_INIT;
   int msg_type = fsck_msg_type(id, options), result;
  
 + append_msg_id(sb, msg_id_info[id].id_string);


Nice.  The append function can be made a bit more context sensitive
to upcase a char immediately after _ to make it easier to cut and
paste into git config and keep the result readable, I think.

git config --add receive.fsck.missingEmail=warn

   va_start(ap, fmt);
   strbuf_vaddf(sb, fmt, ap);
   result = options-error_func(object, msg_type, sb.buf);
 diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
 index cfb32b6..d6d3b13 100755
 --- a/t/t1450-fsck.sh
 +++ b/t/t1450-fsck.sh
 @@ -231,8 +231,8 @@ test_expect_success 'tag with incorrect tag name  
 missing tagger' '
   git fsck --tags 2out 
  
   cat expect -EOF 
 - warning in tag $tag: invalid '\''tag'\'' name: wrong name format
 - warning in tag $tag: invalid format - expected '\''tagger'\'' line
 + warning in tag $tag: badtagname: invalid '\''tag'\'' name: wrong name 
 format
 + warning in tag $tag: missingtaggerentry: invalid format - expected 
 '\''tagger'\'' line
   EOF
   test_cmp expect out
  '
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v5 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-19 Thread Johannes Schindelin
Hi Junio,

first of all: the improvements discussed here are already part of v6.

On 2015-06-19 19:33, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 I basically made up names on the go, based on the messages.

 Some of the questionable groups are:

 BAD_DATE DATE_OVERFLOW

 I guess it should be BAD_DATE_OVERFLOW to be more consistent?
 
 I am not sure about consistency, but surely a common prefix would
 help readers to group things.  But for this particular group, I was
 wondering if singling out integer overflow, zero stuffed
 timestamp, etc. into such a finer sub-errors of you have a bad
 timestamp was beneficial.

Well, someone thought it a good idea to print different error messages, and I 
took that as an indicator that there is merit in being able to distinguish 
these issues from one another.
 
 BAD_TREE_SHA1 INVALID_OBJECT_SHA1 INVALID_TREE

 BAD_PARENT_SHA1 INVALID_OBJECT_SHA1

 So how about s/INVALID_/BAD_/g?
 
 It is not just about distinction between INVAID and BAD.
 
 I was basically wondering what rule decides which one among
 BAD_TREE_SHA1, INVALID_OBJECT_SHA1 and INVALID_TREE I would get when
 I have a random non-hexdigit string in various places, e.g. after
 'tree ' in the object header of a commit object, after 'tag ' in a
 tag object that says 'type tree', etc.

To be honest, I think the IDs do not really matter as much as your comment 
makes it sound: the IDs purpose is solely to be able to configure the message 
type (read: whether to error out, warn or ignore the issue). The real 
information is in the actual message (and I did not change any message, 
therefore I could not make things worse than they are right now).

Example: you would never read BAD_TREE_SHA1, but instead: badtreesha1: invalid 
'tree' line format - bad sha1.

For BAD_OBJECT_SHA1 (as it is now called), there are actually two code paths 
generating the error: invalid 'object' line format - bad sha1 and no valid 
object to fsck.

And for BAD_TREE it is: could not load commit's tree SHA1.

Thus, from the error message it should be really clear what is going on.

 Also it is unclear if NOT_SORTED is to be used ever for any error
 other than a tree object sorted incorrectly, or if we start noticing
 a new error that something is not sorted, we will reuse this one.

 s/NOT_SORTED/TREE_/ maybe?
 
 If that error is specific to tree sorting order, then that would be
 a definite improvement.

Yes, that is the case. Tree objects are assumed to list their contents in 
order, and this ID applies to the problem where a tree object's list is out of 
order.

As I said, I already made both discussed changes part of v6.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 07/19] fsck: Make fsck_ident() warn-friendly

2015-06-19 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 When fsck_ident() identifies a problem with the ident, it should still
 advance the pointer to the next line so that fsck can continue in the
 case of a mere warning.

 Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
 ---

Makes sense.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

2015-06-19 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Except for the minor nits above, I think this is a good change.

Oh, I forgot to mention one thing.  I am not sure if this should be
called ULONG.  unsigned long-ness is not the most important part
of this thing from the end-user's point of view, and also from the
point of view of the programmer who supports end-users by using this
new feature.

It is unlike OPT_INTEGER, the user can specify it as a human
readble scaled quantity that is the reason to use this new thing.
I think we discussed to introduce OPT_HUMINT (HUM stands for HUMAN,
obviously) or some name like that a few years ago to do exactly
this, but that is not a great name, either.

I was tempted to suggest a name that has size in it, but because
places that we may conceivably want to use it in the future would be
to specify:

 - sizes, e.g. split the packfiles into 4.3G chunks.

 - counts, e.g. show me the most recent 2k commits.

 - bandwidth, e.g. limit the transfer to consume at most 2M bps.

which is not limited to size, it is not a very good idea, either.

OPT_SCALED_ULONG(), or something with scaled in its name, perhaps?
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c

2015-06-19 Thread Jakub Narębski
W dniu 2015-06-19 o 19:58, Junio C Hamano pisze:
 Charles Bailey char...@hashpling.org writes: 
[...]
 +if (!git_parse_ulong(arg, opt-value))
 +return opterror(opt, expects a numerical value, 
 flags);
 
 This used to be:
 
 -die(_(unable to parse value '%s' for option %s),
 -arg, opt-long_name);
 
 but opterror() talks about which option, so there is no information
 loss by losing for option %s from here.  That means there is only
 one difference for pack-objects:
 
 $ git pack-objects --max-pack-size=1T
 fatal: unable to parse value '1T' for option max-pack-size
 $ ./git pack-objects --max-pack-size=1T
 error: option `max-pack-size' expects a numerical value
 usage: git pack-objects --stdout [options...
 ... 30 more lines omitted ...
 
 Eh, make that two:
 
  * We no longer say what value we did not like.  The user presumably
knows what he typed, so this is only a minor loss.

Well, in this case this is not a problem, but for longer commandline
invocation it might be hard to find the exact argument among all the
options (though I don't think there is any integer-accepting option
that can be repeated).
 
  * We used to stop without giving usage, as the error message was
specific enough.  We now spew descriptions on other options
unrelated to the specific error the user may want to concentrate
on.  Perhaps this is a minor regression.
 
 I wonder if expects a numerical value is the best way to say this.

Is expects numerical value easier to understand than unable to
parse value?
 
 Ponder:
 
  - we do not take 4.8

   - we won't take locale specific 4,8 (for some locales)

   - 4O is not numerical... 40 is

  - we do not take -4.
  - people may not realize, from numerical, that we take 5M.
 
 Except for the minor nits above, I think this is a good change.
 
 This is a totally unrelated tangent that does not have to be part of
 your series, but we probably should take 4.8M; I do not think we
 currently do.
 
 Oh, and perhaps 1T, too.
 

--
To unsubscribe from this list: send the line unsubscribe git in


  1   2   >