Re: [BUG] branch renamed to 'HEAD'

2017-02-28 Thread Jacob Keller
On Mon, Feb 27, 2017 at 4:53 PM, Jeff King  wrote:
> On Mon, Feb 27, 2017 at 04:33:36PM -0800, Junio C Hamano wrote:
>
>> A flag to affect the behaviour (as opposed to &flag as a secondary
>> return value, like Peff's patch does) can be made to work.  Perhaps
>> a flag that says "keep the input as is if the result is not a local
>> branch name" would pass an input "@" intact and that may be
>> sufficient to allow "git branch -m @" to rename the current branch
>> to "@" (I do not think it is a sensible rename, though ;-).  But
>> probably some callers need to keep the original input and compare
>> with the result to see if we expanded anything if we go that route.
>> At that point, I am not sure if there are much differences in the
>> ease of use between the two approaches.
>
> I just went into more detail in my reply to Jacob, but I do think this
> is a workable approach (and fortunately we seem to have banned bare "@"
> as a name, along with anything containing "@{}", so I think we would end
> up rejecting these nonsense names).
>
> I'll see if I can work up a patch. We'll still need to pass the flag
> around through the various functions, but at least it will be a flag and
> not a confusing negated out-parameter.
>
> -Peff

Yes, this is pretty much what I had imagined. I look forward to seeing
the patch.

Thanks,
Jake


[PATCH 1/2] docs/diffcore: fix grammar in diffcore-rename header

2017-02-28 Thread Patrick Steinhardt
Signed-off-by: Patrick Steinhardt 
---
 Documentation/gitdiffcore.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index 46bc6d077..cf009a187 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -119,7 +119,7 @@ the original is used), and can be customized by giving a 
number
 after "-B" option (e.g. "-B75" to tell it to use 75%).
 
 
-diffcore-rename: For Detection Renames and Copies
+diffcore-rename: For Detecting Renames and Copies
 -
 
 This transformation is used to detect renames and copies, and is
-- 
2.12.0



[PATCH 2/2] docs/diffcore: unquote "Complete Rewrites" in headers

2017-02-28 Thread Patrick Steinhardt
The gitdiffcore documentation quotes the term "Complete Rewrites" in
headers for no real gain. This would make sense if the term could be
easily confused if not properly grouped together. But actually, the term
is quite obvious and thus does not really need any quoting, especially
regarding that it is not used anywhere else.

But more importanly, this brings up a bug when rendering man pages: when
trying to render quotes inside of a section header, we end up with
quotes which have been misaligned to the end of line. E.g.

diffcore-break: For Splitting Up Complete Rewrites
--

renders as

DIFFCORE-BREAK: FOR SPLITTING UP  COMPLETE REWRITES""

, which is obviously wrong. While this is fixable for the man pages by
using double-quotes (e.g. ""COMPLETE REWRITES""), this again breaks it
for our generated HTML pages.

So fix the issue by simply dropping quotes inside of section headers,
which is currently only done for the term "Complete Rewrites".

Signed-off-by: Patrick Steinhardt 
---

The obvious other route here would be to fix how the stylesheets
handle quoting inside of these headers. But after taking a short
look, I didn't really get how our stylesheets actually stitch
together and as such bailed from doing so.

 Documentation/gitdiffcore.txt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
index cf009a187..c0a60f315 100644
--- a/Documentation/gitdiffcore.txt
+++ b/Documentation/gitdiffcore.txt
@@ -84,8 +84,8 @@ format sections of the manual for 'git diff-{asterisk}' 
commands) or
 diff-patch format.
 
 
-diffcore-break: For Splitting Up "Complete Rewrites"
-
+diffcore-break: For Splitting Up Complete Rewrites
+--
 
 The second transformation in the chain is diffcore-break, and is
 controlled by the -B option to the 'git diff-{asterisk}' commands.  This is
@@ -177,8 +177,8 @@ the expense of making it slower.  Without 
`--find-copies-harder`,
 copied happened to have been modified in the same changeset.
 
 
-diffcore-merge-broken: For Putting "Complete Rewrites" Back Together
-
+diffcore-merge-broken: For Putting Complete Rewrites Back Together
+--
 
 This transformation is used to merge filepairs broken by
 diffcore-break, and not transformed into rename/copy by
-- 
2.12.0



Re: Bug: "git worktree add" Unable to checkout a branch with no local ref

2017-02-28 Thread Duy Nguyen
On Mon, Feb 27, 2017 at 9:22 PM, Alexander Grigoriev
 wrote:
> git version 2.10.2.windows.1:
> If a remote branch has never been checked out locally (its ref only exists
> in remotes// directory), "git worktree add" command is unable to
> check it out by its normal short name (not prefixed by remotes/),
> while "git checkout" command has been able to handle such a branch and
> properly convert it to a local branch.

We call that "dwim" (do what I mean). Unfortunately "git worktree add"
does not support it.

In the early prototype, "git worktree" called "git checkout"
underneath and something like that should have worked. But I don't
remember if the dwim came up when we decided not to let "git worktree"
run "git checkout". And since dwim thing is checkout thing, the
feature is gone from "git worktree add".

Anyway, I think patches are welcome.
-- 
Duy


Re: [PATCH] http: add an "auto" mode for http.emptyauth

2017-02-28 Thread Johannes Schindelin
Hi,

On Mon, 27 Feb 2017, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The auto mode may incur an extra round-trip over setting
> > http.emptyauth=true, because part of the emptyauth hack is to feed
> > this blank password to curl even before we've made a single request.
> 
> IOW, people who care about an extra round-trip have this workaround,
> which is good.
> 
> This, along with the possible security implications, may want to be
> added to the documentation but that is outside the topic of this change,
> and I think we would want to see such an update come from those who
> actually use NTLM (or Kerberos, but they know they have minimum security
> implications).
> 
> > +#ifndef LIBCURL_CAN_HANDLE_AUTH_ANY +  /* + * Our libcurl is
> > too old to do AUTH_ANY in the first place; + * just default to
> > turning the feature off.  +  */ +#else +/* + * In the
> > automatic case, kick in the empty-auth + * hack as long as we
> > would potentially try some + * method more exotic than "Basic"
> > or "Digest".  +  * + * But only do this when this is our
> > second or +  * subsequent * request, as by then we know what
> 
> I'll drop the '*' that you left while line-wrapping ;-)
> 
> > +* methods are available.  + */
> 
> Thanks.  This looks good.

I replaced the previous version in Git for Windows' `master` branch with
the one in `pu`.

Thanks,
Johannes


Re: [PATCH 2/2] apply: handle assertion failure gracefully

2017-02-28 Thread René Scharfe
Am 27.02.2017 um 23:33 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> Am 27.02.2017 um 21:04 schrieb Junio C Hamano:
>>> René Scharfe  writes:
>>>
> diff --git a/apply.c b/apply.c
> index cbf7cc7f2..9219d2737 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -3652,7 +3652,6 @@ static int check_preimage(struct apply_state *state,
>   if (!old_name)
>   return 0;
>
> - assert(patch->is_new <= 0);

 5c47f4c6 (builtin-apply: accept patch to an empty file) added that
 line. Its intent was to handle diffs that contain an old name even for
 a file that's created.  Citing from its commit message: "When we
 cannot be sure by parsing the patch that it is not a creation patch,
 we shouldn't complain when if there is no such a file."  Why not stop
 complaining also in case we happen to know for sure that it's a
 creation patch? I.e., why not replace the assert() with:

if (patch->is_new == 1)
goto is_new;

>   previous = previous_patch(state, patch, &status);
>>>
>>> When the caller does know is_new is true, old_name must be made/left
>>> NULL.  That is the invariant this assert is checking to catch an
>>> error in the calling code.
>>
>> There are some places in apply.c that set ->is_new to 1, but none of
>> them set ->old_name to NULL at the same time.
> 
> I thought all of these are flipping ->is_new that used to be -1
> (unknown) to (now we know it is new), and sets only new_name without
> doing anything to old_name, because they know originally both names
> are set to NULL.
> 
>> Having to keep these two members in sync sounds iffy anyway.  Perhaps
>> accessors can help, e.g. a setter which frees old_name when is_new is
>> set to 1, or a getter which returns NULL for old_name if is_new is 1.
> 
> Definitely, the setter would make it harder to make the mistake.

When I added setters, apply started to passed NULL to unlink(2) and
rmdir(2) in some of the new tests, which still failed.

That's because three of the diffs trigger both gitdiff_delete(), which
sets is_delete and old_name, and gitdiff_newfile(), which sets is_new
and new_name.  Create and delete equals move, right?  Or should we
error out at this point already?

The last new diff adds a new file that is copied.  Sounds impossible.
How about something like this, which forbids combinations that make no
sense.  Hope it's not too strict; at least all tests succeed.

---
 apply.c | 79 ++---
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/apply.c b/apply.c
index 21b0bebec5..6cb6860511 100644
--- a/apply.c
+++ b/apply.c
@@ -197,6 +197,14 @@ struct fragment {
 #define BINARY_DELTA_DEFLATED  1
 #define BINARY_LITERAL_DEFLATED 2
 
+enum patch_type {
+   CHANGE,
+   CREATE,
+   DELETE,
+   RENAME,
+   COPY
+};
+
 /*
  * This represents a "patch" to a file, both metainfo changes
  * such as creation/deletion, filemode and content changes represented
@@ -205,6 +213,7 @@ struct fragment {
 struct patch {
char *new_name, *old_name, *def_name;
unsigned int old_mode, new_mode;
+   enum patch_type type;
int is_new, is_delete;  /* -1 = unknown, 0 = false, 1 = true */
int rejected;
unsigned ws_rule;
@@ -229,6 +238,36 @@ struct patch {
struct object_id threeway_stage[3];
 };
 
+static int set_patch_type(struct patch *patch, enum patch_type type)
+{
+   if (patch->type != CHANGE && patch->type != type)
+   return error(_("conflicting patch types"));
+   patch->type = type;
+   switch (type) {
+   case CHANGE:
+   break;
+   case CREATE:
+   patch->is_new = 1;
+   patch->is_delete = 0;
+   free(patch->old_name);
+   patch->old_name = NULL;
+   break;
+   case DELETE:
+   patch->is_new = 0;
+   patch->is_delete = 1;
+   free(patch->new_name);
+   patch->new_name = NULL;
+   break;
+   case RENAME:
+   patch->is_rename = 1;
+   break;
+   case COPY:
+   patch->is_copy = 1;
+   break;
+   }
+   return 0;
+}
+
 static void free_fragment_list(struct fragment *list)
 {
while (list) {
@@ -907,13 +946,13 @@ static int parse_traditional_patch(struct apply_state 
*state,
}
}
if (is_dev_null(first)) {
-   patch->is_new = 1;
-   patch->is_delete = 0;
+   if (set_patch_type(patch, CREATE))
+   return -1;
name = find_name_traditional(state, second, NULL, 
state->p_value);
patch->new_name = name;
} else if (is_dev_null(second)) {
-   patch->is_new = 0;
-   patch->is_delete = 1;
+   if (set_patch_type(patch, DELETE))
+  

Re: [PATCH 2/6] Specify explicitly where we parse timestamps

2017-02-28 Thread Johannes Schindelin
Hi Junio,

On Mon, 27 Feb 2017, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> >> -  unsigned long number = strtoul(date, &end, 10);
> >> +  time_t number = parse_timestamp(date, &end, 10);
> >
> > This hunk does not belong to this step.  Everybody else in this step
> 
> obviously I meant "the left half of this hunk" ;-)

Obviously ;-)

Ciao,
Johannes


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Johannes Schindelin
Hi Junio,

On Mon, 27 Feb 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > One notable fallout of this patch series is that on 64-bit Linux (and
> > other platforms where `unsigned long` is 64-bit), we now limit the
> > range of dates to LONG_MAX (i.e. the *signed* maximum value). This
> > needs to be done as `time_t` can be signed (and indeed is at least on
> > my Ubuntu setup).
> >
> > Obviously, I think that we can live with that, and I hope that all
> > interested parties agree.
> 
> s/ulong/time_t/ is definintely a good change, and it will take us to a
> place we would want to be in in some future.  

Actually. I have to take back the part where I hoped that all interested
parties would agree. The problem is 32-bit Linux:

$ cat >a1.c <<-\EOF
#include 
#include 
#include 

int main(int argc, char **argv)
{
printf("sizeof(long): %d, sizeof(time_t): %d, ulong_max: %lu\n",
   (int)sizeof(long), (int)sizeof(time_t), ULONG_MAX);
return 0;
}
EOF

$ gcc -m32 -Wall -o a1 a1.c

$ ./a1
sizeof(long): 4, sizeof(time_t): 4, ulong_max: 4294967295

So. Not only is `long` a 32-bit on 32-bit Linux, but so is `time_t`. And
with that, switching from `ULONG_MAX` as the maximal time we can represent
in Git to `LONG_MAX` is kind of a serious problem.

> As long as there remains no platform we care about whose time_t and long
> are still 32-bit signed integer, there will be a fallout to them with
> this change.

Sorry, I do not understand the verb "remains" in conjunction with "no
platform"...

Do you mean to say that currently no platform we care about has 32-bit
signed time_t/long?

If so, I just demonstrated this to be unfortunately incorrect.

> It appears that we use uint64_t in many places in our code.  So
> while philosophically time_t is the right type, uint64_t might be
> practically a safer alternative type to use at the endgame patch in
> this series.

Yes, I think you are right. We should use uint64_t instead of time_t, but
*semantically* we should not even use uint64_t. We should introduce our
own data type instead of repeating the mistake to use a data type that
does not convey its role to the reader.

Currently, I am favoring timestamp_t.

> I haven't seen it yet, but presumably the last one 6/6 is the endgame?

Maybe it took a while to get sent out, but it made it into public inbox:
http://public-inbox.org/git/75efe76cbb0636741a7c3aec9e21459bc1dc3cbe.1488231002.git.johannes.schinde...@gmx.de/

Ciao,
Johannes


[PATCH 1/8] interpret_branch_name: move docstring to header file

2017-02-28 Thread Jeff King
We generally put docstrings with function declarations,
because it's the callers who need to know how the function
works. Let's do so for interpret_branch_name().

Signed-off-by: Jeff King 
---
 cache.h | 21 +
 sha1_name.c | 21 -
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 80b6372cf..c67995caa 100644
--- a/cache.h
+++ b/cache.h
@@ -1363,6 +1363,27 @@ extern char *oid_to_hex_r(char *out, const struct 
object_id *oid);
 extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
 extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
 
+/*
+ * This reads short-hand syntax that not only evaluates to a commit
+ * object name, but also can act as if the end user spelled the name
+ * of the branch from the command line.
+ *
+ * - "@{-N}" finds the name of the Nth previous branch we were on, and
+ *   places the name of the branch in the given buf and returns the
+ *   number of characters parsed if successful.
+ *
+ * - "@{upstream}" finds the name of the other ref that
+ *is configured to merge with (missing  defaults
+ *   to the current branch), and places the name of the branch in the
+ *   given buf and returns the number of characters parsed if
+ *   successful.
+ *
+ * If the input is not of the accepted format, it returns a negative
+ * number to signal an error.
+ *
+ * If the input was ok but there are not N branch switches in the
+ * reflog, it returns 0.
+ */
 extern int interpret_branch_name(const char *str, int len, struct strbuf *);
 extern int get_oid_mb(const char *str, struct object_id *oid);
 
diff --git a/sha1_name.c b/sha1_name.c
index 9b5d14b4b..28865b3a1 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1238,27 +1238,6 @@ static int interpret_branch_mark(const char *name, int 
namelen,
return len + at;
 }
 
-/*
- * This reads short-hand syntax that not only evaluates to a commit
- * object name, but also can act as if the end user spelled the name
- * of the branch from the command line.
- *
- * - "@{-N}" finds the name of the Nth previous branch we were on, and
- *   places the name of the branch in the given buf and returns the
- *   number of characters parsed if successful.
- *
- * - "@{upstream}" finds the name of the other ref that
- *is configured to merge with (missing  defaults
- *   to the current branch), and places the name of the branch in the
- *   given buf and returns the number of characters parsed if
- *   successful.
- *
- * If the input is not of the accepted format, it returns a negative
- * number to signal an error.
- *
- * If the input was ok but there are not N branch switches in the
- * reflog, it returns 0.
- */
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *at;
-- 
2.12.0.359.gd4c8c42e9



[PATCH 4/8] interpret_branch_name: allow callers to restrict expansions

2017-02-28 Thread Jeff King
The original purpose of interpret_branch_name() was to be
used by get_sha1() in resolving refs. As such, some of its
expansions may point to refs outside of the local
"refs/heads" namespace.

Over time, the function has been picked up by other callers
who want to use the ref-expansion to give the user access to
the same shortcuts (e.g., allowing "git branch" to delete
via "@{-1}" or "@{upstream}").  These uses have confusing
corner cases when the expansion isn't in refs/heads/ (for
instance, deleting "@" tries to delete refs/heads/HEAD,
which is nonsense).

Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").

However, out-parameters make calling interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).

The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it. For now, we'll pass "0" for "no restrictions" in
each caller, and update them individually in subsequent
patches.

Signed-off-by: Jeff King 
---
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/merge.c|  2 +-
 cache.h| 13 +++--
 refs.c |  2 +-
 revision.c |  2 +-
 sha1_name.c| 52 +++-
 strbuf.h   |  6 +-
 8 files changed, 60 insertions(+), 21 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 94f7de7fa..cf0ece55d 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -216,7 +216,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
char *target = NULL;
int flags = 0;
 
-   strbuf_branchname(&bname, argv[i]);
+   strbuf_branchname(&bname, argv[i], 0);
free(name);
name = mkpathdup(fmt, bname.buf);
 
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f5030..05eefd994 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -452,7 +452,7 @@ static void setup_branch_path(struct branch_info *branch)
 {
struct strbuf buf = STRBUF_INIT;
 
-   strbuf_branchname(&buf, branch->name);
+   strbuf_branchname(&buf, branch->name, 0);
if (strcmp(buf.buf, branch->name))
branch->name = xstrdup(buf.buf);
strbuf_splice(&buf, 0, 0, "refs/heads/", 11);
diff --git a/builtin/merge.c b/builtin/merge.c
index a96d4fb50..848a29855 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -438,7 +438,7 @@ static void merge_name(const char *remote, struct strbuf 
*msg)
char *found_ref;
int len, early;
 
-   strbuf_branchname(&bname, remote);
+   strbuf_branchname(&bname, remote, 0);
remote = bname.buf;
 
memset(branch_head, 0, sizeof(branch_head));
diff --git a/cache.h b/cache.h
index c67995caa..a8816c914 100644
--- a/cache.h
+++ b/cache.h
@@ -1383,8 +1383,17 @@ extern char *oid_to_hex(const struct object_id *oid);
/* same static buffer as s
  *
  * If the input was ok but there are not N branch switches in the
  * reflog, it returns 0.
- */
-extern int interpret_branch_name(const char *str, int len, struct strbuf *);
+ *
+ * If "allowed" is non-zero, it is a treated as a bitfield of allowable
+ * expansions: local branches ("refs/heads/"), remote branches
+ * ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion is
+ * allowed, even ones to refs outside of those namespaces.
+ */
+#define INTERPRET_BRANCH_LOCAL (1<<0)
+#define INTERPRET_BRANCH_REMOTE (1<<1)
+#define INTERPRET_BRANCH_HEAD (1<<2)
+extern int interpret_branch_name(const char *str, int len, struct strbuf *,
+unsigned allowed);
 extern int get_oid_mb(const char *str, struct object_id *oid);
 
 extern int validate_headref(const char *ref);
diff --git a/refs.c b/refs.c
index 6d0961921..da62119c2 100644
--- a/refs.c
+++ b/refs.c
@@ -405,7 +405,7 @@ int refname_match(const char *abbrev_name, const char 
*full_name)
 static char *substitute_branch_name(const char **string, int *len)
 {
struct strbuf buf = STRBUF_INIT;
-   int ret = interpret_branch_name(*string, *len, &buf);
+   int ret = interpret_branch_name(*string, *len, &buf, 0);
 
if (ret == *len) {
size_t size;
diff --git a/revision.c b/revision.c
index b37dbec37..771d079f6 100644
--- a/revision.c
+++ b/revision.c
@@ -147,7 +147,7 

[PATCH 3/8] strbuf_branchname: add docstring

2017-02-28 Thread Jeff King
This function and its companion, strbuf_check_branch_ref(),
did not have their purpose or semantics explained. Let's do
so.

Signed-off-by: Jeff King 
---
 strbuf.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/strbuf.h b/strbuf.h
index 47df0500d..6b51b2604 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -560,7 +560,22 @@ static inline void strbuf_complete_line(struct strbuf *sb)
strbuf_complete(sb, '\n');
 }
 
+/*
+ * Copy "name" to "sb", expanding any special @-marks as handled by
+ * interpret_branch_name(). The result is a non-qualified branch name
+ * (so "foo" or "origin/master" instead of "refs/heads/foo" or
+ * "refs/remotes/origin/master").
+ *
+ * Note that the resulting name may not be a syntactically valid refname.
+ */
 extern void strbuf_branchname(struct strbuf *sb, const char *name);
+
+/*
+ * Like strbuf_branchname() above, but confirm that the result is
+ * syntactically valid to be used as a local branch name in refs/heads/.
+ *
+ * The return value is "0" if the result is valid, and "-1" otherwise.
+ */
 extern int strbuf_check_branch_ref(struct strbuf *sb, const char *name);
 
 extern void strbuf_addstr_urlencode(struct strbuf *, const char *,
-- 
2.12.0.359.gd4c8c42e9



[PATCH 8/8] checkout: restrict @-expansions when finding branch

2017-02-28 Thread Jeff King
When we parse "git checkout $NAME", we try to interpret
$NAME as a local branch-name. If it is, then we point HEAD
to that branch. Otherwise, we detach the HEAD at whatever
commit $NAME points to.

We do the interpretation by calling strbuf_branchname(), and
then blindly sticking "refs/heads/" on the front. This leads
to nonsense results when expansions like "@{upstream}" or
"@" point to something besides a local branch. We end up
with a local branch name like "refs/heads/origin/master" or
"refs/heads/HEAD".

Normally this has no user-visible effect because those
branches don't exist, and so we fallback to feeding the
result to get_sha1(), which resolves them correctly.

But as the new test in t3204 shows, there are corner cases
where the effect is observable, and we check out the wrong
local branch rather than detaching to the correct one.

Signed-off-by: Jeff King 
---
 builtin/checkout.c|  2 +-
 t/t3204-branch-name-interpretation.sh | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 05eefd994..81f07c3ef 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -452,7 +452,7 @@ static void setup_branch_path(struct branch_info *branch)
 {
struct strbuf buf = STRBUF_INIT;
 
-   strbuf_branchname(&buf, branch->name, 0);
+   strbuf_branchname(&buf, branch->name, INTERPRET_BRANCH_LOCAL);
if (strcmp(buf.buf, branch->name))
branch->name = xstrdup(buf.buf);
strbuf_splice(&buf, 0, 0, "refs/heads/", 11);
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index 6115ad504..b8e396009 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -109,4 +109,14 @@ test_expect_success 'delete branch named "@"' '
expect_deleted refs/heads/@
 '
 
+test_expect_success 'checkout does not treat remote @{upstream} as a branch' '
+   git update-ref refs/remotes/origin/checkout one &&
+   git branch --set-upstream-to=origin/checkout &&
+   git update-ref refs/heads/origin/checkout two &&
+   git update-ref refs/heads/remotes/origin/checkout two &&
+
+   git checkout @{upstream} &&
+   expect_branch HEAD one
+'
+
 test_done
-- 
2.12.0.359.gd4c8c42e9


Re: [PATCH 2/2] docs/diffcore: unquote "Complete Rewrites" in headers

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 09:59:05AM +0100, Patrick Steinhardt wrote:

> The gitdiffcore documentation quotes the term "Complete Rewrites" in
> headers for no real gain. This would make sense if the term could be
> easily confused if not properly grouped together. But actually, the term
> is quite obvious and thus does not really need any quoting, especially
> regarding that it is not used anywhere else.
> 
> But more importanly, this brings up a bug when rendering man pages: when
> trying to render quotes inside of a section header, we end up with
> quotes which have been misaligned to the end of line. E.g.
> 
> diffcore-break: For Splitting Up Complete Rewrites
> --
> 
> renders as
> 
> DIFFCORE-BREAK: FOR SPLITTING UP  COMPLETE REWRITES""
> 
> , which is obviously wrong. While this is fixable for the man pages by
> using double-quotes (e.g. ""COMPLETE REWRITES""), this again breaks it
> for our generated HTML pages.
> 
> So fix the issue by simply dropping quotes inside of section headers,
> which is currently only done for the term "Complete Rewrites".

Thanks for a nice explanation of the issue. I was curious whether
asciidoctor gets this right. It does, though I suppose that's because we
only look at the HTML output. It sounds like the issue is in the
docbook->roff path.

At any rate, I agree with your analysis. It's not worth futzing with the
formatting when it reads just as well without the quotes.

-Peff


[PATCH 6/8] branch: restrict @-expansions when deleting

2017-02-28 Thread Jeff King
We use strbuf_branchname() to expand the branch name from
the command line, so you can delete the branch given by
@{-1}, for example.  However, we allow other nonsense like
"@", and we do not respect our "-r" flag (so we may end up
deleting an oddly-named local ref instead of a remote one).

We can fix this by passing the appropriate "allowed" flag to
strbuf_branchname().

Signed-off-by: Jeff King 
---
 builtin/branch.c  | 5 -
 t/t3204-branch-name-interpretation.sh | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index cf0ece55d..291fe90de 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -191,17 +191,20 @@ static int delete_branches(int argc, const char **argv, 
int force, int kinds,
int ret = 0;
int remote_branch = 0;
struct strbuf bname = STRBUF_INIT;
+   unsigned allowed_interpret;
 
switch (kinds) {
case FILTER_REFS_REMOTES:
fmt = "refs/remotes/%s";
/* For subsequent UI messages */
remote_branch = 1;
+   allowed_interpret = INTERPRET_BRANCH_REMOTE;
 
force = 1;
break;
case FILTER_REFS_BRANCHES:
fmt = "refs/heads/%s";
+   allowed_interpret = INTERPRET_BRANCH_LOCAL;
break;
default:
die(_("cannot use -a with -d"));
@@ -216,7 +219,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
char *target = NULL;
int flags = 0;
 
-   strbuf_branchname(&bname, argv[i], 0);
+   strbuf_branchname(&bname, argv[i], allowed_interpret);
free(name);
name = mkpathdup(fmt, bname.buf);
 
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index 2fe696ba6..c8fec5b8c 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -83,7 +83,7 @@ test_expect_success 'delete branch via remote @{upstream}' '
 # Note that we create two oddly named local branches here. We want to make
 # sure that we do not accidentally delete either of them, even if
 # shorten_unambiguous_ref() tweaks the name to avoid ambiguity.
-test_expect_failure 'delete @{upstream} expansion matches -r option' '
+test_expect_success 'delete @{upstream} expansion matches -r option' '
git update-ref refs/remotes/origin/remote-del two &&
git branch --set-upstream-to=origin/remote-del &&
git update-ref refs/heads/origin/remote-del two &&
@@ -103,7 +103,7 @@ test_expect_failure 'create branch named "@"' '
expect_branch refs/heads/@ one
 '
 
-test_expect_failure 'delete branch named "@"' '
+test_expect_success 'delete branch named "@"' '
git update-ref refs/heads/@ two &&
git branch -D @ &&
expect_deleted refs/heads/@
-- 
2.12.0.359.gd4c8c42e9



[PATCH 7/8] strbuf_check_ref_format(): expand only local branches

2017-02-28 Thread Jeff King
This function asks strbuf_branchname() to expand any @-marks
in the branchname, and then we blindly stick refs/heads/ in
front of the result. This is obviously nonsense if the
expansion is "HEAD" or a ref in refs/remotes/.

The most obvious end-user effect is that creating or
renaming a branch with an expansion may have confusing
results (e.g., creating refs/heads/origin/master from
"@{upstream}" when the operation should be disallowed).

We can fix this by telling strbuf_branchname() that we are
only interested in local expansions. Any unexpanded bits are
then fed to check_ref_format(), which either disallows them
(in the case of "@{upstream}") or lets them through
("refs/heads/@" is technically valid, if a bit silly).

Signed-off-by: Jeff King 
---
 sha1_name.c   | 2 +-
 t/t3204-branch-name-interpretation.sh | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index b21997c29..c0cfb69a4 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1317,7 +1317,7 @@ void strbuf_branchname(struct strbuf *sb, const char 
*name, unsigned allowed)
 
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
-   strbuf_branchname(sb, name, 0);
+   strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
if (name[0] == '-')
return -1;
strbuf_splice(sb, 0, 0, "refs/heads/", 11);
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index c8fec5b8c..6115ad504 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -42,7 +42,7 @@ test_expect_success 'update branch via local @{upstream}' '
expect_branch local two
 '
 
-test_expect_failure 'disallow updating branch via remote @{upstream}' '
+test_expect_success 'disallow updating branch via remote @{upstream}' '
git update-ref refs/remotes/origin/remote one &&
git branch --set-upstream-to=origin/remote &&
 
@@ -98,7 +98,7 @@ test_expect_success 'delete @{upstream} expansion matches -r 
option' '
 # and not refs/heads/HEAD. These tests should not imply that refs/heads/@ is a
 # sane thing, but it _is_ technically allowed for now. If we disallow it, these
 # can be switched to test_must_fail.
-test_expect_failure 'create branch named "@"' '
+test_expect_success 'create branch named "@"' '
git branch -f @ one &&
expect_branch refs/heads/@ one
 '
-- 
2.12.0.359.gd4c8c42e9



Re: [PATCH 4/8] interpret_branch_name: allow callers to restrict expansions

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 07:14:34AM -0500, Jeff King wrote:

> However, out-parameters make calling interface somewhat
> cumbersome. Instead, let's do the opposite: let the caller
> tell us which elements to expand. That's easier to pass in,
> and none of the callers give more precise error messages
> than "@{upstream} isn't a valid branch name" anyway (which
> should be sufficient).

Two things to call attention to:

> -extern int interpret_branch_name(const char *str, int len, struct strbuf *);
> + *
> + * If "allowed" is non-zero, it is a treated as a bitfield of allowable
> + * expansions: local branches ("refs/heads/"), remote branches
> + * ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion 
> is
> + * allowed, even ones to refs outside of those namespaces.
> + */
> +#define INTERPRET_BRANCH_LOCAL (1<<0)
> +#define INTERPRET_BRANCH_REMOTE (1<<1)
> +#define INTERPRET_BRANCH_HEAD (1<<2)

Is the "0 allows everything, but any bit turns on restrictions"
convention too confusing? It's convenient to use in the callers which do
not need restrictions, but we could add an INTERPRET_BRANCH_ALL flag if
that is more clear (but note that it _isn't_ just bitwise-AND of the
other flags, because technically an @{upstream} could point to
"refs/foo" or some other location).

> -int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
> +int interpret_branch_name(const char *name, int namelen, struct strbuf *buf,
> +   unsigned allowed)
>  {
>   char *at;
>   const char *start;
> @@ -1254,24 +1275,29 @@ int interpret_branch_name(const char *name, int 
> namelen, struct strbuf *buf)
>   if (len == namelen)
>   return len; /* consumed all */
>   else
> - return reinterpret(name, namelen, len, buf);
> + return reinterpret(name, namelen, len, buf, allowed);
>   }

It's hard to see from this context, but a careful reader may note that
we do not check "allowed" at all before calling
interpret_nth_prior_checkout(). This is looking for branch names via
HEAD, so I don't think it can ever return anything but a local name.

Which, hmm. I guess was valid when the flag was "only_branches", but
would not be valid under INTERPRET_BRANCH_REMOTE. I wonder if

  git branch -r -D @{-1}

incorrectly deletes refs/remotes/origin/master if you previously had
refs/heads/origin/master checked out.

-Peff


Re: [PATCH 4/8] interpret_branch_name: allow callers to restrict expansions

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 07:23:38AM -0500, Jeff King wrote:

> > -int interpret_branch_name(const char *name, int namelen, struct strbuf 
> > *buf)
> > +int interpret_branch_name(const char *name, int namelen, struct strbuf 
> > *buf,
> > + unsigned allowed)
> >  {
> > char *at;
> > const char *start;
> > @@ -1254,24 +1275,29 @@ int interpret_branch_name(const char *name, int 
> > namelen, struct strbuf *buf)
> > if (len == namelen)
> > return len; /* consumed all */
> > else
> > -   return reinterpret(name, namelen, len, buf);
> > +   return reinterpret(name, namelen, len, buf, allowed);
> > }
> 
> It's hard to see from this context, but a careful reader may note that
> we do not check "allowed" at all before calling
> interpret_nth_prior_checkout(). This is looking for branch names via
> HEAD, so I don't think it can ever return anything but a local name.
> 
> Which, hmm. I guess was valid when the flag was "only_branches", but
> would not be valid under INTERPRET_BRANCH_REMOTE. I wonder if
> 
>   git branch -r -D @{-1}
> 
> incorrectly deletes refs/remotes/origin/master if you previously had
> refs/heads/origin/master checked out.

The answer is "yes", it's broken. So interpret_branch_name() should do
an "allowed" check before expanding the nth-prior. The fix should be
easy, especially on top of the earlier 426f76595 (which, incidentally, I
already based this series on).

I'll hold off on re-rolling to see if it collects any other review.

-Peff


Re: [PATCH 1/2] wrapper.c: remove unused git_mkstemp() function

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 01:24:10AM +, Ramsay Jones wrote:

> The last caller of git_mkstemp() was removed in commit 6fec0a89
> ("verify_signed_buffer: use tempfile object", 16-06-2016). Since
> the introduction of the 'tempfile' APIs, along with git_mkstemp_mode,
> it is unlikely that new callers will materialize. Remove the dead
> code.

Yeah, I think the tempfile API is a better choice for anybody who wants
to add a new call. Removing the temptation (and the dead code) is a good
move.

-Peff


[PATCH 5/8] t3204: test git-branch @-expansion corner cases

2017-02-28 Thread Jeff King
git-branch feeds the branch names from the command line to
strbuf_branchname(), but we do not yet tell that function
which kinds of expansions should be allowed. Let's create a
set of tests that cover both the allowed and disallowed
cases.

That shows off some breakages where we currently create or
delete the wrong ref (and will make sure that we do not
break any cases that _should_ be working when we do add more
restrictions).

Note that we check branch creation and deletion, but do not
bother with renames. Those follow the same code path as
creation.

Signed-off-by: Jeff King 
---
 t/t3204-branch-name-interpretation.sh | 112 ++
 1 file changed, 112 insertions(+)
 create mode 100755 t/t3204-branch-name-interpretation.sh

diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
new file mode 100755
index 0..2fe696ba6
--- /dev/null
+++ b/t/t3204-branch-name-interpretation.sh
@@ -0,0 +1,112 @@
+#!/bin/sh
+
+test_description='interpreting exotic branch name arguments
+
+Branch name arguments are usually names which are taken to be inside of
+refs/heads/, but we interpret some magic syntax like @{-1}, @{upstream}, etc.
+This script aims to check the behavior of those corner cases.
+'
+. ./test-lib.sh
+
+expect_branch() {
+   git log -1 --format=%s "$1" >actual &&
+   echo "$2" >expect &&
+   test_cmp expect actual
+}
+
+expect_deleted() {
+   test_must_fail git rev-parse --verify "$1"
+}
+
+test_expect_success 'set up repo' '
+   test_commit one &&
+   test_commit two &&
+   git remote add origin foo.git
+'
+
+test_expect_success 'update branch via @{-1}' '
+   git branch previous one &&
+
+   git checkout previous &&
+   git checkout master &&
+
+   git branch -f @{-1} two &&
+   expect_branch previous two
+'
+
+test_expect_success 'update branch via local @{upstream}' '
+   git branch local one &&
+   git branch --set-upstream-to=local &&
+
+   git branch -f @{upstream} two &&
+   expect_branch local two
+'
+
+test_expect_failure 'disallow updating branch via remote @{upstream}' '
+   git update-ref refs/remotes/origin/remote one &&
+   git branch --set-upstream-to=origin/remote &&
+
+   test_must_fail git branch -f @{upstream} two
+'
+
+test_expect_success 'create branch with pseudo-qualified name' '
+   git branch refs/heads/qualified two &&
+   expect_branch refs/heads/refs/heads/qualified two
+'
+
+test_expect_success 'delete branch via @{-1}' '
+   git branch previous-del &&
+
+   git checkout previous-del &&
+   git checkout master &&
+
+   git branch -D @{-1} &&
+   expect_deleted previous-del
+'
+
+test_expect_success 'delete branch via local @{upstream}' '
+   git branch local-del &&
+   git branch --set-upstream-to=local-del &&
+
+   git branch -D @{upstream} &&
+   expect_deleted local-del
+'
+
+test_expect_success 'delete branch via remote @{upstream}' '
+   git update-ref refs/remotes/origin/remote-del two &&
+   git branch --set-upstream-to=origin/remote-del &&
+
+   git branch -r -D @{upstream} &&
+   expect_deleted origin/remote-del
+'
+
+# Note that we create two oddly named local branches here. We want to make
+# sure that we do not accidentally delete either of them, even if
+# shorten_unambiguous_ref() tweaks the name to avoid ambiguity.
+test_expect_failure 'delete @{upstream} expansion matches -r option' '
+   git update-ref refs/remotes/origin/remote-del two &&
+   git branch --set-upstream-to=origin/remote-del &&
+   git update-ref refs/heads/origin/remote-del two &&
+   git update-ref refs/heads/remotes/origin/remote-del two &&
+
+   test_must_fail git branch -D @{upstream} &&
+   expect_branch refs/heads/origin/remote-del two &&
+   expect_branch refs/heads/remotes/origin/remote-del two
+'
+
+# The thing we are testing here is that "@" is the real branch refs/heads/@,
+# and not refs/heads/HEAD. These tests should not imply that refs/heads/@ is a
+# sane thing, but it _is_ technically allowed for now. If we disallow it, these
+# can be switched to test_must_fail.
+test_expect_failure 'create branch named "@"' '
+   git branch -f @ one &&
+   expect_branch refs/heads/@ one
+'
+
+test_expect_failure 'delete branch named "@"' '
+   git update-ref refs/heads/@ two &&
+   git branch -D @ &&
+   expect_deleted refs/heads/@
+'
+
+test_done
-- 
2.12.0.359.gd4c8c42e9



Re: SHA1 collisions found

2017-02-28 Thread brian m. carlson
On Mon, Feb 27, 2017 at 02:29:18PM +0100, René Scharfe wrote:
> Am 25.02.2017 um 20:04 schrieb brian m. carlson:
> >>> So I think that the current scope left is best estimated by the
> >>> following command:
> >>>
> >>>   git grep -P 'unsigned char\s+(\*|.*20)' | grep -v '^Documentation'
> >>>
> >>> So there are approximately 1200 call sites left, which is quite a bit of
> >>> work.  I estimate between the work I've done and other people's
> >>> refactoring work (such as the refs backend refactor), we're about 40%
> >>> done.
> > 
> > As a note, I've been working on this pretty much nonstop since the
> > collision announcement was made.  After another 27 commits, I've got it
> > down from 1244 to 1119.
> > 
> > I plan to send another series out sometime after the existing series has
> > hit next.  People who are interested can follow the object-id-part*
> > branches at https://github.com/bk2204/git.
> 
> Perhaps the following script can help a bit; it converts local and static
> variables in specified files.  It's just a simplistic parser which can get
> at least shadowing variables, strings and comments wrong, so its results
> need to be reviewed carefully.
> 
> I failed to come up with an equivalent Coccinelle patch so far. :-/
> 
> René
> 
> 
> #!/bin/sh
> while test $# -gt 0
> do
>   file="$1"
>   tmp="$file.new"
>   test -f "$file" &&
>   perl -e '
>   use strict;
>   my %indent;
>   my %old;
>   my %new;
>   my $in_struct = 0;
>   while (<>) {
>   if (/^(\s*)}/) {
>   my $len = length $1;
>   foreach my $key (keys %indent) {
>   if ($len < length($indent{$key})) {
>   delete $indent{$key};
>   delete $old{$key};
>   delete $new{$key};
>   }
>   }
>   $in_struct = 0;
>   }
>   if (!$in_struct and /^(\s*)(static )?unsigned char 
> (\w+)\[20\];$/) {
>   my $prefix = "$1$2";
>   my $name = $3;
>   $indent{$.} = $1;
>   $old{$.} = qr/(?)(?   $name =~ s/sha1/oid/;
>   print $prefix . "struct object_id " . $name . 
> ";\n";
>   $new{$.} = $name . ".hash";
>   next;
>   }
>   if (/^(\s*)(static )?struct (\w+ )?\{$/) {
>   $in_struct = 1;
>   }
>   if (!$in_struct and ! /\/\*/) {
>   foreach my $key (keys %indent) {
>   s/$old{$key}/$new{$key}/g;
>   }
>   }
>   print;
>   }
>   ' "$file" >"$tmp" &&
>   mv "$tmp" "$file" ||
>   exit 1
>   shift
> done

I'll see how it works.  I'm currently in New Orleans visiting a friend
until Thursday, so I'll have less time than normal to look at these, but
I'll definitely give it a try.

Most of the issue is not the actual conversion, but finding the right
order in which to convert functions.  For example, the object-id-part8
branch on my GitHub account converts parse_object, but
parse_tree_indirect has to be converted before you can do parse_object.
That leads to another handful of patches that have to be done.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] http: attempt updating base URL only if no error

2017-02-28 Thread Jeff King
On Mon, Feb 27, 2017 at 06:53:11PM -0800, Jonathan Tan wrote:

> http.c supports HTTP redirects of the form
> 
>   http://foo/info/refs?service=git-upload-pack
>   -> http://anything
>   -> http://bar/info/refs?service=git-upload-pack
> 
> (that is to say, as long as the Git part of the path and the query
> string is preserved in the final redirect destination, the intermediate
> steps can have any URL). However, if one of the intermediate steps
> results in an HTTP exception, a confusing "unable to update url base
> from redirection" message is printed instead of a Curl error message
> with the HTTP exception code.

Right, your patch makes sense. A real HTTP error should take precedence
over the url-update trickery.

Acked-by: Jeff King 

> Therefore, teach http to check the HTTP status before attempting to
> check if only the "base" part of the URL differed. This commit teaches
> http_request_reauth to return early without updating options->base_url
> upon an error; the only invoker of this function that passes a non-NULL
> "options" is remote-curl.c (through "http_get_strbuf"), which only uses
> options->base_url for an informational message in the situations that
> this commit cares about (that is, when the return value is not HTTP_OK).

Running your included test, we get:

  fatal: unable to access 'http://127.0.0.1:5550/redir-to/502/': The
  requested URL returned error: 502

but the error really happened in the intermediate step. I wonder if we
should show the effective_url in that case, as it's more likely to
pinpoint the problem. OTOH, we do not mention the intermediate redirect
at all, so they might be confused about where that URL came from. If you
really want to debug HTTP confusion, you should use GIT_TRACE_CURL.

At any rate, that's orthogonal to your patch, which is obviously the
right thing to do.

-Peff


Re: [BUG] branch renamed to 'HEAD'

2017-02-28 Thread Jeff King
On Mon, Feb 27, 2017 at 07:53:02PM -0500, Jeff King wrote:

> On Mon, Feb 27, 2017 at 04:33:36PM -0800, Junio C Hamano wrote:
> 
> > A flag to affect the behaviour (as opposed to &flag as a secondary
> > return value, like Peff's patch does) can be made to work.  Perhaps
> > a flag that says "keep the input as is if the result is not a local
> > branch name" would pass an input "@" intact and that may be
> > sufficient to allow "git branch -m @" to rename the current branch
> > to "@" (I do not think it is a sensible rename, though ;-).  But
> > probably some callers need to keep the original input and compare
> > with the result to see if we expanded anything if we go that route.
> > At that point, I am not sure if there are much differences in the
> > ease of use between the two approaches.
> 
> I just went into more detail in my reply to Jacob, but I do think this
> is a workable approach (and fortunately we seem to have banned bare "@"
> as a name, along with anything containing "@{}", so I think we would end
> up rejecting these nonsense names).
> 
> I'll see if I can work up a patch. We'll still need to pass the flag
> around through the various functions, but at least it will be a flag and
> not a confusing negated out-parameter.

OK, I have a series which fixes this (diffstat below). When I audited
the other callers of interpret_branch_name() and strbuf_branchname(), it
turned out to be even more complicated. The callers basically fall into
a few buckets:

  1. Callers like get_sha1() and merge_name() pass the result to
 dwim_ref(), and are prepared to handle anything.

  2. Some callers stick "refs/heads/" in front of the result, and
 obviously only want local names. Most of git-branch and
 git-checkout fall into this boat.

  3. "git branch -d" can delete local _or_ remote branches, depending on
 the "-r" flag. So the expansion it wants varies, and we need to
 handle "just local" or "just remote".

So I converted the "only_branch" flag to an "allowed" bit-field. No
callers actually ask for more than a single type at once, but it was
easy to do it that way. It serves all of the callers, and will easily
adapt for the future (e.g., if "git branch -a -d" were ever allowed).

  [1/8]: interpret_branch_name: move docstring to header file
  [2/8]: strbuf_branchname: drop return value
  [3/8]: strbuf_branchname: add docstring
  [4/8]: interpret_branch_name: allow callers to restrict expansions
  [5/8]: t3204: test git-branch @-expansion corner cases
  [6/8]: branch: restrict @-expansions when deleting
  [7/8]: strbuf_check_ref_format(): expand only local branches
  [8/8]: checkout: restrict @-expansions when finding branch

 builtin/branch.c  |   5 +-
 builtin/checkout.c|   2 +-
 builtin/merge.c   |   2 +-
 cache.h   |  32 -
 refs.c|   2 +-
 revision.c|   2 +-
 sha1_name.c   |  76 ++---
 strbuf.h  |  21 +-
 t/t3204-branch-name-interpretation.sh | 122 ++
 9 files changed, 220 insertions(+), 44 deletions(-)
 create mode 100755 t/t3204-branch-name-interpretation.sh

-Peff


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Jeff King
On Mon, Feb 27, 2017 at 10:30:20PM +0100, Johannes Schindelin wrote:

> One notable fallout of this patch series is that on 64-bit Linux (and
> other platforms where `unsigned long` is 64-bit), we now limit the range
> of dates to LONG_MAX (i.e. the *signed* maximum value). This needs to be
> done as `time_t` can be signed (and indeed is at least on my Ubuntu
> setup).
> 
> Obviously, I think that we can live with that, and I hope that all
> interested parties agree.

I do not just agree, but I think the move to a signed timestamp is a big
improvement. Git's object format is happy to represent times before
1970, but the code is not. I know this has been a pain for people who
import ancient histories into Git.

It looks from the discussion like the sanest path forward is our own
signed-64bit timestamp_t. That's unfortunate compared to using the
standard time_t, but hopefully it would reduce the number of knobs (like
TIME_T_IS_INT64) in the long run.

-Peff


Re: 'git submodules update' ignores credential.helper config of the parent repository

2017-02-28 Thread Jeff King
On Mon, Feb 27, 2017 at 11:09:12AM -0800, Stefan Beller wrote:

> For worktrees these multiple config files sounded like
> the obvious solution, but I wonder if there was also
> some bike shedding about other solutions?
> 
> I could imagine that we would want to have attributes
> for specific configuration, e.g.:
> 
> --8<--
> [core]
> repositoryformatversion = 0
> filemode = true
> bare = false
> logallrefupdates = true
> [remote "origin"]
> url = git://github.com/gitster/git
> fetch = +refs/heads/*:refs/remotes/origin/*
> [attribute "submodules"]
> read = true
> # this will be read and respected by submodules as well:
> [url."internal-git-miror"]
> insteadOf = github.com
> [attribute "submodules"]
> read = false
> # This (and the beginning of this file) will not be respected
> # by submodules
> [credential]
> helper =
> -->8--
> 
> This would change the semantics of a config file as the attribute for
> each setting depends on the location (was attribute.FOO.read =
> {true, false} read before).

I'm not enthused by this, just because there is a hidden dependency
between attribute.* sections and other ones. They _look_ like regular
config keys, but they really aren't.

I have a feeling that something like this would create unwelcome corner
cases in the config-writer, which is otherwise does not have to care
about which existing section of a file it adds a key to.

-Peff


[PATCH 2/8] strbuf_branchname: drop return value

2017-02-28 Thread Jeff King
The return value from strbuf_branchname() is confusing and
useless: it's 0 if the whole name was consumed by an @-mark,
but otherwise is the length of the original name we fed.

No callers actually look at the return value, so let's just
get rid of it.

Signed-off-by: Jeff King 
---
 sha1_name.c | 5 +
 strbuf.h| 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 28865b3a1..4c1e91184 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1279,17 +1279,14 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
return -1;
 }
 
-int strbuf_branchname(struct strbuf *sb, const char *name)
+void strbuf_branchname(struct strbuf *sb, const char *name)
 {
int len = strlen(name);
int used = interpret_branch_name(name, len, sb);
 
-   if (used == len)
-   return 0;
if (used < 0)
used = 0;
strbuf_add(sb, name + used, len - used);
-   return len;
 }
 
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
diff --git a/strbuf.h b/strbuf.h
index cf1b5409e..47df0500d 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -560,7 +560,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
strbuf_complete(sb, '\n');
 }
 
-extern int strbuf_branchname(struct strbuf *sb, const char *name);
+extern void strbuf_branchname(struct strbuf *sb, const char *name);
 extern int strbuf_check_branch_ref(struct strbuf *sb, const char *name);
 
 extern void strbuf_addstr_urlencode(struct strbuf *, const char *,
-- 
2.12.0.359.gd4c8c42e9



Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Johannes Schindelin
Hi Peff,

On Tue, 28 Feb 2017, Jeff King wrote:

> On Mon, Feb 27, 2017 at 10:30:20PM +0100, Johannes Schindelin wrote:
> 
> > One notable fallout of this patch series is that on 64-bit Linux (and
> > other platforms where `unsigned long` is 64-bit), we now limit the
> > range of dates to LONG_MAX (i.e. the *signed* maximum value). This
> > needs to be done as `time_t` can be signed (and indeed is at least on
> > my Ubuntu setup).
> > 
> > Obviously, I think that we can live with that, and I hope that all
> > interested parties agree.
> 
> I do not just agree, but I think the move to a signed timestamp is a big
> improvement. Git's object format is happy to represent times before
> 1970, but the code is not. I know this has been a pain for people who
> import ancient histories into Git.
> 
> It looks from the discussion like the sanest path forward is our own
> signed-64bit timestamp_t. That's unfortunate compared to using the
> standard time_t, but hopefully it would reduce the number of knobs (like
> TIME_T_IS_INT64) in the long run.

Boy am I happy that I did not go ahead and changed the code to use
uint64_t yet...

I'll let the dust settle a bit and then make further changes and send out
v2.

Ciao,
Dscho


RE: Unconventional roles of git

2017-02-28 Thread Randall S. Becker
>From: ankostis [mailto:ankos...@gmail.com] 
>Sent: February 28, 2017 8:01 AM
>To: Randall S. Becker 
>Cc: Git Mailing List ; Jason Cooper 
>Subject: Re: Unconventional roles of git
>On 27 February 2017 at 20:16, Randall S. Becker 
> wrote:
>> On February 26, 2017 6:52 AM, Kostis Anagnostopoulos 
>>  worte:
>>> On 26 February 2017 at 02:13, Jason Cooper  
>>> wrote:
>>> > As someone looking to deploy (and having previously deployed) git in
>>> > unconventional roles, I'd like to add ...
>>>
>>> We are developing a distributed storage for type approval files regarding 
>>> all
>>> vehicles registered in Europe.[1]  To ensure integrity even after 10 or 30
>>> years, the hash of a commit of these files (as contained in a tag) are to be
>>> printed on the the paper certificates.
>>>
>>> - Can you provide some hints for other similar unconventional roles of git?
>>> - Any other comment on the above usage of git are welcomed.
>>
>> I am involved in managing manufacturing designs and parts configurations and 
>> approvals with git being intimately involved in the process of developing 
>> and deploying tested designs to >computerized manufacturing environments. 
>> It's pretty cool actually to see things become real.

>Thanks for the tip.
>Can you provide some links or the legislation involved?

I have not done much in the way of write-ups other than PowerPoint-based 
training material for the companies involved. So far, this does not seem to be 
subject to regulatory or legislation - but that depends on what is being 
manufactured. In the current situation, I'm helping organize cabinet parts, 
components, GCode, optimizations, and other arcane artifacts in the woodworking 
community for CNC and related process support. It is an evolving domain. I do 
wish that Cloud providers like Atlassian would provide more comprehensive 
integrated code reviews (a.k.a. Gerrit) for some of this work. Surprisingly 
that's a harder sell to dedicate a server internally.

Cheers,
Randall




gpg verify git sub modules useful?

2017-02-28 Thread Patrick Schleizer
When using git submodules, is there value in iterating about the git
submodules running "git verfiy-commit HEAD" or would that be already
covered by the git submodule verification?

Cheers,
Patrick


format-patch subject-prefix gets truncated when using the --numbered flag

2017-02-28 Thread Adrian Dudau
Hello,

I noticed that the --subject-prefix string gets truncated sometimes,
but only when using the --numbered flat. Here's an example:

addu@sestofb11:/data/fb/addu/git$ export longm="very very very very
very very very very very very very very very very long prefix"


addu@sestofb11:/data/fb/addu/git$ git format-patch -1 --subject-
prefix="$longm][PATCH"

As expected, in the generated patch file we have:
Subject: [very very very very very very very very very very very very
very very long prefix][PATCH] 
 First batch after 2.12

But now, if I pass the --numbered flag too:
addu@sestofb11:/data/fb/addu/git$ git format-patch -1 --numbered --
subject-prefix="$longm][PATCH"

In the generated patch file we get this: 
Subject: [very very very very very very very very very very verFirst
batch 
 after 2.12

This is happening on the latest master branch, so I dug through the
code and tracked the issue to this piece of code in log-tree.c:

if (opt->total > 0) {
static char buffer[64];
snprintf(buffer, sizeof(buffer),
 "Subject: [%s%s%0*d/%d] ",
 opt->subject_prefix,
 *opt->subject_prefix ? " " : "",
 digits_in_number(opt->total),
 opt->nr, opt->total);
subject = buffer;
} else if (opt->total == 0 && opt->subject_prefix && *opt-
>subject_prefix) {
static char buffer[256];
snprintf(buffer, sizeof(buffer),
 "Subject: [%s] ",
 opt->subject_prefix);
subject = buffer;
} else {
subject = "Subject: ";
}

Apparently the size of the "buffer" var is different in the two
situations. Anybody knows if this is by design or just an old
oversight?
I can send a patch to fix it but I'm not very familiar with the git
code and I'm afraid some hidden consequence I don't see right now.

Cheers,
--Adrian

Re: [PATCH 0/6] Use time_t

2017-02-28 Thread René Scharfe

Am 28.02.2017 um 15:28 schrieb Jeff King:

On Mon, Feb 27, 2017 at 10:30:20PM +0100, Johannes Schindelin wrote:


One notable fallout of this patch series is that on 64-bit Linux (and
other platforms where `unsigned long` is 64-bit), we now limit the range
of dates to LONG_MAX (i.e. the *signed* maximum value). This needs to be
done as `time_t` can be signed (and indeed is at least on my Ubuntu
setup).

Obviously, I think that we can live with that, and I hope that all
interested parties agree.


I do not just agree, but I think the move to a signed timestamp is a big
improvement. Git's object format is happy to represent times before
1970, but the code is not. I know this has been a pain for people who
import ancient histories into Git.

It looks from the discussion like the sanest path forward is our own
signed-64bit timestamp_t. That's unfortunate compared to using the
standard time_t, but hopefully it would reduce the number of knobs (like
TIME_T_IS_INT64) in the long run.


Glibc will get a way to enable 64-bit time_t on 32-bit platforms 
eventually (https://sourceware.org/glibc/wiki/Y2038ProofnessDesign). 
Can platforms that won't provide a 64-bit time_t by 2038 be actually 
used at that point?  How would we get time information on them?  How 
would a custom timestamp_t help us?


Regarding the need for knobs: We could let the compiler chose between 
strtoll() and strtol() based on the size of time_t, in an inline 
function.  The maximum value can be calculated using its size as well. 
And we could use PRIdMAX and cast to intmax_t for printing.


René


Re: [PATCH v5 04/24] files-backend: convert git_path() to strbuf_git_path()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> git_path() and friends are going to be killed in files-backend.c in near
> future. And because there's a risk with overwriting buffer in
> git_path(), let's convert them all to strbuf_git_path(). We'll have
> easier time killing/converting strbuf_git_path() then because we won't
> have to worry about memory management again.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs/files-backend.c | 139 
> +++
>  1 file changed, 106 insertions(+), 33 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 4676525de..435db1293 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> [...]
> @@ -2586,9 +2603,15 @@ static int files_rename_ref(struct ref_store 
> *ref_store,
>   int flag = 0, logmoved = 0;
>   struct ref_lock *lock;
>   struct stat loginfo;
> - int log = !lstat(git_path("logs/%s", oldrefname), &loginfo);
> + struct strbuf sb_oldref = STRBUF_INIT;
> + struct strbuf sb_newref = STRBUF_INIT;
> + struct strbuf tmp_renamed_log = STRBUF_INIT;
> + int log, ret;
>   struct strbuf err = STRBUF_INIT;
>  
> + strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
> + log = !lstat(sb_oldref.buf, &loginfo);
> + strbuf_release(&sb_oldref);
>   if (log && S_ISLNK(loginfo.st_mode))
>   return error("reflog for %s is a symlink", oldrefname);
>  
> @@ -2602,7 +2625,12 @@ static int files_rename_ref(struct ref_store 
> *ref_store,
>   if (!rename_ref_available(oldrefname, newrefname))
>   return 1;
>  
> - if (log && rename(git_path("logs/%s", oldrefname), 
> git_path(TMP_RENAMED_LOG)))
> + strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
> + strbuf_git_path(&tmp_renamed_log, TMP_RENAMED_LOG);
> + ret = log && rename(sb_oldref.buf, tmp_renamed_log.buf);
> + strbuf_release(&sb_oldref);
> + strbuf_release(&tmp_renamed_log);
> + if (ret)
>   return error("unable to move logfile logs/%s to 
> "TMP_RENAMED_LOG": %s",
>   oldrefname, strerror(errno));
>  
> @@ -2681,13 +2709,19 @@ static int files_rename_ref(struct ref_store 
> *ref_store,
>   log_all_ref_updates = flag;
>  
>   rollbacklog:
> - if (logmoved && rename(git_path("logs/%s", newrefname), 
> git_path("logs/%s", oldrefname)))
> + strbuf_git_path(&sb_newref, "logs/%s", newrefname);
> + strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
> + if (logmoved && rename(sb_newref.buf, sb_oldref.buf))
>   error("unable to restore logfile %s from %s: %s",
>   oldrefname, newrefname, strerror(errno));
> + strbuf_git_path(&tmp_renamed_log, TMP_RENAMED_LOG);
>   if (!logmoved && log &&
> - rename(git_path(TMP_RENAMED_LOG), git_path("logs/%s", oldrefname)))
> + rename(tmp_renamed_log.buf, sb_oldref.buf))
>   error("unable to restore logfile %s from "TMP_RENAMED_LOG": %s",
>   oldrefname, strerror(errno));

It feels like you're writing, releasing, re-writing these strbufs more
than necessary. Maybe it would be clearer to set them once, and on
errors set `ret = error()` then jump to a label here...

> + strbuf_release(&sb_newref);
> + strbuf_release(&sb_oldref);
> + strbuf_release(&tmp_renamed_log);
>  

...and change this to `return ret`?

>   return 1;
>  }
> [...]
> @@ -4108,18 +4171,28 @@ static int files_reflog_expire(struct ref_store 
> *ref_store,
>  
>  static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  {
> + struct strbuf sb = STRBUF_INIT;
> +
>   /* Check validity (but we don't need the result): */
>   files_downcast(ref_store, 0, "init_db");
>  
>   /*
>* Create .git/refs/{heads,tags}
>*/
> - safe_create_dir(git_path("refs/heads"), 1);
> - safe_create_dir(git_path("refs/tags"), 1);
> + strbuf_git_path(&sb, "refs/heads");
> + safe_create_dir(sb.buf, 1);
> + strbuf_reset(&sb);
> + strbuf_git_path(&sb, "refs/tags");
> + safe_create_dir(sb.buf, 1);
> + strbuf_reset(&sb);
>   if (get_shared_repository()) {
> - adjust_shared_perm(git_path("refs/heads"));
> - adjust_shared_perm(git_path("refs/tags"));
> + strbuf_git_path(&sb, "refs/heads");
> + adjust_shared_perm(sb.buf);
> + strbuf_reset(&sb);
> + strbuf_git_path(&sb, "refs/tags");
> + adjust_shared_perm(sb.buf);
>   }
> + strbuf_release(&sb);
>   return 0;
>  }

It looks to me like `safe_create_dir()` already has the ability to
`adjust_shared_perm()`, or am I missing something? (I realize that this
is preexisting code.)

Michael



Re: [PATCH v5 05/24] files-backend: move "logs/" out of TMP_RENAMED_LOG

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> This makes reflog path building consistent, always in the form of
> 
> strbuf_git_path(sb, "logs/%s", refname);
> 
> It reduces the mental workload a bit in the next patch when that
> function call is converted.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs/files-backend.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 435db1293..69946b0de 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2513,7 +2513,7 @@ static int files_delete_refs(struct ref_store 
> *ref_store,
>   * IOW, to avoid cross device rename errors, the temporary renamed log must
>   * live into logs/refs.
>   */
> -#define TMP_RENAMED_LOG  "logs/refs/.tmp-renamed-log"
> +#define TMP_RENAMED_LOG  "refs/.tmp-renamed-log"

The constant name feels a little bit misleading now that it is not the
name of a logfile but rather a reference name. OTOH "tmp-renamed-log" is
*in* the reference name so I guess it's not really wrong.

>  struct rename_cb {
>   const char *tmp_renamed_log;
> @@ -2549,7 +2549,7 @@ static int rename_tmp_log(const char *newrefname)
>   int ret;
>  
>   strbuf_git_path(&path, "logs/%s", newrefname);
> - strbuf_git_path(&tmp, TMP_RENAMED_LOG);
> + strbuf_git_path(&tmp, "logs/%s", TMP_RENAMED_LOG);
>   cb.tmp_renamed_log = tmp.buf;
>   ret = raceproof_create_file(path.buf, rename_tmp_log_callback, &cb);
>   if (ret) {
> @@ -2626,12 +2626,12 @@ static int files_rename_ref(struct ref_store 
> *ref_store,
>   return 1;
>  
>   strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
> - strbuf_git_path(&tmp_renamed_log, TMP_RENAMED_LOG);
> + strbuf_git_path(&tmp_renamed_log, "logs/%s", TMP_RENAMED_LOG);
>   ret = log && rename(sb_oldref.buf, tmp_renamed_log.buf);
>   strbuf_release(&sb_oldref);
>   strbuf_release(&tmp_renamed_log);
>   if (ret)
> - return error("unable to move logfile logs/%s to 
> "TMP_RENAMED_LOG": %s",
> + return error("unable to move logfile logs/%s to 
> logs/"TMP_RENAMED_LOG": %s",
>   oldrefname, strerror(errno));

It seems like it would be preferable to use `sb_oldref.buf` and
`tmp.buf` when building the error message. But I guess that `tmp.buf`
might include some path preceding "logs/" that is unwanted in the error
message? But it's a shame to hardcode the file naming scheme here again.

Maybe we *do* want the path in the error message?

It just occurred to me: this temporary logfile lives in the main
repository, right? What if a worktree reference is being renamed? Part
of the advertised use of worktrees is that the worktree might live far
from the main directory, or even on removable media. But it's not
possible to rename files across partitions. Maybe this will come out in
the wash once worktrees are ref_stores themselves.

For that matter, what if a user tries to rename a worktree ref into a
common ref or vice versa?

>   if (delete_ref(oldrefname, orig_sha1, REF_NODEREF)) {
> [...]

Michael



Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Junio C Hamano
Jeff King  writes:

> I do not just agree, but I think the move to a signed timestamp is a big
> improvement. Git's object format is happy to represent times before
> 1970, but the code is not. I know this has been a pain for people who
> import ancient histories into Git.
>
> It looks from the discussion like the sanest path forward is our own
> signed-64bit timestamp_t. That's unfortunate compared to using the
> standard time_t, but hopefully it would reduce the number of knobs (like
> TIME_T_IS_INT64) in the long run.

Keeping it unsigned is safer in the short-term.  There are some
places that uses 0 as "impossible time" (e.g. somebody tried to
parse a string as time and returns a failure) and these places need
to be found and be replaced with probably the most negative value
that timestamp_t cn represent.  Another possible special value we
may use is for "expiring everything" but I think we tend to just use
the timestamp of the present time for that purpose and not UONG_MAX,
so we should be OK there.

But we need to cross the bridge to signed timestamp sometime, and I
do not see any reason why that somtime should not be now.




Re: [PATCH v5 24/24] t1406: new tests for submodule ref store

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  t/t1406-submodule-ref-store.sh (new +x) | 95 
> +
>  1 file changed, 95 insertions(+)
>  create mode 100755 t/t1406-submodule-ref-store.sh
> 
> diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
> new file mode 100755
> index 0..3b30ba62f
> --- /dev/null
> +++ b/t/t1406-submodule-ref-store.sh
> [...]

I haven't actually read this far in the patch series, but I noticed that
a test in this file fails:


t1406-submodule-ref-store.sh (Wstat: 256 Tests: 15
Failed: 1)
  Failed test:  10
  Non-zero exit status: 1

I didn't have time to look into it more; let me know if you can't
reproduce it.

Michael



Re: [PATCH v5 07/24] files-backend: add and use files_refname_path()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> Keep repo-related path handling in one place. This will make it easier
> to add submodule/multiworktree support later.
> 
> This automatically adds the "if submodule then use the submodule version
> of git_path" to other call sites too. But it does not mean those
> operations are sumodule-ready. Not yet.

s/sumodule/submodule/

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs/files-backend.c | 45 +
>  1 file changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 7b4ea4c56..72f4e1746 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1180,6 +1180,18 @@ static void files_reflog_path(struct files_ref_store 
> *refs,
>   strbuf_git_path(sb, "logs/%s", refname);
>  }
>  
> +static void files_refname_path(struct files_ref_store *refs,
> +struct strbuf *sb,
> +const char *refname)
> +{
> + if (refs->submodule) {
> + strbuf_git_path_submodule(sb, refs->submodule, "%s", refname);
> + return;
> + }
> +
> + strbuf_git_path(sb, "%s", refname);
> +}

Maybe it's just me, but I find it odd to exit early here when the first
exit isn't due to an error. For me, structuring this like `if ()
call1(); else call2();` would make it clearer that the two code paths
are equally-valid alternatives, and either one or the other will be
executed.

I had the same feeling when I read `files_reflog_path()`

>  /*
>   * Get the packed_ref_cache for the specified files_ref_store,
>   * creating it if necessary.
> @@ -1251,10 +1263,7 @@ static void read_loose_refs(const char *dirname, 
> struct ref_dir *dir)
>   size_t path_baselen;
>   int err = 0;
>  
> - if (refs->submodule)
> - err = strbuf_git_path_submodule(&path, refs->submodule, "%s", 
> dirname);
> - else
> - strbuf_git_path(&path, "%s", dirname);
> + files_refname_path(refs, &path, dirname);

It's so nice to see these ugly `if (refs->submodule)` code blocks
disappearing!

> [...]

Michael



Re: [PATCH 1/2] docs/diffcore: fix grammar in diffcore-rename header

2017-02-28 Thread Junio C Hamano
Patrick Steinhardt  writes:

> Signed-off-by: Patrick Steinhardt 
> ---
>  Documentation/gitdiffcore.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/gitdiffcore.txt b/Documentation/gitdiffcore.txt
> index 46bc6d077..cf009a187 100644
> --- a/Documentation/gitdiffcore.txt
> +++ b/Documentation/gitdiffcore.txt
> @@ -119,7 +119,7 @@ the original is used), and can be customized by giving a 
> number
>  after "-B" option (e.g. "-B75" to tell it to use 75%).
>  
>  
> -diffcore-rename: For Detection Renames and Copies
> +diffcore-rename: For Detecting Renames and Copies
>  -
>  
>  This transformation is used to detect renames and copies, and is

Thanks for carefully reading.  Both looks reasonable.


Re: gpg verify git sub modules useful?

2017-02-28 Thread Junio C Hamano
Patrick Schleizer  writes:

> When using git submodules, is there value in iterating about the git
> submodules running "git verfiy-commit HEAD" or would that be already
> covered by the git submodule verification?

That depends on what you are referring to with the "git submodule
verification" and more importantly what threat you are guarding
against.  "git -C  verify-commit HEAD" may make sure
that the contents of that commit object is GPG signed by whoever you
trust--is that what you want to make sure?  Or do you want all
commits in the submodule history to be similarly signed because the
tree of the superproject can switch to some other commit there?


Re: [PATCH v5 09/24] refs.c: introduce get_main_ref_store()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index 81b64b4ed..dab1a21ac 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1456,15 +1456,23 @@ static struct ref_store *ref_store_init(const char 
> *submodule)
>   return refs;
>  }
>  
> +static struct ref_store *get_main_ref_store(void)
> +{
> + struct ref_store *refs;
> +
> + if (main_ref_store)
> + return main_ref_store;
> +
> + refs = ref_store_init(NULL);
> + return refs;

Unnecessary temporary variable?

> [...]

Michael




Re: [PATCH v5 08/24] files-backend: remove the use of git_path()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> Given $GIT_DIR and $GIT_COMMON_DIR, files-backend is now in charge of
> deciding what goes where (*). The end goal is to pass $GIT_DIR only. A
> refs "view" of a linked worktree is a logical ref store that combines
> two files backends together.
> 
> (*) Not entirely true since strbuf_git_path_submodule() still does path
> translation underneath. But that's for another patch.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs/files-backend.c | 43 ++-
>  1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 72f4e1746..a390eaadf 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -923,7 +923,8 @@ struct files_ref_store {
>* store:
>*/
>   const char *submodule;
> -
> + char *gitdir;
> + char *gitcommondir;
>   char *packed_refs_path;
>  
>   struct ref_entry *loose;
> @@ -985,6 +986,8 @@ static struct ref_store *files_ref_store_create(const 
> char *submodule)
>  {
>   struct files_ref_store *refs = xcalloc(1, sizeof(*refs));
>   struct ref_store *ref_store = (struct ref_store *)refs;
> + struct strbuf sb = STRBUF_INIT;
> + const char *gitdir = get_git_dir();
>  
>   base_ref_store_init(ref_store, &refs_be_files);
>  
> @@ -995,7 +998,11 @@ static struct ref_store *files_ref_store_create(const 
> char *submodule)
>   return ref_store;
>   }
>  
> - refs->packed_refs_path = git_pathdup("packed-refs");
> + refs->gitdir = xstrdup(gitdir);
> + get_common_dir_noenv(&sb, gitdir);
> + refs->gitcommondir = strbuf_detach(&sb, NULL);
> + strbuf_addf(&sb, "%s/packed-refs", refs->gitcommondir);
> + refs->packed_refs_path = strbuf_detach(&sb, NULL);

`git_path()` and friends avoid adding an extra `/` if `git_dir()`
already ends in a slash or if it is the empty string. Here you don't
have that functionality. Is that intentional?

Same thing below, too.

>  
>   return ref_store;
>  }
> @@ -1173,11 +1180,26 @@ static void files_reflog_path(struct files_ref_store 
> *refs,
> const char *refname)
>  {
>   if (!refname) {
> - strbuf_git_path(sb, "logs");
> + /*
> +  * FIXME: of course this is wrong in multi worktree
> +  * setting. To be fixed real soon.
> +  */
> + strbuf_addf(sb, "%s/logs", refs->gitcommondir);
>   return;
>   }
>  
> - strbuf_git_path(sb, "logs/%s", refname);
> + switch (ref_type(refname)) {
> + case REF_TYPE_PER_WORKTREE:
> + case REF_TYPE_PSEUDOREF:
> + strbuf_addf(sb, "%s/logs/%s", refs->gitdir, refname);
> + break;
> + case REF_TYPE_NORMAL:
> + strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
> + break;
> + default:
> + die("BUG: unknown ref type %d of ref %s",
> + ref_type(refname), refname);
> + }
>  }
>  
>  static void files_refname_path(struct files_ref_store *refs,
> @@ -1189,7 +1211,18 @@ static void files_refname_path(struct files_ref_store 
> *refs,
>   return;
>   }
>  
> - strbuf_git_path(sb, "%s", refname);
> + switch (ref_type(refname)) {
> + case REF_TYPE_PER_WORKTREE:
> + case REF_TYPE_PSEUDOREF:
> + strbuf_addf(sb, "%s/%s", refs->gitdir, refname);
> + break;
> + case REF_TYPE_NORMAL:
> + strbuf_addf(sb, "%s/%s", refs->gitcommondir, refname);
> + break;
> + default:
> + die("BUG: unknown ref type %d of ref %s",
> + ref_type(refname), refname);
> + }
>  }
>  
>  /*
> 

Michael



Re: [PATCH v5 12/24] refs.c: kill register_ref_store(), add register_submodule_ref_store()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> This is the last function in this code (besides public API) that takes
> submodule argument and handles both main/submodule cases. Break it down,
> move main store registration in get_main_ref_store() and keep the rest
> in register_submodule_ref_store().
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  refs.c | 50 ++
>  1 file changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index c284cb4f4..6adc38e42 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1412,29 +1412,6 @@ static struct ref_store 
> *lookup_submodule_ref_store(const char *submodule)
>  }
>  
>  /*
> - * Register the specified ref_store to be the one that should be used
> - * for submodule (or the main repository if submodule is NULL). It is
> - * a fatal error to call this function twice for the same submodule.
> - */
> -static void register_ref_store(struct ref_store *refs, const char *submodule)
> -{
> - if (!submodule) {
> - if (main_ref_store)
> - die("BUG: main_ref_store initialized twice");
> -
> - main_ref_store = refs;
> - } else {
> - if (!submodule_ref_stores.tablesize)
> - hashmap_init(&submodule_ref_stores, submodule_hash_cmp, 
> 0);
> -
> - if (hashmap_put(&submodule_ref_stores,
> - alloc_submodule_hash_entry(submodule, refs)))
> - die("BUG: ref_store for submodule '%s' initialized 
> twice",
> - submodule);
> - }
> -}
> -
> -/*
>   * Create, record, and return a ref_store instance for the specified
>   * submodule (or the main repository if submodule is NULL).
>   */
> @@ -1448,7 +1425,6 @@ static struct ref_store *ref_store_init(const char 
> *submodule)
>   die("BUG: reference backend %s is unknown", be_name);
>  
>   refs = be->init(submodule);
> - register_ref_store(refs, submodule);
>   return refs;
>  }
>  
> @@ -1460,9 +1436,32 @@ static struct ref_store *get_main_ref_store(void)
>   return main_ref_store;
>  
>   refs = ref_store_init(NULL);
> + if (refs) {
> + if (main_ref_store)
> + die("BUG: main_ref_store initialized twice");
> +
> + main_ref_store = refs;
> + }
>   return refs;

Superfluous test: I don't think `ref_store_init()` ever returns NULL.
This also implies that you could check `main_ref_store` before calling
`ref_store_init()`, and eliminate a temporary.

>  }
>  
> +/*
> + * Register the specified ref_store to be the one that should be used
> + * for submodule. It is a fatal error to call this function twice for
> + * the same submodule.
> + */
> +static void register_submodule_ref_store(struct ref_store *refs,
> +  const char *submodule)
> +{
> + if (!submodule_ref_stores.tablesize)
> + hashmap_init(&submodule_ref_stores, submodule_hash_cmp, 0);
> +
> + if (hashmap_put(&submodule_ref_stores,
> + alloc_submodule_hash_entry(submodule, refs)))
> + die("BUG: ref_store for submodule '%s' initialized twice",
> + submodule);
> +}
> +
>  struct ref_store *get_ref_store(const char *submodule)
>  {
>   struct strbuf submodule_sb = STRBUF_INIT;
> @@ -1480,6 +1479,9 @@ struct ref_store *get_ref_store(const char *submodule)
>   if (is_nonbare_repository_dir(&submodule_sb))
>   refs = ref_store_init(submodule);
>   strbuf_release(&submodule_sb);
> +
> + if (refs)

I think `refs` should always be non-NULL here for the same reason.

> + register_submodule_ref_store(refs, submodule);
>   return refs;
>  }

Michael



Re: [PATCH v5 13/24] refs.c: make get_main_ref_store() public and use it

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> get_ref_store() will soon be renamed to get_submodule_ref_store().
> Together with future get_worktree_ref_store(), the three functions
> provide an appropriate ref store for different operation modes. New APIs
> will be added to operate directly on ref stores.

Nice.

Michael



Re: git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-02-28 Thread Torsten Bögershausen
On 2017-02-27 21:17, Junio C Hamano wrote:

> Torsten, you've been quite active in fixing various glitches around
> the EOL conversion in the latter half of last year.  Have any
> thoughts to share on this topic?
> 
> Thanks.

Sorry for the delay, being too busy with other things.
I followed the discussion, but didn't have good things to contribute.
I am not an expert in diff.c, but there seems to be a bug, thanks everybody
for digging.



Back to business:

My understanding is that git diff --quiet should be quiet, when
git add will not do anything (but the file is "touched".
The touched means that Git will detect e.g a new mtime or inode
or file size when doing lstat().

mtime is tricky, inodes are problematic under Windows.
What is easy to change is the file length.
I don't thing that we need a test file with LF, nor do we need
the sleep, touch or anything.
Would the the following work ?
(This is copy-paste, so the TABs may be corrupted)


#!/bin/sh
#
# Copyright (c) 2017 Mike Crowe
#
# These tests ensure that files changing line endings in the presence
# of .gitattributes to indicate that line endings should be ignored
# don't cause 'git diff' or 'git diff --quiet' to think that they have
# been changed.

test_description='git diff with files that require CRLF conversion'

. ./test-lib.sh

test_expect_success setup '
echo "* text=auto" > .gitattributes &&
printf "Hello\r\nWorld\r\n" >crlf.txt &&
git add .gitattributes crlf.txt lf.txt &&
git commit -m "initial"
'

test_expect_success 'quiet diff works on file with line-ending change that has
no effect on repository' '
printf "Hello\r\nWorld\n" >crlf.txt &&
git status &&
git diff --quiet
'

test_done







Re: [PATCH v5 00/24] Remove submodule from files-backend.c

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> v5 goes a bit longer than v4, basically:

I've read through patch 14/24 so far, and they all look good except for
the mostly superficial comments that I have sent so far. I like the way
this is heading!

I'll try to continue tomorrow.

Michael



RE: HELLO VERY URGENT.

2017-02-28 Thread info

Hello,

Hello, how are you doing? I want to inform you that, we have an  
inheritance of a deceased client with your surname. Reply via email:  
andrsi...@yandex.com with your full names for info.


Regards.

Miss. Melissa.
--
Correo Corporativo Hospital Universitario del Valle E.S.E
***

"Estamos re-dimensionandonos para crecer!"

**




Re: [PATCH v5 14/24] path.c: move some code out of strbuf_git_path_submodule()

2017-02-28 Thread Michael Haggerty
On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
> refs is learning to avoid path rewriting that is done by
> strbuf_git_path_submodule(). Factor out this code so it could be reused
> by refs*
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  path.c  | 34 +++---
>  submodule.c | 31 +++
>  submodule.h |  1 +
>  3 files changed, 39 insertions(+), 27 deletions(-)
> 
> diff --git a/path.c b/path.c
> index efcedafba..3451d2916 100644
> --- a/path.c
> +++ b/path.c
> @@ -475,35 +475,16 @@ const char *worktree_git_path(const struct worktree 
> *wt, const char *fmt, ...)
>  static int do_submodule_path(struct strbuf *buf, const char *path,
>const char *fmt, va_list args)
>  {
> - const char *git_dir;
>   struct strbuf git_submodule_common_dir = STRBUF_INIT;
>   struct strbuf git_submodule_dir = STRBUF_INIT;
> - const struct submodule *sub;
> - int err = 0;
> + int ret;
>  
> - strbuf_addstr(buf, path);
> - strbuf_complete(buf, '/');
> - strbuf_addstr(buf, ".git");
> -
> - git_dir = read_gitfile(buf->buf);
> - if (git_dir) {
> - strbuf_reset(buf);
> - strbuf_addstr(buf, git_dir);
> - }
> - if (!is_git_directory(buf->buf)) {
> - gitmodules_config();
> - sub = submodule_from_path(null_sha1, path);
> - if (!sub) {
> - err = SUBMODULE_PATH_ERR_NOT_CONFIGURED;

I didn't read this patch too carefully, but where the old code used the
constant `SUBMODULE_PATH_ERR_NOT_CONFIGURED`, the new code returns -1.
In fact, now the constant is totally unused. It looks like there's some
more cleanup that could happen.

> - goto cleanup;
> - }
> - strbuf_reset(buf);
> - strbuf_git_path(buf, "%s/%s", "modules", sub->name);
> - }
> -
> - strbuf_addch(buf, '/');
> - strbuf_addbuf(&git_submodule_dir, buf);
> + ret = submodule_to_gitdir(&git_submodule_dir, path);
> + if (ret)
> + goto cleanup;
>  
> + strbuf_complete(&git_submodule_dir, '/');
> + strbuf_addbuf(buf, &git_submodule_dir);
>   strbuf_vaddf(buf, fmt, args);
>  
>   if (get_common_dir_noenv(&git_submodule_common_dir, 
> git_submodule_dir.buf))
> @@ -514,8 +495,7 @@ static int do_submodule_path(struct strbuf *buf, const 
> char *path,
>  cleanup:
>   strbuf_release(&git_submodule_dir);
>   strbuf_release(&git_submodule_common_dir);
> -
> - return err;
> + return ret;
>  }
>  
>  char *git_pathdup_submodule(const char *path, const char *fmt, ...)
> diff --git a/submodule.c b/submodule.c
> index 3b98766a6..36b8d1d11 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1514,3 +1514,34 @@ void absorb_git_dir_into_superproject(const char 
> *prefix,
>   strbuf_release(&sb);
>   }
>  }
> +
> +int submodule_to_gitdir(struct strbuf *buf, const char *submodule)
> +{
> + const struct submodule *sub;
> + const char *git_dir;
> + int ret = 0;
> +
> + strbuf_reset(buf);
> + strbuf_addstr(buf, submodule);
> + strbuf_complete(buf, '/');
> + strbuf_addstr(buf, ".git");
> +
> + git_dir = read_gitfile(buf->buf);
> + if (git_dir) {
> + strbuf_reset(buf);
> + strbuf_addstr(buf, git_dir);
> + }
> + if (!is_git_directory(buf->buf)) {
> + gitmodules_config();
> + sub = submodule_from_path(null_sha1, submodule);
> + if (!sub) {
> + ret = -1;
> + goto cleanup;
> + }
> + strbuf_reset(buf);
> + strbuf_git_path(buf, "%s/%s", "modules", sub->name);
> + }
> +
> +cleanup:
> + return ret;
> +}
> diff --git a/submodule.h b/submodule.h
> index 05ab674f0..fc3d0303e 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -81,6 +81,7 @@ extern int push_unpushed_submodules(struct sha1_array 
> *commits,
>   int dry_run);
>  extern void connect_work_tree_and_git_dir(const char *work_tree, const char 
> *git_dir);
>  extern int parallel_submodules(void);
> +int submodule_to_gitdir(struct strbuf *buf, const char *submodule);

A docstring is always nice :-)

>  
>  /*
>   * Prepare the "env_array" parameter of a "struct child_process" for 
> executing
> 

Michael



Re: 'git submodules update' ignores credential.helper config of the parent repository

2017-02-28 Thread Stefan Beller
On Tue, Feb 28, 2017 at 6:37 AM, Jeff King  wrote:
>>
>> This would change the semantics of a config file as the attribute for
>> each setting depends on the location (was attribute.FOO.read =
>> {true, false} read before).
>
> I'm not enthused by this, just because there is a hidden dependency
> between attribute.* sections and other ones. They _look_ like regular
> config keys, but they really aren't.

True.

> I have a feeling that something like this would create unwelcome corner
> cases in the config-writer, which is otherwise does not have to care
> about which existing section of a file it adds a key to.

Yeah the writer would become a lot more involved, if we're not going
the stupid way (add these sections for nearly all keys. that would be
a mess but easy to implement)

So I guess then we rather settle with multiple config files or a white/blacklist
of config options to propagate from the superproject to its submodules.


Re: [PATCH] http: attempt updating base URL only if no error

2017-02-28 Thread Jonathan Tan

On 02/28/2017 05:28 AM, Jeff King wrote:

Right, your patch makes sense. A real HTTP error should take precedence
over the url-update trickery.

Acked-by: Jeff King 


Thanks!


Running your included test, we get:

  fatal: unable to access 'http://127.0.0.1:5550/redir-to/502/': The
  requested URL returned error: 502

but the error really happened in the intermediate step. I wonder if we
should show the effective_url in that case, as it's more likely to
pinpoint the problem. OTOH, we do not mention the intermediate redirect
at all, so they might be confused about where that URL came from. If you
really want to debug HTTP confusion, you should use GIT_TRACE_CURL.


Yeah, if we mention the effective_url, I think that there would need to 
be a lot more explaining to be done (e.g. why does my URL have 
"info/refs?service=git-upload-pack" tacked on at the end). It might be 
better to just recommend GIT_TRACE_CURL.


Re: format-patch subject-prefix gets truncated when using the --numbered flag

2017-02-28 Thread Junio C Hamano
Adrian Dudau  writes:

> I noticed that the --subject-prefix string gets truncated sometimes,
> but only when using the --numbered flat. Here's an example:
>
> addu@sestofb11:/data/fb/addu/git$ export longm="very very very very
> very very very very very very very very very very long prefix"

This looks like "dr, my arm hurts when I twist it this way---don't
do that then" ;-).  Truncation is designed and intended to preserve
space for the real title of commit.

Having said that...

> This is happening on the latest master branch, so I dug through the
> code and tracked the issue to this piece of code in log-tree.c:
>
> if (opt->total > 0) {
> static char buffer[64];
> snprintf(buffer, sizeof(buffer),
>  "Subject: [%s%s%0*d/%d] ",
>  opt->subject_prefix,
>  *opt->subject_prefix ? " " : "",
>  digits_in_number(opt->total),
>  opt->nr, opt->total);
> subject = buffer;
> } else if (opt->total == 0 && opt->subject_prefix && *opt-
>>subject_prefix) {
> static char buffer[256];
> snprintf(buffer, sizeof(buffer),
>  "Subject: [%s] ",
>  opt->subject_prefix);
> subject = buffer;
> } else {
> subject = "Subject: ";
> }
>
> Apparently the size of the "buffer" var is different in the two
> situations. Anybody knows if this is by design or just an old
> oversight?

I think this is just an old oversight.  The latter should have been
even shorter than the former one (or the former should be longer
than the latter) to account for the width of the counter e.g. 01/27.


Re: SHA1 collisions found

2017-02-28 Thread Junio C Hamano
Jeff King  writes:

> The first one is 98K. Mail headers may bump it over vger's 100K barrier.
> It's actually the _least_ interesting patch of the 3, because it just
> imports the code wholesale from the other project. But if it doesn't
> make it, you can fetch the whole series from:
>
>   https://github.com/peff/git jk/sha1dc
>
> (By the way, I don't see your version on the list, Linus, which probably
> means it was eaten by the 100K filter).
>
>   [1/3]: add collision-detecting sha1 implementation
>   [2/3]: sha1dc: adjust header includes for git
>   [3/3]: Makefile: add USE_SHA1DC knob

I was lazy so I fetched the above and then added this on top before
I start to play with it.

-- >8 --
From: Junio C Hamano 
Date: Tue, 28 Feb 2017 10:39:25 -0800
Subject: [PATCH] sha1dc: resurrect LICENSE file

The upstream releases the contents under the MIT license; the
initial import accidentally omitted its license file.  

Add it back.

Signed-off-by: Junio C Hamano 
---
 sha1dc/LICENSE | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 sha1dc/LICENSE

diff --git a/sha1dc/LICENSE b/sha1dc/LICENSE
new file mode 100644
index 00..4a3e6a1b15
--- /dev/null
+++ b/sha1dc/LICENSE
@@ -0,0 +1,30 @@
+MIT License
+
+Copyright (c) 2017:
+Marc Stevens
+Cryptology Group
+Centrum Wiskunde & Informatica
+P.O. Box 94079, 1090 GB Amsterdam, Netherlands
+m...@marc-stevens.nl
+
+Dan Shumow
+Microsoft Research
+dan...@microsoft.com
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
-- 
2.12.0-310-g733d1cbbe2



Re: SHA1 collisions found

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 11:07:37AM -0800, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> >>   [1/3]: add collision-detecting sha1 implementation
> >>   [2/3]: sha1dc: adjust header includes for git
> >>   [3/3]: Makefile: add USE_SHA1DC knob
> >
> > I was lazy so I fetched the above and then added this on top before
> > I start to play with it.
> >
> > -- >8 --
> > From: Junio C Hamano 
> > Date: Tue, 28 Feb 2017 10:39:25 -0800
> > Subject: [PATCH] sha1dc: resurrect LICENSE file
> 
> In a way similar to 8415558f55 ("sha1dc: avoid c99
> declaration-after-statement", 2017-02-24), we would want this on
> top.
> 
> -- >8 --
> Subject: sha1dc: avoid 'for' loop initial decl

Yeah, thanks, I had tweaked both that and the license thing locally but
had not pushed it out yet. Both are obvious improvements.

FWIW, I've been in touch with Dan Shumow, one of the authors, who has
been looking at whether we can speed up the sha1dc implementation, or
integrate the checks into the block-sha1 implementation.

Here are a few notes on the earlier timings I posted that came out in
our conversation:

  - the timings I showed earlier were actually openssl versus sha1dc.
The block-sha1 timings fall somewhere in the middle:

  [running test-sha1 on a fresh linux.git packfile]
  1.347s openssl
  2.079s block-sha1
  4.983s sha1dc

  [index-pack --verify on a fresh git.git packfile]
   6.919s openssl
   9.003s block-sha1
  17.955s sha1dc

Those are the operations that show off sha1 performance the most.
The first one is really not even that interesting; it's raw
sha1 performance. The second one is an actual operation users might
notice (though not as --verify exactly, but as "index-pack --stdin"
when receiving a fetch or a push).

So there's room for improvement, but the gap between block-sha1
and sha1dc is not quite as big as I showed earlier.

  - Dan timed the sha1dc implementation with and without the collision
detection enabled. The sha1 implementation is only 1.33x slower than
block-sha1 (for raw sha1 time). Adding in the detection makes it
2.6x slower.

So there's some potential gain from optimizing the sha1
implementation, but ultimately we may be looking at a 2x slowdown to
add in the collision detection.

It doesn't need to happen for _every_ sha1 we compute, but the
index-pack case is the one that almost certainly _does_ want it,
because that's when we're admitting remote objects into the
repository (ditto you'd probably want it for write_sha1_file(),
since you could be applying a patch from an untrusted source).

-Peff


Re: SHA1 collisions found

2017-02-28 Thread Junio C Hamano
Junio C Hamano  writes:

>>   [1/3]: add collision-detecting sha1 implementation
>>   [2/3]: sha1dc: adjust header includes for git
>>   [3/3]: Makefile: add USE_SHA1DC knob
>
> I was lazy so I fetched the above and then added this on top before
> I start to play with it.
>
> -- >8 --
> From: Junio C Hamano 
> Date: Tue, 28 Feb 2017 10:39:25 -0800
> Subject: [PATCH] sha1dc: resurrect LICENSE file

In a way similar to 8415558f55 ("sha1dc: avoid c99
declaration-after-statement", 2017-02-24), we would want this on
top.

-- >8 --
Subject: sha1dc: avoid 'for' loop initial decl

We write this:

type i;
for (i = initial; i < limit; i++)

instead of this:

for (type i = initial; i < limit; i++)

the latter of which is from c99.

Signed-off-by: Junio C Hamano 
---
 sha1dc/sha1.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index f4e261ae7a..6569b403e9 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -41,7 +41,8 @@
 
 void sha1_message_expansion(uint32_t W[80])
 {
-   for (unsigned i = 16; i < 80; ++i)
+   unsigned i;
+   for (i = 16; i < 80; ++i)
W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 
1);
 }
 
@@ -49,9 +50,10 @@ void sha1_compression(uint32_t ihv[5], const uint32_t m[16])
 {
uint32_t W[80];
uint32_t a, b, c, d, e;
+   unsigned i;
 
memcpy(W, m, 16 * 4);
-   for (unsigned i = 16; i < 80; ++i)
+   for (i = 16; i < 80; ++i)
W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 
1);
 
a = ihv[0];


[PATCH] Travis: also test on 32-bit Linux

2017-02-28 Thread Johannes Schindelin
When Git v2.9.1 was released, it had a bug that showed only on Windows
and on 32-bit systems: our assumption that `unsigned long` can hold
64-bit values turned out to be wrong.

This could have been caught earlier if we had a Continuous Testing
set up that includes a build and test run on 32-bit Linux.

Let's do this (and take care of the Windows build later). This patch
asks Travis CI to install a Docker image with 32-bit libraries and then
goes on to build and test Git using this 32-bit setup.

A big thank you to Lars Schneider without whose help this patch would
not have happened.

Signed-off-by: Johannes Schindelin 
---
Published-As: https://github.com/dscho/git/releases/tag/travis-32-bit-v1
Fetch-It-Via: git fetch https://github.com/dscho/git travis-32-bit-v1

 .travis.yml | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/.travis.yml b/.travis.yml
index 9c63c8c3f68..87d9e9051a6 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,6 +39,17 @@ env:
 
 matrix:
   include:
+- env: Linux32
+  os: linux
+  compiler: clang
+  sudo: required
+  services:
+- docker
+  before_install:
+- docker pull daald/ubuntu32:xenial
+  before_script:
+  script:
+- "sudo docker run -i -v \"${PWD}:/usr/src/git\" daald/ubuntu32:xenial 
/bin/bash -c \"linux32 --32bit i386 sh -c 'apt update && apt install -y 
build-essential libcurl4-openssl-dev libssl-dev libexpat-dev gettext python && 
cd /usr/src/git && DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS=\\\"--timer --jobs 
3 --state=failed,slow,save\\\" GIT_TEST_OPTS=--verbose-log 
GIT_TEST_CLONE_2GB=YesPlease make -j2 test'\""
 - env: Documentation
   os: linux
   compiler: clang

base-commit: 3bc53220cb2dcf709f7a027a3f526befd021d858
-- 
2.12.0.windows.1.3.g8a117c48243


Re: SHA1 collisions found

2017-02-28 Thread Linus Torvalds
On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano  wrote:
>
> In a way similar to 8415558f55 ("sha1dc: avoid c99
> declaration-after-statement", 2017-02-24), we would want this on
> top.

There's a few other simplifications that could be done:

 (1) make the symbols static that aren't used.

 The sha1.h header ends up declaring several things that shouldn't
have been exported.

 I suspect the code may have had some debug mode that got stripped
out from it before making it public (or that was never there, and was
just something the generating code could add).

 (2) get rid of the "safe mode" support.

 That one is meant for non-checking replacements where it
generates a *different* hash for input with the collision fingerpring,
but that's pointless for the git use when we abort on a collision
fingerprint.

I think the first one will show that the sha1_compression() function
isn't actually used, and with the removal of safe-mode I think
sha1_compression_W() also is unused.

Finally, only states 58 and 65 (out of all 80 states) are actually
used, and from what I can tell, the 'maski' value is always 0, so the
looping over 80 state masks is really just a loop over two.

The file has code top *generate* all the 80 sha1_recompression_step()
functions, and I don't think the compiler is smart enough to notice
that only two of them matter.

And because 'maski' is always zero, thisL

   ubc_dv_mask[sha1_dvs[i].maski]

code looks like it might as well just use ubc_dv_mask[0] - in fact the
ubc_dv_mask[] "array" really is just a single-entry array anyway:

   #define DVMASKSIZE 1

so that code has a few oddities in it. It's generated code, which is
probably why.

Basically, some of it could be improved. In particular, the "generate
code for 80 different recompression cases, but only ever use two of
them" really looks like it would blow up the code generation footprint
a lot.

I'm adding Marc Stevens and Dan Shumow to this email (bcc'd, so that
they don't get dragged into any unrelated email threads) in case they
want to comment.

I'm wondering if they perhaps have a cleaned-up version somewhere, or
maybe they can tell me that I'm just full of sh*t and missed
something.

Linus


Re: SHA1 collisions found

2017-02-28 Thread Shawn Pearce
On Tue, Feb 28, 2017 at 11:34 AM, Linus Torvalds
 wrote:
> On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano  wrote:
>>
>> In a way similar to 8415558f55 ("sha1dc: avoid c99
>> declaration-after-statement", 2017-02-24), we would want this on
>> top.
>
> There's a few other simplifications that could be done:

Yes, I found and did a number of these when I ported sha1dc to Java
for JGit[1], and it helped recover some of the lost throughput.

[1] https://git.eclipse.org/r/#/c/91852/

>  (1) make the symbols static that aren't used.
>
>  The sha1.h header ends up declaring several things that shouldn't
> have been exported.
>
>  I suspect the code may have had some debug mode that got stripped
> out from it before making it public (or that was never there, and was
> just something the generating code could add).
>
>  (2) get rid of the "safe mode" support.
>
>  That one is meant for non-checking replacements where it
> generates a *different* hash for input with the collision fingerpring,
> but that's pointless for the git use when we abort on a collision
> fingerprint.
>
> I think the first one will show that the sha1_compression() function
> isn't actually used, and with the removal of safe-mode I think
> sha1_compression_W() also is unused.

Correct.

> Finally, only states 58 and 65 (out of all 80 states) are actually
> used,

Yes, at present only states 58 and 65 are used. I cut out support for
other states.

> and from what I can tell, the 'maski' value is always 0, so the
> looping over 80 state masks is really just a loop over two.

Actually, look closer at that loop:

  for (i = 0; sha1_dvs[i].dvType != 0; ++i)
  {
if ((0 == ctx->ubc_check) || (((uint32_t)(1) << sha1_dvs[i].maskb)
& ubc_dv_mask[sha1_dvs[i].maski]))

Its a loop over all 32 bits looking for which bits are set. Most of
the time few bits if any are set for most message blocks. Changing
this code to find the lowest 1 bit set in ubc_dv_mask[0] provided a
significant improvement in throughput.

The sha1_dvs array is indexed by maskb, so the code can be reduced to:

  while (ubcDvMask != 0) {
int b = numberOfTrailingZeros(lowestOneBit(ubcDvMask));
UbcCheck.DvInfo dv = UbcCheck.DV[b];

Or something.


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Junio C Hamano
René Scharfe  writes:

> Am 28.02.2017 um 15:28 schrieb Jeff King:
>
>> It looks from the discussion like the sanest path forward is our own
>> signed-64bit timestamp_t. That's unfortunate compared to using the
>> standard time_t, but hopefully it would reduce the number of knobs (like
>> TIME_T_IS_INT64) in the long run.
>
> Glibc will get a way to enable 64-bit time_t on 32-bit platforms
> eventually
> (https://sourceware.org/glibc/wiki/Y2038ProofnessDesign). Can
> platforms that won't provide a 64-bit time_t by 2038 be actually used
> at that point?  How would we get time information on them?  How would
> a custom timestamp_t help us?

That's a sensible "wait, let's step back a bit".  I take it that you
are saying "time_t is just fine", and I am inclined to agree.

Right now, they may be able to have future timestamps ranging to
year 2100 and switching to time_t would limit their ability to
express future time to 2038 but they would be able to express
timestamp in the past to cover most of 20th century.  Given that
these 32-bit time_t software platforms will die off before year 2038
(either by underlying hardware getting obsolete, or software updated
to handle 64-bit time_t), the (temporary) loss of 2038-2100 range
would not be too big a deal to warrant additional complexity.

> Regarding the need for knobs: We could let the compiler chose between
> strtoll() and strtol() based on the size of time_t, in an inline
> function.  The maximum value can be calculated using its size as
> well. And we could use PRIdMAX and cast to intmax_t for printing.

Thanks.


Re: 'git submodules update' ignores credential.helper config of the parent repository

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 10:05:24AM -0800, Stefan Beller wrote:

> > I have a feeling that something like this would create unwelcome corner
> > cases in the config-writer, which is otherwise does not have to care
> > about which existing section of a file it adds a key to.
> 
> Yeah the writer would become a lot more involved, if we're not going
> the stupid way (add these sections for nearly all keys. that would be
> a mess but easy to implement)
> 
> So I guess then we rather settle with multiple config files or a 
> white/blacklist
> of config options to propagate from the superproject to its submodules.

I'm still open to the idea that we simply improve the documentation to
make it clear that per-repo config really is per-repo, and is not shared
between super-projects and submodules. And then something like Duy's
proposed conditional config lets you set global config that flexibly
covers a set of repos.

-Peff


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 09:26:23AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I do not just agree, but I think the move to a signed timestamp is a big
> > improvement. Git's object format is happy to represent times before
> > 1970, but the code is not. I know this has been a pain for people who
> > import ancient histories into Git.
> >
> > It looks from the discussion like the sanest path forward is our own
> > signed-64bit timestamp_t. That's unfortunate compared to using the
> > standard time_t, but hopefully it would reduce the number of knobs (like
> > TIME_T_IS_INT64) in the long run.
> 
> Keeping it unsigned is safer in the short-term.  There are some
> places that uses 0 as "impossible time" (e.g. somebody tried to
> parse a string as time and returns a failure) and these places need
> to be found and be replaced with probably the most negative value
> that timestamp_t cn represent.  Another possible special value we
> may use is for "expiring everything" but I think we tend to just use
> the timestamp of the present time for that purpose and not UONG_MAX,
> so we should be OK there.

Yeah. I think I was the one who invented the "0 is impossible"
convention. We can certainly stick with it for now (it's awkward if you
really do have an entry on Jan 1 1970, but other than that it's an OK
marker). I agree that the most negatively value is probably a saner
choice, but we can switch to it after the dust settles.

> But we need to cross the bridge to signed timestamp sometime, and I
> do not see any reason why that somtime should not be now.

Yep.

-Peff


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 10:55:49AM -0800, Junio C Hamano wrote:

> > Glibc will get a way to enable 64-bit time_t on 32-bit platforms
> > eventually
> > (https://sourceware.org/glibc/wiki/Y2038ProofnessDesign). Can
> > platforms that won't provide a 64-bit time_t by 2038 be actually used
> > at that point?  How would we get time information on them?  How would
> > a custom timestamp_t help us?
> 
> That's a sensible "wait, let's step back a bit".  I take it that you
> are saying "time_t is just fine", and I am inclined to agree.
> 
> Right now, they may be able to have future timestamps ranging to
> year 2100 and switching to time_t would limit their ability to
> express future time to 2038 but they would be able to express
> timestamp in the past to cover most of 20th century.  Given that
> these 32-bit time_t software platforms will die off before year 2038
> (either by underlying hardware getting obsolete, or software updated
> to handle 64-bit time_t), the (temporary) loss of 2038-2100 range
> would not be too big a deal to warrant additional complexity.

For what it's worth, I'm on board with just using time_t if it reduces
the overall complexity. I agree that the "loss" of far-future timestamp
handling is unlikely to matter between now and 2038, and those systems
will have to figure out their time_t problems by then. By actually using
time_t we get to piggy-back on their solution.

-Peff


Re: 'git submodules update' ignores credential.helper config of the parent repository

2017-02-28 Thread Stefan Beller
On Tue, Feb 28, 2017 at 12:08 PM, Jeff King  wrote:
> On Tue, Feb 28, 2017 at 10:05:24AM -0800, Stefan Beller wrote:
>
>> > I have a feeling that something like this would create unwelcome corner
>> > cases in the config-writer, which is otherwise does not have to care
>> > about which existing section of a file it adds a key to.
>>
>> Yeah the writer would become a lot more involved, if we're not going
>> the stupid way (add these sections for nearly all keys. that would be
>> a mess but easy to implement)
>>
>> So I guess then we rather settle with multiple config files or a 
>> white/blacklist
>> of config options to propagate from the superproject to its submodules.
>
> I'm still open to the idea that we simply improve the documentation to
> make it clear that per-repo config really is per-repo, and is not shared
> between super-projects and submodules. And then something like Duy's
> proposed conditional config lets you set global config that flexibly
> covers a set of repos.

How would the workflow for that look like?
My naive thought on that is:

  (1)  $EDIT .git/config_to_be_included
  (2)  $ git config add-config-inclusion .git/config_to_be_included
  (3)  $ git submodule foreach git config add-inclusion-config
.git/config_to_be_included

which sounds a bit cumbersome to me.
So I guess we'd want some parts of that as part of another command, e.g.
(3) could be part of (2).

--
I am also open and willing to document this better; but were would
we want to put documentation? Obviously we would not want to put it
alongside each potentially useful config option to be inherited to
submodules. (that would imply repeating ourselves quite a lot in
the config man page).

I guess putting it into "man gitmodules" that I was writing tentatively
would make sense.
C.f
https://public-inbox.org/git/20161227234310.13264-4-sbel...@google.com/
(or search for "background story" in your emails)

Thanks,
Stefan


Re: Typesafer git hash patch

2017-02-28 Thread Jeff King
On Mon, Feb 27, 2017 at 10:59:15PM -0800, Linus Torvalds wrote:

> I saw that somebody is actually looking at doing this "well" - by
> doing it in many smaller steps. I tried. I gave up. The "unsigned char
> *sha1" model is so ingrained that doing it incrementally just seemed
> like a lot of pain. But it might be the right approach.

That somebody is brian carlson, cc'd.

> which is pretty nasty. The good news is that my patch passes all the
> tests, and while it's big it's mostly very very mindless, and a lot of
> it looks like cleanups, and the lines are generally shorter, eg
> 
> -   const unsigned char *mb = result->item->object.oid.hash;
> -   if (!hashcmp(mb, current_bad_oid->hash)) {
> 
> turns into
> 
> +   const hash_t *mb = &result->item->object.oid;
> +   if (!hashcmp(mb, current_bad_oid)) {
> 
> but I ended up also renaming a lot of common "sha1" as "hash", which
> adds to the noise in that patch.

I think the preimage there is worse than it ought to be because it's
mid-transition. "struct object" has an object_id, but the rest of the
function hasn't been converted yet. So ultimately it should be:

  const struct object_id *mb = &result->item->object.oid;
  if (!oidcmp(mb, current_bad_oid))

It looks like you stuck your "hash_t" inside "struct object_id", which I
think is redundant. They're both trying to solve the same problem.

> NOTE! It doesn't actually _fix_ the SHA1-centricity in any way, but it
> makes it a bit more obvious where the bigger problems are. Not that
> anybody would be surprised by what they are, but as part of writing
> the patch it did kind of pinpoint most of them, and about 30 of those
> new lines are added
> 
>  /* FIXME! Hardcoded hash sizes */
>  /* FIXME! Lots of fixed-size hashes */
>  /* FIXME! Fixed 20-byte hash usage */

Yeah, a lot of brian's patches have been focused around the fixing the
related size assumptions. We've got GIT_SHA1_HEXSZ which doesn't solve
the problem, but at least makes it easy to find. And a big improvement
in the most recent series is a parse_oid() function that lets you parse
object-ids left-to-right without knowing the size up front. So things
like:

  if (len > 82 &&
  !get_sha1_hex(buf, sha1_a) &&
  get_sha1_hex(buf + 41, sha1_b))

becomes more like:

  if (parse_oid(p, oid_a, &p) && *p++ == ' ' &&
  parse_oid(p, oid_b, &p) && *p++ == '\n')

Still, if you've done more conversion, it's probably worth showing it
publicly. It might not end up used, but it may serve as a reference
later.

> And as part of the type safety, I do think I may have found a bug:
> 
> show_one_mergetag():
> 
> strbuf_addf(&verify_message, "tag %s names a non-parent %s\n",
> tag->tag, tag->tagged->oid.hash);
> 
> note how it prints out the "non-parent %s", but that's a SHA1 hash
> that hasn't been converted to hex. Hmm?

Yeah, that's definitely a bug. I'm surprised that -Wformat doesn't
complain, but I guess we'd need -Wformat-signedness (which triggers
quite a lot of warnings related to "int"). It's especially bad for "%x",
which is implicitly unsigned.

-Peff


[PATCH v8 3/6] stash: refactor stash_create

2017-02-28 Thread Thomas Gummerer
Refactor the internal stash_create function to use a -m flag for
specifying the message and -u flag to indicate whether untracked files
should be added to the stash.

This makes it easier to pass a pathspec argument to stash_create in the
next patch.

The user interface for git stash create stays the same.

Signed-off-by: Thomas Gummerer 
---
 git-stash.sh | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/git-stash.sh b/git-stash.sh
index 8365ebba2a..ef5d1b45be 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -58,8 +58,22 @@ clear_stash () {
 }
 
 create_stash () {
-   stash_msg="$1"
-   untracked="$2"
+   stash_msg=
+   untracked=
+   while test $# != 0
+   do
+   case "$1" in
+   -m|--message)
+   shift
+   stash_msg=${1?"BUG: create_stash () -m requires an 
argument"}
+   ;;
+   -u|--include-untracked)
+   shift
+   untracked=${1?"BUG: create_stash () -u requires an 
argument"}
+   ;;
+   esac
+   shift
+   done
 
git update-index -q --refresh
if no_changes
@@ -268,7 +282,7 @@ push_stash () {
git reflog exists $ref_stash ||
clear_stash || die "$(gettext "Cannot initialize stash")"
 
-   create_stash "$stash_msg" $untracked
+   create_stash -m "$stash_msg" -u "$untracked"
store_stash -m "$stash_msg" -q $w_commit ||
die "$(gettext "Cannot save the current status")"
say "$(eval_gettext "Saved working directory and index state 
\$stash_msg")"
@@ -667,7 +681,7 @@ clear)
;;
 create)
shift
-   create_stash "$*" && echo "$w_commit"
+   create_stash -m "$*" && echo "$w_commit"
;;
 store)
shift
-- 
2.12.0.428.g67fe103aa



[PATCH v8 1/6] stash: introduce push verb

2017-02-28 Thread Thomas Gummerer
Introduce a new git stash push verb in addition to git stash save.  The
push verb is used to transition from the current command line arguments
to a more conventional way, in which the message is given as an argument
to the -m option.

This allows us to have pathspecs at the end of the command line
arguments like other Git commands do, so that the user can say which
subset of paths to stash (and leave others behind).

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 Documentation/git-stash.txt |  3 +++
 git-stash.sh| 46 ++---
 t/t3903-stash.sh|  9 +
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 2e9e344cd7..d240df4af7 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -15,6 +15,8 @@ SYNOPSIS
 'git stash' branch  []
 'git stash' [save [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
 [-u|--include-untracked] [-a|--all] []]
+'git stash' push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
+[-u|--include-untracked] [-a|--all] [-m|--message ]]
 'git stash' clear
 'git stash' create []
 'git stash' store [-m|--message ] [-q|--quiet] 
@@ -46,6 +48,7 @@ OPTIONS
 ---
 
 save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
[-q|--quiet] []::
+push [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
[-q|--quiet] [-m|--message ]::
 
Save your local modifications to a new 'stash' and roll them
back to HEAD (in the working tree and in the index).
diff --git a/git-stash.sh b/git-stash.sh
index 10c284d1aa..8365ebba2a 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -9,6 +9,8 @@ USAGE="list []
or: $dashless branch  []
or: $dashless [save [--patch] [-k|--[no-]keep-index] [-q|--quiet]
   [-u|--include-untracked] [-a|--all] []]
+   or: $dashless push [--patch] [-k|--[no-]keep-index] [-q|--quiet]
+ [-u|--include-untracked] [-a|--all] [-m ]
or: $dashless clear"
 
 SUBDIRECTORY_OK=Yes
@@ -189,10 +191,11 @@ store_stash () {
return $ret
 }
 
-save_stash () {
+push_stash () {
keep_index=
patch_mode=
untracked=
+   stash_msg=
while test $# != 0
do
case "$1" in
@@ -216,6 +219,11 @@ save_stash () {
-a|--all)
untracked=all
;;
+   -m|--message)
+   shift
+   test -z ${1+x} && usage
+   stash_msg=$1
+   ;;
--help)
show_help
;;
@@ -251,8 +259,6 @@ save_stash () {
die "$(gettext "Can't use --patch and --include-untracked or 
--all at the same time")"
fi
 
-   stash_msg="$*"
-
git update-index -q --refresh
if no_changes
then
@@ -291,6 +297,36 @@ save_stash () {
fi
 }
 
+save_stash () {
+   push_options=
+   while test $# != 0
+   do
+   case "$1" in
+   --)
+   shift
+   break
+   ;;
+   -*)
+   # pass all options through to push_stash
+   push_options="$push_options $1"
+   ;;
+   *)
+   break
+   ;;
+   esac
+   shift
+   done
+
+   stash_msg="$*"
+
+   if test -z "$stash_msg"
+   then
+   push_stash $push_options
+   else
+   push_stash $push_options -m "$stash_msg"
+   fi
+}
+
 have_stash () {
git rev-parse --verify --quiet $ref_stash >/dev/null
 }
@@ -617,6 +653,10 @@ save)
shift
save_stash "$@"
;;
+push)
+   shift
+   push_stash "$@"
+   ;;
 apply)
shift
apply_stash "$@"
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 2de3e18ce6..3577115807 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -775,4 +775,13 @@ test_expect_success 'stash is not confused by partial 
renames' '
test_path_is_missing file
 '
 
+test_expect_success 'push -m shows right message' '
+   >foo &&
+   git add foo &&
+   git stash push -m "test message" &&
+   echo "stash@{0}: On master: test message" >expect &&
+   git stash list -1 >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.428.g67fe103aa



Re: Typesafer git hash patch

2017-02-28 Thread brian m. carlson
On Tue, Feb 28, 2017 at 03:26:34PM -0500, Jeff King wrote:
> Yeah, a lot of brian's patches have been focused around the fixing the
> related size assumptions. We've got GIT_SHA1_HEXSZ which doesn't solve
> the problem, but at least makes it easy to find. And a big improvement
> in the most recent series is a parse_oid() function that lets you parse
> object-ids left-to-right without knowing the size up front. So things
> like:
> 
>   if (len > 82 &&
>   !get_sha1_hex(buf, sha1_a) &&
>   get_sha1_hex(buf + 41, sha1_b))
> 
> becomes more like:
> 
>   if (parse_oid(p, oid_a, &p) && *p++ == ' ' &&
>   parse_oid(p, oid_b, &p) && *p++ == '\n')

What I could do instead of using GIT_SHA1_HEXSZ is use GIT_MAX_HEXSZ for
things that are about allocating enough memory and create a global (or
function) for things that only care about what the current hash size is.
That might be a desirable approach.  If other people agree, I can make a
patch to do that.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v8 6/6] stash: allow pathspecs in the no verb form

2017-02-28 Thread Thomas Gummerer
Now that stash_push is used in the no verb form of stash, allow
specifying the command line for this form as well.  Always use -- to
disambiguate pathspecs from other non-option arguments.

Also make git stash -p an alias for git stash push -p.  This allows
users to use git stash -p .

Signed-off-by: Thomas Gummerer 
---
 Documentation/git-stash.txt | 11 +++
 git-stash.sh|  3 +++
 t/t3903-stash.sh| 15 +++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 4d8d30f179..70191d06b6 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -54,10 +54,13 @@ push [-p|--patch] [-k|--[no-]keep-index] 
[-u|--include-untracked] [-a|--all] [-q
Save your local modifications to a new 'stash' and roll them
back to HEAD (in the working tree and in the index).
The  part is optional and gives
-   the description along with the stashed state.  For quickly making
-   a snapshot, you can omit _both_ "save" and , but giving
-   only  does not trigger this action to prevent a misspelled
-   subcommand from making an unwanted stash.
+   the description along with the stashed state.
++
+For quickly making a snapshot, you can omit "push".  In this mode,
+non-option arguments are not allowed to prevent a misspelled
+subcommand from making an unwanted stash.  The two exceptions to this
+are `stash -p` which acts as alias for `stash push -p` and pathspecs,
+which are allowed after a double hyphen `--` for disambiguation.
 +
 When pathspec is given to 'git stash push', the new stash records the
 modified states only for the files that match the pathspec.  The index
diff --git a/git-stash.sh b/git-stash.sh
index f3c2f86804..207c8126f4 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -657,12 +657,15 @@ apply_to_branch () {
}
 }
 
+test "$1" = "-p" && set "push" "$@"
+
 PARSE_CACHE='--not-parsed'
 # The default command is "push" if nothing but options are given
 seen_non_option=
 for opt
 do
case "$opt" in
+   --) break ;;
-*) ;;
*) seen_non_option=t; break ;;
esac
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index e875fe8259..89877e4b52 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -892,4 +892,19 @@ test_expect_success 'untracked files are left in place 
when -u is not given' '
test_path_is_file untracked
 '
 
+test_expect_success 'stash without verb with pathspec' '
+   >"foo bar" &&
+   >foo &&
+   >bar &&
+   git add foo* &&
+   git stash -- "foo b*" &&
+   test_path_is_missing "foo bar" &&
+   test_path_is_file foo &&
+   test_path_is_file bar &&
+   git stash pop &&
+   test_path_is_file "foo bar" &&
+   test_path_is_file foo &&
+   test_path_is_file bar
+'
+
 test_done
-- 
2.12.0.428.g67fe103aa



[PATCH v8 4/6] stash: teach 'push' (and 'create_stash') to honor pathspec

2017-02-28 Thread Thomas Gummerer
While working on a repository, it's often helpful to stash the changes
of a single or multiple files, and leave others alone.  Unfortunately
git currently offers no such option.  git stash -p can be used to work
around this, but it's often impractical when there are a lot of changes
over multiple files.

Allow 'git stash push' to take pathspec to specify which paths to stash.

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 Documentation/git-stash.txt|  9 +++-
 git-stash.sh   | 39 +++-
 t/t3903-stash.sh   | 92 ++
 t/t3905-stash-include-untracked.sh | 26 +++
 4 files changed, 154 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index d240df4af7..88369ed8b6 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -17,6 +17,7 @@ SYNOPSIS
 [-u|--include-untracked] [-a|--all] []]
 'git stash' push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
 [-u|--include-untracked] [-a|--all] [-m|--message ]]
+[--] [...]
 'git stash' clear
 'git stash' create []
 'git stash' store [-m|--message ] [-q|--quiet] 
@@ -48,7 +49,7 @@ OPTIONS
 ---
 
 save [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
[-q|--quiet] []::
-push [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
[-q|--quiet] [-m|--message ]::
+push [-p|--patch] [-k|--[no-]keep-index] [-u|--include-untracked] [-a|--all] 
[-q|--quiet] [-m|--message ] [--] [...]::
 
Save your local modifications to a new 'stash' and roll them
back to HEAD (in the working tree and in the index).
@@ -58,6 +59,12 @@ push [-p|--patch] [-k|--[no-]keep-index] 
[-u|--include-untracked] [-a|--all] [-q
only  does not trigger this action to prevent a misspelled
subcommand from making an unwanted stash.
 +
+When pathspec is given to 'git stash push', the new stash records the
+modified states only for the files that match the pathspec.  The index
+entries and working tree files are then rolled back to the state in
+HEAD only for these files, too, leaving files that do not match the
+pathspec intact.
++
 If the `--keep-index` option is used, all changes already added to the
 index are left intact.
 +
diff --git a/git-stash.sh b/git-stash.sh
index ef5d1b45be..5892abafa3 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -11,6 +11,7 @@ USAGE="list []
   [-u|--include-untracked] [-a|--all] []]
or: $dashless push [--patch] [-k|--[no-]keep-index] [-q|--quiet]
  [-u|--include-untracked] [-a|--all] [-m ]
+ [-- ...]
or: $dashless clear"
 
 SUBDIRECTORY_OK=Yes
@@ -35,15 +36,15 @@ else
 fi
 
 no_changes () {
-   git diff-index --quiet --cached HEAD --ignore-submodules -- &&
-   git diff-files --quiet --ignore-submodules &&
+   git diff-index --quiet --cached HEAD --ignore-submodules -- "$@" &&
+   git diff-files --quiet --ignore-submodules -- "$@" &&
(test -z "$untracked" || test -z "$(untracked_files)")
 }
 
 untracked_files () {
excl_opt=--exclude-standard
test "$untracked" = "all" && excl_opt=
-   git ls-files -o -z $excl_opt
+   git ls-files -o -z $excl_opt -- "$@"
 }
 
 clear_stash () {
@@ -71,12 +72,16 @@ create_stash () {
shift
untracked=${1?"BUG: create_stash () -u requires an 
argument"}
;;
+   --)
+   shift
+   break
+   ;;
esac
shift
done
 
git update-index -q --refresh
-   if no_changes
+   if no_changes "$@"
then
exit 0
fi
@@ -108,7 +113,7 @@ create_stash () {
# Untracked files are stored by themselves in a parentless 
commit, for
# ease of unpacking later.
u_commit=$(
-   untracked_files | (
+   untracked_files "$@" | (
GIT_INDEX_FILE="$TMPindex" &&
export GIT_INDEX_FILE &&
rm -f "$TMPindex" &&
@@ -131,7 +136,7 @@ create_stash () {
git read-tree --index-output="$TMPindex" -m $i_tree &&
GIT_INDEX_FILE="$TMPindex" &&
export GIT_INDEX_FILE &&
-   git diff-index --name-only -z HEAD -- 
>"$TMP-stagenames" &&
+   git diff-index --name-only -z HEAD -- "$@" 
>"$TMP-stagenames" &&
git update-index -z --add --remove --stdin 
<"$TMP-stagenames" &&
git write-tree &&
rm -f "$TMPindex"
@@ -145,7 +150,7 @@ create_stash () {
 
# find out what the user wants
GIT_INDEX_FILE="

Re: Typesafer git hash patch

2017-02-28 Thread Linus Torvalds
On Tue, Feb 28, 2017 at 11:53 AM, Junio C Hamano  wrote:
> Linus Torvalds  writes:
>>
>> Having the hashes be more encapsulated does seem to make things better
>> in many ways. What I did was to also just unify the notion of "hash_t"
>> and "struct object_id", so the two are entirely interchangeable.
>
> Sorry, but at this point in your description, you completely lost
> me.  I thought "struct object_id" was what you call "hash_t" in the
> above.

So what happened was that I started out just encapsulating

   unsigned char sha1[20];

as a

   hash_t hash;

and that made sense in a lot of situations. I always thought that code that used

struct object_id oid;

is just too ugly to live, so I'm not actually all that big of a fan of
the oid approach.

But the two approaches really are pretty much equivalent logically,
even if they don't look the same.

So I wanted to unify things: "One type to bring them all and in the
darkness bind them".

So I just basically made this:

typedef struct object_id {
unsigned char hash[GIT_HASH_SIZE];
} hash_t;

to create one single data structure that doesn't make my eyes bleed.
That "struct object_id" still exists, but I don't generally have to
look at it when doing the conversion, and any current users "just
work".

>> turns into
>>
>> +   const hash_t *mb = &result->item->object.oid;
>> +   if (!hashcmp(mb, current_bad_oid)) {
>
> Hmph.  I somehow thought the longer term directio for the above code
> would be to turn it into
>
> if (!oidcmp(&result->item->object.oid, ¤t_bad_oid))

Well, you can actually do it with my patch, since I left "oidcmp()"
alone and it's just an alias for "hashcmp()" in my tree.

Except I think "oid" is an odious name, and really confusing and not
at all descriptive.

Using a three-letter acronym when we have a four-letter actual word to
say it feels stupid and wrong to me.

So what my conversion does is basically say that the name is *hash*.
So instead of using "oidcmp", you use "hashcmp":

if (!hashcmp(&result->item->object.oid, ¤t_bad_oid))

and functions take a "hash_t *" argument rather than a "struct
object_id *" argument, and when there was any kind of confusion and
mixing of use, I converted to "hash_t".

Both oid and "unsigned char *" users got converted.

In other words, what I was aiming for was getting rid - entirely - of
the "two different types", and I disliked both "oid" and "unsigned
char []", so neither replaces the other.

> Having said all that, I do not offhand see a huge benefit of the
> current layout that has one layer between the hash (i.e. oid.hash)
> and the object name (i.e. oid) over "there is no need for oid.hash;
> oid is just a hash", which you seem to be doing.

Yes exactly.

>> And as part of the type safety, I do think I may have found a bug:
>>
>> show_one_mergetag():
>>
>> strbuf_addf(&verify_message, "tag %s names a non-parent 
>> %s\n",
>> tag->tag, tag->tagged->oid.hash);
>>
>> note how it prints out the "non-parent %s", but that's a SHA1 hash
>> that hasn't been converted to hex. Hmm?
>
> Yup.  That needs fixing, obviously.

I suspect nobody has ever hit that case - I tried to google for "names
a non-parent" and "tag" and "git" and the only thing that I found was
hits to git source.

So I was actually fairly impressed that the only thing I found was one
totally insignificant bug in a printout.

I did find a lot of cases where we really do mix a buffer of memory
("unsigned char *") with the hash. Not unsurprisingly, most of them
were in pack-file handling and in the tree parsing.

And some thing do the reverse, and really walk a hash name byte by
byte. Things like "find_pack_entry_one()" really does walk the bytes
of the hash.

With the conversion in place, those painful things are a bit more
obvious. So there's a couple of places where I just did a hard
conversion from a "unsigned char *" to a hash_t, but they are now
obvious casts and there's only 17 of them:

  [torvalds@i7 git]$ git grep '(hash_t \*)'
  builtin/index-pack.c:   hashcpy(ref_hash, (hash_t *) fill(20));
  builtin/pack-redundant.c:   hash_t *h1 = (hash_t
*)(p1_base + p1_off);
  builtin/pack-redundant.c:   hash_t *h2 = (hash_t
*)(p2_base + p2_off);
  builtin/pack-redundant.c:   hash_t *h1 = (hash_t
*)(p1_base + p1_off);
  builtin/pack-redundant.c:   hash_t *h2 = (hash_t
*)(p2_base + p2_off);
  builtin/pack-redundant.c:   hash_t *h = (hash_t *)(base + off);
  dir.c:  hashcpy(&ud->exclude_sha1, (hash_t *)rd->data);
  fast-import.c:  hashcpy(&e->versions[0].hash, (hash_t *)c);
  fast-import.c:  hashcpy(&e->versions[1].hash, (hash_t *)c);
  match-trees.c:  hashcpy((hash_t *)rewrite_here, rewrite_with);
  sha1-lookup.c:  lo, mi, hi, sha1_to_hex((hash_t *)key));
  sha1_file.c:return (hash_t *)(base + idx * GIT_SHA1_RAWSZ);
  

[PATCH v8 0/6] stash: support pathspec argument

2017-02-28 Thread Thomas Gummerer
Thanks Junio for the review and the squashable diff with the fix for
my errors.

I changed it a tiny bit, to use git stash push instead of git stash,
so the complete diff could go into 4/6, where I think I think the test
fits best.

Interdiff below:

diff --git a/git-stash.sh b/git-stash.sh
index 28d0624c75..207c8126f4 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -300,6 +300,8 @@ push_stash () {
if test $# != 0
then
git reset ${GIT_QUIET:+-q} -- "$@"
+   git ls-files -z --modified -- "$@" |
+   git checkout-index -z --force --stdin
git checkout ${GIT_QUIET:+-q} HEAD -- $(git ls-files -z 
--modified "$@")
git clean --force ${GIT_QUIET:+-q} -d -- "$@"
else
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index f7733b4dd4..89877e4b52 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -842,6 +842,22 @@ test_expect_success 'stash with file including $IFS 
character' '
test_path_is_file bar
 '
 
+test_expect_success 'stash with pathspec matching multiple paths' '
+   echo original >file &&
+   echo original >other-file &&
+   git commit -m "two" file other-file &&
+   echo modified >file &&
+   echo modified >other-file &&
+   git stash push -- "*file" &&
+   echo original >expect &&
+   test_cmp expect file &&
+   test_cmp expect other-file &&
+   git stash pop &&
+   echo modified >expect &&
+   test_cmp expect file &&
+   test_cmp expect other-file
+'
+
 test_expect_success 'stash push -p with pathspec shows no changes only once' '
>foo &&
git add foo &&

Thomas Gummerer (6):
  stash: introduce push verb
  stash: add test for the create command line arguments
  stash: refactor stash_create
  stash: teach 'push' (and 'create_stash') to honor pathspec
  stash: use stash_push for no verb form
  stash: allow pathspecs in the no verb form

 Documentation/git-stash.txt|  25 +--
 git-stash.sh   | 116 +--
 t/t3903-stash.sh   | 138 -
 t/t3905-stash-include-untracked.sh |  26 +++
 4 files changed, 275 insertions(+), 30 deletions(-)

-- 
2.12.0.428.g67fe103aa



[PATCH v8 2/6] stash: add test for the create command line arguments

2017-02-28 Thread Thomas Gummerer
Currently there is no test showing the expected behaviour of git stash
create's command line arguments.  Add a test for that to show the
current expected behaviour and to make sure future refactorings don't
break those expectations.

Signed-off-by: Thomas Gummerer 
---
 t/t3903-stash.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 3577115807..ffe3549ea5 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -784,4 +784,22 @@ test_expect_success 'push -m shows right message' '
test_cmp expect actual
 '
 
+test_expect_success 'create stores correct message' '
+   >foo &&
+   git add foo &&
+   STASH_ID=$(git stash create "create test message") &&
+   echo "On master: create test message" >expect &&
+   git show --pretty=%s -s ${STASH_ID} >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'create with multiple arguments for the message' '
+   >foo &&
+   git add foo &&
+   STASH_ID=$(git stash create test untracked) &&
+   echo "On master: test untracked" >expect &&
+   git show --pretty=%s -s ${STASH_ID} >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.428.g67fe103aa



[PATCH v8 5/6] stash: use stash_push for no verb form

2017-02-28 Thread Thomas Gummerer
Now that we have stash_push, which accepts pathspec arguments, use
it instead of stash_save in git stash without any additional verbs.

Previously we allowed git stash -- -message, which is no longer allowed
after this patch.  Messages starting with a hyphen was allowed since
3c2eb80f, ("stash: simplify defaulting to "save" and reject unknown
options").  However it was never the intent to allow that, but rather it
was allowed accidentally.

Signed-off-by: Thomas Gummerer 
---
 Documentation/git-stash.txt |  8 
 git-stash.sh| 16 
 t/t3903-stash.sh|  4 +---
 3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 88369ed8b6..4d8d30f179 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -13,11 +13,11 @@ SYNOPSIS
 'git stash' drop [-q|--quiet] []
 'git stash' ( pop | apply ) [--index] [-q|--quiet] []
 'git stash' branch  []
-'git stash' [save [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
-[-u|--include-untracked] [-a|--all] []]
-'git stash' push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
+'git stash' save [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
+[-u|--include-untracked] [-a|--all] []
+'git stash' [push [-p|--patch] [-k|--[no-]keep-index] [-q|--quiet]
 [-u|--include-untracked] [-a|--all] [-m|--message ]]
-[--] [...]
+[--] [...]]
 'git stash' clear
 'git stash' create []
 'git stash' store [-m|--message ] [-q|--quiet] 
diff --git a/git-stash.sh b/git-stash.sh
index 5892abafa3..f3c2f86804 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -7,11 +7,11 @@ USAGE="list []
or: $dashless drop [-q|--quiet] []
or: $dashless ( pop | apply ) [--index] [-q|--quiet] []
or: $dashless branch  []
-   or: $dashless [save [--patch] [-k|--[no-]keep-index] [-q|--quiet]
-  [-u|--include-untracked] [-a|--all] []]
-   or: $dashless push [--patch] [-k|--[no-]keep-index] [-q|--quiet]
- [-u|--include-untracked] [-a|--all] [-m ]
- [-- ...]
+   or: $dashless save [--patch] [-k|--[no-]keep-index] [-q|--quiet]
+ [-u|--include-untracked] [-a|--all] []
+   or: $dashless [push [--patch] [-k|--[no-]keep-index] [-q|--quiet]
+  [-u|--include-untracked] [-a|--all] [-m ]
+  [-- ...]]
or: $dashless clear"
 
 SUBDIRECTORY_OK=Yes
@@ -658,7 +658,7 @@ apply_to_branch () {
 }
 
 PARSE_CACHE='--not-parsed'
-# The default command is "save" if nothing but options are given
+# The default command is "push" if nothing but options are given
 seen_non_option=
 for opt
 do
@@ -668,7 +668,7 @@ do
esac
 done
 
-test -n "$seen_non_option" || set "save" "$@"
+test -n "$seen_non_option" || set "push" "$@"
 
 # Main command set
 case "$1" in
@@ -719,7 +719,7 @@ branch)
 *)
case $# in
0)
-   save_stash &&
+   push_stash &&
say "$(gettext "(To restore them type \"git stash apply\")")"
;;
*)
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index f934993263..e875fe8259 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -274,9 +274,7 @@ test_expect_success 'stash --invalid-option' '
git add file2 &&
test_must_fail git stash --invalid-option &&
test_must_fail git stash save --invalid-option &&
-   test bar5,bar6 = $(cat file),$(cat file2) &&
-   git stash -- -message-starting-with-dash &&
-   test bar,bar2 = $(cat file),$(cat file2)
+   test bar5,bar6 = $(cat file),$(cat file2)
 '
 
 test_expect_success 'stash an added file' '
-- 
2.12.0.428.g67fe103aa



Re: [PATCH] Travis: also test on 32-bit Linux

2017-02-28 Thread Johannes Schindelin
Hi,

On Tue, 28 Feb 2017, Johannes Schindelin wrote:

> When Git v2.9.1 was released, it had a bug that showed only on Windows
> and on 32-bit systems: our assumption that `unsigned long` can hold
> 64-bit values turned out to be wrong.
> 
> This could have been caught earlier if we had a Continuous Testing set
> up that includes a build and test run on 32-bit Linux.
> 
> Let's do this (and take care of the Windows build later). This patch
> asks Travis CI to install a Docker image with 32-bit libraries and then
> goes on to build and test Git using this 32-bit setup.

For the record, I first tested this with the LONG_IS_32BIT flag forced to
`true` so that the date tests must fail. They did fail as expected (search
for "not ok" in this output):

https://travis-ci.org/git/git/jobs/206199002

(I actually cannot see the log right now, Travis seems to be under heavy
load...)

A much cleaned up version that does not force that LONG_IS_32BIT produced
this log:

https://travis-ci.org/git/git/jobs/206228708

(Same here, my browser fails to load the log, probably because Travis
experiences quite high a load...)

Please note that this approach is not without problems. It would appear
that Travis currently has serious problems to even *reach* the Docker Hub,
and therefore cannot download the Docker image:

https://travis-ci.org/git/git/jobs/206292947#L6

Ciao,
Johannes


Re: [PATCH v5 00/24] Remove submodule from files-backend.c

2017-02-28 Thread Junio C Hamano
Michael Haggerty  writes:

> On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
>> v5 goes a bit longer than v4, basically:
>
> I've read through patch 14/24 so far, and they all look good except for
> the mostly superficial comments that I have sent so far. I like the way
> this is heading!
>
> I'll try to continue tomorrow.
>
> Michael

Thanks.


Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-28 Thread Junio C Hamano
Andreas Heiduk  writes:

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index e6215c3..7c28e73 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -192,10 +192,9 @@ ifndef::git-log[]
>   given, do not munge pathnames and use NULs as output field terminators.
>  endif::git-log[]
>  +
> -Without this option, each pathname output will have TAB, LF, double quotes,
> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
> -respectively, and the pathname will be enclosed in double quotes if
> -any of those replacements occurred.
> +Without this option, pathnames with "unusual" characters are munged as
> +explained for the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).

Seeing that many other instances call this "quoted", we may want to
be consistent.  I can see "munge" in the pre-context, but that one
can stay as is. Under -z, no modification/munging happens.  With -z,
a specific kind of modification (called "quote" described in the
documentation for core.quotepath variable) happens.  The same
comment applies to the change to Documentation/git-apply.txt

Otherwise the patch looks good.

Thanks.


Re: Typesafer git hash patch

2017-02-28 Thread brian m. carlson
On Tue, Feb 28, 2017 at 12:25:20PM -0800, Linus Torvalds wrote:
> On Tue, Feb 28, 2017 at 11:53 AM, Junio C Hamano  wrote:
> > Sorry, but at this point in your description, you completely lost
> > me.  I thought "struct object_id" was what you call "hash_t" in the
> > above.
> 
> So what happened was that I started out just encapsulating
> 
>unsigned char sha1[20];
> 
> as a
> 
>hash_t hash;
> 
> and that made sense in a lot of situations. I always thought that code that 
> used
> 
> struct object_id oid;
> 
> is just too ugly to live, so I'm not actually all that big of a fan of
> the oid approach.

There was some discussion on the list about the best name to use, and
object_id seemed like the most popular decision.  I don't care if we add
a typedef for it and prefer that typedef, but the existing code avoided
typedefs in favor of explicit struct definitions.

I'm certainly not opposed to having less to type, because “object_id” is
awkward to type, but I've generally tried to defer to existing uses in
the codebase and what list regulars are comfortable with.

The only problem with using hash_t is that it's then not obvious as we
transition (assuming we don't take an omnibus patch) what is converted
and what isn't.

> But the two approaches really are pretty much equivalent logically,
> even if they don't look the same.

Yeah, I think they are.

> So I wanted to unify things: "One type to bring them all and in the
> darkness bind them".
> 
> So I just basically made this:
> 
> typedef struct object_id {
> unsigned char hash[GIT_HASH_SIZE];
> } hash_t;
> 
> to create one single data structure that doesn't make my eyes bleed.
> That "struct object_id" still exists, but I don't generally have to
> look at it when doing the conversion, and any current users "just
> work".

There is nothing that prevents us from doing a nice global
search-and-replace in the future if we think the status quo is bad.
That's something that could be automated with Coccinelle.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] http: attempt updating base URL only if no error

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 10:48:52AM -0800, Jonathan Tan wrote:

> > Running your included test, we get:
> > 
> >   fatal: unable to access 'http://127.0.0.1:5550/redir-to/502/': The
> >   requested URL returned error: 502
> > 
> > but the error really happened in the intermediate step. I wonder if we
> > should show the effective_url in that case, as it's more likely to
> > pinpoint the problem. OTOH, we do not mention the intermediate redirect
> > at all, so they might be confused about where that URL came from. If you
> > really want to debug HTTP confusion, you should use GIT_TRACE_CURL.
> 
> Yeah, if we mention the effective_url, I think that there would need to be a
> lot more explaining to be done (e.g. why does my URL have
> "info/refs?service=git-upload-pack" tacked on at the end). It might be
> better to just recommend GIT_TRACE_CURL.

Indeed. Your comment made me realize that my suggestion was the exact
opposite of the earlier d5ccbe4df (remote-curl: consistently report repo
url for http errors, 2013-04-05). :)

Given that we don't see a lot of questions on the list about this,
either it doesn't come up much, or they are capable of finding
GIT_TRACE_CURL or GIT_CURL_VERBOSE on their own. So I think we can leave
the message as-is.

-Peff


Re: [ANNOUNCE] Git for Windows 2.12.0

2017-02-28 Thread Johannes Schindelin
Hi,

On Tue, 28 Feb 2017, w...@cornell.edu wrote:

> I have attempted to download the new version but on 2/28/2016 starting
> at 12:00PM EST I have been unable to download GIT 2.12.  The error
> message that is returned is:
> 
> Make sure you’ve got the right web address: 
> https://github-cloud.s3.amazonaws.com

It looks as if AWS has serious problems right now. Try downloading from
here:

https://instant.io/#152a79b2aad5137413e1ca2edd43fc08f736d896

And please leave the browser open for a while after it downloaded.

Thanks,
Johannes

Re: [PATCH 5/5] ls-files: fix bug when recuring with relative pathspec

2017-02-28 Thread Junio C Hamano
Brandon Williams  writes:

> Fix a bug which causes a child process for a submodule to error out when a
> relative pathspec with a ".." is provided in the superproject.
>
> While at it, correctly construct the super-prefix to be used in a submodule
> when not at the root of the repository.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/ls-files.c | 8 ++--
>  t/t3007-ls-files-recurse-submodules.sh | 2 +-
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index 159229081..89533ab8e 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -194,12 +194,15 @@ static void compile_submodule_options(const struct 
> dir_struct *dir, int show_tag
>  static void show_gitlink(const struct cache_entry *ce)
>  {
>   struct child_process cp = CHILD_PROCESS_INIT;
> + struct strbuf name = STRBUF_INIT;
>   int status;
>   int i;
>  
> + quote_path_relative(ce->name, prefix, &name);
>   argv_array_pushf(&cp.args, "--super-prefix=%s%s/",

Same comment as 3/5.  quote_path is to produce c-quote and is not
even meant for shell script quoting.  run_command() interface would
do its own shell quoting when needed, so  I think you just want the
exact string you want to pass here.



Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-28 Thread Andreas Heiduk
Am 28.02.2017 um 21:51 schrieb Junio C Hamano:
> Andreas Heiduk  writes:
> 
>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index e6215c3..7c28e73 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -192,10 +192,9 @@ ifndef::git-log[]
>>  given, do not munge pathnames and use NULs as output field terminators.
>>  endif::git-log[]
>>  +
>> -Without this option, each pathname output will have TAB, LF, double quotes,
>> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
>> -respectively, and the pathname will be enclosed in double quotes if
>> -any of those replacements occurred.
>> +Without this option, pathnames with "unusual" characters are munged as
>> +explained for the configuration variable `core.quotePath` (see
>> +linkgit:git-config[1]).
> 
> Seeing that many other instances call this "quoted", we may want to
> be consistent.  I can see "munge" in the pre-context, but that one
> can stay as is. Under -z, no modification/munging happens.  With -z,
> a specific kind of modification (called "quote" described in the
> documentation for core.quotepath variable) happens.  The same
> comment applies to the change to Documentation/git-apply.txt
> 
> Otherwise the patch looks good.
> 
> Thanks.
>

I'll fix the "munged" and, unless there are objections, I will also
replace the remaining ones in the vicinity. These are the last
occurrences of "munged".

You are OK with the references to another man page? My idea was to
establish a well-known term.



Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-28 Thread Junio C Hamano
Andreas Heiduk  writes:

> I'll fix the "munged" and, unless there are objections, I will also
> replace the remaining ones in the vicinity. These are the last
> occurrences of "munged".

I'd rather see the "we do not munge" to stay the same.  "we do not
quote" still allows us to do modifications that are different from
quoting.

> You are OK with the references to another man page? My idea was to
> establish a well-known term.

The "well-known term" cannot just be "quote", because it is too
generic.  "git rev-parse --sq-quote" does a different kind of
quoting from the quoting done here for paths with unusual
characters, for example.

We certainly *could* (1) add to glossary-content.txt the definition
of "c-quote" and describe it there, (2) change the "see the quoting
explained for core.quotePath" to "unless -z is given, paths are
c-quoted", and (3) change the core.quotePath description to refer to
"c-quote" in the glossary.

But I am not sure it that makes the resulting document easier to use
by the end users.  I personally find the result of applying the
patch you posted easier.





Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Johannes Schindelin
Hi Junio,

On Tue, 28 Feb 2017, Junio C Hamano wrote:

> René Scharfe  writes:
> 
> > Am 28.02.2017 um 15:28 schrieb Jeff King:
> >
> >> It looks from the discussion like the sanest path forward is our own
> >> signed-64bit timestamp_t. That's unfortunate compared to using the
> >> standard time_t, but hopefully it would reduce the number of knobs
> >> (like TIME_T_IS_INT64) in the long run.
> >
> > Glibc will get a way to enable 64-bit time_t on 32-bit platforms
> > eventually (https://sourceware.org/glibc/wiki/Y2038ProofnessDesign).
> > Can platforms that won't provide a 64-bit time_t by 2038 be actually
> > used at that point?  How would we get time information on them?  How
> > would a custom timestamp_t help us?
> 
> That's a sensible "wait, let's step back a bit".  I take it that you are
> saying "time_t is just fine", and I am inclined to agree.
> 
> Right now, they may be able to have future timestamps ranging to
> year 2100 and switching to time_t would limit their ability to
> express future time to 2038 but they would be able to express
> timestamp in the past to cover most of 20th century.  Given that
> these 32-bit time_t software platforms will die off before year 2038
> (either by underlying hardware getting obsolete, or software updated
> to handle 64-bit time_t), the (temporary) loss of 2038-2100 range
> would not be too big a deal to warrant additional complexity.

You seem to assume that time_t is required to be signed. But from my
understanding that is only guaranteed by POSIX, not by ISO C.

We may very well buy ourselves a ton of trouble if we decide to switch to
`time_t` rather than to `int64_t`.

Ciao,
Johannes

Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-28 Thread Andreas Heiduk
Am 24.02.2017 um 22:54 schrieb Jakub Narębski:
> W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
>> Linking the description for pathname quoting to the configuration
>> variable "core.quotePath" removes inconstistent and incomplete
>> sections while also giving two hints how to deal with it: Either with
>> "-c core.quotePath=false" or with "-z".
> 
> This patch I am not sure about.  On one hand it improves consistency
> (and makes information more complete), on the other hand it removes
> information at hand and instead refers to other manpage.

Indeed this is a trade-off. My intention was to establish some kind of
well-known term by citing "core.quotePath" everywhere. So after a while
"quoting" and "core.quotePath" should be known by everyone interested.

> Perhaps a better solution would be to craft a short description that
> is both sufficiently complete, and refers to "core.quotePath" for
> more details, and then transclude it with "include::quotepath.txt[]".

On the other hand this makes every section quite long. Personally, I
find long and repetitive documentation harder to comprehend.

>>
>> Signed-off-by: Andreas Heiduk 
>> ---
>>  Documentation/diff-format.txt |  7 ---
>>  Documentation/diff-generate-patch.txt |  7 +++
>>  Documentation/diff-options.txt|  7 +++
>>  Documentation/git-apply.txt   |  7 +++
>>  Documentation/git-commit.txt  |  9 ++---
>>  Documentation/git-ls-files.txt| 10 ++
>>  Documentation/git-ls-tree.txt | 10 +++---
>>  Documentation/git-status.txt  |  7 +++
>>  8 files changed, 35 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
>> index cf52626..706916c 100644
>> --- a/Documentation/diff-format.txt
>> +++ b/Documentation/diff-format.txt
>> @@ -78,9 +78,10 @@ Example:
>>  :100644 100644 5be4a4.. 00.. M file.c
>>  
>>  
>> -When `-z` option is not used, TAB, LF, and backslash characters
>> -in pathnames are represented as `\t`, `\n`, and `\\`,
>> -respectively.
>> +Without the `-z` option, pathnames with "unusual" characters are
>> +quoted as explained for the configuration variable `core.quotePath`
>> +(see linkgit:git-config[1]).  Using `-z` the filename is output
>> +verbatim and the line is terminated by a NUL byte.
>>  
>>  diff format for merges
>>  --
>> diff --git a/Documentation/diff-generate-patch.txt 
>> b/Documentation/diff-generate-patch.txt
>> index d2a7ff5..231105c 100644
>> --- a/Documentation/diff-generate-patch.txt
>> +++ b/Documentation/diff-generate-patch.txt
>> @@ -53,10 +53,9 @@ The index line includes the SHA-1 checksum before and 
>> after the change.
>>  The  is included if the file mode does not change; otherwise,
>>  separate lines indicate the old and the new mode.
>>  
>> -3.  TAB, LF, double quote and backslash characters in pathnames
>> -are represented as `\t`, `\n`, `\"` and `\\`, respectively.
>> -If there is need for such substitution then the whole
>> -pathname is put in double quotes.
>> +3.  Pathnames with "unusual" characters are quoted as explained for
>> +the configuration variable `core.quotePath` (see
>> +linkgit:git-config[1]).
>>  
>>  4.  All the `file1` files in the output refer to files before the
>>  commit, and all the `file2` files refer to files after the commit.
>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index e6215c3..7c28e73 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -192,10 +192,9 @@ ifndef::git-log[]
>>  given, do not munge pathnames and use NULs as output field terminators.
>>  endif::git-log[]
>>  +
>> -Without this option, each pathname output will have TAB, LF, double quotes,
>> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
>> -respectively, and the pathname will be enclosed in double quotes if
>> -any of those replacements occurred.
>> +Without this option, pathnames with "unusual" characters are munged as
>> +explained for the configuration variable `core.quotePath` (see
>> +linkgit:git-config[1]).
>>  
>>  --name-only::
>>  Show only names of changed files.
>> diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
>> index 8ddb207..a7a001b 100644
>> --- a/Documentation/git-apply.txt
>> +++ b/Documentation/git-apply.txt
>> @@ -108,10 +108,9 @@ the information is read from the current index instead.
>>  When `--numstat` has been given, do not munge pathnames,
>>  but use a NUL-terminated machine-readable format.
>>  +
>> -Without this option, each pathname output will have TAB, LF, double quotes,
>> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
>> -respectively, and the pathname will be enclosed in double quotes if
>> -any of those replacements occurred.
>> +Without this option, pathnames with "unusua

Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 09:54:58PM +0100, Johannes Schindelin wrote:

> > Right now, they may be able to have future timestamps ranging to
> > year 2100 and switching to time_t would limit their ability to
> > express future time to 2038 but they would be able to express
> > timestamp in the past to cover most of 20th century.  Given that
> > these 32-bit time_t software platforms will die off before year 2038
> > (either by underlying hardware getting obsolete, or software updated
> > to handle 64-bit time_t), the (temporary) loss of 2038-2100 range
> > would not be too big a deal to warrant additional complexity.
> 
> You seem to assume that time_t is required to be signed. But from my
> understanding that is only guaranteed by POSIX, not by ISO C.

I wonder how common that is in practice, and whether it is worth
treating it as a quality-of-implementation issue. IOW, to say "your
platform time_t doesn't handle negative times, so you get Jan 1 1970 for
any dates before then. Complain to your platform vendor".

I'm not sure how much complexity it would add to the code.  Either way,
when we parse an ascii-decimal timestamp from an object, we need to do
bounds checking. Whether that bound is at "0" or "LONG_MIN", I don't
think that it changes much.

Meanwhile, if we were to have a negative timestamp_t but the system
time_t is unsigned, we have to do a bounds-check any time we use a
system function like gmtime(), or risk funny wrap-around bugs.

-Peff


Re: 'git submodules update' ignores credential.helper config of the parent repository

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 12:21:57PM -0800, Stefan Beller wrote:

> > I'm still open to the idea that we simply improve the documentation to
> > make it clear that per-repo config really is per-repo, and is not shared
> > between super-projects and submodules. And then something like Duy's
> > proposed conditional config lets you set global config that flexibly
> > covers a set of repos.
> 
> How would the workflow for that look like?
> My naive thought on that is:
> 
>   (1)  $EDIT .git/config_to_be_included
>   (2)  $ git config add-config-inclusion .git/config_to_be_included
>   (3)  $ git submodule foreach git config add-inclusion-config
> .git/config_to_be_included
> 
> which sounds a bit cumbersome to me.
> So I guess we'd want some parts of that as part of another command, e.g.
> (3) could be part of (2).

I think it would be more like:

  (1) $EDIT ~/.gitconfig-super
  (2) git config --global \
includeIf.gitdir:/path/to/super.path .gitconfig-super

I know that is probably a bit more cumbersome to figure out than
treating the super/sub relationship in a special way. But I suspect for
a lot of cases that it actually ends up even better, because the
situation is more like:

  (1) $EDIT ~/.gitconfig-work
  (2) git config --global includeIf.gitdir:~/work.path .gitconfig-work

and then it covers all of your projects in ~/work, whether they are
super-projects, submodules, or regular repos.

> I am also open and willing to document this better; but were would
> we want to put documentation? Obviously we would not want to put it
> alongside each potentially useful config option to be inherited to
> submodules. (that would imply repeating ourselves quite a lot in
> the config man page).
> 
> I guess putting it into "man gitmodules" that I was writing tentatively
> would make sense.

Yeah, I think it is worth mentioning in "gitmodules", and probably in
git-config where we define per-repo config.

It may also be worth calling it out especially for url.insteadOf, just
because it is not clear there when the URL rewriting happens (it's not
insane to think that it happens in the super-project, that just doesn't
happen to be how it's implemented).

-Peff


Re: [PATCH 3/5] grep: fix bug when recuring with relative pathspec

2017-02-28 Thread Junio C Hamano
Brandon Williams  writes:

>   /* Add super prefix */
> + quote_path_relative(name, opt->prefix, &buf);

Hmph, do you want a quoted version here, not just relative_path()?

Perhaps add a test with an "unusual" byte (e.g. a double-quote) in
the path?

>   argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
>super_prefix ? super_prefix : "",
> -  name);
> +  buf.buf);
> + strbuf_release(&buf);
>   argv_array_push(&cp.args, "grep");
>  
>   /*
> @@ -1199,7 +1202,8 @@ int cmd_grep(int argc, const char **argv, const char 
> *prefix)
>  
>   parse_pathspec(&pathspec, 0,
>  PATHSPEC_PREFER_CWD |
> -(opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
> +(opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0) |
> +(super_prefix ? PATHSPEC_FROMROOT : 0),
>  prefix, argv + i);
>   pathspec.max_depth = opt.max_depth;
>   pathspec.recursive = 1;
> diff --git a/t/t7814-grep-recurse-submodules.sh 
> b/t/t7814-grep-recurse-submodules.sh
> index 418ba68fe..e0932b2b7 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -227,7 +227,7 @@ test_expect_success 'grep history with moved submoules' '
>   test_cmp expect actual
>  '
>  
> -test_expect_failure 'grep using relative path' '
> +test_expect_success 'grep using relative path' '
>   test_when_finished "rm -rf parent sub" &&
>   git init sub &&
>   echo "foobar" >sub/file &&


Re: [PATCH] strbuf: add strbuf_add_real_path()

2017-02-28 Thread Brandon Williams
On 02/27, René Scharfe wrote:
> Am 27.02.2017 um 19:22 schrieb Brandon Williams:
> >On 02/25, René Scharfe wrote:
> >>+void strbuf_add_real_path(struct strbuf *sb, const char *path)
> >>+{
> >>+   if (sb->len) {
> >>+   struct strbuf resolved = STRBUF_INIT;
> >>+   strbuf_realpath(&resolved, path, 1);
> >>+   strbuf_addbuf(sb, &resolved);
> >>+   strbuf_release(&resolved);
> >>+   } else
> >>+   strbuf_realpath(sb, path, 1);
> >
> >I know its not required but I would have braces on the 'else' branch
> >since they were needed on the 'if' branch.  But that's up to you and
> >your style :)
> 
> Personally I'd actually prefer them as well, but the project's style
> has traditionally been to avoid braces on such trailing single-line
> branches to save lines.  The CodingGuidelines for this topic have
> been clarified recently, though, and seem to require them now.
> Interesting.
> 
> René

Having the project's guidelines align with your own preference makes
things a bit easier!

-- 
Brandon Williams


Re: Typesafer git hash patch

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 08:33:13PM +, brian m. carlson wrote:

> On Tue, Feb 28, 2017 at 03:26:34PM -0500, Jeff King wrote:
> > Yeah, a lot of brian's patches have been focused around the fixing the
> > related size assumptions. We've got GIT_SHA1_HEXSZ which doesn't solve
> > the problem, but at least makes it easy to find. And a big improvement
> > in the most recent series is a parse_oid() function that lets you parse
> > object-ids left-to-right without knowing the size up front. So things
> > like:
> > 
> >   if (len > 82 &&
> >   !get_sha1_hex(buf, sha1_a) &&
> >   get_sha1_hex(buf + 41, sha1_b))
> > 
> > becomes more like:
> > 
> >   if (parse_oid(p, oid_a, &p) && *p++ == ' ' &&
> >   parse_oid(p, oid_b, &p) && *p++ == '\n')
> 
> What I could do instead of using GIT_SHA1_HEXSZ is use GIT_MAX_HEXSZ for
> things that are about allocating enough memory and create a global (or
> function) for things that only care about what the current hash size is.
> That might be a desirable approach.  If other people agree, I can make a
> patch to do that.

I was going to say "don't worry about it, and focus on converting to
constants at all for now". But I guess while you are doing that, it does
not hurt to split the MAX_HEXSZ cases out. It will save work in sorting
them later.

-Peff


Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

2017-02-28 Thread Junio C Hamano
Jonathan Tan  writes:

> It could be argued that in the future, Git might need to distinguish
> tree_objects from blob_objects - in particular, a user might want
> rev-list to print the trees but not the blobs. 

That was exactly why these bits were originally made to "appear
independent but in practice nobody sets only one and leaves others
off".  

And it didn't happen in the past 10 years, which tells us that we
should take this patch.

Thanks.



Re: [PATCH 2/3] revision: exclude trees/blobs given commit

2017-02-28 Thread Junio C Hamano
Jonathan Tan  writes:

> When the --objects argument is given to rev-list, an argument of the
> form "^$tree" can be given to exclude all blobs and trees reachable from
> that tree, but an argument of the form "^$commit" only excludes that
> commit, not any blob or tree reachable from it. Make "^$commit" behave
> consistent to "^$tree".
>
> Also, formalize this behavior in unit tests. (Some of the added tests
> would already pass even before this commit, but are included
> nevertheless for completeness.)
>
> Signed-off-by: Jonathan Tan 
> ---
>  revision.c   |  2 ++
>  t/t6000-rev-list-misc.sh | 88 
> 
>  2 files changed, 90 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index 5e49d9e0e..e6a62da98 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -254,6 +254,8 @@ static struct commit *handle_commit(struct rev_info *revs,
>   die("unable to parse commit %s", name);
>   if (flags & UNINTERESTING) {
>   mark_parents_uninteresting(commit);
> + if (revs->tree_and_blob_objects)
> + mark_tree_uninteresting(commit->tree);

I fear that this may end up to be quite expensive.  Can we have a
perf test?


Re: Transition plan for git to move to a new hash function

2017-02-28 Thread brian m. carlson
On Mon, Feb 27, 2017 at 01:00:01PM +, Ian Jackson wrote:
> I said I was working on a transition plan.  Here it is.  This is
> obviously a draft for review, and I have no official status in the git
> project.  But I have extensive experience of protocol compatibility
> engineering, and I hope this will be helpful.
> 
> Ian.
> 
> 
> Subject: Transition plan for git to move to a new hash function
> 
> 
> BASIC PRINCIPLES
> 
> 
> We run multiple hashes in parallel.  Each object is named by exactly
> one hash.  We define that objects with identical content, but named by
> different hash functions, are different objects.

I think this is fine.

> Objects of one hash may refer to objects named by a different hash
> function to their own.  Preference rules arrange that normally, new
> hash objects refer to other new hash objects.

The existing codebase isn't really intended with that in mind.

It's not that I am arguing against this because I think it's a bad idea,
I'm arguing against it because as a contributor, I'm doubtful that this
is easily achievable given the state of the codebase.

> The intention is that for most projects, the existing SHA-1 based
> history will be retained and a new history built on top of it.
> (Rewriting is also possible but means a per-project hard switch.)

I like Peff's suggested approach in which we essentially rewrite history
under the hood, but have a lookup table which looks up the old hash
based on the new hash.  That allows us to refer to old objects, but not
have to share serialized data that mentions both hashes.

Obviously only the SHA-1 versions of old tags and commits will be able
to be validated, but that shouldn't be an issue.  We can hook that code
into a conversion routine that can handle on-the-fly object conversion.

We also can implement (optionally disabled) fallback functionality to
look up old SHA-1 hash names based on the new hash.

> We extend the textual object name syntax to explicitly name the hash
> used.  Every program that invokes git or speaks git protocols will
> need to understand the extended object name syntax.
> 
> Packfiles need to be extended to be able to contain objects named by
> new hash functions.  Blob objects with identical contents but named by
> different hash functions would ideally share storage.
> 
> Safety catches preferent accidental incorporation into a project of
> incompatibly-new objects, or additional deprecatedly-old objects.
> This allows for incremental deployment.

We have a compatibility mechanism already in place: if the
repositoryFormatVersion option is set to 1, but an unknown extension
flag is set, Git will bail out.

For network protocols, we have the server offer a hash=foo extension,
and make the client echo it back, and either bail or convert on the fly.
This makes it fast for new clients, and slow for old clients, which
encourages migration.

We could also store old-style packs for easy fetch by clients.

> TEXTUAL SYNTAX
> ==
> 
> The object name textual syntax is extended.  The new syntax may be
> used in all textual git objects and protocols (commits, tags, command
> lines, etc.).
> 
> We declare that the object name syntax is henceforth
>   [A-Z]+[0-9a-z]+ | [0-9a-f]+
> and that names [A-Z].* are deprecated as ref name components.

I'd simply say that we have data always be in the new format if it's
available, and tag the old SHA-1 versions instead.  Otherwise, as Peff
pointed out, we're going to be stuck typing a bunch of identical stuff
every time.  Again, this encourages migration.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


RE: SHA1 collisions found

2017-02-28 Thread Dan Shumow
[Responses inline]

No need to keep me "bcc'd" (though thanks for the consideration) -- I'm happy 
to ignore anything I don't want to be pulled into ;-)

Here's a rollup of what needs to be done based on the discussion below:

1) Remove extraneous exports from sha1.h
2) Remove "safe mode" support.
3) Remove sha1_compression_W if it is not needed by the performance 
improvements.
4) Evaluate logic around storing states and generating recompression states.  
Remove defines that bloat code footprint.

Thanks,
Dan


-Original Message-
From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds
Sent: Tuesday, February 28, 2017 11:34 AM
To: Junio C Hamano 
Cc: Jeff King ; Joey Hess ; Git Mailing List 

Subject: Re: SHA1 collisions found

On Tue, Feb 28, 2017 at 11:07 AM, Junio C Hamano  wrote:
>
> In a way similar to 8415558f55 ("sha1dc: avoid c99 
> declaration-after-statement", 2017-02-24), we would want this on top.

There's a few other simplifications that could be done:

 (1) make the symbols static that aren't used.

 The sha1.h header ends up declaring several things that shouldn't have 
been exported.

 I suspect the code may have had some debug mode that got stripped out from 
it before making it public (or that was never there, and was just something the 
generating code could add).

[danshu] Yes, this is reasonable.  The emphasis of the code, heretofore, had 
been the illustration of our unavoidable bit condition performance improvement 
to counter cryptanalysis.  I'm happy to remove the unused stuff from the public 
header.

 (2) get rid of the "safe mode" support.

 That one is meant for non-checking replacements where it generates a 
*different* hash for input with the collision fingerpring, but that's pointless 
for the git use when we abort on a collision fingerprint.

[danshu] Yes, I agree that if you aren't using this it can be taken out.  I 
believe Marc has some use cases / potentially consumers of this algorithm in 
mind.  We can move it into separate header/source files for anyone who wants to 
use it.

I think the first one will show that the sha1_compression() function isn't 
actually used, and with the removal of safe-mode I think
sha1_compression_W() also is unused.

[danshu]  Some of the performance experiments that I've looked at involve 
putting the sha1_compression_W(...) back in.  Though, that doesn't look like 
it's helping.  If it is unused after the performance improvements, we'll take 
it out, or move it into its own file.

Finally, only states 58 and 65 (out of all 80 states) are actually used, and 
from what I can tell, the 'maski' value is always 0, so the looping over 80 
state masks is really just a loop over two.

[danshu]  So, while looking at performance optimizations, I specifically looked 
at how much removing storing the intermediate states helps -- And I found that 
it doesn't seem to make a difference for performance.  My cursory hypothesis is 
because nothing is waiting on those writes to memory, the code moves on 
quickly.  That said, it is a bunch of code that is essentially doing nothing 
and removing that is worthwhile.  Though, partially what we're seeing here is 
that, as you point out below, we're working with generated code that we want to 
be general.  Specifically, right now, we're checking only disturbance vectors 
that we know can be used to efficiently attack the compression function.  It 
may be the case that further cryptanalysis uncovers more.  We want to have a 
general enough approach that we can add scanning for new disturbance vectors if 
they're found later.  Over specializing the code makes that more difficult, as 
currently the algorithm is data driven, and we don't need to write new code, 
but rather just add more data to check.  One other note -- the "maski" field of 
the  dv_info_t struct is not an index to check the state, but rather an index 
into the mask generated by the ubc check code, so that doesn't pertain to 
looping over the states.  More on this below.  

The file has code top *generate* all the 80 sha1_recompression_step() 
functions, and I don't think the compiler is smart enough to notice that only 
two of them matter.

[danshu] That's a good observation -- We should clean up the unused 
recompression steps, especially because that will generate a ton of object 
code.  We should add some logic to only compile the functions that are used.

And because 'maski' is always zero, thisL

   ubc_dv_mask[sha1_dvs[i].maski]

code looks like it might as well just use ubc_dv_mask[0] - in fact the 
ubc_dv_mask[] "array" really is just a single-entry array anyway:

   #define DVMASKSIZE 1

[danshu]  The idea here is that we are currently checking 32 disturbance 
vectors with our bit mask.  We're checking 32 DVs, because we have 32 bits of 
mask that we can use.  The DVs are ordered by their probability of leading to 
an attack (which is directly correlated to the complexity of finding a 
collision.

Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote:

> Jonathan Tan  writes:
> 
> > It could be argued that in the future, Git might need to distinguish
> > tree_objects from blob_objects - in particular, a user might want
> > rev-list to print the trees but not the blobs. 
> 
> That was exactly why these bits were originally made to "appear
> independent but in practice nobody sets only one and leaves others
> off".  
> 
> And it didn't happen in the past 10 years, which tells us that we
> should take this patch.

I actually have a patch which uses the distinction. It's for
upload-archive doing reachability checks (which seems rather familiar to
what's going on here).

The whole series (from 2013!) is at:

  git://github.com/peff/git jk/archive-reachability

but the relevant commits are below.

I don't think the same logic holds for this case, though, because
somebody actually can ask for a single blob.

-- >8 --
From: Jeff King 
Date: Wed, 5 Jun 2013 17:57:02 -0400
Subject: [PATCH] list-objects: optimize "revs->blob_objects = 0" case

If we are traversing trees during a "--objects"
traversal, we may skip blobs if the "blob_objects" field of
rev_info is not set. But we do so as the first thing in
process_blob(), only after we have actually created the
"struct blob" object, incurring a hash lookup. We can
optimize out this no-op call completely.

This does not actually affect any current code, as all of
the current traversals always set blob_objects when looking
at objects, anyway.

Signed-off-by: Jeff King 
---
 list-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aafb..58ad69557 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -117,7 +117,7 @@ static void process_tree(struct rev_info *revs,
process_gitlink(revs, entry.oid->hash,
show, base, entry.path,
cb_data);
-   else
+   else if (revs->blob_objects)
process_blob(revs,
 lookup_blob(entry.oid->hash),
 show, base, entry.path,
-- 
2.12.0.359.gd4c8c42e9

-- >8 --
From: Jeff King 
Date: Wed, 5 Jun 2013 18:02:42 -0400
Subject: [PATCH] archive: ignore blob objects when checking reachability

We cannot create an archive from a blob object, so we would
not expect anyone to provide one to us. And if they do, we
will fail anyway just after the reachability check.  We can
therefore optimize our reachability check to ignore blobs
completely, and not even create a "struct blob" for them.

Depending on the repository size and the exact place we find
the reachable object in the traversal, this can save 20-25%,
a we can avoid many lookups in the object hash.

The downside of this is that a blob provided to a remote
archive process will fail with "no such object" rather than
"object is not a tree" (we could organize the code to retain
the old message, but since we no longer know whether the
blob is reachable or not, we would potentially be leaking
information about the existence of unreachable objects).

Signed-off-by: Jeff King 
---
 archive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/archive.c b/archive.c
index ef89b2556..489115f9f 100644
--- a/archive.c
+++ b/archive.c
@@ -383,6 +383,7 @@ static int object_is_reachable(const unsigned char *sha1)
save_commit_buffer = 0;
init_revisions(&data.revs, NULL);
setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+   data.revs.blob_objects = 0;
if (prepare_revision_walk(&data.revs))
return 0;
 
-- 
2.12.0.359.gd4c8c42e9



Re: git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-02-28 Thread Junio C Hamano
Torsten Bögershausen  writes:

> On 2017-02-27 21:17, Junio C Hamano wrote:
>
>> Torsten, you've been quite active in fixing various glitches around
>> the EOL conversion in the latter half of last year.  Have any
>> thoughts to share on this topic?
>> 
>> Thanks.
>
> Sorry for the delay, being too busy with other things.
> I followed the discussion, but didn't have good things to contribute.
> I am not an expert in diff.c, but there seems to be a bug, thanks everybody
> for digging.
>
> Back to business:
>
> My understanding is that git diff --quiet should be quiet, when
> git add will not do anything.

Yes, I think that is a sensible criterion.  What I was interested to
hear from you the most was to double check if Mike's expectation is
reasonable.  Earlier we had a lengthy discussion on what to do when
convert-to-git and convert-to-working-tree conversions do not round
trip, and I was wondering if this was one of those cases.



Re: [PATCH v2 1/2] Documentation: Improve description for core.quotePath

2017-02-28 Thread Andreas Heiduk
Am 24.02.2017 um 22:43 schrieb Jakub Narębski:
> W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
>> Signed-off-by: Andreas Heiduk 
> 
> Thanks.  This is good work.

:-)

> 
>> ---
>>  Documentation/config.txt | 24 ++--
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
[...]
>> +
> 
> This empty line should not be here, I think.

Missed that bugger...

[...]
>> +values larger than 0x80 (e.g. octal `\302\265` for "micro" in
> 
> I wonder if we can put UTF-8 in AsciiDoc, that is write "μ"
> instead of spelling it "micro" (or: Greek letter "mu").
> 
> Or "µ" / "µ", though I wonder how well it is supported
> in manpage, info and PDF outputs...

... and fonts installed on client machines!

Since I started with a two-line diff for just git-ls-files I decided to
play safe with this,




Re: [PATCH v8 4/6] stash: teach 'push' (and 'create_stash') to honor pathspec

2017-02-28 Thread Junio C Hamano
Thomas Gummerer  writes:

> + git reset ${GIT_QUIET:+-q} -- "$@"
> + git ls-files -z --modified -- "$@" |
> + git checkout-index -z --force --stdin
> + git checkout ${GIT_QUIET:+-q} HEAD -- $(git ls-files -z 
> --modified "$@")

I think you forgot to remove this line, whose correction was added
as two lines immediately before it.  I'll remove it while queuing.

> + git clean --force ${GIT_QUIET:+-q} -d -- "$@"

Thanks.


Re: format-patch subject-prefix gets truncated when using the --numbered flag

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 03:59:16PM +, Adrian Dudau wrote:

> This is happening on the latest master branch, so I dug through the
> code and tracked the issue to this piece of code in log-tree.c:
> 
> if (opt->total > 0) {
> static char buffer[64];
> snprintf(buffer, sizeof(buffer),
>  "Subject: [%s%s%0*d/%d] ",
>  opt->subject_prefix,
>  *opt->subject_prefix ? " " : "",
>  digits_in_number(opt->total),
>  opt->nr, opt->total);
> subject = buffer;
> } else if (opt->total == 0 && opt->subject_prefix && *opt-
> >subject_prefix) {
> static char buffer[256];
> snprintf(buffer, sizeof(buffer),
>  "Subject: [%s] ",
>  opt->subject_prefix);
> subject = buffer;
> } else {
> subject = "Subject: ";
> }
> 
> Apparently the size of the "buffer" var is different in the two
> situations. Anybody knows if this is by design or just an old
> oversight?

I think it's just an old oversight. There are some other fixed-size
buffers later, too, which could similarly truncate.

I think these should all be "static strbuf", and entering the function
they should get strbuf_reset(), followed by a strbuf_addf(). The static
strbufs will be the owners of the allocated heap memory, and it will get
reused from call to call.

That stops the immediate problem. As a function interface, it's pretty
ugly. It would probably make more sense for the caller to pass in a
strbuf rather than have us pass out pointers to static storage. For the
call in make_cover_letter(), that would be fine. For show_log(), it's
less clear. That's called for every commit in "git log", which might be
a little sensitive to allocations.

The only persistent storage it has is via the rev_info. Perhaps it could
hold some scratch buffers for the traversal (though I don't think we
currently do any resource-freeing when done with a rev_info, so it
effectively becomes a leak).

-Peff


Re: [PATCH 4/8] interpret_branch_name: allow callers to restrict expansions

2017-02-28 Thread Junio C Hamano
Jeff King  writes:

> The original purpose of interpret_branch_name() was to be used by
> get_sha1() in resolving refs.  As such, some of its expansions may
> point to refs outside of the local "refs/heads" namespace.

I am not sure the reference to "get_sha1()" is entirely correct.

Until it was renamed at 431b1969fc ("Rename interpret/substitute
nth_last_branch functions", 2009-03-21), the function was called
interpret_nth_last_branch() which was originally introduced for the
name, not sha1, at ae5a6c3684 ("checkout: implement "@{-N}" shortcut
name for N-th last branch", 2009-01-17).  The use of the same syntax
and function for the object name came a bit later.

But I think that is an insignificant detail.  Let's read on.

> Over time, the function has been picked up by other callers
> who want to use the ref-expansion to give the user access to
> the same shortcuts (e.g., allowing "git branch" to delete
> via "@{-1}" or "@{upstream}").  These uses have confusing
> corner cases when the expansion isn't in refs/heads/ (for
> instance, deleting "@" tries to delete refs/heads/HEAD,
> which is nonsense).
>
> Callers can't know from the returned string how the
> expansion happened (e.g., did the user really ask for a
> branch named "HEAD", or did we do a bogus expansion?). One
> fix would be to return some out-parameters describing the
> types of expansion that occurred. This has the benefit that
> the caller can generate precise error messages ("I
> understood @{upstream} to mean origin/master, but that is a
> remote tracking branch, so you cannot create it as a local
> name").
>
> However, out-parameters make calling interface somewhat
> cumbersome. Instead, let's do the opposite: let the caller
> tell us which elements to expand. That's easier to pass in,
> and none of the callers give more precise error messages
> than "@{upstream} isn't a valid branch name" anyway (which
> should be sufficient).
>
> The strbuf_branchname() function needs a similar parameter,
> as most of the callers access interpret_branch_name()
> through it. For now, we'll pass "0" for "no restrictions" in
> each caller, and update them individually in subsequent
> patches.

OK.

> diff --git a/cache.h b/cache.h
> index c67995caa..a8816c914 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1383,8 +1383,17 @@ extern char *oid_to_hex(const struct object_id *oid);  
> /* same static buffer as s
>   *
>   * If the input was ok but there are not N branch switches in the
>   * reflog, it returns 0.
> - */
> -extern int interpret_branch_name(const char *str, int len, struct strbuf *);
> + *
> + * If "allowed" is non-zero, it is a treated as a bitfield of allowable
> + * expansions: local branches ("refs/heads/"), remote branches
> + * ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion 
> is
> + * allowed, even ones to refs outside of those namespaces.
> + */

Answering the question in your follow-up, I personally do not find
"0 means anything goes" too confusing, but for satisfying those who
do, spelling ~0 is not too bad, either.

> +#define INTERPRET_BRANCH_LOCAL (1<<0)
> +#define INTERPRET_BRANCH_REMOTE (1<<1)
> +#define INTERPRET_BRANCH_HEAD (1<<2)
> +extern int interpret_branch_name(const char *str, int len, struct strbuf *,
> +  unsigned allowed);
>  extern int get_oid_mb(const char *str, struct object_id *oid);

> diff --git a/refs.c b/refs.c
> index 6d0961921..da62119c2 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -405,7 +405,7 @@ int refname_match(const char *abbrev_name, const char 
> *full_name)
>  static char *substitute_branch_name(const char **string, int *len)
>  {
>   struct strbuf buf = STRBUF_INIT;
> - int ret = interpret_branch_name(*string, *len, &buf);
> + int ret = interpret_branch_name(*string, *len, &buf, 0);
>  
>   if (ret == *len) {
>   size_t size;

This is the one used by dwim_ref/log, so we'd need to allow it to
resolve to anything, e.g. "@" -> "HEAD", and pretend that the user
typed that expansion.  OK.


Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

2017-02-28 Thread Jeff King
On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote:

> Whenever tree_objects is set to 1 in revision.h's struct rev_info,
> blob_objects is likewise set, and vice versa. Combine those two fields
> into one.
> 
> Some of the existing code does not handle tree_objects being different
> from blob_objects properly. For example, "handle_commit" in revision.c
> recurses from an UNINTERESTING tree into its subtree if tree_objects ==
> 1, completely ignoring blob_objects; it probably should still recurse if
> tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should
> behave differently depending on blob_objects (controlling the
> instantiation and marking of blob objects). This commit resolves the
> issue by forbidding tree_objects from being different to blob_objects.

Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is
set, then tree_objects must also be set". It's the other way around I
care more about.

> It could be argued that in the future, Git might need to distinguish
> tree_objects from blob_objects - in particular, a user might want
> rev-list to print the trees but not the blobs. However, this results in
> a minor performance savings at best in that objects no longer need to be
> instantiated (causing memory allocations and hashtable insertions) - no
> disk reads are being done for objects whether blob_objects is set or
> not.

In a full object-graph traversal, we actually spend a big chunk of our
time in hash lookups. My measurements (admittedly from 2013, which I
haven't repeated lately) show something like a 20-25% speedup for this
case.

My only use for it (and the source of those timings) was to compute
archive reachability, which nobody seems to care too much about. But I
suspect we could speed up your case, too, when we are just computing the
reachability of a non-blob. I.e., you should be able to turn on the
smallest subset of "commits only", "commits and trees", and "commits,
trees, and blobs", based on what the other side has asked for.

-Peff


Re: [PATCH 0/6] Use time_t

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 02:27:22PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > ... We can certainly stick with it for now (it's awkward if you
> > really do have an entry on Jan 1 1970, but other than that it's an OK
> > marker). I agree that the most negatively value is probably a saner
> > choice, but we can switch to it after the dust settles.
> 
> I was trying to suggest that we should strive to switch to the most
> negative or whatever the most implausible value in the new range
> (and leave it as a possible bug to be fixed if we missed a place
> that still used "0 is impossible") while doing the ulong to time_t
> (or timestamp_t that is i64).  
> 
> "safer in the short term" wasn't meant to be "let's not spend time
> to do quality work".  As long as we are switching, we should follow
> it through.

Sure, I'd be much happier to see it done now. I just didn't want to pile
on the requirements to the point that step 1 doesn't get done. But I
haven't even looked at the code changes needed for time_t. I suspect
Dscho has a better feel for it at this point.

-Peff


Re: [PATCH 4/8] interpret_branch_name: allow callers to restrict expansions

2017-02-28 Thread Jeff King
On Tue, Feb 28, 2017 at 12:27:45PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The original purpose of interpret_branch_name() was to be used by
> > get_sha1() in resolving refs.  As such, some of its expansions may
> > point to refs outside of the local "refs/heads" namespace.
> 
> I am not sure the reference to "get_sha1()" is entirely correct.
> 
> Until it was renamed at 431b1969fc ("Rename interpret/substitute
> nth_last_branch functions", 2009-03-21), the function was called
> interpret_nth_last_branch() which was originally introduced for the
> name, not sha1, at ae5a6c3684 ("checkout: implement "@{-N}" shortcut
> name for N-th last branch", 2009-01-17).  The use of the same syntax
> and function for the object name came a bit later.
> 
> But I think that is an insignificant detail.  Let's read on.

Yeah, sorry, I was lazy about digging up the history. I think the
problem actually started in ae0ba8e20a (Teach @{upstream} syntax to
strbuf_branchanme(), 2010-01-19), when the features were ported over
from get_sha1() to interpret_branch_name().

Since I need to re-roll anyway, I'll tweak this to be more accurate.

> > @@ -405,7 +405,7 @@ int refname_match(const char *abbrev_name, const char 
> > *full_name)
> >  static char *substitute_branch_name(const char **string, int *len)
> >  {
> > struct strbuf buf = STRBUF_INIT;
> > -   int ret = interpret_branch_name(*string, *len, &buf);
> > +   int ret = interpret_branch_name(*string, *len, &buf, 0);
> >  
> > if (ret == *len) {
> > size_t size;
> 
> This is the one used by dwim_ref/log, so we'd need to allow it to
> resolve to anything, e.g. "@" -> "HEAD", and pretend that the user
> typed that expansion.  OK.

Yeah. Left them all as "0" here, and then split the updates into their
own commits. So there's no commit that says "and we are leaving this
spot, because it is correct as-is". The other notable one is the
strbuf_branchname() call in merge_name().

I can mention those in the commit message here (I think I did in the
cover letter, but it would be nice to stick it in the history, since
that will be what comes up if you blame those lines).

-Peff


Re: format-patch subject-prefix gets truncated when using the --numbered flag

2017-02-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Adrian Dudau  writes:
>
>> I noticed that the --subject-prefix string gets truncated sometimes,
>> but only when using the --numbered flat. Here's an example:
>>
>> addu@sestofb11:/data/fb/addu/git$ export longm="very very very very
>> very very very very very very very very very very long prefix"
>
> This looks like "dr, my arm hurts when I twist it this way---don't
> do that then" ;-).  Truncation is designed and intended to preserve
> space for the real title of commit.
> 
> Having said that...
> ...
> I think this is just an old oversight.  The latter should have been
> even shorter than the former one (or the former should be longer
> than the latter) to account for the width of the counter e.g. 01/27.

And having said all that, if we really want to allow overlong
subject prefix that would end up hiding the real title of the patch,
a modern way to do so would be to use xstrfmt() like the attached
toy-patch does.  Note that this is merely a small first step---you'd
notice that "subject" is kept around as a "static" and only freed
upon entry to this function for the second time, to preserve the
ownership model of the original code.  In a real "fix" (if this
needs to be "fixed", that is), I think the ownership model of the
storage used for *subject_p and *extra_headers_p needs to be updated
so that it will become caller's responsibility to free them
(similarly, the ownership model of opt->diffopt.stat_sep that is
assigned the address of the static buffer[] in the same function
needs to be revisited).

That "buffer" thing I think would need to be a bit more careful even
in the current code, which _does_ use snprintf() correctly to avoid
overflowing the buffer[], by the way.  If you have an overlong
opt->mime_boundary, the resulting "e-mail" looking output can become
structurely broken.  The truncation may happen way before the full
line for Content-Transfer-Encoding: is written, for example.

So this function seems to have a lot more graver problems that need
to be looked at.

 log-tree.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index 8c2415747a..24c98f5a80 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -337,29 +337,23 @@ void log_write_email_headers(struct rev_info *opt, struct 
commit *commit,
 const char **extra_headers_p,
 int *need_8bit_cte_p)
 {
-   const char *subject = NULL;
+   static const char *subject = NULL;
const char *extra_headers = opt->extra_headers;
const char *name = oid_to_hex(opt->zero_commit ?
  &null_oid : &commit->object.oid);
 
+   free((void *)subject);
*need_8bit_cte_p = 0; /* unknown */
if (opt->total > 0) {
-   static char buffer[64];
-   snprintf(buffer, sizeof(buffer),
-"Subject: [%s%s%0*d/%d] ",
-opt->subject_prefix,
-*opt->subject_prefix ? " " : "",
-digits_in_number(opt->total),
-opt->nr, opt->total);
-   subject = buffer;
+   subject = xstrfmt("Subject: [%s%s%0*d/%d] ",
+ opt->subject_prefix,
+ *opt->subject_prefix ? " " : "",
+ digits_in_number(opt->total),
+ opt->nr, opt->total);
} else if (opt->total == 0 && opt->subject_prefix && 
*opt->subject_prefix) {
-   static char buffer[256];
-   snprintf(buffer, sizeof(buffer),
-"Subject: [%s] ",
-opt->subject_prefix);
-   subject = buffer;
+   subject = xstrfmt("Subject: [%s] ", opt->subject_prefix);
} else {
-   subject = "Subject: ";
+   subject = xstrdup("Subject: ");
}
 
fprintf(opt->diffopt.file, "From %s Mon Sep 17 00:00:00 2001\n", name);


Re: [PATCH 2/3] revision: exclude trees/blobs given commit

2017-02-28 Thread Jeff King
On Fri, Feb 24, 2017 at 05:18:37PM -0800, Jonathan Tan wrote:

> When the --objects argument is given to rev-list, an argument of the
> form "^$tree" can be given to exclude all blobs and trees reachable from
> that tree, but an argument of the form "^$commit" only excludes that
> commit, not any blob or tree reachable from it. Make "^$commit" behave
> consistent to "^$tree".

Like Junio, I suspect this is going to be quite expensive. This is
similar to the "--objects-edge" and "--objects-edge-aggressive" options,
which we had to pull back on the use of because of their expensiveness.

(And as an aside, wouldn't those options be the right place for what
you're doing?).

I also think that the mechanism here is not 100% accurate. The commit
traversal will stop once it has painted down, so you're effectively
exploring the trees of the merge bases. But older history could mention
an object that has resurfaced again (e.g., due to a cherry-pick).

Getting the 100% accurate answer is _really_ expensive, though with
reachability bitmaps it's not too bad. I just wonder if that's a better
approach to be taking.

-Peff


  1   2   >