Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> +Signed Tags
> +~~~
> +We add a new field "gpgsig-newhash" to the tag object format to allow
> +signing tags without relying on SHA-1. Its signed payload is the
> +newhash-content of the tag with its gpgsig-newhash field and "-BEGIN PGP
> +SIGNATURE-" delimited in-body signature removed.
> +
> +This means tags can be signed
> +1. using SHA-1 only, as in existing signed tag objects
> +2. using both SHA-1 and NewHash, by using gpgsig-newhash and an in-body
> +   signature.
> +3. using only NewHash, by only using the gpgsig-newhash field.

I have the same issue with signed commit.

The signed parts for SHA-1 contents exclude the in-body signature
(obviously) and all the headers including gpgsig-newhash that is not
known to our old clients are included.  The signed parts for NewHash
contents exclude the in-body signature and gpgsig-newhash header,
but all other headers.  I somehow feel that we should just reserve
gpgsig-* to prepare for the day when we introduce newhash2 and later
and exclude all of them from the computation.  Treat the difference
between how SHA-1 contents excludes _only_ it knows about and how
NewHash contents excludes _all_ possible signatures, just like the
differece between where SHA-1 and NewHash contents has the
signature.  That is, yes, we didn't know better when we designed
SHA-1 contents, but now we know better and are correcting the
mistakes by moving the signature from in-body tail to a header, and
by excluding anything gpgsig-*, not just the known ones.

> +Mergetag embedding
> +~~
> +The mergetag field in the sha1-content of a commit contains the
> +sha1-content of a tag that was merged by that commit.
> +
> +The mergetag field in the newhash-content of the same commit contains the
> +newhash-content of the same tag.

OK.  

We do not have a tool that extracts them and creates a tag object,
but if such a tool is invented in the future, it would only have to
worry about newhash content, as it would be a local operation.
Makes sense.

> +Submodules
> +~~
> +To convert recorded submodule pointers, you need to have the converted
> +submodule repository in place. The translation table of the submodule
> +can be used to look up the new hash.

OK, I earlier commented on a paragraph that I couldn't tell what it
was talking about, but this is a lot more understandable.  Perhaps
the earlier one can be removed?

We saw earlier what happens during "fetch".  This seems to hint that
we would need to do a "recursive" fetch in the bottom-up direction,
but without fetching the superproject, you wouldn't know what submodules
are needed and from where, so there is a bit of chicken-and-egg problem
we need to address, as we further make the design more detailed.

> +Loose objects and unreachable objects
> +~
> ...
> +"git gc --auto" currently waits for there to be 50 packs present
> +before combining packfiles. Packing loose objects more aggressively
> +may cause the number of pack files to grow too quickly. This can be
> +mitigated by using a strategy similar to Martin Fick's exponential
> +rolling garbage collection script:
> +https://gerrit-review.googlesource.com/c/gerrit/+/35215

Yes, concatenating into the latest pack that still is small may be a
reasonable way, as there won't be many good chances to create good
deltas anyway until you have blobs and trees at sufficiently numbers
of different versions, to do a "quick GC whose only purpose is to
keep the number of loose object down".

> +To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be
> +combined under certain circumstances. If "gc.garbageTtl" is set to
> +greater than one day, then packs created within a single calendar day,
> +UTC, can be coalesced together. The resulting packfile would have an
> +mtime before midnight on that day, so this makes the effective maximum
> +ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day,
> +then we divide the calendar day into intervals one-third of that ttl
> +in duration. Packs created within the same interval can be coalesced
> +together. The resulting packfile would have an mtime before the end of
> +the interval, so this makes the effective maximum ttl equal to the
> +garbageTtl * 4/3.

OK.  

Is the use of mtime essential, or because packs are "write once and
from there access read-only", would a timestamp written somewhere in
the header or the trailer of the file, if existed, work equally
well?  Not a strong objection, but a mild suggestion that not
relying on mtime may be a good idea (it will keep an accidental /
unintended "touch" from keeping garbage alive longer than you want).

> +The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack
> +index. More generally, that field indicates where a pack came from:
> +
> + - 1 (PACK_SOURCE_RECEIVE) for a pack received over the network
> + - 2 (PACK_SOURCE_AUTO) for a pack created by a 

Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-10-02 Thread Pranit Bauva
Hey Junio,

On Tue, Oct 3, 2017 at 9:21 AM, Junio C Hamano  wrote:
> Ramsay Jones  writes:
>
>> On 02/10/17 14:44, Pranit Bauva wrote:
>> [snip]
>>>...
>> Yes, I also meant to tidy that up by removing some, now
>> redundant, initialisation later in that function.
>>
>> Note, that wasn't the only bug! (I have probably forgotten
>> some of them, but look at 964f4e2b0, for example).
>
> It seems that Pranit needs a bit more work to take known fixes from
> your efforts and we should wait for the series to be rerolled?

This series is the initial part of the whole series. Things started to
get a little problematic after bisect_start(). The reason I am doing
the series in parts so that I can get the part which is solid to be
merged and then start work on later parts since I don't have a time
constraint now.

Regards,
Pranit Bauva


Re: [PATCH v3 1/5] test-list-objects: List a subset of object ids

2017-10-02 Thread Junio C Hamano
Derrick Stolee  writes:

> Subject: Re: [PATCH v3 1/5] test-list-objects: List a subset of object ids

Style nit: s/List/list/;

>
> Signed-off-by: Derrick Stolee 

This should stick to the left hand side.

No need to resend, only to fix these two, as I'll tweak them when
queuing.



Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> Ann T Ropea wrote:
>
>> Of the many ways to spell the three-letter word, the variant "Git"
>> should be used when referring to a repository in a description; or, in
>> general, when it is used as a proper noun.
>>
>> We thus change the pull-request template message so that it reads
>>
>>"...in the Git repository at:"
>>
>> Besides, this brings us in line with the documentation, see
>> Documentation/howto/using-signed-tag-in-pull-request.txt
>>
>> Signed-off-by: Ann T Ropea 
>> Acked-by: Jonathan Nieder 
>> ---
>> v2: rename patch, correct Signed-off-by line, add Acked-by line
>>  git-request-pull.sh | 2 +-
>>  t/t5150-request-pull.sh | 4 ++--
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> Reviewed-by: Jonathan Nieder 
>
> Thanks for the quick turnaround.

Thanks, but the patch is whitespace damaged and did not apply (what
happend to "reviewer's oversight statement" ;-).

As there were only three instances, I just edited the target files
and committed, which was far quicker than trying to match
whitespaces by editing the patch text.



[PATCH 2/2] colors: git_default_config() does not read color.ui

2017-10-02 Thread Junio C Hamano
As we reverted 136c8c8b ("color: check color.ui in
git_default_config()", 2017-07-13), these need to be added back to
the codebase so that "git tag --list" and "git for-each-ref" would
still pay attention to color.ui setting.

A real fix to the problem introduced by 4c7f1819 ("make color.ui
default to 'auto'", 2013-06-10) must be found, to allow users to
override the default "auto" use of colors for any color-capable
plumbing commands, but let's leave that to a later round.

Signed-off-by: Junio C Hamano 
---
 builtin/for-each-ref.c | 3 ++-
 builtin/tag.c  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 5d7c921a77..238eb00e09 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "color.h"
 
 static char const * const for_each_ref_usage[] = {
N_("git for-each-ref [] []"),
@@ -54,7 +55,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char 
*prefix)
 
format.format = "%(objectname) %(objecttype)\t%(refname)";
 
-   git_config(git_default_config, NULL);
+   git_config(git_color_default_config, NULL);
 
parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
if (maxcount < 0) {
diff --git a/builtin/tag.c b/builtin/tag.c
index 66e35b823b..46c3e78b55 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,7 +158,7 @@ static int git_tag_config(const char *var, const char 
*value, void *cb)
 
if (starts_with(var, "column."))
return git_column_config(var, value, "tag", );
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static void write_tag_body(int fd, const struct object_id *oid)
-- 
2.14.2-882-gfe33df219d



[PATCH 0/2] fixing "add -p" regression

2017-10-02 Thread Junio C Hamano
We were a bit too agressive in pushing color.ui configuration to
plumbing commands.  A real fix must be found to override the default
"auto" use of colors for any color-capable plumbing commands, but
let's leave that to a later round and concentrate on fixing the
regression first.

Junio C Hamano (2):
  Revert "color: check color.ui in git_default_config()"
  colors: git_default_config() does not read color.ui

 builtin/branch.c   | 2 +-
 builtin/clean.c| 3 ++-
 builtin/for-each-ref.c | 3 ++-
 builtin/grep.c | 2 +-
 builtin/show-branch.c  | 2 +-
 builtin/tag.c  | 2 +-
 color.c| 8 
 config.c   | 4 
 diff.c | 3 +++
 9 files changed, 19 insertions(+), 10 deletions(-)

-- 
2.14.2-882-gfe33df219d



[PATCH 1/2] Revert "color: check color.ui in git_default_config()"

2017-10-02 Thread Junio C Hamano
This reverts commit 136c8c8b8fa39f1315713248473dececf20f8fe7.

Even though we do want to fix the fallout from 4c7f1819 ("make
color.ui default to 'auto'", 2013-06-10), which made it impossible
to override it with "git -c color.ui={never,always} $plumbing",
allowing the plumbing commands to pay attention to color.ui
configuration variable turned out to be an unsatisfactory fix.

People who had color.ui=always, thinking that it should be safe to
do, because it won't apply to plumbing commands, got burned by it.

A bit of fix-up patches are needed, as the series that included the
patch being reverted, and changes after the series landed, have
and/or added code that assumes git_default_config() would read the
color.ui, and they need to be adjusted.

Signed-off-by: Junio C Hamano 
---
 builtin/branch.c  | 2 +-
 builtin/clean.c   | 3 ++-
 builtin/grep.c| 2 +-
 builtin/show-branch.c | 2 +-
 color.c   | 8 
 config.c  | 4 
 diff.c| 3 +++
 7 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 16d391b407..1969c7116c 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -92,7 +92,7 @@ static int git_branch_config(const char *var, const char 
*value, void *cb)
return config_error_nonbool(var);
return color_parse(value, branch_colors[slot]);
}
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static const char *branch_get_color(enum color_branch ix)
diff --git a/builtin/clean.c b/builtin/clean.c
index c1bafda5b6..057fc97fe4 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -125,7 +125,8 @@ static int git_clean_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
-   return git_default_config(var, value, cb);
+   /* inspect the color.ui config variable and others */
+   return git_color_default_config(var, value, cb);
 }
 
 static const char *clean_get_color(enum color_clean ix)
diff --git a/builtin/grep.c b/builtin/grep.c
index a7157f5632..0d6e669732 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -284,7 +284,7 @@ static int wait_all(void)
 static int grep_cmd_config(const char *var, const char *value, void *cb)
 {
int st = grep_config(var, value, cb);
-   if (git_default_config(var, value, cb) < 0)
+   if (git_color_default_config(var, value, cb) < 0)
st = -1;
 
if (!strcmp(var, "grep.threads")) {
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 28f245c8cc..7073a3eb97 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -554,7 +554,7 @@ static int git_show_branch_config(const char *var, const 
char *value, void *cb)
return 0;
}
 
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static int omit_in_dense(struct commit *commit, struct commit **rev, int n)
diff --git a/color.c b/color.c
index 7aa8b076f0..31b6207a00 100644
--- a/color.c
+++ b/color.c
@@ -361,6 +361,14 @@ int git_color_config(const char *var, const char *value, 
void *cb)
return 0;
 }
 
+int git_color_default_config(const char *var, const char *value, void *cb)
+{
+   if (git_color_config(var, value, cb) < 0)
+   return -1;
+
+   return git_default_config(var, value, cb);
+}
+
 void color_print_strbuf(FILE *fp, const char *color, const struct strbuf *sb)
 {
if (*color)
diff --git a/config.c b/config.c
index bc290e7563..a9356c1383 100644
--- a/config.c
+++ b/config.c
@@ -16,7 +16,6 @@
 #include "string-list.h"
 #include "utf8.h"
 #include "dir.h"
-#include "color.h"
 
 struct config_source {
struct config_source *prev;
@@ -1351,9 +1350,6 @@ int git_default_config(const char *var, const char 
*value, void *dummy)
if (starts_with(var, "advice."))
return git_default_advice_config(var, value);
 
-   if (git_color_config(var, value, dummy) < 0)
-   return -1;
-
if (!strcmp(var, "pager.color") || !strcmp(var, "color.pager")) {
pager_use_color = git_config_bool(var,value);
return 0;
diff --git a/diff.c b/diff.c
index 9c38258030..85e714f6c6 100644
--- a/diff.c
+++ b/diff.c
@@ -299,6 +299,9 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
 
+   if (git_color_config(var, value, cb) < 0)
+   return -1;
+
return git_diff_basic_config(var, value, cb);
 }
 
-- 
2.14.2-882-gfe33df219d



Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-10-02 Thread Junio C Hamano
Ramsay Jones  writes:

> On 02/10/17 14:44, Pranit Bauva wrote:
> [snip]
>>...
> Yes, I also meant to tidy that up by removing some, now
> redundant, initialisation later in that function.
>
> Note, that wasn't the only bug! (I have probably forgotten
> some of them, but look at 964f4e2b0, for example).

It seems that Pranit needs a bit more work to take known fixes from
your efforts and we should wait for the series to be rerolled?

Thanks both for working on this.


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Taylor Blau
On Tue, Oct 03, 2017 at 08:55:01AM +0900, Junio C Hamano wrote:
> Jonathan Nieder  writes:
>
> > The above does a nice job of explaining
> >
> >  - what this change is going to do
> >  - how it's good for the internal code structure / maintainability
> >
> > What it doesn't tell me about is why the user-facing effect won't
> > cause problems.  Is there no atom where %(atom:) was previously
> > accepted and did something meaningful that this may break?
>
> That is, was there any situation where %(atom) and %(atom:) did two
> differnt things and their differences made sense?
>
> > Looking at the manpage and code, I don't see any, so for what it's
> > worth, this is
> >
> > Reviewed-by: Jonathan Nieder 
> >
> > but for next time, please remember to discuss regression risk in
> > the commit message, too.
>
> Yes, I agree that it is necessary to make sure somebody looked at
> the issue _and_ record the fact that it happened.  Thanks for doing
> that already ;-)
>
> I also took a look at the code and currently we seem to abort,
> either with "unrecognised arg" (e.g. "refname:") or "does not take
> args" (e.g. "body"), so we should be OK, I'd think.

Thank you both for the helpful pointers. I will make sure to include a
more thorough review of potential breakage in existing code given a
particular change.

With respect to this particular patch, I agree with Jonathan and Junio
that there are no places in ref-filter.c that would be affected by this
change, and that it should be able to be applied cleanly without
breakage.

--
- Taylor


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Junio C Hamano
Junio C Hamano  writes:

> In any case, I think the first step may be to revert 136c8c8b from
> both 'master' and 2.14.x.  These alternative solutions can come on
> top.
>
> Thoughts?

With the attached patch, after reverting 136c8c8b ("color: check
color.ui in git_default_config()", 2017-07-13) from the tip of
'master', all tests seem to pass.  More importantly,

git -c color.ui=always diff-files -p >not-a-tty

will not get colors on its output file, because it does not pay
attention to color.ui configuration.

Note that I haven't _decided_ that reverting is the best way to move
forward (yet).  I am just giving a datapoint for people to use when
they assess how painful each possible avenues proposed are.

 builtin/for-each-ref.c | 3 ++-
 builtin/tag.c  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 5d7c921a77..238eb00e09 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -5,6 +5,7 @@
 #include "object.h"
 #include "parse-options.h"
 #include "ref-filter.h"
+#include "color.h"
 
 static char const * const for_each_ref_usage[] = {
N_("git for-each-ref [] []"),
@@ -54,7 +55,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char 
*prefix)
 
format.format = "%(objectname) %(objecttype)\t%(refname)";
 
-   git_config(git_default_config, NULL);
+   git_config(git_color_default_config, NULL);
 
parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
if (maxcount < 0) {
diff --git a/builtin/tag.c b/builtin/tag.c
index c627794181..f8bc1393ed 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,7 +158,7 @@ static int git_tag_config(const char *var, const char 
*value, void *cb)
 
if (starts_with(var, "column."))
return git_column_config(var, value, "tag", );
-   return git_default_config(var, value, cb);
+   return git_color_default_config(var, value, cb);
 }
 
 static void write_tag_body(int fd, const struct object_id *oid)




Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Junio C Hamano
Junio C Hamano  writes:

> Jonathan Nieder  writes:
>
>> Yet another alternative would be to treat color.ui=always as a
>> deprecated synonym for color.ui=auto.  I think that's my preferred
>> fix.
>
> It is mine, too.  And I do not think color.ui=absolutely-always for
> those who want to keep the current behaviour is unneeded.

Having said that, I do not mind solving what 136c8c8b ("color: check
color.ui in git_default_config()", 2017-07-13) tried to do in a
different way.  If 4c7f1819b that made even plumbing to default to
auto was wrong (because plumbing did not pay attention to color.ui
so people could not override that 'auto' default), we can partially
revert 4c7f1819 ("make color.ui default to 'auto'", 2013-06-10) to
make the default 'auto' not apply to plumbing.

In any case, I think the first step may be to revert 136c8c8b from
both 'master' and 2.14.x.  These alternative solutions can come on
top.

Thoughts?


Re: [PATCH 00/24] object_id part 10

2017-10-02 Thread brian m. carlson
On Mon, Oct 02, 2017 at 04:34:44PM -0700, Stefan Beller wrote:
> On Sun, Oct 1, 2017 at 3:08 PM, brian m. carlson
>  wrote:
> > I apologize for the unintended bounced emails that occurred in late
> > August; my mail server (previously hosted at home) got knocked offline
> > because I was without Internet for over a week after Hurricane Harvey.
> > I've since relocated mail to New York, so that shouldn't happen again.
> 
> I hope you are ok. (It's not just mail servers that don't like water, but
> other personal belongings, too)

Yes, we got a small amount of water in our garage, but overall we did
very well with only minor damage.  We and our families and friends were
safe the entire time.

I've spent some time getting the damage fixed instead of coding; hence
the delay in this series.  Overall, though, the damage was minor and we
were very fortunate.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 04/24] refs: convert update_ref and refs_update_ref to use struct object_id

2017-10-02 Thread brian m. carlson
On Mon, Oct 02, 2017 at 03:37:38PM -0700, Stefan Beller wrote:
> > diff --git a/bisect.c b/bisect.c
> > index 96beeb5d13..e8470a2e0f 100644
> > --- a/bisect.c
> > +++ b/bisect.c
> > @@ -685,11 +685,13 @@ static int bisect_checkout(const struct object_id 
> > *bisect_rev, int no_checkout)
> > char bisect_rev_hex[GIT_MAX_HEXSZ + 1];
> >
> > memcpy(bisect_rev_hex, oid_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
> > -   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev->hash, NULL, 0, 
> > UPDATE_REFS_DIE_ON_ERR);
> > +   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev, NULL, 0,
> > +  UPDATE_REFS_DIE_ON_ERR);
> 
> The number of characters decrease, yet the line gets an additional
> line break. While I don't mind this, the most interesting question that
> comes to mind is whether you tried the new clang formatting options
> in tree to adapt the indentation? ;)

I didn't.  Sometimes I use Coccinelle to convert part of a patch and fix
up the rest by hand, and it really likes to break lines at certain
points.  I expect that I'll do at least one reroll, so I'll fix this.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> Yet another alternative would be to treat color.ui=always as a
> deprecated synonym for color.ui=auto.  I think that's my preferred
> fix.

It is mine, too.  And I do not think color.ui=absolutely-always for
those who want to keep the current behaviour is unneeded.



Re: Security of .git/config and .git/hooks

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> Proposed fix: because of case (1), I would like a way to tell Git to
> stop trusting any files in .git.  That is:
>
>  1. Introduce a (configurable) list of "safe" configuration items that
> can be set in .git/config and don't respect any others.

The list of "safe" things are configurable by having something in
~/.gitconfig, perhaps?

How would this work, from the end-user's point of view, with "git
config --global" and "git config --local"?

>  2. But what if I want to set a different pager per-repository?
> I think we could do this using configuration "profiles".
> My ~/.config/git/profiles/ directory would contain git-style
> config files for repositories to include.  Repositories could
> then contain
>
>   [include]
>   path = ~/.config/git/profiles/fancy-log-pager
>
> to make use of those settings.  The facility (1) would
> special-case this directory to allow it to set "unsafe" settings
> since files there are assumed not to be under the control of an
> attacker.

Meaning, "include" is not in "safe" category, but a value that
begins with "~/.config/git/" are excempt???

>  3. Likewise for hooks: my ~/.config/git/hooks/ directory would
> contain hooks for repositories to make use of.  Repositories could
> symlink to hook files from there to make use of them.

I am not sure what this means.  .git/hooks/pre-commit being a
symbolic link to "~/.config/git/hooks/pre-commit-fancy"
(i.e. readlink gives the path with tilde unexpanded), so that the
attacked sysadmin will not run it from ~attacker/.config/git/hooks?  

And the code that finds a hook to run sees .git/hooks/pre-commit,
resolves the symlink manually and makes sure it leads to somewhere
inside ~/.config/...  (otherwise it rejects) and then uses the
pointed-at copy?

At that point, we are not taking any advantage of symbolic-link-ness
of the entity, so .git/hooks/pre-commit being a text file that has a
single like, e.g.

# safe-hook: pre-commit-fancy

may be sufficient (and we do not have to worry about systems without
symbolic links)?  The machinery that used to manually resolved symlink
instead reads it, finds "pre-commit-fancy" in ~/.config/git/hooks/
and runs it, and you get the same behaviour, no?

> One downside of (3) is its reliance on symlinks.  Some alternatives:
>
>  3b. Use core.hooksPath configuration instead.  Rely on (2).
>  3c. Introduce new hook.* configuration to be used instead of hook
>  scripts.  Rely on (2).

I guess I invented 3d. without reading ahead X-<.  None of the 3x
variants other than 3 proper will not work for scripts and existing
code that sees that .git/hooks/pre-commit is an executable and runs
it, and 3 proper will not work without symbolic links, so this means
we'd need "git locate-hook pre-commit" (and underlying locate_hook()
helper API) that returns "/home/me/.git/config/hook/pre-commit-fancy"
or fails when we do this transition.  In an unconverted repository,
it may return $PWD/.git/hooks/pre-commit, or failure if we are
running under the paranoid mode.

Sounds workable.


Re: [PATCH v3] clang-format: add a comment about the meaning/status of the

2017-10-02 Thread Ramsay Jones


On 02/10/17 18:21, Brandon Williams wrote:
> On 10/02, Junio C Hamano wrote:
>> From: Stephan Beyer 
>>
>> Having a .clang-format file in a project can be understood in a way that
>> code has to be in the style defined by the .clang-format file, i.e., you
>> just have to run clang-format over all code and you are set.
>>
>> This unfortunately is not yet the case in the Git project, as the
>> format file is still work in progress.  Explain it with a comment in
>> the beginning of the file.
>>
>> Additionally, the working clang-format version is mentioned because the
>> config directives change from time to time (in a compatibility-breaking way).
>>
>> Signed-off-by: Stephan Beyer 
>> Signed-off-by: Junio C Hamano 
>> ---
>>
>>  * So here is a counter-proposal in a patch form.  I agree that my
>>earlier suggestion was unnecessarily verbose; this one spends
>>just as many lines and not more than the v2 round of Stephan's
>>patch.
>>
>>  .clang-format | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/.clang-format b/.clang-format
>> index 56822c116b..7670eec8df 100644
>> --- a/.clang-format
>> +++ b/.clang-format
>> @@ -1,4 +1,8 @@
>> -# Defaults
>> +# This file is an example configuration for clang-format 5.0.
>> +#
>> +# Note that this style definition should only be understood as a hint
>> +# for writing new code. The rules are still work-in-progress and does
>> +# not yet exactly match the style we have in the existing code.
> 
> Thanks for writing up this header comment to the .clang-format file,
> it's something I definitely should have included when I introduced it.
> 
> And I like the wording that you've both settled on, as it reflects our
> intentions (of having the code eventually conform to the format rules)
> and making note that this set of rules still needs to be tuned.

Just for the record, I have 'clang-format version 3.8.0-2ubuntu4
 (tags/RELEASE_380/final)' on Linux Mint 18.2, which requires me
to comment out:

AlignEscapedNewlines: Left
BreakStringLiterals: false
PenaltyBreakAssignment: 100

And on cygwin, I have 'clang-format version 4.0.1
 (tags/RELEASE_401/final)', which requires me to
comment out:

AlignEscapedNewlines: Left
PenaltyBreakAssignment: 100

So, I don't think I can play along! :(

[When playing with 3.8 on Linux, I noted that clang-format
seemed to ignore *all* settings in .clang-format, if it found
*any* config that it didn't know about! Not very friendly. :-P ]

ATB,
Ramsay Jones



Re: [idea] File history tracking hints

2017-10-02 Thread Junio C Hamano
Jeff Hostetler  writes:

>> How do you know when a guid needs adaption?
>
> I'm not sure I know what you mean by "adaption".

I think he meant adapting, and I think he is referring to what I
wrote in the message upthread to explain why "file ID" would not
help.

It seems to me, from reading the remainder of your message, that it
is also becoming clear to you that "file ID" would not help and your
conceptual thing was merely a hand-waving that was dubious how it
could be made into a concrete working design?  Hopefully we can
converge on a workable design that does not involve "file ID", and
that would be a good outcome.




Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-10-02 Thread Ramsay Jones


On 02/10/17 14:44, Pranit Bauva wrote:
[snip]
>> Look for []-ed comments in the commit messages for a note of
>> the changes I made to your original patches, in patches #2,
>> #4, #7-9, #11-12 and #14.
>>
>> The commits I added, which are just WIP, are as follows:
>>
>>   $ git log --oneline bisect~12..bisect
>>   7d7117040 (raj/bisect, bisect) bisect--helper: convert to struct object_id
>>   188ea5855 bisect--helper: add the get_bad_commit() function
>>   b75f46fb4 bisect--helper: add a log_commit() helper function
>>   4afc34403 bisect--helper: reduce the scope of a variable
>>   62495f6ae bisect--helper: remove useless sub-expression in condition
>>   964f4e2b0 bisect--helper: set correct term from --term-new= option
>>   62efc099f bisect--helper: remove redundant assignment to has_double_dash
>>   d35950b92 bisect--helper: remove redundant goto's
>>   b33f313ac bisect--helper: remove space just before \n in string
>>   3eb407156 bisect--helper: remove some unnecessary braces
>>   c2b89c9b8 bisect--helper: add some vertical whitespace
>>   8c883701c bisect--helper: fix up some coding style issues
>>   $
>>
>> Again IIRC, there are a couple of bug fixes in these commits ...
> 
> There is actually a major bug in the later part of previous series
> mostly in the bisect-next which actually caused delays. I think you
> have fixed it in your commit 682d0bff0. Although I would need to have
> a closer look at it. In original series, I did get a sigserv, and as
> you mention it in the commit that you have fixed it.

Yes, I also meant to tidy that up by removing some, now
redundant, initialisation later in that function.

Note, that wasn't the only bug! (I have probably forgotten
some of them, but look at 964f4e2b0, for example).

ATB,
Ramsay Jones



Re: [idea] File history tracking hints

2017-10-02 Thread Junio C Hamano
Stefan Beller  writes:

> I have rethought about the idea of GUIDs as proposed by Jeff and wanted
> to give a reply. After rereading this message, I think my thoughts are
> already included via:
>
>   - you're doing the work at the wrong point for _another_ reason. You're
>  freezing your (crappy) algorithm at tree creation time, and basically
>  making it pointless to ever create something better later, because even
>  if hardware and software improves, you've codified that "we have to
>  have crappy information".
>
> --
> My design proposal for these "rename hints" would be a special trailer,
> roughly:
>
> Rename: LICENSE -> legal.txt
> Rename: t/* -> tests/*
>
> or more generally:
>
> Rename:   

Yes, it is a non starter to have that baked in the log message of a
commit object.  The principle Linus lays out in the message does not
reject such hints stored outside baked-in data structure, which
allows mistakes to be corrected without affecting the real history,
though.

Another thing that makes what you wrote above of dubious value is
that it attaches such hints to "a commit" (whether baked inside the
log message, or as some form of "notes" that can be associated with
a specific commit); it adds hints at a wrong place.

Given identical pair of trees  that are wrapped in two pairs of
commits  and  where A^{tree}=B^{tree} and A^^{tree}=B^^{tree},
we do not want to have to give duplicated hints for A and B, to help
"git show A" and "git show B" to behave the same.

Rather, if we said "these two blobs A and B are similar and we want
diffcore-rename to pair them, no matter where they appear in any two
trees", then "git diff -M X Y", where X and Y may not have any
ancestry relationship (they may not even be commits) can be told
that the blob A that is in tree X and the blob B that is in tree Y
are renames or copies, no matter where in these trees the pair of
blobs appear, and no matter how X and Y are related (or unrelated)
in the history.

That is a bigger reason why annotating a commit may be a bad way to
go.


Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Ann T Ropea wrote:

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: Ann T Ropea 
> Acked-by: Jonathan Nieder 
> ---
> v2: rename patch, correct Signed-off-by line, add Acked-by line
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for the quick turnaround.

Sincerely,
Jonathan


Re: [PATCH v2] setup: update error message to be more meaningful

2017-10-02 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Incorrect case,
>
> $ git grep "some random regex" -n
> fatal: bad flag '-n' used after filename
>
> The above case is incorrect as "some random regex" isn't a filename
> in this case.

The command line rule is to have dashed options first and then other
arguments, so I agree that "option '-n' used after non-option
argument(s)" would be a better alternative.

"grep" is an oddball, as it allows you to be lazy and omit the "-e"
option when there is only one pattern, making a perfectly reasonable
"grep -e regex -n" into an invalid "grep regex -n".

As an aside, I wonder if we want to _() the message.  It's outside
the scope of this fix, obviously.

>  setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/setup.c b/setup.c
> index 860507e1f..09c793282 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -230,7 +230,7 @@ void verify_filename(const char *prefix,
>int diagnose_misspelt_rev)
>  {
>   if (*arg == '-')
> - die("bad flag '%s' used after filename", arg);
> + die("option '%s' must come before non-option arguments", arg);
>   if (looks_like_pathspec(arg) || check_filename(prefix, arg))
>   return;
>   die_verify_filename(prefix, arg, diagnose_misspelt_rev);


Re: [PATCH v2] branch: change the error messages to be more meaningful

2017-10-02 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> I was recently searching to find the patches have gone missing in to
> the void for no obvious reason and found this. Should I consider this
> to be "Dropped" in terms of the "What's cooking" emails? or has this
> just not received the required attention?

I do not even recall what the patches did and if I thought what they
wanted to do made sense, so I wouldn't be surprised if I did not
pick them up, after seeing nobody's commenting on it.



Dearest

2017-10-02 Thread Asana Hajraf
Dearest,
I am Mrs. Asana Hajraf and I am married to Mr. Hassan Hajraf from kuwait for 19 
years without a child and my husband died in 2014. I am contacting you to let 
you know my desire to donate the sum of $4.5 million to charities in your 
country which I inherited from my late husband. Due to my illness of cancer 
were confirmed out of just eight months, so it is my desire to see that this 
money is invested to any organization of your choice and distributed each year 
among the charity organizations, motherless babies home, schools and support 
for the men and women homeless or what you may consider to be the benefit of 
those less fortunate. I do not want this fund to be invested in a manner 
without God. As soon as I receive your reply confirming your acceptance of the 
work I tell you, I will give all relevant information to authorize the release 
and transfer the money to you as my duly appointed representative.
Thanks,
Mrs Asana Hajraf.


Re: [PATCH v2] request-pull: capitalise "Git" to make it a proper noun

2017-10-02 Thread Ann T Ropea

Of the many ways to spell the three-letter word, the variant "Git"
should be used when referring to a repository in a description; or, in
general, when it is used as a proper noun.

We thus change the pull-request template message so that it reads

   "...in the Git repository at:"

Besides, this brings us in line with the documentation, see
Documentation/howto/using-signed-tag-in-pull-request.txt

Signed-off-by: Ann T Ropea 
Acked-by: Jonathan Nieder 
---
v2: rename patch, correct Signed-off-by line, add Acked-by line
 git-request-pull.sh | 2 +-
 t/t5150-request-pull.sh | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/git-request-pull.sh b/git-request-pull.sh
index eebd33276da9..13c172bd94fc 100755
--- a/git-request-pull.sh
+++ b/git-request-pull.sh
@@ -128,7 +128,7 @@ git show -s --format='The following changes since  
commit %H:


   %s (%ci)

-are available in the git repository at:
+are available in the Git repository at:
 ' $merge_base &&
 echo "  $url $pretty_remote" &&
 git show -s --format='
diff --git a/t/t5150-request-pull.sh b/t/t5150-request-pull.sh
index 82c33b88e710..08c210f03586 100755
--- a/t/t5150-request-pull.sh
+++ b/t/t5150-request-pull.sh
@@ -68,7 +68,7 @@ test_expect_success 'setup: two scripts for reading  
pull requests' '

cat <<-\EOT >read-request.sed &&
#!/bin/sed -nf
# Note that a request could ask for "tag $tagname"
-   / in the git repository at:$/!d
+   / in the Git repository at:$/!d
n
/^$/ n
s/ tag \([^ ]*\)$/ tag--\1/
@@ -192,7 +192,7 @@ test_expect_success 'pull request format' '

  SUBJECT (DATE)

-   are available in the git repository at:
+   are available in the Git repository at:

  URL BRANCH

--
2.13.6



Re: What means "git config bla ~/"?

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

>> what's with that "git config bla ~/"? is this some config keyword
>> or something?
> ...
>
> I agree with you that it is less clear than it could be.  Ideas for
> clarifying it?

Yeah, if it were

git config section.var ~/some/thing

that would be far clearer than "bla" that what we see is a mere
placeholder.

Another obvious option is to remove the parenthetical comment.

Or "(but you can give values without quoting and let your shell
expand it)", without a confusing sample command that proved not to
help very much.





Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Junio C Hamano
Jonathan Nieder  writes:

> The above does a nice job of explaining
>
>  - what this change is going to do
>  - how it's good for the internal code structure / maintainability
>
> What it doesn't tell me about is why the user-facing effect won't
> cause problems.  Is there no atom where %(atom:) was previously
> accepted and did something meaningful that this may break?

That is, was there any situation where %(atom) and %(atom:) did two
differnt things and their differences made sense?

> Looking at the manpage and code, I don't see any, so for what it's
> worth, this is
>
> Reviewed-by: Jonathan Nieder 
>
> but for next time, please remember to discuss regression risk in
> the commit message, too.

Yes, I agree that it is necessary to make sure somebody looked at
the issue _and_ record the fact that it happened.  Thanks for doing
that already ;-)

I also took a look at the code and currently we seem to abort,
either with "unrecognised arg" (e.g. "refname:") or "does not take
args" (e.g. "body"), so we should be OK, I'd think.


Re: [PATCH 0/3] Comment fixes

2017-10-02 Thread Brandon Williams
On 09/26, Han-Wen Nienhuys wrote:
> follow more commit log conventions; verified it compiled (yay).
> 
> (should I send patches that are in 'pu' again as well?)
> 
> Han-Wen Nienhuys (3):
>   real_path: clarify return value ownership
>   read_gitfile_gently: clarify return value ownership.
>   string-list.h: move documentation from Documentation/api/ into header

This version looks good to me.

> 
>  Documentation/technical/api-string-list.txt | 209 
> 
>  abspath.c   |   4 +
>  setup.c |   3 +-
>  string-list.h   | 192 +
>  4 files changed, 168 insertions(+), 240 deletions(-)
>  delete mode 100644 Documentation/technical/api-string-list.txt
> 
> --
> 2.14.1.821.g8fa685d3b7-goog

-- 
Brandon Williams


Re: [PATCH v2] oidmap: map with OID as key

2017-10-02 Thread Brandon Williams
On 09/29, Jonathan Tan wrote:
> This is similar to using the hashmap in hashmap.c, but with an
> easier-to-use API. In particular, custom entry comparisons no longer
> need to be written, and lookups can be done without constructing a
> temporary entry structure.
> 
> This is implemented as a thin wrapper over the hashmap API. In
> particular, this means that there is an additional 4-byte overhead due
> to the fact that the first 4 bytes of the hash is redundantly stored.
> For now, I'm taking the simpler approach, but if need be, we can
> reimplement oidmap without affecting the callers significantly.
> 
> oidset has been updated to use oidmap.
> 
> Signed-off-by: Jonathan Tan 
> ---
> Some replies to v1 [1] [2] seem to indicate that simpler non-duplicated
> code should be preferred over optimizing away the storage of the 4-byte
> hash code, and I have no objection to that, so I have updated this code
> to be a thin wrapper over hashmap with the 4-byte overhead.
> 
> After this patch, if the 4-byte overhead is found to be too much, we can
> migrate to something similar to v1 relatively easily.
> 
> I decided not to go with the util pointer method because we will not be
> able to migrate away from it so easily if need be.

This makes me a bit sad because I tend to lean more towards making
things simpler.  I'm still a supporter of the 'util' pointer but I
understand why we would choose otherwise.

> 
> [1] https://public-inbox.org/git/xmqqr2ur348z@gitster.mtv.corp.google.com/
> [2] 
> https://public-inbox.org/git/20170929192624.4ukvpjujgiyzg...@sigill.intra.peff.net/
> ---
>  Makefile |  1 +
>  fetch-pack.c |  2 +-
>  oidmap.c | 51 +
>  oidmap.h | 68 
> 
>  oidset.c | 36 +++-
>  oidset.h |  6 --
>  6 files changed, 133 insertions(+), 31 deletions(-)
>  create mode 100644 oidmap.c
>  create mode 100644 oidmap.h
> 
> diff --git a/Makefile b/Makefile
> index ed4ca438b..64136dde4 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -821,6 +821,7 @@ LIB_OBJS += notes-cache.o
>  LIB_OBJS += notes-merge.o
>  LIB_OBJS += notes-utils.o
>  LIB_OBJS += object.o
> +LIB_OBJS += oidmap.o
>  LIB_OBJS += oidset.o
>  LIB_OBJS += packfile.o
>  LIB_OBJS += pack-bitmap.o
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 105506e9a..008b25d3d 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -611,7 +611,7 @@ static int tip_oids_contain(struct oidset *tip_oids,
>* add to "newlist" between calls, the additions will always be for
>* oids that are already in the set.
>*/
> - if (!tip_oids->map.tablesize) {
> + if (!tip_oids->map.map.tablesize) {
>   add_refs_to_oidset(tip_oids, unmatched);
>   add_refs_to_oidset(tip_oids, newlist);
>   }
> diff --git a/oidmap.c b/oidmap.c
> new file mode 100644
> index 0..6db4fffcd
> --- /dev/null
> +++ b/oidmap.c
> @@ -0,0 +1,51 @@
> +#include "cache.h"
> +#include "oidmap.h"
> +
> +static int cmpfn(const void *hashmap_cmp_fn_data,
> +  const void *entry, const void *entry_or_key,
> +  const void *keydata)
> +{
> + const struct oidmap_entry *entry_ = entry;
> + if (keydata)
> + return oidcmp(_->oid, (const struct object_id *) keydata);
> + return oidcmp(_->oid,
> +   &((const struct oidmap_entry *) entry_or_key)->oid);
> +}
> +
> +static int hash(const struct object_id *oid)
> +{
> + int hash;
> + memcpy(, oid->hash, sizeof(hash));
> + return hash;
> +}
> +
> +void oidmap_init(struct oidmap *map, size_t initial_size)
> +{
> + hashmap_init(>map, cmpfn, NULL, initial_size);
> +}
> +
> +void oidmap_free(struct oidmap *map, int free_entries)
> +{
> + if (!map)
> + return;
> + hashmap_free(>map, free_entries);
> +}
> +
> +void *oidmap_get(const struct oidmap *map, const struct object_id *key)
> +{
> + return hashmap_get_from_hash(>map, hash(key), key);
> +}
> +
> +void *oidmap_remove(struct oidmap *map, const struct object_id *key)
> +{
> + struct hashmap_entry entry;
> + hashmap_entry_init(, hash(key));
> + return hashmap_remove(>map, , key);
> +}
> +
> +void *oidmap_put(struct oidmap *map, void *entry)
> +{
> + struct oidmap_entry *to_put = entry;
> + hashmap_entry_init(_put->internal_entry, hash(_put->oid));
> + return hashmap_put(>map, to_put);
> +}
> diff --git a/oidmap.h b/oidmap.h
> new file mode 100644
> index 0..18f54cde1
> --- /dev/null
> +++ b/oidmap.h
> @@ -0,0 +1,68 @@
> +#ifndef OIDMAP_H
> +#define OIDMAP_H
> +
> +#include "hashmap.h"
> +
> +/*
> + * struct oidmap_entry is a structure representing an entry in the hash 
> table,
> + * which must be used as first member of user data structures.
> + *
> + * Users should set the oid field. oidmap_put() will populate the
> + * internal_entry field.
> + */
> 

Security of .git/config and .git/hooks

2017-10-02 Thread Jonathan Nieder
Hi,

This topic has been mentioned on this mailing list before but I had
trouble finding a relevant reference.  Links welcome.

Suppose that I add the following to .git/config in a repository on a
shared computer:

[pager]
log = rm -fr /
fsck = rm -fr /

("rm -fr /" is of course a placeholder here.)

I then tell a sysadmin that git commands are producing strange output
and I am having trouble understanding what is going on.  They may run
"git fsck" or "git log"; in either case, the output is passed to the
pager I configured, allowing me to run an arbitrary command using the
sysadmin's credentials.

You might say that this is the sysadmin's fault, that they should have
read through .git/config before running any Git commands.  But I don't
find it so easy to blame them.

A few related cases that might not seem so dated:

 1. I put my repository in a zip file and ask a Git expert to help me
recover data from it, or

 2. My repository is in a shared directory on NFS.  Unless the
administrator setting that up is very careful, it is likely that
the least privileged user with write access to .git/config or
.git/hooks/ may be someone that I don't want to be able to run
arbitrary commands on behalf of the most privileged user working
in that repository.

A similar case to compare to is Linux's "perf" tool, which used to
respect a .perfconfig file from the current working directory.
Fortunately, nowadays "perf" only respects ~/.perfconfig and
/etc/perfconfig.

Proposed fix: because of case (1), I would like a way to tell Git to
stop trusting any files in .git.  That is:

 1. Introduce a (configurable) list of "safe" configuration items that
can be set in .git/config and don't respect any others.

 2. But what if I want to set a different pager per-repository?
I think we could do this using configuration "profiles".
My ~/.config/git/profiles/ directory would contain git-style
config files for repositories to include.  Repositories could
then contain

[include]
path = ~/.config/git/profiles/fancy-log-pager

to make use of those settings.  The facility (1) would
special-case this directory to allow it to set "unsafe" settings
since files there are assumed not to be under the control of an
attacker.

 3. Likewise for hooks: my ~/.config/git/hooks/ directory would
contain hooks for repositories to make use of.  Repositories could
symlink to hook files from there to make use of them.

That would allow the configured hooks in ~/.config/git/hooks/ to
be easy to find and to upgrade in one place.

To help users migrate, when a hook is present and executable in
.git/hooks/, Git would print instructions for moving it to
~/.config/git/hooks/ and replacing it with a symlink after
inspecting it.

For backward compatibility, this facility would be controlled by a
global configuration setting.  If that setting is not enabled, then
the current, less safe behavior would remain.

One downside of (3) is its reliance on symlinks.  Some alternatives:

 3b. Use core.hooksPath configuration instead.  Rely on (2).
 3c. Introduce new hook.* configuration to be used instead of hook
 scripts.  Rely on (2).

Thoughts?
Jonathan


Re: [PATCH v6 7/7] ref-filter.c: parse trailers arguments with %(contents) atom

2017-10-02 Thread Junio C Hamano
Taylor Blau  writes:

> Thank you for pointing this out. I should have run "make test" on this
> patch set (or, as you suggested, `git rebase -x "make test" HEAD~7`)
> before sending it out. I appreciate you catching my mistake, and I'll
> make sure to run "make test" more diligently in the future :-).
>
> It sounds like Junio picked this up while queueing.

"Yup" in the sense that "you do not have to worry about it unless
there are other things you want to fix by rerolling" but I didn't
spot it myself and noticed the breakage; I tweaked it only after
Peff pointed out what is wrong.


Re: [PATCH 00/24] object_id part 10

2017-10-02 Thread Stefan Beller
On Sun, Oct 1, 2017 at 3:08 PM, brian m. carlson
 wrote:
> This is the tenth in a series of patches to convert from unsigned char
> [20] to struct object_id.  This series mostly involves changes to the
> refs code.  After these changes, there are almost no references to
> unsigned char in the main refs code.
>
> I've tried to update the code comments as I've touched them, but if I've
> missed any, please point them out.  A big thank-you goes out to René
> Scharfe, who did the work to make resolve_ref_unsafe optionally take a
> NULL pointer; this dramatically reduced the size of one of the patches,
> and allowed a few others to be dropped during the rebase.
>
> I apologize for the unintended bounced emails that occurred in late
> August; my mail server (previously hosted at home) got knocked offline
> because I was without Internet for over a week after Hurricane Harvey.
> I've since relocated mail to New York, so that shouldn't happen again.

I hope you are ok. (It's not just mail servers that don't like water, but
other personal belongings, too)

>
> This series is available from the following URL:
> https://github.com/bk2204/git.git object-id-part10
>

The whole series has been
Reviewed-by: Stefan Beller 


Re: [PATCH 00/24] object_id part 10

2017-10-02 Thread Brandon Williams
On 10/01, brian m. carlson wrote:
> This is the tenth in a series of patches to convert from unsigned char
> [20] to struct object_id.  This series mostly involves changes to the
> refs code.  After these changes, there are almost no references to
> unsigned char in the main refs code.
> 
> I've tried to update the code comments as I've touched them, but if I've
> missed any, please point them out.  A big thank-you goes out to René
> Scharfe, who did the work to make resolve_ref_unsafe optionally take a
> NULL pointer; this dramatically reduced the size of one of the patches,
> and allowed a few others to be dropped during the rebase.

Aside from a few unimportant style nits the series looks good to me!

-- 
Brandon Williams


Re: [PATCH 15/24] refs: convert read_ref_at to struct object_id

2017-10-02 Thread Brandon Williams
On 10/01, brian m. carlson wrote:
> Convert the callers and internals, including struct read_ref_at_cb, of
> read_ref_at to use struct object_id.
> 
> Signed-off-by: brian m. carlson 
> ---
>  builtin/show-branch.c |  5 ++---
>  refs.c| 34 +-
>  refs.h|  2 +-
>  sha1_name.c   |  3 +--
>  4 files changed, 21 insertions(+), 23 deletions(-)
> 
> diff --git a/builtin/show-branch.c b/builtin/show-branch.c
> index 722a7f4bec..8ef8ad10c5 100644
> --- a/builtin/show-branch.c
> +++ b/builtin/show-branch.c
> @@ -731,7 +731,7 @@ int cmd_show_branch(int ac, const char **av, const char 
> *prefix)
>   /* Ah, that is a date spec... */
>   timestamp_t at;
>   at = approxidate(reflog_base);
> - read_ref_at(ref, flags, at, -1, oid.hash, NULL,
> + read_ref_at(ref, flags, at, -1, , NULL,
>   NULL, NULL, );
>   }
>   }
> @@ -743,8 +743,7 @@ int cmd_show_branch(int ac, const char **av, const char 
> *prefix)
>   timestamp_t timestamp;
>   int tz;
>  
> - if (read_ref_at(ref, flags, 0, base+i, oid.hash, 
> ,
> - , , NULL)) {
> + if (read_ref_at(ref, flags, 0, base + i, , , 
> , , NULL)) {

This line maybe got too long?

> diff --git a/sha1_name.c b/sha1_name.c
> index 7de12743f3..f0ec3f0454 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -697,8 +697,7 @@ static int get_oid_basic(const char *str, int len, struct 
> object_id *oid,
>   return -1;
>   }
>   }
> - if (read_ref_at(real_ref, flags, at_time, nth, oid->hash, NULL,
> - _time, _tz, _cnt)) {
> + if (read_ref_at(real_ref, flags, at_time, nth, oid, NULL, 
> _time, _tz, _cnt)) {
>   if (!len) {
>   if (starts_with(real_ref, "refs/heads/")) {
>   str = real_ref + 11;

This one too.

-- 
Brandon Williams


Re: [PATCH 13/24] builtin/pack-objects: convert to struct object_id

2017-10-02 Thread Stefan Beller
> @@ -2520,17 +2521,17 @@ static void read_object_list_from_stdin(void)
> continue;
> }
> if (line[0] == '-') {
> -   if (get_sha1_hex(line+1, sha1))
> +   if (get_oid_hex(line+1, ))
> die("expected edge sha1, got garbage:\n %s",
> line);
...

> -   if (get_sha1_hex(line, sha1))
> +   if (parse_oid_hex(line, , ))
> die("expected sha1, got garbage:\n %s", line);

Ideally we do not forget about converting the die() calls, too.


Re: [PATCH] PR msg: capitalise "Git" to make it a proper noun

2017-10-02 Thread Jonathan Nieder
Hi,

Thanks for working to improve Git!

Bedhanger wrote:

> Subject: [PATCH] PR msg: capitalise "Git" to make it a proper noun

nit: What is a PR msg?  Looking with "git log git-request-pull.sh",
I see that previous patches called the subsystem request-pull, so
this could be

request-pull: capitalise "Git" to make it a proper noun

> Of the many ways to spell the three-letter word, the variant "Git"
> should be used when referring to a repository in a description; or, in
> general, when it is used as a proper noun.
>
> We thus change the pull-request template message so that it reads
>
>"...in the Git repository at:"
>
> Besides, this brings us in line with the documentation, see
> Documentation/howto/using-signed-tag-in-pull-request.txt
>
> Signed-off-by: bedhanger 

Please use your full name in the Signed-off-by line.  See
Documentation/SubmittingPatches section "(5) Certify your work" for
why we ask for this.

> ---
>  git-request-pull.sh | 2 +-
>  t/t5150-request-pull.sh | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)

The patch itself looks good, and I like the change it makes to the
email generated by git request-pull.  Looking forward to seeing what
other changes you come up with in the future.

Thanks and hope that helps,
Jonathan


Re: [PATCH 04/24] refs: convert update_ref and refs_update_ref to use struct object_id

2017-10-02 Thread Brandon Williams
On 10/01, brian m. carlson wrote:
> Convert update_ref, refs_update_ref, and write_pseudoref to use struct
> object_id.  Update the existing callers as well.  Remove update_ref_oid,
> as it is no longer needed.
> 
> Signed-off-by: brian m. carlson 
> ---
>  bisect.c  |  6 --
>  builtin/am.c  | 14 +++---
>  builtin/checkout.c|  3 +--
>  builtin/clone.c   | 14 +++---
>  builtin/merge.c   | 13 ++---
>  builtin/notes.c   | 10 +-
>  builtin/pull.c|  2 +-
>  builtin/reset.c   |  4 ++--
>  builtin/update-ref.c  |  2 +-
>  notes-cache.c |  3 +--
>  notes-utils.c |  2 +-
>  refs.c| 39 ---
>  refs.h|  5 +
>  sequencer.c   |  9 +++--
>  t/helper/test-ref-store.c | 10 +-
>  transport-helper.c|  3 ++-
>  transport.c   |  4 ++--
>  17 files changed, 65 insertions(+), 78 deletions(-)
> 
> diff --git a/bisect.c b/bisect.c
> index 96beeb5d13..e8470a2e0f 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -685,11 +685,13 @@ static int bisect_checkout(const struct object_id 
> *bisect_rev, int no_checkout)
>   char bisect_rev_hex[GIT_MAX_HEXSZ + 1];
>  
>   memcpy(bisect_rev_hex, oid_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
> - update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev->hash, NULL, 0, 
> UPDATE_REFS_DIE_ON_ERR);
> + update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev, NULL, 0,
> +UPDATE_REFS_DIE_ON_ERR);
>  
>   argv_checkout[2] = bisect_rev_hex;
>   if (no_checkout) {
> - update_ref(NULL, "BISECT_HEAD", bisect_rev->hash, NULL, 0, 
> UPDATE_REFS_DIE_ON_ERR);
> + update_ref(NULL, "BISECT_HEAD", bisect_rev, NULL, 0,
> +UPDATE_REFS_DIE_ON_ERR);
>   } else {
>   int res;
>   res = run_command_v_opt(argv_checkout, RUN_GIT_CMD);
> diff --git a/builtin/am.c b/builtin/am.c
> index d7513f5375..32120f42df 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -1068,8 +1068,8 @@ static void am_setup(struct am_state *state, enum 
> patch_format patch_format,
>   if (!get_oid("HEAD", _head)) {
>   write_state_text(state, "abort-safety", oid_to_hex(_head));
>   if (!state->rebasing)
> - update_ref_oid("am", "ORIG_HEAD", _head, NULL, 0,
> - UPDATE_REFS_DIE_ON_ERR);
> + update_ref("am", "ORIG_HEAD", _head, NULL, 0,
> +UPDATE_REFS_DIE_ON_ERR);
>   } else {
>   write_state_text(state, "abort-safety", "");
>   if (!state->rebasing)
> @@ -1686,8 +1686,8 @@ static void do_commit(const struct am_state *state)
>   strbuf_addf(, "%s: %.*s", reflog_msg, linelen(state->msg),
>   state->msg);
>  
> - update_ref_oid(sb.buf, "HEAD", , old_oid, 0,
> - UPDATE_REFS_DIE_ON_ERR);
> + update_ref(sb.buf, "HEAD", , old_oid, 0,
> +UPDATE_REFS_DIE_ON_ERR);
>  
>   if (state->rebasing) {
>   FILE *fp = xfopen(am_path(state, "rewritten"), "a");
> @@ -2147,9 +2147,9 @@ static void am_abort(struct am_state *state)
>   clean_index(_head, _head);
>  
>   if (has_orig_head)
> - update_ref_oid("am --abort", "HEAD", _head,
> - has_curr_head ? _head : NULL, 0,
> - UPDATE_REFS_DIE_ON_ERR);
> + update_ref("am --abort", "HEAD", _head,
> +has_curr_head ? _head : NULL, 0,
> +UPDATE_REFS_DIE_ON_ERR);
>   else if (curr_branch)
>   delete_ref(NULL, curr_branch, NULL, REF_NODEREF);
>  
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 3345a0d16f..fd0dec401e 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -664,8 +664,7 @@ static void update_refs_for_switch(const struct 
> checkout_opts *opts,
>   if (!strcmp(new->name, "HEAD") && !new->path && !opts->force_detach) {
>   /* Nothing to do. */
>   } else if (opts->force_detach || !new->path) {  /* No longer on any 
> branch. */
> - update_ref(msg.buf, "HEAD", new->commit->object.oid.hash, NULL,
> -REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
> + update_ref(msg.buf, "HEAD", >commit->object.oid, NULL, 
> REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
>   if (!opts->quiet) {
>   if (old->path &&
>   advice_detached_head && !opts->force_detach)
> diff --git a/builtin/clone.c b/builtin/clone.c
> index dbddd98f80..4135621aa3 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -610,8 +610,8 @@ static void write_followtags(const struct ref *refs, 
> const char *msg)
>   continue;

Re: [PATCH 06/24] Convert check_connected to use struct object_id

2017-10-02 Thread Stefan Beller
On Sun, Oct 1, 2017 at 3:08 PM, brian m. carlson
 wrote:
> Convert check_connected and the callbacks it takes to use struct
> object_id.
>

> diff --git a/connected.c b/connected.c
> index f416b05051..4a47f33270 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -16,13 +16,13 @@
>   *
>   * Returns 0 if everything is connected, non-zero otherwise.
>   */
> -int check_connected(sha1_iterate_fn fn, void *cb_data,
> +int check_connected(oid_iterate_fn fn, void *cb_data,
> struct check_connected_options *opt)
>  {
> struct child_process rev_list = CHILD_PROCESS_INIT;
> struct check_connected_options defaults = CHECK_CONNECTED_INIT;
> -   char commit[41];
> -   unsigned char sha1[20];
> +   char commit[GIT_MAX_HEXSZ + 1];
> +   struct object_id oid;
> int err = 0;
> struct packed_git *new_pack = NULL;
> struct transport *transport;

> @@ -77,7 +77,7 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
>
> sigchain_push(SIGPIPE, SIG_IGN);
>
> -   commit[40] = '\n';
> +   commit[GIT_SHA1_HEXSZ] = '\n';

While we are using SHA1, this and below is correctly using
GIT_SHA1_HEXSZ, but the array is defined as GIT_MAX_HEXSZ.

Upon switching the hash function, we would plug in the
GIT_NEWHASH_HEXSZ here, or if we do it dynamically
(using a vtable for hash functions, to switch at run time)
we'd need to make the decision based on the hash function.

Makes sense.

> do {
> /*
>  * If index-pack already checked that:
> @@ -87,17 +87,17 @@ int check_connected(sha1_iterate_fn fn, void *cb_data,
>  * are sure the ref is good and not sending it to
>  * rev-list for verification.
>  */
> -   if (new_pack && find_pack_entry_one(sha1, new_pack))
> +   if (new_pack && find_pack_entry_one(oid.hash, new_pack))
> continue;
>
> -   memcpy(commit, sha1_to_hex(sha1), 40);
> -   if (write_in_full(rev_list.in, commit, 41) < 0) {
> +   memcpy(commit, oid_to_hex(), GIT_SHA1_HEXSZ);
> +   if (write_in_full(rev_list.in, commit, GIT_SHA1_HEXSZ + 1) < 
> 0) {
> if (errno != EPIPE && errno != EINVAL)
> error_errno(_("failed write to rev-list"));
> err = -1;
> break;
> }


Re: Updated to v2.14.2 on macOS; git add --patch broken

2017-10-02 Thread Jonathan Nieder
Hi,

Toni Uebernickel wrote:

> I updated to git version v2.14.2 on macOS using homebrew.
>
> Since then `git add --patch` and `git stash save --patch` are not
> working anymore. It's just printing the complete diff without ever
> stopping to ask for actions. This results in an unusable state, as
> the whole command option is rendered useless.

Would a patch like the following help?

I am worried that other scripts using diff-files would need the same
kind of patch.  So it seems worthwhile to look for alternatives.

An alternative would be to partially roll back v2.14.2~61^2~4 (color:
check color.ui in git_default_config, 2017-07-13) by making it not
take effect for plumbing commands (i.e., by adding a boolean to
"struct startup_info" to indicate whether a command is low-level
plumbing).  That would make the behavior of Git harder to explain so I
don't particularly like it.  Plus it defeats the point of the patch.

Yet another alternative would be to treat color.ui=always as a
deprecated synonym for color.ui=auto.  I think that's my preferred
fix.

What do you think?

Thanks again for reporting,
Jonathan

diff --git i/git-add--interactive.perl w/git-add--interactive.perl
index 28b325d754..4ea69538c7 100755
--- i/git-add--interactive.perl
+++ w/git-add--interactive.perl
@@ -101,49 +101,49 @@ sub apply_patch_for_stash;
 
 my %patch_modes = (
'stage' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'file-only',
IS_REVERSE => 0,
},
'stash' => {
-   DIFF => 'diff-index -p HEAD',
+   DIFF => 'diff-index --no-color -p HEAD',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => undef,
IS_REVERSE => 0,
},
'reset_head' => {
-   DIFF => 'diff-index -p --cached',
+   DIFF => 'diff-index --no-color -p --cached',
APPLY => sub { apply_patch 'apply -R --cached', @_; },
APPLY_CHECK => 'apply -R --cached',
FILTER => 'index-only',
IS_REVERSE => 1,
},
'reset_nothead' => {
-   DIFF => 'diff-index -R -p --cached',
+   DIFF => 'diff-index --no-color -R -p --cached',
APPLY => sub { apply_patch 'apply --cached', @_; },
APPLY_CHECK => 'apply --cached',
FILTER => 'index-only',
IS_REVERSE => 0,
},
'checkout_index' => {
-   DIFF => 'diff-files -p',
+   DIFF => 'diff-files --no-color -p',
APPLY => sub { apply_patch 'apply -R', @_; },
APPLY_CHECK => 'apply -R',
FILTER => 'file-only',
IS_REVERSE => 1,
},
'checkout_head' => {
-   DIFF => 'diff-index -p',
+   DIFF => 'diff-index --no-color -p',
APPLY => sub { apply_patch_for_checkout_commit '-R', @_ },
APPLY_CHECK => 'apply -R',
FILTER => undef,
IS_REVERSE => 1,
},
'checkout_nothead' => {
-   DIFF => 'diff-index -R -p',
+   DIFF => 'diff-index --no-color -R -p',
APPLY => sub { apply_patch_for_checkout_commit '', @_ },
APPLY_CHECK => 'apply',
FILTER => undef,


Re: hooks are ignored if there are not marked as executable

2017-10-02 Thread Jonathan Nieder
Hi Damien,

Damien wrote:

> If you do `echo my_script > .git/hooks/pre-commit` and then `git commit`,
> The hook is just gonna be ignored.
> But if you do `chmod +x .git/hooks/pre-commit`, then it's executed.

This is intentional.

> I think ignoring a hook is misleading and not newbie friendly, an error
> message to signal an incorrectly configured hook could be better.

Adding a warning sounds like a nice change.  Care to propose a patch?

In the early days, sample hooks were installed without .sample in
their filename and could be enabled by "chmod +x"-ing them.  Because
of this, long-time users of Git may find themselves experiencing this
warning more often than they'd like.  That could be acceptable if the
warning shows a command to run to prevent the warning from showing up
again, though (see advice.c for some examples of how that can be
done).

The main code path to look at is run-command.c::find_hook.

"git grep -e 'rev-parse --git-path hooks' -- . ':!contrib'" finds a
few other code paths you may also want to look at.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Jonathan Nieder
Hi,

Taylor Blau wrote:

> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
>
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
>
> Let's fix this by assuming that atom parser implementations don't care
> about distinguishing between the empty string "%(refname:)" and no
> sub-arguments "%(refname)".
>
> Signed-off-by: Taylor Blau 
> ---
>  ref-filter.c| 10 +-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

The above does a nice job of explaining

 - what this change is going to do
 - how it's good for the internal code structure / maintainability

What it doesn't tell me about is why the user-facing effect won't
cause problems.  Is there no atom where %(atom:) was previously
accepted and did something meaningful that this may break?

Looking at the manpage and code, I don't see any, so for what it's
worth, this is

Reviewed-by: Jonathan Nieder 

but for next time, please remember to discuss regression risk in
the commit message, too.

Thanks,
Jonathan


Re: [PATCH 04/24] refs: convert update_ref and refs_update_ref to use struct object_id

2017-10-02 Thread Stefan Beller
> diff --git a/bisect.c b/bisect.c
> index 96beeb5d13..e8470a2e0f 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -685,11 +685,13 @@ static int bisect_checkout(const struct object_id 
> *bisect_rev, int no_checkout)
> char bisect_rev_hex[GIT_MAX_HEXSZ + 1];
>
> memcpy(bisect_rev_hex, oid_to_hex(bisect_rev), GIT_SHA1_HEXSZ + 1);
> -   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev->hash, NULL, 0, 
> UPDATE_REFS_DIE_ON_ERR);
> +   update_ref(NULL, "BISECT_EXPECTED_REV", bisect_rev, NULL, 0,
> +  UPDATE_REFS_DIE_ON_ERR);

The number of characters decrease, yet the line gets an additional
line break. While I don't mind this, the most interesting question that
comes to mind is whether you tried the new clang formatting options
in tree to adapt the indentation? ;)

> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -266,12 +266,12 @@ static int reset_refs(const char *rev, const struct 
> object_id *oid)
> if (!get_oid("HEAD", _orig)) {
> orig = _orig;
> set_reflog_message(, "updating ORIG_HEAD", NULL);
> -   update_ref_oid(msg.buf, "ORIG_HEAD", orig, old_orig, 0,
> +   update_ref(msg.buf, "ORIG_HEAD", orig, old_orig, 0,
>UPDATE_REFS_MSG_ON_ERR);
> } else if (old_orig)
> delete_ref(NULL, "ORIG_HEAD", old_orig, 0);
> set_reflog_message(, "updating HEAD", rev);
> -   update_ref_status = update_ref_oid(msg.buf, "HEAD", oid, orig, 0,
> +   update_ref_status = update_ref(msg.buf, "HEAD", oid, orig, 0,
>UPDATE_REFS_MSG_ON_ERR);

At all the other places (of s/update_ref_oid/update_ref/) so far you re-indented
the second line to align. This doesn't align in the first place, so it
shall be good.


Re: git submodule add fails when using both --branch and --depth

2017-10-02 Thread Jonathan Nieder
Hi,

Thadeus Fleming wrote:

> I'm running git 2.14.2 on Ubuntu 16.04.
>
> Compare the behavior of
>
>> git clone --branch pu --depth 1 https://github.com/git/git git-pu
>
> which clones only the latest commit of the pu branch and

Yes.

>> mkdir tmp && cd tmp && git init
>> git submodule add --branch pu --depth 1 https://github.com/git/git \
>  git-pu
>
> which gives the error
>
> fatal: 'origin/pu' is not a commit and a branch 'pu' cannot be created from it
> Unable to checkout submodule 'git-pu'

As a side note (you are using "git submodule add --depth", not "git
submodule update --depth"), I suspect that "submodule update --depth"
may not always do what people expect.

With add --depth, I agree with your expectation and after your fix
everything should work fine.  But with update --depth, consider the
following sequence of steps:

 1. I create a repository "super" with submodule "sub" and publish
both.

 2. I make a couple commits to "sub" and a commit to "super" making
use of those changes and want to publish them.

 3. I use "git push --recurse-submodules" to publish my commits to
"sub" and "super":

a. First it pushes to "sub".

b. Then it pushes to "super".

Between steps 3(a) and 3(b), a person can still "git clone
--recurse-submodules" the "super" repository.  The repository "super"
does not have my change yet and "sub" does, but that is not a problem,
since commands like "git checkout --recurse-submodules" and "git
submodule update" know to check out the commit *before* my change in
the submodule.

But between steps 3(a) and 3(b), "git submodule update --depth=1"
would not work.  It would fetch the submodule with depth 1 and then
try to check out a commit that is deeper in history.

So I think there's more thinking needed there.

That's all a tangent to your report about add --depth, though.

Thanks,
Jonathan


Re: [idea] File history tracking hints

2017-10-02 Thread Jeff Hostetler



On 10/2/2017 3:18 PM, Stefan Beller wrote:

On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler  wrote:


Sorry to re-re-...-re-stir up such an old topic.

I wasn't really thinking about commit-to-commit hints.
I think these have lots of problems.  (If commit A->B does
"t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
then you need a way to compute a transitive closure to see
the net-net hints for A->C.  I think that quickly spirals
out of control.)


I agree. Though as a human I can still look at
A..C giving the hint that t/*.c and xyz/*.c ought to
be taken into account for rename detection.
(which is currently done with -M -C --find-copies-harder
as a generic "there are renamed things", and not the very
specific rule, that may be cheaper to examine compared to
these generic rules)


No, I was going in another direction.  For example, if a
tree-entry contains { file-guid, file-name, file-sha, ... }
then when diffing any 2 commits, you can match up files
(and folders) by their guids.  Renames pop out trivially when
their file-names don't match.  File moves pop out when the
file-guids appear in different trees.  Adds and deletes pop
out when file-guids don't have a peer. (I'm glossing over some
of the details, but you get the idea.)


How do you know when a guid needs adaption?


I'm not sure I know what you mean by "adaption".



(c.f. origin/jt/packmigrate)
If a commit moves a function out of a file into a new file,
the ideal version control could notice that the function
was moved into a new file and still attribute the original
authors by ignoring the move commit.


I think that's an orthogonal problem.  I could move a function
from one file to an existing file or to a new file it doesn't
matter.  Attributing those lines back to the original author
(rather than the mover) is a bit of a pipe dream IMHO.  And I
have to wonder if it is always the correct thing to do?  I can
see scenarios where you'd want the mover.

I guess there's nothing from stopping the "ideal VC system"
doing all this line-based analysis, but that shouldn't make
file renames expensive to detect (since that is the granularity
that people and most tools expect the system to work with).



Another series in flight could have modified that
function slightly (fixed a bug), such that it's hard to
reason about these things.

For guids I imagine the new file gets a new guid, such that
tracking the function becomes harder?



Yeah, I'm not thinking about tracking individual functions.




To address Junio's
question, independently added files with the same name will
have 2 different file-guids.  We amend the merge rules to
handle this case and pick one of them (say, the one that
is sorts less than the other) as the winner and go on.
All-in-all the solution is not trivial (as there are a few
edge cases to deal with), but it better matches the (casual)
user's perception of what happened to their tree over time.


The GUID would be made up at creation time, I assume?
Is there any input other than the file itself? (I assumed so
initially, such that:
   By having a GUID in the tree, we would divorce from the notion
   of a "content addressable file system" quickly, as we both could
   create the same tree locally (containing the same blobs) and
   yet the trees would have different names due to having different
   GUIDs in them
), which I'd find undesirable.


Right.  A real solution would store the guid data slightly
differently so we could preserve the existing SHA properties.
My example was more conceptual.




It also doesn't require expensive code to sniff for renames
on every command (which doesn't scale on really large repos).


I wonder if the rename detection could be offloaded to a server
(which scales) that provides a "hint file" to clients, such that the
clients can then cheaply make use of these specific hints.



I don't know.  Might be easier to add that computation to the
occasional client-side housekeeping (somewhat like the commit
generation number computation we keep talking about).

Thanks
Jeff


Re: git submodule add fails when using both --branch and --depth

2017-10-02 Thread Stefan Beller
On Sat, Sep 30, 2017 at 10:20 AM, Thadeus Fleming
 wrote:
> I'm running git 2.14.2 on Ubuntu 16.04.
>
> Compare the behavior of
>
>> git clone --branch pu --depth 1 https://github.com/git/git git-pu
>
> which clones only the latest commit of the pu branch and
>
>> mkdir tmp && cd tmp && git init
>> git submodule add --branch pu --depth 1 https://github.com/git/git \
>   git-pu
>
> which gives the error
>
> fatal: 'origin/pu' is not a commit and a branch 'pu' cannot be created
> from it
> Unable to checkout submodule 'git-pu'
>
> Investigating further, there is indeed only one commit in the local repo:
>
>> cd git-pu
>> git log --oneline | wc -l
> 1
>
> But that commit is the head of master.
>
>> git branch -a
> * master remotes/origin/master
>   remotes/origin/HEAD -> origin/master
>
> This appears to be because git-submodule--helper does not accept a
> --branch option. Using the --depth N option causes it to only clone N
> commits from the default branch, which generally do not include the
> desired branch. Thus, the next step,
>
> git checkout -f -q -B "$branch" "origin/$branch"
>
> fails, and provides the rather confusing error message above.
>
> I'd suggest that git-submodule--helper learn a --branch option
> consistent with git clone, and if that is impossible, that
> git submodule add rejects the simultaneous use of both the --branch and
> --depth options.

Adding the branch field to the submodule helper is a great idea.

>
>
>
> --tjf


Re: [PATCH] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Jeff King
On Mon, Oct 02, 2017 at 09:12:58AM -0700, Taylor Blau wrote:

> > I know this is getting _really_ subjective, but IMHO this is a lot more
> > reasoning than the comment needs. The commit message goes into the
> > details of the "why", but here I'd have just written something like:
> >
> >   /* treat "%(foo:)" the same as "%(foo)"; i.e., no arguments */
> 
> I sent an updated v2 of this "series" (without a cover-letter) that
> shortens this comment to more or less what you suggested. I've kept the
> commit message longer, since I think that that information is useful
> within "git blame".

Yeah, sorry if I wasn't clear: definitely the commit message is fine and
is the place to go into the detail and rationale.

-Peff


Re: [PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Jeff King
On Mon, Oct 02, 2017 at 09:10:34AM -0700, Taylor Blau wrote:

> Peff points out that different atom parsers handle the empty
> "sub-argument" list differently. An example of this is the format
> "%(refname:)".
> 
> Since callers often use `string_list_split` (which splits the empty
> string with any delimiter as a 1-ary string_list containing the empty
> string), this makes handling empty sub-argument strings non-ergonomic.
> 
> Let's fix this by assuming that atom parser implementations don't care
> about distinguishing between the empty string "%(refname:)" and no
> sub-arguments "%(refname)".
> 
> Signed-off-by: Taylor Blau 
> ---
>  ref-filter.c| 10 +-
>  t/t6300-for-each-ref.sh |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)

This looks good to me, thanks.

-Peff


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-10-02 Thread Jason Cooper
On Fri, Sep 29, 2017 at 10:34:13AM -0700, Jonathan Nieder wrote:
> Junio C Hamano wrote:
> > Jonathan Nieder  writes:
...
> > If it is a goal to eventually be able to lose SHA-1 compatibility
> > metadata from the objects, then we might want to remove SHA-1 based
> > signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
> > commit object) from NewHash contents, and instead have them stored
> > in a side "metadata" table, only to be used while converting back.
> > I dunno if that is desirable.
> 
> I don't consider that desirable.
> 
> A SHA-1 based signature is still of historical interest even if my
> centuries-newer version of Git is not able to verify it.

Agreed, even a signature made by a now exposed and revoked key still has
validity.  Especially in a commit or merge.  We know it was made prior
to the key being compromised / revoked.

This is assuming that the keyholder can definitively say "Don't trust
signatures from this key after this date/time+".  And the signature
in question is in the git history prior to that cut off.

Tags are a different animal because they can be added at any time and
aren't directly incorporated into the history.

thx,

Jason.


Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Jeff King
On Mon, Oct 02, 2017 at 10:18:02AM -0700, Linus Torvalds wrote:

> On Mon, Oct 2, 2017 at 7:00 AM, Jason Cooper  wrote:
> >
> > Ahhh, so if I understand you correctly, you'd prefer SHA-256 over
> > SHA3-256 because it's more performant for your usecase?  Well, that's a
> > completely different animal that cryptographic suitability.
> 
> In almost all loads I've seen, zlib inflate() cost is a bigger deal
> than the crypto load. The crypto people talk about cycles per byte,
> but the deflate code is what usually takes the page faults and cache
> misses etc, and has bad branch prediction. That ends up easily being
> tens or thousands of cycles, even for small data.

If anyone is interested in the user-visible effects of slower crypto, I
think, there are some numbers in 8325e43b82 (Makefile: add DC_SHA1 knob,
2017-03-16). I don't know how SHA-256 compares to sha1dc exactly, but
certainly the latter is a lot slower than normal sha1.

The only real-world case I found with a noticeable slowdown was
index-pack.  Which in the worst case is roughly the same operation as
"git fsck" (inflate and compute the sha1 on every byte), but people tend
to actually do it a lot more often.

And it really _is_ slower for real-world operations; the CPU for
computing the sha1 of an incoming clone of linux.git jumped from ~3
minutes to ~6 minutes.  But I don't think we've seen a lot of
complaints, probably because that time is lumped in with "time to
transfer a gigabyte of data", so unless you're on a slow machine on fast
connection, you don't even really notice.

For day-to-day operations in a repository, I never came up with a good
example where the speed difference mattered. I think Dscho's giant-index
example is an outlier and the right answer there is not "pick a fast
crypto algorithm" but "stop using a slow crypto algorithm as a
checksum" (and also, stop routinely reading and writing 400MB for
day-to-day operations).

-Peff


Re: [PATCH v4] technical doc: add a design doc for hash function transition

2017-10-02 Thread Jason Cooper
Hi Jonathan,

On Wed, Sep 27, 2017 at 09:43:21PM -0700, Jonathan Nieder wrote:
> This document describes what a transition to a new hash function for
> Git would look like.  Add it to Documentation/technical/ as the plan
> of record so that future changes can be recorded as patches.
> 
> Also-by: Brandon Williams 
> Also-by: Jonathan Tan 
> Also-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
> On Thu, Mar 09, 2017 at 11:14 AM, Shawn Pearce wrote:
> > On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder  wrote:
> 
> >> Thanks for the kind words on what had quite a few flaws still.  Here's
> >> a new draft.  I think the next version will be a patch against
> >> Documentation/technical/.
> >
> > FWIW, I like this approach.
> 
> Okay, here goes.
> 
> Instead of sharding the loose object translation tables by first byte,
> we went for a single table.  It simplifies the design and we need to
> keep the number of loose objects under control anyway.
> 
> We also included a description of the transition plan and tried to
> include a summary of what has been agreed upon so far about the choice
> of hash function.
> 
> Thanks to Junio for reviving the discussion and in particular to Dscho
> for pushing this forward and making the missing pieces clearer.
> 
> Thoughts of all kinds welcome, as always.
> 
>  Documentation/Makefile |   1 +
>  .../technical/hash-function-transition.txt | 797 
> +
>  2 files changed, 798 insertions(+)
>  create mode 100644 Documentation/technical/hash-function-transition.txt
> 
...
> diff --git a/Documentation/technical/hash-function-transition.txt 
> b/Documentation/technical/hash-function-transition.txt
> new file mode 100644
> index 00..417ba491d0
> --- /dev/null
> +++ b/Documentation/technical/hash-function-transition.txt
> @@ -0,0 +1,797 @@
> +Git hash function transition
> +
> +
> +Objective
> +-
> +Migrate Git from SHA-1 to a stronger hash function.
> +
...
> +Goals
> +-
> +Where NewHash is a strong 256-bit hash function to replace SHA-1 (see
> +"Selection of a New Hash", below):

Could we clarify and say "a strong hash function with 256-bit output"?

...
> +Overview
> +
> +We introduce a new repository format extension. Repositories with this
> +extension enabled use NewHash instead of SHA-1 to name their objects.
> +This affects both object names and object content --- both the names
> +of objects and all references to other objects within an object are
> +switched to the new hash function.
> +
> +NewHash repositories cannot be read by older versions of Git.
> +
> +Alongside the packfile, a NewHash repository stores a bidirectional
> +mapping between NewHash and SHA-1 object names. The mapping is generated
> +locally and can be verified using "git fsck". Object lookups use this
> +mapping to allow naming objects using either their SHA-1 and NewHash names
> +interchangeably.

nit: Are we presuming that abbreviated hashes won't collide?  Or the
user needs to specify which hash type?

> +Object format
> +~
> +The content as a byte sequence of a tag, commit, or tree object named
> +by sha1 and newhash differ because an object named by newhash-name refers to
> +other objects by their newhash-names and an object named by sha1-name
> +refers to other objects by their sha1-names.
> +
> +The newhash-content of an object is the same as its sha1-content, except
> +that objects referenced by the object are named using their newhash-names
> +instead of sha1-names. Because a blob object does not refer to any
> +other object, its sha1-content and newhash-content are the same.
> +
> +The format allows round-trip conversion between newhash-content and
> +sha1-content.

It would be nice here to explicitly mention deterministic hashing.
Meaning that anyone who converts a commit from sha1 to newhash shall get
the same newhash.

> +
> +Object storage
> +~~
> +Loose objects use zlib compression and packed objects use the packed
> +format described in Documentation/technical/pack-format.txt, just like
> +today. The content that is compressed and stored uses newhash-content
> +instead of sha1-content.
> +
> +Pack index
> +~~
> +Pack index (.idx) files use a new v3 format that supports multiple
> +hash functions. They have the following format (all integers are in
> +network byte order):
> +
> +- A header appears at the beginning and consists of the following:
> +  - The 4-byte pack index signature: '\377t0c'
> +  - 4-byte version number: 3
> +  - 4-byte length of the header section, including the signature and
> +version number
> +  - 4-byte number of objects contained in the pack
> +  - 4-byte number of object formats in this pack index: 2
> +  - For each object format:
> +- 4-byte format identifier (e.g., 'sha1' for SHA-1)

This seems a 

Re: [PATCH] tag: avoid NULL pointer arithmetic

2017-10-02 Thread Jeff King
On Mon, Oct 02, 2017 at 03:06:57PM +0200, René Scharfe wrote:

> >> Avoid the ASan error by casting the results of the lookup functions to
> >> struct object pointers.  That works fine with NULL pointers as well.  We
> >> already rely on the object member being first in all object types in
> >> other places in the code.
> > 
> > Out of curiosity, did you have to do anything to coax this out of ASan
> > (e.g., a specific version)?  I've been running it pretty regularly and
> > didn't see this one (I did switch from clang to gcc a month or two ago,
> > but this code is pretty old, I think).
> 
> I did "make -j4 SANITIZE=undefined,address BLK_SHA1=1 test" with
> clang version 4.0.1-1 (tags/RELEASE_401/final), and t1450-fsck.sh failed.

Interesting. I can trigger it with the same setup, but not if:

  1. I use gcc instead of clang.

  2. If I only use one of UBSan and ASan. It's the combination that
 triggers it.

-Peff


Re: [PATCH 2/2] run-command: use ALLOC_ARRAY

2017-10-02 Thread Stefan Beller
On Sun, Oct 1, 2017 at 8:14 AM, René Scharfe  wrote:
> Use the macro ALLOC_ARRAY to allocate an array.  This is shorter and
> eaasier, as it automatically infers the size of elements.

easier

Thanks,
Stefan


Re: [idea] File history tracking hints

2017-10-02 Thread Stefan Beller
On Mon, Oct 2, 2017 at 11:51 AM, Jeff Hostetler  wrote:

> Sorry to re-re-...-re-stir up such an old topic.
>
> I wasn't really thinking about commit-to-commit hints.
> I think these have lots of problems.  (If commit A->B does
> "t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
> then you need a way to compute a transitive closure to see
> the net-net hints for A->C.  I think that quickly spirals
> out of control.)

I agree. Though as a human I can still look at
A..C giving the hint that t/*.c and xyz/*.c ought to
be taken into account for rename detection.
(which is currently done with -M -C --find-copies-harder
as a generic "there are renamed things", and not the very
specific rule, that may be cheaper to examine compared to
these generic rules)

> No, I was going in another direction.  For example, if a
> tree-entry contains { file-guid, file-name, file-sha, ... }
> then when diffing any 2 commits, you can match up files
> (and folders) by their guids.  Renames pop out trivially when
> their file-names don't match.  File moves pop out when the
> file-guids appear in different trees.  Adds and deletes pop
> out when file-guids don't have a peer. (I'm glossing over some
> of the details, but you get the idea.)

How do you know when a guid needs adaption?

(c.f. origin/jt/packmigrate)
If a commit moves a function out of a file into a new file,
the ideal version control could notice that the function
was moved into a new file and still attribute the original
authors by ignoring the move commit.

Another series in flight could have modified that
function slightly (fixed a bug), such that it's hard to
reason about these things.

For guids I imagine the new file gets a new guid, such that
tracking the function becomes harder?


> To address Junio's
> question, independently added files with the same name will
> have 2 different file-guids.  We amend the merge rules to
> handle this case and pick one of them (say, the one that
> is sorts less than the other) as the winner and go on.
> All-in-all the solution is not trivial (as there are a few
> edge cases to deal with), but it better matches the (casual)
> user's perception of what happened to their tree over time.

The GUID would be made up at creation time, I assume?
Is there any input other than the file itself? (I assumed so
initially, such that:
  By having a GUID in the tree, we would divorce from the notion
  of a "content addressable file system" quickly, as we both could
  create the same tree locally (containing the same blobs) and
  yet the trees would have different names due to having different
  GUIDs in them
), which I'd find undesirable.

> It also doesn't require expensive code to sniff for renames
> on every command (which doesn't scale on really large repos).

I wonder if the rename detection could be offloaded to a server
(which scales) that provides a "hint file" to clients, such that the
clients can then cheaply make use of these specific hints.


Re: git add -p stops working when setting color.ui = always

2017-10-02 Thread Thomas Gummerer
Hi,

On 10/02, Tsvi Mostovicz wrote:
> Hi,
> 
> When setting "color.ui = always" in the last few versions (not sure
> exactly when this started, but definitely exists in 2.14.1 and
> 2.14.2), git add -p stops working as expected. Instead of prompting
> the user, it skips through the prompts and doesn't allow selecting a
> hunk.

This is a known change in behaviour (for more information see the
thread at
https://public-inbox.org/git/86d0a377-8afd-460d-a90e-6327c6934...@gmail.com/).

> 
> Don't know why I had color.ui = always set in my .gitconfig.

Can you set color.ui = auto instead?  That should give you colored
output on the terminal, while not breaking git add -p.

> git version 2.14.2.666.gea220ee40
> 
> Thanks,
> 
> 
> Tsvi Mostovicz
> ttm...@gmail.com
> www.linkedin.com/in/tsvim


Re: [idea] File history tracking hints

2017-10-02 Thread Jeff Hostetler



On 10/2/2017 1:41 PM, Stefan Beller wrote:

It would be nice if every file (and tree) had a permanent GUID
associated with it.  Then the filename/pathname becomes a property
of the GUIDs.  Then you can exactly know about moves/renames with
minimal effort (and no guessing).



...


https://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org/

I'd encourge people to read and re-read that message until they can
recite it by heart.


I have rethought about the idea of GUIDs as proposed by Jeff and wanted
to give a reply. After rereading this message, I think my thoughts are
already included via:

   - you're doing the work at the wrong point for _another_ reason. You're
  freezing your (crappy) algorithm at tree creation time, and basically
  making it pointless to ever create something better later, because even
  if hardware and software improves, you've codified that "we have to
  have crappy information".

--
My design proposal for these "rename hints" would be a special trailer,
roughly:

 Rename: LICENSE -> legal.txt
 Rename: t/* -> tests/*

or more generally:

 Rename:   

This however has multiple issues due to potential
human inaccuracies:
(A) typos in the trailer key or in the pathspec
(resulting in different error modes)
(B) partial hints (We currently have a world of
completely missing hints, so I would not expect it to
be worse?)
(C) wrong hints. This ought to be no problem as Git would
take some CPU time to conclude the hint was bogus.

For (A), I would imagine we want a mechanism (e.g. notes)
to "correct" the hints. This is the similar issue as a typo in a
commit message, which we currently just ignore if the
commit has been merged to e.g. master.

So maybe we'd just design around that, giving the option
to give the correct hints via command line.

So if the commit has the typo'd hint

 Remame:  t/* -> tests/*

the human would see that (and also conclude that by
the commit message), and then invoke

git log -C -C-hint="t/* -> tests/*" ...

which would have the corrected hint and hence deliver
the best output.

Maybe the "-C-hint" flag is the best starting point when
going in that direction?

Thanks,
Stefan



Sorry to re-re-...-re-stir up such an old topic.

I wasn't really thinking about commit-to-commit hints.
I think these have lots of problems.  (If commit A->B does
"t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
then you need a way to compute a transitive closure to see
the net-net hints for A->C.  I think that quickly spirals
out of control.)

No, I was going in another direction.  For example, if a
tree-entry contains { file-guid, file-name, file-sha, ... }
then when diffing any 2 commits, you can match up files
(and folders) by their guids.  Renames pop out trivially when
their file-names don't match.  File moves pop out when the
file-guids appear in different trees.  Adds and deletes pop
out when file-guids don't have a peer. (I'm glossing over some
of the details, but you get the idea.)  To address Junio's
question, independently added files with the same name will
have 2 different file-guids.  We amend the merge rules to
handle this case and pick one of them (say, the one that
is sorts less than the other) as the winner and go on.
All-in-all the solution is not trivial (as there are a few
edge cases to deal with), but it better matches the (casual)
user's perception of what happened to their tree over time.
It also doesn't require expensive code to sniff for renames
on every command (which doesn't scale on really large repos).

But as I said before, that ship has passed...
Jeff


Re: [idea] File history tracking hints

2017-10-02 Thread Stefan Beller
>> It would be nice if every file (and tree) had a permanent GUID
>> associated with it.  Then the filename/pathname becomes a property
>> of the GUIDs.  Then you can exactly know about moves/renames with
>> minimal effort (and no guessing).
>
...

> https://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org/
>
> I'd encourge people to read and re-read that message until they can
> recite it by heart.

I have rethought about the idea of GUIDs as proposed by Jeff and wanted
to give a reply. After rereading this message, I think my thoughts are
already included via:

  - you're doing the work at the wrong point for _another_ reason. You're
 freezing your (crappy) algorithm at tree creation time, and basically
 making it pointless to ever create something better later, because even
 if hardware and software improves, you've codified that "we have to
 have crappy information".

--
My design proposal for these "rename hints" would be a special trailer,
roughly:

Rename: LICENSE -> legal.txt
Rename: t/* -> tests/*

or more generally:

Rename:   

This however has multiple issues due to potential
human inaccuracies:
(A) typos in the trailer key or in the pathspec
   (resulting in different error modes)
(B) partial hints (We currently have a world of
   completely missing hints, so I would not expect it to
   be worse?)
(C) wrong hints. This ought to be no problem as Git would
   take some CPU time to conclude the hint was bogus.

For (A), I would imagine we want a mechanism (e.g. notes)
to "correct" the hints. This is the similar issue as a typo in a
commit message, which we currently just ignore if the
commit has been merged to e.g. master.

So maybe we'd just design around that, giving the option
to give the correct hints via command line.

So if the commit has the typo'd hint

Remame:  t/* -> tests/*

the human would see that (and also conclude that by
the commit message), and then invoke

git log -C -C-hint="t/* -> tests/*" ...

which would have the corrected hint and hence deliver
the best output.

Maybe the "-C-hint" flag is the best starting point when
going in that direction?

Thanks,
Stefan


[PATCH v2] setup: update error message to be more meaningful

2017-10-02 Thread Kaartic Sivaraam
From: Kaartic Sivaraam 

The error message shown when a flag is found when expecting a
filename wasn't clear as it didn't communicate what was wrong
using the 'suitable' words in *all* cases.

$ git ls-files
README.md
test-file

Correct case,

$ git rev-parse README.md --flags
README.md
--flags
fatal: bad flag '--flags' used after filename

Incorrect case,

$ git grep "some random regex" -n
fatal: bad flag '-n' used after filename

The above case is incorrect as "some random regex" isn't a filename
in this case.

Change the error message to be general and communicative. This results
in the following output,

$ git rev-parse README.md --flags
README.md
--flags
fatal: option '--flags' must come before non-option arguments

$ git grep "some random regex" -n
fatal: option '-n' must come before non-option arguments

Signed-off-by: Kaartic Sivaraam 
---
 Changes in v2:

Change in error message.

 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 860507e1f..09c793282 100644
--- a/setup.c
+++ b/setup.c
@@ -230,7 +230,7 @@ void verify_filename(const char *prefix,
 int diagnose_misspelt_rev)
 {
if (*arg == '-')
-   die("bad flag '%s' used after filename", arg);
+   die("option '%s' must come before non-option arguments", arg);
if (looks_like_pathspec(arg) || check_filename(prefix, arg))
return;
die_verify_filename(prefix, arg, diagnose_misspelt_rev);
-- 
2.14.1.935.ge2b2bcd8a



Re: [PATCH v3] clang-format: add a comment about the meaning/status of the

2017-10-02 Thread Brandon Williams
On 10/02, Junio C Hamano wrote:
> From: Stephan Beyer 
> 
> Having a .clang-format file in a project can be understood in a way that
> code has to be in the style defined by the .clang-format file, i.e., you
> just have to run clang-format over all code and you are set.
> 
> This unfortunately is not yet the case in the Git project, as the
> format file is still work in progress.  Explain it with a comment in
> the beginning of the file.
> 
> Additionally, the working clang-format version is mentioned because the
> config directives change from time to time (in a compatibility-breaking way).
> 
> Signed-off-by: Stephan Beyer 
> Signed-off-by: Junio C Hamano 
> ---
> 
>  * So here is a counter-proposal in a patch form.  I agree that my
>earlier suggestion was unnecessarily verbose; this one spends
>just as many lines and not more than the v2 round of Stephan's
>patch.
> 
>  .clang-format | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/.clang-format b/.clang-format
> index 56822c116b..7670eec8df 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -1,4 +1,8 @@
> -# Defaults
> +# This file is an example configuration for clang-format 5.0.
> +#
> +# Note that this style definition should only be understood as a hint
> +# for writing new code. The rules are still work-in-progress and does
> +# not yet exactly match the style we have in the existing code.

Thanks for writing up this header comment to the .clang-format file,
it's something I definitely should have included when I introduced it.

And I like the wording that you've both settled on, as it reflects our
intentions (of having the code eventually conform to the format rules)
and making note that this set of rules still needs to be tuned.


Thanks!

>  
>  # Use tabs whenever we need to fill whitespace that spans at least from one 
> tab
>  # stop to the next one.
> -- 
> 2.14.2-820-gefeff4fbff
> 

-- 
Brandon Williams


Re: [PATCH v2/RFC] commit: change the meaning of an empty commit message

2017-10-02 Thread Kaartic Sivaraam
On Thu, 2017-08-31 at 19:06 +0530, Kaartic Sivaraam wrote:
> On Thu, 2017-08-24 at 13:19 -0700, Junio C Hamano wrote:
> > 
> > The latter is easier for me as we do not have to worry about 
> > breaking people's scripts and tools used in
> > their established workflows at all.
> > 
> 
> In that case, how about doing something similar to what was done to
> 'set-upstream' option of branch? We could print a warning notice when
> the commit message is found to be empty due to the presence of a sign-
> off line. As usual we could stop warning and stop identifying log
> messages consisting only signed-off lines as empty after a few years of
> doing that.
> 
> Note: I have no idea how good an idea this is. Let me know if it's a
> bad one.
> 


I was recently searching to find the patches have gone missing in to
the void for no obvious reason and found this. Should I consider this
to be "Dropped" in terms of the "What's cooking" emails or has this
just not received the required attention?

---
Kaartic


Re: [PATCH v2] branch: change the error messages to be more meaningful

2017-10-02 Thread Kaartic Sivaraam
On Mon, 2017-08-21 at 19:06 +0530, Kaartic Sivaraam wrote:
> The error messages shown when the branch command is misused
> by supplying it wrong number of parameters wasn't meaningful.
> That's because it used the the phrase "too many branches"
> assuming all parameters to be "valid" branch names. It's not
> always the case as exemplified below,
> 
> $ git branch
>   foo
> * master
> 
> $ git branch -m foo foo old
> fatal: too many branches for a rename operation
> 
> Change the messages to be more general thus making no assumptions
> about the "parameters".
> 
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Changes in v2:
> 
> - changed the wordings of the error message
> 
>  builtin/branch.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/branch.c b/builtin/branch.c
> index a3bd2262b..62981d358 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -707,12 +707,12 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   else if (argc == 2)
>   rename_branch(argv[0], argv[1], rename > 1);
>   else
> - die(_("too many branches for a rename operation"));
> + die(_("too many arguments for a rename operation"));
>   } else if (new_upstream) {
>   struct branch *branch = branch_get(argv[0]);
>  
>   if (argc > 1)
> - die(_("too many branches to set new upstream"));
> + die(_("too many arguments to set new upstream"));
>  
>   if (!branch) {
>   if (!argc || !strcmp(argv[0], "HEAD"))
> @@ -735,7 +735,7 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   struct strbuf buf = STRBUF_INIT;
>  
>   if (argc > 1)
> - die(_("too many branches to unset upstream"));
> + die(_("too many arguments to unset upstream"));
>  
>   if (!branch) {
>   if (!argc || !strcmp(argv[0], "HEAD"))


I was recently searching to find the patches have gone missing in to
the void for no obvious reason and found this. Should I consider this
to be "Dropped" in terms of the "What's cooking" emails? or has this
just not received the required attention?

---
Kaartic


Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Linus Torvalds
On Mon, Oct 2, 2017 at 7:00 AM, Jason Cooper  wrote:
>
> Ahhh, so if I understand you correctly, you'd prefer SHA-256 over
> SHA3-256 because it's more performant for your usecase?  Well, that's a
> completely different animal that cryptographic suitability.

In almost all loads I've seen, zlib inflate() cost is a bigger deal
than the crypto load. The crypto people talk about cycles per byte,
but the deflate code is what usually takes the page faults and cache
misses etc, and has bad branch prediction. That ends up easily being
tens or thousands of cycles, even for small data.

But it does obviously depend on exactly what you do. The Windows
people saw SHA1 as costly mainly due to the index file (which is just
a "fancy crc", and not even cryptographically important, and where the
cache misses actually happen when doing crypto, not decompressing the
data).

And fsck and big initial checkins can have a very different profile
than most "regular use" profiles. Again, there the crypto happens
first, and takes the cache misses. And the crypto is almost certainly
_much_ cheaper than just the act of loading the index file contents in
the first place. It may show up on profiles fairly clearly, but that's
mostly because crypto is *intensive*, not because crypto takes up most
of the cycles.

End result: honestly, the real cost on almost any load is not crypto
or necessarily even (de)compression, even if those are the things that
show up. It's the cache misses and the "get data into user space"
(whether using "read()" or page faulting). Worrying about cycles per
byte of compression speed is almost certainly missing the real issue.

The people who benchmark cryptography tend to intentionally avoid the
actual real work, because they just want to know the crypto costs. So
when you see numbers like "9 cycles per byte" vs "12 cycles per byte"
and think that it's a big deal - 30% performance difference! -  it's
almost certainly complete garbage. It may be 30%, but it is likely 30%
out of 10% total, meaning that it's almost in the noise for any but
some very special case.

 Linus


Re: [PATCH v3] clang-format: add a comment about the meaning/status of the

2017-10-02 Thread Stephan Beyer
Hi,

On 10/02/2017 01:37 AM, Junio C Hamano wrote:
> diff --git a/.clang-format b/.clang-format
> index 56822c116b..7670eec8df 100644
> --- a/.clang-format
> +++ b/.clang-format
> @@ -1,4 +1,8 @@
> -# Defaults
> +# This file is an example configuration for clang-format 5.0.
> +#
> +# Note that this style definition should only be understood as a hint
> +# for writing new code. The rules are still work-in-progress and does
> +# not yet exactly match the style we have in the existing code.

I'm totally fine with this.

Stephan


Re: What means "git config bla ~/"?

2017-10-02 Thread Jonathan Nieder
Hi,

rpj...@crashcourse.ca wrote:

> i'm sure i'm about to embarrass myself but, in "man git-config",
> OPTIONS, one reads:
>
>   --path
>
> git-config will expand leading ~ to the value of $HOME, and ~user
> to the   home directory for the specified user. This option has no
> effect when setting the value (but you can use git config bla ~/
> from the command line to let your shell do the expansion).
>
> what's with that "git config bla ~/"? is this some config keyword
> or something?

No need to be embarrased.  Here "bla" is a placeholder.  That is,
for example, I can run

git config --global include.path ~/.config/git/more-config

or

git config --global include.path $HOME/.config/git/more-config

to cause

[include]
path = /home/me/.config/git/more-config

to be added to my global configuration.  The expansion of ~ or $HOME
is performed by my shell, not Git.  For comparison, if I had run

git config --global include.path '~/.config/git/more-config'

then that would cause

[include]
path = ~/.config/git/more-config

to be added to my global configuration, but it would still have the
same effect at run time, since Git is also able to expand ~ to my home
directory.

The wording comes from

commit 1349484e341a3ec2ba02a86c8fbd97ea9dc8c756
Author: Matthieu Moy 
Date:   Wed Dec 30 17:51:53 2009 +0100

builtin-config: add --path option doing ~ and ~user expansion.

I agree with you that it is less clear than it could be.  Ideas for
clarifying it?

Thanks,
Jonathan


Re: [PATCH 0/4] pre-merge hook

2017-10-02 Thread Stefan Beller
On Sun, Oct 1, 2017 at 10:54 PM, Junio C Hamano  wrote:
> Michael J Gruber  writes:
>
>> Now that f8b863598c ("builtin/merge: honor commit-msg hook for merges",
>> 2017-09-07) has landed, merge is getting closer to behaving like commit,
>> which is important because both are used to complete merges (automatic
>> vs. non-automatic).
>
> Just a gentle ping to the thread to see if it is still alive.
>
> I think people agree that this is a good thing to do, but it seems
> that the execution leaves a few loose ends, judging from the review
> comments that have yet to be answered.
>
> Thanks.

I agree. IIRC the series was ok in design and goal, just minor comments
for coding.

Thanks,
Stefan


Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Brandon Williams
On 10/02, Jason Cooper wrote:
> Hi Jonathan,
> 
> On Tue, Sep 26, 2017 at 04:51:58PM -0700, Jonathan Nieder wrote:
> > Johannes Schindelin wrote:
> > > On Tue, 26 Sep 2017, Jason Cooper wrote:
> > >> For my use cases, as a user of git, I have a plan to maintain provable
> > >> integrity of existing objects stored in git under sha1 while migrating
> > >> away from sha1.  The same plan works for migrating away from SHA2 or
> > >> SHA3 when the time comes.
> > >
> > > Please do not make the mistake of taking your use case to be a template
> > > for everybody's use case.
> > 
> > That said, I'm curious at what plan you are alluding to.  Is it
> > something that could benefit others on the list?
> 
> Well, it's just a plan at this point.  As there's a lot of other work to
> do in the mean-time, and there's no possibility of transitioning until
> the dust has settled on NEWHASH.  :-)
> 
> Given an existing repository that needs to migrate from SHA1 to NEWHASH,
> and maintain backwards compatibility with clients that haven't migrated
> yet, how do we
> 
>   a) perform that migration,
>   b) allow non-updated clients to use the data prior to the switch, and
>   c) maintain provable integrity of the old objects as well as the new.
> 
> The primary method is counter-hashing, which re-uses the blobs, and
> creates parallel, deterministic tree, commit, and tag objects using
> NEWHASH for everything up to flag day.  post-flag-day only uses NEWHASH.
> A PGP "transition" key is used to counter-sign the NEWHASH version of
> the old signed tags.  The transition key is not required to be different
> than the existing maintainers key.
> 
> A critical feature is the ability of entities other than the maintainer
> to migrate to NEWHASH.  For example, let's say that git has fully
> implemented and tested NEWHASH.  linux.git intends to migrate, but it's
> going to take several months (get all the developers herded up).
> 
> In the interim, a security company, relying on Linux for it's products
> can counter-hash Linus' repo, and continue to do so every time he
> updates his tree.  This shrinks the attack window for an entity (with an
> undisclosed break of SHA1) down to a few minutes to an hour.  Otherwise,
> a check of the counter hashes in the future would reveal the
> substitution.
> 
> The deterministic feature is critical here because there is valuable
> integrity and trust built by counter-hashing quickly after publication.
> So once Linux migrates to NEWHASH, the hashes calculated by the security
> company should be identical.  IOW, use the timestamps that are in the
> SHA1 commit objects for the NEWHASH objects.  Which should be obvious,
> but it's worth explicitly mentioning that determinism provides great
> value.
> 
> We're in the process of writing this up formally, which will provide a
> lot more detail and rationale that this quick stream of thought.  :-)
> 
> I'm sure a lot of this has already been discussed on the list.  If so, I
> apologize for being repetitive.  Unfortunately, I'm not able to keep up
> with the MLs like I used to.
> 
> thx,
> 
> Jason.

Given the interests that you've expressed here I'd recommend taking a
look at
https://public-inbox.org/git/20170928044320.ga84...@aiede.mtv.corp.google.com/
which is the current version of the transition plan that the community
has settled on
(https://public-inbox.org/git/xmqqlgkyxgvq@gitster.mtv.corp.google.com/
shows that it should be merged to 'next' soon).  Once neat aspect of
this transition plan is that it doesn't require a flag day but rather
anyone can migrate to the new hash function and still interact with
repositories (via the wire) which are still running SHA1.

-- 
Brandon Williams


Re: [PATCH] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Taylor Blau
On Mon, Oct 02, 2017 at 02:43:35AM -0400, Jeff King wrote:
> On Sun, Oct 01, 2017 at 10:53:11PM -0700, Taylor Blau wrote:
>
> > Peff points out that different atom parsers handle the empty
> > "sub-argument" list differently. An example of this is the format
> > "%(refname:)".
> >
> > Since callers often use `string_list_split` (which splits the empty
> > string with any delimiter as a 1-ary string_list containing the empty
> > string), this makes handling empty sub-argument strings non-ergonomic.
> >
> > Let's fix this by assuming that atom parser implementations don't care
> > about distinguishing between the empty string "%(refname:)" and no
> > sub-arguments "%(refname)".
>
> This looks good to me (both the explanation and the function of the
> code).

Thanks :-).

> But let me assume for a moment that your "please let me know" from the
> earlier series is still in effect, and you wish to be showered with
> pedantry and subjective advice. ;)
>
> I see a lot of newer contributors sending single patches as a 1-patch
> series with a cover letter. As a reviewer, I think this is mostly just a
> hassle. The cover letter ends up mostly repeating the same content from
> the single commit, so readers end up having to go over it twice (and you
> ended up having to write it twice).
>
> Sometimes there _is_ useful information to be conveyed that doesn't
> belong in the commit message, but that can easily go after the "---" (or
> before a "-- >8 --" if you really feel it should be read before the
> commit message.
>
> In general, if you find yourself writing a really long cover letter, and
> especially one that isn't mostly "meta" information (like where this
> should be applied, or what's changed since the last version), you should
> consider whether that information ought to go into the commit message
> instead. The one exception is if you _do_ have a long series and you
> need to sketch out the approach to help the reader see the big picture
> (in which case your cover letter should be summarizing what's already in
> the commit messages).

Thank you for this advice. I was worried when writing my cover letter
last night that it would be considered repetitive, but I wasn't sure how
much brevity/detail would be desired in a patch series of this length.

I'll keep this in mind for the future.

> And before anybody digs in the list to find my novel-length cover
> letters to throw back in my face, I know that I'm very guilty of this.
> I'm trying to get better at it, and passing it on so you can learn from
> my mistakes. :)

I appreciate your humility ;-).

> > -   if (arg)
> > +   if (arg) {
> > arg = used_atom[at].name + (arg - atom) + 1;
> > +   if (!*arg) {
> > +   /*
> > +* string_list_split is often use by atom parsers to
> > +* split multiple sub-arguments for inspection.
> > +*
> > +* Given a string that does not contain a delimiter
> > +* (example: ""), string_list_split returns a 1-ary
> > +* string_list that requires adding special cases to
> > +* atom parsers.
> > +*
> > +* Thus, treat the empty argument string as NULL.
> > +*/
> > +   arg = NULL;
> > +   }
> > +   }
>
> I know this is getting _really_ subjective, but IMHO this is a lot more
> reasoning than the comment needs. The commit message goes into the
> details of the "why", but here I'd have just written something like:
>
>   /* treat "%(foo:)" the same as "%(foo)"; i.e., no arguments */

I sent an updated v2 of this "series" (without a cover-letter) that
shortens this comment to more or less what you suggested. I've kept the
commit message longer, since I think that that information is useful
within "git blame".


--
- Taylor


[PATCH v2] ref-filter.c: pass empty-string as NULL to atom parsers

2017-10-02 Thread Taylor Blau
Peff points out that different atom parsers handle the empty
"sub-argument" list differently. An example of this is the format
"%(refname:)".

Since callers often use `string_list_split` (which splits the empty
string with any delimiter as a 1-ary string_list containing the empty
string), this makes handling empty sub-argument strings non-ergonomic.

Let's fix this by assuming that atom parser implementations don't care
about distinguishing between the empty string "%(refname:)" and no
sub-arguments "%(refname)".

Signed-off-by: Taylor Blau 
---
 ref-filter.c| 10 +-
 t/t6300-for-each-ref.sh |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index bc591f4f3..f3e53d444 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -415,8 +415,16 @@ static int parse_ref_filter_atom(const struct ref_format 
*format,
REALLOC_ARRAY(used_atom, used_atom_cnt);
used_atom[at].name = xmemdupz(atom, ep - atom);
used_atom[at].type = valid_atom[i].cmp_type;
-   if (arg)
+   if (arg) {
arg = used_atom[at].name + (arg - atom) + 1;
+   if (!*arg) {
+   /*
+* Treat empty sub-arguments list as NULL (i.e.,
+* "%(atom:)" is equivalent to "%(atom)").
+*/
+   arg = NULL;
+   }
+   }
memset(_atom[at].u, 0, sizeof(used_atom[at].u));
if (valid_atom[i].parser)
valid_atom[i].parser(format, _atom[at], arg);
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 2274a4b73..edc1bd8ea 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -51,6 +51,7 @@ test_atom() {
 }
 
 test_atom head refname refs/heads/master
+test_atom head refname: refs/heads/master
 test_atom head refname:short master
 test_atom head refname:lstrip=1 heads/master
 test_atom head refname:lstrip=2 master
-- 
2.14.1.145.gb3622a4ee



Re: [PATCH v6 0/7] Support %(trailers) arguments in for-each-ref(1)

2017-10-02 Thread Taylor Blau
On Mon, Oct 02, 2017 at 09:15:00PM +0900, Junio C Hamano wrote:
> Junio C Hamano  writes:
>
> > Thanks.  t6300 seems to show that %(contents:trailers:unfold) does
> > not quite work, among other things.
> >
> >   https://travis-ci.org/git/git/jobs/282126607#L3658
> >
> > I didn't have a chance to look into it myself.
>
> Peff's "oops, your logic is backwards" fixes the above failure.
>
> We also need this on top to pass the gettext-poison build.
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 872973b954..3bdfa02559 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -685,19 +685,21 @@ test_expect_success '%(contents:trailers:only) and 
> %(contents:trailers:unfold) w
>  '
>
>  test_expect_success '%(trailers) rejects unknown trailers arguments' '
> + # error message cannot be checked under i18n
>   cat >expect <<-EOF &&
>   fatal: unknown %(trailers) argument: unsupported
>   EOF
>   test_must_fail git for-each-ref --format="%(trailers:unsupported)" 
> 2>actual &&
> - test_cmp expect actual
> + test_i18ncmp expect actual
>  '
>
>  test_expect_success '%(contents:trailers) rejects unknown trailers 
> arguments' '
> + # error message cannot be checked under i18n
>   cat >expect <<-EOF &&
>   fatal: unknown %(trailers) argument: unsupported
>   EOF
>   test_must_fail git for-each-ref 
> --format="%(contents:trailers:unsupported)" 2>actual &&
> - test_cmp expect actual
> + test_i18ncmp expect actual
>  '
>
>  test_expect_success 'basic atom: head contents:trailers' '

Thank you for pointing this out. I am not well-versed on gettext, and
its usage within Git. I am happy to send out v7 of this series, or you
can apply these changes in queueing. Whichever is easier :-).

--
- Taylor


Re: [PATCH v6 7/7] ref-filter.c: parse trailers arguments with %(contents) atom

2017-10-02 Thread Taylor Blau
On Mon, Oct 02, 2017 at 02:51:00AM -0400, Jeff King wrote:
> > diff --git a/ref-filter.c b/ref-filter.c
> > index 43ed10a5e..6c26b4733 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -212,9 +212,10 @@ static void contents_atom_parser(const struct 
> > ref_format *format, struct used_at
> > atom->u.contents.option = C_SIG;
> > else if (!strcmp(arg, "subject"))
> > atom->u.contents.option = C_SUB;
> > -   else if (!strcmp(arg, "trailers"))
> > -   atom->u.contents.option = C_TRAILERS;
> > -   else if (skip_prefix(arg, "lines=", )) {
> > +   else if (skip_prefix(arg, "trailers", )) {
> > +   skip_prefix(arg, ":", );
> > +   trailers_atom_parser(format, atom, *arg ? NULL : arg);
>
> I think your logic is flipped. You want "*arg ? arg : NULL";

Thank you for pointing this out. I should have run "make test" on this
patch set (or, as you suggested, `git rebase -x "make test" HEAD~7`)
before sending it out. I appreciate you catching my mistake, and I'll
make sure to run "make test" more diligently in the future :-).

It sounds like Junio picked this up while queueing.

--
- Taylor


Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Jason Cooper
Hi Jonathan,

On Tue, Sep 26, 2017 at 04:51:58PM -0700, Jonathan Nieder wrote:
> Johannes Schindelin wrote:
> > On Tue, 26 Sep 2017, Jason Cooper wrote:
> >> For my use cases, as a user of git, I have a plan to maintain provable
> >> integrity of existing objects stored in git under sha1 while migrating
> >> away from sha1.  The same plan works for migrating away from SHA2 or
> >> SHA3 when the time comes.
> >
> > Please do not make the mistake of taking your use case to be a template
> > for everybody's use case.
> 
> That said, I'm curious at what plan you are alluding to.  Is it
> something that could benefit others on the list?

Well, it's just a plan at this point.  As there's a lot of other work to
do in the mean-time, and there's no possibility of transitioning until
the dust has settled on NEWHASH.  :-)

Given an existing repository that needs to migrate from SHA1 to NEWHASH,
and maintain backwards compatibility with clients that haven't migrated
yet, how do we

  a) perform that migration,
  b) allow non-updated clients to use the data prior to the switch, and
  c) maintain provable integrity of the old objects as well as the new.

The primary method is counter-hashing, which re-uses the blobs, and
creates parallel, deterministic tree, commit, and tag objects using
NEWHASH for everything up to flag day.  post-flag-day only uses NEWHASH.
A PGP "transition" key is used to counter-sign the NEWHASH version of
the old signed tags.  The transition key is not required to be different
than the existing maintainers key.

A critical feature is the ability of entities other than the maintainer
to migrate to NEWHASH.  For example, let's say that git has fully
implemented and tested NEWHASH.  linux.git intends to migrate, but it's
going to take several months (get all the developers herded up).

In the interim, a security company, relying on Linux for it's products
can counter-hash Linus' repo, and continue to do so every time he
updates his tree.  This shrinks the attack window for an entity (with an
undisclosed break of SHA1) down to a few minutes to an hour.  Otherwise,
a check of the counter hashes in the future would reveal the
substitution.

The deterministic feature is critical here because there is valuable
integrity and trust built by counter-hashing quickly after publication.
So once Linux migrates to NEWHASH, the hashes calculated by the security
company should be identical.  IOW, use the timestamps that are in the
SHA1 commit objects for the NEWHASH objects.  Which should be obvious,
but it's worth explicitly mentioning that determinism provides great
value.

We're in the process of writing this up formally, which will provide a
lot more detail and rationale that this quick stream of thought.  :-)

I'm sure a lot of this has already been discussed on the list.  If so, I
apologize for being repetitive.  Unfortunately, I'm not able to keep up
with the MLs like I used to.

thx,

Jason.


[PATCH v3 0/5] Improve abbreviation disambituation

2017-10-02 Thread Derrick Stolee
Thanks for the feedback on my previous versions, and for patience
with my inexperience on the mailing list. I tried to respond to all
feedback, but decided to not apply one suggestion:

* Removed the "sort -R" from p0008-abbrev.sh, since it is a GNU-
  specific flag. The perf test now considers objects in "discovery"
  order, which improved performance all versions of the test as
  expected. New perf numbers are provided.

* get_hex_char_from_oid(): renamed `i` to `pos`. I did not unify the
  code in this method with that in hex.c:sha1_to_hex_r due to the
  extra branch in get_hex_char_from_oid and wanting to inline the
  method. If we want to make this method available for other callers,
  then I would be happy to submit a separate patch for this change
  after the current patch is accepted.

* Removed unecessary includes.

* Use uint32_t instead of unsigned int in test-list-objects.c

* Rearranged arguments in test-list-objects and fixed the n == 0 bug.

---

When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

A helper program `test-list-objects` outputs a sampling of object ids,
which we reorder using `sort -R` before using them as input to a
performance test. 

A performance helper `test-abbrev` and performance test `p0008-abbrev.sh`
are added to demonstrate performance improvements in this area.

I include performance test numbers in the commit messages for each
change, but I also include the difference between the baseline and the
final change here:


p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.09 s | 0.05 s | -44.4% |
| Git   | 5 |  230162 |  0 | 0.11 s | 0.08 s | -27.3% |
| Git   | 4 |  154310 |  75852 | 0.09 s | 0.06 s | -33.3% |
| Linux | 1 | 5606645 |  0 | 0.13 s | 0.05 s | -61.5% |
| Linux |24 | 5606645 |  0 | 1.13 s | 0.88 s | -22.1% |
| Linux |23 | 5283204 | 323441 | 1.08 s | 0.80 s | -25.9% |
| VSTS  | 1 | 4355923 |  0 | 0.12 s | 0.05 s | -58.3% |
| VSTS  |32 | 4355923 |  0 | 1.02 s | 0.95 s | - 6.9% |
| VSTS  |31 | 4276829 |  79094 | 2.25 s | 1.93 s | -14.2% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 5.69 s
  New Time: 4.09 s
 Rel %: -28.1%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.66 s | 0.07 s | -89.4% |
| Git   | 5 |  230162 |  0 | 0.90 s | 0.12 s | -86.7% |
| Git   | 4 |  154310 |  75852 | 0.79 s | 0.09 s | -88.6% |
| Linux | 1 | 5606645 |  0 | 0.48 s | 0.04 s | -91.7% |
| Linux |24 | 5606645 |  0 | 4.41 s | 0.85 s | -80.7% |
| Linux |23 | 5283204 | 323441 | 4.11 s | 0.78 s | -81.0% |
| VSTS  | 1 | 4355923 |  0 | 0.46 s | 0.04 s | -91.3% |
| VSTS  |32 | 4355923 |  0 | 5.40 s | 0.98 s | -81.9% |
| VSTS  |31 | 4276829 |  79094 | 5.88 s | 1.04 s | -82.3% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 38.9 s
  New Time:  2.7 s
 Rel %: -93.1%

Derrick Stolee (5):
  test-list-objects: List a subset of object ids
  p0008-abbrev.sh: Test find_unique_abbrev() perf
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix
  sha1_name: Minimize OID comparisons during disambiguation
 Makefile |   2 +
 sha1_name.c 

[PATCH v3 3/5] sha1_name: Unroll len loop in find_unique_abbrev_r

2017-10-02 Thread Derrick Stolee
Unroll the while loop inside find_unique_abbrev_r to avoid iterating
through all loose objects and packfiles multiple times when the short
name is longer than the predicted length.

Instead, inspect each object that collides with the estimated
abbreviation to find the longest common prefix.

p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New| |
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%|
|---|---|-||||-|
| Git   | 1 |  230078 |  0 | 0.09 s | 0.06 s | - 33.3% |
| Git   | 5 |  230162 |  0 | 0.11 s | 0.08 s | - 27.3% |
| Git   | 4 |  154310 |  75852 | 0.09 s | 0.07 s | - 22.2% |
| Linux | 1 | 5606645 |  0 | 0.12 s | 0.32 s | +146.2% |
| Linux |24 | 5606645 |  0 | 1.12 s | 1.12 s | -  0.9% |
| Linux |23 | 5283204 | 323441 | 1.08 s | 1.05 s | -  2.8% |
| VSTS  | 1 | 4355923 |  0 | 0.12 s | 0.23 s | + 91.7% |
| VSTS  |32 | 4355923 |  0 | 1.02 s | 1.08 s | +  5.9% |
| VSTS  |31 | 4276829 |  79094 | 2.25 s | 2.08 s | -  7.6% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 5.69 s
  New Time: 4.62 s
 Rel %: -18.9%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.66 s | 0.08 s | -87.9% |
| Git   | 5 |  230162 |  0 | 0.90 s | 0.13 s | -85.6% |
| Git   | 4 |  154310 |  75852 | 0.79 s | 0.10 s | -87.3% |
| Linux | 1 | 5606645 |  0 | 0.48 s | 0.32 s | -33.3% |
| Linux |24 | 5606645 |  0 | 4.41 s | 1.09 s | -75.3% |
| Linux |23 | 5283204 | 323441 | 4.11 s | 0.99 s | -75.9% |
| VSTS  | 1 | 4355923 |  0 | 0.46 s | 0.25 s | -45.7% |
| VSTS  |32 | 4355923 |  0 | 5.40 s | 1.15 s | -78.7% |
| VSTS  |31 | 4276829 |  79094 | 5.88 s | 1.18 s | -79.9% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 39.0 s
  New Time:  3.9 s
 Rel %: -90.0%

Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 57 ++---
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1ebe49 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -474,10 +474,32 @@ static unsigned msb(unsigned long val)
return r;
 }
 
-int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+struct min_abbrev_data {
+   unsigned int init_len;
+   unsigned int cur_len;
+   char *hex;
+};
+
+static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
-   int status, exists;
+   struct min_abbrev_data *mad = cb_data;
+
+   char *hex = oid_to_hex(oid);
+   unsigned int i = mad->init_len;
+   while (mad->hex[i] && mad->hex[i] == hex[i])
+   i++;
+
+   if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+   mad->cur_len = i + 1;
+
+   return 0;
+}
 
+int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+{
+   struct disambiguate_state ds;
+   struct min_abbrev_data mad;
+   struct object_id oid_ret;
if (len < 0) {
unsigned long count = approximate_object_count();
/*
@@ -503,19 +525,24 @@ int find_unique_abbrev_r(char *hex, const unsigned char 
*sha1, int len)
sha1_to_hex_r(hex, sha1);
if (len == GIT_SHA1_HEXSZ || !len)
return GIT_SHA1_HEXSZ;
-   exists = has_sha1_file(sha1);
-   while (len < GIT_SHA1_HEXSZ) {
-   struct object_id oid_ret;
-   status = get_short_oid(hex, len, _ret, GET_OID_QUIETLY);
-   if (exists
-   ? !status
-   : status == SHORT_NAME_NOT_FOUND) {
-   hex[len] = 0;
-   return len;
-   }
-   len++;
-   }
-   return len;
+
+   if (init_object_disambiguation(hex, len, ) < 0)
+   return -1;
+
+   mad.init_len = len;
+   mad.cur_len = len;
+   mad.hex = hex;
+
+   ds.fn = extend_abbrev_len;
+   ds.always_call_fn = 1;
+   ds.cb_data = (void*)
+
+   find_short_object_filename();
+   find_short_packed_object();
+   

[PATCH v3 4/5] sha1_name: Parse less while finding common prefix

2017-10-02 Thread Derrick Stolee
Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.

p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.06 s | 0.05 s | -16.7% |
| Git   | 5 |  230162 |  0 | 0.08 s | 0.08 s |   0.0% |
| Git   | 4 |  154310 |  75852 | 0.07 s | 0.06 s | -14.3% |
| Linux | 1 | 5606645 |  0 | 0.32 s | 0.14 s | -56.3% |
| Linux |24 | 5606645 |  0 | 1.12 s | 0.94 s | -16.1% |
| Linux |23 | 5283204 | 323441 | 1.05 s | 0.86 s | -18.1% |
| VSTS  | 1 | 4355923 |  0 | 0.23 s | 0.11 s | -52.2% |
| VSTS  |32 | 4355923 |  0 | 1.08 s | 0.95 s | -12.0% |
| VSTS  |31 | 4276829 |  79094 | 2.08 s | 1.82 s | -12.5% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 4.61 s
  New Time: 4.61 s
 Rel %: 0.0%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.08 s | 0.07 s | -12.5% |
| Git   | 5 |  230162 |  0 | 0.13 s | 0.12 s | - 7.7% |
| Git   | 4 |  154310 |  75852 | 0.10 s | 0.09 s | -10.0% |
| Linux | 1 | 5606645 |  0 | 0.32 s | 0.13 s | -59.4% |
| Linux |24 | 5606645 |  0 | 1.09 s | 0.89 s | -18.3% |
| Linux |23 | 5283204 | 323441 | 0.99 s | 0.83 s | -16.2% |
| VSTS  | 1 | 4355923 |  0 | 0.25 s | 0.11 s | -56.0% |
| VSTS  |32 | 4355923 |  0 | 1.15 s | 0.99 s | -13.9% |
| VSTS  |31 | 4276829 |  79094 | 1.18 s | 1.02 s | -13.6% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 3.91 s
  New Time: 3.08 s
 Rel %: -21.1%

Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f2a1ebe49..5081aeb71 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,13 +480,23 @@ struct min_abbrev_data {
char *hex;
 };
 
+static inline char get_hex_char_from_oid(const struct object_id *oid,
+int pos)
+{
+   static const char hex[] = "0123456789abcdef";
+
+   if ((pos & 1) == 0)
+   return hex[oid->hash[pos >> 1] >> 4];
+   else
+   return hex[oid->hash[pos >> 1] & 0xf];
+}
+
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
struct min_abbrev_data *mad = cb_data;
 
-   char *hex = oid_to_hex(oid);
unsigned int i = mad->init_len;
-   while (mad->hex[i] && mad->hex[i] == hex[i])
+   while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
i++;
 
if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
-- 
2.14.1.538.g56ec8fc98.dirty



[PATCH v3 5/5] sha1_name: Minimize OID comparisons during disambiguation

2017-10-02 Thread Derrick Stolee
Minimize OID comparisons during disambiguatiosn of packfile OIDs.

Teach git to use binary search with the full OID to find the object's
position (or insertion position, if not present) in the pack-index.
The object before and immediately after (or the one at the insertion
position) give the maximum common prefix.  No subsequent linear search
is required.

Take care of which two to inspect, in case the object id exists in the
packfile.

If the input to find_unique_abbrev_r() is a partial prefix, then the
OID used for the binary search is padded with zeroes so the object will
not exist in the repo (with high probability) and the same logic
applies.

p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.05 s | 0.05 s |   0.0% |
| Git   | 5 |  230162 |  0 | 0.08 s | 0.08 s |   0.0% |
| Git   | 4 |  154310 |  75852 | 0.06 s | 0.06 s |   0.0% |
| Linux | 1 | 5606645 |  0 | 0.14 s | 0.05 s | -64.3% |
| Linux |24 | 5606645 |  0 | 0.94 s | 0.88 s | - 6.4% |
| Linux |23 | 5283204 | 323441 | 0.86 s | 0.80 s | - 7.0% |
| VSTS  | 1 | 4355923 |  0 | 0.11 s | 0.05 s | -54.5% |
| VSTS  |32 | 4355923 |  0 | 0.95 s | 0.95 s |   0.0% |
| VSTS  |31 | 4276829 |  79094 | 1.82 s | 1.93 s | + 6.0% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 4.62 s
  New Time: 4.09 s
 Rel %: -11.5%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.07 s | 0.07 s |   0.0% |
| Git   | 5 |  230162 |  0 | 0.12 s | 0.12 s |   0.0% |
| Git   | 4 |  154310 |  75852 | 0.09 s | 0.09 s |   0.0% |
| Linux | 1 | 5606645 |  0 | 0.13 s | 0.04 s | -69.2% |
| Linux |24 | 5606645 |  0 | 0.89 s | 0.85 s | - 4.5% |
| Linux |23 | 5283204 | 323441 | 0.83 s | 0.78 s | - 6.0% |
| VSTS  | 1 | 4355923 |  0 | 0.11 s | 0.04 s | -63.6% |
| VSTS  |32 | 4355923 |  0 | 0.99 s | 0.98 s | - 1.0% |
| VSTS  |31 | 4276829 |  79094 | 1.02 s | 1.04 s | + 2.0% |

For the Windows repo running in Windows Subsystem for Linux:

Pack Files: 50
Packed Objects: 22,385,898
 Loose Objects: 492
 Base Time: 3.08 s
  New Time: 2.72 s
 Rel %: -11.8

Signed-off-by: Derrick Stolee 
---
 sha1_name.c | 70 +
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 5081aeb71..54b3a37da 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -478,6 +478,7 @@ struct min_abbrev_data {
unsigned int init_len;
unsigned int cur_len;
char *hex;
+   const unsigned char *hash;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, 
void *cb_data)
return 0;
 }
 
+static void find_abbrev_len_for_pack(struct packed_git *p,
+struct min_abbrev_data *mad)
+{
+   int match = 0;
+   uint32_t num, last, first = 0;
+   struct object_id oid;
+
+   open_pack_index(p);
+   num = p->num_objects;
+   last = num;
+   while (first < last) {
+   uint32_t mid = (first + last) / 2;
+   const unsigned char *current;
+   int cmp;
+
+   current = nth_packed_object_sha1(p, mid);
+   cmp = hashcmp(mad->hash, current);
+   if (!cmp) {
+   match = 1;
+   first = mid;
+   break;
+   }
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   /*
+* first is now the position in the packfile where we would insert
+* mad->hash if it does not exist (or the position of mad->hash if
+* it does exist). Hence, we consider a maximum of three objects
+* nearby for the abbreviation length.
+*/
+   mad->init_len = 0;
+   if (!match) {
+   nth_packed_object_oid(, p, first);
+   extend_abbrev_len(, 

[PATCH v3 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-10-02 Thread Derrick Stolee
Create helper program test-abbrev to compute the minimum length of a
disambiguating short-sha for an input list of object ids.

Perf test p0008-abbrev.sh runs test-abbrev for 100,000 object ids. For
test 0008.1, these objects exist. For test 0008.2 these objects do not
exist in the repo (with high probability).

Signed-off-by: Derrick Stolee 
---
 Makefile   |  1 +
 t/helper/.gitignore|  1 +
 t/helper/test-abbrev.c | 18 ++
 t/perf/p0008-abbrev.sh | 22 ++
 4 files changed, 42 insertions(+)
 create mode 100644 t/helper/test-abbrev.c
 create mode 100755 t/perf/p0008-abbrev.sh

diff --git a/Makefile b/Makefile
index 50a2eab80..63438a44e 100644
--- a/Makefile
+++ b/Makefile
@@ -638,6 +638,7 @@ X =
 
 PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
+TEST_PROGRAMS_NEED_X += test-abbrev
 TEST_PROGRAMS_NEED_X += test-chmtime
 TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 9696f54bb..2190781ff 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -1,3 +1,4 @@
+/test-abbrev
 /test-chmtime
 /test-ctype
 /test-config
diff --git a/t/helper/test-abbrev.c b/t/helper/test-abbrev.c
new file mode 100644
index 0..3ad88611a
--- /dev/null
+++ b/t/helper/test-abbrev.c
@@ -0,0 +1,18 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+   struct object_id oid;
+   char hex[GIT_MAX_HEXSZ + 2];
+   const char *end;
+
+   setup_git_directory();
+
+   while (fgets(hex, GIT_MAX_HEXSZ + 2, stdin)) {
+   hex[GIT_MAX_HEXSZ] = 0;
+   if (!parse_oid_hex(hex, , ))
+   find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
+   }
+
+   exit(0);
+}
diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
new file mode 100755
index 0..5cbc8a888
--- /dev/null
+++ b/t/perf/p0008-abbrev.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+test_description='Test object disambiguation through abbreviations'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test-list-objects 10 > objs.txt
+
+test_perf 'find_unique_abbrev() for existing objects' '
+   test-abbrev < objs.txt
+'
+
+test-list-objects --missing 10 > objs.txt
+
+test_perf 'find_unique_abbrev() for missing objects' '
+   test-abbrev < objs.txt
+'
+
+rm objs.txt
+
+test_done
-- 
2.14.1.538.g56ec8fc98.dirty



[PATCH v3 1/5] test-list-objects: List a subset of object ids

2017-10-02 Thread Derrick Stolee
Create test-list-objects helper program to output a random sample of
OIDs present in the repo.

If no command line arguments are provided, all OIDs are output.

The last command line argument specifies how many samples to output.
Samples are collected across the entire set of available OIDs to avoid
clustering and therefore quite uniformly distributed.

If a command line argument "--missing" is given before the sample count,
then a list of OIDs is generated without examining the repo.

Signed-off-by: Derrick Stolee 
---
 Makefile |  1 +
 t/helper/.gitignore  |  1 +
 t/helper/test-list-objects.c | 87 
 3 files changed, 89 insertions(+)
 create mode 100644 t/helper/test-list-objects.c

diff --git a/Makefile b/Makefile
index ed4ca438b..50a2eab80 100644
--- a/Makefile
+++ b/Makefile
@@ -652,6 +652,7 @@ TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
 TEST_PROGRAMS_NEED_X += test-lazy-init-name-hash
 TEST_PROGRAMS_NEED_X += test-line-buffer
+TEST_PROGRAMS_NEED_X += test-list-objects
 TEST_PROGRAMS_NEED_X += test-match-trees
 TEST_PROGRAMS_NEED_X += test-mergesort
 TEST_PROGRAMS_NEED_X += test-mktemp
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 7c9d28a83..9696f54bb 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -13,6 +13,7 @@
 /test-index-version
 /test-lazy-init-name-hash
 /test-line-buffer
+/test-list-objects
 /test-match-trees
 /test-mergesort
 /test-mktemp
diff --git a/t/helper/test-list-objects.c b/t/helper/test-list-objects.c
new file mode 100644
index 0..22bc9b4e6
--- /dev/null
+++ b/t/helper/test-list-objects.c
@@ -0,0 +1,87 @@
+#include "cache.h"
+#include "packfile.h"
+
+struct count {
+   int total;
+   int select_mod;
+};
+
+int count_loose(const struct object_id *oid,
+   const char *path,
+   void *data)
+{
+   ((struct count*)data)->total++;
+   return 0;
+}
+
+int count_packed(const struct object_id *oid,
+struct packed_git *pack,
+uint32_t pos,
+void* data)
+{
+   ((struct count*)data)->total++;
+   return 0;
+}
+
+void output(const struct object_id *oid,
+   struct count *c)
+{
+   if (!(c->total % c->select_mod))
+   printf("%s\n", oid_to_hex(oid));
+   c->total--;
+}
+
+int output_loose(const struct object_id *oid,
+const char *path,
+void *data)
+{
+   output(oid, (struct count*)data);
+   return 0;
+}
+
+int output_packed(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void* data)
+{
+   output(oid, (struct count*)data);
+   return 0;
+}
+
+int cmd_main(int ac, const char **av)
+{
+   uint32_t hash_delt = 0xFDB97531;
+   uint32_t hash_base = 0x01020304;
+   int i, n = -1;
+   struct count c;
+   const int u_per_oid = sizeof(struct object_id) / sizeof(uint32_t);
+
+   c.total = 0;
+   if (ac > 1)
+   n = atoi(av[ac - 1]);
+
+   if (ac > 2 && !strcmp(av[1], "--missing")) {
+   while (c.total++ < n) {
+   for (i = 0; i < u_per_oid; i++) {
+   printf("%08x", hash_base);
+   hash_base += hash_delt;
+   }
+   printf("\n");
+   }
+   } else {
+   setup_git_directory();
+
+   if (n > 0) {
+   for_each_loose_object(count_loose, , 0);
+   for_each_packed_object(count_packed, , 0);
+   c.select_mod = 1 + c.total / n;
+   } else {
+   c.select_mod = 1;
+   }
+
+   for_each_loose_object(output_loose, , 0);
+   for_each_packed_object(output_packed, , 0);
+   }
+
+   exit(0);
+}
-- 
2.14.1.538.g56ec8fc98.dirty



Re: [PATCH v2 4/5] sha1_name: Parse less while finding common prefix

2017-10-02 Thread Derrick Stolee

My v3 patch is incoming, but I wanted to respond directly to this message.

On 9/25/2017 7:42 PM, Stefan Beller wrote:

On Mon, Sep 25, 2017 at 2:54 AM, Derrick Stolee  wrote:

Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.

p0008.1: find_unique_abbrev() for existing objects
--

For 10 repeated tests, each checking 100,000 known objects, we find the
following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.08 s | 0.08 s |   0.0% |
| Git   | 5 |  230162 |  0 | 0.17 s | 0.16 s | - 5.9% |
| Git   | 4 |  154310 |  75852 | 0.14 s | 0.12 s | -14.3% |
| Linux | 1 | 5606645 |  0 | 0.50 s | 0.25 s | -50.0% |
| Linux |24 | 5606645 |  0 | 2.41 s | 2.08 s | -13.7% |
| Linux |23 | 5283204 | 323441 | 1.99 s | 1.69 s | -15.1% |
| VSTS  | 1 | 4355923 |  0 | 0.40 s | 0.22 s | -45.0% |
| VSTS  |32 | 4355923 |  0 | 2.09 s | 1.99 s | - 4.8% |
| VSTS  |31 | 4276829 |  79094 | 3.60 s | 3.20 s | -11.1% |

For the Windows repo running in Windows Subsystem for Linux:

 Pack Files: 50
Packed Objects: 22,385,898
  Loose Objects: 492
  Base Time: 4.61 s
   New Time: 4.61 s
  Rel %: 0.0%

p0008.2: find_unique_abbrev() for missing objects
-

For 10 repeated tests, each checking 100,000 missing objects, we find
the following results when running in a Linux VM:

|   | Pack  | Packed  | Loose  | Base   | New||
| Repo  | Files | Objects | Objects| Time   | Time   | Rel%   |
|---|---|-|||||
| Git   | 1 |  230078 |  0 | 0.06 s | 0.05 s | -16.7% |
| Git   | 5 |  230162 |  0 | 0.14 s | 0.15 s | + 7.1% |
| Git   | 4 |  154310 |  75852 | 0.12 s | 0.12 s |   0.0% |
| Linux | 1 | 5606645 |  0 | 0.40 s | 0.17 s | -57.5% |
| Linux |24 | 5606645 |  0 | 1.59 s | 1.30 s | -18.2% |
| Linux |23 | 5283204 | 323441 | 1.23 s | 1.10 s | -10.6% |
| VSTS  | 1 | 4355923 |  0 | 0.25 s | 0.12 s | -52.0% |
| VSTS  |32 | 4355923 |  0 | 1.45 s | 1.34 s | - 7.6% |
| VSTS  |31 | 4276829 |  79094 | 1.59 s | 1.34 s | -15.7% |

For the Windows repo running in Windows Subsystem for Linux:

 Pack Files: 50
Packed Objects: 22,385,898
  Loose Objects: 492
  Base Time: 3.91 s
   New Time: 3.08 s
  Rel %: -21.1%


These number look pretty cool!


Signed-off-by: Derrick Stolee 

Signed-off-by: Derrick Stolee 

double signoff?


Oops! I'll be more careful with my format-patch in the future.




---
  sha1_name.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f2a1ebe49..bb47b6702 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,13 +480,22 @@ struct min_abbrev_data {
 char *hex;
  };

+static inline char get_hex_char_from_oid(const struct object_id *oid, int i)

'i' is not very descriptive, maybe add a comment?
(I realize it is just walking through the char*s one by one)

I renamed 'i' to 'pos' in my v3.



Maybe this function (together with the change in the while below)
could go into hex.c as "int progressively_cmp_oids" that returns
the position at which the given oids differ?


+{
+   static const char hex[] = "0123456789abcdef";
+
+   if ((i & 1) == 0)
+   return hex[oid->hash[i >> 1] >> 4];
+   else
+   return hex[oid->hash[i >> 1] & 0xf];
+}

sha1_to_hex_r has very similar code, though it iterates less
and covers both cases in the loop body.

That is the actual reason I propose moving this function
(or a variant thereof) to hex.c as there we can share code.


You're right that sha1_to_hex_r is similar, in fact I based my work on 
it. There are a few reasons I didn't combine the two implementations:


* I wanted to be sure my patch was only touching the code for 
disambiguating short-shas. Modifying code in hex.c would touch many more 
code paths.


* I realize that the extra branch in my version is slower than the 
branchless loop body in sha1_to_hex_r, so either I would slow that 
method or make the method call more complicated by returning two chars 
at a time.


* I wanted to strongly hint that the method should be inlined, but I'm 
not sure how to guarantee that happens across a linker boundary without 
doing strange things in header files.


I'm happy to revisit this after my patch is complete, since I think 
there are interesting trade-offs to consider here. I'd prefer to keep 
this discussion separate from the focus on 

Re: [PATCH v6 09/40] Add initial external odb support

2017-10-02 Thread Ben Peart



On 9/29/2017 4:36 PM, Jonathan Tan wrote:

On Wed, 27 Sep 2017 18:46:30 +0200
Christian Couder  wrote:


I am ok to split the patch series, but I am not sure that 01/40 to
09/40 is the right range for the first patch series.
I would say that 01/40 to 07/40 is better as it can be seen as a
separate refactoring.


I mentioned 09/40 because this is (as far as I can tell) the first one
that introduces a new design.


I don't think single-shot processes would be a huge burden, because
the code is simpler, and because for example for filters we already
have single shot and long-running processes and no one complains about
that. It's code that is useful as it makes it much easier for people
to do some things (see the clone bundle example).

In fact in Git development we usually start to by first implementing
simpler single-shot solutions, before thinking, when the need arise,
to make it faster. So a perhaps an equally valid opinion could be to
first only submit the patches for the single-shot protocol and later
submit the rest of the series when we start getting feedback about how
external odbs are used.


My concern is that, as far as I understand about the Microsoft use case,
we already know that we need the faster solution, so the need has
already arisen.


And yeah I could change the order of the patch series to implement the
long-running processes first and the single-shot process last, so that
it could be possible to first get feedback about the long-running
processes, before we decide to merge or not the single-shot stuff, but
I don't think it would look like the most logical order.


My thinking was that we would just implement the long-running process
and not implement the single-shot process at all (besides maybe a script
in contrib/). If we are going to do both anyway, I agree that we should
do the single-shot process first.



I agree with Jonathan's feedback.  We already know the performance of 
single shot requests is insufficient as there are scenarios where there 
will potentially be many missing objects that need to be retrieved to 
complete a git operation (ie checkout).  As a results, we will need the 
long running process model so, overall, it will be simpler to focus 
entirely on that model and skip the single-shot model.


If the complexity of the process model is considered to be too high, we 
can provide helper code in both script and native code that can be used 
to reduce the cost/complexity.  I believe we have most of this already 
with the existing sub-process.c/h module and the packet.pm refactoring 
you have done earlier in this series.


Providing high quality working samples of both is another way to reduce 
the cost and improve the quality.



And another possible issue is that we design ourselves into a corner.
Thinking about the use cases that I know about (the Android use case and
the Microsoft GVFS use case), I don't think we are doing that - for
Android, this means that large blob metadata needs to be part of the
design (and this patch series does provide for that), and for Microsoft
GVFS, "get" is relatively cheap, so a configuration option to not invoke
"have" first when loading a missing object might be sufficient.


If the helper does not advertise the "have" capability, the "have"
instruction will not be sent to the helper, so the current design is
already working for that case.


Ah, that's good to know.


And I think that my design can be extended to support a use case in
which, for example, blobs corresponding to a certain type of filename
(defined by a glob like in gitattributes) can be excluded during
fetch/clone, much like --blob-max-bytes, and they can be fetched either
through the built-in mechanism or through a custom hook.


Sure, we could probably rebuild something equivalent to what I did on
top of your design.
My opinion though is that if we want to eventually get to the same
goal, it is better to first merge something that get us very close to
the end goal and then add some improvements on top of it.


I agree - I mentioned that because I personally prefer to review smaller
patch sets at a time, and my patch set already includes a lot of the
same infrastructure needed by yours - for example, the places in the
code to dynamically fetch objects, exclusion of objects when fetching or
cloning, configuring the cloned repo when cloning, fsck, and gc.



I agree here has well.  I think smaller patch sets that we can 
review/approve independently will be more effective.


I think Jonathan has a lot of the infrastructure support in his partial 
clone series.  I'd like to take that work, add your external ODB work + 
Jeff's filtering work and come up with the best of all three solution. :)



  - I get compile errors when I "git am" these onto master. I think
'#include "config.h"' is needed in some places.


It's strange because I get no compile errors even after a "make clean"
from my branch.
Could you show the actual errors?


I 

Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Johannes Schindelin
Hi Joan,

On Sun, 1 Oct 2017, Joan Daemen wrote:

> On 30/09/17 00:33, Johannes Schindelin wrote:
> 
> > As far as Git is concerned, we not only care about the source code of
> > the hash algorithm we use, we need to care even more about what you
> > call "executable": ready-to-use, high quality, well-tested
> > implementations.
> > 
> > We carry source code for SHA-1 as part of Git's source code, which was
> > hand-tuned to be as fast as Linus could get it, which was tricky given
> > that the tuning should be general enough to apply to all common intel
> > CPUs.
> > 
> > This hand-crafted code was blown out of the water by OpenSSL's SHA-1
> > in our tests here at Microsoft, thanks to the fact that OpenSSL does
> > vectorized SHA-1 computation now.
> > 
> > To me, this illustrates why it is not good enough to have only a
> > reference implementation available at our finger tips. Of course,
> > above-mentioned OpenSSL supports SHA-256 and SHA3-256, too, and at
> > least recent versions vectorize those, too.
> 
> There is a lot of high-quality optimized code for all SHA-3 functions
> and many CPUs in the Keccak code package
> https://github.com/gvanas/KeccakCodePackage but also OpenSSL contains
> some good SHA-3 code and then there are all those related to Ethereum.
> 
> By the way, you speak about SHA3-256, but the right choice would be to
> use SHAKE128. Well, what is exactly the right choice depends on what you
> want. If you want to have a function in the SHA3 standard (FIPS 202), it
> is SHAKE128.  You can boost performance on high-end CPUs by adopting
> Parallelhash from NIST SP 800-185, still a NIST standard. You can
> multiply that performance again by a factor of 2 by adopting
> KangarooTwelve. This is our (Keccak team) proposal for a parallelizable
> Keccak-based hash function that has a safety margin comparable to that
> of the SHA-2 functions. See https://keccak.team/kangarootwelve.html May
> I also suggest you read https://keccak.team/2017/is_sha3_slow.html

Thanks.

I have to admit that all those names that do not start with SHA and do not
end in 256 make me a bit dizzy.

> > Back to Intel processors: I read some vague hints about extensions
> > accelerating SHA-256 computation on future Intel processors, but not
> > SHA3-256.
> > 
> > It would make sense, of course, that more crypto libraries and more
> > hardware support would be available for SHA-256 than for SHA3-256
> > given the time since publication: 16 vs 5 years (I am playing it loose
> > here, taking just the year into account, not the exact date, so please
> > treat that merely as a ballpark figure).
> > 
> > So from a practical point of view, I wonder what your take is on, say,
> > hardware support for SHA3-256. Do you think this will become a focus
> > soon?
> 
> I think this is a chicken-and-egg problem. In any case, hardware support
> for one SHA3-256 will also work for the other SHA3 and SHAKE functions
> as they all use the same underlying primitive: the Keccak-f permutation.
> This is not the case for SHA2 because SHA224 and SHA256 use a different
> compression function than SHA384, SHA512, SHA512/224 and SHA512/256.

Okay.

So given that Git does not exactly have a big sway on hardware vendors, we
would have to hope that some other chicken lays that egg.

> > Also, what is your take on the question whether SHA-256 is good
> > enough?  SHA-1 was broken theoretically already 10 years after it was
> > published (which unfortunately did not prevent us from baking it into
> > Git), after all, while SHA-256 is 16 years old and the only known
> > weakness does not apply to Git's usage?
> 
> SHA-256 is more conservative than SHA-1 and I don't expect it to be
> broken in the coming decades (unless NSA inserted a backdoor but I don't
> think that is likely). But looking at the existing cryptanalysis, I
> think it is even less likely that I SHAKE128, ParallelHash or
> KangarooTwelve will be broken anytime.

That's reassuring! ;-)

> > Also, while I have the attention of somebody who knows a heck more
> > about cryptography than Git's top 10 committers combined: how soon do
> > you expect practical SHA-1 attacks that are much worse than what we
> > already have seen? I am concerned that if we do not move fast enough
> > to a new hash algorithm, and somebody finds a way in the meantime to
> > craft arbitrary messages given a prefix and an SHA-1, then we have a
> > huge problem on our hands.
> 
> This is hard to say. To be honest, when witnessing the first MD5
> collisions I did not expect them to lead to some real world attacks and
> just a few years later we saw real-world forged certificates based on
> MD5 collisions. And SHA-1 has a lot in common with MD5...

Oh, okay. I did not realize that MD5 and SHA-1 are so similar in design,
thank you for educating me!

> But let me end with a philosophical note. Independent of all the
> arguments for and against, I think this is ultimately about doing the
> right thing. The choice is here 

Re: [PATCH v6 00/40] Add initial experimental external ODB support

2017-10-02 Thread Ben Peart



On 9/16/2017 4:06 AM, Christian Couder wrote:


Highlevel view of the patches in the series
~~~



This is a massive patch series and IMO keeping it a single monolithic 
set of patches makes it difficult to review and unwieldy to make 
progress on as an "all or nothing" series.


I highly recommend breaking it up into multiple smaller patch series 
that can be reviewed and accepted individually.  I have to admit I've 
skipped reviewing the last couple of iterations simply because of the 
time investment required and difficulty separating out the various pieces.


I think using your division of the patches below is a good place to 
start.  Many of these would be good changes even if they weren't part of 
the larger external ODB effort.


  - Patch 1/40 is a small code cleanup that I already sent to the

   mailing list but may be removed in the end due to ongoing work
   on "git clone".

 - Patches 02/40 to 07/40 create a "Git/Packet.pm" module by
   refactoring "t0021/rot13-filter.pl". Functions from this new
   module will be used later in test scripts. According to Junio's
   suggestion compared to v5 we now first fully refactor
   "t0021/rot13-filter.pl" before creating the "Git/Packet.pm"
   module.



This seems like a very logical thing to do and should be split out so 
that progress can be made independently.



 - Patches 08/40 to 16/40 create the external ODB insfrastructure
   in external-odb.{c,h} and odb-helper.{c,h} for the script mode.
   The main changes compared to v5 are the following:
 - we mark as "extern" functions in *.h files
- we use sha1_pos() instead of sha1_entry_pos()
- we check the size in the header when we 'get' a Git object



This is the heart of the ODB infrastructure series.  I'll respond in 
more detail in the specific patches.



 - Patches 17/40 to 23/40 improve lib-http to make it possible to
   use it as an external ODB to test storing blobs in an HTTP
   server. The "upload.sh" and "list.sh" files are now properly
   indented and they use %% instead of % in parameter
   substitutions compared to v5.

 - Patches 24/40 to 32/40 improve the external ODB insfrastructure
   to support sub-processes and make everything work using
   them. The main changes compared to v5 are the following:
 - we mark as "extern" functions in *.h files
- we use the new subprocess_handshake() function
- we check the size in the header when we 'get' a Git object

 - Patch 33/40 uses attributes to mark blobs that should be handled
   by an external odb.

 - Patch 34/40 adds documentation about the external odb
   mechanism. This patch has been much improved since v5.

 - Patches 35/40 to 39/40 add the --initial-refspec to git clone
   along with tests.

 - Patch 40/40 adds documentation about transfering objects and
   metadata when using the external odb mechanism. This patch is
   new since v5.

Future work
~~~

There are still things that could be cleaned or improved. I think I
may work on:

   - Integrate changes in recent "read-object-process" work by Ben Peart.

   - Better test all the combinations of the different modes with and
 without "have" and "put_*" instructions.

   - Maybe implement the missing kinds of 'put' ('put_git_obj' and
 'put_direct'), so that Git could pass either a git object a plain
 object or ask the helper to retreive it directly from Git's object
 database.

   - Add more long running tests and improve tests in general.

Previous work and discussions
~

(Sorry for the old Gmane links, I hope I will try to replace them with
public-inbox.org at one point.)

Peff started to work on this and discuss this some years ago:

http://thread.gmane.org/gmane.comp.version-control.git/206886/focus=207040
http://thread.gmane.org/gmane.comp.version-control.git/247171
http://thread.gmane.org/gmane.comp.version-control.git/202902/focus=203020

His work, which is not compile-tested any more, is still there:

https://github.com/peff/git/commits/jk/external-odb-wip

Initial discussions about this new series are there:

http://thread.gmane.org/gmane.comp.version-control.git/288151/focus=295160

Version 1, 2, 3, 4 and 5 of this series are here:

https://public-inbox.org/git/20160613085546.11784-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20160628181933.24620-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20161130210420.15982-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/
https://public-inbox.org/git/20170803091926.1755-1-chrisc...@tuxfamily.org/

Some of the discussions related to Ben Peart's work that is used by
this series are here:

https://public-inbox.org/git/20170113155253.1644-1-benpe...@microsoft.com/

Re: What means "git config bla ~/"?

2017-10-02 Thread René Scharfe
[When I try to reply to rpj...@crashcourse.ca then my SMTP server
 says "Requested action not taken: mailbox unavailable
 invalid DNS MX or A/ resource record.", so I'm replying only
 to the list.]

Am 02.10.2017 um 12:13 schrieb rpj...@crashcourse.ca:
> i'm sure i'm about to embarrass myself but, in "man git-config",
> OPTIONS, one reads:
> 
> --path
> 
> git-config will expand leading ~ to the value of $HOME, and ~user to
> the   home directory for the specified user. This option has no
> effect when setting the value (but you can use git config bla ~/ from
> the command line to let your shell do the expansion).
> 
> what's with that "git config bla ~/"? is this some config keyword or
> something?
"bla" represents any configuration option that you want to change.  It
is not literally a keyword recognized by git config.  You could replace
it with "foo", if that helps.

The comment in parentheses reminds readers that shells like bash do
tilde expansion for command line arguments.  Consider the following
command:

$ git config format.outputDirectory ~/patches

Your shell rewrites "~/patches" to "/home/rpjday/patches" (or similar,
depending on your user name and home directory) before actually
executing git config. So even though git config doesn't expand ~ in
option values you'll still get a result like this afterwards:

$ git config format.outputDirectory
/home/rpjday/patches

René


Re: RFC v3: Another proposed hash function transition plan

2017-10-02 Thread Jason Cooper
Hi Johannes,

Thanks for the response.  Sorry for the delay.  Had a large deadline for
$dayjob.

On Wed, Sep 27, 2017 at 12:11:14AM +0200, Johannes Schindelin wrote:
> On Tue, 26 Sep 2017, Jason Cooper wrote:
> > On Thu, Sep 14, 2017 at 08:45:35PM +0200, Johannes Schindelin wrote:
> > > On Wed, 13 Sep 2017, Linus Torvalds wrote:
> > > > On Wed, Sep 13, 2017 at 6:43 AM, demerphq  wrote:
> > > > > SHA3 however uses a completely different design where it mixes a 1088
> > > > > bit block into a 1600 bit state, for a leverage of 2:3, and the excess
> > > > > is *preserved between each block*.
> > > > 
> > > > Yes. And considering that the SHA1 attack was actually predicated on
> > > > the fact that each block was independent (no extra state between), I
> > > > do think SHA3 is a better model.
> > > > 
> > > > So I'd rather see SHA3-256 than SHA256.
> > 
> > Well, for what it's worth, we need to be aware that SHA3 is *different*.
> > In crypto, "different" = "bugs haven't been found yet".  :-P
> > 
> > And SHA2 is *known*.  So we have a pretty good handle on how it'll
> > weaken over time.
> 
> Here, you seem to agree with me.

Yep.

> > > SHA-256 got much more cryptanalysis than SHA3-256, and apart from the
> > > length-extension problem that does not affect Git's usage, there are no
> > > known weaknesses so far.
> > 
> > While I think that statement is true on it's face (particularly when
> > including post-competition analysis), I don't think it's sufficient
> > justification to chose one over the other.
> 
> And here you don't.
> 
> I find that very confusing.

What I'm saying is that there is more to selecting a hash function for
git than just the cryptographic assessment.  In fact I would argue that
the primary cryptographic concern for git is "What is the likelihood
that we'll wake up one day to full collisions with no warning?"

To that, I'd argue that SHA-256's time in the field and SHA3-256's
competition give them both passing marks in that regard.  fwiw, I'd also
put Blake and Skein in there as well.

The chance that any of those will suffer sudden, catastrophic failure is
minimal.  IOW, we'll have warnings, and time to migrate to the next
function.

None of us can predict the future, but having a significant amount of
vetting reduces the chances of catastrophic failure.

> > > It would seem that the experts I talked to were much more concerned about
> > > that amount of attention than the particulars of the algorithm. My
> > > impression was that the new features of SHA3 were less studied than the
> > > well-known features of SHA2, and that the new-ness of SHA3 is not
> > > necessarily a good thing.
> > 
> > The only thing I really object to here is the abstract "experts".  We're
> > talking about cryptography and integrity here.  It's no longer
> > sufficient to cite anonymous experts.  Either they can put their
> > thoughts, opinions and analysis on record here, or it shouldn't be
> > considered.  Sorry.
> 
> Sorry, you are asking cryptography experts to spend their time on the Git
> mailing list. I tried to get them to speak out on the Git mailing list.
> They respectfully declined.

Ok, fair enough.  Just please understand that it's difficult to place
much weight on statements that we can't discuss with the person who made
them.

> > However, whether we chose SHA2 or SHA3 doesn't matter.
> 
> To you, it does not matter.

Well, I'd say it does not matter for *most* users.

> To me, it matters. To the several thousand developers working on Windows,
> probably the largest Git repository in active use, it matters. It matters
> because the speed difference that has little impact on you has a lot more
> impact on us.

Ahhh, so if I understand you correctly, you'd prefer SHA-256 over
SHA3-256 because it's more performant for your usecase?  Well, that's a
completely different animal that cryptographic suitability.

Have you been able to crunch numbers yet?  Will you be able to share
some empirical data?  I'd love to see some comparisons between SHA1,
SHA-256, SHA512-256, and SHA3-256 for different git operations under
your work load.

> > If SHA3 is chosen as the successor, it's going to get a *lot* more
> > adoption, and thus, a lot more analysis.  If cracks start to show, the
> > hard work of making git flexible is already done.  We can migrate to
> > SHA4/5/whatever in an orderly fashion with far less effort than the
> > transition away from SHA1.
> 
> Sure. And if XYZ789 is chosen, it's going to get a *lot* more adoption,
> too.
> 
> We think.
> 
> Let's be realistic. Git is pretty important to us, but it is not important
> enough to sway, say, Intel into announcing hardware support for SHA3.
> And if you try to force through *any* hash function only so that it gets
> more adoption and hence more support,

That's quite a jump from what I was saying.  I would never advise using
code in a production setting just to increase adoption.

What I /was/ saying: Let's say you don't get 

[PATCH 2/3] for-each-ref: let upstream/push optionally report merge name.

2017-10-02 Thread Johannes Schindelin
From: J Wyman 

There are times when scripts want to know not only the name of the
push branch on the remote, but also the name of the branch as known
by the remote repository.

A useful example of this is with the push command where
`branch..merge` is useful as the  value. Example:

$ git push  :

This patch offers the new suffix :merge for the push atom, allowing to
show exactly that. Example:

$ cat .git/config
...
[remote "origin"]
url = https://where.do.we.come/from
fetch = refs/heads/*:refs/remote/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
[branch "develop/with/topics"]
remote = origin
merge = refs/heads/develop/with/topics
...

$ git for-each-ref \
--format='%(push) %(push:remote-ref)' \
refs/heads
refs/remotes/origin/master refs/heads/master
refs/remotes/origin/develop/with/topics refs/heads/develop/with/topics
---
 Documentation/git-for-each-ref.txt | 11 ++-
 ref-filter.c   | 12 +++-
 remote.c   | 30 ++
 remote.h   |  2 ++
 4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index 776368ee531..ba1147a1223 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -141,8 +141,9 @@ upstream::
also prints "[gone]" whenever unknown upstream ref is
encountered. Append `:track,nobracket` to show tracking
information without brackets (i.e "ahead N, behind M").  Also
-   respects `:remote-name` to state the name of the *remote* instead of
-   the ref.
+   respects `:remote-name` to state the name of the *remote* instead
+   of the ref, and `:remote-ref` to state the name of the *reference*
+   as locally known by the remote.
 +
 Has no effect if the ref does not have tracking information associated
 with it.  All the options apart from `nobracket` are mutually exclusive,
@@ -151,9 +152,9 @@ but if used together the last option is selected.
 push::
The name of a local ref which represents the `@{push}`
location for the displayed ref. Respects `:short`, `:lstrip`,
-   `:rstrip`, `:track`, `:trackshort` and `:remote-name` options as
-   `upstream` does. Produces an empty string if no `@{push}` ref is
-   configured.
+   `:rstrip`, `:track`, `:trackshort`, `:remote-name`, and `:remote-ref`
+   options as `upstream` does. Produces an empty string if no `@{push}`
+   ref is configured.
 
 HEAD::
'*' if HEAD matches current ref (the checked out branch), ' '
diff --git a/ref-filter.c b/ref-filter.c
index 58d53c09390..4401e573b8a 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -77,7 +77,7 @@ static struct used_atom {
struct align align;
struct {
enum {
-   RR_REF, RR_TRACK, RR_TRACKSHORT, RR_REMOTE_NAME
+   RR_REF, RR_TRACK, RR_TRACKSHORT, 
RR_REMOTE_NAME, RR_REMOTE_REF
} option;
struct refname_atom refname;
unsigned int nobracket : 1;
@@ -160,6 +160,8 @@ static void remote_ref_atom_parser(const struct ref_format 
*format, struct used_
atom->u.remote_ref.nobracket = 1;
else if (!strcmp(s, "remote-name"))
atom->u.remote_ref.option = RR_REMOTE_NAME;
+   else if (!strcmp(s, "remote-ref"))
+   atom->u.remote_ref.option = RR_REMOTE_REF;
else {
atom->u.remote_ref.option = RR_REF;

refname_atom_parser_internal(>u.remote_ref.refname,
@@ -1260,6 +1262,14 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
*s = xstrdup(remote);
else
*s = "";
+   } else if (atom->u.remote_ref.option == RR_REMOTE_REF) {
+   int explicit, for_push = starts_with(atom->name, "push");
+   const char *merge = remote_ref_for_branch(branch, for_push,
+ );
+   if (explicit)
+   *s = xstrdup(merge);
+   else
+   *s = "";
} else
die("BUG: unhandled RR_* enum");
 }
diff --git a/remote.c b/remote.c
index b220f0dfc61..2bdcfc280cd 100644
--- a/remote.c
+++ b/remote.c
@@ -675,6 +675,36 @@ const char *pushremote_for_branch(struct branch *branch, 
int *explicit)
return remote_for_branch(branch, explicit);
 }
 
+const char *remote_ref_for_branch(struct branch *branch, int for_push,
+ 

[PATCH 3/3] for-each-ref: test :remote-name and :remote-ref

2017-10-02 Thread Johannes Schindelin
This not only prevents regressions, but also serves as documentation
what this new feature is expected to do.

Signed-off-by: Johannes Schindelin 
---
 t/t6300-for-each-ref.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 2274a4b7331..edc73dd79aa 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -675,4 +675,36 @@ test_expect_success 'Verify usage of %(symref:rstrip) 
atom' '
test_cmp expected actual
 '
 
+test_expect_success ':remote-name and :remote-ref' '
+   git init remote-tests &&
+   (
+   cd remote-tests &&
+   test_commit initial &&
+   git remote add from fifth.coffee:blub &&
+   git config branch.master.remote from &&
+   git config branch.master.merge refs/heads/stable &&
+   git remote add to southridge.audio:repo &&
+   git config remote.to.push "refs/heads/*:refs/heads/pushed/*" &&
+   git config branch.master.pushRemote to &&
+   for pair in "%(upstream)=refs/remotes/from/stable" \
+   "%(upstream:remote-name)=from" \
+   "%(upstream:remote-ref)=refs/heads/stable" \
+   "%(push)=refs/remotes/to/pushed/master" \
+   "%(push:remote-name)=to" \
+   "%(push:remote-ref)=refs/heads/pushed/master"
+   do
+   echo "${pair#*=}" >expect &&
+   git for-each-ref --format="${pair%=*}" \
+   refs/heads/master >actual &&
+   test_cmp expect actual
+   done &&
+   git branch push-simple &&
+   git config branch.push-simple.pushRemote from &&
+   actual="$(git for-each-ref \
+   --format="%(push:remote-name),%(push:remote-ref)" \
+   refs/heads/push-simple)" &&
+   test from, = "$actual"
+   )
+'
+
 test_done
-- 
2.14.2.windows.1


[PATCH 1/3] for-each-ref: let upstream/push optionally report the remote name

2017-10-02 Thread Johannes Schindelin
There are times when e.g. scripts want to know not only the name of the
upstream branch on the remote repository, but also the name of the
remote.

This patch offers the new suffix :remote for the upstream and for the
push atoms, allowing to show exactly that. Example:

$ cat .git/config
...
[remote "origin"]
url = https://where.do.we.come/from
fetch = refs/heads/*:refs/remote/origin/*
[remote "hello-world"]
url = https://hello.world/git
fetch = refs/heads/*:refs/remote/origin/*
pushURL = hello.world:git
push = refs/heads/*:refs/heads/*
[branch "master"]
remote = origin
pushRemote = hello-world
...

$ git for-each-ref \
--format='%(upstream) %(upstream:remote) %(push:remote)' \
refs/heads/master
refs/remotes/origin/master origin hello-world

Signed-off-by: Johannes Schindelin 
---
 Documentation/git-for-each-ref.txt | 16 +---
 ref-filter.c   | 25 +
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index 66b4e0a4050..776368ee531 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -140,17 +140,19 @@ upstream::
(behind), "<>" (ahead and behind), or "=" (in sync). `:track`
also prints "[gone]" whenever unknown upstream ref is
encountered. Append `:track,nobracket` to show tracking
-   information without brackets (i.e "ahead N, behind M").  Has
-   no effect if the ref does not have tracking information
-   associated with it.  All the options apart from `nobracket`
-   are mutually exclusive, but if used together the last option
-   is selected.
+   information without brackets (i.e "ahead N, behind M").  Also
+   respects `:remote-name` to state the name of the *remote* instead of
+   the ref.
++
+Has no effect if the ref does not have tracking information associated
+with it.  All the options apart from `nobracket` are mutually exclusive,
+but if used together the last option is selected.
 
 push::
The name of a local ref which represents the `@{push}`
location for the displayed ref. Respects `:short`, `:lstrip`,
-   `:rstrip`, `:track`, and `:trackshort` options as `upstream`
-   does. Produces an empty string if no `@{push}` ref is
+   `:rstrip`, `:track`, `:trackshort` and `:remote-name` options as
+   `upstream` does. Produces an empty string if no `@{push}` ref is
configured.
 
 HEAD::
diff --git a/ref-filter.c b/ref-filter.c
index bc591f4f3de..58d53c09390 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -76,7 +76,9 @@ static struct used_atom {
char color[COLOR_MAXLEN];
struct align align;
struct {
-   enum { RR_REF, RR_TRACK, RR_TRACKSHORT } option;
+   enum {
+   RR_REF, RR_TRACK, RR_TRACKSHORT, RR_REMOTE_NAME
+   } option;
struct refname_atom refname;
unsigned int nobracket : 1;
} remote_ref;
@@ -156,6 +158,8 @@ static void remote_ref_atom_parser(const struct ref_format 
*format, struct used_
atom->u.remote_ref.option = RR_TRACKSHORT;
else if (!strcmp(s, "nobracket"))
atom->u.remote_ref.nobracket = 1;
+   else if (!strcmp(s, "remote-name"))
+   atom->u.remote_ref.option = RR_REMOTE_NAME;
else {
atom->u.remote_ref.option = RR_REF;

refname_atom_parser_internal(>u.remote_ref.refname,
@@ -1247,6 +1251,15 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
*s = ">";
else
*s = "<>";
+   } else if (atom->u.remote_ref.option == RR_REMOTE_NAME) {
+   int explicit;
+   const char *remote = starts_with(atom->name, "push") ?
+   pushremote_for_branch(branch, ) :
+   remote_for_branch(branch, );
+   if (explicit)
+   *s = xstrdup(remote);
+   else
+   *s = "";
} else
die("BUG: unhandled RR_* enum");
 }
@@ -1364,9 +1377,13 @@ static void populate_value(struct ref_array_item *ref)
continue;
branch = branch_get(branch_name);
 
-   refname = branch_get_push(branch, NULL);
-   if (!refname)
-   continue;
+   if (starts_with(name, "push:remote-"))
+

[PATCH 0/3] for-each-ref: add :remote-ref and :remote-name specifiers

2017-10-02 Thread Johannes Schindelin
This introduces support for

git for-each-ref \
--format="%(merge:remote-name),%(merge:remote-ref)"
git for-each-ref \
--format="%(push:remote-name),%(push:remote-ref)"

to figure out the remote nickname as well as the name of the corresponding
branch on the remote.

Note: the `%(push:remote-name)` placeholder is only interpolated by the value
of `branch..pushRemote`; unlike `git push`, it does not fall back to
`branch..remote`. Likewise, `%(push:remote-ref)` interpolates to the
empty string unless `remote..pushRefs` is configured.

This is useful for third-party tools that need to know this type of information
for tons of branches.


J Wyman (1):
  for-each-ref: let upstream/push optionally report merge name.

Johannes Schindelin (2):
  for-each-ref: let upstream/push optionally report the remote name
  for-each-ref: test :remote-name and :remote-ref

 Documentation/git-for-each-ref.txt | 19 +++
 ref-filter.c   | 35 +++
 remote.c   | 30 ++
 remote.h   |  2 ++
 t/t6300-for-each-ref.sh| 32 
 5 files changed, 106 insertions(+), 12 deletions(-)


base-commit: ea220ee40cbb03a63ebad2be902057bf742492fd
Published-As: 
https://github.com/dscho/git/releases/tag/ref-filter-remote-name-v1
Fetch-It-Via: git fetch https://github.com/dscho/git ref-filter-remote-name-v1
-- 
2.14.2.windows.1



Re: [PATCH v16 1/6] bisect--helper: use OPT_CMDMODE instead of OPT_BOOL

2017-10-02 Thread Pranit Bauva
Hey Ramsay,

On Sat, Sep 30, 2017 at 6:29 PM, Ramsay Jones
 wrote:

> Hi Pranit,
>
> Just before Junio dropped your 'pb/bisect' branch from his
> repository (and What's cooking), I fetched it locally with
> the intention of finishing it off. (It would have been silly
> to waste all your good work).

Thanks!

> Although I have rebased your branch a few times, and added
> a few commits while 'reading' the code, I haven't actually
> added much to your branch (only 12 commits and I had meant
> to squash some of those together)!
>
> However, there were some bug fixes in there, so you may want
> to take a look at:
>
> git://repo.or.cz/git/raj.git branch 'bisect'
>
> [the 'pb-bisect' branch was the original branch from Junio,
> including the 'SQUASH' commit that I squashed!]

Yes, I have checked it out. I had worked on Stephan's review and
updated a few parts. I think you have included that as well as some of
your modifications. I will squash it in together and send the series
out in parts.

> These patches seem to relate to patches 1-5 & 8 of the original
> series. The diff between these patches and the first 6 patches
> of my bisect branch is given below. Note that most of the diff
> seems to be caused by swapping patch #6 for #8, but not all of
> the hunks are caused by this.
>
> Note that I moved some code between patches (e.g. some of the
> GIT_PATH_FUNC()s moved out of patch #4, because they were not
> used in that patch. Ah, is that why you moved patch #8 up?).
> [Also I added the 'bisect clean' message to delet_refs() to
> patch #4 as well.]

Even I thought keeping GIT_PATH_FUNC()s should be declared whenever
required. Did that change in this patch series.

> Look for []-ed comments in the commit messages for a note of
> the changes I made to your original patches, in patches #2,
> #4, #7-9, #11-12 and #14.
>
> The commits I added, which are just WIP, are as follows:
>
>   $ git log --oneline bisect~12..bisect
>   7d7117040 (raj/bisect, bisect) bisect--helper: convert to struct object_id
>   188ea5855 bisect--helper: add the get_bad_commit() function
>   b75f46fb4 bisect--helper: add a log_commit() helper function
>   4afc34403 bisect--helper: reduce the scope of a variable
>   62495f6ae bisect--helper: remove useless sub-expression in condition
>   964f4e2b0 bisect--helper: set correct term from --term-new= option
>   62efc099f bisect--helper: remove redundant assignment to has_double_dash
>   d35950b92 bisect--helper: remove redundant goto's
>   b33f313ac bisect--helper: remove space just before \n in string
>   3eb407156 bisect--helper: remove some unnecessary braces
>   c2b89c9b8 bisect--helper: add some vertical whitespace
>   8c883701c bisect--helper: fix up some coding style issues
>   $
>
> Again IIRC, there are a couple of bug fixes in these commits ...

There is actually a major bug in the later part of previous series
mostly in the bisect-next which actually caused delays. I think you
have fixed it in your commit 682d0bff0. Although I would need to have
a closer look at it. In original series, I did get a sigserv, and as
you mention it in the commit that you have fixed it.

> I have to go now, so I will leave it with you. ;-)
>
> Hope that helps.
>
> ATB,
> Ramsay Jones

Again Thanks!

>
> -- >8 --
> diff --git a/bisect.c b/bisect.c
> index 2838d672d..b19311ca7 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -1066,7 +1066,7 @@ int bisect_clean_state(void)
> struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
> for_each_ref_in("refs/bisect", mark_for_removal, (void *) 
> _for_removal);
> string_list_append(_for_removal, xstrdup("BISECT_HEAD"));
> -   result = delete_refs("bisect: remove", _for_removal, 
> REF_NODEREF);
> +   result = delete_refs("bisect: clean", _for_removal, REF_NODEREF);
> refs_for_removal.strdup_strings = 1;
> string_list_clear(_for_removal, 0);
> unlink_or_warn(git_path_bisect_expected_rev());
> diff --git a/builtin/am.c b/builtin/am.c
> index c973bd96d..aa66f9915 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -32,22 +32,6 @@
>  #include "apply.h"
>  #include "string-list.h"
>
> -/**
> - * Returns 1 if the file is empty or does not exist, 0 otherwise.
> - */
> -static int is_empty_file(const char *filename)
> -{
> -   struct stat st;
> -
> -   if (stat(filename, ) < 0) {
> -   if (errno == ENOENT)
> -   return 1;
> -   die_errno(_("could not stat %s"), filename);
> -   }
> -
> -   return !st.st_size;
> -}
> -
>  /**
>   * Returns the length of the first line of msg.
>   */
> @@ -1300,7 +1284,7 @@ static int parse_mail(struct am_state *state, const 
> char *mail)
> goto finish;
> }
>
> -   if (is_empty_file(am_path(state, "patch"))) {
> +   if (is_empty_or_missing_file(am_path(state, "patch"))) {
> printf_ln(_("Patch is empty."));
> 

Re: [PATCH] tag: avoid NULL pointer arithmetic

2017-10-02 Thread René Scharfe
Am 02.10.2017 um 07:08 schrieb Jeff King:
> On Sun, Oct 01, 2017 at 04:45:13PM +0200, René Scharfe wrote:
> 
>> lookup_blob() etc. can return NULL if the referenced object isn't of the
>> expected type.  In theory it's wrong to reference the object member in
>> that case.  In practice it's OK because it's located at offset 0 for all
>> types, so the pointer arithmetic (NULL + 0) is optimized out by the
>> compiler.  The issue is reported by Clang's AddressSanitizer, though.
>>
>> Avoid the ASan error by casting the results of the lookup functions to
>> struct object pointers.  That works fine with NULL pointers as well.  We
>> already rely on the object member being first in all object types in
>> other places in the code.
> 
> Out of curiosity, did you have to do anything to coax this out of ASan
> (e.g., a specific version)?  I've been running it pretty regularly and
> didn't see this one (I did switch from clang to gcc a month or two ago,
> but this code is pretty old, I think).

I did "make -j4 SANITIZE=undefined,address BLK_SHA1=1 test" with
clang version 4.0.1-1 (tags/RELEASE_401/final), and t1450-fsck.sh failed.

René




Re: [PATCH v6 0/7] Support %(trailers) arguments in for-each-ref(1)

2017-10-02 Thread Junio C Hamano
Junio C Hamano  writes:

> Thanks.  t6300 seems to show that %(contents:trailers:unfold) does
> not quite work, among other things.
>
>   https://travis-ci.org/git/git/jobs/282126607#L3658
>
> I didn't have a chance to look into it myself.

Peff's "oops, your logic is backwards" fixes the above failure.

We also need this on top to pass the gettext-poison build.

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 872973b954..3bdfa02559 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -685,19 +685,21 @@ test_expect_success '%(contents:trailers:only) and 
%(contents:trailers:unfold) w
 '
 
 test_expect_success '%(trailers) rejects unknown trailers arguments' '
+   # error message cannot be checked under i18n
cat >expect <<-EOF &&
fatal: unknown %(trailers) argument: unsupported
EOF
test_must_fail git for-each-ref --format="%(trailers:unsupported)" 
2>actual &&
-   test_cmp expect actual
+   test_i18ncmp expect actual
 '
 
 test_expect_success '%(contents:trailers) rejects unknown trailers arguments' '
+   # error message cannot be checked under i18n
cat >expect <<-EOF &&
fatal: unknown %(trailers) argument: unsupported
EOF
test_must_fail git for-each-ref 
--format="%(contents:trailers:unsupported)" 2>actual &&
-   test_cmp expect actual
+   test_i18ncmp expect actual
 '
 
 test_expect_success 'basic atom: head contents:trailers' '


Re: What's cooking in git.git (Sep 2017, #06; Fri, 29)

2017-10-02 Thread Junio C Hamano
Martin Ågren  writes:

> On 29 September 2017 at 06:34, Junio C Hamano  wrote:
>>
>> * sd/branch-copy (2017-09-24) 4 commits
>>   (merged to 'next' on 2017-09-28 at a6eceefa02)
>>  + branch: fix "copy" to never touch HEAD
>>  + branch: add a --copy (-c) option to go with --move (-m)
>>  + branch: add test for -m renaming multiple config sections
>>  + config: create a function to format section headers
>>
>>  "git branch" learned "-c/-C" to create and switch to a new branch
>>  by copying an existing one.
>
> s/and switch to //, I believe, after your finishing patch.

Thanks for careful reading.


Loan

2017-10-02 Thread CAPITAL FINANCE
 UNSECURED BUSINESS/PERSONAL LOAN BY LOAN CAPITAL FINANCE
  - NO COLLATERAL
  - MINIMUM DOCUMENTATION
  - BUSINESS LOAN UP TO FIVE(5) MILLION US DOLLARS

  CONTACT US TODAY VIA EMAIL: financecapital...@mail.com



new contributors presentation

2017-10-02 Thread Nathan PAYRE
Hi all,
me and my two other partner (Daniel and Timothee) have make the choice
to contribute to gitHub for a university project supervised by Mattieu
Moy.

The principal project is to improve the git-send-email function, for
example we will try to implement the possibility to answer to a email
by keeping the recipient list or quote properly the email body!
Do you think that it's will be usefull ?
Do you have any other proposition to improve git-send-email ?

We are new in the contributor's world and we surely need you to
succeed this project !
Do you have any tips ?

Thanks you for reading me !

Nathan, Daniel, Thimothee


Inquiry

2017-10-02 Thread Stenvall Skoeld & Company
Sir/Madam,
I've visited your website and  I find your products very interested at
the first sight to create its place.

To go to the next step which concern the marketing Please kindly send
me your latest catalogue's product and please send us your prices CIF,
to start presentation in order to have feedback of the consumers and
whole sellers/distributors

Please contact us for more information
Your early reply is highly appreciated.Thank

Stenvall Skoeld & Company (China) Limited
4th Floor 139 No. 1 Ruijin Road
Huangpu District, Shanghai
Tel: +86 (21) 6316 6080


git add -p stops working when setting color.ui = always

2017-10-02 Thread Tsvi Mostovicz
Hi,

When setting "color.ui = always" in the last few versions (not sure
exactly when this started, but definitely exists in 2.14.1 and
2.14.2), git add -p stops working as expected. Instead of prompting
the user, it skips through the prompts and doesn't allow selecting a
hunk.

Don't know why I had color.ui = always set in my .gitconfig.

git version 2.14.2.666.gea220ee40

Thanks,


Tsvi Mostovicz
ttm...@gmail.com
www.linkedin.com/in/tsvim


Re: What's cooking in git.git (Sep 2017, #06; Fri, 29)

2017-10-02 Thread Martin Ågren
On 29 September 2017 at 06:34, Junio C Hamano  wrote:
>
> * sd/branch-copy (2017-09-24) 4 commits
>   (merged to 'next' on 2017-09-28 at a6eceefa02)
>  + branch: fix "copy" to never touch HEAD
>  + branch: add a --copy (-c) option to go with --move (-m)
>  + branch: add test for -m renaming multiple config sections
>  + config: create a function to format section headers
>
>  "git branch" learned "-c/-C" to create and switch to a new branch
>  by copying an existing one.

s/and switch to //, I believe, after your finishing patch.


Re: [PATCH] builtin/: add UNLEAKs

2017-10-02 Thread Martin Ågren
On 2 October 2017 at 08:25, Jeff King  wrote:
> On Sun, Oct 01, 2017 at 07:42:08PM +0200, Martin Ågren wrote:
>
>> Add some UNLEAKs where we are about to return from `cmd_*`. UNLEAK the
>> variables in the same order as we've declared them. While addressing
>> `msg` in builtin/tag.c, convert the existing `strbuf_release()` calls as
>> well.
>
> It might have raised Junio's eyebrows less to say something like:
>
>...convert the existing strbuf_release() calls as well (they're not
>wrong, but they also accomplish nothing and create an inconsistency
>with the UNLEAKed variables).

Most likely yes. Sorry about that. I have yet to be critiqued for
writing *too much* in a commit message. That should tell me something.

> Seeing hunks like this makes me happy with the UNLEAK() solution. It
> would have been a real pain to do this via actual freeing.

Yes, I was very happy to have it handy. :-)

Martin


Re: [PATCH 00/11] various lockfile-leaks and -fixes

2017-10-02 Thread Martin Ågren
On 2 October 2017 at 08:30, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I've read through each and with the exception of patches 10 and 11, they
>> all look good to me.
>>
>> For 10 and 11, I think you've convinced me that the current behavior is
>> buggy. I do wonder if the subtleties might be easier to understand as a
>> single patch, but I'm OK either way.
>
> I found it was not too hard to understand what is going on with
> 10/11 as separate patches, and I suspect that it would be also OK if
> they were a single patch (but I cannot easily "unsee" them, so this
> is merely a suspicion).
>
> Thanks, both.  Let's merge this to 'next' soonish.

Thanks both of you for your comments. Based on them, I have made the
following notes:

Patch 1: Make the logic a bit more future-proof/safe.

Patch 9: Instead of the two flags and the `HAS_SINGLE_BIT`-"cuteness",
just drop `CLOSE_LOCK` altogether. That should simplify things a bit
further. It might also help me come up with better explanations for
patches 10-11.

Patches 10-11: I'm sorry that I didn't succeed better in explaining
this. They have a couple of different things going on, but they are
obviously related and even a bit entangled. Let me try and see what I
can do, I'll try squashing them also. Maybe with `CLOSE_LOCK` gone, I'll
do better.

Especially 9-11 make me feel like wanting to re-roll this, for future
readers if nothing else. I expect to be able to do so in the middle of
this week (I don't know how this interferes with Junio's definition of
"soonish").

As fallout that could be taken care of now or later, I've noted this:

Patch 2: A follow-up could teach `update_index_if_able()` to always
commit or roll back.

Patch 5: A follow-up could clean up a similar construct in
`cmd_checkout_index()`.

Thanks again for your input. It's much appreciated.

Martin


Re: [PATCH 09/11] read-cache: require flags for `write_locked_index()`

2017-10-02 Thread Martin Ågren
On 2 October 2017 at 06:14, Martin Ågren  wrote:
> On 2 October 2017 at 05:49, Junio C Hamano  wrote:
>> Martin Ågren  writes:
>>
>>> ... Instead, require that one of the
>>> flags is set. Adjust documentation and the assert we already have for
>>> checking that we don't have too many flags. Add a macro `HAS_SINGLE_BIT`
>>> (inspired by `HAS_MULTI_BITS`) to simplify this check and similar checks
>>> in the future.
>>
>> I do not have a strong opinion against this approach, but if
>> something can take only one of two values, wouldn't it make more
>> sense to express it as a single boolean, I wonder.  Then there is no
>> need to invent a cute HAS_SINGLE_BIT() macro, either.
>>
>> "commit and leave it open" cannot be expressed with such a scheme,
>> but with the HAS_SINGLE_BIT() scheme it can't anyway, so...
>
> I did briefly consider renaming `flags` to `commit` and re-#defining the
> two flags to 0 and 1 (or even updating all the callers to use literal
> zeros and ones). It felt a bit awkward to downgrade `flags` to a bool
> -- normally we'd to the reverse change. But maybe I shouldn't have
> rejected that so easily. If we have a feeling we won't need other flags
> (or the "don't even close the file") any time soon, maybe it'd be good
> to tighten things up a bit. Thanks for looking at these.

Of course it wouldn't have to be as invasive. It could be "the lock will
always be closed and with COMMIT_LOCK, it will also be committed".
CLOSE_LOCK would be removed and the few current users of CLOSE_LOCK
would be converted to use 0.


Re: [PATCH 01/11] sha1_file: do not leak `lock_file`

2017-10-02 Thread Martin Ågren
On 2 October 2017 at 07:26, Jeff King  wrote:
> On Sun, Oct 01, 2017 at 04:56:02PM +0200, Martin Ågren wrote:
>
> The original code is actually a bit confusing/unsafe, as we set the
> "found" flag early and rollback here:
>
>> @@ -481,17 +481,15 @@ void add_to_alternates_file(const char *reference)
>>   strbuf_release();
>>   fclose(in);
>>
>> - if (found) {
>> - rollback_lock_file(lock);
>> - lock = NULL;
>> - }
>> + if (found)
>> + rollback_lock_file();
>
> and that leaves the "out" filehandle dangling. It's correct because of
> the later "do we still have the lock" check:
>
>> - if (lock) {
>> + if (is_lock_file_locked()) {
>>   fprintf_or_die(out, "%s\n", reference);
>
> but the two spots must remain in sync. If I were writing it from scratch
> I might have bumped "found" to the scope of the whole function, and then
> replaced this final "do we still have the lock" with:
>
>   if (found)
> rollback_lock_file();
>   else {
> fprintf_or_die(out, "%s\n", reference);
> if (commit_lock_file())
> ...etc...
>   }
>
> I don't know if it's worth changing now or not.

I'd like to address the feedback on other commits in a re-roll. I will
use that opportunity to try your approach above instead of this patch.

Thanks.


What means "git config bla ~/"?

2017-10-02 Thread rpjday
  i'm sure i'm about to embarrass myself but, in "man git-config",  
OPTIONS, one reads:


  --path

  git-config will expand leading ~ to the value of $HOME, and ~user  
to the   home directory for the specified user. This option has no  
effect when setting the value (but you can use git config bla ~/ from  
the command line to let your shell do the expansion).


  what's with that "git config bla ~/"? is this some config keyword  
or something?


rday



hooks are ignored if there are not marked as executable

2017-10-02 Thread Damien
Hi,

If you do `echo my_script > .git/hooks/pre-commit` and then `git commit`,
The hook is just gonna be ignored.
But if you do `chmod +x .git/hooks/pre-commit`, then it's executed.

I think ignoring a hook is misleading and not newbie friendly, an error
message to signal an incorrectly configured hook could be better.
At least as a warning to be backward-compatible.

Thanks,
Damien


  1   2   >