Regression (?) in core.safecrlf=false messaging

2018-06-03 Thread Anthony Sottile
I'm a bit unclear if I was depending on undocumented behaviour or not
here but it seems the messaging has recently changed with respect to
`core.safecrlf`

My reading of the documentation
https://git-scm.com/docs/git-config#git-config-coresafecrlf (I might
be wrong here)

- core.safecrlf = true -> fail hard when converting
- core.safecrlf = warn -> produce message when converting
- core.safecrlf = false -> convert silently

(note that these are all only relevant when `core.autocrlf = true`)

I've created a small script to demonstrate:

```
set -euxo pipefail

git --version

rm -rf repo
git init repo
cd repo
git config core.autocrlf input
git config core.safecrlf false
echo -en 'foo\r\nbar\r\n' > f
git add f
```

When run against 2.16.4:

```
$ PATH=$PWD/prefix/bin:$PATH bash test.sh
+ git --version
git version 2.16.4
+ rm -rf repo
+ git init repo
Initialized empty Git repository in /tmp/git/repo/.git/
+ cd repo
+ git config core.autocrlf input
+ git config core.safecrlf false
+ echo -en 'foo\r\nbar\r\n'
+ git add f
```

(notice how there are no messages about crlf conversion while adding
-- this is what I expect given I have core.safecrlf=false)


When run against master:

```console
$ PATH=$PWD/prefix/bin:$PATH bash test.sh
+ git --version
git version 2.18.0.rc0.42.gc2c7d17
+ rm -rf repo
+ git init repo
Initialized empty Git repository in /tmp/git/repo/.git/
+ cd repo
+ git config core.autocrlf input
+ git config core.safecrlf false
+ echo -en 'foo\r\nbar\r\n'
+ git add f
warning: CRLF will be replaced by LF in f.
The file will have its original line endings in your working directory.
```

A `git bisect` shows this as the first commit which breaks this
behaviour: 8462ff43e42ab67cecd16fdfb59451a53cc8a945

https://github.com/git/git/commit/8462ff43e42ab67cecd16fdfb59451a53cc8a945

The commit appears to be a refactor (?) that shouldn't have changed behaviour?

Here's the script I was using to bisect:
https://i.fluffy.cc/2L4ZtqV3cHfzNRkKPbHgTcz43HMQJxdK.html

```
git bisect start
git bisect bad v2.17.0
git bisect good v2.16.4
git bisect run ./test.sh
```

Noticed this due to test failures here:
https://github.com/pre-commit/pre-commit/pull/762#issuecomment-394236625

Thanks,

Anthony


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 11:56:37PM -0400, Jeff King wrote:

> So sometimes some_var needs to be freed and sometimes not (and every one
> of those uses is a potential leak, but it's OK because they're all
> program-lifetime globals anyway, and people don't _tend_ to set the same
> option over and over, leaking heap memory). And C being C, we can't
> convert a pointer-to-pointer-to-const into a pointer-to-pointer without
> an explicit cast.
> 
> Doing it "right" in C would probably involve two variables:
> 
>   const char *some_var = "default";
>   const char *some_var_storage = NULL;
> 
>   int git_config_string_smart(const char **ptr, char **storage,
>   const char *var, const char *value)
>   {
> ...
>   free(*storage);
>   *ptr = *storage = xstrdup(value);
>   return 0;
>   }
> 
>   #define GIT_CONFIG_STRING(name, var, value) \
>   git_config_string_smart(&(name), &(name##_storage), var, value)
> 
> Or something like that.

Oh, one other option I forgot to mention: we already have an "intern"
hashmap helper. So we could just replace the xstrdup() with strintern()
and magically the memory isn't "leaked".

I think this is a little bit of a hack, though. It catches my:

  [core]
  editor = foo
  editor = foo

case, because we only ever make one copy of the string "foo". But if I
do:

  [core]
  editor = 1
  editor = 2
  ...

then we get unique strings. And while they're not _technically_ leaked,
since the hashmap still knows about them, they might as well be. They're
still wasting space through the duration of the program run.

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Jeff King  writes:

> With that strategy, we'd have to have a big initialize_defaults()
> function. Which actually might not be _too_ bad since we now have
> common-main.c, but:
>
>   - it sucks to keep the default values far away from the declarations
>
>   - it does carry a runtime cost. Not a huge one, but it sucks to pay it
> on every program startup, even if we're not using those variables.

True, and s/even if/even though most of the time/ the situation is
even worse X-<.


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Mon, Jun 04, 2018 at 01:26:57PM +0900, Junio C Hamano wrote:

> > Doing it "right" in C would probably involve two variables:
> >
> >   const char *some_var = "default";
> >   const char *some_var_storage = NULL;
> >
> >   int git_config_string_smart(const char **ptr, char **storage,
> >   const char *var, const char *value)
> >   {
> > ...
> > free(*storage);
> > *ptr = *storage = xstrdup(value);
> > return 0;
> >   }
> >
> >   #define GIT_CONFIG_STRING(name, var, value) \
> >   git_config_string_smart(&(name), &(name##_storage), var, value)
> >
> > Or something like that.
> 
> The attitude the approach takes is that "run once and let exit(3)
> clean after us" programs *should* care.

Even with "let exit clean up", we are still leaking heap every time the
variable is assigned after the first. Again, I don't think it matters
that much in practice, but I think:

  [core]
  editor = foo
  editor = foo
  ...etc...

would leak arbitrary memory during the config parse, that would be
allocated for the remainder of the program. I guess you could say exit()
is handling it, but I think the point is that we are letting exit()
handle memory that was potentially useful until we exit, not leaks. :)

> And at that point, maybe
> 
>   char *some_var = xstrdup("default");
>   git_config_string(_var, ...);
> 
> that takes "char **" and frees the current storage before assigning
> to it may be simpler than the two-variable approach.

That _is_ much nicer, but you cannot use xstrdup() as the initializer
for a global "static char *some_var", which is what the majority of the
config variables are. It's this "static initializer sometimes, run-time
heap sometimes" duality to the variables that makes handling it such a
pain.

With that strategy, we'd have to have a big initialize_defaults()
function. Which actually might not be _too_ bad since we now have
common-main.c, but:

  - it sucks to keep the default values far away from the declarations

  - it does carry a runtime cost. Not a huge one, but it sucks to pay it
on every program startup, even if we're not using those variables.

-Peff


Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 12:27:49AM +0300, Max Kirillov wrote:

> Push passes to another commands, as described in
> https://public-inbox.org/git/20171129032214.gb32...@sigill.intra.peff.net/
> 
> As it gets complicated to correctly track the data length, instead transfer
> the data through parent process and cut the pipe as the specified length is
> reached. Do it only when CONTENT_LENGTH is set, otherwise pass the input
> directly to the forked commands.

I think this approach is reasonable. It's basically converting the
known-length case to a read-to-eof case for the sub-program, which
should paper over any problems of this type. And it's what we really
_want_ the web server to be doing in the first place.

Since this is slightly less efficient, and because it only matters if
the web server does not already close the pipe, should this have a
run-time configuration knob, even if it defaults to
safe-but-slightly-slower?

I admit I don't overly care that much myself (the only large-scale Git
server deployment I am personally familiar with does not use
git-http-backend at all), but it might be nice to leave an escape hatch.

There are a few things in the patch worth fixing, but overall I think it
looks like a pretty good direction. Comments inline.

> diff --git a/http-backend.c b/http-backend.c
> index 3066697a24..78a588c551 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -351,23 +351,22 @@ static ssize_t get_content_length(void)
>   return val;
>  }
>  
> -static ssize_t read_request(int fd, unsigned char **out)
> +static ssize_t read_request(int fd, unsigned char **out, ssize_t req_len)
>  {
> - ssize_t req_len = get_content_length();
> -
>   if (req_len < 0)
>   return read_request_eof(fd, out);
>   else
>   return read_request_fixed_len(fd, req_len, out);
>  }

Minor nit, but it might have been nice to build in this infrastructure
in the first patch, rather than refactoring it here. It would also make
it much more obvious that the first one is not handling some cases,
since we'd have "req_len" but not pass it to all of the code paths. ;)

> @@ -379,11 +378,18 @@ static void inflate_request(const char *prog_name, int 
> out, int buffer_input)
>   if (full_request)
>   n = 0; /* nothing left to read */
>   else
> - n = read_request(0, _request);
> + n = read_request(0, _request, req_len);
>   stream.next_in = full_request;
>   } else {
> - n = xread(0, in_buf, sizeof(in_buf));
> + ssize_t buffer_len;
> + if (req_remaining_len < 0 || req_remaining_len > 
> sizeof(in_buf))
> + buffer_len = sizeof(in_buf);
> + else
> + buffer_len = req_remaining_len;
> + n = xread(0, in_buf, buffer_len);
>   stream.next_in = in_buf;
> + if (req_remaining_len >= 0)
> + req_remaining_len -= n;
>   }

What happens here if xread() returns an error? We probably don't want to
modify req_remaining_len (it probably doesn't matter since we'd report
the errot after this, but it feels funny not to check here).

> +static void pipe_fixed_length(const char *prog_name, int out, size_t req_len)
> +{
> + unsigned char buf[8192];
> + size_t remaining_len = req_len;
> +
> + while (remaining_len > 0) {
> + size_t chunk_length = remaining_len > sizeof(buf) ? sizeof(buf) 
> : remaining_len;
> + size_t n = xread(0, buf, chunk_length);
> + if (n < 0)
> + die_errno("Reading request failed");

I was going to complain that we usually start our error messages with a
lowercase, but this program seems to be an exception. So here you've
followed the local custom, which is OK.

> + if (write_in_full(out, buf, n) < 0)
> + die_errno("%s aborted reading request", prog_name);

We don't necessarily know why the write failed. If it's EPIPE, then yes,
the program probably did abort. But all we know is that write() failed.
We should probably say something more generic like:

  die_errno("unable to write to '%s'");

or similar.

> diff --git a/t/helper/test-print-larger-than-ssize.c 
> b/t/helper/test-print-larger-than-ssize.c
> new file mode 100644
> index 00..83472a32f1
> --- /dev/null
> +++ b/t/helper/test-print-larger-than-ssize.c
> @@ -0,0 +1,11 @@
> +#include "test-tool.h"
> +#include "cache.h"
> +
> +int cmd__print_larger_than_ssize(int argc, const char **argv)
> +{
> + size_t large = ~0;
> +
> + large = ~(large & ~(large >> 1)) + 1;
> + printf("%" PRIuMAX "\n", (uintmax_t) large);
> + return 0;
> +}

I think this might be nicer as part of "git version --build-options".
Either as a byte-size as I showed in 

Re: [PATCH v7 2/2] http-backend: respect CONTENT_LENGTH for receive-pack

2018-06-03 Thread Junio C Hamano
Max Kirillov  writes:

> +static void pipe_fixed_length(const char *prog_name, int out, size_t req_len)
> +{
> + unsigned char buf[8192];
> + size_t remaining_len = req_len;
> +
> + while (remaining_len > 0) {
> + size_t chunk_length = remaining_len > sizeof(buf) ? sizeof(buf) 
> : remaining_len;
> + size_t n = xread(0, buf, chunk_length);
> + if (n < 0)
> + die_errno("Reading request failed");

n that is of type size_t is unsigned and cannot be negative here.


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Jeff King  writes:

> I've looked into it before, but that causes its own wave of headaches.
> The source of the problem is that we do:
>
>   const char *some_var = "default";
>   ...
>   git_config_string(_var, ...);

Yup, that is a valid pattern for "run once and let exit(3) clean
after us" programs.

> Doing it "right" in C would probably involve two variables:
>
>   const char *some_var = "default";
>   const char *some_var_storage = NULL;
>
>   int git_config_string_smart(const char **ptr, char **storage,
>   const char *var, const char *value)
>   {
> ...
>   free(*storage);
>   *ptr = *storage = xstrdup(value);
>   return 0;
>   }
>
>   #define GIT_CONFIG_STRING(name, var, value) \
>   git_config_string_smart(&(name), &(name##_storage), var, value)
>
> Or something like that.

The attitude the approach takes is that "run once and let exit(3)
clean after us" programs *should* care.  And at that point, maybe

char *some_var = xstrdup("default");
git_config_string(_var, ...);

that takes "char **" and frees the current storage before assigning
to it may be simpler than the two-variable approach.

But you're right.  We cannot just unconst the type and be done with
it---there are associated clean-up necessary if we were to do this.

Thanks.


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Mon, Jun 04, 2018 at 12:44:15PM +0900, Junio C Hamano wrote:

> >> diff --git a/sequencer.c b/sequencer.c
> >> index b98690ecd41..aba03e9429a 100644
> >> --- a/sequencer.c
> >> +++ b/sequencer.c
> >> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const 
> >> char *v, void *cb)
> >>warning(_("invalid commit message cleanup mode '%s'"),
> >>  s);
> >>  
> >> +  free(s);
> >>return status;
> >>}
> >
> > Shouldn't 's' now lose 'const'?  After all, git_config_string()
> > gives you an allocated memory so...
> 
> Yikes.  Should git_config_string() and git_config_pathname() take
> "char **dst" instead of "const char **" as their out-location
> parameter?  They both assign a pointer to an allocated piece of
> memory for the caller to own or dispose of, but because of
> const-ness of the pointee their first parameter has, a caller like
> this one must declare "const char *s" yet is forced to call free()
> not to leak the value when it is done.

I've looked into it before, but that causes its own wave of headaches.
The source of the problem is that we do:

  const char *some_var = "default";
  ...
  git_config_string(_var, ...);

So sometimes some_var needs to be freed and sometimes not (and every one
of those uses is a potential leak, but it's OK because they're all
program-lifetime globals anyway, and people don't _tend_ to set the same
option over and over, leaking heap memory). And C being C, we can't
convert a pointer-to-pointer-to-const into a pointer-to-pointer without
an explicit cast.

Doing it "right" in C would probably involve two variables:

  const char *some_var = "default";
  const char *some_var_storage = NULL;

  int git_config_string_smart(const char **ptr, char **storage,
  const char *var, const char *value)
  {
...
free(*storage);
*ptr = *storage = xstrdup(value);
return 0;
  }

  #define GIT_CONFIG_STRING(name, var, value) \
  git_config_string_smart(&(name), &(name##_storage), var, value)

Or something like that.

We could also get away from an out-parameter and use the return type,
since the single-pointer conversion is OK. But the primary value of
git_config_string() is that it lets you return the error code as a
one-liner.

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> Signed-off-by: Stefan Beller 
>> ---
>>  sequencer.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/sequencer.c b/sequencer.c
>> index b98690ecd41..aba03e9429a 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const 
>> char *v, void *cb)
>>  warning(_("invalid commit message cleanup mode '%s'"),
>>s);
>>  
>> +free(s);
>>  return status;
>>  }
>
> Shouldn't 's' now lose 'const'?  After all, git_config_string()
> gives you an allocated memory so...

Yikes.  Should git_config_string() and git_config_pathname() take
"char **dst" instead of "const char **" as their out-location
parameter?  They both assign a pointer to an allocated piece of
memory for the caller to own or dispose of, but because of
const-ness of the pointee their first parameter has, a caller like
this one must declare "const char *s" yet is forced to call free()
not to leak the value when it is done.



Re: [PATCH v7 1/2] http-backend: respect CONTENT_LENGTH as specified by rfc3875

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 12:27:48AM +0300, Max Kirillov wrote:

> http-backend reads whole input until EOF. However, the RFC 3875 specifies
> that a script must read only as many bytes as specified by CONTENT_LENGTH
> environment variable. Web server may exercise the specification by not closing
> the script's standard input after writing content. In that case http-backend
> would hang waiting for the input. The issue is known to happen with
> IIS/Windows, for example.
> 
> Make http-backend read only CONTENT_LENGTH bytes, if it's defined, rather than
> the whole input until EOF. If the variable is not defined, keep older behavior
> of reading until EOF because it is used to support chunked transfer-encoding.
> 
> Signed-off-by: Florian Manschwetus 
> [mk: fixed trivial build failures and polished style issues]
> Helped-by: Junio C Hamano 
> Signed-off-by: Max Kirillov 
> ---
>  config.c   |  2 +-
>  config.h   |  1 +
>  http-backend.c | 43 ++-
>  3 files changed, 44 insertions(+), 2 deletions(-)

This first patch looks good to me, though it may be worth mentioning in
the commit message that we're only handling the buffered-input side here
(that is obvious to anybody reading this whole series now, but it may
help out people digging in the history later).

-Peff


Re: [PATCH] Use ZSH_NAME instead of ZSH_VERSION because it's redefined by git-completion.zsh

2018-06-03 Thread Junio C Hamano
Rick van Hattem  writes:

> The `git-completion.zsh` unsets the `$ZSH_VERSION` which makes this check 
> moot. The result (at least for me) is that zsh segfaults because of all the 
> variables it's unsetting.
> ---

Overlong line, lack of sign-off.

>  # Clear the variables caching builtins' options when (re-)sourcing
>  # the completion script.
> -if [[ -n ${ZSH_VERSION-} ]]; then
> +if [[ -n ${ZSH_NAME-} ]]; then

I am not a zsh user, and I do not know how reliable $ZSH_NAME can be
taken as an indication that we are running zsh and have already
found a usable git-completion-.bash script.

I think what the proposed log message refers to as "unsets" is this
part of the script:

...
zstyle -s ":completion:*:*:git:*" script script
if [ -z "$script" ]; then
local -a locations
local e
locations=(
$(dirname ${funcsourcetrace[1]%:*})/git-completion.bash
...
)
for e in $locations; do
test -f $e && script="$e" && break
done
fi
ZSH_VERSION='' . "$script"
...

If your ZSH_VERSION is empty, doesn't it indicate that the script
did not find a usable git-completion.bash script (to which it
outsources the bulk of the completion work)?  I do agree segfaulting
is not a friendly way to tell you that your setup is lacking to make
it work, but I have a feeling that what you are seeing is an
indication of a bigger problem, which will be sweeped under the rug
with this patch but without getting fixed...


Re: [PATCH] update-ref --stdin: use skip_prefix()

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 04:36:51PM +0200, SZEDER Gábor wrote:

> Use skip_prefix() instead of starts_with() and strcmp() when parsing
> 'git update-ref's stdin to avoid a couple of magic numbers.

I was coincidentally looking at this the other day also noticed these.
Thanks for cleaning it up (and your patch looks obviously correct).

I also found it funny that we read the whole input into a buffer and
parse from there, rather than using strbuf_getline(). But that's
intentional due to e23d84350a (update-ref --stdin: read the whole input
at once, 2014-04-07). I think the line-oriented protocol actually can be
easily read like that, but the "-z" format ends up having to do awkward
reads.

Anyway, sort of a tangent, but I didn't want anybody else looking at
this having to dig down the same hole I did. ;)

-Peff


Re: [RFC PATCH 4/7] merge-recursive: fix assumption that head tree being merged is HEAD

2018-06-03 Thread Junio C Hamano
Elijah Newren  writes:

> `git merge-recursive` does a three-way merge between user-specified trees
> base, head, and remote.  Since the user is allowed to specify head, we can
> not necesarily assume that head == HEAD.
>
> We modify index_has_changes() to take an extra argument specifying the
> tree to compare the index to.  If NULL, it will compare to HEAD.  We then
> use this from merge-recursive to make sure we compare to the
> user-specified head.
>
> Signed-off-by: Elijah Newren 
> ---
>
> I'm really unsure where the index_has_changes() declaration should go;
> I stuck it in tree.h, but is there a better spot?

I think I saw you tried to lift an assumption that we're always
working on the_index in a separate patch recently.  Should that
logic apply also to this part of the codebase?  IOW, shouldn't
index_has_changes() take a pointer to istate (as opposed to a
function that uses the implicit the_index that should be named as
"cache_has_changes()" or something?)

I tend to think this function as part of the larger read-cache.c
family whose definitions are in cache.h and accompanied by macros
that are protected by NO_THE_INDEX_COMPATIBILITY_MACROS so if we
were to move it elsewhere, I'd keep the header part as-is and
implementation to read-cache.c to keep it together with the family,
but I do not see a huge issue with the current placement, either.

> diff --git a/cache.h b/cache.h
> index 89a107a7f7..439b9d9f6e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -634,14 +634,6 @@ extern int discard_index(struct index_state *);
>  extern void move_index_extensions(struct index_state *dst, struct 
> index_state *src);
>  extern int unmerged_index(const struct index_state *);
>  
> -/**
> - * Returns 1 if the index differs from HEAD, 0 otherwise. When on an unborn
> - * branch, returns 1 if there are entries in the index, 0 otherwise. If an
> - * strbuf is provided, the space-separated list of files that differ will be
> - * appended to it.
> - */
> -extern int index_has_changes(struct strbuf *sb);
> -
>  extern int verify_path(const char *path, unsigned mode);
>  extern int strcmp_offset(const char *s1, const char *s2, size_t 
> *first_change);
>  extern int index_dir_exists(struct index_state *istate, const char *name, 
> int namelen);
> diff --git a/merge-recursive.c b/merge-recursive.c
> index b3deb7b182..762aa087d0 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -3263,7 +3263,7 @@ int merge_trees(struct merge_options *o,
>   if (oid_eq(>object.oid, >object.oid)) {
>   struct strbuf sb = STRBUF_INIT;
>  
> - if (!o->call_depth && index_has_changes()) {
> + if (!o->call_depth && index_has_changes(, head)) {
>   err(o, _("Your local changes to the following files 
> would be overwritten by merge:\n  %s"),
>   sb.buf);
>   return -1;
> diff --git a/merge.c b/merge.c
> index 0783858739..965d287646 100644
> --- a/merge.c
> +++ b/merge.c
> @@ -14,19 +14,21 @@ static const char *merge_argument(struct commit *commit)
>   return oid_to_hex(commit ? >object.oid : 
> the_hash_algo->empty_tree);
>  }
>  
> -int index_has_changes(struct strbuf *sb)
> +int index_has_changes(struct strbuf *sb, struct tree *tree)
>  {
> - struct object_id head;
> + struct object_id cmp;
>   int i;
>  
> - if (!get_oid_tree("HEAD", )) {
> + if (tree)
> + cmp = tree->object.oid;
> + if (tree || !get_oid_tree("HEAD", )) {
>   struct diff_options opt;
>  
>   diff_setup();
>   opt.flags.exit_with_status = 1;
>   if (!sb)
>   opt.flags.quick = 1;
> - do_diff_cache(, );
> + do_diff_cache(, );
>   diffcore_std();
>   for (i = 0; sb && i < diff_queued_diff.nr; i++) {
>   if (i)
> diff --git a/t/t6044-merge-unrelated-index-changes.sh 
> b/t/t6044-merge-unrelated-index-changes.sh
> index 3876cfa4fa..1d43712c52 100755
> --- a/t/t6044-merge-unrelated-index-changes.sh
> +++ b/t/t6044-merge-unrelated-index-changes.sh
> @@ -126,7 +126,7 @@ test_expect_success 'recursive, when merge branch matches 
> merge base' '
>   test_path_is_missing .git/MERGE_HEAD
>  '
>  
> -test_expect_failure 'merge-recursive, when index==head but head!=HEAD' '
> +test_expect_success 'merge-recursive, when index==head but head!=HEAD' '
>   git reset --hard &&
>   git checkout C^0 &&
>  
> diff --git a/tree.h b/tree.h
> index e2a80be4ef..2e1d8ea720 100644
> --- a/tree.h
> +++ b/tree.h
> @@ -37,4 +37,12 @@ extern int read_tree_recursive(struct tree *tree,
>  extern int read_tree(struct tree *tree, int stage, struct pathspec *pathspec,
>struct index_state *istate);
>  
> +/**
> + * Returns 1 if the index differs from tree, 0 otherwise.  If tree is NULL,
> + * compares to HEAD.  If tree is NULL and on an unborn branch, returns 1 if
> + * there are entries in the 

Re: does a stash *need* any reference to the branch on which it was created?

2018-06-03 Thread Junio C Hamano
"Robert P. J. Day"  writes:

>   i realize that, when you "git stash push", stash graciously saves
> the branch you were on as part of the commit message, but does any
> subsequent stash operation technically *need* that branch name?

It is not "saves", but "the message it automatically generates
includes  and  as a human readable reminder".

"git stash" does not have to read that message, as it is not
prepared to read and understand what you wrote after you ran your
own "git stash push -m 'my random message'" anyway.  It is merely
for your consumption, especially when it appears in "git stash list".


Re: format-patch: no 'prerequisite-patch-id' info when specifying commit range

2018-06-03 Thread Ye Xiaolong
On 06/04, Junio C Hamano wrote:
>Ye Xiaolong  writes:
>
>> I narrowed down the problem to revision walk, if users specify the commit 
>> range
>> via "Z..C" pattern, the first prepare_revision_walk function called in
>> cmd_format_patch would mark all parents (ancestors) of Z to be uninteresting,
>> thus the next revision walk in prepare_bases wouldn't be able to reach
>> prerequisite patches, one quick solution I can think of is to clear
>> UNINTERESTING flag in reset_revision_walk, like below:
>>
>> void reset_revision_walk(void)
>> {
>>  clear_object_flags(SEEN | ADDED | SHOWN| UNINTERESTING);
>> }
>
>When you are done with objects that are UNINTERESTING in your
>application (i.e. only when "format-patch" is told to compute list
>of prereq patches by doing an extra revision walk), your application
>can call clear_object_flags() on the flags you are done with, I
>would think.
>
>But the current callers of reset_revision_walk() do not expect any
>flags other than the ones that are used to keep track of the
>traversal state, so it is likely you will break them if you suddenly
>started to clear flags randomly.

Got it, I'll try to call clear_object_flags in format-patch related codepatch
only, not to touch the global reset_revision_walk.

Thanks,
Xiaolong


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Stefan Beller  writes:

> Signed-off-by: Stefan Beller 
> ---
>  sequencer.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index b98690ecd41..aba03e9429a 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const char 
> *v, void *cb)
>   warning(_("invalid commit message cleanup mode '%s'"),
> s);
>  
> + free(s);
>   return status;
>   }

Shouldn't 's' now lose 'const'?  After all, git_config_string()
gives you an allocated memory so...


Re: [RFC PATCH 4/7] merge-recursive: fix assumption that head tree being merged is HEAD

2018-06-03 Thread Ramsay Jones



On 04/06/18 00:37, brian m. carlson wrote:
> On Sun, Jun 03, 2018 at 02:52:12PM +0100, Ramsay Jones wrote:
>> On 03/06/18 07:58, Elijah Newren wrote:
>>> I'm really unsure where the index_has_changes() declaration should go;
>>> I stuck it in tree.h, but is there a better spot?
>>
>> Err, leave it where it is and '#include "tree.h"' ? :-D
> 
> Or leave it where it is and use a forward structure declaration?

Indeed, I had intended to mention that possibility as well.

[Note: the "merge-recursive.h" header file references several
'struct tree *' parameters, but does not itself include a
declaration/definition from any source. So, in all of the six
files that #include it, it relies on a previous #include to
provide such a declaration/definition. I haven't checked, but
I think that it is usually provided by the "commit.h" header (even
on the single occasion that "tree.h" was included!).]


ATB,
Ramsay Jones





Re: [RFC PATCH 4/6] commit-graph: avoid writing when repo is shallow

2018-06-03 Thread Junio C Hamano
Derrick Stolee  writes:

>>> several reasons. Instead of doing the hard thing to fix those
>>> interactions, instead prevent reading or writing a commit-graph file for
>>> shallow repositories.
>> The latter instead would want to vanish, I would guess.
>
> Do you mean that we should call destroy_commit_graph() if we detect a
> shallow repository during write_commit_graph(), then I can make that
> change.
>

No, I was just having trouble with reading "Instead of doing X,
instead do Y".


Re: format-patch: no 'prerequisite-patch-id' info when specifying commit range

2018-06-03 Thread Junio C Hamano
Ye Xiaolong  writes:

> I narrowed down the problem to revision walk, if users specify the commit 
> range
> via "Z..C" pattern, the first prepare_revision_walk function called in
> cmd_format_patch would mark all parents (ancestors) of Z to be uninteresting,
> thus the next revision walk in prepare_bases wouldn't be able to reach
> prerequisite patches, one quick solution I can think of is to clear
> UNINTERESTING flag in reset_revision_walk, like below:
>
> void reset_revision_walk(void)
> {
>   clear_object_flags(SEEN | ADDED | SHOWN| UNINTERESTING);
> }

When you are done with objects that are UNINTERESTING in your
application (i.e. only when "format-patch" is told to compute list
of prereq patches by doing an extra revision walk), your application
can call clear_object_flags() on the flags you are done with, I
would think.

But the current callers of reset_revision_walk() do not expect any
flags other than the ones that are used to keep track of the
traversal state, so it is likely you will break them if you suddenly
started to clear flags randomly.


Re: [PATCH v4 8/9] checkout: add advice for ambiguous "checkout "

2018-06-03 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I.e. I was trying to avoid printing out the "error: pathspec 'master'
> did not match any file(s) known to git." error altogether. That's still
> arguably a good direction, since we *know* "master" would have otherwise
> matched a remote branch, so that's probably a more informative message
> than falling back to checking out pathspecs and failing, and complaining
> about there being no such pathspec.

That ideal behaviour makes sense.

> But it was a pain to handle the various edge cases, e.g.:
>
> $ ./git --exec-path=$PWD checkout x y z
> error: pathspec 'x' did not match any file(s) known to git.
> error: pathspec 'y' did not match any file(s) known to git.
> error: pathspec 'z' did not match any file(s) known to git.

Let's take a detour to a tangent, as this example does not have
anything to do with the remote-tracking auto-dwimming. 

Ideally what do we want to say in this case?  What's allowed for 'x'
(2 possibilities) is different from whats allowed for 'y' and 'z'
(only 1 possibility)---do we want to complain that 'x' is not a rev
noris a file (we do not say 'x' could be a misspelt rev name right
now), and then 'y' and 'z' are not files (which is what we do now)?

That might be an improvement.  I dunno.  In any case, that is a
tangent that we do not have to address with these patches.

In contrast, the command line without y and z gives three
possibilities to 'x'.  'x' is not a rev, is not a remote-tracking
branch name that only a single remote has, and is not a file.  Now,
if we are going to mention that we failed to interpret it as the
latter two, perhaps we should also mention that it was not a rev
(which could have been misspelt)?

> So I decided just to let checkout_paths() to its thing and then print
> out an error about dwim branches if applicable if it failed.

Yeah, I think I get it now.  If you want to silence the "error" from
report_path_error() and replace it with something else, you would
need to change checout_paths(), as this function is sort-of used as
the last ditch effort after all else failed, and right now it is not
aware of what exactly all these other failed efforts were.

Thanks.  I'm looking at v6 reroll for queuing.





Re: [RFC PATCH 4/7] merge-recursive: fix assumption that head tree being merged is HEAD

2018-06-03 Thread brian m. carlson
On Sun, Jun 03, 2018 at 02:52:12PM +0100, Ramsay Jones wrote:
> On 03/06/18 07:58, Elijah Newren wrote:
> > I'm really unsure where the index_has_changes() declaration should go;
> > I stuck it in tree.h, but is there a better spot?
> 
> Err, leave it where it is and '#include "tree.h"' ? :-D

Or leave it where it is and use a forward structure declaration?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 11:28:43PM +0100, Philip Oakley wrote:
> It is here that Article 6 kicks in as to whether the 'organisation' can
> retain the data and continue to use it.

Article 6 is not about continuing to use data. Article 6 is about 
having and even obtaining it in the first place.

Article 17 and article 21 are about continuing to use data.

> For an open source project with an open source licence then an implict DCO
> applies for the meta data. It is the legal  basis for the the release.

Neither article 6 nor 17 or 21 have anything remotely like an "implicit 
DCO" as a legitimization for publishing employee data.

The GDPR is very explicit about implicit stuff never being a basis for 
consent, if you want to imply that is your basis. And consent can be 
withdrawn at any time anyway.

An open source license has nothing whatsoever to do with the question 
of version control metadata. A public version control system is not 
necessary to publish open source software.

> > - copyright is about distributing the program, not about distributing
> > version control metadata.
> It is specificaly about giving that right to copy by Jane Doe (but git gives
> no other information other than that supposedly globally unique 'author
> email'.

I don't get what you are saying. As I said, a public version control 
system is not necessary to publish open source software. The two things 
may be intimately related in practice, but not in theory.

> > - Being named is a right, not an obligation of the author. Hence, if
> > the author doesn't want his name published, the company doesn't have
> > legitimate grounds based in copyright for doing it anyway, against his
> > or her will.
> Git for Open Source is about open licencing by name. I'd agree that a closed
> corporate licence stays closed, but not forgotten.

Again I don't get what you are saying. The author has a right to be 
named as the author, not an obligation. This has nothing whatsoever to 
do with the question of Open Source vs. closed corporate licenses.

> > Let's be honest: We do not know what legitimization exactly in each
> > specific case the git metadata is being distributed under.
> 
> We should know, already. A specific licence [or limit] should be in place.
> We don't really want to have to let a court decide ;-)

It is insufficient to have a license for distributing the program. The 
license is not a GDPR legitimization for git metadata. Distributing the 
program can be done without distributing the author's identity as part 
of the metadata of his commits.

> The law is never decided by technical means, unfortunately.

It is. The GDPR refers to the state of the art of technology without 
defining it. Thus, technical means are very important in the GDPR. This 
may be something new for lawyers. If technology changes tomorrow, even 
without anything else changing, you may be breaking the GDPR by this 
simple fact tomorrow, while not breaking it today.

Again: Technology is very important in the GDPR.

> Regular git users should have no issues - they just need to point 
> their finger at the responsible authority.

If git users are putting commits online for global download, they are 
the responsible authority.

> The DCO/GPL2 are the legitimate data record that recipients should have for
> their copy. There is no right to be forgotten at that point.

What do you mean by "should have for their copy"? Why shouldn't there 
be a right to be forgotten? Open Source Software has been distributed a 
lot without detailed version control history information. Having this 
information as a record is certainly in the interest of the recipient, 
but it is very very questionable that it is an overriding legitimate 
grounds as per Art. 17 for keeping that data.

> I see the solution to be elsewhere, and that it is in some ways a strawman
> discussion: "if someone has the right to be forgotten, how do we delete the
> meta data", when that right (to delete the meta data in a properly licence
> repo) does not exist.

See, this kind of shady legal argument is what lawyers are selling you. 
Why not put the energy into designing a technical solution.

They tell you: "Ignore the GDPR. I will give you backup by giving you 
lots of disclaimers and excuses for doing so. Just give me a lot of 
money."

Having the ability to validate yet erase data form repositorys is 
desirable from a technical point of view. It has a lot of uses, not 
necessarily only legal ones. The objection of efficiency raised by Ted 
is a valid one. The strawman argument is not.

Best wishes
Peter
-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: GDPR compliance best practices?

2018-06-03 Thread Philip Oakley

From: "Peter Backes" 

On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:

In most Git cases that legal/legitimate purpose is the copyright licence,
and/or corporate employment. That is, Jane wrote it, hence X has a legal
rights of use, and we need to have a record of that (Jane wrote it) as
evidence of that (I'm X, I can use it) right. That would mean that Jane
cannot just ask to have that record removed and expect it to be removed.


Re corporate employment:

For sure nobody would dare to quesion that a company has a right to
keep an internal record that Jane wrote it.

The issue is publishing that information. This is an entirely different
story.


It is here that Article 6 kicks in as to whether the 'organisation' can 
retain the data and continue to use it.

https://gdpr-info.eu/art-6-gdpr/
https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/lawful-basis-for-processing/
https://www.lawscot.org.uk/news-and-events/news/gdpr-legal-basis-and-why-it-matters/

For an open source project with an open source licence then an implict DCO 
applies for the meta data. It is the legal  basis for the the release.


If a corporate project has a closed source project, then yes, open 
publishing of that personal data within a repo's meta data would be 
incorrect, even though the internal repo would be kept.





I already stressed that from the very beginning.

Re copyright license:

No, a copyright license does not provide a legitimization.

- copyright is about distributing the program, not about distributing
version control metadata.


It is specificaly about giving that right to copy by Jane Doe (but git gives 
no other information other than that supposedly globally unique 'author 
email'.




- Being named is a right, not an obligation of the author. Hence, if
the author doesn't want his name published, the company doesn't have
legitimate grounds based in copyright for doing it anyway, against his
or her will.


Git for Open Source is about open licencing by name. I'd agree that a closed 
corporate licence stays closed, but not forgotten.





From a personal view, many folk want it to be that corporates (and open
source organisations) should hold no personal information with having
explicit permission that can then be withdrawn, with deletion to follow.
However that 'legal' clause does [generally] win.


Let's be honest: We do not know what legitimization exactly in each
specific case the git metadata is being distributed under.


We should know, already. A specific licence [or limit] should be in place. 
We don't really want to have to let a court decide ;-)




It may be copyright, it may be employment, but it may also be revocable
consent. This is, we cannot safely assume that no git user will ever
have to deal with a legitimate request based on the right to be
forgotten.



The law is never decided by technical means, unfortunately. Regular git 
users should have no issues - they just need to point their finger at the 
responsible authority. (beware though, of the oneway trap door that the 
users mistakes can become the problem for the responsible authority!)



In the git.git case (and linux.git) there is the DCO (to back up the 
GLP2)
as an explicit requirement/certification that puts the information into 
the

legal evidence category. IIUC almost all copyright ends up with a similar
evidentail trail for the meta data.


This makes things more complicated, not less. You have yet more meta
data to cope with, yet more opportunities to be bitten by the right to
be forgotten. Since I proposed a list of metadata where each entry can
be anonymized independently of each other, it would be able to deal
with this perfectly.


The DCO/GPL2 are the legitimate data record that recipients should have for 
their copy. There is no right to be forgotten at that point.




The more likely problem is if the content of the repo, rather than the 
meta

data, is subject to GDPR, and that could easily ruin any storage method.
Being able to mark an object as  would help here(*).


My proposal supports any part of the commit, including the contents of
individual files, as eraseable, yet verifiable data.


Also remember that most EU legislation is 'intent' based, rather than
'letter of', for the style of legal arguments (which is where some of the 
UK
Brexit misunderstandings come from), so it is more than possible to get 
into
the situation where an action is both mandated and illegal at the same 
time,
so plent of snake oil salesman continue to sell magic fixes according to 
the

customers local biases.


This may be true. I am not trying to sell snake oil, however. To have
erasure and verifiability at the same time is a highly generic feature
that may be desirable to have for a multitude of reasons, including but
not limited to legal ones like GDPR and copyright violations.


I do not believe Git has anything to worry about that wasn't already an
issue.


Yes, but it 

Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 05:03:44PM -0400, Theodore Y. Ts'o wrote:
> If you don't think a potential 2x -- 10x performance hit isn't a
> blocking factor --- sure, go ahead and try implementing it.  And good
> luck to you.  And this is not a guarantee that it won't get rejected.
> I certainly don't have the power to make that guarantee.

I do not want or expect a guarantee, or even a probability, of course. 
Just trying to avoid "STRONG REJECT. We could have said you before you 
even started implementing. Why didn't you discuss this beforehand?"

One would simply change something like

author A U Thor  1465982009 +

into something like

author 21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U 
Thor 
author-hash 469bb107e38f8e59dddb3bbd6f8646e052bf73d48427865563c7358a64467f2c
authordate c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 
1465982009 +
authordate-hash 199875e5aedb6cb164a2b40c16209dc5bb37f34c059a56c6d96766440fb0fe68

and then compute the commit id without the "author" and the 
"authordate" lines.

The *-hash values were obtained as follows:

echo -n '21bbba8e9ce9734022d2c23df247a2704c0320ad7d43c02e8bdecdfae27e23b4 A U 
Thor ' | sha3sum -a 256
echo -n 'c444f739ca317e09dbd3dae1207065585ae2c2e18cd0fc434b5bde08df1e0569 
1465982009 +' | sha3sum -a 256

The hex values here are simply the $huge_random_numbers

Verifying the commit ID by itself wouldn't be any less efficient than 
before. Admitteldly, it wouldn't verify the author and authordate 
integrity anymore without additional work. That would be some overhead, 
sure, and could be done on demand, and would mostly affect clones. I 
don't think it would be that much of a problem. It can be parallelized 
easily. The hashes for each field are independent of each other. They 
can all be verified in parallel in different threads running on 
different cores.

On djb's typical 2015 skylake machine the supercop benchmark tells us 
that sha3-256 (~=keccakc512) has a speed of about 20 cycles/byte for 
blocks of 64 bytes of data, see 
https://bench.cr.yp.to/results-sha3.html#amd64-skylake

Let's say we have 128 bytes of data on average for the author field, so 
conservatively speaking it takes about 3000 cycles (> 128*20) to hash 
and compare the hash.

At 3000 MHz, we can thus do roughly about 1000 verifications per second 
per core.

Let's assume we have 10 anonymizable fields of this kind per commit.

Then the overhead would be one second per 100 x ncores commits.

How many commits are we talking about in a huge repository? And how 
long does a clone of such a huge repository take at the moment? Do you 
have any numbers?

> If you don't have time to implement, why do you think it's fair to
> inflict on everyone else the request for time to do a design review
> for something for which the need hasn't even been established?

I do not request from anyone to even reply to my messages. I just see a 
lot of time being wasted by discussing things about my proposal that 
are technically irrelevant. If that time were put into reviewing the 
design, it would be spent better.

Please don't devalue a proposal. It is not true that the only value is 
in actual code and proposals are "bullshit".

I was not the first to raise the issue, as I clearly showed in my 
initial email.

The demand is in fact high; very high. At present, that demand is 
satisfied by lawyers. Who are writing snake oil disclaimers and such 
for enormous sums of money. In a lot of companies. To "solve" a 
technical issue by pseudo-legal means by finding excuses for why the 
"right to be forgotten" doesn't have to be implemented in specific 
cases such as git. What if all that lawyer money were put into actually 
solving the technical issues as technical issues? Engineers are 
apparently bad at marketing, the lawyers seem more successful in that 
respect.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


[GSoC] GSoC with git, week 5

2018-06-03 Thread Alban Gruin
Hi,

I wrote a post about this week. Unfortunately, I have a technical issue
with my website, so you can find it right in this email instead. I’m
sorry about that. I will publish it on my blog as soon as it comes back
online.

Feel free to give me your feedback!

Cheers,
Alban


--

As you may know, I made no less than three attempts to upstream my
changes about ``preserve-merges`` last week.

This week began by a fourth one[1].

4th attempt
---

Johannes pointed out some problems with my commit messages on that
series:

 - some lines are too long
 - they do not describe what I wanted to do well.

On the first point, commits messages should wrap at 72 characters, but
I configured Emacs to wrap pretty much anything at 80 characters.

He also wanted to keep the original name of
``git_rebase__interactive__preserve_merges()`` (which I renamed
``git_rebase__preserve_merges()``), but I decided not to, as this
function is now part of ``git-rebase--preserve-merges.sh``.

Also, Friday, Junio Hamano announced[2] that my changes (among many
others) have been merged into the ``pu`` (proposed updates) branch of
git.git. This does not mean that it will necessarily hit ``master``,
but it’s a start.

TODO: help
--

When you start an interactive rebase, you’re greeted with your
``$EDITOR``, containing a list of commits to edit, and an help text
explaining you how this works.

So, this week, I rewrote the code writing this message from shell
to C. You can find the branch here[3], and the discussion on the list
here[4].

The conversion itself is quite trivial, but strings are a rather
curious case here: some of them begins with one or two newlines. As
this becomes useless in C, Stefan advised me to remove them, as this
would probably cause less confusion for the translators, but it
implies changing the translations, as pointed out by Johannes and
Phillip Wood. Right now, no clear opinion has emerged from the
discussion.

Some additionnal work and refactoring could be made once more code has
been converted. For instance, the todo-list file is opened, modified,
and closed several times. Instead, we could create a ``strbuf``, build
it progressively, then write it once to the todo file.

What’s next?


I’ve started to work on converting the ``edit-todo`` functionnality of
interactive rebase. This is also a straightforward change, but it
would also require some future work once the conversion is completed.

[1]
https://public-inbox.org/git/20180528123422.6718-1-alban.gr...@gmail.com/
[2] https://public-inbox.org/git/xmqqa7sf7yzs@gitster-ct.c.googlers.com/
[3] https://github.com/agrn/git/tree/ag/append-todo-help-c
[4]
https://public-inbox.org/git/20180531110130.18839-1-alban.gr...@gmail.com/


Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 10:52:33PM +02h00, hPeter Backes wrote:
> But I will take your message as saying you at least don't see any 
> obvious criticism leading to complete rejection of the approach.

If you don't think a potential 2x -- 10x performance hit isn't a
blocking factor --- sure, go ahead and try implementing it.  And good
luck to you.  And this is not a guarantee that it won't get rejected.
I certainly don't have the power to make that guarantee.

If you don't have time to implement, why do you think it's fair to
inflict on everyone else the request for time to do a design review
for something for which the need hasn't even been established?

Regards,

 - Ted


Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 04:07:39PM -0400, Theodore Y. Ts'o wrote:
> Why don't you try to implement your proposal then, and then benchmark
> it.  After you find out how much of a performance disaster it's going
> to be, especially for large git repos, we can discuss who is being
> tyrannical.

See, Ted, but I have this other hobby project with git stash preserving 
timestamps, which is 90% done but not yet finished. I am a very busy 
person. I might implement it but it's not the topmost priority. Thus, 
first I want to discuss to not waste too much time implementing 
something that's then rejected by valid criticism while that criticms 
could have been raised beforehand. Perhaps I can convince my employer 
to work on it on their account. But there's so much to do at the moment.

I have a PhD, about very complex things like static program analysis by 
abstract interpretation. I love hacking very much but I can mostly only 
do it as a hobby because humanity is better served doing the complex 
things that not every hacker can do.

I know I am being whiny but that's how it is.

But I will take your message as saying you at least don't see any 
obvious criticism leading to complete rejection of the approach.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 09:48:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Sure, but what I'm pointing out is a) you can't focus on git as the
> technology because it tells you nothing about what's being done with it
> (e.g. the log file case I mentioned b) nobody who came up with the GDPR
> was concerned with some free software projects or the SCM used by
> companies, so this is very unlikely to be enforced.

As I already said, the GDPR refers to the state of the art in 
technology, without defining it.

The GDPR provides a generic framework. It covers everyone. From a 
single person running a small blog to a S enterprise. It also 
covers non-profits and state authorities. Everyone is covered. 
Including SCM used.

The GDPR will be enforced against SCMs. The question is just who will 
be the first to be affected. I suspect it will be a mega-corporation 
who fired one of their developers who wants to fight back and exercise 
his right to be forgotten against the company's public git repos.

> So nobody can be GDPR compliant in the face of archive.org and the like?

The GDPR has special exceptions for archives and the like.

> It does if you've got the ref. Maybe I just don't get your proposal,
> quote:
> 
> Do not hash anything directly to obtain the commit ID. Instead, hash a
> list of hashes of [$random_number, $information] pairs. $information
> could be an author id, a commit date, a comment, or anything else. Then
> store the commit id, the list of hashes, and the list of pairs to form
> the commit.
> 
> You're just proposing (if I've read this correctly) that the commit
> object should have some list of headers pointing to other SHA1s, and
> that fsck and the like be OK with these going away. Right?

Certainly not SHA1. SHA1 is completely broken. I know Linus has a bit 
of a different opinion. But there's really no defense for SHA1. It's an 
utterly broken algorithm and should not be used at all anymore.

> How is this intrinsically different from referring to something in the
> ref namespace that may be deleted in the future?

I guess I am partly repeating myself, but:

1. Having fsck be OK with erasure is not enough. It tells you nothing 
about anonymization. If the hash is the same in 5000 instances that's 
pseudonymization, not anonymization. You need to ensure a different 
hash in each instance, and you need to ensure there's no easy way to 
reconstruct the data from its hash. Hence $random_number (or let's call 
it $huge_random_number, it should have x bits if the hash has x bits). 
If you have the SHA1 64ca93f83bb29b51d8cbd6f3e6a8daff2e08d3ec it's too 
easy to figure out the plaintext (it's "Peter" BTW).

2. If you use a random UUID you cannot reconstruct the data from its 
hash, but you have the same issue about UUID reuse. Plus, you lose the 
ability to verify the author's name as part of the commit.

> Okey, so you're not reading the GDPR in some literal sense, but you're
> coming to a conclusion that's supported by ... what? To echo Theodore
> Y. Ts'o E-Mail have you consulted with someone who's an actual lawyer on
> this subject?

I'm replying in private conversation about this one. It's not relevant 
for this discussion.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 09:24:17PM +0200, Peter Backes wrote:
> 
> He said: It would be a tyranny of lawyers.
> 
> Let's not have a tyranny of lawyers. Let us, the engineers and hackers, 
> exercise the necessary control over those pesky lawyers by defining and 
> redefining the state of the art in technology, and prevent them from 
> defining it by themselves. For a hammer, everything looks like a nail. 
> What is the better options: To suggest people to pay for legal advice 
> by lawyers, who only offer lengthy disclaimers and such for bypassing 
> the right to be forgotten, or simply discuss technical changes for git 
> which enable its easy implementation, without legal excuses for not 
> doing supporting it?

Why don't you try to implement your proposal then, and then benchmark
it.  After you find out how much of a performance disaster it's going
to be, especially for large git repos, we can discuss who is being
tyrannical.
 
It may very well be that different people and companies will get
different legal advice, and one of the interesting things about many
git repos for open source project is that it is not owned by any one
company.  A change in the git repo format is one that has to be
adopted by the entire open source project, and if a portion of the
community isn't interesting in paying the overhead cost, and sticks
with the existing git repo format, I wonder what the "imperialistic"
(your word, not mine) EU will do --- try to lock up or sue everyone
from outside the EU that refuses to pay the 2x-10x performance
overhead and sticks with the original repo format, such that anyone
who wants to interoperate has to send git pushes in the orignial
format?

But in any case, way don't you send a patch and we can discuss?  As
the old saying goes, "code talks, bullshit walks".   :-)

Regards,

 - Ted


Re: GDPR compliance best practices?

2018-06-03 Thread Ævar Arnfjörð Bjarmason


On Sun, Jun 03 2018, Peter Backes wrote:

> On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I'm not trying to be selfish, I'm just trying to counter your literal
>> reading of the law with a comment of "it'll depend".
>>
>> Just like there's a law against public urination in many places, but
>> this is applied very differently to someone taking a piss in front of
>> parliament v.s. someone taking a piss in the forest on a hike, even
>> though the law itself usually makes no distinction about the two.
>
> We have huge companies using git now. This is not the tool used by a
> few kernel hackers anymore.

Sure, but what I'm pointing out is a) you can't focus on git as the
technology because it tells you nothing about what's being done with it
(e.g. the log file case I mentioned b) nobody who came up with the GDPR
was concerned with some free software projects or the SCM used by
companies, so this is very unlikely to be enforced.

>> In this example once you'd delete the UUID ref you don't have the UUID
>> -> author mapping anymore (and b.t.w. that could be a many to one
>> mapping).
>
> It is not relevant whether you have that mapping or not, it is enough
> that with additional information you could obtain it. For example, say,
> you have 5000 commits with the same UUID. Now your delete the mapping.
> But your friend still has it on his local copy. Now your friendly
> merely needs to tell you who is behind that UUID and instantly you can
> associate all 5000 commits with that person again.

So nobody can be GDPR compliant in the face of archive.org and the like?
If the law says that you need to delete information you published in the
past, and you do so, how is it your problem that someone mirrored &
re-published it? That's their compliance problem at that point.

> The GDPR is very explict about this, see recital 26. It says that
> pseudonymization is not enough, you need anonymization if you want to
> be free from regulation.
>
> In addition, and in contrast to my proposal, your solution doesn't
> allow verification of the author field.

It does if you've got the ref. Maybe I just don't get your proposal,
quote:

Do not hash anything directly to obtain the commit ID. Instead, hash a
list of hashes of [$random_number, $information] pairs. $information
could be an author id, a commit date, a comment, or anything else. Then
store the commit id, the list of hashes, and the list of pairs to form
the commit.

You're just proposing (if I've read this correctly) that the commit
object should have some list of headers pointing to other SHA1s, and
that fsck and the like be OK with these going away. Right?

How is this intrinsically different from referring to something in the
ref namespace that may be deleted in the future?

In both cases you're just trying to solve the problem of trying to
somehow encode data into a git repository today, that may go away
tomorrow. Similar to how a reference to some LFS object today going away
doesn't fail "git fsck".

>> I think again that this is taking too much of a literalist view. The
>> intent of that policy is to ensure that companies like Google can't just
>> close down their EU offices weasel out of compliance be saying "we're
>> just doing business from the US, it doesn't apply to us".
>>
>> It will not be used against anyone who's taking every reasonable
>> precaution from doing business with EU customers.
>
> I think you are underestimating the political intention behind the
> GDPR. It has kind of an imperialist goal, to set international
> standards, to enforce them against foreign companies and to pressure
> other nations to establish the same standards.
>
> If I would read the GPDR in a literal sense, I would in fact come to
> the same conclusion as you: It's about companies doing substantial
> business in the EU. But the GDPR is carefully constructed in such a way
> that it is hard not to be affected by the GDPR in one way or another,
> and the obvious way to cope with that risk is to more or less obey the
> GDPR rules even if one does not have substantial business interests in
> the EU.

Okey, so you're not reading the GDPR in some literal sense, but you're
coming to a conclusion that's supported by ... what? To echo Theodore
Y. Ts'o E-Mail have you consulted with someone who's an actual lawyer on
this subject?

I haven't but, I'm not suggesting that the git data format needs to
change because of some new EU law. You are, what's your basis for that
opinion?

It seems to me that the git project doesn't need to do anything about
this. There's plenty of things that are illegal to publish, and some of
which may be made illegal after the fact (e.g. national security related
information). If those things are incidentally saved in git repositories
the parties involved may need to run git-filter-branch.

Of course if they need to do that on a weekly basis because of some
overzealous law we may need to have some "native" 

Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
Addendum:

I one discussed with a philosopher the question: What is your argument 
against libertarianism?

He said: It would be a tyranny of lawyers.

Let's not have a tyranny of lawyers. Let us, the engineers and hackers, 
exercise the necessary control over those pesky lawyers by defining and 
redefining the state of the art in technology, and prevent them from 
defining it by themselves. For a hammer, everything looks like a nail. 
What is the better options: To suggest people to pay for legal advice 
by lawyers, who only offer lengthy disclaimers and such for bypassing 
the right to be forgotten, or simply discuss technical changes for git 
which enable its easy implementation, without legal excuses for not 
doing supporting it?

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 02:18:07PM -0400, Theodore Y. Ts'o wrote:
> I would gently suggest that if you really want to engage in something
> practical than speculating how the GPDR compliance will work out in
> actual practice, that you contact a lawyer and get official legal
> advice?

I completely disagree.

Erasure is a technical issue to be solved by engineers, not by lawyers.

And that's completely in line with the GDPR. The GDPR is ultimately not 
a legal thing to be solved by lawyers writing lengthy legal 
argumentations and disclaimers and such. They are not even the ones to 
take lead in GDPR implementation. All that would be simply snake oil. 
Some legal documentation may be necessary, and having a competent 
lawyer in a GDPR compliance task force is certainly a must. But that 
gets you done only 20% of the job, 80% is engineering. Every lawyer who 
claims to give you shady legal tricks to get the job 100% done in no 
time is a liar.

The GDPRs ultimate goal is to incline the world to improve how data 
protection is implemented on a technical level. The GDPR contains 
several blanket clauses that refer to the "state of the art" of 
technology, which the GDPR itself of course does not define and which 
is of course nothing a lawyer has any competence in.

My proposal is a technical, not a legal one: Provide a generic 
possibility of having eraseability and verifiability at the same time 
in git. Improve the state of the art in version control such that it is 
more in line with the GDPRs idea that people have a right to be 
forgotten, but to also be useful for a multitude of other applications. 
The lawyers can then build on this.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [PATCH 19/22] sequencer.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sun, Jun 3, 2018 at 11:14 AM, Duy Nguyen  wrote:
> On Sun, Jun 3, 2018 at 10:16 AM, Eric Sunshine  
> wrote:
>> On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  
>> wrote:
>>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>>> -   fprintf(stderr, "Could not apply %s... %.*s\n",
>>> +   fprintf_ln(stderr, _("Could not apply %s... %.*s"),
>>
>> Did you want to downcase "Could" for consistency with other error
>> messages, or was this left as-is intentionally?
>
> I'm not sure. Others start with a prefix (like "error:",
> "warning:"). This is a bit different and makes me hesitate.

Yep, I realized after hitting Send that this wasn't an error/warning
message so probably shouldn't be changed.


Re: [PATCH 02/22] archive-zip.c: mark more strings for translation

2018-06-03 Thread brian m. carlson
On Sat, Jun 02, 2018 at 08:17:47AM +0200, Duy Nguyen wrote:
> On Sat, Jun 2, 2018 at 6:32 AM, Nguyễn Thái Ngọc Duy  
> wrote:
> > if (pathlen > 0x) {
> > -   return error("path too long (%d chars, SHA1: %s): %s",
> > +   return error(_("path too long (%d chars, SHA1: %s): %s"),
> > (int)pathlen, oid_to_hex(oid), path);
> 
> Off topic. Brian, do we have a common term to use here (i.e. in user
> facing messages) instead of "SHA1" after we move away from it? Is it
> still "object name", or "hash" or some other fancy term?

You could say "object ID" or "object" here.  It should be clear from
context what that means.  I tend to use "hash" for things which are not
necessarily object IDs (e.g. pack hash).
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: GDPR compliance best practices?

2018-06-03 Thread Theodore Y. Ts'o
On Sun, Jun 03, 2018 at 07:46:17PM +0200, Peter Backes wrote:
> 
> Let's be honest: We do not know what legitimization exactly in each 
> specific case the git metadata is being distributed under.

It seems like you are engaging in something even more dangerous than a
hardware engineering pretending they know how program, or a software
engineer knowing how to use as oldering iron --- and that's a
programmer pretending they know enough that they can speculate on the
law.

I would gently suggest that if you really want to engage in something
practical than speculating how the GPDR compliance will work out in
actual practice, that you contact a lawyer and get official legal
advice?

After getting that advice, if you or your company wants to implemnt,
you can then send patches, and those can get debated using the usual
patch submission process.

Cheers,

- Ted


Re: GDPR compliance best practices?

2018-06-03 Thread Philip Oakley

correcting a negative /with/without/ and inserting a comma.
- Original Message - 
From: "Philip Oakley" 

[snip]


From a personal view, many folk want it to be that corporates (and open
source organisations) should hold no personal information with having

s/with/without/


explicit permission that can then be withdrawn, with deletion to follow.
s/permission/permission,/  


However that 'legal' clause does [generally] win.





Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 04:28:31PM +0100, Philip Oakley wrote:
> In most Git cases that legal/legitimate purpose is the copyright licence,
> and/or corporate employment. That is, Jane wrote it, hence X has a legal
> rights of use, and we need to have a record of that (Jane wrote it) as
> evidence of that (I'm X, I can use it) right. That would mean that Jane
> cannot just ask to have that record removed and expect it to be removed.

Re corporate employment:

For sure nobody would dare to quesion that a company has a right to 
keep an internal record that Jane wrote it.

The issue is publishing that information. This is an entirely different 
story.

I already stressed that from the very beginning.

Re copyright license:

No, a copyright license does not provide a legitimization.

- copyright is about distributing the program, not about distributing 
version control metadata.

- Being named is a right, not an obligation of the author. Hence, if 
the author doesn't want his name published, the company doesn't have 
legitimate grounds based in copyright for doing it anyway, against his 
or her will.

> From a personal view, many folk want it to be that corporates (and open
> source organisations) should hold no personal information with having
> explicit permission that can then be withdrawn, with deletion to follow.
> However that 'legal' clause does [generally] win.

Let's be honest: We do not know what legitimization exactly in each 
specific case the git metadata is being distributed under.

It may be copyright, it may be employment, but it may also be revocable 
consent. This is, we cannot safely assume that no git user will ever 
have to deal with a legitimate request based on the right to be 
forgotten.

> In the git.git case (and linux.git) there is the DCO (to back up the GLP2)
> as an explicit requirement/certification that puts the information into the
> legal evidence category. IIUC almost all copyright ends up with a similar
> evidentail trail for the meta data.

This makes things more complicated, not less. You have yet more meta 
data to cope with, yet more opportunities to be bitten by the right to 
be forgotten. Since I proposed a list of metadata where each entry can 
be anonymized independently of each other, it would be able to deal 
with this perfectly.

> The more likely problem is if the content of the repo, rather than the meta
> data, is subject to GDPR, and that could easily ruin any storage method.
> Being able to mark an object as  would help here(*).

My proposal supports any part of the commit, including the contents of 
individual files, as eraseable, yet verifiable data.

> Also remember that most EU legislation is 'intent' based, rather than
> 'letter of', for the style of legal arguments (which is where some of the UK
> Brexit misunderstandings come from), so it is more than possible to get into
> the situation where an action is both mandated and illegal at the same time,
> so plent of snake oil salesman continue to sell magic fixes according to the
> customers local biases.

This may be true. I am not trying to sell snake oil, however. To have 
erasure and verifiability at the same time is a highly generic feature 
that may be desirable to have for a multitude of reasons, including but 
not limited to legal ones like GDPR and copyright violations.

> I do not believe Git has anything to worry about that wasn't already an
> issue.

Yes, but it definitely had and still does have something to worry about.

git should provide technical means to deal with this. I provided a 
proposal based on anonymization that does not in any way have any 
drawback compared to the status quo, except a slight increase in 
metadata size and various degrees of backwards incompatibility, 
depending on how it is implemented.

What do you think about my proposal as a solution for the problem?

You provide a lot of arguments about why it is not a necessity to have 
this, but let's assume it is; is there any actual problem you see with 
the proposal, except that someone would have to implement it?

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: git glob pattern in .gitignore and git command

2018-06-03 Thread Duy Nguyen
On Sun, Jun 3, 2018 at 2:58 AM, Yubin Ruan  wrote:
> To ignore all .js file under a directory `lib', I can use "lib/**/js" to match
> them. But when using git command such as "git add", using "git add lib/\*.js"
> is sufficient. Why is this difference in glob mode?

Historical reasons mostly. '**' comes later when '*' already
establishes its place. You can use '**' too with "git add
':(glob)lib/**/*.js'". See
https://git-scm.com/docs/gitglossary#gitglossary-aiddefpathspecapathspec
-- 
Duy


Re: git glob pattern in .gitignore and git command

2018-06-03 Thread Philip Oakley

Hi Yubun,

From: "Yubin Ruan" 

To ignore all .js file under a directory `lib', I can use "lib/**/js" to
match
them. But when using git command such as "git add", using "git add
lib/\*.js"
is sufficient. Why is this difference in glob mode?

I have heard that there are many different glob mode out there (e.g., bash
has
many different glob mode). So, which classes of glob mode does these two
belong to? Do they have a name?



Is this a question about `git add` being able to add a file that is marked
as being ignored in the .gitignore file? [Yes it can.]

Or, is this simply about the many different globbing capabilities of one's
shell, and of Git?

The double asterix (star) is specific/local to Git. It is described in the
various commands that use it, especially the gitignore man page `git help
ignore` or  https://git-scm.com/docs/gitignore.
"Two consecutive asterisks ("**") in patterns matched against full pathname
may have special meaning: ... "

The single asterix does have two modes depending on how you quote it. It is
described in the command line interface (cli) man page ` git help cli` or
https://git-scm.com/docs/gitcli.
"Many commands allow wildcards in paths, but you need to protect them from
getting globbed by the shell. These two mean different things: ... "

A common proper name for these asterix style characters is a "wildcards".
Try 'bash wildcards' or linux wildcards' in your favourite search engine.

--
Philip



Re: [PATCH v2 05/36] refspec: convert valid_fetch_refspec to use parse_refspec

2018-06-03 Thread Martin Ågren
Hi Brandon,

On 17 May 2018 at 00:57, Brandon Williams  wrote:
> Convert 'valid_fetch_refspec()' to use the new 'parse_refspec()'
> function to only parse a single refspec and eliminate an allocation.
>
> Signed-off-by: Brandon Williams 
> ---
>  refspec.c | 17 -
>  refspec.h |  3 ++-
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/refspec.c b/refspec.c
> index af9d0d4b3..ab37b5ba1 100644
> --- a/refspec.c
> +++ b/refspec.c
> @@ -146,15 +146,6 @@ static struct refspec_item *parse_refspec_internal(int 
> nr_refspec, const char **
> die("Invalid refspec '%s'", refspec[i]);
>  }
>
> -int valid_fetch_refspec(const char *fetch_refspec_str)
> -{
> -   struct refspec_item *refspec;
> -
> -   refspec = parse_refspec_internal(1, _refspec_str, 1, 1);
> -   free_refspec(1, refspec);
> -   return !!refspec;
> -}
> -
>  struct refspec_item *parse_fetch_refspec(int nr_refspec, const char 
> **refspec)
>  {
> return parse_refspec_internal(nr_refspec, refspec, 1, 0);
> @@ -242,3 +233,11 @@ void refspec_clear(struct refspec *rs)
>
> rs->fetch = 0;
>  }
> +
> +int valid_fetch_refspec(const char *fetch_refspec_str)
> +{
> +   struct refspec_item refspec;
> +   int ret = parse_refspec(, fetch_refspec_str, REFSPEC_FETCH);
> +   refspec_item_clear();
> +   return ret;
> +}

My compiler warned about this function. The `dst` and `src` pointers
will equal some random data on the stack, then they may or may not be
assigned to, then we will call `free()` on them.

At least I *think* that we "may or may not" assign to them. I don't have
much or any time to really dig into this right now unfortunately.

I suppose this could use a REFSPEC_ITEM_INIT, or a memset inside
`parse_refspec()`, but I am very unfamiliar with this code.

Martin


[PATCH] Use ZSH_NAME instead of ZSH_VERSION because it's redefined by git-completion.zsh

2018-06-03 Thread Rick van Hattem
The `git-completion.zsh` unsets the `$ZSH_VERSION` which makes this check moot. 
The result (at least for me) is that zsh segfaults because of all the variables 
it's unsetting.
---
 contrib/completion/git-completion.bash | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 12814e9bbf6be..7e2b9ad454930 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -348,7 +348,7 @@ __gitcomp ()
 
 # Clear the variables caching builtins' options when (re-)sourcing
 # the completion script.
-if [[ -n ${ZSH_VERSION-} ]]; then
+if [[ -n ${ZSH_NAME-} ]]; then
unset $(set |sed -ne 
's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
 else
unset $(compgen -v __gitcomp_builtin_)

--
https://github.com/git/git/pull/500


[PATCH v2 21/23] sha1-file.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 sha1-file.c | 104 ++--
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index d2df30c4c4..a0ce97b10b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -378,8 +378,8 @@ static int alt_odb_usable(struct raw_object_store *o,
 
/* Detect cases where alternate disappeared */
if (!is_directory(path->buf)) {
-   error("object directory %s does not exist; "
- "check .git/objects/info/alternates",
+   error(_("object directory %s does not exist; "
+   "check .git/objects/info/alternates"),
  path->buf);
return 0;
}
@@ -429,7 +429,7 @@ static int link_alt_odb_entry(struct repository *r, const 
char *entry,
strbuf_addstr(, entry);
 
if (strbuf_normalize_path() < 0 && relative_base) {
-   error("unable to normalize alternate object path: %s",
+   error(_("unable to normalize alternate object path: %s"),
  pathbuf.buf);
strbuf_release();
return -1;
@@ -500,14 +500,14 @@ static void link_alt_odb_entries(struct repository *r, 
const char *alt,
return;
 
if (depth > 5) {
-   error("%s: ignoring alternate object stores, nesting too deep.",
+   error(_("%s: ignoring alternate object stores, nesting too 
deep"),
relative_base);
return;
}
 
strbuf_add_absolute_path(, r->objects->objectdir);
if (strbuf_normalize_path() < 0)
-   die("unable to normalize object directory: %s",
+   die(_("unable to normalize object directory: %s"),
objdirbuf.buf);
 
while (*alt) {
@@ -562,7 +562,7 @@ void add_to_alternates_file(const char *reference)
hold_lock_file_for_update(, alts, LOCK_DIE_ON_ERROR);
out = fdopen_lock_file(, "w");
if (!out)
-   die_errno("unable to fdopen alternates lockfile");
+   die_errno(_("unable to fdopen alternates lockfile"));
 
in = fopen(alts, "r");
if (in) {
@@ -580,14 +580,14 @@ void add_to_alternates_file(const char *reference)
fclose(in);
}
else if (errno != ENOENT)
-   die_errno("unable to read alternates file");
+   die_errno(_("unable to read alternates file"));
 
if (found) {
rollback_lock_file();
} else {
fprintf_or_die(out, "%s\n", reference);
if (commit_lock_file())
-   die_errno("unable to move new alternates file into 
place");
+   die_errno(_("unable to move new alternates file into 
place"));
if (the_repository->objects->alt_odb_tail)
link_alt_odb_entries(the_repository, reference,
 '\n', NULL, 0);
@@ -778,7 +778,7 @@ static void mmap_limit_check(size_t length)
limit = SIZE_MAX;
}
if (length > limit)
-   die("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX,
+   die(_("attempting to mmap %"PRIuMAX" over limit %"PRIuMAX),
(uintmax_t)length, (uintmax_t)limit);
 }
 
@@ -803,7 +803,7 @@ void *xmmap(void *start, size_t length,
 {
void *ret = xmmap_gently(start, length, prot, flags, fd, offset);
if (ret == MAP_FAILED)
-   die_errno("mmap failed");
+   die_errno(_("mmap failed"));
return ret;
 }
 
@@ -970,7 +970,7 @@ static void *map_sha1_file_1(struct repository *r, const 
char *path,
*size = xsize_t(st.st_size);
if (!*size) {
/* mmap() is forbidden on empty files */
-   error("object file %s is empty", path);
+   error(_("object file %s is empty"), path);
return NULL;
}
map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -1090,9 +1090,9 @@ static void *unpack_sha1_rest(git_zstream *stream, void 
*buffer, unsigned long s
}
 
if (status < 0)
-   error("corrupt loose object '%s'", sha1_to_hex(sha1));
+   error(_("corrupt loose object '%s'"), sha1_to_hex(sha1));
else if (stream->avail_in)
-   error("garbage at end of loose object '%s'",
+   error(_("garbage at end of loose object '%s'"),
  sha1_to_hex(sha1));
free(buf);
return NULL;
@@ -1134,7 +1134,7 @@ static int parse_sha1_header_extended(const char *hdr, 
struct object_info *oi,
if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
type = 0;
else if 

[PATCH v2 15/23] object.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 object.c| 10 +-
 t/t1450-fsck.sh |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/object.c b/object.c
index f7f4de3aaf..c20ea782f1 100644
--- a/object.c
+++ b/object.c
@@ -51,7 +51,7 @@ int type_from_string_gently(const char *str, ssize_t len, int 
gentle)
if (gentle)
return -1;
 
-   die("invalid object type \"%s\"", str);
+   die(_("invalid object type \"%s\""), str);
 }
 
 /*
@@ -167,7 +167,7 @@ void *object_as_type(struct object *obj, enum object_type 
type, int quiet)
}
else {
if (!quiet)
-   error("object %s is a %s, not a %s",
+   error(_("object %s is a %s, not a %s"),
  oid_to_hex(>oid),
  type_name(obj->type), type_name(type));
return NULL;
@@ -226,7 +226,7 @@ struct object *parse_object_buffer(const struct object_id 
*oid, enum object_type
obj = >object;
}
} else {
-   warning("object %s has unknown type id %d", oid_to_hex(oid), 
type);
+   warning(_("object %s has unknown type id %d"), oid_to_hex(oid), 
type);
obj = NULL;
}
return obj;
@@ -259,7 +259,7 @@ struct object *parse_object(const struct object_id *oid)
(!obj && has_object_file(oid) &&
 oid_object_info(the_repository, oid, NULL) == OBJ_BLOB)) {
if (check_object_signature(repl, NULL, 0, NULL) < 0) {
-   error("sha1 mismatch %s", oid_to_hex(oid));
+   error(_("sha1 mismatch %s"), oid_to_hex(oid));
return NULL;
}
parse_blob_buffer(lookup_blob(oid), NULL, 0);
@@ -270,7 +270,7 @@ struct object *parse_object(const struct object_id *oid)
if (buffer) {
if (check_object_signature(repl, buffer, size, type_name(type)) 
< 0) {
free(buffer);
-   error("sha1 mismatch %s", oid_to_hex(repl));
+   error(_("sha1 mismatch %s"), oid_to_hex(repl));
return NULL;
}
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 91fd71444d..7b7602ddb4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -372,7 +372,7 @@ test_expect_success 'rev-list --verify-objects with bad 
sha1' '
 
test_might_fail git rev-list --verify-objects refs/heads/bogus 
>/dev/null 2>out &&
cat out &&
-   grep -q "error: sha1 mismatch 63ff" 
out
+   test_i18ngrep -q "error: sha1 mismatch 
63ff" out
 '
 
 test_expect_success 'force fsck to ignore double author' '
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 20/23] sequencer.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 sequencer.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index def8ae587f..01f3afe7f4 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -714,7 +714,7 @@ static const char *read_author_ident(struct strbuf *buf)
/* dequote values and construct ident line in-place */
for (in = out = buf->buf; i < 3 && in - buf->buf < buf->len; i++) {
if (!skip_prefix(in, keys[i], (const char **))) {
-   warning("could not parse '%s' (looking for '%s'",
+   warning(_("could not parse '%s' (looking for '%s'"),
rebase_path_author_script(), keys[i]);
return NULL;
}
@@ -736,7 +736,7 @@ static const char *read_author_ident(struct strbuf *buf)
}
 
if (i < 3) {
-   warning("could not parse '%s' (looking for '%s')",
+   warning(_("could not parse '%s' (looking for '%s')"),
rebase_path_author_script(), keys[i]);
return NULL;
}
@@ -1442,7 +1442,7 @@ static const char *command_to_string(const enum 
todo_command command)
 {
if (command < TODO_COMMENT)
return todo_command_info[command].str;
-   die("unknown command: %d", command);
+   die(_("unknown command: %d"), command);
 }
 
 static char command_to_char(const enum todo_command command)
@@ -2597,15 +2597,17 @@ static int error_with_patch(struct commit *commit,
if (intend_to_amend())
return -1;
 
-   fprintf(stderr, "You can amend the commit now, with\n"
-   "\n"
-   "  git commit --amend %s\n"
-   "\n"
-   "Once you are satisfied with your changes, run\n"
-   "\n"
-   "  git rebase --continue\n", gpg_sign_opt_quoted(opts));
+   fprintf(stderr,
+   _("You can amend the commit now, with\n"
+ "\n"
+ "  git commit --amend %s\n"
+ "\n"
+ "Once you are satisfied with your changes, run\n"
+ "\n"
+ "  git rebase --continue\n"),
+   gpg_sign_opt_quoted(opts));
} else if (exit_code)
-   fprintf_ln(stderr, "Could not apply %s... %.*s",
+   fprintf_ln(stderr, _("Could not apply %s... %.*s"),
short_commit_name(commit), subject_len, subject);
 
return exit_code;
@@ -2716,7 +2718,7 @@ static int do_label(const char *name, int len)
struct object_id head_oid;
 
if (len == 1 && *name == '#')
-   return error("illegal label name: '%.*s'", len, name);
+   return error(_("illegal label name: '%.*s'"), len, name);
 
strbuf_addf(_name, "refs/rewritten/%.*s", len, name);
strbuf_addf(, "rebase -i (label) '%.*s'", len, name);
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 23/23] transport-helper.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/t5801-remote-helpers.sh |  8 ++--
 transport-helper.c| 87 ---
 2 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 88c7f158ef..e3bc53b0c7 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -126,7 +126,7 @@ test_expect_success 'forced push' '
 test_expect_success 'cloning without refspec' '
GIT_REMOTE_TESTGIT_REFSPEC="" \
git clone "testgit::${PWD}/server" local2 2>error &&
-   grep "this remote helper should implement refspec capability" error &&
+   test_i18ngrep "this remote helper should implement refspec capability" 
error &&
compare_refs local2 HEAD server HEAD
 '
 
@@ -134,7 +134,7 @@ test_expect_success 'pulling without refspecs' '
(cd local2 &&
git reset --hard &&
GIT_REMOTE_TESTGIT_REFSPEC="" git pull 2>../error) &&
-   grep "this remote helper should implement refspec capability" error &&
+   test_i18ngrep "this remote helper should implement refspec capability" 
error &&
compare_refs local2 HEAD server HEAD
 '
 
@@ -146,7 +146,7 @@ test_expect_success 'pushing without refspecs' '
GIT_REMOTE_TESTGIT_REFSPEC="" &&
export GIT_REMOTE_TESTGIT_REFSPEC &&
test_must_fail git push 2>../error) &&
-   grep "remote-helper doesn.t support push; refspec needed" error
+   test_i18ngrep "remote-helper doesn.t support push; refspec needed" error
 '
 
 test_expect_success 'pulling without marks' '
@@ -246,7 +246,7 @@ test_expect_success 'proper failure checks for fetching' '
(cd local &&
test_must_fail env GIT_REMOTE_TESTGIT_FAILURE=1 git fetch 2>error &&
cat error &&
-   grep -q "error while running fast-import" error
+   test_i18ngrep -q "error while running fast-import" error
)
 '
 
diff --git a/transport-helper.c b/transport-helper.c
index 9f487cc905..84a10661cc 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -48,7 +48,7 @@ static void sendline(struct helper_data *helper, struct 
strbuf *buffer)
if (debug)
fprintf(stderr, "Debug: Remote helper: -> %s", buffer->buf);
if (write_in_full(helper->helper->in, buffer->buf, buffer->len) < 0)
-   die_errno("full write to remote helper failed");
+   die_errno(_("full write to remote helper failed"));
 }
 
 static int recvline_fh(FILE *helper, struct strbuf *buffer)
@@ -77,7 +77,7 @@ static void write_constant(int fd, const char *str)
if (debug)
fprintf(stderr, "Debug: Remote helper: -> %s", str);
if (write_in_full(fd, str, strlen(str)) < 0)
-   die_errno("full write to remote helper failed");
+   die_errno(_("full write to remote helper failed"));
 }
 
 static const char *remove_ext_force(const char *url)
@@ -129,7 +129,7 @@ static struct child_process *get_helper(struct transport 
*transport)
 
code = start_command(helper);
if (code < 0 && errno == ENOENT)
-   die("unable to find remote helper for '%s'", data->name);
+   die(_("unable to find remote helper for '%s'"), data->name);
else if (code != 0)
exit(code);
 
@@ -145,7 +145,7 @@ static struct child_process *get_helper(struct transport 
*transport)
 */
duped = dup(helper->out);
if (duped < 0)
-   die_errno("can't dup helper output fd");
+   die_errno(_("can't dup helper output fd"));
data->out = xfdopen(duped, "r");
 
write_constant(helper->in, "capabilities\n");
@@ -196,13 +196,13 @@ static struct child_process *get_helper(struct transport 
*transport)
} else if (starts_with(capname, "no-private-update")) {
data->no_private_update = 1;
} else if (mandatory) {
-   die("unknown mandatory capability %s; this remote "
-   "helper probably needs newer version of Git",
+   die(_("unknown mandatory capability %s; this remote "
+ "helper probably needs newer version of Git"),
capname);
}
}
if (!data->rs.nr && (data->import || data->bidi_import || 
data->export)) {
-   warning("this remote helper should implement refspec 
capability");
+   warning(_("this remote helper should implement refspec 
capability"));
}
strbuf_release();
if (debug)
@@ -269,7 +269,7 @@ static int strbuf_set_helper_option(struct helper_data 
*data,
else if (!strcmp(buf->buf, "unsupported"))
ret = 1;
else {
-   warning("%s unexpectedly said: '%s'", data->name, buf->buf);
+   warning(_("%s unexpectedly said: '%s'"), data->name, buf->buf);
ret = 1;
}
   

[PATCH v2 02/23] archive-tar.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive-tar.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index b6f58ddf38..68e72d9176 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -121,7 +121,7 @@ static int stream_blocked(const struct object_id *oid)
 
st = open_istream(oid, , , NULL);
if (!st)
-   return error("cannot stream blob %s", oid_to_hex(oid));
+   return error(_("cannot stream blob %s"), oid_to_hex(oid));
for (;;) {
readlen = read_istream(st, buf, sizeof(buf));
if (readlen <= 0)
@@ -256,7 +256,7 @@ static int write_tar_entry(struct archiver_args *args,
*header.typeflag = TYPEFLAG_REG;
mode = (mode | ((mode & 0100) ? 0777 : 0666)) & ~tar_umask;
} else {
-   return error("unsupported file mode: 0%o (SHA1: %s)",
+   return error(_("unsupported file mode: 0%o (SHA1: %s)"),
 mode, oid_to_hex(oid));
}
if (pathlen > sizeof(header.name)) {
@@ -283,7 +283,7 @@ static int write_tar_entry(struct archiver_args *args,
enum object_type type;
buffer = object_file_to_archive(args, path, oid, old_mode, 
, );
if (!buffer)
-   return error("cannot read %s", oid_to_hex(oid));
+   return error(_("cannot read %s"), oid_to_hex(oid));
} else {
buffer = NULL;
size = 0;
@@ -454,17 +454,17 @@ static int write_tar_filter_archive(const struct archiver 
*ar,
filter.in = -1;
 
if (start_command() < 0)
-   die_errno("unable to start '%s' filter", argv[0]);
+   die_errno(_("unable to start '%s' filter"), argv[0]);
close(1);
if (dup2(filter.in, 1) < 0)
-   die_errno("unable to redirect descriptor");
+   die_errno(_("unable to redirect descriptor"));
close(filter.in);
 
r = write_tar_archive(ar, args);
 
close(1);
if (finish_command() != 0)
-   die("'%s' filter reported error", argv[0]);
+   die(_("'%s' filter reported error"), argv[0]);
 
strbuf_release();
return r;
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 16/23] pkt-line.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pkt-line.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 941e41dfc1..04d10bbd03 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -101,7 +101,7 @@ int packet_flush_gently(int fd)
 {
packet_trace("", 4, 1);
if (write_in_full(fd, "", 4) < 0)
-   return error("flush packet write failed");
+   return error(_("flush packet write failed"));
return 0;
 }
 
@@ -139,7 +139,7 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
n = out->len - orig_len;
 
if (n > LARGE_PACKET_MAX)
-   die("protocol error: impossibly long line");
+   die(_("protocol error: impossibly long line"));
 
set_packet_header(>buf[orig_len], n);
packet_trace(out->buf + orig_len + 4, n - 4, 1);
@@ -155,9 +155,9 @@ static int packet_write_fmt_1(int fd, int gently,
if (write_in_full(fd, buf.buf, buf.len) < 0) {
if (!gently) {
check_pipe(errno);
-   die_errno("packet write with format failed");
+   die_errno(_("packet write with format failed"));
}
-   return error("packet write with format failed");
+   return error(_("packet write with format failed"));
}
 
return 0;
@@ -189,21 +189,21 @@ static int packet_write_gently(const int fd_out, const 
char *buf, size_t size)
size_t packet_size;
 
if (size > sizeof(packet_write_buffer) - 4)
-   return error("packet write failed - data exceeds max packet 
size");
+   return error(_("packet write failed - data exceeds max packet 
size"));
 
packet_trace(buf, size, 1);
packet_size = size + 4;
set_packet_header(packet_write_buffer, packet_size);
memcpy(packet_write_buffer + 4, buf, size);
if (write_in_full(fd_out, packet_write_buffer, packet_size) < 0)
-   return error("packet write failed");
+   return error(_("packet write failed"));
return 0;
 }
 
 void packet_write(int fd_out, const char *buf, size_t size)
 {
if (packet_write_gently(fd_out, buf, size))
-   die_errno("packet write failed");
+   die_errno(_("packet write failed"));
 }
 
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
@@ -225,7 +225,7 @@ void packet_buf_write_len(struct strbuf *buf, const char 
*data, size_t len)
n = buf->len - orig_len;
 
if (n > LARGE_PACKET_MAX)
-   die("protocol error: impossibly long line");
+   die(_("protocol error: impossibly long line"));
 
set_packet_header(>buf[orig_len], n);
packet_trace(data, len, 1);
@@ -288,7 +288,7 @@ static int get_packet_data(int fd, char **src_buf, size_t 
*src_size,
} else {
ret = read_in_full(fd, dst, size);
if (ret < 0)
-   die_errno("read error");
+   die_errno(_("read error"));
}
 
/* And complain if we didn't get enough bytes to satisfy the read. */
@@ -296,7 +296,7 @@ static int get_packet_data(int fd, char **src_buf, size_t 
*src_size,
if (options & PACKET_READ_GENTLE_ON_EOF)
return -1;
 
-   die("the remote end hung up unexpectedly");
+   die(_("the remote end hung up unexpectedly"));
}
 
return ret;
@@ -324,7 +324,7 @@ enum packet_read_status packet_read_with_status(int fd, 
char **src_buffer,
len = packet_length(linelen);
 
if (len < 0) {
-   die("protocol error: bad line length character: %.4s", linelen);
+   die(_("protocol error: bad line length character: %.4s"), 
linelen);
} else if (!len) {
packet_trace("", 4, 0);
*pktlen = 0;
@@ -334,12 +334,12 @@ enum packet_read_status packet_read_with_status(int fd, 
char **src_buffer,
*pktlen = 0;
return PACKET_READ_DELIM;
} else if (len < 4) {
-   die("protocol error: bad line length %d", len);
+   die(_("protocol error: bad line length %d"), len);
}
 
len -= 4;
if ((unsigned)len >= size)
-   die("protocol error: bad line length %d", len);
+   die(_("protocol error: bad line length %d"), len);
 
if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0) 
{
*pktlen = -1;
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 03/23] archive-zip.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive-zip.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 48d843489c..7ad46d8854 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -309,11 +309,11 @@ static int write_zip_entry(struct archiver_args *args,
if (is_utf8(path))
flags |= ZIP_UTF8;
else
-   warning("path is not valid UTF-8: %s", path);
+   warning(_("path is not valid UTF-8: %s"), path);
}
 
if (pathlen > 0x) {
-   return error("path too long (%d chars, SHA1: %s): %s",
+   return error(_("path too long (%d chars, SHA1: %s): %s"),
(int)pathlen, oid_to_hex(oid), path);
}
 
@@ -340,7 +340,7 @@ static int write_zip_entry(struct archiver_args *args,
size > big_file_threshold) {
stream = open_istream(oid, , , NULL);
if (!stream)
-   return error("cannot stream blob %s",
+   return error(_("cannot stream blob %s"),
 oid_to_hex(oid));
flags |= ZIP_STREAM;
out = buffer = NULL;
@@ -348,7 +348,7 @@ static int write_zip_entry(struct archiver_args *args,
buffer = object_file_to_archive(args, path, oid, mode,
, );
if (!buffer)
-   return error("cannot read %s",
+   return error(_("cannot read %s"),
 oid_to_hex(oid));
crc = crc32(crc, buffer, size);
is_binary = entry_is_binary(path_without_prefix,
@@ -357,7 +357,7 @@ static int write_zip_entry(struct archiver_args *args,
}
compressed_size = (method == 0) ? size : 0;
} else {
-   return error("unsupported file mode: 0%o (SHA1: %s)", mode,
+   return error(_("unsupported file mode: 0%o (SHA1: %s)"), mode,
oid_to_hex(oid));
}
 
@@ -466,7 +466,7 @@ static int write_zip_entry(struct archiver_args *args,
zstream.avail_in = readlen;
result = git_deflate(, 0);
if (result != Z_OK)
-   die("deflate error (%d)", result);
+   die(_("deflate error (%d)"), result);
out_len = zstream.next_out - compressed;
 
if (out_len > 0) {
@@ -601,7 +601,7 @@ static void dos_time(timestamp_t *timestamp, int *dos_date, 
int *dos_time)
struct tm *t;
 
if (date_overflows(*timestamp))
-   die("timestamp too large for this system: %"PRItime,
+   die(_("timestamp too large for this system: %"PRItime),
*timestamp);
time = (time_t)*timestamp;
t = localtime();
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 05/23] builtin/grep.c: mark strings for translation and no full stops

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9774920999..58f941e951 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -489,7 +489,7 @@ static int grep_cache(struct grep_opt *opt, struct 
repository *repo,
}
 
if (repo_read_index(repo) < 0)
-   die("index file corrupt");
+   die(_("index file corrupt"));
 
for (nr = 0; nr < repo->index->cache_nr; nr++) {
const struct cache_entry *ce = repo->index->cache[nr];
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 22/23] transport.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 transport.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/transport.c b/transport.c
index 2f6a7cb4cd..516a83b7f6 100644
--- a/transport.c
+++ b/transport.c
@@ -139,7 +139,7 @@ static struct ref *get_refs_from_bundle(struct transport 
*transport,
close(data->fd);
data->fd = read_bundle_header(transport->url, >header);
if (data->fd < 0)
-   die("could not read bundle '%s'.", transport->url);
+   die(_("could not read bundle '%s'"), transport->url);
for (i = 0; i < data->header.references.nr; i++) {
struct ref_list_entry *e = data->header.references.list + i;
struct ref *ref = alloc_ref(e->name);
@@ -654,7 +654,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
 
switch (data->version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   die(_("support for protocol v2 not implemented yet"));
break;
case protocol_v1:
case protocol_v0:
@@ -780,7 +780,7 @@ static enum protocol_allow_config 
parse_protocol_config(const char *key,
else if (!strcasecmp(value, "user"))
return PROTOCOL_ALLOW_USER_ONLY;
 
-   die("unknown value for config '%s': %s", key, value);
+   die(_("unknown value for config '%s': %s"), key, value);
 }
 
 static enum protocol_allow_config get_protocol_config(const char *type)
@@ -846,7 +846,7 @@ int is_transport_allowed(const char *type, int from_user)
 void transport_check_allowed(const char *type)
 {
if (!is_transport_allowed(type, -1))
-   die("transport '%s' not allowed", type);
+   die(_("transport '%s' not allowed"), type);
 }
 
 static struct transport_vtable bundle_vtable = {
@@ -898,7 +898,7 @@ struct transport *transport_get(struct remote *remote, 
const char *url)
if (helper) {
transport_helper_init(ret, helper);
} else if (starts_with(url, "rsync:")) {
-   die("git-over-rsync is no longer supported");
+   die(_("git-over-rsync is no longer supported"));
} else if (url_is_local_not_ssh(url) && is_file(url) && is_bundle(url, 
1)) {
struct bundle_transport_data *data = xcalloc(1, sizeof(*data));
transport_check_allowed("file");
@@ -1143,7 +1143,7 @@ int transport_push(struct transport *transport,
  transport->push_options,
  pretend)) {
oid_array_clear();
-   die("failed to push all needed submodules!");
+   die(_("failed to push all needed submodules"));
}
oid_array_clear();
}
@@ -1265,7 +1265,7 @@ int transport_connect(struct transport *transport, const 
char *name,
if (transport->vtable->connect)
return transport->vtable->connect(transport, name, exec, fd);
else
-   die("operation not supported by protocol");
+   die(_("operation not supported by protocol"));
 }
 
 int transport_disconnect(struct transport *transport)
@@ -1347,7 +1347,7 @@ static void read_alternate_refs(const char *path,
 
if (get_oid_hex(line.buf, ) ||
line.buf[GIT_SHA1_HEXSZ] != ' ') {
-   warning("invalid line while parsing alternate refs: %s",
+   warning(_("invalid line while parsing alternate refs: 
%s"),
line.buf);
break;
}
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 08/23] commit-graph.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 commit-graph.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 4c6127088f..5a300535b2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -76,28 +76,28 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
if (graph_size < GRAPH_MIN_SIZE) {
close(fd);
-   die("graph file %s is too small", graph_file);
+   die(_("graph file %s is too small"), graph_file);
}
graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
data = (const unsigned char *)graph_map;
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
-   error("graph signature %X does not match signature %X",
+   error(_("graph signature %X does not match signature %X"),
  graph_signature, GRAPH_SIGNATURE);
goto cleanup_fail;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
-   error("graph version %X does not match version %X",
+   error(_("graph version %X does not match version %X"),
  graph_version, GRAPH_VERSION);
goto cleanup_fail;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != GRAPH_OID_VERSION) {
-   error("hash version %X does not match version %X",
+   error(_("hash version %X does not match version %X"),
  hash_version, GRAPH_OID_VERSION);
goto cleanup_fail;
}
@@ -121,7 +121,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
-   error("improper chunk offset %08x%08x", 
(uint32_t)(chunk_offset >> 32),
+   error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
}
@@ -157,7 +157,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
if (chunk_repeated) {
-   error("chunk id %08x appears multiple times", chunk_id);
+   error(_("chunk id %08x appears multiple times"), 
chunk_id);
goto cleanup_fail;
}
 
@@ -243,7 +243,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
-   die("could not find commit %s", oid_to_hex());
+   die(_("could not find commit %s"), oid_to_hex());
c->graph_pos = pos;
return _list_insert(c, pptr)->next;
 }
@@ -516,7 +516,7 @@ static int add_packed_commits(const struct object_id *oid,
 
oi.typep = 
if (packed_object_info(the_repository, pack, offset, ) < 0)
-   die("unable to get type of object %s", oid_to_hex(oid));
+   die(_("unable to get type of object %s"), oid_to_hex(oid));
 
if (type != OBJ_COMMIT)
return 0;
@@ -624,9 +624,9 @@ void write_commit_graph(const char *obj_dir,
strbuf_addstr(, pack_indexes[i]);
p = add_packed_git(packname.buf, packname.len, 1);
if (!p)
-   die("error adding pack %s", packname.buf);
+   die(_("error adding pack %s"), packname.buf);
if (open_pack_index(p))
-   die("error opening index for %s", packname.buf);
+   die(_("error opening index for %s"), 
packname.buf);
for_each_object_in_pack(p, add_packed_commits, );
close_pack(p);
}
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 10/23] connect.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 connect.c | 78 +++
 t/t5570-git-daemon.sh |  6 ++--
 2 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/connect.c b/connect.c
index 083cf804a7..70b97cfef6 100644
--- a/connect.c
+++ b/connect.c
@@ -78,7 +78,7 @@ int server_supports_v2(const char *c, int die_on_error)
}
 
if (die_on_error)
-   die("server doesn't support '%s'", c);
+   die(_("server doesn't support '%s'"), c);
 
return 0;
 }
@@ -100,7 +100,7 @@ int server_supports_feature(const char *c, const char 
*feature,
}
 
if (die_on_error)
-   die("server doesn't support feature '%s'", feature);
+   die(_("server doesn't support feature '%s'"), feature);
 
return 0;
 }
@@ -111,7 +111,7 @@ static void process_capabilities_v2(struct packet_reader 
*reader)
argv_array_push(_capabilities_v2, reader->line);
 
if (reader->status != PACKET_READ_FLUSH)
-   die("expected flush after capabilities");
+   die(_("expected flush after capabilities"));
 }
 
 enum protocol_version discover_version(struct packet_reader *reader)
@@ -230,7 +230,7 @@ static int process_dummy_ref(const char *line)
 static void check_no_capabilities(const char *line, int len)
 {
if (strlen(line) != len)
-   warning("ignoring capabilities after first line '%s'",
+   warning(_("ignoring capabilities after first line '%s'"),
line + strlen(line));
 }
 
@@ -249,7 +249,7 @@ static int process_ref(const char *line, int len, struct 
ref ***list,
if (extra_have && !strcmp(name, ".have")) {
oid_array_append(extra_have, _oid);
} else if (!strcmp(name, "capabilities^{}")) {
-   die("protocol error: unexpected capabilities^{}");
+   die(_("protocol error: unexpected capabilities^{}"));
} else if (check_ref(name, flags)) {
struct ref *ref = alloc_ref(name);
oidcpy(>old_oid, _oid);
@@ -270,9 +270,9 @@ static int process_shallow(const char *line, int len,
return 0;
 
if (get_oid_hex(arg, _oid))
-   die("protocol error: expected shallow sha-1, got '%s'", arg);
+   die(_("protocol error: expected shallow sha-1, got '%s'"), arg);
if (!shallow_points)
-   die("repository on the other end cannot be shallow");
+   die(_("repository on the other end cannot be shallow"));
oid_array_append(shallow_points, _oid);
check_no_capabilities(line, len);
return 1;
@@ -307,13 +307,13 @@ struct ref **get_remote_heads(struct packet_reader 
*reader,
case PACKET_READ_NORMAL:
len = reader->pktlen;
if (len > 4 && skip_prefix(reader->line, "ERR ", ))
-   die("remote error: %s", arg);
+   die(_("remote error: %s"), arg);
break;
case PACKET_READ_FLUSH:
state = EXPECTING_DONE;
break;
case PACKET_READ_DELIM:
-   die("invalid packet");
+   die(_("invalid packet"));
}
 
switch (state) {
@@ -333,7 +333,7 @@ struct ref **get_remote_heads(struct packet_reader *reader,
case EXPECTING_SHALLOW:
if (process_shallow(reader->line, len, shallow_points))
break;
-   die("protocol error: unexpected '%s'", reader->line);
+   die(_("protocol error: unexpected '%s'"), reader->line);
case EXPECTING_DONE:
break;
}
@@ -441,11 +441,11 @@ struct ref **get_remote_refs(int fd_out, struct 
packet_reader *reader,
/* Process response from server */
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
if (!process_ref_v2(reader->line, ))
-   die("invalid ls-refs response: %s", reader->line);
+   die(_("invalid ls-refs response: %s"), reader->line);
}
 
if (reader->status != PACKET_READ_FLUSH)
-   die("expected flush after ref listing");
+   die(_("expected flush after ref listing"));
 
return list;
 }
@@ -544,7 +544,7 @@ static enum protocol get_protocol(const char *name)
return PROTO_SSH;
if (!strcmp(name, "file"))
return PROTO_FILE;
-   die("protocol '%s' is not supported", name);
+   die(_("protocol '%s' is not supported"), name);
 }
 
 static char *host_end(char **hoststart, int removebrackets)
@@ -595,7 +595,7 @@ static void enable_keepalive(int sockfd)
int ka = 1;
 
if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, , 

[PATCH v2 07/23] builtin/replace.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/replace.c | 82 +++
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/builtin/replace.c b/builtin/replace.c
index de826e8209..c77b325aa1 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -54,7 +54,7 @@ static int show_reference(const char *refname, const struct 
object_id *oid,
enum object_type obj_type, repl_type;
 
if (get_oid(refname, ))
-   return error("failed to resolve '%s' as a valid 
ref", refname);
+   return error(_("failed to resolve '%s' as a 
valid ref"), refname);
 
obj_type = oid_object_info(the_repository, ,
   NULL);
@@ -83,8 +83,8 @@ static int list_replace_refs(const char *pattern, const char 
*format)
else if (!strcmp(format, "long"))
data.format = REPLACE_FORMAT_LONG;
else
-   return error("invalid replace format '%s'\n"
-"valid formats are 'short', 'medium' and 'long'\n",
+   return error(_("invalid replace format '%s'\n"
+  "valid formats are 'short', 'medium' and 
'long'"),
 format);
 
for_each_replace_ref(the_repository, show_reference, (void *));
@@ -118,7 +118,7 @@ static int for_each_replace_name(const char **argv, 
each_replace_name_fn fn)
full_hex = ref.buf + base_len;
 
if (read_ref(ref.buf, )) {
-   error("replace ref '%s' not found", full_hex);
+   error(_("replace ref '%s' not found"), full_hex);
had_error = 1;
continue;
}
@@ -134,7 +134,7 @@ static int delete_replace_ref(const char *name, const char 
*ref,
 {
if (delete_ref(NULL, ref, oid, 0))
return 1;
-   printf_ln("Deleted replace ref '%s'", name);
+   printf_ln(_("Deleted replace ref '%s'"), name);
return 0;
 }
 
@@ -146,12 +146,12 @@ static int check_ref_valid(struct object_id *object,
strbuf_reset(ref);
strbuf_addf(ref, "%s%s", git_replace_ref_base, oid_to_hex(object));
if (check_refname_format(ref->buf, 0))
-   return error("'%s' is not a valid ref name", ref->buf);
+   return error(_("'%s' is not a valid ref name"), ref->buf);
 
if (read_ref(ref->buf, prev))
oidclr(prev);
else if (!force)
-   return error("replace ref '%s' already exists", ref->buf);
+   return error(_("replace ref '%s' already exists"), ref->buf);
return 0;
 }
 
@@ -171,10 +171,10 @@ static int replace_object_oid(const char *object_ref,
obj_type = oid_object_info(the_repository, object, NULL);
repl_type = oid_object_info(the_repository, repl, NULL);
if (!force && obj_type != repl_type)
-   return error("Objects must be of the same type.\n"
-"'%s' points to a replaced object of type '%s'\n"
-"while '%s' points to a replacement object of "
-"type '%s'.",
+   return error(_("Objects must be of the same type.\n"
+  "'%s' points to a replaced object of type '%s'\n"
+  "while '%s' points to a replacement object of "
+  "type '%s'."),
 object_ref, type_name(obj_type),
 replace_ref, type_name(repl_type));
 
@@ -200,10 +200,10 @@ static int replace_object(const char *object_ref, const 
char *replace_ref, int f
struct object_id object, repl;
 
if (get_oid(object_ref, ))
-   return error("failed to resolve '%s' as a valid ref",
+   return error(_("failed to resolve '%s' as a valid ref"),
 object_ref);
if (get_oid(replace_ref, ))
-   return error("failed to resolve '%s' as a valid ref",
+   return error(_("failed to resolve '%s' as a valid ref"),
 replace_ref);
 
return replace_object_oid(object_ref, , replace_ref, , 
force);
@@ -222,7 +222,7 @@ static int export_object(const struct object_id *oid, enum 
object_type type,
 
fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
if (fd < 0)
-   return error_errno("unable to open %s for writing", filename);
+   return error_errno(_("unable to open %s for writing"), 
filename);
 
argv_array_push(, "--no-replace-objects");
argv_array_push(, "cat-file");
@@ -235,7 +235,7 @@ static int export_object(const struct object_id *oid, enum 
object_type type,
cmd.out = fd;
 
if (run_command())
-   return error("cat-file 

[PATCH v2 18/23] refspec.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 refspec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refspec.c b/refspec.c
index 6e1efd4d60..1266e509c2 100644
--- a/refspec.c
+++ b/refspec.c
@@ -127,7 +127,7 @@ void refspec_item_init(struct refspec_item *item, const 
char *refspec, int fetch
memset(item, 0, sizeof(*item));
 
if (!parse_refspec(item, refspec, fetch))
-   die("invalid refspec '%s'", refspec);
+   die(_("invalid refspec '%s'"), refspec);
 }
 
 void refspec_item_clear(struct refspec_item *item)
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 06/23] builtin/pack-objects.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Most of these are straight forward. GETTEXT_POISON does catch the last
string in cmd_pack_objects(), but since this is --progress output, it's
not supposed to be machine-readable.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 108 +
 t/t5500-fetch-pack.sh  |   2 +-
 2 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1a6ece425d..5c697279a9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -140,7 +140,7 @@ static void *get_delta(struct object_entry *entry)
 
buf = read_object_file(>idx.oid, , );
if (!buf)
-   die("unable to read %s", oid_to_hex(>idx.oid));
+   die(_("unable to read %s"), oid_to_hex(>idx.oid));
base_buf = read_object_file((entry)->idx.oid, ,
_size);
if (!base_buf)
@@ -149,7 +149,7 @@ static void *get_delta(struct object_entry *entry)
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
if (!delta_buf || delta_size != DELTA_SIZE(entry))
-   die("delta size changed");
+   die(_("delta size changed"));
free(buf);
free(base_buf);
return delta_buf;
@@ -406,7 +406,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
datalen = revidx[1].offset - offset;
if (!pack_to_stdout && p->index_version > 1 &&
check_pack_crc(p, _curs, offset, datalen, revidx->nr)) {
-   error("bad packed object CRC for %s",
+   error(_("bad packed object CRC for %s"),
  oid_to_hex(>idx.oid));
unuse_pack(_curs);
return write_no_reuse_object(f, entry, limit, usable_delta);
@@ -417,7 +417,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
 
if (!pack_to_stdout && p->index_version == 1 &&
check_pack_inflate(p, _curs, offset, datalen, entry_size)) {
-   error("corrupt packed object for %s",
+   error(_("corrupt packed object for %s"),
  oid_to_hex(>idx.oid));
unuse_pack(_curs);
return write_no_reuse_object(f, entry, limit, usable_delta);
@@ -548,7 +548,7 @@ static enum write_one_status write_one(struct hashfile *f,
 */
recursing = (e->idx.offset == 1);
if (recursing) {
-   warning("recursive delta detected for object %s",
+   warning(_("recursive delta detected for object %s"),
oid_to_hex(>idx.oid));
return WRITE_ONE_RECURSIVE;
} else if (e->idx.offset || e->preferred_base) {
@@ -582,7 +582,7 @@ static enum write_one_status write_one(struct hashfile *f,
 
/* make sure off_t is sufficiently large not to wrap */
if (signed_add_overflows(*offset, size))
-   die("pack too large for current definition of off_t");
+   die(_("pack too large for current definition of off_t"));
*offset += size;
return WRITE_ONE_WRITTEN;
 }
@@ -748,7 +748,8 @@ static struct object_entry **compute_write_order(void)
}
 
if (wo_end != to_pack.nr_objects)
-   die("ordered %u objects, expected %"PRIu32, wo_end, 
to_pack.nr_objects);
+   die(_("ordered %u objects, expected %"PRIu32),
+   wo_end, to_pack.nr_objects);
 
return wo;
 }
@@ -760,15 +761,15 @@ static off_t write_reused_pack(struct hashfile *f)
int fd;
 
if (!is_pack_valid(reuse_packfile))
-   die("packfile is invalid: %s", reuse_packfile->pack_name);
+   die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
 
fd = git_open(reuse_packfile->pack_name);
if (fd < 0)
-   die_errno("unable to open packfile for reuse: %s",
+   die_errno(_("unable to open packfile for reuse: %s"),
  reuse_packfile->pack_name);
 
if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
-   die_errno("unable to seek in reused packfile");
+   die_errno(_("unable to seek in reused packfile"));
 
if (reuse_packfile_offset < 0)
reuse_packfile_offset = reuse_packfile->pack_size - 
the_hash_algo->rawsz;
@@ -779,7 +780,7 @@ static off_t write_reused_pack(struct hashfile *f)
int read_pack = xread(fd, buffer, sizeof(buffer));
 
if (read_pack <= 0)
-   die_errno("unable to read from reused packfile");
+   die_errno(_("unable to read from reused packfile"));
 
if (read_pack > to_write)
read_pack = to_write;
@@ -882,7 +883,7 @@ static void write_pack_file(void)
 * to preserve this property.
 */
 

[PATCH v2 13/23] environment.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 environment.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/environment.c b/environment.c
index 2a6de2330b..d129c4adc5 100644
--- a/environment.c
+++ b/environment.c
@@ -147,7 +147,7 @@ static char *expand_namespace(const char *raw_namespace)
strbuf_addf(, "refs/namespaces/%s", (*c)->buf);
strbuf_list_free(components);
if (check_refname_format(buf.buf, 0))
-   die("bad git namespace path \"%s\"", raw_namespace);
+   die(_("bad git namespace path \"%s\""), raw_namespace);
strbuf_addch(, '/');
return strbuf_detach(, NULL);
 }
@@ -329,7 +329,7 @@ char *get_graft_file(void)
 static void set_git_dir_1(const char *path)
 {
if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
-   die("could not set GIT_DIR to '%s'", path);
+   die(_("could not set GIT_DIR to '%s'"), path);
setup_git_env(path);
 }
 
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 09/23] config.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 config.c  | 76 +++
 t/t1305-config-include.sh |  2 +-
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/config.c b/config.c
index ec08db1d1b..96b6020819 100644
--- a/config.c
+++ b/config.c
@@ -116,12 +116,12 @@ static long config_buf_ftell(struct config_source *conf)
 }
 
 #define MAX_INCLUDE_DEPTH 10
-static const char include_depth_advice[] =
+static const char include_depth_advice[] = N_(
 "exceeded maximum include depth (%d) while including\n"
 "  %s\n"
 "from\n"
 "  %s\n"
-"Do you have circular includes?";
+"Do you have circular includes?");
 static int handle_path_include(const char *path, struct config_include_data 
*inc)
 {
int ret = 0;
@@ -133,7 +133,7 @@ static int handle_path_include(const char *path, struct 
config_include_data *inc
 
expanded = expand_user_path(path, 0);
if (!expanded)
-   return error("could not expand include path '%s'", path);
+   return error(_("could not expand include path '%s'"), path);
path = expanded;
 
/*
@@ -144,7 +144,7 @@ static int handle_path_include(const char *path, struct 
config_include_data *inc
char *slash;
 
if (!cf || !cf->path)
-   return error("relative config includes must come from 
files");
+   return error(_("relative config includes must come from 
files"));
 
slash = find_last_dir_sep(cf->path);
if (slash)
@@ -155,7 +155,7 @@ static int handle_path_include(const char *path, struct 
config_include_data *inc
 
if (!access_or_die(path, R_OK, 0)) {
if (++inc->depth > MAX_INCLUDE_DEPTH)
-   die(include_depth_advice, MAX_INCLUDE_DEPTH, path,
+   die(_(include_depth_advice), MAX_INCLUDE_DEPTH, path,
!cf ? "" :
cf->name ? cf->name :
"the command line");
@@ -342,13 +342,13 @@ static int git_config_parse_key_1(const char *key, char 
**store_key, int *basele
 
if (last_dot == NULL || last_dot == key) {
if (!quiet)
-   error("key does not contain a section: %s", key);
+   error(_("key does not contain a section: %s"), key);
return -CONFIG_NO_SECTION_OR_NAME;
}
 
if (!last_dot[1]) {
if (!quiet)
-   error("key does not contain variable name: %s", key);
+   error(_("key does not contain variable name: %s"), key);
return -CONFIG_NO_SECTION_OR_NAME;
}
 
@@ -372,13 +372,13 @@ static int git_config_parse_key_1(const char *key, char 
**store_key, int *basele
if (!iskeychar(c) ||
(i == baselen + 1 && !isalpha(c))) {
if (!quiet)
-   error("invalid key: %s", key);
+   error(_("invalid key: %s"), key);
goto out_free_ret_1;
}
c = tolower(c);
} else if (c == '\n') {
if (!quiet)
-   error("invalid key (newline): %s", key);
+   error(_("invalid key (newline): %s"), key);
goto out_free_ret_1;
}
if (store_key)
@@ -414,7 +414,7 @@ int git_config_parse_parameter(const char *text,
 
pair = strbuf_split_str(text, '=', 2);
if (!pair[0])
-   return error("bogus config parameter: %s", text);
+   return error(_("bogus config parameter: %s"), text);
 
if (pair[0]->len && pair[0]->buf[pair[0]->len - 1] == '=') {
strbuf_setlen(pair[0], pair[0]->len - 1);
@@ -426,7 +426,7 @@ int git_config_parse_parameter(const char *text,
strbuf_trim(pair[0]);
if (!pair[0]->len) {
strbuf_list_free(pair);
-   return error("bogus config parameter: %s", text);
+   return error(_("bogus config parameter: %s"), text);
}
 
if (git_config_parse_key(pair[0]->buf, _name, NULL)) {
@@ -461,7 +461,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
envw = xstrdup(env);
 
if (sq_dequote_to_argv(envw, , , ) < 0) {
-   ret = error("bogus format in %s", CONFIG_DATA_ENVIRONMENT);
+   ret = error(_("bogus format in %s"), CONFIG_DATA_ENVIRONMENT);
goto out;
}
 
@@ -1154,7 +1154,7 @@ static int git_default_core_config(const char *var, const 
char *value)
else {
int abbrev = git_config_int(var, value);
if (abbrev < minimum_abbrev || abbrev > 40)
-   

[PATCH v2 17/23] refs.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 refs.c| 40 +--
 t/t1400-update-ref.sh | 20 +++---
 t/t1404-update-ref-errors.sh  |  4 +--
 t/t3210-pack-refs.sh  |  2 +-
 t/t3310-notes-merge-manual-resolve.sh |  6 ++--
 t/t5505-remote.sh |  2 +-
 6 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/refs.c b/refs.c
index a033910353..becb78e441 100644
--- a/refs.c
+++ b/refs.c
@@ -188,7 +188,7 @@ int ref_resolves_to_object(const char *refname,
if (flags & REF_ISBROKEN)
return 0;
if (!has_sha1_file(oid->hash)) {
-   error("%s does not point to a valid object!", refname);
+   error(_("%s does not point to a valid object!"), refname);
return 0;
}
return 1;
@@ -567,9 +567,9 @@ int expand_ref(const char *str, int len, struct object_id 
*oid, char **ref)
if (!warn_ambiguous_refs)
break;
} else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, 
"HEAD")) {
-   warning("ignoring dangling symref %s", fullref.buf);
+   warning(_("ignoring dangling symref %s"), fullref.buf);
} else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) {
-   warning("ignoring broken ref %s", fullref.buf);
+   warning(_("ignoring broken ref %s"), fullref.buf);
}
}
strbuf_release();
@@ -673,7 +673,7 @@ static int write_pseudoref(const char *pseudoref, const 
struct object_id *oid,
fd = hold_lock_file_for_update_timeout(, filename, 0,
   get_files_ref_lock_timeout_ms());
if (fd < 0) {
-   strbuf_addf(err, "could not open '%s' for writing: %s",
+   strbuf_addf(err, _("could not open '%s' for writing: %s"),
filename, strerror(errno));
goto done;
}
@@ -683,18 +683,18 @@ static int write_pseudoref(const char *pseudoref, const 
struct object_id *oid,
 
if (read_ref(pseudoref, _old_oid)) {
if (!is_null_oid(old_oid)) {
-   strbuf_addf(err, "could not read ref '%s'",
+   strbuf_addf(err, _("could not read ref '%s'"),
pseudoref);
rollback_lock_file();
goto done;
}
} else if (is_null_oid(old_oid)) {
-   strbuf_addf(err, "ref '%s' already exists",
+   strbuf_addf(err, _("ref '%s' already exists"),
pseudoref);
rollback_lock_file();
goto done;
} else if (oidcmp(_old_oid, old_oid)) {
-   strbuf_addf(err, "unexpected object ID when writing 
'%s'",
+   strbuf_addf(err, _("unexpected object ID when writing 
'%s'"),
pseudoref);
rollback_lock_file();
goto done;
@@ -702,7 +702,7 @@ static int write_pseudoref(const char *pseudoref, const 
struct object_id *oid,
}
 
if (write_in_full(fd, buf.buf, buf.len) < 0) {
-   strbuf_addf(err, "could not write to '%s'", filename);
+   strbuf_addf(err, _("could not write to '%s'"), filename);
rollback_lock_file();
goto done;
}
@@ -734,9 +734,9 @@ static int delete_pseudoref(const char *pseudoref, const 
struct object_id *old_o
return -1;
}
if (read_ref(pseudoref, _old_oid))
-   die("could not read ref '%s'", pseudoref);
+   die(_("could not read ref '%s'"), pseudoref);
if (oidcmp(_old_oid, old_oid)) {
-   error("unexpected object ID when deleting '%s'",
+   error(_("unexpected object ID when deleting '%s'"),
  pseudoref);
rollback_lock_file();
return -1;
@@ -871,13 +871,13 @@ static int read_ref_at_ent(struct object_id *ooid, struct 
object_id *noid,
if (!is_null_oid(>ooid)) {
oidcpy(cb->oid, noid);
if (oidcmp(>ooid, noid))
-   warning("log for ref %s has gap after %s",
+   warning(_("log for ref %s has gap after %s"),
cb->refname, show_date(cb->date, 
cb->tz, DATE_MODE(RFC2822)));
}
else if (cb->date == cb->at_time)
oidcpy(cb->oid, noid);
else if 

[PATCH v2 19/23] replace-object.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 replace-object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/replace-object.c b/replace-object.c
index 801b5c1678..ddc1546b8c 100644
--- a/replace-object.c
+++ b/replace-object.c
@@ -17,7 +17,7 @@ static int register_replace_ref(const char *refname,
 
if (get_oid_hex(hash, _obj->original.oid)) {
free(repl_obj);
-   warning("bad replace ref name: %s", refname);
+   warning(_("bad replace ref name: %s"), refname);
return 0;
}
 
@@ -26,7 +26,7 @@ static int register_replace_ref(const char *refname,
 
/* Register new object */
if (oidmap_put(the_repository->objects->replace_map, repl_obj))
-   die("duplicate replace ref: %s", refname);
+   die(_("duplicate replace ref: %s"), refname);
 
return 0;
 }
@@ -69,5 +69,5 @@ const struct object_id *do_lookup_replace_object(struct 
repository *r,
return cur;
cur = _obj->replacement;
}
-   die("replace depth too high for object %s", oid_to_hex(oid));
+   die(_("replace depth too high for object %s"), oid_to_hex(oid));
 }
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 11/23] convert.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 convert.c | 38 --
 t/t0021-conversion.sh |  2 +-
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/convert.c b/convert.c
index f47e60022e..e53911d4f8 100644
--- a/convert.c
+++ b/convert.c
@@ -190,7 +190,7 @@ static enum eol output_eol(enum crlf_action crlf_action)
/* fall through */
return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
}
-   warning("illegal crlf_action %d", (int)crlf_action);
+   warning(_("illegal crlf_action %d"), (int)crlf_action);
return core_eol;
 }
 
@@ -207,7 +207,7 @@ static void check_global_conv_flags_eol(const char *path, 
enum crlf_action crlf_
else if (conv_flags & CONV_EOL_RNDTRP_WARN)
warning(_("CRLF will be replaced by LF in %s.\n"
  "The file will have its original line"
- " endings in your working directory."), path);
+ " endings in your working directory"), path);
} else if (old_stats->lonelf && !new_stats->lonelf ) {
/*
 * CRLFs would be added by checkout
@@ -217,7 +217,7 @@ static void check_global_conv_flags_eol(const char *path, 
enum crlf_action crlf_
else if (conv_flags & CONV_EOL_RNDTRP_WARN)
warning(_("LF will be replaced by CRLF in %s.\n"
  "The file will have its original line"
- " endings in your working directory."), path);
+ " endings in your working directory"), path);
}
 }
 
@@ -492,7 +492,7 @@ static int encode_to_worktree(const char *path, const char 
*src, size_t src_len,
dst = reencode_string_len(src, src_len, enc, default_encoding,
  _len);
if (!dst) {
-   error("failed to encode '%s' from %s to %s",
+   error(_("failed to encode '%s' from %s to %s"),
  path, default_encoding, enc);
return 0;
}
@@ -670,7 +670,8 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 
if (start_command(_process)) {
strbuf_release();
-   return error("cannot fork to run external filter '%s'", 
params->cmd);
+   return error(_("cannot fork to run external filter '%s'"),
+params->cmd);
}
 
sigchain_push(SIGPIPE, SIG_IGN);
@@ -689,13 +690,14 @@ static int filter_buffer_or_fd(int in, int out, void 
*data)
if (close(child_process.in))
write_err = 1;
if (write_err)
-   error("cannot feed the input to external filter '%s'", 
params->cmd);
+   error(_("cannot feed the input to external filter '%s'"),
+ params->cmd);
 
sigchain_pop(SIGPIPE);
 
status = finish_command(_process);
if (status)
-   error("external filter '%s' failed %d", params->cmd, status);
+   error(_("external filter '%s' failed %d"), params->cmd, status);
 
strbuf_release();
return (write_err || status);
@@ -730,13 +732,13 @@ static int apply_single_file_filter(const char *path, 
const char *src, size_t le
return 0;   /* error was already reported */
 
if (strbuf_read(, async.out, len) < 0) {
-   err = error("read from external filter '%s' failed", cmd);
+   err = error(_("read from external filter '%s' failed"), cmd);
}
if (close(async.out)) {
-   err = error("read from external filter '%s' failed", cmd);
+   err = error(_("read from external filter '%s' failed"), cmd);
}
if (finish_async()) {
-   err = error("external filter '%s' failed", cmd);
+   err = error(_("external filter '%s' failed"), cmd);
}
 
if (!err) {
@@ -790,7 +792,7 @@ static void handle_filter_error(const struct strbuf 
*filter_status,
 * Something went wrong with the protocol filter.
 * Force shutdown and restart if another blob requires 
filtering.
 */
-   error("external filter '%s' failed", entry->subprocess.cmd);
+   error(_("external filter '%s' failed"), entry->subprocess.cmd);
subprocess_stop(_map, >subprocess);
free(entry);
}
@@ -838,7 +840,7 @@ static int apply_multi_file_filter(const char *path, const 
char *src, size_t len
else if (wanted_capability & CAP_SMUDGE)
filter_type = "smudge";
else
-   die("unexpected filter type");
+   die(_("unexpected filter type"));
 
sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -849,7 +851,7 @@ static int apply_multi_file_filter(const char *path, const 
char *src, size_t 

[PATCH v2 00/23] Mark strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
v2 changes

- fix one _() (was "()" by accident)
- improve some error messages while i'm making changes
- restore some multi-sentence messages back

All non-_() changes are now collected in the first patch.

Interdiff

diff --git a/builtin/config.c b/builtin/config.c
index 85f043a243..3c26df6c48 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -160,7 +160,11 @@ static struct option builtin_config_options[] = {
 static void check_argc(int argc, int min, int max) {
if (argc >= min && argc <= max)
return;
-   error(_("wrong number of arguments"));
+   if (min == max)
+   error(_("wrong number of arguments, should be %d"), min);
+   else
+   error(_("wrong number of arguments, should be from %d to %d"),
+ min, max);
usage_with_options(builtin_config_usage, builtin_config_options);
 }
 
@@ -746,7 +750,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
ret = git_config_set_in_file_gently(given_config_source.file, 
argv[0], value);
if (ret == CONFIG_NOTHING_SET)
error(_("cannot overwrite multiple values with a single 
value\n"
-   "   Use a regexp, --add or --replace-all to change 
%s"), argv[0]);
+   "   Use a regexp, --add or --replace-all to change 
%s."), argv[0]);
return ret;
}
else if (actions == ACTION_SET_ALL) {
@@ -819,7 +823,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
if (ret < 0)
return ret;
if (ret == 0)
-   die(_("no such section!"));
+   die(_("no such section: %s"), argv[0]);
}
else if (actions == ACTION_REMOVE_SECTION) {
int ret;
@@ -830,7 +834,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
if (ret < 0)
return ret;
if (ret == 0)
-   die(_("no such section!"));
+   die(_("no such section: %s"), argv[0]);
}
else if (actions == ACTION_GET_COLOR) {
check_argc(argc, 1, 2);
diff --git a/builtin/replace.c b/builtin/replace.c
index c203534fd3..c77b325aa1 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -171,10 +171,10 @@ static int replace_object_oid(const char *object_ref,
obj_type = oid_object_info(the_repository, object, NULL);
repl_type = oid_object_info(the_repository, repl, NULL);
if (!force && obj_type != repl_type)
-   return error(_("objects must be of the same type.\n"
+   return error(_("Objects must be of the same type.\n"
   "'%s' points to a replaced object of type '%s'\n"
   "while '%s' points to a replacement object of "
-  "type '%s'"),
+  "type '%s'."),
 object_ref, type_name(obj_type),
 replace_ref, type_name(repl_type));
 
diff --git a/config.c b/config.c
index 80eae290e9..96b6020819 100644
--- a/config.c
+++ b/config.c
@@ -461,7 +461,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
envw = xstrdup(env);
 
if (sq_dequote_to_argv(envw, , , ) < 0) {
-   ret = error(("bogus format in %s"), CONFIG_DATA_ENVIRONMENT);
+   ret = error(_("bogus format in %s"), CONFIG_DATA_ENVIRONMENT);
goto out;
}
 
diff --git a/connect.c b/connect.c
index af8a581d0e..70b97cfef6 100644
--- a/connect.c
+++ b/connect.c
@@ -60,7 +60,7 @@ static NORETURN void die_initial_contact(int unexpected)
if (unexpected)
die(_("the remote end hung up upon initial contact"));
else
-   die(_("could not read from remote repository.\n\n"
+   die(_("Could not read from remote repository.\n\n"
  "Please make sure you have the correct access rights\n"
  "and the repository exists."));
 }
@@ -928,7 +928,7 @@ static enum protocol parse_connect_url(const char 
*url_orig, char **ret_host,
path = strchr(end, separator);
 
if (!path || !*path)
-   die(_("no path specified. See 'man git-pull' for valid url 
syntax"));
+   die(_("no path specified; see 'git help pull' for valid url 
syntax"));
 
/*
 * null-terminate hostname and point path to ~ for URL's like this:
diff --git a/dir.c b/dir.c
index 848df83321..8e7fa02a01 100644
--- a/dir.c
+++ b/dir.c
@@ -560,7 +560,7 @@ int report_path_error(const char *ps_matched,
if (found_dup)
continue;
 
-   error(_("pathspec '%s' did not match any file(s) known to 
git."),
+   error(_("pathspec '%s' did not match any file(s) known to 

[PATCH v2 01/23] Update messages in preparation for i18n

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are

- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
  messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
  not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
- the trailing \n is converted to printf_ln if possible
- errno_errno() is used instead of explicit strerror()

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 archive-zip.c |  2 +-
 builtin/config.c  | 18 +++-
 builtin/grep.c| 10 +++
 builtin/pack-objects.c| 10 +++
 builtin/replace.c | 22 +++---
 config.c  |  2 +-
 connect.c | 33 +++--
 convert.c |  6 ++--
 dir.c |  4 +--
 pkt-line.c|  2 +-
 refs.c| 12 
 refspec.c |  2 +-
 sequencer.c   |  8 +++---
 sha1-file.c   |  8 +++---
 t/t1400-update-ref.sh |  8 +++---
 t/t3005-ls-files-relative.sh  |  4 +--
 t/t5801-remote-helpers.sh |  6 ++--
 t/t7063-status-untracked-cache.sh |  2 +-
 transport-helper.c| 48 +++
 transport.c   |  8 +++---
 20 files changed, 111 insertions(+), 104 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 74f3fe9103..48d843489c 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -309,7 +309,7 @@ static int write_zip_entry(struct archiver_args *args,
if (is_utf8(path))
flags |= ZIP_UTF8;
else
-   warning("Path is not valid UTF-8: %s", path);
+   warning("path is not valid UTF-8: %s", path);
}
 
if (pathlen > 0x) {
diff --git a/builtin/config.c b/builtin/config.c
index b29d26dede..ebeb4c5638 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -110,7 +110,7 @@ static int option_parse_type(const struct option *opt, 
const char *arg,
 * --int' and '--type=bool
 * --type=int'.
 */
-   error("only one type at a time.");
+   error("only one type at a time");
usage_with_options(builtin_config_usage,
builtin_config_options);
}
@@ -160,7 +160,11 @@ static struct option builtin_config_options[] = {
 static void check_argc(int argc, int min, int max) {
if (argc >= min && argc <= max)
return;
-   error("wrong number of arguments");
+   if (min == max)
+   error("wrong number of arguments, should be %d", min);
+   else
+   error("wrong number of arguments, should be from %d to %d",
+ min, max);
usage_with_options(builtin_config_usage, builtin_config_options);
 }
 
@@ -595,7 +599,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
 
if (use_global_config + use_system_config + use_local_config +
!!given_config_source.file + !!given_config_source.blob > 1) {
-   error("only one config file at a time.");
+   error("only one config file at a time");
usage_with_options(builtin_config_usage, 
builtin_config_options);
}
 
@@ -622,7 +626,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
 * location; error out even if XDG_CONFIG_HOME
 * is set and points at a sane location.
 */
-   die("$HOME not set");
+   die("$HOME is not set");
 
if (access_or_warn(user_config, R_OK, 0) &&
xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
@@ -664,7 +668,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
}
 
if (HAS_MULTI_BITS(actions)) {
-   error("only one action at a time.");
+   error("only one action at a time");
usage_with_options(builtin_config_usage, 
builtin_config_options);
}
if (actions == 0)
@@ -819,7 +823,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
if (ret < 0)
return ret;
if (ret == 0)
-   die("No such section!");
+   die("no such section: %s", argv[0]);
}
else if (actions == ACTION_REMOVE_SECTION) {
int ret;
@@ -830,7 +834,7 @@ int cmd_config(int argc, const char 

[PATCH v2 12/23] dir.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 dir.c| 6 +++---
 t/t3005-ls-files-relative.sh | 4 ++--
 t/t7400-submodule-basic.sh   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/dir.c b/dir.c
index 7e39f38563..8e7fa02a01 100644
--- a/dir.c
+++ b/dir.c
@@ -560,7 +560,7 @@ int report_path_error(const char *ps_matched,
if (found_dup)
continue;
 
-   error("pathspec '%s' did not match any file(s) known to git",
+   error(_("pathspec '%s' did not match any file(s) known to git"),
  pathspec->items[num].original);
errors++;
}
@@ -949,7 +949,7 @@ static void add_excludes_from_file_1(struct dir_struct 
*dir, const char *fname,
dir->unmanaged_exclude_files++;
el = add_exclude_list(dir, EXC_FILE, fname);
if (add_excludes(fname, "", 0, el, NULL, oid_stat) < 0)
-   die("cannot use %s as an exclude file", fname);
+   die(_("cannot use %s as an exclude file"), fname);
 }
 
 void add_excludes_from_file(struct dir_struct *dir, const char *fname)
@@ -3027,7 +3027,7 @@ static void connect_wt_gitdir_in_nested(const char 
*sub_worktree,
return;
 
if (repo_read_index() < 0)
-   die("index file corrupt in repo %s", subrepo.gitdir);
+   die(_("index file corrupt in repo %s"), subrepo.gitdir);
 
for (i = 0; i < subrepo.index->cache_nr; i++) {
const struct cache_entry *ce = subrepo.index->cache[i];
diff --git a/t/t3005-ls-files-relative.sh b/t/t3005-ls-files-relative.sh
index 341fad54ce..209b4c7cd8 100755
--- a/t/t3005-ls-files-relative.sh
+++ b/t/t3005-ls-files-relative.sh
@@ -50,7 +50,7 @@ test_expect_success 'ls-files -c' '
ls ../x* >expect.out &&
test_must_fail git ls-files -c --error-unmatch ../[xy]* 
>actual.out 2>actual.err &&
test_cmp expect.out actual.out &&
-   test_cmp expect.err actual.err
+   test_i18ncmp expect.err actual.err
)
 '
 
@@ -65,7 +65,7 @@ test_expect_success 'ls-files -o' '
ls ../y* >expect.out &&
test_must_fail git ls-files -o --error-unmatch ../[xy]* 
>actual.out 2>actual.err &&
test_cmp expect.out actual.out &&
-   test_cmp expect.err actual.err
+   test_i18ncmp expect.err actual.err
)
 '
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 2f532529b8..c1011b2311 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -375,7 +375,7 @@ test_expect_success 'init should register submodule url in 
.git/config' '
 
 test_failure_with_unknown_submodule () {
test_must_fail git submodule $1 no-such-submodule 2>output.err &&
-   grep "^error: .*no-such-submodule" output.err
+   test_i18ngrep "^error: .*no-such-submodule" output.err
 }
 
 test_expect_success 'init should fail with unknown submodule' '
-- 
2.18.0.rc0.333.g22e6ee6cdf



[PATCH v2 04/23] builtin/config.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/config.c  | 48 +--
 t/t1308-config-set.sh |  2 +-
 2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index ebeb4c5638..3c26df6c48 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -110,7 +110,7 @@ static int option_parse_type(const struct option *opt, 
const char *arg,
 * --int' and '--type=bool
 * --type=int'.
 */
-   error("only one type at a time");
+   error(_("only one type at a time"));
usage_with_options(builtin_config_usage,
builtin_config_options);
}
@@ -161,9 +161,9 @@ static void check_argc(int argc, int min, int max) {
if (argc >= min && argc <= max)
return;
if (min == max)
-   error("wrong number of arguments, should be %d", min);
+   error(_("wrong number of arguments, should be %d"), min);
else
-   error("wrong number of arguments, should be from %d to %d",
+   error(_("wrong number of arguments, should be from %d to %d"),
  min, max);
usage_with_options(builtin_config_usage, builtin_config_options);
 }
@@ -297,7 +297,7 @@ static int get_value(const char *key_, const char *regex_)
 
key_regexp = (regex_t*)xmalloc(sizeof(regex_t));
if (regcomp(key_regexp, key, REG_EXTENDED)) {
-   error("invalid key pattern: %s", key_);
+   error(_("invalid key pattern: %s"), key_);
FREE_AND_NULL(key_regexp);
ret = CONFIG_INVALID_PATTERN;
goto free_strings;
@@ -317,7 +317,7 @@ static int get_value(const char *key_, const char *regex_)
 
regexp = (regex_t*)xmalloc(sizeof(regex_t));
if (regcomp(regexp, regex_, REG_EXTENDED)) {
-   error("invalid pattern: %s", regex_);
+   error(_("invalid pattern: %s"), regex_);
FREE_AND_NULL(regexp);
ret = CONFIG_INVALID_PATTERN;
goto free_strings;
@@ -390,7 +390,7 @@ static char *normalize_value(const char *key, const char 
*value)
if (type == TYPE_COLOR) {
char v[COLOR_MAXLEN];
if (git_config_color(v, key, value))
-   die("cannot parse color '%s'", value);
+   die(_("cannot parse color '%s'"), value);
 
/*
 * The contents of `v` now contain an ANSI escape
@@ -485,13 +485,13 @@ static int get_colorbool(const char *var, int print)
 static void check_write(void)
 {
if (!given_config_source.file && !startup_info->have_repository)
-   die("not in a git directory");
+   die(_("not in a git directory"));
 
if (given_config_source.use_stdin)
-   die("writing to stdin is not supported");
+   die(_("writing to stdin is not supported"));
 
if (given_config_source.blob)
-   die("writing config blobs is not supported");
+   die(_("writing config blobs is not supported"));
 }
 
 struct urlmatch_current_candidate_value {
@@ -599,7 +599,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
 
if (use_global_config + use_system_config + use_local_config +
!!given_config_source.file + !!given_config_source.blob > 1) {
-   error("only one config file at a time");
+   error(_("only one config file at a time"));
usage_with_options(builtin_config_usage, 
builtin_config_options);
}
 
@@ -626,7 +626,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
 * location; error out even if XDG_CONFIG_HOME
 * is set and points at a sane location.
 */
-   die("$HOME is not set");
+   die(_("$HOME is not set"));
 
if (access_or_warn(user_config, R_OK, 0) &&
xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
@@ -663,12 +663,12 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
}
 
if ((actions & (ACTION_GET_COLOR|ACTION_GET_COLORBOOL)) && type) {
-   error("--get-color and variable type are incoherent");
+   error(_("--get-color and variable type are incoherent"));
usage_with_options(builtin_config_usage, 
builtin_config_options);
}
 
if (HAS_MULTI_BITS(actions)) {
-   error("only one action at a time");
+   error(_("only one action at a time"));
usage_with_options(builtin_config_usage, 
builtin_config_options);
}
if (actions == 0)
@@ -681,19 +681,19 @@ 

[PATCH v2 14/23] exec-cmd.c: mark more strings for translation

2018-06-03 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 exec-cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec-cmd.c b/exec-cmd.c
index 02d31ee897..4f81f44310 100644
--- a/exec-cmd.c
+++ b/exec-cmd.c
@@ -358,7 +358,7 @@ int execl_git_cmd(const char *cmd, ...)
}
va_end(param);
if (MAX_ARGS <= argc)
-   return error("too many args to run %s", cmd);
+   return error(_("too many args to run %s"), cmd);
 
argv[argc] = NULL;
return execv_git_cmd(argv);
-- 
2.18.0.rc0.333.g22e6ee6cdf



Re: [PATCH v2] t/perf/run: Use proper "--get-regexp", not "--get-regex"

2018-06-03 Thread Philip Oakley

From: "Robert P. J. Day" 

On Sun, 3 Jun 2018, Thomas Gummerer wrote:


> Subject: [PATCH v2] t/perf/run: Use proper "--get-regexp", not

micronit: we prefer starting with a lowercase letter after the "area:"
prefix in commit messages.   Junio can probably fix that while
queuing, so no need to resend.


 argh, i actually know that, i just screwed up.


On 06/03, Robert P. J. Day wrote:
>
> Even though "--get-regex" appears to work with "git config", the
> clear standard is to spell out the action in full.

--get-regex works as the parse-option API allows abbreviations of the
full option to be specified as long as the abbreviation is
unambiguos.  I don't know if this is documented anywhere other than
'Documentation/technical/api-parse-options.txt' though.


it's in `git help cli`:

many commands allow a long option --option to be abbreviated only to their 
unique prefix (e.g. if there is no other option whose name begins with opt, 
you may be able to spell --opt to invoke the --option flag), but you should 
fully spell them out when writing your scripts;


It's a worthwile read, even if the man page isn't flagged up that often.



> Signed-off-by: Robert P. J. Day 
>
> ---

It took me a bit to figure out why there is a v2, and what changed
between the versions.  This space after the '---' would be a good
place to describe that to help reviewers.

For others that are curious, it seems like the word "clear" was added
in the commit message.

The change itself looks good to me.


 the actual rationale for v2 was in the subject, i originally put
just "get-regex" rather then "--get-regex"; i resubmitted for
consistency.


--
Philip 



Re: GDPR compliance best practices?

2018-06-03 Thread Philip Oakley

From: "Peter Backes" 

On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:

I'm not trying to be selfish, I'm just trying to counter your literal
reading of the law with a comment of "it'll depend".

Just like there's a law against public urination in many places, but
this is applied very differently to someone taking a piss in front of
parliament v.s. someone taking a piss in the forest on a hike, even
though the law itself usually makes no distinction about the two.


We have huge companies using git now. This is not the tool used by a
few kernel hackers anymore.


In this example once you'd delete the UUID ref you don't have the UUID
-> author mapping anymore (and b.t.w. that could be a many to one
mapping).


It is not relevant whether you have that mapping or not, it is enough
that with additional information you could obtain it. For example, say,
you have 5000 commits with the same UUID. Now your delete the mapping.
But your friend still has it on his local copy. Now your friendly
merely needs to tell you who is behind that UUID and instantly you can
associate all 5000 commits with that person again.

The GDPR is very explict about this, see recital 26. It says that
pseudonymization is not enough, you need anonymization if you want to
be free from regulation.

In addition, and in contrast to my proposal, your solution doesn't
allow verification of the author field.


I think again that this is taking too much of a literalist view. The
intent of that policy is to ensure that companies like Google can't just
close down their EU offices weasel out of compliance be saying "we're
just doing business from the US, it doesn't apply to us".

It will not be used against anyone who's taking every reasonable
precaution from doing business with EU customers.


I think you are underestimating the political intention behind the
GDPR. It has kind of an imperialist goal, to set international
standards, to enforce them against foreign companies and to pressure
other nations to establish the same standards.

If I would read the GPDR in a literal sense, I would in fact come to
the same conclusion as you: It's about companies doing substantial
business in the EU. But the GDPR is carefully constructed in such a way
that it is hard not to be affected by the GDPR in one way or another,
and the obvious way to cope with that risk is to more or less obey the
GDPR rules even if one does not have substantial business interests in
the EU.


What do you imagine that this is going to be like? That some EU citizen
is going to walk into a small business in South America one day, which
somehow is violating the GPDR, and when that business owner goes on
holiday to the EU they're going to get detained? Not even the US policy
against Cuba is anywhere remotely close to that.


Well not if he's locally interacting with that business, a situation
which I am sure is not regulated by the GDPR.

However, if a large US website accepts users from the EU and uses the
data gathered in conflict with the GDPR, perhaps selling it for use in
political campaigns, and it gets several fines for this by EU
authorities but ignores them and doesn't pay them, and the CEO one day
takes a flight to Frankfurt to continue by train to Switzerland to get
some cash from his bank account, then he will most likely not reach
Swiss territory.



--
Having been through corporate training and read up a number of the
conflicting views in the press, one of the issues is that there are two
viewpoints, one from each side of the fence.


From a corporate/organisation viewpoint, it is best if every case of holding

user information is for a legitimate purpose, which then means the company
has 'protection' from requests for removal because the data *is* held
legally/legitimately (which includes acting as evidence).

In most Git cases that legal/legitimate purpose is the copyright licence,
and/or corporate employment. That is, Jane wrote it, hence X has a legal
rights of use, and we need to have a record of that (Jane wrote it) as
evidence of that (I'm X, I can use it) right. That would mean that Jane
cannot just ask to have that record removed and expect it to be removed.


From a personal view, many folk want it to be that corporates (and open

source organisations) should hold no personal information with having
explicit permission that can then be withdrawn, with deletion to follow.
However that 'legal' clause does [generally] win.

In the git.git case (and linux.git) there is the DCO (to back up the GLP2)
as an explicit requirement/certification that puts the information into the
legal evidence category. IIUC almost all copyright ends up with a similar
evidentail trail for the meta data.


The more likely problem is if the content of the repo, rather than the meta
data, is subject to GDPR, and that could easily ruin any storage method.
Being able to mark an object as  would help here(*).

Also remember that most EU legislation is 'intent' based, 

Re: [PATCH 21/22] transport.c: mark more strings for translation

2018-06-03 Thread Duy Nguyen
On Sun, Jun 3, 2018 at 10:29 AM, Eric Sunshine  wrote:
> On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/transport.c b/transport.c
>> @@ -875,7 +875,7 @@ struct transport *transport_get(struct remote *remote, 
>> const char *url)
>> ret->progress = isatty(2);
>>
>> if (!remote)
>> -   die("No remote provided to transport_get()");
>> +   BUG("No remote provided to transport_get()");
>
> Did you want to downcase "No" or just didn't bother since it's not
> intended for end-user?

Not bother since it's BUG().
-- 
Duy


Re: [PATCH 19/22] sequencer.c: mark more strings for translation

2018-06-03 Thread Duy Nguyen
On Sun, Jun 3, 2018 at 10:16 AM, Eric Sunshine  wrote:
> On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/sequencer.c b/sequencer.c
>> @@ -2597,15 +2597,17 @@ static int error_with_patch(struct commit *commit,
>> -   fprintf(stderr, "You can amend the commit now, with\n"
>> -   "\n"
>> -   "  git commit --amend %s\n"
>> -   "\n"
>> -   "Once you are satisfied with your changes, run\n"
>> -   "\n"
>> -   "  git rebase --continue\n", 
>> gpg_sign_opt_quoted(opts));
>> +   fprintf(stderr,
>> +   _("You can amend the commit now, with\n"
>> + "\n"
>> + "  git commit --amend %s\n"
>> + "\n"
>> + "Once you are satisfied with your changes, run\n"
>> + "\n"
>> + "  git rebase --continue\n"),
>> +   gpg_sign_opt_quoted(opts));
>> } else if (exit_code)
>> -   fprintf(stderr, "Could not apply %s... %.*s\n",
>> +   fprintf_ln(stderr, _("Could not apply %s... %.*s"),
>
> Did you want to downcase "Could" for consistency with other error
> messages, or was this left as-is intentionally?

I'm not sure. Others start with a prefix (like "error:",
"warning:"). This is a bit different and makes me hesitate.
-- 
Duy


Re: [PATCH 03/22] builtin/config.c: mark more strings for translation

2018-06-03 Thread Duy Nguyen
On Sun, Jun 3, 2018 at 11:01 AM, Eric Sunshine  wrote:
> On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  
> wrote:
>> There are also some minor adjustments in the strings.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>> diff --git a/builtin/config.c b/builtin/config.c
>> @@ -746,7 +746,7 @@ int cmd_config(int argc, const char **argv, const char 
>> *prefix)
>> if (ret == CONFIG_NOTHING_SET)
>> error(_("cannot overwrite multiple values with a 
>> single value\n"
>> -   "   Use a regexp, --add or --replace-all to 
>> change %s."), argv[0]);
>> +   "   Use a regexp, --add or --replace-all to 
>> change %s"), argv[0]);
>
> Perhaps?
>
> cannot overwrite multiple values with a single value;
> use a regexp, --add or --replace-all to change %s

I'll probably leave these multi sentence errors alone. There's some
work going on about these multiple sentence messages. We can come back
after that series settles.

>> @@ -819,7 +819,7 @@ int cmd_config(int argc, const char **argv, const char 
>> *prefix)
>> if (ret == 0)
>> -   die("No such section!");
>> +   die(_("no such section!"));
>> @@ -830,7 +830,7 @@ int cmd_config(int argc, const char **argv, const char 
>> *prefix)
>> if (ret == 0)
>> -   die("No such section!");
>> +   die(_("no such section!"));
>
> In other patches, you dropped the trailing "!"; perhaps do so for
> these two also?
>
> Maybe even:
>
> die(_("no such section: %s", whatever);
>
> Though, that may be out of scope of this patch series.

Might as well do it while I'm at it. I'll do it in a separate patch
though and try to keep these i18n patches about _() only.
-- 
Duy


[PATCH] update-ref --stdin: use skip_prefix()

2018-06-03 Thread SZEDER Gábor
Use skip_prefix() instead of starts_with() and strcmp() when parsing
'git update-ref's stdin to avoid a couple of magic numbers.

Signed-off-by: SZEDER Gábor 
---
 builtin/update-ref.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 4b4714b3fd..4fa3c0a86f 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -311,11 +311,12 @@ static const char *parse_cmd_verify(struct 
ref_transaction *transaction,
 
 static const char *parse_cmd_option(struct strbuf *input, const char *next)
 {
-   if (!strncmp(next, "no-deref", 8) && next[8] == line_termination)
+   const char *rest;
+   if (skip_prefix(next, "no-deref", ) && *rest == line_termination)
update_flags |= REF_NO_DEREF;
else
die("option unknown: %s", next);
-   return next + 8;
+   return rest;
 }
 
 static void update_refs_stdin(struct ref_transaction *transaction)
@@ -332,16 +333,16 @@ static void update_refs_stdin(struct ref_transaction 
*transaction)
die("empty command in input");
else if (isspace(*next))
die("whitespace before command: %s", next);
-   else if (starts_with(next, "update "))
-   next = parse_cmd_update(transaction, , next + 7);
-   else if (starts_with(next, "create "))
-   next = parse_cmd_create(transaction, , next + 7);
-   else if (starts_with(next, "delete "))
-   next = parse_cmd_delete(transaction, , next + 7);
-   else if (starts_with(next, "verify "))
-   next = parse_cmd_verify(transaction, , next + 7);
-   else if (starts_with(next, "option "))
-   next = parse_cmd_option(, next + 7);
+   else if (skip_prefix(next, "update ", ))
+   next = parse_cmd_update(transaction, , next);
+   else if (skip_prefix(next, "create ", ))
+   next = parse_cmd_create(transaction, , next);
+   else if (skip_prefix(next, "delete ", ))
+   next = parse_cmd_delete(transaction, , next);
+   else if (skip_prefix(next, "verify ", ))
+   next = parse_cmd_verify(transaction, , next);
+   else if (skip_prefix(next, "option ", ))
+   next = parse_cmd_option(, next);
else
die("unknown command: %s", next);
 
-- 
2.18.0.rc0.207.ga6211da864



[PATCH v2] sha1-file.c: correct $GITDIR to $GIT_DIR in a comment

2018-06-03 Thread Robert P. J. Day
Fix single misspelling of $GITDIR to correct $GIT_DIR in a comment.

Signed-off-by: Robert P. J. Day 

---

  updated patch to clarify that the misspelling is only in a comment.

diff --git a/sha1-file.c b/sha1-file.c
index 555e780f4..695e5c627 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -610,7 +610,7 @@ void add_to_alternates_memory(const char *reference)
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
- * `path` may be relative and should point to $GITDIR.
+ * `path` may be relative and should point to $GIT_DIR.
  * `err` must not be null.
  */
 char *compute_alternate_path(const char *path, struct strbuf *err)

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH] sha1-file.c: Correct $GITDIR to $GIT_DIR

2018-06-03 Thread SZEDER Gábor
Could you please add "in a comment" to the end of the subject line?
So it will be immediately clear to future readers of 'git log
--oneline' that this is not a bugfix.

> Fix single misspelling of $GITDIR to correct $GIT_DIR.
> 
> Signed-off-by: Robert P. J. Day 
> 
> ---
> 
> only occurrence in entire code base that i ran across, so i figured
> might as well fix it.
> 
> diff --git a/sha1-file.c b/sha1-file.c
> index 555e780f4..695e5c627 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -610,7 +610,7 @@ void add_to_alternates_memory(const char *reference)
>  /*
>   * Compute the exact path an alternate is at and returns it. In case of
>   * error NULL is returned and the human readable error is added to `err`
> - * `path` may be relative and should point to $GITDIR.
> + * `path` may be relative and should point to $GIT_DIR.
>   * `err` must not be null.
>   */
>  char *compute_alternate_path(const char *path, struct strbuf *err)


Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 02:59:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I'm not trying to be selfish, I'm just trying to counter your literal
> reading of the law with a comment of "it'll depend".
> 
> Just like there's a law against public urination in many places, but
> this is applied very differently to someone taking a piss in front of
> parliament v.s. someone taking a piss in the forest on a hike, even
> though the law itself usually makes no distinction about the two.

We have huge companies using git now. This is not the tool used by a 
few kernel hackers anymore.

> In this example once you'd delete the UUID ref you don't have the UUID
> -> author mapping anymore (and b.t.w. that could be a many to one
> mapping).

It is not relevant whether you have that mapping or not, it is enough 
that with additional information you could obtain it. For example, say, 
you have 5000 commits with the same UUID. Now your delete the mapping. 
But your friend still has it on his local copy. Now your friendly 
merely needs to tell you who is behind that UUID and instantly you can 
associate all 5000 commits with that person again.

The GDPR is very explict about this, see recital 26. It says that 
pseudonymization is not enough, you need anonymization if you want to 
be free from regulation.

In addition, and in contrast to my proposal, your solution doesn't 
allow verification of the author field.

> I think again that this is taking too much of a literalist view. The
> intent of that policy is to ensure that companies like Google can't just
> close down their EU offices weasel out of compliance be saying "we're
> just doing business from the US, it doesn't apply to us".
> 
> It will not be used against anyone who's taking every reasonable
> precaution from doing business with EU customers.

I think you are underestimating the political intention behind the 
GDPR. It has kind of an imperialist goal, to set international 
standards, to enforce them against foreign companies and to pressure 
other nations to establish the same standards.

If I would read the GPDR in a literal sense, I would in fact come to 
the same conclusion as you: It's about companies doing substantial 
business in the EU. But the GDPR is carefully constructed in such a way 
that it is hard not to be affected by the GDPR in one way or another, 
and the obvious way to cope with that risk is to more or less obey the 
GDPR rules even if one does not have substantial business interests in 
the EU. 

> What do you imagine that this is going to be like? That some EU citizen
> is going to walk into a small business in South America one day, which
> somehow is violating the GPDR, and when that business owner goes on
> holiday to the EU they're going to get detained? Not even the US policy
> against Cuba is anywhere remotely close to that.

Well not if he's locally interacting with that business, a situation 
which I am sure is not regulated by the GDPR.

However, if a large US website accepts users from the EU and uses the 
data gathered in conflict with the GDPR, perhaps selling it for use in 
political campaigns, and it gets several fines for this by EU 
authorities but ignores them and doesn't pay them, and the CEO one day 
takes a flight to Frankfurt to continue by train to Switzerland to get 
some cash from his bank account, then he will most likely not reach 
Swiss territory.

Best wishes
Peter
-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [RFC PATCH 4/7] merge-recursive: fix assumption that head tree being merged is HEAD

2018-06-03 Thread Ramsay Jones



On 03/06/18 07:58, Elijah Newren wrote:
> `git merge-recursive` does a three-way merge between user-specified trees
> base, head, and remote.  Since the user is allowed to specify head, we can
> not necesarily assume that head == HEAD.
> 
> We modify index_has_changes() to take an extra argument specifying the
> tree to compare the index to.  If NULL, it will compare to HEAD.  We then
> use this from merge-recursive to make sure we compare to the
> user-specified head.
> 
> Signed-off-by: Elijah Newren 
> ---
> 
> I'm really unsure where the index_has_changes() declaration should go;
> I stuck it in tree.h, but is there a better spot?

Err, leave it where it is and '#include "tree.h"' ? :-D

ATB,
Ramsay Jones




Re: [PATCH v2] t/perf/run: Use proper "--get-regexp", not "--get-regex"

2018-06-03 Thread Robert P. J. Day
On Sun, 3 Jun 2018, Thomas Gummerer wrote:

> > Subject: [PATCH v2] t/perf/run: Use proper "--get-regexp", not
>
> micronit: we prefer starting with a lowercase letter after the "area:"
> prefix in commit messages.   Junio can probably fix that while
> queuing, so no need to resend.

  argh, i actually know that, i just screwed up.

> On 06/03, Robert P. J. Day wrote:
> >
> > Even though "--get-regex" appears to work with "git config", the
> > clear standard is to spell out the action in full.
>
> --get-regex works as the parse-option API allows abbreviations of the
> full option to be specified as long as the abbreviation is
> unambiguos.  I don't know if this is documented anywhere other than
> 'Documentation/technical/api-parse-options.txt' though.
>
> > Signed-off-by: Robert P. J. Day 
> >
> > ---
>
> It took me a bit to figure out why there is a v2, and what changed
> between the versions.  This space after the '---' would be a good
> place to describe that to help reviewers.
>
> For others that are curious, it seems like the word "clear" was added
> in the commit message.
>
> The change itself looks good to me.

  the actual rationale for v2 was in the subject, i originally put
just "get-regex" rather then "--get-regex"; i resubmitted for
consistency.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH v2] t/perf/run: Use proper "--get-regexp", not "--get-regex"

2018-06-03 Thread Thomas Gummerer
> Subject: [PATCH v2] t/perf/run: Use proper "--get-regexp", not

micronit: we prefer starting with a lowercase letter after the "area:"
prefix in commit messages.   Junio can probably fix that while
queuing, so no need to resend.

On 06/03, Robert P. J. Day wrote:
> 
> Even though "--get-regex" appears to work with "git config", the
> clear standard is to spell out the action in full.

--get-regex works as the parse-option API allows abbreviations of the
full option to be specified as long as the abbreviation is
unambiguos.  I don't know if this is documented anywhere other than
'Documentation/technical/api-parse-options.txt' though.

> Signed-off-by: Robert P. J. Day 
> 
> ---

It took me a bit to figure out why there is a v2, and what changed
between the versions.  This space after the '---' would be a good
place to describe that to help reviewers.

For others that are curious, it seems like the word "clear" was added
in the commit message.

The change itself looks good to me.

> this is the only occurrence i saw of this in the entire code base, so
> it seemed worth tweaking just for consistency.
> 
> diff --git a/t/perf/run b/t/perf/run
> index 9aaa733c7..fb5753ea2 100755
> --- a/t/perf/run
> +++ b/t/perf/run
> @@ -110,7 +110,7 @@ run_dirs () {
>  get_subsections () {
>   section="$1"
>   test -z "$GIT_PERF_CONFIG_FILE" && return
> - git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regex 
> "$section\..*\.[^.]+" |
> + git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regexp 
> "$section\..*\.[^.]+" |
>   sed -e "s/$section\.\(.*\)\..*/\1/" | sort | uniq
>  }
> 
> 
> -- 
> 
> 
> Robert P. J. Day Ottawa, Ontario, CANADA
>   http://crashcourse.ca/dokuwiki
> 
> Twitter:   http://twitter.com/rpjday
> LinkedIn:   http://ca.linkedin.com/in/rpjday
> 


Re: GDPR compliance best practices?

2018-06-03 Thread Ævar Arnfjörð Bjarmason


On Sun, Jun 03 2018, Peter Backes wrote:

> On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> protection". I.e. regulators / prosecutors are much likely to go after
>> some advertising company than some project using a Git repo.
>
> Well, it is indeed rather unlikely that one particular git repo project
> will be targeted, but I guess it is basically certain that at least
> some of them will be.
>
> It is the same as a lottery, it's very unlikely you win the jackpot,
> yet someone wins it every few months. We should care about the entire
> community, not be too selfish.

I'm not trying to be selfish, I'm just trying to counter your literal
reading of the law with a comment of "it'll depend".

Just like there's a law against public urination in many places, but
this is applied very differently to someone taking a piss in front of
parliament v.s. someone taking a piss in the forest on a hike, even
though the law itself usually makes no distinction about the two.

>> Since the Author is free-form this sort of thing doesn't need to be part
>> of the git data format. You can just generate a UUID like
>> "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
>> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
>> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".
>
> Well, this is merely pseudonymization, not anonymization. Note that the
> UUID, innocent as it may look, is not in any way less "personal data"
> than the author string itself. Your proposal would thus not actually
> solve the problem, only slightly transform it. Only when you truly
> anonymize (see my proposal about one way to to it), you can completely
> evade the GDPR.

In this example once you'd delete the UUID ref you don't have the UUID
-> author mapping anymore (and b.t.w. that could be a many to one
mapping).

This seems perfectly acceptable to be since the spirit of the GDPR is to
prevent easy Googling of who did what in the past, not to prevent
someone with tremendous resources from say doing a textual analysis of
all git.git commits to find out who authored what.

>> Sites that are paranoid about the GDPR could have a pre-receive hook
>> rejecting any pushes from EU customers unless their commits were in this
>> format.
>
> This won't work either. The GDPR makes each data processor directly
> responsible in relation to the data subject. So it does not matter at
> all who is pushing, it matters who is in the author field of the
> commits that were pushed. And since you don't have any information
> about whether those authors are residing within the EU or not, you have
> to assume they are and you have to obey the GDPR. Even if you are
> outside the EU and do not have any subsidiaries within the EU, the GDPR
> sill applies as long as you are processing personal data of EU citizen.
> Perhaps the authorities in your country will refuse to obey letters of
> request if the EU authorities try to enforce the GDPR on an
> international scope, but if you have a record of GDPR violation and you
> ever set foot on EU territory, you are fair game.

I think again that this is taking too much of a literalist view. The
intent of that policy is to ensure that companies like Google can't just
close down their EU offices weasel out of compliance be saying "we're
just doing business from the US, it doesn't apply to us".

It will not be used against anyone who's taking every reasonable
precaution from doing business with EU customers.

What do you imagine that this is going to be like? That some EU citizen
is going to walk into a small business in South America one day, which
somehow is violating the GPDR, and when that business owner goes on
holiday to the EU they're going to get detained? Not even the US policy
against Cuba is anywhere remotely close to that.

>> Instead I'll have a daily UUID issued from a government API
>
> Heaven forbid. ;) There is an old German proverb, warning that even
> humorous trolling might be dangerous: "Man soll den Teufel nicht an die
> Wand malen!" ;)


[PATCH] sha1-file.c: Correct $GITDIR to $GIT_DIR

2018-06-03 Thread Robert P. J. Day
Fix single misspelling of $GITDIR to correct $GIT_DIR.

Signed-off-by: Robert P. J. Day 

---

only occurrence in entire code base that i ran across, so i figured
might as well fix it.

diff --git a/sha1-file.c b/sha1-file.c
index 555e780f4..695e5c627 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -610,7 +610,7 @@ void add_to_alternates_memory(const char *reference)
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
- * `path` may be relative and should point to $GITDIR.
+ * `path` may be relative and should point to $GIT_DIR.
  * `err` must not be null.
  */
 char *compute_alternate_path(const char *path, struct strbuf *err)


-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [RFC/PATCH 3/7] rerere: add some documentation

2018-06-03 Thread Thomas Gummerer
On 05/24, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > +Conflict normalization
> > +--
> > +
> > +To try and re-do a conflict resolution, even when different merge
> > +strategies are used, 'rerere' computes a conflict ID for each
> > +conflict in the file.
> > +
> > +This is done by discarding the common ancestor version in the
> > +diff3-style, and re-ordering the two sides of the conflict, in
> > +alphabetic order.
> 
> s/discarding.*-style/normalising the conflicted section to 'merge' style/
> 
> The motivation behind the normalization should probably be given
> upfront in the first paragraph.  It is to ensure the recorded
> resolutions can be looked up from the rerere database for
> application, even when branches are merged in different order.  I am
> not sure what you meant by even when different merge stratagies are
> used; I'd drop that if I were writing the paragraph.

What I meant was when different conflict styles are used, and when the
branches are merged in different orders.  But merge strategies is
obviously not a good word for that.  Will rephrase this.

> > +Using this technique a conflict that looks as follows when for example
> > +'master' was merged into a topic branch:
> > +
> > +<<< HEAD
> > +foo
> > +===
> > +bar
> > +>>> master
> > +
> > +and the opposite way when the topic branch is merged into 'master':
> > +
> > +<<< HEAD
> > +bar
> > +===
> > +foo
> > +>>> topic
> > +
> > +can be recognized as the same conflict, and can automatically be
> > +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> > +be calculated from 'barfoo' in both cases.
> 
> You earlier talked about normalizing and reordering, but did not
> talk about "concatenate both with NUL in between and hash", so the
> explanation in the last two lines are not quite understandable by
> mere mortals, even though I know which part of the code you are
> referring to.  When you talk about hasing, you may want to make sure
> the readers understand that the branch label on <<< and >>> lines
> are ignored.
> 
> > +If there are multiple conflicts in one file, they are all appended to
> > +one another, both in the 'preimage' file as well as in the conflict
> > +ID.
> 
> In case it was not clear (and I do not think it is to those who only
> read your description and haven't thought things through
> themselves), this concatenation is why the normalization by
> reordering is helpful.  Imagine that a common ancestor had a file
> with a line with string "A" on it (I'll call such a line "line A"
> for brevity in the following) in its early part, and line X in its
> late part.  And then you fork four branches that do these things:
> 
> - AB: changes A to B
> - AC: changes A to C
> - XY: changes X to Y
> - XZ: changes X to Z
> 
> Now, forking a branch ABAC off of branch AB and then merging AC into
> it, and forking a branch ACAB off of branch AC and then merging AB
> into it, would yield the conflict in a different order.  The former
> would say "A became B or C, what now?" while the latter would say "A
> became C or B, what now?"
> 
> But the act of merging AC into ABAC and resolving the conflict to
> leave line D means that you declare: 
> 
> After examining what branches AB and AC did, I believe that
> making line A into line D is the best thing to do that is
> compatible with what AB and AC wanted to do.
> 
> So the conflict we would see when merging AB into ACAB should be
> resolved the same way---it is the resolution that is in line with
> that declaration.
> 
> Imagine that similarly you had previously forked branch XYXZ from
> XY, merged XZ into it, and resolved "X became Y or Z" into "X became
> W".
> 
> Now, if you forked a branch ABXY from AB and then merged XY, then
> ABXY would have line B in its early part and line Y in its later
> part.  Such a merge would be quite clean.  We can construct
> 4 combinations using these four branches ((AB, AC) x (XY, XZ)).
> 
> Merging ABXY and ACXZ would make "an early A became B or C, a late X
> became Y or Z" conflict, while merging ACXY and ABXZ would make "an
> early A became C or B, a late X became Y or Z".  We can see there
> are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
> 
> By sorting, we can give the conflict its canonical name, namely, "an
> early part became B or C, a late part becames X or Y", and whenever
> any of these four patterns appear, we can get to the same conflict
> and resolution that we saw earlier.  Without the sorting, we will
> have to somehow find a previous resolution from combinatorial
> explosion ;-)

Thanks for the in depth explanation!  I'll incorporate this into the
document.

> These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
> 2016-05-23), the conflict ID can safely collide, i.e. hash
> collisions that drops completely different conflicts and their
> resolutions into the same 

Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
On Sun, Jun 03, 2018 at 12:45:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
> protection". I.e. regulators / prosecutors are much likely to go after
> some advertising company than some project using a Git repo.

Well, it is indeed rather unlikely that one particular git repo project 
will be targeted, but I guess it is basically certain that at least 
some of them will be.

It is the same as a lottery, it's very unlikely you win the jackpot, 
yet someone wins it every few months. We should care about the entire 
community, not be too selfish.

> Since the Author is free-form this sort of thing doesn't need to be part
> of the git data format. You can just generate a UUID like
> "5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
> "refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".

Well, this is merely pseudonymization, not anonymization. Note that the 
UUID, innocent as it may look, is not in any way less "personal data" 
than the author string itself. Your proposal would thus not actually 
solve the problem, only slightly transform it. Only when you truly 
anonymize (see my proposal about one way to to it), you can completely 
evade the GDPR.

> Sites that are paranoid about the GDPR could have a pre-receive hook
> rejecting any pushes from EU customers unless their commits were in this
> format.

This won't work either. The GDPR makes each data processor directly 
responsible in relation to the data subject. So it does not matter at 
all who is pushing, it matters who is in the author field of the 
commits that were pushed. And since you don't have any information 
about whether those authors are residing within the EU or not, you have 
to assume they are and you have to obey the GDPR. Even if you are 
outside the EU and do not have any subsidiaries within the EU, the GDPR 
sill applies as long as you are processing personal data of EU citizen. 
Perhaps the authorities in your country will refuse to obey letters of 
request if the EU authorities try to enforce the GDPR on an 
international scope, but if you have a record of GDPR violation and you 
ever set foot on EU territory, you are fair game.

> Instead I'll have a daily UUID issued from a government API

Heaven forbid. ;) There is an old German proverb, warning that even 
humorous trolling might be dangerous: "Man soll den Teufel nicht an die 
Wand malen!" ;)

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: GDPR compliance best practices?

2018-06-03 Thread Ævar Arnfjörð Bjarmason


On Sun, Jun 03 2018, Peter Backes wrote:

> Unfortunatly this important topic of GDPR compliance has not seen much
> interest.

I don't think you can infer that there's not much interest, but maybe
people just don't have anything to say about it.

There's a lot of discussions about this that I've seen, but what they
all have in common is that nobody really knows. Just like nobody really
knew what the "cookie law" would be like.

So I think all of us are just waiting to see.

I took the bite and tried to paraphrase some stuff I've read about it,
but as you pointed out in 20180417232504.ga4...@helen.plasma.xg8.de I
incorrectly surmised some stuff, although I very much suspect that *in
practice* the GDPR is going to be more about "consumer
protection". I.e. regulators / prosecutors are much likely to go after
some advertising company than some project using a Git repo.

Just like nobody's going after some local computer club's internal-only
website because it sets cookies without asking, but they might go after
Facebook for doing the same.

> [...]
> In course of this, anonymization could also be added. My idea would be
> as follows:
>
> Do not hash anything directly to obtain the commit ID. Instead, hash a
> list of hashes of [$random_number, $information] pairs. $information
> could be an author id, a commit date, a comment, or anything else. Then
> store the commit id, the list of hashes, and the list of pairs to form
> the commit.
>
> If someone requests erasure, simply empty the corresponding pair in the
> list. All that would be left would be the hash of the pair, which is
> completely anonymous (not more useful than a random number) and thus
> not covered by the GDPR. The history could still be completely
> verified, and when displaying the log, the erased entry could be
> displayed as "<>".
>
> What do you think about this?

Since the Author is free-form this sort of thing doesn't need to be part
of the git data format. You can just generate a UUID like
"5c679eda-b4e5-4f35-b691-8e13862d4f79" and then set user.name to
"refval:5c679eda-b4e5-4f35-b691-8e13862d4f79" and user.email to
"refval:5c679eda-b4e5-4f35-b691-8e13862d4f79".

Then you'd create a ref on the server like
refs/refval/5c679eda-b4e5-4f35-b691-8e13862d4f79 containing the real
"$user <$email>". If you then wanted to erase that field you'd just
delete the ref, and it would be much easier to teach stuff that renders
the likes of git-log to lookup these refs than changing the data format.

Sites that are paranoid about the GDPR could have a pre-receive hook
rejecting any pushes from EU customers unless their commits were in this
format.

Perhaps some variation of this is where the GDPR v2 will go. It'll be an
"obligation to be forgotten", and I won't be allowed to use my own name
anymore. Instead I'll have a daily UUID issued from a government API to
use on various forms, and the only way for anyone to resolve that will
be going through a webservice that'll reject UUID lookups older than N
months, caching those requests will be met with the death penalty. We'll
all be free at last.

Okey, that last paragraph is just trolling, but I think that refval: ->
ref convention is something worth considering if things *really* go in
this direction.


Re: how exactly can git config section names contain periods?

2018-06-03 Thread Johannes Sixt

Am 03.06.2018 um 11:53 schrieb Robert P. J. Day:

if i wanted to do something this admittedly awkward, would using
periods give me some benefit related to, i don't know, regex matching,
as compared to using a different character? or am i just way
overthinking this? is anyone out there actually taking advantage of
multi-level subsections?


There is nothing wrong with having all of these

git config foo.a.b.c.key value
git config foo.a.b.c value
git config foo.a.b.key value

in a single configuration file, I think. There is nothing special there. 
They're three different settings in two different (sub-)sections.


-- Hannes


Re: how exactly can git config section names contain periods?

2018-06-03 Thread Robert P. J. Day
On Sun, 3 Jun 2018, SZEDER Gábor wrote:

>
>
> > On Fri, 1 Jun 2018, Jeff King wrote:
> >
> > > On Fri, Jun 01, 2018 at 04:14:12PM -0400, Robert P. J. Day wrote:
> > > >   ok, so how on earth would i use "git config" at the command line
> > > > to set a config variable with some arbitrary level of subsections?
> > > > let's try this:
> > >
> > > You don't. There are only three levels: section, (optional)
> > > subsection, and key. If there is a subsection, it consists of
> > > _everything_ between the two outer periods.
>
> 
>
> >   if (for some weird reason) i wanted to define a multi-level
> > subsection,
>
> You can't, there are no multi-level subsections, see above.

  no, i *get* that, what i was asking was if i wanted to simulate or
emulate such a thing ... or is that just getting too weird and there
is no compelling reason to want to go down that road? (which i am
totally prepared to accept.)

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday


Re: how exactly can git config section names contain periods?

2018-06-03 Thread SZEDER Gábor



> On Fri, 1 Jun 2018, Jeff King wrote:
> 
> > On Fri, Jun 01, 2018 at 04:14:12PM -0400, Robert P. J. Day wrote:
> > >   ok, so how on earth would i use "git config" at the command line
> > > to set a config variable with some arbitrary level of subsections?
> > > let's try this:
> >
> > You don't. There are only three levels: section, (optional)
> > subsection, and key. If there is a subsection, it consists of
> > _everything_ between the two outer periods.



>   if (for some weird reason) i wanted to define a multi-level
> subsection,

You can't, there are no multi-level subsections, see above.




[PATCH v2] t/perf/run: Use proper "--get-regexp", not "--get-regex"

2018-06-03 Thread Robert P. J. Day


Even though "--get-regex" appears to work with "git config", the
clear standard is to spell out the action in full.

Signed-off-by: Robert P. J. Day 

---

this is the only occurrence i saw of this in the entire code base, so
it seemed worth tweaking just for consistency.

diff --git a/t/perf/run b/t/perf/run
index 9aaa733c7..fb5753ea2 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -110,7 +110,7 @@ run_dirs () {
 get_subsections () {
section="$1"
test -z "$GIT_PERF_CONFIG_FILE" && return
-   git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regex 
"$section\..*\.[^.]+" |
+   git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regexp 
"$section\..*\.[^.]+" |
sed -e "s/$section\.\(.*\)\..*/\1/" | sort | uniq
 }


-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



[PATCH] t/perf/run: Use proper "--get-regexp", not "get-regex"

2018-06-03 Thread Robert P. J. Day
Even though "--get-regex" appears to work with "git config", the
standard is to spell out the action in full.

Signed-off-by: Robert P. J. Day 

---

this is the only occurrence i saw of this in the entire code base, so
it seemed worth tweaking just for consistency.

diff --git a/t/perf/run b/t/perf/run
index 9aaa733c7..fb5753ea2 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -110,7 +110,7 @@ run_dirs () {
 get_subsections () {
section="$1"
test -z "$GIT_PERF_CONFIG_FILE" && return
-   git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regex 
"$section\..*\.[^.]+" |
+   git config -f "$GIT_PERF_CONFIG_FILE" --name-only --get-regexp 
"$section\..*\.[^.]+" |
sed -e "s/$section\.\(.*\)\..*/\1/" | sort | uniq
 }


-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: how exactly can git config section names contain periods?

2018-06-03 Thread Robert P. J. Day
On Fri, 1 Jun 2018, Jeff King wrote:

> On Fri, Jun 01, 2018 at 04:14:12PM -0400, Robert P. J. Day wrote:

... snip ...

> >   ok, so how on earth would i use "git config" at the command line
> > to set a config variable with some arbitrary level of subsections?
> > let's try this:
>
> You don't. There are only three levels: section, (optional)
> subsection, and key. If there is a subsection, it consists of
> _everything_ between the two outer periods.
>
> >   $ git config --global a.b.c.d.e rday
> >
> > huh ... seemed to work fine, and added this to my ~/.gitconfig:
> >
> >   [a "b.c.d"]
> >   e = rday
> >
> > as i see it, the first component is intgerpreted as the section name,
> > the last component is the variable/key(?) name, and everything in
> > between is treated as subsection(s), which is not at all obvious from
> > that Doc file, or from "man git-config".
>
> Yep, your understanding is correct.
>
> >   and if a section name can contain periods, how would you specify
> > that at the command line?
>
> You can't, because section names cannot contain periods. ;)

  if (for some weird reason) i wanted to define a multi-level
subsection, is there any benefit to using periods as i did above, as
opposed to any other delimiting character? apparently, running this:

  $ git config --global a.b_c_d.e rday

dumps this into my ~/.gitconfig:

  [a "b_c_d"]
e = rday

if i wanted to do something this admittedly awkward, would using
periods give me some benefit related to, i don't know, regex matching,
as compared to using a different character? or am i just way
overthinking this? is anyone out there actually taking advantage of
multi-level subsections?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: GDPR compliance best practices?

2018-06-03 Thread Peter Backes
Hi,

Unfortunatly this important topic of GDPR compliance has not seen much 
interest.

After asking github about how they would cope with the issue of erasing 
the author field, they changed their privacy policy, which now 
clarifies that this won't be done.

My guess is that this would ultimately rely on "overriding legitimate 
grounds for the processing" (Art. 17 (1) point (a) GDPR) which is one 
of the most fragile legitimizations avaiblable in the GDPR.

The GDPR emphasizes the importance of using state of the art 
technology, including anonymization, in as much as possible to ensure 
privacy.

At 
https://public-inbox.org/git/CA+dhYEViN4-boZLN+5QJyE7RtX+q6a92p0C2O6TA53==bzf...@mail.gmail.com/T/
 
there is already some discussion about transitioning to a different 
hashing algorithm to get more in line with state of the art in hashing. 
(My clear favourite would be SHA-3.)

In course of this, anonymization could also be added. My idea would be 
as follows:

Do not hash anything directly to obtain the commit ID. Instead, hash a 
list of hashes of [$random_number, $information] pairs. $information 
could be an author id, a commit date, a comment, or anything else. Then 
store the commit id, the list of hashes, and the list of pairs to form 
the commit.

If someone requests erasure, simply empty the corresponding pair in the 
list. All that would be left would be the hash of the pair, which is 
completely anonymous (not more useful than a random number) and thus 
not covered by the GDPR. The history could still be completely 
verified, and when displaying the log, the erased entry could be 
displayed as "<>".

What do you think about this?

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [PATCH 03/22] builtin/config.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> There are also some minor adjustments in the strings.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/builtin/config.c b/builtin/config.c
> @@ -746,7 +746,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
> if (ret == CONFIG_NOTHING_SET)
> error(_("cannot overwrite multiple values with a 
> single value\n"
> -   "   Use a regexp, --add or --replace-all to 
> change %s."), argv[0]);
> +   "   Use a regexp, --add or --replace-all to 
> change %s"), argv[0]);

Perhaps?

cannot overwrite multiple values with a single value;
use a regexp, --add or --replace-all to change %s

> @@ -819,7 +819,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
> if (ret == 0)
> -   die("No such section!");
> +   die(_("no such section!"));
> @@ -830,7 +830,7 @@ int cmd_config(int argc, const char **argv, const char 
> *prefix)
> if (ret == 0)
> -   die("No such section!");
> +   die(_("no such section!"));

In other patches, you dropped the trailing "!"; perhaps do so for
these two also?

Maybe even:

die(_("no such section: %s", whatever);

Though, that may be out of scope of this patch series.


Re: [PATCH 09/22] connect.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> There are also some rephrasing and breaking sentences to help
> translators.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/connect.c b/connect.c
> @@ -921,7 +928,7 @@ static enum protocol parse_connect_url(const char 
> *url_orig, char **ret_host,
> if (!path || !*path)
> -   die("No path specified. See 'man git-pull' for valid url 
> syntax");
> +   die(_("no path specified. See 'man git-pull' for valid url 
> syntax"));

Perhaps:

no path specified; see 'man git-pull' for valid url syntax

?


Re: [PATCH 08/22] config.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/config.c b/config.c
> @@ -461,7 +461,7 @@ int git_config_from_parameters(config_fn_t fn, void *data)
> envw = xstrdup(env);
>
> if (sq_dequote_to_argv(envw, , , ) < 0) {
> -   ret = error("bogus format in " CONFIG_DATA_ENVIRONMENT);
> +   ret = error(("bogus format in %s"), CONFIG_DATA_ENVIRONMENT);

s/((/(_(/


Re: [PATCH 11/22] dir.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/dir.c b/dir.c
> @@ -560,7 +560,7 @@ int report_path_error(const char *ps_matched,
> -   error("pathspec '%s' did not match any file(s) known to git.",
> +   error(_("pathspec '%s' did not match any file(s) known to 
> git."),

Drop the trailing period for consistency with other messages you've changed.


Re: [PATCH 10/22] convert.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/convert.c b/convert.c
> @@ -203,11 +203,11 @@ static void check_global_conv_flags_eol(const char 
> *path, enum crlf_action crlf_
> if (conv_flags & CONV_EOL_RNDTRP_DIE)
> -   die(_("CRLF would be replaced by LF in %s."), path);
> +   die(_("CRLF would be replaced by LF in %s"), path);
> else if (conv_flags & CONV_EOL_RNDTRP_WARN)
> warning(_("CRLF will be replaced by LF in %s.\n"
>   "The file will have its original line"
> - " endings in your working directory."), 
> path);
> + " endings in your working directory"), 
> path);
> @@ -217,7 +217,7 @@ static void check_global_conv_flags_eol(const char *path, 
> enum crlf_action crlf_
> else if (conv_flags & CONV_EOL_RNDTRP_WARN)
> warning(_("LF will be replaced by CRLF in %s.\n"
>   "The file will have its original line"
> - " endings in your working directory."), 
> path);
> + " endings in your working directory"), 
> path);
> }
>  }

For these two, perhaps:

 ... replace by  in %s;
the file will have its original line
endings in your working directory

> @@ -492,8 +492,8 @@ static int encode_to_worktree(const char *path, const 
> char *src, size_t src_len,
> dst = reencode_string_len(src, src_len, enc, default_encoding,
>   _len);
> if (!dst) {
> -   error("failed to encode '%s' from %s to %s",
> -   path, default_encoding, enc);
> +   error(_("failed to encode '%s' from %s to %s"),
> + path, default_encoding, enc);

The whitespace change on the second line fixes alignment with the
opening '('. Okay.


Re: [PATCH 06/22] builtin/replace.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/builtin/replace.c b/builtin/replace.c
> @@ -456,10 +456,10 @@ static int create_graft(int argc, const char **argv, 
> int force, int gentle)
> -   if (remove_signature()) {
> -   warning(_("the original commit '%s' has a gpg signature."), 
> old_ref);
> -   warning(_("the signature will be removed in the replacement 
> commit!"));
> -   }
> +   if (remove_signature())
> +   warning(_("the original commit '%s' has a gpg signature.\n"
> + "The signature will be removed in the replacement 
> commit!"),
> +   old_ref);

It's kind of weird to drop capitalization of the first sentence but
not the second. Also, you dropped trailing "!" in some other patches;
do you want to do so here? Perhaps, instead:

the original commit '%s' has a gpg signature;
the signature will be removed in the replacement commit


Re: [PATCH 22/22] transport-helper.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
> @@ -196,13 +196,13 @@ static struct child_process *get_helper(struct 
> transport *transport)
> -   die("Unknown mandatory capability %s. This remote "
> -   "helper probably needs newer version of Git.",
> +   die(_("Unknown mandatory capability %s. This remote "
> + "helper probably needs newer version of Git."),
> capname);

Did you want to downcase these and drop period for consistency with
others? Perhaps:

unknown mandatory capability '%s'; this remote
helper probably needs newer version of Git


Re: [PATCH 21/22] transport.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/transport.c b/transport.c
> @@ -875,7 +875,7 @@ struct transport *transport_get(struct remote *remote, 
> const char *url)
> ret->progress = isatty(2);
>
> if (!remote)
> -   die("No remote provided to transport_get()");
> +   BUG("No remote provided to transport_get()");

Did you want to downcase "No" or just didn't bother since it's not
intended for end-user?

It looks like most callers of transport_get() won't pass NULL for
'remote' so this makes sense as a BUG(). A couple callers,
archive.c:run_remote_archiver() and fetch-object.c:fetch_refs(), get
the remote via remote_get() but don't check for NULL before
dereferencing it (before even calling this function), so will crash if
remote_get() ever returns NULL (assuming that it ever could in those
cases).


Re: [PATCH 19/22] sequencer.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/sequencer.c b/sequencer.c
> @@ -2597,15 +2597,17 @@ static int error_with_patch(struct commit *commit,
> -   fprintf(stderr, "You can amend the commit now, with\n"
> -   "\n"
> -   "  git commit --amend %s\n"
> -   "\n"
> -   "Once you are satisfied with your changes, run\n"
> -   "\n"
> -   "  git rebase --continue\n", 
> gpg_sign_opt_quoted(opts));
> +   fprintf(stderr,
> +   _("You can amend the commit now, with\n"
> + "\n"
> + "  git commit --amend %s\n"
> + "\n"
> + "Once you are satisfied with your changes, run\n"
> + "\n"
> + "  git rebase --continue\n"),
> +   gpg_sign_opt_quoted(opts));
> } else if (exit_code)
> -   fprintf(stderr, "Could not apply %s... %.*s\n",
> +   fprintf_ln(stderr, _("Could not apply %s... %.*s"),

Did you want to downcase "Could" for consistency with other error
messages, or was this left as-is intentionally?


Re: [PATCH 20/22] sha1-file.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/sha1-file.c b/sha1-file.c
> @@ -71,17 +71,17 @@ static void git_hash_sha1_final(unsigned char *hash, 
> git_hash_ctx *ctx)
>  static void git_hash_unknown_init(git_hash_ctx *ctx)
>  {
> -   die("trying to init unknown hash");
> +   die(_("trying to init unknown hash"));
>  }
>
>  static void git_hash_unknown_update(git_hash_ctx *ctx, const void *data, 
> size_t len)
>  {
> -   die("trying to update unknown hash");
> +   die(_("trying to update unknown hash"));
>  }
>
>  static void git_hash_unknown_final(unsigned char *hash, git_hash_ctx *ctx)
>  {
> -   die("trying to finalize unknown hash");
> +   die(_("trying to finalize unknown hash"));
>  }

The above three are indicative of programmer error, aren't they? If
so, they ought not be translated (and perhaps changed to BUG at some
point).

>  const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
> @@ -378,8 +378,8 @@ static int alt_odb_usable(struct raw_object_store *o,
>
> /* Detect cases where alternate disappeared */
> if (!is_directory(path->buf)) {
> -   error("object directory %s does not exist; "
> - "check .git/objects/info/alternates.",
> +   error(_("object directory %s does not exist; "
> +   "check .git/objects/info/alternates."),

Perhaps drop the trailing period as you did in other cases.

>   path->buf);
> return 0;
> }


Re: [PATCH 16/22] refs.c: mark more strings for translation

2018-06-03 Thread Eric Sunshine
On Sat, Jun 2, 2018 at 12:32 AM, Nguyễn Thái Ngọc Duy  wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
> diff --git a/refs.c b/refs.c
> @@ -1845,7 +1845,7 @@ int ref_update_reject_duplicates(struct string_list 
> *refnames,
> if (!cmp) {
> strbuf_addf(err,
> -   "multiple updates for ref '%s' not 
> allowed.",
> +   _("multiple updates for ref '%s' not 
> allowed."),

In other messages in this patch, you dropped the period at the end of
the message. Perhaps do so here too.

> refnames->items[i].string);
> return 1;
> } else if (cmp > 0) {
> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> @@ -390,7 +390,7 @@ test_expect_success 'Query "master@{2005-05-26 23:33:01}" 
> (middle of history wit
> test_when_finished "rm -f o e" &&
> git rev-parse --verify "master@{2005-05-26 23:33:01}" >o 2>e &&
> test $B = $(cat o) &&
> -   test "warning: Log for ref $m has gap after $gd." = "$(cat e)"
> +   test_i18ngrep -F "warning: log for ref $m has gap after $gd" e
>  '

The change from '$(cat e)' to bare 'e' caught me off guard for a
moment, but use of test_i18ngrep explains it. Okay.

> diff --git a/t/t3310-notes-merge-manual-resolve.sh 
> b/t/t3310-notes-merge-manual-resolve.sh
> @@ -541,9 +541,9 @@ EOF
> -   grep -q "refs/notes/m" output &&
> -   grep -q "$(git rev-parse refs/notes/m)" output &&
> -   grep -q "$(git rev-parse NOTES_MERGE_PARTIAL^1)" output &&
> +   test_i18ngrep -q "refs/notes/m" output &&
> +   test_i18ngrep -q "$(git rev-parse refs/notes/m)" output &&
> +   test_i18ngrep -q "$(git rev-parse NOTES_MERGE_PARTIAL^1)" output &&

I hadn't seen test_i18ngrep take argument -q elsewhere, so I wondered
if it handled it correctly. Checking the implementation, I see that it
does pass it along to the underlying grep. Okay.

I also wondered briefly if it made sense to drop -q here since it
doesn't necessarily help debugging the test upon failure, but I
suppose such a change is outside the scope of this patch series, so
probably better not to.


  1   2   >