Re: Bug report: Corrupt pack file after committing a large file (>4 GB?)

2017-05-25 Thread Torsten Bögershausen
On 2017-05-26 07:51, Yu-Hsuan Chen wrote:
> Dear maintainer,
> 
> There is a bug where committing a large file corrupts the pack file in
> Windows. Steps to recreate are:
> 
> 1. git init
> 2. stage and commit a file larger than 4 GB (not entirely sure about this 
> size)
> 3. git checkout -f
> 
> The file checked out is much smaller than the original file size.
> 
> This behavior is surprising. If git does not support large files, I
> would at least expect an error message when staging or committing. I
> have post a question on StackOverflow regrading this issue, and has
> been confirmed by another user. (question id: 44022897)
> 
> Best regards,
> 
> David Chen
> 
Issues for Git for Windows should, in general, be reported here:
https://github.com/git-for-windows/git/

After 2 seconds of searching, we can find that the 4Gb problem
has already been reported:
https://github.com/git-for-windows/git/issues/1063

And, to my knowledge, it has not been fixed, since it is a
lot of effort to replace the "long" (or unsigned long) in the
Git code base with a better data type.

In other words, thanks for reminding us, more help is needed.





Re: Bug report: Corrupt pack file after committing a large file (>4 GB?)

2017-05-25 Thread Konstantin Khomoutov
On Fri, May 26, 2017 at 01:51:34PM +0800, Yu-Hsuan Chen wrote:

> There is a bug where committing a large file corrupts the pack file in
> Windows. Steps to recreate are:
> 
> 1. git init
> 2. stage and commit a file larger than 4 GB (not entirely sure about this 
> size)
> 3. git checkout -f
> 
> The file checked out is much smaller than the original file size.

For a bug report to be at least marginally useful, please state your Git
version (run "git --version") and the version of your Windows
installation (including whether it's 32- or 64-bit install).

Please also make sure you're really using Git for Windows (that is, you
got it from [1] or [2] and not, say, from Cygwin.

The best course of actions is to download the most recent available
version from the locations mentioned above and verify the problem still
manifests itself.

1. https://git-scm.com/download/win
2. https://git-for-windows.github.io/



Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-25 Thread Junio C Hamano
Tyler Brazier  writes:

> On Thu, May 25, 2017 at 6:33 PM, Junio C Hamano  wrote:
>> Jeff King  writes:
>>
>>> Anyway. All this has shown me is that it's probably pointless to do this
>>> timing at all on Linux. Somebody on Windows might get better results.
>>>
>>> But regardless, we need to do something. Correctness must trump
>>> optimizations, and the question is whether we can throw out the whole
>>> conditional, or if we should just restrict when it kicks in.
>>
>> Yes.  I personally do not mind going with the simplest approach.
>> The optimization thing is relatively new and we were perfectly happy
>> without it before ;-).

[administrivia: please do not top-post]

> Does git accept outside pull requests? I wouldn't mind committing the
> fix for this once it's been decided what the fix should be. (It might
> help my resume ;)

Please see Documentation/SubmittingPatches.

Thanks.


Bug report: Corrupt pack file after committing a large file (>4 GB?)

2017-05-25 Thread Yu-Hsuan Chen
Dear maintainer,

There is a bug where committing a large file corrupts the pack file in
Windows. Steps to recreate are:

1. git init
2. stage and commit a file larger than 4 GB (not entirely sure about this size)
3. git checkout -f

The file checked out is much smaller than the original file size.

This behavior is surprising. If git does not support large files, I
would at least expect an error message when staging or committing. I
have post a question on StackOverflow regrading this issue, and has
been confirmed by another user. (question id: 44022897)

Best regards,

David Chen


Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-25 Thread Tyler Brazier
Does git accept outside pull requests? I wouldn't mind committing the
fix for this once it's been decided what the fix should be. (It might
help my resume ;)

On Thu, May 25, 2017 at 6:33 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> Anyway. All this has shown me is that it's probably pointless to do this
>> timing at all on Linux. Somebody on Windows might get better results.
>>
>> But regardless, we need to do something. Correctness must trump
>> optimizations, and the question is whether we can throw out the whole
>> conditional, or if we should just restrict when it kicks in.
>
> Yes.  I personally do not mind going with the simplest approach.
> The optimization thing is relatively new and we were perfectly happy
> without it before ;-).
>


Re: [PATCH] docs/config.txt: fix indefinite article in core.fileMode description

2017-05-25 Thread Junio C Hamano
Obviously correct.  Thanks.


[PATCH v3 11/13] log: fix memory leak in open_next_file()

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

Noticed-by: Jeff King 
Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/log.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 26d6a3cf14..f075838df9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -842,8 +842,10 @@ static int open_next_file(struct commit *commit, const 
char *subject,
if (output_directory) {
strbuf_addstr(&filename, output_directory);
if (filename.len >=
-   PATH_MAX - FORMAT_PATCH_NAME_MAX - suffix_len)
+   PATH_MAX - FORMAT_PATCH_NAME_MAX - suffix_len) {
+   strbuf_release(&filename);
return error(_("name of output directory is too long"));
+   }
strbuf_complete(&filename, '/');
}
 
@@ -857,9 +859,11 @@ static int open_next_file(struct commit *commit, const 
char *subject,
if (!quiet)
printf("%s\n", filename.buf + outdir_offset);
 
-   if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL)
-   return error_errno(_("Cannot open patch file %s"),
-  filename.buf);
+   if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL) {
+   error_errno(_("Cannot open patch file %s"), filename.buf);
+   strbuf_release(&filename);
+   return -1;
+   }
 
strbuf_release(&filename);
return 0;
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 10/13] rerere.c: move error_errno() closer to the source system call

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

We are supposed to report errno from fopen(). fclose() between fopen()
and the report function could either change errno or reset it.

Signed-off-by: Junio C Hamano 
---
 rerere.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/rerere.c b/rerere.c
index 1351b0c3fb..c26c29f87a 100644
--- a/rerere.c
+++ b/rerere.c
@@ -489,8 +489,9 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
if (output) {
io.io.output = fopen(output, "w");
if (!io.io.output) {
+   error_errno("Could not write %s", output);
fclose(io.input);
-   return error_errno("Could not write %s", output);
+   return -1;
}
}
 
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 08/13] wrapper.c: make warn_on_inaccessible() static

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

After the last patch, this function is not used outside anymore. Keep it
static.

Noticed-by: Ramsay Jones 
Signed-off-by: Junio C Hamano 
---
 git-compat-util.h |  2 --
 wrapper.c | 10 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index f74b401810..87f47828a5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1100,8 +1100,6 @@ int remove_or_warn(unsigned int mode, const char *path);
 int access_or_warn(const char *path, int mode, unsigned flag);
 int access_or_die(const char *path, int mode, unsigned flag);
 
-/* Warn on an inaccessible file that ought to be accessible */
-void warn_on_inaccessible(const char *path);
 /* Warn on an inaccessible file if errno indicates this is an error */
 int warn_on_fopen_errors(const char *path);
 
diff --git a/wrapper.c b/wrapper.c
index 6e513c904a..b117eb14a4 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -418,6 +418,11 @@ FILE *fopen_for_writing(const char *path)
return ret;
 }
 
+static void warn_on_inaccessible(const char *path)
+{
+   warning_errno(_("unable to access '%s'"), path);
+}
+
 int warn_on_fopen_errors(const char *path)
 {
if (errno != ENOENT && errno != ENOTDIR) {
@@ -597,11 +602,6 @@ int remove_or_warn(unsigned int mode, const char *file)
return S_ISGITLINK(mode) ? rmdir_or_warn(file) : unlink_or_warn(file);
 }
 
-void warn_on_inaccessible(const char *path)
-{
-   warning_errno(_("unable to access '%s'"), path);
-}
-
 static int access_error_is_ok(int err, unsigned flag)
 {
return err == ENOENT || err == ENOTDIR ||
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 12/13] wrapper: factor out is_missing_file_error()

2017-05-25 Thread Junio C Hamano
Our code often opens a path to an optional file, to work on its
contents when we can successfully open it.  We can ignore a failure
to open if such an optional file does not exist, but we do want to
report a failure in opening for other reasons (e.g. we got an I/O
error, or the file is there, but we lack the permission to open).

There is a logic to determine if an errno left by open/fopen
indicates such an ignorable error.  Split it out into a helper
function.  We may want to make it an extern in later steps, as many
hits from "git grep ENOENT" would instead want to be using the same
logic.

Signed-off-by: Junio C Hamano 
---
 wrapper.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index b117eb14a4..f1c87ec7ea 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -423,9 +423,23 @@ static void warn_on_inaccessible(const char *path)
warning_errno(_("unable to access '%s'"), path);
 }
 
+/*
+ * Our code often opens a path to an optional file, to work on its
+ * contents when we can successfully open it.  We can ignore a failure
+ * to open if such an optional file does not exist, but we do want to
+ * report a failure in opening for other reasons (e.g. we got an I/O
+ * error, or the file is there, but we lack the permission to open).
+ *
+ * Call this function after seeing an error from open() or fopen() to
+ * see if the errno indicates a missing file that we can safely ignore.
+ */
+static int is_missing_file_error(int errno_) {
+   return (errno_ == ENOENT || errno_ == ENOTDIR);
+}
+
 int warn_on_fopen_errors(const char *path)
 {
-   if (errno != ENOENT && errno != ENOTDIR) {
+   if (!is_missing_file_error(errno)) {
warn_on_inaccessible(path);
return -1;
}
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 13/13] is_missing_file_error(): work around EINVAL on Windows

2017-05-25 Thread Junio C Hamano
When asked to open/fopen a path, e.g. "a/b:/c", which does not exist
on the filesystem, Windows (correctly) fails to open it but sets
EINVAL to errno because the pathname has characters that cannot be
stored in its filesystem.

As this is an expected failure, teach is_missing_file_error() helper
about this case.

This is RFC, as there may be a case where we get EINVAL from
open/fopen for reasons other than "the filesystem does not like this
pathname" that may be worth reporting to the user, and this change
is sweeping such an error under the rug.

Signed-off-by: Junio C Hamano 
---
 wrapper.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/wrapper.c b/wrapper.c
index f1c87ec7ea..74aa3b7803 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -434,6 +434,10 @@ static void warn_on_inaccessible(const char *path)
  * see if the errno indicates a missing file that we can safely ignore.
  */
 static int is_missing_file_error(int errno_) {
+#ifdef GIT_WINDOWS_NATIVE
+   if (errno_ == EINVAL)
+   return 1;
+#endif
return (errno_ == ENOENT || errno_ == ENOTDIR);
 }
 
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 01/13] git_fopen: fix a sparse 'not declared' warning

2017-05-25 Thread Junio C Hamano
From: Ramsay Jones 

If git is built with the FREAD_READS_DIRECTORIES build variable set, this
would cause sparse to issue a 'not declared, should it be static?' warning
on Linux. This is a result of the method employed by 'compat/fopen.c' to
suppress the (possible) redefinition of the (system) fopen macro, which
also removes the extern declaration of the git_fopen function.

In order to suppress the warning, introduce a new macro to suppress the
definition (or possibly the re-definition) of the fopen symbol as a macro
override. This new macro (SUPPRESS_FOPEN_REDEFINITION) is only defined in
'compat/fopen.c', just prior to the inclusion of the 'git-compat-util.h'
header file.

Signed-off-by: Ramsay Jones 
Signed-off-by: Junio C Hamano 
---
 compat/fopen.c|  4 ++--
 git-compat-util.h | 10 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/compat/fopen.c b/compat/fopen.c
index b5ca142fed..107b3e8182 100644
--- a/compat/fopen.c
+++ b/compat/fopen.c
@@ -1,14 +1,14 @@
 /*
  *  The order of the following two lines is important.
  *
- *  FREAD_READS_DIRECTORIES is undefined before including git-compat-util.h
+ *  SUPPRESS_FOPEN_REDEFINITION is defined before including git-compat-util.h
  *  to avoid the redefinition of fopen within git-compat-util.h. This is
  *  necessary since fopen is a macro on some platforms which may be set
  *  based on compiler options. For example, on AIX fopen is set to fopen64
  *  when _LARGE_FILES is defined. The previous technique of merely undefining
  *  fopen after including git-compat-util.h is inadequate in this case.
  */
-#undef FREAD_READS_DIRECTORIES
+#define SUPPRESS_FOPEN_REDEFINITION
 #include "../git-compat-util.h"
 
 FILE *git_fopen(const char *path, const char *mode)
diff --git a/git-compat-util.h b/git-compat-util.h
index bd04564a69..6be55cf8b3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -689,10 +689,12 @@ char *gitstrdup(const char *s);
 #endif
 
 #ifdef FREAD_READS_DIRECTORIES
-#ifdef fopen
-#undef fopen
-#endif
-#define fopen(a,b) git_fopen(a,b)
+# if !defined(SUPPRESS_FOPEN_REDEFINITION)
+#  ifdef fopen
+#   undef fopen
+#  endif
+#  define fopen(a,b) git_fopen(a,b)
+# endif
 extern FILE *git_fopen(const char*, const char*);
 #endif
 
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 07/13] wrapper.c: add and use fopen_or_warn()

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

When fopen() returns NULL, it could be because the given path does not
exist, but it could also be some other errors and the caller has to
check. Add a wrapper so we don't have to repeat the same error check
everywhere.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 attr.c|  7 ++-
 bisect.c  |  2 +-
 builtin/blame.c   |  2 +-
 commit.c  |  2 +-
 config.c  |  2 +-
 git-compat-util.h |  1 +
 ident.c   |  8 +++-
 remote.c  |  4 ++--
 rerere.c  |  2 +-
 sequencer.c   |  8 
 server-info.c |  2 +-
 t/t1308-config-set.sh |  2 ++
 t/t5512-ls-remote.sh  | 13 ++---
 wrapper.c | 11 +++
 wt-status.c   |  3 ++-
 15 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/attr.c b/attr.c
index 7e2134471c..821203e2a9 100644
--- a/attr.c
+++ b/attr.c
@@ -720,16 +720,13 @@ void git_attr_set_direction(enum git_attr_direction 
new_direction,
 
 static struct attr_stack *read_attr_from_file(const char *path, int macro_ok)
 {
-   FILE *fp = fopen(path, "r");
+   FILE *fp = fopen_or_warn(path, "r");
struct attr_stack *res;
char buf[2048];
int lineno = 0;
 
-   if (!fp) {
-   if (errno != ENOENT && errno != ENOTDIR)
-   warn_on_inaccessible(path);
+   if (!fp)
return NULL;
-   }
res = xcalloc(1, sizeof(*res));
while (fgets(buf, sizeof(buf), fp)) {
char *bufp = buf;
diff --git a/bisect.c b/bisect.c
index 469a3e9061..bb28bf63b2 100644
--- a/bisect.c
+++ b/bisect.c
@@ -666,7 +666,7 @@ static int is_expected_rev(const struct object_id *oid)
if (stat(filename, &st) || !S_ISREG(st.st_mode))
return 0;
 
-   fp = fopen(filename, "r");
+   fp = fopen_or_warn(filename, "r");
if (!fp)
return 0;
 
diff --git a/builtin/blame.c b/builtin/blame.c
index 07506a3e45..34445d7894 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2071,7 +2071,7 @@ static int prepare_lines(struct scoreboard *sb)
  */
 static int read_ancestry(const char *graft_file)
 {
-   FILE *fp = fopen(graft_file, "r");
+   FILE *fp = fopen_or_warn(graft_file, "r");
struct strbuf buf = STRBUF_INIT;
if (!fp)
return -1;
diff --git a/commit.c b/commit.c
index 73c78c2b80..3eeda081f9 100644
--- a/commit.c
+++ b/commit.c
@@ -167,7 +167,7 @@ struct commit_graft *read_graft_line(char *buf, int len)
 
 static int read_graft_file(const char *graft_file)
 {
-   FILE *fp = fopen(graft_file, "r");
+   FILE *fp = fopen_or_warn(graft_file, "r");
struct strbuf buf = STRBUF_INIT;
if (!fp)
return -1;
diff --git a/config.c b/config.c
index 2894fbb6d0..e54d99d519 100644
--- a/config.c
+++ b/config.c
@@ -1422,7 +1422,7 @@ int git_config_from_file(config_fn_t fn, const char 
*filename, void *data)
int ret = -1;
FILE *f;
 
-   f = fopen(filename, "r");
+   f = fopen_or_warn(filename, "r");
if (f) {
flockfile(f);
ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, 
filename, f, data);
diff --git a/git-compat-util.h b/git-compat-util.h
index eb5c18c7fd..f74b401810 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -802,6 +802,7 @@ extern int xmkstemp(char *template);
 extern int xmkstemp_mode(char *template, int mode);
 extern char *xgetcwd(void);
 extern FILE *fopen_for_writing(const char *path);
+extern FILE *fopen_or_warn(const char *path, const char *mode);
 
 #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc)))
 #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), 
(alloc)))
diff --git a/ident.c b/ident.c
index bea871c8e0..91c7609055 100644
--- a/ident.c
+++ b/ident.c
@@ -72,12 +72,10 @@ static int add_mailname_host(struct strbuf *buf)
FILE *mailname;
struct strbuf mailnamebuf = STRBUF_INIT;
 
-   mailname = fopen("/etc/mailname", "r");
-   if (!mailname) {
-   if (errno != ENOENT)
-   warning_errno("cannot open /etc/mailname");
+   mailname = fopen_or_warn("/etc/mailname", "r");
+   if (!mailname)
return -1;
-   }
+
if (strbuf_getline(&mailnamebuf, mailname) == EOF) {
if (ferror(mailname))
warning_errno("cannot read /etc/mailname");
diff --git a/remote.c b/remote.c
index 801137c72e..1f2453d0f6 100644
--- a/remote.c
+++ b/remote.c
@@ -251,7 +251,7 @@ static const char *skip_spaces(const char *s)
 static void read_remotes_file(struct remote *remote)
 {
struct strbuf buf = STRBUF_INIT;
-   FILE *f = fopen(git_path("remotes/%s", remote->name), "r");
+   FILE *f = fopen_or_warn(git_path("remotes/%s", remote->name), "r");
 
if (!f)
r

[PATCH v3 09/13] print errno when reporting a system call error

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/log.c | 3 ++-
 rerere.c  | 4 ++--
 xdiff-interface.c | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index b3b10cc1ed..26d6a3cf14 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -858,7 +858,8 @@ static int open_next_file(struct commit *commit, const char 
*subject,
printf("%s\n", filename.buf + outdir_offset);
 
if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL)
-   return error(_("Cannot open patch file %s"), filename.buf);
+   return error_errno(_("Cannot open patch file %s"),
+  filename.buf);
 
strbuf_release(&filename);
return 0;
diff --git a/rerere.c b/rerere.c
index 971bfedfb2..1351b0c3fb 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,13 +484,13 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
io.input = fopen(path, "r");
io.io.wrerror = 0;
if (!io.input)
-   return error("Could not open %s", path);
+   return error_errno("Could not open %s", path);
 
if (output) {
io.io.output = fopen(output, "w");
if (!io.io.output) {
fclose(io.input);
-   return error("Could not write %s", output);
+   return error_errno("Could not write %s", output);
}
}
 
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 060038c2d6..d3f78ca2a7 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -164,9 +164,9 @@ int read_mmfile(mmfile_t *ptr, const char *filename)
size_t sz;
 
if (stat(filename, &st))
-   return error("Could not stat %s", filename);
+   return error_errno("Could not stat %s", filename);
if ((f = fopen(filename, "rb")) == NULL)
-   return error("Could not open %s", filename);
+   return error_errno("Could not open %s", filename);
sz = xsize_t(st.st_size);
ptr->ptr = xmalloc(sz ? sz : 1);
if (sz && fread(ptr->ptr, sz, 1, f) != 1) {
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 00/13] reporting unexpected errors after (f)open

2017-05-25 Thread Junio C Hamano
These are taken from https://github.com/pclouds/git/commits/fopen-or-warn
cf. 
with a few bits salvaged from its v2 version.


Ramsay Jones (1):
  git_fopen: fix a sparse 'not declared' warning

Nguyễn Thái Ngọc Duy (9):
  use xfopen() in more places
  clone: use xfopen() instead of fopen()
  config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
  wrapper.c: add and use warn_on_fopen_errors()
  wrapper.c: add and use fopen_or_warn()
  wrapper.c: make warn_on_inaccessible() static
  print errno when reporting a system call error
  rerere.c: move error_errno() closer to the source system call
  log: fix memory leak in open_next_file()

Junio C Hamano (3):
  config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
  wrapper: factor out is_missing_file_error()
  is_missing_file_error(): work around EINVAL on Windows

Patches 12 & 13 are my attempt to answer the issue the series had on
Windows.
cf. 


 attr.c|  7 ++-
 bisect.c  |  7 ++-
 builtin/am.c  |  8 ++--
 builtin/blame.c   |  2 +-
 builtin/clone.c   |  2 +-
 builtin/commit.c  |  5 +
 builtin/fast-export.c |  4 +---
 builtin/fsck.c|  3 +--
 builtin/log.c | 11 ---
 builtin/merge.c   |  4 +---
 builtin/pull.c|  3 +--
 commit.c  |  2 +-
 compat/fopen.c|  4 ++--
 config.c  |  5 -
 config.mak.uname  |  4 
 diff.c|  8 ++--
 dir.c |  6 +++---
 fast-import.c |  4 +---
 git-compat-util.h | 15 +--
 ident.c   |  8 +++-
 remote-testsvn.c  |  8 ++--
 remote.c  |  4 ++--
 rerere.c  |  7 ---
 sequencer.c   |  8 
 server-info.c |  2 +-
 t/t1308-config-set.sh | 13 -
 t/t5512-ls-remote.sh  | 13 ++---
 wrapper.c | 49 -
 wt-status.c   |  3 ++-
 xdiff-interface.c |  4 ++--
 30 files changed, 133 insertions(+), 90 deletions(-)

-- 
2.13.0-491-g71cfeddc25



[PATCH v3 06/13] wrapper.c: add and use warn_on_fopen_errors()

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

In many places, Git warns about an inaccessible file after a fopen()
failed. To discern these cases from other cases where we want to warn
about inaccessible files, introduce a new helper specifically to test
whether fopen() failed because the current user lacks the permission to
open file in question.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 config.c  |  3 +++
 dir.c |  6 +++---
 git-compat-util.h |  2 ++
 t/t1308-config-set.sh |  3 ++-
 wrapper.c | 10 ++
 5 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index b4a3205da3..2894fbb6d0 100644
--- a/config.c
+++ b/config.c
@@ -2640,6 +2640,9 @@ int git_config_rename_section_in_file(const char 
*config_filename,
}
 
if (!(config_file = fopen(config_filename, "rb"))) {
+   ret = warn_on_fopen_errors(config_filename);
+   if (ret)
+   goto out;
/* no config file means nothing to rename, no error */
goto commit_and_out;
}
diff --git a/dir.c b/dir.c
index f451bfa48c..be616e884e 100644
--- a/dir.c
+++ b/dir.c
@@ -745,9 +745,9 @@ static int add_excludes(const char *fname, const char 
*base, int baselen,
 
fd = open(fname, O_RDONLY);
if (fd < 0 || fstat(fd, &st) < 0) {
-   if (errno != ENOENT)
-   warn_on_inaccessible(fname);
-   if (0 <= fd)
+   if (fd < 0)
+   warn_on_fopen_errors(fname);
+   else
close(fd);
if (!check_index ||
(buf = read_skip_worktree_file_from_index(fname, &size, 
sha1_stat)) == NULL)
diff --git a/git-compat-util.h b/git-compat-util.h
index 6be55cf8b3..eb5c18c7fd 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1101,6 +1101,8 @@ int access_or_die(const char *path, int mode, unsigned 
flag);
 
 /* Warn on an inaccessible file that ought to be accessible */
 void warn_on_inaccessible(const char *path);
+/* Warn on an inaccessible file if errno indicates this is an error */
+int warn_on_fopen_errors(const char *path);
 
 #ifdef GMTIME_UNRELIABLE_ERRORS
 struct tm *git_gmtime(const time_t *);
diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index 72d5f1f931..df92fdcd40 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -195,7 +195,8 @@ test_expect_success POSIXPERM,SANITY 'proper error on 
non-accessible files' '
chmod -r .git/config &&
test_when_finished "chmod +r .git/config" &&
echo "Error (-1) reading configuration file .git/config." >expect &&
-   test_expect_code 2 test-config configset_get_value foo.bar .git/config 
2>actual &&
+   test_expect_code 2 test-config configset_get_value foo.bar .git/config 
2>output &&
+   grep "^Error" output >actual &&
test_cmp expect actual
 '
 
diff --git a/wrapper.c b/wrapper.c
index d837417709..20c25e7e65 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -418,6 +418,16 @@ FILE *fopen_for_writing(const char *path)
return ret;
 }
 
+int warn_on_fopen_errors(const char *path)
+{
+   if (errno != ENOENT && errno != ENOTDIR) {
+   warn_on_inaccessible(path);
+   return -1;
+   }
+
+   return 0;
+}
+
 int xmkstemp(char *template)
 {
int fd;
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 02/13] use xfopen() in more places

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

xfopen()

 - provides error details
 - explains error on reading, or writing, or whatever operation
 - has l10n support
 - prints file name in the error

Some of these are missing in the places that are replaced with xfopen(),
which is a clear win. In some other places, it's just less code (not as
clearly a win as the previous case but still is).

The only slight regresssion is in remote-testsvn, where we don't report
the file class (marks files) in the error messages anymore. But since
this is a _test_ svn remote transport, I'm not too concerned.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 bisect.c  | 5 +
 builtin/am.c  | 8 ++--
 builtin/commit.c  | 5 +
 builtin/fast-export.c | 4 +---
 builtin/fsck.c| 3 +--
 builtin/merge.c   | 4 +---
 builtin/pull.c| 3 +--
 diff.c| 8 ++--
 fast-import.c | 4 +---
 remote-testsvn.c  | 8 ++--
 10 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/bisect.c b/bisect.c
index 08c9fb7266..469a3e9061 100644
--- a/bisect.c
+++ b/bisect.c
@@ -438,10 +438,7 @@ static void read_bisect_paths(struct argv_array *array)
 {
struct strbuf str = STRBUF_INIT;
const char *filename = git_path_bisect_names();
-   FILE *fp = fopen(filename, "r");
-
-   if (!fp)
-   die_errno(_("Could not open file '%s'"), filename);
+   FILE *fp = xfopen(filename, "r");
 
while (strbuf_getline_lf(&str, fp) != EOF) {
strbuf_trim(&str);
diff --git a/builtin/am.c b/builtin/am.c
index a95dd8b4e6..f5dac7783e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1275,12 +1275,8 @@ static int parse_mail(struct am_state *state, const char 
*mail)
die("BUG: invalid value for state->scissors");
}
 
-   mi.input = fopen(mail, "r");
-   if (!mi.input)
-   die("could not open input");
-   mi.output = fopen(am_path(state, "info"), "w");
-   if (!mi.output)
-   die("could not open output 'info'");
+   mi.input = xfopen(mail, "r");
+   mi.output = xfopen(am_path(state, "info"), "w");
if (mailinfo(&mi, am_path(state, "msg"), am_path(state, "patch")))
die("could not parse patch");
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 1d805f5da8..eda0d32311 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1695,10 +1695,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
if (!reflog_msg)
reflog_msg = "commit (merge)";
pptr = commit_list_append(current_head, pptr);
-   fp = fopen(git_path_merge_head(), "r");
-   if (fp == NULL)
-   die_errno(_("could not open '%s' for reading"),
- git_path_merge_head());
+   fp = xfopen(git_path_merge_head(), "r");
while (strbuf_getline_lf(&m, fp) != EOF) {
struct commit *parent;
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e0220630d0..128b99e6da 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -905,9 +905,7 @@ static void export_marks(char *file)
 static void import_marks(char *input_file)
 {
char line[512];
-   FILE *f = fopen(input_file, "r");
-   if (!f)
-   die_errno("cannot read '%s'", input_file);
+   FILE *f = xfopen(input_file, "r");
 
while (fgets(line, sizeof(line), f)) {
uint32_t mark;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b5e13a4556..00beaaa4e6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -280,8 +280,7 @@ static void check_unreachable_object(struct object *obj)
free(filename);
return;
}
-   if (!(f = fopen(filename, "w")))
-   die_errno("Could not open '%s'", filename);
+   f = xfopen(filename, "w");
if (obj->type == OBJ_BLOB) {
if (stream_blob_to_fd(fileno(f), &obj->oid, 
NULL, 1))
die_errno("Could not write '%s'", 
filename);
diff --git a/builtin/merge.c b/builtin/merge.c
index 703827f006..65a1501858 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -839,9 +839,7 @@ static int suggest_conflicts(void)
struct strbuf msgbuf = STRBUF_INIT;
 
filename = git_path_merge_msg();
-   fp = fopen(filename, "a");
-   if (!fp)
-   die_errno(_("Could not open '%s' for writing"), filename);
+   fp = xfopen(filename, "a");
 
append_conflicts_hint(&msgbuf);
fputs(msgbuf.buf, fp);
diff --git a/builtin/pull.c b/builtin/pull.c
index dd1a4a94e4..589c25becf 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -337,8 +337,7 @@ static void get_merge_heads(struct oid_arra

[PATCH v3 04/13] config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

This variable is added [1] with the assumption that on a sane system,
fopen(, "r") should return NULL. Linux and FreeBSD do not meet this
expectation while at least Windows and AIX do. Let's make sure they
behave the same way.

I only tested one version on Linux (4.7.0 with glibc 2.22) and
FreeBSD (11.0) but since GNU/kFreeBSD is fbsd kernel with gnu userspace,
I'm pretty sure it shares the same problem.

[1] cba22528fa (Add compat/fopen.c which returns NULL on attempt to open
directory - 2008-02-08)

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 config.mak.uname  | 3 +++
 t/t1308-config-set.sh | 8 
 2 files changed, 11 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index 399fe19271..a25ffddb3e 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux)
NEEDS_LIBRT = YesPlease
HAVE_GETDELIM = YesPlease
SANE_TEXT_GREP=-a
+   FREAD_READS_DIRECTORIES = UnfortunatelyYes
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_ALLOCA_H = YesPlease
@@ -43,6 +44,7 @@ ifeq ($(uname_S),GNU/kFreeBSD)
HAVE_PATHS_H = YesPlease
DIR_HAS_BSD_GROUP_SEMANTICS = YesPlease
LIBC_CONTAINS_LIBINTL = YesPlease
+   FREAD_READS_DIRECTORIES = UnfortunatelyYes
 endif
 ifeq ($(uname_S),UnixWare)
CC = cc
@@ -201,6 +203,7 @@ ifeq ($(uname_S),FreeBSD)
GMTIME_UNRELIABLE_ERRORS = UnfortunatelyYes
HAVE_BSD_SYSCTL = YesPlease
PAGER_ENV = LESS=FRX LV=-c MORE=FRX
+   FREAD_READS_DIRECTORIES = UnfortunatelyYes
 endif
 ifeq ($(uname_S),OpenBSD)
NO_STRCASESTR = YesPlease
diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index ff50960cca..72d5f1f931 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -183,6 +183,14 @@ test_expect_success 'proper error on non-existent files' '
test_cmp expect actual
 '
 
+test_expect_success 'proper error on directory "files"' '
+   echo "Error (-1) reading configuration file a-directory." >expect &&
+   mkdir a-directory &&
+   test_expect_code 2 test-config configset_get_value foo.bar a-directory 
2>output &&
+   grep "^Error" output >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success POSIXPERM,SANITY 'proper error on non-accessible files' '
chmod -r .git/config &&
test_when_finished "chmod +r .git/config" &&
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 03/13] clone: use xfopen() instead of fopen()

2017-05-25 Thread Junio C Hamano
From: Nguyễn Thái Ngọc Duy 

copy_alternates() called fopen() without handling errors. By switching
to xfopen(), this bug is fixed.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 builtin/clone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index de85b85254..dde4fe73af 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -357,7 +357,7 @@ static void copy_alternates(struct strbuf *src, struct 
strbuf *dst,
 * to turn entries with paths relative to the original
 * absolute, so that they can be used in the new repository.
 */
-   FILE *in = fopen(src->buf, "r");
+   FILE *in = xfopen(src->buf, "r");
struct strbuf line = STRBUF_INIT;
 
while (strbuf_getline(&line, in) != EOF) {
-- 
2.13.0-491-g71cfeddc25



[PATCH v3 05/13] config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too

2017-05-25 Thread Junio C Hamano
Signed-off-by: Junio C Hamano 
---
 config.mak.uname | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config.mak.uname b/config.mak.uname
index a25ffddb3e..1743890164 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -110,6 +110,7 @@ ifeq ($(uname_S),Darwin)
BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
HAVE_BSD_SYSCTL = YesPlease
+   FREAD_READS_DIRECTORIES = UnfortunatelyYes
 endif
 ifeq ($(uname_S),SunOS)
NEEDS_SOCKET = YesPlease
-- 
2.13.0-491-g71cfeddc25



[PATCH v4 10/10] rebase -i: rearrange fixup/squash lines using the rebase--helper

2017-05-25 Thread Liam Beguin
Hi Johannes,

Johannes Schindelin  writes:
> This operation has quadratic complexity, which is especially painful
> on Windows, where shell scripts are *already* slow (mainly due to the
> overhead of the POSIX emulation layer).
>
> Let's reimplement this with linear complexity (using a hash map to
> match the commits' subject lines) for the common case; Sadly, the
> fixup/squash feature's design neglected performance considerations,
> allowing arbitrary prefixes (read: `fixup! hell` will match the
> commit subject `hello world`), which means that we are stuck with
> quadratic performance in the worst case.
>
> The reimplemented logic also happens to fix a bug where commented-out
> lines (representing empty patches) were dropped by the previous code.
>
> While at it, clarify how the fixup/squash feature works in `git rebase
> -i`'s man page.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  Documentation/git-rebase.txt |  16 ++--
>  builtin/rebase--helper.c |   6 +-
>  git-rebase--interactive.sh   |  90 +---
>  sequencer.c  | 195 
> +++
>  sequencer.h  |   1 +
>  t/t3415-rebase-autosquash.sh |   2 +-
>  6 files changed, 212 insertions(+), 98 deletions(-)
>
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 53f4e14..c5464aa5365 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -430,13 +430,15 @@ without an explicit `--interactive`.
>  --autosquash::
>  --no-autosquash::
>   When the commit log message begins with "squash! ..." (or
> - "fixup! ..."), and there is a commit whose title begins with
> - the same ..., automatically modify the todo list of rebase -i
> - so that the commit marked for squashing comes right after the
> - commit to be modified, and change the action of the moved
> - commit from `pick` to `squash` (or `fixup`).  Ignores subsequent
> - "fixup! " or "squash! " after the first, in case you referred to an
> - earlier fixup/squash with `git commit --fixup/--squash`.
> + "fixup! ..."), and there is already a commit in the todo list that
> + matches the same `...`, automatically modify the todo list of rebase
> + -i so that the commit marked for squashing comes right after the
> + commit to be modified, and change the action of the moved commit
> + from `pick` to `squash` (or `fixup`).  A commit matches the `...` if
> + the commit subject matches, or if the `...` refers to the commit's
> + hash. As a fall-back, partial matches of the commit subject work,
> + too.  The recommended way to create fixup/squash commits is by using
> + the `--fixup`/`--squash` options of linkgit:git-commit[1].
>  +
>  This option is only valid when the `--interactive` option is used.
>  +
> diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
> index de3ccd9bfbc..e6591f01112 100644
> --- a/builtin/rebase--helper.c
> +++ b/builtin/rebase--helper.c
> @@ -14,7 +14,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   int keep_empty = 0;
>   enum {
>   CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S,
> - CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS
> + CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH
>   } command = 0;
>   struct option options[] = {
>   OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
> @@ -33,6 +33,8 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   N_("check the todo list"), CHECK_TODO_LIST),
>   OPT_CMDMODE(0, "skip-unnecessary-picks", &command,
>   N_("skip unnecessary picks"), SKIP_UNNECESSARY_PICKS),
> + OPT_CMDMODE(0, "rearrange-squash", &command,
> + N_("rearrange fixup/squash lines"), REARRANGE_SQUASH),
>   OPT_END()
>   };
>  
> @@ -59,5 +61,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!check_todo_list();
>   if (command == SKIP_UNNECESSARY_PICKS && argc == 1)
>   return !!skip_unnecessary_picks();
> + if (command == REARRANGE_SQUASH && argc == 1)
> + return !!rearrange_squash();
>   usage_with_options(builtin_rebase_helper_usage, options);
>  }
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 92e3ca1de3b..84c6e62518f 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -721,94 +721,6 @@ collapse_todo_ids() {
>   git rebase--helper --shorten-sha1s
>  }
>  
> -# Rearrange the todo list that has both "pick sha1 msg" and
> -# "pick sha1 fixup!/squash! msg" appears in it so that the latter
> -# comes immediately after the former, and change "pick" to
> -# "fixup"/"squash".
> -#
> -# Note that if the config has specified a custom instruction format
> -#

[PATCH v4 03/10] rebase -i: remove useless indentation

2017-05-25 Thread Liam Beguin
Hi Johannes,

Johannes Schindelin  writes:
> The commands used to be indented, and it is nice to look at, but when we
> transform the SHA-1s, the indentation is removed. So let's do away with it.
> 
> For the moment, at least: when we will use the upcoming rebase--helper
> to transform the SHA-1s, we *will* keep the indentation and can
> reintroduce it. Yet, to be able to validate the rebase--helper against
> the output of the current shell script version, we need to remove the
> extra indentation.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  git-rebase--interactive.sh | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 609e150d38f..c40b1fd1d2e 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -155,13 +155,13 @@ reschedule_last_action () {
>  append_todo_help () {
>   gettext "
>  Commands:
> - p, pick = use commit
> - r, reword = use commit, but edit the commit message
> - e, edit = use commit, but stop for amending
> - s, squash = use commit, but meld into previous commit
> - f, fixup = like \"squash\", but discard this commit's log message
> - x, exec = run command (the rest of the line) using shell
> - d, drop = remove commit
> +p, pick = use commit
> +r, reword = use commit, but edit the commit message
> +e, edit = use commit, but stop for amending
> +s, squash = use commit, but meld into previous commit
> +f, fixup = like \"squash\", but discard this commit's log message
> +x, exec = run command (the rest of the line) using shell
> +d, drop = remove commit

do we also need to update all the translations since this is a `gettext`
function?

>  
>  These lines can be re-ordered; they are executed from top to bottom.
>  " | git stripspace --comment-lines >>"$todo"
> -- 
> 2.12.2.windows.2.800.gede8f145e06

Liam



[PATCH v4 05/10] rebase -i: also expand/collapse the SHA-1s via the rebase--helper

2017-05-25 Thread Liam Beguin
Hi Johannes,

Johannes Schindelin  writes:
> This is crucial to improve performance on Windows, as the speed is now
> mostly dominated by the SHA-1 transformation (because it spawns a new
> rev-parse process for *every* line, and spawning processes is pretty
> slow from Git for Windows' MSYS2 Bash).
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/rebase--helper.c   | 10 +++-
>  git-rebase--interactive.sh | 27 ++
>  sequencer.c| 57 
> ++
>  sequencer.h|  2 ++
>  4 files changed, 70 insertions(+), 26 deletions(-)
> 
> diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
> index 821058d452d..9444c8d6c60 100644
> --- a/builtin/rebase--helper.c
> +++ b/builtin/rebase--helper.c
> @@ -13,7 +13,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   struct replay_opts opts = REPLAY_OPTS_INIT;
>   int keep_empty = 0;
>   enum {
> - CONTINUE = 1, ABORT, MAKE_SCRIPT
> + CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S
>   } command = 0;
>   struct option options[] = {
>   OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
> @@ -24,6 +24,10 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   ABORT),
>   OPT_CMDMODE(0, "make-script", &command,
>   N_("make rebase script"), MAKE_SCRIPT),
> + OPT_CMDMODE(0, "shorten-sha1s", &command,
> + N_("shorten SHA-1s in the todo list"), SHORTEN_SHA1S),
> + OPT_CMDMODE(0, "expand-sha1s", &command,
> + N_("expand SHA-1s in the todo list"), EXPAND_SHA1S),

Since work is being done to convert to `struct object_id` would it
not be best to use a more generic name instead of 'sha1'?
maybe something like {shorten,expand}-hashs

>   OPT_END()
>   };
>  
> @@ -42,5 +46,9 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!sequencer_remove_state(&opts);
>   if (command == MAKE_SCRIPT && argc > 1)
>   return !!sequencer_make_script(keep_empty, stdout, argc, argv);
> + if (command == SHORTEN_SHA1S && argc == 1)
> + return !!transform_todo_ids(1);
> + if (command == EXPAND_SHA1S && argc == 1)
> + return !!transform_todo_ids(0);
>   usage_with_options(builtin_rebase_helper_usage, options);
>  }
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 214af0372ba..82a1941c42c 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -750,35 +750,12 @@ skip_unnecessary_picks () {
>   die "$(gettext "Could not skip unnecessary pick commands")"
>  }
>  
> -transform_todo_ids () {
> - while read -r command rest
> - do
> - case "$command" in
> - "$comment_char"* | exec)
> - # Be careful for oddball commands like 'exec'
> - # that do not have a SHA-1 at the beginning of $rest.
> - ;;
> - *)
> - sha1=$(git rev-parse --verify --quiet "$@" ${rest%%[
>  ]*}) &&
> - if test "a$rest" = "a${rest#*[   ]}"
> - then
> - rest=$sha1
> - else
> - rest="$sha1 ${rest#*[]}"
> - fi
> - ;;
> - esac
> - printf '%s\n' "$command${rest:+ }$rest"
> - done <"$todo" >"$todo.new" &&
> - mv -f "$todo.new" "$todo"
> -}
> -
>  expand_todo_ids() {
> - transform_todo_ids
> + git rebase--helper --expand-sha1s
>  }
>  
>  collapse_todo_ids() {
> - transform_todo_ids --short
> + git rebase--helper --shorten-sha1s
>  }
>  
>  # Rearrange the todo list that has both "pick sha1 msg" and
> diff --git a/sequencer.c b/sequencer.c
> index 88819a1a2a9..201d45b1677 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2437,3 +2437,60 @@ int sequencer_make_script(int keep_empty, FILE *out,
>   strbuf_release(&buf);
>   return 0;
>  }
> +
> +
> +int transform_todo_ids(int shorten_sha1s)
> +{
> + const char *todo_file = rebase_path_todo();
> + struct todo_list todo_list = TODO_LIST_INIT;
> + int fd, res, i;
> + FILE *out;
> +
> + strbuf_reset(&todo_list.buf);
> + fd = open(todo_file, O_RDONLY);
> + if (fd < 0)
> + return error_errno(_("could not open '%s'"), todo_file);
> + if (strbuf_read(&todo_list.buf, fd, 0) < 0) {
> + close(fd);
> + return error(_("could not read '%s'."), todo_file);
> + }
> + close(fd);
> +
> + res = parse_insn_buffer(todo_list.buf.buf, &todo_list);
> + if (res) {
> + todo_list_release(&todo_list);
> + return error(_("unusable instruc

[PATCH v4 02/10] rebase -i: generate the script via rebase--helper

2017-05-25 Thread Liam Beguin
Hi Johannes,

Johannes Schindelin  writes:
> The first step of an interactive rebase is to generate the so-called "todo
> script", to be stored in the state directory as "git-rebase-todo" and to
> be edited by the user.
> 
> Originally, we adjusted the output of `git log ` using a simple
> sed script. Over the course of the years, the code became more
> complicated. We now use shell scripting to edit the output of `git log`
> conditionally, depending whether to keep "empty" commits (i.e. commits
> that do not change any files).
> 
> On platforms where shell scripting is not native, this can be a serious
> drag. And it opens the door for incompatibilities between platforms when
> it comes to shell scripting or to Unix-y commands.
> 
> Let's just re-implement the todo script generation in plain C, using the
> revision machinery directly.
> 
> This is substantially faster, improving the speed relative to the
> shell script version of the interactive rebase from 2x to 3x on Windows.
> 
> Note that the rearrange_squash() function in git-rebase--interactive
> relied on the fact that we set the "format" variable to the config setting
> rebase.instructionFormat. Relying on a side effect like this is no good,
> hence we explicitly perform that assignment (possibly again) in
> rearrange_squash().
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/rebase--helper.c   |  8 +++-
>  git-rebase--interactive.sh | 44 +
>  sequencer.c| 49 
> ++
>  sequencer.h|  3 +++
>  4 files changed, 82 insertions(+), 22 deletions(-)
> 
> diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c
> index ca1ebb2fa18..821058d452d 100644
> --- a/builtin/rebase--helper.c
> +++ b/builtin/rebase--helper.c
> @@ -11,15 +11,19 @@ static const char * const builtin_rebase_helper_usage[] = 
> {
>  int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
>  {
>   struct replay_opts opts = REPLAY_OPTS_INIT;
> + int keep_empty = 0;
>   enum {
> - CONTINUE = 1, ABORT
> + CONTINUE = 1, ABORT, MAKE_SCRIPT
>   } command = 0;
>   struct option options[] = {
>   OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")),
> + OPT_BOOL(0, "keep-empty", &keep_empty, N_("keep empty 
> commits")),
>   OPT_CMDMODE(0, "continue", &command, N_("continue rebase"),
>   CONTINUE),
>   OPT_CMDMODE(0, "abort", &command, N_("abort rebase"),
>   ABORT),
> + OPT_CMDMODE(0, "make-script", &command,
> + N_("make rebase script"), MAKE_SCRIPT),
>   OPT_END()
>   };
>  
> @@ -36,5 +40,7 @@ int cmd_rebase__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!sequencer_continue(&opts);
>   if (command == ABORT && argc == 1)
>   return !!sequencer_remove_state(&opts);
> + if (command == MAKE_SCRIPT && argc > 1)
> + return !!sequencer_make_script(keep_empty, stdout, argc, argv);
>   usage_with_options(builtin_rebase_helper_usage, options);
>  }
> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 2c9c0165b5a..609e150d38f 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -785,6 +785,7 @@ collapse_todo_ids() {
>  # each log message will be re-retrieved in order to normalize the
>  # autosquash arrangement
>  rearrange_squash () {
> + format=$(git config --get rebase.instructionFormat)
>   # extract fixup!/squash! lines and resolve any referenced sha1's
>   while read -r pick sha1 message
>   do
> @@ -1210,26 +1211,27 @@ else
>   revisions=$onto...$orig_head
>   shortrevisions=$shorthead
>  fi
> -format=$(git config --get rebase.instructionFormat)
> -# the 'rev-list .. | sed' requires %m to parse; the instruction requires %H 
> to parse
> -git rev-list $merges_option --format="%m%H ${format:-%s}" \
> - --reverse --left-right --topo-order \
> - $revisions ${restrict_revision+^$restrict_revision} | \
> - sed -n "s/^>//p" |
> -while read -r sha1 rest
> -do
> -
> - if test -z "$keep_empty" && is_empty_commit $sha1 && ! is_merge_commit 
> $sha1
> - then
> - comment_out="$comment_char "
> - else
> - comment_out=
> - fi
> +if test t != "$preserve_merges"
> +then
> + git rebase--helper --make-script ${keep_empty:+--keep-empty} \
> + $revisions ${restrict_revision+^$restrict_revision} >"$todo"
> +else
> + format=$(git config --get rebase.instructionFormat)
> + # the 'rev-list .. | sed' requires %m to parse; the instruction 
> requires %H to parse
> + git rev-list $merges_option --format="%m%H ${format:-%s}" \
> + --reverse --left-right --topo-order \
> + $revisions ${restrict_revision+^$restrict_revision} | 

[PATCH v4 00/10] The final building block for a faster rebase -i

2017-05-25 Thread Liam Beguin
Hi Johannes,

Johannes Schindelin  writes:
> This patch series reimplements the expensive pre- and post-processing of
> the todo script in C.
>
> And it concludes the work I did to accelerate rebase -i.
>
> Changes since v3:
>
> - removed the no-longer-used transform_todo_ids shell function
>
> - simplified transform_todo_ids()'s command parsing
>
> - fixed two commits in check_todo_list(), renamed the unclear
>   `raise_error` variable to `advise_to_edit_todo`, build the message
>   about missing commits directly (without the detour to building a
>   commit_list) and instead of assigning an unused pointer to commit->util
>   the code now uses (void *)1.
>
> - return early from check_todo_list() when parsing failed, even if the
>   check level is something else than CHECK_IGNORE
>
> - the todo list is generated is again generated in the same way as
>   before when rebase.instructionFormat is empty: it was interpreted as
>   if it had not been set
>
> - added a test for empty rebase.instructionFormat settings
>
>
> Johannes Schindelin (10):
>   t3415: verify that an empty instructionFormat is handled as before
>   rebase -i: generate the script via rebase--helper
>   rebase -i: remove useless indentation
>   rebase -i: do not invent onelines when expanding/collapsing SHA-1s
>   rebase -i: also expand/collapse the SHA-1s via the rebase--helper
>   t3404: relax rebase.missingCommitsCheck tests
>   rebase -i: check for missing commits in the rebase--helper
>   rebase -i: skip unnecessary picks using the rebase--helper
>   t3415: test fixup with wrapped oneline
>   rebase -i: rearrange fixup/squash lines using the rebase--helper
>
>  Documentation/git-rebase.txt  |  16 +-
>  builtin/rebase--helper.c  |  29 ++-
>  git-rebase--interactive.sh| 373 -
>  sequencer.c   | 530 
> ++
>  sequencer.h   |   8 +
>  t/t3404-rebase-interactive.sh |  22 +-
>  t/t3415-rebase-autosquash.sh  |  28 ++-
>  7 files changed, 646 insertions(+), 360 deletions(-)
>
>
> base-commit: 027a3b943b444a3e3a76f9a89803fc10245b858f
> Based-On: rebase--helper at https://github.com/dscho/git
> Fetch-Base-Via: git fetch https://github.com/dscho/git rebase--helper
> Published-As: https://github.com/dscho/git/releases/tag/rebase-i-extra-v4
> Fetch-It-Via: git fetch https://github.com/dscho/git rebase-i-extra-v4
>

This is my first review so it's probably not the best you'll get, but
here it goes!

I rebased the series ontop of v2.13.0 and run the whole `make test` on
both revisions.
The changes do not seem to have introduced any evident breakage as the
output of `make test` did not change.

I tried to time the execution on an interactive rebase (on Linux) but
I did not notice a significant change in speed.
Do we have a way to measure performance / speed changes between version?

Liam



Re: What's cooking in git.git (May 2017, #06; Mon, 22)

2017-05-25 Thread Junio C Hamano
Duy Nguyen  writes:

> On Mon, May 22, 2017 at 1:11 PM, Junio C Hamano  wrote:
>> * nd/fopen-errors (2017-05-09) 23 commits
>>  - t1308: add a test case on open a config directory
>>  - config.c: handle error on failing to fopen()
>>  - xdiff-interface.c: report errno on failure to stat() or fopen()
>>  - wt-status.c: report error on failure to fopen()
>>  - server-info: report error on failure to fopen()
>>  - sequencer.c: report error on failure to fopen()
>>  - rerere.c: report correct errno
>>  - rerere.c: report error on failure to fopen()
>>  - remote.c: report error on failure to fopen()
>>  - commit.c: report error on failure to fopen() the graft file
>>  - log: fix memory leak in open_next_file()
>>  - log: report errno on failure to fopen() a file
>>  - blame: report error on open if graft_file is a directory
>>  - bisect: report on fopen() error
>>  - ident.c: use fopen_or_warn()
>>  - attr.c: use fopen_or_warn()
>>  - wrapper.c: add fopen_or_warn()
>>  - wrapper.c: add warn_on_fopen_errors()
>>  - config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
>>  - config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
>>  - clone: use xfopen() instead of fopen()
>>  - Use xfopen() in more places
>>  - git_fopen: fix a sparse 'not declared' warning
>>
>>  We often try to open a file for reading whose existence is
>>  optional, and silently ignore errors from open/fopen; report such
>>  errors if they are not due to missing files.
>
> If anyone wants to continue this, I've cleaned up the series based on
> Johannes comments here [1]. It does not have the Darwin change though.

Also it seems to be missing the SUPPRESS_FOPEN_REDEF thing by
Ramsay.  I'll mix these two in, post to the list for review and
requeue.

Thanks.


> There was the last question about the '*' test change in ref path and
> what exact code change causes that, which I can't answer because I
> don't have Windows, or the time to simulate and pinpoint the fault on
> Linux.
>
> [1] https://github.com/pclouds/git/commits/fopen-or-warn


Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-25 Thread Junio C Hamano
Jeff King  writes:

> But let's consider related invocations and whether we're
> making them better or worse:
>
>- git log :/foo
>   (when "foo" matches a commit message)
>
>   This one should remain the same. Like the existing
>   wildcard rule, we're touching only verify_filename(),
>   not verify_non_filename(). So cases that _do_ resolve
>   as a rev will continue to do so.
>
>- git log :^foo
>   (when "^foo" exists in your index)
>
>   The same logic applies; it would continue to work. And
>   ditto for any other weird filenames in your index like
>   "(attr)foo".

"git show :$path" where $path happens to be "^foo" would grab the
contents of the $path out of the index and I think that is what you
meant, but use of "git log" in the example made me scratch my head
greatly.

>- git log :/foo
>   (when "foo" does _not_ match a commit message)
>   ...
>   This same downside actually exists currently when you
>   have an asterisk in your regex. E.g.,
>
> git log :/foo.*bar
>
>   will be treated as a pathspec (if it doesn't match a
>   commit message) due to the wildcard matching in
>   28fcc0b71.

In other words, we are not making things worse?

> I wrote all the above to try to convince myself that this
> doesn't have any serious regression cases. And I think I
> did.
>
> But I actually we could make the rules in alternative (2)
> above work. check_filename() would ask the pathspec code to
> parse each argument and get one of three results:
>
>   1. it's not pathspec magic; treat it like a filename
>  (e.g., just "foo", or even bogus stuff like ":%foo")
>
>   2. it is pathspec magic, and here is the matching filename
>  that ought to exist (e.g., "foo" from ":^foo" or
>  ":(exclude)foo")
>
>   3. it is pathspec magic, but there's no matching filename.
>  Assume it's a pathspec (e.g., "(attr)foo").
>
> I'm on the fence on whether it's worth the trouble versus
> the simple rule implemented by this patch.

Unlike "git log builtin-checkout.c" that errors out (only because
there is no such file in the checkout of the current version) and
makes its solution obvious to the users, this change has the risk of
silently accepting an ambiguous input and computing result that is
different from what the user intended to.  So I am not sure.  

As you pointedout, ":/" especially does look like a likely point of
failure, in that both "this is path at the top" pathspec magic and
"the commit with this string" are not something that we can say with
confidence that are rarely used because they are so esoteric.

As to "is it OK to build a rule that we cannot explain easily?", I
think it is OK to say "if it is not a rev, and if it is not a
pathname in the current working tree, you must disambiguate, but Git
helps by guessing in some cases---if you want to have more control
(e.g. you are a script), explicitly disambiguate and you'd be OK",
and leave the "some cases" vague, as long as we are only making
reasonably conservative guesses.


Re: [PATCHv5 00/17] Diff machine: highlight moved lines.

2017-05-25 Thread Junio C Hamano
Stefan Beller  writes:

> As you turn on/off normal coloring via "color.diff" and this only extends
> the coloring scheme, I have the impression "color" is the right section.
> Maybe color.diffStyle=[normal/enhanced] and "enhanced" means this
> feature for now?

Hmph, I thought the intent of color.diff is "is the diff command
itself is colored?"  In other words, color.diff=false should give
you monochrome if you say "diff --word-diff", etc.

> The only option in the "diff" section related to color is 
> diff.wsErrorHighlight
> which has a very similar purpose, so "diff.colorMoved" would fit in that
> scheme.

I didn't have "should diff output highlight whitespace errors?" in
mind when I wrote the message you are responding to, but yes, that
is quite similar to "should diff output show lines moved and lines
deleted/added differently?".

> So with these questions, I wonder if we want to color moved lines
> as "color.diff.context" (i.e. plain white text in the normal coloring scheme)
> This would serve the intended purpose of
> dimming the attention to moved lines.

Yes, but two points.

 (1) We want to do so while making it obvious where the boundary
 between two moved blocks of text whose destination (for
 moved-deleted lines) or source (for moved-added lines) is.

 (2) My message was an impression from using the code to review a
 patch that is meant to be "move without changing other things".
 For other purposes, there may be cases where moved ones may
 want to be highlighted, not dimmed.

> Regarding the last point of marking up adjacent blocks (which would
> indicate that there is a coherency issue or just moving from different
> places), we could highlight the last line of the previous block
> and first line of the next block in their "normal" colors (i.e.
> color.diff.old and color.diff.new).

Hmm.  That is an interesting thought.


Re: [PATCHv5 16/17] diff: buffer all output if asked to

2017-05-25 Thread Junio C Hamano
Stefan Beller  writes:

> Yes, this is essentially the v4 with small indentation fixes, so I
> assumed your signoff is still valid.

OK.  I just didn't expect to see one without "no changes since v4"
under the three-dash line.

Thanks.


Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-25 Thread Junio C Hamano
Samuel Lijin  writes:

> On Wed, May 24, 2017 at 12:14 AM, Torsten Bögershausen  wrote:
>>
>>> diff --git a/builtin/clean.c b/builtin/clean.c
>>> index d861f836a..937eb17b6 100644
>>> --- a/builtin/clean.c
>>> +++ b/builtin/clean.c
>>> @@ -857,6 +857,38 @@ static void interactive_main_loop(void)
>>> }
>>>   }
>>>   +static void correct_untracked_entries(struct dir_struct *dir)
>>
>> Looking what the function does, would
>> drop_or_keep_untracked_entries()
>> make more sense ?
>
> To me, drop_or_keep_ implies that they're either all dropped or all
> kept, nothing in between, which is why I went with correct_, to
> indicate that the set of untracked entries in the dir_struct prior to
> calling the method needs to be corrected.

Neither is a particularly good name, but if I have to pick, I'd say
we should keep yours.

drop-or-keep may indicate some are dropped while others are kept but
it does not say what the function is for.  correct is better in the
sense that the readers can guess that there is something wrong in
"dir" at the point and needs correcting by calling the helper, but
still does not convey what exactly is wrong.  How the wrong-ness is
corrected does not have to be explained in the name (i.e. I am
saying that drop-or-keep does not give us much useful information),
but I wish that the name of the helper hinted what kind of wrongness
is there to be corrected to the readers.





Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading

2017-05-25 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I think it's a pointless distraction to start speculating in this
> commit message what we're going to do with --debug it if it ever
> starts emitting some debugging information at pattern execution time.

OK.

> As an aside, I'd very much like to remove both --debug and the
> --and/--or/--all-match, gives some very rough edges in the UI and how
> easy it is to make that feature error or segfault, I suspect you might
> be the only one using it.

I agree that rewriting "grep -e A -e B" to "grep -e A|B" as an
optimization is an interesting possibility to look into, and I can
understand that having to support "--and" and "--not" would
make such an optimization harder to implement. "-e A --and -e B"
must become "-e A.*B|B.*A" and as you get more terms your unified
pattern will grow combinatorial, at which point you would be better
off matching N patterns and combining the result.

Ever saw a user run "ps | grep rogue | grep -v grep" to find a rogue
process to kill?  That would not work if the rogue process's command
line has a word "grep".  Because "git grep" is often run on files in
order to find the location the patterns appear in, "git grep -e
pattern | grep -v unwanted" shares the same issue--the unwanted
pattern may appear in the filename, and the downstream "grep -v" may
filter out a valid hit.  This is why "--not" exists [*1*].  I agree
that emulating it within the same "concatenate patterns into one"
optimization you are envisioning may be hard.

Attempting to optimize "--all-match" would share similar difficulty
with "--and", but your matching now must be done with the entire
buffer and not go line-by-line.  It was meant to make it possible to
say "find commits that avarab@ talks about both regex and log", i.e.

$ git log --author=avarab@ --all-match --grep=log --grep=regex

This is not something you can emulate by piping an output of grep to
another grep.

But none of the above means you have to give up optimizing.  

You can choose not to combine them into a single pattern if certain
constructions are hard, and do only the easy ones.  If you think
that harder combinations are not used very often, the result would
be faster for many cases while not losing useful features, which is
what we want.


[Footnote]

*1* For human consumption, lack of "--not" may not hurt in the sense
that there are workarounds (i.e. you can do without "| grep -v
unwanted" and filter irrelevant ones by eyeballing).  But it is
essential while scripting and trying to be precise.


Feature request: "git status" highlights files that are larger than 500kb.

2017-05-25 Thread Zhomart Mukhamejanov
So it will be easy to track that we don't accidentally commit huge files.

What do you think?


Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-25 Thread Junio C Hamano
Jeff King  writes:

> Anyway. All this has shown me is that it's probably pointless to do this
> timing at all on Linux. Somebody on Windows might get better results.
>
> But regardless, we need to do something. Correctness must trump
> optimizations, and the question is whether we can throw out the whole
> conditional, or if we should just restrict when it kicks in.

Yes.  I personally do not mind going with the simplest approach.
The optimization thing is relatively new and we were perfectly happy
without it before ;-).



Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule

2017-05-25 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> But from what you're saying here that seems like a non-issue, i.e. in
> such a scenario we'd just mirror the original repo[1], change the URL
> in git.git to that, and then anyone could easily use older history
> since it would be pointing to the new mirror.

Those who cloned before such a switch will still have the URL they
chose when they cloned and did "submodule init" on it, I think, so
they need to explicitly choose to switch to the new URL.  So I would
not exactly say "seems like a non-issue"; it certainly is an easy
thing to work around.


Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-25 Thread Junio C Hamano
Jeff King  writes:

> I dunno. I was thinking there might be a quick tweak, but I'm wondering
> if this arcane case is worth the restructuring we'd have to do to
> support it. It only comes up when you've moved the server repo's HEAD to
> an unborn branch _and_ you have other refs (since otherwise we don't
> even send capabilities at all!).

Thanks for digging.  You made me to start doubting it is worth
doing, too.


[PATCH] docs/config.txt: fix indefinite article in core.fileMode description

2017-05-25 Thread SZEDER Gábor
Signed-off-by: SZEDER Gábor 
---
 Documentation/config.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d5..f9adc9afa 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -334,7 +334,7 @@ core.fileMode::
is to be honored.
 +
 Some filesystems lose the executable bit when a file that is
-marked as executable is checked out, or checks out an
+marked as executable is checked out, or checks out a
 non-executable file with executable bit on.
 linkgit:git-clone[1] or linkgit:git-init[1] probe the filesystem
 to see if it handles the executable bit correctly
-- 
2.13.0.35.g14b6294b1



Re: [PATCH v3 2/2] Windows: do not treat a path with backslashes as a remote's nick name

2017-05-25 Thread Junio C Hamano
Johannes Sixt  writes:

>>> So in short:
>>>
>>>   (1) Hannes's patches are good, but they solve a problem that is
>>>   different from what their log messages say; the log message
>>>   needs to be updated;
>
> I do not resend patch 1/2 as it is unchanged in all regards. This 2/2
> changes the justification; patch text is unchanged.

Thanks.  I think this is explained better.  Complaints against
fopen() warnings sounded as if we should avoid attempting to open a
file that may result in _any_ failure, which I felt was misleading,
but it is not a huge issue.

So how do we want to proceed on the point (2), i.e. updating the
"warn on _unexpected_ errors from fopen" series to make it aware of
the EINVAL we can expect on Windows?  My primary question is if all
EINVAL we could ever see on Windows after open/fopen returns an
error is because the pathname the caller gave is not liked by the
filesystem (hence we also know that the path does not exist).

If that is the case, then the "workaround" patch I sent would be an
OK approach (even though I do not know what to write after #ifdef
and I suspect that is not "WINDOWS". We would want to cover the one
you use, the one Dscho releases and possibly the cygwin build).

If we can see EINVAL after open/fopen error that is _not_ expected
and indicates a failure that is worth reporting to the user (just
like we want to report e.g. I/O or permission errors), I think
Windows folks are in a better position than I am to decide between
that approach and a patch at lower level (e.g. teach open/fopen not
to give EINVAL and instead give ENOENT when appropriate).


>  remote.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/remote.c b/remote.c
> index ad6c5424ed..1949882c10 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -645,7 +645,12 @@ static int valid_remote_nick(const char *name)
>  {
>   if (!name[0] || is_dot_or_dotdot(name))
>   return 0;
> - return !strchr(name, '/'); /* no slash */
> +
> + /* remote nicknames cannot contain slashes */
> + while (*name)
> + if (is_dir_sep(*name++))
> + return 0;
> + return 1;
>  }
>  
>  const char *remote_for_branch(struct branch *branch, int *explicit)


Re: Documentation issue: git-stash examples

2017-05-25 Thread Jeff King
On Fri, May 26, 2017 at 08:37:41AM +1200, Adrian Forbes wrote:

> Yep, but how do I send it back to you? As a mail attachment?

Usually you'd send it to the list, with the commit message in the body
and the patch inline below, as generated by "git format-patch".

Unfortunately gmail is notorious for corrupting whitespace in message
bodies. You can use git-send-email to send the mails (but you'll need to
configure your gmail smtp information; see the EXAMPLE section of "git
help send-email").

Alternatively, you can make a pull request to https://github.com/git/git
and then send it to the list via https://submitgit.herokuapp.com/.

-Peff


Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman

2017-05-25 Thread Ævar Arnfjörð Bjarmason
On Thu, May 18, 2017 at 10:13 PM, Ben Peart  wrote:
> This hook script integrates the new fsmonitor capabilities of git with
> the cross platform Watchman file watching service. To use the script:
>
> Download and install Watchman from https://facebook.github.io/watchman/
> and instruct Watchman to watch your working directory for changes
> ('watchman watch-project /usr/src/git').
>
> Rename the sample integration hook from query-fsmonitor.sample to
> query-fsmonitor.
>
> Configure git to use the extension ('git config core.fsmonitor true')
> and optionally turn on the untracked cache for optimal performance
> ('git config core.untrackedcache true').
>
> Signed-off-by: Ben Peart 
> Signed-off-by: Johannes Schindelin 
> ---
>  templates/hooks--query-fsmonitor.sample | 27 +++
>  1 file changed, 27 insertions(+)
>  create mode 100644 templates/hooks--query-fsmonitor.sample
>
> diff --git a/templates/hooks--query-fsmonitor.sample 
> b/templates/hooks--query-fsmonitor.sample
> new file mode 100644
> index 00..4bd22f21d8
> --- /dev/null
> +++ b/templates/hooks--query-fsmonitor.sample
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# An example hook script to integrate Watchman
> +# (https://facebook.github.io/watchman/) with git to provide fast
> +# git status.
> +#
> +# The hook is passed a time_t formatted as a string and outputs to
> +# stdout all files that have been modified since the given time.
> +# Paths must be relative to the root of the working tree and
> +# separated by a single NUL.
> +#
> +# To enable this hook, rename this file to "query-fsmonitor"
> +
> +# Convert unix style paths to escaped Windows style paths
> +case "$(uname -s)" in
> +MINGW*|MSYS_NT*)
> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,,g')"
> +  ;;
> +*)
> +  GIT_WORK_TREE="$PWD"
> +  ;;
> +esac
> +
> +# Query Watchman for all the changes since the requested time
> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, 
> \"fields\":[\"name\"]}]" | \
> +watchman -j | \
> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); 
> die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if 
> defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'

I couldn't get watchman to work for me (built from source, keep
getting [1]), but I hacked up something you can hopefully test &
squash on top of this:

 # Query Watchman for all the changes since the requested time
-echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t,
\"fields\":[\"name\"]}]" | \
-watchman -j | \
-perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("",
<>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if
defined($o-
+echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t,
\"fields\":[\"name\"]}]" |
+   watchman -j |
+   perl -0666 -e '
+   use strict;
+   use warnings;
+
+   my $stdin = <>;
+   die "Watchman: The watchman command returned no
output, error above?\n" if $stdin eq "";
+   die "Watchman: Invalid input: $stdin\n" unless $stdin =~ /^\{/;
+
+   my $json_pkg;
+   eval {
+   require JSON::XS;
+   $json_pkg = "JSON::XS";
+   1;
+   } or do {
+   require JSON::PP;
+   $json_pkg = "JSON::PP";
+   };
+
+   my $o = $json_pkg->new->utf8->decode($stdin);
+   die "Watchman: $o->{error}.\nFalling back to scanning...\n"
+   if $o->{error};
+
+   local $, = "\0";
+   print @{$o->{files}};
+   '

Rationale:

 * We use the much faster JSON::XS if it's installed.
 * We use strict/warnings
 * Micro optimization: Replace joining stdin with an equivalent -0666
   invocation. See "perldoc perlrun".
 * Micro optimization: No need to join up the possibly large list of
   files into one big string, just set $, to \0 and stream out the
   array.
 * Error handling: My watchman is broken (so actually this isn't
   tested), it only spews to stderr and exits. Handle that by checking
   whether stdin is "".

Those changes are available at
https://github.com/avar/git/commits/avar/fsmonitor



1. watchman: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version
`CXXABI_1.3.11' not found (required by watchman)


[PATCH v3 7/7] grep: add support for PCRE v2

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add support for v2 of the PCRE API. This is a new major version of
PCRE that came out in early 2015[1].

The regular expression syntax is the same, but while the API is
similar, pretty much every function is either renamed or takes
different arguments. Thus using it via entirely new functions makes
sense, as opposed to trying to e.g. have one compile_pcre_pattern()
that would call either PCRE v1 or v2 functions.

Git can now be compiled with either USE_LIBPCRE1=YesPlease or
USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a
synonym for the former. Providing both is a compile-time error.

With earlier patches to enable JIT for PCRE v1 the performance of the
release versions of both libraries is almost exactly the same, with
PCRE v2 being around 1% slower.

However after I reported this to the pcre-dev mailing list[2] I got a
lot of help with the API use from Zoltán Herczeg, he subsequently
optimized some of the JIT functionality in v2 of the library.

Running the p7820-grep-engines.sh performance test against the latest
Subversion trunk of both, with both them and git compiled as -O3, and
the test run against linux.git, gives the following results. Just the
/perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 
USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 
USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease 
CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD 
p7820-grep-engines.sh
[...]
Test   HEAD~2HEAD~  
  HEAD


7820.3: perl grep 'how.to'  0.22(0.40+0.48)   
0.22(0.31+0.58) +0.0%   0.22(0.26+0.59) +0.0%
7820.7: perl grep '^how to' 0.27(0.62+0.50)   
0.28(0.60+0.50) +3.7%   0.22(0.25+0.60) -18.5%
7820.11: perl grep '[how] to'   0.33(0.92+0.47)   
0.33(0.94+0.45) +0.0%   0.25(0.42+0.51) -24.2%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.35(1.08+0.46)   
0.35(1.12+0.41) +0.0%   0.25(0.52+0.50) -28.6%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.30(0.78+0.51)   
0.30(0.86+0.42) +0.0%   0.25(0.29+0.54) -16.7%

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with
JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine
mentioning p7820-grep-engines.sh for more details on the test setup.

For ease of readability, a different run just of HEAD~ (PCRE v1 with
JIT v.s. PCRE v2), again with just the /perl/ tests shown:

Test   HEAD~ HEAD

---
7820.3: perl grep 'how.to'  0.23(0.41+0.47)   
0.23(0.26+0.59) +0.0%
7820.7: perl grep '^how to' 0.27(0.64+0.47)   
0.23(0.28+0.56) -14.8%
7820.11: perl grep '[how] to'   0.34(0.95+0.44)   
0.25(0.38+0.56) -26.5%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   0.34(1.07+0.46)   
0.24(0.52+0.49) -29.4%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.30(0.81+0.46)   
0.22(0.33+0.54) -26.7%

I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead,
when it does it's around 20% faster.

A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3)
the compiled pattern can be shared between threads, but not some of
the JIT context, however the grep threading support does all pattern &
JIT compilation in separate threads, so this code doesn't need to
concern itself with thread safety.

See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the
initial addition of PCRE v1. This change follows some of the same
patterns it did (and which were discussed on list at the time),
e.g. mocking up types with typedef instead of ifdef-ing them out when
USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the
program, but makes the code look nicer.

1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html
2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  36 ++
 configure.ac  |  77 +-
 grep.c| 147 ++
 grep.h|  17 +++
 t/test-lib.sh |   2 +-
 5 files changed, 258 insertions(+), 21 deletions(-)

diff --git a/Makefile b/Makefile
index a79274e5e6..e5a2

[PATCH v3 4/7] grep: add support for the PCRE v1 JIT API

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change the grep PCRE v1 code to use JIT when available. When PCRE
support was initially added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into
8.20 released on 2011-10-21.

Enabling JIT support usually improves performance by more than
40%. The pattern compilation times are relatively slower, but those
relative numbers are tiny, and are easily made back in all but the
most trivial cases of grep. Detailed benchmarks & overview of
compilation times is at: http://sljit.sourceforge.net/pcre.html

With this change the difference in a t/perf/p7820-grep-engines.sh run
is, with just the /perl/ tests shown:

$ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc 
NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst 
LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD 
p7820-grep-engines.sh
Test   HEAD~ HEAD

---
7820.3: perl grep 'how.to'  0.35(1.11+0.43)   
0.23(0.42+0.46) -34.3%
7820.7: perl grep '^how to' 0.64(2.71+0.36)   
0.27(0.66+0.44) -57.8%
7820.11: perl grep '[how] to'   0.63(2.51+0.42)   
0.33(0.98+0.39) -47.6%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   1.17(5.61+0.35)   
0.34(1.08+0.46) -70.9%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.43(1.52+0.44)   
0.30(0.88+0.42) -30.2%

The conditional support for JIT is implemented as suggested in the
pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's
not present.

The implementation is relatively verbose because even if
PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine
if the JIT is available, and if so the faster pcre_jit_exec() function
should be called instead of pcre_exec(), and a different (but not
complimentary!) function needs to be called to free pcre1_extra_info.

There's no graceful fallback if pcre_jit_stack_alloc() fails under
PCRE_CONFIG_JIT, instead the program will simply abort. I don't think
this is worth handling gracefully, it'll only fail in cases where
malloc() doesn't work, in which case we're screwed anyway.

That there's no assignment of `p->pcre1_jit_on = 0` when
PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat()
function allocates the grep_pat allocates it with calloc(), so it's
guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined.

I you're bisecting and find this change, check that your PCRE isn't
older than 8.32. This change intentionally broke really old versions
of PCRE, but that's fixed in follow-up commits.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 40 
 grep.h |  6 ++
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index d03d424e5c..ffe95995ee 100644
--- a/grep.c
+++ b/grep.c
@@ -365,9 +365,22 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 
PCRE_STUDY_JIT_COMPILE, &error);
if (!p->pcre1_extra_info && error)
die("%s", error);
+
+#ifdef PCRE_CONFIG_JIT
+   pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
+   if (p->pcre1_jit_on == 1) {
+   p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
+   if (!p->pcre1_jit_stack)
+   die("Couldn't allocate PCRE JIT stack");
+   pcre_assign_jit_stack(p->pcre1_extra_info, NULL, 
p->pcre1_jit_stack);
+   } else if (p->pcre1_jit_on != 0) {
+   die("BUG: The pcre1_jit_on variable should be 0 or 1, not %d",
+   p->pcre1_jit_on);
+   }
+#endif
 }
 
 static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
@@ -378,8 +391,19 @@ static int pcre1match(struct grep_pat *p, const char 
*line, const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
-   0, flags, ovector, ARRAY_SIZE(ovector));
+#ifdef PCRE_CONFIG_JIT
+   if (p->pcre1_jit_on) {
+   ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+   eol - line, 0, flags, ovector,
+   ARRAY_SIZE(ovector), p->pcre1_jit_stack);
+   } else
+#endif
+   {
+   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+   eol - line, 0, flags, ovector,
+   ARRAY_SIZE(ovector));
+   }
+
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)

[PATCH v3 6/7] grep: un-break building with PCRE < 8.20

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.20.

The 8.20 release was the first release to have JIT & pcre_jit_stack in
the headers, so a mock type needs to be provided for it on those
releases.

Now git should compile with all PCRE versions that it supported before
my JIT change.

I've tested it as far back as version 7.5 released on 2008-01-10, once
I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1,
and I couldn't be bothered to anything older than 7.5 as I'm confident
that if the build breaks on those older versions it's not because of
my JIT change.

See the "un-break" change in this series ("grep: un-break building
with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the
main PCRE JIT commit.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/grep.h b/grep.h
index 3b948d9155..ce90969736 100644
--- a/grep.h
+++ b/grep.h
@@ -11,6 +11,9 @@
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
+#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
+typedef int pcre_jit_stack;
+#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-- 
2.13.0.303.g4ebf302169



[PATCH v3 5/7] grep: un-break building with PCRE < 8.32

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Amend my change earlier in this series ("grep: add support for the
PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1
versions earlier than 8.32.

The JIT support was added in version 8.20 released on 2011-10-21, but
it wasn't until 8.32 released on 2012-11-30 that the fast code path to
use the JIT via pcre_jit_exec() was added[1] (see also [2]).

This means that versions 8.20 through 8.31 could still use the JIT,
but supporting it on those versions would add to the already verbose
macro soup around JIT support it, and I don't expect that the use-case
of compiling a brand new git against a 5 year old PCRE is particularly
common, and if someone does that they can just get the existing
pre-JIT slow codepath.

So just take the easy way out and disable the JIT on any version older
than 8.32.

The reason this change isn't part of the initial change PCRE JIT
support is to have a cleaner history showing which parts of the
implementation are only used for ancient PCRE versions. This also
makes it easier to revert this change if we ever decide to stop
supporting those old versions.

1. http://www.pcre.org/original/changelog.txt ("28. Introducing a
   native interface for JIT. Through this interface, the
   compiled[...]")
2. https://bugs.exim.org/show_bug.cgi?id=2121

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 6 +++---
 grep.h | 5 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/grep.c b/grep.c
index ffe95995ee..19fa67c34c 100644
--- a/grep.c
+++ b/grep.c
@@ -369,7 +369,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
if (!p->pcre1_extra_info && error)
die("%s", error);
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_USE_JIT
pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
if (p->pcre1_jit_on == 1) {
p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
@@ -391,7 +391,7 @@ static int pcre1match(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_USE_JIT
if (p->pcre1_jit_on) {
ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
eol - line, 0, flags, ovector,
@@ -418,7 +418,7 @@ static int pcre1match(struct grep_pat *p, const char *line, 
const char *eol,
 static void free_pcre1_regexp(struct grep_pat *p)
 {
pcre_free(p->pcre1_regexp);
-#ifdef PCRE_CONFIG_JIT
+#ifdef GIT_PCRE1_USE_JIT
if (p->pcre1_jit_on) {
pcre_free_study(p->pcre1_extra_info);
pcre_jit_stack_free(p->pcre1_jit_stack);
diff --git a/grep.h b/grep.h
index 14f47189f9..3b948d9155 100644
--- a/grep.h
+++ b/grep.h
@@ -3,6 +3,11 @@
 #include "color.h"
 #ifdef USE_LIBPCRE1
 #include 
+#ifdef PCRE_CONFIG_JIT
+#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
+#define GIT_PCRE1_USE_JIT
+#endif
+#endif
 #ifndef PCRE_STUDY_JIT_COMPILE
 #define PCRE_STUDY_JIT_COMPILE 0
 #endif
-- 
2.13.0.303.g4ebf302169



[PATCH v3 3/7] log: add -P as a synonym for --perl-regexp

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a short -P option as a synonym for the longer --perl-regexp, for
consistency with the options the corresponding grep invocations
accept.

This was intentionally omitted in commit 727b6fc3ed ("log --grep:
accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
future use.

Make it consistent with "grep" rather than to keep it open for future
use, and to avoid the confusion of -P meaning different things for
grep & log, as is the case with the -G option.

As noted in the aforementioned commit the --basic-regexp option can't
have a corresponding -G argument, as the log command already uses that
for -G.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/rev-list-options.txt |  1 +
 revision.c |  2 +-
 t/t4202-log.sh | 12 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a46f70c2b1..9c44eae55d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -91,6 +91,7 @@ endif::git-rev-list[]
Consider the limiting patterns to be fixed strings (don't interpret
pattern as a regular expression).
 
+-P::
 --perl-regexp::
Consider the limiting patterns to be Perl-compatible regular
expressions.
diff --git a/revision.c b/revision.c
index 4883cdd2d0..60329da1bd 100644
--- a/revision.c
+++ b/revision.c
@@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED;
-   } else if (!strcmp(arg, "--perl-regexp")) {
+   } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE;
} else if (!strcmp(arg, "--all-match")) {
revs->grep_filter.all_match = 1;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index c44c4337f8..e6050faa14 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -404,8 +404,20 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
--grep="(1|2)" >actual.fixed.short-arg &&
git log --pretty=tformat:%s -E \
--grep="\|2" >actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   git log --pretty=tformat:%s -P \
+   --grep="[\d]\|" >actual.perl.short-arg
+   else
+   test_must_fail git log -P \
+   --grep="[\d]\|"
+   fi &&
test_cmp expect.fixed actual.fixed.short-arg &&
test_cmp expect.extended actual.extended.short-arg &&
+   if test_have_prereq PCRE
+   then
+   test_cmp expect.perl actual.perl.short-arg
+   fi &&
 
git log --pretty=tformat:%s --fixed-strings \
--grep="(1|2)" >actual.fixed.long-arg &&
-- 
2.13.0.303.g4ebf302169



[PATCH v3 1/7] grep: don't redundantly compile throwaway patterns under threading

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change the pattern compilation logic under threading so that grep
doesn't compile a pattern it never ends up using on the non-threaded
code path, only to compile it again N times for N threads which will
each use their own copy, ignoring the initially compiled pattern.

This redundant compilation dates back to the initial introduction of
the threaded grep in commit 5b594f457a ("Threaded grep",
2010-01-25).

There was never any reason for doing this redundant work other than an
oversight in the initial commit. Jeff King suggested on-list in
<20170414212325.fefrl3qdjigwy...@sigill.intra.peff.net> that this
might be needed to check the pattern for sanity before threaded
execution commences.

That's not the case. The pattern is compiled under threading in
start_threads() before any concurrent execution has started by calling
pthread_create(), so if the pattern contains an error we still do the
right thing. I.e. die with one error before any threaded execution has
commenced, instead of e.g. spewing out an error for each N threads,
which could be a regression a change like this might inadvertently
introduce.

This change is not meant as an optimization, any performance gains
from this are in the hundreds to thousands of nanoseconds at most. If
we wanted more performance here we could just re-use the compiled
patterns in multiple threads (regcomp(3) is thread-safe), or partially
re-use them and the associated structures in the case of later PCRE
JIT changes.

Rather, it's just to make the code easier to reason about. It's
confusing to debug this under threading & non-threading when the
threading codepaths redundantly compile a pattern which is never used.

The reason the patterns are recompiled is as a side-effect of
duplicating the whole grep_opt structure, which is not thread safe,
writable, and munged during execution. The grep_opt structure then
points to the grep_pat structure where pattern or patterns are stored.

I looked into e.g. splitting the API into some "do & alloc threadsafe
stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the
execution time of grep_opt_dup() & pattern compilation is trivial
compared to actually executing the grep, so there was no point. Even
with the more expensive JIT changes to follow the most expensive PCRE
patterns take something like 0.0X milliseconds to compile at most[1].

The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach
--debug option to dump the parse tree", 2012-09-13) still works
properly with this change. It only emits debugging info during pattern
compilation, which is now dumped by the pattern compiled just before
the first thread is started.

1. http://sljit.sourceforge.net/pcre.html

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index b1095362fb..12e62fcbf3 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt)
int err;
struct grep_opt *o = grep_opt_dup(opt);
o->output = strbuf_out;
-   o->debug = 0;
+   if (i)
+   o->debug = 0;
compile_grep_patterns(o);
err = pthread_create(&threads[i], NULL, run, o);
 
@@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
if (!opt.fixed && opt.ignore_case)
opt.regflags |= REG_ICASE;
 
-   compile_grep_patterns(&opt);
-
/*
 * We have to find "--" in a separate pass, because its presence
 * influences how we will parse arguments that come before it.
@@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = 0;
 #endif
 
+   if (!num_threads)
+   /*
+* The compiled patterns on the main path are only
+* used when not using threading. Otherwise
+* start_threads() below calls compile_grep_patterns()
+* for each thread.
+*/
+   compile_grep_patterns(&opt);
+
 #ifndef NO_PTHREADS
if (num_threads) {
if (!(opt.name_only || opt.unmatch_name_only || opt.count)
-- 
2.13.0.303.g4ebf302169



[PATCH v3 2/7] grep: skip pthreads overhead when using one thread

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Skip the administrative overhead of using pthreads when only using one
thread. Instead take the non-threaded path which would be taken under
NO_PTHREADS.

The threading support was initially added in commit
5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time
number of 8 threads. Later the number of threads was made configurable
in commit 89f09dd34e ("grep: add --threads= option and
grep.threads configuration", 2015-12-15).

That change did not add any special handling for --threads=1. Now we
take a slightly faster path by skipping thread handling entirely when
1 thread is requested.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12e62fcbf3..bd008cb100 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
num_threads = GREP_NUM_THREADS_DEFAULT;
else if (num_threads < 0)
die(_("invalid number of threads specified (%d)"), num_threads);
+   if (num_threads == 1)
+   num_threads = 0;
 #else
if (num_threads)
warning(_("no threads support, ignoring --threads"));
-- 
2.13.0.303.g4ebf302169



[PATCH v3 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes

2017-05-25 Thread Ævar Arnfjörð Bjarmason
See <20170523192453.14172-1-ava...@gmail.com> for v2
(https://public-inbox.org/git/20170523192453.14172-1-ava...@gmail.com/).

This on top of "[PATCH v4 00/31] Easy to review grep & pre-PCRE
changes".

This hopefully addresses all the comments I had on v2. Comments below:

Ævar Arnfjörð Bjarmason (7):
  grep: don't redundantly compile throwaway patterns under threading
  grep: skip pthreads overhead when using one thread
  log: add -P as a synonym for --perl-regexp

No changes.

  grep: add support for the PCRE v1 JIT API

 * Simplify logic around canjit variable.

 * s/BUG: // for things that aren't internal bugs (but keep it for one
   that's an actual bug if it happens...)

 * Simplify syntax around `#ifdef PCRE_CONFIG_JIT` blocks.

  grep: un-break building with PCRE < 8.32

 * Reword commit message for less confusion.

 * Macro name: s/GIT_PCRE1_CAN_DO_MODERN_JIT/GIT_PCRE1_USE_JIT/g

  grep: un-break building with PCRE < 8.20

No changes.

  grep: add support for PCRE v2

 * Makefile: Clarify comment about what LIBPCREDIR=* does now.

 * Makefile: One use of USE_LIBPCRE is now USE_LIBPCRE1. It's
   logically the same, but less confusing now.

 * grep.c: Fix up comment syntax

 * grep.c: Same canjit & s/BUG: // bug fixes as noted for PCRE v1.

 Documentation/rev-list-options.txt |   1 +
 Makefile   |  36 +--
 builtin/grep.c |  16 +++-
 configure.ac   |  77 ---
 grep.c | 187 -
 grep.h |  31 ++
 revision.c |   2 +-
 t/t4202-log.sh |  12 +++
 t/test-lib.sh  |   2 +-
 9 files changed, 335 insertions(+), 29 deletions(-)

-- 
2.13.0.303.g4ebf302169



[PATCH v4 31/31] grep: assert that threading is enabled when calling grep_{lock,unlock}

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change the grep_{lock,unlock} functions to assert that num_threads is
true, instead of only locking & unlocking the pthread mutex lock when
it is.

These functions are never called when num_threads isn't true, this
logic has gone through multiple iterations since the initial
introduction of grep threading in commit 5b594f457a ("Threaded grep",
2010-01-25), but ever since then they'd only be called if num_threads
was true, so this check made the code confusing to read.

Replace the check with an assertion, so that it's clear to the reader
that this code path is never taken unless we're spawning threads.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3c721b75a5..b1095362fb 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -73,14 +73,14 @@ static pthread_mutex_t grep_mutex;
 
 static inline void grep_lock(void)
 {
-   if (num_threads)
-   pthread_mutex_lock(&grep_mutex);
+   assert(num_threads);
+   pthread_mutex_lock(&grep_mutex);
 }
 
 static inline void grep_unlock(void)
 {
-   if (num_threads)
-   pthread_mutex_unlock(&grep_mutex);
+   assert(num_threads);
+   pthread_mutex_unlock(&grep_mutex);
 }
 
 /* Signalled when a new work_item is added to todo. */
-- 
2.13.0.303.g4ebf302169



[PATCH v4 29/31] pack-objects: fix buggy warning about threads

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to
re-using the delta_search_threads variable for both the state of the
"pack.threads" config & the --threads option, setting "pack.threads"
but not supplying --threads would trigger the warning for both
"pack.threads" & --threads.

Solve this bug by resetting the delta_search_threads variable in
git_pack_config(), it might then be set by --threads again and be
subsequently warned about, as the test I'm changing here asserts.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/pack-objects.c | 4 +++-
 t/t5300-pack-object.sh | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9b4ba8a80d..efa21a15dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2472,8 +2472,10 @@ static int git_pack_config(const char *k, const char *v, 
void *cb)
die("invalid number of threads specified (%d)",
delta_search_threads);
 #ifdef NO_PTHREADS
-   if (delta_search_threads != 1)
+   if (delta_search_threads != 1) {
warning("no threads support, ignoring %s", k);
+   delta_search_threads = 0;
+   }
 #endif
return 0;
}
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 6ed23ee1d2..9c68b99251 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -447,7 +447,7 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects 
--threads=N or pack.
 
git -c pack.threads=2 pack-objects --stdout --all /dev/null 
2>err &&
grep ^warning: err >warnings &&
-   test_must_fail test_line_count = 1 warnings &&
+   test_line_count = 1 warnings &&
grep -F "no threads support, ignoring pack.threads" err &&
 
git -c pack.threads=2 pack-objects --threads=4 --stdout --all 
/dev/null 2>err &&
-- 
2.13.0.303.g4ebf302169



[PATCH v4 26/31] grep: move is_fixed() earlier to avoid forward declaration

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Move the is_fixed() function which are currently only used in
compile_regexp() earlier so it can be used in the PCRE family of
functions in a later change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/grep.c b/grep.c
index 2d54baeaa3..d03d424e5c 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int is_fixed(const char *s, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (is_regex_special(s[i]))
+   return 0;
+   }
+
+   return 1;
+}
+
 static int has_null(const char *s, size_t len)
 {
/*
@@ -402,18 +414,6 @@ static void free_pcre1_regexp(struct grep_pat *p)
 }
 #endif /* !USE_LIBPCRE1 */
 
-static int is_fixed(const char *s, size_t len)
-{
-   size_t i;
-
-   for (i = 0; i < len; i++) {
-   if (is_regex_special(s[i]))
-   return 0;
-   }
-
-   return 1;
-}
-
 static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
struct strbuf sb = STRBUF_INIT;
-- 
2.13.0.303.g4ebf302169



[PATCH v4 27/31] test-lib: add a PTHREADS prerequisite

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a PTHREADS prerequisite which is false when git is compiled with
NO_PTHREADS=YesPlease.

There's lots of custom code that runs when threading isn't available,
but before this prerequisite there was no way to test it.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  | 1 +
 t/README  | 4 
 t/test-lib.sh | 1 +
 3 files changed, 6 insertions(+)

diff --git a/Makefile b/Makefile
index 374fbc7e58..a79274e5e6 100644
--- a/Makefile
+++ b/Makefile
@@ -2242,6 +2242,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
@echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
+   @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' 
>>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
@echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+
diff --git a/t/README b/t/README
index a90cb62583..2f95860369 100644
--- a/t/README
+++ b/t/README
@@ -817,6 +817,10 @@ use these, and "test_set_prereq" for how to define your 
own.
Test is run on a filesystem which converts decomposed utf-8 (nfd)
to precomposed utf-8 (nfc).
 
+ - PTHREADS
+
+   Git wasn't compiled with NO_PTHREADS=YesPlease.
+
 Tips for Writing Tests
 --
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1d0f636cbd..43529451f9 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1013,6 +1013,7 @@ esac
 
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
+test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
 test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
-- 
2.13.0.303.g4ebf302169



[PATCH v4 21/31] grep: catch a missing enum in switch statement

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a die(...) to a default case for the switch statement selecting
between grep pattern types under --recurse-submodules.

Normally this would be caught by -Wswitch, but the grep_pattern_type
type is converted to int by going through parse_options(). Changing
the argument type passed to compile_submodule_options() won't work,
the value will just get coerced. The -Wswitch-default warning will
warn about it, but that produces a lot of noise across the codebase,
this potential issue would be drowned in that noise.

Thus catching this at runtime is the least bad option. This won't ever
trigger in practice, but if a new pattern type were to be added this
catches an otherwise silent bug during development.

See commit 0281e487fd ("grep: optionally recurse into submodules",
2016-12-16) for the initial addition of this code.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 3ffb5b4e81..a191e2976b 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt 
*opt,
break;
case GREP_PATTERN_TYPE_UNSPECIFIED:
break;
+   default:
+   die("BUG: Added a new grep pattern type without updating switch 
statement");
}
 
for (pattern = opt->pattern_list; pattern != NULL;
-- 
2.13.0.303.g4ebf302169



[PATCH v4 22/31] grep: remove redundant regflags assignments

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Remove redundant assignments to the "regflags" variable. This variable
is only used set under GREP_PATTERN_TYPE_ERE, so there's no need to
un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}.

Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon
seeing "-G" on the command line and flipped the bit on upon seeing
"-E", but I think that was perfectly sensible and it would have been a
bug if we didn't.  They were part of the command line parsing that
could have seen "-E" on the command line earlier.

When cca2c172 ("git-grep: do not die upon -F/-P when
grep.extendedRegexp is set.", 2011-05-09) switched the command line
parsing to "read into a 'tentatively this is what we saw the last'
variable and then finally commit just once", we didn't touch
opt.regflags for PCRE and FIXED, but we still had to flip regflags
between BRE and ERE, because parsing of grep.extendedregexp
configuration variable directly touched opt.regflags back then, which
was done by b22520a3 ("grep: allow -E and -n to be turned on by
default via configuration", 2011-03-30).

When 84befcd0 ("grep: add a grep.patternType configuration setting",
2012-08-03) introduced extended_regexp_option field, we stopped
flipping regflags while reading the configuration, and that was when
we should have noticed and stopped dropping REG_EXTENDED bit in the
"now we can commit what type to use" helper function.

There is no reason to do this anymore, so stop doing it, more to
reduce "wait this is used under fixed/BRE/PCRE how?" confusion when
reading the code, than to to save ourselves trivial CPU cycles by
removing one assignment.

1. "built-in "git grep"", 2006-04-30.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/grep.c b/grep.c
index 47cee45067..bf6c2494fd 100644
--- a/grep.c
+++ b/grep.c
@@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_ERE:
@@ -191,13 +190,11 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
opt->pcre = 0;
-   opt->regflags &= ~REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_PCRE:
opt->fixed = 0;
opt->pcre = 1;
-   opt->regflags &= ~REG_EXTENDED;
break;
}
 }
@@ -415,10 +412,9 @@ static void compile_fixed_regexp(struct grep_pat *p, 
struct grep_opt *opt)
 {
struct strbuf sb = STRBUF_INIT;
int err;
-   int regflags;
+   int regflags = opt->regflags;
 
basic_regex_quote_buf(&sb, p->pattern);
-   regflags = opt->regflags & ~REG_EXTENDED;
if (opt->ignore_case)
regflags |= REG_ICASE;
err = regcomp(&p->regexp, sb.buf, regflags);
-- 
2.13.0.303.g4ebf302169



[PATCH v4 30/31] grep: given --threads with NO_PTHREADS=YesPlease, warn

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a warning about missing thread support when grep.threads or
--threads is set to a non 0 (default) or 1 (no parallelism) value
under NO_PTHREADS=YesPlease.

This is for consistency with the index-pack & pack-objects commands,
which also take a --threads option & are configurable via
pack.threads, and have long warned about the same under
NO_PTHREADS=YesPlease.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/grep.c  | 13 +
 t/t7810-grep.sh | 18 ++
 2 files changed, 31 insertions(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index a191e2976b..3c721b75a5 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -289,6 +289,17 @@ static int grep_cmd_config(const char *var, const char 
*value, void *cb)
if (num_threads < 0)
die(_("invalid number of threads specified (%d) for 
%s"),
num_threads, var);
+#ifdef NO_PTHREADS
+   else if (num_threads && num_threads != 1) {
+   /*
+* TRANSLATORS: %s is the configuration
+* variable for tweaking threads, currently
+* grep.threads
+*/
+   warning(_("no threads support, ignoring %s"), var);
+   num_threads = 0;
+   }
+#endif
}
 
return st;
@@ -1229,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
else if (num_threads < 0)
die(_("invalid number of threads specified (%d)"), num_threads);
 #else
+   if (num_threads)
+   warning(_("no threads support, ignoring --threads"));
num_threads = 0;
 #endif
 
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 561709ef6a..f106387820 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -791,6 +791,24 @@ do
"
 done
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'grep --threads=N or 
pack.threads=N warns when no pthreads' '
+   git grep --threads=2 Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   git -c grep.threads=2 grep Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring grep.threads" err &&
+   git -c grep.threads=2 grep --threads=4 Hello hello_world 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   grep -F "no threads support, ignoring grep.threads" err &&
+   git -c grep.threads=0 grep --threads=0 Hello hello_world 2>err &&
+   test_line_count = 0 err
+'
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
mkdir -p s &&
(
-- 
2.13.0.303.g4ebf302169



[PATCH v4 28/31] pack-objects & index-pack: add test for --threads warning

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a test for the warning that's emitted when --threads or
pack.threads is provided under NO_PTHREADS=YesPlease. This uses the
new PTHREADS prerequisite.

The assertion for C_LOCALE_OUTPUT in the latter test is currently
redundant, since unlike index-pack the pack-objects warnings aren't
i18n'd. However they might be changed to be i18n'd in the future, and
there's no harm in future-proofing the test.

There's an existing bug in the implementation of pack-objects which
this test currently tests for as-is. Details about the bug & the fix
are included in a follow-up change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5300-pack-object.sh | 36 
 1 file changed, 36 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 43a672c345..6ed23ee1d2 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -421,6 +421,42 @@ test_expect_success 'index-pack  works in non-repo' '
test_path_is_file foo.idx
 '
 
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'index-pack --threads=N or 
pack.threads=N warns when no pthreads' '
+   test_must_fail git index-pack --threads=2 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads=2" err &&
+
+   test_must_fail git -c pack.threads=2 index-pack 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring pack.threads" err &&
+
+   test_must_fail git -c pack.threads=2 index-pack --threads=4 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads=4" err &&
+   grep -F "no threads support, ignoring pack.threads" err
+'
+
+test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or 
pack.threads=N warns when no pthreads' '
+   git pack-objects --threads=2 --stdout --all /dev/null 2>err 
&&
+   grep ^warning: err >warnings &&
+   test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+
+   git -c pack.threads=2 pack-objects --stdout --all /dev/null 
2>err &&
+   grep ^warning: err >warnings &&
+   test_must_fail test_line_count = 1 warnings &&
+   grep -F "no threads support, ignoring pack.threads" err &&
+
+   git -c pack.threads=2 pack-objects --threads=4 --stdout --all 
/dev/null 2>err &&
+   grep ^warning: err >warnings &&
+   test_line_count = 2 warnings &&
+   grep -F "no threads support, ignoring --threads" err &&
+   grep -F "no threads support, ignoring pack.threads" err
+'
+
 #
 # WARNING!
 #
-- 
2.13.0.303.g4ebf302169



[PATCH v4 25/31] grep: change internal *pcre* variable & function names to be *pcre1*

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change the internal PCRE variable & function names to have a "1"
suffix. This is for preparation for libpcre2 support, where having
non-versioned names would be confusing.

An earlier change in this series ("grep: change the internal PCRE
macro names to be PCRE1", 2017-04-07) elaborates on the motivations
behind this change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 52 ++--
 grep.h |  8 
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/grep.c b/grep.c
index 7a3858a1c3..2d54baeaa3 100644
--- a/grep.c
+++ b/grep.c
@@ -178,23 +178,23 @@ static void grep_set_pattern_type_option(enum 
grep_pattern_type pattern_type, st
 
case GREP_PATTERN_TYPE_BRE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
case GREP_PATTERN_TYPE_ERE:
opt->fixed = 0;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
opt->regflags |= REG_EXTENDED;
break;
 
case GREP_PATTERN_TYPE_FIXED:
opt->fixed = 1;
-   opt->pcre = 0;
+   opt->pcre1 = 0;
break;
 
case GREP_PATTERN_TYPE_PCRE:
opt->fixed = 0;
-   opt->pcre = 1;
+   opt->pcre1 = 1;
break;
}
 }
@@ -334,7 +334,7 @@ static int has_null(const char *s, size_t len)
 }
 
 #ifdef USE_LIBPCRE1
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt 
*opt)
 {
const char *error;
int erroffset;
@@ -342,23 +342,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const 
struct grep_opt *opt)
 
if (opt->ignore_case) {
if (has_non_ascii(p->pattern))
-   p->pcre_tables = pcre_maketables();
+   p->pcre1_tables = pcre_maketables();
options |= PCRE_CASELESS;
}
if (is_utf8_locale() && has_non_ascii(p->pattern))
options |= PCRE_UTF8;
 
-   p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
- p->pcre_tables);
-   if (!p->pcre_regexp)
+   p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
+ p->pcre1_tables);
+   if (!p->pcre1_regexp)
compile_regexp_failed(p, error);
 
-   p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error);
-   if (!p->pcre_extra_info && error)
+   p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error);
+   if (!p->pcre1_extra_info && error)
die("%s", error);
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
regmatch_t *match, int eflags)
 {
int ovector[30], ret, flags = 0;
@@ -366,7 +366,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
if (eflags & REG_NOTBOL)
flags |= PCRE_NOTBOL;
 
-   ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line,
+   ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
0, flags, ovector, ARRAY_SIZE(ovector));
if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
die("pcre_exec failed with error code %d", ret);
@@ -379,25 +379,25 @@ static int pcrematch(struct grep_pat *p, const char 
*line, const char *eol,
return ret;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
-   pcre_free(p->pcre_regexp);
-   pcre_free(p->pcre_extra_info);
-   pcre_free((void *)p->pcre_tables);
+   pcre_free(p->pcre1_regexp);
+   pcre_free(p->pcre1_extra_info);
+   pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
-static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
+static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt 
*opt)
 {
die("cannot use Perl-compatible regexes when not compiled with 
USE_LIBPCRE");
 }
 
-static int pcrematch(struct grep_pat *p, const char *line, const char *eol,
+static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
regmatch_t *match, int eflags)
 {
return 1;
 }
 
-static void free_pcre_regexp(struct grep_pat *p)
+static void free_pcre1_regexp(struct grep_pat *p)
 {
 }
 #endif /* !USE_LIBPCRE1 */
@@ -479,8 +479,8 @@ static void compile_regexp(struct grep_pat *p, struct 
grep_opt *opt)
return;
}
 
-   if (opt->pcre) {
-   compile_pcre_regexp(p, opt);
+   if (opt->pcre1) {
+   compile_pcre1_regexp(p, opt);
return;
}
 
@@ -836,8 +836,8 @@ v

[PATCH v4 23/31] grep: factor test for \0 in grep patterns into a function

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Factor the test for \0 in grep patterns into a function. Since commit
9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a
\0 is considered fixed as regcomp() can't handle it.

This change makes later changes that make use of either has_null() or
is_fixed() (but not both) smaller.

While I'm at it make the comment conform to the style guide, i.e. add
an opening "/*\n".

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 grep.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/grep.c b/grep.c
index bf6c2494fd..18306bc605 100644
--- a/grep.c
+++ b/grep.c
@@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct 
grep_pat *p,
die("%s'%s': %s", where, p->pattern, error);
 }
 
+static int has_null(const char *s, size_t len)
+{
+   /*
+* regcomp cannot accept patterns with NULs so when using it
+* we consider any pattern containing a NUL fixed.
+*/
+   if (memchr(s, 0, len))
+   return 1;
+
+   return 0;
+}
+
 #ifdef USE_LIBPCRE
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
@@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len)
 {
size_t i;
 
-   /* regcomp cannot accept patterns with NULs so we
-* consider any pattern containing a NUL fixed.
-*/
-   if (memchr(s, 0, len))
-   return 1;
-
for (i = 0; i < len; i++) {
if (is_regex_special(s[i]))
return 0;
@@ -451,7 +457,9 @@ static void compile_regexp(struct grep_pat *p, struct 
grep_opt *opt)
 * simple string match using kws.  p->fixed tells us if we
 * want to use kws.
 */
-   if (opt->fixed || is_fixed(p->pattern, p->patternlen))
+   if (opt->fixed ||
+   has_null(p->pattern, p->patternlen) ||
+   is_fixed(p->pattern, p->patternlen))
p->fixed = !icase || ascii_only;
else
p->fixed = 0;
-- 
2.13.0.303.g4ebf302169



[PATCH v4 18/31] perf: add a comparison test of grep regex engines with -F

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a performance comparison test of grep regex engines given fixed
strings.

The current logic in compile_regexp() ignores the engine parameter and
uses kwset() to search for these, so this test shows no difference
between engines right now:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p7821-grep-engines-fixed.sh
[...]
Test this tree

7821.1: fixed grep int   0.56(1.67+0.68)
7821.2: basic grep int   0.57(1.70+0.57)
7821.3: extended grep int0.59(1.76+0.51)
7821.4: perl grep int1.08(1.71+0.55)
7821.6: fixed grep uncommon  0.23(0.55+0.50)
7821.7: basic grep uncommon  0.24(0.55+0.50)
7821.8: extended grep uncommon   0.26(0.55+0.52)
7821.9: perl grep uncommon   0.24(0.58+0.47)
7821.11: fixed grep æ0.36(1.30+0.42)
7821.12: basic grep æ0.36(1.32+0.40)
7821.13: extended grep æ 0.38(1.30+0.42)
7821.14: perl grep æ 0.35(1.24+0.48)

Only when run with -i via GIT_PERF_7821_GREP_OPTS=' -i' do we avoid
avoid going through the same kwset.[ch] codepath, see the "Even when
-F..."  comment in grep.c. This only kicks for the non-ASCII case:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_7821_GREP_OPTS=' -i' ./run p7821-grep-engines-fixed.sh
[...]
Testthis tree
---
7821.1: fixed grep -i int   0.62(2.10+0.57)
7821.2: basic grep -i int   0.68(1.90+0.61)
7821.3: extended grep -i int0.78(1.94+0.57)
7821.4: perl grep -i int0.98(1.78+0.74)
7821.6: fixed grep -i uncommon  0.24(0.44+0.64)
7821.7: basic grep -i uncommon  0.25(0.56+0.54)
7821.8: extended grep -i uncommon   0.27(0.62+0.45)
7821.9: perl grep -i uncommon   0.24(0.59+0.49)
7821.11: fixed grep -i æ0.30(0.96+0.39)
7821.12: basic grep -i æ0.27(0.92+0.44)
7821.13: extended grep -i æ 0.28(0.90+0.46)
7821.14: perl grep -i æ 0.28(0.74+0.49)

I'm planning to change how fixed-string searching happens. This test
gives a baseline for comparing performance before & after any such
change.

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p7821-grep-engines-fixed.sh | 41 ++
 1 file changed, 41 insertions(+)
 create mode 100755 t/perf/p7821-grep-engines-fixed.sh

diff --git a/t/perf/p7821-grep-engines-fixed.sh 
b/t/perf/p7821-grep-engines-fixed.sh
new file mode 100755
index 00..c7ef1e198f
--- /dev/null
+++ b/t/perf/p7821-grep-engines-fixed.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines with -F
+
+Set GIT_PERF_7821_GREP_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_7821_GREP_OPTS=' -w'. See p7820-grep-engines.sh for more
+options to try.
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in 'int' 'uncommon' 'æ'
+do
+   for engine in fixed basic extended perl
+   do
+   if test $engine = "perl" && ! test_have_prereq PCRE
+   then
+   prereq="PCRE"
+   else
+   prereq=""
+   fi
+   test_perf $prereq "$engine grep$GIT_PERF_7821_GREP_OPTS 
$pattern" "
+   git -c grep.patternType=$engine 
grep$GIT_PERF_7821_GREP_OPTS $pattern >'out.$engine' || :
+   "
+   done
+
+   test_expect_success "assert that all engines found the same 
for$GIT_PERF_7821_GREP_OPTS $pattern" '
+   test_cmp out.fixed out.basic &&
+   test_cmp out.fixed out.extended &&
+   if test_have_prereq PCRE
+   then
+   test_cmp out.fixed out.perl
+   fi
+   '
+done
+
+test_done
-- 
2.13.0.303.g4ebf302169



[PATCH v4 24/31] grep: change the internal PCRE macro names to be PCRE1

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change the internal USE_LIBPCRE define, & build options flag to use a
naming convention ending in PCRE1, without changing the long-standing
USE_LIBPCRE Makefile flag which enables this code.

This is for preparation for libpcre2 support where having things like
USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely
need to for backwards compatibility with old Makefile arguments would
be confusing.

In some ways it would be better to change everything that now uses
USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying
USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time
burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their
build scripts.

However I'd like to leave the door open to making
USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease,
i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it
the default.

This code and the USE_LIBPCRE Makefile argument was added in commit
63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was
no indication that the PCRE project would release an entirely new &
incompatible API around 3 years later.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  | 4 ++--
 grep.c| 6 +++---
 grep.h| 2 +-
 t/test-lib.sh | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index d1587452f1..374fbc7e58 100644
--- a/Makefile
+++ b/Makefile
@@ -1088,7 +1088,7 @@ ifdef NO_LIBGEN_H
 endif
 
 ifdef USE_LIBPCRE
-   BASIC_CFLAGS += -DUSE_LIBPCRE
+   BASIC_CFLAGS += -DUSE_LIBPCRE1
ifdef LIBPCREDIR
BASIC_CFLAGS += -I$(LIBPCREDIR)/include
EXTLIBS += -L$(LIBPCREDIR)/$(lib) 
$(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib)
@@ -2240,7 +2240,7 @@ GIT-BUILD-OPTIONS: FORCE
@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+
@echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+
@echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+
-   @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
+   @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' 
>>$@+
@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst 
','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
diff --git a/grep.c b/grep.c
index 18306bc605..7a3858a1c3 100644
--- a/grep.c
+++ b/grep.c
@@ -333,7 +333,7 @@ static int has_null(const char *s, size_t len)
return 0;
 }
 
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
const char *error;
@@ -385,7 +385,7 @@ static void free_pcre_regexp(struct grep_pat *p)
pcre_free(p->pcre_extra_info);
pcre_free((void *)p->pcre_tables);
 }
-#else /* !USE_LIBPCRE */
+#else /* !USE_LIBPCRE1 */
 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt)
 {
die("cannot use Perl-compatible regexes when not compiled with 
USE_LIBPCRE");
@@ -400,7 +400,7 @@ static int pcrematch(struct grep_pat *p, const char *line, 
const char *eol,
 static void free_pcre_regexp(struct grep_pat *p)
 {
 }
-#endif /* !USE_LIBPCRE */
+#endif /* !USE_LIBPCRE1 */
 
 static int is_fixed(const char *s, size_t len)
 {
diff --git a/grep.h b/grep.h
index 267534ca24..073b0e4c92 100644
--- a/grep.h
+++ b/grep.h
@@ -1,7 +1,7 @@
 #ifndef GREP_H
 #define GREP_H
 #include "color.h"
-#ifdef USE_LIBPCRE
+#ifdef USE_LIBPCRE1
 #include 
 #else
 typedef int pcre;
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 04d857a42b..1d0f636cbd 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1014,7 +1014,7 @@ esac
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PYTHON" && test_set_prereq PYTHON
-test -n "$USE_LIBPCRE" && test_set_prereq PCRE
+test -n "$USE_LIBPCRE1" && test_set_prereq PCRE
 test -z "$NO_GETTEXT" && test_set_prereq GETTEXT
 
 # Can we rely on git's output in the C locale?
-- 
2.13.0.303.g4ebf302169



[PATCH v4 17/31] perf: add a comparison test of grep regex engines

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines.

In theory the "basic" and "extended" engines should be implemented
using the same underlying code with a slightly different pattern
parser, but some implementations may not do this. Jump through some
slight hoops to test both, which is worthwhile since "basic" is the
default.

Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a
checkout of linux.git & latest upstream PCRE, both PCRE and git
compiled with -O3 using gcc 7.1.1:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p7820-grep-engines.sh
[...]
Testthis tree
---
7820.1: basic grep 'how.to' 0.34(1.24+0.53)
7820.2: extended grep 'how.to'  0.33(1.23+0.45)
7820.3: perl grep 'how.to'  0.31(1.05+0.56)
7820.5: basic grep '^how to'0.32(1.24+0.42)
7820.6: extended grep '^how to' 0.33(1.20+0.44)
7820.7: perl grep '^how to' 0.57(2.67+0.42)
7820.9: basic grep '[how] to'   0.51(2.16+0.45)
7820.10: extended grep '[how] to'   0.49(2.20+0.43)
7820.11: perl grep '[how] to'   0.56(2.60+0.43)
7820.13: basic grep '\(e.t[^ ]*\|v.ry\) rare'   0.66(3.25+0.40)
7820.14: extended grep '(e.t[^ ]*|v.ry) rare'   0.65(3.19+0.46)
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'   1.05(5.74+0.34)
7820.17: basic grep 'm\(ú\|u\)lt.b\(æ\|y\)te'   0.34(1.28+0.47)
7820.18: extended grep 'm(ú|u)lt.b(æ|y)te'  0.34(1.38+0.38)
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'  0.39(1.56+0.44)

Options can also be passed to git-grep via the GIT_PERF_7820_GREP_OPTS
environment variable. There are various modes such as "-v" that have
very different performance profiles, but handling the combinatorial
explosion of testing all those options would make this script much
more complex and harder to maintain. Instead just add the ability to
do one-shot runs with arbitrary options, e.g.:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_7820_GREP_OPTS=" -i" ./run p7820-grep-engines.sh
[...]
Test   this tree
--
7820.1: basic grep -i 'how.to' 0.49(1.72+0.38)
7820.2: extended grep -i 'how.to'  0.46(1.64+0.42)
7820.3: perl grep -i 'how.to'  0.44(1.45+0.45)
7820.5: basic grep -i '^how to'0.47(1.76+0.38)
7820.6: extended grep -i '^how to' 0.47(1.70+0.42)
7820.7: perl grep -i '^how to' 0.65(2.72+0.37)
7820.9: basic grep -i '[how] to'   0.86(3.64+0.42)
7820.10: extended grep -i '[how] to'   0.84(3.62+0.46)
7820.11: perl grep -i '[how] to'   0.73(3.06+0.39)
7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare'   1.63(8.13+0.36)
7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare'   1.64(8.01+0.44)
7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare'   1.44(6.88+0.44)
7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te'   0.66(2.67+0.44)
7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te'  0.66(2.67+0.43)
7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te'  0.59(2.31+0.37)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p7820-grep-engines.sh | 56 
 1 file changed, 56 insertions(+)
 create mode 100755 t/perf/p7820-grep-engines.sh

diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh
new file mode 100755
index 00..62aba19e76
--- /dev/null
+++ b/t/perf/p7820-grep-engines.sh
@@ -0,0 +1,56 @@
+#!/bin/sh
+
+test_description="Comparison of git-grep's regex engines
+
+Set GIT_PERF_7820_GREP_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try:
+
+   -i
+   -w
+   -v
+   -vi
+   -vw
+   -viw
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in \
+   'how.to' \
+   '^how to' \
+   '[how] to' \
+   '\(e.t[^ ]*\|v.ry\) rare' \
+   'm\(ú\|u\)lt.b\(æ\|y\)te'
+do
+   for engine in basic extended perl
+   do
+   if test $engine != "basic"
+   then
+   # Poor man's basic -> extended converter.
+   pattern=$(echo "$pattern" | sed 's/\\//g')
+   fi
+   if test $engine = "perl" && ! test_have_prereq PCRE
+   then
+   prereq="PCRE"
+   else
+   prereq=""
+   fi
+   test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_O

[PATCH v4 15/31] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a git GIT_PERF_MAKE_COMMAND variable to compliment the existing
GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell
command to execute instead of 'make'.

This is useful e.g. in cases where the name, semantics or defaults of
a Makefile flag have changed over time. It can even be used to change
the contents of the tree, useful for monkeypatching ancient versions
of git to get them to build.

This opens Pandora's box in some ways, it's now possible to
"jailbreak" the perf environment and e.g. modify the source tree via
this arbitrary instead of just issuing a custom "make" command, such a
command has to be re-entrant in the sense that subsequent perf runs
will re-use the possibly modified tree.

It would be pointless to try to mitigate or work around that caveat in
a tool purely aimed at Git developers, so this change makes no attempt
to do so.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile  |  3 +++
 t/perf/README | 17 -
 t/perf/run| 11 +--
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index eedadb8056..d1587452f1 100644
--- a/Makefile
+++ b/Makefile
@@ -2272,6 +2272,9 @@ endif
 ifdef GIT_PERF_MAKE_OPTS
@echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst 
','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@+
 endif
+ifdef GIT_PERF_MAKE_COMMAND
+   @echo GIT_PERF_MAKE_COMMAND=\''$(subst ','\'',$(subst 
','\'',$(GIT_PERF_MAKE_COMMAND)))'\' >>$@+
+endif
 ifdef GIT_INTEROP_MAKE_OPTS
@echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst 
','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
 endif
diff --git a/t/perf/README b/t/perf/README
index 49ea4349be..0b6a8d2906 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -60,7 +60,22 @@ You can set the following variables (also in your 
config.mak):
 
 GIT_PERF_MAKE_OPTS
Options to use when automatically building a git tree for
-   performance testing.  E.g., -j6 would be useful.
+   performance testing. E.g., -j6 would be useful. Passed
+   directly to make as "make $GIT_PERF_MAKE_OPTS".
+
+GIT_PERF_MAKE_COMMAND
+   An arbitrary command that'll be run in place of the make
+   command, if set the GIT_PERF_MAKE_OPTS variable is
+   ignored. Useful in cases where source tree changes might
+   require issuing a different make command to different
+   revisions.
+
+   This can be (ab)used to monkeypatch or otherwise change the
+   tree about to be built. Note that the build directory can be
+   re-used for subsequent runs so the make command might get
+   executed multiple times on the same tree, but don't count on
+   any of that, that's an implementation detail that might change
+   in the future.
 
 GIT_PERF_REPO
 GIT_PERF_LARGE_REPO
diff --git a/t/perf/run b/t/perf/run
index c788d713ae..b61024a830 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -37,8 +37,15 @@ build_git_rev () {
cp "../../$config" "build/$rev/"
fi
done
-   (cd build/$rev && make $GIT_PERF_MAKE_OPTS) ||
-   die "failed to build revision '$mydir'"
+   (
+   cd build/$rev &&
+   if test -n "$GIT_PERF_MAKE_COMMAND"
+   then
+   sh -c "$GIT_PERF_MAKE_COMMAND"
+   else
+   make $GIT_PERF_MAKE_OPTS
+   fi
+   ) || die "failed to build revision '$mydir'"
 }
 
 run_dirs_helper () {
-- 
2.13.0.303.g4ebf302169



[PATCH v4 16/31] perf: emit progress output when unpacking & building

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Amend the t/perf/run output so that in addition to the "Running N
tests" heading currently being emitted, it also emits "Unpacking $rev"
and "Building $rev" when setting up the build/$rev directory & when
building it, respectively.

This makes it easier to see what's going on and what revision is being
tested as the output scrolls by.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/run | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/perf/run b/t/perf/run
index b61024a830..beb4acc0e4 100755
--- a/t/perf/run
+++ b/t/perf/run
@@ -24,6 +24,7 @@ run_one_dir () {
 
 unpack_git_rev () {
rev=$1
+   echo "=== Unpacking $rev in build/$rev ==="
mkdir -p build/$rev
(cd "$(git rev-parse --show-cdup)" && git archive --format=tar $rev) |
(cd build/$rev && tar x)
@@ -37,6 +38,7 @@ build_git_rev () {
cp "../../$config" "build/$rev/"
fi
done
+   echo "=== Building $rev ==="
(
cd build/$rev &&
if test -n "$GIT_PERF_MAKE_COMMAND"
-- 
2.13.0.303.g4ebf302169



[PATCH v4 20/31] perf: add a comparison test of log --grep regex engines with -F

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a performance comparison test of log --grepgrep regex engines
given fixed strings.

See the preceding fixed-string t/perf change ("perf: add a comparison
test of grep regex engines with -F", 2017-04-21) for notes about this,
in particular this mostly tests exactly the same codepath now, but
might not in the future:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p4221-log-grep-engines-fixed.sh
[...]
Test this tree

4221.1: fixed log --grep='int'   5.99(5.55+0.40)
4221.2: basic log --grep='int'   5.92(5.56+0.31)
4221.3: extended log --grep='int'6.01(5.51+0.45)
4221.4: perl log --grep='int'5.99(5.56+0.38)
4221.6: fixed log --grep='uncommon'  5.06(4.76+0.27)
4221.7: basic log --grep='uncommon'  5.02(4.78+0.21)
4221.8: extended log --grep='uncommon'   4.99(4.78+0.20)
4221.9: perl log --grep='uncommon'   5.00(4.72+0.26)
4221.11: fixed log --grep='æ'5.35(5.12+0.20)
4221.12: basic log --grep='æ'5.34(5.11+0.20)
4221.13: extended log --grep='æ' 5.39(5.10+0.22)
4221.14: perl log --grep='æ' 5.44(5.16+0.23)

Only the non-ASCII -i case is different:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_4221_LOG_OPTS=' -i' ./run p4221-log-grep-engines-fixed.sh
[...]
Testthis tree
---
4221.1: fixed log -i --grep='int'   6.17(5.77+0.35)
4221.2: basic log -i --grep='int'   6.16(5.59+0.39)
4221.3: extended log -i --grep='int'6.15(5.70+0.39)
4221.4: perl log -i --grep='int'6.15(5.69+0.38)
4221.6: fixed log -i --grep='uncommon'  5.10(4.88+0.21)
4221.7: basic log -i --grep='uncommon'  5.04(4.76+0.25)
4221.8: extended log -i --grep='uncommon'   5.07(4.82+0.23)
4221.9: perl log -i --grep='uncommon'   5.03(4.78+0.22)
4221.11: fixed log -i --grep='æ'5.93(5.65+0.25)
4221.12: basic log -i --grep='æ'5.88(5.62+0.25)
4221.13: extended log -i --grep='æ' 6.02(5.69+0.29)
4221.14: perl log -i --grep='æ' 5.36(5.06+0.29)

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p4221-log-grep-engines-fixed.sh | 44 ++
 1 file changed, 44 insertions(+)
 create mode 100755 t/perf/p4221-log-grep-engines-fixed.sh

diff --git a/t/perf/p4221-log-grep-engines-fixed.sh 
b/t/perf/p4221-log-grep-engines-fixed.sh
new file mode 100755
index 00..060971265a
--- /dev/null
+++ b/t/perf/p4221-log-grep-engines-fixed.sh
@@ -0,0 +1,44 @@
+#!/bin/sh
+
+test_description="Comparison of git-log's --grep regex engines with -F
+
+Set GIT_PERF_4221_LOG_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_4221_LOG_OPTS=' -i'. Some options to try:
+
+   -i
+   --invert-grep
+   -i --invert-grep
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in 'int' 'uncommon' 'æ'
+do
+   for engine in fixed basic extended perl
+   do
+   if test $engine = "perl" && ! test_have_prereq PCRE
+   then
+   prereq="PCRE"
+   else
+   prereq=""
+   fi
+   test_perf $prereq "$engine log$GIT_PERF_4221_LOG_OPTS 
--grep='$pattern'" "
+   git -c grep.patternType=$engine log 
--pretty=format:%h$GIT_PERF_4221_LOG_OPTS --grep='$pattern' >'out.$engine' || :
+   "
+   done
+
+   test_expect_success "assert that all engines found the same 
for$GIT_PERF_4221_LOG_OPTS '$pattern'" '
+   test_cmp out.fixed out.basic &&
+   test_cmp out.fixed out.extended &&
+   if test_have_prereq PCRE
+   then
+   test_cmp out.fixed out.perl
+   fi
+   '
+done
+
+test_done
-- 
2.13.0.303.g4ebf302169



[PATCH v4 19/31] perf: add a comparison test of log --grep regex engines

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines with patterns matching log messages
via --grep=.

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run 
p4220-log-grep-engines.sh
[...]
Test  this tree
-
4220.1: basic log --grep='how.to' 6.22(6.00+0.21)
4220.2: extended log --grep='how.to'  6.23(5.98+0.23)
4220.3: perl log --grep='how.to'  6.07(5.79+0.25)
4220.5: basic log --grep='^how to'6.19(5.93+0.22)
4220.6: extended log --grep='^how to' 6.19(5.93+0.23)
4220.7: perl log --grep='^how to' 6.14(5.88+0.24)
4220.9: basic log --grep='[how] to'   6.96(6.65+0.28)
4220.10: extended log --grep='[how] to'   6.96(6.69+0.24)
4220.11: perl log --grep='[how] to'   6.95(6.58+0.33)
4220.13: basic log --grep='\(e.t[^ ]*\|v.ry\) rare'   7.10(6.80+0.27)
4220.14: extended log --grep='(e.t[^ ]*|v.ry) rare'   7.07(6.80+0.26)
4220.15: perl log --grep='(e.t[^ ]*|v.ry) rare'   7.70(7.46+0.22)
4220.17: basic log --grep='m\(ú\|u\)lt.b\(æ\|y\)te'   6.12(5.87+0.24)
4220.18: extended log --grep='m(ú|u)lt.b(æ|y)te'  6.14(5.84+0.26)
4220.19: perl log --grep='m(ú|u)lt.b(æ|y)te'  6.16(5.93+0.20)

With -i:

$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux 
GIT_PERF_4220_LOG_OPTS=' -i' ./run p4220-log-grep-engines.sh
[...]
Test this tree

4220.1: basic log -i --grep='how.to' 6.74(6.41+0.32)
4220.2: extended log -i --grep='how.to'  6.78(6.55+0.22)
4220.3: perl log -i --grep='how.to'  6.06(5.77+0.28)
4220.5: basic log -i --grep='^how to'6.80(6.57+0.22)
4220.6: extended log -i --grep='^how to' 6.83(6.52+0.29)
4220.7: perl log -i --grep='^how to' 6.16(5.94+0.20)
4220.9: basic log -i --grep='[how] to'   7.87(7.61+0.24)
4220.10: extended log -i --grep='[how] to'   7.85(7.57+0.27)
4220.11: perl log -i --grep='[how] to'   7.03(6.75+0.25)
4220.13: basic log -i --grep='\(e.t[^ ]*\|v.ry\) rare'   8.68(8.41+0.25)
4220.14: extended log -i --grep='(e.t[^ ]*|v.ry) rare'   8.80(8.44+0.28)
4220.15: perl log -i --grep='(e.t[^ ]*|v.ry) rare'   7.85(7.56+0.26)
4220.17: basic log -i --grep='m\(ú\|u\)lt.b\(æ\|y\)te'   6.94(6.68+0.24)
4220.18: extended log -i --grep='m(ú|u)lt.b(æ|y)te'  7.04(6.76+0.24)
4220.19: perl log -i --grep='m(ú|u)lt.b(æ|y)te'  6.26(5.92+0.29)

See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.

Before commit ("log: make --regexp-ignore-case work with
--perl-regexp", 2017-05-20) this test will almost definitely
fail (depending on the repo) if passed the -i option, since it wasn't
properly supported under PCRE.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p4220-log-grep-engines.sh | 53 
 1 file changed, 53 insertions(+)
 create mode 100755 t/perf/p4220-log-grep-engines.sh

diff --git a/t/perf/p4220-log-grep-engines.sh b/t/perf/p4220-log-grep-engines.sh
new file mode 100755
index 00..2bc47ded4d
--- /dev/null
+++ b/t/perf/p4220-log-grep-engines.sh
@@ -0,0 +1,53 @@
+#!/bin/sh
+
+test_description="Comparison of git-log's --grep regex engines
+
+Set GIT_PERF_4220_LOG_OPTS in the environment to pass options to
+git-grep. Make sure to include a leading space,
+e.g. GIT_PERF_4220_LOG_OPTS=' -i'. Some options to try:
+
+   -i
+   --invert-grep
+   -i --invert-grep
+"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+for pattern in \
+   'how.to' \
+   '^how to' \
+   '[how] to' \
+   '\(e.t[^ ]*\|v.ry\) rare' \
+   'm\(ú\|u\)lt.b\(æ\|y\)te'
+do
+   for engine in basic extended perl
+   do
+   if test $engine != "basic"
+   then
+   # Poor man's basic -> extended converter.
+   pattern=$(echo $pattern | sed 's/\\//g')
+   fi
+   if test $engine = "perl" && ! test_have_prereq PCRE
+   then
+   prereq="PCRE"
+   else
+   prereq=""
+   fi
+   test_perf $prereq "$engine log$GIT_PERF_4220_LOG_OPTS 
--grep='$pattern'" "
+   git -c grep.patternType=$engine log 
--pretty=format:%h$GIT_PERF_4220_LOG_OPTS --grep='$pattern' >'out.$engine' || :
+   "
+   done
+
+   

[PATCH v4 12/31] grep: add a test helper function for less verbose -f \0 tests

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a helper function to make the tests which check for patterns with
\0 in them more succinct. Right now this isn't a big win, but
subsequent commits will add a lot more of these tests.

The helper is based on the match() function in t3070-wildmatch.sh.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 58 +-
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 9c9c378119..df93d8e44c 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -4,6 +4,29 @@ test_description='git grep in binary files'
 
 . ./test-lib.sh
 
+nul_match () {
+   matches=$1
+   flags=$2
+   pattern=$3
+   pattern_human=$(echo "$pattern" | sed 's/Q//g')
+
+   if test "$matches" = 1
+   then
+   test_expect_success "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   git grep -f f $flags a
+   "
+   elif test "$matches" = 0
+   then
+   test_expect_success "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   test_must_fail git grep -f f $flags a
+   "
+   else
+   test_expect_success "PANIC: Test framework error. Unknown 
matches value $matches" 'false'
+   fi
+}
+
 test_expect_success 'setup' "
echo 'binaryQfile' | q_to_nul >a &&
git add a &&
@@ -69,35 +92,12 @@ test_expect_failure 'git grep .fi a' '
git grep .fi a
 '
 
-test_expect_success 'git grep -F yf a' "
-   printf 'yQf' | q_to_nul >f &&
-   git grep -f f -F a
-"
-
-test_expect_success 'git grep -F yx a' "
-   printf 'yQx' | q_to_nul >f &&
-   test_must_fail git grep -f f -F a
-"
-
-test_expect_success 'git grep -Fi Yf a' "
-   printf 'YQf' | q_to_nul >f &&
-   git grep -f f -Fi a
-"
-
-test_expect_success 'git grep -Fi Yx a' "
-   printf 'YQx' | q_to_nul >f &&
-   test_must_fail git grep -f f -Fi a
-"
-
-test_expect_success 'git grep yf a' "
-   printf 'yQf' | q_to_nul >f &&
-   git grep -f f a
-"
-
-test_expect_success 'git grep yx a' "
-   printf 'yQx' | q_to_nul >f &&
-   test_must_fail git grep -f f a
-"
+nul_match 1 '-F' 'yQf'
+nul_match 0 '-F' 'yQx'
+nul_match 1 '-Fi' 'YQf'
+nul_match 0 '-Fi' 'YQx'
+nul_match 1 '' 'yQf'
+nul_match 0 '' 'yQx'
 
 test_expect_success 'grep respects binary diff attribute' '
echo text >t &&
-- 
2.13.0.303.g4ebf302169



[PATCH v4 13/31] grep: prepare for testing binary regexes containing rx metacharacters

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add setup code needed for testing regexes that contain both binary
data and regex metacharacters.

The POSIX regcomp() function inherently can't support that, because it
takes a \0-delimited char *, but other regex engines APIs like PCRE v2
take a pattern/length pair, and are thus able to handle \0s in
patterns as well as any other character.

When kwset was imported in commit 9eceddeec6 ("Use kwset in grep",
2011-08-21) this limitation was fixed, but at the expense of
introducing the undocumented limitation that any pattern containing \0
implicitly becomes a fixed match (equivalent to -F having been
provided).

That's not something we'd like to keep in the future. The inability to
match patterns containing \0 is a leaky implementation detail.

So add tests as a first step towards changing that. In order to test
that \0-patterns can properly match as regexes the test string needs
to have some regex metacharacters in it.

There were other blind spots in the tests. The code around kwset
specially handles case-insensitive & non-ASCII data, but there were no
tests for this.

Fix all of that by amending the text being matched to contain both
regex metacharacters & non-ASCII data.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index df93d8e44c..20370d6e0c 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -28,7 +28,7 @@ nul_match () {
 }
 
 test_expect_success 'setup' "
-   echo 'binaryQfile' | q_to_nul >a &&
+   echo 'binaryQfileQm[*]cQ*æQð' | q_to_nul >a &&
git add a &&
git commit -m.
 "
@@ -162,7 +162,7 @@ test_expect_success 'grep does not honor textconv' '
 '
 
 test_expect_success 'grep --textconv honors textconv' '
-   echo "a:binaryQfile" >expect &&
+   echo "a:binaryQfileQm[*]cQ*æQð" >expect &&
git grep --textconv Qfile >actual &&
test_cmp expect actual
 '
@@ -172,7 +172,7 @@ test_expect_success 'grep --no-textconv does not honor 
textconv' '
 '
 
 test_expect_success 'grep --textconv blob honors textconv' '
-   echo "HEAD:a:binaryQfile" >expect &&
+   echo "HEAD:a:binaryQfileQm[*]cQ*æQð" >expect &&
git grep --textconv Qfile HEAD:a >actual &&
test_cmp expect actual
 '
-- 
2.13.0.303.g4ebf302169



[PATCH v4 14/31] grep: add tests to fix blind spots with \0 patterns

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Address a big blind spot in the tests for patterns containing \0. The
is_fixed() function considers any string that contains \0 fixed, even
if it contains regular expression metacharacters, those patterns are
currently matched with kwset.

Before this change removing that memchr(s, 0, len) check from
is_fixed() wouldn't change the result of any of the tests, since
regcomp() will happily match the part before the \0.

The kwset path is dependent on whether the the -i flag is on, and
whether the pattern has any non-ASCII characters, but none of this was
tested for.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7008-grep-binary.sh | 71 ++
 1 file changed, 71 insertions(+)

diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh
index 20370d6e0c..615e7e0162 100755
--- a/t/t7008-grep-binary.sh
+++ b/t/t7008-grep-binary.sh
@@ -22,6 +22,18 @@ nul_match () {
printf '$pattern' | q_to_nul >f &&
test_must_fail git grep -f f $flags a
"
+   elif test "$matches" = T1
+   then
+   test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   git grep -f f $flags a
+   "
+   elif test "$matches" = T0
+   then
+   test_expect_failure "git grep -f f $flags '$pattern_human' a" "
+   printf '$pattern' | q_to_nul >f &&
+   test_must_fail git grep -f f $flags a
+   "
else
test_expect_success "PANIC: Test framework error. Unknown 
matches value $matches" 'false'
fi
@@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf'
 nul_match 0 '-Fi' 'YQx'
 nul_match 1 '' 'yQf'
 nul_match 0 '' 'yQx'
+nul_match 1 '' 'æQð'
+nul_match 1 '-F' 'eQm[*]c'
+nul_match 1 '-Fi' 'EQM[*]C'
+
+# Regex patterns that would match but shouldn't with -F
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-F' '[y]Qf'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '-Fi' '[Y]QF'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-F' '[æ]Qð'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '-Fi' '[Æ]QÐ'
+
+# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0
+# patterns case-insensitively.
+nul_match T1 '-i' 'ÆQÐ'
+
+# \0 implicitly disables regexes. This is an undocumented internal
+# limitation.
+nul_match T1 '' 'yQ[f]'
+nul_match T1 '' '[y]Qf'
+nul_match T1 '-i' 'YQ[F]'
+nul_match T1 '-i' '[Y]Qf'
+nul_match T1 '' 'æQ[ð]'
+nul_match T1 '' '[æ]Qð'
+nul_match T1 '-i' 'ÆQ[Ð]'
+
+# ... because of \0 implicitly disabling regexes regexes that
+# should/shouldn't match don't do the right thing.
+nul_match T1 '' 'eQm.*cQ'
+nul_match T1 '-i' 'EQM.*cQ'
+nul_match T0 '' 'eQm[*]c'
+nul_match T0 '-i' 'EQM[*]C'
+
+# Due to the REG_STARTEND extension when kwset() is disabled on -i &
+# non-ASCII the string will be matched in its entirety, but the
+# pattern will be cut off at the first \0.
+nul_match 0 '-i' 'NOMATCHQð'
+nul_match T0 '-i' '[Æ]QNOMATCH'
+nul_match T0 '-i' '[æ]QNOMATCH'
+# Matches, but for the wrong reasons, just stops at [æ]
+nul_match 1 '-i' '[Æ]Qð'
+nul_match 1 '-i' '[æ]Qð'
+
+# Ensure that the matcher doesn't regress to something that stops at
+# \0
+nul_match 0 '-F' 'yQ[f]'
+nul_match 0 '-Fi' 'YQ[F]'
+nul_match 0 '' 'yQNOMATCH'
+nul_match 0 '' 'QNOMATCH'
+nul_match 0 '-i' 'YQNOMATCH'
+nul_match 0 '-i' 'QNOMATCH'
+nul_match 0 '-F' 'æQ[ð]'
+nul_match 0 '-Fi' 'ÆQ[Ð]'
+nul_match 0 '' 'yQNÓMATCH'
+nul_match 0 '' 'QNÓMATCH'
+nul_match 0 '-i' 'YQNÓMATCH'
+nul_match 0 '-i' 'QNÓMATCH'
 
 test_expect_success 'grep respects binary diff attribute' '
echo text >t &&
-- 
2.13.0.303.g4ebf302169



[PATCH v4 05/31] log: make --regexp-ignore-case work with --perl-regexp

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Make the --regexp-ignore-case option work with --perl-regexp. This
never worked, and there was no test for this. Fix the bug and add a
test.

When PCRE support was added in commit 63e7e9d8b6 ("git-grep: Learn
PCRE", 2011-05-09) compile_pcre_regexp() would only check
opt->ignore_case, but when the --perl-regexp option was added in
commit 727b6fc3ed ("log --grep: accept --basic-regexp and
--perl-regexp", 2012-10-03) the code didn't set the opt->ignore_case.

Change the test suite to test for -i and --invert-regexp with
basic/extended/perl patterns in addition to fixed, which was the only
patternType that was tested for before in combination with those
options.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 revision.c |  1 +
 t/t4202-log.sh | 60 +-
 2 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/revision.c b/revision.c
index 8a8c1789c7..4883cdd2d0 100644
--- a/revision.c
+++ b/revision.c
@@ -1991,6 +1991,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
} else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE;
} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
+   revs->grep_filter.ignore_case = 1;
revs->grep_filter.regflags |= REG_ICASE;
DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 948fd719d2..c25eb9afd1 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -231,14 +231,47 @@ second
 initial
 EOF
 test_expect_success 'log --invert-grep --grep' '
-   git log --pretty="tformat:%s" --invert-grep --grep=th --grep=Sec 
>actual &&
-   test_cmp expect actual
+   # Fixed
+   git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep 
--grep=th --grep=Sec >actual &&
+   test_cmp expect actual &&
+
+   # POSIX basic
+   git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep 
--grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual &&
+
+   # POSIX extended
+   git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep 
--grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual &&
+
+   # PCRE
+   if test_have_prereq PCRE
+   then
+   git -c grep.patternType=perl log --pretty="tformat:%s" 
--invert-grep --grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual
+   fi
 '
 
 test_expect_success 'log --invert-grep --grep -i' '
echo initial >expect &&
-   git log --pretty="tformat:%s" --invert-grep -i --grep=th --grep=Sec 
>actual &&
-   test_cmp expect actual
+
+   # Fixed
+   git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep 
-i --grep=th --grep=Sec >actual &&
+   test_cmp expect actual &&
+
+   # POSIX basic
+   git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep 
-i --grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual &&
+
+   # POSIX extended
+   git -c grep.patternType=extended log --pretty="tformat:%s" 
--invert-grep -i --grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual &&
+
+   # PCRE
+   if test_have_prereq PCRE
+   then
+   git -c grep.patternType=perl log --pretty="tformat:%s" 
--invert-grep -i --grep=t[h] --grep=S[e]c >actual &&
+   test_cmp expect actual
+   fi
 '
 
 test_expect_success 'log --grep option parsing' '
@@ -256,8 +289,25 @@ test_expect_success 'log -i --grep' '
 
 test_expect_success 'log --grep -i' '
echo Second >expect &&
+
+   # Fixed
git log -1 --pretty="tformat:%s" --grep=sec -i >actual &&
-   test_cmp expect actual
+   test_cmp expect actual &&
+
+   # POSIX basic
+   git -c grep.patternType=basic log -1 --pretty="tformat:%s" --grep=s[e]c 
-i >actual &&
+   test_cmp expect actual &&
+
+   # POSIX extended
+   git -c grep.patternType=extended log -1 --pretty="tformat:%s" 
--grep=s[e]c -i >actual &&
+   test_cmp expect actual &&
+
+   # PCRE
+   if test_have_prereq PCRE
+   then
+   git -c grep.patternType=perl log -1 --pretty="tformat:%s" 
--grep=s[e]c -i >actual &&
+   test_cmp expect actual
+   fi
 '
 
 test_expect_success 'log -F -E --grep= uses ere' '
-- 
2.13.0.303.g4ebf302169



[PATCH v4 10/31] grep: amend submodule recursion test for regex engine testing

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Amend the submodule recursion test to prepare it for subsequent tests
of whether it passes along the grep.patternType to the submodule
greps.

This is the result of searching & replacing:

foobar -> (1|2)d(3|4)
foo-> (1|2)
bar-> (3|4)

Currently there's no tests for whether e.g. -P or -E is correctly
passed along, tests for that will be added in a follow-up change, but
first add content to the tests which will match differently under
different regex engines.

Reuse the pattern established in an earlier commit of mine in this
series ("log: add exhaustive tests for pattern style options &
config", 2017-04-07). The pattern "(.|.)[\d]" will match this content
differently under fixed/basic/extended & perl.

This test code was originally added in commit 0281e487fd ("grep:
optionally recurse into submodules", 2016-12-16).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 166 ++---
 1 file changed, 83 insertions(+), 83 deletions(-)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 5b6eb3a65e..1472855e1d 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -9,13 +9,13 @@ submodules.
 . ./test-lib.sh
 
 test_expect_success 'setup directory structure and submodule' '
-   echo "foobar" >a &&
+   echo "(1|2)d(3|4)" >a &&
mkdir b &&
-   echo "bar" >b/b &&
+   echo "(3|4)" >b/b &&
git add a b &&
git commit -m "add a and b" &&
git init submodule &&
-   echo "foobar" >submodule/a &&
+   echo "(1|2)d(3|4)" >submodule/a &&
git -C submodule add a &&
git -C submodule commit -m "add a" &&
git submodule add ./submodule &&
@@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and 
submodule' '
 
 test_expect_success 'grep correctly finds patterns in a submodule' '
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and basic pathspecs' '
cat >expect <<-\EOF &&
-   submodule/a:foobar
+   submodule/a:(1|2)d(3|4)
EOF
 
git grep -e. --recurse-submodules -- submodule >actual &&
@@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' '
 
 test_expect_success 'grep and nested submodules' '
git init submodule/sub &&
-   echo "foobar" >submodule/sub/a &&
+   echo "(1|2)d(3|4)" >submodule/sub/a &&
git -C submodule/sub add a &&
git -C submodule/sub commit -m "add a" &&
git -C submodule submodule add ./sub &&
@@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' '
git commit -m "updated submodule" &&
 
cat >expect <<-\EOF &&
-   a:foobar
-   b/b:bar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   a:foobar
-   submodule/a:foobar
-   submodule/sub/a:foobar
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --and -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and -e "(1|2)" --recurse-submodules >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep and multiple patterns' '
cat >expect <<-\EOF &&
-   b/b:bar
+   b/b:(3|4)
EOF
 
-   git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual &&
+   git grep -e "(3|4)" --and --not -e "(1|2)" --recurse-submodules >actual 
&&
test_cmp expect actual
 '
 
 test_expect_success 'basic grep tree' '
cat >expect <<-\EOF &&
-   HEAD:a:foobar
-   HEAD:b/b:bar
-   HEAD:submodule/a:foobar
-   HEAD:submodule/sub/a:foobar
+   HEAD:a:(1|2)d(3|4)
+   HEAD:b/b:(3|4)
+   HEAD:submodule/a:(1|2)d(3|4)
+   HEAD:submodule/sub/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD >actual &&
+   git grep -e "(3|4)" --recurse-submodules HEAD >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'grep tree HEAD^' '
cat >expect <<-\EOF &&
-   HEAD^:a:foobar
-   HEAD^:b/b:bar
-   HEAD^:submodule/a:foobar
+   HEAD^:a:(1|2)d(3|4)
+   HEAD^:b/b:(3|4)
+   HEAD^:submodule/a:(1|2)d(3|4)
EOF
 
-   git grep -e "bar" --recurse-submodules HEAD^ >actual &&
+   git grep -e "(3|4)" --recurs

[PATCH v4 07/31] grep: add a test for backreferences in PCRE patterns

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a test for backreferences such as (.)\1 in PCRE patterns. This
test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned
on. Before this change turning it on would break these sort of
patterns, but wouldn't break any tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7810-grep.sh | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 8d69113695..daa906b9b0 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1114,6 +1114,13 @@ test_expect_success PCRE 'grep -P -w pattern' '
test_cmp expected actual
 '
 
+test_expect_success PCRE 'grep -P backreferences work (the PCRE 
NO_AUTO_CAPTURE flag is not set)' '
+   git grep -P -h "(?P.)(?P=one)" hello_world >actual &&
+   test_cmp hello_world actual &&
+   git grep -P -h "(.)\1" hello_world >actual &&
+   test_cmp hello_world actual
+'
+
 test_expect_success 'grep -G invalidpattern properly dies ' '
test_must_fail git grep -G "a["
 '
-- 
2.13.0.303.g4ebf302169



[PATCH v4 08/31] grep: change non-ASCII -i test to stop using --debug

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Change a non-ASCII case-insensitive test case to stop using --debug,
and instead simply test for the expected results.

The test coverage remains the same with this change, but the test
won't break due to internal refactoring.

This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset
when -F is specified", 2016-06-25). It was asserting that the regex
must be compiled with compile_fixed_regexp(), instead test for the
expected results, allowing the underlying implementation to change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7812-grep-icase-non-ascii.sh | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh
index 04a61cb8e0..0059a1f837 100755
--- a/t/t7812-grep-icase-non-ascii.sh
+++ b/t/t7812-grep-icase-non-ascii.sh
@@ -36,29 +36,14 @@ test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 
string with "+"' '
 '
 
 test_expect_success REGEX_LOCALE 'grep literal string, with -F' '
-   git grep --debug -i -F "TILRAUN: Halló Heimur!"  2>&1 >/dev/null |
-grep fixed >debug1 &&
-   test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 &&
-   test_cmp expect1 debug1 &&
-
-   git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!"  2>&1 >/dev/null |
-grep fixed >debug2 &&
-   test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 &&
-   test_cmp expect2 debug2
+   git grep -i -F "TILRAUN: Halló Heimur!" &&
+   git grep -i -F "TILRAUN: HALLÓ HEIMUR!"
 '
 
 test_expect_success REGEX_LOCALE 'grep string with regex, with -F' '
-   test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file &&
-
-   git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 
>/dev/null |
-grep fixed >debug1 &&
-   test_write_lines "fixed \\^*TILR^AUN:\\.\\* Halló 
\$He\\[]imur!\\\$" >expect1 &&
-   test_cmp expect1 debug1 &&
-
-   git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$"  2>&1 
>/dev/null |
-grep fixed >debug2 &&
-   test_write_lines "fixed \\^*TILR^AUN:\\.\\* HALLÓ 
\$HE\\[]IMUR!\\\$" >expect2 &&
-   test_cmp expect2 debug2
+   test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 &&
+   git add file3 &&
+   git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3
 '
 
 test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' '
-- 
2.13.0.303.g4ebf302169



[PATCH v4 06/31] grep: add a test asserting that --perl-regexp dies when !PCRE

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add a test asserting that when --perl-regexp (and -P for grep) is
given to git-grep & git-log that we die with an error.

In developing the PCRE v2 series I introduced a regression where -P
would (through control-flow fall-through) become synonymous with basic
POSIX matching. I.e. 'git grep -P '[\d]' would match "d" instead of
digits.

The entire test suite would still pass with this serious regression,
since everything that tested for --perl-regexp would be guarded by the
PCRE prerequisite, fix that blind-spot by adding tests under !PCRE
asserting that git must die when given --perl-regexp or -P.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4202-log.sh  |  4 +++-
 t/t7810-grep.sh | 12 
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index c25eb9afd1..c44c4337f8 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -418,7 +418,9 @@ test_expect_success 'log with various grep.patternType 
configurations & command-
git log --pretty=tformat:%s --perl-regexp \
--grep="[\d]\|" >actual.perl.long-arg &&
test_cmp expect.perl actual.perl.long-arg
-
+   else
+   test_must_fail git log --perl-regexp \
+   --grep="[\d]\|"
fi &&
test_cmp expect.fixed actual.fixed.long-arg &&
test_cmp expect.basic actual.basic.long-arg &&
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index c84c4d99f9..8d69113695 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -281,6 +281,10 @@ do
test_cmp expected actual
'
 
+   test_expect_success !PCRE "grep $L with grep.patterntype=perl errors 
without PCRE" '
+   test_must_fail git -c grep.patterntype=perl grep "foo.*bar"
+   '
+
test_expect_success "grep $L with grep.patternType=default and 
grep.extendedRegexp=true" '
echo "${HC}ab:abc" >expected &&
git \
@@ -1058,11 +1062,19 @@ test_expect_success PCRE 'grep --perl-regexp pattern' '
test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep --perl-regexp pattern errors without PCRE' '
+   test_must_fail git grep --perl-regexp "foo.*bar"
+'
+
 test_expect_success PCRE 'grep -P pattern' '
git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
+test_expect_success !PCRE 'grep -P pattern errors without PCRE' '
+   test_must_fail git grep -P "foo.*bar"
+'
+
 test_expect_success 'grep pattern with grep.extendedRegexp=true' '
>empty &&
test_must_fail git -c grep.extendedregexp=true \
-- 
2.13.0.303.g4ebf302169



[PATCH v4 09/31] grep: add tests for --threads=N and grep.threads

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add tests for --threads=N being supplied on the command-line, or when
grep.threads=N being supplied in the configuration.

When the threading support was made run-time configurable in commit
89f09dd34e ("grep: add --threads= option and grep.threads
configuration", 2015-12-15) no tests were added for it.

In developing a change to the grep code I was able to make
'--threads=1 ` segfault, while the test suite still passed. This
change fixes that blind spot in the tests.

In addition to asserting that asking for N threads shouldn't segfault,
test that the grep output given any N is the same.

The choice to test only 1..10 as opposed to 1..8 or 1..16 or whatever
is arbitrary. Testing 1..1024 works locally for me (but gets
noticeably slower as more threads are spawned). Given the structure of
the code there's no reason to test an arbitrary number of threads,
only 0, 1 and >=2 are special modes of operation.

A later patch introduces a PTHREADS test prerequisite which is true
under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's
fine to test --threads=N, we'll just ignore it and not use
threading. So these tests also make sense under that mode to assert
that --threads=N without pthreads still returns expected results.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7810-grep.sh | 16 
 1 file changed, 16 insertions(+)

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index daa906b9b0..561709ef6a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -775,6 +775,22 @@ test_expect_success 'grep -W with userdiff' '
test_cmp expected actual
 '
 
+for threads in $(test_seq 0 10)
+do
+   test_expect_success "grep --threads=$threads & -c 
grep.threads=$threads" "
+   git grep --threads=$threads . >actual.$threads &&
+   if test $threads -ge 1
+   then
+   test_cmp actual.\$(($threads - 1)) actual.$threads
+   fi &&
+   git -c grep.threads=$threads grep . >actual.$threads &&
+   if test $threads -ge 1
+   then
+   test_cmp actual.\$(($threads - 1)) actual.$threads
+   fi
+   "
+done
+
 test_expect_success 'grep from a subdirectory to search wider area (1)' '
mkdir -p s &&
(
-- 
2.13.0.303.g4ebf302169



[PATCH v4 11/31] grep: add tests for grep pattern types being passed to submodules

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add testing for grep pattern types being correctly passed to
submodules. The pattern "(.|.)[\d]" matches differently under
fixed (not at all), and then matches different lines under
basic/extended & perl regular expressions, so this change asserts that
the pattern type is passed along correctly.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t7814-grep-recurse-submodules.sh | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 1472855e1d..3a58197f47 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules ()
 test_incompatible_with_recurse_submodules --untracked
 test_incompatible_with_recurse_submodules --no-index
 
+test_expect_success 'grep --recurse-submodules should pass the pattern type 
along' '
+   # Fixed
+   test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" &&
+   test_must_fail git -c grep.patternType=fixed grep --recurse-submodules 
-e "(.|.)[\d]" &&
+
+   # Basic
+   git grep -G --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" 
>actual &&
+   test_cmp expect actual &&
+
+   # Extended
+   git grep -E --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   .gitmodules:[submodule "submodule"]
+   .gitmodules:path = submodule
+   .gitmodules:url = ./submodule
+   a:(1|2)d(3|4)
+   submodule/.gitmodules:[submodule "sub"]
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=extended grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+   git -c grep.extendedRegexp=true grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual &&
+
+   # Perl
+   if test_have_prereq PCRE
+   then
+   git grep -P --recurse-submodules -e "(.|.)[\d]" >actual &&
+   cat >expect <<-\EOF &&
+   a:(1|2)d(3|4)
+   b/b:(3|4)
+   submodule/a:(1|2)d(3|4)
+   submodule/sub/a:(1|2)d(3|4)
+   EOF
+   test_cmp expect actual &&
+   git -c grep.patternType=perl grep --recurse-submodules -e 
"(.|.)[\d]" >actual &&
+   test_cmp expect actual
+   fi
+'
+
 test_done
-- 
2.13.0.303.g4ebf302169



[PATCH v4 01/31] Makefile & configure: reword inaccurate comment about PCRE

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Reword an outdated & inaccurate comment which suggests that only
git-grep can use PCRE.

This comment was added back when PCRE support was initially added in
commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true
at the time.

It hasn't been telling the full truth since git-log learned to use
PCRE with --grep in commit 727b6fc3ed ("log --grep: accept
--basic-regexp and --perl-regexp", 2012-10-03), and more importantly
is likely to get more inaccurate over time as more use is made of PCRE
in other areas.

Reword it to be more future-proof, and to more clearly explain that
this enables user-initiated runtime behavior.

Copy/pasting this so much in configure.ac is lame, these Makefile-like
flags aren't even used by autoconf, just the corresponding
--with[out]-* options. But copy/pasting the comments that make sense
for the Makefile to configure.ac where they make less sense is the
pattern everything else follows in that file. I'm not going to war
against that as part of this change, just following the existing
pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile |  6 --
 configure.ac | 12 
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index e35542e631..eedadb8056 100644
--- a/Makefile
+++ b/Makefile
@@ -24,8 +24,10 @@ all::
 # Define NO_OPENSSL environment variable if you do not have OpenSSL.
 # This also implies BLK_SHA1.
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
diff --git a/configure.ac b/configure.ac
index 128165529f..deeb968daa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library 
(default is YES)])
 AS_HELP_STRING([],  [ARG can be prefix for openssl library and 
headers]),
 GIT_PARSE_WITH([openssl]))
 
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in
 # /foo/bar/include and /foo/bar/lib directories.
@@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO])
 GIT_CONF_SUBST([NO_OPENSSL])
 
 #
-# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be
-# able to use Perl-compatible regular expressions.
+# Define USE_LIBPCRE if you have and want to use libpcre. Various
+# commands such as log and grep offer runtime options to use
+# Perl-compatible regular expressions instead of standard or extended
+# POSIX regular expressions.
 #
 
 if test -n "$USE_LIBPCRE"; then
-- 
2.13.0.303.g4ebf302169



[PATCH v4 02/31] grep & rev-list doc: stop promising libpcre for --perl-regexp

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Stop promising in our grep & rev-list options documentation that we're
always going to be using libpcre when given the --perl-regexp option.

Instead talk about using "Perl-compatible regular expressions" and
using these types of patterns using "a compile-time dependency".

Saying "libpcre" means that we're talking about libpcre.so, which is
always going to be v1. This change is part of an ongoing saga to add
support for libpcre2, which comes with PCRE v2.

In the future we might use some completely unrelated library to
provide perl-compatible regular expression support. By wording the
documentation differently and not promising any specific version of
PCRE or even PCRE at all we have more wiggle room to change the
implementation.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-grep.txt | 7 +--
 Documentation/rev-list-options.txt | 8 ++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 71f32f3508..5033483db4 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -161,8 +161,11 @@ OPTIONS
 
 -P::
 --perl-regexp::
-   Use Perl-compatible regexp for patterns. Requires libpcre to be
-   compiled in.
+   Use Perl-compatible regular expressions for patterns.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 -F::
 --fixed-strings::
diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index a02f7324c0..a46f70c2b1 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -92,8 +92,12 @@ endif::git-rev-list[]
pattern as a regular expression).
 
 --perl-regexp::
-   Consider the limiting patterns to be Perl-compatible regular 
expressions.
-   Requires libpcre to be compiled in.
+   Consider the limiting patterns to be Perl-compatible regular
+   expressions.
++
+Support for these types of regular expressions is an optional
+compile-time dependency. If Git wasn't compiled with support for them
+providing this option will cause it to die.
 
 --remove-empty::
Stop when a given path disappears from the tree.
-- 
2.13.0.303.g4ebf302169



[PATCH v4 04/31] log: add exhaustive tests for pattern style options & config

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Add exhaustive tests for how the different grep.patternType options &
the corresponding command-line options affect git-log.

Before this change it was possible to patch revision.c so that the
--basic-regexp option was synonymous with --extended-regexp, and
--perl-regexp wasn't recognized at all, and still have 100% of the
test suite pass.

This was because the first test being modified here, added in commit
34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as
"git grep"", 2012-10-03), didn't actually check whether we'd enabled
extended regular expressions as distinct from re-toggling non-fixed
string support.

Fix that by changing the pattern to a pattern that'll only match if
--extended-regexp option is provided, but won't match under the
default --basic-regexp option.

Other potential regressions were possible since there were no tests
for the rest of the combinations of grep.patternType configuration
toggles & corresponding git-log command-line options. Add exhaustive
tests for those.

The patterns being passed to fixed/basic/extended/PCRE are carefully
crafted to return the wrong thing if the grep engine were to pick any
other matching method than the one it's told to use.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4202-log.sh | 98 +-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 1c7d6729c6..948fd719d2 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -262,7 +262,30 @@ test_expect_success 'log --grep -i' '
 
 test_expect_success 'log -F -E --grep= uses ere' '
echo second >expect &&
-   git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual &&
+   # basic would need \(s\) to do the same
+   git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' '
+   test_when_finished "rm -rf num_commits" &&
+   git init num_commits &&
+   (
+   cd num_commits &&
+   test_commit 1d &&
+   test_commit 2e
+   ) &&
+
+   # In PCRE \d in [\d] is like saying "0-9", and matches the 2
+   # in 2e...
+   echo 2e >expect &&
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp 
--grep="[\d]" >actual &&
+   test_cmp expect actual &&
+
+   # ...in POSIX basic and extended it is the same as [d],
+   # i.e. "d", which matches 1d, but does not match 2e.
+   echo 1d >expect &&
+   git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" 
>actual &&
test_cmp expect actual
 '
 
@@ -280,6 +303,79 @@ test_expect_success 'log with grep.patternType 
configuration and command line' '
test_cmp expect actual
 '
 
+test_expect_success 'log with various grep.patternType configurations & 
command-lines' '
+   git init pattern-type &&
+   (
+   cd pattern-type &&
+   test_commit 1 file A &&
+
+   # The tagname is overridden here because creating a
+   # tag called "(1|2)" as test_commit would otherwise
+   # implicitly do would fail on e.g. MINGW.
+   test_commit "(1|2)" file B 2 &&
+
+   echo "(1|2)" >expect.fixed &&
+   cp expect.fixed expect.basic &&
+   cp expect.fixed expect.extended &&
+   cp expect.fixed expect.perl &&
+
+   # A strcmp-like match with fixed.
+   git -c grep.patternType=fixed log --pretty=tformat:%s \
+   --grep="(1|2)" >actual.fixed &&
+
+   # POSIX basic matches (, | and ) literally.
+   git -c grep.patternType=basic log --pretty=tformat:%s \
+   --grep="(.|.)" >actual.basic &&
+
+   # POSIX extended needs to have | escaped to match it
+   # literally, whereas under basic this is the same as
+   # (|2), i.e. it would also match "1". This test checks
+   # for extended by asserting that it is not matching
+   # what basic would match.
+   git -c grep.patternType=extended log --pretty=tformat:%s \
+   --grep="\|2" >actual.extended &&
+   if test_have_prereq PCRE
+   then
+   # Only PCRE would match [\d]\| with only
+   # "(1|2)" due to [\d]. POSIX basic would match
+   # both it and "1" since similarly to the
+   # extended match above it is the same as
+   # \([\d]\|\). POSIX extended would
+   # match neither.
+   git -c grep.patternType=perl log --pretty=tformat:%s \
+   --grep="[\d]\|" >actual.perl &&
+   test_cmp expect.perl actual.perl
+   fi &&
+   test_cmp expect.fixed actual.

[PATCH v4 03/31] test-lib: rename the LIBPCRE prerequisite to PCRE

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for
libpcre2 support, where having just "LIBPCRE" would be confusing as it
implies v1 of the library.

None of these tests are incompatible between versions 1 & 2 of
libpcre, it's less confusing to give them a more general name to make
it clear that they work on both library versions.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README|  4 ++--
 t/t7810-grep.sh | 28 ++--
 t/t7812-grep-icase-non-ascii.sh |  4 ++--
 t/t7813-grep-icase-iso.sh   |  2 +-
 t/test-lib.sh   |  2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/t/README b/t/README
index ab386c3681..a90cb62583 100644
--- a/t/README
+++ b/t/README
@@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own.
Test is not run by root user, and an attempt to write to an
unwritable file is expected to fail correctly.
 
- - LIBPCRE
+ - PCRE
 
-   Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests
+   Git was compiled with support for PCRE. Wrap any tests
that use git-grep --perl-regexp or git-grep -P in these.
 
  - CASE_INSENSITIVE_FS
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index cee42097b0..c84c4d99f9 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -275,7 +275,7 @@ do
test_cmp expected actual
'
 
-   test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" '
+   test_expect_success PCRE "grep $L with grep.patterntype=perl" '
echo "${HC}ab:a+b*c" >expected &&
git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab 
>actual &&
test_cmp expected actual
@@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv)
 hello.c:   printf("Hello world.\n");
 EOF
 
-test_expect_success LIBPCRE 'grep --perl-regexp pattern' '
+test_expect_success PCRE 'grep --perl-regexp pattern' '
git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern' '
+test_expect_success PCRE 'grep -P pattern' '
git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
@@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with 
grep.extendedRegexp=true' '
test_cmp empty actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' '
+test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' '
git -c grep.extendedregexp=true \
grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -v pattern' '
+test_expect_success PCRE 'grep -P -v pattern' '
{
echo "ab:a+b*c"
echo "ab:a+bc"
@@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' '
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -i pattern' '
+test_expect_success PCRE 'grep -P -i pattern' '
cat >expected <<-EOF &&
hello.c:printf("Hello world.\n");
EOF
@@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' '
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P -w pattern' '
+test_expect_success PCRE 'grep -P -w pattern' '
{
echo "hello_world:Hello world"
echo "hello_world:HeLLo world"
@@ -1118,11 +1118,11 @@ test_expect_success 'grep invalidpattern properly dies 
with grep.patternType=ext
test_must_fail git -c grep.patterntype=extended grep "a["
 '
 
-test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' '
+test_expect_success PCRE 'grep -P invalidpattern properly dies ' '
test_must_fail git grep -P "a["
 '
 
-test_expect_success LIBPCRE 'grep invalidpattern properly dies with 
grep.patternType=perl' '
+test_expect_success PCRE 'grep invalidpattern properly dies with 
grep.patternType=perl' '
test_must_fail git -c grep.patterntype=perl grep "a["
 '
 
@@ -1191,13 +1191,13 @@ test_expect_success 'grep pattern with 
grep.patternType=fixed, =basic, =perl, =e
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -G -F -E -P pattern' '
+test_expect_success PCRE 'grep -G -F -E -P pattern' '
echo "d0:0" >expected &&
git grep -G -F -E -P "[\d]" d0 >actual &&
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, 
=extended, =perl' '
+test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, 
=extended, =perl' '
echo "d0:0" >expected &&
git \
-c grep.patterntype=fixed \
@@ -1208,7 +1208,7 @@ test_expect_success LIBPCRE 'grep pattern with 
grep.patternType=fixed, =basic, =
test_cmp expected actual
 '
 
-test_expect_success LIBPCRE 'grep -P pattern with grep.patternType=fixed' '
+test_e

[PATCH v4 00/31] Easy to review grep & pre-PCRE changes

2017-05-25 Thread Ævar Arnfjörð Bjarmason
Easy to review? 29 (I mean 30, I mean 31) patches? Are you kidding
me?!

As noted in v1 (<20170511091829.5634-1-ava...@gmail.com>;
https://public-inbox.org/git/20170511091829.5634-1-ava...@gmail.com/)
these are all doc, test, refactoring etc. changes needed by the
subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series.

See <20170520214233.7183-1-ava...@gmail.com>
(https://public-inbox.org/git/20170520214233.7183-1-ava...@gmail.com/)
v3 & notes about that version. What changed this time around? See
below:


Ævar Arnfjörð Bjarmason (31):
  Makefile & configure: reword inaccurate comment about PCRE
  grep & rev-list doc: stop promising libpcre for --perl-regexp
  test-lib: rename the LIBPCRE prerequisite to PCRE
  log: add exhaustive tests for pattern style options & config
  log: make --regexp-ignore-case work with --perl-regexp
  grep: add a test asserting that --perl-regexp dies when !PCRE
  grep: add a test for backreferences in PCRE patterns
  grep: change non-ASCII -i test to stop using --debug
  grep: add tests for --threads=N and grep.threads
  grep: amend submodule recursion test for regex engine testing
  grep: add tests for grep pattern types being passed to submodules
  grep: add a test helper function for less verbose -f \0 tests
  grep: prepare for testing binary regexes containing rx metacharacters
  grep: add tests to fix blind spots with \0 patterns

No changes.

  perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do

Fix trailing whitespace.

  perf: emit progress output when unpacking & building

No changes.

  perf: add a comparison test of grep regex engines
  perf: add a comparison test of grep regex engines with -F
  perf: add a comparison test of log --grep regex engines
  perf: add a comparison test of log --grep regex engines with -F

These are all improved:

 * Skip the PCRE test when we don't have the PCRE prerequisite,
   instead of erroring out.

 * Update outdated commit messages left over from previous
   submissions.

 * General minor nits in the code, e.g. use the same for-loop variable
   name in all four files which have similar code, simplify the
   test_cmp invocation etc.

  grep: catch a missing enum in switch statement
  grep: remove redundant regflags assignments

No changes.

  grep: factor test for \0 in grep patterns into a function

Brandon pointed out that one of the lines in this patch was longer
than 79 characters. Fixed.

  grep: change the internal PCRE macro names to be PCRE1
  grep: change internal *pcre* variable & function names to be *pcre1*
  grep: move is_fixed() earlier to avoid forward declaration
  test-lib: add a PTHREADS prerequisite
  pack-objects & index-pack: add test for --threads warning
  pack-objects: fix buggy warning about threads
  grep: given --threads with NO_PTHREADS=YesPlease, warn
  grep: assert that threading is enabled when calling grep_{lock,unlock}

No changes.

 Documentation/git-grep.txt |   7 +-
 Documentation/rev-list-options.txt |   8 +-
 Makefile   |  14 ++-
 builtin/grep.c |  23 +++-
 builtin/pack-objects.c |   4 +-
 configure.ac   |  12 +-
 grep.c | 110 +
 grep.h |  10 +-
 revision.c |   1 +
 t/README   |   8 +-
 t/perf/README  |  17 ++-
 t/perf/p4220-log-grep-engines.sh   |  53 
 t/perf/p4221-log-grep-engines-fixed.sh |  44 +++
 t/perf/p7820-grep-engines.sh   |  56 +
 t/perf/p7821-grep-engines-fixed.sh |  41 +++
 t/perf/run |  13 +-
 t/t4202-log.sh | 160 +++-
 t/t5300-pack-object.sh |  36 ++
 t/t7008-grep-binary.sh | 135 -
 t/t7810-grep.sh|  81 ++---
 t/t7812-grep-icase-non-ascii.sh|  29 ++---
 t/t7813-grep-icase-iso.sh  |   2 +-
 t/t7814-grep-recurse-submodules.sh | 215 -
 t/test-lib.sh  |   3 +-
 24 files changed, 843 insertions(+), 239 deletions(-)
 create mode 100755 t/perf/p4220-log-grep-engines.sh
 create mode 100755 t/perf/p4221-log-grep-engines-fixed.sh
 create mode 100755 t/perf/p7820-grep-engines.sh
 create mode 100755 t/perf/p7821-grep-engines-fixed.sh

-- 
2.13.0.303.g4ebf302169



[PATCH] connect.c: fix leak in parse_one_symref_info()

2017-05-25 Thread Jeff King
If we successfully parse a symref value like
"HEAD:refs/heads/master", we add the result to a string
list. But because the string list is marked
STRING_LIST_INIT_DUP, the string list code will make a copy
of the string and add the copy.

This patch fixes it by adding the entry with
string_list_append_nodup(), which lets the string list take
ownership of our newly allocated string. There are two
alternatives that seem like they would work, but aren't the
right solution.

The first is to initialize the list with the "NODUP"
initializer. That would avoid the copy, but then the string
list would not realize that it owns the strings. When we
eventually call string_list_clear(), it would not free the
strings, causing a leak.

The second option would be to use the normal
string_list_append(), but free the local copy in our
function. We can't do this because the local copy actually
contains _two_ strings; the symref name and its target. We
point to the target pointer via the "util" field, and its
memory must last as long as the string list does.

You may also wonder whether it's safe to ever free the local
copy, since the target points into it. The answer is yes,
because we duplicate it in annotaate_refs_with_symref_info
before clearing the string list.

Signed-off-by: Jeff King 
---
Phew. For a one-line leak fix, that sure was complicated.

I doubt it matters much in practice, because servers send only a single
HEAD, so we just leak one string. But I happened to notice it while
looking at the unborn-HEAD thing.

 connect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/connect.c b/connect.c
index cd21a1b6f..c72b1d115 100644
--- a/connect.c
+++ b/connect.c
@@ -71,7 +71,7 @@ static void parse_one_symref_info(struct string_list *symref, 
const char *val, i
check_refname_format(target, REFNAME_ALLOW_ONELEVEL))
/* "symref=bogus:pair */
goto reject;
-   item = string_list_append(symref, sym);
+   item = string_list_append_nodup(symref, sym);
item->util = target;
return;
 reject:
-- 
2.13.0.496.ge44ba89db


Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-25 Thread Jeff King
On Thu, May 25, 2017 at 11:59:24AM -0400, Jeff King wrote:

> The just-HEAD case could look like:

This patch does work, in the sense that upload-pack advertises the
unborn HEAD symref. But the client doesn't actually do anything with it.
The capability parsing happens in get_remote_heads(), which passes the
data out by adding an annotation to the "struct ref" list. But of course
we have no HEAD ref to annotate.

So either get_remote_heads() would have to start returning a bogus HEAD
ref (with a null sha1, I guess, which all callers would have to
recognize). Or clone (and probably "remote set-head -a") would have to
start reaching across the transport-module boundary and asking for any
symref values for "HEAD". I'm not excited about more special-casing of
"HEAD", though. In theory we'd want this for other symrefs in the long
run, and it would be nice if clients were ready to handle that (even if
the protocol isn't quite there).

I dunno. I was thinking there might be a quick tweak, but I'm wondering
if this arcane case is worth the restructuring we'd have to do to
support it. It only comes up when you've moved the server repo's HEAD to
an unborn branch _and_ you have other refs (since otherwise we don't
even send capabilities at all!).

-Peff


`git svn branch` does not respect authors file

2017-05-25 Thread Alexandre Bury
I have a git repository linked to a svn repository with git-svn,
including a branch path configuration and an authorsfile for svn
username -> email mapping.


When running `git svn branch new_branch`, git-svn:
* Creates a svn commit creating a new svn branch
* Creates a local git commit linked to this svn commit
The svn commit is correctly generated, but the corresponding git
commit has a bad author: the author is just the svn username.


Re-fetching the svn repository then properly applies the authors file,
but ideally it would do it directly when creating the branch.


[PATCH v3 4/6] fsmonitor: add test cases for fsmonitor extension

2017-05-25 Thread Ben Peart
Add test cases that ensure status results are correct when using the new
fsmonitor extension.  Test untracked, modified, and new files by
ensuring the results are identical to when not using the extension.

Add a test to ensure updates to the index properly mark corresponding
entries in the index extension as dirty so that the status is correct
after commands that modify the index but don't trigger changes in the
working directory.

Add a test that verifies that if the fsmonitor extension doesn't tell
git about a change, it doesn't discover it on its own.  This ensures
git is honoring the extension and that we get the performance benefits
desired.

All test hooks output a marker file that is used to ensure the hook
was actually used to generate the test results.

Signed-off-by: Ben Peart 
---
 t/t7519-status-fsmonitor.sh | 158 
 1 file changed, 158 insertions(+)
 create mode 100755 t/t7519-status-fsmonitor.sh

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
new file mode 100755
index 00..395db46d55
--- /dev/null
+++ b/t/t7519-status-fsmonitor.sh
@@ -0,0 +1,158 @@
+#!/bin/sh
+
+test_description='git status with file system watcher'
+
+. ./test-lib.sh
+
+clean_repo () {
+   git reset --hard HEAD
+   git clean -fd
+   rm -f marker
+}
+
+dirty_repo () {
+   : >untracked
+   : >dir1/untracked
+   : >dir2/untracked
+   echo 1 >modified
+   echo 2 >dir1/modified
+   echo 3 >dir2/modified
+   echo 4 >new
+   echo 5 >dir1/new
+   echo 6 >dir2/new
+   git add new
+   git add dir1/new
+   git add dir2/new
+}
+
+# The test query-fsmonitor hook proc will output a marker file we can use to
+# ensure the hook was actually used to generate the correct results.
+
+test_expect_success 'setup' '
+   mkdir -p .git/hooks &&
+   write_script .git/hooks/query-fsmonitor<<-\EOF &&
+   if [ $1 -ne 1 ]
+   then
+   echo -e "Unsupported query-fsmonitor hook version.\n" >&2
+   exit 1;
+   fi
+   : >marker
+   printf "untracked\0"
+   printf "dir1/untracked\0"
+   printf "dir2/untracked\0"
+   printf "modified\0"
+   printf "dir1/modified\0"
+   printf "dir2/modified\0"
+   printf "new\0""
+   printf "dir1/new\0"
+   printf "dir2/new\0"
+   EOF
+   : >tracked &&
+   : >modified &&
+   mkdir dir1 &&
+   : >dir1/tracked &&
+   : >dir1/modified &&
+   mkdir dir2 &&
+   : >dir2/tracked &&
+   : >dir2/modified &&
+   git add . &&
+   test_tick &&
+   git commit -m initial &&
+   dirty_repo
+'
+
+cat >.gitignore <<\EOF
+.gitignore
+expect*
+output*
+marker*
+EOF
+
+# Status is well tested elsewhere so we'll just ensure that the results are
+# the same when using core.fsmonitor. First call after turning on the option
+# does a complete scan so need to do two calls to ensure we test the new
+# codepath.
+
+test_expect_success 'status with core.untrackedcache true' '
+   git config core.fsmonitor true  &&
+   git config core.untrackedcache true &&
+   git -c core.fsmonitor=false -c core.untrackedcache=true status >expect 
&&
+   clean_repo &&
+   git status &&
+   test_path_is_missing marker &&
+   dirty_repo &&
+   git status >output &&
+   test_path_is_file marker &&
+   test_i18ncmp expect output
+'
+
+
+test_expect_success 'status with core.untrackedcache false' '
+   git config core.fsmonitor true &&
+   git config core.untrackedcache false &&
+   git -c core.fsmonitor=false -c core.untrackedcache=false status >expect 
&&
+   clean_repo &&
+   git status &&
+   test_path_is_missing marker &&
+   dirty_repo &&
+   git status >output &&
+   test_path_is_file marker &&
+   test_i18ncmp expect output
+'
+
+# Ensure commands that call refresh_index() to move the index back in time
+# properly invalidate the fsmonitor cache
+
+test_expect_success 'refresh_index() invalidates fsmonitor cache' '
+   git config core.fsmonitor true &&
+   git config core.untrackedcache true &&
+   clean_repo &&
+   git status &&
+   test_path_is_missing marker &&
+   dirty_repo &&
+   write_script .git/hooks/query-fsmonitor<<-\EOF &&
+   :>marker
+   EOF
+   git add . &&
+   git commit -m "to reset" &&
+   git status &&
+   test_path_is_file marker &&
+   git reset HEAD~1 &&
+   git status >output &&
+   test_path_is_file marker &&
+   git -c core.fsmonitor=false status >expect &&
+   test_i18ncmp expect output
+'
+
+# Now make sure it's actually skipping the check for modified and untracked
+# files unless it is told about them.  Note, after "git reset --hard HEAD" no
+# extensions exist other than 'TREE' so do a "git status" to get the extension
+# written before testing the results.
+
+test_expect_success 'status doesnt detect unreported modificat

[PATCH v3 1/6] bswap: add 64 bit endianness helper get_be64

2017-05-25 Thread Ben Peart
Add a new get_be64 macro to enable 64 bit endian conversions on memory
that may or may not be aligned.

Signed-off-by: Ben Peart 
---
 compat/bswap.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/compat/bswap.h b/compat/bswap.h
index d47c003544..f89fe7f4b5 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -158,6 +158,7 @@ static inline uint64_t git_bswap64(uint64_t x)
 
 #define get_be16(p)ntohs(*(unsigned short *)(p))
 #define get_be32(p)ntohl(*(unsigned int *)(p))
+#define get_be64(p)ntohll(*(uint64_t *)(p))
 #define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
 
 #else
@@ -170,6 +171,9 @@ static inline uint64_t git_bswap64(uint64_t x)
(*((unsigned char *)(p) + 1) << 16) | \
(*((unsigned char *)(p) + 2) <<  8) | \
(*((unsigned char *)(p) + 3) <<  0) )
+#define get_be64(p)( \
+   ((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
+   ((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0)
 #define put_be32(p, v) do { \
unsigned int __v = (v); \
*((unsigned char *)(p) + 0) = __v >> 24; \
-- 
2.13.0.windows.1.9.gc201c67b71



[PATCH v3 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman

2017-05-25 Thread Ben Peart
This hook script integrates the new fsmonitor capabilities of git with
the cross platform Watchman file watching service. To use the script:

Download and install Watchman from https://facebook.github.io/watchman/
and instruct Watchman to watch your working directory for changes
('watchman watch-project /usr/src/git').

Rename the sample integration hook from query-fsmonitor.sample to
query-fsmonitor.

Configure git to use the extension ('git config core.fsmonitor true')
and optionally turn on the untracked cache for optimal performance
('git config core.untrackedcache true').

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
---
 templates/hooks--query-fsmonitor.sample | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 templates/hooks--query-fsmonitor.sample

diff --git a/templates/hooks--query-fsmonitor.sample 
b/templates/hooks--query-fsmonitor.sample
new file mode 100644
index 00..615f3332fa
--- /dev/null
+++ b/templates/hooks--query-fsmonitor.sample
@@ -0,0 +1,37 @@
+#!/bin/sh
+#
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to provide fast
+# git status.
+#
+# The hook is passed a time in nanoseconds formatted as a string and
+# outputs to stdout all files that have been modified since the given
+# time. Paths must be relative to the root of the working tree and
+# separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-fsmonitor"
+
+# check the hook interface version
+if [ $1 -eq 1 ]
+then
+   # convert nanoseconds to seconds
+   time_t=$(($2/10))
+else
+   echo -e "Unsupported query-fsmonitor hook version.\nFalling back to 
scanning...\n" >&2
+   exit 1;
+fi
+
+# Convert unix style paths to escaped Windows style paths
+case "$(uname -s)" in
+MINGW*|MSYS_NT*)
+  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,,g')"
+  ;;
+*)
+  GIT_WORK_TREE="$PWD"
+  ;;
+esac
+
+# Query Watchman for all the changes since the requested time
+echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t, 
\"fields\":[\"name\"]}]" | \
+watchman -j | \
+perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die 
"Watchman: $o->{'error'}.\nFalling back to scanning...\n" if 
defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'
-- 
2.13.0.windows.1.9.gc201c67b71



[PATCH v3 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-05-25 Thread Ben Peart
When the index is read from disk, the query-fsmonitor index extension is
used to flag the last known potentially dirty index and untracked cache
entries.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes in the working directory.

Before the 'fsmonitor-dirty' flags are used to limit the scope of the
files to be checked, the query-fsmonitor hook proc is called with the
time the index was last updated.  The hook proc returns the list of
files changed since that last updated time and the list of
potentially dirty entries is updated to reflect the current state.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes.

Signed-off-by: Ben Peart 
---
 Makefile   |   1 +
 builtin/update-index.c |   1 +
 cache.h|   5 ++
 config.c   |   5 ++
 dir.c  |  14 +++
 dir.h  |   2 +
 entry.c|   1 +
 environment.c  |   1 +
 fsmonitor.c| 238 +
 fsmonitor.h|   9 ++
 read-cache.c   |  28 +-
 unpack-trees.c |   1 +
 12 files changed, 304 insertions(+), 2 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index e35542e631..488a4466cc 100644
--- a/Makefile
+++ b/Makefile
@@ -760,6 +760,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ebfc09faa0..32fd977b43 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+   active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
cache_tree_invalidate_path(&the_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
diff --git a/cache.h b/cache.h
index 188811920c..58e5abf69f 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED (1 << 19)
 
 #define CE_HASHED(1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED(1 << 23)
 
@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
@@ -766,6 +770,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
+extern int core_fsmonitor;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index bb4d735701..1a8108636d 100644
--- a/config.c
+++ b/config.c
@@ -1232,6 +1232,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.fsmonitor")) {
+   core_fsmonitor = git_config_bool(var, value);
+   return 0;
+   }
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
 }
diff --git a/dir.c b/dir.c
index 1b5558fdf9..37f1c580a5 100644
--- a/dir.c
+++ b/dir.c
@@ -16,6 +16,7 @@
 #include "utf8.h"
 #include "varint.h"
 #include "ewah/ewok.h"
+#include "fsmonitor.h"
 
 /*
  * Tells read_directory_recursive how a file or directory should be treated.
@@ -1652,6 +1653,18 @@ static int valid_cached_dir(struct dir_struct *dir,
if (!untracked)
return 0;
 
+   refresh_by_fsmonitor(&the_index);
+   if (dir->untracked->use_fsmonitor) {
+   /*
+* With fsmonitor, we can trust the untracked cache's
+* valid field.
+*/
+   if (untracked->valid)
+   goto skip_stat;
+   else
+   invalidate_directory(dir->untracked, untracked);
+   }
+
if (stat(path->len ? path->buf : ".

[PATCH v3 5/6] fsmonitor: add documentation for the fsmonitor extension.

2017-05-25 Thread Ben Peart
This includes the core.fsmonitor setting, the query-fsmonitor hook,
and the fsmonitor index extension.

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  7 +++
 Documentation/githooks.txt   | 23 +++
 Documentation/technical/index-format.txt | 19 +++
 3 files changed, 49 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index e0b9fd0bc3..5211388167 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -389,6 +389,13 @@ core.protectNTFS::
8.3 "short" names.
Defaults to `true` on Windows, and `false` elsewhere.
 
+core.fsmonitor::
+   If set to true, call the query-fsmonitor hook proc which will
+   identify all files that may have had changes since the last
+   request. This information is used to speed up operations like
+   'git commit' and 'git status' by limiting what git must scan to
+   detect changes.
+
 core.trustctime::
If false, the ctime differences between the index and the
working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 706091a569..48127e8729 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that 
they were
 processed by rebase.
 
 
+[[query-fsmonitor]]
+query-fsmonitor
+~~~
+
+This hook is invoked when the configuration option core.fsmonitor is
+set and git needs to identify changed or untracked files.  It takes
+two arguments, a version (currently 1) and the time in elapsed
+nanoseconds since midnight, January 1, 1970.
+
+The hook should output to stdout the list of all files in the working
+directory that may have changed since the requested time.  The logic
+should be inclusive so that it does not miss any potential changes.
+The paths should be relative to the root of the working directory
+and be separated by a single NUL.
+
+Git will limit what files it checks for changes as well as which
+directories are checked for untracked files based on the path names
+given.
+
+The exit status determines whether git will use the data from the
+hook to limit its search.  On error, it will fall back to verifying
+all files and folders.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index ade0b0c445..7aeeea6f94 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,22 @@ The remaining data of each directory block is grouped by 
type:
 in the previous ewah bitmap.
 
   - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the query-fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+   time which is stored as the nanoseconds elapsed since midnight,
+   January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+is CE_FSMONITOR_DIRTY.
-- 
2.13.0.windows.1.9.gc201c67b71



[PATCH v3 2/6] dir: make lookup_untracked() available outside of dir.c

2017-05-25 Thread Ben Peart
Remove the static qualifier from lookup_untracked() and make it
available to other modules by exporting it from dir.h.  This will be
used later when we need to find entries to mark 'fsmonitor dirty.'

Signed-off-by: Ben Peart 
---
 dir.c | 2 +-
 dir.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index f451bfa48c..1b5558fdf9 100644
--- a/dir.c
+++ b/dir.c
@@ -660,7 +660,7 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
struct untracked_cache_dir 
*dir,
const char *name, int len)
 {
diff --git a/dir.h b/dir.h
index bf23a470af..9e387551bd 100644
--- a/dir.h
+++ b/dir.h
@@ -339,4 +339,7 @@ extern void connect_work_tree_and_git_dir(const char 
*work_tree, const char *git
 extern void relocate_gitdir(const char *path,
const char *old_git_dir,
const char *new_git_dir);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+struct untracked_cache_dir *dir,
+const char *name, int len);
 #endif
-- 
2.13.0.windows.1.9.gc201c67b71



[PATCH v3 0/6] Fast git status via a file system watcher

2017-05-25 Thread Ben Peart
Changes from V2 include:
 - switch to nanoseconds for last update time saved in index extension
   and passed to hook
 - pass the hook a version to enable simpler future updates
 - fixup compiler warnings found with different compilers
 - update test to run properly on Mac
 - fix documentation formatting and spelling errors
 - update code formatting based on feedback
 - rename fsmonitor_dirty_bitmap to fsmonitor_dirty

Ben Peart (6):
  bswap: add 64 bit endianness helper get_be64
  dir: make lookup_untracked() available outside of dir.c
  fsmonitor: teach git to optionally utilize a file system monitor to
speed up detecting new or changed files.
  fsmonitor: add test cases for fsmonitor extension
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: add a sample query-fsmonitor hook script for Watchman

 Documentation/config.txt |   7 +
 Documentation/githooks.txt   |  23 +++
 Documentation/technical/index-format.txt |  19 +++
 Makefile |   1 +
 builtin/update-index.c   |   1 +
 cache.h  |   5 +
 compat/bswap.h   |   4 +
 config.c |   5 +
 dir.c|  16 ++-
 dir.h|   5 +
 entry.c  |   1 +
 environment.c|   1 +
 fsmonitor.c  | 238 +++
 fsmonitor.h  |   9 ++
 read-cache.c |  28 +++-
 t/t7519-status-fsmonitor.sh  | 158 
 templates/hooks--query-fsmonitor.sample  |  37 +
 unpack-trees.c   |   1 +
 18 files changed, 556 insertions(+), 3 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h
 create mode 100755 t/t7519-status-fsmonitor.sh
 create mode 100644 templates/hooks--query-fsmonitor.sample

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index f7b4b4a844..48127e8729 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -450,12 +450,12 @@ processed by rebase.
 
 [[query-fsmonitor]]
 query-fsmonitor
-
+~~~
 
 This hook is invoked when the configuration option core.fsmonitor is
 set and git needs to identify changed or untracked files.  It takes
-a single argument which is the time in elapsed seconds since midnight,
-January 1, 1970.
+two arguments, a version (currently 1) and the time in elapsed
+nanoseconds since midnight, January 1, 1970.
 
 The hook should output to stdout the list of all files in the working
 directory that may have changed since the requested time.  The logic
diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index b002d23c05..7aeeea6f94 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -307,7 +307,8 @@ The remaining data of each directory block is grouped by 
type:
   - 32-bit version number: the current supported version is 1.
 
   - 64-bit time: the extension data reflects all changes through the given
-   time which is stored as the seconds elapsed since midnight, January 1, 
1970.
+   time which is stored as the nanoseconds elapsed since midnight,
+   January 1, 1970.
 
   - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
 
diff --git a/cache.h b/cache.h
index 36423c77cc..58e5abf69f 100644
--- a/cache.h
+++ b/cache.h
@@ -344,8 +344,8 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
-   time_t fsmonitor_last_update;
-   struct ewah_bitmap *fsmonitor_dirty_bitmap;
+   uint64_t fsmonitor_last_update;
+   struct ewah_bitmap *fsmonitor_dirty;
 };
 
 extern struct index_state the_index;
diff --git a/fsmonitor.c b/fsmonitor.c
index 9f08e66db9..3ce262d47d 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -5,6 +5,9 @@
 #include "strbuf.h"
 #include "fsmonitor.h"
 
+#define INDEX_EXTENSION_VERSION1
+#define HOOK_INTERFACE_VERSION 1
+
 static struct untracked_cache_dir *find_untracked_cache_dir(
struct untracked_cache *uc, struct untracked_cache_dir *ucd,
const char *name)
@@ -56,20 +59,20 @@ int read_fsmonitor_extension(struct index_state *istate, 
const void *data,
 
hdr_version = get_be32(index);
index += sizeof(uint32_t);
-   if (hdr_version != 1)
+   if (hdr_version != INDEX_EXTENSION_VERSION)
return error("bad fsmonitor version %d", hdr_version);
 
-   istate->fsmonitor_last_update = (time_t)get_be64(index);
+   istate->fsmonitor_last_update = get_be64(index);
index += sizeof(uint64_t);
 
ewah_size = get_be32(index);
index += sizeof(uint32_t);
 
-   istate->fsmonitor_dirty_bitmap = ewah_new();
-

Re: Documentation issue: git-stash examples

2017-05-25 Thread Jeff King
On Thu, May 25, 2017 at 09:52:42PM +1200, Adrian Forbes wrote:

> Some of the example commands in git-stash documentation should be
> written as comments rather than actual commands:
> https://cloud.githubusercontent.com/assets/24915363/26444394/5cf6a754-4190-11e7-845e-135288c8916e.png
> 
> For example, `$ edit emergency fix` should be `# ... edit emergency
> fix ...` like the other comments in the section.
> 
> It could be misleading for novices.

Yeah, I think that's a good idea. Do you want to try your hand at a
patch?

-Peff


Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-25 Thread Jeff King
On Thu, May 25, 2017 at 02:04:07PM -0400, Jeff King wrote:

> > ...that might be something worth thinking about---my gut feeling
> > tells me something but we should go by a measurement, not by gut
> > feeling of a random somebody.
> 
> Yeah, I'd agree. I had the impression the original change was motivated
> by gut feeling.

Hmph. On Linux, at least, I do not see that using "git merge" to
fast-forward is appreciably faster:

Here are timings for:

  git reset --hard HEAD~10 && git pull --rebase

in a git.git repo, using builds of git from various commits (all
best-of-five; the timings include the reset for simplicity, but
presumably it costs the same in each case):

  - 33b842a1e^ (just prior to the switch to git-merge)
real  0m0.256s
user  0m0.096s
sys   0m0.020s

  - 33b842a1e  (using git-merge)
real  0m0.227s
user  0m0.092s
sys   0m0.020s

So a little faster, but there seems to be 20-30ms of noise in my timings
anyway (the average for the "prior" case did seem to be higher, though).
It's possible that the difference would be more stark on Windows, where
the cost of the shell script would be higher.

The same test with the current master performs the same as 33b842a1e.
But if I then remove the optimization, as Tyler's patch did at the start
of this thread, the timings are similar to 33b842a1e^.

So I dunno. It does not seem appreciably faster, but what little speedup
it does provide is the same even with a more modern rebase. Which is
probably because that rebase isn't actually doing much in the first
place, so the optimized bits from Dscho's rebase--helper are not kicking
in yet.

Anyway. All this has shown me is that it's probably pointless to do this
timing at all on Linux. Somebody on Windows might get better results.

But regardless, we need to do something. Correctness must trump
optimizations, and the question is whether we can throw out the whole
conditional, or if we should just restrict when it kicks in.

-Peff


Re: `pull --rebase --autostash` fails when fast forward in dirty repo

2017-05-25 Thread Jeff King
On Wed, May 24, 2017 at 07:40:08AM +0900, Junio C Hamano wrote:

> > But I notice on the run_merge() code path that we do still invoke
> > git-merge.
> 
> ... wouldn't that what we want even when the merge happens to be a
> fast-forward one?  I am not sure what you meant by this, but...

I just meant that if the point of the optimization was to avoid invoking
git-rebase (because it's slow), then we're still not optimizing out a
process. It only helps at all because "rebase" (being a shell script)
may be slower to start and realize it's a fast-forward than "merge".

But once that is no longer true of git-rebase, then there is no purpose
to the optimization.

> > And rebase has been getting faster as it is moved to C code
> > itself. So is this optimization even worth doing anymore?
> 
> ...that might be something worth thinking about---my gut feeling
> tells me something but we should go by a measurement, not by gut
> feeling of a random somebody.

Yeah, I'd agree. I had the impression the original change was motivated
by gut feeling.

-Peff


Re: [PATCHv5 00/17] Diff machine: highlight moved lines.

2017-05-25 Thread Stefan Beller
On Wed, May 24, 2017 at 11:44 PM, Junio C Hamano  wrote:
> I was trying to see how this is useful in code moving change, with
> this command line:
>
> $ git -c color.moved diff js/blame-lib~6 js/blame-lib blame.c blame.h 
> builtin/blame.c
>
> Some random observations:
>
>  * You do not seem to have any command line option yet.  I guess
>that is OK while the series is still in RFC state, but when we
>are ready to seriously start considering this for 'next', we'd
>need something like --color-moved that can be used across "diff",
>"log -p" and "show".

There is "--color-moved" as a command line option. (See diff.c:4290)
Oh, it is not documented! That will be fixed.

>  * As configuration variable names go, "color.moved" is probably in
>a wrong section.  Perhaps "diff.colorMoved" or something?

As you turn on/off normal coloring via "color.diff" and this only extends
the coloring scheme, I have the impression "color" is the right section.
Maybe color.diffStyle=[normal/enhanced] and "enhanced" means this
feature for now?

The only option in the "diff" section related to color is diff.wsErrorHighlight
which has a very similar purpose, so "diff.colorMoved" would fit in that
scheme.

>
>  * The fact that I am using
>
>  [diff "color"]
> old = red reverse
> whitespace = blue reverse
>
>on a "black ink on white paper" terminal might have an effect on
>this,

Yes very much!

>but lines printed in either bold green and on green
>background (i.e. not new ones but are the ones moved from
>elsewhere) stood out a lot more prominently than lines printed in
>green (i.e. truly new additions), and I felt that this is totally
>backwards from what I needed for this exercise.  That is, while
>reviewing a code moving change, I want to be able to concentrate
>primarily of three things:
>
>- What are the new lines that are *not* moved from elsewhere?
>  Did the submitter try to sneak in unrelated changes?
>
>- What are the changes that are truly lost, not moved to
>  elsewhere?
>
>- Do the lines moved from elsewhere form a coherent whole?  Where
>  is the boundary between blocks of text that are copied?  Do
>  adjacent two blocks of moved text come from the same old place
>  next to each other?

So with these questions, I wonder if we want to color moved lines
as "color.diff.context" (i.e. plain white text in the normal coloring scheme)
This would serve the intended purpose of
dimming the attention to moved lines.

Regarding the last point of marking up adjacent blocks (which would
indicate that there is a coherency issue or just moving from different
places), we could highlight the last line of the previous block
and first line of the next block in their "normal" colors (i.e.
color.diff.old and color.diff.new).

The very first version had some boundary coloring, but then
I switched to alternating block coloring based on an idea
by Jonathan Tan.

Maybe it is time to go back to boundary coloring, but optional
and apply the boundary color only if two blocks are adjacent?

Example of how it could look:
---8<---
diff --git a/poetry.txt b/poetry.txt
index 9d32b3b..cc50ca1 100644
--- a/poetry.txt
+++ b/poetry.txt
@@ -1,12 +1,4 @@
[W] -A simple text is
[W] -written in paragraphs and
[W] -many more lines.
[R] -
[W]  A diff is smaller
[W]  than the pre or post image text form
[W]  used for review
[W]
[W] -In between focus
[W] -is hard to keep consistently
[W] -errors may occur
[W] -
diff --git a/engineer.txt b/engineer.txt
new file mode 100644
index 000..8fbc0ce
--- /dev/null
+++ b/engineer.txt
@@ -0,0 +1,7 @@
[W] +A simple text is
[W] +written in paragraphs and
[B] +many more lines.
[B] +In between focus
[W] +is hard to keep consistently
[W] +errors may occur
[W] +
---8<---

W -> simple White text (diff.color.context)
R -> Red text (diff.color.old)
B -> Boundary marker (Maybe just diff.color.{old/new}
  or a new color to configure)

>
>Using colors that stand out more prominently than for the regular
>new/old lines is counter-productive for all of these, especially
>for the first two purposes.  I may suggest using cyan (or any
>color that is very close to the background) so that the presense
>of moved lines are merely felt without being distracting.  IOW,
>while reviewing code moving patch, moved parts want to be dimmed,
>not highlighted.

I agree. So I could resend the algorithm used with other defaults
or try out the "boundary only iff adjacent blocks, else context color".

Thanks,
Stefan


Urgent Message

2017-05-25 Thread Mrs Gloria
Dear Good day,


I sent this mail praying for it to reach you in good health, since I
myself are in a very critical health condition in which I sleep every
night without knowing if I may be alive to see the next day. I am a
widow suffering from long time illness. I have some funds I inherited
from my late husband, my Doctor told me recently that I would not last
due to the illness. Having known my condition, I decided to donate
this fund to a good person that will utilize it the way i am going to
instruct herein. I need a very honest  person who can claim this money
and use it for Charity works, for orphanages, widows and also build
schools for less privilege .

 I accept this decision because I do not have any child who will
inherit this money after I die.Please I want your sincerely and urgent
answer to know if you will be able to execute this project, and I will
give you more information on how the fund will be transferred to your
bank account.

I am waiting for your reply and  my private email address is
caronglori...@yahoo.com
Thank you,

Mrs Gloria


Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch

2017-05-25 Thread Jeff King
On Thu, May 25, 2017 at 12:13:03PM +0900, Junio C Hamano wrote:

> >> So if we wanted to improve this, I think the first step would be for the
> >> server to start sending symref lines for HEAD, even when it does not
> >> resolve to anything.
> >
> > Yup, noticed the same and I agree with your conclusion.
> 
> We probably should make head_ref_namespaced() to take the
> resolve_flags that is passed down thru refs_read_ref_full() down to
> refs_resolve_ref_unsafe().  For the purpose of the first call to it
> in upload-pack.c to call back find_symref(), we do not need and want
> to say RESOLVE_REF_READING (which requires a symref to be pointing
> at an existing ref).  I suspect all the other calls (there are 2
> other in unload-pack.c and yet another in http-backend.c) to the
> function should keep passing RESOLVE_REF_READING, as they do want to
> omit a symbolic ref that points at an unborn branch.

That would make head_ref() not function-pointer compatible with all the
other for_each_ref functions. I don't know how much that matters. The
revision.c parser does use function pointers, but it doesn't handle HEAD
specially.

So I kind of wonder if that code should simply be calling
resolve_ref_unsafe() itself in the first place. We know the only value
it's going to get is HEAD.

OTOH, it's possible that we would eventually want to report all symrefs,
including ones we find while traversing the refs. And in that sense, all
of the for_each_ref functions would want to learn about this. But for
ref advertisement I think that would need a protocol change anyway, so
I'm not sure it's worth worrying about.

The just-HEAD case could look like:

diff --git a/upload-pack.c b/upload-pack.c
index 97da13e6a..04a913ad1 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -959,28 +959,25 @@ static int send_ref(const char *refname, const struct 
object_id *oid,
return 0;
 }
 
-static int find_symref(const char *refname, const struct object_id *oid,
-  int flag, void *cb_data)
+static void find_symref(const char *refname, struct string_list *out)
 {
const char *symref_target;
struct string_list_item *item;
struct object_id unused;
+   int flag;
 
-   if ((flag & REF_ISSYMREF) == 0)
-   return 0;
symref_target = resolve_ref_unsafe(refname, 0, unused.hash, &flag);
if (!symref_target || (flag & REF_ISSYMREF) == 0)
-   die("'%s' is a symref but it is not?", refname);
-   item = string_list_append(cb_data, refname);
+   return;
+   item = string_list_append(out, refname);
item->util = xstrdup(symref_target);
-   return 0;
 }
 
 static void upload_pack(void)
 {
struct string_list symref = STRING_LIST_INIT_DUP;
 
-   head_ref_namespaced(find_symref, &symref);
+   find_symref("HEAD", &symref);
 
if (advertise_refs || !stateless_rpc) {
reset_timeout();


Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths

2017-05-25 Thread Samuel Lijin
On Wed, May 24, 2017 at 12:14 AM, Torsten Bögershausen  wrote:
>
>> diff --git a/builtin/clean.c b/builtin/clean.c
>> index d861f836a..937eb17b6 100644
>> --- a/builtin/clean.c
>> +++ b/builtin/clean.c
>> @@ -857,6 +857,38 @@ static void interactive_main_loop(void)
>> }
>>   }
>>   +static void correct_untracked_entries(struct dir_struct *dir)
>
> Looking what the function does, would
> drop_or_keep_untracked_entries()
> make more sense ?

To me, drop_or_keep_ implies that they're either all dropped or all
kept, nothing in between, which is why I went with correct_, to
indicate that the set of untracked entries in the dir_struct prior to
calling the method needs to be corrected.


[RFC/PATCH] recognize pathspec magic without "--" disambiguation

2017-05-25 Thread Jeff King
For commands that take revisions and pathspecs, magic
pathspecs like ":^Documentation" or ":/Documentation" have
to appear on the right-hand side of the disambiguating "--",
like:

  git log -- :^Documentation

This makes them more annoying to use than they need to be.
We loosened the rules for wildcards in 28fcc0b71 (pathspec:
avoid the need of "--" when wildcard is used, 2015-05-02).
Let's do the same for arguments that look like pathspec
magic (colon followed by any punctuation).

The obvious and desired impact is that you can now do:

  git log :^Documentation

But let's consider related invocations and whether we're
making them better or worse:

   - git log :/foo
  (when "foo" matches a commit message)

  This one should remain the same. Like the existing
  wildcard rule, we're touching only verify_filename(),
  not verify_non_filename(). So cases that _do_ resolve
  as a rev will continue to do so.

   - git log :^foo
  (when "^foo" exists in your index)

  The same logic applies; it would continue to work. And
  ditto for any other weird filenames in your index like
  "(attr)foo".

   - git log :/foo
  (when "foo" does _not_ match a commit message)

  We won't treat this as a revision, because it doesn't
  match anything. So prior to this patch, we'd either
  treat it as a path (if "foo" does exist at the root of
  the project) or complain "this isn't a rev, and nor is
  it a path".  Whereas after this patch, we'd always
  treat it like a path, whether it exists or not (so in
  the second case instead of an error, you're likely to
  get an empty "log", unless "foo" really did exist
  somewhere in your history). So this is a potential
  downside; if the user intended a search of the commit
  messages, they may prefer the error message to the
  attempted pathspec match.

  This same downside actually exists currently when you
  have an asterisk in your regex. E.g.,

git log :/foo.*bar

  will be treated as a pathspec (if it doesn't match a
  commit message) due to the wildcard matching in
  28fcc0b71.

   - git log :^foo
   (when "^foo" isn't in your index)

  The same outcomes apply here as above, but this
  "downside" is very unlikely to occur, as "^foo" is
  such a funny name (and again, things like "(attr)foo"
  as a filename are sufficiently uncommon not to worry
  about).

   - git log :%foo
  (or any other pathspec magic char that doesn't exist)

  The new check doesn't actually parse the pathspec
  magic, but allows any punctuation (which includes the
  long-form ":(magic)"). At first glance this seems
  overly permissive, but it actually yields a better
  error message here: instead of complaining that
  ":%foo" is neither a rev nor a path, we treat it as a
  pathspec and complain that "%" is not a known magic
  (the same as we would if the "--" were given).

  Of course, the flip side here is when you really
  meant a file named "%foo" in your index, but it didn't
  exist. That seems reasonably unlikely (and the error
  message is clear enough to point the user in the right
  direction).

So the collateral damage doesn't seem too bad (it's really
just the case where :/foo doesn't match anything). There are
two possibilities for reducing that:

  1. Instead of accepting all pathspec magic, just allow
 certain ones like ":^" which are unlikely to be used to
 specify a revision (or alternatively, just disallow
 potentially confusing ones like ":/").

 That works, but it makes the rules inconsistent and
 confusing for the user (i.e., "--" is sometimes needed
 for magic and sometimes not).

  2. Instead of recognizing pathspec magic, teach
 check_filename() to parse off the filename bit and see
 if that exists (e.g., for "^foo" see if "foo" exists).
 We already do this for ":/", but it's done in a very
 ad-hoc way. We parse it ourselves, rather than relying
 on the pathspec code, and handle only "/" magic, and
 not even its ":(top)" long form.

 That could be fixed by asking the pathspec code to
 parse it for us (including all magic), and then trying
 to match the resulting name against the working tree.
 But not all pathspec magic actually results in a
 pathname. For instance, when should ":(attr)foo" be
 valid? We can't just look for "foo" in the filesystem
 (it's an attribute, not a pathname).

 By comparison, recognizing things that look like
 pathspec magic is a simple and easy to understand rule.

Signed-off-by: Jeff King 
---
I wrote all the above to try to convince myself that this
doesn't have any serious regression cases. And I think I
did.

But I actually we could make the rules in alternative (2)
above work. check_filename() would ask the pathspec code to
parse each argument and get one of three results:

  1

Re: [PATCH v2 0/6] Fast git status via a file system watcher

2017-05-25 Thread Ben Peart



On 5/24/2017 6:54 AM, Christian Couder wrote:

Design
~~

A new git hook (query-fsmonitor) must exist and be enabled
(core.fsmonitor=true) that takes a time_t formatted as a string and
outputs to stdout all files that have been modified since the requested
time.


Is there a reason why there is a new hook, instead of a
"core.fsmonitorquery" config option to which you could pass whatever
command line with options?


A hook is a simple and well defined way to integrate git with another 
process.  If there is some fixed set of arguments that need to be passed 
to a file system monitor (beyond the timestamp stored in the index 
extension), they can be encoded in the integration script like I've done 
in the Watchman integration sample hook.





A new 'fsmonitor' index extension has been added to store the time the
fsmonitor hook was last queried and a ewah bitmap of the current
'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
entries are 'fsmonitor-dirty.'

As needed, git will call the query-fsmonitor hook proc for the set of
changes since the index was last updated. Git then uses this set of
files along with the list saved in the fsmonitor index extension to flag
the potentially dirty index and untracked cache entries.


So this can work only if "core.untrackedCache" is set to true?



This works with core.untrackedCache set to true or false.  If it is set 
to false, you get valid results, you just don't get the speed up when 
checking for untracked files.



Thanks for working on this,
Christian.



Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.

2017-05-25 Thread Ben Peart



On 5/22/2017 1:28 PM, Ævar Arnfjörð Bjarmason wrote:

On Mon, May 22, 2017 at 6:18 PM, Ben Peart  wrote:

On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:


+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the
query-fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the
given
+   time which is stored as the seconds elapsed since midnight,
January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+is CE_FSMONITOR_DIRTY.



We already have a uint64_t in one place in the codebase (getnanotime)
which uses a 64 bit time for nanosecond accuracy, and numerous
filesystems already support nanosecond timestamps (ext4, that new
Apple thingy...).

I don't know if any of the inotify/fsmonitor APIs support that yet,
but it seems inevitable that that'll be added if not, in some
pathological cases we can have a lot of files modified in 1 second, so
using nanosecond accuracy means there'll be a lot less data to
consider in some cases.

It does mean this'll only work until the year ~2500, but that seems
like an acceptable trade-off.



I really don't think nano-second resolution is needed in this case for a few
reasons.

The number of files that can change within a given second is limited by the
IO throughput of the underlying device. Even assuming a very fast device and
very small files and changes, this won't be that many files.

Without this patch, git would have scanned all those files every time. With
this patch, git will only scan those files a 2nd time that are modified in
the same second that it did the first scan *that came before the first scan
started* (the "lots of files modified" section in the 1 second timeline
below).

|- one second -|
|-lots of files modified - git status - more file modified-|

Yes, some duplicate status checks can be made but its still a significant
win in any reasonable scenario. Especially when you consider that it is
pretty unusual to do git status/add/commit calls in the middle of making
lots of changes to files.


I understand that we get most of the wins either way.

I'm just wondering why not make it nanosecond-resolution now from the
beginning since that's where major filesystems are heading already,
and changing stuff like this can be a hassle once it's initially out
there, whereas just dividing by 10^9 for APIs that need seconds from
the beginning is easy & future-proof.

There are some uses of git where this would probably matter in practice.

E.g. consider a git-annex backed fileserver using ext4, currently
git-annex comes with its own FS watcher as a daemon invoked via `git
annex watch` to constantly add new files without killing performance
via a status/add in a loop, with this a daemon like that could just
run status/add in a loop, but on a busy repo the 1s interval size
might start to matter as you're constantly inspecting larger
intervals.

More importantly though, I can't think of any case where having it in
nanoseconds to begin with would do any harm.


You're right, it's not hard to support nano second resolution and it 
doesn't do any harm.  I switch the index format and interface as I don't 
expect this code will still be running when the timer rolls over. 
Someone long after me will have to fix it if it is. :)





In addition, the backing file system monitor (Watchman) supports number of
seconds since the unix epoch (unix time_t style).  This means any support of
nano seconds by git is academic until someone provides a file system watcher
that does support nano second granularity.


I haven't used watchman for anything non-trivial, but the
documentation for the query API you seem to be using says you can
request a {ctime,mtime}_ns field:

https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields

And isn't this the documentation for the "since" query you're using,
saying you can specify any arbitrary fs timing field, such as a _ns
time field:

https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md

?


To keep the index extension and hook interface generic, I have not 
adopted the Watchman specific clock format.  This enables anyone to 
provide a different file system watcher that can inter-operate as nano 
seconds since epoc is easy for anyone to support.





Finally, the fsmonitor index extension is versioned so that we can
seamlessly upgrade to nano second resolution later if we desire.


Aside from my nanosecond bikeshedding, let's say we change the
semantics of any of this in the future: The index has the version, but
there's one argument passed to the hook without a v

Re: What's cooking in git.git (May 2017, #06; Mon, 22)

2017-05-25 Thread Duy Nguyen
On Mon, May 22, 2017 at 1:11 PM, Junio C Hamano  wrote:
> * nd/fopen-errors (2017-05-09) 23 commits
>  - t1308: add a test case on open a config directory
>  - config.c: handle error on failing to fopen()
>  - xdiff-interface.c: report errno on failure to stat() or fopen()
>  - wt-status.c: report error on failure to fopen()
>  - server-info: report error on failure to fopen()
>  - sequencer.c: report error on failure to fopen()
>  - rerere.c: report correct errno
>  - rerere.c: report error on failure to fopen()
>  - remote.c: report error on failure to fopen()
>  - commit.c: report error on failure to fopen() the graft file
>  - log: fix memory leak in open_next_file()
>  - log: report errno on failure to fopen() a file
>  - blame: report error on open if graft_file is a directory
>  - bisect: report on fopen() error
>  - ident.c: use fopen_or_warn()
>  - attr.c: use fopen_or_warn()
>  - wrapper.c: add fopen_or_warn()
>  - wrapper.c: add warn_on_fopen_errors()
>  - config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
>  - config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
>  - clone: use xfopen() instead of fopen()
>  - Use xfopen() in more places
>  - git_fopen: fix a sparse 'not declared' warning
>
>  We often try to open a file for reading whose existence is
>  optional, and silently ignore errors from open/fopen; report such
>  errors if they are not due to missing files.

If anyone wants to continue this, I've cleaned up the series based on
Johannes comments here [1]. It does not have the Darwin change though.
There was the last question about the '*' test change in ref path and
what exact code change causes that, which I can't answer because I
don't have Windows, or the time to simulate and pinpoint the fault on
Linux.

[1] https://github.com/pclouds/git/commits/fopen-or-warn
-- 
Duy


  1   2   >