Re: Bug report: Corrupt pack file after committing a large file (>4 GB?)
On 2017-05-26 07:51, Yu-Hsuan Chen wrote: > Dear maintainer, > > There is a bug where committing a large file corrupts the pack file in > Windows. Steps to recreate are: > > 1. git init > 2. stage and commit a file larger than 4 GB (not entirely sure about this > size) > 3. git checkout -f > > The file checked out is much smaller than the original file size. > > This behavior is surprising. If git does not support large files, I > would at least expect an error message when staging or committing. I > have post a question on StackOverflow regrading this issue, and has > been confirmed by another user. (question id: 44022897) > > Best regards, > > David Chen > Issues for Git for Windows should, in general, be reported here: https://github.com/git-for-windows/git/ After 2 seconds of searching, we can find that the 4Gb problem has already been reported: https://github.com/git-for-windows/git/issues/1063 And, to my knowledge, it has not been fixed, since it is a lot of effort to replace the "long" (or unsigned long) in the Git code base with a better data type. In other words, thanks for reminding us, more help is needed.
Re: Bug report: Corrupt pack file after committing a large file (>4 GB?)
On Fri, May 26, 2017 at 01:51:34PM +0800, Yu-Hsuan Chen wrote: > There is a bug where committing a large file corrupts the pack file in > Windows. Steps to recreate are: > > 1. git init > 2. stage and commit a file larger than 4 GB (not entirely sure about this > size) > 3. git checkout -f > > The file checked out is much smaller than the original file size. For a bug report to be at least marginally useful, please state your Git version (run "git --version") and the version of your Windows installation (including whether it's 32- or 64-bit install). Please also make sure you're really using Git for Windows (that is, you got it from [1] or [2] and not, say, from Cygwin. The best course of actions is to download the most recent available version from the locations mentioned above and verify the problem still manifests itself. 1. https://git-scm.com/download/win 2. https://git-for-windows.github.io/
Re: `pull --rebase --autostash` fails when fast forward in dirty repo
Tyler Brazier writes: > On Thu, May 25, 2017 at 6:33 PM, Junio C Hamano wrote: >> Jeff King writes: >> >>> Anyway. All this has shown me is that it's probably pointless to do this >>> timing at all on Linux. Somebody on Windows might get better results. >>> >>> But regardless, we need to do something. Correctness must trump >>> optimizations, and the question is whether we can throw out the whole >>> conditional, or if we should just restrict when it kicks in. >> >> Yes. I personally do not mind going with the simplest approach. >> The optimization thing is relatively new and we were perfectly happy >> without it before ;-). [administrivia: please do not top-post] > Does git accept outside pull requests? I wouldn't mind committing the > fix for this once it's been decided what the fix should be. (It might > help my resume ;) Please see Documentation/SubmittingPatches. Thanks.
Bug report: Corrupt pack file after committing a large file (>4 GB?)
Dear maintainer, There is a bug where committing a large file corrupts the pack file in Windows. Steps to recreate are: 1. git init 2. stage and commit a file larger than 4 GB (not entirely sure about this size) 3. git checkout -f The file checked out is much smaller than the original file size. This behavior is surprising. If git does not support large files, I would at least expect an error message when staging or committing. I have post a question on StackOverflow regrading this issue, and has been confirmed by another user. (question id: 44022897) Best regards, David Chen
Re: `pull --rebase --autostash` fails when fast forward in dirty repo
Does git accept outside pull requests? I wouldn't mind committing the fix for this once it's been decided what the fix should be. (It might help my resume ;) On Thu, May 25, 2017 at 6:33 PM, Junio C Hamano wrote: > Jeff King writes: > >> Anyway. All this has shown me is that it's probably pointless to do this >> timing at all on Linux. Somebody on Windows might get better results. >> >> But regardless, we need to do something. Correctness must trump >> optimizations, and the question is whether we can throw out the whole >> conditional, or if we should just restrict when it kicks in. > > Yes. I personally do not mind going with the simplest approach. > The optimization thing is relatively new and we were perfectly happy > without it before ;-). >
Re: [PATCH] docs/config.txt: fix indefinite article in core.fileMode description
Obviously correct. Thanks.
[PATCH v3 11/13] log: fix memory leak in open_next_file()
From: Nguyễn Thái Ngọc Duy Noticed-by: Jeff King Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/log.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index 26d6a3cf14..f075838df9 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -842,8 +842,10 @@ static int open_next_file(struct commit *commit, const char *subject, if (output_directory) { strbuf_addstr(&filename, output_directory); if (filename.len >= - PATH_MAX - FORMAT_PATCH_NAME_MAX - suffix_len) + PATH_MAX - FORMAT_PATCH_NAME_MAX - suffix_len) { + strbuf_release(&filename); return error(_("name of output directory is too long")); + } strbuf_complete(&filename, '/'); } @@ -857,9 +859,11 @@ static int open_next_file(struct commit *commit, const char *subject, if (!quiet) printf("%s\n", filename.buf + outdir_offset); - if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL) - return error_errno(_("Cannot open patch file %s"), - filename.buf); + if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL) { + error_errno(_("Cannot open patch file %s"), filename.buf); + strbuf_release(&filename); + return -1; + } strbuf_release(&filename); return 0; -- 2.13.0-491-g71cfeddc25
[PATCH v3 10/13] rerere.c: move error_errno() closer to the source system call
From: Nguyễn Thái Ngọc Duy We are supposed to report errno from fopen(). fclose() between fopen() and the report function could either change errno or reset it. Signed-off-by: Junio C Hamano --- rerere.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rerere.c b/rerere.c index 1351b0c3fb..c26c29f87a 100644 --- a/rerere.c +++ b/rerere.c @@ -489,8 +489,9 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output if (output) { io.io.output = fopen(output, "w"); if (!io.io.output) { + error_errno("Could not write %s", output); fclose(io.input); - return error_errno("Could not write %s", output); + return -1; } } -- 2.13.0-491-g71cfeddc25
[PATCH v3 08/13] wrapper.c: make warn_on_inaccessible() static
From: Nguyễn Thái Ngọc Duy After the last patch, this function is not used outside anymore. Keep it static. Noticed-by: Ramsay Jones Signed-off-by: Junio C Hamano --- git-compat-util.h | 2 -- wrapper.c | 10 +- 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index f74b401810..87f47828a5 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1100,8 +1100,6 @@ int remove_or_warn(unsigned int mode, const char *path); int access_or_warn(const char *path, int mode, unsigned flag); int access_or_die(const char *path, int mode, unsigned flag); -/* Warn on an inaccessible file that ought to be accessible */ -void warn_on_inaccessible(const char *path); /* Warn on an inaccessible file if errno indicates this is an error */ int warn_on_fopen_errors(const char *path); diff --git a/wrapper.c b/wrapper.c index 6e513c904a..b117eb14a4 100644 --- a/wrapper.c +++ b/wrapper.c @@ -418,6 +418,11 @@ FILE *fopen_for_writing(const char *path) return ret; } +static void warn_on_inaccessible(const char *path) +{ + warning_errno(_("unable to access '%s'"), path); +} + int warn_on_fopen_errors(const char *path) { if (errno != ENOENT && errno != ENOTDIR) { @@ -597,11 +602,6 @@ int remove_or_warn(unsigned int mode, const char *file) return S_ISGITLINK(mode) ? rmdir_or_warn(file) : unlink_or_warn(file); } -void warn_on_inaccessible(const char *path) -{ - warning_errno(_("unable to access '%s'"), path); -} - static int access_error_is_ok(int err, unsigned flag) { return err == ENOENT || err == ENOTDIR || -- 2.13.0-491-g71cfeddc25
[PATCH v3 12/13] wrapper: factor out is_missing_file_error()
Our code often opens a path to an optional file, to work on its contents when we can successfully open it. We can ignore a failure to open if such an optional file does not exist, but we do want to report a failure in opening for other reasons (e.g. we got an I/O error, or the file is there, but we lack the permission to open). There is a logic to determine if an errno left by open/fopen indicates such an ignorable error. Split it out into a helper function. We may want to make it an extern in later steps, as many hits from "git grep ENOENT" would instead want to be using the same logic. Signed-off-by: Junio C Hamano --- wrapper.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/wrapper.c b/wrapper.c index b117eb14a4..f1c87ec7ea 100644 --- a/wrapper.c +++ b/wrapper.c @@ -423,9 +423,23 @@ static void warn_on_inaccessible(const char *path) warning_errno(_("unable to access '%s'"), path); } +/* + * Our code often opens a path to an optional file, to work on its + * contents when we can successfully open it. We can ignore a failure + * to open if such an optional file does not exist, but we do want to + * report a failure in opening for other reasons (e.g. we got an I/O + * error, or the file is there, but we lack the permission to open). + * + * Call this function after seeing an error from open() or fopen() to + * see if the errno indicates a missing file that we can safely ignore. + */ +static int is_missing_file_error(int errno_) { + return (errno_ == ENOENT || errno_ == ENOTDIR); +} + int warn_on_fopen_errors(const char *path) { - if (errno != ENOENT && errno != ENOTDIR) { + if (!is_missing_file_error(errno)) { warn_on_inaccessible(path); return -1; } -- 2.13.0-491-g71cfeddc25
[PATCH v3 13/13] is_missing_file_error(): work around EINVAL on Windows
When asked to open/fopen a path, e.g. "a/b:/c", which does not exist on the filesystem, Windows (correctly) fails to open it but sets EINVAL to errno because the pathname has characters that cannot be stored in its filesystem. As this is an expected failure, teach is_missing_file_error() helper about this case. This is RFC, as there may be a case where we get EINVAL from open/fopen for reasons other than "the filesystem does not like this pathname" that may be worth reporting to the user, and this change is sweeping such an error under the rug. Signed-off-by: Junio C Hamano --- wrapper.c | 4 1 file changed, 4 insertions(+) diff --git a/wrapper.c b/wrapper.c index f1c87ec7ea..74aa3b7803 100644 --- a/wrapper.c +++ b/wrapper.c @@ -434,6 +434,10 @@ static void warn_on_inaccessible(const char *path) * see if the errno indicates a missing file that we can safely ignore. */ static int is_missing_file_error(int errno_) { +#ifdef GIT_WINDOWS_NATIVE + if (errno_ == EINVAL) + return 1; +#endif return (errno_ == ENOENT || errno_ == ENOTDIR); } -- 2.13.0-491-g71cfeddc25
[PATCH v3 01/13] git_fopen: fix a sparse 'not declared' warning
From: Ramsay Jones If git is built with the FREAD_READS_DIRECTORIES build variable set, this would cause sparse to issue a 'not declared, should it be static?' warning on Linux. This is a result of the method employed by 'compat/fopen.c' to suppress the (possible) redefinition of the (system) fopen macro, which also removes the extern declaration of the git_fopen function. In order to suppress the warning, introduce a new macro to suppress the definition (or possibly the re-definition) of the fopen symbol as a macro override. This new macro (SUPPRESS_FOPEN_REDEFINITION) is only defined in 'compat/fopen.c', just prior to the inclusion of the 'git-compat-util.h' header file. Signed-off-by: Ramsay Jones Signed-off-by: Junio C Hamano --- compat/fopen.c| 4 ++-- git-compat-util.h | 10 ++ 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/compat/fopen.c b/compat/fopen.c index b5ca142fed..107b3e8182 100644 --- a/compat/fopen.c +++ b/compat/fopen.c @@ -1,14 +1,14 @@ /* * The order of the following two lines is important. * - * FREAD_READS_DIRECTORIES is undefined before including git-compat-util.h + * SUPPRESS_FOPEN_REDEFINITION is defined before including git-compat-util.h * to avoid the redefinition of fopen within git-compat-util.h. This is * necessary since fopen is a macro on some platforms which may be set * based on compiler options. For example, on AIX fopen is set to fopen64 * when _LARGE_FILES is defined. The previous technique of merely undefining * fopen after including git-compat-util.h is inadequate in this case. */ -#undef FREAD_READS_DIRECTORIES +#define SUPPRESS_FOPEN_REDEFINITION #include "../git-compat-util.h" FILE *git_fopen(const char *path, const char *mode) diff --git a/git-compat-util.h b/git-compat-util.h index bd04564a69..6be55cf8b3 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -689,10 +689,12 @@ char *gitstrdup(const char *s); #endif #ifdef FREAD_READS_DIRECTORIES -#ifdef fopen -#undef fopen -#endif -#define fopen(a,b) git_fopen(a,b) +# if !defined(SUPPRESS_FOPEN_REDEFINITION) +# ifdef fopen +# undef fopen +# endif +# define fopen(a,b) git_fopen(a,b) +# endif extern FILE *git_fopen(const char*, const char*); #endif -- 2.13.0-491-g71cfeddc25
[PATCH v3 07/13] wrapper.c: add and use fopen_or_warn()
From: Nguyễn Thái Ngọc Duy When fopen() returns NULL, it could be because the given path does not exist, but it could also be some other errors and the caller has to check. Add a wrapper so we don't have to repeat the same error check everywhere. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- attr.c| 7 ++- bisect.c | 2 +- builtin/blame.c | 2 +- commit.c | 2 +- config.c | 2 +- git-compat-util.h | 1 + ident.c | 8 +++- remote.c | 4 ++-- rerere.c | 2 +- sequencer.c | 8 server-info.c | 2 +- t/t1308-config-set.sh | 2 ++ t/t5512-ls-remote.sh | 13 ++--- wrapper.c | 11 +++ wt-status.c | 3 ++- 15 files changed, 43 insertions(+), 26 deletions(-) diff --git a/attr.c b/attr.c index 7e2134471c..821203e2a9 100644 --- a/attr.c +++ b/attr.c @@ -720,16 +720,13 @@ void git_attr_set_direction(enum git_attr_direction new_direction, static struct attr_stack *read_attr_from_file(const char *path, int macro_ok) { - FILE *fp = fopen(path, "r"); + FILE *fp = fopen_or_warn(path, "r"); struct attr_stack *res; char buf[2048]; int lineno = 0; - if (!fp) { - if (errno != ENOENT && errno != ENOTDIR) - warn_on_inaccessible(path); + if (!fp) return NULL; - } res = xcalloc(1, sizeof(*res)); while (fgets(buf, sizeof(buf), fp)) { char *bufp = buf; diff --git a/bisect.c b/bisect.c index 469a3e9061..bb28bf63b2 100644 --- a/bisect.c +++ b/bisect.c @@ -666,7 +666,7 @@ static int is_expected_rev(const struct object_id *oid) if (stat(filename, &st) || !S_ISREG(st.st_mode)) return 0; - fp = fopen(filename, "r"); + fp = fopen_or_warn(filename, "r"); if (!fp) return 0; diff --git a/builtin/blame.c b/builtin/blame.c index 07506a3e45..34445d7894 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -2071,7 +2071,7 @@ static int prepare_lines(struct scoreboard *sb) */ static int read_ancestry(const char *graft_file) { - FILE *fp = fopen(graft_file, "r"); + FILE *fp = fopen_or_warn(graft_file, "r"); struct strbuf buf = STRBUF_INIT; if (!fp) return -1; diff --git a/commit.c b/commit.c index 73c78c2b80..3eeda081f9 100644 --- a/commit.c +++ b/commit.c @@ -167,7 +167,7 @@ struct commit_graft *read_graft_line(char *buf, int len) static int read_graft_file(const char *graft_file) { - FILE *fp = fopen(graft_file, "r"); + FILE *fp = fopen_or_warn(graft_file, "r"); struct strbuf buf = STRBUF_INIT; if (!fp) return -1; diff --git a/config.c b/config.c index 2894fbb6d0..e54d99d519 100644 --- a/config.c +++ b/config.c @@ -1422,7 +1422,7 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data) int ret = -1; FILE *f; - f = fopen(filename, "r"); + f = fopen_or_warn(filename, "r"); if (f) { flockfile(f); ret = do_config_from_file(fn, CONFIG_ORIGIN_FILE, filename, filename, f, data); diff --git a/git-compat-util.h b/git-compat-util.h index eb5c18c7fd..f74b401810 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -802,6 +802,7 @@ extern int xmkstemp(char *template); extern int xmkstemp_mode(char *template, int mode); extern char *xgetcwd(void); extern FILE *fopen_for_writing(const char *path); +extern FILE *fopen_or_warn(const char *path, const char *mode); #define ALLOC_ARRAY(x, alloc) (x) = xmalloc(st_mult(sizeof(*(x)), (alloc))) #define REALLOC_ARRAY(x, alloc) (x) = xrealloc((x), st_mult(sizeof(*(x)), (alloc))) diff --git a/ident.c b/ident.c index bea871c8e0..91c7609055 100644 --- a/ident.c +++ b/ident.c @@ -72,12 +72,10 @@ static int add_mailname_host(struct strbuf *buf) FILE *mailname; struct strbuf mailnamebuf = STRBUF_INIT; - mailname = fopen("/etc/mailname", "r"); - if (!mailname) { - if (errno != ENOENT) - warning_errno("cannot open /etc/mailname"); + mailname = fopen_or_warn("/etc/mailname", "r"); + if (!mailname) return -1; - } + if (strbuf_getline(&mailnamebuf, mailname) == EOF) { if (ferror(mailname)) warning_errno("cannot read /etc/mailname"); diff --git a/remote.c b/remote.c index 801137c72e..1f2453d0f6 100644 --- a/remote.c +++ b/remote.c @@ -251,7 +251,7 @@ static const char *skip_spaces(const char *s) static void read_remotes_file(struct remote *remote) { struct strbuf buf = STRBUF_INIT; - FILE *f = fopen(git_path("remotes/%s", remote->name), "r"); + FILE *f = fopen_or_warn(git_path("remotes/%s", remote->name), "r"); if (!f) r
[PATCH v3 09/13] print errno when reporting a system call error
From: Nguyễn Thái Ngọc Duy Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/log.c | 3 ++- rerere.c | 4 ++-- xdiff-interface.c | 4 ++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index b3b10cc1ed..26d6a3cf14 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -858,7 +858,8 @@ static int open_next_file(struct commit *commit, const char *subject, printf("%s\n", filename.buf + outdir_offset); if ((rev->diffopt.file = fopen(filename.buf, "w")) == NULL) - return error(_("Cannot open patch file %s"), filename.buf); + return error_errno(_("Cannot open patch file %s"), + filename.buf); strbuf_release(&filename); return 0; diff --git a/rerere.c b/rerere.c index 971bfedfb2..1351b0c3fb 100644 --- a/rerere.c +++ b/rerere.c @@ -484,13 +484,13 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output io.input = fopen(path, "r"); io.io.wrerror = 0; if (!io.input) - return error("Could not open %s", path); + return error_errno("Could not open %s", path); if (output) { io.io.output = fopen(output, "w"); if (!io.io.output) { fclose(io.input); - return error("Could not write %s", output); + return error_errno("Could not write %s", output); } } diff --git a/xdiff-interface.c b/xdiff-interface.c index 060038c2d6..d3f78ca2a7 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -164,9 +164,9 @@ int read_mmfile(mmfile_t *ptr, const char *filename) size_t sz; if (stat(filename, &st)) - return error("Could not stat %s", filename); + return error_errno("Could not stat %s", filename); if ((f = fopen(filename, "rb")) == NULL) - return error("Could not open %s", filename); + return error_errno("Could not open %s", filename); sz = xsize_t(st.st_size); ptr->ptr = xmalloc(sz ? sz : 1); if (sz && fread(ptr->ptr, sz, 1, f) != 1) { -- 2.13.0-491-g71cfeddc25
[PATCH v3 00/13] reporting unexpected errors after (f)open
These are taken from https://github.com/pclouds/git/commits/fopen-or-warn cf. with a few bits salvaged from its v2 version. Ramsay Jones (1): git_fopen: fix a sparse 'not declared' warning Nguyễn Thái Ngọc Duy (9): use xfopen() in more places clone: use xfopen() instead of fopen() config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD wrapper.c: add and use warn_on_fopen_errors() wrapper.c: add and use fopen_or_warn() wrapper.c: make warn_on_inaccessible() static print errno when reporting a system call error rerere.c: move error_errno() closer to the source system call log: fix memory leak in open_next_file() Junio C Hamano (3): config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too wrapper: factor out is_missing_file_error() is_missing_file_error(): work around EINVAL on Windows Patches 12 & 13 are my attempt to answer the issue the series had on Windows. cf. attr.c| 7 ++- bisect.c | 7 ++- builtin/am.c | 8 ++-- builtin/blame.c | 2 +- builtin/clone.c | 2 +- builtin/commit.c | 5 + builtin/fast-export.c | 4 +--- builtin/fsck.c| 3 +-- builtin/log.c | 11 --- builtin/merge.c | 4 +--- builtin/pull.c| 3 +-- commit.c | 2 +- compat/fopen.c| 4 ++-- config.c | 5 - config.mak.uname | 4 diff.c| 8 ++-- dir.c | 6 +++--- fast-import.c | 4 +--- git-compat-util.h | 15 +-- ident.c | 8 +++- remote-testsvn.c | 8 ++-- remote.c | 4 ++-- rerere.c | 7 --- sequencer.c | 8 server-info.c | 2 +- t/t1308-config-set.sh | 13 - t/t5512-ls-remote.sh | 13 ++--- wrapper.c | 49 - wt-status.c | 3 ++- xdiff-interface.c | 4 ++-- 30 files changed, 133 insertions(+), 90 deletions(-) -- 2.13.0-491-g71cfeddc25
[PATCH v3 06/13] wrapper.c: add and use warn_on_fopen_errors()
From: Nguyễn Thái Ngọc Duy In many places, Git warns about an inaccessible file after a fopen() failed. To discern these cases from other cases where we want to warn about inaccessible files, introduce a new helper specifically to test whether fopen() failed because the current user lacks the permission to open file in question. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- config.c | 3 +++ dir.c | 6 +++--- git-compat-util.h | 2 ++ t/t1308-config-set.sh | 3 ++- wrapper.c | 10 ++ 5 files changed, 20 insertions(+), 4 deletions(-) diff --git a/config.c b/config.c index b4a3205da3..2894fbb6d0 100644 --- a/config.c +++ b/config.c @@ -2640,6 +2640,9 @@ int git_config_rename_section_in_file(const char *config_filename, } if (!(config_file = fopen(config_filename, "rb"))) { + ret = warn_on_fopen_errors(config_filename); + if (ret) + goto out; /* no config file means nothing to rename, no error */ goto commit_and_out; } diff --git a/dir.c b/dir.c index f451bfa48c..be616e884e 100644 --- a/dir.c +++ b/dir.c @@ -745,9 +745,9 @@ static int add_excludes(const char *fname, const char *base, int baselen, fd = open(fname, O_RDONLY); if (fd < 0 || fstat(fd, &st) < 0) { - if (errno != ENOENT) - warn_on_inaccessible(fname); - if (0 <= fd) + if (fd < 0) + warn_on_fopen_errors(fname); + else close(fd); if (!check_index || (buf = read_skip_worktree_file_from_index(fname, &size, sha1_stat)) == NULL) diff --git a/git-compat-util.h b/git-compat-util.h index 6be55cf8b3..eb5c18c7fd 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1101,6 +1101,8 @@ int access_or_die(const char *path, int mode, unsigned flag); /* Warn on an inaccessible file that ought to be accessible */ void warn_on_inaccessible(const char *path); +/* Warn on an inaccessible file if errno indicates this is an error */ +int warn_on_fopen_errors(const char *path); #ifdef GMTIME_UNRELIABLE_ERRORS struct tm *git_gmtime(const time_t *); diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh index 72d5f1f931..df92fdcd40 100755 --- a/t/t1308-config-set.sh +++ b/t/t1308-config-set.sh @@ -195,7 +195,8 @@ test_expect_success POSIXPERM,SANITY 'proper error on non-accessible files' ' chmod -r .git/config && test_when_finished "chmod +r .git/config" && echo "Error (-1) reading configuration file .git/config." >expect && - test_expect_code 2 test-config configset_get_value foo.bar .git/config 2>actual && + test_expect_code 2 test-config configset_get_value foo.bar .git/config 2>output && + grep "^Error" output >actual && test_cmp expect actual ' diff --git a/wrapper.c b/wrapper.c index d837417709..20c25e7e65 100644 --- a/wrapper.c +++ b/wrapper.c @@ -418,6 +418,16 @@ FILE *fopen_for_writing(const char *path) return ret; } +int warn_on_fopen_errors(const char *path) +{ + if (errno != ENOENT && errno != ENOTDIR) { + warn_on_inaccessible(path); + return -1; + } + + return 0; +} + int xmkstemp(char *template) { int fd; -- 2.13.0-491-g71cfeddc25
[PATCH v3 02/13] use xfopen() in more places
From: Nguyễn Thái Ngọc Duy xfopen() - provides error details - explains error on reading, or writing, or whatever operation - has l10n support - prints file name in the error Some of these are missing in the places that are replaced with xfopen(), which is a clear win. In some other places, it's just less code (not as clearly a win as the previous case but still is). The only slight regresssion is in remote-testsvn, where we don't report the file class (marks files) in the error messages anymore. But since this is a _test_ svn remote transport, I'm not too concerned. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- bisect.c | 5 + builtin/am.c | 8 ++-- builtin/commit.c | 5 + builtin/fast-export.c | 4 +--- builtin/fsck.c| 3 +-- builtin/merge.c | 4 +--- builtin/pull.c| 3 +-- diff.c| 8 ++-- fast-import.c | 4 +--- remote-testsvn.c | 8 ++-- 10 files changed, 13 insertions(+), 39 deletions(-) diff --git a/bisect.c b/bisect.c index 08c9fb7266..469a3e9061 100644 --- a/bisect.c +++ b/bisect.c @@ -438,10 +438,7 @@ static void read_bisect_paths(struct argv_array *array) { struct strbuf str = STRBUF_INIT; const char *filename = git_path_bisect_names(); - FILE *fp = fopen(filename, "r"); - - if (!fp) - die_errno(_("Could not open file '%s'"), filename); + FILE *fp = xfopen(filename, "r"); while (strbuf_getline_lf(&str, fp) != EOF) { strbuf_trim(&str); diff --git a/builtin/am.c b/builtin/am.c index a95dd8b4e6..f5dac7783e 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1275,12 +1275,8 @@ static int parse_mail(struct am_state *state, const char *mail) die("BUG: invalid value for state->scissors"); } - mi.input = fopen(mail, "r"); - if (!mi.input) - die("could not open input"); - mi.output = fopen(am_path(state, "info"), "w"); - if (!mi.output) - die("could not open output 'info'"); + mi.input = xfopen(mail, "r"); + mi.output = xfopen(am_path(state, "info"), "w"); if (mailinfo(&mi, am_path(state, "msg"), am_path(state, "patch"))) die("could not parse patch"); diff --git a/builtin/commit.c b/builtin/commit.c index 1d805f5da8..eda0d32311 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1695,10 +1695,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) if (!reflog_msg) reflog_msg = "commit (merge)"; pptr = commit_list_append(current_head, pptr); - fp = fopen(git_path_merge_head(), "r"); - if (fp == NULL) - die_errno(_("could not open '%s' for reading"), - git_path_merge_head()); + fp = xfopen(git_path_merge_head(), "r"); while (strbuf_getline_lf(&m, fp) != EOF) { struct commit *parent; diff --git a/builtin/fast-export.c b/builtin/fast-export.c index e0220630d0..128b99e6da 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -905,9 +905,7 @@ static void export_marks(char *file) static void import_marks(char *input_file) { char line[512]; - FILE *f = fopen(input_file, "r"); - if (!f) - die_errno("cannot read '%s'", input_file); + FILE *f = xfopen(input_file, "r"); while (fgets(line, sizeof(line), f)) { uint32_t mark; diff --git a/builtin/fsck.c b/builtin/fsck.c index b5e13a4556..00beaaa4e6 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -280,8 +280,7 @@ static void check_unreachable_object(struct object *obj) free(filename); return; } - if (!(f = fopen(filename, "w"))) - die_errno("Could not open '%s'", filename); + f = xfopen(filename, "w"); if (obj->type == OBJ_BLOB) { if (stream_blob_to_fd(fileno(f), &obj->oid, NULL, 1)) die_errno("Could not write '%s'", filename); diff --git a/builtin/merge.c b/builtin/merge.c index 703827f006..65a1501858 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -839,9 +839,7 @@ static int suggest_conflicts(void) struct strbuf msgbuf = STRBUF_INIT; filename = git_path_merge_msg(); - fp = fopen(filename, "a"); - if (!fp) - die_errno(_("Could not open '%s' for writing"), filename); + fp = xfopen(filename, "a"); append_conflicts_hint(&msgbuf); fputs(msgbuf.buf, fp); diff --git a/builtin/pull.c b/builtin/pull.c index dd1a4a94e4..589c25becf 100644 --- a/builtin/pull.c +++ b/builtin/pull.c @@ -337,8 +337,7 @@ static void get_merge_heads(struct oid_arra
[PATCH v3 04/13] config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
From: Nguyễn Thái Ngọc Duy This variable is added [1] with the assumption that on a sane system, fopen(, "r") should return NULL. Linux and FreeBSD do not meet this expectation while at least Windows and AIX do. Let's make sure they behave the same way. I only tested one version on Linux (4.7.0 with glibc 2.22) and FreeBSD (11.0) but since GNU/kFreeBSD is fbsd kernel with gnu userspace, I'm pretty sure it shares the same problem. [1] cba22528fa (Add compat/fopen.c which returns NULL on attempt to open directory - 2008-02-08) Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- config.mak.uname | 3 +++ t/t1308-config-set.sh | 8 2 files changed, 11 insertions(+) diff --git a/config.mak.uname b/config.mak.uname index 399fe19271..a25ffddb3e 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux) NEEDS_LIBRT = YesPlease HAVE_GETDELIM = YesPlease SANE_TEXT_GREP=-a + FREAD_READS_DIRECTORIES = UnfortunatelyYes endif ifeq ($(uname_S),GNU/kFreeBSD) HAVE_ALLOCA_H = YesPlease @@ -43,6 +44,7 @@ ifeq ($(uname_S),GNU/kFreeBSD) HAVE_PATHS_H = YesPlease DIR_HAS_BSD_GROUP_SEMANTICS = YesPlease LIBC_CONTAINS_LIBINTL = YesPlease + FREAD_READS_DIRECTORIES = UnfortunatelyYes endif ifeq ($(uname_S),UnixWare) CC = cc @@ -201,6 +203,7 @@ ifeq ($(uname_S),FreeBSD) GMTIME_UNRELIABLE_ERRORS = UnfortunatelyYes HAVE_BSD_SYSCTL = YesPlease PAGER_ENV = LESS=FRX LV=-c MORE=FRX + FREAD_READS_DIRECTORIES = UnfortunatelyYes endif ifeq ($(uname_S),OpenBSD) NO_STRCASESTR = YesPlease diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh index ff50960cca..72d5f1f931 100755 --- a/t/t1308-config-set.sh +++ b/t/t1308-config-set.sh @@ -183,6 +183,14 @@ test_expect_success 'proper error on non-existent files' ' test_cmp expect actual ' +test_expect_success 'proper error on directory "files"' ' + echo "Error (-1) reading configuration file a-directory." >expect && + mkdir a-directory && + test_expect_code 2 test-config configset_get_value foo.bar a-directory 2>output && + grep "^Error" output >actual && + test_cmp expect actual +' + test_expect_success POSIXPERM,SANITY 'proper error on non-accessible files' ' chmod -r .git/config && test_when_finished "chmod +r .git/config" && -- 2.13.0-491-g71cfeddc25
[PATCH v3 03/13] clone: use xfopen() instead of fopen()
From: Nguyễn Thái Ngọc Duy copy_alternates() called fopen() without handling errors. By switching to xfopen(), this bug is fixed. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano --- builtin/clone.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/clone.c b/builtin/clone.c index de85b85254..dde4fe73af 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -357,7 +357,7 @@ static void copy_alternates(struct strbuf *src, struct strbuf *dst, * to turn entries with paths relative to the original * absolute, so that they can be used in the new repository. */ - FILE *in = fopen(src->buf, "r"); + FILE *in = xfopen(src->buf, "r"); struct strbuf line = STRBUF_INIT; while (strbuf_getline(&line, in) != EOF) { -- 2.13.0-491-g71cfeddc25
[PATCH v3 05/13] config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
Signed-off-by: Junio C Hamano --- config.mak.uname | 1 + 1 file changed, 1 insertion(+) diff --git a/config.mak.uname b/config.mak.uname index a25ffddb3e..1743890164 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -110,6 +110,7 @@ ifeq ($(uname_S),Darwin) BASIC_CFLAGS += -DPRECOMPOSE_UNICODE BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1 HAVE_BSD_SYSCTL = YesPlease + FREAD_READS_DIRECTORIES = UnfortunatelyYes endif ifeq ($(uname_S),SunOS) NEEDS_SOCKET = YesPlease -- 2.13.0-491-g71cfeddc25
[PATCH v4 10/10] rebase -i: rearrange fixup/squash lines using the rebase--helper
Hi Johannes, Johannes Schindelin writes: > This operation has quadratic complexity, which is especially painful > on Windows, where shell scripts are *already* slow (mainly due to the > overhead of the POSIX emulation layer). > > Let's reimplement this with linear complexity (using a hash map to > match the commits' subject lines) for the common case; Sadly, the > fixup/squash feature's design neglected performance considerations, > allowing arbitrary prefixes (read: `fixup! hell` will match the > commit subject `hello world`), which means that we are stuck with > quadratic performance in the worst case. > > The reimplemented logic also happens to fix a bug where commented-out > lines (representing empty patches) were dropped by the previous code. > > While at it, clarify how the fixup/squash feature works in `git rebase > -i`'s man page. > > Signed-off-by: Johannes Schindelin > --- > Documentation/git-rebase.txt | 16 ++-- > builtin/rebase--helper.c | 6 +- > git-rebase--interactive.sh | 90 +--- > sequencer.c | 195 > +++ > sequencer.h | 1 + > t/t3415-rebase-autosquash.sh | 2 +- > 6 files changed, 212 insertions(+), 98 deletions(-) > > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt > index 53f4e14..c5464aa5365 100644 > --- a/Documentation/git-rebase.txt > +++ b/Documentation/git-rebase.txt > @@ -430,13 +430,15 @@ without an explicit `--interactive`. > --autosquash:: > --no-autosquash:: > When the commit log message begins with "squash! ..." (or > - "fixup! ..."), and there is a commit whose title begins with > - the same ..., automatically modify the todo list of rebase -i > - so that the commit marked for squashing comes right after the > - commit to be modified, and change the action of the moved > - commit from `pick` to `squash` (or `fixup`). Ignores subsequent > - "fixup! " or "squash! " after the first, in case you referred to an > - earlier fixup/squash with `git commit --fixup/--squash`. > + "fixup! ..."), and there is already a commit in the todo list that > + matches the same `...`, automatically modify the todo list of rebase > + -i so that the commit marked for squashing comes right after the > + commit to be modified, and change the action of the moved commit > + from `pick` to `squash` (or `fixup`). A commit matches the `...` if > + the commit subject matches, or if the `...` refers to the commit's > + hash. As a fall-back, partial matches of the commit subject work, > + too. The recommended way to create fixup/squash commits is by using > + the `--fixup`/`--squash` options of linkgit:git-commit[1]. > + > This option is only valid when the `--interactive` option is used. > + > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index de3ccd9bfbc..e6591f01112 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -14,7 +14,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > int keep_empty = 0; > enum { > CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S, > - CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS > + CHECK_TODO_LIST, SKIP_UNNECESSARY_PICKS, REARRANGE_SQUASH > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -33,6 +33,8 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > N_("check the todo list"), CHECK_TODO_LIST), > OPT_CMDMODE(0, "skip-unnecessary-picks", &command, > N_("skip unnecessary picks"), SKIP_UNNECESSARY_PICKS), > + OPT_CMDMODE(0, "rearrange-squash", &command, > + N_("rearrange fixup/squash lines"), REARRANGE_SQUASH), > OPT_END() > }; > > @@ -59,5 +61,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return !!check_todo_list(); > if (command == SKIP_UNNECESSARY_PICKS && argc == 1) > return !!skip_unnecessary_picks(); > + if (command == REARRANGE_SQUASH && argc == 1) > + return !!rearrange_squash(); > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 92e3ca1de3b..84c6e62518f 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -721,94 +721,6 @@ collapse_todo_ids() { > git rebase--helper --shorten-sha1s > } > > -# Rearrange the todo list that has both "pick sha1 msg" and > -# "pick sha1 fixup!/squash! msg" appears in it so that the latter > -# comes immediately after the former, and change "pick" to > -# "fixup"/"squash". > -# > -# Note that if the config has specified a custom instruction format > -#
[PATCH v4 03/10] rebase -i: remove useless indentation
Hi Johannes, Johannes Schindelin writes: > The commands used to be indented, and it is nice to look at, but when we > transform the SHA-1s, the indentation is removed. So let's do away with it. > > For the moment, at least: when we will use the upcoming rebase--helper > to transform the SHA-1s, we *will* keep the indentation and can > reintroduce it. Yet, to be able to validate the rebase--helper against > the output of the current shell script version, we need to remove the > extra indentation. > > Signed-off-by: Johannes Schindelin > --- > git-rebase--interactive.sh | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 609e150d38f..c40b1fd1d2e 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -155,13 +155,13 @@ reschedule_last_action () { > append_todo_help () { > gettext " > Commands: > - p, pick = use commit > - r, reword = use commit, but edit the commit message > - e, edit = use commit, but stop for amending > - s, squash = use commit, but meld into previous commit > - f, fixup = like \"squash\", but discard this commit's log message > - x, exec = run command (the rest of the line) using shell > - d, drop = remove commit > +p, pick = use commit > +r, reword = use commit, but edit the commit message > +e, edit = use commit, but stop for amending > +s, squash = use commit, but meld into previous commit > +f, fixup = like \"squash\", but discard this commit's log message > +x, exec = run command (the rest of the line) using shell > +d, drop = remove commit do we also need to update all the translations since this is a `gettext` function? > > These lines can be re-ordered; they are executed from top to bottom. > " | git stripspace --comment-lines >>"$todo" > -- > 2.12.2.windows.2.800.gede8f145e06 Liam
[PATCH v4 05/10] rebase -i: also expand/collapse the SHA-1s via the rebase--helper
Hi Johannes, Johannes Schindelin writes: > This is crucial to improve performance on Windows, as the speed is now > mostly dominated by the SHA-1 transformation (because it spawns a new > rev-parse process for *every* line, and spawning processes is pretty > slow from Git for Windows' MSYS2 Bash). > > Signed-off-by: Johannes Schindelin > --- > builtin/rebase--helper.c | 10 +++- > git-rebase--interactive.sh | 27 ++ > sequencer.c| 57 > ++ > sequencer.h| 2 ++ > 4 files changed, 70 insertions(+), 26 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index 821058d452d..9444c8d6c60 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -13,7 +13,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > struct replay_opts opts = REPLAY_OPTS_INIT; > int keep_empty = 0; > enum { > - CONTINUE = 1, ABORT, MAKE_SCRIPT > + CONTINUE = 1, ABORT, MAKE_SCRIPT, SHORTEN_SHA1S, EXPAND_SHA1S > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > @@ -24,6 +24,10 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > ABORT), > OPT_CMDMODE(0, "make-script", &command, > N_("make rebase script"), MAKE_SCRIPT), > + OPT_CMDMODE(0, "shorten-sha1s", &command, > + N_("shorten SHA-1s in the todo list"), SHORTEN_SHA1S), > + OPT_CMDMODE(0, "expand-sha1s", &command, > + N_("expand SHA-1s in the todo list"), EXPAND_SHA1S), Since work is being done to convert to `struct object_id` would it not be best to use a more generic name instead of 'sha1'? maybe something like {shorten,expand}-hashs > OPT_END() > }; > > @@ -42,5 +46,9 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return !!sequencer_remove_state(&opts); > if (command == MAKE_SCRIPT && argc > 1) > return !!sequencer_make_script(keep_empty, stdout, argc, argv); > + if (command == SHORTEN_SHA1S && argc == 1) > + return !!transform_todo_ids(1); > + if (command == EXPAND_SHA1S && argc == 1) > + return !!transform_todo_ids(0); > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 214af0372ba..82a1941c42c 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -750,35 +750,12 @@ skip_unnecessary_picks () { > die "$(gettext "Could not skip unnecessary pick commands")" > } > > -transform_todo_ids () { > - while read -r command rest > - do > - case "$command" in > - "$comment_char"* | exec) > - # Be careful for oddball commands like 'exec' > - # that do not have a SHA-1 at the beginning of $rest. > - ;; > - *) > - sha1=$(git rev-parse --verify --quiet "$@" ${rest%%[ > ]*}) && > - if test "a$rest" = "a${rest#*[ ]}" > - then > - rest=$sha1 > - else > - rest="$sha1 ${rest#*[]}" > - fi > - ;; > - esac > - printf '%s\n' "$command${rest:+ }$rest" > - done <"$todo" >"$todo.new" && > - mv -f "$todo.new" "$todo" > -} > - > expand_todo_ids() { > - transform_todo_ids > + git rebase--helper --expand-sha1s > } > > collapse_todo_ids() { > - transform_todo_ids --short > + git rebase--helper --shorten-sha1s > } > > # Rearrange the todo list that has both "pick sha1 msg" and > diff --git a/sequencer.c b/sequencer.c > index 88819a1a2a9..201d45b1677 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -2437,3 +2437,60 @@ int sequencer_make_script(int keep_empty, FILE *out, > strbuf_release(&buf); > return 0; > } > + > + > +int transform_todo_ids(int shorten_sha1s) > +{ > + const char *todo_file = rebase_path_todo(); > + struct todo_list todo_list = TODO_LIST_INIT; > + int fd, res, i; > + FILE *out; > + > + strbuf_reset(&todo_list.buf); > + fd = open(todo_file, O_RDONLY); > + if (fd < 0) > + return error_errno(_("could not open '%s'"), todo_file); > + if (strbuf_read(&todo_list.buf, fd, 0) < 0) { > + close(fd); > + return error(_("could not read '%s'."), todo_file); > + } > + close(fd); > + > + res = parse_insn_buffer(todo_list.buf.buf, &todo_list); > + if (res) { > + todo_list_release(&todo_list); > + return error(_("unusable instruc
[PATCH v4 02/10] rebase -i: generate the script via rebase--helper
Hi Johannes, Johannes Schindelin writes: > The first step of an interactive rebase is to generate the so-called "todo > script", to be stored in the state directory as "git-rebase-todo" and to > be edited by the user. > > Originally, we adjusted the output of `git log ` using a simple > sed script. Over the course of the years, the code became more > complicated. We now use shell scripting to edit the output of `git log` > conditionally, depending whether to keep "empty" commits (i.e. commits > that do not change any files). > > On platforms where shell scripting is not native, this can be a serious > drag. And it opens the door for incompatibilities between platforms when > it comes to shell scripting or to Unix-y commands. > > Let's just re-implement the todo script generation in plain C, using the > revision machinery directly. > > This is substantially faster, improving the speed relative to the > shell script version of the interactive rebase from 2x to 3x on Windows. > > Note that the rearrange_squash() function in git-rebase--interactive > relied on the fact that we set the "format" variable to the config setting > rebase.instructionFormat. Relying on a side effect like this is no good, > hence we explicitly perform that assignment (possibly again) in > rearrange_squash(). > > Signed-off-by: Johannes Schindelin > --- > builtin/rebase--helper.c | 8 +++- > git-rebase--interactive.sh | 44 + > sequencer.c| 49 > ++ > sequencer.h| 3 +++ > 4 files changed, 82 insertions(+), 22 deletions(-) > > diff --git a/builtin/rebase--helper.c b/builtin/rebase--helper.c > index ca1ebb2fa18..821058d452d 100644 > --- a/builtin/rebase--helper.c > +++ b/builtin/rebase--helper.c > @@ -11,15 +11,19 @@ static const char * const builtin_rebase_helper_usage[] = > { > int cmd_rebase__helper(int argc, const char **argv, const char *prefix) > { > struct replay_opts opts = REPLAY_OPTS_INIT; > + int keep_empty = 0; > enum { > - CONTINUE = 1, ABORT > + CONTINUE = 1, ABORT, MAKE_SCRIPT > } command = 0; > struct option options[] = { > OPT_BOOL(0, "ff", &opts.allow_ff, N_("allow fast-forward")), > + OPT_BOOL(0, "keep-empty", &keep_empty, N_("keep empty > commits")), > OPT_CMDMODE(0, "continue", &command, N_("continue rebase"), > CONTINUE), > OPT_CMDMODE(0, "abort", &command, N_("abort rebase"), > ABORT), > + OPT_CMDMODE(0, "make-script", &command, > + N_("make rebase script"), MAKE_SCRIPT), > OPT_END() > }; > > @@ -36,5 +40,7 @@ int cmd_rebase__helper(int argc, const char **argv, const > char *prefix) > return !!sequencer_continue(&opts); > if (command == ABORT && argc == 1) > return !!sequencer_remove_state(&opts); > + if (command == MAKE_SCRIPT && argc > 1) > + return !!sequencer_make_script(keep_empty, stdout, argc, argv); > usage_with_options(builtin_rebase_helper_usage, options); > } > diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh > index 2c9c0165b5a..609e150d38f 100644 > --- a/git-rebase--interactive.sh > +++ b/git-rebase--interactive.sh > @@ -785,6 +785,7 @@ collapse_todo_ids() { > # each log message will be re-retrieved in order to normalize the > # autosquash arrangement > rearrange_squash () { > + format=$(git config --get rebase.instructionFormat) > # extract fixup!/squash! lines and resolve any referenced sha1's > while read -r pick sha1 message > do > @@ -1210,26 +1211,27 @@ else > revisions=$onto...$orig_head > shortrevisions=$shorthead > fi > -format=$(git config --get rebase.instructionFormat) > -# the 'rev-list .. | sed' requires %m to parse; the instruction requires %H > to parse > -git rev-list $merges_option --format="%m%H ${format:-%s}" \ > - --reverse --left-right --topo-order \ > - $revisions ${restrict_revision+^$restrict_revision} | \ > - sed -n "s/^>//p" | > -while read -r sha1 rest > -do > - > - if test -z "$keep_empty" && is_empty_commit $sha1 && ! is_merge_commit > $sha1 > - then > - comment_out="$comment_char " > - else > - comment_out= > - fi > +if test t != "$preserve_merges" > +then > + git rebase--helper --make-script ${keep_empty:+--keep-empty} \ > + $revisions ${restrict_revision+^$restrict_revision} >"$todo" > +else > + format=$(git config --get rebase.instructionFormat) > + # the 'rev-list .. | sed' requires %m to parse; the instruction > requires %H to parse > + git rev-list $merges_option --format="%m%H ${format:-%s}" \ > + --reverse --left-right --topo-order \ > + $revisions ${restrict_revision+^$restrict_revision} |
[PATCH v4 00/10] The final building block for a faster rebase -i
Hi Johannes, Johannes Schindelin writes: > This patch series reimplements the expensive pre- and post-processing of > the todo script in C. > > And it concludes the work I did to accelerate rebase -i. > > Changes since v3: > > - removed the no-longer-used transform_todo_ids shell function > > - simplified transform_todo_ids()'s command parsing > > - fixed two commits in check_todo_list(), renamed the unclear > `raise_error` variable to `advise_to_edit_todo`, build the message > about missing commits directly (without the detour to building a > commit_list) and instead of assigning an unused pointer to commit->util > the code now uses (void *)1. > > - return early from check_todo_list() when parsing failed, even if the > check level is something else than CHECK_IGNORE > > - the todo list is generated is again generated in the same way as > before when rebase.instructionFormat is empty: it was interpreted as > if it had not been set > > - added a test for empty rebase.instructionFormat settings > > > Johannes Schindelin (10): > t3415: verify that an empty instructionFormat is handled as before > rebase -i: generate the script via rebase--helper > rebase -i: remove useless indentation > rebase -i: do not invent onelines when expanding/collapsing SHA-1s > rebase -i: also expand/collapse the SHA-1s via the rebase--helper > t3404: relax rebase.missingCommitsCheck tests > rebase -i: check for missing commits in the rebase--helper > rebase -i: skip unnecessary picks using the rebase--helper > t3415: test fixup with wrapped oneline > rebase -i: rearrange fixup/squash lines using the rebase--helper > > Documentation/git-rebase.txt | 16 +- > builtin/rebase--helper.c | 29 ++- > git-rebase--interactive.sh| 373 - > sequencer.c | 530 > ++ > sequencer.h | 8 + > t/t3404-rebase-interactive.sh | 22 +- > t/t3415-rebase-autosquash.sh | 28 ++- > 7 files changed, 646 insertions(+), 360 deletions(-) > > > base-commit: 027a3b943b444a3e3a76f9a89803fc10245b858f > Based-On: rebase--helper at https://github.com/dscho/git > Fetch-Base-Via: git fetch https://github.com/dscho/git rebase--helper > Published-As: https://github.com/dscho/git/releases/tag/rebase-i-extra-v4 > Fetch-It-Via: git fetch https://github.com/dscho/git rebase-i-extra-v4 > This is my first review so it's probably not the best you'll get, but here it goes! I rebased the series ontop of v2.13.0 and run the whole `make test` on both revisions. The changes do not seem to have introduced any evident breakage as the output of `make test` did not change. I tried to time the execution on an interactive rebase (on Linux) but I did not notice a significant change in speed. Do we have a way to measure performance / speed changes between version? Liam
Re: What's cooking in git.git (May 2017, #06; Mon, 22)
Duy Nguyen writes: > On Mon, May 22, 2017 at 1:11 PM, Junio C Hamano wrote: >> * nd/fopen-errors (2017-05-09) 23 commits >> - t1308: add a test case on open a config directory >> - config.c: handle error on failing to fopen() >> - xdiff-interface.c: report errno on failure to stat() or fopen() >> - wt-status.c: report error on failure to fopen() >> - server-info: report error on failure to fopen() >> - sequencer.c: report error on failure to fopen() >> - rerere.c: report correct errno >> - rerere.c: report error on failure to fopen() >> - remote.c: report error on failure to fopen() >> - commit.c: report error on failure to fopen() the graft file >> - log: fix memory leak in open_next_file() >> - log: report errno on failure to fopen() a file >> - blame: report error on open if graft_file is a directory >> - bisect: report on fopen() error >> - ident.c: use fopen_or_warn() >> - attr.c: use fopen_or_warn() >> - wrapper.c: add fopen_or_warn() >> - wrapper.c: add warn_on_fopen_errors() >> - config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too >> - config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD >> - clone: use xfopen() instead of fopen() >> - Use xfopen() in more places >> - git_fopen: fix a sparse 'not declared' warning >> >> We often try to open a file for reading whose existence is >> optional, and silently ignore errors from open/fopen; report such >> errors if they are not due to missing files. > > If anyone wants to continue this, I've cleaned up the series based on > Johannes comments here [1]. It does not have the Darwin change though. Also it seems to be missing the SUPPRESS_FOPEN_REDEF thing by Ramsay. I'll mix these two in, post to the list for review and requeue. Thanks. > There was the last question about the '*' test change in ref path and > what exact code change causes that, which I can't answer because I > don't have Windows, or the time to simulate and pinpoint the fault on > Linux. > > [1] https://github.com/pclouds/git/commits/fopen-or-warn
Re: [RFC/PATCH] recognize pathspec magic without "--" disambiguation
Jeff King writes: > But let's consider related invocations and whether we're > making them better or worse: > >- git log :/foo > (when "foo" matches a commit message) > > This one should remain the same. Like the existing > wildcard rule, we're touching only verify_filename(), > not verify_non_filename(). So cases that _do_ resolve > as a rev will continue to do so. > >- git log :^foo > (when "^foo" exists in your index) > > The same logic applies; it would continue to work. And > ditto for any other weird filenames in your index like > "(attr)foo". "git show :$path" where $path happens to be "^foo" would grab the contents of the $path out of the index and I think that is what you meant, but use of "git log" in the example made me scratch my head greatly. >- git log :/foo > (when "foo" does _not_ match a commit message) > ... > This same downside actually exists currently when you > have an asterisk in your regex. E.g., > > git log :/foo.*bar > > will be treated as a pathspec (if it doesn't match a > commit message) due to the wildcard matching in > 28fcc0b71. In other words, we are not making things worse? > I wrote all the above to try to convince myself that this > doesn't have any serious regression cases. And I think I > did. > > But I actually we could make the rules in alternative (2) > above work. check_filename() would ask the pathspec code to > parse each argument and get one of three results: > > 1. it's not pathspec magic; treat it like a filename > (e.g., just "foo", or even bogus stuff like ":%foo") > > 2. it is pathspec magic, and here is the matching filename > that ought to exist (e.g., "foo" from ":^foo" or > ":(exclude)foo") > > 3. it is pathspec magic, but there's no matching filename. > Assume it's a pathspec (e.g., "(attr)foo"). > > I'm on the fence on whether it's worth the trouble versus > the simple rule implemented by this patch. Unlike "git log builtin-checkout.c" that errors out (only because there is no such file in the checkout of the current version) and makes its solution obvious to the users, this change has the risk of silently accepting an ambiguous input and computing result that is different from what the user intended to. So I am not sure. As you pointedout, ":/" especially does look like a likely point of failure, in that both "this is path at the top" pathspec magic and "the commit with this string" are not something that we can say with confidence that are rarely used because they are so esoteric. As to "is it OK to build a rule that we cannot explain easily?", I think it is OK to say "if it is not a rev, and if it is not a pathname in the current working tree, you must disambiguate, but Git helps by guessing in some cases---if you want to have more control (e.g. you are a script), explicitly disambiguate and you'd be OK", and leave the "some cases" vague, as long as we are only making reasonably conservative guesses.
Re: [PATCHv5 00/17] Diff machine: highlight moved lines.
Stefan Beller writes: > As you turn on/off normal coloring via "color.diff" and this only extends > the coloring scheme, I have the impression "color" is the right section. > Maybe color.diffStyle=[normal/enhanced] and "enhanced" means this > feature for now? Hmph, I thought the intent of color.diff is "is the diff command itself is colored?" In other words, color.diff=false should give you monochrome if you say "diff --word-diff", etc. > The only option in the "diff" section related to color is > diff.wsErrorHighlight > which has a very similar purpose, so "diff.colorMoved" would fit in that > scheme. I didn't have "should diff output highlight whitespace errors?" in mind when I wrote the message you are responding to, but yes, that is quite similar to "should diff output show lines moved and lines deleted/added differently?". > So with these questions, I wonder if we want to color moved lines > as "color.diff.context" (i.e. plain white text in the normal coloring scheme) > This would serve the intended purpose of > dimming the attention to moved lines. Yes, but two points. (1) We want to do so while making it obvious where the boundary between two moved blocks of text whose destination (for moved-deleted lines) or source (for moved-added lines) is. (2) My message was an impression from using the code to review a patch that is meant to be "move without changing other things". For other purposes, there may be cases where moved ones may want to be highlighted, not dimmed. > Regarding the last point of marking up adjacent blocks (which would > indicate that there is a coherency issue or just moving from different > places), we could highlight the last line of the previous block > and first line of the next block in their "normal" colors (i.e. > color.diff.old and color.diff.new). Hmm. That is an interesting thought.
Re: [PATCHv5 16/17] diff: buffer all output if asked to
Stefan Beller writes: > Yes, this is essentially the v4 with small indentation fixes, so I > assumed your signoff is still valid. OK. I just didn't expect to see one without "no changes since v4" under the three-dash line. Thanks.
Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths
Samuel Lijin writes: > On Wed, May 24, 2017 at 12:14 AM, Torsten Bögershausen wrote: >> >>> diff --git a/builtin/clean.c b/builtin/clean.c >>> index d861f836a..937eb17b6 100644 >>> --- a/builtin/clean.c >>> +++ b/builtin/clean.c >>> @@ -857,6 +857,38 @@ static void interactive_main_loop(void) >>> } >>> } >>> +static void correct_untracked_entries(struct dir_struct *dir) >> >> Looking what the function does, would >> drop_or_keep_untracked_entries() >> make more sense ? > > To me, drop_or_keep_ implies that they're either all dropped or all > kept, nothing in between, which is why I went with correct_, to > indicate that the set of untracked entries in the dir_struct prior to > calling the method needs to be corrected. Neither is a particularly good name, but if I have to pick, I'd say we should keep yours. drop-or-keep may indicate some are dropped while others are kept but it does not say what the function is for. correct is better in the sense that the readers can guess that there is something wrong in "dir" at the point and needs correcting by calling the helper, but still does not convey what exactly is wrong. How the wrong-ness is corrected does not have to be explained in the name (i.e. I am saying that drop-or-keep does not give us much useful information), but I wish that the name of the helper hinted what kind of wrongness is there to be corrected to the readers.
Re: [PATCH v2 1/7] grep: don't redundantly compile throwaway patterns under threading
Ævar Arnfjörð Bjarmason writes: > I think it's a pointless distraction to start speculating in this > commit message what we're going to do with --debug it if it ever > starts emitting some debugging information at pattern execution time. OK. > As an aside, I'd very much like to remove both --debug and the > --and/--or/--all-match, gives some very rough edges in the UI and how > easy it is to make that feature error or segfault, I suspect you might > be the only one using it. I agree that rewriting "grep -e A -e B" to "grep -e A|B" as an optimization is an interesting possibility to look into, and I can understand that having to support "--and" and "--not" would make such an optimization harder to implement. "-e A --and -e B" must become "-e A.*B|B.*A" and as you get more terms your unified pattern will grow combinatorial, at which point you would be better off matching N patterns and combining the result. Ever saw a user run "ps | grep rogue | grep -v grep" to find a rogue process to kill? That would not work if the rogue process's command line has a word "grep". Because "git grep" is often run on files in order to find the location the patterns appear in, "git grep -e pattern | grep -v unwanted" shares the same issue--the unwanted pattern may appear in the filename, and the downstream "grep -v" may filter out a valid hit. This is why "--not" exists [*1*]. I agree that emulating it within the same "concatenate patterns into one" optimization you are envisioning may be hard. Attempting to optimize "--all-match" would share similar difficulty with "--and", but your matching now must be done with the entire buffer and not go line-by-line. It was meant to make it possible to say "find commits that avarab@ talks about both regex and log", i.e. $ git log --author=avarab@ --all-match --grep=log --grep=regex This is not something you can emulate by piping an output of grep to another grep. But none of the above means you have to give up optimizing. You can choose not to combine them into a single pattern if certain constructions are hard, and do only the easy ones. If you think that harder combinations are not used very often, the result would be faster for many cases while not losing useful features, which is what we want. [Footnote] *1* For human consumption, lack of "--not" may not hurt in the sense that there are workarounds (i.e. you can do without "| grep -v unwanted" and filter irrelevant ones by eyeballing). But it is essential while scripting and trying to be precise.
Feature request: "git status" highlights files that are larger than 500kb.
So it will be easy to track that we don't accidentally commit huge files. What do you think?
Re: `pull --rebase --autostash` fails when fast forward in dirty repo
Jeff King writes: > Anyway. All this has shown me is that it's probably pointless to do this > timing at all on Linux. Somebody on Windows might get better results. > > But regardless, we need to do something. Correctness must trump > optimizations, and the question is whether we can throw out the whole > conditional, or if we should just restrict when it kicks in. Yes. I personally do not mind going with the simplest approach. The optimization thing is relatively new and we were perfectly happy without it before ;-).
Re: [PATCH v2 0/2] Update sha1dc from upstream & optionally make it a submodule
Ævar Arnfjörð Bjarmason writes: > But from what you're saying here that seems like a non-issue, i.e. in > such a scenario we'd just mirror the original repo[1], change the URL > in git.git to that, and then anyone could easily use older history > since it would be pointing to the new mirror. Those who cloned before such a switch will still have the URL they chose when they cloned and did "submodule init" on it, I think, so they need to explicitly choose to switch to the new URL. So I would not exactly say "seems like a non-issue"; it certainly is an easy thing to work around.
Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch
Jeff King writes: > I dunno. I was thinking there might be a quick tweak, but I'm wondering > if this arcane case is worth the restructuring we'd have to do to > support it. It only comes up when you've moved the server repo's HEAD to > an unborn branch _and_ you have other refs (since otherwise we don't > even send capabilities at all!). Thanks for digging. You made me to start doubting it is worth doing, too.
[PATCH] docs/config.txt: fix indefinite article in core.fileMode description
Signed-off-by: SZEDER Gábor --- Documentation/config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 475e874d5..f9adc9afa 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -334,7 +334,7 @@ core.fileMode:: is to be honored. + Some filesystems lose the executable bit when a file that is -marked as executable is checked out, or checks out an +marked as executable is checked out, or checks out a non-executable file with executable bit on. linkgit:git-clone[1] or linkgit:git-init[1] probe the filesystem to see if it handles the executable bit correctly -- 2.13.0.35.g14b6294b1
Re: [PATCH v3 2/2] Windows: do not treat a path with backslashes as a remote's nick name
Johannes Sixt writes: >>> So in short: >>> >>> (1) Hannes's patches are good, but they solve a problem that is >>> different from what their log messages say; the log message >>> needs to be updated; > > I do not resend patch 1/2 as it is unchanged in all regards. This 2/2 > changes the justification; patch text is unchanged. Thanks. I think this is explained better. Complaints against fopen() warnings sounded as if we should avoid attempting to open a file that may result in _any_ failure, which I felt was misleading, but it is not a huge issue. So how do we want to proceed on the point (2), i.e. updating the "warn on _unexpected_ errors from fopen" series to make it aware of the EINVAL we can expect on Windows? My primary question is if all EINVAL we could ever see on Windows after open/fopen returns an error is because the pathname the caller gave is not liked by the filesystem (hence we also know that the path does not exist). If that is the case, then the "workaround" patch I sent would be an OK approach (even though I do not know what to write after #ifdef and I suspect that is not "WINDOWS". We would want to cover the one you use, the one Dscho releases and possibly the cygwin build). If we can see EINVAL after open/fopen error that is _not_ expected and indicates a failure that is worth reporting to the user (just like we want to report e.g. I/O or permission errors), I think Windows folks are in a better position than I am to decide between that approach and a patch at lower level (e.g. teach open/fopen not to give EINVAL and instead give ENOENT when appropriate). > remote.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/remote.c b/remote.c > index ad6c5424ed..1949882c10 100644 > --- a/remote.c > +++ b/remote.c > @@ -645,7 +645,12 @@ static int valid_remote_nick(const char *name) > { > if (!name[0] || is_dot_or_dotdot(name)) > return 0; > - return !strchr(name, '/'); /* no slash */ > + > + /* remote nicknames cannot contain slashes */ > + while (*name) > + if (is_dir_sep(*name++)) > + return 0; > + return 1; > } > > const char *remote_for_branch(struct branch *branch, int *explicit)
Re: Documentation issue: git-stash examples
On Fri, May 26, 2017 at 08:37:41AM +1200, Adrian Forbes wrote: > Yep, but how do I send it back to you? As a mail attachment? Usually you'd send it to the list, with the commit message in the body and the patch inline below, as generated by "git format-patch". Unfortunately gmail is notorious for corrupting whitespace in message bodies. You can use git-send-email to send the mails (but you'll need to configure your gmail smtp information; see the EXAMPLE section of "git help send-email"). Alternatively, you can make a pull request to https://github.com/git/git and then send it to the list via https://submitgit.herokuapp.com/. -Peff
Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
On Thu, May 18, 2017 at 10:13 PM, Ben Peart wrote: > This hook script integrates the new fsmonitor capabilities of git with > the cross platform Watchman file watching service. To use the script: > > Download and install Watchman from https://facebook.github.io/watchman/ > and instruct Watchman to watch your working directory for changes > ('watchman watch-project /usr/src/git'). > > Rename the sample integration hook from query-fsmonitor.sample to > query-fsmonitor. > > Configure git to use the extension ('git config core.fsmonitor true') > and optionally turn on the untracked cache for optimal performance > ('git config core.untrackedcache true'). > > Signed-off-by: Ben Peart > Signed-off-by: Johannes Schindelin > --- > templates/hooks--query-fsmonitor.sample | 27 +++ > 1 file changed, 27 insertions(+) > create mode 100644 templates/hooks--query-fsmonitor.sample > > diff --git a/templates/hooks--query-fsmonitor.sample > b/templates/hooks--query-fsmonitor.sample > new file mode 100644 > index 00..4bd22f21d8 > --- /dev/null > +++ b/templates/hooks--query-fsmonitor.sample > @@ -0,0 +1,27 @@ > +#!/bin/sh > +# > +# An example hook script to integrate Watchman > +# (https://facebook.github.io/watchman/) with git to provide fast > +# git status. > +# > +# The hook is passed a time_t formatted as a string and outputs to > +# stdout all files that have been modified since the given time. > +# Paths must be relative to the root of the working tree and > +# separated by a single NUL. > +# > +# To enable this hook, rename this file to "query-fsmonitor" > + > +# Convert unix style paths to escaped Windows style paths > +case "$(uname -s)" in > +MINGW*|MSYS_NT*) > + GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,,g')" > + ;; > +*) > + GIT_WORK_TREE="$PWD" > + ;; > +esac > + > +# Query Watchman for all the changes since the requested time > +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, > \"fields\":[\"name\"]}]" | \ > +watchman -j | \ > +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); > die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if > defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));' I couldn't get watchman to work for me (built from source, keep getting [1]), but I hacked up something you can hopefully test & squash on top of this: # Query Watchman for all the changes since the requested time -echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t, \"fields\":[\"name\"]}]" | \ -watchman -j | \ -perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o- +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t, \"fields\":[\"name\"]}]" | + watchman -j | + perl -0666 -e ' + use strict; + use warnings; + + my $stdin = <>; + die "Watchman: The watchman command returned no output, error above?\n" if $stdin eq ""; + die "Watchman: Invalid input: $stdin\n" unless $stdin =~ /^\{/; + + my $json_pkg; + eval { + require JSON::XS; + $json_pkg = "JSON::XS"; + 1; + } or do { + require JSON::PP; + $json_pkg = "JSON::PP"; + }; + + my $o = $json_pkg->new->utf8->decode($stdin); + die "Watchman: $o->{error}.\nFalling back to scanning...\n" + if $o->{error}; + + local $, = "\0"; + print @{$o->{files}}; + ' Rationale: * We use the much faster JSON::XS if it's installed. * We use strict/warnings * Micro optimization: Replace joining stdin with an equivalent -0666 invocation. See "perldoc perlrun". * Micro optimization: No need to join up the possibly large list of files into one big string, just set $, to \0 and stream out the array. * Error handling: My watchman is broken (so actually this isn't tested), it only spews to stderr and exits. Handle that by checking whether stdin is "". Those changes are available at https://github.com/avar/git/commits/avar/fsmonitor 1. watchman: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by watchman)
[PATCH v3 7/7] grep: add support for PCRE v2
Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1]. The regular expression syntax is the same, but while the API is similar, pretty much every function is either renamed or takes different arguments. Thus using it via entirely new functions makes sense, as opposed to trying to e.g. have one compile_pcre_pattern() that would call either PCRE v1 or v2 functions. Git can now be compiled with either USE_LIBPCRE1=YesPlease or USE_LIBPCRE2=YesPlease, with USE_LIBPCRE=YesPlease currently being a synonym for the former. Providing both is a compile-time error. With earlier patches to enable JIT for PCRE v1 the performance of the release versions of both libraries is almost exactly the same, with PCRE v2 being around 1% slower. However after I reported this to the pcre-dev mailing list[2] I got a lot of help with the API use from Zoltán Herczeg, he subsequently optimized some of the JIT functionality in v2 of the library. Running the p7820-grep-engines.sh performance test against the latest Subversion trunk of both, with both them and git compiled as -O3, and the test run against linux.git, gives the following results. Just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_COMMAND='grep -q LIBPCRE2 Makefile && make -j8 USE_LIBPCRE2=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre2/inst/lib || make -j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~2 HEAD~ HEAD p7820-grep-engines.sh [...] Test HEAD~2HEAD~ HEAD 7820.3: perl grep 'how.to' 0.22(0.40+0.48) 0.22(0.31+0.58) +0.0% 0.22(0.26+0.59) +0.0% 7820.7: perl grep '^how to' 0.27(0.62+0.50) 0.28(0.60+0.50) +3.7% 0.22(0.25+0.60) -18.5% 7820.11: perl grep '[how] to' 0.33(0.92+0.47) 0.33(0.94+0.45) +0.0% 0.25(0.42+0.51) -24.2% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.35(1.08+0.46) 0.35(1.12+0.41) +0.0% 0.25(0.52+0.50) -28.6% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.30(0.78+0.51) 0.30(0.86+0.42) +0.0% 0.25(0.29+0.54) -16.7% See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Here HEAD~2 is git with PCRE v1 without JIT, HEAD~ is PCRE v1 with JIT, and HEAD is PCRE v2 (also with JIT). See previous commits of mine mentioning p7820-grep-engines.sh for more details on the test setup. For ease of readability, a different run just of HEAD~ (PCRE v1 with JIT v.s. PCRE v2), again with just the /perl/ tests shown: Test HEAD~ HEAD --- 7820.3: perl grep 'how.to' 0.23(0.41+0.47) 0.23(0.26+0.59) +0.0% 7820.7: perl grep '^how to' 0.27(0.64+0.47) 0.23(0.28+0.56) -14.8% 7820.11: perl grep '[how] to' 0.34(0.95+0.44) 0.25(0.38+0.56) -26.5% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 0.34(1.07+0.46) 0.24(0.52+0.49) -29.4% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.30(0.81+0.46) 0.22(0.33+0.54) -26.7% I.e. the two are either neck-to-neck, but PCRE v2 usually pulls ahead, when it does it's around 20% faster. A brief note on thread safety: As noted in pcre2api(3) & pcre2jit(3) the compiled pattern can be shared between threads, but not some of the JIT context, however the grep threading support does all pattern & JIT compilation in separate threads, so this code doesn't need to concern itself with thread safety. See commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) for the initial addition of PCRE v1. This change follows some of the same patterns it did (and which were discussed on list at the time), e.g. mocking up types with typedef instead of ifdef-ing them out when USE_LIBPCRE2 isn't defined. This adds some trivial memory use to the program, but makes the code look nicer. 1. https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html 2. https://lists.exim.org/lurker/thread/20170419.172322.833ee099.en.html Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile | 36 ++ configure.ac | 77 +- grep.c| 147 ++ grep.h| 17 +++ t/test-lib.sh | 2 +- 5 files changed, 258 insertions(+), 21 deletions(-) diff --git a/Makefile b/Makefile index a79274e5e6..e5a2
[PATCH v3 4/7] grep: add support for the PCRE v1 JIT API
Change the grep PCRE v1 code to use JIT when available. When PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) PCRE had no JIT support, it was integrated into 8.20 released on 2011-10-21. Enabling JIT support usually improves performance by more than 40%. The pattern compilation times are relatively slower, but those relative numbers are tiny, and are easily made back in all but the most trivial cases of grep. Detailed benchmarks & overview of compilation times is at: http://sljit.sourceforge.net/pcre.html With this change the difference in a t/perf/p7820-grep-engines.sh run is, with just the /perl/ tests shown: $ GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=YesPlease CC=~/perl5/installed/bin/gcc NO_R_TO_GCC_LINKER=YesPlease CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst LDFLAGS=-Wl,-rpath,/home/avar/g/pcre/inst/lib' ./run HEAD~ HEAD p7820-grep-engines.sh Test HEAD~ HEAD --- 7820.3: perl grep 'how.to' 0.35(1.11+0.43) 0.23(0.42+0.46) -34.3% 7820.7: perl grep '^how to' 0.64(2.71+0.36) 0.27(0.66+0.44) -57.8% 7820.11: perl grep '[how] to' 0.63(2.51+0.42) 0.33(0.98+0.39) -47.6% 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.17(5.61+0.35) 0.34(1.08+0.46) -70.9% 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.43(1.52+0.44) 0.30(0.88+0.42) -30.2% The conditional support for JIT is implemented as suggested in the pcrejit(3) man page. E.g. defining PCRE_STUDY_JIT_COMPILE to 0 if it's not present. The implementation is relatively verbose because even if PCRE_CONFIG_JIT is defined only a call to pcre_config() can determine if the JIT is available, and if so the faster pcre_jit_exec() function should be called instead of pcre_exec(), and a different (but not complimentary!) function needs to be called to free pcre1_extra_info. There's no graceful fallback if pcre_jit_stack_alloc() fails under PCRE_CONFIG_JIT, instead the program will simply abort. I don't think this is worth handling gracefully, it'll only fail in cases where malloc() doesn't work, in which case we're screwed anyway. That there's no assignment of `p->pcre1_jit_on = 0` when PCRE_CONFIG_JIT isn't defined isn't a bug. The create_grep_pat() function allocates the grep_pat allocates it with calloc(), so it's guaranteed to be 0 when PCRE_CONFIG_JIT isn't defined. I you're bisecting and find this change, check that your PCRE isn't older than 8.32. This change intentionally broke really old versions of PCRE, but that's fixed in follow-up commits. Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 40 grep.h | 6 ++ 2 files changed, 42 insertions(+), 4 deletions(-) diff --git a/grep.c b/grep.c index d03d424e5c..ffe95995ee 100644 --- a/grep.c +++ b/grep.c @@ -365,9 +365,22 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) if (!p->pcre1_regexp) compile_regexp_failed(p, error); - p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error); + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, PCRE_STUDY_JIT_COMPILE, &error); if (!p->pcre1_extra_info && error) die("%s", error); + +#ifdef PCRE_CONFIG_JIT + pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on); + if (p->pcre1_jit_on == 1) { + p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024); + if (!p->pcre1_jit_stack) + die("Couldn't allocate PCRE JIT stack"); + pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack); + } else if (p->pcre1_jit_on != 0) { + die("BUG: The pcre1_jit_on variable should be 0 or 1, not %d", + p->pcre1_jit_on); + } +#endif } static int pcre1match(struct grep_pat *p, const char *line, const char *eol, @@ -378,8 +391,19 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; - ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, - 0, flags, ovector, ARRAY_SIZE(ovector)); +#ifdef PCRE_CONFIG_JIT + if (p->pcre1_jit_on) { + ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, + eol - line, 0, flags, ovector, + ARRAY_SIZE(ovector), p->pcre1_jit_stack); + } else +#endif + { + ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, + eol - line, 0, flags, ovector, + ARRAY_SIZE(ovector)); + } + if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
[PATCH v3 6/7] grep: un-break building with PCRE < 8.20
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.20. The 8.20 release was the first release to have JIT & pcre_jit_stack in the headers, so a mock type needs to be provided for it on those releases. Now git should compile with all PCRE versions that it supported before my JIT change. I've tested it as far back as version 7.5 released on 2008-01-10, once I got down to version 7.0 it wouldn't build anymore with GCC 7.1.1, and I couldn't be bothered to anything older than 7.5 as I'm confident that if the build breaks on those older versions it's not because of my JIT change. See the "un-break" change in this series ("grep: un-break building with PCRE < 8.32", 2017-05-10) for why this isn't squashed into the main PCRE JIT commit. Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/grep.h b/grep.h index 3b948d9155..ce90969736 100644 --- a/grep.h +++ b/grep.h @@ -11,6 +11,9 @@ #ifndef PCRE_STUDY_JIT_COMPILE #define PCRE_STUDY_JIT_COMPILE 0 #endif +#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20 +typedef int pcre_jit_stack; +#endif #else typedef int pcre; typedef int pcre_extra; -- 2.13.0.303.g4ebf302169
[PATCH v3 5/7] grep: un-break building with PCRE < 8.32
Amend my change earlier in this series ("grep: add support for the PCRE v1 JIT API", 2017-04-11) to un-break the build on PCRE v1 versions earlier than 8.32. The JIT support was added in version 8.20 released on 2011-10-21, but it wasn't until 8.32 released on 2012-11-30 that the fast code path to use the JIT via pcre_jit_exec() was added[1] (see also [2]). This means that versions 8.20 through 8.31 could still use the JIT, but supporting it on those versions would add to the already verbose macro soup around JIT support it, and I don't expect that the use-case of compiling a brand new git against a 5 year old PCRE is particularly common, and if someone does that they can just get the existing pre-JIT slow codepath. So just take the easy way out and disable the JIT on any version older than 8.32. The reason this change isn't part of the initial change PCRE JIT support is to have a cleaner history showing which parts of the implementation are only used for ancient PCRE versions. This also makes it easier to revert this change if we ever decide to stop supporting those old versions. 1. http://www.pcre.org/original/changelog.txt ("28. Introducing a native interface for JIT. Through this interface, the compiled[...]") 2. https://bugs.exim.org/show_bug.cgi?id=2121 Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 6 +++--- grep.h | 5 + 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/grep.c b/grep.c index ffe95995ee..19fa67c34c 100644 --- a/grep.c +++ b/grep.c @@ -369,7 +369,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) if (!p->pcre1_extra_info && error) die("%s", error); -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_USE_JIT pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on); if (p->pcre1_jit_on == 1) { p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024); @@ -391,7 +391,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_USE_JIT if (p->pcre1_jit_on) { ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, 0, flags, ovector, @@ -418,7 +418,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol, static void free_pcre1_regexp(struct grep_pat *p) { pcre_free(p->pcre1_regexp); -#ifdef PCRE_CONFIG_JIT +#ifdef GIT_PCRE1_USE_JIT if (p->pcre1_jit_on) { pcre_free_study(p->pcre1_extra_info); pcre_jit_stack_free(p->pcre1_jit_stack); diff --git a/grep.h b/grep.h index 14f47189f9..3b948d9155 100644 --- a/grep.h +++ b/grep.h @@ -3,6 +3,11 @@ #include "color.h" #ifdef USE_LIBPCRE1 #include +#ifdef PCRE_CONFIG_JIT +#if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32 +#define GIT_PCRE1_USE_JIT +#endif +#endif #ifndef PCRE_STUDY_JIT_COMPILE #define PCRE_STUDY_JIT_COMPILE 0 #endif -- 2.13.0.303.g4ebf302169
[PATCH v3 3/7] log: add -P as a synonym for --perl-regexp
Add a short -P option as a synonym for the longer --perl-regexp, for consistency with the options the corresponding grep invocations accept. This was intentionally omitted in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified future use. Make it consistent with "grep" rather than to keep it open for future use, and to avoid the confusion of -P meaning different things for grep & log, as is the case with the -G option. As noted in the aforementioned commit the --basic-regexp option can't have a corresponding -G argument, as the log command already uses that for -G. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/rev-list-options.txt | 1 + revision.c | 2 +- t/t4202-log.sh | 12 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a46f70c2b1..9c44eae55d 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -91,6 +91,7 @@ endif::git-rev-list[] Consider the limiting patterns to be fixed strings (don't interpret pattern as a regular expression). +-P:: --perl-regexp:: Consider the limiting patterns to be Perl-compatible regular expressions. diff --git a/revision.c b/revision.c index 4883cdd2d0..60329da1bd 100644 --- a/revision.c +++ b/revision.c @@ -1996,7 +1996,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE); } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED; - } else if (!strcmp(arg, "--perl-regexp")) { + } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE; } else if (!strcmp(arg, "--all-match")) { revs->grep_filter.all_match = 1; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index c44c4337f8..e6050faa14 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -404,8 +404,20 @@ test_expect_success 'log with various grep.patternType configurations & command- --grep="(1|2)" >actual.fixed.short-arg && git log --pretty=tformat:%s -E \ --grep="\|2" >actual.extended.short-arg && + if test_have_prereq PCRE + then + git log --pretty=tformat:%s -P \ + --grep="[\d]\|" >actual.perl.short-arg + else + test_must_fail git log -P \ + --grep="[\d]\|" + fi && test_cmp expect.fixed actual.fixed.short-arg && test_cmp expect.extended actual.extended.short-arg && + if test_have_prereq PCRE + then + test_cmp expect.perl actual.perl.short-arg + fi && git log --pretty=tformat:%s --fixed-strings \ --grep="(1|2)" >actual.fixed.long-arg && -- 2.13.0.303.g4ebf302169
[PATCH v3 1/7] grep: don't redundantly compile throwaway patterns under threading
Change the pattern compilation logic under threading so that grep doesn't compile a pattern it never ends up using on the non-threaded code path, only to compile it again N times for N threads which will each use their own copy, ignoring the initially compiled pattern. This redundant compilation dates back to the initial introduction of the threaded grep in commit 5b594f457a ("Threaded grep", 2010-01-25). There was never any reason for doing this redundant work other than an oversight in the initial commit. Jeff King suggested on-list in <20170414212325.fefrl3qdjigwy...@sigill.intra.peff.net> that this might be needed to check the pattern for sanity before threaded execution commences. That's not the case. The pattern is compiled under threading in start_threads() before any concurrent execution has started by calling pthread_create(), so if the pattern contains an error we still do the right thing. I.e. die with one error before any threaded execution has commenced, instead of e.g. spewing out an error for each N threads, which could be a regression a change like this might inadvertently introduce. This change is not meant as an optimization, any performance gains from this are in the hundreds to thousands of nanoseconds at most. If we wanted more performance here we could just re-use the compiled patterns in multiple threads (regcomp(3) is thread-safe), or partially re-use them and the associated structures in the case of later PCRE JIT changes. Rather, it's just to make the code easier to reason about. It's confusing to debug this under threading & non-threading when the threading codepaths redundantly compile a pattern which is never used. The reason the patterns are recompiled is as a side-effect of duplicating the whole grep_opt structure, which is not thread safe, writable, and munged during execution. The grep_opt structure then points to the grep_pat structure where pattern or patterns are stored. I looked into e.g. splitting the API into some "do & alloc threadsafe stuff", "spawn thread", "do and alloc non-threadsafe stuff", but the execution time of grep_opt_dup() & pattern compilation is trivial compared to actually executing the grep, so there was no point. Even with the more expensive JIT changes to follow the most expensive PCRE patterns take something like 0.0X milliseconds to compile at most[1]. The undocumented --debug mode added in commit 17bf35a3c7 ("grep: teach --debug option to dump the parse tree", 2012-09-13) still works properly with this change. It only emits debugging info during pattern compilation, which is now dumped by the pattern compiled just before the first thread is started. 1. http://sljit.sourceforge.net/pcre.html Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/grep.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index b1095362fb..12e62fcbf3 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -224,7 +224,8 @@ static void start_threads(struct grep_opt *opt) int err; struct grep_opt *o = grep_opt_dup(opt); o->output = strbuf_out; - o->debug = 0; + if (i) + o->debug = 0; compile_grep_patterns(o); err = pthread_create(&threads[i], NULL, run, o); @@ -1167,8 +1168,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (!opt.fixed && opt.ignore_case) opt.regflags |= REG_ICASE; - compile_grep_patterns(&opt); - /* * We have to find "--" in a separate pass, because its presence * influences how we will parse arguments that come before it. @@ -1245,6 +1244,15 @@ int cmd_grep(int argc, const char **argv, const char *prefix) num_threads = 0; #endif + if (!num_threads) + /* +* The compiled patterns on the main path are only +* used when not using threading. Otherwise +* start_threads() below calls compile_grep_patterns() +* for each thread. +*/ + compile_grep_patterns(&opt); + #ifndef NO_PTHREADS if (num_threads) { if (!(opt.name_only || opt.unmatch_name_only || opt.count) -- 2.13.0.303.g4ebf302169
[PATCH v3 2/7] grep: skip pthreads overhead when using one thread
Skip the administrative overhead of using pthreads when only using one thread. Instead take the non-threaded path which would be taken under NO_PTHREADS. The threading support was initially added in commit 5b594f457a ("Threaded grep", 2010-01-25) with a hardcoded compile-time number of 8 threads. Later the number of threads was made configurable in commit 89f09dd34e ("grep: add --threads= option and grep.threads configuration", 2015-12-15). That change did not add any special handling for --threads=1. Now we take a slightly faster path by skipping thread handling entirely when 1 thread is requested. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 12e62fcbf3..bd008cb100 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -1238,6 +1238,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) num_threads = GREP_NUM_THREADS_DEFAULT; else if (num_threads < 0) die(_("invalid number of threads specified (%d)"), num_threads); + if (num_threads == 1) + num_threads = 0; #else if (num_threads) warning(_("no threads support, ignoring --threads")); -- 2.13.0.303.g4ebf302169
[PATCH v3 0/7] PCRE v2, PCRE v1 JIT, log -P & fixes
See <20170523192453.14172-1-ava...@gmail.com> for v2 (https://public-inbox.org/git/20170523192453.14172-1-ava...@gmail.com/). This on top of "[PATCH v4 00/31] Easy to review grep & pre-PCRE changes". This hopefully addresses all the comments I had on v2. Comments below: Ævar Arnfjörð Bjarmason (7): grep: don't redundantly compile throwaway patterns under threading grep: skip pthreads overhead when using one thread log: add -P as a synonym for --perl-regexp No changes. grep: add support for the PCRE v1 JIT API * Simplify logic around canjit variable. * s/BUG: // for things that aren't internal bugs (but keep it for one that's an actual bug if it happens...) * Simplify syntax around `#ifdef PCRE_CONFIG_JIT` blocks. grep: un-break building with PCRE < 8.32 * Reword commit message for less confusion. * Macro name: s/GIT_PCRE1_CAN_DO_MODERN_JIT/GIT_PCRE1_USE_JIT/g grep: un-break building with PCRE < 8.20 No changes. grep: add support for PCRE v2 * Makefile: Clarify comment about what LIBPCREDIR=* does now. * Makefile: One use of USE_LIBPCRE is now USE_LIBPCRE1. It's logically the same, but less confusing now. * grep.c: Fix up comment syntax * grep.c: Same canjit & s/BUG: // bug fixes as noted for PCRE v1. Documentation/rev-list-options.txt | 1 + Makefile | 36 +-- builtin/grep.c | 16 +++- configure.ac | 77 --- grep.c | 187 - grep.h | 31 ++ revision.c | 2 +- t/t4202-log.sh | 12 +++ t/test-lib.sh | 2 +- 9 files changed, 335 insertions(+), 29 deletions(-) -- 2.13.0.303.g4ebf302169
[PATCH v4 31/31] grep: assert that threading is enabled when calling grep_{lock,unlock}
Change the grep_{lock,unlock} functions to assert that num_threads is true, instead of only locking & unlocking the pthread mutex lock when it is. These functions are never called when num_threads isn't true, this logic has gone through multiple iterations since the initial introduction of grep threading in commit 5b594f457a ("Threaded grep", 2010-01-25), but ever since then they'd only be called if num_threads was true, so this check made the code confusing to read. Replace the check with an assertion, so that it's clear to the reader that this code path is never taken unless we're spawning threads. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/grep.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 3c721b75a5..b1095362fb 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -73,14 +73,14 @@ static pthread_mutex_t grep_mutex; static inline void grep_lock(void) { - if (num_threads) - pthread_mutex_lock(&grep_mutex); + assert(num_threads); + pthread_mutex_lock(&grep_mutex); } static inline void grep_unlock(void) { - if (num_threads) - pthread_mutex_unlock(&grep_mutex); + assert(num_threads); + pthread_mutex_unlock(&grep_mutex); } /* Signalled when a new work_item is added to todo. */ -- 2.13.0.303.g4ebf302169
[PATCH v4 29/31] pack-objects: fix buggy warning about threads
Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to re-using the delta_search_threads variable for both the state of the "pack.threads" config & the --threads option, setting "pack.threads" but not supplying --threads would trigger the warning for both "pack.threads" & --threads. Solve this bug by resetting the delta_search_threads variable in git_pack_config(), it might then be set by --threads again and be subsequently warned about, as the test I'm changing here asserts. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/pack-objects.c | 4 +++- t/t5300-pack-object.sh | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 9b4ba8a80d..efa21a15dd 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2472,8 +2472,10 @@ static int git_pack_config(const char *k, const char *v, void *cb) die("invalid number of threads specified (%d)", delta_search_threads); #ifdef NO_PTHREADS - if (delta_search_threads != 1) + if (delta_search_threads != 1) { warning("no threads support, ignoring %s", k); + delta_search_threads = 0; + } #endif return 0; } diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 6ed23ee1d2..9c68b99251 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -447,7 +447,7 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack. git -c pack.threads=2 pack-objects --stdout --all /dev/null 2>err && grep ^warning: err >warnings && - test_must_fail test_line_count = 1 warnings && + test_line_count = 1 warnings && grep -F "no threads support, ignoring pack.threads" err && git -c pack.threads=2 pack-objects --threads=4 --stdout --all /dev/null 2>err && -- 2.13.0.303.g4ebf302169
[PATCH v4 26/31] grep: move is_fixed() earlier to avoid forward declaration
Move the is_fixed() function which are currently only used in compile_regexp() earlier so it can be used in the PCRE family of functions in a later change. Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/grep.c b/grep.c index 2d54baeaa3..d03d424e5c 100644 --- a/grep.c +++ b/grep.c @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, die("%s'%s': %s", where, p->pattern, error); } +static int is_fixed(const char *s, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) { + if (is_regex_special(s[i])) + return 0; + } + + return 1; +} + static int has_null(const char *s, size_t len) { /* @@ -402,18 +414,6 @@ static void free_pcre1_regexp(struct grep_pat *p) } #endif /* !USE_LIBPCRE1 */ -static int is_fixed(const char *s, size_t len) -{ - size_t i; - - for (i = 0; i < len; i++) { - if (is_regex_special(s[i])) - return 0; - } - - return 1; -} - static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt) { struct strbuf sb = STRBUF_INIT; -- 2.13.0.303.g4ebf302169
[PATCH v4 27/31] test-lib: add a PTHREADS prerequisite
Add a PTHREADS prerequisite which is false when git is compiled with NO_PTHREADS=YesPlease. There's lots of custom code that runs when threading isn't available, but before this prerequisite there was no way to test it. Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile | 1 + t/README | 4 t/test-lib.sh | 1 + 3 files changed, 6 insertions(+) diff --git a/Makefile b/Makefile index 374fbc7e58..a79274e5e6 100644 --- a/Makefile +++ b/Makefile @@ -2242,6 +2242,7 @@ GIT-BUILD-OPTIONS: FORCE @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+ @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ + @echo NO_PTHREADS=\''$(subst ','\'',$(subst ','\'',$(NO_PTHREADS)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ @echo PAGER_ENV=\''$(subst ','\'',$(subst ','\'',$(PAGER_ENV)))'\' >>$@+ diff --git a/t/README b/t/README index a90cb62583..2f95860369 100644 --- a/t/README +++ b/t/README @@ -817,6 +817,10 @@ use these, and "test_set_prereq" for how to define your own. Test is run on a filesystem which converts decomposed utf-8 (nfd) to precomposed utf-8 (nfc). + - PTHREADS + + Git wasn't compiled with NO_PTHREADS=YesPlease. + Tips for Writing Tests -- diff --git a/t/test-lib.sh b/t/test-lib.sh index 1d0f636cbd..43529451f9 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1013,6 +1013,7 @@ esac ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1 test -z "$NO_PERL" && test_set_prereq PERL +test -z "$NO_PTHREADS" && test_set_prereq PTHREADS test -z "$NO_PYTHON" && test_set_prereq PYTHON test -n "$USE_LIBPCRE1" && test_set_prereq PCRE test -z "$NO_GETTEXT" && test_set_prereq GETTEXT -- 2.13.0.303.g4ebf302169
[PATCH v4 21/31] grep: catch a missing enum in switch statement
Add a die(...) to a default case for the switch statement selecting between grep pattern types under --recurse-submodules. Normally this would be caught by -Wswitch, but the grep_pattern_type type is converted to int by going through parse_options(). Changing the argument type passed to compile_submodule_options() won't work, the value will just get coerced. The -Wswitch-default warning will warn about it, but that produces a lot of noise across the codebase, this potential issue would be drowned in that noise. Thus catching this at runtime is the least bad option. This won't ever trigger in practice, but if a new pattern type were to be added this catches an otherwise silent bug during development. See commit 0281e487fd ("grep: optionally recurse into submodules", 2016-12-16) for the initial addition of this code. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 3ffb5b4e81..a191e2976b 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -495,6 +495,8 @@ static void compile_submodule_options(const struct grep_opt *opt, break; case GREP_PATTERN_TYPE_UNSPECIFIED: break; + default: + die("BUG: Added a new grep pattern type without updating switch statement"); } for (pattern = opt->pattern_list; pattern != NULL; -- 2.13.0.303.g4ebf302169
[PATCH v4 22/31] grep: remove redundant regflags assignments
Remove redundant assignments to the "regflags" variable. This variable is only used set under GREP_PATTERN_TYPE_ERE, so there's no need to un-set it under GREP_PATTERN_TYPE_{FIXED,BRE,PCRE}. Back in 5010cb5fcc[1], we did do "opt.regflags &= ~REG_EXTENDED" upon seeing "-G" on the command line and flipped the bit on upon seeing "-E", but I think that was perfectly sensible and it would have been a bug if we didn't. They were part of the command line parsing that could have seen "-E" on the command line earlier. When cca2c172 ("git-grep: do not die upon -F/-P when grep.extendedRegexp is set.", 2011-05-09) switched the command line parsing to "read into a 'tentatively this is what we saw the last' variable and then finally commit just once", we didn't touch opt.regflags for PCRE and FIXED, but we still had to flip regflags between BRE and ERE, because parsing of grep.extendedregexp configuration variable directly touched opt.regflags back then, which was done by b22520a3 ("grep: allow -E and -n to be turned on by default via configuration", 2011-03-30). When 84befcd0 ("grep: add a grep.patternType configuration setting", 2012-08-03) introduced extended_regexp_option field, we stopped flipping regflags while reading the configuration, and that was when we should have noticed and stopped dropping REG_EXTENDED bit in the "now we can commit what type to use" helper function. There is no reason to do this anymore, so stop doing it, more to reduce "wait this is used under fixed/BRE/PCRE how?" confusion when reading the code, than to to save ourselves trivial CPU cycles by removing one assignment. 1. "built-in "git grep"", 2006-04-30. Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/grep.c b/grep.c index 47cee45067..bf6c2494fd 100644 --- a/grep.c +++ b/grep.c @@ -179,7 +179,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_BRE: opt->fixed = 0; opt->pcre = 0; - opt->regflags &= ~REG_EXTENDED; break; case GREP_PATTERN_TYPE_ERE: @@ -191,13 +190,11 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_FIXED: opt->fixed = 1; opt->pcre = 0; - opt->regflags &= ~REG_EXTENDED; break; case GREP_PATTERN_TYPE_PCRE: opt->fixed = 0; opt->pcre = 1; - opt->regflags &= ~REG_EXTENDED; break; } } @@ -415,10 +412,9 @@ static void compile_fixed_regexp(struct grep_pat *p, struct grep_opt *opt) { struct strbuf sb = STRBUF_INIT; int err; - int regflags; + int regflags = opt->regflags; basic_regex_quote_buf(&sb, p->pattern); - regflags = opt->regflags & ~REG_EXTENDED; if (opt->ignore_case) regflags |= REG_ICASE; err = regcomp(&p->regexp, sb.buf, regflags); -- 2.13.0.303.g4ebf302169
[PATCH v4 30/31] grep: given --threads with NO_PTHREADS=YesPlease, warn
Add a warning about missing thread support when grep.threads or --threads is set to a non 0 (default) or 1 (no parallelism) value under NO_PTHREADS=YesPlease. This is for consistency with the index-pack & pack-objects commands, which also take a --threads option & are configurable via pack.threads, and have long warned about the same under NO_PTHREADS=YesPlease. Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/grep.c | 13 + t/t7810-grep.sh | 18 ++ 2 files changed, 31 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index a191e2976b..3c721b75a5 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -289,6 +289,17 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) if (num_threads < 0) die(_("invalid number of threads specified (%d) for %s"), num_threads, var); +#ifdef NO_PTHREADS + else if (num_threads && num_threads != 1) { + /* +* TRANSLATORS: %s is the configuration +* variable for tweaking threads, currently +* grep.threads +*/ + warning(_("no threads support, ignoring %s"), var); + num_threads = 0; + } +#endif } return st; @@ -1229,6 +1240,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) else if (num_threads < 0) die(_("invalid number of threads specified (%d)"), num_threads); #else + if (num_threads) + warning(_("no threads support, ignoring --threads")); num_threads = 0; #endif diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 561709ef6a..f106387820 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -791,6 +791,24 @@ do " done +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'grep --threads=N or pack.threads=N warns when no pthreads' ' + git grep --threads=2 Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads" err && + git -c grep.threads=2 grep Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring grep.threads" err && + git -c grep.threads=2 grep --threads=4 Hello hello_world 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads" err && + grep -F "no threads support, ignoring grep.threads" err && + git -c grep.threads=0 grep --threads=0 Hello hello_world 2>err && + test_line_count = 0 err +' + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 2.13.0.303.g4ebf302169
[PATCH v4 28/31] pack-objects & index-pack: add test for --threads warning
Add a test for the warning that's emitted when --threads or pack.threads is provided under NO_PTHREADS=YesPlease. This uses the new PTHREADS prerequisite. The assertion for C_LOCALE_OUTPUT in the latter test is currently redundant, since unlike index-pack the pack-objects warnings aren't i18n'd. However they might be changed to be i18n'd in the future, and there's no harm in future-proofing the test. There's an existing bug in the implementation of pack-objects which this test currently tests for as-is. Details about the bug & the fix are included in a follow-up change. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t5300-pack-object.sh | 36 1 file changed, 36 insertions(+) diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh index 43a672c345..6ed23ee1d2 100755 --- a/t/t5300-pack-object.sh +++ b/t/t5300-pack-object.sh @@ -421,6 +421,42 @@ test_expect_success 'index-pack works in non-repo' ' test_path_is_file foo.idx ' +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'index-pack --threads=N or pack.threads=N warns when no pthreads' ' + test_must_fail git index-pack --threads=2 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads=2" err && + + test_must_fail git -c pack.threads=2 index-pack 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring pack.threads" err && + + test_must_fail git -c pack.threads=2 index-pack --threads=4 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads=4" err && + grep -F "no threads support, ignoring pack.threads" err +' + +test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.threads=N warns when no pthreads' ' + git pack-objects --threads=2 --stdout --all /dev/null 2>err && + grep ^warning: err >warnings && + test_line_count = 1 warnings && + grep -F "no threads support, ignoring --threads" err && + + git -c pack.threads=2 pack-objects --stdout --all /dev/null 2>err && + grep ^warning: err >warnings && + test_must_fail test_line_count = 1 warnings && + grep -F "no threads support, ignoring pack.threads" err && + + git -c pack.threads=2 pack-objects --threads=4 --stdout --all /dev/null 2>err && + grep ^warning: err >warnings && + test_line_count = 2 warnings && + grep -F "no threads support, ignoring --threads" err && + grep -F "no threads support, ignoring pack.threads" err +' + # # WARNING! # -- 2.13.0.303.g4ebf302169
[PATCH v4 25/31] grep: change internal *pcre* variable & function names to be *pcre1*
Change the internal PCRE variable & function names to have a "1" suffix. This is for preparation for libpcre2 support, where having non-versioned names would be confusing. An earlier change in this series ("grep: change the internal PCRE macro names to be PCRE1", 2017-04-07) elaborates on the motivations behind this change. Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 52 ++-- grep.h | 8 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/grep.c b/grep.c index 7a3858a1c3..2d54baeaa3 100644 --- a/grep.c +++ b/grep.c @@ -178,23 +178,23 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_BRE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; break; case GREP_PATTERN_TYPE_ERE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; opt->regflags |= REG_EXTENDED; break; case GREP_PATTERN_TYPE_FIXED: opt->fixed = 1; - opt->pcre = 0; + opt->pcre1 = 0; break; case GREP_PATTERN_TYPE_PCRE: opt->fixed = 0; - opt->pcre = 1; + opt->pcre1 = 1; break; } } @@ -334,7 +334,7 @@ static int has_null(const char *s, size_t len) } #ifdef USE_LIBPCRE1 -static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) +static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; int erroffset; @@ -342,23 +342,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) if (opt->ignore_case) { if (has_non_ascii(p->pattern)) - p->pcre_tables = pcre_maketables(); + p->pcre1_tables = pcre_maketables(); options |= PCRE_CASELESS; } if (is_utf8_locale() && has_non_ascii(p->pattern)) options |= PCRE_UTF8; - p->pcre_regexp = pcre_compile(p->pattern, options, &error, &erroffset, - p->pcre_tables); - if (!p->pcre_regexp) + p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset, + p->pcre1_tables); + if (!p->pcre1_regexp) compile_regexp_failed(p, error); - p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, &error); - if (!p->pcre_extra_info && error) + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, &error); + if (!p->pcre1_extra_info && error) die("%s", error); } -static int pcrematch(struct grep_pat *p, const char *line, const char *eol, +static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { int ovector[30], ret, flags = 0; @@ -366,7 +366,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; - ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line, + ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line, 0, flags, ovector, ARRAY_SIZE(ovector)); if (ret < 0 && ret != PCRE_ERROR_NOMATCH) die("pcre_exec failed with error code %d", ret); @@ -379,25 +379,25 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, return ret; } -static void free_pcre_regexp(struct grep_pat *p) +static void free_pcre1_regexp(struct grep_pat *p) { - pcre_free(p->pcre_regexp); - pcre_free(p->pcre_extra_info); - pcre_free((void *)p->pcre_tables); + pcre_free(p->pcre1_regexp); + pcre_free(p->pcre1_extra_info); + pcre_free((void *)p->pcre1_tables); } #else /* !USE_LIBPCRE1 */ -static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) +static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) { die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); } -static int pcrematch(struct grep_pat *p, const char *line, const char *eol, +static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { return 1; } -static void free_pcre_regexp(struct grep_pat *p) +static void free_pcre1_regexp(struct grep_pat *p) { } #endif /* !USE_LIBPCRE1 */ @@ -479,8 +479,8 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) return; } - if (opt->pcre) { - compile_pcre_regexp(p, opt); + if (opt->pcre1) { + compile_pcre1_regexp(p, opt); return; } @@ -836,8 +836,8 @@ v
[PATCH v4 23/31] grep: factor test for \0 in grep patterns into a function
Factor the test for \0 in grep patterns into a function. Since commit 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a \0 is considered fixed as regcomp() can't handle it. This change makes later changes that make use of either has_null() or is_fixed() (but not both) smaller. While I'm at it make the comment conform to the style guide, i.e. add an opening "/*\n". Signed-off-by: Ævar Arnfjörð Bjarmason --- grep.c | 22 +++--- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/grep.c b/grep.c index bf6c2494fd..18306bc605 100644 --- a/grep.c +++ b/grep.c @@ -321,6 +321,18 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, die("%s'%s': %s", where, p->pattern, error); } +static int has_null(const char *s, size_t len) +{ + /* +* regcomp cannot accept patterns with NULs so when using it +* we consider any pattern containing a NUL fixed. +*/ + if (memchr(s, 0, len)) + return 1; + + return 0; +} + #ifdef USE_LIBPCRE static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { @@ -394,12 +406,6 @@ static int is_fixed(const char *s, size_t len) { size_t i; - /* regcomp cannot accept patterns with NULs so we -* consider any pattern containing a NUL fixed. -*/ - if (memchr(s, 0, len)) - return 1; - for (i = 0; i < len; i++) { if (is_regex_special(s[i])) return 0; @@ -451,7 +457,9 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) * simple string match using kws. p->fixed tells us if we * want to use kws. */ - if (opt->fixed || is_fixed(p->pattern, p->patternlen)) + if (opt->fixed || + has_null(p->pattern, p->patternlen) || + is_fixed(p->pattern, p->patternlen)) p->fixed = !icase || ascii_only; else p->fixed = 0; -- 2.13.0.303.g4ebf302169
[PATCH v4 18/31] perf: add a comparison test of grep regex engines with -F
Add a performance comparison test of grep regex engines given fixed strings. The current logic in compile_regexp() ignores the engine parameter and uses kwset() to search for these, so this test shows no difference between engines right now: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7821-grep-engines-fixed.sh [...] Test this tree 7821.1: fixed grep int 0.56(1.67+0.68) 7821.2: basic grep int 0.57(1.70+0.57) 7821.3: extended grep int0.59(1.76+0.51) 7821.4: perl grep int1.08(1.71+0.55) 7821.6: fixed grep uncommon 0.23(0.55+0.50) 7821.7: basic grep uncommon 0.24(0.55+0.50) 7821.8: extended grep uncommon 0.26(0.55+0.52) 7821.9: perl grep uncommon 0.24(0.58+0.47) 7821.11: fixed grep æ0.36(1.30+0.42) 7821.12: basic grep æ0.36(1.32+0.40) 7821.13: extended grep æ 0.38(1.30+0.42) 7821.14: perl grep æ 0.35(1.24+0.48) Only when run with -i via GIT_PERF_7821_GREP_OPTS=' -i' do we avoid avoid going through the same kwset.[ch] codepath, see the "Even when -F..." comment in grep.c. This only kicks for the non-ASCII case: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7821_GREP_OPTS=' -i' ./run p7821-grep-engines-fixed.sh [...] Testthis tree --- 7821.1: fixed grep -i int 0.62(2.10+0.57) 7821.2: basic grep -i int 0.68(1.90+0.61) 7821.3: extended grep -i int0.78(1.94+0.57) 7821.4: perl grep -i int0.98(1.78+0.74) 7821.6: fixed grep -i uncommon 0.24(0.44+0.64) 7821.7: basic grep -i uncommon 0.25(0.56+0.54) 7821.8: extended grep -i uncommon 0.27(0.62+0.45) 7821.9: perl grep -i uncommon 0.24(0.59+0.49) 7821.11: fixed grep -i æ0.30(0.96+0.39) 7821.12: basic grep -i æ0.27(0.92+0.44) 7821.13: extended grep -i æ 0.28(0.90+0.46) 7821.14: perl grep -i æ 0.28(0.74+0.49) I'm planning to change how fixed-string searching happens. This test gives a baseline for comparing performance before & after any such change. See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/perf/p7821-grep-engines-fixed.sh | 41 ++ 1 file changed, 41 insertions(+) create mode 100755 t/perf/p7821-grep-engines-fixed.sh diff --git a/t/perf/p7821-grep-engines-fixed.sh b/t/perf/p7821-grep-engines-fixed.sh new file mode 100755 index 00..c7ef1e198f --- /dev/null +++ b/t/perf/p7821-grep-engines-fixed.sh @@ -0,0 +1,41 @@ +#!/bin/sh + +test_description="Comparison of git-grep's regex engines with -F + +Set GIT_PERF_7821_GREP_OPTS in the environment to pass options to +git-grep. Make sure to include a leading space, +e.g. GIT_PERF_7821_GREP_OPTS=' -w'. See p7820-grep-engines.sh for more +options to try. +" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for pattern in 'int' 'uncommon' 'æ' +do + for engine in fixed basic extended perl + do + if test $engine = "perl" && ! test_have_prereq PCRE + then + prereq="PCRE" + else + prereq="" + fi + test_perf $prereq "$engine grep$GIT_PERF_7821_GREP_OPTS $pattern" " + git -c grep.patternType=$engine grep$GIT_PERF_7821_GREP_OPTS $pattern >'out.$engine' || : + " + done + + test_expect_success "assert that all engines found the same for$GIT_PERF_7821_GREP_OPTS $pattern" ' + test_cmp out.fixed out.basic && + test_cmp out.fixed out.extended && + if test_have_prereq PCRE + then + test_cmp out.fixed out.perl + fi + ' +done + +test_done -- 2.13.0.303.g4ebf302169
[PATCH v4 24/31] grep: change the internal PCRE macro names to be PCRE1
Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code. This is for preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing. In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts. However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default. This code and the USE_LIBPCRE Makefile argument was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later. Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile | 4 ++-- grep.c| 6 +++--- grep.h| 2 +- t/test-lib.sh | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index d1587452f1..374fbc7e58 100644 --- a/Makefile +++ b/Makefile @@ -1088,7 +1088,7 @@ ifdef NO_LIBGEN_H endif ifdef USE_LIBPCRE - BASIC_CFLAGS += -DUSE_LIBPCRE + BASIC_CFLAGS += -DUSE_LIBPCRE1 ifdef LIBPCREDIR BASIC_CFLAGS += -I$(LIBPCREDIR)/include EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib) @@ -2240,7 +2240,7 @@ GIT-BUILD-OPTIONS: FORCE @echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+ @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+ @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+ - @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ + @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ diff --git a/grep.c b/grep.c index 18306bc605..7a3858a1c3 100644 --- a/grep.c +++ b/grep.c @@ -333,7 +333,7 @@ static int has_null(const char *s, size_t len) return 0; } -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; @@ -385,7 +385,7 @@ static void free_pcre_regexp(struct grep_pat *p) pcre_free(p->pcre_extra_info); pcre_free((void *)p->pcre_tables); } -#else /* !USE_LIBPCRE */ +#else /* !USE_LIBPCRE1 */ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); @@ -400,7 +400,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, static void free_pcre_regexp(struct grep_pat *p) { } -#endif /* !USE_LIBPCRE */ +#endif /* !USE_LIBPCRE1 */ static int is_fixed(const char *s, size_t len) { diff --git a/grep.h b/grep.h index 267534ca24..073b0e4c92 100644 --- a/grep.h +++ b/grep.h @@ -1,7 +1,7 @@ #ifndef GREP_H #define GREP_H #include "color.h" -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 #include #else typedef int pcre; diff --git a/t/test-lib.sh b/t/test-lib.sh index 04d857a42b..1d0f636cbd 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1014,7 +1014,7 @@ esac ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1 test -z "$NO_PERL" && test_set_prereq PERL test -z "$NO_PYTHON" && test_set_prereq PYTHON -test -n "$USE_LIBPCRE" && test_set_prereq PCRE +test -n "$USE_LIBPCRE1" && test_set_prereq PCRE test -z "$NO_GETTEXT" && test_set_prereq GETTEXT # Can we rely on git's output in the C locale? -- 2.13.0.303.g4ebf302169
[PATCH v4 17/31] perf: add a comparison test of grep regex engines
Add a very basic performance comparison test comparing the POSIX basic, extended and perl engines. In theory the "basic" and "extended" engines should be implemented using the same underlying code with a slightly different pattern parser, but some implementations may not do this. Jump through some slight hoops to test both, which is worthwhile since "basic" is the default. Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a checkout of linux.git & latest upstream PCRE, both PCRE and git compiled with -O3 using gcc 7.1.1: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7820-grep-engines.sh [...] Testthis tree --- 7820.1: basic grep 'how.to' 0.34(1.24+0.53) 7820.2: extended grep 'how.to' 0.33(1.23+0.45) 7820.3: perl grep 'how.to' 0.31(1.05+0.56) 7820.5: basic grep '^how to'0.32(1.24+0.42) 7820.6: extended grep '^how to' 0.33(1.20+0.44) 7820.7: perl grep '^how to' 0.57(2.67+0.42) 7820.9: basic grep '[how] to' 0.51(2.16+0.45) 7820.10: extended grep '[how] to' 0.49(2.20+0.43) 7820.11: perl grep '[how] to' 0.56(2.60+0.43) 7820.13: basic grep '\(e.t[^ ]*\|v.ry\) rare' 0.66(3.25+0.40) 7820.14: extended grep '(e.t[^ ]*|v.ry) rare' 0.65(3.19+0.46) 7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.05(5.74+0.34) 7820.17: basic grep 'm\(ú\|u\)lt.b\(æ\|y\)te' 0.34(1.28+0.47) 7820.18: extended grep 'm(ú|u)lt.b(æ|y)te' 0.34(1.38+0.38) 7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.39(1.56+0.44) Options can also be passed to git-grep via the GIT_PERF_7820_GREP_OPTS environment variable. There are various modes such as "-v" that have very different performance profiles, but handling the combinatorial explosion of testing all those options would make this script much more complex and harder to maintain. Instead just add the ability to do one-shot runs with arbitrary options, e.g.: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7820_GREP_OPTS=" -i" ./run p7820-grep-engines.sh [...] Test this tree -- 7820.1: basic grep -i 'how.to' 0.49(1.72+0.38) 7820.2: extended grep -i 'how.to' 0.46(1.64+0.42) 7820.3: perl grep -i 'how.to' 0.44(1.45+0.45) 7820.5: basic grep -i '^how to'0.47(1.76+0.38) 7820.6: extended grep -i '^how to' 0.47(1.70+0.42) 7820.7: perl grep -i '^how to' 0.65(2.72+0.37) 7820.9: basic grep -i '[how] to' 0.86(3.64+0.42) 7820.10: extended grep -i '[how] to' 0.84(3.62+0.46) 7820.11: perl grep -i '[how] to' 0.73(3.06+0.39) 7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare' 1.63(8.13+0.36) 7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare' 1.64(8.01+0.44) 7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare' 1.44(6.88+0.44) 7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te' 0.66(2.67+0.44) 7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te' 0.66(2.67+0.43) 7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te' 0.59(2.31+0.37) Signed-off-by: Ævar Arnfjörð Bjarmason --- t/perf/p7820-grep-engines.sh | 56 1 file changed, 56 insertions(+) create mode 100755 t/perf/p7820-grep-engines.sh diff --git a/t/perf/p7820-grep-engines.sh b/t/perf/p7820-grep-engines.sh new file mode 100755 index 00..62aba19e76 --- /dev/null +++ b/t/perf/p7820-grep-engines.sh @@ -0,0 +1,56 @@ +#!/bin/sh + +test_description="Comparison of git-grep's regex engines + +Set GIT_PERF_7820_GREP_OPTS in the environment to pass options to +git-grep. Make sure to include a leading space, +e.g. GIT_PERF_7820_GREP_OPTS=' -i'. Some options to try: + + -i + -w + -v + -vi + -vw + -viw +" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for pattern in \ + 'how.to' \ + '^how to' \ + '[how] to' \ + '\(e.t[^ ]*\|v.ry\) rare' \ + 'm\(ú\|u\)lt.b\(æ\|y\)te' +do + for engine in basic extended perl + do + if test $engine != "basic" + then + # Poor man's basic -> extended converter. + pattern=$(echo "$pattern" | sed 's/\\//g') + fi + if test $engine = "perl" && ! test_have_prereq PCRE + then + prereq="PCRE" + else + prereq="" + fi + test_perf $prereq "$engine grep$GIT_PERF_7820_GREP_O
[PATCH v4 15/31] perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
Add a git GIT_PERF_MAKE_COMMAND variable to compliment the existing GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell command to execute instead of 'make'. This is useful e.g. in cases where the name, semantics or defaults of a Makefile flag have changed over time. It can even be used to change the contents of the tree, useful for monkeypatching ancient versions of git to get them to build. This opens Pandora's box in some ways, it's now possible to "jailbreak" the perf environment and e.g. modify the source tree via this arbitrary instead of just issuing a custom "make" command, such a command has to be re-entrant in the sense that subsequent perf runs will re-use the possibly modified tree. It would be pointless to try to mitigate or work around that caveat in a tool purely aimed at Git developers, so this change makes no attempt to do so. Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile | 3 +++ t/perf/README | 17 - t/perf/run| 11 +-- 3 files changed, 28 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index eedadb8056..d1587452f1 100644 --- a/Makefile +++ b/Makefile @@ -2272,6 +2272,9 @@ endif ifdef GIT_PERF_MAKE_OPTS @echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@+ endif +ifdef GIT_PERF_MAKE_COMMAND + @echo GIT_PERF_MAKE_COMMAND=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_COMMAND)))'\' >>$@+ +endif ifdef GIT_INTEROP_MAKE_OPTS @echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+ endif diff --git a/t/perf/README b/t/perf/README index 49ea4349be..0b6a8d2906 100644 --- a/t/perf/README +++ b/t/perf/README @@ -60,7 +60,22 @@ You can set the following variables (also in your config.mak): GIT_PERF_MAKE_OPTS Options to use when automatically building a git tree for - performance testing. E.g., -j6 would be useful. + performance testing. E.g., -j6 would be useful. Passed + directly to make as "make $GIT_PERF_MAKE_OPTS". + +GIT_PERF_MAKE_COMMAND + An arbitrary command that'll be run in place of the make + command, if set the GIT_PERF_MAKE_OPTS variable is + ignored. Useful in cases where source tree changes might + require issuing a different make command to different + revisions. + + This can be (ab)used to monkeypatch or otherwise change the + tree about to be built. Note that the build directory can be + re-used for subsequent runs so the make command might get + executed multiple times on the same tree, but don't count on + any of that, that's an implementation detail that might change + in the future. GIT_PERF_REPO GIT_PERF_LARGE_REPO diff --git a/t/perf/run b/t/perf/run index c788d713ae..b61024a830 100755 --- a/t/perf/run +++ b/t/perf/run @@ -37,8 +37,15 @@ build_git_rev () { cp "../../$config" "build/$rev/" fi done - (cd build/$rev && make $GIT_PERF_MAKE_OPTS) || - die "failed to build revision '$mydir'" + ( + cd build/$rev && + if test -n "$GIT_PERF_MAKE_COMMAND" + then + sh -c "$GIT_PERF_MAKE_COMMAND" + else + make $GIT_PERF_MAKE_OPTS + fi + ) || die "failed to build revision '$mydir'" } run_dirs_helper () { -- 2.13.0.303.g4ebf302169
[PATCH v4 16/31] perf: emit progress output when unpacking & building
Amend the t/perf/run output so that in addition to the "Running N tests" heading currently being emitted, it also emits "Unpacking $rev" and "Building $rev" when setting up the build/$rev directory & when building it, respectively. This makes it easier to see what's going on and what revision is being tested as the output scrolls by. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/perf/run | 2 ++ 1 file changed, 2 insertions(+) diff --git a/t/perf/run b/t/perf/run index b61024a830..beb4acc0e4 100755 --- a/t/perf/run +++ b/t/perf/run @@ -24,6 +24,7 @@ run_one_dir () { unpack_git_rev () { rev=$1 + echo "=== Unpacking $rev in build/$rev ===" mkdir -p build/$rev (cd "$(git rev-parse --show-cdup)" && git archive --format=tar $rev) | (cd build/$rev && tar x) @@ -37,6 +38,7 @@ build_git_rev () { cp "../../$config" "build/$rev/" fi done + echo "=== Building $rev ===" ( cd build/$rev && if test -n "$GIT_PERF_MAKE_COMMAND" -- 2.13.0.303.g4ebf302169
[PATCH v4 20/31] perf: add a comparison test of log --grep regex engines with -F
Add a performance comparison test of log --grepgrep regex engines given fixed strings. See the preceding fixed-string t/perf change ("perf: add a comparison test of grep regex engines with -F", 2017-04-21) for notes about this, in particular this mostly tests exactly the same codepath now, but might not in the future: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p4221-log-grep-engines-fixed.sh [...] Test this tree 4221.1: fixed log --grep='int' 5.99(5.55+0.40) 4221.2: basic log --grep='int' 5.92(5.56+0.31) 4221.3: extended log --grep='int'6.01(5.51+0.45) 4221.4: perl log --grep='int'5.99(5.56+0.38) 4221.6: fixed log --grep='uncommon' 5.06(4.76+0.27) 4221.7: basic log --grep='uncommon' 5.02(4.78+0.21) 4221.8: extended log --grep='uncommon' 4.99(4.78+0.20) 4221.9: perl log --grep='uncommon' 5.00(4.72+0.26) 4221.11: fixed log --grep='æ'5.35(5.12+0.20) 4221.12: basic log --grep='æ'5.34(5.11+0.20) 4221.13: extended log --grep='æ' 5.39(5.10+0.22) 4221.14: perl log --grep='æ' 5.44(5.16+0.23) Only the non-ASCII -i case is different: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_4221_LOG_OPTS=' -i' ./run p4221-log-grep-engines-fixed.sh [...] Testthis tree --- 4221.1: fixed log -i --grep='int' 6.17(5.77+0.35) 4221.2: basic log -i --grep='int' 6.16(5.59+0.39) 4221.3: extended log -i --grep='int'6.15(5.70+0.39) 4221.4: perl log -i --grep='int'6.15(5.69+0.38) 4221.6: fixed log -i --grep='uncommon' 5.10(4.88+0.21) 4221.7: basic log -i --grep='uncommon' 5.04(4.76+0.25) 4221.8: extended log -i --grep='uncommon' 5.07(4.82+0.23) 4221.9: perl log -i --grep='uncommon' 5.03(4.78+0.22) 4221.11: fixed log -i --grep='æ'5.93(5.65+0.25) 4221.12: basic log -i --grep='æ'5.88(5.62+0.25) 4221.13: extended log -i --grep='æ' 6.02(5.69+0.29) 4221.14: perl log -i --grep='æ' 5.36(5.06+0.29) See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/perf/p4221-log-grep-engines-fixed.sh | 44 ++ 1 file changed, 44 insertions(+) create mode 100755 t/perf/p4221-log-grep-engines-fixed.sh diff --git a/t/perf/p4221-log-grep-engines-fixed.sh b/t/perf/p4221-log-grep-engines-fixed.sh new file mode 100755 index 00..060971265a --- /dev/null +++ b/t/perf/p4221-log-grep-engines-fixed.sh @@ -0,0 +1,44 @@ +#!/bin/sh + +test_description="Comparison of git-log's --grep regex engines with -F + +Set GIT_PERF_4221_LOG_OPTS in the environment to pass options to +git-grep. Make sure to include a leading space, +e.g. GIT_PERF_4221_LOG_OPTS=' -i'. Some options to try: + + -i + --invert-grep + -i --invert-grep +" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for pattern in 'int' 'uncommon' 'æ' +do + for engine in fixed basic extended perl + do + if test $engine = "perl" && ! test_have_prereq PCRE + then + prereq="PCRE" + else + prereq="" + fi + test_perf $prereq "$engine log$GIT_PERF_4221_LOG_OPTS --grep='$pattern'" " + git -c grep.patternType=$engine log --pretty=format:%h$GIT_PERF_4221_LOG_OPTS --grep='$pattern' >'out.$engine' || : + " + done + + test_expect_success "assert that all engines found the same for$GIT_PERF_4221_LOG_OPTS '$pattern'" ' + test_cmp out.fixed out.basic && + test_cmp out.fixed out.extended && + if test_have_prereq PCRE + then + test_cmp out.fixed out.perl + fi + ' +done + +test_done -- 2.13.0.303.g4ebf302169
[PATCH v4 19/31] perf: add a comparison test of log --grep regex engines
Add a very basic performance comparison test comparing the POSIX basic, extended and perl engines with patterns matching log messages via --grep=. $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p4220-log-grep-engines.sh [...] Test this tree - 4220.1: basic log --grep='how.to' 6.22(6.00+0.21) 4220.2: extended log --grep='how.to' 6.23(5.98+0.23) 4220.3: perl log --grep='how.to' 6.07(5.79+0.25) 4220.5: basic log --grep='^how to'6.19(5.93+0.22) 4220.6: extended log --grep='^how to' 6.19(5.93+0.23) 4220.7: perl log --grep='^how to' 6.14(5.88+0.24) 4220.9: basic log --grep='[how] to' 6.96(6.65+0.28) 4220.10: extended log --grep='[how] to' 6.96(6.69+0.24) 4220.11: perl log --grep='[how] to' 6.95(6.58+0.33) 4220.13: basic log --grep='\(e.t[^ ]*\|v.ry\) rare' 7.10(6.80+0.27) 4220.14: extended log --grep='(e.t[^ ]*|v.ry) rare' 7.07(6.80+0.26) 4220.15: perl log --grep='(e.t[^ ]*|v.ry) rare' 7.70(7.46+0.22) 4220.17: basic log --grep='m\(ú\|u\)lt.b\(æ\|y\)te' 6.12(5.87+0.24) 4220.18: extended log --grep='m(ú|u)lt.b(æ|y)te' 6.14(5.84+0.26) 4220.19: perl log --grep='m(ú|u)lt.b(æ|y)te' 6.16(5.93+0.20) With -i: $ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_4220_LOG_OPTS=' -i' ./run p4220-log-grep-engines.sh [...] Test this tree 4220.1: basic log -i --grep='how.to' 6.74(6.41+0.32) 4220.2: extended log -i --grep='how.to' 6.78(6.55+0.22) 4220.3: perl log -i --grep='how.to' 6.06(5.77+0.28) 4220.5: basic log -i --grep='^how to'6.80(6.57+0.22) 4220.6: extended log -i --grep='^how to' 6.83(6.52+0.29) 4220.7: perl log -i --grep='^how to' 6.16(5.94+0.20) 4220.9: basic log -i --grep='[how] to' 7.87(7.61+0.24) 4220.10: extended log -i --grep='[how] to' 7.85(7.57+0.27) 4220.11: perl log -i --grep='[how] to' 7.03(6.75+0.25) 4220.13: basic log -i --grep='\(e.t[^ ]*\|v.ry\) rare' 8.68(8.41+0.25) 4220.14: extended log -i --grep='(e.t[^ ]*|v.ry) rare' 8.80(8.44+0.28) 4220.15: perl log -i --grep='(e.t[^ ]*|v.ry) rare' 7.85(7.56+0.26) 4220.17: basic log -i --grep='m\(ú\|u\)lt.b\(æ\|y\)te' 6.94(6.68+0.24) 4220.18: extended log -i --grep='m(ú|u)lt.b(æ|y)te' 7.04(6.76+0.24) 4220.19: perl log -i --grep='m(ú|u)lt.b(æ|y)te' 6.26(5.92+0.29) See commit ("perf: add a comparison test of grep regex engines", 2017-04-19) for details on the machine the above test run was executed on. Before commit ("log: make --regexp-ignore-case work with --perl-regexp", 2017-05-20) this test will almost definitely fail (depending on the repo) if passed the -i option, since it wasn't properly supported under PCRE. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/perf/p4220-log-grep-engines.sh | 53 1 file changed, 53 insertions(+) create mode 100755 t/perf/p4220-log-grep-engines.sh diff --git a/t/perf/p4220-log-grep-engines.sh b/t/perf/p4220-log-grep-engines.sh new file mode 100755 index 00..2bc47ded4d --- /dev/null +++ b/t/perf/p4220-log-grep-engines.sh @@ -0,0 +1,53 @@ +#!/bin/sh + +test_description="Comparison of git-log's --grep regex engines + +Set GIT_PERF_4220_LOG_OPTS in the environment to pass options to +git-grep. Make sure to include a leading space, +e.g. GIT_PERF_4220_LOG_OPTS=' -i'. Some options to try: + + -i + --invert-grep + -i --invert-grep +" + +. ./perf-lib.sh + +test_perf_large_repo +test_checkout_worktree + +for pattern in \ + 'how.to' \ + '^how to' \ + '[how] to' \ + '\(e.t[^ ]*\|v.ry\) rare' \ + 'm\(ú\|u\)lt.b\(æ\|y\)te' +do + for engine in basic extended perl + do + if test $engine != "basic" + then + # Poor man's basic -> extended converter. + pattern=$(echo $pattern | sed 's/\\//g') + fi + if test $engine = "perl" && ! test_have_prereq PCRE + then + prereq="PCRE" + else + prereq="" + fi + test_perf $prereq "$engine log$GIT_PERF_4220_LOG_OPTS --grep='$pattern'" " + git -c grep.patternType=$engine log --pretty=format:%h$GIT_PERF_4220_LOG_OPTS --grep='$pattern' >'out.$engine' || : + " + done + +
[PATCH v4 12/31] grep: add a test helper function for less verbose -f \0 tests
Add a helper function to make the tests which check for patterns with \0 in them more succinct. Right now this isn't a big win, but subsequent commits will add a lot more of these tests. The helper is based on the match() function in t3070-wildmatch.sh. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7008-grep-binary.sh | 58 +- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 9c9c378119..df93d8e44c 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -4,6 +4,29 @@ test_description='git grep in binary files' . ./test-lib.sh +nul_match () { + matches=$1 + flags=$2 + pattern=$3 + pattern_human=$(echo "$pattern" | sed 's/Q//g') + + if test "$matches" = 1 + then + test_expect_success "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + git grep -f f $flags a + " + elif test "$matches" = 0 + then + test_expect_success "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + test_must_fail git grep -f f $flags a + " + else + test_expect_success "PANIC: Test framework error. Unknown matches value $matches" 'false' + fi +} + test_expect_success 'setup' " echo 'binaryQfile' | q_to_nul >a && git add a && @@ -69,35 +92,12 @@ test_expect_failure 'git grep .fi a' ' git grep .fi a ' -test_expect_success 'git grep -F yf a' " - printf 'yQf' | q_to_nul >f && - git grep -f f -F a -" - -test_expect_success 'git grep -F yx a' " - printf 'yQx' | q_to_nul >f && - test_must_fail git grep -f f -F a -" - -test_expect_success 'git grep -Fi Yf a' " - printf 'YQf' | q_to_nul >f && - git grep -f f -Fi a -" - -test_expect_success 'git grep -Fi Yx a' " - printf 'YQx' | q_to_nul >f && - test_must_fail git grep -f f -Fi a -" - -test_expect_success 'git grep yf a' " - printf 'yQf' | q_to_nul >f && - git grep -f f a -" - -test_expect_success 'git grep yx a' " - printf 'yQx' | q_to_nul >f && - test_must_fail git grep -f f a -" +nul_match 1 '-F' 'yQf' +nul_match 0 '-F' 'yQx' +nul_match 1 '-Fi' 'YQf' +nul_match 0 '-Fi' 'YQx' +nul_match 1 '' 'yQf' +nul_match 0 '' 'yQx' test_expect_success 'grep respects binary diff attribute' ' echo text >t && -- 2.13.0.303.g4ebf302169
[PATCH v4 13/31] grep: prepare for testing binary regexes containing rx metacharacters
Add setup code needed for testing regexes that contain both binary data and regex metacharacters. The POSIX regcomp() function inherently can't support that, because it takes a \0-delimited char *, but other regex engines APIs like PCRE v2 take a pattern/length pair, and are thus able to handle \0s in patterns as well as any other character. When kwset was imported in commit 9eceddeec6 ("Use kwset in grep", 2011-08-21) this limitation was fixed, but at the expense of introducing the undocumented limitation that any pattern containing \0 implicitly becomes a fixed match (equivalent to -F having been provided). That's not something we'd like to keep in the future. The inability to match patterns containing \0 is a leaky implementation detail. So add tests as a first step towards changing that. In order to test that \0-patterns can properly match as regexes the test string needs to have some regex metacharacters in it. There were other blind spots in the tests. The code around kwset specially handles case-insensitive & non-ASCII data, but there were no tests for this. Fix all of that by amending the text being matched to contain both regex metacharacters & non-ASCII data. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7008-grep-binary.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index df93d8e44c..20370d6e0c 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -28,7 +28,7 @@ nul_match () { } test_expect_success 'setup' " - echo 'binaryQfile' | q_to_nul >a && + echo 'binaryQfileQm[*]cQ*æQð' | q_to_nul >a && git add a && git commit -m. " @@ -162,7 +162,7 @@ test_expect_success 'grep does not honor textconv' ' ' test_expect_success 'grep --textconv honors textconv' ' - echo "a:binaryQfile" >expect && + echo "a:binaryQfileQm[*]cQ*æQð" >expect && git grep --textconv Qfile >actual && test_cmp expect actual ' @@ -172,7 +172,7 @@ test_expect_success 'grep --no-textconv does not honor textconv' ' ' test_expect_success 'grep --textconv blob honors textconv' ' - echo "HEAD:a:binaryQfile" >expect && + echo "HEAD:a:binaryQfileQm[*]cQ*æQð" >expect && git grep --textconv Qfile HEAD:a >actual && test_cmp expect actual ' -- 2.13.0.303.g4ebf302169
[PATCH v4 14/31] grep: add tests to fix blind spots with \0 patterns
Address a big blind spot in the tests for patterns containing \0. The is_fixed() function considers any string that contains \0 fixed, even if it contains regular expression metacharacters, those patterns are currently matched with kwset. Before this change removing that memchr(s, 0, len) check from is_fixed() wouldn't change the result of any of the tests, since regcomp() will happily match the part before the \0. The kwset path is dependent on whether the the -i flag is on, and whether the pattern has any non-ASCII characters, but none of this was tested for. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7008-grep-binary.sh | 71 ++ 1 file changed, 71 insertions(+) diff --git a/t/t7008-grep-binary.sh b/t/t7008-grep-binary.sh index 20370d6e0c..615e7e0162 100755 --- a/t/t7008-grep-binary.sh +++ b/t/t7008-grep-binary.sh @@ -22,6 +22,18 @@ nul_match () { printf '$pattern' | q_to_nul >f && test_must_fail git grep -f f $flags a " + elif test "$matches" = T1 + then + test_expect_failure "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + git grep -f f $flags a + " + elif test "$matches" = T0 + then + test_expect_failure "git grep -f f $flags '$pattern_human' a" " + printf '$pattern' | q_to_nul >f && + test_must_fail git grep -f f $flags a + " else test_expect_success "PANIC: Test framework error. Unknown matches value $matches" 'false' fi @@ -98,6 +110,65 @@ nul_match 1 '-Fi' 'YQf' nul_match 0 '-Fi' 'YQx' nul_match 1 '' 'yQf' nul_match 0 '' 'yQx' +nul_match 1 '' 'æQð' +nul_match 1 '-F' 'eQm[*]c' +nul_match 1 '-Fi' 'EQM[*]C' + +# Regex patterns that would match but shouldn't with -F +nul_match 0 '-F' 'yQ[f]' +nul_match 0 '-F' '[y]Qf' +nul_match 0 '-Fi' 'YQ[F]' +nul_match 0 '-Fi' '[Y]QF' +nul_match 0 '-F' 'æQ[ð]' +nul_match 0 '-F' '[æ]Qð' +nul_match 0 '-Fi' 'ÆQ[Ð]' +nul_match 0 '-Fi' '[Æ]QÐ' + +# kwset is disabled on -i & non-ASCII. No way to match non-ASCII \0 +# patterns case-insensitively. +nul_match T1 '-i' 'ÆQÐ' + +# \0 implicitly disables regexes. This is an undocumented internal +# limitation. +nul_match T1 '' 'yQ[f]' +nul_match T1 '' '[y]Qf' +nul_match T1 '-i' 'YQ[F]' +nul_match T1 '-i' '[Y]Qf' +nul_match T1 '' 'æQ[ð]' +nul_match T1 '' '[æ]Qð' +nul_match T1 '-i' 'ÆQ[Ð]' + +# ... because of \0 implicitly disabling regexes regexes that +# should/shouldn't match don't do the right thing. +nul_match T1 '' 'eQm.*cQ' +nul_match T1 '-i' 'EQM.*cQ' +nul_match T0 '' 'eQm[*]c' +nul_match T0 '-i' 'EQM[*]C' + +# Due to the REG_STARTEND extension when kwset() is disabled on -i & +# non-ASCII the string will be matched in its entirety, but the +# pattern will be cut off at the first \0. +nul_match 0 '-i' 'NOMATCHQð' +nul_match T0 '-i' '[Æ]QNOMATCH' +nul_match T0 '-i' '[æ]QNOMATCH' +# Matches, but for the wrong reasons, just stops at [æ] +nul_match 1 '-i' '[Æ]Qð' +nul_match 1 '-i' '[æ]Qð' + +# Ensure that the matcher doesn't regress to something that stops at +# \0 +nul_match 0 '-F' 'yQ[f]' +nul_match 0 '-Fi' 'YQ[F]' +nul_match 0 '' 'yQNOMATCH' +nul_match 0 '' 'QNOMATCH' +nul_match 0 '-i' 'YQNOMATCH' +nul_match 0 '-i' 'QNOMATCH' +nul_match 0 '-F' 'æQ[ð]' +nul_match 0 '-Fi' 'ÆQ[Ð]' +nul_match 0 '' 'yQNÓMATCH' +nul_match 0 '' 'QNÓMATCH' +nul_match 0 '-i' 'YQNÓMATCH' +nul_match 0 '-i' 'QNÓMATCH' test_expect_success 'grep respects binary diff attribute' ' echo text >t && -- 2.13.0.303.g4ebf302169
[PATCH v4 05/31] log: make --regexp-ignore-case work with --perl-regexp
Make the --regexp-ignore-case option work with --perl-regexp. This never worked, and there was no test for this. Fix the bug and add a test. When PCRE support was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09) compile_pcre_regexp() would only check opt->ignore_case, but when the --perl-regexp option was added in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03) the code didn't set the opt->ignore_case. Change the test suite to test for -i and --invert-regexp with basic/extended/perl patterns in addition to fixed, which was the only patternType that was tested for before in combination with those options. Signed-off-by: Ævar Arnfjörð Bjarmason --- revision.c | 1 + t/t4202-log.sh | 60 +- 2 files changed, 56 insertions(+), 5 deletions(-) diff --git a/revision.c b/revision.c index 8a8c1789c7..4883cdd2d0 100644 --- a/revision.c +++ b/revision.c @@ -1991,6 +1991,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_ERE; } else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) { + revs->grep_filter.ignore_case = 1; revs->grep_filter.regflags |= REG_ICASE; DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE); } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) { diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 948fd719d2..c25eb9afd1 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -231,14 +231,47 @@ second initial EOF test_expect_success 'log --invert-grep --grep' ' - git log --pretty="tformat:%s" --invert-grep --grep=th --grep=Sec >actual && - test_cmp expect actual + # Fixed + git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep --grep=th --grep=Sec >actual && + test_cmp expect actual && + + # POSIX basic + git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual && + + # POSIX extended + git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual && + + # PCRE + if test_have_prereq PCRE + then + git -c grep.patternType=perl log --pretty="tformat:%s" --invert-grep --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual + fi ' test_expect_success 'log --invert-grep --grep -i' ' echo initial >expect && - git log --pretty="tformat:%s" --invert-grep -i --grep=th --grep=Sec >actual && - test_cmp expect actual + + # Fixed + git -c grep.patternType=fixed log --pretty="tformat:%s" --invert-grep -i --grep=th --grep=Sec >actual && + test_cmp expect actual && + + # POSIX basic + git -c grep.patternType=basic log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual && + + # POSIX extended + git -c grep.patternType=extended log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual && + + # PCRE + if test_have_prereq PCRE + then + git -c grep.patternType=perl log --pretty="tformat:%s" --invert-grep -i --grep=t[h] --grep=S[e]c >actual && + test_cmp expect actual + fi ' test_expect_success 'log --grep option parsing' ' @@ -256,8 +289,25 @@ test_expect_success 'log -i --grep' ' test_expect_success 'log --grep -i' ' echo Second >expect && + + # Fixed git log -1 --pretty="tformat:%s" --grep=sec -i >actual && - test_cmp expect actual + test_cmp expect actual && + + # POSIX basic + git -c grep.patternType=basic log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual && + test_cmp expect actual && + + # POSIX extended + git -c grep.patternType=extended log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual && + test_cmp expect actual && + + # PCRE + if test_have_prereq PCRE + then + git -c grep.patternType=perl log -1 --pretty="tformat:%s" --grep=s[e]c -i >actual && + test_cmp expect actual + fi ' test_expect_success 'log -F -E --grep= uses ere' ' -- 2.13.0.303.g4ebf302169
[PATCH v4 10/31] grep: amend submodule recursion test for regex engine testing
Amend the submodule recursion test to prepare it for subsequent tests of whether it passes along the grep.patternType to the submodule greps. This is the result of searching & replacing: foobar -> (1|2)d(3|4) foo-> (1|2) bar-> (3|4) Currently there's no tests for whether e.g. -P or -E is correctly passed along, tests for that will be added in a follow-up change, but first add content to the tests which will match differently under different regex engines. Reuse the pattern established in an earlier commit of mine in this series ("log: add exhaustive tests for pattern style options & config", 2017-04-07). The pattern "(.|.)[\d]" will match this content differently under fixed/basic/extended & perl. This test code was originally added in commit 0281e487fd ("grep: optionally recurse into submodules", 2016-12-16). Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7814-grep-recurse-submodules.sh | 166 ++--- 1 file changed, 83 insertions(+), 83 deletions(-) diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh index 5b6eb3a65e..1472855e1d 100755 --- a/t/t7814-grep-recurse-submodules.sh +++ b/t/t7814-grep-recurse-submodules.sh @@ -9,13 +9,13 @@ submodules. . ./test-lib.sh test_expect_success 'setup directory structure and submodule' ' - echo "foobar" >a && + echo "(1|2)d(3|4)" >a && mkdir b && - echo "bar" >b/b && + echo "(3|4)" >b/b && git add a b && git commit -m "add a and b" && git init submodule && - echo "foobar" >submodule/a && + echo "(1|2)d(3|4)" >submodule/a && git -C submodule add a && git -C submodule commit -m "add a" && git submodule add ./submodule && @@ -24,18 +24,18 @@ test_expect_success 'setup directory structure and submodule' ' test_expect_success 'grep correctly finds patterns in a submodule' ' cat >expect <<-\EOF && - a:foobar - b/b:bar - submodule/a:foobar + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules >actual && + git grep -e "(3|4)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and basic pathspecs' ' cat >expect <<-\EOF && - submodule/a:foobar + submodule/a:(1|2)d(3|4) EOF git grep -e. --recurse-submodules -- submodule >actual && @@ -44,7 +44,7 @@ test_expect_success 'grep and basic pathspecs' ' test_expect_success 'grep and nested submodules' ' git init submodule/sub && - echo "foobar" >submodule/sub/a && + echo "(1|2)d(3|4)" >submodule/sub/a && git -C submodule/sub add a && git -C submodule/sub commit -m "add a" && git -C submodule submodule add ./sub && @@ -54,117 +54,117 @@ test_expect_success 'grep and nested submodules' ' git commit -m "updated submodule" && cat >expect <<-\EOF && - a:foobar - b/b:bar - submodule/a:foobar - submodule/sub/a:foobar + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules >actual && + git grep -e "(3|4)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and multiple patterns' ' cat >expect <<-\EOF && - a:foobar - submodule/a:foobar - submodule/sub/a:foobar + a:(1|2)d(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --and -e "foo" --recurse-submodules >actual && + git grep -e "(3|4)" --and -e "(1|2)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'grep and multiple patterns' ' cat >expect <<-\EOF && - b/b:bar + b/b:(3|4) EOF - git grep -e "bar" --and --not -e "foo" --recurse-submodules >actual && + git grep -e "(3|4)" --and --not -e "(1|2)" --recurse-submodules >actual && test_cmp expect actual ' test_expect_success 'basic grep tree' ' cat >expect <<-\EOF && - HEAD:a:foobar - HEAD:b/b:bar - HEAD:submodule/a:foobar - HEAD:submodule/sub/a:foobar + HEAD:a:(1|2)d(3|4) + HEAD:b/b:(3|4) + HEAD:submodule/a:(1|2)d(3|4) + HEAD:submodule/sub/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules HEAD >actual && + git grep -e "(3|4)" --recurse-submodules HEAD >actual && test_cmp expect actual ' test_expect_success 'grep tree HEAD^' ' cat >expect <<-\EOF && - HEAD^:a:foobar - HEAD^:b/b:bar - HEAD^:submodule/a:foobar + HEAD^:a:(1|2)d(3|4) + HEAD^:b/b:(3|4) + HEAD^:submodule/a:(1|2)d(3|4) EOF - git grep -e "bar" --recurse-submodules HEAD^ >actual && + git grep -e "(3|4)" --recurs
[PATCH v4 07/31] grep: add a test for backreferences in PCRE patterns
Add a test for backreferences such as (.)\1 in PCRE patterns. This test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned on. Before this change turning it on would break these sort of patterns, but wouldn't break any tests. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7810-grep.sh | 7 +++ 1 file changed, 7 insertions(+) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 8d69113695..daa906b9b0 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -1114,6 +1114,13 @@ test_expect_success PCRE 'grep -P -w pattern' ' test_cmp expected actual ' +test_expect_success PCRE 'grep -P backreferences work (the PCRE NO_AUTO_CAPTURE flag is not set)' ' + git grep -P -h "(?P.)(?P=one)" hello_world >actual && + test_cmp hello_world actual && + git grep -P -h "(.)\1" hello_world >actual && + test_cmp hello_world actual +' + test_expect_success 'grep -G invalidpattern properly dies ' ' test_must_fail git grep -G "a[" ' -- 2.13.0.303.g4ebf302169
[PATCH v4 08/31] grep: change non-ASCII -i test to stop using --debug
Change a non-ASCII case-insensitive test case to stop using --debug, and instead simply test for the expected results. The test coverage remains the same with this change, but the test won't break due to internal refactoring. This test was added in commit 793dc676e0 ("grep/icase: avoid kwsset when -F is specified", 2016-06-25). It was asserting that the regex must be compiled with compile_fixed_regexp(), instead test for the expected results, allowing the underlying implementation to change. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7812-grep-icase-non-ascii.sh | 25 + 1 file changed, 5 insertions(+), 20 deletions(-) diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh index 04a61cb8e0..0059a1f837 100755 --- a/t/t7812-grep-icase-non-ascii.sh +++ b/t/t7812-grep-icase-non-ascii.sh @@ -36,29 +36,14 @@ test_expect_success GETTEXT_LOCALE,PCRE 'grep pcre utf-8 string with "+"' ' ' test_expect_success REGEX_LOCALE 'grep literal string, with -F' ' - git grep --debug -i -F "TILRAUN: Halló Heimur!" 2>&1 >/dev/null | -grep fixed >debug1 && - test_write_lines "fixed TILRAUN: Halló Heimur!" >expect1 && - test_cmp expect1 debug1 && - - git grep --debug -i -F "TILRAUN: HALLÓ HEIMUR!" 2>&1 >/dev/null | -grep fixed >debug2 && - test_write_lines "fixed TILRAUN: HALLÓ HEIMUR!" >expect2 && - test_cmp expect2 debug2 + git grep -i -F "TILRAUN: Halló Heimur!" && + git grep -i -F "TILRAUN: HALLÓ HEIMUR!" ' test_expect_success REGEX_LOCALE 'grep string with regex, with -F' ' - test_write_lines "^*TILR^AUN:.* \\Halló \$He[]imur!\$" >file && - - git grep --debug -i -F "^*TILR^AUN:.* \\Halló \$He[]imur!\$" 2>&1 >/dev/null | -grep fixed >debug1 && - test_write_lines "fixed \\^*TILR^AUN:\\.\\* Halló \$He\\[]imur!\\\$" >expect1 && - test_cmp expect1 debug1 && - - git grep --debug -i -F "^*TILR^AUN:.* \\HALLÓ \$HE[]IMUR!\$" 2>&1 >/dev/null | -grep fixed >debug2 && - test_write_lines "fixed \\^*TILR^AUN:\\.\\* HALLÓ \$HE\\[]IMUR!\\\$" >expect2 && - test_cmp expect2 debug2 + test_write_lines "TILRAUN: Halló Heimur [abc]!" >file3 && + git add file3 && + git grep -i -F "TILRAUN: Halló Heimur [abc]!" file3 ' test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' ' -- 2.13.0.303.g4ebf302169
[PATCH v4 06/31] grep: add a test asserting that --perl-regexp dies when !PCRE
Add a test asserting that when --perl-regexp (and -P for grep) is given to git-grep & git-log that we die with an error. In developing the PCRE v2 series I introduced a regression where -P would (through control-flow fall-through) become synonymous with basic POSIX matching. I.e. 'git grep -P '[\d]' would match "d" instead of digits. The entire test suite would still pass with this serious regression, since everything that tested for --perl-regexp would be guarded by the PCRE prerequisite, fix that blind-spot by adding tests under !PCRE asserting that git must die when given --perl-regexp or -P. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t4202-log.sh | 4 +++- t/t7810-grep.sh | 12 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index c25eb9afd1..c44c4337f8 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -418,7 +418,9 @@ test_expect_success 'log with various grep.patternType configurations & command- git log --pretty=tformat:%s --perl-regexp \ --grep="[\d]\|" >actual.perl.long-arg && test_cmp expect.perl actual.perl.long-arg - + else + test_must_fail git log --perl-regexp \ + --grep="[\d]\|" fi && test_cmp expect.fixed actual.fixed.long-arg && test_cmp expect.basic actual.basic.long-arg && diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index c84c4d99f9..8d69113695 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -281,6 +281,10 @@ do test_cmp expected actual ' + test_expect_success !PCRE "grep $L with grep.patterntype=perl errors without PCRE" ' + test_must_fail git -c grep.patterntype=perl grep "foo.*bar" + ' + test_expect_success "grep $L with grep.patternType=default and grep.extendedRegexp=true" ' echo "${HC}ab:abc" >expected && git \ @@ -1058,11 +1062,19 @@ test_expect_success PCRE 'grep --perl-regexp pattern' ' test_cmp expected actual ' +test_expect_success !PCRE 'grep --perl-regexp pattern errors without PCRE' ' + test_must_fail git grep --perl-regexp "foo.*bar" +' + test_expect_success PCRE 'grep -P pattern' ' git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' +test_expect_success !PCRE 'grep -P pattern errors without PCRE' ' + test_must_fail git grep -P "foo.*bar" +' + test_expect_success 'grep pattern with grep.extendedRegexp=true' ' >empty && test_must_fail git -c grep.extendedregexp=true \ -- 2.13.0.303.g4ebf302169
[PATCH v4 09/31] grep: add tests for --threads=N and grep.threads
Add tests for --threads=N being supplied on the command-line, or when grep.threads=N being supplied in the configuration. When the threading support was made run-time configurable in commit 89f09dd34e ("grep: add --threads= option and grep.threads configuration", 2015-12-15) no tests were added for it. In developing a change to the grep code I was able to make '--threads=1 ` segfault, while the test suite still passed. This change fixes that blind spot in the tests. In addition to asserting that asking for N threads shouldn't segfault, test that the grep output given any N is the same. The choice to test only 1..10 as opposed to 1..8 or 1..16 or whatever is arbitrary. Testing 1..1024 works locally for me (but gets noticeably slower as more threads are spawned). Given the structure of the code there's no reason to test an arbitrary number of threads, only 0, 1 and >=2 are special modes of operation. A later patch introduces a PTHREADS test prerequisite which is true under NO_PTHREADS=UnfortunatelyYes, but even under NO_PTHREADS it's fine to test --threads=N, we'll just ignore it and not use threading. So these tests also make sense under that mode to assert that --threads=N without pthreads still returns expected results. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7810-grep.sh | 16 1 file changed, 16 insertions(+) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index daa906b9b0..561709ef6a 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -775,6 +775,22 @@ test_expect_success 'grep -W with userdiff' ' test_cmp expected actual ' +for threads in $(test_seq 0 10) +do + test_expect_success "grep --threads=$threads & -c grep.threads=$threads" " + git grep --threads=$threads . >actual.$threads && + if test $threads -ge 1 + then + test_cmp actual.\$(($threads - 1)) actual.$threads + fi && + git -c grep.threads=$threads grep . >actual.$threads && + if test $threads -ge 1 + then + test_cmp actual.\$(($threads - 1)) actual.$threads + fi + " +done + test_expect_success 'grep from a subdirectory to search wider area (1)' ' mkdir -p s && ( -- 2.13.0.303.g4ebf302169
[PATCH v4 11/31] grep: add tests for grep pattern types being passed to submodules
Add testing for grep pattern types being correctly passed to submodules. The pattern "(.|.)[\d]" matches differently under fixed (not at all), and then matches different lines under basic/extended & perl regular expressions, so this change asserts that the pattern type is passed along correctly. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t7814-grep-recurse-submodules.sh | 49 ++ 1 file changed, 49 insertions(+) diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh index 1472855e1d..3a58197f47 100755 --- a/t/t7814-grep-recurse-submodules.sh +++ b/t/t7814-grep-recurse-submodules.sh @@ -313,4 +313,53 @@ test_incompatible_with_recurse_submodules () test_incompatible_with_recurse_submodules --untracked test_incompatible_with_recurse_submodules --no-index +test_expect_success 'grep --recurse-submodules should pass the pattern type along' ' + # Fixed + test_must_fail git grep -F --recurse-submodules -e "(.|.)[\d]" && + test_must_fail git -c grep.patternType=fixed grep --recurse-submodules -e "(.|.)[\d]" && + + # Basic + git grep -G --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + a:(1|2)d(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=basic grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + + # Extended + git grep -E --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + .gitmodules:[submodule "submodule"] + .gitmodules:path = submodule + .gitmodules:url = ./submodule + a:(1|2)d(3|4) + submodule/.gitmodules:[submodule "sub"] + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=extended grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + git -c grep.extendedRegexp=true grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual && + + # Perl + if test_have_prereq PCRE + then + git grep -P --recurse-submodules -e "(.|.)[\d]" >actual && + cat >expect <<-\EOF && + a:(1|2)d(3|4) + b/b:(3|4) + submodule/a:(1|2)d(3|4) + submodule/sub/a:(1|2)d(3|4) + EOF + test_cmp expect actual && + git -c grep.patternType=perl grep --recurse-submodules -e "(.|.)[\d]" >actual && + test_cmp expect actual + fi +' + test_done -- 2.13.0.303.g4ebf302169
[PATCH v4 01/31] Makefile & configure: reword inaccurate comment about PCRE
Reword an outdated & inaccurate comment which suggests that only git-grep can use PCRE. This comment was added back when PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true at the time. It hasn't been telling the full truth since git-log learned to use PCRE with --grep in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03), and more importantly is likely to get more inaccurate over time as more use is made of PCRE in other areas. Reword it to be more future-proof, and to more clearly explain that this enables user-initiated runtime behavior. Copy/pasting this so much in configure.ac is lame, these Makefile-like flags aren't even used by autoconf, just the corresponding --with[out]-* options. But copy/pasting the comments that make sense for the Makefile to configure.ac where they make less sense is the pattern everything else follows in that file. I'm not going to war against that as part of this change, just following the existing pattern. Signed-off-by: Ævar Arnfjörð Bjarmason --- Makefile | 6 -- configure.ac | 12 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index e35542e631..eedadb8056 100644 --- a/Makefile +++ b/Makefile @@ -24,8 +24,10 @@ all:: # Define NO_OPENSSL environment variable if you do not have OpenSSL. # This also implies BLK_SHA1. # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. diff --git a/configure.ac b/configure.ac index 128165529f..deeb968daa 100644 --- a/configure.ac +++ b/configure.ac @@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)]) AS_HELP_STRING([], [ARG can be prefix for openssl library and headers]), GIT_PARSE_WITH([openssl])) -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. @@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO]) GIT_CONF_SUBST([NO_OPENSSL]) # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as log and grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # if test -n "$USE_LIBPCRE"; then -- 2.13.0.303.g4ebf302169
[PATCH v4 02/31] grep & rev-list doc: stop promising libpcre for --perl-regexp
Stop promising in our grep & rev-list options documentation that we're always going to be using libpcre when given the --perl-regexp option. Instead talk about using "Perl-compatible regular expressions" and using these types of patterns using "a compile-time dependency". Saying "libpcre" means that we're talking about libpcre.so, which is always going to be v1. This change is part of an ongoing saga to add support for libpcre2, which comes with PCRE v2. In the future we might use some completely unrelated library to provide perl-compatible regular expression support. By wording the documentation differently and not promising any specific version of PCRE or even PCRE at all we have more wiggle room to change the implementation. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/git-grep.txt | 7 +-- Documentation/rev-list-options.txt | 8 ++-- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 71f32f3508..5033483db4 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -161,8 +161,11 @@ OPTIONS -P:: --perl-regexp:: - Use Perl-compatible regexp for patterns. Requires libpcre to be - compiled in. + Use Perl-compatible regular expressions for patterns. ++ +Support for these types of regular expressions is an optional +compile-time dependency. If Git wasn't compiled with support for them +providing this option will cause it to die. -F:: --fixed-strings:: diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a02f7324c0..a46f70c2b1 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -92,8 +92,12 @@ endif::git-rev-list[] pattern as a regular expression). --perl-regexp:: - Consider the limiting patterns to be Perl-compatible regular expressions. - Requires libpcre to be compiled in. + Consider the limiting patterns to be Perl-compatible regular + expressions. ++ +Support for these types of regular expressions is an optional +compile-time dependency. If Git wasn't compiled with support for them +providing this option will cause it to die. --remove-empty:: Stop when a given path disappears from the tree. -- 2.13.0.303.g4ebf302169
[PATCH v4 04/31] log: add exhaustive tests for pattern style options & config
Add exhaustive tests for how the different grep.patternType options & the corresponding command-line options affect git-log. Before this change it was possible to patch revision.c so that the --basic-regexp option was synonymous with --extended-regexp, and --perl-regexp wasn't recognized at all, and still have 100% of the test suite pass. This was because the first test being modified here, added in commit 34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as "git grep"", 2012-10-03), didn't actually check whether we'd enabled extended regular expressions as distinct from re-toggling non-fixed string support. Fix that by changing the pattern to a pattern that'll only match if --extended-regexp option is provided, but won't match under the default --basic-regexp option. Other potential regressions were possible since there were no tests for the rest of the combinations of grep.patternType configuration toggles & corresponding git-log command-line options. Add exhaustive tests for those. The patterns being passed to fixed/basic/extended/PCRE are carefully crafted to return the wrong thing if the grep engine were to pick any other matching method than the one it's told to use. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/t4202-log.sh | 98 +- 1 file changed, 97 insertions(+), 1 deletion(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 1c7d6729c6..948fd719d2 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -262,7 +262,30 @@ test_expect_success 'log --grep -i' ' test_expect_success 'log -F -E --grep= uses ere' ' echo second >expect && - git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual && + # basic would need \(s\) to do the same + git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual && + test_cmp expect actual +' + +test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' + test_when_finished "rm -rf num_commits" && + git init num_commits && + ( + cd num_commits && + test_commit 1d && + test_commit 2e + ) && + + # In PCRE \d in [\d] is like saying "0-9", and matches the 2 + # in 2e... + echo 2e >expect && + git -C num_commits log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="[\d]" >actual && + test_cmp expect actual && + + # ...in POSIX basic and extended it is the same as [d], + # i.e. "d", which matches 1d, but does not match 2e. + echo 1d >expect && + git -C num_commits log -1 --pretty="tformat:%s" -F -E --grep="[\d]" >actual && test_cmp expect actual ' @@ -280,6 +303,79 @@ test_expect_success 'log with grep.patternType configuration and command line' ' test_cmp expect actual ' +test_expect_success 'log with various grep.patternType configurations & command-lines' ' + git init pattern-type && + ( + cd pattern-type && + test_commit 1 file A && + + # The tagname is overridden here because creating a + # tag called "(1|2)" as test_commit would otherwise + # implicitly do would fail on e.g. MINGW. + test_commit "(1|2)" file B 2 && + + echo "(1|2)" >expect.fixed && + cp expect.fixed expect.basic && + cp expect.fixed expect.extended && + cp expect.fixed expect.perl && + + # A strcmp-like match with fixed. + git -c grep.patternType=fixed log --pretty=tformat:%s \ + --grep="(1|2)" >actual.fixed && + + # POSIX basic matches (, | and ) literally. + git -c grep.patternType=basic log --pretty=tformat:%s \ + --grep="(.|.)" >actual.basic && + + # POSIX extended needs to have | escaped to match it + # literally, whereas under basic this is the same as + # (|2), i.e. it would also match "1". This test checks + # for extended by asserting that it is not matching + # what basic would match. + git -c grep.patternType=extended log --pretty=tformat:%s \ + --grep="\|2" >actual.extended && + if test_have_prereq PCRE + then + # Only PCRE would match [\d]\| with only + # "(1|2)" due to [\d]. POSIX basic would match + # both it and "1" since similarly to the + # extended match above it is the same as + # \([\d]\|\). POSIX extended would + # match neither. + git -c grep.patternType=perl log --pretty=tformat:%s \ + --grep="[\d]\|" >actual.perl && + test_cmp expect.perl actual.perl + fi && + test_cmp expect.fixed actual.
[PATCH v4 03/31] test-lib: rename the LIBPCRE prerequisite to PCRE
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for libpcre2 support, where having just "LIBPCRE" would be confusing as it implies v1 of the library. None of these tests are incompatible between versions 1 & 2 of libpcre, it's less confusing to give them a more general name to make it clear that they work on both library versions. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/README| 4 ++-- t/t7810-grep.sh | 28 ++-- t/t7812-grep-icase-non-ascii.sh | 4 ++-- t/t7813-grep-icase-iso.sh | 2 +- t/test-lib.sh | 2 +- 5 files changed, 20 insertions(+), 20 deletions(-) diff --git a/t/README b/t/README index ab386c3681..a90cb62583 100644 --- a/t/README +++ b/t/README @@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own. Test is not run by root user, and an attempt to write to an unwritable file is expected to fail correctly. - - LIBPCRE + - PCRE - Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests + Git was compiled with support for PCRE. Wrap any tests that use git-grep --perl-regexp or git-grep -P in these. - CASE_INSENSITIVE_FS diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index cee42097b0..c84c4d99f9 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -275,7 +275,7 @@ do test_cmp expected actual ' - test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" ' + test_expect_success PCRE "grep $L with grep.patterntype=perl" ' echo "${HC}ab:a+b*c" >expected && git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab >actual && test_cmp expected actual @@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv) hello.c: printf("Hello world.\n"); EOF -test_expect_success LIBPCRE 'grep --perl-regexp pattern' ' +test_expect_success PCRE 'grep --perl-regexp pattern' ' git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P pattern' ' +test_expect_success PCRE 'grep -P pattern' ' git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' @@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with grep.extendedRegexp=true' ' test_cmp empty actual ' -test_expect_success LIBPCRE 'grep -P pattern with grep.extendedRegexp=true' ' +test_expect_success PCRE 'grep -P pattern with grep.extendedRegexp=true' ' git -c grep.extendedregexp=true \ grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -v pattern' ' +test_expect_success PCRE 'grep -P -v pattern' ' { echo "ab:a+b*c" echo "ab:a+bc" @@ -1085,7 +1085,7 @@ test_expect_success LIBPCRE 'grep -P -v pattern' ' test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -i pattern' ' +test_expect_success PCRE 'grep -P -i pattern' ' cat >expected <<-EOF && hello.c:printf("Hello world.\n"); EOF @@ -1093,7 +1093,7 @@ test_expect_success LIBPCRE 'grep -P -i pattern' ' test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P -w pattern' ' +test_expect_success PCRE 'grep -P -w pattern' ' { echo "hello_world:Hello world" echo "hello_world:HeLLo world" @@ -1118,11 +1118,11 @@ test_expect_success 'grep invalidpattern properly dies with grep.patternType=ext test_must_fail git -c grep.patterntype=extended grep "a[" ' -test_expect_success LIBPCRE 'grep -P invalidpattern properly dies ' ' +test_expect_success PCRE 'grep -P invalidpattern properly dies ' ' test_must_fail git grep -P "a[" ' -test_expect_success LIBPCRE 'grep invalidpattern properly dies with grep.patternType=perl' ' +test_expect_success PCRE 'grep invalidpattern properly dies with grep.patternType=perl' ' test_must_fail git -c grep.patterntype=perl grep "a[" ' @@ -1191,13 +1191,13 @@ test_expect_success 'grep pattern with grep.patternType=fixed, =basic, =perl, =e test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -G -F -E -P pattern' ' +test_expect_success PCRE 'grep -G -F -E -P pattern' ' echo "d0:0" >expected && git grep -G -F -E -P "[\d]" d0 >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' ' +test_expect_success PCRE 'grep pattern with grep.patternType=fixed, =basic, =extended, =perl' ' echo "d0:0" >expected && git \ -c grep.patterntype=fixed \ @@ -1208,7 +1208,7 @@ test_expect_success LIBPCRE 'grep pattern with grep.patternType=fixed, =basic, = test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P pattern with grep.patternType=fixed' ' +test_e
[PATCH v4 00/31] Easy to review grep & pre-PCRE changes
Easy to review? 29 (I mean 30, I mean 31) patches? Are you kidding me?! As noted in v1 (<20170511091829.5634-1-ava...@gmail.com>; https://public-inbox.org/git/20170511091829.5634-1-ava...@gmail.com/) these are all doc, test, refactoring etc. changes needed by the subsequent "PCRE v2, PCRE v1 JIT, log -P & fixes" series. See <20170520214233.7183-1-ava...@gmail.com> (https://public-inbox.org/git/20170520214233.7183-1-ava...@gmail.com/) v3 & notes about that version. What changed this time around? See below: Ævar Arnfjörð Bjarmason (31): Makefile & configure: reword inaccurate comment about PCRE grep & rev-list doc: stop promising libpcre for --perl-regexp test-lib: rename the LIBPCRE prerequisite to PCRE log: add exhaustive tests for pattern style options & config log: make --regexp-ignore-case work with --perl-regexp grep: add a test asserting that --perl-regexp dies when !PCRE grep: add a test for backreferences in PCRE patterns grep: change non-ASCII -i test to stop using --debug grep: add tests for --threads=N and grep.threads grep: amend submodule recursion test for regex engine testing grep: add tests for grep pattern types being passed to submodules grep: add a test helper function for less verbose -f \0 tests grep: prepare for testing binary regexes containing rx metacharacters grep: add tests to fix blind spots with \0 patterns No changes. perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do Fix trailing whitespace. perf: emit progress output when unpacking & building No changes. perf: add a comparison test of grep regex engines perf: add a comparison test of grep regex engines with -F perf: add a comparison test of log --grep regex engines perf: add a comparison test of log --grep regex engines with -F These are all improved: * Skip the PCRE test when we don't have the PCRE prerequisite, instead of erroring out. * Update outdated commit messages left over from previous submissions. * General minor nits in the code, e.g. use the same for-loop variable name in all four files which have similar code, simplify the test_cmp invocation etc. grep: catch a missing enum in switch statement grep: remove redundant regflags assignments No changes. grep: factor test for \0 in grep patterns into a function Brandon pointed out that one of the lines in this patch was longer than 79 characters. Fixed. grep: change the internal PCRE macro names to be PCRE1 grep: change internal *pcre* variable & function names to be *pcre1* grep: move is_fixed() earlier to avoid forward declaration test-lib: add a PTHREADS prerequisite pack-objects & index-pack: add test for --threads warning pack-objects: fix buggy warning about threads grep: given --threads with NO_PTHREADS=YesPlease, warn grep: assert that threading is enabled when calling grep_{lock,unlock} No changes. Documentation/git-grep.txt | 7 +- Documentation/rev-list-options.txt | 8 +- Makefile | 14 ++- builtin/grep.c | 23 +++- builtin/pack-objects.c | 4 +- configure.ac | 12 +- grep.c | 110 + grep.h | 10 +- revision.c | 1 + t/README | 8 +- t/perf/README | 17 ++- t/perf/p4220-log-grep-engines.sh | 53 t/perf/p4221-log-grep-engines-fixed.sh | 44 +++ t/perf/p7820-grep-engines.sh | 56 + t/perf/p7821-grep-engines-fixed.sh | 41 +++ t/perf/run | 13 +- t/t4202-log.sh | 160 +++- t/t5300-pack-object.sh | 36 ++ t/t7008-grep-binary.sh | 135 - t/t7810-grep.sh| 81 ++--- t/t7812-grep-icase-non-ascii.sh| 29 ++--- t/t7813-grep-icase-iso.sh | 2 +- t/t7814-grep-recurse-submodules.sh | 215 - t/test-lib.sh | 3 +- 24 files changed, 843 insertions(+), 239 deletions(-) create mode 100755 t/perf/p4220-log-grep-engines.sh create mode 100755 t/perf/p4221-log-grep-engines-fixed.sh create mode 100755 t/perf/p7820-grep-engines.sh create mode 100755 t/perf/p7821-grep-engines-fixed.sh -- 2.13.0.303.g4ebf302169
[PATCH] connect.c: fix leak in parse_one_symref_info()
If we successfully parse a symref value like "HEAD:refs/heads/master", we add the result to a string list. But because the string list is marked STRING_LIST_INIT_DUP, the string list code will make a copy of the string and add the copy. This patch fixes it by adding the entry with string_list_append_nodup(), which lets the string list take ownership of our newly allocated string. There are two alternatives that seem like they would work, but aren't the right solution. The first is to initialize the list with the "NODUP" initializer. That would avoid the copy, but then the string list would not realize that it owns the strings. When we eventually call string_list_clear(), it would not free the strings, causing a leak. The second option would be to use the normal string_list_append(), but free the local copy in our function. We can't do this because the local copy actually contains _two_ strings; the symref name and its target. We point to the target pointer via the "util" field, and its memory must last as long as the string list does. You may also wonder whether it's safe to ever free the local copy, since the target points into it. The answer is yes, because we duplicate it in annotaate_refs_with_symref_info before clearing the string list. Signed-off-by: Jeff King --- Phew. For a one-line leak fix, that sure was complicated. I doubt it matters much in practice, because servers send only a single HEAD, so we just leak one string. But I happened to notice it while looking at the unborn-HEAD thing. connect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/connect.c b/connect.c index cd21a1b6f..c72b1d115 100644 --- a/connect.c +++ b/connect.c @@ -71,7 +71,7 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i check_refname_format(target, REFNAME_ALLOW_ONELEVEL)) /* "symref=bogus:pair */ goto reject; - item = string_list_append(symref, sym); + item = string_list_append_nodup(symref, sym); item->util = target; return; reject: -- 2.13.0.496.ge44ba89db
Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch
On Thu, May 25, 2017 at 11:59:24AM -0400, Jeff King wrote: > The just-HEAD case could look like: This patch does work, in the sense that upload-pack advertises the unborn HEAD symref. But the client doesn't actually do anything with it. The capability parsing happens in get_remote_heads(), which passes the data out by adding an annotation to the "struct ref" list. But of course we have no HEAD ref to annotate. So either get_remote_heads() would have to start returning a bogus HEAD ref (with a null sha1, I guess, which all callers would have to recognize). Or clone (and probably "remote set-head -a") would have to start reaching across the transport-module boundary and asking for any symref values for "HEAD". I'm not excited about more special-casing of "HEAD", though. In theory we'd want this for other symrefs in the long run, and it would be nice if clients were ready to handle that (even if the protocol isn't quite there). I dunno. I was thinking there might be a quick tweak, but I'm wondering if this arcane case is worth the restructuring we'd have to do to support it. It only comes up when you've moved the server repo's HEAD to an unborn branch _and_ you have other refs (since otherwise we don't even send capabilities at all!). -Peff
`git svn branch` does not respect authors file
I have a git repository linked to a svn repository with git-svn, including a branch path configuration and an authorsfile for svn username -> email mapping. When running `git svn branch new_branch`, git-svn: * Creates a svn commit creating a new svn branch * Creates a local git commit linked to this svn commit The svn commit is correctly generated, but the corresponding git commit has a bad author: the author is just the svn username. Re-fetching the svn repository then properly applies the authors file, but ideally it would do it directly when creating the branch.
[PATCH v3 4/6] fsmonitor: add test cases for fsmonitor extension
Add test cases that ensure status results are correct when using the new fsmonitor extension. Test untracked, modified, and new files by ensuring the results are identical to when not using the extension. Add a test to ensure updates to the index properly mark corresponding entries in the index extension as dirty so that the status is correct after commands that modify the index but don't trigger changes in the working directory. Add a test that verifies that if the fsmonitor extension doesn't tell git about a change, it doesn't discover it on its own. This ensures git is honoring the extension and that we get the performance benefits desired. All test hooks output a marker file that is used to ensure the hook was actually used to generate the test results. Signed-off-by: Ben Peart --- t/t7519-status-fsmonitor.sh | 158 1 file changed, 158 insertions(+) create mode 100755 t/t7519-status-fsmonitor.sh diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh new file mode 100755 index 00..395db46d55 --- /dev/null +++ b/t/t7519-status-fsmonitor.sh @@ -0,0 +1,158 @@ +#!/bin/sh + +test_description='git status with file system watcher' + +. ./test-lib.sh + +clean_repo () { + git reset --hard HEAD + git clean -fd + rm -f marker +} + +dirty_repo () { + : >untracked + : >dir1/untracked + : >dir2/untracked + echo 1 >modified + echo 2 >dir1/modified + echo 3 >dir2/modified + echo 4 >new + echo 5 >dir1/new + echo 6 >dir2/new + git add new + git add dir1/new + git add dir2/new +} + +# The test query-fsmonitor hook proc will output a marker file we can use to +# ensure the hook was actually used to generate the correct results. + +test_expect_success 'setup' ' + mkdir -p .git/hooks && + write_script .git/hooks/query-fsmonitor<<-\EOF && + if [ $1 -ne 1 ] + then + echo -e "Unsupported query-fsmonitor hook version.\n" >&2 + exit 1; + fi + : >marker + printf "untracked\0" + printf "dir1/untracked\0" + printf "dir2/untracked\0" + printf "modified\0" + printf "dir1/modified\0" + printf "dir2/modified\0" + printf "new\0"" + printf "dir1/new\0" + printf "dir2/new\0" + EOF + : >tracked && + : >modified && + mkdir dir1 && + : >dir1/tracked && + : >dir1/modified && + mkdir dir2 && + : >dir2/tracked && + : >dir2/modified && + git add . && + test_tick && + git commit -m initial && + dirty_repo +' + +cat >.gitignore <<\EOF +.gitignore +expect* +output* +marker* +EOF + +# Status is well tested elsewhere so we'll just ensure that the results are +# the same when using core.fsmonitor. First call after turning on the option +# does a complete scan so need to do two calls to ensure we test the new +# codepath. + +test_expect_success 'status with core.untrackedcache true' ' + git config core.fsmonitor true && + git config core.untrackedcache true && + git -c core.fsmonitor=false -c core.untrackedcache=true status >expect && + clean_repo && + git status && + test_path_is_missing marker && + dirty_repo && + git status >output && + test_path_is_file marker && + test_i18ncmp expect output +' + + +test_expect_success 'status with core.untrackedcache false' ' + git config core.fsmonitor true && + git config core.untrackedcache false && + git -c core.fsmonitor=false -c core.untrackedcache=false status >expect && + clean_repo && + git status && + test_path_is_missing marker && + dirty_repo && + git status >output && + test_path_is_file marker && + test_i18ncmp expect output +' + +# Ensure commands that call refresh_index() to move the index back in time +# properly invalidate the fsmonitor cache + +test_expect_success 'refresh_index() invalidates fsmonitor cache' ' + git config core.fsmonitor true && + git config core.untrackedcache true && + clean_repo && + git status && + test_path_is_missing marker && + dirty_repo && + write_script .git/hooks/query-fsmonitor<<-\EOF && + :>marker + EOF + git add . && + git commit -m "to reset" && + git status && + test_path_is_file marker && + git reset HEAD~1 && + git status >output && + test_path_is_file marker && + git -c core.fsmonitor=false status >expect && + test_i18ncmp expect output +' + +# Now make sure it's actually skipping the check for modified and untracked +# files unless it is told about them. Note, after "git reset --hard HEAD" no +# extensions exist other than 'TREE' so do a "git status" to get the extension +# written before testing the results. + +test_expect_success 'status doesnt detect unreported modificat
[PATCH v3 1/6] bswap: add 64 bit endianness helper get_be64
Add a new get_be64 macro to enable 64 bit endian conversions on memory that may or may not be aligned. Signed-off-by: Ben Peart --- compat/bswap.h | 4 1 file changed, 4 insertions(+) diff --git a/compat/bswap.h b/compat/bswap.h index d47c003544..f89fe7f4b5 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -158,6 +158,7 @@ static inline uint64_t git_bswap64(uint64_t x) #define get_be16(p)ntohs(*(unsigned short *)(p)) #define get_be32(p)ntohl(*(unsigned int *)(p)) +#define get_be64(p)ntohll(*(uint64_t *)(p)) #define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0) #else @@ -170,6 +171,9 @@ static inline uint64_t git_bswap64(uint64_t x) (*((unsigned char *)(p) + 1) << 16) | \ (*((unsigned char *)(p) + 2) << 8) | \ (*((unsigned char *)(p) + 3) << 0) ) +#define get_be64(p)( \ + ((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \ + ((uint64_t)get_be32((unsigned char *)(p) + 4) << 0) #define put_be32(p, v) do { \ unsigned int __v = (v); \ *((unsigned char *)(p) + 0) = __v >> 24; \ -- 2.13.0.windows.1.9.gc201c67b71
[PATCH v3 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
This hook script integrates the new fsmonitor capabilities of git with the cross platform Watchman file watching service. To use the script: Download and install Watchman from https://facebook.github.io/watchman/ and instruct Watchman to watch your working directory for changes ('watchman watch-project /usr/src/git'). Rename the sample integration hook from query-fsmonitor.sample to query-fsmonitor. Configure git to use the extension ('git config core.fsmonitor true') and optionally turn on the untracked cache for optimal performance ('git config core.untrackedcache true'). Signed-off-by: Ben Peart Signed-off-by: Johannes Schindelin --- templates/hooks--query-fsmonitor.sample | 37 + 1 file changed, 37 insertions(+) create mode 100644 templates/hooks--query-fsmonitor.sample diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample new file mode 100644 index 00..615f3332fa --- /dev/null +++ b/templates/hooks--query-fsmonitor.sample @@ -0,0 +1,37 @@ +#!/bin/sh +# +# An example hook script to integrate Watchman +# (https://facebook.github.io/watchman/) with git to provide fast +# git status. +# +# The hook is passed a time in nanoseconds formatted as a string and +# outputs to stdout all files that have been modified since the given +# time. Paths must be relative to the root of the working tree and +# separated by a single NUL. +# +# To enable this hook, rename this file to "query-fsmonitor" + +# check the hook interface version +if [ $1 -eq 1 ] +then + # convert nanoseconds to seconds + time_t=$(($2/10)) +else + echo -e "Unsupported query-fsmonitor hook version.\nFalling back to scanning...\n" >&2 + exit 1; +fi + +# Convert unix style paths to escaped Windows style paths +case "$(uname -s)" in +MINGW*|MSYS_NT*) + GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,,g')" + ;; +*) + GIT_WORK_TREE="$PWD" + ;; +esac + +# Query Watchman for all the changes since the requested time +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t, \"fields\":[\"name\"]}]" | \ +watchman -j | \ +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));' -- 2.13.0.windows.1.9.gc201c67b71
[PATCH v3 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the query-fsmonitor index extension is used to flag the last known potentially dirty index and untracked cache entries. If git finds out some entries are 'fsmonitor-dirty', but are really unchanged (e.g. the file was changed, then reverted back), then Git will clear the marking in the extension. If git adds or updates an index entry, it is marked 'fsmonitor-dirty' to ensure it is checked for changes in the working directory. Before the 'fsmonitor-dirty' flags are used to limit the scope of the files to be checked, the query-fsmonitor hook proc is called with the time the index was last updated. The hook proc returns the list of files changed since that last updated time and the list of potentially dirty entries is updated to reflect the current state. refresh_index() and valid_cached_dir() are updated so that any entry not flagged as potentially dirty is not checked as it cannot have any changes. Signed-off-by: Ben Peart --- Makefile | 1 + builtin/update-index.c | 1 + cache.h| 5 ++ config.c | 5 ++ dir.c | 14 +++ dir.h | 2 + entry.c| 1 + environment.c | 1 + fsmonitor.c| 238 + fsmonitor.h| 9 ++ read-cache.c | 28 +- unpack-trees.c | 1 + 12 files changed, 304 insertions(+), 2 deletions(-) create mode 100644 fsmonitor.c create mode 100644 fsmonitor.h diff --git a/Makefile b/Makefile index e35542e631..488a4466cc 100644 --- a/Makefile +++ b/Makefile @@ -760,6 +760,7 @@ LIB_OBJS += ewah/ewah_rlw.o LIB_OBJS += exec_cmd.o LIB_OBJS += fetch-pack.o LIB_OBJS += fsck.o +LIB_OBJS += fsmonitor.o LIB_OBJS += gettext.o LIB_OBJS += gpg-interface.o LIB_OBJS += graph.o diff --git a/builtin/update-index.c b/builtin/update-index.c index ebfc09faa0..32fd977b43 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int mark) else active_cache[pos]->ce_flags &= ~flag; active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE; + active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY; cache_tree_invalidate_path(&the_index, path); active_cache_changed |= CE_ENTRY_CHANGED; return 0; diff --git a/cache.h b/cache.h index 188811920c..58e5abf69f 100644 --- a/cache.h +++ b/cache.h @@ -201,6 +201,7 @@ struct cache_entry { #define CE_ADDED (1 << 19) #define CE_HASHED(1 << 20) +#define CE_FSMONITOR_DIRTY (1 << 21) #define CE_WT_REMOVE (1 << 22) /* remove in work directory */ #define CE_CONFLICTED(1 << 23) @@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode) #define CACHE_TREE_CHANGED (1 << 5) #define SPLIT_INDEX_ORDERED(1 << 6) #define UNTRACKED_CHANGED (1 << 7) +#define FSMONITOR_CHANGED (1 << 8) struct split_index; struct untracked_cache; @@ -342,6 +344,8 @@ struct index_state { struct hashmap dir_hash; unsigned char sha1[20]; struct untracked_cache *untracked; + uint64_t fsmonitor_last_update; + struct ewah_bitmap *fsmonitor_dirty; }; extern struct index_state the_index; @@ -766,6 +770,7 @@ extern int precomposed_unicode; extern int protect_hfs; extern int protect_ntfs; extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env; +extern int core_fsmonitor; /* * Include broken refs in all ref iterations, which will diff --git a/config.c b/config.c index bb4d735701..1a8108636d 100644 --- a/config.c +++ b/config.c @@ -1232,6 +1232,11 @@ static int git_default_core_config(const char *var, const char *value) return 0; } + if (!strcmp(var, "core.fsmonitor")) { + core_fsmonitor = git_config_bool(var, value); + return 0; + } + /* Add other config variables here and to Documentation/config.txt. */ return 0; } diff --git a/dir.c b/dir.c index 1b5558fdf9..37f1c580a5 100644 --- a/dir.c +++ b/dir.c @@ -16,6 +16,7 @@ #include "utf8.h" #include "varint.h" #include "ewah/ewok.h" +#include "fsmonitor.h" /* * Tells read_directory_recursive how a file or directory should be treated. @@ -1652,6 +1653,18 @@ static int valid_cached_dir(struct dir_struct *dir, if (!untracked) return 0; + refresh_by_fsmonitor(&the_index); + if (dir->untracked->use_fsmonitor) { + /* +* With fsmonitor, we can trust the untracked cache's +* valid field. +*/ + if (untracked->valid) + goto skip_stat; + else + invalidate_directory(dir->untracked, untracked); + } + if (stat(path->len ? path->buf : ".
[PATCH v3 5/6] fsmonitor: add documentation for the fsmonitor extension.
This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension. Signed-off-by: Ben Peart --- Documentation/config.txt | 7 +++ Documentation/githooks.txt | 23 +++ Documentation/technical/index-format.txt | 19 +++ 3 files changed, 49 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index e0b9fd0bc3..5211388167 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -389,6 +389,13 @@ core.protectNTFS:: 8.3 "short" names. Defaults to `true` on Windows, and `false` elsewhere. +core.fsmonitor:: + If set to true, call the query-fsmonitor hook proc which will + identify all files that may have had changes since the last + request. This information is used to speed up operations like + 'git commit' and 'git status' by limiting what git must scan to + detect changes. + core.trustctime:: If false, the ctime differences between the index and the working tree are ignored; useful when the inode change time diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 706091a569..48127e8729 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that they were processed by rebase. +[[query-fsmonitor]] +query-fsmonitor +~~~ + +This hook is invoked when the configuration option core.fsmonitor is +set and git needs to identify changed or untracked files. It takes +two arguments, a version (currently 1) and the time in elapsed +nanoseconds since midnight, January 1, 1970. + +The hook should output to stdout the list of all files in the working +directory that may have changed since the requested time. The logic +should be inclusive so that it does not miss any potential changes. +The paths should be relative to the root of the working directory +and be separated by a single NUL. + +Git will limit what files it checks for changes as well as which +directories are checked for untracked files based on the path names +given. + +The exit status determines whether git will use the data from the +hook to limit its search. On error, it will fall back to verifying +all files and folders. + GIT --- Part of the linkgit:git[1] suite diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index ade0b0c445..7aeeea6f94 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -295,3 +295,22 @@ The remaining data of each directory block is grouped by type: in the previous ewah bitmap. - One NUL. + +== File System Monitor cache + + The file system monitor cache tracks files for which the query-fsmonitor + hook has told us about changes. The signature for this extension is + { 'F', 'S', 'M', 'N' }. + + The extension starts with + + - 32-bit version number: the current supported version is 1. + + - 64-bit time: the extension data reflects all changes through the given + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. + + - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry +is CE_FSMONITOR_DIRTY. -- 2.13.0.windows.1.9.gc201c67b71
[PATCH v3 2/6] dir: make lookup_untracked() available outside of dir.c
Remove the static qualifier from lookup_untracked() and make it available to other modules by exporting it from dir.h. This will be used later when we need to find entries to mark 'fsmonitor dirty.' Signed-off-by: Ben Peart --- dir.c | 2 +- dir.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/dir.c b/dir.c index f451bfa48c..1b5558fdf9 100644 --- a/dir.c +++ b/dir.c @@ -660,7 +660,7 @@ static void trim_trailing_spaces(char *buf) * * If "name" has the trailing slash, it'll be excluded in the search. */ -static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, +struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, struct untracked_cache_dir *dir, const char *name, int len) { diff --git a/dir.h b/dir.h index bf23a470af..9e387551bd 100644 --- a/dir.h +++ b/dir.h @@ -339,4 +339,7 @@ extern void connect_work_tree_and_git_dir(const char *work_tree, const char *git extern void relocate_gitdir(const char *path, const char *old_git_dir, const char *new_git_dir); +struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, +struct untracked_cache_dir *dir, +const char *name, int len); #endif -- 2.13.0.windows.1.9.gc201c67b71
[PATCH v3 0/6] Fast git status via a file system watcher
Changes from V2 include: - switch to nanoseconds for last update time saved in index extension and passed to hook - pass the hook a version to enable simpler future updates - fixup compiler warnings found with different compilers - update test to run properly on Mac - fix documentation formatting and spelling errors - update code formatting based on feedback - rename fsmonitor_dirty_bitmap to fsmonitor_dirty Ben Peart (6): bswap: add 64 bit endianness helper get_be64 dir: make lookup_untracked() available outside of dir.c fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files. fsmonitor: add test cases for fsmonitor extension fsmonitor: add documentation for the fsmonitor extension. fsmonitor: add a sample query-fsmonitor hook script for Watchman Documentation/config.txt | 7 + Documentation/githooks.txt | 23 +++ Documentation/technical/index-format.txt | 19 +++ Makefile | 1 + builtin/update-index.c | 1 + cache.h | 5 + compat/bswap.h | 4 + config.c | 5 + dir.c| 16 ++- dir.h| 5 + entry.c | 1 + environment.c| 1 + fsmonitor.c | 238 +++ fsmonitor.h | 9 ++ read-cache.c | 28 +++- t/t7519-status-fsmonitor.sh | 158 templates/hooks--query-fsmonitor.sample | 37 + unpack-trees.c | 1 + 18 files changed, 556 insertions(+), 3 deletions(-) create mode 100644 fsmonitor.c create mode 100644 fsmonitor.h create mode 100755 t/t7519-status-fsmonitor.sh create mode 100644 templates/hooks--query-fsmonitor.sample diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index f7b4b4a844..48127e8729 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -450,12 +450,12 @@ processed by rebase. [[query-fsmonitor]] query-fsmonitor - +~~~ This hook is invoked when the configuration option core.fsmonitor is set and git needs to identify changed or untracked files. It takes -a single argument which is the time in elapsed seconds since midnight, -January 1, 1970. +two arguments, a version (currently 1) and the time in elapsed +nanoseconds since midnight, January 1, 1970. The hook should output to stdout the list of all files in the working directory that may have changed since the requested time. The logic diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt index b002d23c05..7aeeea6f94 100644 --- a/Documentation/technical/index-format.txt +++ b/Documentation/technical/index-format.txt @@ -307,7 +307,8 @@ The remaining data of each directory block is grouped by type: - 32-bit version number: the current supported version is 1. - 64-bit time: the extension data reflects all changes through the given - time which is stored as the seconds elapsed since midnight, January 1, 1970. + time which is stored as the nanoseconds elapsed since midnight, + January 1, 1970. - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap. diff --git a/cache.h b/cache.h index 36423c77cc..58e5abf69f 100644 --- a/cache.h +++ b/cache.h @@ -344,8 +344,8 @@ struct index_state { struct hashmap dir_hash; unsigned char sha1[20]; struct untracked_cache *untracked; - time_t fsmonitor_last_update; - struct ewah_bitmap *fsmonitor_dirty_bitmap; + uint64_t fsmonitor_last_update; + struct ewah_bitmap *fsmonitor_dirty; }; extern struct index_state the_index; diff --git a/fsmonitor.c b/fsmonitor.c index 9f08e66db9..3ce262d47d 100644 --- a/fsmonitor.c +++ b/fsmonitor.c @@ -5,6 +5,9 @@ #include "strbuf.h" #include "fsmonitor.h" +#define INDEX_EXTENSION_VERSION1 +#define HOOK_INTERFACE_VERSION 1 + static struct untracked_cache_dir *find_untracked_cache_dir( struct untracked_cache *uc, struct untracked_cache_dir *ucd, const char *name) @@ -56,20 +59,20 @@ int read_fsmonitor_extension(struct index_state *istate, const void *data, hdr_version = get_be32(index); index += sizeof(uint32_t); - if (hdr_version != 1) + if (hdr_version != INDEX_EXTENSION_VERSION) return error("bad fsmonitor version %d", hdr_version); - istate->fsmonitor_last_update = (time_t)get_be64(index); + istate->fsmonitor_last_update = get_be64(index); index += sizeof(uint64_t); ewah_size = get_be32(index); index += sizeof(uint32_t); - istate->fsmonitor_dirty_bitmap = ewah_new(); -
Re: Documentation issue: git-stash examples
On Thu, May 25, 2017 at 09:52:42PM +1200, Adrian Forbes wrote: > Some of the example commands in git-stash documentation should be > written as comments rather than actual commands: > https://cloud.githubusercontent.com/assets/24915363/26444394/5cf6a754-4190-11e7-845e-135288c8916e.png > > For example, `$ edit emergency fix` should be `# ... edit emergency > fix ...` like the other comments in the section. > > It could be misleading for novices. Yeah, I think that's a good idea. Do you want to try your hand at a patch? -Peff
Re: `pull --rebase --autostash` fails when fast forward in dirty repo
On Thu, May 25, 2017 at 02:04:07PM -0400, Jeff King wrote: > > ...that might be something worth thinking about---my gut feeling > > tells me something but we should go by a measurement, not by gut > > feeling of a random somebody. > > Yeah, I'd agree. I had the impression the original change was motivated > by gut feeling. Hmph. On Linux, at least, I do not see that using "git merge" to fast-forward is appreciably faster: Here are timings for: git reset --hard HEAD~10 && git pull --rebase in a git.git repo, using builds of git from various commits (all best-of-five; the timings include the reset for simplicity, but presumably it costs the same in each case): - 33b842a1e^ (just prior to the switch to git-merge) real 0m0.256s user 0m0.096s sys 0m0.020s - 33b842a1e (using git-merge) real 0m0.227s user 0m0.092s sys 0m0.020s So a little faster, but there seems to be 20-30ms of noise in my timings anyway (the average for the "prior" case did seem to be higher, though). It's possible that the difference would be more stark on Windows, where the cost of the shell script would be higher. The same test with the current master performs the same as 33b842a1e. But if I then remove the optimization, as Tyler's patch did at the start of this thread, the timings are similar to 33b842a1e^. So I dunno. It does not seem appreciably faster, but what little speedup it does provide is the same even with a more modern rebase. Which is probably because that rebase isn't actually doing much in the first place, so the optimized bits from Dscho's rebase--helper are not kicking in yet. Anyway. All this has shown me is that it's probably pointless to do this timing at all on Linux. Somebody on Windows might get better results. But regardless, we need to do something. Correctness must trump optimizations, and the question is whether we can throw out the whole conditional, or if we should just restrict when it kicks in. -Peff
Re: `pull --rebase --autostash` fails when fast forward in dirty repo
On Wed, May 24, 2017 at 07:40:08AM +0900, Junio C Hamano wrote: > > But I notice on the run_merge() code path that we do still invoke > > git-merge. > > ... wouldn't that what we want even when the merge happens to be a > fast-forward one? I am not sure what you meant by this, but... I just meant that if the point of the optimization was to avoid invoking git-rebase (because it's slow), then we're still not optimizing out a process. It only helps at all because "rebase" (being a shell script) may be slower to start and realize it's a fast-forward than "merge". But once that is no longer true of git-rebase, then there is no purpose to the optimization. > > And rebase has been getting faster as it is moved to C code > > itself. So is this optimization even worth doing anymore? > > ...that might be something worth thinking about---my gut feeling > tells me something but we should go by a measurement, not by gut > feeling of a random somebody. Yeah, I'd agree. I had the impression the original change was motivated by gut feeling. -Peff
Re: [PATCHv5 00/17] Diff machine: highlight moved lines.
On Wed, May 24, 2017 at 11:44 PM, Junio C Hamano wrote: > I was trying to see how this is useful in code moving change, with > this command line: > > $ git -c color.moved diff js/blame-lib~6 js/blame-lib blame.c blame.h > builtin/blame.c > > Some random observations: > > * You do not seem to have any command line option yet. I guess >that is OK while the series is still in RFC state, but when we >are ready to seriously start considering this for 'next', we'd >need something like --color-moved that can be used across "diff", >"log -p" and "show". There is "--color-moved" as a command line option. (See diff.c:4290) Oh, it is not documented! That will be fixed. > * As configuration variable names go, "color.moved" is probably in >a wrong section. Perhaps "diff.colorMoved" or something? As you turn on/off normal coloring via "color.diff" and this only extends the coloring scheme, I have the impression "color" is the right section. Maybe color.diffStyle=[normal/enhanced] and "enhanced" means this feature for now? The only option in the "diff" section related to color is diff.wsErrorHighlight which has a very similar purpose, so "diff.colorMoved" would fit in that scheme. > > * The fact that I am using > > [diff "color"] > old = red reverse > whitespace = blue reverse > >on a "black ink on white paper" terminal might have an effect on >this, Yes very much! >but lines printed in either bold green and on green >background (i.e. not new ones but are the ones moved from >elsewhere) stood out a lot more prominently than lines printed in >green (i.e. truly new additions), and I felt that this is totally >backwards from what I needed for this exercise. That is, while >reviewing a code moving change, I want to be able to concentrate >primarily of three things: > >- What are the new lines that are *not* moved from elsewhere? > Did the submitter try to sneak in unrelated changes? > >- What are the changes that are truly lost, not moved to > elsewhere? > >- Do the lines moved from elsewhere form a coherent whole? Where > is the boundary between blocks of text that are copied? Do > adjacent two blocks of moved text come from the same old place > next to each other? So with these questions, I wonder if we want to color moved lines as "color.diff.context" (i.e. plain white text in the normal coloring scheme) This would serve the intended purpose of dimming the attention to moved lines. Regarding the last point of marking up adjacent blocks (which would indicate that there is a coherency issue or just moving from different places), we could highlight the last line of the previous block and first line of the next block in their "normal" colors (i.e. color.diff.old and color.diff.new). The very first version had some boundary coloring, but then I switched to alternating block coloring based on an idea by Jonathan Tan. Maybe it is time to go back to boundary coloring, but optional and apply the boundary color only if two blocks are adjacent? Example of how it could look: ---8<--- diff --git a/poetry.txt b/poetry.txt index 9d32b3b..cc50ca1 100644 --- a/poetry.txt +++ b/poetry.txt @@ -1,12 +1,4 @@ [W] -A simple text is [W] -written in paragraphs and [W] -many more lines. [R] - [W] A diff is smaller [W] than the pre or post image text form [W] used for review [W] [W] -In between focus [W] -is hard to keep consistently [W] -errors may occur [W] - diff --git a/engineer.txt b/engineer.txt new file mode 100644 index 000..8fbc0ce --- /dev/null +++ b/engineer.txt @@ -0,0 +1,7 @@ [W] +A simple text is [W] +written in paragraphs and [B] +many more lines. [B] +In between focus [W] +is hard to keep consistently [W] +errors may occur [W] + ---8<--- W -> simple White text (diff.color.context) R -> Red text (diff.color.old) B -> Boundary marker (Maybe just diff.color.{old/new} or a new color to configure) > >Using colors that stand out more prominently than for the regular >new/old lines is counter-productive for all of these, especially >for the first two purposes. I may suggest using cyan (or any >color that is very close to the background) so that the presense >of moved lines are merely felt without being distracting. IOW, >while reviewing code moving patch, moved parts want to be dimmed, >not highlighted. I agree. So I could resend the algorithm used with other defaults or try out the "boundary only iff adjacent blocks, else context color". Thanks, Stefan
Urgent Message
Dear Good day, I sent this mail praying for it to reach you in good health, since I myself are in a very critical health condition in which I sleep every night without knowing if I may be alive to see the next day. I am a widow suffering from long time illness. I have some funds I inherited from my late husband, my Doctor told me recently that I would not last due to the illness. Having known my condition, I decided to donate this fund to a good person that will utilize it the way i am going to instruct herein. I need a very honest person who can claim this money and use it for Charity works, for orphanages, widows and also build schools for less privilege . I accept this decision because I do not have any child who will inherit this money after I die.Please I want your sincerely and urgent answer to know if you will be able to execute this project, and I will give you more information on how the fund will be transferred to your bank account. I am waiting for your reply and my private email address is caronglori...@yahoo.com Thank you, Mrs Gloria
Re: [Non-Bug] cloning a repository with a default MASTER branch tries to check out the master branch
On Thu, May 25, 2017 at 12:13:03PM +0900, Junio C Hamano wrote: > >> So if we wanted to improve this, I think the first step would be for the > >> server to start sending symref lines for HEAD, even when it does not > >> resolve to anything. > > > > Yup, noticed the same and I agree with your conclusion. > > We probably should make head_ref_namespaced() to take the > resolve_flags that is passed down thru refs_read_ref_full() down to > refs_resolve_ref_unsafe(). For the purpose of the first call to it > in upload-pack.c to call back find_symref(), we do not need and want > to say RESOLVE_REF_READING (which requires a symref to be pointing > at an existing ref). I suspect all the other calls (there are 2 > other in unload-pack.c and yet another in http-backend.c) to the > function should keep passing RESOLVE_REF_READING, as they do want to > omit a symbolic ref that points at an unborn branch. That would make head_ref() not function-pointer compatible with all the other for_each_ref functions. I don't know how much that matters. The revision.c parser does use function pointers, but it doesn't handle HEAD specially. So I kind of wonder if that code should simply be calling resolve_ref_unsafe() itself in the first place. We know the only value it's going to get is HEAD. OTOH, it's possible that we would eventually want to report all symrefs, including ones we find while traversing the refs. And in that sense, all of the for_each_ref functions would want to learn about this. But for ref advertisement I think that would need a protocol change anyway, so I'm not sure it's worth worrying about. The just-HEAD case could look like: diff --git a/upload-pack.c b/upload-pack.c index 97da13e6a..04a913ad1 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -959,28 +959,25 @@ static int send_ref(const char *refname, const struct object_id *oid, return 0; } -static int find_symref(const char *refname, const struct object_id *oid, - int flag, void *cb_data) +static void find_symref(const char *refname, struct string_list *out) { const char *symref_target; struct string_list_item *item; struct object_id unused; + int flag; - if ((flag & REF_ISSYMREF) == 0) - return 0; symref_target = resolve_ref_unsafe(refname, 0, unused.hash, &flag); if (!symref_target || (flag & REF_ISSYMREF) == 0) - die("'%s' is a symref but it is not?", refname); - item = string_list_append(cb_data, refname); + return; + item = string_list_append(out, refname); item->util = xstrdup(symref_target); - return 0; } static void upload_pack(void) { struct string_list symref = STRING_LIST_INIT_DUP; - head_ref_namespaced(find_symref, &symref); + find_symref("HEAD", &symref); if (advertise_refs || !stateless_rpc) { reset_timeout();
Re: [PATCH v6 6/6] clean: teach clean -d to preserve ignored paths
On Wed, May 24, 2017 at 12:14 AM, Torsten Bögershausen wrote: > >> diff --git a/builtin/clean.c b/builtin/clean.c >> index d861f836a..937eb17b6 100644 >> --- a/builtin/clean.c >> +++ b/builtin/clean.c >> @@ -857,6 +857,38 @@ static void interactive_main_loop(void) >> } >> } >> +static void correct_untracked_entries(struct dir_struct *dir) > > Looking what the function does, would > drop_or_keep_untracked_entries() > make more sense ? To me, drop_or_keep_ implies that they're either all dropped or all kept, nothing in between, which is why I went with correct_, to indicate that the set of untracked entries in the dir_struct prior to calling the method needs to be corrected.
[RFC/PATCH] recognize pathspec magic without "--" disambiguation
For commands that take revisions and pathspecs, magic pathspecs like ":^Documentation" or ":/Documentation" have to appear on the right-hand side of the disambiguating "--", like: git log -- :^Documentation This makes them more annoying to use than they need to be. We loosened the rules for wildcards in 28fcc0b71 (pathspec: avoid the need of "--" when wildcard is used, 2015-05-02). Let's do the same for arguments that look like pathspec magic (colon followed by any punctuation). The obvious and desired impact is that you can now do: git log :^Documentation But let's consider related invocations and whether we're making them better or worse: - git log :/foo (when "foo" matches a commit message) This one should remain the same. Like the existing wildcard rule, we're touching only verify_filename(), not verify_non_filename(). So cases that _do_ resolve as a rev will continue to do so. - git log :^foo (when "^foo" exists in your index) The same logic applies; it would continue to work. And ditto for any other weird filenames in your index like "(attr)foo". - git log :/foo (when "foo" does _not_ match a commit message) We won't treat this as a revision, because it doesn't match anything. So prior to this patch, we'd either treat it as a path (if "foo" does exist at the root of the project) or complain "this isn't a rev, and nor is it a path". Whereas after this patch, we'd always treat it like a path, whether it exists or not (so in the second case instead of an error, you're likely to get an empty "log", unless "foo" really did exist somewhere in your history). So this is a potential downside; if the user intended a search of the commit messages, they may prefer the error message to the attempted pathspec match. This same downside actually exists currently when you have an asterisk in your regex. E.g., git log :/foo.*bar will be treated as a pathspec (if it doesn't match a commit message) due to the wildcard matching in 28fcc0b71. - git log :^foo (when "^foo" isn't in your index) The same outcomes apply here as above, but this "downside" is very unlikely to occur, as "^foo" is such a funny name (and again, things like "(attr)foo" as a filename are sufficiently uncommon not to worry about). - git log :%foo (or any other pathspec magic char that doesn't exist) The new check doesn't actually parse the pathspec magic, but allows any punctuation (which includes the long-form ":(magic)"). At first glance this seems overly permissive, but it actually yields a better error message here: instead of complaining that ":%foo" is neither a rev nor a path, we treat it as a pathspec and complain that "%" is not a known magic (the same as we would if the "--" were given). Of course, the flip side here is when you really meant a file named "%foo" in your index, but it didn't exist. That seems reasonably unlikely (and the error message is clear enough to point the user in the right direction). So the collateral damage doesn't seem too bad (it's really just the case where :/foo doesn't match anything). There are two possibilities for reducing that: 1. Instead of accepting all pathspec magic, just allow certain ones like ":^" which are unlikely to be used to specify a revision (or alternatively, just disallow potentially confusing ones like ":/"). That works, but it makes the rules inconsistent and confusing for the user (i.e., "--" is sometimes needed for magic and sometimes not). 2. Instead of recognizing pathspec magic, teach check_filename() to parse off the filename bit and see if that exists (e.g., for "^foo" see if "foo" exists). We already do this for ":/", but it's done in a very ad-hoc way. We parse it ourselves, rather than relying on the pathspec code, and handle only "/" magic, and not even its ":(top)" long form. That could be fixed by asking the pathspec code to parse it for us (including all magic), and then trying to match the resulting name against the working tree. But not all pathspec magic actually results in a pathname. For instance, when should ":(attr)foo" be valid? We can't just look for "foo" in the filesystem (it's an attribute, not a pathname). By comparison, recognizing things that look like pathspec magic is a simple and easy to understand rule. Signed-off-by: Jeff King --- I wrote all the above to try to convince myself that this doesn't have any serious regression cases. And I think I did. But I actually we could make the rules in alternative (2) above work. check_filename() would ask the pathspec code to parse each argument and get one of three results: 1
Re: [PATCH v2 0/6] Fast git status via a file system watcher
On 5/24/2017 6:54 AM, Christian Couder wrote: Design ~~ A new git hook (query-fsmonitor) must exist and be enabled (core.fsmonitor=true) that takes a time_t formatted as a string and outputs to stdout all files that have been modified since the requested time. Is there a reason why there is a new hook, instead of a "core.fsmonitorquery" config option to which you could pass whatever command line with options? A hook is a simple and well defined way to integrate git with another process. If there is some fixed set of arguments that need to be passed to a file system monitor (beyond the timestamp stored in the index extension), they can be encoded in the integration script like I've done in the Watchman integration sample hook. A new 'fsmonitor' index extension has been added to store the time the fsmonitor hook was last queried and a ewah bitmap of the current 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked entries are 'fsmonitor-dirty.' As needed, git will call the query-fsmonitor hook proc for the set of changes since the index was last updated. Git then uses this set of files along with the list saved in the fsmonitor index extension to flag the potentially dirty index and untracked cache entries. So this can work only if "core.untrackedCache" is set to true? This works with core.untrackedCache set to true or false. If it is set to false, you get valid results, you just don't get the speed up when checking for untracked files. Thanks for working on this, Christian.
Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
On 5/22/2017 1:28 PM, Ævar Arnfjörð Bjarmason wrote: On Mon, May 22, 2017 at 6:18 PM, Ben Peart wrote: On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote: +== File System Monitor cache + + The file system monitor cache tracks files for which the query-fsmonitor + hook has told us about changes. The signature for this extension is + { 'F', 'S', 'M', 'N' }. + + The extension starts with + + - 32-bit version number: the current supported version is 1. + + - 64-bit time: the extension data reflects all changes through the given + time which is stored as the seconds elapsed since midnight, January 1, 1970. + + - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap. + + - An ewah bitmap, the n-th bit indicates whether the n-th index entry +is CE_FSMONITOR_DIRTY. We already have a uint64_t in one place in the codebase (getnanotime) which uses a 64 bit time for nanosecond accuracy, and numerous filesystems already support nanosecond timestamps (ext4, that new Apple thingy...). I don't know if any of the inotify/fsmonitor APIs support that yet, but it seems inevitable that that'll be added if not, in some pathological cases we can have a lot of files modified in 1 second, so using nanosecond accuracy means there'll be a lot less data to consider in some cases. It does mean this'll only work until the year ~2500, but that seems like an acceptable trade-off. I really don't think nano-second resolution is needed in this case for a few reasons. The number of files that can change within a given second is limited by the IO throughput of the underlying device. Even assuming a very fast device and very small files and changes, this won't be that many files. Without this patch, git would have scanned all those files every time. With this patch, git will only scan those files a 2nd time that are modified in the same second that it did the first scan *that came before the first scan started* (the "lots of files modified" section in the 1 second timeline below). |- one second -| |-lots of files modified - git status - more file modified-| Yes, some duplicate status checks can be made but its still a significant win in any reasonable scenario. Especially when you consider that it is pretty unusual to do git status/add/commit calls in the middle of making lots of changes to files. I understand that we get most of the wins either way. I'm just wondering why not make it nanosecond-resolution now from the beginning since that's where major filesystems are heading already, and changing stuff like this can be a hassle once it's initially out there, whereas just dividing by 10^9 for APIs that need seconds from the beginning is easy & future-proof. There are some uses of git where this would probably matter in practice. E.g. consider a git-annex backed fileserver using ext4, currently git-annex comes with its own FS watcher as a daemon invoked via `git annex watch` to constantly add new files without killing performance via a status/add in a loop, with this a daemon like that could just run status/add in a loop, but on a busy repo the 1s interval size might start to matter as you're constantly inspecting larger intervals. More importantly though, I can't think of any case where having it in nanoseconds to begin with would do any harm. You're right, it's not hard to support nano second resolution and it doesn't do any harm. I switch the index format and interface as I don't expect this code will still be running when the timer rolls over. Someone long after me will have to fix it if it is. :) In addition, the backing file system monitor (Watchman) supports number of seconds since the unix epoch (unix time_t style). This means any support of nano seconds by git is academic until someone provides a file system watcher that does support nano second granularity. I haven't used watchman for anything non-trivial, but the documentation for the query API you seem to be using says you can request a {ctime,mtime}_ns field: https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields And isn't this the documentation for the "since" query you're using, saying you can specify any arbitrary fs timing field, such as a _ns time field: https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md ? To keep the index extension and hook interface generic, I have not adopted the Watchman specific clock format. This enables anyone to provide a different file system watcher that can inter-operate as nano seconds since epoc is easy for anyone to support. Finally, the fsmonitor index extension is versioned so that we can seamlessly upgrade to nano second resolution later if we desire. Aside from my nanosecond bikeshedding, let's say we change the semantics of any of this in the future: The index has the version, but there's one argument passed to the hook without a v
Re: What's cooking in git.git (May 2017, #06; Mon, 22)
On Mon, May 22, 2017 at 1:11 PM, Junio C Hamano wrote: > * nd/fopen-errors (2017-05-09) 23 commits > - t1308: add a test case on open a config directory > - config.c: handle error on failing to fopen() > - xdiff-interface.c: report errno on failure to stat() or fopen() > - wt-status.c: report error on failure to fopen() > - server-info: report error on failure to fopen() > - sequencer.c: report error on failure to fopen() > - rerere.c: report correct errno > - rerere.c: report error on failure to fopen() > - remote.c: report error on failure to fopen() > - commit.c: report error on failure to fopen() the graft file > - log: fix memory leak in open_next_file() > - log: report errno on failure to fopen() a file > - blame: report error on open if graft_file is a directory > - bisect: report on fopen() error > - ident.c: use fopen_or_warn() > - attr.c: use fopen_or_warn() > - wrapper.c: add fopen_or_warn() > - wrapper.c: add warn_on_fopen_errors() > - config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too > - config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD > - clone: use xfopen() instead of fopen() > - Use xfopen() in more places > - git_fopen: fix a sparse 'not declared' warning > > We often try to open a file for reading whose existence is > optional, and silently ignore errors from open/fopen; report such > errors if they are not due to missing files. If anyone wants to continue this, I've cleaned up the series based on Johannes comments here [1]. It does not have the Darwin change though. There was the last question about the '*' test change in ref path and what exact code change causes that, which I can't answer because I don't have Windows, or the time to simulate and pinpoint the fault on Linux. [1] https://github.com/pclouds/git/commits/fopen-or-warn -- Duy