Re: [PATCH 7/6] run-command: block signals between fork and execve
Eric Wong wrote:
> Brandon Williams wrote:
> > On 04/13, Eric Wong wrote:
> > > @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd,
> > > 		struct child_err *cerr)
> > > 		error_errno("exec '%s': cd to '%s' failed",
> > > 			cmd->argv[0], cmd->dir);
> > > 		break;
> > > +	case CHILD_ERR_SIGPROCMASK:
> > > +		error_errno("sigprocmask failed restoring signals");
> >
> > missing a break statement here.  I'll add it in, in the re-roll.
>
> Good catch, thanks!

Actually, I now wonder if that should be die_errno instead.
sigprocmask failures (EFAULT/EINVAL) would only be due to
programmer error.

In one of my minor projects(*), I do something like this:

  #define CHECK(type, expect, expr) do { \
	type checkvar = (expr); \
	assert(checkvar == (expect) && "BUG" && __FILE__ && __LINE__); \
  } while (0)

  CHECK(int, 0, sigfillset(&set));
  CHECK(int, 0, sigemptyset(&set));
  CHECK(int, 0, pthread_sigmask(SIG_SETMASK, &set, NULL));

Dunno if it's considered good style or not, here.

(*) git clone git://bogomips.org/cmogstored
Re: [PATCH 7/6] run-command: block signals between fork and execve
Brandon Williams wrote:
> On 04/13, Eric Wong wrote:
> > @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd,
> > 		struct child_err *cerr)
> > 		error_errno("exec '%s': cd to '%s' failed",
> > 			cmd->argv[0], cmd->dir);
> > 		break;
> > +	case CHILD_ERR_SIGPROCMASK:
> > +		error_errno("sigprocmask failed restoring signals");
>
> missing a break statement here.  I'll add it in, in the re-roll.

Good catch, thanks!

> > 	case CHILD_ERR_ENOENT:
> > 		error_errno("cannot run %s", cmd->argv[0]);
> > 		break;
Re: [PATCH 7/6] run-command: block signals between fork and execve
On 04/13, Eric Wong wrote:
> @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd,
> 		struct child_err *cerr)
> 		error_errno("exec '%s': cd to '%s' failed",
> 			cmd->argv[0], cmd->dir);
> 		break;
> +	case CHILD_ERR_SIGPROCMASK:
> +		error_errno("sigprocmask failed restoring signals");

missing a break statement here.  I'll add it in, in the re-roll.

> 	case CHILD_ERR_ENOENT:
> 		error_errno("cannot run %s", cmd->argv[0]);
> 		break;

-- 
Brandon Williams
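The review comment is about a classic switch fall-through: without the break, the CHILD_ERR_SIGPROCMASK case runs straight into the ENOENT case and emits both messages. A minimal stand-alone illustration (the enum and messages are simplified stand-ins, not git's code):

```c
#include <string.h>

enum child_errcode { ERR_SIGPROCMASK, ERR_ENOENT };

/* Write the diagnostic for code into buf.  Without the break after
 * the first case, ERR_SIGPROCMASK would fall through and append the
 * ENOENT message as well -- the bug being pointed out. */
static void err_spew(enum child_errcode code, char *buf, size_t len)
{
	buf[0] = '\0';
	switch (code) {
	case ERR_SIGPROCMASK:
		strncat(buf, "sigprocmask failed restoring signals\n", len - 1);
		break;	/* the missing statement in the posted patch */
	case ERR_ENOENT:
		strncat(buf, "cannot run command\n", len - 1);
		break;
	}
}
```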
Re: [PATCH v4 0/5] Kill manual ref parsing code in worktree.c
Junio C Hamano writes:

> Nguyễn Thái Ngọc Duy writes:
>
>> v4 adds a new patch, 2/5, which makes the hashmap related functions in
>> refs.c generic, so I could add a new map for worktree ref stores and
>> not abuse submodule hashmap.
>>
>> I mentioned about moving these hashmap back to submodule.c and
>> worktree.c where it can map more than just ref stores (in worktree
>> case, we can cache 'struct worktree', for example). But I'm not doing
>> it now to keep things small.
>>
>> The commit message in 1/5 is rephrased a bit, hopefully clearer.
>
> Michael, does this look good to replace what has been queued?
>
> I did not spot any discussion on this topic after v4 when I skimmed
> the backlog messages, so I am hoping that I can pick this up without
> having to worry about seeing another reroll immediately after that
> (in which case I'd have been better off if I tended other topics in
> flight in the meantime).
>
> Thanks, both.

Oops, I shouldn't have done that.  When applied on top of the
files-backend thing (or have you updated that one and is my tree
lacking it???), this breaks quite a few tests.  t0001#41 dumps core
from "git worktree add ../linked-worktree" for example.
Re: [PATCH v4 0/5] Kill manual ref parsing code in worktree.c
Nguyễn Thái Ngọc Duy writes:

> v4 adds a new patch, 2/5, which makes the hashmap related functions in
> refs.c generic, so I could add a new map for worktree ref stores and
> not abuse submodule hashmap.
>
> I mentioned about moving these hashmap back to submodule.c and
> worktree.c where it can map more than just ref stores (in worktree
> case, we can cache 'struct worktree', for example). But I'm not doing
> it now to keep things small.
>
> The commit message in 1/5 is rephrased a bit, hopefully clearer.

Michael, does this look good to replace what has been queued?

I did not spot any discussion on this topic after v4 when I skimmed
the backlog messages, so I am hoping that I can pick this up without
having to worry about seeing another reroll immediately after that
(in which case I'd have been better off if I tended other topics in
flight in the meantime).

Thanks, both.
Re: [PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout
On Thu, Apr 13, 2017 at 04:11:31PM -0700, Junio C Hamano wrote:
> g...@jeffhostetler.com writes:
>
> > +	/*
> > +	 * Fetch the tree from the ODB for each peer directory in the
> > +	 * n commits.
> > +	 *
> > +	 * For 2- and 3-way traversals, we try to avoid hitting the
> > +	 * ODB twice for the same OID.  This should yield a nice speed
> > +	 * up in checkouts and merges when the commits are similar.
> > +	 *
> > +	 * We don't bother doing the full O(n^2) search for larger n,
> > +	 * because wider traversals don't happen that often and we
> > +	 * avoid the search setup.
> > +	 *
> > +	 * When 2 peer OIDs are the same, we just copy the tree
> > +	 * descriptor data.  This implicitly borrows the buffer
> > +	 * data from the earlier cell.
>
> This "borrowing" made me worried, but it turns out that this is
> perfectly OK.
>
> fill_tree_descriptor() uses read_sha1_file() to give a tree_desc its
> own copy of the tree object data, so the code that calls into the
> tree traversal API is responsible for freeing the buffer returned
> from fill_tree_descriptor().  The updated code avoids double freeing
> by setting buf[i] to NULL for borrowing [i].

Yeah, I think that logic is fine. We could also keep a separate
counter for the size of buf, like:

  buf[nr_buf++] = fill_tree_descriptor(t+i, sha1);

and then just count from zero to nr_buf in the freeing loop. I don't
think it matters much either way (the free(NULL) calls are presumably
cheap).

-Peff
Re: [PATCH] repack: respect gc.pid lock
On Thu, Apr 13, 2017 at 1:27 PM, David Turner wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. Make git repack respect this lock.
>
> Now repack, by default, will refuse to run at the same time as a gc.
> This fixes a concurrency issue: a repack which deleted packs would
> make a concurrent gc sad when its packs were deleted out from under
> it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
> cannot be accessed".  Then it would die, probably leaving a large temp
> pack hanging around.
>
> Git repack learns --no-lock, so that when run under git gc, it doesn't
> attempt to manage the lock itself.
>
> Martin Fick suggested just moving the lock into git repack, but this
> would leave parts of git gc (e.g. git prune) protected by only local
> locks.  I worried that a prune (part of git gc) concurrent with a
> repack could confuse the repack, so I decided to go with this
> solution.

The last paragraph could be reworded to be a bit less personal and
more as a direct statement of why moving the lock entirely to repack
is a bad idea.

> Signed-off-by: David Turner
> ---
>  Documentation/git-repack.txt |  5 +++
>  Makefile                     |  1 +
>  builtin/gc.c                 | 72 ++--
>  builtin/repack.c             | 13 +++
>  repack.c                     | 88 
>  repack.h                     |  8 
>  t/t7700-repack.sh            |  8 
>  7 files changed, 127 insertions(+), 68 deletions(-)
>  create mode 100644 repack.c
>  create mode 100644 repack.h
>
> diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> index 26afe6ed54..b347ff5c62 100644
> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -143,6 +143,11 @@ other objects in that pack they already have locally.
>  	being removed. In addition, any unreachable loose objects will
>  	be packed (and their loose counterparts removed).
>
> +--no-lock::
> +	Do not lock the repository, and do not respect any existing lock.
> +	Mostly useful for running repack within git gc.  Do not use this
> +	unless you know what you are doing.
> +

I would have phrased this more like:

	Used internally by git gc to call git repack while already
	holding the lock.  Do not use unless you know what you're doing.
Re: [PATCH v2 0/6] forking and threading
On 04/13, Jonathan Nieder wrote:
> Brandon Williams wrote:
>
> > From what I can see, there are now no calls in the child process (after fork
> > and before exec/_exit) which are not Async-Signal-Safe.  This means that
> > fork/exec in a threaded context should work without deadlock
>
> I don't see why the former implies the latter.  Can you explain
> further?
>
> You already know my opinions about fork+threads by now.  I continue to
> think this is heading in a direction of decreased maintainability that
> I dread.

I disagree here.  No one thought it was a bad idea back when I was
implementing grep to fork while running with threads.  I'd rather fix
the problem in run_command so that we don't ever have to worry about
this again, especially for contributors who are unaware of this issue
while threading.

> That's not to say that this is wasted work.  I would prefer an approach
> like the following:
>
>  1. First, make grep work without running fork() from threads.
>     Jonathan Tan's approach would be one way to do this.  Another way
>     would be to simply disable threads in --recurse-submodules mode.
>
>     This would be the first thing to do because it would make tests
>     reliable again, without having to wait for deeper changes.

I'm not much of a fan of Jonathan Tan's suggestion; I'd rather just
fix the problem at its root instead of adding in an additional hack.
If this series crashes and burns then yes, let's just shut off
threading in grep when --recurse-submodules is used; otherwise this
series will fix that case.

>  2. Then, teaching run_command to prepare the environment and do $PATH
>     lookup before forking.  This might make it possible for run_command
>     to use vfork or might not.
>
>  3. Teaching run_command to delegate chdir to the child, using -C for
>     git commands and 'cd' for shell commands, and using a shell where
>     necessary where it didn't before.
>
>  4. Switching run_command to use posix_spawn on platforms where it is
>     available, which would make it possible to use in a threaded
>     context on those platforms.

After this series it should be easy enough to convert to posix_spawn
if someone wants to follow up with a patch to do that.

-- 
Brandon Williams
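For item 4, the posix_spawn route might look roughly like this sketch on POSIX systems (the helper name is made up; real run_command would also need file-descriptor and chdir plumbing via posix_spawn_file_actions):

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn a command via $PATH lookup and wait for it; returns the
 * child's exit status, or -1 if spawning or waiting failed.  No
 * fork() runs in this process, so the caller is not constrained to
 * async-signal-safe operations. */
static int spawn_and_wait(char *const argv[])
{
	pid_t pid;
	int status;

	if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ))
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On glibc and most modern systems posix_spawn is implemented with vfork/clone internally, so the async-signal-safety burden moves into libc instead of the caller.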
Re: [PATCH 7/6] run-command: block signals between fork and execve
On 04/13, Eric Wong wrote:
> Brandon Williams wrote:
> > v2 does a bit of restructuring based on comments from reviewers.  I took the
> > patch by Eric and broke it up and tweaked it a bit to flow better with v2.  I
> > left out the part of Eric's patch which did signal manipulation as I wasn't
> > experienced enough to know what it was doing or why it was necessary.  Though I
> > believe the code is structured in such a way that Eric could make another patch
> > on top of this series with just the signal changes.
>
> Yeah, I think a separate commit message might be necessary to
> explain the signal changes.

Perfect!  I'll carry the changes along in the reroll.

> ---8<---
> Subject: [PATCH] run-command: block signals between fork and execve
>
> Signal handlers of the parent firing in the forked child may
> have unintended side effects.  Rather than auditing every signal
> handler we have and will ever have, block signals while forking
> and restore default signal handlers in the child before execve.
>
> Restoring default signal handlers is required because
> execve does not unblock signals, it only restores default
> signal handlers.  So we must restore them with sigprocmask
> before execve, leaving a window when signal handlers
> we control can fire in the child.  Continue ignoring
> ignored signals, but reset the rest to defaults.
>
> Similarly, disable pthread cancellation to future-proof our code
> in case we start using cancellation; as cancellation is
> implemented with signals in glibc.
>
> Signed-off-by: Eric Wong
> ---
> Changes from my original in <20170411070534.GA10552@whir>:
>
> - fixed typo in NO_PTHREADS code
>
> - dropped fflush(NULL) before fork, consider us screwed anyways
>   if the child uses stdio
>
> - respect SIG_IGN in child; that seems to be the prevailing
>   wisdom from reading https://ewontfix.com/7/ and process.c
>   in ruby (git clone https://github.com/ruby/ruby.git)
>
>  run-command.c | 69 +++
>  1 file changed, 69 insertions(+)
>
> diff --git a/run-command.c b/run-command.c
> index 1c36e692d..59a8b4806 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -213,6 +213,7 @@ static int child_notifier = -1;
>
>  enum child_errcode {
>  	CHILD_ERR_CHDIR,
> +	CHILD_ERR_SIGPROCMASK,
>  	CHILD_ERR_ENOENT,
>  	CHILD_ERR_SILENT,
>  	CHILD_ERR_ERRNO,
> @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd,
>  	struct child_err *cerr)
>  		error_errno("exec '%s': cd to '%s' failed",
>  			cmd->argv[0], cmd->dir);
>  		break;
> +	case CHILD_ERR_SIGPROCMASK:
> +		error_errno("sigprocmask failed restoring signals");
>  	case CHILD_ERR_ENOENT:
>  		error_errno("cannot run %s", cmd->argv[0]);
>  		break;
> @@ -388,6 +391,53 @@ static char **prep_childenv(const char *const *deltaenv)
>  }
>  #endif
>
> +struct atfork_state {
> +#ifndef NO_PTHREADS
> +	int cs;
> +#endif
> +	sigset_t old;
> +};
> +
> +#ifndef NO_PTHREADS
> +static void bug_die(int err, const char *msg)
> +{
> +	if (err) {
> +		errno = err;
> +		die_errno("BUG: %s", msg);
> +	}
> +}
> +#endif
> +
> +static void atfork_prepare(struct atfork_state *as)
> +{
> +	sigset_t all;
> +
> +	if (sigfillset(&all))
> +		die_errno("sigfillset");
> +#ifdef NO_PTHREADS
> +	if (sigprocmask(SIG_SETMASK, &all, &as->old))
> +		die_errno("sigprocmask");
> +#else
> +	bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old),
> +		"blocking all signals");
> +	bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs),
> +		"disabling cancellation");
> +#endif
> +}
> +
> +static void atfork_parent(struct atfork_state *as)
> +{
> +#ifdef NO_PTHREADS
> +	if (sigprocmask(SIG_SETMASK, &as->old, NULL))
> +		die_errno("sigprocmask");
> +#else
> +	bug_die(pthread_setcancelstate(as->cs, NULL),
> +		"re-enabling cancellation");
> +	bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL),
> +		"restoring signal mask");
> +#endif
> +}
> +
>  static inline void set_cloexec(int fd)
>  {
>  	int flags = fcntl(fd, F_GETFD);
> @@ -511,6 +561,7 @@ int start_command(struct child_process *cmd)
>  	char **childenv;
>  	struct argv_array argv = ARGV_ARRAY_INIT;
>  	struct child_err cerr;
> +	struct atfork_state as;
>
>  	if (pipe(notify_pipe))
>  		notify_pipe[0] = notify_pipe[1] = -1;
> @@ -524,6 +575,7 @@ int start_command(struct child_process *cmd)
>
>  	prepare_cmd(&argv, cmd);
>  	childenv = prep_childenv(cmd->env);
> +	atfork_prepare(&as);
>
>  	/*
>  	 * NOTE: In order to prevent deadlocking when using threads special
> @@ -537,6 +589,7 @@ int start_command(struct child_process *cmd)
>  	cmd->pid = fork();
>  	failed_errno =
Re: [PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout
g...@jeffhostetler.com writes:

> +	/*
> +	 * Fetch the tree from the ODB for each peer directory in the
> +	 * n commits.
> +	 *
> +	 * For 2- and 3-way traversals, we try to avoid hitting the
> +	 * ODB twice for the same OID.  This should yield a nice speed
> +	 * up in checkouts and merges when the commits are similar.
> +	 *
> +	 * We don't bother doing the full O(n^2) search for larger n,
> +	 * because wider traversals don't happen that often and we
> +	 * avoid the search setup.
> +	 *
> +	 * When 2 peer OIDs are the same, we just copy the tree
> +	 * descriptor data.  This implicitly borrows the buffer
> +	 * data from the earlier cell.

This "borrowing" made me worried, but it turns out that this is
perfectly OK.

fill_tree_descriptor() uses read_sha1_file() to give a tree_desc its
own copy of the tree object data, so the code that calls into the
tree traversal API is responsible for freeing the buffer returned
from fill_tree_descriptor().  The updated code avoids double freeing
by setting buf[i] to NULL for borrowing [i].

> +	 */
> 	for (i = 0; i < n; i++, dirmask >>= 1) {
> -		const unsigned char *sha1 = NULL;
> -		if (dirmask & 1)
> -			sha1 = names[i].oid->hash;
> -		buf[i] = fill_tree_descriptor(t+i, sha1);
> +		if (i > 0 && are_same_oid(&names[i], &names[i - 1])) {
> +			t[i] = t[i - 1];
> +			buf[i] = NULL;
> +		} else if (i > 1 && are_same_oid(&names[i], &names[i - 2])) {
> +			t[i] = t[i - 2];
> +			buf[i] = NULL;
> +		} else {
> +			const unsigned char *sha1 = NULL;
> +			if (dirmask & 1)
> +				sha1 = names[i].oid->hash;
> +			buf[i] = fill_tree_descriptor(t+i, sha1);
> +		}
> 	}
>
> 	bottom = switch_cache_bottom();
Re: [PATCH] submodule--helper: fix typo in is_active error message
On 04/13, Stefan Beller wrote:
> Signed-off-by: Stefan Beller
> ---
>  builtin/submodule--helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index e42e671014..b1d4269e10 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1145,7 +1145,7 @@ static int absorb_git_dirs(int argc, const char **argv, const char *prefix)
>  static int is_active(int argc, const char **argv, const char *prefix)
>  {
>  	if (argc != 2)
> -		die("submodule--helper is-active takes exactly 1 arguments");
> +		die("submodule--helper is-active takes exactly 1 argument");

Obvious fix.  Thanks!

>  	gitmodules_config();
>
> --
> 2.12.2.603.g7b28dc31ba

-- 
Brandon Williams
Re: [PATCH] worktree add: add --lock option
Nguyễn Thái Ngọc Duy writes:

> As explained in the document.  This option has an advantage over the
> command sequence "git worktree add && git worktree lock": there will be
> no gap that somebody can accidentally "prune" the new worktree (or soon,
> explicitly "worktree remove" it).
>
> "worktree add" does keep a lock on while it's preparing the worktree.
> If --lock is specified, this lock remains after the worktree is created.
>
> Suggested-by: David Taylor
> Signed-off-by: Nguyễn Thái Ngọc Duy
> ---
>  A patch that adds --lock may look like this.

This looks more like "I do believe the idea by David is a useful
addition and here is how I did it to the best of my ability---let's
make sure we polish it for eventual inclusion" than a mere "it may
look like so---do whatever you want with it" patch.

To me "git worktree add --lock" somehow sounds less correct than
"git worktree add --locked", but I'd appreciate it if natives can
correct me.

Thanks.
Re: [PATCH v2 2/6] run-command: prepare command before forking
On 04/13, Jonathan Nieder wrote:
> Hi,
>
> Brandon Williams wrote:
>
> > In order to avoid allocation between 'fork()' and 'exec()' the argv
> > array used in the exec call is prepared prior to forking the process.
>
> nit: s/(the argv array.*) is prepared/prepare \1/
>
> Git's commit messages are in the imperative mood, as if they are
> ordering the code or the computer to do something.
>
> More importantly, the commit message is a good place to explain some
> of the motivation behind the patch so that people can understand what
> the patch is for by reading it without having to dig into mailing list
> archives and get the discussion there.
>
> E.g. this could say
>
>  - that grep tests are intermittently failing in configurations using
>    some versions of tcmalloc
>
>  - that the cause is interaction between fork and threads: malloc holds
>    a lock that those versions of tcmalloc doesn't release in a
>    pthread_atfork handler
>
>  - that according to [1] we need to only call async-signal-safe
>    operations between fork and exec.  Using malloc to build the argv
>    array isn't async-signal-safe
>
> [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
>
> > In addition to this, the function used to exec is changed from
> > 'execvp()' to 'execv()' as the (p) variant of exec has the potential to
> > call malloc during the path resolution it performs.
>
> *puzzled* is execvp actually allowed to call malloc?

It could, possibly, as it isn't async-signal-safe.

> Could this part go in a separate patch?  That would make it easier to
> review.

I'll break this conversion out to a different patch.

> [...]
> > +++ b/run-command.c
> [...]
> > +	/*
> > +	 * If there are no '/' characters in the command then perform a path
> > +	 * lookup and use the resolved path as the command to exec.  If there
> > +	 * are no '/' characters or if the command wasn't found in the path,
> > +	 * have exec attempt to invoke the command directly.
> > +	 */
> > +	if (!strchr(out->argv[0], '/')) {
> > +		char *program = locate_in_PATH(out->argv[0]);
> > +		if (program) {
> > +			free((char *)out->argv[0]);
> > +			out->argv[0] = program;
> > +		}
> > +	}
>
> This does one half of what execvp does but leaves out the other half.
> http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
> explains:
>
>   There are two distinct ways in which the contents of the process
>   image file may cause the execution to fail, distinguished by the
>   setting of errno to either [ENOEXEC] or [EINVAL] (see the ERRORS
>   section).  In the cases where the other members of the exec family of
>   functions would fail and set errno to [ENOEXEC], the execlp() and
>   execvp() functions shall execute a command interpreter and the
>   environment of the executed command shall be as if the process
>   invoked the sh utility using execl() as follows:
>
>     execl(<shell path>, arg0, file, arg1, ..., (char *)0);
>
> I think this is what the first patch in the series was about.  Do we
> want to drop that support?
>
> I think we need to keep it, since it is easy for authors of e.g.
> credential helpers to accidentally rely on it.
>
> [...]
> > @@ -437,12 +458,9 @@ int start_command(struct child_process *cmd)
> >  				unsetenv(*cmd->env);
> >  		}
> >  	}
> > -	if (cmd->git_cmd)
> > -		execv_git_cmd(cmd->argv);
> > -	else if (cmd->use_shell)
> > -		execv_shell_cmd(cmd->argv);
> > -	else
> > -		sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
> > +
> > +	execv(argv.argv[0], (char *const *) argv.argv);
>
> What happens in the case sane_execvp was trying to handle?  Does
> prepare_cmd need error handling for when the command isn't found?
>
> Sorry this got so fussy.
>
> Thanks and hope that helps,
> Jonathan

-- 
Brandon Williams
Re: [PATCH v1 0/3] travis-ci: build docs with asciidoctor
Lars Schneider writes:

> this is a mini series to build the documentation with asciidoctor in
> addition to asciidoc on Travis-CI.

Overall, this looks sensible.  I didn't spot anything questionable
other than a minor style nit, i.e. write these

    make doc USE_ASCIIDOCTOR=1
    make -j2 doc USE_ASCIIDOCTOR=1

more like this

    make USE_ASCIIDOCTOR=1 doc
    make -j2 USE_ASCIIDOCTOR=1 doc

Having said that, I wonder if we get some interesting results out of
building the documentation twice, though.  By looking at the Travis
log with timestamps, we probably can see how long each build takes,
but that is much less interesting than learning if new versions of
text used mark-up that does not format correctly on one or the other
(i.e. catch documentation breakage early in each CI run), for example.

I have an impression that neither AsciiDoc nor AsciiDoctor "fails" in
an obvious way that "make" can notice (i.e. they often just silently
produce nonsense output when fed a malformed input instead).

Thanks.
Re: [PATCH] xgethostname: handle long hostnames
On 13.04.2017 21:23, David Turner wrote:
> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.  Introduce
> new function, xgethostname, which ensures that there is always a \0
> at the end of the buffer.
>
> Signed-off-by: David Turner
> ---
> diff --git a/wrapper.c b/wrapper.c
> index 0542fc7582..d837417709 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
>  {
>  	poll(NULL, 0, millisec);
>  }
> +
> +int xgethostname(char *buf, size_t len)
> +{
> +	/*
> +	 * If the full hostname doesn't fit in buf, POSIX does not
> +	 * specify whether the buffer will be null-terminated, so to
> +	 * be safe, do it ourselves.
> +	 */
> +	int ret = gethostname(buf, len);
> +	if (!ret)
> +		buf[len - 1] = 0;
> +	return ret;
> +}

Silent truncation is not ideal, no matter if it's done by the wrapper
or the original function.  It would be better to use a properly sized
buffer.

POSIX requires hostnames to have a maximum length of HOST_NAME_MAX.
So how about just adding an assert to make sure len is big enough?  Or
evaluate the condition at compile time with BUILD_ASSERT_OR_ZERO?

Downside: Not all platforms define HOST_NAME_MAX.  daemon.c uses 256
as a fallback.  On Windows a buffer size of 256 is documented to
suffice in all cases [1].  The Linux manpage [2] mentions a hostname
length limit of 255 (plus NUL) as well, even though HOST_NAME_MAX is
64 there.

Another possibility: Die (or at least warn) if the buffer doesn't
contain a NUL byte after calling gethostname().  That only works for
platforms that don't NUL-terminate on truncation, though, so silent
truncation would still go unnoticed.

Anyway, the buffer in builtin/gc.c with its 128 bytes seems to be too
short; the others are at least 256 bytes long.  Replacing the magic
buffer size number with HOST_NAME_MAX + 1 might be a good idea (after
moving the fallback definition to git-compat-util.h).

René

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms738527(v=vs.85).aspx
[2] http://man7.org/linux/man-pages/man2/gethostname.2.html
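René's sizing suggestion might look like this sketch: define the fallback once (256, as daemon.c does) and size every hostname buffer as HOST_NAME_MAX + 1, so the NUL-termination fixup is a belt-and-suspenders measure rather than a truncation:

```c
#include <limits.h>
#include <string.h>
#include <unistd.h>

#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX 256	/* fallback for platforms that omit it */
#endif

/* Fill buf with the hostname, guaranteeing NUL termination.
 * Callers are expected to pass HOST_NAME_MAX + 1 bytes, in which
 * case truncation should never actually occur. */
static int xgethostname(char *buf, size_t len)
{
	int ret = gethostname(buf, len);

	if (!ret)
		buf[len - 1] = '\0';
	return ret;
}
```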
RE: [PATCH] xgethostname: handle long hostnames
> -----Original Message-----
> From: Jonathan Nieder [mailto:jrnie...@gmail.com]
> Sent: Thursday, April 13, 2017 6:05 PM
> To: David Turner
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH] xgethostname: handle long hostnames
>
> Hi,
>
> David Turner wrote:
>
> > If the full hostname doesn't fit in the buffer supplied to
> > gethostname, POSIX does not specify whether the buffer will be
> > null-terminated, so to be safe, we should do it ourselves.
> [...]
> > +++ b/wrapper.c
> > @@ -655,3 +655,16 @@ void sleep_millisec(int millisec) {
> >  	poll(NULL, 0, millisec);
> >  }
> > +
> > +int xgethostname(char *buf, size_t len) {
> > +	/*
> > +	 * If the full hostname doesn't fit in buf, POSIX does not
> > +	 * specify whether the buffer will be null-terminated, so to
> > +	 * be safe, do it ourselves.
> > +	 */
> > +	int ret = gethostname(buf, len);
> > +	if (!ret)
> > +		buf[len - 1] = 0;
> > +	return ret;
>
> I wonder if after null-terminating we would want to report this as an
> error, instead of silently using a truncated result.  I.e. something like
>
> > +	if (!ret)
> > +		buf[len - 1] = 0;
> > +	if (strlen(buf) >= len - 1) {
> > +		errno = ENAMETOOLONG;
> > +		return -1;
> > +	}
>
> (or EINVAL --- either is equally descriptive).

Looking at the users of this function, I think most would be happier
with a truncated buffer than an error:

gc.c: used to see if we are the same machine as the machine that
locked the repo.  Unlikely that two machines have hostnames that
differ only in the 256th-or-above character.

fetch-pack.c, receive-pack.c: similar to gc.c; the hostname is a note
in the .keep file.

ident.c: used to make up a fake email address.  On my laptop,
gethostname returns "corey" (no domain part), so the email address is
not likely to be valid anyway.

> Also POSIX requires that hostnames are <= 255 bytes.  Maybe we can
> force the buffer to be large enough.

That is not how I read it.  I read the limit as HOST_NAME_MAX, which
has a *minimum* value of 255, but which might be larger.  The existing
hostname buffers are 128, 256, and 1024 bytes, so they're pretty
arbitrary.
[PATCH] submodule--helper: fix typo in is_active error message
Signed-off-by: Stefan Beller
---
 builtin/submodule--helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e42e671014..b1d4269e10 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1145,7 +1145,7 @@ static int absorb_git_dirs(int argc, const char **argv, const char *prefix)
 static int is_active(int argc, const char **argv, const char *prefix)
 {
 	if (argc != 2)
-		die("submodule--helper is-active takes exactly 1 arguments");
+		die("submodule--helper is-active takes exactly 1 argument");
 
 	gitmodules_config();
 
-- 
2.12.2.603.g7b28dc31ba
Re: [PATCH] xgethostname: handle long hostnames
Hi,

David Turner wrote:

> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.
[...]
> +++ b/wrapper.c
> @@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
>  {
>  	poll(NULL, 0, millisec);
>  }
> +
> +int xgethostname(char *buf, size_t len)
> +{
> +	/*
> +	 * If the full hostname doesn't fit in buf, POSIX does not
> +	 * specify whether the buffer will be null-terminated, so to
> +	 * be safe, do it ourselves.
> +	 */
> +	int ret = gethostname(buf, len);
> +	if (!ret)
> +		buf[len - 1] = 0;
> +	return ret;

I wonder if after null-terminating we would want to report this as an
error, instead of silently using a truncated result.  I.e. something
like

> +	if (!ret)
> +		buf[len - 1] = 0;
> +	if (strlen(buf) >= len - 1) {
> +		errno = ENAMETOOLONG;
> +		return -1;
> +	}

(or EINVAL --- either is equally descriptive).

Also POSIX requires that hostnames are <= 255 bytes.  Maybe we can
force the buffer to be large enough.

Thoughts?
Jonathan
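A compilable sketch of the error-reporting variant outlined above (an illustration of the idea, not the patch as posted; note it also flags a hostname that exactly fills the buffer, since that case is indistinguishable from truncation):

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Like gethostname(), but NUL-terminate and treat a result that
 * fills the whole buffer as an error rather than silently
 * truncating. */
static int xgethostname_strict(char *buf, size_t len)
{
	int ret = gethostname(buf, len);

	if (!ret) {
		buf[len - 1] = '\0';
		if (strlen(buf) >= len - 1) {
			errno = ENAMETOOLONG;
			return -1;
		}
	}
	return ret;
}
```

With a comfortably large buffer the call behaves like plain gethostname(); with a too-small one the caller gets a hard error instead of a truncated name.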
Re: [PATCH 1/1] difftool: fix use-after-free
Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin
> ---
>  builtin/difftool.c  |  7 +--
>  t/t7800-difftool.sh | 19 +++
>  2 files changed, 24 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder

Thank you.
Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook
Brandon Williams wrote:
> On 04/13, Eric Wong wrote:
> > Jonathan Nieder wrote:
> > > Brandon Williams wrote:
> > > > The post-update hooks created in t5550-http-fetch-dumb.sh is missing the
> > > > "!#/bin/sh" line which can cause issues with portability.  Instead
> > > > create the hook using the 'write_script' function which includes the
> > > > proper "#!" line.
> > >
> > > This would allow later patches to regress a previously supported
> > > behavior.
> > >
> > > I agree that it's silly to test that behavior as a side-effect of this
> > > unrelated test, but I don't think we want to lose the test coverage.
> >
> > I was about to write something similar about this regression.
> > The new execve-using code should handle ENOEXEC as execvpe does
> > and probably a new test for it needs to be written.
>
> Would it be enough, upon seeing a failed exec call with ENOEXEC, to
> retry a single time, invoking the shell to attempt to interpret the
> command?

Yes, that's exactly what glibc does.
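The retry being proposed is essentially what execvp's ENOEXEC fallback does: re-run the file with the shell prepended. The argv rewriting step might look like this (helper name made up; real code would also call execve again and report errors):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Given an argv whose execve() failed with ENOEXEC, build the argv
 * that retries via the shell: { "/bin/sh", file, arg1, ..., NULL }.
 * Returns a malloc'd NULL-terminated array sharing the old strings,
 * or NULL on allocation failure.
 */
static char **shell_fallback_argv(char *const old_argv[])
{
	size_t n = 0, i;
	char **nargv;

	while (old_argv[n])
		n++;
	nargv = malloc((n + 2) * sizeof(*nargv));
	if (!nargv)
		return NULL;
	nargv[0] = "/bin/sh";
	for (i = 0; i <= n; i++)	/* copies the trailing NULL too */
		nargv[i + 1] = old_argv[i];
	return nargv;
}
```

This is why a hook script without a "#!" line still runs under execvp but not under a bare execve: the libc wrapper quietly performs exactly this rewrite.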
Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook
On 04/13, Eric Wong wrote:
> Jonathan Nieder wrote:
> > Brandon Williams wrote:
> > > The post-update hooks created in t5550-http-fetch-dumb.sh is missing the
> > > "!#/bin/sh" line which can cause issues with portability.  Instead
> > > create the hook using the 'write_script' function which includes the
> > > proper "#!" line.
> >
> > This would allow later patches to regress a previously supported
> > behavior.
> >
> > I agree that it's silly to test that behavior as a side-effect of this
> > unrelated test, but I don't think we want to lose the test coverage.
>
> I was about to write something similar about this regression.
> The new execve-using code should handle ENOEXEC as execvpe does
> and probably a new test for it needs to be written.

Would it be enough, upon seeing a failed exec call with ENOEXEC, to
retry a single time, invoking the shell to attempt to interpret the
command?

-- 
Brandon Williams
Re: [PATCH] Make git log work for git CWD outside of work tree
On Wed, Apr 12, 2017 at 08:11:22PM +0700, Duy Nguyen wrote:
> On Wed, Apr 12, 2017 at 8:01 PM, Jeff King wrote:
> > I dunno. Maybe I am missing some subtle case, but it's not clear to me
> > what the user would be trying to do by having git stay in the original
> > directory.
>
> Not sure if it really happens. But if the worktree is inside the
> original cwd and we have to jump to the worktree, we set an empty
> prefix, not a "../../../" one. So we can't refer back to files
> relative to the original cwd with the prefix. I was kinda hoping
> "super prefix" would solve it, but that one seems designed
> specifically for submodules.

Ah, right. I think the issue is that "prefix" really serves two uses.
For things like command-line arguments, we use it to find the original
path after we've moved. But we also use it for "where in the working
tree are we".

So with an out-of-tree cwd, we'd want to set the first one to the real
path ("../../whatever" or even just an absolute path), but the second
one would probably be empty (i.e., just pretend that we started from
the top-level). But that would require first refactoring all of the
cmd_* functions to handle those prefixes separately.

-Peff
Re: [PATCH] xgethostname: handle long hostnames
On Thu, Apr 13, 2017 at 03:23:35PM -0400, David Turner wrote: > If the full hostname doesn't fit in the buffer supplied to > gethostname, POSIX does not specify whether the buffer will be > null-terminated Wow, TIL. What an utterly terrible and error-prone interface (I always just assumed we'd get ENAMETOOLONG, which is what glibc does). > so to be safe, we should do it ourselves. Introduce > new function, xgethostname, which ensures that there is always a \0 > at the end of the buffer. Your patch looks good to me. -Peff
Re: [PATCH v2 2/6] run-command: prepare command before forking
Hi, Brandon Williams wrote: > In order to avoid allocation between 'fork()' and 'exec()' the argv > array used in the exec call is prepared prior to forking the process. nit: s/(the argv array.*) is prepared/prepare \1/ Git's commit messages are in the imperative mood, as if they are ordering the code or the computer to do something. More importantly, the commit message is a good place to explain some of the motivation behind the patch so that people can understand what the patch is for by reading it without having to dig into mailing list archives and get the discussion there. E.g. this could say - that grep tests are intermittently failing in configurations using some versions of tcmalloc - that the cause is interaction between fork and threads: malloc holds a lock that those versions of tcmalloc don't release in a pthread_atfork handler - that according to [1] we need to only call async-signal-safe operations between fork and exec. Using malloc to build the argv array isn't async-signal-safe [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html > In addition to this, the function used to exec is changed from > 'execvp()' to 'execv()' as the (p) variant of exec has the potential to > call malloc during the path resolution it performs. *puzzled* Is execvp actually allowed to call malloc? Could this part go in a separate patch? That would make it easier to review. [...] > +++ b/run-command.c [...] > + /* > + * If there are no '/' characters in the command then perform a path > + * lookup and use the resolved path as the command to exec. If there > + * are no '/' characters or if the command wasn't found in the path, > + * have exec attempt to invoke the command directly. > + */ > + if (!strchr(out->argv[0], '/')) { > + char *program = locate_in_PATH(out->argv[0]); > + if (program) { > + free((char *)out->argv[0]); > + out->argv[0] = program; > + } > + } This does one half of what execvp does but leaves out the other half. 
http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html explains: There are two distinct ways in which the contents of the process image file may cause the execution to fail, distinguished by the setting of errno to either [ENOEXEC] or [EINVAL] (see the ERRORS section). In the cases where the other members of the exec family of functions would fail and set errno to [ENOEXEC], the execlp() and execvp() functions shall execute a command interpreter and the environment of the executed command shall be as if the process invoked the sh utility using execl() as follows: execl(<shell path>, arg0, file, arg1, ..., (char *)0); I think this is what the first patch in the series was about. Do we want to drop that support? I think we need to keep it, since it is easy for authors of e.g. credential helpers to accidentally rely on it. [...] > @@ -437,12 +458,9 @@ int start_command(struct child_process *cmd) > unsetenv(*cmd->env); > } > } > - if (cmd->git_cmd) > - execv_git_cmd(cmd->argv); > - else if (cmd->use_shell) > - execv_shell_cmd(cmd->argv); > - else > - sane_execvp(cmd->argv[0], (char *const*) cmd->argv); > + > + execv(argv.argv[0], (char *const *) argv.argv); What happens in the case sane_execvp was trying to handle? Does prepare_cmd need error handling for when the command isn't found? Sorry this got so fussy. Thanks and hope that helps, Jonathan
[PATCH 7/6] run-command: block signals between fork and execve
Brandon Williams wrote: > v2 does a bit of restructuring based on comments from reviewers. I took the > patch by Eric and broke it up and tweaked it a bit to flow better with v2. I > left out the part of Eric's patch which did signal manipulation as I wasn't > experienced enough to know what it was doing or why it was necessary. Though > I > believe the code is structured in such a way that Eric could make another > patch > on top of this series with just the signal changes. Yeah, I think a separate commit message might be necessary to explain the signal changes. ---8<--- Subject: [PATCH] run-command: block signals between fork and execve Signal handlers of the parent firing in the forked child may have unintended side effects. Rather than auditing every signal handler we have and will ever have, block signals while forking and restore default signal handlers in the child before execve. Restoring default signal handlers is required because execve does not unblock signals, it only restores default signal handlers. So we must restore them with sigprocmask before execve, leaving a window when signal handlers we control can fire in the child. Continue ignoring ignored signals, but reset the rest to defaults. Similarly, disable pthread cancellation to future-proof our code in case we start using cancellation; as cancellation is implemented with signals in glibc. 
Signed-off-by: Eric Wong --- Changes from my original in <20170411070534.GA10552@whir>: - fixed typo in NO_PTHREADS code - dropped fflush(NULL) before fork, consider us screwed anyways if the child uses stdio - respect SIG_IGN in child; that seems to be the prevailing wisdom from reading https://ewontfix.com/7/ and process.c in ruby (git clone https://github.com/ruby/ruby.git) run-command.c | 69 +++ 1 file changed, 69 insertions(+) diff --git a/run-command.c b/run-command.c index 1c36e692d..59a8b4806 100644 --- a/run-command.c +++ b/run-command.c @@ -213,6 +213,7 @@ static int child_notifier = -1; enum child_errcode { CHILD_ERR_CHDIR, + CHILD_ERR_SIGPROCMASK, CHILD_ERR_ENOENT, CHILD_ERR_SILENT, CHILD_ERR_ERRNO, @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr) error_errno("exec '%s': cd to '%s' failed", cmd->argv[0], cmd->dir); break; + case CHILD_ERR_SIGPROCMASK: + error_errno("sigprocmask failed restoring signals"); case CHILD_ERR_ENOENT: error_errno("cannot run %s", cmd->argv[0]); break; @@ -388,6 +391,53 @@ static char **prep_childenv(const char *const *deltaenv) } #endif +struct atfork_state { +#ifndef NO_PTHREADS + int cs; +#endif + sigset_t old; +}; + +#ifndef NO_PTHREADS +static void bug_die(int err, const char *msg) +{ + if (err) { + errno = err; + die_errno("BUG: %s", msg); + } +} +#endif + +static void atfork_prepare(struct atfork_state *as) +{ + sigset_t all; + + if (sigfillset(&all)) + die_errno("sigfillset"); +#ifdef NO_PTHREADS + if (sigprocmask(SIG_SETMASK, &all, &as->old)) + die_errno("sigprocmask"); +#else + bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old), + "blocking all signals"); + bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs), + "disabling cancellation"); +#endif +} + +static void atfork_parent(struct atfork_state *as) +{ +#ifdef NO_PTHREADS + if (sigprocmask(SIG_SETMASK, &as->old, NULL)) + die_errno("sigprocmask"); +#else + bug_die(pthread_setcancelstate(as->cs, NULL), + "re-enabling cancellation"); 
+ bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL), + "restoring signal mask"); +#endif +} + static inline void set_cloexec(int fd) { int flags = fcntl(fd, F_GETFD); @@ -511,6 +561,7 @@ int start_command(struct child_process *cmd) char **childenv; struct argv_array argv = ARGV_ARRAY_INIT; struct child_err cerr; + struct atfork_state as; if (pipe(notify_pipe)) notify_pipe[0] = notify_pipe[1] = -1; @@ -524,6 +575,7 @@ int start_command(struct child_process *cmd) prepare_cmd(&argv, cmd); childenv = prep_childenv(cmd->env); + atfork_prepare(&as); /* * NOTE: In order to prevent deadlocking when using threads special @@ -537,6 +589,7 @@ int start_command(struct child_process *cmd) cmd->pid = fork(); failed_errno = errno; if (!cmd->pid) { + int sig; /* * Ensure the default die/error/warn routines do not get * called, they can take stdio locks and malloc. @@ -584,6 +637,21 @@ int start_command(struct child_process *cmd)
Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook
Jonathan Nieder wrote: > Brandon Williams wrote: > > The post-update hook created in t5550-http-fetch-dumb.sh is missing the > > "#!/bin/sh" line which can cause issues with portability. Instead > > create the hook using the 'write_script' function which includes the > > proper "#!" line. > This would allow later patches to regress a previously supported > behavior. > > I agree that it's silly to test that behavior as a side-effect of this > unrelated test, but I don't think we want to lose the test coverage. I was about to write something similar about this regression. The new execve-using code should handle ENOEXEC as execvpe does and probably a new test for it needs to be written.
Re: [PATCH v2 0/6] forking and threading
Brandon Williams wrote: > From what I can see, there are now no calls in the child process (after fork > and before exec/_exit) which are not Async-Signal-Safe. This means that > fork/exec in a threaded context should work without deadlock I don't see why the former implies the latter. Can you explain further? You already know my opinions about fork+threads by now. I continue to think this is heading in a direction of decreased maintainability that I dread. That's not to say that this is wasted work. I would prefer an approach like the following: 1. First, make grep work without running fork() from threads. Jonathan Tan's approach would be one way to do this. Another way would be to simply disable threads in --recurse-submodules mode. This would be the first thing to do because it would make tests reliable again, without having to wait for deeper changes. 2. Then, teaching run_command to prepare the environment and do $PATH lookup before forking. This might make it possible for run_command to use vfork or might not. 3. Teaching run_command to delegate chdir to the child, using -C for git commands and 'cd' for shell commands, and using a shell where necessary where it didn't before. 4. Switching run_command to use posix_spawn on platforms where it is available, which would make it possible to use in a threaded context on those platforms. Thoughts? Jonathan
Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook
Hi, Brandon Williams wrote: > The post-update hook created in t5550-http-fetch-dumb.sh is missing the > "#!/bin/sh" line which can cause issues with portability. Instead > create the hook using the 'write_script' function which includes the > proper "#!" line. > > Signed-off-by: Brandon Williams > --- > t/t5550-http-fetch-dumb.sh | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) This would allow later patches to regress a previously supported behavior. I agree that it's silly to test that behavior as a side-effect of this unrelated test, but I don't think we want to lose the test coverage. Thanks for tracking this down and hope that helps, Jonathan
"up-to-date" vs. "up to date"
Hello git community, This is about an issue of language style and punctuation, not anything functional. Apologies in advance if I've brought this to the wrong place. There are a bunch of situations in which git will print a message like "Your branch is up-to-date with 'origin/master'" or "Already up-to-date." In many of these cases, including the two examples I just gave, "up to date" should not be hyphenated --- at least according to most (if not all) English-language style guides. Here are a couple posts in support of that, which also explain when it should and should not be hyphenated: https://english.stackexchange.com/questions/180611/do-i-keep-myself-up-to-date-or-up-to-date-on-something http://grammarist.com/usage/up-to-date/ And the Chromium community dealing with the same issue: https://groups.google.com/a/chromium.org/forum/#!topic/chromium-reviews/edodON6G2oY I thought about submitting a patch, but I started looking through the source code and found that "up-to-date" appears 61 times. So before I get into that I thought I would check with the community to see if this is something that's already been debated and decided. Awaiting your thoughts, Jeff Manian
[PATCH] repack: respect gc.pid lock
Git gc locks the repository (using a gc.pid file) so that other gcs don't run concurrently. Make git repack respect this lock. Now repack, by default, will refuse to run at the same time as a gc. This fixes a concurrency issue: a repack which deleted packs would make a concurrent gc sad when its packs were deleted out from under it. The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack cannot be accessed". Then it would die, probably leaving a large temp pack hanging around. Git repack learns --no-lock, so that when run under git gc, it doesn't attempt to manage the lock itself. Martin Fick suggested just moving the lock into git repack, but this would leave parts of git gc (e.g. git prune) protected by only local locks. I worried that a prune (part of git gc) concurrent with a repack could confuse the repack, so I decided to go with this solution. Signed-off-by: David Turner --- Documentation/git-repack.txt | 5 +++ Makefile | 1 + builtin/gc.c | 72 ++-- builtin/repack.c | 13 +++ repack.c | 88 repack.h | 8 t/t7700-repack.sh | 8 7 files changed, 127 insertions(+), 68 deletions(-) create mode 100644 repack.c create mode 100644 repack.h diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index 26afe6ed54..b347ff5c62 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -143,6 +143,11 @@ other objects in that pack they already have locally. being removed. In addition, any unreachable loose objects will be packed (and their loose counterparts removed). +--no-lock:: + Do not lock the repository, and do not respect any existing lock. + Mostly useful for running repack within git gc. Do not use this + unless you know what you are doing. 
+ Configuration ------------- diff --git a/Makefile b/Makefile index 9b36068ac5..7095f03959 100644 --- a/Makefile +++ b/Makefile @@ -816,6 +816,7 @@ LIB_OBJS += refs/files-backend.o LIB_OBJS += refs/iterator.o LIB_OBJS += ref-filter.o LIB_OBJS += remote.o +LIB_OBJS += repack.o LIB_OBJS += replace_object.o LIB_OBJS += rerere.o LIB_OBJS += resolve-undo.o diff --git a/builtin/gc.c b/builtin/gc.c index c2c61a57bb..9b9c27020b 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -18,6 +18,7 @@ #include "sigchain.h" #include "argv-array.h" #include "commit.h" +#include "repack.h" #define FAILED_RUN "failed to run %s" @@ -45,7 +46,6 @@ static struct argv_array prune = ARGV_ARRAY_INIT; static struct argv_array prune_worktrees = ARGV_ARRAY_INIT; static struct argv_array rerere = ARGV_ARRAY_INIT; -static struct tempfile pidfile; static struct lock_file log_lock; static struct string_list pack_garbage = STRING_LIST_INIT_DUP; @@ -234,70 +234,6 @@ static int need_to_gc(void) return 1; } -/* return NULL on success, else hostname running the gc */ -static const char *lock_repo_for_gc(int force, pid_t* ret_pid) -{ - static struct lock_file lock; - char my_host[128]; - struct strbuf sb = STRBUF_INIT; - struct stat st; - uintmax_t pid; - FILE *fp; - int fd; - char *pidfile_path; - - if (is_tempfile_active(&pidfile)) - /* already locked */ - return NULL; - - if (gethostname(my_host, sizeof(my_host))) - xsnprintf(my_host, sizeof(my_host), "unknown"); - - pidfile_path = git_pathdup("gc.pid"); - fd = hold_lock_file_for_update(&lock, pidfile_path, - LOCK_DIE_ON_ERROR); - if (!force) { - static char locking_host[128]; - int should_exit; - fp = fopen(pidfile_path, "r"); - memset(locking_host, 0, sizeof(locking_host)); - should_exit = - fp != NULL && - !fstat(fileno(fp), &st) && - /* -* 12 hour limit is very generous as gc should -* never take that long. On the other hand we -* don't really need a strict limit here, -* running gc --auto one day late is not a big -* problem. 
--force can be used in manual gc -* after the user verifies that no gc is -* running. -*/ - time(NULL) - st.st_mtime <= 12 * 3600 && - fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 && - /* be gentle to concurrent "gc" on remote hosts */ - (strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM); - if (fp != NULL) - fclose(fp); - if (should_exit) { - if
Re: Proposal for "fetch-any-blob Git protocol" and server design
On 04/12/2017 03:02 PM, Kevin David wrote: Hi Jonathan, I work on the network protocols for the GVFS project at Microsoft. I shared a couple thoughts and questions below. Thanks for your reply! I know we're considering server behavior here, but how large do you generally expect these blob-want requests to be? I ask because we took an initial approach very similar to this, however, we had a hard time being clever about figuring out what set of blobs to request for those clients that didn't want the entire set, and ended up falling back to single-blob requests. Obviously, this could be due to the nature of our filesystem-virtualization-based client, but I also suspect that the teams attacking this problem are more often than not dealing with very large blob objects, so the cost of a round-trip becomes lower relative to sending the object content itself. I am envisioning (1a) as described in Jeff Hostetler's e-mail [1] ("a pre-command or hook to identify needed blobs and pre-fetch them before allowing the actual command to start"), so a Git command would typically make a single request that contains all the blobs required, but my proposal can also handle (1c) ('"fault" them in as necessary in read_object() while the command is running and without any pre-fetch (either synchronously or asynchronously and with/without a helper process)'). Even if we decided to go with single-blob requests and responses, it is still important to send them as packfiles, so that the server can serve them directly from its compressed storage without first having to uncompress them. [1] https://public-inbox.org/git/1488999039-37631-1-git-send-email-...@jeffhostetler.com/ Along the same lines as above, this is where we started and it worked well for low-volume requests. However, when we started ramping up the load, `pack-objects` operating on a very large packed repository (~150 GiB) became very computationally expensive, even with `--compression=1 --depth=0 --window=0`. 
Being a bit more clever about packing objects (e.g. splitting blobs out from commits and trees) improved this a bit, but we still hit bottlenecks from what appeared to be a large number of memory-mapping operations on a ~140GiB packfile of blobs. Each `pack-objects` process would consume approximately one CPU core for the duration of the request. It's possible that further splitting of these large blob packs would have improved performance in some scenarios, but that would increase the amount of pack-index lookups necessary to find a single object. I'm not very experienced with mmap, but I thought that memory-mapping a large file in itself does not incur much of a performance penalty (if any) - it is the accesses that count. I experimented with 15,000 and 150,000 MiB files and mmap and they seem to be handled quite well. Also, how many objects are "pack-objects" packing here? === Endpoint support for forward compatibility This "server" endpoint requires that the first line be understood, but will ignore any other lines starting with words that it does not understand. This allows new "commands" to be added (distinguished by their first lines) and existing commands to be "upgraded" with backwards compatibility. This seems like a clever way to avoid the canonical `/info/refs?service=git-upload-pack` capability negotiation on every call. However, using error handling to fallback seems slightly wonky to me. Hopefully users are incentivized to upgrade their clients. By "error handling to fallback", do you mean in my proposal or in a possible future one (assuming my proposal is implemented)? I don't think my proposal requires any error handling to fallback (since only new clients can clone partially - old clients will just clone totally and obliviously), but I acknowledge that this proposal does not mean that any future proposal can be done without requiring error handling to fallback. 
=== Indication to use the proposed endpoint The client will probably already record that at least one of its remotes (the one that it successfully performed a "partial clone" from) supports this new endpoint (if not, it can’t determine whether a missing blob was caused by repo corruption or by the "partial clone"). This knowledge can be used both to know that the server supports "fetch-blob-pack" and "fetch-commit-pack" (for the latter, the client can fall back to "fetch-pack"/"upload-pack" when fetching from other servers). This makes a lot of sense to me. When we built our caching proxy, we had to be careful when designing how we'd handle clients requesting objects missing from the proxy. For example, a client requests a single blob and the proxy doesn't have it - we can't simply download that object from the "authoritative" remote and stick it in the `.git\objects\xx\yyy...` directory, because the repository would be made corrupt. By proxy, do you mean a Git repository? Sorry, I don't really understand this part.
Re: [PATCH v2 4/6] run-command: don't die in child when duping /dev/null
On 04/13, Eric Wong wrote: > Brandon Williams wrote: > > @@ -487,7 +483,7 @@ int start_command(struct child_process *cmd) > > atexit(notify_parent); > > > > if (cmd->no_stdin) > > - dup_devnull(0); > > + dup2(null_fd, 0); > > I prefer we keep error checking for dup2 failures, > and also add more error checking for unchecked dup2 calls. > Can be a separate patch, I suppose. > > Ditto for other dup2 changes I simply figured that since we weren't doing the checks on the other dup2 calls that I would keep it consistent. But if we wanted to add in more error checking then we can in another patch. > > > @@ -558,6 +554,8 @@ int start_command(struct child_process *cmd) > > } > > close(notify_pipe[0]); > > > > + if (null_fd > 0) > > + close(null_fd); > > I would prefer: > > if (null_fd >= 0) > > here, even if we currently do not release stdin. K will do. > > > argv_array_clear(&argv); > > free(childenv); -- Brandon Williams
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 9:12 PM, Jeff King wrote: > On Thu, Apr 13, 2017 at 09:03:26PM +0200, SZEDER Gábor wrote: > >> > Yeah, I had a similar thought. I can't think of any reason why it would >> > trigger a false positive, as long as we account for test-failure and the >> > --debug flag properly. I don't think we can just drop the "-f" from the >> > final "rm", because it may be overriding funny permissions left by >> > tests. >> >> FWIW, I used the following rather simple change during stress-testing >> these patches (barring white-space damage): >> >> diff --git a/t/test-lib.sh b/t/test-lib.sh >> index 13b569682..d7fa15a69 100644 >> --- a/t/test-lib.sh >> +++ b/t/test-lib.sh >> @@ -763,7 +763,7 @@ test_done () { >> >> test -d "$remove_trash" && I'm not sure under what circumstances the trash dir could be missing at this point... >> cd "$(dirname "$remove_trash")" && >> - rm -rf "$(basename "$remove_trash")" >> + rm -rf "$(basename "$remove_trash")" || exit 1 ... but when it is already removed, then I think we should not exit with error here. Nothing that a pair of {} wouldn't handle. > Oh, right. I thought "-f" would cause it to exit 0 even if some items > failed to be removed, but that only applies to ENOENT. So I think that > is probably a good change. I agree it's not a true error, but just a > sign of something fishy, but I don't think it hurts to have extra lint > checks. > > Replacing the "exit 1" with a "die" that has a message is probably a > good idea, though. If we can't 'cd' or 'rm -rf', then they will tell us why they failed anyway, most likely including the name of the trash directory. Do we really need more error messages than that?
Re: [PATCH 3/2] ls-files: only recurse on active submodules
On Thu, Apr 13, 2017 at 12:25 PM, Stefan Beller wrote: > On Thu, Apr 13, 2017 at 12:12 PM, Jacob Keller wrote: >> On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller wrote: >>> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams >>> wrote: Add in a check to see if a submodule is active before attempting to recurse. This prevents 'ls-files' from trying to operate on a submodule which may not exist in the working directory. >>> >>> What would currently happen on recursing into non-active submodules? >>> Can we have a test for this? >>> >>> Thanks, >>> Stefan >> >> We should be able to test for this. Is it possible that we can recurse >> into a submodule as long as we have the clone in .git/modules/<name> >> even if we don't have it checked out currently? > > Conceptually that should be possible, e.g. > > git ls-files --recurse-submodules > > where the ancient ref contained submodules that are not present any more. > In that case we would need to do > > struct strbuf sb = STRBUF_INIT; > struct child_process cp = CHILD_PROCESS_INIT; > struct submodule *sub = submodule_from_path( \ > <sha1>, <path>) > strbuf_git_path(&sb, "modules/%s", sub->name); > > argv_array_pushl(&cp.args, "git", "ls-files", "--recurse", ...); > cp.dir = sb.buf; > run_command(&cp); > > Stefan This is probably the right flow to use then. Thanks, Jake
Re: [PATCH v2 4/6] run-command: don't die in child when duping /dev/null
Brandon Williams wrote: > @@ -487,7 +483,7 @@ int start_command(struct child_process *cmd) > atexit(notify_parent); > > if (cmd->no_stdin) > - dup_devnull(0); > + dup2(null_fd, 0); I prefer we keep error checking for dup2 failures, and also add more error checking for unchecked dup2 calls. Can be a separate patch, I suppose. Ditto for other dup2 changes > @@ -558,6 +554,8 @@ int start_command(struct child_process *cmd) > } > close(notify_pipe[0]); > > + if (null_fd > 0) > + close(null_fd); I would prefer: if (null_fd >= 0) here, even if we currently do not release stdin. > argv_array_clear(&argv); > free(childenv);
Re: [PATCH 3/2] ls-files: only recurse on active submodules
On Thu, Apr 13, 2017 at 12:12 PM, Jacob Keller wrote: > On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller wrote: >> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams wrote: >>> Add in a check to see if a submodule is active before attempting to >>> recurse. This prevents 'ls-files' from trying to operate on a submodule >>> which may not exist in the working directory. >> >> What would currently happen on recursing into non-active submodules? >> Can we have a test for this? >> >> Thanks, >> Stefan > > We should be able to test for this. Is it possible that we can recurse > into a submodule as long as we have the clone in .git/modules/<name> > even if we don't have it checked out currently? Conceptually that should be possible, e.g. git ls-files --recurse-submodules where the ancient ref contained submodules that are not present any more. In that case we would need to do struct strbuf sb = STRBUF_INIT; struct child_process cp = CHILD_PROCESS_INIT; struct submodule *sub = submodule_from_path( \ <sha1>, <path>) strbuf_git_path(&sb, "modules/%s", sub->name); argv_array_pushl(&cp.args, "git", "ls-files", "--recurse", ...); cp.dir = sb.buf; run_command(&cp); Stefan
[PATCH] xgethostname: handle long hostnames
If the full hostname doesn't fit in the buffer supplied to gethostname, POSIX does not specify whether the buffer will be null-terminated, so to be safe, we should do it ourselves. Introduce new function, xgethostname, which ensures that there is always a \0 at the end of the buffer. Signed-off-by: David Turner --- builtin/gc.c | 2 +- builtin/receive-pack.c | 2 +- fetch-pack.c | 2 +- git-compat-util.h | 2 ++ ident.c | 2 +- wrapper.c | 13 + 6 files changed, 19 insertions(+), 4 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index c2c61a57bb..5633483f56 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -250,7 +250,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid) /* already locked */ return NULL; - if (gethostname(my_host, sizeof(my_host))) + if (xgethostname(my_host, sizeof(my_host))) xsnprintf(my_host, sizeof(my_host), "unknown"); pidfile_path = git_pathdup("gc.pid"); diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index aca9c33d8d..fb62a94bc3 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1700,7 +1700,7 @@ static const char *unpack(int err_fd, struct shallow_info *si) argv_array_pushl(&child.args, "index-pack", "--stdin", hdr_arg, NULL); - if (gethostname(hostname, sizeof(hostname))) + if (xgethostname(hostname, sizeof(hostname))) xsnprintf(hostname, sizeof(hostname), "localhost"); argv_array_pushf(&child.args, "--keep=receive-pack %"PRIuMAX" on %s", diff --git a/fetch-pack.c b/fetch-pack.c index d07d85ce30..a899441c34 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -803,7 +803,7 @@ static int get_pack(struct fetch_pack_args *args, argv_array_push(&cmd.args, "--fix-thin"); if (args->lock_pack || unpack_limit) { char hostname[256]; - if (gethostname(hostname, sizeof(hostname))) + if (xgethostname(hostname, sizeof(hostname))) xsnprintf(hostname, sizeof(hostname), "localhost"); argv_array_pushf(&cmd.args, "--keep=fetch-pack %"PRIuMAX " on %s", diff --git a/git-compat-util.h b/git-compat-util.h index 8a4a3f85e7..e49b65c235 100644 --- a/git-compat-util.h 
+++ b/git-compat-util.h @@ -884,6 +884,8 @@ static inline size_t xsize_t(off_t len) __attribute__((format (printf, 3, 4))) extern int xsnprintf(char *dst, size_t max, const char *fmt, ...); +extern int xgethostname(char *buf, size_t len); + /* in ctype.c, for kwset users */ extern const unsigned char tolower_trans_tbl[256]; diff --git a/ident.c b/ident.c index c0364fe3a1..7de9f47c41 100644 --- a/ident.c +++ b/ident.c @@ -122,7 +122,7 @@ static void add_domainname(struct strbuf *out, int *is_bogus) { char buf[1024]; - if (gethostname(buf, sizeof(buf))) { + if (xgethostname(buf, sizeof(buf))) { warning_errno("cannot get host name"); strbuf_addstr(out, "(none)"); *is_bogus = 1; diff --git a/wrapper.c b/wrapper.c index 0542fc7582..d837417709 100644 --- a/wrapper.c +++ b/wrapper.c @@ -655,3 +655,16 @@ void sleep_millisec(int millisec) { poll(NULL, 0, millisec); } + +int xgethostname(char *buf, size_t len) +{ + /* +* If the full hostname doesn't fit in buf, POSIX does not +* specify whether the buffer will be null-terminated, so to +* be safe, do it ourselves. +*/ + int ret = gethostname(buf, len); + if (!ret) + buf[len - 1] = 0; + return ret; +} -- 2.11.GIT
[PATCH 1/1] difftool: fix use-after-free
The left and right base directories were pointed to the buf field of two strbufs, which were subject to change. A contrived test case shows the problem where a file with a long enough name to force the strbuf to grow is up-to-date (hence the code path is used where the work tree's version of the file is reused), and then a file that is not up-to-date needs to be written (hence the code path is used where checkout_entry() uses the previously recorded base_dir that is invalid by now). Let's just copy the base_dir strings for use with checkout_entry(), never touch them until the end, and release them then. This is an easily verifiable fix (as opposed to the next-obvious alternative: to re-set base_dir after every loop iteration). This fixes https://github.com/git-for-windows/git/issues/1124 Signed-off-by: Johannes Schindelin --- builtin/difftool.c | 7 +-- t/t7800-difftool.sh | 19 +++ 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/builtin/difftool.c b/builtin/difftool.c index 25e54ad3edb..568294aac01 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -305,6 +305,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, struct strbuf rpath = STRBUF_INIT, buf = STRBUF_INIT; struct strbuf ldir = STRBUF_INIT, rdir = STRBUF_INIT; struct strbuf wtdir = STRBUF_INIT; + char *lbase_dir, *rbase_dir; size_t ldir_len, rdir_len, wtdir_len; struct cache_entry *ce = xcalloc(1, sizeof(ce) + PATH_MAX + 1); const char *workdir, *tmp; @@ -339,11 +340,11 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, memset(&wtindex, 0, sizeof(wtindex)); memset(&lstate, 0, sizeof(lstate)); - lstate.base_dir = ldir.buf; + lstate.base_dir = lbase_dir = xstrdup(ldir.buf); lstate.base_dir_len = ldir.len; lstate.force = 1; memset(&rstate, 0, sizeof(rstate)); - rstate.base_dir = rdir.buf; + rstate.base_dir = rbase_dir = xstrdup(rdir.buf); rstate.base_dir_len = rdir.len; rstate.force = 1; @@ -626,6 +627,8 @@ static int run_dir_diff(const char *extcmd, 
int symlinks, const char *prefix, finish: free(ce); + free(lbase_dir); + free(rbase_dir); strbuf_release(); strbuf_release(); strbuf_release(); diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh index 0e7f30db2d9..7f09867478c 100755 --- a/t/t7800-difftool.sh +++ b/t/t7800-difftool.sh @@ -393,6 +393,25 @@ test_expect_success 'setup change in subdirectory' ' git commit -m "modified both" ' +test_expect_success 'difftool -d with growing paths' ' + a=aaa && + git init growing && + ( + cd growing && + echo "test -f \"\$2/b\"" | write_script .git/test-for-b.sh && + one=$(printf 1 | git hash-object -w --stdin) && + two=$(printf 2 | git hash-object -w --stdin) && + git update-index --add \ + --cacheinfo 100644,$one,$a --cacheinfo 100644,$two,b && + tree1=$(git write-tree) && + git update-index --add \ + --cacheinfo 100644,$two,$a --cacheinfo 100644,$one,b && + tree2=$(git write-tree) && + git checkout -- $a && + git difftool -d --extcmd .git/test-for-b.sh $tree1 $tree2 + ) +' + run_dir_diff_test () { test_expect_success "$1 --no-symlinks" " symlinks=--no-symlinks && -- 2.12.2.windows.2.1.g7df5db8d31e
[PATCH 0/1] difftool: fix a use-after-free bug
It has been reported previously that the base_dir recorded at the
beginning of run_dir_diff() may go stale due to the buffer to which it
points potentially being realloc()ed. This bug has been fixed in Git for
Windows 2.12.2(2) already.

It took me this long (!!!) to come up with a reliable test case... But
now that I have it, it can be easily verified.

Johannes Schindelin (1):
  difftool: fix use-after-free

 builtin/difftool.c  |  7 +++++--
 t/t7800-difftool.sh | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

base-commit: cf11a67975b057a144618badf16dc4e3d25b9407
Published-As: https://github.com/dscho/git/releases/tag/fix-difftool-d-crash-v1
Fetch-It-Via: git fetch https://github.com/dscho/git fix-difftool-d-crash-v1
--
2.12.2.windows.2.1.g7df5db8d31e
Re: [PATCH 3/4] submodule.c: harden submodule_move_head against broken submodules
On Thu, Apr 13, 2017 at 12:08 PM, Brandon Williams wrote:
> On 04/11, Stefan Beller wrote:
>> Early on in submodule_move_head, just after the check if the submodule is
>> initialized, we need to check if the submodule is populated correctly.
>>
>> If the submodule is initialized but doesn't look populated, this
>> is a red flag and can indicate multiple sorts of failures:
>> (1) The submodule may be recorded at an object name that is missing.
>> (2) The submodule '.git' file link may be broken and it is not pointing
>>     at a repository.
>>
>> In both cases we want to complain to the user in the non-forced mode,
>> and in the forced mode ignore the old state and just move the
>> submodule into its new state with a fixed '.git' file link.
>
> What about the case where you have marked a submodule as active but
> don't have its respective .gitdir yet? For now I think it would be
> acceptable to complain and do nothing/ignore it; in the future we may
> want to actually clone and then check it out.

I agree. With this patch we'd complain in non-forced mode, and in forced
mode we'd also complain as we lack the object. In both cases in the
future we may want to fetch the contents instead.
Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands
On 04/13, Stefan Beller wrote:
> On Thu, Apr 13, 2017 at 12:05 PM, Brandon Williams wrote:
> > On 04/11, Stefan Beller wrote:
> >> This was an oversight when working on the working tree modifying commands
> >> recursing into submodules.
> >>
> >> To test for uninitialized submodules, introduce another submodule that is
> >> uninitialized in the actual tests. By adding it to the branch "add_sub1",
> >> which is the starting point of all other branches, we have wide coverage.
> >>
> ...
> >
> > The 'submodule add' command will make the submodule active, so you'll
> > need to add in a line to subsequently make the submodule inactive for
> > this to work, unless you do it at a later point in time.
>
> Yes, it will make it active, but that doesn't matter here, because at this
> point (in create_lib_submodule_repo) we prepare an upstream
> in submodule_update_repo.
>
> Any later test follows the structure of
>
>     prolog &&
>     reset_work_tree_to no_submodule &&
>     (
>         cd submodule_update &&
>         # do actual test here, in submodule_update
>     )
>
> Note that 'prolog' performs a clone of submodule_update_repo
> to submodule_update, manually setting 'sub1' to active.
>
> 'uninitialized_sub' is not active.
>
> I tried to explain it via
>
>     To test for uninitialized submodules, introduce another submodule,
>     that is uninitialized in the actual tests.
>
> in the commit message, but that is too concise apparently.
> So the resend will explain that a bit more.

Thanks! I just wanted to be sure as you're more familiar with these
tests.

--
Brandon Williams
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 09:03:26PM +0200, SZEDER Gábor wrote:

> > Yeah, I had a similar thought. I can't think of any reason why it would
> > trigger a false positive, as long as we account for test-failure and the
> > --debug flag properly. I don't think we can just drop the "-f" from the
> > final "rm", because it may be overriding funny permissions left by
> > tests.
>
> FWIW, I used the following rather simple change during stress-testing
> these patches (barring white-space damage):
>
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 13b569682..d7fa15a69 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -763,7 +763,7 @@ test_done () {
>
>  		test -d "$remove_trash" &&
>  		cd "$(dirname "$remove_trash")" &&
> -		rm -rf "$(basename "$remove_trash")"
> +		rm -rf "$(basename "$remove_trash")" || exit 1

Oh, right. I thought "-f" would cause it to exit 0 even if some items
failed to be removed, but that only applies to ENOENT. So I think that
is probably a good change. I agree it's not a true error, just a sign of
something fishy, but I don't think it hurts to have extra lint checks.

Replacing the "exit 1" with a "die" that has a message is probably a
good idea, though.

-Peff
Re: [PATCH 3/2] ls-files: only recurse on active submodules
On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller wrote:
> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams wrote:
>> Add in a check to see if a submodule is active before attempting to
>> recurse. This prevents 'ls-files' from trying to operate on a submodule
>> which may not exist in the working directory.
>
> What would currently happen on recursing into non-active submodules?
> Can we have a test for this?
>
> Thanks,
> Stefan

We should be able to test for this. Is it possible that we can recurse
into a submodule as long as we have the clone in .git/modules/ even if
we don't have it checked out currently?

Thanks,
Jake
Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands
On Thu, Apr 13, 2017 at 12:05 PM, Brandon Williams wrote:
> On 04/11, Stefan Beller wrote:
>> This was an oversight when working on the working tree modifying commands
>> recursing into submodules.
>>
>> To test for uninitialized submodules, introduce another submodule that is
>> uninitialized in the actual tests. By adding it to the branch "add_sub1",
>> which is the starting point of all other branches, we have wide coverage.
>>
> ...
>
> The 'submodule add' command will make the submodule active, so you'll
> need to add in a line to subsequently make the submodule inactive for
> this to work, unless you do it at a later point in time.

Yes, it will make it active, but that doesn't matter here, because at
this point (in create_lib_submodule_repo) we prepare an upstream in
submodule_update_repo.

Any later test follows the structure of

    prolog &&
    reset_work_tree_to no_submodule &&
    (
        cd submodule_update &&
        # do actual test here, in submodule_update
    )

Note that 'prolog' performs a clone of submodule_update_repo to
submodule_update, manually setting 'sub1' to active.

'uninitialized_sub' is not active.

I tried to explain it via

    To test for uninitialized submodules, introduce another submodule,
    that is uninitialized in the actual tests.

in the commit message, but that is too concise apparently. So the resend
will explain that a bit more.

Thanks,
Stefan
Re: [PATCH 3/4] submodule.c: harden submodule_move_head against broken submodules
On 04/11, Stefan Beller wrote:
> Early on in submodule_move_head, just after the check if the submodule is
> initialized, we need to check if the submodule is populated correctly.
>
> If the submodule is initialized but doesn't look populated, this
> is a red flag and can indicate multiple sorts of failures:
> (1) The submodule may be recorded at an object name that is missing.
> (2) The submodule '.git' file link may be broken and it is not pointing
>     at a repository.
>
> In both cases we want to complain to the user in the non-forced mode,
> and in the forced mode ignore the old state and just move the
> submodule into its new state with a fixed '.git' file link.

What about the case where you have marked a submodule as active but
don't have its respective .gitdir yet? For now I think it would be
acceptable to complain and do nothing/ignore it; in the future we may
want to actually clone and then check it out.

--
Brandon Williams
Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands
On 04/11, Stefan Beller wrote:
> This was an oversight when working on the working tree modifying commands
> recursing into submodules.
>
> To test for uninitialized submodules, introduce another submodule that is
> uninitialized in the actual tests. By adding it to the branch "add_sub1",
> which is the starting point of all other branches, we have wide coverage.
>
> Signed-off-by: Stefan Beller
> ---
>  submodule.c               | 3 +++
>  t/lib-submodule-update.sh | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/submodule.c b/submodule.c
> index c52d6634c5..2fa42519a4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1332,6 +1332,9 @@ int submodule_move_head(const char *path,
>  	struct child_process cp = CHILD_PROCESS_INIT;
>  	const struct submodule *sub;
>
> +	if (!is_submodule_initialized(path))
> +		return 0;
> +
>  	sub = submodule_from_path(null_sha1, path);
>
>  	if (!sub)
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index fb4f7b014e..22dd9e060c 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -73,6 +73,7 @@ create_lib_submodule_repo () {
>
>  		git checkout -b "add_sub1" &&
>  		git submodule add ../submodule_update_sub1 sub1 &&
> +		git submodule add ../submodule_update_sub1 uninitialized_sub &&

The 'submodule add' command will make the submodule active, so you'll
need to add in a line to subsequently make the submodule inactive for
this to work, unless you do it at a later point in time.

>  		git config -f .gitmodules submodule.sub1.ignore all &&
>  		git config submodule.sub1.ignore all &&
>  		git add .gitmodules &&
> --
> 2.12.2.603.g7b28dc31ba

--
Brandon Williams
Re: [PATCH 3/2] ls-files: only recurse on active submodules
On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams wrote:
> Add in a check to see if a submodule is active before attempting to
> recurse. This prevents 'ls-files' from trying to operate on a submodule
> which may not exist in the working directory.

What would currently happen on recursing into non-active submodules?
Can we have a test for this?

Thanks,
Stefan
Re: Simultaneous gc and repack
On Thu, 2017-04-13 at 12:36 -0600, Martin Fick wrote:
> On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> > On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
> > > > > Git gc locks the repository (using a gc.pid file) so
> > > > > that other gcs don't run concurrently. But git repack
> > > > > doesn't respect this lock, so it's possible to have a
> > > > > repack running at the same time as a gc. This makes
> > > > > the gc sad when its packs are deleted out from under it
> > > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > > > > accessed". Then it dies, leaving a large temp file
> > > > > hanging around.
> > > > >
> > > > > Does the following seem reasonable?
> > > > >
> > > > > 1. Make git repack, by default, check for a gc.pid file
> > > > >    (using the same logic as git gc itself does).
> > > > > 2. Provide a --force option to git repack to ignore said
> > > > >    check.
> > > > > 3. Make git gc provide that --force option when it calls
> > > > >    repack under its own lock.
> > > >
> > > > What about just making the code that calls repack today
> > > > just call gc instead? I guess it's more work if you don't
> > > > strictly need it but..?
> > >
> > > There are many scenarios where this does not achieve the
> > > same thing. On the obvious side, gc does more than
> > > repacking, but on the other side, repacking has many
> > > switches that are not available via gc.
> > >
> > > Would it make more sense to move the lock to repack
> > > instead of to gc?
> >
> > Other gc operations might step on each other too (e.g.
> > packing refs). That would be less bad (and less common),
> > but it still seems worth avoiding.
>
> Yes, but all of those operations need to be self-protected
> already, or they risk the same issue.

They are individually protected. But I would rather fail fast. And I'm
not sure what happens if git prune happens during a git repack -a;
might the repack fail if an object that it expects to pack is pruned
out from under it?
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 7:57 PM, Jeff King wrote:
> On Thu, Apr 13, 2017 at 10:55:08AM -0700, Stefan Beller wrote:
>
>> On Thu, Apr 13, 2017 at 9:37 AM, Jeff King wrote:
>> > Ah, OK, that makes more sense. I can detect it reliably by just checking
>> >
>> >   ! test -d $root/trash*
>> >
>> > in my stress-test after the test successfully completes.
>>
>> Would we want to report such an error from the test suite itself?
>> (My line of thinking is that this is a similar issue to broken && chains,
>> which we do report.)

A broken &&-chain can harm the test's correctness by hiding errors. The
failing 'rm -rf $trash' during housekeeping, after all the tests in the
test script have been run and their results reported... not really, I
would think, though arguably it's a sign that something is fishy.

> Yeah, I had a similar thought. I can't think of any reason why it would
> trigger a false positive, as long as we account for test-failure and the
> --debug flag properly. I don't think we can just drop the "-f" from the
> final "rm", because it may be overriding funny permissions left by
> tests.

FWIW, I used the following rather simple change during stress-testing
these patches (barring white-space damage):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 13b569682..d7fa15a69 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -763,7 +763,7 @@ test_done () {

 		test -d "$remove_trash" &&
 		cd "$(dirname "$remove_trash")" &&
-		rm -rf "$(basename "$remove_trash")"
+		rm -rf "$(basename "$remove_trash")" || exit 1

 		test_at_end_hook_
[PATCH 3/2] ls-files: only recurse on active submodules
Add in a check to see if a submodule is active before attempting to
recurse. This prevents 'ls-files' from trying to operate on a submodule
which may not exist in the working directory.

Signed-off-by: Brandon Williams
---

After you mentioned possibly needing to check if a submodule is
initialized, I went back and noticed that there was indeed no check for
it... So this patch adds in the necessary check to see if a submodule is
active or not before attempting to recurse.

 builtin/ls-files.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index d449e46db..10ddc0306 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -15,6 +15,7 @@
 #include "string-list.h"
 #include "pathspec.h"
 #include "run-command.h"
+#include "submodule.h"

 static int abbrev;
 static int show_deleted;
@@ -235,7 +236,8 @@ static void show_ce_entry(const char *tag, const struct cache_entry *ce)
 		die("git ls-files: internal error - cache entry not superset of prefix");

 	if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
-	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
+	    submodule_path_match(&pathspec, name.buf, ps_matched) &&
+	    is_submodule_initialized(ce->name)) {
 		show_gitlink(ce);
 	} else if (match_pathspec(&pathspec, name.buf, name.len,
 				  len, ps_matched,
@@ -604,8 +606,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (require_work_tree && !is_inside_work_tree())
 		setup_work_tree();

-	if (recurse_submodules)
+	if (recurse_submodules) {
+		gitmodules_config();
 		compile_submodule_options(argv, &dir, show_tag);
+	}

 	if (recurse_submodules &&
 	    (show_stage || show_deleted || show_others || show_unmerged ||
--
2.12.2.762.g0e3151a226-goog
Re: Simultaneous gc and repack
On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
> > > > Git gc locks the repository (using a gc.pid file) so
> > > > that other gcs don't run concurrently. But git repack
> > > > doesn't respect this lock, so it's possible to have a
> > > > repack running at the same time as a gc. This makes
> > > > the gc sad when its packs are deleted out from under it
> > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > > > accessed". Then it dies, leaving a large temp file
> > > > hanging around.
> > > >
> > > > Does the following seem reasonable?
> > > >
> > > > 1. Make git repack, by default, check for a gc.pid file
> > > >    (using the same logic as git gc itself does).
> > > > 2. Provide a --force option to git repack to ignore said
> > > >    check.
> > > > 3. Make git gc provide that --force option when it calls
> > > >    repack under its own lock.
> > >
> > > What about just making the code that calls repack today
> > > just call gc instead? I guess it's more work if you don't
> > > strictly need it but..?
> >
> > There are many scenarios where this does not achieve the
> > same thing. On the obvious side, gc does more than
> > repacking, but on the other side, repacking has many
> > switches that are not available via gc.
> >
> > Would it make more sense to move the lock to repack
> > instead of to gc?
>
> Other gc operations might step on each other too (e.g.
> packing refs). That would be less bad (and less common),
> but it still seems worth avoiding.

Yes, but all of those operations need to be self-protected
already, or they risk the same issue.

-Martin

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules
On 04/13, Stefan Beller wrote:
> On Thu, Apr 13, 2017 at 11:31 AM, Jacob Keller wrote:
> > Spinning out a process is one of the big downsides of working with
> > submodules in our code. Unfortunately, spinning out a process is also
> > one of the biggest ways we isolate submodules, and if we wanted to do
> > this "in-process" we would need to add an abstraction layer that lets
> > us handle submodules in-process in some clean way.
>
> Yeah, if we had less globals and a repo struct (class) which we could
> operate on, that would bring in the abstractions that we'd need.

Agreed, though we're probably pretty far from that becoming a reality.
One day though!

--
Brandon Williams
Re: Simultaneous gc and repack
On Thu, Apr 13, 2017 at 11:28 AM, David Turner wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
>> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
>> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
>> > > Git gc locks the repository (using a gc.pid file) so
>> > > that other gcs don't run concurrently. But git repack
>> > > doesn't respect this lock, so it's possible to have a
>> > > repack running at the same time as a gc. This makes
>> > > the gc sad when its packs are deleted out from under it
>> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
>> > > accessed". Then it dies, leaving a large temp file
>> > > hanging around.
>> > >
>> > > Does the following seem reasonable?
>> > >
>> > > 1. Make git repack, by default, check for a gc.pid file
>> > >    (using the same logic as git gc itself does).
>> > > 2. Provide a --force option to git repack to ignore said
>> > >    check.
>> > > 3. Make git gc provide that --force option when it calls
>> > >    repack under its own lock.
>> >
>> > What about just making the code that calls repack today
>> > just call gc instead? I guess it's more work if you don't
>> > strictly need it but..?
>>
>> There are many scenarios where this does not achieve the
>> same thing. On the obvious side, gc does more than
>> repacking, but on the other side, repacking has many
>> switches that are not available via gc.
>>
>> Would it make more sense to move the lock to repack instead
>> of to gc?
>
> Other gc operations might step on each other too (e.g. packing refs).
> That would be less bad (and less common), but it still seems worth
> avoiding.

It sounds like your original solution would work, though I wouldn't use
"force", and I would either not document it or document it with "this is
only meant to be used by git-gc internally".

Thanks,
Jake
[PATCH v2 2/6] run-command: prepare command before forking
In order to avoid allocation between 'fork()' and 'exec()' the argv
array used in the exec call is prepared prior to forking the process.

In addition to this, the function used to exec is changed from
'execvp()' to 'execv()' as the (p) variant of exec has the potential to
call malloc during the path resolution it performs. Instead we simply
do the path resolution ourselves during the preparation stage prior to
forking.

Signed-off-by: Brandon Williams
---
 run-command.c | 60 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/run-command.c b/run-command.c
index 574b81d3e..9ee9fde97 100644
--- a/run-command.c
+++ b/run-command.c
@@ -221,18 +221,6 @@ static const char **prepare_shell_cmd(struct argv_array *out, const char **argv)
 }

 #ifndef GIT_WINDOWS_NATIVE
-static int execv_shell_cmd(const char **argv)
-{
-	struct argv_array nargv = ARGV_ARRAY_INIT;
-	prepare_shell_cmd(&nargv, argv);
-	trace_argv_printf(nargv.argv, "trace: exec:");
-	sane_execvp(nargv.argv[0], (char **)nargv.argv);
-	argv_array_clear(&nargv);
-	return -1;
-}
-#endif
-
-#ifndef GIT_WINDOWS_NATIVE
 static int child_notifier = -1;

 static void notify_parent(void)
@@ -244,6 +232,35 @@ static void notify_parent(void)
 	 */
 	xwrite(child_notifier, "", 1);
 }
+
+static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
+{
+	if (!cmd->argv[0])
+		die("BUG: command is empty");
+
+	if (cmd->git_cmd) {
+		argv_array_push(out, "git");
+		argv_array_pushv(out, cmd->argv);
+	} else if (cmd->use_shell) {
+		prepare_shell_cmd(out, cmd->argv);
+	} else {
+		argv_array_pushv(out, cmd->argv);
+	}
+
+	/*
+	 * If there are no '/' characters in the command then perform a path
+	 * lookup and use the resolved path as the command to exec. If there
+	 * are '/' characters or if the command wasn't found in the path,
+	 * have exec attempt to invoke the command directly.
+	 */
+	if (!strchr(out->argv[0], '/')) {
+		char *program = locate_in_PATH(out->argv[0]);
+		if (program) {
+			free((char *)out->argv[0]);
+			out->argv[0] = program;
+		}
+	}
+}
 #endif

 static inline void set_cloexec(int fd)
@@ -372,9 +389,13 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
 	int notify_pipe[2];
+	struct argv_array argv = ARGV_ARRAY_INIT;
+
 	if (pipe(notify_pipe))
 		notify_pipe[0] = notify_pipe[1] = -1;

+	prepare_cmd(&argv, cmd);
+
 	cmd->pid = fork();
 	failed_errno = errno;
 	if (!cmd->pid) {
@@ -437,12 +458,9 @@ int start_command(struct child_process *cmd)
 				unsetenv(*cmd->env);
 		}
 	}
-	if (cmd->git_cmd)
-		execv_git_cmd(cmd->argv);
-	else if (cmd->use_shell)
-		execv_shell_cmd(cmd->argv);
-	else
-		sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
+
+	execv(argv.argv[0], (char *const *) argv.argv);
+
 	if (errno == ENOENT) {
 		if (!cmd->silent_exec_failure)
 			error("cannot run %s: %s", cmd->argv[0],
@@ -458,7 +476,7 @@ int start_command(struct child_process *cmd)
 	mark_child_for_cleanup(cmd->pid, cmd);

 	/*
-	 * Wait for child's execvp. If the execvp succeeds (or if fork()
+	 * Wait for child's exec. If the exec succeeds (or if fork()
 	 * failed), EOF is seen immediately by the parent. Otherwise, the
 	 * child process sends a single byte.
 	 * Note that use of this infrastructure is completely advisory,
@@ -467,7 +485,7 @@ int start_command(struct child_process *cmd)
 	close(notify_pipe[1]);
 	if (read(notify_pipe[0], &notify_pipe[1], 1) == 1) {
 		/*
-		 * At this point we know that fork() succeeded, but execvp()
+		 * At this point we know that fork() succeeded, but exec()
 		 * failed. Errors have been reported to our stderr.
 		 */
 		wait_or_whine(cmd->pid, cmd->argv[0], 0);
@@ -475,6 +493,8 @@ int start_command(struct child_process *cmd)
 		cmd->pid = -1;
 	}
 	close(notify_pipe[0]);
+
+	argv_array_clear(&argv);
 }
 #else
 {
--
2.12.2.762.g0e3151a226-goog
Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules
On Thu, Apr 13, 2017 at 11:31 AM, Jacob Keller wrote:
> Spinning out a process is one of the big downsides of working with
> submodules in our code. Unfortunately, spinning out a process is also
> one of the biggest ways we isolate submodules, and if we wanted to do
> this "in-process" we would need to add an abstraction layer that lets
> us handle submodules in-process in some clean way.

Yeah, if we had less globals and a repo struct (class) which we could
operate on, that would bring in the abstractions that we'd need.

Stefan
[PATCH v2 6/6] run-command: add note about forking and threading
All non-Async-Signal-Safe functions (e.g. malloc and die) were removed
between 'fork' and 'exec' in start_command in order to avoid potential
deadlocking when forking while multiple threads are running. This
deadlocking is possible when a thread (other than the one forking) has
acquired a lock and didn't get around to releasing it before the fork.
This leaves the lock in a locked state in the resulting process with no
hope of it ever being released.

Add a note describing this potential pitfall before the call to 'fork()'
so people working in this section of the code know to only use
Async-Signal-Safe functions in the child process.

Signed-off-by: Brandon Williams
---
 run-command.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/run-command.c b/run-command.c
index 4230c4933..1c36e692d 100644
--- a/run-command.c
+++ b/run-command.c
@@ -525,6 +525,15 @@ int start_command(struct child_process *cmd)
 	prepare_cmd(&argv, cmd);
 	childenv = prep_childenv(cmd->env);

+	/*
+	 * NOTE: In order to prevent deadlocking when using threads special
+	 * care should be taken with the function calls made in between the
+	 * fork() and exec() calls. No calls should be made to functions which
+	 * require acquiring a lock (e.g. malloc) as the lock could have been
+	 * held by another thread at the time of forking, causing the lock to
+	 * never be released in the child process. This means only
+	 * Async-Signal-Safe functions are permitted in the child.
+	 */
 	cmd->pid = fork();
 	failed_errno = errno;
 	if (!cmd->pid) {
--
2.12.2.762.g0e3151a226-goog
[PATCH v2 3/6] run-command: prepare child environment before forking
In order to avoid allocation between 'fork()' and 'exec()' prepare the
environment to be used in the child process prior to forking.

Switch to using 'execve()' so that the constructed child environment
can be used in the exec'd process.

Signed-off-by: Brandon Williams
---
 run-command.c | 83 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 74 insertions(+), 9 deletions(-)

diff --git a/run-command.c b/run-command.c
index 9ee9fde97..5e2a03145 100644
--- a/run-command.c
+++ b/run-command.c
@@ -261,6 +261,75 @@ static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
 		}
 	}
 }
+
+static int env_isequal(const char *e1, const char *e2)
+{
+	for (;;) {
+		char c1 = *e1++;
+		char c2 = *e2++;
+		c1 = (c1 == '=') ? '\0' : tolower(c1);
+		c2 = (c2 == '=') ? '\0' : tolower(c2);
+
+		if (c1 != c2)
+			return 0;
+		if (c1 == '\0')
+			return 1;
+	}
+}
+
+static int searchenv(char **env, const char *name)
+{
+	int pos = 0;
+
+	for (; env[pos]; pos++)
+		if (env_isequal(env[pos], name))
+			break;
+
+	return pos;
+}
+
+static int do_putenv(char **env, int env_nr, const char *name)
+{
+	int pos = searchenv(env, name);
+
+	if (strchr(name, '=')) {
+		/* ('key=value'), insert or replace entry */
+		if (pos >= env_nr)
+			env_nr++;
+		env[pos] = (char *) name;
+	} else if (pos < env_nr) {
+		/* otherwise ('key') remove existing entry */
+		env_nr--;
+		memmove(&env[pos], &env[pos + 1],
+			(env_nr - pos) * sizeof(char *));
+		env[env_nr] = NULL;
+	}
+
+	return env_nr;
+}
+
+static char **prep_childenv(const char *const *deltaenv)
+{
+	char **childenv;
+	int childenv_nr = 0, childenv_alloc = 0;
+	int i;
+
+	for (i = 0; environ[i]; i++)
+		childenv_nr++;
+	for (i = 0; deltaenv && deltaenv[i]; i++)
+		childenv_alloc++;
+	/* Add one for the NULL termination */
+	childenv_alloc += childenv_nr + 1;
+
+	childenv = xcalloc(childenv_alloc, sizeof(char *));
+	memcpy(childenv, environ, childenv_nr * sizeof(char *));
+
+	/* merge in deltaenv */
+	for (i = 0; deltaenv && deltaenv[i]; i++)
+		childenv_nr = do_putenv(childenv, childenv_nr, deltaenv[i]);
+
+	return childenv;
+}
 #endif

 static inline void set_cloexec(int fd)
@@ -389,12 +458,14 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
 	int notify_pipe[2];
+	char **childenv;
 	struct argv_array argv = ARGV_ARRAY_INIT;

 	if (pipe(notify_pipe))
 		notify_pipe[0] = notify_pipe[1] = -1;

 	prepare_cmd(&argv, cmd);
+	childenv = prep_childenv(cmd->env);

 	cmd->pid = fork();
 	failed_errno = errno;
@@ -450,16 +521,9 @@ int start_command(struct child_process *cmd)
 	if (cmd->dir && chdir(cmd->dir))
 		die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
 			  cmd->dir);
-	if (cmd->env) {
-		for (; *cmd->env; cmd->env++) {
-			if (strchr(*cmd->env, '='))
-				putenv((char *)*cmd->env);
-			else
-				unsetenv(*cmd->env);
-		}
-	}

-	execv(argv.argv[0], (char *const *) argv.argv);
+	execve(argv.argv[0], (char *const *) argv.argv,
+	       (char *const *) childenv);

 	if (errno == ENOENT) {
 		if (!cmd->silent_exec_failure)
@@ -495,6 +559,7 @@ int start_command(struct child_process *cmd)
 	close(notify_pipe[0]);

 	argv_array_clear(&argv);
+	free(childenv);
 }
 #else
 {
--
2.12.2.762.g0e3151a226-goog
Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules
On Thu, Apr 13, 2017 at 11:15 AM, Stefan Bellerwrote: > On Thu, Apr 13, 2017 at 10:12 AM, Jacob Keller > wrote: >> From: Jacob Keller >> >> Don't assume that the current working directory is the root of the >> repository. > > 1) Oh! This bug might be hidden in other commands, too. > ($ git grep cp.dir -- submodule.c) > Almost certainly. I'm not sure how best to audit this. > 2) But why? > Isn't that what most of setup.c is all about ? (discovery of the root of the > repository, staying there, and invoking the correct subcommand with a prefix) > >> Correctly generate the path for the recursing child >> processes by building it from the work_tree() root instead. Otherwise if >> we run ls-files using --git-dir or --work-tree it will not work >> correctly as it attempts to change directory into a potentially invalid >> location. > > Oh, I see. In that case the setup doesn't cd into the worktree. > Yea we aren't in the worktree when we thought we were. >> Best case, it doesn't exist and we produce an error. Worst >> case we cd into the wrong location and unknown behavior occurs. >> >> Add a new test which highlights this possibility. >> >> Signed-off-by: Jacob Keller >> --- >> I'm not sure that I'm convinced by this method of solving the problem as >> I suspect it has some corner cases (what about when run inside a >> subdirectory? It seems to work for me but I'm not sure...) Additionally, >> it felt weird that there's no helper function for creating a toplevel >> relative path. > > Do we want to run ls-files from the working tree or from the git dir? > For the git dir there would be git_pathdup_submodule. > Well prior to this code we're assuming we are in the worktree. I wasn't actually certain if we should run from within the gitdir or the worktree. Probably we actually want to be in the gitdir so that we can work even if we're not checked out. 
Additionally, we probably need to protect this check with an "is_submodule_initialized" check, unless ls-files somehow ignores the submodule already in this case... it didn't look like it at a glance. > We could introduce > const char *get_submodule_work_tree(const char *submodule_path); > as a wrapper around > mkpathdup("%s/%s", get_git_work_tree(), ce->name); > > Code and test look fine in this patch, > Yea adding a helper function seems like a good thing. Thanks, Jake > Thanks, > Stefan
[PATCH v2 4/6] run-command: don't die in child when duping /dev/null
Signed-off-by: Brandon Williams--- run-command.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/run-command.c b/run-command.c index 5e2a03145..6751b8319 100644 --- a/run-command.c +++ b/run-command.c @@ -117,18 +117,6 @@ static inline void close_pair(int fd[2]) close(fd[1]); } -#ifndef GIT_WINDOWS_NATIVE -static inline void dup_devnull(int to) -{ - int fd = open("/dev/null", O_RDWR); - if (fd < 0) - die_errno(_("open /dev/null failed")); - if (dup2(fd, to) < 0) - die_errno(_("dup2(%d,%d) failed"), fd, to); - close(fd); -} -#endif - static char *locate_in_PATH(const char *file) { const char *p = getenv("PATH"); @@ -458,12 +446,20 @@ int start_command(struct child_process *cmd) #ifndef GIT_WINDOWS_NATIVE { int notify_pipe[2]; + int null_fd = -1; char **childenv; struct argv_array argv = ARGV_ARRAY_INIT; if (pipe(notify_pipe)) notify_pipe[0] = notify_pipe[1] = -1; + if (cmd->no_stdin || cmd->no_stdout || cmd->no_stderr) { + null_fd = open("/dev/null", O_RDWR | O_CLOEXEC | O_NONBLOCK); + if (null_fd < 0) + die_errno(_("open /dev/null failed")); + set_cloexec(null_fd); + } + prepare_cmd(, cmd); childenv = prep_childenv(cmd->env); @@ -487,7 +483,7 @@ int start_command(struct child_process *cmd) atexit(notify_parent); if (cmd->no_stdin) - dup_devnull(0); + dup2(null_fd, 0); else if (need_in) { dup2(fdin[0], 0); close_pair(fdin); @@ -497,7 +493,7 @@ int start_command(struct child_process *cmd) } if (cmd->no_stderr) - dup_devnull(2); + dup2(null_fd, 2); else if (need_err) { dup2(fderr[1], 2); close_pair(fderr); @@ -507,7 +503,7 @@ int start_command(struct child_process *cmd) } if (cmd->no_stdout) - dup_devnull(1); + dup2(null_fd, 1); else if (cmd->stdout_to_stderr) dup2(2, 1); else if (need_out) { @@ -558,6 +554,8 @@ int start_command(struct child_process *cmd) } close(notify_pipe[0]); + if (null_fd > 0) + close(null_fd); argv_array_clear(); free(childenv); } -- 2.12.2.762.g0e3151a226-goog
[PATCH v2 1/6] t5550: use write_script to generate post-update hook
The post-update hook created in t5550-http-fetch-dumb.sh is missing the "#!/bin/sh" line, which can cause issues with portability. Instead create the hook using the 'write_script' function, which includes the proper "#!" line.

Signed-off-by: Brandon Williams
---
 t/t5550-http-fetch-dumb.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 87308cdce..8552184e7 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -20,8 +20,9 @@ test_expect_success 'create http-accessible bare repository with loose objects'
 	(cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
 	 git config core.bare true &&
 	 mkdir -p hooks &&
-	 echo "exec git update-server-info" >hooks/post-update &&
-	 chmod +x hooks/post-update &&
+	 write_script "hooks/post-update" <<-\EOF &&
+	 exec git update-server-info
+	 EOF
 	 hooks/post-update
 	) &&
 	git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
-- 
2.12.2.762.g0e3151a226-goog
[PATCH v2 5/6] run-command: eliminate calls to error handling functions in child
All of our standard error handling paths have the potential to call malloc or take stdio locks; so we must avoid them inside the forked child. Instead, the child only writes an 8 byte struct atomically to the parent through the notification pipe to propagate an error. All user-visible error reporting happens from the parent; even avoiding functions like atexit(3) and exit(3).

Helped-by: Eric Wong
Signed-off-by: Brandon Williams
--- run-command.c | 121 ++ 1 file changed, 89 insertions(+), 32 deletions(-) diff --git a/run-command.c b/run-command.c index 6751b8319..4230c4933 100644 --- a/run-command.c +++ b/run-command.c @@ -211,14 +211,82 @@ static const char **prepare_shell_cmd(struct argv_array *out, const char **argv) #ifndef GIT_WINDOWS_NATIVE static int child_notifier = -1; -static void notify_parent(void) +enum child_errcode { + CHILD_ERR_CHDIR, + CHILD_ERR_ENOENT, + CHILD_ERR_SILENT, + CHILD_ERR_ERRNO, +}; + +struct child_err { + enum child_errcode err; + int syserr; /* errno */ +}; + +static void child_die(enum child_errcode err) { - /* -* execvp failed. If possible, we'd like to let start_command -* know, so failures like ENOENT can be handled right away; but -* otherwise, finish_command will still report the error. -*/ - xwrite(child_notifier, "", 1); + struct child_err buf; + + buf.err = err; + buf.syserr = errno; + + /* write(2) on buf smaller than PIPE_BUF (min 512) is atomic: */ + xwrite(child_notifier, , sizeof(buf)); + _exit(1); +} + +/* + * parent will make it look like the child spewed a fatal error and died + * this is needed to prevent changes to t0061. 
+ */ +static void fake_fatal(const char *err, va_list params) +{ + vreportf("fatal: ", err, params); +} + +static void child_error_fn(const char *err, va_list params) +{ + const char msg[] = "error() should not be called in child\n"; + xwrite(2, msg, sizeof(msg) - 1); +} + +static void child_warn_fn(const char *err, va_list params) +{ + const char msg[] = "warn() should not be called in child\n"; + xwrite(2, msg, sizeof(msg) - 1); +} + +static void NORETURN child_die_fn(const char *err, va_list params) +{ + const char msg[] = "die() should not be called in child\n"; + xwrite(2, msg, sizeof(msg) - 1); + _exit(2); +} + +/* this runs in the parent process */ +static void child_err_spew(struct child_process *cmd, struct child_err *cerr) +{ + static void (*old_errfn)(const char *err, va_list params); + + old_errfn = get_error_routine(); + set_error_routine(fake_fatal); + errno = cerr->syserr; + + switch (cerr->err) { + case CHILD_ERR_CHDIR: + error_errno("exec '%s': cd to '%s' failed", + cmd->argv[0], cmd->dir); + break; + case CHILD_ERR_ENOENT: + error_errno("cannot run %s", cmd->argv[0]); + break; + case CHILD_ERR_SILENT: + break; + case CHILD_ERR_ERRNO: + error_errno("cannot exec '%s'", cmd->argv[0]); + break; + } + set_error_routine(old_errfn); } static void prepare_cmd(struct argv_array *out, const struct child_process *cmd) @@ -355,13 +423,6 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal) code += 128; } else if (WIFEXITED(status)) { code = WEXITSTATUS(status); - /* -* Convert special exit code when execvp failed. 
-*/ - if (code == 127) { - code = -1; - failed_errno = ENOENT; - } } else { error("waitpid is confused (%s)", argv0); } @@ -449,6 +510,7 @@ int start_command(struct child_process *cmd) int null_fd = -1; char **childenv; struct argv_array argv = ARGV_ARRAY_INIT; + struct child_err cerr; if (pipe(notify_pipe)) notify_pipe[0] = notify_pipe[1] = -1; @@ -467,20 +529,16 @@ int start_command(struct child_process *cmd) failed_errno = errno; if (!cmd->pid) { /* -* Redirect the channel to write syscall error messages to -* before redirecting the process's stderr so that all die() -* in subsequent call paths use the parent's stderr. +* Ensure the default die/error/warn routines do not get +* called, they can take stdio locks and malloc. */ - if (cmd->no_stderr || need_err) { - int child_err = dup(2); - set_cloexec(child_err); - set_error_handle(fdopen(child_err, "w")); - } +
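The core trick in this patch — the child reporting failure to the parent as a single atomic write of a small struct over the notification pipe — can be modeled stand-alone as below. This is a simplified sketch, not git's code: it has no xwrite()-style retry-on-EINTR loop and only one error code:

```c
#include <errno.h>
#include <unistd.h>

enum child_errcode { CHILD_ERR_ENOENT = 1 };

struct child_err {
	enum child_errcode err;
	int syserr;	/* errno at the time of failure */
};

/* Child side: one write(2) of a struct smaller than PIPE_BUF (POSIX
 * guarantees at least 512 bytes) is atomic, so the parent sees either
 * the whole record or nothing at all. */
ssize_t report_child_err(int fd, enum child_errcode err)
{
	struct child_err buf;

	buf.err = err;
	buf.syserr = errno;
	return write(fd, &buf, sizeof(buf));
}

/* Parent side: anything short of a full record means the child exec'd
 * successfully and its close-on-exec write end was closed with nothing
 * written. Returns 0 only when an error record was received. */
int read_child_err(int fd, struct child_err *out)
{
	ssize_t n = read(fd, out, sizeof(*out));

	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Because nothing on the child side allocates or takes locks, this reporting path stays async-signal-safe; all human-readable formatting of the error happens in the parent, as child_err_spew() does in the patch.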
[PATCH v2 0/6] forking and threading
v2 does a bit of restructuring based on comments from reviewers. I took the patch by Eric and broke it up and tweaked it a bit to flow better with v2. I left out the part of Eric's patch which did signal manipulation, as I wasn't experienced enough to know what it was doing or why it was necessary. Though I believe the code is structured in such a way that Eric could make another patch on top of this series with just the signal changes.

I switched to using 'execve' instead of 'execvpe' because 'execvpe' isn't a portable call and doesn't work on systems like macOS. This means that the path resolution needs to be done by hand before forking (a function to do just that already existed).

From what I can see, there are now no calls in the child process (after fork and before exec/_exit) which are not Async-Signal-Safe. This means that fork/exec in a threaded context should work without deadlock, and we could potentially move to using vfork instead of fork, though I'll let others more experienced make that decision.

Brandon Williams (6):
  t5550: use write_script to generate post-update hook
  run-command: prepare command before forking
  run-command: prepare child environment before forking
  run-command: don't die in child when duping /dev/null
  run-command: eliminate calls to error handling functions in child
  run-command: add note about forking and threading

 run-command.c              | 291 ++---
 t/t5550-http-fetch-dumb.sh |   5 +-
 2 files changed, 223 insertions(+), 73 deletions(-)

-- 
2.12.2.762.g0e3151a226-goog
Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules
On Thu, Apr 13, 2017 at 11:03 AM, Brandon Williams wrote: > On 04/13, Jacob Keller wrote: >> From: Jacob Keller >> >> Since commit e77aa336f116 ("ls-files: optionally recurse into >> submodules", 2016-10-07) ls-files has known how to recurse into >> submodules when displaying files. >> >> Unfortunately this fails for certain cases, including when nesting more >> than one submodule, called from within a submodule that itself has >> submodules, or when the GIT_DIR environment variable is set. >> >> Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to >> git commands", 2017-03-17) this resulted in an error indicating that >> --prefix and --super-prefix were incompatible. >> >> After this commit, instead, the process loops forever with a GIT_DIR set >> to the parent and continuously reads the parent submodule's files, >> recursing forever. >> >> Fix this by preparing the environment properly for submodules when >> setting up the child process. This is similar to how other commands such >> as grep behave. >> >> This was not caught by the original tests because the scenario is >> avoided if the submodules are created separately and not stored as the >> standard method of putting the submodule git directory under >> .git/modules/. We can update the test to show the failure by the >> addition of "git submodule absorbgitdirs" to the test case. However, >> note that this new test would run forever without the necessary fix in >> this patch. >> >> Signed-off-by: Jacob Keller > > This looks good to me. Thanks again for catching this. Dealing with > submodules definitely isn't easy (I seem to have made a lot of mistakes > that have been cropping up recently)...it would be easier if we didn't > have to spin out a process for each submodule but that's not the world > we live in today :) > Spinning out a process is one of the big downsides of working with submodules in our code. 
Unfortunately, spinning out a process is also one of the biggest ways we isolate submodules, and if we wanted to do this "in-process" we would need to add an abstraction layer that lets us handle submodules in-process in some clean way.
Re: Simultaneous gc and repack
On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote: > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote: > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner > > wrote: > > > Git gc locks the repository (using a gc.pid file) so > > > that other gcs don't run concurrently. But git repack > > > doesn't respect this lock, so it's possible to have a > > > repack running at the same time as a gc. This makes > > > the gc sad when its packs are deleted out from under it > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be > > > accessed". Then it dies, leaving a large temp file > > > hanging around. > > > > > > Does the following seem reasonable? > > > > > > 1. Make git repack, by default, check for a gc.pid file > > > (using the same logic as git gc itself does). > > > 2. Provide a --force option to git repack to ignore said > > > check. 3. Make git gc provide that --force option when > > > it calls repack under its own lock. > > > > What about just making the code that calls repack today > > just call gc instead? I guess it's more work if you don't > > strictly need it but..? > > There are many scenarios where this does not achieve the > same thing. On the obvious side, gc does more than > repacking, but on the other side, repacking has many > switches that are not available via gc. > > Would it make more sense to move the lock to repack instead > of to gc? Other gc operations might step on each other too (e.g. packing refs). That would be less bad (and less common), but it still seems worth avoiding.
Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules
On Thu, Apr 13, 2017 at 10:12 AM, Jacob Keller wrote: > From: Jacob Keller > > Don't assume that the current working directory is the root of the > repository. 1) Oh! This bug might be hidden in other commands, too. ($ git grep cp.dir -- submodule.c) 2) But why? Isn't that what most of setup.c is all about ? (discovery of the root of the repository, staying there, and invoking the correct subcommand with a prefix) > Correctly generate the path for the recursing child > processes by building it from the work_tree() root instead. Otherwise if > we run ls-files using --git-dir or --work-tree it will not work > correctly as it attempts to change directory into a potentially invalid > location. Oh, I see. In that case the setup doesn't cd into the worktree. > Best case, it doesn't exist and we produce an error. Worst > case we cd into the wrong location and unknown behavior occurs. > > Add a new test which highlights this possibility. > > Signed-off-by: Jacob Keller > --- > I'm not sure that I'm convinced by this method of solving the problem as > I suspect it has some corner cases (what about when run inside a > subdirectory? It seems to work for me but I'm not sure...) Additionally, > it felt weird that there's no helper function for creating a toplevel > relative path. Do we want to run ls-files from the working tree or from the git dir? For the git dir there would be git_pathdup_submodule. We could introduce const char *get_submodule_work_tree(const char *submodule_path); as a wrapper around mkpathdup("%s/%s", get_git_work_tree(), ce->name); Code and test look fine in this patch, Thanks, Stefan
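The wrapper Stefan proposes is essentially a path join anchored at the work tree root. A stand-alone sketch of the idea — mkpathdup() is git's allocating printf-into-path helper, so plain snprintf() into a malloc'd buffer stands in for it here:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the proposed get_submodule_work_tree(): build the
 * submodule's path from the work tree root rather than assuming the
 * current working directory is the root. The caller frees the result. */
char *submodule_work_tree(const char *work_tree, const char *name)
{
	size_t n = strlen(work_tree) + strlen(name) + 2;
	char *path = malloc(n);

	if (path)
		snprintf(path, n, "%s/%s", work_tree, name);
	return path;
}
```

In git itself the first argument would come from get_git_work_tree(), which is what makes the result correct even under --git-dir or --work-tree.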
Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 08:08:12PM +0200, SZEDER Gábor wrote: > On Thu, Apr 13, 2017 at 6:44 PM, Jeff King wrote: > > On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote: > > > I did wonder what will happen if Windows learns to daemonize() the > > auto-gc. I don't think we'll get an immediate test failure, but this > > test will become racy again. But this time we'll actually notice the > > racy failure, because the "ls" will report extra packs if it runs before > > the background gc does. At which point we can revisit this. > > Dscho said that it would take significant effort to make daemonize() > work on Windows, so I guess it will take a while before we'll have to > revisit this. Yeah, that's what I figured. I mostly just didn't want to leave a time-bomb for future developers. > > I guess we could probably grep for the "in the background" message from > > the parent gc. OTOH, maybe it is not even worth it. > > That wouldn't work at the moment, because auto gc says that it will go > to the background even on Windows. Ah, OK. Let's not worry about it, then. I think the way your test is constructed we should get a racy failure not long after the change, and your comments would lead people to realize what is going on. -Peff
Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules
On 04/13, Jacob Keller wrote: > From: Jacob Keller> > Don't assume that the current working directory is the root of the > repository. Correctly generate the path for the recursing child > processes by building it from the work_tree() root instead. Otherwise if > we run ls-files using --git-dir or --work-tree it will not work > correctly as it attempts to change directory into a potentially invalid > location. Best case, it doesn't exist and we produce an error. Worst > case we cd into the wrong location and unknown behavior occurs. > > Add a new test which highlights this possibility. > > Signed-off-by: Jacob Keller > --- > I'm not sure that I'm convinced by this method of solving the problem as > I suspect it has some corner cases (what about when run inside a > subdirectory? It seems to work for me but I'm not sure...) Additionally, > it felt weird that there's no helper function for creating a toplevel > relative path. I never considered the case where you use --git-dir or --work-tree, definitely an oversight on my part. This change seems reasonable to me. -- Brandon Williams
Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 6:44 PM, Jeff King wrote: > On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote: > I did wonder what will happen if Windows learns to daemonize() the > auto-gc. I don't think we'll get an immediate test failure, but this > test will become racy again. But this time we'll actually notice the > racy failure, because the "ls" will report extra packs if it runs before > the background gc does. At which point we can revisit this. Dscho said that it would take significant effort to make daemonize() work on Windows, so I guess it will take a while before we'll have to revisit this. > It would be nice if there were a non-racy way to detect whether we > daemonized or not, and complain on Windows when we do. Then we'll be > notified immediately when daemonize() changes by the test failure, > rather than waiting for a racy failure. > > I guess we could probably grep for the "in the background" message from > the parent gc. OTOH, maybe it is not even worth it. That wouldn't work at the moment, because auto gc says that it will go to the background even on Windows. As it is, auto gc first prints the "Auto packing the repo..." message, and calls daemonize() after that. Which message it prints, i.e. "in background" or not, depends solely on the value of the 'gc.autoDetach' config variable, which is true by default. The only platform-specific thing about all this is the #ifdef in daemonize(), but then it's already too late, the misleading message has already been printed. See the discussion and the patch for the same issue in a different test script (bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)), including a patch at the end that prevents auto gc on Windows from lying about going to the background (which I, not being a Windows user, didn't follow through). 
http://public-inbox.org/git/20160505171430.horde.-guvdpzbfs8vi1zcfn4b...@webmail.informatik.kit.edu/T/#u > The racy version > should fail reasonably promptly, I think, and the comments you've left > would point any investigator in the right direction.
Re: Simultaneous gc and repack
On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote: > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote: > > Git gc locks the repository (using a gc.pid file) so > > that other gcs don't run concurrently. But git repack > > doesn't respect this lock, so it's possible to have a > > repack running at the same time as a gc. This makes > > the gc sad when its packs are deleted out from under it > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be > > accessed". Then it dies, leaving a large temp file > > hanging around. > > > > Does the following seem reasonable? > > > > 1. Make git repack, by default, check for a gc.pid file > > (using the same logic as git gc itself does). > > 2. Provide a --force option to git repack to ignore said > > check. 3. Make git gc provide that --force option when > > it calls repack under its own lock. > > What about just making the code that calls repack today > just call gc instead? I guess it's more work if you don't > strictly need it but..? There are many scenarios where this does not achieve the same thing. On the obvious side, gc does more than repacking, but on the other side, repacking has many switches that are not available via gc. Would it make more sense to move the lock to repack instead of to gc? -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules
On 04/13, Jacob Keller wrote: > From: Jacob Keller> > Since commit e77aa336f116 ("ls-files: optionally recurse into > submodules", 2016-10-07) ls-files has known how to recurse into > submodules when displaying files. > > Unfortunately this fails for certain cases, including when nesting more > than one submodule, called from within a submodule that itself has > submodules, or when the GIT_DIR environment variable is set. > > Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to > git commands", 2017-03-17) this resulted in an error indicating that > --prefix and --super-prefix were incompatible. > > After this commit, instead, the process loops forever with a GIT_DIR set > to the parent and continuously reads the parent submodule's files, > recursing forever. > > Fix this by preparing the environment properly for submodules when > setting up the child process. This is similar to how other commands such > as grep behave. > > This was not caught by the original tests because the scenario is > avoided if the submodules are created separately and not stored as the > standard method of putting the submodule git directory under > .git/modules/. We can update the test to show the failure by the > addition of "git submodule absorbgitdirs" to the test case. However, > note that this new test would run forever without the necessary fix in > this patch. > > Signed-off-by: Jacob Keller This looks good to me. Thanks again for catching this. Dealing with submodules definitely isn't easy (I seem to have made a lot of mistakes that have been cropping up recently)...it would be easier if we didn't have to spin out a process for each submodule but that's not the world we live in today :) > --- > I updated and reworded the description so that the problem would be more > obvious. It doesn't always occur, but only when run nested with properly > absorbed gitdirs for the submodules. This explains the reason why the > test case had not caught the issue before. 
> > builtin/ls-files.c | 4 > t/t3007-ls-files-recurse-submodules.sh | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > index d449e46db551..e9b3546ca053 100644 > --- a/builtin/ls-files.c > +++ b/builtin/ls-files.c > @@ -15,6 +15,7 @@ > #include "string-list.h" > #include "pathspec.h" > #include "run-command.h" > +#include "submodule.h" > > static int abbrev; > static int show_deleted; > @@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce) > struct child_process cp = CHILD_PROCESS_INIT; > int status; > > + prepare_submodule_repo_env(_array); > + argv_array_push(_array, GIT_DIR_ENVIRONMENT); > + > if (prefix_len) > argv_array_pushf(_array, "%s=%s", >GIT_TOPLEVEL_PREFIX_ENVIRONMENT, > diff --git a/t/t3007-ls-files-recurse-submodules.sh > b/t/t3007-ls-files-recurse-submodules.sh > index 4cf6ccf5a8ea..c8030dd3299a 100755 > --- a/t/t3007-ls-files-recurse-submodules.sh > +++ b/t/t3007-ls-files-recurse-submodules.sh > @@ -77,6 +77,7 @@ test_expect_success 'ls-files recurses more than 1 level' ' > git -C submodule/subsub commit -m "add d" && > git -C submodule submodule add ./subsub && > git -C submodule commit -m "added subsub" && > + git submodule absorbgitdirs && > git ls-files --recurse-submodules >actual && > test_cmp expect actual > ' > -- > 2.12.2.776.gded3dc243c29.dirty > -- Brandon Williams
Re: Simultaneous gc and repack
On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote: > Git gc locks the repository (using a gc.pid file) so that other gcs > don't run concurrently. But git repack doesn't respect this lock, so > it's possible to have a repack running at the same time as a gc. This > makes the gc sad when its packs are deleted out from under it with: > "fatal: ./objects/pack/pack-$sha.pack cannot be accessed". Then it > dies, leaving a large temp file hanging around. > > Does the following seem reasonable? > > 1. Make git repack, by default, check for a gc.pid file (using the same > logic as git gc itself does). > 2. Provide a --force option to git repack to ignore said check. > 3. Make git gc provide that --force option when it calls repack under > its own lock. > What about just making the code that calls repack today just call gc instead? I guess it's more work if you don't strictly need it but..? Thanks, Jake > This came up because Gitlab runs a repack after every N pushes and a gc > after every M commits, where M >> N. Sometimes, when pushes come in > rapidly, the repack catches the still-running gc and the above badness > happens. At least, that's my understanding: I don't run our Gitlab > servers, but I talked to the person who does and that's what he said. > > Of course, Gitlab could do its own locking, but the general approach > seems like it would help other folks too.
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 10:55:08AM -0700, Stefan Beller wrote: > On Thu, Apr 13, 2017 at 9:37 AM, Jeff King wrote: > > Ah, OK, that makes more sense. I can detect it reliably by just checking > > > > ! test -d $root/trash* > > > > in my stress-test after the test successfully completes. > > Would we want to report such an error from the test suite itself? > (My line of thinking is that this is a similar issue to broken && chains, > which we do report). Yeah, I had a similar thought. I can't think of any reason why it would trigger a false positive, as long as we account for test-failure and the --debug flag properly. I don't think we can just drop the "-f" from the final "rm", because it may be overriding funny permissions left by tests. -Peff
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 9:37 AM, Jeff King wrote: > Ah, OK, that makes more sense. I can detect it reliably by just checking > > ! test -d $root/trash* > > in my stress-test after the test successfully completes. Would we want to report such an error from the test suite itself? (My line of thinking is that this is a similar issue to broken && chains, which we do report). Stefan
Simultaneous gc and repack
Git gc locks the repository (using a gc.pid file) so that other gcs don't run concurrently. But git repack doesn't respect this lock, so it's possible to have a repack running at the same time as a gc. This makes the gc sad when its packs are deleted out from under it with: "fatal: ./objects/pack/pack-$sha.pack cannot be accessed". Then it dies, leaving a large temp file hanging around. Does the following seem reasonable? 1. Make git repack, by default, check for a gc.pid file (using the same logic as git gc itself does). 2. Provide a --force option to git repack to ignore said check. 3. Make git gc provide that --force option when it calls repack under its own lock. This came up because Gitlab runs a repack after every N pushes and a gc after every M commits, where M >> N. Sometimes, when pushes come in rapidly, the repack catches the still-running gc and the above badness happens. At least, that's my understanding: I don't run our Gitlab servers, but I talked to the person who does and that's what he said. Of course, Gitlab could do its own locking, but the general approach seems like it would help other folks too.
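The check proposed in step 1 boils down to reading gc.pid and probing whether that process is still alive. A minimal stand-alone sketch of the idea (git's real lock logic in builtin/gc.c also records and compares hostnames and honors a stale-lock timeout, which this ignores):

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if 'path' names a pid file for a live process (so a gc is
 * presumably still running), 0 otherwise. kill(pid, 0) delivers no
 * signal; it only reports whether the process exists and is ours to
 * signal. */
int gc_lock_held(const char *path)
{
	FILE *fp = fopen(path, "r");
	int pid = 0;

	if (!fp)
		return 0;	/* no lock file: nothing is running */
	if (fscanf(fp, "%d", &pid) != 1)
		pid = 0;	/* unparsable lock file: treat as stale */
	fclose(fp);
	return pid > 0 && kill(pid, 0) == 0;
}
```

A repack that consults such a check before deleting packs would avoid yanking them out from under a running gc; the proposed --force flag would simply skip the call.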
Re: [PATCH] submodule.c: add missing ' in error messages
On Thu, Apr 13, 2017 at 9:40 AM, Ralf Thielow wrote: > Signed-off-by: Ralf Thielow > --- Thanks! Reviewed-by: Stefan Beller
[PATCH v2 2/2] ls-files: fix path used when recursing into submodules
From: Jacob Keller

Don't assume that the current working directory is the root of the repository. Correctly generate the path for the recursing child processes by building it from the work_tree() root instead. Otherwise if we run ls-files using --git-dir or --work-tree it will not work correctly as it attempts to change directory into a potentially invalid location. Best case, it doesn't exist and we produce an error. Worst case we cd into the wrong location and unknown behavior occurs. Add a new test which highlights this possibility. Signed-off-by: Jacob Keller --- I'm not sure that I'm convinced by this method of solving the problem as I suspect it has some corner cases (what about when run inside a subdirectory? It seems to work for me but I'm not sure...) Additionally, it felt weird that there's no helper function for creating a toplevel relative path. builtin/ls-files.c | 5 - t/t3007-ls-files-recurse-submodules.sh | 11 +++ 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index e9b3546ca053..a6c70dbe9ec8 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -203,6 +203,7 @@ static void show_gitlink(const struct cache_entry *ce) { struct child_process cp = CHILD_PROCESS_INIT; int status; + char *dir; prepare_submodule_repo_env(_array); argv_array_push(_array, GIT_DIR_ENVIRONMENT); @@ -221,8 +222,10 @@ static void show_gitlink(const struct cache_entry *ce) argv_array_pushv(, submodule_options.argv); cp.git_cmd = 1; - cp.dir = ce->name; + dir = mkpathdup("%s/%s", get_git_work_tree(), ce->name); + cp.dir = dir; status = run_command(); + free(dir); if (status) exit(status); } diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh index c8030dd3299a..ebb956fd16cc 100755 --- a/t/t3007-ls-files-recurse-submodules.sh +++ b/t/t3007-ls-files-recurse-submodules.sh @@ -82,6 +82,17 @@ test_expect_success 'ls-files recurses more than 1 level' ' test_cmp expect actual ' 
+test_expect_success 'ls-files works with GIT_DIR' ' + cat >expect <<-\EOF && + .gitmodules + c + subsub/d + EOF + + git --git-dir=submodule/.git ls-files --recurse-submodules >actual && + test_cmp expect actual +' + test_expect_success '--recurse-submodules and pathspecs setup' ' echo e >submodule/subsub/e.txt && git -C submodule/subsub add e.txt && -- 2.12.2.776.gded3dc243c29.dirty
[PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules
From: Jacob Keller

Since commit e77aa336f116 ("ls-files: optionally recurse into submodules", 2016-10-07) ls-files has known how to recurse into submodules when displaying files. Unfortunately this fails for certain cases, including when nesting more than one submodule, called from within a submodule that itself has submodules, or when the GIT_DIR environment variable is set. Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to git commands", 2017-03-17) this resulted in an error indicating that --prefix and --super-prefix were incompatible. After this commit, instead, the process loops forever with a GIT_DIR set to the parent and continuously reads the parent submodule's files, recursing forever. Fix this by preparing the environment properly for submodules when setting up the child process. This is similar to how other commands such as grep behave. This was not caught by the original tests because the scenario is avoided if the submodules are created separately and not stored as the standard method of putting the submodule git directory under .git/modules/. We can update the test to show the failure by the addition of "git submodule absorbgitdirs" to the test case. However, note that this new test would run forever without the necessary fix in this patch. Signed-off-by: Jacob Keller --- I updated and reworded the description so that the problem would be more obvious. It doesn't always occur, but only when run nested with properly absorbed gitdirs for the submodules. This explains the reason why the test case had not caught the issue before. 
 builtin/ls-files.c                     | 4 ++++
 t/t3007-ls-files-recurse-submodules.sh | 1 +
 2 files changed, 5 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index d449e46db551..e9b3546ca053 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -15,6 +15,7 @@
 #include "string-list.h"
 #include "pathspec.h"
 #include "run-command.h"
+#include "submodule.h"

 static int abbrev;
 static int show_deleted;
@@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce)
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status;

+	prepare_submodule_repo_env(&cp.env_array);
+	argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
+
 	if (prefix_len)
 		argv_array_pushf(&cp.env_array, "%s=%s",
 				 GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index 4cf6ccf5a8ea..c8030dd3299a 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -77,6 +77,7 @@ test_expect_success 'ls-files recurses more than 1 level' '
 	git -C submodule/subsub commit -m "add d" &&
 	git -C submodule submodule add ./subsub &&
 	git -C submodule commit -m "added subsub" &&
+	git submodule absorbgitdirs &&
 	git ls-files --recurse-submodules >actual &&
 	test_cmp expect actual
 '
--
2.12.2.776.gded3dc243c29.dirty
Errors running gitk on OS X
Running gitk from the command line, while at the top level of a Git
working directory, produces the following error message, and gitk fails
to open:

> objc[1031]: Objective-C garbage collection is no longer supported.
> /usr/local/bin/wish: line 2:  1031 Abort trap: 6  "$(dirname
> $0)/../../../Library/Frameworks/Tk.framework/Versions/8.5/Resources/Wish.app/Contents/MacOS/Wish"
> "$@"

Additionally, the following error message pops up in a window:

> Wish quit unexpectedly.
> Click Reopen to open the application again. Click Report to see more
> detailed information and send a report to Apple.
> Ignore | Report... | Reopen

How can this be fixed?

***

OS: macOS Sierra, Version 10.12.4
Git version: 2.11.0 (Apple Git-81)
Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote:

> Note, that this fd trickery doesn't work on Windows, because due to
> MSYS limitations the git process only inherits the standard fds 0, 1
> and 2 from the shell.  Luckily, it doesn't matter in this case,
> because on Windows daemonize() is basically a noop, thus 'git gc
> --auto' always runs in the foreground.
>
> And since we can now continue the test reliably after the detached gc
> finished, check that there is only a single packfile left at the end,
> i.e. that the detached gc actually did what it was supposed to do.
> Also add a comment at the end of the test script to warn developers of
> future tests about this issue of long running detached gc processes.

The whole thing looks nicely explained, and I'm happy that we're able
to reliably add this extra check at the end of the test.

I did wonder what will happen if Windows learns to daemonize() the
auto-gc.  I don't think we'll get an immediate test failure, but this
test will become racy again.  But this time we'll actually notice the
racy failure, because the "ls" will report extra packs if it runs
before the background gc does.  At which point we can revisit this.

It would be nice if there were a non-racy way to detect whether we
daemonized or not, and complain on Windows when we do.  Then we'll be
notified immediately when daemonize() changes by the test failure,
rather than waiting for a racy failure.  I guess we could probably grep
for the "in the background" message from the parent gc.

OTOH, maybe it is not even worth it.  The racy version should fail
reasonably promptly, I think, and the comments you've left would point
any investigator in the right direction.

-Peff
[PATCH] git-add--interactive.perl: add missing dot in a message
One message appears twice in the translations and the only difference
is a dot at the end.  Add this dot to make the messages identical.

Signed-off-by: Ralf Thielow
---
 git-add--interactive.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 77b4ed53a..709a5f6ce 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1040,7 +1040,7 @@ marked for unstaging."),
 marked for applying."),
 	checkout_index => N__(
 "If the patch applies cleanly, the edited hunk will immediately be
-marked for discarding"),
+marked for discarding."),
 	checkout_head => N__(
 "If the patch applies cleanly, the edited hunk will immediately be
 marked for discarding."),
--
2.12.2.762.g0e3151a22
[PATCH] submodule.c: add missing ' in error messages
Signed-off-by: Ralf Thielow
---
 submodule.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index c52d6634c..02033c97e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1251,7 +1251,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags)
 	cp.dir = path;
 	if (start_command(&cp)) {
 		if (flags & SUBMODULE_REMOVAL_DIE_ON_ERROR)
-			die(_("could not start 'git status in submodule '%s'"),
+			die(_("could not start 'git status' in submodule '%s'"),
 			    path);
 		ret = -1;
 		goto out;
@@ -1264,7 +1264,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags)

 	if (finish_command(&cp)) {
 		if (flags & SUBMODULE_REMOVAL_DIE_ON_ERROR)
-			die(_("could not run 'git status in submodule '%s'"),
+			die(_("could not run 'git status' in submodule '%s'"),
 			    path);
 		ret = -1;
 	}
--
2.12.2.762.g0e3151a22
Re: [PATCH] t6500: don't run detached auto gc at the end of the test script
On Thu, Apr 13, 2017 at 12:03:20AM +0200, SZEDER Gábor wrote:

> > I couldn't get the original to show a failure, though, even under
> > heavy load. So maybe widening the race is enough.
>
> Just to be clear: it's only an occasionally appearing error message.
> There is no failure in the sense of test failure, because 'rm -rf
> $trash' erroring out during housekeeping does not fail the test suite.

Ah, OK, that makes more sense.  I can detect it reliably by just
checking

  ! test -d $root/trash*

in my stress-test after the test successfully completes.

-Peff
RE: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script
Thanks for fixing this!

> -----Original Message-----
> From: SZEDER Gábor [mailto:szeder@gmail.com]
> Sent: Thursday, April 13, 2017 6:32 AM
> To: Junio C Hamano
> Cc: Jeff King; Johannes Sixt; David Turner; git@vger.kernel.org;
> SZEDER Gábor
> Subject: [PATCHv2.1] t6500: wait for detached auto gc at the end of
> the test script
[PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff Hostetler

Version 3 uses a structure copy rather than memcpy and adds a comment.

Jeff Hostetler (1):
  unpack-trees: avoid duplicate ODB lookups during checkout

 unpack-trees.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

--
2.9.3
[PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout
From: Jeff Hostetler

Teach traverse_trees_recursive() to not do redundant ODB lookups when
both directories refer to the same OID.

In operations such as read-tree and checkout, there will likely be many
peer directories that have the same OID when the differences between
the commits are relatively small.  In these cases we can avoid hitting
the ODB multiple times for the same OID.

This patch handles the n=2 and n=3 cases and simply copies the data
rather than repeating the fill_tree_descriptor().

On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced
the overall time by 0.75 seconds when cycling between 2 commits with a
single file difference.

	(avg) before: 22.699
	(avg) after:  21.955

On Linux using p0006-read-tree-checkout.sh with linux.git:

	Test                                                        HEAD^            HEAD
	---------------------------------------------------------------------------------------------
	0006.2: read-tree br_base br_ballast (57994)                0.24(0.20+0.03)  0.24(0.22+0.01) +0.0%
	0006.3: switch between br_base br_ballast (57994)           10.58(6.23+2.86) 10.67(5.94+2.87) +0.9%
	0006.4: switch between br_ballast br_ballast_plus_1 (57994) 0.60(0.44+0.17)  0.57(0.44+0.14) -5.0%
	0006.5: switch between aliases (57994)                      0.59(0.48+0.13)  0.57(0.44+0.15) -3.4%

Signed-off-by: Jeff Hostetler
---
 unpack-trees.c | 37 +++++++++++++++++++++++++++++++++----
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..a674423 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
 	return ret;
 }

+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -553,11 +558,35 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.pathlen += tree_entry_len(p) + 1;
 	newinfo.df_conflicts |= df_conflicts;

+	/*
+	 * Fetch the tree from the ODB for each peer directory in the
+	 * n commits.
+	 *
+	 * For 2- and 3-way traversals, we try to avoid hitting the
+	 * ODB twice for the same OID.  This should yield a nice speed
+	 * up in checkouts and merges when the commits are similar.
+	 *
+	 * We don't bother doing the full O(n^2) search for larger n,
+	 * because wider traversals don't happen that often and we
+	 * avoid the search setup.
+	 *
+	 * When 2 peer OIDs are the same, we just copy the tree
+	 * descriptor data.  This implicitly borrows the buffer
+	 * data from the earlier cell.
+	 */
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i - 1])) {
+			t[i] = t[i - 1];
+			buf[i] = NULL;
+		} else if (i > 1 && are_same_oid(&names[i], &names[i - 2])) {
+			t[i] = t[i - 2];
+			buf[i] = NULL;
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}

 	bottom = switch_cache_bottom(&newinfo);
--
2.9.3
Re: [PATCH 5/7] Add support for gnupg < 1.4
On Wed, Apr 5, 2017 at 3:45 PM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 5, 2017 at 3:04 PM, Tom G. Christensen wrote:
>> This adds an OLD_GNUPG define to the Makefile which when activated will
>> ensure git does not use the --keyid-format argument when calling the
>> 'gpg' program.
>> This is consistent with how 'gpg' was used in git < 2.10.0 and slightly
>> decreases security.
>
> This changes the code Linus Torvalds added in b624a3e67f to mitigate
> the evil32 project generating keys which looked the same for 32 bit
> signatures.
>
> I think this change makes sense, but the Makefile should have a
> slightly scarier warning, something like:
>
> "Define OLD_GNUPG if you need support for gnupg <1.4. Note that this
> will cause git to only show the first 32 bits of PGP keys instead of
> 64, and there's a wide variety of brute-forced 32 bit keys in the wild
> thanks to the evil32 project (https://evil32.com). Enabling this will
> make GPG work old versions, but you might be fooled into accepting

grammar fix: "work on older versions"

> malicious keys as a result".
>
>> Signed-off-by: Tom G. Christensen
>> ---
>>  Makefile        | 6 ++++++
>>  gpg-interface.c | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/Makefile b/Makefile
>> index ca9f16d19..f8f585d21 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -386,6 +386,8 @@ all::
>>  #
>>  # to say "export LESS=FRX (and LV=-c) if the environment variable
>>  # LESS (and LV) is not set, respectively".
>> +#
>> +# Define OLD_GNUPG if you need support for gnupg < 1.4.
>>
>>  GIT-VERSION-FILE: FORCE
>>  	@$(SHELL_PATH) ./GIT-VERSION-GEN
>> @@ -1529,6 +1531,10 @@ ifndef PAGER_ENV
>>  PAGER_ENV = LESS=FRX LV=-c
>>  endif
>>
>> +ifdef OLD_GNUPG
>> +	BASIC_CFLAGS += -DOLD_GNUPG
>> +endif
>> +
>>  QUIET_SUBDIR0 = +$(MAKE) -C # space to separate -C and subdir
>>  QUIET_SUBDIR1 =
>>
>> diff --git a/gpg-interface.c b/gpg-interface.c
>> index e44cc27da..57f1ea792 100644
>> --- a/gpg-interface.c
>> +++ b/gpg-interface.c
>> @@ -224,7 +224,9 @@ int verify_signed_buffer(const char *payload, size_t payload_size,
>>  	argv_array_pushl(&gpg.args,
>>  			 gpg_program,
>>  			 "--status-fd=1",
>> +#ifndef OLD_GNUPG
>>  			 "--keyid-format=long",
>> +#endif
>>  			 "--verify", temp.filename.buf, "-",
>>  			 NULL);
>>
>> --
>> 2.12.2
Re: Rebase sequencer changes prevent exec commands from modifying the todo file?
Hi Stephen,

On Wed, 12 Apr 2017, Stephen Hicks wrote:

> On Wed, Apr 12, 2017 at 3:05 PM, Johannes Schindelin wrote:
> >
> > On Tue, 11 Apr 2017, Stephen Hicks wrote:
> >
> > > I'm hesitant to only use mtime, size, and inode, since it's quite
> > > likely that these are all identical even if the file has changed.
> >
> > Not at all. The mtime and the size will most likely be different.
>
> In my experience, mtime is almost always the same, since the file is
> written pretty much immediately before the exec - as long as the
> command takes a small fraction of a second (which it usually does) the
> mtime (which is in seconds, not millis or micros) will look the same.

Oh, I see, you assume that mtime is in seconds.  However, on pretty
much all platforms supported by Git we use nanosecond-precision mtimes
[*1*].

In practice, the mtimes are not always nanosecond-precision, as some
file systems have a coarser granularity.  It is typically a pretty fine
granularity, though, much finer than a second [*2*].

My mistake, I should have explained that I wanted to suggest adding a
field of type `struct cache_time` to the `todo_list` structure, and to
use ST_MTIME_NSEC() to populate and compare the `nsec` part of that.

Ciao,
Johannes

Footnote *1*: The platforms for which we disable nanosecond mtimes are:
OSF-1, AIX, HP-UX, Interix, Minix, NONSTOP_KERNEL (?), and QNX.  (If
you look at git.git's config.mak.uname, you will see that Git for
Windows' nanosecond support has not yet made it upstream.)  In other
words, the major platforms (Windows, MacOSX and Linux) all compare the
nanosecond part of the mtime.

Footnote *2*: The coarsest mtime of which I know is FAT32's (2-second
granularity).  If you want to use Git on such a file system, well,
that's what you get.  And then some performance penalties, too.  It
would appear from a quick web search that ext4 has true nanosecond
granularity.  NTFS has a 100ns granularity, which is still plenty for
our purposes.
gitk feature request: checkout any commit
Hello,

it would be helpful if gitk allowed checkouts not only of local
branches but also of remote branches and any other commit.  For remote
branches, it is sufficient to remove the code that greys out the
context menu entry.  The context menu for regular commits would need a
new entry.  Of course, this can easily be done via the command line -
but that's true for every gitk feature :)

Another feature that would really suit gitk is the "Tools" menu from
git-gui.  A simple 1:1 copy would be fine ;)

--
regards, stef
Re: [PATCH 7/7] Do not use curl_easy_strerror with curl < 7.12.0
On Wed, Apr 12, 2017 at 11:28 PM, Junio C Hamano wrote:
> Jeff King writes:
>
>> On Wed, Apr 05, 2017 at 03:04:24PM +0200, Tom G. Christensen wrote:
>> ...
>> These kinds of interleaved conditionals make me nervous that we'll get
>> something wrong (especially without braces, it's not immediately clear
>> that both sides are a single statement).
>>
>> I wonder if it would be more readable to do something like:
>>
>>   #if LIBCURL_VERSION_NUM < 0x070c00
>>   static const char *curl_easy_strerror(CURL *curl)
>>   {
>>           return "[error code unavailable; curl version too old]";
>>   }
>>   #endif
>>
>> Then callers don't have to individually deal with the ifdef. It does
>> mean that the user sees that kind-of ugly message, but maybe that is a
>> good thing. They know they need to upgrade curl to see more details.
>
> Yup, thanks for a very good suggestion.

I also think this is a good solution.

Thanks,
Jake
Re: [PATCH] status: show in-progress info for short status
On Fri, Apr 7, 2017 at 4:05 PM, Michael J Gruber wrote:
> SZEDER Gábor venit, vidit, dixit 06.04.2017 16:33:
>>> @@ -1779,6 +1780,31 @@ static void wt_shortstatus_print_tracking(struct wt_status *s)
>>> 	}
>>>
>>> 	color_fprintf(s->fp, header_color, "]");
>>> +
>>> + inprogress:
>>> +	if (!s->show_inprogress)
>>> +		goto conclude;
>>> +	memset(&state, 0, sizeof(state));
>>> +	wt_status_get_state(&state,
>>> +			    s->branch && !strcmp(s->branch, "HEAD"));
>>> +	if (state.merge_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("MERGING")));
>>> +	else if (state.am_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("AM")));
>>> +	else if (state.rebase_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REBASE-m")));

In the prompt "REBASE-m" is only shown during 'rebase --merge', while
"REBASE" is shown during a plain 'rebase'.  Here "REBASE-m" will be
shown during both.

>>> +	else if (state.rebase_interactive_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REBASE-i")));
>>> +	else if (state.cherry_pick_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("CHERRY-PICKING")));
>>> +	else if (state.revert_in_progress)
>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REVERTING")));
>>> +	if (state.bisect_in_progress)
>>
>> else if?
>
> No. You can do all of the above during a bisect.

Right indeed.  I think the prompt should do the same.  Onto the TODO
list it goes, then.

>>> +		color_fprintf(s->fp, header_color, "; %s", LABEL(N_("BISECTING")));
>>> +	free(state.branch);
>>> +	free(state.onto);
>>> +	free(state.detached_from);
>>> +
>>>  conclude:
>>> 	fputc(s->null_termination ? '\0' : '\n', s->fp);
>>> }
>>
>> This reminded me of a patch that I have been using for almost two
>> years now...
>>
>> git-prompt.sh's similar long conditional chain to show the ongoing
>> operation has an else-branch at the end showing "AM/REBASE".  Your
>> patch doesn't add an equivalent branch.  Is this intentional or an
>> oversight?
>
> I go over all states that wt_status_get_state can give.

Yeah, but are those states exclusive?  Or is it possible that both
'am_in_progress' and 'rebase_in_progress' are set?  I suppose it's not
possible.  It would certainly be more obvious if it were a single enum
field instead of a bunch of ints.

>> I suppose it's intentional, because that "AM/REBASE" branch in the
>> prompt seems to be unreachable (see below), but I never took the
>> effort to actually check that (hence the "seems" and that's why I
>> never submitted it).
>
> Note that wt_status_get_state and the prompt script do things quite
> differently.
>
> I suppose that the prompt could make use of the in-progress info being
> exposed by "git status" rather than doing its own thing.

The prompt currently gets all this in-progress info using only Bash
builtins, which is much faster than running a git process in a
subshell.  This is especially important on Windows, where the overhead
of running a git process is large enough to make the runtime of
__git_ps1() noticeable when displaying the prompt.  That's the main
reason for much of the shell complexity and ugliness in git-prompt.sh.
Besides, the current output formats make 'git status' unsuitable for
the prompt.
[PATCHv2.1] t6500: wait for detached auto gc at the end of the test script
The last test in 't6500-gc', 'background auto gc does not run if gc.log
is present and recent but does if it is old', added in a831c06a2 (gc:
ignore old gc.log files, 2017-02-10), may sporadically trigger an error
message from the test harness:

  rm: cannot remove 'trash directory.t6500-gc/.git/objects': Directory not empty

The test in question ends with executing an auto gc in the background,
which occasionally takes so long that it's still running when
'test_done' is about to remove the trash directory.  This 'rm -rf
$trash' in the foreground might race with the detached auto gc to
create and delete files and directories, and gc might (re-)create a
path that 'rm' already visited and removed, triggering the above error
message when 'rm' attempts to remove its parent directory.

Commit bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)
fixed the same problem in a different test script by simply disallowing
background gc.  Unfortunately, what worked there is not applicable
here, because the purpose of this test is to check the behavior of a
detached auto gc.

Make sure that the test doesn't continue before the gc is finished in
the background with a clever bit of shell trickery:

  - Open fd 9 in the shell, to be inherited by the background gc
    process, because our daemonize() only closes the standard fds 0,
    1 and 2.
  - Duplicate this fd 9 to stdout.
  - Read 'git gc's stdout, and thus fd 9, through a command
    substitution.  We don't actually care about gc's output, but this
    construct has two useful properties:
  - This read blocks until stdout or fd 9 are open.  While stdout is
    closed after the main gc process creates the background process
    and exits, fd 9 remains open until the background process exits.
  - The variable assignment from the command substitution gets its
    exit status from the command executed within the command
    substitution, i.e. a failing main gc process will cause the test
    to fail.

Note, that this fd trickery doesn't work on Windows, because due to
MSYS limitations the git process only inherits the standard fds 0, 1
and 2 from the shell.  Luckily, it doesn't matter in this case, because
on Windows daemonize() is basically a noop, thus 'git gc --auto' always
runs in the foreground.

And since we can now continue the test reliably after the detached gc
finished, check that there is only a single packfile left at the end,
i.e. that the detached gc actually did what it was supposed to do.
Also add a comment at the end of the test script to warn developers of
future tests about this issue of long running detached gc processes.

Helped-by: Jeff King
Helped-by: Johannes Sixt
Signed-off-by: SZEDER Gábor
---

Updated subject line, but otherwise the same as v2.

 t/t6500-gc.sh | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 08de2e8ab..cc7acd101 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,6 +67,16 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
 	test_line_count = 2 new # There is one new pack and its .idx
 '

+run_and_wait_for_auto_gc () {
+	# We read stdout from gc for the side effect of waiting until the
+	# background gc process exits, closing its fd 9.  Furthermore, the
+	# variable assignment from a command substitution preserves the
+	# exit status of the main gc process.
+	# Note: this fd trickery doesn't work on Windows, but there is no
+	# need to, because on Win the auto gc always runs in the foreground.
+	doesnt_matter=$(git gc --auto 9>&1)
+}
+
 test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
 	test_commit foo &&
 	test_commit bar &&
@@ -80,7 +90,13 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
 	test-chmtime =-345600 .git/gc.log &&
 	test_must_fail git gc --auto &&
 	test_config gc.logexpiry 2.days &&
-	git gc --auto
+	run_and_wait_for_auto_gc &&
+	ls .git/objects/pack/pack-*.pack >packs &&
+	test_line_count = 1 packs
 '

+# DO NOT leave a detached auto gc process running near the end of the
+# test script: it can run long enough in the background to racily
+# interfere with the cleanup in 'test_done'.
+
 test_done
--
2.12.2.613.g9c5b79913
Re: [PATCH] status: show in-progress info for short status
On Wed, Apr 12, 2017 at 11:09 PM, Junio C Hamano wrote:
> Jacob Keller writes:
>
>> Personally, I would want this to become the default and not have a new
>> option to trigger it. I think we could also extend the porcelain
>> format to include this information as well, but I'm not too familiar
>> with how the v2 format extends or not.
>
> I think the general rule of thumb for --porcelain is that we can
> freely introduce new record types without version bump, and expect
> the reading scripts to ignore unrecognised records (we may need to
> describe this a bit more strongly in our document, though), while
> changes to the format of existing records must require a command
> line option that cannot be turned on by default with configuration
> (or a version bump, if you want to change the output format by
> default).
>
> I am getting the impression that this "we are doing X" is a new and
> distinct record type that existing readers can safely ignore?  If
> that is the case, it may be better to add it without making it
> optional.

I think we are safe in extending porcelain v2.

Thanks,
Jake
Re: [PATCH] status: show in-progress info for short status
On Wed, Apr 12, 2017 at 11:09 PM, Junio C Hamano wrote:
> Jacob Keller writes:
>
>> Personally, I would want this to become the default and not have a new
>> option to trigger it. I think we could also extend the porcelain
>> format to include this information as well, but I'm not too familiar
>> with how the v2 format extends or not.
>
> I think the general rule of thumb for --porcelain is that we can
> freely introduce new record types without version bump, and expect
> the reading scripts to ignore unrecognised records (we may need to
> describe this a bit more strongly in our document, though), while
> changes to the format of existing records must require a command
> line option that cannot be turned on by default with configuration
> (or a version bump, if you want to change the output format by
> default).
>
> I am getting the impression that this "we are doing X" is a new and
> distinct record type that existing readers can safely ignore?  If
> that is the case, it may be better to add it without making it
> optional.

Correct.  But either way we should be free to change and extend the
non-porcelain format without worry, I thought?

Thanks,
Jake
Re: [PATCH 5/7] Add support for gnupg < 1.4
Ævar Arnfjörð Bjarmason writes:

> On Wed, Apr 5, 2017 at 3:04 PM, Tom G. Christensen wrote:
>> This adds an OLD_GNUPG define to the Makefile which when activated will
>> ensure git does not use the --keyid-format argument when calling the
>> 'gpg' program.
>> This is consistent with how 'gpg' was used in git < 2.10.0 and slightly
>> decreases security.
>
> This changes the code Linus Torvalds added in b624a3e67f to mitigate
> the evil32 project generating keys which looked the same for 32 bit
> signatures.
>
> I think this change makes sense, but the Makefile should have a
> slightly scarier warning, something like:
>
> "Define OLD_GNUPG if you need support for gnupg <1.4. Note that this
> will cause git to only show the first 32 bits of PGP keys instead of
> 64, and there's a wide variety of brute-forced 32 bit keys in the wild
> thanks to the evil32 project (https://evil32.com). Enabling this will
> make GPG work old versions, but you might be fooled into accepting
> malicious keys as a result".

Very good point.

It surprised me somewhat to see that this is the only change necessary
(iow, there is no need to tweak anything in the t/ directory).  Perhaps
we would need a test or two (and need to figure out a way to make them
work with both old and more recent GnuPG)?
Re: [PATCH 7/7] Do not use curl_easy_strerror with curl < 7.12.0
Jeff King writes:

> On Wed, Apr 05, 2017 at 03:04:24PM +0200, Tom G. Christensen wrote:
> ...
> These kinds of interleaved conditionals make me nervous that we'll get
> something wrong (especially without braces, it's not immediately clear
> that both sides are a single statement).
>
> I wonder if it would be more readable to do something like:
>
>   #if LIBCURL_VERSION_NUM < 0x070c00
>   static const char *curl_easy_strerror(CURL *curl)
>   {
>           return "[error code unavailable; curl version too old]";
>   }
>   #endif
>
> Then callers don't have to individually deal with the ifdef. It does
> mean that the user sees that kind-of ugly message, but maybe that is a
> good thing. They know they need to upgrade curl to see more details.

Yup, thanks for a very good suggestion.
Re: [PATCH v2 2/2] name-hash: fix buffer overrun
g...@jeffhostetler.com writes:

> From: Kevin Willford
>
> Add check for the end of the entries for the thread partition.
> Add test for lazy init name hash with specific directory structure
>
> The lazy init name hash was causing a buffer overflow when the last
> entry in the index was multiple folders deep with parent folders that
> did not have any files in them.
>
> This adds a test for the boundary condition of the thread partitions
> with the folder structure that was triggering the buffer overflow.
> The test is skipped on single-cpu machines because the original code
> path is used in name-hash.c
>
> The fix was to check if it is the last entry for the thread partition
> in handle_range_dir and not try to use the next entry in the cache.

As I merged the older one already to 'next', I'll queue 1/2 of v2 on
top of them and then the following (which is incremental between v1
and this v2 2/2) on top.

-- >8 --
From: Kevin Willford
Date: Mon, 3 Apr 2017 15:16:42 +
Subject: [PATCH] t3008: skip lazy-init test on a single-core box

The lazy-init codepath will not be exercised unless threaded. Skip
the entire test on a single-core box.

Also replace a hard-coded constant of 2000 (number of cache entries
to manufacture for tests) with a variable with a human-readable name.

Signed-off-by: Kevin Willford
Signed-off-by: Johannes Schindelin
Signed-off-by: Jeff Hostetler
Signed-off-by: Junio C Hamano
---
 t/t3008-ls-files-lazy-init-name-hash.sh | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh b/t/t3008-ls-files-lazy-init-name-hash.sh
index 971975bff4..bdf5198b7e 100755
--- a/t/t3008-ls-files-lazy-init-name-hash.sh
+++ b/t/t3008-ls-files-lazy-init-name-hash.sh
@@ -4,14 +4,22 @@ test_description='Test the lazy init name hash with various folder structures'
 
 . ./test-lib.sh
 
+if test 1 -eq $($GIT_BUILD_DIR/t/helper/test-online-cpus)
+then
+	skip_all='skipping lazy-init tests, single cpu'
+	test_done
+fi
+
+LAZY_THREAD_COST=2000
+
 test_expect_success 'no buffer overflow in lazy_init_name_hash' '
 	(
-		test_seq 2000 | sed "s/^/a_/"
+		test_seq $LAZY_THREAD_COST | sed "s/^/a_/"
 		echo b/b/b
-		test_seq 2000 | sed "s/^/c_/"
+		test_seq $LAZY_THREAD_COST | sed "s/^/c_/"
 		test_seq 50 | sed "s/^/d_/" | tr "\n" "/"; echo d
 	) |
-	sed -e "s/^/100644 $EMPTY_BLOB /" |
+	sed "s/^/100644 $EMPTY_BLOB /" |
 	git update-index --index-info &&
 	test-lazy-init-name-hash -m
 '
-- 
2.12.2-776-gded3dc243c
Re: [PATCH] status: show in-progress info for short status
Jacob Keller writes:

> Personally, I would want this to become the default and not have a new
> option to trigger it. I think we could also extend the porcelain
> format to include this information as well, but I'm not too familiar
> with how the v2 format extends or not.

I think the general rule of thumb for --porcelain is that we can
freely introduce new record types without a version bump, and expect
the reading scripts to ignore unrecognised records (we may need to
describe this a bit more strongly in our documentation, though), while
changes to the format of existing records must require a command-line
option that cannot be turned on by default with configuration (or a
version bump, if you want to change the output format by default).

I am getting the impression that this "we are doing X" is a new and
distinct record type that existing readers can safely ignore? If that
is the case, it may be better to add it without making it optional.