Re: [PATCH v2 00/21] Read `packed-refs` using mmap()
Hi Peff, On Wed, 20 Sep 2017, Jeff King wrote: > On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote: > > > This branch is also available from my fork on GitHub as branch > > `mmap-packed-refs`. > > > > My main uncertainties are: > > > > 1. Does this code actually work on Windows? > > > > I can't really answer (1) due to lack of knowledge. Sorry, I have only a couple of minutes per day to look through the flood of emails from the Git mailing list and to answer them, so I forgot to say that I had a VM build and test Michael's patches, and they worked on Windows. Ciao, Dscho
Re: [PATCH v2 00/21] Read `packed-refs` using mmap()
On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote: > This branch is also available from my fork on GitHub as branch > `mmap-packed-refs`. > > My main uncertainties are: > > 1. Does this code actually work on Windows? > > 2. Did I implement the new compile-time option correctly? (I just >cargo-culted some existing options.) Is there some existing option >that I could piggy-back off of instead of adding a new one? > > 3. Is a compile-time option sufficient, or would the `mmap()` option >need to be configurable at runtime, or even tested at repository >create time as is done for some other filesystem properties in >`create_default_files()`? I can't really answer (1) due to lack of knowledge. For (2), I think it's mostly good, though I raised a small nit. For (3), I think this is really an OS restriction and not a filesystem one, so probably build-time is OK (though of course Windows people may be able to correct me). I left a few comments, but overall it looks quite good to me. -Peff
Re: [PATCH v2 00/21] Read `packed-refs` using mmap()
Hi Michael, On Tue, 19 Sep 2017, Michael Haggerty wrote: > This is v2 of a patch series that changes the reading and caching of the > `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and > Johannes for their comments about v1 [1]. Thank you for the new iteration. > The main change since v1 is to accommodate Windows, which doesn't let > you replace a file using `rename()` if the file is currently mmapped. > This is unfortunate, because it means that Windows will never get the > O(N) → O(lg N) improvement for reading single references that more > capable systems can now enjoy. Triggered by your enquiry, I looked into passing the FILE_SHARE_DELETE flag which I hoped would let us delete the file even if it still is open (and mapped). In my tests, this did not work. If anybody wants to have a look at what I did (and whether they can make it work): https://github.com/dscho/git/tree/replace-wopen > The background was discussed on the mailing list [2]. The bottom line > is that on Windows, keeping the `packed-refs` lock mmapped would be > tantamount to holding reader lock on that file, preventing anybody > (even unrelated processes) from changing the `packed-refs` file while > it is mmapped. This is even worse than the situation for packfiles > (which is solved using `close_all_packs()`), because a packfile, once > created, never needs to be replaced—every packfile has a filename that > is determined from its contents. The worst that can happen if a > packfile is locked is that another process cannot remove it, but that > is not critical for correctness. The `packed-refs` file, on the other > hand, always has the same filename and needs to be overwritten for > correctness. > > So the approach taken here is that a new compile-time option, > `MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then > the `packed-refs` file is read quickly into memory then closed. Another approach would be to imitate close_all_packs() and rely on the Windows-specific code that retries renames in a staggered fashion, waiting a little longer and longer before retrying, and finally telling the user that some file cannot be overwritten: https://github.com/git-for-windows/git/blob/v2.14.1.windows.1/compat/mingw.c#L2439-L2441 This is not a new problem, by the way. If a file is in use while you try to run `git checkout` with a different version of that file, we have the exact same problem on Windows. And we deal with it using that retry_ask_yes_no() function. For this to work, the current process really would need to be able to release all snapshots in one go (for simplicity, I would not even check the filename but simply blow them all away when we want to overwrite packed-refs). I guess I should set aside some time to implement that on top of your series (I *really* want our in-house users to benefit from that O(lg n) improvement). In the meantime, I think this can go forward with the current design. Ciao, Dscho
[PATCH v2 00/21] Read `packed-refs` using mmap()
This is v2 of a patch series that changes the reading and caching of the `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and Johannes for their comments about v1 [1]. The main change since v1 is to accommodate Windows, which doesn't let you replace a file using `rename()` if the file is currently mmapped. This is unfortunate, because it means that Windows will never get the O(N) → O(lg N) improvement for reading single references that more capable systems can now enjoy. The background was discussed on the mailing list [2]. The bottom line is that on Windows, keeping the `packed-refs` lock mmapped would be tantamount to holding reader lock on that file, preventing anybody (even unrelated processes) from changing the `packed-refs` file while it is mmapped. This is even worse than the situation for packfiles (which is solved using `close_all_packs()`), because a packfile, once created, never needs to be replaced—every packfile has a filename that is determined from its contents. The worst that can happen if a packfile is locked is that another process cannot remove it, but that is not critical for correctness. The `packed-refs` file, on the other hand, always has the same filename and needs to be overwritten for correctness. So the approach taken here is that a new compile-time option, `MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then the `packed-refs` file is read quickly into memory then closed. Even in that case, though, this branch brings significant performance benefits, because instead of parsing the whole file and storing it into lots of little objects in a `ref_cache` (which also involves a lot of small memory allocations), we copy the verbatim contents of the file into memory. Then we use the same binary search techniques to find any references that we need to read, just as we would do if the file were memory mapped. This means that we only have to fully parse the references that we are interested in, and hardly have to allocate any additional memory. I did some more careful benchmarks of this code vs. Git 2.14.1 on a repository that is not quite as pathological. The test repo has 110k references that are fully packed in a `packed-refs` file that has the `sorted` trait. The current version is compiled three ways: * `NO_MMAP=YesPlease`—prevents all use of `mmap()`. This variant is O(N) when reading a single reference. * `MMAP_PREVENTS_DELETE=YesPlease`—uses mmap for the initial read, but quickly copies the contents to heap-allocated memory and munmaps right away. This variant is also O(N) when reading a single reference. * default (mmap enabled)—the `packed-refs` file is kept mmapped for as long as it is in use. The commands that I timed were as follows: # for-each-ref, warm cache: $ git -C lots-of-refs for-each-ref --format="%(objectname) %(refname)" >/dev/null # rev-parse, warm cache (this command was run 10 times then the total # time divided by 10): $ git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733 # rev-parse, cold cache (but git binary warm): $ sync ; sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'; git rev-parse v1.0.0; time git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733 (Note that the `rev-parse` commands involve a handful of reference lookups as the argument is DWIMmed.) Results: for-each-ref rev-parse rev-parse warm cachewarm cachecold cache ---- v2.14.1 92 ms 23.7 ms 30 ms NO_MMAP=YesPlease 83 ms3.4 ms 10 ms MMAP_PREVENTS_DELETE=YesPlease82 ms3.5 ms 11 ms default (mmap enabled)81 ms0.8 ms 6 ms So this branch is a little bit faster at iterating over all references, but it really has big advantages when looking up single references. The advantage is smaller if `NO_MMAP` or `MMAP_PREVENTS_DELETE` is set, but is still quite significant. This branch is also available from my fork on GitHub as branch `mmap-packed-refs`. My main uncertainties are: 1. Does this code actually work on Windows? 2. Did I implement the new compile-time option correctly? (I just cargo-culted some existing options.) Is there some existing option that I could piggy-back off of instead of adding a new one? 3. Is a compile-time option sufficient, or would the `mmap()` option need to be configurable at runtime, or even tested at repository create time as is done for some other filesystem properties in `create_default_files()`? Michael [1] https://public-inbox.org/git/cover.1505319366.git.mhag...@alum.mit.edu/ [2] https://public-inbox.org/git/alpine.DEB.2.21.1.1709142101560.4132@virtualbox/ https://public-inbox.org/git/alpine.DEB.2.21.1.1709150012550.219280@virtualbox/ [3] https://github.com/mhagger/git Jeff King (1):
Re: [PATCH v2 00/21]
Duy Nguyenwrites: > On Sun, May 7, 2017 at 11:20 AM, Junio C Hamano wrote: >> On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano wrote: >>> >>> Nguyễn Thái Ngọc Duy writes: >>> >>> > Changes since v1: >>> > >>> > - fopen_or_warn() and warn_on_fopen_errors() are introduced. The >>> >latter's name is probably not great... >>> > - A new patch (first one) to convert a bunch to using xfopen() >>> > - Fix test failure on Windows, found by Johannes Sixt >>> > - Fix the memory leak in log.c, found by Jeff >>> > >>> > There are still a couple of fopen() remained, but I'm getting slow >>> > again so let's get these in first and worry about the rest when >>> > somebody has time. >> >> Hmm, is this topic responsible for recent breakage Travis claims on MacOS X? >> >> https://travis-ci.org/git/git/jobs/229585098#L3091 >> >> seems to expect an error when test-config is fed a-directory but we are >> not getting the expected warning and error? > > Sounds about right. Let me see if defining FREAD_READS_DIRECTORIES on > MacOS X makes travis happy. Thanks. I should have suspected that myself to save a round-trip.
Re: [PATCH v2 00/21]
On Sun, May 7, 2017 at 11:20 AM, Junio C Hamanowrote: > On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano wrote: >> >> Nguyễn Thái Ngọc Duy writes: >> >> > Changes since v1: >> > >> > - fopen_or_warn() and warn_on_fopen_errors() are introduced. The >> >latter's name is probably not great... >> > - A new patch (first one) to convert a bunch to using xfopen() >> > - Fix test failure on Windows, found by Johannes Sixt >> > - Fix the memory leak in log.c, found by Jeff >> > >> > There are still a couple of fopen() remained, but I'm getting slow >> > again so let's get these in first and worry about the rest when >> > somebody has time. > > Hmm, is this topic responsible for recent breakage Travis claims on MacOS X? > > https://travis-ci.org/git/git/jobs/229585098#L3091 > > seems to expect an error when test-config is fed a-directory but we are > not getting the expected warning and error? Sounds about right. Let me see if defining FREAD_READS_DIRECTORIES on MacOS X makes travis happy. -- Duy
Re: [PATCH v2 00/21]
On Thu, May 4, 2017 at 2:45 PM, Junio C Hamanowrote: > > Nguyễn Thái Ngọc Duy writes: > > > Changes since v1: > > > > - fopen_or_warn() and warn_on_fopen_errors() are introduced. The > >latter's name is probably not great... > > - A new patch (first one) to convert a bunch to using xfopen() > > - Fix test failure on Windows, found by Johannes Sixt > > - Fix the memory leak in log.c, found by Jeff > > > > There are still a couple of fopen() remained, but I'm getting slow > > again so let's get these in first and worry about the rest when > > somebody has time. Hmm, is this topic responsible for recent breakage Travis claims on MacOS X? https://travis-ci.org/git/git/jobs/229585098#L3091 seems to expect an error when test-config is fed a-directory but we are not getting the expected warning and error?
Re: [PATCH v2 00/21]
Nguyễn Thái Ngọc Duywrites: > Changes since v1: > > - fopen_or_warn() and warn_on_fopen_errors() are introduced. The >latter's name is probably not great... > - A new patch (first one) to convert a bunch to using xfopen() > - Fix test failure on Windows, found by Johannes Sixt > - Fix the memory leak in log.c, found by Jeff > > There are still a couple of fopen() remained, but I'm getting slow > again so let's get these in first and worry about the rest when > somebody has time. > > Nguyễn Thái Ngọc Duy (21): > Use xfopen() in more places > clone: use xfopen() instead of fopen() > config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD > wrapper.c: add warn_on_fopen_errors() > wrapper.c: add fopen_or_warn() > attr.c: use fopen_or_warn() > ident.c: use fopen_or_warn() > bisect: report on fopen() error > blame: report error on open if graft_file is a directory > log: report errno on failure to fopen() a file > log: fix memory leak in open_next_file() > commit.c: report error on failure to fopen() the graft file > remote.c: report error on failure to fopen() > rerere.c: report error on failure to fopen() > rerere.c: report correct errno > sequencer.c: report error on failure to fopen() > server-info: report error on failure to fopen() > wt-status.c: report error on failure to fopen() > xdiff-interface.c: report errno on failure to stat() or fopen() > config.c: handle error on failing to fopen() > t1308: add a test case on open a config directory Thanks. If the number of parts affected by this series were smaller, it may have made the review easier to have the introduction of a helper and its use in a single larger patch, but there are spread across many, some with files that are touched by different in-flight topics, and these "collection of smaller patches" makes it easier to manage both while reviewing and also merging. All looked good, even though I do share the doubt on the name "warn-on-fopen-errors"; when something applies equally to fopen(3) and underlying open(2), I would tend to call that with open not fopen myself. But that is a minor point.
[PATCH v2 00/21]
Changes since v1: - fopen_or_warn() and warn_on_fopen_errors() are introduced. The latter's name is probably not great... - A new patch (first one) to convert a bunch to using xfopen() - Fix test failure on Windows, found by Johannes Sixt - Fix the memory leak in log.c, found by Jeff There are still a couple of fopen() remained, but I'm getting slow again so let's get these in first and worry about the rest when somebody has time. Nguyễn Thái Ngọc Duy (21): Use xfopen() in more places clone: use xfopen() instead of fopen() config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD wrapper.c: add warn_on_fopen_errors() wrapper.c: add fopen_or_warn() attr.c: use fopen_or_warn() ident.c: use fopen_or_warn() bisect: report on fopen() error blame: report error on open if graft_file is a directory log: report errno on failure to fopen() a file log: fix memory leak in open_next_file() commit.c: report error on failure to fopen() the graft file remote.c: report error on failure to fopen() rerere.c: report error on failure to fopen() rerere.c: report correct errno sequencer.c: report error on failure to fopen() server-info: report error on failure to fopen() wt-status.c: report error on failure to fopen() xdiff-interface.c: report errno on failure to stat() or fopen() config.c: handle error on failing to fopen() t1308: add a test case on open a config directory attr.c| 7 ++- bisect.c | 7 ++- builtin/am.c | 8 ++-- builtin/blame.c | 4 +++- builtin/clone.c | 2 +- builtin/commit.c | 5 + builtin/fast-export.c | 4 +--- builtin/fsck.c| 3 +-- builtin/log.c | 11 --- builtin/merge.c | 4 +--- builtin/pull.c| 3 +-- commit.c | 4 +++- config.c | 5 - config.mak.uname | 3 +++ diff.c| 8 ++-- dir.c | 3 +-- fast-import.c | 4 +--- git-compat-util.h | 3 +++ ident.c | 8 +++- remote-testsvn.c | 8 ++-- remote.c | 4 ++-- rerere.c | 7 --- sequencer.c | 8 server-info.c | 2 +- t/t1308-config-set.sh | 13 - t/t5512-ls-remote.sh | 13 ++--- wrapper.c | 21 + wt-status.c | 3 ++- xdiff-interface.c | 4 ++-- 29 files changed, 103 insertions(+), 76 deletions(-) -- 2.11.0.157.gd943d85
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 12:40:29PM -0700, Junio C Hamano wrote: > > Here's that minor tweak, in case anybody is interested. It's less useful > > without that follow-on that touches "eol" more, but perhaps it increases > > readability on its own. > > Yup, the only thing that the original (with Brian's fix) appears to > be more careful about is it tries very hard to avoid setting boc > past eoc. As we are not checking "boc != eoc" but doing the > comparison, that "careful" appearance does not give us any benefit > in practice, other than having to do an extra "eol ? eol+1 : eoc"; > the result of this patch is easier to read. > > By the way, eoc is "one past the end" of the array that begins at > boc, so setting a pointer to eoc+1 may technically be in violation. > I do not know how much it matters, though ;-) I think that is OK. We are reading a strbuf, so eoc must either be the first character of the PGP signature, or the terminating NUL if there was no signature block[1]. So it's actually _inside_ the array, and eoc+1 is our "one past". -Peff [1] Arguably we should bail when parse_signature() does not find a PGP signature at all. We already bail with "malformed push certificate" when there are other syntactic anomalies.
Re: [PATCH v2 00/21] object_id part 7
Jeff Kingwrites: > Here's that minor tweak, in case anybody is interested. It's less useful > without that follow-on that touches "eol" more, but perhaps it increases > readability on its own. Yup, the only thing that the original (with Brian's fix) appears to be more careful about is it tries very hard to avoid setting boc past eoc. As we are not checking "boc != eoc" but doing the comparison, that "careful" appearance does not give us any benefit in practice, other than having to do an extra "eol ? eol+1 : eoc"; the result of this patch is easier to read. By the way, eoc is "one past the end" of the array that begins at boc, so setting a pointer to eoc+1 may technically be in violation. I do not know how much it matters, though ;-) > -- >8 -- > Subject: [PATCH] receive-pack: simplify eol handling in cert parsing > > The queue_commands_from_cert() function wants to handle each > line of the cert individually. It looks for "\n" in the > to-be-parsed bytes, and special-cases each use of eol (the > end-of-line variable) when we didn't find one. Instead, we > can just set the end-of-line variable to end-of-cert in the > latter case. > > For advancing to the next line, it's OK for us to move our > pointer past end-of-cert, because our loop condition just > checks for pointer inequality. And it doesn't even violate > the ANSI C "no more than one past the end of an array" rule, > because we know in the worst case we've hit the terminating > NUL of the strbuf. > > Signed-off-by: Jeff King > --- > builtin/receive-pack.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c > index 5d9e4da0a..58de2a1a9 100644 > --- a/builtin/receive-pack.c > +++ b/builtin/receive-pack.c > @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command > **tail, > > while (boc < eoc) { > const char *eol = memchr(boc, '\n', eoc - boc); > - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc); > - boc = eol ? eol + 1 : eoc; > + if (!eol) > + eol = eoc; > + tail = queue_command(tail, boc, eol - boc); > + boc = eol + 1; > } > }
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 01:35:36PM -0400, Jeff King wrote: > I thought I'd knock this out quickly before I forgot about it. But it > actually isn't so simple. > > The main caller in read_head_info() does indeed just pass strlen(line) > as the length in each case. But the cert parser really does need us to > respect the line length. So we either have to pass it in, or tie off the > string. > > The latter looks something like the patch below (on top of a minor > tweak around "eol" handling). It's sufficiently ugly that it may not > count as an actual cleanup, though. I'm OK if we just drop the idea. Here's that minor tweak, in case anybody is interested. It's less useful without that follow-on that touches "eol" more, but perhaps it increases readability on its own. -- >8 -- Subject: [PATCH] receive-pack: simplify eol handling in cert parsing The queue_commands_from_cert() function wants to handle each line of the cert individually. It looks for "\n" in the to-be-parsed bytes, and special-cases each use of eol (the end-of-line variable) when we didn't find one. Instead, we can just set the end-of-line variable to end-of-cert in the latter case. For advancing to the next line, it's OK for us to move our pointer past end-of-cert, because our loop condition just checks for pointer inequality. And it doesn't even violate the ANSI C "no more than one past the end of an array" rule, because we know in the worst case we've hit the terminating NUL of the strbuf. Signed-off-by: Jeff King--- builtin/receive-pack.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 5d9e4da0a..58de2a1a9 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command **tail, while (boc < eoc) { const char *eol = memchr(boc, '\n', eoc - boc); - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc); - boc = eol ? eol + 1 : eoc; + if (!eol) + eol = eoc; + tail = queue_command(tail, boc, eol - boc); + boc = eol + 1; } } -- 2.12.2.845.g55fcf8b10
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 11:13:15AM +, brian m. carlson wrote: > > I suggested an additional cleanup around "linelen" in one patch. In the > > name of keeping the number of re-rolls sane, I'm OK if we skip that for > > now (the only reason I mentioned it at all is that you have to justify > > the caveat in the commit message; with the fix, that justification can > > go away). > > Let's leave it as it is, assuming Junio's okay with it. I can send in a > few more patches to clean that up and use skip_prefix that we can drop > on top and graduate separately. > > I think the justification is useful as it is, since it explains why we > no longer want to check that particular value for historical reasons. I thought I'd knock this out quickly before I forgot about it. But it actually isn't so simple. The main caller in read_head_info() does indeed just pass strlen(line) as the length in each case. But the cert parser really does need us to respect the line length. So we either have to pass it in, or tie off the string. The latter looks something like the patch below (on top of a minor tweak around "eol" handling). It's sufficiently ugly that it may not count as an actual cleanup, though. I'm OK if we just drop the idea. --- diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 58de2a1a9..561a982e7 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -1483,13 +1483,10 @@ static void execute_commands(struct command *commands, } static struct command **queue_command(struct command **tail, - const char *line, - int linelen) + const char *line) { struct object_id old_oid, new_oid; struct command *cmd; - const char *refname; - int reflen; const char *p; if (parse_oid_hex(line, _oid, ) || @@ -1498,9 +1495,7 @@ static struct command **queue_command(struct command **tail, *p++ != ' ') die("protocol error: expected old/new/ref, got '%s'", line); - refname = p; - reflen = linelen - (p - line); - FLEX_ALLOC_MEM(cmd, ref_name, refname, reflen); + FLEX_ALLOC_STR(cmd, ref_name, p); oidcpy(>old_oid, _oid); oidcpy(>new_oid, _oid); *tail = cmd; @@ -1510,7 +1505,7 @@ static struct command **queue_command(struct command **tail, static void queue_commands_from_cert(struct command **tail, struct strbuf *push_cert) { - const char *boc, *eoc; + char *boc, *eoc; if (*tail) die("protocol error: got both push certificate and unsigned commands"); @@ -1523,10 +1518,17 @@ static void queue_commands_from_cert(struct command **tail, eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len); while (boc < eoc) { - const char *eol = memchr(boc, '\n', eoc - boc); + char *eol = memchr(boc, '\n', eoc - boc); + char tmp; + if (!eol) eol = eoc; - tail = queue_command(tail, boc, eol - boc); + + tmp = *eol; + *eol = '\0'; + tail = queue_command(tail, boc); + *eol = tmp; + boc = eol + 1; } } @@ -1590,7 +1592,7 @@ static struct command *read_head_info(struct oid_array *shallow) continue; } - p = queue_command(p, line, linelen); + p = queue_command(p, line); } if (push_cert.len)
Re: [PATCH v2 00/21] object_id part 7
Jeff Kingwrites: > On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote: > >> This is part 7 in the continuing transition to use struct object_id. >> >> This series focuses on two main areas: adding two constants for the >> maximum hash size we'll be using (which will be suitable for allocating >> memory) and converting struct sha1_array to struct oid_array. > > Both changes are very welcome. I do think it's probably worth changing > the name of sha1-array.[ch], but it doesn't need to happen immediately. > > I read through the whole series and didn't find anything objectionable. > The pointer-arithmetic fix should perhaps graduate separately. I didn't see anything incorrect when I queued the series, either, and after I re-read it I saw a few minor readability issues, but modulo that this looks ready. I did split the push-cert parsing fix and applied to an older base independently, though. > I suggested an additional cleanup around "linelen" in one patch. In the > name of keeping the number of re-rolls sane, I'm OK if we skip that for > now (the only reason I mentioned it at all is that you have to justify > the caveat in the commit message; with the fix, that justification can > go away). A follow-up after the dust settles could also mention "we earlier mentioned this caveat but with this fix we no longer have to worry about it", no? Thanks both, anyways.
Re: [PATCH v2 00/21] object_id part 7
On Tue, Mar 28, 2017 at 03:31:59AM -0400, Jeff King wrote: > I read through the whole series and didn't find anything objectionable. > The pointer-arithmetic fix should perhaps graduate separately. Junio's welcome to take that patch separately if he likes. > I suggested an additional cleanup around "linelen" in one patch. In the > name of keeping the number of re-rolls sane, I'm OK if we skip that for > now (the only reason I mentioned it at all is that you have to justify > the caveat in the commit message; with the fix, that justification can > go away). Let's leave it as it is, assuming Junio's okay with it. I can send in a few more patches to clean that up and use skip_prefix that we can drop on top and graduate separately. I think the justification is useful as it is, since it explains why we no longer want to check that particular value for historical reasons. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH v2 00/21] object_id part 7
On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote: > This is part 7 in the continuing transition to use struct object_id. > > This series focuses on two main areas: adding two constants for the > maximum hash size we'll be using (which will be suitable for allocating > memory) and converting struct sha1_array to struct oid_array. Both changes are very welcome. I do think it's probably worth changing the name of sha1-array.[ch], but it doesn't need to happen immediately. I read through the whole series and didn't find anything objectionable. The pointer-arithmetic fix should perhaps graduate separately. I suggested an additional cleanup around "linelen" in one patch. In the name of keeping the number of re-rolls sane, I'm OK if we skip that for now (the only reason I mentioned it at all is that you have to justify the caveat in the commit message; with the fix, that justification can go away). -Peff
[PATCH v2 00/21] object_id part 7
This is part 7 in the continuing transition to use struct object_id. This series focuses on two main areas: adding two constants for the maximum hash size we'll be using (which will be suitable for allocating memory) and converting struct sha1_array to struct oid_array. The rationale for adding separate constants for allocating memory is that with a new 256-bit hash function, we're going to need two different items: a constant for allocating memory that's as large as the largest hash, and a global variable telling us size the current hash is. I've opted to provide GIT_MAX_RAWSZ and GIT_MAX_HEXSZ for allocating memory, and leave GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as values that can be later replaced by the aforementioned global. Replacing struct sha1_array with struct oid_array necessarily involves converting the shallow code, so I did that. The structure now handles objects of struct object_id. While I renamed the documentation (since people will search for that), I chose not to rename the sha1-array.[ch] files or the test helper because I didn't think it was worth the hassle, especially for people who don't have rename support turned on by default. There is also a patch for fixing some broken pointer arithmetic that was discovered during review of v1. I don't think it's exploitable, but it seems good to fix anyway. Additional eyes on this patch are welcomed. I chose to use Coccinelle quite a bit in this series, as it automates a lot of the manual work and aides in review. There is also some use of Perl one-liners. This series is available at https://github.com/bk2204/git under object-id-part7; it may be rebased. Changes from v1: * Rebase on current master (no changes). * Remove check for empty line in queue_command. * Add patch 6 to fix invalid pointer arithmetic. brian m. carlson (21): Define new hash-size constants for allocating memory Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ builtin/diff: convert to struct object_id builtin/pull: convert portions to struct object_id builtin/receive-pack: fix incorrect pointer arithmetic builtin/receive-pack: convert portions to struct object_id fsck: convert init_skiplist to struct object_id parse-options-cb: convert sha1_array_append caller to struct object_id test-sha1-array: convert most code to struct object_id sha1_name: convert struct disambiguate_state to object_id sha1_name: convert disambiguate_hint_fn to take object_id submodule: convert check_for_new_submodule_commits to object_id builtin/pull: convert to struct object_id sha1-array: convert internal storage for struct sha1_array to object_id Make sha1_array_append take a struct object_id * Convert remaining callers of sha1_array_lookup to object_id Convert sha1_array_lookup to take struct object_id Convert sha1_array_for_each_unique and for_each_abbrev to object_id Rename sha1_array to oid_array Documentation: update and rename api-sha1-array.txt .../{api-sha1-array.txt => api-oid-array.txt} | 44 +++ bisect.c | 43 --- builtin/blame.c| 4 +- builtin/cat-file.c | 14 +-- builtin/diff.c | 40 +++--- builtin/fetch-pack.c | 2 +- builtin/fetch.c| 6 +- builtin/merge-index.c | 2 +- builtin/merge.c| 2 +- builtin/pack-objects.c | 24 ++-- builtin/patch-id.c | 2 +- builtin/pull.c | 98 +++ builtin/receive-pack.c | 136 ++--- builtin/rev-list.c | 2 +- builtin/rev-parse.c| 4 +- builtin/send-pack.c| 4 +- cache.h| 10 +- combine-diff.c | 18 +-- commit.h | 14 +-- connect.c | 8 +- diff.c | 4 +- diff.h | 4 +- fetch-pack.c | 32 ++--- fetch-pack.h | 4 +- fsck.c | 17 +-- fsck.h | 2 +- hex.c | 2 +- parse-options-cb.c | 8 +- patch-ids.c| 2 +- patch-ids.h| 2 +- ref-filter.c | 22 ++--
Re: [PATCH v2 00/21] Add configuration options for split-index
On Mon, Dec 19, 2016 at 1:02 PM, Duy Nguyenwrote: > On Sat, Dec 17, 2016 at 03:55:26PM +0100, Christian Couder wrote: >> Goal >> >> >> We want to make it possible to use the split-index feature >> automatically by just setting a new "core.splitIndex" configuration >> variable to true. >> >> This can be valuable as split-index can help significantly speed up >> `git rebase` especially along with the work to libify `git apply` >> that has been merged to master >> (see >> https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98) >> and is now in v2.11. > > I've read through the series (*) and I think it looks good, just a few > minor comments here and there. Thanks for your review. I think I addressed all the minor points left in the v3 and the emails I just sent. > (*) guiltily admit that I only skimmed through tests, not giving them > as much attention as I should have
Re: [PATCH v2 00/21] Add configuration options for split-index
Duy Nguyenwrites: > I've read through the series (*) and I think it looks good, just a few > minor comments here and there. > > (*) guiltily admit that I only skimmed through tests, not giving them > as much attention as I should have OK. I'd still want to see them get reviewed, though. Perhaps I'll do so myself once I run out of things to do, but hopefully somebody else gets there first ;-) Thanks.
Re: [PATCH v2 00/21] Add configuration options for split-index
On Sat, Dec 17, 2016 at 03:55:26PM +0100, Christian Couder wrote: > Goal > > > We want to make it possible to use the split-index feature > automatically by just setting a new "core.splitIndex" configuration > variable to true. > > This can be valuable as split-index can help significantly speed up > `git rebase` especially along with the work to libify `git apply` > that has been merged to master > (see > https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98) > and is now in v2.11. I've read through the series (*) and I think it looks good, just a few minor comments here and there. (*) guiltily admit that I only skimmed through tests, not giving them as much attention as I should have -- Duy
Re: [PATCH v2 00/21] Add configuration options for split-index
> The previous versions were: > > RFC: https://github.com/chriscool/git/commits/config-split-index7 > v1: https://github.com/chriscool/git/commits/config-split-index72 The diff since v1 is: diff --git a/Documentation/config.txt b/Documentation/config.txt index 8fbef25cb1..52a3cac4ff 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2782,9 +2782,9 @@ splitIndex.sharedIndexExpire:: index file is created. The value "now" expires all entries immediately, and "never" suppresses expiration altogether. The default value is "one.week.ago". -Note that each time a new split-index file is created, the -mtime of the related shared index file is updated to the -current time. +Note that each time a split index based on a shared index file +is either created or read from, the mtime of the shared index +file is updated to the current time. See linkgit:git-update-index[1]. status.relativePaths:: diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 635d1574b2..46c953b2f2 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -407,10 +407,14 @@ specified by the splitIndex.maxPercentChange config variable (see linkgit:git-config[1]). Each time a new shared index file is created, the old shared index -files are deleted if they are older than what is specified by the -splitIndex.sharedIndexExpire config variable (see +files are deleted if their mtime is older than what is specified by +the splitIndex.sharedIndexExpire config variable (see linkgit:git-config[1]). +To avoid deleting a shared index file that is still used, its mtime is +updated to the current time everytime a new split index based on the +shared index file is either created or read from. + Untracked cache --- diff --git a/builtin/gc.c b/builtin/gc.c index c1e9602892..1e40d45aa2 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -100,8 +100,8 @@ static void gc_config(void) git_config_get_int("gc.auto", _auto_threshold); git_config_get_int("gc.autopacklimit", _auto_pack_limit); git_config_get_bool("gc.autodetach", _auto); -git_config_get_date_string("gc.pruneexpire", _expire); -git_config_get_date_string("gc.worktreepruneexpire", _worktrees_expire); +git_config_get_expiry("gc.pruneexpire", _expire); +git_config_get_expiry("gc.worktreepruneexpire", _worktrees_expire); git_config(git_default_config, NULL); } diff --git a/builtin/update-index.c b/builtin/update-index.c index a14dbf2612..dc1fd0d44d 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -1099,18 +1099,18 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) if (split_index > 0) { if (git_config_get_split_index() == 0) -warning("core.splitIndex is set to false; " -"remove or change it, if you really want to " -"enable split index"); +warning(_("core.splitIndex is set to false; " + "remove or change it, if you really want to " + "enable split index")); if (the_index.split_index) the_index.cache_changed |= SPLIT_INDEX_ORDERED; else add_split_index(_index); } else if (!split_index) { if (git_config_get_split_index() == 1) -warning("core.splitIndex is set to true; " -"remove or change it, if you really want to " -"disable split index"); +warning(_("core.splitIndex is set to true; " + "remove or change it, if you really want to " + "disable split index")); remove_split_index(_index); } diff --git a/cache.h b/cache.h index 8e26aaf05e..279415afbd 100644 --- a/cache.h +++ b/cache.h @@ -1823,11 +1823,13 @@ extern int git_config_get_bool(const char *key, int *dest); extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest); extern int git_config_get_maybe_bool(const char *key, int *dest); extern int git_config_get_pathname(const char *key, const char **dest); -extern int git_config_get_date_string(const char *key, const char **output); extern int git_config_get_untracked_cache(void); extern int git_config_get_split_index(void); extern int git_config_get_max_percent_split_change(void); +/* This dies if the configured or default date is in the future */ +extern int git_config_get_expiry(const char *key, const char **output); + /* * This is a hack for test programs like test-dump-untracked-cache to * ensure that they do not modify the untracked cache when reading it. diff --git a/config.c b/config.c index f88c61bb30..5c52cefd78 100644 --- a/config.c +++ b/config.c @@ -1685,7 +1685,7 @@ int git_config_get_pathname(const char *key, const char **dest) return ret; } -int git_config_get_date_string(const char *key, const char **output) +int git_config_get_expiry(const char *key, const
[PATCH v2 00/21] Add configuration options for split-index
Goal We want to make it possible to use the split-index feature automatically by just setting a new "core.splitIndex" configuration variable to true. This can be valuable as split-index can help significantly speed up `git rebase` especially along with the work to libify `git apply` that has been merged to master (see https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98) and is now in v2.11. Design ~~ The design is similar as the previous work that introduced "core.untrackedCache". The new "core.splitIndex" configuration option can be either true, false or undefined which is the default. When it is true, the split index is created, if it does not already exists, when the index is read. When it is false, the split index is removed if it exists, when the index is read. Otherwise it is left as is. Along with this new configuration variable, the two following options are also introduced: - splitIndex.maxPercentChange This is to avoid having too many changes accumulating in the split index while in split index mode. The git-update-index documentation says: If split-index mode is already enabled and `--split-index` is given again, all changes in $GIT_DIR/index are pushed back to the shared index file. but it is probably better to not expect the user to think about it and to have a mechanism that pushes back all changes to the shared index file automatically when some threshold is reached. The default threshold is when the number of entries in the split index file reaches 20% of the number of entries in the shared index file. The new "splitIndex.maxPercentChange" config option lets people tweak this value. - splitIndex.sharedIndexExpire To make sure that old sharedindex files are eventually removed when a new one has been created, we "touch" the shared index file every time a split index file using the shared index file is either created or read from. Then we can delete shared indexes with an mtime older than one week (by default), when we create a new shared index file. The new "splitIndex.sharedIndexExpire" config option lets people tweak this grace period. This idea was suggested by Duy in: https://public-inbox.org/git/cacsjy8bqmfashf5kjguh+bd7xg98cafnyde964vjypxz-em...@mail.gmail.com/ and after some experiments, I agree that it is much simpler than what I thought could be done during our discussion. Junio also thinks that we have to do "time-based GC" in: https://public-inbox.org/git/xmqqeg33ccjj@gitster.mtv.corp.google.com/ Highlevel view of the patches in the series ~~~ Except for patch 1/21, there are 3 big steps, one for each new configuration variable introduced. The main difference between this patch series and the v1 patch series sent last October is that Step 3 has a few new commits to also update the mtime of the shared index file when a split index based on the shared index file is read from. - Patch 1/21 marks a message for translation. It is a new patch and it can be applied separately. (The patch 1/19 in v1, which was a typo fix, has been merged separately into master already.) Step 1 is: - Patches 2/21 to 5/21 introduce the functions that are reading the "core.splitIndex" configuration variable and tweaking the split index depending on its value. - Patch 6/21 adds a few tests for the new feature. - Patches 7/21 and 8/21 add some documentation for the new feature. The only change since v1 in this step is that some warning messages in 5/21 have been marked for translation as suggested by Duy. Step 2 is: - Patches 9/21 and 10/21 introduce the functions that are reading the "splitIndex.maxPercentChange" configuration variable and regenerating a new shared index file depending on its value. - Patch 11/21 adds a few tests for the new feature. - Patch 12/21 add some documentation for the new feature. The changes since v1 in this step are: - an error message has been marked for translation in 9/21, - camelCase is used in the same error message as suggested by Duy in 9/21, - "return error(...)" is now used as suggested by Junio in 9/21, - too_many_not_shared_entries() is now "static" as suggested by Ramsay in 10/21, - changes made in write_locked_index() have been reorganized as suggested by Duy in 10/21, Step 3 is: - Patches 13/21 to 16/21 introduce the functions that are reading the "splitIndex.sharedIndexExpire" configuration variable and expiring old shared index files depending on its value. - Patch 17/21 adds a few tests for the new feature. - Patches 18/21 and 19/21 are new patches. They update the mtime of the shared index file when a split index based on the shared
[PATCH v2 00/21] git bisect improvements
Hi, a long time ago[1] I sent the first version of this patchset to the list. Since then I wrote different variants of the algorithm, fixed some bugs, made the tests work ;), and finally performed some performance tests to pick the best version of the different variants. For the performance tests I used the Git repositories of the Linux kernel and of Git itself and performed whole-history bisections with a bisect script that decided "good" or "bad" based on the hash of a commit. And another bisect script that did the opposite. I omit the details. The best variant uses a DFS for the traversal without any further "smart" tricks: These tricks led to more "administrative expense" than gain of performance. I'm sorry that it took so long to prepare the 2nd patchset (mainly vacation and other work in between). So I hope it's sufficiently good for inclusion. :) Cheers 1. https://www.mail-archive.com/git@vger.kernel.org/msg86353.html Stephan Beyer (21): bisect: write about `bisect next` in documentation bisect: allow 'bisect run' if no good commit is known t/test-lib-functions.sh: generalize test_cmp_rev t: use test_cmp_rev() where appropriate t6030: generalize test to not rely on current implementation bisect: add test for the bisect algorithm bisect: plug the biggest memory leak bisect: make bisect compile if DEBUG_BISECT is set bisect: make algorithm behavior independent of DEBUG_BISECT bisect: get rid of recursion in count_distance() bisect: use struct node_data array instead of int array bisect: replace clear_distance() by unique markers bisect: use commit instead of commit list as arguments when appropriate bisect: extract get_distance() function from code duplication bisect: introduce distance_direction() bisect: make total number of commits global bisect: rename count_distance() to compute_weight() bisect: prepare for different algorithms based on find_all bisect: use a bottom-up traversal to find relevant weights bisect: compute best bisection in compute_relevant_weights() bisect: get back halfway shortcut Documentation/git-bisect.txt | 24 ++ bisect.c | 481 -- git-bisect.sh | 32 +- t/t2012-checkout-last.sh | 8 +- t/t3308-notes-merge.sh| 8 +- t/t3310-notes-merge-manual-resolve.sh | 8 +- t/t3311-notes-merge-fanout.sh | 6 +- t/t3404-rebase-interactive.sh | 38 +-- t/t3407-rebase-abort.sh | 8 +- t/t3410-rebase-preserve-dropped-merges.sh | 4 +- t/t3411-rebase-preserve-around-merges.sh | 10 +- t/t3414-rebase-preserve-onto.sh | 12 +- t/t3501-revert-cherry-pick.sh | 4 +- t/t3506-cherry-pick-ff.sh | 6 +- t/t3903-stash.sh | 6 +- t/t4150-am.sh | 18 +- t/t5404-tracking-branches.sh | 2 +- t/t5505-remote.sh | 4 +- t/t5520-pull.sh | 36 +-- t/t6022-merge-rename.sh | 2 +- t/t6030-bisect-porcelain.sh | 228 +++--- t/t6036-recursive-corner-cases.sh | 58 ++-- t/t6042-merge-rename-corner-cases.sh | 50 ++-- t/t7003-filter-branch.sh | 8 +- t/t7004-tag.sh| 2 +- t/t7110-reset-merge.sh| 24 +- t/t7201-co.sh | 12 +- t/t7601-merge-pull-config.sh | 17 +- t/t7603-merge-reduce-heads.sh | 30 +- t/t7605-merge-resolve.sh | 5 +- t/t8010-bisect-algorithm.sh | 155 ++ t/t9162-git-svn-dcommit-interactive.sh| 8 +- t/t9300-fast-import.sh| 12 +- t/test-lib-functions.sh | 14 +- 34 files changed, 832 insertions(+), 508 deletions(-) create mode 100755 t/t8010-bisect-algorithm.sh -- 2.8.1.137.g522756c -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
On Fri, Dec 27, 2013 at 12:12 AM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote: Do we even need to expose them as ref-like things as a part of the external API/UI in the first place? For end-user scripts that want to operate in a real or borrowing worktree, there should be no reason to touch this other repository directly. Things like if one of the wortrees tries to check out a branch that is already checked out elsewhere, error out policy may need to consult the other worktrees via $GIT_COMMON_DIR mechanism but at that level we have all the control without contaminating end-user facing ref namespace in a way main/FETCH_HEAD... do. No, external API/UI is just extra bonus. We need to (or should) do so in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal refs. And that is what I consider a confusion-trigger, not a nice bonus. I do not think it is particularly a good idea to contaminate end-user namespace for this kind of peek another repository special operation. In order to handle your multiple worktrees manipulating the same branch case sanely, you need to be aware of not just the real repository your worktree is borrowing from, but also _all_ the other worktrees that borrow from that same real repository, so a single main virtual namespace will not cut it. You will be dealing with an unbounded set of HEADs, one for each such worktree. Yes. My problem is, while all secondary worktrees are in $GIT_DIR/repos and their HEADs can be accessed there with repos/xxx/HEAD, the first worktree's HEAD can't be accessed this way because HEAD in a linked checkouts means repos/my worktree/HEAD. Can't we do this by adding a with this real repository layer, e.g. resolve HEAD wrt that repository, somewhat similar to how we peek into submodule namespaces? Hmm.. never thought of it like a submodule. Thanks for the idea. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
Duy Nguyen pclo...@gmail.com writes: On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote: Do we even need to expose them as ref-like things as a part of the external API/UI in the first place? For end-user scripts that want to operate in a real or borrowing worktree, there should be no reason to touch this other repository directly. Things like if one of the wortrees tries to check out a branch that is already checked out elsewhere, error out policy may need to consult the other worktrees via $GIT_COMMON_DIR mechanism but at that level we have all the control without contaminating end-user facing ref namespace in a way main/FETCH_HEAD... do. No, external API/UI is just extra bonus. We need to (or should) do so in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal refs. And that is what I consider a confusion-trigger, not a nice bonus. I do not think it is particularly a good idea to contaminate end-user namespace for this kind of peek another repository special operation. In order to handle your multiple worktrees manipulating the same branch case sanely, you need to be aware of not just the real repository your worktree is borrowing from, but also _all_ the other worktrees that borrow from that same real repository, so a single main virtual namespace will not cut it. You will be dealing with an unbounded set of HEADs, one for each such worktree. Can't we do this by adding a with this real repository layer, e.g. resolve HEAD wrt that repository, somewhat similar to how we peek into submodule namespaces? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: I am not happy with the choice of main/HEAD that would squat on a good name for remote-tracking branch (i.e. s/origin/main/), though. $GIT_DIR/COMMON_HEAD perhaps? It's not just about HEAD. Anything worktree-specific of the main checkout can be accessed this way, e.g. main/index, main/FETCH_HEAD and it's not exactly common because it's worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...) ? Do we even need to expose them as ref-like things as a part of the external API/UI in the first place? For end-user scripts that want to operate in a real or borrowing worktree, there should be no reason to touch this other repository directly. Things like if one of the wortrees tries to check out a branch that is already checked out elsewhere, error out policy may need to consult the other worktrees via $GIT_COMMON_DIR mechanism but at that level we have all the control without contaminating end-user facing ref namespace in a way main/FETCH_HEAD... do. No, external API/UI is just extra bonus. We need to (or should) do so in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal refs. Given any ref, git_path(ref) gives the path to that ref, git_path(logs/%s, ref) gives the path of its reflog. By mapping special names to real paths behind git_path(), We can feed $GIT_COMMON_DIR/HEAD (under special names) to refs.c and it'll handle correctly without any changes for special cases. You said This makes it possible for a linked checkout to detach HEAD of the main one. but I think that is misguided---it just makes it easier to confuse users, if done automatically and without any policy knob. It instead should error out, while saying which worktree has the branch in question checked out. After all, checking out a branch that is checked out in another worktree (not the common one) needs the same caution, so main/HEAD is not the only special one. And if your updated git checkout 'frotz' gives a clear report of which worktree has the branch 'frotz' checked out, the user can go there to detach the HEAD in that worktree to detach with git -C $the_other_one checkout HEAD^0 if he chooses to. Jonathan mentions about the checkout in portable device case that would make the above a bit unnatural as you just can't cd there (git update-ref still works). I don't see any problems with checking out a branch multiple times. I may want to try modifying something in the branch that will be thrown away in the end. It's when the user updates a branch that we should either error+reject or detach other checkouts. I guess it's up to the user to decide which way they want. The error+reject way may make the user hunt through dead checkouts waiting to be pruned. But we can start with error+reject then add an option to auto-detach. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
Duy Nguyen pclo...@gmail.com writes: I am not happy with the choice of main/HEAD that would squat on a good name for remote-tracking branch (i.e. s/origin/main/), though. $GIT_DIR/COMMON_HEAD perhaps? It's not just about HEAD. Anything worktree-specific of the main checkout can be accessed this way, e.g. main/index, main/FETCH_HEAD and it's not exactly common because it's worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...) ? Do we even need to expose them as ref-like things as a part of the external API/UI in the first place? For end-user scripts that want to operate in a real or borrowing worktree, there should be no reason to touch this other repository directly. Things like if one of the wortrees tries to check out a branch that is already checked out elsewhere, error out policy may need to consult the other worktrees via $GIT_COMMON_DIR mechanism but at that level we have all the control without contaminating end-user facing ref namespace in a way main/FETCH_HEAD... do. You said This makes it possible for a linked checkout to detach HEAD of the main one. but I think that is misguided---it just makes it easier to confuse users, if done automatically and without any policy knob. It instead should error out, while saying which worktree has the branch in question checked out. After all, checking out a branch that is checked out in another worktree (not the common one) needs the same caution, so main/HEAD is not the only special one. And if your updated git checkout 'frotz' gives a clear report of which worktree has the branch 'frotz' checked out, the user can go there to detach the HEAD in that worktree to detach with git -C $the_other_one checkout HEAD^0 if he chooses to. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
Duy Nguyen pclo...@gmail.com writes: I've got a better version [1] that fixes everything I can think of (there's still some room for improvements). I'm going to use it a bit longer before reposting again. But here's its basic design without going down to code New .git file format includes two lines: -- 8 -- gitid: id gitdir: path -- 8 -- Which would set $GIT_COMMON_DIR to path and $GIT_DIR to path/repos/id. Repository split is the same as before, worktree stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format takes precedence over core.worktree but can still be overriden with $GIT_WORK_TREE. The main interface to create new worktree is git checkout --to. repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts). The new worktrees (which I call linked checkouts) can also access HEAD of the original worktree via a virtual path main/HEAD. This makes it possible for a linked checkout to detach HEAD of the main one. I am not happy with the choice of main/HEAD that would squat on a good name for remote-tracking branch (i.e. s/origin/main/), though. $GIT_DIR/COMMON_HEAD perhaps? The interesting thing is support for third party scripts (or hooks, maybe) so that they could work with both old and new git versions without some sort of git version/feature detection. Of course old git versions will only work with ordinary worktrees. To that end, git rev-parse --git-dir behavior could be changed by two environment variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git _file_ in this case, which makes it much easier to pass the repo's checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a linked checkout, or $GIT_DIR otherwise. I do not understand why you need to go such a route. Existing scripts that works only in a real repository will only know git rev-parse --git-dir as the way to get the real GIT_DIR and would not care about the common thing. Scripts updated to work well with the common thing needs to be aware of the common thing anyway, so adding git rev-parse --common-git-dir or somesuch that only these updated knows would be sufficient, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
On Sat, Dec 21, 2013 at 3:32 AM, Junio C Hamano gits...@pobox.com wrote: Duy Nguyen pclo...@gmail.com writes: I've got a better version [1] that fixes everything I can think of (there's still some room for improvements). I'm going to use it a bit longer before reposting again. But here's its basic design without going down to code New .git file format includes two lines: -- 8 -- gitid: id gitdir: path -- 8 -- Which would set $GIT_COMMON_DIR to path and $GIT_DIR to path/repos/id. Repository split is the same as before, worktree stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format takes precedence over core.worktree but can still be overriden with $GIT_WORK_TREE. The main interface to create new worktree is git checkout --to. repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts). The new worktrees (which I call linked checkouts) can also access HEAD of the original worktree via a virtual path main/HEAD. This makes it possible for a linked checkout to detach HEAD of the main one. I am not happy with the choice of main/HEAD that would squat on a good name for remote-tracking branch (i.e. s/origin/main/), though. $GIT_DIR/COMMON_HEAD perhaps? It's not just about HEAD. Anything worktree-specific of the main checkout can be accessed this way, e.g. main/index, main/FETCH_HEAD and it's not exactly common because it's worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...) ? The interesting thing is support for third party scripts (or hooks, maybe) so that they could work with both old and new git versions without some sort of git version/feature detection. Of course old git versions will only work with ordinary worktrees. To that end, git rev-parse --git-dir behavior could be changed by two environment variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git _file_ in this case, which makes it much easier to pass the repo's checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a linked checkout, or $GIT_DIR otherwise. I do not understand why you need to go such a route. Existing scripts that works only in a real repository will only know git rev-parse --git-dir as the way to get the real GIT_DIR and would not care about the common thing. Scripts updated to work well with the common thing needs to be aware of the common thing anyway, so adding git rev-parse --common-git-dir or somesuch that only these updated knows would be sufficient, no? It simplifies the changes, if the new script is to work with both old and new git versions it may have to write DIR=`git rev-parse --git-common-dir 2/dev/null || git rev-parse --git-dir` the env way makes it DIR=`GIT_COMMON_DIR=1 git rev-parse --git-dir` -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
I've got a better version [1] that fixes everything I can think of (there's still some room for improvements). I'm going to use it a bit longer before reposting again. But here's its basic design without going down to code New .git file format includes two lines: -- 8 -- gitid: id gitdir: path -- 8 -- Which would set $GIT_COMMON_DIR to path and $GIT_DIR to path/repos/id. Repository split is the same as before, worktree stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format takes precedence over core.worktree but can still be overriden with $GIT_WORK_TREE. The main interface to create new worktree is git checkout --to. repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts). The new worktrees (which I call linked checkouts) can also access HEAD of the original worktree via a virtual path main/HEAD. This makes it possible for a linked checkout to detach HEAD of the main one. There are three entries in repos/id: gitdir should point to the .git file that points it back here. Every time a linked checkout is used, git should update mtime of this gitdir file to help pruning. It should update the file content too if the repo is moved. link is a hardlink to .git file, if supported, again for pruning support. locked, if exists, means no automatic pruning (e.g. the linked checkout is on a portable device). The interesting thing is support for third party scripts (or hooks, maybe) so that they could work with both old and new git versions without some sort of git version/feature detection. Of course old git versions will only work with ordinary worktrees. To that end, git rev-parse --git-dir behavior could be changed by two environment variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git _file_ in this case, which makes it much easier to pass the repo's checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a linked checkout, or $GIT_DIR otherwise. This makes 'rev-parse --git-dir' falls back safely when running using old git versions. The last patch in [1] that updates git-completion.bash could demonstrate how it's used. [1] https://github.com/pclouds/git.git checkout-new-worktree -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 00/21] Support multiple worktrees
The UI and behavior are taking shape now. On the UI side, you do git checkout --to /somewhere -b newbranch origin/master which will create worktree-only repo at /somewhere. git prune --repos could be used to remove cruft in .git/repos. On the behavior side, you should be able to do everything in /somewhere just like in a normal repository. If a ref is updated (from any repository) that also happens to be your HEAD, it will be detached. git rev-list --all is also taught to take repos/.../HEAD into account. The structure of repos/XXX is documented in 17/21. Known issues - naming ($GIT_SUPER_DIR, the name of the shared repo and the dependent one, the reuse of gitdir in .git files) - gc --auto support, support for manually pruning .git/repos - should probably support the new .git format in enter_repo() so that we can push to it - not sure if we need UI for deleting repositories created with checkout --to, or just rm -r and let gc --auto clean things up. The thing about rm -r(f) is that if .git happens to be a real repo, the user is screwed so I don't really like to encourage doing it. - more tests Nguyễn Thái Ngọc Duy (21): path.c: avoid PATH_MAX as buffer size from get_pathname() path.c: rename vsnpath() to git_vsnpath() path.c: move git_path() closer to similar functions git_pathdup() Make git_path() aware of file relocation in $GIT_DIR reflog: use avoid constructing .lock path with git_path fast-import: use git_path() for accessing .git dir instead of get_git_dir() Add new environment variable $GIT_SUPER_DIR setup.c: refactor path manipulation out of read_gitfile() setup.c: add split-repo support to .git files setup.c: add split-repo support to is_git_directory() setup.c: reduce cleanup sites in setup_explicit_git_dir() environment.c: support super .git file specified by $GIT_DIR setup: support $GIT_SUPER_DIR as well as super .git files checkout: support checking out into a new working directory checkout: clean up half-prepared directories in --to mode setup.c: keep track of the .git file location if read prune: strategies for split repositories refs: adjust reflog path for repos/id/HEAD refs: detach split repos' HEAD when the linked ref is updated/deleted refs.c: refactor do_head_ref(... to do_one_ref(HEAD, ... revision: include repos/../HEAD in --all Documentation/config.txt | 3 +- Documentation/git-checkout.txt | 6 + Documentation/git-prune.txt| 4 + Documentation/git.txt | 8 ++ Documentation/gitrepository-layout.txt | 30 builtin/checkout.c | 173 ++ builtin/prune.c| 65 + builtin/reflog.c | 2 +- builtin/rev-parse.c| 6 + cache.h| 5 + environment.c | 37 - fast-import.c | 5 +- path.c | 140 ++ refs.c | 88 ++-- refs.h | 1 + revision.c | 1 + setup.c| 253 - t/t0060-path-utils.sh | 133 + t/t1501-worktree.sh| 52 +++ t/t1510-repo-setup.sh | 1 + test-path-utils.c | 7 + trace.c| 1 + 22 files changed, 904 insertions(+), 117 deletions(-) -- 1.8.5.1.77.g42c48fa -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 00/21] Support multiple worktrees
On Sat, Dec 14, 2013 at 5:54 PM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote: Known issues Scripts that expand $GIT_DIR/objects and are not aware about the new env variable. I introduced test-path-utils --git-path to test git_path(). I could move it to git rev-parse --git-path for use in scripts, but there'll be more changes. git-new-workdir's symlink approach shines here. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html