subject:"\\\[PATCH v2 00\\\/21\\\]"

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-25 Thread Johannes Schindelin

Hi Peff,

On Wed, 20 Sep 2017, Jeff King wrote:

> On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote:
> 
> > This branch is also available from my fork on GitHub as branch
> > `mmap-packed-refs`.
> > 
> > My main uncertainties are:
> > 
> > 1. Does this code actually work on Windows?
> > 
> 
> I can't really answer (1) due to lack of knowledge.

Sorry, I have only a couple of minutes per day to look through the flood
of emails from the Git mailing list and to answer them, so I forgot to say
that I had a VM build and test Michael's patches, and they worked on
Windows.

Ciao,
Dscho

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-20 Thread Jeff King

On Tue, Sep 19, 2017 at 08:22:08AM +0200, Michael Haggerty wrote:

> This branch is also available from my fork on GitHub as branch
> `mmap-packed-refs`.
> 
> My main uncertainties are:
> 
> 1. Does this code actually work on Windows?
> 
> 2. Did I implement the new compile-time option correctly? (I just
>cargo-culted some existing options.) Is there some existing option
>that I could piggy-back off of instead of adding a new one?
> 
> 3. Is a compile-time option sufficient, or would the `mmap()` option
>need to be configurable at runtime, or even tested at repository
>create time as is done for some other filesystem properties in
>`create_default_files()`?

I can't really answer (1) due to lack of knowledge. For (2), I think
it's mostly good, though I raised a small nit. For (3), I think this is
really an OS restriction and not a filesystem one, so probably
build-time is OK (though of course Windows people may be able to correct
me).

I left a few comments, but overall it looks quite good to me.

-Peff

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-19 Thread Johannes Schindelin

Hi Michael,

On Tue, 19 Sep 2017, Michael Haggerty wrote:

> This is v2 of a patch series that changes the reading and caching of the
> `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and
> Johannes for their comments about v1 [1].

Thank you for the new iteration.

> The main change since v1 is to accommodate Windows, which doesn't let
> you replace a file using `rename()` if the file is currently mmapped.
> This is unfortunate, because it means that Windows will never get the
> O(N) → O(lg N) improvement for reading single references that more
> capable systems can now enjoy.

Triggered by your enquiry, I looked into passing the FILE_SHARE_DELETE
flag which I hoped would let us delete the file even if it still is open
(and mapped). In my tests, this did not work. If anybody wants to have a
look at what I did (and whether they can make it work):
https://github.com/dscho/git/tree/replace-wopen

> The background was discussed on the mailing list [2]. The bottom line
> is that on Windows, keeping the `packed-refs` lock mmapped would be
> tantamount to holding reader lock on that file, preventing anybody
> (even unrelated processes) from changing the `packed-refs` file while
> it is mmapped. This is even worse than the situation for packfiles
> (which is solved using `close_all_packs()`), because a packfile, once
> created, never needs to be replaced—every packfile has a filename that
> is determined from its contents. The worst that can happen if a
> packfile is locked is that another process cannot remove it, but that
> is not critical for correctness. The `packed-refs` file, on the other
> hand, always has the same filename and needs to be overwritten for
> correctness.
> 
> So the approach taken here is that a new compile-time option,
> `MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then
> the `packed-refs` file is read quickly into memory then closed.

Another approach would be to imitate close_all_packs() and rely on the
Windows-specific code that retries renames in a staggered fashion, waiting
a little longer and longer before retrying, and finally telling the user
that some file cannot be overwritten:
https://github.com/git-for-windows/git/blob/v2.14.1.windows.1/compat/mingw.c#L2439-L2441

This is not a new problem, by the way. If a file is in use while you try
to run `git checkout` with a different version of that file, we have the
exact same problem on Windows. And we deal with it using that
retry_ask_yes_no() function.

For this to work, the current process really would need to be able to
release all snapshots in one go (for simplicity, I would not even check
the filename but simply blow them all away when we want to overwrite
packed-refs).

I guess I should set aside some time to implement that on top of your
series (I *really* want our in-house users to benefit from that O(lg n)
improvement). In the meantime, I think this can go forward with the
current design.

Ciao,
Dscho

[PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-19 Thread Michael Haggerty

This is v2 of a patch series that changes the reading and caching of
the `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and
Johannes for their comments about v1 [1].

The main change since v1 is to accommodate Windows, which doesn't let
you replace a file using `rename()` if the file is currently mmapped.
This is unfortunate, because it means that Windows will never get the
O(N) → O(lg N) improvement for reading single references that more
capable systems can now enjoy.

The background was discussed on the mailing list [2]. The bottom line
is that on Windows, keeping the `packed-refs` lock mmapped would be
tantamount to holding reader lock on that file, preventing anybody
(even unrelated processes) from changing the `packed-refs` file while
it is mmapped. This is even worse than the situation for packfiles
(which is solved using `close_all_packs()`), because a packfile, once
created, never needs to be replaced—every packfile has a filename that
is determined from its contents. The worst that can happen if a
packfile is locked is that another process cannot remove it, but that
is not critical for correctness. The `packed-refs` file, on the other
hand, always has the same filename and needs to be overwritten for
correctness.

So the approach taken here is that a new compile-time option,
`MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then
the `packed-refs` file is read quickly into memory then closed.

Even in that case, though, this branch brings significant performance
benefits, because instead of parsing the whole file and storing it
into lots of little objects in a `ref_cache` (which also involves a
lot of small memory allocations), we copy the verbatim contents of the
file into memory. Then we use the same binary search techniques to
find any references that we need to read, just as we would do if the
file were memory mapped. This means that we only have to fully parse
the references that we are interested in, and hardly have to allocate
any additional memory.

I did some more careful benchmarks of this code vs. Git 2.14.1 on a
repository that is not quite as pathological. The test repo has 110k
references that are fully packed in a `packed-refs` file that has the
`sorted` trait. The current version is compiled three ways:

* `NO_MMAP=YesPlease`—prevents all use of `mmap()`. This variant is
  O(N) when reading a single reference.

* `MMAP_PREVENTS_DELETE=YesPlease`—uses mmap for the initial read, but
  quickly copies the contents to heap-allocated memory and munmaps
  right away. This variant is also O(N) when reading a single
  reference.

* default (mmap enabled)—the `packed-refs` file is kept mmapped for as
  long as it is in use.

The commands that I timed were as follows:

# for-each-ref, warm cache:
$ git -C lots-of-refs for-each-ref --format="%(objectname) %(refname)" 
>/dev/null

# rev-parse, warm cache (this command was run 10 times then the total
# time divided by 10):
$ git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

# rev-parse, cold cache (but git binary warm):
$ sync ; sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'; git rev-parse v1.0.0; 
time git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

(Note that the `rev-parse` commands involve a handful of reference
lookups as the argument is DWIMmed.)

Results:

   for-each-ref rev-parse rev-parse
 warm cachewarm cachecold cache
   ----
v2.14.1   92 ms   23.7 ms 30 ms
NO_MMAP=YesPlease 83 ms3.4 ms 10 ms
MMAP_PREVENTS_DELETE=YesPlease82 ms3.5 ms 11 ms
default (mmap enabled)81 ms0.8 ms  6 ms

So this branch is a little bit faster at iterating over all
references, but it really has big advantages when looking up single
references. The advantage is smaller if `NO_MMAP` or
`MMAP_PREVENTS_DELETE` is set, but is still quite significant.

This branch is also available from my fork on GitHub as branch
`mmap-packed-refs`.

My main uncertainties are:

1. Does this code actually work on Windows?

2. Did I implement the new compile-time option correctly? (I just
   cargo-culted some existing options.) Is there some existing option
   that I could piggy-back off of instead of adding a new one?

3. Is a compile-time option sufficient, or would the `mmap()` option
   need to be configurable at runtime, or even tested at repository
   create time as is done for some other filesystem properties in
   `create_default_files()`?

Michael

[1] https://public-inbox.org/git/cover.1505319366.git.mhag...@alum.mit.edu/
[2] 
https://public-inbox.org/git/alpine.DEB.2.21.1.1709142101560.4132@virtualbox/

https://public-inbox.org/git/alpine.DEB.2.21.1.1709150012550.219280@virtualbox/
[3] https://github.com/mhagger/git

Jeff King (1):

Re: [PATCH v2 00/21]

2017-05-09 Thread Junio C Hamano

Duy Nguyen  writes:

> On Sun, May 7, 2017 at 11:20 AM, Junio C Hamano  wrote:
>> On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano  wrote:
>>>
>>> Nguyễn Thái Ngọc Duy   writes:
>>>
>>> > Changes since v1:
>>> >
>>> >  - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
>>> >latter's name is probably not great...
>>> >  - A new patch (first one) to convert a bunch to using xfopen()
>>> >  - Fix test failure on Windows, found by Johannes Sixt
>>> >  - Fix the memory leak in log.c, found by Jeff
>>> >
>>> > There are still a couple of fopen() remained, but I'm getting slow
>>> > again so let's get these in first and worry about the rest when
>>> > somebody has time.
>>
>> Hmm, is this topic responsible for recent breakage Travis claims on MacOS X?
>>
>> https://travis-ci.org/git/git/jobs/229585098#L3091
>>
>> seems to expect an error when test-config is fed a-directory but we are
>> not getting the expected warning and error?
>
> Sounds about right. Let me see if defining FREAD_READS_DIRECTORIES on
> MacOS X makes travis happy.

Thanks.  I should have suspected that myself to save a round-trip.

Re: [PATCH v2 00/21]

2017-05-09 Thread Duy Nguyen

On Sun, May 7, 2017 at 11:20 AM, Junio C Hamano  wrote:
> On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano  wrote:
>>
>> Nguyễn Thái Ngọc Duy   writes:
>>
>> > Changes since v1:
>> >
>> >  - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
>> >latter's name is probably not great...
>> >  - A new patch (first one) to convert a bunch to using xfopen()
>> >  - Fix test failure on Windows, found by Johannes Sixt
>> >  - Fix the memory leak in log.c, found by Jeff
>> >
>> > There are still a couple of fopen() remained, but I'm getting slow
>> > again so let's get these in first and worry about the rest when
>> > somebody has time.
>
> Hmm, is this topic responsible for recent breakage Travis claims on MacOS X?
>
> https://travis-ci.org/git/git/jobs/229585098#L3091
>
> seems to expect an error when test-config is fed a-directory but we are
> not getting the expected warning and error?

Sounds about right. Let me see if defining FREAD_READS_DIRECTORIES on
MacOS X makes travis happy.
-- 
Duy

Re: [PATCH v2 00/21]

2017-05-07 Thread Junio C Hamano

On Thu, May 4, 2017 at 2:45 PM, Junio C Hamano  wrote:
>
> Nguyễn Thái Ngọc Duy   writes:
>
> > Changes since v1:
> >
> >  - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
> >latter's name is probably not great...
> >  - A new patch (first one) to convert a bunch to using xfopen()
> >  - Fix test failure on Windows, found by Johannes Sixt
> >  - Fix the memory leak in log.c, found by Jeff
> >
> > There are still a couple of fopen() remained, but I'm getting slow
> > again so let's get these in first and worry about the rest when
> > somebody has time.

Hmm, is this topic responsible for recent breakage Travis claims on MacOS X?

https://travis-ci.org/git/git/jobs/229585098#L3091

seems to expect an error when test-config is fed a-directory but we are
not getting the expected warning and error?

Re: [PATCH v2 00/21]

2017-05-03 Thread Junio C Hamano

Nguyễn Thái Ngọc Duy   writes:

> Changes since v1:
>
>  - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
>latter's name is probably not great...
>  - A new patch (first one) to convert a bunch to using xfopen()
>  - Fix test failure on Windows, found by Johannes Sixt
>  - Fix the memory leak in log.c, found by Jeff
>
> There are still a couple of fopen() remained, but I'm getting slow
> again so let's get these in first and worry about the rest when
> somebody has time.
>
> Nguyễn Thái Ngọc Duy (21):
>   Use xfopen() in more places
>   clone: use xfopen() instead of fopen()
>   config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
>   wrapper.c: add warn_on_fopen_errors()
>   wrapper.c: add fopen_or_warn()
>   attr.c: use fopen_or_warn()
>   ident.c: use fopen_or_warn()
>   bisect: report on fopen() error
>   blame: report error on open if graft_file is a directory
>   log: report errno on failure to fopen() a file
>   log: fix memory leak in open_next_file()
>   commit.c: report error on failure to fopen() the graft file
>   remote.c: report error on failure to fopen()
>   rerere.c: report error on failure to fopen()
>   rerere.c: report correct errno
>   sequencer.c: report error on failure to fopen()
>   server-info: report error on failure to fopen()
>   wt-status.c: report error on failure to fopen()
>   xdiff-interface.c: report errno on failure to stat() or fopen()
>   config.c: handle error on failing to fopen()
>   t1308: add a test case on open a config directory

Thanks.  If the number of parts affected by this series were
smaller, it may have made the review easier to have the introduction
of a helper and its use in a single larger patch, but there are
spread across many, some with files that are touched by different
in-flight topics, and these "collection of smaller patches" makes it
easier to manage both while reviewing and also merging.

All looked good, even though I do share the doubt on the name
"warn-on-fopen-errors"; when something applies equally to fopen(3)
and underlying open(2), I would tend to call that with open not
fopen myself.  But that is a minor point.

[PATCH v2 00/21]

2017-05-03 Thread Nguyễn Thái Ngọc Duy

Changes since v1:

 - fopen_or_warn() and warn_on_fopen_errors() are introduced. The
   latter's name is probably not great...
 - A new patch (first one) to convert a bunch to using xfopen()
 - Fix test failure on Windows, found by Johannes Sixt
 - Fix the memory leak in log.c, found by Jeff

There are still a couple of fopen() remained, but I'm getting slow
again so let's get these in first and worry about the rest when
somebody has time.

Nguyễn Thái Ngọc Duy (21):
  Use xfopen() in more places
  clone: use xfopen() instead of fopen()
  config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
  wrapper.c: add warn_on_fopen_errors()
  wrapper.c: add fopen_or_warn()
  attr.c: use fopen_or_warn()
  ident.c: use fopen_or_warn()
  bisect: report on fopen() error
  blame: report error on open if graft_file is a directory
  log: report errno on failure to fopen() a file
  log: fix memory leak in open_next_file()
  commit.c: report error on failure to fopen() the graft file
  remote.c: report error on failure to fopen()
  rerere.c: report error on failure to fopen()
  rerere.c: report correct errno
  sequencer.c: report error on failure to fopen()
  server-info: report error on failure to fopen()
  wt-status.c: report error on failure to fopen()
  xdiff-interface.c: report errno on failure to stat() or fopen()
  config.c: handle error on failing to fopen()
  t1308: add a test case on open a config directory

 attr.c|  7 ++-
 bisect.c  |  7 ++-
 builtin/am.c  |  8 ++--
 builtin/blame.c   |  4 +++-
 builtin/clone.c   |  2 +-
 builtin/commit.c  |  5 +
 builtin/fast-export.c |  4 +---
 builtin/fsck.c|  3 +--
 builtin/log.c | 11 ---
 builtin/merge.c   |  4 +---
 builtin/pull.c|  3 +--
 commit.c  |  4 +++-
 config.c  |  5 -
 config.mak.uname  |  3 +++
 diff.c|  8 ++--
 dir.c |  3 +--
 fast-import.c |  4 +---
 git-compat-util.h |  3 +++
 ident.c   |  8 +++-
 remote-testsvn.c  |  8 ++--
 remote.c  |  4 ++--
 rerere.c  |  7 ---
 sequencer.c   |  8 
 server-info.c |  2 +-
 t/t1308-config-set.sh | 13 -
 t/t5512-ls-remote.sh  | 13 ++---
 wrapper.c | 21 +
 wt-status.c   |  3 ++-
 xdiff-interface.c |  4 ++--
 29 files changed, 103 insertions(+), 76 deletions(-)

-- 
2.11.0.157.gd943d85

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King

On Tue, Mar 28, 2017 at 12:40:29PM -0700, Junio C Hamano wrote:

> > Here's that minor tweak, in case anybody is interested. It's less useful
> > without that follow-on that touches "eol" more, but perhaps it increases
> > readability on its own.
> 
> Yup, the only thing that the original (with Brian's fix) appears to
> be more careful about is it tries very hard to avoid setting boc
> past eoc.  As we are not checking "boc != eoc" but doing the
> comparison, that "careful" appearance does not give us any benefit
> in practice, other than having to do an extra "eol ? eol+1 : eoc";
> the result of this patch is easier to read.
> 
> By the way, eoc is "one past the end" of the array that begins at
> boc, so setting a pointer to eoc+1 may technically be in violation.
> I do not know how much it matters, though ;-)

I think that is OK. We are reading a strbuf, so eoc must either be the
first character of the PGP signature, or the terminating NUL if there
was no signature block[1]. So it's actually _inside_ the array, and
eoc+1 is our "one past".

-Peff

[1] Arguably we should bail when parse_signature() does not find a PGP
signature at all. We already bail with "malformed push certificate"
when there are other syntactic anomalies.

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Junio C Hamano

Jeff King  writes:

> Here's that minor tweak, in case anybody is interested. It's less useful
> without that follow-on that touches "eol" more, but perhaps it increases
> readability on its own.

Yup, the only thing that the original (with Brian's fix) appears to
be more careful about is it tries very hard to avoid setting boc
past eoc.  As we are not checking "boc != eoc" but doing the
comparison, that "careful" appearance does not give us any benefit
in practice, other than having to do an extra "eol ? eol+1 : eoc";
the result of this patch is easier to read.

By the way, eoc is "one past the end" of the array that begins at
boc, so setting a pointer to eoc+1 may technically be in violation.
I do not know how much it matters, though ;-)

> -- >8 --
> Subject: [PATCH] receive-pack: simplify eol handling in cert parsing
>
> The queue_commands_from_cert() function wants to handle each
> line of the cert individually. It looks for "\n" in the
> to-be-parsed bytes, and special-cases each use of eol (the
> end-of-line variable) when we didn't find one.  Instead, we
> can just set the end-of-line variable to end-of-cert in the
> latter case.
>
> For advancing to the next line, it's OK for us to move our
> pointer past end-of-cert, because our loop condition just
> checks for pointer inequality. And it doesn't even violate
> the ANSI C "no more than one past the end of an array" rule,
> because we know in the worst case we've hit the terminating
> NUL of the strbuf.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/receive-pack.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 5d9e4da0a..58de2a1a9 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command 
> **tail,
>  
>   while (boc < eoc) {
>   const char *eol = memchr(boc, '\n', eoc - boc);
> - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc);
> - boc = eol ? eol + 1 : eoc;
> + if (!eol)
> + eol = eoc;
> + tail = queue_command(tail, boc, eol - boc);
> + boc = eol + 1;
>   }
>  }

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King

On Tue, Mar 28, 2017 at 01:35:36PM -0400, Jeff King wrote:

> I thought I'd knock this out quickly before I forgot about it. But it
> actually isn't so simple.
> 
> The main caller in read_head_info() does indeed just pass strlen(line)
> as the length in each case. But the cert parser really does need us to
> respect the line length. So we either have to pass it in, or tie off the
> string.
> 
> The latter looks something like the patch below (on top of a minor
> tweak around "eol" handling). It's sufficiently ugly that it may not
> count as an actual cleanup, though. I'm OK if we just drop the idea.

Here's that minor tweak, in case anybody is interested. It's less useful
without that follow-on that touches "eol" more, but perhaps it increases
readability on its own.

-- >8 --
Subject: [PATCH] receive-pack: simplify eol handling in cert parsing

The queue_commands_from_cert() function wants to handle each
line of the cert individually. It looks for "\n" in the
to-be-parsed bytes, and special-cases each use of eol (the
end-of-line variable) when we didn't find one.  Instead, we
can just set the end-of-line variable to end-of-cert in the
latter case.

For advancing to the next line, it's OK for us to move our
pointer past end-of-cert, because our loop condition just
checks for pointer inequality. And it doesn't even violate
the ANSI C "no more than one past the end of an array" rule,
because we know in the worst case we've hit the terminating
NUL of the strbuf.

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 5d9e4da0a..58de2a1a9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command 
**tail,

while (boc < eoc) {
const char *eol = memchr(boc, '\n', eoc - boc);
-   tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc);
-   boc = eol ? eol + 1 : eoc;
+   if (!eol)
+   eol = eoc;
+   tail = queue_command(tail, boc, eol - boc);
+   boc = eol + 1;
}
 }

-- 
2.12.2.845.g55fcf8b10

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King

On Tue, Mar 28, 2017 at 11:13:15AM +, brian m. carlson wrote:

> > I suggested an additional cleanup around "linelen" in one patch. In the
> > name of keeping the number of re-rolls sane, I'm OK if we skip that for
> > now (the only reason I mentioned it at all is that you have to justify
> > the caveat in the commit message; with the fix, that justification can
> > go away).
> 
> Let's leave it as it is, assuming Junio's okay with it.  I can send in a
> few more patches to clean that up and use skip_prefix that we can drop
> on top and graduate separately.
> 
> I think the justification is useful as it is, since it explains why we
> no longer want to check that particular value for historical reasons.

I thought I'd knock this out quickly before I forgot about it. But it
actually isn't so simple.

The main caller in read_head_info() does indeed just pass strlen(line)
as the length in each case. But the cert parser really does need us to
respect the line length. So we either have to pass it in, or tie off the
string.

The latter looks something like the patch below (on top of a minor
tweak around "eol" handling). It's sufficiently ugly that it may not
count as an actual cleanup, though. I'm OK if we just drop the idea.

---
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 58de2a1a9..561a982e7 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1483,13 +1483,10 @@ static void execute_commands(struct command *commands,
 }
 
 static struct command **queue_command(struct command **tail,
- const char *line,
- int linelen)
+ const char *line)
 {
struct object_id old_oid, new_oid;
struct command *cmd;
-   const char *refname;
-   int reflen;
const char *p;
 
if (parse_oid_hex(line, _oid, ) ||
@@ -1498,9 +1495,7 @@ static struct command **queue_command(struct command 
**tail,
*p++ != ' ')
die("protocol error: expected old/new/ref, got '%s'", line);
 
-   refname = p;
-   reflen = linelen - (p - line);
-   FLEX_ALLOC_MEM(cmd, ref_name, refname, reflen);
+   FLEX_ALLOC_STR(cmd, ref_name, p);
oidcpy(>old_oid, _oid);
oidcpy(>new_oid, _oid);
*tail = cmd;
@@ -1510,7 +1505,7 @@ static struct command **queue_command(struct command 
**tail,
 static void queue_commands_from_cert(struct command **tail,
 struct strbuf *push_cert)
 {
-   const char *boc, *eoc;
+   char *boc, *eoc;
 
if (*tail)
die("protocol error: got both push certificate and unsigned 
commands");
@@ -1523,10 +1518,17 @@ static void queue_commands_from_cert(struct command 
**tail,
eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len);
 
while (boc < eoc) {
-   const char *eol = memchr(boc, '\n', eoc - boc);
+   char *eol = memchr(boc, '\n', eoc - boc);
+   char tmp;
+
if (!eol)
eol = eoc;
-   tail = queue_command(tail, boc, eol - boc);
+
+   tmp = *eol;
+   *eol = '\0';
+   tail = queue_command(tail, boc);
+   *eol = tmp;
+
boc = eol + 1;
}
 }
@@ -1590,7 +1592,7 @@ static struct command *read_head_info(struct oid_array 
*shallow)
continue;
}
 
-   p = queue_command(p, line, linelen);
+   p = queue_command(p, line);
}
 
if (push_cert.len)

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Junio C Hamano

Jeff King  writes:

> On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote:
>
>> This is part 7 in the continuing transition to use struct object_id.
>> 
>> This series focuses on two main areas: adding two constants for the
>> maximum hash size we'll be using (which will be suitable for allocating
>> memory) and converting struct sha1_array to struct oid_array.
>
> Both changes are very welcome. I do think it's probably worth changing
> the name of sha1-array.[ch], but it doesn't need to happen immediately.
>
> I read through the whole series and didn't find anything objectionable.
> The pointer-arithmetic fix should perhaps graduate separately.

I didn't see anything incorrect when I queued the series, either,
and after I re-read it I saw a few minor readability issues, but
modulo that this looks ready.  I did split the push-cert parsing fix
and applied to an older base independently, though.

> I suggested an additional cleanup around "linelen" in one patch. In the
> name of keeping the number of re-rolls sane, I'm OK if we skip that for
> now (the only reason I mentioned it at all is that you have to justify
> the caveat in the commit message; with the fix, that justification can
> go away).

A follow-up after the dust settles could also mention "we earlier
mentioned this caveat but with this fix we no longer have to worry
about it", no?

Thanks both, anyways.

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread brian m. carlson

On Tue, Mar 28, 2017 at 03:31:59AM -0400, Jeff King wrote:
> I read through the whole series and didn't find anything objectionable.
> The pointer-arithmetic fix should perhaps graduate separately.

Junio's welcome to take that patch separately if he likes.

> I suggested an additional cleanup around "linelen" in one patch. In the
> name of keeping the number of re-rolls sane, I'm OK if we skip that for
> now (the only reason I mentioned it at all is that you have to justify
> the caveat in the commit message; with the fix, that justification can
> go away).

Let's leave it as it is, assuming Junio's okay with it.  I can send in a
few more patches to clean that up and use skip_prefix that we can drop
on top and graduate separately.

I think the justification is useful as it is, since it explains why we
no longer want to check that particular value for historical reasons.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature

Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King

On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote:

> This is part 7 in the continuing transition to use struct object_id.
> 
> This series focuses on two main areas: adding two constants for the
> maximum hash size we'll be using (which will be suitable for allocating
> memory) and converting struct sha1_array to struct oid_array.

Both changes are very welcome. I do think it's probably worth changing
the name of sha1-array.[ch], but it doesn't need to happen immediately.

I read through the whole series and didn't find anything objectionable.
The pointer-arithmetic fix should perhaps graduate separately.

I suggested an additional cleanup around "linelen" in one patch. In the
name of keeping the number of re-rolls sane, I'm OK if we skip that for
now (the only reason I mentioned it at all is that you have to justify
the caveat in the commit message; with the fix, that justification can
go away).

-Peff

[PATCH v2 00/21] object_id part 7

2017-03-26 Thread brian m. carlson

This is part 7 in the continuing transition to use struct object_id.

This series focuses on two main areas: adding two constants for the
maximum hash size we'll be using (which will be suitable for allocating
memory) and converting struct sha1_array to struct oid_array.

The rationale for adding separate constants for allocating memory is
that with a new 256-bit hash function, we're going to need two different
items: a constant for allocating memory that's as large as the largest
hash, and a global variable telling us size the current hash is.  I've
opted to provide GIT_MAX_RAWSZ and GIT_MAX_HEXSZ for allocating memory,
and leave GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as values that can be later
replaced by the aforementioned global.

Replacing struct sha1_array with struct oid_array necessarily involves
converting the shallow code, so I did that.  The structure now handles
objects of struct object_id.  While I renamed the documentation (since
people will search for that), I chose not to rename the sha1-array.[ch]
files or the test helper because I didn't think it was worth the hassle,
especially for people who don't have rename support turned on by
default.

There is also a patch for fixing some broken pointer arithmetic that was
discovered during review of v1.  I don't think it's exploitable, but it
seems good to fix anyway.  Additional eyes on this patch are welcomed.

I chose to use Coccinelle quite a bit in this series, as it automates a
lot of the manual work and aides in review.  There is also some use of
Perl one-liners.

This series is available at https://github.com/bk2204/git under
object-id-part7; it may be rebased.

Changes from v1:
* Rebase on current master (no changes).
* Remove check for empty line in queue_command.
* Add patch 6 to fix invalid pointer arithmetic.

brian m. carlson (21):
  Define new hash-size constants for allocating memory
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  builtin/diff: convert to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/receive-pack: fix incorrect pointer arithmetic
  builtin/receive-pack: convert portions to struct object_id
  fsck: convert init_skiplist to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  test-sha1-array: convert most code to struct object_id
  sha1_name: convert struct disambiguate_state to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  submodule: convert check_for_new_submodule_commits to object_id
  builtin/pull: convert to struct object_id
  sha1-array: convert internal storage for struct sha1_array to
object_id
  Make sha1_array_append take a struct object_id *
  Convert remaining callers of sha1_array_lookup to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Rename sha1_array to oid_array
  Documentation: update and rename api-sha1-array.txt

 .../{api-sha1-array.txt => api-oid-array.txt}  |  44 +++
 bisect.c   |  43 ---
 builtin/blame.c|   4 +-
 builtin/cat-file.c |  14 +--
 builtin/diff.c |  40 +++---
 builtin/fetch-pack.c   |   2 +-
 builtin/fetch.c|   6 +-
 builtin/merge-index.c  |   2 +-
 builtin/merge.c|   2 +-
 builtin/pack-objects.c |  24 ++--
 builtin/patch-id.c |   2 +-
 builtin/pull.c |  98 +++
 builtin/receive-pack.c | 136 ++---
 builtin/rev-list.c |   2 +-
 builtin/rev-parse.c|   4 +-
 builtin/send-pack.c|   4 +-
 cache.h|  10 +-
 combine-diff.c |  18 +--
 commit.h   |  14 +--
 connect.c  |   8 +-
 diff.c |   4 +-
 diff.h |   4 +-
 fetch-pack.c   |  32 ++---
 fetch-pack.h   |   4 +-
 fsck.c |  17 +--
 fsck.h |   2 +-
 hex.c  |   2 +-
 parse-options-cb.c |   8 +-
 patch-ids.c|   2 +-
 patch-ids.h|   2 +-
 ref-filter.c   |  22 ++--

Re: [PATCH v2 00/21] Add configuration options for split-index

2016-12-26 Thread Christian Couder

On Mon, Dec 19, 2016 at 1:02 PM, Duy Nguyen  wrote:
> On Sat, Dec 17, 2016 at 03:55:26PM +0100, Christian Couder wrote:
>> Goal
>> 
>>
>> We want to make it possible to use the split-index feature
>> automatically by just setting a new "core.splitIndex" configuration
>> variable to true.
>>
>> This can be valuable as split-index can help significantly speed up
>> `git rebase` especially along with the work to libify `git apply`
>> that has been merged to master
>> (see 
>> https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98)
>> and is now in v2.11.
>
> I've read through the series (*) and I think it looks good, just a few
> minor comments here and there.

Thanks for your review.

I think I addressed all the minor points left in the v3 and the emails
I just sent.

> (*) guiltily admit that I only skimmed through tests, not giving them
> as much attention as I should have

Re: [PATCH v2 00/21] Add configuration options for split-index

2016-12-19 Thread Junio C Hamano

Duy Nguyen  writes:

> I've read through the series (*) and I think it looks good, just a few
> minor comments here and there.
>
> (*) guiltily admit that I only skimmed through tests, not giving them
> as much attention as I should have

OK.  I'd still want to see them get reviewed, though.  Perhaps I'll
do so myself once I run out of things to do, but hopefully somebody
else gets there first ;-)

Thanks.

Re: [PATCH v2 00/21] Add configuration options for split-index

2016-12-19 Thread Duy Nguyen

On Sat, Dec 17, 2016 at 03:55:26PM +0100, Christian Couder wrote:
> Goal
> 
> 
> We want to make it possible to use the split-index feature
> automatically by just setting a new "core.splitIndex" configuration
> variable to true.
> 
> This can be valuable as split-index can help significantly speed up
> `git rebase` especially along with the work to libify `git apply`
> that has been merged to master
> (see 
> https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98)
> and is now in v2.11.

I've read through the series (*) and I think it looks good, just a few
minor comments here and there.

(*) guiltily admit that I only skimmed through tests, not giving them
as much attention as I should have
--
Duy

Re: [PATCH v2 00/21] Add configuration options for split-index

2016-12-17 Thread Christian Couder

> The previous versions were:
>
>   RFC: https://github.com/chriscool/git/commits/config-split-index7
>   v1:  https://github.com/chriscool/git/commits/config-split-index72

The diff since v1 is:

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 8fbef25cb1..52a3cac4ff 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2782,9 +2782,9 @@ splitIndex.sharedIndexExpire::
 index file is created. The value "now" expires all entries
 immediately, and "never" suppresses expiration altogether.
 The default value is "one.week.ago".
-Note that each time a new split-index file is created, the
-mtime of the related shared index file is updated to the
-current time.
+Note that each time a split index based on a shared index file
+is either created or read from, the mtime of the shared index
+file is updated to the current time.
 See linkgit:git-update-index[1].

 status.relativePaths::
diff --git a/Documentation/git-update-index.txt
b/Documentation/git-update-index.txt
index 635d1574b2..46c953b2f2 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -407,10 +407,14 @@ specified by the splitIndex.maxPercentChange
config variable (see
 linkgit:git-config[1]).

 Each time a new shared index file is created, the old shared index
-files are deleted if they are older than what is specified by the
-splitIndex.sharedIndexExpire config variable (see
+files are deleted if their mtime is older than what is specified by
+the splitIndex.sharedIndexExpire config variable (see
 linkgit:git-config[1]).

+To avoid deleting a shared index file that is still used, its mtime is
+updated to the current time everytime a new split index based on the
+shared index file is either created or read from.
+
 Untracked cache
 ---

diff --git a/builtin/gc.c b/builtin/gc.c
index c1e9602892..1e40d45aa2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -100,8 +100,8 @@ static void gc_config(void)
 git_config_get_int("gc.auto", _auto_threshold);
 git_config_get_int("gc.autopacklimit", _auto_pack_limit);
 git_config_get_bool("gc.autodetach", _auto);
-git_config_get_date_string("gc.pruneexpire", _expire);
-git_config_get_date_string("gc.worktreepruneexpire",
_worktrees_expire);
+git_config_get_expiry("gc.pruneexpire", _expire);
+git_config_get_expiry("gc.worktreepruneexpire", _worktrees_expire);
 git_config(git_default_config, NULL);
 }

diff --git a/builtin/update-index.c b/builtin/update-index.c
index a14dbf2612..dc1fd0d44d 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1099,18 +1099,18 @@ int cmd_update_index(int argc, const char
**argv, const char *prefix)

 if (split_index > 0) {
 if (git_config_get_split_index() == 0)
-warning("core.splitIndex is set to false; "
-"remove or change it, if you really want to "
-"enable split index");
+warning(_("core.splitIndex is set to false; "
+  "remove or change it, if you really want to "
+  "enable split index"));
 if (the_index.split_index)
 the_index.cache_changed |= SPLIT_INDEX_ORDERED;
 else
 add_split_index(_index);
 } else if (!split_index) {
 if (git_config_get_split_index() == 1)
-warning("core.splitIndex is set to true; "
-"remove or change it, if you really want to "
-"disable split index");
+warning(_("core.splitIndex is set to true; "
+  "remove or change it, if you really want to "
+  "disable split index"));
 remove_split_index(_index);
 }

diff --git a/cache.h b/cache.h
index 8e26aaf05e..279415afbd 100644
--- a/cache.h
+++ b/cache.h
@@ -1823,11 +1823,13 @@ extern int git_config_get_bool(const char
*key, int *dest);
 extern int git_config_get_bool_or_int(const char *key, int *is_bool,
int *dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
-extern int git_config_get_date_string(const char *key, const char **output);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);

+/* This dies if the configured or default date is in the future */
+extern int git_config_get_expiry(const char *key, const char **output);
+
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
  * ensure that they do not modify the untracked cache when reading it.
diff --git a/config.c b/config.c
index f88c61bb30..5c52cefd78 100644
--- a/config.c
+++ b/config.c
@@ -1685,7 +1685,7 @@ int git_config_get_pathname(const char *key,
const char **dest)
 return ret;
 }

-int git_config_get_date_string(const char *key, const char **output)
+int git_config_get_expiry(const char *key, const

[PATCH v2 00/21] Add configuration options for split-index

2016-12-17 Thread Christian Couder

Goal

We want to make it possible to use the split-index feature
automatically by just setting a new "core.splitIndex" configuration
variable to true.

This can be valuable as split-index can help significantly speed up
`git rebase` especially along with the work to libify `git apply`
that has been merged to master
(see https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98)
and is now in v2.11.

Design
~~

The design is similar as the previous work that introduced
"core.untrackedCache".

The new "core.splitIndex" configuration option can be either true,
false or undefined which is the default.

When it is true, the split index is created, if it does not already
exists, when the index is read. When it is false, the split index is
removed if it exists, when the index is read. Otherwise it is left as
is.

Along with this new configuration variable, the two following options
are also introduced:

- splitIndex.maxPercentChange

This is to avoid having too many changes accumulating in the split
index while in split index mode. The git-update-index
documentation says:

If split-index mode is already enabled and `--split-index` is
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file.

but it is probably better to not expect the user to think about it
and to have a mechanism that pushes back all changes to the shared
index file automatically when some threshold is reached.

The default threshold is when the number of entries in the split
index file reaches 20% of the number of entries in the shared
index file. The new "splitIndex.maxPercentChange" config option
lets people tweak this value.

- splitIndex.sharedIndexExpire

To make sure that old sharedindex files are eventually removed
when a new one has been created, we "touch" the shared index file
every time a split index file using the shared index file is
either created or read from. Then we can delete shared indexes
with an mtime older than one week (by default), when we create a
new shared index file. The new "splitIndex.sharedIndexExpire"
config option lets people tweak this grace period.

This idea was suggested by Duy in:

https://public-inbox.org/git/cacsjy8bqmfashf5kjguh+bd7xg98cafnyde964vjypxz-em...@mail.gmail.com/

and after some experiments, I agree that it is much simpler than
what I thought could be done during our discussion.

Junio also thinks that we have to do "time-based GC" in:

https://public-inbox.org/git/xmqqeg33ccjj@gitster.mtv.corp.google.com/

Highlevel view of the patches in the series
~~~

Except for patch 1/21, there are 3 big steps, one for each new
configuration variable introduced.

The main difference between this patch series and the v1 patch series
sent last October is that Step 3 has a few new commits to also update
the mtime of the shared index file when a split index based on the
shared index file is read from.

- Patch 1/21 marks a message for translation. It is a new patch
and it can be applied separately. (The patch 1/19 in v1, which
was a typo fix, has been merged separately into master already.)

Step 1 is:

- Patches 2/21 to 5/21 introduce the functions that are reading
the "core.splitIndex" configuration variable and tweaking the
split index depending on its value.

- Patch 6/21 adds a few tests for the new feature.

- Patches 7/21 and 8/21 add some documentation for the new
feature.

The only change since v1 in this step is that some warning
messages in 5/21 have been marked for translation as suggested
by Duy.

Step 2 is:

- Patches 9/21 and 10/21 introduce the functions that are reading
the "splitIndex.maxPercentChange" configuration variable and
regenerating a new shared index file depending on its value.

- Patch 11/21 adds a few tests for the new feature.

- Patch 12/21 add some documentation for the new feature.

The changes since v1 in this step are:

- an error message has been marked for translation in 9/21,
- camelCase is used in the same error message as suggested by
Duy in 9/21,
- "return error(...)" is now used as suggested by Junio in
9/21,
- too_many_not_shared_entries() is now "static" as suggested
by Ramsay in 10/21,
- changes made in write_locked_index() have been reorganized
as suggested by Duy in 10/21,

Step 3 is:

- Patches 13/21 to 16/21 introduce the functions that are reading
the "splitIndex.sharedIndexExpire" configuration variable and
expiring old shared index files depending on its value.

- Patch 17/21 adds a few tests for the new feature.

- Patches 18/21 and 19/21 are new patches. They update the mtime
of the shared index file when a split index based on the shared

[PATCH v2 00/21] git bisect improvements

2016-04-10 Thread Stephan Beyer

Hi,

a long time ago[1] I sent the first version of this patchset
to the list. Since then I wrote different variants of the algorithm,
fixed some bugs, made the tests work ;), and finally performed some
performance tests to pick the best version of the different variants.

For the performance tests I used the Git repositories of the Linux
kernel and of Git itself and performed whole-history bisections
with a bisect script that decided "good" or "bad" based on the hash
of a commit. And another bisect script that did the opposite.

I omit the details. The best variant uses a DFS for the traversal
without any further "smart" tricks: These tricks led to more
"administrative expense" than gain of performance.

I'm sorry that it took so long to prepare the 2nd patchset (mainly
vacation and other work in between). So I hope it's sufficiently
good for inclusion. :)

Cheers

 1. https://www.mail-archive.com/git@vger.kernel.org/msg86353.html


Stephan Beyer (21):
  bisect: write about `bisect next` in documentation
  bisect: allow 'bisect run' if no good commit is known
  t/test-lib-functions.sh: generalize test_cmp_rev
  t: use test_cmp_rev() where appropriate
  t6030: generalize test to not rely on current implementation
  bisect: add test for the bisect algorithm
  bisect: plug the biggest memory leak
  bisect: make bisect compile if DEBUG_BISECT is set
  bisect: make algorithm behavior independent of DEBUG_BISECT
  bisect: get rid of recursion in count_distance()
  bisect: use struct node_data array instead of int array
  bisect: replace clear_distance() by unique markers
  bisect: use commit instead of commit list as arguments when
appropriate
  bisect: extract get_distance() function from code duplication
  bisect: introduce distance_direction()
  bisect: make total number of commits global
  bisect: rename count_distance() to compute_weight()
  bisect: prepare for different algorithms based on find_all
  bisect: use a bottom-up traversal to find relevant weights
  bisect: compute best bisection in compute_relevant_weights()
  bisect: get back halfway shortcut

 Documentation/git-bisect.txt  |  24 ++
 bisect.c  | 481 --
 git-bisect.sh |  32 +-
 t/t2012-checkout-last.sh  |   8 +-
 t/t3308-notes-merge.sh|   8 +-
 t/t3310-notes-merge-manual-resolve.sh |   8 +-
 t/t3311-notes-merge-fanout.sh |   6 +-
 t/t3404-rebase-interactive.sh |  38 +--
 t/t3407-rebase-abort.sh   |   8 +-
 t/t3410-rebase-preserve-dropped-merges.sh |   4 +-
 t/t3411-rebase-preserve-around-merges.sh  |  10 +-
 t/t3414-rebase-preserve-onto.sh   |  12 +-
 t/t3501-revert-cherry-pick.sh |   4 +-
 t/t3506-cherry-pick-ff.sh |   6 +-
 t/t3903-stash.sh  |   6 +-
 t/t4150-am.sh |  18 +-
 t/t5404-tracking-branches.sh  |   2 +-
 t/t5505-remote.sh |   4 +-
 t/t5520-pull.sh   |  36 +--
 t/t6022-merge-rename.sh   |   2 +-
 t/t6030-bisect-porcelain.sh   | 228 +++---
 t/t6036-recursive-corner-cases.sh |  58 ++--
 t/t6042-merge-rename-corner-cases.sh  |  50 ++--
 t/t7003-filter-branch.sh  |   8 +-
 t/t7004-tag.sh|   2 +-
 t/t7110-reset-merge.sh|  24 +-
 t/t7201-co.sh |  12 +-
 t/t7601-merge-pull-config.sh  |  17 +-
 t/t7603-merge-reduce-heads.sh |  30 +-
 t/t7605-merge-resolve.sh  |   5 +-
 t/t8010-bisect-algorithm.sh   | 155 ++
 t/t9162-git-svn-dcommit-interactive.sh|   8 +-
 t/t9300-fast-import.sh|  12 +-
 t/test-lib-functions.sh   |  14 +-
 34 files changed, 832 insertions(+), 508 deletions(-)
 create mode 100755 t/t8010-bisect-algorithm.sh

-- 
2.8.1.137.g522756c

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-27 Thread Duy Nguyen

On Fri, Dec 27, 2013 at 12:12 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote:

 Do we even need to expose them as ref-like things as a part of the
 external API/UI in the first place?  For end-user scripts that want
 to operate in a real or borrowing worktree, there should be no
 reason to touch this other repository directly.  Things like if
 one of the wortrees tries to check out a branch that is already
 checked out elsewhere, error out policy may need to consult the
 other worktrees via $GIT_COMMON_DIR mechanism but at that level we
 have all the control without contaminating end-user facing ref
 namespace in a way main/FETCH_HEAD... do.

 No, external API/UI is just extra bonus. We need to (or should) do so
 in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal
 refs.

 And that is what I consider a confusion-trigger, not a nice bonus.

 I do not think it is particularly a good idea to contaminate
 end-user namespace for this kind of peek another repository
 special operation.

 In order to handle your multiple worktrees manipulating the same
 branch case sanely, you need to be aware of not just the real
 repository your worktree is borrowing from, but also _all_ the other
 worktrees that borrow from that same real repository, so a single
 main virtual namespace will not cut it. You will be dealing with
 an unbounded set of HEADs, one for each such worktree.

Yes. My problem is, while all secondary worktrees are in
$GIT_DIR/repos and their HEADs can be accessed there with
repos/xxx/HEAD, the first worktree's HEAD can't be accessed this way
because HEAD in a linked checkouts means repos/my worktree/HEAD.

 Can't we do this by adding a with this real repository layer,
 e.g. resolve HEAD wrt that repository, somewhat similar to how we
 peek into submodule namespaces?

Hmm.. never thought of it like a submodule. Thanks for the idea.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-26 Thread Junio C Hamano

Duy Nguyen pclo...@gmail.com writes:

 On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote:

 Do we even need to expose them as ref-like things as a part of the
 external API/UI in the first place?  For end-user scripts that want
 to operate in a real or borrowing worktree, there should be no
 reason to touch this other repository directly.  Things like if
 one of the wortrees tries to check out a branch that is already
 checked out elsewhere, error out policy may need to consult the
 other worktrees via $GIT_COMMON_DIR mechanism but at that level we
 have all the control without contaminating end-user facing ref
 namespace in a way main/FETCH_HEAD... do.

 No, external API/UI is just extra bonus. We need to (or should) do so
 in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal
 refs.

And that is what I consider a confusion-trigger, not a nice bonus.

I do not think it is particularly a good idea to contaminate
end-user namespace for this kind of peek another repository
special operation.

In order to handle your multiple worktrees manipulating the same
branch case sanely, you need to be aware of not just the real
repository your worktree is borrowing from, but also _all_ the other
worktrees that borrow from that same real repository, so a single
main virtual namespace will not cut it. You will be dealing with
an unbounded set of HEADs, one for each such worktree.

Can't we do this by adding a with this real repository layer,
e.g. resolve HEAD wrt that repository, somewhat similar to how we
peek into submodule namespaces?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-22 Thread Duy Nguyen

On Sun, Dec 22, 2013 at 1:38 PM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 I am not happy with the choice of main/HEAD that would squat on a
 good name for remote-tracking branch (i.e. s/origin/main/), though.
 $GIT_DIR/COMMON_HEAD perhaps?

 It's not just about HEAD. Anything worktree-specific of the main
 checkout can be accessed this way, e.g. main/index,
 main/FETCH_HEAD and it's not exactly common because it's
 worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...)
 ?

 Do we even need to expose them as ref-like things as a part of the
 external API/UI in the first place?  For end-user scripts that want
 to operate in a real or borrowing worktree, there should be no
 reason to touch this other repository directly.  Things like if
 one of the wortrees tries to check out a branch that is already
 checked out elsewhere, error out policy may need to consult the
 other worktrees via $GIT_COMMON_DIR mechanism but at that level we
 have all the control without contaminating end-user facing ref
 namespace in a way main/FETCH_HEAD... do.

No, external API/UI is just extra bonus. We need to (or should) do so
in order to handle $GIT_COMMON_DIR/HEAD exactly like how we do normal
refs. Given any ref, git_path(ref) gives the path to that ref,
git_path(logs/%s, ref) gives the path of its reflog. By mapping
special names to real paths behind git_path(), We can feed
$GIT_COMMON_DIR/HEAD (under special names) to refs.c and it'll handle
correctly without any changes for special cases.

 You said

 This makes it possible for a linked checkout to detach HEAD of
 the main one.

 but I think that is misguided---it just makes it easier to confuse
 users, if done automatically and without any policy knob. It instead
 should error out, while saying which worktree has the branch in
 question checked out. After all, checking out a branch that is
 checked out in another worktree (not the common one) needs the same
 caution, so main/HEAD is not the only special one.

 And if your updated git checkout 'frotz' gives a clear report of
 which worktree has the branch 'frotz' checked out, the user can go
 there to detach the HEAD in that worktree to detach with

 git -C $the_other_one checkout HEAD^0

 if he chooses to.

Jonathan mentions about the checkout in portable device case that
would make the above a bit unnatural as you just can't cd there (git
update-ref still works).

I don't see any problems with checking out a branch multiple times. I
may want to try modifying something in the branch that will be thrown
away in the end. It's when the user updates a branch that we should
either error+reject or detach other checkouts. I guess it's up to the
user to decide which way they want. The error+reject way may make the
user hunt through dead checkouts waiting to be pruned. But we can
start with error+reject then add an option to auto-detach.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-21 Thread Junio C Hamano

Duy Nguyen pclo...@gmail.com writes:

 I am not happy with the choice of main/HEAD that would squat on a
 good name for remote-tracking branch (i.e. s/origin/main/), though.
 $GIT_DIR/COMMON_HEAD perhaps?

 It's not just about HEAD. Anything worktree-specific of the main
 checkout can be accessed this way, e.g. main/index,
 main/FETCH_HEAD and it's not exactly common because it's
 worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...)
 ?

Do we even need to expose them as ref-like things as a part of the
external API/UI in the first place?  For end-user scripts that want
to operate in a real or borrowing worktree, there should be no
reason to touch this other repository directly.  Things like if
one of the wortrees tries to check out a branch that is already
checked out elsewhere, error out policy may need to consult the
other worktrees via $GIT_COMMON_DIR mechanism but at that level we
have all the control without contaminating end-user facing ref
namespace in a way main/FETCH_HEAD... do.  You said

This makes it possible for a linked checkout to detach HEAD of
the main one.

but I think that is misguided---it just makes it easier to confuse
users, if done automatically and without any policy knob. It instead
should error out, while saying which worktree has the branch in
question checked out. After all, checking out a branch that is
checked out in another worktree (not the common one) needs the same
caution, so main/HEAD is not the only special one.

And if your updated git checkout 'frotz' gives a clear report of
which worktree has the branch 'frotz' checked out, the user can go
there to detach the HEAD in that worktree to detach with

git -C $the_other_one checkout HEAD^0

if he chooses to.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-20 Thread Junio C Hamano

Duy Nguyen pclo...@gmail.com writes:

 I've got a better version [1] that fixes everything I can think of
 (there's still some room for improvements). I'm going to use it a bit
 longer before reposting again. But here's its basic design without
 going down to code

 New .git file format includes two lines:
 -- 8 --
 gitid: id
 gitdir: path
 -- 8 --

 Which would set $GIT_COMMON_DIR to path and $GIT_DIR to
 path/repos/id. Repository split is the same as before, worktree
 stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format
 takes precedence over core.worktree but can still be overriden with
 $GIT_WORK_TREE. The main interface to create new worktree is git
 checkout --to.

 repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts).
 The new worktrees (which I call linked checkouts) can also access
 HEAD of the original worktree via a virtual path main/HEAD. This
 makes it possible for a linked checkout to detach HEAD of the main
 one.

I am not happy with the choice of main/HEAD that would squat on a
good name for remote-tracking branch (i.e. s/origin/main/), though.
$GIT_DIR/COMMON_HEAD perhaps?

 The interesting thing is support for third party scripts (or hooks,
 maybe) so that they could work with both old and new git versions
 without some sort of git version/feature detection. Of course old git
 versions will only work with ordinary worktrees. To that end, git
 rev-parse --git-dir behavior could be changed by two environment
 variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git
 _file_ in this case, which makes it much easier to pass the repo's
 checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH
 makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a
 linked checkout, or $GIT_DIR otherwise.

I do not understand why you need to go such a route.

Existing scripts that works only in a real repository will only know
git rev-parse --git-dir as the way to get the real GIT_DIR and
would not care about the common thing.  Scripts updated to work
well with the common thing needs to be aware of the common thing
anyway, so adding git rev-parse --common-git-dir or somesuch that
only these updated knows would be sufficient, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-20 Thread Duy Nguyen

On Sat, Dec 21, 2013 at 3:32 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 I've got a better version [1] that fixes everything I can think of
 (there's still some room for improvements). I'm going to use it a bit
 longer before reposting again. But here's its basic design without
 going down to code

 New .git file format includes two lines:
 -- 8 --
 gitid: id
 gitdir: path
 -- 8 --

 Which would set $GIT_COMMON_DIR to path and $GIT_DIR to
 path/repos/id. Repository split is the same as before, worktree
 stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format
 takes precedence over core.worktree but can still be overriden with
 $GIT_WORK_TREE. The main interface to create new worktree is git
 checkout --to.

 repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts).
 The new worktrees (which I call linked checkouts) can also access
 HEAD of the original worktree via a virtual path main/HEAD. This
 makes it possible for a linked checkout to detach HEAD of the main
 one.

 I am not happy with the choice of main/HEAD that would squat on a
 good name for remote-tracking branch (i.e. s/origin/main/), though.
 $GIT_DIR/COMMON_HEAD perhaps?

It's not just about HEAD. Anything worktree-specific of the main
checkout can be accessed this way, e.g. main/index,
main/FETCH_HEAD and it's not exactly common because it's
worktree info. Maybe 1ST_ as the prefix (e.g. 1ST_HEAD, 1ST_index...)
?

 The interesting thing is support for third party scripts (or hooks,
 maybe) so that they could work with both old and new git versions
 without some sort of git version/feature detection. Of course old git
 versions will only work with ordinary worktrees. To that end, git
 rev-parse --git-dir behavior could be changed by two environment
 variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git
 _file_ in this case, which makes it much easier to pass the repo's
 checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH
 makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a
 linked checkout, or $GIT_DIR otherwise.

 I do not understand why you need to go such a route.

 Existing scripts that works only in a real repository will only know
 git rev-parse --git-dir as the way to get the real GIT_DIR and
 would not care about the common thing.  Scripts updated to work
 well with the common thing needs to be aware of the common thing
 anyway, so adding git rev-parse --common-git-dir or somesuch that
 only these updated knows would be sufficient, no?

It simplifies the changes, if the new script is to work with both old
and new git versions it may have to write

DIR=`git rev-parse --git-common-dir 2/dev/null || git rev-parse --git-dir`

the env way makes it

DIR=`GIT_COMMON_DIR=1 git rev-parse --git-dir`
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-19 Thread Duy Nguyen

I've got a better version [1] that fixes everything I can think of
(there's still some room for improvements). I'm going to use it a bit
longer before reposting again. But here's its basic design without
going down to code

New .git file format includes two lines:
-- 8 --
gitid: id
gitdir: path
-- 8 --

Which would set $GIT_COMMON_DIR to path and $GIT_DIR to
path/repos/id. Repository split is the same as before, worktree
stuff in $GIT_DIR, the rest in $GIT_COMMON_DIR. This .git file format
takes precedence over core.worktree but can still be overriden with
$GIT_WORK_TREE. The main interface to create new worktree is git
checkout --to.

repos belongs to $GIT_COMMON_DIR (i.e. shared across all checkouts).
The new worktrees (which I call linked checkouts) can also access
HEAD of the original worktree via a virtual path main/HEAD. This
makes it possible for a linked checkout to detach HEAD of the main
one.

There are three entries in repos/id: gitdir should point to the
.git file that points it back here. Every time a linked checkout is
used, git should update mtime of this gitdir file to help pruning.
It should update the file content too if the repo is moved. link is
a hardlink to .git file, if supported, again for pruning support.
locked, if exists, means no automatic pruning (e.g. the linked
checkout is on a portable device).

The interesting thing is support for third party scripts (or hooks,
maybe) so that they could work with both old and new git versions
without some sort of git version/feature detection. Of course old git
versions will only work with ordinary worktrees. To that end, git
rev-parse --git-dir behavior could be changed by two environment
variables. $GIT_ONE_PATH makes 'rev-parse --git-dir' return the .git
_file_ in this case, which makes it much easier to pass the repo's
checkout view around with git --git-dir=... .$GIT_COMMON_DIR_PATH
makes 'rev-parse --git-dir' return $GIT_COMMON_DIR if it's from a
linked checkout, or $GIT_DIR otherwise. This makes 'rev-parse
--git-dir' falls back safely when running using old git versions. The
last patch in [1] that updates git-completion.bash could demonstrate
how it's used.

[1] https://github.com/pclouds/git.git checkout-new-worktree
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 00/21] Support multiple worktrees

2013-12-14 Thread Nguyễn Thái Ngọc Duy

The UI and behavior are taking shape now. On the UI side, you do

  git checkout --to /somewhere -b newbranch origin/master

which will create worktree-only repo at /somewhere. git prune --repos
could be used to remove cruft in .git/repos.

On the behavior side, you should be able to do everything in
/somewhere just like in a normal repository. If a ref is updated (from
any repository) that also happens to be your HEAD, it will be
detached. git rev-list --all is also taught to take repos/.../HEAD
into account.

The structure of repos/XXX is documented in 17/21. Known issues

 - naming ($GIT_SUPER_DIR, the name of the shared repo and the
   dependent one, the reuse of gitdir in .git files)

 - gc --auto support, support for manually pruning .git/repos

 - should probably support the new .git format in enter_repo() so that
   we can push to it

 - not sure if we need UI for deleting repositories created with
   checkout --to, or just rm -r and let gc --auto clean
   things up. The thing about rm -r(f) is that if .git happens to be
   a real repo, the user is screwed so I don't really like to
   encourage doing it.

 - more tests


Nguyễn Thái Ngọc Duy (21):
  path.c: avoid PATH_MAX as buffer size from get_pathname()
  path.c: rename vsnpath() to git_vsnpath()
  path.c: move git_path() closer to similar functions git_pathdup()
  Make git_path() aware of file relocation in $GIT_DIR
  reflog: use avoid constructing .lock path with git_path
  fast-import: use git_path() for accessing .git dir instead of get_git_dir()
  Add new environment variable $GIT_SUPER_DIR
  setup.c: refactor path manipulation out of read_gitfile()
  setup.c: add split-repo support to .git files
  setup.c: add split-repo support to is_git_directory()
  setup.c: reduce cleanup sites in setup_explicit_git_dir()
  environment.c: support super .git file specified by $GIT_DIR
  setup: support $GIT_SUPER_DIR as well as super .git files
  checkout: support checking out into a new working directory
  checkout: clean up half-prepared directories in --to mode
  setup.c: keep track of the .git file location if read
  prune: strategies for split repositories
  refs: adjust reflog path for repos/id/HEAD
  refs: detach split repos' HEAD when the linked ref is updated/deleted
  refs.c: refactor do_head_ref(... to do_one_ref(HEAD, ...
  revision: include repos/../HEAD in --all

 Documentation/config.txt   |   3 +-
 Documentation/git-checkout.txt |   6 +
 Documentation/git-prune.txt|   4 +
 Documentation/git.txt  |   8 ++
 Documentation/gitrepository-layout.txt |  30 
 builtin/checkout.c | 173 ++
 builtin/prune.c|  65 +
 builtin/reflog.c   |   2 +-
 builtin/rev-parse.c|   6 +
 cache.h|   5 +
 environment.c  |  37 -
 fast-import.c  |   5 +-
 path.c | 140 ++
 refs.c |  88 ++--
 refs.h |   1 +
 revision.c |   1 +
 setup.c| 253 -
 t/t0060-path-utils.sh  | 133 +
 t/t1501-worktree.sh|  52 +++
 t/t1510-repo-setup.sh  |   1 +
 test-path-utils.c  |   7 +
 trace.c|   1 +
 22 files changed, 904 insertions(+), 117 deletions(-)

-- 
1.8.5.1.77.g42c48fa

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Support multiple worktrees

2013-12-14 Thread Duy Nguyen

On Sat, Dec 14, 2013 at 5:54 PM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote:
 Known issues

Scripts that expand $GIT_DIR/objects and are not aware about the new
env variable. I introduced test-path-utils --git-path to test
git_path(). I could move it to git rev-parse --git-path for use in
scripts, but there'll be more changes. git-new-workdir's symlink
approach shines here.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

[PATCH v2 00/21] Read `packed-refs` using mmap()

Re: [PATCH v2 00/21]

Re: [PATCH v2 00/21]

Re: [PATCH v2 00/21]

Re: [PATCH v2 00/21]

[PATCH v2 00/21]

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] object_id part 7

[PATCH v2 00/21] object_id part 7

Re: [PATCH v2 00/21] Add configuration options for split-index

Re: [PATCH v2 00/21] Add configuration options for split-index

Re: [PATCH v2 00/21] Add configuration options for split-index

Re: [PATCH v2 00/21] Add configuration options for split-index

[PATCH v2 00/21] Add configuration options for split-index

[PATCH v2 00/21] git bisect improvements

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

[PATCH v2 00/21] Support multiple worktrees

Re: [PATCH v2 00/21] Support multiple worktrees

32 matches

Site Navigation

Mail list logo

Footer information