Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 05:55:43AM -0400, Eric Sunshine wrote:

> On Fri, Aug 24, 2018 at 7:25 PM Jeff King  wrote:
> > On Fri, Aug 24, 2018 at 06:55:24PM -0400, Eric Sunshine wrote:
> > > On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> > > > > I was thinking that "worktree add" could start respecting the --force
> > > > > option as an escape hatch.
> > > >
> > > > Sounds good. Eric are you going to implement this? Just checking so
> > > > that I can (hopefully) cross this off my backlog ;-)
> > >
> > > It wasn't something I was planning on working on (at least not
> > > immediately) [...]
> > > As for the actual implementation, I haven't yet looked at how much
> > > surgery will be needed to make 'add' respect --force.
> >
> > Me either. I may take a look this weekend. [...]
> 
> Okay, I got an implementation up and running. It didn't require too
> much code, but neither was it a simple 1- or 2-liner.
> 
> I still need to update documentation, write tests, and compose the
> actual patch series (which will probably run to about 5 patches), so
> it's not quite ready to send out, but hopefully soon.

Great, and thanks for letting me know before we duplicated effort.

-Peff


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-27 Thread Eric Sunshine
On Fri, Aug 24, 2018 at 7:25 PM Jeff King  wrote:
> On Fri, Aug 24, 2018 at 06:55:24PM -0400, Eric Sunshine wrote:
> > On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> > > > I was thinking that "worktree add" could start respecting the --force
> > > > option as an escape hatch.
> > >
> > > Sounds good. Eric are you going to implement this? Just checking so
> > > that I can (hopefully) cross this off my backlog ;-)
> >
> > It wasn't something I was planning on working on (at least not
> > immediately) [...]
> > As for the actual implementation, I haven't yet looked at how much
> > surgery will be needed to make 'add' respect --force.
>
> Me either. I may take a look this weekend. [...]

Okay, I got an implementation up and running. It didn't require too
much code, but neither was it a simple 1- or 2-liner.

I still need to update documentation, write tests, and compose the
actual patch series (which will probably run to about 5 patches), so
it's not quite ready to send out, but hopefully soon.


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-24 Thread Jeff King
On Fri, Aug 24, 2018 at 06:55:24PM -0400, Eric Sunshine wrote:

> On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> > On Thu, Aug 23, 2018 at 8:21 PM Eric Sunshine  
> > wrote:
> > > Peff wrote:
> > > > Yes, but then what's the next step for my script? I can't "remove" since
> > > > the worktree isn't there. I can't blow away any directory that I know
> > > > about, since there isn't one.
> > >
> > > I was thinking that "worktree add" could start respecting the --force
> > > option as an escape hatch.
> > >
> > > > What about refusing by default, but forcing an overwrite with "-f"?
> > >
> > > My thought, also.
> >
> > Sounds good. Eric are you going to implement this? Just checking so
> > that I can (hopefully) cross this off my backlog ;-)
> 
> It wasn't something I was planning on working on (at least not
> immediately) since it's still a bit fuzzy for me whether this is
> enough to help Peff's use-case (and because I have several other
> things in my queue, already).

I'm pretty sure it would just be a one-liner to "worktree add -f" in the
doc-diff script. So I think it does solve the problem.

> However, before even considering implementing it, there's at least one
> question (and possibly others) needing answering. For instance, how
> should "add --force" interact with a locked (not-present) worktree?
> Should it blast it despite the lock? Or would that need --force
> specified twice ("git worktree add -f -f foo")?

Yes, I think that should probably be two forces.

> As for the actual implementation, I haven't yet looked at how much
> surgery will be needed to make 'add' respect --force.

Me either. I may take a look this weekend. I got sucked into an asm and
coccinelle rabbit hole the last few days.

-Peff


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-24 Thread Eric Sunshine
On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> On Thu, Aug 23, 2018 at 8:21 PM Eric Sunshine  wrote:
> > Peff wrote:
> > > Yes, but then what's the next step for my script? I can't "remove" since
> > > the worktree isn't there. I can't blow away any directory that I know
> > > about, since there isn't one.
> >
> > I was thinking that "worktree add" could start respecting the --force
> > option as an escape hatch.
> >
> > > What about refusing by default, but forcing an overwrite with "-f"?
> >
> > My thought, also.
>
> Sounds good. Eric are you going to implement this? Just checking so
> that I can (hopefully) cross this off my backlog ;-)

It wasn't something I was planning on working on (at least not
immediately) since it's still a bit fuzzy for me whether this is
enough to help Peff's use-case (and because I have several other
things in my queue, already).

However, before even considering implementing it, there's at least one
question (and possibly others) needing answering. For instance, how
should "add --force" interact with a locked (not-present) worktree?
Should it blast it despite the lock? Or would that need --force
specified twice ("git worktree add -f -f foo")?

As for the actual implementation, I haven't yet looked at how much
surgery will be needed to make 'add' respect --force.


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-24 Thread Duy Nguyen
Jeff, you're doing crazy things beyond my (admittedly very limited)
imagination :P I did not see this at all when I implemented the
worktree stuff.

On Thu, Aug 23, 2018 at 8:21 PM Eric Sunshine  wrote:
> > > In this case, it might make sense for "git worktree add" to refuse to
> > > operate if an existing worktree entry still points at the directory
> > > that you're trying to add. That should prevent those duplicate
> > > worktree entries you saw.
> >
> > Yes, but then what's the next step for my script? I can't "remove" since
> > the worktree isn't there. I can't blow away any directory that I know
> > about, since there isn't one.
>
> I was thinking that "worktree add" could start respecting the --force
> option as an escape hatch.
>
> > I need to somehow know that an existing
> > "$GIT_DIR/worktrees/foo" is the problem. But "foo" is not even
> > deterministic. Looking at the duplicates, it seems to be the basename of
> > the working tree, but then mutated to avoid collisions with other
> > worktrees.
>
> If the worktree directory still existed, "git -C rev-parse --git-dir"
> inside the worktree would give you the proper path of
> $GIT_DIR/worktrees/foo, but the directory doesn't exist, so...
> nothing.
>
> > What about refusing by default, but forcing an overwrite with "-f"?
>
> My thought, also.

Sounds good. Eric are you going to implement this? Just checking so
that I can (hopefully) cross this off my backlog ;-)
-- 
Duy


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-23 Thread Eric Sunshine
On Tue, Aug 21, 2018 at 4:43 PM Jeff King  wrote:
> On Tue, Aug 21, 2018 at 04:22:08PM -0400, Eric Sunshine wrote:
> > On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> > How about using "git clone --shared" instead?
>
> That seems even more dangerous to me, since the created clone can become
> corrupt when the parent prunes. Probably not huge for a single
> operation, but you may be surprised when you run the script a few days
> later and it barfs horribly due to a missing object.

Okay. I had thought that doc-diff was never doing anything other than
read-only operations on the checked-out worktree after the initial
creation, but, looking more closely at the script, I now see that it
can perform other Git-based operations, so what you say makes sense.

> > In the case that you've already blown away the directory, then having
> > "git worktree add" prune away the old worktree bookkeeping would make
> > sense and wouldn't lose anything (you've already thrown it away
> > manually). However, it could be lossy for the case when the directory
> > is only temporarily missing (because it's on removable media or a
> > network share).
>
> I think the removable ones already suffer from that problem (an auto-gc
> can prune them). And they should already be marked with "git worktree
> lock". That said, people don't always do what they should, and I'd
> rather not make the problem worse. :)

Hmph. I thought that "git worktree prune" had a sensible "expire"
default to protect against such cases of removable media for which
"git worktree lock" wasn't invoked, but, looking at the code, I see
that the default is TIME_MAX.

> > In this case, it might make sense for "git worktree add" to refuse to
> > operate if an existing worktree entry still points at the directory
> > that you're trying to add. That should prevent those duplicate
> > worktree entries you saw.
>
> Yes, but then what's the next step for my script? I can't "remove" since
> the worktree isn't there. I can't blow away any directory that I know
> about, since there isn't one.

I was thinking that "worktree add" could start respecting the --force
option as an escape hatch.

> I need to somehow know that an existing
> "$GIT_DIR/worktrees/foo" is the problem. But "foo" is not even
> deterministic. Looking at the duplicates, it seems to be the basename of
> the working tree, but then mutated to avoid collisions with other
> worktrees.

If the worktree directory still existed, "git -C rev-parse --git-dir"
inside the worktree would give you the proper path of
$GIT_DIR/worktrees/foo, but the directory doesn't exist, so...
nothing.

> What about refusing by default, but forcing an overwrite with "-f"?

My thought, also.


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 04:22:08PM -0400, Eric Sunshine wrote:

> On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> > The script does basically this to set up the temporary tree:
> >
> >   test -d $tmp || git worktree add $tmp ...
> >
> > The script never cleans up the worktree (since its results can often be
> > reused between runs), but you may do so with "rm" or "git clean". That
> > creates an interesting situation if the script is run again before
> > "worktree prune" runs.
> 
> Aside from the problems you enumerate below, leaving worktrees sitting
> around which the user did not create explicitly does seem a bit
> unfriendly, which leads me to think that worktrees may not be the best
> tool for this task. How about using "git clone --shared" instead?

That seems even more dangerous to me, since the created clone can become
corrupt when the parent prunes. Probably not huge for a single
operation, but you may be surprised when you run the script a few days
later and it barfs horribly due to a missing object.

We could do a local filesystem clone which would make a hardlink. But
eventually those hardlinks would be broken, and we'd be wasting a lot of
extra space (and there's no provision for repacking or maintenance of
the temp repo; again, OK for a day or two, but if maybe not if you leave
this thing lying around for months).

I also considered just using "git archive | tar xf -" to create the
checkouts.  But we really would like to move between trees without
updating files unnecessarily (to avoid triggering rebuilds via make).

We could do that without a full-on worktree by just keeping our own
temporary index. But I think that potentially ends up with similar
problems to the "clone --shared" one: at some point objects in the
parent go away, and it has no idea about our custom index.

I think a worktree with a detached HEAD is roughly the same concept, but
with an officially-approved mechanism by which the parent knows about
our worktree.

> >   1. Should the script be doing something else to indicate that the
> >  worktree may be reused? I tried "git worktree remove", but it's
> >  unhappy that the directory doesn't exist. Should it quietly handle
> >  ignore that and remove any leftover cruft in $GIT_DIR/worktrees?
> 
> That's a weird case. There are multiple entries in
> .git/worktrees/*/gitdir pointing at the same worktree directory, which
> I don't think was considered when the machinery was being designed.
> "git worktree remove" refusing to delete the worktree in this case
> seems a good safety measure since something is obviously askew in the
> bookkeeping and it doesn't want to lose potential work.
> 
> The solution to this problem might be to upgrade "prune" as you
> describe in #3 and then ensure that that sort of aggressive pruning
> happens automatically at "git worktree add" time.

I would feel funny about running "git worktree prune" in my script,
since it may also delete _other_ worktrees. But if "worktree prune"
handled this case, I'd be OK just ignoring occasional duplicates, and
periodic "git gc" would clean up the cruft.

> >   2. Should "git worktree add" be more clever about realizing that an
> >  existing entry in $GIT_DIR/worktrees points to this directory? That
> >  would be fine for my use, but I wonder if there's some potential
> >  for loss (e.g., you blew away the work tree but until you do a
> >  "worktree prune", the refs are still there, objects reachable,
> >  etc).
> 
> In the case that you've already blown away the directory, then having
> "git worktree add" prune away the old worktree bookkeeping would make
> sense and wouldn't lose anything (you've already thrown it away
> manually). However, it could be lossy for the case when the directory
> is only temporarily missing (because it's on removable media or a
> network share).

I think the removable ones already suffer from that problem (an auto-gc
can prune them). And they should already be marked with "git worktree
lock". That said, people don't always do what they should, and I'd
rather not make the problem worse. :)

> In this case, it might make sense for "git worktree add" to refuse to
> operate if an existing worktree entry still points at the directory
> that you're trying to add. That should prevent those duplicate
> worktree entries you saw.

Yes, but then what's the next step for my script? I can't "remove" since
the worktree isn't there. I can't blow away any directory that I know
about, since there isn't one. I need to somehow know that an existing
"$GIT_DIR/worktrees/foo" is the problem. But "foo" is not even
deterministic. Looking at the duplicates, it seems to be the basename of
the working tree, but then mutated to avoid collisions with other
worktrees.

What about refusing by default, but forcing an overwrite with "-f"?

That should keep the existing case as safe as now, but give an outlet
for this kind of case where the caller is in charge of the worktree and
know

Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Eric Sunshine
On Tue, Aug 21, 2018 at 3:36 PM Jeff King  wrote:
> The script does basically this to set up the temporary tree:
>
>   test -d $tmp || git worktree add $tmp ...
>
> The script never cleans up the worktree (since its results can often be
> reused between runs), but you may do so with "rm" or "git clean". That
> creates an interesting situation if the script is run again before
> "worktree prune" runs.

Aside from the problems you enumerate below, leaving worktrees sitting
around which the user did not create explicitly does seem a bit
unfriendly, which leads me to think that worktrees may not be the best
tool for this task. How about using "git clone --shared" instead?

More below...

> We identify the directory as a "new" worktree,
> and add it to the list. So you may end up with several copies:
>
>   $ git worktree list
>   [...]
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  cc6237c051 
> (detached HEAD)
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
> (detached HEAD)
>   /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
> (detached HEAD)
>
> If I then run "git worktree prune", those duplicates don't go away
> (because the directory is still there; it just corresponds to only the
> final entry). If I delete the tmp-doc-diff directory and then run "git
> worktree prune", they do all go away.
>
>   1. Should the script be doing something else to indicate that the
>  worktree may be reused? I tried "git worktree remove", but it's
>  unhappy that the directory doesn't exist. Should it quietly handle
>  ignore that and remove any leftover cruft in $GIT_DIR/worktrees?

That's a weird case. There are multiple entries in
.git/worktrees/*/gitdir pointing at the same worktree directory, which
I don't think was considered when the machinery was being designed.
"git worktree remove" refusing to delete the worktree in this case
seems a good safety measure since something is obviously askew in the
bookkeeping and it doesn't want to lose potential work.

The solution to this problem might be to upgrade "prune" as you
describe in #3 and then ensure that that sort of aggressive pruning
happens automatically at "git worktree add" time.

>   2. Should "git worktree add" be more clever about realizing that an
>  existing entry in $GIT_DIR/worktrees points to this directory? That
>  would be fine for my use, but I wonder if there's some potential
>  for loss (e.g., you blew away the work tree but until you do a
>  "worktree prune", the refs are still there, objects reachable,
>  etc).

In the case that you've already blown away the directory, then having
"git worktree add" prune away the old worktree bookkeeping would make
sense and wouldn't lose anything (you've already thrown it away
manually). However, it could be lossy for the case when the directory
is only temporarily missing (because it's on removable media or a
network share).

In this case, it might make sense for "git worktree add" to refuse to
operate if an existing worktree entry still points at the directory
that you're trying to add. That should prevent those duplicate
worktree entries you saw.

>   3. Should "git worktree prune" be more clever about dropping
>  duplicates? I think it should be easy to identify them: they are
>  entries in $GIT_DIR/worktrees for which:
>
>- the directory in $entry/gitdir does exist, but
>- $(cat $entry/gitdir)/.git does not point back to $entry

Seems a sensible improvement to the pruning logic.

However, upon further consideration, any of the proposed "fixes" could
potentially be lossy. Consider a case like this:

% git worktree add foo
... make some changes in 'foo' ...
% mv foo bar # (fogetting to do "git worktree move foo bar")
% git worktree add foo

As currently implemented, one can "correct" the situation by manually
fixing the bookkeeping file in .git/worktrees for the worktree created
first. If it gets pruned automatically, then the state of those
changes in "bar" (nee "foo") could be lost.

So, I'm not sure what, if any, fix is appropriate.

Such uncertainty also further argues in favor of "git clone --shared"
for your particular use-case, I think.


Re: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Derrick Stolee

On 8/21/2018 3:23 PM, Jeff King wrote:

We already advise people to make sure their documentation
formats correctly. Let's point them at the doc-diff script,
which can help with that.

Let's also put a brief note in the script about its purpose,
since that otherwise can only be found in the original
commit message. Along with the existing -h/usage text,
that's hopefully enough for developers to make use of it.


This is helpful, thanks!


Signed-off-by: Jeff King 
---
Just a finishing touch on the jk/diff-rendered-docs topic.

  Documentation/SubmittingPatches | 4 +++-
  Documentation/doc-diff  | 8 
  2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b44fd51f27..ec8b205145 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -80,7 +80,9 @@ GitHub-Travis CI hints section for details.
  
  Do not forget to update the documentation to describe the updated

  behavior and make sure that the resulting documentation set formats
-well. It is currently a liberal mixture of US and UK English norms for
+well (try the Documentation/doc-diff script).
+
+We currently have a liberal mixture of US and UK English norms for
  spelling and grammar, which is somewhat unfortunate.  A huge patch that
  touches the files all over the place only to correct the inconsistency
  is not welcome, though.  Potential clashes with other changes that can
diff --git a/Documentation/doc-diff b/Documentation/doc-diff
index f483fe427c..6e285e648c 100755
--- a/Documentation/doc-diff
+++ b/Documentation/doc-diff
@@ -1,4 +1,12 @@
  #!/bin/sh
+#
+# Build two documentation trees and diff the resulting formatted output.
+# Compared to a source diff, this can reveal mistakes in the formatting.
+# For example:
+#
+#   ./doc-diff origin/master HEAD
+#
+# would show the differences introduced by a branch based on master.
  
  OPTIONS_SPEC="\

  doc-diff [options]   [-- ]


worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
On Tue, Aug 21, 2018 at 03:23:21PM -0400, Jeff King wrote:

> We already advise people to make sure their documentation
> formats correctly. Let's point them at the doc-diff script,
> which can help with that.
> 
> Let's also put a brief note in the script about its purpose,
> since that otherwise can only be found in the original
> commit message. Along with the existing -h/usage text,
> that's hopefully enough for developers to make use of it.
> 
> Signed-off-by: Jeff King 
> ---
> Just a finishing touch on the jk/diff-rendered-docs topic.

I noticed one other oddity with this script, but I actually think the
fix lies elsewhere.

The script does basically this to set up the temporary tree:

  test -d $tmp || git worktree add $tmp ...

The script never cleans up the worktree (since its results can often be
reused between runs), but you may do so with "rm" or "git clean". That
creates an interesting situation if the script is run again before
"worktree prune" runs. We identify the directory as a "new" worktree,
and add it to the list. So you may end up with several copies:

  $ git worktree list
  /home/peff/compile/git  eee785d2e0 
[jk/doc-diff]
  /home/peff/compile/git/.git/tmp-ci  290f16acda 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  cc6237c051 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
(detached HEAD)
  /home/peff/compile/git/Documentation/tmp-doc-diff/worktree  e55de40950 
(detached HEAD)

If I then run "git worktree prune", those duplicates don't go away
(because the directory is still there; it just corresponds to only the
final entry). If I delete the tmp-doc-diff directory and then run "git
worktree prune", they do all go away.

So I'm not sure:

  1. Should the script be doing something else to indicate that the
 worktree may be reused? I tried "git worktree remove", but it's
 unhappy that the directory doesn't exist. Should it quietly handle
 ignore that and remove any leftover cruft in $GIT_DIR/worktrees?

  2. Should "git worktree add" be more clever about realizing that an
 existing entry in $GIT_DIR/worktrees points to this directory? That
 would be fine for my use, but I wonder if there's some potential
 for loss (e.g., you blew away the work tree but until you do a
 "worktree prune", the refs are still there, objects reachable,
 etc).

  3. Should "git worktree prune" be more clever about dropping
 duplicates? I think it should be easy to identify them: they are
 entries in $GIT_DIR/worktrees for which:

   - the directory in $entry/gitdir does exist, but

   - $(cat $entry/gitdir)/.git does not point back to $entry

I could see any of them being plausible fixes, but people who have given
worktrees a lot of thought may have stronger opinions.

-Peff


[PATCH] SubmittingPatches: mention doc-diff

2018-08-21 Thread Jeff King
We already advise people to make sure their documentation
formats correctly. Let's point them at the doc-diff script,
which can help with that.

Let's also put a brief note in the script about its purpose,
since that otherwise can only be found in the original
commit message. Along with the existing -h/usage text,
that's hopefully enough for developers to make use of it.

Signed-off-by: Jeff King 
---
Just a finishing touch on the jk/diff-rendered-docs topic.

 Documentation/SubmittingPatches | 4 +++-
 Documentation/doc-diff  | 8 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index b44fd51f27..ec8b205145 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -80,7 +80,9 @@ GitHub-Travis CI hints section for details.
 
 Do not forget to update the documentation to describe the updated
 behavior and make sure that the resulting documentation set formats
-well. It is currently a liberal mixture of US and UK English norms for
+well (try the Documentation/doc-diff script).
+
+We currently have a liberal mixture of US and UK English norms for
 spelling and grammar, which is somewhat unfortunate.  A huge patch that
 touches the files all over the place only to correct the inconsistency
 is not welcome, though.  Potential clashes with other changes that can
diff --git a/Documentation/doc-diff b/Documentation/doc-diff
index f483fe427c..6e285e648c 100755
--- a/Documentation/doc-diff
+++ b/Documentation/doc-diff
@@ -1,4 +1,12 @@
 #!/bin/sh
+#
+# Build two documentation trees and diff the resulting formatted output.
+# Compared to a source diff, this can reveal mistakes in the formatting.
+# For example:
+#
+#   ./doc-diff origin/master HEAD
+#
+# would show the differences introduced by a branch based on master.
 
 OPTIONS_SPEC="\
 doc-diff [options]   [-- ]
-- 
2.19.0.rc0.398.g138a08f6f6