Bad interaction between git clean, gitignore, and deleted submodules

2017-11-20 Thread Craig Silverstein
We have the following situation:
1) A .gitignore file that contains '*.pyc'
2) A repo with a submodule named jinja2

In normal use, clients of our repo have it checked out and run things
in it, creating files like jinja2/run.pyc.

I deleted the jinja2 submodule (by running `git rm jinja2` and
pushing).  Clients did a `git pull; git submodule update` and saw that
the jinja2 directory was still around, albeit untracked. So they ran
`git clean -ffd`.

The problem is that git clean refuses to delete the directory due to
the (ignored and thus uncleanable) file jinja2/run.pyc.  And when it
refuses to delete the directory, it also leaves around jinja2/.git.

Later, someone ran `git add .` and jinja2/.git got added back in as an
"orphaned submodule" (I forget the exact terminology).  I know there's
a very loud warning when this happens but somehow they didn't see it,
and it caused all sorts of trouble when they pushed their change.

My question: is it possible for a client to *really* get rid of the
jinja2 submodule?  We can't run `git clean -ffdx` because there are
other .gitignore'd files we want to keep around.

The behavior I'd like to see is for `git clean -ffd` to delete .git
files if they don't correspond to a currently registered submodule.
Then `git clean -ffd` would delete jinja2/.git even though it leaves
around jinja2/run.pyc.  But I don't know if that would break anything
else.
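
As a rough sketch of how a client could at least detect these leftovers (this is not an existing git feature): compare the nested `.git` entries in the working tree against the gitlinks (mode 160000 entries) recorded in the index. The function name `list_orphaned_gitdirs` is hypothetical, and the sketch only lists candidates; it deliberately deletes nothing.

```sh
#!/bin/sh
# Hypothetical helper: print directories that contain a nested .git (file or
# directory) but are not recorded as a submodule (gitlink) in the index.
list_orphaned_gitdirs () {
	top=${1:-.}
	# Index entries with mode 160000 are gitlinks, i.e. registered submodules.
	registered=$(git -C "$top" ls-files --stage |
		awk '$1 == "160000"' | cut -f2)
	# Find nested .git entries, skipping the repository's own top-level .git.
	(cd "$top" && find . -path ./.git -prune -o -name .git -prune -print) |
	while IFS= read -r entry
	do
		dir=${entry#./}
		dir=${dir%/.git}
		printf '%s\n' "$registered" | grep -Fqx -- "$dir" ||
			printf '%s\n' "$dir"
	done
}
```

In the situation described above, this should list `jinja2` once the submodule has been removed from the index. Whether it is then safe to remove `jinja2/.git` by hand is left to whoever runs it.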

Or maybe we should just add `.git` to our `.gitignore`, so people who
run `git add .` can't create these orphaned submodules...

craig


Re: [PATCH] git-new-workdir: support submodules

2015-01-25 Thread Craig Silverstein
 But then, you are saying that the update does not fix these existing
 issues around submodule support.  So...?

I guess my point is that the existing contrib script has proven to be
useful to people, even though it imposes these constraints on clients
wrt the config file (namely, you can't have multiple workdirs that
need different values in the config file).  I expect this patch, in
adding submodule support, would be similarly useful to people even
though it also imposes those same constraints on the submodules'
config files.

I guess you'd rather see these config file issues fixed for all use
cases?  If so, I'm probably not the right person since I do not know
enough about how config files are used in git -- I fear any changes I
made would make some things worse for (some) existing clients of the
script, which is not what I want.  It sounds like this functionality
is being reimplemented in git proper in any case, so perhaps it's best
just to wait for that.  I don't know what its submodule support will
be, though.

craig
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-new-workdir: support submodules

2015-01-24 Thread Craig Silverstein
} Or one new-workdir checkout's branch may check out a top-level
} project from today while the other one may have a top-level project
} from two years ago,

This is also true, but it is just as much a problem with the 'git
new-workdir' script as it existed before my change.  It already
symlinks the top-level .git/config file, which lists a remote,
submodules, and many other things.  Does symlinking the config file
for submodules add any new wrinkles that symlinking the config file
for the top-level repository does not?

craig

On Fri, Jan 23, 2015 at 5:37 PM, Junio C Hamano gits...@pobox.com wrote:
 Craig Silverstein csilv...@khanacademy.org writes:

 Doesn't a submodule checkout keep some state tied to the working
 tree in its repository configuration file?

 Do you mean, in 'config' itself?  If so, I don't see it.  (Though it's
 possible there are ways to use submodules that do keep working-tree
 state in the config file, and we just happen not to use those ways.)
 Here's what my webapp/.git/modules/khan-exercises/config looks like:
 ---
 [core]
 	repositoryformatversion = 0
 	filemode = true
 	bare = false
 	logallrefupdates = true
 	worktree = ../../../khan-exercises
 [remote "origin"]
 	url = http://github.com/Khan/khan-exercises.git
 	fetch = +refs/heads/*:refs/remotes/origin/*
 [branch "master"]
 	remote = origin
 	merge = refs/heads/master
 	rebase = true
 [submodule "test/qunit"]
 	url = https://github.com/jquery/qunit.git
 ---

 The only thing that seems vaguely working-tree related is the
 'worktree' field, which is the field that motivated me to set up my
 patch the way it is.

 That is the location of the working tree of the top-level
 superproject.  Things tied to the state of the submodule working
 tree appear in the [submodule "test/qunit"] part.

 In one new-workdir checkout, that submodule may be submodule
 inited, while another one, it may not be.

 Or one new-workdir checkout's branch may check out a top-level
 project from today while the other one may have a top-level project
 from two years ago, and between these two checkouts of the top-level
 project, the settings of submodule.test/qunit.* variables may have
 to be different (perhaps even URL may have to point at two different
 repositories, one historical one to grab the state two years ago,
 the other current one).

 So sharing config between top-level checkouts may not be enough to
 support submodules (the patch title).


Re: [PATCH] git-new-workdir: support submodules

2015-01-23 Thread Craig Silverstein
Ping! (now that the holidays are past)

craig

On Tue, Dec 23, 2014 at 1:51 PM, Craig Silverstein
csilv...@khanacademy.org wrote:
 [Ack, I forgot to cc myself on the original patch so now I can't reply
 to it normally.  Hopefully my workaround doesn't mess up the threading
 too badly.]

 Junio C Hamano gitster at pobox.com writes:

 H, does that mean that the submodule S in the original
 repository O's working tree and its checkout in the secondary
 working tree W created from O using git-new-workdir share the same
 repository location?  More specifically:

 O/.git/                 - original repository
 O/.git/index            - worktree state in O
 O/S                     - submodule S's checkout in O
 O/S/.git                - a gitfile pointing to O/.git/modules/S
 O/.git/modules/S        - submodule S's repository contents
 O/.git/modules/S/config - submodule S's config

 W/.git/                 - secondary working tree
 W/.git/config           - symlink to O/.git/config
 W/.git/index            - worktree state in W (independent of O)
 W/S                     - submodule S's checkout in W (independent of O)
 W/S/.git                - a gitfile pointing to O/.git/modules/S

 Right until the last line.  The .git file holds a relative path (at
 least, it always does in my experience), so W/S/.git will point to
 W/.git/modules/S.

 Also, to complete the story, my change sets up the following:

 W/.git/modules/S        - secondary working tree for S
 W/.git/modules/S/config - symlink to O/.git/modules/S/config
 W/.git/modules/S/index  - worktree state in W's S
                           (independent of O and O's S)

 Doesn't a submodule checkout keep some state tied to the working
 tree in its repository configuration file?

 Do you mean, in 'config' itself?  If so, I don't see it.  (Though it's
 possible there are ways to use submodules that do keep working-tree
 state in the config file, and we just happen not to use those ways.)
 Here's what my webapp/.git/modules/khan-exercises/config looks like:
 ---
 [core]
 	repositoryformatversion = 0
 	filemode = true
 	bare = false
 	logallrefupdates = true
 	worktree = ../../../khan-exercises
 [remote "origin"]
 	url = http://github.com/Khan/khan-exercises.git
 	fetch = +refs/heads/*:refs/remotes/origin/*
 [branch "master"]
 	remote = origin
 	merge = refs/heads/master
 	rebase = true
 [submodule "test/qunit"]
 	url = https://github.com/jquery/qunit.git
 ---

 The only thing that seems vaguely working-tree related is the
 'worktree' field, which is the field that motivated me to set up my
 patch the way it is.

 Wouldn't this change
 introduce problems by sharing O/.git/modules/S/config between the
 two checkouts?

 It's true that this change does result in sharing that file, so if
 that's problematic then you're right.  I'm afraid I don't know all the
 things that can go into a submodule config file.

 (There are other things I don't know as well, such as: do we see .git
 files with 'gitdir: ...' contents only for submodules, or are there
 other ways to create them as well?  Are 'gitdir' paths always
 relative?  Are there special files in .git (or rather .git/modules/S)
 that exist only for submodules and not for 'normal' repos, that we
 need to worry about symlinking?  I apologize for not knowing all these
 git internals, and hope you guys can help point out any gaps that
 affect this patch!)

 craig


[PATCH] git-new-workdir: support submodules

2014-12-23 Thread Craig Silverstein
The basic problem with submodules, from git-new-workdir's point of
view, is that instead of having a .git directory, they have a .git
file with contents `gitdir: <some other path>`.  This is a problem
because the submodule's config file has an entry like `worktree =
../../../khan-exercises` which is relative to <some other path>
rather than to submodule_dir/.git.

As a result, if we want the new workdir to work properly, it needs to
keep the same directory structure as the original repository: it
should also contain a .git file with a 'gitdir', and the actual .git
contents should be in the place mentioned therein.  (There are other
ways we could have solved this problem, including modifying the
'config' file, but this seemed most in keeping with the symlink-y
philosophy of git-new-workdir.)

This commit implements this, by detecting if the source .git directory
is actually a .git file with 'gitdir: ...' as its contents, and if so
reproducing the .git file + gitdir structure in the destination
work-tree.

Test plan:
On a repo (~/khan/webapp) with a submodule, ran

git new-workdir ~/khan/webapp /tmp/webapp  # the main module
git new-workdir ~/khan/webapp/khan-exercises /tmp/webapp/khan-exercises

and saw that /tmp/webapp/khan-exercises was populated correctly,
/tmp/webapp/.git/modules/khan-exercises existed with symlinks, and
/tmp/webapp/khan-exercises/.git was a file with a 'gitdir:' entry
pointing to the .git/modules directory.

Signed-off-by: Craig Silverstein csilv...@khanacademy.org
---
 contrib/workdir/git-new-workdir | 55 +++--
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
index 888c34a..3cc50de 100755
--- a/contrib/workdir/git-new-workdir
+++ b/contrib/workdir/git-new-workdir
@@ -52,26 +52,45 @@ then
 		"a complete repository."
 fi
 
+# don't modify an existing directory, unless it's empty
+if test -d "$new_workdir" && test $(ls -a1 "$new_workdir/." | wc -l) -ne 2
+then
+	die "destination directory '$new_workdir' is not empty."
+fi
+
 # make sure the links in the workdir have full paths to the original repo
 git_dir=$(cd "$git_dir" && pwd) || exit 1
 
-# don't recreate a workdir over an existing directory, unless it's empty
-if test -d "$new_workdir"
+new_git_dir="$new_workdir/.git"
+
+# if $orig_git is a .git file with a 'gitdir' entry (as is the case for
+# submodules), have the new git dir follow that same pattern.  otherwise
+# the 'worktree' entry in .git/config, which is a relative path, will
+# not resolve properly because we're not in the expected subdirectory.
+gitdir_text=$(sed -ne 's/^gitdir: *//p' "$orig_git/.git" 2>/dev/null)
+if test -n "$gitdir_text"; then
+	ln -s "$orig_git/.git" "$new_workdir/.git" || failed
+	new_git_dir="$new_workdir/$gitdir_text"
+fi
+
+# if new_workdir already exists, leave it alone in case of error
+if ! test -d "$new_workdir"
 then
-	if test $(ls -a1 "$new_workdir/." | wc -l) -ne 2
-	then
-		die "destination directory '$new_workdir' is not empty."
-	fi
-	cleandir="$new_workdir/.git"
-else
-	cleandir="$new_workdir"
+	clean_new_workdir=true
 fi
 
-mkdir -p "$new_workdir/.git" || failed
-cleandir=$(cd "$cleandir" && pwd) || failed
+mkdir -p "$new_git_dir" || failed
 
+cleandir=$(cd "$cleandir" && pwd) || failed
 cleanup () {
-	rm -rf "$cleandir"
+	if test "z$clean_new_workdir" = ztrue
+	then
+		rm -rf "$new_workdir"
+	fi
+	# this may (or may not) be a noop if new_workdir was already deleted.
+	rm -rf "$new_git_dir"
+	# this is a noop unless .git is a 'gitdir: ...' file.
+	rm -f "$new_workdir/.git"
 }
 siglist="0 1 2 15"
 trap cleanup $siglist
@@ -84,22 +103,20 @@ do
 	# create a containing directory if needed
 	case $x in
 	*/*)
-		mkdir -p "$new_workdir/.git/${x%/*}"
+		mkdir -p "$new_git_dir/${x%/*}"
 		;;
 	esac
 
-	ln -s "$git_dir/$x" "$new_workdir/.git/$x" || failed
+	ln -s "$git_dir/$x" "$new_git_dir/$x" || failed
 done
 
-# commands below this are run in the context of the new workdir
-cd "$new_workdir" || failed
-
 # copy the HEAD from the original repository as a default branch
-cp "$git_dir/HEAD" .git/HEAD || failed
+cp "$git_dir/HEAD" "$new_git_dir/HEAD" || failed
 
 # the workdir is set up.  if the checkout fails, the user can fix it.
 trap - $siglist
 
 # checkout the branch (either the same as HEAD from the original repository,
-# or the one that was asked for)
+# or the one that was asked for).  we must be in the new workdir for this.
+cd "$new_workdir" || failed
 git checkout -f $branch
-- 
2.2.1



Re: [PATCH] git-new-workdir: support submodules

2014-12-23 Thread Craig Silverstein
[Ack, I forgot to cc myself on the original patch so now I can't reply
to it normally.  Hopefully my workaround doesn't mess up the threading
too badly.]

Junio C Hamano gitster at pobox.com writes:

 H, does that mean that the submodule S in the original
 repository O's working tree and its checkout in the secondary
 working tree W created from O using git-new-workdir share the same
 repository location?  More specifically:

 O/.git/                 - original repository
 O/.git/index            - worktree state in O
 O/S                     - submodule S's checkout in O
 O/S/.git                - a gitfile pointing to O/.git/modules/S
 O/.git/modules/S        - submodule S's repository contents
 O/.git/modules/S/config - submodule S's config

 W/.git/                 - secondary working tree
 W/.git/config           - symlink to O/.git/config
 W/.git/index            - worktree state in W (independent of O)
 W/S                     - submodule S's checkout in W (independent of O)
 W/S/.git                - a gitfile pointing to O/.git/modules/S

Right until the last line.  The .git file holds a relative path (at
least, it always does in my experience), so W/S/.git will point to
W/.git/modules/S.
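
To make the relative-path point concrete, here is a minimal sketch of how a `.git` gitfile could be followed from a working tree to its git directory. It reuses the same `sed` expression as the patch; the function name `resolve_gitdir` is hypothetical, and the sketch assumes the gitfile holds a single `gitdir:` line.

```sh
#!/bin/sh
# Hypothetical helper: print the absolute git directory for a work tree whose
# .git is either a real directory or a "gitdir: <path>" file (as in submodules).
resolve_gitdir () {
	wt=$1
	if test -d "$wt/.git"
	then
		(cd "$wt/.git" && pwd)
		return
	fi
	# Same extraction the patch uses; empty if there is no gitfile.
	target=$(sed -ne 's/^gitdir: *//p' "$wt/.git" 2>/dev/null)
	test -n "$target" || return 1
	case $target in
	/*) (cd "$target" && pwd) ;;
	# A relative gitdir is resolved against the directory holding the gitfile.
	*)  (cd "$wt/$target" && pwd) ;;
	esac
}
```

With a gitfile in W/S containing `gitdir: ../.git/modules/S`, calling `resolve_gitdir W/S` resolves inside W rather than O, which is the behavior described above.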

Also, to complete the story, my change sets up the following:

W/.git/modules/S        - secondary working tree for S
W/.git/modules/S/config - symlink to O/.git/modules/S/config
W/.git/modules/S/index  - worktree state in W's S
                          (independent of O and O's S)

 Doesn't a submodule checkout keep some state tied to the working
 tree in its repository configuration file?

Do you mean, in 'config' itself?  If so, I don't see it.  (Though it's
possible there are ways to use submodules that do keep working-tree
state in the config file, and we just happen not to use those ways.)
Here's what my webapp/.git/modules/khan-exercises/config looks like:
---
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	worktree = ../../../khan-exercises
[remote "origin"]
	url = http://github.com/Khan/khan-exercises.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
	rebase = true
[submodule "test/qunit"]
	url = https://github.com/jquery/qunit.git
---

The only thing that seems vaguely working-tree related is the
'worktree' field, which is the field that motivated me to set up my
patch the way it is.

 Wouldn't this change
 introduce problems by sharing O/.git/modules/S/config between the
 two checkouts?

It's true that this change does result in sharing that file, so if
that's problematic then you're right.  I'm afraid I don't know all the
things that can go into a submodule config file.

(There are other things I don't know as well, such as: do we see .git
files with 'gitdir: ...' contents only for submodules, or are there
other ways to create them as well?  Are 'gitdir' paths always
relative?  Are there special files in .git (or rather .git/modules/S)
that exist only for submodules and not for 'normal' repos, that we
need to worry about symlinking?  I apologize for not knowing all these
git internals, and hope you guys can help point out any gaps that
affect this patch!)

craig


Re: Saving space/network on common repos

2014-12-22 Thread Craig Silverstein
btw, just FYI, the scheme you lay out here doesn't actually work
as-is.  The problem is the config file, which has an entry like:
   worktree = ../../../mysubmodule
This depends on the config file living in
.git/modules/mysubmodule/config.  But the proposed scheme moves the
config file to mysubmodule/.git/config, and the relative path is
broken.
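
The breakage can be illustrated with a small sketch that resolves a relative `worktree` value against the directory holding the config file, which is how a relative core.worktree is interpreted. The function name `resolve_worktree` and the `repo/` layout in the usage note are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: resolve a relative core.worktree value against the git
# directory that holds the config file.  Fails if the target does not exist.
resolve_worktree () {
	gitdir=$1
	worktree=$2
	(cd "$gitdir" && cd "$worktree" 2>/dev/null && pwd)
}
```

From `repo/.git/modules/mysubmodule`, the value `../../../mysubmodule` lands on `repo/mysubmodule` as intended. From `repo/mysubmodule/.git`, the same value climbs out of `repo` entirely, so the lookup points at the wrong place (or at nothing).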

I'm not sure what the best solution is; the cleanest one requires a
pretty substantial rewrite of git-new-workdir (not that it's such a
giant piece of code), and separating out new_workdir from new_gitdir.
The smallest one involves having some way to suppress the final 'git
checkout -f' (which is the only thing in this script that needs the
worktree entry to resolve somewhere) to allow for post-script cleanup.

craig

On Wed, Dec 17, 2014 at 4:07 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Craig Silverstein wrote:
 On Wed, Dec 17, 2014 at 2:32 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Craig Silverstein wrote:

 Question 4) Is there a practical way to set up submodules so they can
 use the same object-sharing framework that the main repo does?

 It's possible to do, but we haven't written a nice UI for it yet.
 (In other words, you can do this by cloning with --no-recurse-submodules
 and manually creating the submodule workdir in the appropriate place.

 Hmm, let me see if I understand you right -- you're suggesting that
 when cloning my reference repo, I do
     git clone --no-recurse-submodules <my repo>
     for (path, url) in `parse-.gitmodules`: git clone url path
     # this is pseudocode, obviously :-)

 and then when I want to create a new workdir, I do something like:
     cd reference_repo
     git new-workdir /var/workspace1
     for (path, url) in `parse-.gitmodules`: cd path && git new-workdir
     /var/workspace1/path

 ?  Basically, I'm going back to the old git way of having each
 submodule have its own .git directory, rather than having it have a
 .git file with a 'gitdir' entry.  Am I understanding this right?

 Basically.  The initial clone can still use --recurse-submodules.
 When you create a new workdir you'd create new workdirs for the
 submodules by hand.

 A 'git submodule foreach' command in the initial repo can take
 care of the `parse-.gitmodules` part.

 [...]
 Also, it seems to me there's the possibility, with git-new-workdir, that if
 several of the workspaces try to fetch at the same time they could
 step on each others' toes.  Is that a problem?  I know there's a push
 lock but I don't believe there's a fetch lock, and I could imagine git
 getting unhappy if two fetches happened in the same repo at the same
 time.

 That's a good question.  If concurrent fetches cause trouble then I'd
 consider it a bug (it's not too different from multiple concurrent
 pushes to the same repository, which is a very common thing to do),
 but I haven't looked carefully into whether such bugs exist.

 Hopefully someone else can chime in.

 Thanks,
 Jonathan


Re: Saving space/network on common repos

2014-12-22 Thread Craig Silverstein
} This seems like good motivation to try to get that series in good
} shape and release it soon.

I was going to spend some time tomorrow (if I can find any :-) )
trying to fix up the contrib script to work with submodules, or at
least the kind that we use.  Is that something that's worth the time
to do, or would we be better off just waiting for the work-tree stuff
to get released?  If I do end up doing it, would you be interested in
a pull request (or however patches are submitted in the git world)?

craig

On Mon, Dec 22, 2014 at 7:12 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Craig Silverstein wrote:

 btw, just FYI, the scheme you lay out here doesn't actually work
 as-is.  The problem is the config file, which has an entry like:
worktree = ../../../mysubmodule
 This depends on the config file living in
 .git/modules/mysubmodule/config.  But the proposed scheme moves the
 config file to mysubmodule/.git/config, and the relative path is
 broken.

 As was pointed out to me privately, the behavior is exactly as you
 described and I had confused myself by looking at directory that
 wasn't even made with git-new-workdir.  Sorry for the nonsense.

 Workdirs share a single config file because information associated to
 branches set by git branch --set-upstream-to, git branch
 --edit-description, git remote, and so on are stored in the config
 file.

 The 'git checkout --to' series in pu avoids this problem by ignoring
 core.bare and core.worktree in worktrees created with 'git checkout --to'.
 To try it:

 git clone https://kernel.googlesource.com/pub/scm/git/git
 cd git
 git merge 'origin/pu^{/nd/multiple-work-trees}^2'
 make
 PATH=$(pwd)/bin-wrappers:$PATH

 git checkout --to=../experiment next

 This seems like good motivation to try to get that series in good
 shape and release it soon.

 Thanks again,
 Jonathan


Are simultaneous fetches safe?

2014-12-19 Thread Craig Silverstein
(Separated out from another thread since this issue seems more general.)

I am planning to use 'git new-workdir', which basically lets several
workspaces share a single .git/refs directory (among other dirs in
.git).  It's possible that I'll end up running 'git fetch' in these
workspaces simultaneously, meaning they'll be trying to update
.git/refs at the same time.  Is this safe?  I know there's a push-lock,
but there doesn't seem to be a fetch-lock.

When I tried it, I got some errors:

Running 'git fetch' in window 1:
---
khan% git fetch origin
remote: Counting objects: 37, done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 37 (delta 11), reused 0 (delta 0)
Unpacking objects: 100% (37/37), done.
From github.com:Khan/webapp
   1f893f3..a9d6739  master     -> origin/master
   2641d9f..b630758  athena     -> origin/athena
 + 2a83b90...b7ca8be chris      -> origin/chris  (forced update)
   0e74194..a9d6739  sat        -> origin/sat
 * [new tag]         gae-1219-0726-b7ca8bef9b50 -> gae-1219-0726-b7ca8bef9b50
 * [new tag]         gae-1219-0908-a9d67391b44c -> gae-1219-0908-a9d67391b44c
---

Running 'git fetch' in window 2 at the same time, in the same directory:
---
khan% git fetch origin
error: Ref refs/remotes/origin/master is at
a9d67391b44cc8df8336afbc0b0a53691eae1bd4 but expected
1f893f3a4d78012964fc7fb93d1f61eacb1b4858
From github.com:Khan/webapp
 ! 1f893f3..a9d6739  master     -> origin/master  (unable to update local ref)
error: Ref refs/remotes/origin/athena is at
b63075834f77e494c56aade8ef8d7154f174865c but expected
2641d9ff052769aca8919c5528f02d65210a12cf
 ! 2641d9f..b630758  athena     -> origin/athena  (unable to update local ref)
error: Ref refs/remotes/origin/chris is at
b7ca8bef9b5011aa763104f1193b60dd91e0ba0c but expected
2a83b9042cfd1d73970cf0333910f4db978fbc71
 ! 2a83b90...b7ca8be chris      -> origin/chris  (unable to update local ref)
error: Ref refs/remotes/origin/sat is at
a9d67391b44cc8df8336afbc0b0a53691eae1bd4 but expected
0e74194f8a07a13dbae023f88d9cdf2ddcc3566f
 ! 0e74194..a9d6739  sat        -> origin/sat  (unable to update local ref)
 * [new tag]         gae-1219-0726-b7ca8bef9b50 -> gae-1219-0726-b7ca8bef9b50
 * [new tag]         gae-1219-0908-a9d67391b44c -> gae-1219-0908-a9d67391b44c
---

Are these errors benign, or is there the risk of corruption of some
kind?  Is there the possibility of corruption in other dirs as well,
such as .git/objects?

Is it possible that both fetches could prompt a gc run, and if so, is
there a risk that two gc's running simultaneously could cause
problems?

(Here's the full list of .git dirs shared across workspaces, according
to https://github.com/git/git/blob/master/contrib/workdir/git-new-workdir):
   config refs logs/refs objects info hooks packed-refs remotes rr-cache svn
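
For anyone wanting to check what a given workspace actually shares, a small sketch along these lines may help. It walks the list above (plus index and HEAD, which the script keeps per-workdir) and reports which entries are symlinks. The function name `show_shared_entries` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report which entries in a workdir's git directory are
# symlinks (shared with the original repository) and which are local files.
show_shared_entries () {
	gitdir=$1
	for x in config refs logs/refs objects info hooks packed-refs \
		remotes rr-cache svn index HEAD
	do
		if test -h "$gitdir/$x"
		then
			echo "shared: $x"
		elif test -e "$gitdir/$x"
		then
			echo "local:  $x"
		fi
	done
}
```

Entries that do not exist at all (for example svn in a repo that never used git-svn) are simply skipped.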

Thanks,
craig


Re: Saving space/network on common repos

2014-12-17 Thread Craig Silverstein
On Wed, Dec 17, 2014 at 2:32 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 You might find 'git new-workdir' from contrib/workdir to be helpful.
 It lets you attach multiple working copies to a single set of objects
 and refs.

Thanks!  That does indeed sound promising -- like a more principled
version of my GIT_OBJECT_DIRECTORY suggestion.

 Question 4) Is there a practical way to set up submodules so they can
 use the same object-sharing framework that the main repo does?

 It's possible to do, but we haven't written a nice UI for it yet.
 (In other words, you can do this by cloning with --no-recurse-submodules
 and manually creating the submodule workdir in the appropriate place.

Hmm, let me see if I understand you right -- you're suggesting that
when cloning my reference repo, I do
    git clone --no-recurse-submodules <my repo>
    for (path, url) in `parse-.gitmodules`: git clone url path
    # this is pseudocode, obviously :-)

and then when I want to create a new workdir, I do something like:
    cd reference_repo
    git new-workdir /var/workspace1
    for (path, url) in `parse-.gitmodules`: cd path && git new-workdir
    /var/workspace1/path

?  Basically, I'm going back to the old git way of having each
submodule have its own .git directory, rather than having it have a
.git file with a 'gitdir' entry.  Am I understanding this right?
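
A slightly more concrete sketch of the workdir half of that pseudocode, under some assumptions: the `parse-.gitmodules` step is done with `git config -f`, and the commands are only printed (a dry run) rather than executed, since `git new-workdir` is a contrib script that may not be installed. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print each submodule path recorded in a .gitmodules
# file, one per line (the `parse-.gitmodules` step in the pseudocode).
list_submodule_paths () {
	git config -f "$1" --get-regexp '^submodule\..*\.path$' |
	while read -r key path
	do
		printf '%s\n' "$path"
	done
}

# Dry run: print (rather than execute) the new-workdir commands for the
# reference repository and for each of its submodules.
plan_workdirs () {
	ref=$1
	dest=$2
	echo "git new-workdir $ref $dest"
	list_submodule_paths "$ref/.gitmodules" |
	while IFS= read -r sub
	do
		echo "git new-workdir $ref/$sub $dest/$sub"
	done
}
```

Running `plan_workdirs reference_repo /var/workspace1` prints one command for the main repo and one per submodule, which can be reviewed before being run for real.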

Also, it seems to me there's the possibility, with git-new-workdir,
that if several of the workspaces try to fetch at the same time they
could step on each others' toes.  Is that a problem?  I know there's a
push lock but I don't believe there's a fetch lock, and I could imagine
git getting unhappy if two fetches happened in the same repo at the
same time.

craig


Saving space/network on common repos

2014-12-16 Thread Craig Silverstein
At Khan Academy, we are running a Jenkins installation as our build
server.  By design, our Jenkins machine has several different
directories that each hold a copy of the same git repository.  (For
instance, Jenkins may be running tests on our repo at several
different commits at the same time.)  When Jenkins decides to run a
test -- I'm simplifying a bit -- it will pick one of the copies of the
repo, do a 'git fetch origin && git checkout <some commit>' and then
run the tests.

Our repo has a lot of churn and some big files, and this git fetch can
take a long time. I'd like to reduce both the time to fetch and the
disk space used by sharing objects between the repo copies.

My research has turned up three techniques that try to address this use case:
* git clone --reference
* git clone --shared
* git clone <local repo>, which creates hard links

I can probably use any of these approaches, but git clone --reference
would be the easiest to set up.  I would do so by creating a 'cache'
repo that is just created to serve as a reference and not used in any
other way, so I wouldn't have to worry about the dangers with pruning,
accidentally deleting the repo, etc.

My big concern is that all these methods seem to just affect clone.  So:

Question 1) If I do 'git clone --reference', will the reference repo be
used for subsequent fetches as well?  What about 'git clone --shared'?

Question 2) If I git clone a local repo, will subsequent fetches also
create hard links?

Question 3) If the answer to any of the above is yes, how does this
work with packing?  Say I pack the reference repo (being careful not
to prune anything).  Will subsequent fetches still be able to get the
objects they need from the reference repo?

An added complication is submodules.  We have a submodule that is as
big and slow to fetch as our main repository.

Question 4) Is there a practical way to set up submodules so they can
use the same object-sharing framework that the main repo does?

I'm not keen on rewriting .gitmodules in each of my repos, so probably
something that uses info/alternates is the most workable.  I have a
scheme for setting that up that maybe will work, but it's a moot point
if info/alternates doesn't work for fetching.
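
For concreteness, here is a minimal sketch of what an info/alternates setup could look like (not necessarily the scheme I have in mind): the alternates file lives at objects/info/alternates inside a git directory and lists other object directories, one per line. The function name `add_alternate` and the paths in the usage note are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: register a shared object directory as an alternate of
# the given git directory, without adding the same line twice.
add_alternate () {
	gitdir=$1
	cache_objects=$2
	mkdir -p "$gitdir/objects/info" || return 1
	alt="$gitdir/objects/info/alternates"
	# Store an absolute path so the entry does not depend on the cwd.
	abs=$(cd "$cache_objects" && pwd) || return 1
	grep -Fqx -- "$abs" "$alt" 2>/dev/null || printf '%s\n' "$abs" >>"$alt"
}
```

For a submodule the first argument would be its git directory, e.g. `add_alternate .git/modules/submodule /var/cache/submodule.git/objects`. Whether fetch then takes advantage of the alternate is exactly the open question above.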

I'm wondering if the best approach for us might be to use
GIT_OBJECT_DIRECTORY: set GIT_OBJECT_DIRECTORY to the shared cached
directory for each of our repos, so they all fetch to the same place.

Question 5) I haven't seen this mentioned anywhere else, so I'm
guessing it won't work.  Am I missing a big problem?

Question 6) Will git be sad if two different repos that share an
object directory, both do 'git fetch' at the same time?  I could maybe
protect fetches with an flock, but jenkins can do git operations
behind my back so it would be easier if I didn't have to worry about
locking.
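
The flock idea would look roughly like the sketch below, assuming the util-linux `flock` command is available. The wrapper name `with_fetch_lock` and the lock file path in the usage note are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command while holding an exclusive lock on the
# file named by $1, so only one wrapped command runs at a time.
with_fetch_lock () {
	lockfile=$1
	shift
	(
		# Block until the lock on file descriptor 9 is acquired.
		flock -x 9 || exit 1
		"$@"
	) 9>"$lockfile"
}
```

Usage would be something like `with_fetch_lock /var/cache/webapp.lock git fetch origin`. It only protects fetches that go through the wrapper, so it does nothing for the git operations Jenkins runs on its own.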

Question 7) Is GIT_OBJECT_DIRECTORY supposed to work with submodules?
In my experimentation, it looks like it doesn't: when I run
'GIT_OBJECT_DIRECTORY=../obj git submodule update --init' it still
puts the objects in .git/modules/submodule/objects/.  Is this a bug?
Is there any way to work around it?

Any suggestions would be appreciated!  It feels to me like this is
something that git should support pretty easily given its
architecture, but I just don't see a way to do it.

Thanks,
craig