Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Am 17.08.2012 20:11, schrieb Phil Hord: On Fri, Aug 17, 2012 at 12:44 PM, Jens Lehmann jens.lehm...@web.de wrote: I'm almost there. The only thing left is to check if a nested submodule is using a git directory. In that case I expect rm to fail even when -f is used to protect the submodule's history. I still need to find a suitable command for recursing the submodules and doing that check. I suppose the style of this is wrong, but this seems to work for me. git submodule foreach --recursive '! test -f .git' Thanks! I was looking for something less expensive, but given that I don't expect removing submodules to be a performance critical operation this test should just work fine. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Am 16.08.2012 23:56, schrieb Junio C Hamano: Jens Lehmann jens.lehm...@web.de writes: Am 09.07.2012 21:38, schrieb Junio C Hamano: Jens Lehmann jens.lehm...@web.de writes: Cool, so let's drop this patch and I'll teach rm to handle populated submodules according to what we do for regular files: Make sure there are no modifications which could get lost (unless -f) and remove all tracked files and the gitfile from the work tree (unless --cached) before removing the submodule from the index. If the submodule uses the old layout with the .git directory instead of a gitfile we error out just like we do today. Alternatively we could mv .git directory out of the way and the next git checkout of a branch that still has the submodule can make a gitfile to point there, no? Yup. That would mean a smooth transition for legacy .git-dir submodules into the new gitfile world. And we didn't talk about untracked or ignored files which may live inside the submodules work tree so far, but according to what a rm -r does in the superproject they should still be around after using rm on a populated submodule, right? Until we add the precious class, untracked and ignored files are expendable, so if a submodule working tree has no modification and only has leftover *.o files, they can be nuked as part of submodule removal, but if it has an untracked and unignored *.c file for example, the rm operation without -f should be stopped, no? Ok, untracked files mark the submodule modified while ignored files which are not tracked won't. Thanks for this discussion, I'll start hacking on that. A mild ping on seemingly stalled topic. I'm almost there. The only thing left is to check if a nested submodule is using a git directory. In that case I expect rm to fail even when -f is used to protect the submodule's history. I still need to find a suitable command for recursing the submodules and doing that check. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
On Fri, Aug 17, 2012 at 12:44 PM, Jens Lehmann jens.lehm...@web.de wrote: I'm almost there. The only thing left is to check if a nested submodule is using a git directory. In that case I expect rm to fail even when -f is used to protect the submodule's history. I still need to find a suitable command for recursing the submodules and doing that check. I suppose the style of this is wrong, but this seems to work for me. git submodule foreach --recursive '! test -f .git' -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Am 09.07.2012 04:17, schrieb Junio C Hamano: Jens Lehmann jens.lehm...@web.de writes: So I still think --recurse-submodules does not make any sense to the rm command. I would understand a Do not attempt to remove submodules and ignore their existence altogether option, even though I do not think it is useful. Yes, when rm removes populated submodule work trees by default then there is no need for a --recurse-submodules option. And then we don't need a --no-recurse-submodules either as that use case is already covered by the --cached option, right? All other core commands happily change the index without updating the submodule's work tree. What are all other core commands? fetch by definition does not touch working tree inside or outside working tree. add is about recording the state from the working tree to the index, and following the earlier point you raised (I unfortunately culled from the quote), as the rest of core Git considers a submodule a single path entity in the context of the superproject, the state from the working tree for the submodule is where its HEAD points at, so it is correct not to look at the working tree. I wanted to say all other /work tree updating/ core commands here (checkout, merge, reset ...). Sorry for the confusion. Without going outside the context of rm, I think the current behaviour of git rm path for submodule is merely punting---erroring out against a request for an unimplemented feature. If you ask an unpopulated submodule to be removed, we could simply rmdir() it and remove the entry from the index, but that is far insufficient for handling a submodule repository that is already inited. And we did not want to plug the check if removal will lose any state from the submodule repository logic because the information is no use for us for a long time until we added the gitfile support to allow us to relocate path/.git for the submodule safely away when we remove the working tree of it. But now we do have gitfile, so we could remove submodule working tree. I think not erroring out and removing only the index entry is a irresponsible thing to do. It would mean that we pretend as if a feature that was earlier unimplemented (hence errored out) is now supported, but it does not do what the user asked us to do, no? I do not know what other commands you have in mind, but some of them may be similar recursing down and performing an operation that potentially can fail is too complex, and we are not sure if we have enough sequencing infrastructure to guide the user to sort out the mess in the half-result, so let's punt and not to that part and have the user sort out half-features, and if that is the case, I do not think it is prudent to model rm, which is something that we now could implement properly, with the necessary infrastructure already in place, after them. Cool, so let's drop this patch and I'll teach rm to handle populated submodules according to what we do for regular files: Make sure there are no modifications which could get lost (unless -f) and remove all tracked files and the gitfile from the work tree (unless --cached) before removing the submodule from the index. If the submodule uses the old layout with the .git directory instead of a gitfile we error out just like we do today. And we didn't talk about untracked or ignored files which may live inside the submodules work tree so far, but according to what a rm -r does in the superproject they should still be around after using rm on a populated submodule, right? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Jens Lehmann jens.lehm...@web.de writes: One possible sane behaviour of git rm $path might be: - If --force is given, remove it from the index and from the working tree (i.e. rm -rf $path), but use the gitfile facility to save $path/.git away to $GIT_DIR/modules/$name; error out if the submodule directory $path cannot be removed like this. We would probably want to remove submodule.name.* entries in .gitmodules for name for which submodule.name.path matches the $path. - If --cached is given, remove it from the index if the version in the index match either HEAD or the $path/.git/HEAD, without touching the working tree. This is consistent with what happens to a regular file. - If neither --force nor --cached is given, run an equivalent of (cd $path git status), and also check if $path/.git/HEAD matches the index version. Error out if the submodule directory is dirty (again I am not sure about this part). If the submodule directory is clean, do the same as the case with --force. What you describe here is exactly how I think git submodule rm and git rm --recurse-submodules should behave. If you have a directory A with a file B in it (i.e. A/B), git rm A is refused and you have to say git rm -r A. So I can see why the above description of the mine is wrong with respect to -r option (all cases should fail if you did not give -r option). But I do not think git rm needs --recurse-submodules. Wasn't --recurse-submodules the option to control, when you tell Git to do something to submodule A, what should happen to submodules contained in the submodule A (e.g. A/B that appears at path B that itself is a separate project bound as a submodule to A)? I can see what fetching or updating A (e.g. git submodule update) while leaving A/B intact means, so there is a reason to have two ways (with or without --recurse-submodules) to such a command, but I do not see any sensible definition of what it means to remove (whether it is git submodule rm or just plain git rm) A without affecting A/B, especially with respect to the working tree files. If you remove A and its working tree files, is there a sensible way to keep A/B and its working tree files? I am OK if you choose to implement the behaviour described above only in git submodule rm A and not plain git rm -r A, but if you are going that route, I do not see how it is an improvement for it to remove the index entry for A from the index if your git rm -r A does not remove working tree files for submodule A. The user asked to remove A with a command that would remove both index entry and working tree file for a regular file (or a directory), the command may decide it is not prudent to do so for whatever reason. Perhaps the entity being asked to remove has local changes the user may regret losing. Perhaps we decided that such an opration to cause a large structural change should not be done with a plain rm but should be done with git submodule rm. The reasons do not matter much, but for the end result to be consistent, shouldn't the command keep the index entry intact if it does not remove the working tree? Unless the user used git rm --cached to explicitly ask for the index entry to be removed without touching the working tree files, that is. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Am 08.07.2012 09:32, schrieb Junio C Hamano: Jens Lehmann jens.lehm...@web.de writes: One possible sane behaviour of git rm $path might be: - If --force is given, remove it from the index and from the working tree (i.e. rm -rf $path), but use the gitfile facility to save $path/.git away to $GIT_DIR/modules/$name; error out if the submodule directory $path cannot be removed like this. We would probably want to remove submodule.name.* entries in .gitmodules for name for which submodule.name.path matches the $path. - If --cached is given, remove it from the index if the version in the index match either HEAD or the $path/.git/HEAD, without touching the working tree. This is consistent with what happens to a regular file. - If neither --force nor --cached is given, run an equivalent of (cd $path git status), and also check if $path/.git/HEAD matches the index version. Error out if the submodule directory is dirty (again I am not sure about this part). If the submodule directory is clean, do the same as the case with --force. What you describe here is exactly how I think git submodule rm and git rm --recurse-submodules should behave. If you have a directory A with a file B in it (i.e. A/B), git rm A is refused and you have to say git rm -r A. So I can see why the above description of the mine is wrong with respect to -r option (all cases should fail if you did not give -r option). I think that depends on how you see submodules in the context of the superproject. If you see the submodule entry as an abstraction describing the whole work tree below it with a single SHA1, -r isn't necessary (but won't harm either). Only if you think that the individual files of a submodule make sense in the superproject's context, you'd need -r. I think we always use the first approach: you either get all files which are in a submodule's work tree, or none. So I don't think we should require -r here, as a submodule is a single object of the superproject. And for the same reason git rm A/B issues an error: the file A/B doesn't exist in the superproject. But I do not think git rm needs --recurse-submodules. Wasn't --recurse-submodules the option to control, when you tell Git to do something to submodule A, what should happen to submodules contained in the submodule A (e.g. A/B that appears at path B that itself is a separate project bound as a submodule to A)? Nope. Only the --recursive option to the git submodule script works like that (and almost everyone seems to use that option by default anyway). But for all commands that understand the --recurse-submodule option (currently these are clone, fetch, merge, pull and push) that means include submodules in what you do and don't stop at the first level but recurse all the way down. Without this option they won't even touch the first level of submodules. I am OK if you choose to implement the behaviour described above only in git submodule rm A and not plain git rm -r A, but if you are going that route, I do not see how it is an improvement for it to remove the index entry for A from the index if your git rm -r A does not remove working tree files for submodule A. The user asked to remove A with a command that would remove both index entry and working tree file for a regular file (or a directory), the command may decide it is not prudent to do so for whatever reason. Perhaps the entity being asked to remove has local changes the user may regret losing. Perhaps we decided that such an opration to cause a large structural change should not be done with a plain rm but should be done with git submodule rm. The reasons do not matter much, but for the end result to be consistent, shouldn't the command keep the index entry intact if it does not remove the working tree? All other core commands happily change the index without updating the submodule's work tree. My first patch intended to make git rm behave the same: Don't care if the submodule's work tree is still there, but just record in the index what the user told you. Right now it *always* throws an error no matter if the submodule is modified or not, while a rm should either just work on unmodified content or leave submodules alone. That's what I'm trying to fix. To me checking out a commit before the submodule was added should leave work tree and index in the same state with respect to the submodule as when I do a git rm sub. In the former case the submodule directory is still there with all files but it won't be present in the index (and .gitmodules) anymore. And checkout doesn't care at all about any modifications in the submodule before updating. git rm will behave just the same after my patch (of course I'm talking about the upcoming v2 without checking the HEAD). But I agree with you that rm should consider modifications of the work tree of a submodule (unless -f) before actually removing its work tree. But I don't think
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Jens Lehmann jens.lehm...@web.de writes: What you describe here is exactly how I think git submodule rm and git rm --recurse-submodules should behave. If you have a directory A with a file B in it (i.e. A/B), git rm A is refused and you have to say git rm -r A. So I can see why the above description of the mine is wrong with respect to -r option (all cases should fail if you did not give -r option). I think that depends on how you see submodules in the context of the superproject. I am OK with git rm path removing the submodule working tree and the index entry for path without -r (of course git rm dir would not remove submodule dir/path in a plain directory dir and needs to be spelled git rm -r dir or git rm dir/path but that is the same if path were a regular file and a submodule is not special); thinking about it again, I think it makes more sense, because a submodule is just one single path entity in the context of the superproject. But I do not think git rm needs --recurse-submodules. Wasn't --recurse-submodules the option to control, when you tell Git to do something to submodule A, what should happen to submodules contained in the submodule A (e.g. A/B that appears at path B that itself is a separate project bound as a submodule to A)? Nope. Only the --recursive option to the git submodule script works like that (and almost everyone seems to use that option by default anyway). But for all commands that understand the --recurse-submodule option (currently these are clone, fetch, merge, pull and push) that means include submodules in what you do and don't stop at the first level but recurse all the way down. Without this option they won't even touch the first level of submodules. OK, but what does rm --no-recurse-submodules path could possibly mean in that case? If you remove path by definition anything underneath path cannot be remain, so in the context of rm, once you decide to remove submodule at path, not recursing is an option. So I still think --recurse-submodules does not make any sense to the rm command. I would understand a Do not attempt to remove submodules and ignore their existence altogether option, even though I do not think it is useful. All other core commands happily change the index without updating the submodule's work tree. What are all other core commands? fetch by definition does not touch working tree inside or outside working tree. add is about recording the state from the working tree to the index, and following the earlier point you raised (I unfortunately culled from the quote), as the rest of core Git considers a submodule a single path entity in the context of the superproject, the state from the working tree for the submodule is where its HEAD points at, so it is correct not to look at the working tree. Without going outside the context of rm, I think the current behaviour of git rm path for submodule is merely punting---erroring out against a request for an unimplemented feature. If you ask an unpopulated submodule to be removed, we could simply rmdir() it and remove the entry from the index, but that is far insufficient for handling a submodule repository that is already inited. And we did not want to plug the check if removal will lose any state from the submodule repository logic because the information is no use for us for a long time until we added the gitfile support to allow us to relocate path/.git for the submodule safely away when we remove the working tree of it. But now we do have gitfile, so we could remove submodule working tree. I think not erroring out and removing only the index entry is a irresponsible thing to do. It would mean that we pretend as if a feature that was earlier unimplemented (hence errored out) is now supported, but it does not do what the user asked us to do, no? I do not know what other commands you have in mind, but some of them may be similar recursing down and performing an operation that potentially can fail is too complex, and we are not sure if we have enough sequencing infrastructure to guide the user to sort out the mess in the half-result, so let's punt and not to that part and have the user sort out half-features, and if that is the case, I do not think it is prudent to model rm, which is something that we now could implement properly, with the necessary infrastructure already in place, after them. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Junio C Hamano gits...@pobox.com writes: Nope. Only the --recursive option to the git submodule script works like that (and almost everyone seems to use that option by default anyway). But for all commands that understand the --recurse-submodule option (currently these are clone, fetch, merge, pull and push) that means include submodules in what you do and don't stop at the first level but recurse all the way down. Without this option they won't even touch the first level of submodules. OK, but what does rm --no-recurse-submodules path could possibly mean in that case? If you remove path by definition anything underneath path cannot be remain, so in the context of rm, once you decide to remove submodule at path, not recursing is an option. Yikes, I hate myself after making silly typoes. Of course the above needs s/cannot .. remain/cannot remain/; and more importantly, not recursing is _not_ an option once you decide to remove it. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/2] rm: don't fail when removing populated submodules
Am 06.07.2012 08:57, schrieb Junio C Hamano: Jens Lehmann jens.lehm...@web.de writes: Also apply the same policy as for regular files and require forcing when the submodules HEAD is different than what is recorded in the index. I think the policy for regular files is that git rm $path errors out to avoid losing information in $path. Even if the HEAD in the submodule points at the same commit as recorded in the index, if the submodule directory has other changes that (cd $path git status) would report, we would not want to remove it, no? I am not sure if the difference between $path/.git/HEAD and :$path (the version in the index) matters. Maybe it does, maybe it doesn't. Doh. I don't know how I got the idea it would be so, but a quick test with checkout and rebase showed they ignore if a submodules HEAD is different from the commit recorded. So plain rm should do the same as long as it doesn't touch the submodule work tree and I'll remove checking the HEAD from patch 1. I'll prepare v2 which will also include an updated commit message. One possible sane behaviour of git rm $path might be: - If --force is given, remove it from the index and from the working tree (i.e. rm -rf $path), but use the gitfile facility to save $path/.git away to $GIT_DIR/modules/$name; error out if the submodule directory $path cannot be removed like this. We would probably want to remove submodule.name.* entries in .gitmodules for name for which submodule.name.path matches the $path. - If --cached is given, remove it from the index if the version in the index match either HEAD or the $path/.git/HEAD, without touching the working tree. This is consistent with what happens to a regular file. - If neither --force nor --cached is given, run an equivalent of (cd $path git status), and also check if $path/.git/HEAD matches the index version. Error out if the submodule directory is dirty (again I am not sure about this part). If the submodule directory is clean, do the same as the case with --force. What you describe here is exactly how I think git submodule rm and git rm --recurse-submodules should behave. The questions remaining for me are: * What should a git rm --no-recurse-submodules do? I think it should try to follow the policy git core commands use: - don't touch the submodule's work tree - remove the submodule directory if it is empty and warn if not (currently it dies if not, to change that to a warning is the subject of patch 1) The more difficult question is if it should remove the submodule entry from .gitmodules (patch 2). As that file is part of the superproject's work tree and core git already learned to read configuration options for status, diff and fetch from it I think it's a good idea to help the user by doing so (but maybe we should make this configurable and/or add an option to enable/disable it). But on the other hand maybe users expect this only to happen when they use git submodule rm and git rm should leave .gitmodules alone? * What should the default behavior of git rm be. I tend to think that as all other core git commands never manipulate a submodule's work tree without the --recurse-submodules option, rm should do the same. So I think we should default to the --no-recure-submodules case described above to not surprise the user. That makes checking the submodule for modifications unnecessary until the --recure-submodules option is implemented. Does that make sense? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html