Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Wed, Sep 19, 2018 at 06:12:23PM +0200, Duy Nguyen wrote: > On Wed, Sep 19, 2018 at 1:19 AM Jeff King wrote: > > > > On Tue, Sep 18, 2018 at 12:36:06PM -0700, Jacob Keller wrote: > > > > > > I like that, too. It's a little more costly just because it may involve > > > > object-db writes, but I think in most cases it would be fine. I almost > > > > always "git stash" away discarded changes these days instead of "git > > > > reset --hard", because it effectively provides this kind of log. > > > > > > Obviously we do eventually turn the index into a tree, which is used > > > by the commit. Would it be possible to simply somehow store these > > > trees, and have commands which blow the tree away simply instead, save > > > it? I'm not sure how costly that is. > > > > For an up-to-date index with the cache-tree extension, this should be > > pretty cheap. Even for a partially-staged index, where we already have > > the blob in the object database, it should just incur the tree > > computation (and parts you didn't touch would hopefully have an > > up-to-date cache-tree). > > Just FYI. I wanted to get at least pruning support in place before > posting another RFC. But right now I'm going without cache-tree. > Whenever a file is updated, the _blob_ hashes are recorded in the > index log (one line per update, same as reflog format) and the > description part of the line is the updated path. > > This is simpler to do, and we can still reconstruct a tree/commit in > memory if we need to (but not whole tree snapshots; but I don't think > that would be very useful). But maybe cache-tree approach is better. > We will see. Oh, interesting. Sort of a hybrid approach. :) That makes sense, and I don't see any reason it wouldn't work. -Peff
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Wed, Sep 19, 2018 at 1:19 AM Jeff King wrote: > > On Tue, Sep 18, 2018 at 12:36:06PM -0700, Jacob Keller wrote: > > > > I like that, too. It's a little more costly just because it may involve > > > object-db writes, but I think in most cases it would be fine. I almost > > > always "git stash" away discarded changes these days instead of "git > > > reset --hard", because it effectively provides this kind of log. > > > > Obviously we do eventually turn the index into a tree, which is used > > by the commit. Would it be possible to simply somehow store these > > trees, and have commands which blow the tree away simply instead, save > > it? I'm not sure how costly that is. > > For an up-to-date index with the cache-tree extension, this should be > pretty cheap. Even for a partially-staged index, where we already have > the blob in the object database, it should just incur the tree > computation (and parts you didn't touch would hopefully have an > up-to-date cache-tree). Just FYI. I wanted to get at least pruning support in place before posting another RFC. But right now I'm going without cache-tree. Whenever a file is updated, the _blob_ hashes are recorded in the index log (one line per update, same as reflog format) and the description part of the line is the updated path. This is simpler to do, and we can still reconstruct a tree/commit in memory if we need to (but not whole tree snapshots; but I don't think that would be very useful). But maybe cache-tree approach is better. We will see. -- Duy
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Tue, Sep 18, 2018 at 12:36:06PM -0700, Jacob Keller wrote: > > I like that, too. It's a little more costly just because it may involve > > object-db writes, but I think in most cases it would be fine. I almost > > always "git stash" away discarded changes these days instead of "git > > reset --hard", because it effectively provides this kind of log. > > Obviously we do eventually turn the index into a tree, which is used > by the commit. Would it be possible to simply somehow store these > trees, and have commands which blow the tree away simply instead, save > it? I'm not sure how costly that is. For an up-to-date index with the cache-tree extension, this should be pretty cheap. Even for a partially-staged index, where we already have the blob in the object database, it should just incur the tree computation (and parts you didn't touch would hopefully have an up-to-date cache-tree). Where it gets expensive is when you are throwing away working-tree changes, and you have to re-hash those objects. Conceptually that's not much worse than just calling `git add`, but there are corner cases (e.g., imagine you keep a 1GB clone of another project in an ignored directory, and then `git clean -dx` wants to `git add` all of it). -Peff
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Tue, Sep 18, 2018 at 12:41:30PM -0700, Jacob Keller wrote: > I think having both is good. There are a lot of ways to accidentally > throw away work, and it's pretty frustrating to have it happen. But > the reflog is also somewhat complicated, and I've definitely seen a > lot of developers who've never heard of it, and struggle with the > concept. It's definitely good to improve on "oh, I screwed up - how can I recover?" part. > I personally think having the nice "it looks like you're about to > throw away all your changes, are you sure" style of protection using > something like --clobber-index is useful as a mode, even if we have an > index log of sorts. I don't think it's an appealing design. If we have the index reflog giving us an undo function, then it is (at least for me) irritating to have additional protection. Furthermore, this protection, to not break existing workflows, needs to be configurable. This comes with much baggage on its own. Having then two things which seem to solve the same problem setting, but somehow different, is in my opinion even more confusing in the long run. Greetings, Eckhard
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 12:26 PM Junio C Hamano wrote: > FWIW, I didn't mean to say that we should give users a way to > recover. Your "commit -a" or "commit $path" protection just > prevents the situation from happening, and I think it is sufficient. > > The sole point I wanted to raise by bringing up the above was that > we should have the same degree of protection against "add $path" or > "add -u". > > Of course, "index log" is interesting and it may even turn out to be > useful (I was skeptical about "reference log" the same way, but it > turned out to be useful without burdening the system too heavily), > and it may even remove the need for the "do not accidentally lose > information by adding more to the index" protection. But until that > happens, if we are to have such a protection, we would wnat to give > the same degree of protection to "commit" and "add". I think having both is good. There are a lot of ways to accidentally throw away work, and it's pretty frustrating to have it happen. But the reflog is also somewhat complicated, and I've definitely seen a lot of developers who've never heard of it, and struggle with the concept. I personally think having the nice "it looks like you're about to throw away all your changes, are you sure" style of protection using something like --clobber-index is useful as a mode, even if we have an index log of sorts. Having it be default helps new people, even if it does get in the way of someone who knows what they're doing. Having it be configurable, to me, sort of defeats the point, since it means having to tell people to turn this on. I personally don't mind having to type an extended option to clobber when I know it's what I want, but I can see that being painful. However, if we had a reflog for the index, this becomes less of a problem since recovery is much easier. Thanks, Jake
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 11:15 AM Jeff King wrote: > > On Mon, Sep 17, 2018 at 07:29:26PM +0200, Duy Nguyen wrote: > > > > On the other hand, if I am keeping some change that should never be > > > in a commit in the working tree file, and building the contents in > > > the index using "add -p" to incrementally, it would be the same > > > disaster as you are trying to prevent if I by mistake did a whole > > > path 'add', even if I catch myself doing so before running 'commit' > > > i.e. > > > > > > edit X > > > git add -p X > > > git diff --cached X > > > git diff X > > > ... repeat the above number of times ... > > > git add X ;# OOPS! > > > git add . ;# OOPS! even worse! > > > > > > Even though this does not involve "git commit -a" or "git commit X", > > > an unrecoverable damage that requires redoing the manual work is > > > already done. > > > > I don't see a good way to get to recover this situation. I could go > > back to the "index log" idea, where we keep a log of index changes (or > > just "interesting" changes). That way there's no behavior change at > > all. The user who accidentally updates/deletes something can always > > retrieve the old content back (assuming that they realize quickly > > since we can't keep very long log). > > FWIW, I like that approach much better, since: > > 1. It does not bother or restrict anybody in their workflow; instead, > they pay the complexity price only when they know they have made a > mistake. > > 2. It covers many more cases (e.g., just doing the wrong thing via > "add -p"). > I also think this is a better approach for the same reasons. > A naive index log would be pretty cheap in CPU, at least for POSIX-ish > systems. You could just hard link "index" to "index.N" before renaming > "index.lock" over "index". But I guess if you have a gigantic index, > that's less appealing. So maybe storing the equivalent of a "--raw" diff > between the two index states would make more sense (and after all, you > don't really need the stat-cache or cache-tree). It would cost more to > reconstruct the index on the fly, but then the point is that you would > create these logs a lot more than you access them. > > > I've been thinking about allowing to undo worktree changes too (e.g. > > accidental "git reset --hard") and this log can cover it as well. > > I like that, too. It's a little more costly just because it may involve > object-db writes, but I think in most cases it would be fine. I almost > always "git stash" away discarded changes these days instead of "git > reset --hard", because it effectively provides this kind of log. > Obviously we do eventually turn the index into a tree, which is used by the commit. Would it be possible to simply somehow store these trees, and have commands which blow the tree away simply instead, save it? I'm not sure how costly that is. Thanks, Jake
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 10:09 AM Junio C Hamano wrote: > > It usually is safer (simply because you do not have to think about > it) to start a behaviour change like this as a strict opt-in to gain > confidence. I tend to agree, however.. in this case it could be considered safer to err on the side of not throwing away the index which could have crafted changes in it. > The approach to check if the contents in the index matches that in > the HEAD per-path (i.e. "The contents we are adding to the index is > whole working tree contents for that path. But the index already > has contents different from HEAD for the path---are we losing > information by doing this?"), is a very good one. But for the > protection to be effective, I think "git commit" and "git add" > should be covered the same way, ideally with the same code and > possibly the same configuration knob and/or command line option to > control the behaviour. Checking both commit and add makes sense to me. > > If the information loss caused by the "add/commit X" or "add > -u/commit -a" is so serious that this new feature deserves to become > the default (which I do not yet think it is the case, by the way), > then we could even forbid "commit X" or "commit -a" when the paths > involved has difference between the index and the HEAD, without any > configuration knob or command line override for "commit", and then > tell the users to use "git add/rm" _with_ the override before coming > back to "git commit". I was going to suggest we have some sort of reflog equivalent for the index, but Duy seems to discuss that in a follow-on mail. > > How should this new check intract with paths added with "add -N", by > the way?
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 08:41:36PM +0200, Duy Nguyen wrote: > > I think the reflog approach has been successful: give these intermediate > > states a name. So in theory I could do something like: > > > > git checkout -p :@{2.minutes.ago} > > > > though it would probably be useful to be able to walk the states, too, > > just like we have "log --reflog-walk". > > > > The syntax above is off-the-cuff (and based on the ":" index > > syntax). I guess if we had a separate log for "stuff in the worktree > > that got thrown away by reset" we might need a separate syntax. > > I'm leaning towards reflog too. Not because of the syntax but because > of reusing the current code base. I don't have to worry about pruning, > walking, gc-ing... because it's pretty much already there. And the UI > is not so urgent since reflog file is very readable, early adopters > can just open the file and get the hash. Ah, good point on pruning/gc. I had imagined a new "index log" format. But really if you just turn the result into a tree, then it becomes just another ref(log). That might be slightly less efficient, but the flexibility and simplicity are probably worth it. Or maybe with cache-tree it is not even less efficient. So that definitely seems like the right direction. > I'm trying to quickly make something that writes to > "$GIT_DIR/logs/index" and see how it goes. But yeah I'll probably drop > this patch. The ":@{2.minutes.ago}" just makes me like this direction > more, even though I don't know if I could even make that work. I think "index@{2.minutes.ago}" would almost work, but it would probably complain about a reflog without a matching ref. That seems like it would be pretty easy to work around in the reflog code. Or maybe even by having a stash-like "index log" ref. -Peff
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
Duy Nguyen writes: > On Mon, Sep 17, 2018 at 7:09 PM Junio C Hamano wrote: >> ... >> On the other hand, if I am keeping some change that should never be >> in a commit in the working tree file, and building the contents in >> the index using "add -p" to incrementally, it would be the same >> disaster as you are trying to prevent if I by mistake did a whole >> path 'add', even if I catch myself doing so before running 'commit' >> i.e. >> >> edit X >> git add -p X >> git diff --cached X >> git diff X >> ... repeat the above number of times ... >> git add X ;# OOPS! >> git add . ;# OOPS! even worse! >> >> Even though this does not involve "git commit -a" or "git commit X", >> an unrecoverable damage that requires redoing the manual work is >> already done. > > I don't see a good way to get to recover this situation. I could go > back to the "index log" idea,... FWIW, I didn't mean to say that we should give users a way to recover. Your "commit -a" or "commit $path" protection just prevents the situation from happening, and I think it is sufficient. The sole point I wanted to raise by bringing up the above was that we should have the same degree of protection against "add $path" or "add -u". Of course, "index log" is interesting and it may even turn out to be useful (I was skeptical about "reference log" the same way, but it turned out to be useful without burdening the system too heavily), and it may even remove the need for the "do not accidentally lose information by adding more to the index" protection. But until that happens, if we are to have such a protection, we would wnat to give the same degree of protection to "commit" and "add".
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 8:15 PM Jeff King wrote: > On Mon, Sep 17, 2018 at 07:29:26PM +0200, Duy Nguyen wrote: > > I don't see a good way to get to recover this situation. I could go > > back to the "index log" idea, where we keep a log of index changes (or > > just "interesting" changes). That way there's no behavior change at > > all. The user who accidentally updates/deletes something can always > > retrieve the old content back (assuming that they realize quickly > > since we can't keep very long log). > > FWIW, I like that approach much better, since: > > 1. It does not bother or restrict anybody in their workflow; instead, > they pay the complexity price only when they know they have made a > mistake. > > 2. It covers many more cases (e.g., just doing the wrong thing via > "add -p"). > > A naive index log would be pretty cheap in CPU, at least for POSIX-ish > systems. You could just hard link "index" to "index.N" before renaming > "index.lock" over "index". But I guess if you have a gigantic index, > that's less appealing. So maybe storing the equivalent of a "--raw" diff > between the two index states would make more sense (and after all, you > don't really need the stat-cache or cache-tree). It would cost more to > reconstruct the index on the fly, but then the point is that you would > create these logs a lot more than you access them. Yeah. The problem though is extracting the information out of these index.N files. > > I've been thinking about allowing to undo worktree changes too (e.g. > > accidental "git reset --hard") and this log can cover it as well. > > I like that, too. It's a little more costly just because it may involve > object-db writes, but I think in most cases it would be fine. I almost > always "git stash" away discarded changes these days instead of "git > reset --hard", because it effectively provides this kind of log. Yeah the added cost is pretty much given. You want safety, you pay extra :) > > The only downside is we need a new command for the UI (or perhaps I > > can just add "git add --log" or something like that). > > I think the reflog approach has been successful: give these intermediate > states a name. So in theory I could do something like: > > git checkout -p :@{2.minutes.ago} > > though it would probably be useful to be able to walk the states, too, > just like we have "log --reflog-walk". > > The syntax above is off-the-cuff (and based on the ":" index > syntax). I guess if we had a separate log for "stuff in the worktree > that got thrown away by reset" we might need a separate syntax. I'm leaning towards reflog too. Not because of the syntax but because of reusing the current code base. I don't have to worry about pruning, walking, gc-ing... because it's pretty much already there. And the UI is not so urgent since reflog file is very readable, early adopters can just open the file and get the hash. This also lets me handle logging worktree changes in the future (keeping track of untracked files is impossible with the "index.N" approach) I'm trying to quickly make something that writes to "$GIT_DIR/logs/index" and see how it goes. But yeah I'll probably drop this patch. The ":@{2.minutes.ago}" just makes me like this direction more, even though I don't know if I could even make that work. -- Duy
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 07:29:26PM +0200, Duy Nguyen wrote: > > On the other hand, if I am keeping some change that should never be > > in a commit in the working tree file, and building the contents in > > the index using "add -p" to incrementally, it would be the same > > disaster as you are trying to prevent if I by mistake did a whole > > path 'add', even if I catch myself doing so before running 'commit' > > i.e. > > > > edit X > > git add -p X > > git diff --cached X > > git diff X > > ... repeat the above number of times ... > > git add X ;# OOPS! > > git add . ;# OOPS! even worse! > > > > Even though this does not involve "git commit -a" or "git commit X", > > an unrecoverable damage that requires redoing the manual work is > > already done. > > I don't see a good way to get to recover this situation. I could go > back to the "index log" idea, where we keep a log of index changes (or > just "interesting" changes). That way there's no behavior change at > all. The user who accidentally updates/deletes something can always > retrieve the old content back (assuming that they realize quickly > since we can't keep very long log). FWIW, I like that approach much better, since: 1. It does not bother or restrict anybody in their workflow; instead, they pay the complexity price only when they know they have made a mistake. 2. It covers many more cases (e.g., just doing the wrong thing via "add -p"). A naive index log would be pretty cheap in CPU, at least for POSIX-ish systems. You could just hard link "index" to "index.N" before renaming "index.lock" over "index". But I guess if you have a gigantic index, that's less appealing. So maybe storing the equivalent of a "--raw" diff between the two index states would make more sense (and after all, you don't really need the stat-cache or cache-tree). It would cost more to reconstruct the index on the fly, but then the point is that you would create these logs a lot more than you access them. > I've been thinking about allowing to undo worktree changes too (e.g. > accidental "git reset --hard") and this log can cover it as well. I like that, too. It's a little more costly just because it may involve object-db writes, but I think in most cases it would be fine. I almost always "git stash" away discarded changes these days instead of "git reset --hard", because it effectively provides this kind of log. > The only downside is we need a new command for the UI (or perhaps I > can just add "git add --log" or something like that). I think the reflog approach has been successful: give these intermediate states a name. So in theory I could do something like: git checkout -p :@{2.minutes.ago} though it would probably be useful to be able to walk the states, too, just like we have "log --reflog-walk". The syntax above is off-the-cuff (and based on the ":" index syntax). I guess if we had a separate log for "stuff in the worktree that got thrown away by reset" we might need a separate syntax. > Should I just drop this patch and go that direction instead? More > work, but maybe better end result. I guess my whole response is a long-winded "yes". ;) -Peff
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
On Mon, Sep 17, 2018 at 7:09 PM Junio C Hamano wrote: > > Nguyễn Thái Ngọc Duy writes: > > > This is about mixing "git add -p" and "git commit -a" (or "git commit > > ") where you may accidentally lose staged changes. After the > > discussion with Jonathan, I'm going with a bit different approach than > > v1, this behavior now becomes default, and if the user wants the old > > behavior back, they can use --clobber-index. > > > > Another change is "git commit " is covered as well, as pointed > > out by Jacob. > > > > I will need to add some test cases of course, if this direction is > > still promising. One thing I'm not sure about is whether want to > > deliberately clobber the index often, then perhaps we should add a > > config key to bring the old behavior back. > > It usually is safer (simply because you do not have to think about > it) to start a behaviour change like this as a strict opt-in to gain > confidence. Heh. The RFC was opt-in. Jonathan suggested changing default behavior and I went along just to see how far I could push it :) > As I often see myself futzing with the same file, adding changes to > it incrementally, so that I can view progress in "diff --cached" and > "diff" output, it would be a serious usability regression if the > last step in the following sequence is rejected by default: > > edit X > git add X > git diff --cached X > git diff X > ... repeat the above number of times ... > git commit X ;# or "git commit -a" to finally conclude But yes if there's a valid use case where this behavior change becomes a problem, then opt-in makes more sense. > On the other hand, if I am keeping some change that should never be > in a commit in the working tree file, and building the contents in > the index using "add -p" to incrementally, it would be the same > disaster as you are trying to prevent if I by mistake did a whole > path 'add', even if I catch myself doing so before running 'commit' > i.e. > > edit X > git add -p X > git diff --cached X > git diff X > ... repeat the above number of times ... > git add X ;# OOPS! > git add . ;# OOPS! even worse! > > Even though this does not involve "git commit -a" or "git commit X", > an unrecoverable damage that requires redoing the manual work is > already done. I don't see a good way to get to recover this situation. I could go back to the "index log" idea, where we keep a log of index changes (or just "interesting" changes). That way there's no behavior change at all. The user who accidentally updates/deletes something can always retrieve the old content back (assuming that they realize quickly since we can't keep very long log). I've been thinking about allowing to undo worktree changes too (e.g. accidental "git reset --hard") and this log can cover it as well. The only downside is we need a new command for the UI (or perhaps I can just add "git add --log" or something like that). Should I just drop this patch and go that direction instead? More work, but maybe better end result. > How should this new check intract with paths added with "add -N", by > the way? Good point. The check here is basically "git diff --cached". I would need to turn it to "git diff --cached --ita-invisible-in-index" to make ita entries disappear. -- Duy
Re: [PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
Nguyễn Thái Ngọc Duy writes: > This is about mixing "git add -p" and "git commit -a" (or "git commit > ") where you may accidentally lose staged changes. After the > discussion with Jonathan, I'm going with a bit different approach than > v1, this behavior now becomes default, and if the user wants the old > behavior back, they can use --clobber-index. > > Another change is "git commit " is covered as well, as pointed > out by Jacob. > > I will need to add some test cases of course, if this direction is > still promising. One thing I'm not sure about is whether want to > deliberately clobber the index often, then perhaps we should add a > config key to bring the old behavior back. It usually is safer (simply because you do not have to think about it) to start a behaviour change like this as a strict opt-in to gain confidence. As I often see myself futzing with the same file, adding changes to it incrementally, so that I can view progress in "diff --cached" and "diff" output, it would be a serious usability regression if the last step in the following sequence is rejected by default: edit X git add X git diff --cached X git diff X ... repeat the above number of times ... git commit X ;# or "git commit -a" to finally conclude On the other hand, if I am keeping some change that should never be in a commit in the working tree file, and building the contents in the index using "add -p" to incrementally, it would be the same disaster as you are trying to prevent if I by mistake did a whole path 'add', even if I catch myself doing so before running 'commit' i.e. edit X git add -p X git diff --cached X git diff X ... repeat the above number of times ... git add X ;# OOPS! git add . ;# OOPS! even worse! Even though this does not involve "git commit -a" or "git commit X", an unrecoverable damage that requires redoing the manual work is already done. The approach to check if the contents in the index matches that in the HEAD per-path (i.e. "The contents we are adding to the index is whole working tree contents for that path. But the index already has contents different from HEAD for the path---are we losing information by doing this?"), is a very good one. But for the protection to be effective, I think "git commit" and "git add" should be covered the same way, ideally with the same code and possibly the same configuration knob and/or command line option to control the behaviour. If the information loss caused by the "add/commit X" or "add -u/commit -a" is so serious that this new feature deserves to become the default (which I do not yet think it is the case, by the way), then we could even forbid "commit X" or "commit -a" when the paths involved has difference between the index and the HEAD, without any configuration knob or command line override for "commit", and then tell the users to use "git add/rm" _with_ the override before coming back to "git commit". How should this new check intract with paths added with "add -N", by the way?
[PATCH v2 0/1] Make 'git commit' not accidentally lose staged content
This is about mixing "git add -p" and "git commit -a" (or "git commit ") where you may accidentally lose staged changes. After the discussion with Jonathan, I'm going with a bit different approach than v1, this behavior now becomes default, and if the user wants the old behavior back, they can use --clobber-index. Another change is "git commit " is covered as well, as pointed out by Jacob. I will need to add some test cases of course, if this direction is still promising. One thing I'm not sure about is whether want to deliberately clobber the index often, then perhaps we should add a config key to bring the old behavior back. Nguyễn Thái Ngọc Duy (1): commit: do not clobber the index Documentation/git-commit.txt | 11 +++- builtin/commit.c | 105 --- cache.h | 1 + read-cache.c | 11 t/t2201-add-update-typechange.sh | 2 +- t/t4015-diff-whitespace.sh | 2 +- t/t7102-reset.sh | 2 +- t/t7500-commit.sh| 2 +- t/t7502-commit.sh| 4 +- t/t9350-fast-export.sh | 2 +- 10 files changed, 126 insertions(+), 16 deletions(-) -- 2.19.0.rc0.337.ge906d732e7