Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Tue, Jul 15, 2014 at 11:45 PM, Junio C Hamano gits...@pobox.com wrote: What is the real point of writing into *.lock and renaming? It serves two purposes: (1) everybody adheres to that convention---if we managed to take the lock index.lock, nobody else will compete and interfere with us until we rename it to index. (2) in the meantime, others can still read the old contents in the original index. With take lock, write to a temporary, commit by rename or rollback by delete, we can have the usual transactional semantics. While we are working on it, nobody is allowed to muck with the same file, because everybody pays attention to *.lock. People will not see what we are doing until we release the lock because we are not writing into the primary file. And people can see what we did in its entirety once we are done because we close and rename to commit our changes atomically. True. Think what CLOSE_LOCK adds to that and you would appreciate its usefulness and at the same time realize its limitation. By allowing us to flush what we wrote to the disk without releasing the lock, we can give others (e.g. subprograms we spawn) controlled access to the new contents we prepared before we commit the result to the outside world. The access is controlled because we are in control when we tell these others to peek into or update the temporary file *.lock. The original implementaiton of CLOSE_LOCK is limited in that you can do so only once; you take a lock, you do your thing, you close, you let (one or more) others see it, and you commit (or rollback). You cannot do another of your thing once you close with the original implementation because there was no way to reopen. This is probably where our opinions differ. Yes, if you are sure nobody else is looking at the lock file any more, then you can do whatever you want. And because this is a .lock file, nobody is supposed to look at it unless you tell them too (in contrast $GIT_DIR/index can be read at any time). The format of the index makes it impossible to just edit one byte and be done with it. You always write a full new file. By sticking to transaction-style update, you need no extra code, and you have a back up file as a side effect. What do you gain by your proposal to lock index.lock file? We know we already have index.lock, so nobody should be competing on mucking with its contents with us and we gain nothing by writing into index.lock.lock and renaming it to index.lock. We are in total control of the lifetime of index.lock, when we spawn subprograms on it to let them write to it, when we ourselves write to it, when we spawn subprograms on it to let them read from it, all under the lock we have on the index, i.e. index.lock. The only thing use of another temporary file (that does not have to be a lock on index.lock, by the way, because we have total control over when and who updates the file while we hold the index.lock) would give you is that it allows you to make the success of the do another of your thing step optional. While holding the lock, you close and let add -i work on it, and after it returns, instead of reopening, you write into yet another index.lock.lock, expecting that it might fail and when it fails you can roll it back, leaving the contents add -i left in index.lock intact. If you do not use the extra temporary file, you start from index.lock left by add -i, write the updated index into index.lock and if you fail to write, you have to roll back the entire index---you lose the option to use the index left by add -i without repopulated cache-tree. But in the index update context, I do not think such a complexity is not necessary. If something fails, we should fail and roll back the entire index. I probably look at the problem from a wrong angle. To me the result of commit -p is precious. I'm not a big user of commit -p myself as I prefer add -p but it's the same: it'd be frustrating if after you have carefully added your chunks, the program aborts and you have to start over. And not just a few chunks. Think of reviewing .po files and approve strings by the means of adding them to the index. Perhaps because _I_ as a developer see this cache-tree update step optional and react to it unnecessarily. Ordinary users won't see any difference. And perhaps a better way to save the result of commit/add -p is some sort of index log, not be over-protective at this interactive commit code block. I don't feel strongly either way. So your call. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Duy Nguyen pclo...@gmail.com writes: If you do not use the extra temporary file, you start from index.lock left by add -i, write the updated index into index.lock and if you fail to write, you have to roll back the entire index---you lose the option to use the index left by add -i without repopulated cache-tree. But in the index update context, I do not think such a complexity is not necessary. If something fails, we should fail and roll back the entire index. I probably look at the problem from a wrong angle. To me the result of commit -p is precious. I'm not a big user of commit -p myself as I prefer add -p but it's the same... Oh, we agree that the result of commit -p is precious. But the point of David's series is to change the definition of the precious result to not just commit is made as asked, but now also to include that the index the user will use for continued work will have populated cache-tree. The series thinks both are precious. As you can probably read from my review responses, I am not sold to the idea that spending extra cycles to pre-populate cache-tree is 100% good idea, but if we _were_ to accept that it is a good idea, it logically follows that failing to populate cache-tree is just as a failure as failing to commit. In any case, it is unlikely for writing out the updated index with refreshed cache-tree to fail and blow away the partially built index (we do not even attempt to reopen/update if we cannot prepare in-core cache-tree), so I do not think it is such a big deal either way. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Mon, Jul 14, 2014 at 7:15 PM, Duy Nguyen pclo...@gmail.com wrote: It makes me wonder if a cleaner way of rebuilding cache-treei in this case is from git-add--interactive.perl, or by simply spawn git update-index --rebuild-cache-tree after running git-add--interactive.perl. We could check if the cache-tree has fully been populated by add -i and limit the rebuilding by git commit -p main process, but if add -i did not do so, there is no reason why git commit -p should not do so, without relying on the implementation of add -i to do so. At least to me, what you suggested sounds nothing more than a cop-out; instead of lifting the limitation of the API by enhancing it with reopen-lock-file, punting to shift the burden elsewhere. I am not quite sure why that is cleaner, but perhaps I am missing something. In the longer run, it would be plausible that somebody would want to rewrite git-add -i and will make it just a C API call away without having to spawn a separate process. You cannot punt by saying make 'add -i' responsible for it at that point; you will be doing what 'add -i' would be doing yourself, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Mon, Jul 14, 2014 at 11:38:06PM -0700, Junio C Hamano wrote: On Mon, Jul 14, 2014 at 7:15 PM, Duy Nguyen pclo...@gmail.com wrote: It makes me wonder if a cleaner way of rebuilding cache-treei in this case is from git-add--interactive.perl, or by simply spawn git update-index --rebuild-cache-tree after running git-add--interactive.perl. We could check if the cache-tree has fully been populated by add -i and limit the rebuilding by git commit -p main process, but if add -i did not do so, there is no reason why git commit -p should not do so, without relying on the implementation of add -i to do so. At least to me, what you suggested sounds nothing more than a cop-out; instead of lifting the limitation of the API by enhancing it with reopen-lock-file, punting to shift the burden elsewhere. I am not quite sure why that is cleaner, but perhaps I am missing something. Reopen-lock-file to me does not sound like an enhancement, at least in the index update context. We have always updated the index by writing to a new file, then rename. Now if the new write_locked_index() blows up after reopen_lock_file(), we have no way but die() because index_lock.filename can no longer be trusted. Doing it in a separate process keeps the tradition. And if the second write fails, we still have good index_lock.filename. We could also do the same here with something like this without new process (lightly tested): -- 8 -- diff --git a/builtin/commit.c b/builtin/commit.c index 606c890..c2b4875 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -340,6 +340,18 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + static struct lock_file second_index_lock; + hold_lock_file_for_update(second_index_lock, + index_lock.filename, + LOCK_DIE_ON_ERROR); + if (write_locked_index(the_index, second_index_lock, + COMMIT_LOCK)) { + warning(_(Failed to update main cache tree)); + rollback_lock_file(second_index_lock); + } + } else + warning(_(Failed to update main cache tree)); commit_style = COMMIT_NORMAL; return index_lock.filename; -- 8 -- I was worried about the dependency between second_index_lock and index_lock, but at cleanup, all we do is rollback, which is safe regardless of dependencies. Just need to make sure we commit second_index_lock before index_lock. In the longer run, it would be plausible that somebody would want to rewrite git-add -i and will make it just a C API call away without having to spawn a separate process. You cannot punt by saying make 'add -i' responsible for it at that point; you will be doing what 'add -i' would be doing yourself, no? Yes, but at current rewrite rate I'd bet it won't happen in the next two years at least. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Duy Nguyen pclo...@gmail.com writes: On Mon, Jul 14, 2014 at 11:38:06PM -0700, Junio C Hamano wrote: On Mon, Jul 14, 2014 at 7:15 PM, Duy Nguyen pclo...@gmail.com wrote: It makes me wonder if a cleaner way of rebuilding cache-treei in this case is from git-add--interactive.perl, or by simply spawn git update-index --rebuild-cache-tree after running git-add--interactive.perl. We could check if the cache-tree has fully been populated by add -i and limit the rebuilding by git commit -p main process, but if add -i did not do so, there is no reason why git commit -p should not do so, without relying on the implementation of add -i to do so. At least to me, what you suggested sounds nothing more than a cop-out; instead of lifting the limitation of the API by enhancing it with reopen-lock-file, punting to shift the burden elsewhere. I am not quite sure why that is cleaner, but perhaps I am missing something. Reopen-lock-file to me does not sound like an enhancement, at least in the index update context. We have always updated the index by writing to a new file, then rename. Time to step back and think, I think. What is the real point of writing into *.lock and renaming? It serves two purposes: (1) everybody adheres to that convention---if we managed to take the lock index.lock, nobody else will compete and interfere with us until we rename it to index. (2) in the meantime, others can still read the old contents in the original index. With take lock, write to a temporary, commit by rename or rollback by delete, we can have the usual transactional semantics. While we are working on it, nobody is allowed to muck with the same file, because everybody pays attention to *.lock. People will not see what we are doing until we release the lock because we are not writing into the primary file. And people can see what we did in its entirety once we are done because we close and rename to commit our changes atomically. Think what CLOSE_LOCK adds to that and you would appreciate its usefulness and at the same time realize its limitation. By allowing us to flush what we wrote to the disk without releasing the lock, we can give others (e.g. subprograms we spawn) controlled access to the new contents we prepared before we commit the result to the outside world. The access is controlled because we are in control when we tell these others to peek into or update the temporary file *.lock. The original implementaiton of CLOSE_LOCK is limited in that you can do so only once; you take a lock, you do your thing, you close, you let (one or more) others see it, and you commit (or rollback). You cannot do another of your thing once you close with the original implementation because there was no way to reopen. What do you gain by your proposal to lock index.lock file? We know we already have index.lock, so nobody should be competing on mucking with its contents with us and we gain nothing by writing into index.lock.lock and renaming it to index.lock. We are in total control of the lifetime of index.lock, when we spawn subprograms on it to let them write to it, when we ourselves write to it, when we spawn subprograms on it to let them read from it, all under the lock we have on the index, i.e. index.lock. The only thing use of another temporary file (that does not have to be a lock on index.lock, by the way, because we have total control over when and who updates the file while we hold the index.lock) would give you is that it allows you to make the success of the do another of your thing step optional. While holding the lock, you close and let add -i work on it, and after it returns, instead of reopening, you write into yet another index.lock.lock, expecting that it might fail and when it fails you can roll it back, leaving the contents add -i left in index.lock intact. If you do not use the extra temporary file, you start from index.lock left by add -i, write the updated index into index.lock and if you fail to write, you have to roll back the entire index---you lose the option to use the index left by add -i without repopulated cache-tree. But in the index update context, I do not think such a complexity is not necessary. If something fails, we should fail and roll back the entire index. Now if the new write_locked_index() blows up after reopen_lock_file(), we have no way but die() because index_lock.filename can no longer be trusted. If that is the case, lock.filename[] is getting clobbered by somebody too early, isn't it? I think Michael's mh/lockfile topic (dropped from my tree due to getting stale without rerolling) used a separate flag to indicate the validity without mucking with the filename member, and we may want to do something like that as the right solution to that problem. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Duy Nguyen pclo...@gmail.com writes: On Sat, Jul 12, 2014 at 11:44 AM, David Turner dtur...@twopensource.com wrote: @@ -342,6 +342,15 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + fd = open(index_lock.filename, O_WRONLY); + if (fd = 0) + if (write_cache(fd, active_cache, active_nr) == 0) { + close_lock_file(index_lock); If write_cache() returns a negative value, index.lock is probably corrupted. Should we die() instead of moving on and returning index_lock.filename to the caller? The caller may move index.lock to index later on and officially ruin index. Perhaps true, but worse yet, this will not play nicely together with your split index series, no? After taking the lock and writing and closing, we spawn the interactive while still holding the lock, and the open we see here is because we want to further update the same under the same lock. Perhaps write_locked_index() API in the split index series can notice that the underlying fd in index_lock has been closed earlier, realize that the call is to re-update the index under the same lock and open the file again for writing? Then we can update the above open() then write_cache() sequence with just a call to write_locked_index(). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On 14/07/14 16:54, Junio C Hamano wrote: Duy Nguyen pclo...@gmail.com writes: On Sat, Jul 12, 2014 at 11:44 AM, David Turner dtur...@twopensource.com wrote: @@ -342,6 +342,15 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + fd = open(index_lock.filename, O_WRONLY); + if (fd = 0) + if (write_cache(fd, active_cache, active_nr) == 0) { + close_lock_file(index_lock); If write_cache() returns a negative value, index.lock is probably corrupted. Should we die() instead of moving on and returning index_lock.filename to the caller? The caller may move index.lock to index later on and officially ruin index. Perhaps true, but worse yet, this will not play nicely together with your split index series, no? After taking the lock and writing and closing, we spawn the interactive while still holding the lock, and the open we see here is because we want to further update the same under the same lock. Perhaps write_locked_index() API in the split index series can notice that the underlying fd in index_lock has been closed earlier, realize that the call is to re-update the index under the same lock and open the file again for writing? Hmm, I was just about to suggest that there was some negative interplay between the 'dt/cache-tree-repair' and 'nd/split-index' branches as well. The pu branch fails the testsuite for me. In particular, t0090-cache-tree.sh fails like so: $ ./t0090-cache-tree.sh -i -v ... ok 9 - second commit has cache-tree expecting success: cat -\EOT foo.c int foo() { return 42; } int bar() { return 42; } EOT git add foo.c test_invalid_cache_tree git commit -m add a file test_cache_tree cat -\EOT foo.c int foo() { return 43; } int bar() { return 44; } EOT (echo p; echo 1; echo; echo s; echo n; echo y; echo q) | git commit --interactive -m foo test_cache_tree [master d1075a6] add a file Author: A U Thor aut...@example.com 1 file changed, 8 insertions(+) create mode 100644 foo.c staged unstaged path 1:unchanged+2/-2 foo.c *** Commands *** 1: [s]tatus 2: [u]pdate 3: [r]evert 4: [a]dd untracked 5: [p]atch 6: [d]iff 7: [q]uit 8: [h]elp What nowstaged unstaged path 1:unchanged+2/-2 [f]oo.c Patch updatestaged unstaged path * 1:unchanged+2/-2 [f]oo.c Patch update diff --git a/foo.c b/foo.c index 75522e2..3f7f049 100644 --- a/foo.c +++ b/foo.c @@ -1,8 +1,8 @@ int foo() { -return 42; +return 43; } int bar() { -return 42; +return 44; } Stage this hunk [y,n,q,a,d,/,s,e,?]? Split into 2 hunks. @@ -1,6 +1,6 @@ int foo() { -return 42; +return 43; } int bar() { Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? @@ -4,5 +4,5 @@ } int bar() { -return 42; +return 44; } Stage this hunk [y,n,q,a,d,/,K,g,e,?]? *** Commands *** 1: [s]tatus 2: [u]pdate 3: [r]evert 4: [a]dd untracked 5: [p]atch 6: [d]iff 7: [q]uit 8: [h]elp What now Bye. [master 65d7dde] foo Author: A U Thor aut...@example.com 1 file changed, 1 insertion(+), 1 deletion(-) --- expect 2014-07-14 17:10:13.755209229 + +++ filtered2014-07-14 17:10:13.763209258 + @@ -1 +1 @@ -SHA (3 entries, 0 subtrees) +invalid (0 subtrees) not ok 10 - commit --interactive gives cache-tree on partial commit # # cat -\EOT foo.c # int foo() # { # return 42; # } # int bar() # { # return 42; # } # EOT # git add foo.c # test_invalid_cache_tree # git commit -m add a file # test_cache_tree # cat -\EOT foo.c # int foo() # { # return 43; # } # int bar() # { # return 44; # } # EOT # (echo p; echo 1; echo; echo s; echo n; echo y; echo q) | # git commit --interactive -m foo # test_cache_tree # $ Note that
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Junio C Hamano gits...@pobox.com writes: Duy Nguyen pclo...@gmail.com writes: On Sat, Jul 12, 2014 at 11:44 AM, David Turner dtur...@twopensource.com wrote: @@ -342,6 +342,15 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + fd = open(index_lock.filename, O_WRONLY); + if (fd = 0) + if (write_cache(fd, active_cache, active_nr) == 0) { + close_lock_file(index_lock); If write_cache() returns a negative value, index.lock is probably corrupted. Should we die() instead of moving on and returning index_lock.filename to the caller? The caller may move index.lock to index later on and officially ruin index. Perhaps true, but worse yet, this will not play nicely together with your split index series, no? After taking the lock and writing and closing, we spawn the interactive while still holding the lock, and the open we see here is because we want to further update the same under the same lock. Perhaps write_locked_index() API in the split index series can notice that the underlying fd in index_lock has been closed earlier, realize that the call is to re-update the index under the same lock and open the file again for writing? Then we can update the above open() then write_cache() sequence with just a call to write_locked_index(). Or perhaps introduce reopen_locked_file() and have these rare callers that want to re-update after they closed the file call it, in order to avoid mistake? Perhaps something like this on top of your series? -- 8 -- Subject: [PATCH] lockfile: allow reopening a closed but still locked file In some code paths (e.g. giving add -i to prepare the contents to be committed interactively inside commit -p) where a caller takes a lock, writes the new content, give chance for others to use it while still holding the lock, and then releases the lock when all is done. As an extension, allow the caller to re-update an already closed file while still holding the lock (i.e. not yet committed) by re-opening the file, to be followed by updating the contents and then by the usual close_lock_file() or commit_lock_file(). This is necessary if we want to add code to rebuild the cache-tree and write the resulting index out after add -i returns the control to commit -p, for example. Signed-off-by: Junio C Hamano gits...@pobox.com --- cache.h| 7 --- lockfile.c | 51 +-- 2 files changed, 41 insertions(+), 17 deletions(-) diff --git a/cache.h b/cache.h index c6b7770..a9344cf 100644 --- a/cache.h +++ b/cache.h @@ -567,12 +567,13 @@ extern NORETURN void unable_to_lock_index_die(const char *path, int err); extern int hold_lock_file_for_update(struct lock_file *, const char *path, int); extern int hold_lock_file_for_append(struct lock_file *, const char *path, int); extern int commit_lock_file(struct lock_file *); -extern void update_index_if_able(struct index_state *, struct lock_file *); +extern int reopen_lock_file(struct lock_file *); +extern int close_lock_file(struct lock_file *); +extern void rollback_lock_file(struct lock_file *); +extern void update_index_if_able(struct index_state *, struct lock_file *); extern int hold_locked_index(struct lock_file *, int); extern void set_alternate_index_output(const char *); -extern int close_lock_file(struct lock_file *); -extern void rollback_lock_file(struct lock_file *); extern int delete_ref(const char *, const unsigned char *sha1, int delopt); /* Environment bits from configuration mechanism */ diff --git a/lockfile.c b/lockfile.c index b706614..618a4b2 100644 --- a/lockfile.c +++ b/lockfile.c @@ -120,21 +120,8 @@ static char *resolve_symlink(char *p, size_t s) return p; } - -static int lock_file(struct lock_file *lk, const char *path, int flags) +static int lock_file_open(struct lock_file *lk) { - /* -* subtract 5 from size to make sure there's room for adding -* .lock for the lock file name -*/ - static const size_t max_path_len = sizeof(lk-filename) - 5; - - if (strlen(path) = max_path_len) - return -1; - strcpy(lk-filename, path); - if (!(flags LOCK_NODEREF)) - resolve_symlink(lk-filename, max_path_len); - strcat(lk-filename, .lock); lk-fd = open(lk-filename, O_RDWR | O_CREAT | O_EXCL, 0666); if (0 = lk-fd) { if (!lock_file_list) { @@ -156,6 +143,23 @@ static int lock_file(struct lock_file *lk, const char *path, int flags) return lk-fd; } +static int lock_file(struct lock_file *lk, const char *path, int flags) +{ + /* +* subtract 5 from size to make sure there's room for adding +* .lock
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Ramsay Jones ram...@ramsay1.demon.co.uk writes: that the merge commit 7608c87e fails. Looking at the details of the merge resolution, made me think of Duy's split index work. Yes, there is a deliberately dropped hunk from dt/cache-tree-repair in that merge, because the topic relied on being able to say here is the file descriptor, write the index to it, which no longer is available with the split-index topic. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On 14/07/14 18:51, Junio C Hamano wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: that the merge commit 7608c87e fails. Looking at the details of the merge resolution, made me think of Duy's split index work. Yes, there is a deliberately dropped hunk from dt/cache-tree-repair in that merge, because the topic relied on being able to say here is the file descriptor, write the index to it, which no longer is available with the split-index topic. Ah, OK. Sounds like everything is under control then. Thanks. ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
Ramsay Jones ram...@ramsay1.demon.co.uk writes: On 14/07/14 18:51, Junio C Hamano wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: that the merge commit 7608c87e fails. Looking at the details of the merge resolution, made me think of Duy's split index work. Yes, there is a deliberately dropped hunk from dt/cache-tree-repair in that merge, because the topic relied on being able to say here is the file descriptor, write the index to it, which no longer is available with the split-index topic. Ah, OK. Sounds like everything is under control then. Wasn't, but now I think it is ;-) David, could you please double check the conflict resolution at 882426ea (Merge branch 'dt/cache-tree-repair' into jch, 2014-07-14), at about the middle between master..pu? By eyeballing git diff 882426ea^ 882426ea we should see what your series would have done if it were based on top of the nd/split-index topic. The most iffy is the first hunk of change to builtin/commit.c, which is more or less my rewrite of what you did on top of 'master'. The change to builtin/checkout.c also seems somewhat iffy in that we treat the_index.cache_tree (aka active_cache_tree) as if cache trees are something we can manipulate independent of a particular index_state (which has been the rule for a long time), even though in the world order after nd/split-index topic, cache_tree_update() can no longer be used on a cache-tree that is not associated to a particular index_state. It is not a problem with your series, but comes from nd/split-index topic, and it might indicate a slight unevenness of the API (i.e. we may want to either insist that the public API to muck with a cache-tree outside cache-tree.c must be accessed via an index-state and never via a bare cache-tree structure, by insisting that cache_tree_fully_valid() to take a pointer to an index-state as well; or we may want to go the other way and allow API users to pass a bare cache-tree without the index-state when the latter is not absolutely necessary, by changing cache_tree_update() to take a cache-tree, not an index-state). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Mon, 2014-07-14 at 15:16 -0700, Junio C Hamano wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: On 14/07/14 18:51, Junio C Hamano wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: that the merge commit 7608c87e fails. Looking at the details of the merge resolution, made me think of Duy's split index work. Yes, there is a deliberately dropped hunk from dt/cache-tree-repair in that merge, because the topic relied on being able to say here is the file descriptor, write the index to it, which no longer is available with the split-index topic. Ah, OK. Sounds like everything is under control then. Wasn't, but now I think it is ;-) David, could you please double check the conflict resolution at 882426ea (Merge branch 'dt/cache-tree-repair' into jch, 2014-07-14), at about the middle between master..pu? By eyeballing git diff 882426ea^ 882426ea we should see what your series would have done if it were based on top of the nd/split-index topic. The most iffy is the first hunk of change to builtin/commit.c, which is more or less my rewrite of what you did on top of 'master'. LGTM. And the tests all pass. The change to builtin/checkout.c also seems somewhat iffy in that we treat the_index.cache_tree (aka active_cache_tree) as if cache trees are something we can manipulate independent of a particular index_state (which has been the rule for a long time), even though in the world order after nd/split-index topic, cache_tree_update() can no longer be used on a cache-tree that is not associated to a particular index_state. It is not a problem with your series, but comes from nd/split-index topic, and it might indicate a slight unevenness of the API (i.e. we may want to either insist that the public API to muck with a cache-tree outside cache-tree.c must be accessed via an index-state and never via a bare cache-tree structure, by insisting that cache_tree_fully_valid() to take a pointer to an index-state as well; or we may want to go the other way and allow API users to pass a bare cache-tree without the index-state when the latter is not absolutely necessary, by changing cache_tree_update() to take a cache-tree, not an index-state). I agree that some sort of API updates like would be an improvement. But this seems to work for now. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Tue, Jul 15, 2014 at 5:16 AM, Junio C Hamano gits...@pobox.com wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: On 14/07/14 18:51, Junio C Hamano wrote: Ramsay Jones ram...@ramsay1.demon.co.uk writes: that the merge commit 7608c87e fails. Looking at the details of the merge resolution, made me think of Duy's split index work. Yes, there is a deliberately dropped hunk from dt/cache-tree-repair in that merge, because the topic relied on being able to say here is the file descriptor, write the index to it, which no longer is available with the split-index topic. Ah, OK. Sounds like everything is under control then. Wasn't, but now I think it is ;-) David, could you please double check the conflict resolution at 882426ea (Merge branch 'dt/cache-tree-repair' into jch, 2014-07-14), at about the middle between master..pu? By eyeballing git diff 882426ea^ 882426ea we should see what your series would have done if it were based on top of the nd/split-index topic. The most iffy is the first hunk of change to builtin/commit.c, which is more or less my rewrite of what you did on top of 'master'. It makes me wonder if a cleaner way of rebuilding cache-treei in this case is from git-add--interactive.perl, or by simply spawn git update-index --rebuild-cache-tree after running git-add--interactive.perl. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
On Sat, Jul 12, 2014 at 11:44 AM, David Turner dtur...@twopensource.com wrote: @@ -342,6 +342,15 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + fd = open(index_lock.filename, O_WRONLY); + if (fd = 0) + if (write_cache(fd, active_cache, active_nr) == 0) { + close_lock_file(index_lock); If write_cache() returns a negative value, index.lock is probably corrupted. Should we die() instead of moving on and returning index_lock.filename to the caller? The caller may move index.lock to index later on and officially ruin index. + } + } + else + fprintf(stderr, FAiled to update main cache tree\n); make the above line something like this for i18n support: fprintf_ln(stderr, _(Failed to update main cache tree)); commit_style = COMMIT_NORMAL; return index_lock.filename; -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v8 4/4] cache-tree: Write updated cache-tree after commit
During the commit process, update the cache-tree. Write this updated cache-tree so that it's ready for subsequent commands. Add test code which demonstrates that git commit now writes the cache tree. Make all tests test the entire cache-tree, not just the root level. Signed-off-by: David Turner dtur...@twitter.com --- builtin/commit.c | 16 ++- t/t0090-cache-tree.sh | 117 +++--- 2 files changed, 117 insertions(+), 16 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 9cfef6c..d6681c4 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -342,6 +342,15 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, discard_cache(); read_cache_from(index_lock.filename); + if (update_main_cache_tree(WRITE_TREE_SILENT) == 0) { + fd = open(index_lock.filename, O_WRONLY); + if (fd = 0) + if (write_cache(fd, active_cache, active_nr) == 0) { + close_lock_file(index_lock); + } + } + else + fprintf(stderr, FAiled to update main cache tree\n); commit_style = COMMIT_NORMAL; return index_lock.filename; @@ -383,8 +392,12 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, if (!only !pathspec.nr) { fd = hold_locked_index(index_lock, 1); refresh_cache_or_die(refresh_flags); - if (active_cache_changed) { + if (active_cache_changed + || !cache_tree_fully_valid(active_cache_tree)) { update_main_cache_tree(WRITE_TREE_SILENT); + active_cache_changed = 1; + } + if (active_cache_changed) { if (write_cache(fd, active_cache, active_nr) || commit_locked_index(index_lock)) die(_(unable to write new_index file)); @@ -435,6 +448,7 @@ static char *prepare_index(int argc, const char **argv, const char *prefix, fd = hold_locked_index(index_lock, 1); add_remove_files(partial); refresh_cache(REFRESH_QUIET); + update_main_cache_tree(WRITE_TREE_SILENT); if (write_cache(fd, active_cache, active_nr) || close_lock_file(index_lock)) die(_(unable to write new_index file)); diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh index 3a3342e..48c4240 100755 --- a/t/t0090-cache-tree.sh +++ b/t/t0090-cache-tree.sh @@ -8,7 +8,7 @@ cache-tree extension. . ./test-lib.sh cmp_cache_tree () { - test-dump-cache-tree actual + test-dump-cache-tree | sed -e '/#(ref)/d' actual sed s/$_x40/SHA/ actual filtered test_cmp $1 filtered } @@ -16,14 +16,38 @@ cmp_cache_tree () { # We don't bother with actually checking the SHA1: # test-dump-cache-tree already verifies that all existing data is # correct. -test_shallow_cache_tree () { - printf SHA (%d entries, 0 subtrees)\n $(git ls-files|wc -l) expect +generate_expected_cache_tree_rec () { + dir=$1${1:+/} + parent=$2 + # ls-files might have foo/bar, foo/bar/baz, and foo/bar/quux + # We want to count only foo because it's the only direct child + subtrees=$(git ls-files|grep /|cut -d / -f 1|uniq) + subtree_count=$(echo $subtrees|awk '$1 {++c} END {print c}') + entries=$(git ls-files|wc -l) + printf SHA $dir (%d entries, %d subtrees)\n $entries $subtree_count + for subtree in $subtrees + do + cd $subtree + generate_expected_cache_tree_rec $dir$subtree $dir || return 1 + cd .. + done + dir=$parent +} + +generate_expected_cache_tree () { + ( + generate_expected_cache_tree_rec + ) +} + +test_cache_tree () { + generate_expected_cache_tree expect cmp_cache_tree expect } test_invalid_cache_tree () { printf invalid %s ()\n $@ expect - test-dump-cache-tree | \ + test-dump-cache-tree | sed -n -e s/[0-9]* subtrees// -e '/#(ref)/d' -e '/^invalid /p' actual test_cmp expect actual } @@ -33,14 +57,14 @@ test_no_cache_tree () { cmp_cache_tree expect } -test_expect_failure 'initial commit has cache-tree' ' +test_expect_success 'initial commit has cache-tree' ' test_commit foo - test_shallow_cache_tree + test_cache_tree ' test_expect_success 'read-tree HEAD establishes cache-tree' ' git read-tree HEAD - test_shallow_cache_tree + test_cache_tree ' test_expect_success 'git-add invalidates cache-tree' ' @@ -58,6 +82,18 @@ test_expect_success 'git-add in subdir invalidates cache-tree' '