On 09/14/2017 10:23 PM, Johannes Schindelin wrote:
> On Wed, 13 Sep 2017, Michael Haggerty wrote:
>
>> * `mmap()` the whole file rather than `read()`ing it.
>
> On Windows, a memory-mapped file cannot be renamed. As a consequence, the
> following tests fail on `pu`:
>
> [...]
Thanks for your quick investigation.

> This diff:
>
> -- snip --
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index f9c5e771bdd..ee8b3603624 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1306,13 +1308,13 @@ static int packed_transaction_finish(struct ref_store *ref_store,
>  	char *packed_refs_path;
>
>  	packed_refs_path = get_locked_file_path(&refs->lock);
> +	clear_snapshot(refs);
>  	if (rename_tempfile(&refs->tempfile, packed_refs_path)) {
>  		strbuf_addf(err, "error replacing %s: %s",
>  			    refs->path, strerror(errno));
>  		goto cleanup;
>  	}
>
> -	clear_snapshot(refs);
>  	ret = 0;
>
>  cleanup:
> -- snap --
>
> reduces the failed tests to
>
> 	t1410-reflog.counts.sh
> 	t3210-pack-refs.counts.sh
> 	t3211-peel-ref.counts.sh
> 	t5505-remote.counts.sh
> 	t5510-fetch.counts.sh
> 	t5600-clone-fail-cleanup.counts.sh

That change is a strict improvement on all systems; I'll integrate it
into the series.

> However, much as I tried to wrap my head around it, I could not even start
> to come up with a solution for e.g. the t1410 regression. The failure
> happens in `git branch -d one/two`.
>
> The culprit: there is yet another unreleased snapshot while we try to
> finish the transaction.
>
> It is the snapshot of the main_worktree_ref_store as acquired by
> add_head_info() (which is called from get_main_worktree(), which in turn
> was called from get_worktrees(), in turn having been called from
> find_shared_symref() that was called from delete_branches()), and it seems
> to me that there was never any plan to have that released, including its
> snapshot.

Yeah, the idea was that, by default, whenever a `packed-refs` file
needed to be read, it would be kept mmapped for the life of the program
(or until the file was detected to have changed). This was meant to be
a replacement for the explicit `ref_cache`. So any long-lived git
process could be expected to have the `packed-refs` file (or even
multiple `packed-refs` files, in the case of submodules) mmapped.

That's obviously not going to work on Windows.

> [...]
> Do you have any idea how to ensure that all mmap()ed packed refs are
> released before attempting to finish a transaction?

[And from your other email:]

> This is only one example, as I figure that multiple worktrees could cause
> even more ref_stores to have unreleased snapshots of the packed-refs file.
>
> I imagine that something like close_all_packs() is needed
> ("clear_all_snapshots()"?).

Yes, I wasn't really familiar with `close_all_packs()`, but it sounds
like it solves a similar problem.

Within a single process, we could make this work via an arduous process
of figuring out which functions might want to overwrite the
`packed-refs` file, and making sure that nobody up the call stack is
iterating over references during those function calls. If that
condition is met, then calling `clear_snapshot()` earlier, as you
propose above, would decrease the reference count to zero, causing the
old snapshot to be freed and allowing the rename to succeed. We could
even do

	if (!clear_snapshot(refs))
		BUG("packed-refs snapshot is still in use");

analogously to `close_all_packs()`, to help find code that violates the
condition.

Similarly, it wouldn't be allowed to invoke a subprocess or hook
function while iterating over references; one would have to clear any
current snapshots before doing so. It might even make sense to do that
at the same time as `close_all_packs()`, for example in a new function
`close_all_unnecessary_files()`.
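To make the `clear_snapshot()` check above concrete, here is roughly
how it could sit in `packed_transaction_finish()`. This is only a
sketch: it assumes `clear_snapshot()` is changed to report whether the
old snapshot was actually freed, which is not what it does today.

-- snip --
	packed_refs_path = get_locked_file_path(&refs->lock);

	/*
	 * Sketch: assumes clear_snapshot() is changed to return
	 * nonzero iff the old snapshot was actually freed. If some
	 * caller further up the stack still holds a reference to it
	 * (e.g. an active ref iterator), the file would still be
	 * open/mmapped and the rename below could fail on Windows,
	 * so treat that as a bug in the caller.
	 */
	if (!clear_snapshot(refs))
		BUG("packed-refs snapshot is still in use");

	if (rename_tempfile(&refs->tempfile, packed_refs_path)) {
		strbuf_addf(err, "error replacing %s: %s",
			    refs->path, strerror(errno));
		goto cleanup;
	}

	ret = 0;
-- snap --

The hard part, as described above, is guaranteeing that nothing up the
call stack is still iterating over the old snapshot when this runs.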
But what about unrelated processes? Anybody reading `packed-refs` would
block anybody who wants to modify it, for example to delete a reference
or to pack the references.

I think this is worse than the packfile situation. Packfiles all have
unique names, so they never need to be overwritten; at worst they are
deleted, and the deletion is never critical for correctness. If you
can't delete a packfile, the only consequence is that it hangs around
until the next GC. However, the `packed-refs` file always has the same
name, and overwriting it is sometimes critical for correctness. So it
sounds to me like even *readers* mustn't keep the file open or mmapped
longer than necessary!

(This makes me wonder: doesn't `rename_tempfile()` for the
`packed-refs` file *already* fail sometimes on Windows due to a
concurrent reader? Shouldn't it retry if it gets whatever error
indicates that the failure was due to the file being open?)

Anyway, if all of my reasoning is correct, the inescapable conclusion
is that having the `packed-refs` file open or mmapped is tantamount to
holding a reader lock on the file, because it causes writers to fail,
or at least to have to retry. Therefore, even if you are only reading
the `packed-refs` file, you should read it and then close/munmap it as
quickly as possible.

And if *that* is true, then ISTM that this mmap idea is unworkable on
Windows. We would either need to keep the old `ref_cache`-based code
around for use there, or we would need to copy the file contents into
memory immediately and use *that* as our cache.

The latter would be pretty easy to implement, actually, because
something similar is already done in `sort_snapshot()` to handle the
case that the `packed-refs` file on disk is not correctly ordered by
refname. I think that approach would be competitive with the
`ref_cache`-based code, which also reads (and parses!) the whole file
whenever any packed reference needs to be read. Memory usage might go
up a bit, because the SHA-1s would be stored in memory in hex rather
than binary form; on the other hand, it would spare the overhead of
allocating lots of little objects and the pointers that string those
objects together, so it's probably a wash.

Please let me know if you agree with my reasoning. If so, I'll
implement "always copy `packed-refs` file contents to RAM" for Windows.

A possible future optimization would be that when iterating over only a
subset of the references (e.g., `refs/replace/`), only that part of the
file is copied to RAM. But that would be a bigger change, and it's not
obvious whether it might make some read access patterns slower.

Michael
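For what it's worth, here is a rough sketch of what I mean by "always
copy the `packed-refs` file contents to RAM": read the file into a
private buffer and close it right away instead of keeping it mmapped.
This is only illustrative; the `snapshot->buf`/`snapshot->eof` fields
are meant to follow the series, `xmalloc()`, `read_in_full()`, and
`die_errno()` are the usual git helpers, and the surrounding
open/stat logic is omitted.

-- snip --
	ssize_t bytes_read;

	/*
	 * Instead of mmap()ing the packed-refs file, read the whole
	 * file into a heap buffer and close the file descriptor
	 * immediately, so that no reader keeps the file open/mapped
	 * and a concurrent writer's rename_tempfile() cannot be
	 * blocked by us.
	 */
	snapshot->buf = xmalloc(size);
	bytes_read = read_in_full(fd, snapshot->buf, size);
	if (bytes_read < 0 || bytes_read != size)
		die_errno("couldn't read %s", refs->path);
	snapshot->eof = snapshot->buf + size;
	close(fd);
-- snap --

The point is simply that the file descriptor is closed before the
snapshot is used for anything else, so readers never hold the file
open across other work.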