Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily
On 10/26/2017 08:46 AM, Junio C Hamano wrote: > Michael Haggerty writes: > >> Even when we are deleting references, we needn't overwrite the >> `packed-refs` file if the references that we are deleting are not >> present in that file. Implement this optimization as follows: > > This goal I can understand. files-transaction-prepare may see a > deletion and in order to avoid a deletion of a ref from the > file-backend to expose a stale entry in the packed-refs file, it > prepares a transaction to remove the same ref that might exist in > it. If it turns out that there is no such ref in the packed-refs > file, then we can leave the packed-refs file as-is without > rewriting. > >> * Add a function `is_packed_transaction_noop()`, which checks whether >> a given packed-refs transaction doesn't actually have to do >> anything. This function must be called while holding the >> `packed-refs` lock to avoid races. > > This checks three things only to cover the most trivial case (I am > perfectly happy that it punts on more complex cases). If an update > > - checks the old value, > > - sets a new value, or > > - deletes a ref that is not on the filesystem, > > it is not a trivial case. I can understand the latter two (i.e. We > are special casing the deletion, and an update with a new value is > not. If removing a file from the filesystem is not sufficient to > delete, we may have to rewrite the packed-refs), but not the first > one. Is it because currently we do not say "delete this ref only > when its current value is X"? It wouldn't be too hard to allow updates with REF_HAVE_OLD to be optimized away too. I didn't do it because 1. We currently only create updates for that packed_refs backend with REF_HAVE_OLD turned off, so such could would be unreachable (and untestable). 2. I wanted to keep the patch as simple as possible in case you decide to merge it into 2.15. There would also be a little bit of a leveling violation (or maybe the function name is not ideal). `is_packed_transaction_noop()` could check that `old_oid` values are correct, and if they all are, declare the transaction a NOOP. (It wasn't *really* a NOOP, but its OP, namely checking old values, has already been carried out.) But what if it finds that an `old_oid` was incorrect? Right now `is_packed_transaction_noop()` has no way to report something like a TRANSACTION_GENERIC_ERROR. It could be that long-term it makes more sense for this optimization to happen in `packed_transaction_prepare()`. Except that function is sometimes called for its side-effects, like sorting an existing file, in which case overwriting the `packed-refs` file shouldn't be optimized away. So overall it seemed easier to punt on this optimization at this point. > Also it is not immediately obvious how the "is this noop" helper is > checking the absence of the same-named ref in the current > packed-refs file, which is what matters for the correctness of the > optimization. The check is in the second loop in `is_packed_transaction_noop()`: if (!refs_read_raw_ref(ref_store, update->refname, oid.hash, &referent, &type) || errno != ENOENT) { /* * We might have to actually delete that * reference -> not a NOOP. */ ret = 0; break; } If the refname doesn't exist, then `refs_read_raw_ref()` returns -1 and sets errno to ENOENT, and the loop continues. Otherwise we exit with a value of 0, meaning that the transaction is not a NOOP. There are a lot of double-negatives here. It might be easier to follow the logic if the sense of the function were inverted: `is_packed_transaction_needed()`. Let me know if you have a strong feeling about it. Michael
Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily
Michael Haggerty writes: > Even when we are deleting references, we needn't overwrite the > `packed-refs` file if the references that we are deleting are not > present in that file. Implement this optimization as follows: This goal I can understand. files-transaction-prepare may see a deletion and in order to avoid a deletion of a ref from the file-backend to expose a stale entry in the packed-refs file, it prepares a transaction to remove the same ref that might exist in it. If it turns out that there is no such ref in the packed-refs file, then we can leave the packed-refs file as-is without rewriting. > * Add a function `is_packed_transaction_noop()`, which checks whether > a given packed-refs transaction doesn't actually have to do > anything. This function must be called while holding the > `packed-refs` lock to avoid races. This checks three things only to cover the most trivial case (I am perfectly happy that it punts on more complex cases). If an update - checks the old value, - sets a new value, or - deletes a ref that is not on the filesystem, it is not a trivial case. I can understand the latter two (i.e. We are special casing the deletion, and an update with a new value is not. If removing a file from the filesystem is not sufficient to delete, we may have to rewrite the packed-refs), but not the first one. Is it because currently we do not say "delete this ref only when its current value is X"? Also it is not immediately obvious how the "is this noop" helper is checking the absence of the same-named ref in the current packed-refs file, which is what matters for the correctness of the optimization.
[PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily
Even when we are deleting references, we needn't overwrite the `packed-refs` file if the references that we are deleting are not present in that file. Implement this optimization as follows: * Add a function `is_packed_transaction_noop()`, which checks whether a given packed-refs transaction doesn't actually have to do anything. This function must be called while holding the `packed-refs` lock to avoid races. * Change `files_transaction_prepare()` to check whether the packed-refs transaction is unneeded. If so, squelch it, but continue holding the `packed-refs` lock until the end of the transaction to avoid races. This fixes two tests in t1409. Signed-off-by: Michael Haggerty --- refs/files-backend.c | 18 +++- refs/packed-backend.c | 68 +++ refs/packed-backend.h | 8 + t/t1409-avoid-packing-refs.sh | 4 +-- 4 files changed, 95 insertions(+), 3 deletions(-) diff --git a/refs/files-backend.c b/refs/files-backend.c index 014dabb0bf..5689e3a58d 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -2605,7 +2605,23 @@ static int files_transaction_prepare(struct ref_store *ref_store, goto cleanup; } backend_data->packed_refs_locked = 1; - ret = ref_transaction_prepare(packed_transaction, err); + + if (is_packed_transaction_noop(refs->packed_ref_store, + packed_transaction)) { + /* +* We can skip rewriting the `packed-refs` +* file. But we do need to leave it locked, so +* that somebody else doesn't pack a reference +* that we are trying to delete. +*/ + if (ref_transaction_abort(packed_transaction, err)) { + ret = TRANSACTION_GENERIC_ERROR; + goto cleanup; + } + backend_data->packed_transaction = NULL; + } else { + ret = ref_transaction_prepare(packed_transaction, err); + } } cleanup: diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 3279d42c5a..064b1b58a2 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -1261,6 +1261,74 @@ static int write_with_updates(struct packed_ref_store *refs, return -1; } +int is_packed_transaction_noop(struct ref_store *ref_store, + struct ref_transaction *transaction) +{ + struct packed_ref_store *refs = packed_downcast( + ref_store, + REF_STORE_READ, + "is_packed_transaction_noop"); + struct strbuf referent = STRBUF_INIT; + size_t i; + int ret; + + if (!is_lock_file_locked(&refs->lock)) + BUG("is_packed_transaction_noop() called while unlocked"); + + /* +* We're only going to bother returning true for the common, +* trivial case that references are only being deleted, their +* old values are not being checked, and the old `packed-refs` +* file doesn't contain any of those reference(s). More +* complicated cases (1) are unlikely to be able to be +* optimized away anyway, and (2) are more expensive to check. +* Start with cheap checks: +*/ + for (i = 0; i < transaction->nr; i++) { + struct ref_update *update = transaction->updates[i]; + + if (update->flags & REF_HAVE_OLD) + /* Have to check the old value -> not a NOOP. */ + return 0; + + if ((update->flags & REF_HAVE_NEW) && !is_null_oid(&update->new_oid)) + /* Have to set a new value -> not a NOOP. */ + return 0; + } + + /* +* The transaction isn't checking any old values nor is it +* setting any nonzero new values, so it still might be a +* NOOP. Now do the more expensive check: the update is not a +* NOOP if one of the updates is a delete, and the old +* `packed-refs` file contains a value for that reference. +*/ + ret = 1; + for (i = 0; i < transaction->nr; i++) { + struct ref_update *update = transaction->updates[i]; + unsigned int type; + struct object_id oid; + + if (!(update->flags & REF_HAVE_NEW)) + /* This reference isn't being deleted -> NOOP. */ + continue; + + if (!refs_read_raw_ref(ref_store, update->refname, + oid.hash, &referent, &type) || + errno != ENOENT) { + /* +* We might have to actually delet