Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-26 Thread Michael Haggerty
On 10/26/2017 08:46 AM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> Even when we are deleting references, we needn't overwrite the
>> `packed-refs` file if the references that we are deleting are not
>> present in that file. Implement this optimization as follows:
> 
> This goal I can understand.  files-transaction-prepare may see a
> deletion and in order to avoid a deletion of a ref from the
> file-backend to expose a stale entry in the packed-refs file, it
> prepares a transaction to remove the same ref that might exist in
> it.  If it turns out that there is no such ref in the packed-refs
> file, then we can leave the packed-refs file as-is without
> rewriting.
> 
>> * Add a function `is_packed_transaction_noop()`, which checks whether
>>   a given packed-refs transaction doesn't actually have to do
>>   anything. This function must be called while holding the
>>   `packed-refs` lock to avoid races.
> 
> This checks three things only to cover the most trivial case (I am
> perfectly happy that it punts on more complex cases).  If an update
> 
>  - checks the old value,
> 
>  - sets a new value, or
> 
>  - deletes a ref that is not on the filesystem,
> 
> it is not a trivial case.  I can understand the latter two (i.e. We
> are special casing the deletion, and an update with a new value is
> not.  If removing a file from the filesystem is not sufficient to
> delete, we may have to rewrite the packed-refs), but not the first
> one.  Is it because currently we do not say "delete this ref only
> when its current value is X"?

It wouldn't be too hard to allow updates with REF_HAVE_OLD to be
optimized away too. I didn't do it because

1. We currently only create updates for the packed-refs backend with
REF_HAVE_OLD turned off, so such code would be unreachable (and
untestable).

2. I wanted to keep the patch as simple as possible in case you decide
to merge it into 2.15.

There would also be a little bit of a layering violation (or maybe the
function name is not ideal). `is_packed_transaction_noop()` could check
that `old_oid` values are correct, and if they all are, declare the
transaction a NOOP. (It wasn't *really* a NOOP, but its OP, namely
checking old values, has already been carried out.) But what if it finds
that an `old_oid` was incorrect? Right now
`is_packed_transaction_noop()` has no way to report something like a
TRANSACTION_GENERIC_ERROR.
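
To illustrate the point, a purely hypothetical variant (the names below
are invented for this sketch; they are not part of the patch) would have
to grow an error channel, roughly like this:

	/*
	 * Hypothetical sketch only: let the helper verify old_oid
	 * values itself and report failures, instead of returning a
	 * plain "is it a NOOP?" boolean.
	 */
	enum packed_transaction_triviality {
		PACKED_TRANSACTION_NOOP,
		PACKED_TRANSACTION_NEEDED,
		PACKED_TRANSACTION_ERROR	/* e.g. old_oid mismatch */
	};

	enum packed_transaction_triviality
	classify_packed_transaction(struct ref_store *ref_store,
				    struct ref_transaction *transaction,
				    struct strbuf *err);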

It could be that long-term it makes more sense for this optimization to
happen in `packed_transaction_prepare()`. Except that function is
sometimes called for its side-effects, like sorting an existing file, in
which case overwriting the `packed-refs` file shouldn't be optimized away.

So overall it seemed easier to punt on this optimization at this point.

> Also it is not immediately obvious how the "is this noop" helper is
> checking the absence of the same-named ref in the current
> packed-refs file, which is what matters for the correctness of the
> optimization.

The check is in the second loop in `is_packed_transaction_noop()`:

	if (!refs_read_raw_ref(ref_store, update->refname,
			       oid.hash, &referent, &type) ||
	    errno != ENOENT) {
		/*
		 * We might have to actually delete that
		 * reference -> not a NOOP.
		 */
		ret = 0;
		break;
	}

If the refname doesn't exist, then `refs_read_raw_ref()` returns -1 and
sets errno to ENOENT, and the loop continues. Otherwise we exit with a
value of 0, meaning that the transaction is not a NOOP.
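
In other words (just restating the same check, relying on the convention
that `refs_read_raw_ref()` returns 0 on success and -1 with `errno` set
on failure):

	if (!refs_read_raw_ref(ref_store, update->refname,
			       oid.hash, &referent, &type)) {
		/* The ref exists in `packed-refs` -> must rewrite. */
		ret = 0;
		break;
	} else if (errno != ENOENT) {
		/* Unexpected read error -> play it safe and rewrite. */
		ret = 0;
		break;
	}
	/* errno == ENOENT: not in `packed-refs`; keep scanning. */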

There are a lot of double-negatives here. It might be easier to follow
the logic if the sense of the function were inverted:
`is_packed_transaction_needed()`. Let me know if you have a strong
feeling about it.
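
For what it's worth, a rough sketch of the inverted version (hypothetical;
the checks stay the same, only the sense of the return values flips) might
read:

	int is_packed_transaction_needed(struct ref_store *ref_store,
					 struct ref_transaction *transaction)
	{
		struct packed_ref_store *refs = packed_downcast(
			ref_store,
			REF_STORE_READ,
			"is_packed_transaction_needed");
		struct strbuf referent = STRBUF_INIT;
		size_t i;
		int ret;

		if (!is_lock_file_locked(&refs->lock))
			BUG("is_packed_transaction_needed() called while unlocked");

		/*
		 * Anything other than an unchecked deletion means the
		 * `packed-refs` file has to be rewritten.
		 */
		for (i = 0; i < transaction->nr; i++) {
			struct ref_update *update = transaction->updates[i];

			if (update->flags & REF_HAVE_OLD)
				/* Have to check the old value -> needed. */
				return 1;

			if ((update->flags & REF_HAVE_NEW) &&
			    !is_null_oid(&update->new_oid))
				/* Have to set a new value -> needed. */
				return 1;
		}

		/*
		 * Only unchecked deletions remain; the rewrite is needed
		 * iff one of the deleted refs is present in `packed-refs`.
		 */
		ret = 0;
		for (i = 0; i < transaction->nr; i++) {
			struct ref_update *update = transaction->updates[i];
			unsigned int type;
			struct object_id oid;

			if (!(update->flags & REF_HAVE_NEW))
				/* Not a deletion; nothing to rewrite for it. */
				continue;

			if (!refs_read_raw_ref(ref_store, update->refname,
					       oid.hash, &referent, &type) ||
			    errno != ENOENT) {
				/* Present in `packed-refs` (or unsure) -> needed. */
				ret = 1;
				break;
			}
		}

		strbuf_release(&referent);
		return ret;
	}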

Michael


Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-25 Thread Junio C Hamano
Michael Haggerty  writes:

> Even when we are deleting references, we needn't overwrite the
> `packed-refs` file if the references that we are deleting are not
> present in that file. Implement this optimization as follows:

This goal I can understand.  files-transaction-prepare may see a
deletion and in order to avoid a deletion of a ref from the
file-backend to expose a stale entry in the packed-refs file, it
prepares a transaction to remove the same ref that might exist in
it.  If it turns out that there is no such ref in the packed-refs
file, then we can leave the packed-refs file as-is without
rewriting.

> * Add a function `is_packed_transaction_noop()`, which checks whether
>   a given packed-refs transaction doesn't actually have to do
>   anything. This function must be called while holding the
>   `packed-refs` lock to avoid races.

This checks three things only to cover the most trivial case (I am
perfectly happy that it punts on more complex cases).  If an update

 - checks the old value,

 - sets a new value, or

 - deletes a ref that is not on the filesystem,

it is not a trivial case.  I can understand the latter two (i.e. We
are special casing the deletion, and an update with a new value is
not.  If removing a file from the filesystem is not sufficient to
delete, we may have to rewrite the packed-refs), but not the first
one.  Is it because currently we do not say "delete this ref only
when its current value is X"?

Also it is not immediately obvious how the "is this noop" helper is
checking the absence of the same-named ref in the current
packed-refs file, which is what matters for the correctness of the
optimization.



[PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

2017-10-25 Thread Michael Haggerty
Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting are not
present in that file. Implement this optimization as follows:

* Add a function `is_packed_transaction_noop()`, which checks whether
  a given packed-refs transaction doesn't actually have to do
  anything. This function must be called while holding the
  `packed-refs` lock to avoid races.

* Change `files_transaction_prepare()` to check whether the
  packed-refs transaction is unneeded. If so, squelch it, but continue
  holding the `packed-refs` lock until the end of the transaction to
  avoid races.

This fixes two tests in t1409.

Signed-off-by: Michael Haggerty 
---
 refs/files-backend.c  | 18 +++-
 refs/packed-backend.c | 68 +++
 refs/packed-backend.h |  8 +
 t/t1409-avoid-packing-refs.sh |  4 +--
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 014dabb0bf..5689e3a58d 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2605,7 +2605,23 @@ static int files_transaction_prepare(struct ref_store *ref_store,
goto cleanup;
}
backend_data->packed_refs_locked = 1;
-   ret = ref_transaction_prepare(packed_transaction, err);
+
+   if (is_packed_transaction_noop(refs->packed_ref_store,
+  packed_transaction)) {
+   /*
+* We can skip rewriting the `packed-refs`
+* file. But we do need to leave it locked, so
+* that somebody else doesn't pack a reference
+* that we are trying to delete.
+*/
+   if (ref_transaction_abort(packed_transaction, err)) {
+   ret = TRANSACTION_GENERIC_ERROR;
+   goto cleanup;
+   }
+   backend_data->packed_transaction = NULL;
+   } else {
+   ret = ref_transaction_prepare(packed_transaction, err);
+   }
}
 
 cleanup:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 3279d42c5a..064b1b58a2 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1261,6 +1261,74 @@ static int write_with_updates(struct packed_ref_store *refs,
return -1;
 }
 
+int is_packed_transaction_noop(struct ref_store *ref_store,
+  struct ref_transaction *transaction)
+{
+   struct packed_ref_store *refs = packed_downcast(
+   ref_store,
+   REF_STORE_READ,
+   "is_packed_transaction_noop");
+   struct strbuf referent = STRBUF_INIT;
+   size_t i;
+   int ret;
+
+   if (!is_lock_file_locked(&refs->lock))
+   BUG("is_packed_transaction_noop() called while unlocked");
+
+   /*
+* We're only going to bother returning true for the common,
+* trivial case that references are only being deleted, their
+* old values are not being checked, and the old `packed-refs`
+* file doesn't contain any of those reference(s). More
+* complicated cases (1) are unlikely to be able to be
+* optimized away anyway, and (2) are more expensive to check.
+* Start with cheap checks:
+*/
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+
+   if (update->flags & REF_HAVE_OLD)
+   /* Have to check the old value -> not a NOOP. */
+   return 0;
+
+   if ((update->flags & REF_HAVE_NEW) && !is_null_oid(&update->new_oid))
+   /* Have to set a new value -> not a NOOP. */
+   return 0;
+   }
+
+   /*
+* The transaction isn't checking any old values nor is it
+* setting any nonzero new values, so it still might be a
+* NOOP. Now do the more expensive check: the update is not a
+* NOOP if one of the updates is a delete, and the old
+* `packed-refs` file contains a value for that reference.
+*/
+   ret = 1;
+   for (i = 0; i < transaction->nr; i++) {
+   struct ref_update *update = transaction->updates[i];
+   unsigned int type;
+   struct object_id oid;
+
+   if (!(update->flags & REF_HAVE_NEW))
+   /* This reference isn't being deleted -> NOOP. */
+   continue;
+
+   if (!refs_read_raw_ref(ref_store, update->refname,
+  oid.hash, &referent, &type) ||
+   errno != ENOENT) {
+   /*
+* We might have to actually delet