On Tue, Oct 24, 2017 at 05:16:23PM +0200, Michael Haggerty wrote:
> But in fact the problem is more general; it can happen whenever a
> reference deletion within a transaction is processed successfully, and
> then another reference update in the same transaction fails in
> `lock_ref_for_update()`. In such a case the failed update and any
> subsequent ones could be silently ignored.
Thanks for digging this down to the root cause. It sounds like this
might not be all that hard to trigger for users of "push --atomic", then
(most of the rest of the code is saved by the fact that it still uses
one ref per transaction, which I think can't trigger this).
> There is a longer explanation in the second patch's commit message.
>
> The tests that I added probe a bunch of D/F update scenarios, which I
> think should be characteristic of the scenarios that would trigger
> this bug. It would be nice to have tests that examine other types of
> failures (e.g., wrong `old_oid` values).
What you added makes sense to me for now. I do admit I was a little
surprised that we didn't have better test coverage for partial
transaction failures. I think in general our test suite is weak on
checking failures, and covering more cases (like old_oid) would be
welcome. Those will be valuable especially as we started playing with
more storage backends.
I also agree that those tests can come post-release.
> While looking at this code again, I realized that the new code
> rewrites the `packed-refs` file more often than did the old code.
> Specifically, after dc39e09942 (files_ref_store: use a transaction to
> update packed refs, 2017-09-08), the `packed-refs` file is overwritten
> for any transaction that includes a reference delete. Prior to that
> commit, `packed-refs` was only overwritten if a deleted reference was
> actually present in the existing `packed-refs` file. This is even the
> case if there was previously no `packed-refs` file at all; now any
> reference deletion causes an empty `packed-refs` file to be created.
>
> I think this is a conscionable regression, since deleting references
> that are purely loose is probably not all that common, and the old
> code had to read the whole `packed-refs` file anyway. Nevertheless, it
> is obviously something that I would like to fix (though maybe not for
> 2.15? I'm open to input about its urgency.)
I agree it's not nearly as urgent as the fix you have here. It might
show up on a busy system as increased lock contention over packed-refs.
Or if you have really gigantic packed-refs files, a possible performance
regression. But as you say, it should be a pretty rare case.
So I'd be OK with leaving it to post-2.15. OTOH, I suspect it's a pretty
small patch. If you happen to produce it quickly, it might be worth
seeing before evaluating.
-Peff