Re: [PATCH 0/2] Fix an error-handling path when locking refs

2017-10-24 Thread Jeff King
On Tue, Oct 24, 2017 at 05:16:23PM +0200, Michael Haggerty wrote:

> But in fact the problem is more general; it can happen whenever a
> reference deletion within a transaction is processed successfully, and
> then another reference update in the same transaction fails in
> `lock_ref_for_update()`. In such a case the failed update and any
> subsequent ones could be silently ignored.

Thanks for digging this down to the root cause. It sounds like this
might not be all that hard to trigger for users of "push --atomic", then
(most of the rest of the code is saved by the fact that it still uses
one ref per transaction, which I think can't trigger this).

> There is a longer explanation in the second patch's commit message.
> 
> The tests that I added probe a bunch of D/F update scenarios, which I
> think should be characteristic of the scenarios that would trigger
> this bug. It would be nice to have tests that examine other types of
> failures (e.g., wrong `old_oid` values).

What you added makes sense to me for now. I do admit I was a little
surprised that we didn't have better test coverage for partial
transaction failures. I think in general our test suite is weak on
checking failures, and covering more cases (like old_oid) would be
welcome. Those will be valuable especially as we started playing with
more storage backends.

I also agree that those tests can come post-release.

> While looking at this code again, I realized that the new code
> rewrites the `packed-refs` file more often than did the old code.
> Specifically, after dc39e09942 (files_ref_store: use a transaction to
> update packed refs, 2017-09-08), the `packed-refs` file is overwritten
> for any transaction that includes a reference delete. Prior to that
> commit, `packed-refs` was only overwritten if a deleted reference was
> actually present in the existing `packed-refs` file. This is even the
> case if there was previously no `packed-refs` file at all; now any
> reference deletion causes an empty `packed-refs` file to be created.
> 
> I think this is a conscionable regression, since deleting references
> that are purely loose is probably not all that common, and the old
> code had to read the whole `packed-refs` file anyway. Nevertheless, it
> is obviously something that I would like to fix (though maybe not for
> 2.15? I'm open to input about its urgency.)

I agree it's not nearly as urgent as the fix you have here. It might
show up on a busy system as increased lock contention over packed-refs.
Or if you have really gigantic packed-refs files, a possible performance
regression. But as you say, it should be a pretty rare case.

So I'd be OK with leaving it to post-2.15. OTOH, I suspect it's a pretty
small patch. If you happen to produce it quickly, it might be worth
seeing before evaluating.

-Peff


[PATCH 0/2] Fix an error-handling path when locking refs

2017-10-24 Thread Michael Haggerty
In [1], Peff described a bug that he found that could cause a
reference transaction to be partly carried-out and partly not, with a
successful return. The scenario that he discussed was the deletion of
one reference and creation of another, where the two references D/F
conflict with each other.

But in fact the problem is more general; it can happen whenever a
reference deletion within a transaction is processed successfully, and
then another reference update in the same transaction fails in
`lock_ref_for_update()`. In such a case the failed update and any
subsequent ones could be silently ignored.

There is a longer explanation in the second patch's commit message.

The tests that I added probe a bunch of D/F update scenarios, which I
think should be characteristic of the scenarios that would trigger
this bug. It would be nice to have tests that examine other types of
failures (e.g., wrong `old_oid` values).

Bit since the fix is obviously a strict improvement and can prevent
data loss, and since the release process is already pretty far along,
I wanted to send this out ASAP. We can follow it up later with
additional tests.

These patches apply to current master. They are also available from my
GitHub fork [2] as branch `ref-locking-fix`.

While looking at this code again, I realized that the new code
rewrites the `packed-refs` file more often than did the old code.
Specifically, after dc39e09942 (files_ref_store: use a transaction to
update packed refs, 2017-09-08), the `packed-refs` file is overwritten
for any transaction that includes a reference delete. Prior to that
commit, `packed-refs` was only overwritten if a deleted reference was
actually present in the existing `packed-refs` file. This is even the
case if there was previously no `packed-refs` file at all; now any
reference deletion causes an empty `packed-refs` file to be created.

I think this is a conscionable regression, since deleting references
that are purely loose is probably not all that common, and the old
code had to read the whole `packed-refs` file anyway. Nevertheless, it
is obviously something that I would like to fix (though maybe not for
2.15? I'm open to input about its urgency.)

[1] 
https://public-inbox.org/git/20171024082409.smwsd6pla64jj...@sigill.intra.peff.net/
[2] https://github.com/mhagger/git

Michael Haggerty (2):
  t1404: add a bunch of tests of D/F conflicts
  files_transaction_prepare(): fix handling of ref lock failure

 refs/files-backend.c |   2 +-
 t/t1404-update-ref-errors.sh | 146 +++
 2 files changed, 147 insertions(+), 1 deletion(-)

-- 
2.14.1