Re: [PATCH] Race condition in svnsync can wedge a mirror repository

Stefan Sperling Fri, 27 Nov 2009 03:54:34 -0800

On Fri, Nov 27, 2009 at 11:04:26AM +0000, Philip Martin wrote:
> Stefan Sperling <[email protected]> writes:
> 
> > What about the patch below?
> >
> > [[[
> > * subversion/svnsync/main.c
> >   (get_lock): The lock attempt is done in iteration N, and success is
> >    checked in iteration N+1. Prevent the case where svnsync attempts
> >    to take the lock in the very last iteration, doesn't check for
> >    success, and ends up erroring out, leaving behind a stale lock.
> >
> > Found by: Jon Foster <[email protected]>
> > Based on patch by: philip
> > ]]]
> 
> Yes, that sort of thing.  Jon's patch was more complicated because he
> was also trying to avoid the pointless final sleep before exiting with
> a "couldn't get lock" error.  I don't think one extra second sleep
> really matters.


OK, I will commit it.

There is a much worse race condition, however.
The code with my patch looks like this:

#define SVNSYNC_LOCK_RETRIES 10
  for (i = 0; i < SVNSYNC_LOCK_RETRIES; ++i)
    {
      svn_pool_clear(subpool);
      SVN_ERR(check_cancel(NULL));

      SVN_ERR(svn_ra_rev_prop(session, 0, SVNSYNC_PROP_LOCK, &reposlocktoken,
                              subpool));

      if (reposlocktoken)
        {
          /* Did we get it?   If so, we're done, otherwise we sleep. */
          if (strcmp(reposlocktoken->data, mylocktoken->data) == 0)
            return SVN_NO_ERROR;
          else
            {
              SVN_ERR(svn_cmdline_printf
                      (pool, _("Failed to get lock on destination "
                               "repos, currently held by '%s'\n"),
                       reposlocktoken->data));

              apr_sleep(apr_time_from_sec(1));
            }
        }
      else if (i < SVNSYNC_LOCK_RETRIES - 1)
        {
          /* Except in the very last iteration, try to set the lock. */
          SVN_ERR(svn_ra_change_rev_prop(session, 0, SVNSYNC_PROP_LOCK,
                                         mylocktoken, subpool));
        }
    }

Now imagine two svnsync process running on the same machine,
or even on two different machines over the network, both trying
to get the lock:

      SVN_ERR(svn_ra_rev_prop(session, 0, SVNSYNC_PROP_LOCK, &reposlocktoken,
                              subpool));

-- Process 1 set the lock in the previous iteration and is now here.
-- The strcmp() below will be good so it got the write lock and will
-- do a sync.

      if (reposlocktoken)
        {
          /* Did we get it?   If so, we're done, otherwise we sleep. */
          if (strcmp(reposlocktoken->data, mylocktoken->data) == 0)
            return SVN_NO_ERROR;
          [...]
        }
      else if (i < SVNSYNC_LOCK_RETRIES - 1)
        {

-- Process 2 has not gotten the lock yet and is now here.
-- It is about to overwrite the lock, read it in the next iteration,
-- see that it got the lock, and will also do a sync.

          /* Except in the very last iteration, try to set the lock. */
          SVN_ERR(svn_ra_change_rev_prop(session, 0, SVNSYNC_PROP_LOCK,
                                         mylocktoken, subpool));
        }
    }

Classic set-and-test problem. It's not fixable in svnsync itself.

We've seen interleaved svnsync processes running and corrupting the
sync meta data on the mirror. The interleaved svnsync output from logs
looks like this:

Start: Wed Nov 25 17:35:01 CET 2009
Start: Wed Nov 25 17:35:01 CET 2009
Committed revision 68143.
Copied properties for revision 68143.
Committed revision 68144.
svnsync: Commit created rev 68144 but should have created 68143
real 0.92
user 0.57
sys 0.03
End: Wed Nov 25 17:35:02 CET 2009
Transmitting file data ..
Committed revision 68145.
svnsync: Commit created rev 68145 but should have created 68144
real 1.08
user 0.64
sys 0.06
End: Wed Nov 25 17:35:02 CET 2009

Both these processes were running from a cronjob and accessed
the repository via ra_local. I have no idea why they ended up
starting their sync at roughly the same time.

Anyway, the core issue here is that the repository should provide
a lockable object, that can be set and tested atomically via the
RA layer (i.e. also over the network). We need something that works
e.g. like file-locking is done in NFS (svnlockd?).

For now, users cannot assume that svnsync will handle concurrent
syncs, and need to take care of synchronising sync jobs themselves.

Stefan

Re: [PATCH] Race condition in svnsync can wedge a mirror repository

Reply via email to