On Tue, 11 Feb 2014 09:50:35 +0000 Arne Wiebalck <arne.wieba...@cern.ch> wrote:
> I am currently managing write trans 1392106724.-1892301558
[...]
> Note that the sync site has gone to Recovery state f and that the time
> at which the last vote was received on the other two servers has quite
> a time gap which gets larger with time.

Is the negative trans ID ok? There was a bug where a negative
transaction ID would not be handled correctly, causing transactions to
be killed and causing ubik to think it doesn't have quorum. It was fixed
by 4c80871a16d6022c3d3e5edc0504208ddad49cc8 on 1.6 (gerrit 5751, 2647).
It looks like that was in 1.6.1. Without that fix, the workaround is to
periodically restart the dbservers so the transaction id doesn't roll
over and become negative.

I assume that's what it is, just because as far as I'm aware it always
happens with negative transaction IDs. But if you want some other ways
of trying to verify it, it can result in the "major synchronization
error" (USYNC) error message, which is barely ever seen outside of this
issue. There's also a thread describing a manifestation of the issue
here, if this looks familiar:
<http://lists.openafs.org/pipermail/openafs-info/2004-April/013225.html>.
Though that thread didn't identify the problem enough to actually fix it
back then.

> Is this a known issue?
>
> I had understood that it should be OK to run 1.4 and 1.6 file servers
> in parallel and that the DB servers could be updated after the file
> servers, but maybe that is not correct?

Mixing 1.4 and 1.6 servers is fine (some sites have or had fileservers
much much older than that :). While I don't think there's too much of a
reason to do dbservers 'before' or 'after' the others (but I'm not
thinking too hard about it), they are usually seen as more critical, so
they probably do tend to get updated last.

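As an aside on the rollover mentioned earlier: here is a rough sketch of
how a counter can end up negative, assuming the counter half of the
transaction ID is kept in a signed 32-bit integer. This is not ubik
code; the helper name as_signed32 is made up for illustration, and the
only value taken from the report is the counter -1892301558.

```python
def as_signed32(value):
    """Reinterpret an integer as a two's-complement signed 32-bit value.

    Hypothetical helper for illustration only; not part of OpenAFS.
    """
    return ((value + 2**31) % 2**32) - 2**31


INT32_MAX = 2**31 - 1

# The largest counter that still reads as positive.
print(as_signed32(INT32_MAX))        # 2147483647

# One more increment wraps around to the most negative value.
print(as_signed32(INT32_MAX + 1))    # -2147483648

# The counter from the report, viewed as an unsigned 32-bit number.
reported_counter = -1892301558
print(reported_counter & 0xFFFFFFFF)  # 2402665738
```

Under that assumption, the reported counter is simply a value past
2^31 that is being read as signed, which is why periodically restarting
the dbservers (and so resetting the counter) avoids the problem.
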
If the issue you're experiencing is the thing I mentioned above, it
doesn't have anything to do with the version of the fileservers (and I
don't think anything would; there's no interaction between the
fileservers and dbservers for those operations). If you saw it only when
moving volumes between 1.4 and 1.6 servers, as far as I know you're just
lucky :)

-- 
Andrew Deason
adea...@sinenomine.net

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info