Subject: openafs-dbserver: VLDB changes not being sync'ed to vldb.DB0
Package: openafs-dbserver
Version: 1.4.7~pre3.dfsg1-1
Severity: critical
Justification: breaks the whole system

Recent versions of vlserver fail to write VLDB changes to the /var/lib/openafs/db/vldb.DB0 file on non-sync sites. The effect is that, while the in-memory VLDB is correct, the on-disk copy is correct only on the sync site. If all vlservers for a cell are restarted *at the same time*, all recent changes to the VLDB are lost.

The problem is reproducible:

- Stop, with bos, all 3 vlservers (all three are running the version below).
- Remove /var/lib/openafs/db/vldb* on all db servers.
- Restart, with bos, all 3 vlservers. Empty vldb.DB0 files are created on all servers. The vlservers show no errors in their logs.
- Wait for quorum to be established (check via udebug, recovery state 1f).
- Run 'vos listvldb' to check that no volumes are registered.
- Run 'vos syncvldb' for each fileserver in the cell.
- udebug on the sync site shows the DB version incrementing and recovery state 1f.
- 'vos listvldb' now shows all volumes in the cell correctly, and all clients can successfully access cell volumes.
- Wait 1 or more hours.
- The vldb.DB0 file has zero size on the non-sync sites and still carries the timestamp from when the vlserver was started. On the sync site it has grown and carries the timestamp of the last syncvldb operation.
- Restart all vlservers. The vlservers show no errors in their logs.
- Wait for quorum to be established (check via udebug, recovery state 1f).
- 'vos listvldb' shows no volumes.
- Redoing the syncvldb allows the clients to access volumes again.

This problem was also seen with the i686 dbserver on testing (before the upgrade to amd64 testing) and seems to have begun somewhere after openafs 1.4.2. It was first seen with a VLDB that had worked correctly for 2+ years. At some point recently (1.4.6?), changes stopped being written to vldb.DB0 (but no errors were logged), and the above procedure was attempted in order to begin with a clean slate. The effect, however, remains and thus cannot be attributed to a corrupt vldb.DB0.
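The zero-size symptom can be spotted on each db server right after the syncvldb step, without waiting for a restart to lose the data. A minimal sketch in Python, assuming the Debian path above and local shell access on each server; the script and the `check_db_file`/`report` helpers are hypothetical and not part of OpenAFS. It only looks at file size and modification time, not at the Ubik database contents.

```python
import os
import sys
import time

# Path used by the Debian openafs-dbserver package (as given in this report).
DEFAULT_DB = "/var/lib/openafs/db/vldb.DB0"


def check_db_file(path=DEFAULT_DB):
    """Return (size_in_bytes, mtime, is_empty) for the on-disk VLDB file."""
    st = os.stat(path)
    return st.st_size, st.st_mtime, st.st_size == 0


def report(path=DEFAULT_DB):
    """Print a one-line summary; return False if the file is empty."""
    size, mtime, empty = check_db_file(path)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
    print(f"{path}: {size} bytes, last modified {stamp}")
    if empty:
        # The symptom seen on the non-sync sites: vldb.DB0 is still zero
        # bytes even though 'vos listvldb' lists volumes.
        print("  WARNING: database file is empty")
    return not empty


if __name__ == "__main__":
    # Pass one or more paths; with no arguments, nothing is checked.
    for p in sys.argv[1:]:
        report(p)
```

Run on every db server after 'vos syncvldb' (e.g. `python3 check_vldb_file.py /var/lib/openafs/db/vldb.DB0`, script name hypothetical), the sync site should report a grown file with a recent timestamp, while an affected non-sync site would report an empty file dated at vlserver start-up.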
Testing with a backup of the original VLDB also shows this problem. In all cases, vldb_check seems satisfied that vldb.DB0 is not corrupted.

From the above it appears that:
- the vldb.DB0 file is not being updated on non-sync sites
- when a restart occurs, only the sync site has a recent vldb.DB0
- but it is outvoted by the previously non-sync sites, and
- recent changes are discarded

-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (990, 'testing'), (300, 'unstable'), (80, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.25-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_ZA.UTF-8, LC_CTYPE=en_ZA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages openafs-dbserver depends on:
ii  libc6               2.7-10              GNU C Library: Shared libraries
ii  openafs-client      1.4.7~pre3.dfsg1-1  AFS distributed filesystem client
ii  openafs-fileserver  1.4.7~pre3.dfsg1-1  AFS distributed filesystem file se
ii  perl                5.8.8-12            Larry Wall's Practical Extraction

openafs-dbserver recommends no packages.

-- no debconf information