[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Sat, 7 Apr 2012 12:16:46 -0500 Andrew Deason adea...@sinenomine.net wrote:

> Would you be willing to provide your prdb.DB0? It just contains things like usernames, groups, group memberships, ids, etc. It shouldn't contain very sensitive information, unless any of your usernames or group memberships etc are sensitive.

So, the prdb.DB0 seems fine (assuming the entries in it are indeed the only entries you have); it's just the ubik label version epoch that's screwed up. The only time I'm aware of where the label can be like that is during a SendFile/GetFile...

Brett, do you have a prdb.DB0.TMP file lying around in there? Can you see if PtLog or PtLog.old (or any PtLog you can find) mentions 'Synchronize database'? And is it possible you have ever had another dbserver site? Regardless of what you have in your CellServDB, have you ever had more than one machine running the 'ptserver' process at once?

-- 
Andrew Deason
adea...@sinenomine.net

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Tue, 10 Apr 2012 11:19:07 -0500 Andrew Deason adea...@sinenomine.net wrote:

> So, the prdb.DB0 seems fine (assuming the entries in it are indeed the only entries you have); it's just the ubik label version epoch that's screwed up. The only time I'm aware of where the label can be like that is during a SendFile/GetFile...
>
> Brett, do you have a prdb.DB0.TMP file lying around in there? Can you see if PtLog or PtLog.old (or any PtLog you can find) mentions 'Synchronize database'?

The answer to this is no, apparently. While I'm not sure how Brett's ptserver exactly got in that situation... I've filed bug 130673 for the issue that I think is related, if anyone is curious.

-- 
Andrew Deason
adea...@sinenomine.net
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Sat, 07 Apr 2012 21:40:01 -0500 Brett Heroux brett.j.her...@gmail.com wrote:

> That did it.

Well that's good to hear, but do you still have the old database files, and would you be willing to share them? Just from what's in this thread, I don't really know why it broke, so I don't have any guarantee that it won't happen again.

-- 
Andrew Deason
adea...@sinenomine.net
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Fri, 6 Apr 2012 14:59:40 -0500 Brett Heroux brett.j.her...@gmail.com wrote:

> The pt_util output looks good, it just gives all the users. Still would appreciate help.

Would you be willing to provide your prdb.DB0? It just contains things like usernames, groups, group memberships, ids, etc. It shouldn't contain very sensitive information, unless any of your usernames or group memberships etc are sensitive.

If you want something quicker to get stuff up and running, one thing that will probably work is to recreate the ptdb. One way to do this is to stop the ptserver, move the prdb.DB0 and prdb.DBSYS files out of the way, and use pt_util to dump the information from them to a temporary file. Then use pt_util to load that information into a new prdb.DB0, and start up the ptserver again. Just make sure you back up prdb.DB0/prdb.DBSYS.

(See the pt_util documentation for that; I'm in a bit of a hurry to provide more detailed info: http://docs.openafs.org/Reference/8/pt_util.html)

-- 
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
That did it.

1) stop the ptserver
2) back up prdb files
3) remove prdb files
4) pt_util -user -group -members -name -system -prdb ./prdb.DB0 -datafile /tmp/t
5) pt_util -user -group -members -name -system -w -datafile /tmp/t
6) start the ptserver
7) joy

I think the -system was unnecessary, but Thank You So Much Andrew.

Brett Heroux

On 4/7/2012 12:16 PM, Andrew Deason wrote:
> On Fri, 6 Apr 2012 14:59:40 -0500 Brett Heroux brett.j.her...@gmail.com wrote:
>> The pt_util output looks good, it just gives all the users. Still would appreciate help.
>
> Would you be willing to provide your prdb.DB0? It just contains things like usernames, groups, group memberships, ids, etc. It shouldn't contain very sensitive information, unless any of your usernames or group memberships etc are sensitive.
>
> If you want something quicker to get stuff up and running, one thing that will probably work is to recreate the ptdb. One way to do this is to stop the ptserver, move the prdb.DB0 and prdb.DBSYS files out of the way, and use pt_util to dump the information from them to a temporary file. Then use pt_util to load that information into a new prdb.DB0, and start up the ptserver again. Just make sure you back up prdb.DB0/prdb.DBSYS.
>
> (See the pt_util documentation for that; I'm in a bit of a hurry to provide more detailed info: http://docs.openafs.org/Reference/8/pt_util.html)
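The dump-and-reload steps in this message can be sketched in Python. This is only an illustration under stated assumptions: the helper names are hypothetical, "prdb files" is taken to mean every file matching `prdb.*` in the database directory, and the dump is read from the backup copy. The pt_util flags are the ones quoted in the message. Stopping and starting the ptserver is left to the operator, and the commands are only built here, not run.

```python
import shutil
from pathlib import Path


def move_prdb_files(db_dir, backup_dir):
    """Move every prdb.* file from db_dir into backup_dir; return the new paths.

    Assumption: "prdb files" means anything matching prdb.* in db_dir.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(Path(db_dir).glob("prdb.*")):
        dst = backup / src.name
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved


def pt_util_commands(backup_dir, dump_file):
    """Build (but do not run) the dump and load commands from the message."""
    fields = ["-user", "-group", "-members", "-name", "-system"]
    old_db = str(Path(backup_dir) / "prdb.DB0")
    # Dump: read the old database copy and write a text file.
    dump = ["pt_util", *fields, "-prdb", old_db, "-datafile", str(dump_file)]
    # Load: -w writes the text file into a new database at the default location.
    load = ["pt_util", *fields, "-w", "-datafile", str(dump_file)]
    return [dump, load]
```

Run with the ptserver stopped, `move_prdb_files` covers steps 2 and 3, and the two argv lists correspond to steps 4 and 5; they could be passed to `subprocess.run(..., check=True)` as root once reviewed.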
Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
The pt_util output looks good, it just gives all the users. Still would appreciate help.

Thanks,
Brett Heroux

On Thu, Apr 5, 2012 at 2:43 PM, Andrew Deason adea...@sinenomine.net wrote:
> On Thu, 5 Apr 2012 14:17:30 -0500 Brett Heroux brett.j.her...@gmail.com wrote:
>> The output from udebug is:
>
> This all looks fine. I probably should have asked for udebug on port 7002; that is what looks weird:
>
> $ udebug 74.222.253.110 7002
> [...]
> Local db version is 0.134632965
> I am sync site forever (1 server)
> Recovery state 1f
> Sync site's db version is 0.134632965
> 0 locked pages, 0 of them for write
>
> That is not a normal db version number; maybe your ptdb has been corrupted. Can you read it from local disk using pt_util? If you run 'pt_util -user' as root, it should spit out a list of all users in the database. Does it do that, or does it complain about some error? Do you see anything in PtLog? (I assume this is in /var/log/openafs, or wherever openafs logs are for you)
>
> -- 
> Andrew Deason
> adea...@sinenomine.net
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Wed, 04 Apr 2012 20:06:05 -0500 Brett Heroux brett.j.her...@gmail.com wrote:

> I have one db/fileserver and another fileserver. The db/fileserver is east-gateway. This is the udebug output.

I don't think you can get a quorum error with just one dbserver. What's in /usr/afs/etc/CellServDB? (or wherever the server-side CellServDB is)

> root@east-gateway:~# udebug 7003 east-gateway
> udebug: can't resolve port name east-gateway

As Brandon mentioned, this is backwards. Seeing this with the arguments the right way around would still be helpful...

-- 
Andrew Deason
adea...@sinenomine.net
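The failed command above put the port where udebug expected the server. One way to avoid that mix-up is to use udebug's named `-server` and `-port` options instead of positional arguments. A minimal Python sketch follows; the helper name and the small port table are hypothetical, and the ports are the ones used in this thread (7002 for ptserver, 7003 for vlserver).

```python
import shlex

# Ubik ports mentioned in this thread (illustrative lookup table).
UBIK_PORTS = {"ptserver": 7002, "vlserver": 7003}


def udebug_argv(server, service):
    """Return the argv for querying one ubik service on one host.

    Named flags make the server/port order impossible to swap.
    """
    port = UBIK_PORTS[service]
    return ["udebug", "-server", server, "-port", str(port)]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for service in ("vlserver", "ptserver"):
        print(shlex.join(udebug_argv("east-gateway", service)))
```

The sketch only prints the command lines; running them is left to the reader on a machine that has udebug installed.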
Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
The CellServDB on my system is:

devicesoft.org #EFS
74.222.253.110 #east-gateway.devicesoft.org

It resides in /etc/openafs and /etc/openafs/server on both the db/fileserver and the other fileserver.

The output from udebug is:

root@east-gateway:~# udebug east-gateway 7003
Host's addresses are: 74.222.253.110
Host's 74.222.253.110 time is Thu Apr 5 14:13:10 2012
Local time is Thu Apr 5 14:13:12 2012 (time differential 2 secs)
Last yes vote for 74.222.253.110 was 0 secs ago (sync site);
Last vote started 0 secs ago (at Thu Apr 5 14:13:12 2012)
Local db version is 1333587500.2
I am sync site forever (1 server)
Recovery state 1f
Sync site's db version is 1333587500.2
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
         65690 secs ago (at Wed Apr 4 19:58:22 2012)

Thanks for your help.

Brett Heroux

On Thu, Apr 5, 2012 at 12:02 PM, Andrew Deason adea...@sinenomine.net wrote:
> On Wed, 04 Apr 2012 20:06:05 -0500 Brett Heroux brett.j.her...@gmail.com wrote:
>> I have one db/fileserver and another fileserver. The db/fileserver is east-gateway. This is the udebug output.
>
> I don't think you can get a quorum error with just one dbserver. What's in /usr/afs/etc/CellServDB? (or wherever the server-side CellServDB is)
>
>> root@east-gateway:~# udebug 7003 east-gateway
>> udebug: can't resolve port name east-gateway
>
> As Brandon mentioned, this is backwards. Seeing this with the arguments the right way around would still be helpful...
>
> -- 
> Andrew Deason
> adea...@sinenomine.net
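When comparing udebug output across several servers or ports, it can help to pull out just the fields discussed in this thread. The sketch below is a rough Python illustration: the function name and dictionary keys are hypothetical, and the regular expressions are written against the exact wording of the output pasted above, so other udebug versions may need adjustments.

```python
import re

# A few lines copied from the udebug output in this message.
SAMPLE = """\
Host's addresses are: 74.222.253.110
Local time is Thu Apr 5 14:13:12 2012 (time differential 2 secs)
Local db version is 1333587500.2
I am sync site forever (1 server)
Recovery state 1f
Sync site's db version is 1333587500.2
"""


def summarize_udebug(text):
    """Extract clock skew, db versions, recovery state and sync-site status."""

    def find(pattern):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    skew = find(r"time differential (-?\d+) secs")
    return {
        "clock_skew_secs": int(skew) if skew is not None else None,
        "local_db_version": find(r"Local db version is (\S+)"),
        "sync_db_version": find(r"Sync site's db version is (\S+)"),
        "recovery_state": find(r"Recovery state (\S+)"),
        "is_sync_site": "I am sync site" in text,
    }
```

Calling `summarize_udebug(SAMPLE)` gives a small dictionary that is easy to diff between the port 7003 and port 7002 output.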
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Thu, 5 Apr 2012 14:17:30 -0500 Brett Heroux brett.j.her...@gmail.com wrote:

> The output from udebug is:

This all looks fine. I probably should have asked for udebug on port 7002; that is what looks weird:

$ udebug 74.222.253.110 7002
[...]
Local db version is 0.134632965
I am sync site forever (1 server)
Recovery state 1f
Sync site's db version is 0.134632965
0 locked pages, 0 of them for write

That is not a normal db version number; maybe your ptdb has been corrupted. Can you read it from local disk using pt_util? If you run 'pt_util -user' as root, it should spit out a list of all users in the database. Does it do that, or does it complain about some error? Do you see anything in PtLog? (I assume this is in /var/log/openafs, or wherever openafs logs are for you)

-- 
Andrew Deason
adea...@sinenomine.net
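To see why 0.134632965 stands out, it helps to split the version into its two parts. The sketch below assumes the part before the dot is the ubik epoch stored as a Unix timestamp and the part after it is a counter. The healthy value seen on port 7003 in this thread, 1333587500.2, decodes to within a couple of seconds of the "new db version was labelled" time that udebug printed, which is consistent with that reading. The function names and the year-2000 cutoff are arbitrary choices for illustration.

```python
from datetime import datetime, timezone

# Arbitrary sanity cutoff for this sketch: 2000-01-01 00:00:00 UTC.
EARLIEST_PLAUSIBLE_EPOCH = 946684800


def parse_db_version(text):
    """Split 'epoch.counter' as printed by udebug into two integers."""
    epoch_str, counter_str = text.strip().split(".", 1)
    return int(epoch_str), int(counter_str)


def describe_db_version(text):
    """Return a one-line description, flagging epochs that are not plausible dates."""
    epoch, counter = parse_db_version(text)
    if epoch < EARLIEST_PLAUSIBLE_EPOCH:
        return f"{text}: suspicious epoch {epoch} (counter {counter})"
    when = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return f"{text}: labelled {when:%Y-%m-%d %H:%M:%S} UTC (counter {counter})"


if __name__ == "__main__":
    for version in ("1333587500.2", "0.134632965"):
        print(describe_db_version(version))
```

The check is only a heuristic: it says nothing about whether the database contents are intact, which is what the pt_util test in this message is for.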
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
On Wed, 4 Apr 2012 10:24:19 -0500 Brett Heroux brett.j.her...@gmail.com wrote:

> Sun Apr 1 04:00:03 2012 File server starting
> Sun Apr 1 04:00:03 2012 afs_krb_get_lrealm failed, using devicesoft.org.
> Sun Apr 1 04:00:03 2012 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
> Sun Apr 1 04:00:03 2012 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

$ translate_et 5376
5376 (u).0 = no quorum elected

Your database servers claim to be out of sync. How many dbservers do you have? Can you run 'udebug 7003 server' for each of them?

> I had a power outage and got a new IP address on my server, had to swap some NICs that were damaged, but I changed my CellServDB and think I am over it (I use DHCP). This used to work and I would like very much for it to work again.

If you changed the server-side CellServDB after the servers started, you need to restart the server processes to pick up the changes.

-- 
Andrew Deason
adea...@sinenomine.net
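For longer logs, the numeric codes can be pulled out automatically before looking them up. A minimal Python sketch follows; the function name is hypothetical, and the lookup table contains only the one translation that appears in this thread (5376, per the translate_et output above). Anything else is reported as unknown rather than guessed.

```python
import re

# Only the code translated in this thread; add entries from translate_et as needed.
KNOWN_CODES = {5376: "no quorum elected"}

CODE_RE = re.compile(r"code=(\d+)")


def codes_in_log(lines):
    """Yield (code, meaning) for every 'code=N' found in the given log lines."""
    for line in lines:
        for match in CODE_RE.finditer(line):
            code = int(match.group(1))
            yield code, KNOWN_CODES.get(code, "unknown; try translate_et")


if __name__ == "__main__":
    log = [
        "Sun Apr 1 04:00:03 2012 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)",
        "Sun Apr 1 04:00:03 2012 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.",
    ]
    for code, meaning in codes_in_log(log):
        print(code, meaning)
```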
Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
I'm pretty sure I restarted my servers already, but I did it again. I have one db/fileserver and another fileserver. The db/fileserver is east-gateway. This is the udebug output.

root@east-gateway:~# bos status east-gateway
Instance buserver, currently running normally.
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance fs, currently running normally.
    Auxiliary status is: file server running.
root@east-gateway:~# udebug 7003 east-gateway
udebug: can't resolve port name east-gateway

Brett Heroux

On 4/4/2012 10:33 AM, Andrew Deason wrote:
> On Wed, 4 Apr 2012 10:24:19 -0500 Brett Heroux brett.j.her...@gmail.com wrote:
>> Sun Apr 1 04:00:03 2012 File server starting
>> Sun Apr 1 04:00:03 2012 afs_krb_get_lrealm failed, using devicesoft.org.
>> Sun Apr 1 04:00:03 2012 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
>> Sun Apr 1 04:00:03 2012 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.
>
> $ translate_et 5376
> 5376 (u).0 = no quorum elected
>
> Your database servers claim to be out of sync. How many dbservers do you have? Can you run 'udebug 7003 server' for each of them?
>
>> I had a power outage and got a new IP address on my server, had to swap some NICs that were damaged, but I changed my CellServDB and think I am over it (I use DHCP). This used to work and I would like very much for it to work again.
>
> If you changed the server-side CellServDB after the servers started, you need to restart the server processes to pick up the changes.
[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376
On Thu, 28 Apr 2011 12:34:27 +0200 Christof Hanke christof.ha...@rzg.mpg.de wrote:

> # translate_et 5376
> 5376 (u).0 = no quorum elected
>
> The DB-Servers cannot agree on who should be the master.

VLLog and 'udebug vlserver 7003' might help indicate why. If the dbservers were just turned on, you may just have needed to wait a couple of minutes.

-- 
Andrew Deason
adea...@sinenomine.net
Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376
Andrew Deason wrote:
> On Thu, 28 Apr 2011 12:34:27 +0200 Christof Hanke christof.ha...@rzg.mpg.de wrote:
>> # translate_et 5376
>> 5376 (u).0 = no quorum elected
>>
>> The DB-Servers cannot agree on who should be the master.
>
> VLLog and 'udebug vlserver 7003' might help indicate why. If the dbservers were just turned on, you may just have needed to wait a couple of minutes.

Some simple checks to make as well:

* Clock skew can cause problems. Be sure the clocks are synchronized on all the db servers with ntpd.
* Verify the server side CellServDB files on the db servers are correct and are identical.
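The second check can be partly automated by comparing the parsed contents of each server's CellServDB instead of the raw text, so that differences in comments or spacing do not count. The sketch below is a hedged Python illustration: the function names are hypothetical, and it assumes the usual CellServDB layout of a `>cellname` line followed by one `address #hostname` line per database server. How the files are collected from each server is left out.

```python
def parse_cellservdb(text):
    """Return {cell_name: set_of_server_addresses}, ignoring '#' comments."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()
            if not name:
                current = None  # malformed cell line; skip its entries
                continue
            current = name[0]
            cells.setdefault(current, set())
        elif current is not None:
            cells[current].add(line.split()[0])
    return cells


def same_servers(text_a, text_b):
    """True if both CellServDB texts list the same servers for the same cells."""
    return parse_cellservdb(text_a) == parse_cellservdb(text_b)
```

A `False` result only says the server lists differ; it does not say which copy is the correct one.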