[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-10 Thread Andrew Deason
On Sat, 7 Apr 2012 12:16:46 -0500
Andrew Deason adea...@sinenomine.net wrote:

 Would you be willing to provide your prdb.DB0? It just contains things
 like usernames, groups, group memberships, ids, etc. It shouldn't
 contain very sensitive information, unless any of your usernames or
 group memberships etc are sensitive.

So, the prdb.DB0 seems fine (assuming the entries in it are indeed the
only entries you have); it's just the ubik label version epoch that's
screwed up. The only time I'm aware of where the label can be like that
is during a SendFile/GetFile... Brett, do you have a prdb.DB0.TMP file
lying around in there? Can you see if PtLog or PtLog.old (or any PtLog
you can find) mentions "Synchronize database"?

And is it possible you have ever had another dbserver site? Regardless
of what you have in your CellServDB, have you ever had more than one
machine running the 'ptserver' process at once?

-- 
Andrew Deason
adea...@sinenomine.net

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-10 Thread Andrew Deason
On Tue, 10 Apr 2012 11:19:07 -0500
Andrew Deason adea...@sinenomine.net wrote:

 So, the prdb.DB0 seems fine (assuming the entries in it are indeed the
 only entries you have); it's just the ubik label version epoch that's
 screwed up. The only time I'm aware of where the label can be like that
 is during a SendFile/GetFile... Brett, do you have a prdb.DB0.TMP file
 lying around in there? Can you see if PtLog or PtLog.old (or any PtLog
 you can find) mentions "Synchronize database"?

The answer to this is no, apparently. I'm not sure exactly how Brett's
ptserver got into that situation, but I've filed bug 130673 for the
issue that I think is related, if anyone is curious.

-- 
Andrew Deason
adea...@sinenomine.net



[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-08 Thread Andrew Deason
On Sat, 07 Apr 2012 21:40:01 -0500
Brett Heroux brett.j.her...@gmail.com wrote:

 That did it.

Well that's good to hear, but do you still have the old database files,
and would you be willing to share them? Just from what's in this thread,
I don't really know why it broke, so I don't have any guarantee that it
won't happen again.

-- 
Andrew Deason
adea...@sinenomine.net



[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-07 Thread Andrew Deason
On Fri, 6 Apr 2012 14:59:40 -0500
Brett Heroux brett.j.her...@gmail.com wrote:

 The pt_util output looks good, it just gives all the users.
 
 Still would appreciate help.

Would you be willing to provide your prdb.DB0? It just contains things
like usernames, groups, group memberships, ids, etc. It shouldn't
contain very sensitive information, unless any of your usernames or
group memberships etc are sensitive.

If you want something quicker to get stuff up and running, one thing
that will probably work is to recreate the ptdb. One way to do this is
to stop the ptserver, move the prdb.DB0 and prdb.DBSYS files out of the
way, and use pt_util to dump the information from them to a temporary
file. Then use pt_util to load that information into a new prdb.DB0, and
start up the ptserver again. Just make sure you back up
prdb.DB0/prdb.DBSYS.

(See the pt_util documentation for that; I'm in too much of a hurry to
provide more detailed info:
http://docs.openafs.org/Reference/8/pt_util.html)

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-07 Thread Brett Heroux

That did it.

1) stop the ptserver
2) back up prdb files
3) remove prdb files
4) pt_util -user -group -members -name -system -prdb ./prdb.DB0 -datafile /tmp/t
5) pt_util -user -group -members -name -system -w -datafile /tmp/t
6) start the ptserver
7) joy

I think the -system was unnecessary, but Thank You So Much Andrew.
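
For anyone who wants to script steps 4 and 5, here is a minimal sketch that only assembles the two pt_util command lines from the flags listed above. The function name and the idea of building argv lists are illustrative assumptions, not part of OpenAFS; stopping and starting the ptserver and backing up the files are still manual steps.

```python
from typing import List, Tuple

# Flags copied from steps 4 and 5 above; -system may be unnecessary,
# as noted in the message.
SECTION_FLAGS = ["-user", "-group", "-members", "-name", "-system"]


def pt_util_rebuild_commands(old_prdb: str, dump_file: str) -> Tuple[List[str], List[str]]:
    """Return (dump_cmd, load_cmd) argv lists mirroring steps 4 and 5.

    Hypothetical helper: it only builds the command lines, it does not
    run them or touch any database file.
    """
    # Step 4: read the old database copy and write a text dump.
    dump_cmd = ["pt_util", *SECTION_FLAGS, "-prdb", old_prdb, "-datafile", dump_file]
    # Step 5: load the text dump into a new database (-w means write).
    load_cmd = ["pt_util", *SECTION_FLAGS, "-w", "-datafile", dump_file]
    return dump_cmd, load_cmd


if __name__ == "__main__":
    for cmd in pt_util_rebuild_commands("./prdb.DB0", "/tmp/t"):
        print(" ".join(cmd))
```

The lists could be passed to subprocess.run once the ptserver is stopped and the backup is confirmed.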

Brett Heroux



Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-06 Thread Brett Heroux
The pt_util output looks good, it just gives all the users.

Still would appreciate help.

Thanks,

Brett Heroux




[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-05 Thread Andrew Deason
On Wed, 04 Apr 2012 20:06:05 -0500
Brett Heroux brett.j.her...@gmail.com wrote:

 I have one db/fileserver and another fileserver. The db/fileserver is 
 east-gateway. This is the udebug output.

I don't think you can get a quorum error with just one dbserver. What's
in /usr/afs/etc/CellServDB? (or wherever the server-side CellServDB is)

 root@east-gateway:~# udebug 7003 east-gateway
 udebug: can't resolve port name east-gateway

As Brandon mentioned, this is backwards. Seeing this with the arguments
the right way around would still be helpful...
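
To make the argument order harder to get wrong, here is a small sketch that builds the udebug command line with the server first and the port second, as in "udebug east-gateway 7003". The helper name and the port table are assumptions made for illustration; the two port numbers are the ones used in this thread (7002 for the ptserver database, 7003 for the vlserver database).

```python
from typing import List

# Ports as used in this thread.
UBIK_PORTS = {"ptserver": 7002, "vlserver": 7003}


def udebug_argv(server: str, port: int) -> List[str]:
    """Build a udebug command line: server name first, port second."""
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError("port must be an integer")
    if not 0 < port < 65536:
        raise ValueError(f"port {port} is out of range")
    # A purely numeric "server" usually means the arguments were swapped.
    if server.isdigit():
        raise ValueError(f"{server!r} looks like a port number, not a server name")
    return ["udebug", server, str(port)]


if __name__ == "__main__":
    print(" ".join(udebug_argv("east-gateway", UBIK_PORTS["vlserver"])))
```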

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-05 Thread Brett Heroux
The CellServDB on my system is:

devicesoft.org #EFS
74.222.253.110   #east-gateway.devicesoft.org

It resides in /etc/openafs and /etc/openafs/server on both the
db/fileserver and the other fileserver.

The output from udebug is:

root@east-gateway:~# udebug east-gateway 7003
Host's addresses are: 74.222.253.110
Host's 74.222.253.110 time is Thu Apr  5 14:13:10 2012
Local time is Thu Apr  5 14:13:12 2012 (time differential 2 secs)
Last yes vote for 74.222.253.110 was 0 secs ago (sync site);
Last vote started 0 secs ago (at Thu Apr  5 14:13:12 2012)
Local db version is 1333587500.2
I am sync site forever (1 server)
Recovery state 1f
Sync site's db version is 1333587500.2
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
 65690 secs ago (at Wed Apr  4 19:58:22 2012)

Thanks for your help.

Brett Heroux




[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-05 Thread Andrew Deason
On Thu, 5 Apr 2012 14:17:30 -0500
Brett Heroux brett.j.her...@gmail.com wrote:

 The output from udebug is:

This all looks fine. I probably should have asked for udebug on port
7002; that is what looks weird:

$ udebug 74.222.253.110 7002
[...]
Local db version is 0.134632965
I am sync site forever (1 server)
Recovery state 1f
Sync site's db version is 0.134632965
0 locked pages, 0 of them for write

That is not a normal db version number; maybe your ptdb has been
corrupted. Can you read it from local disk using pt_util? If you run
'pt_util -user' as root, it should spit out a list of all users in the
database. Does it do that, or does it complain about some error?
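
As a rough illustration of the check being made by eye here, the sketch below pulls the "Local db version" out of saved udebug output and flags a zero in the first field. In the healthy port 7003 output earlier in the thread that field is 1333587500, which reads as a Unix timestamp; treating a zero there as the warning sign is an assumption based only on the two outputs shown in this thread, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "Local db version is 1333587500.2".
VERSION_RE = re.compile(r"^\s*Local db version is (\d+)\.(\d+)\s*$", re.MULTILINE)


def local_db_version(udebug_output: str) -> Optional[Tuple[int, int]]:
    """Return (epoch, counter) from udebug output, or None if absent."""
    match = VERSION_RE.search(udebug_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def epoch_looks_suspicious(version: Tuple[int, int]) -> bool:
    """Flag a version whose first field is 0 (heuristic, see lead-in)."""
    epoch, _counter = version
    return epoch == 0


if __name__ == "__main__":
    sample = "Local db version is 0.134632965\nI am sync site forever (1 server)\n"
    version = local_db_version(sample)
    print(version, epoch_looks_suspicious(version))
```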

Do you see anything in PtLog? (I assume this is in /var/log/openafs, or
wherever openafs logs are for you)

-- 
Andrew Deason
adea...@sinenomine.net



[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-04 Thread Andrew Deason
On Wed, 4 Apr 2012 10:24:19 -0500
Brett Heroux brett.j.her...@gmail.com wrote:

 Sun Apr  1 04:00:03 2012 File server starting
 Sun Apr  1 04:00:03 2012 afs_krb_get_lrealm failed, using devicesoft.org.
 Sun Apr  1 04:00:03 2012 VL_RegisterAddrs rpc failed; will retry
 periodically (code=5376, err=0)
 Sun Apr  1 04:00:03 2012 Couldn't get CPS for AnyUser, will try again in 30
 seconds; code=5376.

$ translate_et 5376
5376 (u).0 = no quorum elected

Your database servers claim to be out of sync. How many dbservers do you
have? Can you run 'udebug 7003 server' for each of them?
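
For readers wondering how 5376 relates to the "(u).0" in the translate_et output, here is a minimal sketch of the arithmetic, assuming the usual com_err convention that the low 8 bits of a code select the message within an error table and the remainder identifies the table. The function name and the one-entry message table are illustrative; the only message text included is the one shown above.

```python
from typing import Tuple

# The single message known from the translate_et output in this thread.
KNOWN_MESSAGES = {5376: "no quorum elected"}


def split_et_code(code: int) -> Tuple[int, int]:
    """Split an error code into (table_base, message_index).

    Assumes the com_err layout: the low 8 bits are the message index.
    """
    index = code & 0xFF
    base = code - index
    return base, index


if __name__ == "__main__":
    base, index = split_et_code(5376)
    print(base, index, KNOWN_MESSAGES.get(5376, "unknown"))
```

Under that assumption 5376 has index 0, which matches the ".0" printed by translate_et.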

 I had a power outage and got a new IP address on my server, had to
 swap some NICs that were damaged, but I changed my CellServDB and
 think I am over it (I use DHCP). This used to work and I would like
 very much for it to work again.

If you changed the server-side CellServDB after the servers started, you
need to restart the server processes to pick up the changes.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376.

2012-04-04 Thread Brett Heroux

I'm pretty sure I restarted my servers already, but I did it again.

I have one db/fileserver and another fileserver. The db/fileserver is 
east-gateway. This is the udebug output.


root@east-gateway:~# bos status east-gateway
Instance buserver, currently running normally.
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance fs, currently running normally.
Auxiliary status is: file server running.

root@east-gateway:~# udebug 7003 east-gateway
udebug: can't resolve port name east-gateway

Brett Heroux



[OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376

2011-04-28 Thread Andrew Deason
On Thu, 28 Apr 2011 12:34:27 +0200
Christof Hanke christof.ha...@rzg.mpg.de wrote:

 # translate_et 5376
 5376 (u).0 = no quorum elected
 
 The DB-Servers cannot agree on who should be the master.

VLLog and 'udebug vlserver 7003' might help indicate why.

If the dbservers were just turned on, you may just have needed to wait a
couple of minutes.

-- 
Andrew Deason
adea...@sinenomine.net



Re: [OpenAFS] Re: Couldn't get CPS for AnyUser, will try again in 30 seconds; code=5376

2011-04-28 Thread Michael Meffie

Andrew Deason wrote:

 On Thu, 28 Apr 2011 12:34:27 +0200
 Christof Hanke christof.ha...@rzg.mpg.de wrote:

  # translate_et 5376
  5376 (u).0 = no quorum elected

  The DB-Servers cannot agree on who should be the master.

 VLLog and 'udebug vlserver 7003' might help indicate why.

 If the dbservers were just turned on, you may just have needed to wait a
 couple of minutes.


Some simple checks to make as well:

* Clock skew can cause problems. Be sure the clocks are
  synchronized on all the db servers with ntpd.

* Verify the server side CellServDB files on the db servers
  are correct and are identical.
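
The second check can be sketched as a comparison of CellServDB contents copied from each db server. This is a minimal sketch, assuming the usual CellServDB layout in which a cell line starts with ">" and each following line holds a server address followed by a "#" comment; the function names are hypothetical, and comments and spacing are ignored so that only cells and addresses are compared.

```python
from typing import Dict, Set


def parse_cellservdb(text: str) -> Dict[str, Set[str]]:
    """Map each cell name to the set of server addresses listed for it."""
    cells: Dict[str, Set[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                continue
            current = parts[0]
            cells.setdefault(current, set())
        elif current is not None:
            cells[current].add(line.split()[0])
    return cells


def same_cellservdb(a: str, b: str) -> bool:
    """True if both files list the same cells with the same addresses."""
    return parse_cellservdb(a) == parse_cellservdb(b)


if __name__ == "__main__":
    one = ">devicesoft.org #EFS\n74.222.253.110   #east-gateway.devicesoft.org\n"
    two = ">devicesoft.org\n74.222.253.110 #gw\n"
    print(same_cellservdb(one, two))
```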