On Tue, Apr 29, 2014 at 9:48 AM, Marc <m...@shoowin.de> wrote:
> 'ls' on the respective stores in /var/lib/ceph/mon/ceph.X/store.db
> returns a list of files (i.e. still present), fsck seems fine. I did
> notice that one of the nodes has different contents in the
> /var/lib/ceph/mon/ceph-b/keyring i.e. its key is different from the
> other 2 nodes'. That shouldn't be the case, should it? Would scp'ing
> over one of the other node's keyring files while mon.b is stopped be the
> right course of action then?

The fact that it's changed is...concerning. If that's the only thing
that's changed then copying over a keyring from one of the others
should do it, but it might also be a symptom of a more serious issue.
Depending on how paranoid you want to be:
1) Just copy over the keyring and start it up.
2) After that, do a mon scrub if it exists in your version of Ceph (I
don't remember when it was introduced).
3) Prior to that, do a comparison of the information you can pull out
of each monitor's admin socket when it's trying to form a quorum; make
sure everything basically matches
4) Prior to changing the keys, you could extract several maps of
various types and compare them to make sure they match
5) Or you could just copy one of the working stores to the monitor
with a different key. (There might be some files you need to twiddle
when doing this; check for past emails about recovering from lost
monitors.)
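
Before copying anything in option 1, it can help to confirm which monitor's keyring really is the odd one out. Below is a minimal Python sketch of that comparison. It assumes each keyring is a small INI-style file with a [mon.] section and a "key = ..." line; the helper names (extract_keys, odd_ones_out) are hypothetical, and the key strings in the usage note are placeholders, not real keys.

```python
import re
from collections import Counter

# Matches a line such as "    key = AQB...==" inside a keyring section.
KEY_LINE = re.compile(r"^\s*key\s*=\s*(\S+)\s*$")


def extract_keys(keyring_text):
    """Return {entity: key} for every [entity] section in the keyring text."""
    keys = {}
    entity = None
    for line in keyring_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            entity = stripped[1:-1]
            continue
        match = KEY_LINE.match(line)
        if match and entity is not None:
            keys[entity] = match.group(1)
    return keys


def odd_ones_out(keyrings):
    """Given {mon_name: keyring_text}, list monitors whose 'mon.' key
    differs from the key held by the majority of monitors."""
    mon_keys = {name: extract_keys(text).get("mon.")
                for name, text in keyrings.items()}
    majority_key, _ = Counter(mon_keys.values()).most_common(1)[0]
    return sorted(name for name, key in mon_keys.items()
                  if key != majority_key)
```

For example, after reading the three files from /var/lib/ceph/mon/ceph-a/keyring and its siblings into strings, odd_ones_out({"a": text_a, "b": text_b, "c": text_c}) would return ["b"] if only mon.b disagrees. With an even split there is no meaningful majority, so treat the result as a hint rather than a verdict.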
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
>
> Also your red herring explanation... how do I put this... It seems like
> an important thing to know, so thanks for that. I'm not sure how one
> would go about putting this tidbit in a spot where people would find it
> when needed... maybe somewhere in the debugging section of the wiki?
>
> On 29/04/2014 18:25, Gregory Farnum wrote:
>> Monitor keys don't change; I think something else must be going on. Did you
>> remove any of their stores? Are the local filesystems actually correct
>> (fsck)?
>>
>> The ceph-create-keys process is a red herring and will stop as soon as
>> the monitors do get into a quorum.
>> -Greg
>>
>> On Tuesday, April 29, 2014, Marc <m...@shoowin.de> wrote:
>>
>>> Hi,
>>>
>>> still working on a troubled ceph cluster running .61.2-1raring
>>> consisting of (currently) 4 monitors a,b,c,g with g being a newly added
>>> monitor that failed/fails to sync up, so consider that one down. Now mon
>>> a and b died because for some (currently unknown) reason linux created a
>>> core dump on the root partition (/core) that filled up the partition to
>>> 0b left and consequently the mons died. Now I tried restarting them, but
>>> they seem deadlocked in the following situation:
>>>
>>> the corresponding ceph-mon.X logs show various errors about cephx like
>>>
>>> "cephx: verify_authorizer could not decrypt ticket info: error: NSS AES
>>> final round failed: -8190"
>>>
>>> "cephx: verify_reply coudln't decrypt with error: error decoding block
>>> for decryption"
>>>
>>> I can see that the /usr/sbin/ceph-create-keys process is stuck (based on
>>> the fact that it's still running 20 minutes later). Manually running this
>>> says:
>>>
>>>
>>> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
>>>
>>>
>>>
>>> So, the monitors don't start up (stuck probing) because they can't
>>> communicate because they need new keys, and the keys cannot be generated
>>> because there's no quorum. Is there a way to fix this?
>>>
>>>
>>> Kind regards,
>>> Marc
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
