I'd have to look up the details, but I don't think the auth monitor ever
removes those keys, so if there are some missing, it sounds like some
data got lost out from underneath it. That could have happened if the
filesystem dropped a file, which we have seen on some kernels.
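
As a rough illustration of the kind of check being discussed, here is a minimal sketch that looks for gaps in a versioned key range. It assumes the key naming from the quoted message (an 'auth' prefix followed by the version number); the set of keys is a hypothetical stand-in for the monitor store contents, and the helper name missing_versions is made up for this example. It is not how the monitor or ceph-monstore-tool does it.

```python
def missing_versions(keys, first_committed, last_committed, prefix="auth"):
    """Return the versions in [first_committed, last_committed] that
    have no '<prefix><version>' entry in `keys`."""
    present = set(keys)
    return [
        v
        for v in range(first_committed, last_committed + 1)
        if f"{prefix}{v}" not in present
    ]


# Hypothetical stand-in for the store described in the quoted message:
# keys 'auth251' through 'auth329' are present, nothing below 251.
store_keys = {f"auth{v}" for v in range(251, 330)}

# Gaps inside the range the store itself claims to hold.
gaps_in_committed_range = missing_versions(store_keys, 251, 329)

# Gaps if the full history back to version 1 were expected to exist.
gaps_from_first_version = missing_versions(store_keys, 1, 329)

print(len(gaps_in_committed_range), len(gaps_from_first_version))
```

With the values from the quoted message, the first check finds no gaps and the second reports every version below 251, which matches the symptom of a read of 'auth1' failing.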
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Jun 10, 2014 at 3:31 AM, Mohammad Salehe <sal...@gmail.com> wrote:
> Hi Greg,
>
> Thanks for your suggestion and information. I've reinstalled the cluster
> from scratch.
>
> I just wanted to investigate a little more based on your information. I can
> see that the auth/paxos values in the monitor K/V store are these:
> 'authfirst_commited': 251
> 'authlast_commited': 329
>
> and I have all the keys 'auth251'...'auth329' in there. However, there is no
> 'auth1' or 'auth250', yet it seems the monitor failed while reading 'auth1'.
> Is this normal?
> As a side note, I did not use cephx in this cluster.
>
> Thanks,
>
>
> 2014-06-09 22:11 GMT+04:30 Gregory Farnum <g...@inktank.com>:
>>
>> Barring a newly-introduced bug (doubtful), that assert basically means
>> that your computer lied to the ceph monitor about the durability or
>> ordering of data going to disk, and the store is now inconsistent. If
>> you don't have data you care about on the cluster, by far your best
>> option is:
>> 1) Figure out what part of the system is lying about data durability
>> (probably your filesystem or controller is ignoring barriers),
>> 2) start the Ceph install over
>> It's possible that the ceph-monstore-tool will let you edit the store
>> back into a consistent state, but it looks like the system can't find
>> the *initial* commit, which means you'll need to manufacture a new one
>> wholesale with the right keys from the other system components.
>>
>> (I am assuming that the system didn't crash right while you were
>> turning on the monitor for the first time; if it did that makes it
>> slightly more likely to be a bug on our end, but again it'll be
>> easiest to just start over since you don't have any data in it yet.)
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Sun, Jun 8, 2014 at 10:26 PM, Mohammad Salehe <sal...@gmail.com> wrote:
>> > Hi,
>> >
>> > I'm getting a failed assertion in AuthMonitor::update_from_paxos(bool*)
>> > after a system crash. I've saved a complete monitor log with 10/20 for
>> > 'mon' and 'paxos' here.
>> > There is only one monitor and two OSDs in the cluster, as I was just at
>> > the beginning of deployment.
>> >
>> > I would be thankful if someone could help.
>> >
>> > --
>> > Mohammad Salehe
>> > sal...@gmail.com
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
>
>
> --
> Mohammad Salehe
> sal...@gmail.com