On 10/05/2012 04:56 PM, Smart Weblications GmbH - Florian Wiessner wrote:
> On 05.10.2012 17:24, Sage Weil wrote:
>> On Fri, 5 Oct 2012, Joao Eduardo Luis wrote:
>>> On 10/05/2012 01:24 PM, Smart Weblications GmbH - Florian Wiessner wrote:
>>>> On 04.10.2012 15:38, Smart Weblications GmbH - Florian Wiessner wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> I have a ceph cluster with 2 osds and 3 mons. One of the monitors does not
>>>>> start anymore:
>>>>>
>>>>> 2012-10-04 13:36:29.501178 7f7e123f9780 -1 asok(0x14ac000) AdminSocketConfigObs::init: error: AdminSocket::create_shutdown_pipe error: (38) Function not implemented
>>>>> 2012-10-04 13:36:29.535018 7f7e123f9780  1 mon.2@-1(probing) e1 init fsid 5b59811a-d235-488f-9b9b-953db7e5028b
>>>>> 2012-10-04 13:36:29.541171 7f7e123f9780 -1 mon/Paxos.cc: In function 'bool Paxos::is_consistent()' thread 7f7e123f9780 time 2012-10-04 13:36:29.536744
>>>>> mon/Paxos.cc: 1031: FAILED assert(consistent || (slurping == 1))
>>>
>>> This assertion means the monitor was killed or failed either during
>>> slurping (while catching up with the other monitors) or while performing
>>> some kind of update. So it ended up in an inconsistent state.
>>
>> While it is slurping, the monitor may be temporarily inconsistent, and it
>> is supposed to record that by writing a 'slurping' file containing '1' in
>> each paxos subdirectory, so some bug triggered this.  A simple workaround
>> is to do
>>
>>   echo 1 > $mondata/osdmap/slurping
>>   echo 1 > $mondata/pgmap/slurping
>>   echo 1 > $mondata/monmap/slurping
>>   echo 1 > $mondata/logm/slurping
>>   echo 1 > $mondata/auth/slurping
>>
>> and it will go through the recovery steps.  It would be helpful if you 
>> could tar up a copy of the mon directory first, though, along with any 
>> log files on that host, so we can try to figure out what went wrong.
>>
> 
> Unfortunately, I deleted the logs for the monitor, as I did not see anything
> special except this assertion...
> 
> 
> I'll send the mon directory directly to Sage in a separate mail.
> 
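
For reference, the workaround quoted above could be scripted roughly as
follows. This is only a sketch: the function name mark_slurping, the default
backup file name and the skip-if-missing check are assumptions made for
illustration, not something taken from ceph itself. It tars up the mon
directory first, as requested, and only then writes the 'slurping' files.

```shell
#!/bin/sh
# Sketch of the workaround quoted above.
# Assumptions (not from ceph): the function name, the default backup file
# name, and the choice to skip subdirectories that do not exist.

mark_slurping() {
    mondata=$1
    backup=${2:-mon-backup.tar.gz}

    if [ ! -d "$mondata" ]; then
        echo "not a directory: $mondata" >&2
        return 1
    fi

    # Keep a copy of the current state so it can be inspected later.
    tar -czf "$backup" -C "$(dirname "$mondata")" "$(basename "$mondata")" || return 1

    # Write '1' into a 'slurping' file in each paxos subdirectory.
    for svc in osdmap pgmap monmap logm auth; do
        if [ -d "$mondata/$svc" ]; then
            echo 1 > "$mondata/$svc/slurping"
        else
            echo "skipping missing directory: $mondata/$svc" >&2
        fi
    done
}
```

It would be called as `mark_slurping "$mondata"`, with $mondata pointing at
the mon data directory of the monitor that fails to start.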

Just following up on this, do you remember why this monitor went down
initially (the time before you were unable to start it)? Did it fail?
Was it killed? Were you upgrading it from a version prior to argonaut?

  -Joao

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
