I checked the disk that the monitor is on with smartctl. It didn't return
any errors and it doesn't report any Current_Pending_Sector.
Do you recommend any disk checks to confirm that this disk has a problem,
so that I can send the report to the provider and get the disk replaced?
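
A minimal sketch of that kind of SMART check, assuming the attribute table
printed by `smartctl -A` is available as text. The function name, the list of
watched attributes, and the sample table are illustrative assumptions, not
output from this disk.

```python
# Attributes that commonly indicate failing sectors (an assumed watch list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def nonzero_attributes(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with a raw value > 0."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Made-up sample in the layout of `smartctl -A`, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(nonzero_attributes(SAMPLE))
```

An empty result only means these counters are zero. It does not by itself
prove that the disk is healthy.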

On Sat, Feb 17, 2018 at 1:09 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> The disk that the monitor is on...there isn't anything for you to
> configure about a monitor WAL though so I'm not sure how that enters into
> it?
>
> On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani <
> behnam.loghm...@gmail.com> wrote:
>
>> Thanks for your reply
>>
>> Do you mean that the problem is with the disk I use for WAL and DB?
>>
>> On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum <gfar...@redhat.com>
>> wrote:
>>
>>>
>>> On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani <
>>> behnam.loghm...@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have a Ceph cluster version 12.2.2 on CentOS 7.
>>>>
>>>> It is a testing cluster that I set up 2 weeks ago.
>>>> After some days, I saw that one of the three mons had stopped (out of
>>>> quorum), and I can't start it anymore.
>>>> I checked the mon service log and the output shows this error:
>>>>
>>>> """
>>>> mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent store state
>>>> rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch
>>>>
>>>
>>> This bit is the important one. Your disk is bad and it’s feeding back
>>> corrupted data.
>>>
>>>
>>>
>>>
>>>> code = 2 Rocksdb transaction:
>>>>      0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h:
>>>> In function 'void
>>>> MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread
>>>> 7f45a1e52e40 time 2018-02-16 17:37:07.040846
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h:
>>>> 581: FAILED assert(r >= 0)
>>>> """
>>>>
>>>> The only solution I found was to remove this mon from the quorum, remove
>>>> all of its mon data, and re-add it to the quorum.
>>>> After that, Ceph returned to a healthy status.
>>>>
>>>> But now, after some days, this mon has stopped again and I face the same
>>>> problem.
>>>>
>>>> My cluster setup is:
>>>> 4 osd hosts
>>>> total 8 osds
>>>> 3 mons
>>>> 1 rgw
>>>>
>>>> This cluster was set up with ceph-volume lvm, with WAL/DB separated onto
>>>> logical volumes.
>>>>
>>>> Best regards,
>>>> Behnam Loghmani
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>
