I checked the disk that the monitor is on with smartctl. It didn't return any errors, and it doesn't show any Current_Pending_Sector. Can you recommend any disk checks that would confirm this disk has a problem, so that I can send the report to the provider and have the disk replaced?
On Sat, Feb 17, 2018 at 1:09 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> The disk that the monitor is on...there isn't anything for you to
> configure about a monitor WAL though so I'm not sure how that enters into
> it?
>
> On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani <
> behnam.loghm...@gmail.com> wrote:
>
>> Thanks for your reply
>>
>> Do you mean, that's the problem with the disk I use for WAL and DB?
>>
>> On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum <gfar...@redhat.com>
>> wrote:
>>
>>>
>>> On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani <
>>> behnam.loghm...@gmail.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I have a Ceph cluster version 12.2.2 on CentOS 7.
>>>>
>>>> It is a testing cluster and I have set it up 2 weeks ago.
>>>> after some days, I see that one of the three mons has stopped(out of
>>>> quorum) and I can't start it anymore.
>>>> I checked the mon service log and the output shows this error:
>>>>
>>>> """
>>>> mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent
>>>> store state
>>>> rocksdb: submit_transaction_sync error: Corruption: block checksum
>>>> mismatch
>>>>
>>>
>>> This bit is the important one. Your disk is bad and it’s feeding back
>>> corrupted data.
>>>
>>>
>>>> code = 2 Rocksdb transaction:
>>>> 0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h:
>>>> In function 'void
>>>> MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread
>>>> 7f45a1e52e40 time 2018-02-16 17:37:07.040846
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h:
>>>> 581: FAILED assert(r >= 0)
>>>> """
>>>>
>>>> the only solution I found is to remove this mon from quorum and remove
>>>> all mon data and re-add this mon to quorum again.
>>>> and ceph goes to the healthy status again.
>>>>
>>>> but now after some days this mon has stopped and I face the same
>>>> problem again.
>>>>
>>>> My cluster setup is:
>>>> 4 osd hosts
>>>> total 8 osds
>>>> 3 mons
>>>> 1 rgw
>>>>
>>>> this cluster has setup with ceph-volume lvm and wal/db separation on
>>>> logical volumes.
>>>>
>>>> Best regards,
>>>> Behnam Loghmani
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
>>