Hello,

Wasn't this originally an issue with the mon store? Now you are getting a checksum error from an OSD? I think some hardware in this node is just hosed.
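If you want more evidence before blaming a specific part, I'd look for kernel-level ATA errors and run a SMART self-test. A minimal sketch, assuming the SSD shows up as /dev/sda (adjust the device name to your node):

    # kernel log: bad cables/ports usually surface as ata link resets or I/O errors
    dmesg | egrep -i 'ata[0-9]|i/o error|blk_update'
    # full SMART report, including the device error log and self-test history
    smartctl -x /dev/sda
    # kick off a long self-test; read the results later with: smartctl -l selftest /dev/sda
    smartctl -t long /dev/sda

A clean smartctl report still wouldn't rule out the controller, backplane, or power: checksum errors that keep following the node around after a disk swap usually point at something shared.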
On Wed, Feb 21, 2018 at 5:46 PM, Behnam Loghmani <behnam.loghm...@gmail.com> wrote:
> Hi there,
>
> I changed the SATA port and cable of the SSD disk, updated Ceph to version
> 12.2.3, and rebuilt the OSDs, but when recovery starts the OSDs fail with
> this error:
>
> 2018-02-21 21:12:18.037974 7f3479fe2d00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x84c097b0, expected 0xaf1040a2, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
> 2018-02-21 21:12:18.038002 7f3479fe2d00 -1 osd.7 0 OSD::init() : unable to read osd superblock
> 2018-02-21 21:12:18.038009 7f3479fe2d00  1 bluestore(/var/lib/ceph/osd/ceph-7) umount
> 2018-02-21 21:12:18.038282 7f3479fe2d00  1 stupidalloc 0x0x55e99236c620 shutdown
> 2018-02-21 21:12:18.038308 7f3479fe2d00  1 freelist shutdown
> 2018-02-21 21:12:18.038336 7f3479fe2d00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
> 2018-02-21 21:12:18.041561 7f3465561700  4 rocksdb: (Original Log Time 2018/02/21-21:12:18.041514) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[5 0 0 0 0 0 0] max score 0.00, MB/sec: 2495.2 rd, 10.1 wr, level 1, files in(5, 0) out(1) MB in(213.6, 0.0) out(0.9), read-write-amplify(1.0) write-amplify(0.0) Shutdown in progress: Database shutdown or Column
> 2018-02-21 21:12:18.041569 7f3465561700  4 rocksdb: (Original Log Time 2018/02/21-21:12:18.041545) EVENT_LOG_v1 {"time_micros": 1519234938041530, "job": 3, "event": "compaction_finished", "compaction_time_micros": 89747, "output_level": 1, "num_output_files": 1, "total_output_size": 902552, "num_input_records": 4470, "num_output_records": 4377, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 44, "lsm_state": [5, 0, 0, 0, 0, 0, 0]}
> 2018-02-21 21:12:18.041663 7f3479fe2d00  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1519234938041657, "job": 4, "event": "table_file_deletion", "file_number": 249}
> 2018-02-21 21:12:18.042144 7f3479fe2d00  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.3/rpm/el7/BUILD/ceph-12.2.3/src/rocksdb/db/db_impl.cc:343] Shutdown complete
> 2018-02-21 21:12:18.043474 7f3479fe2d00  1 bluefs umount
> 2018-02-21 21:12:18.043775 7f3479fe2d00  1 stupidalloc 0x0x55e991f05d40 shutdown
> 2018-02-21 21:12:18.043784 7f3479fe2d00  1 stupidalloc 0x0x55e991f05db0 shutdown
> 2018-02-21 21:12:18.043786 7f3479fe2d00  1 stupidalloc 0x0x55e991f05e20 shutdown
> 2018-02-21 21:12:18.043826 7f3479fe2d00  1 bdev(0x55e992254600 /dev/vg0/wal-b) close
> 2018-02-21 21:12:18.301531 7f3479fe2d00  1 bdev(0x55e992255800 /dev/vg0/db-b) close
> 2018-02-21 21:12:18.545488 7f3479fe2d00  1 bdev(0x55e992254400 /var/lib/ceph/osd/ceph-7/block) close
> 2018-02-21 21:12:18.650473 7f3479fe2d00  1 bdev(0x55e992254000 /var/lib/ceph/osd/ceph-7/block) close
> 2018-02-21 21:12:18.900003 7f3479fe2d00 -1 ** ERROR: osd init failed: (22) Invalid argument
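Since osd.7's own superblock fails the crc32c check, a deep fsck of that BlueStore instance should reproduce the corruption outside the OSD startup path. A minimal sketch, assuming the daemon is stopped and the mount path matches the log above:

    # stop the daemon so nothing else touches the store
    systemctl stop ceph-osd@7
    # deep fsck reads back and checksums object data, not just metadata
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7 --deep 1

If that aborts with the same checksum mismatch on a freshly rebuilt OSD, the data is being corrupted somewhere between the kernel and the flash, not by Ceph.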
On Wed, Feb 21, 2018 at 5:06 PM, Behnam Loghmani <behnam.loghm...@gmail.com> wrote:
>> But the disks pass all the tests with smartctl and badblocks, and there
>> isn't any error on the disks. Because the SSD contains the WAL/DB of the
>> OSDs, it's difficult to test it on other cluster nodes.
>>
>> On Wed, Feb 21, 2018 at 4:58 PM, <kna...@gmail.com> wrote:
>>
>>> Could the problem be related to some faulty hardware (RAID controller,
>>> port, cable) but not the disk? Does the "faulty" disk work OK on another
>>> server?
>>>
>>> Behnam Loghmani wrote on 21/02/18 16:09:
>>>
>>>> Hi there,
>>>>
>>>> I replaced the SSD on the problematic node with a new one and
>>>> reconfigured the OSDs and MON service on it,
>>>> but the problem occurred again with:
>>>>
>>>> "rocksdb: submit_transaction error: Corruption: block checksum mismatch
>>>> code = 2"
>>>>
>>>> I am fully confused now.
>>>>
>>>> On Tue, Feb 20, 2018 at 5:16 PM, Behnam Loghmani <behnam.loghm...@gmail.com> wrote:
>>>>
>>>> Hi Caspar,
>>>>
>>>> I checked the filesystem and there isn't any error on it.
>>>> The disk is an SSD, it doesn't report any attribute related to wear
>>>> level in smartctl, and the filesystem is mounted with default options
>>>> and no discard.
>>>>
>>>> My Ceph structure on this node is like this:
>>>>
>>>> it has osd, mon, and rgw services
>>>> 1 SSD for OS and WAL/DB
>>>> 2 HDDs
>>>>
>>>> The OSDs are created by ceph-volume lvm.
>>>>
>>>> The whole SSD is in one VG:
>>>> the OS is on the root LV
>>>> OSD.1 DB is on db-a
>>>> OSD.1 WAL is on wal-a
>>>> OSD.2 DB is on db-b
>>>> OSD.2 WAL is on wal-b
>>>>
>>>> output of lvs:
>>>>
>>>> data-a data-a -wi-a-----
>>>> data-b data-b -wi-a-----
>>>> db-a   vg0    -wi-a-----
>>>> db-b   vg0    -wi-a-----
>>>> root   vg0    -wi-ao----
>>>> wal-a  vg0    -wi-a-----
>>>> wal-b  vg0    -wi-a-----
>>>>
>>>> After making heavy writes through radosgw, OSD.1 and OSD.2 stopped
>>>> with a "block checksum mismatch" error.
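For anyone reproducing this layout: the OSDs were presumably built with something like the following (the data-a/data-b data LVs come from the lvs output above; pairing them with these exact DB/WAL LVs is my assumption):

    # OSD.1: data on the first HDD's LV, DB and WAL on the shared SSD VG
    ceph-volume lvm create --bluestore --data data-a/data-a \
        --block.db vg0/db-a --block.wal vg0/wal-a
    # OSD.2: same pattern on the second HDD
    ceph-volume lvm create --bluestore --data data-b/data-b \
        --block.db vg0/db-b --block.wal vg0/wal-b

The relevant point is that both OSDs' WAL/DB, plus the OS and mon store, all live on the one SSD, so a single flaky device (or its port/controller path) would explain every symptom in this thread.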
>>>> Now, on this node, the MON and OSD services have stopped working with
>>>> this error.
>>>>
>>>> I think my issue is related to this bug:
>>>> http://tracker.ceph.com/issues/22102
>>>>
>>>> I ran
>>>> # ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1 --deep 1
>>>> but it returns the same error:
>>>>
>>>> *** Caught signal (Aborted) **
>>>> in thread 7fbf6c923d00 thread_name:ceph-bluestore-
>>>> 2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block checksum mismatch
>>>> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>>>> 1: (()+0x3eb0b1) [0x55f779e6e0b1]
>>>> 2: (()+0xf5e0) [0x7fbf61ae15e0]
>>>> 3: (gsignal()+0x37) [0x7fbf604d31f7]
>>>> 4: (abort()+0x148) [0x7fbf604d48e8]
>>>> 5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
>>>> 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]
>>>> 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
>>>> 8: (main()+0xde0) [0x55f779baab90]
>>>> 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
>>>> 10: (()+0x1bc59f) [0x55f779c3f59f]
>>>> 2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal (Aborted) **
>>>> in thread 7fbf6c923d00 thread_name:ceph-bluestore-
>>>>
>>>> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>>>> 1: (()+0x3eb0b1) [0x55f779e6e0b1]
>>>> 2: (()+0xf5e0) [0x7fbf61ae15e0]
>>>> 3: (gsignal()+0x37) [0x7fbf604d31f7]
>>>> 4: (abort()+0x148) [0x7fbf604d48e8]
>>>> 5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
>>>> 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]
>>>> 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
>>>> 8: (main()+0xde0) [0x55f779baab90]
>>>> 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
>>>> 10: (()+0x1bc59f) [0x55f779c3f59f]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> -1> 2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block checksum mismatch
>>>> 0> 2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal (Aborted) **
>>>> in thread 7fbf6c923d00 thread_name:ceph-bluestore-
>>>>
>>>> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>>>> 1: (()+0x3eb0b1) [0x55f779e6e0b1]
>>>> 2: (()+0xf5e0) [0x7fbf61ae15e0]
>>>> 3: (gsignal()+0x37) [0x7fbf604d31f7]
>>>> 4: (abort()+0x148) [0x7fbf604d48e8]
>>>> 5: (RocksDBStore::get(std::string const&, char const*, unsigned long, ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
>>>> 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545) [0x55f779cd8f75]
>>>> 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
>>>> 8: (main()+0xde0) [0x55f779baab90]
>>>> 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
>>>> 10: (()+0x1bc59f) [0x55f779c3f59f]
>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>
>>>> Could you please help me recover this node, or find a way to prove the
>>>> SSD disk problem?
>>>> Best regards,
>>>> Behnam Loghmani
>>>>
>>>> On Mon, Feb 19, 2018 at 1:35 PM, Caspar Smit <caspars...@supernas.eu> wrote:
>>>>
>>>> Hi Behnam,
>>>>
>>>> I would first recommend running a filesystem check on the monitor disk
>>>> to see if there are any inconsistencies.
>>>>
>>>> Is the disk the monitor is running on a spinning disk or an SSD?
>>>>
>>>> If an SSD, you should check the wear level stats through smartctl.
>>>> Maybe trim (discard) is enabled on the filesystem mount? (discard could
>>>> cause problems/corruption in combination with certain SSD firmwares)
>>>>
>>>> Caspar
>>>>
>>>> 2018-02-16 23:03 GMT+01:00 Behnam Loghmani <behnam.loghm...@gmail.com>:
>>>>
>>>> I checked the disk the monitor is on with smartctl; it didn't return
>>>> any error and it doesn't have any Current_Pending_Sector.
>>>> Do you recommend any disk checks to make sure that this disk has a
>>>> problem, so I can send the report to the provider to replace the disk?
>>>>
>>>> On Sat, Feb 17, 2018 at 1:09 AM, Gregory Farnum <gfar...@redhat.com> wrote:
>>>>
>>>> The disk that the monitor is on...there isn't anything for you to
>>>> configure about a monitor WAL, though, so I'm not sure how that enters
>>>> into it?
>>>>
>>>> On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani <behnam.loghm...@gmail.com> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> Do you mean that's a problem with the disk I use for the WAL and DB?
>>>>
>>>> On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>>>>
>>>> On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani <behnam.loghm...@gmail.com> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I have a Ceph cluster, version 12.2.2, on CentOS 7.
>>>>
>>>> It is a testing cluster and I set it up 2 weeks ago.
>>>> After some days, I saw that one of the three mons had stopped (out of
>>>> quorum) and I can't start it anymore.
>>>> I checked the mon service log and the output shows this error:
>>>>
>>>> """
>>>> mon.XXXXXX@-1(probing) e4 preinit clean up potentially inconsistent store state
>>>> rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch
>>>>
>>>> This bit is the important one. Your disk is bad and it's feeding back
>>>> corrupted data.
>>>>
>>>> code = 2 Rocksdb transaction:
>>>> 0> 2018-02-16 17:37:07.041812 7f45a1e52e40 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: In function 'void MonitorDBStore::clear(std::set<std::basic_string<char> >&)' thread 7f45a1e52e40 time 2018-02-16 17:37:07.040846
>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/mon/MonitorDBStore.h: 581: FAILED assert(r >= 0)
>>>> """
>>>>
>>>> The only solution I found is to remove this mon from the quorum, remove
>>>> all the mon data, and re-add this mon to the quorum again.
>>>> Then Ceph goes to healthy status again.
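(For the archives, the remove/re-add cycle described above is roughly the following; MON_ID is a placeholder for the failed monitor's name, and paths assume default locations:)

    # drop the broken mon from the quorum and wipe its store
    ceph mon remove MON_ID
    rm -rf /var/lib/ceph/mon/ceph-MON_ID
    # rebuild it from the current monmap and the mon keyring
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph-mon -i MON_ID --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    systemctl start ceph-mon@MON_ID

That gives the mon a clean RocksDB, which is why the cluster recovers; it does nothing about whatever corrupted the old store.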
>>>> But now, after some days, this mon has stopped and I face the same
>>>> problem again.
>>>>
>>>> My cluster setup is:
>>>> 4 OSD hosts
>>>> 8 OSDs in total
>>>> 3 mons
>>>> 1 rgw
>>>>
>>>> This cluster was set up with ceph-volume lvm, with WAL/DB separation on
>>>> logical volumes.
>>>>
>>>> Best regards,
>>>> Behnam Loghmani