Thanks a lot, Marc! I will run a fio test on the crashing disks when there is no traffic in our cluster.

We are using Samsung 970 PRO NVMe drives as WAL/DB and Samsung 860 Pro SSDs as OSDs, and the NVMe disappears after the SSD hits a timeout. Maybe we also need to throw the 970 PRO away?

Thanks,
zx
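PS: for reference, the fio invocation I plan to use is roughly the sync-write test from the Ceph_performance wiki linked below; /dev/nvme0n1 is only a placeholder for whichever WAL/DB device gets tested, and running it against the raw device overwrites data, so the OSDs on it have to be out first:

  # 4k random writes with an fsync after every write, queue depth 1 (destructive!)
  fio --name=wal-test --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
      --fsync=1 --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --runtime=60 --time_based

While it runs I will also watch the kernel log and the drive's own error counters, something along these lines (assuming nvme-cli is installed; /dev/nvme0 is again a placeholder):

  dmesg -Tw | grep -i -E 'nvme|timeout|abort'
  nvme smart-log /dev/nvme0
  nvme error-log /dev/nvme0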
> On 22 Feb 2021, at 9:25 PM, Marc <m...@f1-outsourcing.eu> wrote:
>
> So on the disks that crash anyway, do the fio test. If it crashes, you will know it has nothing to do with Ceph. If it does not crash, you will probably get a poor fio result, which would explain the problems with Ceph.
>
> This is what someone wrote in the past. If you did not do your research on drives, I think it is probably your drives.
>
> "just throw away your crappy Samsung SSD 860 Pro"
> https://www.mail-archive.com/ceph-users@ceph.io/msg06820.html
>
>> -----Original Message-----
>> From: zxcs <zhuxion...@163.com>
>> Sent: 22 February 2021 13:10
>> To: Marc <m...@f1-outsourcing.eu>
>> Cc: Mark Lehrer <leh...@gmail.com>; Konstantin Shalygin <k0...@k0ste.ru>; ceph-users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] Ceph nvme timeout and then aborting
>>
>> I haven't done a fio test on a single disk, but I did run fio against the Ceph cluster. The cluster has 12 nodes, and each node has the same disks (2 NVMes for cache, 3 SSDs as OSDs, and 4 HDDs also as OSDs). Only two nodes have this problem, and those two nodes have crashed many times (at least 4 times). The others are fine, so it is strange.
>> This cluster has been running for more than half a year.
>>
>> Thanks,
>> zx
>>
>>> On 22 Feb 2021, at 6:37 PM, Marc <m...@f1-outsourcing.eu> wrote:
>>>
>>> Don't you have these problems simply because the Samsung 970 PRO is not suitable for this? Have you run fio tests to make sure it would work OK?
>>>
>>> https://yourcmc.ru/wiki/Ceph_performance
>>> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit#gid=0
>>>
>>>> -----Original Message-----
>>>> Sent: 22 February 2021 03:16
>>>> us...@ceph.io>
>>>> Subject: [ceph-users] Re: Ceph nvme timeout and then aborting
>>>>
>>>> Thanks for your reply!
>>>>
>>>> Yes, it is an NVMe, and each node has two NVMes as DB/WAL, one for the SSDs (OSD 0-2) and another for the HDDs (OSD 3-6).
>>>> I have no spare to try.
>>>> It's very strange: the load was not very high at that time, and both the SSD and the NVMe seem healthy.
>>>>
>>>> If I cannot fix it, I am afraid I need to set up more nodes and remove the OSDs which use this NVMe?
>>>>
>>>> Thanks,
>>>> zx

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io