Thanks a lot, Marc! I will run a fio test on the crashing disks when there is no traffic in our cluster.

We are using Samsung 970 PRO NVMe drives as WAL/DB and Samsung 860 Pro SSDs as OSDs, and the NVMe disappears after the SSD hits a timeout. Maybe we also need to throw the 970 PRO away?

Thanks,
zx
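PS: for reference, the fio invocation I plan to use is roughly the sync-write test from the Ceph_performance wiki linked below; /dev/nvme0n1 is only a placeholder for whichever WAL/DB device gets tested, and running it against the raw device overwrites data, so the OSDs on it have to be out first:

  # 4k random writes with an fsync after every write, queue depth 1 (destructive!)
  fio --name=wal-test --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
      --fsync=1 --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --runtime=60 --time_based

While it runs I will also watch the kernel log and the drive's own error counters, something along these lines (assuming nvme-cli is installed; /dev/nvme0 is again a placeholder):

  dmesg -Tw | grep -i -E 'nvme|timeout|abort'
  nvme smart-log /dev/nvme0
  nvme error-log /dev/nvme0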
> On 22 Feb 2021, at 9:25 PM, Marc <m...@f1-outsourcing.eu> wrote:
>
> So on the disks that crash anyway, do the fio test. If it crashes, you will know it has nothing to do with Ceph. If it does not crash, you will probably get a poor fio result, which would explain the problems with Ceph.
>
> This is what someone wrote in the past. If you did not do your research on drives, I think it is probably your drives.
>
> "just throw away your crappy Samsung SSD 860 Pro"
> https://www.mail-archive.com/ceph-users@ceph.io/msg06820.html
>
>> -----Original Message-----
>> From: zxcs <zhuxion...@163.com>
>> Sent: 22 February 2021 13:10
>> To: Marc <m...@f1-outsourcing.eu>
>> Cc: Mark Lehrer <leh...@gmail.com>; Konstantin Shalygin <k0...@k0ste.ru>; ceph-users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] Ceph nvme timeout and then aborting
>>
>> I haven't done a fio test on a single disk, but I did run fio against the Ceph cluster. The cluster has 12 nodes, and each node has the same disks (2 NVMes for cache, 3 SSDs as OSDs, and 4 HDDs also as OSDs). Only two nodes have this problem, and those two nodes have crashed many times (at least 4 times). The others are fine, so it is strange.
>> This cluster has been running for more than half a year.
>>
>> Thanks,
>> zx
>>
>>> On 22 Feb 2021, at 6:37 PM, Marc <m...@f1-outsourcing.eu> wrote:
>>>
>>> Don't you have these problems simply because the Samsung 970 PRO is not suitable for this? Have you run fio tests to make sure it would work OK?
>>>
>>> https://yourcmc.ru/wiki/Ceph_performance
>>> https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit#gid=0
>>>
>>>> -----Original Message-----
>>>> Sent: 22 February 2021 03:16
>>>> us...@ceph.io>
>>>> Subject: [ceph-users] Re: Ceph nvme timeout and then aborting
>>>>
>>>> Thanks for your reply!
>>>>
>>>> Yes, it is an NVMe, and each node has two NVMes as DB/WAL, one for the SSDs (OSD 0-2) and another for the HDDs (OSD 3-6).
>>>> I have no spare to try.
>>>> It's very strange: the load was not very high at that time, and both the SSD and the NVMe seem healthy.
>>>>
>>>> If I cannot fix it, I am afraid I need to set up more nodes and remove the OSDs which use this NVMe?
>>>>
>>>> Thanks,
>>>> zx

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io