[ceph-users] Re: Smarter DB disk replacement

2021-09-15 Thread Ján Senko
M.2 was not designed for hot swap, and Icydock's solution is a bit outside the
specification.
I really like the new Supermicro box (610P) that has 12 spinning disks and then
6 NVMe slots: 2 of them in the 2.5" x 7 mm format and 4 of them in the new E1.S
format.

E1.S is practically next-gen hot-plug M.2.

Ján Senko
Proton Technologies AG
Lead Storage Engineer

‐‐‐ Original Message ‐‐‐

On Monday, September 13th, 2021 at 20:23, Reed Dier  
wrote:

> I've been eyeing a similar icydock product
> (https://www.icydock.com/goods.php?id=309) to make M.2 drives more
> serviceable.
>
> While M.2 isn't ideal, if you have a 2U/4U box with a ton of available slots
> in the back, you could use these with some Micron 7300 MAX or similar M.2s for
> WAL/DB.
>
> In theory this would make identifying a failed M.2 easier/quicker, and allow
> hot-servicing, rather than, say, an on-motherboard slot requiring a full
> server pull to service.
>
> Curious if anyone has experience with it yet.
>
> Reed
>
> > On Sep 9, 2021, at 12:36 PM, Mark Nelson mnel...@redhat.com wrote:
> >
> > I don't think the bigger tier 1 enterprise vendors have really jumped on, 
> > but I've been curious to see if anyone would create a dense hotswap m.2 
> > setup (possibly combined with traditional 3.5" HDD bays). The only vendor 
> > I've really seen even attempt something like this is icydock:
> >
> > https://www.icydock.com/goods.php?id=287
> >
> > 8 NVMe m.2 devices in a single 5.25" bay. They also have another version 
> > that does 6 m.2 in 2x3.5". You could imagine that one of the tier 1 
> > enterprise vendors could probably do something similar on the back of a 
> > traditional 12-bay 2U 3.5" chassis. Stick in some moderately sized high 
> > write endurance m.2 devices and you're looking at something like 2 OSD 
> > DB/WAL per NVMe. As it is, 6:1 with 2x2.5" seems to be pretty typical and 
> > isn't terrible if you use decent drives.
> >
> > Mark
> >
> > On 9/9/21 12:04 PM, David Orman wrote:
> >
> > > Exactly, we minimize the blast radius/data destruction by allocating
> > > more devices for DB/WAL of smaller size rather than fewer of larger size. We
> > > encountered this same issue on an earlier iteration of our hardware
> > > design. With rotational drives and NVMEs, we are now aiming for a 6:1
> > > ratio based on our CRUSH rules/rotational disk sizing/nvme
> > > sizing/server sizing/EC setup/etc.
> > >
> > > Make sure to use write-friendly NVMEs for DB/WAL and the failures
> > > should be much fewer and further between.
> > >
> > > On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson icepic...@gmail.com wrote:
> > >
> > > > On Thu, Sep 9, 2021 at 16:09, Michal Strnad michal.str...@cesnet.cz wrote:
> > > >
> > > > > When the disk with DB died
> > > > > it will cause inaccessibility of all dependent OSDs (six or eight in our
> > > > > environment),
> > > > > How do you do it in your environment?
> > > >
> > > > Have two ssds for 8 OSDs, so only half go away when one ssd dies.
> > > >
> > > > --
> > > > May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-13 Thread Reed Dier
I've been eyeing a similar icydock product
(https://www.icydock.com/goods.php?id=309) to make M.2 drives more
serviceable.
While M.2 isn't ideal, if you have a 2U/4U box with a ton of available slots in
the back, you could use these with some Micron 7300 MAX or similar M.2s for
WAL/DB.
In theory this would make identifying a failed M.2 easier/quicker, and allow
hot-servicing, rather than, say, an on-motherboard slot requiring a full server
pull to service.

Curious if anyone has experience with it yet.

Reed

> On Sep 9, 2021, at 12:36 PM, Mark Nelson  wrote:
> 
> I don't think the bigger tier 1 enterprise vendors have really jumped on, but 
> I've been curious to see if anyone would create a dense hotswap m.2 setup 
> (possibly combined with traditional 3.5" HDD bays).  The only vendor I've 
> really seen even attempt something like this is icydock:
> 
> 
> https://www.icydock.com/goods.php?id=287
> 
> 
> 8 NVMe m.2 devices in a single 5.25" bay.  They also have another version 
> that does 6 m.2 in 2x3.5".  You could imagine that one of the tier 1 
> enterprise vendors could probably do something similar on the back of a 
> traditional 12-bay 2U 3.5" chassis.  Stick in some moderately sized high 
> write endurance m.2 devices and you're looking at something like 2 OSD DB/WAL 
> per NVMe.  As it is, 6:1 with 2x2.5" seems to be pretty typical and isn't 
> terrible if you use decent drives.
> 
> Mark
> 
> On 9/9/21 12:04 PM, David Orman wrote:
>> Exactly, we minimize the blast radius/data destruction by allocating
>> more devices for DB/WAL of smaller size rather than fewer of larger size. We
>> encountered this same issue on an earlier iteration of our hardware
>> design. With rotational drives and NVMEs, we are now aiming for a 6:1
>> ratio based on our CRUSH rules/rotational disk sizing/nvme
>> sizing/server sizing/EC setup/etc.
>> 
>> Make sure to use write-friendly NVMEs for DB/WAL and the failures
>> should be much fewer and further between.
>> 
>> On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson  wrote:
>>> On Thu, Sep 9, 2021 at 16:09, Michal Strnad  wrote:
>>>> When the disk with DB died
>>>> it will cause inaccessibility of all dependent OSDs (six or eight in our
>>>> environment),
>>>> How do you do it in your environment?
>>> Have two ssds for 8 OSDs, so only half go away when one ssd dies.
>>> 
>>> --
>>> May the most significant bit of your life be positive.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Mark Nelson
I don't think the bigger tier 1 enterprise vendors have really jumped 
on, but I've been curious to see if anyone would create a dense hotswap 
m.2 setup (possibly combined with traditional 3.5" HDD bays).  The only 
vendor I've really seen even attempt something like this is icydock:



https://www.icydock.com/goods.php?id=287


8 NVMe m.2 devices in a single 5.25" bay.  They also have another 
version that does 6 m.2 in 2x3.5".  You could imagine that one of the 
tier 1 enterprise vendors could probably do something similar on the 
back of a traditional 12-bay 2U 3.5" chassis.  Stick in some moderately 
sized high write endurance m.2 devices and you're looking at something 
like 2 OSD DB/WAL per NVMe.  As it is, 6:1 with 2x2.5" seems to be 
pretty typical and isn't terrible if you use decent drives.
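
For concreteness, a back-of-the-envelope sketch of that ratio arithmetic; the 
chassis layout and device capacities below are made-up numbers, not 
recommendations:

    # Rough DB/WAL sizing sketch; all capacities below are hypothetical.
    def db_layout(num_hdds, num_db_devices, db_device_gb):
        osds_per_db = num_hdds / num_db_devices      # OSDs sharing one DB/WAL device
        db_per_osd_gb = db_device_gb / osds_per_db   # DB/WAL space available per OSD
        return osds_per_db, db_per_osd_gb

    # 12-bay chassis with 2x 2.5" NVMe of 1600 GB each -> 6:1, ~266 GB DB per OSD
    print(db_layout(12, 2, 1600))
    # Same chassis with 8 hot-swap m.2 of 400 GB each -> 1.5:1, ~266 GB DB per OSD
    print(db_layout(12, 8, 400))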


Mark

On 9/9/21 12:04 PM, David Orman wrote:

> Exactly, we minimize the blast radius/data destruction by allocating
> more devices for DB/WAL of smaller size rather than fewer of larger size. We
> encountered this same issue on an earlier iteration of our hardware
> design. With rotational drives and NVMEs, we are now aiming for a 6:1
> ratio based on our CRUSH rules/rotational disk sizing/nvme
> sizing/server sizing/EC setup/etc.
>
> Make sure to use write-friendly NVMEs for DB/WAL and the failures
> should be much fewer and further between.
>
> On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson  wrote:
>
> > On Thu, Sep 9, 2021 at 16:09, Michal Strnad  wrote:
> >
> > > When the disk with DB died
> > > it will cause inaccessibility of all dependent OSDs (six or eight in our
> > > environment),
> > > How do you do it in your environment?
> >
> > Have two ssds for 8 OSDs, so only half go away when one ssd dies.
> >
> > --
> > May the most significant bit of your life be positive.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread David Orman
Exactly, we minimize the blast radius/data destruction by allocating
more devices for DB/WAL of smaller size rather than fewer of larger size. We
encountered this same issue on an earlier iteration of our hardware
design. With rotational drives and NVMEs, we are now aiming for a 6:1
ratio based on our CRUSH rules/rotational disk sizing/nvme
sizing/server sizing/EC setup/etc.

Make sure to use write-friendly NVMEs for DB/WAL and the failures
should be much fewer and further between.

On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson  wrote:
>
> On Thu, Sep 9, 2021 at 16:09, Michal Strnad  wrote:
> > When the disk with DB died
> > it will cause inaccessibility of all dependent OSDs (six or eight in our
> > environment),
> > How do you do it in your environment?
>
> Have two ssds for 8 OSDs, so only half go away when one ssd dies.
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Konstantin Shalygin
Ceph guarantees data consistency only when the data is written by Ceph.
When an NVMe dies we replace it and refill. For us it is normal for an OSD host
to take about two weeks to refill.

k

Sent from my iPhone

> On 9 Sep 2021, at 17:10, Michal Strnad  wrote:
> 
> 2. When the DB disk is not completely dead and has only reallocated sectors
> without bad sectors ... we can add a new disk to the server and make a copy
> from the old one to the new one (using dd or similar tools). After that we only
> switch the symbolic links in the associated OSDs (in
> /var/lib/ceph/osd/osd-*/block.db). But this does not work with a completely
> dead disk.
> 3. It would be nice to have some command/script which could prepare data on a
> new/replaced DB disk (scanning over all disks) and accelerate DB disk
> replacement.
> 
> How do you do it in your environment?
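
A minimal sketch of the copy-and-relink idea from point 2 above, assuming the 
old DB device is still readable. The device paths, OSD ids and directory layout 
are hypothetical; treat it as an outline rather than a tested recipe:

    # Hypothetical outline of the dd + symlink-switch DB replacement described
    # above; device paths, OSD ids and the OSD directory layout are assumptions.
    import os
    import subprocess

    OLD_DB = "/dev/sdx"        # ailing but still readable DB/WAL device (assumed)
    NEW_DB = "/dev/sdy"        # replacement device, at least as large (assumed)
    OSD_IDS = [10, 11, 12]     # OSDs whose block.db lives on OLD_DB (assumed)

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Stop the dependent OSDs so the DB is quiescent during the copy.
    for osd in OSD_IDS:
        run("systemctl", "stop", f"ceph-osd@{osd}")

    # 2. Raw block copy of the old DB device onto the new one.
    run("dd", f"if={OLD_DB}", f"of={NEW_DB}", "bs=4M", "status=progress")

    # 3. Re-point each OSD's block.db symlink at the new device and restart.
    for osd in OSD_IDS:
        link = f"/var/lib/ceph/osd/ceph-{osd}/block.db"  # path depends on cluster name
        os.unlink(link)
        os.symlink(NEW_DB, link)
        run("systemctl", "start", f"ceph-osd@{osd}")

Newer Ceph releases also ship ceph-bluestore-tool commands (e.g. 
bluefs-bdev-migrate) that can move the DB to a new device without a raw copy, 
which may cover point 3; check what your version supports.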

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread Janne Johansson
On Thu, Sep 9, 2021 at 16:09, Michal Strnad  wrote:
> When the disk with DB died
> it will cause inaccessibility of all dependent OSDs (six or eight in our
> environment),
> How do you do it in your environment?

Have two ssds for 8 OSDs, so only half go away when one ssd dies.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io