> - IIUC, if a root SSD fails, there is pretty much no way to rebuild a
> new node with the same OSDs and avoid data shuffling - is this correct?

You can still rebuild the node and add the old OSDs back, avoiding any shuffling. You might need to set the NOOUT flag while you work on the configuration of the new node.
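The flag dance around a node rebuild might look roughly like the sketch below. This is only an illustration: the `run` wrapper just echoes each command so nothing here touches a cluster, and `ceph-volume lvm activate --all` assumes the OSDs were deployed with ceph-volume.

```shell
# Sketch of the rebuild flow: keep CRUSH from marking the node's OSDs out,
# rebuild the host, reattach the old OSD disks, then re-enable rebalancing.
# run() only echoes the command, so this is a dry run; drop it to execute.
run() { echo "+ $*"; }

run ceph osd set noout                 # suspend automatic out-marking
# ... replace the root SSD, reinstall the OS and Ceph packages ...
run ceph-volume lvm activate --all     # bring the node's existing OSDs back up
run ceph osd unset noout               # resume normal recovery behaviour
```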
> - If the hardware fails - I assume replacing the part and rebooting in
> time will bring back the node as is - is this right?

Sounds correct.
> - If the root drive fails, is there a way to bring up a new host with
> the same OSDs in the same order but with a different host name / IP
> address?

Should be possible, as each OSD authenticates with its own credentials, which do not depend on the IP address. The IP should, however, be in the same subnet as the cluster.
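As an aside, you can inspect those per-OSD credentials yourself; they are cephx keys, not tied to a host or IP. A small sketch (the OSD id `osd.0` and the keyring path are just the defaults, and `run` only echoes so this is a dry run):

```shell
# Each OSD has its own cephx key, independent of the node's hostname or IP.
# run() only echoes the command; remove it to actually execute.
run() { echo "+ $*"; }

run ceph auth get osd.0                         # the OSD's cephx key as the
                                                # monitors know it
run cat /var/lib/ceph/osd/ceph-0/keyring        # the same key stored on disk
                                                # next to the OSD's data
```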
> FWIW we are using Rook, so I am wondering if the crush map can be
> configured with some logical labels instead of host names for this
> purpose.

That should be possible.
> - Assuming we use a shared SSD with partitions for WAL/metadata for the
> whole node - if this drive fails, I assume we have to recover the entire
> node. Correct? I remember seeing a note that this pretty much renders
> all the relevant OSDs useless.

That's correct. If the DB/WAL device is lost, you have to recover every OSD whose DB was on it.
> Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the
> count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio;
> the mailing list has references to 1:3 or 1:6. I am trying to figure out
> what the right number is.

It depends. The recommendation is 1-4% of the OSD size for the DB, but it also depends on how many tiny objects you will have, since those mainly occupy RocksDB (the DB).

Sent from a Galaxy device
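To make that 1-4% guideline concrete, a quick back-of-the-envelope calculation (the 4 TB OSD size is just an assumed example, not from the thread):

```shell
# Rough DB partition sizing per the 1-4%-of-OSD-size guideline.
osd_size_gb=4000                        # assumed: one 4 TB data OSD
db_min_gb=$(( osd_size_gb * 1 / 100 ))  # lower bound of the guideline
db_max_gb=$(( osd_size_gb * 4 / 100 ))  # upper bound of the guideline
echo "DB partition per OSD: ${db_min_gb}-${db_max_gb} GB"
# → DB partition per OSD: 40-160 GB
```

So a single shared NVMe serving, say, six such OSDs would need on the order of 240-960 GB of DB space, which is why the "right" SSD:OSD ratio depends on drive sizes as much as on counts.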
-------- Original message --------
From: Subu Sankara Subramanian <subu.zs...@gmail.com>
Date: 12.11.21 18:41 (GMT+02:00)
To: ceph-users@ceph.io
Subject: [ceph-users] Handling node failures.

Folks,

New here - I tried searching for this topic in the archive and couldn't find any thread since 2018 or so, so I am starting a new one. I am looking at the impact of node failures. I found this doc:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure-
I have a few questions about this:

- IIUC, if a root SSD fails, there is pretty much no way to rebuild a new node with the same OSDs and avoid data shuffling - is this correct?
- If the hardware fails - I assume replacing the part and rebooting in time will bring back the node as is - is this right?
- If the root drive fails, is there a way to bring up a new host with the same OSDs in the same order but with a different host name / IP address? FWIW we are using Rook, so I am wondering if the crush map can be configured with some logical labels instead of host names for this purpose - is this possible? (I am evaluating if I can bring up a new node with the original host name itself - at least the cloud K8s clusters make this impossible.)
- Assuming we use a shared SSD with partitions for WAL/metadata for the whole node - if this drive fails, I assume we have to recover the entire node. Correct? I remember seeing a note that this pretty much renders all the relevant OSDs useless.

--
Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the mailing list has references to 1:3 or 1:6. I am trying to figure out what the right number is.

Thanks,
Subu
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io