> - IIUC, if a root SSD fails, there is pretty much no way to rebuild a
> new node with the same OSDs and avoid data shuffling - is this correct?

You can still rebuild the node and add the old OSDs back, avoiding any shuffling. You might need to set the NOOUT flag while you work on the configuration of the new node.
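The flag dance around a node rebuild might look roughly like the sketch below. This is only an illustration: the `run` wrapper just echoes each command so nothing here touches a cluster, and `ceph-volume lvm activate --all` assumes the OSDs were deployed with ceph-volume.

```shell
# Sketch of the rebuild flow: keep CRUSH from marking the node's OSDs out,
# rebuild the host, reattach the old OSD disks, then re-enable rebalancing.
# run() only echoes the command, so this is a dry run; drop it to execute.
run() { echo "+ $*"; }

run ceph osd set noout                 # suspend automatic out-marking
# ... replace the root SSD, reinstall the OS and Ceph packages ...
run ceph-volume lvm activate --all     # bring the node's existing OSDs back up
run ceph osd unset noout               # resume normal recovery behaviour
```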
> - If the hardware fails - I assume replacing the part and rebooting in
> time will bring back the node as is - is this right?

Sounds correct.
> - If the root drive fails, is there a way to bring up a new host with
> the same OSDs in the same order but with a different host name / IP
> address?

Should be possible, as each OSD authenticates with its own credentials, which do not depend on the IP address. The IP should, however, be in the same subnet as the cluster.
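As an aside, you can inspect those per-OSD credentials yourself; they are cephx keys, not tied to a host or IP. A small sketch (the OSD id `osd.0` and the keyring path are just the defaults, and `run` only echoes so this is a dry run):

```shell
# Each OSD has its own cephx key, independent of the node's hostname or IP.
# run() only echoes the command; remove it to actually execute.
run() { echo "+ $*"; }

run ceph auth get osd.0                         # the OSD's cephx key as the
                                                # monitors know it
run cat /var/lib/ceph/osd/ceph-0/keyring        # the same key stored on disk
                                                # next to the OSD's data
```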
> FWIW we are using Rook, so I am wondering if the crush map can be
> configured with some logical labels instead of host names for this
> purpose.

That should be possible.
> - Assuming we use a shared SSD with partitions for WAL/metadata for the
> whole node - if this drive fails, I assume we have to recover the entire
> node. Correct? I remember seeing a note that this pretty much renders
> all the relevant OSDs useless.

That's correct. If the DB/WAL device is lost, you have to recover every OSD whose DB was on it.
> Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the
> count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio;
> the mailing list has references to 1:3 or 1:6. I am trying to figure out
> what the right number is.

It depends. The recommendation is 1-4% of the OSD size for the DB, but it also depends on how many tiny objects you will have, since those mainly occupy RocksDB (the DB).

Sent from a Galaxy device
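To make that 1-4% guideline concrete, a quick back-of-the-envelope calculation (the 4 TB OSD size is just an assumed example, not from the thread):

```shell
# Rough DB partition sizing per the 1-4%-of-OSD-size guideline.
osd_size_gb=4000                        # assumed: one 4 TB data OSD
db_min_gb=$(( osd_size_gb * 1 / 100 ))  # lower bound of the guideline
db_max_gb=$(( osd_size_gb * 4 / 100 ))  # upper bound of the guideline
echo "DB partition per OSD: ${db_min_gb}-${db_max_gb} GB"
# → DB partition per OSD: 40-160 GB
```

So a single shared NVMe serving, say, six such OSDs would need on the order of 240-960 GB of DB space, which is why the "right" SSD:OSD ratio depends on drive sizes as much as on counts.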
-------- Original message --------
From: Subu Sankara Subramanian <subu.zs...@gmail.com>
Date: 12.11.21 18:41 (GMT+02:00)
To: ceph-users@ceph.io
Subject: [ceph-users] Handling node failures.

Folks,

New here - I tried searching for this topic in the archive and couldn't find any thread since 2018 or so, so I am starting a new one. I am looking at the impact of node failures. I found this doc:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/operations_guide/handling-a-node-failure-
I have a few questions about this:

- IIUC, if a root SSD fails, there is pretty much no way to rebuild a new node with the same OSDs and avoid data shuffling - is this correct?
- If the hardware fails - I assume replacing the part and rebooting in time will bring back the node as is - is this right?
- If the root drive fails, is there a way to bring up a new host with the same OSDs in the same order but with a different host name / IP address? FWIW we are using Rook, so I am wondering if the crush map can be configured with some logical labels instead of host names for this purpose - is this possible? (I am evaluating if I can bring up a new node with the original host name itself - at least the cloud K8s clusters make this impossible.)
- Assuming we use a shared SSD with partitions for WAL/metadata for the whole node - if this drive fails, I assume we have to recover the entire node. Correct? I remember seeing a note that this pretty much renders all the relevant OSDs useless.

--
Semi-related: What is the ideal ratio of SSDs for WAL/metadata to the count of OSDs? I remember seeing PDFs from Red Hat showing a 1:10 ratio; the mailing list has references to 1:3 or 1:6. I am trying to figure out what the right number is.

Thanks,
Subu
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io