Folks,

I have a conceptual question about how secondary OSDs are selected in Ceph.

I have a cluster here that consists of three nodes. For the sake of argument, 
I have simulated latency between the third node and the other two using tc and 
netem. I have set the primary affinity of the OSDs on the third node to 0, and 
indeed, RADOS is not using any of these OSDs as primary OSDs, so this part 
works as expected.
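For anyone who wants to reproduce a similar setup, here is a minimal Python sketch that only builds and prints the two kinds of commands involved: a tc/netem delay and a primary affinity of 0 for a set of OSDs. The interface name eth0, the 10 ms delay and the OSD ids 6-8 are hypothetical placeholders; the exact tc setup used in my test is not shown here. Note that a root netem qdisc delays all egress traffic on that interface, not only traffic towards nodes 1 and 2.

```python
import shlex


def netem_delay_cmd(dev: str, delay_ms: int) -> list[str]:
    """Build a tc command adding a fixed egress delay on `dev`.

    A root netem qdisc delays *all* traffic leaving that interface.
    """
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", f"{delay_ms}ms"]


def primary_affinity_cmds(osd_ids, weight: float = 0.0) -> list[list[str]]:
    """Build one `ceph osd primary-affinity` command per OSD id."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("primary affinity must be between 0.0 and 1.0")
    return [["ceph", "osd", "primary-affinity", f"osd.{i}", str(weight)] for i in osd_ids]


if __name__ == "__main__":
    # Hypothetical values: interface eth0 on node 3, 10 ms delay, OSDs 6-8 on node 3.
    print(shlex.join(netem_delay_cmd("eth0", 10)))
    for cmd in primary_affinity_cmds([6, 7, 8]):
        print(shlex.join(cmd))
```

The sketch prints the commands instead of executing them, so they can be reviewed before being run on the right hosts.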

Furthermore, my expectation for a pool with size=3 and min_size=2 is that for 
any given write, the primary OSD (on node 1 or node 2) will select one 
secondary OSD on the other of those two nodes and one on node 3. That would 
lead me to believe that any client writing into the cluster from node 1 only 
ever pays the latency between node 1 and node 2 as an actual performance 
penalty, because:

* the client selects a primary OSD on node 1 or node 2
* the primary OSD selects the secondary OSDs and starts both transfers in 
parallel
* the write to the lower-latency secondary OSD finishes much sooner than the 
one to node 3, and because min_size=2, the write acknowledgement is sent to 
the client at that point
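To make this expectation concrete, here is a toy latency model of the write path as described in the bullets above, written in Python. It is a sketch under simplifying assumptions, not Ceph's actual code path: the primary persists its own copy and sends the other copies in parallel, and the client gets its ack once `acks_needed` copies are done. Setting acks_needed=2 models my min_size=2 assumption; acks_needed=3 models a primary that waits for every replica, which is closer to what my measurements resemble. All RTT and disk figures are made-up illustrative values, not measurements from this cluster.

```python
def write_latency_ms(client_rtt_ms, replica_rtts_ms, acks_needed, disk_ms=0.0):
    """Client-visible latency of one write in a toy replication model.

    client_rtt_ms:   round trip between client and primary OSD
    replica_rtts_ms: round trip between primary and each secondary OSD
    acks_needed:     number of persisted copies (primary's own included)
                     required before the client is acknowledged
    disk_ms:         time for any OSD to persist its copy
    """
    # Completion time of each copy, measured from when the primary gets the write.
    completions = [disk_ms] + [rtt + disk_ms for rtt in replica_rtts_ms]
    if not 1 <= acks_needed <= len(completions):
        raise ValueError("acks_needed must be between 1 and the number of copies")
    completions.sort()
    # The ack leaves once the acks_needed-th fastest copy has completed.
    return client_rtt_ms + completions[acks_needed - 1]


if __name__ == "__main__":
    # Illustrative numbers only: client on node 1, primary on node 2 (1 ms RTT),
    # secondaries on node 1 (1 ms RTT) and node 3 (20 ms RTT), 2 ms to persist.
    for acks in (2, 3):
        print(acks, write_latency_ms(1, [1, 20], acks_needed=acks, disk_ms=2))
```

In this model, waiting for two copies keeps node 3 out of the critical path, while waiting for all three makes every write pay the node 3 round trip.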

But that does not appear to be the case. Setting the primary affinity has a 
very slight impact, but the overall performance when writing into the cluster 
with queue depth 1 and a request size of 4k still closely resembles a scenario 
in which every single write is penalized by the latency between node 1/2 and 
node 3.
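With queue depth 1, only one 4k write is in flight at a time, so IOPS is roughly the inverse of the per-write latency, and any extra round trip on every write shows up directly in the numbers. A back-of-the-envelope Python helper for that arithmetic, assuming client-side overhead is negligible; the latency values used below are hypothetical, not measured:

```python
def qd1_iops(latency_ms: float) -> float:
    """Approximate IOPS at queue depth 1: each write must finish before the next starts."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def throughput_kib_s(iops: float, block_kib: int = 4) -> float:
    """Throughput in KiB/s for a given IOPS rate and block size."""
    return iops * block_kib


if __name__ == "__main__":
    # Hypothetical per-write latencies: without vs. with the node 3 round trip.
    for lat in (4.0, 23.0):
        iops = qd1_iops(lat)
        print(f"{lat} ms -> {iops:.1f} IOPS, {throughput_kib_s(iops):.0f} KiB/s")
```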

Where is my understanding wrong? Or is there a configuration setting for this? 
I searched, but the only results I could find refer to primary affinity. I 
guess I am looking for something like "secondary affinity", but I do not think 
such a thing exists in Ceph, which makes me suspect that my understanding of 
this is seriously wrong somewhere.

Any hint will be greatly appreciated. Thank you very much in advance.

Best regards
Martin

-- 
Martin Gerhard Loschwitz
Geschäftsführer / CEO, True West IT Services GmbH
Phone: +49 2433 5253130
Mobile: +49 176 61832178
Address: Schmiedegasse 24a, 41836 Hückelhoven, Germany
Legal: HRB 21985, Amtsgericht Mönchengladbach
VAT: DE363893844

True West IT Services GmbH is compliant with the GDPR regulation on data 
protection and privacy in the European Union and the European Economic Area. 
You can request the information on how we collect and process your private data 
according to the law by contacting the email sender.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
