Hi all,

Thanks for the replies. The issue, as I see it, is how to send data from an OST to the client without crossing the inter-CPU link.

So, if I have:
cpu1 - IB card 1 (10.0.0.1), nvme1 (OST1)
cpu2 - IB card 2 (10.0.0.2), nvme2 (OST2)

Both IB cards are on the same subnet. Therefore, by default, packets will be routed out of the server over the preferred card, say IB card 1 (I could be wrong, but this is my current understanding, and it seems to be what the Lustre manual says).

Data coming in (being written to the OST) is not a problem. The client will know the IP address of the card to which the OST is closest. So, to write to OST2, it will use the 10.0.0.2 address (since this will be the IP address given in mkfs.lustre for that OST).

The slight complication here is pinning. The thread handling the write may run on cpu1, in which case the data has to traverse the inter-CPU link twice. However, I am assuming that this won't happen, i.e. that the kernel or Lustre is clever enough to place this thread on cpu2. As far as I am aware, this should just work, though please correct me if I'm wrong. Perhaps I have to specify pinning manually; if so, how does one do that with Lustre?
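
To make the locality question concrete, here is a minimal sketch of how one could check which CPUs sit on the same NUMA node as a given IB card. It assumes a Linux server whose sysfs exposes the usual `numa_node` and `cpulist` files; the device name `mlx5_0` is hypothetical, and the function names are mine. It only inspects topology and does not pin Lustre's kernel threads.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def local_cpus(ib_device, sysfs=Path("/sys")):
    """Return the CPUs on the NUMA node hosting ib_device, or None if unknown."""
    node_file = sysfs / "class" / "infiniband" / ib_device / "device" / "numa_node"
    try:
        node = int(node_file.read_text())
    except (OSError, ValueError):
        return None
    if node < 0:  # -1 means the kernel reports no NUMA affinity for the device
        return None
    cpulist = sysfs / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    try:
        return parse_cpulist(cpulist.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    # "mlx5_0" is a hypothetical device name; substitute the real one.
    print("CPUs local to mlx5_0:", local_cpus("mlx5_0"))
```

Running it once per card would show whether each card and its NVMe device really share a socket before worrying about where the threads land.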

Reading is more problematic. A request from a client (say 10.0.0.100) for data on OST2 will come in via card 2 (10.0.0.2). A thread on CPU2 (hopefully) will then read the data from OST2 and send it out to the client, 10.0.0.100. Here, however, Linux will route the packet through the first card on this subnet, so it will go over the inter-CPU link and out of IB card 1. This will be the case even if the thread is pinned on CPU2.
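
One way to test the routing assumption at the IP level is to ask the kernel which source address it would pick for a given destination. The sketch below does this with a connected UDP socket, which performs a route lookup without sending a packet. The function name and the port number are arbitrary choices of mine, and the result reflects only the kernel's IP routing decision; whether LNet follows the same choice is exactly the open question here.

```python
import socket


def source_address_for(dest_ip, port=9):
    """Return the local IP the kernel would use as the source for traffic to dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket only triggers a route lookup; nothing is sent.
        s.connect((dest_ip, port))
        return s.getsockname()[0]
```

On the server described above, calling `source_address_for("10.0.0.100")` would be expected to return 10.0.0.1 if the first card is indeed preferred. The call raises `OSError` if there is no route to the destination.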

The question then is whether there is a way to configure Lustre to use IB card 2 when sending out data from OST2.

Cheers,
Alastair.

On Wed, 10 Mar 2021, Ms. Megan Larko wrote:

Greetings Alastair,

Bonding is supported on InfiniBand, but I believe that it is only active/passive.
I think what you might be looking for WRT avoiding data travel through the inter-cpu link is cpu "affinity" AKA cpu "pinning".

Cheers,
megan

WRT = "with regards to"
AKA = "also known as"

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
