We are not using RoCE in production, but a few years ago we tested it with 25Gb/s cards. From what I recall, RDMA worked as expected with the MOFED stack: running an LNet benchmark over TCP consumed a few CPU cores, while the same benchmark over RoCE used almost no CPU.
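If you want to reproduce that kind of comparison yourself, an lnet_selftest run between two nodes is the usual way; the sketch below follows the standard lst workflow from the Lustre manual. The NIDs (192.168.1.1/192.168.1.2) and sizes are placeholders for your environment, not values from our notes:

```shell
# Sketch: LNet self-test between two nodes (example NIDs -- adjust
# addresses and the network name, o2ib vs tcp, for your setup).
modprobe lnet_selftest            # load on all nodes involved
export LST_SESSION=$$             # lst commands key off this variable
lst new_session read_write
lst add_group clients 192.168.1.1@o2ib   # use @tcp to test the TCP path
lst add_group servers 192.168.1.2@o2ib
lst add_batch bulk
lst add_test --batch bulk --from clients --to servers brw write size=1M
lst run bulk
lst stat clients servers          # watch bandwidth; compare CPU in top
# Ctrl-C the stat loop, then:
lst end_session
```

Running the same batch once with @tcp groups and once with @o2ib groups, while watching per-core CPU in top, shows the offload difference directly.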
The only change I had documented was this one, to use o2ib instead of tcp on the Ethernet interface:

[root@server ~]# cat /etc/modprobe.d/lustre.conf
options lnet networks=o2ib(ens2)

This presentation from CaRCC might be helpful:

"A practical summary of our experience standing up and living with an all-Broadcom RDMA over Converged Ethernet (RoCE) fabric in support of our most recent cluster acquisition at the University of Arizona. Presented by: Adam Michel - University of Arizona"
https://www.youtube.com/watch?v=G6xHirUtx7w

On Fri, Jun 16, 2023 at 11:59 AM Lana Deere via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
> We have been using Lustre on InfiniBand with o2ib(ib0) specified. If
> we had an Ethernet with RoCE enabled, would this work on that kind of
> network? If not, is there a different way to configure for RoCE which
> is recommended (using RDMA - I know we could switch to tcp)?
>
> .. Lana (lana.de...@gmail.com)
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org