I still have the guide from that system, and I saved some of the routing scripts and whatnot. But really, it wasn’t much more complicated than Ethernet routing.
The routing nodes, I guess obviously, had both Omnipath and InfiniBand interfaces. The compute nodes themselves, I believe, used a supervisord script, if I’m remembering that name right, to try to balance out which routing node each one would use as a gateway. There were two routing nodes as it was configured when I got to it, but a larger number was possible. It seems to me that there was probably a better way to do that, but it did work. The read/write rates were not as fast as on our fully InfiniBand clusters, but they were fast enough.

The cluster was Caliburn, which was in the TOP500 for some time, so there may be some papers and whatnot written on it before we inherited it. If there’s something specific you want to know, I could probably dig it up.

Sent from my iPhone

On Aug 21, 2023, at 14:48, Kidger, Daniel <daniel.kid...@hpe.com> wrote:

Ryan,

This sounds very interesting. Do you have more details or references on how they were connected together, and what any pain points were?

Daniel

From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> On Behalf Of Ryan Novosielski
Sent: 21 August 2023 19:07
To: gpfsug main discussion list <gpfsug-discuss@gpfsug.org>
Cc: gpfsug-disc...@spectrumscale.org
Subject: Re: [gpfsug-discuss] Joining RDMA over different networks?

If I understand what you’re asking correctly, we used to have a cluster that did this. GPFS was on InfiniBand, some of the compute nodes were too, and the rest were on Omnipath. There were routers in between with both types.

Sent from my iPhone

On Aug 21, 2023, at 13:55, Kidger, Daniel <daniel.kid...@hpe.com> wrote:

I know in the Lustre world that LNET routers are used to provide RDMA over heterogeneous networks.

Is there an equivalent for Storage Scale? E.g., if an ESS uses InfiniBand to connect directly to Cluster A, could that InfiniBand RDMA fabric be “routed” to Cluster B, which has RoCE connecting all its nodes together, and hence the filesystem mounted?

ps.
The same question would apply to other usually incompatible RDMA networks like Omnipath, Slingshot, Cornelis, …?

Daniel

Daniel Kidger
HPC Storage Solutions Architect, EMEA
daniel.kid...@hpe.com
+44 (0)7818 522266
hpe.com <http://www.hpe.com/>

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org