Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA

Grunenberg, Renar Fri, 10 Dec 2021 02:36:00 -0800

Hello Walter,
we have now gained a lot of experience moving the storage systems in our 
backup environment to RDMA over InfiniBand with HDR and EDR connections. 
Coming from a 16 Gbit FC infrastructure, we have increased our throughput from 
7 GB/s to 30 GB/s. The main reasons are the elimination of the driver layers 
on the client systems and the direct buffer-to-buffer communication that RDMA 
provides. The latency reduction is significant.
Regards, Renar.
We now use ESS3k and ESS5k systems at code level 6.1.1.2.

Renar Grunenberg
IT Department - Operations

HUK-COBURG
Bahnhofsplatz
96444 Coburg
Phone:          09561 96-44110
Fax:            09561 96-44104
Email: [email protected]
Web:            www.huk.de
________________________________
From: [email protected] 
<[email protected]> On Behalf Of Walter Sklenka
Sent: Friday, 10 December 2021 11:17
To: [email protected]
Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA

Hello Douglas!
May I ask a basic question regarding GPUDirect Storage, or locally attached 
storage such as NVMe disks in general? Do you think it outperforms “classical” 
shared storage systems that are attached via FC to NSD servers, which in turn 
are HDR attached?
With FC you also have bounce copies and more delay, don't you?
There are solutions around that work with local NVMe disks and build some 
protection level with RAID (or duplication). I am curious whether that would 
be a better approach than shared storage, which has its limitations 
(cost-intensive scale-out, extra infrastructure, max 64 Gb at this time …)

Best regards
Walter

From: [email protected] 
<[email protected]> On Behalf Of Douglas O'Flaherty
Sent: Friday, 10 December 2021 05:24
To: [email protected]
Subject: Re: [gpfsug-discuss] WAS: alternative path; Now: RDMA

Jonathan:

You posed a reasonable question, which was "when is RDMA worth the hassle?" I 
agree with part of your premise, which is that it only matters when the 
bottleneck isn't somewhere else. With a parallel file system like Scale/GPFS, 
the absolute performance bottleneck is not the throughput of a single drive. In 
the majority of Scale/GPFS clusters, the network data path is the performance 
limitation. Once a site deploys HDR or 100/200/400 Gbps Ethernet, the buffer 
copy time inside the server starts to matter.
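
A rough back-of-envelope sketch of the bottleneck argument above, in Python. It only compares the aggregate streaming rate of the drives behind a server with the nominal line rate of its network links. The drive counts and per-drive rates in the usage lines are hypothetical, the link rates are nominal figures with encoding and protocol overhead ignored, and the function names are made up for illustration.

```python
# Nominal line rates in Gbit/s; encoding and protocol overhead are ignored.
LINK_GBIT = {"16G FC": 16, "EDR IB": 100, "HDR IB": 200}


def link_gbytes(name, links=1):
    """Aggregate nominal link rate in GB/s for `links` parallel links."""
    return LINK_GBIT[name] * links / 8


def bottleneck(drives, drive_gbytes, link, links=1):
    """Return (limiting component, rate in GB/s) for one storage server.

    drives       -- number of drives striped behind the server (assumed)
    drive_gbytes -- sustained GB/s of a single drive (assumed)
    """
    disk = drives * drive_gbytes
    net = link_gbytes(link, links)
    return ("network", net) if net < disk else ("disks", disk)


# Hypothetical configurations, not measurements from this thread:
print(bottleneck(400, 0.25, "EDR IB", links=2))  # many drives, two EDR links
print(bottleneck(24, 0.25, "HDR IB", links=4))   # few drives, four HDR links
```

With enough drives striped together the network becomes the limit, which is the case where the per-byte cost of the data path inside the server becomes visible.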

When the device is an accelerator, like a GPU, the benefit of RDMA (GDS) is 
easily demonstrated because it eliminates the bounce copy through system 
memory. In our NVIDIA DGX A100 server testing, we were able to get around 2x 
the per-system throughput by using RDMA direct to GPU (GPUDirect Storage). 
(Tested on 2 DGX systems with 4x HDR links per storage node.)
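
To make the bounce-copy point concrete, here is a deliberately simple toy model in Python. It assumes every byte passes through each stage one after another with no overlap, which is pessimistic compared with a real pipelined data path, and both rates are hypothetical. It is not a model of the DGX test described above; it only shows why removing one stage from the path can raise end-to-end throughput.

```python
def serialized_throughput(*stage_gbytes_per_s):
    """End-to-end GB/s when each byte traverses every stage in turn.

    With no overlap between stages, the time per byte is the sum of the
    per-stage times, so the rates combine like resistors in parallel.
    """
    return 1.0 / sum(1.0 / rate for rate in stage_gbytes_per_s)


nic = 25.0          # hypothetical: one 200 Gbit/s link, nominal rate / 8
bounce_copy = 25.0  # hypothetical: copy from system memory to GPU memory

with_bounce = serialized_throughput(nic, bounce_copy)  # NIC -> host RAM -> GPU
direct = serialized_throughput(nic)                    # NIC -> GPU

print(f"with bounce copy: {with_bounce:.1f} GB/s")
print(f"direct to GPU:    {direct:.1f} GB/s")
print(f"ratio:            {direct / with_bounce:.1f}x")
```

In this model the gain depends entirely on how fast the extra copy is relative to the link: a copy that is much faster than the network costs little, while one at a similar rate costs a lot.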

However, your question remains. Synthetic benchmarks are good indicators of 
technical benefit, but do your users and applications need that extra 
performance?

There are probably only a handful of codes in any organization that need this. 
However, they are high-value use cases. We have client applications that either 
read a lot of data semi-randomly and uncached (think mini-epochs for scaling 
ML training) or demand the lowest response time, like production inference for 
voice recognition and NLP.

If anyone has use cases for GPU-accelerated codes with truly demanding data 
needs, please reach out directly. We are looking for more use cases to 
characterize the benefit for a new paper. If you can provide some code examples, 
we can help test whether RDMA direct to GPU (GPUDirect Storage) is a benefit.

Thanks,

doug

Douglas O'Flaherty
[email protected]

----- Message from Jonathan Buzzard <[email protected]> on Fri, 
10 Dec 2021 00:27:23 +0000 -----
To: [email protected]
Subject: Re: [gpfsug-discuss]

On 09/12/2021 16:04, Douglas O'flaherty wrote:
>
> Though not directly about your design, our work with NVIDIA on GPUdirect
> Storage and SuperPOD has shown how sensitive RDMA (IB & RoCE) to both
> MOFED and Firmware version compatibility can be.
>
> I would suggest anyone debugging RDMA issues should look at those closely.
>
May I ask what are the alleged benefits of using RDMA in GPFS?

I can see there would be lower latency over a plain IP Ethernet or IPoIB
solution but surely disk latency is going to swamp that?

I guess SSD drives might change that calculation but I have never seen
proper benchmarks comparing the two, or even better yet all four
connection options.

Just seems a lot of complexity and fragility for very little gain to me.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
