Hi.

Are you using the latest released FW for this device?

thanks
Dotan

Marcel Heinz wrote:
Hi,

I have ported an application to use InfiniBand multicast directly via
libibverbs and have discovered very low multicast throughput, only
~250 MByte/s, although we are using 4x DDR components. To rule out any
effects of the application, I've created a small benchmark (well, it's
only a hack). It just tries to keep the send/receive queues filled with
work requests and polls the CQ in an endless loop. In server mode, it
joins/creates the multicast group as a FullMember, attaches the QP to the
group and receives any packets. The client joins as a SendOnlyNonMember
and sends datagrams of full MTU size to the group.
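
For reference, the receiver-side setup is roughly the following (a
simplified sketch, not the actual benchmark code; error cleanup is
omitted, the SA join as FullMember happens separately, and the group
GID/LID, Q_Key and port number below are placeholders for the values
returned by that join):

/* Sketch of the receiver-side multicast setup: create a UD QP and
 * attach it to the multicast group. Posting receive WRs and polling
 * the CQ are not shown here. */
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

int attach_mcast_recv_qp(struct ibv_context *ctx,
                         const union ibv_gid *mcast_gid, uint16_t mcast_lid)
{
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    struct ibv_cq *cq = ibv_create_cq(ctx, 512, NULL, NULL, 0);
    if (!pd || !cq)
        return -1;

    struct ibv_qp_init_attr init_attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = { .max_send_wr = 1, .max_recv_wr = 512,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_UD,
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &init_attr);
    if (!qp)
        return -1;

    /* INIT -> RTR -> RTS; the Q_Key must match the group's Q_Key */
    struct ibv_qp_attr attr = {
        .qp_state   = IBV_QPS_INIT,
        .pkey_index = 0,
        .port_num   = 1,            /* placeholder port number */
        .qkey       = 0x11111111,   /* placeholder Q_Key */
    };
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                                 IBV_QP_PORT | IBV_QP_QKEY))
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.qp_state = IBV_QPS_RTR;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE))
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.qp_state = IBV_QPS_RTS;
    attr.sq_psn   = 0;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_SQ_PSN))
        return -1;

    /* The verbs-level part of "attach the QP to the group": after this,
     * packets sent to the group are delivered to this QP's receive queue. */
    if (ibv_attach_mcast(qp, mcast_gid, mcast_lid)) {
        perror("ibv_attach_mcast");
        return -1;
    }
    return 0;
}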

The test setup is as follows:

Host A <---> Switch <---> Host B

We use Mellanox InfiniHost III Lx HCAs (MT25204) and a Flextronics
F-X430046 24-Port Switch, OFED 1.3 and a "vanilla" 2.6.23.9 Linux kernel.

The results are:

Host A          Host B          Throughput (MByte/sec)
client          server          262
client          2xserver        146
client+server   server          944
client+server   ---             946

As a reference: unicast ib_send_bw (in UD mode): 1146

I don't see any reason why it should become _faster_ when I additionally
start a server on the same host as the client. OTOH, the 944 MByte/s
sounds relatively sane when compared to the unicast performance, given
the additional overhead of having to copy the data locally.

These ~260 MByte/s are suspiciously close to the effective throughput of
a 1x SDR link (2.5 GBit/s signalling, 2 GBit/s of data after 8b/10b
encoding, i.e. ~250 MByte/s). However, the created group is rate 6
(20 GBit/s), and the /sys/class/infiniband/mthca0/ports/1/rate file
showed 20 Gb/sec during the whole test.
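
The active link parameters can also be cross-checked through the verbs
interface itself; a minimal sketch (the choice of the first device and
port 1 are assumptions for this example):

/* Cross-check the sysfs rate by querying the active link width/speed
 * via libibverbs. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0])
        return 1;

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_port_attr pattr;
    if (!ctx || ibv_query_port(ctx, 1, &pattr))
        return 1;

    /* active_width: 1=1x, 2=4x, 4=8x, 8=12x
     * active_speed: 1=2.5 Gbps (SDR), 2=5.0 Gbps (DDR), 4=10.0 Gbps (QDR) */
    printf("active_width=%u active_speed=%u\n",
           pattr.active_width, pattr.active_speed);

    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}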

The error counters of all ports show nothing abnormal. Only the
RcvSwRelayErrors counter of the switch port facing the host running the
client is increasing very fast, but this seems to be normal for
multicast packets, as the switch does not relay these packets back to
the source port.

We were also able to test on another cluster with 6 nodes (also with
MT25204 HCAs; I don't know the OFED version or the switch type) and got
the following results:

Host1   Host2   Host3   Host4   Host5   Host6   Throughput (MByte/s)
1s      1s                              1c      255.15
1s      1s      1s                      1c      255.22
1s      1s      1s      1s              1c      255.22
1s      1s      1s      1s      1s      1c      255.22

1s1c    1s      1s                              738.64
1s1c    1s      1s      1s                      695.08
1s1c    1s      1s      1s      1s              565.14
1s1c    1s      1s      1s      1s      1s      451.90

(1s = one server process, 1c = one client process, 1s1c = both on the
same host)

As long as no host runs both a server and the client, it at least
behaves like multicast: the throughput is independent of the number of
receivers. But when the client host also runs a server, performance
decreases as the number of additional servers increases, which is
totally surprising to me.

Another test I did was to run an ib_send_bw (UD) benchmark while the
multicast benchmark was running between A and B. I got ~260 MByte/s for
the multicast and also ~260 MByte/s for ib_send_bw.

Does anyone have an idea of what is going on here, or a hint as to what I should check?

Regards,
Marcel
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
