Re: tune ib stack

2013-05-14 Thread Vasiliy Tolstov
Sorry for bumping an old thread; I solved my problems with new firmware.
I have Supermicro servers that ship rebranded Mellanox firmware (recompiled
with some bits changed).
Now everything works fine: I have 40 Gb/s QDR instead of 10 Gb/s.

2013/4/9 Sebastian Riemer sebastian.rie...@profitbricks.com:
> On 09.04.2013 16:23, Hal Rosenstock wrote:
>>> So these values are exactly the same as in ibv_devinfo and can be set
>>> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>>>
>>> I've found the PortInfo with the command
>>> smpquery portinfo -C mlx4_0 3 1
>>> where I'm using the first HCA to contact the SM. I tell the SM the
>>> destination LID ('3' here in my case) and the destination port ('1').
>>>
>>> Is there another method to set the max MTU?
>>
>> That doesn't set max MTU (MTUCap) but merely reads it (for that port).
>
> Sorry, copy and paste error. I've meant the mlx4 file:
> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> But you've answered that by vendor specific. Thanks for the valuable
> information!
>
> For us most interesting would be if the MTU can be changed live without
> any service disruption. Looks like the mlx4 driver can't provide that.
> Perhaps switches can do that.
>
> Cheers,
> Sebastian




-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tune ib stack

2013-05-14 Thread Sebastian Riemer
On 14.05.2013 12:02, Vasiliy Tolstov wrote:
> Sorry for bumping old thread, i'm solve my problems with new firmware.
> I have supermicro servers that rebrand mellanox firmware (recompile
> and change some bits)
> Now all works fine i have 40 gb/s QDR instead of 10 Gb/s

Thanks, sharing lessons learned is never wrong, especially as
there aren't many IB specialists in the world.


Re: tune ib stack

2013-04-09 Thread Vasiliy Tolstov
2013/4/9 Sebastian Riemer sebastian.rie...@profitbricks.com:
> Because 2048 is the default and 4096 is the max. supported MTU by the
> hardware.
>
>> How can i set active mtu?
>
> Something like this:
> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

After doing this, all SRP connections go down and the port is down. I need to
restart openibd.
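
Since writing this file took the port down in the report above, it may be worth validating the value and reading back the current setting before touching it. A minimal sketch in Python, assuming the mlx4 sysfs attribute holds the MTU as a plain byte count; the helper names are hypothetical, and the path is a parameter so the logic can be tried on an ordinary file first:

```python
from pathlib import Path

# MTU sizes defined by InfiniBand; assumed to be the values the attribute accepts.
VALID_IB_MTUS = (256, 512, 1024, 2048, 4096)


def read_port_mtu(path):
    """Return the MTU (in bytes) currently stored in the given file."""
    return int(Path(path).read_text().strip())


def write_port_mtu(path, mtu):
    """Write mtu to the file, refusing values that are not IB MTU sizes.

    On real hardware this is disruptive: as reported in this thread,
    the port goes down and SRP connections drop.
    """
    if mtu not in VALID_IB_MTUS:
        raise ValueError(f"{mtu} is not a valid InfiniBand MTU")
    Path(path).write_text(f"{mtu}\n")
```

On a real host the path would be /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu, and the write needs root.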

06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies Device 0017
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 42
    Region 0: Memory at df90 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at de00 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Not readable
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [148 v1] Device Serial Number 00-25-90-ff-ff-17-9b-24
    Capabilities: [18c v1] #19
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core
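
The LnkSta line in the lspci output above reports 8GT/s at width x8, so the PCIe slot itself is unlikely to be what holds the port at 10 Gb/s. A rough back-of-the-envelope check in Python, assuming PCIe Gen3 128b/130b line encoding and ignoring protocol overhead (the function name is illustrative only):

```python
def pcie_gen3_gbps(lanes, gt_per_s=8.0):
    """Approximate usable line rate in Gb/s for a PCIe Gen3 link.

    Gen3 uses 128b/130b encoding, so 128 of every 130 transferred bits
    carry payload. Packet and protocol overhead is not modelled.
    """
    return gt_per_s * lanes * 128 / 130


slot_gbps = pcie_gen3_gbps(lanes=8)
print(f"PCIe x8 @ 8GT/s: about {slot_gbps:.1f} Gb/s")
print("Enough for a 40 Gb/s QDR port:", slot_gbps > 40)
```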


> Could be a bug. Which OFED/Kernel (if using in-tree IB modules) do you use?
> Mine says with ConnectX2 QDR: 40 Gb/sec (4X QDR)

I'm using a stock 3.8.6 kernel with Xen patches on top, and I use the
modules provided with the kernel (only ib_srp comes from Bart's GitHub
repo).

> You should see 40 Gb/sec (4X QDR) here. Perhaps the OFED is too old so
> that FDR and ConnectX 3 aren't supported, yet. 10 Gb/sec (4X) seems to
> be the default case if a rate isn't supported.

Yes, on an older card with ConnectX I see this, but with ConnectX-3 I get only 10 Gb.
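
For reference, the two figures discussed here correspond to a 4X link at different per-lane speeds. A small Python sketch using the standard InfiniBand per-lane signaling rates; the table and function name are added for illustration. That 10 Gb/sec (4X) equals the 4X SDR rate is consistent with the fallback guess above, but does not prove it:

```python
# Per-lane signaling rates in Gb/s for the InfiniBand speeds named in this thread.
LANE_GBPS = {
    "SDR": 2.5,
    "DDR": 5.0,
    "QDR": 10.0,
    "FDR10": 10.3125,
    "FDR": 14.0625,
}


def link_rate_gbps(speed, width=4):
    """Signaling rate of a link: per-lane rate times the lane count."""
    return LANE_GBPS[speed] * width


for speed in LANE_GBPS:
    print(f"4X {speed}: {link_rate_gbps(speed)} Gb/s")
```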

--
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru


Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 13:51, Vasiliy Tolstov wrote:
>> Something like this:
>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> After doing this all srp connections down and port is down. I need to
> restart openibd

Sorry for that! It's much easier to set the IP MTU. Managed switches
support setting the RDMA MTU. So it could be possible that it is a
setting in the SM config. But I'm not sure.

$ man opensm
says that it can be set in the partitions.conf

>> You should see 40 Gb/sec (4X QDR) here. Perhaps the OFED is too old so
>> that FDR and ConnectX 3 aren't supported, yet. 10 Gb/sec (4X) seems to
>> be the default case if a rate isn't supported.
>
> Yes, in older card with ConnecX i see this, but in case of ConnectX-3 only 10 Gb

The kernel version is okay. It depends on the user space.
There is a support note in OFED 3.5:
- ConnectX-3 (fw-ConnectX3 Rev 2.11.0500) (FDR and FDR10 Modes are
Supported)

Before OFED 3.5 these HCAs aren't supported. A look at the related
source code could be worth a try.

Cheers,
Sebastian


Re: tune ib stack

2013-04-09 Thread Hal Rosenstock
On 4/9/2013 8:15 AM, Sebastian Riemer wrote:
> On 09.04.2013 13:51, Vasiliy Tolstov wrote:
>>> Something like this:
>>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>>
>> After doing this all srp connections down and port is down. I need to
>> restart openibd
>
> Sorry for that! It's much easier to set the IP MTU. Managed switches
> support setting the RDMA MTU. So it could be possible that it is a
> setting in the SM config. But I'm not sure.

IP MTU is different than link MTU. For UD mode, it's link MTU - 4. For
RC (connected) mode, this can be a much larger number than the link MTU
as the HCA does the segmentation/reassembly down to the path MTU.
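
The UD relationship can be written out as simple arithmetic. A minimal Python sketch; the names are illustrative, and the connected-mode ceiling of 65520 bytes is the limit commonly seen with Linux IPoIB, added here as an outside detail rather than something stated in this thread:

```python
IPOIB_HEADER_BYTES = 4    # IPoIB encapsulation header ("link MTU - 4")
IPOIB_CM_MAX_MTU = 65520  # assumed Linux connected-mode limit, not from this thread


def ipoib_ud_mtu(link_mtu):
    """IP MTU usable in UD (datagram) mode for a given IB link MTU."""
    return link_mtu - IPOIB_HEADER_BYTES


for link_mtu in (2048, 4096):
    print(f"link MTU {link_mtu} -> IPoIB UD MTU {ipoib_ud_mtu(link_mtu)}")
```

In RC mode the IP MTU is not tied to the link MTU in this way, because the HCA segments and reassembles messages down to the path MTU.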

> $ man opensm
> says that it can be set in the partitions.conf

Yes, MTU for the IPoIB interface is set in the partition file. This
would need configuring for the larger (4K) MTU assuming all ports
support the 4K MTU. If not, some ports won't be able to join the IPoIB
broadcast (or other) IB multicast groups and IPoIB won't work.
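
As a sketch of what that partition entry might look like, the Python helper below builds one line and checks the "all ports support 4K" precondition first. The line layout follows my reading of the partition file format in `man opensm` (the `mtu=` flag takes the IB MTU code, not a byte count) and should be verified against your OpenSM version; the function names and the list of port capabilities are hypothetical:

```python
# IB MTU codes: the number in parentheses in ibv_devinfo output, e.g. "4096 (5)".
IB_MTU_CODE = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}


def all_ports_support(mtu_bytes, port_mtu_caps):
    """True if every port's MTUCap (in bytes) is at least mtu_bytes."""
    return all(cap >= mtu_bytes for cap in port_mtu_caps)


def partition_line(name, pkey, mtu_bytes, members="ALL=full"):
    """Build one partition-file entry with an IPoIB group at the given MTU."""
    code = IB_MTU_CODE[mtu_bytes]
    return f"{name}={pkey:#06x}, ipoib, mtu={code} : {members};"


caps = [4096, 4096, 4096]  # hypothetical MTUCap values gathered from the fabric
if all_ports_support(4096, caps):
    print(partition_line("Default", 0x7FFF, 4096))
```

If any port reported only 2048, the check would fail and the 4K group should not be configured, for the reason given above.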

-- Hal


Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 14:49, Hal Rosenstock wrote:
> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>> Hello. I have some servers, with mellanox ConnectX-3 and have some questions:
>> Why max_mtu differs with active_mtu?
>
> What does peer port say for max MTU ?
>
>> How can i set active mtu?
>
> SM sets active MTU to min of peer ports max MTUs.

So by peer port max MTU, do you mean this file?

/sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

I've seen that it can be set as well. I've got two ConnectX-2 machines
connected back-to-back. In general these have 4K max and active.

So let's try something:

Host1:
$ echo 2048 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
# Port is not active, let's reactivate it.
$ echo 1 > /sys/class/infiniband/mlx4_0/device/enable

ibv_devinfo Host1:
max_mtu:    2048 (4)
active_mtu: 2048 (4)

Host2:
max_mtu:    4096 (5)
active_mtu: 2048 (4)

Both had 4096 (5) everywhere before.
So is that the recommended way to reduce the MTU?
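
The numbers above agree with the rule quoted earlier, that the SM sets the active MTU to the minimum of the two peer ports' max MTUs. A minimal Python sketch of that rule, printing values in the same "bytes (code)" style as ibv_devinfo; the function names are illustrative:

```python
# IB MTU codes as shown in parentheses by ibv_devinfo.
IB_MTU_CODE = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}


def active_mtu(max_mtu_a, max_mtu_b):
    """Active MTU of a link: the smaller of the two peers' max MTUs."""
    return min(max_mtu_a, max_mtu_b)


def fmt_mtu(mtu_bytes):
    """Format an MTU the way ibv_devinfo does, e.g. '2048 (4)'."""
    return f"{mtu_bytes} ({IB_MTU_CODE[mtu_bytes]})"


# Host1 max_mtu lowered to 2048, Host2 still at 4096:
print("active_mtu:", fmt_mtu(active_mtu(2048, 4096)))
```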

I've heard that reducing the MTU in a fabric can help fight
congestion. As congestion control doesn't work yet, could this
help?

Cheers,
Sebastian


Re: tune ib stack

2013-04-09 Thread Hal Rosenstock
On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
> On 09.04.2013 14:49, Hal Rosenstock wrote:
>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>> Hello. I have some servers, with mellanox ConnectX-3 and have some
>>> questions:
>>> Why max_mtu differs with active_mtu?
>>
>> What does peer port say for max MTU ?
>>
>>> How can i set active mtu?
>>
>> SM sets active MTU to min of peer ports max MTUs.
>
> So with peer port max MTU do you mean this file?:
>
> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

I meant NeighborMTU from PortInfo as the active MTU; MTUCap there is the
supported MTU.
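
To pull just those two fields out of a PortInfo dump, something like the Python sketch below could be used. It assumes the "Name:....value" line layout that smpquery typically prints; the sample text is made up for illustration and is not output from the hosts in this thread:

```python
import re

# Matches lines such as "MtuCap:..........................4096".
_FIELD_RE = re.compile(r"\s*(NeighborMTU|MtuCap):\.*\s*(\d+)", re.IGNORECASE)


def parse_portinfo_mtus(text):
    """Return {'neighbormtu': ..., 'mtucap': ...} (bytes) found in the text."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group(1).lower()] = int(match.group(2))
    return fields


# Illustrative sample only; real output contains many more fields.
SAMPLE = """\
Lid:.............................3
NeighborMTU:.....................2048
MtuCap:..........................4096
"""

print(parse_portinfo_mtus(SAMPLE))
```

On a live system the text would come from a command such as `smpquery portinfo -C mlx4_0 3 1`, captured via `subprocess.run`.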

-- Hal


Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 15:34, Hal Rosenstock wrote:
> On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
>> On 09.04.2013 14:49, Hal Rosenstock wrote:
>>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>>> Hello. I have some servers, with mellanox ConnectX-3 and have some
>>>> questions:
>>>> Why max_mtu differs with active_mtu?
>>>
>>> What does peer port say for max MTU ?
>>>
>>>> How can i set active mtu?
>>>
>>> SM sets active MTU to min of peer ports max MTUs.
>>
>> So with peer port max MTU do you mean this file?:
>>
>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> I meant NeighborMTU from PortInfo as active MTU and MTUCap there is
> supported MTU.

So these values are exactly the same as in ibv_devinfo and can be set
in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.

I've found the PortInfo with the command
smpquery portinfo -C mlx4_0 3 1
where I'm using the first HCA to contact the SM. I tell the SM the
destination LID ('3' here in my case) and the destination port ('1').

Is there another method to set the max MTU?

I know that switches can also set the max MTU for their switch ports,
and most of them use 2048 as the default.
How can these switch port MTUs be changed on unmanaged switches?

On managed switches this can be done over the web front-end.

Cheers,
Sebastian


Re: tune ib stack

2013-04-09 Thread Hal Rosenstock
On 4/9/2013 9:56 AM, Sebastian Riemer wrote:
> On 09.04.2013 15:34, Hal Rosenstock wrote:
>> On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
>>> On 09.04.2013 14:49, Hal Rosenstock wrote:
>>>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>>>> Hello. I have some servers, with mellanox ConnectX-3 and have some
>>>>> questions:
>>>>> Why max_mtu differs with active_mtu?
>>>>
>>>> What does peer port say for max MTU ?
>>>>
>>>>> How can i set active mtu?
>>>>
>>>> SM sets active MTU to min of peer ports max MTUs.
>>>
>>> So with peer port max MTU do you mean this file?:
>>>
>>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>>
>> I meant NeighborMTU from PortInfo as active MTU and MTUCap there is
>> supported MTU.
>
> So these values are exactly the same as in ibv_devinfo and can be set
> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>
> I've found the PortInfo with the command
> smpquery portinfo -C mlx4_0 3 1
> where I'm using the first HCA to contact the SM. I tell the SM the
> destination LID ('3' here in my case) and the destination port ('1').
>
> Is there another method to set the max MTU?

That doesn't set max MTU (MTUCap) but merely reads it (for that port).

> I know that switches can also set the max MTU for their switch ports
> where most of them use 2048 as default.

You would need to contact your CA and/or switch vendor(s) (see below).

> How to change these switch port MTUs for unmanaged switches?
>
> On managed switches this can be done over the web front-end.

Yes. MTUCap is read-only as far as the SM is concerned, so it can only be
changed through out-of-band, vendor-specific mechanisms such as a web front end.

-- Hal

> Cheers,
> Sebastian



Re: tune ib stack

2013-04-09 Thread Sebastian Riemer
On 09.04.2013 16:23, Hal Rosenstock wrote:
>> So these values are exactly the same as in ibv_devinfo and can be set
>> in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>>
>> I've found the PortInfo with the command
>> smpquery portinfo -C mlx4_0 3 1
>> where I'm using the first HCA to contact the SM. I tell the SM the
>> destination LID ('3' here in my case) and the destination port ('1').
>>
>> Is there another method to set the max MTU?
>
> That doesn't set max MTU (MTUCap) but merely reads it (for that port).

Sorry, copy and paste error. I meant the mlx4 file:
/sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

But you've already answered that: it is vendor specific. Thanks for the
valuable information!

For us the most interesting question is whether the MTU can be changed live
without any service disruption. It looks like the mlx4 driver can't provide
that. Perhaps switches can.

Cheers,
Sebastian
