Re: tune ib stack
Sorry for bumping an old thread. I solved my problems with new firmware. I have Supermicro servers that ship rebranded Mellanox firmware (recompiled, with some bits changed). Now everything works fine: I have 40 Gb/s QDR instead of 10 Gb/s.

2013/4/9 Sebastian Riemer sebastian.rie...@profitbricks.com:
> On 09.04.2013 16:23, Hal Rosenstock wrote:
>>> So these values are exactly the same as in ibv_devinfo and can be set in
>>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>>>
>>> I've found the PortInfo with the command
>>> smpquery portinfo -C mlx4_0 3 1
>>> where I'm using the first HCA to contact the SM. I tell the SM the destination LID ('3' here in my case) and the destination port ('1').
>>>
>>> Is there another method to set the max MTU?
>>
>> That doesn't set max MTU (MTUCap) but merely reads it (for that port).
>
> Sorry, copy and paste error. I meant the mlx4 file:
> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> But you've answered that with "vendor specific". Thanks for the valuable information!
>
> For us, the most interesting question is whether the MTU can be changed live without any service disruption. It looks like the mlx4 driver can't provide that. Perhaps switches can.
>
> Cheers,
> Sebastian

--
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tune ib stack
On 14.05.2013 12:02, Vasiliy Tolstov wrote:
> Sorry for bumping an old thread. I solved my problems with new firmware. I have Supermicro servers that ship rebranded Mellanox firmware (recompiled, with some bits changed). Now everything works fine: I have 40 Gb/s QDR instead of 10 Gb/s.

Thanks, sharing lessons learned is never wrong, especially as there aren't many IB specialists in the world.
Re: tune ib stack
2013/4/9 Sebastian Riemer sebastian.rie...@profitbricks.com:
> Because 2048 is the default and 4096 is the max. MTU supported by the hardware.
>
>> How can I set the active MTU?
>
> Something like this:
> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

After doing this, all SRP connections go down and the port is down. I need to restart openibd.

06:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies Device 0017
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 42
    Region 0: Memory at df90 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at de00 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Not readable
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [148 v1] Device Serial Number 00-25-90-ff-ff-17-9b-24
    Capabilities: [18c v1] #19
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core

> Could be a bug. Which OFED/kernel (if using in-tree IB modules) do you use?
> Mine says with ConnectX2 QDR: 40 Gb/sec (4X QDR)

I'm using a stock 3.8.6 kernel with Xen patches on top, and the modules provided with the kernel (only ib_srp comes from Bart's GitHub repo).

> You should see 40 Gb/sec (4X QDR) here. Perhaps the OFED is too old, so that FDR and ConnectX-3 aren't supported yet. 10 Gb/sec (4X) seems to be the default case if a rate isn't supported.

Yes, on an older card with ConnectX I see this, but with ConnectX-3 only 10 Gb.

--
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru
jabber: v...@selfip.ru
Re: tune ib stack
On 09.04.2013 13:51, Vasiliy Tolstov wrote:
>> Something like this:
>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> After doing this, all SRP connections go down and the port is down. I need to restart openibd.

Sorry for that! It's much easier to set the IP MTU. Managed switches support setting the RDMA MTU, so it could be a setting in the SM config, but I'm not sure.

$ man opensm

says that it can be set in partitions.conf.

>> You should see 40 Gb/sec (4X QDR) here. Perhaps the OFED is too old, so that FDR and ConnectX-3 aren't supported yet. 10 Gb/sec (4X) seems to be the default case if a rate isn't supported.
>
> Yes, on an older card with ConnectX I see this, but with ConnectX-3 only 10 Gb.

The kernel version is okay. It depends on the user space. There is a support note in OFED 3.5:

- ConnectX-3 (fw-ConnectX3 Rev 2.11.0500) (FDR and FDR10 Modes are Supported)

Before OFED 3.5 these HCAs aren't supported. A look at the related source code could be worth a try.

Cheers,
Sebastian
Re: tune ib stack
On 4/9/2013 8:15 AM, Sebastian Riemer wrote:
> On 09.04.2013 13:51, Vasiliy Tolstov wrote:
>>> Something like this:
>>> echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>>
>> After doing this, all SRP connections go down and the port is down. I need to restart openibd.
>
> Sorry for that! It's much easier to set the IP MTU. Managed switches support setting the RDMA MTU, so it could be a setting in the SM config, but I'm not sure.

IP MTU is different from link MTU. For UD mode, it's link MTU - 4. For RC (connected) mode, it can be much larger than the link MTU, as the HCA does the segmentation/reassembly down to the path MTU.

> $ man opensm
>
> says that it can be set in partitions.conf.

Yes, the MTU for the IPoIB interface is set in the partition file. It would need to be configured for the larger (4K) MTU, assuming all ports support the 4K MTU. If not, some ports won't be able to join the IPoIB broadcast (or other) IB multicast groups and IPoIB won't work.

-- Hal
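The two rules in the reply above (UD-mode IP MTU is link MTU minus 4, and ports that cannot do the group's MTU cannot join) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the host names in the example are hypothetical, and nothing here talks to real hardware or to the SM.

```python
# Per the message above: in UD mode the IPoIB IP MTU is the link MTU minus 4 bytes.
IPOIB_UD_OVERHEAD = 4


def ipoib_ud_mtu(link_mtu):
    """IP MTU of an IPoIB interface in UD (datagram) mode for a given link MTU."""
    return link_mtu - IPOIB_UD_OVERHEAD


def ports_unable_to_join(group_mtu, port_mtu_caps):
    """Names of ports whose supported MTU is below the multicast group's MTU.

    port_mtu_caps maps a (hypothetical) port name to its max supported MTU in bytes.
    """
    return sorted(name for name, cap in port_mtu_caps.items() if cap < group_mtu)


if __name__ == "__main__":
    print(ipoib_ud_mtu(2048))
    print(ipoib_ud_mtu(4096))
    print(ports_unable_to_join(4096, {"host1": 2048, "host2": 4096}))
```

With a 4K group MTU, the hypothetical host1 port capped at 2048 is the one that would be left out of the broadcast group, which is the failure mode Hal describes.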
Re: tune ib stack
On 09.04.2013 14:49, Hal Rosenstock wrote:
> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>> Hello. I have some servers with Mellanox ConnectX-3 and have some questions:
>> Why does max_mtu differ from active_mtu?
>
> What does the peer port say for max MTU?
>
>> How can I set the active MTU?
>
> SM sets active MTU to min of peer ports' max MTUs.

So by peer port max MTU, do you mean this file?
/sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

I've seen that it can be set as well. I've got two ConnectX-2 machines connected back to back. Normally these have 4K max and active. So let's try something:

Host1:
$ echo 2048 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
# Port is not active, let's reactivate it.
$ echo 1 > /sys/class/infiniband/mlx4_0/device/enable

ibv_devinfo

Host1:
max_mtu:    2048 (4)
active_mtu: 2048 (4)

Host2:
max_mtu:    4096 (5)
active_mtu: 2048 (4)

Both had 4096 (5) everywhere before.

So is that the recommended way to reduce the MTU?

I've heard that reducing the MTU in a fabric can help fight congestion issues. As congestion control doesn't work yet, could this help against congestion?

Cheers,
Sebastian
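For checking many hosts, the max_mtu / active_mtu lines in the ibv_devinfo output quoted above can be compared mechanically. The following Python sketch assumes the "name: bytes (code)" layout shown in this message; the function names are hypothetical, and it parses text only, it does not invoke ibv_devinfo itself.

```python
import re

# Matches lines such as "max_mtu: 4096 (5)" or "active_mtu:2048 (4)".
_MTU_LINE = re.compile(r"(max_mtu|active_mtu):\s*(\d+)\s*\((\d+)\)")


def parse_mtus(devinfo_text):
    """Return {'max_mtu': bytes, 'active_mtu': bytes} for the first port found."""
    found = {}
    for name, size, _code in _MTU_LINE.findall(devinfo_text):
        # Keep only the first occurrence of each field (i.e. the first port).
        found.setdefault(name, int(size))
    return found


def mtu_is_reduced(devinfo_text):
    """True if the port runs with an active MTU below what it supports."""
    mtus = parse_mtus(devinfo_text)
    return mtus["active_mtu"] < mtus["max_mtu"]


if __name__ == "__main__":
    host2 = "max_mtu: 4096 (5)\nactive_mtu: 2048 (4)\n"
    print(parse_mtus(host2), mtu_is_reduced(host2))
```

Applied to the Host2 values from the experiment, this flags the port as running below its supported MTU, while Host1 (capped at 2048 itself) is not flagged.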
Re: tune ib stack
On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
> On 09.04.2013 14:49, Hal Rosenstock wrote:
>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>> Hello. I have some servers with Mellanox ConnectX-3 and have some questions:
>>> Why does max_mtu differ from active_mtu?
>>
>> What does the peer port say for max MTU?
>>
>>> How can I set the active MTU?
>>
>> SM sets active MTU to min of peer ports' max MTUs.
>
> So by peer port max MTU, do you mean this file?
> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

I meant NeighborMTU from PortInfo as the active MTU; MTUCap there is the supported MTU.

-- Hal
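The rule stated in this thread, that the SM sets the active MTU to the minimum of the two peer ports' max MTUs, can be written down as a small Python sketch. The code-to-bytes table uses the IB MTU encoding that also appears in the ibv_devinfo output earlier in the thread (4 = 2048, 5 = 4096); the function names are hypothetical and this only models the rule, it is not how OpenSM is implemented.

```python
# IB MTU encoding as printed by ibv_devinfo, e.g. "2048 (4)" and "4096 (5)".
MTU_CODE_TO_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def active_mtu_code(mtu_cap_a, mtu_cap_b):
    """Active MTU code for a link: the smaller of the two ports' MTUCap codes."""
    return min(mtu_cap_a, mtu_cap_b)


def describe(code):
    """Format an MTU code the way ibv_devinfo prints it."""
    return f"{MTU_CODE_TO_BYTES[code]} ({code})"


if __name__ == "__main__":
    # One port capped at 2048 (code 4), its peer supporting 4096 (code 5).
    print(describe(active_mtu_code(4, 5)))
```

This matches the back-to-back experiment reported earlier, where capping one host at 2048 left both hosts with an active MTU of 2048 (4).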
Re: tune ib stack
On 09.04.2013 15:34, Hal Rosenstock wrote:
> On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
>> On 09.04.2013 14:49, Hal Rosenstock wrote:
>>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>>> Hello. I have some servers with Mellanox ConnectX-3 and have some questions:
>>>> Why does max_mtu differ from active_mtu?
>>>
>>> What does the peer port say for max MTU?
>>>
>>>> How can I set the active MTU?
>>>
>>> SM sets active MTU to min of peer ports' max MTUs.
>>
>> So by peer port max MTU, do you mean this file?
>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>
> I meant NeighborMTU from PortInfo as the active MTU; MTUCap there is the supported MTU.

So these values are exactly the same as in ibv_devinfo and can be set in
/sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.

I've found the PortInfo with the command
smpquery portinfo -C mlx4_0 3 1
where I'm using the first HCA to contact the SM. I tell the SM the destination LID ('3' here in my case) and the destination port ('1').

Is there another method to set the max MTU?

I know that switches can also set the max MTU for their switch ports, and most of them use 2048 as the default.

How can these switch port MTUs be changed on unmanaged switches? On managed switches this can be done through the web front end.

Cheers,
Sebastian
Re: tune ib stack
On 4/9/2013 9:56 AM, Sebastian Riemer wrote:
> On 09.04.2013 15:34, Hal Rosenstock wrote:
>> On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
>>> On 09.04.2013 14:49, Hal Rosenstock wrote:
>>>> On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
>>>>> Hello. I have some servers with Mellanox ConnectX-3 and have some questions:
>>>>> Why does max_mtu differ from active_mtu?
>>>>
>>>> What does the peer port say for max MTU?
>>>>
>>>>> How can I set the active MTU?
>>>>
>>>> SM sets active MTU to min of peer ports' max MTUs.
>>>
>>> So by peer port max MTU, do you mean this file?
>>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
>>
>> I meant NeighborMTU from PortInfo as the active MTU; MTUCap there is the supported MTU.
>
> So these values are exactly the same as in ibv_devinfo and can be set in
> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>
> I've found the PortInfo with the command
> smpquery portinfo -C mlx4_0 3 1
> where I'm using the first HCA to contact the SM. I tell the SM the destination LID ('3' here in my case) and the destination port ('1').
>
> Is there another method to set the max MTU?

That doesn't set max MTU (MTUCap) but merely reads it (for that port).

> I know that switches can also set the max MTU for their switch ports, and most of them use 2048 as the default.

You would need to contact your CA and/or switch vendor(s) (see below).

> How can these switch port MTUs be changed on unmanaged switches? On managed switches this can be done through the web front end.

Yes. MTUCap is read-only as far as the SM is concerned, so the only ways to change it are out-of-band, vendor-specific mechanisms such as a web front end.

-- Hal

> Cheers,
> Sebastian
Re: tune ib stack
On 09.04.2013 16:23, Hal Rosenstock wrote:
>> So these values are exactly the same as in ibv_devinfo and can be set in
>> /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
>>
>> I've found the PortInfo with the command
>> smpquery portinfo -C mlx4_0 3 1
>> where I'm using the first HCA to contact the SM. I tell the SM the destination LID ('3' here in my case) and the destination port ('1').
>>
>> Is there another method to set the max MTU?
>
> That doesn't set max MTU (MTUCap) but merely reads it (for that port).

Sorry, copy and paste error. I meant the mlx4 file:
/sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu

But you've answered that with "vendor specific". Thanks for the valuable information!

For us, the most interesting question is whether the MTU can be changed live without any service disruption. It looks like the mlx4 driver can't provide that. Perhaps switches can.

Cheers,
Sebastian