[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
Hello

I wonder if anyone can suggest why previously working DPDK code may fail in the Mellanox PMD code in dpdk-2.1.0, seemingly due to a failure to create a "resource domain" via ibv_exp_create_res_domain(). I must admit I haven't seen that verb before, and it appears to be returning NULL with no error message.

The DPDK log gives these hints:

PMD: librte_pmd_mlx4: 0xa4fc20: TX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa4fc20: RX queues number update: 0 -> 1
PMD: librte_pmd_mlx4: 0xa4fc20: RD creation failure: Cannot allocate memory

I'm using dpdk-2.1.0 and MLNX_OFED_LINUX-3.1-1.0.3 on ubuntu14.04 with a ConnectX-3 card.

thanks
bill
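In case it's useful for reproduction, here is a minimal standalone program that exercises just that verb outside of DPDK. Treat it as a hedged sketch: it assumes MLNX_OFED's experimental header infiniband/verbs_exp.h, and the attribute values are what I believe the mlx4 PMD passes for its queues, so double-check the names against the installed headers (build with gcc repro.c -libverbs):

/* repro.c: try to create a resource domain on the first verbs device. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <infiniband/verbs_exp.h>

int main(void)
{
        int num = 0;
        struct ibv_device **list = ibv_get_device_list(&num);
        struct ibv_context *ctx;
        struct ibv_exp_res_domain_init_attr attr;
        struct ibv_exp_res_domain *rd;

        if (list == NULL || num == 0) {
                fprintf(stderr, "no verbs devices found\n");
                return 1;
        }
        ctx = ibv_open_device(list[0]);
        if (ctx == NULL) {
                fprintf(stderr, "ibv_open_device failed\n");
                return 1;
        }
        memset(&attr, 0, sizeof(attr));
        /* Assumption: mirrors what the mlx4 PMD requests for its queues. */
        attr.comp_mask = IBV_EXP_RES_DOMAIN_THREAD_MODEL |
                         IBV_EXP_RES_DOMAIN_MSG_MODEL;
        attr.thread_model = IBV_EXP_THREAD_SINGLE;
        attr.msg_model = IBV_EXP_MSG_HIGH_BW;
        errno = 0;
        rd = ibv_exp_create_res_domain(ctx, &attr);
        if (rd == NULL)
                fprintf(stderr, "ibv_exp_create_res_domain: %s\n",
                        strerror(errno)); /* PMD reported ENOMEM here */
        else
                printf("resource domain created OK\n");
        ibv_close_device(ctx);
        ibv_free_device_list(list);
        return rd == NULL;
}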
[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
Hi Olga

Firmware is version 2.35.5100. Configuration details below.

Thanks for any hints.
bill

root:~# cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core port_type_array=2,2 num_vfs=16 probe_vf=4

root:~# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0xf4521403008f1680
        System image GUID: 0xf4521403008f1683
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0c01
                Port GUID: 0xf65214fffe8f1680
                Link layer: Ethernet
CA 'mlx4_1'
        CA type: MT4100
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0x00140500c2d3b05f
        System image GUID: 0xf4521403008f1683
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0c01
                Port GUID: 0xfc9739fffe1272c3
                Link layer: Ethernet
CA 'mlx4_2'
        CA type: MT4100
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0x00140500b90af10c
        System image GUID: 0xf4521403008f1683
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0c01
                Port GUID: 0x20ecbbfffeefb934
                Link layer: Ethernet
CA 'mlx4_3'
        CA type: MT4100
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0x001405009661e607
        System image GUID: 0xf4521403008f1683
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0c01
                Port GUID: 0xf4c8e6fffe5abc89
                Link layer: Ethernet
CA 'mlx4_4'
        CA type: MT4100
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 1
        Node GUID: 0x00140500bd09e128
        System image GUID: 0xf4521403008f1683
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0c01
                Port GUID: 0x5828e1fffe34f919
                Link layer: Ethernet

On Thu, Oct 8, 2015 at 2:03 AM, Olga Shern wrote:
> Hi Bill,
>
> Can you please check the fw version that is installed on your ConnectX-3?
>
> Thanks
>
> Sent from Samsung Mobile.
>
> Original message
> From: Olga Shern
> Date: 08/10/2015 7:55 AM (GMT+00:00)
> To: Bill O'Hara, dev at dpdk.org
> Subject: RE: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
>
> Hi Bill,
>
> There shouldn't be any problem with what you are doing.
> We are checking this now.
>
> Best Regards,
> Olga
[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
Olga

If it's at all helpful: linking our code against dpdk-2.0, and (statically) against the appropriate custom-built libibverbs that we used with it, works on those machines. There is of course no call to ibv_exp_create_res_domain() in that version of the library, but it at least confirms basic operation of the upgraded OFED and firmware on those boxes.

Is there anything else we can check or confirm for you?

thanks
bill

On Thu, Oct 8, 2015 at 9:06 AM, Bill O'Hara wrote:
> Hi Olga
>
> Firmware is version 2.35.5100. Configuration details below.
> [snip]
[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
0:01:00.2: mlx4_ib_add: allocated counter index 19 for port 1
[ 14.103712] mlx4_core :01:00.2: mlx4_ib: multi-function enabled
[ 14.103715] mlx4_core :01:00.2: mlx4_ib: operating in qp1 tunnel mode
[ 14.104441] mlx4_core :01:00.3: mlx4_ib_add: allocated counter index 20 for port 1
[ 14.110327] mlx4_core :01:00.3: mlx4_ib: multi-function enabled
[ 14.110330] mlx4_core :01:00.3: mlx4_ib: operating in qp1 tunnel mode
[ 14.111063] mlx4_core :01:00.4: mlx4_ib_add: allocated counter index 21 for port 1
[ 14.119068] mlx4_core :01:00.4: mlx4_ib: multi-function enabled
[ 14.119071] mlx4_core :01:00.4: mlx4_ib: operating in qp1 tunnel mode
[ 14.764446] init: plymouth-upstart-bridge main process ended, respawning
[ 16.188261] mlx4_en: eth2: Link Up
[ 16.188288] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[ 16.188291] mlx4_en: p3p2: Link Up
[ 16.188321] mlx4_en: p3p3: Link Up
[ 16.188335] mlx4_en: p3p4: Link Up
[ 16.188339] IPv6: ADDRCONF(NETDEV_CHANGE): p3p2: link becomes ready
[ 16.188351] mlx4_en: p3p5: Link Up
[ 421.285141] Bits 55-60 of /proc/PID/pagemap entries are about to stop being page-shift some time soon. See the linux/Documentation/vm/pagemap.txt for details.
[26236.560789] mlx4_en: p3p3: frag:0 - size:1522 prefix:0 stride:1536
[26236.667849] mlx4_en: p3p4: frag:0 - size:1522 prefix:0 stride:1536
[26236.782208] mlx4_en: p3p5: frag:0 - size:1522 prefix:0 stride:1536

// devices as seen by linux
# ip addr
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 54:a0:50:85:79:87 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.174/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::56a0:50ff:fe85:7987/64 scope link
       valid_lft forever preferred_lft forever
3: eth2: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether f4:52:14:8f:16:80 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.2/24 brd 10.10.10.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::f652:14ff:fe8f:1680/64 scope link
       valid_lft forever preferred_lft forever
4: p3p2: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b2:00:7c:2b:3f:47 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.3/24 brd 10.10.10.255 scope global p3p2
       valid_lft forever preferred_lft forever
    inet6 fe80::b000:7cff:fe2b:3f47/64 scope link
       valid_lft forever preferred_lft forever
5: p3p3: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3a:3d:c7:e0:ed:5a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::383d:c7ff:fee0:ed5a/64 scope link
       valid_lft forever preferred_lft forever
6: p3p4: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ee:6a:a6:79:24:4c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ec6a:a6ff:fe79:244c/64 scope link
       valid_lft forever preferred_lft forever
7: p3p5: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 8a:7a:30:00:46:33 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::887a:30ff:fe00:4633/64 scope link
       valid_lft forever preferred_lft forever

// our code enumerating dpdk capable ports..
# ./listdevices
Eth device info {
    port: 0
    driver name: librte_pmd_mlx4
    mac address: F4:52:14:8F:16:80
    PCI device: :01:00.0
}
Eth device info {
    port: 1
    driver name: librte_pmd_mlx4
    mac address: B2:00:7C:2B:3F:47
    PCI device: :01:00.1
}
Eth device info {
    port: 2
    driver name: librte_pmd_mlx4
    mac address: 3A:3D:C7:E0:ED:5A
    PCI device: :01:00.2
}
Eth device info {
    port: 3
    driver name: librte_pmd_mlx4
    mac address: EE:6A:A6:79:24:4C
    PCI device: :01:00.3
}
Eth device info {
    port: 4
    driver name: librte_pmd_mlx4
    mac address: 8A:7A:30:00:46:33
    PCI device: :01:00.4
}

On Thu, Oct 8, 2015 at 3:27 PM, Olga Shern wrote:
> Hi Bill,
>
> Starting from DPDK 2.1, the ConnectX-3 PMD is based on "accelerated verbs"; ibv_exp_create_res_domain comes from this new API.
>
> Just to make sure I understand what you are doing: you have enabled SRIOV and you are running DPDK on the hypervisor on the probed VFs that you have created, right?
>
> We did test this combination (dpdk2.1 and ofed3.1-3) on the hypervisor on the PF and also on a VM on a VF, but in fact I didn't try to run DPDK on the VFs on the hypervisor; I will check this.
>
> Meanwhile, can you please send the output of the application on start up? Do you see any errors in dmesg?
>
> Best Regards,
> Olga
>
> From: Bill O'Hara [mailto:billtohara at gmail.com]
> Sent: Thursday, October 08, 2015 11:55 PM
> To: Olga Shern
> Cc: dev at dpdk.org
> Subject: Re: [
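For context, the listdevices tool above does roughly the following. This is a hedged reconstruction against the public DPDK 2.1 ethdev API rather than our exact source, with the PCI address lookup (via dev_info.pci_dev) omitted for brevity:

/* listdevices.c: enumerate ports probed by the EAL and their PMDs. */
#include <stdio.h>
#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv)
{
        uint8_t port, n;

        /* Initialize the EAL; this probes devices bound to DPDK PMDs. */
        if (rte_eal_init(argc, argv) < 0)
                return 1;
        n = rte_eth_dev_count();
        for (port = 0; port < n; port++) {
                struct rte_eth_dev_info info;
                struct ether_addr mac;

                rte_eth_dev_info_get(port, &info);
                rte_eth_macaddr_get(port, &mac);
                printf("Eth device info { port: %u driver name: %s "
                       "mac address: %02X:%02X:%02X:%02X:%02X:%02X }\n",
                       port, info.driver_name,
                       mac.addr_bytes[0], mac.addr_bytes[1],
                       mac.addr_bytes[2], mac.addr_bytes[3],
                       mac.addr_bytes[4], mac.addr_bytes[5]);
        }
        return 0;
}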
[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
Hi Olga,

Sorry for the slow response. Initially, raw_ethernet_bw and raw_ethernet_lat were not working.

* We clean-installed some boxes, rather than upgrading from the previous OFED etc., and now raw_ethernet_bw and raw_ethernet_lat work. (This is on ubuntu14.04 with the most recent Mellanox OFED release.)
* dpdk-2.1 still did not work. We clean-installed the build box, to clear any possibility of mismatches due to the upgrade process. Still not working.
* At the suggestion of one of our developers, we switched from statically linking dpdk and libibverbs to static dpdk and dynamic libibverbs. Now the dpdk test programs work.
* At this point, we've switched back to our own kernel build (with patches) and our own dpdk programs, and everything appears to be correct.

Apologies for the fire drill -- we do not yet know the exact root cause, but it appears to be related to having done an upgrade of Mellanox OFED on the ubuntu14.04 boxes AND building with static linking of libibverbs (though we double-checked that the correct .a file was being linked). I will let you know if we find a more satisfactory answer -- we're also giving higher priority to hermetic builds at this point.

thanks for your help and hints!
bill

On Thu, Oct 15, 2015 at 5:50 AM, Olga Shern wrote:
> Hi Bill,
>
> Sorry it took me a while to reply.
>
> We did more tests and didn't reproduce the issue.
> I also checked the code, and it seems there are only 2 conditions under which RD creation fails:
>
> 1. The arguments we are passing to the RD creation function are wrong. This is not reasonable, because this is PMD code, and here the behavior is not deterministic: it works in most cases and doesn't work on your setup.
> 2. The calloc function is failing. Also not reasonable.
>
> There is a verbs application that uses accelerated verbs and res domains, raw_ethernet_bw.
>
> Example:
> raw_ethernet_bw -d mlx4_0 -i 1 --client -E 00:00:00:00:01:02 --use_res_domain --verb_type=accl
>
> Another suggestion: can you please compile the PMD with debug enabled? It may give more details.
>
> Best Regards,
> Olga
>
> From: Bill O'Hara [mailto:billtohara at gmail.com]
> Sent: Saturday, October 10, 2015 12:18 AM
> To: Olga Shern
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
>
> Hi Olga
>
> Thanks for the pointer towards the use of "accelerated verbs".
>
> Yes, SRIOV is enabled, with dpdk on the hypervisor on the probed VFs. That said, it also fails on the underlying PF as far as I can see (e.g. below, the log shows (VF: false) for device mlx4_0, and the code fails in RD creation on this as well as on one of the VFs). I don't see any messages in dmesg that seem to indicate errors at any point, but an extract is included below.
>
> But here's perhaps the crux! Switching off SRIOV and running the new combination of dpdk and ofed against just a single PF also fails in exactly the same way (RD creation failure).
>
> The old code continues to work. I will audit our code to make sure we're not missing something when using dpdk-2.1. In the meantime, do you have a minimal test that involves RD creation?
>
> thanks
> bill
>
> // DPDK output for application run using dpdk-2.1 and ofed 3.1
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 4 on socket 0
> EAL: Detected lcore 5 as core 5 on socket 0
> EAL: Detected lcore 6 as core 0 on socket 0
> EAL: Detected lcore 7 as core 1 on socket 0
> EAL: Detected lcore 8 as core 2 on socket 0
> EAL: Detected lcore 9 as core 3 on socket 0
> EAL: Detected lcore 10 as core 4 on socket 0
> EAL: Detected lcore 11 as core 5 on socket 0
> EAL: Support maximum 128 logical core(s) by configuration.
> EAL: Detected 12 lcore(s)
> EAL: VFIO modules not all loaded, skip VFIO support...
> EAL: Setting up physically contiguous memory...
> EAL: Ask a virtual area of 0xe40 bytes
> EAL: Virtual area found at 0x7fffe600 (size = 0xe40)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fffe5c0 (size = 0x20)
> EAL: Ask a virtual area of 0x7180 bytes
> EAL: Virtual area found at 0x7fff7420 (size = 0x7180)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area fou
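For anyone else hitting this: the PMD debug build Olga suggested should, if I remember the DPDK 2.1 build system correctly (an assumption, so verify the flag exists in your tree), just be a config flip before recompiling and relinking:

# in config/common_linuxapp, then rebuild DPDK and the application
CONFIG_RTE_LIBRTE_MLX4_DEBUG=y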
[dpdk-dev] Multi-process model and mlx4 pmd
I see from the Mellanox PMD release notes:

http://www.mellanox.com/related-docs/prod_software/Mellanox_ConnectX3_ConnectX3Pro_DPDK_PMD_Release_Notes_v1.7-8_2.8.4.pdf

that the primary and secondary multi-process model is not supported, though it's not noted as a limitation in the DPDK guide here:

http://dpdk.org/doc/guides/nics/mlx4.html

Can anyone shed light on what the source of the limitation is? Is this something that can be worked around, or fixed in the driver, or something fundamental?

thanks
bill
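For reference, by "primary and secondary multi-process model" I mean DPDK's standard pattern, sketched below with only generic EAL calls (nothing mlx4-specific is assumed); this is the pattern that the release notes say does not work with this PMD:

/* mp_sketch.c: same binary run twice, once as primary, once as
 * secondary (started with --proc-type=secondary). */
#include <stdio.h>
#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv)
{
        /* The secondary process maps the primary's hugepage memory and
         * shared ethdev state instead of re-probing the devices itself. */
        if (rte_eal_init(argc, argv) < 0)
                return 1;
        if (rte_eal_process_type() == RTE_PROC_PRIMARY)
                printf("running as primary; device setup happens here\n");
        else
                printf("running as secondary; sharing %u port(s)\n",
                       (unsigned)rte_eth_dev_count());
        return 0;
}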