Re: [ewg] ibv_create_qp fails with enomem
On Fri, Mar 23, 2012 at 01:42:31PM -0700, Ker Can wrote:
> We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and
> we're running into issues where we get an ENOMEM error from
> ibv_create_qp(). Earlier with this OFED version we were running into an
> ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to
> 24 as described in this link:
> http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904
>
> All the other settings in /sys/module/mlx4_core/parameters are the
> default values. The node itself has 96GB of memory and we're nowhere
> close to using that. Is there any way to figure out what's going on?

You probably need to set log_mtts_per_seg=3 on mlx4_core. That's the current default in upstream OFED, but I believe MLNX_OFED_LINUX-1.5.3-3.0.0 defaults it to 0.

--
Shawn

ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
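To illustrate why log_mtts_per_seg matters alongside log_num_mtt, here is a minimal sketch. It assumes the commonly cited rule of thumb for mlx4 (not stated in this thread, and the exact meaning of log_num_mtt has varied between driver releases) that the memory the HCA can map is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page size. The helper name and the 4 KiB page size are assumptions for illustration only.

```python
GIB = 1 << 30


def max_mappable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Rule-of-thumb upper bound on memory coverable by MTT entries.

    Assumption: limit = 2^log_num_mtt * 2^log_mtts_per_seg * page_size.
    """
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


# Compare the two log_mtts_per_seg values discussed in the reply,
# with log_num_mtt=24 as set by the original poster.
for per_seg in (0, 3):
    limit = max_mappable_bytes(24, per_seg)
    print(f"log_num_mtt=24 log_mtts_per_seg={per_seg}: {limit // GIB} GiB")
```

Under that assumption, log_mtts_per_seg=0 gives a limit below the 96GB of RAM on the node, while log_mtts_per_seg=3 gives one well above it. Module parameters like this are normally set with an "options mlx4_core ..." line in a file under /etc/modprobe.d/ and take effect when the driver is reloaded; check the documentation for the installed OFED release before relying on either value.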
[ewg] ibv_create_qp fails with enomem
We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and we're running into issues where we get an ENOMEM error from ibv_create_qp(). Earlier with this OFED version we were running into an ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to 24 as described in this link:

http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904

All the other settings in /sys/module/mlx4_core/parameters are the default values. The node itself has 96GB of memory and we're nowhere close to using that. Is there any way to figure out what's going on?

thanks
K. Can

here's the output from ibv_devinfo -v:

hca_id: mlx4_0
    transport: InfiniBand (0)
    fw_ver: 2.9.1000
    node_guid: 0002:c903:004b:d5e6
    sys_image_guid: 0002:c903:004b:d5e9
    vendor_id: 0x02c9
    vendor_part_id: 26428
    hw_ver: 0xB0
    board_id: HP_018009
    phys_port_cnt: 2
    max_mr_size: 0x
    page_size_cap: 0xfe00
    max_qp: 260032
    max_qp_wr: 16351
    device_cap_flags: 0x007c9c76
    max_sge: 32
    max_sge_rd: 0
    max_cq: 65408
    max_cqe: 4194303
    max_mr: 524272
    max_pd: 32764
    max_qp_rd_atom: 16
    max_ee_rd_atom: 0
    max_res_rd_atom: 4160512
    max_qp_init_rd_atom: 128
    max_ee_init_rd_atom: 0
    atomic_cap: ATOMIC_HCA (1)
    max_ee: 0
    max_rdd: 0
    max_mw: 0
    max_raw_ipv6_qp: 0
    max_raw_ethy_qp: 2
    max_mcast_grp: 8192
    max_mcast_qp_attach: 120
    max_total_mcast_qp_attach: 983040
    max_ah: 0
    max_fmr: 0
    max_srq: 65472
    max_srq_wr: 16383
    max_srq_sge: 31
    max_pkeys: 128
    local_ca_ack_delay: 15
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 2048 (4)
            active_mtu: 2048 (4)
            sm_lid: 9
            port_lid: 311
            port_lmc: 0x00
            link_layer: IB
            max_msg_sz: 0x4000
            port_cap_flags: 0x02510868
            max_vl_num: 8 (4)
            bad_pkey_cntr: 0x0
            qkey_viol_cntr: 0x0
            sm_sl: 0
            pkey_tbl_len: 128
            gid_tbl_len: 128
            subnet_timeout: 18
            init_type_reply: 0
            active_width: 4X (2)
            active_speed: 10.0 Gbps (4)
            phys_state: LINK_UP (5)
            GID[ 0]: fe80::::0002:c903:004b:d5e7
        port: 2
            state: PORT_ACTIVE (4)
            max_mtu: 2048 (4)
            active_mtu: 2048 (4)
            sm_lid: 9
            port_lid: 158
            port_lmc: 0x00
            link_layer: IB
            max_msg_sz: 0x4000
            port_cap_flags: 0x02510868
            max_vl_num: 8 (4)
            bad_pkey_cntr: 0x0
            qkey_viol_cntr: 0x0
            sm_sl: 0
            pkey_tbl_len: 128
            gid_tbl_len: 128
            subnet_timeout: 18
            init_type_reply: 0
            active_width: 4X (2)
            active_speed: 10.0 Gbps (4)
            phys_state: LINK_UP (5)
            GID[ 0]: fe80::::0002:c903:004b:d5e8
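Since the question hinges on which mlx4_core settings are actually in effect, a small script can dump everything under /sys/module/mlx4_core/parameters for comparison between nodes or OFED versions. This is only a sketch: the function name is hypothetical, and it assumes the standard Linux sysfs layout of one readable file per module parameter.

```python
from pathlib import Path


def read_module_params(module="mlx4_core", sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded module, or {} if not present."""
    param_dir = Path(sysfs_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by the current user.
            params[entry.name] = "<unreadable>"
    return params


if __name__ == "__main__":
    for name, value in read_module_params().items():
        print(f"{name} = {value}")
```

Running it before and after an OFED upgrade and diffing the output is one way to spot defaults that changed, such as log_num_mtt or log_mtts_per_seg.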
Re: [ewg] ibv_create_qp fails with enomem
Sorry about all the formatting - trying again with plain text.

We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and we're running into issues where we get an ENOMEM error from ibv_create_qp(). Earlier with this OFED version we were running into an ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to 24 as described in this link:

http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904

All the other settings in /sys/module/mlx4_core/parameters are the default values. The node itself has 96GB of memory and we're nowhere close to using that. Is there any way to figure out what's going on?

thanks
K. Can
Re: [ewg] ibv_create_qp fails with enomem
I hate to be this way, but I feel I must mention this. Mellanox OFED is their product. If you are having trouble with their software, you should contact them.

That said, why did you feel you needed to update to their version rather than the latest OFED, or perhaps even using what is provided with a standard distro?

Ira

On Fri, 23 Mar 2012 13:47:46 -0700 (PDT) Ker Can kercan7...@yahoo.com wrote:
> Sorry about all the formatting - trying again with plain text.
>
> We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and
> we're running into issues where we get an ENOMEM error from
> ibv_create_qp(). Earlier with this OFED version we were running into an
> ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to
> 24 as described in this link:
> http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904
>
> All the other settings in /sys/module/mlx4_core/parameters are the
> default values. The node itself has 96GB of memory and we're nowhere
> close to using that. Is there any way to figure out what's going on?
>
> thanks
> K. Can
Re: [ewg] ibv_create_qp fails with enomem
Ker:

In your /etc/security/limits.conf file, try adding the lines

    * hard memlock unlimited
    * soft memlock unlimited

We hit the same problem and this cured it.

Bob Russell
r...@iol.unh.edu

On Fri, 23 Mar 2012, Ker Can wrote:
> Sorry about all the formatting - trying again with plain text.
>
> We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and
> we're running into issues where we get an ENOMEM error from
> ibv_create_qp(). Earlier with this OFED version we were running into an
> ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to
> 24 as described in this link:
> http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904
>
> All the other settings in /sys/module/mlx4_core/parameters are the
> default values. The node itself has 96GB of memory and we're nowhere
> close to using that. Is there any way to figure out what's going on?
>
> thanks
> K. Can
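The limits.conf entries above only help if the process that calls ibv_create_qp() actually inherits them; limits.conf is applied through PAM at login, so daemons or jobs started by a batch system may run with a different limit. A minimal sketch for checking the locked-memory limit from inside the affected process, using Python's standard resource module (the helper name is hypothetical):

```python
import resource


def describe_memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return fmt(soft), fmt(hard)


soft, hard = describe_memlock_limit()
print(f"memlock soft={soft} hard={hard}")
```

Running this in the same environment as the failing application (same user, same launcher) shows whether the "unlimited" setting is really in effect there; "ulimit -l" in an interactive shell can report a different value.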
Re: [ewg] ibv_create_qp fails with enomem
Bob,

Our max locked memory was already set to unlimited.

thanks
K. Can

- Original Message -
From: Robert D. Russell r...@iol.unh.edu
To: Ker Can kercan7...@yahoo.com
Cc: ewg@lists.openfabrics.org ewg@lists.openfabrics.org
Sent: Friday, March 23, 2012 5:08 PM
Subject: Re: [ewg] ibv_create_qp fails with enomem

Ker:

In your /etc/security/limits.conf file, try adding the lines

    * hard memlock unlimited
    * soft memlock unlimited

We hit the same problem and this cured it.

Bob Russell
r...@iol.unh.edu

On Fri, 23 Mar 2012, Ker Can wrote:
> Sorry about all the formatting - trying again with plain text.
>
> We updated to MLNX_OFED_LINUX-1.5.3-3.0.0 from OFED-1.5.2_1 recently and
> we're running into issues where we get an ENOMEM error from
> ibv_create_qp(). Earlier with this OFED version we were running into an
> ENOMEM with ibv_reg_mr(), but that was resolved by setting log_num_mtt to
> 24 as described in this link:
> http://h10025.www1.hp.com/ewfrf/wc/document?cc=uslc=endlc=endocname=c03113904
>
> All the other settings in /sys/module/mlx4_core/parameters are the
> default values. The node itself has 96GB of memory and we're nowhere
> close to using that. Is there any way to figure out what's going on?
>
> thanks
> K. Can