I've got my OFED versions (client & server) in sync with the stock packages shipped with the latest Scientific Linux 6.1 (after removing QLogic's OFED distribution). I'll see whether the stability issues persist.
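
For reference, a quick way to confirm that the client and server stacks actually match is something like the following (a sketch only; the package names assume the stock SL 6.1 RDMA packages):

  # run on each client and server, then diff the lists between hosts
  rpm -qa | egrep 'rdma|libibverbs|librdmacm|libipathverbs|infinipath' | sort > /tmp/ib-pkgs.$(hostname)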

OS: SL 6.1
Kernel: 2.6.32-220.2.1.el6.x86_64
HCA: InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
Switches:
 roots: 1 x QLogic 12300 w/ SM, 2 x QLogic 12200
 edge:  5 x QLogic 12200
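
For completeness, the fabric-side sanity checks look something like this (standard infiniband-diags tools; exact output will of course differ per fabric):

  sminfo                      # confirm the master SM is the one running on the 12300
  ibswitches                  # all 8 switches (3 root + 5 edge) should be visible
  iblinkinfo | grep -i down   # flag any links that are not up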

ibv_devinfo -v
hca_id:    qib0
    transport:            InfiniBand (0)
    fw_ver:                0.0.0
    node_guid:            0011:7500:0078:b690
    sys_image_guid:            0011:7500:0078:b690
    vendor_id:            0x1175
    vendor_part_id:            29474
    hw_ver:                0x2
    board_id:            InfiniPath_QLE7340
    phys_port_cnt:            1
    max_mr_size:            0xffffffffffffffff
    page_size_cap:            0x1000
    max_qp:                16384
    max_qp_wr:            16383
    device_cap_flags:        0x00003d06
    max_sge:            96
    max_sge_rd:            0
    max_cq:                131071
    max_cqe:            196607
    max_mr:                65536
    max_pd:                65535
    max_qp_rd_atom:            16
    max_ee_rd_atom:            0
    max_res_rd_atom:        0
    max_qp_init_rd_atom:        255
    max_ee_init_rd_atom:        0
    atomic_cap:            ATOMIC_GLOB (2)
    max_ee:                0
    max_rdd:            0
    max_mw:                0
    max_raw_ipv6_qp:        0
    max_raw_ethy_qp:        0
    max_mcast_grp:            16384
    max_mcast_qp_attach:        16
    max_total_mcast_qp_attach:    262144
    max_ah:                65535
    max_fmr:            65536
    max_map_per_fmr:        32767
    max_srq:            1024
    max_srq_wr:            131071
    max_srq_sge:            128
    max_pkeys:            4
    local_ca_ack_delay:        0
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        2048 (4)
            sm_lid:            97
            port_lid:        51
            port_lmc:        0x00
            max_msg_sz:        0x80000000
            port_cap_flags:        0x07610868
            max_vl_num:        2 (2)
            bad_pkey_cntr:        0x0
            qkey_viol_cntr:        0x0
            sm_sl:            0
            pkey_tbl_len:        4
            gid_tbl_len:        5
            subnet_timeout:        17
            init_type_reply:    0
            active_width:        4X (2)
            active_speed:        10.0 Gbps (4)
            phys_state:        LINK_UP (5)
            GID[  0]:        fe80:0000:0000:0000:0011:7500:0078:b690
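
The port itself reports a clean QDR link (4X at 10.0 Gbps per lane, i.e. 40 Gb/s), so while waiting to see whether the drops recur it's probably also worth watching the error counters, roughly along these lines (infiniband-diags again; LID and port taken from the output above):

  perfquery -x 51 1    # extended port counters for this HCA (LID 51, port 1)
  ibqueryerrors        # fabric-wide scan for ports with error counters above threshold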

-Brian

On 02/01/2012 04:51 PM, Joe Landman wrote:
On 02/01/2012 04:49 PM, Brian Smith wrote:
Having serious issues w/ glusterfs 3.2.5 over rdma. Clients are
periodically dropping off with "transport endpoint not connected". Any
help would be appreciated. Environment is HPC. GlusterFS is being used
as a shared /work|/scratch directory. Standard distributed volume
configuration. Nothing fancy.

Pastie log snippet is here: http://pastie.org/3291330

Any help would be appreciated!



What OS, kernel rev, OFED, etc.? What HCAs, switches, etc.?

What does ibv_devinfo report for nodes experiencing the transport endpoint issue?

