I've got my OFED versions (client & server) in sync with the stock packages shipped with the latest Scientific Linux 6.1 (after removing QLogic's OFED distribution). I'll see whether the stability issues persist.
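
For reference, a quick way to confirm that the client and server stacks actually match is something like the following (a sketch only; the package names assume the stock SL 6.1 RDMA packages):

  # run on each client and server, then diff the lists between hosts
  rpm -qa | egrep 'rdma|libibverbs|librdmacm|libipathverbs|infinipath' | sort > /tmp/ib-pkgs.$(hostname)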

OS: SL 6.1
Kernel: 2.6.32-220.2.1.el6.x86_64
HCA: InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)
Switches:
 roots: 1 x QLogic 12300 w/ SM, 2 x QLogic 12200
 edge:  5 x QLogic 12200
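
For completeness, the fabric-side sanity checks look something like this (standard infiniband-diags tools; exact output will of course differ per fabric):

  sminfo                      # confirm the master SM is the one running on the 12300
  ibswitches                  # all 8 switches (3 root + 5 edge) should be visible
  iblinkinfo | grep -i down   # flag any links that are not up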

ibv_devinfo -v
hca_id:    qib0
    transport:            InfiniBand (0)
    fw_ver:                0.0.0
    node_guid:            0011:7500:0078:b690
    sys_image_guid:            0011:7500:0078:b690
    vendor_id:            0x1175
    vendor_part_id:            29474
    hw_ver:                0x2
    board_id:            InfiniPath_QLE7340
    phys_port_cnt:            1
    max_mr_size:            0xffffffffffffffff
    page_size_cap:            0x1000
    max_qp:                16384
    max_qp_wr:            16383
    device_cap_flags:        0x00003d06
    max_sge:            96
    max_sge_rd:            0
    max_cq:                131071
    max_cqe:            196607
    max_mr:                65536
    max_pd:                65535
    max_qp_rd_atom:            16
    max_ee_rd_atom:            0
    max_res_rd_atom:        0
    max_qp_init_rd_atom:        255
    max_ee_init_rd_atom:        0
    atomic_cap:            ATOMIC_GLOB (2)
    max_ee:                0
    max_rdd:            0
    max_mw:                0
    max_raw_ipv6_qp:        0
    max_raw_ethy_qp:        0
    max_mcast_grp:            16384
    max_mcast_qp_attach:        16
    max_total_mcast_qp_attach:    262144
    max_ah:                65535
    max_fmr:            65536
    max_map_per_fmr:        32767
    max_srq:            1024
    max_srq_wr:            131071
    max_srq_sge:            128
    max_pkeys:            4
    local_ca_ack_delay:        0
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        2048 (4)
            sm_lid:            97
            port_lid:        51
            port_lmc:        0x00
            max_msg_sz:        0x80000000
            port_cap_flags:        0x07610868
            max_vl_num:        2 (2)
            bad_pkey_cntr:        0x0
            qkey_viol_cntr:        0x0
            sm_sl:            0
            pkey_tbl_len:        4
            gid_tbl_len:        5
            subnet_timeout:        17
            init_type_reply:    0
            active_width:        4X (2)
            active_speed:        10.0 Gbps (4)
            phys_state:        LINK_UP (5)
            GID[  0]:        fe80:0000:0000:0000:0011:7500:0078:b690
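
The port itself reports a clean QDR link (4X at 10.0 Gbps per lane, i.e. 40 Gb/s), so while waiting to see whether the drops recur it's probably also worth watching the error counters, roughly along these lines (infiniband-diags again; LID and port taken from the output above):

  perfquery -x 51 1    # extended port counters for this HCA (LID 51, port 1)
  ibqueryerrors        # fabric-wide scan for ports with error counters above threshold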

-Brian

On 02/01/2012 04:51 PM, Joe Landman wrote:
On 02/01/2012 04:49 PM, Brian Smith wrote:
Having serious issues w/ glusterfs 3.2.5 over rdma. Clients are
periodically dropping off with "transport endpoint not connected". Any
help would be appreciated. Environment is HPC. GlusterFS is being used
as a shared /work|/scratch directory. Standard distributed volume
configuration. Nothing fancy.

Pastie log snippet is here: http://pastie.org/3291330

Any help would be appreciated!



What OS, kernel rev, OFED, etc.? What HCAs, switches, etc.?

What does ibv_devinfo report for nodes experiencing the transport endpoint issue?

