On Thu, 2007-05-17 at 00:18, Ganesh Sadasivan wrote:
> The reason is:
> Jan 01 01:46:17 321555 [58F3E280] -> osm_vendor_set_sm: ERR 5431:
> setting IS_SM capability mask failed; errno 2
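For reference, errno 2 is ENOENT ("No such file or directory"). The most likely cause is that the issm device node OpenSM tried to open did not exist. The Python sketch below is an illustration, not OpenSM code. It opens /dev/infiniband/issm<N> in the way we understand the ib_umad driver to work: while the file stays open, the kernel advertises IsSM in the port's CapabilityMask, and closing the file clears the bit again. The path pattern comes from this thread. Several details are assumptions: the port index 0, the helper names, and the use of O_NONBLOCK, which makes the open fail with EAGAIN instead of waiting when another process such as a running OpenSM already holds the device.

```python
import errno
import os

# Assumption: issm devices live under /dev/infiniband and are named
# issm<umad_port>, as described in this thread. issm_path, explain_errno and
# try_set_is_sm are hypothetical helper names used only for this sketch.
ISSM_DIR = "/dev/infiniband"


def issm_path(umad_port):
    """Build the issm device path for a umad port index."""
    return os.path.join(ISSM_DIR, "issm%d" % umad_port)


def explain_errno(err):
    """Turn a raw errno (e.g. the 'errno 2' in the OpenSM log) into name and text."""
    return "%s (%s)" % (errno.errorcode.get(err, "UNKNOWN"), os.strerror(err))


def try_set_is_sm(umad_port):
    """Open the issm device and return the fd, or None on failure.

    While the fd stays open the kernel is expected to advertise IsSM for the
    port; closing the fd should clear it. O_NONBLOCK (an assumption) makes the
    open fail with EAGAIN instead of waiting if another process holds it.
    """
    path = issm_path(umad_port)
    try:
        return os.open(path, os.O_RDWR | os.O_NONBLOCK)
    except OSError as e:
        print("open(%s) failed: errno %d = %s" % (path, e.errno, explain_errno(e.errno)))
        return None


if __name__ == "__main__":
    fd = try_set_is_sm(0)  # umad port 0 is an assumption
    if fd is not None:
        print("issm device held open; IsSM should be advertised until close")
        os.close(fd)
```

On a host without the device node, this prints the same errno 2 that OpenSM logged, which is what udev is meant to prevent by creating the node automatically.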

Yes, this makes sense now and explains what you are seeing.

> From the code it looks like /dev/infiniband/issm<umad_port> needs to
> be created and I did that.

This should be done via udev rather than manually. Do you have udev set
up? If not, please follow the instructions on the wiki.

-- Hal

> But still the SM with the higher GUID seems to become the master
> whenever it does a sweep. The logs are too detailed, so I am sending
> snippets.
>
> Local port (with a high GUID)
> Jan 01 02:49:56 332142 [5873E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c901097682d1 for parent node GUID =
> 0x2c901097682d0, TID = 0x1236
> Jan 01 02:49:56 332197 [5873E280] -> PortInfo dump:
> port number.............0x1
> node_guid...............0x0002c901097682d0
> port_guid...............0x0002c901097682d1
> m_key...................0x0000000000000000
> subnet_prefix...........0xfe80000000000000
> base_lid................0x1
> master_sm_base_lid......0x2
> capability_mask.........0x2510A68
> diag_code...............0x0
> m_key_lease_period......0x0
> local_port_num..........0x1
> link_width_enabled......0x3
> link_width_supported....0x3
> link_width_active.......0x2
> link_speed_supported....0x1
> port_state..............ACTIVE
> state_info2.............0x52
> m_key_protect_bits......0x0
> lmc.....................0x0
> link_speed..............0x11
> mtu_smsl................0x40
> vl_cap_init_type........0x40
> vl_high_limit...........0x0
> vl_arb_high_cap.........0x8
> vl_arb_low_cap..........0x8
> init_rep_mtu_cap........0x4
> vl_stall_life...........0xFF
> vl_enforce..............0x40
> m_key_violations........0x0
> p_key_violations........0x0
> q_key_violations........0x0
> guid_cap................0x20
> client_reregister.......0x0
> subnet_timeout..........0x12
> resp_time_value.........0x10
> error_threshold.........0x88
> Jan 01 02:49:56 332337 [5873E280] -> Capabilities Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_COM_MGT
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
>
> Remote Port which hosts the SM:
> Jan 01 02:49:56 500638 [5AF3E280] -> osm_pi_rcv_process: Discovered
> port num 0x1 with GUID = 0x2c90109765da1 for parent node GUID =
> 0x2c90109765da0, TID = 0x123b
> Jan 01 02:49:56 500690 [5AF3E280] -> PortInfo dump:
> port number.............0x1
> node_guid...............0x0002c90109765da0
> port_guid...............0x0002c90109765da1
> m_key...................0x0000000000000000
> subnet_prefix...........0xfe80000000000000
> base_lid................0x2
> master_sm_base_lid......0x2
> capability_mask.........0x2510A68
> diag_code...............0x0
> m_key_lease_period......0x0
> local_port_num..........0x1
> link_width_enabled......0x3
> link_width_supported....0x3
> link_width_active.......0x2
> link_speed_supported....0x1
> port_state..............ACTIVE
> state_info2.............0x52
> m_key_protect_bits......0x0
> lmc.....................0x0
> link_speed..............0x11
> mtu_smsl................0x40
> vl_cap_init_type........0x40
> vl_high_limit...........0x0
> vl_arb_high_cap.........0x8
> vl_arb_low_cap..........0x8
> init_rep_mtu_cap........0x4
> vl_stall_life...........0xFF
> vl_enforce..............0x40
> m_key_violations........0x0
> p_key_violations........0x0
> q_key_violations........0x0
> guid_cap................0x20
> client_reregister.......0x0
> subnet_timeout..........0x12
> resp_time_value.........0x10
> error_threshold.........0x88
> Jan 01 02:49:56 500831 [5AF3E280] -> Capabilities Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_COM_MGT
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
>
> Please let me know if I should look at some specific portion.
>
> Thanks
> Ganesh
>
> On 16 May 2007 21:57:27 -0400, Hal Rosenstock <[EMAIL PROTECTED]>
> wrote:
> Hi again Ganesh,
>
> On Wed, 2007-05-16 at 21:42, Ganesh Sadasivan wrote:
> > Hi Hal,
> >
> > Please see inline.
> >
> > On 16 May 2007 19:22:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]>
> > wrote:
> > Hi Ganesh,
> >
> > On Wed, 2007-05-16 at 19:00, Ganesh Sadasivan wrote:
> > > Hi,
> > >
> > > I have a setup with 2 HCAs connected back to back and am running
> > > opensm (ofed1.1, running at the same priority) on both of them. Is
> > > there any utility to see who is the master?
> >
> > Even with priority differences I am seeing the same behavior. Am I
> > missing an option? I am setting "opensm -s 30" and "opensm -s 60" on
> > the respective sides.
>
> Why not use the default (10 secs) or at least the same on both sides?
>
> > sminfo will show the SM state for a LID/GUID.
> >
> > > Thanks.
> > >
> > > The smlid in ibv_devinfo seems to be changing whenever an SM does a
> > > sweep. Is this expected?
> >
> > Nope. If they are both at the same priority, the lower GUID should win
> > the SM election.
> >
> > Not sure what is going wrong in your (back to back HCA) subnet. Do your
> > ports stay active?
> >
> > Yes, both ports are active.
>
> And they stay active (no LED color changes)?
>
> If not, can you run both OpenSMs in verbose mode (-V) and see if there
> is anything interesting/relevant in the logs?
>
> -- Hal
>
> > Thanks
> > Ganesh
> >
> > -- Hal
> >
> > > Thanks
> > > Ganesh
> > >
> > > ______________________________________________________________________
> > > _______________________________________________
> > > general mailing list
> > > [email protected]
> > >
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > >
> > > To unsubscribe, please visit
> > > http://openib.org/mailman/listinfo/openib-general
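Both PortInfo dumps in this thread report capability_mask 0x2510A68. The small Python decoder below reproduces OpenSM's "Capabilities Mask" listing for that value. It also makes one point easy to see: bit 1 (0x2, IsSM) is clear on both ports. That would be consistent with the failed IS_SM setting in the first log excerpt, although the dumps alone do not prove the cause. This is a sketch with some limits. decode_capability_mask and PORT_CAP_BITS are hypothetical names, and only a subset of bits is tabulated. The bit positions are taken from the InfiniBand PortInfo CapabilityMask layout, and the names are the IB_PORT_CAP_* labels OpenSM printed.

```python
# Subset of PortInfo CapabilityMask bits, keyed by mask value. Names follow
# the IB_PORT_CAP_* labels OpenSM printed in this thread; the IsSM entry
# (bit 1, 0x2) is taken from the InfiniBand PortInfo layout.
PORT_CAP_BITS = {
    0x00000002: "IB_PORT_CAP_IS_SM",
    0x00000008: "IB_PORT_CAP_HAS_TRAP",
    0x00000020: "IB_PORT_CAP_HAS_AUTO_MIG",
    0x00000040: "IB_PORT_CAP_HAS_SL_MAP",
    0x00000200: "IB_PORT_CAP_HAS_LED_INFO",
    0x00000800: "IB_PORT_CAP_HAS_SYS_IMG_GUID",
    0x00010000: "IB_PORT_CAP_HAS_COM_MGT",
    0x00100000: "IB_PORT_CAP_HAS_VEND_CLS",
    0x00400000: "IB_PORT_CAP_HAS_CAP_NTC",
    0x02000000: "IB_PORT_CAP_HAS_CLIENT_REREG",
}
IS_SM = 0x00000002


def decode_capability_mask(mask):
    """List the known capability names set in mask, lowest bit first."""
    names = [name for bit, name in sorted(PORT_CAP_BITS.items()) if mask & bit]
    unknown = mask & ~sum(PORT_CAP_BITS)  # bits this partial table does not name
    if unknown:
        names.append("UNKNOWN(0x%X)" % unknown)
    return names


if __name__ == "__main__":
    mask = 0x2510A68  # capability_mask from both PortInfo dumps
    for name in decode_capability_mask(mask):
        print(name)
    print("IsSM set:", bool(mask & IS_SM))
```

Once an SM holds the issm device successfully, the same check on a fresh PortInfo dump should show IsSM set for that port.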

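Hal's rule in this thread can be checked against the two port GUIDs in the dumps. The rule is that at equal priority the lower GUID should win the SM election, and a higher priority wins outright. The Python sketch below models only the outcome of the election, not OpenSM's discovery or handover state machine. SmCandidate and expected_master are hypothetical names, and both priorities are assumed to be the default 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmCandidate:
    """Hypothetical record for one SM taking part in the election."""
    name: str
    priority: int   # SM priority, 0 (lowest) to 15 (highest)
    port_guid: int  # GUID of the port the SM runs on


def expected_master(candidates):
    """Highest priority wins; on equal priority the lower port GUID wins."""
    return min(candidates, key=lambda c: (-c.priority, c.port_guid))


if __name__ == "__main__":
    # GUIDs from the PortInfo dumps; priority 0 on both sides is an assumption.
    local = SmCandidate("local, LID 0x1", 0, 0x0002C901097682D1)
    remote = SmCandidate("remote, LID 0x2", 0, 0x0002C90109765DA1)
    print(expected_master([local, remote]).name)
```

Here the remote port (LID 0x2, GUID 0x2c90109765da1) has the lower GUID, which matches master_sm_base_lid 0x2 in both dumps. A master that moves to the higher-GUID port on each sweep is therefore not the expected steady state.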