[ putting back on openib-general list ]

On Mon, 2005-01-17 at 15:27 -0500, Hal Rosenstock wrote:
> On Mon, 2005-01-17 at 14:47, Tom Duffy wrote:
> > On Sat, 2005-01-15 at 07:30 -0500, Hal Rosenstock wrote:
> > > I will have another patch later today which may actually get this to
> > > work now. I forgot (hopefully) one last thing.
> >
> > After using the latest OpenSM, I am getting a hang on Solaris when
> > running devfsadm -C. This is new behavior. There are no debug outputs
> > when running at debug level 2, so I bumped it up to 3 and got this:
> >
> > [EMAIL PROTECTED] ~]# devfsadm -C
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: opening
> > session, guid = 0002c901097651d1, prefix = 0000000000000003
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open(): port
> > exists
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_add_client:
> > num_registered_clients 2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: clientp =
> > 30001e97068, subnetp = 300024b0c50
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_add_event_subscriber:
> > Adding client to event subscriber list, client = 0x1e97068
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_access_start() enter.
> > attr_id = 0x35, access_type = 0x0, comp_mask = 0000000000001808
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support:
> > cap_mask = 0x202, attr_id = 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support()
> > exiting, attr_supported = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_populate_ud_dest_list():
> > Count not below low water mark
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Sending
> > MAD, class = 0x3, method = 0x12, attr_id = 0x35
> That's SA GetTable for PathRecords of some sort.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf:
> > ibmf_saa_impl_get_attr_id_length(): attr_id: 0x35 size 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Packed
> > payload successfully, attr_id = 0x35, length = 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg() exiting
> > ibmf_status = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): Added
> > message, msgp = 0x30003968200, class = 0x3, method = 0x12, attributeID =
> > 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp =
> > 0x30003968200, TID = 0x97651d100000005, transp_op_flags = 0x2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp =
> > 0x30003968200, local_lid = 0x2, remote_lid = 0x1, remote_qpn = 0x1, block =
> > 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport():
> > unsetting timer 30003968200 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): blocking
> > for completion, msgp = 0x30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl():
> > Sequenced transaction, setting response timer msgp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting
> > response timer, interval = 1073745 resp_time 4 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send
> > callback done. Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received
> > MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to
> > msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1,
> > mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling
> > rmpp MAD, tid = 097651d100000005,flags = 0x7 rmpp_type = 1, rmpp_segnum = 0
> This is the SA response of DATA packet indicating First and Last (and Active).
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): first RMPP
> > pkt received, msgimplp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time
> > received, resp_time 0
> Oops. I forgot about setting RRespTime in the RMPP header too.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow():
> > DATA packet received, processing packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main():
> > segnum = 0, es = 1, wl = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main():
> > Unexpected segment number, discarding packet
> I also need to set SegmentNumber (to 1 as this is a First packet) and
> PayloadLength in the RMPP header for the DATA packet.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_rmpp(): msgp =
> > 0x30003968200, next_seg = 0x0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp =
> > 30003968200, rmpp_type = 2, next_seg = 0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp =
> > 30003968200, rmpp_type = 2, rmpp_flags = 0x1, rmpp_segnum = 0, pyld_nwl = 5
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting
> > response timer, interval = 1073742 resp_time 1 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl(): Received
> > send callback for RMPP trans msgp = 30003968200, rmpp_state = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send
> > callback done. Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received
> > MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to
> > msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1,
> > mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling
> > rmpp MAD, tid = 097651d100000005,flags = 0x1 rmpp_type = 2, rmpp_segnum = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time
> > received, resp_time 14
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow():
> > ACK packet received, discarding packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting
> > response timer, interval = 1090125 resp_time 4000 round trip time 10624d
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): resetting
> > id - 893736
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): Message
> > not in undefined state, return without processing send timeout, msgp =
> > 0x30003968200
> >
> > This hangs now and is unkillable. Never returns.
> >
> > So, setting the rmpp_version presumably makes Solaris even more confused.
>
> I forgot about the other fields in the packet that need setting.
>
> I am not sure whether we are getting deeper into a rat hole yet. Are you
> willing to keep going ?

Yeah, sure. I'll test any patches you send my way...

-tduffy
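
For anyone following the thread, here is a rough sketch of how the RMPP fields Hal mentions fit together. It is not taken from ibmf or OpenSM; the function names are made up for illustration, and it assumes the usual RMPP header layout in which one byte carries RRespTime in its upper 5 bits and the flags (Active = 0x1, First = 0x2, Last = 0x4) in its lower 3 bits, with type codes DATA = 1 and ACK = 2 as seen in the log. The segment check mirrors the "segnum = 0, es = 1" rejection above but is only a guess at the receiver's logic.

```python
# Sketch only: hypothetical helpers, not ibmf or OpenSM code.
RMPP_TYPES = {1: "DATA", 2: "ACK", 3: "STOP", 4: "ABORT"}
RMPP_FLAGS = {0x1: "Active", 0x2: "First", 0x4: "Last"}


def split_rtime_flags(byte):
    """Split the combined byte into (RRespTime, flags).

    Assumes RRespTime is the upper 5 bits and flags the lower 3 bits.
    """
    return byte >> 3, byte & 0x7


def flag_names(flags):
    """Return the names of the flag bits that are set."""
    return [name for bit, name in RMPP_FLAGS.items() if flags & bit]


def check_data_segment(rmpp_type, flags, segnum, expected_segnum):
    """Guess at a receiver-side sanity check for a DATA packet."""
    if RMPP_TYPES.get(rmpp_type) != "DATA":
        return "not a DATA packet"
    # A First packet is segment 1, as noted in the thread.
    if flags & 0x2 and segnum != 1:
        return "First packet must carry SegmentNumber 1"
    if segnum != expected_segnum:
        return "unexpected segment number"
    return "ok"


# Values from the log: flags = 0x7, rmpp_type = 1, rmpp_segnum = 0, es = 1.
print(flag_names(0x7))
print(check_data_segment(1, 0x7, 0, 1))
# The same packet with SegmentNumber set to 1.
print(check_data_segment(1, 0x7, 1, 1))
```

With the values from the log the first check fails because the segment number is 0; setting SegmentNumber to 1 makes this particular check pass, though the real receiver may well look at more than this.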
