[ putting back on openib-general list ]

On Mon, 2005-01-17 at 15:27 -0500, Hal Rosenstock wrote:
> On Mon, 2005-01-17 at 14:47, Tom Duffy wrote:
> > On Sat, 2005-01-15 at 07:30 -0500, Hal Rosenstock wrote:
> > > I will have another patch later today which may actually get this to
> > > work now. I forgot (hopefully) one last thing.
> > 
> > After using the latest OpenSM, I am getting a hang on Solaris when
> > running devfsadm -C.  This is new behavior.  There are no debug outputs
> > when running at debug level 2, so I bumped it up to 3 and got this:
> > 
> > [EMAIL PROTECTED] ~]# devfsadm -C
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: opening 
> > session, guid = 0002c901097651d1, prefix = 0000000000000003
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open(): port 
> > exists
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_add_client: 
> > num_registered_clients 2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: clientp = 
> > 30001e97068, subnetp = 300024b0c50
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_add_event_subscriber: 
> > Adding client to event subscriber list, client = 0x1e97068
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_access_start() enter. 
> > attr_id = 0x35, access_type = 0x0, comp_mask = 0000000000001808
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support: 
> > cap_mask = 0x202, attr_id = 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support() 
> > exiting, attr_supported = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_populate_ud_dest_list(): 
> > Count not below low water mark
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Sending 
> > MAD, class = 0x3, method = 0x12, attr_id = 0x35
> That's SA GetTable for PathRecords of some sort.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: 
> > ibmf_saa_impl_get_attr_id_length(): attr_id: 0x35 size 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Packed 
> > payload successfully, attr_id = 0x35, length = 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg() exiting 
> > ibmf_status = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): Added 
> > message, msgp = 0x30003968200, class = 0x3, method = 0x12, attributeID = 
> > 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp = 
> > 0x30003968200, TID = 0x97651d100000005, transp_op_flags = 0x2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp = 
> > 0x30003968200, local_lid = 0x2, remote_lid = 0x1, remote_qpn = 0x1, block = 
> > 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): 
> > unsetting timer 30003968200 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): blocking 
> > for completion, msgp = 0x30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found 
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl(): 
> > Sequenced transaction, setting response timer msgp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting 
> > response timer, interval = 1073745 resp_time 4 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send 
> > callback done.  Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received 
> > MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to 
> > msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1, 
> > mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found 
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling 
> > rmpp MAD, tid = 097651d100000005,flags = 0x7 rmpp_type = 1, rmpp_segnum = 0
> This is the SA response: a DATA packet with First and Last (and Active) set.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): first RMPP 
> > pkt received, msgimplp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time 
> > received, resp_time 0
> Oops. I forgot about setting RRespTime in the RMPP header too.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow(): 
> > DATA packet received, processing packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main(): 
> > segnum = 0, es = 1, wl = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main(): 
> > Unexpected segment number, discarding packet
> I also need to set SegmentNumber (to 1 as this is a First packet) and 
> PayloadLength in the RMPP header for the DATA packet.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_rmpp(): msgp = 
> > 0x30003968200, next_seg = 0x0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp = 
> > 30003968200, rmpp_type = 2, next_seg = 0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp = 
> > 30003968200, rmpp_type = 2, rmpp_flags = 0x1, rmpp_segnum = 0, pyld_nwl = 5
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting 
> > response timer, interval = 1073742 resp_time 1 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found 
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl(): Received 
> > send callback for RMPP trans msgp = 30003968200, rmpp_state = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send 
> > callback done.  Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received 
> > MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to 
> > msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1, 
> > mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found 
> > message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling 
> > rmpp MAD, tid = 097651d100000005,flags = 0x1 rmpp_type = 2, rmpp_segnum = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time 
> > received, resp_time 14
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow(): 
> > ACK packet received, discarding packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting 
> > response timer, interval = 1090125 resp_time 4000 round trip time 10624d
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): resetting 
> > id - 893736
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): Message 
> > not in undefined state, return without processing send timeout, msgp = 
> > 0x30003968200
> > 
> > This hangs now and is unkillable.  Never returns.
> > 
> > So, setting the rmpp_version presumably makes Solaris even more confused.
> 
> I forgot about the other fields in the packet that need setting.
> 
> I am not sure whether we are getting deeper into a rat hole yet. Are you
> willing to keep going?

Yeah, sure.  I'll test any patches you send my way...
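
For anyone following the trace, here is a rough sketch of the receiver-side
check that the log suggests is rejecting the packet. It is not the Solaris
ibmf code or the OpenSM code; the class and function names are made up for
illustration. The type and flag values are the standard RMPP ones (DATA = 1,
ACK = 2; Active = 0x1, First = 0x2, Last = 0x4), which match the
"flags = 0x7 rmpp_type = 1" and "flags = 0x1 rmpp_type = 2" lines above.

```python
from dataclasses import dataclass

# Standard RMPP type and flag values; the Python names are illustrative only.
RMPP_TYPE_DATA = 1
RMPP_TYPE_ACK = 2
RMPP_FLAG_ACTIVE = 0x1
RMPP_FLAG_FIRST = 0x2
RMPP_FLAG_LAST = 0x4


@dataclass
class RmppHeader:
    """Hypothetical container for the RMPP fields shown in the trace."""
    rmpp_type: int
    flags: int
    segment_number: int
    payload_length: int = 0


def flag_names(flags):
    """Decode the 3-bit RMPP flags field into readable names."""
    names = []
    if flags & RMPP_FLAG_ACTIVE:
        names.append("Active")
    if flags & RMPP_FLAG_FIRST:
        names.append("First")
    if flags & RMPP_FLAG_LAST:
        names.append("Last")
    return names


def check_data_segment(hdr, expected_segment=1):
    """Return None if the DATA segment is acceptable, else a reason string.

    Assumption: the receiver compares SegmentNumber against the segment it
    expects next ("es" in the trace), and a First packet must carry 1.
    """
    if hdr.rmpp_type != RMPP_TYPE_DATA:
        return "not a DATA packet"
    if hdr.flags & RMPP_FLAG_FIRST and hdr.segment_number != 1:
        return "First packet must carry SegmentNumber 1"
    if hdr.segment_number != expected_segment:
        return "unexpected segment number"
    return None


# The response seen in the trace: DATA, flags 0x7, segnum 0.
seen = RmppHeader(RMPP_TYPE_DATA, 0x7, 0)
print(flag_names(seen.flags), check_data_segment(seen))

# The same packet with SegmentNumber set to 1, as Hal proposes.
fixed = RmppHeader(RMPP_TYPE_DATA, 0x7, 1, payload_length=64)
print(flag_names(fixed.flags), check_data_segment(fixed))
```

The payload_length of 64 in the second case is just a placeholder taken from
the PathRecord size in the log, not a statement of what the field should
contain on the wire.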

-tduffy
