Carlos,

What specifically are your testing steps?

From the sd driver, we normally call scsi_transport() to send down a command, with the argument 'xp->xb_pktp', and scsi_transport() will call tran_start(). As you said, you found xb_pktp to be NULL in tran_start(). If that were the case, I think the system should panic before the sd driver even calls scsi_transport(), because the sd driver accesses xb_pktp many times before that.
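To narrow this down, a debug-only check along the following lines could go at the top of tran_start(). This is only a sketch: the three structures are simplified stand-ins that model just the fields discussed in this thread (they are not the real Solaris definitions), and check_xb_pktp() and the xb_check codes are hypothetical names invented for illustration. It follows the same chain as the mdb analysis quoted below (pkt_private to the buf, b_private to the sd_xbuf, then xb_pktp) and reports where the chain breaks.

```c
#include <stddef.h>

/*
 * Simplified stand-ins for the real structures. Assumption: only the
 * fields mentioned in this thread are modelled here.
 */
struct scsi_pkt;

struct sd_xbuf {
	struct scsi_pkt *xb_pktp;	/* sd's pointer back to the packet */
};

struct buf {
	void *b_private;		/* sd keeps its struct sd_xbuf * here */
};

struct scsi_pkt {
	void *pkt_private;		/* sd keeps its struct buf * here */
};

/* Hypothetical result codes for the debug check. */
enum xb_check {
	XB_OK = 0,		/* chain intact, xb_pktp points at pkt */
	XB_NO_BUF,		/* pkt or pkt_private is NULL */
	XB_NO_XBUF,		/* b_private is NULL */
	XB_PKTP_NULL,		/* xb_pktp is NULL (the case reported) */
	XB_PKTP_MISMATCH	/* xb_pktp points at a different packet */
};

/*
 * Walk pkt -> buf -> sd_xbuf and verify that xb_pktp refers back to
 * the packet that was handed to the HBA driver.
 */
static enum xb_check
check_xb_pktp(const struct scsi_pkt *pkt)
{
	const struct buf *bp;
	const struct sd_xbuf *xp;

	if (pkt == NULL || pkt->pkt_private == NULL)
		return (XB_NO_BUF);
	bp = pkt->pkt_private;

	if (bp->b_private == NULL)
		return (XB_NO_XBUF);
	xp = bp->b_private;

	if (xp->xb_pktp == NULL)
		return (XB_PKTP_NULL);
	if (xp->xb_pktp != pkt)
		return (XB_PKTP_MISMATCH);

	return (XB_OK);
}
```

Since pkt_private belongs to the target driver, an HBA driver should not rely on its layout in production code; a check like this only makes sense as temporary instrumentation. Logging the result both in tran_start() and just before the completion callback would show whether xb_pktp is already NULL on the way down or only becomes NULL later.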
Could you please redo your testing with the debug version of the sd driver and upload your core files at http://supportfiles.sun.com/upload?

Thanks,
Nikko

Carlos Cumming wrote:
>
> I am getting a panic in sd.c:sd_retry_command because sd.c is not
> setting pkt->pkt_private->xb_pktp. Since xb_pktp is a sd.c private
> pointer, I am confident that my hba driver is not breaking it. Just in
> case, I added instrumentation in tran_start and am finding xb_pktp is
> sometimes coming down == NULL.
>
> The panic occurs while I am testing my driver's reset path by forcing a
> timeout. This causes the command to be completed with pkt->pkt_reason ==
> CMD_TIMEOUT. Subsequently the command is retried by
> sd.c:sd_retry_command when the panic occurs. I found xb_pktp == NULL by
> examining the dump with mdb. Full analysis at the end...
>
> I am also using the newer tran_setup_pkt(..) interface instead of the
> tran_init_pkt(..) and friends interface that many of the other SCSI hba
> drivers use. I suspect that since this interface is not used as often,
> there may be some lurking bugs.
>
> Anybody run into this already? I'd hate to start debugging sd.c when
> someone else already has a clue what's going on.
>
> Thanks, carlos
>
>
> I have a variable called $MDB that automatically runs mdb on the last
> dump in /var/crashes.
>
> Here is my stack frame:
>
> ffffff0007ad1aa0 sdintr+0x2b1(ffffff01ded15a80)
> ffffff0007ad1b00 tw_abort_requests+0x260()
> ffffff0007ad1b20 tw_initiate_reset+0x47()
> ffffff0007ad1b80 tw_timeout+0x26b()
> ffffff0007ad1bd0 callout_execute+0xb1(ffffff01cb3d3000)
>
> sdintr is called to complete the commands. Its argument is a scsi_pkt *:
>
> # echo "0xffffff01ded15a80::print struct scsi_pkt" | $MDB
> {
>     pkt_ha_private = 0xffffff01ded15b50
>     pkt_address = {
>         a_hba_tran = 0xffffff01ddc75380
>         a_target = 0
>         a_lun = 0
>         a_sublun = 0
>     }
>     pkt_private = 0xffffff01df03de88 < buf pointer.
>     pkt_comp = sdintr
> ...
>
> Given pkt_private above, we can find the buf:
>
> # echo "0xffffff01df03de88::print struct buf" | $MDB
> {
>     b_flags = 0x502
>     b_forw = 0xffffff01df03de00
>     b_back = 0
>     av_forw = 0
>     av_back = 0
>     b_dev = 0xffff
>     b_bcount = 0x2000
>     b_un = {
>         b_addr = 0x8075cb0
>         b_fs = 0x8075cb0
>         b_cg = 0x8075cb0
>         b_dino = 0x8075cb0
>         b_daddr = 0x8075cb0
>     }
>     _b_blkno = {
>         _f = 0x5460220
>         _p = {
>             _l = 0x5460220
>             _u = 0
>         }
>     }
>     b_obs1 = '\0'
>     b_resid = 0
>     b_start = 0
>     b_proc = 0xffffff01de8881c0
>     b_pages = 0
>     b_obs2 = 0
>     b_bufsize = 0
>     b_iodone = aio_done
>     b_vp = 0
>     b_chain = 0
>     b_obs3 = 0
>     b_error = 0
>     b_private = 0xffffff01d6144340 < struct sd_xbuf *
>     b_edev = 0x1c00000242
>     b_sem = {
>         _opaque = [ 0, 0 ]
>     }
> ...
>
> Given b_private above, we can find the sd_xbuf:
>
> # echo "0xffffff01d6144340::print struct sd_xbuf" | $MDB
> {
>     xb_un = 0xffffff01cccd2200
>     xb_pktp = 0 < Oooops...
>     xb_pktinfo = 0
>     xb_private = 0xffffff01d6144340
>     xb_blkno = 0x54640e1
>     xb_chain_iostart = 0x3
>     xb_chain_iodone = 0x4
>     xb_pkt_flags = 0
>     xb_dma_resid = 0
>     xb_retry_count = 0
>     xb_victim_retry_count = 0
>     xb_ua_retry_count = 0
> ...
>
> As you can see, xb_pktp above is NULL.
>
> _______________________________________________
> driver-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/driver-discuss
