It's been a long time since I've worked with interrupt hooks. If I'm right that iPXE stays around forever, not sure if it might be possible to add functionality to the interrupt hook to signify that it's OK to logout from the SRP connection.
How's it work with iscsi booting? Do the TCP connections just not care about a second blasting the first away, unlike on InfiniBand? On Fri, Aug 19, 2016 at 11:56 PM, james harvey <jamespharve...@gmail.com> wrote: > This patch is incomplete. Although it shouldn't hurt to apply, it > prepares for a functional change but does not quite get there, and > would need a bit of polishing. > > Problem - Running "dhcp; sanhook <args>; sanboot" propely initiates an > SRP connection, and starts the boot loader. iPXE nevers performs an > SRP_I_LOGOUT, even though it's already defined in include/ipxe/srp.h. > After the kernel has booted and the sBFT parsed, attempting to start > an SRP connection confuses the target machine's kernel, causing SRP > commands to abort for about 10-30 seconds. The only workaround I can > find once booted is to continually repeat attempting the SRP > connection, waiting a second, and retrying. The error I get for 10-30 > seconds is: > > ===== > scsi host7: ib_srp: Got failed path rec status -22 > scsi host7: ib_srp: Path record query failed > scsi host7: ib_srp: Connection 0/4 failed > scsi host7: ib_srp: Sending CM DREQ failed > ===== > > After 10-30 seconds, the target kernel shows: > > ===== > [ 115.941418] ib_srpt Relogin - closed existing channel > 0x1d534b0003c902000002c903004b5275 > [ 115.941470] ib_srpt Received CM DREP message for ch > 0x1d534b0003c902000002c903004b5275-522. > ===== > > And the new SRP connection is made. I'm successfully able to make the > SRP connection, and successfully boot with the new root. > > With interesting timing, kernel 4.7 just got released in Arch linux 13 > days ago. And, in 4.7, ib_srpt was patched to convert to the generic > "RDMA READ/WRITE API" which never closes the existing channel and > allows a new connection, but just blocks a new connection forever. > > So the issue started with < kernel 4.7 as just a 10-30 second delay > during bootup, but now with >= 4.7 completely blocks booting. (I'm > going to follow up with a kernel regression, but the best way to > handle this is to close the initial iPXE connection.) > > This patch implements the missing srp_initiator_logout(), and modifies > srpdev_close() to call it if srpdev->logged_in is set. > > However, in a successful "dhcp; sanhook <args>; sanboot" sequence, > srpdev_close() is never called. (The (3) dbg_printf's I added in > srpdev_close, and (1) in ib_srp.c, never execute - only temporarily > using dbg_printf to avoid building with DEBUG to only see these debug > errors.) The target kernel never gets an SRP_I_LOGOUT command. > > srpdev_close is only referenced in src/drivers/block/srp.c in these locations: > (1) In srpdev_deliver, within the label err. > (2) In srp_open(), within the labels err_scsi_open and err_login. > (3) In static struct interface_operation srpdev_socket_op[] as > INTF_OP ( intf_close, struct srp_device *, srpdev_close ) > (4) In static struct interface_operation srpdev_scsi_op[] as > INTF_OP ( intf_close, struct srp_device *, srpdev_close ) > > srpdev_socket_op is only referenced in srp.c within > static struct interface_descriptor srpdev_socket_desc as > INTF_DESC_PASSTHRU ( struct srp_device, socket, srpdev_socket_op, scsi ) > > srpdev_scsi_op is only referenced in srp.c within > static struct interface_descriptor srpdev_scsi_desc as > INTF_DESC_PASSTHRU ( struct srp_device, scsi, srpdev_scsi_op, socket ) > > srpdev_scsi_desc and srpdev_socket_desc are only referenced in > srp.c::srp_open, given to intf_init > > Looking in net/infiniband/ib_srp.c::ib_srp_close(), it does call > intf_shutdown( &ib_srp->cmrc, rc ) and intf_shutdown ( &ib_srp->srp, > rc ). > > At about that point, I started thinking that iPXE probably never knows > that it should fully exit, and that it's probably left in lowmem > forever. Am I right here? I'm not sure there's a point where iPXE > currently knows it can safely fully exit and disable the interrupt > hook. So, I'm looking for guidance. > > Do we need to add to the sBFT specification, to include a memory > address of new function which would call srp_initiator_logout with its > struct srp_device *srpdev, if I'm correct that iPXE is basically > staying in low mem forever and never really exiting? Not sure if I'm > understanding that correctly. > > Patch is below and attached. > > Submitted by: James P. Harvey <jamespharve...@gmail.com> > > --- > src/drivers/block/srp.c | 59 > +++++++++++++++++++++++++++++++++++++++++++++ > src/net/infiniband/ib_srp.c | 1 + > 2 files changed, 60 insertions(+) > > diff --git a/src/drivers/block/srp.c b/src/drivers/block/srp.c > index 85c7aa8..8ab4a97 100644 > --- a/src/drivers/block/srp.c > +++ b/src/drivers/block/srp.c > @@ -229,15 +229,33 @@ static void srpcmd_close ( struct srp_command > *srpcmd, int rc ) { > * @v srpdev SRP device > * @v rc Reason for close > */ > +static int srp_new_tag ( struct srp_device *srpdev ); > +static int srp_initiator_logout ( struct srp_device *srpdev, uint32_t tag ); > static void srpdev_close ( struct srp_device *srpdev, int rc ) { > struct srp_command *srpcmd; > struct srp_command *tmp; > > + dbg_printf ( "!srpdev_close()\n" ); > if ( rc != 0 ) { > DBGC ( srpdev, "SRP %p closed: %s\n", > srpdev, strerror ( rc ) ); > } > > + /* If logged in, attempt logging out */ > + if ( srpdev->logged_in ) { > + dbg_printf ( "!srpdev->logged_in true\n" ); > + int logout_rc; > + int tag; > + > + tag = srp_new_tag ( srpdev ); > + if ( tag < 0 ) > + DBGC ( srpdev, "SRP %p tag %08x invalid tag\n", srpdev, tag ); > + else if ( ( logout_rc = srp_initiator_logout ( srpdev, tag ) ) != 0 ) > + DBGC ( srpdev, "SRP %p tag %08x cannot logout\n", srpdev, tag ); > + } else { > + dbg_printf ( "!srpdev->logged_in false\n" ); > + } > + > /* Shut down interfaces */ > intf_shutdown ( &srpdev->socket, rc ); > intf_shutdown ( &srpdev->scsi, rc ); > @@ -393,6 +411,47 @@ static int srp_login_rej ( struct srp_device *srpdev, > } > > /** > + * Transmit SRP initiator logout > + * > + * @v srpdev SRP device > + * @v tag Command tag > + * @ret rc Return status code > + */ > +static int srp_initiator_logout ( struct srp_device *srpdev, uint32_t tag ) { > + struct io_buffer *iobuf; > + struct srp_i_logout *i_logout; > + int rc; > + > + /* Allocate I/O buffer */ > + iobuf = xfer_alloc_iob ( &srpdev->socket, sizeof ( *i_logout ) ); > + if ( ! iobuf ) > + return -ENOMEM; > + > + /* Construct initiator logout IU */ > + i_logout = iob_put ( iobuf, sizeof ( *i_logout ) ); > + memset ( i_logout, 0, sizeof ( *i_logout ) ); > + i_logout->type = SRP_I_LOGOUT; > + i_logout->tag.dwords[0] = htonl ( SRP_TAG_MAGIC ); > + i_logout->tag.dwords[1] = htonl ( tag ); > + > + DBGC ( srpdev, "SRP %p tag %08x I_LOGOUT:\n", srpdev, tag ); > + DBGC_HDA ( srpdev, 0, iobuf->data, iob_len ( iobuf ) ); > + > + /* Send initiator logout IU */ > + if ( ( rc = xfer_deliver_iob ( &srpdev->socket, iobuf ) ) != 0 ) { > + DBGC ( srpdev, "SRP %p tag %08x could not send I_LOGOUT: " > + "%s\n", srpdev, tag, strerror ( rc ) ); > + return rc; > + } > + > + /* Mark as logged out */ > + srpdev->logged_in = 0; > + DBGC ( srpdev, "SRP %p logged out\n", srpdev ); > + > + return 0; > +} > + > +/** > * Transmit SRP SCSI command > * > * @v srpdev SRP device > diff --git a/src/net/infiniband/ib_srp.c b/src/net/infiniband/ib_srp.c > index 904a99b..2c718ef 100644 > --- a/src/net/infiniband/ib_srp.c > +++ b/src/net/infiniband/ib_srp.c > @@ -106,6 +106,7 @@ static void ib_srp_free ( struct refcnt *refcnt ) { > * @v rc Reason for close > */ > static void ib_srp_close ( struct ib_srp_device *ib_srp, int rc ) { > + dbg_printf ("ib_srp_close\n" ); > /* Shut down interfaces */ > intf_shutdown ( &ib_srp->cmrc, rc ); > intf_shutdown ( &ib_srp->srp, rc ); > -- > 2.9.0 _______________________________________________ ipxe-devel mailing list ipxe-devel@lists.ipxe.org https://lists.ipxe.org/mailman/listinfo.cgi/ipxe-devel