On 01/31/2017 06:08 PM, Chad Dupuis wrote:

On Mon, 30 Jan 2017, 10:34am -0000, Hannes Reinecke wrote:

On 01/25/2017 09:33 PM, Dupuis, Chad wrote:
From: "Dupuis, Chad" <chad.dup...@cavium.com>

[ .. ]
+       if (opcode == ELS_LS_RJT) {
+               rjt = fc_frame_payload_get(fp, sizeof(*rjt));
+               QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
+                   "Received LS_RJT for REC: er_reason=0x%x, "
+                   "er_explan=0x%x.\n", rjt->er_reason, rjt->er_explan);
+               /*
+                * The following response(s) mean that we need to reissue the
+                * request on another exchange.  We need to do this without
+                * informing the upper layers lest it cause an application
+                * error.
+                */
+               if ((rjt->er_reason == ELS_RJT_LOGIC ||
+                   rjt->er_reason == ELS_RJT_UNAB) &&
+                   rjt->er_explan == ELS_EXPL_OXID_RXID) {
+                       QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_ELS,
+                           "Handle CMD LOST case.\n");
+                       qedf_requeue_io_req(orig_io_req);
+               }

Care to explain this?
I found requeing within the driver without notifying the upper layers
deeply troubling.
I've came across this (or a similar issue) during implementing
FCoE over virtio; there it turned out to be an issue with the target not
handling frames in the correct order.
So it looks you are attempting to solve the problem of a REC being sent
for a command which is not know at the target.
Meaning it's either lost in the fabric, hasn't been seen by the target
(yet), or having already been processed by the target.

The above code would only solve the second problem; however, that would
indicate a command ordering issue as the REC would have arrived at the
target _before_ the originating command.
So doing a resend would only help for _that_ specific case, not for the
others.
And it's not quite clear why resending with a different OXID would help
here; typically it's the _RXID_ which isn't found ...

The problem I've observed is that If I return an error to the SCSI layer
it propogates up to the st driver and the test application would fail
since st would fail the I/O.  As the comment explains we do this on
another exchange internally in the driver so as not to cause the tape
backup application/test to fail.

bnx2fc behaves like this IIRC.

Ah, yes.
I've had this discussion with James Smart, too.
When a command abort fails with 'Invalid OXID/RXID' we should be retrying the command with a different set of IDs, as the original
need to quarantined.
(Which, incidentaily, you don't do, right? You cannot re-use that particular exchange until you either got a response for it or you reset the exchange manager. If you don't you run into the risk of getting an invalid completion if for some reason the command _does_ return eventually...)

Even though I'd rather like to have it resolved in libfc or even the SCSI midlayer itself, it's not something which should hold off the entire submission.


[ .. ]
+static void qedf_fcoe_process_vlan_resp(struct qedf_ctx *qedf,
+       struct sk_buff *skb)
+{
+       struct fip_header *fiph;
+       struct fip_desc *desc;
+       u16 vid = 0;
+       ssize_t rlen;
+       size_t dlen;
+
+       fiph = (struct fip_header *)(((void *)skb->data) + 2 * ETH_ALEN + 2);
+
+       rlen = ntohs(fiph->fip_dl_len) * 4;
+       desc = (struct fip_desc *)(fiph + 1);
+       while (rlen > 0) {
+               dlen = desc->fip_dlen * FIP_BPW;
+               switch (desc->fip_dtype) {
+               case FIP_DT_VLAN:
+                       vid = ntohs(((struct fip_vlan_desc *)desc)->fd_vlan);
+                       break;
+               }
+               desc = (struct fip_desc *)((char *)desc + dlen);
+               rlen -= dlen;
+       }
+
+       QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_DISC, "VLAN response, "
+                  "vid=0x%x.\n", vid);
+
+       if (vid > 0 && qedf->vlan_id != vid) {
+               qedf_set_vlan_id(qedf, vid);
+
+               /* Inform waiter that it's ok to call fcoe_ctlr_link up() */
+               complete(&qedf->fipvlan_compl);
+       }
+}
+
As mentioned, this will fail for HP VirtualConnect, which return a vid
of '0', indicating that the FCoE link should run on the interface itself.
And this is actually a valid response, I would think that you should be
calling complete() for any response ...

In testing I would get responses that would seem to contain VID 0 but then
subsequent responses contain the valid VID.  If I try to go with a VID of
0 FIP discovery would fail.  It could be possible that there's a bug in
the code that parses the FIP VLAN response but I'll like to leave this in
for now and it can be removed once I figure where the issue may be.

Okay.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Reply via email to