On Tue, 15 Dec 2009 10:16:42 -0500 Hal Rosenstock <hal.rosenst...@gmail.com> wrote:
> Ira, > > On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny <wei...@llnl.gov> wrote: > > Sasha, Hal, > > > > I have found that the following patch caused our SRP connected storage to > > break. > > What is causing the SRP target to fail ? Is it a non zero hop limit > returned in the SA PathRecord ? I am not sure exactly because I don't know what the SRP Target is seeing. (pesky firmware...) :-( > > > patch: 3d20f82edd3246879063b77721d0bcef927bdc48 > > > > opensm/osm_sa_path_record.c: separate router guid resolution code > > > > Move off subnet destination (router address) resolution code to separate > > function to improve readability. > > > > Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com> > > > > I further traced the problem to pr_rcv_build_pr and the patch below fixes > > the > > bug. > > On quick glance, I'm missing the connection between Sasha's patch and > this routine setting something bad in the hop limit of the returned > path record. It sets p_dgid even when we are not using a router which results in is_nonzero_gid being true. > > > 16:03:34 > git diff > > diff --git a/opensm/opensm/osm_sa_path_record.c > > b/opensm/opensm/osm_sa_path_record.c > > index be0cd71..32c889f 100644 > > --- a/opensm/opensm/osm_sa_path_record.c > > +++ b/opensm/opensm/osm_sa_path_record.c > > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const > > osm_port_t * p_src_port, > > > > /* Only set HopLimit if going through a router */ > > if (is_nonzero_gid) > > - p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX); > > + p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX; > > This may fix this issue but it doesn't look right to me and I think > would break router scenarios and corrupts other field(s). Path record > fields are in network (not host) order. But HopLimit is an 8bit field? I might be wrong on this but I was thinking this set RawTraffic and parts of Flow label to 1's. I should double check that. Also, I wonder if the FW on the other side is flipping this wrong? :-/ I have not been able to determine this and I still have to test with our other vendor's SRP target... > > > p_pr->pkey = p_parms->pkey; > > ib_path_rec_set_sl(p_pr, p_parms->sl); > > > > > > "hop_flow_raw" is not really a 32bit value but rather 4 fields: > > Component Length(bits) Offset(bits) > > RawTraffic 1 352 > > Reserved 3 353 > > FlowLabel 20 356 > > HopLimit 8 384 > > > > > > However, I don't understand the comment "Only set HopLimit if going through > > a > > router"? > > Well, in a sense the HopLimit is always set. It's already set to 0 > right above that code snippet in the patch where: > p_pr->hop_flow_raw &= cl_hton32(1 << 31); True, however, I believe the behavior before was that the entire field is 0 so the above statement really does nothing. > This code is only setting the HopLimit to 255 (as this is very > primitive so far) when going through a router. Maybe it could be > stated better. That is exactly what Sasha's patch changes. This is not only executed when going through a router but at all times DGID is specified in the CompMask. I am willing to accept the fact that perhaps p_dgid should not be set unless we find a router, however, I __don't__ think that is correct. Searching the spec I attempted to find where HopLimit was supposed to be 0xFF only when going through a router and I could not find it; therefore my email, I'm confused again... :-( > > > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading > > through a router? I don't think it matters if we are going through a router > > or not does it? > > It's used when DGID is selected in the SA PR query via the comp mask > and in that instance is used for both the router and non router DGID > cases. So we agree that Sasha's patch is correct? > > > If not, I think the comment needs to be removed in the patch above. > > I'm not following why you say this. Well, if HopLimit must only be set to 0xff when going through a router and Sasha's patch is correct, then I think we need another way to determine if this path is going through a router. In that case the comment is correct but the logic is wrong. Ira > > -- Hal > > > Thanks, > > Ira > > > > -- > > Ira Weiny > > Math Programmer/Computer Scientist > > Lawrence Livermore National Lab > > 925-423-8008 > > wei...@llnl.gov > > -- Ira Weiny Math Programmer/Computer Scientist Lawrence Livermore National Lab 925-423-8008 wei...@llnl.gov -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html