On Tue, Dec 15, 2009 at 12:09 PM, Ira Weiny <wei...@llnl.gov> wrote: > On Tue, 15 Dec 2009 10:16:42 -0500 > Hal Rosenstock <hal.rosenst...@gmail.com> wrote: > >> Ira, >> >> On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny <wei...@llnl.gov> wrote: >> > Sasha, Hal, >> > >> > I have found that the following patch caused our SRP connected storage to >> > break. >> >> What is causing the SRP target to fail ? Is it a non zero hop limit >> returned in the SA PathRecord ? > > I am not sure exactly because I don't know what the SRP Target is seeing. > (pesky firmware...) :-( > >> >> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48 >> > >> > opensm/osm_sa_path_record.c: separate router guid resolution code >> > >> > Move off subnet destination (router address) resolution code to separate >> > function to improve readability. >> > >> > Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com> >> > >> > I further traced the problem to pr_rcv_build_pr and the patch below fixes >> > the >> > bug. >> >> On quick glance, I'm missing the connection between Sasha's patch and >> this routine setting something bad in the hop limit of the returned >> path record. > > It sets p_dgid even when we are not using a router which results in > is_nonzero_gid being true.
Yes, that's the cause of the issue. >> >> > 16:03:34 > git diff >> > diff --git a/opensm/opensm/osm_sa_path_record.c >> > b/opensm/opensm/osm_sa_path_record.c >> > index be0cd71..32c889f 100644 >> > --- a/opensm/opensm/osm_sa_path_record.c >> > +++ b/opensm/opensm/osm_sa_path_record.c >> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const >> > osm_port_t * p_src_port, >> > >> > /* Only set HopLimit if going through a router */ >> > if (is_nonzero_gid) >> > - p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX); >> > + p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX; >> >> This may fix this issue but it doesn't look right to me and I think >> would break router scenarios and corrupts other field(s). Path record >> fields are in network (not host) order. > > But HopLimit is an 8bit field? Yes but in terms of the ib_path_rec_t definition hop_flow_raw is a 32 bit field. > I might be wrong on this but I was thinking > this set RawTraffic and parts of Flow label to 1's. I should double check > that. Also, I wonder if the FW on the other side is flipping this wrong? :-/ > I have not been able to determine this and I still have to test with our other > vendor's SRP target... > >> >> > p_pr->pkey = p_parms->pkey; >> > ib_path_rec_set_sl(p_pr, p_parms->sl); >> > >> > >> > "hop_flow_raw" is not really a 32bit value but rather 4 fields: >> > Component Length(bits) Offset(bits) >> > RawTraffic 1 352 >> > Reserved 3 353 >> > FlowLabel 20 356 >> > HopLimit 8 384 >> > >> > >> > However, I don't understand the comment "Only set HopLimit if going >> > through a >> > router"? >> >> Well, in a sense the HopLimit is always set. It's already set to 0 >> right above that code snippet in the patch where: >> p_pr->hop_flow_raw &= cl_hton32(1 << 31); > > True, however, I believe the behavior before was that the entire field is 0 > so the above statement really does nothing. For all practical purposes that is true as raw traffic is deprecated. >> This code is only setting the HopLimit to 255 (as this is very >> primitive so far) when going through a router. Maybe it could be >> stated better. > > That is exactly what Sasha's patch changes. This is not only executed when > going through a router but at all times DGID is specified in the CompMask. I > am willing to accept the fact that perhaps p_dgid should not be set unless we > find a router, however, I __don't__ think that is correct. Searching the spec > I attempted to find where HopLimit was supposed to be 0xFF only when going > through a router and I could not find it; therefore my email, I'm confused > again... :-( When going through a router, it needs to be set to something other than 0 or 1. 255 (max) was used as there's nothing currently to scope it down to something more realistic. See IBA 1.2.1 vol 1 p.229 8.3.6 for one on settings for this field. >> >> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are >> > heading >> > through a router? I don't think it matters if we are going through a router >> > or not does it? >> >> It's used when DGID is selected in the SA PR query via the comp mask >> and in that instance is used for both the router and non router DGID >> cases. > > So we agree that Sasha's patch is correct? Yes; it restores this back the way it used to be when it worked. -- Hal >> >> > If not, I think the comment needs to be removed in the patch above. >> >> I'm not following why you say this. > > Well, if HopLimit must only be set to 0xff when going through a router and > Sasha's patch is correct, then I think we need another way to determine if > this path is going through a router. In that case the comment is correct but > the logic is wrong. > > Ira > >> >> -- Hal >> >> > Thanks, >> > Ira >> > >> > -- >> > Ira Weiny >> > Math Programmer/Computer Scientist >> > Lawrence Livermore National Lab >> > 925-423-8008 >> > wei...@llnl.gov >> > > > > -- > Ira Weiny > Math Programmer/Computer Scientist > Lawrence Livermore National Lab > 925-423-8008 > wei...@llnl.gov > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html