On Tue, 15 Dec 2009 10:16:42 -0500
Hal Rosenstock <hal.rosenst...@gmail.com> wrote:

> Ira,
> 
> On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny <wei...@llnl.gov> wrote:
> > Sasha, Hal,
> >
> > I have found that the following patch caused our SRP connected storage to
> > break.
> 
> What is causing the SRP target to fail ? Is it a non zero hop limit
> returned in the SA PathRecord ?

I am not sure exactly because I don't know what the SRP Target is seeing.
(pesky firmware...)  :-(

> 
> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
> >
> >    opensm/osm_sa_path_record.c: separate router guid resolution code
> >
> >    Move off subnet destination (router address) resolution code to separate
> >    function to improve readability.
> >
> >    Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com>
> >
> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
> > the
> > bug.
> 
> On quick glance, I'm missing the connection between Sasha's patch and
> this routine setting something bad in the hop limit of the returned
> path record.

It sets p_dgid even when we are not using a router which results in
is_nonzero_gid being true.

> 
> > 16:03:34 > git diff
> > diff --git a/opensm/opensm/osm_sa_path_record.c 
> > b/opensm/opensm/osm_sa_path_record.c
> > index be0cd71..32c889f 100644
> > --- a/opensm/opensm/osm_sa_path_record.c
> > +++ b/opensm/opensm/osm_sa_path_record.c
> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> > osm_port_t * p_src_port,
> >
> >        /* Only set HopLimit if going through a router */
> >        if (is_nonzero_gid)
> > -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> > +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
> 
> This may fix this issue but it doesn't look right to me and I think
> would break router scenarios and corrupts other field(s). Path record
> fields are in network (not host) order.

But HopLimit is an 8bit field?  I might be wrong on this but I was thinking
this set RawTraffic and parts of Flow label to 1's.  I should double check
that.  Also, I wonder if the FW on the other side is flipping this wrong?  :-/
I have not been able to determine this and I still have to test with our other
vendor's SRP target...

> 
> >        p_pr->pkey = p_parms->pkey;
> >        ib_path_rec_set_sl(p_pr, p_parms->sl);
> >
> >
> > "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> > Component     Length(bits)   Offset(bits)
> > RawTraffic    1              352
> > Reserved      3              353
> > FlowLabel     20             356
> > HopLimit      8              384
> >
> >
> > However, I don't understand the comment "Only set HopLimit if going through 
> > a
> > router"?
> 
> Well, in a sense the HopLimit is always set. It's already set to 0
> right above that code snippet in the patch where:
> p_pr->hop_flow_raw &= cl_hton32(1 << 31);

True, however, I believe the behavior before was that the entire field is 0
so the above statement really does nothing.

> This code is only setting the HopLimit to 255 (as this is very
> primitive so far) when going through a router. Maybe it could be
> stated better.

That is exactly what Sasha's patch changes.  This is not only executed when
going through a router but at all times DGID is specified in the CompMask.  I
am willing to accept the fact that perhaps p_dgid should not be set unless we
find a router, however, I __don't__ think that is correct.  Searching the spec
I attempted to find where HopLimit was supposed to be 0xFF only when going
through a router and I could not find it; therefore my email, I'm confused
again...  :-(

> 
> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> > through a router? I don't think it matters if we are going through a router
> > or not does it?
> 
> It's used when DGID is selected in the SA PR query via the comp mask
> and in that instance is used for both the router and non router DGID
> cases.

So we agree that Sasha's patch is correct?

> 
> > If not, I think the comment needs to be removed in the patch above.
> 
> I'm not following why you say this.

Well, if HopLimit must only be set to 0xff when going through a router and
Sasha's patch is correct, then I think we need another way to determine if
this path is going through a router.  In that case the comment is correct but
the logic is wrong.

Ira

> 
> -- Hal
> 
> > Thanks,
> > Ira
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > wei...@llnl.gov
> >


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to