On Tue, Dec 15, 2009 at 12:09 PM, Ira Weiny <wei...@llnl.gov> wrote:
> On Tue, 15 Dec 2009 10:16:42 -0500
> Hal Rosenstock <hal.rosenst...@gmail.com> wrote:
>
>> Ira,
>>
>> On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny <wei...@llnl.gov> wrote:
>> > Sasha, Hal,
>> >
>> > I have found that the following patch caused our SRP connected storage to
>> > break.
>>
>> What is causing the SRP target to fail ? Is it a non zero hop limit
>> returned in the SA PathRecord ?
>
> I am not sure exactly because I don't know what the SRP Target is seeing.
> (pesky firmware...)  :-(
>
>>
>> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
>> >
>> >    opensm/osm_sa_path_record.c: separate router guid resolution code
>> >
>> >    Move off subnet destination (router address) resolution code to separate
>> >    function to improve readability.
>> >
>> >    Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com>
>> >
>> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
>> > the
>> > bug.
>>
>> On quick glance, I'm missing the connection between Sasha's patch and
>> this routine setting something bad in the hop limit of the returned
>> path record.
>
> It sets p_dgid even when we are not using a router which results in
> is_nonzero_gid being true.

Yes, that's the cause of the issue.

>>
>> > 16:03:34 > git diff
>> > diff --git a/opensm/opensm/osm_sa_path_record.c 
>> > b/opensm/opensm/osm_sa_path_record.c
>> > index be0cd71..32c889f 100644
>> > --- a/opensm/opensm/osm_sa_path_record.c
>> > +++ b/opensm/opensm/osm_sa_path_record.c
>> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
>> > osm_port_t * p_src_port,
>> >
>> >        /* Only set HopLimit if going through a router */
>> >        if (is_nonzero_gid)
>> > -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
>> > +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
>>
>> This may fix this issue but it doesn't look right to me and I think
>> would break router scenarios and corrupts other field(s). Path record
>> fields are in network (not host) order.
>
> But HopLimit is an 8bit field?

Yes but in terms of the ib_path_rec_t definition hop_flow_raw is a 32 bit field.

> I might be wrong on this but I was thinking
> this set RawTraffic and parts of Flow label to 1's.  I should double check
> that.  Also, I wonder if the FW on the other side is flipping this wrong?  :-/
> I have not been able to determine this and I still have to test with our other
> vendor's SRP target...
>
>>
>> >        p_pr->pkey = p_parms->pkey;
>> >        ib_path_rec_set_sl(p_pr, p_parms->sl);
>> >
>> >
>> > "hop_flow_raw" is not really a 32bit value but rather 4 fields:
>> > Component     Length(bits)   Offset(bits)
>> > RawTraffic    1              352
>> > Reserved      3              353
>> > FlowLabel     20             356
>> > HopLimit      8              384
>> >
>> >
>> > However, I don't understand the comment "Only set HopLimit if going 
>> > through a
>> > router"?
>>
>> Well, in a sense the HopLimit is always set. It's already set to 0
>> right above that code snippet in the patch where:
>> p_pr->hop_flow_raw &= cl_hton32(1 << 31);
>
> True, however, I believe the behavior before was that the entire field is 0
> so the above statement really does nothing.

For all practical purposes that is true as raw traffic is deprecated.

>> This code is only setting the HopLimit to 255 (as this is very
>> primitive so far) when going through a router. Maybe it could be
>> stated better.
>
> That is exactly what Sasha's patch changes.  This is not only executed when
> going through a router but at all times DGID is specified in the CompMask.  I
> am willing to accept the fact that perhaps p_dgid should not be set unless we
> find a router, however, I __don't__ think that is correct.  Searching the spec
> I attempted to find where HopLimit was supposed to be 0xFF only when going
> through a router and I could not find it; therefore my email, I'm confused
> again...  :-(

When going through a router, it needs to be set to something other
than 0 or 1. 255 (max) was used as there's nothing currently to scope
it down to something more realistic. See IBA 1.2.1 vol 1 p.229 8.3.6
for one on settings for this field.

>>
>> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are 
>> > heading
>> > through a router? I don't think it matters if we are going through a router
>> > or not does it?
>>
>> It's used when DGID is selected in the SA PR query via the comp mask
>> and in that instance is used for both the router and non router DGID
>> cases.
>
> So we agree that Sasha's patch is correct?

Yes; it restores this back the way it used to be when it worked.

-- Hal

>>
>> > If not, I think the comment needs to be removed in the patch above.
>>
>> I'm not following why you say this.
>
> Well, if HopLimit must only be set to 0xff when going through a router and
> Sasha's patch is correct, then I think we need another way to determine if
> this path is going through a router.  In that case the comment is correct but
> the logic is wrong.
>
> Ira
>
>>
>> -- Hal
>>
>> > Thanks,
>> > Ira
>> >
>> > --
>> > Ira Weiny
>> > Math Programmer/Computer Scientist
>> > Lawrence Livermore National Lab
>> > 925-423-8008
>> > wei...@llnl.gov
>> >
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to