Ira,

On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny <wei...@llnl.gov> wrote:
> Sasha, Hal,
>
> I have found that the following patch caused our SRP connected storage to
> break.

What is causing the SRP target to fail ? Is it a non zero hop limit
returned in the SA PathRecord ?

> patch: 3d20f82edd3246879063b77721d0bcef927bdc48
>
>    opensm/osm_sa_path_record.c: separate router guid resolution code
>
>    Move off subnet destination (router address) resolution code to separate
>    function to improve readability.
>
>    Signed-off-by: Sasha Khapyorsky <sas...@voltaire.com>
>
> I further traced the problem to pr_rcv_build_pr and the patch below fixes the
> bug.

On quick glance, I'm missing the connection between Sasha's patch and
this routine setting something bad in the hop limit of the returned
path record.

> 16:03:34 > git diff
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index be0cd71..32c889f 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> osm_port_t * p_src_port,
>
>        /* Only set HopLimit if going through a router */
>        if (is_nonzero_gid)
> -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;

This may fix this issue but it doesn't look right to me and I think
would break router scenarios and corrupts other field(s). Path record
fields are in network (not host) order.

>        p_pr->pkey = p_parms->pkey;
>        ib_path_rec_set_sl(p_pr, p_parms->sl);
>
>
> "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> Component     Length(bits)   Offset(bits)
> RawTraffic    1              352
> Reserved      3              353
> FlowLabel     20             356
> HopLimit      8              384
>
>
> However, I don't understand the comment "Only set HopLimit if going through a
> router"?

Well, in a sense the HopLimit is always set. It's already set to 0
right above that code snippet in the patch where:
p_pr->hop_flow_raw &= cl_hton32(1 << 31);
This code is only setting the HopLimit to 255 (as this is very
primitive so far) when going through a router. Maybe it could be
stated better.

> Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> through a router? I don't think it matters if we are going through a router
> or not does it?

It's used when DGID is selected in the SA PR query via the comp mask
and in that instance is used for both the router and non router DGID
cases.

> If not, I think the comment needs to be removed in the patch above.

I'm not following why you say this.

-- Hal

> Thanks,
> Ira
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to