Re: [RFC PATCH] Set HopLimit based on off subnet Re: SRP issues with OpenSM 3.3.3

2009-12-20 Thread Sasha Khapyorsky
On 18:18 Thu 17 Dec , Ira Weiny wrote:
> 
> It might be faster but it would mean a bigger change.  It looks like having
> the dgid or p_pr structures in some internal opensm structure to carry that
> information along would be best but that is a lot of work.

Actually now dgid parameter is never used when intra-subnet PR is
requested - maybe those parameters (dgid, is_off_subnet, etc.) could be
merged into some context struct which is initialized once and passed down
to the functions.

Likely some general cleanup of osm_sa_path_record.c is needed to see a
most obvious coding solution.

Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Set HopLimit based on off subnet Re: SRP issues with OpenSM 3.3.3

2009-12-18 Thread Hal Rosenstock
On Thu, Dec 17, 2009 at 9:18 PM, Ira Weiny  wrote:
> On Wed, 16 Dec 2009 08:29:43 -0500
> Hal Rosenstock  wrote:
>
>> On Tue, Dec 15, 2009 at 9:55 PM, Ira Weiny  wrote:
>
> [snip]
>
>> >
>> > diff --git a/opensm/opensm/osm_sa_path_record.c 
>> > b/opensm/opensm/osm_sa_path_record.c
>> > index be0cd71..1fa83a1 100644
>> > --- a/opensm/opensm/osm_sa_path_record.c
>> > +++ b/opensm/opensm/osm_sa_path_record.c
>> > @@ -757,6 +757,14 @@ Exit:
>> >        return (status);
>> >  }
>> >
>> > +static int gid_is_off_subnet(IN osm_sa_t * sa, IN const ib_gid_t * p_dgid)
>> > +{
>> > +       return (!ib_gid_is_link_local(p_dgid) &&
>> > +               !ib_gid_is_multicast(p_dgid) &&
>> > +               ib_gid_get_subnet_prefix(p_dgid) !=
>> > +               sa->p_subn->opt.subnet_prefix);
>> > +}
>> > +
>> >  /**
>> >  **/
>> >  static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const osm_port_t * 
>> > p_src_port,
>> > @@ -770,16 +778,12 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN 
>> > const osm_port_t * p_src_port,
>> >  {
>> >        const osm_physp_t *p_src_physp;
>> >        const osm_physp_t *p_dest_physp;
>> > -       boolean_t is_nonzero_gid = 0;
>> >
>> >        OSM_LOG_ENTER(sa->p_log);
>> >
>> >        p_src_physp = p_src_port->p_physp;
>> >
>> >        if (p_dgid)
>> > -               is_nonzero_gid = ib_gid_is_notzero(p_dgid);
>> > -
>> > -       if (is_nonzero_gid)
>> >                p_pr->dgid = *p_dgid;
>> >        else {
>> >                p_dest_physp = p_dest_port->p_physp;
>> > @@ -799,7 +803,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
>> > osm_port_t * p_src_port,
>> >        p_pr->hop_flow_raw &= cl_hton32(1 << 31);
>> >
>> >        /* Only set HopLimit if going through a router */
>> > -       if (is_nonzero_gid)
>> > +       if (gid_is_off_subnet(sa, &p_pr->dgid))
>> >                p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
>> >
>> >        p_pr->pkey = p_parms->pkey;
>> > @@ -1248,10 +1252,7 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t 
>> > * sa,
>> >                memset(p_dgid, 0, sizeof(*p_dgid));
>> >
>> >        if (comp_mask & IB_PR_COMPMASK_DGID) {
>> > -               if (!ib_gid_is_link_local(&p_pr->dgid) &&
>> > -                   !ib_gid_is_multicast(&p_pr->dgid) &&
>> > -                   ib_gid_get_subnet_prefix(&p_pr->dgid) !=
>> > -                   sa->p_subn->opt.subnet_prefix) {
>> > +               if (gid_is_off_subnet(sa, &p_pr->dgid)) {
>> >                        dest_guid = find_router(sa, 
>> > p_pr->dgid.unicast.prefix);
>> >                        if (!dest_guid) {
>> >                                char gid_str[INET6_ADDRSTRLEN];
>> >
>> >
>> > This maintains the p_dgid found in pr_rcv_get_end_points but appropriately
>> > set's the HopLimit when the DGID is off subnet.
>>
>> Yes, this looks right to me. Would it be better to only determine
>> gid_is_off_subnet once by saving this in a variable and passing it as
>> a parameter where needed ?
>>
>
> It might be faster but it would mean a bigger change.

Yes.

> It looks like having
> the dgid or p_pr structures in some internal opensm structure to carry that
> information along would be best but that is a lot of work.

It could also be done by passing this as an additional parameter
through the necessary routines.

> Do you think this will adversely affect performance on a big system?

I don't know as I haven't measured the difference in the two
approaches. Just trying to minimize additions to computing PRs.

-- Hal

> I plan
> to roll this out internally and we will have a chance to test on bigger
> systems here.  But there are larger out there now.
>
> Ira
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Set HopLimit based on off subnet Re: SRP issues with OpenSM 3.3.3

2009-12-17 Thread Ira Weiny
On Wed, 16 Dec 2009 08:29:43 -0500
Hal Rosenstock  wrote:

> On Tue, Dec 15, 2009 at 9:55 PM, Ira Weiny  wrote:

[snip]

> >
> > diff --git a/opensm/opensm/osm_sa_path_record.c 
> > b/opensm/opensm/osm_sa_path_record.c
> > index be0cd71..1fa83a1 100644
> > --- a/opensm/opensm/osm_sa_path_record.c
> > +++ b/opensm/opensm/osm_sa_path_record.c
> > @@ -757,6 +757,14 @@ Exit:
> >        return (status);
> >  }
> >
> > +static int gid_is_off_subnet(IN osm_sa_t * sa, IN const ib_gid_t * p_dgid)
> > +{
> > +       return (!ib_gid_is_link_local(p_dgid) &&
> > +               !ib_gid_is_multicast(p_dgid) &&
> > +               ib_gid_get_subnet_prefix(p_dgid) !=
> > +               sa->p_subn->opt.subnet_prefix);
> > +}
> > +
> >  /**
> >  **/
> >  static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const osm_port_t * 
> > p_src_port,
> > @@ -770,16 +778,12 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN 
> > const osm_port_t * p_src_port,
> >  {
> >        const osm_physp_t *p_src_physp;
> >        const osm_physp_t *p_dest_physp;
> > -       boolean_t is_nonzero_gid = 0;
> >
> >        OSM_LOG_ENTER(sa->p_log);
> >
> >        p_src_physp = p_src_port->p_physp;
> >
> >        if (p_dgid)
> > -               is_nonzero_gid = ib_gid_is_notzero(p_dgid);
> > -
> > -       if (is_nonzero_gid)
> >                p_pr->dgid = *p_dgid;
> >        else {
> >                p_dest_physp = p_dest_port->p_physp;
> > @@ -799,7 +803,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> > osm_port_t * p_src_port,
> >        p_pr->hop_flow_raw &= cl_hton32(1 << 31);
> >
> >        /* Only set HopLimit if going through a router */
> > -       if (is_nonzero_gid)
> > +       if (gid_is_off_subnet(sa, &p_pr->dgid))
> >                p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> >
> >        p_pr->pkey = p_parms->pkey;
> > @@ -1248,10 +1252,7 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t 
> > * sa,
> >                memset(p_dgid, 0, sizeof(*p_dgid));
> >
> >        if (comp_mask & IB_PR_COMPMASK_DGID) {
> > -               if (!ib_gid_is_link_local(&p_pr->dgid) &&
> > -                   !ib_gid_is_multicast(&p_pr->dgid) &&
> > -                   ib_gid_get_subnet_prefix(&p_pr->dgid) !=
> > -                   sa->p_subn->opt.subnet_prefix) {
> > +               if (gid_is_off_subnet(sa, &p_pr->dgid)) {
> >                        dest_guid = find_router(sa, 
> > p_pr->dgid.unicast.prefix);
> >                        if (!dest_guid) {
> >                                char gid_str[INET6_ADDRSTRLEN];
> >
> >
> > This maintains the p_dgid found in pr_rcv_get_end_points but appropriately
> > set's the HopLimit when the DGID is off subnet.
> 
> Yes, this looks right to me. Would it be better to only determine
> gid_is_off_subnet once by saving this in a variable and passing it as
> a parameter where needed ?
> 

It might be faster but it would mean a bigger change.  It looks like having
the dgid or p_pr structures in some internal opensm structure to carry that
information along would be best but that is a lot of work.

Do you think this will adversely affect performance on a big system?  I plan
to roll this out internally and we will have a chance to test on bigger
systems here.  But there are larger out there now.

Ira


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] Set HopLimit based on off subnet Re: SRP issues with OpenSM 3.3.3

2009-12-16 Thread Hal Rosenstock
On Tue, Dec 15, 2009 at 9:55 PM, Ira Weiny  wrote:
> On Tue, 15 Dec 2009 12:59:11 -0500
> Hal Rosenstock  wrote:
>
>> On Tue, Dec 15, 2009 at 12:18 PM, Ira Weiny  wrote:
>
> [snip]
>
>>
>> > The follow on would be that if they specify a
>> > DGID in the PathRecord query and the result is subnet local they must still
>> > use a GRH even if the packet is going to stay local?
>>
>> No; with subnet local traffic, GRH is optional.
>>
>
> Ok, if the GRH is optional with local traffic then I think the appropriate fix
> is:
>
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index be0cd71..1fa83a1 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -757,6 +757,14 @@ Exit:
>        return (status);
>  }
>
> +static int gid_is_off_subnet(IN osm_sa_t * sa, IN const ib_gid_t * p_dgid)
> +{
> +       return (!ib_gid_is_link_local(p_dgid) &&
> +               !ib_gid_is_multicast(p_dgid) &&
> +               ib_gid_get_subnet_prefix(p_dgid) !=
> +               sa->p_subn->opt.subnet_prefix);
> +}
> +
>  /**
>  **/
>  static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const osm_port_t * 
> p_src_port,
> @@ -770,16 +778,12 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> osm_port_t * p_src_port,
>  {
>        const osm_physp_t *p_src_physp;
>        const osm_physp_t *p_dest_physp;
> -       boolean_t is_nonzero_gid = 0;
>
>        OSM_LOG_ENTER(sa->p_log);
>
>        p_src_physp = p_src_port->p_physp;
>
>        if (p_dgid)
> -               is_nonzero_gid = ib_gid_is_notzero(p_dgid);
> -
> -       if (is_nonzero_gid)
>                p_pr->dgid = *p_dgid;
>        else {
>                p_dest_physp = p_dest_port->p_physp;
> @@ -799,7 +803,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> osm_port_t * p_src_port,
>        p_pr->hop_flow_raw &= cl_hton32(1 << 31);
>
>        /* Only set HopLimit if going through a router */
> -       if (is_nonzero_gid)
> +       if (gid_is_off_subnet(sa, &p_pr->dgid))
>                p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
>
>        p_pr->pkey = p_parms->pkey;
> @@ -1248,10 +1252,7 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * 
> sa,
>                memset(p_dgid, 0, sizeof(*p_dgid));
>
>        if (comp_mask & IB_PR_COMPMASK_DGID) {
> -               if (!ib_gid_is_link_local(&p_pr->dgid) &&
> -                   !ib_gid_is_multicast(&p_pr->dgid) &&
> -                   ib_gid_get_subnet_prefix(&p_pr->dgid) !=
> -                   sa->p_subn->opt.subnet_prefix) {
> +               if (gid_is_off_subnet(sa, &p_pr->dgid)) {
>                        dest_guid = find_router(sa, p_pr->dgid.unicast.prefix);
>                        if (!dest_guid) {
>                                char gid_str[INET6_ADDRSTRLEN];
>
>
> This maintains the p_dgid found in pr_rcv_get_end_points but appropriately
> set's the HopLimit when the DGID is off subnet.

Yes, this looks right to me. Would it be better to only determine
gid_is_off_subnet once by saving this in a variable and passing it as
a parameter where needed ?

-- Hal

>
> Ira
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH] Set HopLimit based on off subnet Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Ira Weiny
On Tue, 15 Dec 2009 12:59:11 -0500
Hal Rosenstock  wrote:

> On Tue, Dec 15, 2009 at 12:18 PM, Ira Weiny  wrote:

[snip]

> 
> > The follow on would be that if they specify a
> > DGID in the PathRecord query and the result is subnet local they must still
> > use a GRH even if the packet is going to stay local?
> 
> No; with subnet local traffic, GRH is optional.
> 

Ok, if the GRH is optional with local traffic then I think the appropriate fix
is:

diff --git a/opensm/opensm/osm_sa_path_record.c 
b/opensm/opensm/osm_sa_path_record.c
index be0cd71..1fa83a1 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -757,6 +757,14 @@ Exit:
return (status);
 }
 
+static int gid_is_off_subnet(IN osm_sa_t * sa, IN const ib_gid_t * p_dgid)
+{
+   return (!ib_gid_is_link_local(p_dgid) &&
+   !ib_gid_is_multicast(p_dgid) &&
+   ib_gid_get_subnet_prefix(p_dgid) !=
+   sa->p_subn->opt.subnet_prefix);
+}
+
 /**
  **/
 static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const osm_port_t * p_src_port,
@@ -770,16 +778,12 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
osm_port_t * p_src_port,
 {
const osm_physp_t *p_src_physp;
const osm_physp_t *p_dest_physp;
-   boolean_t is_nonzero_gid = 0;
 
OSM_LOG_ENTER(sa->p_log);
 
p_src_physp = p_src_port->p_physp;
 
if (p_dgid)
-   is_nonzero_gid = ib_gid_is_notzero(p_dgid);
-
-   if (is_nonzero_gid)
p_pr->dgid = *p_dgid;
else {
p_dest_physp = p_dest_port->p_physp;
@@ -799,7 +803,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
osm_port_t * p_src_port,
p_pr->hop_flow_raw &= cl_hton32(1 << 31);
 
/* Only set HopLimit if going through a router */
-   if (is_nonzero_gid)
+   if (gid_is_off_subnet(sa, &p_pr->dgid))
p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
 
p_pr->pkey = p_parms->pkey;
@@ -1248,10 +1252,7 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * sa,
memset(p_dgid, 0, sizeof(*p_dgid));
 
if (comp_mask & IB_PR_COMPMASK_DGID) {
-   if (!ib_gid_is_link_local(&p_pr->dgid) &&
-   !ib_gid_is_multicast(&p_pr->dgid) &&
-   ib_gid_get_subnet_prefix(&p_pr->dgid) !=
-   sa->p_subn->opt.subnet_prefix) {
+   if (gid_is_off_subnet(sa, &p_pr->dgid)) {
dest_guid = find_router(sa, p_pr->dgid.unicast.prefix);
if (!dest_guid) {
char gid_str[INET6_ADDRSTRLEN];


This maintains the p_dgid found in pr_rcv_get_end_points but appropriately
set's the HopLimit when the DGID is off subnet.

Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Hal Rosenstock
On Tue, Dec 15, 2009 at 12:18 PM, Ira Weiny  wrote:
> On Tue, 15 Dec 2009 10:15:32 -0700
> Jason Gunthorpe  wrote:
>
>> > > However, I don't understand the comment "Only set HopLimit if going 
>> > > through a
>> > > router"?
>> >
>> > This is from '#ifdef ROUTER_EXP' days - as far as I could understand
>> > HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.
>>
>> If HopLimit is 0 then no GRH is required, if it is non 0 then the user
>> of that path needs to apply a GRH. IIRC there are already a few things
>> in the kernel that follow this?
>
> Is that in the spec somewhere?

IBA 1.2.1 vol 1 p.229 8.3.6 for one.

> The follow on would be that if they specify a
> DGID in the PathRecord query and the result is subnet local they must still
> use a GRH even if the packet is going to stay local?

No; with subnet local traffic, GRH is optional.

-- Hal

>  I think this makes sense
> but I wonder if __everyone__ is following that?
>
> Ira
>
>>
>> Jason
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Hal Rosenstock
On Tue, Dec 15, 2009 at 12:15 PM, Jason Gunthorpe
 wrote:
>> > However, I don't understand the comment "Only set HopLimit if going 
>> > through a
>> > router"?
>>
>> This is from '#ifdef ROUTER_EXP' days - as far as I could understand
>> HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.
>
> If HopLimit is 0 then no GRH is required, if it is non 0 then the user

I think it is:
HopLimit 0 or 1 requires no GRH (GRH is optional in these cases)
and
other than 0 or 1 requires GRH

> of that path needs to apply a GRH. IIRC there are already a few things
> in the kernel that follow this?

Yes, we started down this path. This was all discussed on the list
back prior to SC07 days AFAIR.

-- Hal

> Jason
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Hal Rosenstock
On Tue, Dec 15, 2009 at 12:09 PM, Ira Weiny  wrote:
> On Tue, 15 Dec 2009 10:16:42 -0500
> Hal Rosenstock  wrote:
>
>> Ira,
>>
>> On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny  wrote:
>> > Sasha, Hal,
>> >
>> > I have found that the following patch caused our SRP connected storage to
>> > break.
>>
>> What is causing the SRP target to fail ? Is it a non zero hop limit
>> returned in the SA PathRecord ?
>
> I am not sure exactly because I don't know what the SRP Target is seeing.
> (pesky firmware...)  :-(
>
>>
>> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
>> >
>> >    opensm/osm_sa_path_record.c: separate router guid resolution code
>> >
>> >    Move off subnet destination (router address) resolution code to separate
>> >    function to improve readability.
>> >
>> >    Signed-off-by: Sasha Khapyorsky 
>> >
>> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
>> > the
>> > bug.
>>
>> On quick glance, I'm missing the connection between Sasha's patch and
>> this routine setting something bad in the hop limit of the returned
>> path record.
>
> It sets p_dgid even when we are not using a router which results in
> is_nonzero_gid being true.

Yes, that's the cause of the issue.

>>
>> > 16:03:34 > git diff
>> > diff --git a/opensm/opensm/osm_sa_path_record.c 
>> > b/opensm/opensm/osm_sa_path_record.c
>> > index be0cd71..32c889f 100644
>> > --- a/opensm/opensm/osm_sa_path_record.c
>> > +++ b/opensm/opensm/osm_sa_path_record.c
>> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
>> > osm_port_t * p_src_port,
>> >
>> >        /* Only set HopLimit if going through a router */
>> >        if (is_nonzero_gid)
>> > -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
>> > +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
>>
>> This may fix this issue but it doesn't look right to me and I think
>> would break router scenarios and corrupts other field(s). Path record
>> fields are in network (not host) order.
>
> But HopLimit is an 8bit field?

Yes but in terms of the ib_path_rec_t definition hop_flow_raw is a 32 bit field.

> I might be wrong on this but I was thinking
> this set RawTraffic and parts of Flow label to 1's.  I should double check
> that.  Also, I wonder if the FW on the other side is flipping this wrong?  :-/
> I have not been able to determine this and I still have to test with our other
> vendor's SRP target...
>
>>
>> >        p_pr->pkey = p_parms->pkey;
>> >        ib_path_rec_set_sl(p_pr, p_parms->sl);
>> >
>> >
>> > "hop_flow_raw" is not really a 32bit value but rather 4 fields:
>> > Component     Length(bits)   Offset(bits)
>> > RawTraffic    1              352
>> > Reserved      3              353
>> > FlowLabel     20             356
>> > HopLimit      8              384
>> >
>> >
>> > However, I don't understand the comment "Only set HopLimit if going 
>> > through a
>> > router"?
>>
>> Well, in a sense the HopLimit is always set. It's already set to 0
>> right above that code snippet in the patch where:
>> p_pr->hop_flow_raw &= cl_hton32(1 << 31);
>
> True, however, I believe the behavior before was that the entire field is 0
> so the above statement really does nothing.

For all practical purposes that is true as raw traffic is deprecated.

>> This code is only setting the HopLimit to 255 (as this is very
>> primitive so far) when going through a router. Maybe it could be
>> stated better.
>
> That is exactly what Sasha's patch changes.  This is not only executed when
> going through a router but at all times DGID is specified in the CompMask.  I
> am willing to accept the fact that perhaps p_dgid should not be set unless we
> find a router, however, I __don't__ think that is correct.  Searching the spec
> I attempted to find where HopLimit was supposed to be 0xFF only when going
> through a router and I could not find it; therefore my email, I'm confused
> again...  :-(

When going through a router, it needs to be set to something other
than 0 or 1. 255 (max) was used as there's nothing currently to scope
it down to something more realistic. See IBA 1.2.1 vol 1 p.229 8.3.6
for one on settings for this field.

>>
>> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are 
>> > heading
>> > through a router? I don't think it matters if we are going through a router
>> > or not does it?
>>
>> It's used when DGID is selected in the SA PR query via the comp mask
>> and in that instance is used for both the router and non router DGID
>> cases.
>
> So we agree that Sasha's patch is correct?

Yes; it restores this back the way it used to be when it worked.

-- Hal

>>
>> > If not, I think the comment needs to be removed in the patch above.
>>
>> I'm not following why you say this.
>
> Well, if HopLimit must only be set to 0xff when going through a router and
> Sasha's patch is correct, then I think we need another way to determine if
> this path is going through a router.  In that case the comment is correct 

Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Ira Weiny
On Tue, 15 Dec 2009 10:31:40 -0700
Jason Gunthorpe  wrote:

[snip]

> 
> The spec provides no guidance whatsoever on how to tell if a packets
> generated from a PR need a GRH or not. The only other choice is to
> compare the resulting DGID against all the GID prefixs on the port and
> see if any match.
> 
> If a router spec is ever written I'm sure it will clarify this matter,
> and everyone will follow.
> 
> The SM decides what hop limit should be so the SM decides if it needs
> a GRH. I belive the original long ago version of this always set the
> hop limit to 0 unless the DLID was a router port.

Ok, then I think we need a new way to tell if we are hitting a router port in
pr_rcv_build_pr.  I don't think having a non-zero GID is the definition of
going through a router port.

Ira

> 
> Jason


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Jason Gunthorpe
On Tue, Dec 15, 2009 at 09:18:19AM -0800, Ira Weiny wrote:
> On Tue, 15 Dec 2009 10:15:32 -0700
> Jason Gunthorpe  wrote:
> 
> > > > However, I don't understand the comment "Only set HopLimit if going 
> > > > through a
> > > > router"?
> > > 
> > > This is from '#ifdef ROUTER_EXP' days - as far as I could understand
> > > HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.
> > 
> > If HopLimit is 0 then no GRH is required, if it is non 0 then the user
> > of that path needs to apply a GRH. IIRC there are already a few things
> > in the kernel that follow this?
> 
> Is that in the spec somewhere?  The follow on would be that if they specify a
> DGID in the PathRecord query and the result is subnet local they must still
> use a GRH even if the packet is going to stay local?  I think this makes sense
> but I wonder if __everyone__ is following that?

The spec provides no guidance whatsoever on how to tell if a packets
generated from a PR need a GRH or not. The only other choice is to
compare the resulting DGID against all the GID prefixs on the port and
see if any match.

If a router spec is ever written I'm sure it will clarify this matter,
and everyone will follow.

The SM decides what hop limit should be so the SM decides if it needs
a GRH. I belive the original long ago version of this always set the
hop limit to 0 unless the DLID was a router port.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Ira Weiny
On Tue, 15 Dec 2009 10:15:32 -0700
Jason Gunthorpe  wrote:

> > > However, I don't understand the comment "Only set HopLimit if going 
> > > through a
> > > router"?
> > 
> > This is from '#ifdef ROUTER_EXP' days - as far as I could understand
> > HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.
> 
> If HopLimit is 0 then no GRH is required, if it is non 0 then the user
> of that path needs to apply a GRH. IIRC there are already a few things
> in the kernel that follow this?

Is that in the spec somewhere?  The follow on would be that if they specify a
DGID in the PathRecord query and the result is subnet local they must still
use a GRH even if the packet is going to stay local?  I think this makes sense
but I wonder if __everyone__ is following that?

Ira

> 
> Jason


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Ira Weiny
On Tue, 15 Dec 2009 19:03:17 +0200
Sasha Khapyorsky  wrote:

> Hi Ira,
> 
> On 16:43 Mon 14 Dec , Ira Weiny wrote:
> > 
> > I have found that the following patch caused our SRP connected storage to
> > break.
> > 
> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
> > 
> > opensm/osm_sa_path_record.c: separate router guid resolution code
> > 
> > Move off subnet destination (router address) resolution code to separate
> > function to improve readability.
> > 
> > Signed-off-by: Sasha Khapyorsky 
> > 
> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
> > the
> > bug.
> 
> Nice catch and great investigation!
> 
> > 
> > 16:03:34 > git diff 
> > diff --git a/opensm/opensm/osm_sa_path_record.c 
> > b/opensm/opensm/osm_sa_path_record.c
> > index be0cd71..32c889f 100644
> > --- a/opensm/opensm/osm_sa_path_record.c
> > +++ b/opensm/opensm/osm_sa_path_record.c
> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> > osm_port_t * p_src_port,
> >  
> > /* Only set HopLimit if going through a router */
> > if (is_nonzero_gid)
> > -   p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> > +   p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
> >  
> > p_pr->pkey = p_parms->pkey;
> > ib_path_rec_set_sl(p_pr, p_parms->sl);
> > 
> > 
> > "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> > Component Length(bits)   Offset(bits)
> > RawTraffic1  352
> > Reserved  3  353
> > FlowLabel 20 356
> > HopLimit  8  384
> 
> This patch doesn't look correct for me. Instead of placing
> IB_HOPLIMIT_MAX in proper place this will put it in RawTraffic and
> Reserved fields on little-endian machines (such as x86). And this
> changes nothing on a big-endian machine. Right?

Yea I think this is wrong.  As I said perhaps the FW on the other side is
interpreting the fields wrong.  I will check with the vendor.

> 
> Following your investigation it seems that SRP target breaks when
> IB_HOPLIMIT_MAX is set in SA PR query responses for intra-subnet DGIDs.

Yes

> 
> > However, I don't understand the comment "Only set HopLimit if going through 
> > a
> > router"?
> 
> This is from '#ifdef ROUTER_EXP' days - as far as I could understand
> HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.

But I have not found where that is stated.

> 
> > 
> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> > through a router?
> 
> Seems that this was an original logic.
> 
> > I don't think it matters if we are going through a router
> > or not does it?
> 
> I thought so too, but as we can see it triggers the bug you discovered.
> 
> So immediate fix for this is to move p_dgid setup back to router
> section, like this:
> 
> 
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index 7855be5..fbeef03 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -1236,6 +1236,8 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * 
> sa,
>   sa_status = IB_SA_MAD_STATUS_INVALID_GID;
>   goto Exit;
>   }
> + if (p_dgid)
> + *p_dgid = p_pr->dgid;
>   } else
>   dest_guid = p_pr->dgid.unicast.interface_id;
>  
> @@ -1253,9 +1255,6 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * 
> sa,
>   sa_status = IB_SA_MAD_STATUS_INVALID_GID;
>   goto Exit;
>   }
> -
> - if (p_dgid)
> - *p_dgid = p_pr->dgid;
>   } else {
>   *pp_dest_port = 0;
>   if (comp_mask & IB_PR_COMPMASK_DLID) {
> 
> 
> BTW, could you test this? Unfortunately I don't have SRP target :(

Yes this works and was my original fix.  However, it did not seem right and I
think Hal agrees.

> 
> And seems that as more solid solution we need to cleanup some logic in
> osm_sa_path_record.c code.

I think so.

Ira

> 
> Sasha


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Jason Gunthorpe
> > However, I don't understand the comment "Only set HopLimit if going through 
> > a
> > router"?
> 
> This is from '#ifdef ROUTER_EXP' days - as far as I could understand
> HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.

If HopLimit is 0 then no GRH is required, if it is non 0 then the user
of that path needs to apply a GRH. IIRC there are already a few things
in the kernel that follow this?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Ira Weiny
On Tue, 15 Dec 2009 10:16:42 -0500
Hal Rosenstock  wrote:

> Ira,
> 
> On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny  wrote:
> > Sasha, Hal,
> >
> > I have found that the following patch caused our SRP connected storage to
> > break.
> 
> What is causing the SRP target to fail ? Is it a non zero hop limit
> returned in the SA PathRecord ?

I am not sure exactly because I don't know what the SRP Target is seeing.
(pesky firmware...)  :-(

> 
> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
> >
> >    opensm/osm_sa_path_record.c: separate router guid resolution code
> >
> >    Move off subnet destination (router address) resolution code to separate
> >    function to improve readability.
> >
> >    Signed-off-by: Sasha Khapyorsky 
> >
> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
> > the
> > bug.
> 
> On quick glance, I'm missing the connection between Sasha's patch and
> this routine setting something bad in the hop limit of the returned
> path record.

It sets p_dgid even when we are not using a router which results in
is_nonzero_gid being true.

> 
> > 16:03:34 > git diff
> > diff --git a/opensm/opensm/osm_sa_path_record.c 
> > b/opensm/opensm/osm_sa_path_record.c
> > index be0cd71..32c889f 100644
> > --- a/opensm/opensm/osm_sa_path_record.c
> > +++ b/opensm/opensm/osm_sa_path_record.c
> > @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> > osm_port_t * p_src_port,
> >
> >        /* Only set HopLimit if going through a router */
> >        if (is_nonzero_gid)
> > -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> > +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
> 
> This may fix this issue but it doesn't look right to me and I think
> would break router scenarios and corrupts other field(s). Path record
> fields are in network (not host) order.

But HopLimit is an 8bit field?  I might be wrong on this but I was thinking
this set RawTraffic and parts of Flow label to 1's.  I should double check
that.  Also, I wonder if the FW on the other side is flipping this wrong?  :-/
I have not been able to determine this and I still have to test with our other
vendor's SRP target...

> 
> >        p_pr->pkey = p_parms->pkey;
> >        ib_path_rec_set_sl(p_pr, p_parms->sl);
> >
> >
> > "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> > Component     Length(bits)   Offset(bits)
> > RawTraffic    1              352
> > Reserved      3              353
> > FlowLabel     20             356
> > HopLimit      8              384
> >
> >
> > However, I don't understand the comment "Only set HopLimit if going through 
> > a
> > router"?
> 
> Well, in a sense the HopLimit is always set. It's already set to 0
> right above that code snippet in the patch where:
> p_pr->hop_flow_raw &= cl_hton32(1 << 31);

True, however, I believe the behavior before was that the entire field is 0
so the above statement really does nothing.

> This code is only setting the HopLimit to 255 (as this is very
> primitive so far) when going through a router. Maybe it could be
> stated better.

That is exactly what Sasha's patch changes.  This is not only executed when
going through a router but at all times DGID is specified in the CompMask.  I
am willing to accept the fact that perhaps p_dgid should not be set unless we
find a router, however, I __don't__ think that is correct.  Searching the spec
I attempted to find where HopLimit was supposed to be 0xFF only when going
through a router and I could not find it; therefore my email, I'm confused
again...  :-(

> 
> > Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> > through a router? I don't think it matters if we are going through a router
> > or not does it?
> 
> It's used when DGID is selected in the SA PR query via the comp mask
> and in that instance is used for both the router and non router DGID
> cases.

So we agree that Sasha's patch is correct?

> 
> > If not, I think the comment needs to be removed in the patch above.
> 
> I'm not following why you say this.

Well, if HopLimit must only be set to 0xff when going through a router and
Sasha's patch is correct, then I think we need another way to determine if
this path is going through a router.  In that case the comment is correct but
the logic is wrong.

Ira

> 
> -- Hal
> 
> > Thanks,
> > Ira
> >
> > --
> > Ira Weiny
> > Math Programmer/Computer Scientist
> > Lawrence Livermore National Lab
> > 925-423-8008
> > wei...@llnl.gov
> >


-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Sasha Khapyorsky
On 10:16 Tue 15 Dec , Hal Rosenstock wrote:
> 
> > patch: 3d20f82edd3246879063b77721d0bcef927bdc48
> >
> >    opensm/osm_sa_path_record.c: separate router guid resolution code
> >
> >    Move off subnet destination (router address) resolution code to separate
> >    function to improve readability.
> >
> >    Signed-off-by: Sasha Khapyorsky 
> >
> > I further traced the problem to pr_rcv_build_pr and the patch below fixes 
> > the
> > bug.
> 
> On quick glance, I'm missing the connection between Sasha's patch and
> this routine setting something bad in the hop limit of the returned
> path record.

In this patch I moved *p_dgid initialization out from router only
section, as side effect it turns on this section:

/* Only set HopLimit if going through a router */
if (is_nonzero_gid)
p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);

for all SA PR queries where DGID was selected (local subnet or not).

Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Sasha Khapyorsky
Hi Ira,

On 16:43 Mon 14 Dec , Ira Weiny wrote:
> 
> I have found that the following patch caused our SRP connected storage to
> break.
> 
> patch: 3d20f82edd3246879063b77721d0bcef927bdc48
> 
> opensm/osm_sa_path_record.c: separate router guid resolution code
> 
> Move off subnet destination (router address) resolution code to separate
> function to improve readability.
> 
> Signed-off-by: Sasha Khapyorsky 
> 
> I further traced the problem to pr_rcv_build_pr and the patch below fixes the
> bug.

Nice catch and great investigation!

> 
> 16:03:34 > git diff 
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index be0cd71..32c889f 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> osm_port_t * p_src_port,
>  
> /* Only set HopLimit if going through a router */
> if (is_nonzero_gid)
> -   p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> +   p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
>  
> p_pr->pkey = p_parms->pkey;
> ib_path_rec_set_sl(p_pr, p_parms->sl);
> 
> 
> "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> Component Length(bits)   Offset(bits)
> RawTraffic1  352
> Reserved  3  353
> FlowLabel 20 356
> HopLimit  8  384

This patch doesn't look correct for me. Instead of placing
IB_HOPLIMIT_MAX in proper place this will put it in RawTraffic and
Reserved fields on little-endian machines (such as x86). And this
changes nothing on a big-endian machine. Right?

Following your investigation it seems that SRP target breaks when
IB_HOPLIMIT_MAX is set in SA PR query responses for intra-subnet DGIDs.

> However, I don't understand the comment "Only set HopLimit if going through a
> router"?

This is from '#ifdef ROUTER_EXP' days - as far as I could understand
HopLimit should be set to IB_HOPLIMIT_MAX for inter-subnet MADs.

> 
> Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> through a router?

Seems that this was an original logic.

> I don't think it matters if we are going through a router
> or not does it?

I thought so too, but as we can see it triggers the bug you discovered.

So immediate fix for this is to move p_dgid setup back to router
section, like this:


diff --git a/opensm/opensm/osm_sa_path_record.c 
b/opensm/opensm/osm_sa_path_record.c
index 7855be5..fbeef03 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -1236,6 +1236,8 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * sa,
sa_status = IB_SA_MAD_STATUS_INVALID_GID;
goto Exit;
}
+   if (p_dgid)
+   *p_dgid = p_pr->dgid;
} else
dest_guid = p_pr->dgid.unicast.interface_id;
 
@@ -1253,9 +1255,6 @@ static ib_net16_t pr_rcv_get_end_points(IN osm_sa_t * sa,
sa_status = IB_SA_MAD_STATUS_INVALID_GID;
goto Exit;
}
-
-   if (p_dgid)
-   *p_dgid = p_pr->dgid;
} else {
*pp_dest_port = 0;
if (comp_mask & IB_PR_COMPMASK_DLID) {


BTW, could you test this? Unfortunately I don't have SRP target :(

And seems that as more solid solution we need to cleanup some logic in
osm_sa_path_record.c code.

Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SRP issues with OpenSM 3.3.3

2009-12-15 Thread Hal Rosenstock
Ira,

On Mon, Dec 14, 2009 at 7:43 PM, Ira Weiny  wrote:
> Sasha, Hal,
>
> I have found that the following patch caused our SRP connected storage to
> break.

What is causing the SRP target to fail ? Is it a non zero hop limit
returned in the SA PathRecord ?

> patch: 3d20f82edd3246879063b77721d0bcef927bdc48
>
>    opensm/osm_sa_path_record.c: separate router guid resolution code
>
>    Move off subnet destination (router address) resolution code to separate
>    function to improve readability.
>
>    Signed-off-by: Sasha Khapyorsky 
>
> I further traced the problem to pr_rcv_build_pr and the patch below fixes the
> bug.

On quick glance, I'm missing the connection between Sasha's patch and
this routine setting something bad in the hop limit of the returned
path record.

> 16:03:34 > git diff
> diff --git a/opensm/opensm/osm_sa_path_record.c 
> b/opensm/opensm/osm_sa_path_record.c
> index be0cd71..32c889f 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
> osm_port_t * p_src_port,
>
>        /* Only set HopLimit if going through a router */
>        if (is_nonzero_gid)
> -               p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
> +               p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;

This may fix this issue but it doesn't look right to me and I think
would break router scenarios and corrupts other field(s). Path record
fields are in network (not host) order.

>        p_pr->pkey = p_parms->pkey;
>        ib_path_rec_set_sl(p_pr, p_parms->sl);
>
>
> "hop_flow_raw" is not really a 32bit value but rather 4 fields:
> Component     Length(bits)   Offset(bits)
> RawTraffic    1              352
> Reserved      3              353
> FlowLabel     20             356
> HopLimit      8              384
>
>
> However, I don't understand the comment "Only set HopLimit if going through a
> router"?

Well, in a sense the HopLimit is always set. It's already set to 0
right above that code snippet in the patch where:
p_pr->hop_flow_raw &= cl_hton32(1 << 31);
This code is only setting the HopLimit to 255 (as this is very
primitive so far) when going through a router. Maybe it could be
stated better.

> Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
> through a router? I don't think it matters if we are going through a router
> or not does it?

It's used when DGID is selected in the SA PR query via the comp mask
and in that instance is used for both the router and non router DGID
cases.

> If not, I think the comment needs to be removed in the patch above.

I'm not following why you say this.

-- Hal

> Thanks,
> Ira
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> 925-423-8008
> wei...@llnl.gov
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


SRP issues with OpenSM 3.3.3

2009-12-14 Thread Ira Weiny
Sasha, Hal,

I have found that the following patch caused our SRP connected storage to
break.

patch: 3d20f82edd3246879063b77721d0bcef927bdc48

opensm/osm_sa_path_record.c: separate router guid resolution code

Move off subnet destination (router address) resolution code to separate
function to improve readability.

Signed-off-by: Sasha Khapyorsky 

I further traced the problem to pr_rcv_build_pr and the patch below fixes the
bug.

16:03:34 > git diff 
diff --git a/opensm/opensm/osm_sa_path_record.c 
b/opensm/opensm/osm_sa_path_record.c
index be0cd71..32c889f 100644
--- a/opensm/opensm/osm_sa_path_record.c
+++ b/opensm/opensm/osm_sa_path_record.c
@@ -800,7 +800,7 @@ static void pr_rcv_build_pr(IN osm_sa_t * sa, IN const 
osm_port_t * p_src_port,
 
/* Only set HopLimit if going through a router */
if (is_nonzero_gid)
-   p_pr->hop_flow_raw |= cl_hton32(IB_HOPLIMIT_MAX);
+   p_pr->hop_flow_raw |= IB_HOPLIMIT_MAX;
 
p_pr->pkey = p_parms->pkey;
ib_path_rec_set_sl(p_pr, p_parms->sl);


"hop_flow_raw" is not really a 32bit value but rather 4 fields:
Component Length(bits)   Offset(bits)
RawTraffic1  352
Reserved  3  353
FlowLabel 20 356
HopLimit  8  384


However, I don't understand the comment "Only set HopLimit if going through a
router"?

Was the intent to only set p_dgid in pr_rcv_get_end_points if we are heading
through a router?  I don't think it matters if we are going through a router
or not does it?  If not, I think the comment needs to be removed in the patch
above.

Thanks,
Ira

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
wei...@llnl.gov
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html