On 16:29 Wed 15 Apr     , Hal Rosenstock wrote:
> >>
> >> ??/* Redirection information */
> >> ??typedef struct redir {
> >> + ?? ?? boolean_t redirection;
> >> + ?? ?? boolean_t invalid;
> >
> > Why using lid value != 0 is/was bad for redirection invalidation?
> 
> b/c there are other fields supplied in the redirection which also
> could be invalid so this one flag summarizes that instead of
> overloading one of the fields.

Yes, and you can use lid value as such flag - just simpler.

> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_lid = 0;
> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_qp = 0;
> >> + ?? ?? ?? ?? ?? ?? /* Clear redirection info for this port except 
> >> orig_lid */
> >> + ?? ?? ?? ?? ?? ?? orig_lid = p_mon_node->redir_port[port].orig_lid;
> >> + ?? ?? ?? ?? ?? ?? memset(&p_mon_node->redir_port[port], 0, 
> >> sizeof(redir_t));
> >> + ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].orig_lid = orig_lid;
> >
> > Hmm, why should 'orig_lid' be part of redirection structure and not
> > placed on original node/port (below I see that it is used in
> > non-redirected paths)?
> 
> What are you referring to here ?

Actually I was wrong - I don't where it is used at all.
The comment about using in non-redirected path was about pkey_ix.

> > I think it would be better to use structures like:
> >
> > struct node {
> > ?? ?? ?? ??....
> > ?? ?? ?? ??uint16_t lid;
> > ?? ?? ?? ??uint16_t pkey_ix;
> 
> Why would lid and pkey_ix be part of node ?

You are using this with perfmgr_send_pc_mad() in main flow.

> It's like this with redir_tbl_size r.t. num_ports and redir_t
> redir_port[1] instead of your struct port {} ports[0]. Why would what
> you propose be better ?

My point was different - to separate redirection related data from main
flow.

> > PerfMgr is always running over discovered fabric so maybe local port
> > number should be detected later at start of PerfMgr process cycle just
> > using OpenSM DB.
> 
> Why is that better than doing this at bind time of PerfMgr ?

At least two reasons: faster and less code.

> >> ?? ?? ?? }
> >> @@ -511,6 +556,10 @@ __osm_perfmgr_query_counters(cl_map_item_t * const 
> >> p_map_item, void *context)
> >> ?? ?? ?? ?? ?? ?? ?? if (!osm_node_get_physp_ptr(node, port))
> >> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >>
> >> + ?? ?? ?? ?? ?? ?? if (mon_node->redir_port[port].redirection &&
> >> + ?? ?? ?? ?? ?? ?? ?? ?? mon_node->redir_port[port].invalid)
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >> +
> >
> > Are two flags really needed? Couldn't this be stripped down?
> 
> Just invalid should be sufficient. I'll change this in the next version.
> 
> > Also what about letting "chance" for port to refresh redirection info?
> 
> What do you mean ?

When port has invalid redirection data, should you care about attempting
to refresh this?

> >> + ?? ?? /* call the transport layer for a list of local port pkeys */
> >> + ?? ?? status = osm_vendor_get_all_port_attr(pm->subn->p_osm->p_vendor,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 
> >> attr_array, &num_ports);
> >
> > This heavy stuff is performed per redirection request, but it actually
> > uses same data (local port's pkey table). This looks very inefficient.
> > Instead of doing this you can get local port's pkey table only once at
> > PerfMgr process cycle start and do all checks against this already
> > initialized table. Of course only in case when redirection is enabled at
> > all.
> 
> Redirection does not occur frequently.

How could we know:)

> Also, the pkey table could
> change in between

When OpenSM is in master mode it cannot change (PerfMgr is synchronized
with heavy sweep).

It is possible with standby OpenSM, so what - this single request will
fail once.

> and there's no local event support in OpenSM so I
> don't see a way around this other than polling.

Polling is not needed there.

> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "ERR 4C1B: Index for Pkey 0x%x 
> >> not found\n",
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? cl_ntoh16(cpi->redir_pkey));
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> ?? ?? ?? ?? ?? ?? ?? }
> >
> > All above are not OpenSM errors, but wrong external data. I think it
> > should be logged as VERBOSE messages.
> 
> I agree it's wrong external data but it seems serious enough to me to
> treat as an error.

And some stupid port will be able to put OpenSM in endless error
printing. I don't think it is a good idea.

> If not, at least INFO rather than VERBOSE so
> nothing special needs to be done to see these.

IMO VERBOSE level is most appropriate for such things, INFO is something
else and it is on by default.

> >> - ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??"redirection 
> >> requested but disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR, "ERR 4C16: 
> >> "
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "redirection requested but 
> >> disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> + ?? ?? ?? ?? ?? ?? }
> >
> > This is not an error.
> 
> Seems like some sort of configuration error to me if this is disabled
> at the manager but the PMA wants to use it.

PMA shouldn't dictate here.

> Other local configuration
> errors are treated as errors.

This is not configuration error - if admin decided to not bother with
redirection support that is fine.

> > BTW, why to bother with verifying redirection info when redirection
> > support is disabled anyway?
> 
> I thought it was useful to know the redirection info was invalid
> rather than getting the disabled notification and then enabling and
> finding out.

For PMAs debug purposes redirection support should be switched "on"
obviously.

> It can easily be moved up earlier in the flow if that's
> better.

Would be better IMO.

> > In general I would suggest to not mix redirection case with main flow
> > (by using better data structures). The Redirection is not something
> > PerfMgr specific and ideally we could have separate redirection handling
> > module.  I'm not requesting to do this now (in this implementation), but
> > at least flow separation is very desirable.
> 
> I've done some more of this in subsequent patches not yet submitted.
> However, there is no other GS manager (and if it did exist would it
> use redirection ? there are known cases for PerfMgr currently).
> 
> In any case, this is "poor man's" redirection support given what is
> currently available in the OpenFabrics IB stack.

Right. So I'm not requesting "generic redirection module", just to not
mix the main and redirected flows.

Sasha
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to