On 16:29 Wed 15 Apr , Hal Rosenstock wrote:
> >>
> >> ??/* Redirection information */
> >> ??typedef struct redir {
> >> + ?? ?? boolean_t redirection;
> >> + ?? ?? boolean_t invalid;
> >
> > Why using lid value != 0 is/was bad for redirection invalidation?
>
> b/c there are other fields supplied in the redirection which also
> could be invalid so this one flag summarizes that instead of
> overloading one of the fields.
Yes, and you can use lid value as such flag - just simpler.
> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_lid = 0;
> >> - ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].redir_qp = 0;
> >> + ?? ?? ?? ?? ?? ?? /* Clear redirection info for this port except
> >> orig_lid */
> >> + ?? ?? ?? ?? ?? ?? orig_lid = p_mon_node->redir_port[port].orig_lid;
> >> + ?? ?? ?? ?? ?? ?? memset(&p_mon_node->redir_port[port], 0,
> >> sizeof(redir_t));
> >> + ?? ?? ?? ?? ?? ?? p_mon_node->redir_port[port].orig_lid = orig_lid;
> >
> > Hmm, why should 'orig_lid' be part of redirection structure and not
> > placed on original node/port (below I see that it is used in
> > non-redirected paths)?
>
> What are you referring to here ?
Actually I was wrong - I don't where it is used at all.
The comment about using in non-redirected path was about pkey_ix.
> > I think it would be better to use structures like:
> >
> > struct node {
> > ?? ?? ?? ??....
> > ?? ?? ?? ??uint16_t lid;
> > ?? ?? ?? ??uint16_t pkey_ix;
>
> Why would lid and pkey_ix be part of node ?
You are using this with perfmgr_send_pc_mad() in main flow.
> It's like this with redir_tbl_size r.t. num_ports and redir_t
> redir_port[1] instead of your struct port {} ports[0]. Why would what
> you propose be better ?
My point was different - to separate redirection related data from main
flow.
> > PerfMgr is always running over discovered fabric so maybe local port
> > number should be detected later at start of PerfMgr process cycle just
> > using OpenSM DB.
>
> Why is that better than doing this at bind time of PerfMgr ?
At least two reasons: faster and less code.
> >> ?? ?? ?? }
> >> @@ -511,6 +556,10 @@ __osm_perfmgr_query_counters(cl_map_item_t * const
> >> p_map_item, void *context)
> >> ?? ?? ?? ?? ?? ?? ?? if (!osm_node_get_physp_ptr(node, port))
> >> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >>
> >> + ?? ?? ?? ?? ?? ?? if (mon_node->redir_port[port].redirection &&
> >> + ?? ?? ?? ?? ?? ?? ?? ?? mon_node->redir_port[port].invalid)
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? continue;
> >> +
> >
> > Are two flags really needed? Couldn't this be stripped down?
>
> Just invalid should be sufficient. I'll change this in the next version.
>
> > Also what about letting "chance" for port to refresh redirection info?
>
> What do you mean ?
When port has invalid redirection data, should you care about attempting
to refresh this?
> >> + ?? ?? /* call the transport layer for a list of local port pkeys */
> >> + ?? ?? status = osm_vendor_get_all_port_attr(pm->subn->p_osm->p_vendor,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
> >> attr_array, &num_ports);
> >
> > This heavy stuff is performed per redirection request, but it actually
> > uses same data (local port's pkey table). This looks very inefficient.
> > Instead of doing this you can get local port's pkey table only once at
> > PerfMgr process cycle start and do all checks against this already
> > initialized table. Of course only in case when redirection is enabled at
> > all.
>
> Redirection does not occur frequently.
How could we know:)
> Also, the pkey table could
> change in between
When OpenSM is in master mode it cannot change (PerfMgr is synchronized
with heavy sweep).
It is possible with standby OpenSM, so what - this single request will
fail once.
> and there's no local event support in OpenSM so I
> don't see a way around this other than polling.
Polling is not needed there.
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR,
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "ERR 4C1B: Index for Pkey 0x%x
> >> not found\n",
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? cl_ntoh16(cpi->redir_pkey));
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> ?? ?? ?? ?? ?? ?? ?? }
> >
> > All above are not OpenSM errors, but wrong external data. I think it
> > should be logged as VERBOSE messages.
>
> I agree it's wrong external data but it seems serious enough to me to
> treat as an error.
And some stupid port will be able to put OpenSM in endless error
printing. I don't think it is a good idea.
> If not, at least INFO rather than VERBOSE so
> nothing special needs to be done to see these.
IMO VERBOSE level is most appropriate for such things, INFO is something
else and it is on by default.
> >> - ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??"redirection
> >> requested but disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? OSM_LOG(pm->log, OSM_LOG_ERROR, "ERR 4C16:
> >> "
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? "redirection requested but
> >> disabled\n");
> >> + ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? invalid = TRUE;
> >> + ?? ?? ?? ?? ?? ?? }
> >
> > This is not an error.
>
> Seems like some sort of configuration error to me if this is disabled
> at the manager but the PMA wants to use it.
PMA shouldn't dictate here.
> Other local configuration
> errors are treated as errors.
This is not configuration error - if admin decided to not bother with
redirection support that is fine.
> > BTW, why to bother with verifying redirection info when redirection
> > support is disabled anyway?
>
> I thought it was useful to know the redirection info was invalid
> rather than getting the disabled notification and then enabling and
> finding out.
For PMAs debug purposes redirection support should be switched "on"
obviously.
> It can easily be moved up earlier in the flow if that's
> better.
Would be better IMO.
> > In general I would suggest to not mix redirection case with main flow
> > (by using better data structures). The Redirection is not something
> > PerfMgr specific and ideally we could have separate redirection handling
> > module. I'm not requesting to do this now (in this implementation), but
> > at least flow separation is very desirable.
>
> I've done some more of this in subsequent patches not yet submitted.
> However, there is no other GS manager (and if it did exist would it
> use redirection ? there are known cases for PerfMgr currently).
>
> In any case, this is "poor man's" redirection support given what is
> currently available in the OpenFabrics IB stack.
Right. So I'm not requesting "generic redirection module", just to not
mix the main and redirected flows.
Sasha
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general