> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Jason Gunthorpe
> Sent: Wednesday, July 22, 2015 5:09 PM
> To: Wan, Kaike
> Cc: [email protected]; Fleck, John; Weiny, Ira
> Subject: Re: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink
>
> On Thu, Jul 09, 2015 at 01:34:28PM -0400, [email protected] wrote:
> > From: Kaike Wan <[email protected]>
> >
> > This patch routes a SA pathrecord query to netlink first and processes
> > the response appropriately. If a failure is returned, the request will
> > be sent through IB. The decision whether to route the request to
> > netlink first is determined by the presence of a listener for the
> > local service netlink multicast group. If the user-space local service
> > netlink multicast group listener is not present, the request will be
> > sent through IB, just like what is currently being done.
>
> So, I tested it again, and the UAPI looks pretty reasonable now.
>
> The netlink validation still needs to be fixed.
>
> > There is where we might have a problem: nla_parse() allows only one
> > entry for each attribute type in the returned array tb[]. If we have
> > multiple (say 6) pathrecords returned, each having different flags
> > (for different path use), a later one will replace an earlier one in
> > tb[].
>
> The parse is OK, continue to use the for loop, nla_parse does more than just
> extract into tb, it validates all the sizes are correct, ignore tb.
>
> My testing shows it seems to get into some kind of fast repeating query loop:
>
> [ 4904.969188] Return answer 0
> [ 4904.969483] Return answer 0
> [ 4904.969703] Return answer 0
> [ 4904.969824] Return answer 0
> [ 4904.969943] Return answer 0
> [ 4904.970058] Return answer 0
> [ 4904.970175] Return answer 0
> [ 4904.970289] Return answer 0
>
> diff --git a/drivers/infiniband/core/sa_query.c
> b/drivers/infiniband/core/sa_query.c
> index 63ea36d05072..e6b0181e7076 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -640,6 +640,8 @@ static void ib_nl_process_good_resolve_rsp(struct
> ib_sa_query *query,
> }
> }
> }
> +
> + printk("Return answer %u\n",status);
> query->callback(query, status, mad);
> }
>
> You'll have to figure that out. I'm just doing ping X while running a
> responder
> on the netlink socket, v7 didn't do this, IIRC.
I did not see such a repeated query loop. With a modified version of your
debugging code:
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -653,6 +653,7 @@ static void ib_nl_process_good_resolve_rsp(struct
ib_sa_query *query,
}
}
}
+ printk("Request %u returns answer %u\n", nlh->nlmsg_seq);
query->callback(query, status, mad);
}
I ran rping a few times and never saw such a behavior:
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 2 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 3 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 4 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26
Request 5 returns answer 0
client DISCONNECT EVENT...
[root@phifs011 ~]#
The nl sequence increased sequentially. What's the difference?
In my debugging patch, I have also other print statements and I have never seen
such a loop at all.
>
> I think it has something to do with the timer as the first request works fine,
> then a pause, then a storm.
>
> Actually, looks like nothing removes nl queries from the timeout loop when
> they successfully complete. (ie this series doesn't even have a hope of
> working properly)
That is incorrect. The first thing in ib_nl_handle_resolve_resp is to find the
request and remove it from the ib_nl_rquest_list. the timeout routine will
remove the entries that have timed out from the nl request list.
>
> Please actually test this patch completely and thoroughly before sending
> another version.
Generally, I tested the patches (along with a debugging patch) with rping,
ping, SRP, and another performance path (no debugging patch). I also tested
another ibacm patch for setting timeout. When making major changes, I also ran
tests for timeout and error response cases.
>I'm broadly happy with this, so get your whole team to look
> it over agin.
>
> > + if (query->callback) {
> > + head = (const struct nlattr *) nlmsg_data(nlh);
> > + len = nlmsg_len(nlh);
> > + switch (query->path_use) {
> > + case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL:
> > + mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND;
> > + break;
> > +
> > + case LS_RESOLVE_PATH_USE_ALL:
> > + case LS_RESOLVE_PATH_USE_GMP:
> > + default:
> > + mask = IB_PATH_PRIMARY | IB_PATH_GMP |
> > + IB_PATH_BIDIRECTIONAL;
> > + break;
> > + }
> > + nla_for_each_attr(curr, head, len, rem) {
> > + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) {
> > + rec = nla_data(curr);
> > + /*
> > + * Get the first one. In the future, we may
> > + * need to get up to 6 pathrecords.
> > + */
> > + if (rec->flags & mask) {
>
> (rec_>flags & mask) == mask
>
> All requested bits must be set, not just any requested bit.
>
> A IB_PATH_PRIMARY | IB_PATH_OUTBOUND result does not satisfy the
> requirement for LS_RESOLVE_PATH_USE_GMP.
>
> The goal is to local the one path of many that satisfies the requirement.
> Future kernels will direct 6 path answers
Got it.
>
> > +static int ib_nl_handle_set_timeout(struct sk_buff *skb,
> > + struct netlink_callback *cb)
> > +{
> > + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh;
> > + int timeout, delta, abs_delta;
> > + const struct nlattr *attr;
> > + unsigned long flags;
> > + struct ib_sa_query *query;
> > + long delay = 0;
> > + struct nlattr *tb[LS_NLA_TYPE_MAX + 1];
> > + int ret;
> > +
> > + ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh),
> nlmsg_len(nlh),
> > + NULL);
> > + attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT];
> > + if (ret || !attr || nla_len(attr) != sizeof(u32))
> > + goto settimeout_out;
>
> This probably doesn't need the nested attribute, but I'm indifferent.
It's not a nested attribute, only a U32 attribute data. I will use a policy to
validate the entire response (for pathrecord also).
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html