Tuesday, July 10, 2018 1:59 PM, Adrien Mazarguil: > Subject: Re: [PATCH v4 07/10] net/mlx5: probe all port representors > > On Tue, Jul 10, 2018 at 10:13:25AM +0000, Shahaf Shuler wrote: > > Tuesday, July 10, 2018 12:37 PM, Adrien Mazarguil: > > > Subject: Re: [PATCH v4 07/10] net/mlx5: probe all port representors > > > > > > On Mon, Jul 09, 2018 at 11:57:29AM +0000, Shahaf Shuler wrote: > > > > Hi Adrien, > > > > > > > > > > > > Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil: > > > > > Subject: [PATCH v4 07/10] net/mlx5: probe all port representors > > > > > > > > > > Probe existing port representors in addition to their master > > > > > device and associate them automatically. > > > > > > > > > > To avoid collision between Ethernet devices, they are named as > follows: > > > > > > > > > > - "{DBDF}" for master/switch devices. > > > > > - "{DBDF}_representor_{rep}" with "rep" starting from 0 for port > > > > > representors. > > > > > > > > > > (Patch based on prior work from Yuanhan Liu) > > > > > > > > > > Signed-off-by: Adrien Mazarguil <adrien.mazarg...@6wind.com> > > > > > Signed-off-by: Nelio Laranjeiro <nelio.laranje...@6wind.com> > > > > > Reviewed-by: Xueming Li <xuemi...@mellanox.com> > > > > > Cc: Xueming Li <xuemi...@mellanox.com> > > > > > Cc: Shahaf Shuler <shah...@mellanox.com> > > > > > -- > > > > > v4 changes: > > > > > > > > > > - Fixed domain ID release once the last port using it is closed. > > > > > Closed > > > > > devices are not necessarily detached, their presence is not a good > > > > > indicator. Code was modified to check if they still use their domain > IDs > > > > > before deciding to release it. > > > <snip> > > > > > @@ -883,6 +915,41 @@ mlx5_dev_spawn(struct rte_device > *dpdk_dev, > > > > > priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA); > > > > > priv->nl_socket_route = mlx5_nl_init(RTMGRP_LINK, > > > > > NETLINK_ROUTE); > > > > > priv->nl_sn = 0; > > > > > + priv->representor = !!switch_info->representor; > > > > > + priv->domain_id = > RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID; > > > > > + priv->representor_id = > > > > > + switch_info->representor ? switch_info->port_name > : -1; > > > > > + /* > > > > > + * Look for sibling devices in order to reuse their switch > domain > > > > > + * if any, otherwise allocate one. > > > > > + */ > > > > > + i = mlx5_dev_to_port_id(dpdk_dev, NULL, 0); > > > > > + if (i > 0) { > > > > > + uint16_t port_id[i]; > > > > > + > > > > > + i = RTE_MIN(mlx5_dev_to_port_id(dpdk_dev, > port_id, i), i); > > > > > + while (i--) { > > > > > + const struct priv *opriv = > > > > > + rte_eth_devices[port_id[i]].data- > > > > > >dev_private; > > > > > + > > > > > + if (!opriv || > > > > > + opriv->domain_id == > > > > > + > RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) > > > > > + continue; > > > > > + priv->domain_id = opriv->domain_id; > > > > > > > > It looks like for the second port it will use the domain_id of the > > > > first port. Is > > > that what you intent? > > > > > > Yes, it's on purpose. Master and representors of a given device must > > > share the same domain ID to let applications know they can create > > > flow rules to forward traffic between them all. > > > > But this is not the case in Mellanox devices. On Mellanox devices each PF > along w/ its representors has a separate eswitch, and traffic cannot be > routed between the switches using flow rules. > > For example if we have PF0 along w/ its representor REP0_0 and PF1 along > w/ its representor REP1_0 . PF0 and REP0_0 will belong to switch X and PF1 > and REP1_0 will belong to switch domain Y. it is also being reflected on the > phys_switch_id. > > > > We should have switch domain per PF. > > Looks like I didn't understand your previous comment. I confirm there is no > such issue, one domain ID is allocated per PF/representors group, which are > identified by a common PCI bus address. It's fine because on mlx5, each > physical port exposes its own address, I assumed there was no need to > additionally compare phys_switch_id. Can this happen?
OK great. It is OK, the PF has only a single switch domain on which all its representors are connected. > > > > > Note - I couldn't test it due to compilation errors: > > > > > > > > > > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx > > > 5 > > > _nl.c: In function 'mlx5_nl_switch_info_cb': > > > > > > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx > > > 5 > > > _ > > > > nl.c:843:8: error: 'IFLA_PHYS_PORT_NAME' undecl ared (first use in > > > > this > > > function) > > > > case IFLA_PHYS_PORT_NAME: > > > > ^ > > > > > > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx > > > 5 > > > _ > > > > nl.c:843:8: note: each undeclared identifier is reported only > > > > once for each function it appears in > > > > > > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx > > > 5 > > > _ > > > > nl.c:851:8: error: 'IFLA_PHYS_SWITCH_ID' undecl ared (first use in > > > > this > > > function) > > > > case IFLA_PHYS_SWITCH_ID: > > > > ^ > > > > > > > > My system info: > > > > NAME="Red Hat Enterprise Linux Server" > > > > VERSION="7.3 (Maipo)" > > > > ID="rhel" > > > > ID_LIKE="fedora" > > > > VERSION_ID="7.3" > > > > PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)" > > > > ANSI_COLOR="0;31" > > > > CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server" > > > > > > > > HOME_URL="https://emea01.safelinks.protection.outlook.com/?url=https > > > % > 3A%2F%2Fwww.redhat.com%2F&data=02%7C01%7Cshahafs%40mellan > > > > ox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e4d9ba6a4 > > > > d149256f461b%7C0%7C0%7C636668122474445351&sdata=Lg8arhiYLvH5L > > > 2hef8DVhS8A3fVJ%2B5IZkLIHmqCd%2FmY%3D&reserved=0" > > > > > > > > BUG_REPORT_URL="https://emea01.safelinks.protection.outlook.com/?url > > > = > https%3A%2F%2Fbugzilla.redhat.com%2F&data=02%7C01%7Cshahafs% > > > > 40mellanox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e > > > > 4d9ba6a4d149256f461b%7C0%7C0%7C636668122474445351&sdata=3Do > > > > RKjxovM8tOgKLssC1mq2wwfhjpVUZSExXV4ywBEQ%3D&reserved=0" > > > > > > > > REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" > > > > REDHAT_BUGZILLA_PRODUCT_VERSION=7.3 > > > > REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" > > > > REDHAT_SUPPORT_PRODUCT_VERSION="7.3" > > > > Red Hat Enterprise Linux Server release 7.3 (Maipo) Red Hat > > > > Enterprise Linux Server release 7.3 (Maipo) > > > > > > OK, I'll redefine in v5 in case they are missing on the host system. > > > > > > <snip> > > > > > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h > > > > > index > > > > > 704046270..cc01310e0 100644 > > > > > --- a/drivers/net/mlx5/mlx5.h > > > > > +++ b/drivers/net/mlx5/mlx5.h > > > > > @@ -159,6 +159,7 @@ struct priv { > > > > > struct ibv_context *ctx; /* Verbs context. */ > > > > > struct ibv_device_attr_ex device_attr; /* Device properties. > */ > > > > > struct ibv_pd *pd; /* Protection Domain. */ > > > > > + char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device > name. */ > > > > > > > > > > > > Why we need a dedicated entry for the ibdev_name? it is already > > > > part of > > > priv->ctx->device->name. > > > > > > Heh, same reason as the next line below, don't forget those damn > > > secondaries which can't dereference local pointers from the primary > > > process > > > :) > > > > Right 😊. > > > > > > > > > > char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path > for > > > > > secondary */ > > > <snip> > > > > > struct rte_eth_dev_info *info) > > > > > info->speed_capa = priv->link_speed_capa; > > > > > info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK; > > > > > mlx5_set_default_params(dev, info); > > > > > + info->switch_info.name = dev->data->name; > > > > > + info->switch_info.domain_id = priv->domain_id; > > > > > + info->switch_info.port_id = priv->representor_id; > > > > > + if (priv->representor) { > > > > > + unsigned int i = mlx5_dev_to_port_id(dev->device, > NULL, 0); > > > > > + uint16_t port_id[i]; > > > > > + > > > > > + i = RTE_MIN(mlx5_dev_to_port_id(dev->device, > port_id, i), > > > > > i); > > > > > + while (i--) { > > > > > + struct priv *opriv = > > > > > + rte_eth_devices[port_id[i]].data- > > > > > >dev_private; > > > > > + > > > > > + if (!opriv || > > > > > + opriv->representor || > > > > > + opriv->domain_id != priv->domain_id) > > > > > + continue; > > > > > + /* > > > > > + * Override switch name with that of the > master > > > > > + * device. > > > > > + */ > > > > > + info->switch_info.name = opriv->dev_data- > >name; > > > > > + break; > > > > > > > > According to this logic it means once the master device is closed, > > > > all the > > > representors are no longer belong to the same switch (switch name of > > > each is different) which is not correct. > > > > > > They still share the same domain ID, which is what actually matters. > > > The switch name is only provided to let applications identify the > > > master > > > (control) device in case it's needed. > > > > > > > According to your notes it is possible to close master w/o closing > > > > the > > > representor. > > > > > > This allows devices to be probed in any order on a needed basis, not > > > all at once. It's done on purpose to pave the way for hotplug support. > > > > > > > Why not just storing the master switch name when probing the > > > representor and to use it as is on the dev_info? > > > > > > The switch name *must* be that of the master device. If the master > > > is not probed, there can't be a switch name. However there's no real > > > provision for this in the API, so I chose the most acceptable unique > > > name, which is the name of the local device. Would you prefer an empty > name instead? > > > > The current approach is OK. > > I was just suggesting to skip the loop iteration by saving the switch name > > on > the private structure. > > This is unsafe, if the master device is never probed or somehow replaced by > a different device with no relationship, this information could be wrong. > > Keep in mind these ethdev names are just identifiers. The only requirement > is that they must be unique, however anything can be written in there. If > some name is not taken, another device can use it. > > > > Thing is, on mlx5 flow rules can be created directly between > > > representors without involving the master device. An empty switch > > > name may be misleading in this respect. > > > > > > What do you suggest? > > -- > Adrien Mazarguil > 6WIND