On 7/12/2024 4:18 AM, Ye, MingjinX wrote:
-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Friday, July 12, 2024 12:10 AM
To: Ye, MingjinX <mingjinx...@intel.com>; dev@dpdk.org
Cc: sta...@dpdk.org
Subject: Re: [PATCH 3/3] net/vdev: fix insert vdev core dump
On 3/14/2024 10:36 AM, Mingjin Ye wrote:
Inserting a vdev device when the device arguments are already stored
in devargs_list, the rte_devargs_insert function replaces the supplied
new devargs with the found devargs and frees the new devargs. As a
result, the use of free devargs results in a core dump.
This patch fixes the issue by using valid devargs.
Fixes: f3a1188cee4a ("devargs: make device representation generic")
Cc: sta...@dpdk.org
Signed-off-by: Mingjin Ye <mingjinx...@intel.com>
I am not too familiar with how devargs works, so I have a trivial question.
I understand the point of this patch, and it is roughly as follows:
1) we enter insert_vdev and allocated new devargs structure (and copy the
`name` into devargs)
2) we set dev->device.name to devargs->name
3) we insert devargs into the list using rte_devargs_insert
4) if devargs list already had devargs with the same name, our devargs is
destroyed and replaced with devargs that is in the list
5) because of this, dev->device.name becomes invalid as that specific
devargs has been freed - it points to name from the old devargs
We do need to store devargs->name in dev->device.name, and we need to
do so after calling rte_devargs_insert to avoid keeping reference to memory
that was freed. So, provisionally,
Reviewed-by: Anatoly Burakov <anatoly.bura...@intel.com>
My question is, under what circumstances would there be duplicate entries
in devargs list?
In a multi-process circumstances, the primary and secondary processes both have
"vdev" startup
Parameters. And their parameters have the same name. This causes multiple
devargs objects with the same name
to be created in the secondary process, the 1st one constructed from the
startup parameter and
the 2nd one constructed due to the VDEV_SCAN_ONE message received.
Path 1:
eal_parse_args->eal_parse_common_option: OPT_VDEV_NUM
eal_option_device_parse->rte_devargs_add
Path 2:
vdev_action: vdev_scan_one ->insert_vdev(alloc_devargs)
So, technically, this is only a problem because we're specifying --vdev
at secondary process startup even though we could've just relied on vdev
multiprocess hotplug to provide us with a device? Or is there more to it?
In any case, I think this patch is OK. I think the Fixes: tag is
pointing to an incorrect commit. I've walked through the commit history,
and it seems that this wasn't a bug initially, but became a bug when
vdev multiprocess hotplug was added, so I think the correct Fixes: tag
should be as follows:
Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
So, on account of the above, I suggest rewording the commit message to
something like the following:
In secondary processes, insert_vdev() may be called multiple times on
the same device due to a combination of multiprocess hotplug for vdev
bus, and EAL argument adding the same vdev. In this circumstance, when
rte_devargs_insert() is called, the devargs->name reference becomes
invalid because rte_devargs_insert() will destroy the just-allocated
devargs and will replace the pointer with one from the devargs list. As
a result, the reference to devargs->name stored in dev->device.name
becomes invalid. Fix the issue by not setting device name until after
rte_devargs_insert() was called.
Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
Cc: sta...@dpdk.org
Signed-off-by: Mingjin Ye <mingjinx...@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.bura...@intel.com>
--
Thanks,
Anatoly