On 7/12/2024 4:18 AM, Ye, MingjinX wrote:


-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Friday, July 12, 2024 12:10 AM
To: Ye, MingjinX <mingjinx...@intel.com>; dev@dpdk.org
Cc: sta...@dpdk.org
Subject: Re: [PATCH 3/3] net/vdev: fix insert vdev core dump

On 3/14/2024 10:36 AM, Mingjin Ye wrote:
Inserting a vdev device when the device arguments are already stored
in devargs_list, the rte_devargs_insert function replaces the supplied
new devargs with the found devargs and frees the new devargs. As a
result, the use of free devargs results in a core dump.

This patch fixes the issue by using valid devargs.

Fixes: f3a1188cee4a ("devargs: make device representation generic")
Cc: sta...@dpdk.org

Signed-off-by: Mingjin Ye <mingjinx...@intel.com>

I am not too familiar with how devargs works, so I have a trivial question.

I understand the point of this patch, and it is roughly as follows:

1) we enter insert_vdev and allocated new devargs structure (and copy the
`name` into devargs)
2) we set dev->device.name to devargs->name
3) we insert devargs into the list using rte_devargs_insert
4) if devargs list already had devargs with the same name, our devargs is
destroyed and replaced with devargs that is in the list
5) because of this, dev->device.name becomes invalid as that specific
devargs has been freed - it points to name from the old devargs

We do need to store devargs->name in dev->device.name, and we need to
do so after calling rte_devargs_insert to avoid keeping reference to memory
that was freed. So, provisionally,

Reviewed-by: Anatoly Burakov <anatoly.bura...@intel.com>

My question is, under what circumstances would there be duplicate entries
in devargs list?

In a multi-process circumstances, the primary and secondary processes both have 
"vdev" startup
Parameters. And their parameters have the same name. This causes multiple 
devargs objects with the same name
to be created in the secondary process, the 1st one constructed from the 
startup parameter and
the 2nd one constructed due to the VDEV_SCAN_ONE message received.

Path 1:
eal_parse_args->eal_parse_common_option: OPT_VDEV_NUM
eal_option_device_parse->rte_devargs_add

Path 2:
vdev_action: vdev_scan_one ->insert_vdev(alloc_devargs)

So, technically, this is only a problem because we're specifying --vdev at secondary process startup even though we could've just relied on vdev multiprocess hotplug to provide us with a device? Or is there more to it?

In any case, I think this patch is OK. I think the Fixes: tag is pointing to an incorrect commit. I've walked through the commit history, and it seems that this wasn't a bug initially, but became a bug when vdev multiprocess hotplug was added, so I think the correct Fixes: tag should be as follows:

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")

So, on account of the above, I suggest rewording the commit message to something like the following:

In secondary processes, insert_vdev() may be called multiple times on the same device due to a combination of multiprocess hotplug for vdev bus, and EAL argument adding the same vdev. In this circumstance, when rte_devargs_insert() is called, the devargs->name reference becomes invalid because rte_devargs_insert() will destroy the just-allocated devargs and will replace the pointer with one from the devargs list. As a result, the reference to devargs->name stored in dev->device.name becomes invalid. Fix the issue by not setting device name until after rte_devargs_insert() was called.

Fixes: cdb068f031c6 ("bus/vdev: scan by multi-process channel")
Cc: sta...@dpdk.org

Signed-off-by: Mingjin Ye <mingjinx...@intel.com>
Reviewed-by: Anatoly Burakov <anatoly.bura...@intel.com>
--
Thanks,
Anatoly

Reply via email to