On Fri, Sep 22, 2023 at 02:57:32PM +0200, Thomas Monjalon wrote:
> 20/09/2023 12:09, Bruce Richardson:
> > On Wed, Sep 20, 2023 at 12:00:08PM +0200, David Marchand wrote:
> > > On Thu, Sep 14, 2023 at 12:42 PM Bruce Richardson
> > > <bruce.richard...@intel.com> wrote:
> > > >
> > > > When examining the IOL testing failures for patch series [1], I observed
> > > > that the failures reported were in the eal_flags_file_prefix unit test.
> > > > I was able to reproduce this on my system by passing an additional
> > > > "--on-pci" flag to the test run, since the log to the test has errors
> > > > about device availability. Adding the "no-pci" flag to the individual
> > > 
> > > Something is not clear to me.
> > > 
> > > While I understand that passing "no-pci" helps avoiding the issue (as
> > > described below), I have some trouble understanding this passage
> > > (above) with "--on-pci".
> > 
> > That's a typo for no-pci. When I ran the test on my system with the main
> > process using no-pci, I was able to reproduce the issue seen in the IOL
> > lab. Otherwise I couldn't reproduce it.
> > 
> > > How did you reproduce the issue?
> > > 
> > > 
> > > > test commands used by the unit tests fixed the issue thereafter,
> > > > allowing the test to pass in all cases for me. Therefore, I am
> > > > submitting this patch in the hopes of making the test more robust, since
> > > > the observed failures seem unrelated to the original patchset [1] I
> > > > submitted.
> > > >
> > > > [1] http://patches.dpdk.org/project/dpdk/list/?series=29406
> > > >
> > > > Bruce Richardson (1):
> > > >   app/test: skip PCI bus scan when testing prefix flags
> > > >
> > > >  app/test/test_eal_flags.c | 20 ++++++++++----------
> > > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > 
> > > Iiuc, the problem is that the file_prefix unit test can fail if any
> > > DPDK subsystem forgets to release some memory and some hugepages are
> > > left behind at the cleanup step.
> > > Passing --no-pci as you suggest hides issues coming from PCI drivers.
> > > 
> > > This is something I tried to fix too, with
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=29288 though my
> > > fix only handles a part of the issue (here, the ethdev drivers).
> > > 
> > > Another way to make the file prefix more robust would be to remove the
> > > check on released memory, or move it to another test.
> > > 
> > I actually think the test is a good one to have. Also, taking in your patch
> > to help with the issue is a good idea also.
> > 
> > I'd still suggest that this patch be considered anyway, as there is no need
> > to do PCI bus scanning as part of this test. Therefore I'd view it as a
> > harmless addition that may help things.
> 
> I'm hesitating.
> This test is checking if some memory is left, and I think it is sane.
> If we add --no-pci, we reduce the coverage of this check.
> 
> Now that the root cause is fixed by David in ethdev
> (https://patches.dpdk.org/project/dpdk/patch/20230821085806.3062613-4-david.march...@redhat.com/)
> we could continue checking memory freeing with PCI drivers.
> So I tend to reject this patch.
> 
> Other opinions?
> 
No objection to this patch being rejected if not necessary.

However, I'd question if the normal case is actually checking for freeing
memory in PCI drivers. I suspect that in EAL cleanup we delete all files we
use, irrespective of whether the mappings are still in use. Then when the
process exits the hugepages will be completely freed back - even if some
components leaked memory. I believe this case is checking for correct EAL
cleanup of hugepage files, not for any memory leaks, and in that regard
omitting some components should make no difference.

/Bruce

Reply via email to