On Fri, Aug 20, 2021 at 9:57 AM Joyce Kong <[email protected]> wrote:
>
> > -----Original Message-----
> > From: David Marchand <[email protected]>
> > Sent: Tuesday, August 17, 2021 4:17 PM
> > To: Joyce Kong <[email protected]>
> > Cc: Burakov, Anatoly <[email protected]>; Olivier Matz
> > <[email protected]>; Andrew Rybchenko
> > <[email protected]>; Wang, Yipeng1
> > <[email protected]>; Gobriel, Sameh <[email protected]>;
> > Bruce Richardson <[email protected]>; Vladimir Medvedkin
> > <[email protected]>; Ananyev, Konstantin
> > <[email protected]>; Honnappa Nagarahalli
> > <[email protected]>; Ruifeng Wang
> > <[email protected]>; dev <[email protected]>; nd <[email protected]>; dpdk
> > stable <[email protected]>; Aaron Conole <[email protected]>
> > Subject: Re: [dpdk-dev] [PATCH v2] test/func_reentrancy: free memzones
> > after creating test case
> >
> > On Sat, Jul 31, 2021 at 2:04 PM Joyce Kong <[email protected]> wrote:
> > >
> > > Function reentrancy test limits maximum number of iterations
> > > simultaneously, however it doesn't free the 'fr_test_once'
> > > memzones after the fact, so introduce freeing 'fr_test_once'
> > > in ring/mempool/hash/fbk/lpm_clean.
> > >
> > > Meanwhile, add the missing free for test case on main thread.
> > >
> > > Fixes: 104a92bd026f ("app: add reentrancy tests")
> > > Fixes: 995eec619024 ("test: clean up memory for function reentrancy
> > > test")
> > > Cc: [email protected]
> > >
> > > Signed-off-by: Joyce Kong <[email protected]>
> > > Reviewed-by: Ruifeng Wang <[email protected]>
> > > Reviewed-by: Feifei Wang <[email protected]>
> >
> > This patch actually breaks the test (we are lucky, the failure happens 
> > often ;)).
> >
> > 28/94 DPDK:fast-tests / func_reentrancy_autotest       FAIL
> > 0.22s (exit status 255 or signal 127 SIGinvalid)
> >
> > --- command ---
> > 16:13:45 DPDK_TEST='func_reentrancy_autotest'
> > /home-local/jenkins-local/jenkins-agent/workspace/Generic-Unit-Test-
> > DPDK/dpdk/build/app/test/dpdk-test
> > -l 0-15 --file-prefix=func_reentrancy_autotest
> > --- stdout ---
> > RTE>>func_reentrancy_autotest
> > Func-ReEnt CASE 0: eal init once PASS
> > ring create/lookup: common object allocated 2 times (should be 1) Func-
> > ReEnt CASE 1: ring create/lookup FAIL Test Failed
> > RTE>>
> > --- stderr ---
> >
> >
> > I guess, this is what happens:
> >
> > main lcore                          worker lcore 1              ...
> > worker lcore X
> >                                     enters ring_create_lookup()
> >
> > enters ring_create_lookup()
> > rte_eal_wait_lcore(worker lcore 1);
> >                                     leaves ring_create_lookup() 
> > ring_clean(worker lcore 1);
> >
> > leaves ring_create_lookup()
> >
> > There is no synchronisation point for the main lcore to know the worker
> > lcores are finished invoking the func callback.
> > With this patch, the "common" object is freed by the main lcore
> > *potentially* before some workers start trying to create it.
> > And we end up with multiple workers successfully creating this object, hence
> > the obj_count being incremented.
> >
> >
> > --
> > David Marchand
>
> I think add rte_eal_mp_wait_lcore() like below can ensure the lcores to free 
> objects
> after all func callback finished.
> Shall do the change in next version.
>
>         RTE_LCORE_FOREACH_WORKER(lcore_id) {
>                 if (cores == 1)
>                         break;
>                 cores--;
>                 rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
>         }
>         rte_atomic32_set(&synchro, 1);
>         if (pt_case->func(pt_case->arg) < 0)
>                 ret = -1;
>
> +       rte_eal_mp_wait_lcore();
>
>         cores = cores_save;
>         RTE_LCORE_FOREACH_WORKER(lcore_id) {
>                 if (cores == 1)
>                         break;
>                 cores--;
> -               if (rte_eal_wait_lcore(lcore_id) < 0)
> -                       ret = -1;
>                 if (pt_case->clean != NULL)
>                         pt_case->clean(lcore_id);
>         }

Using mp_wait_lcore, the test can't tell if a lcore returned an error
after executing the passed callback.

An alternative is to split the current loop to first have the per
lcore rte_eal_wait_lcore() calls + ret code check, and then a second
loop calls the clean() callback.


-- 
David Marchand

Reply via email to