> -----Original Message-----
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Thursday, July 5, 2018 3:23 PM
> To: Zhang, Qi Z <qi.z.zh...@intel.com>
> Cc: dev@dpdk.org; Burakov, Anatoly <anatoly.bura...@intel.com>; Ananyev,
> Konstantin <konstantin.anan...@intel.com>; Richardson, Bruce
> <bruce.richard...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; Shelton,
> Benjamin H <benjamin.h.shel...@intel.com>; Vangati, Narender
> <narender.vang...@intel.com>; arybche...@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v8 04/19] ethdev: introduce device lock
>
> 05/07/2018 05:37, Zhang, Qi Z:
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > 05/07/2018 03:38, Zhang, Qi Z:
> > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > 04/07/2018 12:49, Zhang, Qi Z:
> > > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > > > 04/07/2018 03:47, Zhang, Qi Z:
> > > > > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > > > > > 03/07/2018 17:08, Zhang, Qi Z:
> > > > > > > > > > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > > > > > > > > > 02/07/2018 07:44, Qi Zhang:
> > > > > > > > > > > > Introduce API rte_eth_dev_lock and
> > > > > > > > > > > > rte_eth_dev_unlock to let application lock or
> > > > > > > > > > > > unlock on specific ethdev, a locked device can't
> > > > > > > > > > > > be detached, this help applicaiton to prevent
> > > > > > > > > > > > unexpected device detaching, especially in
> > > > > > > > > > > > multi-process
> > > > > envrionment.
> > > > > > > > > > >
> > > > > > > > > > > Trying to understand: a process of an application
> > > > > > > > > > > could try to detach a port while another process is
> > > > > > > > > > > against this
> > > decision.
> > > > > > > > > > > Why an application needs to be protected against itself?
> > > > > > > > > >
> > > > > > > > > > I think we can regard this as a help function, it help
> > > > > > > > > > application to simplified
> > > > > > > > > the situation when one process want to detach a device
> > > > > > > > > while another one is still using it.
> > > > > > > > > > Application can register a callback which can do to
> > > > > > > > > > necessary clean up (like
> > > > > > > > > stop traffic, release memory ...) before device be detached.
> > > > > > > > >
> > > > > > > > > Yes I agree such hook can be a good idea.
> > > > > [...]
> > > > > > > > > After all, it is just a pre-detach hook.
> > > > > > > >
> > > > > > > > > Wait, how is it different of RTE_ETH_EVENT_DESTROY callback?
> > > > > > > > > Perhaps we just need to improve the handling of the
> > > > > > > > > DESTROY
> > > event?
> > > > > > > >
> > > > > > > > I have thought about this before.
> > > > > > > > Not like RTE_ETH_EVENT_DESTROY and other event hook, the
> > > > > > > > hook here
> > > > > > > need to give feedback, pass or fail will impact the
> > > > > > > following behavior, this make it special, so I separate it
> > > > > > > from all exist rte_eth_event_type handle mechanism.
> > > > > > >
> > > > > > > Look at _rte_eth_dev_callback_process, there is a "ret_param".
> > > > > >
> > > > > > OK, that should work.
> > > > > > >
> > > > > > > > The alternative solution is we just introduce a new event
> > > > > > > > type like RTE_ETH_EVENT_PRE_DETACH and reuse all exist API
> > > > > > > rte_eth_dev_callback_register/rte_eth_dev_callback_unregister.
> > > > > > >
> > > > > > > I don't think we need a new event.
> > > > > > > Let's try to use RTE_ETH_EVENT_DESTROY.
> > > > > >
> > > > > > The problem is RTE_ETH_EVENT_DESTROY is used in
> > > > > rte_eth_dev_release_port already.
> > > > > > And in PMD, rte_eth_dev_release_port is called after
> > > > > > dev_uninit, that mean its too late to reject a detach
> > > > >
> > > > > You're right.
> > > > >
> > > > > It's a real mess currently.
> > > > > The right order should be to remove ethdev ports before removing
> > > > > the underlying EAL device. But it's strangely not the case.
> > > > >
> > > > > We need to separate things.
> > > > > The function rte_eth_dev_close can be used to remove an ethdev
> > > > > port if we add a call to rte_eth_dev_release_port.
> > > > > So we could call rte_eth_dev_close in PMD remove functions.
> > > > > Is "close" a good time to ask confirmation to the application?
> > > > > Or should we ask confirmation a step before, on "stop"?
> > > >
> > > > I think the confirmation should before any cleanup stage, it
> > > > should at the
> > > beginning of driver->remove.
> > >
> > > So you stop a port, even if the app policy is against detaching it?
> >
> > My understanding is, stop and detach is different, we may stop a device and
> reconfigure it then restart it.
> > but for detach, properly we will not use it, unless it be probed again.
> > For dev_close , it should be called after dev_stop.
> > so we have to like below.
> >
> > If (dev->started) {
> > dev_stop /* but still problem here, if traffic is ongoing */
> > if (dev_close()) {
> > dev_start()
> > return -EBUSY.
> > }
> > } else {
> > If (dev_close())
> > Return _EBUSY
> > }
> >
> > So for me, neither rte_eth_dev_stop and rte_eth_dev_close is the right place
> to check this.
> > But rte_eth_dev_destroy looks like a good one. We can put all the
> > ethdev general logic into it, and PMD specific dev_unit will be called
> > at last
>
> If you want to detach a port, you need to stop it.
> If one process try to detach a port, but another process decides (via
> callback)
> that the port should not be detached, you will have stopped a port for no good
> reason.
> To me it is a real design issue.
Yes, so I think we still need two iterates for detach.
First iterate to get agreement on all processes.
Secondary iterate to do the detach.
But how?
>
>
> > > > Also we should not put it into rte_eth_dev_stop, because,
> > > > rte_eth_dev_stop
> > > can invoked by application directly, in that case, we don't what any
> > > callback be invoked.
> > >
> > > It it the same to detach a port: it is invoked directly by application.
> > > I thought you wanted a callback as helper for inter-process management?
> > >
> > > > > > So , do you mean we can remove
> > > > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROPY) in
> > > > > > rte_eth_dev_release_port
> > > > >
> > > > > I would say we need RTE_ETH_EVENT_DESTROY to notify that the
> > > > > port is really destroyed.
> > > > > Maybe the right thing to do is to add a new event
> > > > > RTE_ETH_EVENT_CLOSE_REQUEST or something else.
> > > > > Note that we already have 2 removal events in ethdev:
> > > > > - RTE_ETH_EVENT_INTR_RMV when the port cannot be used
> anymore
> > > > > - RTE_ETH_EVENT_DESTROY when the port is going to be deleted
> > > > >
> > > > > > And where is right place to call
> > > > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_DESTROY)?
> > > > > > If can't be called in rte_eth_dev_detach, because if device is
> > > > > > removed by
> > > > > rte_eal_hotplug_remove, it will be skipped.
> > > > >
> > > > > No, rte_eth_dev_detach and rte_eal_hotplug_remove are 2
> > > > > different
> > > things.
> > > > > One is a mix of ethdev and EAL (and should be deprecated), the
> > > > > other one is for the underlying device at EAL level.
> > > > >
> > > > > > probably we need to call this at the beginning of each PMD
> > > > > > driver->remove?,
> > > > > that means, we need to change all PMD drivers?
> > > > >
> > > > > Yes, we can call rte_eth_dev_stop and rte_eth_dev_close at the
> > > > > beginning of PMD remove.
> > > > > Note that there is already a helper rte_eth_dev_destroy called
> > > > > in some PMD to achieve the removal, but curiously, it doesn't
> > > > > call stop and
> > > close functions.
> > > >
> > > > Currently PMD implement driver->remove with different way,
> > > rte_eth_dev_stop / rte_eth_dev_close / rte_eth_dev_destroy is not
> > > always be invoked.
> > > > So Before we standardize what ethdev API and what sequence should
> > > > be called in driver->remove (I think this is a separate task) I
> > > > will suggest 1. Create another help function like
> > > > _rte_eth_dev_allow_to_remove, 2. the help function will call
> > > _rte_eth_dev_callback_process(RTE_ETH_EVENT_PRE_REMOVE) and
> update
> > > ret_param which contain a reject count.
> > > > 3. the help function should to invoked at beginning at
> > > > driver->remove and
> > > driver->remove will abort if the help function failed.
> > > >
> > > > But once we standardized that , we can do cleanup to merge it into
> > > > another
> > > rte_eth_xxx API in next step.
> > > >
> > > > What do you think?
> > >
> > > No
> > > All the problems we have today are because we preferred add more and
> > > more functions instead of fixing the basic stuff. And it is
> > > especially the case for all the detach crap.
> > > So no.
> > > Let's fix stuff first.
> > >
> > >
> >
>
>
>
>