Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership

Ananyev, Konstantin Thu, 11 Jan 2018 16:03:07 -0800

Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > Hi Matan,
> > > >
> > > > Few comments from me below.
> > > > BTW, do you plan to add ownership mandatory check in control path
> > > > functions that change port configuration?
> > >
> > > No.
> >
> > So it still totally voluntary usage and application nneds to be changed to
> > exploit it?
> > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> >
> 
> Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected because 2 
> DPDK entities can get the same port while using it.


I am not talking about racing condition here.
Right now even from the same thread - I can call dev_configure()
for the port which I don't own (let say it belongs to failsafe port),
and that would remain, correct?
 
> As I wrote in the log\docs and as discussed a lot in the first version:
> The new synchronization rules are:
> 1. The port allocation and port release synchronization will be
>    managed by ethdev.
> 2. The port usage synchronization will be managed by the port owner.
> 3. The port ownership API synchronization(also with port creation) will be 
> managed by ethdev.
> 5. DPDK entity which want to use a port must take ownership before.
> 
> Ethdev should not protect 2 and 4 according these rules.
> 
> > > > Konstantin
> > > >
> > > > > -----Original Message-----
> > > > > From: Matan Azrad [mailto:ma...@mellanox.com]
> > > > > Sent: Sunday, January 7, 2018 9:46 AM
> > > > > To: Thomas Monjalon <tho...@monjalon.net>; Gaetan Rivet
> > > > > <gaetan.ri...@6wind.com>; Wu, Jingjing <jingjing...@intel.com>
> > > > > Cc: dev@dpdk.org; Neil Horman <nhor...@tuxdriver.com>;
> > Richardson,
> > > > > Bruce <bruce.richard...@intel.com>; Ananyev, Konstantin
> > > > > <konstantin.anan...@intel.com>
> > > > > Subject: [PATCH v2 2/6] ethdev: add port ownership
> > > > >
> > > > > The ownership of a port is implicit in DPDK.
> > > > > Making it explicit is better from the next reasons:
> > > > > 1. It will define well who is in charge of the port usage 
> > > > > synchronization.
> > > > > 2. A library could work on top of a port.
> > > > > 3. A port can work on top of another port.
> > > > >
> > > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > > We need to check that the application is not trying to use a port
> > > > > which is already managed by fail-safe.
> > > > >
> > > > > A port owner is built from owner id(number) and owner name(string)
> > > > > while the owner id must be unique to distinguish between two
> > > > > identical entity instances and the owner name can be any name.
> > > > > The name helps to logically recognize the owner by different DPDK
> > > > > entities and allows easy debug.
> > > > > Each DPDK entity can allocate an owner unique identifier and can
> > > > > use it and its preferred name to owns valid ethdev ports.
> > > > > Each DPDK entity can get any port owner status to decide if it can
> > > > > manage the port or not.
> > > > >
> > > > > The mechanism is synchronized for both the primary process threads
> > > > > and the secondary processes threads to allow secondary process
> > > > > entity to be a port owner.
> > > > >
> > > > > Add a sinchronized ownership mechanism to DPDK Ethernet devices to
> > > > > avoid multiple management of a device by different DPDK entities.
> > > > >
> > > > > The current ethdev internal port management is not affected by
> > > > > this feature.
> > > > >
> > > > > Signed-off-by: Matan Azrad <ma...@mellanox.com>
> > > > > ---
> > > > >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> > > > >  lib/librte_ether/rte_ethdev.c           | 206
> > > > ++++++++++++++++++++++++++++++--
> > > > >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> > > > >  4 files changed, 311 insertions(+), 10 deletions(-)
> > > >
> > > >
> > > > >
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -70,7 +70,10 @@
> > > > >
> > > > >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> > > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > > > > +/* ports data array stored in shared memory */
> > > > >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > > > > +/* next owner identifier stored in shared memory */ static
> > > > > +uint16_t *rte_eth_next_owner_id;
> > > > >  static uint8_t eth_dev_last_created_port;
> > > > >
> > > > >  /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@
> > > > >  /* spinlock for add/remove tx callbacks */  static rte_spinlock_t
> > > > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> > > > >
> > > > > +/* spinlock for eth device ownership management stored in shared
> > > > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > > > > +
> > > > >  /* store statistics names and its offset in stats structure  */
> > > > > struct rte_eth_xstats_name_off {
> > > > >       char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@
> > > > enum {  }
> > > > >
> > > > >  static void
> > > > > -rte_eth_dev_data_alloc(void)
> > > > > +rte_eth_dev_share_data_alloc(void)
> > > > >  {
> > > > >       const unsigned flags = 0;
> > > > >       const struct rte_memzone *mz;
> > > > > +     const unsigned int data_size = RTE_MAX_ETHPORTS *
> > > > > +                                             
> > > > > sizeof(*rte_eth_dev_data);
> > > > >
> > > > >       if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +             /* Allocate shared memory for port data and ownership */
> > > > >               mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > > > > -                             RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data),
> > > > > +                             data_size + 
> > > > > sizeof(*rte_eth_next_owner_id)
> > > > +
> > > > > +                             sizeof(*rte_eth_dev_ownership_lock),
> > > > >                               rte_socket_id(), flags);
> > > > >       } else
> > > > >               mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > > > > @@ -168,9 +178,17 @@ enum {
> > > > >               rte_panic("Cannot allocate memzone for ethernet port
> > > > data\n");
> > > > >
> > > > >       rte_eth_dev_data = mz->addr;
> > > > > -     if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > > > -             memset(rte_eth_dev_data, 0,
> > > > > -                             RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data));
> > > > > +     rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > > > > +                                          data_size);
> > > > > +     rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > > > > +             ((uintptr_t)rte_eth_next_owner_id +
> > > > > +              sizeof(*rte_eth_next_owner_id));
> > > >
> > > >
> > > > I think that might make  rte_eth_dev_ownership_lock location not 4B
> > > > aligned...
> > >
> > > Where can I find the documentation about it?
> >
> > That's in your code above - data_size and mz_->addr are both at least 4B
> > aligned - rte_eth_dev_ownership_lock = mz->addr + data_size + 2; You can
> > align it manually, but as discussed below it is probably easier to group 
> > related
> > fields into the same struct.
> >
> I mean the documentation about the needed alignment for spinlock. Where is it?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html

Might be ARM and PPC guys can provide you some more complete/recent docs. 


> 
> > >
> > > > Why just not to put all data that you are trying to allocate as one
> > > > chunck into the same struct:
> > > > static struct {
> > > >         uint16_t next_owner_id;
> > > >         /* spinlock for eth device ownership management stored in
> > > > shared memory */
> > > >         rte_spinlock_t dev_ownership_lock;
> > > >         rte_eth_dev_data *data;
> > > > } rte_eth_dev_data;
> > > > and allocate/use it everywhere?
> > > > That would simplify allocation/management stuff.
> > > >
> > > I don't understand what exactly do you mean. ?
> > > If you mean to group all in one struct like:
> > >
> > > static struct {
> > >         uint16_t next_owner_id;
> > >         rte_spinlock_t dev_ownership_lock;
> > >         rte_eth_dev_data  data[];
> > > } rte_eth_dev_share_data;
> > >
> > > Just to simplify the addresses calculation above,
> >
> > Yep, that's exactly what I meant.
> > As you said it would help with bulk allocation/alignment stuff, plus IMO it 
> > is
> > better and easier to group several related global together - Improve code
> > quality, will make it easier to read & maintain in future.
> >
> > > It will change more code in ethdev relative to the old rte_eth_dev_data
> > global array and will be more intrusive.
> > > Stay it as is, focuses the change only here.
> >
> > Yes it would require few more changes, though I think it worth it.
> >
> 
> Ok, Got you and agree.
> 
> > >
> > > I can just move the spinlock memory allocation to be at the beginning of
> > the memzone(to be sure about the alignment).
> > >
> > > > It is good to see that now scanning/updating rte_eth_dev_data[] is
> > > > lock protected, but it might be not very plausible to protect both
> > > > data[] and next_owner_id using the same lock.
> > >
> > > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > > The next_owner_id is read by ownership APIs(for owner validation), so it
> > makes sense to use the same lock.
> > > Actually, why not?
> >
> > Well to me next_owner_id and rte_eth_dev_data[] are not directly related.
> > You may create new owner_id but it doesn't mean you would update
> > rte_eth_dev_data[] immediately.
> > And visa-versa - you might just want to update rte_eth_dev_data[].name or
> > .owner_id.
> > It is not very good coding practice to use same lock for non-related data
> > structures.
> >
> I see the relation like next:
> Since the ownership mechanism synchronization is in ethdev responsibility,
> we must protect against user mistakes as much as we can by using the same 
> lock.
> So, if user try to set by invalid owner (exactly the ID which currently is 
> allocated) we can protect on it.

Hmm, not sure why you can't do same checking with different lock or atomic 
variable?

> 
> > >
> > > > In fact, for next_owner_id, you don't need a lock - just
> > > > rte_atomic_t should be enough.
> > >
> > > I don't think so, it is problematic in next_owner_id wraparound and may
> > complicate the code in other places which read it.
> >
> > IMO it is not that complicated, something like that should work I think.
> >
> > /* init to 0 at startup*/
> > rte_atomic32_t *owner_id;
> >
> > int new_owner_id(void)
> > {
> >     int32_t x;
> >     x = rte_atomic32_add_return(&owner_id, 1);
> >     if (x > UINT16_MAX) {
> >        rte_atomic32_dec(&owner_id);
> >        return -EOVERWLOW;
> >     } else
> >         return x;
> > }
> >
> >
> > > Why not just to keep it simple and using the same lock?
> >
> > Lock is also fine, I just think it better be a separate one - that would 
> > protext
> > just next_owner_id.
> > Though if you are going to use uuid here - all that probably not relevant 
> > any
> > more.
> >
> 
> I agree about the uuid but still think the same lock should be used for both.

But with uuid you don't need next_owner_id at all, right?
So lock will only be used for rte_eth_dev_data[] fields anyway.

> 
> > >
> > > > Another alternative would be to use 2 locks - one for next_owner_id
> > > > second for actual data[] protection.
> > > >
> > > > Another thing - you'll probably need to grab/release a lock inside
> > > > rte_eth_dev_allocated() too.
> > > > It is a public function used by drivers, so need to be protected too.
> > > >
> > >
> > > Yes, I thought about it, but decided not to use lock in next:
> > > rte_eth_dev_allocated
> > > rte_eth_dev_count
> > > rte_eth_dev_get_name_by_port
> > > rte_eth_dev_get_port_by_name
> > > maybe more...
> >
> > As I can see in patch #3 you protect by lock access to
> > rte_eth_dev_data[].name (which seems like a good  thing).
> > So I think any other public function that access rte_eth_dev_data[].name
> > should be protected by the same lock.
> >
> 
> I don't think so, I can understand to use the ownership lock here(as in port 
> creation) but I don't think it is necessary too.
> What are we exactly protecting here?
> Don't you think it is just timing?(ask in the next moment and you
>  may get another answer) I don't see optional crash.

Not sure what you mean here by timing...
As I understand rte_eth_dev_data[].name unique identifies device and is used
by  port allocation/release/find functions.
As you stated above:
"1. The port allocation and port release synchronization will be  managed by 
ethdev."
To me it means that ethdev layer has to make sure that all accesses to
rte_eth_dev_data[].name are atomic.
Otherwise what would prevent the situation when one process does
rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
while second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...)
?

> 
> > > > > +
> > > > > +     if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +             memset(rte_eth_dev_data, 0, data_size);
> > > > > +             *rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > > > > +             rte_spinlock_init(rte_eth_dev_ownership_lock);
> > > > > +     }
> > > > >  }
> > > > >
> > > > >  struct rte_eth_dev *
> > > > > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> > > > >       }
> > > > >
> > > > >       if (rte_eth_dev_data == NULL)
> > > > > -             rte_eth_dev_data_alloc();
> > > > > +             rte_eth_dev_share_data_alloc();
> > > > >
> > > > >       if (rte_eth_dev_allocated(name) != NULL) {
> > > > >               RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s
> > > > already
> > > > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev *
> > > > >       struct rte_eth_dev *eth_dev;
> > > > >
> > > > >       if (rte_eth_dev_data == NULL)
> > > > > -             rte_eth_dev_data_alloc();
> > > > > +             rte_eth_dev_share_data_alloc();
> > > > >
> > > > >       for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > >               if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ -
> > > > 278,8 +296,12
> > > > > @@ struct rte_eth_dev *
> > > > >       if (eth_dev == NULL)
> > > > >               return -EINVAL;
> > > > >
> > > > > -     memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +     rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > >       eth_dev->state = RTE_ETH_DEV_UNUSED;
> > > > > +     memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +
> > > > > +     rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> > > > >               return 1;
> > > > >  }
> > > > >
> > > > > +static int
> > > > > +rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > +     if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > > > > +         (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > > > > +          *rte_eth_next_owner_id <= owner_id)) {
> > > > > +             RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > +             return 0;
> > > > > +     }
> > > > > +     return 1;
> > > > > +}
> > > > > +
> > > > > +uint16_t
> > > > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > > > owner_id)
> > > > > +{
> > > > > +     while (port_id < RTE_MAX_ETHPORTS &&
> > > > > +            (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED 
> > > > > ||
> > > > > +            rte_eth_devices[port_id].data->owner.id != owner_id))
> > > > > +             port_id++;
> > > > > +
> > > > > +     if (port_id >= RTE_MAX_ETHPORTS)
> > > > > +             return RTE_MAX_ETHPORTS;
> > > > > +
> > > > > +     return port_id;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_new(uint16_t *owner_id) {
> > > > > +     int ret = 0;
> > > > > +
> > > > > +     rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +     if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > > > > +             /* Counter wrap around. */
> > > > > +             RTE_PMD_DEBUG_TRACE("Reached maximum number of
> > > > Ethernet port owners.\n");
> > > > > +             ret = -EUSERS;
> > > > > +     } else {
> > > > > +             *owner_id = (*rte_eth_next_owner_id)++;
> > > > > +     }
> > > > > +
> > > > > +     rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > > +     return ret;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_set(const uint16_t port_id,
> > > > > +                   const struct rte_eth_dev_owner *owner)
> > > >
> > > > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> > > > new_owner)
> > > > - that might be more plausible for user, and would greatly simplify
> > > > unset()
> > > > part:
> > > > just set(port_id, cur_owner, zero_owner);
> > > >
> > >
> > > How the user should know the old owner?
> >
> > By dev_owner_get() or it might have it stored somewhere already (or
> > constructed on the fly in case of NO_OWNER).
> >
> It complicates the usage.
> What's about creating an internal API  _rte_eth_dev_owner_set(port_id, 
> old_owner,
> new_owner) and using it by the current exposed set\unset APIs?

Sounds good to me.

> 
> > >
> > > > > +{
> > > > > +     struct rte_eth_dev_owner *port_owner;
> > > > > +     int ret = 0;
> > > > > +     int sret;
> > > > > +
> > > > > +     rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +     if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +             RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +             ret = -ENODEV;
> > > > > +             goto unlock;
> > > > > +     }
> > > > > +
> > > > > +     if (!rte_eth_is_valid_owner_id(owner->id)) {
> > > > > +             ret = -EINVAL;
> > > > > +             goto unlock;
> > > > > +     }
> > > > > +
> > > > > +     port_owner = &rte_eth_devices[port_id].data->owner;
> > > > > +     if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > > > > +         port_owner->id != owner->id) {
> > > > > +             RTE_LOG(ERR, EAL,
> > > > > +                     "Cannot set owner to port %d already owned by
> > > > %s_%05d.\n",
> > > > > +                     port_id, port_owner->name, port_owner->id);
> > > > > +             ret = -EPERM;
> > > > > +             goto unlock;
> > > > > +     }
> > > > > +
> > > > > +     sret = snprintf(port_owner->name,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > > > > +                     owner->name);
> > > > > +     if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> > > >
> > > > Personally, I don't see any reason to fail if description was 
> > > > truncated...
> > > > Another alternative - just use rte_malloc() here to allocate big
> > > > enough buffer to hold the description.
> > > >
> > >
> > > But it is static allocation like in the device name, why to allocate it
> > differently?
> >
> > Static allocation is fine by me - I just said there is probably no need to 
> > fail if
> > description provide by use will be truncated in that case.
> > Though if used description is *that* important - rte_malloc() can help here.
> >
> Again, what is the difference between port name and owner name regarding the 
> allocations?

As I understand rte_eth_dev_data[].name unique identifies device and always has 
to be consistent.
owner.name is not critical for system operation, and I don't see a big deal if 
it would be truncated.

> The advantage of static allocation:
> 1. Not use protected malloc\free functions in other protected code.

You can call malloc/free before/after grabbing the lock.
But as I said - I am fine with static array here too - I just don't think
truncating user description should cause a failure.  

> 2.  Easier to the user.
> 
> > >
> > > > > +             memset(port_owner->name, 0,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > > > +             RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > > > +             ret = -EINVAL;
> > > > > +             goto unlock;
> > > > > +     }
> > > > > +
> > > > > +     port_owner->id = owner->id;
> > > > > +     RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > > > > +                         owner->name, owner->id);
> > > > > +
> > > >
> > > > As another nit - you can avoid all these gotos by restructuring code a 
> > > > bit:
> > > >
> > > > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > > > rte_eth_dev_owner *owner) {
> > > >     rte_spinlock_lock(...);
> > > >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> > > >     rte_spinlock_unlock(...);
> > > >     return ret;
> > > > }
> > > >
> > > Don't you like gotos? :)
> >
> > Not really :)
> >
> > > I personally use it only in error\performance scenarios.
> >
> > Same here - prefer to avoid them if possible.
> >
> > > Do you think it worth the effort?
> >
> > IMO - yes, well structured code is much easier to understand and maintain.
> I don't think so in error cases(and performance), It is really clear here, 
> but if you are insisting, I will change it.
> Are you?

Yes, that would be my preference.
Why otherwise I would bother to write all this? :)

> (If the community thinks like you I think "goto" check should be added to 
> checkpatch).

Might be there are pieces of code there goto are really hard to avoid,
and/or using goto would provide some performance benefit or so...
But that case definitely doesn't look like that.
Konstantin

Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership

Reply via email to