Hi Matan, > > Hi Konstantin > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM > > Hi Matan, > > > > > > > > Hi Konstantin > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM > > > > Hi Matan, > > > > > > > > Few comments from me below. > > > > BTW, do you plan to add ownership mandatory check in control path > > > > functions that change port configuration? > > > > > > No. > > > > So it still totally voluntary usage and application nneds to be changed to > > exploit it? > > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan? > > > > Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected because 2 > DPDK entities can get the same port while using it.
I am not talking about racing condition here. Right now even from the same thread - I can call dev_configure() for the port which I don't own (let say it belongs to failsafe port), and that would remain, correct? > As I wrote in the log\docs and as discussed a lot in the first version: > The new synchronization rules are: > 1. The port allocation and port release synchronization will be > managed by ethdev. > 2. The port usage synchronization will be managed by the port owner. > 3. The port ownership API synchronization(also with port creation) will be > managed by ethdev. > 5. DPDK entity which want to use a port must take ownership before. > > Ethdev should not protect 2 and 4 according these rules. > > > > > Konstantin > > > > > > > > > -----Original Message----- > > > > > From: Matan Azrad [mailto:ma...@mellanox.com] > > > > > Sent: Sunday, January 7, 2018 9:46 AM > > > > > To: Thomas Monjalon <tho...@monjalon.net>; Gaetan Rivet > > > > > <gaetan.ri...@6wind.com>; Wu, Jingjing <jingjing...@intel.com> > > > > > Cc: dev@dpdk.org; Neil Horman <nhor...@tuxdriver.com>; > > Richardson, > > > > > Bruce <bruce.richard...@intel.com>; Ananyev, Konstantin > > > > > <konstantin.anan...@intel.com> > > > > > Subject: [PATCH v2 2/6] ethdev: add port ownership > > > > > > > > > > The ownership of a port is implicit in DPDK. > > > > > Making it explicit is better from the next reasons: > > > > > 1. It will define well who is in charge of the port usage > > > > > synchronization. > > > > > 2. A library could work on top of a port. > > > > > 3. A port can work on top of another port. > > > > > > > > > > Also in the fail-safe case, an issue has been met in testpmd. > > > > > We need to check that the application is not trying to use a port > > > > > which is already managed by fail-safe. > > > > > > > > > > A port owner is built from owner id(number) and owner name(string) > > > > > while the owner id must be unique to distinguish between two > > > > > identical entity instances and the owner name can be any name. > > > > > The name helps to logically recognize the owner by different DPDK > > > > > entities and allows easy debug. > > > > > Each DPDK entity can allocate an owner unique identifier and can > > > > > use it and its preferred name to owns valid ethdev ports. > > > > > Each DPDK entity can get any port owner status to decide if it can > > > > > manage the port or not. > > > > > > > > > > The mechanism is synchronized for both the primary process threads > > > > > and the secondary processes threads to allow secondary process > > > > > entity to be a port owner. > > > > > > > > > > Add a sinchronized ownership mechanism to DPDK Ethernet devices to > > > > > avoid multiple management of a device by different DPDK entities. > > > > > > > > > > The current ethdev internal port management is not affected by > > > > > this feature. > > > > > > > > > > Signed-off-by: Matan Azrad <ma...@mellanox.com> > > > > > --- > > > > > doc/guides/prog_guide/poll_mode_drv.rst | 14 ++- > > > > > lib/librte_ether/rte_ethdev.c | 206 > > > > ++++++++++++++++++++++++++++++-- > > > > > lib/librte_ether/rte_ethdev.h | 89 ++++++++++++++ > > > > > lib/librte_ether/rte_ethdev_version.map | 12 ++ > > > > > 4 files changed, 311 insertions(+), 10 deletions(-) > > > > > > > > > > > > > > > > > > > > > > > diff --git a/lib/librte_ether/rte_ethdev.c > > > > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644 > > > > > --- a/lib/librte_ether/rte_ethdev.c > > > > > +++ b/lib/librte_ether/rte_ethdev.c > > > > > @@ -70,7 +70,10 @@ > > > > > > > > > > static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data"; > > > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS]; > > > > > +/* ports data array stored in shared memory */ > > > > > static struct rte_eth_dev_data *rte_eth_dev_data; > > > > > +/* next owner identifier stored in shared memory */ static > > > > > +uint16_t *rte_eth_next_owner_id; > > > > > static uint8_t eth_dev_last_created_port; > > > > > > > > > > /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@ > > > > > /* spinlock for add/remove tx callbacks */ static rte_spinlock_t > > > > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER; > > > > > > > > > > +/* spinlock for eth device ownership management stored in shared > > > > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock; > > > > > + > > > > > /* store statistics names and its offset in stats structure */ > > > > > struct rte_eth_xstats_name_off { > > > > > char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@ > > > > enum { } > > > > > > > > > > static void > > > > > -rte_eth_dev_data_alloc(void) > > > > > +rte_eth_dev_share_data_alloc(void) > > > > > { > > > > > const unsigned flags = 0; > > > > > const struct rte_memzone *mz; > > > > > + const unsigned int data_size = RTE_MAX_ETHPORTS * > > > > > + > > > > > sizeof(*rte_eth_dev_data); > > > > > > > > > > if (rte_eal_process_type() == RTE_PROC_PRIMARY) { > > > > > + /* Allocate shared memory for port data and ownership */ > > > > > mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA, > > > > > - RTE_MAX_ETHPORTS * > > > > sizeof(*rte_eth_dev_data), > > > > > + data_size + > > > > > sizeof(*rte_eth_next_owner_id) > > > > + > > > > > + sizeof(*rte_eth_dev_ownership_lock), > > > > > rte_socket_id(), flags); > > > > > } else > > > > > mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA); > > > > > @@ -168,9 +178,17 @@ enum { > > > > > rte_panic("Cannot allocate memzone for ethernet port > > > > data\n"); > > > > > > > > > > rte_eth_dev_data = mz->addr; > > > > > - if (rte_eal_process_type() == RTE_PROC_PRIMARY) > > > > > - memset(rte_eth_dev_data, 0, > > > > > - RTE_MAX_ETHPORTS * > > > > sizeof(*rte_eth_dev_data)); > > > > > + rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr + > > > > > + data_size); > > > > > + rte_eth_dev_ownership_lock = (rte_spinlock_t *) > > > > > + ((uintptr_t)rte_eth_next_owner_id + > > > > > + sizeof(*rte_eth_next_owner_id)); > > > > > > > > > > > > I think that might make rte_eth_dev_ownership_lock location not 4B > > > > aligned... > > > > > > Where can I find the documentation about it? > > > > That's in your code above - data_size and mz_->addr are both at least 4B > > aligned - rte_eth_dev_ownership_lock = mz->addr + data_size + 2; You can > > align it manually, but as discussed below it is probably easier to group > > related > > fields into the same struct. > > > I mean the documentation about the needed alignment for spinlock. Where is it? http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html Might be ARM and PPC guys can provide you some more complete/recent docs. > > > > > > > > Why just not to put all data that you are trying to allocate as one > > > > chunck into the same struct: > > > > static struct { > > > > uint16_t next_owner_id; > > > > /* spinlock for eth device ownership management stored in > > > > shared memory */ > > > > rte_spinlock_t dev_ownership_lock; > > > > rte_eth_dev_data *data; > > > > } rte_eth_dev_data; > > > > and allocate/use it everywhere? > > > > That would simplify allocation/management stuff. > > > > > > > I don't understand what exactly do you mean. ? > > > If you mean to group all in one struct like: > > > > > > static struct { > > > uint16_t next_owner_id; > > > rte_spinlock_t dev_ownership_lock; > > > rte_eth_dev_data data[]; > > > } rte_eth_dev_share_data; > > > > > > Just to simplify the addresses calculation above, > > > > Yep, that's exactly what I meant. > > As you said it would help with bulk allocation/alignment stuff, plus IMO it > > is > > better and easier to group several related global together - Improve code > > quality, will make it easier to read & maintain in future. > > > > > It will change more code in ethdev relative to the old rte_eth_dev_data > > global array and will be more intrusive. > > > Stay it as is, focuses the change only here. > > > > Yes it would require few more changes, though I think it worth it. > > > > Ok, Got you and agree. > > > > > > > I can just move the spinlock memory allocation to be at the beginning of > > the memzone(to be sure about the alignment). > > > > > > > It is good to see that now scanning/updating rte_eth_dev_data[] is > > > > lock protected, but it might be not very plausible to protect both > > > > data[] and next_owner_id using the same lock. > > > > > > I guess you mean to the owner structure in rte_eth_dev_data[port_id]. > > > The next_owner_id is read by ownership APIs(for owner validation), so it > > makes sense to use the same lock. > > > Actually, why not? > > > > Well to me next_owner_id and rte_eth_dev_data[] are not directly related. > > You may create new owner_id but it doesn't mean you would update > > rte_eth_dev_data[] immediately. > > And visa-versa - you might just want to update rte_eth_dev_data[].name or > > .owner_id. > > It is not very good coding practice to use same lock for non-related data > > structures. > > > I see the relation like next: > Since the ownership mechanism synchronization is in ethdev responsibility, > we must protect against user mistakes as much as we can by using the same > lock. > So, if user try to set by invalid owner (exactly the ID which currently is > allocated) we can protect on it. Hmm, not sure why you can't do same checking with different lock or atomic variable? > > > > > > > > In fact, for next_owner_id, you don't need a lock - just > > > > rte_atomic_t should be enough. > > > > > > I don't think so, it is problematic in next_owner_id wraparound and may > > complicate the code in other places which read it. > > > > IMO it is not that complicated, something like that should work I think. > > > > /* init to 0 at startup*/ > > rte_atomic32_t *owner_id; > > > > int new_owner_id(void) > > { > > int32_t x; > > x = rte_atomic32_add_return(&owner_id, 1); > > if (x > UINT16_MAX) { > > rte_atomic32_dec(&owner_id); > > return -EOVERWLOW; > > } else > > return x; > > } > > > > > > > Why not just to keep it simple and using the same lock? > > > > Lock is also fine, I just think it better be a separate one - that would > > protext > > just next_owner_id. > > Though if you are going to use uuid here - all that probably not relevant > > any > > more. > > > > I agree about the uuid but still think the same lock should be used for both. But with uuid you don't need next_owner_id at all, right? So lock will only be used for rte_eth_dev_data[] fields anyway. > > > > > > > > Another alternative would be to use 2 locks - one for next_owner_id > > > > second for actual data[] protection. > > > > > > > > Another thing - you'll probably need to grab/release a lock inside > > > > rte_eth_dev_allocated() too. > > > > It is a public function used by drivers, so need to be protected too. > > > > > > > > > > Yes, I thought about it, but decided not to use lock in next: > > > rte_eth_dev_allocated > > > rte_eth_dev_count > > > rte_eth_dev_get_name_by_port > > > rte_eth_dev_get_port_by_name > > > maybe more... > > > > As I can see in patch #3 you protect by lock access to > > rte_eth_dev_data[].name (which seems like a good thing). > > So I think any other public function that access rte_eth_dev_data[].name > > should be protected by the same lock. > > > > I don't think so, I can understand to use the ownership lock here(as in port > creation) but I don't think it is necessary too. > What are we exactly protecting here? > Don't you think it is just timing?(ask in the next moment and you > may get another answer) I don't see optional crash. Not sure what you mean here by timing... As I understand rte_eth_dev_data[].name unique identifies device and is used by port allocation/release/find functions. As you stated above: "1. The port allocation and port release synchronization will be managed by ethdev." To me it means that ethdev layer has to make sure that all accesses to rte_eth_dev_data[].name are atomic. Otherwise what would prevent the situation when one process does rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...) while second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ? > > > > > > + > > > > > + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { > > > > > + memset(rte_eth_dev_data, 0, data_size); > > > > > + *rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1; > > > > > + rte_spinlock_init(rte_eth_dev_ownership_lock); > > > > > + } > > > > > } > > > > > > > > > > struct rte_eth_dev * > > > > > @@ -225,7 +243,7 @@ struct rte_eth_dev * > > > > > } > > > > > > > > > > if (rte_eth_dev_data == NULL) > > > > > - rte_eth_dev_data_alloc(); > > > > > + rte_eth_dev_share_data_alloc(); > > > > > > > > > > if (rte_eth_dev_allocated(name) != NULL) { > > > > > RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s > > > > already > > > > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev * > > > > > struct rte_eth_dev *eth_dev; > > > > > > > > > > if (rte_eth_dev_data == NULL) > > > > > - rte_eth_dev_data_alloc(); > > > > > + rte_eth_dev_share_data_alloc(); > > > > > > > > > > for (i = 0; i < RTE_MAX_ETHPORTS; i++) { > > > > > if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ - > > > > 278,8 +296,12 > > > > > @@ struct rte_eth_dev * > > > > > if (eth_dev == NULL) > > > > > return -EINVAL; > > > > > > > > > > - memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data)); > > > > > + rte_spinlock_lock(rte_eth_dev_ownership_lock); > > > > > + > > > > > eth_dev->state = RTE_ETH_DEV_UNUSED; > > > > > + memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data)); > > > > > + > > > > > + rte_spinlock_unlock(rte_eth_dev_ownership_lock); > > > > > return 0; > > > > > } > > > > > > > > > > @@ -294,6 +316,174 @@ struct rte_eth_dev * > > > > > return 1; > > > > > } > > > > > > > > > > +static int > > > > > +rte_eth_is_valid_owner_id(uint16_t owner_id) { > > > > > + if (owner_id == RTE_ETH_DEV_NO_OWNER || > > > > > + (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER && > > > > > + *rte_eth_next_owner_id <= owner_id)) { > > > > > + RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id); > > > > > + return 0; > > > > > + } > > > > > + return 1; > > > > > +} > > > > > + > > > > > +uint16_t > > > > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t > > > > owner_id) > > > > > +{ > > > > > + while (port_id < RTE_MAX_ETHPORTS && > > > > > + (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED > > > > > || > > > > > + rte_eth_devices[port_id].data->owner.id != owner_id)) > > > > > + port_id++; > > > > > + > > > > > + if (port_id >= RTE_MAX_ETHPORTS) > > > > > + return RTE_MAX_ETHPORTS; > > > > > + > > > > > + return port_id; > > > > > +} > > > > > + > > > > > +int > > > > > +rte_eth_dev_owner_new(uint16_t *owner_id) { > > > > > + int ret = 0; > > > > > + > > > > > + rte_spinlock_lock(rte_eth_dev_ownership_lock); > > > > > + > > > > > + if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) { > > > > > + /* Counter wrap around. */ > > > > > + RTE_PMD_DEBUG_TRACE("Reached maximum number of > > > > Ethernet port owners.\n"); > > > > > + ret = -EUSERS; > > > > > + } else { > > > > > + *owner_id = (*rte_eth_next_owner_id)++; > > > > > + } > > > > > + > > > > > + rte_spinlock_unlock(rte_eth_dev_ownership_lock); > > > > > + return ret; > > > > > +} > > > > > + > > > > > +int > > > > > +rte_eth_dev_owner_set(const uint16_t port_id, > > > > > + const struct rte_eth_dev_owner *owner) > > > > > > > > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner, > > > > new_owner) > > > > - that might be more plausible for user, and would greatly simplify > > > > unset() > > > > part: > > > > just set(port_id, cur_owner, zero_owner); > > > > > > > > > > How the user should know the old owner? > > > > By dev_owner_get() or it might have it stored somewhere already (or > > constructed on the fly in case of NO_OWNER). > > > It complicates the usage. > What's about creating an internal API _rte_eth_dev_owner_set(port_id, > old_owner, > new_owner) and using it by the current exposed set\unset APIs? Sounds good to me. > > > > > > > > > +{ > > > > > + struct rte_eth_dev_owner *port_owner; > > > > > + int ret = 0; > > > > > + int sret; > > > > > + > > > > > + rte_spinlock_lock(rte_eth_dev_ownership_lock); > > > > > + > > > > > + if (!rte_eth_dev_is_valid_port(port_id)) { > > > > > + RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); > > > > > + ret = -ENODEV; > > > > > + goto unlock; > > > > > + } > > > > > + > > > > > + if (!rte_eth_is_valid_owner_id(owner->id)) { > > > > > + ret = -EINVAL; > > > > > + goto unlock; > > > > > + } > > > > > + > > > > > + port_owner = &rte_eth_devices[port_id].data->owner; > > > > > + if (port_owner->id != RTE_ETH_DEV_NO_OWNER && > > > > > + port_owner->id != owner->id) { > > > > > + RTE_LOG(ERR, EAL, > > > > > + "Cannot set owner to port %d already owned by > > > > %s_%05d.\n", > > > > > + port_id, port_owner->name, port_owner->id); > > > > > + ret = -EPERM; > > > > > + goto unlock; > > > > > + } > > > > > + > > > > > + sret = snprintf(port_owner->name, > > > > RTE_ETH_MAX_OWNER_NAME_LEN, "%s", > > > > > + owner->name); > > > > > + if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) { > > > > > > > > Personally, I don't see any reason to fail if description was > > > > truncated... > > > > Another alternative - just use rte_malloc() here to allocate big > > > > enough buffer to hold the description. > > > > > > > > > > But it is static allocation like in the device name, why to allocate it > > differently? > > > > Static allocation is fine by me - I just said there is probably no need to > > fail if > > description provide by use will be truncated in that case. > > Though if used description is *that* important - rte_malloc() can help here. > > > Again, what is the difference between port name and owner name regarding the > allocations? As I understand rte_eth_dev_data[].name unique identifies device and always has to be consistent. owner.name is not critical for system operation, and I don't see a big deal if it would be truncated. > The advantage of static allocation: > 1. Not use protected malloc\free functions in other protected code. You can call malloc/free before/after grabbing the lock. But as I said - I am fine with static array here too - I just don't think truncating user description should cause a failure. > 2. Easier to the user. > > > > > > > > > + memset(port_owner->name, 0, > > > > RTE_ETH_MAX_OWNER_NAME_LEN); > > > > > + RTE_LOG(ERR, EAL, "Invalid owner name.\n"); > > > > > + ret = -EINVAL; > > > > > + goto unlock; > > > > > + } > > > > > + > > > > > + port_owner->id = owner->id; > > > > > + RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id, > > > > > + owner->name, owner->id); > > > > > + > > > > > > > > As another nit - you can avoid all these gotos by restructuring code a > > > > bit: > > > > > > > > rte_eth_dev_owner_set(const uint16_t port_id, const struct > > > > rte_eth_dev_owner *owner) { > > > > rte_spinlock_lock(...); > > > > ret = _eth_dev_owner_set_unlocked(port_id, owner); > > > > rte_spinlock_unlock(...); > > > > return ret; > > > > } > > > > > > > Don't you like gotos? :) > > > > Not really :) > > > > > I personally use it only in error\performance scenarios. > > > > Same here - prefer to avoid them if possible. > > > > > Do you think it worth the effort? > > > > IMO - yes, well structured code is much easier to understand and maintain. > I don't think so in error cases(and performance), It is really clear here, > but if you are insisting, I will change it. > Are you? Yes, that would be my preference. Why otherwise I would bother to write all this? :) > (If the community thinks like you I think "goto" check should be added to > checkpatch). Might be there are pieces of code there goto are really hard to avoid, and/or using goto would provide some performance benefit or so... But that case definitely doesn't look like that. Konstantin