Hi Stephen,

Thanks for the review. Some responses inline:
> -----Original Message-----
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, December 7, 2018 12:10 PM
> To: Carrillo, Erik G <erik.g.carri...@intel.com>
> Cc: rsanf...@akamai.com; jerin.ja...@caviumnetworks.com;
> pbhagavat...@caviumnetworks.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in
> shared memory
>
> On Fri, 7 Dec 2018 11:52:59 -0600
> Erik Gabriel Carrillo <erik.g.carri...@intel.com> wrote:
>
> > Currently, the timer library uses a per-process table of structures to
> > manage skiplists of timers, presumably because timers contain arbitrary
> > function pointers whose values may not resolve properly in other
> > processes.
> >
> > However, if the same callback is used to handle all timers, and that
> > callback is only invoked in one process, then it would be safe to allow
> > the data structures to be allocated in shared memory, and to allow
> > secondary processes to modify the timer lists. This would let timers
> > be used in more multi-process scenarios.
> >
> > The library's global variables are wrapped in a struct, and an array
> > of these structures is created in shared memory. The original APIs
> > are updated to reference the zeroth entry in the array. This maintains
> > the original behavior for both primary and secondary processes, since
> > the set intersection of their coremasks should be empty [1]. New APIs
> > are introduced to enable the allocation/deallocation of other entries
> > in the array.
> >
> > New variants of the APIs used to start and stop timers are introduced;
> > they allow a caller to specify which array entry should be used to
> > locate the timer list to insert into or delete from.
> >
> > Finally, a new variant of rte_timer_manage() is introduced, which
> > allows a caller to specify which array entry should be used to locate
> > the timer lists to process; it can also process multiple timer lists
> > per invocation.
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carri...@intel.com>
>
> Makes sense but it looks to me like an ABI breakage. Experimental isn't
> going to work for this.

For APIs that existed prior to this patch, I've duplicated them in a "19.02"
node in the map file; I only marked new APIs as experimental. I also
versioned each API in order to maintain the prior interface. I tested ABI
compatibility with devtools/validate-abi.sh, and it reported no errors. So I
believe this won't break the ABI, but if something does need to change, I
certainly will.

> > +static uint32_t default_data_id; // id set to zero automatically
>
> C++ style comments are not allowed per DPDK coding style.
> Best to just drop the comment, it is stating the obvious.

Sure - will do.

> > -/* Init the timer library. */
> > +static inline int
> > +timer_data_valid(uint32_t id)
> > +{
> > +	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
> > +}
>
> Don't need inline on static functions.

...
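For reference, the map-file layout I'm describing looks roughly like the sketch below: the pre-existing symbols are repeated in a new 19.02 node that inherits from the oldest node, and only the new APIs go in the EXPERIMENTAL section. The symbol names shown are illustrative, not the patch's exact export list.

```
DPDK_2.0 {
	global:

	rte_timer_manage;
	rte_timer_reset;

	local: *;
};

DPDK_19.02 {
	global:

	rte_timer_manage;
	rte_timer_reset;
} DPDK_2.0;

EXPERIMENTAL {
	global:

	rte_timer_alt_manage;
	rte_timer_data_alloc;
};
```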
> > +MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
> > +BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
> > +
> > +int __rte_experimental
> > +rte_timer_alt_manage(uint32_t timer_data_id,
> > +		     unsigned int *poll_lcores,
> > +		     int nb_poll_lcores,
> > +		     rte_timer_alt_manage_cb_t f)
> > +{
> > +	union rte_timer_status status;
> > +	struct rte_timer *tim, *next_tim, **pprev;
> > +	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
> > +	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
> > +	unsigned int this_lcore = rte_lcore_id();
> > +	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
> > +	uint64_t cur_time;
> > +	int i, j, ret;
> > +	int nb_runlists = 0;
> > +	struct rte_timer_data *data;
> > +	struct priv_timer *privp;
> > +	uint32_t poll_lcore;
> > +
> > +	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
> > +
> > +	/* timer manager only runs on EAL thread with valid lcore_id */
> > +	assert(this_lcore < RTE_MAX_LCORE);
> > +
> > +	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
> > +
> > +	if (poll_lcores == NULL) {
> > +		poll_lcores = (unsigned int []){rte_lcore_id()};
>
> This isn't going to be safe. It assigns poll_lcores to an array allocated
> on the stack.

poll_lcores is allowed to be NULL when rte_timer_alt_manage() is called, for
convenience; if it is NULL, we create a one-element array on the stack and
point poll_lcores at it. poll_lcores only needs to remain valid for the
duration of the call, so pointing it at an array on the stack seems fine.
Did I miss the point?
> > +
> > +	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> > +	     poll_lcore = poll_lcores[++i]) {
> > +		privp = &data->priv_timer[poll_lcore];
> > +
> > +		/* optimize for the case where per-cpu list is empty */
> > +		if (privp->pending_head.sl_next[0] == NULL)
> > +			continue;
> > +		cur_time = rte_get_timer_cycles();
> > +
> > +#ifdef RTE_ARCH_64
> > +		/* on 64-bit the value cached in the pending_head.expired
> > +		 * will be updated atomically, so we can consult that for a
> > +		 * quick check here outside the lock
> > +		 */
> > +		if (likely(privp->pending_head.expire > cur_time))
> > +			continue;
> > +#endif
>
> This code needs to be optimized so that an application can call this at a
> very high rate without performance impact.