> > > +#define RTE_QSBR_CNT_THR_OFFLINE 0
> > > +#define RTE_QSBR_CNT_INIT 1
> > > +
> > > +/**
> > > + * RTE thread Quiescent State structure.
> > > + * Quiescent state counter array (array of 'struct rte_rcu_qsbr_cnt'),
> > > + * whose size is dependent on the maximum number of reader threads
> > > + * (m_threads) using this variable, is stored immediately following
> > > + * this structure.
> > > + */
> > > +struct rte_rcu_qsbr {
> > > +	uint64_t token __rte_cache_aligned;
> > > +	/**< Counter to allow for multiple simultaneous QS queries */
> > > +
> > > +	uint32_t num_elems __rte_cache_aligned;
> > > +	/**< Number of elements in the thread ID array */
> > > +	uint32_t m_threads;
> > > +	/**< Maximum number of threads this RCU variable will use */
> > > +
> > > +	uint64_t reg_thread_id[RTE_QSBR_THRID_ARRAY_ELEMS] __rte_cache_aligned;
> > > +	/**< Registered thread IDs are stored in a bitmap array */
> >
> > As I understand it, you ended up with a fixed-size array to avoid two
> > variable-size arrays in this struct?
> Yes.
> >
> > Is there a big penalty for register/unregister() to either store a pointer
> > to the bitmap, or calculate it based on the num_elems value?
> In the last RFC I sent out [1], I tested the impact of having a non-fixed-size
> array. There 'was' a performance degradation in most of the performance tests.
> The issue was with calculating the address of the per-thread QSBR counters
> (not with the address calculation of the bitmap).
> With the current patch, I do not see the performance difference (the
> difference between the RFC and this patch is the memory orderings; they are
> masking any perf gain from having a fixed array). However, I have kept the
> fixed-size array as the generated code does not have additional calculations
> to get the address of the qsbr counter array elements.
>
> [1] http://mails.dpdk.org/archives/dev/2019-February/125029.html
Ok, I see. But can we then arrange them in a different way: qsbr_cnt[] will
start at the end of struct rte_rcu_qsbr (same as you have it right now),
while the bitmap will be placed after qsbr_cnt[].
As I understand it, register/unregister is not considered on the critical
path, so some perf degradation there doesn't matter.
Also, check() would need extra address calculation for the bitmap, but
considering that we have to go through the whole bitmap (and in the worst
case qsbr_cnt[]) anyway, that is probably not a big deal?

> >
> > As another thought - do we really need the bitmap at all?
> The bitmap is helping avoid accessing all the elements in the
> rte_rcu_qsbr_cnt array (as you have mentioned below). This provides the
> ability to scale the number of threads dynamically. For ex: an application
> can create a qsbr variable with 48 max threads, but currently only 2
> threads are active (due to traffic conditions).

I understand that the bitmap is supposed to speed up check() for situations
when most threads are unregistered. My thought was that a check() speedup
for such a situation might not be that critical.

> >
> > Might it be possible to store the register value for each thread inside
> > its rte_rcu_qsbr_cnt:
> > struct rte_rcu_qsbr_cnt {uint64_t cnt; uint32_t register;}
> > __rte_cache_aligned; ?
> > That would cause check() to walk through all elems in the rte_rcu_qsbr_cnt
> > array, but on the other side it would help to avoid cache conflicts for
> > register/unregister.
> With the addition of the rte_rcu_qsbr_thread_online/offline APIs, the
> register/unregister APIs are not in the critical path anymore. Hence, the
> cache conflicts are fine. The online/offline APIs work on thread-specific
> cache lines and these are in the critical path.
> >
> > > +} __rte_cache_aligned;
> > > +