> > > > > > > > > > > > >
> > > > > > > > > > > > > After evaluating long term API/ABI issues, I
> > > > > > > > > > > > > think you need to get rid of almost all use of
> > > > > > > > > > > > > inline and visible structures. Yes it might be
> > > > > > > > > > > > > marginally slower, but you thank me
> > > > > > > the first time you have to fix something.
> > > > > > > > > > > > >
> > > > > > > > > > > > Agree, I was planning on another version to
> > > > > > > > > > > > address this (I am yet
> > > > > > > to take a look at your patch addressing the ABI).
> > > > > > > > > > > > The structure visibility definitely needs to be
> addressed.
> > > > > > > > > > > > For the inline functions, is the plan to convert
> > > > > > > > > > > > all the inline functions in DPDK? If yes, I think
> > > > > > > > > > > > we need to consider the performance
> > > > > > > > > > > difference. May be consider L3-fwd application,
> > > > > > > > > > > change all the
> > > > > > > inline functions in its path and run a test?
> > > > > > > > > > >
> > > > > > > > > > > Every function that is not in the direct datapath
> > > > > > > > > > > should not be
> > > > > > > inline.
> > > > > > > > > > > Exceptions or things like rx/tx burst, ring
> > > > > > > > > > > enqueue/dequeue, and packet alloc/free
> > > > > > I do not understand how DPDK can claim ABI compatibility if we
> > > > > > have
> > > > > inline functions (unless we freeze any development in these
> > > > > inline functions forever).
> > > > > >
> > > > > > > > > >
> > > > > > > > > > Plus synchronization routines: spin/rwlock/barrier, etc.
> > > > > > > > > > I think rcu should be one of such exceptions - it is
> > > > > > > > > > just another synchronization mechanism after all (just
> > > > > > > > > > a bit more
> > > > > sophisticated).
> > > > > > > > > > Konstantin
> > > > > > > > >
> > > > > > > > > If you look at the other userspace RCU, you wil see that
> > > > > > > > > the only inlines are the rcu_read_lock,rcu_read_unlock
> > > > > > > > > and
> > > > > > > rcu_reference/rcu_assign_pointer.
> > > > > > > > >
> > > > > > > > > The synchronization logic is all real functions.
> > > > > > > >
> > > > > > > > In fact, I think urcu provides both flavors:
> > > > > > > > https://github.com/urcu/userspace-
> > > > > > > rcu/blob/master/include/urcu/static/
> > > > > > > > urcu-qsbr.h I still don't understand why we have to treat
> > > > > > > > it differently then let say spin-lock/ticket-lock or rwlock.
> > > > > > > > If we gone all the way to create our own version of rcu,
> > > > > > > > we probably want it to be as fast as possible (I know that
> > > > > > > > main speedup should come from the fact that readers don't
> > > > > > > > have to wait for writer to finish, but still...)
> > > > > > > >
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > >
> > > > > > > Having locking functions inline is already a problem in
> > > > > > > current
> > > releases.
> > > > > > > The implementation can not be improved without breaking ABI
> > > > > > > (or doing special workarounds like lock v2)
> > > > > > I think ABI and inline function discussion needs to be taken
> > > > > > up in a
> > > > > different thread.
> > > > > >
> > > > > > Currently, I am looking to hide the structure visibility. I
> > > > > > looked at your
> > > > > patch [1], it is a different case than what I have in this
> > > > > patch. It is a pretty generic use case as well (similar
> > > > > situation exists in other libraries). I think a generic solution 
> > > > > should
> be agreed upon.
> > > > > >
> > > > > > If we have to hide the structure content, the handle to QS
> > > > > > variable
> > > > > returned to the application needs to be opaque. I suggest using
> 'void *'
> > > > > behind which any structure can be used.
> > > > > >
> > > > > > typedef void * rte_rcu_qsbr_t; typedef void * rte_hash_t;
> > > > > >
> > > > > > But it requires typecasting.
> > > > > >
> > > > > > [1] http://patchwork.dpdk.org/cover/52609/
> > > > >
> > > > > C allows structure to be defined without knowing what is in it
> > > therefore.
> > > > >
> > > > > typedef struct rte_rcu_qsbr rte_rcu_qsbr_t;
> > > > >
> > > > > is preferred (or do it without typedef)
> > > > >
> > > > > struct rte_rcu_qsbr;
> > > >
> > > > I see that rte_hash library uses the same approach (struct
> > > > rte_hash in
> > > rte_hash.h, though it is marking as internal). But the ABI
> > > Laboratory tool [1] seems to be reporting incorrect numbers for this
> > > library even though the internal structure is changed.
> > > >
> > > > [1]
> > > > https://abi-
> > > laboratory.pro/index.php?view=compat_report&l=dpdk&v1=19.0
> > > > 2&v2=current&obj=66794&kind=abi
> > >
> > > The problem is rte_hash structure is exposed as part of ABI in
> > > rte_cuckoo_hash.h This was a mistake.
> > Do you mean, due to the use of structure with the same name? I am
> > wondering if it is just a tools issue. The application is not supposed to
> include rte_cuckoo_hash.h.
> >
> > For the RCU library, we either need to go all functions or leave it
> > the way it is. I do not see a point in trying to hide the internal structure
> while having inline functions.
> >
> > I converted the inline functions to function calls.
> >
> > Testing on Arm platform (results *are* repeatable) shows very minimal
> > drop (0.1% to 0.2%) in performance while using lock-free rte_hash data
> structure. But one of the test cases which is just spinning shows good
> amount of drop (41%).
> >
> > Testing on x86 (Xeon Gold 6132 CPU @ 2.60GHz, results *are* pretty
> > repeatable) shows performance improvements (7% to 8%) while using
> lock-free rte_hash data structure. The test cases which is just spinning
> show significant drop (14%, 155%, 231%).
> > Konstantin, any thoughts on the results?
> 
> The fact that function show better result than inline (even for hash) is sort
> of surprise to me.
It was a surprise to me too and counter-intuitive to my understanding.
 
> Don't have any good explanation off-hand, but the actual numbers for
> hash test are huge by itself...
> 
> In general, I still think that sync primitives better to stay inlined - there 
> is
> no much point to create ones and then figure out that no-one using them
> because they are too slow.
> Though if there is no real perf difference between inlined and normal - no
> point to keep it inlined.
> About RCU lib, my thought to have inlined version for 19.05 and do
> further perf testing with it (as I remember there were suggestions about
> using it in l3fwd for guarding routing table or so).
Yes, there is more work planned to integrate the library better which might 
provide more insight. 

> If we'll find there is no real difference - move it to not-inlined version in
> 19.08.
+1.

> It is experimental for now  - so could be changed without formal ABI
> breakage.
> 
>  Konstantin
> 
> 

Reply via email to