Hi Jerin,

> >
> >
> > > The CPU also
> > > knows already the value that will be written to cons.tail and that
> > > value does not depend on the previous read either. The CPU does not know 
> > > we are planning to do a spinlock there, so it might do things
> > > out-of-order without proper dependencies.
> > >
> > > > For __rte_ring_sc_do_dequeue(), I think you are right, we might need
> > > > something stronger.
> > > > I don't want to put rte_smp_mb() here as it would cause full HW
> > > > barrier even on machines with strong memory order (IA).
> > > > I think that rte_smp_wmb() might be enough here:
> > > > it would force the CPU to wait till the writes in DEQUEUE_PTRS()
> > > > become visible, which means the reads have to be completed too.
> > >
> > > In practice I think that rte_smp_wmb() would work fine, even though
> > > it is not strictly according to the book. Below solution would be my
> > > proposal as a fix to the issue of sc dequeueing (and also to mc 
> > > dequeueing, if we have the problem of CPU completely ignoring the
> > > spinlock in reality there):
> > >
> > > DEQUEUE_PTRS();
> > > ..
> > > rte_smp_wmb();
> > > r->cons.tail = cons_next;
> >
> > As I said in my previous email, it looks good to me for
> > __rte_ring_sc_do_dequeue(), but I am interested to hear what the ARM and PPC
> > maintainers think about it.
> > Jan, Jerin do you have any comments on it?
> 
> Actually, it is NOT performance-effective, and it is difficult to capture the
> ORDER dependency with plain store and load barriers on WEAK ordered machines.
> Beyond plain store and load barriers, we need to express #LoadLoad,
> #LoadStore and #StoreStore barrier dependencies with acquire and release
> semantics in arch-neutral code (looks like this is a compiler barrier on IA):
> http://preshing.com/20120913/acquire-and-release-semantics/
> 
> For instance, a full-barrier CAS (__sync_bool_compare_and_swap) will not be
> required on weakly ordered machines in the MP case.
> I can send out an RFC version of the ring implementation changes required
> for acquire-and-release semantics.
> If it causes performance degradation on IA, then we can separate it out
> through a conditional compilation flag.
> 
> GCC Built-in Functions for Memory Model Aware Atomic Operations 
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
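
To make the acquire/release idea above concrete, here is a minimal sketch of a single-producer/single-consumer ring built on the GCC __atomic builtins. It is not the rte_ring code and not the planned RFC: the names sketch_ring, sketch_sp_enqueue() and sketch_sc_dequeue() and the ring size are hypothetical, and the real ring also has separate head/tail indexes and multi-producer/multi-consumer paths that are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size for this sketch; must be a power of two. */
#define SKETCH_RING_SIZE 8u
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

struct sketch_ring {
    uint32_t prod_tail;             /* written by the producer only */
    uint32_t cons_tail;             /* written by the consumer only */
    void *slots[SKETCH_RING_SIZE];
};

/* Single-producer enqueue of one object: 0 on success, -1 if full. */
static inline int sketch_sp_enqueue(struct sketch_ring *r, void *obj)
{
    /* Our own index: no ordering needed, only this thread writes it. */
    uint32_t head = __atomic_load_n(&r->prod_tail, __ATOMIC_RELAXED);
    /* Acquire: pairs with the consumer's release store, so the consumer's
     * slot reads are complete before we overwrite a freed slot. */
    uint32_t cons = __atomic_load_n(&r->cons_tail, __ATOMIC_ACQUIRE);

    if (head - cons == SKETCH_RING_SIZE)
        return -1;

    r->slots[head & SKETCH_RING_MASK] = obj;
    /* Release: the slot write is ordered before the new prod_tail. */
    __atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
    return 0;
}

/* Single-consumer dequeue of one object: 0 on success, -1 if empty. */
static inline int sketch_sc_dequeue(struct sketch_ring *r, void **obj)
{
    uint32_t tail = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);
    /* Acquire: pairs with the producer's release store, so the slot
     * contents are visible before we read them. */
    uint32_t prod = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

    if (prod == tail)
        return -1;

    *obj = r->slots[tail & SKETCH_RING_MASK];
    /* Release: the slot read above is ordered before the new cons_tail,
     * i.e. the load-then-store ordering discussed for the dequeue path. */
    __atomic_store_n(&r->cons_tail, tail + 1, __ATOMIC_RELEASE);
    return 0;
}
```

On a strongly ordered machine the acquire loads and release stores in this sketch are expected to compile to plain loads and stores plus a compiler barrier, which is the attraction of expressing the dependency this way instead of with explicit full barriers.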

I am not sure exactly what changes you are planning,
but I suppose I'll just wait for your RFC here.
Though my question was: what do you think about the current
__rte_ring_sc_do_dequeue()?
Do you agree that rmb() is not sufficient here, and does Juhamatti's patch:
http://dpdk.org/dev/patchwork/patch/14846/
look good to you?
It looks good to me, and I am going to ACK it, but I thought you'd better
have a look too.
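
For reference, the ordering in question can be sketched as below. This is only an illustration of where the barrier sits in a single-consumer dequeue, not the patch itself: demo_ring, demo_sc_dequeue_bulk() and demo_release_barrier() are hypothetical names, and the release fence is a stand-in that orders earlier loads and stores before later stores. Whether rte_smp_wmb() gives that load-to-store ordering on every architecture is exactly the open point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size for this sketch; must be a power of two. */
#define DEMO_RING_SIZE 16u
#define DEMO_RING_MASK (DEMO_RING_SIZE - 1u)

/* Assumption: stand-in barrier, NOT the DPDK definition of rte_smp_wmb().
 * A release fence keeps earlier loads and stores ahead of later stores. */
#define demo_release_barrier() __atomic_thread_fence(__ATOMIC_RELEASE)

struct demo_ring {
    uint32_t prod_tail;             /* advanced by the producer */
    uint32_t cons_tail;             /* advanced by the consumer */
    void *slots[DEMO_RING_SIZE];
};

/* Single-consumer dequeue of up to n objects; returns the number copied. */
static inline unsigned demo_sc_dequeue_bulk(struct demo_ring *r,
                                            void **objs, unsigned n)
{
    uint32_t cons_head = __atomic_load_n(&r->cons_tail, __ATOMIC_RELAXED);
    uint32_t prod_tail = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);
    uint32_t avail = prod_tail - cons_head;
    unsigned i;

    if (n > avail)
        n = avail;

    /* Step 1: copy the entries out of the ring (loads from the slots). */
    for (i = 0; i < n; i++)
        objs[i] = r->slots[(cons_head + i) & DEMO_RING_MASK];

    /* Step 2: without a barrier here, a weakly ordered CPU may make the
     * tail store below visible before the slot loads above complete,
     * letting a producer overwrite slots that are still being read. */
    demo_release_barrier();

    /* Step 3: publish the new tail, which frees the slots for reuse. */
    __atomic_store_n(&r->cons_tail, cons_head + n, __ATOMIC_RELAXED);
    return n;
}
```

A read barrier placed at step 2 would only order the slot loads against later loads, which is why it does not protect the tail store.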
Thanks
Konstantin


> 
> Thoughts ?
> 
> Jerin
> 
> > Chao, sorry, but I am still not sure why PPC is considered an architecture
> > with strong memory ordering?
> > Maybe I am missing something obvious here.
> > Thanks
> > Konstantin
> >
