On Mon, Dec 12, 2016 at 05:56:32PM +0000, Eads, Gage wrote:
> 
> 
> >  -----Original Message-----
> >  From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> >  Sent: Wednesday, December 7, 2016 10:42 PM
> >  To: Eads, Gage <gage.e...@intel.com>
> >  1) What if the burst has ATOMIC flows and we are NOT enqueuing to the
> >  implementation? Then other event ports won't get the packets from the same
> >  ATOMIC tag. Bad, right?
> 
> I'm not sure what scenario you're describing here. The buffered (as
> implemented in my patch) and non-buffered enqueue operations are functionally
> the same (as long as the buffer is flushed); the difference lies in when the
> events are moved from the application level to the PMD.

OK. I will try to explain with a timeline.

Assume:
flush size: 16
burst size: 4

At t0: dequeued 4 events (3 ordered and 1 atomic)
At t1: after processing the events, store them in the local event buffer
At t2: request to dequeue 4 more events

Now, since the scheduler scheduled an atomic event to the port at t0, it
cannot schedule an atomic event of the same TAG to _any port_ until that
event is enqueued back, because the in-flight count for a given atomic TAG
is always one, to enable critical-section processing in the packet flow.
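
To make the timeline concrete, here is a minimal sketch of that application
flow. The burst prototypes follow the eventdev RFC under discussion;
process(), app_buffer[], dev_id and port_id are placeholders.

	struct rte_event ev[4], app_buffer[16];
	unsigned int buffered = 0;
	uint16_t i, nb;

	/* t0: dequeue 4 events (3 ORDERED + 1 ATOMIC). The ATOMIC event pins
	 * its TAG to this port until it is enqueued back or released. */
	nb = rte_event_dequeue_burst(dev_id, port_id, ev, 4, 0);

	/* t1: process and store in the local buffer; nothing reaches the PMD,
	 * so the ATOMIC TAG stays locked to this port. */
	for (i = 0; i < nb; i++) {
		process(&ev[i]);
		ev[i].op = RTE_EVENT_OP_FORWARD;
		app_buffer[buffered++] = ev[i];
	}

	/* t2: dequeue 4 more events. Until app_buffer is flushed, no port can
	 * receive another event of that ATOMIC TAG. */
	nb = rte_event_dequeue_burst(dev_id, port_id, ev, 4, 0);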

> 
> >  2) At least in our HW implementation, the event buffer strategy is more
> >  like: you get events from dequeue ONLY if you have enqueued the previously
> >  dequeued events back to the HW (when op == RTE_EVENT_OP_FORWARD). So it
> >  will create a deadlock, i.e. the application cannot hold events with
> >  RTE_EVENT_OP_FORWARD.
> 
> If I'm reading this correctly, you're concerned that buffered events can
> result in deadlock if they're not flushed. Whether the buffering is done in
> the app itself, inline in the API, or in the PMDs, not flushing the buffer is
> an application bug. E.g. the app could be fixed by flushing its enqueue
> buffer after processing every dequeued burst of events, or when dequeue
> returns 0 events.
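
For reference, the flush discipline described above would look roughly like
the fragment below in the application. enqueue_buffered()/enqueue_flush(),
BURST_SIZE, done and process() are placeholder names, not the patch's actual
API.

	while (!done) {
		uint16_t i, nb;

		nb = rte_event_dequeue_burst(dev_id, port_id, ev,
					     BURST_SIZE, 0);
		if (nb == 0) {
			/* nothing new arrived: push out anything buffered */
			enqueue_flush(dev_id, port_id);
			continue;
		}
		for (i = 0; i < nb; i++) {
			process(&ev[i]);
			ev[i].op = RTE_EVENT_OP_FORWARD;
			enqueue_buffered(dev_id, port_id, &ev[i]);
		}
		/* flush once per processed burst to bound buffering delay */
		enqueue_flush(dev_id, port_id);
	}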

No. At least in our HW implementation, the device maintains the state of the
events scheduled to a port. The driver gets the next set of events ONLY if it
submits the events it already got on the previous dequeue, i.e. the
application cannot hold events with RTE_EVENT_OP_FORWARD.
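
In other words, the only flow such hardware supports is to hand the dequeued
events back before asking for more. A rough sketch (process(), BURST_SIZE,
done, dev_id and port_id are placeholders):

	while (!done) {
		uint16_t i, nb;

		nb = rte_event_dequeue_burst(dev_id, port_id, ev,
					     BURST_SIZE, 0);
		for (i = 0; i < nb; i++) {
			process(&ev[i]);
			ev[i].op = RTE_EVENT_OP_FORWARD;
		}
		/* return the same events to the device before dequeuing
		 * again; otherwise the port's scheduling context is never
		 * released and the next dequeue returns nothing */
		if (nb)
			rte_event_enqueue_burst(dev_id, port_id, ev, nb);
	}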

> 
> >  3) So, considering the above case, there is nothing like a flush for us.
> >  4) In a real high-throughput benchmark case, we will get packets at the
> >  max burst rate, and then we always need to memcpy before we flush.
> >  Otherwise there will be an ordering issue, as a burst can get us packets
> >  from different flows (unlike polling mode).
> 
> I take it you're referring to the memcpy in the patch, and not an additional 
> memcpy? At any rate, I'm hoping that SIMD instructions can optimize the 16B 
> event copy.

Hmm. The point was that we need to memcpy all the time to maintain the order.
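
That is, every forwarded event has to go through a copy like the one below
before it can be flushed. The enq_buffer layout is hypothetical and bounds
checks/includes are omitted; the point is just the per-event 16B copy.

	struct enq_buffer {
		struct rte_event buf[16];	/* flush size */
		uint16_t count;
	};

	/* one 16B struct rte_event copy for every event that is buffered */
	static inline void
	buffer_events(struct enq_buffer *b, const struct rte_event *ev,
		      uint16_t nb)
	{
		memcpy(&b->buf[b->count], ev, nb * sizeof(*ev));
		b->count += nb;
	}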

> 
> >  
> >  >
> >  > > and some do not need to hold the buffers if it is DDR backed.
> >  >
> >  
> >  See above. I am not against burst processing in the "application".
> >  The flush does not make sense for us from a HW perspective, and it is
> >  costly for us if we try to generalize it.
> >  
> 
> Besides the data copy that buffering requires, are there additional costs 
> from your perspective?

It won't even work in our case, as the HW maintains context for the dequeued
events.

I suggest checking the function call overhead first. If it turns out to have
an impact on performance, then we can split the flow based on a capability
flag, but I recommend that as a last option.
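
If we do end up splitting, it could look something like the sketch below.
The RTE_EVENT_DEV_CAP_BURST bit and the run_*_loop() helpers are
hypothetical, shown only to illustrate the shape of such a split.

	struct rte_event_dev_info info;

	rte_event_dev_info_get(dev_id, &info);

	if (info.event_dev_cap & RTE_EVENT_DEV_CAP_BURST)
		/* SW PMD: buffer enqueues to amortize per-call overhead */
		run_buffered_loop(dev_id, port_id);
	else
		/* HW PMD: enqueue forwarded events immediately */
		run_direct_loop(dev_id, port_id);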

> 
> >  >
> >  > I'm skeptical that other buffering strategies would emerge, but I can
> >  > only speculate on Cavium/NXP/etc. NPU software.
> >  >
> >  > > IMHO, this may not be a candidate for common code. I guess you can
> >  > > move this to the driver side and abstract it under the SW driver's
> >  > > enqueue_burst.
> >  > >
> >  >
> >  > I don't think that will work without adding a flush API; otherwise we
> >  > could have indefinitely buffered events. I see three ways forward:
> >  
> >  I agree. A more portable way is to move the "flush" into the
> >  implementation and "flush" whenever it makes sense to the PMD.
> >  
> >  >
> >  > - The proposed approach
> >  > - Add the proposed functions but make them implementation-specific.
> >  > - Require the application to write its own buffering logic (i.e. no
> >  > API change)
> >  
> >  I think, if the additional function call overhead cost is too much for
> >  the SW implementation, then we can think of an implementation-specific
> >  API or a custom application flow based on the SW driver.
> >  
> >  But I am not a fan of that (though tempted to be nowadays). If we take
> >  that route, we will have a truckload of custom implementation-specific
> >  APIs, whereas now we try to hide all the black magic under
> >  enqueue/dequeue to make it portable at some expense.
> 
> Agreed, it's not worth special-casing the API with this relatively minor 
> addition.
> 
> Thanks,
> Gage
