On Mon, Dec 12, 2016 at 05:56:32PM +0000, Eads, Gage wrote:
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:[email protected]]
> > Sent: Wednesday, December 7, 2016 10:42 PM
> > To: Eads, Gage <[email protected]>
> >
> > 1) What if the burst has ATOMIC flows and if we are NOT en-queuing to the
> > implementation then other event ports won't get the packets from the same
> > ATOMIC tag ? BAD. Right?
>
> I'm not sure what scenario you're describing here. The buffered (as
> implemented in my patch) and non-buffered enqueue operations are
> functionally the same (as long as the buffer is flushed), the difference
> lies in when the events are moved from the application level to the PMD.
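For concreteness, here is a minimal sketch of application-level enqueue
buffering with an explicit flush, written against the mainline
rte_event_dequeue_burst()/rte_event_enqueue_burst() calls rather than the
buffered-enqueue helpers proposed in the patch; ENQ_BUF_SIZE, BURST_SIZE and
process_event() are illustrative assumptions, and the RFC API under
discussion may differ in details:

#include <rte_eventdev.h>

#define ENQ_BUF_SIZE 16                 /* assumed flush threshold */
#define BURST_SIZE    4                 /* assumed dequeue burst size */

static struct rte_event enq_buf[ENQ_BUF_SIZE];
static uint16_t enq_cnt;

void process_event(struct rte_event *ev); /* hypothetical application hook */

static void
flush_enq_buf(uint8_t dev_id, uint8_t port_id)
{
	uint16_t sent = 0;

	/* retry until the PMD has accepted every buffered event */
	while (sent < enq_cnt)
		sent += rte_event_enqueue_burst(dev_id, port_id,
						&enq_buf[sent],
						enq_cnt - sent);
	enq_cnt = 0;
}

static void
worker_iteration(uint8_t dev_id, uint8_t port_id)
{
	struct rte_event ev[BURST_SIZE];
	uint16_t i, nb;

	nb = rte_event_dequeue_burst(dev_id, port_id, ev, BURST_SIZE, 0);

	for (i = 0; i < nb; i++) {
		process_event(&ev[i]);
		ev[i].op = RTE_EVENT_OP_FORWARD;
		enq_buf[enq_cnt++] = ev[i];
		if (enq_cnt == ENQ_BUF_SIZE)
			flush_enq_buf(dev_id, port_id);
	}

	/*
	 * Flushing after every processed burst (rather than only when the
	 * buffer fills) releases the FORWARD events before the next dequeue,
	 * which is the point debated below for ATOMIC flows and for HW that
	 * tracks dequeued events per port.
	 */
	flush_enq_buf(dev_id, port_id);
}

Whether this flush happens in the application (as above), inside the common
API, or inside the PMD is exactly what the rest of the thread debates.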
OK. I will try to explain with a timeline.

Assume flush size: 16, burst size: 4.

At t0: dequeue 4 events (3 ordered and 1 atomic)
At t1: after processing the events, store them in the local event buffer
At t2: request to dequeue 4 more events

Now, since the scheduler has scheduled an atomic event to the port at t0, it
cannot schedule an atomic event of the same TAG to _any port_, as there will
always be only one in-flight atomic event for a given TAG, to enable
critical-section processing in the packet flow.

> > 2) At least in our HW implementation, the event buffer strategy is more
> > like: only if you enqueue to the HW do you get events from dequeue,
> > provided op == RTE_EVENT_OP_FORWARD. So it will create a deadlock, i.e.
> > the application cannot hold the events with RTE_EVENT_OP_FORWARD
>
> If I'm reading this correctly, you're concerned that buffered events can
> result in deadlock if they're not flushed. Whether the buffering is done
> in the app itself, inline in the API, or in the PMDs, not flushing the
> buffer is an application bug. E.g. the app could be fixed by flushing its
> enqueue buffer after processing every dequeued burst of events, or only
> if dequeue returns 0 events.

No. At least in our HW implementation, the hardware maintains the state of
the events scheduled to a port. The driver gets the next set of events ONLY
after it submits the events it received on the previous dequeue, i.e. the
application cannot hold events with RTE_EVENT_OP_FORWARD.

> > 3) So considering the above case, there is nothing like a flush for us
> > 4) In a real high-throughput benchmark case, we will get packets at the
> > max burst rate, and then we always need to memcpy before we flush.
> > Otherwise there will be an ordering issue, as a burst can give us packets
> > from different flows (unlike polling mode)
>
> I take it you're referring to the memcpy in the patch, and not an
> additional memcpy? At any rate, I'm hoping that SIMD instructions can
> optimize the 16B event copy.

Hmm. The point was that we need to memcpy all the time to maintain the
order.

> > and some does not need to hold the buffers if it is DDR backed.
> >
> > See above. I am not against burst processing in the "application".
> > The flush does not make sense for us from the HW perspective, and it is
> > costly for us if we try to generalize it.
>
> Besides the data copy that buffering requires, are there additional costs
> from your perspective?

It won't even work in our case, as the HW maintains the context of the
dequeued events. I suggest checking the function call overhead first. If it
turns out to have an impact on performance, then we can split the flow based
on a capability flag, but I recommend that only as a last option.

> > I'm skeptical that other buffering strategies would emerge, but I can
> > only speculate on Cavium/NXP/etc. NPU software.
>
> > > > IMHO, this may not be a candidate for common code. I guess you can
> > > > move this to the driver side and abstract it under the SW driver's
> > > > enqueue_burst.
> > >
> > > I don't think that will work without adding a flush API, otherwise we
> > > could have indefinitely buffered events. I see three ways forward:
> >
> > I agree. A more portable way is to move the "flush" to the implementation
> > and "flush" whenever it makes sense to the PMD.
> >
> > > - The proposed approach
> > > - Add the proposed functions but make them implementation-specific.
> > > - Require the application to write its own buffering logic (i.e. no
> > >   API change)
> >
> > I think, if the additional function call overhead cost is too much for
> > the SW implementation, then we can think of an implementation-specific
> > API or a custom application flow based on the SW driver.
> >
> > But I am not a fan of that (though tempted to do it nowadays). If we take
> > that route, we will have a truckload of custom implementation-specific
> > APIs, while now we are trying to hide all the black magic under
> > enqueue/dequeue to make it portable at some expense.
>
> Agreed, it's not worth special-casing the API with this relatively minor
> addition.
>
> Thanks,
> Gage
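As a rough sketch of the "move the flush into the SW driver's enqueue_burst"
direction suggested above: every name below except struct rte_event
(sw_port, sw_schedule_burst, sw_port_flush, sw_event_enqueue_burst,
SW_PORT_ENQ_DEPTH) is hypothetical and not taken from any existing DPDK
driver.

/* Hypothetical SW PMD internals; names are illustrative only. */
#include <rte_eventdev.h>

#define SW_PORT_ENQ_DEPTH 64            /* assumed per-port staging depth */

struct sw_port {                        /* hypothetical per-port state */
	struct rte_event stage[SW_PORT_ENQ_DEPTH];
	uint16_t count;
};

/* hypothetical hand-off to the SW scheduler core; placeholder body */
static uint16_t
sw_schedule_burst(struct sw_port *p, const struct rte_event ev[], uint16_t nb)
{
	(void)p;
	(void)ev;
	return nb;                      /* pretend everything was accepted */
}

static void
sw_port_flush(struct sw_port *p)
{
	uint16_t done = 0;

	while (done < p->count)
		done += sw_schedule_burst(p, &p->stage[done],
					  p->count - done);
	p->count = 0;
}

/* PMD-side enqueue: stage events and flush on the PMD's own criteria */
static uint16_t
sw_event_enqueue_burst(struct sw_port *p, const struct rte_event ev[],
		       uint16_t nb_events)
{
	uint16_t i;

	for (i = 0; i < nb_events; i++) {
		p->stage[p->count++] = ev[i];
		/* flush criterion chosen by the PMD, e.g. a full stage */
		if (p->count == SW_PORT_ENQ_DEPTH)
			sw_port_flush(p);
	}

	/*
	 * The PMD would also drain the stage from its scheduling context
	 * (or before a dequeue on the same port), so events never sit in
	 * the buffer indefinitely; that is the part that is hard to
	 * express portably in the common API.
	 */
	return nb_events;
}

A HW-backed PMD would simply pass the burst straight through, so only the SW
driver pays for the staging buffer and the common enqueue/dequeue API stays
unchanged.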

