On Fri, Apr 29, 2016 at 10:47:25AM -0300, marcelo.leit...@gmail.com wrote:
> On Fri, Apr 29, 2016 at 09:36:37AM -0400, Neil Horman wrote:
> > On Thu, Apr 28, 2016 at 05:46:59PM -0300, marcelo.leit...@gmail.com wrote:
> > > On Thu, Apr 14, 2016 at 05:19:00PM -0300, marcelo.leit...@gmail.com wrote:
> > > > On Thu, Apr 14, 2016 at 04:03:51PM -0400, Neil Horman wrote:
> > > > > On Thu, Apr 14, 2016 at 02:59:16PM -0400, David Miller wrote:
> > > > > > From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
> > > > > > Date: Thu, 14 Apr 2016 14:00:49 -0300
> > > > > >
> > > > > > > On 14-04-2016 10:03, Neil Horman wrote:
> > > > > > >> On Wed, Apr 13, 2016 at 11:05:32PM -0400, David Miller wrote:
> > > > > > >>> From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
> > > > > > >>> Date: Fri, 8 Apr 2016 16:41:26 -0300
> > > > > > >>>
> > > > > > >>>> 1st patch is a preparation for the 2nd. The idea is to not call
> > > > > > >>>> ->sk_data_ready() for every data chunk processed while processing
> > > > > > >>>> packets but only once before releasing the socket.
> > > > > > >>>>
> > > > > > >>>> v2: patchset re-checked, small changelog fixes
> > > > > > >>>> v3: on patch 2, make use of local vars to make it more readable
> > > > > > >>>
> > > > > > >>> Applied to net-next, but isn't this reduced overhead coming at the
> > > > > > >>> expense of latency? What if that lower latency is important to the
> > > > > > >>> application and/or consumer?
> > > > > > >> That's a fair point, but I'd make the counter argument that, as it
> > > > > > >> currently stands, any latency introduced (or removed) is an artifact of
> > > > > > >> our implementation rather than a designed feature of it.
> > > > > > >> That is to say, we make no
> > > > > > >> guarantees at the application level regarding how long it takes to
> > > > > > >> signal data readiness from the time we get data off the wire, so I would
> > > > > > >> rather see our throughput raised if we can, as that's been sctp's more
> > > > > > >> pressing achilles heel.
> > > > > > >>
> > > > > > >>
> > > > > > >> That's not to say I wouldn't like to enable lower latency, but I'd
> > > > > > >> rather have this now, and start pondering how to design that in.
> > > > > > >> Perhaps we can convert the pending flag to a counter to count the
> > > > > > >> number of events we enqueue, and call sk_data_ready every time we
> > > > > > >> reach a sysctl defined threshold.
> > > > > > >
> > > > > > > That and also that there is no chance of the application reading the
> > > > > > > first chunks before all current ToDo's are performed by either the bh
> > > > > > > or backlog handlers for that packet. Socket lock won't be cycled in
> > > > > > > between chunks so the application is going to wait for all the
> > > > > > > processing one way or another.
> > > > > >
> > > > > > But it takes time to signal the wakeup to the remote cpu the process
> > > > > > was running on, schedule out the current process on that cpu (if it
> > > > > > has in fact lost its timeslice), and then finally look at the socket
> > > > > > queue.
> > > > > >
> > > > > > Of course this is all assuming the process was sleeping in the first
> > > > > > place, either in recv or more likely poll.
> > > > > >
> > > > > > I really think signalling early helps performance.
> > > > > >
> > > > >
> > > > > Early, yes, often, not so much :).
> > > > > Perhaps what would be advantageous would be
> > > > > to signal at the start of a set of enqueues, rather than at the end.
> > > > > That would be equivalent in terms of not signaling more than needed, but
> > > > > would eliminate the signaling on every chunk. Perhaps what you could do
> > > > > Marcelo would be to change the sense of the signal_ready flag to be a
> > > > > has_signaled flag. e.g. call sk_data_ready in ulp_event_tail like we
> > > > > used to, but only if the has_signaled flag isn't set, then set the flag,
> > > > > and clear it at the end of the command interpreter.
> > > > >
> > > > > That would be a best of both worlds solution, as long as there's no
> > > > > chance of a race with user space reading from the socket before we were
> > > > > done enqueuing (i.e. you have to guarantee that the socket lock stays
> > > > > held, which I think we do).
> > > >
> > > > That is my feeling too. Will work on it. Thanks :-)
> > >
> > > I did the change and tested it on real machines set all for performance.
> > > I couldn't spot any difference between both implementations.
> > >
> > > Set RSS and queue irq affinity for a cpu and taskset netperf and another
> > > app I wrote to run on another cpu. It hits socket backlog quite often
> > > but still does direct processing every now and then.
> > >
> > > With current state, netperf, scenario above.
> > > Results of perf sched
> > > record for the CPUs in use, reported by perf sched latency:
> > >
> > >  Task           | Runtime ms   | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> > >  netserver:3205 |  9999.490 ms |       10 | avg:    0.003 ms | max:    0.004 ms | max at: 69087.753356 s
> > >
> > > another run
> > >  netserver:3483 |  9999.412 ms |       15 | avg:    0.003 ms | max:    0.004 ms | max at: 69194.749814 s
> > >
> > > With the patch below, same test:
> > >  netserver:2643 | 10000.110 ms |       14 | avg:    0.003 ms | max:    0.004 ms | max at:   172.006315 s
> > >
> > > another run:
> > >  netserver:2698 | 10000.049 ms |       15 | avg:    0.003 ms | max:    0.004 ms | max at:   368.061672 s
> > >
> > > I'll be happy to do more tests if you have any suggestions on how/what
> > > to test.
> > >
> > > ---8<---
> > >
> > I think this looks reasonable, but can you post it properly please, as a
> > patch against the head of the net-next tree, rather than a diff from your
> > previous work (which wasn't committed)
>
> The idea was to not officially post it yet, more just as a reference,
> because I can't see any gains from it. I'm reluctant just due to that,
> no strong opinion here on one way or another.
>
> If you think it's better anyway to signal it early, I'll properly repost
> it.
>
Yeah, your results seem to me to indicate that for your test at least, signaling
early vs. late doesn't make a lot of difference, but Dave I think made a point
in principle in that allowing processes to wake up when we start enqueuing can
be better in some situations. So all other things being equal, I'd say go with
the method that you have here.

Best
Neil

> Thanks,
> Marcelo
> >
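
For reference, the has_signaled idea quoted above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch under discussion: struct demo_sock, demo_data_ready(), demo_enqueue_event() and demo_process_packet() are hypothetical names, the wakeup is reduced to a counter, and the socket lock is assumed to stay held for the whole packet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-socket state; NOT the kernel's struct sock. */
struct demo_sock {
    bool has_signaled; /* true once the reader was woken for this packet */
    int  queued;       /* events placed on the receive queue so far */
    int  wakeups;      /* how many times the data-ready callback fired */
};

/* Stand-in for sk->sk_data_ready(); here it only counts wakeups. */
static void demo_data_ready(struct demo_sock *sk)
{
    sk->wakeups++;
}

/* Enqueue one event and wake the reader only on the first event of a packet. */
static void demo_enqueue_event(struct demo_sock *sk)
{
    sk->queued++;
    if (!sk->has_signaled) {
        sk->has_signaled = true;
        demo_data_ready(sk);
    }
}

/*
 * Process all data chunks of one packet. The socket lock is assumed to be
 * held for the whole call, so the reader cannot drain the queue midway.
 * The flag is cleared at the end so the next packet signals again.
 */
static void demo_process_packet(struct demo_sock *sk, int nchunks)
{
    assert(nchunks >= 0);
    for (int i = 0; i < nchunks; i++)
        demo_enqueue_event(sk);
    sk->has_signaled = false;
}
```

With this shape, a packet carrying five data chunks produces one wakeup at the first enqueue rather than five wakeups or one at the very end, and a packet with no data chunks produces none.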