Re: [PATCH] fs/select.c: batch user writes in do_sys_poll
On Thu, Aug 13, 2020 at 09:36:52PM +1000, Daniel Axtens wrote: > Hi, > > >> Seem like this could simply use a copy_to_user to further simplify > >> things? > > > > I'll benchmark it and find out. > > I tried this: > > for (walk = head; walk; walk = walk->next) { > - struct pollfd *fds = walk->entries; > - int j; > - > - for (j = 0; j < walk->len; j++, ufds++) > - if (__put_user(fds[j].revents, >revents)) > - goto out_fds; > + if (copy_to_user(ufds, walk->entries, > +sizeof(struct pollfd) * walk->len)) > + goto out_fds; > + ufds += walk->len; > } > > With that approach, the poll2 microbenchmark (which polls 128 fds) is > about as fast as v1. > > However, the poll1 microbenchmark, which polls just 1 fd, regresses a > touch (<1% - ~2%) compared to the current code, although it's largely > within the noise. Thoughts? I'd go with copy_to_user() here; post such variant and I'll throw it into -next after -rc1.
RE: [PATCH] fs/select.c: batch user writes in do_sys_poll
From: Daniel Axtens > Sent: 13 August 2020 12:37 > > >> Seem like this could simply use a copy_to_user to further simplify > >> things? > > > > I'll benchmark it and find out. > > I tried this: > > for (walk = head; walk; walk = walk->next) { > - struct pollfd *fds = walk->entries; > - int j; > - > - for (j = 0; j < walk->len; j++, ufds++) > - if (__put_user(fds[j].revents, >revents)) > - goto out_fds; > + if (copy_to_user(ufds, walk->entries, > +sizeof(struct pollfd) * walk->len)) > + goto out_fds; > + ufds += walk->len; > } > > With that approach, the poll2 microbenchmark (which polls 128 fds) is > about as fast as v1. > > However, the poll1 microbenchmark, which polls just 1 fd, regresses a > touch (<1% - ~2%) compared to the current code, although it's largely > within the noise. Thoughts? Is that with or without 'user copy hardening'? Or use __copy_to_user() to skip all that 'crap'. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] fs/select.c: batch user writes in do_sys_poll
Hi, >> Seem like this could simply use a copy_to_user to further simplify >> things? > > I'll benchmark it and find out. I tried this: for (walk = head; walk; walk = walk->next) { - struct pollfd *fds = walk->entries; - int j; - - for (j = 0; j < walk->len; j++, ufds++) - if (__put_user(fds[j].revents, >revents)) - goto out_fds; + if (copy_to_user(ufds, walk->entries, +sizeof(struct pollfd) * walk->len)) + goto out_fds; + ufds += walk->len; } With that approach, the poll2 microbenchmark (which polls 128 fds) is about as fast as v1. However, the poll1 microbenchmark, which polls just 1 fd, regresses a touch (<1% - ~2%) compared to the current code, although it's largely within the noise. Thoughts? Kind regards, Daniel >> Also please don't pointlessly add overly long lines. > > Weird, I ran the commit through checkpatch and it didn't pick it > up. I'll check the next version more carefully. > > Regards, > Daniel
RE: [PATCH] fs/select.c: batch user writes in do_sys_poll
From: Christoph Hellwig > Sent: 13 August 2020 08:32 > > On Thu, Aug 13, 2020 at 05:11:20PM +1000, Daniel Axtens wrote: > > When returning results to userspace, do_sys_poll repeatedly calls > > put_user() - once per fd that it's watching. > > > > This means that on architectures that support some form of > > kernel-to-userspace access protection, we end up enabling and disabling > > access once for each file descripter we're watching. This is inefficent > > and we can improve things by batching the accesses together. > > > > To make sure there's not too much happening in the window when user > > accesses are permitted, we don't walk the linked list with accesses on. > > This leads to some slightly messy code in the loop, unfortunately. > > > > Unscientific benchmarking with the poll2_threads microbenchmark from > > will-it-scale, run as `./poll2_threads -t 1 -s 15`: > > > > - Bare-metal Power9 with KUAP: ~48.8% speed-up > > - VM on amd64 laptop with SMAP: ~25.5% speed-up > > > > Signed-off-by: Daniel Axtens > > Seem like this could simply use a copy_to_user to further simplify > things? That would copy out 8 bytes/fd instead of 2. So a slight change for 32bit kernels. However the 'user copy hardening' checks that copy_to_user() does probably make a measurable difference. > Also please don't pointlessly add overly long lines. Shorten the error label? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] fs/select.c: batch user writes in do_sys_poll
Christoph Hellwig writes: > On Thu, Aug 13, 2020 at 05:11:20PM +1000, Daniel Axtens wrote: >> When returning results to userspace, do_sys_poll repeatedly calls >> put_user() - once per fd that it's watching. >> >> This means that on architectures that support some form of >> kernel-to-userspace access protection, we end up enabling and disabling >> access once for each file descripter we're watching. This is inefficent >> and we can improve things by batching the accesses together. >> >> To make sure there's not too much happening in the window when user >> accesses are permitted, we don't walk the linked list with accesses on. >> This leads to some slightly messy code in the loop, unfortunately. >> >> Unscientific benchmarking with the poll2_threads microbenchmark from >> will-it-scale, run as `./poll2_threads -t 1 -s 15`: >> >> - Bare-metal Power9 with KUAP: ~48.8% speed-up >> - VM on amd64 laptop with SMAP: ~25.5% speed-up >> >> Signed-off-by: Daniel Axtens > > Seem like this could simply use a copy_to_user to further simplify > things? I'll benchmark it and find out. > Also please don't pointlessly add overly long lines. Weird, I ran the commit through checkpatch and it didn't pick it up. I'll check the next version more carefully. Regards, Daniel
Re: [PATCH] fs/select.c: batch user writes in do_sys_poll
On Thu, Aug 13, 2020 at 05:11:20PM +1000, Daniel Axtens wrote: > When returning results to userspace, do_sys_poll repeatedly calls > put_user() - once per fd that it's watching. > > This means that on architectures that support some form of > kernel-to-userspace access protection, we end up enabling and disabling > access once for each file descripter we're watching. This is inefficent > and we can improve things by batching the accesses together. > > To make sure there's not too much happening in the window when user > accesses are permitted, we don't walk the linked list with accesses on. > This leads to some slightly messy code in the loop, unfortunately. > > Unscientific benchmarking with the poll2_threads microbenchmark from > will-it-scale, run as `./poll2_threads -t 1 -s 15`: > > - Bare-metal Power9 with KUAP: ~48.8% speed-up > - VM on amd64 laptop with SMAP: ~25.5% speed-up > > Signed-off-by: Daniel Axtens Seem like this could simply use a copy_to_user to further simplify things? Also please don't pointlessly add overly long lines.