Re: Simplifying copy_siginfo_to_user
Al Viro writes:

> On Mon, Jul 24, 2017 at 10:43:34AM -0700, Linus Torvalds wrote:
>> On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman wrote:
>> > I played with some clever changes such as limiting the copy to 48 bytes,
>> > disabling the memset and the like but I could not get a strong enough
>> > signal to say that any one change removed the extra or a clear part of
>> > it 20ns.
>>
>> What CPU did you use? Because the SMAP bit in particular matters.
>>
>> The field-by-field copies are extremely slow on modern CPU's that
>> implement SMAP, unless you also use the special "unsafe_put_user()"
>> code (or the nasty old put_user_ex() code that some of the x86 signal
>> code uses).
>>
>> So one of the advantages of just copy_to_user() ends up being visible
>> only on Broadwell+ (or whatever the SMAP cutoff is).
>
> Guys, could you take a look at vfs.git#work.siginfo? I'd been pretty
> much buried lately (and probably will for several more weeks - long-distance
> moves *suck*), so that thing got stalled, but it might be worth a
> look.

There is some good stuff in there. If you don't mind I am going to
cherry pick out your unification of struct siginfo and struct
compat_siginfo.

> The code generated in copy_siginfo_to_user() in it looks reasonably good,
> we don't copy more than we need and all copying to userland is done
> by copy_to_user() - one call per call of copy_siginfo_to_user(), so
> SMAP crap is not an issue.

There is actually a core problem with doing things that way. You rely
on having the siginfo union member stored in the high bits of si_code.
I have just fixed that in my tree and replaced the use of the high bits
with a call to the function siginfo_layout.

It has been a significant problem storing the union member differently
in the kernel than in userspace. It has allowed for some pretty
horrendous gaffes in the architectures changing the meaning of SI_USER
when specific signals are delivered.

It has also meant that ptrace siginfo injection and tg_sigqueueinfo
have been broken for some signals almost since the interface was added.

Without any optimization and just changing the code to be copy_to_user
I am seeing a maybe 2% slowdown. Given that no one has seemed to care
overly for the performance of signal delivery I suspect an almost
unmeasurable slowdown is a reasonable tradeoff for simpler code.

> The next thing I hope to do is converting compat side of that thing to
> the same; that got stalled.

All of that said your precise copying code appears reasonable and quite
nice so I may adopt it on the compat side.

> Al "Buried in boxes" Viro...

Eric "Also Buried in boxes" Biederman
Re: Simplifying copy_siginfo_to_user
On Mon, Jul 24, 2017 at 10:43:34AM -0700, Linus Torvalds wrote:
> On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman wrote:
> > I played with some clever changes such as limiting the copy to 48 bytes,
> > disabling the memset and the like but I could not get a strong enough
> > signal to say that any one change removed the extra or a clear part of
> > it 20ns.
>
> What CPU did you use? Because the SMAP bit in particular matters.
>
> The field-by-field copies are extremely slow on modern CPU's that
> implement SMAP, unless you also use the special "unsafe_put_user()"
> code (or the nasty old put_user_ex() code that some of the x86 signal
> code uses).
>
> So one of the advantages of just copy_to_user() ends up being visible
> only on Broadwell+ (or whatever the SMAP cutoff is).

Guys, could you take a look at vfs.git#work.siginfo? I'd been pretty
much buried lately (and probably will for several more weeks - long-distance
moves *suck*), so that thing got stalled, but it might be worth a
look.

The code generated in copy_siginfo_to_user() in it looks reasonably good,
we don't copy more than we need and all copying to userland is done
by copy_to_user() - one call per call of copy_siginfo_to_user(), so
SMAP crap is not an issue.

The next thing I hope to do is converting compat side of that thing to
the same; that got stalled.

Al "Buried in boxes" Viro...
Re: Simplifying copy_siginfo_to_user
Linus Torvalds writes:

> On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman wrote:
>> I played with some clever changes such as limiting the copy to 48 bytes,
>> disabling the memset and the like but I could not get a strong enough
>> signal to say that any one change removed the extra or a clear part of
>> it 20ns.
>
> What CPU did you use? Because the SMAP bit in particular matters.
>
> The field-by-field copies are extremely slow on modern CPU's that
> implement SMAP, unless you also use the special "unsafe_put_user()"
> code (or the nasty old put_user_ex() code that some of the x86 signal
> code uses).
>
> So one of the advantages of just copy_to_user() ends up being visible
> only on Broadwell+ (or whatever the SMAP cutoff is).

Good point. The cpu I was testing on was an AMD A10. I don't actually
have a cpu that supports SMAP handy.

If you would like I can post the minimal patches and benchmark so
anyone who is interested could reproduce this for themselves.

I suspect that if it is down to only 20ns without SMAP this will
definitely be a performance improvement in the presence of SMAP.

Eric
Re: Simplifying copy_siginfo_to_user
On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman wrote:
> I played with some clever changes such as limiting the copy to 48 bytes,
> disabling the memset and the like but I could not get a strong enough
> signal to say that any one change removed the extra or a clear part of
> it 20ns.

What CPU did you use? Because the SMAP bit in particular matters.

The field-by-field copies are extremely slow on modern CPU's that
implement SMAP, unless you also use the special "unsafe_put_user()"
code (or the nasty old put_user_ex() code that some of the x86 signal
code uses).

So one of the advantages of just copy_to_user() ends up being visible
only on Broadwell+ (or whatever the SMAP cutoff is).

Linus