re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Paul Goyette

I agree that it probably needs to be completely re-written if it were
to be generally useable.  But, given that the only real use-case we
have for it so far is make(1) (and even then, only in "meta-mode"), I
think that the re-write effort far out-weighs the benefits.


right, i agree.

it's still not useful for mips64 platforms today -- since those
default to 32 bit binaries on 64 bit kernel.

same with running 32 bit chroot builds on i386.


Yes, that would definitely be a problem with the currnt implementation.


Besides, I don't know enough about the vfs layer to get it right!  :)


not too late to learn :-)


Well, the US tax people seem to think I've still got a few more years
left, so maybe I'll take a look.  :)

One more issue with the current mechanism:  what happens if we have
"something else" arrive that also wants to hijack syscalls?  I can see
all sorts of potential confusion, depending on how much overlap there
is in the set-of-syscalls-to-hijack and on which code loads/unloads
first...  :(

I'm tempted to implement a syscall_{hijack,restore}() capability,
similar to syscall_{,dis}establish(), but I'm afraid that would just
encourage more bad behavior!

:)



+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread matthew green
Paul Goyette writes:
> > i don't think that filemon(4) is done properly or at the right
> > layers.  it should hook directly into the vfs layer itself, not
> > hijack system calls.  it certainly still doesn't work properly
> > for netbds32 binaries or any other binaries.
> >
> > your changes are helpful, but it's still very very broken.
> 
> I agree that it probably needs to be completely re-written if it were
> to be generally useable.  But, given that the only real use-case we
> have for it so far is make(1) (and even then, only in "meta-mode"), I
> think that the re-write effort far out-weighs the benefits.

right, i agree.

it's still not useful for mips64 platforms today -- since those
default to 32 bit binaries on 64 bit kernel.

same with running 32 bit chroot builds on i386.

> Besides, I don't know enough about the vfs layer to get it right!  :)

not too late to learn :-)


.mrg.


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Paul Goyette

i don't think that filemon(4) is done properly or at the right
layers.  it should hook directly into the vfs layer itself, not
hijack system calls.  it certainly still doesn't work properly
for netbds32 binaries or any other binaries.

your changes are helpful, but it's still very very broken.


I agree that it probably needs to be completely re-written if it were
to be generally useable.  But, given that the only real use-case we
have for it so far is make(1) (and even then, only in "meta-mode"), I
think that the re-write effort far out-weighs the benefits.

Besides, I don't know enough about the vfs layer to get it right!  :)



+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread matthew green
> >> - filemon(4) is utterly broken.  it literally replaces the sysent
> >>   values for the current emulation while active.  so it has an ugly
> >>   obvious problem with runtime usage (particularly at unload.)
> >>   however, a much bigger problem is that it adjusts an emulation
> >>   that might not be the default one.  (32 bit modload of filemon
> >>   on a 64 bit kernel, i think, will adjust the netbsd32 sysent.)
> >>   so it could break the emulation.  this also means that every
> >>   other emulation will not be affected and IO will be missed (see:
> >>   "utterly broken".)
> >
> > I think this is worth being a separate PR.
> >
> > (My original question was about
> > syscall_establish(9)/syscall_disestablish(9).  Now I believe these are
> > OK; but their users doing bad things is a different story.)
> 
> Well, filemon(4) doesn't even use syscall_{,dis}establish().  It just
> manipulates the sysent[] table directly.
> 
> In any case, I've gone through filemon in some detail, and I think
> I've fixed all of the really bad things.  It should be safe to use,
> although the only real use-case that anyone can identify is make(1).

i don't think that filemon(4) is done properly or at the right
layers.  it should hook directly into the vfs layer itself, not
hijack system calls.  it certainly still doesn't work properly
for netbds32 binaries or any other binaries.

your changes are helpful, but it's still very very broken.


.mrg.


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Paul Goyette

On Sun, 22 Nov 2015, Masao Uebayashi wrote:


On Wed, Nov 18, 2015 at 3:18 PM, matthew green  wrote:

there's a fairly significant problem with this implementation.

you're only catching callers who use end up going through sy_call()
and that's not the majority.  mostly they're called directly from
MD code.  so to fix that, you have to update every platform syscall
handler.


FWIW, I did check the archives from 8 years ago when ad@ added the
sy_call() wrapper.  It appears that he had already addressed the
several md-specific syscall dispatchers to use the wrapper, so it
seems that we don't need to do anything special.  See [1] for the
relevant commit.  If you can point out any other code that still
needs to be updated, please do!

[1]http://mail-index.netbsd.org/source-changes/2008/10/21/msg211565.html


i think you're OK here.  i did happen to find a couple of weird things
just now:

- filemon(4) is utterly broken.  it literally replaces the sysent
  values for the current emulation while active.  so it has an ugly
  obvious problem with runtime usage (particularly at unload.)
  however, a much bigger problem is that it adjusts an emulation
  that might not be the default one.  (32 bit modload of filemon
  on a 64 bit kernel, i think, will adjust the netbsd32 sysent.)
  so it could break the emulation.  this also means that every
  other emulation will not be affected and IO will be missed (see:
  "utterly broken".)


I think this is worth being a separate PR.

(My original question was about
syscall_establish(9)/syscall_disestablish(9).  Now I believe these are
OK; but their users doing bad things is a different story.)


Well, filemon(4) doesn't even use syscall_{,dis}establish().  It just
manipulates the sysent[] table directly.

In any case, I've gone through filemon in some detail, and I think
I've fixed all of the really bad things.  It should be safe to use,
although the only real use-case that anyone can identify is make(1).



+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Masao Uebayashi
On Wed, Nov 18, 2015 at 3:18 PM, matthew green  wrote:
>> >>> there's a fairly significant problem with this implementation.
>> >>>
>> >>> you're only catching callers who use end up going through sy_call()
>> >>> and that's not the majority.  mostly they're called directly from
>> >>> MD code.  so to fix that, you have to update every platform syscall
>> >>> handler.
>>
>> FWIW, I did check the archives from 8 years ago when ad@ added the
>> sy_call() wrapper.  It appears that he had already addressed the
>> several md-specific syscall dispatchers to use the wrapper, so it
>> seems that we don't need to do anything special.  See [1] for the
>> relevant commit.  If you can point out any other code that still
>> needs to be updated, please do!
>>
>> [1]http://mail-index.netbsd.org/source-changes/2008/10/21/msg211565.html
>
> i think you're OK here.  i did happen to find a couple of weird things
> just now:
>
> - filemon(4) is utterly broken.  it literally replaces the sysent
>   values for the current emulation while active.  so it has an ugly
>   obvious problem with runtime usage (particularly at unload.)
>   however, a much bigger problem is that it adjusts an emulation
>   that might not be the default one.  (32 bit modload of filemon
>   on a 64 bit kernel, i think, will adjust the netbsd32 sysent.)
>   so it could break the emulation.  this also means that every
>   other emulation will not be affected and IO will be missed (see:
>   "utterly broken".)

I think this is worth being a separate PR.

(My original question was about
syscall_establish(9)/syscall_disestablish(9).  Now I believe these are
OK; but their users doing bad things is a different story.)


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Masao Uebayashi
On Thu, Nov 19, 2015 at 10:30 PM, Eric Haszlakiewicz  wrote:
> On November 19, 2015 4:28:46 AM EST, Paul Goyette  
> wrote:
>>And if there's anyone who really understands HOW the initial syscall
>>gets interrupted when the signal is being delivered (and HOW it gets
>>restarted when the handler returns) I would love an explanation!
>>
>
> Regarding restart, I don't think it needs to be explicitly saved anywhere, 
> because the original call in the userland program has the necessary 
> information.
>
> E.g. if you look at the restart path for alpha (just because that's the first 
> search result I found) you see that it simply moves the userland process' 
> program counter back a bit, and let's the process trigger the syscall again:
> http://nxr.netbsd.org/xref/src/sys/arch/alpha/alpha/syscall.c#203
>
> Dunno about how it's actually interrupted though.

My understanding (as of writing this mail) is that, when syscall()
receives ERESTART, it updates PC in the trapframe; which will be
stored in the signal handler stack by sendsig_siginfo(), and then
later be restored by setcontext(2), when signal handler returns from
signal trampoline.

>
> Eric
>


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-21 Thread Masao Uebayashi
On Thu, Nov 19, 2015 at 1:43 PM, Paul Goyette  wrote:
> On Thu, 19 Nov 2015, Masao Uebayashi wrote:
>
>> On Wed, Nov 18, 2015 at 11:07 AM, Paul Goyette 
>> wrote:
>>>
>>> Based on earlier comments, I've come up with a much-less-intrusive
>>> set of changes.  This time around, there are no bit masks and no new
>>> members in any system structures.  (I'm pretty sure we won't even
>>> need a kernel version bump for this.)
>>>
>>> Instead, I've modified sy_call() to check the current value of
>>> l_sysent before updating with the new syscall pointer.  If the
>>> current value is non-NULL then set a new flag bit LP_LOST_SYSENT in
>>> the lwp's l_pflag member.  This check is the only per-syscall cost:
>>>
>>> if (l->l_sysent != NULL)
>>> l->l_pflag |= LP_LOST_SYSENT;
>>
>>
>> My understanding is that l_sysent is *always* overriden by syscalls
>> from within signal handlers, right?
>
>
> It's not a question of "being overwritten".  The real situation is
> that the _initial_ syscall returns first (with ERESTART), and l_sysent
> is reset to NULL.  Then when the signal handler calls some other
> syscall, there is nothing to lose.

I think I understand these now ... I misunderstood that signal is
delivered (signal handler is called) first then kernel stack from
tsleep() is returned back to syscall() ... I am so stupid. :(

I also realize that grasping the big picture of signal delivery is not
difficult once understood ... but still confusing as I wrote in the
other mail.  (I wish I could have understand these 10 years ago.)

You can close the PR.

>
> Then, when the handler returns, the _original_ syscall is automatically
> restarted.
>
> As I said earlier, I have not identified the exact code where all of
> this happens.  But empirical evidence (from printf()s inserted in
> sy_call() both before and after the dispatch) shows that it works as
> I've described.  There was never any non-NULL value in l_sysent at
> the point where sy_call() stores a new value.  And the printf() I
> had inserted after the syscall dispatch clearly showed a return value
> of -3 == ERESTART.
>
>
>> I don't fully understand these things yet, but ... my gut feeling is
>> that l_sysent should be saved on stack in mi_switch(), like oldspl,
>> before cpu_switchto(), and restored after that.  In resuming code
>> path, check if syscall exists, in case syscall was forcibly
>> disestablished.  If it doesn't exist, forcibly kill self lwp ... maybe
>> using sigexit()?
>
>
> I'm expecting that the old l_sysent is being saved on the stack by
> the signal code;  I just don't know where (since there doesn't seem
> to be any such reference to l_sysent!).
>
> When the syscall gets restarted, it will get auto-loaded again if it
> has been disestablished, so nothing is lost.
>
>> (I believe someone from CS department understand such things 100% and
>> correct me if wrong. ;)
>
>
>
> +--+--+-+
> | Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
> | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
> +--+--+-+


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-19 Thread Eric Haszlakiewicz
On November 19, 2015 4:28:46 AM EST, Paul Goyette  
wrote:
>And if there's anyone who really understands HOW the initial syscall
>gets interrupted when the signal is being delivered (and HOW it gets
>restarted when the handler returns) I would love an explanation!
>

Regarding restart, I don't think it needs to be explicitly saved anywhere, 
because the original call in the userland program has the necessary information.

E.g. if you look at the restart path for alpha (just because that's the first 
search result I found) you see that it simply moves the userland process' 
program counter back a bit, and let's the process trigger the syscall again:
http://nxr.netbsd.org/xref/src/sys/arch/alpha/alpha/syscall.c#203

Dunno about how it's actually interrupted though.

Eric



Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-19 Thread Paul Goyette

My understanding is that l_sysent is *always* overriden by syscalls
from within signal handlers, right?


It's not a question of "being overwritten".  The real situation is
that the _initial_ syscall returns first (with ERESTART), and l_sysent
is reset to NULL.  Then when the signal handler calls some other
syscall, there is nothing to lose.

Then, when the handler returns, the _original_ syscall is automatically
restarted.

As I said earlier, I have not identified the exact code where all of
this happens.  But empirical evidence (from printf()s inserted in
sy_call() both before and after the dispatch) shows that it works as
I've described.  There was never any non-NULL value in l_sysent at
the point where sy_call() stores a new value.  And the printf() I
had inserted after the syscall dispatch clearly showed a return value
of -3 == ERESTART.


If anyone wants to reconfirm my empirical results, I have attached my
test program here.

1. Apply the syscallvar.h.diff patch file
2. Build and boot a new kernel
3. Build the test program, and run it.

BTW, using qemu and anita is great for this sort of testing.  :)

And if there's anyone who really understands HOW the initial syscall
gets interrupted when the signal is being delivered (and HOW it gets
restarted when the handler returns) I would love an explanation!


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+

test.tgz
Description: Binary data


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-18 Thread Paul Goyette

On Thu, 19 Nov 2015, Masao Uebayashi wrote:


On Wed, Nov 18, 2015 at 11:07 AM, Paul Goyette  wrote:

Based on earlier comments, I've come up with a much-less-intrusive
set of changes.  This time around, there are no bit masks and no new
members in any system structures.  (I'm pretty sure we won't even
need a kernel version bump for this.)

Instead, I've modified sy_call() to check the current value of
l_sysent before updating with the new syscall pointer.  If the
current value is non-NULL then set a new flag bit LP_LOST_SYSENT in
the lwp's l_pflag member.  This check is the only per-syscall cost:

if (l->l_sysent != NULL)
l->l_pflag |= LP_LOST_SYSENT;


My understanding is that l_sysent is *always* overriden by syscalls
from within signal handlers, right?


It's not a question of "being overwritten".  The real situation is
that the _initial_ syscall returns first (with ERESTART), and l_sysent
is reset to NULL.  Then when the signal handler calls some other
syscall, there is nothing to lose.

Then, when the handler returns, the _original_ syscall is automatically
restarted.

As I said earlier, I have not identified the exact code where all of
this happens.  But empirical evidence (from printf()s inserted in
sy_call() both before and after the dispatch) shows that it works as
I've described.  There was never any non-NULL value in l_sysent at
the point where sy_call() stores a new value.  And the printf() I
had inserted after the syscall dispatch clearly showed a return value
of -3 == ERESTART.



I don't fully understand these things yet, but ... my gut feeling is
that l_sysent should be saved on stack in mi_switch(), like oldspl,
before cpu_switchto(), and restored after that.  In resuming code
path, check if syscall exists, in case syscall was forcibly
disestablished.  If it doesn't exist, forcibly kill self lwp ... maybe
using sigexit()?


I'm expecting that the old l_sysent is being saved on the stack by
the signal code;  I just don't know where (since there doesn't seem
to be any such reference to l_sysent!).

When the syscall gets restarted, it will get auto-loaded again if it
has been disestablished, so nothing is lost.


(I believe someone from CS department understand such things 100% and
correct me if wrong. ;)



+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-18 Thread Masao Uebayashi
On Wed, Nov 18, 2015 at 11:07 AM, Paul Goyette  wrote:
> Based on earlier comments, I've come up with a much-less-intrusive
> set of changes.  This time around, there are no bit masks and no new
> members in any system structures.  (I'm pretty sure we won't even
> need a kernel version bump for this.)
>
> Instead, I've modified sy_call() to check the current value of
> l_sysent before updating with the new syscall pointer.  If the
> current value is non-NULL then set a new flag bit LP_LOST_SYSENT in
> the lwp's l_pflag member.  This check is the only per-syscall cost:
>
> if (l->l_sysent != NULL)
> l->l_pflag |= LP_LOST_SYSENT;

My understanding is that l_sysent is *always* overriden by syscalls
from within signal handlers, right?

I don't fully understand these things yet, but ... my gut feeling is
that l_sysent should be saved on stack in mi_switch(), like oldspl,
before cpu_switchto(), and restored after that.  In resuming code
path, check if syscall exists, in case syscall was forcibly
disestablished.  If it doesn't exist, forcibly kill self lwp ... maybe
using sigexit()?

(I believe someone from CS department understand such things 100% and
correct me if wrong. ;)


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread matthew green
> >>> there's a fairly significant problem with this implementation.
> >>> 
> >>> you're only catching callers who use end up going through sy_call()
> >>> and that's not the majority.  mostly they're called directly from
> >>> MD code.  so to fix that, you have to update every platform syscall
> >>> handler.
> 
> FWIW, I did check the archives from 8 years ago when ad@ added the
> sy_call() wrapper.  It appears that he had already addressed the
> several md-specific syscall dispatchers to use the wrapper, so it
> seems that we don't need to do anything special.  See [1] for the
> relevant commit.  If you can point out any other code that still
> needs to be updated, please do!
> 
> [1]http://mail-index.netbsd.org/source-changes/2008/10/21/msg211565.html

i think you're OK here.  i did happen to find a couple of weird things
just now:

- filemon(4) is utterly broken.  it literally replaces the sysent
  values for the current emulation while active.  so it has an ugly
  obvious problem with runtime usage (particularly at unload.)
  however, a much bigger problem is that it adjusts an emulation
  that might not be the default one.  (32 bit modload of filemon
  on a 64 bit kernel, i think, will adjust the netbsd32 sysent.)
  so it could break the emulation.  this also means that every
  other emulation will not be affected and IO will be missed (see:
  "utterly broken".)

- all the emulation support for ptrace(2) is oddly implemented.
  while most simple calls that just need to adjust for pointer or
  integer sizes, and then call the real function, eg accept:

return (sys_accept(l, &ua, retval));

  the ptrace for netbsd32 (and all other support emuls) calls via
  .sy_call directly:

return (*sysent[SYS_ptrace].sy_call)(l, &ua, retval);

but it's not clear to me why.


.mrg.


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

Based on earlier comments, I've come up with a much-less-intrusive
set of changes.  This time around, there are no bit masks and no new
members in any system structures.  (I'm pretty sure we won't even
need a kernel version bump for this.)

Instead, I've modified sy_call() to check the current value of
l_sysent before updating with the new syscall pointer.  If the
current value is non-NULL then set a new flag bit LP_LOST_SYSENT in
the lwp's l_pflag member.  This check is the only per-syscall cost:

if (l->l_sysent != NULL)
l->l_pflag |= LP_LOST_SYSENT;

There are no new considerations for syscall_establish().

For syscall_disestablish()'s FOREACH lwp loop, in addition to
checking to see if the lwp has the "target" syscall active, we check
to see if the lwp has the LP_LOST_SYSENT flag set.  If _any_ active
process has this flag set, we cannot know if it has the target
syscall active or not, so we err on the side of caution and disallow
removal of the syscall.


there's a fairly significant problem with this implementation.

you're only catching callers who use end up going through sy_call()
and that's not the majority.  mostly they're called directly from
MD code.  so to fix that, you have to update every platform syscall
handler.


FWIW, I did check the archives from 8 years ago when ad@ added the
sy_call() wrapper.  It appears that he had already addressed the
several md-specific syscall dispatchers to use the wrapper, so it
seems that we don't need to do anything special.  See [1] for the
relevant commit.  If you can point out any other code that still
needs to be updated, please do!

[1]http://mail-index.netbsd.org/source-changes/2008/10/21/msg211565.html




+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+Index: kern/kern_syscall.c
===
RCS file: /cvsroot/src/sys/kern/kern_syscall.c,v
retrieving revision 1.11
diff -u -p -r1.11 kern_syscall.c
--- kern/kern_syscall.c 9 May 2015 05:56:36 -   1.11
+++ kern/kern_syscall.c 17 Nov 2015 22:55:38 -
@@ -178,7 +178,8 @@ syscall_disestablish(const struct emul *
for (i = 0; sp[i].sp_call != NULL; i++) {
mutex_enter(proc_lock);
LIST_FOREACH(l, &alllwp, l_list) {
-   if (l->l_sysent == &sy[sp[i].sp_code]) {
+   if (l->l_sysent == &sy[sp[i].sp_code] ||
+   (l->l_pflag && LP_LOST_SYSENT)) {
break;
}
}
Index: sys/lwp.h
===
RCS file: /cvsroot/src/sys/sys/lwp.h,v
retrieving revision 1.170
diff -u -p -r1.170 lwp.h
--- sys/lwp.h   31 Mar 2015 01:10:02 -  1.170
+++ sys/lwp.h   17 Nov 2015 22:55:38 -
@@ -252,6 +252,7 @@ extern int  maxlwp __read_mostly;   /* max
 #defineLP_SYSCTLWRITE  0x0080 /* sysctl write lock held */
 #defineLP_MUSTJOIN 0x0100 /* Must join kthread on exit */
 #defineLP_VFORKWAIT0x0200 /* Waiting at vfork() for a child */
+#defineLP_LOST_SYSENT  0x0400 /* l_sysent has been overwritten */
 #defineLP_TIMEINTR 0x0001 /* Time this soft interrupt */
 #defineLP_RUNNING  0x2000 /* Active on a CPU */
 #defineLP_BOUND0x8000 /* Bound to a CPU */
Index: sys/syscallvar.h
===
RCS file: /cvsroot/src/sys/sys/syscallvar.h,v
retrieving revision 1.11
diff -u -p -r1.11 syscallvar.h
--- sys/syscallvar.h24 Sep 2015 14:34:22 -  1.11
+++ sys/syscallvar.h17 Nov 2015 22:55:38 -
@@ -61,6 +61,8 @@ sy_call(const struct sysent *sy, struct 
 {
int error;
 
+   if (l->l_sysent != NULL)
+   l->l_pflag |= LP_LOST_SYSENT;
l->l_sysent = sy;
error = (*sy->sy_call)(l, uap, rval);
l->l_sysent = NULL;


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

On Tue, 17 Nov 2015, Paul Goyette wrote:


there's a fairly significant problem with this implementation.

you're only catching callers who use end up going through sy_call()
and that's not the majority.  mostly they're called directly from
MD code.  so to fix that, you have to update every platform syscall
handler.


Ah, I see.  I'll have to investigate further.


It seems that the sy_call() mechanism is used for all "modular"
syscalls.  As you correctly point out, it is not used for all
syscalls.

(Quoting from further down)


The problem I'm attempting to address is when syscall #1 is blocked,
some signal gets delivered to the lwp, and the signal handler calls some
other syscall #2.  In that case, the single l_sysent gets overwritten
and we now have no record that syscall #1 is still active.


So, the problem seems to become one of much smaller magnitude that it
was at first glance.  It's only a problem if both syscalls #1 and #2
are entered via sy_call(), rather than for any arbitrary syscall #2.


it also slows down normal operation adding branches and data access
to the call path of every system call.  and bloats struct lwp and
more.


As above, it only affects system calls that go through sy_call().  It
does, however, bloat every struct lwp.

It would be possible to transfer that bloat out of struct lwp and into
struct proc.  But that would require an extra dereference in sy_call()
for l->l_proc->.  I'm not sure whether I want memory bloat or execution
bloat.  :)

===

There is one problem I discovered with my current implementation...

The l_scdepth member is currently just incremented and decremented in
sy_call(), and the mask is checked in syscall_disestablish() only when
the depth is greater than 1.  In the case of syscall#1 --> sig-handler
--> syscall#2 --> return-from-#2 --> return-from-sig-handler and back
to the blockage in execution of syscall#2, we would decrement the
l_scdepth member back to 1.  Syscall#1's l_sysent would already have
been overwritten, but we won't check the mask because the depth is 1.

I'm proposing to solve this by replacing the

l->l_scdepth--;

with

l->l_scdepth &= ~1;

As a result, the l_scdepth member will have three states: 0, 1, and 
moer-than-1-at-some-point-in-time (values of 2 or more will never be

decremented below 2).  So now the check in syscall_disestablish()
becomes

if (any lwp has ever had more than 1 active syscall, and that
lwp has ever executed a syscall that belongs to the set), {
disallow the disestablish()
}


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

On Tue, 17 Nov 2015, matthew green wrote:


Martin Husemann writes:

On Tue, Nov 17, 2015 at 05:48:17PM +1100, matthew green wrote:

it also slows down normal operation adding branches and data access
to the call path of every system call.  and bloats struct lwp and
more.


At least all operations on the syscall pathes need to be #ifdef MODULAR.

How about just disabling all unloading for modules that ever used
syscall_establish() instead?


... or disable autounload by default anyway, because it's mostly a
way to make my system slow.  i'm not sure i would go all the way
and simply remove support for it completely, but i'd really much
rather it wasn't triggered in a normal system.  the unload, and
then reload, slow down the running of my apps, for no real benefit
as an end user.  there is benefit for kernel developers, but those
people can break the system in much more interesting ways, including
asking for the unload manually when it's time to test something.



From sysctl(7) man page (manually re-wrapped for readability)


kern.module.autotime
 An integer that controls the delay before an
 attempt is made to automatically unload a module
 that was auto-loaded.  Setting this value to zero
 disables the auto-unload function.

So just adding kern.module.autotime=0 in your /etc/sysctl.conf file
should disable auto-unload completely.


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

On Tue, 17 Nov 2015, matthew green wrote:


additional problems i've realised:

- this should be process specific, not LWP specific.

- we might need some way to copy this across fork().

- as l_sysent seems unreliable, we should just remove it.


l_sysent serves another purpose:  it is used to restart the syscall
if it originally pointed to sys_nomodule() and we auto-load the
module that provides the syscall.


In a nut-shell, we cannot rely on the struct lwp's l_sysent member
to determine if a particular syscall is "active", since it could be
overwritten during signal handling.  As a result, it is possible to
unload modules which have active syscalls, and when those syscalls
are resumed they'll return into memory that most likely no longer
contains the syscall code.  (A big "Thanks!" goes out to Masao
Uebayashi for identifying this bug.)


ah, i think i understand this now.  i wonder if any of the system
calls that support being restarted actually are modloadable (ie,
is this a non-problem?)  most system calls aren't and the few that
are are pretty basic.


There's actually a fairly large number of mod(auto-)loadable syscalls
as can be found in (generated) source-file kern/syscalls_autoload.c
I don't know how many of these could be interruptable.

I don't think "restartable" is the right adjective here - that would
imply (to me) that the syscall returned ERESTART, rather than not
having returned at all and therefore still being active.  If a syscall
returns with ERESTART, then it would be ok to unload the module, and
retrying the syscall would either fail or re(auto-)load the module.



i have another idea.

if these calls, at entry, mark the module as "busy in proc X".  then
when the unload is attempted, if X is still around, fail.  that
pushes all the resources into the module again, and seems to plug
all the holes.


Hmmm.  This would mean we'd need to keep a _set_of_ prox_X values, and
I'm not sure about sizing (and resizing) the set.  But it certainly is
a possibility.

Can you please update the PR with this alternate suggestion?


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

On Tue, 17 Nov 2015, Martin Husemann wrote:


On Tue, Nov 17, 2015 at 05:48:17PM +1100, matthew green wrote:

it also slows down normal operation adding branches and data access
to the call path of every system call.  and bloats struct lwp and
more.


At least all operations on the syscall pathes need to be #ifdef MODULAR.

How about just disabling all unloading for modules that ever used
syscall_establish() instead?


That would mean that pretty much all of the opportunistically-loaded
compat modules would hang around forever once loaded.


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Paul Goyette

On Tue, 17 Nov 2015, matthew green wrote:



there's a fairly significant problem with this implementation.

you're only catching callers who use end up going through sy_call()
and that's not the majority.  mostly they're called directly from
MD code.  so to fix that, you have to update every platform syscall
handler.


Ah, I see.  I'll have to investigate further.


it also slows down normal operation adding branches and data access
to the call path of every system call.  and bloats struct lwp and
more.

i don't think it's worth paying that for what feature that isn't
really useful to anyone.  (sure, it's useful for you when you're
modifying the calls perhaps, but that comes down to the case of
don't shoot yourself in the foot and unload something you're
still using/testing.)


This is actually more oriented to modules that are going to get auto-
unloaded.  For example, we can (attempt to) load a large list of
compat modules in the hope of getting one that can handle an unknown
emulation/execsw.  It's really nice for (most of) those to just "go
away" if they're not really used.  But you need to be able to make
that determination.


however, it looks like this *should* already be handled by
platforms that do use sy_call() directly, and could be fixed on
everywhere else by simply setting l_sysent the same way.  that's
what the setting of l_sysent does -- "this is the system call i'm
currently actively running" -- and syscall_disestablish() checks
this correctly already.


The problem I'm attempting to address is when syscall #1 is blocked,
some signal gets delivered to the lwp, and the signal handler calls some
other syscall #2.  In that case, the single l_sysent gets overwritten
and we now have no record that syscall #1 is still active.


+--+--+-+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+--+--+-+


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread matthew green
Martin Husemann writes:
> On Tue, Nov 17, 2015 at 05:48:17PM +1100, matthew green wrote:
> > it also slows down normal operation adding branches and data access
> > to the call path of every system call.  and bloats struct lwp and
> > more.
> 
> At least all operations on the syscall pathes need to be #ifdef MODULAR.
> 
> How about just disabling all unloading for modules that ever used
> syscall_establish() instead?

... or disable autounload by default anyway, because it's mostly a
way to make my system slow.  i'm not sure i would go all the way
and simply remove support for it completely, but i'd really much
rather it wasn't triggered in a normal system.  the unload, and
then reload, slow down the running of my apps, for no real benefit
as an end user.  there is benefit for kernel developers, but those
people can break the system in much more interesting ways, including
asking for the unload manually when it's time to test something.


.mrg.


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread matthew green
additional problems i've realised:

- this should be process specific, not LWP specific.

- we might need some way to copy this across fork().

- as l_sysent seems unreliable, we should just remove it.


> In a nut-shell, we cannot rely on the struct lwp's l_sysent member
> to determine if a particular syscall is "active", since it could be
> overwritten during signal handling.  As a result, it is possible to
> unload modules which have active syscalls, and when those syscalls
> are resumed they'll return into memory that most likely no longer
> contains the syscall code.  (A big "Thanks!" goes out to Masao
> Uebayashi for identifying this bug.)

ah, i think i understand this now.  i wonder if any of the system
calls that support being restarted actually are modloadable (ie,
is this a non-problem?)  most system calls aren't and the few that
are are pretty basic.

i have another idea.

if these calls, at entry, mark the module as "busy in proc X".  then
when the unload is attempted, if X is still around, fail.  that
pushes all the resources into the module again, and seems to plug
all the holes.


.mrg.


Re: Problem with syscall_disestablish() - PR kern/50430

2015-11-17 Thread Martin Husemann
On Tue, Nov 17, 2015 at 05:48:17PM +1100, matthew green wrote:
> it also slows down normal operation adding branches and data access
> to the call path of every system call.  and bloats struct lwp and
> more.

At least all operations on the syscall pathes need to be #ifdef MODULAR.

How about just disabling all unloading for modules that ever used
syscall_establish() instead?

Martin


re: Problem with syscall_disestablish() - PR kern/50430

2015-11-16 Thread matthew green
 
there's a fairly significant problem with this implementation.

you're only catching callers who use end up going through sy_call()
and that's not the majority.  mostly they're called directly from
MD code.  so to fix that, you have to update every platform syscall
handler.

it also slows down normal operation adding branches and data access
to the call path of every system call.  and bloats struct lwp and
more.

i don't think it's worth paying that for what feature that isn't
really useful to anyone.  (sure, it's useful for you when you're
modifying the calls perhaps, but that comes down to the case of
don't shoot yourself in the foot and unload something you're
still using/testing.)


however, it looks like this *should* already be handled by
platforms that do use sy_call() directly, and could be fixed on
everywhere else by simply setting l_sysent the same way.  that's
what the setting of l_sysent does -- "this is the system call i'm
currently actively running" -- and syscall_disestablish() checks
this correctly already.


.mrg.