Re: In-kernel process exit hooks?

2016-01-09 Thread Paul Goyette

Let me rephrase - this is the only explanation I'm going to provide:

_I_ am not going to remove it.  If others feel so strongly that they 
would rather remove existing functionality (as ugly as it is), then 
_they_ can do the deed.




On Sat, 9 Jan 2016, Wolfgang Solfrank wrote:


Hi,


We all agree that filemon(4) is an ugly hack.  It probably should never
have gotten committed.  But it is there now, and there are a (very) few
use-cases.  So we don't want to remove it without having a replacement
implementation.


Well, can you explain?  Why would we not want to remove it and be done
with that nonsense?

Ciao,
Wolfgang
--
wolfg...@solfrank.net   Wolfgang Solfrank



+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+--+--++


Re: In-kernel process exit hooks?

2016-01-09 Thread Mateusz Guzik
On Fri, Jan 08, 2016 at 07:08:19AM +, David Holland wrote:
> On Thu, Jan 07, 2016 at 07:34:33AM +0800, Paul Goyette wrote:
>  > Based on internal implementation of filemon(4), there is an ordering
>  > requirement imposed on the sequence of events that occur when a process
>  > using /dev/filemon exits.  In particular, the file descriptor on which the
>  > device is open needs to be closed prior to closing the descriptor to which
>  > avent/activity records are being logged.  (Otherwise, because of an extra
>  > reference on the log file, fd_close() will hang indefinitely.)
> 
> Looking at the filemon code... it is completely wrong. It should not
> be using file handles like that, especially not in a device driver.
> Rather than adding new hacks to the system to work around it being
> silly, it should be redone properly.
> 
> For an example of the right way to do this kind of thing, look in
> kern_acct.c.
> 

I would like to add my $0,03 seeing that there is time invested into
what I consider a lost cause.

I have to agree that filemon is misdesigned and of low quality in
general.

While I don't know all the details, it is clear that the purpose would
be much better served by ktrace and I would argue efforts should be
spent there.

bmake takes filemon output and copies it into some file.

>From usability perspective what seems to be needed is few hacks to
ktrace to only log the stuff make is interested in and parsing code for
bmake to translate to what looks like current filemon format for
aformentioned copying.

I only had a brief look at ktrace implementation. It uses global locks,
so that would have to be worked on, but it should be trivial to at least
make tracees using different output files not compete with each other.

So, what's so fitting about ktrace design: tracing is tied to the target
process, can react to things like changing credentials and reparenting
(fork + parent exit), but most importantly it does not make assumptions
about things which can change without its control (see below).

Filemon monitors stuff at syscall level, stores the fd and remembers
pids.

Let's see how this leads to trouble. Some of these problems were also
present in FreeBSD implementation (and some still are). A bonus
interesting fact is that implementations have diverged, although are
still fundamentally broken.

http://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon.c#filemon_ioctl
>case FILEMON_SET_PID:
>/* Set the monitored process ID - if allowed. */
>mutex_enter(proc_lock);
>tp = proc_find(*((pid_t *) data));

There are no steps taken to keep tp around nor in a certain state (see
below). proc_find only finds the pointer.

>mutex_exit(proc_lock);
>if (tp == NULL ||
>tp->p_emul != _netbsd) {

tp could have exited and be freed by now. Now that the comparison was
done it could have p_emul changed.

>error = ESRCH;
>break;
>}
>
>error = kauth_authorize_process(curproc->p_cred,
>KAUTH_PROCESS_CANSEE, tp,
>KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL);

Now that the check is pased the process could have executed something
setuid.

I'm somewhat surprised this code is here. Should not
kauth_authorize_process assert that tp has stable credentials in some
way, in effect just crashing filemon consumers?

>if (!error) {
>filemon->fm_pid = tp->p_pid;
>}

Pid is stored, but tp is not held. What if the 'traced' process exits?
grep only reveals the following bit in filemon_wrapper_sys_exit:

>if (filemon->fm_pid == curproc->p_pid)
>filemon_printf(filemon, "# Bye bye\n");

So filemon will continue tracing anything which happened to reuse the
pid.

"Tracing points" are at syscall layer. With only pid remembered this
adds up to serious breakage induced by the need to look the filemon structure
up each time, and based on unreliable data.

>static struct filemon *
>filemon_pid_check(struct proc * p)
>{   
>struct filemon *filemon;
>struct proc * lp;
>
>if (!TAILQ_EMPTY(_inuse)) {
>while (p) {
>/*
> * make sure p cannot exit
> * until we have moved on to p_pptr
> */
>rw_enter(>p_reflock, RW_READER);
>TAILQ_FOREACH(filemon, _inuse, fm_link) {
>if (p->p_pid == filemon->fm_pid) {
>rw_exit(>p_reflock);
>return (filemon);
>}
>}

But p_pid should be constant for 'struct proc' lifetime, so locking this
early is wasteful.

>lp = 

Re: In-kernel process exit hooks?

2016-01-09 Thread Mateusz Guzik
On Sat, Jan 09, 2016 at 08:25:05AM +0100, Mateusz Guzik wrote:
> On Sat, Jan 09, 2016 at 02:25:02PM +0800, Paul Goyette wrote:
> > On Sat, 9 Jan 2016, Mateusz Guzik wrote:
> > 
> > >While I don't know all the details, it is clear that the purpose would
> > >be much better served by ktrace and I would argue efforts should be
> > >spent there.
> > 
> > filemon's purpose is somewhat different than that of ktrace.  There
> > is a limited set of events that filemon captures, and it only
> > captures successful operations.  Its output is quite simple and easy
> > to parse by a human of shell script.
> > 
> 
> ktrace already have limited support for specifying what events are
> wanted. One can add a special filemon-like mode.
> 
> The output of kdump is definitely less friendly for scripts, but it is
> way easier to write a script formatting it than it is to fix filemon.
> 
> > >>From usability perspective what seems to be needed is few hacks to
> > >ktrace to only log the stuff make is interested in and parsing code for
> > >bmake to translate to what looks like current filemon format for
> > >aformentioned copying.
> > 
> > Perhaps.  But filemon is also a run-time loadable module (and it is
> > preferred that it NOT be built-in to any kernels).  If I remember
> > correctly, ktrace has hooks all over the system and does not readily
> > lend itself to modularization.
> > 
> 
> I doubt that compiling in ktrace on a which can handle filemon is an
> issue, and even then it should outweight hops around filemon.
> 
> For correct of a minimalistic safe filemon module the kernel would
> already have to grow hooks in few places (see below).
> 
> 
> > >Filemon monitors stuff at syscall level, stores the fd and remembers
> > >pids.
> > 
> > The fd is no longer being stored...  :)  But yes, it does still
> > remember pids, so it can determine if an event belongs to a monitored
> > process tree.  Again IIRC, ktrace is somewhat more intrusive, in that
> > it manipulates various process flags which can affect behavior of the
> > target (monitored) process.
> 
> It has a dedicated field in struct proc which is the only sensible way
> of implementing this I can think of.
> 
> > >>   error = kauth_authorize_process(curproc->p_cred,
> > >>   KAUTH_PROCESS_CANSEE, tp,
> > >>   KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL);
> > >
> > >Now that the check is pased the process could have executed something
> > >setuid.
> > 
> > Which process could have executed something setuid?  How would that
> > matter?  (I will confess I haven't checked, but I would be surprised
> > if "executing something setuid" would change the result of kauth()
> > check. But I could be wrong here.)
> > 
> 
> You start monitoring given process and proceed to execve a setuid file,
> which gives you information what it is doing, even though the
> kauth_authorize_process check would fail. This is basically what you
> yourself stated below about pid reuse.
> 
> So this either needs to prevent processes from changing credentials when
> traced or stop tracing when such a change occurs. The former seems like
> a rather bad idea and latter is already done by ktrace.
> 
> > >I'm somewhat surprised this code is here. Should not
> > >kauth_authorize_process assert that tp has stable credentials in some
> > >way, in effect just crashing filemon consumers?
> > >
> > >>   if (!error) {
> > >>   filemon->fm_pid = tp->p_pid;
> > >>   }
> > 
> > Why does the target process need stable credentials?  It's not doing
> > anything for filemon.
> > 
> 
> See below.
> 
> > >Pid is stored, but tp is not held. What if the 'traced' process exits?
> > >grep only reveals the following bit in filemon_wrapper_sys_exit:
> > >
> > >>   if (filemon->fm_pid == curproc->p_pid)
> > >>   filemon_printf(filemon, "# Bye bye\n");
> > >
> > >So filemon will continue tracing anything which happened to reuse the
> > >pid.
> > 
> > Yes, this is an outstanding issue.  More particularly, the new user
> > of the pid might not be accessible by the monitoring process (ie, it
> > would fail the earlier kauth() check).  But holding the pointer to
> > the proc isn't really the answer either - although less likely it is
> > entirely possible that the address could have been reused.
> > 
> 
> What you need is the ability to clear tracing bits on process exit.
> ktrace definitely has relevant bits in place already.
> 
> > (It's too bad we don't have "generation numbers" on the pid, which
> > would easily detect reuse.)
> 
> That would require struct proc to be type stable, which seems wasteful.
> 
> > >>   lp = p;
> > >>   p = p->p_pptr;
> > >>   rw_exit(>p_reflock);
> > >
> > >I guess the lock is needed to read the pointer to the parent.
> > 
> > Correct.
> > 
> > >Nothing was done to keep the parent around, which means it could 

Re: In-kernel process exit hooks?

2016-01-09 Thread Mateusz Guzik
On Sat, Jan 09, 2016 at 02:25:02PM +0800, Paul Goyette wrote:
> On Sat, 9 Jan 2016, Mateusz Guzik wrote:
> 
> >While I don't know all the details, it is clear that the purpose would
> >be much better served by ktrace and I would argue efforts should be
> >spent there.
> 
> filemon's purpose is somewhat different than that of ktrace.  There
> is a limited set of events that filemon captures, and it only
> captures successful operations.  Its output is quite simple and easy
> to parse by a human of shell script.
> 

ktrace already have limited support for specifying what events are
wanted. One can add a special filemon-like mode.

The output of kdump is definitely less friendly for scripts, but it is
way easier to write a script formatting it than it is to fix filemon.

> >>From usability perspective what seems to be needed is few hacks to
> >ktrace to only log the stuff make is interested in and parsing code for
> >bmake to translate to what looks like current filemon format for
> >aformentioned copying.
> 
> Perhaps.  But filemon is also a run-time loadable module (and it is
> preferred that it NOT be built-in to any kernels).  If I remember
> correctly, ktrace has hooks all over the system and does not readily
> lend itself to modularization.
> 

I doubt that compiling in ktrace on a which can handle filemon is an
issue, and even then it should outweight hops around filemon.

For correct of a minimalistic safe filemon module the kernel would
already have to grow hooks in few places (see below).


> >Filemon monitors stuff at syscall level, stores the fd and remembers
> >pids.
> 
> The fd is no longer being stored...  :)  But yes, it does still
> remember pids, so it can determine if an event belongs to a monitored
> process tree.  Again IIRC, ktrace is somewhat more intrusive, in that
> it manipulates various process flags which can affect behavior of the
> target (monitored) process.

It has a dedicated field in struct proc which is the only sensible way
of implementing this I can think of.

> >>   error = kauth_authorize_process(curproc->p_cred,
> >>   KAUTH_PROCESS_CANSEE, tp,
> >>   KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL);
> >
> >Now that the check is pased the process could have executed something
> >setuid.
> 
> Which process could have executed something setuid?  How would that
> matter?  (I will confess I haven't checked, but I would be surprised
> if "executing something setuid" would change the result of kauth()
> check. But I could be wrong here.)
> 

You start monitoring given process and proceed to execve a setuid file,
which gives you information what it is doing, even though the
kauth_authorize_process check would fail. This is basically what you
yourself stated below about pid reuse.

So this either needs to prevent processes from changing credentials when
traced or stop tracing when such a change occurs. The former seems like
a rather bad idea and latter is already done by ktrace.

> >I'm somewhat surprised this code is here. Should not
> >kauth_authorize_process assert that tp has stable credentials in some
> >way, in effect just crashing filemon consumers?
> >
> >>   if (!error) {
> >>   filemon->fm_pid = tp->p_pid;
> >>   }
> 
> Why does the target process need stable credentials?  It's not doing
> anything for filemon.
> 

See below.

> >Pid is stored, but tp is not held. What if the 'traced' process exits?
> >grep only reveals the following bit in filemon_wrapper_sys_exit:
> >
> >>   if (filemon->fm_pid == curproc->p_pid)
> >>   filemon_printf(filemon, "# Bye bye\n");
> >
> >So filemon will continue tracing anything which happened to reuse the
> >pid.
> 
> Yes, this is an outstanding issue.  More particularly, the new user
> of the pid might not be accessible by the monitoring process (ie, it
> would fail the earlier kauth() check).  But holding the pointer to
> the proc isn't really the answer either - although less likely it is
> entirely possible that the address could have been reused.
> 

What you need is the ability to clear tracing bits on process exit.
ktrace definitely has relevant bits in place already.

> (It's too bad we don't have "generation numbers" on the pid, which
> would easily detect reuse.)

That would require struct proc to be type stable, which seems wasteful.

> >>   lp = p;
> >>   p = p->p_pptr;
> >>   rw_exit(>p_reflock);
> >
> >I guess the lock is needed to read the pointer to the parent.
> 
> Correct.
> 
> >Nothing was done to keep the parent around, which means it could have
> >exited and be freed by now. Which in turn makes the second loop
> >iteration a use-after-free.
> 
> Hmmm, this code used to work.  Looks like the locking got messed up
> somewhere.  It should do (in pseudo code - I'm too lazy to write it
> all out!)
> 
>   lock(p)
>   while (p != 

Re: uvm vm_physseg trimming

2016-01-09 Thread Masao Uebayashi
avail_start/avail_end are used to keep track of the range used for
"managed pages" - PAGE_SIZE'ed pages that are added to free list and
allocated from there.  Managed pages are initially added after kernel
reserves its internal, bootstrap memory region (.text, .data, ...).

In some cases kernel wants to get ("steal") more memory, for its
internal use, after it already gives all remaining memory pages to
UVM.  See pmap_steal_memory().

There are no clear design or requirement for MD code about how to use
vm_physseg right now.  By merging `avail_start' to `start', you lose
information of the original `start'; actual start physical address of
a continuous physical memory segment.  And even without it, probably
all ports will continue working, because nothing uses that
information.

Personally I don't like that direction.  I wanted to make interfaces
around vm_physseg and pmap/UVM bootstrap design clearer.  I wanted to
keep more information into vm_physseg, instead of letting MD code
invent their own paddr_t range management.

I would suggest you to touch/massage those UVM/pmap/MD bootstrap code
readily and carefully.  That is the only way to thoroughly understand
such abandoned things.  Asking questions don't give you perfect
answers.  Splitting vm_physseg handling from uvm_page.c into
uvm_physseg.c would be a good start.


On Wed, Dec 30, 2015 at 7:32 PM, Cherry G. Mathew  wrote:
> Hi Everyone,
>
> Please find below a patch to remove .avail_(start|end) from
> struct vm_physseg
>
> I couldn't find a reason for them to be used redundantly, but I may be
> wrong. Are there port specific uses for these ?
>
> --
> Cherry
>
> diff -r 0b3902dbe274 sys/arch/acorn26/acorn26/pmap.c
> --- a/sys/arch/acorn26/acorn26/pmap.c   Sun Sep 13 09:15:02 2015 +0530
> +++ b/sys/arch/acorn26/acorn26/pmap.c   Wed Dec 30 15:27:27 2015 +0530
> @@ -301,11 +301,11 @@
> addr = 0;
> size = round_page(size);
> for (i = 0; i < vm_nphysseg; i++) {
> -   if (VM_PHYSMEM_PTR(i)->avail_start < 
> VM_PHYSMEM_PTR(i)->avail_end) {
> +   if (VM_PHYSMEM_PTR(i)->start < VM_PHYSMEM_PTR(i)->end) {
> addr = (vaddr_t)
> ((char*)MEMC_PHYS_BASE +
> -   ptoa(VM_PHYSMEM_PTR(i)->avail_start));
> -   VM_PHYSMEM_PTR(i)->avail_start++;
> +   ptoa(VM_PHYSMEM_PTR(i)->start));
> +   VM_PHYSMEM_PTR(i)->start++;
> break;
> }
> }
> diff -r 0b3902dbe274 sys/arch/alpha/alpha/machdep.c
> --- a/sys/arch/alpha/alpha/machdep.cSun Sep 13 09:15:02 2015 +0530
> +++ b/sys/arch/alpha/alpha/machdep.cWed Dec 30 15:27:27 2015 +0530
> @@ -617,11 +617,10 @@
> vps = VM_PHYSMEM_PTR(vm_nphysseg - 1);
>
> /* shrink so that it'll fit in the last segment */
> -   if ((vps->avail_end - vps->avail_start) < atop(sz))
> -   sz = ptoa(vps->avail_end - vps->avail_start);
> +   if ((vps->end - vps->start) < atop(sz))
> +   sz = ptoa(vps->end - vps->start);
>
> vps->end -= atop(sz);
> -   vps->avail_end -= atop(sz);
> msgbufaddr = (void *) ALPHA_PHYS_TO_K0SEG(ptoa(vps->end));
> initmsgbuf(msgbufaddr, sz);
>
> diff -r 0b3902dbe274 sys/arch/alpha/alpha/pmap.c
> --- a/sys/arch/alpha/alpha/pmap.c   Sun Sep 13 09:15:02 2015 +0530
> +++ b/sys/arch/alpha/alpha/pmap.c   Wed Dec 30 15:27:27 2015 +0530
> @@ -1027,35 +1027,34 @@
> panic("pmap_steal_memory: called _after_ bootstrap");
>
>  #if 0
> -   printf(" bank %d: avail_start 0x%lx, start 0x%lx, "
> -   "avail_end 0x%lx\n", bank, 
> VM_PHYSMEM_PTR(bank)->avail_start,
> -   VM_PHYSMEM_PTR(bank)->start, 
> VM_PHYSMEM_PTR(bank)->avail_end);
> +   printf(" bank %d: start 0x%lx, start 0x%lx, "
> +   "end 0x%lx\n", bank, VM_PHYSMEM_PTR(bank)->start,
> +   VM_PHYSMEM_PTR(bank)->start, VM_PHYSMEM_PTR(bank)->end);
>  #endif
>
> -   if (VM_PHYSMEM_PTR(bank)->avail_start != 
> VM_PHYSMEM_PTR(bank)->start ||
> -   VM_PHYSMEM_PTR(bank)->avail_start >= 
> VM_PHYSMEM_PTR(bank)->avail_end)
> +   if (VM_PHYSMEM_PTR(bank)->start != 
> VM_PHYSMEM_PTR(bank)->start ||
> +   VM_PHYSMEM_PTR(bank)->start >= VM_PHYSMEM_PTR(bank)->end)
> continue;
>
>  #if 0
> -   printf(" avail_end - avail_start = 0x%lx\n",
> -   VM_PHYSMEM_PTR(bank)->avail_end - 
> VM_PHYSMEM_PTR(bank)->avail_start);
> +   printf(" end - start = 0x%lx\n",
> +   VM_PHYSMEM_PTR(bank)->end - VM_PHYSMEM_PTR(bank)->start);
>  #endif
>
> -   if ((VM_PHYSMEM_PTR(bank)->avail_end - 
> 

Re: In-kernel process exit hooks?

2016-01-09 Thread Wolfgang Solfrank

Hi,


We all agree that filemon(4) is an ugly hack.  It probably should never
have gotten committed.  But it is there now, and there are a (very) few
use-cases.  So we don't want to remove it without having a replacement
implementation.


Well, can you explain?  Why would we not want to remove it and be done
with that nonsense?

Ciao,
Wolfgang
--
wolfg...@solfrank.net   Wolfgang Solfrank