NMI watchdog functionality on Freebsd

2013-01-22 Thread Sushanth Rai
Hi,

Does FreeBSD have some functionality similar to Linux's NMI watchdog? I'm 
aware of the ichwd driver, but that depends on a WDT being available in the 
hardware. Even when one is available, the BIOS needs to support a mechanism to 
trigger an OS-level recovery to get any useful information when the system is 
really wedged (with interrupts disabled). 

With Linux's NMI watchdog, the APIC is programmed to periodically generate an 
NMI, and the OS NMI handler can check some counters and invoke panic if the 
counters have not been updated for a while. 

Thanks,
Sushanth
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Running kgdb in batch mode.

2012-11-25 Thread Sushanth Rai
Basically I would like to get kernel backtraces of a bunch of threads from the 
live kernel under some conditions. When the condition is seen I would like to 
run kgdb, collect kernel backtraces of specific threads and exit. Is there a way 
to run kgdb in batch mode? Or is there any other way to get the stack traces?

Thanks,
Sushanth


Equivalent of linux F_SETLEASE/F_GETLEASE

2012-11-16 Thread Sushanth Rai
Is there an equivalent of the Linux "Leases" functionality in FreeBSD? If not, 
are there any plans to add it in a future release?

Thanks,
Sushanth


Re: Memory reserves or lack thereof

2012-11-12 Thread Sushanth Rai


--- On Mon, 11/12/12, Alan Cox  wrote:

> From: Alan Cox 
> Subject: Re: Memory reserves or lack thereof
> To: "Konstantin Belousov" 
> Cc: "Sushanth Rai" , a...@freebsd.org, 
> p...@freebsd.org, "StevenSears" , 
> "freebsd-hackers@freebsd.org" 
> Date: Monday, November 12, 2012, 3:10 PM
> On 11/12/2012 3:48 PM, Konstantin Belousov wrote:
> > On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:
> >> This patch still doesn't address the issue of M_NOWAIT calls driving
> >> the memory all the way down to 2 pages, right ? It would be nice to
> >> have M_NOWAIT just do non-sleep version of M_WAITOK and M_USE_RESERVE
> >> flag to dig deep.
> > This is out of scope of the change. But it is required for any further
> > adjustments.
> 
> I would suggest a somewhat different response:
> 
> The patch does make M_NOWAIT into a "non-sleep version of
> M_WAITOK" and does reintroduce M_USE_RESERVE as a way to
> specify "dig deep".
> 
> Currently, both M_NOWAIT and M_WAITOK can drive the
> cache/free memory down to two pages.  The effect of the
> patch is to stop M_NOWAIT at two pages rather than allowing
> it to continue to zero pages.


Thanks for the correction. I was associating VM_ALLOC_SYSTEM with just M_NOWAIT, 
as it seemed in the first version of the patch.

Sushanth


Re: Memory reserves or lack thereof

2012-11-12 Thread Sushanth Rai
This patch still doesn't address the issue of M_NOWAIT calls driving memory 
all the way down to 2 pages, right? It would be nice to have M_NOWAIT be just 
the non-sleeping version of M_WAITOK, and the M_USE_RESERVE flag to dig deep. 

Sushanth 

--- On Mon, 11/12/12, Konstantin Belousov  wrote:

> From: Konstantin Belousov 
> Subject: Re: Memory reserves or lack thereof
> To: a...@freebsd.org
> Cc: p...@freebsd.org, "Sears, Steven" , 
> "freebsd-hackers@freebsd.org" 
> Date: Monday, November 12, 2012, 5:36 AM
> On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
> > On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov wrote:
> > 
> > > On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
> > > > I have a memory subsystem design question that I'm hoping someone can
> > > > answer.
> > > >
> > > > I've been looking at a machine that is completely out of memory, as in
> > > >
> > > >  v_free_count = 0,
> > > >  v_cache_count = 0,
> > > >
> > > > I wondered how a machine could completely run out of memory like this,
> > > > especially after finding a lack of interrupt storms or other pathologies
> > > > that would tend to overcommit memory. So I started investigating.
> > > >
> > > > Most allocators come down to vm_page_alloc(), which has this guard:
> > > >
> > > >       if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
> > > >               page_req = VM_ALLOC_SYSTEM;
> > > >       };
> > > >
> > > >       if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
> > > >           (page_req == VM_ALLOC_SYSTEM &&
> > > >           cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
> > > >           (page_req == VM_ALLOC_INTERRUPT &&
> > > >           cnt.v_free_count + cnt.v_cache_count > 0)) {
> > > >
> > > > The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
> > > > every last page.
> > > >
> > > > From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
> > > > perhaps only used from interrupt threads. Not so, see kmem_malloc() or
> > > > uma_small_alloc() which both contain this mapping:
> > > >
> > > >       if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
> > > >               pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
> > > >       else
> > > >               pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
> > > >
> > > > Note that M_USE_RESERVE has been deprecated and is used in just a
> > > > handful of places. Also note that lots of code paths come through these
> > > > routines.
> > > >
> > > > What this means is essentially _any_ allocation using M_NOWAIT will
> > > > bypass whatever reserves have been held back and will take every last
> > > > page available.
> > > >
> > > > There is no documentation stating M_NOWAIT has this side effect of
> > > > essentially being privileged, so any innocuous piece of code that can't
> > > > block will use it. And of course M_NOWAIT is literally used all over.
> > > >
> > > > It looks to me like the design goal of the BSD allocators is on
> > > > recovery; it will give all pages away knowing it can recover.
> > > >
> > > > Am I missing anything? I would have expected some small number of pages
> > > > to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
> > > > sort of back door for grabbing memory.
> > > >
> > >
> > > Your analysis is right, there is nothing to add or correct.
> > > This is the reason to strongly prefer M_WAITOK.
> > >
> > 
> > Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
> > well understood that it should only be used by interrupt handlers.
> > 
> > The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
> > being that the allocation shouldn't sleep.  The other being how far we're
> > willing to deplete the cache/free page queues.
> > 
> > When fine-grained locking got sprinkled throughout the kernel, we all too
> > often found ourselves wanting to do allocations without the possibility of
> > blocking.  So, M_NOWAIT became commonplace, where it wasn't before.
> > 
> > This had the unintended consequence of introducing a lot of memory
> > allocations in the top-half of the kernel, i.e., non-interrupt handling
> > code, that were digging deep into the cache/free page queues.
> > 
> > Also, ironically, in today's kernel an "M_NOWAIT | M_USE_RESERVE"
> > allocation is less likely to succeed than an "M_NOWAIT" allocation.
> > However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
> > could only allocate a free page.  M_USE_RESERVE said that it was ok to
> > allocate a cached page even though M_NOWAIT was specified.  Consequently,
> > the system wouldn't dig as far into the free page queue if M_USE_RESERVE
> > was specified, because it was allowed to reclaim a cached page.
> > 
> > In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
> > dig any deeper into the cache

Kernel memory usage

2012-10-09 Thread Sushanth Rai
I was trying to correlate the output from "top" with what I get from 
"vmstat -z". I don't have any user programs that wire memory. Given that, I'm 
assuming the wired memory count shown by "top" is memory used by the kernel. Now 
I would like to find out how the kernel is using this "wired" memory. So, I look 
at dynamic memory allocated by the kernel using "vmstat -z". I think memory 
allocated via malloc() is serviced by zones if the allocation size is <4k, so 
I'm not sure how useful "vmstat -m" is. I also add up memory used by the buffer 
cache. Is there any other significant chunk I'm missing? Does "vmstat -m" show 
memory that is not accounted for in "vmstat -z"?

Thanks,
Sushanth




Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-18 Thread Sushanth Rai
Wiring the entire address space seems to have an interesting side effect. The 
libc memory allocator calls madvise() to free the dirty unused pages, which does 
nothing when the pages are wired. The allocator unmaps only when an entire chunk 
is free (default size of 1MB). That leaves lots of free pages which cannot be 
reclaimed even when the system is under memory pressure.

Sushanth

--- On Mon, 4/16/12, Sushanth Rai  wrote:

> From: Sushanth Rai 
> Subject: Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN
> To: "Konstantin Belousov" 
> Cc: a...@freebsd.org, freebsd-hackers@freebsd.org
> Date: Monday, April 16, 2012, 11:41 AM
> Many thanks. I verified the patch you provided and it works fine.
> 
> Sushanth
> 
> 
> > Oh, I see. The problem is the VM_MAP_WIRE_NOHOLES flag. Since we
> > map only the initial stack fragment even for the MCL_WIREFUTURE maps,
> > there is a hole in the stack region.
> > 
> > In fact, for MCL_WIREFUTURE, we probably should map the whole
> > stack at once, prefaulting all pages.
> > 
> > Below are two patches. The change for vm_mmap.c would fix your immediate
> > problem by allowing holes in wired region.
> > 
> > The change for vm_map.c prefaults the whole stack instead of the
> > initial fragment. The single-threaded programs still get a fault
> > on stack growth.
> > 
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index 6198629..2fd18d1 100644
> > --- a/sys/vm/vm_map.c
> > +++ b/sys/vm/vm_map.c
> > @@ -3259,7 +3259,10 @@ vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
> >              addrbos + max_ssize < addrbos)
> >                  return (KERN_NO_SPACE);
> >  
> > -        init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
> > +        if (map->flags & MAP_WIREFUTURE)
> > +                init_ssize = max_ssize;
> > +        else
> > +                init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
> >  
> >          PROC_LOCK(curthread->td_proc);
> >          vmemlim = lim_cur(curthread->td_proc, RLIMIT_VMEM);
> > diff --git a/sys/vm/vm_mmap.c b/sys/vm/vm_mmap.c
> > index 2588c85..3fccd9e 100644
> > --- a/sys/vm/vm_mmap.c
> > +++ b/sys/vm/vm_mmap.c
> > @@ -1561,9 +1561,11 @@ vm_mmap(vm_map_t map, vm_offset_t *addr, vm_size_t size, vm_prot_t prot,
> >                   * If the process has requested that all future mappings
> >                   * be wired, then heed this.
> >                   */
> > -                if (map->flags & MAP_WIREFUTURE)
> > +                if (map->flags & MAP_WIREFUTURE) {
> >                          vm_map_wire(map, *addr, *addr + size,
> > -                            VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
> > +                            VM_MAP_WIRE_USER | ((flags & MAP_STACK) ?
> > +                            VM_MAP_WIRE_HOLESOK : VM_MAP_WIRE_NOHOLES));
> > +                }
> >          } else {
> >                  /*
> >                   * If this mapping was accounted for in the vnode's
> >
>


Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-16 Thread Sushanth Rai
Many thanks. I verified the patch you provided and it works fine.

Sushanth


> Oh, I see. The problem is the VM_MAP_WIRE_NOHOLES flag. Since we
> map only the initial stack fragment even for the MCL_WIREFUTURE maps,
> there is a hole in the stack region.
> 
> In fact, for MCL_WIREFUTURE, we probably should map the whole
> stack at once, prefaulting all pages.
> 
> Below are two patches. The change for vm_mmap.c would fix your immediate
> problem by allowing holes in wired region.
> 
> The change for vm_map.c prefaults the whole stack instead of the
> initial fragment. The single-threaded programs still get a fault
> on stack growth.
> 
> diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> index 6198629..2fd18d1 100644
> --- a/sys/vm/vm_map.c
> +++ b/sys/vm/vm_map.c
> @@ -3259,7 +3259,10 @@ vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
>              addrbos + max_ssize < addrbos)
>                  return (KERN_NO_SPACE);
>  
> -        init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
> +        if (map->flags & MAP_WIREFUTURE)
> +                init_ssize = max_ssize;
> +        else
> +                init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
>  
>          PROC_LOCK(curthread->td_proc);
>          vmemlim = lim_cur(curthread->td_proc, RLIMIT_VMEM);
> diff --git a/sys/vm/vm_mmap.c b/sys/vm/vm_mmap.c
> index 2588c85..3fccd9e 100644
> --- a/sys/vm/vm_mmap.c
> +++ b/sys/vm/vm_mmap.c
> @@ -1561,9 +1561,11 @@ vm_mmap(vm_map_t map, vm_offset_t *addr, vm_size_t size, vm_prot_t prot,
>                   * If the process has requested that all future mappings
>                   * be wired, then heed this.
>                   */
> -                if (map->flags & MAP_WIREFUTURE)
> +                if (map->flags & MAP_WIREFUTURE) {
>                          vm_map_wire(map, *addr, *addr + size,
> -                            VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
> +                            VM_MAP_WIRE_USER | ((flags & MAP_STACK) ?
> +                            VM_MAP_WIRE_HOLESOK : VM_MAP_WIRE_NOHOLES));
> +                }
>          } else {
>                  /*
>                   * If this mapping was accounted for in the vnode's
>


Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-13 Thread Sushanth Rai
Just in case the attachment doesn't come through. Here is the program:

/*
 * The header names were lost in the archive; these are the ones the
 * code below needs.
 */
#include <sys/mman.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *thr_func(void *);
#define MAX_THREADS 5

int
main(void)
{
	int i;
	int thread_num[MAX_THREADS];
	pthread_t tid[MAX_THREADS];

	/* Wire everything mapped now and everything mapped later. */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
		perror("Failure to lock address space");
		exit(1);
	}

	/* Each thread gets a default stack allocated by the thread library. */
	for (i = 0; i < MAX_THREADS; i++) {
		thread_num[i] = i + 1;
		if (pthread_create(&tid[i], NULL, thr_func,
		    &thread_num[i]) != 0) {
			perror("thread creation failed");
			exit(1);
		}
	}

	for (i = 0; i < MAX_THREADS; i++) {
		if (pthread_join(tid[i], NULL) != 0) {
			perror("pthread_join failed");
			exit(1);
		}
	}
	return (0);
}

void *
thr_func(void *arg)
{
	int *tnum = (int *)arg;

	printf("Thread %d going to sleep\n", *tnum);

	/* Sleep forever so the process map can be inspected. */
	while (1) {
		sleep(5);
	}
}


--- On Fri, 4/13/12, Sushanth Rai  wrote:

> From: Sushanth Rai 
> Subject: Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN
> To: "Konstantin Belousov" 
> Cc: freebsd-hackers@freebsd.org
> Date: Friday, April 13, 2012, 11:37 AM
> I've attached the simple program that
> creates 5 threads. Following is the o/p of
> /proc//map when this program is running. Note
> that I modified 
> sys/fs/procfs/procfs_map.c to print whether a region is
> wired. As you can see from this o/p, none of stack areas get
> wired. 
> 
> 0x40 0x401000 1 0 0xff002d943bd0 r-x 1 0 0x1000 COW
> NC wired vnode /var/tmp/thread1
> 0x50 0x501000 1 0 0xff002dd13e58 rw- 2 0 0x3100 NCOW
> NNC wired default -
> 0x501000 0x60 255 0 0xff002dd13e58 rwx 2 0 0x3100
> NCOW NNC wired default -
> 0x80050 0x800526000 38 0 0xff0025574000 r-x 192 46
> 0x1004 COW NC wired vnode /libexec/ld-elf.so.1
> 0x800526000 0x800537000 17 0 0xff002d9f81b0 rw- 1 0
> 0x3100 NCOW NNC wired default -
> 0x800626000 0x80062d000 7 0 0xff002dd13bd0 rw- 1 0
> 0x3100 COW NNC wired vnode /libexec/ld-elf.so.1
> 0x80062d000 0x800633000 6 0 0xff002dd145e8 rw- 1 0
> 0x3100 NCOW NNC wired default -
> 0x800633000 0x800645000 18 0 0xff00256d71b0 r-x 63 42
> 0x4 COW NC wired vnode /lib/libthr.so.3
> 0x800645000 0x800646000 1 0 0xff002d975510 r-x 1 0
> 0x3100 COW NNC wired vnode /lib/libthr.so.3
> 0x800646000 0x800746000 0 0 0xff002dc5cca8 --- 4 0
> 0x3100 NCOW NNC not-wired default -
> 0x800746000 0x80074a000 4 0 0xff002572a288 rw- 1 0
> 0x3100 COW NNC wired vnode /lib/libthr.so.3
> 0x80074a000 0x80074c000 2 0 0xff002dc5cca8 rw- 4 0
> 0x3100 NCOW NNC wired default -
> 0x80074c000 0x80083e000 242 0 0xff001cd226c0 r-x 238 92
> 0x1004 COW NC wired vnode /lib/libc.so.7
> 0x80083e000 0x80083f000 1 0 0xff002dd12000 r-x 1 0
> 0x3100 COW NNC wired vnode /lib/libc.so.7
> 0x80083f000 0x80093e000 0 0 0xff002dc5cca8 --- 4 0
> 0x3100 NCOW NNC not-wired default -
> 0x80093e000 0x80095d000 31 0 0xff002dddc360 rw- 1 0
> 0x3100 COW NNC wired vnode /lib/libc.so.7
> 0x80095d000 0x800974000 23 0 0xff002dc5cca8 rw- 4 0
> 0x3100 NCOW NNC wired default -
> 0x800a0 0x800b0 256 0 0xff002dbd1798 rw- 1 0
> 0x3100 NCOW NNC wired default -
> 0x800b0 0x800c0 256 0 0xff002dd14948 rw- 1 0
> 0x3100 NCOW NNC wired default -
> 0x7f3db000 0x7f3fb000 1 0 0xff002dbb4360 rw- 1 0
> 0x3100 NCOW NNC not-wired default -
> 0x7f5dc000 0x7f5fc000 1 0 0xff002dc66af8 rw- 1 0
> 0x3100 NCOW NNC not-wired default -
> 0x7f7dd000 0x7f7fd000 1 0 0xff002dbea438 rw- 1 0
> 0x3100 NCOW NNC not-wired default -
> 0x7f9de000 0x7f9fe000 1 0 0xff002dd7fd80 rw- 1 0
> 0x3100 NCOW NNC not-wired default -
> 0x7fbdf000 0x7fbff000 1 0 0xff002dbe9438 rw- 1 0
> 0x3100 NCOW NNC not-wired default -
> 0x7fbff000 0x7fc0 0 0 0 --- 0 0 0x0 NCOW NNC
> not-wired none -
> 0x7ffffffe 0x8000 32 0 0xff002dd125e8 rwx 1
> 0 0x3100 NCOW NNC wired default -
> 
> --- On Fri, 4/13/12, Konstantin Belousov 
> wrote:
> 
> > From: Konstantin Belousov 
> > Subject: Re: mlockall() on freebsd 7.2 + amd64 returns
> EAGAIN
> > To: "Sushanth Rai" 
> > Cc: freebsd-hackers@freebsd.org
> > Date: Friday, April 13, 2012, 1:11 AM
> > On Thu, Apr 12, 2012 at 08:10:26PM
> > -0700, Sushanth Rai wrote:
> > > > 
> > > > Then it should be fixed in r190885.
> > > > 

Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-13 Thread Sushanth Rai
I've attached a simple program that creates 5 threads. Following is the output 
of /proc//map when this program is running. Note that I modified 
sys/fs/procfs/procfs_map.c to print whether a region is wired. As you can see 
from this output, none of the stack areas get wired. 

0x40 0x401000 1 0 0xff002d943bd0 r-x 1 0 0x1000 COW NC wired vnode /var/tmp/thread1
0x50 0x501000 1 0 0xff002dd13e58 rw- 2 0 0x3100 NCOW NNC wired default -
0x501000 0x60 255 0 0xff002dd13e58 rwx 2 0 0x3100 NCOW NNC wired default -
0x80050 0x800526000 38 0 0xff0025574000 r-x 192 46 0x1004 COW NC wired vnode /libexec/ld-elf.so.1
0x800526000 0x800537000 17 0 0xff002d9f81b0 rw- 1 0 0x3100 NCOW NNC wired default -
0x800626000 0x80062d000 7 0 0xff002dd13bd0 rw- 1 0 0x3100 COW NNC wired vnode /libexec/ld-elf.so.1
0x80062d000 0x800633000 6 0 0xff002dd145e8 rw- 1 0 0x3100 NCOW NNC wired default -
0x800633000 0x800645000 18 0 0xff00256d71b0 r-x 63 42 0x4 COW NC wired vnode /lib/libthr.so.3
0x800645000 0x800646000 1 0 0xff002d975510 r-x 1 0 0x3100 COW NNC wired vnode /lib/libthr.so.3
0x800646000 0x800746000 0 0 0xff002dc5cca8 --- 4 0 0x3100 NCOW NNC not-wired default -
0x800746000 0x80074a000 4 0 0xff002572a288 rw- 1 0 0x3100 COW NNC wired vnode /lib/libthr.so.3
0x80074a000 0x80074c000 2 0 0xff002dc5cca8 rw- 4 0 0x3100 NCOW NNC wired default -
0x80074c000 0x80083e000 242 0 0xff001cd226c0 r-x 238 92 0x1004 COW NC wired vnode /lib/libc.so.7
0x80083e000 0x80083f000 1 0 0xff002dd12000 r-x 1 0 0x3100 COW NNC wired vnode /lib/libc.so.7
0x80083f000 0x80093e000 0 0 0xff002dc5cca8 --- 4 0 0x3100 NCOW NNC not-wired default -
0x80093e000 0x80095d000 31 0 0xff002dddc360 rw- 1 0 0x3100 COW NNC wired vnode /lib/libc.so.7
0x80095d000 0x800974000 23 0 0xff002dc5cca8 rw- 4 0 0x3100 NCOW NNC wired default -
0x800a0 0x800b0 256 0 0xff002dbd1798 rw- 1 0 0x3100 NCOW NNC wired default -
0x800b0 0x800c0 256 0 0xff002dd14948 rw- 1 0 0x3100 NCOW NNC wired default -
0x7f3db000 0x7f3fb000 1 0 0xff002dbb4360 rw- 1 0 0x3100 NCOW NNC not-wired default -
0x7f5dc000 0x7f5fc000 1 0 0xff002dc66af8 rw- 1 0 0x3100 NCOW NNC not-wired default -
0x7f7dd000 0x7f7fd000 1 0 0xff002dbea438 rw- 1 0 0x3100 NCOW NNC not-wired default -
0x7f9de000 0x7f9fe000 1 0 0xff002dd7fd80 rw- 1 0 0x3100 NCOW NNC not-wired default -
0x7fbdf000 0x7fbff000 1 0 0xff002dbe9438 rw- 1 0 0x3100 NCOW NNC not-wired default -
0x7fbff000 0x7fc0 0 0 0 --- 0 0 0x0 NCOW NNC not-wired none -
0x7ffe 0x8000 32 0 0xff002dd125e8 rwx 1 0 0x3100 NCOW NNC wired default -

--- On Fri, 4/13/12, Konstantin Belousov  wrote:

> From: Konstantin Belousov 
> Subject: Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Friday, April 13, 2012, 1:11 AM
> On Thu, Apr 12, 2012 at 08:10:26PM -0700, Sushanth Rai wrote:
> > > 
> > > Then it should be fixed in r190885.
> > > 
> > 
> > Thanks. That works like a charm. 
> > 
> > mlockall() mostly works now. There is still an issue in wiring the
> > stacks of a multithreaded program when the program uses the default
> > stack allocation scheme. The thread library allocates a stack for each
> > thread by calling mmap() with the address and size to be mapped. The
> > kernel adjusts the start address to sgrowsiz in vm_map_stack() and
> > maps at the adjusted address. But the subsequent wiring is done using
> > the original address, which fails. 
> > 
> Can you please provide a stand-alone example which demonstrates the issue?
> I suspect this should not have changed in HEAD.

Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-12 Thread Sushanth Rai
> 
> Then it should be fixed in r190885.
> 

Thanks. That works like a charm. 

mlockall() mostly works now. There is still an issue in wiring the stacks of a 
multithreaded program when the program uses the default stack allocation scheme. 
The thread library allocates a stack for each thread by calling mmap() with the 
address and size to be mapped. The kernel adjusts the start address to sgrowsiz 
in vm_map_stack() and maps at the adjusted address. But the subsequent wiring 
is done using the original address, which fails. 

> Could you use something less antique, please ?
> 

I would love to, but I don't have control over some of these things. Hopefully 
some of what we have seen recently will convince the higher powers to refresh 
to a newer release.

Thanks,
Sushanth




Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-10 Thread Sushanth Rai

> > I don't know if that has anything to do with failure. The snippet of
> > code that returns failure in vm_fault() is the following:
> > 
> > if (fs.pindex >= fs.object->size) {
> >       unlock_and_deallocate(&fs);
> >       return (KERN_PROTECTION_FAILURE);
> > }
> > 
> > Any help would be appreciated.
> 
> This might be a bug fixed in r191810, but I am not sure.
> 
> 

I tried that fix but it didn't work. What seems to happen is that libm is 
mmap'ed beyond the size of the file. From truss o/p, I see the following:

open("/lib/libm.so.5",O_RDONLY,030577200)= 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=918533,size=115560,blksize=4096 }) = 0 (0x0)
read(3,"\^?ELF\^B\^A\^A\t\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
mmap(0x0,1155072,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_NOCORE,3,0x0) = 
34366242816 (0x800634000)

So the size of the file is 115560 but mmap() length is 1155072. The memory map 
of the file corresponding to libm as seen from running 'cat /proc//map' 
is the following:

0x800634000 0x80064c000 24 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode /lib/libm.so.5
0x80064c000 0x80064d000 1 0 0xff01d79b0a20 r-x 1 0 0x3100 COW NNC vnode /lib/libm.so.5
0x80064d000 0x80074c000 3 0 0xff002553eca8 r-x 108 54 0x0 COW NC vnode /lib/libm.so.5
0x80074c000 0x80074e000 2 0 0xff01d79f1288 rw- 1 0 0x3100 COW NNC vnode /lib/libm.so.5


When the program tries to fault in all the pages as part of the call to 
mlockall(), the following check in vm_fault() fails when trying to fault in 
0x800651000:

if (fs.pindex >= fs.object->size) {
 unlock_and_deallocate(&fs);
 return (KERN_PROTECTION_FAILURE);
}

since the object size corresponds to size of libm and fault address is one page 
beyond the object size. Is this a bug ?

Thanks,
Sushanth




Re: Startvation of realtime piority threads

2012-04-10 Thread Sushanth Rai
Thanks. I'll try to back port locally.

Sushanth

--- On Tue, 4/10/12, John Baldwin  wrote:

> From: John Baldwin 
> Subject: Re: Startvation of realtime piority threads
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Tuesday, April 10, 2012, 6:57 AM
> On Monday, April 09, 2012 4:32:24 pm Sushanth Rai wrote:
> > I'm using stock 7.2. The priorities as defined in priority.h are in
> > this range:
> > 
> > /*
> >  * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
> >  * are insignificant.  Ranges are as follows:
> >  *
> >  * Interrupt threads:           0 - 63
> >  * Top half kernel threads:     64 - 127
> >  * Realtime user threads:       128 - 159
> >  * Time sharing user threads:   160 - 223
> >  * Idle user threads:           224 - 255
> >  *
> >  * XXX If/When the specific interrupt thread and top half thread ranges
> >  * disappear, a larger range can be used for user processes.
> >  */
> > 
> > The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.
> 
> Ah, yes, PUSER is the one Pxxx not in "top half kernel threads".  You can
> patch that locally, but you may have better luck using 9.0 (or backporting
> my fixes in 9.0 back to 7 or 8).  They were too invasive to backport to
> FreeBSD 7/8, but you could still do it locally (I've used them at work on
> both 7 and 8).
> 
> -- 
> John Baldwin


mlockall() on freebsd 7.2 + amd64 returns EAGAIN

2012-04-09 Thread Sushanth Rai
Hello,

I have a simple program that links with the math library. The only thing that 
program does is to call mlockall(MCL_CURRENT | MCL_FUTURE). This call to 
mlockall fails with EAGAIN. I figured out that kernel vm_fault() is returning 
KERN_PROTECTION_FAILURE when it tries to fault-in the mmap'ed math library 
address. But I can't figure why.

The /proc//map returns the following for the process:

0x800634000 0x80064c000 24 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode /lib/libm.so.5
0x80064c000 0x80064d000 1 0 0xff016f11c5e8 r-x 1 0 0x3100 COW NNC vnode /lib/libm.so.5
0x80064d000 0x80074c000 4 0 0xff0025571510 r-x 104 52 0x1000 COW NC vnode /lib/libm.so.5

Since ntpd calls mlockall() with the same options and links with the math 
library too, I looked at the map output of ntpd. It differs slightly in the 
"resident" column (3rd column) on the 3rd line:

0x800682000 0x80069a000 8 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode /lib/libm.so.5
0x80069a000 0x80069b000 1 0 0xff0103b85870 r-x 1 0 0x3100 COW NNC vnode /lib/libm.so.5
0x80069b000 0x80079a000 0 0 0xff0025571510 r-x 100 50 0x1000 COW NC vnode /lib/libm.so.5

I don't know if that has anything to do with failure. The snippet of code that 
returns failure in vm_fault() is the following:

if (fs.pindex >= fs.object->size) {
  unlock_and_deallocate(&fs);
  return (KERN_PROTECTION_FAILURE);
}

Any help would be appreciated.

Thanks,
Sushanth



Re: Startvation of realtime piority threads

2012-04-09 Thread Sushanth Rai
I'm using stock 7.2. The priorities as defined in priority.h are in this range:

/*
 * Priorities range from 0 to 255, but differences of less then 4 (RQ_PPQ)
 * are insignificant.  Ranges are as follows:
 *
 * Interrupt threads:   0 - 63
 * Top half kernel threads: 64 - 127
 * Realtime user threads:   128 - 159
 * Time sharing user threads:   160 - 223
 * Idle user threads:   224 - 255
 *
 * XXX If/When the specific interrupt thread and top half thread ranges
 * disappear, a larger range can be used for user processes.
 */

The trouble is with vm_waitpfault(), which explicitly sleeps at PUSER.


Sushanth

--- On Mon, 4/9/12, John Baldwin  wrote:

> From: John Baldwin 
> Subject: Re: Startvation of realtime piority threads
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Monday, April 9, 2012, 11:37 AM
> On Monday, April 09, 2012 2:08:50 pm Sushanth Rai wrote:
> > I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's
> > why I thought _sleep might be the right place to do the check.
> 
> Nah, sched_sleep() is more accurate since the sleep priority can have other
> side effects.
> 
> Hmm, in stock 7.2, the rtprio range is below things like PVM, etc., so that
> shouldn't actually be buggy in that regard.  I fixed this in 9.0 and HEAD
> when I moved the rtprio range up above the kernel sleep priorities.  Are
> you using local patches to 7.2 to raise the priority of rtprio threads?
> 
> > Thanks,
> > Sushanth
> > 
> > --- On Mon, 4/9/12, John Baldwin wrote:
> > 
> > > From: John Baldwin 
> > > Subject: Re: Startvation of realtime piority threads
> > > To: "Sushanth Rai" 
> > > Cc: freebsd-hackers@freebsd.org
> > > Date: Monday, April 9, 2012, 9:17 AM
> > > On Thursday, April 05, 2012 9:08:24 pm Sushanth Rai wrote:
> > > > I understand the downside of badly written realtime app.  In my
> > > > case application runs in userspace without making much syscalls
> > > > and by all means it is a well behaved application. Yes, I can wire
> > > > memory, change the application to use mutex instead of spinlock and
> > > > those changes should help but they are still working around the
> > > > problem. I still believe kernel should not lower the realtime
> > > > priority when blocking on resources. This can lead to priority
> > > > inversion, especially since these threads run at fixed priorities
> > > > and kernel doesn't muck with them.
> > > >  
> > > > As you suggested _sleep() should not adjust the priorities for
> > > > realtime threads. 
> > > 
> > > Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already
> > > does the right thing here in HEAD.
> > > 
> > >     if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
> > >         return;
> > > 
> > > Which OS version did you see this on?
> > > 
> > > -- 
> > > John Baldwin
> > > 
> > 
> 
> -- 
> John Baldwin
>


Re: Starvation of realtime priority threads

2012-04-09 Thread Sushanth Rai
I'm on 7.2. sched_sleep() on 7.2 just records the sleep time. That's why I 
thought _sleep() might be the right place to do the check.

Thanks,
Sushanth

--- On Mon, 4/9/12, John Baldwin  wrote:

> From: John Baldwin 
> Subject: Re: Starvation of realtime priority threads
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Monday, April 9, 2012, 9:17 AM
> On Thursday, April 05, 2012 9:08:24 pm Sushanth Rai wrote:
> > I understand the downside of a badly written realtime app.  In my case
> > the application runs in userspace without making many syscalls, and by
> > all means it is a well behaved application. Yes, I can wire memory and
> > change the application to use a mutex instead of a spinlock, and those
> > changes should help, but they are still working around the problem. I
> > still believe the kernel should not lower the realtime priority when
> > blocking on resources. This can lead to priority inversion, especially
> > since these threads run at fixed priorities and the kernel doesn't muck
> > with them.
> > 
> > As you suggested, _sleep() should not adjust the priorities for realtime
> > threads.
> 
> Hmm, sched_sleep() for both SCHED_4BSD and SCHED_ULE already does the right
> thing here in HEAD.
> 
>     if (PRI_BASE(td->td_pri_class) != PRI_TIMESHARE)
>         return;
> 
> Which OS version did you see this on?
> 
> -- 
> John Baldwin
>


Re: Starvation of realtime priority threads

2012-04-05 Thread Sushanth Rai
I understand the downside of a badly written realtime app.  In my case the 
application runs in userspace without making many syscalls, and by all means 
it is a well behaved application. Yes, I can wire memory and change the 
application to use a mutex instead of a spinlock, and those changes should 
help, but they are still working around the problem. I still believe the 
kernel should not lower the realtime priority when blocking on resources. 
This can lead to priority inversion, especially since these threads run at 
fixed priorities and the kernel doesn't muck with them.
 
As you suggested, _sleep() should not adjust the priorities for realtime 
threads.
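
The rule being discussed can be sketched in standalone C. This is only a 
model, not kernel code: struct model_thread and model_sched_sleep() are 
hypothetical names, and the class constants are written down from memory of 
sys/priority.h, so treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Class constants as recalled from FreeBSD's sys/priority.h.
 * They are assumptions of this sketch; check the header before relying
 * on the exact values.
 */
#define PRI_REALTIME   2
#define PRI_TIMESHARE  3
#define PRI_IDLE       4
#define PRI_FIFO_BIT   8
#define PRI_BASE(c)    ((c) & ~PRI_FIFO_BIT)

/* Hypothetical stand-in for the fields of struct thread used here. */
struct model_thread {
	int pri_class;  /* scheduling class, possibly OR'ed with PRI_FIFO_BIT */
	int priority;   /* current priority; a lower value is more urgent */
};

/*
 * Model of the check: only timesharing threads take the sleep priority.
 * Realtime and idle-class threads keep whatever priority they had.
 */
static void
model_sched_sleep(struct model_thread *td, int sleep_pri)
{
	assert(td != NULL);
	if (PRI_BASE(td->pri_class) != PRI_TIMESHARE)
		return;
	td->priority = sleep_pri;
}
```

With this shape, a realtime thread that goes to sleep at some kernel sleep 
priority wakes up with its own priority untouched, while a timesharing 
thread still gets the sleep priority applied.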

Thanks,
Sushanth

--- On Thu, 4/5/12, John Baldwin  wrote:

> From: John Baldwin 
> Subject: Re: Starvation of realtime priority threads
> To: freebsd-hackers@freebsd.org, davi...@freebsd.org
> Date: Thursday, April 5, 2012, 9:01 AM
> On Thursday, April 05, 2012 1:07:55 am David Xu wrote:
> > On 2012/4/5 11:56, Konstantin Belousov wrote:
> > > On Wed, Apr 04, 2012 at 06:54:06PM -0700, Sushanth Rai wrote:
> > >> I have a multithreaded user space program that basically runs at
> > >> realtime priority. Synchronization between threads is done using a
> > >> spinlock. When running this program on an SMP system under heavy
> > >> memory pressure I see that the thread holding the spinlock is starved
> > >> of CPU. The CPUs are effectively consumed by other threads that are
> > >> spinning, waiting for the lock to become available.
> > >>
> > >> After instrumenting the kernel a little, I found that under memory
> > >> pressure, when the user thread holding the spinlock traps into the
> > >> kernel due to a page fault, that thread sleeps until free pages are
> > >> available. The thread sleeps at PUSER priority (within
> > >> vm_waitpfault()). When it is ready to run, it is queued at PUSER
> > >> priority even though its base priority is realtime. The sibling
> > >> threads that are spinning at realtime priority to acquire the
> > >> spinlock starve the owner of the spinlock.
> > >>
> > >> I was wondering if the sleep in vm_waitpfault() should use
> > >> MAX(td_user_pri, PUSER) instead of just PUSER. I'm running on 7.2 and
> > >> it looks like this logic is the same in the trunk.
> > > It just so happens that your program stumbles upon a single sleep point
> > > in the kernel. If for whatever reason the thread in kernel is put off CPU
> > > due to failure to acquire any resource without priority propagation,
> > > you would get the same effect. Only blockable primitives do priority
> > > propagation, that are mutexes and rwlocks, AFAIR. In other words, any
> > > sx/lockmgr/sleep points are vulnerable to the same issue.
> > This is why I suggested that POSIX realtime priority should not be
> > boosted, it should be only higher than PRI_MIN_TIMESHARE but lower than
> > any priority all msleep() callers provided.  The problem is that a
> > userland realtime thread's busy looping code can cause starvation of a
> > thread in kernel which is holding a critical resource.
> > In kernel we can avoid to write dead-loop code, but userland code is not
> > trustable.
> 
> Note that you have to be root to be rtprio, and that there is trustable
> userland code (just because you haven't used any doesn't mean it doesn't
> exist).
> 
> > If you search "Realtime thread priorities" in 2010-december within @arch
> > list, you may find the argument.
> 
> I think the bug here is that sched_sleep() should not lower the priority of
> an rtprio process.  It should arguably not raise the priority of an idprio
> process either, but sched_sleep() should probably only apply to timesharing
> threads.
> 
> All that said, userland rtprio code is going to have to be careful.  It
> should be using things like wired memory as Kostik suggested, and probably
> avoiding most system calls.  You can definitely blow your foot off quite
> easily in lots of ways with rtprio.
> 
> -- 
> John Baldwin


Starvation of realtime priority threads

2012-04-04 Thread Sushanth Rai
I have a multithreaded user space program that basically runs at realtime 
priority. Synchronization between threads is done using a spinlock. When 
running this program on an SMP system under heavy memory pressure I see that 
the thread holding the spinlock is starved of CPU. The CPUs are effectively 
consumed by other threads that are spinning, waiting for the lock to become 
available.

After instrumenting the kernel a little, I found that under memory pressure, 
when the user thread holding the spinlock traps into the kernel due to a 
page fault, that thread sleeps until free pages are available. The thread 
sleeps at PUSER priority (within vm_waitpfault()). When it is ready to run, 
it is queued at PUSER priority even though its base priority is realtime. 
The sibling threads that are spinning at realtime priority to acquire the 
spinlock starve the owner of the spinlock.

I was wondering if the sleep in vm_waitpfault() should use MAX(td_user_pri, 
PUSER) instead of just PUSER. I'm running on 7.2 and it looks like this logic 
is the same in the trunk.
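
A rough sketch of the idea, in standalone C rather than kernel code. It 
assumes that FreeBSD priorities are numerically inverted (a lower value runs 
first), so "the higher of the two priorities" is the numerically smaller 
value; MAX() above is meant in the priority sense, not the arithmetic one. 
Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Return the more urgent of two priorities. Assumption: a lower numeric
 * value means a higher priority, as in the FreeBSD scheduler.
 */
static int
model_more_urgent(int a, int b)
{
	return (a < b ? a : b);
}

/*
 * Hypothetical helper: priority to sleep at while waiting for free pages.
 * user_pri stands in for td_user_pri, puser for PUSER.
 */
static int
model_pfault_sleep_pri(int user_pri, int puser)
{
	assert(user_pri >= 0 && puser >= 0);
	return (model_more_urgent(user_pri, puser));
}
```

A realtime thread (small user_pri) would then sleep and be requeued at its 
own priority, while an ordinary timesharing thread would still sleep at 
PUSER.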

Thanks,
Sushanth


Re: Improving gcore

2012-03-25 Thread Sushanth Rai


--- On Sat, 3/24/12, Konstantin Belousov  wrote:


> No, I mentioned exactly this in paragraph you replied to.
> To actually start executing from runq, thread needs to transition
> from kernel to userspace (in other words, thread appears on runq
> due to interrupt, thus entering kernel space). On the kernel->user
> transition, the thread will be suspended in AST handler.
> 
> So, if pending AST catched usermode thread on runq, no single usermode
> instruction is executed by the thread before suspension.
> 

Got it. Basically, if the usermode thread's time slice is up, the AST handler 
(triggered by the timer interrupt) would switch out the thread when it sees 
the TDF_NEEDRESCHED flag. When the thread starts running again, userret(), 
called from the AST handler, would check for suspension.

I guess it was much ado about nothing. At least I got to understand the code a 
little better. Many thanks for that.

Sushanth




Re: Improving gcore

2012-03-24 Thread Sushanth Rai


--- On Fri, 3/23/12, Konstantin Belousov  wrote:


> > Can we safely remove them out of the runq ?
> No, since thread on runq shall be considered the same as the thread
> actually executing on CPU. It is unsafe to suspend the thread in this
> state, due to it potentially owning a kernel resource.
> 
> If the thread on runq but not on CPU is set up to return to usermode
> 'immediately' after putting back on CPU, then normal AST check would
> cause its suspend.

Threads could have been running in user space and they are on the runq because 
their time slice expired or they yielded the CPU or got preempted. These 
threads will only notice the suspension when they enter the kernel via syscall 
or trap. If we can identify that a thread got switched-out for any of these 
reasons then it's reasonable to remove it from the runq when dealing with 
suspensions.

> > approach and of course it is missing details at this point. The idea
> > again is to suspend all threads as quickly as possible.
> I do not see how this would provide any significant difference comparing
> with SIGSTOP delivery. The points where signals are checked and the points
> where suspension can be applied are essentially the same.

I tend to agree with this.

Thanks,
Sushanth




Re: Improving gcore

2012-03-23 Thread Sushanth Rai
What I mean by inconsistent is that a process with lots of threads takes a 
while before all threads are suspended. When I look at the resulting core file, 
the state of some of the shared data is not exactly what I was expecting when I 
issued the gcore command. It is quite possible that the state changed even 
before ptrace() had a chance to issue SIGSTOP. But I am looking for any 
improvement that can reasonably be done in the kernel.

As you described, suspensions are checked at safe points, and threads get 
suspended only when they reach those points. I understand and agree with the 
reasons behind asynchronous stopping. But the net effect is that threads can 
potentially run for a short duration before they suspend themselves. So, I am 
trying to figure out ways to reduce this duration as much as possible.

One thing I noticed is that in sig_suspend_threads(), we check if the threads 
are sleeping interruptibly. If so, they get suspended immediately. Otherwise, 
we set TDF_ASTPENDING, and if the thread is running on a CPU we send IPI_AST 
to that CPU. What about the target process's threads that are on the runq? It 
looks like such a thread will only notice the flag when it is at the 
user->kernel boundary. Can we safely remove them from the runq?

With respect to PT_SUSPEND, as part of the PT_ATTACH request I was thinking of 
explicitly suspending all the threads by setting TDF_DBSUSPEND instead of 
posting SIGSTOP. As each thread in the target process calls 
thread_suspend_check(), it would notice this flag and suspend itself. The 
PT_ATTACH command would then wait until all threads are suspended before 
returning to the caller. This is the general approach and of course it is 
missing details at this point. The idea again is to suspend all threads as 
quickly as possible.

I'm running on 7.2. A cursory look at the trunk version didn't show major 
changes in this area.

Thanks,
Sushanth
 

--- On Thu, 3/22/12, Konstantin Belousov  wrote:

> From: Konstantin Belousov 
> Subject: Re: Improving gcore
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@freebsd.org
> Date: Thursday, March 22, 2012, 7:01 AM
> On Wed, Mar 21, 2012 at 04:35:13PM -0700, Sushanth Rai wrote:
> > Sometimes I have trouble capturing the "correct" state of a
> > multithreaded process using gcore. That is, it looks like target
> > process might have done some work since the time command was issued
> > and the core file was generated.
> >
> > Looking at the code, gcore calls ptrace(PT_ATTACH...), which
> > internally issues SIGSTOP, and calls waitpid() to wait until the
> > process stops. So, it's quite possible that some threads that are not
> > sleeping interruptibly will continue to run until the process notices
> > the signal. Signals are only checked when a thread that is tagged to
> > handle the signal crosses the user boundary (return from syscall,
> > trap). When the thread finally handles SIGSTOP, it needs to stop all
> > threads, which is done by lighting a flag-bit in each thread. This
> > bit is checked as each thread crosses the user boundary. So, there
> > will always be some state change in the target process from the time
> > SIGSTOP is posted to the time all threads are actually stopped.
> Yes, this is how things work. There are two factors causing the asynchronous
> stopping:
> first, other CPUs may execute several threads of the process, so the
> suspension of that other threads require an IPI to be generated. IPI_AST
> handler just returns, which causes kernel->usermode transition and
> possible signal delivery and suspend check.
> 
> second, kernel never allows to suspend thread executing and blocked in
> kernel. Doing otherwise would cause deadlocks, because executing threads
> own resources that are shared with other threads.
> 
> So, the only safe points to suspend the threads is at kernel->user boundary
> or at some sleep points that are not marked as unsafe with PBDRY flag.
> On the other hand, since kernel waits for all threads to suspend before
> reporting the wait(2) event, the usermode state shall be consistent with
> itself, or rather, it shall be not worse than if the threads reach the
> stop point executing asynchronously on different CPUs.
> 
> See the check for p->p_suspcount == p->p_numthreads in the kern_wait()
> function before it decides that the found process is satisfactory
> for wait request.
> 
> >
> > I was wondering if I could improve this a bit by calling PT_SUSPEND on
> > all threads, instead of posting SIGSTOP and waiting for all threads
> > to stop. Once the core is generated, unsuspend all threads. As with

Improving gcore

2012-03-21 Thread Sushanth Rai
Sometimes I have trouble capturing the "correct" state of a multithreaded 
process using gcore. That is, it looks like the target process might have done 
some work between the time the command was issued and the time the core file 
was generated.

Looking at the code, gcore calls ptrace(PT_ATTACH...), which internally issues 
SIGSTOP, and calls waitpid() to wait until the process stops. So, it's quite 
possible that some threads that are not sleeping interruptibly will continue to 
run until the process notices the signal. Signals are only checked when a 
thread that is tagged to handle the signal crosses the user boundary (return 
from syscall, trap). When the thread finally handles SIGSTOP, it needs to stop 
all threads, which is done by lighting a flag bit in each thread. This bit is 
checked as each thread crosses the user boundary. So, there will always be some 
state change in the target process from the time SIGSTOP is posted to the time 
all threads are actually stopped.

I was wondering if I could improve this a bit by calling PT_SUSPEND on all 
threads, instead of posting SIGSTOP and waiting for all threads to stop. Once 
the core is generated, unsuspend all threads. As with SIGSTOP, individual 
threads will only notice the suspension as they cross the user boundary. But 
there is no overhead of tagging a thread to handle the signal and that thread 
doing the suspension. The idea is to try to generate a core file which reflects 
the running state of the process as closely as possible.

Does this sound reasonable ?

Thanks,
Sushanth   


Re: Generating NMI due to WDT expiry

2012-02-11 Thread Sushanth Rai

I had looked at this. It seems to be doing the opposite of what I want. That 
is, it routes a NMI as an SMI and I need SMI to trigger an NMI. Watchdog timer 
on 3100 chipset had the ability to send either an NMI or SMI when the timer 
fired for the first time. I used NMI to generate kernel panic. With 3400 no 
longer generating NMI on WDT expiry, I'm trying to figure out how I can force 
memory dump on watchdog expiry.

Sushanth
 
--- On Sat, 2/11/12, Andriy Gapon  wrote:

> From: Andriy Gapon 
> Subject: Re: Generating NMI due to WDT expiry
> To: "Sushanth Rai" 
> Cc: freebsd-hackers@FreeBSD.org
> Date: Saturday, February 11, 2012, 3:06 AM
> on 11/02/2012 00:42 Sushanth Rai said the following:
> > Basically I would like to force system panic (and take kernel dump) when
> > watchdog time expires. Assuming that timer expired due to some OS bug, kernel
> > memory dump would be very useful. I'm running freebsd 7.2 on Intel IbexPeak
> > chipset. According to specs, the watchdog timer on IbexPeak first generates
> > an SMI and then resets the CPU. Since SMI is handled within the BIOS, is
> > there a way to generate NMI from within BIOS SMI handler ? I see that kernel
> > has support to either enter the debugger or force panic upon receipt of a
> > NMI.
> > 
> > This is not necessarily a FreeBSD question, but would like to hear any
> > thoughts/pointers.
> 
> See this:
> http://www.intel.com/content/dam/doc/datasheet/5-chipset-3400-chipset-datasheet.pdf
> Search for NMI2SMI_EN.  Maybe it's what you want.
> 
> 
> -- 
> Andriy Gapon


Generating NMI due to WDT expiry

2012-02-10 Thread Sushanth Rai
Basically I would like to force a system panic (and take a kernel dump) when 
the watchdog timer expires. Assuming that the timer expired due to some OS 
bug, a kernel memory dump would be very useful. I'm running FreeBSD 7.2 on 
the Intel IbexPeak chipset. According to the specs, the watchdog timer on 
IbexPeak first generates an SMI and then resets the CPU. Since the SMI is 
handled within the BIOS, is there a way to generate an NMI from within the 
BIOS SMI handler? I see that the kernel has support to either enter the 
debugger or force a panic upon receipt of an NMI.
 
This is not necessarily a FreeBSD question, but would like to hear any 
thoughts/pointers.

Thanks,
Sushanth 


Using kevent for signalling user app from kernel

2011-12-21 Thread Sushanth Rai
Hi,

I'm planning to use kqueue/kevent mechanism to notify a user application from 
the kernel. Basically I set up a file descriptor for read event from the user 
application by calling kevent(). Now, I would like to wake-up the process from 
within the kernel. The wake-up will happen due to one of the internal events 
(not due to data being available to read). I have access to the process 
structure and vnode corresponding to file descriptor. Is there a mechanism 
available from the kernel to trigger a wakeup and clear the kernel break notify 
message ?

Thanks,
Sushanth 


IPI and I/O interrupts

2011-06-22 Thread Sushanth Rai
Hi,

I would like to understand a little bit about FreeBSD interrupt handling on 
x86.

When a CPU is processing an IPI, let's say the CPU is running the IPI_STOP 
handler, are I/O interrupts like the timer interrupt disabled? Conversely, if 
the CPU is holding a spinlock, which means it has disabled interrupts, can it 
process an IPI? My understanding is that executing the "cli" instruction 
disables the maskable interrupts. I was wondering if IPIs are part of that.

Thanks,
Sushanth