Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Patrick Thomas


Terry,

I made an initial change to the kernel of reducing maxusers from 512 to
256 - you said that 3gig is right on the border of needing extra KVA or
not, so I thought maybe this unnecessarily high maxusers might be pushing
me over the top.  However, as long as I was changing the kernel, I also
added DDB.

The bad news is, it crashed again.  The good news is, I dropped to the
debugger and got the wait channel info you wanted with `ps`.  Here are the
last four columns of ps output for the first two pages of processes
(roughly 900 procs were running at the time of the halt, so of course I
can't give you them all, especially since I am copying by hand)

3   select  c0335140  local
3   select  c0335140  trivial-rewrite
3   select  c0335140  cleanup
3   select  c0335140  smtpd
3   select  c0335140  imapd
2                     httpd
2                     httpd
3   sbwait  e5ff6a8c  httpd
3   lockf   c89b7d40  httpd
3   sbwait  e5fc8d0c  httpd
2                     httpd
3   select  c0335140  top
3   accept  e5fc9ef6  httpd
3   select  c0335140  imapd
3   select  c0335140  couriertls
3   select  c0335140  imapd
2                     couriertls
3   ttyin   c74aa630  bash
3   select  c0335140  sshd
3   select  c0335140  tt++


So there it all is.  Does this confirm your feeling that I need to
increase KVA?  Or does it show you that one of the one or two other low
probability problems is occurring?

thanks,

PT





To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Matthew Dillon

Well, it should be noted that there are two things going on with swap.
What I adjusted was the size of the swap_zone, which holds swblocks.
These structures hold the VM-SWAP block mappings for things that are
swapped out.  The swap zone eats a lot more KVA than the radix tree
holding the swap bitmaps.

The actual swap bitmaps are allocated from the M_SWAP malloc pool.  These
allocations are based on NSWAP * (largest_single_swap_area).  NSWAP
is usually 4.

Having a single 2GB swap area is therefore somewhat expensive, but still
nowhere near the size required to exhaust KVM (or even come close to
exhausting KVM).  It is just as expensive as having 4 x 2GB swap areas
due to the way the bitmaps are allocated.  The swap bitmaps eat around
2 bits per 4K block of swap so a single 2GB of swap will eat
2G/4K x 2 / 8 x NSWAP(4) = 0.5 MB of ram.  Not very much.
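Matt's arithmetic can be reproduced with a small sh function (the function name is made up for illustration). It takes his figures as given: 2 bits per 4K swap block, and NSWAP slots (assumed 4 by default), each sized for the largest single swap area.

```sh
#!/bin/sh
# Rough size of the swap bitmaps, per the formula quoted above:
#   (largest swap area / 4K) blocks * 2 bits / 8 bits-per-byte * NSWAP
# $1 = size of the largest single swap area in bytes
# $2 = NSWAP (assumed 4 if omitted)
swap_bitmap_bytes() {
	largest=$1
	nswap=${2:-4}
	blocks=$(( largest / 4096 ))
	echo $(( blocks * 2 / 8 * nswap ))
}
```

`swap_bitmap_bytes 2147483648` prints 524288 (0.5 MB), and the 256 MB swap Patrick mentions later in the thread gives 65536 bytes, consistent with Matt's point that this is not very much.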

But, getting back to the swblocks... these use a zone, SWAPMETA
(vmstat -z | less, search for SWAPMETA).  The zone reserves KVA.
A machine with 2GB of real memory will typically reserve around 10 MB
of KVA to hold swblocks.  Previously it reserved 20-40 MB of KVA which
really ate into available KVA.  It should not be a problem now but
it's very easy for you to check.  Multiply the size (160) by the
LIMIT and you will get the approximate KVA reservation being used
for the SWAPMETA zone.
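A sketch of that check, assuming the 4.x `vmstat -z` layout in which each zone line reads `NAME: size, limit, used, free, requests`; the function name and the LIMIT used in the example are hypothetical.

```sh
#!/bin/sh
# Approximate KVA reserved for the SWAPMETA zone: SIZE * LIMIT, read
# from `vmstat -z` text on stdin.  Assumes the 4.x line layout
#   SWAPMETA:   <size>,  <limit>,  <used>,  <free>,  <requests>
swapmeta_kva() {
	awk '/^SWAPMETA:/ { gsub(",", ""); printf "%d\n", $2 * $3 }'
}
```

`vmstat -z | swapmeta_kva` prints the reservation in bytes. With a hypothetical LIMIT of 65536 the result is 160 x 65536 = 10485760 bytes, i.e. 10 MB, the same order as the figure given above for a 2GB machine.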

--

Ok, history lesson over.  Going over your original posting and the ps
you just posted from ddb there is not enough information to make
any sort of diagnosis.  It doesn't look like KVA exhaustion to me,
and the ps does not show any deadlocks.  I'm not sure what is going
on.  I think some more experimentation is necessary... e.g. breaking into
DDB after it deadlocks and doing a full 'ps' (don't leave anything out
this time), and potentially getting a kernel core dump (assuming you
compiled the kernel -g and have a kernel.debug lying around that we
can gdb the core against).

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Patrick Thomas


A few items that deserve mention, and two questions:

a) this problem occurred back when the machine had 2gigs in it - I
actually (naively) added the third gig of physical ram to try to fix the
problem.

b) another machine of mine is now exhibiting the same behavior - it has
far fewer processes running (~500 vs ~1000) and it has only 2gigs of RAM.

questions:

1) How do I give you an entire `ps` output from DDB ?  Is there a way to
output it to a floppy or something ?  Or are you suggesting to copy down
by hand ~1000 lines of ps output ?

2) Any other suggestions as to what it is - if it doesn't look like KVA,
and I reduced my swap from 2gig to 256megs, and I reduced maxusers from
512 to 256 ... basically I have a perfectly healthy machine that crashes
for no reason ?

All of your help is greatly appreciated.  It's just so frustrating to have
it halt every day for no apparent reason - as you saw from the `top`
output just as it halted the other day, the load is trivial.

--PT





Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Terry Lambert

Patrick Thomas wrote:
 I made an initial change to the kernel of reducing maxusers from 512 to
 256 - you said that 3gig is right on the border of needing extra KVA or
 not, so I thought maybe this unnecessarily high maxusers might be pushing
 me over the top.  However, as long as I was changing the kernel, I also
 added DDB.
 
 The bad news is, it crashed again.  The good news is, I dropped to the
 debugger and got the wait channel info you wanted with `ps`.  Here are the
 last four columns of ps output for the first two pages of processes
 (roughly 900 procs were running at the time of the halt, so of course I
 can't give you them all, especially since I am copying by hand)
 
 3   select  c0335140  local
 3   select  c0335140  trivial-rewrite
 3   select  c0335140  cleanup
 3   select  c0335140  smtpd
 3   select  c0335140  imapd
 2                     httpd
 2                     httpd
 3   sbwait  e5ff6a8c  httpd
 3   lockf   c89b7d40  httpd
 3   sbwait  e5fc8d0c  httpd
 2                     httpd
 3   select  c0335140  top
 3   accept  e5fc9ef6  httpd
 3   select  c0335140  imapd
 3   select  c0335140  couriertls
 3   select  c0335140  imapd
 2                     couriertls
 3   ttyin   c74aa630  bash
 3   select  c0335140  sshd
 3   select  c0335140  tt++
 
 So there it all is.  Does this confirm your feeling that I need to
 increase KVA?  Or does it show you that one of the one or two other low
 probability problems is occurring?

Matt Dillon is right, that there's nothing conclusive in the information
you've posted.  However... it provides room for additional speculation.

--

The number of select waits is reasonable.  The sbwait makes
me somewhat worried.

It's obvious that you are running a large number of httpd's; the
sbwait in this case could be reasonably assumed to be waits based
on sendfile for a change in so->so_snd.sb_cc; if that's the
case, then it may be that you are simply running out of mbufs,
and are deadlocking.  This can happen if you have enough data in
the pipe that you can not receive more data (e.g. the m_pullup()
in tcp_input() could fail before other things would fail).

If this is too much assumption, you can walk the entry off the
process, and see if it's the address of the sb_cc for so_snd or
for so_rcv for the process in question.

The way to cross-check this would be to run a continuous netstat -m,
e.g.:

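#!/bin/sh
# Print a timestamp and the mbuf statistics once a second, forever;
# the last sample before the lockup is the interesting one.
while true
do
	date
	netstat -m
	sleep 1
done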

When the lockup comes, the interesting numbers are:

# netstat -m
3/64/5696 mbufs in use (current/peak/max):            -- #3
3 mbufs allocated to data
0/40/1424 mbuf clusters in use (current/peak/max)     -- #2
96 Kbytes allocated to network (2% of mb_map in use)
0 requests for memory denied                          -- #1
0 requests for memory delayed
0 calls to protocol drain routines

If there are a lot of denials, then you are out of mbuf memory
and/or mbuf clusters (sendfile tends to eat clusters for breakfast;
it's one of the reasons I dislike it immensely; the other is that
the standards for the majority of wire protocols where you'd use it
require CRLF termination, and UNIX text files have only LF termination).

The current vs. peak vs. max will tell you how close to resource
saturation you are.  The ratio of clusters to mbufs will (effectively)
tell you if you need to worry about adjusting the ratio because of
sendfile.

The lockf could (maybe) be a deadlock, but if it were, everyone
would be seeing it; it's incredibly doubtful, as long as the ps
output you indicated was at all accurate.

Basically, if you have any denials, or if the number of mbuf
clusters gets really large, then you could have a problem.
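One way to automate that reading, as a sketch: an awk filter over `netstat -m` text that reports peak usage as a percentage of max for mbufs and clusters, plus the denial count. It assumes the 4.x line layout shown above; `mbuf_summary` is a hypothetical name.

```sh
#!/bin/sh
# Summarize `netstat -m` text read on stdin (4.x layout assumed):
# peak usage as a percentage of max for mbufs and for mbuf clusters,
# followed by the number of denied requests.
mbuf_summary() {
	awk '
	/mbufs in use/ || /mbuf clusters in use/ {
		split($1, v, "/")          # current/peak/max
		kind = ($2 == "mbuf") ? "clusters" : "mbufs"
		printf "%s peak %d%% of max\n", kind, v[2] * 100 / v[3]
	}
	/requests for memory denied/ { printf "denied %d\n", $1 }
	'
}
```

Fed the numbers Patrick posts later in the thread (2576/34816 mbufs, 2254/8704 clusters, 0 denied), it reports 7% and 25% peak usage with no denials.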

It would also be interesting to see the output of:

# sysctl -a | grep tcp | grep space
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536

A standard netstat would also tell you the contents of the
Recv-Q Send-Q columns.  If they were non-zero, then you would
basically be able to tell how much memory was being consumed by
network traffic in and out.
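A sketch of that tally, assuming the usual `netstat -an` layout with Recv-Q and Send-Q in the second and third columns of each tcp/udp line (the function name is hypothetical):

```sh
#!/bin/sh
# Total bytes queued in socket buffers, from `netstat -an` text on stdin.
# Assumes Recv-Q and Send-Q are columns 2 and 3 of each tcp/udp line.
socket_queue_bytes() {
	awk '/^(tcp|udp)/ { recvq += $2; sendq += $3 }
	     END { printf "recv %d send %d\n", recvq, sendq }'
}
```

`netstat -an | socket_queue_bytes` then prints the total bytes sitting in receive and send queues at that moment.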

I guess the best way to deal with this would be to drop the size
of the send or receive queues, until it didn't consume all your
memory.  In general, the size of these queues is supposed to be
a *maximum*, not a *mean*, so the number of sockets possible,
times the maximum total of both, will often exceed the amount of
available mbuf space.
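To put rough numbers on that, the sketch below compares the worst case, sockets x (sendspace + recvspace), with the size of the mbuf cluster pool, assuming 2048-byte clusters (MCLBYTES on i386). It deliberately ignores plain mbufs and other users of mb_map, and the socket count is an input you have to estimate; the function name is hypothetical.

```sh
#!/bin/sh
# Compare worst-case socket buffer demand with the mbuf cluster pool.
# Assumes 2048-byte clusters (MCLBYTES on i386) and ignores plain mbufs.
# $1 = sockets  $2 = sendspace  $3 = recvspace  $4 = nmbclusters (max)
socket_buffer_check() {
	demand=$(( $1 * ($2 + $3) ))
	supply=$(( $4 * 2048 ))
	echo "worst case $demand bytes, cluster pool $supply bytes"
	[ "$demand" -le "$supply" ]    # non-zero status: queues can outgrow the pool
}
```

With the default spaces shown above, the 8704-cluster max from Patrick's `netstat -m`, and one socket per httpd (an assumption, using the 288 httpd's he reports), `socket_buffer_check 288 32768 65536 8704` reports 28311552 bytes of potential demand against 17825792 bytes of clusters and returns non-zero. That doesn't mean the queues are actually full, only that they are allowed to get there.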

An interesting attack that is moderately effective on FreeBSD
boxes is to send with a very large size, and not send one of
the fragments (e.g. the second one) to prevent fragment
reassembly, and therefore saturate the reassembly queue.  The
Linux UDP NFS client code does this unintentionally, but you
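could believe that someone might be doing it intentionally,
as well, which would also work against TCP.  It's doubtful that
you are being hit by a FreeBSD targeted attack, however.

-- Terry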

Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Matthew Dillon


:questions:
:
:1) How do I give you an entire `ps` output from DDB ?  Is there a way to
:output it to a floppy or something ?  Or are you suggesting to copy down
:by hand ~1000 lines of ps output ?

If you have a couple of machines you can use a null-modem cable and
make the target machine's console the serial port by adding the
following line to the target machine's /boot/loader.conf:

console=comconsole

(note: DDB will now appear on the serial port, not the main system
console).  Then, on the machine at the other end of the cable, you can
run 'tip com1' (I think).  If you don't have a com1 entry in /etc/remote
you can add one:

com1:dv=/dev/cuaa0:br#9600:pa=none:

In any case, this way the console will wind up on the serial port
and you can leave yourself tipped in with a big window and then cut
and paste when it drops into DDB and you do the ps.
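The file edits above can be scripted; a minimal sketch, assuming the FreeBSD 4.x paths from this message. `ROOT` is a hypothetical prefix so the edits can be tried against a scratch directory first, and both functions leave an existing entry alone.

```sh
#!/bin/sh
# Sketch of the serial-console setup described above (FreeBSD 4.x paths).
# ROOT is a hypothetical prefix; leave it empty to edit the live system.
ROOT=${ROOT:-}

# Target machine: put the console, and therefore DDB, on the serial port.
enable_comconsole() {
	f="$ROOT/boot/loader.conf"
	grep -q '^console=' "$f" 2>/dev/null ||
	    echo 'console="comconsole"' >> "$f"
}

# Capturing machine: give tip(1) a com1 entry for the first serial port.
add_tip_entry() {
	f="$ROOT/etc/remote"
	grep -q '^com1:' "$f" 2>/dev/null ||
	    echo 'com1:dv=/dev/cuaa0:br#9600:pa=none:' >> "$f"
}
```

For the capture itself, one option on FreeBSD is to run tip under script(1), e.g. `script /tmp/ddb-ps.txt tip com1`, so the whole session lands in a file; a terminal program with a capture log works just as well.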

The other thing you want to do is to make sure all your kernel builds
are -g builds, which you can do by adding the following line to
/usr/src/sys/i386/conf/YOURKERNEL (I'm assuming from prior messages
that you are familiar with building kernels):

makeoptions DEBUG=-g

I also recommend:

options ALT_BREAK_TO_DEBUGGER

This will produce a kernel.debug as well as a kernel binary (only
'kernel' is installed, but kernel.debug will remain sitting in the
compile dir).  ALT_BREAK_TO_DEBUGGER allows you to break into DDB
via the serial console by using CR ~ ^B (return, tilde, control-B)
from your 'tip'.

Finally, make sure the swap partition is large enough to hold main
memory so the kernel dumps core, and use the 'dumpdev' option in
/etc/rc.conf to set the dump device.  For example:

dumpdev=/dev/da0s1b
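Since Patrick reports cutting swap to 256 MB on a 3 GB machine, the "swap large enough to hold main memory" condition is worth checking explicitly. A sketch with a hypothetical function name: it takes physical memory in bytes (as `sysctl -n hw.physmem` prints it) and the swap device size in 1K blocks (check the column header of swapinfo(8) output for its units).

```sh
#!/bin/sh
# Will a full crash dump fit on the swap/dump device?
# $1 = physical memory in bytes (e.g. from `sysctl -n hw.physmem`)
# $2 = swap device size in 1K blocks
dump_fits() {
	physmem=$1
	swap=$(( $2 * 1024 ))
	if [ "$swap" -ge "$physmem" ]; then
		echo ok
	else
		echo "short by $(( (physmem - swap) / 1048576 )) MB"
		return 1
	fi
}
```

`dump_fits 3221225472 262144` (3 GB of RAM, 256 MB of swap) prints `short by 2816 MB` and returns non-zero.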

:2) Any other suggestions as to what it is - if it doesn't look like KVA,
:and I reduced my swap from 2gig to 256megs, and I reduced maxusers from
:512 to 256 ... basically I have a perfectly healthy machine that crashes
:for no reason ?
:
:All of your help is greatly appreciated.  It's just so frustrating to have
:it halt every day for no apparent reason - as you saw from the `top`
:output just as it halted the other day, the load is trivial.
:
:--PT

I don't know but hopefully a full PS will give us a better window into
the problem.

Oh yah, you can also play with different memory configurations simply
by setting a physical memory limit (<= the actual physical RAM in the box)
in /boot/loader.conf, like this:

hw.physmem=256m

-Matt





Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Terry Lambert

Patrick Thomas wrote:
 1) How do I give you an entire `ps` output from DDB ?  Is there a way to
 output it to a floppy or something ?  Or are you suggesting to copy down
 by hand ~1000 lines of ps output ?

Serial console + terminal program with capture.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-24 Thread Patrick Thomas


 It's obvious that you are running a large number of httpd's; the

Yes, we are running a lot of httpd's:

ps auxw | grep httpd | wc -l = 288

 The way to cross-check this would be to run a continuous netstat -m,
 e.g.:

Funny you should ask :)  I was already doing that.  Here is the output
from a `netstat -m` run once per minute - the machine crashed sometime in
the next 30-60 seconds after I got this output:

524/2576/34816 mbufs in use (current/peak/max):
500 mbufs allocated to data
24 mbufs allocated to packet headers
273/2254/8704 mbuf clusters in use (current/peak/max)
5152 Kbytes allocated to network (19% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines


 Basically, if you have any denials, or if the number of mbuf
 clusters gets really large, then you could have a problem.

Do you think it is reasonable that the above netstat -m output could,
within 30 or so seconds, ramp up to the bad situation you are describing ?
Because it looks fairly benign to me...


I have three questions:

1. Forgetting about my particular problem for a moment, let's say you have
to tune a machine to run 200+ httpd servers along with another 800 misc.
processes, etc.  What do you suggest setting, just to be safe (again, as a
precaution - forgetting that in reality I am trying to fix a sick machine)
So far I have only tuned:

In my kernel:   maxusers=256 (was 512, change to 256 didn't help)
options SHMMAXPGS=16384
options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
options SHMSEG=256
options SEMMNI=384
options SEMMNS=768
options SEMMNU=384
options SEMMAP=384
(all this SHM and SEM stuff is to run multiple postgres')

and at boot time:
sysctl -w jail.sysvipc_allowed=1
sysctl -w kern.ipc.shmall=65535
sysctl -w kern.ipc.shmmax=134217728
sysctl -w net.inet.tcp.syncookies=0

So anything obvious I am missing that you would tune for a 200+ http + 800
other processes machine?



2. Let's say I was being targeted by that effective attack you spoke
of...any way to immunize myself ?


3. You spoke of:

   # sysctl -a | grep tcp | grep space
   net.inet.tcp.sendspace: 32768
   net.inet.tcp.recvspace: 65536

 I guess the best way to deal with this would be to drop the size
 of the send or receive queues, until it didn't consume all your
 memory.  In general, the size of these queues is supposed to be
 a *maximum*, not a *mean*, so the number of sockets possible,
 times the maximum total of both, will often exceed the amount of
 available mbuf space.

a) are you saying to collect these sysctls regularly and
try to see their values right at the crash ?

b) where do I drop the size of the send or receive queues ?
(sysctl or kernel setting?)


thank you very much.  I will try to get a full `ps` tonight when it
crashes again :(

--PT








 An interesting attack that is moderately effective on FreeBSD
 boxes is to send with a very large size, and not send one of
 the fragments (e.g. the second one) to prevent fragment
 reassembly, and therefore saturate the reassembly queue.  The
 Linux UDP NFS client code does this unintentionally, but you
 could believe that someone might be doing it intentionally,
 as well, which would also work against TCP.  It's doubtful that
 you are being hit by a FreeBSD targeted attack, however.

 -- Terry






Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Terry Lambert

Patrick Thomas wrote:
 I think I'll just decrease my swap size from 2 gigs to 1 gig - is that a
 reasonable alternative that provides the same benefit and possible
 solution to this problem ?
 
 ...since basically 0 swap has ever been used on the machine anyway...

Not really.

The code in machdep.c allocates pmaps for swapped memory based
on the size of real memory, rather than based on available swap.

The reason it does this is that you can (effectively) add an
arbitrary amount of swap later with swapon, without the swap
devices at the time being known to the kernel at boot.  This
makes it impossible to prereserve the number of pmap pages that
will be needed for the actual amount of swap.

Matt Dillon made some autosizing changes after I complained
about this before.  My actual complaint was to implicate the
size of real memory available relative to the size of the full
address space.  The change he made attempts to autosize, and
doesn't quite mirror this policy directly.  This code is not
available in 4.5.  I believe that it was back-ported to 4.6,
but you would have to look at the CVS log on machdep.c to be
sure about this -- it may only be in -current.

The upshot of this is that having a lot of memory reserves
pmap entries at 4K per 4M of real OR virtual memory.  The
result of this is that at 4G of physical RAM, you actually
end up allocating more pmap's than 1G of memory can contain,
since the total of physical RAM plus swap over 1024 is
larger than 1G minus the amount taken by an idle kernel, not
including the page mappings.

If you have 3G of real RAM (which you do), then you are on
the borderline of running out.  When you factor in the amount
of *potential* swap that machdep.c reserves, plus tuning for
maxfiles/sockets/inpcb/tcpcb/mbufs/etc. (if any), PLUS the
RAM taken up for things associated with running over 1000
processes (as your system does), then you end up exhausting
the amount of VM space available.

As I said before, though, the only way to know for sure if
this is your real problem is to break to the debugger after
the lockup (it's *not* a crash), and check out the wait channels
for the processes that are unable to run.

If you want a tweak for 4.5 that has about a 95% probability of
masking the problem, then you need to up the KVA space.

Unfortunately, it's not really possible to tell you where
every byte of memory is going.  Also, unfortunately, the
pmap's for swappable memory are not themselves swappable
(or this would not be a problem).  Probably, pmaps for
swap and for file backing store for executables should be
allocated when they are needed, not preallocated (they can
be, if you are not out of RAM, or have RAM, but are out of
KVA space in which to create mappings) [see growkernel].

Taking out 1G of physical memory from the box might also
fix the problem without a kernel tweak, FWIW.

However, right now, you need to cause the problem, enter
the debugger, and use ps in the debugger to examine the
wait channels.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Patrick Thomas


ok.  I was just looking back at a previous comment you made:

 Amusingly enough, you might actually have *better* luck with a
 lot less swap...

and thinking that even if removing most of the swap did not _solve/mask_
the problem, at least it would be a step in the same direction as upping
KVA (even if it is not as large a step).  But if that is not the case...

...then, has anyone written a HOWTO on upping it in 4.5-RELEASE ?  You
mentioned to look back over your own old posts on the subject - before I
jump in and try it, I want to confirm that I understand correctly: I need
to set the KVA value in my kernel config _and_ edit those other two files
in the kernel source, then just recompile my kernel.

Sound like I'm on the right track ?

Terry, thanks again for your help and for all the help you regularly give
to other people pursuing items such as this on the various FreeBSD lists.

--PT







Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Terry Lambert

Patrick Thomas wrote:
 ok.  I was just looking back at a previous comment you made:
 
  Amusingly enough, you might actually have *better* luck with a
  lot less swap...

I meant reserve, not physical swap.  I can see how it could have
been confusing in context; sorry.


 and thinking that even if removing most of the swap did not _solve/mask_
 the problem, at least it would be a step in the same direction as upping
 KVA (even if it is not as large a step)  but if that is not the case...
 
 ...then, has anyone written a HOWTO on upping it in 4.5-RELEASE ?  You
 mentioned to look back over your own old posts on the subject - before I
 jump in and try it, I want to confirm that I understand correctly: I need
 to set the KVA value in my kernel config _and_ edit those other two files
 in the kernel source, then just recompile my kernel.
 
 Sound like I'm on the right track ?

Yes.  That's the way to do it for 4.5, specifically.

FreeBSD really needs an internals book.  But like I said, this
changed between 4.5 and 4.6, and everyone who's buying books
would be more interested in 5.x, and all the important things
change too fast (writing an internals book is an ~2000 hour job,
and that basically means that the important stuff can't change
for a year, or you have to track it -- which inflates it to an
~3000 hour job).  Basically, most of the important internal
interfaces need to sit still so that a book can be written, or
no book.  Even so, the selling life of the book will be
limited to the amount of time after publication that things
actually sit still.  Kirk McKusick is rumored to be writing one;
so was Wes Peters.  Alfred and I discussed a device driver book
that both of us thought needed to be written.  Etc.  But no
book, yet.

I really hesitate to put down an A-B-C set of steps, if I know
that not only is it only applicable to a couple of versions,
none of them are the current version.  8-(.

 Terry, thanks again for your help and for all the help you regularly give
 to other people pursuing items such as this on the various FreeBSD lists.

Eh, I'm noisy.  8-).

You still need to run the debugger, I think.  So far, this is
all theory.  It fits the facts, but I can think of two other
very low probability ways to cause the same symptoms.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Patrick Thomas



  jump in and try it, I want to confirm that I understand correctly: I need
  to set the KVA value in my kernel config _and_ edit those other two files
  in the kernel source, then just recompile my kernel.
 
  Sound like I'm on the right track ?

 Yes.  That's the way to do it for 4.5, specifically.

Because I am paranoid, I like to check the state of a measurement before
making a change and then after, to see that what I did did indeed induce a
change ... I have this irrational fear that sometimes I make changes like
this and nothing in fact changed, and I just don't know it :)

So, should I just look for the value of:

vm.zone_kmem_kvaspace: 179691520

to increase in size even though the physical RAM stays the same at 3gigs,
or is there some other measurement I should look at before and after the
KVA increase to ensure that it worked (and yes, I know that if it doesn't
work I probably will have an inoperable machine, but just out of
curiosity...)

thanks,

PT





Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread John Kozubik


Terry, Patrick, et al,

  What is the procedure in 4.5-RELEASE (please say just change
  KVA_PAGES=260 to KVA_PAGES=512)

(snip)

 For 4.5, you have to hack ldscript.i386 and pmap.h.  I've posted
 on how to do this before (should be in the archives).

Actually, in 4.5 you only need to set:

options KVA_PAGES=512

and recompile your kernel.  It looks like 4.5-RELEASE was the first
release version to _not_ require hacking sys/i386/include/pmap.h and
sys/conf/ldscript.i386.  As you can see by looking at a 4.5-RELEASE 
pmap.h:

#define NKPDE   (KVA_PAGES - 2) /* addressable number of page tables/pde's */
#else
#define NKPDE   (KVA_PAGES - 1) /* addressable number of page tables/pde's */

the offsets that Terry spoke of are already in place.  This is in contrast
to 4.4-RELEASE:

#define NKPDE   254 /* addressable number of page tables/pde's */
#else
#define NKPDE   255 /* addressable number of page tables/pde's */

Where everything was hard coded to match the default KVA_PAGES value.  
Further, looking at ldscript.i386 we see in 4.5-RELEASE:

  . = kernbase + 0x00100000 + SIZEOF_HEADERS;

whereas in 4.4-RELEASE and earlier, we saw:

  . = 0xc0100000 + SIZEOF_HEADERS;

Which means that in 4.4 you had to change 0xc0100000 to 0x80100000 for a
2gig KVA.  In 4.5, however, you don't have to change ldscript.i386 at all,
because it is now a relative value that takes kernbase into account.
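To make the arithmetic behind both ldscript forms concrete, here is a small C sketch. It assumes a non-PAE i386 kernel, where the page directory has 1024 entries and each KVA page (one page directory entry) maps 4 MB, so kernbase sits KVA_PAGES * 4 MB below the 4 GB mark and the kernel is linked 1 MB above it. The helper names and constants are illustrative assumptions, not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed non-PAE i386 layout: 1024 page directory entries,
 * each mapping 4 MB (1 << 22 bytes). */
#define PDES_PER_DIRECTORY 1024u
#define PDE_SHIFT          22u
/* The kernel is linked 1 MB above kernbase (the 0x00100000 in ldscript.i386). */
#define KERNEL_LOAD_OFFSET 0x00100000u

/* Start of kernel virtual address space for a given KVA_PAGES value. */
uint32_t kernbase_for(unsigned kva_pages)
{
    assert(kva_pages > 0 && kva_pages < PDES_PER_DIRECTORY);
    return (uint32_t)(PDES_PER_DIRECTORY - kva_pages) << PDE_SHIFT;
}

/* Address the relative linker script expression yields, ignoring
 * SIZEOF_HEADERS: ". = kernbase + 0x00100000 + SIZEOF_HEADERS;" */
uint32_t link_base_for(unsigned kva_pages)
{
    return kernbase_for(kva_pages) + KERNEL_LOAD_OFFSET;
}

/* Total KVA size in megabytes: one 4 MB page directory entry per KVA page. */
unsigned kva_megabytes(unsigned kva_pages)
{
    return kva_pages << (PDE_SHIFT - 20u);
}
```

Under these assumptions, KVA_PAGES=256 (1 GB) reproduces the old hard-coded 0xc0100000, and KVA_PAGES=512 (2 GB) gives 0x80100000. Because kernbase is always 4 MB aligned, adding 0x00100000 and OR-ing it in give the same result.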

-

So, if you are running 4.0 - 4.4, you need to edit ldscript.i386 and
change 0xc0100000 to 0x80100000 (for a 2gig KVA), then you need to edit
pmap.h and change the two lines I pasted above from 254 and 255 to 510 and
511, respectively.  Finally, you need to set:

options KVA_PAGES=512

in your kernel config, then recompile your kernel.
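For the 4.0 - 4.4 hand edit, the two hard-coded NKPDE values follow the rule given in this thread: one page directory entry is lost to the recursive page mapping, and an SMP kernel loses a second one to the per-CPU page. A minimal C sketch of that rule (the function name is hypothetical) reproduces the 254/255 and 510/511 pairs and matches the (KVA_PAGES - 2) / (KVA_PAGES - 1) expressions in the 4.5 pmap.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the NKPDE value pmap.h needs for a given
 * KVA_PAGES.  Every kernel reserves one page directory entry;
 * SMP kernels reserve one more for the per-CPU page. */
unsigned nkpde_for(unsigned kva_pages, bool smp)
{
    unsigned reserved = smp ? 2u : 1u;

    assert(kva_pages > reserved);
    return kva_pages - reserved;
}
```

The SMP value goes in the first #define (the one before #else) and the uniprocessor value in the second; getting either one wrong is how you end up with a kernel that simply doesn't boot.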

But, if you are running 4.5 or 4.6, from the code I pasted above, it looks
like all you have to do is set:

options KVA_PAGES=512

in your kernel config, then recompile your kernel.

-

Another explanation of this concept can be found here:

http://www.kozubik.com/docs/original_kva_increase.txt

I am posting today mainly to get a little more information stored in the
archives.

In addition, I myself have a question regarding the default settings of
4.5 and 4.6 - by looking at the NKPDE values in the 4.4-RELEASE version of
pmap.h, the values of 254 and 255 indicate that they are hard coded for a
default of KVA_PAGES=256, however 4.5 and 4.6 have a KVA_PAGES=260 setting
in LINT, which I assume is also the default ... why the increase of 4
since 4.4-RELEASE ?

-
John Kozubik - [EMAIL PROTECTED] - http://www.kozubik.com





 
 The pages are all going to be off-by-one from your calculations, for
 the recursive page mapping, or off-by-two if your kernel is an SMP
 kernel, for the per CPU page, so remember that, or you will end up
 with a kernel that simply doesn't boot.
 
 The easiest way is to look at the numbers in pmap.h, and figure out
 how they relate to 0xc0000000 (remember to OR in 0x00100000 after your
 math, to count the kernel loading at 1M).
 
 -- Terry
 
 





Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread John Kozubik


 -
 
 So, if you are running 4.0 - 4.4, you need to edit ldscript.i386 and
 change 0xc0100000 to 0x80100000 (for a 2gig KVA), then you need to edit
 pmap.h and change the two lines I pasted above from 254 and 255 to 510 and
 511, respectively.  Finally, you need to set:
 
 options KVA_PAGES=512

An addendum - skip that last step (setting options KVA_PAGES=512 in your
kernel config) for versions 4.0-4.4, as it did not yet exist as a config
option at that time.  Again, for 4.5 and 4.6, adding that line to your
kernel config is _all_ you need to do.

If you are reading this from the archives, please see my previous post in
this thread for specific details.

-
John Kozubik - [EMAIL PROTECTED] - http://www.kozubik.com





Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Terry Lambert

Patrick Thomas wrote:
 Because I am paranoid, I like to check the state of a measurement before
 making a change and then after, to see that what I did did indeed induce a
 change ... I have this irrational fear that sometimes I make changes like
 this and nothing in fact changed, and I just don't know it :)
 
 So, should I just look for the value of:
 
 vm.zone_kmem_kvaspace: 179691520
 
 to increase in size even though the physical RAM stays the same at 3gigs,
 or is there some other measurement I should look at before and after the
 KVA increase to ensure that it worked (and yes, I know that if it doesn't
 work I probably will have an inoperable machine, but just out of
 curiosity...)

Yes.

You will also see the kernel load address during the boot process,
which you can interrupt/pause until you are satisfied.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-23 Thread Terry Lambert

John Kozubik wrote:
 Terry, Patrick, et al,
  For 4.5, you have to hack ldscript.i386 and pmap.h.  I've posted
  on how to do this before (should be in the archives).
 
 Actually, in 4.5 you only need to set:
 
 options KVA_PAGES=512
 
 and recompile your kernel.  It looks like 4.5-RELEASE was the first
 release version to _not_ require hacking sys/i386/include/pmap.h and
 sys/conf/ldscript.i386.  As you can see by looking at a 4.5-RELEASE
 pmap.h:
 
 #define NKPDE   (KVA_PAGES - 2) /* addressable number of page tables/pde's */
 #else
 #define NKPDE   (KVA_PAGES - 1) /* addressable number of page tables/pde's */
 
 the offsets that Terry spoke of are already in place.  This is in contrast
 to 4.4-RELEASE:
 
 #define NKPDE   254 /* addressable number of page tables/pde's */
 #else
 #define NKPDE   255 /* addressable number of page tables/pde's */

Yes; this is pmap.h revision 1.65.2.3.

This is my bad.  The system I was using as a reference has two
kernel source trees: the first one has 1.65, and the second is a
RELENG_4, which makes it 1.65.2.3.


 Where everything was hard coded to match the default KVA_PAGES value.
 Further, looking at ldscript.i386 we see in 4.5-RELEASE:
 
   . = kernbase + 0x00100000 + SIZEOF_HEADERS;
 
 whereas in 4.4-RELEASE and earlier, we saw:
 
   . = 0xc0100000 + SIZEOF_HEADERS;
 
 Which means that in 4.4 you had to change 0xc0100000 to 0x80100000 for a
 2gig KVA.  In 4.5, however, you don't have to change ldscript.i386 at all,
 because it is now a relative value that takes kernbase into account.

Yes, this is ldscript.i386 revision 1.4.2.1.

The commit comments for ldscript.i386 are incredibly misleading as to
what the merge actually does.  The derivation of kernbase itself is
also dependent on a third change, which is not documented, either.


 So, if you are running 4.0 - 4.4, you need to edit ldscript.i386 and
 change 0xc0100000 to 0x80100000 (for a 2gig KVA), then you need to edit
 pmap.h and change the two lines I pasted above from 254 and 255 to 510 and
 511, respectively.  Finally, you need to set:
 
 options KVA_PAGES=512
 
 in your kernel config, then recompile your kernel.
 
 But, if you are running 4.5 or 4.6, from the code I pasted above, it looks
 like all you have to do is set:
 
 options KVA_PAGES=512
 
 in your kernel config, then recompile your kernel.
 
 -
 
 Another explanation of this concept can be found here:
 
 http://www.kozubik.com/docs/original_kva_increase.txt
 
 I am posting today mainly to get a little more information stored in the
 archives.
 
 In addition, I myself have a question regarding the default settings of
 4.5 and 4.6 - by looking at the NKPDE values in the 4.4-RELEASE version of
 pmap.h, the values of 254 and 255 indicate that they are hard coded for a
 default of KVA_PAGES=256, however 4.5 and 4.6 have a KVA_PAGES=260 setting
 in LINT, which I assume is also the default ... why the increase of 4
 since 4.4-RELEASE ?

I believe that this would be because of the desire for the number
of *usable* pages, since you have to subtract out the ones that are
not global to all CPUs.

The LINT value is *not* the default.  It went in in 1.954 of NOTES
(LINT is a generated file).  I don't know why Peter did this.  The
commit message says "and a test", and since the commit contains only
comments and the option itself, I guess that means that the value of
260 is the test that the commit message was referencing.

So I guess 4.5 is actually OK, but one of my local boxes is not.

My main frustration with this has always been that the information
in the Handbook has always been insufficient to actually make the
change and have it work.  I guess I'm glad that it made it into
4.5, even if it surprised me.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Nielsen

Yes, I've had the same problem. One system runs just fine with its jails,
and another crashes habitually. It has to do with a certain jail (and
services). Our systems are set up to be able to move jails between them
(great for backups and near perfect uptime), and a certain set of jails
always hangs the system in this way. I'm trying to narrow it down. Do you
get a core dump or does it just hang?

Nate

- Original Message -
From: Patrick Thomas [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 21, 2002 16:43
Subject: (jail) problem and a (possible) solution ?



 A test server of mine running a number of jails keeps locking up - but the
 odd thing about the lockup is that the userland stops, but the kernel
 keeps running

 (sockets can be opened, but the servers never respond on them, the machine
 still responds to pings, but logs show that all real activity stops)

 I just noticed today that some jails still have writable /dev/mem and
 /dev/kmem and /dev/io nodes.  I think it is plausible that some kind of
 fiddling (writing) to these nodes is causing this kind of lockup.

 

 Is this assumption reasonable, or if some jail user fiddled with their
 /dev/mem or /dev/kmem or /dev/io node would it just totally crash out the
 machine and I _wouldn't_ still be able to ping the server after it crashes
 ?

 thanks,

 PT








Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Patrick Thomas


What it does is the userland hangs, but the kernel keeps running.

When the system is crashed, I can still ping it successfully, and I can
still open sockets (like I can open a connection to a jail's httpd or sshd,
or the sshd of the underlying server itself) but nothing answers on the
sockets - they just hang open.

So everything stops running, but it is still up - still responds to
pings...syslog stops logging though, cron stops running

Two questions for you:

1) do you allow them write access to their /dev/mem, /dev/kmem, /dev/io ?

2) does this sound like what you see?  Can you still ping the crashed
server ?

I'm mostly just curious if this kind of crash (userland hung but kernel
running) is a possible outcome of someone in a jail fiddling with those
/dev nodes, or if fiddling with /dev/mem or /dev/kmem or /dev/io would just lock
the machine up hard and completely.

Terry?

--PT



On Fri, 21 Jun 2002, Nielsen wrote:

 Yes, I've had the same problem. One system runs just fine with its jails,
 and another crashes habitually. It has to do with a certain jail (and
 services). Our systems are set up to be able to move jails between them
 (great for backups and near perfect uptime), and a certain set of jails
 always hangs the system in this way. I'm trying to narrow it down. Do you
 get a core dump or does it just hang?

 Nate

 - Original Message -
 From: Patrick Thomas [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Sent: Friday, June 21, 2002 16:43
 Subject: (jail) problem and a (possible) solution ?


 
  A test server of mine running a number of jails keeps locking up - but the
  odd thing about the lockup is that the userland stops, but the kernel
  keeps running
 
  (sockets can be opened, but the servers never respond on them, the machine
  still responds to pings, but logs show that all real activity stops)
 
  I just noticed today that some jails still have writable /dev/mem and
  /dev/kmem and /dev/io nodes.  I think it is plausible that some kind of
  fiddling (writing) to these nodes is causing this kind of lockup.
 
  
 
  Is this assumption reasonable, or if some jail user fiddled with their
  /dev/mem or /dev/kmem or /dev/io node would it just totally crash out the
  machine and I _wouldn't_ still be able to ping the server after it crashes
  ?
 
  thanks,
 
  PT
 
 
 







Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Alfred Perlstein

* Patrick Thomas [EMAIL PROTECTED] [020622 01:56] wrote:
 
 What it does is the userland hangs, but the kernel keeps running.
 
...
 I'm mostly just curious if this kind of crash (userland hung but kernel
 running) is a possible outcome of someone in a jail fiddling with those
 /dev nodes, or if fiddling with /dev/mem or /dev/kmem or /dev/io would just lock
 the machine up hard and completely.
 
 Terry?

This typically means some sort of deadlock has happened; if possible,
getting a crash dump (this is detailed in the Handbook, I think)
would help.

The reason it seems like apps are responding is that the
kernel is only processing interrupts; something has hung the scheduler
or deadlocked the kernel somehow.  FYI, the kernel is not running
except when interrupted by a device.

-Alfred




Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Terry Lambert

Patrick Thomas wrote:
 What it does is the userland hangs, but the kernel keeps running.
 
 When the system is crashed, I can still ping it successfully, and I can
 still open sockets (like I can open a connection to a jail's httpd or sshd,
 or the sshd of the underlying server itself) but nothing answers on the
 sockets - they just hang open.
 
 So everything stops running, but it is still up - still responds to
 pings...syslog stops logging though, cron stops running
 
 Two questions for you:
 
 1) do you allow them write access to their /dev/mem, /dev/kmem, /dev/io ?
 
 2) does this sound like what you see?  Can you still ping the crashed
 server ?
 
 I'm mostly just curious if this kind of crash (userland hung but kernel
 running) is a possible outcome of someone in a jail fiddling with those
 /dev nodes, or if fiddling with /dev/mem or /dev/kmem or /dev/io would just lock
 the machine up hard and completely.
 
 Terry?

I've kept quiet so far because I'm not the jail expert; Poul
actually wrote the jail code, and there was someone else who
understood it enough to recently add multiple IP support.

Given your symptoms, I can pretty much guess where the problem
is, but not really how to fix it, other than trial-and-error,
since I tend to run jails on a number of my machines, and make
them do things they aren't supposed to do...

Knowing what version of FreeBSD you are running would be helpful.

That you can still ping indicates that both hardware interrupts
and NETISR are running.  That NETISR runs indicates that things
are still calling splx(), which means things are still calling
spl*() and coming back from it.

The fact that you can still connect to servers that have active
listens posted, but that you get no data is also indicative that
the NETISR is running, at least up to the accept.

It would be interesting to attempt a large number of connections,
to see if the connections stop being accepted after you've tried
more times than you set in listen(2) as the queue depth for the
number of sockets allowed to sit there pending accept.  If this
happens (connection attempts start hanging, rather than being
accepted), you know for certain that the process you are trying
to talk to is not being scheduled to run.
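The connection test above can be scripted. The following is a minimal C sketch, assuming a POSIX sockets environment; count_completed_connects() is a hypothetical helper, not an existing tool. It opens connections one at a time without sending any data, keeps every socket open, and counts how many finish the TCP handshake within a per-attempt timeout. If the server process is not being scheduled to call accept(), the count should level off somewhere near the listen(2) backlog while later attempts stall, although the exact cutoff depends on the TCP stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_PROBE_ATTEMPTS 256

/*
 * Hypothetical probe: open up to `attempts` TCP connections to ip:port,
 * one after another, never sending data and keeping every socket open,
 * and return how many complete the handshake within timeout_ms each.
 * Returns -1 on bad arguments or if no socket could be created.
 */
int count_completed_connects(const char *ip, uint16_t port,
                             int attempts, int timeout_ms)
{
    struct sockaddr_in sa = {0};
    int fds[MAX_PROBE_ATTEMPTS];
    int opened = 0, completed = 0;

    if (attempts <= 0 || attempts > MAX_PROBE_ATTEMPTS)
        return -1;
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    for (int i = 0; i < attempts; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        fds[opened++] = fd;

        /* Non-blocking, so a stalled handshake cannot hang the probe. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0) {
            completed++;
            continue;
        }
        if (errno != EINPROGRESS)
            continue;   /* refused or failed immediately */

        /* Wait for the handshake; a timeout means it is still pending. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) == 1) {
            int err = 0;
            socklen_t len = sizeof err;
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                completed++;
        }
    }

    assert(completed <= opened);
    for (int i = 0; i < opened; i++)
        close(fds[i]);
    return opened == 0 ? -1 : completed;
}
```

Run it with more attempts than the server's backlog; a count that stops growing as you raise the number of attempts points at a process that never gets back to accept().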

Basically, this implies one of two things is happening:

1)  Your scheduler lost its head entry, so it's not
scheduling anything to run, OR

2)  You've used up all your resources on the machine
(usually memory), and all of your processes are
hung on a copy-on-write or allocate request,
pending being serviced by the kernel

If you can, compile the kernel for the box with the kernel
debugger and break-to-debugger enabled, and break to the
debugger on the console.  Then type ps and see what
you get back as the wait channel everything you are trying to
connect to is waiting on.  This should be very informative,
and it should be easy to locate the problem from there.

If you have to, you can look at the scheduler queues, if
there is anything in runnable state, and find out what's
not there.

Probably, it's not enough RAM, and your tuning parameters
are set such that this isn't fatal to processes, when it
should be.  That you are able to ping, etc. guarantees that
you are not out of mbufs, and that you can connect guarantees that
you aren't out of inpcb's or tcpcb's -- but mbufs are freelisted,
so that's to be expected there (may not need more) and the
pcb's are allocated at boot time (so are sockets, based on
maxfiles), so tuning any of them after boot can get you in
trouble.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Patrick Thomas


Terry,

Thanks for that informative email - just a quick reality check though (for
myself) - the last time this type of crash happened, I was running and
watching `top` on the machine - and when it froze, the `top` output froze
as well, and this was the last display on the screen:


last pid:  6603;  load averages:  3.81,  1.84,  1.48
1032 processes:1 running, 1026 sleeping, 5 zombie
CPU states:  1.8% user,  0.8% nice,  3.2% system,  0.1% interrupt, 94.1% idle
Mem: 1129M Active, 1404M Inact, 351M Wired, 103M Cache, 199M Buf, 28M Free
Swap: 2018M Total, 2732K Used, 2015M Free



Since all of the things you spoke of basically revolved around your
running out of memory, is it possible or reasonable to think that within
the space of 1 second, I ran through 1404 megs inactive and 28 megs free
memory ?

machine is 4.5-RELEASE with 3gigs ram.  swap never gets touched, although
there is in fact 2gigs of swap.  `pstat -s` always shows 0% used.

I'll do the debug actions you suggested.

--PT






Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Terry Lambert

Patrick Thomas wrote:
 Since all of the things you spoke of basically revolved around your
 running out of memory, is it possible or reasonable to think that within
 the space of 1 second, I ran through 1404 megs inactive and 28 megs free
 memory ?
 
 machine is 4.5-RELEASE with 3gigs ram.  swap never gets touched, although
 there is in fact 2gigs of swap.  `pstat -s` always shows 0% used.

OK, there's memory, and then there's memory.

The amount of swap you have, the fact that it's 4.5, and the
amount of RAM you have imply to me that the problem is that
you are out of pmap entries.

You should up your KVA space to 2G or maybe even 3G; the default
in 4.5 was 1G.

Basically, I now think that you don't have enough memory to map
how much memory and virtual memory you have.

Amusingly enough, you might actually have *better* luck with a
lot less swap...

If your KVA space is already enlarged above the default, then
you can ignore this and just go ahead with the debugging to see
what the wait channels for all the processes that won't run are
stuck at.

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Nielsen

 1) do you allow them write access to their /dev/mem, /dev/kmem, /dev/io ?

Actually haven't yet let anyone else inside a jail with root capabilities.
Will soon though. So, no probably not, unless there's a daemon which does
just that.

 2) does this sound like what you see?  Can you still ping the crashed
 server ?

Kernel routing still works. And yes ping too.

But come to think of it, I've seen this on other (4.5, patched pretty much up to
date) machines I use exclusively as routers. These have no jails on them. In
these cases after uptimes of let's say 2 or 3 months, the machine's daemons
stop responding and although a socket can be opened (just barely) it closes
again when the process listening on the other side doesn't pick it up.

IPSEC, firewalls, kernel routing, and all that continue to function just
fine. Like you said it's just the userland stuff that has problems.

The strange thing is, on one of my machines I was (eventually) able to log
in from the console, take the system down to single user mode and back up
and then everything worked like a charm.

Nate





Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Patrick Thomas


How do you increase KVA space these days ?  I see that in earlier releases
you had to edit /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h and
do all sorts of crazy stuff.

What is the procedure in 4.5-RELEASE (please say just change
KVA_PAGES=260 to KVA_PAGES=512)

That's what you want me to do, right ?  Is that all - can it be done just
by changing that one value in my kernel config ?

Again, thank you Terry for all your help.

--PT


On Sat, 22 Jun 2002, Terry Lambert wrote:

 Patrick Thomas wrote:
  Since all of the things you spoke of basically revolved around your
  running out of memory, is it possible or reasonable to think that within
  the space of 1 second, I ran through 1404 megs inactive and 28 megs free
  memory ?
 
  machine is 4.5-RELEASE with 3gigs ram.  swap never gets touched, although
  there is in fact 2gigs of swap.  `pstat -s` always shows 0% used.

 OK, there's memory, and then there's memory.

 The amount of swap you have, the fact that it's 4.5, and the
 amount of RAM you have imply to me that the problem is that
 you are out of pmap entries.

 You should up your KVA space to 2G or maybe even 3G; the default
 in 4.5 was 1G.

 Basically, I now think that you don't have enough memory to map
 how much memory and virtual memory you have.

 Amusingly enough, you might actually have *better* luck with a
 lot less swap...

 If your KVA space is already enlarged above the default, then
 you can ignore this and just go ahead with the debugging to see
 what the wait channels for all the processes that won't run are
 stuck at.

 -- Terry






Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Terry Lambert

Patrick Thomas wrote:
 How do you increase KVA space these days ?  I see that in earlier releases
 you had to edit /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h and
 do all sorts of crazy stuff.
 
 What is the procedure in 4.5-RELEASE (please say just change
 KVA_PAGES=260 to KVA_PAGES=512)
 
 That's what you want me to do, right ?  Is that all - can it be done just
 by changing that one value in my kernel config ?

It's what I want you to do.

For 4.5, you have to hack ldscript.i386 and pmap.h.  I've posted
on how to do this before (should be in the archives).

The pages are all going to be off-by-one from your calculations,
for the recursive page mapping, or off-by-two if your kernel is an
SMP kernel, for the per CPU page, so remember that, or you will
end up with a kernel that simply doesn't boot.

The easiest way is to look at the numbers in pmap.h, and figure
out how they relate to 0xc0000000 (remember to OR in 0x00100000
after your math, to count the kernel loading at 1M).

-- Terry




Re: (jail) problem and a (possible) solution ?

2002-06-22 Thread Patrick Thomas


I think I'll just decrease my swap size from 2 gigs to 1 gig - is that a
reasonable alternative that provides the same benefit and possible
solution to this problem ?

...since basically 0 swap has ever been used on the machine anyway...

--PT

On Sat, 22 Jun 2002, Terry Lambert wrote:

 Patrick Thomas wrote:
  How do you increase KVA space these days ?  I see that in earlier releases
  you had to edit /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h and
  do all sorts of crazy stuff.
 
  What is the procedure in 4.5-RELEASE (please say just change
  KVA_PAGES=260 to KVA_PAGES=512)
 
  That's what you want me to do, right ?  Is that all - can it be done just
  by changing that one value in my kernel config ?

 It's what I want you to do.

 For 4.5, you have to hack ldscript.i386 and pmap.h.  I've posted
 on how to do this before (should be in the archives).

 The pages are all going to be off-by-one from your calculations,
 for the recursive page mapping, or off-by-two if your kernel is an
 SMP kernel, for the per CPU page, so remember that, or you will
 end up with a kernel that simply doesn't boot.

 The easiest way is to look at the numbers in pmap.h, and figure
 out how they relate to 0xc0000000 (remember to OR in 0x00100000
 after your math, to count the kernel loading at 1M).

 -- Terry


