`dump` and/or `restore` incorrectly handles /dev files
Try this - it's good for a laugh:

    ls -asl dev/*mem
    0 crw-r-----  1 root  kmem  2, 1 Aug 27 15:16 kmem
    0 crw-r-----  1 root  kmem  2, 0 Aug 27 15:16 mem

Now run this command, changing some permissions:

    chmod -w dev/mem ; chmod -w dev/kmem

Now dump the filesystem that your /dev resides on:

    dump -0a -f /some/file /dev/ad0a

Now restore your dump file (/some/file):

    restore -x -f /some/file

(I just restored into some arbitrary directory, answered 1 for which volume to start with, and answered y to the trailing "set owner/mode for '.'" question.)

Now, once again:

    ls -asl dev/*mem
    0 crw-------  1 root  wheel  2, 1 Sep  3 01:13 kmem
    0 crw-------  1 root  wheel  2, 0 Sep  3 01:13 mem

Gee, that's funny - not only are they _not_ -w as they were changed to before dumping, but they've also lost an r- as well!

Easily reproducible. Don't respond to this thread if all you have to say is "well, you shouldn't be chmodding those files -w anyway".

--pt

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
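The three permission states in the report above can be spelled out as octal modes. This is only an illustrative sketch: the octal values are read off the ls listings in the message, and the names `describe`, `lost` and `gained` are made up for the example.

```python
import stat

def describe(mode_bits: int) -> str:
    """Render permission bits for a character device the way ls -l does."""
    return stat.filemode(stat.S_IFCHR | mode_bits)

original = 0o640                 # crw-r----- as first listed
after_chmod = original & ~0o222  # chmod -w clears the write bits
after_restore = 0o600            # crw------- as listed after the restore

for label, bits in [("original", original),
                    ("after chmod -w", after_chmod),
                    ("after restore", after_restore)]:
    print(f"{label:15} {bits:04o} {describe(bits)}")

# Bits that differ between what was dumped and what came back.
lost = after_chmod & ~after_restore    # group read
gained = after_restore & ~after_chmod  # owner write
```

Comparing `after_chmod` with `after_restore` shows both symptoms at once: the owner write bit came back and the group read bit went away.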
Re: setting quotas _inside_ a jail for users _inside_ a jail
No, sorry, I think that I was misunderstood - here is my situation:

- I have a host machine with no users - just root.
- on that host machine I have a vn-backed FS 500 megs in size
- on that vn-backed FS, I run a jail
- and no other jails share that vn-backed FS (although other jails may share the underlying actual disk FS that the vn is on...)

Now, I die in a car accident and nobody ever logs into the host system again or touches anything on the _host system_.

Can the root user of the _jail running on the host system_ set up quotas for her users ? Let's assume the root user and all her other users don't even know it is a jail - as far as they are concerned, it's just their FreeBSD machine.

So the question is, can this root user set up quotas ? And if so, some hints on exactly what needs to go into /etc/fstab _inside their jail_, since specifying anything in there seems to have the side effects of:

a) not working as expected
b) causing the jail not to be startable.

thanks,

PT

On Sun, 1 Sep 2002, Robert Watson wrote:

> On Fri, 30 Aug 2002, Patrick Thomas wrote:
>
> > I realize the difficulties in trying to use quotas on the _host_ system
> > to limit the size of jails on the host system - userid mapping, etc.
> > This is not what I am asking. I wonder, is it possible for the root user
> > of a jail to set quotas _inside_ her jail for users _inside_ her jail ?
> > Can anyone simply confirm or deny that this is possible ? Simply
> > following normal protocol does not work, because if you place filesystem
> > entries into /etc/fstab inside the jail, the jail will no longer start,
> > as it does not have permission to mount or otherwise manipulate those
> > filesystems.
>
> Other than the access control checks in the quota code being influenced
> by the jail, there really is no relationship between jails and quotas.
> Jails are solely a property of processes and other credential-bearing
> kernel objects. Persistent and transient quota information is stored
> relative to uids and gids, and quotas are enforced based on those
> elements of the process credential, and are not impacted by the jail
> field. This means that if a file system is shared by two jails, and a
> particular uid is in use in both jails, both sets of processes will be
> impacted by the same quota.
>
> Privileged users can perform quota management calls on any file system
> they can name via a visible file object. If quota management calls were
> permitted from jail, they could likewise be performed on any file system
> visible in the jail. If only appropriate file systems are visible from
> the jail, you could add PRISON_ROOT to the flags field of the relevant
> suser call. If you expose file systems to the jail that you don't want
> the root user in the jail to set quotas on, you may be out of luck.
>
> I take it from your description that you're interested in imposing
> quotas on the users in the jail, not quotas on the jail itself?
>
> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> [EMAIL PROTECTED] Network Associates Laboratories
setting quotas _inside_ a jail for users _inside_ a jail
Hello,

I realize the difficulties in trying to use quotas on the _host_ system to limit the size of jails on the host system - userid mapping, etc. This is not what I am asking.

I wonder, is it possible for the root user of a jail to set quotas _inside_ her jail for users _inside_ her jail ? Can anyone simply confirm or deny that this is possible ?

Simply following normal protocol does not work, because if you place filesystem entries into /etc/fstab inside the jail, the jail will no longer start, as it does not have permission to mount or otherwise manipulate those filesystems.

Comments ? Thoughts ? Confirmations or denials ?

thanks!
Re: top shows all zeroes.
Ok, this seems to have died down a bit, and my own urgency has passed since it is no longer manifesting itself on my test machine. However, two things come to mind:

1. Is it possible that arbitrary top output is now suspect on machines that have manifested this behavior ? I am not showing all zeros anymore, but who is to say that what I am seeing is correct ? My vmstat -i now yields:

    rtc irq8    29272122    66

and I am seeing a rate of 128 on normal systems. So maybe my top output is still wrong, even though it isn't all zeros.

2. What is to be done ? I have no reason to believe this won't crop up on 4.6.2 or later... does anyone else ?

thanks.

pat.
Re: top shows all zeroes.
ok:

    # vmstat -i
    interrupt       total     rate
    ata0 irq14         23        0
    ahc0 irq10         15        0
    aac0 irq2     6330470       30
    fxp0 irq5    17556113       83
    fdc0 irq6           4        0
    sio0 irq4           8        0
    sio1 irq3           8        0
    clk irq0     21008332       99
    rtc irq8       264460        1
    Total        45159433      214

Now, when I repeat vmstat -i, all of these numbers (or rather, all of the large numbers) increase _except_ for `rtc irq8`.

So is this just a simple broken clock on the system, as in, my hardware clock is physically broken/breaking ? dmesg says nothing about irq8, so I assume there is no conflict.

Further, regarding the APM conjecture, this is a server and (although I may be mistaken) does not have APM in the BIOS at all - I have also removed it from the kernel. dmesg tends to confirm the absence of APM.

--bpat

On Mon, 26 Aug 2002, David Malone wrote:

> On Sun, Aug 25, 2002 at 04:49:23PM -0700, Patrick Thomas wrote:
> > Also, just to add a bit more info, sometimes instead of rebooting to
> > solve the problem, the problem doesn't exist, and rebooting causes it
> > to manifest. So it seems fairly random.
>
> Can you watch vmstat -i before and after the problem occurs? I'm
> guessing that one of the interrupt counts will stop increasing.
>
> David.
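The before/after comparison David suggests can be sketched as a small helper that flags busy interrupt sources whose totals stopped rising. The first snapshot below is taken from the vmstat -i output in this message; the second snapshot, the function name `stalled_counters` and the `min_total` cutoff are all hypothetical, chosen only to illustrate the check.

```python
def stalled_counters(before: dict, after: dict, min_total: int = 1000) -> list:
    """Return busy interrupt sources whose total did not increase.

    Sources with fewer than min_total interrupts are ignored, since
    devices such as fdc0 or sio0 are legitimately idle.
    """
    return [name for name, total in before.items()
            if total >= min_total and after.get(name, total) <= total]

# Totals from the vmstat -i output quoted in the message.
first = {
    "ata0 irq14": 23, "ahc0 irq10": 15, "aac0 irq2": 6330470,
    "fxp0 irq5": 17556113, "fdc0 irq6": 4, "sio0 irq4": 8,
    "sio1 irq3": 8, "clk irq0": 21008332, "rtc irq8": 264460,
}

# Hypothetical second sample: the large counters move, rtc does not.
second = dict(first)
second.update({"aac0 irq2": 6330900, "fxp0 irq5": 17557300,
               "clk irq0": 21009700})

print(stalled_counters(first, second))
```

With these made-up numbers only `rtc irq8` is reported, which matches the behavior described in the message.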
Re: top shows all zeroes.
ok, after 2+ days, for no discernible reason I now have real top stats back. This has occurred within the last 20 minutes, and I have done nothing at all on the system save normal operation. vmstat -i now tells me:

    # vmstat -i
    ...
    rtc irq8    479105    2
    ...

The 479105 number is steadily rising ... and now, about 30 mins later I am at:

    rtc irq8    938264    4

--pt

On Mon, 26 Aug 2002, Lars Eggert wrote:

> Patrick Thomas wrote:
> > Now, when I repeat vmstat -i, all of these numbers (or rather, all of
> > the large numbers) increase _except_ for `rtc irq8`.
>
>     interrupt      total    rate
>     mux irq11       4851      12
>     ata0 irq14     94219     240
>     atkbd0 irq1      399       1
>     fdc0 irq6          2       0
>     ppc0 irq7          1       0
>     clk irq0       39123     100
>     Total         138595     354
>
> Large ones increasing, too, but I don't seem to have rtc.
>
> > Further, regarding the APM conjecture, this is a server and (although
> > I may be mistaken) does not have APM in the bios at all - I have also
> > removed it from the kernel. dmesg tends to confirm the absence of APM.
>
> Mine's a laptop with APM enabled (BIOS + kernel).
>
> Lars
> --
> Lars Eggert [EMAIL PROTECTED] USC Information Sciences Institute
Re: top shows all zeroes.
I will note that my system is a dual processor system, no APM hardware in it, and I have an identical machine running a kernel built from an identical kernel configuration file running an identical FreeBSD system that has _never_ had the problem.

On Mon, 26 Aug 2002, Bruce M Simpson wrote:

> On Mon, Aug 26, 2002 at 11:02:50AM -0700, Peter Wemm wrote:
> > This has happened before. For some reason, the RTC stops sending the
> > 128Hz statclock (statistics clock) interrupts. One way to unwedge that
> > in the past was to break into ddb and do a 'show rtc' command.. but
> > that is hardly a solution. I thought we had solved this problem. APM
> > however is a known culprit for causing badness here.
>
> I should add that my Vaio has APM compiled into the kernel. I've also
> done the vmstat -i inspection briefly, all interrupt counters seem to be
> incrementing as normal.
>
> This problem may have cropped up after a set of suspend/resume
> sequences; right now I've had 3 warm reboots since yesterday (the laptop
> has been plugged in and unmoved), the problem has not yet manifested
> itself, but when I last noticed it, I had been suspending and resuming
> between leaving home and work.
>
> I realize this is purely anecdotal but I'll continue to observe for the
> problem re-emerging.
>
> BMS
Re: top shows all zeroes.
No, world and kernel out of sync is _not_ the problem in my case - I made 4.6.1-RC2 diskettes and did an ftp installation - so there was no upgrading involved.

Further, this is an intermittent problem - sometimes it happens, sometimes it doesn't.

I think some people have reported it on non-RC2 4.6-RELEASE.

--pt

On Sat, 24 Aug 2002, Brian T. Schellenberger wrote:

> On Saturday 24 August 2002 12:00 pm, Patrick Thomas wrote:
> | And more importantly, does anyone know _why_ it is happening and what
> | it means for a system affected ?
>
> It usually means that the kernel and the world are out of sync.
>
> How did you update to 4.6.1-RC2?
>
> |
> | On Sat, 24 Aug 2002, Bruce M Simpson wrote:
> | On Sat, Aug 24, 2002 at 12:23:45AM -0700, Patrick Thomas wrote:
> | I have seen this twice on 4.6.1-RC2:
> |
> | [..]
> |
> | CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0%
> | interrupt, 0.0% idle
> |
> | [..]
> |
> | This is happening on my Vaio also; has anyone filed a PR?
> |
> | FreeBSD triage.dollah.com 4.6-STABLE FreeBSD 4.6-STABLE #0: Tue Aug
> | 20 13:00:06 BST 2002
> | [EMAIL PROTECTED]:/usr/src/sys/compile/TRIAGE i386
> |
> | BMS
>
> --
> Brian, the man from Babble-On . . . . [EMAIL PROTECTED] (personal)
Re: top shows all zeroes.
> It's usually gone after a reboot. Haven't debugged it further since I
> saw no other problems.

Yes, but other times it is not manifesting, and it _starts_ after a reboot.

Also, concerning solving the problem with a reboot, although my system is merely a test machine, I am fairly certain that a considerable number of people are using FreeBSD release versions as important and mission critical pieces of their businesses... so this would not be an option for some folks (rebooting frequently to solve).

_I_ understand that nobody but actual FreeBSD developers has any business relying on it for anything critical, but I am not sure that has been made public successfully. That is, I think a fair number of people are taken by surprise by things like this.

Just my two cents.

--ptat
Re: top shows all zeroes.
> Well, the actual *release* versions *are* supposed to be reliable for
> mission-critical applications. The purpose of the RC and STABLE versions
> being to find problems so that they don't make it to the release
> versions.

A lofty goal, indeed. However it has been pointed out that this problem manifests itself in plain old 4.6-RELEASE.

Also, just to add a bit more info, sometimes instead of rebooting to solve the problem, the problem doesn't exist, and rebooting causes it to manifest. So it seems fairly random.
Re: possible to expand a file for vn-device FS usage ?
Thank you for the very clear explanation.

Does there exist a utility to immediately take a partition that has been growfs'd and fix it so that it does not experience this performance penalty ?

That is, I am willing to sit and wait 10 minutes while some utility rearranges and reorganizes the unmounted filesystem if it means I don't have to dump/restore/blah/blah and if it allows me to avoid the performance penalty you mentioned...

thanks,

PT
possible to expand a file for vn-device FS usage ?
I have a 500meg file that I dd'd and have mounted as a vn-device filesystem. I would like to increase this to 1gig, however it is very time consuming to do a dump of the FS to a file, dd a new larger one, then do a restore (I have many special files in the FS, thus the need for dump).

Is there a procedure wherein I can just unmount the file, expand it, then remount it ? I realize some trickery is needed as the newfs originally done on the file will then be wrong, etc. - possibly disklabel as well - but I am willing to run a new disklabel and/or newfs command on the file in addition to expanding it.

Any suggestions on how to expand that file without doing the dump/restore steps ?

thanks,

PT
Re: possible to expand a file for vn-device FS usage ?
What is the negative effect of this fragmentation, and does it mean I won't be able to use all of the space that I added ?

On Thu, 15 Aug 2002, Terry Lambert wrote:

> Daniel O'Connor wrote:
> > On Thu, 2002-08-15 at 17:04, Patrick Thomas wrote:
> > > Any suggestions on how to expand that file without doing the
> > > dump/restore steps ?
> >
> > man 8 growfs perchance? :)
>
> You can unmount it, grow the underlying file with:
>
>     dd if=/dev/zero bs=XXX count=XXX >> filename
>
> and *THEN* use growfs(8) on it.
>
> Doing this will leave the allocation layout in the same state that it is
> at present, so the bottom half of the FS will end up fragmented, even
> though there is free space at the top (FS growing does not equally
> redistribute the FS content into the newly enlarged space).
>
> The best approach is the same as it would be for a device: dump and
> restore the FS from the old image to the new. In the vn device case,
> you could just create a new empty FS of the necessary size, and dump
> from the old piped to a restore of the new.
>
> If you can live with the internal fragmentation, use growfs(8); if you
> can't, use dump/restore. IMO, you will have less potential for future
> problems if you use dump/restore.
>
> -- Terry
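The "grow the underlying file" step Terry describes amounts to appending zero bytes to the image. Below is a minimal sketch of that one step only; the function name `grow_image` is made up for illustration, the file must be unmounted and unconfigured first, and growfs(8) still has to be run afterwards as described above.

```python
import os

def grow_image(path: str, new_size: int) -> int:
    """Extend a filesystem image file to new_size bytes by appending zeros.

    Returns the number of bytes added. Existing contents are left untouched.
    """
    old_size = os.path.getsize(path)
    if new_size <= old_size:
        raise ValueError("new size must be larger than the current size")
    chunk = b"\0" * (1 << 20)          # write in 1 MB pieces
    remaining = new_size - old_size
    with open(path, "ab") as image:    # append mode: never overwrites data
        while remaining > 0:
            piece = min(remaining, len(chunk))
            image.write(chunk[:piece])
            remaining -= piece
    return new_size - old_size
```

For the 500 MB to 1 GB case in this thread, a call such as `grow_image("/path/to/image", 1 << 30)` would add the missing space; the path here is a placeholder.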
status of UDP in jail bug ?
There is (was?) a problem with jail that, among other things, made it impossible for an ircd server to perform reverse lookups for clients.

In the news archives, there were complaints about this, and after a not so good patch, eventually a good patch was posted by:

    From: Lamont Granquist ([EMAIL PROTECTED])
    Subject: UDP jail bug patch (was Re: (PATCH) Re: jail bug with ircd-hybrid

I have two questions:

1. Does anyone know which versions of FreeBSD this patch will work on ?

2. Do I need this patch anymore on 4.6 and above ?
resolver workaround conceptually possible ?
I am under the impression that at this time there is no workaround for the resolver problem - you are forced to reinstall or upgrade.

I am curious though, is it at least conceptually possible that there could be a workaround ? If so, what would it entail ?

thanks - pt
Re: resolver workaround conceptually possible ?
> Assuming that bind9 has been fixed, you could use bind9 for your local
> resolver and it will filter anything nasty out as a side effect of the
> fact that it always constructs replies, rather than caching a reply and
> forwarding the reply as-is to the resolver client (as bind8 does).

Thank you very much. I would like to clarify two things.

First, can I fix bind9 by simply grabbing the tarball and running configure; make; make install ... or do I have to change libraries on the system itself and otherwise rearrange things in order for bind9 to compile fixed ? That is, do I just update bind9 as normal ?

Second, again, just to clarify, is this a full fix ? Once someone does this they can rest easy ?

thanks a lot!
Should I be concerned ?
I saw this show up all over my ssh session into a server today:

NOTICE: --Relation pg_toast_16386--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_toast_16386_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Analyzing pg_relcheck
NOTICE: --Relation pg_rewrite--
NOTICE: Pages 4: Changed 0, reaped 0, Empty 0, New 0; Tup 23: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 104, MaxLen 1456; Re-using: Free/Avail. Space 8496/8496; EndEmpty/Avail. Pages 0/4. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_rewrite_oid_index: Pages 2; Tuples 23. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_rewrite_rulename_index: Pages 2; Tuples 23. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Rel pg_rewrite: Pages: 4 --> 4; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: --Relation pg_toast_16410--
NOTICE: Pages 2: Changed 0, reaped 0, Empty 0, New 0; Tup 5: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 163, MaxLen 2034; Re-using: Free/Avail. Space 8088/8088; EndEmpty/Avail. Pages 0/2. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_toast_16410_idx: Pages 2; Tuples 5. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Rel pg_toast_16410: Pages: 2 --> 2; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Analyzing pg_rewrite
NOTICE: --Relation pg_statistic--
NOTICE: Pages 6: Changed 6, reaped 3, Empty 0, New 0; Tup 98: Vac 98, Keep/VTL 0/0, UnUsed 8, MinLen 80, MaxLen 668; Re-using: Free/Avail. Space 26560/26484; EndEmpty/Avail. Pages 0/4. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_statistic_relid_att_index: Pages 2; Tuples 98: Deleted 98. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Rel pg_statistic: Pages: 6 --> 3; Tuple(s) moved: 90. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_statistic_relid_att_index: Pages 2; Tuples 98: Deleted 90. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: --Relation pg_toast_16408--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
NOTICE: Index pg_toast_16408_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec elapsed 0.00 sec.
DEBUG: recycled transaction log file 00B6

Any ideas as to what this means and what I should do (if anything) about it ?

thanks,

pat
using `restore` without user input
I would like to perform a restore out of a shell script. Normally, I run restore with a command line like:

    restore -x -f /some/dump

Which works _exactly_ as I want it to, except that I am asked two questions:

    Specify next volume #:

and then at the end of the restore:

    set owner/mode for '.'? [yn]

So that is a problem, since I want to run it unattended, without requiring user input. I have discovered that this command line:

    restore -rf /some/dump

will run without user input.

My question is, is the result of this command identical to the result of the original one I was running ? I _do_ indeed wish to set owner/mode for '.' and have everything restored just as it was with my original command line - am I missing anything or losing any of my original functionality by using this new command line ? Or is it identical in result (except for the extra `restoresymtable` file it produces) to the original command I had ?

thanks,

PT
Re: tunings for many httpds...
> Incidentally, looking at the PV entry angle for a moment. Suppose you
> create a 1GB sysvshm (pageable) segment. That's 262144 pages. Mapping
> this once means you consume 262144 PV entries. At 28 bytes each, that is
> about 7.3MB of KVM.
>
> Now, fork this process 300 times. The numbers become 78643200 PV
> entries taking up about 2.2GB of PV entries that would like to fit in
> the 1G KVA space. We don't even nearly have a way to fit all this in.
>
> This is the killer reason for SHM_PHYS stuff. It avoids the PV load
> which has to fit into a single confined space. The cost of the page
> table pages sucks, but at least that is spread over the VM space of 300
> processes.

Ok, I'm confused now - I understood you to originally say that SHM does not eat into KVA regardless of whether I set kern.ipc.shm_use_phys to '1' or not. This led me to conclude that setting that sysctl to 1 will probably not be the magic bullet to stop my system from inexplicably halting (my system with greatly (4x) increased SHM/SEM/etc. settings).

But now in this post ... are you saying that from the PV entry angle KVA _is_ sometimes used for SHM, when we create a pageable segment ? Or are you just providing a thought experiment and pointing out that if it _were_ done this way then XYZ bad things would occur ?

thanks,

PT
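The PV entry figures quoted above can be re-derived with a few lines of arithmetic. This sketch assumes the 4K page size that the 262144-page figure implies and takes the 28 bytes per PV entry from the quoted text; the variable names are illustrative only.

```python
PAGE_SIZE = 4096        # bytes per page (assumed; implied by 1GB = 262144 pages)
PV_ENTRY_BYTES = 28     # per-mapping cost quoted in the message

segment_bytes = 1 << 30                        # 1GB sysvshm segment
pages = segment_bytes // PAGE_SIZE             # PV entries for one mapping
one_mapping_bytes = pages * PV_ENTRY_BYTES     # KVM used by one process

processes = 300
total_entries = pages * processes              # every process maps every page
total_bytes = total_entries * PV_ENTRY_BYTES   # KVM wanted for all PV entries

print(f"pages per mapping : {pages}")
print(f"one mapping       : {one_mapping_bytes / 1e6:.1f} MB")
print(f"entries, 300 procs: {total_entries}")
print(f"all mappings      : {total_bytes / 1e9:.1f} GB")
```

The totals agree with the quoted numbers: about 7.3MB for one mapping and about 2.2GB for 300 of them, against a 1G KVA space.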
Re: (jail) problem and a (possible) solution ?
Terry,

I made an initial change to the kernel of reducing maxusers from 512 to 256 - you said that 3gig is right on the border of needing extra KVA or not, so I thought maybe this unnecessarily high maxusers might be pushing me over the top. However, as long as I was changing the kernel, I also added DDB.

The bad news is, it crashed again. The good news is, I dropped to the debugger and got the wait channel info you wanted with `ps`. Here are the last four columns of ps output for the first two pages of processes (roughly 900 procs were running at the time of the halt, so of course I can't give you them all, especially since I am copying by hand):

    3  select  c0335140  local
    3  select  c0335140  trivial-rewrite
    3  select  c0335140  cleanup
    3  select  c0335140  smtpd
    3  select  c0335140  imapd
    2                    httpd
    2                    httpd
    3  sbwait  e5ff6a8c  httpd
    3  lockf   c89b7d40  httpd
    3  sbwait  e5fc8d0c  httpd
    2                    httpd
    3  select  c0335140  top
    3  accept  e5fc9ef6  httpd
    3  select  c0335140  imapd
    3  select  c0335140  couriertls
    3  select  c0335140  imapd
    2                    couriertls
    3  ttyin   c74aa630  bash
    3  select  c0335140  sshd
    3  select  c0335140  tt++

So there it all is. Does this confirm your feeling that I need to increase KVA? Or does it show you that one of the one or two other low probability problems is occurring?

thanks,

PT

On Sun, 23 Jun 2002, Terry Lambert wrote:

> Patrick Thomas wrote:
> > I think I'll just decrease my swap size from 2 gigs to 1 gig - is that
> > a reasonable alternative that provides the same benefit and possible
> > solution to this problem ?
> >
> > ...since basically 0 swap has ever been used on the machine anyway...
>
> Not really. The code in machdep.c allocates pmaps for swapped memory
> based on the size of real memory, rather than based on available swap.
>
> The reason it does this is that you can (effectively) add an arbitrary
> amount of swap later with swapon, without the swap devices at the time
> being known to the kernel at boot. This makes it impossible to
> prereserve the number of pmap pages that will be needed for the actual
> amount of swap.
>
> Matt Dillon made some autosizing changes after I complained about this
> before. My actual complaint was to implicate the size of real memory
> available relative to the size of the full address space. The change he
> made attempts to autosize, and doesn't quite mirror this policy
> directly. This code is not available in 4.5. I believe that it was
> back-ported to 4.6, but you would have to look at the CVS log on
> machdep.c to be sure about this -- it may only be in -current.
>
> The upshot of this is that having a lot of memory reserves pmap entries
> at 4K per 4M of real OR virtual memory. The result of this is that at
> 4G of physical RAM, you actually end up allocating more pmap's than 1G
> of memory can contain, since the total of physical RAM plus swap over
> 1024 is larger than 1G minus the amount taken by an idle kernel, not
> including the page mappings.
>
> If you have 3G of real RAM (which you do), then you are on the
> borderline of running out. When you factor in the amount of *potential*
> swap that machdep.c reserves, plus tuning for
> maxfiles/sockets/inpcb/tcpcb/mbufs/etc. (if any), PLUS the RAM taken up
> for things associated with running over 1000 processes (as your system
> does), then you end up exhausting the amount of VM space available.
>
> As I said before, though, the only way to know for sure if this is your
> real problem is to break to the debugger after the lockup (it's *not* a
> crash), and check out the wait channels for the processes that are
> unable to run.
>
> If you want a tweak for 4.5 that has about a 95% probability of masking
> the problem, then you need to up the KVA space.
>
> Unfortunately, it's not really possible to tell you where every byte of
> memory is going. Also, unfortunately, the pmap's for swappable memory
> are not themselves swappable (or this would not be a problem).
>
> Probably, pmaps for swap and for file backing store for executables
> should be allocated when they are needed, not preallocated (they can be,
> if you are not out of RAM, or have RAM, but are out of KVA space in
> which to create mappings) [see growkernel].
>
> Taking out 1G of physical memory from the box might also fix the problem
> without a kernel tweak, FWIW.
>
> However, right now, you need to cause the problem, enter the debugger,
> and use ps in the debugger to examine the wait channels.
>
> -- Terry
Re: (jail) problem and a (possible) solution ?
A few items that deserve mention, and two questions:

a) this problem occurred back when the machine had 2gigs in it - I actually (naively) added the third gig of physical ram to try to fix the problem.

b) another machine of mine is now exhibiting the same behavior - it has far fewer processes running (~500 vs ~1000) and it has only 2gigs of RAM.

questions:

1) How do I give you an entire `ps` output from DDB ? Is there a way to output it to a floppy or something ? Or are you suggesting to copy down by hand ~1000 lines of ps output ?

2) Any other suggestions as to what it is - if it doesn't look like KVA, and I reduced my swap from 2gig to 256megs, and I reduced maxusers from 512 to 256 ... basically I have a perfectly healthy machine that crashes for no reason ?

All of your help is greatly appreciated. It's just so frustrating to have it halt every day for no apparent reason - as you saw from the `top` output just as it halted the other day, the load is trivial.

--PT

On Mon, 24 Jun 2002, Matthew Dillon wrote:

> Well, it should be noted that there are two things going on with swap.
> What I adjusted was the size of the swap_zone, which holds swblocks.
> These structures hold the VM->SWAP block mappings for things that are
> swapped out. The swap zone eats a lot more KVA than the radix tree
> holding the swap bitmaps.
>
> The actual swap bitmaps are allocated from the M_SWAP malloc pool.
> These allocations are based on NSWAP * (largest_single_swap_area).
> NSWAP is usually 4. Having a single 2GB swap area is therefore somewhat
> expensive, but still nowhere near the size required to exhaust KVM (or
> even come close to exhausting KVM). It is just as expensive as having
> 4 x 2GB swap areas due to the way the bitmaps are allocated.
>
> The swap bitmaps eat around 2 bits per 4K block of swap so a single 2GB
> of swap will eat 2G/4K x 2 / 8 x NSWAP(4) = 0.5 MB of ram. Not very
> much.
>
> But, getting back to the swblocks... these use a zone, SWAPMETA
> (vmstat -z | less, search for SWAPMETA). The zone reserves KVA. A
> machine with 2GB of real memory will typically reserve around 10 MB of
> KVA to hold swblocks. Previously it reserved 20-40 MB of KVA which
> really ate into available KVA. It should not be a problem now but it's
> very easy for you to check. Multiply the size (160) against the LIMIT
> and you will get the approximate KVA reservation being used for the
> SWAPMETA zone.
>
> --
>
> Ok, history lesson over. Going over your original posting and the ps
> you just posted from ddb there is not enough information to make any
> sort of diagnosis. It doesn't look like KVA exhaustion to me, and the
> ps does not show any deadlocks. I'm not sure what is going on.
>
> I think some more experimentation is necessary... e.g. breaking into DDB
> after it deadlocks and doing a full 'ps' (don't leave anything out this
> time), and potentially getting a kernel core dump (assuming you compiled
> the kernel -g and have a kernel.debug lying around that we can gdb the
> core against).
>
> -Matt
> Matthew Dillon [EMAIL PROTECTED]
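Matt's two estimates can be worked through numerically. The swap bitmap figures below come straight from the quoted text; the SWAPMETA LIMIT value is hypothetical (the real one has to be read from `vmstat -z` on the machine in question), and the names `swap_bitmap_bytes` and `swapmeta_kva_bytes` are made up for this sketch.

```python
def swap_bitmap_bytes(largest_swap_bytes: int, nswap: int = 4,
                      block_bytes: int = 4096, bits_per_block: int = 2) -> int:
    """RAM used by swap bitmaps: 2 bits per 4K block, times NSWAP."""
    blocks = largest_swap_bytes // block_bytes
    return blocks * bits_per_block // 8 * nswap

def swapmeta_kva_bytes(limit: int, size: int = 160) -> int:
    """Approximate KVA reserved for the SWAPMETA zone: size * LIMIT."""
    return size * limit

two_gb = 2 * 1024 ** 3
bitmap = swap_bitmap_bytes(two_gb)          # the 2G/4K x 2 / 8 x NSWAP(4) case
print(f"swap bitmaps: {bitmap / 2**20:.1f} MB")

example_limit = 65536                       # hypothetical LIMIT from vmstat -z
reserved = swapmeta_kva_bytes(example_limit)
print(f"SWAPMETA KVA: {reserved / 2**20:.1f} MB")
```

The bitmap result matches the 0.5 MB quoted above. With the hypothetical LIMIT of 65536 the zone reservation comes out at 10 MB, in line with the "around 10 MB" typical figure, but the real number depends on the LIMIT the system reports.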
tunings for many httpds...
As a splinter to the ongoing KVA/crash/memory discussion, I am wondering: - given a machine that will run 250+ httpds and another ~800 misc. processes, what system tunings would any of you suggest other than the ones I have done: In my kernel: maxusers=256 (was 512, change to 256 didn't help) options SHMMAXPGS=16384 options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1) options SHMSEG=256 options SEMMNI=384 options SEMMNS=768 options SEMMNU=384 options SEMMAP=384 (all this SHM and SEM stuff is to run multiple postgres') and at boot time: sysctl -w jail.sysvipc_allowed=1 sysctl -w kern.ipc.shmall=65535 sysctl -w kern.ipc.shmmax=134217728 sysctl -w net.inet.tcp.syncookies=0 Anything obvious I am missing ? Terry seems to think: quote It's obvious that you are running a large number of httpd's; the sbwait in this case could be reasonably assumed to be waits based on sendfile for a change in so-so_snd-sb_cc; if that's the case, then it may be that you are simply running out of mbufs, and are deadlocking. This can happen if you have enough data in the pipe that you can not receive more data (e.g. the m_pullup() in tcp_input() could fail before other things would fail). /quote Two things about this interested me: a) watching `top` output anytime of the day, i see several httpd processes in sbwait - granted I can only see 40 lines of processes or so in `top`, but usually at least two show sbwait. Worrisome ? b) As I showed him, the netstat -m output 30-60 seconds before the crash looks very benign: 524/2576/34816 mbufs in use (current/peak/max): 500 mbufs allocated to data 24 mbufs allocated to packet headers 273/2254/8704 mbuf clusters in use (current/peak/max) 5152 Kbytes allocated to network (19% of mb_map in use) 0 requests for memory denied 0 requests for memory delayed 0 calls to protocol drain routines Is it possible that within 30 seconds or so current mbufs would skyrocket and my percentage of mb_map in use would skyrocket and I would start to see requests for memory denied ? 
All comments appreciated.

--PT
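A small helper can make successive `netstat -m` samples easier to compare by reducing each one to a single line. This is only a sketch: it assumes the 4.x output format pasted in this message, and `mbuf_summary` is a made-up name, not an existing tool.

```shell
# Sketch: condense one `netstat -m` sample (4.x-style output) into one line.
# mbuf_summary is a hypothetical helper name.
mbuf_summary() {
    awk '
        / mbufs in use/               { split($1, m, "/") }   # current/peak/max mbufs
        / mbuf clusters in use/       { split($1, c, "/") }   # current/peak/max clusters
        /requests for memory denied/  { denied = $1 }
        /requests for memory delayed/ { delayed = $1 }
        END {
            pct = (c[3] > 0 ? int(100 * c[1] / c[3]) : 0)     # clusters in use vs. max
            printf "mbufs=%d/%d clusters=%d/%d cluster_pct=%d denied=%d delayed=%d\n",
                m[1], m[3], c[1], c[3], pct, denied, delayed
        }'
}

# The sample from this message:
mbuf_summary <<'EOF'
524/2576/34816 mbufs in use (current/peak/max):
        500 mbufs allocated to data
        24 mbufs allocated to packet headers
273/2254/8704 mbuf clusters in use (current/peak/max)
5152 Kbytes allocated to network (19% of mb_map in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
EOF
```

On the live machine the input would come from `netstat -m | mbuf_summary`, appended to a log once a minute, so that a sudden climb in the last samples before a hang would stand out.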
Re: (jail) problem and a (possible) solution ?
> It's obvious that you are running a large number of httpd's; the

Yes, we are running a lot of httpd's:

    ps auxw | grep httpd | wc -l = 288

> The way to cross-check this would be to run a continuous netstat -m, e.g.:

Funny you should ask :) I was already doing that. Here is the output from a `netstat -m` run once per minute - the machine crashed sometime in the next 30-60 seconds after I got this output:

    524/2576/34816 mbufs in use (current/peak/max):
            500 mbufs allocated to data
            24 mbufs allocated to packet headers
    273/2254/8704 mbuf clusters in use (current/peak/max)
    5152 Kbytes allocated to network (19% of mb_map in use)
    0 requests for memory denied
    0 requests for memory delayed
    0 calls to protocol drain routines

> Basically, if you have any denials, or if the number of mbuf clusters gets really large, then you could have a problem.

Do you think it is reasonable that the above netstat -m output could, within 30 or so seconds, ramp up to the bad situation you are describing? Because it looks fairly benign to me...

I have three questions:

1. Forgetting about my particular problem for a moment, let's say you have to tune a machine to run 200+ httpd servers along with another 800 misc. processes, etc. What do you suggest setting, just to be safe? (again, as a precaution - forgetting that in reality I am trying to fix a sick machine)

So far I have only tuned:

In my kernel:

    maxusers=256 (was 512, change to 256 didn't help)
    options SHMMAXPGS=16384
    options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
    options SHMSEG=256
    options SEMMNI=384
    options SEMMNS=768
    options SEMMNU=384
    options SEMMAP=384

(all this SHM and SEM stuff is to run multiple postgres')

and at boot time:

    sysctl -w jail.sysvipc_allowed=1
    sysctl -w kern.ipc.shmall=65535
    sysctl -w kern.ipc.shmmax=134217728
    sysctl -w net.inet.tcp.syncookies=0

So anything obvious I am missing that you would tune for a 200+ httpd + 800 other processes machine?

2. Let's say I was being targeted by that effective attack you spoke of... any way to immunize myself?

3. You spoke of:

> # sysctl -a | grep tcp | grep space
> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536
>
> I guess the best way to deal with this would be to drop the size of the send or receive queues, until it didn't consume all your memory. In general, the size of these queues is supposed to be a *maximum*, not a *mean*, so the number of sockets possible, times the maximum total of both, will often exceed the amount of available mbuf space.

a) are you saying to collect these sysctls regularly and try to see their values right at the crash?
b) where do I drop the size of the send or receive queues? (sysctl or kernel setting?)

thank you very much. I will try to get a full `ps` tonight when it crashes again :(

--PT

> An interesting attack that is moderately effective on FreeBSD boxes is to send with a very large size, and not send one of the fragments (e.g. the second one) to prevent fragment reassembly, and therefore saturate the reassembly queue. The Linux UDP NFS client code does this unintentionally, but you could believe that someone might be doing it intentionally, as well, which would also work against TCP. It's doubtful that you are being hit by a FreeBSD targeted attack, however.
>
> -- Terry
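The "maximum, not a mean" point can be put into numbers with the figures from this thread. The sketch below asks how many sockets, each with both queues completely full, it would take to use up the whole mbuf cluster pool. It assumes 2048-byte clusters (MCLBYTES on i386), and `sockets_to_exhaust` is a made-up helper name.

```shell
# Sketch: sockets needed to exhaust the cluster pool if every socket
# filled its send and receive queues to their maximum sizes.
# Assumption: one mbuf cluster holds 2048 bytes.
sockets_to_exhaust() {
    # $1 = max mbuf clusters, $2 = sendspace (bytes), $3 = recvspace (bytes)
    pool=$(( $1 * 2048 ))          # total bytes the cluster pool can hold
    per_socket=$(( $2 + $3 ))      # worst-case bytes queued on one socket
    echo $(( pool / per_socket ))
}

# 8704 clusters max (netstat -m), 32768 sendspace, 65536 recvspace (sysctl):
sockets_to_exhaust 8704 32768 65536
```

With these inputs the result is 181, which is fewer than the 288 httpds counted above. That worst case is unlikely to be reached in practice, but it shows why the product of socket count and queue size can exceed the available mbuf space. Both sizes appear as sysctls in the quoted output, so lowering them with `sysctl -w` would seem to be the first thing to try for question (b).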
Re: (jail) problem and a (possible) solution ?
ok. I was just looking back at a previous comment you made:

> Amusingly enough, you might actually have *better* luck with a lot less swap...

and thinking that even if removing most of the swap did not _solve/mask_ the problem, at least it would be a step in the same direction as upping KVA (even if it is not as large a step), but if that is not the case...

...then, has anyone written a HOWTO on upping it in 4.5-RELEASE? You mentioned looking back over your own old posts on the subject. Before I jump in and try it, I want to confirm what I believe I understand: I need to set the KVA value in my kernel config _and_ edit those other two files in the kernel source, then just recompile my kernel. Sound like I'm on the right track?

Terry, thanks again for your help and for all the help you regularly give to other people pursuing items such as this on the various FreeBSD lists.

--PT

On Sun, 23 Jun 2002, Terry Lambert wrote:

> Patrick Thomas wrote:
> > I think I'll just decrease my swap size from 2 gigs to 1 gig - is that a reasonable alternative that provides the same benefit and possible solution to this problem ? ...since basically 0 swap has ever been used on the machine anyway...
>
> Not really. The code in machdep.c allocated pmaps for swapped memory based on the size of real memory, rather than based on available swap. The reason it does this is that you can (effectively) add an arbitrary amount of swap later with swapon, without the swap devices at the time being known to the kernel at boot. This makes it impossible to prereserve the number of pmap pages that will be needed for the actual amount of swap.
>
> Matt Dillon made some autosizing changes after I complained about this before. My actual complaint was to implicate the size of real memory available relative to the size of the full address space. The change he made attempts to autosize, and doesn't quite mirror this policy directly. This code is not available in 4.5. I believe that it was back-ported to 4.6, but you would have to look at the CVS log on machdep.c to be sure about this -- it may only be in -current.
>
> The upshot of this is that having a lot of memory reserves pmap entries at 4K per 4M of real OR virtual memory. The result of this is that at 4G of physical RAM, you actually end up allocating more pmap's than 1G of memory can contain, since the total of physical RAM plus swap over 1024 is larger than 1G minus the amount taken by an idle kernel, not including the page mappings.
>
> If you have 3G of real RAM (which you do), then you are on the borderline of running out. When you factor in the amount of *potential* swap that machdep.c reserves, plus tuning for maxfiles/sockets/inpcb/tcpcb/mbufs/etc. (if any), PLUS the RAM taken up for things associated with running over 1000 processes (as your system does), then you end up exhausting the amount of VM space available.
>
> As I said before, though, the only way to know for sure if this is your real problem is to break to the debugger after the lockup (it's *not* a crash), and check out the wait channels for the processes that are unable to run.
>
> If you want a tweak for 4.5 that has about a 95% probability of masking the problem, then you need to up the KVA space.
>
> Unfortunately, it's not really possible to tell you where every byte of memory is going. Also, unfortunately, the pmap's for swappable memory are not themselves swappable (or this would not be a problem). Probably, pmaps for swap and for file backing store for executables should be allocated when they are needed, not preallocated (they can be, if you are not out of RAM, or have RAM, but are out of KVA space in which to create mappings) [see growkernel].
>
> Taking out 1G of physical memory from the box might also fix the problem without a kernel tweak, FWIW.
>
> However, right now, you need to cause the problem, enter the debugger, and use ps in the debugger to examine the wait channels.
>
> -- Terry
Re: inuring FreeBSD to the apache bug without upgrading apache ?
> Yeah; this whole thread is premised on working around the problem without an Apache software change. It's a reasonable premise (IMO) -- if you've got a custom compilation and a lot of modules, that can end up being a lot of software. I built a PHP4+SSL+Apache+IMAP+etc. source tree at one point, and it ended up being ~1.2 million lines of code, all told, that had to be made to work together. If you had just built it, then it would be very hard to update just one component without repeating the whole process. My advice? Use CVS.

Actually, this whole thread is premised on "I have a dev system with 16 jailed apaches and it would be a pain to upgrade all 16 of them" vs. just making one global kernel/environment change. It sounds like that is probably a pipe dream though...

--PT
Re: (jail) problem and a (possible) solution ?
> > jump in and try it, I want to confirm what I believe I understand: I need to set the KVA value in my kernel config _and_ edit those other two files in the kernel source, then just recompile my kernel. Sound like I'm on the right track ?
>
> Yes. That's the way to do it for 4.5, specifically.

Because I am paranoid, I like to check the state of a measurement before making a change and then after, to see that what I did did indeed induce a change ... I have this irrational fear that sometimes I make changes like this and nothing in fact changed, and I just don't know it :)

So, should I just look for the value of:

    vm.zone_kmem_kvaspace: 179691520

to increase in size even though the physical RAM stays the same at 3 gigs, or is there some other measurement I should look at before and after the KVA increase to ensure that it worked? (and yes, I know that if it doesn't work I probably will have an inoperable machine, but just out of curiosity...)

thanks,
PT
Re: (jail) problem and a (possible) solution ?
What it does is the userland hangs, but the kernel keeps running. When the system is crashed, I can still ping it successfully, and I can still open sockets (like I can open a connection to a jail's httpd or sshd, or the sshd of the underlying server itself) but nothing answers on the sockets - they just hang open. So everything stops running, but it is still up - still responds to pings... syslog stops logging though, and cron stops running.

Two questions for you:

1) do you allow them write access to their /dev/mem, /dev/kmem, /dev/io?
2) does this sound like what you see? Can you still ping the crashed server?

I'm mostly just curious if this kind of crash (userland hung but kernel running) is a possible outcome of someone in a jail fiddling with those /dev nodes, or if fiddling with /dev/mem or /dev/kmem or /dev/io would just lock the machine up hard and completely. Terry?

--PT

On Fri, 21 Jun 2002, Nielsen wrote:

> Yes I've had the same problem. One system runs just fine with its jails, and another crashes habitually. It has to do with a certain jail (and services). Our systems are set up to be able to move jails between them (great for backups and near perfect uptime), and a certain set of jails always hangs the system in this way. I'm trying to narrow it down. Do you get a core dump or does it just hang?
>
> Nate
>
> - Original Message -
> From: Patrick Thomas [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Friday, June 21, 2002 16:43
> Subject: (jail) problem and a (possible) solution ?
>
> > A test server of mine running a number of jails keeps locking up - but the odd thing about the lockup is that the userland stops, but the kernel keeps running (sockets can be opened, but the servers never respond on them, the machine still responds to pings, but logs show that all real activity stops)
> >
> > I just noticed today that some jails still have writable /dev/mem and /dev/kmem and /dev/io nodes. I think it is plausible that some kind of fiddling (writing) to these nodes is causing this kind of lockup.
> >
> > Is this assumption reasonable, or if some jail user fiddled with their /dev/mem or /dev/kmem or /dev/io node would it just totally crash out the machine and I _wouldn't_ still be able to ping the server after it crashes ?
> >
> > thanks,
> > PT
Re: (jail) problem and a (possible) solution ?
Terry,

Thanks for that informative email - just a quick reality check though (for myself) - the last time this type of crash happened, I was running and watching `top` on the machine - and when it froze, the `top` output froze as well, and this was the last display on the screen:

    last pid: 6603; load averages: 3.81, 1.84, 1.48
    1032 processes: 1 running, 1026 sleeping, 5 zombie
    CPU states: 1.8% user, 0.8% nice, 3.2% system, 0.1% interrupt, 94.1% idle
    Mem: 1129M Active, 1404M Inact, 351M Wired, 103M Cache, 199M Buf, 28M Free
    Swap: 2018M Total, 2732K Used, 2015M Free

Since all of the things you spoke of basically revolved around "you're running out of memory", is it possible or reasonable to think that within the space of 1 second, I ran through 1404 megs inactive and 28 megs free memory?

The machine is 4.5-RELEASE with 3 gigs of RAM. Swap never gets touched, although there is in fact 2 gigs of swap. `pstat -s` always shows 0% used.

I'll do the debug actions you suggested.

--PT
Re: (jail) problem and a (possible) solution ?
How do you increase KVA space these days? I see that in earlier releases you had to edit /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h and do all sorts of crazy stuff. What is the procedure in 4.5-RELEASE? (please say just change KVA_PAGES=260 to KVA_PAGES=512) That's what you want me to do, right? Is that all - can it be done just by changing that one value in my kernel config?

Again, thank you Terry for all your help.

--PT

On Sat, 22 Jun 2002, Terry Lambert wrote:

> Patrick Thomas wrote:
> > Since all of the things you spoke of basically revolved around you're running out of memory, is it possible or reasonable to think that within the space of 1 second, I ran through 1404 megs inactive and 28 megs free memory ? machine is 4.5-RELEASE with 3gigs ram. swap never gets touched, although there is in fact 2gigs of swap. `pstat -s` always shows 0% used.
>
> OK, there's memory, and then there's memory.
>
> The amount of swap you have, the fact that it's 4.5, and the amount of RAM you have imply to me that the problem is that you are out of pmap entries. You should up your KVA space to 2G or maybe even 3G; the default in 4.5 was 1G.
>
> Basically, I now think that you don't have enough memory to map how much memory and virtual memory you have. Amusingly enough, you might actually have *better* luck with a lot less swap...
>
> If your KVA space is already enlarged above the default, then you can ignore this and just go ahead with the debugging to see what the wait channels for all the processes that won't run are stuck at.
>
> -- Terry
Re: (jail) problem and a (possible) solution ?
I think I'll just decrease my swap size from 2 gigs to 1 gig - is that a reasonable alternative that provides the same benefit and possible solution to this problem? ...since basically 0 swap has ever been used on the machine anyway...

--PT

On Sat, 22 Jun 2002, Terry Lambert wrote:

> Patrick Thomas wrote:
> > How do you increase KVA space these days ? I see that in earlier releases you had to edit /sys/conf/ldscript.i386 and /sys/i386/include/pmap.h and do all sorts of crazy stuff. What is the procedure in 4.5-RELEASE (please say just change KVA_PAGES=260 to KVA_PAGES=512) That's what you want me to do, right ? Is that all - can it be done just by changing that one value in my kernel config ?
>
> It's what I want you to do. For 4.5, you have to hack ldscript.i386 and pmap.h. I've posted on how to do this before (should be in the archives).
>
> The pages are all going to be off-by-one from your calculations, for the recursive page mapping, or off-by-two if your kernel is an SMP kernel, for the per CPU page, so remember that, or you will end up with a kernel that simply doesn't boot.
>
> The easiest way is to look at the numbers in pmap.h, and figure out how they relate to 0xc000 (remember to OR in 0x0010 after your math, to count the kernel loading at 1M).
>
> -- Terry
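The address arithmetic Terry alludes to can be sketched as follows. This is an illustration under stated assumptions, not the actual 4.5 procedure: it assumes a non-PAE i386 where each page-directory entry maps 4 MB, a 4 GB address space with the kernel at the top, and it reads the two truncated constants in the message as the usual 0xC0000000 kernel base and 0x00100000 (1 MB) load offset. It does not model the off-by-one or off-by-two adjustments for the recursive page mapping and per-CPU page. `kva_layout` is a made-up name.

```shell
# Sketch: relate a KVA page count to the kernel base and load addresses.
# Assumptions: 4 MB mapped per page-directory entry, 4 GB address space,
# kernel image loaded 1 MB above the kernel base.
kva_layout() {
    pages=$1
    size=$(( pages * 4 * 1024 * 1024 ))   # bytes of kernel virtual address space
    base=$(( 4294967296 - size ))         # 4 GB minus the KVA size
    load=$(( base + 1048576 ))            # base plus the 1 MB load offset
    printf 'pages=%d size_mb=%d base=0x%08x load=0x%08x\n' \
        "$pages" $(( size / 1048576 )) "$base" "$load"
}

kva_layout 256   # the 1 GB default described in the thread
kva_layout 512   # the 2 GB target
```

For 256 pages this gives a base of 0xc0000000 and a load address of 0xc0100000; for 512 pages, 0x80000000 and 0x80100000. Values actually edited into ldscript.i386 and pmap.h would still need the adjustments Terry describes.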
Re: inuring FreeBSD to the apache bug without upgrading apache ?
What none of you has mentioned is the thought I had in mind when I asked this question, and that is, I have an R&D machine with 16 jails on it, each running apache. Therefore in a situation like this it would be _much_ easier to just tune a sysctl or rebuild the kernel, vs. rebuilding 16 differently configured, different versions of apache. YMMV.

--PT

On Fri, 21 Jun 2002, Frank Mayhar wrote:

> Brandon D. Valentine wrote:
> > However, I would ask Frank if there's a particular reason he needs to use Covalent Raven SSL. OpenSSL is free, works like gangbusters, and comes with FreeBSD. I have a feeling he'd be much happier with it if there's not some other reason he cannot move to it.
>
> As I mentioned, the two reasons are (1) it hasn't been broken (at least up to now) and (2) I haven't had time. These are colocated production boxes; I don't have easy physical access to them to fix things if they go seriously wrong, and having them be down for any length of time is a Bad Thing.
>
> --
> Frank Mayhar [EMAIL PROTECTED] http://www.exit.com/
> Exit Consulting http://www.gpsclock.com/
inuring FreeBSD to the apache bug without upgrading apache ?
Is it possible to patch/recompile FreeBSD 4.5 in such a way that your system is no longer vulnerable to the chunking attack, even if you are still running a vulnerable apache?

I ask because I see in one of the chunking exploits that:

    * Remote OpenBSD/Apache exploit for the chunking vulnerability. Kudos to
    * the OpenBSD developers (Theo, DugSong, jnathan, *@#!w00w00, ...) and
    * their crappy memcpy implementation that makes this 32-bit impossibility
    * very easy to accomplish.

Which leads me to believe there are structures in the OS which help this vulnerability to exist. I am _very_ interested to find out if it is possible to patch this bug at the FreeBSD OS level and not the apache level.

thanks,
PT
reboot your own jail ?
Currently I reboot jails with this process:

1. someone logs into the jail and runs `kill -KILL -1`
2. someone logs onto the BASE machine and starts it up again.

I wish I could do this without involving the admin of the base machine. Has anyone come up with a strategy for allowing the root jail user to successfully reboot their own jail without outside help?

I can think of some horrible hacks involving constantly checking if the jail is running, and if it ever stops (presumably someone rebooted it) then starting it again... hopefully there is something more elegant than that.

--pt
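For reference, the polling hack described above might look like the sketch below when run on the base machine. It is deliberately generic: the check and start commands are passed in as strings and are placeholders, since the thread does not say how to detect or start a particular jail, and `restart_if_stopped` is a made-up name.

```shell
# Sketch of the polling approach: run a check command and, if it fails,
# run a start command. Both commands are supplied by the caller.
restart_if_stopped() {
    check_cmd=$1    # exits 0 while the jail is considered running
    start_cmd=$2    # whatever starts the jail again
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "running"
    else
        echo "stopped, restarting"
        sh -c "$start_cmd"
    fi
}

# Harmless stand-ins so the sketch runs anywhere:
restart_if_stopped "true"  "echo start-jail"
restart_if_stopped "false" "echo start-jail"
```

In practice this would sit in a loop with a `sleep` between checks, which is exactly the inelegance the message complains about: the jail's root user still depends on something running outside the jail.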
Re: reboot your own jail ?
Why -TERM? The jail man page recommends -KILL ... just curious...

On Thu, 16 May 2002, Marc G. Fournier wrote:

> web interface that is password protected that does:
>
>     ssh root@jail kill -TERM -1
>     restart jail
>
> On Thu, 16 May 2002, Patrick Thomas wrote:
>
> > currently I reboot jails with this process:
> >
> > 1. someone logs into the jail and runs `kill -KILL -1`
> > 2. someone logs onto the BASE machine and starts it up again.
> >
> > I wish I could do this without involving the admin of the base machine. Has anyone come up with a strategy for allowing the root jail user to successfully reboot their own jail without outside help ?
> >
> > I can think of some horrible hacks involving constantly checking if the jail is running, and if it ever stops (presumably someone rebooted it) then starting it again... hopefully there is something more elegant than that.
> >
> > --pt
syncookies exploit behavior
Two questions regarding the syncookies issue:

1. What kind of crash is it? I have an issue where my machine has no response at the console, and none of the services work (pop, imap, etc.) HOWEVER you can still ping it, and you can still initiate connections to services - they just don't talk or respond at all - and cron jobs no longer run. Someone suggested that it looks like my userland is frozen, but my kernel is still running. Is that the kind of crash you get when you encounter the syncookies problem?

2. Is there any way to scour tcpdump on the _affected_ machine to see if syncookies was indeed your problem? This is sort of two questions - first, will the machine be crashed so fast it won't have time to write tcpdump output to a file for the packet that caused the crash? And second, if it is possible, what would that tcpdump output look like? I suspect you can't scour tcpdump for it, since this problem can be caused by legitimate traffic.

comments appreciated,
PT
Re: what causes a userland to stop, but allows kernel to continue?
No denied requests. It's not mbufs. It must be something else.

How do you feel about this:

    # vmstat -z
    ITEM            SIZE     LIMIT     USED    FREE    REQUESTS
    PIPE:            160,        0,     702,    522,     236316
    SWAPMETA:        160,   509724,     452,    136,       1125
    unpcb:            64,        0,     542,     98,    3398824
    ripcb:           192,    16424,       0,     42,          3
    syncache:        160,    15359,       0,     51,      49824
    tcpcb:           544,    16424,     353,    957,      64527
    udpcb:           192,    16424,      83,     45,     150821
    socket:          192,    16424,     979,    813,    3614256
    KNOTE:            64,        0,       1,    127,      51798
    DIRHASH:        1024,        0,    1740,    268,      36897
    NFSNODE:         352,        0,       0,      0,          0
    NFSMOUNT:        544,        0,       0,      0,          0
    VNODE:           192,        0,  124417,     27,     124417
    NAMEI:          1024,        0,       0,     24,  151244479
    VMSPACE:         192,        0,     875,    533,    3797606
    PROC:            416,        0,     881,    540,    3797656
    DP fakepg:        64,        0,       0,      0,          0
    PV ENTRY:         28,  2690954,  601601, 266301, 2806153478
    MAP ENTRY:        48,        0,   34223,   4070,  246626232
    KMAP ENTRY:       48,   128821,    3795,    514,     369055
    MAP:             108,        0,       7,      3,          7
    VM OBJECT:        96,        0,  132173,  10127,   97570617

anything interesting?

thanks.
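One way to read a table like the one above is to look only at zones that have a non-zero LIMIT and see how close USED is to it. The sketch below does that with awk; it assumes the 4.x `vmstat -z` layout shown in this message (name, colon, then comma-separated SIZE, LIMIT, USED, FREE, REQUESTS), and `zone_usage` is a made-up name.

```shell
# Sketch: for each zone with a non-zero LIMIT, print USED/LIMIT and the
# percentage used. Splits each line on ":" and ",".
zone_usage() {
    awk -F'[:,]' '
        NF >= 6 && $3 + 0 > 0 {
            name = $1
            gsub(/^ +| +$/, "", name)      # trim padding around the zone name
            printf "%s %d/%d %d%%\n", name, $4, $3, int(100 * $4 / $3)
        }'
}

# A few rows from this message:
zone_usage <<'EOF'
ITEM            SIZE     LIMIT     USED    FREE    REQUESTS
SWAPMETA:        160,   509724,     452,    136,       1125
tcpcb:           544,    16424,     353,    957,      64527
socket:          192,    16424,     979,    813,    3614256
VNODE:           192,        0,  124417,     27,     124417
PV ENTRY:         28,  2690954,  601601, 266301, 2806153478
KMAP ENTRY:       48,   128821,    3795,    514,     369055
EOF
```

Applied to the full table in this message, none of the limited zones is close to its limit; PV ENTRY is the highest at roughly 22%. Zones with a LIMIT of 0 (VNODE, VM OBJECT and so on) are skipped, so this view says nothing about them.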
what would cause a server to behave this way ?
We have a FreeBSD 4.5-RELEASE server, it is an SMP system, and four days ago the following happened:

- console became unresponsive
- caps lock key no longer toggled the caps lock LED
- you _could_ still ping the server
- you could still establish connections to running services, but NONE of those services would actually talk to you. They would just establish the connection and then sit there.

Here is an example, trying to ssh into the machine:

    # ssh -v [EMAIL PROTECTED]
    SSH Version OpenSSH-2.1, protocol versions 1.5/2.0.
    Compiled with SSL (0x0090581f).
    debug: Reading configuration data /etc/ssh/ssh_config
    debug: ssh_connect: getuid 0 geteuid 0 anon 0
    debug: Connecting to example.com [1.2.3.4] port 22.
    debug: Allocated local port 890.
    debug: Connection established.

and that is as far as it would go - it just sat there forever. The same is true with telnetting to port 25 or port 110 or 53 - you would establish a connection, but you would get no response or output from the server.

We eventually just had to power cycle.

---

So anyway, we are confused - we could still ping it, we could see that processes (sshd server, mail server, etc.) were still running, and it even looks like cron jobs continued to run - however, from the console it looked like a classic hard lock (no caps lock LED toggle).

This is a fairly heavily loaded system - in `top` idle CPU usually hovers around 60%. But we have never had any trouble in the past...

any comments/suggestions appreciated.

--PT
what causes a userland to stop, but allows kernel to continue ?
So, based on a previous thread, it looks like I have a server whose userland halted, essentially, but the kernel continued running. As evidenced by:

- you can still ping the server just fine
- you can still connect to running services just fine
- if you ssh to it, `ssh -v` (verbose) claims a connection is established, but the server doesn't respond in any way over that connection. Further, you can telnet to POP or IMAP or HTTP ports, and get a connection, but you can't get any response.
- cron does NOT run while the server is in this state - no jobs run
- no response from the console - caps lock does NOT toggle the LED

So, as was suggested in the previous thread, it looks like my kernel is still running, but the userland has halted. There are no log entries that give any clue as to why this happened last week.

1. from a theoretical standpoint, how would this happen?
2. Is there any way to watchdog for it and escape from it before the userland completely crashes?
3. any previous/old problems that would cause this behavior?

It is a FreeBSD 4.5-RELEASE system, and it is SMP - fairly heavily loaded (averages 60% CPU idle in `top` output).

thanks,
PT
Re: what causes a userland to stop, but allows kernel to continue?
Are NMBCLUSTERS and mbuf determined by 'maxusers'? I have maxusers=512 ... comments?

When you suggest 'clamp the total number of sockets that are permitted to be open' ... how is this done - is there a sysctl that corresponds to the total number of sockets that are permitted to be open?

I am also a little confused how this performance issue is solved by _lowering_ a tunable value - all of my problems up to this point (ran out of file descriptors, ran out of ptys, etc.) were solved by increasing them.

Thank you for your help,
PT

On Sun, 5 May 2002, Terry Lambert wrote:

> Anthony Schneider wrote:
> > Livelock, maybe? Is there some sort of internal kernel semaphore table which might be getting filled up or something? I'd also like to find out more about this, but sadly, the machine is a remote one and I can't drop into ddb as suggested... Thank you all very much. Hope this information is of use.
> > -Anthony.
>
> More likely, you have run out of some non-renewable resource, such as mbufs, and are in the midst of a deadly embrace deadlock (e.g. as a result of having no mbufs to send responses or receive acknowledgements which would free up mbufs currently held for TCP sessions in progress, etc.).
>
> The easiest way to see this is to periodically record vmstat -m and netstat -m output to a disk file, and sync, in order to make sure that it's recorded at the time you must reset. Then plot the information over time, up to the point of the failure, and you will likely see the problem in gory detail.
>
> If it is something like mbuf starvation, then you should clamp the total number of sockets that are permitted to be open at half the maximum window size divided into the number of mbufs available, minus 10% for a reserve.
>
> In general, the tuning page is broken; a number of the things it suggests tuning via sysctl at run time are not actually tunable at run time, only at boot time. Though at run time, they will remove the top end limits, they will in fact not result in the reservation of sufficient resource to meet those limits, as they would had they been in effect at boot time, instead.
>
> In particular, increasing the number of open files permitted by modifying maxfiles via sysctl at runtime will not add to the prereserved amount of tcpcb's, inpcb's, or socket structures, all of which could leave you starving for one of these objects, or the mbuf's needed to support them, at runtime.
>
> It pays to understand the code before fiddling the numbers. ;^).
>
> -- Terry
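The logging Terry suggests (record the output to a disk file, then sync so the last sample survives a reset) can be sketched as a small function. This is a minimal illustration: `snapshot` is a made-up name, and the commands to record are passed in by the caller rather than hard-coded.

```shell
# Sketch: append timestamped output of each given command to a log file,
# then sync so the data reaches the disk before a possible hang.
snapshot() {
    log=$1
    shift
    for cmd in "$@"; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') $cmd"
            sh -c "$cmd" 2>&1
        } >> "$log"
    done
    sync
}

# Harmless demonstration with stand-in commands:
tmp=$(mktemp)
snapshot "$tmp" "echo sample-one" "echo sample-two"
cat "$tmp"
rm -f "$tmp"
```

On the affected machine the call would be something like `snapshot /var/log/mbuf-watch.log "vmstat -m" "netstat -m"` (the log path is only an example), run once a minute from a loop or from cron. The `===` header lines make it straightforward to split the log into samples for plotting afterwards.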
RFC on my SHM tunings for multiple jailed postgres...
I have a large server that will be running ~24 jails, 8 of which will be running their own postgres server. Because of this fact:

> By default, Postgres allocates 34 semaphores, which is over half the default system total of 60.

I need to tune kernel SHM settings in order to even run the second postgres, much less the other six. So, this is what I have in my kernel, and I appreciate any comments or suggestions regarding the appropriateness:

(of course, by default, I have)

    options SYSVSHM #SYSV-style shared memory
    options SYSVMSG #SYSV-style message queues

(and the following is _all_ that I have added)

    options SHMMAXPGS=16384
    options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
    options SHMSEG=256
    options SEMMNI=384
    options SEMMNS=768
    options SEMMNU=384
    options SEMMAP=384

My references for this are:

http://www.us.postgresql.org/users-lounge/docs/7.1/admin/kernel-resources.html
http://groups.google.com/groups?q=freebsd+SEMMNI+postgres&hl=en&selm=01091023443406.73075%40prime.vsservices.com&rnum=7

All comments and suggestions appreciated!

--PT
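A quick sanity check on the semaphore budget, using only the figures in this message (34 semaphores per Postgres server, a default system total of 60, and SEMMNS=768). It is a sketch: `sem_budget` is a made-up name, the per-server figure is the documented default and may differ for a tuned Postgres, and the SHMMAX line assumes 4096-byte pages.

```shell
# Sketch: compare the semaphores needed by N servers with a system-wide limit.
sem_budget() {
    servers=$1
    per_server=$2
    semmns=$3
    need=$(( servers * per_server ))
    if [ "$need" -le "$semmns" ]; then
        echo "need=$need limit=$semmns ok"
    else
        echo "need=$need limit=$semmns short=$(( need - semmns ))"
    fi
}

sem_budget 2 34 60     # two servers against the default total
sem_budget 8 34 768    # eight servers against SEMMNS=768
echo "SHMMAX=$(( 16384 * 4096 + 1 ))"   # SHMMAXPGS*PAGE_SIZE+1 with 4096-byte pages
```

The first call shows why the second Postgres cannot start under the default (68 needed against 60), and the second shows that 768 leaves ample room for eight servers (272 needed). The last line evaluates the SHMMAX expression to 67108865 bytes.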
cryptography implications (privacy) of FreeBSD jail ?
Let's say I am running in a jail, and say 5 other people are running in other, separate jails on the same machine. Now let's say I start up pgp, and generate my keys, and generally use pgp through the command line in my jail. Or, instead of pgp I do other crypto related sensitive activities... what is my risk here?

Can someone either on the host machine or in one of the other jails watch memory on the machine and discern things like my keys or passphrases, or have very easy access to the data I am decrypting?

Please feel free to expand on the topic as well, in case there are related questions that I am _not_ asking, but should be...

--pt
Re: cannot get more than 32 PTYs in 4.4-RELEASE
Ok, see the point is, I have _already done this_:

    sh MAKEDEV pty0 # 0-31
    sh MAKEDEV pty1 # 32-63
    sh MAKEDEV pty2 # 64-95
    sh MAKEDEV pty3 # 96-127
    sh MAKEDEV pty4 # 128-159 xterm won't recognize by default
    sh MAKEDEV pty5 # 160-191 xterm won't recognize by default
    sh MAKEDEV pty6 # 192-223 xterm won't recognize by default
    sh MAKEDEV pty7 # 224-255 xterm won't recognize by default

These are the exact commands I used with `sh MAKEDEV` to create my 256 pty /dev entries.

So to recap, all 256 /dev files are there, all 256 entries are in /etc/ttys (and were there by default) and I have:

    maxusers 128

and

    pseudo-device pty 128

in my kernel. And when I create 32 screens with `screen`, nobody else can log in by any method (ssh, telnet, etc.). (No more PTYs error, etc.)

What am I missing here? Please note that this is 4.4-RELEASE - this doesn't seem to be a problem in 4.5.

thanks,
PT
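To relate the unit ranges in the MAKEDEV comments to the device names seen in `fstat` or /etc/ttys, the mapping can be sketched as below. It assumes the 4.x naming scheme of eight banks (p q r s P Q R S) of 32 units each (0-9, then a-v), which agrees with the ttyp0 ... ttySv entries mentioned in this thread; `pty_name` is a made-up helper.

```shell
# Sketch: map a pty unit number (0-255) to its slave device name.
# Assumption: banks "pqrsPQRS", 32 units per bank named 0-9 then a-v.
pty_name() {
    unit=$1
    bank=$(printf '%s' 'pqrsPQRS' | cut -c $(( unit / 32 + 1 )))
    slot=$(printf '%s' '0123456789abcdefghijklmnopqrstuv' | cut -c $(( unit % 32 + 1 )))
    echo "tty${bank}${slot}"
}

pty_name 0     # first unit made by "MAKEDEV pty0"
pty_name 31    # last unit of the first bank
pty_name 32    # first unit made by "MAKEDEV pty1"
pty_name 255   # last unit made by "MAKEDEV pty7"
```

These print ttyp0, ttypv, ttyq0 and ttySv. A ceiling of exactly 32 sessions corresponds to exactly the first bank (ttyp0 through ttypv) being used up, which may help narrow down where the limit comes from, though the thread itself does not establish the cause.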
Re: Four misc. questions related to jail usage
> Patrick Thomas [EMAIL PROTECTED] writes:
> > 1. Does each jail need to have its own proc filesystem mounted?
>
> No, procfs is pretty much useless these days (except for truss).

In 4.5, won't `ps` (and perhaps other apps) fail to work for people in a jail if their jail does not have a proc file system mounted on their /proc?

--pt
cannot get more than 32 PTYs in 4.4-RELEASE
In my kernel, I have:

    maxusers 128
    pseudo-device pty 128

In my /dev directory, I have used `sh MAKEDEV` to make all 256 /dev/pty files. They are all there, and all have correct major/minor numbers. I know I won't be using all 256 of them, but I just made them all anyway.

In /etc/ttys, I didn't change anything, because all 256 pty entries are ALREADY in there:

    # Pseudo Terminals
    ttyp0 none network
    ttyp1 none network
    ...
    ttySu none network
    ttySv none network

So those are all there. I have used `sysctl -a | grep maxuser` to verify that maxusers is indeed 128.

BUT - if I log on via ssh and start screen, and start 31 new screen windows, then nobody else can log on to the system - I cannot create any more screen windows AND nobody else can ssh in - the machine has run out of ptys. I use `fstat` to inquire, and I am maxed out at exactly 32 ptys.

SO THE question is, why am I stuck at 32 ptys? I have done it all - everything that is in any doc or news post, and everything I was told to do here and on -hackers, and yet I am still stuck at 32 !!! Please tell me the secret lore for getting more than 32 ptys in 4.4-RELEASE.

thanks,
PT
Re: using vnconfig devices instead of partitions for jails ?
Thank you - I am glad to see that this is a good way of doing things. Two quick items:

1. How do I give each jail a 'proc' filesystem in its /proc using this configuration?

2. Is there any downside to this whatsoever? This seems infinitely better than a new partition for each jail, so was I just silly for doing it that way?

thanks!

On Wed, 27 Feb 2002, Nik Clayton wrote:

> On Wed, Feb 27, 2002 at 03:03:11PM -0600, Kirk Strauser wrote:
> > At 2002-02-27T20:49:18Z, Patrick Thomas [EMAIL PROTECTED] writes:
> > > I would like to put a large number of jails (16 or 20) on a server for testing purposes. I have two options so far: create 16 or 20 partitions OR just put them all in one partition, but the downside of that is that then I cannot enforce disk usage between jails. So at this point, 16-20 partitions seems the safest route.
> >
> > Good question. Is there any ability at all within the system to set a quota on a jail?
>
> Each vn* device has to be backed by a physical file on the system. Simply make sure that this physical file is the maximum size you want to allow in the jail.
>
> For example, on a server with 160GB of (RAID) disk, and 12 jails, each 10GB in size, I just have 12 jails; on the 'master' host for the jails:
>
> # cd /usr/local/jails/disk-images
> # ls -l
> total 1758115
> drwxr-xr-x 2 root wheel 512 Jan 23 00:40 .
> drwxr-xr-x 4 root wheel 512 Jan 23 00:39 ..
> -rw-r--r-- 1 root wheel 136 Jan 22 18:45 README
> -rw-r--r-- 1 root wheel 10737418240 Feb 27 23:35 foo.com.vn
> -rw-r--r-- 1 root wheel 10737418240 Feb 27 23:35 bar.com.vn
> -rw-r--r-- 1 root wheel 10737418240 Feb 27 23:35 baz.com.vn
> ...
>
> These were created with truncate 10G file, and are then configured on different vn* devices, which are then mounted as normal.
>
> # mount
> ...
> /dev/vn0a on /usr/local/jails/foo.com
> /dev/vn1a on /usr/local/jails/bar.com
> /dev/vn2a on /usr/local/jails/baz.com
> ...
> N
> --
> FreeBSD: The Power to Serve http://www.freebsd.org/
> FreeBSD Documentation Project http://www.freebsd.org/docproj/
> --- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 ---
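[Editorial sketch] The sizing idea in the quoted message (one fixed-size backing file per jail) can be illustrated in Python. This is a stand-in for the `truncate` step only, not the commands used in the thread: it extends an empty file to an exact byte count without writing data, then reports the size. The function name and the temporary path are hypothetical, and whether the file is actually stored sparsely depends on the filesystem.

```python
import os
import tempfile

GIB = 1024 ** 3  # 10 * GIB is 10737418240, the size shown in the `ls -l` listing


def make_backing_file(path: str, size_bytes: int) -> int:
    """Create (or recreate) a file of exactly size_bytes and return its size."""
    with open(path, "wb") as f:
        # Extending an empty file with truncate() writes no data blocks itself;
        # on most filesystems the result is a sparse file.
        f.truncate(size_bytes)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Small size for the demonstration; a jail image would use 10 * GIB.
        image = os.path.join(d, "example.vn")
        print(make_backing_file(image, 1024 * 1024))
```

The apparent size reported by `ls -l` is the upper bound the jail's filesystem can grow to; the space actually consumed on the host can be much smaller until the jail writes data.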
Re: using vnconfig devices instead of partitions for jails ?
One other thing: how many mount points (jails, in this case) can I run? I see that there are 8 existing vn0X device files in /dev - can I just create more of them using MAKEDEV (or mknod) and keep going? What is the maximum? 256?

Also, do I need to alter the kernel to support more vn0X device files, or does a stock kernel support all the way up to the maximum (whatever that is - see previous question :)

thanks again - much appreciated.

--pt
using vnconfig devices instead of partitions for jails ?
I would like to put a large number of jails (16 or 20) on a server for testing purposes. I have two options so far: create 16 or 20 partitions, OR just put them all in one partition. The downside of the second option is that I cannot enforce disk usage limits between jails. So at this point, 16-20 partitions seems the safest route.

But what about using vnconfig to create files of fixed sizes and then mounting them? Is this reasonable? Is there a limit to how many vnconfig files I can mount as filesystems? Is there a way to mount a directory _inside_ a vnconfig mount as a 'proc' filesystem (since the jail needs proc in order for ps, etc., to work)?

Any comments about this idea in general are appreciated.

--PT