FS gurus needed! (was: Strange lock-ups during backup over nfs after adding 1024M RAM)
On Monday, February 04, 2002 Peter Jeremy [EMAIL PROTECTED] wrote:

PJ On 2002-Feb-01 12:36:50 +0500, Sergey Gershtein [EMAIL PROTECTED] wrote:
PJ > Here's what vmstat -m says about FFS node:
PJ >
PJ > Memory statistics by type                          Type  Kern
PJ >         Type  InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
PJ > ...
PJ >     FFS node 152293  76147K  76479K 102400K  3126467    0     0  512
PJ > ...
PJ
PJ One oddity here is the Size - FFS node is used to allocate struct
PJ inode's and they should be 256 bytes on i386. Are you using something
PJ other than an i386 architecture? Unless this is a cut-and-paste
PJ error, I suspect something is radically wrong with your kernel.

Yes, it's i386, and it's not a cut-and-paste error. The current output of
vmstat -m says:

...
    FFS node 152725  76363K  76479K 102400K  9247602    0     0  512
...
    vfscache 157865  10671K  11539K 102400K  9668497    0     0  64,128,256,512,512K
...

The system uptime is 5 days; backup is temporarily disabled. I put the
complete output of 'vmstat -m', some other commands, and the kernel config
on the web at http://storm.mplik.ru/fbsd-stable/ so you can have a look at
it. By the way, on our second server running the same hardware the size of
FFS node is also 512. How can that be?

PJ By default, the memory limit is 1/2 vm_kmem_size, which is 1/3 physical
PJ memory, capped to 200MB. Which means you've hit the default cap.
PJ You can increase this limit with the loader environment
PJ kern.vm.kmem.size (see loader(8) for details). (This is also capped
PJ at twice the physical memory - which won't affect you). Before you go
PJ overboard increasing this, note that the kernel virtual address space
PJ is only 1GB.

Hmm. Not sure what to do. Shall I try to play with kern.vm.kmem.size, or
is it better not to touch it? I am now thinking that removing the extra
memory we've added is the best solution to the problem. I don't like this
solution though.

PJ How many open files do you expect on your box?
PJ Is it reasonable for there to be 150,000 active inodes?
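For what it's worth, Peter's arithmetic matches the 102400K figure in the
Limit column above. Here is a minimal sketch of that calculation (the
2048Mb figure comes from the original report; the formula is the one Peter
quotes — this is just an illustration, not kernel code):

```shell
# Per-type malloc limit, per Peter Jeremy's description:
# 1/2 of vm_kmem_size, where vm_kmem_size defaults to 1/3 of
# physical memory, capped at 200MB.
phys_mb=2048                        # physical RAM after the upgrade
kmem_mb=$(( phys_mb / 3 ))          # default vm_kmem_size: 1/3 of physical
cap_mb=200                          # default cap on vm_kmem_size
if [ "$kmem_mb" -gt "$cap_mb" ]; then
    kmem_mb=$cap_mb                 # 2048/3 = 682 > 200, so the cap applies
fi
limit_kb=$(( kmem_mb / 2 * 1024 ))  # per-type limit: half of vm_kmem_size
echo "${limit_kb}K"                 # prints 102400K, matching vmstat -m
```

Note that even at the pre-upgrade 1Gb of RAM, 1024/3 = 341Mb already
exceeds the 200MB cap, so adding memory did not change this limit; it only
increased the KVM overhead Peter mentions elsewhere in the thread.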
pstat -T right now says:

    666/4096 files
    0M/511M swap space

I don't expect the number of open files to go beyond 1,000-1,500. The only
problem is accessing a lot (more than 1,000,000) of small files over NFS.
But if I understand correctly, those files should be opened and closed one
by one, not all together. Is that right?

PJ Does vfscache have around the same number of InUse entries as FFS node?

Yes, it seems so (see above). What does that mean?

PJ What is the output of sysctl vfs?

See http://storm.mplik.ru/fbsd-stable/sysctl_vfs.txt

PJ PS: I'm still hoping that one of the FS gurus will step in and point
PJ out what's wrong.

I changed the subject of my message to catch the attention of the FS gurus
on the list.

Thank you,
Sergey

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message
Re: Strange lock-ups during backup over nfs after adding 1024M RAM
I am also encountering strange lock-ups in high-traffic situations with
DUMMYNET. I have managed to avoid those lock-ups by adjusting the
granularity of the kernel clock:

    options HZ=1000    # originally 100

Quoting Sergey Gershtein [EMAIL PROTECTED]:

On Thursday, January 31, 2002 Peter Jeremy wrote:

PJ It looks like you've run out of kernel memory. At a quick guess, one
PJ of the nfsd processes is trying to open a file and can't allocate
PJ space for another inode whilst holding locks on other inodes. The
PJ lockup is either due to the lack of KVM, or the inode locks are
PJ migrating up towards root and gathering more processes under their
PJ clutches until nothing can run.
PJ
PJ If you monitor the memory usage with vmstat -m, you should be
PJ able to see the free memory drop to zero, possibly all eaten by
PJ the FFS node.

I've set up a cron job to monitor vmstat -m every 5 minutes so I can see
what happens just before the next lock-up. By the way, the file system
that is being backed up has a lot (more than 1,000,000) of small files
(less than 1Kb each).

PJ That triggers a faint memory about a problem with doing this, but
PJ I thought it was now fixed. How old are your sources?

RELENG_4_4, cvsuped a week ago (Jan 24th). For some reason we won't cvsup
4.5 until it becomes RELEASE.

PJ Increasing the amount of physical RAM increases the amount of KVM
PJ required to manage the RAM, reducing the amount of memory available
PJ for other things. I didn't keep your original posting and I can't
PJ remember what MAXUSERS is set to - from memory it is either 128
PJ (which seems too small) or 1024 (which seems too large). Try altering
PJ maxusers to 400-500 and see if that helps.

The initial value of MAXUSERS was 512; I tried lowering it to 128
according to Doug White's advice, but it did not help. Another server with
the same hardware (which does not lock up) has MAXUSERS 1024, but it also
does not have over 1,000,000 small files to back up.
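For context on the HZ tweak mentioned above: HZ sets the clock-interrupt
rate, so the kernel's scheduling/timeout granularity is 1/HZ. A small
illustrative calculation of what the change buys (just arithmetic, not
FreeBSD code):

```shell
# Tick length in microseconds for a given HZ value.
tick_us() { echo $(( 1000000 / $1 )); }

old_tick=$(tick_us 100)    # HZ=100, the stock value in this thread
new_tick=$(tick_us 1000)   # HZ=1000, the value that avoided the lock-ups
echo "HZ=100 -> ${old_tick}us per tick; HZ=1000 -> ${new_tick}us per tick"
```

So raising HZ from 100 to 1000 shrinks the tick from 10ms to 1ms, letting
the kernel service timeouts (including network-related ones) ten times
more often, at the cost of more clock-interrupt overhead.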
PJ If you still have problems, I think you'll need one of the FS gurus.

My hope was to find gurus on this list; I have no clue where else I can
search for them. :(

Nevertheless, thank you for your help!

Regards,
Sergey Gershtein
Re[2]: Strange lock-ups during backup over nfs after adding 1024M RAM
On Wednesday, January 30, 2002 Peter Jeremy wrote:

PJ Compile and run a kernel with options DDB. When it locks up, use
PJ Ctrl-Alt-Esc to enter DDB and try ps - this will tell you what
PJ processes are running/blocked. (Read ddb(4) for more details).

I did that, started the backup, and when the lock-up happened I entered
DDB from the console. The ps output showed that most processes were in the
'inode' state (the wmesg column of the ps output). There were about a
hundred httpd processes and 2 nfsd processes in the 'inode' state. One
more nfsd process was in the 'FFS node' state. After I typed 'c' to
continue, switched to another console, and tried to log in (it froze after
hitting enter), I entered DDB again and ps showed that the getty process
had gone into the 'inode' state too.

Could you tell me what this 'inode' state means and what conclusions can
be drawn from the situation?

By the way, the file system that is being backed up has a lot (more than
1,000,000) of small files (less than 1Kb each). They are organized in a
directory structure of about 1000 files per directory. The lock-up happens
while backing up these files over nfs. Does it have anything to do with
the number of inodes? Does the name translation cache somehow get
overflowed? How is all of this related to the amount of system RAM (no
lock-ups ever happened until we increased the amount of RAM from 1Gb to
1,5Gb)?

Any suggestions are greatly appreciated!

Regards,
Sergey Gershtein
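As an aside, the same wait-channel (wmesg) information DDB's ps shows can
be read from userland with ps -axl while the box is still responsive; the
wait channel appears in the WCHAN/MWCHAN column (the exact column name
varies by version). A hedged sketch of filtering for processes blocked in
'inode' — the sample ps lines below are invented for illustration; on a
live system you would pipe real `ps -axl` output into the awk filter:

```shell
# Hypothetical `ps -axl` output (invented for illustration).
# Field 9 stands in for the WCHAN/MWCHAN wait-channel column,
# field 11 for the command name.
sample='  UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT COMMAND
    0   211     1   0   2  0  1234  456 select Is   nfsd
    0   212     1   0  -4  0  1234  456 inode  D    nfsd
   80   534   530   0  -4  0  5678  912 inode  D    httpd'

# Print the command name of every process blocked on the 'inode' channel.
blocked=$(printf '%s\n' "$sample" | awk '$9 == "inode" { print $11 }')
printf '%s\n' "$blocked"
```

Processes sitting in 'inode' are sleeping on a locked vnode/inode, which
is consistent with Peter's theory that one inode-allocating process is
stalled while holding locks that everything else then queues behind.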
Strange lock-ups during backup over nfs after adding 1024M RAM
Hi!

Our server runs FreeBSD 4.4-STABLE. Until recently everything was ok, but
when we increased the amount of RAM from 1024Mb to 2048Mb, strange
lock-ups started to happen. All lock-ups happened at night, when activity
was pretty low. We run a backup over nfs nightly, and there is a good
chance it is nfs that causes the problem. When the lock-ups happen, the
backup is usually somewhere in the middle, and after the server is
restarted the backup finishes ok.

The strangest thing is the lock-up itself. The server keeps responding to
pings and the keyboard works (it is possible to switch consoles and type,
but not log in); there is nothing on the console or in any logs, but
nothing else (cron, web server, telnet, ftp, etc.) is working. Nothing
even happens if ctrl-alt-del is pressed on the console. After a hard
reboot everything works fine until the next night.

Does anyone have any ideas about what could cause the problem? I don't
think it is a hardware problem, since the server works fine all day under
heavy load. I suspect there could be some problem with the amount of
memory; maybe the kernel (nfs code?) can't handle a situation where the
cache gets too big (say, more than 1Gb)...

Any ideas on how to investigate and find the problem will be greatly
appreciated!

Regards,
Sergey Gershtein
--
Ural Relcom Ltd, Ekaterinburg, Russia