FS gurus needed! (was: Strange lock-ups during backup over nfs after adding 1024M RAM)

2002-02-04 Thread Sergey Gershtein

On Monday, February 04, 2002 Peter Jeremy [EMAIL PROTECTED] wrote:

PJ On 2002-Feb-01 12:36:50 +0500, Sergey Gershtein [EMAIL PROTECTED] wrote:
Here's what vmstat -m says about FFS node:

Memory statistics by type                          Type  Kern
        Type  InUse MemUse HighUse   Limit Requests Limit Limit Size(s)
 ...
    FFS node 152293 76147K  76479K 102400K  3126467     0     0  512
 ...

PJ One oddity here is the Size - FFS node is used to allocate struct
PJ inode's and they should be 256 bytes on i386.  Are you using something
PJ other than an i386 architecture?  Unless this is a cut-and-paste
PJ error, I suspect something is radically wrong with your kernel.

Yes, it's i386, and it's not a cut-and-paste error.

The current output of vmstat -m says:

 ...
    FFS node 152725 76363K  76479K 102400K  9247602     0     0  512
 ...
    vfscache 157865 10671K  11539K 102400K  9668497     0     0  64,128,256,512,512K
 ...
 
The system uptime is 5 days; the backup is temporarily disabled.

I put the complete output of 'vmstat -m', some other commands, and the
kernel config on the web at http://storm.mplik.ru/fbsd-stable/ so you
can have a look.

By the way, on our second server, running the same hardware, the size
of FFS node is also 512.  How can that be?
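(The numbers do at least hang together: 152725 inodes x 512 bytes =
78,195,200 bytes, or about 76363K, which matches the MemUse column
exactly.  So each inode really is coming out of a 512-byte bucket.)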

PJ By default, the memory limit is 1/2 of vm_kmem_size, which is 1/3 of
PJ physical memory, capped at 200MB.  Which means you've hit the default cap.

PJ You can increase this limit with the loader environment
PJ kern.vm.kmem.size (see loader(8) for details).  (This is also capped
PJ at twice the physical memory - which won't affect you).  Before you go
PJ overboard increasing this, note that the kernel virtual address space
PJ is only 1GB.

Hmm.  Not sure what to do.  Shall I try playing with kern.vm.kmem.size,
or is it better left alone?  I am now thinking that removing the extra
memory we've added may be the best solution to the problem, though I
don't like that solution.
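(If I do try it, my understanding is that it goes into /boot/loader.conf
as something like the line below -- the 400M value is purely illustrative,
not a recommendation:)

  kern.vm.kmem.size="400M"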

PJ How many open files do you expect on your box?
PJ Is it reasonable for there to be 150,000 active inodes?

pstat -T right now says:

666/4096 files
0M/511M swap space

I don't expect the number of open files to go beyond 1,000-1,500.  The
only problem is accessing a lot (more than 1,000,000) of small files
over NFS.  But if I understand correctly, those files should be opened
and closed one by one, not all at once.  Is that right?

PJ Does vfscache have around the same number of InUse entries as FFS node?

Yes, it seems so (see above).  What does it mean?

PJ What is the output of sysctl vfs?

See http://storm.mplik.ru/fbsd-stable/sysctl_vfs.txt

PJ PS: I'm still hoping that one of the FS gurus will step in and point
PJ out what's wrong.

I changed the subject of my message to catch the attention of the FS
gurus on the list.

Thank you,
Sergey





Re: Strange lock-ups during backup over nfs after adding 1024M RAM

2002-01-31 Thread talist

I am also encountering strange lock-ups in high-traffic situations with
DUMMYNET.  I have managed to avoid those lock-ups by increasing the
granularity of the kernel timer:

options HZ=1000  # originally 100
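For what it's worth, the rebuild after that change is just the usual 4.x
procedure (MYKERNEL here is a placeholder for your own config name):

  cd /usr/src/sys/i386/conf
  config MYKERNEL
  cd ../../compile/MYKERNEL
  make depend && make && make install
  # reboot to pick up the new kernel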


Quoting Sergey Gershtein [EMAIL PROTECTED]:

 On Thursday, January 31, 2002 Peter Jeremy wrote:
 
 PJ It looks like you've run out of kernel memory.  At a quick guess, one
 PJ of the nfsd processes is trying to open a file and can't allocate
 PJ space for another inode whilst holding locks on other inodes.  The
 PJ lockup is either due to the lack of KVM, or the inode locks are
 PJ migrating up towards root and gathering more processes under their
 PJ clutches until nothing can run.
 
 PJ If you monitor the memory usage with vmstat -m, you should be
 PJ able to see the free memory drop to zero, possibly all eaten by
 PJ the FFS node.
 
 I've set up a cron job to monitor vmstat -m every 5 minutes so I can see
 what happens just before the next lock-up.
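 Something like this crontab entry (the log path is just my own choice):
 
   */5 * * * * (date; /usr/bin/vmstat -m) >> /var/log/vmstat-m.log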
 
 By the way, the file system being backed up has a lot (more
 than 1,000,000) of small files (less than 1KB each).
 
 PJ That triggers a faint memory about a problem with doing this, but
 PJ I thought it was now fixed.  How old are your sources?
 
 RELENG_4_4, cvsuped a week ago (Jan 24th).  For some reason we don't
 cvsup 4.5 until it becomes a RELEASE.
 
 PJ Increasing the amount of physical RAM increases the amount of KVM
 PJ required to manage the RAM, reducing the amount of memory available
 PJ for other things.  I didn't keep your original posting and I can't
 PJ remember what MAXUSERS is set to - from memory it is either 128
 PJ (which seems too small) or 1024 (which seems too large).  Try altering
 PJ maxusers to 400-500 and see if that helps.
 
 The initial value of MAXUSERS was 512; I tried lowering it to 128 on
 Doug White's advice, but it did not help.  Another server with the
 same hardware (which does not lock up) has MAXUSERS 1024, but it also
 does not have over 1,000,000 small files to back up.
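 (That is just the one line in the kernel config, followed by a rebuild;
 128 is the value I tried:)
 
   maxusers 128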
 
 PJ If you still have problems, I think you'll need one of the FS gurus.
 
 My hope was to find gurus on this list; I have no clue where else to
 look for them. :(
 
 Nevertheless, thank you for your help!
 
 Regards,
 Sergey Gershtein
 
 
 






Re[2]: Strange lock-ups during backup over nfs after adding 1024M RAM

2002-01-30 Thread Sergey Gershtein

On Wednesday, January 30, 2002 Peter Jeremy wrote:

PJ Compile and run a kernel with options DDB.  When it locks up, use
PJ Ctrl-Alt-Esc to enter DDB and try ps - this will tell you what
PJ processes are running/blocked.  (Read ddb(4) for more details).

I did that, started the backup, and when the lock-up happened entered
DDB from the console.  The ps output showed that most processes were in
the 'inode' state (the wmesg column of the ps output).  There were about
a hundred httpd processes and 2 nfsd processes in the 'inode' state.
One more nfsd process was in the 'FFS node' state.
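For the record, the console sequence was roughly this (the kernel was
rebuilt with options DDB as suggested; output abbreviated):

  Ctrl-Alt-Esc     (break into the debugger on the hung console)
  db> ps           (the wmesg column read 'inode' for most processes)
  db> c            (continue execution)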

When I typed 'c' to continue, switched to another console, and tried to
log in (it froze after I hit enter), I entered DDB again; ps showed
that the getty process had gone into the 'inode' state too.

Could you tell me what this 'inode' state means and what conclusions
can be drawn from the situation?

By the way, the file system being backed up has a lot (more than
1,000,000) of small files (less than 1KB each).  They are organized in
a directory structure with about 1,000 files per directory.  The
lock-up happens while backing up these files over NFS.  Does it have
anything to do with the number of inodes?  Does the name translation
cache somehow overflow?  And how is all of this related to the amount
of system RAM?  (No lock-ups ever happened until we increased the RAM
from 1GB to 1.5GB.)

Any suggestions are greatly appreciated!

Regards,
Sergey Gershtein





Strange lock-ups during backup over nfs after adding 1024M RAM

2002-01-22 Thread Sergey Gershtein

Hi!

Our server runs FreeBSD 4.4-STABLE.  Until recently everything was
fine, but after we increased the amount of RAM from 1024MB to 2048MB,
strange lock-ups started to happen.  All lock-ups happened at night,
when activity was pretty low.  We run a backup over NFS nightly, and
there is a good chance it is NFS that causes the problem.  When a
lock-up happens, the backup is usually somewhere in the middle, and
after the server is restarted the backup finishes fine.

The strangest thing about it is the lock-up itself.  The server keeps
responding to pings and the keyboard works (it is possible to switch
consoles and type, but not to log in); there is nothing on the console
or in any logs, yet nothing else (cron, web server, telnet, ftp, etc.)
works.  Nothing happens even if ctrl-alt-del is pressed on the console.
After a hard reboot everything works fine until the next night.

Does anyone have any ideas what could cause the problem?  I don't
think it is a hardware problem, since the server works fine all day
under heavy load.  I suspect there could be some problem with the
amount of memory; maybe the kernel (the NFS code?) can't handle a
situation where the cache gets too big (say, more than 1GB)...

Any ideas on how to investigate and find the problem will be greatly
appreciated!

Regards,
Sergey Gershtein

--
Ural Relcom Ltd,
Ekaterinburg, Russia

