Re: massive memory leak in 3.8-stable samba
On Tue, 7 Mar 2006, Steve Fairhead wrote:

> > One of my production machines (3.8-stable) has suddenly started
> > panicking every couple of hours. I found out that the culprit is
> > smbd, eating through memory like there's no tomorrow (approx.
> > 10 MB/minute!). Can't figure out what has triggered it; nothing
> > changed on the machine lately and there is only one active w2k
> > client, writing a 2.5 kB file every 15 seconds or so. I'd be glad
> > of any assistance, even pointing out any stupid mistakes I have
> > made, because this is driving me nuts.
>
> I ran into something very similar recently. In my case I eventually
> discovered that one user was writing to a folder containing 22,000
> files. Avoiding this folder has entirely solved the problem. (Or at
> least worked around it.)
>
> FWIW, the Samba logs were helpful only inasmuch as they pointed me
> to the user who was causing the problem. I had to sit down and watch
> her operate to find out what she was doing...
>
> Perhaps (indeed probably) not relevant to your problem, but might
> give you some ideas. If you're writing a file every 15s, perhaps
> your problem is related to mine.
>
> Steve
> http://www.fivetrees.com

I have a similar problem, though mine appears when moving files between directories when either one contains a large number of files. I finally decided to delve into the Samba code to see if I could track down the cause.

I believe I tracked it to the code that scans the directories prior to the file rename operation. That code does some odd things, such as hammering on the system library call telldir(), which appears to leak memory like a sieve if unaccompanied by a matching seekdir(): approximately 16 bytes per file per directory scanned. This can add up to a significant loss if the directory has, say, 10,000 files or more in it, especially if the directory is scanned multiple times per operation and repeatedly over time.

I'm at a bit of a loss how to proceed from here. Given the state and conventions of the *BSD-ish directory library calls, Samba isn't scanning directories in a very memory-efficient manner, at least on OpenBSD. But the directory traversals, and possible dependencies on the scanning methods, could spider badly if one tried to fix this properly and reliably within Samba. (I.e., I believe that fixing the issue properly in Samba is a heck of a lot of work, but I'm not exactly a Samba expert or developer either.)

On the other hand, the fact that the system library call telldir() can leak as badly as it does probably isn't a good thing either, as outlined here:

http://mail-index.netbsd.org/netbsd-bugs/2004/02/05/0008.html

It would appear that, at least in OpenBSD 3.8-release, the library implementation suffers from similar potential issues. I have no idea whether the patch proposed at the URL above ever made it into NetBSD, since I don't run NetBSD anywhere; however, the patch looks promising. Changing the implementation of telldir() and related functions would likely fix this particular memory leak in Samba as well, though there may be underlying OS/userland issues of which I am unaware.

I guess the bottom line is that if you have a process writing into a directory that contains a LOT of files, Samba (or your client, or both) may be scanning the directory prior to any write, and possibly multiple times. If that's the case, the telldir() issue will likely affect you as well. While reducing the number of files in the directory in question won't stop the leak, it may significantly slow it ...

All of this assumes that your problem is related to the one I am seeing, and that my preliminary analysis is correct.

Hope this helps,
- Paul
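To make the access pattern described above concrete, here is a minimal sketch in C. It is not Samba's code: the function names scan_with_telldir() and scan_plain() are hypothetical, and the only calls assumed are the POSIX opendir(), readdir(), telldir() and closedir(). The first function asks for a position on every entry, which is the pattern reported to leak on the affected libc; the second walks the same directory without asking for any positions.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX/XSI declaration of telldir() */

#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical illustration, not taken from Samba.
 * Walk a directory and request a position for every entry. On a libc
 * where telldir() allocates a bookkeeping record per call, this loop
 * is where the per-file memory growth would come from.
 * Returns the number of entries seen, or -1 if the directory cannot
 * be opened.
 */
long scan_with_telldir(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    while (readdir(dir) != NULL) {
        long pos = telldir(dir);  /* position requested but never used */
        (void)pos;
        count++;
    }

    closedir(dir);
    return count;
}

/*
 * The same walk without telldir(): libc has nothing to remember per
 * entry. Returns the number of entries seen, or -1 on failure.
 */
long scan_plain(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    while (readdir(dir) != NULL)
        count++;

    closedir(dir);
    return count;
}
```

Both functions return the same entry count for the same directory; only the hidden allocation behaviour differs, and only on an implementation that behaves as described in the thread. Taking the estimate of roughly 16 bytes per file at face value, one pass of scan_with_telldir() over a 10,000-file directory would cost about 160,000 bytes, and that cost repeats on every scan.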
Re: massive memory leak in 3.8-stable samba
Hi, Mitja:

Did you check the Samba log files? You could try increasing the log level to see what smbd is doing. I always find these log files very helpful.

Regards,
Stefan Kell

--- Original message ---
From: Per-Olov Sjöholm [EMAIL PROTECTED]
To: Mitja Muenih [EMAIL PROTECTED]
Cc: misc@openbsd.org
Subject: Re: massive memory leak in 3.8-stable samba
Date: Mon, 6 Mar 2006 18:17:06 +0100

On Saturday 04 March 2006 10.59, you wrote:

> Hi!
>
> One of my production machines (3.8-stable) has suddenly started
> panicking every couple of hours. I found out that the culprit is
> smbd, eating through memory like there's no tomorrow (approx.
> 10 MB/minute!). Can't figure out what has triggered it; nothing
> changed on the machine lately and there is only one active w2k
> client, writing a 2.5 kB file every 15 seconds or so. I'd be glad
> of any assistance, even pointing out any stupid mistakes I have
> made, because this is driving me nuts.
>
> --
> load averages: 0.42, 0.87, 1.71    10:45:59
> 23 processes: 22 idle, 1 on processor
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.2% interrupt, 99.8% idle
> Memory: Real: 290M/338M act/tot  Free: 160M  Swap: 2372K/256M used/tot
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT    TIME    CPU COMMAND
> 30693 Guest      2    0  284M  284M sleep select  0:24  0.44% smbd
>
> --
> load averages: 0.28, 0.56, 1.35    10:50:14
> 23 processes: 22 idle, 1 on processor
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.2% interrupt, 99.8% idle
> Memory: Real: 348M/397M act/tot  Free: 101M  Swap: 2372K/256M used/tot
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT    TIME    CPU COMMAND
> 30693 Guest      2    0  342M  342M sleep select  0:29  1.03% smbd
>
> -
> # smbstatus
> Samba version 3.0.13
> PID     Username      Group         Machine
> ---
>
> Service      pid     machine       Connected at
> ---
> MC           30693   x             Sat Mar 4 10:23:13 2006
> IPC$         13147   x             Sat Mar 4 10:41:57 2006
>
> Locked files:
> Pid    DenyMode   Access   R/W   Oplock           Name
> --
> 30693  DENY_NONE  0x2019f  RDWR  EXCLUSIVE+BATCH  /var/shared/AB/gdat/ini/G_dat.ini  Sat Mar 4 10:43:59 2006
>
> The kernel is (full dmesg at the end)
>
> OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar 4 01:45:40 CET 2006
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID
>
> (previously had a -stable built on Jan 3 2006, same symptoms)
>
> # pkg_info |grep samba
> samba-3.0.13p0      SMB and CIFS client and server for UNIX
> samba-docs-3.0.20b  documentation and examples for samba
>
> (binary package from ftp.kd85.com, tried also to build it from ports
> and even MFC'd the latest version, 3.0.31b - no change)
>
> # cat /etc/samba/smb.conf
> [global]
>     dos charset = CP852
>     workgroup = STIL
>     server string = x
>     map to guest = Bad User
>     passdb backend = tdbsam
>     passwd program = /usr/bin/passwd %u
>     log file = /var/log/smbd.%m
>     max log size = 50
>     mangle prefix = 6
>     add user script = useradd -d /var/empty -s /sbin/nologin %u
>     add group script = groupadd '%g'
>     add machine script = useradd -d /var/empty -s /sbin/nologin -g machines %u
>     logon script = logon.bat
>     logon path = \\%L\profile\%U\profile
>     logon drive = z:
>     logon home = \\%L\%U
>     domain logons = Yes
>     domain master = Yes
>     dns proxy = No
>     wins support = Yes
>     ldap ssl = no
>     load printers = no
>
> ..snip..
>
> [AB]
>     path = /var/shared/AB
>     read only = No
>     guest ok = Yes
>
> Regards,
> Mitja
>
> ---
> OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar 4 01:45:40 CET 2006
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID
> RTC BIOS diagnostic error 18<memory_size,fixed_disk>
> cpu0: Intel(R) Pentium(R) 4 CPU 3.20GHz (GenuineIntel 686-class) 3.20 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,CNXT-ID
> real mem = 535883776 (523324K)
> avail mem = 481636352 (470348K)
> using 4278 buffers containing 26898432 bytes (26268K) of memory
> RTC BIOS diagnostic error 18<memory_size,fixed_disk>
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(00) BIOS, date 01/15/04, BIOS32 rev. 0 @ 0xffe90
> apm0 at bios0: Power Management spec V1.2
> apm0: AC on, battery charge unknown
> apm0: flags 30102 dobusy 0 doidle 1
> pcibios0 at bios0: rev 2.1 @ 0xf/0x1
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfeae0/160 (8 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801EB/ER
Re: massive memory leak in 3.8-stable samba
> One of my production machines (3.8-stable) has suddenly started
> panicking every couple of hours. I found out that the culprit is
> smbd, eating through memory like there's no tomorrow (approx.
> 10 MB/minute!). Can't figure out what has triggered it; nothing
> changed on the machine lately and there is only one active w2k
> client, writing a 2.5 kB file every 15 seconds or so. I'd be glad
> of any assistance, even pointing out any stupid mistakes I have
> made, because this is driving me nuts.

I ran into something very similar recently. In my case I eventually discovered that one user was writing to a folder containing 22,000 files. Avoiding this folder has entirely solved the problem. (Or at least worked around it.)

FWIW, the Samba logs were helpful only inasmuch as they pointed me to the user who was causing the problem. I had to sit down and watch her operate to find out what she was doing...

Perhaps (indeed probably) not relevant to your problem, but might give you some ideas. If you're writing a file every 15s, perhaps your problem is related to mine.

Steve
http://www.fivetrees.com
Re: massive memory leak in 3.8-stable samba
On Saturday 04 March 2006 10.59, you wrote:

> Hi!
>
> One of my production machines (3.8-stable) has suddenly started
> panicking every couple of hours. I found out that the culprit is
> smbd, eating through memory like there's no tomorrow (approx.
> 10 MB/minute!). Can't figure out what has triggered it; nothing
> changed on the machine lately and there is only one active w2k
> client, writing a 2.5 kB file every 15 seconds or so. I'd be glad
> of any assistance, even pointing out any stupid mistakes I have
> made, because this is driving me nuts.
>
> --
> load averages: 0.42, 0.87, 1.71    10:45:59
> 23 processes: 22 idle, 1 on processor
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.2% interrupt, 99.8% idle
> Memory: Real: 290M/338M act/tot  Free: 160M  Swap: 2372K/256M used/tot
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT    TIME    CPU COMMAND
> 30693 Guest      2    0  284M  284M sleep select  0:24  0.44% smbd
>
> --
> load averages: 0.28, 0.56, 1.35    10:50:14
> 23 processes: 22 idle, 1 on processor
> CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.2% interrupt, 99.8% idle
> Memory: Real: 348M/397M act/tot  Free: 101M  Swap: 2372K/256M used/tot
>
>   PID USERNAME PRI NICE  SIZE   RES STATE WAIT    TIME    CPU COMMAND
> 30693 Guest      2    0  342M  342M sleep select  0:29  1.03% smbd
>
> -
> # smbstatus
> Samba version 3.0.13
> PID     Username      Group         Machine
> ---
>
> Service      pid     machine       Connected at
> ---
> MC           30693   x             Sat Mar 4 10:23:13 2006
> IPC$         13147   x             Sat Mar 4 10:41:57 2006
>
> Locked files:
> Pid    DenyMode   Access   R/W   Oplock           Name
> --
> 30693  DENY_NONE  0x2019f  RDWR  EXCLUSIVE+BATCH  /var/shared/AB/gdat/ini/G_dat.ini  Sat Mar 4 10:43:59 2006
>
> The kernel is (full dmesg at the end)
>
> OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar 4 01:45:40 CET 2006
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID
>
> (previously had a -stable built on Jan 3 2006, same symptoms)
>
> # pkg_info |grep samba
> samba-3.0.13p0      SMB and CIFS client and server for UNIX
> samba-docs-3.0.20b  documentation and examples for samba
>
> (binary package from ftp.kd85.com, tried also to build it from ports
> and even MFC'd the latest version, 3.0.31b - no change)
>
> # cat /etc/samba/smb.conf
> [global]
>     dos charset = CP852
>     workgroup = STIL
>     server string = x
>     map to guest = Bad User
>     passdb backend = tdbsam
>     passwd program = /usr/bin/passwd %u
>     log file = /var/log/smbd.%m
>     max log size = 50
>     mangle prefix = 6
>     add user script = useradd -d /var/empty -s /sbin/nologin %u
>     add group script = groupadd '%g'
>     add machine script = useradd -d /var/empty -s /sbin/nologin -g machines %u
>     logon script = logon.bat
>     logon path = \\%L\profile\%U\profile
>     logon drive = z:
>     logon home = \\%L\%U
>     domain logons = Yes
>     domain master = Yes
>     dns proxy = No
>     wins support = Yes
>     ldap ssl = no
>     load printers = no
>
> ..snip..
>
> [AB]
>     path = /var/shared/AB
>     read only = No
>     guest ok = Yes
>
> Regards,
> Mitja
>
> ---
> OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar 4 01:45:40 CET 2006
>     [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID
> RTC BIOS diagnostic error 18<memory_size,fixed_disk>
> cpu0: Intel(R) Pentium(R) 4 CPU 3.20GHz (GenuineIntel 686-class) 3.20 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,CNXT-ID
> real mem = 535883776 (523324K)
> avail mem = 481636352 (470348K)
> using 4278 buffers containing 26898432 bytes (26268K) of memory
> RTC BIOS diagnostic error 18<memory_size,fixed_disk>
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(00) BIOS, date 01/15/04, BIOS32 rev. 0 @ 0xffe90
> apm0 at bios0: Power Management spec V1.2
> apm0: AC on, battery charge unknown
> apm0: flags 30102 dobusy 0 doidle 1
> pcibios0 at bios0: rev 2.1 @ 0xf/0x1
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfeae0/160 (8 entries)
> pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801EB/ER LPC rev 0x00)
> pcibios0: PCI bus #2 is the last bus
> bios0: ROM list: 0xc/0x8000 0xc8000/0x1800! 0xc9800/0x2800
> cpu0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 Intel 82875P Host rev 0x02
> ppb0 at pci0 dev 1 function 0 Intel 82875P AGP rev 0x02
> pci1 at ppb0 bus 1
> uhci0 at pci0 dev 29 function 0 Intel 82801EB/ER USB rev 0x02: irq 11
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0
> uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self