Re: massive memory leak in 3.8-stable samba

2006-03-11 Thread Paul Thorn

On Tue, 7 Mar 2006, Steve Fairhead wrote:


  One of my production machines (3.8-stable) has suddenly started
  panicking every couple of hours. I found out that the culprit is smbd,
  eating through memory like there's no tomorrow (approx. 10 MB/minute!).
  Can't figure out what has triggered it, nothing changed on the machine
  lately and there is only one active w2k client, writing a 2.5kB file
  every 15 seconds or so.

  I'd be glad of any assistance, even pointing out any stupid mistakes I
  have made, because this is driving me nuts.


 I ran into something very similar recently. In my case I eventually
 discovered that one user was writing to a folder containing 22,000 files.
 Avoiding this folder has entirely solved the problem. (Or at least worked
 around it.)

 FWIW, the Samba logs were helpful only inasmuch as they pointed me to the
 user who was causing the problem. I had to sit down and watch her operate
 to find out what she was doing...

 Perhaps (indeed probably) not relevant to your problem, but might give you
 some ideas. If you're writing a file every 15s, perhaps your problem is
 related to mine.

 Steve
 http://www.fivetrees.com


I have a similar problem, though mine appears when moving files between
directories when either one has a large number of files in it.

I finally decided to delve into the Samba code to see if I could track
down the cause. I believe I tracked it to the code that scans the
directories prior to the file rename operation. That code does some odd
things, such as hammering on the system library call telldir(), which
appears to leak memory like a sieve if unaccompanied by a matching
seekdir(): approximately 16 bytes per file per directory scanned. This
can add up to a significant loss if the directory has, say, 10,000 files
or more in it, especially if the directory is scanned multiple times per
operation and repeatedly over time.
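
For illustration only, here is a minimal sketch of the access pattern
described above: a position is recorded with telldir() before every
readdir(), and none of those positions is ever handed back to seekdir().
This is not Samba's actual code; the function name scan_with_telldir is
made up for the example, and whether each call really costs memory
depends on the libc in use.

```c
#define _XOPEN_SOURCE 700   /* make telldir() visible on strict compilers */
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Samba code): walk a directory and record a
 * position cookie with telldir() before every readdir(), without ever
 * passing a cookie back to seekdir().
 *
 * Returns the number of entries read, or -1 if the directory cannot be
 * opened.
 */
long scan_with_telldir(const char *path)
{
    DIR *dir = opendir(path);
    long entries = 0;

    if (dir == NULL)
        return -1;

    for (;;) {
        /* On an affected libc, each call may allocate a small record. */
        long cookie = telldir(dir);
        (void)cookie;   /* the cookie is never handed to seekdir() */

        if (readdir(dir) == NULL)
            break;
        entries++;
    }

    closedir(dir);
    return entries;
}
```

Calling this repeatedly on a large directory from a small main(), while
watching the process size in top, would be one way to check whether a
given system shows the growth described above.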

I'm at a bit of a loss how to proceed from here. Given the state and
conventions of the *BSD-ish directory library calls, Samba isn't scanning
directories in a very memory-efficient manner, at least in the case of
OpenBSD. But the directory traversals, and whatever else depends on the
scanning methods, could make a proper and reliable fix within Samba
spread widely through the code. (I.e., I believe that fixing the issue
properly in Samba would be a heck of a lot of work, but I'm not exactly
a Samba expert/developer either.)

On the other hand, the fact that the system library call telldir() can
leak as badly as it does probably isn't a good thing either, as
outlined here:

http://mail-index.netbsd.org/netbsd-bugs/2004/02/05/0008.html

It would appear that, at least in OpenBSD 3.8-release, the library
implementation suffers from similar potential issues. I have no idea
whether the patch proposed at the URL above ever made it into NetBSD,
since I don't run NetBSD anywhere; however, the patch looks promising.

Changing the implementation of telldir() and related functions
would likely fix this particular memory leak in Samba as well, though
there may be underlying OS/userland issues of which I am unaware.

I guess the bottom line is that if you have a process writing into a
directory that contains a LOT of files, Samba (or your client, or both)
may be scanning the directory prior to any write, and possibly multiple
times. If that's the case, the telldir() issue will likely affect you as
well. While reducing the number of files in the directory in question
won't stop the leak, it may significantly slow it.

I suppose this also assumes that your problem is related to the
one I am seeing, and that my preliminary analysis is correct.
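
To put rough numbers on the point about directory size, here is a
back-of-the-envelope sketch. It assumes the figure of roughly 16 bytes
per file per scan holds, and that nothing is ever freed; the function
name and its parameters are made up for the example, and the number of
scans per operation is something you would have to measure.

```c
#include <stdint.h>

/*
 * Rough estimate of memory lost, assuming one leaked telldir() record
 * per file per directory scan. All three inputs are assumptions to be
 * filled in by the reader; nothing here is measured.
 */
uint64_t estimated_leak_bytes(uint64_t files_in_dir,
                              uint64_t scans,
                              uint64_t bytes_per_record)
{
    return files_in_dir * scans * bytes_per_record;
}
```

With these assumptions, estimated_leak_bytes(10000, 1, 16) comes to
160,000 bytes for a single scan of a 10,000-file directory, while
trimming the directory to 1,000 files brings that down to 16,000 bytes
per scan: slower, but still a leak.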

Hope this helps,
 - Paul



Re: massive memory leak in 3.8-stable samba

2006-03-07 Thread Stefan Kell
Hi,

Mitja: did you check the Samba log files? You could try increasing the
log level to see what smbd is doing. I always find these log files very
helpful.

Regards

Stefan Kell

 --- Original Message ---
 From: Per-Olov Sjöholm [EMAIL PROTECTED]
 To: Mitja Muenih [EMAIL PROTECTED]
 Cc: misc@openbsd.org
 Subject: Re: massive memory leak in 3.8-stable samba
 Date: Mon, 6 Mar 2006 18:17:06 +0100
 

Re: massive memory leak in 3.8-stable samba

2006-03-07 Thread Steve Fairhead
 One of my production machines (3.8-stable) has suddenly started
 panicking every couple of hours. I found out that the culprit is smbd,
 eating through memory like there's no tomorrow (approx. 10 MB/minute!).
 Can't figure out what has triggered it, nothing changed on the machine
 lately and there is only one active w2k client, writing a 2.5kB file
 every 15 seconds or so.
 I'd be glad of any assistance, even pointing out any stupid mistakes I
 have made, because this is driving me nuts.

I ran into something very similar recently. In my case I eventually
discovered that one user was writing to a folder containing 22,000 files.
Avoiding this folder has entirely solved the problem. (Or at least worked
around it.)

FWIW, the Samba logs were helpful only inasmuch as they pointed me to the
user who was causing the problem. I had to sit down and watch her operate
to find out what she was doing...

Perhaps (indeed probably) not relevant to your problem, but might give you
some ideas. If you're writing a file every 15s, perhaps your problem is
related to mine.

Steve
http://www.fivetrees.com



Re: massive memory leak in 3.8-stable samba

2006-03-06 Thread Per-Olov Sjöholm
On Saturday 04 March 2006 10.59, you wrote:
 Hi!


 One of my production machines (3.8-stable) has suddenly started panicking
 every couple of hours. I found out that the culprit is smbd, eating through
 memory like there's no tomorrow (approx. 10 MB/minute!). Can't figure
 out what has triggered it, nothing changed on the machine lately and there
 is only one active w2k client, writing a 2.5kB file every 15 seconds or so.
 I'd be glad of any assistance, even pointing out any stupid mistakes I have
 made, because this is driving me nuts.

 --
 load averages:  0.42,  0.87,  1.71    10:45:59
 23 processes:  22 idle, 1 on processor
 CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.2% interrupt, 99.8% idle
 Memory: Real: 290M/338M act/tot  Free: 160M  Swap: 2372K/256M used/tot

   PID USERNAME PRI NICE  SIZE   RES STATE  WAIT     TIME    CPU COMMAND
 30693 Guest      2    0  284M  284M sleep  select   0:24  0.44% smbd
 --
 load averages:  0.28,  0.56,  1.35    10:50:14
 23 processes:  22 idle, 1 on processor
 CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.2% interrupt, 99.8% idle
 Memory: Real: 348M/397M act/tot  Free: 101M  Swap: 2372K/256M used/tot

   PID USERNAME PRI NICE  SIZE   RES STATE  WAIT     TIME    CPU COMMAND
 30693 Guest      2    0  342M  342M sleep  select   0:29  1.03% smbd
 --

 # smbstatus
 Samba version 3.0.13
 PID     Username      Group         Machine
 ---
 Service      pid     machine       Connected at
 ---
 MC           30693   x             Sat Mar  4 10:23:13 2006
 IPC$         13147   x             Sat Mar  4 10:41:57 2006
 Locked files:
 Pid    DenyMode   Access      R/W        Oplock           Name
 --
 30693  DENY_NONE  0x2019f     RDWR       EXCLUSIVE+BATCH  /var/shared/AB/gdat/ini/G_dat.ini   Sat Mar  4 10:43:59 2006


 The kernel is (full dmesg at the end)

 OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar  4 01:45:40 CET 2006
 [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID

 (previously had a -stable built on Jan 3 2006, same symptoms)


 # pkg_info |grep samba
 samba-3.0.13p0  SMB and CIFS client and server for UNIX
 samba-docs-3.0.20b  documentation and examples for samba

 (binary package from ftp.kd85.com, tried also to build it from ports and
 even MFC'd the latest version, 3.0.31b - no change)

  # cat /etc/samba/smb.conf
 [global]
 dos charset = CP852
 workgroup = STIL
 server string = x
 map to guest = Bad User
 passdb backend = tdbsam
 passwd program = /usr/bin/passwd %u
 log file = /var/log/smbd.%m
 max log size = 50
 mangle prefix = 6
 add user script = useradd -d /var/empty -s /sbin/nologin %u
 add group script = groupadd '%g'
 add machine script = useradd -d /var/empty -s /sbin/nologin -g machines %u
 logon script = logon.bat
 logon path = \\%L\profile\%U\profile
 logon drive = z:
 logon home = \\%L\%U
 domain logons = Yes
 domain master = Yes
 dns proxy = No
 wins support = Yes
 ldap ssl = no
 load printers = no
 ..snip..
 [AB]
 path = /var/shared/AB
 read only = No
 guest ok = Yes



 Regards, Mitja

 ---
 OpenBSD 3.8-stable (GENERIC.RAID) #1: Sat Mar  4 01:45:40 CET 2006
 [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC.RAID
 RTC BIOS diagnostic error 18memory_size,fixed_disk
 cpu0: Intel(R) Pentium(R) 4 CPU 3.20GHz (GenuineIntel 686-class) 3.20 GHz
 cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,CNXT-ID
 real mem  = 535883776 (523324K)
 avail mem = 481636352 (470348K)
 using 4278 buffers containing 26898432 bytes (26268K) of memory
 RTC BIOS diagnostic error 18memory_size,fixed_disk
 mainbus0 (root)
 bios0 at mainbus0: AT/286+(00) BIOS, date 01/15/04, BIOS32 rev. 0 @ 0xffe90
 apm0 at bios0: Power Management spec V1.2
 apm0: AC on, battery charge unknown
 apm0: flags 30102 dobusy 0 doidle 1
 pcibios0 at bios0: rev 2.1 @ 0xf/0x1
 pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfeae0/160 (8 entries)
 pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801EB/ER LPC rev 0x00)
 pcibios0: PCI bus #2 is the last bus
 bios0: ROM list: 0xc/0x8000 0xc8000/0x1800! 0xc9800/0x2800
 cpu0 at mainbus0
 pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
 pchb0 at pci0 dev 0 function 0 Intel 82875P Host rev 0x02
 ppb0 at pci0 dev 1 function 0 Intel 82875P AGP rev 0x02
 pci1 at ppb0 bus 1
 uhci0 at pci0 dev 29 function 0 Intel 82801EB/ER USB rev 0x02: irq 11
 usb0 at uhci0: USB revision 1.0
 uhub0 at usb0
 uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self