Here is a capture of top at the time:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 5842 root      20   0  873m 6912 4612 S  0.0  0.4   0:01.20 winbindd
 5848 root      20   0  872m 3260 2272 S  0.0  0.2   0:00.08 winbindd
 5849 root      20   0  872m 3640 2652 S  0.0  0.2   0:00.06 winbindd
 5850 root      20   0  872m 3320 2200 S  0.0  0.2   0:00.06 winbindd
 5859 root      20   0  874m 2684 1448 S  0.0  0.2   0:00.00 winbindd
 5954 root      20   0  872m 3740 2284 S  0.0  0.2   0:00.02 winbindd
 5955 root      20   0  872m 3804 2348 S  0.0  0.2   0:00.04 winbindd
 6025 root      20   0  873m 1544    4 S  0.0  0.1   0:00.00 winbindd
 6026 root      20   0  873m 1548    4 S  0.0  0.1   0:00.00 winbindd
 6518 root      20   0  873m 5048 3476 S  0.0  0.3   0:00.00 winbindd
 6576 root      20   0  873m 6228 4232 S  0.0  0.4   0:00.00 winbindd
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
  529 root      16  -4 21076  632    0 S  0.0  0.0   0:00.16 udevd
 6574 root      20   0 18824 1264  940 R  0.0  0.1   0:00.10 top
 1761 root      20   0  5904  320  184 S  0.0  0.0   0:00.06 syslogd
 1805 root      20   0 48868  720  216 S  0.0  0.0   0:00.00 sshd
 5768 root      20   0 78572  916  200 S  0.0  0.1   0:00.14 sshd
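
In case it helps anyone compare captures, here is a rough Python sketch for totalling the RES column per command from top output like the above. The helper names (to_kib, res_by_command) and the SAMPLE string are made up for illustration, and it assumes the default 12-column top layout with RES in KiB unless a suffix is present.

```python
from collections import defaultdict

# Multipliers for the size suffixes top may append; bare numbers are KiB.
_SUFFIX = {"k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(field):
    """Convert a top memory field such as '6912' or '873m' to KiB."""
    suffix = field[-1].lower()
    if suffix in _SUFFIX:
        return int(float(field[:-1]) * _SUFFIX[suffix])
    return int(field)


def res_by_command(top_text):
    """Sum the RES column (6th field) for each COMMAND (12th field)."""
    totals = defaultdict(int)
    for line in top_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 12-column process row.
        if len(fields) != 12 or not fields[0].isdigit():
            continue
        totals[fields[11]] += to_kib(fields[5])
    return dict(totals)


# A few rows copied from the capture, used only as sample input.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5842 root 20 0 873m 6912 4612 S 0.0 0.4 0:01.20 winbindd
5848 root 20 0 872m 3260 2272 S 0.0 0.2 0:00.08 winbindd
5849 root 20 0 872m 3640 2652 S 0.0 0.2 0:00.06 winbindd
6574 root 20 0 18824 1264 940 R 0.0 0.1 0:00.10 top
"""

if __name__ == "__main__":
    for command, kib in sorted(res_by_command(SAMPLE).items()):
        print(f"{command}: {kib} KiB resident")
```

Run over all eleven winbindd rows in the capture, the RES values add up to 41728 KiB. RES includes shared pages in every process that maps them, so a per-command sum like this overstates the real footprint and is only an upper bound.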

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

On Fri, Oct 23, 2009 at 1:17 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:

> I also see this in the syslog sometimes:
>
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132286] rsync invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132649] Pid: 6516, comm: rsync Not tainted 2.6.26-2-amd64 #1
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132916]
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132917] Call Trace:
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133470] [<ffffffff802738c0>] oom_kill_process+0x57/0x1dc
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133746] [<ffffffff8023b551>] __capable+0x9/0x1c
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133993] [<ffffffff80273beb>] badness+0x188/0x1c7
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.134245] [<ffffffff80273e1f>] out_of_memory+0x1f5/0x28e
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.140836] [<ffffffff80276b70>] __alloc_pages_internal+0x31d/0x3bf
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141048] [<ffffffff80272d1c>] generic_file_aio_read+0x3b7/0x4ae
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141279] [<ffffffff8029ae47>] do_sync_read+0xc9/0x10c
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141472] [<ffffffff80246221>] autoremove_wake_function+0x0/0x2e
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141682] [<ffffffff8029b638>] vfs_read+0xaa/0x152
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141864] [<ffffffff8029ba19>] sys_read+0x45/0x6e
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142046] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142254]
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142376] Mem-info:
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142511] Node 0 DMA per-cpu:
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142662] CPU 0: hi: 0, btch: 1 usd: 0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142844] Node 0 DMA32 per-cpu:
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142998] CPU 0: hi: 186, btch: 31 usd: 173
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143183] Active:189862 inactive:179626 dirty:0 writeback:0 unstable:0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143184] free:3011 slab:7697 mapped:76 pagetables:1122 bounce:0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143592] Node 0 DMA free:6020kB min:32kB low:40kB high:48kB active:3012kB inactive:2676kB present:10724kB pages_scanned:9007 all_unreclaimable? yes
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.144711] lowmem_reserve[]: 0 1499 1499 1499
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.144894] Node 0 DMA32 free:6024kB min:4936kB low:6168kB high:7404kB active:756436kB inactive:715828kB present:1535136kB pages_scanned:626785 all_unreclaimable? no
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.145479] lowmem_reserve[]: 0 0 0 0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.145648] Node 0 DMA: 3*4kB 1*8kB 1*16kB 5*32kB 3*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6020kB
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.146045] Node 0 DMA32: 162*4kB 28*8kB 9*16kB 7*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6040kB
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.155603] 364394 total pagecache pages
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.155831] Swap cache: add 0, delete 0, find 0/0
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.156064] Free swap = 0kB
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.156064] Total swap = 0kB
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 393200 pages of RAM
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 6902 reserved pages
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 2124 pages shared
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164247] 0 pages swap cached
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164396] Out of memory: kill process 5842 (winbindd) score 76798 or a child
> Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164850] Killed process 5847 (winbindd)
>
> Looks like winbind is running out of memory?
>
> Robert LeBlanc
> Life Sciences & Undergraduate Education Computer Support
> Brigham Young University
>
>
> On Fri, Oct 23, 2009 at 9:33 AM, Robert LeBlanc <rob...@leblancnet.us> wrote:
>
>> Just out of curiosity, do any of you have mdns4_minimal or mdns4 in your
>> /etc/nsswitch.conf file? I think mdns4 doesn't work too well and I usually
>> take it out, but it was alive and well on these machines. Does removing
>> those items help anyone?
>>
>> Robert LeBlanc
>> Life Sciences & Undergraduate Education Computer Support
>> Brigham Young University
>>
>>
>> On Thu, Oct 22, 2009 at 4:45 PM, Robert LeBlanc <rob...@leblancnet.us> wrote:
>>
>>> I'm using 3.4.2 right now and I'm seeing a similar problem. We are using
>>> winbind to authenticate our users on our Linux cluster. The worker and
>>> interactive nodes are on a private subnet that is NATed to the local LAN.
>>> Two head nodes provide failover for the NATing. When failover is happening,
>>> winbind whacks out. The system is not unusable, but no authentication
>>> happens for about 30 minutes after the failover. I'm going to see if I can
>>> get iptables to share state between machines to help prevent this, but there
>>> needs to be a faster reconnection after domain controllers seem to be down.
>>>
>>> Robert LeBlanc
>>> Life Sciences & Undergraduate Education Computer Support
>>> Brigham Young University
>>>
>>>
>>> On Thu, Oct 22, 2009 at 1:55 AM, Clayton Hill <ad...@ateamonsite.com> wrote:
>>>
>>>> Hi Jason,
>>>>
>>>> Yup, you've got the same problem - just going about it a sorta different way
>>>> - ouch, that must really suck having winbind\ADdomain own the account you
>>>> are logged in as. Bummer!
>>>> My problem is slightly less serious, as I am trying to use my local
>>>> accounts (such as root) and I just use samba as a domain member to host
>>>> files with AD ACLs in the filesystem permissions... but we see the same
>>>> bug, because winbind (even with caching) kills access to my local accounts.
>>>> I hope this is fixed in 3.4 (I just installed it yesterday). I haven't
>>>> had a chance to run the same test on 3.4.
>>>>
>>>> Possibilities:
>>>> winbind is not caching right to allow smooth operation when the DC is
>>>> offline and the system is virtually locked up
>>>> winbind doesn't know the moment it can't connect to the DC that it should
>>>> really use cache or just buzz off and die somehow
>>>> winbind may or may not connect back up to the DC immediately
>>>>
>>>> I need to play with parameters and see what the new winbind options in
>>>> 3.4 do. I have been on 3.2 until yesterday.
>>>>
>>>>
>>>> Thanks for the info on the bug report.
>>>>
>>>> Cheers,
>>>> -Clayton
>>>>
>>>> Jason Haar wrote:
>>>>
>>>>> Just an FYI, but this looks an awful lot like the bug I reported months
>>>>> ago
>>>>>
>>>>> https://bugzilla.samba.org/show_bug.cgi?id=6103
>>>>>
>>>>> Basically I'm running Fedora 11 with no local accounts (beyond root) -
>>>>> relying on winbind. On occasion winbind appears to "hang" - and no
>>>>> local access works - including root - which shouldn't need winbind to
>>>>> succeed!
>>>>> Normally I have to reboot to fix, however if I was lucky enough for it
>>>>> to happen before my screensaver kicked in, then simply restarting
>>>>> winbind fixes the problem.
>>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list go to the following URL and read the
>>>> instructions: https://lists.samba.org/mailman/options/samba
>>>>