Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
I no longer use 32-bit kernels (but use the 64-bit amd64 kernel, even on my few last remaining 32-bit machines): that seems a suitable workaround or upgrade path. Should I try to test whether the issue with PAE remains?

Cheers, Paul

--
Paul Szabo p...@maths.usyd.edu.au www.maths.usyd.edu.au/u/psz
School of Mathematics and Statistics University of Sydney Australia
I support NTEU members taking a stand for workplace rights in the face of poorly-run change management. Visit www.nteu.org.au/sydney to learn more.
Bug#892105: linux-image-4.9.0-6-amd64: i40e driver still unstable
I use kernel 4.9.130 (my own build from current "stretch" sources, package linux-source-4.9 version 4.9.130-2), and on my new machines with i40e devices, I observe similar, occasional issues:

Jan 9 07:30:06 viale kernel: [428469.260531] i40e :19:00.1: cleared PE_CRITERR
Jan 9 07:30:06 viale kernel: [428469.260639] i40e :19:00.1: TX driver issue detected, PF reset issued

Jan 9 08:47:06 siv kernel: [422993.009196] i40e :19:00.1: cleared PE_CRITERR
Jan 9 08:47:06 siv kernel: [422993.013535] i40e :19:00.1 eth1: NIC Link is Down
Jan 9 08:47:16 siv kernel: [423002.131389] i40e :19:00.1 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

Curiously, each of those machines only ever shows the one type of error (never an error like the other machine's), and both only complain about eth1, never about eth0 (though eth0 is also connected, with similar traffic volumes).

Following the hints in this bug report, I will try the Intel i40e driver, from either of:
https://downloadcenter.intel.com/download/24411/
https://sourceforge.net/projects/e1000/files/i40e%20stable/

Cheers, Paul

--
Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#775541: NFS mounts fail at boot after Debian 8.5 upgrade
Dear Vincent,

> Could you provide a bit more information about the package versions
> on your system?
> dpkg -l rpcbind nfs-common nfs-kernel-server systemd

psz@como:~$ dpkg -l rpcbind nfs-common nfs-kernel-server systemd
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name               Version            Architecture  Description
+++-==================-==================-=============-=====================================================
ii  nfs-common         1:1.2.8-9          i386          NFS support files common to client and server
ii  nfs-kernel-server  1:1.2.8-9          i386          support for NFS kernel server
ii  rpcbind            0.2.1-6+deb8u1     i386          converts RPC program numbers into universal addresses
ii  systemd            215-17+deb8u4.psz  i386          system and service manager

The systemd packages are my "own", with my (trivial!) patches as per
https://bugs.debian.org/803013

> Also I think the output of these commands would be helpful
> systemd-analyze critical-path remote-fs-pre.target
> systemd-analyze critical-path nfs-kernel-server.service

I think you meant critical-chain:

psz@como:~$ systemd-analyze critical-chain remote-fs-pre.target
...
remote-fs-pre.target @98ms

psz@como:~$ systemd-analyze critical-chain nfs-kernel-server.service
...
nfs-kernel-server.service +223ms
basic.target @4.819s
timers.target @4.818s
systemd-tmpfiles-clean.timer @4.818s
sysinit.target @4.816s
console-setup.service @4.813s +1ms
kbd.service @4.753s +58ms
system.slice @108ms
-.slice @103ms

Cheers, Paul
Bug#775541: NFS mounts fail at boot after Debian 8.5 upgrade
After upgrading from Debian jessie 8.4 to 8.5, my NFS mounts in fstab failed at boot (or reboot) time. To fix, I changed the one file
  /lib/systemd/system/remote-fs-pre.target
adding the line
  After=rpcbind.target
then my NFS mounts work correctly.

Question: should I have used
  After=rpcbind.service
instead?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
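One way to apply the same ordering without editing the packaged unit file is a systemd drop-in, which should survive package upgrades. The following is only a minimal sketch, not a tested fix for this bug: the function name write_rpcbind_dropin, the file name rpcbind-order.conf and the optional root-directory argument are assumptions made for illustration; only the After=rpcbind.target line comes from the report.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders remote-fs-pre.target after rpcbind.target.
# write_rpcbind_dropin and rpcbind-order.conf are hypothetical names.
# $1 is an optional root directory (default "/"), handy for a dry run.
write_rpcbind_dropin() {
    root="${1:-/}"
    dir="${root%/}/etc/systemd/system/remote-fs-pre.target.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=rpcbind.target\n' > "$dir/rpcbind-order.conf"
}
```

Running write_rpcbind_dropin as root, followed by systemctl daemon-reload, should make systemd pick up the drop-in. Whether rpcbind.target or rpcbind.service is the right dependency remains the open question asked in the message.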
Bug#800945: linux-source-3.16: Scheduler prefers pinned tasks
I believe the problem is solved. Please see discussion under
http://marc.info/?t=14440821092=1=2
specifically the message
http://marc.info/?l=linux-kernel&m=144459727213633&w=2
and quoting from there:

  I believe I have now solved the problem, simply by setting:
  ...
  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/min_interval; do echo 0 > $n; done
  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/max_interval; do echo 1 > $n; done
  echo 10 > /proc/sys/kernel/sched_latency_ns
  echo 10 > /proc/sys/kernel/sched_min_granularity_ns
  echo 1 > /proc/sys/kernel/sched_wakeup_granularity_ns

Please close this bug (as "solved" or "user config issue" or "invalid"). Sorry about the noise...

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
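Before overwriting scheduler tunables like these, it may be worth recording their current values so they can be restored. A minimal sketch, assuming the files are readable by the calling user; the helper name snapshot_tunables is hypothetical and not part of the original message.

```shell
#!/bin/sh
# Sketch: print "path value" for each readable tunable file passed as an
# argument, so the old settings can be restored later.
# snapshot_tunables is a hypothetical helper name.
snapshot_tunables() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s %s\n' "$f" "$(cat "$f")"
        fi
    done
}
```

For example, snapshot_tunables /proc/sys/kernel/sched_domain/cpu*/dom*/min_interval /proc/sys/kernel/sched_latency_ns > sched-before.txt would save the values of the paths named in the message; paths that do not exist on a given kernel are silently skipped.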
Bug#800945: linux-source-3.16: Scheduler prefers pinned tasks
Package: linux-source-3.16 Version: 3.16.7-ckt11-1+deb8u4 Severity: normal The Linux CFS scheduler prefers pinned tasks and unfairly gives more CPU time to tasks that have set CPU affinity. This effect is observed with or without CGROUP controls. To demonstrate: on an otherwise idle machine, as some user run several processes pinned to each CPU, one for each CPU (as many as CPUs present in the system) e.g. for a quad-core non-HyperThreaded machine: taskset -c 0 perl -e 'while(1){1}' & taskset -c 1 perl -e 'while(1){1}' & taskset -c 2 perl -e 'while(1){1}' & taskset -c 3 perl -e 'while(1){1}' & and (as that same or some other user) run some without pinning: perl -e 'while(1){1}' & perl -e 'while(1){1}' & and use e.g. top to observe that the pinned processes get more CPU time than "fair". Fairness is obtained when either: - there are as many un-pinned processes as CPUs; or - with CGROUP controls and the two kinds of processes run by different users, when there is just one un-pinned process; or - if the pinning is turned off for these processes (or they are started without). Any insight is welcome! Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia -- System Information: Debian Release: 8.2 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 3.16.7-ckt11-pk07.13.8-amd64 (SMP w/4 CPU cores) Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) # # Automatically generated file; DO NOT EDIT. 
# Linux/x86_64 3.16.7-ckt11-pk07.13-amd64 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set CONFIG_KERNEL_BZIP2=y # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set 
CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set CONFIG_RCU_STALL_COMMON=y # CONFIG_RCU_USER_QS is not set CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANO
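The reproduction described in this report (one pinned busy loop per CPU, plus a few unpinned ones) can be generalised to any CPU count. The sketch below only prints the commands rather than running them; the function name print_repro_cmds and its two arguments are assumptions for illustration, while the taskset and perl commands are those given in the report.

```shell
#!/bin/sh
# Sketch: print (not run) the commands of the reproduction for a given
# number of CPUs and unpinned tasks. print_repro_cmds is a hypothetical name.
# $1 = number of CPUs, $2 = number of unpinned busy loops (default 2)
print_repro_cmds() {
    ncpu="$1"
    nfree="${2:-2}"
    busy="perl -e 'while(1){1}'"
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        echo "taskset -c $i $busy &"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$nfree" ]; do
        echo "$busy &"
        i=$((i + 1))
    done
}
```

On the quad-core machine of the report, print_repro_cmds 4 2 lists the same six commands; they can be reviewed and then pasted into a shell, with top used to watch the CPU shares.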
Bug#800929: linux_source-3.16: Scheduler prefers pinned tasks
Sorry, that should have been package linux-source-3.16 (with a dash, not underscore). Maybe I could have re-assigned this bug, but I do not really know how to, so I have instead submitted a new one. Please close this errant bug report.

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#800929: linux_source-3.16: Scheduler prefers pinned tasks
Package: linux_source-3.16 Version: 3.16.7-ckt11-1+deb8u4 Severity: normal The Linux CFS scheduler prefers pinned tasks and unfairly gives more CPU time to tasks that have set CPU affinity. This effect is observed with or without CGROUP controls. To demonstrate: on an otherwise idle machine, as some user run several processes pinned to each CPU, one for each CPU (as many as CPUs present in the system) e.g. for a quad-core non-HyperThreaded machine: taskset -c 0 perl -e 'while(1){1}' & taskset -c 1 perl -e 'while(1){1}' & taskset -c 2 perl -e 'while(1){1}' & taskset -c 3 perl -e 'while(1){1}' & and (as that same or some other user) run some without pinning: perl -e 'while(1){1}' & perl -e 'while(1){1}' & and use e.g. top to observe that the pinned processes get more CPU time than "fair". Fairness is obtained when either: - there are as many un-pinned processes as CPUs; or - with CGROUP controls and the two kinds of processes run by different users, when there is just one un-pinned process; or - if the pinning is turned off for these processes (or they are started without). Any insight is welcome! Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia -- System Information: Debian Release: 8.2 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 3.16.7-ckt11-pk07.13.8-amd64 (SMP w/4 CPU cores) Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) # # Automatically generated file; DO NOT EDIT. 
# Linux/x86_64 3.16.7-ckt11-pk07.13-amd64 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set CONFIG_KERNEL_BZIP2=y # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set 
CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set # CONFIG_NO_HZ_FULL is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_PREEMPT_RCU is not set CONFIG_RCU_STALL_COMMON=y # CONFIG_RCU_USER_QS is not set CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANO
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Simon,

> So if he config sparse memory, the issue can be solved I think.

In my config file I have:

CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_SPARSEMEM_STATIC=y
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_SPARSE_RCU_POINTER is not set

Is that sufficient for sparse memory, or should I try something else? Or maybe, you meant that some kernel source patches might be possible in the sparse memory code?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/201302242210.r1omadad021...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

Thanks for the repeated explanations.

> PAE was a stop-gap ...
> ... [PAE] completely untenable.

Is this a good time to withdraw PAE, to tell the world that it does not work? Maybe you should have had such comments in the code.

Seems that amd64 now works somewhat: on Debian the linux-image package is tricky to install, and linux-headers is even harder. Is there work being done to make this smoother?

---

I am still not convinced by the lowmem starvation explanation: because then PAE should have worked fine on my 3GB machine; maybe I should also try PAE on my 512MB laptop. - Though, what do I know, have not yet found the buggy line of code I believe is lurking there...

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301310907.r0v974j9017...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

> Based on your experience I might propose to change the automatic
> kernel selection for i386 so that we use 'amd64' on a system with
> 16GB RAM and a capable processor.

Don't you mean change to amd64 for 4GB (or any RAM), never using PAE? PAE is broken for any amount of RAM. More precisely, PAE with any RAM fails the sleep test:

  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done

and with 32GB fails the write test:

  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=n+1)); done

Why do you think 16GB is significant?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301312306.r0vn6tbx012...@como.maths.usyd.edu.au
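For trying the sleep test at a smaller scale first, the loop can be wrapped in a function with the count and duration as parameters. This is a sketch only: the name run_sleep_test is hypothetical, and it assumes the sleeps are meant to run in the background (otherwise 33000 sleeps of 600 seconds would run for months).

```shell
#!/bin/sh
# Sketch: start COUNT background sleeps of SECS seconds each, then wait
# for all of them. run_sleep_test is a hypothetical name; running the
# sleeps in the background is an assumption about the original test.
# $1 = COUNT, $2 = SECS
run_sleep_test() {
    count="$1"
    secs="$2"
    n=0
    while [ "$n" -lt "$count" ]; do
        sleep "$secs" &
        n=$((n + 1))
    done
    wait
    echo "started $n sleeps"
}
```

The numbers in the message correspond to run_sleep_test 33000 600; starting with something like run_sleep_test 100 5 while watching /proc/meminfo in another terminal is a gentler first step.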
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

(Removing the mailing lists linux-ker...@vger.kernel.org linux...@kvack.org from CC, as this may be of no interest to them.)

>> Seems that amd64 now works somewhat: on Debian the linux-image
>> package is tricky to install,
>
> If you do an i386 (userland) installation then you must either select
> expert mode to get a choice of kernel packages, or else install the
> 'amd64' kernel package afterward.
>
>> and linux-headers is even harder.
>
> In what way?

Something about dependencies; though some of that may also be due to my mixing of squeeze and wheezy 3.2.35 kernels. I will wait for the Debian defaults to change to amd64 before reporting these oddities.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301312309.r0vn9ftv012...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

>> PAE is broken for any amount of RAM.
>
> No it isn't.

Could I please ask you to expand on that?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201302010212.r112c6uq005...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Ben,

>>>> PAE is broken for any amount of RAM.
>>>
>>> No it isn't.
>>
>> Could I please ask you to expand on that?
>
> I already did, a few messages back.

OK, thanks. Noting however that fewer than those back, I said:

> ... PAE with any RAM fails the sleep test:
> n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done

and somewhere also said that non-PAE passes. Does not that prove that PAE is broken?

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201302010313.r113dtj3027...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Pavel and Dave,

> The assertion was that 4GB with no PAE passed a forkbomb test (ooming)
> while 4GB of RAM with PAE hung, thus _PAE_ is broken.

Yes, PAE is broken. Still, maybe the above needs slight correction: non-PAE HIGHMEM4G passed the sleep test: no OOM, nothing unexpected; whereas PAE OOMed then hung (tested with various RAM from 3GB to 64GB).

The feeling I get is that amd64 is proposed as a drop-in replacement for PAE, that support and development of PAE is gone, that PAE is dead.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301301940.r0ujeeka016...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Jonathan,

>> If you can identify where it was fixed then your patch for older
>> versions should go to stable with a reference to the upstream fix (see
>> Documentation/stable_kernel_rules.txt).
>
> How about this patch?
>
> It was applied in mainline during the 3.3 merge window, so kernels
> newer than 3.2.y shouldn't need it.
> ...
> commit ab8fabd46f811d5153d8a0cd2fac9a0d41fb593d upstream.
> ...

Yes, I believe that is the correct patch, surely better than my simple subtraction of min_free_kbytes.

Note that this does not solve all problems: the latest 3.8 kernel still crashes with OOM:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098961/comments/18

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301262023.r0qkniak029...@como.maths.usyd.edu.au
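To check which releases already carry an upstream fix such as the commit quoted above, git can list the tags that contain it. A minimal sketch, assuming a local clone of the mainline kernel tree; the helper name tags_containing and the checkout path in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: list the tags in a git checkout that contain a given commit.
# tags_containing is a hypothetical helper name.
# $1 = path to the checkout, $2 = commit id
tags_containing() {
    git -C "$1" tag --contains "$2"
}
```

For instance, tags_containing ~/src/linux ab8fabd46f811d5153d8a0cd2fac9a0d41fb593d would list every tagged release that includes the commit; a tag missing from that list marks a kernel that would still need a backport.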
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Minchan,

> So what's the effect for user?
> ...
> It seems you saw old kernel.
> ...
> Current kernel includes ...
> So I think we don't need this patch.

As I understand now, my patch is right and needed for older kernels; for newer kernels, the issue has been fixed in equivalent ways; it was an oversight that the change was not backported; and any justification you need, you can get from those later, better patches.

I asked:

> A question: what is the use or significance of vm_highmem_is_dirtyable?
> It seems odd that it would be used in setting limits or thresholds, but
> not used in decisions where to put dirty things. Is that so, is that as
> should be? What is the recommended setting of highmem_is_dirtyable?

The silence is deafening. I guess highmem_is_dirtyable is an aberration.

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301250953.r0p9rose012...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Ben,

> If you can identify where it was fixed then ...

Sorry I cannot do that. I have no idea where kernel changelogs are kept.

I am happy to do some work. Please do not call me lazy.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301252349.r0pnnfyf024...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Ben,

> ... the mm maintainers are probably much better placed ...

Exactly. Now I wonder: are you one of them?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301260307.r0q37i8q002...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
Dear Fengguang (et al),

> There are 260MB reclaimable slab pages in the normal zone, however we
> somehow failed to reclaim them. ...

Could the problem be that without CONFIG_NUMA, zone_reclaim_mode stays at zero and anyway zone_reclaim() does nothing in include/linux/swap.h ?

Though... there is no CONFIG_NUMA nor /proc/sys/vm/zone_reclaim_mode in the Ubuntu non-PAE plain HIGHMEM4G kernel, and still it handles the sleep test just fine.

Where does reclaiming happen (or meant to happen)?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301260357.r0q3vt1v005...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
Dear Fengguang,

> Or more simple, you may show us the OOM dmesg which will contain the
> number of dirty pages. ...

Do you mean kern.log lines like:

[ 744.754199] bash invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
[ 744.754202] bash cpuset=/ mems_allowed=0
[ 744.754204] Pid: 3836, comm: bash Not tainted 3.2.0-4-686-pae #1 Debian 3.2.32-1
...
[ 744.754354] active_anon:13497 inactive_anon:129 isolated_anon:0
[ 744.754354] active_file:2664 inactive_file:4144756 isolated_file:0
[ 744.754355] unevictable:0 dirty:510 writeback:0 unstable:0
[ 744.754356] free:11867217 slab_reclaimable:68289 slab_unreclaimable:7204
[ 744.754356] mapped:8066 shmem:250 pagetables:519 bounce:0
[ 744.754361] DMA free:4260kB min:784kB low:980kB high:1176kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15784kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:11628kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:499 all_unreclaimable? yes
[ 744.754364] lowmem_reserve[]: 0 867 62932 62932
[ 744.754369] Normal free:43788kB min:44112kB low:55140kB high:66168kB active_anon:0kB inactive_anon:0kB active_file:912kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:261528kB slab_unreclaimable:28812kB kernel_stack:3096kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16060 all_unreclaimable? yes
[ 744.754372] lowmem_reserve[]: 0 0 496525 496525
[ 744.754377] HighMem free:47420820kB min:512kB low:789888kB high:1579264kB active_anon:53988kB inactive_anon:516kB active_file:9740kB inactive_file:16579320kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:63555300kB mlocked:0kB dirty:2040kB writeback:0kB mapped:32260kB shmem:1000kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:2076kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 744.754380] lowmem_reserve[]: 0 0 0 0
[ 744.754381] DMA: 445*4kB 36*8kB 3*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4260kB
[ 744.754386] Normal: 1132*4kB 620*8kB 237*16kB 70*32kB 38*64kB 26*128kB 20*256kB 14*512kB 4*1024kB 3*2048kB 0*4096kB = 43808kB
[ 744.754390] HighMem: 226*4kB 242*8kB 155*16kB 66*32kB 10*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 2*2048kB 11574*4096kB = 47420680kB
[ 744.754395] 4148173 total pagecache pages
[ 744.754396] 0 pages in swap cache
[ 744.754397] Swap cache stats: add 0, delete 0, find 0/0
[ 744.754397] Free swap = 0kB
[ 744.754398] Total swap = 0kB
[ 744.900649] 16777200 pages RAM
[ 744.900650] 16549378 pages HighMem
[ 744.900651] 664304 pages reserved
[ 744.900652] 4162276 pages shared
[ 744.900653] 104263 pages non-shared

? (The above and similar were reported to http://bugs.debian.org/695182 .) Do you want me to log and report something else?

I believe the above crash may be provoked simply by running:

  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; (( n = $n + 1 )); done

on any PAE machine with over 32GB RAM. Oddly the problem does not seem to occur when using mem=32g or lower on the kernel boot line (or on machines with less than 32GB RAM).

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301242343.r0onhjxr024...@como.maths.usyd.edu.au
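In the report above, the Normal zone shows free:43788kB against min:44112kB, that is, lowmem is below its minimum watermark while HighMem is almost entirely free. A small filter can pull that comparison out of a kernel log. This is a sketch under stated assumptions: the function name zone_watermarks is hypothetical, and the pattern only matches per-zone lines of the form shown in this particular dmesg.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print, for each per-zone line,
# "zone free_kB min_kB status", where status is "below-min" when free < min.
# zone_watermarks is a hypothetical helper name.
zone_watermarks() {
    grep -oE '(DMA32|DMA|Normal|HighMem) free:[0-9]+kB min:[0-9]+kB' |
    sed -E 's/ free:([0-9]+)kB min:([0-9]+)kB/ \1 \2/' |
    awk '{ print $1, $2, $3, ($2 < $3 ? "below-min" : "ok") }'
}
```

Used as zone_watermarks < /var/log/kern.log (the log path being an assumption about the local syslog setup), it flags the zone that triggered the OOM at a glance.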
Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
Dear Jan,

> I think he found the culprit of the problem being min_free_kbytes was not
> properly reflected in the dirty throttling. ... Paul please correct me
> if I'm wrong.

Sorry but have to correct you.

I noticed and patched/corrected two problems, one with (setpoint-dirty) in bdi_position_ratio(), another with min_free_kbytes not subtracted from dirtyable memory. Fixing those problems, singly or in combination, did not help in avoiding OOM: running

  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=$n+1)); done

still produces an OOM after a few files written (on a PAE machine with over 32GB RAM).

Also, a quite similar OOM may be produced on any PAE machine with

  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done

This was tested on machines with as low as just 3GB RAM ... and curiously the same machine with plain (not PAE but HIGHMEM4G) kernel handles the same sleep test without any problems.

(Thus I now think that the remaining bug is not with writeback.)

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301250015.r0p0fr3t003...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
Dear Fengguang,

> There are 260MB reclaimable slab pages in the normal zone ...

Marked "all_unreclaimable? yes": is that wrong? Question asked also in:
http://marc.info/?l=linux-mm&m=135873981326767&w=2

> ... however we somehow failed to reclaim them. ...

I made a patch that would do a drop_caches at that point, please see:
http://bugs.debian.org/695182
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;filename=drop_caches.patch;att=1;bug=695182
http://marc.info/?l=linux-mm&m=135785511125549&w=2
and that successfully avoided OOM when writing files. But, the drop_caches patch did not protect against the sleep test.

> ... What's your filesystem and the content of /proc/slabinfo?

Filesystem is EXT3. See output of slabinfo in Debian bug above or in
http://marc.info/?l=linux-mm&m=135796154427544&w=2

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201301250147.r0p1l00t001...@como.maths.usyd.edu.au
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
Dear Minchan,

> So what's the effect for user?

Sorry I have no idea.

The kernel seems to work well without this patch; or in fact not so
well, PAE crashing with spurious OOM. In my fruitless efforts at
avoiding OOM by sensible choices of sysctl tunables, I noticed that
maybe the treatment of min_free_kbytes was not right. Getting this
right did not help in avoiding OOM.

> It seems you saw old kernel.

Yes I have Debian on my machines. :-)

> Current kernel includes following logic.
>
> static unsigned long global_dirtyable_memory(void)
> {
>         unsigned long x;
>
>         x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
>         x -= min(x, dirty_balance_reserve);
>
>         if (!vm_highmem_is_dirtyable)
>                 x -= highmem_dirtyable_memory(x);
>
>         return x + 1;   /* Ensure that we never return 0 */
> }
>
> And dirty_balance_reserve already includes high_wmark_pages.
> Look at calculate_totalreserve_pages.
>
> So I think we don't need this patch.
> Thanks.

Presumably, dirty_balance_reserve takes min_free_kbytes into account?
Then I agree that this patch is not needed on those newer kernels.

A question: what is the use or significance of vm_highmem_is_dirtyable?
It seems odd that it would be used in setting limits or thresholds, but
not used in decisions about where to put dirty things. Is that so, is
that as should be? What is the recommended setting of
highmem_is_dirtyable?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301230311.r0n3bpde032...@como.maths.usyd.edu.au
Bug#695182: Write couple of 1GB files for OOM crash
Dear Jonathan,

Thanks again for your help in writing correct and acceptable patches.
I did change a few things, hopefully for the better.

I decided not to push the drop-caches part of my patch, because it now
seems to me that it is not the essence of the issue: it protects against
OOM when writing a few files, but does not protect when running a few
sleeps. I am coming back to the idea that this is some
signed-vs-unsigned or similar issue... though I could not find it yet!

---

Using the amd64 kernel seems a workable workaround for the OOM issue.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301211139.r0lbdpee003...@como.maths.usyd.edu.au
Bug#695182: [PATCH] MAX_PAUSE to be at least 4
Ensure MAX_PAUSE is 4 or larger, so limits in
	return clamp_val(t, 4, MAX_PAUSE);
(the only use of it) are not back-to-front.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- mm/page-writeback.c.old	2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c	2013-01-21 13:57:05.0 +1100
@@ -39,7 +39,7 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.

--
Archive: http://lists.debian.org/201301210307.r0l37yug018...@como.maths.usyd.edu.au
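The "back-to-front" limits mentioned in the patch description can be sketched in plain C. This is only an illustration, not kernel code: clamp_long(), max_pause_old(), max_pause_new(), pause_old() and pause_new() are hypothetical helpers, hz is an ordinary parameter standing in for HZ, and the clamp is assumed to raise the value to the lower bound first and then cap it at the upper bound.

```c
/* Typical clamp semantics (assumed): raise to lo first, then cap at hi. */
static long clamp_long(long val, long lo, long hi)
{
	long v = val < lo ? lo : val;
	return v > hi ? hi : v;
}

/* MAX_PAUSE before the patch: max(HZ/5, 1). */
static long max_pause_old(long hz)
{
	return hz / 5 > 1 ? hz / 5 : 1;
}

/* MAX_PAUSE after the patch: max(HZ/5, 4). */
static long max_pause_new(long hz)
{
	return hz / 5 > 4 ? hz / 5 : 4;
}

/* The shape of "return clamp_val(t, 4, MAX_PAUSE);" with each definition. */
long pause_old(long t, long hz)
{
	return clamp_long(t, 4, max_pause_old(hz));
}

long pause_new(long t, long hz)
{
	return clamp_long(t, 4, max_pause_new(hz));
}
```

With a deliberately small illustrative hz of 10, the old definition gives an upper bound of 2, below the lower bound of 4, so pause_old() can return less than the intended minimum; with hz of 100 or more both variants should agree.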
Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory
When calculating the amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.

Using an extern int because that is the only interface to some such
sysctl values.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- mm/page-writeback.c.old	2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c	2013-01-21 13:57:05.0 +1100
@@ -343,12 +343,16 @@
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
+	extern int min_free_kbytes;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
 
+	/* Subtract min_free_kbytes */
+	x -= min(x, min_free_kbytes >> (PAGE_SHIFT - 10));
+
 	return x + 1;	/* Ensure that we never return 0 */
 }
 

--
Archive: http://lists.debian.org/201301210315.r0l3fngv021...@como.maths.usyd.edu.au
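The arithmetic in the patch above (kilobytes converted to pages, then subtracted without letting an unsigned value wrap) can be sketched as standalone C. This is an illustration under stated assumptions, not the kernel function: SKETCH_PAGE_SHIFT assumes 4 KiB pages, and kbytes_to_pages(), subtract_reserve() and dirtyable_after_reserve() are hypothetical names.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on i386 */

/* Convert a size in KiB to whole pages: shift right by (PAGE_SHIFT - 10). */
unsigned long kbytes_to_pages(unsigned long kbytes)
{
	return kbytes >> (SKETCH_PAGE_SHIFT - 10);
}

/* Saturating subtraction in the style of "x -= min(x, reserve)". */
unsigned long subtract_reserve(unsigned long pages, unsigned long reserve)
{
	unsigned long take = reserve < pages ? reserve : pages;
	return pages - take;
}

/* Pages left after setting the reserve aside; +1 so the result is never 0. */
unsigned long dirtyable_after_reserve(unsigned long pages,
				      unsigned long min_free_kbytes)
{
	return subtract_reserve(pages, kbytes_to_pages(min_free_kbytes)) + 1;
}
```

The min() form matters because the counters are unsigned: subtracting a reserve larger than the current value directly would wrap around to a huge number rather than going to zero.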
Bug#695182: [RFC] Comments and questions
Many comments and questions:

 - In __alloc_pages_slowpath(), did_some_progress is set twice but only
   checked after the second setting, so the first setting is wasted.
 - [Setting of MAX_PAUSE reported previously.]
 - The setting of highmem_is_dirtyable seems used only to calculate
   limits and thresholds, not used in any decisions: seems odd.
 - [Subtraction of min_free_kbytes reported previously.]
 - Sanity check of input values in bdi_position_ratio().
 - [Difference (setpoint-dirty) reported previously.]
 - Seems that bdi_max_pause() always returns a too-small value, maybe
   it should simply return a fixed value.
 - A test in balance_dirty_pages() marked unlikely() observed to be
   quite common.
 - Maybe zone_reclaimable() should return true with non-zero
   NR_SLAB_RECLAIMABLE.
 - Seems that all_unreclaimable may be set wrongly or too early.
 - Maybe global_reclaimable_pages() and zone_reclaimable_pages() should
   add or include NR_SLAB_RECLAIMABLE.

(This does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- mm/page_alloc.c.old	2012-12-06 22:20:40.0 +1100
+++ mm/page_alloc.c	2013-01-18 14:07:31.0 +1100
@@ -2207,6 +2207,10 @@ rebalance:
 	 * If we failed to make any progress reclaiming, then we are
 	 * running out of options and have to consider going OOM
 	 */
+	/*
+	 * We had did_some_progress set twice, but is only checked here
+	 * so the first setting was lost. Is that as should be?
+	 */
 	if (!did_some_progress) {
 		if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
 			if (oom_killer_disabled)
--- mm/page-writeback.c.old	2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c	2013-01-20 07:35:52.0 +1100
@@ -39,7 +39,7 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
@@ -343,12 +343,22 @@ static unsigned long highmem_dirtyable_m
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
+	extern int min_free_kbytes;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+	/*
+	 * Seems that highmem_is_dirtyable is only used here, in the
+	 * calculation of limits and threshholds of dirtiness, not in deciding
+	 * where to put dirty things. Is that so? Is that as should be?
+	 * What is the recommended setting of highmem_is_dirtyable?
+	 */
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
 
+	/* Subtract min_free_kbytes */
+	x -= min(x, min_free_kbytes >> (PAGE_SHIFT - 10));
+
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
@@ -541,6 +551,9 @@ static unsigned long bdi_position_ratio(
 
 	if (unlikely(dirty >= limit))
 		return 0;
+	/* Never seen this happen, just sanity-check paranoia */
+	if (unlikely(freerun >= dirty))
+		return 16 << RATELIMIT_CALC_SHIFT;
 
 	/*
 	 * global setpoint
@@ -559,7 +572,7 @@ static unsigned long bdi_position_ratio(
 	 *     => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -995,6 +1008,13 @@ static unsigned long bdi_max_pause(struc
 	 * The pause time will be settled within range (max_pause/4, max_pause).
 	 * Apply a minimal value of 4 to get a non-zero max_pause/4.
 	 */
+	/*
+	 * On large machine it seems we always return 4,
+	 * on smaller desktop machine mostly return 5 (rarely 9 or 14).
+	 * Are those too small? Should we return something fixed e.g.
+	return (HZ/10);
+	 * instead of this wasted/useless calculation?
+	 */
 	return clamp_val(t, 4, MAX_PAUSE);
 }
 
@@ -1109,6 +1129,11 @@ static void balance_dirty_pages(struct a
 		}
 		pause = HZ * pages_dirtied / task_ratelimit;
 		if (unlikely(pause <= 0)) {
+			/*
+			 * Not unlikely: often we get zero.
+			 * Seems we always get 0 on large machine.
+			 * Should not do a pause of 1 here?
+			 */
 			trace_balance_dirty_pages(bdi, dirty_thresh
Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()
In bdi_position_ratio(), get difference (setpoint-dirty) right even when
negative. Both setpoint and dirty are unsigned long; the difference was
zero-padded when widened to s64, instead of being sign-extended. This
issue affects all 32-bit architectures; it does not affect 64-bit
architectures, where long and s64 are equivalent.

In this function, dirty is between freerun and limit, and the
pseudo-float x is in [-1,1], expected to be negative about half the
time. With zero-padding, instead of a small negative x we obtained a
large positive one, so bdi_position_ratio() returned garbage.

Casting the difference to s64 also prevents overflow with left-shift;
though normally these numbers are small and I never observed a 32-bit
overflow there.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- mm/page-writeback.c.old	2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c	2013-01-20 07:47:55.0 +1100
@@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(
 	 *     => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

--
Archive: http://lists.debian.org/20130122.r0k02atl031...@como.maths.usyd.edu.au
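The zero-padding effect described in this patch can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: uint32_t stands in for the 32-bit unsigned long, SKETCH_CALC_SHIFT is an assumed stand-in for RATELIMIT_CALC_SHIFT, and scaled_diff_unpatched() and scaled_diff_patched() are hypothetical names.

```c
#include <stdint.h>

#define SKETCH_CALC_SHIFT 10	/* assumption: stands in for RATELIMIT_CALC_SHIFT */

/* Unpatched shape: subtract and shift in 32-bit unsigned arithmetic, then widen. */
int64_t scaled_diff_unpatched(uint32_t setpoint, uint32_t dirty)
{
	uint32_t narrow = (uint32_t)((setpoint - dirty) << SKETCH_CALC_SHIFT);
	return (int64_t)narrow;	/* zero-extended, so never negative */
}

/* Patched shape: widen each operand to signed 64 bits before subtracting. */
int64_t scaled_diff_patched(uint32_t setpoint, uint32_t dirty)
{
	int64_t diff = (int64_t)setpoint - (int64_t)dirty;
	/* Multiply instead of shifting: left-shifting a negative value is
	 * undefined in ISO C, while the multiplication is well defined here. */
	return diff * ((int64_t)1 << SKETCH_CALC_SHIFT);
}
```

For example, with setpoint 1000 and dirty 1010 the patched form yields a small negative value, while the unpatched form yields a positive value close to 2^32; when dirty is below setpoint the two forms agree.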
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave, On my large machine, 'free' fails to show about 2GB memory ... You probably have a memory hole. ... The e820 map (during early boot in dmesg) or /proc/iomem will let you locate your memory holes. Now that my machine is running an amd64 kernel, 'free' shows total Mem 65854128 (up from 64447796 with PAE kernel), and I do not see much change in /proc/iomem output (below). Is that as should be? Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- root@zeno:~# uname -a Linux zeno.maths.usyd.edu.au 3.2.35-pk06.12-amd64 #2 SMP Thu Jan 17 13:19:53 EST 2013 x86_64 GNU/Linux root@zeno:~# free total used free sharedbuffers cached Mem: 658541281591704 64262424 0 227036 175620 -/+ buffers/cache:1189048 64665080 Swap:195312636 0 195312636 root@zeno:~# cat /proc/iomem - : reserved 0001-00099bff : System RAM 00099c00-0009 : reserved 000a-000b : PCI Bus :00 000c-000d : PCI Bus :00 000c-000c7fff : Video ROM 000c8000-000cf5ff : Adapter ROM 000cf800-000d07ff : Adapter ROM 000d0800-000d0bff : Adapter ROM 000e-000f : reserved 000f-000f : System ROM 0010-7e445fff : System RAM 0100-0168f8c6 : Kernel code 0168f8c7-018f24bf : Kernel data 0197d000-019dafff : Kernel bss 7e446000-7e565fff : ACPI Non-volatile Storage 7e566000-7f1e2fff : reserved 7f1e3000-7f25efff : ACPI Tables 7f25f000-7f31cfff : reserved 7f31d000-7f323fff : ACPI Non-volatile Storage 7f324000-7f333fff : reserved 7f334000-7f33bfff : ACPI Non-volatile Storage 7f33c000-7f365fff : reserved 7f366000-7f7f : ACPI Non-volatile Storage 7f80-7fff : RAM buffer 8000-dfff : PCI Bus :00 8000-8fff : PCI MMCONFIG [bus 00-ff] 8000-8fff : reserved 9000-900f : :00:16.0 9010-901f : :00:16.1 dd00-ddff : PCI Bus :08 dd00-ddff : :08:03.0 de00-de4f : PCI Bus :07 de00-de3f : :07:00.0 de47c000-de47 : :07:00.0 de60-de6f : PCI Bus :02 df00-df8f : PCI Bus :08 df00-df7f : :08:03.0 df80-df803fff : :08:03.0 df90-df9f : PCI Bus :07 dfa0-dfaf : PCI 
Bus :02 dfa0-dfa1 : :02:00.1 dfa0-dfa1 : igb dfa2-dfa3 : :02:00.0 dfa2-dfa3 : igb dfa4-dfa43fff : :02:00.1 dfa4-dfa43fff : igb dfa44000-dfa47fff : :02:00.0 dfa44000-dfa47fff : igb dfb0-dfb03fff : :00:04.7 dfb04000-dfb07fff : :00:04.6 dfb08000-dfb0bfff : :00:04.5 dfb0c000-dfb0 : :00:04.4 dfb1-dfb13fff : :00:04.3 dfb14000-dfb17fff : :00:04.2 dfb18000-dfb1bfff : :00:04.1 dfb1c000-dfb1 : :00:04.0 dfb2-dfb200ff : :00:1f.3 dfb21000-dfb217ff : :00:1f.2 dfb21000-dfb217ff : ahci dfb22000-dfb223ff : :00:1d.0 dfb22000-dfb223ff : ehci_hcd dfb23000-dfb233ff : :00:1a.0 dfb23000-dfb233ff : ehci_hcd dfb25000-dfb25fff : :00:05.4 dfffc000-dfffdfff : pnp 00:02 e000-fbff : PCI Bus :80 fbe0-fbef : PCI Bus :84 fbe0-fbe3 : :84:00.0 fbe4-fbe5 : :84:00.0 fbe6-fbe63fff : :84:00.0 fbf0-fbf03fff : :80:04.7 fbf04000-fbf07fff : :80:04.6 fbf08000-fbf0bfff : :80:04.5 fbf0c000-fbf0 : :80:04.4 fbf1-fbf13fff : :80:04.3 fbf14000-fbf17fff : :80:04.2 fbf18000-fbf1bfff : :80:04.1 fbf1c000-fbf1 : :80:04.0 fbf2-fbf20fff : :80:05.4 fbffe000-fbff : pnp 00:12 fc00-fcff : pnp 00:01 fd00-fdff : pnp 00:01 fe00-feaf : pnp 00:01 feb0-febf : pnp 00:01 fec0-fec003ff : IOAPIC 0 fec01000-fec013ff : IOAPIC 1 fec4-fec403ff : IOAPIC 2 fed0-fed003ff : HPET 0 fed08000-fed08fff : pnp 00:0c fed1c000-fed3 : reserved fed1c000-fed1 : pnp 00:0c fed45000-fedf : pnp 00:01 fee0-fee00fff : Local APIC ff00- : reserved ff00- : pnp 00:0c 1-107fff : System RAM root@zeno:~# --- For comparison, output obtained (and reported previously) when machine was running PAE kernel: root@zeno:~# cat /proc/iomem - : reserved 0001-00099bff : System RAM 00099c00-0009 : reserved 000a-000b : PCI Bus :00 000a-000b : Video RAM area 000c-000d : PCI Bus :00 000c-000c7fff : Video ROM 000c8000-000cf5ff : Adapter ROM 000cf800-000d07ff : Adapter ROM 000d0800-000d0bff : Adapter ROM 000e-000f : reserved 000f-000f : System ROM 0010-7e445fff : System RAM 0100-01610e15 : Kernel code 01610e16-01802dff : Kernel data 0188-018b2fff : Kernel bss 7e446000-7e565fff : ACPI
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Sedat,

> ... it really makes sense to switch to x86_64 (amd64) architecture
> when you have a modern computer. Switching makes even more sense when
> you have more than 4GiB RAM.

You seem to say that one should switch to amd64 (if hardware allows),
even with less than 4GB RAM (where a 32-bit non-PAE HIGHMEM4G kernel
would work fine), and that one should definitely switch with over 4GB
RAM. There would be no need or use for PAE kernels, which should be
dropped. I think I agree.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

---

Quoting in full for the benefit of 695...@bugs.debian.org :

From sedat.di...@gmail.com Tue Jan 15 21:26:14 2013
Date: Tue, 15 Jan 2013 11:25:41 +0100
Subject: Re: [RFC] Reproducible OOM with just a few sleeps
From: Sedat Dilek sedat.di...@gmail.com
To: paul.sz...@sydney.edu.au, Paul Szabo p...@maths.usyd.edu.au
Cc: LKML linux-ker...@vger.kernel.org, linux-mm linux...@kvack.org,
    Ben Hutchings b...@decadent.org.uk

Hi Paul,

I followed a bit the thread you started in [1].

As you might know i386 got eliminated in Linux-3.8.
I had several discussions with the Debian kernel-team about the iN86
(N=4..6) and PAE kernel-flavours.
On the one hand I can understand the reduction of linux-images
especially for iN86.
Even i486 is a bit unfirm as there is no much hardware around, but
Debian will keep i486 for a while (release maintenance).

Topic PAE:
Unfortunately, I had a notebook with a Intel Centrino Banias CPU (no
PAE) which should use the -486 kernel-flavour due to the Debian
kernel-team.
I played with some different kernel-setup which did not give me more
benefit (openssl benchmarks etc.)
The -686-pae kernel did run on my hardware, but as known with all the
SMP-NO-OPs.

Depending on the hardware, it really makes sense to switch to x86_64
(amd64) architecture when you have a modern computer.
Switching makes even more sense when you have more than 4GiB RAM.
IMHO using a -686-amd64 Debian kernel makes ZERO sense, real 64-Bit or
die!

I switched to 64-bit... and I switched from Debian/sid to
Ubuntu/precise as well :-).
( NOTE: I am working here since April 2012 in a WUBI environment (no
native Ubuntu Linux) :-). )
And I am building my kernels by myself. So I know very well whom to
blame :-).

Some last words:
I had several fruitful or fruitless discussions with the Debian
kernel-team, but I can confirm (with all my heart) this team makes a
fantastic job.
I can recommend you Ben's blog (recently I read a series about news in
the Debian/wheezy kernel) if your world is Debian or Ubuntu (Debian !=
Ubuntu).

Just my 0.02EUR (no British pound, here as well: when you are a member
of the EU chose EUR not pound!).

Regards,
- Sedat -

[1] http://marc.info/?t=13579617221&r=1&w=2

--
Archive: http://lists.debian.org/201301160022.r0g0mdgj010...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave,

>> Seems that any i386 PAE machine will go OOM just by running a few
>> processes. To reproduce:
>>   sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
>> ...
> I think what you're seeing here is that, as the amount of total memory
> increases, the amount of lowmem available _decreases_ due to inflation
> of mem_map[] (and a few other more minor things). The number of sleeps
> you can do is bound by the number of processes, as you noticed from
> ulimit. Creating processes that don't use much memory eats a
> relatively large amount of low memory.
> This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
> RAM bottlenecks on 32-bit systems.

I understand that more RAM leaves less lowmem. What is unacceptable is
that PAE crashes or freezes with OOM: it should gracefully handle the
issue. Noting that (for a machine with 4GB or under) PAE fails where the
HIGHMEM4G kernel succeeds and survives.

>> On my large machine, 'free' fails to show about 2GB memory ...
> You probably have a memory hole. ...
> The e820 map (during early boot in dmesg) or /proc/iomem will let you
> locate your memory holes.

Thanks, that might explain it. Output of /proc/iomem below: sorry I do
not know how to interpret it.
Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- root@zeno:~# cat /proc/iomem - : reserved 0001-00099bff : System RAM 00099c00-0009 : reserved 000a-000b : PCI Bus :00 000a-000b : Video RAM area 000c-000d : PCI Bus :00 000c-000c7fff : Video ROM 000c8000-000cf5ff : Adapter ROM 000cf800-000d07ff : Adapter ROM 000d0800-000d0bff : Adapter ROM 000e-000f : reserved 000f-000f : System ROM 0010-7e445fff : System RAM 0100-01610e15 : Kernel code 01610e16-01802dff : Kernel data 0188-018b2fff : Kernel bss 7e446000-7e565fff : ACPI Non-volatile Storage 7e566000-7f1e2fff : reserved 7f1e3000-7f25efff : ACPI Tables 7f25f000-7f31cfff : reserved 7f31d000-7f323fff : ACPI Non-volatile Storage 7f324000-7f333fff : reserved 7f334000-7f33bfff : ACPI Non-volatile Storage 7f33c000-7f365fff : reserved 7f366000-7f7f : ACPI Non-volatile Storage 7f80-7fff : RAM buffer 8000-dfff : PCI Bus :00 8000-8fff : PCI MMCONFIG [bus 00-ff] 8000-8fff : reserved 9000-900f : :00:16.0 9010-901f : :00:16.1 dd00-ddff : PCI Bus :08 dd00-ddff : :08:03.0 de00-de4f : PCI Bus :07 de00-de3f : :07:00.0 de47c000-de47 : :07:00.0 de60-de6f : PCI Bus :02 df00-df8f : PCI Bus :08 df00-df7f : :08:03.0 df80-df803fff : :08:03.0 df90-df9f : PCI Bus :07 dfa0-dfaf : PCI Bus :02 dfa0-dfa1 : :02:00.1 dfa0-dfa1 : igb dfa2-dfa3 : :02:00.0 dfa2-dfa3 : igb dfa4-dfa43fff : :02:00.1 dfa4-dfa43fff : igb dfa44000-dfa47fff : :02:00.0 dfa44000-dfa47fff : igb dfb0-dfb03fff : :00:04.7 dfb04000-dfb07fff : :00:04.6 dfb08000-dfb0bfff : :00:04.5 dfb0c000-dfb0 : :00:04.4 dfb1-dfb13fff : :00:04.3 dfb14000-dfb17fff : :00:04.2 dfb18000-dfb1bfff : :00:04.1 dfb1c000-dfb1 : :00:04.0 dfb2-dfb200ff : :00:1f.3 dfb21000-dfb217ff : :00:1f.2 dfb21000-dfb217ff : ahci dfb22000-dfb223ff : :00:1d.0 dfb22000-dfb223ff : ehci_hcd dfb23000-dfb233ff : :00:1a.0 dfb23000-dfb233ff : ehci_hcd dfb25000-dfb25fff : :00:05.4 dfffc000-dfffdfff : pnp 00:02 e000-fbff : PCI Bus 
:80 fbe0-fbef : PCI Bus :84 fbe0-fbe3 : :84:00.0 fbe4-fbe5 : :84:00.0 fbe6-fbe63fff : :84:00.0 fbf0-fbf03fff : :80:04.7 fbf04000-fbf07fff : :80:04.6 fbf08000-fbf0bfff : :80:04.5 fbf0c000-fbf0 : :80:04.4 fbf1-fbf13fff : :80:04.3 fbf14000-fbf17fff : :80:04.2 fbf18000-fbf1bfff : :80:04.1 fbf1c000-fbf1 : :80:04.0 fbf2-fbf20fff : :80:05.4 fbffe000-fbff : pnp 00:12 fc00-fcff : pnp 00:01 fd00-fdff : pnp 00:01 fe00-feaf : pnp 00:01 feb0-febf : pnp 00:01 fec0-fec003ff : IOAPIC 0 fec01000-fec013ff : IOAPIC 1 fec4-fec403ff : IOAPIC 2 fed0-fed003ff : HPET 0 fed08000-fed08fff : pnp 00:0c fed1c000-fed3 : reserved fed1c000-fed1 : pnp 00:0c fed45000-fedf : pnp 00:01 fee0-fee00fff : Local APIC ff00- : reserved ff00- : pnp 00:0c 1-107fff : System RAM root@zeno:~# -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/201301142036
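The point quoted in the message above, that more RAM leaves less lowmem because mem_map[] grows with the number of physical pages, can be put into rough numbers. The sketch below is back-of-envelope arithmetic only: the 4 KiB page size, the 32-byte per-page descriptor and the 896 MiB lowmem figure used in the note are assumptions for illustration (the real values depend on kernel configuration), and mem_map_bytes() and lowmem_left() are hypothetical helpers.

```c
#include <stdint.h>

/* Bytes taken by mem_map[]: one descriptor per physical page. */
uint64_t mem_map_bytes(uint64_t ram_bytes, uint64_t page_size,
		       uint64_t descriptor_size)
{
	return (ram_bytes / page_size) * descriptor_size;
}

/* Lowmem left after mem_map[], or 0 if mem_map[] alone would exceed it. */
uint64_t lowmem_left(uint64_t lowmem_bytes, uint64_t ram_bytes,
		     uint64_t page_size, uint64_t descriptor_size)
{
	uint64_t used = mem_map_bytes(ram_bytes, page_size, descriptor_size);
	return used < lowmem_bytes ? lowmem_bytes - used : 0;
}
```

Under these assumptions a 64 GiB machine would spend about 512 MiB of lowmem on mem_map[] alone, against about 32 MiB for a 4 GiB machine, which would leave roughly 384 MiB of an assumed 896 MiB of lowmem for everything else.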
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Dave,

>> ... What is unacceptable is that PAE crashes or freezes with OOM:
>> it should gracefully handle the issue. Noting that (for a machine
>> with 4GB or under) PAE fails where the HIGHMEM4G kernel succeeds ...
>
> You have found a delta, but you're not really making apples-to-apples
> comparisons. The page tables ...

I understand that the exact sizes of page tables are very important to
developers. To the rest of us, all that matters is that the kernel moves
them to highmem or swap or whatever, that it maybe emits some error
message but that it does not crash or freeze.

> There's probably a bug here. But, it's incredibly unlikely to be seen
> in practice on anything resembling a modern system. ...

Probably. I found the bug on a very modern and brand-new system, just
trying to copy a few ISO image files and trying to log in a hundred
students. My machine crashed under those very practical and normal
circumstances. The demos with dd and sleep were just that: easily
reproducible demos.

> ... easily worked around by upgrading to a 64-bit kernel ...

Do you mean that PAE should never be used, but to use amd64 instead?

> ... Raising the vm.min_free_kbytes sysctl (to perhaps 10x of
> its current value on your system) is likely to help the hangs too,
> although it will further consume lowmem.

I have tried that, it did not work. As you say, it is backward.

> ... for a bug with ... so many reasonable workarounds ...

Only one workaround was proposed: use amd64.

PAE is buggy and useless, should be deprecated and removed.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301150216.r0f2gnyw022...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Dave,

You wrote:

> ... 64-bit kernels should basically be drop-in replacements for 32-bit
> ones. You can keep userspace 100% 32-bit, and just have a 64-bit kernel.

Any advice on how I would install a 64-bit kernel, particularly in the
Debian world? Seems to me that on a 32-bit machine, apt-get does not
see the amd64 kernels.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301132332.r0dnwlra027...@como.maths.usyd.edu.au
Bug#695182: Installing a 64-bit kernel on Debian i386 (Re: [RFC] Reproducible OOM with partial workaround)
Dear Jonathan,

> B) The modern way:
>
>	dpkg --add-architecture amd64
>	apt-get update
>	apt-get install linux-image-3.2.0-4-amd64:amd64

Thanks, that seems to have worked well on my desktop PC. I will now test
that everything still works, then similarly convert all my machines,
hopefully so abandoning buggy PAE.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301140041.r0e0f77p014...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
The issue is a regression with PAE, reproduced and verified on Ubuntu,
on my home PC with 3GB RAM.

My PC was running kernel linux-image-3.2.0-35-generic so it showed:

psz@DellE520:~$ uname -a
Linux DellE520 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:45:18 UTC 2012 i686 i686 i386 GNU/Linux
psz@DellE520:~$ free -l
             total       used       free     shared    buffers     cached
Mem:       3087972     692256    2395716          0      18276     427116
Low:        861464      71372     790092
High:      2226508     620884    1605624
-/+ buffers/cache:     246864    2841108
Swap:     20000920     258364   19742556

Then it handled the sleep test

bash -c 'n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); ((m=n%500)); if [ $m -lt 1 ]; then echo -n $n - ; date; free -l; sleep 1; fi; done'

just fine: it was stopped only by the max user processes limit (default
setting of ulimit -u 23964) or, after raising that limit, when the
machine ran out of PID space; there was no OOM.

Installing and running the PAE kernel so it showed:

psz@DellE520:~$ uname -a
Linux DellE520 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5 18:04:39 UTC 2012 i686 i686 i386 GNU/Linux
psz@DellE520:~$ free -l
             total       used       free     shared    buffers     cached
Mem:       3087620     681188    2406432          0     167332     352296
Low:        865208     214080     651128
High:      2222412     467108    1755304
-/+ buffers/cache:     161560    2926060
Swap:     20000920          0   20000920

and re-trying the sleep test, it ran into OOM after 18000 or so sleeps
and crashed/froze so I had to press the POWER button to recover.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301121941.r0cjf5ps017...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Reported to Ubuntu also:

  PAE regression: OOM with just a few sleeps
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098961

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301122020.r0ckk04m018...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Andrew,

> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.

Please see below: I do not know what any of that means. This machine has
been running just fine, with all my users logging in here via XDMCP from
X-terminals, dozens logged in simultaneously. (But, I think I could make
it go OOM with more processes or logins.)

> If so, you *may* be able to work around this by setting
> /proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
> amount of dirty pagecache around. Then, with luck, if we haven't
> broken the buffer_heads_over_limit logic it in the past decade (we
> probably have), the VM should be able to reclaim those buffer_heads.

I tried setting dirty_ratio to funny values, that did not seem to help.
Did you notice my patch about bdi_position_ratio(), how it was plain
wrong half the time (for negative x)? Anyway that did not help.

> Alternatively, use a filesystem which doesn't attach buffer_heads to
> dirty pages. xfs or btrfs, perhaps.

Seems there is also a problem not related to filesystem... or rather,
the essence does not seem to be filesystem or caches. The filesystem
thing now seems OK with my patch doing drop_caches.
Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- root@como:~# free -lm total used free sharedbuffers cached Mem: 62936 2317 60618 0 41635 Low: 367271 95 High:62569 2045 60523 -/+ buffers/cache: 1640 61295 Swap: 131071 0 131071 root@como:~# cat /proc/slabinfo slabinfo - version: 2.1 # nameactive_objs num_objs objsize objperslab pagesperslab : tunables limit batchcount sharedfactor : slabdata active_slabs num_slabs sharedavail fuse_request 0 0376 434 : tunables000 : slabdata 0 0 0 fuse_inode 0 0448 364 : tunables000 : slabdata 0 0 0 bsg_cmd0 0288 282 : tunables000 : slabdata 0 0 0 ntfs_big_inode_cache 0 0512 324 : tunables000 : slabdata 0 0 0 ntfs_inode_cache 0 0176 462 : tunables000 : slabdata 0 0 0 nfs_direct_cache 0 0 80 511 : tunables000 : slabdata 0 0 0 nfs_inode_cache 5404 5404584 284 : tunables000 : slabdata193193 0 isofs_inode_cache 0 0360 454 : tunables000 : slabdata 0 0 0 fat_inode_cache0 0408 404 : tunables000 : slabdata 0 0 0 fat_cache 0 0 24 1701 : tunables000 : slabdata 0 0 0 jbd2_revoke_record 0 0 32 1281 : tunables000 : slabdata 0 0 0 journal_handle 5440 5440 24 1701 : tunables000 : slabdata 32 32 0 journal_head 16768 16768 64 641 : tunables000 : slabdata262262 0 revoke_record 20224 20224 16 2561 : tunables000 : slabdata 79 79 0 ext4_inode_cache 0 0584 284 : tunables000 : slabdata 0 0 0 ext4_free_data 0 0 40 1021 : tunables000 : slabdata 0 0 0 ext4_allocation_context 0 0112 361 : tunables00 0 : slabdata 0 0 0 ext4_prealloc_space 0 0 72 561 : tunables000 : slabdata 0 0 0 ext4_io_end0 0576 284 : tunables000 : slabdata 0 0 0 ext4_io_page 0 0 8 5121 : tunables000 : slabdata 0 0 0 ext2_inode_cache 0 0480 344 : tunables000 : slabdata 0 0 0 ext3_inode_cache 16531 19965488 334 : tunables000 : slabdata605605 0 ext3_xattr 0 0 48 851 : tunables000 : slabdata 0 0 0 dquot840840192 422 : tunables000 : slabdata 20 20 0 rpc_inode_cache 144144448 364 : tunables000 
: slabdata 4 4 0 UDP-Lite 0 0576 284 : tunables000 : slabdata 0 0 0 xfrm_dst_cache 0 0320 514 : tunables000 : slabdata 0 0 0 UDP 896896576 284 : tunables000 : slabdata 32 32 0 tw_sock_TCP 1344 1344128 321
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Andrew,

>>> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
>> Please see below ...
> ... Was this dump taken when the system was at or near oom?

No, that was a quiescent machine. Please see a just-before-OOM dump in
my next message (in a little while).

> Please send a copy of the oom-killer kernel message dump, if you still
> have one.

Please see one in next message, or in
http://bugs.debian.org/695182

>> I tried setting dirty_ratio to funny values, that did not seem to help.
> Did you try setting it as low as possible?

Probably. Maybe. Sorry, cannot say with certainty.

>> Did you notice my patch about bdi_position_ratio(), how it was
>> plain wrong half the time (for negative x)?
> Nope, please resend.

Quoting from
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182
:

...
 - In bdi_position_ratio() get difference (setpoint-dirty) right even
   when it is negative, which happens often. Normally these numbers are
   small and even with left-shift I never observed a 32-bit overflow.
   I believe it should be possible to re-write the whole function in
   32-bit ints; maybe it is not worth the effort to make it efficient;
   seeing how this function was always wrong and we survived, it should
   simply be removed.
...
--- mm/page-writeback.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c	2013-01-06 21:54:59.0 +1100
[ Line numbers out because other patches not shown ]
...
@@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio(
 	 *     => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
...

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

--
Archive: http://lists.debian.org/201301120324.r0c3o7dy015...@como.maths.usyd.edu.au
Bug#695182: [RFC] Reproducible OOM with just a few sleeps
Dear Linux-MM,

Seems that any i386 PAE machine will go OOM just by running a few processes. To reproduce:

  sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'

My machine has 64GB RAM. With previous OOM episodes, it seemed that running (booting) it with mem=32G might avoid OOM; but an OOM was obtained just the same, and also with lower memory:

  Memory       sleeps to OOM   free shows total
  (mem=64G)         5300          64447796
  mem=32G          10200          31155512
  mem=16G          13400          14509364
  mem=8G           14200           6186296
  mem=6G           15200           4105532
  mem=4G           16400           2041364

The machine does not run out of highmem, nor does it use any swap.

Comparing with my desktop PC: has 4GB RAM installed, free shows 3978592 total. Running the sleep test, it simply froze after 16400 running... no response to ping, will need to press the RESET button.

---

On my large machine, 'free' fails to show about 2GB memory, e.g. with mem=16G it shows:

  root@zeno:~# free -l
               total       used       free     shared    buffers     cached
  Mem:      14509364     435440   14073924          0       4068     111328
  Low:        769044     120232     648812
  High:     13740320     315208   13425112
  -/+ buffers/cache:     320044   14189320
  Swap:    134217724          0  134217724

---

Please let me know of any ideas, or if you want me to run some other test or want to see some other output.
Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia - Details for when my machine was running with 64GB RAM: In another window I was running cat /proc/slabinfo; free -l repeatedly, and output of that (just before OOM) was: + cat /proc/slabinfo slabinfo - version: 2.1 # nameactive_objs num_objs objsize objperslab pagesperslab : tunables limit batchcount sharedfactor : slabdata active_slabs num_slabs sharedavail fuse_request 0 0376 434 : tunables000 : slabdata 0 0 0 fuse_inode 0 0448 364 : tunables000 : slabdata 0 0 0 bsg_cmd0 0288 282 : tunables000 : slabdata 0 0 0 ntfs_big_inode_cache 0 0512 324 : tunables000 : slabdata 0 0 0 ntfs_inode_cache 0 0176 462 : tunables000 : slabdata 0 0 0 nfs_direct_cache 0 0 80 511 : tunables000 : slabdata 0 0 0 nfs_inode_cache 28 28584 284 : tunables000 : slabdata 1 1 0 isofs_inode_cache 0 0360 454 : tunables000 : slabdata 0 0 0 fat_inode_cache0 0408 404 : tunables000 : slabdata 0 0 0 fat_cache 0 0 24 1701 : tunables000 : slabdata 0 0 0 jbd2_revoke_record 0 0 32 1281 : tunables000 : slabdata 0 0 0 journal_handle 4080 4080 24 1701 : tunables000 : slabdata 24 24 0 journal_head1024 1024 64 641 : tunables000 : slabdata 16 16 0 revoke_record768768 16 2561 : tunables000 : slabdata 3 3 0 ext4_inode_cache 0 0584 284 : tunables000 : slabdata 0 0 0 ext4_free_data 0 0 40 1021 : tunables000 : slabdata 0 0 0 ext4_allocation_context 0 0112 361 : tunables00 0 : slabdata 0 0 0 ext4_prealloc_space 0 0 72 561 : tunables000 : slabdata 0 0 0 ext4_io_end0 0576 284 : tunables000 : slabdata 0 0 0 ext4_io_page 0 0 8 5121 : tunables000 : slabdata 0 0 0 ext2_inode_cache 0 0480 344 : tunables000 : slabdata 0 0 0 ext3_inode_cache1467 2079488 334 : tunables000 : slabdata 63 63 0 ext3_xattr 0 0 48 851 : tunables000 : slabdata 0 0 0 dquot168168192 422 : tunables000 : slabdata 4 4 0 rpc_inode_cache 108108448 364 : tunables000 : slabdata 3 3 0 UDP-Lite 0 0576 284 : 
tunables000 : slabdata 0 0 0 xfrm_dst_cache 0 0320 514 : tunables000 : slabdata 0 0 0 UDP 336336576 284
Bug#695182: Write couple of 1GB files for OOM crash
Dear Jonathan,

> ... once you have a reproducible test I imagine the mm folks will already be very interested and they may be able to help ...

But, I do already have a reproducible test! Write a few files, as per the initial message in this http://bugs.debian.org/695182 ; I also have a patch/solution/workaround for that particular test.

Now I observed another way of making a machine with 64GB crash (sorry, not crash but to suffer an OOM episode). I am pretty sure this other test is reproducible, but is cumbersome to set up and tedious to do.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Linux-MM,

On a machine with i386 kernel and over 32GB RAM, an OOM condition is reliably obtained simply by writing a few files to some local disk e.g. with:

  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=$n+1)); done

Crash usually occurs after 16 or 32 files written. Seems that the problem may be avoided by using mem=32G on the kernel boot, and that it occurs with any amount of RAM over 32GB.

I developed a workaround patch for this particular OOM demo, dropping filesystem caches when about to exhaust lowmem. However, subsequently I observed OOM when running many processes (as yet I do not have an easy-to-reproduce demo of this); so as I suspected, the essence of the problem is not with FS caches.

Could you please help in finding the cause of this OOM bug?

Please see http://bugs.debian.org/695182 for details, in particular my workaround patch http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182

(Please reply to me directly, as I am not a subscriber to the linux-mm mailing list.)

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Dave,

> Your configuration has never worked. This isn't a regression ...
> ... does not mean that we expect it to work.

Do you mean that CONFIG_HIGHMEM64G is deprecated, should not be used; that all development is for 64-bit only?

> ... 64-bit kernels should basically be drop-in replacements ...

Will think about that. I know all my servers are 64-bit capable, will need to check all my desktops.

---

I find it puzzling that there seems to be a sharp cutoff at 32GB RAM, no problem under but OOM just over; whereas I would have expected lowmem starvation to be gradual, with OOM occurring much sooner with 64GB than with 34GB. Also, the kernel seems capable of reclaiming lowmem, so I wonder why that fails just over the 32GB threshold. (Obviously I have no idea what I am talking about.)

---

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: [RFC] Reproducible OOM with partial workaround
Dear Dave,

> ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit kernel without either violating the ABI (3GB/1GB split) or doing something that never got merged upstream ...

Sorry to be so contradictory:

  psz@como:~$ uname -a
  Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
  psz@como:~$ free -l
               total       used       free     shared    buffers     cached
  Mem:      64446900    4729292   59717608          0      15972     480520
  Low:        375836     304400      71436
  High:     64071064    4424892   59646172
  -/+ buffers/cache:    4232800   60214100
  Swap:    134217724          0  134217724
  psz@como:~$

(though I would not know about violations).

But OK, I take your point that I should move with the times.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: Write couple of 1GB files for OOM crash
I am slowly coming around to the idea of splitting my patch up into many parts, each addressing just one little issue, so there would be a couple of [PATCH]-es and some [RFC]-s, each profusely explained. Unfortunately I have not yet had time to work on this, but I want to, hopefully soon.

However... we now did another test on the server, and it ran into OOM.

We were load-testing. We intend to use the server for student logins, via XDMCP from any one of 109 X-terminals (similar to LTSP); we logged in on all our terminals, and on each ran a firefox and an rstudio. The server ran into OOM, having plenty of free memory but having exhausted lowmem, after 80 logins. We then rebooted the server with mem=32G on the kernel command line, and there was no OOM after logging on all our fake students.

So, there is a bug still. I will try to make up some easily reproducible test, and if I can provoke OOM then look again into the kernel code.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben, Please read Documentation/SubmittingPatches, use scripts/checkpatch.pl and try to provide a patch that is suitable for upstream inclusion. Also, your name belongs in the patch header, not in the code. I changed the proposed patch accordingly, scripts/checkpatch.pl produces just a few warnings. I had my patch in use for a while now, so I believe it is suitably tested. Please let me know if I need to do anything else. Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia Avoid OOM when filesystem caches fill lowmem and are not reclaimed, doing drop_caches at that point. The issue is easily reproducible on machines with over 32GB RAM. The patch correctly protects against OOM. The added call to drop_caches has been observed to trigger needlessly but on quite rare occasions only. Also included are several minor fixes: - Comment about highmem_is_dirtyable that seems used only to calculate limits and threshholds, not used in any decisions. - In determine_dirtyable_memory() subtract min_free_kbytes from returned value. I believe this is right, that min_free_kbytes is not intended for dirty pages. - In bdi_position_ratio() get difference (setpoint-dirty) right even when it is negative, which happens often. Normally these numbers are small and even with left-shift I never observed a 32-bit overflow. I believe it should be possible to re-write the whole function in 32-bit ints; maybe it is not worth the effort to make it efficient; seeing how this function was always wrong and we survived, it should simply be removed. - Comment in bdi_max_pause() that it seems to always return a too-small value, maybe it should simply return a fixed value. - Comment in balance_dirty_pages() about a test marked unlikely() but which I observe to be quite common. 
- Comment in __alloc_pages_slowpath() about did_some_progress being set twice, but only checked after the second setting, so the first setting is lost and wasted. - Comment in zone_reclaimable() that maybe should return true with non-zero NR_SLAB_RECLAIMABLE. - Comment about all_unreclaimable which may be set wrongly. - Comments in global_reclaimable_pages() and zone_reclaimable_pages() about maybe adding or including NR_SLAB_RECLAIMABLE. Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia Reported-by: Paul Szabo p...@maths.usyd.edu.au Reference: http://bugs.debian.org/695182 Signed-off-by: Paul Szabo p...@maths.usyd.edu.au --- fs/drop_caches.c.old 2012-10-17 13:50:15.0 +1100 +++ fs/drop_caches.c 2013-01-04 21:52:47.0 +1100 @@ -65,3 +65,10 @@ int drop_caches_sysctl_handler(ctl_table } return 0; } + +/* Easy call: do echo 3 /proc/sys/vm/drop_caches */ +void easy_drop_caches(void) +{ + iterate_supers(drop_pagecache_sb, NULL); + drop_slab(); +} --- mm/page-writeback.c.old 2012-10-17 13:50:15.0 +1100 +++ mm/page-writeback.c 2013-01-06 21:54:59.0 +1100 @@ -39,7 +39,8 @@ /* * Sleep at most 200ms at a time in balance_dirty_pages(). */ -#define MAX_PAUSE max(HZ/5, 1) +/* Might as well be max(HZ/5,4) to ensure max_pause/40 always */ +#define MAX_PAUSE max(HZ/5, 4) /* * Estimate write bandwidth at 200ms intervals. @@ -343,11 +344,26 @@ static unsigned long highmem_dirtyable_m unsigned long determine_dirtyable_memory(void) { unsigned long x; + int y = 0; + extern int min_free_kbytes; x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages(); + /* + * Seems that highmem_is_dirtyable is only used here, in the + * calculation of limits and threshholds of dirtiness, not in deciding + * where to put dirty things. Is that so? Is that as should be? + * What is the recommended setting of highmem_is_dirtyable? 
+ */ if (!vm_highmem_is_dirtyable) x -= highmem_dirtyable_memory(x); + /* Subtract min_free_kbytes */ + if (min_free_kbytes 0) + y = min_free_kbytes (PAGE_SHIFT - 10); + if (x y) + x -= y; + else + x = 0; return x + 1; /* Ensure that we never return 0 */ } @@ -541,6 +557,9 @@ static unsigned long bdi_position_ratio( if (unlikely(dirty = limit)) return 0; + /* Never seen this happen, just sanity-check paranoia */ + if (unlikely(freerun = limit)) + return 16 RATELIMIT_CALC_SHIFT; /* * global setpoint @@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio( * = fast response on large errors; small oscillation near setpoint */ setpoint = (freerun + limit) / 2; - x = div_s64((setpoint - dirty) RATELIMIT_CALC_SHIFT, + x = div_s64(((s64)setpoint - (s64)dirty) RATELIMIT_CALC_SHIFT, limit - setpoint + 1); pos_ratio = x; pos_ratio = pos_ratio * x RATELIMIT_CALC_SHIFT; @@ -995,6 +1014,13 @@ static unsigned long bdi_max_pause(struc * The pause time will be settled within range (max_pause/4
Bug#695182: Write couple of 1GB files for OOM crash
too small? Should we return something fixed e.g. +return (HZ/10); + * instead of this wasted/useless calculation? + */ return clamp_val(t, 4, MAX_PAUSE); Another while at it, I guess. Yes. On one hand the code sometimes pauses for HZ/10 or so, and on the other hand we have this routine working so hard to return the minimum possible value. @@ -1109,6 +1135,11 @@ static void balance_dirty_pages(struct a } pause = HZ * pages_dirtied / task_ratelimit; if (unlikely(pause = 0)) { +/* + * Not unlikely: often we get zero. + * Seems we always get 0 on large machine. + * Should not do a pause of 1 here? + */ trace_balance_dirty_pages(bdi, git log -S'if (unlikely(pause = 0))' -- mm/page-writeback.c tells me this is from 57fc978cfb61 (writeback: control dirty pause time, 2011-06-11), in case that helps. Will try to look it up sometime. - I had printk() to tell me the value of pause, and mostly I got zero. Wonder how others measured it. [...] --- mm/vmscan.c.old 2012-10-17 13:50:15.0 +1100 +++ mm/vmscan.c 2013-01-06 09:50:49.0 +1100 [...] @@ -2726,9 +2731,87 @@ loop_again: nr_slab = shrink_slab(shrink, sc.nr_scanned, lru_pages); sc.nr_reclaimed += reclaim_state-reclaimed_slab; total_scanned += sc.nr_scanned; +if (unlikely( +i == 1 +nr_slab 10 +(reclaim_state-reclaimed_slab) 10 +zone_page_state(zone, NR_SLAB_RECLAIMABLE) 10 +!zone_watermark_ok_safe(zone, order, +high_wmark_pages(zone), end_zone, 0))) { +/* + * We are stressed (desperate), better This is getting really deeply nested. Would it be possible to split out a function so this code could be more easily contemplated in isolation? Hmm... I would much prefer to leave it as is. Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? 
Contact listmas...@lists.debian.org Archive: http://lists.debian.org/201301070303.r0733hs0026...@como.maths.usyd.edu.au
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
tags 695182 - moreinfo thanks Dear Ben, I suggest the following patch, which seems to solve the problem. Two attachments: minimal.patch just to show the simplicity, and complete.patch with comments and enhancements. Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- fs/drop_caches.c.old 2012-10-17 13:50:15.0 +1100 +++ fs/drop_caches.c 2013-01-01 09:23:57.0 +1100 @@ -58,10 +58,16 @@ if (ret) return ret; if (write) { if (sysctl_drop_caches 1) iterate_supers(drop_pagecache_sb, NULL); if (sysctl_drop_caches 2) drop_slab(); } return 0; } + +void PSz_drop_caches(void) +{ + iterate_supers(drop_pagecache_sb, NULL); + drop_slab(); +} --- mm/vmscan.c.old 2012-10-17 13:50:15.0 +1100 +++ mm/vmscan.c 2013-01-01 22:58:51.0 +1100 @@ -2719,20 +2719,25 @@ KSWAPD_ZONE_BALANCE_GAP_RATIO); if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone) + balance_gap, end_zone, 0)) { shrink_zone(priority, zone, sc); reclaim_state-reclaimed_slab = 0; nr_slab = shrink_slab(shrink, sc.nr_scanned, lru_pages); sc.nr_reclaimed += reclaim_state-reclaimed_slab; total_scanned += sc.nr_scanned; +if (i==1 nr_slab10 (reclaim_state-reclaimed_slab)10 zone_page_state(zone,NR_SLAB_RECLAIMABLE)10) +{ +extern void PSz_drop_caches(void); + PSz_drop_caches(); +} if (nr_slab == 0 !zone_reclaimable(zone)) zone-all_unreclaimable = 1; } /* * If we've done a decent amount of scanning and * the reclaim ratio is low, start doing writepage * even in laptop mode */ --- fs/drop_caches.c.old 2012-10-17 13:50:15.0 +1100 +++ fs/drop_caches.c 2013-01-01 09:23:57.0 +1100 @@ -58,10 +58,16 @@ if (ret) return ret; if (write) { if (sysctl_drop_caches 1) iterate_supers(drop_pagecache_sb, NULL); if (sysctl_drop_caches 2) drop_slab(); } return 0; } + +void PSz_drop_caches(void) +{ + iterate_supers(drop_pagecache_sb, NULL); + drop_slab(); +} --- mm/page-writeback.c.old 2012-10-17 13:50:15.0 +1100 +++ mm/page-writeback.c 
2013-01-01 23:01:52.0 +1100 @@ -32,21 +32,22 @@ #include linux/sysctl.h #include linux/cpu.h #include linux/syscalls.h #include linux/buffer_head.h #include linux/pagevec.h #include trace/events/writeback.h /* * Sleep at most 200ms at a time in balance_dirty_pages(). */ -#define MAX_PAUSE max(HZ/5, 1) +/* PSz: Might as well be max(HZ/5,4) to ensure max_pause/40 always */ +#define MAX_PAUSE max(HZ/5, 4) /* * Estimate write bandwidth at 200ms intervals. */ #define BANDWIDTH_INTERVAL max(HZ/5, 1) #define RATELIMIT_CALC_SHIFT 10 /* * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited @@ -339,22 +340,40 @@ * * Returns the numebr of pages that can currently be freed and used * by the kernel for direct mappings. */ unsigned long determine_dirtyable_memory(void) { unsigned long x; x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages(); +/* + * PSz: Seems that highmem_is_dirtyable is only used here, in the + * calculation of limits and threshholds of dirtiness, not in deciding + * where to put dirty things. Is that so? Is that as should be? + * What is the recommended setting of highmem_is_dirtyable? + */ if (!vm_highmem_is_dirtyable) x -= highmem_dirtyable_memory(x); +/* PSz: Should not we subtract min_free_kbytes? 
*/ +{ +extern int min_free_kbytes; +int y = 0; +/* printk(PSz: determine_dirtyable_memory was %ld pages, now subtract min_free_kbytes=%d\n,x,min_free_kbytes); */ +if (min_free_kbytes 0) + y = min_free_kbytes (PAGE_SHIFT - 10); +if (x y) + x -= y; +else + x = 0; +} return x + 1; /* Ensure that we never return 0 */ } static unsigned long dirty_freerun_ceiling(unsigned long thresh, unsigned long bg_thresh) { return (thresh + bg_thresh) / 2; } @@ -534,39 +553,43 @@ unsigned long limit = hard_dirty_limit(thresh); unsigned long x_intercept; unsigned long setpoint; /* dirty pages' target balance point */ unsigned long bdi_setpoint; unsigned long span; long long pos_ratio; /* for scaling up/down the rate limit */ long x; if (unlikely(dirty = limit)) return 0; + if (unlikely(freerun = limit)) +/* PSz: Never seen this happen, just sanity-check paranoia */ + return (16 RATELIMIT_CALC_SHIFT); /* * global setpoint * * setpoint - dirty 3 *f(dirty) := 1.0 + () * limit - setpoint * * it's a 3rd order polynomial that subjects to * * (1) f(freerun) = 2.0 = rampup dirty_ratelimit reasonably fast * (2) f(setpoint) = 1.0 = the balance point * (3) f(limit)= 0 = the hard limit * (4) df/dx = 0 = negative feedback control * (5) the closer to setpoint, the smaller |df/dx
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

I tried to send

  tags 695182 - moreinfo

to cont...@bugs.debian.org but it came back with:

  You have been specifically excluded from using the control interface.

I guess that has something to do with bug#299007. Would you please be able to have those settings corrected?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

In the OOM message in my initial bug report, I see

  Normal ... slab_reclaimable:261528kB ... all_unreclaimable? yes

Is that a contradiction? Should not that slab have been reclaimed?

Original line:

[ 744.754369] Normal free:43788kB min:44112kB low:55140kB high:66168kB active_anon:0kB inactive_anon:0kB active_file:912kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:261528kB slab_unreclaimable:28812kB kernel_stack:3096kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16060 all_unreclaimable? yes

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

In response to your comments: x seems to be in the range [-1,1]. The returned pos_ratio would be within [0,2] if not for the final *8.

---

[Funny: taking the difference of unsigned ints and expecting the result to be negative in some sense. Seems the problem was not with large memory but with negative numbers. Curious the bug was not noticed before.]

Need to cast and sign-extend before taking the difference of unsigned numbers, as the following demonstrates (output is from i386, where unsigned long is 32 bits):

$ cat silly.c
#include <stdio.h>
int main(void)
{
	unsigned long i, j;
	long long x;
	i = 1; j = 2;
	x = j - i; printf("j-i = %lld\n", x);
	x = i - j; printf("i-j = %lld\n", x);
	x = (long long)i - j; printf("OK = %lld\n", x);
	return 0;
}
$ cc silly.c; a.out
j-i = 1
i-j = 4294967295
OK = -1
$

and in fact things go bad, e.g. freerun=2172 limit=2896 dirty=2589 should get x=-155, whereas original formula gets x=11831710 and Ben's formula gets x=-769071435.

Seems a correct patch would be:

--- old/mm/page-writeback.c 2012-10-17 13:50:15.0 +1100
+++ new/mm/page-writeback.c 2012-12-17 12:25:14.0 +1100
@@ -559,7 +559,7 @@
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

However, with that patch in place I still got an OOM crash (log below). More bugs remain...

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

---
Dec 17 12:43:59 zeno kernel: xterm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Dec 17 12:43:59 zeno kernel: Pid: 2704, comm: xterm Not tainted 3.2.32-pk06.08-i386t07 #7
Dec 17 12:43:59 zeno kernel: Call Trace:
Dec 17 12:43:59 zeno kernel: [c1607533] ?
printk+0x18/0x1a Dec 17 12:43:59 zeno kernel: [c10776b8] dump_header.isra.10+0x68/0x180 Dec 17 12:43:59 zeno kernel: [c1069807] ? delayacct_end+0x97/0xb0 Dec 17 12:43:59 zeno kernel: [c11d664e] ? ___ratelimit+0x7e/0xf0 Dec 17 12:43:59 zeno kernel: [c1077929] oom_kill_process.constprop.15+0x49/0x230 Dec 17 12:43:59 zeno kernel: [c1039d34] ? has_capability_noaudit+0x24/0x30 Dec 17 12:43:59 zeno kernel: [c1077880] ? oom_badness+0xb0/0x110 Dec 17 12:43:59 zeno kernel: [c1077e70] out_of_memory+0x240/0x2c0 Dec 17 12:43:59 zeno kernel: [c107a8a8] __alloc_pages_nodemask+0x558/0x570 Dec 17 12:43:59 zeno kernel: [c1569b91] tcp_sendmsg+0x711/0xab0 Dec 17 12:43:59 zeno kernel: [c11db4fc] ? copy_to_user+0x2c/0x40 Dec 17 12:43:59 zeno kernel: [c1587f22] inet_sendmsg+0x42/0xa0 Dec 17 12:43:59 zeno kernel: [c152fe2b] sock_aio_write+0xdb/0x100 Dec 17 12:43:59 zeno kernel: [c15874f5] ? inet_recvmsg+0x55/0xa0 Dec 17 12:43:59 zeno kernel: [c152fd50] ? sock_aio_read+0x130/0x130 Dec 17 12:43:59 zeno kernel: [c10a3fd4] do_sync_readv_writev+0xa4/0xe0 Dec 17 12:43:59 zeno kernel: [c11db640] ? _copy_from_user+0x30/0x50 Dec 17 12:43:59 zeno kernel: [c10a40d3] ? rw_copy_check_uvector+0x43/0x130 Dec 17 12:43:59 zeno kernel: [c10a4262] do_readv_writev+0xa2/0x1b0 Dec 17 12:43:59 zeno kernel: [c152fd50] ? sock_aio_read+0x130/0x130 Dec 17 12:43:59 zeno kernel: [c10a3ced] ? 
vfs_read+0x14d/0x170 Dec 17 12:43:59 zeno kernel: [c10a43a2] vfs_writev+0x32/0x50 Dec 17 12:43:59 zeno kernel: [c10a44e8] sys_writev+0x38/0xa0 Dec 17 12:43:59 zeno kernel: [c160fd14] sysenter_do_call+0x12/0x26 Dec 17 12:43:59 zeno kernel: Mem-Info: Dec 17 12:43:59 zeno kernel: DMA per-cpu: Dec 17 12:43:59 zeno kernel: CPU0: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU1: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU2: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU3: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU4: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU5: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU6: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU7: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU8: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU9: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 10: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 11: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 12: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 13: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 14: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 15: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 16: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 17: hi:0, btch: 1 usd: 0 Dec 17 12:43:59 zeno kernel: CPU 18: hi:0, btch: 1 usd: 0 Dec 17
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

> It's not a crash, though that's kind of an academic distinction.

What would you like me to call it instead? The machine seemed to hang... I do not know whether it rebooted spontaneously or in response to a "shutdown -r now" that I had typed into an unresponsive xterm.

> Perhaps you could add some printk() statements to log the results of these various calculations, so you can sanity-check them. You would probably want to make them conditional on the initial value of x being negative (it's reused for something entirely different later so you would need to assign this condition to a separate variable).

I did, and am convinced that bdi_position_ratio() now does the right thing: returns something within [0,2] mostly, and internally seems OK. I think I had my corrected bdi_position_ratio() in use during this latest OOM episode.

---

Thanks again for your prompt fix of bdi_position_ratio(). I will now look for that elusive next bug.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
, bdi_dirty);
	else
		pos_ratio *= 8;
	}
	BUG_ON(pos_ratio<0);

	return pos_ratio;
}

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

> ... I think the initial overflow occurs when calculating x ...
> setpoint and dirty are numbers of pages and are declared as long, so on a system with enough memory they can presumably differ by 2^21 or more (2^21 pages = 8 GB). Shifting left by RATELIMIT_CALC_SHIFT = 10 can then change the sign bit.
> Does the attached patch fix this?
> ...
> Most variables in bdi_position_ratio() are declared long, which is enough for a page count. However, when converting (setpoint - dirty) to a fixed-point number we left-shift by 10, and on a 32-bit system with PAE it is possible to have enough dirty pages that the shift overflows into the sign bit. We need to cast to s64 before the left-shift.
>
> Reported-by: Paul Szabo paul.sz...@sydney.edu.au
> Reference: http://bugs.debian.org/695182
> Signed-off-by: Ben Hutchings b...@decadent.org.uk
> ---
>  mm/page-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 50f0824..8b5600e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
>  	 * => fast response on large errors; small oscillation near setpoint
>  	 */
>  	setpoint = (freerun + limit) / 2;
> -	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> +	x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
>  		    limit - setpoint + 1);
>  	pos_ratio = x;
>  	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

Thanks for the quick patch. I am about to test it (in a day or so).

Initial (blind, off-the-cuff, uneducated) comments:
- I had BUG_ON(x<0) in my code, so unlikely x changed sign.
- Why not use float instead of infinite-precision integer arithmetic?
- Do we need a smooth function, or would an easy-to-calculate step function suffice?
- Is there a check that the returned s64 pos_ratio fits into u32?

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben, Although PAE supports up to 64 GB RAM ... The use of such a large amount of high memory is problematic ... Or you can test ... by restricting what the kernel uses with the 'mem' parameter, e.g. mem=16G. Trying various mem=XX values, no OOM was observed with mem=32G or less, but a crash is obtained with any memory over 32GB e.g. with mem=34G. This suggests a signed/unsigned bug more than an issue with highmem size; you said PAE supports 64GB, not just 32GB. A 64-bit kernel doesn't have a split between normal and high memory. ... and it may have larger integers, less affected by signedness bugs. Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia *** mem=34G - Tail end of /var/log/kern.log [0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-686-pae mem=34G root=UUID=469c2730-1786-46f7-9d80-5d651ee581d7 ro quiet ... [ 388.560098] dd invoked oom-killer: gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0 [ 388.560106] dd cpuset=/ mems_allowed=0 [ 388.560113] Pid: 4244, comm: dd Not tainted 3.2.0-4-686-pae #1 Debian 3.2.32-1 [ 388.560117] Call Trace: [ 388.560135] [c1097c1c] ? dump_header.isra.6+0x5c/0x167 [ 388.560144] [c1120ff8] ? security_real_capable_noaudit+0x2c/0x35 [ 388.560149] [c1097e93] ? oom_kill_process+0x30/0x201 [ 388.560155] [c109810f] ? select_bad_process.constprop.12+0xab/0xff [ 388.560160] [c10983e0] ? out_of_memory+0xf8/0x135 [ 388.560167] [c109aecd] ? __alloc_pages_nodemask+0x509/0x63e [ 388.560176] [c10c0ca3] ? cache_alloc+0x253/0x407 [ 388.560183] [c10c1425] ? kmem_cache_alloc+0x29/0x89 [ 388.560193] [c11605ee] ? radix_tree_preload+0x24/0x61 [ 388.560203] [c10961cd] ? add_to_page_cache_locked+0x3e/0xb3 [ 388.560210] [c1096253] ? add_to_page_cache_lru+0x11/0x2f [ 388.560217] [c10962cb] ? grab_cache_page_write_begin+0x5a/0x94 [ 388.560244] [f88bb348] ? ext3_write_begin+0xa0/0x1d2 [ext3] [ 388.560251] [c109d562] ? 
put_page+0x16/0x24 [ 388.560258] [c1095e61] ? generic_file_buffered_write+0xd8/0x1dd [ 388.560265] [c1096afb] ? __generic_file_aio_write+0x25e/0x282 [ 388.560273] [c100f2fb] ? read_tsc+0xa/0x28 [ 388.560282] [c1053548] ? timekeeping_get_ns+0x11/0x55 [ 388.560287] [c1096b7c] ? generic_file_aio_write+0x5d/0xb3 [ 388.560298] [c10cbc31] ? wait_on_retry_sync_kiocb+0x3c/0x3c [ 388.560304] [c10cbcd9] ? do_sync_write+0xa8/0xdc [ 388.560311] [c10cc1f3] ? rw_verify_area+0xc6/0xe7 [ 388.560317] [c10cc493] ? vfs_write+0x83/0xd4 [ 388.560323] [c10cc653] ? sys_write+0x3d/0x61 [ 388.560331] [c12c5d1f] ? sysenter_do_call+0x12/0x28 [ 388.560334] Mem-Info: [ 388.560337] DMA per-cpu: [ 388.560341] CPU0: hi:0, btch: 1 usd: 0 [ 388.560345] CPU1: hi:0, btch: 1 usd: 0 [ 388.560348] CPU2: hi:0, btch: 1 usd: 0 [ 388.560351] CPU3: hi:0, btch: 1 usd: 0 [ 388.560355] CPU4: hi:0, btch: 1 usd: 0 [ 388.560358] CPU5: hi:0, btch: 1 usd: 0 [ 388.560361] CPU6: hi:0, btch: 1 usd: 0 [ 388.560365] CPU7: hi:0, btch: 1 usd: 0 [ 388.560368] CPU8: hi:0, btch: 1 usd: 0 [ 388.560371] CPU9: hi:0, btch: 1 usd: 0 [ 388.560374] CPU 10: hi:0, btch: 1 usd: 0 [ 388.560378] CPU 11: hi:0, btch: 1 usd: 0 [ 388.560381] CPU 12: hi:0, btch: 1 usd: 0 [ 388.560384] CPU 13: hi:0, btch: 1 usd: 0 [ 388.560387] CPU 14: hi:0, btch: 1 usd: 0 [ 388.560390] CPU 15: hi:0, btch: 1 usd: 0 [ 388.560394] CPU 16: hi:0, btch: 1 usd: 0 [ 388.560397] CPU 17: hi:0, btch: 1 usd: 0 [ 388.560400] CPU 18: hi:0, btch: 1 usd: 0 [ 388.560404] CPU 19: hi:0, btch: 1 usd: 0 [ 388.560407] CPU 20: hi:0, btch: 1 usd: 0 [ 388.560410] CPU 21: hi:0, btch: 1 usd: 0 [ 388.560413] CPU 22: hi:0, btch: 1 usd: 0 [ 388.560417] CPU 23: hi:0, btch: 1 usd: 0 [ 388.560420] CPU 24: hi:0, btch: 1 usd: 0 [ 388.560423] CPU 25: hi:0, btch: 1 usd: 0 [ 388.560427] CPU 26: hi:0, btch: 1 usd: 0 [ 388.560430] CPU 27: hi:0, btch: 1 usd: 0 [ 388.560433] CPU 28: hi:0, btch: 1 usd: 0 [ 388.560436] CPU 29: hi:0, btch: 1 usd: 0 [ 388.560440] CPU 30: hi:0, btch: 1 usd: 0 [ 
388.560443] CPU 31: hi:0, btch: 1 usd: 0 [ 388.560446] Normal per-cpu: [ 388.560449] CPU0: hi: 186, btch: 31 usd: 174 [ 388.560452] CPU1: hi: 186, btch: 31 usd: 164 [ 388.560456] CPU2: hi: 186, btch: 31 usd: 53 [ 388.560459] CPU3: hi: 186, btch: 31 usd: 51 [ 388.560462] CPU4: hi: 186, btch: 31 usd: 155 [ 388.560465] CPU5: hi: 186, btch: 31 usd: 72 [ 388.560469] CPU6: hi: 186, btch: 31 usd: 143 [ 388.560472] CPU7: hi: 186, btch: 31 usd: 95 [ 388.560475] CPU8: hi: 186, btch: 31 usd: 178
Bug#695182: Re: Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
An observation that may help in solving this issue.

Using

  while :; do free -lm; sleep 5; done

while writing the files, I see the buffers and cached values increasing; then buffers start decreasing, eventually down to zero; then soon after, OOM starts. The free or low or high values do not seem to show anything unusual.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Extract from output:

       total    used    free  shared buffers  cached
Mem:   62941    1586   61354       0      61    1359
Mem:   62941    2143   60797       0      61    1907
Mem:   62941    2652   60288       0      62    2407
Mem:   62941    3205   59735       0      63    2951
Mem:   62941    3743   59197       0      63    3483
Mem:   62941    4275   58665       0      64    4007
Mem:   62941    4791   58149       0      64    4511
Mem:   62941    5338   57602       0      65    5049
Mem:   62941    5835   57105       0      65    5538
Mem:   62941    6332   56608       0      66    6027
Mem:   62941    6837   56103       0      66    6524
Mem:   62941    7332   55608       0      67    7007
Mem:   62941    7815   55125       0      67    7482
Mem:   62941    8310   54630       0      68    7970
Mem:   62941    8820   54120       0      68    8471
Mem:   62941    9280   53660       0      69    8922
Mem:   62941    9779   53161       0      69    9413
Mem:   62941   10231   52709       0      59    9868
Mem:   62941   10736   52204       0      59   10366
Mem:   62941   11105   51835       0      48   10741
Mem:   62941   11585   51355       0      41   11223
Mem:   62941   12074   50866       0      36   11709
Mem:   62941   12544   50396       0      23   12183
Mem:   62941   13021   49919       0      24   12653
Mem:   62941   13515   49425       0       8   13152
Mem:   62941   13978   48962       0       9   13609
Mem:   62941   14459   48481       0       1   14091
Mem:   62941   14941   47999       0       0   14566
Mem:   62941   15409   47531       0       0   15028
Mem:   62941   15858   47082       0       0   15487
Mem:   62941   16251   46689       0       0   15873
Mem:   62941   16392   46548       0       0   16017
Mem:   62941   16593   46347       0       0   16215
Mem:   62941   16730   46210       0       0   16350
Mem:   62941   16808   46132       0       0   16429
Mem:   62941   16839   46101       0       0   16460
Mem:   62941   16855   46085       0       0   16476
Mem:   62941   16843   46097       0       0   16487
Mem:   62941   17121   45819       0       0   16779
Mem:   62941   17342   45598       0       0   16998
Mem:   62941   17491   45449       0       0   17146

Archive: http://lists.debian.org/201212051115.qb5bf3fl015...@como.maths.usyd.edu.au
Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Dear Ben,

> Although PAE supports up to 64 GB RAM, everything the kernel accesses must be mapped into 1 GB of virtual address space (about 880 MB of persistently mapped 'normal memory', plus temporary mappings of the remaining 'high memory'). The use of such a large amount of high memory is problematic, though I don't know whether it entirely explains this behaviour. (The memory stats don't seem to account for much of the normal memory, as there is ~40 MB free but the various classes of allocations seem to add up to only ~300 MB.)
>
> These machines should all be installed with the amd64 kernel. Is there any reason you would prefer not to do that? Perhaps the kernel flavour selection in the installer should be changed to favour that based on the RAM size, though I'm not sure what the critical value should be.

Are you suggesting that the kernel lies, that 32-bit cannot handle 64GB? Would it help to test the issue on a 16GB machine (I have one with 2*X5460 CPUs and one with a single i5-3570), or with 24GB (I have several with 2*E5335 to 2*X5460)?

I have seen recommendations to use 64-bit amd64. I am somewhat reluctant to jump ship: I want continuity (when I upgrade by installing a little more memory) and similarity between my various machines, and I have observed 32-bit being faster in some situations.

But really: this is a bug in the 32-bit build. Do I know that the same or similar or worse bugs are not also present in the 64-bit build off the same sources?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201212051136.qb5babnv016...@como.maths.usyd.edu.au
Bug#384922: NFS insecure without support for squashing multiple groups
> ... AUTH_SYS with untrusted root on clients is not a good fit ...
> NFSv4 with kerberos authentication would be less broken.

root_squash is a simplistic and incomplete band-aid.

NFSv4+krb is better only because it does not have a concept of groups. Remove groups from AUTH_SYS (ignore all groups, or in other words manage the primary group the way --manage-gids manages the secondaries), and the issue might be solved. (In that sense NFSv4+krb is more broken, less feature-rich, than AUTH_SYS.)

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201202191159.q1jbxymm017...@bari.maths.usyd.edu.au
Bug#384922: NFS insecure without support for squashing multiple groups
Dear Jonathan,

>> NFSv4+krb is better only because ...
>
> Surely the ability to squash multiple uids is also a help. ;-)

Not when asking to squash groups. :-)
I thought that idmapd worked also with AUTH_SYS.

> Do I understand correctly that you are requesting an export or mountd option filter_gid, which would behave like --manage-gids except it transforms the effective gid to anongid when the specified gid is not a group the user belongs to? I haven't carefully looked over the protocol specs but at first glance that seems sensible.

Yes, my exact wish.

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201202192011.q1jkb1pb023...@bari.maths.usyd.edu.au
Bug#657916: linux-source-2.6.32: ps time doubled then constant: missing lock for task_utime?
I wrote: I only observed this for multi-threaded processes compiled with -fopenmp . I think I now observed the same issue with a single-threaded process: $ ps u -p 14252 USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND duncans 14252 150 9.7 2458868 2408272 ? RN Jan03 71589:24 ./compact $ grep . /proc/14252/stat /proc/14252/task/*/stat /proc/14252/stat:14252 (compact) R 1 14222 14218 0 -1 4202496 366809489 0 0 0 429428470 108008 0 0 36 16 1 0 130475208 2517880832 602068 4294967295 1 1 0 0 0 0 0 0 132 4294967295 0 0 17 0 0 0 0 0 0 /proc/14252/task/14252/stat:14252 (compact) R 1 14222 14218 0 -1 4202496 366809489 0 0 0 429428468 95107 0 0 36 16 1 0 130475208 2517880832 602068 4294967295 1 1 0 0 0 0 0 0 132 4294967295 0 0 17 0 0 0 0 0 0 Should I investigate, should I try to reproduce, and check by how much do TIME and %CPU jump when the wrong results start? Thanks, Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/201202051106.q15b64im007...@bari.maths.usyd.edu.au
Bug#657916: linux-source-2.6.32: ps time doubled then constant: missing lock for task_utime?
Package: linux-source-2.6.32 Version: 2.6.32-41 Severity: normal On rare occasions, for some long-running processes, ps shows a too-large and then constant CPU time. I only observed this for multi-threaded processes compiled with -fopenmp . On one occasion I seen: $ ps u -p 14804 USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND psz 14804 1599 0.0 61528 1356 ?RNl Jan13 71587:15 a.out $ grep . /proc/14804/stat /proc/14804/task/*/stat /proc/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 624 0 0 0 427308277 2215294 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3214614264 134522975 0 0 1 0 4294967295 0 0 17 2 0 0 0 0 0 /proc/14804/task/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 588 0 0 0 26478404 333660 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3214613280 3077747522 0 0 1 0 0 0 0 17 2 0 0 0 0 0 /proc/14804/task/14807/stat:14807 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 26703589 138033 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3075048080 3077747646 0 0 1 0 0 0 0 -1 5 0 0 0 0 0 /proc/14804/task/14808/stat:14808 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26802997 48697 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3066655216 3077747646 0 0 1 0 0 0 0 -1 1 0 0 0 0 0 /proc/14804/task/14809/stat:14809 (a.out) R 1 14804 14608 0 -1 4202560 5 0 0 0 26756492 95248 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3058262672 3077747646 0 0 1 0 0 0 0 -1 6 0 0 0 0 0 /proc/14804/task/14810/stat:14810 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26689860 161611 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3049869808 3077747646 0 0 1 0 0 0 0 -1 7 0 0 0 0 0 /proc/14804/task/14811/stat:14811 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 26705969 145689 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3041477104 3077747646 0 0 1 0 0 0 0 -1 0 0 0 
0 0 0 /proc/14804/task/14812/stat:14812 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 26729186 122435 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3033084560 3077747646 0 0 1 0 0 0 0 -1 3 0 0 0 0 0 /proc/14804/task/14813/stat:14813 (a.out) R 1 14804 14608 0 -1 4202560 7 0 0 0 26789545 62169 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 134539869 3214629248 3024691856 3077747646 0 0 1 0 0 0 0 -1 4 0 0 0 0 0 with TIME and %CPU reported by ps apparently doubled just before then; from then on, TIME remained constant and %CPU slowly decreased. In that state, command ps u -L -p 14804 showed sensible output. I did not wait long enough to see whether TIME ever increased again. I wonder if this issue is related to task_utime in kernel/sched.c calculating and updating p-prev_utime without any locks, whereas comments say that thread_group_times must be called with siglock held. Thanks, Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia -- System Information: Debian Release: 6.0.4 APT prefers stable APT policy: (500, 'stable') Architecture: i386 (i686) Kernel: Linux 2.6.32-pk05.09-svr (SMP w/8 CPU cores) Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) Shell: /bin/sh linked to /bin/bash Versions of packages linux-source-2.6.32 depends on: ii binutils2.20.1-16The GNU assembler, linker and bina ii bzip2 1.0.5-6+squeeze1 high-quality block-sorting file co Versions of packages linux-source-2.6.32 recommends: ii gcc 4:4.4.5-1 The GNU C compiler ii libc6-dev [libc-dev] 2.11.3-2 Embedded GNU C Library: Developmen ii make 3.81-8 An utility for Directing compilati Versions of packages linux-source-2.6.32 suggests: ii kernel-package12.036+nmu1A utility for building Linux kerne ii libncurses5-dev [ncurses- 5.7+20100313-5 developer's libraries and docs for pn libqt3-mt-dev none (no description available) -- To UNSUBSCRIBE, email to 
debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20120129225154.20218.84527.report...@bari.maths.usyd.edu.au
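The TIME value and the apparent doubling in the report above can be checked with a little arithmetic on the quoted /proc figures. This is a minimal sketch under assumptions that the report does not state: USER_HZ is taken to be 100, and fields 14 and 15 of /proc/PID/stat are taken to be utime and stime in clock ticks. The helper names ticks_to_ps_time and sum_ticks are hypothetical.

```c
#include <stdint.h>

#define ASSUMED_USER_HZ 100 /* assumption: clock ticks per second */

struct ps_time {
    uint64_t minutes;
    unsigned seconds;
};

/* Hypothetical helper: convert utime + stime (in ticks) into the
 * minutes:seconds form shown in the TIME column of ps. */
static struct ps_time ticks_to_ps_time(uint64_t utime, uint64_t stime)
{
    uint64_t secs = (utime + stime) / ASSUMED_USER_HZ;
    struct ps_time t = { secs / 60, (unsigned)(secs % 60) };
    return t;
}

/* Hypothetical helper: add up an array of per-thread tick counts. */
static uint64_t sum_ticks(const uint64_t *ticks, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += ticks[i];
    return total;
}

/* utime (field 14) of the eight threads listed under
 * /proc/14804/task/ in the report. */
static const uint64_t thread_utime[8] = {
    26478404, 26703589, 26802997, 26756492,
    26689860, 26705969, 26729186, 26789545,
};
```

Fed the process-level figures from the report (utime 427308277, stime 2215294), ticks_to_ps_time gives 71587 minutes and 15 seconds, which matches the ps output. The per-thread utime values sum to 213656042 ticks, so the process-level utime is very nearly twice the thread total, which is consistent with the doubling described.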
Bug#582826: Oops: 0002 unable to handle kernel paging request
The problem did not re-occur after upgrading to 2.6.26-24. Maybe fixed...

Please close bug: I cannot reproduce anymore.

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201008242303.o7on3mrw017...@bari.maths.usyd.edu.au
Bug#582826: Oops: 0002 unable to handle kernel paging request
For the record only.

- My machine has been occasionally (far too regularly, every couple of weeks) crashing with the same error as reported. Other machines, with similar hardware and an identical kernel, did not seem affected.
- Today I updated the kernels to one based on 2.6.26-24, and will monitor whether that helps.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201007082254.o68ms066031...@bari.maths.usyd.edu.au
Bug#582826: Oops: 0002 unable to handle kernel paging request
Package: linux-source-2.6.26 Version: 2.6.26-21 Severity: normal My main file and login server machine crashed, with an Oops in the logs. I do not know whether this crash is reproducible: it crashed also a week earlier, but with nothing visible in the logs; it had been stable for months before these two crashes. Extract from /var/log/syslog at the crash: May 22 20:29:45 bari kernel: BUG: unable to handle kernel paging request at b14f6dc2 May 22 20:29:45 bari kernel: IP: [c014b326] find_get_pages+0x46/0x70 May 22 20:29:45 bari kernel: *pdpt = 31a75001 *pde = May 22 20:29:45 bari kernel: Oops: 0002 [#1] SMP May 22 20:29:45 bari kernel: Modules linked in: nfsd exportfs autofs4 quota_v2 fuse intel_agp agpgart usb_storage sg thermal 8250_pnp 8250 rtc_cmos rtc_core ehci_hcd parport_pc parport serial_core rtc_lib evdev i2c_i801 i2c_core processor thermal_sys May 22 20:29:45 bari kernel: May 22 20:29:45 bari kernel: Pid: 287, comm: kswapd0 Not tainted (2.6.26-pk03.17-svr #1) May 22 20:29:45 bari kernel: EIP: 0060:[c014b326] EFLAGS: 00010002 CPU: 5 May 22 20:29:45 bari kernel: EIP is at find_get_pages+0x46/0x70 May 22 20:29:45 bari kernel: EAX: b16f6cbe EBX: 0001 ECX: EDX: b14f6dbe May 22 20:29:45 bari kernel: ESI: EDI: e39c5dd8 EBP: f7e39e88 ESP: f7e39e40 May 22 20:29:45 bari kernel: DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 May 22 20:29:45 bari kernel: Process kswapd0 (pid: 287, ti=f7e39000 task=f7d7c6e0 task.ti=f7e39000) May 22 20:29:45 bari kernel: Stack: 000e e39c5de8 f7e39e80 f7e39e80 c01539c2 f7e39e88 May 22 20:29:45 bari kernel:e39c5d30 0080 c01545b4 000e 00155ca7 e39c5dd8 May 22 20:29:45 bari kernel: c520e340 c1edb9e0 c4f7bc60 c5278ae0 c39e7a60 c42deaa0 May 22 20:29:45 bari kernel: Call Trace: May 22 20:29:45 bari kernel: [c01539c2] pagevec_lookup+0x22/0x30 May 22 20:29:45 bari kernel: [c01545b4] __invalidate_mapping_pages+0x54/0x140 May 22 20:29:45 bari kernel: [c01546af] invalidate_mapping_pages+0xf/0x20 May 22 20:29:45 bari kernel: [c0186565] 
shrink_icache_memory+0x235/0x240 May 22 20:29:45 bari kernel: [c015609f] shrink_slab+0x12f/0x190 May 22 20:29:45 bari kernel: [c01564cd] kswapd+0x3cd/0x490 May 22 20:29:45 bari kernel: [c0154d70] isolate_pages_global+0x0/0x60 May 22 20:29:45 bari kernel: [c0136f80] autoremove_wake_function+0x0/0x50 May 22 20:29:45 bari kernel: [c0156100] kswapd+0x0/0x490 May 22 20:29:45 bari kernel: [c0136c99] kthread+0x39/0x70 May 22 20:29:45 bari kernel: [c0136c60] kthread+0x0/0x70 May 22 20:29:45 bari kernel: [c0103c83] kernel_thread_helper+0x7/0x14 May 22 20:29:45 bari kernel: === May 22 20:29:45 bari kernel: Code: 30 00 8d 47 04 89 f1 89 ea 89 1c 24 e8 84 63 10 00 85 c0 89 c3 74 1f 31 c9 8d 74 26 00 8b 54 8d 00 8b 02 f6 c4 40 74 03 8b 52 0c f0 ff 42 04 83 c1 01 39 cb 77 e7 8b 44 24 04 f0 ff 47 10 fb 83 May 22 20:29:45 bari kernel: EIP: [c014b326] find_get_pages+0x46/0x70 SS:ESP 0068:f7e39e40 May 22 20:29:45 bari kernel: ---[ end trace 34faad952d0fda3f ]--- At this last crash, I happened to be logged in via ssh, and in the ssh terminal window I had similar output, but curiously with a few lines interchanged in order (and I am not sure whether the terminal output or the syslog is correct; each line on the terminal was separately prefaced with Message from syslogd... and separated with blank lines): Message from sysl...@bari at Sat May 22 20:29:45 2010 ... 
bari kernel: Oops: 0002 [#1] SMP bari kernel: Process kswapd0 (pid: 287, ti=f7e39000 task=f7d7c6e0 task.ti=f7e39000) bari kernel: Stack: 000e e39c5de8 f7e39e80 f7e39e80 c01539c2 f7e39e88 bari kernel: c520e340 c1edb9e0 c4f7bc60 c5278ae0 c39e7a60 c42deaa0 bari kernel:e39c5d30 0080 c01545b4 000e 00155ca7 e39c5dd8 bari kernel: Call Trace: bari kernel: [c01539c2] pagevec_lookup+0x22/0x30 bari kernel: [c01545b4] __invalidate_mapping_pages+0x54/0x140 bari kernel: [c01546af] invalidate_mapping_pages+0xf/0x20 bari kernel: [c015609f] shrink_slab+0x12f/0x190 bari kernel: [c0186565] shrink_icache_memory+0x235/0x240 bari kernel: [c01564cd] kswapd+0x3cd/0x490 bari kernel: [c0154d70] isolate_pages_global+0x0/0x60 bari kernel: [c0156100] kswapd+0x0/0x490 bari kernel: [c0136f80] autoremove_wake_function+0x0/0x50 bari kernel: [c0136c99] kthread+0x39/0x70 bari kernel: [c0136c60] kthread+0x0/0x70 bari kernel: [c0103c83] kernel_thread_helper+0x7/0x14 bari kernel: Code: 30 00 8d 47 04 89 f1 89 ea 89 1c 24 e8 84 63 10 00 85 c0 89 c3 74 1f 31 c9 8d 74 26 00 8b 54 8d 00 8b 02 f6 c4 40 74 03 8b 52 0c f0 ff 42 04 83 c1 01 39 cb 77 e7 8b 44 24 04 f0 ff 47 10 fb 83 bari kernel: EIP: [c014b326] find_get_pages+0x46/0x70 SS:ESP 0068:f7e39e40 bari kernel: === Thanks, Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School
Re: Processed: cloning 568317, reassign -1 to kernel-package
Dear Ben,

You wrote:

> The OP asked us to report the bug, so I assumed he didn't.

It seems you did not pay attention to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#86 written a week before your cloning:

> Seeing your reluctance to talk to kernel-package, I now reported Bug#568823.

Of course none of this matters at all. Kernel-package is now fixed; linux-2.6 will be fixed sometime in the far future, and wrongly, because they speak about repeated patterns, not about reasonable perl code. But it all does not matter because no-one seems to care much about Debian stable (currently lenny) ... oh well, will patch myself.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201002162127.o1glreud015...@bari.maths.usyd.edu.au
Bug#568317: Processed: cloning 568317, reassign -1 to kernel-package
Dear Ben,

> ... Sorry for the mistake.

OK, you are only human and forgiven.

> I know the scripts suck but I am not in a position to rewrite them.
> ... I fixed this in the svn branch ...

You seem to be contradicting yourself.

> I asked you to send patches for linux-2.6, and you refused.

Huh? What do you base that accusation on? Looking in Bug#568317, I see my fumbling about getting hold of the right scripts (oh silly me, always thinking that Debian lenny is current...) and then http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#71 from your colleague Maximilian Attems:

> ... saw your old patch and can do that later tomorrow.
> thanks for your input.

which I interpreted as you not needing further hand-holding after all. Please explain how my interpretation was wrong; or please retract.

> I fixed this in the svn branch for lenny and it should be in the next stable update.

Thanks. Pity you did not let Bug#568317 (and thus the world) know.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201002162212.o1gmcfdc014...@bari.maths.usyd.edu.au
Bug#568317: Processed: cloning 568317, reassign -1 to kernel-package
Dear Ben,

>>> I know the scripts suck but I am not in a position to rewrite them.
>>> ... I fixed this in the svn branch ...
>>
>> You seem to be contradicting yourself.
>
> Not at all. ... I have made a localised fix ... I have removed ...
> I would like to go much further ...

OK, you seem to make a subtle distinction between fix and rewrite.

>>> I asked you to send patches for linux-2.6, and you refused.
>>
>> Huh? What do you base that accusation on? ...
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#71
>> ... saw your old patch and can do that later tomorrow.
>> thanks for your input.
>> which I interpreted as you not needing further hand-holding after all.
>> Please explain how my interpretation was wrong; or please retract.
>
> You declined to send a patch for linux-2.6 after I pointed out that it is separate from kernel-package. Yes, Maks said he could use your k-p patch, but it would not have applied cleanly and would have required fixing up. Given that the kernel team is quite busy, and that you are clearly capable of debugging Perl and making patches, I don't think it was unreasonable of me to expect you to help us a bit further.

Please note: I never refused, never declined. Please retract. Maybe I failed to deliver, but only because I was wilfully misled into thinking that such was not wanted. Will never again listen to Maks. :-)

>>> I fixed this in the svn branch for lenny and it should be in the next stable update.
>>
>> Thanks. Pity you did not let Bug#568317 (and thus the world) know.
>
> Normal practice is simply to tag a bug pending when the fix is committed, and that has been done (automatically).

Wow, I was not aware of that. But then... Bug#568823 had been pending (I think it is now done): does that mean that I can expect to see kernel-package_12.033 in lenny, in the near future? (Surely not, surely "commit" has meanings unrelated to "to be in next stable update".)

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

Archive: http://lists.debian.org/201002162324.o1gnojsz017...@bari.maths.usyd.edu.au
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Ben,

>>>>> kernel-package has an old version of the script templates. The templates used by linux-2.6 are in debian/templates/temp.image.plain.
>>>>
>>>> Where do I find that?
>>>
>>> apt-get source linux-2.6
>>
>> Sorry, am confused. Is that the same as
>> apt-get install linux-source-2.6.26
>
> It is not.

I now tried that (not with apt-get but manually). I guess I needed the files:

  .../pool/main/l/linux-2.6/linux-2.6_2.6.26.orig.tar.gz
  .../pool/main/l/linux-2.6/linux-2.6_2.6.26-21.diff.gz

(please confirm, or tell me what I messed up). After unpacking those, I find (essentially in the diff file) the .../linux-2.6-2.6.26/debian/templates/temp.image.plain/ directory. However, to my surprise, those files are older than those from the kernel-package directory, for example:

  diff -r /usr/share/kernel-package/pkg/image/postinst linux-2.6-2.6.26/debian/templates/temp.image.plain/postinst
  8,10c8,10
  < # Last Modified On : Wed Oct 8 00:03:41 2008
  < # Last Machine Used: anzu.internal.golden-gryphon.com
  < # Update Count : 360
  ---
  > # Last Modified On : Fri Sep 29 10:08:18 2006
  > # Last Machine Used: glaurung.internal.golden-gryphon.com
  > # Update Count : 357
  ...

Does that mean that kernel-package is in fact newer, and you should import again then use my patches (sent previously)?

---

Should this bug somehow be given to kernel-package also (clone and reassign?): would you be able to do that?

---

>> BTW... lenny 5.0.4, with 2.6.26-21, was announced some time ago, but still
>> http://packages.debian.org/search?keywords=linux-2.6&searchon=sourcenames&suite=stable&section=all
>> http://packages.debian.org/source/lenny/linux-2.6
>> show 2.6.26-19lenny2 : is that a problem?
>
> That is strange. Perhaps you should report a bug against www.debian.org.

Would you be able to do that? (I guess I should not use reportbug for that, and am not sure how.)

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Maks,

> ... saw your old patch and can do that later tomorrow.

Thanks. Please also fix current Debian stable; do not make us wait until 2.6.32 trickles down.

>> Should this bug somehow be given to kernel-package also (clone and reassign?): would you be able to do that?
>
> don't know kernel-package is in slow maintenance,

What does slow maintenance mean? They seem more up-to-date than linux-2.6 in Debian stable. Anyway please arrange for kernel-package to be fixed also.

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Ben,

> ... the date headers in linux-2.6 have not been updated ...

I only whinged about headers; the content in 2.6.26-21 is in fact also older than kernel-package. I do not know what you may have changed for bleeding-edge 2.6.32; and of course I want Debian stable to be updated (now or in the near future).

Seeing your reluctance to talk to kernel-package, I now reported Bug#568823.

Please report the issue with http://packages.debian.org/source/lenny/linux-2.6 .

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Ben,

>>> kernel-package has an old version of the script templates. The templates used by linux-2.6 are in debian/templates/temp.image.plain.
>>
>> Where do I find that?
>
> apt-get source linux-2.6

Sorry, am confused. Is that the same as

  apt-get install linux-source-2.6.26

which would get me (and install) linux-source-2.6.26_2.6.26-21_all.deb and suggest to get/install kernel-package_11.015_all.deb ? I do not see anything about templates in either deb file, but see /usr/share/kernel-package/pkg/image/postinst in kernel-package.

BTW... lenny 5.0.4, with 2.6.26-21, was announced some time ago, but still

  http://packages.debian.org/search?keywords=linux-2.6&searchon=sourcenames&suite=stable&section=all
  http://packages.debian.org/source/lenny/linux-2.6

show 2.6.26-19lenny2 : is that a problem?

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#568317: linux-image-* postinst did not correctly run lilo
Niko Tyni wrote:

> Reassigning now. ...

Thanks.

> ... you could probably follow up and explain what exactly was incorrect
> about running lilo.

Looking at my /var/lib/dpkg/info/linux-image-*.postinst files, I see in the code reading and parsing $CONF_LOC = '/etc/kernel-img.conf':

  ...
  $do_symlink = "" if /do_symlinks\s*=\s*(no|false|0)\s*$/ig;
  ...
  $do_bootloader = "Yes" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
  $explicit_do_loader = "YES" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
  ...

Most of the match patterns are used once only; using /g on them is not necessary, and is probably wasteful (though perl is fast enough to handle such things). The pattern /do_bootloader.../ig is used twice. The first one may match; the second one will surely not match because of the spurious /g, thus explicit_do_loader will never be set (and lilo not run, or run after a question left un-answered in unattended runs of apt-get install).

---

Minor issues, while I am criticizing perl style...

These patterns are anchored at the end, should also be anchored at the beginning (and with explicit m//) like:

  $do_symlink = "" if m/^\s*do_symlinks\s*=\s*(no|false|0)\s*$/i;
  ...
  $image_dest = $1 if m/^\s*image_dest\s*=\s*(\S+)\s*$/i;

I wonder about the need to use my() in a single-level script.

---

Cheers, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Ben, ... If you can provide patches, that would be most helpful. See below. I now see that the sources of these files are in package kernel-package (I do not know how that relates to linux-2.6). I only patched the spurious /g modifiers and cleaned up the patterns e.g. to anchor at the beginning; did not drop the unnecessary my(). Cheers, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of SydneyAustralia --- usr/share/kernel-package/pkg/headers/postinst.bak 2008-05-02 15:06:28.0 +1000 +++ usr/share/kernel-package/pkg/headers/postinst 2010-02-05 10:30:23.0 +1100 @@ -146,8 +146,8 @@ s/\#.*$//g; next if /^\s*$/; - $src_postinst_hook = $1 if /src_postinst_hook\s*=\s*(\S+)/ig; - $header_postinst_hook = $1 if /header_postinst_hook\s*=\s*(\S+)/ig; + $src_postinst_hook = $1 if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i; + $header_postinst_hook = $1 if m/^\s*header_postinst_hook\s*=\s*(\S+)\s*$/i; } close CONF; $have_conffile = Yes; --- usr/share/kernel-package/pkg/source/postinst.bak2008-05-02 15:06:28.0 +1000 +++ usr/share/kernel-package/pkg/source/postinst2010-02-05 10:31:06.0 +1100 @@ -57,7 +57,7 @@ s/\#.*$//g; next if /^\s*$/; - $src_postinst_hook = $1 if /src_postinst_hook\s*=\s*(\S+)/ig; + $src_postinst_hook = $1 if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i; } close CONF; $have_conffile = Yes; --- usr/share/kernel-package/pkg/doc/postinst.bak 2008-05-02 15:06:28.0 +1000 +++ usr/share/kernel-package/pkg/doc/postinst 2010-02-05 10:31:51.0 +1100 @@ -57,7 +57,7 @@ s/\#.*$//g; next if /^\s*$/; - $src_postinst_hook = $1 if /src_postinst_hook\s*=\s*(\S+)/ig; + $src_postinst_hook = $1 if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i; } close CONF; $have_conffile = Yes; --- usr/share/kernel-package/pkg/image/postinst.bak 2008-11-25 04:01:32.0 +1100 +++ usr/share/kernel-package/pkg/image/postinst 2010-02-05 10:43:59.0 +1100 @@ -116,60 +116,60 @@ warn Option image_in_boot is deprecated, and will go away. 
Use link_in_boot instead.\n if m/image_in_boot\s*=\s*/; - $do_symlink = if /do_symlinks\s*=\s*(no|false|0)\s*$/ig; - $no_symlink = if /no_symlinks\s*=\s*(no|false|0)\s*$/ig; - $reverse_symlink = if /reverse_symlink\s*=\s*(no|false|0)\s*$/ig; - $link_in_boot= if /link_in_boot\s*=\s*(no|false|0)\s*$/ig; - $link_in_boot= if /image_in_boot\s*=\s*(no|false|0)\s*$/ig; - $move_image = if /move_image\s*=\s*(no|false|0)\s*$/ig; - $clobber_modules = '' if /clobber_modules\s*=\s*(no|false|0)\s*$/ig; - $do_boot_enable = '' if /do_boot_enable\s*=\s*(no|false|0)\s*$/ig; - $do_bootfloppy = '' if /do_bootfloppy\s*=\s*(no|false|0)\s*$/ig; - $relative_links = '' if /relative_links \s*=\s*(no|false|0)\s*$/ig; - $do_bootloader = '' if /do_bootloader\s*=\s*(no|false|0)\s*$/ig; - $do_initrd = '' if /do_initrd\s*=\s*(no|false|0)\s*$/ig; - $warn_initrd = '' if /warn_initrd\s*=\s*(no|false|0)\s*$/ig; - $use_hard_links = '' if /use_hard_links\s*=\s*(no|false|0)\s*$/ig; - $silent_modules = '' if /silent_modules\s*=\s*(no|false|0)\s*$/ig; - $silent_loader = '' if /silent_loader\s*=\s*(no|false|0)\s*$/ig; - $warn_reboot = '' if /warn_reboot\s*=\s*(no|false|0)\s*$/ig; - $minimal_swap= '' if /minimal_swap\s*=\s*(no|false|0)\s*$/ig; - $ignore_depmod_err = '' if /ignore_depmod_err\s*=\s*(no|false|0)\s*$/ig; - $relink_src_link = '' if /relink_src_link\s*=\s*(no|false|0)\s*$/ig; - $relink_build_link = '' if /relink_build_link\s*=\s*(no|false|0)\s*$/ig; - $force_build_link = '' if /force_build_link\s*=\s*(no|false|0)\s*$/ig; - - $do_symlink = Yes if /do_symlinks\s*=\s*(yes|true|1)\s*$/ig; - $no_symlink = Yes if /no_symlinks\s*=\s*(yes|true|1)\s*$/ig; - $reverse_symlink = Yes if /reverse_symlinks\s*=\s*(yes|true|1)\s*$/ig; - $link_in_boot= Yes if /link_in_boot\s*=\s*(yes|true|1)\s*$/ig; - $link_in_boot= Yes if /image_in_boot\s*=\s*(yes|true|1)\s*$/ig; - $move_image = Yes if /move_image\s*=\s*(yes|true|1)\s*$/ig; - $clobber_modules = Yes if /clobber_modules\s*=\s*(yes|true|1)\s*$/ig; - $do_boot_enable 
= Yes if /do_boot_enable\s*=\s*(yes|true|1)\s*$/ig; - $do_bootfloppy = Yes if /do_bootfloppy\s*=\s*(yes|true|1)\s*$/ig; - $do_bootloader = Yes if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig; - $explicit_do_loader = YES if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig; - $relative_links = Yes if /relative_links\s*=\s*(yes|true|1)\s*$/ig; - $do_initrd = Yes if /do_initrd\s*=\s*(yes|true|1)\s*$/ig; - $warn_initrd = Yes
Bug#568317: linux-image-* postinst did not correctly run lilo
Dear Ben,

> kernel-package has an old version of the script templates. The
> templates used by linux-2.6 are in debian/templates/temp.image.plain.

Where do I find that? I am using (working, building on) Debian lenny, building from linux-source-2.6.26-21.tar.bz2 .

(Anyway, are not my suggested changes simple enough so you do not need actual patch files?)

Thanks, Paul

Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#495529: linux-source: SMP process scheduler leaves CPUs idle
Package: linux-source
Version: 2.6.18.dfsg.1-18etch6
Severity: normal

I have some machines with 8 CPUs (dual Intel Xeon quad-core CPU chips), used for long-running calculations; normally there are no short-lived processes.

If all processes are niced to the same or nearby level, then things behave as expected. But if there is a large difference in the nice level (e.g. one job at level 15 and 8 jobs at level 18), then one CPU is left idle: total CPU percentages add up to 700% and top shows idleness at 12.5%. (Under some similar conditions I have also observed 2 idle CPUs.)

Please let me know if you need further details.

Thanks,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-pk02.19-svr
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
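A load like the one in this report could be set up roughly as follows. This is a sketch under assumptions, not the jobs actually run on those machines: burn, start_mixed_load and idle_percent are hypothetical names, and plain busy loops stand in for the long-running calculations. Whether it reproduces the idle CPU will depend on the kernel and hardware.

```shell
#!/bin/sh
# Sketch only: burn, start_mixed_load and idle_percent are hypothetical names.

# Start one CPU-bound busy loop at the given nice level, in the background.
burn() {
    nice -n "$1" sh -c 'while :; do :; done' &
}

# The mix from the report: one job at nice 15 and eight jobs at nice 18.
start_mixed_load() {
    burn 15
    i=0
    while [ "$i" -lt 8 ]; do
        burn 18
        i=$((i + 1))
    done
}

# Share of total CPU capacity left idle: idle CPUs divided by all CPUs.
idle_percent() {
    awk -v idle="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * idle / total }'
}

idle_percent 1 8   # one idle CPU out of 8
```

After calling start_mixed_load, top can be watched for the per-process CPU percentages and the idle figure; the background loops have to be killed by hand afterwards. With one of 8 CPUs idle, the seven busy CPUs account for the 700% and idle_percent gives the 12.5% mentioned above.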
Bug#406902: kernel NFS data loss
On 24 Jul 07 I wrote:

> The patch below (against 2.6.8-16sarge7) seems to solve the problem.
> ...
> [patch for fs/exportfs/expfs.c]
> ...

Seems to me that this has been incorporated (or done) in linux-source-2.6.18.dfsg.1-13etch2 .

Earlier, on 20 Jul 07 I wrote:

> ... I noticed what I thought were oddities. ... Do you think the
> following patch against 2.6.8-16sarge7 code would be useful?
> ...
> [patch for fs/nfs/dir.c]
> ...

That does not seem to have been added to linux-source-2.6.18.dfsg.1-13etch2 . I am still not quite sure what needs lock_kernel(), and anyway my patch did not actually solve anything, so that might be OK.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#406902: [PATCH] Re: Bug#406902 kernel NFS data loss
The patch below (against 2.6.8-16sarge7) seems to solve the problem. I now run my machines with both this patch, and also the one I submitted on 20 Jul. Please include in future versions of the kernel.

Thanks,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

--- fs/exportfs/expfs.c.bak	2007-04-17 07:58:28.0 +1000
+++ fs/exportfs/expfs.c	2007-07-23 10:04:19.759071709 +1000
@@ -76,6 +76,12 @@
 			return result;
 		if (S_ISDIR(result->d_inode->i_mode)) {
 			/* there is no other dentry, so fail */
+/* PSz 23 Jul 07  Not ESTALE but EACCES
+ * See comments around line 292 below, and
+ * http://bugs.debian.org/255931
+ * http://bugs.debian.org/406902
+ */
+			err = -EACCES;
 			goto err_result;
 		}
 		/* try any other aliases */
Bug#406902: kernel NFS data loss
Have now tested the patch in my previous message: it does not solve the problem I reported.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#406902: kernel NFS data loss
I ran some tests today, and it seemed (but not conclusive) that the problem only occurs when the client is a multi-CPU SMP machine.

Looking at kernel source code, I noticed what I thought were oddities. Do you think the following patch against 2.6.8-16sarge7 code would be useful? I have not yet tested whether this works at all, or whether it improves anything.

Thanks,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

--- fs/nfs/dir.c.bak	2004-08-14 15:36:58.0 +1000
+++ fs/nfs/dir.c	2007-07-20 13:17:33.387030060 +1000
@@ -681,14 +681,15 @@
  */
 static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode)
 {
+	/* PSz 20 Jul 07  Do not we need lock_kernel() for nfs_renew_times()? */
+	lock_kernel();
 	if (dentry->d_flags & DCACHE_NFSFS_RENAMED) {
-		lock_kernel();
 		inode->i_nlink--;
 		nfs_complete_unlink(dentry);
-		unlock_kernel();
 	}
 	/* When creating a negative dentry, we want to renew d_time */
 	nfs_renew_times(dentry);
+	unlock_kernel();
 	iput(inode);
 }
 
@@ -832,9 +833,12 @@
 		}
 	}
 no_entry:
+	/* PSz 20 Jul 07  Do not we need lock_kernel() for d_add() and nfs_renew_times()? */
+	lock_kernel();
 	d_add(dentry, inode);
 	nfs_renew_times(dentry);
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
+	unlock_kernel();
 out:
 	BUG_ON(error > 0);
 	return ERR_PTR(error);
@@ -882,8 +886,12 @@
 	unlock_kernel();
 out:
 	dput(parent);
-	if (!ret)
+	if (!ret) {
+		/* PSz 20 Jul 07  Do not we need lock_kernel() for d_drop()? */
+		lock_kernel();
 		d_drop(dentry);
+		unlock_kernel();
+	}
 	return ret;
 no_open:
 	dput(parent);
@@ -990,6 +998,7 @@
 	}
 	inode = nfs_fhget(dentry->d_sb, fhandle, fattr);
 	if (inode) {
+/* PSz 20 Jul 07  The whole nfs_instantiate() is only ever called within lock_kernel() */
 		d_instantiate(dentry, inode);
 		nfs_renew_times(dentry);
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dentry->d_parent->d_inode));
@@ -1200,6 +1209,7 @@
 			dir, &qsilly);
 	nfs_end_data_update(dir);
 	if (!error) {
+/* PSz 20 Jul 07  The whole nfs_sillyrename() is only ever called within lock_kernel() */
 		nfs_renew_times(dentry);
 		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 		d_move(dentry, sdentry);
Bug#406902: kernel-source: NFS data loss
Package: kernel-source
Version: 2.6.8_16sarge6
Severity: important

Maybe this is related to http://bugs.debian.org/255931 (and concomitant discussion on [EMAIL PROTECTED]). I only guess that this is caused by a spurious ESTALE error return.

Running on machine rome, as plain user psz:

  while :; do date; perl -e 'mkdir "tdir"; chdir "tdir"; open F, ">tscr"; print F "echo hello; sleep 2; /bin/pwd; echo bye\n"; close F; system "sh tscr"; mkdir("dir$_"),rmdir("dir$_") foreach 1..100; system "sh tscr"; unlink "tscr"; chdir ".."; rmdir "tdir"'; done

and (also on rome), as root:

  while :; do date; lsof | grep psz | grep -E 'cwd|deleted|dir' | grep -v xterm; sleep 1; done

occasionally produces

  Mon Jan 15 08:12:40 EST 2007
  hello
  /pisa/users/amstaff/psz/tdir
  bye
  shell-init: could not get current directory: getcwd: cannot access parent directories: No such file or directory
  hello
  /bin/pwd: cannot get current directory: No such file or directory
  bye
  Mon Jan 15 08:12:45 EST 2007
  hello
  /pisa/users/amstaff/psz/tdir
  bye
  hello
  Mon Jan 15 08:12:48 EST 2007

  Mon Jan 15 08:12:41 EST 2007
  bash  13107 psz  cwd  DIR     0,15 8192     581 /pisa/users/amstaff/psz (pisa:/usr/users)
  perl  13963 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  sh    13964 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  sh    13964 psz  255r REG     0,15   40 5800712 /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
  sleep 13965 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  Mon Jan 15 08:12:42 EST 2007
  bash  13107 psz  cwd  DIR     0,15 8192     581 /pisa/users/amstaff/psz (pisa:/usr/users)
  perl  13963 psz  cwd  unknown 0,15              /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  Mon Jan 15 08:12:44 EST 2007
  bash  13107 psz  cwd  DIR     0,15 8192     581 /pisa/users/amstaff/psz (pisa:/usr/users)
  perl  13963 psz  cwd  unknown 0,15              /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
  sh    13984 psz  cwd  unknown 0,15              /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
  sh    13984 psz  255r REG     0,15   40 5800712 /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
  sleep 13985 psz  cwd  unknown 0,15              /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
  Mon Jan 15 08:12:45 EST 2007
  bash  13107 psz  cwd  DIR     0,15 8192     581 /pisa/users/amstaff/psz (pisa:/usr/users)
  perl  13995 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  sh    13996 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  sh    13996 psz  255r REG     0,15   40 5800712 /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
  sleep 13997 psz  cwd  DIR     0,15 4096 5800702 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
  Mon Jan 15 08:12:46 EST 2007

Settings:

  rome# grep psz /etc/passwd
  psz:x:1001:1001:Paul Szabo:/users/amstaff/psz:/bin/bash
  rome# ls -l /users/amstaff
  lrwxrwxrwx 1 root root 19 Jan 19 2005 /users/amstaff -> /pisa/users/amstaff
  rome# mount | grep pisa/users
  pisa:/usr/users on /pisa/users type nfs (rw,bg,rsize=8192,wsize=8192,addr=129.78.69.136)

(pisa uses default root_squash in its /etc/exports).

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-spm1.7
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Bug#402094: kernel-source-2.6.8: Intel drivers (net/e100.c, net/e1000/e1000_main.c)
Package: kernel-source-2.6.8
Version: 2.6.8-16sarge5
Severity: critical
Justification: root security hole

Noticed:

  Intel LAN Driver Buffer Overflow Local Privilege Escalation
  http://support.intel.com/support/network/sb/CS-023726.htm

The Intel blurb says Linux, and specifically Debian, is affected also:

  Product Family             OS      Affected Driver Versions  Corrected Driver Versions
  Intel PRO 10/100 Adapters  Linux*  3.5.14 or previous        3.5.17 or later
  Intel PRO/1000 Adapters    Linux   7.2.7 or previous         7.3.15 or later

and it seems that:

kernel-source-2.6.8/drivers/net/e100.c
  #define DRV_NAME		"e100"
  #define DRV_VERSION		"3.0.18"
  #define DRV_DESCRIPTION	"Intel(R) PRO/100 Network Driver"
  #define DRV_COPYRIGHT		"Copyright(c) 1999-2004 Intel Corporation"

kernel-source-2.6.8/drivers/net/e1000/e1000_main.c
  char e1000_driver_name[] = "e1000";
  char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
  char e1000_driver_version[] = "5.2.52-k4";
  char e1000_copyright[] = "Copyright (c) 1999-2004 Intel Corporation.";

are quite old (so seem to be affected).

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-spm1.6
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages kernel-source-2.6.8 depends on:
ii  binutils               2.15-6   The GNU assembler, linker and bina
ii  bzip2                  1.0.2-7  high-quality block-sorting file co
ii  coreutils [fileutils]  5.2.1-2  The GNU core utilities
ii  fileutils              5.2.1-2  The GNU file management utilities
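The "quite old" comparison can be written out as a small check. This is a sketch only: version_le is a hypothetical helper, it looks at the leading number of each dotted field and ignores suffixes such as -k4, and it assumes (as the report does) that the in-tree version strings can be compared directly with the numbers in Intel's table.

```shell
#!/bin/sh
# Sketch only: version_le is a hypothetical helper, not part of any package.
# Succeeds when dotted version $1 is lower than or equal to $2, comparing the
# leading number of each dot-separated field from left to right.
version_le() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            p = x[i] + 0; q = y[i] + 0    # "52-k4" + 0 evaluates to 52
            if (p < q) exit 0
            if (p > q) exit 1
        }
        exit 0
    }'
}

# Versions from the kernel source against the "or previous" bounds in the table.
version_le 3.0.18    3.5.14 && echo "e100 3.0.18: within the affected range"
version_le 5.2.52-k4 7.2.7  && echo "e1000 5.2.52-k4: within the affected range"
```

Both in-tree versions sort below the upper bound of the affected range, which is the basis for "seem to be affected"; it does not show that the in-kernel drivers contain the same code as Intel's standalone releases.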
Bug#384922: NFS insecure without support for squashing multiple groups
I will re-phrase the problem, this may be clearer for some people:

The root_squash option is to protect from an evil root. Though group staff is root-equivalent, root_squash does not currently squash that group (for various reasons, the kernel not supporting such options being one). An evil root could become group staff on the client, not get squashed across NFS, then become root on the server: root_squash is defeated.

Methods of exploitation, and ways to fix, were discussed already.

I know this bug renders my systems exploitable as we relied on the default root_squash working, and never set non-default permissions on /usr/local or altered root's PATH. I believe it renders many other systems exploitable also, but have no way to test that hypothesis.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
severity 384922 critical
thanks

Dear Steve,

Sorry, I missed one:

> ... only exploitable when
> - you have a non-empty staff group on the client (+/- equivalent to
>   untrusted root users on the client, since any root user can simply add
>   users to this group)
> - you have NFS-shared filesystems that aren't marked nosuid
> - the untrusted user on the client has access to run processes on the NFS
>   server
> - /usr/local/{bin,sbin} are in root's path
> - /usr/local/{bin,sbin} are writable by group staff

No need for the attacker to have direct login access to the NFS server: if there is some user activity there, that could be trojaned.

Of your five conditions, (1) is a given (what we are protecting against), (2) is what we use NFS for, (3) is likely to be present, and (4) and (5) are forced upon us by Debian policy. (Were not these things debated in #299007 already?)

Sounds critically gaping to me.

---

I am somewhat curious: who is Steinar, and who are you? I had submitted a bug against nfs-kernel-server; the maintainer there is Anibal. You jumped in and re-jiggled the severity; then there were some messages from Steinar, never anything from Anibal. After re-assigning to linux-2.6.16 (hmm... why the specific version?) where the maintainer is a nebulous committee, again you re-jiggle severity; and no word from the maintainers.

Thanks,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
severity 384922 critical
thanks

Dear Steve,

> It happens to be very dangerous to share a filesystem via NFS between
> systems that have different security contexts. This does not make it a
> critical bug ...

Is it acceptable for a root compromise of one system to easily propagate onto another? I am confused: what is the use and intent of root_squash, why is it enabled by default, and why is there an option to turn it off? Is it documented that NFS must never be used between systems in different security contexts, other than that UID/GIDs should match?

>> Sorry, as I read Debian policy (and as discussed in #299007), I am not
>> permitted to change root's PATH or change the permissions on /usr/local.
> *You* are permitted to do either of these things. Whether they will be
> done by default in *Debian* is a separate question.

Could you please point me to where that is documented, and maybe explain what does the policy apply to? If policy may be ignored, then is there such a thing as a critical bug? Turn it off or fix it yourself and you will be safe: is that good enough?

---

>> No need for the attacker to have direct login access to the NFS server:
>> if there is some user activity there, that could be trojaned.
> Now you're not even talking about anything that can be *fixed* by
> smash_gids, you're talking about trojaning arbitrary files that will be
> accessed by individual users on the NFS server. The only way you can guard
> against a compromised client in that case is to never share home
> directories of any users you're worried about!

I am talking about what an attacker can do, once he gets root on the client. I trust my users (to have no skills to attack). And it can be fixed: root on the server will be safe if we fix either of the last two points, in the policy or if the policy allows us to fix our systems; or if at great expense we implement squashing GIDs.

> The answer remains, don't set your NFS environment up that way.

The correct answer seems to be fix or ignore the policy.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
severity 384922 critical
thanks

Dear Steve,

The issue is root compromise of an NFS server. If that is possible then it is critical; if it is not possible then the bug is solved. It seems logically impossible to downgrade this kind of bug.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
retitle 384922 NFS root_squash broken without support for squashing multiple groups
severity 384922 critical
thanks

Dear Steve,

> [root_squash is] often circumventable ...

References (CERT kb, securityfocus BID, secunia advisory)? I do not know of any (other than this bug) instances of defeating root_squash.

>> Is it documented that NFS must ...
> How is it the responsibility of the kernel ...

This bug was originally filed against NFS. I assert that it is not documented, because there is no such must.

> ... policy requirements ... that each package must satisfy ...

NFS (or kernel) must be secure while complying with policy requirements.

> ... consequence of *installing or using the software* ...

Using it in some common, reasonable way; following common UNIX knowledge, and Debian-specific documentation; using root_squash as was intended.

> No, there are three vectors by which an attacker on the client can exploit
> this problem to get root on the NFS server:
> - the server's /usr/local is itself NFS-shared from server to client, and
>   in the absence of squash_gids the attacker is able to directly trojan
>   root's path
> - a filesystem is shared that allows the attacker to write an suid binary
>   to the server, and the attacker is able to start arbitrary processes on
>   the server using credentials of a compromised user account and thereby
>   trojan root's path, *or* attack through another privileged group
> - a filesystem is shared that allows the attacker to write a file to the
>   server that triggers some other user who is a member of a privileged
>   group to execute an attack using their privileged group membership; most
>   likely this would be done via shared home directories and shell startup
>   configuration.

The attack I described does not fit well with any of your vectors. The first vector surely does not apply. I think you meant suid root in the second vector: then it does not apply, correctly prevented by root_squash; there are no compromised accounts on the server. The third vector does not apply, as there are no users in any privileged group on the server.

The issue in this bug is the mere existence of the staff group: the fact that it is root-equivalent, though without any member users, is sufficient for the attack to succeed. The attack could be prevented by squash gid trickery, or more simply by ensuring root has a sane PATH and/or by sane ownerships on /usr/local. (I note that Debian policy is insane, but that is not this bug.)

Other UNIX distributions do not have root-equivalent users or groups, thus their NFS services and root_squash options perform correctly.

> ... unwilling to yourself block these users' access to /usr/local.

None of my users have any (other than read) access.

> You're damn well free to ignore the policy when configuring your system.
> I have no idea where you got the idea that policy was binding on users.

Thanks. Could you please state that in #299007 also? Still, Debian must be secure by default, out-of-the-box.

I guess this bug could be solved by asking NFS to document it needs non-default, non-policy-compliant settings for it to function securely. But then those settings would need to go into its setup scripts, and it would be in breach of policy, triggering a serious bug and its removal from Debian.

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: nfs-kernel-server: root_squash is broken
retitle 384922 NFS insecure without support for squashing multiple groups
tags 384922 security
severity 384922 critical
thanks

Dear Steinar,

> ... You may want to actually talk to the NFS kernel server people ...

Huh? I thought that is what I have been doing until now! (Oops, my mistake, package nfs-kernel-server does not come close...)

Funny: you meekly accept that NFS is hopelessly insecure and no security conscious person will ever use it. Do you not find that offensive? (Not my comment: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=299007;msg=276 .)

Funny: all it would take is a tiny policy change, to be permitted to drop /usr/local things from root's PATH, or to remove group staff writability from those things. Everyone seems to know those should be done...

Thanks for your help,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
Dear Steve,

You seem to think that this is important but not critical. Don't you agree that it is a root security hole?

Thanks,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Bug#384922: NFS insecure without support for squashing multiple groups
Dear Steve,

Thanks for your response.

> The bug log indicates that it's only exploitable when
> - you have a non-empty staff group on the client (+/- equivalent to
>   untrusted root users on the client, since any root user can simply add
>   users to this group)
> - you have NFS-shared filesystems that aren't marked nosuid
> - the untrusted user on the client has access to run processes on the NFS
>   server
> - /usr/local/{bin,sbin} are in root's path
> - /usr/local/{bin,sbin} are writable by group staff
>
> The last two points are true by default on Debian, but the first three
> points are configuration decisions on the part of the NFS server
> administrator. I understand that you have reasons to export shares
> allowing suid binaries in your own environment, but then you can also
> reconfigure root's path or the permissions on /usr/local/* in that case.

Sorry, the NFS server administrator does not really have control over the first point. The purpose of root_squash is to limit and contain the damage of a root compromise on the client; if root on the client could be fully trusted then there would be no need or use for root_squash.

Sorry, as I read Debian policy (and as discussed in #299007), I am not permitted to change root's PATH or change the permissions on /usr/local.

> I do agree that root should not have directories in its path by default
> that are writable by non-root users; but that is not this bug.

Yes, that is #299007, but am told that policy bugs cannot be critical...

Cheers,

Paul Szabo [EMAIL PROTECTED] http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
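The last two conditions in the list (the ones said to hold by default on Debian) can be checked on a given server with something like the sketch below. The function names path_has_dir, is_group_writable and check_usr_local are hypothetical, and the check is deliberately narrow: it looks only at the group write bit and at PATH membership, not at which group owns the directory.

```shell
#!/bin/sh
# Sketch only: path_has_dir, is_group_writable and check_usr_local are
# hypothetical names.

# Succeeds if directory $2 is an element of the colon-separated list $1.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Succeeds if $1 exists and has the group write bit (octal 020) set.
is_group_writable() {
    [ -n "$(find "$1" -prune -perm -020 -print 2>/dev/null)" ]
}

# Report both conditions for /usr/local/{bin,sbin}. Run as root so that
# $PATH is root's PATH.
check_usr_local() {
    for d in /usr/local/bin /usr/local/sbin; do
        path_has_dir "$PATH" "$d" && echo "$d is in PATH"
        is_group_writable "$d"    && echo "$d is group-writable"
    done
    return 0
}
```

Calling check_usr_local in a root shell on the NFS server prints one line per condition that holds; ls -ld on the same directories shows whether the owning group is staff.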