Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2021-05-01 Thread Paul Szabo
I no longer use 32-bit kernels (but use the 64-bit amd64 kernel, even on
my few last remaining 32-bit machines): that seems a suitable workaround
or upgrade path. Should I try to test whether the issue with PAE
remains?

Cheers, Paul
-- 
Paul Szabo   p...@maths.usyd.edu.au   www.maths.usyd.edu.au/u/psz
School of Mathematics and Statistics   University of Sydney    Australia

I support NTEU members taking a stand for workplace rights in the face of
poorly-run change management. Visit www.nteu.org.au/sydney to learn more.



Bug#892105: linux-image-4.9.0-6-amd64: i40e driver still unstable

2019-01-09 Thread Paul Szabo
I use kernel 4.9.130 (my own build from current "stretch" sources,
package linux-source-4.9 version 4.9.130-2), and on my new machines
with i40e devices, I observe similar, occasional issues:

Jan  9 07:30:06 viale kernel: [428469.260531] i40e :19:00.1: cleared PE_CRITERR
Jan  9 07:30:06 viale kernel: [428469.260639] i40e :19:00.1: TX driver issue detected, PF reset issued

Jan  9 08:47:06 siv kernel: [422993.009196] i40e :19:00.1: cleared PE_CRITERR
Jan  9 08:47:06 siv kernel: [422993.013535] i40e :19:00.1 eth1: NIC Link is Down
Jan  9 08:47:16 siv kernel: [423002.131389] i40e :19:00.1 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None

Curiously, each of those machines only ever shows one type of error
(never the type seen on the other machine), and both only complain
about eth1, never about eth0 (though eth0 is also connected, with similar
traffic volumes).
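
A minimal sketch for tallying such messages per host, assuming
syslog-format lines like the ones quoted above (host name in the fourth
field). The function name i40e_summary and the event labels are made up
for illustration; they are not part of the driver or of any Debian tool.

```sh
# Hypothetical helper: count i40e kernel messages per host and event type.
# Reads syslog-format lines on stdin and prints "host event count".
i40e_summary() {
    awk '/kernel:.* i40e / {
        host = $4                      # syslog: month day time host ...
        event = "other"
        if ($0 ~ /PF reset issued/)       event = "pf_reset"
        else if ($0 ~ /NIC Link is Down/) event = "link_down"
        else if ($0 ~ /NIC Link is Up/)   event = "link_up"
        else if ($0 ~ /PE_CRITERR/)       event = "pe_criterr"
        count[host " " event]++
    }
    END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}

# Usage (the log path is an assumption; adjust to the local syslog setup):
#   i40e_summary < /var/log/kern.log
```

Run over the kernel logs of both machines, this should make it easy to
confirm whether each host really only ever reports one kind of event.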

Following the hints in this bug report, I will try the Intel i40e
driver, from (either)
   https://downloadcenter.intel.com/download/24411/
   https://sourceforge.net/projects/e1000/files/i40e%20stable/

Cheers, Paul
-- 
Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia

Bug#775541: NFS mounts fail at boot after Debian 8.5 upgrade

2016-09-06 Thread paul . szabo
Dear Vincent,

> Could you provide a bit more information about the package versions
> on your system?
> dpkg -l rpcbind nfs-common nfs-kernel-server systemd

psz@como:~$ dpkg -l rpcbind nfs-common nfs-kernel-server systemd
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name               Version            Architecture  Description
+++-==================-==================-=============-=====================================================
ii  nfs-common         1:1.2.8-9          i386          NFS support files common to client and server
ii  nfs-kernel-server  1:1.2.8-9          i386          support for NFS kernel server
ii  rpcbind            0.2.1-6+deb8u1     i386          converts RPC program numbers into universal addresses
ii  systemd            215-17+deb8u4.psz  i386          system and service manager

The systemd packages are my "own", with my (trivial!) patches as per
https://bugs.debian.org/803013

> Also I think the output of these commands would be helpful
> systemd-analyze critical-path remote-fs-pre.target
> systemd-analyze critical-path nfs-kernel-server.service

I think you meant critical-chain:

psz@como:~$ systemd-analyze critical-chain remote-fs-pre.target
...
remote-fs-pre.target @98ms


psz@como:~$ systemd-analyze critical-chain nfs-kernel-server.service
...
nfs-kernel-server.service +223ms
  basic.target @4.819s
timers.target @4.818s
  systemd-tmpfiles-clean.timer @4.818s
sysinit.target @4.816s
  console-setup.service @4.813s +1ms
kbd.service @4.753s +58ms
  system.slice @108ms
-.slice @103ms

Cheers, Paul



Bug#775541: NFS mounts fail at boot after Debian 8.5 upgrade

2016-08-19 Thread paul . szabo
After upgrading from Debian jessie 8.4 to 8.5, my NFS mounts in fstab
failed at boot (or reboot) time. To fix, I changed the one file
  /lib/systemd/system/remote-fs-pre.target
adding the line
  After=rpcbind.target
then my NFS mounts work correctly.

Question: should I have used After=rpcbind.service instead?
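
A sketch of the same change made as a systemd drop-in, so that the
packaged unit file under /lib stays untouched and survives upgrades.
The function name and the file name rpcbind.conf are arbitrary choices
for illustration, and this does not settle the .target versus .service
question above.

```sh
# Sketch: write the ordering dependency as a drop-in for remote-fs-pre.target.
# $1 is the drop-in directory (defaults to the usual /etc location).
write_rpcbind_dropin() {
    dir=${1:-/etc/systemd/system/remote-fs-pre.target.d}
    mkdir -p "$dir" &&
    printf '[Unit]\nAfter=rpcbind.target\n' > "$dir/rpcbind.conf"
}

# Usage (as root), followed by a reload so systemd picks up the file:
#   write_rpcbind_dropin && systemctl daemon-reload
```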

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia



Bug#800945: linux-source-3.16: Scheduler prefers pinned tasks

2015-10-11 Thread paul . szabo
I believe the problem is solved. Please see discussion under
  http://marc.info/?t=14440821092=1=2
specifically the message
  http://marc.info/?l=linux-kernel&m=144459727213633&w=2

and quoting from there:

  I believe I have now solved the problem, simply by setting:
  ...
  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/min_interval; do echo 0 > $n; done
  for n in /proc/sys/kernel/sched_domain/cpu*/dom*/max_interval; do echo 1 > $n; done
  echo 10 > /proc/sys/kernel/sched_latency_ns
  echo 10 > /proc/sys/kernel/sched_min_granularity_ns
  echo 1 >  /proc/sys/kernel/sched_wakeup_granularity_ns

Please close this bug (as "solved" or "user config issue" or "invalid").
Sorry about the noise...

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia



Bug#800945: linux-source-3.16: Scheduler prefers pinned tasks

2015-10-05 Thread Paul Szabo
Package: linux-source-3.16
Version: 3.16.7-ckt11-1+deb8u4
Severity: normal

The Linux CFS scheduler prefers pinned tasks and unfairly
gives more CPU time to tasks that have set CPU affinity.
This effect is observed with or without CGROUP controls.

To demonstrate: on an otherwise idle machine, as some user,
run one busy process pinned to each CPU (as many processes
as there are CPUs in the system), e.g. for a quad-core
non-HyperThreaded machine:

  taskset -c 0 perl -e 'while(1){1}' &
  taskset -c 1 perl -e 'while(1){1}' &
  taskset -c 2 perl -e 'while(1){1}' &
  taskset -c 3 perl -e 'while(1){1}' &

and (as that same or some other user) run some without
pinning:

  perl -e 'while(1){1}' &
  perl -e 'while(1){1}' &

and use e.g.   top   to observe that the pinned processes get
more CPU time than "fair".
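
The same test for an arbitrary number of CPUs could be sketched as
below. The function name pinned_test_cmds is made up for illustration;
it only prints the commands, so nothing is started until the output is
fed to a shell.

```sh
# Print (do not run) the commands for the test above:
#   $1 = number of CPUs, one pinned busy loop is generated per CPU
#   $2 = number of unpinned busy loops (default 2, as in the report)
pinned_test_cmds() {
    ncpu=$1
    nfree=${2:-2}
    busy="perl -e 'while(1){1}'"
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        echo "taskset -c $i $busy &"
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$nfree" ]; do
        echo "$busy &"
        i=$((i + 1))
    done
}

# Usage: start the loops, then compare per-process CPU share with top.
#   pinned_test_cmds "$(nproc)" 2 | sh
# The busy loops keep running until killed, e.g. with: pkill perl
```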

Fairness is obtained when either:
 - there are as many un-pinned processes as CPUs; or
 - with CGROUP controls and the two kinds of processes run by
   different users, when there is just one un-pinned process; or
 - if the pinning is turned off for these processes (or they
   are started without).

Any insight is welcome!

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia


-- System Information:
Debian Release: 8.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.7-ckt11-pk07.13.8-amd64 (SMP w/4 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 3.16.7-ckt11-pk07.13-amd64 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
CONFIG_KERNEL_BZIP2=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_RCU_USER_QS is not set
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANO

Bug#800929: linux_source-3.16: Scheduler prefers pinned tasks

2015-10-05 Thread paul . szabo
Sorry, that should have been package linux-source-3.16
(with a dash, not underscore). Maybe I could have re-assigned
this bug, but I do not really know how to, so instead I have now
submitted a new one.

Please close this errant bug report.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia



Bug#800929: linux_source-3.16: Scheduler prefers pinned tasks

2015-10-04 Thread Paul Szabo
Package: linux_source-3.16
Version: 3.16.7-ckt11-1+deb8u4
Severity: normal

The Linux CFS scheduler prefers pinned tasks and unfairly
gives more CPU time to tasks that have set CPU affinity.
This effect is observed with or without CGROUP controls.

To demonstrate: on an otherwise idle machine, as some user,
run one busy process pinned to each CPU (as many processes
as there are CPUs in the system), e.g. for a quad-core
non-HyperThreaded machine:

  taskset -c 0 perl -e 'while(1){1}' &
  taskset -c 1 perl -e 'while(1){1}' &
  taskset -c 2 perl -e 'while(1){1}' &
  taskset -c 3 perl -e 'while(1){1}' &

and (as that same or some other user) run some without
pinning:

  perl -e 'while(1){1}' &
  perl -e 'while(1){1}' &

and use e.g.   top   to observe that the pinned processes get
more CPU time than "fair".

Fairness is obtained when either:
 - there are as many un-pinned processes as CPUs; or
 - with CGROUP controls and the two kinds of processes run by
   different users, when there is just one un-pinned process; or
 - if the pinning is turned off for these processes (or they
   are started without).

Any insight is welcome!

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia


-- System Information:
Debian Release: 8.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.7-ckt11-pk07.13.8-amd64 (SMP w/4 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-02-24 Thread paul . szabo
Dear Simon,

> So if he config sparse memory, the issue can be solved I think.

In my config file I have:

CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_SPARSEMEM_STATIC=y
# CONFIG_INPUT_SPARSEKMAP is not set
# CONFIG_SPARSE_RCU_POINTER is not set

Is that sufficient for sparse memory, or should I try something else?
Or maybe, you meant that some kernel source patches might be possible
in the sparse memory code?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/201302242210.r1omadad021...@como.maths.usyd.edu.au



Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-31 Thread paul . szabo
Dear Ben,

Thanks for the repeated explanations.

> PAE was a stop-gap ...
> ... [PAE] completely untenable.

Is this a good time to withdraw PAE, to tell the world that it does not
work? Maybe you should have had such comments in the code.

Seems that amd64 now works somewhat: on Debian the linux-image package
is tricky to install, and linux-headers is even harder. Is there work
being done to make this smoother?

---

I am still not convinced by the lowmem starvation explanation: because
then PAE should have worked fine on my 3GB machine; maybe I should also
try PAE on my 512MB laptop. - Though, what do I know, have not yet found
the buggy line of code I believe is lurking there...

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-31 Thread paul . szabo
Dear Ben,

> Based on your experience I might propose to change the automatic kernel
> selection for i386 so that we use 'amd64' on a system with 16GB RAM and
> a capable processor.

Don't you mean change to amd64 for 4GB (or any RAM), never using PAE?
PAE is broken for any amount of RAM. More precisely, PAE with any RAM
fails the sleep test:
  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done
and with 32GB fails the write test:
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=n+1)); done
Why do you think 16GB is significant?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-31 Thread paul . szabo
Dear Ben,

(Removing the mailing lists
  linux-ker...@vger.kernel.org linux...@kvack.org
from CC, as this may be of no interest to them.)

>> Seems that amd64 now works somewhat: on Debian the linux-image package
>> is tricky to install,

> If you do an i386 (userland) installation then you must either select
> expert mode to get a choice of kernel packages, or else install the
> 'amd64' kernel package afterward.

>> and linux-headers is even harder.

> In what way?

Something about dependencies; though some of that may also be due to my
mixing of squeeze and wheezy 3.2.35 kernels. I will wait for the Debian
defaults to change to amd64 before reporting these oddities.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-31 Thread paul . szabo
Dear Ben,

>> PAE is broken for any amount of RAM.

> No it isn't.

Could I please ask you to expand on that?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-31 Thread paul . szabo
Dear Ben,

>>>> PAE is broken for any amount of RAM.
>>> No it isn't.
>> Could I please ask you to expand on that?

> I already did, a few messages back.

OK, thanks. Note however that, fewer messages back than that, I said:
  ... PAE with any RAM fails the sleep test:
  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done
and somewhere also said that non-PAE passes. Does that not prove
that PAE is broken?

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-30 Thread paul . szabo
Dear Pavel and Dave,

> The assertion was that 4GB with no PAE passed a forkbomb test (ooming)
> while 4GB of RAM with PAE hung, thus _PAE_ is broken.

Yes, PAE is broken. Still, maybe the above needs slight correction:
non-PAE HIGHMEM4G passed the sleep test: no OOM, nothing unexpected;
whereas PAE OOMed then hung (tested with various RAM from 3GB to 64GB).

The feeling I get is that amd64 is proposed as a drop-in replacement for
PAE, that support and development of PAE is gone, that PAE is dead.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-26 Thread paul . szabo
Dear Jonathan,

>> If you can identify where it was fixed then your patch for older
>> versions should go to stable with a reference to the upstream fix (see
>> Documentation/stable_kernel_rules.txt).

> How about this patch?

> It was applied in mainline during the 3.3 merge window, so kernels
> newer than 3.2.y shouldn't need it.

> ...
> commit ab8fabd46f811d5153d8a0cd2fac9a0d41fb593d upstream.
> ...

Yes, I believe that is the correct patch, surely better than my simple
subtraction of min_free_kbytes.

Note that this does not solve all problems: the latest 3.8 kernel
still crashes with OOM:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098961/comments/18

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-25 Thread paul . szabo
Dear Minchan,

> So what's the effect for user?
> ...
> It seems you saw old kernel.
> ...
> Current kernel includes ...
> So I think we don't need this patch.

As I understand now, my patch is right and needed for older kernels;
for newer kernels, the issue has been fixed in equivalent ways; it was
an oversight that the change was not backported; and any justification
you need, you can get from those later better patches.

I asked:

  A question: what is the use or significance of vm_highmem_is_dirtyable?
  It seems odd that it would be used in setting limits or thresholds, but
  not used in decisions where to put dirty things. Is that so, is that as
  should be? What is the recommended setting of highmem_is_dirtyable?

The silence is deafening. I guess highmem_is_dirtyable is an aberration.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-25 Thread paul . szabo
Dear Ben,

> If you can identify where it was fixed then ...

Sorry I cannot do that. I have no idea where kernel changelogs are kept.

I am happy to do some work. Please do not call me lazy.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-25 Thread paul . szabo
Dear Ben,

> ... the mm maintainers are probably much better placed ...

Exactly. Now I wonder: are you one of them?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia






Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()

2013-01-25 Thread paul . szabo
Dear Fengguang (et al),

> There are 260MB reclaimable slab pages in the normal zone, however we
> somehow failed to reclaim them. ...

Could the problem be that without CONFIG_NUMA, zone_reclaim_mode stays
at zero and anyway zone_reclaim() does nothing in include/linux/swap.h ?

Though... there is no CONFIG_NUMA nor /proc/sys/vm/zone_reclaim_mode in
the Ubuntu non-PAE plain HIGHMEM4G kernel, and still it handles the
sleep test just fine.

Where does reclaiming happen (or where is it meant to happen)?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()

2013-01-24 Thread paul . szabo
Dear Fengguang,

> Or more simple, you may show us the OOM dmesg which will contain the
> number of dirty pages. ...

Do you mean kern.log lines like:

[  744.754199] bash invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0, oom_score_adj=0
[  744.754202] bash cpuset=/ mems_allowed=0
[  744.754204] Pid: 3836, comm: bash Not tainted 3.2.0-4-686-pae #1 Debian 3.2.32-1
...
[  744.754354] active_anon:13497 inactive_anon:129 isolated_anon:0
[  744.754354]  active_file:2664 inactive_file:4144756 isolated_file:0
[  744.754355]  unevictable:0 dirty:510 writeback:0 unstable:0
[  744.754356]  free:11867217 slab_reclaimable:68289 slab_unreclaimable:7204
[  744.754356]  mapped:8066 shmem:250 pagetables:519 bounce:0
[  744.754361] DMA free:4260kB min:784kB low:980kB high:1176kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15784kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:11628kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:499 all_unreclaimable? yes
[  744.754364] lowmem_reserve[]: 0 867 62932 62932
[  744.754369] Normal free:43788kB min:44112kB low:55140kB high:66168kB active_anon:0kB inactive_anon:0kB active_file:912kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:261528kB slab_unreclaimable:28812kB kernel_stack:3096kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16060 all_unreclaimable? yes
[  744.754372] lowmem_reserve[]: 0 0 496525 496525
[  744.754377] HighMem free:47420820kB min:512kB low:789888kB high:1579264kB active_anon:53988kB inactive_anon:516kB active_file:9740kB inactive_file:16579320kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:63555300kB mlocked:0kB dirty:2040kB writeback:0kB mapped:32260kB shmem:1000kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:2076kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  744.754380] lowmem_reserve[]: 0 0 0 0
[  744.754381] DMA: 445*4kB 36*8kB 3*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4260kB
[  744.754386] Normal: 1132*4kB 620*8kB 237*16kB 70*32kB 38*64kB 26*128kB 20*256kB 14*512kB 4*1024kB 3*2048kB 0*4096kB = 43808kB
[  744.754390] HighMem: 226*4kB 242*8kB 155*16kB 66*32kB 10*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 2*2048kB 11574*4096kB = 47420680kB
[  744.754395] 4148173 total pagecache pages
[  744.754396] 0 pages in swap cache
[  744.754397] Swap cache stats: add 0, delete 0, find 0/0
[  744.754397] Free swap  = 0kB
[  744.754398] Total swap = 0kB
[  744.900649] 16777200 pages RAM
[  744.900650] 16549378 pages HighMem
[  744.900651] 664304 pages reserved
[  744.900652] 4162276 pages shared
[  744.900653] 104263 pages non-shared

? (The above and similar were reported to http://bugs.debian.org/695182 .)
Do you want me to log and report something else?
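
In the report above, the Normal zone is the one below its watermark
(free:43788kB against min:44112kB), while HighMem is almost entirely
free. A small sketch for picking that out of such a log automatically;
the function name zones_below_min is made up for illustration, and it
assumes the zone name, free: and min: fields sit on the same line, as
they do in the kern.log lines quoted here.

```sh
# Report zones whose free memory is below their "min" watermark.
# Reads kernel log text on stdin; prints "zone free=NkB min=MkB".
zones_below_min() {
    awk '{
        zone = ""; free = ""; min = ""
        for (i = 2; i <= NF; i++) {
            # per-zone lines carry kB units; the global summary does not
            if ($i ~ /^free:[0-9]+kB$/)     { zone = $(i - 1); free = $i }
            else if ($i ~ /^min:[0-9]+kB$/) { min = $i }
        }
        if (free == "" || min == "") next
        gsub(/[^0-9]/, "", free)
        gsub(/[^0-9]/, "", min)
        if (free + 0 < min + 0)
            print zone, "free=" free "kB", "min=" min "kB"
    }'
}

# Usage (log path is an assumption):
#   zones_below_min < /var/log/kern.log
```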

I believe the above crash may be provoked simply by running:
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; (( n = $n + 1 )); done
on any PAE machine with over 32GB RAM. Oddly the problem does not seem
to occur when using mem=32g or lower on the kernel boot line (or on
machines with less than 32GB RAM).

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney    Australia





Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()

2013-01-24 Thread paul . szabo
Dear Jan,

> I think he found the culprit of the problem being min_free_kbytes was not
> properly reflected in the dirty throttling. ... Paul please correct me
> if I'm wrong.

Sorry, but I have to correct you.

I noticed and patched/corrected two problems, one with (setpoint-dirty)
in bdi_position_ratio(), another with min_free_kbytes not subtracted
from dirtyable memory. Fixing those problems, singly or in combination,
did not help in avoiding OOM: running
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=$n+1)); done
still produces an OOM after a few files written (on a PAE machine with
over 32GB RAM).

Also, a quite similar OOM may be produced on any PAE machine with
  n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); done
This was tested on machines with as low as just 3GB RAM ... and
curiously the same machine with plain (not PAE but HIGHMEM4G)
kernel handles the same sleep test without any problems.
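
One way to gather data while either test runs is to log low memory
alongside it. This is only a sketch: the function names are made up for
illustration, and it assumes a kernel built with highmem support, since
/proc/meminfo only has a LowFree line on such kernels.

```sh
# Extract one field (value in kB) from /proc/meminfo-style text on stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Print a timestamped LowFree reading every $1 seconds (default 10).
# Runs until interrupted; start it in a second terminal before the test.
watch_lowfree() {
    while :; do
        printf '%s LowFree=%skB\n' "$(date +%T)" \
            "$(meminfo_kb LowFree < /proc/meminfo)"
        sleep "${1:-10}"
    done
}

# Usage:
#   watch_lowfree 5 | tee lowfree.log
```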

(Thus I now think that the remaining bug is not with writeback.)

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()

2013-01-24 Thread paul . szabo
Dear Fengguang,

 There are 260MB reclaimable slab pages in the normal zone ...

Marked all_unreclaimable? yes: is that wrong? Question asked also in:
http://marc.info/?l=linux-mm&m=135873981326767&w=2

 ... however we somehow failed to reclaim them. ...

I made a patch that would do a drop_caches at that point, please see:
http://bugs.debian.org/695182
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;filename=drop_caches.patch;att=1;bug=695182
http://marc.info/?l=linux-mm&m=135785511125549&w=2
and that successfully avoided OOM when writing files.
But, the drop_caches patch did not protect against the sleep test.

 ... What's your filesystem and the content of /proc/slabinfo?

Filesystem is EXT3. See output of slabinfo in Debian bug above or in
http://marc.info/?l=linux-mm&m=135796154427544&w=2

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-22 Thread paul . szabo
Dear Minchan,

 So what's the effect for user?

Sorry I have no idea.

The kernel seems to work well without this patch; or in fact not so
well, PAE crashing with spurious OOM. In my fruitless efforts of
avoiding OOM by sensible choices of sysctl tunables, I noticed that
maybe the treatment of min_free_kbytes was not right. Getting this
right did not help in avoiding OOM.

 It seems you saw old kernel.

Yes I have Debian on my machines. :-)

 Current kernel includes following logic.
 
 static unsigned long global_dirtyable_memory(void)
 {
 unsigned long x;
 
 x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 x -= min(x, dirty_balance_reserve);
 
 if (!vm_highmem_is_dirtyable)
 x -= highmem_dirtyable_memory(x);
 
 return x + 1;   /* Ensure that we never return 0 */
 }
 
 And dirty_balance_reserve already includes high_wmark_pages.
 Look at calculate_totalreserve_pages.
 
 So I think we don't need this patch.
 Thanks.

Presumably, dirty_balance_reserve takes min_free_kbytes into account?
Then I agree that this patch is not needed on those newer kernels.

A question: what is the use or significance of vm_highmem_is_dirtyable?
It seems odd that it would be used in setting limits or thresholds, but
not used in decisions where to put dirty things. Is that so, and is that as
it should be? What is the recommended setting of highmem_is_dirtyable?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: Write couple of 1GB files for OOM crash

2013-01-21 Thread paul . szabo
Dear Jonathan,

Thanks again for your help in writing correct and acceptable
patches. I did change a few things, hopefully for the better.

I decided not to push the drop-caches part of my patch; because it
now seems to me that it is not the essence of the issue: it protects
against OOM when writing a few files, but does not protect when running
a few sleeps. I am coming back to the idea that this is some
signed-vs-unsigned or similar issue... though I could not find it yet!

---

Using the amd64 kernel seems a workable workaround for the OOM issue.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [PATCH] MAX_PAUSE to be at least 4

2013-01-20 Thread paul . szabo
Ensure MAX_PAUSE is 4 or larger, so limits in
return clamp_val(t, 4, MAX_PAUSE);
(the only use of it) are not back-to-front.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo <p...@maths.usyd.edu.au>
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo <p...@maths.usyd.edu.au>

--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-21 13:57:05.0 +1100
@@ -39,7 +39,7 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE  max(HZ/5, 1)
+#define MAX_PAUSE  max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.





Bug#695182: [PATCH] Subtract min_free_kbytes from dirtyable memory

2013-01-20 Thread paul . szabo
When calculating amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.

Using an extern int because that is the only interface to some such
sysctl values.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo <p...@maths.usyd.edu.au>
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo <p...@maths.usyd.edu.au>

--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-21 13:57:05.0 +1100
@@ -343,12 +343,16 @@
 unsigned long determine_dirtyable_memory(void)
 {
unsigned long x;
+   extern int min_free_kbytes;
 
x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
 
+   /* Subtract min_free_kbytes */
+   x -= min(x, min_free_kbytes >> (PAGE_SHIFT - 10));
+
return x + 1;   /* Ensure that we never return 0 */
 }





Bug#695182: [RFC] Comments and questions

2013-01-20 Thread paul . szabo
Many comments and questions:

In __alloc_pages_slowpath(), did_some_progress is set twice but only
checked after the second setting, so the first setting is wasted.

[Setting of MAX_PAUSE reported previously.]

The setting of highmem_is_dirtyable seems to be used only to calculate limits
and thresholds, not in any decisions: seems odd.

[Subtraction of min_free_kbytes reported previously.]

Sanity check of input values in bdi_position_ratio().

[Difference (setpoint-dirty) reported previously.]

Seems that bdi_max_pause() always returns a too-small value, maybe it
should simply return a fixed value.

A test in balance_dirty_pages() marked unlikely() observed to be quite
common.

Maybe zone_reclaimable() should return true with non-zero
NR_SLAB_RECLAIMABLE.

Seems that all_unreclaimable may be set wrongly or too early.

Maybe global_reclaimable_pages() and zone_reclaimable_pages() should add
or include NR_SLAB_RECLAIMABLE.

(This does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo <p...@maths.usyd.edu.au>
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo <p...@maths.usyd.edu.au>

--- mm/page_alloc.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page_alloc.c 2013-01-18 14:07:31.0 +1100
@@ -2207,6 +2207,10 @@ rebalance:
 * If we failed to make any progress reclaiming, then we are
 * running out of options and have to consider going OOM
 */
+   /*
+* We had did_some_progress set twice, but is only checked here
+* so the first setting was lost. Is that as should be?
+*/
if (!did_some_progress) {
if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
if (oom_killer_disabled)
--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-20 07:35:52.0 +1100
@@ -39,7 +39,7 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE  max(HZ/5, 1)
+#define MAX_PAUSE  max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
@@ -343,12 +343,22 @@ static unsigned long highmem_dirtyable_m
 unsigned long determine_dirtyable_memory(void)
 {
unsigned long x;
+   extern int min_free_kbytes;
 
x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+   /*
+* Seems that highmem_is_dirtyable is only used here, in the
+* calculation of limits and thresholds of dirtiness, not in deciding
+* where to put dirty things. Is that so? Is that as should be?
+* What is the recommended setting of highmem_is_dirtyable?
+*/
if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
 
+   /* Subtract min_free_kbytes */
+   x -= min(x, min_free_kbytes >> (PAGE_SHIFT - 10));
+
return x + 1;   /* Ensure that we never return 0 */
 }
 
@@ -541,6 +551,9 @@ static unsigned long bdi_position_ratio(
 
if (unlikely(dirty >= limit))
return 0;
+   /* Never seen this happen, just sanity-check paranoia */
+   if (unlikely(freerun >= dirty))
+   return 16 << RATELIMIT_CALC_SHIFT;
 
/*
 * global setpoint
@@ -559,7 +572,7 @@ static unsigned long bdi_position_ratio(
 * = fast response on large errors; small oscillation near setpoint
 */
setpoint = (freerun + limit) / 2;
-   x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+   x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
limit - setpoint + 1);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -995,6 +1008,13 @@ static unsigned long bdi_max_pause(struc
 * The pause time will be settled within range (max_pause/4, max_pause).
 * Apply a minimal value of 4 to get a non-zero max_pause/4.
 */
+   /*
+* On large machine it seems we always return 4,
+* on smaller desktop machine mostly return 5 (rarely 9 or 14).
+* Are those too small? Should we return something fixed e.g.
+   return (HZ/10);
+* instead of this wasted/useless calculation?
+*/
return clamp_val(t, 4, MAX_PAUSE);
 }
 
@@ -1109,6 +1129,11 @@ static void balance_dirty_pages(struct a
}
pause = HZ * pages_dirtied / task_ratelimit;
if (unlikely(pause <= 0)) {
+   /*
+* Not unlikely: often we get zero.
+* Seems we always get 0 on large machine.
+* Should not do a pause of 1 here?
+*/
trace_balance_dirty_pages(bdi,
  dirty_thresh

Bug#695182: [PATCH] Negative (setpoint-dirty) in bdi_position_ratio()

2013-01-19 Thread paul . szabo
In bdi_position_ratio(), get the difference (setpoint-dirty) right even when
it is negative. Both setpoint and dirty are unsigned long, so the difference
was zero-extended instead of sign-extended when converted to s64. This issue
affects all 32-bit architectures; it does not affect 64-bit architectures,
where long and s64 have the same width.

In this function dirty is between freerun and limit, and the pseudo-float x
lies in [-1,1], expected to be negative about half the time. With
zero-extension, instead of a small negative x we obtained a large positive
one, so bdi_position_ratio() returned garbage.

Casting the difference to s64 also prevents overflow with left-shift;
though normally these numbers are small and I never observed a 32-bit
overflow there.

(This patch does not solve the PAE OOM issue.)

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo <p...@maths.usyd.edu.au>
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo <p...@maths.usyd.edu.au>

--- mm/page-writeback.c.old 2012-12-06 22:20:40.0 +1100
+++ mm/page-writeback.c 2013-01-20 07:47:55.0 +1100
@@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(
 * = fast response on large errors; small oscillation near setpoint
 */
setpoint = (freerun + limit) / 2;
-   x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+   x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
limit - setpoint + 1);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-17 Thread paul . szabo
Dear Dave,

 On my large machine, 'free' fails to show about 2GB memory ...
 You probably have a memory hole. ...
 The e820 map (during early boot in dmesg) or /proc/iomem will let you
 locate your memory holes.

Now that my machine is running an amd64 kernel, 'free' shows total Mem
65854128 (up from 64447796 with PAE kernel), and I do not see much
change in /proc/iomem output (below). Is that as should be?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---

root@zeno:~# uname -a
Linux zeno.maths.usyd.edu.au 3.2.35-pk06.12-amd64 #2 SMP Thu Jan 17 13:19:53 EST 2013 x86_64 GNU/Linux
root@zeno:~# free
             total       used       free     shared    buffers     cached
Mem:      65854128    1591704   64262424          0     227036     175620
-/+ buffers/cache:    1189048   64665080
Swap:    195312636          0  195312636
root@zeno:~# cat /proc/iomem
- : reserved
0001-00099bff : System RAM
00099c00-0009 : reserved
000a-000b : PCI Bus :00
000c-000d : PCI Bus :00
  000c-000c7fff : Video ROM
  000c8000-000cf5ff : Adapter ROM
  000cf800-000d07ff : Adapter ROM
  000d0800-000d0bff : Adapter ROM
000e-000f : reserved
  000f-000f : System ROM
0010-7e445fff : System RAM
  0100-0168f8c6 : Kernel code
  0168f8c7-018f24bf : Kernel data
  0197d000-019dafff : Kernel bss
7e446000-7e565fff : ACPI Non-volatile Storage
7e566000-7f1e2fff : reserved
7f1e3000-7f25efff : ACPI Tables
7f25f000-7f31cfff : reserved
7f31d000-7f323fff : ACPI Non-volatile Storage
7f324000-7f333fff : reserved
7f334000-7f33bfff : ACPI Non-volatile Storage
7f33c000-7f365fff : reserved
7f366000-7f7f : ACPI Non-volatile Storage
7f80-7fff : RAM buffer
8000-dfff : PCI Bus :00
  8000-8fff : PCI MMCONFIG  [bus 00-ff]
8000-8fff : reserved
  9000-900f : :00:16.0
  9010-901f : :00:16.1
  dd00-ddff : PCI Bus :08
dd00-ddff : :08:03.0
  de00-de4f : PCI Bus :07
de00-de3f : :07:00.0
de47c000-de47 : :07:00.0
  de60-de6f : PCI Bus :02
  df00-df8f : PCI Bus :08
df00-df7f : :08:03.0
df80-df803fff : :08:03.0
  df90-df9f : PCI Bus :07
  dfa0-dfaf : PCI Bus :02
dfa0-dfa1 : :02:00.1
  dfa0-dfa1 : igb
dfa2-dfa3 : :02:00.0
  dfa2-dfa3 : igb
dfa4-dfa43fff : :02:00.1
  dfa4-dfa43fff : igb
dfa44000-dfa47fff : :02:00.0
  dfa44000-dfa47fff : igb
  dfb0-dfb03fff : :00:04.7
  dfb04000-dfb07fff : :00:04.6
  dfb08000-dfb0bfff : :00:04.5
  dfb0c000-dfb0 : :00:04.4
  dfb1-dfb13fff : :00:04.3
  dfb14000-dfb17fff : :00:04.2
  dfb18000-dfb1bfff : :00:04.1
  dfb1c000-dfb1 : :00:04.0
  dfb2-dfb200ff : :00:1f.3
  dfb21000-dfb217ff : :00:1f.2
dfb21000-dfb217ff : ahci
  dfb22000-dfb223ff : :00:1d.0
dfb22000-dfb223ff : ehci_hcd
  dfb23000-dfb233ff : :00:1a.0
dfb23000-dfb233ff : ehci_hcd
  dfb25000-dfb25fff : :00:05.4
  dfffc000-dfffdfff : pnp 00:02
e000-fbff : PCI Bus :80
  fbe0-fbef : PCI Bus :84
fbe0-fbe3 : :84:00.0
fbe4-fbe5 : :84:00.0
fbe6-fbe63fff : :84:00.0
  fbf0-fbf03fff : :80:04.7
  fbf04000-fbf07fff : :80:04.6
  fbf08000-fbf0bfff : :80:04.5
  fbf0c000-fbf0 : :80:04.4
  fbf1-fbf13fff : :80:04.3
  fbf14000-fbf17fff : :80:04.2
  fbf18000-fbf1bfff : :80:04.1
  fbf1c000-fbf1 : :80:04.0
  fbf2-fbf20fff : :80:05.4
  fbffe000-fbff : pnp 00:12
fc00-fcff : pnp 00:01
fd00-fdff : pnp 00:01
fe00-feaf : pnp 00:01
feb0-febf : pnp 00:01
fec0-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec4-fec403ff : IOAPIC 2
fed0-fed003ff : HPET 0
fed08000-fed08fff : pnp 00:0c
fed1c000-fed3 : reserved
  fed1c000-fed1 : pnp 00:0c
fed45000-fedf : pnp 00:01
fee0-fee00fff : Local APIC
ff00- : reserved
  ff00- : pnp 00:0c
1-107fff : System RAM
root@zeno:~# 

---

For comparison, output obtained (and reported previously) when machine
was running PAE kernel:
root@zeno:~# cat /proc/iomem
- : reserved
0001-00099bff : System RAM
00099c00-0009 : reserved
000a-000b : PCI Bus :00
  000a-000b : Video RAM area
000c-000d : PCI Bus :00
  000c-000c7fff : Video ROM
  000c8000-000cf5ff : Adapter ROM
  000cf800-000d07ff : Adapter ROM
  000d0800-000d0bff : Adapter ROM
000e-000f : reserved
  000f-000f : System ROM
0010-7e445fff : System RAM
  0100-01610e15 : Kernel code
  01610e16-01802dff : Kernel data
  0188-018b2fff : Kernel bss
7e446000-7e565fff : ACPI

Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-15 Thread paul . szabo
Dear Sedat,

 ... it really makes sense to switch to x86_64
 (amd64) architecture when you have a modern computer.
 Switching makes even more sense when you have more than 4GiB RAM.

You seem to say that one should switch to amd64 (if hardware allows),
even with less than 4GB RAM (where 32-bit non-PAE HIGHMEM4G kernel would
work fine), and that one should definitely switch with over 4GB RAM.
There would be no need or use for PAE kernels, which should be dropped.

I think I agree.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---

Quoting in full for the benefit of 695...@bugs.debian.org :

 From sedat.di...@gmail.com Tue Jan 15 21:26:14 2013
 Date: Tue, 15 Jan 2013 11:25:41 +0100
 Subject: Re: [RFC] Reproducible OOM with just a few sleeps
 From: Sedat Dilek sedat.di...@gmail.com
 To: paul.sz...@sydney.edu.au, Paul Szabo p...@maths.usyd.edu.au
 Cc: LKML linux-ker...@vger.kernel.org, linux-mm linux...@kvack.org,
 Ben Hutchings b...@decadent.org.uk
 
 Hi Paul,
 
 I followed a bit the thread you started in [1].
 
 As you might know i386 got eliminated in Linux-3.8.
 
 I had several discussions with the Debian kernel-team about the iN86
 (N=4..6) and PAE kernel-flavours.
 On the one hand I can understand the reduction of linux-images
 especially for iN86.
 Even i486 is a bit unfirm as there is no much hardware around, but
 Debian will keep i486 for a while (release maintenance).
 
 Topic PAE:
 Unfortunately, I had a notebook with a Intel Centrino Banias CPU (no
 PAE) which should use the -486 kernel-flavour due to the Debian
 kernel-team.
 I played with some different kernel-setup which did not give me more
 benefit (openssl benchmarks etc.)
 The -686-pae kernel did run on my hardware, but as known with all the
 SMP-NO-OPs.
 
 Depending on the hardware, it really makes sense to switch to x86_64
 (amd64) architecture when you have a modern computer.
 Switching makes even more sense when you have more than 4GiB RAM.
 IMHO using a -686-amd64 Debian kernel makes ZERO sense, real 64-Bit or die!
 
 I switched to 64-bit... and I switched from Debian/sid to
 Ubuntu/precise as well :-).
 ( NOTE: I am working here since April 2012 in a WUBI environment (no
 native Ubuntu Linux) :-). )
 
 And I am building my kernels by myself.
 So I know very well whom to blame :-).
 
 Some last words: I had several fruitful or fruitless discussions with
 the Debian kernel-team, but I can confirm (with all my heart) this
 team makes a fantastic job.
 I can recommend you Ben's blog (recently I read a series about news in
 the Debian/wheezy kernel) if your world is Debian or Ubuntu (Debian !=
 Ubuntu).
 
 Just my 0.02EUR (no British pound, here as well: when you are a member
 of the EU chose EUR not pound!).
 
 Regards,
 - Sedat -
 
 
 [1] http://marc.info/?t=13579617221r=1w=2





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-14 Thread paul . szabo
Dear Dave,

 Seems that any i386 PAE machine will go OOM just by running a few
 processes. To reproduce:
   sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
 ...
 I think what you're seeing here is that, as the amount of total memory
 increases, the amount of lowmem available _decreases_ due to inflation
 of mem_map[] (and a few other more minor things).  The number of sleeps
 you can do is bound by the number of processes, as you noticed from
 ulimit.  Creating processes that don't use much memory eats a relatively
 large amount of low memory.
 This is a sad (and counterintuitive) fact: more RAM actually *CREATES*
 RAM bottlenecks on 32-bit systems.

I understand that more RAM leaves less lowmem. What is unacceptable is
that PAE crashes or freezes with OOM: it should gracefully handle the
issue. Noting that (for a machine with 4GB or under) PAE fails where the
HIGHMEM4G kernel succeeds and survives.

 On my large machine, 'free' fails to show about 2GB memory ...
 You probably have a memory hole. ...
 The e820 map (during early boot in dmesg) or /proc/iomem will let you
 locate your memory holes.

Thanks, that might explain it. Output of /proc/iomem below: sorry I do
not know how to interpret it.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---
root@zeno:~# cat /proc/iomem
- : reserved
0001-00099bff : System RAM
00099c00-0009 : reserved
000a-000b : PCI Bus :00
  000a-000b : Video RAM area
000c-000d : PCI Bus :00
  000c-000c7fff : Video ROM
  000c8000-000cf5ff : Adapter ROM
  000cf800-000d07ff : Adapter ROM
  000d0800-000d0bff : Adapter ROM
000e-000f : reserved
  000f-000f : System ROM
0010-7e445fff : System RAM
  0100-01610e15 : Kernel code
  01610e16-01802dff : Kernel data
  0188-018b2fff : Kernel bss
7e446000-7e565fff : ACPI Non-volatile Storage
7e566000-7f1e2fff : reserved
7f1e3000-7f25efff : ACPI Tables
7f25f000-7f31cfff : reserved
7f31d000-7f323fff : ACPI Non-volatile Storage
7f324000-7f333fff : reserved
7f334000-7f33bfff : ACPI Non-volatile Storage
7f33c000-7f365fff : reserved
7f366000-7f7f : ACPI Non-volatile Storage
7f80-7fff : RAM buffer
8000-dfff : PCI Bus :00
  8000-8fff : PCI MMCONFIG  [bus 00-ff]
8000-8fff : reserved
  9000-900f : :00:16.0
  9010-901f : :00:16.1
  dd00-ddff : PCI Bus :08
dd00-ddff : :08:03.0
  de00-de4f : PCI Bus :07
de00-de3f : :07:00.0
de47c000-de47 : :07:00.0
  de60-de6f : PCI Bus :02
  df00-df8f : PCI Bus :08
df00-df7f : :08:03.0
df80-df803fff : :08:03.0
  df90-df9f : PCI Bus :07
  dfa0-dfaf : PCI Bus :02
dfa0-dfa1 : :02:00.1
  dfa0-dfa1 : igb
dfa2-dfa3 : :02:00.0
  dfa2-dfa3 : igb
dfa4-dfa43fff : :02:00.1
  dfa4-dfa43fff : igb
dfa44000-dfa47fff : :02:00.0
  dfa44000-dfa47fff : igb
  dfb0-dfb03fff : :00:04.7
  dfb04000-dfb07fff : :00:04.6
  dfb08000-dfb0bfff : :00:04.5
  dfb0c000-dfb0 : :00:04.4
  dfb1-dfb13fff : :00:04.3
  dfb14000-dfb17fff : :00:04.2
  dfb18000-dfb1bfff : :00:04.1
  dfb1c000-dfb1 : :00:04.0
  dfb2-dfb200ff : :00:1f.3
  dfb21000-dfb217ff : :00:1f.2
dfb21000-dfb217ff : ahci
  dfb22000-dfb223ff : :00:1d.0
dfb22000-dfb223ff : ehci_hcd
  dfb23000-dfb233ff : :00:1a.0
dfb23000-dfb233ff : ehci_hcd
  dfb25000-dfb25fff : :00:05.4
  dfffc000-dfffdfff : pnp 00:02
e000-fbff : PCI Bus :80
  fbe0-fbef : PCI Bus :84
fbe0-fbe3 : :84:00.0
fbe4-fbe5 : :84:00.0
fbe6-fbe63fff : :84:00.0
  fbf0-fbf03fff : :80:04.7
  fbf04000-fbf07fff : :80:04.6
  fbf08000-fbf0bfff : :80:04.5
  fbf0c000-fbf0 : :80:04.4
  fbf1-fbf13fff : :80:04.3
  fbf14000-fbf17fff : :80:04.2
  fbf18000-fbf1bfff : :80:04.1
  fbf1c000-fbf1 : :80:04.0
  fbf2-fbf20fff : :80:05.4
  fbffe000-fbff : pnp 00:12
fc00-fcff : pnp 00:01
fd00-fdff : pnp 00:01
fe00-feaf : pnp 00:01
feb0-febf : pnp 00:01
fec0-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec4-fec403ff : IOAPIC 2
fed0-fed003ff : HPET 0
fed08000-fed08fff : pnp 00:0c
fed1c000-fed3 : reserved
  fed1c000-fed1 : pnp 00:0c
fed45000-fedf : pnp 00:01
fee0-fee00fff : Local APIC
ff00- : reserved
  ff00- : pnp 00:0c
1-107fff : System RAM
root@zeno:~# 



Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-14 Thread paul . szabo
Dear Dave,

 ... What is unacceptable is that PAE crashes or freezes with OOM:
 it should gracefully handle the issue. Noting that (for a machine
 with 4GB or under) PAE fails where the HIGHMEM4G kernel succeeds ...

 You have found a delta, but you're not really making apples-to-apples
 comparisons.  The page tables ...

I understand that the exact sizes of page tables are very important to
developers. To the rest of us, all that matters is that the kernel moves
them to highmem or swap or whatever, that it maybe emits some error
message but that it does not crash or freeze.

 There's probably a bug here.  But, it's incredibly unlikely to be seen
 in practice on anything resembling a modern system. ...

Probably, I found the bug on a very modern and brand-new system, just
trying to copy a few ISO image files and trying to log in a hundred
students. My machine crashed under those very practical and normal
circumstances. The demos with dd and sleep were just that: easily
reproducible demos.

 ... easily worked around by upgrading to a 64-bit kernel ...

Do you mean that PAE should never be used, but to use amd64 instead?

 ... Raising the vm.min_free_kbytes sysctl (to perhaps 10x of
 its current value on your system) is likely to help the hangs too,
 although it will further consume lowmem.

I have tried that, it did not work. As you say, it is backward.

 ... for a bug with ... so many reasonable workarounds ...

Only one workaround was proposed: use amd64.

PAE is buggy and useless, should be deprecated and removed.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-13 Thread paul . szabo
Dear Dave,

You wrote:

 ... 64-bit kernels should basically be drop-in replacements for 32-bit
 ones.  You can keep userspace 100% 32-bit, and just have a 64-bit
 kernel.

Any advice on how I would install a 64-bit kernel, particularly in the
Debian world? Seems to me that on a 32-bit machine, apt-get does not
see the amd64 kernels.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: Installing a 64-bit kernel on Debian i386 (Re: [RFC] Reproducible OOM with partial workaround)

2013-01-13 Thread paul . szabo
Dear Jonathan,

  B) The modern way:
   dpkg --add-architecture amd64
   apt-get update
   apt-get install linux-image-3.2.0-4-amd64:amd64

Thanks, that seems to have worked well on my desktop PC. Will now test
that everything works still, then similarly convert all my machines,
hopefully so abandoning buggy PAE.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-12 Thread paul . szabo
The issue is a regression with PAE, reproduced and verified on Ubuntu,
on my home PC with 3GB RAM.

My PC was running kernel linux-image-3.2.0-35-generic so it showed:
  psz@DellE520:~$ uname -a
  Linux DellE520 3.2.0-35-generic #55-Ubuntu SMP Wed Dec 5 17:45:18 UTC 2012 i686 i686 i386 GNU/Linux
  psz@DellE520:~$ free -l
               total       used       free     shared    buffers     cached
  Mem:       3087972     692256    2395716          0      18276     427116
  Low:        861464      71372     790092
  High:      2226508     620884    1605624
  -/+ buffers/cache:     246864    2841108
  Swap:     20000920     258364   19742556
Then it handled the sleep test
  bash -c 'n=0; while [ $n -lt 33000 ]; do sleep 600 & ((n=n+1)); ((m=n%500)); if [ $m -lt 1 ]; then echo -n $n - ; date; free -l; sleep 1; fi; done'
just fine, stopped only by max user processes (default setting of
ulimit -u 23964), or raising that limit stopped when the machine ran
out of PID space; there was no OOM.

Installing and running the PAE kernel so it showed:
  psz@DellE520:~$ uname -a
  Linux DellE520 3.2.0-35-generic-pae #55-Ubuntu SMP Wed Dec 5 18:04:39 UTC 2012 i686 i686 i386 GNU/Linux
  psz@DellE520:~$ free -l
               total       used       free     shared    buffers     cached
  Mem:       3087620     681188    2406432          0     167332     352296
  Low:        865208     214080     651128
  High:      2222412     467108    1755304
  -/+ buffers/cache:     161560    2926060
  Swap:     20000920          0   20000920
and re-trying the sleep test, it ran into OOM after 18000 or so sleeps
and crashed/froze so I had to press the POWER button to recover.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-12 Thread paul . szabo
Reported to Ubuntu also:
  PAE regression: OOM with just a few sleeps
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1098961

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-11 Thread paul . szabo
Dear Andrew,

 Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.

Please see below: I do not know what any of that means. This machine has
been running just fine, with all my users logging in here via XDMCP from
X-terminals, dozens logged in simultaneously. (But, I think I could make
it go OOM with more processes or logins.)

> If so, you *may* be able to work around this by setting
> /proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
> amount of dirty pagecache around.  Then, with luck, if we haven't
> broken the buffer_heads_over_limit logic it in the past decade (we
> probably have), the VM should be able to reclaim those buffer_heads.

I tried setting dirty_ratio to funny values, that did not seem to
help. Did you notice my patch about bdi_position_ratio(), how it was
plain wrong half the time (for negative x)? Anyway that did not help.

> Alternatively, use a filesystem which doesn't attach buffer_heads to
> dirty pages.  xfs or btrfs, perhaps.

Seems there is also a problem not related to filesystem... or rather,
the essence does not seem to be filesystem or caches. The filesystem
thing now seems OK with my patch doing drop_caches.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---

root@como:~# free -lm
             total       used       free     shared    buffers     cached
Mem:         62936       2317      60618          0         41        635
Low:           367        271         95
High:        62569       2045      60523
-/+ buffers/cache:       1640      61295
Swap:       131071          0     131071
root@como:~# cat /proc/slabinfo
slabinfo - version: 2.1
# name                  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fuse_request                 0      0    376   43    4 : tunables    0    0    0 : slabdata      0      0      0
fuse_inode                   0      0    448   36    4 : tunables    0    0    0 : slabdata      0      0      0
bsg_cmd                      0      0    288   28    2 : tunables    0    0    0 : slabdata      0      0      0
ntfs_big_inode_cache         0      0    512   32    4 : tunables    0    0    0 : slabdata      0      0      0
ntfs_inode_cache             0      0    176   46    2 : tunables    0    0    0 : slabdata      0      0      0
nfs_direct_cache             0      0     80   51    1 : tunables    0    0    0 : slabdata      0      0      0
nfs_inode_cache           5404   5404    584   28    4 : tunables    0    0    0 : slabdata    193    193      0
isofs_inode_cache            0      0    360   45    4 : tunables    0    0    0 : slabdata      0      0      0
fat_inode_cache              0      0    408   40    4 : tunables    0    0    0 : slabdata      0      0      0
fat_cache                    0      0     24  170    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_revoke_record           0      0     32  128    1 : tunables    0    0    0 : slabdata      0      0      0
journal_handle            5440   5440     24  170    1 : tunables    0    0    0 : slabdata     32     32      0
journal_head             16768  16768     64   64    1 : tunables    0    0    0 : slabdata    262    262      0
revoke_record            20224  20224     16  256    1 : tunables    0    0    0 : slabdata     79     79      0
ext4_inode_cache             0      0    584   28    4 : tunables    0    0    0 : slabdata      0      0      0
ext4_free_data               0      0     40  102    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_allocation_context      0      0    112   36    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_prealloc_space          0      0     72   56    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_io_end                  0      0    576   28    4 : tunables    0    0    0 : slabdata      0      0      0
ext4_io_page                 0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
ext2_inode_cache             0      0    480   34    4 : tunables    0    0    0 : slabdata      0      0      0
ext3_inode_cache         16531  19965    488   33    4 : tunables    0    0    0 : slabdata    605    605      0
ext3_xattr                   0      0     48   85    1 : tunables    0    0    0 : slabdata      0      0      0
dquot                      840    840    192   42    2 : tunables    0    0    0 : slabdata     20     20      0
rpc_inode_cache            144    144    448   36    4 : tunables    0    0    0 : slabdata      4      4      0
UDP-Lite                     0      0    576   28    4 : tunables    0    0    0 : slabdata      0      0      0
xfrm_dst_cache               0      0    320   51    4 : tunables    0    0    0 : slabdata      0      0      0
UDP                        896    896    576   28    4 : tunables    0    0    0 : slabdata     32     32      0
tw_sock_TCP               1344   1344    128   32    1

Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-11 Thread paul . szabo
Dear Andrew,

>>> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
>> Please see below ...
> ... Was this dump taken when the system was at or near oom?

No, that was a quiescent machine. Please see a just-before-OOM dump in
my next message (in a little while).

> Please send a copy of the oom-killer kernel message dump, if you still
> have one.

Please see one in next message, or in
http://bugs.debian.org/695182

>> I tried setting dirty_ratio to funny values, that did not seem to
>> help.
> Did you try setting it as low as possible?

Probably. Maybe. Sorry, cannot say with certainty.

>> Did you notice my patch about bdi_position_ratio(), how it was
>> plain wrong half the time (for negative x)?
> Nope, please resend.

Quoting from
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182
:
...
 - In bdi_position_ratio() get difference (setpoint-dirty) right even
   when it is negative, which happens often. Normally these numbers are
   small and even with left-shift I never observed a 32-bit overflow.
   I believe it should be possible to re-write the whole function in
   32-bit ints; maybe it is not worth the effort to make it efficient;
   seeing how this function was always wrong and we survived, it should
   simply be removed.
...
--- mm/page-writeback.c.old 2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c 2013-01-06 21:54:59.0 +1100
[ Line numbers out because other patches not shown ]
...
@@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio(
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
...

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301120324.r0c3o7dy015...@como.maths.usyd.edu.au



Bug#695182: [RFC] Reproducible OOM with just a few sleeps

2013-01-11 Thread paul . szabo
Dear Linux-MM,

Seems that any i386 PAE machine will go OOM just by running a few
processes. To reproduce:
  sh -c 'n=0; while [ $n -lt 1 ]; do sleep 600 & ((n=n+1)); done'
My machine has 64GB RAM. With previous OOM episodes, it seemed that
running (booting) it with mem=32G might avoid OOM; but an OOM was
obtained just the same, and also with lower memory:
  Memory       sleeps to OOM   free shows total
  (mem=64G)         5300           64447796
  mem=32G          10200           31155512
  mem=16G          13400           14509364
  mem=8G           14200            6186296
  mem=6G           15200            4105532
  mem=4G           16400            2041364
The machine does not run out of highmem, nor does it use any swap.

Comparing with my desktop PC: has 4GB RAM installed, free shows 3978592
total. Running the sleep test, it simply froze after 16400 running...
no response to ping, will need to press the RESET button.

---

On my large machine, 'free' fails to show about 2GB memory, e.g. with
mem=16G it shows:

root@zeno:~# free -l
             total       used       free     shared    buffers     cached
Mem:      14509364     435440   14073924          0       4068     111328
Low:        769044     120232     648812
High:     13740320     315208   13425112
-/+ buffers/cache:     320044   14189320
Swap:    134217724          0  134217724

---

Please let me know of any ideas, or if you want me to run some other
test or want to see some other output.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-

Details for when my machine was running with 64GB RAM:

In another window I was running
  cat /proc/slabinfo; free -l
repeatedly, and output of that (just before OOM) was:

+ cat /proc/slabinfo
slabinfo - version: 2.1
# name                  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fuse_request                 0      0    376   43    4 : tunables    0    0    0 : slabdata      0      0      0
fuse_inode                   0      0    448   36    4 : tunables    0    0    0 : slabdata      0      0      0
bsg_cmd                      0      0    288   28    2 : tunables    0    0    0 : slabdata      0      0      0
ntfs_big_inode_cache         0      0    512   32    4 : tunables    0    0    0 : slabdata      0      0      0
ntfs_inode_cache             0      0    176   46    2 : tunables    0    0    0 : slabdata      0      0      0
nfs_direct_cache             0      0     80   51    1 : tunables    0    0    0 : slabdata      0      0      0
nfs_inode_cache             28     28    584   28    4 : tunables    0    0    0 : slabdata      1      1      0
isofs_inode_cache            0      0    360   45    4 : tunables    0    0    0 : slabdata      0      0      0
fat_inode_cache              0      0    408   40    4 : tunables    0    0    0 : slabdata      0      0      0
fat_cache                    0      0     24  170    1 : tunables    0    0    0 : slabdata      0      0      0
jbd2_revoke_record           0      0     32  128    1 : tunables    0    0    0 : slabdata      0      0      0
journal_handle            4080   4080     24  170    1 : tunables    0    0    0 : slabdata     24     24      0
journal_head              1024   1024     64   64    1 : tunables    0    0    0 : slabdata     16     16      0
revoke_record              768    768     16  256    1 : tunables    0    0    0 : slabdata      3      3      0
ext4_inode_cache             0      0    584   28    4 : tunables    0    0    0 : slabdata      0      0      0
ext4_free_data               0      0     40  102    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_allocation_context      0      0    112   36    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_prealloc_space          0      0     72   56    1 : tunables    0    0    0 : slabdata      0      0      0
ext4_io_end                  0      0    576   28    4 : tunables    0    0    0 : slabdata      0      0      0
ext4_io_page                 0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
ext2_inode_cache             0      0    480   34    4 : tunables    0    0    0 : slabdata      0      0      0
ext3_inode_cache          1467   2079    488   33    4 : tunables    0    0    0 : slabdata     63     63      0
ext3_xattr                   0      0     48   85    1 : tunables    0    0    0 : slabdata      0      0      0
dquot                      168    168    192   42    2 : tunables    0    0    0 : slabdata      4      4      0
rpc_inode_cache            108    108    448   36    4 : tunables    0    0    0 : slabdata      3      3      0
UDP-Lite                     0      0    576   28    4 : tunables    0    0    0 : slabdata      0      0      0
xfrm_dst_cache               0      0    320   51    4 : tunables    0    0    0 : slabdata      0      0      0
UDP                        336    336    576   28    4

Bug#695182: Write couple of 1GB files for OOM crash

2013-01-10 Thread paul . szabo
Dear Jonathan,

> ... once you have a reproducible test I imagine the mm folks will
> already be very interested and they may be able to help ...

But, I do already have a reproducible test! Write a few files, as per
the initial message in this http://bugs.debian.org/695182 ; I also have
a patch/solution/workaround for that particular test.

Now I observed another way of making a machine with 64GB crash (sorry,
not crash but to suffer an OOM episode). I am pretty sure this other
test is reproducible, but is cumbersome to set up and tedious to do.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301100913.r0a9dqdr016...@como.maths.usyd.edu.au



Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-10 Thread paul . szabo
Dear Linux-MM,

On a machine with i386 kernel and over 32GB RAM, an OOM condition is
reliably obtained simply by writing a few files to some local disk
e.g. with:
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=$n+1)); done
Crash usually occurs after 16 or 32 files written. Seems that the
problem may be avoided by using mem=32G on the kernel boot, and that
it occurs with any amount of RAM over 32GB.

I developed a workaround patch for this particular OOM demo, dropping
filesystem caches when about to exhaust lowmem. However, subsequently
I observed OOM when running many processes (as yet I do not have an
easy-to-reproduce demo of this); so as I suspected, the essence of the
problem is not with FS caches.

Could you please help in finding the cause of this OOM bug?

Please see
http://bugs.debian.org/695182
for details, in particular my workaround patch
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182

(Please reply to me directly, as I am not a subscriber to the linux-mm
mailing list.)

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301102158.r0alwi4i031...@como.maths.usyd.edu.au



Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-10 Thread paul . szabo
Dear Dave,

> Your configuration has never worked.  This isn't a regression ...
> ... does not mean that we expect it to work.

Do you mean that CONFIG_HIGHMEM64G is deprecated, should not be used;
that all development is for 64-bit only?

> ... 64-bit kernels should basically be drop-in replacements ...

Will think about that. I know all my servers are 64-bit capable, will
need to check all my desktops.

---

I find it puzzling that there seems to be a sharp cutoff at 32GB RAM,
no problem under but OOM just over; whereas I would have expected
lowmem starvation to be gradual, with OOM occurring much sooner with
64GB than with 34GB. Also, the kernel seems capable of reclaiming
lowmem, so I wonder why that fails just over the 32GB threshold.
(Obviously I have no idea what I am talking about.)

---

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301110046.r0b0k6lr024...@como.maths.usyd.edu.au



Bug#695182: [RFC] Reproducible OOM with partial workaround

2013-01-10 Thread paul . szabo
Dear Dave,

> ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit
> kernel without either violating the ABI (3GB/1GB split) or doing
> something that never got merged upstream ...

Sorry to be so contradictory:

psz@como:~$ uname -a
Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
psz@como:~$ free -l
             total       used       free     shared    buffers     cached
Mem:      64446900    4729292   59717608          0      15972     480520
Low:        375836     304400      71436
High:     64071064    4424892   59646172
-/+ buffers/cache:    4232800   60214100
Swap:    134217724          0  134217724
psz@como:~$ 

(though I would not know about violations).

But OK, I take your point that I should move with the times.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301110146.r0b1kf4t032...@como.maths.usyd.edu.au



Bug#695182: Write couple of 1GB files for OOM crash

2013-01-09 Thread paul . szabo
I am slowly coming around to the idea of splitting my patch up into many
parts, each addressing just one little issue, so there would be a couple
of [PATCH]-es and some [RFC]-s, each profusely explained. Unfortunately
I have not yet had time to work on this, but I want to, hopefully soon.

However... we now did another test on the server, and it ran into OOM.
We were load-testing. We intend to use the server for student logins,
via XDMCP from any one of 109 X-terminals (similar to LTSP); we logged
in on all our terminals, and on each ran a firefox and an rstudio. The
server ran into OOM, having plenty of free memory but having exhausted
lowmem, after 80 logins. Then rebooted the server with mem=32G on the
kernel, and there was no OOM after logging on all our fake students.

So, there is a bug still. I will try to make up some easily reproducible
test, and if can provoke OOM then look again into kernel code.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301100148.r0a1m1pj022...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-06 Thread Paul Szabo
Dear Ben,

> Please read Documentation/SubmittingPatches, use scripts/checkpatch.pl
> and try to provide a patch that is suitable for upstream inclusion.
> Also, your name belongs in the patch header, not in the code.

I changed the proposed patch accordingly, scripts/checkpatch.pl produces
just a few warnings. I had my patch in use for a while now, so I believe
it is suitably tested.

Please let me know if I need to do anything else.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia
Avoid OOM when filesystem caches fill lowmem and are not reclaimed,
doing drop_caches at that point. The issue is easily reproducible on
machines with over 32GB RAM. The patch correctly protects against OOM.
The added call to drop_caches has been observed to trigger needlessly
but on quite rare occasions only.

Also included are several minor fixes:
 - Comment about highmem_is_dirtyable that seems used only to calculate
   limits and thresholds, not used in any decisions.
 - In determine_dirtyable_memory() subtract min_free_kbytes from
   returned value. I believe this is right, that min_free_kbytes is
   not intended for dirty pages.
 - In bdi_position_ratio() get difference (setpoint-dirty) right even
   when it is negative, which happens often. Normally these numbers are
   small and even with left-shift I never observed a 32-bit overflow.
   I believe it should be possible to re-write the whole function in
   32-bit ints; maybe it is not worth the effort to make it efficient;
   seeing how this function was always wrong and we survived, it should
   simply be removed.
 - Comment in bdi_max_pause() that it seems to always return a too-small
   value, maybe it should simply return a fixed value.
 - Comment in balance_dirty_pages() about a test marked unlikely() but
   which I observe to be quite common.
 - Comment in __alloc_pages_slowpath() about did_some_progress being
   set twice, but only checked after the second setting, so the first
   setting is lost and wasted.
 - Comment in zone_reclaimable() that maybe should return true with
   non-zero NR_SLAB_RECLAIMABLE.
 - Comment about all_unreclaimable which may be set wrongly.
 - Comments in global_reclaimable_pages() and zone_reclaimable_pages()
   about maybe adding or including NR_SLAB_RECLAIMABLE.

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-04 21:52:47.0 +1100
@@ -65,3 +65,10 @@ int drop_caches_sysctl_handler(ctl_table
 	}
 	return 0;
 }
+
+/* Easy call: do echo 3 > /proc/sys/vm/drop_caches */
+void easy_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/page-writeback.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c	2013-01-06 21:54:59.0 +1100
@@ -39,7 +39,8 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+/* Might as well be max(HZ/5,4) to ensure max_pause/4>0 always */
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
@@ -343,11 +344,26 @@ static unsigned long highmem_dirtyable_m
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
+	int y = 0;
+	extern int min_free_kbytes;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+	/*
+	 * Seems that highmem_is_dirtyable is only used here, in the
+	 * calculation of limits and threshholds of dirtiness, not in deciding
+	 * where to put dirty things. Is that so? Is that as should be?
+	 * What is the recommended setting of highmem_is_dirtyable?
+	 */
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
+	/* Subtract min_free_kbytes */
+	if (min_free_kbytes > 0)
+		y = min_free_kbytes >> (PAGE_SHIFT - 10);
+	if (x > y)
+		x -= y;
+	else
+		x = 0;
 
 	return x + 1;	/* Ensure that we never return 0 */
 }
@@ -541,6 +557,9 @@ static unsigned long bdi_position_ratio(
 
 	if (unlikely(dirty >= limit))
 		return 0;
+	/* Never seen this happen, just sanity-check paranoia */
+	if (unlikely(freerun >= limit))
+		return 16 << RATELIMIT_CALC_SHIFT;
 
 	/*
 	 * global setpoint
@@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio(
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -995,6 +1014,13 @@ static unsigned long bdi_max_pause(struc
 	 * The pause time will be settled within range (max_pause/4

Bug#695182: Write couple of 1GB files for OOM crash

2013-01-06 Thread paul . szabo
 too small? Should we return something fixed e.g.
 +return (HZ/10);
 + * instead of this wasted/useless calculation?
 + */
  return clamp_val(t, 4, MAX_PAUSE);

 Another while at it, I guess.

Yes. On one hand the code sometimes pauses for HZ/10 or so, and on the
other hand we have this routine working so hard to return the minimum
possible value.

 @@ -1109,6 +1135,11 @@ static void balance_dirty_pages(struct a
  }
  pause = HZ * pages_dirtied / task_ratelimit;
  if (unlikely(pause <= 0)) {
 +/*
 + * Not unlikely: often we get zero.
 + * Seems we always get 0 on large machine.
 + * Should not do a pause of 1 here?
 + */
  trace_balance_dirty_pages(bdi,

 git log -S'if (unlikely(pause <= 0))' -- mm/page-writeback.c tells
 me this is from 57fc978cfb61 (writeback: control dirty pause time,
 2011-06-11), in case that helps.

Will try to look it up sometime. - I had printk() to tell me the value
of pause, and mostly I got zero. Wonder how others measured it.

 [...]
 --- mm/vmscan.c.old  2012-10-17 13:50:15.0 +1100
 +++ mm/vmscan.c  2013-01-06 09:50:49.0 +1100
 [...]
 @@ -2726,9 +2731,87 @@ loop_again:
  nr_slab = shrink_slab(shrink, sc.nr_scanned, 
 lru_pages);
  sc.nr_reclaimed += 
 reclaim_state->reclaimed_slab;
  total_scanned += sc.nr_scanned;
 +if (unlikely(
 +i == 1 &&
 +nr_slab < 10 &&
 +(reclaim_state->reclaimed_slab) < 10 &&
 +zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
  10 &&
 +!zone_watermark_ok_safe(zone, order,
 +high_wmark_pages(zone), 
 end_zone, 0))) {
 +/*
 + * We are stressed (desperate), better

 This is getting really deeply nested.  Would it be possible to split
 out a function so this code could be more easily contemplated in
 isolation?

Hmm... I would much prefer to leave it as is.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301070303.r0733hs0026...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-01 Thread Paul Szabo
tags 695182 - moreinfo
thanks

Dear Ben,

I suggest the following patch, which seems to solve the problem.
Two attachments: minimal.patch just to show the simplicity, and
complete.patch with comments and enhancements.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia
--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-01 09:23:57.0 +1100
@@ -58,10 +58,16 @@
 	if (ret)
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
 			iterate_supers(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
 	return 0;
 }
+
+void PSz_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/vmscan.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/vmscan.c	2013-01-01 22:58:51.0 +1100
@@ -2719,20 +2719,25 @@
 KSWAPD_ZONE_BALANCE_GAP_RATIO);
 			if (!zone_watermark_ok_safe(zone, order,
 	high_wmark_pages(zone) + balance_gap,
 	end_zone, 0)) {
 shrink_zone(priority, zone, sc);
 
 reclaim_state->reclaimed_slab = 0;
 nr_slab = shrink_slab(shrink, sc.nr_scanned, lru_pages);
 sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 total_scanned += sc.nr_scanned;
+if (i==1 && nr_slab<10 && (reclaim_state->reclaimed_slab)<10 && zone_page_state(zone,NR_SLAB_RECLAIMABLE)>10)
+{
+extern void PSz_drop_caches(void);
+  PSz_drop_caches();
+}
 
 if (nr_slab == 0 && !zone_reclaimable(zone))
 	zone->all_unreclaimable = 1;
 			}
 
 			/*
 			 * If we've done a decent amount of scanning and
 			 * the reclaim ratio is low, start doing writepage
 			 * even in laptop mode
 			 */
--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-01 09:23:57.0 +1100
@@ -58,10 +58,16 @@
 	if (ret)
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
 			iterate_supers(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
 	return 0;
 }
+
+void PSz_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/page-writeback.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c	2013-01-01 23:01:52.0 +1100
@@ -32,21 +32,22 @@
 #include <linux/sysctl.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h>
 #include <linux/pagevec.h>
 #include <trace/events/writeback.h>
 
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+/* PSz: Might as well be max(HZ/5,4) to ensure max_pause/4>0 always */
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
  */
 #define BANDWIDTH_INTERVAL	max(HZ/5, 1)
 
 #define RATELIMIT_CALC_SHIFT	10
 
 /*
  * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
@@ -339,22 +340,40 @@
  *
  * Returns the numebr of pages that can currently be freed and used
  * by the kernel for direct mappings.
  */
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+/*
+ * PSz: Seems that highmem_is_dirtyable is only used here, in the
+ * calculation of limits and threshholds of dirtiness, not in deciding
+ * where to put dirty things. Is that so? Is that as should be?
+ * What is the recommended setting of highmem_is_dirtyable?
+ */
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
+/* PSz: Should not we subtract min_free_kbytes? */
+{
+extern int min_free_kbytes;
+int y = 0;
+/* printk("PSz: determine_dirtyable_memory was %ld pages, now subtract min_free_kbytes=%d\n",x,min_free_kbytes); */
+if (min_free_kbytes > 0)
+  y = min_free_kbytes >> (PAGE_SHIFT - 10);
+if (x > y)
+  x -= y;
+else
+  x = 0;
+}
 
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
 static unsigned long dirty_freerun_ceiling(unsigned long thresh,
 	   unsigned long bg_thresh)
 {
 	return (thresh + bg_thresh) / 2;
 }
 
@@ -534,39 +553,43 @@
 	unsigned long limit = hard_dirty_limit(thresh);
 	unsigned long x_intercept;
 	unsigned long setpoint;		/* dirty pages' target balance point */
 	unsigned long bdi_setpoint;
 	unsigned long span;
 	long long pos_ratio;		/* for scaling up/down the rate limit */
 	long x;
 
 	if (unlikely(dirty >= limit))
 		return 0;
+	if (unlikely(freerun >= limit))
+/* PSz: Never seen this happen, just sanity-check paranoia */
+		return (16 << RATELIMIT_CALC_SHIFT);
 
 	/*
 	 * global setpoint
 	 *
 	 *                           setpoint - dirty 3
 	 *        f(dirty) := 1.0 + (----------------)
 	 *                           limit - setpoint
 	 *
 	 * it's a 3rd order polynomial that subjects to
 	 *
 	 * (1) f(freerun)  = 2.0 => rampup dirty_ratelimit reasonably fast
 	 * (2) f(setpoint) = 1.0 => the balance point
 	 * (3) f(limit)    = 0   => the hard limit
 	 * (4) df/dx      <= 0	 => negative feedback control
 	 * (5) the closer to setpoint, the smaller |df/dx

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-01 Thread paul . szabo
Dear Ben,

I tried to send
  tags 695182 - moreinfo
to cont...@bugs.debian.org but it came back with:
  You have been specifically excluded from using the control interface.
I guess that has something to do with bug#299007. Would you please be
able to have those settings corrected?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201301012154.r01lslrj025...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-27 Thread paul . szabo
Dear Ben,

In the OOM message in my initial bug report, I see
  Normal ... slab_reclaimable:261528kB ... all_unreclaimable? yes
Is that a contradiction? Should not that slab have been reclaimed?
Original line:
[  744.754369] Normal free:43788kB min:44112kB low:55140kB high:66168kB 
active_anon:0kB inactive_anon:0kB active_file:912kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB 
slab_reclaimable:261528kB slab_unreclaimable:28812kB kernel_stack:3096kB 
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16060 
all_unreclaimable? yes

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201212271152.qbrbqsst027...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-16 Thread paul . szabo
Dear Ben,

In response to your comments: x seems to be in the range [-1,1]. The
returned pos_ratio would be within [0,2] if not for the final *8.

---

[Funny: taking the difference of unsigned ints and expecting the result
to be negative in some sense. Seems the problem was not with large memory
but with negative numbers. Curious the bug was not noticed before.]

Need to cast and sign-extend before taking difference of unsigned
numbers, as the following demonstrates:

$ cat silly.c
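#include <stdio.h>
/* Output shown below is from a 32-bit build (unsigned long is 32 bits) */
int main(void)
{
  unsigned long i,j;
  long long x;
  i=1; j=2;
  x = j-i; printf("j-i = %lld\n",x);
  x = i-j; printf("i-j = %lld\n",x);
  x = (long long)i-j; printf("OK  = %lld\n",x);
  return 0;
}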
$ cc silly.c; a.out
j-i = 1
i-j = 4294967295
OK  = -1
$ 

and in fact things go bad, e.g. freerun=2172 limit=2896 dirty=2589
should get x=-155, whereas original formula gets x=11831710 and Ben's
formula gets x=-769071435.

Seems a correct patch would be:

--- old/mm/page-writeback.c 2012-10-17 13:50:15.0 +1100
+++ new/mm/page-writeback.c 2012-12-17 12:25:14.0 +1100
@@ -559,7 +559,7 @@
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		    limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

However, with that patch in place I still got an OOM crash (log below).
More bugs remain...

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---

Dec 17 12:43:59 zeno kernel: xterm invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Dec 17 12:43:59 zeno kernel: Pid: 2704, comm: xterm Not tainted 3.2.32-pk06.08-i386t07 #7
Dec 17 12:43:59 zeno kernel: Call Trace:
Dec 17 12:43:59 zeno kernel:  [<c1607533>] ? printk+0x18/0x1a
Dec 17 12:43:59 zeno kernel:  [<c10776b8>] dump_header.isra.10+0x68/0x180
Dec 17 12:43:59 zeno kernel:  [<c1069807>] ? delayacct_end+0x97/0xb0
Dec 17 12:43:59 zeno kernel:  [<c11d664e>] ? ___ratelimit+0x7e/0xf0
Dec 17 12:43:59 zeno kernel:  [<c1077929>] oom_kill_process.constprop.15+0x49/0x230
Dec 17 12:43:59 zeno kernel:  [<c1039d34>] ? has_capability_noaudit+0x24/0x30
Dec 17 12:43:59 zeno kernel:  [<c1077880>] ? oom_badness+0xb0/0x110
Dec 17 12:43:59 zeno kernel:  [<c1077e70>] out_of_memory+0x240/0x2c0
Dec 17 12:43:59 zeno kernel:  [<c107a8a8>] __alloc_pages_nodemask+0x558/0x570
Dec 17 12:43:59 zeno kernel:  [<c1569b91>] tcp_sendmsg+0x711/0xab0
Dec 17 12:43:59 zeno kernel:  [<c11db4fc>] ? copy_to_user+0x2c/0x40
Dec 17 12:43:59 zeno kernel:  [<c1587f22>] inet_sendmsg+0x42/0xa0
Dec 17 12:43:59 zeno kernel:  [<c152fe2b>] sock_aio_write+0xdb/0x100
Dec 17 12:43:59 zeno kernel:  [<c15874f5>] ? inet_recvmsg+0x55/0xa0
Dec 17 12:43:59 zeno kernel:  [<c152fd50>] ? sock_aio_read+0x130/0x130
Dec 17 12:43:59 zeno kernel:  [<c10a3fd4>] do_sync_readv_writev+0xa4/0xe0
Dec 17 12:43:59 zeno kernel:  [<c11db640>] ? _copy_from_user+0x30/0x50
Dec 17 12:43:59 zeno kernel:  [<c10a40d3>] ? rw_copy_check_uvector+0x43/0x130
Dec 17 12:43:59 zeno kernel:  [<c10a4262>] do_readv_writev+0xa2/0x1b0
Dec 17 12:43:59 zeno kernel:  [<c152fd50>] ? sock_aio_read+0x130/0x130
Dec 17 12:43:59 zeno kernel:  [<c10a3ced>] ? vfs_read+0x14d/0x170
Dec 17 12:43:59 zeno kernel:  [<c10a43a2>] vfs_writev+0x32/0x50
Dec 17 12:43:59 zeno kernel:  [<c10a44e8>] sys_writev+0x38/0xa0
Dec 17 12:43:59 zeno kernel:  [<c160fd14>] sysenter_do_call+0x12/0x26
Dec 17 12:43:59 zeno kernel: Mem-Info:
Dec 17 12:43:59 zeno kernel: DMA per-cpu:
Dec 17 12:43:59 zeno kernel: CPU0: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU1: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU2: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU3: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU4: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU5: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU6: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU7: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU8: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU9: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   10: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   11: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   12: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   13: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   14: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   15: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   16: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   17: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   18: hi:0, btch:   1 usd:   0
Dec 17

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-16 Thread paul . szabo
Dear Ben,

> It's not a crash, though that's kind of an academic distinction.

What would you like me to call it instead? The machine seemed to
hang... I do not know whether it rebooted spontaneously or in response
to the "shutdown -r now" that I had typed into an unresponsive xterm.

> Perhaps you could add some printk() statements to log the results of
> these various calculations, so you can sanity-check them.  You would
> probably want to make them conditional on the initial value of x being
> negative (it's reused for something entirely different later so you
> would need to assign this condition to a separate variable).

I did, and am convinced that bdi_position_ratio() now does the right
thing: returns something within [0,2] mostly, and internally seems OK.
I think I had my corrected bdi_position_ratio() in use during this
latest OOM episode.

---

Thanks again for your prompt fix of bdi_position_ratio(). I will now
look for that elusive next bug.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/201212170258.qbh2waxj014...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread paul . szabo
, 
bdi_dirty);
		else
			pos_ratio *= 8;
	}
	BUG_ON(pos_ratio < 0);

	return pos_ratio;
}

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201212152021.qbfklugf006...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread paul . szabo
Dear Ben,

> ... I think the initial overflow occurs when calculating x
> ...
> setpoint and dirty are numbers of pages and are declared as long, so on
> a system with enough memory they can presumably differ by 2^21 or more
> (2^21 pages = 8 GB).  Shifting left by RATELIMIT_CALC_SHIFT = 10 can
> then change the sign bit.
>
> Does the attached patch fix this?
>
> ...
>
> Most variables in bdi_position_ratio() are declared long, which is
> enough for a page count.  However, when converting (setpoint - dirty)
> to a fixed-point number we left-shift by 10, and on a 32-bit system
> with PAE it is possible to have enough dirty pages that the shift
> overflows into the sign bit.  We need to cast to s64 before the
> left-shift.
 
> Reported-by: Paul Szabo <paul.sz...@sydney.edu.au>
> Reference: http://bugs.debian.org/695182
> Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
> ---
>  mm/page-writeback.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 50f0824..8b5600e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
>  	 * => fast response on large errors; small oscillation near setpoint
>  	 */
>  	setpoint = (freerun + limit) / 2;
> -	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> +	x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
>  		    limit - setpoint + 1);
>  	pos_ratio = x;
>  	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

Thanks for the quick patch. I am about to test it (in a day or so).
Initial (blind, off-the-cuff, uneducated) comments:
 - I had BUG_ON(x<0) in my code, so it is unlikely that x changed sign.
 - Why not use float instead of infinite-precision integer arithmetic?
 - Do we need a smooth function, or would an easy-to-calculate
   step function suffice?
 - Is there a check that the returned s64 pos_ratio fits into u32?

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201212160014.qbg0e961019...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-09 Thread paul . szabo
Dear Ben,

> Although PAE supports up to 64 GB RAM ... The use of such a large
> amount of high memory is problematic ...
> Or you can test ... by restricting what the kernel uses with the
> 'mem' parameter, e.g. mem=16G.

Trying various mem=XX values, no OOM was observed with mem=32G or less,
but a crash is obtained with any memory over 32GB e.g. with mem=34G.
This suggests a signed/unsigned bug more than an issue with highmem
size; you said PAE supports 64GB, not just 32GB.

> A 64-bit kernel doesn't have a split between normal and high memory.

... and it may have larger integers, less affected by signedness bugs.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


*** mem=34G - Tail end of /var/log/kern.log
[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-686-pae 
mem=34G root=UUID=469c2730-1786-46f7-9d80-5d651ee581d7 ro quiet
...
[  388.560098] dd invoked oom-killer: gfp_mask=0x800d0, order=0, oom_adj=0, 
oom_score_adj=0
[  388.560106] dd cpuset=/ mems_allowed=0
[  388.560113] Pid: 4244, comm: dd Not tainted 3.2.0-4-686-pae #1 Debian 
3.2.32-1
[  388.560117] Call Trace:
[  388.560135]  [c1097c1c] ? dump_header.isra.6+0x5c/0x167
[  388.560144]  [c1120ff8] ? security_real_capable_noaudit+0x2c/0x35
[  388.560149]  [c1097e93] ? oom_kill_process+0x30/0x201
[  388.560155]  [c109810f] ? select_bad_process.constprop.12+0xab/0xff
[  388.560160]  [c10983e0] ? out_of_memory+0xf8/0x135
[  388.560167]  [c109aecd] ? __alloc_pages_nodemask+0x509/0x63e
[  388.560176]  [c10c0ca3] ? cache_alloc+0x253/0x407
[  388.560183]  [c10c1425] ? kmem_cache_alloc+0x29/0x89
[  388.560193]  [c11605ee] ? radix_tree_preload+0x24/0x61
[  388.560203]  [c10961cd] ? add_to_page_cache_locked+0x3e/0xb3
[  388.560210]  [c1096253] ? add_to_page_cache_lru+0x11/0x2f
[  388.560217]  [c10962cb] ? grab_cache_page_write_begin+0x5a/0x94
[  388.560244]  [f88bb348] ? ext3_write_begin+0xa0/0x1d2 [ext3]
[  388.560251]  [c109d562] ? put_page+0x16/0x24
[  388.560258]  [c1095e61] ? generic_file_buffered_write+0xd8/0x1dd
[  388.560265]  [c1096afb] ? __generic_file_aio_write+0x25e/0x282
[  388.560273]  [c100f2fb] ? read_tsc+0xa/0x28
[  388.560282]  [c1053548] ? timekeeping_get_ns+0x11/0x55
[  388.560287]  [c1096b7c] ? generic_file_aio_write+0x5d/0xb3
[  388.560298]  [c10cbc31] ? wait_on_retry_sync_kiocb+0x3c/0x3c
[  388.560304]  [c10cbcd9] ? do_sync_write+0xa8/0xdc
[  388.560311]  [c10cc1f3] ? rw_verify_area+0xc6/0xe7
[  388.560317]  [c10cc493] ? vfs_write+0x83/0xd4
[  388.560323]  [c10cc653] ? sys_write+0x3d/0x61
[  388.560331]  [c12c5d1f] ? sysenter_do_call+0x12/0x28
[  388.560334] Mem-Info:
[  388.560337] DMA per-cpu:
[  388.560341] CPU0: hi:0, btch:   1 usd:   0
[  388.560345] CPU1: hi:0, btch:   1 usd:   0
[  388.560348] CPU2: hi:0, btch:   1 usd:   0
[  388.560351] CPU3: hi:0, btch:   1 usd:   0
[  388.560355] CPU4: hi:0, btch:   1 usd:   0
[  388.560358] CPU5: hi:0, btch:   1 usd:   0
[  388.560361] CPU6: hi:0, btch:   1 usd:   0
[  388.560365] CPU7: hi:0, btch:   1 usd:   0
[  388.560368] CPU8: hi:0, btch:   1 usd:   0
[  388.560371] CPU9: hi:0, btch:   1 usd:   0
[  388.560374] CPU   10: hi:0, btch:   1 usd:   0
[  388.560378] CPU   11: hi:0, btch:   1 usd:   0
[  388.560381] CPU   12: hi:0, btch:   1 usd:   0
[  388.560384] CPU   13: hi:0, btch:   1 usd:   0
[  388.560387] CPU   14: hi:0, btch:   1 usd:   0
[  388.560390] CPU   15: hi:0, btch:   1 usd:   0
[  388.560394] CPU   16: hi:0, btch:   1 usd:   0
[  388.560397] CPU   17: hi:0, btch:   1 usd:   0
[  388.560400] CPU   18: hi:0, btch:   1 usd:   0
[  388.560404] CPU   19: hi:0, btch:   1 usd:   0
[  388.560407] CPU   20: hi:0, btch:   1 usd:   0
[  388.560410] CPU   21: hi:0, btch:   1 usd:   0
[  388.560413] CPU   22: hi:0, btch:   1 usd:   0
[  388.560417] CPU   23: hi:0, btch:   1 usd:   0
[  388.560420] CPU   24: hi:0, btch:   1 usd:   0
[  388.560423] CPU   25: hi:0, btch:   1 usd:   0
[  388.560427] CPU   26: hi:0, btch:   1 usd:   0
[  388.560430] CPU   27: hi:0, btch:   1 usd:   0
[  388.560433] CPU   28: hi:0, btch:   1 usd:   0
[  388.560436] CPU   29: hi:0, btch:   1 usd:   0
[  388.560440] CPU   30: hi:0, btch:   1 usd:   0
[  388.560443] CPU   31: hi:0, btch:   1 usd:   0
[  388.560446] Normal per-cpu:
[  388.560449] CPU0: hi:  186, btch:  31 usd: 174
[  388.560452] CPU1: hi:  186, btch:  31 usd: 164
[  388.560456] CPU2: hi:  186, btch:  31 usd:  53
[  388.560459] CPU3: hi:  186, btch:  31 usd:  51
[  388.560462] CPU4: hi:  186, btch:  31 usd: 155
[  388.560465] CPU5: hi:  186, btch:  31 usd:  72
[  388.560469] CPU6: hi:  186, btch:  31 usd: 143
[  388.560472] CPU7: hi:  186, btch:  31 usd:  95
[  388.560475] CPU8: hi:  186, btch:  31 usd: 178

Bug#695182: Re: Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread paul . szabo
An observation that may help in solving this issue. Using
  while :; do free -lm; sleep 5; done
while writing the files, I see the buffers and cached values
increasing; then buffers start decreasing, eventually down to zero;
then soon after, OOM starts. The free or low or high values do not
seem to show anything unusual.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Extract from output:
     total   used   free  shared  buffers  cached
Mem: 62941   1586  61354  0 61   1359
Mem: 62941   2143  60797  0 61   1907
Mem: 62941   2652  60288  0 62   2407
Mem: 62941   3205  59735  0 63   2951
Mem: 62941   3743  59197  0 63   3483
Mem: 62941   4275  58665  0 64   4007
Mem: 62941   4791  58149  0 64   4511
Mem: 62941   5338  57602  0 65   5049
Mem: 62941   5835  57105  0 65   5538
Mem: 62941   6332  56608  0 66   6027
Mem: 62941   6837  56103  0 66   6524
Mem: 62941   7332  55608  0 67   7007
Mem: 62941   7815  55125  0 67   7482
Mem: 62941   8310  54630  0 68   7970
Mem: 62941   8820  54120  0 68   8471
Mem: 62941   9280  53660  0 69   8922
Mem: 62941   9779  53161  0 69   9413
Mem: 62941  10231  52709  0 59   9868
Mem: 62941  10736  52204  0 59  10366
Mem: 62941  11105  51835  0 48  10741
Mem: 62941  11585  51355  0 41  11223
Mem: 62941  12074  50866  0 36  11709
Mem: 62941  12544  50396  0 23  12183
Mem: 62941  13021  49919  0 24  12653
Mem: 62941  13515  49425  0  8  13152
Mem: 62941  13978  48962  0  9  13609
Mem: 62941  14459  48481  0  1  14091
Mem: 62941  14941  47999  0  0  14566
Mem: 62941  15409  47531  0  0  15028
Mem: 62941  15858  47082  0  0  15487
Mem: 62941  16251  46689  0  0  15873
Mem: 62941  16392  46548  0  0  16017
Mem: 62941  16593  46347  0  0  16215
Mem: 62941  16730  46210  0  0  16350
Mem: 62941  16808  46132  0  0  16429
Mem: 62941  16839  46101  0  0  16460
Mem: 62941  16855  46085  0  0  16476
Mem: 62941  16843  46097  0  0  16487
Mem: 62941  17121  45819  0  0  16779
Mem: 62941  17342  45598  0  0  16998
Mem: 62941  17491  45449  0  0  17146


Archive: 
http://lists.debian.org/201212051115.qb5bf3fl015...@como.maths.usyd.edu.au



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread paul . szabo
Dear Ben,

> Although PAE supports up to 64 GB RAM, everything the kernel accesses
> must be mapped into 1 GB of virtual address space (about 880 MB of
> persistently mapped 'normal memory', plus temporary mappings of the
> remaining 'high memory').  The use of such a large amount of high memory
> is problematic, though I don't know whether it entirely explains this
> behaviour.  (The memory stats don't seem to account for much of the
> normal memory, as there is ~40 MB free but the various classes of
> allocations seem to add up to only ~300 MB.)
>
> These machines should all be installed with the amd64 kernel.  Is there
> any reason you would prefer not to do that?  Perhaps the kernel flavour
> selection in the installer should be changed to favour that based on the
> RAM size, though I'm not sure what the critical value should be.

Are you suggesting that the kernel lies, that 32-bit cannot handle 64GB?
Would it help to test the issue on a 16GB machine (I have one with
2*X5460 CPUs and one with a single i5-3570), or with 24GB (I have several
with 2*E5335 to 2*X5460)?

I have seen recommendations to use 64-bit amd64. I am somewhat reluctant
to jump ship: I want continuity (when I upgrade by installing a
little more memory), I want similarity between my various machines, and
I have observed 32-bit being faster in some situations.

But really: this is a bug in the 32-bit build. Do I know that the same
or similar or worse bugs are not present also in the 64-bit build off
the same sources?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201212051136.qb5babnv016...@como.maths.usyd.edu.au



Bug#384922: NFS insecure without support for squashing multiple groups

2012-02-19 Thread paul . szabo
> ... AUTH_SYS with untrusted root on clients is not a good fit ...
> NFSv4 with kerberos authentication would be less broken.  root_squash
> is a simplistic and incomplete band-aid.

NFSv4+krb is better only because it does not have a concept of groups.
Remove groups from AUTH_SYS, that is, ignore all client-supplied groups
(in other words, manage the primary group the same way --manage-gids
manages the secondaries), and the issue might be solved.
(In that sense NFSv4+krb is more broken, less feature-rich, than
AUTH_SYS.)

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



Archive: 
http://lists.debian.org/201202191159.q1jbxymm017...@bari.maths.usyd.edu.au



Bug#384922: NFS insecure without support for squashing multiple groups

2012-02-19 Thread paul . szabo
Dear Jonathan,

>> NFSv4+krb is better only because ...
> Surely the ability to squash multiple uids is also a help. ;-)

Not when asking to squash groups. :-)

I thought that idmapd worked also with AUTH_SYS.

> Do I understand correctly that you are requesting an export or mountd
> option filter_gid, which would behave like --manage-gids except it
> transforms the effective gid to anongid when the specified gid is not
> a group the user belongs to?  I haven't carefully looked over the
> protocol specs but at first glance that seems sensible.

Yes, my exact wish.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



Archive: 
http://lists.debian.org/201202192011.q1jkb1pb023...@bari.maths.usyd.edu.au



Bug#657916: linux-source-2.6.32: ps time doubled then constant: missing lock for task_utime?

2012-02-05 Thread paul . szabo
I wrote:
> I only observed this for multi-threaded
> processes compiled with  -fopenmp  .
I think I now observed the same issue with a single-threaded process:

$ ps u -p 14252
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
duncans  14252  150  9.7 2458868 2408272 ? RN   Jan03 71589:24 ./compact
$ grep . /proc/14252/stat /proc/14252/task/*/stat
/proc/14252/stat:14252 (compact) R 1 14222 14218 0 -1 4202496 366809489 0 0 0 
429428470 108008 0 0 36 16 1 0 130475208 2517880832 602068 4294967295 1 1 0 0 0 
0 0 0 132 4294967295 0 0 17 0 0 0 0 0 0
/proc/14252/task/14252/stat:14252 (compact) R 1 14222 14218 0 -1 4202496 
366809489 0 0 0 429428468 95107 0 0 36 16 1 0 130475208 2517880832 602068 
4294967295 1 1 0 0 0 0 0 0 132 4294967295 0 0 17 0 0 0 0 0 0

Should I investigate, should I try to reproduce, and check by how much
do TIME and %CPU jump when the wrong results start?

Thanks,

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



Archive: 
http://lists.debian.org/201202051106.q15b64im007...@bari.maths.usyd.edu.au



Bug#657916: linux-source-2.6.32: ps time doubled then constant: missing lock for task_utime?

2012-01-29 Thread Paul Szabo
Package: linux-source-2.6.32
Version: 2.6.32-41
Severity: normal


On rare occasions, for some long-running processes, ps shows a too-large
and then constant CPU time. I only observed this for multi-threaded
processes compiled with  -fopenmp  .

On one occasion I saw:

$ ps u -p 14804
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
psz  14804 1599  0.0  61528  1356 ?RNl  Jan13 71587:15 a.out
$ grep . /proc/14804/stat /proc/14804/task/*/stat
/proc/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 624 0 0 0 427308277 
2215294 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 134539869 
3214629248 3214614264 134522975 0 0 1 0 4294967295 0 0 17 2 0 0 0 0 0
/proc/14804/task/14804/stat:14804 (a.out) R 1 14804 14608 0 -1 4202496 588 0 0 
0 26478404 333660 0 0 36 16 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3214613280 3077747522 0 0 1 0 0 0 0 17 2 0 0 0 0 0
/proc/14804/task/14807/stat:14807 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 
26703589 138033 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3075048080 3077747646 0 0 1 0 0 0 0 -1 5 0 0 0 0 0
/proc/14804/task/14808/stat:14808 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 
26802997 48697 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3066655216 3077747646 0 0 1 0 0 0 0 -1 1 0 0 0 0 0
/proc/14804/task/14809/stat:14809 (a.out) R 1 14804 14608 0 -1 4202560 5 0 0 0 
26756492 95248 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3058262672 3077747646 0 0 1 0 0 0 0 -1 6 0 0 0 0 0
/proc/14804/task/14810/stat:14810 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 
26689860 161611 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3049869808 3077747646 0 0 1 0 0 0 0 -1 7 0 0 0 0 0
/proc/14804/task/14811/stat:14811 (a.out) R 1 14804 14608 0 -1 4202560 6 0 0 0 
26705969 145689 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3041477104 3077747646 0 0 1 0 0 0 0 -1 0 0 0 0 0 0
/proc/14804/task/14812/stat:14812 (a.out) R 1 14804 14608 0 -1 4202560 4 0 0 0 
26729186 122435 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3033084560 3077747646 0 0 1 0 0 0 0 -1 3 0 0 0 0 0
/proc/14804/task/14813/stat:14813 (a.out) R 1 14804 14608 0 -1 4202560 7 0 0 0 
26789545 62169 0 0 20 0 8 0 35128884 63004672 339 4294967295 134512640 
134539869 3214629248 3024691856 3077747646 0 0 1 0 0 0 0 -1 4 0 0 0 0 0

with TIME and %CPU reported by ps apparently doubled just before then;
from then on, TIME remained constant and %CPU slowly decreased. In that
state, command   ps u -L -p 14804   showed sensible output. I did not
wait long enough to see whether TIME ever increased again.

I wonder if this issue is related to task_utime in kernel/sched.c
calculating and updating p->prev_utime without any locks, whereas
comments say that thread_group_times must be called with siglock held.


Thanks,

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



-- System Information:
Debian Release: 6.0.4
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-pk05.09-svr (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-source-2.6.32 depends on:
ii  binutils2.20.1-16The GNU assembler, linker and bina
ii  bzip2   1.0.5-6+squeeze1 high-quality block-sorting file co

Versions of packages linux-source-2.6.32 recommends:
ii  gcc   4:4.4.5-1  The GNU C compiler
ii  libc6-dev [libc-dev]  2.11.3-2   Embedded GNU C Library: Developmen
ii  make  3.81-8 An utility for Directing compilati

Versions of packages linux-source-2.6.32 suggests:
ii  kernel-package12.036+nmu1A utility for building Linux kerne
ii  libncurses5-dev [ncurses- 5.7+20100313-5 developer's libraries and docs for
pn  libqt3-mt-dev none (no description available)



Archive: 
http://lists.debian.org/20120129225154.20218.84527.report...@bari.maths.usyd.edu.au



Bug#582826: Oops: 0002 unable to handle kernel paging request

2010-08-24 Thread paul . szabo
The problem did not re-occur after upgrading to 2.6.26-24.
Maybe fixed... Please close bug: I cannot reproduce anymore.

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



Archive: 
http://lists.debian.org/201008242303.o7on3mrw017...@bari.maths.usyd.edu.au



Bug#582826: Oops: 0002 unable to handle kernel paging request

2010-07-08 Thread paul . szabo
For the record only. - My machine has been occasionally (far too
regularly, every couple of weeks) crashing with the same error as
reported. Other machines, similar hardware and identical kernel,
did not seem affected. - Today I updated the kernels to one based
on 2.6.26-24, will monitor whether that helps.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia



Archive: 
http://lists.debian.org/201007082254.o68ms066031...@bari.maths.usyd.edu.au



Bug#582826: Oops: 0002 unable to handle kernel paging request

2010-05-23 Thread Paul Szabo
Package: linux-source-2.6.26
Version: 2.6.26-21
Severity: normal


My main file and login server machine crashed, with an Oops in the logs.
I do not know whether this crash is reproducible: it crashed also a week
earlier, but with nothing visible in the logs; it had been stable for
months before these two crashes.

Extract from /var/log/syslog at the crash:

May 22 20:29:45 bari kernel: BUG: unable to handle kernel paging request at 
b14f6dc2
May 22 20:29:45 bari kernel: IP: [c014b326] find_get_pages+0x46/0x70
May 22 20:29:45 bari kernel: *pdpt = 31a75001 *pde =  
May 22 20:29:45 bari kernel: Oops: 0002 [#1] SMP 
May 22 20:29:45 bari kernel: Modules linked in: nfsd exportfs autofs4 quota_v2 
fuse intel_agp agpgart usb_storage sg thermal 8250_pnp 8250 rtc_cmos rtc_core 
ehci_hcd parport_pc parport serial_core rtc_lib evdev i2c_i801 i2c_core 
processor thermal_sys
May 22 20:29:45 bari kernel: 
May 22 20:29:45 bari kernel: Pid: 287, comm: kswapd0 Not tainted 
(2.6.26-pk03.17-svr #1)
May 22 20:29:45 bari kernel: EIP: 0060:[c014b326] EFLAGS: 00010002 CPU: 5
May 22 20:29:45 bari kernel: EIP is at find_get_pages+0x46/0x70
May 22 20:29:45 bari kernel: EAX: b16f6cbe EBX: 0001 ECX:  EDX: 
b14f6dbe
May 22 20:29:45 bari kernel: ESI:  EDI: e39c5dd8 EBP: f7e39e88 ESP: 
f7e39e40
May 22 20:29:45 bari kernel:  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
May 22 20:29:45 bari kernel: Process kswapd0 (pid: 287, ti=f7e39000 
task=f7d7c6e0 task.ti=f7e39000)
May 22 20:29:45 bari kernel: Stack: 000e e39c5de8  f7e39e80 
 f7e39e80 c01539c2 f7e39e88 
May 22 20:29:45 bari kernel:e39c5d30 0080 c01545b4 000e 
00155ca7  e39c5dd8  
May 22 20:29:45 bari kernel:  c520e340 c1edb9e0 
c4f7bc60 c5278ae0 c39e7a60 c42deaa0 
May 22 20:29:45 bari kernel: Call Trace:
May 22 20:29:45 bari kernel:  [c01539c2] pagevec_lookup+0x22/0x30
May 22 20:29:45 bari kernel:  [c01545b4] __invalidate_mapping_pages+0x54/0x140
May 22 20:29:45 bari kernel:  [c01546af] invalidate_mapping_pages+0xf/0x20
May 22 20:29:45 bari kernel:  [c0186565] shrink_icache_memory+0x235/0x240
May 22 20:29:45 bari kernel:  [c015609f] shrink_slab+0x12f/0x190
May 22 20:29:45 bari kernel:  [c01564cd] kswapd+0x3cd/0x490
May 22 20:29:45 bari kernel:  [c0154d70] isolate_pages_global+0x0/0x60
May 22 20:29:45 bari kernel:  [c0136f80] autoremove_wake_function+0x0/0x50
May 22 20:29:45 bari kernel:  [c0156100] kswapd+0x0/0x490
May 22 20:29:45 bari kernel:  [c0136c99] kthread+0x39/0x70
May 22 20:29:45 bari kernel:  [c0136c60] kthread+0x0/0x70
May 22 20:29:45 bari kernel:  [c0103c83] kernel_thread_helper+0x7/0x14
May 22 20:29:45 bari kernel:  ===
May 22 20:29:45 bari kernel: Code: 30 00 8d 47 04 89 f1 89 ea 89 1c 24 e8 84 63 
10 00 85 c0 89 c3 74 1f 31 c9 8d 74 26 00 8b 54 8d 00 8b 02 f6 c4 40 74 03 8b 
52 0c f0 ff 42 04 83 c1 01 39 cb 77 e7 8b 44 24 04 f0 ff 47 10 fb 83 
May 22 20:29:45 bari kernel: EIP: [c014b326] find_get_pages+0x46/0x70 SS:ESP 
0068:f7e39e40
May 22 20:29:45 bari kernel: ---[ end trace 34faad952d0fda3f ]---

At this last crash, I happened to be logged in via ssh, and in the ssh
terminal window I had similar output, but curiously with a few lines
interchanged in order (and I am not sure whether the terminal output or
the syslog is correct; each line on the terminal was separately prefaced
with Message from syslogd... and separated with blank lines):

Message from sysl...@bari at Sat May 22 20:29:45 2010 ...
bari kernel: Oops: 0002 [#1] SMP 
bari kernel: Process kswapd0 (pid: 287, ti=f7e39000 task=f7d7c6e0 
task.ti=f7e39000)
bari kernel: Stack: 000e e39c5de8  f7e39e80  f7e39e80 
c01539c2 f7e39e88 
bari kernel:  c520e340 c1edb9e0 c4f7bc60 c5278ae0 
c39e7a60 c42deaa0 
bari kernel:e39c5d30 0080 c01545b4 000e 00155ca7  
e39c5dd8  
bari kernel: Call Trace:
bari kernel:  [c01539c2] pagevec_lookup+0x22/0x30
bari kernel:  [c01545b4] __invalidate_mapping_pages+0x54/0x140
bari kernel:  [c01546af] invalidate_mapping_pages+0xf/0x20
bari kernel:  [c015609f] shrink_slab+0x12f/0x190
bari kernel:  [c0186565] shrink_icache_memory+0x235/0x240
bari kernel:  [c01564cd] kswapd+0x3cd/0x490
bari kernel:  [c0154d70] isolate_pages_global+0x0/0x60
bari kernel:  [c0156100] kswapd+0x0/0x490
bari kernel:  [c0136f80] autoremove_wake_function+0x0/0x50
bari kernel:  [c0136c99] kthread+0x39/0x70
bari kernel:  [c0136c60] kthread+0x0/0x70
bari kernel:  [c0103c83] kernel_thread_helper+0x7/0x14
bari kernel: Code: 30 00 8d 47 04 89 f1 89 ea 89 1c 24 e8 84 63 10 00 85 c0 89 
c3 74 1f 31 c9 8d 74 26 00 8b 54 8d 00 8b 02 f6 c4 40 74 03 8b 52 0c f0 ff 42 
04 83 c1 01 39 cb 77 e7 8b 44 24 04 f0 ff 47 10 fb 83 
bari kernel: EIP: [c014b326] find_get_pages+0x46/0x70 SS:ESP 0068:f7e39e40
bari kernel:  ===

Thanks,

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School

Re: Processed: cloning 568317, reassign -1 to kernel-package

2010-02-16 Thread paul . szabo
Dear Ben,

You wrote:

> The OP asked us to report the bug, so I assumed he didn't.

Seems you did not pay attention to
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#86
written a week before your cloning:

> Seeing your reluctance to talk to kernel-package, I now reported
> Bug#568823.

Of course none of this matters at all. Kernel-package is now fixed,
linux-2.6 will be fixed sometime in the far future and wrongly because
they speak about repeated patterns not about reasonable perl code.
But it all does not matter because no-one seems to care much about
Debian stable (currently lenny) ... oh well, will patch myself.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


Archive: 
http://lists.debian.org/201002162127.o1glreud015...@bari.maths.usyd.edu.au



Bug#568317: Processed: cloning 568317, reassign -1 to kernel-package

2010-02-16 Thread paul . szabo
Dear Ben,

> ... Sorry for the mistake.

OK, you are only human and forgiven.

> I know the scripts suck but I am not in a position to rewrite them. ...
> I fixed this in the svn branch ...

You seem to be contradicting yourself.

> I asked you to send patches for linux-2.6, and you refused.

Huh? What do you base that accusation on? Looking in Bug#568317, I see
my fumbling about getting hold of the right scripts (oh silly me, always
thinking that Debian lenny is current...) and then
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#71
from your colleague Maximilian Attems:

> ... saw your old patch and can do that later tomorrow.
> thanks for your input.

which I interpreted as you not needing further hand-holding after all.

Please explain how my interpretation was wrong; or please retract.

> I fixed this in the svn branch for lenny and it should be in the next
> stable update.

Thanks. Pity you did not let Bug#568317 (and thus the world) know.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: Processed: cloning 568317, reassign -1 to kernel-package

2010-02-16 Thread paul . szabo
Dear Ben,

>>> I know the scripts suck but I am not in a position to rewrite them. ...
>>> I fixed this in the svn branch ...

>> You seem to be contradicting yourself.

> Not at all. ... I have made a localised fix ... I have removed ...
> I would like to go much further ...

OK, you seem to make a subtle distinction between fix and rewrite.

>>> I asked you to send patches for linux-2.6, and you refused.

>> Huh? What do you base that accusation on? ...
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568317#71
>>> ... saw your old patch and can do that later tomorrow.
>>> thanks for your input.
>> which I interpreted as you not needing further hand-holding after all.
>> Please explain how my interpretation was wrong; or please retract.

> You declined to send a patch for linux-2.6 after I pointed out that it
> is separate from kernel-package.  Yes, Maks said he could use your k-p
> patch, but it would not have applied cleanly and would have required
> fixing up.  Given that the kernel team is quite busy, and that you are
> clearly capable of debugging Perl and making patches, I don't think it
> was unreasonable of me to expect you to help us a bit further.

Please note: I never refused, never declined. Please retract.

Maybe I failed to deliver, but only because I was wilfully misled into
thinking that such was not wanted. Will never again listen to Maks. :-)

>>> I fixed this in the svn branch for lenny and it should be in the next
>>> stable update.

>> Thanks. Pity you did not let Bug#568317 (and thus the world) know.

> Normal practice is simply to tag a bug pending when the fix is
> committed, and that has been done (automatically).

Wow, I was not aware of that. But then... Bug#568823 had been "pending"
(I think it is now "done"): does that mean that I can expect to see
kernel-package_12.033 in lenny, in the near future? (Surely not, surely
"commit" has meanings unrelated to "to be in next stable update".)

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-07 Thread paul . szabo
Dear Ben,

>>>>> kernel-package has an old version of the script templates.
>>>>> The templates used by linux-2.6 are in debian/templates/temp.image.plain.

>>>> Where do I find that?

>>> apt-get source linux-2.6

>> Sorry, am confused. Is that the same as
>>   apt-get install linux-source-2.6.26

> It is not.

I now tried that (not with apt-get but manually). I guess I needed the
files:

.../pool/main/l/linux-2.6/linux-2.6_2.6.26.orig.tar.gz
.../pool/main/l/linux-2.6/linux-2.6_2.6.26-21.diff.gz

(please confirm, or tell me what I messed up). After unpacking those,
I find (essentially in the diff file) the

.../linux-2.6-2.6.26/debian/templates/temp.image.plain/

directory. However, to my surprise, those files are older than those
from the kernel-package directory, for example:

diff -r /usr/share/kernel-package/pkg/image/postinst linux-2.6-2.6.26/debian/templates/temp.image.plain/postinst
8,10c8,10
< # Last Modified On : Wed Oct  8 00:03:41 2008
< # Last Machine Used: anzu.internal.golden-gryphon.com
< # Update Count : 360
---
> # Last Modified On : Fri Sep 29 10:08:18 2006
> # Last Machine Used: glaurung.internal.golden-gryphon.com
> # Update Count : 357
...

Does that mean that kernel-package is in fact newer, and you should
import again then use my patches (sent previously)?

-

Should this bug somehow be given to kernel-package also (clone and
reassign?): would you be able to do that?

-

>> BTW... lenny 5.0.4, with 2.6.26-21, was announced some time ago, but
>> still
>>   http://packages.debian.org/search?keywords=linux-2.6&searchon=sourcenames&suite=stable&section=all
>>   http://packages.debian.org/source/lenny/linux-2.6
>> show 2.6.26-19lenny2 : is that a problem?

> That is strange.  Perhaps you should report a bug against www.debian.org.

Would you be able to do that? (I guess should not use reportbug for
that, am not sure how.)

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-07 Thread paul . szabo
Dear Maks,

> ... saw your old patch and can do that later tomorrow.

Thanks. Please also fix current Debian stable, do not make us wait until
2.6.32 trickles down.

>> Should this bug somehow be given to kernel-package also (clone and
>> reassign?): would you be able to do that?
> don't know kernel-package is in slow maintenance,

What does slow maintenance mean? They seem more up-to-date than
linux-2.6 in Debian stable. Anyway please arrange for kernel-package to
be fixed also.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-07 Thread paul . szabo
Dear Ben,

> ... the date headers in linux-2.6 have not been updated ...

I only whinged about headers; content in 2.6.26-21 is also in fact
older than kernel-package. I do not know what you may have changed for
bleeding-edge 2.6.32; and of course I want Debian stable to be updated
(now or in the near future).

Seeing your reluctance to talk to kernel-package, I now reported
Bug#568823.

Please report the issue with
http://packages.debian.org/source/lenny/linux-2.6 .

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-05 Thread paul . szabo
Dear Ben,

>>> kernel-package has an old version of the script templates.
>>> The templates used by linux-2.6 are in debian/templates/temp.image.plain.

>> Where do I find that?

> apt-get source linux-2.6

Sorry, am confused. Is that the same as
  apt-get install linux-source-2.6.26
which would get me (and install)
  linux-source-2.6.26_2.6.26-21_all.deb
and suggest to get/install
  kernel-package_11.015_all.deb
? I do not see anything about templates in either deb file,
but see /usr/share/kernel-package/pkg/image/postinst in kernel-package.

BTW... lenny 5.0.4, with 2.6.26-21, was announced some time ago, but
still
  
  http://packages.debian.org/search?keywords=linux-2.6&searchon=sourcenames&suite=stable&section=all
  http://packages.debian.org/source/lenny/linux-2.6
show 2.6.26-19lenny2 : is that a problem?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-04 Thread paul . szabo
Niko Tyni wrote:

> Reassigning now. ...

Thanks.

> ... you could probably follow up and explain what exactly was incorrect
> about running lilo.

Looking at my  /var/lib/dpkg/info/linux-image-*.postinst files, I see in
the code reading and parsing $CONF_LOC = '/etc/kernel-img.conf':

  ...
  $do_symlink  = "" if /do_symlinks\s*=\s*(no|false|0)\s*$/ig;
  ...
  $do_bootloader   = "Yes" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
  $explicit_do_loader = "YES" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
  ...

Most of the match patterns are used once only; using /g on them is not
necessary, and is probably wasteful (though perl is fast enough to
handle such things).

The pattern /do_bootloader.../ig is used twice. The first one may match;
the second one will surely not match because of the spurious /g, thus
explicit_do_loader will never be set (and lilo not run, or run after a
question left un-answered in unattended runs of apt-get install).
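
The effect can be sketched outside Perl. The Python model below is an illustration only, not the postinst code: `GString` and `match_g` are hypothetical helpers that imitate how a scalar-context //g match resumes from the position where the previous successful match on the same string ended (Perl's pos()), and resets that position on failure.

```python
import re

class GString:
    """A string plus a remembered match position, loosely imitating Perl's pos()."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

def match_g(pattern, s):
    """Imitate a scalar-context m//g: search from s.pos and remember the match end."""
    m = re.compile(pattern, re.IGNORECASE).search(s.text, s.pos)
    if m is None:
        s.pos = 0              # a failed //g match resets the position
        return False
    s.pos = m.end()
    return True

PATTERN = r"do_bootloader\s*=\s*(yes|true|1)\s*$"

line = GString("do_bootloader = yes")
first = match_g(PATTERN, line)     # matches; position moves to the end of the line
second = match_g(PATTERN, line)    # same pattern, same string: nothing left to match

do_bootloader = "Yes" if first else ""
explicit_do_loader = "YES" if second else ""
print(repr(do_bootloader), repr(explicit_do_loader))
```

In this model the second assignment never fires, which is the behaviour described above for explicit_do_loader.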

---

Minor issues, while I am criticizing perl style...

These patterns are anchored at the end, should also be anchored at the
beginning (and with explicit m//) like:

  $do_symlink  = "" if m/^\s*do_symlinks\s*=\s*(no|false|0)\s*$/i;
  ...
  $image_dest  = $1  if m/^\s*image_dest\s*=\s*(\S+)\s*$/i;
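
Why the leading anchor matters can be shown with a small sketch. This is Python rather than the script's Perl, `parse` is a hypothetical helper, and the key `dont_do_symlinks` is made up purely to provoke a stray match; it is not a real kernel-img.conf option.

```python
import re

UNANCHORED = re.compile(r"do_symlinks\s*=\s*(no|false|0)\s*$", re.IGNORECASE)
ANCHORED = re.compile(r"^\s*do_symlinks\s*=\s*(no|false|0)\s*$", re.IGNORECASE)

def parse(lines, pattern):
    """Return True if any non-comment line switches do_symlinks off."""
    off = False
    for raw in lines:
        line = re.sub(r"#.*$", "", raw)    # strip comments, as the postinst does
        if not line.strip():
            continue
        if pattern.search(line):
            off = True
    return off

# Hypothetical file contents; the second key is invented for the demonstration.
conf = ["do_initrd = yes", "dont_do_symlinks = no   # not a real option"]

print(parse(conf, UNANCHORED))   # matches inside the longer key
print(parse(conf, ANCHORED))     # the whole line must be the option
```

The unanchored pattern accepts the invented key because it only checks the tail of the line; the anchored one does not.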

I wonder about the need to use my() in a single-level script.

---

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-04 Thread paul . szabo
Dear Ben,

> ... If you can provide patches, that would be most helpful.

See below. I now see that the sources of these files are in package
kernel-package (I do not know how that relates to linux-2.6).
I only patched the spurious /g modifiers and cleaned up the patterns
e.g. to anchor at the beginning; did not drop the unnecessary my().

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


--- usr/share/kernel-package/pkg/headers/postinst.bak   2008-05-02 15:06:28.0 +1000
+++ usr/share/kernel-package/pkg/headers/postinst   2010-02-05 10:30:23.0 +1100
@@ -146,8 +146,8 @@
   s/\#.*$//g;
   next if /^\s*$/;
 
-  $src_postinst_hook  = $1  if /src_postinst_hook\s*=\s*(\S+)/ig;
-  $header_postinst_hook   = $1  if /header_postinst_hook\s*=\s*(\S+)/ig;
+  $src_postinst_hook  = $1  if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i;
+  $header_postinst_hook   = $1  if m/^\s*header_postinst_hook\s*=\s*(\S+)\s*$/i;
 }
 close CONF;
 $have_conffile = "Yes";
--- usr/share/kernel-package/pkg/source/postinst.bak   2008-05-02 15:06:28.0 +1000
+++ usr/share/kernel-package/pkg/source/postinst   2010-02-05 10:31:06.0 +1100
@@ -57,7 +57,7 @@
   s/\#.*$//g;
   next if /^\s*$/;
 
-  $src_postinst_hook   = $1  if /src_postinst_hook\s*=\s*(\S+)/ig;
+  $src_postinst_hook   = $1  if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i;
 }
 close CONF;
 $have_conffile = "Yes";
--- usr/share/kernel-package/pkg/doc/postinst.bak   2008-05-02 15:06:28.0 +1000
+++ usr/share/kernel-package/pkg/doc/postinst   2010-02-05 10:31:51.0 +1100
@@ -57,7 +57,7 @@
   s/\#.*$//g;
   next if /^\s*$/;
 
-  $src_postinst_hook   = $1  if /src_postinst_hook\s*=\s*(\S+)/ig;
+  $src_postinst_hook   = $1  if m/^\s*src_postinst_hook\s*=\s*(\S+)\s*$/i;
 }
 close CONF;
 $have_conffile = "Yes";
--- usr/share/kernel-package/pkg/image/postinst.bak 2008-11-25 04:01:32.0 +1100
+++ usr/share/kernel-package/pkg/image/postinst 2010-02-05 10:43:59.0 +1100
@@ -116,60 +116,60 @@
   warn "Option image_in_boot is deprecated, and will go away. Use link_in_boot instead.\n"
 if m/image_in_boot\s*=\s*/;
 
-  $do_symlink  = "" if /do_symlinks\s*=\s*(no|false|0)\s*$/ig;
-  $no_symlink  = "" if /no_symlinks\s*=\s*(no|false|0)\s*$/ig;
-  $reverse_symlink = "" if /reverse_symlink\s*=\s*(no|false|0)\s*$/ig;
-  $link_in_boot= "" if /link_in_boot\s*=\s*(no|false|0)\s*$/ig;
-  $link_in_boot= "" if /image_in_boot\s*=\s*(no|false|0)\s*$/ig;
-  $move_image  = "" if /move_image\s*=\s*(no|false|0)\s*$/ig;
-  $clobber_modules = '' if /clobber_modules\s*=\s*(no|false|0)\s*$/ig;
-  $do_boot_enable  = '' if /do_boot_enable\s*=\s*(no|false|0)\s*$/ig;
-  $do_bootfloppy   = '' if /do_bootfloppy\s*=\s*(no|false|0)\s*$/ig;
-  $relative_links  = '' if /relative_links \s*=\s*(no|false|0)\s*$/ig;
-  $do_bootloader   = '' if /do_bootloader\s*=\s*(no|false|0)\s*$/ig;
-  $do_initrd   = '' if /do_initrd\s*=\s*(no|false|0)\s*$/ig;
-  $warn_initrd = '' if /warn_initrd\s*=\s*(no|false|0)\s*$/ig;
-  $use_hard_links  = '' if /use_hard_links\s*=\s*(no|false|0)\s*$/ig;
-  $silent_modules  = '' if /silent_modules\s*=\s*(no|false|0)\s*$/ig;
-  $silent_loader   = '' if /silent_loader\s*=\s*(no|false|0)\s*$/ig;
-  $warn_reboot = '' if /warn_reboot\s*=\s*(no|false|0)\s*$/ig;
-  $minimal_swap= '' if /minimal_swap\s*=\s*(no|false|0)\s*$/ig;
-  $ignore_depmod_err = '' if /ignore_depmod_err\s*=\s*(no|false|0)\s*$/ig;
-  $relink_src_link   = '' if /relink_src_link\s*=\s*(no|false|0)\s*$/ig;
-  $relink_build_link = '' if /relink_build_link\s*=\s*(no|false|0)\s*$/ig;
-  $force_build_link  = '' if /force_build_link\s*=\s*(no|false|0)\s*$/ig;
-
-  $do_symlink  = "Yes" if /do_symlinks\s*=\s*(yes|true|1)\s*$/ig;
-  $no_symlink  = "Yes" if /no_symlinks\s*=\s*(yes|true|1)\s*$/ig;
-  $reverse_symlink = "Yes" if /reverse_symlinks\s*=\s*(yes|true|1)\s*$/ig;
-  $link_in_boot= "Yes" if /link_in_boot\s*=\s*(yes|true|1)\s*$/ig;
-  $link_in_boot= "Yes" if /image_in_boot\s*=\s*(yes|true|1)\s*$/ig;
-  $move_image  = "Yes" if /move_image\s*=\s*(yes|true|1)\s*$/ig;
-  $clobber_modules = "Yes" if /clobber_modules\s*=\s*(yes|true|1)\s*$/ig;
-  $do_boot_enable  = "Yes" if /do_boot_enable\s*=\s*(yes|true|1)\s*$/ig;
-  $do_bootfloppy   = "Yes" if /do_bootfloppy\s*=\s*(yes|true|1)\s*$/ig;
-  $do_bootloader   = "Yes" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
-  $explicit_do_loader = "YES" if /do_bootloader\s*=\s*(yes|true|1)\s*$/ig;
-  $relative_links  = "Yes" if /relative_links\s*=\s*(yes|true|1)\s*$/ig;
-  $do_initrd   = "Yes" if /do_initrd\s*=\s*(yes|true|1)\s*$/ig;
-  $warn_initrd = Yes

Bug#568317: linux-image-* postinst did not correctly run lilo

2010-02-04 Thread paul . szabo
Dear Ben,

> kernel-package has an old version of the script templates.
> The templates used by linux-2.6 are in debian/templates/temp.image.plain.

Where do I find that? I am using (working, building on) Debian lenny,
building from linux-source-2.6.26-21.tar.bz2 . (Anyway, are not my
suggested changes simple enough so you do not need actual patch files?)

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#495529: linux-source: SMP process scheduler leaves CPUs idle

2008-08-18 Thread Paul Szabo
Package: linux-source
Version: 2.6.18.dfsg.1-18etch6
Severity: normal


I have some machines with 8 CPUs (dual Intel Xeon quad-core CPU chips),
used for long-running calculations; normally there are no short-lived
processes. If all processes are niced to the same or nearby level,
then things behave as expected. But if there is a large difference in
the nice level (e.g. one job at level 15 and 8 jobs at level 18), then
one CPU is left idle: total CPU percentages adding to 700% and top
shows idleness at 12.5%. (Under some similar conditions I have also
observed 2 idle CPUs.)
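
The figures quoted follow directly from the CPU count. As a small worked check (the helper name `utilisation` is made up for this illustration):

```python
def utilisation(total_cpus, idle_cpus):
    """Return (sum of per-CPU busy percentages, idle share of the whole machine)."""
    busy_percent = 100 * (total_cpus - idle_cpus)
    idle_percent = 100 * idle_cpus / total_cpus
    return busy_percent, idle_percent

print(utilisation(8, 1))   # one idle CPU out of eight
print(utilisation(8, 2))   # the two-idle-CPU case mentioned above
```

With one of eight CPUs idle this gives 700% total CPU and 12.5% idleness, matching the numbers reported.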

Please let me know if you need further details.

Thanks,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-pk02.19-svr
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)





Bug#406902: kernel NFS data loss

2007-09-18 Thread Paul Szabo
On 24 Jul 07 I wrote:

> The patch below (against 2.6.8-16sarge7) seems to solve the problem.
> ... [patch for fs/exportfs/expfs.c] ...

Seems to me that this has been incorporated (or done) in
linux-source-2.6.18.dfsg.1-13etch2 .

Earlier, on 20 Jul 07 I wrote:

> ... I noticed what I thought were oddities. ... Do you think the
> following patch against 2.6.8-16sarge7 code would be useful?
> ... [patch for fs/nfs/dir.c] ...

That does not seem to have been added to 
linux-source-2.6.18.dfsg.1-13etch2 . I am still not quite sure what
needs lock_kernel(), and anyway my patch did not actually solve
anything, so that might be OK.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia






Bug#406902: [PATCH] Re: Bug#406902 kernel NFS data loss

2007-07-23 Thread Paul Szabo
The patch below (against 2.6.8-16sarge7) seems to solve the problem.
I now run my machines with both this patch, and also the one I submitted
on 20 Jul. Please include in future versions of the kernel.

Thanks,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


--- fs/exportfs/expfs.c.bak 2007-04-17 07:58:28.0 +1000
+++ fs/exportfs/expfs.c 2007-07-23 10:04:19.759071709 +1000
@@ -76,6 +76,12 @@
return result;
if (S_ISDIR(result->d_inode->i_mode)) {
/* there is no other dentry, so fail */
+/* PSz 23 Jul 07 Not ESTALE but EACCES
+ * See comments around line 292 below, and
+ * http://bugs.debian.org/255931
+ * http://bugs.debian.org/406902
+ */
+   err = -EACCES;
goto err_result;
}
/* try any other aliases */





Bug#406902: kernel NFS data loss

2007-07-21 Thread Paul Szabo
Have now tested the patch in my previous message: it does not solve
the problem I reported.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#406902: kernel NFS data loss

2007-07-19 Thread Paul Szabo
I ran some tests today, and it seemed (but not conclusive) that the
problem only occurs when the client is a multi-CPU SMP machine. Looking
at kernel source code, I noticed what I thought were oddities.

Do you think the following patch against 2.6.8-16sarge7 code would be
useful? I have not yet tested whether this works at all, or whether it
improves anything.

Thanks,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


--- fs/nfs/dir.c.bak2004-08-14 15:36:58.0 +1000
+++ fs/nfs/dir.c2007-07-20 13:17:33.387030060 +1000
@@ -681,14 +681,15 @@
  */
 static void nfs_dentry_iput(struct dentry *dentry, struct inode *inode)
 {
+   /* PSz 20 Jul 07  Do not we need lock_kernel() for nfs_renew_times()? */
+   lock_kernel();
if (dentry->d_flags & DCACHE_NFSFS_RENAMED) {
-   lock_kernel();
inode->i_nlink--;
nfs_complete_unlink(dentry);
-   unlock_kernel();
}
/* When creating a negative dentry, we want to renew d_time */
nfs_renew_times(dentry);
+   unlock_kernel();
iput(inode);
 }
 
@@ -832,9 +833,12 @@
}
}
 no_entry:
+   /* PSz 20 Jul 07  Do not we need lock_kernel() for d_add() and nfs_renew_times()? */
+   lock_kernel();
d_add(dentry, inode);
nfs_renew_times(dentry);
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
+   unlock_kernel();
 out:
BUG_ON(error > 0);
return ERR_PTR(error);
@@ -882,8 +886,12 @@
unlock_kernel();
 out:
dput(parent);
-   if (!ret)
+   if (!ret) {
+   /* PSz 20 Jul 07  Do not we need lock_kernel() for d_drop()? */
+   lock_kernel();
d_drop(dentry);
+   unlock_kernel();
+   }
return ret;
 no_open:
dput(parent);
@@ -990,6 +998,7 @@
}
inode = nfs_fhget(dentry->d_sb, fhandle, fattr);
if (inode) {
+/* PSz 20 Jul 07  The whole nfs_instantiate() is only ever called within lock_kernel() */
d_instantiate(dentry, inode);
nfs_renew_times(dentry);
nfs_set_verifier(dentry, nfs_save_change_attribute(dentry->d_parent->d_inode));
@@ -1200,6 +1209,7 @@
dir, &qsilly);
nfs_end_data_update(dir);
if (!error) {
+/* PSz 20 Jul 07  The whole nfs_sillyrename() is only ever called within lock_kernel() */
nfs_renew_times(dentry);
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
d_move(dentry, sdentry);





Bug#406902: kernel-source: NFS data loss

2007-01-14 Thread Paul Szabo
Package: kernel-source
Version: 2.6.8_16sarge6
Severity: important

Maybe this is related to http://bugs.debian.org/255931 (and concomitant
discussion on [EMAIL PROTECTED]). I only guess that this is caused
by a spurious ESTALE error return.

Running on machine rome, as plain user psz:

while :; do date; perl -e 'mkdir "tdir"; chdir "tdir"; open F, ">tscr"; print F "echo hello; sleep 2; /bin/pwd; echo bye\n"; close F; system "sh tscr"; mkdir("dir$_"),rmdir("dir$_") foreach 1..100; system "sh tscr"; unlink "tscr"; chdir ".."; rmdir "tdir"'; done

and (also on rome), as root:

while :; do date; lsof | grep psz | grep -E 'cwd|deleted|dir' | grep -v xterm; sleep 1; done

occasionally produces

Mon Jan 15 08:12:40 EST 2007
hello
/pisa/users/amstaff/psz/tdir
bye
shell-init: could not get current directory: getcwd: cannot access parent directories: No such file or directory
hello
/bin/pwd: cannot get current directory: No such file or directory
bye
Mon Jan 15 08:12:45 EST 2007
hello
/pisa/users/amstaff/psz/tdir
bye
hello
Mon Jan 15 08:12:48 EST 2007

Mon Jan 15 08:12:41 EST 2007
bash   13107  psz  cwd   DIR      0,15  8192      581  /pisa/users/amstaff/psz (pisa:/usr/users)
perl   13963  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
sh     13964  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
sh     13964  psz  255r  REG      0,15    40  5800712  /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
sleep  13965  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
Mon Jan 15 08:12:42 EST 2007
bash   13107  psz  cwd   DIR      0,15  8192      581  /pisa/users/amstaff/psz (pisa:/usr/users)
perl   13963  psz  cwd   unknown  0,15                 /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
Mon Jan 15 08:12:44 EST 2007
bash   13107  psz  cwd   DIR      0,15  8192      581  /pisa/users/amstaff/psz (pisa:/usr/users)
perl   13963  psz  cwd   unknown  0,15                 /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
sh     13984  psz  cwd   unknown  0,15                 /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
sh     13984  psz  255r  REG      0,15    40  5800712  /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
sleep  13985  psz  cwd   unknown  0,15                 /pisa/users/amstaff/psz/tdir (deleted) (pisa:/usr/users)
Mon Jan 15 08:12:45 EST 2007
bash   13107  psz  cwd   DIR      0,15  8192      581  /pisa/users/amstaff/psz (pisa:/usr/users)
perl   13995  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
sh     13996  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
sh     13996  psz  255r  REG      0,15    40  5800712  /pisa/users/amstaff/psz/tdir/tscr (pisa:/usr/users)
sleep  13997  psz  cwd   DIR      0,15  4096  5800702  /pisa/users/amstaff/psz/tdir (pisa:/usr/users)
Mon Jan 15 08:12:46 EST 2007

Settings:

rome# grep psz /etc/passwd
psz:x:1001:1001:Paul Szabo:/users/amstaff/psz:/bin/bash
rome# ls -l /users/amstaff
lrwxrwxrwx  1 root root 19 Jan 19  2005 /users/amstaff -> /pisa/users/amstaff
rome# mount | grep pisa/users
pisa:/usr/users on /pisa/users type nfs (rw,bg,rsize=8192,wsize=8192,addr=129.78.69.136)

(pisa uses default root_squash in its /etc/exports).

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-spm1.7
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)





Bug#402094: kernel-source-2.6.8: Intel drivers (net/e100.c, net/e1000/e1000_main.c)

2006-12-07 Thread Paul Szabo
Package: kernel-source-2.6.8
Version: 2.6.8-16sarge5
Severity: critical
Justification: root security hole


Noticed:

  Intel LAN Driver Buffer Overflow Local Privilege Escalation
  http://support.intel.com/support/network/sb/CS-023726.htm

The Intel blurb says Linux, and specifically Debian, is affected also:

Product Family             OS      Affected Driver Versions  Corrected Driver Versions
Intel PRO 10/100 Adapters  Linux*  3.5.14 or previous        3.5.17 or later
Intel PRO/1000 Adapters    Linux   7.2.7 or previous         7.3.15 or later

and it seems that:

kernel-source-2.6.8/drivers/net/e100.c
  #define DRV_NAME        "e100"
  #define DRV_VERSION     "3.0.18"
  #define DRV_DESCRIPTION "Intel(R) PRO/100 Network Driver"
  #define DRV_COPYRIGHT   "Copyright(c) 1999-2004 Intel Corporation"

kernel-source-2.6.8/drivers/net/e1000/e1000_main.c
  char e1000_driver_name[] = "e1000";
  char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
  char e1000_driver_version[] = "5.2.52-k4";
  char e1000_copyright[] = "Copyright (c) 1999-2004 Intel Corporation.";

are quite old (so seem to be affected).
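
The comparison can be made explicit with a short sketch. It assumes that the in-kernel version strings are directly comparable with the numbering in Intel's table, and that a suffix such as "-k4" can be ignored; both are assumptions, and `version_key` is a helper invented for this illustration.

```python
import re

def version_key(version):
    """Leading dotted numeric part as a tuple of ints, ignoring suffixes like '-k4'."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in m.group(0).split("."))

# (driver, version found in kernel-source-2.6.8, last affected version per Intel)
checks = [("e100", "3.0.18", "3.5.14"), ("e1000", "5.2.52-k4", "7.2.7")]

for driver, shipped, last_affected in checks:
    affected = version_key(shipped) <= version_key(last_affected)
    print(driver, shipped, "in affected range" if affected else "not in affected range")
```

Under those assumptions both drivers fall inside the "or previous" ranges listed.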

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-spm1.6
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages kernel-source-2.6.8 depends on:
ii  binutils               2.15-6   The GNU assembler, linker and bina
ii  bzip2                  1.0.2-7  high-quality block-sorting file co
ii  coreutils [fileutils]  5.2.1-2  The GNU core utilities
ii  fileutils              5.2.1-2  The GNU file management utilities





Bug#384922: NFS insecure without support for squashing multiple groups

2006-09-02 Thread Paul Szabo
I will re-phrase the problem, this may be clearer for some people:

  The root_squash option is to protect from an evil root. Though group
  staff is root-equivalent, root_squash does not currently squash that group
  (for various reasons, the kernel not supporting such options being one).
  An evil root could become group staff on the client, not get squashed
  across NFS, then become root on the server: root_squash is defeated.
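
The gap can be sketched with a toy model. This is not how the kernel or nfsd implements squashing; `root_squash` here is a made-up function, 65534 is assumed as the anonymous id, and 50 is assumed as the gid of group staff.

```python
NOBODY = 65534     # assumed anonymous uid/gid (configurable on a real server)
STAFF_GID = 50     # assumed gid of group staff

def root_squash(uid, gid, groups):
    """Toy model: map id 0 to the anonymous id, leave every other id untouched."""
    def squash(i):
        return NOBODY if i == 0 else i
    return squash(uid), squash(gid), [squash(g) for g in groups]

# Root on the client is squashed ...
print(root_squash(0, 0, [0]))

# ... but root can first switch to any uid in group staff, which passes through.
uid, gid, groups = root_squash(1001, STAFF_GID, [STAFF_GID])
print(uid, gid, groups)
print("staff survives squashing:", STAFF_GID in groups)
```

In the model, only id 0 is remapped, so credentials carrying group staff reach the server unchanged.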

Methods of exploitation, and ways to fix, were discussed already.

I know this bug renders my systems exploitable as we relied on the default
root_squash working, and never set non-default permissions on /usr/local or
altered root's PATH. I believe it renders many other systems exploitable
also, but have no ways to test that hypothesis.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-31 Thread Paul Szabo
severity 384922 critical
thanks

Dear Steve,

Sorry, I missed one:

> ... only exploitable when

> - you have a non-empty staff group on the client (+/- equivalent to
>   untrusted root users on the client, since any root user can simply add
>   users to this group)
> - you have NFS-shared filesystems that aren't marked nosuid
> - the untrusted user on the client has access to run processes on the NFS
>   server
> - /usr/local/{bin,sbin} are in root's path
> - /usr/local/{bin,sbin} are writable by group staff

No need for the attacker to have direct login access to the NFS server:
if there is some user activity there, that could be trojaned.

Of your five conditions, (1) is a given (what we are protecting against),
(2) is what we use NFS for, (3) is likely to be present, and (4) and (5)
are forced upon us by Debian policy. (Were not these things debated in
#299007 already?)

Sounds critically gaping to me.

---

I am somewhat curious: who is Steinar, and who are you?

I had submitted a bug against nfs-kernel-server; the maintainer there is
Anibal. You jumped in and re-jiggled the severity; then there were some
messages from Steinar, never anything from Anibal. After re-assigning to 
linux-2.6.16 (hmm... why the specific version?) where the maintainer is
a nebulous committee, again you re-jiggle severity; and no word from the
maintainers.

Thanks,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-31 Thread Paul Szabo
severity 384922 critical
thanks

Dear Steve,

> It happens to be very dangerous to share a filesystem via NFS between
> systems that have different security contexts.  This does not make it a
> critical bug ...

Is it acceptable for a root compromise of one system to easily propagate
onto another?

I am confused: what is the use and intent of root_squash, why is it enabled
by default, and why is there an option to turn it off?

Is it documented that NFS must never be used between systems in different
security contexts, other than that UID/GIDs should match?

>> Sorry, as I read Debian policy (and as discussed in #299007), I am not
>> permitted to change root's PATH or change the permissions on /usr/local.

> *You* are permitted to do either of these things.  Whether they will be done
> by default in *Debian* is a separate question.

Could you please point me to where that is documented, and maybe explain
what does the policy apply to?

If policy may be ignored, then is there such a thing as a critical bug?
Turn it off or fix it yourself and you will be safe: is that good enough?

---

>> No need for the attacker to have direct login access to the NFS server:
>> if there is some user activity there, that could be trojaned.

> Now you're not even talking about anything that can be *fixed* by
> smash_gids, you're talking about trojaning arbitrary files that will be
> accessed by individual users on the NFS server.  The only way you can guard
> against a compromised client in that case is to never share home
> directories of any users you're worried about!

I am talking about what an attacker can do, once he gets root on the
client. I trust my users (to have no skills to attack). And it can be
fixed: root on the server will be safe if either of the last two points
is fixed, whether in the policy itself or by the policy allowing us to
fix our own systems; or if, at great expense, we implement squashing of
GIDs.

 The answer remains, don't set your NFS environment up that way.

The correct answer seems to be fix or ignore the policy.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-31 Thread Paul Szabo
severity 384922 critical
thanks

Dear Steve,

The issue is root compromise of an NFS server. If that is possible then
it is critical; if it is not possible then the bug is solved. It seems
logically impossible to downgrade this kind of bug.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-31 Thread Paul Szabo
retitle 384922 NFS root_squash broken without support for squashing multiple groups
severity 384922 critical
thanks


Dear Steve,

 [root_squash is] often circumventable ...

References (CERT kb, securityfocus BID, secunia advisory)? I do not know of
any (other than this bug) instances of defeating root_squash.

 Is it documented that NFS must ...
 How is it the responsibility of the kernel ...

This bug was originally filed against NFS. I assert that it is not
documented, because there is no such must.

  ... policy requirements ... that each package must satisfy ...

NFS (or kernel) must be secure while complying with policy requirements.

 ... consequence of *installing or using the software* ...

Using it in some common, reasonable way; following common UNIX knowledge,
and Debian-specific documentation; using root_squash as was intended.

 No, there are three vectors by which an attacker on the client can exploit
 this problem to get root on the NFS server:

 - the server's /usr/local is itself NFS-shared from server to client, and
   in the absence of squash_gids the attacker is able to directly trojan
   root's path
 - a filesystem is shared that allows the attacker to write an suid binary
   to the server, and the attacker is able to start arbitrary processes on
   the server using credentials of a compromised user account and thereby
   trojan root's path, *or* attack through another privileged group
 - a filesystem is shared that allows the attacker to write a file to the
   server that triggers some other user who is a member of a privileged group
   to execute an attack using their privileged group membership; most likely
   this would be done via shared home directories and shell startup
   configuration.

The attack I described does not fit well with any of your vectors.

The first vector surely does not apply. I think you meant suid root in the
second vector: then it does not apply, correctly prevented by root_squash;
there are no compromised accounts on the server. The third vector does not
apply, as there are no users in any privileged group on the server.

The issue in this bug is the mere existence of the staff group: the fact
that it is root-equivalent, even without any member users, is sufficient
for the attack to succeed. The attack could be prevented by squash gid
trickery, or more simply by ensuring root has a sane PATH and/or by sane
ownerships on /usr/local. (I note that Debian policy is insane, but that is
not this bug.) Other UNIX distributions do not have root-equivalent users or
groups, thus their NFS services and root_squash options perform correctly.
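
To make the mechanism concrete, here is a minimal Python sketch of the
squashing behaviour discussed above. It is a simplified model and not the
kernel's implementation: the Cred type, the function names, the anonymous
id 65534 and the staff gid 50 are assumptions chosen for illustration, and
supplementary groups are ignored.

```python
from dataclasses import dataclass

ANON_ID = 65534   # assumption: anonymous uid/gid used when squashing
STAFF_GID = 50    # assumption: gid of group staff on the server


@dataclass(frozen=True)
class Cred:
    """Credentials as presented by the NFS client (hypothetical model)."""
    uid: int
    gid: int


def root_squash(cred):
    """Map uid 0 and gid 0 to the anonymous id; pass everything else through."""
    uid = ANON_ID if cred.uid == 0 else cred.uid
    gid = ANON_ID if cred.gid == 0 else cred.gid
    return Cred(uid, gid)


def may_write(cred, owner_uid, owner_gid, mode):
    """Classic owner/group/other write check, applied after squashing.

    No superuser override is modelled, since uid 0 never survives root_squash.
    """
    if cred.uid == owner_uid:
        return bool(mode & 0o200)
    if cred.gid == owner_gid:
        return bool(mode & 0o020)
    return bool(mode & 0o002)


# A directory owned by root:staff with mode 2775, as /usr/local/bin is
# described in this thread.
as_root = root_squash(Cred(0, 0))
as_staff = root_squash(Cred(1000, STAFF_GID))
print(may_write(as_root, 0, STAFF_GID, 0o2775))   # squashed root
print(may_write(as_staff, 0, STAFF_GID, 0o2775))  # any uid with gid staff
```

In this model the request from client root is squashed and refused, but
root on the client can present any other uid together with gid staff, and
that request passes through unchanged and is allowed to write.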

 ... unwilling to yourself block these users' access to /usr/local.

None of my users have any (other than read) access.

 You're damn well free to ignore the policy when configuring your system.  I
 have no idea where you got the idea that policy was binding on users.

Thanks. Could you please state that in #299007 also?

Still, Debian must be secure by default, out-of-the-box. I guess this bug
could be solved by asking NFS to document that it needs non-default,
non-policy-compliant settings to function securely. But then those
settings would need to go into its setup scripts, and it would be in breach
of policy, triggering a serious bug and its removal from Debian.

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#384922: nfs-kernel-server: root_squash is broken

2006-08-30 Thread Paul Szabo
retitle 384922 NFS insecure without support for squashing multiple groups
tags 384922 security
severity 384922 critical
thanks

Dear Steinar,

 ... You may want to actually talk to the NFS kernel server people ...

Huh? I thought that is what I have been doing until now! (Oops, my mistake,
package nfs-kernel-server does not come close...)

Funny: you meekly accept that NFS is hopelessly insecure and no security
conscious person will ever use it. Do you not find that offensive? (Not my
comment: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=299007;msg=276 .)

Funny: all it would take is a tiny policy change, to be permitted to drop
/usr/local things from root's PATH, or to remove group staff writability
from those things. Everyone seems to know those should be done...

Thanks for your help,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-30 Thread Paul Szabo
Dear Steve,

You seem to think that this is important but not critical.
Don't you agree that it is a root security hole?

Thanks,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#384922: NFS insecure without support for squashing multiple groups

2006-08-30 Thread Paul Szabo
Dear Steve,

Thanks for your response.

 The bug log indicates that it's only exploitable when

 - you have a non-empty staff group on the client (+/- equivalent to
   untrusted root users on the client, since any root user can simply add
   users to this group)
 - you have NFS-shared filesystems that aren't marked nosuid
 - the untrusted user on the client has access to run processes on the NFS
   server
 - /usr/local/{bin,sbin} are in root's path
 - /usr/local/{bin,sbin} are writable by group staff

 The last two points are true by default on Debian, but the first three
 points are configuration decisions on the part of the NFS server
 administrator.  I understand that you have reasons to export shares allowing
 suid binaries in your own environment, but then you can also reconfigure
 root's path or the permissions on /usr/local/* in that case.
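
The last two conditions in the list above can be checked mechanically. The
following Python sketch is one way to do so, under stated assumptions: the
function names is_risky and audit_path are hypothetical, "non-root" is taken
to mean any owner other than uid 0 or any write bit granted to a group other
than gid 0 or to others, and ACLs are not considered.

```python
import os
import stat


def is_risky(st_uid, st_gid, st_mode):
    """Return True if someone other than root could modify this directory."""
    if st_uid != 0:
        return True                      # owned by a non-root user
    if st_mode & stat.S_IWOTH:
        return True                      # world-writable
    if st_mode & stat.S_IWGRP and st_gid != 0:
        return True                      # writable by a non-root group
    return False


def audit_path(path):
    """List the directories in a PATH-style string that is_risky() flags.

    Empty entries (which mean the current directory) and entries that
    cannot be stat'ed are skipped in this sketch.
    """
    risky = []
    for entry in path.split(os.pathsep):
        if not entry:
            continue
        try:
            st = os.stat(entry)
        except OSError:
            continue
        if stat.S_ISDIR(st.st_mode) and is_risky(st.st_uid, st.st_gid, st.st_mode):
            risky.append(entry)
    return risky
```

Run as root with audit_path(os.environ.get("PATH", "")), this would list
/usr/local/bin and /usr/local/sbin on a system where they are in root's
path and writable by group staff, as described above.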

Sorry, the NFS server administrator does not really have control over the
first point. The purpose of root_squash is to limit and contain the damage
of a root compromise on the client; if root on the client could be fully
trusted then there would be no need or use for root_squash.

Sorry, as I read Debian policy (and as discussed in #299007), I am not
permitted to change root's PATH or change the permissions on /usr/local.

 I do agree that root should not have directories in its path by default that
 are writable by non-root users; but that is not this bug.

Yes, that is #299007, but I am told that policy bugs cannot be critical...

Cheers,

Paul Szabo   [EMAIL PROTECTED]   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia

