Re: Disk spindown on rmmod sd_mod
Jan Engelhardt wrote:
> I am using 2.6.23-rc9 with pata_sis. `modprobe -r sd_mod`, which I ran
> from initramfs, caused all my disks to spindown - sd even told me so.
>
> I recall there has been talk a while back about whether to spin down
> disks on shutdown or not, but I do not think it touched the removal of
> sd_mod, did it? So either way, can someone fill me in why the spindown
> is done?

The problem is that it's difficult to tell why a disk is going down from
sd_shutdown(), so it issues STOP unless system state is SYSTEM_RESTART.
Maybe we need to issue STOP only for SYSTEM_HALT, SYSTEM_POWER_OFF and
SYSTEM_SUSPEND_DISK.

Thanks.

--
tejun

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
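Tejun's suggested narrowing of the spindown condition can be sketched in C. This is a hypothetical standalone sketch, not the actual drivers/scsi/sd.c code: the real sd_shutdown() and the kernel's system_state differ in detail, and should_issue_stop() is an invented name for illustration.

```c
/* Hypothetical sketch of the proposed sd_shutdown() policy: issue
 * START STOP UNIT (stop) only when the machine is really powering
 * down, not on reboot or on plain module removal. */
enum system_states {
	SYSTEM_BOOTING,
	SYSTEM_RUNNING,
	SYSTEM_HALT,
	SYSTEM_POWER_OFF,
	SYSTEM_RESTART,
	SYSTEM_SUSPEND_DISK,
};

static enum system_states system_state = SYSTEM_RUNNING;

static int should_issue_stop(void)
{
	switch (system_state) {
	case SYSTEM_HALT:
	case SYSTEM_POWER_OFF:
	case SYSTEM_SUSPEND_DISK:
		return 1;	/* really going down: spin the disk down */
	default:
		return 0;	/* reboot, rmmod, running: leave it spinning */
	}
}
```

With this check, `modprobe -r sd_mod` would fall into the default branch (system_state is still SYSTEM_RUNNING), so the disks would not be spun down, which is exactly the behavior Jan was asking for.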
Spindown error on shutdown
Hi,

I use 2.6.22.9 with CFS-v22. When I shut down my laptop I see an error
(the last message on shutdown, after "will be halt now"), but I can't
read it because it goes by very fast (the laptop powers off
automatically). I see something about "Spindown error on ata_piix". I
tried looking in /var/log (messages, kern) and don't see anything.
.config is attached.

Regards,
Renato

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22.9-cfs-v22
# Sun Sep 30 15:16:21 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
CONFIG_VM86=y
CONFIG_TOSHIBA=m
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_EFI_VARS is not set
# CONFIG_DELL_RBU is not set
Disk spindown on rmmod sd_mod
Hi,

I am using 2.6.23-rc9 with pata_sis. `modprobe -r sd_mod`, which I ran
from initramfs, caused all my disks to spindown - sd even told me so.

I recall there has been talk a while back about whether to spin down
disks on shutdown or not, but I do not think it touched the removal of
sd_mod, did it? So either way, can someone fill me in why the spindown
is done?
Re: [stable] libata spindown patches for 2.6.21-stable
On Thu, Jun 14, 2007 at 01:48:46PM -0400, Daniel Drake wrote:
> Greg KH wrote:
> > I think it looks way too big.
>
> Agreed (otherwise I would have submitted the patches already).
>
> > If there are smaller patches, it might be a bit more reasonable.
>
> It may be possible to get rid of the couple of unrelated ones (sd
> printing, SCSI constants). These were required for the real patches to
> be able to build, but it would probably be easy to modify the real
> patches to build against kernels without those otherwise unrelated
> patches.
>
> > Are there reported bugs that this patchset fixes?
>
> Yes, but they are not regressions - libata has never done this right
> until now.
>
> Here are a few:
> https://bugs.gentoo.org/show_bug.cgi?id=174373
> http://bugzilla.kernel.org/show_bug.cgi?id=7674
> http://bugzilla.kernel.org/show_bug.cgi?id=7838
> https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/67810

Ok, if people want to post some smaller patches to the stable team,
we'll be glad to consider them.

thanks,

greg k-h
Re: [stable] libata spindown patches for 2.6.21-stable
Greg KH wrote:
> I think it looks way too big.

Agreed (otherwise I would have submitted the patches already).

> If there are smaller patches, it might be a bit more reasonable.

It may be possible to get rid of the couple of unrelated ones (sd
printing, SCSI constants). These were required for the real patches to
be able to build, but it would probably be easy to modify the real
patches to build against kernels without those otherwise unrelated
patches.

> Are there reported bugs that this patchset fixes?

Yes, but they are not regressions - libata has never done this right
until now.

Here are a few:
https://bugs.gentoo.org/show_bug.cgi?id=174373
http://bugzilla.kernel.org/show_bug.cgi?id=7674
http://bugzilla.kernel.org/show_bug.cgi?id=7838
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/67810

Daniel
Re: [stable] libata spindown patches for 2.6.21-stable
On Thu, 14 Jun 2007, Greg KH wrote:
> Are there reported bugs that this patchset fixes?

Yes, at least one I opened and which got a CODE_FIX when Tejun prepared
the first version of the patch.

http://bugzilla.kernel.org/show_bug.cgi?id=7838

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond where
the shadows lie." -- The Silicon Valley Tarot

Henrique Holschuh
Re: [stable] libata spindown patches for 2.6.21-stable
On Thu, Jun 14, 2007 at 12:01:14PM -0400, Chuck Ebbert wrote:
> Should we put these patches in 2.6.21-stable?
>
> Gentoo developers did a full backport:
>
> http://marc.info/?l=linux-ide&m=118047865916766&w=2

I think it looks way too big.

If there are smaller patches, it might be a bit more reasonable.

Are there reported bugs that this patchset fixes?

thanks,

greg k-h
[stable] libata spindown patches for 2.6.21-stable
Should we put these patches in 2.6.21-stable?

Gentoo developers did a full backport:

http://marc.info/?l=linux-ide&m=118047865916766&w=2
Re: 2.6.22 libata spindown
On Fri, 01 Jun 2007, Jeff Garzik wrote:
> IIRC, Debian was the one OS that really did need a shutdown utility
> update, as the message says :)

Actually, editing /etc/init.d/halt is enough. Find the hddown="-h" and
change it to hddown="".

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond where
the shadows lie." -- The Silicon Valley Tarot

Henrique Holschuh
Re: 2.6.22 libata spindown
On 6/1/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Tuncer Ayaz wrote:
> > I'm still seeing the libata warning that disks were not spun down
> > properly on the following two setups and am wondering whether I need
> > a new shutdown binary or the changeset mentioned below is not meant
> > to fix what I'm triggering by halt'ing.
> >
> > If it's not a bug I will try to update my shutdown utility and if
> > that does not work I promise not to bother lkml about a problem
> > caused by my userland. If it is a bug I hope it will be of interest
> > for 2.6.22 bug tracking.
> >
> > Setup 1:
> > SATA 1 Disks
> > AMD64 3200+
> > nVidia nForce 3 250 (Ultra?)
> > Debian i386 Unstable
> >
> > Setup 2:
> > SATA 2 disks
> > Core 2 Duo E6600
> > Intel 975X
> > Debian x86_64 Unstable
> >
> > Just to be clear what warning I'm talking about:
> > DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
> > For more info, visit http://linux-ata.org/shutdown.html
>
> IIRC, Debian was the one OS that really did need a shutdown utility
> update, as the message says :)

Thanks for the confirmation.
Re: 2.6.22 libata spindown
Tuncer Ayaz wrote:
> I'm still seeing the libata warning that disks were not spun down
> properly on the following two setups and am wondering whether I need
> a new shutdown binary or the changeset mentioned below is not meant
> to fix what I'm triggering by halt'ing.
>
> If it's not a bug I will try to update my shutdown utility and if
> that does not work I promise not to bother lkml about a problem
> caused by my userland. If it is a bug I hope it will be of interest
> for 2.6.22 bug tracking.
>
> Setup 1:
> SATA 1 Disks
> AMD64 3200+
> nVidia nForce 3 250 (Ultra?)
> Debian i386 Unstable
>
> Setup 2:
> SATA 2 disks
> Core 2 Duo E6600
> Intel 975X
> Debian x86_64 Unstable
>
> Just to be clear what warning I'm talking about:
> DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
> For more info, visit http://linux-ata.org/shutdown.html

IIRC, Debian was the one OS that really did need a shutdown utility
update, as the message says :)

Jeff
2.6.22 libata spindown
I'm still seeing the libata warning that disks were not spun down
properly on the following two setups and am wondering whether I need a
new shutdown binary or the changeset mentioned below is not meant to
fix what I'm triggering by halt'ing.

If it's not a bug I will try to update my shutdown utility and if that
does not work I promise not to bother lkml about a problem caused by my
userland. If it is a bug I hope it will be of interest for 2.6.22 bug
tracking.

Setup 1:
SATA 1 Disks
AMD64 3200+
nVidia nForce 3 250 (Ultra?)
Debian i386 Unstable

Setup 2:
SATA 2 disks
Core 2 Duo E6600
Intel 975X
Debian x86_64 Unstable

Just to be clear what warning I'm talking about:
DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
For more info, visit http://linux-ata.org/shutdown.html

The following is from the reply I got from Michal Piotrowski while I
was trying to find out what happened to the regression report:

MICHAL>> I guess you meant this

Subject    : libata crash on halt
References : http://marc.info/?l=linux-ide&m=117899827710565&w=2
Submitter  : Andrew Morton <[EMAIL PROTECTED]>
Caused-By  : Tejun Heo <[EMAIL PROTECTED]>
             commit 920a4b1038e442700a1cfac77ea7e20bd615a2c3
Status     : problem is being debugged

This bug was fixed by

commit da071b42f73dabbd0daf7ea4c3ff157d53b00648
Author: Tejun Heo <[EMAIL PROTECTED]>
Date:   Mon May 14 17:26:18 2007 +0200

    libata: fix shutdown warning message printing

    Unlocking ap->lock and sleeping don't work because SCSI commands
    can be issued from completion path without context. Reimplement
    delayed completion by allowing translation functions to override
    qc->scsidone(), storing the original completion function to
    scmd->scsi_done() and overriding qc->scsidone() with a function
    which schedules delayed invocation of scmd->scsi_done(). This isn't
    pretty at all but all the ugly parts are thankfully contained in
    the stop translation path where the compat feature is implemented.

    Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
    Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

[  715.196000] ata3.00: DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
[  715.196000] ata3.00: For more info, visit http://linux-ata.org/shutdown.html

If you think about this, please send a bug report. IMHO it's ABI
breakage.
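The completion-override trick described in the commit message can be illustrated with a small standalone C sketch. This is hypothetical code, not libata's: the struct and the names (cmd, done, orig_done, deferred) are invented for illustration, and where the kernel schedules the deferred completion as work, this sketch just marks it and lets the caller run it later.

```c
/* Hypothetical sketch of the pattern from commit da071b42: stash the
 * original completion callback, substitute one that defers the real
 * completion until it can safely run with context. */
struct cmd {
	void (*done)(struct cmd *);	/* current completion callback */
	void (*orig_done)(struct cmd *);/* saved original completion */
	int deferred;			/* completion queued for later */
};

/* Substitute completion: may be called from an atomic completion
 * path, so it must not sleep; it only marks the command deferred
 * (the kernel version would schedule work here). */
static void deferred_done(struct cmd *c)
{
	c->deferred = 1;
}

/* Override the completion before issuing the command. */
static void override_completion(struct cmd *c)
{
	c->orig_done = c->done;		/* store the real completion */
	c->done = deferred_done;	/* run the deferring stub instead */
}

/* Later, in a context where sleeping/printing is allowed, invoke the
 * original completion that was held back. */
static void run_deferred(struct cmd *c)
{
	if (c->deferred && c->orig_done)
		c->orig_done(c);
}
```

The point of the indirection is that the shutdown warning can be printed between the stop command finishing and the original completion running, without sleeping inside the atomic completion path.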
Re: spindown
On Thu, Jun 21, 2001 at 06:07:01PM +0200, Jamie Lokier wrote:
> Pavel Machek wrote:
> > > Isn't this why noflushd exists or is this an evil thing that
> > > shouldn't ever be used and will eventually eat my disks for
> > > breakfast?
> >
> > It would eat your flash for breakfast. You know, flash memories have
> > no spinning parts, so there's nothing to spin down.
>
> Btw Pavel, does noflushd work with 2.4.4? The noflushd version 2.4 I
> tried said it couldn't find some kernel process (kflushd? I don't
> remember) and that I should use bdflush. The manual says that's
> appropriate for older kernels, but not 2.4.4 surely.

Yes, noflushd works with 2.4.x. I'm running it on an ibook with
debian-unstable.

And as a word of warning: while running noflushd, make sure you 'sync'
a few times after an 'apt-get dist-upgrade' that upgrades damn near
everything before doing something that crashes the kernel. This WILL
eat your ext2fs for breakfast.

--
Troy Benjegerdes | master of mispeeling | 'da hozer' | [EMAIL PROTECTED]
"If this message isn't misspelled, I didn't write it" -- Me
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's
why I draw cartoons. It's my life." -- Charles Shulz
Re: [RFC] Early flush (was: spindown)
Hi!

> > You know about this project no doubt:
> >
> > http://noflushd.sourceforge.net/
>
> Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> .h files! As you say, not really lightweight. There must be a better
> way. Also, I suspect (without having looked at the code) that it
> doesn't handle memory pressure well. Things may get nasty when we run
> low on free pages.

Noflushd *is* lightweight. It is complicated because it has to know
about different kernel versions etc. It is "easy stuff". If you add
kernel support, it will only *add* lines to noflushd.

								Pavel
--
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
Re: [RFC] Early flush (was: spindown)
Hi!

> > > > I'd like that too, but what about sync writes? As things stand
> > > > now, there is no option but to spin the disk back up. To get
> > > > around this we'd have to change the basic behavior of the block
> > > > device and that's doable, but it's an entirely different
> > > > proposition than the little patch above.
> > >
> > > I don't care as much about sync writes. They don't seem to happen
> > > very often on my boxes.
> >
> > syslog and some editors are the most common users of sync writes.
> > vim, e.g., per default keeps fsync()ing its swapfile. Tweaking the
> > configuration of these apps, this can be prevented fairly easily,
> > though. Changing sync semantics for this matter, on the other hand,
> > seems pretty awkward to me. I'd expect an application calling
> > fsync() to have good reason for having its data flushed to disk
> > _now_, no matter what state the disk happens to be in. If it hasn't,
> > fix the app, not the kernel.
>
> But apps shouldn't have to know about the special requirements of
> laptops.

If app does fsync(), it hopefully knows what it is doing. [Random apps
should not really do sync even on normal systems -- it hurts
performance.]

								Pavel
--
The best software in life is free (not shareware)!		Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
Re: [RFC] Early flush (was: spindown)
On Sunday 24 June 2001 17:06, Rik van Riel wrote:
> On Sun, 24 Jun 2001, Anuradha Ratnaweera wrote:
> > It is not uncommon to have a large number of tmp files on the
> > disk(s) (Rik also pointed this out somewhere early in the original
> > thread) and it is sensible to keep all of them in buffers if RAM is
> > sufficient. Transferring _very_ large files is not _that_ common, so
> > why shouldn't that case be handled from user space by calling
> > sync(2)?
>
> Wait a moment.
>
> The only observed bad case I've heard about here is
> that of large files being written out.

But that's not the only advantage of doing the early update:

  - Early spindown for laptops
  - Improved latency under some conditions
  - Improved throughput for some loads
  - Improved filesystem safety

> It should be easy enough to just trigger writeout of
> pages of an inode once that inode has more than a
> certain amount of dirty pages in RAM ... say, something
> like freepages.high ?

The inode dirty page list is not sorted by "time dirtied", so you would
be eroding the system's ability to ensure that dirty file buffers never
get older than X.

--
Daniel
Re: [RFC] Early flush (was: spindown)
On Sun, 24 Jun 2001, Anuradha Ratnaweera wrote:
> It is not uncommon to have a large number of tmp files on the disk(s)
> (Rik also pointed this out somewhere early in the original thread) and
> it is sensible to keep all of them in buffers if RAM is sufficient.
> Transferring _very_ large files is not _that_ common, so why shouldn't
> that case be handled from user space by calling sync(2)?

Wait a moment.

The only observed bad case I've heard about here is
that of large files being written out.

It should be easy enough to just trigger writeout of
pages of an inode once that inode has more than a
certain amount of dirty pages in RAM ... say, something
like freepages.high ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: [RFC] Early flush (was: spindown)
On Sunday 24 June 2001 05:20, Anuradha Ratnaweera wrote:
> On Wed, Jun 20, 2001 at 04:58:51PM -0400, Tom Sightler wrote:
> > 1. When running a compile, or anything else that produces lots of
> > small disk writes, you tend to get lots of little pauses for all the
> > little writes to disk. These seem to be unnoticeable without the
> > patch.
> >
> > 2. Loading programs when writing activity is occurring (even light
> > activity like during the compile) is noticeably slower; actually,
> > any reading from disk is.
> >
> > I also ran my simple ftp test that produced the symptom I reported
> > earlier. I transferred a 750MB file via FTP, and with your patch
> > sure enough disk writing started almost immediately, but it still
> > didn't seem to write enough data to disk to keep up with the
> > transfer, so at approximately the 200MB mark the old behavior still
> > kicked in as it went into full flush mode, during which time network
> > activity halted, just like before.
>
> It is not uncommon to have a large number of tmp files on the disk(s)
> (Rik also pointed this out somewhere early in the original thread) and
> it is sensible to keep all of them in buffers if RAM is sufficient.
> Transferring _very_ large files is not _that_ common, so why shouldn't
> that case be handled from user space by calling sync(2)?

The patch you're discussing has been superseded - check my "[RFC] Early
flush: new, improved" post from yesterday. This addresses the problem
of handling tmp files efficiently while still having the early flush.
The latest patch shows no degradation at all for compilation, which
uses lots of temporary files.

--
Daniel
Re: [RFC] Early flush (was: spindown)
On Wed, Jun 20, 2001 at 04:58:51PM -0400, Tom Sightler wrote:
> 1. When running a compile, or anything else that produces lots of
> small disk writes, you tend to get lots of little pauses for all the
> little writes to disk. These seem to be unnoticeable without the
> patch.
>
> 2. Loading programs when writing activity is occurring (even light
> activity like during the compile) is noticeably slower; actually, any
> reading from disk is.
>
> I also ran my simple ftp test that produced the symptom I reported
> earlier. I transferred a 750MB file via FTP, and with your patch sure
> enough disk writing started almost immediately, but it still didn't
> seem to write enough data to disk to keep up with the transfer, so at
> approximately the 200MB mark the old behavior still kicked in as it
> went into full flush mode, during which time network activity halted,
> just like before.

It is not uncommon to have a large number of tmp files on the disk(s)
(Rik also pointed this out somewhere early in the original thread) and
it is sensible to keep all of them in buffers if RAM is sufficient.
Transferring _very_ large files is not _that_ common, so why shouldn't
that case be handled from user space by calling sync(2)?

	Anuradha

--
Debian GNU/Linux (kernel 2.4.6-pre5)

Keep cool, but don't freeze.
		-- Hellman's Mayonnaise
Re: [RFC] Early flush (was: spindown)
On Saturday 23 June 2001 01:25, Daniel Kobras wrote:
> On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote:
> > Daniel Phillips writes:
> > > I'd like that too, but what about sync writes? As things stand
> > > now, there is no option but to spin the disk back up. To get
> > > around this we'd have to change the basic behavior of the block
> > > device and that's doable, but it's an entirely different
> > > proposition than the little patch above.
> >
> > I don't care as much about sync writes. They don't seem to happen
> > very often on my boxes.
>
> syslog and some editors are the most common users of sync writes. vim,
> e.g., per default keeps fsync()ing its swapfile. Tweaking the
> configuration of these apps, this can be prevented fairly easily,
> though. Changing sync semantics for this matter, on the other hand,
> seems pretty awkward to me. I'd expect an application calling fsync()
> to have good reason for having its data flushed to disk _now_, no
> matter what state the disk happens to be in. If it hasn't, fix the
> app, not the kernel.

But apps shouldn't have to know about the special requirements of
laptops. I've been playing a little with the idea of creating a special
block device for laptops that goes between the vfs and the real block
device, and adds the behaviour of being able to buffer writes in
memory. In all respects it would seem to the vfs to be a disk. So far
this is just a thought experiment.

> > > You know about this project no doubt:
> > >
> > > http://noflushd.sourceforge.net/
> >
> > Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> > .h files! As you say, not really lightweight. There must be a better
> > way.
>
> noflushd would benefit a lot from being able to set bdflush parameters
> per device or per disk. So I'm really eager to see what Daniel comes
> up with. Currently, we can only turn kupdate either on or off as a
> whole, which means that noflushd implements a crude replacement for
> the benefit of multi-disk setups. A lot of the cruft stems from there.

Yes, another person to talk to about this is Jens Axboe, who has been
doing some serious hacking on the block layer. I thought I'd get the
early flush patch working well for one disk before generalizing to N
;-)

> > Also, I suspect (without having looked at the code) that it
> > doesn't handle memory pressure well. Things may get nasty when we
> > run low on free pages.
>
> It doesn't handle memory pressure at all. It doesn't have to. noflushd
> only messes with kupdate{,d} but leaves bdflush (formerly known as
> kflushd) alone. If memory gets tight, bdflush starts writing out dirty
> buffers, which makes the disk spin up, and we're back to normal.

Exactly. And in addition, when bdflush does wake up, I try to get
kupdate out of the way as much as possible, though I've been following
the traditional recipe and having it submit all buffers past a certain
age. This is quite possibly a bad thing to do because it could starve
the swapper. Ouch.

--
Daniel
Re: [RFC] Early flush (was: spindown)
On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote:
> Daniel Phillips writes:
> > I'd like that too, but what about sync writes? As things stand now,
> > there is no option but to spin the disk back up. To get around this
> > we'd have to change the basic behavior of the block device and
> > that's doable, but it's an entirely different proposition than the
> > little patch above.
>
> I don't care as much about sync writes. They don't seem to happen very
> often on my boxes.

syslog and some editors are the most common users of sync writes. vim,
e.g., per default keeps fsync()ing its swapfile. Tweaking the
configuration of these apps, this can be prevented fairly easily,
though. Changing sync semantics for this matter, on the other hand,
seems pretty awkward to me. I'd expect an application calling fsync()
to have good reason for having its data flushed to disk _now_, no
matter what state the disk happens to be in. If it hasn't, fix the app,
not the kernel.

> > You know about this project no doubt:
> >
> > http://noflushd.sourceforge.net/
>
> Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> .h files! As you say, not really lightweight. There must be a better
> way.

noflushd would benefit a lot from being able to set bdflush parameters
per device or per disk. So I'm really eager to see what Daniel comes up
with. Currently, we can only turn kupdate either on or off as a whole,
which means that noflushd implements a crude replacement for the
benefit of multi-disk setups. A lot of the cruft stems from there.

> Also, I suspect (without having looked at the code) that it
> doesn't handle memory pressure well. Things may get nasty when we run
> low on free pages.

It doesn't handle memory pressure at all. It doesn't have to. noflushd
only messes with kupdate{,d} but leaves bdflush (formerly known as
kflushd) alone. If memory gets tight, bdflush starts writing out dirty
buffers, which makes the disk spin up, and we're back to normal.

Regards,

Daniel.
Re: spindown
On Thu, Jun 21, 2001 at 06:07:01PM +0200, Jamie Lokier wrote:
> Pavel Machek wrote:
> > > Isn't this why noflushd exists or is this an evil thing that
> > > shouldn't ever be used and will eventually eat my disks for
> > > breakfast?
> >
> > It would eat your flash for breakfast. You know, flash memories have
> > no spinning parts, so there's nothing to spin down.
>
> Btw Pavel, does noflushd work with 2.4.4? The noflushd version 2.4 I
> tried said it couldn't find some kernel process (kflushd? I don't
> remember) and that I should use bdflush. The manual says that's
> appropriate for older kernels, but not 2.4.4 surely.

That's because of my favourite change from the 2.4.3 patch:

-	strcpy(tsk->comm, "kupdate");
+	strcpy(tsk->comm, "kupdated");

noflushd 2.4 fixed this issue in the daemon itself, but I had forgotten
about the generic startup script. (Rpms and debs run their customized
versions.) Either the current version from CVS, or

ed /your/init.d/location/noflushd << EOF
%s/kupdate/kupdated/g
w
q
EOF

should get you going.

Regards,

Daniel.
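[Editor's note: the same substitution can be done with sed for those without ed at hand. This sketch works on a scratch file; point it at wherever your init script actually lives (Daniel deliberately left the path as a placeholder). Like the original %s command, it is not idempotent: a second run would turn "kupdated" into "kupdatedd".]

```shell
# Scratch stand-in for the noflushd init script; the KPROC line is a
# made-up example of a "kupdate" reference that needs renaming.
printf 'KPROC=kupdate\n' > noflushd.demo

# Equivalent of the ed script: rename every kupdate reference in place.
sed -i 's/kupdate/kupdated/g' noflushd.demo

cat noflushd.demo    # now reads: KPROC=kupdated
```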
Re: spindown
On Thu, Jun 21, 2001 at 06:07:01PM +0200, Jamie Lokier wrote: Pavel Machek wrote: Isn't this why noflushd exists or is this an evil thing that shouldn't ever be used and will eventually eat my disks for breakfast? It would eat your flash for breakfast. You know, flash memories have no spinning parts, so there's nothing to spin down. Btw Pavel, does noflushd work with 2.4.4? The noflushd version 2.4 I tried said it couldn't find some kernel process (kflushd? I don't remember) and that I should use bdflush. The manual says that's appropriate for older kernels, but not 2.4.4 surely. That's because of my favourite change from the 2.4.3 patch: - strcpy(tsk-comm, kupdate); + strcpy(tsk-comm, kupdated); noflushd 2.4 fixed this issue in the daemon itself, but I had forgotten about the generic startup skript. (Rpms and debs run their customized versions.) Either the current version from CVS, or ed /your/init.d/location/noflushd EOF %s/kupdate/kupdated/g w q EOF should get you going. Regards, Daniel. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Early flush (was: spindown)
On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote: Daniel Phillips writes: I'd like that too, but what about sync writes? As things stand now, there is no option but to spin the disk back up. To get around this we'd have to change the basic behavior of the block device and that's doable, but it's an entirely different proposition than the little patch above. I don't care as much about sync writes. They don't seem to happen very often on my boxes. syslog and some editors are the most common users of sync writes. vim, e.g., per default keeps fsync()ing its swapfile. Tweaking the configuration of these apps, this can be prevented fairly easy though. Changing sync semantics for this matter on the other hand seems pretty awkward to me. I'd expect an application calling fsync() to have good reason for having its data flushed to disk _now_, no matter what state the disk happens to be in. If it hasn't, fix the app, not the kernel. You know about this project no doubt: http://noflushd.sourceforge.net/ Only vaguely. It's huge. Over 2300 lines of C code and 560 lines in .h files! As you say, not really lightweight. There must be a better way. noflushd would benefit a lot from being able to set bdflush parameters per device or per disk. So I'm really eager to see what Daniel comes up with. Currently, we can only turn kupdate either on or off as a whole, which means that noflushd implements a crude replacement for the benefit of multi-disk setups. A lot of the cruft stems from there. Also, I suspect (without having looked at the code) that it doesn't handle memory pressure well. Things may get nasty when we run low on free pages. It doesn't handle memory pressure at all. It doesn't have to. noflushd only messes with kupdate{,d} but leaves bdflush (formerly known as kflushd) alone. If memory gets tight, bdflush starts writing out dirty buffers, which makes the disk spin up, and we're back to normal. Regards, Daniel. 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Early flush (was: spindown)
On Saturday 23 June 2001 01:25, Daniel Kobras wrote: On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote: Daniel Phillips writes: I'd like that too, but what about sync writes? As things stand now, there is no option but to spin the disk back up. To get around this we'd have to change the basic behavior of the block device and that's doable, but it's an entirely different proposition than the little patch above. I don't care as much about sync writes. They don't seem to happen very often on my boxes. syslog and some editors are the most common users of sync writes. vim, e.g., per default keeps fsync()ing its swapfile. Tweaking the configuration of these apps, this can be prevented fairly easy though. Changing sync semantics for this matter on the other hand seems pretty awkward to me. I'd expect an application calling fsync() to have good reason for having its data flushed to disk _now_, no matter what state the disk happens to be in. If it hasn't, fix the app, not the kernel. But apps shouldn't have to know about the special requirements of laptops. I've been playing a little with the idea of creating a special block device for laptops that goes between the vfs and the real block device, and adds the behaviour of being able to buffer writes in memory. In all respects it would seem to the vfs to be a disk. So far this is just a thought experiment. You know about this project no doubt: http://noflushd.sourceforge.net/ Only vaguely. It's huge. Over 2300 lines of C code and 560 lines in .h files! As you say, not really lightweight. There must be a better way. noflushd would benefit a lot from being able to set bdflush parameters per device or per disk. So I'm really eager to see what Daniel comes up with. Currently, we can only turn kupdate either on or off as a whole, which means that noflushd implements a crude replacement for the benefit of multi-disk setups. A lot of the cruft stems from there. 
Yes, another person to talk to about this is Jens Axboe, who has been doing some serious hacking on the block layer. I thought I'd get the early flush patch working well for one disk before generalizing to N ;-)

> > Also, I suspect (without having looked at the code) that it doesn't
> > handle memory pressure well. Things may get nasty when we run low on
> > free pages.
>
> It doesn't handle memory pressure at all. It doesn't have to. noflushd
> only messes with kupdate{,d} but leaves bdflush (formerly known as
> kflushd) alone. If memory gets tight, bdflush starts writing out dirty
> buffers, which makes the disk spin up, and we're back to normal.

Exactly. And in addition, when bdflush does wake up, I try to get kupdate out of the way as much as possible, though I've been following the traditional recipe and having it submit all buffers past a certain age. This is quite possibly a bad thing to do because it could starve the swapper. Ouch.

-- Daniel
Re: spindown
Pavel Machek wrote:
> > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > ever be used and will eventually eat my disks for breakfast?
>
> It would eat your flash for breakfast. You know, flash memories have
> no spinning parts, so there's nothing to spin down.

Btw Pavel, does noflushd work with 2.4.4? The noflushd version 2.4 I tried said it couldn't find some kernel process (kflushd? I don't remember) and that I should use bdflush. The manual says that's appropriate for older kernels, but not 2.4.4 surely.

-- Jamie
Re: [RFC] Early flush (was: spindown)
On Wednesday 20 June 2001 22:58, Tom Sightler wrote:
> Quoting Daniel Phillips <[EMAIL PROTECTED]>:
> > I originally intended to implement a sliding flush delay based on disk
> > load. This turned out to be a lot of work for a hard-to-discern
> > benefit. So the current approach has just two delays: .1 second and
> > whatever the bdflush delay is set to. If there is any non-flush disk
> > traffic the longer delay is used. This is crude but effective... I
> > think. I hope that somebody will run this through some benchmarks to
> > see if I lost any performance. According to my calculations, I did
> > not. I tested this mainly in UML, and also ran it briefly on my
> > laptop. The interactive feel of the change is immediately obvious,
> > and for me at least, a big improvement.
>
> Well, since a lot of this discussion seemed to spin off from my original
> posting last week about my particular issue with disk flushing I decided
> to try your patch with my simple test/problem that I experience on my
> laptop.
>
> One note, I ran your patch against 2.4.6-pre3 as that is what currently
> performs the best on my laptop. It seems to apply cleanly and compiled
> without problems.
>
> I used this kernel on my laptop all day for my normal workload, which
> consists of a Gnome 1.4 desktop, several Mozilla instances, several ssh
> sessions with remote X programs displayed, StarOffice, and VMware
> (running Windows 2000 Pro in 128MB). I also performed several compiles
> throughout the day. Overall the machine feels slightly more sluggish, I
> think due to the following two things:
>
> 1. When running a compile, or anything else that produces lots of small
> disk writes, you tend to get lots of little pauses for all the little
> writes to disk. These seem to be unnoticeable without the patch.

OK, this is because the early flush doesn't quit when load picks up again.
Measuring only the io backlog, as I do now, isn't adequate for telling the difference between load initiated by the flush itself and other load, such as a cpu-bound process proceeding to read another file, so that's why the flush doesn't stop flushing when other IO starts happening. This has to be fixed. In the meantime, you could try this simple tweak: just set the lower bound, currently 1/10th second, a little higher:

-	unsigned check_interval = HZ/10, ...
+	unsigned check_interval = HZ/5, ...

This may be enough to bridge the little pauses in the compiler's disk access pattern so the flush isn't triggered. (This is not by any means a nice solution.) If you set check_interval to HZ*5, you *should* get exactly the old behaviour; I'd be very interested to hear if you do. Also, could you do your compiles with 'time' so you can quantify the results?

> 2. Loading programs when writing activity is occurring (even light
> activity like during the compile) is noticeably slower, actually any
> reading from disk is.

Hmm, let me think why that may be. The loader doesn't actually read the program into memory, it just maps it and lets the pages fault in as they're called for. So if readahead isn't perfect (it isn't) the io backlog may drop to 0 briefly just as kflush decides to sample it, and it initiates a flush. This flush cleans the whole dirty list out, stealing bandwidth from the reads.

> I also ran my simple ftp test that produced the symptom I reported
> earlier. I transferred a 750MB file via FTP, and with your patch sure
> enough disk writing started almost immediately, but it still didn't seem
> to write enough data to disk to keep up with the transfer so at
> approximately the 200MB mark the old behavior still kicked in as it went
> into full flush mode, during the time network activity halted, just like
> before.
> The big difference with the patch and without is that the patched
> kernel never seems to balance out; without the patch, once the initial
> burst is done you get a nice stream of data from the network to disk
> with the disk staying moderately active. With the patch the disk varies
> from barely active to moderate to heavy and back, and during the heavy
> periods the network transfer always pauses (although very briefly).
>
> Just my observations, you asked for comments.

Yes, I have to refine this. The inner flush loop has to know how many io submissions are happening, from which it can subtract its own submissions and know somebody else is submitting IO, at which point it can fall back to the good old 5 second buffer age limit. False positives from kflush are handled as a fringe benefit, and flush_dirty_buffers won't do extra writeout. This is easy and cheap. I could get a lot fancier than this and calculate IO load averages, but I'd only do that after mining out the simple possibilities. I'll probably have something new for you to try tomorrow, if you're willing. By the way, I'm not addressing your fundamental problem; that's Rik's job.
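The submission-counting heuristic described above can be sketched as a toy model. This is purely illustrative: all names (`total_submissions`, `submit_io`, `keep_flushing`) are invented for the sketch and do not correspond to symbols in the actual patch.

```c
#include <assert.h>

/* Illustrative model of the proposed heuristic: every writer bumps a
 * global submission counter; the flush loop samples it, subtracts the
 * writes it submitted itself, and treats any remainder as foreign I/O,
 * at which point it backs off to the normal bdflush age limit.
 * All names here are invented, not taken from the actual patch. */

static unsigned long total_submissions;

static void submit_io(unsigned long nr)
{
	total_submissions += nr;	/* every writer bumps the counter */
}

/* Returns 1 if the early flush may continue (only our own writeout was
 * observed since the last sample), 0 if foreign I/O appeared. */
static int keep_flushing(unsigned long *last_seen, unsigned long own)
{
	unsigned long delta = total_submissions - *last_seen;

	*last_seen = total_submissions;
	return delta == own;
}
```

The point of the model is that the flush loop needs no per-device bookkeeping: one counter and one subtraction distinguish its own writeout from competing traffic.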
Re: [RFC] Early flush (was: spindown)
Quoting Daniel Phillips <[EMAIL PROTECTED]>:
> I originally intended to implement a sliding flush delay based on disk
> load. This turned out to be a lot of work for a hard-to-discern
> benefit. So the current approach has just two delays: .1 second and
> whatever the bdflush delay is set to. If there is any non-flush disk
> traffic the longer delay is used. This is crude but effective... I
> think. I hope that somebody will run this through some benchmarks to
> see if I lost any performance. According to my calculations, I did not.
> I tested this mainly in UML, and also ran it briefly on my laptop. The
> interactive feel of the change is immediately obvious, and for me at
> least, a big improvement.

Well, since a lot of this discussion seemed to spin off from my original posting last week about my particular issue with disk flushing I decided to try your patch with my simple test/problem that I experience on my laptop.

One note, I ran your patch against 2.4.6-pre3 as that is what currently performs the best on my laptop. It seems to apply cleanly and compiled without problems.

I used this kernel on my laptop all day for my normal workload, which consists of a Gnome 1.4 desktop, several Mozilla instances, several ssh sessions with remote X programs displayed, StarOffice, and VMware (running Windows 2000 Pro in 128MB). I also performed several compiles throughout the day. Overall the machine feels slightly more sluggish, I think due to the following two things:

1. When running a compile, or anything else that produces lots of small disk writes, you tend to get lots of little pauses for all the little writes to disk. These seem to be unnoticeable without the patch.

2. Loading programs when writing activity is occurring (even light activity like during the compile) is noticeably slower, actually any reading from disk is.

I also ran my simple ftp test that produced the symptom I reported earlier.
I transferred a 750MB file via FTP, and with your patch sure enough disk writing started almost immediately, but it still didn't seem to write enough data to disk to keep up with the transfer, so at approximately the 200MB mark the old behavior still kicked in as it went into full flush mode, during which network activity halted, just like before. The big difference with the patch and without is that the patched kernel never seems to balance out; without the patch, once the initial burst is done you get a nice stream of data from the network to disk with the disk staying moderately active. With the patch the disk varies from barely active to moderate to heavy and back, and during the heavy periods the network transfer always pauses (although very briefly).

Just my observations, you asked for comments.

Later, Tom
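The two-delay scheme Tom quotes above (0.1 second when only flush traffic was seen, the configured bdflush interval otherwise) reduces to a one-line decision. A hedged sketch with invented names, not the patch's actual code:

```c
#include <assert.h>

/* Sketch of the two-delay policy described in the patch posting:
 * with no non-flush disk traffic, wake after HZ/10 jiffies (0.1 s);
 * otherwise fall back to the configured bdflush interval.
 * Function and parameter names are invented for illustration. */
static unsigned int next_flush_delay(int other_io_seen,
				     unsigned int hz,
				     unsigned int bdflush_interval)
{
	return other_io_seen ? bdflush_interval : hz / 10;
}
```

Tom's observations suggest why such a hard two-point switch feels jerky under mixed load: the delay snaps between its two extremes instead of sliding, which matches the "barely active to heavy and back" disk pattern he reports.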
Re: spindown
On Wednesday 20 June 2001 19:32, Rik van Riel wrote:
> On Wed, 20 Jun 2001, Daniel Phillips wrote:
> > BTW, with nominal 100,000 erases you have to write 10 terabytes
> > to your 100 meg flash disk before you'll see it start to
> > degrade.
>
> That assumes you write out full blocks. If you flush after
> every byte written you'll hit the limit a lot sooner ;)

Yep, so if you are running on a Yopy, try not to sync after each byte.

> Btw, this is also a problem with your patch, when you write
> out buffers all the time your disk will spend more time seeking
> all over the place (moving the disk head away from where we are
> currently reading!) and you'll end up writing the same block
> multiple times ...

It doesn't work that way, it tacks the flush onto the trailing edge of a burst of disk activity, or it flushes out an isolated update, say an edit save, which would have required the same amount of disk activity, just a few seconds off in the future. Sometimes it does write a few extra sectors when disk activity is sporadic, but the impact on total throughput is small enough to be hard to measure reliably. Even so, there is some optimizing that could be done - the update could be interleaved a little better with the falling edge of a heavy traffic episode. This would require that the io rate be monitored instead of just the queue backlog. I'm interested in tackling that eventually - it has applications in other areas than just the early update.

-- Daniel
Re: spindown
On Wed, 20 Jun 2001, Daniel Phillips wrote:
> BTW, with nominal 100,000 erases you have to write 10 terabytes
> to your 100 meg flash disk before you'll see it start to
> degrade.

That assumes you write out full blocks. If you flush after every byte written you'll hit the limit a lot sooner ;)

Btw, this is also a problem with your patch, when you write out buffers all the time your disk will spend more time seeking all over the place (moving the disk head away from where we are currently reading!) and you'll end up writing the same block multiple times ...

regards,

Rik
--
Executive summary of a recent Microsoft press release: "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/
Re: spindown
On Tuesday 19 June 2001 12:46, Pavel Machek wrote:
> > > > Roger> It does if you are running on a laptop. Then you do not want
> > > > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > > > Roger> to start to write a few pages, stays idle for a while, goes to
> > > > Roger> sleep, a few more pages, ...
> > > >
> > > > That could be handled by a metric which says if the disk is spun
> > > > down, wait until there is more memory pressure before writing. But
> > > > if the disk is spinning, we don't care, you should start writing out
> > > > buffers at some low rate to keep the pressure from rising too
> > > > rapidly.
> > >
> > > Notice that write is not free (in terms of power) even if disk is
> > > spinning. Seeks (etc) also take some power. And think about
> > > flashcards. It certainly is cheaper than spinning disk up but still
> > > not free.
> >
> > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > ever be used and will eventually eat my disks for breakfast?
>
> It would eat your flash for breakfast. You know, flash memories have
> no spinning parts, so there's nothing to spin down.

Yes, this doesn't make sense for flash, and in fact, it doesn't make sense to have just one set of bdflush parameters for the whole system; it's really a property of the individual device. So the thing to do is for me to go kibitz on the io layer rewrite projects and figure out how to set up the intelligence per-queue, and have the queues per-device, at which point it's trivial to do the write^H^H^H^H^H right thing for each kind of device.

BTW, with nominal 100,000 erases you have to write 10 terabytes to your 100 meg flash disk before you'll see it start to degrade. These devices are set up to avoid continuous hammering on the same page, and to take failed pages out of the pool as soon as they fail to erase. Also, the 100,000 figure is nominal - the average number of erases you'll get per page is considerably higher.
The extra few sectors we see with the early flush patch are just not going to affect the life of your flash to a measurable degree.

-- Daniel
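The 10-terabyte endurance figure above is easy to check: 100,000 erase cycles across a 100 MB device, assuming ideal wear leveling and whole-block writes. A deliberately rough model (the function name is invented for this sketch):

```c
#include <assert.h>

/* Rough flash wear-out bound: total bytes writable before every block
 * has seen its nominal erase count, assuming perfect wear leveling and
 * whole-block writes. Real devices fall short of this ideal, but as
 * noted above the nominal cycle count is itself conservative. */
static unsigned long long wear_limit_bytes(unsigned long long device_bytes,
					   unsigned long long erase_cycles)
{
	return device_bytes * erase_cycles;
}
```

At, say, 100 MB of writeout per day, that bound works out to well over two hundred years, which is the sense in which a few extra flushed sectors are unmeasurable.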
Re: [RFC] Early flush (was: spindown)
Daniel Phillips writes:
> On Wednesday 20 June 2001 06:39, Richard Gooch wrote:
> > Starting I/O immediately if there is no load sounds nice. However,
> > what about the other case, when the disc is already spun down (and
> > hence there's no I/O load either)? I want the system to avoid doing
> > writes while the disc is spun down. I'm quite happy for the system to
> > accumulate dirtied pages/buffers, reclaiming clean pages as needed,
> > until it absolutely has to start writing out (or I call sync(2)).
>
> I'd like that too, but what about sync writes? As things stand now,
> there is no option but to spin the disk back up. To get around this
> we'd have to change the basic behavior of the block device and
> that's doable, but it's an entirely different proposition than the
> little patch above.

I don't care as much about sync writes. They don't seem to happen very often on my boxes.

> You know about this project no doubt:
>
> http://noflushd.sourceforge.net/

Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in .h files! As you say, not really lightweight. There must be a better way.

Also, I suspect (without having looked at the code) that it doesn't handle memory pressure well. Things may get nasty when we run low on free pages.

Regards, Richard

Permanent: [EMAIL PROTECTED] Current: [EMAIL PROTECTED]
Re: [RFC] Early flush (was: spindown)
On Wednesday 20 June 2001 06:39, Richard Gooch wrote:
> Daniel Phillips writes:
> > I never realized how much I didn't like the good old 5 second delay
> > between saving an edit and actually getting it written to disk until
> > it went away. Now the question is, did I lose any performance in
> > doing that. What I wrote in the previous email turned out to be
> > pretty accurate, so I'll just quote it
>
> Starting I/O immediately if there is no load sounds nice. However,
> what about the other case, when the disc is already spun down (and
> hence there's no I/O load either)? I want the system to avoid doing
> writes while the disc is spun down. I'm quite happy for the system to
> accumulate dirtied pages/buffers, reclaiming clean pages as needed,
> until it absolutely has to start writing out (or I call sync(2)).

I'd like that too, but what about sync writes? As things stand now, there is no option but to spin the disk back up. To get around this we'd have to change the basic behavior of the block device and that's doable, but it's an entirely different proposition than the little patch above.

You know about this project no doubt:

http://noflushd.sourceforge.net/

This is really complementary to what I did. Lightweight is not really a good way to describe it though, the tar is almost 10,000 lines long. There is probably a clever thing to do at the kernel level to shorten that up.

There's one thing I think I can help fix up while I'm working in here, this complaint: Reiserfs journaling bypasses the kernel's delayed write mechanisms and writes straight to disk. We need to address the reasons why such filesystems have to bypass kupdate. This touches on how sync and fsync work, updating supers, flushing the inode cache etc, but with Al Viro's superblock work merged now we could start thinking about it.

> Right now I hack that by setting bdflush parameters to 5 minutes. But
> that's not ideal either.

Yes, that still works with my patch.
The noflushd user space daemon works by turning off kupdate (set update time to 0).

-- Daniel
Re: spindown
Hi!

> > > Roger> It does if you are running on a laptop. Then you do not want
> > > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > > Roger> to start to write a few pages, stays idle for a while, goes to
> > > Roger> sleep, a few more pages, ...
> > >
> > > That could be handled by a metric which says if the disk is spun
> > > down, wait until there is more memory pressure before writing. But
> > > if the disk is spinning, we don't care, you should start writing out
> > > buffers at some low rate to keep the pressure from rising too
> > > rapidly.
> >
> > Notice that write is not free (in terms of power) even if disk is
> > spinning. Seeks (etc) also take some power. And think about
> > flashcards. It certainly is cheaper than spinning disk up but still
> > not free.
>
> Isn't this why noflushd exists or is this an evil thing that shouldn't
> ever be used and will eventually eat my disks for breakfast?

It would eat your flash for breakfast. You know, flash memories have no spinning parts, so there's nothing to spin down.

Pavel
--
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
Re: spindown
Hi! Roger It does if you are running on a laptop. Then you do not want Roger the pages go out all the time. Disk has gone too sleep, needs Roger to start to write a few pages, stays idle for a while, goes to Roger sleep, a few more pages, ... That could be handled by a metric which says if the disk is spun down, wait until there is more memory pressure before writing. But if the disk is spinning, we don't care, you should start writing out buffers at some low rate to keep the pressure from rising too rapidly. Notice that write is not free (in terms of power) even if disk is spinning. Seeks (etc) also take some power. And think about flashcards. It certainly is cheaper tha spinning disk up but still not free. Isn't this why noflushd exists or is this an evil thing that shouldn't ever be used and will eventually eat my disks for breakfast? It would eat your flash for breakfast. You know, flash memories have no spinning parts, so there's nothing to spin down. Pavel -- I'm [EMAIL PROTECTED] In my country we have almost anarchy and I don't care. Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Early flush (was: spindown)
On Wednesday 20 June 2001 06:39, Richard Gooch wrote: Daniel Phillips writes: I never realized how much I didn't like the good old 5 second delay between saving an edit and actually getting it written to disk until it went away. Now the question is, did I lose any performance in doing that. What I wrote in the previous email turned out to be pretty accurate, so I'll just quote it Starting I/O immediately if there is no load sounds nice. However, what about the other case, when the disc is already spun down (and hence there's no I/O load either)? I want the system to avoid doing writes while the disc is spun down. I'm quite happy for the system to accumulate dirtied pages/buffers, reclaiming clean pages as needed, until it absolutely has to start writing out (or I call sync(2)). I'd like that too, but what about sync writes? As things stand now, there is no option but to spin the disk back up. To get around this we'd have to change the basic behavior of the block device and that's doable, but it's an entirely different proposition than the little patch above. You know about this project no doubt: http://noflushd.sourceforge.net/ This is really complementary to what I did. Lightweight is not really a good way to describe it though, the tar is almost 10,000 lines long. There is probably a clever thing to do at the kernel level to shorten that up. There's one thing I think I can help fix up while I'm working in here, this complaint: Reiserfs journaling bypasses the kernel's delayed write mechanisms and writes straight to disk. We need to address the reasons why such filesystems have to bypass kupdate. This touches on how sync and fsync work, updating supers, flushing the inode cache etc, but with Al Viro's superblock work merged now we could start thinking about it. Right now I hack that by setting bdflush parameters to 5 minutes. But that's not ideal either. Yes, that still works with my patch. 
The noflushd user space daemon works by turning off kupdate (set update time to 0). -- Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Early flush (was: spindown)
Daniel Phillips writes: On Wednesday 20 June 2001 06:39, Richard Gooch wrote: Starting I/O immediately if there is no load sounds nice. However, what about the other case, when the disc is already spun down (and hence there's no I/O load either)? I want the system to avoid doing writes while the disc is spun down. I'm quite happy for the system to accumulate dirtied pages/buffers, reclaiming clean pages as needed, until it absolutely has to start writing out (or I call sync(2)). I'd like that too, but what about sync writes? As things stand now, there is no option but to spin the disk back up. To get around this we'd have to change the basic behavior of the block device and that's doable, but it's an entirely different proposition than the little patch above. I don't care as much about sync writes. They don't seem to happen very often on my boxes. You know about this project no doubt: http://noflushd.sourceforge.net/ Only vaguely. It's huge. Over 2300 lines of C code and 560 lines in .h files! As you say, not really lightweight. There must be a better way. Also, I suspect (without having looked at the code) that it doesn't handle memory pressure well. Things may get nasty when we run low on free pages. Regards, Richard Permanent: [EMAIL PROTECTED] Current: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spindown
On Tuesday 19 June 2001 12:46, Pavel Machek wrote: Roger It does if you are running on a laptop. Then you do not want Roger the pages go out all the time. Disk has gone too sleep, needs Roger to start to write a few pages, stays idle for a while, goes to Roger sleep, a few more pages, ... That could be handled by a metric which says if the disk is spun down, wait until there is more memory pressure before writing. But if the disk is spinning, we don't care, you should start writing out buffers at some low rate to keep the pressure from rising too rapidly. Notice that write is not free (in terms of power) even if disk is spinning. Seeks (etc) also take some power. And think about flashcards. It certainly is cheaper tha spinning disk up but still not free. Isn't this why noflushd exists or is this an evil thing that shouldn't ever be used and will eventually eat my disks for breakfast? It would eat your flash for breakfast. You know, flash memories have no spinning parts, so there's nothing to spin down. Yes, this doesn't make sense for flash, and in fact, it doesn't make sense to have just one set of bdflush parameters for the whole system, it's really a property of the individual device. So the thing to do is for me to go kibitz on the io layer rewrite projects and figure out how to set up the intelligence per-queue, and have the queues per-device, at which point it's trivial to do the write^H^H^H^H^H right thing for each kind of device. BTW, with nominal 100,000 erases you have to write 10 terabytes to your 100 meg flash disk before you'll see it start to degrade. These devices are set up to avoid continuous hammering on the same same page, and to take failed pages out of the pool as soon as they fail to erase. Also, the 100,000 figure is nominal - the average number of erases you'll get per page is considerably higher. The extra few sectors we see with the early flush patch are just not going to affect the life of your flash to a measurable degree. 
-- Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spindown
On Wed, 20 Jun 2001, Daniel Phillips wrote: BTW, with nominal 100,000 erases you have to write 10 terabytes to your 100 meg flash disk before you'll see it start to degrade. That assumes you write out full blocks. If you flush after every byte written you'll hit the limit a lot sooner ;) Btw, this is also a problem with your patch, when you write out buffers all the time your disk will spend more time seeking all over the place (moving the disk head away from where we are currently reading!) and you'll end up writing the same block multiple times ... regards, Rik -- Executive summary of a recent Microsoft press release: we are concerned about the GNU General Public License (GPL) http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spindown
On Wednesday 20 June 2001 19:32, Rik van Riel wrote: On Wed, 20 Jun 2001, Daniel Phillips wrote: BTW, with nominal 100,000 erases you have to write 10 terabytes to your 100 meg flash disk before you'll see it start to degrade. That assumes you write out full blocks. If you flush after every byte written you'll hit the limit a lot sooner ;) Yep, so if you are running on a Yopy, try not to sync after each byte. Btw, this is also a problem with your patch, when you write out buffers all the time your disk will spend more time seeking all over the place (moving the disk head away from where we are currently reading!) and you'll end up writing the same block multiple times ... It doesn't work that way, it tacks the flush onto the trailing edge of a burst of disk activity, or it flushes out an isolated update, say an edit save, which would have required the same amount of disk activity, just a few seconds off in the future. Sometimes it does write a few extra sectors when disk activity is sporadic, but the impact on total throughput is small enough to be hard to measure reliably. Even so, there is some optimizing that could be done - the update could be interleaved a little better with the falling edge of a heavy traffic episode. This would require that the io rate be monitored instead of just the queue backlog. I'mi nterested in tackling that eventually - it has applications in other areas than just the early update. -- Daniel - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] Early flush (was: spindown)
Quoting Daniel Phillips [EMAIL PROTECTED]: I originally intended to implement a sliding flush delay based on disk load. This turned out to be a lot of work for a hard-to-discern benefit. So the current approach has just two delays: .1 second and whatever the bdflush delay is set to. If there is any non-flush disk traffic the longer delay is used. This is crude but effective... I think. I hope that somebody will run this through some benchmarks to see if I lost any performance. According to my calculations, I did not. I tested this mainly in UML, and also ran it briefly on my laptop. The interactive feel of the change is immediately obvious, and for me at least, a big improvement. Well, since a lot of this discussion seemed to spin off from my original posting last week about my particular issue with disk flushing I decided to try your patch with my simple test/problem that I experience on my laptop. One note, I ran your patch against 2.4.6-pre3 as that is what currently performs the best on my laptop. It seems to apply cleanly and compiled without problems. I used this kernel on my laptop kernel on my laptop all day for my normal workload which consist ofa Gnome 1.4 desktop, several Mozilla instances, several ssh sessions with remote X programs displayed, StarOffice, VMware (running Windows 2000 Pro in 128MB). I also preformed several compiles throughout the day. Overall the machine feels slightly more sluggish I think due to the following two things: 1. When running a compile, or anything else that produces lots of small disk writes, you tend to get lots of little pauses for all the little writes to disk. These seem to be unnoticable without the patch. 2. Loading programs when writing activity is occuring (even light activity like during the compile) is noticable slower, actually any reading from disk is. I also ran my simple ftp test that produced the symptom I reported earlier. 
I transferred a 750MB file via FTP, and with your patch sure enough disk writing started almost immediately, but it still didn't seem to write enough data to disk to keep up with the transfer, so at approximately the 200MB mark the old behavior still kicked in as it went into full flush mode; during that time network activity halted, just like before.

The big difference with the patch and without is that the patched kernel never seems to balance out. Without the patch, once the initial burst is done you get a nice stream of data from the network to disk with the disk staying moderately active. With the patch the disk varies from barely active to moderate to heavy and back, and during the heavy periods the network transfer always pauses (although very briefly).

Just my observations, you asked for comments.

Later,
Tom
Re: [RFC] Early flush (was: spindown)
On Wednesday 20 June 2001 22:58, Tom Sightler wrote:
> Quoting Daniel Phillips [EMAIL PROTECTED]:
> > I originally intended to implement a sliding flush delay based on
> > disk load. This turned out to be a lot of work for a hard-to-discern
> > benefit. So the current approach has just two delays: .1 second and
> > whatever the bdflush delay is set to. If there is any non-flush disk
> > traffic the longer delay is used. This is crude but effective... I
> > think. I hope that somebody will run this through some benchmarks to
> > see if I lost any performance. According to my calculations, I did
> > not. I tested this mainly in UML, and also ran it briefly on my
> > laptop. The interactive feel of the change is immediately obvious,
> > and for me at least, a big improvement.
>
> Well, since a lot of this discussion seemed to spin off from my
> original posting last week about my particular issue with disk
> flushing, I decided to try your patch with my simple test/problem that
> I experience on my laptop. One note: I ran your patch against
> 2.4.6-pre3 as that is what currently performs the best on my laptop.
> It seems to apply cleanly and compiled without problems.
>
> I used this kernel on my laptop all day for my normal workload, which
> consists of a Gnome 1.4 desktop, several Mozilla instances, several
> ssh sessions with remote X programs displayed, StarOffice, and VMware
> (running Windows 2000 Pro in 128MB). I also performed several
> compiles throughout the day. Overall the machine feels slightly more
> sluggish, I think due to the following two things:
>
> 1. When running a compile, or anything else that produces lots of
> small disk writes, you tend to get lots of little pauses for all the
> little writes to disk. These seem to be unnoticeable without the
> patch.

OK, this is because the early flush doesn't quit when load picks up again.
Measuring only the IO backlog, as I do now, isn't adequate for telling the difference between load initiated by the flush itself and other load, such as a cpu-bound process proceeding to read another file, so that's why the flush doesn't stop flushing when other IO starts happening. This has to be fixed. In the mean time, you could try this simple tweak: just set the lower bound, currently 1/10th second, a little higher:

-	unsigned check_interval = HZ/10, ...
+	unsigned check_interval = HZ/5, ...

This may be enough to bridge the little pauses in the compiler's disk access pattern so the flush isn't triggered. (This is not by any means a nice solution.) If you set check_interval to HZ*5, you *should* get exactly the old behaviour; I'd be very interested to hear if you do. Also, could you do your compiles with 'time' so you can quantify the results?

> 2. Loading programs when writing activity is occurring (even light
> activity like during the compile) is noticeably slower; actually any
> reading from disk is.

Hmm, let me think why that may be. The loader doesn't actually read the program into memory, it just maps it and lets the pages fault in as they're called for. So if readahead isn't perfect (it isn't) the IO backlog may drop to 0 briefly just as kflush decides to sample it, and it initiates a flush. This flush cleans the whole dirty list out, stealing bandwidth from the reads.

> I also ran my simple ftp test that produced the symptom I reported
> earlier. I transferred a 750MB file via FTP, and with your patch sure
> enough disk writing started almost immediately, but it still didn't
> seem to write enough data to disk to keep up with the transfer, so at
> approximately the 200MB mark the old behavior still kicked in as it
> went into full flush mode; during that time network activity halted,
> just like before.
> The big difference with the patch and without is that the patched
> kernel never seems to balance out. Without the patch, once the
> initial burst is done you get a nice stream of data from the network
> to disk with the disk staying moderately active. With the patch the
> disk varies from barely active to moderate to heavy and back, and
> during the heavy periods the network transfer always pauses (although
> very briefly).
>
> Just my observations, you asked for comments.

Yes, I have to refine this. The inner flush loop has to know how many IO submissions are happening, from which it can subtract its own submissions and know somebody else is submitting IO, at which point it can fall back to the good old 5 second buffer age limit. False positives from kflush are handled as a fringe benefit, and flush_dirty_buffers won't do extra writeout. This is easy and cheap. I could get a lot fancier than this and calculate IO load averages, but I'd only do that after mining out the simple possibilities. I'll probably have something new for you to try tomorrow, if you're willing.

By the way, I'm not addressing your fundamental problem, that's Rik's job ;-). In fact, I define success in this effort by the extent to which I
Re: [RFC] Early flush (was: spindown)
Daniel Phillips writes:
> I never realized how much I didn't like the good old 5 second delay
> between saving an edit and actually getting it written to disk until
> it went away. Now the question is, did I lose any performance in
> doing that. What I wrote in the previous email turned out to be
> pretty accurate, so I'll just quote it

Starting I/O immediately if there is no load sounds nice. However, what about the other case, when the disc is already spun down (and hence there's no I/O load either)? I want the system to avoid doing writes while the disc is spun down. I'm quite happy for the system to accumulate dirtied pages/buffers, reclaiming clean pages as needed, until it absolutely has to start writing out (or I call sync(2)). Right now I hack that by setting the bdflush parameters to 5 minutes, but that's not ideal either.

Regards,
Richard
Permanent: [EMAIL PROTECTED]
Current: [EMAIL PROTECTED]
[RFC] Early flush (was: spindown)
I never realized how much I didn't like the good old 5 second delay between saving an edit and actually getting it written to disk until it went away. Now the question is, did I lose any performance in doing that. What I wrote in the previous email turned out to be pretty accurate, so I'll just quote it to keep it together with the patch:

> I'm now in the midst of hatching a patch. [1] The first thing I had to
> do is go explore the block driver code, yum yum. I found that it
> already computes the statistic I'm interested in, namely
> queued_sectors, which is used to pace the IO on block devices. It's a
> little crude - we really want this to be per-queue and have one queue
> per "spindle" - but even in its current form it's workable.
>
> The idea is that when queued_sectors drops below some threshold we
> have 'unused disk bandwidth' so it would be nice to do something
> useful with it:
>
> 1) Do an early 'sync_old_buffers'
> 2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds
> earlier, and (2) should improve the system's latency in response to
> load surges. There are drawbacks too, which have been pointed out to
> me privately, but they tend to be pretty minor, for example: on a
> flash disk you'd do a few extra writes and wear it out
> ever-so-slightly sooner. All the same, such special devices can be
> dealt with easily once we progress a little further in improving the
> kernel's 'per spindle' intelligence.
>
> Now how to implement this. I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we
> also need to have the early flush for any write to a disk file while
> the disks are lightly loaded, i.e., there is no convenient
> loaded-to-unloaded transition to trigger it.
> The missing trigger could be inserted into __mark_dirty, but that
> would penalize the loaded state (a little, but that's still too much).
> Furthermore, it's probably desirable to maintain a small delay between
> the dirty and the flush. So what I'll try first is just running
> kflush's timer faster, and make its reschedule period vary with disk
> load, i.e., when there are fewer queued_sectors, kflush looks at the
> dirty buffer list more often.
>
> The rest of what has to happen in kflush is pretty straightforward.
> It just uses queued_sectors to determine how far to walk the dirty
> buffer list, which is maintained in time-since-dirtied order. If
> queued_sectors is below some threshold the entire list is flushed.
> Note that we want to change the sense of b_flushtime to b_timedirtied.
> It's more efficient to do it this way anyway.
>
> I haven't done anything about preemptive pageout yet, but similar
> ideas apply.
>
> [1] This is an experiment, do not worry, it will not show up in your
> tree any time soon. IOW, constructive criticism appreciated, flames
> copied to /dev/null.

I originally intended to implement a sliding flush delay based on disk load. This turned out to be a lot of work for a hard-to-discern benefit. So the current approach has just two delays: .1 second and whatever the bdflush delay is set to. If there is any non-flush disk traffic the longer delay is used. This is crude but effective... I think. I hope that somebody will run this through some benchmarks to see if I lost any performance. According to my calculations, I did not. I tested this mainly in UML, and also ran it briefly on my laptop. The interactive feel of the change is immediately obvious, and for me at least, a big improvement.

The patch is against 2.4.5. To apply:

  cd /your/source/tree
  patch <this/patch -p0

--- ../uml.2.4.5.clean/fs/buffer.c	Sat May 26 02:57:46 2001
+++ ./fs/buffer.c	Wed Jun 20 01:55:21 2001
@@ -1076,7 +1076,7 @@
 static __inline__ void __mark_dirty(struct buffer_head *bh)
 {
-	bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;
+	bh->b_dirtytime = jiffies;
 	refile_buffer(bh);
 }
@@ -2524,12 +2524,20 @@
    as all dirty buffers lives _only_ in the DIRTY lru list.
    As we never browse the LOCKED and CLEAN lru lists they are infact
    completly useless. */
-static int flush_dirty_buffers(int check_flushtime)
+static int flush_dirty_buffers (int update)
 {
 	struct buffer_head * bh, *next;
 	int flushed = 0, i;
+	unsigned queued = atomic_read (&queued_sectors);
+	unsigned long youngest_to_update;
 
- restart:
+#ifdef DEBUG
+	if (update)
+		printk("kupdate %lu %i\n", jiffies, queued);
+#endif
+
+restart:
+	youngest_to_update = jiffies - (queued? bdf_prm.b_un.age_buffer: 0);
 	spin_lock(&lru_list_lock);
 	bh = lru_list[BUF_DIRTY];
 	if (!bh)
@@ -2544,19 +2552,14 @@
 		if (buffer_locked(bh))
 			continue;
 
-		if (check_flushtime) {
-			/* The dirty lru list is chronologically ordered so
-			   if the current bh is not yet timed out,
-			   then also all the following bhs
-			   will be too young. */
-			if
Re: spindown
On Fri, Jun 15, 2001 at 03:23:07PM +, Pavel Machek wrote:
> > Roger> It does if you are running on a laptop. Then you do not want
> > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > Roger> to start to write a few pages, stays idle for a while, goes to
> > Roger> sleep, a few more pages, ...
> > That could be handled by a metric which says if the disk is spun
> > down, wait until there is more memory pressure before writing. But
> > if the disk is spinning, we don't care, you should start writing out
> > buffers at some low rate to keep the pressure from rising too
> > rapidly.
> Notice that write is not free (in terms of power) even if disk is
> spinning. Seeks (etc) also take some power. And think about
> flashcards. It certainly is cheaper than spinning disk up but still
> not free.

Isn't this why noflushd exists, or is this an evil thing that shouldn't ever be used and will eventually eat my disks for breakfast?

Description: allow idle hard disks to spin down
 Noflushd is a daemon that spins down disks that have not been read from
 after a certain amount of time, and then prevents disk writes from
 spinning them back up. It's targeted for laptops but can be used on
 any computer with IDE disks. The effect is that the hard disk actually
 spins down, saving you battery power, and shutting off the loudest
 component of most computers.

http://noflushd.sourceforge.net

Simon.
--
[ "CATS. CATS ARE NICE." - Death, "Sourcery" ]
Black Cat Networks. http://www.blackcatnetworks.co.uk/
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Mon, 18 Jun 2001, Daniel Phillips wrote:
> On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below? (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
>
> It's too subtle for me ;-) (Not shy about saying that because this
> part of the kernel is probably subtle for everyone.)

No subtlety (hammer), it just draws a line that doesn't move around in unpredictable ways. For example, nr_free_buffer_pages() adds in free pages to the line it draws. You may have a large volume of dirty data, decide it would be prudent to flush, then someone frees a nice chunk of memory... (send morse code messages via malloc/free? :)

Anyway it's crude, but it seems to have gotten results from the slrn load. I received logs for ac15 and ac15+patch. ac15 took 265 seconds to do the job whereas with the patch it took 227 seconds. I haven't pored over the logs yet, but there seems to be throughput to be had. If anyone is interested in the logs, they're much smaller than expected:

-rw-r--r-- 1 mikeg users 11993 Jun 19 05:58 ac15_mike.log
-rw-r--r-- 1 mikeg users 13015 Jun 19 05:58 ac15_org.log

> The question I'm tackling right now is how the system behaves when
> the load goes away, or doesn't get heavy. Your patch doesn't measure
> the load directly - it may attempt to predict it as a function of
> memory pressure, but that's a little more loosely coupled than what I
> had in mind.

It doesn't attempt to predict, it reacts to the existing situation.

> I'm now in the midst of hatching a patch. [1] The first thing I had
> to do is go explore the block driver code, yum yum. I found that it
> already computes the statistic I'm interested in, namely
> queued_sectors, which is used to pace the IO on block devices. It's a
> little crude - we really want this to be per-queue and have one queue
> per "spindle" - but even in its current form it's workable.
> The idea is that when queued_sectors drops below some threshold we
> have 'unused disk bandwidth' so it would be nice to do something
> useful with it:

(that's much more subtle/clever:)

> 1) Do an early 'sync_old_buffers'
> 2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds
> earlier, and (2) should improve the system's latency in response to
> load surges. There are drawbacks too, which have been pointed out to
> me privately, but they tend to be pretty minor, for example: on a
> flash disk you'd do a few extra writes and wear it out
> ever-so-slightly sooner. All the same, such special devices can be
> dealt easily once we progress a little further in improving the
> kernel's 'per spindle' intelligence.
>
> Now how to implement this. I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we
> also need to have the early flush for any write to a disk file while
> the disks are lightly loaded, i.e., there is no convenient
> loaded-to-unloaded transition to trigger it. The missing trigger
> could be inserted into __mark_dirty, but that would penalize the
> loaded state (a little, but that's still too much). Furthermore, it's
> probably desirable to maintain a small delay between the dirty and
> the flush. So what I'll try first is just running kflush's timer
> faster, and make its reschedule period vary with disk load, i.e.,
> when there are fewer queued_sectors, kflush looks at the dirty buffer
> list more often.
>
> The rest of what has to happen in kflush is pretty straightforward.
> It just uses queued_sectors to determine how far to walk the dirty
> buffer list, which is maintained in time-since-dirtied order. If
> queued_sectors is below some threshold the entire list is flushed.
> Note that we want to change the sense of b_flushtime to
> b_timedirtied. It's more efficient to do it this way anyway.
> I haven't done anything about preemptive pageout yet, but similar
> ideas apply.

Preemptive pageout could be simply: walk the dirty list looking for swap pages and write them out. With the fair aging change that's already in, there will be some. If the fair aging change to background aging works out, there will be more (don't want too many more though ;). The only problem I can see with that simple method is that once written, the page lands on the inactive_clean list. That list is short and does get consumed.. might turn a fake pageout into a real one unintentionally.

> [1] This is an experiment, do not worry, it will not show up in your
> tree any time soon. IOW, constructive criticism appreciated, flames
> copied to /dev/null.

Look forward to seeing it.

-Mike
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> It _juuust_ so happens that I was tinkering... what do you think of
> something like the below? (and boy do I ever wonder what a certain
> box doing slrn stuff thinks of it.. hint hint;)

It's too subtle for me ;-) (Not shy about saying that because this part of the kernel is probably subtle for everyone.)

The question I'm tackling right now is how the system behaves when the load goes away, or doesn't get heavy. Your patch doesn't measure the load directly - it may attempt to predict it as a function of memory pressure, but that's a little more loosely coupled than what I had in mind.

I'm now in the midst of hatching a patch. [1] The first thing I had to do is go explore the block driver code, yum yum. I found that it already computes the statistic I'm interested in, namely queued_sectors, which is used to pace the IO on block devices. It's a little crude - we really want this to be per-queue and have one queue per "spindle" - but even in its current form it's workable.

The idea is that when queued_sectors drops below some threshold we have 'unused disk bandwidth' so it would be nice to do something useful with it:

1) Do an early 'sync_old_buffers'
2) Do some preemptive pageout

The benefit of (1) is that it lets disks go idle a few seconds earlier, and (2) should improve the system's latency in response to load surges. There are drawbacks too, which have been pointed out to me privately, but they tend to be pretty minor, for example: on a flash disk you'd do a few extra writes and wear it out ever-so-slightly sooner. All the same, such special devices can be dealt with easily once we progress a little further in improving the kernel's 'per spindle' intelligence.

Now how to implement this.
I considered putting a (newly minted) wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded transition, and that's fine except it doesn't do the whole job: we also need to have the early flush for any write to a disk file while the disks are lightly loaded, i.e., there is no convenient loaded-to-unloaded transition to trigger it. The missing trigger could be inserted into __mark_dirty, but that would penalize the loaded state (a little, but that's still too much). Furthermore, it's probably desirable to maintain a small delay between the dirty and the flush. So what I'll try first is just running kflush's timer faster, and make its reschedule period vary with disk load, i.e., when there are fewer queued_sectors, kflush looks at the dirty buffer list more often.

The rest of what has to happen in kflush is pretty straightforward. It just uses queued_sectors to determine how far to walk the dirty buffer list, which is maintained in time-since-dirtied order. If queued_sectors is below some threshold the entire list is flushed. Note that we want to change the sense of b_flushtime to b_timedirtied. It's more efficient to do it this way anyway.

I haven't done anything about preemptive pageout yet, but similar ideas apply.

[1] This is an experiment, do not worry, it will not show up in your tree any time soon. IOW, constructive criticism appreciated, flames copied to /dev/null.

--
Daniel
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Mon, 18 Jun 2001, Daniel Phillips wrote:

> On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below? (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
>
> It's too subtle for me ;-) (Not shy about saying that because this
> part of the kernel is probably subtle for everyone.)

No subtlety (hammer), it just draws a line that doesn't move around in unpredictable ways. For example, nr_free_buffer_pages() adds in free pages to the line it draws. You may have a large volume of dirty data, decide it would be prudent to flush, then someone frees a nice chunk of memory... (send morse code messages via malloc/free? :)

Anyway it's crude, but it seems to have gotten results from the slrn load. I received logs for ac15 and ac15+patch. ac15 took 265 seconds to do the job whereas with the patch it took 227 seconds. I haven't pored over the logs yet, but there seems to be throughput to be had. If anyone is interested in the logs, they're much smaller than expected:

-rw-r--r--   1 mikeg    users   11993 Jun 19 05:58 ac15_mike.log
-rw-r--r--   1 mikeg    users   13015 Jun 19 05:58 ac15_org.log

> The question I'm tackling right now is how the system behaves when
> the load goes away, or doesn't get heavy. Your patch doesn't measure
> the load directly - it may attempt to predict it as a function of
> memory pressure, but that's a little more loosely coupled than what I
> had in mind.

It doesn't attempt to predict, it reacts to the existing situation.

> I'm now in the midst of hatching a patch. [1] The first thing I had
> to do is go explore the block driver code, yum yum. I found that it
> already computes the statistic I'm interested in, namely
> queued_sectors, which is used to pace the IO on block devices. It's a
> little crude - we really want this to be per-queue and have one queue
> per spindle - but even in its current form it's workable.
> The idea is that when queued_sectors drops below some threshold we
> have 'unused disk bandwidth' so it would be nice to do something
> useful with it:

(that's much more subtle/clever :)

> 1) Do an early 'sync_old_buffers'
> 2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds
> earlier, and (2) should improve the system's latency in response to
> load surges. There are drawbacks too, which have been pointed out to
> me privately, but they tend to be pretty minor, for example: on a
> flash disk you'd do a few extra writes and wear it out
> ever-so-slightly sooner. All the same, such special devices can be
> dealt with easily once we progress a little further in improving the
> kernel's 'per spindle' intelligence.
>
> Now how to implement this. I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we
> also need to have the early flush for any write to a disk file while
> the disks are lightly loaded, i.e., there is no convenient
> loaded-to-unloaded transition to trigger it. The missing trigger
> could be inserted into __mark_dirty, but that would penalize the
> loaded state (a little, but that's still too much). Furthermore, it's
> probably desirable to maintain a small delay between the dirty and
> the flush.
>
> So what I'll try first is just running kflush's timer faster, and
> make its reschedule period vary with disk load, i.e., when there are
> fewer queued_sectors, kflush looks at the dirty buffer list more
> often.
>
> The rest of what has to happen in kflush is pretty straightforward.
> It just uses queued_sectors to determine how far to walk the dirty
> buffer list, which is maintained in time-since-dirtied order. If
> queued_sectors is below some threshold the entire list is flushed.
> Note that we want to change the sense of b_flushtime to
> b_timedirtied. It's more efficient to do it this way anyway.
> I haven't done anything about preemptive pageout yet, but similar
> ideas apply.

Preemptive pageout could be simply walk the dirty list looking for swap pages and writing them out. With the fair aging change that's already in, there will be some. If the fair aging change to background aging works out, there will be more (don't want too many more though;). The only problem I can see with that simple method is that once written, the page lands on the inactive_clean list. That list is short and does get consumed.. might turn fake pageout into a real one unintentionally.

> [1] This is an experiment, do not worry, it will not show up in your
> tree any time soon. IOW, constructive criticism appreciated, flames
> copied to /dev/null.

Look forward to seeing it.

	-Mike
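Mike's "simple method" — walk the dirty list and write out just the swap pages — might look roughly like the sketch below. The struct and flag are invented stand-ins for illustration, not real 2.4 buffer_head fields.

```c
#include <assert.h>

/* Invented model of "walk the dirty list looking for swap pages and
 * writing them out".  A real implementation would queue IO on each
 * matching entry; here we just count what would be written. */

struct dirty_entry {
    int is_swap; /* stand-in for "this page belongs to swap" */
};

/* Count how many of the first n dirty entries would be written as
 * preemptive pageout (i.e. the swap-backed ones only). */
static int preemptive_pageout(const struct dirty_entry *list, int n)
{
    int written = 0, i;

    for (i = 0; i < n; i++)
        if (list[i].is_swap)
            written++; /* queue the writeout here in a real kernel */
    return written;
}
```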
Re: (lkml)Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sun, 17 Jun 2001 [EMAIL PROTECTED] wrote:

> On Sun, Jun 17, 2001 at 12:05:10PM +0200, Mike Galbraith wrote:
> >
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below? (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
>
> I'm sorry to say this box doesn't really think any different of it.

Well darn. But..

> Everything that's in the cache before running slrn on a big group seems
> to stay there the whole time, making my active slrn-process use swap.

It should not be the same data if page aging is working at all. Better stated, if it _is_ the same data and page aging is working, it's needed data, so the movement of momentarily unused rss to disk might have been the right thing to do.. it just has to buy you the use of the pages moved for long enough to offset the (large) cost of dropping those pages. I saw it adding rss to the aging pool, but not terribly much IO. The fact that it is using page replacement is only interesting in regard to total system efficiency.

> I applied the patch to 2.4.5-ac15, and this was the result:

Thanks for running it. Can you (afford to) send me procinfo or such (what I would like to see is job efficiency) information? Full logs are fine, as long as they're not truly huge :) Anything under a meg is gratefully accepted (privately 'course).

I think (am pretty darn sure) the aging fairness change is what is affecting you, but it's not possible to see whether this change is affecting you in a negative or positive way without timing data.

	-Mike

misc: wrt this ~patch, it only allows you to move the rolldown to sync disk behavior some.. moving write delay back some (knob) is _supposed_ to get that IO load (at least) a modest throughput increase. The flushto thing was basically directed toward laptop use, but ~seems to exhibit better IO clustering/bandwidth sharing as well. (less old/new request merging?.. distance?)
Re: (lkml)Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sun, Jun 17, 2001 at 12:05:10PM +0200, Mike Galbraith wrote:
>
> It _juuust_ so happens that I was tinkering... what do you think of
> something like the below? (and boy do I ever wonder what a certain
> box doing slrn stuff thinks of it.. hint hint;)
>

I'm sorry to say this box doesn't really think any different of it. Everything that's in the cache before running slrn on a big group seems to stay there the whole time, making my active slrn-process use swap.

I applied the patch to 2.4.5-ac15, and this was the result:

 procs          memory            swap        io     system        cpu
 r b w swpd free buff cache   si so   bi bo   in cs   us sy id
 0 1 0 11216 2548 183560 264172 1 4 184 343 123 119 2 6 92
 0 0 0 11212 2620 183444 264184 0 0 472 12799 1 2 97
 0 0 0 11212 1604 183444 264740 0 0 378 0 130 101 2 1 98
 0 1 0 11212 1588 184300 263116 0 0 552 1080 277 360 3 14 83
 2 0 2 11212 1692 174052 270536 0 0 1860 0 596 976 9 50 40
 2 0 2 11212 1588 166732 274816 0 0 1868 5426 643 1050 8 44 48
 0 1 0 11212 1588 163276 276888 0 0 1714 1816 580 972 9 17 74
 0 1 0 11212 1848 166280 273688 0 0 514 3952 301 355 3 40 57
 1 0 0 11212 1592 164232 273872 0 0 1824 3532 632 1083 11 25 64
 2 0 2 11212 1980 167304 268792 0 0 1678 0 550 881 8 51 41
 0 1 2 11212 1588 163908 271356 0 0 1344 4896 508 753 7 26 67
 1 0 0 11212 1588 160896 272756 0 0 1642 1301 574 929 9 22 69
 0 1 0 11212 1592 164936 268632 0 0 756 3594 370 467 6 43 51
 2 0 3 11212 1596 164380 266552 0 0 1904 2392 604 1017 10 52 37
 1 0 0 11212 1592 164752 265844 0 0 1784 2382 623 1000 10 22 69
 0 1 0 11212 1592 168528 262256 0 0 810 4176 364 523 5 43 52
 0 1 1 11212 1992 169324 259504 0 0 1686 3068 578 999 11 42 47
 0 1 0 11212 1588 170696 256332 0 0 1568 1080 532 894 10 20 70
 1 0 0 11212 1592 174876 253036 0 0 598 3600 315 420 4 41 55
 0 1 1 11212 2316 171592 253892 0 0 1816 3286 616 1073 7 29 64
 0 1 0 11212 1588 170380 253968 0 0 1638 840 540 910 13 29 58
 0 1 1 11212 2896 168840 253740 0 0 752 4120 342 458 4 45 51
 0 1 0 11216 2012 166392 255560 0 0 1352 2458 549 895 8 14 77
 2 0 1 11216 1588 170744 250164 0 0 1504 1260 503 791 7 48 45
 0 1 1 11224 1588 170704 249948 0 0 874 4106 516 655 6 10 84
 0 1 0 11228 1588 170148 248988 0 0 1442 0 466 772 8 20 73
 1 0 0 11228 1592 171784 247456 0 0 860 3598 362 495 7 44 48
 0 1 0 11228 1588 171864 246212 0 0 1390 3176 510 840 9 41 50
 0 1 2 11232 1992 170344 245832 0 0 1676 1808 539 898 10 45 45
 1 0 1 10508 1632 168204 246780 0 946 1508 2804 599 920 9 20 71
 0 1 0 9496 2020 168904 244880 0 0 936 3620 417 603 5 35 60
 1 0 0 9604 2516 164096 247536 0 0 1700 2214 563 1085 11 33 56
 0 1 0 16196 1820 162112 255492 0 2 1384 1596 497 1106 8 53 38
 1 0 0 19240 3000 158052 260608 0 0 400 3824 373 388 2 14 84
 1 1 1 28756 4508 146032 278104 0 0 1688 2140 612 1502 7 60 33
 2 0 0 39432 29100 105668 300912 0 18 2108 1178 645 1825 12 52 36
 1 0 0 40668 13024 108568 311748 0 0 1674 4992 623 1017 9 12 79
 0 1 0 45324 3484 105072 326432 0 0 1876 3624 619 1090 13 24 63
 1 0 0 53648 1564 102740 337688 0 18 950 3646 404 857 5 31 63
 2 0 0 53672 1604 103356 335680 0 2962 1436 5864 565 976 10 43 47
 1 0 1 54380 1920 103516 334320 0 1086 1826 1626 590 1072 13 45 42
 0 1 1 54600 6532 99568 333860 0 1006 242 5948 277 2680 2 39 59
 0 1 0 54596 1944 103744 331932 0 0 1854 3644 627 1054 11 16 73
 1 0 0 54592 1924 102876 331100 0 950 1956 2612 621 1173 11 41 48
 1 0 0 54592 1592 103576 329568 0 0 1548 4860 605 1106 11 36 53
 0 1 1 54592 1588 102908 328320 0 452 1808 2522 583 1049 11 51 38
 0 1 1 54592 1588 101916 327076 0 866 1816 1260 589 1046 11 49 40
 0 1 0 54592 2076 99568 327776 0 414 992 5728 459 1314 7 25 67
 0 1 0 54592 1588 103928 323824 0 0 968 3646 403 747 5 33 61
 1 0 0 54592 2632 100108 325136 0 402 1856 2468 622 1369 13 44 42
 0 1 0 54592 1588 101872 322600 0 392 1056 2834 461 802 6 35 60
 1 0 1 55644 1724 102108 322404 0 380 1448 2682 501 1032 9 50 41
 1 1 1 57388 1588 103068 322056 0 0 1384 1396 471 780 8 37 56
 0 1 1 58500 2048 102024 323020 0 368 876 3932 504
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Saturday 16 June 2001 23:54, Rik van Riel wrote:

> On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > > Does the patch below do anything good for your laptop? ;)
> >
> > I'll wait for the next one ;-)
>
> OK, here's one which isn't reversed and should work ;))
>
> --- fs/buffer.c.orig	Sat Jun 16 18:05:29 2001
> +++ fs/buffer.c	Sat Jun 16 18:05:15 2001
> @@ -2550,7 +2550,8 @@
>  		   if the current bh is not yet timed out,
>  		   then also all the following bhs
>  		   will be too young. */
> -			if (time_before(jiffies, bh->b_flushtime))
> +			if (++flushed > bdf_prm.b_un.ndirty &&
> +					time_before(jiffies, bh->b_flushtime))
>  				goto out_unlock;
>  		} else {
>  			if (++flushed > bdf_prm.b_un.ndirty)

No, it doesn't, because some way of knowing the disk load is required and there's nothing like that here. There are two components to what I was talking about:

  1) Early flush when load is light
  2) Preemptive cleaning when load is light

Both are supposed to be triggered by other disk activity, swapout or file writes, and are supposed to be triggered when the disk activity eases up.

--
Daniel
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sat, 16 Jun 2001, Daniel Phillips wrote:

> On Saturday 16 June 2001 23:06, Rik van Riel wrote:
> > On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > > As a side note, the good old multisecond delay before bdflush
> > > kicks in doesn't really make a lot of sense - when bandwidth is
> > > available the filesystem-initiated writeouts should happen right
> > > away.
> >
> > ... thus spinning up the disk ?
>
> Nope, the disk is already spinning, some other writeouts just finished.
>
> > How about just making sure we write out a bigger bunch
> > of dirty pages whenever one buffer gets too old ?
>
> It's simpler than that. It's basically just: disk traffic low? good,
> write out all the dirty buffers. Not quite as crude as that, but
> nearly.
>
> > Does the patch below do anything good for your laptop? ;)
>
> I'll wait for the next one ;-)

Greetings! (well, not next one, but one anyway)

It _juuust_ so happens that I was tinkering... what do you think of something like the below? (and boy do I ever wonder what a certain box doing slrn stuff thinks of it.. hint hint;)

	-Mike

Doing Bonnie in big fragmented 1k bs partition on the worst spot on the disk. Bad benchmark, bad conditions.. but interesting results.
2.4.6.pre3 before
            ---Sequential Output   ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
         MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
        500  9609 36.0 10569 14.3  3322  6.4  9509 47.6 10597 13.8 101.7  1.4

2.4.6.pre3 after (using flushto behavior as in defaults)
            ---Sequential Output   ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
         MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
        500  8293 30.2 11834 29.4  5072  9.5  8879 44.1 10597 13.6 100.4  0.9

2.4.6.pre3 after (flushto = ndirty)
            ---Sequential Output   ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
         MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
        500 10286 38.4 10715 14.4  3267  6.1  9605 47.6 10596 13.4 102.7  1.6

--- fs/buffer.c.org	Fri Jun 15 06:48:17 2001
+++ fs/buffer.c	Sun Jun 17 09:14:17 2001
@@ -118,20 +118,21 @@
 		wake-cycle */
 	int nrefill;     /* Number of clean buffers to try to obtain
 		each time we call refill */
-	int dummy1;      /* unused */
+	int nflushto;    /* Level to flush down to once bdflush starts */
 	int interval;    /* jiffies delay between kupdate flushes */
 	int age_buffer;  /* Time for normal buffer to age before we flush it */
 	int nfract_sync; /* Percentage of buffer cache dirty to
 		activate bdflush synchronously */
-	int dummy2;      /* unused */
+	int nmonitor;    /* Size (%physpages) at which bdflush should
+		begin monitoring the buffercache */
 	int dummy3;      /* unused */
 } b_un;
 unsigned int data[N_PARAM];
-} bdf_prm = {{30, 64, 64, 256, 5*HZ, 30*HZ, 60, 0, 0}};
+} bdf_prm = {{60, 64, 64, 50, 5*HZ, 30*HZ, 85, 15, 0}};

 /* These are the min and max parameter values that we will allow to be assigned */
-int bdflush_min[N_PARAM] = { 0, 10,5, 25, 0, 1*HZ, 0, 0, 0};
-int bdflush_max[N_PARAM] = {100,5, 2, 2,600*HZ, 6000*HZ, 100, 0, 0};
+int bdflush_min[N_PARAM] = {0, 10, 5, 0, 0, 1*HZ, 0, 0, 0};
+int bdflush_max[N_PARAM] = {100,5, 2, 100,600*HZ, 6000*HZ, 100, 100, 0};

 /*
  * Rewrote the wait-routines to use the "new" wait-queue functionality,
@@ -763,12 +764,8 @@
 		balance_dirty(NODEV);
 	if (free_shortage())
 		page_launder(GFP_BUFFER, 0);
-	if (!grow_buffers(size)) {
+	if (!grow_buffers(size))
 		wakeup_bdflush(1);
-		current->policy |= SCHED_YIELD;
-		__set_current_state(TASK_RUNNING);
-		schedule();
-	}
 }

 void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
@@ -1042,25 +1039,43 @@
    1 -> sync flush (wait for I/O completion) */
 int balance_dirty_state(kdev_t dev)
 {
-	unsigned long dirty, tot, hard_dirty_limit, soft_dirty_limit;
-
-	dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
-	tot = nr_free_buffer_pages();
+	unsigned long dirty, cache, buffers = 0;
+	int i;

-	dirty *= 100;
-	soft_dirty_limit = tot * bdf_prm.b_un.nfract;
-	hard_dirty_limit = tot * bdf_prm.b_un.nfract_sync;
-
-	/* First, check for the "real" dirty limit. */
-	if (dirty > soft_dirty_limit) {
-		if (dirty > hard_dirty_limit)
+	for (i = 0; i < NR_LIST; i++)
+		buffers += size_buffers_type[i];
+	buffers >>= PAGE_SHIFT;
+	if (buffers * 100 < num_physpages *
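For what it's worth, the nmonitor gate that Mike's patch adds to balance_dirty_state() — ignore the buffer cache until it holds a given percentage of physical pages — boils down to a comparison like the following userspace model. Names and values are illustrative stand-ins, not the patched kernel code.

```c
#include <assert.h>

/* Model of the gate added to balance_dirty_state(): bdflush leaves the
 * buffer cache alone until it reaches at least `nmonitor` percent of
 * physical pages.  Invented signature, for illustration only. */

/* Returns 1 if bdflush should start watching dirty levels. */
static int over_monitor_threshold(unsigned long buffer_pages,
                                  unsigned long num_physpages,
                                  unsigned int nmonitor /* percent */)
{
    /* integer arithmetic, same shape as the patch's comparison */
    return buffer_pages * 100 >= num_physpages * nmonitor;
}
```

With the patch's default of nmonitor = 15, a cache below 15% of RAM would be ignored entirely rather than flushed.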
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sat, 16 Jun 2001, Daniel Phillips wrote:

> > Does the patch below do anything good for your laptop? ;)
>
> I'll wait for the next one ;-)

OK, here's one which isn't reversed and should work ;))

--- fs/buffer.c.orig	Sat Jun 16 18:05:29 2001
+++ fs/buffer.c	Sat Jun 16 18:05:15 2001
@@ -2550,7 +2550,8 @@
 		   if the current bh is not yet timed out,
 		   then also all the following bhs
 		   will be too young. */
-			if (time_before(jiffies, bh->b_flushtime))
+			if (++flushed > bdf_prm.b_un.ndirty &&
+					time_before(jiffies, bh->b_flushtime))
 				goto out_unlock;
 		} else {
 			if (++flushed > bdf_prm.b_un.ndirty)

cheers,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Saturday 16 June 2001 23:06, Rik van Riel wrote:

> On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > As a side note, the good old multisecond delay before bdflush kicks
> > in doesn't really make a lot of sense - when bandwidth is available
> > the filesystem-initiated writeouts should happen right away.
>
> ... thus spinning up the disk ?

Nope, the disk is already spinning, some other writeouts just finished.

> How about just making sure we write out a bigger bunch
> of dirty pages whenever one buffer gets too old ?

It's simpler than that. It's basically just: disk traffic low? good, write out all the dirty buffers. Not quite as crude as that, but nearly.

> Does the patch below do anything good for your laptop? ;)

I'll wait for the next one ;-)

--
Daniel
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sat, 16 Jun 2001, Rik van Riel wrote:

Oops, I did something stupid and the patch is reversed ;)

> --- buffer.c.orig	Sat Jun 16 18:05:15 2001
> +++ buffer.c	Sat Jun 16 18:05:29 2001
> @@ -2550,8 +2550,7 @@
>  		   if the current bh is not yet timed out,
>  		   then also all the following bhs
>  		   will be too young. */
> -			if (++flushed > bdf_prm.b_un.ndirty &&
> -					time_before(jiffies, bh->b_flushtime))
> +			if (time_before(jiffies, bh->b_flushtime))
>  				goto out_unlock;
>  		} else {
>  			if (++flushed > bdf_prm.b_un.ndirty)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sat, 16 Jun 2001, Daniel Phillips wrote:

> In other words, any episode of pageouts is followed immediately by a
> short episode of preemptive cleaning.

linux/mm/vmscan.c::page_launder(), around line 666:

	/* Let bdflush take care of the rest. */
	wakeup_bdflush(0);

> The definition of 'for a while' and 'plenty of disk bandwidth' can be
> tuned, but I don't think either is particularly critical.

Can be tuned a bit, indeed.

> As a side note, the good old multisecond delay before bdflush kicks in
> doesn't really make a lot of sense - when bandwidth is available the
> filesystem-initiated writeouts should happen right away.

... thus spinning up the disk ?

How about just making sure we write out a bigger bunch of dirty pages whenever one buffer gets too old ?

Does the patch below do anything good for your laptop? ;)

regards,

Rik
--

--- buffer.c.orig	Sat Jun 16 18:05:15 2001
+++ buffer.c	Sat Jun 16 18:05:29 2001
@@ -2550,8 +2550,7 @@
 		   if the current bh is not yet timed out,
 		   then also all the following bhs
 		   will be too young. */
-			if (++flushed > bdf_prm.b_un.ndirty &&
-					time_before(jiffies, bh->b_flushtime))
+			if (time_before(jiffies, bh->b_flushtime))
 				goto out_unlock;
 		} else {
 			if (++flushed > bdf_prm.b_un.ndirty)
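The effect of the one-line change Rik proposes (in its non-reversed form) can be modeled in isolation: once bdflush starts a walk, at least ndirty buffers go out before the age check is allowed to stop the scan early. A toy version of that loop, with invented test data rather than kernel structures:

```c
#include <assert.h>

/* Toy model of the bdflush walk: `flushtime` is a time-ordered dirty
 * list (oldest first); `now` plays the role of jiffies.  Returns how
 * many buffers get written before the scan stops.  With the patched
 * condition, the "still too young" early exit only fires after at
 * least `ndirty` buffers have already been pushed out. */
static int flush_count(const long *flushtime, int n, long now, int ndirty)
{
    int flushed = 0, i;

    for (i = 0; i < n; i++) {
        /* patched condition: stop early only once we wrote >= ndirty */
        if (flushed + 1 > ndirty && now < flushtime[i])
            break;
        flushed++;
    }
    return flushed;
}
```

Setting ndirty to 0 recovers the old behavior, where the first still-young buffer ends the walk immediately.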
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Friday 15 June 2001 17:23, Pavel Machek wrote: > Hi! > > > Roger> It does if you are running on a laptop. Then you do not want > > Roger> the pages go out all the time. Disk has gone to sleep, needs > > Roger> to start to write a few pages, stays idle for a while, goes to > > Roger> sleep, a few more pages, ... > > > > That could be handled by a metric which says if the disk is spun down, > > wait until there is more memory pressure before writing. But if the > > disk is spinning, we don't care, you should start writing out buffers > > at some low rate to keep the pressure from rising too rapidly. > > Notice that write is not free (in terms of power) even if disk is spinning. > Seeks (etc) also take some power. And think about flashcards. It certainly > is cheaper than spinning disk up but still not free. > > Also note that kernel does not [currently] know that disks went spindown. There's an easy answer that should work well on both servers and laptops, that goes something like this: when memory pressure has been brought to 0, if there is plenty of disk bandwidth available, continue writeout for a while and clean some extra pages. In other words, any episode of pageouts is followed immediately by a short episode of preemptive cleaning. This gives both the preemptive cleaning we want in order to respond to the next surge, and lets the laptop disk spin down. The definition of 'for a while' and 'plenty of disk bandwidth' can be tuned, but I don't think either is particularly critical. As a side note, the good old multisecond delay before bdflush kicks in doesn't really make a lot of sense - when bandwidth is available the filesystem-initiated writeouts should happen right away. It's not necessary or desirable to write out more dirty pages after the machine has been idle for a while, if only because the longer it's idle the less the 'surge protection' matters in terms of average throughput. 
-- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
Hi! > Roger> It does if you are running on a laptop. Then you do not want > Roger> the pages go out all the time. Disk has gone to sleep, needs > Roger> to start to write a few pages, stays idle for a while, goes to > Roger> sleep, a few more pages, ... > > That could be handled by a metric which says if the disk is spun down, > wait until there is more memory pressure before writing. But if the > disk is spinning, we don't care, you should start writing out buffers > at some low rate to keep the pressure from rising too rapidly. Notice that write is not free (in terms of power) even if disk is spinning. Seeks (etc) also take some power. And think about flashcards. It certainly is cheaper than spinning disk up but still not free. Also note that kernel does not [currently] know that disks went spindown. Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Saturday 16 June 2001 23:06, Rik van Riel wrote: > On Sat, 16 Jun 2001, Daniel Phillips wrote: > > As a side note, the good old multisecond delay before bdflush kicks in > > doesn't really make a lot of sense - when bandwidth is available the > > filesystem-initiated writeouts should happen right away. > > ... thus spinning up the disk ? Nope, the disk is already spinning, some other writeouts just finished. > How about just making sure we write out a bigger bunch of dirty pages > whenever one buffer gets too old ? It's simpler than that. It's basically just: disk traffic low? good, write out all the dirty buffers. Not quite as crude as that, but nearly. > Does the patch below do anything good for your laptop? ;) I'll wait for the next one ;-) -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]
On Sat, 16 Jun 2001, Daniel Phillips wrote: > > Does the patch below do anything good for your laptop? ;) > I'll wait for the next one ;-) OK, here's one which isn't reversed and should work ;)) --- fs/buffer.c.orig Sat Jun 16 18:05:29 2001 +++ fs/buffer.c Sat Jun 16 18:05:15 2001 @@ -2550,7 +2550,8 @@ if the current bh is not yet timed out, then also all the following bhs will be too young. */ - if (time_before(jiffies, bh->b_flushtime)) + if (++flushed > bdf_prm.b_un.ndirty && + time_before(jiffies, bh->b_flushtime)) goto out_unlock; } else { if (++flushed > bdf_prm.b_un.ndirty) cheers, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Notebook disk spindown
Hi! > >Thanks for this patch. But why hasn't it been included into > >the kernel earlier? Wouldn't be a combination of yours and my > > It's basically included into 2.4.x. > > >patch be the proper way? As far as I understand you switch > > Your patch is sure fine. BTW, 2.4.x have an high limit of 10 minutes (as > opposed to 2.2.x that have an high limit of 1 minute). I'd suggest to > clean the patch to only increase the high limit value (one liner). Thanks. 10 minutes still seems a little low to me. What is the high limit good for, anyway? Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Notebook disk spindown
Hi! > o developers, > > this is a short description of a particular wish of notebook > users. Since kernel 2.2.11 the buffer flushing daemon is no longer > a user space program but part of the kernel (in fs/buffer.c). > > Before this kernel release it was the bdflush-program which > could be called with certain command line parameters in order > to control flushing of file system buffers. In particular in > combination with the hdparm-program it could be used to > spin down the hard disk (e.g. of laptops) if it was not accessed. > > I know that by writing to /proc/sys/vm/bdflush relevant kernel > parameters may be modified. But there are certain limits compiled > into every kernel, which have the consequence that > > *a silent hard disk is no longer feasible since kernel > 2.2.11*. Well, with the noflushd daemon, it works for me. > I have modified the constants used in fs/buffer.c to allow for > bigger time intervals between forced buffer flushings. The "patch" > may be found at http://www.hmi.de/people/brunne/Spindown . Can you mail me the patch? [I do not have a way to access the web easily.] > Shouldn't Linux support hard disk spindown during periods of > inactivity? Is the tiny patch worth being included into standard > kernels? Noflushd is a little hacky. If your patch is really tiny, post the actual patch to l-k for discussion. [Can you gracefully handle the case of a few active and a few inactive disks?] Pavel -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Notebook disk spindown
On Tue, 12 Sep 2000, Jamie Lokier wrote: > Dave Zarzycki wrote: > > Personally speaking, I always thought it would be nice if the kernel > > flushed dirty buffers right before a disk spins down. It seems silly to me > > that a disk can spin down with writes pending. > > Absolutely. That allows more time spun down too. Pavel Machek sent me a patch for noflushd to do exactly this. Need not be a kernel issue either. Regards, Daniel. -- GNU/Linux Audio Mechanics - http://www.glame.de Cutting Edge Office - http://www.c10a02.de GPG Key ID 89BF7E2B - http://www.keyserver.net - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Notebook disk spindown
Hi! > > On Sat, 9 Sep 2000 [EMAIL PROTECTED] wrote: > > > Would it be possible to detect when the disk spins up, and do the flush then? > > Yes if you had a continuous polling of power status wrt standby. > > I think the following flushing policy would work almost as well, while > remaining generic: > > - if there's a read that is not handled from the buffer cache, flush > (write) all dirty buffers > - if we need to flush (write) one dirty buffer, flush all the others too > > This wouldn't catch cases like an explicit spin-up without data I/O, > but I don't think this is much of a problem in real life. noflushd works for me. It monitors "read/write" counters in /proc/stat, and if it detects activity, it syncs(). If it detects an idle period, it syncs() and then spins the disk down. Pavel -- I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Notebook disk spindown
Dave Zarzycki wrote: > Personally speaking, I always thought it would be nice if the kernel > flushed dirty buffers right before a disk spins down. It seems silly to me > that a disk can spin down with writes pending. Absolutely. That allows more time spun down too. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/