Re: Disk spindown on rmmod sd_mod

2007-10-26 Thread Tejun Heo
Jan Engelhardt wrote:
> I am using 2.6.23-rc9 with pata_sis. `modprobe -r sd_mod`, which I ran 
> from initramfs, caused all my disks to spindown - sd even told me so.
> 
> I recall there has been talk a while back about whether to spin down 
> disks on shutdown or not, but I do not think it touched the removal of 
> sd_mod, did it? So either way, can someone fill me in why the spindown 
> is done?

The problem is that it's difficult to tell why a disk is going down from
sd_shutdown(), so it issues STOP unless system state is SYSTEM_RESTART.
Maybe we need to issue STOP only for SYSTEM_HALT, SYSTEM_POWER_OFF and
SYSTEM_SUSPEND_DISK.
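
For illustration, a minimal compilable model of that policy (the enum
mirrors the system_state names used above; note that during a plain
`modprobe -r` the state is SYSTEM_RUNNING, which is why the old
STOP-unless-SYSTEM_RESTART rule also hit module removal). A sketch only,
not the actual sd.c code:

#include <stdio.h>

enum system_states {
    SYSTEM_RUNNING,
    SYSTEM_HALT,
    SYSTEM_POWER_OFF,
    SYSTEM_RESTART,
    SYSTEM_SUSPEND_DISK,
};

/* Spin the disk down only when power is actually going away. */
static int should_issue_stop(enum system_states state)
{
    switch (state) {
    case SYSTEM_HALT:
    case SYSTEM_POWER_OFF:
    case SYSTEM_SUSPEND_DISK:
        return 1;
    default:
        return 0;   /* reboot, module unload, ...: keep spinning */
    }
}

int main(void)
{
    printf("halt: %d, restart: %d, rmmod (running): %d\n",
           should_issue_stop(SYSTEM_HALT),
           should_issue_stop(SYSTEM_RESTART),
           should_issue_stop(SYSTEM_RUNNING));
    return 0;
}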

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Spindown error on shutdown

2007-10-03 Thread Renato S. Yamane

Hi, I use 2.6.22.9 with CFS-v22.

When I shut down my laptop I see an error (the last message on shutdown, 
after "will be halt now"), but I can't read it because it goes by very 
fast (the laptop powers off automatically).

I see something about "Spindown error on ata-piix".

I tried to find it in /var/log (messages, kern) but don't see anything.

.config is attached.

Regards,
Renato
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22.9-cfs-v22
# Sun Sep 30 15:16:21 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=14
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
CONFIG_VM86=y
CONFIG_TOSHIBA=m
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_EFI_VARS is not set
# CONFIG_DELL_RBU is not set

Disk spindown on rmmod sd_mod

2007-10-02 Thread Jan Engelhardt
Hi,



I am using 2.6.23-rc9 with pata_sis. `modprobe -r sd_mod`, which I ran 
from initramfs, caused all my disks to spindown - sd even told me so.

I recall there has been talk a while back about whether to spin down 
disks on shutdown or not, but I do not think it touched the removal of 
sd_mod, did it? So either way, can someone fill me in why the spindown 
is done?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [stable] libata spindown patches for 2.6.21-stable

2007-06-14 Thread Greg KH
On Thu, Jun 14, 2007 at 01:48:46PM -0400, Daniel Drake wrote:
>  Greg KH wrote:
> > I think it looks way too big.
> 
>  Agreed (otherwise I would have submitted the patches already).
> 
> > If there are smaller patches, it might be a bit more reasonable.
> 
>  It may be possible to get rid of the couple of unrelated ones (sd printing, 
>  SCSI constants). These were required for the real patches to be able to 
>  build, but it would probably be easy to modify the real patches to build 
>  against kernels without those otherwise unrelated patches.
> 
> > Are there reported bugs that this patchset fixes?
> 
>  Yes, but they are not regressions - libata has never done this right until 
>  now.
> 
>  Here are a few:
>  https://bugs.gentoo.org/show_bug.cgi?id=174373
>  http://bugzilla.kernel.org/show_bug.cgi?id=7674
>  http://bugzilla.kernel.org/show_bug.cgi?id=7838
>  https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/67810

Ok, if people want to post some smaller patches to the stable team,
we'll be glad to consider them.

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [stable] libata spindown patches for 2.6.21-stable

2007-06-14 Thread Daniel Drake

Greg KH wrote:

> I think it looks way too big.


Agreed (otherwise I would have submitted the patches already).


> If there are smaller patches, it might be a bit more reasonable.


It may be possible to get rid of the couple of unrelated ones (sd 
printing, SCSI constants). These were required for the real patches to 
be able to build, but it would probably be easy to modify the real 
patches to build against kernels without those otherwise unrelated patches.



> Are there reported bugs that this patchset fixes?


Yes, but they are not regressions - libata has never done this right 
until now.


Here are a few:
https://bugs.gentoo.org/show_bug.cgi?id=174373
http://bugzilla.kernel.org/show_bug.cgi?id=7674
http://bugzilla.kernel.org/show_bug.cgi?id=7838
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/67810

Daniel

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [stable] libata spindown patches for 2.6.21-stable

2007-06-14 Thread Henrique de Moraes Holschuh
On Thu, 14 Jun 2007, Greg KH wrote:
> Are there reported bugs that this patchset fixes?

Yes, at least one I opened and which got a CODE_FIX when Tejun prepared the
first version of the patch.

http://bugzilla.kernel.org/show_bug.cgi?id=7838

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [stable] libata spindown patches for 2.6.21-stable

2007-06-14 Thread Greg KH
added Daniel to CC:

On Thu, Jun 14, 2007 at 12:01:14PM -0400, Chuck Ebbert wrote:
> Should we put these patches in 2.6.21-stable?
> 
> Gentoo developers did a full backport:
> 
> http://marc.info/?l=linux-ide&m=118047865916766&w=2

I think it looks way too big.

If there are smaller patches, it might be a bit more reasonable.

Are there reported bugs that this patchset fixes?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[stable] libata spindown patches for 2.6.21-stable

2007-06-14 Thread Chuck Ebbert
Should we put these patches in 2.6.21-stable?

Gentoo developers did a full backport:

http://marc.info/?l=linux-ide&m=118047865916766&w=2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22 libata spindown

2007-06-01 Thread Henrique de Moraes Holschuh
On Fri, 01 Jun 2007, Jeff Garzik wrote:
> IIRC, Debian was the one OS that really did need a shutdown utility 
> update, as the message says :)

Actually, editing /etc/init.d/halt is enough.  Find the hddown="-h" and
change it to hddown="".
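
For reference, the change would look roughly like this in /etc/init.d/halt
(exact script contents vary by Debian release, so treat this as a sketch):

  hddown="-h"   # before: halt(8) is asked to spin the disks down itself
  hddown=""     # after: leave spindown to the kernel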

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22 libata spindown

2007-06-01 Thread Tuncer Ayaz

On 6/1/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Tuncer Ayaz wrote:
> > I'm still seeing the libata warning that disks were not spun down
> > properly on the following two setups and am wondering whether I need
> > a new shutdown binary or the changeset mentioned below is not meant
> > to fix what I'm triggering by halt'ing.
> >
> > If it's not a bug I will try to update my shutdown utility and if
> > that does not work I promise not to bother lkml about a problem
> > caused by my userland. If it is a bug I hope it will be of interest
> > for 2.6.22 bug tracking.
> >
> > Setup 1:
> > SATA 1 Disks
> > AMD64 3200+
> > nVidia nForce 3 250 (Ultra?)
> > Debian i386 Unstable
> >
> > Setup 2:
> > SATA 2 disks
> > Core 2 Duo E6600
> > Intel 975X
> > Debian x86_64 Unstable
> >
> > Just to be clear what warning I'm talking about:
> > DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
> > For more info, visit http://linux-ata.org/shutdown.html
>
> IIRC, Debian was the one OS that really did need a shutdown utility
> update, as the message says :)


Thanks for the confirmation.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.22 libata spindown

2007-06-01 Thread Jeff Garzik

Tuncer Ayaz wrote:

> I'm still seeing the libata warning that disks were not spun down
> properly on the following two setups and am wondering whether I need
> a new shutdown binary or the changeset mentioned below is not meant
> to fix what I'm triggering by halt'ing.
>
> If it's not a bug I will try to update my shutdown utility and if
> that does not work I promise not to bother lkml about a problem
> caused by my userland. If it is a bug I hope it will be of interest
> for 2.6.22 bug tracking.
>
> Setup 1:
> SATA 1 Disks
> AMD64 3200+
> nVidia nForce 3 250 (Ultra?)
> Debian i386 Unstable
>
> Setup 2:
> SATA 2 disks
> Core 2 Duo E6600
> Intel 975X
> Debian x86_64 Unstable
>
> Just to be clear what warning I'm talking about:
> DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
> For more info, visit http://linux-ata.org/shutdown.html



IIRC, Debian was the one OS that really did need a shutdown utility 
update, as the message says :)


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.22 libata spindown

2007-06-01 Thread Tuncer Ayaz

I'm still seeing the libata warning that disks were not spun down
properly on the following two setups and am wondering whether I need
a new shutdown binary or the changeset mentioned below is not meant
to fix what I'm triggering by halt'ing.

If it's not a bug I will try to update my shutdown utility and if
that does not work I promise not to bother lkml about a problem
caused by my userland. If it is a bug I hope it will be of interest
for 2.6.22 bug tracking.

Setup 1:
SATA 1 Disks
AMD64 3200+
nVidia nForce 3 250 (Ultra?)
Debian i386 Unstable

Setup 2:
SATA 2 disks
Core 2 Duo E6600
Intel 975X
Debian x86_64 Unstable

Just to be clear what warning I'm talking about:
DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
For more info, visit http://linux-ata.org/shutdown.html

The following is from the reply I got from Michal Piotrowski while
I was trying to find out what happened to the regression report:

MICHAL>>
I guess you meant this

Subject: libata crash on halt
References : http://marc.info/?l=linux-ide&m=117899827710565&w=2
Submitter  : Andrew Morton <[EMAIL PROTECTED]>
Caused-By  : Tejun Heo <[EMAIL PROTECTED]>
   commit 920a4b1038e442700a1cfac77ea7e20bd615a2c3
Status : problem is being debugged

This bug was fixed by

commit da071b42f73dabbd0daf7ea4c3ff157d53b00648
Author: Tejun Heo <[EMAIL PROTECTED]>
Date:   Mon May 14 17:26:18 2007 +0200

  libata: fix shutdown warning message printing

  Unlocking ap->lock and ssleeping don't work because SCSI commands can
  be issued from completion path without context.  Reimplement delayed
  completion by allowing translation functions to override
  qc->scsidone(), storing the original completion function to
  scmd->scsi_done() and overriding qc->scsidone() with a function which
  schedules delayed invocation of scmd->scsi_done().

  This isn't pretty at all but all the ugly parts are thankfully
  contained in the stop translation path where the compat feature is
  implemented.

  Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
  Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
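
The trick described in that commit message, reduced to a compilable
userspace model (struct and function names here are invented for
illustration; this is not the libata code):

#include <stdio.h>

struct cmd {
    void (*done)(struct cmd *);       /* completion callback */
    void (*saved_done)(struct cmd *); /* the original, stashed away */
    int deferred;
};

static void real_done(struct cmd *c)
{
    printf("command completed\n");
}

static void deferring_done(struct cmd *c)
{
    /* can't sleep here (no process context): just note the request */
    c->deferred = 1;
}

int main(void)
{
    struct cmd c = { .done = real_done };

    /* override: stash the original completion, substitute a deferring one */
    c.saved_done = c.done;
    c.done = deferring_done;

    c.done(&c);              /* completion fires in a context that can't wait */

    if (c.deferred)
        c.saved_done(&c);    /* later, at a safe point, really complete */
    return 0;
}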




[  715.196000] ata3.00: DISK MIGHT NOT BE SPUN DOWN PROPERLY. UPDATE SHUTDOWN UTILITY
[  715.196000] ata3.00: For more info, visit http://linux-ata.org/shutdown.html

 
If you think about this, please send a bug report. IMHO it's ABI breakage.
<<MICHAL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: spindown

2001-06-27 Thread Troy Benjegerdes

On Thu, Jun 21, 2001 at 06:07:01PM +0200, Jamie Lokier wrote:
> Pavel Machek wrote:
> > > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > > ever be used and will eventually eat my disks for breakfast?
> > 
> > It would eat your flash for breakfast. You know, flash memories have
> > no spinning parts, so there's nothing to spin down.
> 
> Btw Pavel, does noflushd work with 2.4.4?  The noflushd version 2.4 I
> tried said it couldn't find some kernel process (kflushd?  I don't
> remember) and that I should use bdflush.  The manual says that's
> appropriate for older kernels, but not 2.4.4 surely.

Yes, noflushd works with 2.4.x. I'm running it on an ibook with 
debian-unstable.

And as a word of warning: while running noflushd, make sure you 'sync' a 
few times after an 'apt-get dist-upgrade' that upgrades damn near 
everything before doing something that crashes the kernel. This WILL eat 
your ext2fs for breakfast.

-- 
Troy Benjegerdes | master of mispeeling | 'da hozer' |  [EMAIL PROTECTED]
-"If this message isn't misspelled, I didn't write it" -- Me -
"Why do musicians compose symphonies and poets write poems? They do it
because life wouldn't have any meaning for them if they didn't. That's 
why I draw cartoons. It's my life." -- Charles Shulz
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-25 Thread Pavel Machek

Hi!

> > You know about this project no doubt:
> > 
> >http://noflushd.sourceforge.net/
> 
> Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> .h files! As you say, not really lightweight. There must be a better
> way. Also, I suspect (without having looked at the code) that it
> doesn't handle memory pressure well. Things may get nasty when we run
> low on free pages.

Noflushd *is* lightweight. It is complicated because it has to know
about different kernel versions etc. It is "easy stuff". If you add
kernel support, it will only *add* lines to noflushd.
Pavel
-- 
The best software in life is free (not shareware)!  Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-25 Thread Pavel Machek

Hi!

> > > > I'd like that too, but what about sync writes?  As things stand now,
> > > > there is no option but to spin the disk back up.  To get around this
> > > > we'd have to change the basic behavior of the block device and
> > > > that's doable, but it's an entirely different proposition than the
> > > > little patch above.
> > >
> > > I don't care as much about sync writes. They don't seem to happen very
> > > often on my boxes.
> >
> > syslog and some editors are the most common users of sync writes. vim,
> > e.g., per default keeps fsync()ing its swapfile. Tweaking the configuration
> > of these apps, this can be prevented fairly easy though. Changing sync
> > semantics for this matter on the other hand seems pretty awkward to me. I'd
> > expect an application calling fsync() to have good reason for having its
> > data flushed to disk _now_, no matter what state the disk happens to be in.
> > If it hasn't, fix the app, not the kernel.
> 
> But apps shouldn't have to know about the special requirements of
> laptops.  

If app does fsync(), it hopefully knows what it is doing. [Random apps
should not really do sync even on normal systems -- it hurts
performance.]
Pavel
-- 
The best software in life is free (not shareware)!  Pavel
GCM d? s-: !g p?:+ au- a--@ w+ v- C++@ UL+++ L++ N++ E++ W--- M- Y- R+
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-24 Thread Daniel Phillips

On Sunday 24 June 2001 17:06, Rik van Riel wrote:
> On Sun, 24 Jun 2001, Anuradha Ratnaweera wrote:
> > It is not uncommon to have a large number of tmp files on the disk(s)
> > (Rik also pointed this out somewhere early in the original thread) and
> > it is sensible to keep all of them in buffers if RAM is sufficient.
> > Transfering _very_ large files is not _that_ common so why shouldn't
> > that case be handled from the user space by calling sync(2)?
>
> Wait a moment.
>
> The only observed bad case I've heard about here is
> that of large files being written out.

But that's not the only advantage of doing the early update:

  - Early spindown for laptops
  - Improved latency under some conditions
  - Improved throughput for some loads
  - Improved filesystem safety

> It should be easy enough to just trigger writeout of
> pages of an inode once that inode has more than a
> certain amount of dirty pages in RAM ... say, something
> like freepages.high ?

The inode dirty page list is not sorted by "time dirtied" so you would be 
eroding the system's ability to ensure that dirty file buffers never get 
older than X.
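
For illustration, Rik's trigger in compilable toy form (freepages_high,
the struct and the helpers are stand-ins, not kernel symbols, and the
age-ordering caveat above still applies):

#include <stdio.h>

struct inode_like {
    unsigned long nr_dirty_pages;
};

static unsigned long freepages_high = 256;  /* assumed threshold */

static void writeout_inode_pages(struct inode_like *inode)
{
    /* would queue this inode's dirty pages for async writeback */
    printf("flushing %lu pages\n", inode->nr_dirty_pages);
    inode->nr_dirty_pages = 0;
}

/* Flush early once a single file piles up too many dirty pages,
 * instead of waiting for the global timed flush. */
static void mark_page_dirty(struct inode_like *inode)
{
    if (++inode->nr_dirty_pages > freepages_high)
        writeout_inode_pages(inode);
}

int main(void)
{
    struct inode_like big_file = { 0 };

    for (int i = 0; i < 600; i++)   /* a large streaming write */
        mark_page_dirty(&big_file);
    return 0;
}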

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-24 Thread Rik van Riel

On Sun, 24 Jun 2001, Anuradha Ratnaweera wrote:

> It is not uncommon to have a large number of tmp files on the disk(s)
> (Rik also pointed this out somewhere early in the original thread) and
> it is sensible to keep all of them in buffers if RAM is sufficient.
> Transferring _very_ large files is not _that_ common so why shouldn't
> that case be handled from the user space by calling sync(2)?

Wait a moment.

The only observed bad case I've heard about here is
that of large files being written out.

It should be easy enough to just trigger writeout of
pages of an inode once that inode has more than a
certain amount of dirty pages in RAM ... say, something
like freepages.high ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-24 Thread Daniel Phillips

On Sunday 24 June 2001 05:20, Anuradha Ratnaweera wrote:
> On Wed, Jun 20, 2001 at 04:58:51PM -0400, Tom Sightler wrote:
> > 1.  When running a compile, or anything else that produces lots of small
> > disk writes, you tend to get lots of little pauses for all the little
> > writes to disk. These seem to be unnoticeable without the patch.
> >
> > 2.  Loading programs when writing activity is occurring (even light
> > activity like during the compile) is noticeably slower, actually any
> > reading from disk is.
> >
> > I also ran my simple ftp test that produced the symptom I reported
> > earlier.  I transferred a 750MB file via FTP, and with your patch sure
> > enough disk writing started almost immediately, but it still didn't seem
> > to write enough data to disk to keep up with the transfer so at
> > approximately the 200MB mark the old behavior still kicked in as it went
> > into full flush mode, during the time network activity halted, just like
> > before.
>
> It is not uncommon to have a large number of tmp files on the disk(s) (Rik
> also pointed this out somewhere early in the original thread) and it is
> sensible to keep all of them in buffers if RAM is sufficient. Transferring
> _very_ large files is not _that_ common so why shouldn't that case be
> handled from the user space by calling sync(2)?

The patch you're discussing has been superseded - check my "[RFC] Early 
flush: new, improved" post from yesterday.  This addresses the problem of 
handling tmp files efficiently while still having the early flush.

The latest patch shows no degradation at all for compilation, which uses lots 
of temporary files.

--
Daniel 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-23 Thread Anuradha Ratnaweera


On Wed, Jun 20, 2001 at 04:58:51PM -0400, Tom Sightler wrote:
> 
> 1.  When running a compile, or anything else that produces lots of small disk
> writes, you tend to get lots of little pauses for all the little writes to disk.
>  These seem to be unnoticeable without the patch.
> 
> 2.  Loading programs when writing activity is occurring (even light activity like
> during the compile) is noticeably slower, actually any reading from disk is.
> 
> I also ran my simple ftp test that produced the symptom I reported earlier.  I
> transferred a 750MB file via FTP, and with your patch sure enough disk writing
> started almost immediately, but it still didn't seem to write enough data to
> disk to keep up with the transfer so at approximately the 200MB mark the old
> behavior still kicked in as it went into full flush mode, during the time
> network activity halted, just like before.

It is not uncommon to have a large number of tmp files on the disk(s) (Rik also
pointed this out somewhere early in the original thread) and it is sensible to
keep all of them in buffers if RAM is sufficient. Transferring _very_ large
files is not _that_ common so why shouldn't that case be handled from the user
space by calling sync(2)?

Anuradha

-- 

Debian GNU/Linux (kernel 2.4.6-pre5)

Keep cool, but don't freeze.
-- Hellman's Mayonnaise

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-22 Thread Daniel Phillips

On Saturday 23 June 2001 01:25, Daniel Kobras wrote:
> On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote:
> > Daniel Phillips writes:
> > > I'd like that too, but what about sync writes?  As things stand now,
> > > there is no option but to spin the disk back up.  To get around this
> > > we'd have to change the basic behavior of the block device and
> > > that's doable, but it's an entirely different proposition than the
> > > little patch above.
> >
> > I don't care as much about sync writes. They don't seem to happen very
> > often on my boxes.
>
> syslog and some editors are the most common users of sync writes. vim,
> e.g., per default keeps fsync()ing its swapfile. Tweaking the configuration
> of these apps, this can be prevented fairly easy though. Changing sync
> semantics for this matter on the other hand seems pretty awkward to me. I'd
> expect an application calling fsync() to have good reason for having its
> data flushed to disk _now_, no matter what state the disk happens to be in.
> If it hasn't, fix the app, not the kernel.

But apps shouldn't have to know about the special requirements of laptops.  
I've been playing a little with the idea of creating a special block device 
for laptops that goes between the vfs and the real block device, and adds the 
behaviour of being able to buffer writes in memory.  In all respects it would 
seem to the vfs to be a disk.  So far this is just a thought experiment.
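
A toy model of that idea (structure and names invented; a real version
would live in the block layer): writes are absorbed in memory while the
disk sleeps and flushed in one burst at spin-up.

#include <stdio.h>
#include <string.h>

#define NBUF 16

struct buffered_dev {
    char pending[NBUF][64];  /* writes held while the disk sleeps */
    int npending;
    int spinning;
};

static void dev_write(struct buffered_dev *d, const char *data)
{
    if (!d->spinning && d->npending < NBUF) {
        strncpy(d->pending[d->npending++], data, 63);
        return;              /* absorbed in memory, disk stays down */
    }
    printf("disk write: %s\n", data);
}

static void dev_spinup_flush(struct buffered_dev *d)
{
    d->spinning = 1;         /* the disk is awake anyway: drain the buffer */
    for (int i = 0; i < d->npending; i++)
        printf("disk write: %s\n", d->pending[i]);
    d->npending = 0;
}

int main(void)
{
    struct buffered_dev d = { .spinning = 0 };

    dev_write(&d, "log entry 1");  /* buffered, disk stays down */
    dev_write(&d, "log entry 2");
    dev_spinup_flush(&d);          /* one burst when the disk wakes */
    return 0;
}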

> > > You know about this project no doubt:
> > >
> > >http://noflushd.sourceforge.net/
> >
> > Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> > .h files! As you say, not really lightweight. There must be a better
> > way.
>
> noflushd would benefit a lot from being able to set bdflush parameters per
> device or per disk. So I'm really eager to see what Daniel comes up with.
> Currently, we can only turn kupdate either on or off as a whole, which
> means that noflushd implements a crude replacement for the benefit of
> multi-disk setups. A lot of the cruft stems from there.

Yes, another person to talk to about this is Jens Axboe who has been doing 
some serious hacking on the block layer.  I thought I'd get the early flush 
patch working well for one disk before generalizing to N ;-)

> > Also, I suspect (without having looked at the code) that it
> > doesn't handle memory pressure well. Things may get nasty when we run
> > low on free pages.
>
> It doesn't handle memory pressure at all. It doesn't have to. noflushd only
> messes with kupdate{,d} but leaves bdflush (formerly known as kflushd)
> alone. If memory gets tight, bdflush starts writing out dirty buffers,
> which makes the disk spin up, and we're back to normal.

Exactly.  And in addition, when bdflush does wake up, I try to get kupdate 
out of the way as much as possible, though I've been following the 
traditional recipe and having it submit all buffers past a certain age.  This 
is quite possibly a bad thing to do because it could starve the swapper.  
Ouch.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-22 Thread Daniel Kobras

On Wed, Jun 20, 2001 at 10:12:38AM -0600, Richard Gooch wrote:
> Daniel Phillips writes:
> > I'd like that too, but what about sync writes?  As things stand now,
> > there is no option but to spin the disk back up.  To get around this
> > we'd have to change the basic behavior of the block device and
> > that's doable, but it's an entirely different proposition than the
> > little patch above.
> 
> I don't care as much about sync writes. They don't seem to happen very
> often on my boxes.

syslog and some editors are the most common users of sync writes. vim, e.g.,
by default keeps fsync()ing its swapfile. By tweaking the configuration of
these apps, this can be prevented fairly easily, though. Changing sync
semantics for this matter on the other hand seems pretty awkward to me. I'd
expect an application calling fsync() to have good reason for having its data
flushed to disk _now_, no matter what state the disk happens to be in. If it
hasn't, fix the app, not the kernel.
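
(For vim in particular, the knob is the 'swapsync' option -- clearing it with
":set swapsync=" stops the swapfile fsync()s, assuming a vim built with that
option.)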

> > You know about this project no doubt:
> > 
> >http://noflushd.sourceforge.net/
> 
> Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
> .h files! As you say, not really lightweight. There must be a better
> way.

noflushd would benefit a lot from being able to set bdflush parameters per
device or per disk. So I'm really eager to see what Daniel comes up with.
Currently, we can only turn kupdate either on or off as a whole, which means
that noflushd implements a crude replacement for the benefit of multi-disk
setups. A lot of the cruft stems from there.

> Also, I suspect (without having looked at the code) that it
> doesn't handle memory pressure well. Things may get nasty when we run
> low on free pages.

It doesn't handle memory pressure at all. It doesn't have to. noflushd only
messes with kupdate{,d} but leaves bdflush (formerly known as kflushd) alone.
If memory gets tight, bdflush starts writing out dirty buffers, which makes the
disk spin up, and we're back to normal.

Regards,

Daniel.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-22 Thread Daniel Kobras

On Thu, Jun 21, 2001 at 06:07:01PM +0200, Jamie Lokier wrote:
> Pavel Machek wrote:
> > > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > > ever be used and will eventually eat my disks for breakfast?
> > 
> > It would eat your flash for breakfast. You know, flash memories have
> > no spinning parts, so there's nothing to spin down.
> 
> Btw Pavel, does noflushd work with 2.4.4?  The noflushd version 2.4 I
> tried said it couldn't find some kernel process (kflushd?  I don't
> remember) and that I should use bdflush.  The manual says that's
> appropriate for older kernels, but not 2.4.4 surely.

That's because of my favourite change from the 2.4.3 patch:

-   strcpy(tsk->comm, "kupdate");
+   strcpy(tsk->comm, "kupdated");

noflushd 2.4 fixed this issue in the daemon itself, but I had forgotten about 
the generic startup script. (RPMs and debs run their customized versions.)

Either the current version from CVS, or

ed /your/init.d/location/noflushd << EOF
%s/kupdate/kupdated/g
w
q
EOF

should get you going.

Regards,

Daniel.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-21 Thread Jamie Lokier

Pavel Machek wrote:
> > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > ever be used and will eventually eat my disks for breakfast?
> 
> It would eat your flash for breakfast. You know, flash memories have
> no spinning parts, so there's nothing to spin down.

Btw Pavel, does noflushd work with 2.4.4?  The noflushd version 2.4 I
tried said it couldn't find some kernel process (kflushd?  I don't
remember) and that I should use bdflush.  The manual says that's
appropriate for older kernels, but not 2.4.4 surely.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-20 Thread Daniel Phillips

On Wednesday 20 June 2001 22:58, Tom Sightler wrote:
> Quoting Daniel Phillips <[EMAIL PROTECTED]>:
> > I originally intended to implement a sliding flush delay based on disk
> > load.
> > This turned out to be a lot of work for a hard-to-discern benefit.  So
> > the
> > current approach has just two delays: .1 second and whatever the bdflush
> >
> > delay is set to.  If there is any non-flush disk traffic the longer
> > delay is
> > used.  This is crude but effective... I think.  I hope that somebody
> > will run
> > this through some benchmarks to see if I lost any performance.
> > According to
> > my calculations, I did not.  I tested this mainly in UML, and also ran
> > it
> > briefly on my laptop.  The interactive feel of the change is immediately
> >
> > obvious, and for me at least, a big improvement.
>
> Well, since a lot of this discussion seemed to spin off from my original
> posting last week about my particular issue with disk flushing I decided to
> try your patch with my simple test/problem that I experience on my laptop.
>
> One note, I ran your patch against 2.4.6-pre3 as that is what currently
> performs the best on my laptop.  It seems to apply cleanly and compiled
> without problems.
>
> I used this kernel on my laptop all day for my normal workload, which
> consists of a Gnome 1.4 desktop, several Mozilla instances, several ssh
> sessions with remote X programs displayed, StarOffice, and VMware (running
> Windows 2000 Pro in 128MB).  I also performed several compiles throughout
> the day.  Overall the machine feels slightly more sluggish, I think due to
> the following two things:
>
> 1.  When running a compile, or anything else that produces lots of small
> disk writes, you tend to get lots of little pauses for all the little
> writes to disk. These seem to be unnoticeable without the patch.

OK, this is because the early flush doesn't quit when load picks up again.  
Measuring only the io backlog, as I do now, isn't adequate for telling the 
difference between load initiated by the flush itself and other load, such as 
a cpu-bound process proceeding to read another file, so that's why the flush 
doesn't stop flushing when other IO starts happening.  This has to be fixed.

In the meantime, you could try this simple tweak: just set the lower bound, 
currently 1/10th of a second, a little higher:

-   unsigned check_interval = HZ/10, ...
+   unsigned check_interval = HZ/5, ...

This may be enough to bridge the little pauses in the compiler's disk 
access pattern so the flush isn't triggered.  (This is not by any means a 
nice solution.)  If you set check_interval to HZ*5, you *should* get exactly 
the old behaviour; I'd be very interested to hear if you do.

Also, could you do your compiles with 'time' so you can quantify the results?

> 2.  Loading programs when writing activity is occurring (even light
> activity like during the compile) is noticeably slower; actually any
> reading from disk is.

Hmm, let me think about why that may be.  The loader doesn't actually read 
the program into memory, it just maps it and lets the pages fault in as 
they're called for.  So if readahead isn't perfect (it isn't) the io backlog 
may drop to 0 briefly just as kflush decides to sample it, and it initiates a 
flush.  This flush cleans the whole dirty list out, stealing bandwidth from 
the reads.

> I also ran my simple ftp test that produced the symptom I reported earlier.
>  I transferred a 750MB file via FTP, and with your patch sure enough disk
> writing started almost immediately, but it still didn't seem to write
> enough data to disk to keep up with the transfer, so at approximately the
> 200MB mark the old behavior still kicked in as it went into full flush
> mode, during which time network activity halted, just like before.  The big
> difference with the patch and without is that the patched kernel never
> seems to balance out; without the patch, once the initial burst is done you
> get a nice stream of data from the network to disk with the disk staying
> moderately active.  With the patch the disk varies from barely active to
> moderate to heavy and back, and during the heavy periods the network
> transfer always pauses (although very briefly).
>
> Just my observations, you asked for comments.

Yes, I have to refine this.  The inner flush loop has to know how many io 
submissions are happening, from which it can subtract its own submissions and 
know somebody else is submitting IO, at which point it can fall back to the 
good old 5 second buffer age limit.  False positives from kflush are handled 
as a fringe benefit, and flush_dirty_buffers won't do extra writeout.  This 
is easy and cheap.
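
Roughly what I have in mind, as a stand-alone sketch (the counter names are 
illustrative, not the kernel's own):

#include <stdio.h>

#define AGE_BUFFER (5 * 100)	/* the good old 5 second limit at HZ=100 */

struct flush_state {
	unsigned long submissions;	/* all IO submissions seen so far */
	unsigned long own;		/* submissions the flush issued   */
	unsigned long last_seen;	/* snapshot from previous sample  */
};

/* Age (in jiffies) a dirty buffer must reach before we flush it. */
static unsigned long flush_age_limit(struct flush_state *s)
{
	unsigned long new_io = s->submissions - s->last_seen;
	unsigned long own_io = s->own;

	s->last_seen = s->submissions;
	s->own = 0;

	if (new_io > own_io)	/* somebody else is submitting IO */
		return AGE_BUFFER;
	return 0;		/* disk otherwise idle: flush everything now */
}

int main(void)
{
	struct flush_state s = { 12, 12, 0 };

	printf("%lu\n", flush_age_limit(&s));	/* 0: only our own IO   */
	s.submissions += 8;			/* foreign IO arrives   */
	printf("%lu\n", flush_age_limit(&s));	/* 500: back off to 5 s */
	return 0;
}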

I could get a lot fancier than this and calculate IO load averages, but I'd 
only do that after mining out the simple possibilities.  I'll probably have 
something new for you to try tomorrow, if you're willing.  By the way, I'm 
not addressing your fundamental problem, that's Rik's job ;-).  In fact, I 
define success in this effort by the extent to which I 

Re: [RFC] Early flush (was: spindown)

2001-06-20 Thread Tom Sightler

Quoting Daniel Phillips <[EMAIL PROTECTED]>:

> I originally intended to implement a sliding flush delay based on disk
> load.  
> This turned out to be a lot of work for a hard-to-discern benefit.  So
> the 
> current approach has just two delays: .1 second and whatever the bdflush
> 
> delay is set to.  If there is any non-flush disk traffic the longer
> delay is 
> used.  This is crude but effective... I think.  I hope that somebody
> will run 
> this through some benchmarks to see if I lost any performance. 
> According to 
> my calculations, I did not.  I tested this mainly in UML, and also ran
> it 
> briefly on my laptop.  The interactive feel of the change is immediately
> 
> obvious, and for me at least, a big improvement.


Well, since a lot of this discussion seemed to spin off from my original posting
last week about my particular issue with disk flushing I decided to try your
patch with my simple test/problem that I experience on my laptop.

One note, I ran your patch against 2.4.6-pre3 as that is what currently performs
the best on my laptop.  It seems to apply cleanly and compiled without problems.

I used this kernel on my laptop all day for my normal workload, which consists
of a Gnome 1.4 desktop, several Mozilla instances, several ssh sessions with
remote X programs displayed, StarOffice, and VMware (running Windows 2000 Pro
in 128MB).  I also performed several compiles throughout the day.  Overall the
machine feels slightly more sluggish, I think due to the following two things:

1.  When running a compile, or anything else that produces lots of small disk
writes, you tend to get lots of little pauses for all the little writes to
disk.  These seem to be unnoticeable without the patch.

2.  Loading programs when writing activity is occurring (even light activity
like during the compile) is noticeably slower; actually any reading from disk
is.

I also ran my simple ftp test that produced the symptom I reported earlier.  I
transferred a 750MB file via FTP, and with your patch sure enough disk writing
started almost immediately, but it still didn't seem to write enough data to
disk to keep up with the transfer, so at approximately the 200MB mark the old
behavior still kicked in as it went into full flush mode, during which time
network activity halted, just like before.  The big difference with the patch
and without is that the patched kernel never seems to balance out; without the
patch, once the initial burst is done you get a nice stream of data from the
network to disk with the disk staying moderately active.  With the patch the
disk varies from barely active to moderate to heavy and back, and during the
heavy periods the network transfer always pauses (although very briefly).

Just my observations, you asked for comments.

Later,
Tom

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-20 Thread Daniel Phillips

On Wednesday 20 June 2001 19:32, Rik van Riel wrote:
> On Wed, 20 Jun 2001, Daniel Phillips wrote:
> > BTW, with nominal 100,000 erases you have to write 10 terabytes
> > to your 100 meg flash disk before you'll see it start to
> > degrade.
>
> That assumes you write out full blocks.  If you flush after
> every byte written you'll hit the limit a lot sooner ;)

Yep, so if you are running on a Yopy, try not to sync after each byte.

> Btw, this is also a problem with your patch, when you write
> out buffers all the time your disk will spend more time seeking
> all over the place (moving the disk head away from where we are
> currently reading!) and you'll end up writing the same block
> multiple times ...

It doesn't work that way, it tacks the flush onto the trailing edge of a 
burst of disk activity, or it flushes out an isolated update, say an edit 
save, which would have required the same amount of disk activity, just a few 
seconds off in the future.  Sometimes it does write a few extra sectors when 
disk activity is sporadic, but the impact on total throughput is small enough 
to be hard to measure reliably.  Even so, there is some optimizing that could 
be done - the update could be interleaved a little better with the falling 
edge of a heavy traffic episode.  This would require that the io rate be 
monitored instead of just the queue backlog.  I'm interested in tackling that 
eventually - it has applications in other areas than just the early update.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-20 Thread Rik van Riel

On Wed, 20 Jun 2001, Daniel Phillips wrote:

> BTW, with nominal 100,000 erases you have to write 10 terabytes
> to your 100 meg flash disk before you'll see it start to
> degrade.

That assumes you write out full blocks.  If you flush after
every byte written you'll hit the limit a lot sooner ;)

Btw, this is also a problem with your patch, when you write
out buffers all the time your disk will spend more time seeking
all over the place (moving the disk head away from where we are
currently reading!) and you'll end up writing the same block
multiple times ...

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-20 Thread Daniel Phillips

On Tuesday 19 June 2001 12:46, Pavel Machek wrote:
> > > > Roger> It does if you are running on a laptop. Then you do not want
> > > > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > > > Roger> to start to write a few pages, stays idle for a while, goes to
> > > > Roger> sleep, a few more pages, ...
> > > > That could be handled by a metric which says if the disk is spun
> > > > down, wait until there is more memory pressure before writing.  But
> > > > if the disk is spinning, we don't care, you should start writing out
> > > > buffers at some low rate to keep the pressure from rising too
> > > > rapidly.
> > >
> > > Notice that write is not free (in terms of power) even if disk is
> > > spinning.  Seeks (etc) also take some power. And think about
> > > flashcards. It certainly is cheaper than spinning the disk up but still not
> > > free.
> >
> > Isn't this why noflushd exists or is this an evil thing that shouldn't
> > ever be used and will eventually eat my disks for breakfast?
>
> It would eat your flash for breakfast. You know, flash memories have
> no spinning parts, so there's nothing to spin down.

Yes, this doesn't make sense for flash, and in fact, it doesn't make sense to 
have just one set of bdflush parameters for the whole system, it's really a 
property of the individual device.  So the thing to do is for me to go kibitz 
on the io layer rewrite projects and figure out how to set up the 
intelligence per-queue, and have the queues per-device, at which point it's 
trivial to do the write^H^H^H^H^H right thing for each kind of device.

BTW, with nominal 100,000 erases you have to write 10 terabytes to your 100 
meg flash disk before you'll see it start to degrade.  These devices are set 
up to avoid continuous hammering on the same page, and to take failed 
pages out of the pool as soon as they fail to erase.  Also, the 100,000 
figure is nominal - the average number of erases you'll get per page is 
considerably higher.  The extra few sectors we see with the early flush patch 
are just not going to affect the life of your flash to a measurable degree.
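
(The arithmetic: 100,000 erase cycles x 100 MB of wear-levelled space = 10^13 
bytes, or about 10 terabytes written before the nominal limit is reached.)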

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-20 Thread Richard Gooch

Daniel Phillips writes:
> On Wednesday 20 June 2001 06:39, Richard Gooch wrote:
> > Starting I/O immediately if there is no load sounds nice. However,
> > what about the other case, when the disc is already spun down (and
> > hence there's no I/O load either)? I want the system to avoid doing
> > writes while the disc is spun down. I'm quite happy for the system to
> > accumulate dirtied pages/buffers, reclaiming clean pages as needed,
> > until it absolutely has to start writing out (or I call sync(2)).
> 
> I'd like that too, but what about sync writes?  As things stand now,
> there is no option but to spin the disk back up.  To get around this
> we'd have to change the basic behavior of the block device and
> that's doable, but it's an entirely different proposition than the
> little patch above.

I don't care as much about sync writes. They don't seem to happen very
often on my boxes.

> You know about this project no doubt:
> 
>http://noflushd.sourceforge.net/

Only vaguely. It's huge. Over 2300 lines of C code and >560 lines in
.h files! As you say, not really lightweight. There must be a better
way. Also, I suspect (without having looked at the code) that it
doesn't handle memory pressure well. Things may get nasty when we run
low on free pages.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-20 Thread Daniel Phillips

On Wednesday 20 June 2001 06:39, Richard Gooch wrote:
> Daniel Phillips writes:
> > I never realized how much I didn't like the good old 5 second delay
> > between saving an edit and actually getting it written to disk until
> > it went away.  Now the question is, did I lose any performance in
> > doing that.  What I wrote in the previous email turned out to be
> > pretty accurate, so I'll just quote it
>
> Starting I/O immediately if there is no load sounds nice. However,
> what about the other case, when the disc is already spun down (and
> hence there's no I/O load either)? I want the system to avoid doing
> writes while the disc is spun down. I'm quite happy for the system to
> accumulate dirtied pages/buffers, reclaiming clean pages as needed,
> until it absolutely has to start writing out (or I call sync(2)).

I'd like that too, but what about sync writes?  As things stand now, there is 
no option but to spin the disk back up.  To get around this we'd have to 
change the basic behavior of the block device and that's doable, but it's an 
entirely different proposition than the little patch above.

You know about this project no doubt:

   http://noflushd.sourceforge.net/

This is really complementary to what I did.  Lightweight is not really a good 
way to describe it though, the tar is almost 10,000 lines long.  There is 
probably a clever thing to do at the kernel level to shorten that up.

There's one thing I think I can help fix up while I'm working in here, this 
complaint: 

Reiserfs journaling bypasses the kernel's delayed write mechanisms and
writes straight to disk.

We need to address the reasons why such filesystems have to bypass kupdate.  
This touches on how sync and fsync work, updating supers, flushing the inode 
cache etc, but with Al Viro's superblock work merged now we could start 
thinking about it.

> Right now I hack that by setting bdflush parameters to 5 minutes. But
> that's not ideal either.

Yes, that still works with my patch.  The noflushd user space daemon works by 
turning off kupdate (set update time to 0).
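
For reference, that knob amounts to something like this from user space -- a 
minimal sketch assuming the 2.4-era /proc/sys/vm/bdflush layout, where the 
fifth field is the kupdate interval in jiffies (field count and order have 
shifted between kernel versions):

#include <stdio.h>

int main(void)
{
	unsigned v[9] = { 0 };
	int i, n = 0;
	FILE *f = fopen("/proc/sys/vm/bdflush", "r");

	if (!f) { perror("bdflush"); return 1; }
	while (n < 9 && fscanf(f, "%u", &v[n]) == 1)
		n++;
	fclose(f);

	if (n > 4)
		v[4] = 0;	/* interval = 0: kupdate stops waking up */

	f = fopen("/proc/sys/vm/bdflush", "w");
	if (!f) { perror("bdflush"); return 1; }
	for (i = 0; i < n; i++)
		fprintf(f, "%u%c", v[i], i == n - 1 ? '\n' : ' ');
	fclose(f);
	return 0;
}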

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown

2001-06-20 Thread Pavel Machek

Hi!

> > > Roger> It does if you are running on a laptop. Then you do not want
> > > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > > Roger> to start to write a few pages, stays idle for a while, goes to
> > > Roger> sleep, a few more pages, ...
> > > That could be handled by a metric which says if the disk is spun
> > > down, wait until there is more memory pressure before writing.  But
> > > if the disk is spinning, we don't care, you should start writing out
> > > buffers at some low rate to keep the pressure from rising too
> > > rapidly.  
> > Notice that write is not free (in terms of power) even if disk is
> > spinning.  Seeks (etc) also take some power. And think about
> > flashcards. It certainly is cheaper than spinning the disk up but still not
> > free.
> 
> Isn't this why noflushd exists or is this an evil thing that shouldn't
> ever be used and will eventually eat my disks for breakfast?

It would eat your flash for breakfast. You know, flash memories have
no spinning parts, so there's nothing to spin down.
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-19 Thread Richard Gooch

Daniel Phillips writes:
> I never realized how much I didn't like the good old 5 second delay
> between saving an edit and actually getting it written to disk until
> it went away.  Now the question is, did I lose any performance in
> doing that.  What I wrote in the previous email turned out to be
> pretty accurate, so I'll just quote it

Starting I/O immediately if there is no load sounds nice. However,
what about the other case, when the disc is already spun down (and
hence there's no I/O load either)? I want the system to avoid doing
writes while the disc is spun down. I'm quite happy for the system to
accumulate dirtied pages/buffers, reclaiming clean pages as needed,
until it absolutely has to start writing out (or I call sync(2)).

Right now I hack that by setting bdflush parameters to 5 minutes. But
that's not ideal either.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[RFC] Early flush (was: spindown)

2001-06-19 Thread Daniel Phillips

I never realized how much I didn't like the good old 5 second delay between 
saving an edit and actually getting it written to disk until it went away.  
Now the question is, did I lose any performance in doing that.  What I wrote 
in the previous email turned out to be pretty accurate, so I'll just quote it 
to keep it together with the patch:

> I'm now in the midst of hatching a patch. [1] The first thing I had to do
> is go explore the block driver code, yum yum.  I found that it already
> computes the statistic I'm interested in, namely queued_sectors, which is
> used to pace the IO on block devices.  It's a little crude - we really want
> this to be per-queue and have one queue per "spindle" - but even in its
> current form it's workable.
>
> The idea is that when queued_sectors drops below some threshold we have
> 'unused disk bandwidth' so it would be nice to do something useful with it:
>
>   1) Do an early 'sync_old_buffers'
>   2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds earlier, and
> (2) should improve the system's latency in response to load surges.  There
> are drawbacks too, which have been pointed out to me privately, but they
> tend to be pretty minor, for example: on a flash disk you'd do a few extra
> writes and wear it out ever-so-slightly sooner.  All the same, such special
> devices can be dealt with easily once we progress a little further in
> improving the kernel's 'per spindle' intelligence.
>
> Now how to implement this.  I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we also
> need to have the early flush for any write to a disk file while the disks
> are lightly loaded, i.e., there is no convenient loaded-to-unloaded
> transition to trigger it.  The missing trigger could be inserted into
> __mark_dirty, but that would penalize the loaded state (a little, but
> that's still too much). Furthermore, it's probably desirable to maintain a
> small delay between the dirty and the flush.  So what I'll try first is
> just running kflush's timer faster, and make its reschedule period vary
> with disk load, i.e., when there are fewer queued_sectors, kflush looks at
> the dirty buffer list more often.
>
> The rest of what has to happen in kflush is pretty straightforward.  It
> just uses queued_sectors to determine how far to walk the dirty buffer
> list, which is maintained in time-since-dirtied order.  If queued_sectors
> is below some threshold the entire list is flushed.  Note that we want to
> change the sense of b_flushtime to b_timedirtied.  It's more efficient to
> do it this way anyway.
>
> I haven't done anything about preemptive pageout yet, but similar ideas
> apply.
>
> [1] This is an experiment, do not worry, it will not show up in your tree
> any time soon.  IOW, constructive criticism appreciated, flames copied to
> /dev/null.

I originally intended to implement a sliding flush delay based on disk load.  
This turned out to be a lot of work for a hard-to-discern benefit.  So the 
current approach has just two delays: .1 second and whatever the bdflush 
delay is set to.  If there is any non-flush disk traffic the longer delay is 
used.  This is crude but effective... I think.  I hope that somebody will run 
this through some benchmarks to see if I lost any performance.  According to 
my calculations, I did not.  I tested this mainly in UML, and also ran it 
briefly on my laptop.  The interactive feel of the change is immediately 
obvious, and for me at least, a big improvement.
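
The decision rule itself is trivial; in miniature (a sketch, with illustrative 
names rather than the patch's own):

#include <stdio.h>

#define HZ 100

/* Two-delay policy: flush after 0.1 s when the disk is otherwise idle,
 * after the normal bdflush delay when there is competing traffic. */
static unsigned next_flush_delay(unsigned queued_sectors, unsigned bdflush_delay)
{
	if (queued_sectors > 0)		/* non-flush traffic in flight */
		return bdflush_delay;
	return HZ / 10;			/* idle disk: flush early */
}

int main(void)
{
	printf("%u\n", next_flush_delay(0, 5 * HZ));	/* 10 jiffies  */
	printf("%u\n", next_flush_delay(640, 5 * HZ));	/* 500 jiffies */
	return 0;
}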

The patch is against 2.4.5.  To apply:

  cd /your/source/tree
  patch -p0

-   bh->b_flushtime = jiffies + bdf_prm.b_un.age_buffer;
+   bh->b_dirtytime = jiffies;
    refile_buffer(bh);
 }
 
@@ -2524,12 +2524,20 @@
as all dirty buffers lives _only_ in the DIRTY lru list.
As we never browse the LOCKED and CLEAN lru lists they are infact
completly useless. */
-static int flush_dirty_buffers(int check_flushtime)
+static int flush_dirty_buffers (int update)
 {
struct buffer_head * bh, *next;
int flushed = 0, i;
+   unsigned queued = atomic_read(&queued_sectors);
+   unsigned long youngest_to_update;
 
- restart:
+#ifdef DEBUG
+   if (update)
+   printk("kupdate %lu %i\n", jiffies, queued);
+#endif
+
+restart:
+   youngest_to_update = jiffies - (queued? bdf_prm.b_un.age_buffer: 0);
spin_lock(_list_lock);
bh = lru_list[BUF_DIRTY];
if (!bh)
@@ -2544,19 +2552,14 @@
if (buffer_locked(bh))
continue;
 
-   if (check_flushtime) {
-   /* The dirty lru list is chronologically ordered so
-  if the current bh is not yet timed out,
-  then also all the following bhs
-  will be too young. */
-   if 

Re: spindown

2001-06-19 Thread Simon Huggins

On Fri, Jun 15, 2001 at 03:23:07PM +, Pavel Machek wrote:
> > Roger> It does if you are running on a laptop. Then you do not want
> > Roger> the pages to go out all the time. Disk has gone to sleep, needs
> > Roger> to start to write a few pages, stays idle for a while, goes to
> > Roger> sleep, a few more pages, ...
> > That could be handled by a metric which says if the disk is spun
> > down, wait until there is more memory pressure before writing.  But
> > if the disk is spinning, we don't care, you should start writing out
> > buffers at some low rate to keep the pressure from rising too
> > rapidly.  
> Notice that write is not free (in terms of power) even if disk is
> spinning.  Seeks (etc) also take some power. And think about
> flashcards. It certainly is cheaper than spinning the disk up but still not
> free.

Isn't this why noflushd exists or is this an evil thing that shouldn't
ever be used and will eventually eat my disks for breakfast?


Description: allow idle hard disks to spin down
 Noflushd is a daemon that spins down disks that have not been read from
 after a certain amount of time, and then prevents disk writes from
 spinning them back up. It's targeted for laptops but can be used on any
 computer with IDE disks. The effect is that the hard disk actually spins
 down, saving you battery power, and shutting off the loudest component of
 most computers.

http://noflushd.sourceforge.net


Simon.

-- 
[ "CATS. CATS ARE NICE." - Death, "Sourcery"   ]
Black Cat Networks.  http://www.blackcatnetworks.co.uk/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC] Early flush (was: spindown)

2001-06-19 Thread Richard Gooch

Daniel Phillips writes:
> I never realized how much I didn't like the good old 5 second delay
> between saving an edit and actually getting it written to disk until
> it went away.  Now the question is, did I lose any performance in
> doing that.  What I wrote in the previous email turned out to be
> pretty accurate, so I'll just quote it

Starting I/O immediately if there is no load sounds nice. However,
what about the other case, when the disc is already spun down (and
hence there's no I/O load either)? I want the system to avoid doing
writes while the disc is spun down. I'm quite happy for the system to
accumulate dirtied pages/buffers, reclaiming clean pages as needed,
until it absolutely has to start writing out (or I call sync(2)).

Right now I hack that by setting bdflush parameters to 5 minutes. But
that's not ideal either.
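
Purely as a sketch of the behaviour being asked for - both helpers below
are hypothetical, since as Pavel notes elsewhere in the thread the kernel
does not currently track spindown state at all:

static int writeout_allowed(kdev_t dev)
{
        if (disk_is_spun_down(dev))             /* hypothetical */
                return memory_pressure_high();  /* hypothetical */
        return 1;       /* disk spinning: flush as usual */
}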

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current:   [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-18 Thread Mike Galbraith

On Mon, 18 Jun 2001, Daniel Phillips wrote:

> On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below?  (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
>
> It's too subtle for me ;-)  (Not shy about saying that because this part of
> the kernel is probably subtle for everyone.)

No subtlety (hammer), it just draws a line that doesn't move around
in unpredictable ways.  For example, nr_free_buffer_pages() adds in
free pages to the line it draws.  You may have a large volume of dirty
data, decide it would be prudent to flush, then someone frees a nice
chunk of memory...  (send morse code messages via malloc/free?:)
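
Condensed from the balance_dirty_state() rewrite in the patch further down
the thread, the "line" is drawn like this (the function name here is just
for illustration; the kernel names are the ones the patch uses):

static int buffer_cache_worth_monitoring(void)
{
        unsigned long buffers = 0;
        int i;

        /* total size of the buffer cache, in pages */
        for (i = 0; i < NR_LIST; i++)
                buffers += size_buffers_type[i];
        buffers >>= PAGE_SHIFT;

        /* num_physpages is fixed, so freeing memory elsewhere
           cannot move this threshold around */
        return buffers * 100 >= num_physpages * bdf_prm.b_un.nmonitor;
}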

Anyway it's crude, but it seems to have gotten results from the slrn
load.  I received logs for ac15 and ac15+patch.  ac15 took 265 seconds
to do the job whereas with the patch it took 227 seconds.  I haven't
pored over the logs yet, but there seems to be throughput to be had.

If anyone is interested in the logs, they're much smaller than expected
-rw-r--r--   1 mikegusers   11993 Jun 19 05:58 ac15_mike.log
-rw-r--r--   1 mikegusers   13015 Jun 19 05:58 ac15_org.log

> The question I'm tackling right now is how the system behaves when the load
> goes away, or doesn't get heavy.  Your patch doesn't measure the load
> directly - it may attempt to predict it as a function of memory pressure, but
> that's a little more loosely coupled than what I had in mind.

It doesn't attempt to predict, it reacts to the existing situation.

> I'm now in the midst of hatching a patch. [1] The first thing I had to do is
> go explore the block driver code, yum yum.  I found that it already computes
> the statistic I'm interested in, namely queued_sectors, which is used to pace
> the IO on block devices.  It's a little crude - we really want this to be
> per-queue and have one queue per "spindle" - but even in its current form
> it's workable.
>
> The idea is that when queued_sectors drops below some threshold we have
> 'unused disk bandwidth' so it would be nice to do something useful with it:

(that's much more subtle/clever:)

>   1) Do an early 'sync_old_buffers'
>   2) Do some preemptive pageout
>
> The benefit of (1) is that it lets disks go idle a few seconds earlier, and
> (2) should improve the system's latency in response to load surges.  There
> are drawbacks too, which have been pointed out to me privately, but they tend
> to be pretty minor, for example: on a flash disk you'd do a few extra writes
> and wear it out ever-so-slightly sooner.  All the same, such special devices
> can be dealt with easily once we progress a little further in improving the
> kernel's 'per spindle' intelligence.
>
> Now how to implement this.  I considered putting a (newly minted)
> wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded
> transition, and that's fine except it doesn't do the whole job: we also need
> to have the early flush for any write to a disk file while the disks are
> lightly loaded, i.e., there is no convenient loaded-to-unloaded transition to
> trigger it.  The missing trigger could be inserted into __mark_dirty, but
> that would penalize the loaded state (a little, but that's still too much).
> Furthermore, it's probably desirable to maintain a small delay between the
> dirty and the flush.  So what I'll try first is just running kflush's timer
> faster, and make its reschedule period vary with disk load, i.e., when there
> are fewer queued_sectors, kflush looks at the dirty buffer list more often.
>
> The rest of what has to happen in kflush is pretty straightforward.  It just
> uses queued_sectors to determine how far to walk the dirty buffer list, which
> is maintained in time-since-dirtied order.  If queued_sectors is below some
> threshold the entire list is flushed.  Note that we want to change the sense
> of b_flushtime to b_timedirtied.  It's more efficient to do it this way
> anyway.
>
> I haven't done anything about preemptive pageout yet, but similar ideas apply.

Preemptive pageout could be as simple as walking the dirty list looking for swap
pages and writing them out.  With the fair aging change that's already
in, there will be some.  If the fair aging change to background aging
works out, there will be more (don't want too many more though;).  The
only problem I can see with that simple method is that once written, the
page lands on the inactive_clean list.  That list is short and does get
consumed.. might turn fake pageout into a real one unintentionally.

> [1] This is an experiment, do not worry, it will not show up in your tree any
> time soon.  IOW, constructive criticism appreciated, flames copied to
> /dev/null.

Look forward to seeing it.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-18 Thread Daniel Phillips

On Sunday 17 June 2001 12:05, Mike Galbraith wrote:
> It _juuust_ so happens that I was tinkering... what do you think of
> something like the below?  (and boy do I ever wonder what a certain
> box doing slrn stuff thinks of it.. hint hint;)

It's too subtle for me ;-)  (Not shy about saying that because this part of 
the kernel is probably subtle for everyone.)

The question I'm tackling right now is how the system behaves when the load 
goes away, or doesn't get heavy.  Your patch doesn't measure the load 
directly - it may attempt to predict it as a function of memory pressure, but 
that's a little more loosely coupled than what I had in mind.

I'm now in the midst of hatching a patch. [1] The first thing I had to do is 
go explore the block driver code, yum yum.  I found that it already computes 
the statistic I'm interested in, namely queued_sectors, which is used to pace 
the IO on block devices.  It's a little crude - we really want this to be 
per-queue and have one queue per "spindle" - but even in its current form 
it's workable.

The idea is that when queued_sectors drops below some threshold we have 
'unused disk bandwidth' so it would be nice to do something useful with it:

  1) Do an early 'sync_old_buffers'
  2) Do some preemptive pageout

The benefit of (1) is that it lets disks go idle a few seconds earlier, and 
(2) should improve the system's latency in response to load surges.  There 
are drawbacks too, which have been pointed out to me privately, but they tend 
to be pretty minor, for example: on a flash disk you'd do a few extra writes 
and wear it out ever-so-slightly sooner.  All the same, such special devices 
can be dealt with easily once we progress a little further in improving the 
kernel's 'per spindle' intelligence.

Now how to implement this.  I considered putting a (newly minted) 
wakeup_kflush in blk_finished_io, conditional on a loaded-to-unloaded 
transition, and that's fine except it doesn't do the whole job: we also need 
to have the early flush for any write to a disk file while the disks are 
lightly loaded, i.e., there is no convenient loaded-to-unloaded transition to 
trigger it.  The missing trigger could be inserted into __mark_dirty, but 
that would penalize the loaded state (a little, but that's still too much).  
Furthermore, it's probably desirable to maintain a small delay between the 
dirty and the flush.  So what I'll try first is just running kflush's timer 
faster, and make its reschedule period vary with disk load, i.e., when there 
are fewer queued_sectors, kflush looks at the dirty buffer list more often.
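
As a sketch, the trigger considered and set aside above would have looked
roughly like this (blk_finished_io is real 2.4 code; wakeup_kflush is the
"newly minted" call and LOW_WATER an arbitrary threshold, both hypothetical):

static inline void blk_finished_io(int nsects)
{
        int before = atomic_read(&queued_sectors);

        atomic_sub(nsects, &queued_sectors);
        /* wake kflush only on the loaded-to-unloaded transition */
        if (before >= LOW_WATER && before - nsects < LOW_WATER)
                wakeup_kflush();
}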

The rest of what has to happen in kflush is pretty straightforward.  It just 
uses queued_sectors to determine how far to walk the dirty buffer list, which 
is maintained in time-since-dirtied order.  If queued_sectors is below some 
threshold the entire list is flushed.  Note that we want to change the sense 
of b_flushtime to b_timedirtied.  It's more efficient to do it this way 
anyway.
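
Condensed from the patch that accompanies the '[RFC] Early flush' message
(where b_dirtytime replaces b_flushtime), the walk comes down to roughly
this, locking and the actual writeout omitted:

        unsigned queued = atomic_read(&queued_sectors);
        unsigned long cutoff = jiffies - (queued ? bdf_prm.b_un.age_buffer : 0);
        struct buffer_head *bh;

        for (bh = lru_list[BUF_DIRTY]; bh; bh = bh->b_next_free) {
                /* oldest-first list: the first too-young buffer ends the walk */
                if (time_before(cutoff, bh->b_dirtytime))
                        break;
                /* ... queue bh for writeout ... */
        }

With queued_sectors at zero the cutoff degenerates to "now", i.e. the
entire list is flushed.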

I haven't done anything about preemptive pageout yet, but similar ideas apply.

[1] This is an experiment, do not worry, it will not show up in your tree any 
time soon.  IOW, constructive criticism appreciated, flames copied to 
/dev/null.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: (lkml)Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-17 Thread Mike Galbraith

On Sun, 17 Jun 2001 [EMAIL PROTECTED] wrote:

> On Sun, Jun 17, 2001 at 12:05:10PM +0200, Mike Galbraith wrote:
> >
> > It _juuust_ so happens that I was tinkering... what do you think of
> > something like the below?  (and boy do I ever wonder what a certain
> > box doing slrn stuff thinks of it.. hint hint;)
> >
> I'm sorry to say this box doesn't really think any different of it.

Well darn.  But..

> Everything that's in the cache before running slrn on a big group seems
> to stay there the whole time, making my active slrn-process use swap.

It should not be the same data if page aging is working at all.  Better
stated, if it _is_ the same data and page aging is working, it's needed
data, so the movement of momentarily unused rss to disk might have been
the right thing to do.. it just has to buy you the use of the pages moved
for long enough to offset the (large) cost of dropping those pages.

I saw it adding rss to the aging pool, but not terribly much IO.  The
fact that it is using page replacement is only interesting in regard to
total system efficiency.

> I applied the patch to 2.4.5-ac15, and this was the result:

<saves vmstat>

Thanks for running it.  Can you (afford to) send me procinfo or such
(what I would like to see is job efficiency) information?  Full logs
are fine, as long as they're not truly huge :)  Anything under a meg
is gratefully accepted (privately 'course).

I think (am pretty darn sure) the aging fairness change is what is
affecting you, but it's not possible to see whether this change is
affecting you in a negative or positive way without timing data.

-Mike

misc:

wrt this ~patch, it only allows you to move the rolldown to sync disk
behavior some.. moving write delay back some (knob) is _supposed_ to
get that IO load (at least) a modest throughput increase.  The flushto
thing was basically directed toward laptop use, but ~seems to exhibit
better IO clustering/bandwidth sharing as well.  (less old/new request
merging?.. distance?)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: (lkml)Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-17 Thread thunder7

On Sun, Jun 17, 2001 at 12:05:10PM +0200, Mike Galbraith wrote:
> 
> It _juuust_ so happens that I was tinkering... what do you think of
> something like the below?  (and boy do I ever wonder what a certain
> box doing slrn stuff thinks of it.. hint hint;)
> 
I'm sorry to say this box doesn't really think any different of it.

Everything that's in the cache before running slrn on a big group seems
to stay there the whole time, making my active slrn-process use swap.

I applied the patch to 2.4.5-ac15, and this was the result:

   procs                  memory          swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 0  1  0  11216   2548 183560 264172   1   4   184   343  123   119   2   6  92
 0  0  0  11212   2620 183444 264184   0   0 472  12799   1   2  97
 0  0  0  11212   1604 183444 264740   0   0   378 0  130   101   2   1  98
 0  1  0  11212   1588 184300 263116   0   0   552  1080  277   360   3  14  83
 2  0  2  11212   1692 174052 270536   0   0  1860 0  596   976   9  50  40
 2  0  2  11212   1588 166732 274816   0   0  1868  5426  643  1050   8  44  48
 0  1  0  11212   1588 163276 276888   0   0  1714  1816  580   972   9  17  74
 0  1  0  11212   1848 166280 273688   0   0   514  3952  301   355   3  40  57
 1  0  0  11212   1592 164232 273872   0   0  1824  3532  632  1083  11  25  64
 2  0  2  11212   1980 167304 268792   0   0  1678 0  550   881   8  51  41
 0  1  2  11212   1588 163908 271356   0   0  1344  4896  508   753   7  26  67
 1  0  0  11212   1588 160896 272756   0   0  1642  1301  574   929   9  22  69
 0  1  0  11212   1592 164936 268632   0   0   756  3594  370   467   6  43  51
 2  0  3  11212   1596 164380 266552   0   0  1904  2392  604  1017  10  52  37
 1  0  0  11212   1592 164752 265844   0   0  1784  2382  623  1000  10  22  69
 0  1  0  11212   1592 168528 262256   0   0   810  4176  364   523   5  43  52
 0  1  1  11212   1992 169324 259504   0   0  1686  3068  578   999  11  42  47
 0  1  0  11212   1588 170696 256332   0   0  1568  1080  532   894  10  20  70
 1  0  0  11212   1592 174876 253036   0   0   598  3600  315   420   4  41  55
 0  1  1  11212   2316 171592 253892   0   0  1816  3286  616  1073   7  29  64
 0  1  0  11212   1588 170380 253968   0   0  1638   840  540   910  13  29  58
 0  1  1  11212   2896 168840 253740   0   0   752  4120  342   458   4  45  51
 0  1  0  11216   2012 166392 255560   0   0  1352  2458  549   895   8  14  77
 2  0  1  11216   1588 170744 250164   0   0  1504  1260  503   791   7  48  45
 0  1  1  11224   1588 170704 249948   0   0   874  4106  516   655   6  10  84
 0  1  0  11228   1588 170148 248988   0   0  1442 0  466   772   8  20  73
 1  0  0  11228   1592 171784 247456   0   0   860  3598  362   495   7  44  48
 0  1  0  11228   1588 171864 246212   0   0  1390  3176  510   840   9  41  50
 0  1  2  11232   1992 170344 245832   0   0  1676  1808  539   898  10  45  45
 1  0  1  10508   1632 168204 246780   0 946  1508  2804  599   920   9  20  71
 0  1  0   9496   2020 168904 244880   0   0   936  3620  417   603   5  35  60
 1  0  0   9604   2516 164096 247536   0   0  1700  2214  563  1085  11  33  56
 0  1  0  16196   1820 162112 255492   0   2  1384  1596  497  1106   8  53  38
 1  0  0  19240   3000 158052 260608   0   0   400  3824  373   388   2  14  84
 1  1  1  28756   4508 146032 278104   0   0  1688  2140  612  1502   7  60  33
 2  0  0  39432  29100 105668 300912   0  18  2108  1178  645  1825  12  52  36
 1  0  0  40668  13024 108568 311748   0   0  1674  4992  623  1017   9  12  79
 0  1  0  45324   3484 105072 326432   0   0  1876  3624  619  1090  13  24  63
 1  0  0  53648   1564 102740 337688   0  18   950  3646  404   857   5  31  63
 2  0  0  53672   1604 103356 335680   0 2962  1436  5864  565   976  10  43  47
 1  0  1  54380   1920 103516 334320   0 1086  1826  1626  590  1072  13  45  42
 0  1  1  54600   6532  99568 333860   0 1006   242  5948  277  2680   2  39  59
 0  1  0  54596   1944 103744 331932   0   0  1854  3644  627  1054  11  16  73
 1  0  0  54592   1924 102876 331100   0 950  1956  2612  621  1173  11  41  48
 1  0  0  54592   1592 103576 329568   0   0  1548  4860  605  1106  11  36  53
 0  1  1  54592   1588 102908 328320   0 452  1808  2522  583  1049  11  51  38
 0  1  1  54592   1588 101916 327076   0 866  1816  1260  589  1046  11  49  40
 0  1  0  54592   2076  99568 327776   0 414   992  5728  459  1314   7  25  67
 0  1  0  54592   1588 103928 323824   0   0   968  3646  403   747   5  33  61
 1  0  0  54592   2632 100108 325136   0 402  1856  2468  622  1369  13  44  42
 0  1  0  54592   1588 101872 322600   0 392  1056  2834  461   802   6  35  60
 1  0  1  55644   1724 102108 322404   0 380  1448  2682  501  1032   9  50  41
 1  1  1  57388   1588 103068 322056   0   0  1384  1396  471   780   8  37  56
 0  1  1  58500   2048 102024 323020   0 368   876  3932  504 

Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-17 Thread Daniel Phillips

On Saturday 16 June 2001 23:54, Rik van Riel wrote:
> On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > > Does the patch below do anything good for your laptop? ;)
> >
> > I'll wait for the next one ;-)
>
> OK, here's one which isn't reversed and should work ;))
>
> --- fs/buffer.c.orig  Sat Jun 16 18:05:29 2001
> +++ fs/buffer.c   Sat Jun 16 18:05:15 2001
> @@ -2550,7 +2550,8 @@
>  if the current bh is not yet timed out,
>  then also all the following bhs
>  will be too young. */
> - if (time_before(jiffies, bh->b_flushtime))
> + if (++flushed > bdf_prm.b_un.ndirty &&
> + time_before(jiffies, bh->b_flushtime))
>   goto out_unlock;
>   } else {
>   if (++flushed > bdf_prm.b_un.ndirty)

No, it doesn't, because some way of knowing the disk load is required and 
there's nothing like that here.

There are two components to what I was talking about:

  1) Early flush when load is light
  2) Preemptive cleaning when load is light

Both are supposed to be triggered by other disk activity, swapout or file 
writes, and are supposed to be triggered when the disk activity eases up.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-17 Thread Mike Galbraith

On Sat, 16 Jun 2001, Daniel Phillips wrote:

> On Saturday 16 June 2001 23:06, Rik van Riel wrote:
> > On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > > As a side note, the good old multisecond delay before bdflush kicks in
> > > doesn't really make a lot of sense - when bandwidth is available the
> > > filesystem-initiated writeouts should happen right away.
> >
> > ... thus spinning up the disk ?
>
> Nope, the disk is already spinning, some other writeouts just finished.
>
> > How about just making sure we write out a bigger bunch
> > of dirty pages whenever one buffer gets too old ?
>
> It's simpler than that.  It's basically just: disk traffic low? good, write
> out all the dirty buffers.  Not quite as crude as that, but nearly.
>
> > Does the patch below do anything good for your laptop? ;)
>
> I'll wait for the next one ;-)

Greetings!  (well, not next one, but one anyway)

It _juuust_ so happens that I was tinkering... what do you think of
something like the below?  (and boy do I ever wonder what a certain
box doing slrn stuff thinks of it.. hint hint;)

-Mike

Doing Bonnie in a big fragmented 1k bs partition on the worst spot on
the disk.  Bad benchmark, bad conditions.. but interesting results.

2.4.6.pre3 before
---Sequential Output ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
500  9609 36.0 10569 14.3  3322  6.4  9509 47.6 10597 13.8 101.7  1.4

2.4.6.pre3 after  (using flushto behavior as in defaults)
---Sequential Output ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
500  8293 30.2 11834 29.4  5072  9.5  8879 44.1 10597 13.6 100.4  0.9


2.4.6.pre3 after  (flushto = ndirty)
 ---Sequential Output ---Sequential Input-- --Random--
 -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
 MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
500 10286 38.4 10715 14.4  3267  6.1  9605 47.6 10596 13.4 102.7  1.6


--- fs/buffer.c.org Fri Jun 15 06:48:17 2001
+++ fs/buffer.c Sun Jun 17 09:14:17 2001
@@ -118,20 +118,21 @@
wake-cycle */
int nrefill; /* Number of clean buffers to try to obtain
each time we call refill */
-   int dummy1;   /* unused */
+   int nflushto;   /* Level to flush down to once bdflush starts */
int interval; /* jiffies delay between kupdate flushes */
int age_buffer;  /* Time for normal buffer to age before we flush it */
int nfract_sync; /* Percentage of buffer cache dirty to
activate bdflush synchronously */
-   int dummy2;/* unused */
+   int nmonitor;/* Size (%physpages) at which bdflush should
+ begin monitoring the buffercache */
int dummy3;/* unused */
} b_un;
unsigned int data[N_PARAM];
-} bdf_prm = {{30, 64, 64, 256, 5*HZ, 30*HZ, 60, 0, 0}};
+} bdf_prm = {{60, 64, 64, 50, 5*HZ, 30*HZ, 85, 15, 0}};

 /* These are the min and max parameter values that we will allow to be assigned */
-int bdflush_min[N_PARAM] = {  0,  10,    5,   25,  0,   1*HZ,   0, 0, 0};
-int bdflush_max[N_PARAM] = {100,50000, 20000, 20000,600*HZ, 6000*HZ, 100, 0, 0};
+int bdflush_min[N_PARAM] = {0, 10, 5, 0, 0, 1*HZ, 0, 0, 0};
+int bdflush_max[N_PARAM] = {100,50000, 20000, 100,600*HZ, 6000*HZ, 100, 100, 0};

 /*
  * Rewrote the wait-routines to use the "new" wait-queue functionality,
@@ -763,12 +764,8 @@
balance_dirty(NODEV);
if (free_shortage())
page_launder(GFP_BUFFER, 0);
-   if (!grow_buffers(size)) {
+   if (!grow_buffers(size))
wakeup_bdflush(1);
-   current->policy |= SCHED_YIELD;
-   __set_current_state(TASK_RUNNING);
-   schedule();
-   }
 }

 void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
@@ -1042,25 +1039,43 @@
 1 -> sync flush (wait for I/O completion) */
 int balance_dirty_state(kdev_t dev)
 {
-   unsigned long dirty, tot, hard_dirty_limit, soft_dirty_limit;
-
-   dirty = size_buffers_type[BUF_DIRTY] >> PAGE_SHIFT;
-   tot = nr_free_buffer_pages();
+   unsigned long dirty, cache, buffers = 0;
+   int i;

-   dirty *= 100;
-   soft_dirty_limit = tot * bdf_prm.b_un.nfract;
-   hard_dirty_limit = tot * bdf_prm.b_un.nfract_sync;
-
-   /* First, check for the "real" dirty limit. */
-   if (dirty > soft_dirty_limit) {
-   if (dirty > hard_dirty_limit)
+   for (i = 0; i < NR_LIST; i++)
+   buffers += size_buffers_type[i];
+   buffers >>= PAGE_SHIFT;
+   if (buffers * 100 < num_physpages * bdf_prm.b_un.nmonitor)
+   return 

Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Rik van Riel

On Sat, 16 Jun 2001, Daniel Phillips wrote:

> > Does the patch below do anything good for your laptop? ;)
> 
> I'll wait for the next one ;-)

OK, here's one which isn't reversed and should work ;))

--- fs/buffer.c.origSat Jun 16 18:05:29 2001
+++ fs/buffer.c Sat Jun 16 18:05:15 2001
@@ -2550,7 +2550,8 @@
   if the current bh is not yet timed out,
   then also all the following bhs
   will be too young. */
-   if (time_before(jiffies, bh->b_flushtime))
+   if (++flushed > bdf_prm.b_un.ndirty &&
+   time_before(jiffies, bh->b_flushtime))
goto out_unlock;
} else {
if (++flushed > bdf_prm.b_un.ndirty)

cheers,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Daniel Phillips

On Saturday 16 June 2001 23:06, Rik van Riel wrote:
> On Sat, 16 Jun 2001, Daniel Phillips wrote:
> > As a side note, the good old multisecond delay before bdflush kicks in
> > doesn't really make a lot of sense - when bandwidth is available the
> > filesystem-initiated writeouts should happen right away.
>
> ... thus spinning up the disk ?

Nope, the disk is already spinning, some other writeouts just finished.

> How about just making sure we write out a bigger bunch
> of dirty pages whenever one buffer gets too old ?

It's simpler than that.  It's basically just: disk traffic low? good, write 
out all the dirty buffers.  Not quite as crude as that, but nearly.

> Does the patch below do anything good for your laptop? ;)

I'll wait for the next one ;-)

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Rik van Riel

On Sat, 16 Jun 2001, Rik van Riel wrote:


Oops, I did something stupid and the patch is reversed ;)


> --- buffer.c.orig Sat Jun 16 18:05:15 2001
> +++ buffer.c  Sat Jun 16 18:05:29 2001
> @@ -2550,8 +2550,7 @@
>  if the current bh is not yet timed out,
>  then also all the following bhs
>  will be too young. */
> - if (++flushed > bdf_prm.b_un.ndirty &&
> - time_before(jiffies, bh->b_flushtime))
> + if(time_before(jiffies, bh->b_flushtime))
>   goto out_unlock;
>   } else {
>   if (++flushed > bdf_prm.b_un.ndirty)


Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Rik van Riel

On Sat, 16 Jun 2001, Daniel Phillips wrote:

> In other words, any episode of pageouts is followed immediately by a
> short episode of preemptive cleaning.

linux/mm/vmscan.c::page_launder(), around line 666:
/* Let bdflush take care of the rest. */
wakeup_bdflush(0);
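
(With block == 0 that call just pokes the bdflush kernel thread and
returns; a simplified paraphrase of the 2.4 fs/buffer.c code, details
omitted:)

void wakeup_bdflush(int block)
{
        if (current == bdflush_tsk)
                return;                 /* bdflush never waits on itself */
        wake_up_process(bdflush_tsk);   /* kick the flush daemon */
        if (block)
                flush_dirty_buffers(0); /* caller writes some out itself */
}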


> The definition of 'for a while' and 'plenty of disk bandwidth' can be
> tuned, but I don't think either is particularly critical.

Can be tuned a bit, indeed.

> As a side note, the good old multisecond delay before bdflush kicks in 
> doesn't really make a lot of sense - when bandwidth is available the 
> filesystem-initiated writeouts should happen right away.

... thus spinning up the disk ?

How about just making sure we write out a bigger bunch
of dirty pages whenever one buffer gets too old ?

Does the patch below do anything good for your laptop? ;)

regards,

Rik
--


--- buffer.c.orig       Sat Jun 16 18:05:15 2001
+++ buffer.c            Sat Jun 16 18:05:29 2001
@@ -2550,8 +2550,7 @@
            if the current bh is not yet timed out,
            then also all the following bhs
            will be too young. */
-        if (++flushed > bdf_prm.b_un.ndirty &&
-            time_before(jiffies, bh->b_flushtime))
+        if(time_before(jiffies, bh->b_flushtime))
             goto out_unlock;
         } else {
             if (++flushed > bdf_prm.b_un.ndirty)




Re: spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Daniel Phillips

On Friday 15 June 2001 17:23, Pavel Machek wrote:
> Hi!
>
> > Roger> It does if you are running on a laptop. Then you do not want
> > Roger> the pages go out all the time. Disk has gone to sleep, needs
> > Roger> to start to write a few pages, stays idle for a while, goes to
> > Roger> sleep, a few more pages, ...
> >
> > That could be handled by a metric which says if the disk is spun down,
> > wait until there is more memory pressure before writing.  But if the
> > disk is spinning, we don't care, you should start writing out buffers
> > at some low rate to keep the pressure from rising too rapidly.
>
> Notice that write is not free (in terms of power) even if disk is spinning.
> Seeks (etc) also take some power. And think about flashcards. It certainly
> is cheaper than spinning the disk up but still not free.
>
> Also note that kernel does not [currently] know that disks went spindown.

There's an easy answer that should work well on both servers and laptops, 
that goes something like this: when memory pressure has been brought to 0, if 
there is plenty of disk bandwidth available, continue writeout for a 
while and clean some extra pages.  In other words, any episode of pageouts 
is followed immediately by a short episode of preemptive cleaning.
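
(Hypothetically, with invented names, the episode might look like
this:)

static void preemptive_clean(void)
{
        unsigned long end = jiffies + CLEANING_EPISODE;  /* 'a while' */

        while (time_before(jiffies, end)) {
                if (memory_pressure() > 0)
                        break;  /* real pageouts resumed; let them win */
                if (!disk_bandwidth_plenty())
                        break;  /* bandwidth no longer plentiful */
                if (!clean_some_dirty_pages())
                        break;  /* nothing left; disk may spin down */
        }
}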

This gives both the preemptive cleaning we want in order to respond to the 
next surge, and lets the laptop disk spin down.  The definition of 'for a 
while' and 'plenty of disk bandwidth' can be tuned, but I don't think either 
is particularly critical.

As a side note, the good old multisecond delay before bdflush kicks in 
doesn't really make a lot of sense - when bandwidth is available the 
filesystem-initiated writeouts should happen right away.

It's not necessary or desirable to write out more dirty pages after the 
machine has been idle for a while, if only because the longer it's idle the 
less the 'surge protection' matters in terms of average throughput.

--
Daniel



spindown [was Re: 2.4.6-pre2, pre3 VM Behavior]

2001-06-16 Thread Pavel Machek

Hi!

> Roger> It does if you are running on a laptop. Then you do not want
> Roger> the pages go out all the time. Disk has gone to sleep, needs
> Roger> to start to write a few pages, stays idle for a while, goes to
> Roger> sleep, a few more pages, ...
> 
> That could be handled by a metric which says if the disk is spun down,
> wait until there is more memory pressure before writing.  But if the
> disk is spinning, we don't care, you should start writing out buffers
> at some low rate to keep the pressure from rising too rapidly.  
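
(Sketched with invented helpers; disk_is_spun_down() is exactly the
information Pavel notes below that the kernel lacks:)

static void writeback_policy(void)
{
        if (disk_is_spun_down()) {
                /* Spinning up is expensive: hold writes back until
                 * memory pressure genuinely demands them. */
                if (memory_pressure() < HIGH_WATERMARK)
                        return;
        }
        /* Disk spinning (or pressure high): write buffers out at a
         * low rate so pressure cannot build up too rapidly. */
        trickle_writeout();
}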

Notice that write is not free (in terms of power) even if disk is spinning.
Seeks (etc) also take some power. And think about flashcards. It certainly
is cheaper than spinning the disk up but still not free.

Also note that kernel does not [currently] know that disks went spindown.
Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Notebook disk spindown

2000-09-14 Thread Pavel Machek

Hi!
 
> >Thanks for this patch. But why hasn't it been included in
> >the kernel earlier? Wouldn't a combination of yours and my
> 
> It's basically included in 2.4.x.
> 
> >patch be the proper way? As far as I understand you switch
> 
> Your patch is sure fine. BTW, 2.4.x has a high limit of 10 minutes (as
> opposed to 2.2.x, which has a high limit of 1 minute). I'd suggest
> cleaning the patch to only increase the high limit value (a one-liner). Thanks.

10 minutes still seems a little low to me. What is the high limit good for, anyway?

Pavel

-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Notebook disk spindown

2000-09-14 Thread Pavel Machek

Hi!

> Hello developers,
> 
> this is a short description of a particular wish of notebook
> users. Since kernel 2.2.11 the buffer flushing daemon is no longer
> a user space program but part of the kernel (in fs/buffer.c).
> 
> Before this kernel release it was the bdflush-program which
> could be called with certain command line parameters in order
> to control flushing of file system buffers. In particular in
> combination with the hdparm-program it could be used to
> spin down the hard disk (e.g. of laptops) if it was not accessed.
> 
> I know that by writing to /proc/sys/vm/bdflush relevant kernel
> parameters may be modified. But there are certain limits compiled
> into every kernel, which have the consequence that
> 
> *a silent hard disk is no longer feasible since kernel
> 2.2.11*.
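
(For reference, those limits guard the nine integers behind
/proc/sys/vm/bdflush. A hypothetical userspace tweak of the buffer-age
field follows; the field index and HZ value assume the 2.2/2.4 layout
and should be checked against the kernel's own documentation:)

#include <stdio.h>

int main(void)
{
        unsigned p[9] = {0};
        int i;
        FILE *f = fopen("/proc/sys/vm/bdflush", "r+");

        if (!f) {
                perror("/proc/sys/vm/bdflush");
                return 1;
        }
        for (i = 0; i < 9; i++)
                fscanf(f, "%u", &p[i]);

        p[5] = 600 * 100;       /* age_buffer: ~10 min in jiffies, HZ=100 */

        rewind(f);
        for (i = 0; i < 9; i++)
                fprintf(f, "%u ", p[i]);
        fclose(f);
        return 0;
}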

Well, with the noflushd daemon, it works for me.

> I have modified the constants used in fs/buffer.c to allow for
> bigger time intervals between forced buffer flushings. The "patch"
> may be found at http://www.hmi.de/people/brunne/Spindown .

Can you mail me the patch? [I do not have an easy way to access the web.]

> Shouldn't Linux support hard disk spindown during periods of
> inactivity? Is the tiny patch worthy of being included in standard
> kernels?

Noflushd is a little hacky. If your patch is really tiny, post the actual patch to 
l-k for discussion. [Can you gracefully handle the case of a few active and a few
inactive disks?]
Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.




Re: Notebook disk spindown

2000-09-12 Thread Daniel Kobras

On Tue, 12 Sep 2000, Jamie Lokier wrote:

> Dave Zarzycki wrote:
> > Personally speaking, I always thought it would be nice if the kernel
> > flushed dirty buffers right before a disk spins down. It seems silly to me
> > that a disk can spin down with writes pending.
> 
> Absolutely.  That allows more time spun down too.

Pavel Machek sent me a patch for noflushd to do exactly this. Need not be
a kernel issue either.

Regards,

Daniel.

-- 
GNU/Linux Audio Mechanics - http://www.glame.de
  Cutting Edge Office - http://www.c10a02.de
  GPG Key ID 89BF7E2B - http://www.keyserver.net




Re: Notebook disk spindown

2000-09-12 Thread Pavel Machek

Hi!

> > On Sat, 9 Sep 2000 [EMAIL PROTECTED] wrote:
> > > Would it be possible to detect when the disk spins up, and do the flush then?
> > Yes, if you had continuous polling of power status wrt standby.
> 
> I think the following flushing policy would work almost as well, while
> remaining generic:
> 
>  - if there's a read that is not handled from the buffer cache, flush
> (write) all dirty buffers
>  - if we need to flush (write) one dirty buffer, flush all others too
> 
> This wouldn't catch cases like an explicit spin-up without data I/O,
> but I don't think this is much of a problem in real life.
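
(In sketch form, with invented names, purely to mark the two trigger
points:)

/* Trigger 1: a read missed the buffer cache, so the disk must be
 * active anyway; piggyback every pending write on that access. */
static void on_uncached_read(void)
{
        flush_all_dirty();
}

/* Trigger 2: one dirty buffer has to go out (say it timed out);
 * writing the rest too costs little extra and buys a longer idle
 * period afterwards. */
static void on_forced_flush(void)
{
        flush_all_dirty();
}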

noflushd works for me. It monitors the "read/write" counters in
/proc/stat, and if it detects activity, it syncs(). If it detects an
idle period, it syncs() and then spins the disk down.
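
(A rough userspace sketch of that loop; hypothetical, not noflushd's
actual source, and the device path is an assumption:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sample the disk_io line from /proc/stat; comparing the raw line
 * between polls avoids depending on its exact field format. */
static void read_disk_io(char *buf, size_t len)
{
        FILE *f = fopen("/proc/stat", "r");
        char line[512];

        buf[0] = '\0';
        if (!f)
                return;
        while (fgets(line, sizeof(line), f))
                if (!strncmp(line, "disk_io", 7))
                        snprintf(buf, len, "%s", line);
        fclose(f);
}

int main(void)
{
        char last[512], now[512];
        int quiet = 0;

        read_disk_io(last, sizeof(last));
        for (;;) {
                sleep(10);
                read_disk_io(now, sizeof(now));
                if (strcmp(now, last)) {
                        sync();         /* activity: keep buffers clean */
                        quiet = 0;
                } else if (++quiet >= 30) {     /* ~5 minutes idle */
                        sync();         /* flush first, then spin down */
                        system("hdparm -y /dev/hda");
                        quiet = 0;
                }
                strcpy(last, now);
        }
}
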
Pavel
-- 
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]



Re: Notebook disk spindown

2000-09-12 Thread Jamie Lokier

Dave Zarzycki wrote:
> Personally speaking, I always thought it would be nice if the kernel
> flushed dirty buffers right before a disk spins down. It seems silly to me
> that a disk can spin down with writes pending.

Absolutely.  That allows more time spun down too.

-- Jamie


