subject:"\[PATCH\] mm\: fix page_mkclean_one \(was\: 2.6.19 file content corruption on ext3\)"

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton

On Sun, 7 Jan 2007 12:36:18 +1030
"Tom Lanyon" <[EMAIL PROTECTED]> wrote:

> On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > What would also actually be interesting is whether somebody can reproduce
> > this on Reiserfs, for example. I _think_ all the reports I've seen are on
> > ext2 or ext3, and if this is somehow writeback-related, it could be some
> > bug that is just shared between the two by virtue of them still having a
> > lot of stuff in common.
> >
> > Linus
> 
> I've been following this thread for a while now as I started
> experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
> am using reiserfs.

reiserfs defaults to data=ordered, so it's quite possibly the same bug.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote:

I've been following this thread for a while now as I started
experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
am using reiserfs.


However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far...

--
Tom Lanyon
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:

What would also actually be interesting is whether somebody can reproduce
this on Reiserfs, for example. I _think_ all the reports I've seen are on
ext2 or ext3, and if this is somehow writeback-related, it could be some
bug that is just shared between the two by virtue of them still having a
lot of stuff in common.

Linus


I've been following this thread for a while now as I started
experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
am using reiserfs.

--
Tom Lanyon
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote:

What would also actually be interesting is whether somebody can reproduce
this on Reiserfs, for example. I _think_ all the reports I've seen are on
ext2 or ext3, and if this is somehow writeback-related, it could be some
bug that is just shared between the two by virtue of them still having a
lot of stuff in common.

Linus


I've been following this thread for a while now as I started
experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
am using reiserfs.

--
Tom Lanyon
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon


On 1/7/07, Tom Lanyon [EMAIL PROTECTED] wrote:

I've been following this thread for a while now as I started
experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
am using reiserfs.


However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far...

--
Tom Lanyon
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton

On Sun, 7 Jan 2007 12:36:18 +1030
Tom Lanyon [EMAIL PROTECTED] wrote:

 On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote:
  What would also actually be interesting is whether somebody can reproduce
  this on Reiserfs, for example. I _think_ all the reports I've seen are on
  ext2 or ext3, and if this is somehow writeback-related, it could be some
  bug that is just shared between the two by virtue of them still having a
  lot of stuff in common.
 
  Linus
 
 I've been following this thread for a while now as I started
 experiencing file corruption in rtorrent when I upgraded to 2.6.19. I
 am using reiserfs.

reiserfs defaults to data=ordered, so it's quite possibly the same bug.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds

On Thu, 28 Dec 2006, Martin Schwidefsky wrote:
> 
> For s390 there are two aspects to consider:
> 1) the pte values are 100% software controlled.

That's fine. In that situation, you shouldn't need any atomic ops at all, 
I think all our sw page-table operations are already done under the pte 
lock. 

The reason x86 needs to be careful is exactly the fact that the hardware 
will obviously do a lot on its own, and the hardware is _not_ going to 
honor our page table locking ;)

In an all-sw situation, a lot of this should be easier. S390 has _other_ 
things that are inconvenient (the strange "dirty bit is not in the page 
tables" thing that makes it look different from everybody else), but hey, 
it's a balance..

So for s390, ptep_exchange() in my example should be able to be a simple 
"load old value and store new one", assuming everybody honors the pte lock 
(and they _should_).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky

On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote:
> What do you guys think? Does something like this work out for S/390 too? I
> tried to make that "ptep_flush_dirty()" concept work for architectures
> that hide the dirty bit somewhere else too, but..

For s390 there are two aspects to consider:
1) the pte values are 100% software controlled. They only change because
a cpu stored a value to it or issued one of the specialized instructions
(csp, ipte and idte). The ptep_flush_dirty would be a nop for s390.
2) ptep_exchange is a bit dangerous. For s390 we need a lock that
protects the software controlled updates of the ptes. The reason is the
ipte instruction. It is implemented by the machine microcode in a
non-atomic way in regard to the memory. It reads the byte of the pte
that contains the invalid bit, flushes the tlb entries for it and then
writes back the byte with the invalid bit set. The microcode makes sure
that this pte cannot be used for form a new tlb on any cpu while the
ipte is in progress.
That means a compare-and-swap semantics on ptes won't work together with
the ipte optimization. As long as there is the pte lock that protects
all software accesses to the pte we are fine. But if any code expects
that ptep_exchange does something like an xchg things break.

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH

"Reality continues to ruin my life." - Calvin.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

I do get this error on reiserfs ( old one, didn't try on reiser4 ).
Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
debian bts.


I've had reports of corrupted data on earlier kernel releases with
reiserfs3, which were fixed by upgrading to reiserfs4.

Jari Sundell
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn

On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote:
> What would also actually be interesting is whether somebody can reproduce 
> this on Reiserfs, for example. I _think_ all the reports I've seen are on 
> ext2 or ext3, and if this is somehow writeback-related, it could be some 
> bug that is just shared between the two by virtue of them still having a 
> lot of stuff in common. 
> 
>   Linus
I do get this error on reiserfs ( old one, didn't try on reiser4 ). 
Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
debian bts.

flo attenberger

---
Linux master 2.6.19 #1 PREEMPT Thu Dec 21 10:55:34 CET 2006 x86_64
GNU/Linux

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.19
# Thu Dec 21 10:45:05 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_REORDER=y
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SYSFS_DEPRECATED=y
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_HOTKEY=m
CONFIG_ACPI_FAN=m
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:


 - It never uses mprotect on the shared mappings, but it _does_ do:
"mincore()" - but the return values don't much matter (it's used
  as a heuristic on which parts to hash, apparently)

  I double- and triple-checked this one, because I
  did make changes to "mincore()", but those didn't go
  into the affected kernels anyway (ie they are not in
  plain 2.6.19, nor in 2.6.18.3 either)


Correct, mincore is only used to check if it should delay the hash checking.


"madvise(MADV_WILLNEED)"
"msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag)
"munmap()" of course

 - it never seems to mix mmap() and write() - it does _only_ mmap.

 - it seems to mmap/munmap the shared files in nice 64-page chunks, all
   64-page aligned in the file (ie it does NOT create one big mapping, it
   has some kind of LRU of thse 64-page chunks). The only exception being
   the last chunk, which it maps byte-accurate to the size.


The length of the chunks is only page aligned on single file torrents,
not so on multi-file torrents. I've attached a patch for rtorrent that
will extend the length to the page boundary.


 - I haven't checked whether it only ever has the same chunk mapped once
   at a time.


This should be the case, but two mapped chunks may share a page,
sometimes with different r/w permissions.

Jari Sundell


extend_mapping.diff
Description: Binary data

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote:
snip

 - It never uses mprotect on the shared mappings, but it _does_ do:
mincore() - but the return values don't much matter (it's used
  as a heuristic on which parts to hash, apparently)

  I double- and triple-checked this one, because I
  did make changes to mincore(), but those didn't go
  into the affected kernels anyway (ie they are not in
  plain 2.6.19, nor in 2.6.18.3 either)


Correct, mincore is only used to check if it should delay the hash checking.


madvise(MADV_WILLNEED)
msync(MS_ASYNC) (or MS_SYNC if you use a command line flag)
munmap() of course

 - it never seems to mix mmap() and write() - it does _only_ mmap.

 - it seems to mmap/munmap the shared files in nice 64-page chunks, all
   64-page aligned in the file (ie it does NOT create one big mapping, it
   has some kind of LRU of thse 64-page chunks). The only exception being
   the last chunk, which it maps byte-accurate to the size.


The length of the chunks is only page aligned on single file torrents,
not so on multi-file torrents. I've attached a patch for rtorrent that
will extend the length to the page boundary.


 - I haven't checked whether it only ever has the same chunk mapped once
   at a time.


This should be the case, but two mapped chunks may share a page,
sometimes with different r/w permissions.

Jari Sundell


extend_mapping.diff
Description: Binary data

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn

On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote:
 What would also actually be interesting is whether somebody can reproduce 
 this on Reiserfs, for example. I _think_ all the reports I've seen are on 
 ext2 or ext3, and if this is somehow writeback-related, it could be some 
 bug that is just shared between the two by virtue of them still having a 
 lot of stuff in common. 
 
   Linus
I do get this error on reiserfs ( old one, didn't try on reiser4 ). 
Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
debian bts.

flo attenberger

---
Linux master 2.6.19 #1 PREEMPT Thu Dec 21 10:55:34 CET 2006 x86_64
GNU/Linux

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.19
# Thu Dec 21 10:45:05 2006
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED=cfq

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_REORDER=y
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SYSFS_DEPRECATED=y
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
CONFIG_ACPI_AC=m
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_HOTKEY=m
CONFIG_ACPI_FAN=m
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=m
CONFIG_ACPI_THERMAL=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
#

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell


On 12/27/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I do get this error on reiserfs ( old one, didn't try on reiser4 ).
Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the
debian bts.


I've had reports of corrupted data on earlier kernel releases with
reiserfs3, which were fixed by upgrading to reiserfs4.

Jari Sundell
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky

On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote:
 What do you guys think? Does something like this work out for S/390 too? I
 tried to make that ptep_flush_dirty() concept work for architectures
 that hide the dirty bit somewhere else too, but..

For s390 there are two aspects to consider:
1) the pte values are 100% software controlled. They only change because
a cpu stored a value to it or issued one of the specialized instructions
(csp, ipte and idte). The ptep_flush_dirty would be a nop for s390.
2) ptep_exchange is a bit dangerous. For s390 we need a lock that
protects the software controlled updates of the ptes. The reason is the
ipte instruction. It is implemented by the machine microcode in a
non-atomic way in regard to the memory. It reads the byte of the pte
that contains the invalid bit, flushes the tlb entries for it and then
writes back the byte with the invalid bit set. The microcode makes sure
that this pte cannot be used for form a new tlb on any cpu while the
ipte is in progress.
That means a compare-and-swap semantics on ptes won't work together with
the ipte optimization. As long as there is the pte lock that protects
all software accesses to the pte we are fine. But if any code expects
that ptep_exchange does something like an xchg things break.

-- 
blue skies,
  Martin.

Martin Schwidefsky
Linux for zSeries Development  Services
IBM Deutschland Entwicklung GmbH

Reality continues to ruin my life. - Calvin.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds



On Thu, 28 Dec 2006, Martin Schwidefsky wrote:
 
 For s390 there are two aspects to consider:
 1) the pte values are 100% software controlled.

That's fine. In that situation, you shouldn't need any atomic ops at all, 
I think all our sw page-table operations are already done under the pte 
lock. 

The reason x86 needs to be careful is exactly the fact that the hardware 
will obviously do a lot on its own, and the hardware is _not_ going to 
honor our page table locking ;)

In an all-sw situation, a lot of this should be easier. S390 has _other_ 
things that are inconvenient (the strange dirty bit is not in the page 
tables thing that makes it look different from everybody else), but hey, 
it's a balance..

So for s390, ptep_exchange() in my example should be able to be a simple 
load old value and store new one, assuming everybody honors the pte lock 
(and they _should_).

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds

On Tue, 26 Dec 2006, Nick Piggin wrote:

> Linus Torvalds wrote:
> > 
> > Ok, so how about this diff.
> > 
> > I'm actually feeling good about this one. It really looks like
> > "do_no_page()" was simply buggy, and that this explains everything.
> 
> Still trying to catch up here, so I'm not going to reply to any old
> stuff and just start at the tip of the thread... Other than to say
> that I really like cancel_page_dirty ;)

Yeah, I think that part is a bit clearer about what's going on now.

> I think your patch is quite right so that's a good catch.

Actually, since people told me it didn't matter, I went back and looked at 
_why_ - the thing is, "vma->vm_page_prot" should always be read-only 
anyway, except for mappings that don't do dirty accounting at all, so I 
think my patch only found cases that are unimportant (ie pages that get 
faulted on on filesystems like ramfs that doesn't do any dirty page 
accounting because they're all dirty anyway).

> But I'm not too surprised that it does not help the problem, because I 
> don't think we have started shedding any old pte_dirty tests at 
> unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, 
> as such.

True. We should no longer _need_ those dirty bit reclaims at 
unmap/reclaim, but we still do them, so you're right, even if we were 
buggy in this area, it should only really matter for the dirty page 
counting, not for any lost data.

> I was hoping that you've almost narrowed it down to the filesystem
> writeback code, with the last few mails?

I think so, yes.

However, I've checked, and "rtorrent" really does seem to be fairly 
well-behaved wrt any filesystem activity. It does

 - no threading. It's 100% single-threaded, and doesn't even appear to use 
   signals.

 - exactly _one_ "ftruncate()", and it does it at the beginning, for the 
   full final size.

   IOW, it's not anything subtle with truncate and dirty page cancel.

 - It never uses mprotect on the shared mappings, but it _does_ do:
"mincore()" - but the return values don't much matter (it's used 
  as a heuristic on which parts to hash, apparently)

  I double- and triple-checked this one, because I
  did make changes to "mincore()", but those didn't go 
  into the affected kernels anyway (ie they are not in 
  plain 2.6.19, nor in 2.6.18.3 either)

"madvise(MADV_WILLNEED)"
"msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag)
"munmap()" of course

 - it never seems to mix mmap() and write() - it does _only_ mmap.

 - it seems to mmap/munmap the shared files in nice 64-page chunks, all 
   64-page aligned in the file (ie it does NOT create one big mapping, it 
   has some kind of LRU of thse 64-page chunks). The only exception being 
   the last chunk, which it maps byte-accurate to the size.

 - I haven't checked whether it only ever has the same chunk mapped once 
   at a time.

Anyway, the _one_ half-way interesting thing is the fact that it doesn't 
allocate any backing store at all for the file, and as such the page 
writeback needs to create all the underlying buffers on the filesystem. I 
really don't see why that would be a problem either, but I could imagine 
that if we have some writeback bug where we can end up writing back the 
_same_ page concurrently, we'd actually end up racing in the kernel, and 
allocating two different backing stores, and then maybe the other one 
would effectively "get lost" (and the earlier writeback would win the 
race, explaining why we'd end up with zeroes at the end of a block).

Or something.

However, all the codepaths _seem_ to test for PG_writeback, and not even 
try to start another writeback while the first one is still active.

What would also actually be interesting is whether somebody can reproduce 
this on Reiserfs, for example. I _think_ all the reports I've seen are on 
ext2 or ext3, and if this is somehow writeback-related, it could be some 
bug that is just shared between the two by virtue of them still having a 
lot of stuff in common. 

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote:
> On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
> > 
> > 
> > On Sun, 24 Dec 2006, Andrei Popa wrote:
> > > 
> > > Hash check on download completion found bad chunks, consider using
> > > "safe_sync".
> > 
> > Dang. Did you get any warning messages from the kernel?
> > 
> > Linus
> 
> BTW, rmap.c patch is broken - needs at least

... but that doesn't affect most of the architectures - only sparc64 and
some of powerpc.  So it's definitely not enough.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
> 
>   Linus

BTW, rmap.c patch is broken - needs at least

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/mm/rmap.c b/mm/rmap.c
index 57306fa..669acb2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,7 +452,7 @@ static int page_mkclean_one(struct page 
entry = ptep_clear_flush(vma, address, pte);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
-   set_pte_at(vma, address, pte, entry);
+   set_pte_at(mm, address, pte, entry);
lazy_mmu_prot_update(entry);
ret = 1;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin


Linus Torvalds wrote:


On Sun, 24 Dec 2006, Linus Torvalds wrote:

Peter, tell me I'm crazy, but with the new rules, the following condition 
is a bug:


- shared mapping
- writable
- not already marked dirty in the PTE



Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like 
"do_no_page()" was simply buggy, and that this explains everything.


Still trying to catch up here, so I'm not going to reply to any old
stuff and just start at the tip of the thread... Other than to say
that I really like cancel_page_dirty ;)

I think your patch is quite right so that's a good catch. But I'm not
too surprised that it does not help the problem, because I don't
think we have started shedding any old pte_dirty tests at
unmap/reclaim-time, have we? So the dirty bit isn't going to get lost,
as such.

I was hoping that you've almost narrowed it down to the filesystem
writeback code, with the last few mails?

Nick

Please please please test. Throw all the other patches away (with the 
possible exception of the "update_mmu_cache()" sanity checker, which is 
still interesting in case some _other_ place does this too).


Don't do the "wait_on_page_writeback()" thing, because it changes timings 
and might hide thngs for the wrong reasons.  Just apply this on top of a 
known failing kernel, and test.


Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);




--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin


Linus Torvalds wrote:


On Sun, 24 Dec 2006, Linus Torvalds wrote:

Peter, tell me I'm crazy, but with the new rules, the following condition 
is a bug:


- shared mapping
- writable
- not already marked dirty in the PTE



Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like 
do_no_page() was simply buggy, and that this explains everything.


Still trying to catch up here, so I'm not going to reply to any old
stuff and just start at the tip of the thread... Other than to say
that I really like cancel_page_dirty ;)

I think your patch is quite right so that's a good catch. But I'm not
too surprised that it does not help the problem, because I don't
think we have started shedding any old pte_dirty tests at
unmap/reclaim-time, have we? So the dirty bit isn't going to get lost,
as such.

I was hoping that you've almost narrowed it down to the filesystem
writeback code, with the last few mails?

Nick

Please please please test. Throw all the other patches away (with the 
possible exception of the update_mmu_cache() sanity checker, which is 
still interesting in case some _other_ place does this too).


Don't do the wait_on_page_writeback() thing, because it changes timings 
and might hide thngs for the wrong reasons.  Just apply this on top of a 
known failing kernel, and test.


Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma-vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);




--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
 
 
 On Sun, 24 Dec 2006, Andrei Popa wrote:
  
  Hash check on download completion found bad chunks, consider using
  safe_sync.
 
 Dang. Did you get any warning messages from the kernel?
 
   Linus

BTW, rmap.c patch is broken - needs at least

Signed-off-by: Al Viro [EMAIL PROTECTED]
---
diff --git a/mm/rmap.c b/mm/rmap.c
index 57306fa..669acb2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -452,7 +452,7 @@ static int page_mkclean_one(struct page 
entry = ptep_clear_flush(vma, address, pte);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
-   set_pte_at(vma, address, pte, entry);
+   set_pte_at(mm, address, pte, entry);
lazy_mmu_prot_update(entry);
ret = 1;
}
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro

On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote:
 On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote:
  
  
  On Sun, 24 Dec 2006, Andrei Popa wrote:
   
   Hash check on download completion found bad chunks, consider using
   safe_sync.
  
  Dang. Did you get any warning messages from the kernel?
  
  Linus
 
 BTW, rmap.c patch is broken - needs at least

... but that doesn't affect most of the architectures - only sparc64 and
some of powerpc.  So it's definitely not enough.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds



On Tue, 26 Dec 2006, Nick Piggin wrote:

 Linus Torvalds wrote:
  
  Ok, so how about this diff.
  
  I'm actually feeling good about this one. It really looks like
  do_no_page() was simply buggy, and that this explains everything.
 
 Still trying to catch up here, so I'm not going to reply to any old
 stuff and just start at the tip of the thread... Other than to say
 that I really like cancel_page_dirty ;)

Yeah, I think that part is a bit clearer about what's going on now.

 I think your patch is quite right so that's a good catch.

Actually, since people told me it didn't matter, I went back and looked at 
_why_ - the thing is, vma-vm_page_prot should always be read-only 
anyway, except for mappings that don't do dirty accounting at all, so I 
think my patch only found cases that are unimportant (ie pages that get 
faulted on on filesystems like ramfs that doesn't do any dirty page 
accounting because they're all dirty anyway).

 But I'm not too surprised that it does not help the problem, because I 
 don't think we have started shedding any old pte_dirty tests at 
 unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, 
 as such.

True. We should no longer _need_ those dirty bit reclaims at 
unmap/reclaim, but we still do them, so you're right, even if we were 
buggy in this area, it should only really matter for the dirty page 
counting, not for any lost data.

 I was hoping that you've almost narrowed it down to the filesystem
 writeback code, with the last few mails?

I think so, yes.

However, I've checked, and rtorrent really does seem to be fairly 
well-behaved wrt any filesystem activity. It does

 - no threading. It's 100% single-threaded, and doesn't even appear to use 
   signals.

 - exactly _one_ ftruncate(), and it does it at the beginning, for the 
   full final size.

   IOW, it's not anything subtle with truncate and dirty page cancel.

 - It never uses mprotect on the shared mappings, but it _does_ do:
mincore() - but the return values don't much matter (it's used 
  as a heuristic on which parts to hash, apparently)

  I double- and triple-checked this one, because I
  did make changes to mincore(), but those didn't go 
  into the affected kernels anyway (ie they are not in 
  plain 2.6.19, nor in 2.6.18.3 either)

madvise(MADV_WILLNEED)
msync(MS_ASYNC) (or MS_SYNC if you use a command line flag)
munmap() of course

 - it never seems to mix mmap() and write() - it does _only_ mmap.

 - it seems to mmap/munmap the shared files in nice 64-page chunks, all 
   64-page aligned in the file (ie it does NOT create one big mapping, it 
   has some kind of LRU of thse 64-page chunks). The only exception being 
   the last chunk, which it maps byte-accurate to the size.

 - I haven't checked whether it only ever has the same chunk mapped once 
   at a time.

Anyway, the _one_ half-way interesting thing is the fact that it doesn't 
allocate any backing store at all for the file, and as such the page 
writeback needs to create all the underlying buffers on the filesystem. I 
really don't see why that would be a problem either, but I could imagine 
that if we have some writeback bug where we can end up writing back the 
_same_ page concurrently, we'd actually end up racing in the kernel, and 
allocating two different backing stores, and then maybe the other one 
would effectively get lost (and the earlier writeback would win the 
race, explaining why we'd end up with zeroes at the end of a block).

Or something.

However, all the codepaths _seem_ to test for PG_writeback, and not even 
try to start another writeback while the first one is still active.

What would also actually be interesting is whether somebody can reproduce 
this on Reiserfs, for example. I _think_ all the reports I've seen are on 
ext2 or ext3, and if this is somehow writeback-related, it could be some 
bug that is just shared between the two by virtue of them still having a 
lot of stuff in common. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]:
> And if this doesn't fix it, I don't know what will..

Sorry, but it still fails (on top of plain 2.6.19).
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin

> Quoting Linus Torvalds <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content 
> corruption on ext3)
>
> Peter, tell me I'm crazy, but with the new rules, the following condition 
> is a bug:
> 
>  - shared mapping
>  - writable
>  - not already marked dirty in the PTE
> 
> because that combination means that the hardware can mark the PTE dirty 
> without us even realizing (and thus not marking the "struct page *" 
> dirty).

Er.
Sorry about bumping in, and I'm not sure I understand all of the discussion,
but this reminded me of an old issue with COW that created what looks
like a vaguely similiar data corruption on infiniband. We solved this for
infiniband with MADV_DONTFORK, but I always wondered why does it not affect
other parts of kernel.  Small reminder from that discussion:

down mmap sem
get user pages
up mmap sem
page becomes shared, and COW (e.g. fork)
process writes to first byte of page <- gets a copy
Now we had a problem: struct page that we got from get user pages
does not point to a correct page in our process.
For example: if at some point we map this page for DMA, and
hardware writes to last byte of page -> process does not
see this data.

So for infiniband, what we do is a combination of
- prevent page from becoming COW while hardware might DMA to this page, and
- ask users not to write to page if hardware might DMA to same page
  (even if its using different bytes).

I just wandered - is there some chance something like this could be happening in
the fs code?

HTH,

-- 
MST
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:


Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like
"do_no_page()" was simply buggy, and that this explains everything.


I tested with just this patch and 2.6.19 and no change. Sorry Linus,
no early Christmas present :-(

Gordon

--
Gordon Farquharson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
> 

only these:
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think has anything to do with...

>   Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Andrei Popa wrote:
> 
> Hash check on download completion found bad chunks, consider using
> "safe_sync".

Dang. Did you get any warning messages from the kernel?

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> > 
> > The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> > this patch and 2.6.19.
> 
> Yeah, if my guess about do_no_page() is right, _none_ of the previous 
> patches should have ANY effect what-so-ever. In fact, I'd say that even 
> the "ext3 works in writeback mode" thing that Andrei reports is probably a 
> total fluke brought on by timing changes rather than anything else.
> 
> So please try the latest patch instead (on top of anything that shows 
> corruption reliably - the patch should be _totally_ independent of all the 
> other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
> 2.6.19 too, so anything that shows corruption is a fine target - but try 
> to choose something that has been the "best" at corrupting things for you, 
> to make the testing as good as possible).
> 
> Patch included here again (although I think you were cc'd on my previous 
> email too, so you should already have it, and our emails just crossed)
> 
> And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
"safe_sync".

> 
>   Linus
> 
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
>   if (pte_none(*page_table)) {
>   flush_icache_page(vma, new_page);
>   entry = mk_pte(new_page, vma->vm_page_prot);
> - if (write_access)
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - set_pte_at(mm, address, page_table, entry);
>   if (anon) {
>   inc_mm_counter(mm, anon_rss);
>   lru_cache_add_active(new_page);
>   page_add_new_anon_rmap(new_page, vma, address);
> + if (write_access)
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   } else {
>   inc_mm_counter(mm, file_rss);
>   page_add_file_rmap(new_page);
> + entry = pte_wrprotect(entry);
>   if (write_access) {
>   dirty_page = new_page;
>   get_page(dirty_page);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   }
>   }
> + set_pte_at(mm, address, page_table, entry);
>   } else {
>   /* One of our sibling threads was faster, back out. */
>   page_cache_release(new_page);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> 
> The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> this patch and 2.6.19.

Yeah, if my guess about do_no_page() is right, _none_ of the previous 
patches should have ANY effect what-so-ever. In fact, I'd say that even 
the "ext3 works in writeback mode" thing that Andrei reports is probably a 
total fluke brought on by timing changes rather than anything else.

So please try the latest patch instead (on top of anything that shows 
corruption reliably - the patch should be _totally_ independent of all the 
other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
2.6.19 too, so anything that shows corruption is a fine target - but try 
to choose something that has been the "best" at corrupting things for you, 
to make the testing as good as possible).

Patch included here again (although I think you were cc'd on my previous 
email too, so you should already have it, and our emails just crossed)

And if this doesn't fix it, I don't know what will..

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:


How about this particularly stupid diff? (please test with something that
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize
any writeback in progress with any process that tries to re-map a shared
page into its address space and dirty it. I haven't tested it, and maybe
it misses some case, but it looks likea good way to try to avoid races
with marking pages dirty and the writeback phase ..


The apt cache files (/var/cache/apt/*.bin) still get corrupted with
this patch and 2.6.19.

Gordon

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(>private_lock);
   ret = drop_buffers(page, _to_free);
   spin_unlock(>private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c
linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h
linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21
01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);

-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
static inline void set_page_writeback(struct page *page)
{
   test_set_page_writeback(page);
diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c
--- linux-2.6.19.orig/mm/memory.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/mm/memory.c2006-12-24 11:04:03.0 -0700
@@ -1534,6 +1534,7 @@ static int do_wp_page(struct mm_struct *
   if (!pte_same(*page_table, orig_pte))
   goto unlock;
   }
+   wait_on_page_writeback(old_page);
   dirty_page = old_page;
   get_page(dirty_page);
   reuse = 1;
@@ -1832,6 +1833,33 @@ void unmap_mapping_range(struct address_
}
EXPORT_SYMBOL(unmap_mapping_range);

+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size & ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size >> PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_SIZE);
+   kunmap_atomic(kaddr,KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk("%s: BADNESS: truncate check %u\n",
current->comm, check);
+   }
+}
+
/**
 * vmtruncate - unmap mappings "freed" by truncate() syscall
 * @inode: inode of the file used
@@ -1865,6 +1893,7 @@ do_expand:
   goto out_sig;
   if (offset

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Linus Torvalds wrote:
> 
> Peter, tell me I'm crazy, but with the new rules, the following condition 
> is a bug:
> 
>  - shared mapping
>  - writable
>  - not already marked dirty in the PTE

Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like 
"do_no_page()" was simply buggy, and that this explains everything.

Please please please test. Throw all the other patches away (with the 
possible exception of the "update_mmu_cache()" sanity checker, which is 
still interesting in case some _other_ place does this too).

Don't do the "wait_on_page_writeback()" thing, because it changes timings 
and might hide thngs for the wrong reasons.  Just apply this on top of a 
known failing kernel, and test.

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma->vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Linus Torvalds wrote:
>
> How about this particularly stupid diff? (please test with something that 
> _would_ cause corruption normally).

Actually, here's an even more stupid diff, which actually to some degree 
seems to capture the real problem better.

Peter, tell me I'm crazy, but with the new rules, the following condition 
is a bug:

 - shared mapping
 - writable
 - not already marked dirty in the PTE

because that combination means that the hardware can mark the PTE dirty 
without us even realizing (and thus not marking the "struct page *" 
dirty).

(The above is actually a valid situation for IO mappings, but not for 
"real" mappings. And IO mappings should never take page faults, I think).

So, with that in mind, I wrote this stupid patch (for 32-bit x86, since I 
used my Mac Mini for testing ratehr than my main machine - but the x86-64 
version should be pretty much identcal)..

And you know what, Peter? It triggers for me. I get

WARNING at mm/memory.c:2274 do_no_page()
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __handle_mm_fault+0x38d/0x919
 [] do_page_fault+0x1ff/0x507
 [] error_code+0x7c/0x84

which seems to say that do_no_page() can be used to insert shared and 
non-dirty, but still writable, pages.

But maybe my patch is just bogus, and I didn't think it through.

Peter, I realize it's Christmas Eve, but let's face it, Santa appreciates 
good boys and girls, and we all want tons of loot. So please be good, and 
waste some time looking at this and tell me why I'm either wrong, or 
there's a real smoking gun here.. ;)

Linus

---
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..1389bb7 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -494,7 +494,13 @@ do {   
\
  * The i386 doesn't have any external MMU info: the kernel page
  * tables contain all the necessary information.
  */
-#define update_mmu_cache(vma,address,pte) do { } while (0)
+#define bad_shared_pte(pte) (pte_write(pte) && !pte_dirty(pte))
+#define update_mmu_cache(vma,address,pte) do { \
+   static int __cnt;   \
+   WARN_ON(((vma)->vm_flags & VM_SHARED)   \
+&& bad_shared_pte(pte) \
+&& ++__cnt < 5);   \
+} while (0)
 #endif /* !__ASSEMBLY__ */

 #ifdef CONFIG_FLATMEM
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 09:16:06 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> 
> > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> > > Andrei Popa <[EMAIL PROTECTED]> wrote:
> > > > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > > > 
> > > > I don't have corruption. I tested twice.
> > > 
> > > This is a surprising result.  Can you pleas retest ext3 
> > > data=writeback,nobh?
> > 
> > Yes, no corruption. Also tested only with data=writeback and had no
> > corruption.
> 
> Ok, so it would seem to be writeback related _somehow_. However, most of 
> the differences (I _thought_) in ext3 actually show up only if you have 
> *both* "nobh" and "data=writeback", and as far as I can tell, just a 
> simple "data=writeback" should still use the bog-standard 
> "block_write_full_page()".
> 
> Andrew?
> 
> Although as far as I can see, then ext2 should work as-is too (since it 
> too also just uses "block_write_full_page()" without anything fancy).

ext2 uses the multipage-bio assembly code for writeback whereas ext3
doesn't.  But ext3 doesn't use that code in data=ordered mode, of course.

Still, this:

--- a/fs/ext2/inode.c~a
+++ a/fs/ext2/inode.c
@@ -693,7 +693,7 @@ const struct address_space_operations ex
.commit_write   = generic_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
@@ -711,7 +711,7 @@ const struct address_space_operations ex
.commit_write   = nobh_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
_

will switch it off for ext2.


> Strange.
> 
> How about this particularly stupid diff? (please test with something that 
> _would_ cause corruption normally).
> 
> It is _entirely_ untested, but what it tries to do is to simply serialize 
> any writeback in progress with any process that tries to re-map a shared 
> page into its address space and dirty it. I haven't tested it, and maybe 
> it misses some case, but it looks likea good way to try to avoid races 
> with marking pages dirty and the writeback phase ..
> 
>   Linus
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..64ed10b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct 
> vm_area_struct *vma,
>   if (!pte_same(*page_table, orig_pte))
>   goto unlock;
>   }
> + wait_on_page_writeback(old_page);
>   dirty_page = old_page;
>   get_page(dirty_page);
>   reuse = 1;
> @@ -2215,6 +2216,7 @@ retry:
>   page_cache_release(new_page);
>   return VM_FAULT_SIGBUS;
>   }
> + wait_on_page_writeback(new_page);
>   }
>   }

yup.  Also, we could perhaps lock the target page during pagefaults..

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Andrei Popa wrote:

> On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> > Andrei Popa <[EMAIL PROTECTED]> wrote:
> > > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > > 
> > > I don't have corruption. I tested twice.
> > 
> > This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?
> 
> Yes, no corruption. Also tested only with data=writeback and had no
> corruption.

Ok, so it would seem to be writeback related _somehow_. However, most of 
the differences (I _thought_) in ext3 actually show up only if you have 
*both* "nobh" and "data=writeback", and as far as I can tell, just a 
simple "data=writeback" should still use the bog-standard 
"block_write_full_page()".

Andrew?

Although as far as I can see, then ext2 should work as-is too (since it 
too also just uses "block_write_full_page()" without anything fancy).

Strange.

How about this particularly stupid diff? (please test with something that 
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize 
any writeback in progress with any process that tries to re-map a shared 
page into its address space and dirty it. I haven't tested it, and maybe 
it misses some case, but it looks likea good way to try to avoid races 
with marking pages dirty and the writeback phase ..

Linus
---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..64ed10b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
if (!pte_same(*page_table, orig_pte))
goto unlock;
}
+   wait_on_page_writeback(old_page);
dirty_page = old_page;
get_page(dirty_page);
reuse = 1;
@@ -2215,6 +2216,7 @@ retry:
page_cache_release(new_page);
return VM_FAULT_SIGBUS;
}
+   wait_on_page_writeback(new_page);
}
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> On Sun, 24 Dec 2006 14:14:38 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > > - mount the fs with ext2 with the no-buffer-head option.  That means 
> > > either:
> > > 
> > >   grub.conf:  rootfstype=ext2 rootflags=nobh
> > >   /etc/fstab: ext2 nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext2 (rw,noatime,nobh)
> > 
> > I have corruption.
> > 
> > > 
> > > - mount the fs with ext3 data=writeback, nobh
> > > 
> > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> > > works)
> > >   /etc/fstab: ext2 data=writeback,nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > 
> > ierdnac ~ # dmesg|grep EXT3
> > EXT3-fs: mounted filesystem with writeback data mode.
> > EXT3 FS on sda7, internal journal
> > 
> > I don't have corruption. I tested twice.
> 
> This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?

Yes, no corruption. Also tested only with data=writeback and had no
corruption.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]:
>   /etc/fstab: ext2 nobh
>   /etc/fstab: ext3 data=writeback,nobh

It seems that busybox mount ignores the nobh option but both ext2 and
ext3 data=writeback work for me.  This is with plain 2.6.19 which
normally always fails.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:14:38 +0200
Andrei Popa <[EMAIL PROTECTED]> wrote:

> > - mount the fs with ext2 with the no-buffer-head option.  That means either:
> > 
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
> 
> I have corruption.
> 
> > 
> > - mount the fs with ext3 data=writeback, nobh
> > 
> >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> > works)
> >   /etc/fstab: ext2 data=writeback,nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
> 
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
> 
> I don't have corruption. I tested twice.

This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:26:01 +0200
Andrei Popa <[EMAIL PROTECTED]> wrote:

> I also tested with ext3 ordered, nobh  and I have file corruption...

ordered+nobh isn't a possible combination.  The filesystem probably ignored
nobh.  nobh mode only makes sense with data=writeback.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote:
> On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> > On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > 
> > > I now _suspect_ that we're talking about something like
> > > 
> > >  - we started a writeout. The IO is still pending, and the page was 
> > >marked clean and is now in the "writeback" phase.
> > >  - a write happens to the page, and the page gets marked dirty again. 
> > >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> > >but they were actually already dirty, because the IO hasn't completed 
> > >yet.
> > >  - the IO from the _previous_ write completes, and marks the buffers 
> > > clean 
> > >again.
> > 
> > Some things for the testers to try, please:
> > 
> > - mount the fs with ext2 with the no-buffer-head option.  That means either:
> > 
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
> 
> I have corruption.
> 
> > 
> > - mount the fs with ext3 data=writeback, nobh
> > 
> >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> > works)
> >   /etc/fstab: ext2 data=writeback,nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
> 
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
> 
> I don't have corruption. I tested twice.
> 

I also tested with ext3 ordered, nobh  and I have file corruption...

> > 
> > if that still fails we can rule out buffer_head funnies.
> > 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > I now _suspect_ that we're talking about something like
> > 
> >  - we started a writeout. The IO is still pending, and the page was 
> >marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again. 
> >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> >but they were actually already dirty, because the IO hasn't completed 
> >yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean 
> >again.
> 
> Some things for the testers to try, please:
> 
> - mount the fs with ext2 with the no-buffer-head option.  That means either:
> 
>   grub.conf:  rootfstype=ext2 rootflags=nobh
>   /etc/fstab: ext2 nobh

ierdnac ~ # mount
/dev/sda7 on / type ext2 (rw,noatime,nobh)

I have corruption.

> 
> - mount the fs with ext3 data=writeback, nobh
> 
>   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> works)
>   /etc/fstab: ext2 data=writeback,nobh

ierdnac ~ # mount
/dev/sda7 on / type ext3 (rw,noatime,nobh)

ierdnac ~ # dmesg|grep EXT3
EXT3-fs: mounted filesystem with writeback data mode.
EXT3 FS on sda7, internal journal

I don't have corruption. I tested twice.

> 
> if that still fails we can rule out buffer_head funnies.
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Andrew Morton wrote:
> 
> > I now _suspect_ that we're talking about something like
> > 
> >  - we started a writeout. The IO is still pending, and the page was 
> >marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again. 
> >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> >but they were actually already dirty, because the IO hasn't completed 
> >yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean 
> >again.
> 
> Some things for the testers to try, please:
> 
> - mount the fs with ext2 with the no-buffer-head option.  That means either:

[ snip snip ]

This is definitely worth testing, but the exact schenario I outlined is 
probably not the thing that happens. It was really meant to be more of an 
exmple of the _kind_ of situation I think we might have.

That would explain why we didn't see this before: we simply didn't mark 
pages clean all that aggressively, and an app like rtorrent would normally 
have caused its flushes to happen _synchronously_ by using msync() (even 
if the IO itself was done asynchronously, all the dirty bit stuff would be 
synchronous wrt any rtorrent behaviour).

And the things that /did/ use to clean pages asynchronously (VM scanning) 
would always actually look at the "young" bit (aka "accessed") and not 
even touch the dirty bit if an application had accessed the page recently, 
so that basically avoided any likely races, because we'd touch the dirty 
bit ONLY if the page was "cold".

So this is why I'm saying that it might be an old bug, and it would be 
just the new pattern of handling dirty bits that triggers it.

But avoiding buffer heads and testing that part is worth doing. Just to 
remove one thing from the equation.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> I now _suspect_ that we're talking about something like
> 
>  - we started a writeout. The IO is still pending, and the page was 
>marked clean and is now in the "writeback" phase.
>  - a write happens to the page, and the page gets marked dirty again. 
>Marking the page dirty also marks all the _buffers_ in the page dirty, 
>but they were actually already dirty, because the IO hasn't completed 
>yet.
>  - the IO from the _previous_ write completes, and marks the buffers clean 
>again.

Some things for the testers to try, please:

- mount the fs with ext2 with the no-buffer-head option.  That means either:

  grub.conf:  rootfstype=ext2 rootflags=nobh
  /etc/fstab: ext2 nobh

- mount the fs with ext3 data=writeback, nobh

  grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
  /etc/fstab: ext2 data=writeback,nobh

if that still fails we can rule out buffer_head funnies.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds

On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> 
> Is there any way to provide any debugging information that may help
> solve the problem ?

I think we have people working on this. I know I'm trying to even come up 
with an idea of what is going on. I don't think we know yet.

> Would it help to know the nature of the corruption e.g. an analysis
> of the corruption in the file ?

I actually think we know that, because Andrei already gave details. The 
corruption seems to be basically a few pages that get zeroes at the end 
rather than the expected contents. That's consistent with the page being 
written out once, but then _not_ getting written out again despite being 
dirtied some more.

But if you see ay other pattern, please holler, because that would be 
interesting.

> BTW, I decided to try Linus's test program [1] on ARM (I don't think
> that anybody had tried it on ARM before).

You get the expected results, and in fact, I'd be very surprised if you 
didn't. It's something subtler than that going on.

I now _suspect_ that we're talking about something like

 - we started a writeout. The IO is still pending, and the page was 
   marked clean and is now in the "writeback" phase.
 - a write happens to the page, and the page gets marked dirty again. 
   Marking the page dirty also marks all the _buffers_ in the page dirty, 
   but they were actually already dirty, because the IO hasn't completed 
   yet.
 - the IO from the _previous_ write completes, and marks the buffers clean 
   again.

And no, thatr's not actually what is going on. The thing is, we actually 
clear the buffer dirty bits when we start the IO, not when we end it, but 
I think it is going to be this _kind_ of situation, where we missed 
something, and marked it clean too late, and thus cleared a dirty bit.

I don't think it's a page table issue any more, it just doesn't look 
likely with the ARM UP corruption. It's also not apparently even on a 
cacheline boundary, so it probably is really a dirty bit that got cleared 
wrogn due to some race with IO.

But right now we're all clueless. I personally suspect it's not even a new 
bug: it's probably an old bug that simply didn't matter before.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:


* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]:
> >  and it failed.
> Since you are on ARM you might want to try with the page_mkclean_one
> cleanup patch too.

I've already tried it and it didn't work.  I just tried it again
together with Linus' patch and the two from Andrew and it still fails.
(For reference, the patch is attached.)


I can confirm this behaviour with 2.6.19 and the patches mentioned
above (cumulative patch for 2.6.19 appended to the end of this email).

Is there any way to provide any debugging information that may help
solve the problem ? Would it help to know the nature of the corruption
e.g. an analysis of the corruption in the file ? I have previously
asked apt developers if they wanted to look at the corrupted cache
files, but there were no takers then.

BTW, I decided to try Linus's test program [1] on ARM (I don't think
that anybody had tried it on ARM before).

Since we see file corruption with 2.6.18 + [PATCH] mm: tracking shared
dirty pages [2], I ran Linus's program on machines with the following
setups:

2.6.18 + the following patches
  mm: tracking shared dirty pages [2]
  mm: balance dirty pages [3]
  mm: optimize the new mprotect() code a bit [4]
  mm: small cleanup of install_page() [5]
  mm: fixup do_wp_page() [6]
  mm: msync() cleanup [7]

$ ./mm-test | od -x
000        
020        
040    
050

2.6.18 (no mm patches)

$ ./mm-test | od -x
000        
020        
040    
050

I don't know if this helps at all.

Gordon

[1] http://lkml.org/lkml/2006/12/19/200
[2] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
[3] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=edc79b2a46ed854595e40edcf3f8b37f9f14aa3f
[4] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1e6098b23bb46e2b488fe9a26f831f867157483
[5] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e88dd6c11c5aef74d8b74a062767add53315533b
[6] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ee6a6457886a80415db209e87033b63f2b06558c
[7] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=204ec841fbea3e5138168edbc3a76d46747cc987

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(>private_lock);
   ret = drop_buffers(page, _to_free);
   spin_unlock(>private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c
linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h
linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21
01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

 struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int test_clear_page_writeback(struct

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr [EMAIL PROTECTED] wrote:


* Peter Zijlstra [EMAIL PROTECTED] [2006-12-22 14:25]:
   and it failed.
 Since you are on ARM you might want to try with the page_mkclean_one
 cleanup patch too.

I've already tried it and it didn't work.  I just tried it again
together with Linus' patch and the two from Andrew and it still fails.
(For reference, the patch is attached.)


I can confirm this behaviour with 2.6.19 and the patches mentioned
above (cumulative patch for 2.6.19 appended to the end of this email).

Is there any way to provide any debugging information that may help
solve the problem ? Would it help to know the nature of the corruption
e.g. an analysis of the corruption in the file ? I have previously
asked apt developers if they wanted to look at the corrupted cache
files, but there were no takers then.

BTW, I decided to try Linus's test program [1] on ARM (I don't think
that anybody had tried it on ARM before).

Since we see file corruption with 2.6.18 + [PATCH] mm: tracking shared
dirty pages [2], I ran Linus's program on machines with the following
setups:

2.6.18 + the following patches
  mm: tracking shared dirty pages [2]
  mm: balance dirty pages [3]
  mm: optimize the new mprotect() code a bit [4]
  mm: small cleanup of install_page() [5]
  mm: fixup do_wp_page() [6]
  mm: msync() cleanup [7]

$ ./mm-test | od -x
000        
020        
040    
050

2.6.18 (no mm patches)

$ ./mm-test | od -x
000        
020        
040    
050

I don't know if this helps at all.

Gordon

[1] http://lkml.org/lkml/2006/12/19/200
[2] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
[3] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=edc79b2a46ed854595e40edcf3f8b37f9f14aa3f
[4] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1e6098b23bb46e2b488fe9a26f831f867157483
[5] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e88dd6c11c5aef74d8b74a062767add53315533b
[6] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ee6a6457886a80415db209e87033b63f2b06558c
[7] 
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=204ec841fbea3e5138168edbc3a76d46747cc987

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(mapping-private_lock);
   ret = drop_buffers(page, buffers_to_free);
   spin_unlock(mapping-private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c
linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h
linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21
01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

 struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Gordon Farquharson wrote:
 
 Is there any way to provide any debugging information that may help
 solve the problem ?

I think we have people working on this. I know I'm trying to even come up 
with an idea of what is going on. I don't think we know yet.

 Would it help to know the nature of the corruption e.g. an analysis
 of the corruption in the file ?

I actually think we know that, because Andrei already gave details. The 
corruption seems to be basically a few pages that get zeroes at the end 
rather than the expected contents. That's consistent with the page being 
written out once, but then _not_ getting written out again despite being 
dirtied some more.

But if you see ay other pattern, please holler, because that would be 
interesting.

 BTW, I decided to try Linus's test program [1] on ARM (I don't think
 that anybody had tried it on ARM before).

You get the expected results, and in fact, I'd be very surprised if you 
didn't. It's something subtler than that going on.

I now _suspect_ that we're talking about something like

 - we started a writeout. The IO is still pending, and the page was 
   marked clean and is now in the writeback phase.
 - a write happens to the page, and the page gets marked dirty again. 
   Marking the page dirty also marks all the _buffers_ in the page dirty, 
   but they were actually already dirty, because the IO hasn't completed 
   yet.
 - the IO from the _previous_ write completes, and marks the buffers clean 
   again.

And no, thatr's not actually what is going on. The thing is, we actually 
clear the buffer dirty bits when we start the IO, not when we end it, but 
I think it is going to be this _kind_ of situation, where we missed 
something, and marked it clean too late, and thus cleared a dirty bit.

I don't think it's a page table issue any more, it just doesn't look 
likely with the ARM UP corruption. It's also not apparently even on a 
cacheline boundary, so it probably is really a dirty bit that got cleared 
wrogn due to some race with IO.

But right now we're all clueless. I personally suspect it's not even a new 
bug: it's probably an old bug that simply didn't matter before.

Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
Linus Torvalds [EMAIL PROTECTED] wrote:

 I now _suspect_ that we're talking about something like
 
  - we started a writeout. The IO is still pending, and the page was 
marked clean and is now in the writeback phase.
  - a write happens to the page, and the page gets marked dirty again. 
Marking the page dirty also marks all the _buffers_ in the page dirty, 
but they were actually already dirty, because the IO hasn't completed 
yet.
  - the IO from the _previous_ write completes, and marks the buffers clean 
again.

Some things for the testers to try, please:

- mount the fs with ext2 with the no-buffer-head option.  That means either:

  grub.conf:  rootfstype=ext2 rootflags=nobh
  /etc/fstab: ext2 nobh

- mount the fs with ext3 data=writeback, nobh

  grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this works)
  /etc/fstab: ext2 data=writeback,nobh

if that still fails we can rule out buffer_head funnies.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Andrew Morton wrote:
 
  I now _suspect_ that we're talking about something like
  
   - we started a writeout. The IO is still pending, and the page was 
 marked clean and is now in the writeback phase.
   - a write happens to the page, and the page gets marked dirty again. 
 Marking the page dirty also marks all the _buffers_ in the page dirty, 
 but they were actually already dirty, because the IO hasn't completed 
 yet.
   - the IO from the _previous_ write completes, and marks the buffers clean 
 again.
 
 Some things for the testers to try, please:
 
 - mount the fs with ext2 with the no-buffer-head option.  That means either:

[ snip snip ]

This is definitely worth testing, but the exact schenario I outlined is 
probably not the thing that happens. It was really meant to be more of an 
exmple of the _kind_ of situation I think we might have.

That would explain why we didn't see this before: we simply didn't mark 
pages clean all that aggressively, and an app like rtorrent would normally 
have caused its flushes to happen _synchronously_ by using msync() (even 
if the IO itself was done asynchronously, all the dirty bit stuff would be 
synchronous wrt any rtorrent behaviour).

And the things that /did/ use to clean pages asynchronously (VM scanning) 
would always actually look at the young bit (aka accessed) and not 
even touch the dirty bit if an application had accessed the page recently, 
so that basically avoided any likely races, because we'd touch the dirty 
bit ONLY if the page was cold.

So this is why I'm saying that it might be an old bug, and it would be 
just the new pattern of handling dirty bits that triggers it.

But avoiding buffer heads and testing that part is worth doing. Just to 
remove one thing from the equation.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
 On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
 Linus Torvalds [EMAIL PROTECTED] wrote:
 
  I now _suspect_ that we're talking about something like
  
   - we started a writeout. The IO is still pending, and the page was 
 marked clean and is now in the writeback phase.
   - a write happens to the page, and the page gets marked dirty again. 
 Marking the page dirty also marks all the _buffers_ in the page dirty, 
 but they were actually already dirty, because the IO hasn't completed 
 yet.
   - the IO from the _previous_ write completes, and marks the buffers clean 
 again.
 
 Some things for the testers to try, please:
 
 - mount the fs with ext2 with the no-buffer-head option.  That means either:
 
   grub.conf:  rootfstype=ext2 rootflags=nobh
   /etc/fstab: ext2 nobh

ierdnac ~ # mount
/dev/sda7 on / type ext2 (rw,noatime,nobh)

I have corruption.

 
 - mount the fs with ext3 data=writeback, nobh
 
   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
 works)
   /etc/fstab: ext2 data=writeback,nobh

ierdnac ~ # mount
/dev/sda7 on / type ext3 (rw,noatime,nobh)

ierdnac ~ # dmesg|grep EXT3
EXT3-fs: mounted filesystem with writeback data mode.
EXT3 FS on sda7, internal journal

I don't have corruption. I tested twice.

 
 if that still fails we can rule out buffer_head funnies.
 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote:
 On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
  On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
  Linus Torvalds [EMAIL PROTECTED] wrote:
  
   I now _suspect_ that we're talking about something like
   
- we started a writeout. The IO is still pending, and the page was 
  marked clean and is now in the writeback phase.
- a write happens to the page, and the page gets marked dirty again. 
  Marking the page dirty also marks all the _buffers_ in the page dirty, 
  but they were actually already dirty, because the IO hasn't completed 
  yet.
- the IO from the _previous_ write completes, and marks the buffers 
   clean 
  again.
  
  Some things for the testers to try, please:
  
  - mount the fs with ext2 with the no-buffer-head option.  That means either:
  
grub.conf:  rootfstype=ext2 rootflags=nobh
/etc/fstab: ext2 nobh
 
 ierdnac ~ # mount
 /dev/sda7 on / type ext2 (rw,noatime,nobh)
 
 I have corruption.
 
  
  - mount the fs with ext3 data=writeback, nobh
  
grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
  works)
/etc/fstab: ext2 data=writeback,nobh
 
 ierdnac ~ # mount
 /dev/sda7 on / type ext3 (rw,noatime,nobh)
 
 ierdnac ~ # dmesg|grep EXT3
 EXT3-fs: mounted filesystem with writeback data mode.
 EXT3 FS on sda7, internal journal
 
 I don't have corruption. I tested twice.
 

I also tested with ext3 ordered, nobh  and I have file corruption...

  
  if that still fails we can rule out buffer_head funnies.
  

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:26:01 +0200
Andrei Popa [EMAIL PROTECTED] wrote:

 I also tested with ext3 ordered, nobh  and I have file corruption...

ordered+nobh isn't a possible combination.  The filesystem probably ignored
nobh.  nobh mode only makes sense with data=writeback.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 14:14:38 +0200
Andrei Popa [EMAIL PROTECTED] wrote:

  - mount the fs with ext2 with the no-buffer-head option.  That means either:
  
grub.conf:  rootfstype=ext2 rootflags=nobh
/etc/fstab: ext2 nobh
 
 ierdnac ~ # mount
 /dev/sda7 on / type ext2 (rw,noatime,nobh)
 
 I have corruption.
 
  
  - mount the fs with ext3 data=writeback, nobh
  
grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
  works)
/etc/fstab: ext2 data=writeback,nobh
 
 ierdnac ~ # mount
 /dev/sda7 on / type ext3 (rw,noatime,nobh)
 
 ierdnac ~ # dmesg|grep EXT3
 EXT3-fs: mounted filesystem with writeback data mode.
 EXT3 FS on sda7, internal journal
 
 I don't have corruption. I tested twice.

This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Andrew Morton [EMAIL PROTECTED] [2006-12-24 00:57]:
   /etc/fstab: ext2 nobh
   /etc/fstab: ext3 data=writeback,nobh

It seems that busybox mount ignores the nobh option but both ext2 and
ext3 data=writeback work for me.  This is with plain 2.6.19 which
normally always fails.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
 On Sun, 24 Dec 2006 14:14:38 +0200
 Andrei Popa [EMAIL PROTECTED] wrote:
 
   - mount the fs with ext2 with the no-buffer-head option.  That means 
   either:
   
 grub.conf:  rootfstype=ext2 rootflags=nobh
 /etc/fstab: ext2 nobh
  
  ierdnac ~ # mount
  /dev/sda7 on / type ext2 (rw,noatime,nobh)
  
  I have corruption.
  
   
   - mount the fs with ext3 data=writeback, nobh
   
 grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
   works)
 /etc/fstab: ext2 data=writeback,nobh
  
  ierdnac ~ # mount
  /dev/sda7 on / type ext3 (rw,noatime,nobh)
  
  ierdnac ~ # dmesg|grep EXT3
  EXT3-fs: mounted filesystem with writeback data mode.
  EXT3 FS on sda7, internal journal
  
  I don't have corruption. I tested twice.
 
 This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?

Yes, no corruption. Also tested only with data=writeback and had no
corruption.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Andrei Popa wrote:

 On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
  Andrei Popa [EMAIL PROTECTED] wrote:
   /dev/sda7 on / type ext3 (rw,noatime,nobh)
   
   I don't have corruption. I tested twice.
  
  This is a surprising result.  Can you pleas retest ext3 data=writeback,nobh?
 
 Yes, no corruption. Also tested only with data=writeback and had no
 corruption.

Ok, so it would seem to be writeback related _somehow_. However, most of 
the differences (I _thought_) in ext3 actually show up only if you have 
*both* nobh and data=writeback, and as far as I can tell, just a 
simple data=writeback should still use the bog-standard 
block_write_full_page().

Andrew?

Although as far as I can see, then ext2 should work as-is too (since it 
too also just uses block_write_full_page() without anything fancy).

Strange.

How about this particularly stupid diff? (please test with something that 
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize 
any writeback in progress with any process that tries to re-map a shared 
page into its address space and dirty it. I haven't tested it, and maybe 
it misses some case, but it looks likea good way to try to avoid races 
with marking pages dirty and the writeback phase ..

Linus
---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..64ed10b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
if (!pte_same(*page_table, orig_pte))
goto unlock;
}
+   wait_on_page_writeback(old_page);
dirty_page = old_page;
get_page(dirty_page);
reuse = 1;
@@ -2215,6 +2216,7 @@ retry:
page_cache_release(new_page);
return VM_FAULT_SIGBUS;
}
+   wait_on_page_writeback(new_page);
}
}
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton

On Sun, 24 Dec 2006 09:16:06 -0800 (PST)
Linus Torvalds [EMAIL PROTECTED] wrote:

 
 
 On Sun, 24 Dec 2006, Andrei Popa wrote:
 
  On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
   Andrei Popa [EMAIL PROTECTED] wrote:
/dev/sda7 on / type ext3 (rw,noatime,nobh)

I don't have corruption. I tested twice.
   
   This is a surprising result.  Can you pleas retest ext3 
   data=writeback,nobh?
  
  Yes, no corruption. Also tested only with data=writeback and had no
  corruption.
 
 Ok, so it would seem to be writeback related _somehow_. However, most of 
 the differences (I _thought_) in ext3 actually show up only if you have 
 *both* nobh and data=writeback, and as far as I can tell, just a 
 simple data=writeback should still use the bog-standard 
 block_write_full_page().
 
 Andrew?
 
 Although as far as I can see, then ext2 should work as-is too (since it 
 too also just uses block_write_full_page() without anything fancy).

ext2 uses the multipage-bio assembly code for writeback whereas ext3
doesn't.  But ext3 doesn't use that code in data=ordered mode, of course.

Still, this:

--- a/fs/ext2/inode.c~a
+++ a/fs/ext2/inode.c
@@ -693,7 +693,7 @@ const struct address_space_operations ex
.commit_write   = generic_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
@@ -711,7 +711,7 @@ const struct address_space_operations ex
.commit_write   = nobh_commit_write,
.bmap   = ext2_bmap,
.direct_IO  = ext2_direct_IO,
-   .writepages = ext2_writepages,
+// .writepages = ext2_writepages,
.migratepage= buffer_migrate_page,
 };
 
_

will switch it off for ext2.


 Strange.
 
 How about this particularly stupid diff? (please test with something that 
 _would_ cause corruption normally).
 
 It is _entirely_ untested, but what it tries to do is to simply serialize 
 any writeback in progress with any process that tries to re-map a shared 
 page into its address space and dirty it. I haven't tested it, and maybe 
 it misses some case, but it looks likea good way to try to avoid races 
 with marking pages dirty and the writeback phase ..
 
   Linus
 ---
 diff --git a/mm/memory.c b/mm/memory.c
 index 563792f..64ed10b 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct 
 vm_area_struct *vma,
   if (!pte_same(*page_table, orig_pte))
   goto unlock;
   }
 + wait_on_page_writeback(old_page);
   dirty_page = old_page;
   get_page(dirty_page);
   reuse = 1;
 @@ -2215,6 +2216,7 @@ retry:
   page_cache_release(new_page);
   return VM_FAULT_SIGBUS;
   }
 + wait_on_page_writeback(new_page);
   }
   }

yup.  Also, we could perhaps lock the target page during pagefaults..

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds




On Sun, 24 Dec 2006, Linus Torvalds wrote:

 How about this particularly stupid diff? (please test with something that 
 _would_ cause corruption normally).

Actually, here's an even more stupid diff, which actually to some degree 
seems to capture the real problem better.

Peter, tell me I'm crazy, but with the new rules, the following condition 
is a bug:

 - shared mapping
 - writable
 - not already marked dirty in the PTE

because that combination means that the hardware can mark the PTE dirty 
without us even realizing (and thus not marking the struct page * 
dirty).

(The above is actually a valid situation for IO mappings, but not for 
real mappings. And IO mappings should never take page faults, I think).

So, with that in mind, I wrote this stupid patch (for 32-bit x86, since I 
used my Mac Mini for testing ratehr than my main machine - but the x86-64 
version should be pretty much identcal)..

And you know what, Peter? It triggers for me. I get

WARNING at mm/memory.c:2274 do_no_page()
 [c0103d4a] show_trace_log_lvl+0x1a/0x2f
 [c010436c] show_trace+0x12/0x14
 [c01043f0] dump_stack+0x16/0x18
 [c0159790] __handle_mm_fault+0x38d/0x919
 [c011c8c4] do_page_fault+0x1ff/0x507
 [c03fabcc] error_code+0x7c/0x84

which seems to say that do_no_page() can be used to insert shared and 
non-dirty, but still writable, pages.

But maybe my patch is just bogus, and I didn't think it through.

Peter, I realize it's Christmas Eve, but let's face it, Santa appreciates 
good boys and girls, and we all want tons of loot. So please be good, and 
waste some time looking at this and tell me why I'm either wrong, or 
there's a real smoking gun here.. ;)

Linus

---
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..1389bb7 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -494,7 +494,13 @@ do {   
\
  * The i386 doesn't have any external MMU info: the kernel page
  * tables contain all the necessary information.
  */
-#define update_mmu_cache(vma,address,pte) do { } while (0)
+#define bad_shared_pte(pte) (pte_write(pte)  !pte_dirty(pte))
+#define update_mmu_cache(vma,address,pte) do { \
+   static int __cnt;   \
+   WARN_ON(((vma)-vm_flags  VM_SHARED)   \
+ bad_shared_pte(pte) \
+ ++__cnt  5);   \
+} while (0)
 #endif /* !__ASSEMBLY__ */
 
 #ifdef CONFIG_FLATMEM
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Linus Torvalds wrote:
 
 Peter, tell me I'm crazy, but with the new rules, the following condition 
 is a bug:
 
  - shared mapping
  - writable
  - not already marked dirty in the PTE

Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like 
do_no_page() was simply buggy, and that this explains everything.

Please please please test. Throw all the other patches away (with the 
possible exception of the update_mmu_cache() sanity checker, which is 
still interesting in case some _other_ place does this too).

Don't do the wait_on_page_writeback() thing, because it changes timings 
and might hide thngs for the wrong reasons.  Just apply this on top of a 
known failing kernel, and test.

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma-vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote:


How about this particularly stupid diff? (please test with something that
_would_ cause corruption normally).

It is _entirely_ untested, but what it tries to do is to simply serialize
any writeback in progress with any process that tries to re-map a shared
page into its address space and dirty it. I haven't tested it, and maybe
it misses some case, but it looks likea good way to try to avoid races
with marking pages dirty and the writeback phase ..


The apt cache files (/var/cache/apt/*.bin) still get corrupted with
this patch and 2.6.19.

Gordon

diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c
--- linux-2.6.19.orig/fs/buffer.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700
@@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag
   int ret = 0;

   BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
   return 0;

   if (mapping == NULL) {  /* can this still happen? */
@@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag
   spin_lock(mapping-private_lock);
   ret = drop_buffers(page, buffers_to_free);
   spin_unlock(mapping-private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*/
-   clear_page_dirty(page);
-   }
out:
   if (buffers_to_free) {
   struct buffer_head *bh = buffers_to_free;
diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c
linux-2.6.19/fs/hugetlbfs/inode.c
--- linux-2.6.19.orig/fs/hugetlbfs/inode.c  2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/fs/hugetlbfs/inode.c   2006-12-21 01:15:21.0 -0700
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

static void truncate_huge_page(struct page *page)
{
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
   ClearPageUptodate(page);
   remove_from_page_cache(page);
   put_page(page);
diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h
linux-2.6.19/include/linux/page-flags.h
--- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29
14:57:37.0 -0700
+++ linux-2.6.19/include/linux/page-flags.h 2006-12-21
01:15:21.0 -0700
@@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc

struct page;   /* forward declaration */

-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);

-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
static inline void set_page_writeback(struct page *page)
{
   test_set_page_writeback(page);
diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c
--- linux-2.6.19.orig/mm/memory.c   2006-11-29 14:57:37.0 -0700
+++ linux-2.6.19/mm/memory.c2006-12-24 11:04:03.0 -0700
@@ -1534,6 +1534,7 @@ static int do_wp_page(struct mm_struct *
   if (!pte_same(*page_table, orig_pte))
   goto unlock;
   }
+   wait_on_page_writeback(old_page);
   dirty_page = old_page;
   get_page(dirty_page);
   reuse = 1;
@@ -1832,6 +1833,33 @@ void unmap_mapping_range(struct address_
}
EXPORT_SYMBOL(unmap_mapping_range);

+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size  ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size  PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset  PAGE_SIZE);
+   kunmap_atomic(kaddr,KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk(%s: BADNESS: truncate check %u\n,
current-comm, check);
+   }
+}
+
/**
 * vmtruncate - unmap mappings freed by truncate() syscall
 * @inode: inode of the file used
@@ -1865,6 +1893,7 @@ do_expand:
   goto out_sig;

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Gordon Farquharson wrote:
 
 The apt cache files (/var/cache/apt/*.bin) still get corrupted with
 this patch and 2.6.19.

Yeah, if my guess about do_no_page() is right, _none_ of the previous 
patches should have ANY effect what-so-ever. In fact, I'd say that even 
the ext3 works in writeback mode thing that Andrei reports is probably a 
total fluke brought on by timing changes rather than anything else.

So please try the latest patch instead (on top of anything that shows 
corruption reliably - the patch should be _totally_ independent of all the 
other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
2.6.19 too, so anything that shows corruption is a fine target - but try 
to choose something that has been the best at corrupting things for you, 
to make the testing as good as possible).

Patch included here again (although I think you were cc'd on my previous 
email too, so you should already have it, and our emails just crossed)

And if this doesn't fix it, I don't know what will..

Linus

---
diff --git a/mm/memory.c b/mm/memory.c
index 563792f..cf429c4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2247,21 +2249,23 @@ retry:
if (pte_none(*page_table)) {
flush_icache_page(vma, new_page);
entry = mk_pte(new_page, vma-vm_page_prot);
-   if (write_access)
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
lru_cache_add_active(new_page);
page_add_new_anon_rmap(new_page, vma, address);
+   if (write_access)
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(new_page);
+   entry = pte_wrprotect(entry);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
}
}
+   set_pte_at(mm, address, page_table, entry);
} else {
/* One of our sibling threads was faster, back out. */
page_cache_release(new_page);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
 
 On Sun, 24 Dec 2006, Gordon Farquharson wrote:
  
  The apt cache files (/var/cache/apt/*.bin) still get corrupted with
  this patch and 2.6.19.
 
 Yeah, if my guess about do_no_page() is right, _none_ of the previous 
 patches should have ANY effect what-so-ever. In fact, I'd say that even 
 the ext3 works in writeback mode thing that Andrei reports is probably a 
 total fluke brought on by timing changes rather than anything else.
 
 So please try the latest patch instead (on top of anything that shows 
 corruption reliably - the patch should be _totally_ independent of all the 
 other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
 2.6.19 too, so anything that shows corruption is a fine target - but try 
 to choose something that has been the best at corrupting things for you, 
 to make the testing as good as possible).
 
 Patch included here again (although I think you were cc'd on my previous 
 email too, so you should already have it, and our emails just crossed)
 
 And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
safe_sync.

 
   Linus
 
 ---
 diff --git a/mm/memory.c b/mm/memory.c
 index 563792f..cf429c4 100644
 --- a/mm/memory.c
 +++ b/mm/memory.c
 @@ -2247,21 +2249,23 @@ retry:
   if (pte_none(*page_table)) {
   flush_icache_page(vma, new_page);
   entry = mk_pte(new_page, vma-vm_page_prot);
 - if (write_access)
 - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 - set_pte_at(mm, address, page_table, entry);
   if (anon) {
   inc_mm_counter(mm, anon_rss);
   lru_cache_add_active(new_page);
   page_add_new_anon_rmap(new_page, vma, address);
 + if (write_access)
 + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
   } else {
   inc_mm_counter(mm, file_rss);
   page_add_file_rmap(new_page);
 + entry = pte_wrprotect(entry);
   if (write_access) {
   dirty_page = new_page;
   get_page(dirty_page);
 + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
   }
   }
 + set_pte_at(mm, address, page_table, entry);
   } else {
   /* One of our sibling threads was faster, back out. */
   page_cache_release(new_page);

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds



On Sun, 24 Dec 2006, Andrei Popa wrote:
 
 Hash check on download completion found bad chunks, consider using
 safe_sync.

Dang. Did you get any warning messages from the kernel?

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa

On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
 
 On Sun, 24 Dec 2006, Andrei Popa wrote:
  
  Hash check on download completion found bad chunks, consider using
  safe_sync.
 
 Dang. Did you get any warning messages from the kernel?
 

only these:
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think has anything to do with...

   Linus

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson


On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote:


Ok, so how about this diff.

I'm actually feeling good about this one. It really looks like
do_no_page() was simply buggy, and that this explains everything.


I tested with just this patch and 2.6.19 and no change. Sorry Linus,
no early Christmas present :-(

Gordon

--
Gordon Farquharson
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin

 Quoting Linus Torvalds [EMAIL PROTECTED]:
 Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content 
 corruption on ext3)

 Peter, tell me I'm crazy, but with the new rules, the following condition 
 is a bug:
 
  - shared mapping
  - writable
  - not already marked dirty in the PTE
 
 because that combination means that the hardware can mark the PTE dirty 
 without us even realizing (and thus not marking the struct page * 
 dirty).

Er.
Sorry about bumping in, and I'm not sure I understand all of the discussion,
but this reminded me of an old issue with COW that created what looks
like a vaguely similiar data corruption on infiniband. We solved this for
infiniband with MADV_DONTFORK, but I always wondered why does it not affect
other parts of kernel.  Small reminder from that discussion:

down mmap sem
get user pages
up mmap sem
page becomes shared, and COW (e.g. fork)
process writes to first byte of page - gets a copy
Now we had a problem: struct page that we got from get user pages
does not point to a correct page in our process.
For example: if at some point we map this page for DMA, and
hardware writes to last byte of page - process does not
see this data.

So for infiniband, what we do is a combination of
- prevent page from becoming COW while hardware might DMA to this page, and
- ask users not to write to page if hardware might DMA to same page
  (even if its using different bytes).

I just wandered - is there some chance something like this could be happening in
the fs code?

HTH,

-- 
MST
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr

* Linus Torvalds [EMAIL PROTECTED] [2006-12-24 11:35]:
 And if this doesn't fix it, I don't know what will..

Sorry, but it still fails (on top of plain 2.6.19).
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa

On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
> 
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase?  It would be good
> if someone could suggest a torrent which is legal and not too large.
It's a 1.4GB file torrent split in 84 rar files and there are many
seeders. I download with ~ 5MB/sec. The torrent is private.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa

On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote:
 * Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]:
  With all three patches I have corruption
 
 I've completed one installation with Linus' patch plus the two from
 Andrew successfully, but I'm currently trying again... but I really
 need a better testcase since an installation takes about an hour.
 Andrei, which torrent do you download as a testcase?  It would be good
 if someone could suggest a torrent which is legal and not too large.
It's a 1.4GB file torrent split in 84 rar files and there are many
seeders. I download with ~ 5MB/sec. The torrent is private.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]:
> >  and it failed.
> Since you are on ARM you might want to try with the page_mkclean_one
> cleanup patch too.

I've already tried it and it didn't work.  I just tried it again
together with Linus' patch and the two from Andrew and it still fails.
(For reference, the patch is attached.)
-- 
Martin Michlmayr
http://www.cyrius.com/
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(>private_lock);
ret = drop_buffers(page, _to_free);
spin_unlock(>private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 4830a3b..350878a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -253,15 +253,11 @@ #define ClearPageUncached(page)   clear_bi
 
 struct page;   /* forward declaration */
 
-int test_clear_page_dirty(struct page *page);
+extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+
 int test_clear_page_writeback(struct page *page);
 int test_set_page_writeback(struct page *page);
 
-static inline void clear_page_dirty(struct page *page)
-{
-   test_clear_page_dirty(page);
-}
-
 static inline void set_page_writeback(struct page *page)
 {
test_set_page_writeback(page);
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..79cecab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+   pgoff_t index;
+   unsigned int offset;
+   struct page *page;
+
+   if (!mapping)
+   return;
+   offset = size & ~PAGE_MASK;
+   if (!offset)
+   return;
+   index = size >> PAGE_SHIFT;
+   page = find_lock_page(mapping, index);
+   if (page) {
+   unsigned int check = 0;
+   unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+   do {
+   check += kaddr[offset++];
+   } while (offset < PAGE_SIZE);
+   kunmap_atomic(kaddr,KM_USER0);
+   unlock_page(page);
+   page_cache_release(page);
+   if (check)
+   printk("%s: BADNESS: truncate check %u\n", 
current->comm, check);
+   }
+}
+
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
@@ -1875,6 +1902,7 @@ do_expand:
goto out_sig;
if (offset > inode->i_sb->s_maxbytes)
goto out_big;
+   check_last_page(mapping, inode->i_size);
i_size_write(inode, offset);
 
 out_truncate:
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 237107c..b3a198c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -845,38 +845,6 @@ int set_page_dirty_lock(struct page *pag
 EXPORT_SYMBOL(set_page_dirty_lock);
 
 /*
- * Clear a page's dirty flag, while caring for dirty memory accounting. 
- * Returns true if the page was previously dirty.
- */
-int test_clear_page_dirty(struct page *page)
-{
-   struct address_space *mapping = page_mapping(page);
-   unsigned long flags;
-
-   if (!mapping)
-   return

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Linus Torvalds

On Fri, 22 Dec 2006, Peter Zijlstra wrote:
> 
> fix page_mkclean_one()
> 
>  - add flush_cache_page() for all those virtual indexed cache
>architectures.

I think the flush_cache_page() should be after we've actually flushed it 
from the TLB and re-inserted it (this is one reason why I did the 
"ptep_exchange()" version of this). Otherwise somebody can still write to 
the page _after_ the cache flush..

>  - handle s390.

Yeah, that looks like the proper way to handle that.

That said, it looks like we still see corruption. You may not, but Martin 
and Andrei still report problems, even with all the patches (including the 
last one from Andrew that avoids "dirty" going negative under some 
circumstances, and explains the "slow and/or never completed" case that 
Gordon and Martin saw).

The good news is that I think the code now is cleaner and more 
understandable. The bad news is that nothing we've ever tried seems to 
have fixed the _problem_.

And I don't think it's page_mkclean(). Especially not since the ARM people 
are seeing this under UP without PREEMPT. In that kind of schenario, the 
only possible races tend to be from things that actually block: 
"set_page_dirty()" (which blocks on IO in balancing), memory allocations, 
and obviously doing actual IO.

And it's not a virtual cache problem, since others see it on x86.

Of course, since it's quite possibly two different issues, maybe the 
virtual cache flush is required in order to force write-back to memory 
(which in turn is required for the DMA for the actual write!). So the ARM 
issue certainly could be due to the flush_cache_page() thing...

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-22 08:30]:
> Based on the kernel gurus current knowledge of the problem, would
> you expect the corruption to occur at the same point in a file, or
> is it possible that the corruption could occur at different points
> on successive Debian installer attempts on a UP, non PREEMPT system?

Seems like it can occur anywhere.  In fact, some people see apt
problems because of filesystem corruption on the NSLU2 after they have
already installe Debian.  I've only seen this once myself and failed
many times to find a reproducible situation.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:


... and now that we've completed this step, the apt cache has suddenly
been reduced (see Gordon's mail for an explanation) and it segfaults:

sh-3.1# ls -l /var/cache/apt/
total 12524
drwxr-xr-x 3 root root   12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin
-rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin
sh-3.1# apt-get -f install
Reading package lists... Done
Segmentation faulty tree... 50%


I think that we are seeing different manifestations of apt's response
to corrupted cache files. There does not appear to be any pattern to
which manifestation occurs. Maybe it depends on where in the cache
file the corruption is located, i.e. when the corruption occurs. Based
on the kernel gurus current knowledge of the problem, would you expect
the corruption to occur at the same point in a file, or is it possible
that the corruption could occur at different points on successive
Debian installer attempts on a UP, non PREEMPT system ?

Gordon

--
Gordon Farquharson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Patrick Mau

On Fri, Dec 22, 2006 at 01:32:49PM +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
> 
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase?  It would be good
> if someone could suggest a torrent which is legal and not too large.

Hi everyone,

I have been reading this thread for the last few days, but have been
silent. I have 3 torrents here for testing, if you want.

You can easily reproduce with "rtorrent", if you:

- Have a completly downloaded one, no matter what size
- Corrupt the download with
  dd if=/dev/zero of=download.file bs=16k count=1
- Restart 'rtorrent', hash-check fails
- It will download 1 piece that was corrupted.

The important part here is that rtorrent transfers one piece,
using its own code sequence to write to the file.

Let me offer to test until Saturday afternoon CET,
I have a cloned git repository, downloaded torrent files and "apt".

My systems that are affected are:

Linux oscar 2.6.18 SMP (2x450Mhz Intel P3)
(rolled back to 2.6.18 but can boot latest git)

Linux tony 2.6.20-git UP
(can be tested using all kinds of "apt" operations)

Both machines are using:
IDE  -> MD-RAID1 -> LVM -> EXT3 (data=ordered)
SCSI -> MD-RAID5 -> .

I don't want to disturb your technical discussion,
just offering some help in testing.

Regards,
Patrick

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote:


sh-3.1# ls -l /var/cache/apt/
total 5252
drwxr-xr-x 3 root root12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin
-rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin


This listing is a little different to what I got. For me,
srcpkgcache.bin did not exist when apt eventually finished. Did you
notice whether the install took a lot longer than usual ?


Gordon, does it fail for you where it normally does (installing
initramfs-tools) or much later?  For me, the installer was able to
install initramfs-tools and the kernel, but apt now hangs at "Select
and install software".


apt didn't hang for me, it just took 20 to 30 minutes to complete
building the package database. Usually, it takes less than a minute.
The installer stopped because it could not find a kernel to install. I
have seen this failure mde before, and as you have previously pointed
out, is probably the same problem (corrupted apt cache files), just a
different manifestation.

Gordon

--
Gordon Farquharson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson


On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote:


Andrew located at least one bug: we run cancel_dirty_page() too late in
"truncate_complete_page()", which means that do_invalidatepage() ends up
not clearing the page cache.

His patch is appended.


Thanks. I'll try this out later today.


But it sounds like I probably misunderstood something, because I thought
that Martin had acknowledged that this patch actually worked for him.
Which sounded very similar to your setup (he has a 32M ARM box too, no?)


Yup, we have the same machines (Linksys NSLU2) and are running the
same test case (installing Debian). However, I'm not sure what kernel
version he had used for his latest test. I presumed 2.6.20-git,
whereas I had used 2.6.19.


Maybe it's mount option issue? I've got data=ordered on my machine, are
you perhaps runnign with something else?


We are also using ordered.

/dev/scsi/host0/bus0/target0/lun0/part1 /target ext3 rw,data=ordered 0 0

Gordon

--
Gordon Farquharson
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra


A cleanup of try_to_unmap. I have not identified any races that this
would solve, but for consistencies sake.

Also includes a small s390 optimization by moving
page_test_and_clear_dirty() out of the vma iteration.


From: Peter Zijlstra <[EMAIL PROTECTED]>

We clear the page in the following sequence:
  ClearPageDirty - lock ptl, clear pte, unlock ptl

hence we should dirty in the opposite order:
  lock ptl, clear pte, unlock ptl - SetPageDirty

try_to_unmap_one violates this by doing the SetPageDirty under the ptl.

Also move page_test_and_clear_dirty() to try_to_unmap().

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/rmap.c |   10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -590,8 +590,6 @@ void page_remove_rmap(struct page *page)
 * Leaving it set also helps swapoff to reinstate ptes
 * faster for those pages still in swapcache.
 */
-   if (page_test_and_clear_dirty(page))
-   set_page_dirty(page);
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : 
NR_FILE_MAPPED);
}
@@ -610,6 +608,7 @@ static int try_to_unmap_one(struct page 
pte_t pteval;
spinlock_t *ptl;
int ret = SWAP_AGAIN;
+   struct page *dirty_page = NULL;
 
address = vma_address(page, vma);
if (address == -EFAULT)
@@ -636,7 +635,7 @@ static int try_to_unmap_one(struct page 
 
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
-   set_page_dirty(page);
+   dirty_page = page;
 
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
@@ -687,6 +686,8 @@ static int try_to_unmap_one(struct page 
 
 out_unmap:
pte_unmap_unlock(pte, ptl);
+   if (dirty_page)
+   set_page_dirty(dirty_page);
 out:
return ret;
 }
@@ -918,6 +919,9 @@ int try_to_unmap(struct page *page, int 
else
ret = try_to_unmap_file(page, migration);
 
+   if (page_test_and_clear_dirty(page))
+   set_page_dirty(page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra

On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote:
> * Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]:
> > I've completed one installation with Linus' patch plus the two from
> > Andrew successfully, but I'm currently trying again...
> 
>  and it failed.

Since you are on ARM you might want to try with the page_mkclean_one
cleanup patch too.

Arjan agreed that the loop is not needed; we clear the pte, flush on all
CPUs and then re-establish the pte. Any race will fault and be
serialised on the pte lock.

FWIW - with todays -git and Andrews second cancel_dirty_page() patch:
  http://lkml.org/lkml/2006/12/22/49
I am unable to trigger any corruption - I could again earlier by raising
the number of seeds from 3 to 6. (am currently at 10 seeds)




From: Peter Zijlstra <[EMAIL PROTECTED]>

fix page_mkclean_one()

 - add flush_cache_page() for all those virtual indexed cache
   architectures.

 - handle s390.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/rmap.c |   38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -432,7 +432,7 @@ static int page_mkclean_one(struct page 
 {
struct mm_struct *mm = vma->vm_mm;
unsigned long address;
-   pte_t *pte, entry;
+   pte_t *pte;
spinlock_t *ptl;
int ret = 0;
 
@@ -444,17 +444,18 @@ static int page_mkclean_one(struct page 
if (!pte)
goto out;
 
-   if (!pte_dirty(*pte) && !pte_write(*pte))
-   goto unlock;
+   if (pte_dirty(*pte) || pte_write(*pte)) {
+   pte_t entry;
 
-   entry = ptep_get_and_clear(mm, address, pte);
-   entry = pte_mkclean(entry);
-   entry = pte_wrprotect(entry);
-   ptep_establish(vma, address, pte, entry);
-   lazy_mmu_prot_update(entry);
-   ret = 1;
+   flush_cache_page(vma, address, pte_pfn(*pte));
+   entry = ptep_clear_flush(vma, address, pte);
+   entry = pte_wrprotect(entry);
+   entry = pte_mkclean(entry);
+   set_pte_at(vma, address, pte, entry);
+   lazy_mmu_prot_update(entry);
+   ret = 1;
+   }
 
-unlock:
pte_unmap_unlock(pte, ptl);
 out:
return ret;
@@ -489,6 +490,8 @@ int page_mkclean(struct page *page)
if (mapping)
ret = page_mkclean_file(mapping, page);
}
+   if (page_test_and_clear_dirty(page))
+   ret = 1;
 
return ret;
 }


 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]:
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again...

... and it failed.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> With all three patches I have corruption

I've completed one installation with Linus' patch plus the two from
Andrew successfully, but I'm currently trying again... but I really
need a better testcase since an installation takes about an hour.
Andrei, which torrent do you download as a testcase?  It would be good
if someone could suggest a torrent which is legal and not too large.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa

With all three patches I have corruption


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(>private_lock);
ret = drop_buffers(page, _to_free);
spin_unlock(>private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/asm-generic/pgtable.h
b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ ({   
\
 })
 #endif
 
-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\
-({ \
-   pte_t __pte = *__ptep;  \
-   int r = 1;  \
-   if (!pte_dirty(__pte))  \
-   r = 0;  \
-   else\
-   set_pte_at((__vma)->vm_mm, (__address), (__ptep),   \
-  pte_mkclean(__pte)); \
-   r;  \
-})
-#endif
-
-#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(__vma, __address, __ptep)   \
-({ \
-   int __dirty;\
-   __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep);  \
-   if (__dirty)\
-   flush_tlb_page(__vma, __address);   \
-   __dirty;\
-})
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear(__mm, __address, __ptep)\
 ({ \
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..b61d6f9 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -300,18 +300,20 @@ do {  
\
flush_tlb_page(vma, address);   \
 } while (0)
 
-#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(vma, address, ptep) \
-({ \
-   int __dirty;\
-   __dirty = pte_dirty(*(ptep));   \
-   if (__dirty) {  \
-   clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);   \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
-   flush_tlb_page(vma, address);   \
-   }   \
-   __dirty;

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrew Morton <[EMAIL PROTECTED]> [2006-12-22 02:17]:
> > This hunk (on top of git from about 2 days ago and your latest patch)
> > results in the installer hanging right at the start.
> 
> You'll need this also:

It starts again, thanks.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:10]:
> > immediately when I started wget, the hanging apt-get process
> > continued.
> ... and now that we've completed this step, the apt cache has suddenly
> been reduced (see Gordon's mail for an explanation) and it segfaults:

One of my questions was why apt-get worked to install the
initramfs-tools, the kernel and some other packages but later hung
while it was building the cache (which clearly it had built already to
install some packages): before the installer offers to install
additional packages, it changes the apt sources, which leads to apt
rebuilding the cache, and here it hangs.

Remember how I said that downloading a file with wget prompts apt to
work again?  Apparently any filesystem access will do (I just ran
find / > /dev/null).  Gordon, can you confirm this?
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton

On Fri, 22 Dec 2006 11:00:04 +0100
Martin Michlmayr <[EMAIL PROTECTED]> wrote:

> > -   if (TestClearPageDirty(page) && account_size)
> > +   if (TestClearPageDirty(page) && account_size) {
> > +   dec_zone_page_state(page, NR_FILE_DIRTY);
> > task_io_account_cancelled_write(account_size);
> > +   }
> 
> This hunk (on top of git from about 2 days ago and your latest patch)
> results in the installer hanging right at the start. 

You'll need this also:

From: Andrew Morton <[EMAIL PROTECTED]>

Only (un)account for IO and page-dirtying for devices which have real backing
store (ie: not tmpfs or ramdisks).

Cc: "David S. Miller" <[EMAIL PROTECTED]>
Cc: Linus Torvalds <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 mm/truncate.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN mm/truncate.c~truncate-dirty-memory-accounting-fix mm/truncate.c
--- a/mm/truncate.c~truncate-dirty-memory-accounting-fix
+++ a/mm/truncate.c
@@ -60,7 +60,8 @@ void cancel_dirty_page(struct page *page
WARN_ON(++warncount < 5);
}

-   if (TestClearPageDirty(page) && account_size) {
+   if (TestClearPageDirty(page) && account_size &&
+   mapping_cap_account_dirty(page->mapping)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
task_io_account_cancelled_write(account_size);
}
_

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:06]:
> Okay, it's really weird.  So apt-get just hangs doing nothing and I
> cannot even kill it.  I just tried to download strace via wget and
> immediately when I started wget, the hanging apt-get process
> continued.

... and now that we've completed this step, the apt cache has suddenly
been reduced (see Gordon's mail for an explanation) and it segfaults:

sh-3.1# ls -l /var/cache/apt/
total 12524
drwxr-xr-x 3 root root   12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin
-rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin
sh-3.1# apt-get -f install
Reading package lists... Done
Segmentation faulty tree... 50%

-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:00]:
> This time, however, I let the installer continue and it seems that
> with your patch apt now works where it failed in the past, but it
> hangs later on.  It's pretty weird because I cannot even kill the
> process:

Okay, it's really weird.  So apt-get just hangs doing nothing and I
cannot even kill it.  I just tried to download strace via wget and
immediately when I started wget, the hanging apt-get process
continued.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-21 21:20]:
> generating these files, pkgcache.bin grows to 12582912 bytes, and when
> apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is
> 64254483 bytes. This time, when apt-get exited, it had only created
> pkgcache.bin which was still 12582912 bytes.

Yes, same here:

sh-3.1# ls -l /var/cache/apt/
total 5252
drwxr-xr-x 3 root root12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin
-rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin

Gordon, does it fail for you where it normally does (installing
initramfs-tools) or much later?  For me, the installer was able to
install initramfs-tools and the kernel, but apt now hangs at "Select
and install software".
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-21 20:54]:
> But it sounds like I probably misunderstood something, because I thought
> that Martin had acknowledged that this patch actually worked for him.

That's what I thought too but now I can confirm what Gordon sees.  But
it's pretty weird.  Our testcase is to run Debian installer on the
NSLU2 arm device and apt-get would either segfault or hang at this
particular spot in the installation (when apt is first run).  With
your patch, apt works correctly where it normally fails (at least for
me).  I stopped the installation at this point and repeated it several
more times to make sure it's really working.  And, yes, I can repeat
this result.

This time, however, I let the installer continue and it seems that
with your patch apt now works where it failed in the past, but it
hangs later on.  It's pretty weird because I cannot even kill the
process:

sh-3.1# ps aux | grep 31126
root 31126  5.7 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31157  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1# kill -9 31126
sh-3.1# kill -9 31126
sh-3.1# ps aux | grep 31126
root 31126  5.6 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31159  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1#

> Which sounded very similar to your setup (he has a 32M ARM box too, no?)

It's the same device, a Linksys NSLU2.

> Author: Andrew Morton <[EMAIL PROTECTED]>

This patch makes it even worse for me.

> - if (TestClearPageDirty(page) && account_size)
> + if (TestClearPageDirty(page) && account_size) {
> + dec_zone_page_state(page, NR_FILE_DIRTY);
>   task_io_account_cancelled_write(account_size);
> + }

This hunk (on top of git from about 2 days ago and your latest patch)
results in the installer hanging right at the start.  The Linux kernel
boots fine, the debian-installer is loaded into a ramdisk but when
ncurses is being started it just hangs.  Reverting this hunk makes it
start again.

Does that help or confuse you even more?
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Linus Torvalds [EMAIL PROTECTED] [2006-12-21 20:54]:
 But it sounds like I probably misunderstood something, because I thought
 that Martin had acknowledged that this patch actually worked for him.

That's what I thought too but now I can confirm what Gordon sees.  But
it's pretty weird.  Our testcase is to run Debian installer on the
NSLU2 arm device and apt-get would either segfault or hang at this
particular spot in the installation (when apt is first run).  With
your patch, apt works correctly where it normally fails (at least for
me).  I stopped the installation at this point and repeated it several
more times to make sure it's really working.  And, yes, I can repeat
this result.

This time, however, I let the installer continue and it seems that
with your patch apt now works where it failed in the past, but it
hangs later on.  It's pretty weird because I cannot even kill the
process:

sh-3.1# ps aux | grep 31126
root 31126  5.7 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31157  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1# kill -9 31126
sh-3.1# kill -9 31126
sh-3.1# ps aux | grep 31126
root 31126  5.6 20.6  16240  6076 ?R+   04:45   0:21 apt-get -o 
APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install 
popularity-contest
root 31159  0.0  1.6   1516   492 ttyS0S+   04:51   0:00 grep 31126
sh-3.1#

 Which sounded very similar to your setup (he has a 32M ARM box too, no?)

It's the same device, a Linksys NSLU2.

 Author: Andrew Morton [EMAIL PROTECTED]

This patch makes it even worse for me.

 - if (TestClearPageDirty(page)  account_size)
 + if (TestClearPageDirty(page)  account_size) {
 + dec_zone_page_state(page, NR_FILE_DIRTY);
   task_io_account_cancelled_write(account_size);
 + }

This hunk (on top of git from about 2 days ago and your latest patch)
results in the installer hanging right at the start.  The Linux kernel
boots fine, the debian-installer is loaded into a ramdisk but when
ncurses is being started it just hangs.  Reverting this hunk makes it
start again.

Does that help or confuse you even more?
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Gordon Farquharson [EMAIL PROTECTED] [2006-12-21 21:20]:
 generating these files, pkgcache.bin grows to 12582912 bytes, and when
 apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is
 64254483 bytes. This time, when apt-get exited, it had only created
 pkgcache.bin which was still 12582912 bytes.

Yes, same here:

sh-3.1# ls -l /var/cache/apt/
total 5252
drwxr-xr-x 3 root root12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin
-rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin

Gordon, does it fail for you where it normally does (installing
initramfs-tools) or much later?  For me, the installer was able to
install initramfs-tools and the kernel, but apt now hangs at Select
and install software.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:00]:
 This time, however, I let the installer continue and it seems that
 with your patch apt now works where it failed in the past, but it
 hangs later on.  It's pretty weird because I cannot even kill the
 process:

Okay, it's really weird.  So apt-get just hangs doing nothing and I
cannot even kill it.  I just tried to download strace via wget and
immediately when I started wget, the hanging apt-get process
continued.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:06]:
 Okay, it's really weird.  So apt-get just hangs doing nothing and I
 cannot even kill it.  I just tried to download strace via wget and
 immediately when I started wget, the hanging apt-get process
 continued.

... and now that we've completed this step, the apt cache has suddenly
been reduced (see Gordon's mail for an explanation) and it segfaults:

sh-3.1# ls -l /var/cache/apt/
total 12524
drwxr-xr-x 3 root root   12288 Dec 22 04:41 archives
-rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin
-rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin
sh-3.1# apt-get -f install
Reading package lists... Done
Segmentation faulty tree... 50%

-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton

On Fri, 22 Dec 2006 11:00:04 +0100
Martin Michlmayr [EMAIL PROTECTED] wrote:

  -   if (TestClearPageDirty(page)  account_size)
  +   if (TestClearPageDirty(page)  account_size) {
  +   dec_zone_page_state(page, NR_FILE_DIRTY);
  task_io_account_cancelled_write(account_size);
  +   }
 
 This hunk (on top of git from about 2 days ago and your latest patch)
 results in the installer hanging right at the start. 

You'll need this also:

From: Andrew Morton [EMAIL PROTECTED]

Only (un)account for IO and page-dirtying for devices which have real backing
store (ie: not tmpfs or ramdisks).

Cc: David S. Miller [EMAIL PROTECTED]
Cc: Linus Torvalds [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 mm/truncate.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff -puN mm/truncate.c~truncate-dirty-memory-accounting-fix mm/truncate.c
--- a/mm/truncate.c~truncate-dirty-memory-accounting-fix
+++ a/mm/truncate.c
@@ -60,7 +60,8 @@ void cancel_dirty_page(struct page *page
WARN_ON(++warncount  5);
}

-   if (TestClearPageDirty(page)  account_size) {
+   if (TestClearPageDirty(page)  account_size 
+   mapping_cap_account_dirty(page-mapping)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
task_io_account_cancelled_write(account_size);
}
_

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:10]:
  immediately when I started wget, the hanging apt-get process
  continued.
 ... and now that we've completed this step, the apt cache has suddenly
 been reduced (see Gordon's mail for an explanation) and it segfaults:

One of my questions was why apt-get worked to install the
initramfs-tools, the kernel and some other packages but later hung
while it was building the cache (which clearly it had built already to
install some packages): before the installer offers to install
additional packages, it changes the apt sources, which leads to apt
rebuilding the cache, and here it hangs.

Remember how I said that downloading a file with wget prompts apt to
work again?  Apparently any filesystem access will do (I just ran
find /  /dev/null).  Gordon, can you confirm this?
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrew Morton [EMAIL PROTECTED] [2006-12-22 02:17]:
  This hunk (on top of git from about 2 days ago and your latest patch)
  results in the installer hanging right at the start.
 
 You'll need this also:

It starts again, thanks.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa

With all three patches I have corruption


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(mapping-private_lock);
ret = drop_buffers(page, buffers_to_free);
spin_unlock(mapping-private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/asm-generic/pgtable.h
b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ ({   
\
 })
 #endif
 
-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\
-({ \
-   pte_t __pte = *__ptep;  \
-   int r = 1;  \
-   if (!pte_dirty(__pte))  \
-   r = 0;  \
-   else\
-   set_pte_at((__vma)-vm_mm, (__address), (__ptep),   \
-  pte_mkclean(__pte)); \
-   r;  \
-})
-#endif
-
-#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(__vma, __address, __ptep)   \
-({ \
-   int __dirty;\
-   __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep);  \
-   if (__dirty)\
-   flush_tlb_page(__vma, __address);   \
-   __dirty;\
-})
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear(__mm, __address, __ptep)\
 ({ \
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..b61d6f9 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -300,18 +300,20 @@ do {  
\
flush_tlb_page(vma, address);   \
 } while (0)
 
-#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(vma, address, ptep) \
-({ \
-   int __dirty;\
-   __dirty = pte_dirty(*(ptep));   \
-   if (__dirty) {  \
-   clear_bit(_PAGE_BIT_DIRTY, (ptep)-pte_low);   \
-   pte_update_defer((vma)-vm_mm, (address), (ptep));  \
-   flush_tlb_page(vma, address);   \
-   }   \
-   __dirty;

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]:
 With all three patches I have corruption

I've completed one installation with Linus' patch plus the two from
Andrew successfully, but I'm currently trying again... but I really
need a better testcase since an installation takes about an hour.
Andrei, which torrent do you download as a testcase?  It would be good
if someone could suggest a torrent which is legal and not too large.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr

* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]:
 I've completed one installation with Linus' patch plus the two from
 Andrew successfully, but I'm currently trying again...

... and it failed.
-- 
Martin Michlmayr
http://www.cyrius.com/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra

On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote:
 * Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]:
  I've completed one installation with Linus' patch plus the two from
  Andrew successfully, but I'm currently trying again...
 
  and it failed.

Since you are on ARM you might want to try with the page_mkclean_one
cleanup patch too.

Arjan agreed that the loop is not needed; we clear the pte, flush on all
CPUs and then re-establish the pte. Any race will fault and be
serialised on the pte lock.

FWIW - with todays -git and Andrews second cancel_dirty_page() patch:
  http://lkml.org/lkml/2006/12/22/49
I am unable to trigger any corruption - I could again earlier by raising
the number of seeds from 3 to 6. (am currently at 10 seeds)




From: Peter Zijlstra [EMAIL PROTECTED]

fix page_mkclean_one()

 - add flush_cache_page() for all those virtual indexed cache
   architectures.

 - handle s390.

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---
 mm/rmap.c |   38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -432,7 +432,7 @@ static int page_mkclean_one(struct page 
 {
struct mm_struct *mm = vma-vm_mm;
unsigned long address;
-   pte_t *pte, entry;
+   pte_t *pte;
spinlock_t *ptl;
int ret = 0;
 
@@ -444,17 +444,18 @@ static int page_mkclean_one(struct page 
if (!pte)
goto out;
 
-   if (!pte_dirty(*pte)  !pte_write(*pte))
-   goto unlock;
+   if (pte_dirty(*pte) || pte_write(*pte)) {
+   pte_t entry;
 
-   entry = ptep_get_and_clear(mm, address, pte);
-   entry = pte_mkclean(entry);
-   entry = pte_wrprotect(entry);
-   ptep_establish(vma, address, pte, entry);
-   lazy_mmu_prot_update(entry);
-   ret = 1;
+   flush_cache_page(vma, address, pte_pfn(*pte));
+   entry = ptep_clear_flush(vma, address, pte);
+   entry = pte_wrprotect(entry);
+   entry = pte_mkclean(entry);
+   set_pte_at(vma, address, pte, entry);
+   lazy_mmu_prot_update(entry);
+   ret = 1;
+   }
 
-unlock:
pte_unmap_unlock(pte, ptl);
 out:
return ret;
@@ -489,6 +490,8 @@ int page_mkclean(struct page *page)
if (mapping)
ret = page_mkclean_file(mapping, page);
}
+   if (page_test_and_clear_dirty(page))
+   ret = 1;
 
return ret;
 }


 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 >

1 - 100 of 216 matches

Mail list logo