Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 7 Jan 2007 12:36:18 +1030 "Tom Lanyon" <[EMAIL PROTECTED]> wrote: > On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > What would also actually be interesting is whether somebody can reproduce > > this on Reiserfs, for example. I _think_ all the reports I've seen are on > > ext2 or ext3, and if this is somehow writeback-related, it could be some > > bug that is just shared between the two by virtue of them still having a > > lot of stuff in common. > > > > Linus > > I've been following this thread for a while now as I started > experiencing file corruption in rtorrent when I upgraded to 2.6.19. I > am using reiserfs. reiserfs defaults to data=ordered, so it's quite possibly the same bug. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. -- Tom Lanyon - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. -- Tom Lanyon - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 1/7/07, Tom Lanyon [EMAIL PROTECTED] wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 7 Jan 2007 12:36:18 +1030 Tom Lanyon [EMAIL PROTECTED] wrote: On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. reiserfs defaults to data=ordered, so it's quite possibly the same bug. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: > > For s390 there are two aspects to consider: > 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte lock. The reason x86 needs to be careful is exactly the fact that the hardware will obviously do a lot on its own, and the hardware is _not_ going to honor our page table locking ;) In an all-sw situation, a lot of this should be easier. S390 has _other_ things that are inconvenient (the strange "dirty bit is not in the page tables" thing that makes it look different from everybody else), but hey, it's a balance.. So for s390, ptep_exchange() in my example should be able to be a simple "load old value and store new one", assuming everybody honors the pte lock (and they _should_). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: > What do you guys think? Does something like this work out for S/390 too? I > tried to make that "ptep_flush_dirty()" concept work for architectures > that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to consider: 1) the pte values are 100% software controlled. They only change because a cpu stored a value to it or issued one of the specialized instructions (csp, ipte and idte). The ptep_flush_dirty would be a nop for s390. 2) ptep_exchange is a bit dangerous. For s390 we need a lock that protects the software controlled updates of the ptes. The reason is the ipte instruction. It is implemented by the machine microcode in a non-atomic way in regard to the memory. It reads the byte of the pte that contains the invalid bit, flushes the tlb entries for it and then writes back the byte with the invalid bit set. The microcode makes sure that this pte cannot be used for form a new tlb on any cpu while the ipte is in progress. That means a compare-and-swap semantics on ptes won't work together with the ipte optimization. As long as there is the pte lock that protects all software accesses to the pte we are fine. But if any code expects that ptep_exchange does something like an xchg things break. -- blue skies, Martin. Martin Schwidefsky Linux for zSeries Development & Services IBM Deutschland Entwicklung GmbH "Reality continues to ruin my life." - Calvin. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which were fixed by upgrading to reiserfs4. Jari Sundell - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: > What would also actually be interesting is whether somebody can reproduce > this on Reiserfs, for example. I _think_ all the reports I've seen are on > ext2 or ext3, and if this is somehow writeback-related, it could be some > bug that is just shared between the two by virtue of them still having a > lot of stuff in common. > > Linus I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. flo attenberger --- Linux master 2.6.19 #1 PREEMPT Thu Dec 21 10:55:34 CET 2006 x86_64 GNU/Linux # # Automatically generated make config: don't edit # Linux kernel version: 2.6.19 # Thu Dec 21 10:45:05 2006 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_ZONE_DMA32=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_DMI=y CONFIG_AUDIT_ARCH=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_IPC_NS is not set CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # CONFIG_TASKSTATS is not set # CONFIG_UTS_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_RELAY is not set CONFIG_INITRAMFS_SOURCE="" # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y # # Block layer # CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=m CONFIG_IOSCHED_DEADLINE=m CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq" # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set CONFIG_MK8=y # CONFIG_MPSC is not set # CONFIG_GENERIC_CPU is not set CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_INTERNODE_CACHE_BYTES=64 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_MICROCODE=m CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_MTRR=y # CONFIG_SMP is not set # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_RESOURCES_64BIT=y CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_HPET_TIMER=y CONFIG_IOMMU=y # CONFIG_CALGARY_IOMMU is not set CONFIG_SWIOTLB=y CONFIG_X86_MCE=y # CONFIG_X86_MCE_INTEL is not set CONFIG_X86_MCE_AMD=y CONFIG_KEXEC=y # CONFIG_CRASH_DUMP is not set CONFIG_PHYSICAL_START=0x20 CONFIG_SECCOMP=y # CONFIG_CC_STACKPROTECTOR is not set # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000 CONFIG_REORDER=y CONFIG_K8_NB=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_ISA_DMA_API=y # # Power management options # CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set CONFIG_PM_SYSFS_DEPRECATED=y # CONFIG_SOFTWARE_SUSPEND is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_SLEEP_PROC_SLEEP is not set CONFIG_ACPI_AC=m # CONFIG_ACPI_BATTERY is not set CONFIG_ACPI_BUTTON=m CONFIG_ACPI_VIDEO=m CONFIG_ACPI_HOTKEY=m CONFIG_ACPI_FAN=m # CONFIG_ACPI_DOCK is not set CONFIG_ACPI_PROCESSOR=m CONFIG_ACPI_THERMAL=m # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: - It never uses mprotect on the shared mappings, but it _does_ do: "mincore()" - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I double- and triple-checked this one, because I did make changes to "mincore()", but those didn't go into the affected kernels anyway (ie they are not in plain 2.6.19, nor in 2.6.18.3 either) Correct, mincore is only used to check if it should delay the hash checking. "madvise(MADV_WILLNEED)" "msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag) "munmap()" of course - it never seems to mix mmap() and write() - it does _only_ mmap. - it seems to mmap/munmap the shared files in nice 64-page chunks, all 64-page aligned in the file (ie it does NOT create one big mapping, it has some kind of LRU of thse 64-page chunks). The only exception being the last chunk, which it maps byte-accurate to the size. The length of the chunks is only page aligned on single file torrents, not so on multi-file torrents. I've attached a patch for rtorrent that will extend the length to the page boundary. - I haven't checked whether it only ever has the same chunk mapped once at a time. This should be the case, but two mapped chunks may share a page, sometimes with different r/w permissions. Jari Sundell extend_mapping.diff Description: Binary data
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: snip - It never uses mprotect on the shared mappings, but it _does_ do: mincore() - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I double- and triple-checked this one, because I did make changes to mincore(), but those didn't go into the affected kernels anyway (ie they are not in plain 2.6.19, nor in 2.6.18.3 either) Correct, mincore is only used to check if it should delay the hash checking. madvise(MADV_WILLNEED) msync(MS_ASYNC) (or MS_SYNC if you use a command line flag) munmap() of course - it never seems to mix mmap() and write() - it does _only_ mmap. - it seems to mmap/munmap the shared files in nice 64-page chunks, all 64-page aligned in the file (ie it does NOT create one big mapping, it has some kind of LRU of thse 64-page chunks). The only exception being the last chunk, which it maps byte-accurate to the size. The length of the chunks is only page aligned on single file torrents, not so on multi-file torrents. I've attached a patch for rtorrent that will extend the length to the page boundary. - I haven't checked whether it only ever has the same chunk mapped once at a time. This should be the case, but two mapped chunks may share a page, sometimes with different r/w permissions. Jari Sundell extend_mapping.diff Description: Binary data
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. flo attenberger --- Linux master 2.6.19 #1 PREEMPT Thu Dec 21 10:55:34 CET 2006 x86_64 GNU/Linux # # Automatically generated make config: don't edit # Linux kernel version: 2.6.19 # Thu Dec 21 10:45:05 2006 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_ZONE_DMA32=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_DMI=y CONFIG_AUDIT_ARCH=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # Code maturity level options # CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 # # General setup # CONFIG_LOCALVERSION= CONFIG_LOCALVERSION_AUTO=y CONFIG_SWAP=y CONFIG_SYSVIPC=y # CONFIG_IPC_NS is not set CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # CONFIG_TASKSTATS is not set # CONFIG_UTS_NS is not set # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_RELAY is not set CONFIG_INITRAMFS_SOURCE= # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SHMEM=y CONFIG_SLAB=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 # CONFIG_SLOB is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y # # Block layer # CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=m CONFIG_IOSCHED_DEADLINE=m CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set CONFIG_MK8=y # CONFIG_MPSC is not set # CONFIG_GENERIC_CPU is not set CONFIG_X86_L1_CACHE_BYTES=64 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_INTERNODE_CACHE_BYTES=64 CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_MICROCODE=m CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_MTRR=y # CONFIG_SMP is not set # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y CONFIG_PREEMPT_BKL=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y # CONFIG_DISCONTIGMEM_MANUAL is not set # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_RESOURCES_64BIT=y CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_HPET_TIMER=y CONFIG_IOMMU=y # CONFIG_CALGARY_IOMMU is not set CONFIG_SWIOTLB=y CONFIG_X86_MCE=y # CONFIG_X86_MCE_INTEL is not set CONFIG_X86_MCE_AMD=y CONFIG_KEXEC=y # CONFIG_CRASH_DUMP is not set CONFIG_PHYSICAL_START=0x20 CONFIG_SECCOMP=y # CONFIG_CC_STACKPROTECTOR is not set # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000 CONFIG_REORDER=y CONFIG_K8_NB=y CONFIG_GENERIC_HARDIRQS=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_ISA_DMA_API=y # # Power management options # CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set CONFIG_PM_SYSFS_DEPRECATED=y # CONFIG_SOFTWARE_SUSPEND is not set # # ACPI (Advanced Configuration and Power Interface) Support # CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_SLEEP_PROC_FS=y # CONFIG_ACPI_SLEEP_PROC_SLEEP is not set CONFIG_ACPI_AC=m # CONFIG_ACPI_BATTERY is not set CONFIG_ACPI_BUTTON=m CONFIG_ACPI_VIDEO=m CONFIG_ACPI_HOTKEY=m CONFIG_ACPI_FAN=m # CONFIG_ACPI_DOCK is not set CONFIG_ACPI_PROCESSOR=m CONFIG_ACPI_THERMAL=m # CONFIG_ACPI_ASUS is not set # CONFIG_ACPI_IBM is not set # CONFIG_ACPI_TOSHIBA is not set CONFIG_ACPI_BLACKLIST_YEAR=0 # CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_PM_TIMER=y #
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/27/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which were fixed by upgrading to reiserfs4. Jari Sundell - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: What do you guys think? Does something like this work out for S/390 too? I tried to make that ptep_flush_dirty() concept work for architectures that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to consider: 1) the pte values are 100% software controlled. They only change because a cpu stored a value to it or issued one of the specialized instructions (csp, ipte and idte). The ptep_flush_dirty would be a nop for s390. 2) ptep_exchange is a bit dangerous. For s390 we need a lock that protects the software controlled updates of the ptes. The reason is the ipte instruction. It is implemented by the machine microcode in a non-atomic way in regard to the memory. It reads the byte of the pte that contains the invalid bit, flushes the tlb entries for it and then writes back the byte with the invalid bit set. The microcode makes sure that this pte cannot be used for form a new tlb on any cpu while the ipte is in progress. That means a compare-and-swap semantics on ptes won't work together with the ipte optimization. As long as there is the pte lock that protects all software accesses to the pte we are fine. But if any code expects that ptep_exchange does something like an xchg things break. -- blue skies, Martin. Martin Schwidefsky Linux for zSeries Development Services IBM Deutschland Entwicklung GmbH Reality continues to ruin my life. - Calvin. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: For s390 there are two aspects to consider: 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte lock. The reason x86 needs to be careful is exactly the fact that the hardware will obviously do a lot on its own, and the hardware is _not_ going to honor our page table locking ;) In an all-sw situation, a lot of this should be easier. S390 has _other_ things that are inconvenient (the strange dirty bit is not in the page tables thing that makes it look different from everybody else), but hey, it's a balance.. So for s390, ptep_exchange() in my example should be able to be a simple load old value and store new one, assuming everybody honors the pte lock (and they _should_). Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, 26 Dec 2006, Nick Piggin wrote: > Linus Torvalds wrote: > > > > Ok, so how about this diff. > > > > I'm actually feeling good about this one. It really looks like > > "do_no_page()" was simply buggy, and that this explains everything. > > Still trying to catch up here, so I'm not going to reply to any old > stuff and just start at the tip of the thread... Other than to say > that I really like cancel_page_dirty ;) Yeah, I think that part is a bit clearer about what's going on now. > I think your patch is quite right so that's a good catch. Actually, since people told me it didn't matter, I went back and looked at _why_ - the thing is, "vma->vm_page_prot" should always be read-only anyway, except for mappings that don't do dirty accounting at all, so I think my patch only found cases that are unimportant (ie pages that get faulted on on filesystems like ramfs that doesn't do any dirty page accounting because they're all dirty anyway). > But I'm not too surprised that it does not help the problem, because I > don't think we have started shedding any old pte_dirty tests at > unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, > as such. True. We should no longer _need_ those dirty bit reclaims at unmap/reclaim, but we still do them, so you're right, even if we were buggy in this area, it should only really matter for the dirty page counting, not for any lost data. > I was hoping that you've almost narrowed it down to the filesystem > writeback code, with the last few mails? I think so, yes. However, I've checked, and "rtorrent" really does seem to be fairly well-behaved wrt any filesystem activity. It does - no threading. It's 100% single-threaded, and doesn't even appear to use signals. - exactly _one_ "ftruncate()", and it does it at the beginning, for the full final size. IOW, it's not anything subtle with truncate and dirty page cancel. - It never uses mprotect on the shared mappings, but it _does_ do: "mincore()" - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I double- and triple-checked this one, because I did make changes to "mincore()", but those didn't go into the affected kernels anyway (ie they are not in plain 2.6.19, nor in 2.6.18.3 either) "madvise(MADV_WILLNEED)" "msync(MS_ASYNC)" (or MS_SYNC if you use a command line flag) "munmap()" of course - it never seems to mix mmap() and write() - it does _only_ mmap. - it seems to mmap/munmap the shared files in nice 64-page chunks, all 64-page aligned in the file (ie it does NOT create one big mapping, it has some kind of LRU of thse 64-page chunks). The only exception being the last chunk, which it maps byte-accurate to the size. - I haven't checked whether it only ever has the same chunk mapped once at a time. Anyway, the _one_ half-way interesting thing is the fact that it doesn't allocate any backing store at all for the file, and as such the page writeback needs to create all the underlying buffers on the filesystem. I really don't see why that would be a problem either, but I could imagine that if we have some writeback bug where we can end up writing back the _same_ page concurrently, we'd actually end up racing in the kernel, and allocating two different backing stores, and then maybe the other one would effectively "get lost" (and the earlier writeback would win the race, explaining why we'd end up with zeroes at the end of a block). Or something. However, all the codepaths _seem_ to test for PG_writeback, and not even try to start another writeback while the first one is still active. What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: > On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > > > Hash check on download completion found bad chunks, consider using > > > "safe_sync". > > > > Dang. Did you get any warning messages from the kernel? > > > > Linus > > BTW, rmap.c patch is broken - needs at least ... but that doesn't affect most of the architectures - only sparc64 and some of powerpc. So it's definitely not enough. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > > Linus BTW, rmap.c patch is broken - needs at least Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/mm/rmap.c b/mm/rmap.c index 57306fa..669acb2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -452,7 +452,7 @@ static int page_mkclean_one(struct page entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); - set_pte_at(vma, address, pte, entry); + set_pte_at(mm, address, pte, entry); lazy_mmu_prot_update(entry); ret = 1; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. Still trying to catch up here, so I'm not going to reply to any old stuff and just start at the tip of the thread... Other than to say that I really like cancel_page_dirty ;) I think your patch is quite right so that's a good catch. But I'm not too surprised that it does not help the problem, because I don't think we have started shedding any old pte_dirty tests at unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, as such. I was hoping that you've almost narrowed it down to the filesystem writeback code, with the last few mails? Nick Please please please test. Throw all the other patches away (with the possible exception of the "update_mmu_cache()" sanity checker, which is still interesting in case some _other_ place does this too). Don't do the "wait_on_page_writeback()" thing, because it changes timings and might hide thngs for the wrong reasons. Just apply this on top of a known failing kernel, and test. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma->vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. Still trying to catch up here, so I'm not going to reply to any old stuff and just start at the tip of the thread... Other than to say that I really like cancel_page_dirty ;) I think your patch is quite right so that's a good catch. But I'm not too surprised that it does not help the problem, because I don't think we have started shedding any old pte_dirty tests at unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, as such. I was hoping that you've almost narrowed it down to the filesystem writeback code, with the last few mails? Nick Please please please test. Throw all the other patches away (with the possible exception of the update_mmu_cache() sanity checker, which is still interesting in case some _other_ place does this too). Don't do the wait_on_page_writeback() thing, because it changes timings and might hide thngs for the wrong reasons. Just apply this on top of a known failing kernel, and test. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma-vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? Linus BTW, rmap.c patch is broken - needs at least Signed-off-by: Al Viro [EMAIL PROTECTED] --- diff --git a/mm/rmap.c b/mm/rmap.c index 57306fa..669acb2 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -452,7 +452,7 @@ static int page_mkclean_one(struct page entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); - set_pte_at(vma, address, pte, entry); + set_pte_at(mm, address, pte, entry); lazy_mmu_prot_update(entry); ret = 1; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? Linus BTW, rmap.c patch is broken - needs at least ... but that doesn't affect most of the architectures - only sparc64 and some of powerpc. So it's definitely not enough. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Tue, 26 Dec 2006, Nick Piggin wrote: Linus Torvalds wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. Still trying to catch up here, so I'm not going to reply to any old stuff and just start at the tip of the thread... Other than to say that I really like cancel_page_dirty ;) Yeah, I think that part is a bit clearer about what's going on now. I think your patch is quite right so that's a good catch. Actually, since people told me it didn't matter, I went back and looked at _why_ - the thing is, vma-vm_page_prot should always be read-only anyway, except for mappings that don't do dirty accounting at all, so I think my patch only found cases that are unimportant (ie pages that get faulted on on filesystems like ramfs that doesn't do any dirty page accounting because they're all dirty anyway). But I'm not too surprised that it does not help the problem, because I don't think we have started shedding any old pte_dirty tests at unmap/reclaim-time, have we? So the dirty bit isn't going to get lost, as such. True. We should no longer _need_ those dirty bit reclaims at unmap/reclaim, but we still do them, so you're right, even if we were buggy in this area, it should only really matter for the dirty page counting, not for any lost data. I was hoping that you've almost narrowed it down to the filesystem writeback code, with the last few mails? I think so, yes. However, I've checked, and rtorrent really does seem to be fairly well-behaved wrt any filesystem activity. It does - no threading. It's 100% single-threaded, and doesn't even appear to use signals. - exactly _one_ ftruncate(), and it does it at the beginning, for the full final size. IOW, it's not anything subtle with truncate and dirty page cancel. - It never uses mprotect on the shared mappings, but it _does_ do: mincore() - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I double- and triple-checked this one, because I did make changes to mincore(), but those didn't go into the affected kernels anyway (ie they are not in plain 2.6.19, nor in 2.6.18.3 either) madvise(MADV_WILLNEED) msync(MS_ASYNC) (or MS_SYNC if you use a command line flag) munmap() of course - it never seems to mix mmap() and write() - it does _only_ mmap. - it seems to mmap/munmap the shared files in nice 64-page chunks, all 64-page aligned in the file (ie it does NOT create one big mapping, it has some kind of LRU of thse 64-page chunks). The only exception being the last chunk, which it maps byte-accurate to the size. - I haven't checked whether it only ever has the same chunk mapped once at a time. Anyway, the _one_ half-way interesting thing is the fact that it doesn't allocate any backing store at all for the file, and as such the page writeback needs to create all the underlying buffers on the filesystem. I really don't see why that would be a problem either, but I could imagine that if we have some writeback bug where we can end up writing back the _same_ page concurrently, we'd actually end up racing in the kernel, and allocating two different backing stores, and then maybe the other one would effectively get lost (and the earlier writeback would win the race, explaining why we'd end up with zeroes at the end of a block). Or something. However, all the codepaths _seem_ to test for PG_writeback, and not even try to start another writeback while the first one is still active. What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just shared between the two by virtue of them still having a lot of stuff in common. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]: > And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
> Quoting Linus Torvalds <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content > corruption on ext3) > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > - writable > - not already marked dirty in the PTE > > because that combination means that the hardware can mark the PTE dirty > without us even realizing (and thus not marking the "struct page *" > dirty). Er. Sorry about bumping in, and I'm not sure I understand all of the discussion, but this reminded me of an old issue with COW that created what looks like a vaguely similiar data corruption on infiniband. We solved this for infiniband with MADV_DONTFORK, but I always wondered why does it not affect other parts of kernel. Small reminder from that discussion: down mmap sem get user pages up mmap sem page becomes shared, and COW (e.g. fork) process writes to first byte of page <- gets a copy Now we had a problem: struct page that we got from get user pages does not point to a correct page in our process. For example: if at some point we map this page for DMA, and hardware writes to last byte of page -> process does not see this data. So for infiniband, what we do is a combination of - prevent page from becoming COW while hardware might DMA to this page, and - ask users not to write to page if hardware might DMA to same page (even if its using different bytes). I just wandered - is there some chance something like this could be happening in the fs code? HTH, -- MST - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early Christmas present :-( Gordon -- Gordon Farquharson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > only these: ACPI: EC: evaluating _Q80 ACPI: EC: evaluating _Q80 ACPI: EC: evaluating _Q80 but I don't think has anything to do with... > Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrei Popa wrote: > > Hash check on download completion found bad chunks, consider using > "safe_sync". Dang. Did you get any warning messages from the kernel? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > > this patch and 2.6.19. > > Yeah, if my guess about do_no_page() is right, _none_ of the previous > patches should have ANY effect what-so-ever. In fact, I'd say that even > the "ext3 works in writeback mode" thing that Andrei reports is probably a > total fluke brought on by timing changes rather than anything else. > > So please try the latest patch instead (on top of anything that shows > corruption reliably - the patch should be _totally_ independent of all the > other issues, and I think it will apply cleanly on top of 2.6.18.3 and > 2.6.19 too, so anything that shows corruption is a fine target - but try > to choose something that has been the "best" at corrupting things for you, > to make the testing as good as possible). > > Patch included here again (although I think you were cc'd on my previous > email too, so you should already have it, and our emails just crossed) > > And if this doesn't fix it, I don't know what will.. With latest git and patches: http://lkml.org/lkml/diff/2006/12/24/56/1 http://lkml.org/lkml/diff/2006/12/24/61/1 Hash check on download completion found bad chunks, consider using "safe_sync". > > Linus > > --- > diff --git a/mm/memory.c b/mm/memory.c > index 563792f..cf429c4 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2247,21 +2249,23 @@ retry: > if (pte_none(*page_table)) { > flush_icache_page(vma, new_page); > entry = mk_pte(new_page, vma->vm_page_prot); > - if (write_access) > - entry = maybe_mkwrite(pte_mkdirty(entry), vma); > - set_pte_at(mm, address, page_table, entry); > if (anon) { > inc_mm_counter(mm, anon_rss); > lru_cache_add_active(new_page); > page_add_new_anon_rmap(new_page, vma, address); > + if (write_access) > + entry = maybe_mkwrite(pte_mkdirty(entry), vma); > } else { > inc_mm_counter(mm, file_rss); > page_add_file_rmap(new_page); > + entry = pte_wrprotect(entry); > if (write_access) { > dirty_page = new_page; > get_page(dirty_page); > + entry = maybe_mkwrite(pte_mkdirty(entry), vma); > } > } > + set_pte_at(mm, address, page_table, entry); > } else { > /* One of our sibling threads was faster, back out. */ > page_cache_release(new_page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the "ext3 works in writeback mode" thing that Andrei reports is probably a total fluke brought on by timing changes rather than anything else. So please try the latest patch instead (on top of anything that shows corruption reliably - the patch should be _totally_ independent of all the other issues, and I think it will apply cleanly on top of 2.6.18.3 and 2.6.19 too, so anything that shows corruption is a fine target - but try to choose something that has been the "best" at corrupting things for you, to make the testing as good as possible). Patch included here again (although I think you were cc'd on my previous email too, so you should already have it, and our emails just crossed) And if this doesn't fix it, I don't know what will.. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma->vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to re-map a shared page into its address space and dirty it. I haven't tested it, and maybe it misses some case, but it looks likea good way to try to avoid races with marking pages dirty and the writeback phase .. The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Gordon diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c --- linux-2.6.19.orig/fs/buffer.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700 @@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag spin_lock(>private_lock); ret = drop_buffers(page, _to_free); spin_unlock(>private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -*/ - clear_page_dirty(page); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c --- linux-2.6.19.orig/fs/hugetlbfs/inode.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/hugetlbfs/inode.c 2006-12-21 01:15:21.0 -0700 @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h --- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700 @@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc struct page; /* forward declaration */ -int test_clear_page_dirty(struct page *page); +extern void cancel_dirty_page(struct page *page, unsigned int account_size); + int test_clear_page_writeback(struct page *page); int test_set_page_writeback(struct page *page); -static inline void clear_page_dirty(struct page *page) -{ - test_clear_page_dirty(page); -} - static inline void set_page_writeback(struct page *page) { test_set_page_writeback(page); diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c --- linux-2.6.19.orig/mm/memory.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/mm/memory.c2006-12-24 11:04:03.0 -0700 @@ -1534,6 +1534,7 @@ static int do_wp_page(struct mm_struct * if (!pte_same(*page_table, orig_pte)) goto unlock; } + wait_on_page_writeback(old_page); dirty_page = old_page; get_page(dirty_page); reuse = 1; @@ -1832,6 +1833,33 @@ void unmap_mapping_range(struct address_ } EXPORT_SYMBOL(unmap_mapping_range); +static void check_last_page(struct address_space *mapping, loff_t size) +{ + pgoff_t index; + unsigned int offset; + struct page *page; + + if (!mapping) + return; + offset = size & ~PAGE_MASK; + if (!offset) + return; + index = size >> PAGE_SHIFT; + page = find_lock_page(mapping, index); + if (page) { + unsigned int check = 0; + unsigned char *kaddr = kmap_atomic(page, KM_USER0); + do { + check += kaddr[offset++]; + } while (offset < PAGE_SIZE); + kunmap_atomic(kaddr,KM_USER0); + unlock_page(page); + page_cache_release(page); + if (check) + printk("%s: BADNESS: truncate check %u\n", current->comm, check); + } +} + /** * vmtruncate - unmap mappings "freed" by truncate() syscall * @inode: inode of the file used @@ -1865,6 +1893,7 @@ do_expand: goto out_sig; if (offset
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > - writable > - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. Please please please test. Throw all the other patches away (with the possible exception of the "update_mmu_cache()" sanity checker, which is still interesting in case some _other_ place does this too). Don't do the "wait_on_page_writeback()" thing, because it changes timings and might hide thngs for the wrong reasons. Just apply this on top of a known failing kernel, and test. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma->vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > How about this particularly stupid diff? (please test with something that > _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE because that combination means that the hardware can mark the PTE dirty without us even realizing (and thus not marking the "struct page *" dirty). (The above is actually a valid situation for IO mappings, but not for "real" mappings. And IO mappings should never take page faults, I think). So, with that in mind, I wrote this stupid patch (for 32-bit x86, since I used my Mac Mini for testing ratehr than my main machine - but the x86-64 version should be pretty much identcal).. And you know what, Peter? It triggers for me. I get WARNING at mm/memory.c:2274 do_no_page() [] show_trace_log_lvl+0x1a/0x2f [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] __handle_mm_fault+0x38d/0x919 [] do_page_fault+0x1ff/0x507 [] error_code+0x7c/0x84 which seems to say that do_no_page() can be used to insert shared and non-dirty, but still writable, pages. But maybe my patch is just bogus, and I didn't think it through. Peter, I realize it's Christmas Eve, but let's face it, Santa appreciates good boys and girls, and we all want tons of loot. So please be good, and waste some time looking at this and tell me why I'm either wrong, or there's a real smoking gun here.. ;) Linus --- diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h index e6a4723..1389bb7 100644 --- a/include/asm-i386/pgtable.h +++ b/include/asm-i386/pgtable.h @@ -494,7 +494,13 @@ do { \ * The i386 doesn't have any external MMU info: the kernel page * tables contain all the necessary information. */ -#define update_mmu_cache(vma,address,pte) do { } while (0) +#define bad_shared_pte(pte) (pte_write(pte) && !pte_dirty(pte)) +#define update_mmu_cache(vma,address,pte) do { \ + static int __cnt; \ + WARN_ON(((vma)->vm_flags & VM_SHARED) \ +&& bad_shared_pte(pte) \ +&& ++__cnt < 5); \ +} while (0) #endif /* !__ASSEMBLY__ */ #ifdef CONFIG_FLATMEM - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > > > > > I don't have corruption. I tested twice. > > > > > > This is a surprising result. Can you pleas retest ext3 > > > data=writeback,nobh? > > > > Yes, no corruption. Also tested only with data=writeback and had no > > corruption. > > Ok, so it would seem to be writeback related _somehow_. However, most of > the differences (I _thought_) in ext3 actually show up only if you have > *both* "nobh" and "data=writeback", and as far as I can tell, just a > simple "data=writeback" should still use the bog-standard > "block_write_full_page()". > > Andrew? > > Although as far as I can see, then ext2 should work as-is too (since it > too also just uses "block_write_full_page()" without anything fancy). ext2 uses the multipage-bio assembly code for writeback whereas ext3 doesn't. But ext3 doesn't use that code in data=ordered mode, of course. Still, this: --- a/fs/ext2/inode.c~a +++ a/fs/ext2/inode.c @@ -693,7 +693,7 @@ const struct address_space_operations ex .commit_write = generic_commit_write, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, - .writepages = ext2_writepages, +// .writepages = ext2_writepages, .migratepage= buffer_migrate_page, }; @@ -711,7 +711,7 @@ const struct address_space_operations ex .commit_write = nobh_commit_write, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, - .writepages = ext2_writepages, +// .writepages = ext2_writepages, .migratepage= buffer_migrate_page, }; _ will switch it off for ext2. > Strange. > > How about this particularly stupid diff? (please test with something that > _would_ cause corruption normally). > > It is _entirely_ untested, but what it tries to do is to simply serialize > any writeback in progress with any process that tries to re-map a shared > page into its address space and dirty it. I haven't tested it, and maybe > it misses some case, but it looks likea good way to try to avoid races > with marking pages dirty and the writeback phase .. > > Linus > --- > diff --git a/mm/memory.c b/mm/memory.c > index 563792f..64ed10b 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct > vm_area_struct *vma, > if (!pte_same(*page_table, orig_pte)) > goto unlock; > } > + wait_on_page_writeback(old_page); > dirty_page = old_page; > get_page(dirty_page); > reuse = 1; > @@ -2215,6 +2216,7 @@ retry: > page_cache_release(new_page); > return VM_FAULT_SIGBUS; > } > + wait_on_page_writeback(new_page); > } > } yup. Also, we could perhaps lock the target page during pagefaults.. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrei Popa wrote: > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > > > I don't have corruption. I tested twice. > > > > This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? > > Yes, no corruption. Also tested only with data=writeback and had no > corruption. Ok, so it would seem to be writeback related _somehow_. However, most of the differences (I _thought_) in ext3 actually show up only if you have *both* "nobh" and "data=writeback", and as far as I can tell, just a simple "data=writeback" should still use the bog-standard "block_write_full_page()". Andrew? Although as far as I can see, then ext2 should work as-is too (since it too also just uses "block_write_full_page()" without anything fancy). Strange. How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to re-map a shared page into its address space and dirty it. I haven't tested it, and maybe it misses some case, but it looks likea good way to try to avoid races with marking pages dirty and the writeback phase .. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..64ed10b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, if (!pte_same(*page_table, orig_pte)) goto unlock; } + wait_on_page_writeback(old_page); dirty_page = old_page; get_page(dirty_page); reuse = 1; @@ -2215,6 +2216,7 @@ retry: page_cache_release(new_page); return VM_FAULT_SIGBUS; } + wait_on_page_writeback(new_page); } } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 14:14:38 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > - mount the fs with ext2 with the no-buffer-head option. That means > > > either: > > > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > > /etc/fstab: ext2 nobh > > > > ierdnac ~ # mount > > /dev/sda7 on / type ext2 (rw,noatime,nobh) > > > > I have corruption. > > > > > > > > - mount the fs with ext3 data=writeback, nobh > > > > > > grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this > > > works) > > > /etc/fstab: ext2 data=writeback,nobh > > > > ierdnac ~ # mount > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > ierdnac ~ # dmesg|grep EXT3 > > EXT3-fs: mounted filesystem with writeback data mode. > > EXT3 FS on sda7, internal journal > > > > I don't have corruption. I tested twice. > > This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? Yes, no corruption. Also tested only with data=writeback and had no corruption. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]: > /etc/fstab: ext2 nobh > /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > /etc/fstab: ext2 nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext2 (rw,noatime,nobh) > > I have corruption. > > > > > - mount the fs with ext3 data=writeback, nobh > > > > grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this > > works) > > /etc/fstab: ext2 data=writeback,nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > ierdnac ~ # dmesg|grep EXT3 > EXT3-fs: mounted filesystem with writeback data mode. > EXT3 FS on sda7, internal journal > > I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: > On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > I now _suspect_ that we're talking about something like > > > > > > - we started a writeout. The IO is still pending, and the page was > > >marked clean and is now in the "writeback" phase. > > > - a write happens to the page, and the page gets marked dirty again. > > >Marking the page dirty also marks all the _buffers_ in the page dirty, > > >but they were actually already dirty, because the IO hasn't completed > > >yet. > > > - the IO from the _previous_ write completes, and marks the buffers > > > clean > > >again. > > > > Some things for the testers to try, please: > > > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > /etc/fstab: ext2 nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext2 (rw,noatime,nobh) > > I have corruption. > > > > > - mount the fs with ext3 data=writeback, nobh > > > > grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this > > works) > > /etc/fstab: ext2 data=writeback,nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > ierdnac ~ # dmesg|grep EXT3 > EXT3-fs: mounted filesystem with writeback data mode. > EXT3 FS on sda7, internal journal > > I don't have corruption. I tested twice. > I also tested with ext3 ordered, nobh and I have file corruption... > > > > if that still fails we can rule out buffer_head funnies. > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked clean and is now in the "writeback" phase. > > - a write happens to the page, and the page gets marked dirty again. > >Marking the page dirty also marks all the _buffers_ in the page dirty, > >but they were actually already dirty, because the IO hasn't completed > >yet. > > - the IO from the _previous_ write completes, and marks the buffers clean > >again. > > Some things for the testers to try, please: > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > grub.conf: rootfstype=ext2 rootflags=nobh > /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I have corruption. > > - mount the fs with ext3 data=writeback, nobh > > grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this > works) > /etc/fstab: ext2 data=writeback,nobh ierdnac ~ # mount /dev/sda7 on / type ext3 (rw,noatime,nobh) ierdnac ~ # dmesg|grep EXT3 EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda7, internal journal I don't have corruption. I tested twice. > > if that still fails we can rule out buffer_head funnies. > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrew Morton wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked clean and is now in the "writeback" phase. > > - a write happens to the page, and the page gets marked dirty again. > >Marking the page dirty also marks all the _buffers_ in the page dirty, > >but they were actually already dirty, because the IO hasn't completed > >yet. > > - the IO from the _previous_ write completes, and marks the buffers clean > >again. > > Some things for the testers to try, please: > > - mount the fs with ext2 with the no-buffer-head option. That means either: [ snip snip ] This is definitely worth testing, but the exact schenario I outlined is probably not the thing that happens. It was really meant to be more of an exmple of the _kind_ of situation I think we might have. That would explain why we didn't see this before: we simply didn't mark pages clean all that aggressively, and an app like rtorrent would normally have caused its flushes to happen _synchronously_ by using msync() (even if the IO itself was done asynchronously, all the dirty bit stuff would be synchronous wrt any rtorrent behaviour). And the things that /did/ use to clean pages asynchronously (VM scanning) would always actually look at the "young" bit (aka "accessed") and not even touch the dirty bit if an application had accessed the page recently, so that basically avoided any likely races, because we'd touch the dirty bit ONLY if the page was "cold". So this is why I'm saying that it might be an old bug, and it would be just the new pattern of handling dirty bits that triggers it. But avoiding buffer heads and testing that part is worth doing. Just to remove one thing from the equation. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > I now _suspect_ that we're talking about something like > > - we started a writeout. The IO is still pending, and the page was >marked clean and is now in the "writeback" phase. > - a write happens to the page, and the page gets marked dirty again. >Marking the page dirty also marks all the _buffers_ in the page dirty, >but they were actually already dirty, because the IO hasn't completed >yet. > - the IO from the _previous_ write completes, and marks the buffers clean >again. Some things for the testers to try, please: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh if that still fails we can rule out buffer_head funnies. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > Is there any way to provide any debugging information that may help > solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. > Would it help to know the nature of the corruption e.g. an analysis > of the corruption in the file ? I actually think we know that, because Andrei already gave details. The corruption seems to be basically a few pages that get zeroes at the end rather than the expected contents. That's consistent with the page being written out once, but then _not_ getting written out again despite being dirtied some more. But if you see ay other pattern, please holler, because that would be interesting. > BTW, I decided to try Linus's test program [1] on ARM (I don't think > that anybody had tried it on ARM before). You get the expected results, and in fact, I'd be very surprised if you didn't. It's something subtler than that going on. I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the "writeback" phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. And no, thatr's not actually what is going on. The thing is, we actually clear the buffer dirty bits when we start the IO, not when we end it, but I think it is going to be this _kind_ of situation, where we missed something, and marked it clean too late, and thus cleared a dirty bit. I don't think it's a page table issue any more, it just doesn't look likely with the ARM UP corruption. It's also not apparently even on a cacheline boundary, so it probably is really a dirty bit that got cleared wrogn due to some race with IO. But right now we're all clueless. I personally suspect it's not even a new bug: it's probably an old bug that simply didn't matter before. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again together with Linus' patch and the two from Andrew and it still fails. (For reference, the patch is attached.) I can confirm this behaviour with 2.6.19 and the patches mentioned above (cumulative patch for 2.6.19 appended to the end of this email). Is there any way to provide any debugging information that may help solve the problem ? Would it help to know the nature of the corruption e.g. an analysis of the corruption in the file ? I have previously asked apt developers if they wanted to look at the corrupted cache files, but there were no takers then. BTW, I decided to try Linus's test program [1] on ARM (I don't think that anybody had tried it on ARM before). Since we see file corruption with 2.6.18 + [PATCH] mm: tracking shared dirty pages [2], I ran Linus's program on machines with the following setups: 2.6.18 + the following patches mm: tracking shared dirty pages [2] mm: balance dirty pages [3] mm: optimize the new mprotect() code a bit [4] mm: small cleanup of install_page() [5] mm: fixup do_wp_page() [6] mm: msync() cleanup [7] $ ./mm-test | od -x 000 020 040 050 2.6.18 (no mm patches) $ ./mm-test | od -x 000 020 040 050 I don't know if this helps at all. Gordon [1] http://lkml.org/lkml/2006/12/19/200 [2] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89 [3] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=edc79b2a46ed854595e40edcf3f8b37f9f14aa3f [4] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1e6098b23bb46e2b488fe9a26f831f867157483 [5] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e88dd6c11c5aef74d8b74a062767add53315533b [6] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ee6a6457886a80415db209e87033b63f2b06558c [7] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=204ec841fbea3e5138168edbc3a76d46747cc987 diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c --- linux-2.6.19.orig/fs/buffer.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700 @@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag spin_lock(>private_lock); ret = drop_buffers(page, _to_free); spin_unlock(>private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -*/ - clear_page_dirty(page); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c --- linux-2.6.19.orig/fs/hugetlbfs/inode.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/hugetlbfs/inode.c 2006-12-21 01:15:21.0 -0700 @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h --- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700 @@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc struct page; /* forward declaration */ -int test_clear_page_dirty(struct page *page); +extern void cancel_dirty_page(struct page *page, unsigned int account_size); + int test_clear_page_writeback(struct
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/22/06, Martin Michlmayr [EMAIL PROTECTED] wrote: * Peter Zijlstra [EMAIL PROTECTED] [2006-12-22 14:25]: and it failed. Since you are on ARM you might want to try with the page_mkclean_one cleanup patch too. I've already tried it and it didn't work. I just tried it again together with Linus' patch and the two from Andrew and it still fails. (For reference, the patch is attached.) I can confirm this behaviour with 2.6.19 and the patches mentioned above (cumulative patch for 2.6.19 appended to the end of this email). Is there any way to provide any debugging information that may help solve the problem ? Would it help to know the nature of the corruption e.g. an analysis of the corruption in the file ? I have previously asked apt developers if they wanted to look at the corrupted cache files, but there were no takers then. BTW, I decided to try Linus's test program [1] on ARM (I don't think that anybody had tried it on ARM before). Since we see file corruption with 2.6.18 + [PATCH] mm: tracking shared dirty pages [2], I ran Linus's program on machines with the following setups: 2.6.18 + the following patches mm: tracking shared dirty pages [2] mm: balance dirty pages [3] mm: optimize the new mprotect() code a bit [4] mm: small cleanup of install_page() [5] mm: fixup do_wp_page() [6] mm: msync() cleanup [7] $ ./mm-test | od -x 000 020 040 050 2.6.18 (no mm patches) $ ./mm-test | od -x 000 020 040 050 I don't know if this helps at all. Gordon [1] http://lkml.org/lkml/2006/12/19/200 [2] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89 [3] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=edc79b2a46ed854595e40edcf3f8b37f9f14aa3f [4] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c1e6098b23bb46e2b488fe9a26f831f867157483 [5] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e88dd6c11c5aef74d8b74a062767add53315533b [6] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ee6a6457886a80415db209e87033b63f2b06558c [7] http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=204ec841fbea3e5138168edbc3a76d46747cc987 diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c --- linux-2.6.19.orig/fs/buffer.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700 @@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag spin_lock(mapping-private_lock); ret = drop_buffers(page, buffers_to_free); spin_unlock(mapping-private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -*/ - clear_page_dirty(page); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c --- linux-2.6.19.orig/fs/hugetlbfs/inode.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/hugetlbfs/inode.c 2006-12-21 01:15:21.0 -0700 @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h --- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700 @@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc struct page; /* forward declaration */ -int test_clear_page_dirty(struct page *page); +extern void cancel_dirty_page(struct page *page, unsigned int account_size); + int
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Gordon Farquharson wrote: Is there any way to provide any debugging information that may help solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. Would it help to know the nature of the corruption e.g. an analysis of the corruption in the file ? I actually think we know that, because Andrei already gave details. The corruption seems to be basically a few pages that get zeroes at the end rather than the expected contents. That's consistent with the page being written out once, but then _not_ getting written out again despite being dirtied some more. But if you see ay other pattern, please holler, because that would be interesting. BTW, I decided to try Linus's test program [1] on ARM (I don't think that anybody had tried it on ARM before). You get the expected results, and in fact, I'd be very surprised if you didn't. It's something subtler than that going on. I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. And no, thatr's not actually what is going on. The thing is, we actually clear the buffer dirty bits when we start the IO, not when we end it, but I think it is going to be this _kind_ of situation, where we missed something, and marked it clean too late, and thus cleared a dirty bit. I don't think it's a page table issue any more, it just doesn't look likely with the ARM UP corruption. It's also not apparently even on a cacheline boundary, so it probably is really a dirty bit that got cleared wrogn due to some race with IO. But right now we're all clueless. I personally suspect it's not even a new bug: it's probably an old bug that simply didn't matter before. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. Some things for the testers to try, please: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh if that still fails we can rule out buffer_head funnies. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrew Morton wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. Some things for the testers to try, please: - mount the fs with ext2 with the no-buffer-head option. That means either: [ snip snip ] This is definitely worth testing, but the exact schenario I outlined is probably not the thing that happens. It was really meant to be more of an exmple of the _kind_ of situation I think we might have. That would explain why we didn't see this before: we simply didn't mark pages clean all that aggressively, and an app like rtorrent would normally have caused its flushes to happen _synchronously_ by using msync() (even if the IO itself was done asynchronously, all the dirty bit stuff would be synchronous wrt any rtorrent behaviour). And the things that /did/ use to clean pages asynchronously (VM scanning) would always actually look at the young bit (aka accessed) and not even touch the dirty bit if an application had accessed the page recently, so that basically avoided any likely races, because we'd touch the dirty bit ONLY if the page was cold. So this is why I'm saying that it might be an old bug, and it would be just the new pattern of handling dirty bits that triggers it. But avoiding buffer heads and testing that part is worth doing. Just to remove one thing from the equation. Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. Some things for the testers to try, please: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I have corruption. - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh ierdnac ~ # mount /dev/sda7 on / type ext3 (rw,noatime,nobh) ierdnac ~ # dmesg|grep EXT3 EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda7, internal journal I don't have corruption. I tested twice. if that still fails we can rule out buffer_head funnies. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty again. Marking the page dirty also marks all the _buffers_ in the page dirty, but they were actually already dirty, because the IO hasn't completed yet. - the IO from the _previous_ write completes, and marks the buffers clean again. Some things for the testers to try, please: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I have corruption. - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh ierdnac ~ # mount /dev/sda7 on / type ext3 (rw,noatime,nobh) ierdnac ~ # dmesg|grep EXT3 EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda7, internal journal I don't have corruption. I tested twice. I also tested with ext3 ordered, nobh and I have file corruption... if that still fails we can rule out buffer_head funnies. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I have corruption. - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh ierdnac ~ # mount /dev/sda7 on / type ext3 (rw,noatime,nobh) ierdnac ~ # dmesg|grep EXT3 EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda7, internal journal I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrew Morton [EMAIL PROTECTED] [2006-12-24 00:57]: /etc/fstab: ext2 nobh /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I have corruption. - mount the fs with ext3 data=writeback, nobh grub.conf: rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works) /etc/fstab: ext2 data=writeback,nobh ierdnac ~ # mount /dev/sda7 on / type ext3 (rw,noatime,nobh) ierdnac ~ # dmesg|grep EXT3 EXT3-fs: mounted filesystem with writeback data mode. EXT3 FS on sda7, internal journal I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? Yes, no corruption. Also tested only with data=writeback and had no corruption. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? Yes, no corruption. Also tested only with data=writeback and had no corruption. Ok, so it would seem to be writeback related _somehow_. However, most of the differences (I _thought_) in ext3 actually show up only if you have *both* nobh and data=writeback, and as far as I can tell, just a simple data=writeback should still use the bog-standard block_write_full_page(). Andrew? Although as far as I can see, then ext2 should work as-is too (since it too also just uses block_write_full_page() without anything fancy). Strange. How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to re-map a shared page into its address space and dirty it. I haven't tested it, and maybe it misses some case, but it looks likea good way to try to avoid races with marking pages dirty and the writeback phase .. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..64ed10b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, if (!pte_same(*page_table, orig_pte)) goto unlock; } + wait_on_page_writeback(old_page); dirty_page = old_page; get_page(dirty_page); reuse = 1; @@ -2215,6 +2216,7 @@ retry: page_cache_release(new_page); return VM_FAULT_SIGBUS; } + wait_on_page_writeback(new_page); } } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3 data=writeback,nobh? Yes, no corruption. Also tested only with data=writeback and had no corruption. Ok, so it would seem to be writeback related _somehow_. However, most of the differences (I _thought_) in ext3 actually show up only if you have *both* nobh and data=writeback, and as far as I can tell, just a simple data=writeback should still use the bog-standard block_write_full_page(). Andrew? Although as far as I can see, then ext2 should work as-is too (since it too also just uses block_write_full_page() without anything fancy). ext2 uses the multipage-bio assembly code for writeback whereas ext3 doesn't. But ext3 doesn't use that code in data=ordered mode, of course. Still, this: --- a/fs/ext2/inode.c~a +++ a/fs/ext2/inode.c @@ -693,7 +693,7 @@ const struct address_space_operations ex .commit_write = generic_commit_write, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, - .writepages = ext2_writepages, +// .writepages = ext2_writepages, .migratepage= buffer_migrate_page, }; @@ -711,7 +711,7 @@ const struct address_space_operations ex .commit_write = nobh_commit_write, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, - .writepages = ext2_writepages, +// .writepages = ext2_writepages, .migratepage= buffer_migrate_page, }; _ will switch it off for ext2. Strange. How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to re-map a shared page into its address space and dirty it. I haven't tested it, and maybe it misses some case, but it looks likea good way to try to avoid races with marking pages dirty and the writeback phase .. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..64ed10b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma, if (!pte_same(*page_table, orig_pte)) goto unlock; } + wait_on_page_writeback(old_page); dirty_page = old_page; get_page(dirty_page); reuse = 1; @@ -2215,6 +2216,7 @@ retry: page_cache_release(new_page); return VM_FAULT_SIGBUS; } + wait_on_page_writeback(new_page); } } yup. Also, we could perhaps lock the target page during pagefaults.. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Linus Torvalds wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE because that combination means that the hardware can mark the PTE dirty without us even realizing (and thus not marking the struct page * dirty). (The above is actually a valid situation for IO mappings, but not for real mappings. And IO mappings should never take page faults, I think). So, with that in mind, I wrote this stupid patch (for 32-bit x86, since I used my Mac Mini for testing ratehr than my main machine - but the x86-64 version should be pretty much identcal).. And you know what, Peter? It triggers for me. I get WARNING at mm/memory.c:2274 do_no_page() [c0103d4a] show_trace_log_lvl+0x1a/0x2f [c010436c] show_trace+0x12/0x14 [c01043f0] dump_stack+0x16/0x18 [c0159790] __handle_mm_fault+0x38d/0x919 [c011c8c4] do_page_fault+0x1ff/0x507 [c03fabcc] error_code+0x7c/0x84 which seems to say that do_no_page() can be used to insert shared and non-dirty, but still writable, pages. But maybe my patch is just bogus, and I didn't think it through. Peter, I realize it's Christmas Eve, but let's face it, Santa appreciates good boys and girls, and we all want tons of loot. So please be good, and waste some time looking at this and tell me why I'm either wrong, or there's a real smoking gun here.. ;) Linus --- diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h index e6a4723..1389bb7 100644 --- a/include/asm-i386/pgtable.h +++ b/include/asm-i386/pgtable.h @@ -494,7 +494,13 @@ do { \ * The i386 doesn't have any external MMU info: the kernel page * tables contain all the necessary information. */ -#define update_mmu_cache(vma,address,pte) do { } while (0) +#define bad_shared_pte(pte) (pte_write(pte) !pte_dirty(pte)) +#define update_mmu_cache(vma,address,pte) do { \ + static int __cnt; \ + WARN_ON(((vma)-vm_flags VM_SHARED) \ + bad_shared_pte(pte) \ + ++__cnt 5); \ +} while (0) #endif /* !__ASSEMBLY__ */ #ifdef CONFIG_FLATMEM - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. Please please please test. Throw all the other patches away (with the possible exception of the update_mmu_cache() sanity checker, which is still interesting in case some _other_ place does this too). Don't do the wait_on_page_writeback() thing, because it changes timings and might hide thngs for the wrong reasons. Just apply this on top of a known failing kernel, and test. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma-vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to re-map a shared page into its address space and dirty it. I haven't tested it, and maybe it misses some case, but it looks likea good way to try to avoid races with marking pages dirty and the writeback phase .. The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Gordon diff -Naupr linux-2.6.19.orig/fs/buffer.c linux-2.6.19/fs/buffer.c --- linux-2.6.19.orig/fs/buffer.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/buffer.c2006-12-21 01:16:31.0 -0700 @@ -2832,7 +2832,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2843,17 +2843,6 @@ int try_to_free_buffers(struct page *pag spin_lock(mapping-private_lock); ret = drop_buffers(page, buffers_to_free); spin_unlock(mapping-private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -*/ - clear_page_dirty(page); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff -Naupr linux-2.6.19.orig/fs/hugetlbfs/inode.c linux-2.6.19/fs/hugetlbfs/inode.c --- linux-2.6.19.orig/fs/hugetlbfs/inode.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/fs/hugetlbfs/inode.c 2006-12-21 01:15:21.0 -0700 @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff -Naupr linux-2.6.19.orig/include/linux/page-flags.h linux-2.6.19/include/linux/page-flags.h --- linux-2.6.19.orig/include/linux/page-flags.h2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/include/linux/page-flags.h 2006-12-21 01:15:21.0 -0700 @@ -253,15 +253,11 @@ static inline void SetPageUptodate(struc struct page; /* forward declaration */ -int test_clear_page_dirty(struct page *page); +extern void cancel_dirty_page(struct page *page, unsigned int account_size); + int test_clear_page_writeback(struct page *page); int test_set_page_writeback(struct page *page); -static inline void clear_page_dirty(struct page *page) -{ - test_clear_page_dirty(page); -} - static inline void set_page_writeback(struct page *page) { test_set_page_writeback(page); diff -Naupr linux-2.6.19.orig/mm/memory.c linux-2.6.19/mm/memory.c --- linux-2.6.19.orig/mm/memory.c 2006-11-29 14:57:37.0 -0700 +++ linux-2.6.19/mm/memory.c2006-12-24 11:04:03.0 -0700 @@ -1534,6 +1534,7 @@ static int do_wp_page(struct mm_struct * if (!pte_same(*page_table, orig_pte)) goto unlock; } + wait_on_page_writeback(old_page); dirty_page = old_page; get_page(dirty_page); reuse = 1; @@ -1832,6 +1833,33 @@ void unmap_mapping_range(struct address_ } EXPORT_SYMBOL(unmap_mapping_range); +static void check_last_page(struct address_space *mapping, loff_t size) +{ + pgoff_t index; + unsigned int offset; + struct page *page; + + if (!mapping) + return; + offset = size ~PAGE_MASK; + if (!offset) + return; + index = size PAGE_SHIFT; + page = find_lock_page(mapping, index); + if (page) { + unsigned int check = 0; + unsigned char *kaddr = kmap_atomic(page, KM_USER0); + do { + check += kaddr[offset++]; + } while (offset PAGE_SIZE); + kunmap_atomic(kaddr,KM_USER0); + unlock_page(page); + page_cache_release(page); + if (check) + printk(%s: BADNESS: truncate check %u\n, current-comm, check); + } +} + /** * vmtruncate - unmap mappings freed by truncate() syscall * @inode: inode of the file used @@ -1865,6 +1893,7 @@ do_expand: goto out_sig;
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Gordon Farquharson wrote: The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the ext3 works in writeback mode thing that Andrei reports is probably a total fluke brought on by timing changes rather than anything else. So please try the latest patch instead (on top of anything that shows corruption reliably - the patch should be _totally_ independent of all the other issues, and I think it will apply cleanly on top of 2.6.18.3 and 2.6.19 too, so anything that shows corruption is a fine target - but try to choose something that has been the best at corrupting things for you, to make the testing as good as possible). Patch included here again (although I think you were cc'd on my previous email too, so you should already have it, and our emails just crossed) And if this doesn't fix it, I don't know what will.. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma-vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Gordon Farquharson wrote: The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the ext3 works in writeback mode thing that Andrei reports is probably a total fluke brought on by timing changes rather than anything else. So please try the latest patch instead (on top of anything that shows corruption reliably - the patch should be _totally_ independent of all the other issues, and I think it will apply cleanly on top of 2.6.18.3 and 2.6.19 too, so anything that shows corruption is a fine target - but try to choose something that has been the best at corrupting things for you, to make the testing as good as possible). Patch included here again (although I think you were cc'd on my previous email too, so you should already have it, and our emails just crossed) And if this doesn't fix it, I don't know what will.. With latest git and patches: http://lkml.org/lkml/diff/2006/12/24/56/1 http://lkml.org/lkml/diff/2006/12/24/61/1 Hash check on download completion found bad chunks, consider using safe_sync. Linus --- diff --git a/mm/memory.c b/mm/memory.c index 563792f..cf429c4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2247,21 +2249,23 @@ retry: if (pte_none(*page_table)) { flush_icache_page(vma, new_page); entry = mk_pte(new_page, vma-vm_page_prot); - if (write_access) - entry = maybe_mkwrite(pte_mkdirty(entry), vma); - set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); lru_cache_add_active(new_page); page_add_new_anon_rmap(new_page, vma, address); + if (write_access) + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } else { inc_mm_counter(mm, file_rss); page_add_file_rmap(new_page); + entry = pte_wrprotect(entry); if (write_access) { dirty_page = new_page; get_page(dirty_page); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } } + set_pte_at(mm, address, page_table, entry); } else { /* One of our sibling threads was faster, back out. */ page_cache_release(new_page); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? only these: ACPI: EC: evaluating _Q80 ACPI: EC: evaluating _Q80 ACPI: EC: evaluating _Q80 but I don't think has anything to do with... Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early Christmas present :-( Gordon -- Gordon Farquharson - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
Quoting Linus Torvalds [EMAIL PROTECTED]: Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3) Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE because that combination means that the hardware can mark the PTE dirty without us even realizing (and thus not marking the struct page * dirty). Er. Sorry about bumping in, and I'm not sure I understand all of the discussion, but this reminded me of an old issue with COW that created what looks like a vaguely similiar data corruption on infiniband. We solved this for infiniband with MADV_DONTFORK, but I always wondered why does it not affect other parts of kernel. Small reminder from that discussion: down mmap sem get user pages up mmap sem page becomes shared, and COW (e.g. fork) process writes to first byte of page - gets a copy Now we had a problem: struct page that we got from get user pages does not point to a correct page in our process. For example: if at some point we map this page for DMA, and hardware writes to last byte of page - process does not see this data. So for infiniband, what we do is a combination of - prevent page from becoming COW while hardware might DMA to this page, and - ask users not to write to page if hardware might DMA to same page (even if its using different bytes). I just wandered - is there some chance something like this could be happening in the fs code? HTH, -- MST - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Linus Torvalds [EMAIL PROTECTED] [2006-12-24 11:35]: And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... but I really > need a better testcase since an installation takes about an hour. > Andrei, which torrent do you download as a testcase? It would be good > if someone could suggest a torrent which is legal and not too large. It's a 1.4GB file torrent split in 84 rar files and there are many seeders. I download with ~ 5MB/sec. The torrent is private. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote: * Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]: With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an hour. Andrei, which torrent do you download as a testcase? It would be good if someone could suggest a torrent which is legal and not too large. It's a 1.4GB file torrent split in 84 rar files and there are many seeders. I download with ~ 5MB/sec. The torrent is private. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again together with Linus' patch and the two from Andrew and it still fails. (For reference, the patch is attached.) -- Martin Michlmayr http://www.cyrius.com/ --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag spin_lock(>private_lock); ret = drop_buffers(page, _to_free); spin_unlock(>private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -* -* Also, during truncate, discard_buffer will have marked all -* the page's buffers clean. We discover that here and clean -* the page also. -*/ - if (test_clear_page_dirty(page)) - task_io_account_cancelled_write(PAGE_CACHE_SIZE); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ed2c223..4f4cd13 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 4830a3b..350878a 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -253,15 +253,11 @@ #define ClearPageUncached(page) clear_bi struct page; /* forward declaration */ -int test_clear_page_dirty(struct page *page); +extern void cancel_dirty_page(struct page *page, unsigned int account_size); + int test_clear_page_writeback(struct page *page); int test_set_page_writeback(struct page *page); -static inline void clear_page_dirty(struct page *page) -{ - test_clear_page_dirty(page); -} - static inline void set_page_writeback(struct page *page) { test_set_page_writeback(page); diff --git a/mm/memory.c b/mm/memory.c index c00bac6..79cecab 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_ } EXPORT_SYMBOL(unmap_mapping_range); +static void check_last_page(struct address_space *mapping, loff_t size) +{ + pgoff_t index; + unsigned int offset; + struct page *page; + + if (!mapping) + return; + offset = size & ~PAGE_MASK; + if (!offset) + return; + index = size >> PAGE_SHIFT; + page = find_lock_page(mapping, index); + if (page) { + unsigned int check = 0; + unsigned char *kaddr = kmap_atomic(page, KM_USER0); + do { + check += kaddr[offset++]; + } while (offset < PAGE_SIZE); + kunmap_atomic(kaddr,KM_USER0); + unlock_page(page); + page_cache_release(page); + if (check) + printk("%s: BADNESS: truncate check %u\n", current->comm, check); + } +} + /** * vmtruncate - unmap mappings "freed" by truncate() syscall * @inode: inode of the file used @@ -1875,6 +1902,7 @@ do_expand: goto out_sig; if (offset > inode->i_sb->s_maxbytes) goto out_big; + check_last_page(mapping, inode->i_size); i_size_write(inode, offset); out_truncate: diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 237107c..b3a198c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -845,38 +845,6 @@ int set_page_dirty_lock(struct page *pag EXPORT_SYMBOL(set_page_dirty_lock); /* - * Clear a page's dirty flag, while caring for dirty memory accounting. - * Returns true if the page was previously dirty. - */ -int test_clear_page_dirty(struct page *page) -{ - struct address_space *mapping = page_mapping(page); - unsigned long flags; - - if (!mapping) - return
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 22 Dec 2006, Peter Zijlstra wrote: > > fix page_mkclean_one() > > - add flush_cache_page() for all those virtual indexed cache >architectures. I think the flush_cache_page() should be after we've actually flushed it from the TLB and re-inserted it (this is one reason why I did the "ptep_exchange()" version of this). Otherwise somebody can still write to the page _after_ the cache flush.. > - handle s390. Yeah, that looks like the proper way to handle that. That said, it looks like we still see corruption. You may not, but Martin and Andrei still report problems, even with all the patches (including the last one from Andrew that avoids "dirty" going negative under some circumstances, and explains the "slow and/or never completed" case that Gordon and Martin saw). The good news is that I think the code now is cleaner and more understandable. The bad news is that nothing we've ever tried seems to have fixed the _problem_. And I don't think it's page_mkclean(). Especially not since the ARM people are seeing this under UP without PREEMPT. In that kind of schenario, the only possible races tend to be from things that actually block: "set_page_dirty()" (which blocks on IO in balancing), memory allocations, and obviously doing actual IO. And it's not a virtual cache problem, since others see it on x86. Of course, since it's quite possibly two different issues, maybe the virtual cache flush is required in order to force write-back to memory (which in turn is required for the DMA for the actual write!). So the ARM issue certainly could be due to the flush_cache_page() thing... Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-22 08:30]: > Based on the kernel gurus current knowledge of the problem, would > you expect the corruption to occur at the same point in a file, or > is it possible that the corruption could occur at different points > on successive Debian installer attempts on a UP, non PREEMPT system? Seems like it can occur anywhere. In fact, some people see apt problems because of filesystem corruption on the NSLU2 after they have already installe Debian. I've only seen this once myself and failed many times to find a reproducible situation. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: sh-3.1# ls -l /var/cache/apt/ total 12524 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin -rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin sh-3.1# apt-get -f install Reading package lists... Done Segmentation faulty tree... 50% I think that we are seeing different manifestations of apt's response to corrupted cache files. There does not appear to be any pattern to which manifestation occurs. Maybe it depends on where in the cache file the corruption is located, i.e. when the corruption occurs. Based on the kernel gurus current knowledge of the problem, would you expect the corruption to occur at the same point in a file, or is it possible that the corruption could occur at different points on successive Debian installer attempts on a UP, non PREEMPT system ? Gordon -- Gordon Farquharson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, Dec 22, 2006 at 01:32:49PM +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... but I really > need a better testcase since an installation takes about an hour. > Andrei, which torrent do you download as a testcase? It would be good > if someone could suggest a torrent which is legal and not too large. Hi everyone, I have been reading this thread for the last few days, but have been silent. I have 3 torrents here for testing, if you want. You can easily reproduce with "rtorrent", if you: - Have a completly downloaded one, no matter what size - Corrupt the download with dd if=/dev/zero of=download.file bs=16k count=1 - Restart 'rtorrent', hash-check fails - It will download 1 piece that was corrupted. The important part here is that rtorrent transfers one piece, using its own code sequence to write to the file. Let me offer to test until Saturday afternoon CET, I have a cloned git repository, downloaded torrent files and "apt". My systems that are affected are: Linux oscar 2.6.18 SMP (2x450Mhz Intel P3) (rolled back to 2.6.18 but can boot latest git) Linux tony 2.6.20-git UP (can be tested using all kinds of "apt" operations) Both machines are using: IDE -> MD-RAID1 -> LVM -> EXT3 (data=ordered) SCSI -> MD-RAID5 -> . I don't want to disturb your technical discussion, just offering some help in testing. Regards, Patrick - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: sh-3.1# ls -l /var/cache/apt/ total 5252 drwxr-xr-x 3 root root12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin -rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin This listing is a little different to what I got. For me, srcpkgcache.bin did not exist when apt eventually finished. Did you notice whether the install took a lot longer than usual ? Gordon, does it fail for you where it normally does (installing initramfs-tools) or much later? For me, the installer was able to install initramfs-tools and the kernel, but apt now hangs at "Select and install software". apt didn't hang for me, it just took 20 to 30 minutes to complete building the package database. Usually, it takes less than a minute. The installer stopped because it could not find a kernel to install. I have seen this failure mde before, and as you have previously pointed out, is probably the same problem (corrupted apt cache files), just a different manifestation. Gordon -- Gordon Farquharson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Andrew located at least one bug: we run cancel_dirty_page() too late in "truncate_complete_page()", which means that do_invalidatepage() ends up not clearing the page cache. His patch is appended. Thanks. I'll try this out later today. But it sounds like I probably misunderstood something, because I thought that Martin had acknowledged that this patch actually worked for him. Which sounded very similar to your setup (he has a 32M ARM box too, no?) Yup, we have the same machines (Linksys NSLU2) and are running the same test case (installing Debian). However, I'm not sure what kernel version he had used for his latest test. I presumed 2.6.20-git, whereas I had used 2.6.19. Maybe it's mount option issue? I've got data=ordered on my machine, are you perhaps runnign with something else? We are also using ordered. /dev/scsi/host0/bus0/target0/lun0/part1 /target ext3 rw,data=ordered 0 0 Gordon -- Gordon Farquharson - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
A cleanup of try_to_unmap. I have not identified any races that this would solve, but for consistencies sake. Also includes a small s390 optimization by moving page_test_and_clear_dirty() out of the vma iteration. From: Peter Zijlstra <[EMAIL PROTECTED]> We clear the page in the following sequence: ClearPageDirty - lock ptl, clear pte, unlock ptl hence we should dirty in the opposite order: lock ptl, clear pte, unlock ptl - SetPageDirty try_to_unmap_one violates this by doing the SetPageDirty under the ptl. Also move page_test_and_clear_dirty() to try_to_unmap(). Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> --- mm/rmap.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) Index: linux-2.6/mm/rmap.c === --- linux-2.6.orig/mm/rmap.c +++ linux-2.6/mm/rmap.c @@ -590,8 +590,6 @@ void page_remove_rmap(struct page *page) * Leaving it set also helps swapoff to reinstate ptes * faster for those pages still in swapcache. */ - if (page_test_and_clear_dirty(page)) - set_page_dirty(page); __dec_zone_page_state(page, PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED); } @@ -610,6 +608,7 @@ static int try_to_unmap_one(struct page pte_t pteval; spinlock_t *ptl; int ret = SWAP_AGAIN; + struct page *dirty_page = NULL; address = vma_address(page, vma); if (address == -EFAULT) @@ -636,7 +635,7 @@ static int try_to_unmap_one(struct page /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) - set_page_dirty(page); + dirty_page = page; /* Update high watermark before we lower rss */ update_hiwater_rss(mm); @@ -687,6 +686,8 @@ static int try_to_unmap_one(struct page out_unmap: pte_unmap_unlock(pte, ptl); + if (dirty_page) + set_page_dirty(dirty_page); out: return ret; } @@ -918,6 +919,9 @@ int try_to_unmap(struct page *page, int else ret = try_to_unmap_file(page, migration); + if (page_test_and_clear_dirty(page)) + set_page_dirty(page); + if (!page_mapped(page)) ret = SWAP_SUCCESS; return ret; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote: > * Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > > I've completed one installation with Linus' patch plus the two from > > Andrew successfully, but I'm currently trying again... > > and it failed. Since you are on ARM you might want to try with the page_mkclean_one cleanup patch too. Arjan agreed that the loop is not needed; we clear the pte, flush on all CPUs and then re-establish the pte. Any race will fault and be serialised on the pte lock. FWIW - with todays -git and Andrews second cancel_dirty_page() patch: http://lkml.org/lkml/2006/12/22/49 I am unable to trigger any corruption - I could again earlier by raising the number of seeds from 3 to 6. (am currently at 10 seeds) From: Peter Zijlstra <[EMAIL PROTECTED]> fix page_mkclean_one() - add flush_cache_page() for all those virtual indexed cache architectures. - handle s390. Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> --- mm/rmap.c | 38 +- 1 file changed, 25 insertions(+), 13 deletions(-) Index: linux-2.6/mm/rmap.c === --- linux-2.6.orig/mm/rmap.c +++ linux-2.6/mm/rmap.c @@ -432,7 +432,7 @@ static int page_mkclean_one(struct page { struct mm_struct *mm = vma->vm_mm; unsigned long address; - pte_t *pte, entry; + pte_t *pte; spinlock_t *ptl; int ret = 0; @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page if (!pte) goto out; - if (!pte_dirty(*pte) && !pte_write(*pte)) - goto unlock; + if (pte_dirty(*pte) || pte_write(*pte)) { + pte_t entry; - entry = ptep_get_and_clear(mm, address, pte); - entry = pte_mkclean(entry); - entry = pte_wrprotect(entry); - ptep_establish(vma, address, pte, entry); - lazy_mmu_prot_update(entry); - ret = 1; + flush_cache_page(vma, address, pte_pfn(*pte)); + entry = ptep_clear_flush(vma, address, pte); + entry = pte_wrprotect(entry); + entry = pte_mkclean(entry); + set_pte_at(vma, address, pte, entry); + lazy_mmu_prot_update(entry); + ret = 1; + } -unlock: pte_unmap_unlock(pte, ptl); out: return ret; @@ -489,6 +490,8 @@ int page_mkclean(struct page *page) if (mapping) ret = page_mkclean_file(mapping, page); } + if (page_test_and_clear_dirty(page)) + ret = 1; return ret; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... ... and it failed. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an hour. Andrei, which torrent do you download as a testcase? It would be good if someone could suggest a torrent which is legal and not too large. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
With all three patches I have corruption diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag spin_lock(>private_lock); ret = drop_buffers(page, _to_free); spin_unlock(>private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -* -* Also, during truncate, discard_buffer will have marked all -* the page's buffers clean. We discover that here and clean -* the page also. -*/ - if (test_clear_page_dirty(page)) - task_io_account_cancelled_write(PAGE_CACHE_SIZE); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ed2c223..4f4cd13 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 9d774d0..8879f1d 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -61,31 +61,6 @@ ({ \ }) #endif -#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY -#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\ -({ \ - pte_t __pte = *__ptep; \ - int r = 1; \ - if (!pte_dirty(__pte)) \ - r = 0; \ - else\ - set_pte_at((__vma)->vm_mm, (__address), (__ptep), \ - pte_mkclean(__pte)); \ - r; \ -}) -#endif - -#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH -#define ptep_clear_flush_dirty(__vma, __address, __ptep) \ -({ \ - int __dirty;\ - __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep); \ - if (__dirty)\ - flush_tlb_page(__vma, __address); \ - __dirty;\ -}) -#endif - #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR #define ptep_get_and_clear(__mm, __address, __ptep)\ ({ \ diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h index e6a4723..b61d6f9 100644 --- a/include/asm-i386/pgtable.h +++ b/include/asm-i386/pgtable.h @@ -300,18 +300,20 @@ do { \ flush_tlb_page(vma, address); \ } while (0) -#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH -#define ptep_clear_flush_dirty(vma, address, ptep) \ -({ \ - int __dirty;\ - __dirty = pte_dirty(*(ptep)); \ - if (__dirty) { \ - clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low); \ - pte_update_defer((vma)->vm_mm, (address), (ptep)); \ - flush_tlb_page(vma, address); \ - } \ - __dirty;
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-22 02:17]: > > This hunk (on top of git from about 2 days ago and your latest patch) > > results in the installer hanging right at the start. > > You'll need this also: It starts again, thanks. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:10]: > > immediately when I started wget, the hanging apt-get process > > continued. > ... and now that we've completed this step, the apt cache has suddenly > been reduced (see Gordon's mail for an explanation) and it segfaults: One of my questions was why apt-get worked to install the initramfs-tools, the kernel and some other packages but later hung while it was building the cache (which clearly it had built already to install some packages): before the installer offers to install additional packages, it changes the apt sources, which leads to apt rebuilding the cache, and here it hangs. Remember how I said that downloading a file with wget prompts apt to work again? Apparently any filesystem access will do (I just ran find / > /dev/null). Gordon, can you confirm this? -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 22 Dec 2006 11:00:04 +0100 Martin Michlmayr <[EMAIL PROTECTED]> wrote: > > - if (TestClearPageDirty(page) && account_size) > > + if (TestClearPageDirty(page) && account_size) { > > + dec_zone_page_state(page, NR_FILE_DIRTY); > > task_io_account_cancelled_write(account_size); > > + } > > This hunk (on top of git from about 2 days ago and your latest patch) > results in the installer hanging right at the start. You'll need this also: From: Andrew Morton <[EMAIL PROTECTED]> Only (un)account for IO and page-dirtying for devices which have real backing store (ie: not tmpfs or ramdisks). Cc: "David S. Miller" <[EMAIL PROTECTED]> Cc: Linus Torvalds <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- mm/truncate.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff -puN mm/truncate.c~truncate-dirty-memory-accounting-fix mm/truncate.c --- a/mm/truncate.c~truncate-dirty-memory-accounting-fix +++ a/mm/truncate.c @@ -60,7 +60,8 @@ void cancel_dirty_page(struct page *page WARN_ON(++warncount < 5); } - if (TestClearPageDirty(page) && account_size) { + if (TestClearPageDirty(page) && account_size && + mapping_cap_account_dirty(page->mapping)) { dec_zone_page_state(page, NR_FILE_DIRTY); task_io_account_cancelled_write(account_size); } _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:06]: > Okay, it's really weird. So apt-get just hangs doing nothing and I > cannot even kill it. I just tried to download strace via wget and > immediately when I started wget, the hanging apt-get process > continued. ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: sh-3.1# ls -l /var/cache/apt/ total 12524 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin -rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin sh-3.1# apt-get -f install Reading package lists... Done Segmentation faulty tree... 50% -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:00]: > This time, however, I let the installer continue and it seems that > with your patch apt now works where it failed in the past, but it > hangs later on. It's pretty weird because I cannot even kill the > process: Okay, it's really weird. So apt-get just hangs doing nothing and I cannot even kill it. I just tried to download strace via wget and immediately when I started wget, the hanging apt-get process continued. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-21 21:20]: > generating these files, pkgcache.bin grows to 12582912 bytes, and when > apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is > 64254483 bytes. This time, when apt-get exited, it had only created > pkgcache.bin which was still 12582912 bytes. Yes, same here: sh-3.1# ls -l /var/cache/apt/ total 5252 drwxr-xr-x 3 root root12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin -rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin Gordon, does it fail for you where it normally does (installing initramfs-tools) or much later? For me, the installer was able to install initramfs-tools and the kernel, but apt now hangs at "Select and install software". -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-21 20:54]: > But it sounds like I probably misunderstood something, because I thought > that Martin had acknowledged that this patch actually worked for him. That's what I thought too but now I can confirm what Gordon sees. But it's pretty weird. Our testcase is to run Debian installer on the NSLU2 arm device and apt-get would either segfault or hang at this particular spot in the installation (when apt is first run). With your patch, apt works correctly where it normally fails (at least for me). I stopped the installation at this point and repeated it several more times to make sure it's really working. And, yes, I can repeat this result. This time, however, I let the installer continue and it seems that with your patch apt now works where it failed in the past, but it hangs later on. It's pretty weird because I cannot even kill the process: sh-3.1# ps aux | grep 31126 root 31126 5.7 20.6 16240 6076 ?R+ 04:45 0:21 apt-get -o APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install popularity-contest root 31157 0.0 1.6 1516 492 ttyS0S+ 04:51 0:00 grep 31126 sh-3.1# kill -9 31126 sh-3.1# kill -9 31126 sh-3.1# ps aux | grep 31126 root 31126 5.6 20.6 16240 6076 ?R+ 04:45 0:21 apt-get -o APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install popularity-contest root 31159 0.0 1.6 1516 492 ttyS0S+ 04:51 0:00 grep 31126 sh-3.1# > Which sounded very similar to your setup (he has a 32M ARM box too, no?) It's the same device, a Linksys NSLU2. > Author: Andrew Morton <[EMAIL PROTECTED]> This patch makes it even worse for me. > - if (TestClearPageDirty(page) && account_size) > + if (TestClearPageDirty(page) && account_size) { > + dec_zone_page_state(page, NR_FILE_DIRTY); > task_io_account_cancelled_write(account_size); > + } This hunk (on top of git from about 2 days ago and your latest patch) results in the installer hanging right at the start. The Linux kernel boots fine, the debian-installer is loaded into a ramdisk but when ncurses is being started it just hangs. Reverting this hunk makes it start again. Does that help or confuse you even more? -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Linus Torvalds [EMAIL PROTECTED] [2006-12-21 20:54]: But it sounds like I probably misunderstood something, because I thought that Martin had acknowledged that this patch actually worked for him. That's what I thought too but now I can confirm what Gordon sees. But it's pretty weird. Our testcase is to run Debian installer on the NSLU2 arm device and apt-get would either segfault or hang at this particular spot in the installation (when apt is first run). With your patch, apt works correctly where it normally fails (at least for me). I stopped the installation at this point and repeated it several more times to make sure it's really working. And, yes, I can repeat this result. This time, however, I let the installer continue and it seems that with your patch apt now works where it failed in the past, but it hangs later on. It's pretty weird because I cannot even kill the process: sh-3.1# ps aux | grep 31126 root 31126 5.7 20.6 16240 6076 ?R+ 04:45 0:21 apt-get -o APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install popularity-contest root 31157 0.0 1.6 1516 492 ttyS0S+ 04:51 0:00 grep 31126 sh-3.1# kill -9 31126 sh-3.1# kill -9 31126 sh-3.1# ps aux | grep 31126 root 31126 5.6 20.6 16240 6076 ?R+ 04:45 0:21 apt-get -o APT::Status-Fd=4 -o APT::Keep-Fds::=5 -o APT::Keep-Fds::=6 -q -y -f install popularity-contest root 31159 0.0 1.6 1516 492 ttyS0S+ 04:51 0:00 grep 31126 sh-3.1# Which sounded very similar to your setup (he has a 32M ARM box too, no?) It's the same device, a Linksys NSLU2. Author: Andrew Morton [EMAIL PROTECTED] This patch makes it even worse for me. - if (TestClearPageDirty(page) account_size) + if (TestClearPageDirty(page) account_size) { + dec_zone_page_state(page, NR_FILE_DIRTY); task_io_account_cancelled_write(account_size); + } This hunk (on top of git from about 2 days ago and your latest patch) results in the installer hanging right at the start. The Linux kernel boots fine, the debian-installer is loaded into a ramdisk but when ncurses is being started it just hangs. Reverting this hunk makes it start again. Does that help or confuse you even more? -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Gordon Farquharson [EMAIL PROTECTED] [2006-12-21 21:20]: generating these files, pkgcache.bin grows to 12582912 bytes, and when apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is 64254483 bytes. This time, when apt-get exited, it had only created pkgcache.bin which was still 12582912 bytes. Yes, same here: sh-3.1# ls -l /var/cache/apt/ total 5252 drwxr-xr-x 3 root root12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin -rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin Gordon, does it fail for you where it normally does (installing initramfs-tools) or much later? For me, the installer was able to install initramfs-tools and the kernel, but apt now hangs at Select and install software. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:00]: This time, however, I let the installer continue and it seems that with your patch apt now works where it failed in the past, but it hangs later on. It's pretty weird because I cannot even kill the process: Okay, it's really weird. So apt-get just hangs doing nothing and I cannot even kill it. I just tried to download strace via wget and immediately when I started wget, the hanging apt-get process continued. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:06]: Okay, it's really weird. So apt-get just hangs doing nothing and I cannot even kill it. I just tried to download strace via wget and immediately when I started wget, the hanging apt-get process continued. ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: sh-3.1# ls -l /var/cache/apt/ total 12524 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 6426885 Dec 22 05:03 pkgcache.bin -rw-r--r-- 1 root root 6426835 Dec 22 05:03 srcpkgcache.bin sh-3.1# apt-get -f install Reading package lists... Done Segmentation faulty tree... 50% -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 22 Dec 2006 11:00:04 +0100 Martin Michlmayr [EMAIL PROTECTED] wrote: - if (TestClearPageDirty(page) account_size) + if (TestClearPageDirty(page) account_size) { + dec_zone_page_state(page, NR_FILE_DIRTY); task_io_account_cancelled_write(account_size); + } This hunk (on top of git from about 2 days ago and your latest patch) results in the installer hanging right at the start. You'll need this also: From: Andrew Morton [EMAIL PROTECTED] Only (un)account for IO and page-dirtying for devices which have real backing store (ie: not tmpfs or ramdisks). Cc: David S. Miller [EMAIL PROTECTED] Cc: Linus Torvalds [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- mm/truncate.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff -puN mm/truncate.c~truncate-dirty-memory-accounting-fix mm/truncate.c --- a/mm/truncate.c~truncate-dirty-memory-accounting-fix +++ a/mm/truncate.c @@ -60,7 +60,8 @@ void cancel_dirty_page(struct page *page WARN_ON(++warncount 5); } - if (TestClearPageDirty(page) account_size) { + if (TestClearPageDirty(page) account_size + mapping_cap_account_dirty(page-mapping)) { dec_zone_page_state(page, NR_FILE_DIRTY); task_io_account_cancelled_write(account_size); } _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:10]: immediately when I started wget, the hanging apt-get process continued. ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: One of my questions was why apt-get worked to install the initramfs-tools, the kernel and some other packages but later hung while it was building the cache (which clearly it had built already to install some packages): before the installer offers to install additional packages, it changes the apt sources, which leads to apt rebuilding the cache, and here it hangs. Remember how I said that downloading a file with wget prompts apt to work again? Apparently any filesystem access will do (I just ran find / /dev/null). Gordon, can you confirm this? -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrew Morton [EMAIL PROTECTED] [2006-12-22 02:17]: This hunk (on top of git from about 2 days ago and your latest patch) results in the installer hanging right at the start. You'll need this also: It starts again, thanks. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
With all three patches I have corruption diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag spin_lock(mapping-private_lock); ret = drop_buffers(page, buffers_to_free); spin_unlock(mapping-private_lock); - if (ret) { - /* -* If the filesystem writes its buffers by hand (eg ext3) -* then we can have clean buffers against a dirty page. We -* clean the page here; otherwise later reattachment of buffers -* could encounter a non-uptodate page, which is unresolvable. -* This only applies in the rare case where try_to_free_buffers -* succeeds but the page is not freed. -* -* Also, during truncate, discard_buffer will have marked all -* the page's buffers clean. We discover that here and clean -* the page also. -*/ - if (test_clear_page_dirty(page)) - task_io_account_cancelled_write(PAGE_CACHE_SIZE); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index ed2c223..4f4cd13 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct static void truncate_huge_page(struct page *page) { - clear_page_dirty(page); + cancel_dirty_page(page, /* No IO accounting for huge pages? */0); ClearPageUptodate(page); remove_from_page_cache(page); put_page(page); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 9d774d0..8879f1d 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -61,31 +61,6 @@ ({ \ }) #endif -#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY -#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\ -({ \ - pte_t __pte = *__ptep; \ - int r = 1; \ - if (!pte_dirty(__pte)) \ - r = 0; \ - else\ - set_pte_at((__vma)-vm_mm, (__address), (__ptep), \ - pte_mkclean(__pte)); \ - r; \ -}) -#endif - -#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH -#define ptep_clear_flush_dirty(__vma, __address, __ptep) \ -({ \ - int __dirty;\ - __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep); \ - if (__dirty)\ - flush_tlb_page(__vma, __address); \ - __dirty;\ -}) -#endif - #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR #define ptep_get_and_clear(__mm, __address, __ptep)\ ({ \ diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h index e6a4723..b61d6f9 100644 --- a/include/asm-i386/pgtable.h +++ b/include/asm-i386/pgtable.h @@ -300,18 +300,20 @@ do { \ flush_tlb_page(vma, address); \ } while (0) -#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH -#define ptep_clear_flush_dirty(vma, address, ptep) \ -({ \ - int __dirty;\ - __dirty = pte_dirty(*(ptep)); \ - if (__dirty) { \ - clear_bit(_PAGE_BIT_DIRTY, (ptep)-pte_low); \ - pte_update_defer((vma)-vm_mm, (address), (ptep)); \ - flush_tlb_page(vma, address); \ - } \ - __dirty;
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]: With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an hour. Andrei, which torrent do you download as a testcase? It would be good if someone could suggest a torrent which is legal and not too large. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]: I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... ... and it failed. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote: * Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]: I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... and it failed. Since you are on ARM you might want to try with the page_mkclean_one cleanup patch too. Arjan agreed that the loop is not needed; we clear the pte, flush on all CPUs and then re-establish the pte. Any race will fault and be serialised on the pte lock. FWIW - with todays -git and Andrews second cancel_dirty_page() patch: http://lkml.org/lkml/2006/12/22/49 I am unable to trigger any corruption - I could again earlier by raising the number of seeds from 3 to 6. (am currently at 10 seeds) From: Peter Zijlstra [EMAIL PROTECTED] fix page_mkclean_one() - add flush_cache_page() for all those virtual indexed cache architectures. - handle s390. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/rmap.c | 38 +- 1 file changed, 25 insertions(+), 13 deletions(-) Index: linux-2.6/mm/rmap.c === --- linux-2.6.orig/mm/rmap.c +++ linux-2.6/mm/rmap.c @@ -432,7 +432,7 @@ static int page_mkclean_one(struct page { struct mm_struct *mm = vma-vm_mm; unsigned long address; - pte_t *pte, entry; + pte_t *pte; spinlock_t *ptl; int ret = 0; @@ -444,17 +444,18 @@ static int page_mkclean_one(struct page if (!pte) goto out; - if (!pte_dirty(*pte) !pte_write(*pte)) - goto unlock; + if (pte_dirty(*pte) || pte_write(*pte)) { + pte_t entry; - entry = ptep_get_and_clear(mm, address, pte); - entry = pte_mkclean(entry); - entry = pte_wrprotect(entry); - ptep_establish(vma, address, pte, entry); - lazy_mmu_prot_update(entry); - ret = 1; + flush_cache_page(vma, address, pte_pfn(*pte)); + entry = ptep_clear_flush(vma, address, pte); + entry = pte_wrprotect(entry); + entry = pte_mkclean(entry); + set_pte_at(vma, address, pte, entry); + lazy_mmu_prot_update(entry); + ret = 1; + } -unlock: pte_unmap_unlock(pte, ptl); out: return ret; @@ -489,6 +490,8 @@ int page_mkclean(struct page *page) if (mapping) ret = page_mkclean_file(mapping, page); } + if (page_test_and_clear_dirty(page)) + ret = 1; return ret; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/