Re: high number of dropped packets/rx_missed_errors from 4.17 kernel
Hi,

I’ve applied your patch on kernel 4.17.0, and dropped packets and rx_missed_errors are still present, though they are increasing at a lower rate.

root@shaper:~# ./test
rx_missed_errors: 2135
RX errors 0 dropped 2155 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2459 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2433
RX errors 0 dropped 2465 overruns 0 frame 0
sleeping 60 seconds
rx_missed_errors: 2526
RX errors 0 dropped 2564 overruns 0 frame 0
sleeping 60 seconds

> On 3 Dec 2020, at 21:43, Andrei Popa wrote:
>
> Hi,
>
> On what kernel version should I try the patch? I tried on 5.9 and it doesn't
> build.
>
>> On 18 Nov 2020, at 20:47, Rafael J. Wysocki wrote:
>>
>> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>>> Hello,
>>>>
>>>> After an update from vmlinuz-4.15.0-106-generic to
>>>> vmlinuz-5.4.0-37-generic we experience, on a number of servers, a very
>>>> high number of rx_missed_errors and dropped packets only on the uplink 10G
>>>> interface. We have another 10G downlink interface with no problems.
>>>>
>>>> The affected servers have the following mainboards:
>>>> S5520HC ver E26045-455
>>>> S5520UR ver E22554-751
>>>> S5520UR ver E22554-753
>>>> S5000VSA
>>>>
>>>> On other 30 servers with similar mainboards and/or configs there are no
>>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>>>
>>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>>
>>>> root@shaper:~# cat test
>>>> #!/bin/bash
>>>> while true
>>>> do
>>>> ethtool -S ens6f1|grep "missed_errors"
>>>> ifconfig ens6f1|grep RX|grep dropped
>>>> sleep 1
>>>> done
>>>>
>>>> root@shaper:~# ./test
>>>> rx_missed_errors: 2418845
>>>> RX errors 0 dropped 241 overruns 0 frame 0
>>>> rx_missed_errors: 2426175
>>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>>> rx_missed_errors: 2431910
>>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>>> rx_missed_errors: 2437266
>>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>>> rx_missed_errors: 2443305
>>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>>> rx_missed_errors: 2448357
>>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>>> rx_missed_errors: 2452539
>>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>>
>>>> We did a git bisect and we’ve found that the following commit generates
>>>> the high number of dropped packets:
>>>>
>>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>> Date: Thu Apr 5 19:12:43 2018 +0200
>>>>
>>>>     cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>>
>>>>     If the scheduler tick has been stopped already and the governor
>>>>     selects a shallow idle state, the CPU can spend a long time in that
>>>>     state if the selection is based on an inaccurate prediction of idle
>>>>     time. That effect turns out to be relevant, so it needs to be
>>>>     mitigated.
>>>>
>>>>     To that end, modify the menu governor to discard the result of the
>>>>     idle time prediction if the tick is stopped and the predicted idle
>>>>     time is less than the tick period length, unless the tick timer is
>>>>     going to expire soon.
>>>>
>>>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>>     Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>>>
>>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>>> index 267982e471e0..1bfe03ceb236 100644
>>>> --- a/drivers/cpuidle/governors/menu.c
>>>> +++ b/drivers/cpuidle/governors/menu.c
>>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>> */
>>>> data->predicted_us = min(data->predicted_us, expected
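To put "a lower rate" into numbers, the counter samples in this message can be turned into per-second rates. The sketch below is only an illustration: the helper name counter_rates is made up here, the 60-second interval is read off the "sleeping 60 seconds" lines, and the unpatched figures are the first two rx_missed_errors samples from the quoted 4.17 run, which was sampled once per second.

```python
def counter_rates(samples, interval_s):
    """Return the per-second increase between consecutive counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return [(b - a) / interval_s for a, b in zip(samples, samples[1:])]


# rx_missed_errors with the test patch applied, one sample every 60 seconds
patched = counter_rates([2135, 2433, 2433, 2526], 60)

# First two rx_missed_errors samples of the unpatched 4.17 run, one per second
unpatched = counter_rates([2418845, 2426175], 1)

print([round(rate, 2) for rate in patched])
print(unpatched)
```

Under these assumptions the patched kernel misses at most about 5 packets per second, against several thousand per second in the unpatched run.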
Re: high number of dropped packets/rx_missed_errors from 4.17 kernel
Hi,

On what kernel version should I try the patch? I tried on 5.9 and it doesn't build.

> On 18 Nov 2020, at 20:47, Rafael J. Wysocki wrote:
>
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>>
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic
>>> we experience, on a number of servers, a very high number of
>>> rx_missed_errors and dropped packets only on the uplink 10G interface. We
>>> have another 10G downlink interface with no problems.
>>>
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>>
>>> On other 30 servers with similar mainboards and/or configs there are no
>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>>
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>
>>> root@shaper:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>>
>>> root@shaper:~# ./test
>>> rx_missed_errors: 2418845
>>> RX errors 0 dropped 241 overruns 0 frame 0
>>> rx_missed_errors: 2426175
>>> RX errors 0 dropped 2426218 overruns 0 frame 0
>>> rx_missed_errors: 2431910
>>> RX errors 0 dropped 2431953 overruns 0 frame 0
>>> rx_missed_errors: 2437266
>>> RX errors 0 dropped 2437309 overruns 0 frame 0
>>> rx_missed_errors: 2443305
>>> RX errors 0 dropped 2443348 overruns 0 frame 0
>>> rx_missed_errors: 2448357
>>> RX errors 0 dropped 2448400 overruns 0 frame 0
>>> rx_missed_errors: 2452539
>>> RX errors 0 dropped 2452582 overruns 0 frame 0
>>>
>>> We did a git bisect and we’ve found that the following commit generates the
>>> high number of dropped packets:
>>>
>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Date: Thu Apr 5 19:12:43 2018 +0200
>>>
>>>     cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>
>>>     If the scheduler tick has been stopped already and the governor
>>>     selects a shallow idle state, the CPU can spend a long time in that
>>>     state if the selection is based on an inaccurate prediction of idle
>>>     time. That effect turns out to be relevant, so it needs to be
>>>     mitigated.
>>>
>>>     To that end, modify the menu governor to discard the result of the
>>>     idle time prediction if the tick is stopped and the predicted idle
>>>     time is less than the tick period length, unless the tick timer is
>>>     going to expire soon.
>>>
>>>     Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>>     Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>>
>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>> */
>>> data->predicted_us = min(data->predicted_us, expected_interval);
>>> - /*
>>> -  * Use the performance multiplier and the user-configurable
>>> -  * latency_req to determine the maximum exit latency.
>>> -  */
>>> - interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>> - if (latency_req > interactivity_req)
>>> - latency_req = interactivity_req;
>>
>> The tick_nohz_tick_stopped() check may be done after the above and it
>> may be reworked a bit.
>>
>> I'll send a test patch to you shortly.
>
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
>
> Please let me know if it makes any difference.
>
> And in the future please avoid pasting the entire kernel config to your
> reports, that's problematic.
>
> ---
> dri
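The rule described in the quoted commit message can be modelled in a few lines. This is a simplified sketch of the behaviour as the commit message words it, not the kernel code: the function name adjust_prediction, the parameter next_timer_us (time until the next timer expiry) and the 4000 us tick period (which assumes HZ=250) are all illustrative assumptions.

```python
TICK_USEC = 4000  # assumption: HZ=250, so one tick period lasts 4000 microseconds


def adjust_prediction(predicted_us, tick_stopped, next_timer_us, tick_usec=TICK_USEC):
    """Toy model of the commit message's rule for the predicted idle time.

    If the tick is already stopped and the prediction is shorter than one
    tick period, the prediction is discarded and replaced by one tick period,
    unless a timer is due sooner than that.
    """
    if tick_stopped and predicted_us < tick_usec:
        return min(tick_usec, next_timer_us)
    return predicted_us


# Tick still running: the short prediction is kept as it is.
print(adjust_prediction(50, tick_stopped=False, next_timer_us=100_000))
# Tick stopped, no timer due soon: the short prediction is discarded.
print(adjust_prediction(50, tick_stopped=True, next_timer_us=100_000))
# Tick stopped, but a timer expires in 200 us: that bound is used instead.
print(adjust_prediction(50, tick_stopped=True, next_timer_us=200))
```

A longer predicted idle time lets the governor consider deeper idle states, which have a higher exit latency. That may be relevant to the missed packets reported here, but the thread does not confirm the mechanism.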
Re: [BUG] ethX misnumbered and one missing in mii-tool
On Fri, 2007-03-30 at 12:35 -0400, Lennart Sorensen wrote:
> On Fri, Mar 30, 2007 at 10:42:23AM +0300, Andrei Popa wrote:
> > ethtool reports the same
>
> Is udev running and having fun renumbering interfaces as they are being
> detected in order to keep "consistent" interface names?

Yes, it's udev's fault:

zeus rules.d # cat 70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, probably run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single line.

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:55", NAME="eth1"

# PCI device 0x8086:0x1026 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:ba:a8:50", NAME="eth2"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:54", NAME="eth0"

# PCI device 0x8086:0x1027 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:5f:84:84", NAME="eth3"

# PCI device 0x1148:0x4320 (skge)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0c:46:46:7c:7f", NAME="eth4"

# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:09", NAME="eth5"

# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:08", NAME="eth6"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:69", NAME="eth7"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:68", NAME="eth8"

Thanks for pointing this out.

> --
> Len Sorensen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
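A rules file like the one above can be checked mechanically by extracting the MAC-to-name pairs. The following is a minimal sketch that assumes every rule has the ATTRS{address}=="..." and NAME="..." keys in the layout shown in this message; the function name parse_persistent_net_rules is made up for the illustration.

```python
import re

# Matches the MAC address and the assigned name in one single-line rule.
RULE_RE = re.compile(r'ATTRS\{address\}=="([0-9a-fA-F:]{17})".*?NAME="([^"]+)"')


def parse_persistent_net_rules(text):
    """Map MAC address -> interface name from 70-persistent-net.rules content."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = RULE_RE.search(line)
        if match:
            mapping[match.group(1).lower()] = match.group(2)
    return mapping


sample = '''
# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:09", NAME="eth5"
# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:08", NAME="eth6"
'''
print(parse_persistent_net_rules(sample))
```

Comparing this mapping with the MAC addresses printed by e1000_probe in dmesg shows which kernel-assigned name each port had before udev renamed it.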
Re: [BUG] ethX misnumbered and one missing in mii-tool
On Thu, 2007-03-29 at 21:21 -0700, Jesse Brandeburg wrote:
> added netdev.
>
> On 3/29/07, Andrei Popa <[EMAIL PROTECTED]> wrote:
> > In a dual core 2 server with an intel motherboard and 5 network
> > cards(two onboard) and 1 pci express card with two slots and one pci-x
> > pci64 card the kernel sees all of them in dmesg but in mii-tool are
> > misnumbered and one card is missing.
> > (please CC as I am not subscribed to lkml)
>
> please don't use mii-tool, ethtool is a much better option and
> actually works with gigabit cards.

ethtool reports the same.

> > from dmesg:
> > Intel(R) PRO/1000 Network Driver - version 7.0.33-k2-NAPI
> > Copyright (c) 1999-2005 Intel Corporation.
> > ACPI: PCI Interrupt :03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> > PCI: Setting latency timer of device :03:00.0 to 64
> > e1000: :03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:08
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
>
> eth0...
>
> > ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
> > PCI: Setting latency timer of device :03:00.1 to 64
> > e1000: :03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:09
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
>
> eth0...
>
> > ACPI: PCI Interrupt :05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
> > PCI: Setting latency timer of device :05:00.0 to 64
> > e1000: :05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:68
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
>
> eth0...
>
> > GSI 20 sharing vector 0xC9 and IRQ 20
> > ACPI: PCI Interrupt :05:00.1[B] -> GSI 19 (level, low) -> IRQ 20
> > PCI: Setting latency timer of device :05:00.1 to 64
> > e1000: :05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:69
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
>
> eth0...
>
> > GSI 21 sharing vector 0xD1 and IRQ 21
> > ACPI: PCI Interrupt :06:02.0[A] -> GSI 27 (level, low) -> IRQ 21
> > e1000: :06:02.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:0e:0c:ba:a8:50
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
>
> eth0...
>
> um, I'm a little confused why every interface was named eth0 when it
> tried to come up.
> you didn't mention what kernel you're using.

This was kernel 2.6.17.14, and the driver was compiled as a module.

With kernel 2.6.20.4 (and the e1000 driver built in):

zeus ~ # uname -a
Linux zeus 2.6.20.4-zeus3 #3 SMP Wed Mar 28 13:44:50 EEST 2007 x86_64 Intel(R) Xeon(TM) CPU 3.00GHz GenuineIntel GNU/Linux

the devices are recognized OK as eth0, eth1, eth2, eth3, eth4, but they are misnumbered and one is missing in mii-tool/ethtool:

Intel(R) PRO/1000 Network Driver - version 7.3.15-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt :03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device :03:00.0 to 64
e1000: :03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:08
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :03:00.1 to 64
e1000: :03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:09
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device :05:00.0 to 64
e1000: :05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:68
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :05:00.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device :05:00.1 to 64
e1000: :05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:69
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :06:02.0[A] -> GSI 27 (level, low) -> IRQ 27
e1000: :06:02.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:0e:0c:ba:a8:50
e1000: eth4: e1000_probe: Intel(R) PRO/1000 Network Connection

zeus ~ # mii-tool
eth2: no link
eth5: negotiated 100baseTx-FD, link ok
eth6: no link
eth7: no link
zeus ~ #

ethtool shows the same.

> you can enable MSI and not share interrupts on this platform, it will
> at least help your PCIe adapters.

Initially I enabled it, but I thought the problem came from there and disabled it.

> > zeus ~ # mii-tool
> > eth2: no link
> > eth5: negotiated 100baseTx-FD, link ok
> > eth6: no link
> > eth7: no link
> > zeus ~ #
> >
> > it sees only 4 cards that are misnumbered and one is missing.
>
> what does 'ip link' or 'ifconfig -a' show?

zeus ~ # ip link
1: eth6: BROADCAST,MULTICAST mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:21:0c:08 brd ff:ff:ff:ff:ff:ff
2: eth5: BROADCAST,MULTICAST,UP,1 mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:15:17:21:0c:09 brd ff:ff:ff:ff:ff:ff
3: eth8: BROADCAST,MULTICAST mtu 1500 qdisc noop qlen 1000
    link/ether 00:15:17:17:b7:68 brd ff:ff:ff:ff:ff:ff
4: eth7: BROADCAST,MULTICAST mtu 1500
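The mismatch between the dmesg names and the names seen by mii-tool and ip link can be reproduced from data already posted in this thread. The sketch below takes the MAC addresses in the order e1000 probed them (the 2.6.20.4 dmesg in this message) and looks each one up in the 70-persistent-net.rules file posted elsewhere in the thread. The function name rename_table is hypothetical, and it assumes the kernel names ports eth0, eth1, ... in probe order, as that dmesg shows.

```python
# MAC addresses in e1000 probe order, copied from the 2.6.20.4 dmesg
PROBE_ORDER = [
    "00:15:17:21:0c:08",
    "00:15:17:21:0c:09",
    "00:15:17:17:b7:68",
    "00:15:17:17:b7:69",
    "00:0e:0c:ba:a8:50",
]

# MAC -> persistent name, copied from the udev rules file posted in this thread
UDEV_NAMES = {
    "00:15:17:17:b7:55": "eth1",
    "00:0e:0c:ba:a8:50": "eth2",
    "00:15:17:17:b7:54": "eth0",
    "00:0e:0c:5f:84:84": "eth3",
    "00:0c:46:46:7c:7f": "eth4",
    "00:15:17:21:0c:09": "eth5",
    "00:15:17:21:0c:08": "eth6",
    "00:15:17:17:b7:69": "eth7",
    "00:15:17:17:b7:68": "eth8",
}


def rename_table(probe_order, udev_names):
    """Return (kernel_name, mac, udev_name) rows; udev_name is None without a rule."""
    return [
        (f"eth{index}", mac, udev_names.get(mac))
        for index, mac in enumerate(probe_order)
    ]


for kernel_name, mac, udev_name in rename_table(PROBE_ORDER, UDEV_NAMES):
    print(f"{kernel_name} ({mac}) -> {udev_name}")
```

With this data the five ports end up as eth6, eth5, eth8, eth7 and eth2. That set contains the four names mii-tool printed plus eth8, and it agrees with the ip link output above as far as that output goes.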
[BUG] ethX misnumbered and one missing in mii-tool
Hello,

In a dual core 2 server with an Intel motherboard and 5 network cards (two onboard, 1 PCI Express card with two slots, and one PCI-X pci64 card), the kernel sees all of them in dmesg, but in mii-tool they are misnumbered and one card is missing.
(please CC as I am not subscribed to lkml)

from dmesg:
Intel(R) PRO/1000 Network Driver - version 7.0.33-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt :03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device :03:00.0 to 64
e1000: :03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:08
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :03:00.1 to 64
e1000: :03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:21:0c:09
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt :05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device :05:00.0 to 64
e1000: :05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:68
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 20 sharing vector 0xC9 and IRQ 20
ACPI: PCI Interrupt :05:00.1[B] -> GSI 19 (level, low) -> IRQ 20
PCI: Setting latency timer of device :05:00.1 to 64
e1000: :05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4) 00:15:17:17:b7:69
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 21 sharing vector 0xD1 and IRQ 21
ACPI: PCI Interrupt :06:02.0[A] -> GSI 27 (level, low) -> IRQ 21
e1000: :06:02.0: e1000_probe: (PCI-X:100MHz:64-bit) 00:0e:0c:ba:a8:50
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

zeus ~ # mii-tool
eth2: no link
eth5: negotiated 100baseTx-FD, link ok
eth6: no link
eth7: no link
zeus ~ #

It sees only 4 cards, which are misnumbered, and one is missing.

zeus ~ # lspci
00:00.0 Host bridge: Intel Corporation Server Memory Contoller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 2-3 (rev b1)
00:03.0 PCI bridge: Intel Corporation Server PCI Express x4 Port 3 (rev b1)
00:08.0 System peripheral: Intel Corporation Server DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:10.1 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:10.2 Host bridge: Intel Corporation Server Error Reporting Registers (rev b1)
00:11.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation Enterprise Southbridge UHCI USB #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation Enterprise Southbridge EHCI USB (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation Enterprise Southbridge LPC (rev 09)
00:1f.1 IDE interface: Intel Corporation Enterprise Southbridge PATA (rev 09)
00:1f.2 SATA controller: Intel Corporation Enterprise Southbridge SATA AHCI (rev 09)
00:1f.3 SMBus: Intel Corporation Enterprise Southbridge SMBus (rev 09)
01:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E1 (rev 01)
02:01.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E2 (rev 01)
02:02.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express Downstream Port E3 (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
05:00.0 Ethernet controller: Intel Corporation PRO/1000 EB Network Connection with I/O Acceleration (rev 01)
05:00.1 Ethernet controller: Intel Corporation PRO/1000 EB Network Connection with I/O Acceleration (rev 01)
06:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
09:0c.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.20.4-zeus3
# Fri Mar 30 23:07:23 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
Re: [BUG] eth0 appears many times in /proc/interrupts after resume
It's OK: after 4 suspend/resume cycles, eth0 appears only once.

On Sun, 2007-01-21 at 21:22 +, Frederik Deweerdt wrote:
> On Sun, Jan 21, 2007 at 09:17:41PM +0200, Andrei Popa wrote:
> > It's the 10th resume and in /proc/interrupts eth0 appears 10 times.
> >
> Hi,
>
> The e100_resume() function should be calling netif_device_detach and
> free_irq. Could you try the following (compile tested) patch?
>
> Regards,
> Frederik
>
> Signed-off-by: Frederik Deweerdt <[EMAIL PROTECTED]>
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 2fe0445..0c376e4 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -2671,6 +2671,7 @@ static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
>  	del_timer_sync(&nic->watchdog);
>  	netif_carrier_off(nic->netdev);
>
> +	netif_device_detach(netdev);
>  	pci_save_state(pdev);
>
>  	if ((nic->flags & wol_magic) | e100_asf(nic)) {
> @@ -2682,6 +2683,7 @@ static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
>  	}
>
>  	pci_disable_device(pdev);
> +	free_irq(pdev->irq, netdev);
>  	pci_set_power_state(pdev, PCI_D3hot);
>
>  	return 0;
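Why a missing free_irq() would show up as repeated eth0 entries can be illustrated with a toy model. This is not kernel code: IrqTable and handlers_after are hypothetical names, and the model assumes, without confirmation from the thread, that every resume registers the handler again while the unpatched suspend path never releases it.

```python
from collections import defaultdict


class IrqTable:
    """Toy stand-in for the kernel's per-IRQ list of registered handlers."""

    def __init__(self):
        self.actions = defaultdict(list)

    def request_irq(self, irq, name):
        self.actions[irq].append(name)

    def free_irq(self, irq, name):
        self.actions[irq].remove(name)


def handlers_after(cycles, free_on_suspend, irq=22, name="eth0"):
    """Return the handler list on one IRQ after `cycles` registrations."""
    table = IrqTable()
    for _ in range(cycles):
        # Suspend path: release the handler only if the fix is in place.
        if free_on_suspend and name in table.actions[irq]:
            table.free_irq(irq, name)
        # Resume path (assumed): the driver registers its handler again.
        table.request_irq(irq, name)
    return table.actions[irq]


print(len(handlers_after(10, free_on_suspend=False)))
print(handlers_after(10, free_on_suspend=True))
```

In this model ten unbalanced registrations leave ten eth0 entries on the IRQ, while releasing the handler on suspend keeps a single entry, which matches the result reported at the top of this message.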
[BUG] eth0 appears many times in /proc/interrupts after resume
Hello,

It's the 10th resume and in /proc/interrupts eth0 appears 10 times.

ierdnac ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   19690962      21390   IO-APIC-edge      timer
  1:      34666          0   IO-APIC-edge      i8042
  8:         12          0   IO-APIC-edge      rtc
  9:     189109          0   IO-APIC-fasteoi   acpi
 12:    2467502      62285   IO-APIC-edge      i8042
 14:         40          0   IO-APIC-edge      ide0
 17:    1156971      14168   IO-APIC-fasteoi   uhci_hcd:usb5, [EMAIL PROTECTED]::00:02.0
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 20:          1      26290   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 21:     408192          0   IO-APIC-fasteoi   HDA Intel
 22:     249414       2543   IO-APIC-fasteoi   ohci1394, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0
223:     220668          0   PCI-MSI-edge      libata
NMI:          0          0
LOC:   19338002   19135738
ERR:          0
MIS:          0

ierdnac ~ # lsmod
Module                  Size  Used by
snd_seq                47120  0
snd_seq_device          6860  1 snd_seq
snd_hda_intel          16344  4
snd_hda_codec         157568  1 snd_hda_intel
snd_pcm                68100  3 snd_hda_intel,snd_hda_codec
snd_timer              18884  3 snd_seq,snd_pcm
snd                    38776  12 snd_seq,snd_seq_device,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
snd_page_alloc          7880  2 snd_hda_intel,snd_pcm
usb_storage            33156  0
ohci1394               32176  0
ieee1394               82964  1 ohci1394
e100                   31368  0
uhci_hcd               21516  0
ehci_hcd               27596  0
usbcore               100948  3 usb_storage,uhci_hcd,ehci_hcd

from dmesg:

Restarting tasks ... done.
Suspend2 debugging info:
- Suspend core   : 2.2.9.1
- Kernel Version : 2.6.20-rc4
- Compiler vers. : 4.1
- Attempt number : 10
- Parameters     : 0 81936 0 1 0 5
- Overall expected compression percentage: 0.
- Compressor is 'lzf'.
  Compressed 525217792 bytes into 449285477 (14 percent compression).
- SwapAllocator active.
  Swap available for image: 250982 pages.
- I/O speed: Write 43 MB/s, Read 44 MB/s.
- Extra pages    : -99 used/500.
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c04bd000 soft=c04b5000

suspend2 maintainer:
"That is interesting! Unfortunately, I don't touch anything in that area.
Could I get you to send the message to the Linux kernel mailing list?

Regards,

Nigel"

ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc4 #0 SMP PREEMPT Wed Jan 10 18:34:14 EET 2007 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux
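Spotting this symptom by eye gets tedious on machines with many IRQ lines. The following is a minimal sketch of a checker that reports handler names listed more than once on the same line of /proc/interrupts-style text. The function name `duplicate_handlers` is made up for illustration, and the parser assumes the column layout shown in the report (per-CPU counters, one controller-type token, then a comma-separated handler list).

```python
from collections import Counter


def duplicate_handlers(interrupts_text):
    """Map IRQ number -> {handler: count} for handlers listed more than once."""
    dups = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS rows
        fields = rest.split()
        # Skip the leading per-CPU counters (all-digit fields).
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1
        # fields[i] is the controller type (e.g. IO-APIC-fasteoi); what
        # follows is the comma-separated list of handler names.
        names = [n.strip() for n in " ".join(fields[i + 1:]).split(",") if n.strip()]
        repeated = {n: c for n, c in Counter(names).items() if c > 1}
        if repeated:
            dups[int(head)] = repeated
    return dups


sample = """\
           CPU0       CPU1
 21:     408192          0   IO-APIC-fasteoi   HDA Intel
 22:     249414       2543   IO-APIC-fasteoi   ohci1394, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0
223:     220668          0   PCI-MSI-edge      libata
"""
print(duplicate_handlers(sample))
```

On a live system the same function could be fed the contents of /proc/interrupts; legitimately shared lines with distinct handler names are not reported.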
Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)
On Fri, 2006-12-29 at 02:48 -0800, Linus Torvalds wrote:
>
> On Fri, 29 Dec 2006, Linus Torvalds wrote:
> >
> > Hmm? I'd love it if somebody else wrote the patch and tested it, because
> > I'm getting sick and tired of this bug ;)
>
> Who the hell am I kidding? I haven't been able to sleep right for the last
> few days over this bug. It was really getting to me.
>
> And putting on the thinking cap, there's actually a fairly simple and
> nonintrusive patch. It still has a tiny tiny race (see the comment), but I
> bet nobody can really hit it in real life anyway, and I know several ways
> to fix it, so I'm not really _that_ worried about it.
>
> The patch is mostly a comment. The "real" meat of it is actually just a
> few lines.
>
> Can anybody get corruption with this thing applied? It goes on top of
> plain v2.6.20-rc2.

Tested with rtorrent and there is no corruption.

>
> 		Linus
>
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b3a198c..ec01da1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -862,17 +862,46 @@ int clear_page_dirty_for_io(struct page *page)
>  {
>  	struct address_space *mapping = page_mapping(page);
>
> -	if (!mapping)
> -		return TestClearPageDirty(page);
> -
> -	if (TestClearPageDirty(page)) {
> -		if (mapping_cap_account_dirty(mapping)) {
> -			page_mkclean(page);
> +	if (mapping && mapping_cap_account_dirty(mapping)) {
> +		/*
> +		 * Yes, Virginia, this is indeed insane.
> +		 *
> +		 * We use this sequence to make sure that
> +		 *  (a) we account for dirty stats properly
> +		 *  (b) we tell the low-level filesystem to
> +		 *      mark the whole page dirty if it was
> +		 *      dirty in a pagetable. Only to then
> +		 *  (c) clean the page again and return 1 to
> +		 *      cause the writeback.
> +		 *
> +		 * This way we avoid all nasty races with the
> +		 * dirty bit in multiple places and clearing
> +		 * them concurrently from different threads.
> +		 *
> +		 * Note! Normally the "set_page_dirty(page)"
> +		 * has no effect on the actual dirty bit - since
> +		 * that will already usually be set. But we
> +		 * need the side effects, and it can help us
> +		 * avoid races.
> +		 *
> +		 * We basically use the page "master dirty bit"
> +		 * as a serialization point for all the different
> +		 * threads doing their things.
> +		 *
> +		 * FIXME! We still have a race here: if somebody
> +		 * adds the page back to the page tables in
> +		 * between the "page_mkclean()" and the "TestClearPageDirty()",
> +		 * we might have it mapped without the dirty bit set.
> +		 */
> +		if (page_mkclean(page))
> +			set_page_dirty(page);
> +		if (TestClearPageDirty(page)) {
>  			dec_zone_page_state(page, NR_FILE_DIRTY);
> +			return 1;
>  		}
> -		return 1;
> +		return 0;
>  	}
> -	return 0;
> +	return TestClearPageDirty(page);
>  }
>  EXPORT_SYMBOL(clear_page_dirty_for_io);
>
Re: [PATCH] mm: fix page_mkclean_one
I have corrupted files...

> ---
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 263f88e..4652ef1 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1653,19 +1653,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>  	do {
>  		if (!buffer_mapped(bh))
>  			continue;
> -		/*
> -		 * If it's a fully non-blocking write attempt and we cannot
> -		 * lock the buffer then redirty the page. Note that this can
> -		 * potentially cause a busy-wait loop from pdflush and kswapd
> -		 * activity, but those code paths have their own higher-level
> -		 * throttling.
> -		 */
> -		if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
> -			lock_buffer(bh);
> -		} else if (test_set_buffer_locked(bh)) {
> -			redirty_page_for_writepage(wbc, page);
> -			continue;
> -		}
> +		lock_buffer(bh);
>  		if (test_clear_buffer_dirty(bh)) {
>  			mark_buffer_async_write(bh);
>  		} else {
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
>
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> >
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
>
> Dang. Did you get any warning messages from the kernel?
>

only these:

ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think they have anything to do with it...

> 		Linus
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
>
> On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> >
> > The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> > this patch and 2.6.19.
>
> Yeah, if my guess about do_no_page() is right, _none_ of the previous
> patches should have ANY effect what-so-ever. In fact, I'd say that even
> the "ext3 works in writeback mode" thing that Andrei reports is probably a
> total fluke brought on by timing changes rather than anything else.
>
> So please try the latest patch instead (on top of anything that shows
> corruption reliably - the patch should be _totally_ independent of all the
> other issues, and I think it will apply cleanly on top of 2.6.18.3 and
> 2.6.19 too, so anything that shows corruption is a fine target - but try
> to choose something that has been the "best" at corrupting things for you,
> to make the testing as good as possible).
>
> Patch included here again (although I think you were cc'd on my previous
> email too, so you should already have it, and our emails just crossed)
>
> And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
"safe_sync".

>
> 		Linus
>
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
>  	if (pte_none(*page_table)) {
>  		flush_icache_page(vma, new_page);
>  		entry = mk_pte(new_page, vma->vm_page_prot);
> -		if (write_access)
> -			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -		set_pte_at(mm, address, page_table, entry);
>  		if (anon) {
>  			inc_mm_counter(mm, anon_rss);
>  			lru_cache_add_active(new_page);
>  			page_add_new_anon_rmap(new_page, vma, address);
> +			if (write_access)
> +				entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  		} else {
>  			inc_mm_counter(mm, file_rss);
>  			page_add_file_rmap(new_page);
> +			entry = pte_wrprotect(entry);
>  			if (write_access) {
>  				dirty_page = new_page;
>  				get_page(dirty_page);
> +				entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  			}
>  		}
> +		set_pte_at(mm, address, page_table, entry);
>  	} else {
>  		/* One of our sibling threads was faster, back out. */
>  		page_cache_release(new_page);
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> On Sun, 24 Dec 2006 14:14:38 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
>
> > > - mount the fs with ext2 with the no-buffer-head option. That means either:
> > >
> > >   grub.conf:  rootfstype=ext2 rootflags=nobh
> > >   /etc/fstab: ext2 nobh
> >
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext2 (rw,noatime,nobh)
> >
> > I have corruption.
> >
> > >
> > > - mount the fs with ext3 data=writeback, nobh
> > >
> > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works)
> > >   /etc/fstab: ext2 data=writeback,nobh
> >
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> >
> > ierdnac ~ # dmesg|grep EXT3
> > EXT3-fs: mounted filesystem with writeback data mode.
> > EXT3 FS on sda7, internal journal
> >
> > I don't have corruption. I tested twice.
>
> This is a surprising result. Can you please retest ext3 data=writeback,nobh?

Yes, no corruption. Also tested only with data=writeback and had no
corruption.
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote:
> On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote:
> > On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> > > I now _suspect_ that we're talking about something like
> > >
> > >  - we started a writeout. The IO is still pending, and the page was
> > >    marked clean and is now in the "writeback" phase.
> > >  - a write happens to the page, and the page gets marked dirty again.
> > >    Marking the page dirty also marks all the _buffers_ in the page dirty,
> > >    but they were actually already dirty, because the IO hasn't completed
> > >    yet.
> > >  - the IO from the _previous_ write completes, and marks the buffers clean
> > >    again.
> >
> > Some things for the testers to try, please:
> >
> > - mount the fs with ext2 with the no-buffer-head option. That means either:
> >
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
>
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
>
> I have corruption.
>
> >
> > - mount the fs with ext3 data=writeback, nobh
> >
> >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works)
> >   /etc/fstab: ext2 data=writeback,nobh
>
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
>
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
>
> I don't have corruption. I tested twice.
>

I also tested with ext3 ordered, nobh and I have file corruption...

> >
> > if that still fails we can rule out buffer_head funnies.
> >
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote:
> On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > I now _suspect_ that we're talking about something like
> >
> >  - we started a writeout. The IO is still pending, and the page was
> >    marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again.
> >    Marking the page dirty also marks all the _buffers_ in the page dirty,
> >    but they were actually already dirty, because the IO hasn't completed
> >    yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean
> >    again.
>
> Some things for the testers to try, please:
>
> - mount the fs with ext2 with the no-buffer-head option. That means either:
>
>   grub.conf:  rootfstype=ext2 rootflags=nobh
>   /etc/fstab: ext2 nobh

ierdnac ~ # mount
/dev/sda7 on / type ext2 (rw,noatime,nobh)

I have corruption.

>
> - mount the fs with ext3 data=writeback, nobh
>
>   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback (I hope this works)
>   /etc/fstab: ext2 data=writeback,nobh

ierdnac ~ # mount
/dev/sda7 on / type ext3 (rw,noatime,nobh)

ierdnac ~ # dmesg|grep EXT3
EXT3-fs: mounted filesystem with writeback data mode.
EXT3 FS on sda7, internal journal

I don't have corruption. I tested twice.

>
> if that still fails we can rule out buffer_head funnies.
>
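The three-step sequence suspected in the quoted text can be walked through with a toy model. This is a sketch of the suspicion as worded above, not of verified kernel behaviour: the `Buffer` class and the helper names are illustrative only, and a single buffer stands in for a whole page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    data: str                        # current in-memory contents
    dirty: bool = True               # buffer dirty bit
    in_flight: Optional[str] = None  # snapshot handed to the pending IO
    on_disk: str = ""                # what has reached the disk


def start_writeout(buf):
    # Step 1: IO is submitted; in this model the buffer stays dirty
    # until the IO completes.
    buf.in_flight = buf.data


def write(buf, data):
    # Step 2: a new write lands while the IO is pending. Setting the
    # dirty bit changes nothing because it is still set.
    buf.data = data
    buf.dirty = True


def complete_io(buf):
    # Step 3: completion of the previous write clears the dirty bit,
    # although buf.data has changed since the snapshot was taken.
    buf.on_disk = buf.in_flight
    buf.in_flight = None
    buf.dirty = False


def writeback_pass(buf):
    # A later writeback only touches buffers that are still marked dirty.
    if buf.dirty:
        start_writeout(buf)
        complete_io(buf)


buf = Buffer(data="v1")
start_writeout(buf)
write(buf, "v2")
complete_io(buf)
writeback_pass(buf)   # finds a clean buffer and skips it
print(buf.data, buf.on_disk, buf.dirty)
```

In the model the second write never reaches the disk, because the only marker that would trigger another writeout was cleared by the completion of the first one.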
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
>
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase? It would be good
> if someone could suggest a torrent which is legal and not too large.

It's a 1.4GB file torrent split in 84 rar files and there are many
seeders. I download with ~ 5MB/sec. The torrent is private.
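A per-chunk hash check similar in spirit to what a torrent client does on download completion can be sketched without any torrent at all. The code below is only an illustration: the function names and the 16 KiB chunk size are arbitrary choices, chunks are written through a shared mmap in random order, and nothing in the thread says this access pattern is enough to trigger the corruption.

```python
import hashlib
import mmap
import os
import random
import tempfile

CHUNK = 16 * 1024  # arbitrary chunk size chosen for this sketch


def write_via_mmap(path, n_chunks, seed=0):
    """Fill a file through a shared mapping, chunk by chunk, in random order.

    Returns the SHA-1 hex digest of every chunk as it was written.
    """
    order = list(range(n_chunks))
    random.Random(seed).shuffle(order)
    digests = [None] * n_chunks
    with open(path, "w+b") as f:
        f.truncate(n_chunks * CHUNK)
        with mmap.mmap(f.fileno(), n_chunks * CHUNK) as mm:
            for i in order:
                data = os.urandom(CHUNK)
                mm[i * CHUNK:(i + 1) * CHUNK] = data
                digests[i] = hashlib.sha1(data).hexdigest()
            mm.flush()
    return digests


def bad_chunks(path, digests):
    """Return the indexes of chunks whose contents no longer match."""
    bad = []
    with open(path, "rb") as f:
        for i, want in enumerate(digests):
            if hashlib.sha1(f.read(CHUNK)).hexdigest() != want:
                bad.append(i)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "testfile")
    expected = write_via_mmap(target, n_chunks=8)
    # Reading back immediately is served from the page cache; checking what
    # actually reached the disk would need a remount or dropped caches first.
    print(bad_chunks(target, expected))
```

A real test run would use a much larger file and memory pressure so that writeback happens while the mapping is still being written.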
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
With all three patches I have corruption

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ ({								\
 })
 #endif

-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)	\
-({								\
-	pte_t __pte = *__ptep;					\
-	int r = 1;						\
-	if (!pte_dirty(__pte))					\
-		r = 0;						\
-	else							\
-		set_pte_at((__vma)->vm_mm, (__address), (__ptep), \
-			   pte_mkclean(__pte));			\
-	r;							\
-})
-#endif
-
-#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(__vma, __address, __ptep)	\
-({								\
-	int __dirty;						\
-	__dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep); \
-	if (__dirty)						\
-		flush_tlb_page(__vma, __address);		\
-	__dirty;						\
-})
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear(__mm, __address, __ptep)		\
 ({								\
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..b61d6f9 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -300,18 +300,20 @@ do {							\
 		flush_tlb_page(vma, address);			\
 } while (0)

-#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(vma, address, ptep)		\
-({								\
-	int __dirty;						\
-	__dirty = pte_dirty(*(ptep));				\
-	if (__dirty) {						\
-		clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);	\
-		pte_update_defer((vma)->vm_mm, (address), (ptep)); \
-		flush_tlb_page(vma, address);			\
-	}							\
-	__dirty;
Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote:
>
> Btw, I'd really love to hear whether the patch I sent out actually _helps_
> at all, or whether we're just discussing something that in the end is just
> a cleanup..
>
> Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be
> talking about different bugs, so _both_ of your experiences definitely
> matter here).

with http://lkml.org/lkml/diff/2006/12/20/204/1 I have corruption:

Hash check on download completion found bad chunks, consider using
"safe_sync".

>
> Linus
Re: 2.6.19 file content corruption on ext3
On Wed, 2006-12-20 at 15:23 +0100, Peter Zijlstra wrote:
> On Wed, 2006-12-20 at 16:15 +0200, Andrei Popa wrote:
> > On Wed, 2006-12-20 at 00:42 +0100, Peter Zijlstra wrote:
> > > On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> > >
> > > > OR:
> > > >
> > > >  - page_mkclean_one() is simply buggy.
> > >
> > > GOLD!
> > >
> > > it seems to work with all this (full diff against current git).
> > >
> > > /me rebuilds full kernel to make sure...
> > > reboot...
> > > test... pff the tension...
> > > yay, still good!
> > >
> > > Andrei; would you please verify.
> >
> > I have corrupted files.
>
> drad; and with this patch:
>   http://lkml.org/lkml/2006/12/20/112

Hash check on download completion found bad chunks, consider using
"safe_sync".

>
> /me goes rebuild his kernel and try more than 3 times
Re: 2.6.19 file content corruption on ext3
On Wed, 2006-12-20 at 00:42 +0100, Peter Zijlstra wrote:
> On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
>
> > OR:
> >
> >  - page_mkclean_one() is simply buggy.
>
> GOLD!
>
> it seems to work with all this (full diff against current git).
>
> /me rebuilds full kernel to make sure...
> reboot...
> test... pff the tension...
> yay, still good!
>
> Andrei; would you please verify.

I have corrupted files.

> The magic seems to be in the extra tlb flush after clearing the dirty
> bit. Just too bad ptep_clear_flush_dirty() needs ptep not entry.
>
> diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
> index 5e7cd45..2b8893b 100644
> --- a/drivers/connector/connector.c
> +++ b/drivers/connector/connector.c
> @@ -135,8 +135,7 @@ static int cn_call_callback(struct cn_msg *msg, void (*destruct_data)(void *), v
>  	spin_lock_bh(&dev->cbdev->queue_lock);
>  	list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
>  		if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
> -			if (likely(!test_bit(WORK_STRUCT_PENDING,
> -					&__cbq->work.work.management) &&
> +			if (likely(!delayed_work_pending(&__cbq->work) &&
>  					__cbq->data.ddata == NULL)) {
>  				__cbq->data.callback_priv = msg;
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>  	int ret = 0;
>
>  	BUG_ON(!PageLocked(page));
> -	if (PageWriteback(page))
> +	if (PageDirty(page) || PageWriteback(page))
>  		return 0;
>
>  	if (mapping == NULL) {		/* can this still happen? */
> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>  	spin_lock(&mapping->private_lock);
>  	ret = drop_buffers(page, &buffers_to_free);
>  	spin_unlock(&mapping->private_lock);
> -	if (ret) {
> -		/*
> -		 * If the filesystem writes its buffers by hand (eg ext3)
> -		 * then we can have clean buffers against a dirty page.  We
> -		 * clean the page here; otherwise later reattachment of buffers
> -		 * could encounter a non-uptodate page, which is unresolvable.
> -		 * This only applies in the rare case where try_to_free_buffers
> -		 * succeeds but the page is not freed.
> -		 *
> -		 * Also, during truncate, discard_buffer will have marked all
> -		 * the page's buffers clean.  We discover that here and clean
> -		 * the page also.
> -		 */
> -		if (test_clear_page_dirty(page))
> -			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> -	}
>  out:
>  	if (buffers_to_free) {
>  		struct buffer_head *bh = buffers_to_free;
> diff --git a/mm/memory.c b/mm/memory.c
> index c00bac6..60e0945 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(unmap_mapping_range);
>
> +static void check_last_page(struct address_space *mapping, loff_t size)
> +{
> +	pgoff_t index;
> +	unsigned int offset;
> +	struct page *page;
> +
> +	if (!mapping)
> +		return;
> +	offset = size & ~PAGE_MASK;
> +	if (!offset)
> +		return;
> +	index = size >> PAGE_SHIFT;
> +	page = find_lock_page(mapping, index);
> +	if (page) {
> +		unsigned int check = 0;
> +		unsigned char *kaddr = kmap_atomic(page, KM_USER0);
> +		do {
> +			check += kaddr[offset++];
> +		} while (offset < PAGE_SIZE);
> +		kunmap_atomic(kaddr, KM_USER0);
> +		unlock_page(page);
> +		page_cache_release(page);
> +		if (check)
> +			printk(KERN_ERR "%s: BADNESS: truncate check %u\n", current->comm, check);
> +	}
> +}
> +
>  /**
>   * vmtruncate - unmap mappings "freed" by truncate() syscall
>   * @inode: inode of the file used
> @@ -1875,6 +1902,7 @@ do_expand:
>  		goto out_sig;
>  	if (offset > inode->i_sb->s_maxbytes)
>  		goto out_big;
> +	check_last_page(mapping, inode->i_size);
>  	i_size_write(inode, offset);
>
>  out_truncate:
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 237107c..f561e72 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -957,7 +957,7 @@ int test_set_page_writeback(struct page *page)
>  EXPORT_SYMBOL(test_set_page_writeback);
>
>  /*
> - * Return true if any of the pages in the mapping are marged with the
> + * Return true if any of the pages in the mapping are marked with the
>   * passed tag.
>   */
>  int mapping_tagged(struct address_space *mapping, int tag)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index d8a842a..900229a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,7 +432,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
>  {
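
For readers following the check_last_page() debugging helper in the patch above, here is a minimal userspace sketch of the same arithmetic. It is not part of the posted patch: the function name tail_check() and its parameters are hypothetical, and it works on a plain buffer instead of a locked page-cache page. It only illustrates what "truncate check" measures, namely the sum of the bytes between end-of-file and the end of the last page, which should be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of check_last_page() (hypothetical helper, not kernel code).
 *
 * page       contents of the page that holds the end of the file
 * page_size  size of that page in bytes; must be a power of two
 * file_size  size of the file in bytes (i_size in the kernel patch)
 *
 * Returns the sum of all bytes in the page that lie past end-of-file.
 * A non-zero result corresponds to the "BADNESS: truncate check" message.
 */
unsigned int tail_check(const unsigned char *page, size_t page_size,
                        uint64_t file_size)
{
    unsigned int check = 0;
    size_t offset;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* Same value as "size & ~PAGE_MASK": offset of EOF inside the page. */
    offset = (size_t)(file_size & (page_size - 1));
    if (offset == 0)
        return 0;           /* file ends on a page boundary: no tail */

    while (offset < page_size)
        check += page[offset++];

    return check;
}
```

A file whose size is an exact multiple of the page size has no tail, which is why both the kernel helper and this sketch return early in that case.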
Re: 2.6.19 file content corruption on ext3
> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.

I have file corruption.
Re: 2.6.19 file content corruption on ext3
> > > If all of test_clear_page_dirty() has been commented out then the page
> > > will never become clean hence will never fall out of pagecache, so
> > > unless Andrei is doing a reboot before checking for corruption, perhaps
> > > the underlying data on-disk is incorrect, but we can't see it.
> >
> > if I do a sync and echo 1 > /proc/sys/vm/drop_caches
>
> OK, that works.
>
> > is the reboot still necessary?
>
> It might be necessary to reboot in this case - if we're leaving the
> pagecache dirty, writing to drop_caches won't remove it. And you probably
> won't be able to get a clean reboot either.
>
> > > Andrei, how _are_ you running this test? What's the exact sequence of
> > > steps?
> > >
> > > In particular, are you doing anything which would cause the corrupted
> > > file to be evicted from memory, thus forcing a read from disk? Such as
> > > unmounting and then remounting the filesystem?
> >
> > I boot linux, I start rtorrent and start the download, while it's
> > downloading I start evolution and I check my mail (my mbox is very large,
> > several hundred megabytes), I close evolution (I use evolution just to
> > have another application which uses the filesystem and the memory), I
> > start evolution again. I start firefox. The download is complete.
> > Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
> > test that all 84 downloaded rar files are ok and see the result.
> >
> > > The point of my question is to check that the data is really incorrect
> > > on-disk, or whether it is incorrect in pagecache.

I rebooted and the files are still broken after reboot (tested twice), so
the data is incorrect on disk.

> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.
> >
> > I will test.

Will test in a couple of hours, I have some work to do...

> ok, thanks.
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote:
>
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > >
> > > > nope, no file corruption at all.
> > >
> > > Ok. That's interesting, but I think you actually #ifdef'ed out too
> > > much:
> > >
> > > It was really just the _inner_ "if (mapping_cap_account_dirty(.."
> > > statement that I meant you should remove.
> > >
> > > Can you try that too?
> >
> > I have file corruption: "Hash check on download completion found bad
> > chunks, consider using "safe_sync"."
>
> Ok, that's interesting.
>
> So it doesn't seem to be the call to page_mkclean() itself that causes
> corruption. It looks like Peter's hunch that maybe there's some bug in
> PG_dirty handling _itself_ might be an idea..
>
> And the reason it only started happening now is that it may just have been
> _hidden_ by the fact that while we kept the dirty bits in the page tables,
> we'd end up writing the dirty page _despite_ having lost the PG_dirty bit.
> So if it's some bad interaction between writable mappings and some other
> part of the system, we just didn't see it earlier, exactly because we had
> _lots_ of dirty bits, and it was enough that _one_ of them was right.
>
> If you didn't see corruption when you #ifdef'ed out too much of the
> "test_clear_page_dirty()" function (the _whole_ TestClearPageDirty()
> if-statement), but you get it when you just comment out the stuff that
> does the page_mkclean(), that's interesting.
>
> I'm left looking at the "radix_tree_tag_clear()" in
> test_clear_page_dirty().
>
> What happens if you only ifdef out that single thing?

I have file corruption.

> The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> bit _after_ the page has been marked for writeback. Is there some ordering
> constraint there, perhaps?
>
> I'm really reaching here. I'm trying to see the pattern, and I'm not
> seeing it. I'm asking you to test things just to get more of a feel for
> what triggers the failure, than because I actually have any kind of idea
> of what the heck is going on.
>
> Andrew, Nick, Hugh - any ideas?
>
> Linus

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
> > What happens if you only ifdef out that single thing?
> >
> > The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> > bit _after_ the page has been marked for writeback. Is there some ordering
> > constraint there, perhaps?
> >
> > I'm really reaching here. I'm trying to see the pattern, and I'm not
> > seeing it. I'm asking you to test things just to get more of a feel for
> > what triggers the failure, than because I actually have any kind of idea
> > of what the heck is going on.
> >
> > Andrew, Nick, Hugh - any ideas?
>
> If all of test_clear_page_dirty() has been commented out then the page will
> never become clean hence will never fall out of pagecache, so unless Andrei
> is doing a reboot before checking for corruption, perhaps the underlying
> data on-disk is incorrect, but we can't see it.

if I do a sync and echo 1 > /proc/sys/vm/drop_caches, is the reboot
still necessary?

> Andrei, how _are_ you running this test? What's the exact sequence of
> steps?
>
> In particular, are you doing anything which would cause the corrupted file
> to be evicted from memory, thus forcing a read from disk? Such as
> unmounting and then remounting the filesystem?

I boot linux, I start rtorrent and start the download, while it's
downloading I start evolution and I check my mail (my mbox is very large,
several hundred megabytes), I close evolution (I use evolution just to
have another application which uses the filesystem and the memory), I
start evolution again. I start firefox. The download is complete.
Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
test that all 84 downloaded rar files are ok and see the result.

> The point of my question is to check that the data is really incorrect
> on-disk, or whether it is incorrect in pagecache.
>
> Also, it'd be useful if you could determine whether the bug appears with
> the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> rootfstype=ext2 if it's the root filesystem.

I will test.

> Thanks.
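
The question above is whether a file is wrong on disk or only in the page cache. As a rough illustration of the "evict, then re-read" step, here is a minimal sketch that is not from the thread. It swaps in a different technique from the one discussed: instead of sync plus /proc/sys/vm/drop_caches (system-wide, needs root) it uses fsync() and posix_fadvise(POSIX_FADV_DONTNEED) on a single file. The function name write_evict_verify() is hypothetical, the fadvise call is only a hint that the kernel may ignore, and the sketch writes with write() while the corruption in this thread involved dirty mmap'ed pages, so it demonstrates the verification step and is not a reproducer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() and fsync() */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path, force it to disk, ask the kernel
 * to drop the cached pages of that file, then read it back and compare.
 *
 * Returns 0 if the data read back matches, 1 if it differs or is short,
 * -1 on an I/O error.
 */
int write_evict_verify(const char *path, const unsigned char *buf, size_t len)
{
    unsigned char chunk[4096];
    size_t done = 0;
    int result = 0;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* Write everything, handling short writes. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Clean pages can be evicted; dirty ones cannot, so flush first. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* Advisory only: a failure here does not invalidate the comparison. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* Read back and compare chunk by chunk. */
    done = 0;
    while (done < len) {
        size_t want = len - done < sizeof chunk ? len - done : sizeof chunk;
        ssize_t n = read(fd, chunk, want);
        if (n < 0) {
            result = -1;
            break;
        }
        if (n == 0 || memcmp(chunk, buf + done, (size_t)n) != 0) {
            result = 1;
            break;
        }
        done += (size_t)n;
    }

    close(fd);
    return result;
}
```

Because the eviction is only a hint, a reboot (as done later in the thread) remains the stronger way to confirm that the on-disk data is bad.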
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote:
>
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > >
> > > There's exactly two call sites that call "page_mkclean()" (and that is
> > > the only thing in turn that calls "page_mkclean_one()", which we already
> > > determined will cause the corruption).
> > >
> > > Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
> > > case? Just do an "#if 0 .. #endif" around that whole if-statement,
> > > leaving the _only_ thing that actually calls "page_mkclean()" to be the
> > > "clear_page_dirty_for_io()" call.
> > >
> > > Do you still see corruption?
> >
> > nope, no file corruption at all.
>
> Ok. That's interesting, but I think you actually #ifdef'ed out too
> much:
>
> > +
> > +#if 0
> >  	if (TestClearPageDirty(page)) {
> >  		radix_tree_tag_clear(&mapping->page_tree,
> >  				page_index(page), PAGECACHE_TAG_DIRTY);
> > @@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
> >  		 * page is locked, which pins the address_space
> >  		 */
> >  		if (mapping_cap_account_dirty(mapping)) {
> > -			page_mkclean(page);
> > +			int cleaned = page_mkclean(page);
> > +			if (!must_clean_ptes && cleaned){
> > +				WARN_ON(1);
> > +				set_page_dirty(page);
> > +			}
> > +
> >  			dec_zone_page_state(page, NR_FILE_DIRTY);
> >  		}
> >  		return 1;
> >  	}
> > +
> > +#endif
> > +
>
> It was really just the _inner_ "if (mapping_cap_account_dirty(.."
> statement that I meant you should remove.
>
> Can you try that too?

I have file corruption: "Hash check on download completion found bad
chunks, consider using "safe_sync"."

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Alessandro Suardi wrote:
> >
> > No idea whether this can be a data point or not, but
> > here it goes... my P2P box is about to turn 5 days old
> > while running nonstop one or both of aMule 2.1.3 and
> > BitTorrent 4.4.0 on ext3 mounted w/default options
> > on both IDE and USB disks. Zero corruption.
> >
> > AMD K7-800, 512MB RAM, PREEMPT/UP kernel,
> > 2.6.19-git20 on top of up-to-date FC6.
>
> It _looks_ like PREEMPT/SMP is one common configuration.
>
> It might also be that the blocksize of the filesystem matters. 4kB
> filesystems are fundamentally simpler than 1kB filesystems, for example.
> You can tell at least with "/sbin/dumpe2fs -h /dev/..." or something.
>
> Andrei - one thing that might be interesting to see: when corruption
> occurs, can you get the corrupted file somehow? And compare it with a
> known-good copy to see what the corruption looks like?

the corrupted file has a chunk full of zeros

http://193.226.119.62/corruption0.jpg
http://193.226.119.62/corruption1.jpg
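
Comparing a corrupted file with a known-good copy, as suggested above, can be sketched in a few lines of C. This is an illustration only, not something posted in the thread: struct diff_run and first_diff_run() are hypothetical names, and the function works on two in-memory buffers of equal length. It reports where the first differing run starts, how long it is, and whether the bad copy holds only zeros there, which matches the "chunk full of zeros" description.

```c
#include <stddef.h>

/* Description of the first region where the two copies disagree. */
struct diff_run {
    size_t start;     /* byte offset of the first differing byte */
    size_t len;       /* number of consecutive differing bytes */
    int    all_zero;  /* 1 if every differing byte in `bad` is 0x00 */
};

/*
 * Hypothetical helper: find the first run of bytes where `bad` differs
 * from `good`. Both buffers must hold at least n bytes.
 *
 * Returns 1 and fills *out if a difference exists, 0 if the buffers match.
 * Note: a zeroed region is split wherever the good data itself contains a
 * zero byte, because those bytes compare equal.
 */
int first_diff_run(const unsigned char *good, const unsigned char *bad,
                   size_t n, struct diff_run *out)
{
    size_t i = 0;
    size_t j;

    /* Skip the matching prefix. */
    while (i < n && good[i] == bad[i])
        i++;
    if (i == n)
        return 0;

    out->start = i;
    out->all_zero = 1;

    /* Extend the run while the copies keep disagreeing. */
    for (j = i; j < n && good[j] != bad[j]; j++) {
        if (bad[j] != 0)
            out->all_zero = 0;
    }
    out->len = j - i;
    return 1;
}
```

Checking whether start and len are multiples of the filesystem block size (the value reported by dumpe2fs -h) or of the page size would be one way to relate the damaged region to the blocksize question raised above.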
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> > >
> > > This should be fairly easy to test: just change every single ", 1" case in
> > > the patch to ", 0".
> > >
> > > What happens for you in that case?
> >
> > I have file corruption.
>
> Magic. And btw, _thanks_ for being such a great tester.
>
> So now I have one more thing for you to try, if you can bother:
>
> There's exactly two call sites that call "page_mkclean()" (and that is the
> only thing in turn that calls "page_mkclean_one()", which we already
> determined will cause the corruption).
>
> Both of them do
>
> 	if (mapping_cap_account_dirty(mapping)) {
> 		..
>
> things, although they do slightly different things inside that if in your
> patched kernel.
>
> Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
> case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving
> the _only_ thing that actually calls "page_mkclean()" to be the
> "clear_page_dirty_for_io()" call.
>
> Do you still see corruption?

Nope, no file corruption at all.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page. We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean. We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1

 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 0);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Linus Torvalds wrote:
> >
> > But at the same time, it's interesting that it still happens when we try
> > to re-add the dirty bit. That would tell me that it's one of two cases:
>
> Forget that. There's a third case, which is much more likely:
>
>  - Andrew's patch had a ", 1" where it _should_ have had a ", 0".
>
> This should be fairly easy to test: just change every single ", 1" case in
> the patch to ", 0".
>
> The only case that _definitely_ would want ",1" is actually the case that
> already calls page_mkclean() directly: clear_page_dirty_for_io(). So no
> other ", 1" is valid, and that one that needed it already avoided even
> calling the "test_clear_page_dirty()" function, because it did it all by
> hand.
>
> What happens for you in that case?
>
> 		Linus

I have file corruption.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page. We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean. We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1

 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 0);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 				bh = next;
 			} while (bh != head);
 			if (PAGE_SIZE == bh->b_size) {
-				clear_page_dirty(page);
+				clear_page_dirty(page, 0);
 			}
 		}
 	}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> >
> > I applied Linus's patch, Andrew's patch and Peter Zijlstra's patches (the
> > last two). The unified patch is attached. I tested and I have no corruption.
>
> That wasn't very interesting, because you also had the patch that just
> disabled "page_mkclean_one()" entirely:
>
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index d8a842a..3f9061e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page
> >  		goto unlock;
> >
> >  	entry = ptep_get_and_clear(mm, address, pte);
> > -	entry = pte_mkclean(entry);
> > +	/*entry = pte_mkclean(entry);*/
> >  	entry = pte_wrprotect(entry);
> >  	ptep_establish(vma, address, pte, entry);
> >  	lazy_mmu_prot_update(entry);
>
> The above patch is bad. It's always going to hide the bug, but it hides it
> by just not doing anything at all. So any patch combination that contains
> that patch will probably _always_ fix your problem, but it won't be an
> interesting patch..
>
> So can you remove that small fragment? Also, it would be nice if you added
> the WARN_ON() to this sequence in mm/page-writeback.c:
>
> +	if (!must_clean_ptes && cleaned)
> +		set_page_dirty(page);
>
> just make it do a WARN_ON() if this ever triggers.
>
> Then, IF the corruption is gone, we'd love to see the WARN_ON results..
>
> 		Linus

I dropped that patch and added WARN_ON(1); the unified patch is attached.
I got corruption: "Hash check on download completion found bad chunks,
consider using "safe_sync"."
In dmesg there is no message from WARN_ON(1). My .config is attached.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page. We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean. We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 1)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 1);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
---
Re: 2.6.19 file content corruption on ext3
> (On that note: Andrei - if you do test this out, I'd suggest applying my
> patch too - the one that you already tested. It won't apply cleanly on top
> of Andrew's patch, but it should be trivial to apply by hand, since you
> really just want to remove the whole "if (ret) {...}" sequence. I realize
> that it didn't make any difference for you, but applying that patch is
> probably a good idea just to remove the noise for a codepath that you
> already showed to not matter)

I applied Linus's patch, Andrew's patch and Peter Zijlstra's patches (the
last two). The unified patch is attached. I tested and I have no corruption.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;

 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;

 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page. We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean. We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);

 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 1)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);

 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct

 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 1);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1

 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 				bh = next;
 			} while (bh != head);
 			if (PAGE_SIZE == bh->b_size) {
-				clear_page_dirty(page);
+				clear_page_dirty(page, 0);
 			}
 		}
 	}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
 	ASSERT(!PageWriteback(page));
 	set_page_writeback(page);
 	if (clear_dirty)
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 	unlock_page(page);
 	if (!buffers) {
Re: 2.6.19 file content corruption on ext3
> OK, I'll try this on an ext3 box. BTW, what data mode are you using ext3
> in?

ordered

> Also, for testing's sake, could you give this a go:
> It's a total hack but I guess worth testing.
>
> ---
>  mm/rmap.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6-git/mm/rmap.c
> ===
> --- linux-2.6-git.orig/mm/rmap.c 2006-12-18 11:06:29.0 +0100
> +++ linux-2.6-git/mm/rmap.c 2006-12-18 11:07:16.0 +0100
> @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page
>  		goto unlock;
>
>  	entry = ptep_get_and_clear(mm, address, pte);
> -	entry = pte_mkclean(entry);
> +	/* entry = pte_mkclean(entry); */
>  	entry = pte_wrprotect(entry);
>  	ptep_establish(vma, address, pte, entry);
>  	lazy_mmu_prot_update(entry);

With latest git and this patch there is no corruption!
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 11:19:04 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
>
> > I tried latest git with the patch from this email and I still get file
> > content corruption. If I can help you further debug the problem, tell me
> > what to do.
>
> Can you please tell us all the steps which we need to take to reproduce this?

I'm using rtorrent-0.7.0 and libtorrent-0.11.0. Just download a torrent
with multiple files (I downloaded 84 rar files). When it finishes, it does
a hash check, and at the end of the check it says "Hash check on download
completion found bad chunks, consider using "safe_sync"." and stops, and
most of the downloaded files are broken.

With Peter Zijlstra's patch this error doesn't show, but there is still
file corruption (although fewer files are corrupted); after the hash check,
rtorrent downloads the bad chunks again and does another hash check, and
then all files are OK.
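For anyone reproducing this without rtorrent, the completion check described above amounts to hashing the downloaded data piece by piece and comparing against the expected digests (BitTorrent v1 uses SHA-1 per piece). A minimal sketch of that idea follows; it is not rtorrent's or libtorrent's actual code, and the names `piece_hashes` and `bad_pieces` and the piece length used in any call are assumptions chosen for illustration.

```python
import hashlib


def piece_hashes(data, piece_len):
    """Return the SHA-1 digest of each consecutive piece of `data`."""
    return [hashlib.sha1(data[off:off + piece_len]).digest()
            for off in range(0, len(data), piece_len)]


def bad_pieces(data, piece_len, expected):
    """Return indices of pieces whose digest does not match `expected`.

    A piece that is missing entirely (data shorter than expected) also
    counts as bad.
    """
    actual = piece_hashes(data, piece_len)
    return [index for index, digest in enumerate(expected)
            if index >= len(actual) or actual[index] != digest]
```

Running `bad_pieces()` over the file as read back from disk, using digests computed from a known-good copy, gives the same kind of "bad chunks" list that the client reports.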
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 18:22:42 +1100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Andrew Morton wrote:
> > > On Mon, 18 Dec 2006 15:51:52 +1100
> > > Nick Piggin <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >>I think the problem Andrew identified is real.
> > >
> > >
> > > I don't. In fact I don't think I described any problem (well, I tried to,
> > > but then I contradicted myself).
> >
> > By saying that there shouldn't be any dirty ptes if there are no
> > dirty buffers? But in that case the _page_ shouldn't be dirty either,
> > so that clear_page_dirty would be redundant. But presumably it isn't.
>
> I don't follow that.
>
> The linkage between pte-dirtiness and buffer_heads is a bit hard to follow
> without also considering page-dirtiness.
>
> > > Six hours here of fsx-linux plus high memory pressure on SMP on 1k
> > > blocksize ext3, mainline. Zero failures. It's unlikely that this testing
> > > would pass, yet people running normal workloads are able to easily trigger
> > > failures. I suspect we're looking in the wrong place.
> >
> > Yes I could believe the corruption is caused by something else
> > completely.
>
> Think so. We do have a problem here, but only on threaded apps, I believe.
> rtorrent doesn't appear to be threaded, and the bug is hit on non-preempt
> UP.

ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc1 #2 SMP PREEMPT Mon Dec 18 11:01:52 EET 2006 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux

and the other person who had corruption with rtorrent also has SMP and
PREEMPT.

> > >>The issue is the disconnect between the pte dirtiness and a filesystem
> > >>bringing buffers clean.
> > >
> > >
> > > Really? The dirtying direction goes pte_dirty->PG_dirty->BH_Dirty and the
> > > cleaning direction goes !BH_Dirty->!PG_dirty->!pte_dirty. That's pretty
> > > simple, setting aside races.
> > >
> > > In the try_to_free_buffers case there's a large time interval between
> > > !BH_Dirty and !PG_dirty, but that shouldn't affect anything.
> >
> > After try_to_free_buffers detaches the buffers from the page, a
> > pagefault can come in, and mark the pte writeable, then set_page_dirty
> > (which finds no buffers, so only sets PG_dirty).
> >
> > The page can now get dirtied through this mapping.
> >
> > try_to_free_buffers then goes on to clean the page and ptes.
>
> try_to_free_buffers() isn't called against a page which doesn't have
> buffers. It'll oops.
>
> > Were you testing with preempt?
>
> nope, just SMP.
>
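The interleaving Nick describes in the quoted text can be walked through with a small toy model. This is a user-space sketch only, not kernel code: the `Page` class and every function name below are hypothetical stand-ins whose fields merely mirror the flags named in the thread (BH_Dirty, PG_dirty, the pte dirty bit). The thread itself disputes whether this scenario is the actual cause, so the model shows what the hypothesis claims, nothing more.

```python
class Page:
    """Toy page-cache page; fields mirror the flags named in the thread."""

    def __init__(self, data):
        self.mem = data            # contents in the page cache
        self.disk = data           # contents on disk
        self.has_buffers = True
        self.bh_dirty = False      # stands in for BH_Dirty
        self.pg_dirty = False      # stands in for PG_dirty
        self.pte_dirty = False     # stands in for the pte dirty bit


def write_through_mapping(page, data):
    """A store through a writable mapping, followed by set_page_dirty()."""
    page.mem = data
    page.pte_dirty = True
    page.pg_dirty = True
    if page.has_buffers:
        page.bh_dirty = True


def fs_writes_buffers(page):
    """Filesystem writes buffers by hand: buffers clean, page still dirty."""
    page.disk = page.mem
    page.bh_dirty = False


def drop_buffers(page):
    """First half of the modelled try_to_free_buffers(): drop clean buffers."""
    if page.bh_dirty:
        return False
    page.has_buffers = False
    return True


def clean_page_and_ptes(page):
    """Second half: clear the page and pte dirty state."""
    page.pg_dirty = False
    page.pte_dirty = False


def writeback(page):
    """Later writeback only touches pages still marked dirty."""
    if page.pg_dirty:
        page.disk = page.mem
        page.pg_dirty = False


def run(fault_in_window):
    """Return (mem, disk) after the sequence, with or without the racing write."""
    page = Page(b"A")
    write_through_mapping(page, b"B")
    fs_writes_buffers(page)
    dropped = drop_buffers(page)
    if fault_in_window:
        write_through_mapping(page, b"C")   # lands between the two halves
    if dropped:
        clean_page_and_ptes(page)
    if not fault_in_window:
        write_through_mapping(page, b"C")   # same write, after the cleaning
    writeback(page)
    return page.mem, page.disk
```

In this model `run(True)` leaves memory and disk out of sync because the dirty state set by the racing write is cleared before writeback sees it, while `run(False)` writes the new data out. Whether the real kernel can reach that window is exactly what Andrew questions above.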
Re: 2.6.19 file content corruption on ext3
I tried latest git with the patch from this email and I still get file
content corruption. If I can help you further debug the problem, tell me
what to do.

On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Nick Piggin wrote:
> >
> > I can't see how that's exactly a problem -- so long as the page does not
> > get reclaimed (it won't, because we have a ref on it) then all that matters
> > is that the page eventually gets marked dirty.
>
> But the point being that "try_to_free_buffers()" marks it clean
> AFTERWARDS.
>
> So yes, the page gets marked dirty in the pte's - the hardware generally
> does that for us, so we don't have to worry about that part going on.
>
> But "try_to_free_buffers()" seems to clear those dirty bits without
> serializing it really any way. It just says "ok, I will now clear them".
> Without knowing whether the dirty bits got set before the IO that cleared
> the buffer head dirty bits or not.
>
> What is _that_ serialization? As far as I can see, the only way to
> guarantee that to happen (since the dirty bits in the page tables will get
> set without us ever even being notified) is that the page tables
> themselves must simply never contain that page in a writable form at all.
>
> And that seems to be lacking.
>
> Anyway, I have what I consider a much simpler solution: just don't DO all
> that crap in try_to_free_buffers() at all. I sent it out to some people
> already, but not very widely.
>
> I reproduce my suggestion here for you (and maybe others too who weren't
> cc'd in that other discussion group) to comment on..
>
> 		Linus
>
> ---
>
> So I think your patch is really broken, how about this one instead?
>
> It's really my previous patch, BUT it also adds a
>
> 	if (PageDirty(page) ..
> 		return 0;
>
> case, on the assumption that since PageDirty() means that one of the
> buffers should be dirty, there's no point in even _trying_ drop_buffers,
> since that should just fail anyway.
>
> Now, that assumption is obviously wrong _if_ the buffers have been cleaned
> by something else. So in that case, we now don't remove the buffer heads,
> but who really cares? The page will remain on the dirty list, and
> something should be trying to write it out, but since now all the buffers
> are clean, once that happens, there is no actual IO to happen.
>
> Hmm? So this means that we simply don't remove the buffers early from such
> pages, but there shouldn't be any real downside.
>
> Now, the only question would be if the page is marked dirty _while_ this
> is running. We do hold the page lock, but page dirtying doesn't get the
> lock, does it? But at least we won't mark the page _clean_ when it
> shouldn't be.. And we still are atomic wrt the actual buffer lists
> (mapping->private_lock), so I think this should all be ok, and
> drop_buffers() will do the right thing.
>
> So no race possible either.
>
> At least as far as I can see. And the patch certainly is simple.
>
> Now the question whether this actually _fixes_ any problems does remain,
> but I think this should be a pretty good solution if the bug really is
> here. Andrew?
>
> 		Linus
>
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>  	int ret = 0;
>
>  	BUG_ON(!PageLocked(page));
> -	if (PageWriteback(page))
> +	if (PageDirty(page) || PageWriteback(page))
>  		return 0;
>
>  	if (mapping == NULL) {		/* can this still happen? */
> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>  	spin_lock(&mapping->private_lock);
>  	ret = drop_buffers(page, &buffers_to_free);
>  	spin_unlock(&mapping->private_lock);
> -	if (ret) {
> -		/*
> -		 * If the filesystem writes its buffers by hand (eg ext3)
> -		 * then we can have clean buffers against a dirty page. We
> -		 * clean the page here; otherwise later reattachment of buffers
> -		 * could encounter a non-uptodate page, which is unresolvable.
> -		 * This only applies in the rare case where try_to_free_buffers
> -		 * succeeds but the page is not freed.
> -		 *
> -		 * Also, during truncate, discard_buffer will have marked all
> -		 * the page's buffers clean. We discover that here and clean
> -		 * the page also.
> -		 */
> -		if (test_clear_page_dirty(page))
> -			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> -	}
>  out:
>  	if (buffers_to_free) {
>  		struct buffer_head *bh = buffers_to_free;
>
Re: 2.6.19 file content corruption on ext3
I tried latest git with the patch from this email and it still get file content corruption. If I can help you further debug the problem tell me what to do. On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters is that the page eventually gets marked dirty. But the point being that try_to_free_buffers() marks it clean AFTERWARDS. So yes, the page gets marked dirty in the pte's - the hardware generally does that for us, so we don't have to worry about that part going on. But try_to_free_buffers() seems to clear those dirty bits without serializing it really any way. It just says ok, I will now clear them. Without knowing whether the dirty bits got set before the IO that cleared the buffer head dirty bits or not. What is _that_ serialization? As far as I can see, the only way to guarantee that to happen (since the dirty bits in the page tables will get set without us ever even being notified) is that the page tables themselves must simply never contain that page in a writable form at all. And that seems to be lacking. Anyway, I have what I consider a much simpler solution: just don't DO all that crap in try_to_free_buffers() at all. I sent it out to some people already, not not very widely. I reproduce my suggestion here for you (and maybe others too who weren't cc'd in that other discussion group) to comment on.. Linus --- So I think your patch is really broken, how about this one instead? It's really my previous patch, BUT it also adds a if (PageDirty(page) .. return 0; case, on the assumption that since PageDirty() measn that one of the buffers should be dirty, there's no point in even _trying_ drop_buffers, since that should just fail anyway. Now, that assumption is obviously wrong _if_ the buffers have been cleaned by something else. 
So in that case, we now don't remove the buffer heads, but who really cares? The page will remain on the dirty list, and something should be trying to write it out, but since now all the buffers are clean, once that happens, there is no actual IO to happen. Hmm? So this means that we simply don't remove the buffers early from such pages, but there shouldn't be any real downside. Now, the only question would be if the page is marked dirty _while_ this is running. We do hold the page lock, but page dirtying doesn't get the lock, does it? But at least we won't mark the page _clean_ when it shouldn't be.. And we still are atomic wrt the actual buffer lists (mapping-private_lock), so I think this should all be ok, and drop_buffers() will do the right thing. So no race possible either. At least as far as I can see. And the patch certainly is simple. Now the question whether this actually _fixes_ any problems does remain, but I think this should be a pretty good solution if the bug really is here. Andrew? Linus diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page) int ret = 0; BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) + if (PageDirty(page) || PageWriteback(page)) return 0; if (mapping == NULL) { /* can this still happen? */ @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page) spin_lock(mapping-private_lock); ret = drop_buffers(page, buffers_to_free); spin_unlock(mapping-private_lock); - if (ret) { - /* - * If the filesystem writes its buffers by hand (eg ext3) - * then we can have clean buffers against a dirty page. We - * clean the page here; otherwise later reattachment of buffers - * could encounter a non-uptodate page, which is unresolvable. - * This only applies in the rare case where try_to_free_buffers - * succeeds but the page is not freed. 
- * - * Also, during truncate, discard_buffer will have marked all - * the page's buffers clean. We discover that here and clean - * the page also. - */ - if (test_clear_page_dirty(page)) - task_io_account_cancelled_write(PAGE_CACHE_SIZE); - } out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). By saying that there shouldn't be any dirty ptes if there are no dirty buffers? But in that case the _page_ shouldn't be dirty either, so that clear_page_dirty would be redundant. But presumably it isn't. I don't follow that. The linkage between pte-dirtiness and buffer_heads is a bit hard to follow without also considering page-dirtiness. Six hours here of fsx-linux plus high memory pressure on SMP on 1k blocksize ext3, mainline. Zero failures. It's unlikely that this testing would pass, yet people running normal workloads are able to easily trigger failures. I suspect we're looking in the wrong place. Yes I could believe it the corruption is caused by something else completely. Think so. We do have a problem here, but only on threaded apps, I believe. rtorrent doesn't appear to be threaded, and the bug is hit on non-preempt UP. ierdnac ~ # uname -a Linux ierdnac 2.6.20-rc1 #2 SMP PREEMPT Mon Dec 18 11:01:52 EET 2006 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux and the other person who had corruption with rtorrent has also SMP and PREEMPT. The issue is the disconnect between the pte dirtiness and a filesystem bringing buffers clean. Really? The dirtying direction goes pte_dirty-PG_dirty-BH_Dirty and the cleaning direction goes !BH_Dirty-!PG_dirty-!pte_dirty. That's pretty simple, setting aside races. In the try_to_free_buffers case there's a large time inverval between !BH_Dirty and !PG_dirty, but that shouldn't affect anything. 
After try_to_free_buffers detaches the buffers from the page, a pagefault can come in, and mark the pte writeable, then set_page_dirty (which finds no buffers, so only sets PG_dirty). The page can now get dirtied through this mapping. try_to_free_buffers then goes on to clean the page and ptes. try_to_free_buffers() isn't called against a page which doesn't have buffers. It'll oops. Were you testing with preempt? nope, just SMP. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote:
On Mon, 18 Dec 2006 11:19:04 +0200 Andrei Popa [EMAIL PROTECTED] wrote:

I tried latest git with the patch from this email and I still get file content corruption. If I can help you further debug the problem, tell me what to do.

Can you please tell us all the steps which we need to take to reproduce this?

I'm using rtorrent-0.7.0 and libtorrent-0.11.0. Just download a torrent with multiple files (I downloaded 84 rar files); when it finishes it does a hash check, and at the end of the check it says

Hash check on download completion found bad chunks, consider using safe_sync.

and stops, and most of the downloaded files are broken. With Peter Zijlstra's patch this error doesn't show, but there is still file corruption (although fewer files are corrupted); after the hash check, rtorrent downloads the bad chunks again and does another hash check, and all files are ok.
Re: 2.6.19 file content corruption on ext3
OK, I'll try this on a ext3 box. BTW, what data mode are you using ext3 in?

ordered

Also, for testing's sake, could you give this a go:

It's a total hack but I guess worth testing.

---
 mm/rmap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-git/mm/rmap.c
===
--- linux-2.6-git.orig/mm/rmap.c	2006-12-18 11:06:29.0 +0100
+++ linux-2.6-git/mm/rmap.c	2006-12-18 11:07:16.0 +0100
@@ -448,7 +448,7 @@ static int page_mkclean_one(struct page
 		goto unlock;
 
 	entry = ptep_get_and_clear(mm, address, pte);
-	entry = pte_mkclean(entry);
+	/* entry = pte_mkclean(entry); */
 	entry = pte_wrprotect(entry);
 	ptep_establish(vma, address, pte, entry);
 	lazy_mmu_prot_update(entry);

with latest git and this patch there is no corruption!
Re: 2.6.19 file content corruption on ext3
(On that note: Andrei - if you do test this out, I'd suggest applying my patch too - the one that you already tested. It won't apply cleanly on top of Andrew's patch, but it should be trivial to apply by hand, since you really just want to remove the whole "if (ret) {...}" sequence. I realize that it didn't make any difference for you, but applying that patch is probably a good idea just to remove the noise for a codepath that you already showed to not matter)

I applied Linus's patch, Andrew's patch, and Peter Zijlstra's patches (the last two). The combined unified patch is attached. I tested and I have no corruption.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 1)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 1);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 				bh = next;
 			} while (bh != head);
 			if (PAGE_SIZE == bh->b_size) {
-				clear_page_dirty(page);
+				clear_page_dirty(page, 0);
 			}
 		}
 	}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
 	ASSERT(!PageWriteback(page));
 	set_page_writeback(page);
 	if (clear_dirty)
-		clear_page_dirty(page);
+		clear_page_dirty(page, 1);
 	unlock_page(page);
 	if (!buffers) {
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote:
On Mon, 18 Dec 2006, Andrei Popa wrote:

I applied Linus patch, Andrew patch, Peter Zijlstra patches(the last two). All unified patch is attached. I tested and I have no corruption.

That wasn't very interesting, because you also had the patch that just disabled page_mkclean_one() entirely:

diff --git a/mm/rmap.c b/mm/rmap.c
index d8a842a..3f9061e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -448,7 +448,7 @@ static int page_mkclean_one(struct page
 		goto unlock;
 
 	entry = ptep_get_and_clear(mm, address, pte);
-	entry = pte_mkclean(entry);
+	/*entry = pte_mkclean(entry);*/
 	entry = pte_wrprotect(entry);
 	ptep_establish(vma, address, pte, entry);
 	lazy_mmu_prot_update(entry);

The above patch is bad. It's always going to hide the bug, but it hides it by just not doing anything at all. So any patch combination that contains that patch will probably _always_ fix your problem, but it won't be an interesting patch..

So can you remove that small fragment?

Also, it would be nice if you added the WARN_ON() to this sequence in mm/page-writeback.c:

+			if (!must_clean_ptes && cleaned)
+				set_page_dirty(page);

just make it do a WARN_ON() if this ever triggers.

Then, IF the corruption is gone, we'd love to see the WARN_ON results..

Linus

I dropped that patch and added WARN_ON(1), the unified patch is attached. I got corruption: Hash check on download completion found bad chunks, consider using safe_sync.
In dmesg there is no message from WARN_ON(1), my .config is attached.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 1)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 1);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote:
On Mon, 18 Dec 2006, Linus Torvalds wrote:

But at the same time, it's interesting that it still happens when we try to re-add the dirty bit. That would tell me that it's one of two cases:

Forget that. There's a third case, which is much more likely:

 - Andrew's patch had a ", 1" where it _should_ have had a ", 0".

This should be fairly easy to test: just change every single ", 1" case in the patch to ", 0".

The only case that _definitely_ would want ", 1" is actually the case that already calls page_mkclean() directly: clear_page_dirty_for_io(). So no other ", 1" is valid, and that one that needed it already avoided even calling the test_clear_page_dirty() function, because it did it all by hand.

What happens for you in that case?

Linus

I have file corruption.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 0);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 				bh = next;
 			} while (bh != head);
 			if (PAGE_SIZE == bh->b_size) {
-				clear_page_dirty(page);
+				clear_page_dirty(page, 0);
 			}
 		}
 	}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote:
On Mon, 18 Dec 2006, Andrei Popa wrote:

This should be fairly easy to test: just change every single ", 1" case in the patch to ", 0".

What happens for you in that case?

I have file corruption.

Magic.

And btw, _thanks_ for being such a great tester.

So now I have one more thing for you to try, if you can bother:

There's exactly two call sites that call page_mkclean() (and that is the only thing in turn that calls page_mkclean_one(), which we already determined will cause the corruption).

Both of them do

	if (mapping_cap_account_dirty(mapping)) {
		..

things, although they do slightly different things inside that "if" in your patched kernel.

Can you just TOTALLY DISABLE that case for the test_clear_page_dirty() case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving the _only_ thing that actually calls page_mkclean() to be the clear_page_dirty_for_io() call.

Do you still see corruption?

nope, no file corruption at all.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count) {
-		clear_page_dirty(page);
+		clear_page_dirty(page, 0);
 		ClearPageUptodate(page);
 	}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
 				bh = next;
 			} while (bh != head);
 			if (PAGE_SIZE == bh->b_size) {
-				clear_page_dirty(page);
+				clear_page_dirty(page, 0
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote:
On Mon, 18 Dec 2006, Alessandro Suardi wrote:

No idea whether this can be a data point or not, but here it goes... my P2P box is about to turn 5 days old while running nonstop one or both of aMule 2.1.3 and BitTorrent 4.4.0 on ext3 mounted w/default options on both IDE and USB disks. Zero corruption.

AMD K7-800, 512MB RAM, PREEMPT/UP kernel, 2.6.19-git20 on top of up-to-date FC6.

It _looks_ like PREEMPT/SMP is one common configuration. It might also be that the blocksize of the filesystem matters. 4kB filesystems are fundamentally simpler than 1kB filesystems, for example. You can tell at least with "/sbin/dumpe2fs -h /dev/..." or something.

Andrei - one thing that might be interesting to see: when corruption occurs, can you get the corrupted file somehow? And compare it with a known-good copy to see what the corruption looks like?

the corrupted file has a chunk full of zeros

http://193.226.119.62/corruption0.jpg
http://193.226.119.62/corruption1.jpg
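
The comparison asked for in this message (corrupted file against a known-good copy, looking for a zero-filled chunk) can be done on two in-memory buffers with a few lines of C. This is a minimal sketch: the helper names `first_zero_block` and `first_mismatch` are made up for illustration, and `block_size` is an assumption the caller has to supply, for example the filesystem block size reported by dumpe2fs.

```c
#include <stddef.h>

/*
 * Return the index of the first block_size-aligned block in buf that
 * consists only of zero bytes, or -1 if there is none. A trailing
 * partial block is ignored.
 */
static long first_zero_block(const unsigned char *buf, size_t len,
			     size_t block_size)
{
	if (block_size == 0)
		return -1;

	for (size_t off = 0; off + block_size <= len; off += block_size) {
		size_t i = 0;

		while (i < block_size && buf[off + i] == 0)
			i++;
		if (i == block_size)
			return (long)(off / block_size);
	}
	return -1;
}

/*
 * Return the byte offset of the first difference between a known-good
 * buffer and a suspect buffer of the same length, or -1 if they match.
 */
static long first_mismatch(const unsigned char *good,
			   const unsigned char *suspect, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (good[i] != suspect[i])
			return (long)i;
	return -1;
}
```

If the first mismatch sits on a block boundary and the block found there is all zeros, that points at a whole block never reaching disk rather than at scattered bit errors; a file that legitimately contains zero-filled blocks would of course need the known-good copy to tell the two apart.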
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote:
On Tue, 19 Dec 2006, Andrei Popa wrote:

There's exactly two call sites that call page_mkclean() (and that is the only thing in turn that calls page_mkclean_one(), which we already determined will cause the corruption).

Can you just TOTALLY DISABLE that case for the test_clear_page_dirty() case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving the _only_ thing that actually calls page_mkclean() to be the clear_page_dirty_for_io() call.

Do you still see corruption?

nope, no file corruption at all.

Ok. That's interesting, but I think you actually #ifdef'ed out too much:

+
+#if 0
 	if (TestClearPageDirty(page)) {
 		radix_tree_tag_clear(&mapping->page_tree,
 				page_index(page),
 				PAGECACHE_TAG_DIRTY);
@@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
 		 * page is locked, which pins the address_space
 		 */
 		if (mapping_cap_account_dirty(mapping)) {
-			page_mkclean(page);
+			int cleaned = page_mkclean(page);
+			if (!must_clean_ptes && cleaned){
+				WARN_ON(1);
+				set_page_dirty(page);
+			}
+
 			dec_zone_page_state(page, NR_FILE_DIRTY);
 		}
 		return 1;
 	}
+
+#endif
+

It was really just the _inner_ "if (mapping_cap_account_dirty(.." statement that I meant you should remove.

Can you try that too?

I have file corruption: Hash check on download completion found bad chunks, consider using safe_sync.

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page);
 	put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
 	/* Retest mp->count since we may have released page lock */
 	if (test_bit(META_discard, &mp->flag) && !mp->count
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
On Mon, 18 Dec 2006 16:57:30 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote:

What happens if you only ifdef out that single thing?

The actual page-cleaning functions make sure to only clear the TAG_DIRTY bit _after_ the page has been marked for writeback. Is there some ordering constraint there, perhaps?

I'm really reaching here. I'm trying to see the pattern, and I'm not seeing it. I'm asking you to test things just to get more of a feel for what triggers the failure, than because I actually have any kind of idea of what the heck is going on.

Andrew, Nick, Hugh - any ideas?

If all of test_clear_page_dirty() has been commented out then the page will never become clean hence will never fall out of pagecache, so unless Andrei is doing a reboot before checking for corruption, perhaps the underlying data on-disk is incorrect, but we can't see it.

if I do a sync and echo 1 > /proc/sys/vm/drop_caches, is the reboot still necessary?

Andrei, how _are_ you running this test? What's the exact sequence of steps? In particular, are you doing anything which would cause the corrupted file to be evicted from memory, thus forcing a read from disk? Such as unmounting and then remounting the filesystem?

I boot linux, I start rtorrent and start the download. While it's downloading I start evolution and check my mail (my mbox is very large, several hundred megabytes), I close evolution (I use evolution just to have another application which uses the filesystem and the memory), and I start evolution again. I start firefox. The download completes. Rtorrent says whether the hash is good or not. I do an "unrar t qwe.rar" to test that all 84 downloaded rar files are ok and see the result.

The point of my question is to check that the data is really incorrect on-disk, or whether it is incorrect in pagecache.

Also, it'd be useful if you could determine whether the bug appears with the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with rootfstype=ext2 if it's the root filesystem.

I will test.

Thanks.
Re: 2.6.19 file content corruption on ext3
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote:
On Tue, 19 Dec 2006, Andrei Popa wrote:

nope, no file corruption at all.

Ok. That's interesting, but I think you actually #ifdef'ed out too much:

It was really just the _inner_ "if (mapping_cap_account_dirty(.." statement that I meant you should remove.

Can you try that too?

I have file corruption: Hash check on download completion found bad chunks, consider using safe_sync.

Ok, that's interesting.

So it doesn't seem to be the call to page_mkclean() itself that causes corruption. It looks like Peter's hunch that maybe there's some bug in PG_dirty handling _itself_ might be an idea..

And the reason it only started happening now is that it may just have been _hidden_ by the fact that while we kept the dirty bits in the page tables, we'd end up writing the dirty page _despite_ having lost the PG_dirty bit. So if it's some bad interaction between writable mappings and some other part of the system, we just didn't see it earlier, exactly because we had _lots_ of dirty bits, and it was enough that _one_ of them was right.

If you didn't see corruption when you #ifdef'ed out too much of the test_clear_page_dirty() function (the _whole_ "TestClearPageDirty()" if-statement), but you get it when you just comment out the stuff that does the page_mkclean(), that's interesting.

I'm left looking at the radix_tree_tag_clear() in test_clear_page_dirty(). What happens if you only ifdef out that single thing?

I have file corruption.

The actual page-cleaning functions make sure to only clear the TAG_DIRTY bit _after_ the page has been marked for writeback. Is there some ordering constraint there, perhaps?

I'm really reaching here. I'm trying to see the pattern, and I'm not seeing it. I'm asking you to test things just to get more of a feel for what triggers the failure, than because I actually have any kind of idea of what the heck is going on.

Andrew, Nick, Hugh - any ideas?

Linus

diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
 	int ret = 0;
 
 	BUG_ON(!PageLocked(page));
-	if (PageWriteback(page))
+	if (PageDirty(page) || PageWriteback(page))
 		return 0;
 
 	if (mapping == NULL) {		/* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
 	spin_lock(&mapping->private_lock);
 	ret = drop_buffers(page, &buffers_to_free);
 	spin_unlock(&mapping->private_lock);
-	if (ret) {
-		/*
-		 * If the filesystem writes its buffers by hand (eg ext3)
-		 * then we can have clean buffers against a dirty page.  We
-		 * clean the page here; otherwise later reattachment of buffers
-		 * could encounter a non-uptodate page, which is unresolvable.
-		 * This only applies in the rare case where try_to_free_buffers
-		 * succeeds but the page is not freed.
-		 *
-		 * Also, during truncate, discard_buffer will have marked all
-		 * the page's buffers clean.  We discover that here and clean
-		 * the page also.
-		 */
-		if (test_clear_page_dirty(page))
-			task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-	}
 out:
 	if (buffers_to_free) {
 		struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
 				wait_on_page_writeback(page);
 
 			if (PageWriteback(page) ||
-					!test_clear_page_dirty(page)) {
+					!test_clear_page_dirty(page, 0)) {
 				unlock_page(page);
 				break;
 			}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
 		spin_unlock(&fc->lock);
 
 		if (offset == 0 && to == PAGE_CACHE_SIZE) {
-			clear_page_dirty(page);
+			clear_page_dirty(page, 0);
 			SetPageUptodate(page);
 		}
 	}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-	clear_page_dirty(page);
+	clear_page_dirty(page, 0);
 	ClearPageUptodate(page);
 	remove_from_page_cache(page
Re: 2.6.19 file content corruption on ext3
If all of test_clear_page_dirty() has been commented out then the page will never become clean hence will never fall out of pagecache, so unless Andrei is doing a reboot before checking for corruption, perhaps the underlying data on-disk is incorrect, but we can't see it.

if I do a sync and echo 1 > /proc/sys/vm/drop_caches

OK, that works.

is the reboot still necessary?

It might be necessary to reboot in this case - if we're leaving the pagecache dirty, writing to drop_caches won't remove it. And you probably won't be able to get a clean reboot either.

Andrei, how _are_ you running this test? What's the exact sequence of steps? In particular, are you doing anything which would cause the corrupted file to be evicted from memory, thus forcing a read from disk? Such as unmounting and then remounting the filesystem?

I boot linux, I start rtorrent and start the download. While it's downloading I start evolution and check my mail (my mbox is very large, several hundred megabytes), I close evolution (I use evolution just to have another application which uses the filesystem and the memory), and I start evolution again. I start firefox. The download completes. Rtorrent says whether the hash is good or not. I do an "unrar t qwe.rar" to test that all 84 downloaded rar files are ok and see the result.

The point of my question is to check that the data is really incorrect on-disk, or whether it is incorrect in pagecache.

I rebooted and the files are still broken after reboot (tested twice), so the data is incorrect on disk.

Also, it'd be useful if you could determine whether the bug appears with the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with rootfstype=ext2 if it's the root filesystem.

I will test.

Will test in a couple of hours, I have some work to do...

ok, thanks.
Re: 2.6.19 file content corruption on ext3
I was mistaken, I'm still having file corruption with rtorrent.

On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote:
> On Sun, 17 Dec 2006 02:13:18 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> > I had filesystem data corruption with rtorrent with 2.6.19.
> > I tried recent git with Peter Zijlstra patch
> > http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
> > fixed.
> >
>
> oh crap, I'd forgotten that test_clear_page_dirty() now fiddles with the
> ptes.
>
> I'd be really surprised if this was all due to a race though. Is everyone
> who has observed this problem running SMP and/or preemptible kernels?
>
> Peter, why isn't that proposed patch's cleaning of the pte racy against
> do_wp_page()?
Re: 2.6.19 file content corruption on ext3
ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc1 #1 SMP PREEMPT Sun Dec 17 01:52:28 EET 2006 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux

On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote:
> On Sun, 17 Dec 2006 02:13:18 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> > I had filesystem data corruption with rtorrent with 2.6.19.
> > I tried recent git with Peter Zijlstra patch
> > http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
> > fixed.
> >
>
> oh crap, I'd forgotten that test_clear_page_dirty() now fiddles with the
> ptes.
>
> I'd be really surprised if this was all due to a race though. Is everyone
> who has observed this problem running SMP and/or preemptible kernels?
>
> Peter, why isn't that proposed patch's cleaning of the pte racy against
> do_wp_page()?
Re: 2.6.19 file content corruption on ext3
Hello,

I had filesystem data corruption with rtorrent with 2.6.19.
I tried recent git with Peter Zijlstra's patch
http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
fixed.

Please CC me as I am not subscribed to lkml.

Andrei