Re: high number of dropped packets/rx_missed_errors from 4.17 kernel

2020-12-03 Thread Andrei Popa
Hi,

I’ve applied your patch on kernel 4.17.0 and dropped packets and 
rx_missed_errors are still present, though they are increasing at a lower rate.

root@shaper:~# ./test
 rx_missed_errors: 2135
RX errors 0  dropped 2155  overruns 0  frame 0
sleeping 60 seconds
 rx_missed_errors: 2433
RX errors 0  dropped 2459  overruns 0  frame 0
sleeping 60 seconds
 rx_missed_errors: 2433
RX errors 0  dropped 2465  overruns 0  frame 0
sleeping 60 seconds
 rx_missed_errors: 2526
RX errors 0  dropped 2564  overruns 0  frame 0
sleeping 60 seconds
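
To put a number on "a lower rate", the counter samples can be turned into per-second growth. The sketch below is only an illustration: the function name is made up, the patched samples are the ones printed above, the unpatched samples are the first three from the original vanilla 4.17 report (taken with "sleep 1"), and it assumes the sleep intervals are exact.

```python
import re

def missed_error_rates(log_text, interval_s):
    """Per-second growth of rx_missed_errors between consecutive samples."""
    samples = [int(n) for n in re.findall(r"rx_missed_errors:\s*(\d+)", log_text)]
    return [(later - earlier) / interval_s
            for earlier, later in zip(samples, samples[1:])]

# Samples from the patched 4.17.0 run above (60 second interval).
patched = """
 rx_missed_errors: 2135
 rx_missed_errors: 2433
 rx_missed_errors: 2433
 rx_missed_errors: 2526
"""

# First three samples from the original vanilla 4.17 report (1 second interval).
unpatched = """
 rx_missed_errors: 2418845
 rx_missed_errors: 2426175
 rx_missed_errors: 2431910
"""

print(missed_error_rates(patched, 60))
print(missed_error_rates(unpatched, 1))
```

With these samples the patched run grows by about 5.0, 0 and 1.6 missed packets per second, against 7330 and 5735 per second in the original report.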


> On 3 Dec 2020, at 21:43, Andrei Popa  wrote:
> 
> Hi,
> 
> On what kernel version should I try the patch ? I tried on 5.9 and it doesn't 
> build.

Re: high number of dropped packets/rx_missed_errors from 4.17 kernel

2020-12-03 Thread Andrei Popa
Hi,

On what kernel version should I try the patch? I tried on 5.9 and it doesn't
build.

> On 18 Nov 2020, at 20:47, Rafael J. Wysocki  wrote:
> 
> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>> Hello,
>>> 
>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic 
>>> we experience, on a  number of servers, a very high number of 
>>> rx_missed_errors and dropped packets only on the uplink 10G interface. We 
>>> have another 10G downlink interface with no problems.
>>> 
>>> The affected servers have the following mainboards:
>>> S5520HC ver E26045-455
>>> S5520UR ver E22554-751
>>> S5520UR ver E22554-753
>>> S5000VSA
>>> 
>>> On other 30 servers with similar mainboards and/or configs there are no 
>>> dropped packets with vmlinuz-5.4.0-37-generic.
>>> 
>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>> 
>>> root@shaper:~# cat test
>>> #!/bin/bash
>>> while true
>>> do
>>> ethtool -S ens6f1|grep "missed_errors"
>>> ifconfig ens6f1|grep RX|grep dropped
>>> sleep 1
>>> done
>>> 
>>> root@shaper:~# ./test
>>>  rx_missed_errors: 2418845
>>> RX errors 0  dropped 241  overruns 0  frame 0
>>>  rx_missed_errors: 2426175
>>> RX errors 0  dropped 2426218  overruns 0  frame 0
>>>  rx_missed_errors: 2431910
>>> RX errors 0  dropped 2431953  overruns 0  frame 0
>>>  rx_missed_errors: 2437266
>>> RX errors 0  dropped 2437309  overruns 0  frame 0
>>>  rx_missed_errors: 2443305
>>> RX errors 0  dropped 2443348  overruns 0  frame 0
>>>  rx_missed_errors: 2448357
>>> RX errors 0  dropped 2448400  overruns 0  frame 0
>>>  rx_missed_errors: 2452539
>>> RX errors 0  dropped 2452582  overruns 0  frame 0
>>> 
>>> We did a git bisect and we’ve found that the following commit generates the 
>>> high number of dropped packets:
>>> 
>>> Author: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Date:   Thu Apr 5 19:12:43 2018 +0200
>>> cpuidle: menu: Avoid selecting shallow states with stopped tick
>>> If the scheduler tick has been stopped already and the governor
>>> selects a shallow idle state, the CPU can spend a long time in that
>>> state if the selection is based on an inaccurate prediction of idle
>>> time.  That effect turns out to be relevant, so it needs to be
>>> mitigated.
>>> To that end, modify the menu governor to discard the result of the
>>> idle time prediction if the tick is stopped and the predicted idle
>>> time is less than the tick period length, unless the tick timer is
>>> going to expire soon.
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
>>> Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
>>> diff --git a/drivers/cpuidle/governors/menu.c 
>>> b/drivers/cpuidle/governors/menu.c
>>> index 267982e471e0..1bfe03ceb236 100644
>>> --- a/drivers/cpuidle/governors/menu.c
>>> +++ b/drivers/cpuidle/governors/menu.c
>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, 
>>> struct cpuidle_device *dev,
>>>  */
>>> data->predicted_us = min(data->predicted_us, expected_interval);
>>> -   /*
>>> -* Use the performance multiplier and the user-configurable
>>> -* latency_req to determine the maximum exit latency.
>>> -*/
>>> -   interactivity_req = data->predicted_us / 
>>> performance_multiplier(nr_iowaiters, cpu_load);
>>> -   if (latency_req > interactivity_req)
>>> -   latency_req = interactivity_req;
>> 
>> The tick_nohz_tick_stopped() check may be done after the above and it 
>> may be reworked a bit.
>> 
>> I'll send a test patch to you shortly.
> 
> The patch is appended, but please note that it has been rebased by hand and
> not tested.
> 
> Please let me know if it makes any difference.
> 
> And in the future please avoid pasting the entire kernel config to your
> reports, that's problematic.
> 
> ---
> dri
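
The rule described in the commit message quoted above can be modelled roughly as follows. This is a sketch of one reading of that description, not the kernel code: the function and parameter names are hypothetical, and the 4000 us tick period is an assumption (it corresponds to HZ=250 and depends on the kernel configuration).

```python
TICK_USEC = 4000  # assumed tick period (HZ=250); varies with kernel config

def adjust_prediction(predicted_us, tick_stopped, next_timer_us,
                      tick_usec=TICK_USEC):
    """Illustrative model of the commit message, not the menu governor itself.

    predicted_us  -- idle time predicted by the governor, in microseconds
    tick_stopped  -- True if the scheduler tick has already been stopped
    next_timer_us -- time until the next timer event, in microseconds
    """
    if tick_stopped and predicted_us < tick_usec:
        # Discard the short prediction. If a timer is due sooner than one
        # tick period, that nearer deadline still bounds the result.
        return min(tick_usec, next_timer_us)
    return predicted_us

print(adjust_prediction(200, True, 10_000))   # short prediction discarded
print(adjust_prediction(200, True, 500))      # timer expires soon
print(adjust_prediction(200, False, 10_000))  # tick running, unchanged
```

A larger predicted idle time leaves the governor free to pick a deeper idle state, which has a higher exit latency.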

Re: [BUG] ethX misnumbered and one missing in mii-tool

2007-03-30 Thread Andrei Popa
On Fri, 2007-03-30 at 12:35 -0400, Lennart Sorensen wrote:
> On Fri, Mar 30, 2007 at 10:42:23AM +0300, Andrei Popa wrote:
> > ethtool reports the same
> 
> Is udev running and having fun renumbering interfaces as they are being
> detected in order to keep "consistent" interface names?

yes, it's udev's fault:

zeus rules.d # cat 70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, probably run by the persistent-net-generator.rules rules
file.
#
# You can modify it, as long as you keep each rule on a single line.

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:55",
NAME="eth1"

# PCI device 0x8086:0x1026 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:ba:a8:50",
NAME="eth2"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:54",
NAME="eth0"

# PCI device 0x8086:0x1027 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:5f:84:84",
NAME="eth3"

# PCI device 0x1148:0x4320 (skge)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0c:46:46:7c:7f",
NAME="eth4"

# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:09",
NAME="eth5"

# PCI device 0x8086:0x105e (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:08",
NAME="eth6"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:69",
NAME="eth7"

# PCI device 0x8086:0x1096 (e1000)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:68",
NAME="eth8"

thanks for pointing this out.
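
For reference, the renaming can be reproduced by matching the MAC addresses printed by e1000_probe in dmesg against the rules above. The sketch below is only an illustration: the helper name is made up, the rules are copied from the file above with each rule rejoined onto a single line, and the MAC list is in the probe order reported by the 2.6.20.4 kernel (eth0 to eth4).

```python
import re

# Rules copied from 70-persistent-net.rules, one rule per line.
RULES = '''
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:55", NAME="eth1"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:ba:a8:50", NAME="eth2"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:54", NAME="eth0"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0e:0c:5f:84:84", NAME="eth3"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0c:46:46:7c:7f", NAME="eth4"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:09", NAME="eth5"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:21:0c:08", NAME="eth6"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:69", NAME="eth7"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:15:17:17:b7:68", NAME="eth8"
'''

RULE_RE = re.compile(r'ATTRS\{address\}=="([0-9a-f:]+)",\s*NAME="(\w+)"')

def names_by_mac(rules_text):
    """Map each MAC address in the rules text to the name udev assigns."""
    return dict(RULE_RE.findall(rules_text))

# MAC addresses in the order e1000_probe reported them (kernel eth0..eth4).
PROBED_MACS = [
    "00:15:17:21:0c:08",
    "00:15:17:21:0c:09",
    "00:15:17:17:b7:68",
    "00:15:17:17:b7:69",
    "00:0e:0c:ba:a8:50",
]

mapping = names_by_mac(RULES)
for kernel_index, mac in enumerate(PROBED_MACS):
    print(f"eth{kernel_index} ({mac}) -> {mapping.get(mac, 'no rule')}")
```

The five cards end up as eth6, eth5, eth8, eth7 and eth2, which matches the eth2/eth5/eth6/eth7 listing from mii-tool apart from eth8. One possible explanation for the missing eth8, not confirmed in this thread, is that mii-tool without arguments only scans a limited range of ethN names.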

> 
> --
> Len Sorensen

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] ethX misnumbered and one missing in mii-tool

2007-03-30 Thread Andrei Popa
On Thu, 2007-03-29 at 21:21 -0700, Jesse Brandeburg wrote:
> added netdev.
> 
> On 3/29/07, Andrei Popa <[EMAIL PROTECTED]> wrote:
> > In a dual core 2 server with an intel motherboard and 5 network
> > cards(two onboard) and 1 pci express card with two slots and one pci-x
> > pci64 card the kernel sees all of them in dmesg but in mii-tool are
> > misnumbered and one card is missing.
> > (please CC as I am not subscribed to lkml)
> 
> please don't use mii-tool, ethtool is a much better option and
> actually works with gigabit cards.

ethtool reports the same

> 
> > from dmesg:
> > Intel(R) PRO/1000 Network Driver - version 7.0.33-k2-NAPI
> > Copyright (c) 1999-2005 Intel Corporation.
> > ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> > PCI: Setting latency timer of device 0000:03:00.0 to 64
> > e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
> > 00:15:17:21:0c:08
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> 
> eth0...
> 
> > ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
> > PCI: Setting latency timer of device 0000:03:00.1 to 64
> > e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
> > 00:15:17:21:0c:09
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> eth0...
> 
> > ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
> > PCI: Setting latency timer of device 0000:05:00.0 to 64
> > e1000: 0000:05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
> > 00:15:17:17:b7:68
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> eth0...
> > GSI 20 sharing vector 0xC9 and IRQ 20
> > ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 19 (level, low) -> IRQ 20
> > PCI: Setting latency timer of device 0000:05:00.1 to 64
> > e1000: 0000:05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
> > 00:15:17:17:b7:69
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> eth0...
> 
> > GSI 21 sharing vector 0xD1 and IRQ 21
> > ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 27 (level, low) -> IRQ 21
> > e1000: 0000:06:02.0: e1000_probe: (PCI-X:100MHz:64-bit)
> > 00:0e:0c:ba:a8:50
> > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> eth0...
> 
> um, I'm a little confused why every interface was named eth0 when it
> tried to come up.



> you didn't mention what kernel you're using.

this was kernel 2.6.17.14 and the driver was compiled as a module.


with kernel 2.6.20.4 (and built-in e1000 driver):
zeus ~ # uname -a
Linux zeus 2.6.20.4-zeus3 #3 SMP Wed Mar 28 13:44:50 EEST 2007 x86_64
Intel(R) Xeon(TM) CPU 3.00GHz GenuineIntel GNU/Linux

the devices are recognized ok as eth0, eth1, eth2, eth3, eth4 but are misnumbered
and one is missing in mii-tool/ethtool

Intel(R) PRO/1000 Network Driver - version 7.3.15-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:03:00.0 to 64
e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:21:0c:08
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:03:00.1 to 64
e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:21:0c:09
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:05:00.0 to 64
e1000: 0000:05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:17:b7:68
e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:05:00.1 to 64
e1000: 0000:05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:17:b7:69
e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 27 (level, low) -> IRQ 27
e1000: 0000:06:02.0: e1000_probe: (PCI-X:100MHz:64-bit)
00:0e:0c:ba:a8:50
e1000: eth4: e1000_probe: Intel(R) PRO/1000 Network Connection

zeus ~ # mii-tool
eth2: no link
eth5: negotiated 100baseTx-FD, link ok
eth6: no link
eth7: no link
zeus ~ #

ethtool shows the same

> 
> you can enable MSI and not share interrupts on this platform, it will
> at least help your PCIe adapters.

Initially I enabled it, but I thought the problem came from there and
disabled it.

> 
> > zeus ~ # mii-tool
> > eth2: no link
> > eth5: negotiated 100baseTx-FD, link ok
> > eth6: no link
> > eth7: no link
> > zeus ~ #
> >
> > it sees only 4 cards that are misnumbered and one is missing.
> 
> what does 'ip link' or 'ifconfig -a' show?

zeus ~ # ip link
1: eth6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:15:17:21:0c:08 brd ff:ff:ff:ff:ff:ff
2: eth5: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen
1000
link/ether 00:15:17:21:0c:09 brd ff:ff:ff:ff:ff:ff
3: eth8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:15:17:17:b7:68 brd ff:ff:ff:ff:ff:ff
4: eth7: <BROADCAST,MULTICAST> mtu 1500

[BUG] ethX misnumbered and one missing in mii-tool

2007-03-29 Thread Andrei Popa
Hello,

In a dual core 2 server with an Intel motherboard and 5 network
cards (two onboard, 1 PCI Express card with two slots, and one PCI-X
pci64 card), the kernel sees all of them in dmesg, but in mii-tool they are
misnumbered and one card is missing.
(please CC as I am not subscribed to lkml)


from dmesg:
Intel(R) PRO/1000 Network Driver - version 7.0.33-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:03:00.0 to 64
e1000: 0000:03:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:21:0c:08
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:03:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:03:00.1 to 64
e1000: 0000:03:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:21:0c:09
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:05:00.0 to 64
e1000: 0000:05:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:17:b7:68
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 20 sharing vector 0xC9 and IRQ 20
ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 19 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:05:00.1 to 64
e1000: 0000:05:00.1: e1000_probe: (PCI Express:2.5Gb/s:Width x4)
00:15:17:17:b7:69
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
GSI 21 sharing vector 0xD1 and IRQ 21
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 27 (level, low) -> IRQ 21
e1000: 0000:06:02.0: e1000_probe: (PCI-X:100MHz:64-bit)
00:0e:0c:ba:a8:50
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection



zeus ~ # mii-tool
eth2: no link
eth5: negotiated 100baseTx-FD, link ok
eth6: no link
eth7: no link
zeus ~ #

it sees only 4 cards that are misnumbered and one is missing.


zeus ~ # lspci
00:00.0 Host bridge: Intel Corporation Server Memory Contoller Hub (rev
b1)
00:02.0 PCI bridge: Intel Corporation Server PCI Express x8 Port 2-3
(rev b1)
00:03.0 PCI bridge: Intel Corporation Server PCI Express x4 Port 3 (rev
b1)
00:08.0 System peripheral: Intel Corporation Server DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation Server Error Reporting Registers
(rev b1)
00:10.1 Host bridge: Intel Corporation Server Error Reporting Registers
(rev b1)
00:10.2 Host bridge: Intel Corporation Server Error Reporting Registers
(rev b1)
00:11.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation Server FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation Enterprise Southbridge UHCI
USB #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation Enterprise Southbridge UHCI
USB #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation Enterprise Southbridge UHCI
USB #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation Enterprise Southbridge UHCI
USB #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation Enterprise Southbridge EHCI
USB (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation Enterprise Southbridge LPC (rev
09)
00:1f.1 IDE interface: Intel Corporation Enterprise Southbridge PATA
(rev 09)
00:1f.2 SATA controller: Intel Corporation Enterprise Southbridge SATA
AHCI (rev 09)
00:1f.3 SMBus: Intel Corporation Enterprise Southbridge SMBus (rev 09)
01:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
Downstream Port E1 (rev 01)
02:01.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
Downstream Port E2 (rev 01)
02:02.0 PCI bridge: Intel Corporation Enterprise Southbridge PCI Express
Downstream Port E3 (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet
Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet
Controller (rev 06)
05:00.0 Ethernet controller: Intel Corporation PRO/1000 EB Network
Connection with I/O Acceleration (rev 01)
05:00.1 Ethernet controller: Intel Corporation PRO/1000 EB Network
Connection with I/O Acceleration (rev 01)
06:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet
Controller (rev 04)
09:0c.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)



#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.20.4-zeus3
# Fri Mar 30 23:07:23 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y

Re: [BUG] eth0 appers many times in /proc/interrupts after resume

2007-01-23 Thread Andrei Popa

It's ok, after 4 suspend/resume cycles, eth0 only appears one time.

On Sun, 2007-01-21 at 21:22 +0000, Frederik Deweerdt wrote:
> On Sun, Jan 21, 2007 at 09:17:41PM +0200, Andrei Popa wrote:
> > It's the 10th resume and in /proc/interrupts eth0 appers 10 times.
> > 
> Hi,
> 
> The e100_resume() function should be calling netif_device_detach and
> free_irq. Could you try the following (compile tested) patch?
> 
> Regards,
> Frederik
> 
> Signed-off-by: Frederik Deweerdt <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index 2fe0445..0c376e4 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -2671,6 +2671,7 @@ static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
>   del_timer_sync(&nic->watchdog);
>   netif_carrier_off(nic->netdev);
>  
> + netif_device_detach(netdev);
>   pci_save_state(pdev);
>  
>   if ((nic->flags & wol_magic) | e100_asf(nic)) {
> @@ -2682,6 +2683,7 @@ static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
>   }
>  
>   pci_disable_device(pdev);
> + free_irq(pdev->irq, netdev);
>   pci_set_power_state(pdev, PCI_D3hot);
>  
>   return 0;
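
A toy model of why the handler list grows may help here. This is only a sketch, not kernel code: the `IrqTable` class and `run_cycles` helper are hypothetical names, and it assumes the resume path registers the handler again on every cycle while the unpatched suspend path never frees it.

```python
from collections import defaultdict


class IrqTable:
    """Toy stand-in for per-IRQ handler lists (hypothetical, not kernel code)."""

    def __init__(self):
        self.actions = defaultdict(list)

    def request_irq(self, irq, name):
        # Every call appends one more entry, as a shared IRQ line allows.
        self.actions[irq].append(name)

    def free_irq(self, irq, name):
        # Remove a single matching entry, if one is present.
        if name in self.actions[irq]:
            self.actions[irq].remove(name)


def run_cycles(cycles, free_on_suspend):
    """Simulate suspend/resume cycles; return how often eth0 is listed on IRQ 22."""
    table = IrqTable()
    table.request_irq(22, "eth0")        # initial registration when the NIC comes up
    for _ in range(cycles):
        if free_on_suspend:              # the call the quoted patch adds to suspend
            table.free_irq(22, "eth0")
        table.request_irq(22, "eth0")    # assumed: resume registers the handler again
    return table.actions[22].count("eth0")


if __name__ == "__main__":
    print(run_cycles(9, free_on_suspend=False))
    print(run_cycles(9, free_on_suspend=True))
```

In this model the entry count grows by one per cycle unless suspend releases the handler first, which matches the behaviour reported before and after the patch.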

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[BUG] eth0 appers many times in /proc/interrupts after resume

2007-01-21 Thread Andrei Popa
Hello,

It's the 10th resume and in /proc/interrupts eth0 appears 10 times.

ierdnac ~ # cat /proc/interrupts
           CPU0       CPU1
  0:   19690962      21390   IO-APIC-edge      timer
  1:      34666          0   IO-APIC-edge      i8042
  8:         12          0   IO-APIC-edge      rtc
  9:     189109          0   IO-APIC-fasteoi   acpi
 12:    2467502      62285   IO-APIC-edge      i8042
 14:         40          0   IO-APIC-edge      ide0
 17:    1156971      14168   IO-APIC-fasteoi   uhci_hcd:usb5, [EMAIL PROTECTED]::00:02.0
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 20:          1      26290   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 21:     408192          0   IO-APIC-fasteoi   HDA Intel
 22:     249414       2543   IO-APIC-fasteoi   ohci1394, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0
223:     220668          0   PCI-MSI-edge      libata
NMI:          0          0
LOC:   19338002   19135738
ERR:          0
MIS:          0
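
For anyone wanting to check for the same symptom, here is a minimal sketch that counts how often a device name is listed per IRQ line. The function name `count_handlers` is hypothetical, and `SAMPLE` is a shortened excerpt of the output above; on a live system the text would come from reading /proc/interrupts.

```python
SAMPLE = """\
           CPU0       CPU1
 21:     408192          0   IO-APIC-fasteoi   HDA Intel
 22:     249414       2543   IO-APIC-fasteoi   ohci1394, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0, eth0
223:     220668          0   PCI-MSI-edge      libata
"""


def count_handlers(interrupts_text, device):
    """Return {irq_label: count} for lines where `device` is listed at least once."""
    counts = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header line has no "IRQ:" prefix
        # Handler names are comma-separated; the first field also carries the
        # counters and the interrupt type, so keep only its last token.
        names = [field.split()[-1] for field in rest.split(",") if field.split()]
        hits = names.count(device)
        if hits:
            counts[label.strip()] = hits
    return counts


if __name__ == "__main__":
    print(count_handlers(SAMPLE, "eth0"))
```

A count above one for the same device on the same IRQ is the duplicate registration described in this report.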


ierdnac ~ # lsmod
Module                  Size  Used by
snd_seq                47120  0
snd_seq_device          6860  1 snd_seq
snd_hda_intel          16344  4
snd_hda_codec         157568  1 snd_hda_intel
snd_pcm                68100  3 snd_hda_intel,snd_hda_codec
snd_timer              18884  3 snd_seq,snd_pcm
snd                    38776  12 snd_seq,snd_seq_device,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
snd_page_alloc          7880  2 snd_hda_intel,snd_pcm
usb_storage            33156  0
ohci1394               32176  0
ieee1394               82964  1 ohci1394
e100                   31368  0
uhci_hcd               21516  0
ehci_hcd               27596  0
usbcore               100948  3 usb_storage,uhci_hcd,ehci_hcd


from dmesg:
Restarting tasks ... done.
Suspend2 debugging info:
- Suspend core   : 2.2.9.1
- Kernel Version : 2.6.20-rc4
- Compiler vers. : 4.1
- Attempt number : 10
- Parameters : 0 81936 0 1 0 5
- Overall expected compression percentage: 0.
- Compressor is 'lzf'.
  Compressed 525217792 bytes into 449285477 (14 percent compression).
- SwapAllocator active.
  Swap available for image: 250982 pages.
- I/O speed: Write 43 MB/s, Read 44 MB/s.
- Extra pages: -99 used/500.
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c04bd000 soft=c04b5000

suspend2 maintainer:
"That is interesting! Unfortunately, I don't touch anything in that area.
Could I get you to send the message to the Linux kernel mailing list?

Regards,

Nigel"

ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc4 #0 SMP PREEMPT Wed Jan 10 18:34:14 EET 2007 i686 Genuine Intel(R) CPU   T2050  @ 1.60GHz GenuineIntel GNU/Linux






Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

2006-12-29 Thread Andrei Popa
On Fri, 2006-12-29 at 02:48 -0800, Linus Torvalds wrote:
> 
> On Fri, 29 Dec 2006, Linus Torvalds wrote:
> > 
> > Hmm? I'd love it if somebody else wrote the patch and tested it, because 
> > I'm getting sick and tired of this bug ;)
> 
> Who the hell am I kidding? I haven't been able to sleep right for the last 
> few days over this bug. It was really getting to me.
> 
> And putting on the thinking cap, there's actually a fairly simple and 
> nonintrusive patch. It still has a tiny tiny race (see the comment), but I 
> bet nobody can really hit it in real life anyway, and I know several ways 
> to fix it, so I'm not really _that_ worried about it.
> 
> The patch is mostly a comment. The "real" meat of it is actually just a 
> few lines.
> 
> Can anybody get corruption with this thing applied? It goes on top of 
> plain v2.6.20-rc2.

Tested with rtorrent and there is no corruption.


> 
>   Linus
> 
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b3a198c..ec01da1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -862,17 +862,46 @@ int clear_page_dirty_for_io(struct page *page)
>  {
>   struct address_space *mapping = page_mapping(page);
>  
> - if (!mapping)
> - return TestClearPageDirty(page);
> -
> - if (TestClearPageDirty(page)) {
> - if (mapping_cap_account_dirty(mapping)) {
> - page_mkclean(page);
> + if (mapping && mapping_cap_account_dirty(mapping)) {
> + /*
> +  * Yes, Virginia, this is indeed insane.
> +  *
> +  * We use this sequence to make sure that
> +  *  (a) we account for dirty stats properly
> +  *  (b) we tell the low-level filesystem to
> +  *  mark the whole page dirty if it was
> +  *  dirty in a pagetable. Only to then
> +  *  (c) clean the page again and return 1 to
> +  *  cause the writeback.
> +  *
> +  * This way we avoid all nasty races with the
> +  * dirty bit in multiple places and clearing
> +  * them concurrently from different threads.
> +  *
> +  * Note! Normally the "set_page_dirty(page)"
> +  * has no effect on the actual dirty bit - since
> +  * that will already usually be set. But we
> +  * need the side effects, and it can help us
> +  * avoid races.
> +  *
> +  * We basically use the page "master dirty bit"
> +  * as a serialization point for all the different
> +  * threads doing their things.
> +  *
> +  * FIXME! We still have a race here: if somebody
> +  * adds the page back to the page tables in
> +  * between the "page_mkclean()" and the "TestClearPageDirty()",
> +  * we might have it mapped without the dirty bit set.
> +  */
> + if (page_mkclean(page))
> + set_page_dirty(page);
> + if (TestClearPageDirty(page)) {
>   dec_zone_page_state(page, NR_FILE_DIRTY);
> + return 1;
>   }
> - return 1;
> + return 0;
>   }
> - return 0;
> + return TestClearPageDirty(page);
>  }
>  EXPORT_SYMBOL(clear_page_dirty_for_io);
>  
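
The order of operations in the accounted-mapping branch of the quoted patch can be followed in a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `Page`, `Stats` and `fetch_and_clear_dirty` are hypothetical stand-ins, and `set_page_dirty` here only flips a flag and a counter, whereas the real call also notifies the filesystem.

```python
class Page:
    """Toy page: one master dirty flag plus one dirty bit per page-table mapping."""

    def __init__(self, pte_dirty_bits, dirty=False):
        self.pte_dirty = list(pte_dirty_bits)
        self.dirty = dirty


class Stats:
    """Stand-in for the NR_FILE_DIRTY zone counter."""

    def __init__(self, nr_file_dirty=0):
        self.nr_file_dirty = nr_file_dirty


def page_mkclean(page):
    """Clear every PTE dirty bit and report whether any of them was set."""
    was_dirty = any(page.pte_dirty)
    page.pte_dirty = [False] * len(page.pte_dirty)
    return was_dirty


def set_page_dirty(page, stats):
    """Set the master dirty bit; account only on a clean-to-dirty transition."""
    if not page.dirty:
        page.dirty = True
        stats.nr_file_dirty += 1


def fetch_and_clear_dirty(page):
    """Model of TestClearPageDirty(): return the old flag and clear it."""
    was_dirty = page.dirty
    page.dirty = False
    return was_dirty


def clear_page_dirty_for_io(page, stats):
    """Same sequence as the patch: mkclean, re-dirty if needed, then test-and-clear."""
    if page_mkclean(page):
        set_page_dirty(page, stats)
    if fetch_and_clear_dirty(page):
        stats.nr_file_dirty -= 1
        return 1
    return 0


if __name__ == "__main__":
    page, stats = Page([True, False]), Stats()
    print(clear_page_dirty_for_io(page, stats), stats.nr_file_dirty)
```

In the model, a page that is dirty only in a page table still returns 1, so writeback is triggered, and the counter ends up balanced. The race named in the FIXME comment is not modelled.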



Re: [PATCH] mm: fix page_mkclean_one

2006-12-27 Thread Andrei Popa
I have corrupted files...

> ---
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 263f88e..4652ef1 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1653,19 +1653,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>   do {
>   if (!buffer_mapped(bh))
>   continue;
> - /*
> -  * If it's a fully non-blocking write attempt and we cannot
> -  * lock the buffer then redirty the page.  Note that this can
> -  * potentially cause a busy-wait loop from pdflush and kswapd
> -  * activity, but those code paths have their own higher-level
> -  * throttling.
> -  */
> - if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
> - lock_buffer(bh);
> - } else if (test_set_buffer_locked(bh)) {
> - redirty_page_for_writepage(wbc, page);
> - continue;
> - }
> + lock_buffer(bh);
>   if (test_clear_buffer_dirty(bh)) {
>   mark_buffer_async_write(bh);
>   } else {



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
> 

only these:
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think they have anything to do with it...

>   Linus



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
> 
> On Sun, 24 Dec 2006, Gordon Farquharson wrote:
> > 
> > The apt cache files (/var/cache/apt/*.bin) still get corrupted with
> > this patch and 2.6.19.
> 
> Yeah, if my guess about do_no_page() is right, _none_ of the previous 
> patches should have ANY effect what-so-ever. In fact, I'd say that even 
> the "ext3 works in writeback mode" thing that Andrei reports is probably a 
> total fluke brought on by timing changes rather than anything else.
> 
> So please try the latest patch instead (on top of anything that shows 
> corruption reliably - the patch should be _totally_ independent of all the 
> other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
> 2.6.19 too, so anything that shows corruption is a fine target - but try 
> to choose something that has been the "best" at corrupting things for you, 
> to make the testing as good as possible).
> 
> Patch included here again (although I think you were cc'd on my previous 
> email too, so you should already have it, and our emails just crossed)
> 
> And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
"safe_sync".

> 
>   Linus
> 
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
>   if (pte_none(*page_table)) {
>   flush_icache_page(vma, new_page);
>   entry = mk_pte(new_page, vma->vm_page_prot);
> - if (write_access)
> - entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> - set_pte_at(mm, address, page_table, entry);
>   if (anon) {
>   inc_mm_counter(mm, anon_rss);
>   lru_cache_add_active(new_page);
>   page_add_new_anon_rmap(new_page, vma, address);
> + if (write_access)
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   } else {
>   inc_mm_counter(mm, file_rss);
>   page_add_file_rmap(new_page);
> + entry = pte_wrprotect(entry);
>   if (write_access) {
>   dirty_page = new_page;
>   get_page(dirty_page);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>   }
>   }
> + set_pte_at(mm, address, page_table, entry);
>   } else {
>   /* One of our sibling threads was faster, back out. */
>   page_cache_release(new_page);
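
To make the effect of the quoted mm/memory.c hunk easier to see, here is a toy model of which flags end up on the new page-table entry. It is a sketch only: `build_pte` and its flag names are hypothetical, and it assumes that `vma->vm_page_prot` may already carry write permission for a writable mapping.

```python
def build_pte(anon, write_access, prot_writable, vma_writable, patched):
    """Return the set of flags on the new PTE in this toy model."""
    entry = {"present"}
    if prot_writable:
        entry.add("write")           # mk_pte(): permissions come from vm_page_prot

    def mkdirty_maybe_mkwrite(e):
        e.add("dirty")               # pte_mkdirty()
        if vma_writable:
            e.add("write")           # maybe_mkwrite(): only if the VMA allows writes
        return e

    if not patched:
        # Old order: one rule for anonymous and file-backed pages alike.
        if write_access:
            entry = mkdirty_maybe_mkwrite(entry)
        return entry

    if anon:
        if write_access:
            entry = mkdirty_maybe_mkwrite(entry)
    else:
        entry.discard("write")       # pte_wrprotect(): file pages start read-only
        if write_access:
            entry = mkdirty_maybe_mkwrite(entry)
    return entry


if __name__ == "__main__":
    for patched in (False, True):
        flags = build_pte(anon=False, write_access=False, prot_writable=True,
                          vma_writable=True, patched=patched)
        print(patched, sorted(flags))
```

Under that assumption, a read fault on a file-backed page yields a writable but clean entry before the patch and a read-only entry after it, so a later write has to fault and can be tracked. A write fault gives the same flags in both cases.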



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> On Sun, 24 Dec 2006 14:14:38 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > > - mount the fs with ext2 with the no-buffer-head option.  That means 
> > > either:
> > > 
> > >   grub.conf:  rootfstype=ext2 rootflags=nobh
> > >   /etc/fstab: ext2 nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext2 (rw,noatime,nobh)
> > 
> > I have corruption.
> > 
> > > 
> > > - mount the fs with ext3 data=writeback, nobh
> > > 
> > >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> > > works)
> > >   /etc/fstab: ext2 data=writeback,nobh
> > 
> > ierdnac ~ # mount
> > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > 
> > ierdnac ~ # dmesg|grep EXT3
> > EXT3-fs: mounted filesystem with writeback data mode.
> > EXT3 FS on sda7, internal journal
> > 
> > I don't have corruption. I tested twice.
> 
> This is a surprising result.  Can you please retest ext3 data=writeback,nobh?

Yes, no corruption. Also tested only with data=writeback and had no
corruption.



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote:
> On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> > On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> > Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > 
> > > I now _suspect_ that we're talking about something like
> > > 
> > >  - we started a writeout. The IO is still pending, and the page was 
> > >marked clean and is now in the "writeback" phase.
> > >  - a write happens to the page, and the page gets marked dirty again. 
> > >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> > >but they were actually already dirty, because the IO hasn't completed 
> > >yet.
> > >  - the IO from the _previous_ write completes, and marks the buffers 
> > > clean 
> > >again.
> > 
> > Some things for the testers to try, please:
> > 
> > - mount the fs with ext2 with the no-buffer-head option.  That means either:
> > 
> >   grub.conf:  rootfstype=ext2 rootflags=nobh
> >   /etc/fstab: ext2 nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext2 (rw,noatime,nobh)
> 
> I have corruption.
> 
> > 
> > - mount the fs with ext3 data=writeback, nobh
> > 
> >   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> > works)
> >   /etc/fstab: ext2 data=writeback,nobh
> 
> ierdnac ~ # mount
> /dev/sda7 on / type ext3 (rw,noatime,nobh)
> 
> ierdnac ~ # dmesg|grep EXT3
> EXT3-fs: mounted filesystem with writeback data mode.
> EXT3 FS on sda7, internal journal
> 
> I don't have corruption. I tested twice.
> 

I also tested with ext3 ordered, nobh  and I have file corruption...

> > 
> > if that still fails we can rule out buffer_head funnies.
> > 



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: 
> On Sun, 24 Dec 2006 00:43:54 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > I now _suspect_ that we're talking about something like
> > 
> >  - we started a writeout. The IO is still pending, and the page was 
> >marked clean and is now in the "writeback" phase.
> >  - a write happens to the page, and the page gets marked dirty again. 
> >Marking the page dirty also marks all the _buffers_ in the page dirty, 
> >but they were actually already dirty, because the IO hasn't completed 
> >yet.
> >  - the IO from the _previous_ write completes, and marks the buffers clean 
> >again.
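
The three steps suspected in the quoted text above can be walked through in a tiny simulation. This is a sketch of the suspicion only, not of verified kernel behaviour: the `simulate` function and its flag are hypothetical, and the page content is reduced to a single string.

```python
def simulate(completion_marks_buffer_clean):
    """Return True if the latest data reaches disk in this toy model."""
    memory, disk = "v1", None
    page_dirty, buffer_dirty = True, True

    # Step 1: writeout starts. The page is marked clean and the IO for "v1"
    # is in flight; the buffer still counts as dirty until the IO completes.
    in_flight = memory
    page_dirty = False

    # Step 2: a new write dirties the page again. Marking the buffer dirty
    # changes nothing, because it was still dirty.
    memory = "v2"
    page_dirty = True
    buffer_dirty = True

    # Step 3: the previous IO completes and (in the suspected scenario)
    # marks the buffer clean, although it now holds newer data.
    disk = in_flight
    if completion_marks_buffer_clean:
        buffer_dirty = False

    # A later writeback pass submits only dirty buffers of a dirty page.
    if page_dirty and buffer_dirty:
        disk = memory
    return disk == memory


if __name__ == "__main__":
    print(simulate(True), simulate(False))
```

With the completion clearing the buffer's dirty state, the model leaves "v1" on disk while memory holds "v2", which is the kind of lost write being hunted in this thread.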
> 
> Some things for the testers to try, please:
> 
> - mount the fs with ext2 with the no-buffer-head option.  That means either:
> 
>   grub.conf:  rootfstype=ext2 rootflags=nobh
>   /etc/fstab: ext2 nobh

ierdnac ~ # mount
/dev/sda7 on / type ext2 (rw,noatime,nobh)

I have corruption.

> 
> - mount the fs with ext3 data=writeback, nobh
> 
>   grub.conf:  rootfstype=ext3 rootflags=nobh,data=writeback  (I hope this 
> works)
>   /etc/fstab: ext2 data=writeback,nobh

ierdnac ~ # mount
/dev/sda7 on / type ext3 (rw,noatime,nobh)

ierdnac ~ # dmesg|grep EXT3
EXT3-fs: mounted filesystem with writeback data mode.
EXT3 FS on sda7, internal journal

I don't have corruption. I tested twice.

> 
> if that still fails we can rule out buffer_head funnies.
> 



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote:
 
 On Sun, 24 Dec 2006, Gordon Farquharson wrote:
  
  The apt cache files (/var/cache/apt/*.bin) still get corrupted with
  this patch and 2.6.19.
 
 Yeah, if my guess about do_no_page() is right, _none_ of the previous 
 patches should have ANY effect what-so-ever. In fact, I'd say that even 
 the ext3 works in writeback mode thing that Andrei reports is probably a 
 total fluke brought on by timing changes rather than anything else.
 
 So please try the latest patch instead (on top of anything that shows 
 corruption reliably - the patch should be _totally_ independent of all the 
 other issues, and I think it will apply cleanly on top of 2.6.18.3 and 
 2.6.19 too, so anything that shows corruption is a fine target - but try 
 to choose something that has been the best at corrupting things for you, 
 to make the testing as good as possible).
 
 Patch included here again (although I think you were cc'd on my previous 
 email too, so you should already have it, and our emails just crossed)
 
 And if this doesn't fix it, I don't know what will..

With latest git and patches:
http://lkml.org/lkml/diff/2006/12/24/56/1
http://lkml.org/lkml/diff/2006/12/24/61/1

Hash check on download completion found bad chunks, consider using
safe_sync.

 
> 		Linus
> 
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..cf429c4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2247,21 +2249,23 @@ retry:
>  	if (pte_none(*page_table)) {
>  		flush_icache_page(vma, new_page);
>  		entry = mk_pte(new_page, vma->vm_page_prot);
> -		if (write_access)
> -			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -		set_pte_at(mm, address, page_table, entry);
>  		if (anon) {
>  			inc_mm_counter(mm, anon_rss);
>  			lru_cache_add_active(new_page);
>  			page_add_new_anon_rmap(new_page, vma, address);
> +			if (write_access)
> +				entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  		} else {
>  			inc_mm_counter(mm, file_rss);
>  			page_add_file_rmap(new_page);
> +			entry = pte_wrprotect(entry);
>  			if (write_access) {
>  				dirty_page = new_page;
>  				get_page(dirty_page);
> +				entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  			}
>  		}
> +		set_pte_at(mm, address, page_table, entry);
>  	} else {
>  		/* One of our sibling threads was faster, back out. */
>  		page_cache_release(new_page);

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote:
 
> On Sun, 24 Dec 2006, Andrei Popa wrote:
> > 
> > Hash check on download completion found bad chunks, consider using
> > "safe_sync".
> 
> Dang. Did you get any warning messages from the kernel?
 

only these:
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80
ACPI: EC: evaluating _Q80

but I don't think they have anything to do with it...

> 		Linus



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote:
> * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]:
> > With all three patches I have corruption
> 
> I've completed one installation with Linus' patch plus the two from
> Andrew successfully, but I'm currently trying again... but I really
> need a better testcase since an installation takes about an hour.
> Andrei, which torrent do you download as a testcase?  It would be good
> if someone could suggest a torrent which is legal and not too large.
It's a 1.4GB torrent split into 84 rar files, and there are many
seeders. I download at ~5MB/sec. The torrent is private.



Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa
With all three patches I have corruption


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..4f4cd13 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 9d774d0..8879f1d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -61,31 +61,6 @@ ({   
\
 })
 #endif
 
-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_DIRTY
-#define ptep_test_and_clear_dirty(__vma, __address, __ptep)\
-({ \
-   pte_t __pte = *__ptep;  \
-   int r = 1;  \
-   if (!pte_dirty(__pte))  \
-   r = 0;  \
-   else\
-   set_pte_at((__vma)->vm_mm, (__address), (__ptep),   \
-  pte_mkclean(__pte)); \
-   r;  \
-})
-#endif
-
-#ifndef __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(__vma, __address, __ptep)   \
-({ \
-   int __dirty;\
-   __dirty = ptep_test_and_clear_dirty(__vma, __address, __ptep);  \
-   if (__dirty)\
-   flush_tlb_page(__vma, __address);   \
-   __dirty;\
-})
-#endif
-
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 #define ptep_get_and_clear(__mm, __address, __ptep)\
 ({ \
diff --git a/include/asm-i386/pgtable.h b/include/asm-i386/pgtable.h
index e6a4723..b61d6f9 100644
--- a/include/asm-i386/pgtable.h
+++ b/include/asm-i386/pgtable.h
@@ -300,18 +300,20 @@ do {  
\
flush_tlb_page(vma, address);   \
 } while (0)
 
-#define __HAVE_ARCH_PTEP_CLEAR_DIRTY_FLUSH
-#define ptep_clear_flush_dirty(vma, address, ptep) \
-({ \
-   int __dirty;\
-   __dirty = pte_dirty(*(ptep));   \
-   if (__dirty) {  \
-   clear_bit(_PAGE_BIT_DIRTY, &(ptep)->pte_low);   \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
-   flush_tlb_page(vma, address);   \
-   }   \
-   __dirty;  

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-21 Thread Andrei Popa
On Wed, 2006-12-20 at 16:24 -0800, Linus Torvalds wrote:
> 
> Btw, I'd really love to hear whether the patch I sent out actually _helps_ 
> at all, or whether we're just discussing something that in the end is just 
> a cleanup..
> 
> Martin, Peter, Andrei, pls give it a try. (Martin and Andrei may be 
> talking about different bugs, so _both_ of your experiences definitely 
> matter here).

with http://lkml.org/lkml/diff/2006/12/20/204/1
I have corruption: Hash check on download completion found bad chunks,
consider using "safe_sync".

> 
>   Linus



Re: 2.6.19 file content corruption on ext3

2006-12-20 Thread Andrei Popa
On Wed, 2006-12-20 at 15:23 +0100, Peter Zijlstra wrote:
> On Wed, 2006-12-20 at 16:15 +0200, Andrei Popa wrote:
> > On Wed, 2006-12-20 at 00:42 +0100, Peter Zijlstra wrote:
> > > On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> > > 
> > > > OR:
> > > > 
> > > >  - page_mkclean_one() is simply buggy.
> > > 
> > > GOLD!
> > > 
> > > it seems to work with all this (full diff against current git).
> > > 
> > > /me rebuilds full kernel to make sure...
> > > reboot...
> > > test...  pff the tension...
> > > yay, still good!
> > > 
> > > Andrei; would you please verify.
> > 
> > I have corrupted files.
> 
> drad; and with this patch:
>   http://lkml.org/lkml/2006/12/20/112

Hash check on download completion found bad chunks, consider using
"safe_sync".

> 
> /me goes rebuild his kernel and try more than 3 times
> 



Re: 2.6.19 file content corruption on ext3

2006-12-20 Thread Andrei Popa
On Wed, 2006-12-20 at 00:42 +0100, Peter Zijlstra wrote:
> On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote:
> 
> > OR:
> > 
> >  - page_mkclean_one() is simply buggy.
> 
> GOLD!
> 
> it seems to work with all this (full diff against current git).
> 
> /me rebuilds full kernel to make sure...
> reboot...
> test...  pff the tension...
> yay, still good!
> 
> Andrei; would you please verify.

I have corrupted files.

> The magic seems to be in the extra tlb flush after clearing the dirty
> bit. Just too bad ptep_clear_flush_dirty() needs ptep not entry.
> 
> diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
> index 5e7cd45..2b8893b 100644
> --- a/drivers/connector/connector.c
> +++ b/drivers/connector/connector.c
> @@ -135,8 +135,7 @@ static int cn_call_callback(struct cn_msg *msg, void 
> (*destruct_data)(void *), v
>   spin_lock_bh(&dev->cbdev->queue_lock);
>   list_for_each_entry(__cbq, &dev->cbdev->queue_list, callback_entry) {
>   if (cn_cb_equal(&__cbq->id.id, &msg->id)) {
> - if (likely(!test_bit(WORK_STRUCT_PENDING,
> -  &__cbq->work.work.management) &&
> + if (likely(!delayed_work_pending(&__cbq->work) &&
>   __cbq->data.ddata == NULL)) {
>   __cbq->data.callback_priv = msg;
>  
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>   int ret = 0;
>  
>   BUG_ON(!PageLocked(page));
> - if (PageWriteback(page))
> + if (PageDirty(page) || PageWriteback(page))
>   return 0;
>  
>   if (mapping == NULL) {  /* can this still happen? */
> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>   spin_lock(&mapping->private_lock);
>   ret = drop_buffers(page, &buffers_to_free);
>   spin_unlock(&mapping->private_lock);
> - if (ret) {
> - /*
> -  * If the filesystem writes its buffers by hand (eg ext3)
> -  * then we can have clean buffers against a dirty page.  We
> -  * clean the page here; otherwise later reattachment of buffers
> -  * could encounter a non-uptodate page, which is unresolvable.
> -  * This only applies in the rare case where try_to_free_buffers
> -  * succeeds but the page is not freed.
> -  *
> -  * Also, during truncate, discard_buffer will have marked all
> -  * the page's buffers clean.  We discover that here and clean
> -  * the page also.
> -  */
> - if (test_clear_page_dirty(page))
> - task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> - }
>  out:
>   if (buffers_to_free) {
>   struct buffer_head *bh = buffers_to_free;
> diff --git a/mm/memory.c b/mm/memory.c
> index c00bac6..60e0945 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
>  }
>  EXPORT_SYMBOL(unmap_mapping_range);
>  
> +static void check_last_page(struct address_space *mapping, loff_t size)
> +{
> + pgoff_t index;
> + unsigned int offset;
> + struct page *page;
> +
> + if (!mapping)
> + return;
> + offset = size & ~PAGE_MASK;
> + if (!offset)
> + return;
> + index = size >> PAGE_SHIFT;
> + page = find_lock_page(mapping, index);
> + if (page) {
> + unsigned int check = 0;
> + unsigned char *kaddr = kmap_atomic(page, KM_USER0);
> + do {
> + check += kaddr[offset++];
> + } while (offset < PAGE_SIZE);
> + kunmap_atomic(kaddr, KM_USER0);
> + unlock_page(page);
> + page_cache_release(page);
> + if (check)
> + printk(KERN_ERR "%s: BADNESS: truncate check %u\n", 
> current->comm, check);
> + }
> +}
> +
>  /**
>   * vmtruncate - unmap mappings "freed" by truncate() syscall
>   * @inode: inode of the file used
> @@ -1875,6 +1902,7 @@ do_expand:
>   goto out_sig;
>   if (offset > inode->i_sb->s_maxbytes)
>   goto out_big;
> + check_last_page(mapping, inode->i_size);
>   i_size_write(inode, offset);
>  
>  out_truncate:
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 237107c..f561e72 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -957,7 +957,7 @@ int test_set_page_writeback(struct page *page)
>  EXPORT_SYMBOL(test_set_page_writeback);
>  
>  /*
> - * Return true if any of the pages in the mapping are marged with the
> + * Return true if any of the pages in the mapping are marked with the
>   * passed tag.
>   */
>  int mapping_tagged(struct address_space *mapping, int tag)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index d8a842a..900229a 100644
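> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -432,7 +432,7 @@ static int page_mkclean_one(struct page *page, struct vm_area_struct *vma)
>  {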
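
The offset and index arithmetic in the quoted check_last_page() debug helper can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: it assumes a 4 KiB page, and the names SKETCH_PAGE_SHIFT, tail_offset(), tail_index() and tail_checksum() are made up for illustration. The kernel's "size & ~PAGE_MASK" is the same as "size & (PAGE_SIZE - 1)", which is what the sketch uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on i386. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/* Offset of end-of-file inside its last page; 0 if the size is page aligned. */
static unsigned int tail_offset(uint64_t size)
{
	return (unsigned int)(size & (SKETCH_PAGE_SIZE - 1));
}

/* Index of the page that contains end-of-file. */
static uint64_t tail_index(uint64_t size)
{
	return size >> SKETCH_PAGE_SHIFT;
}

/*
 * Sum every byte from 'offset' to the end of the page. A nonzero result
 * means the tail of the page past end-of-file is not zero-filled, which
 * is the condition the quoted helper reports as "BADNESS".
 */
static unsigned int tail_checksum(const unsigned char *page, unsigned int offset)
{
	unsigned int check = 0;

	assert(offset <= SKETCH_PAGE_SIZE);
	while (offset < SKETCH_PAGE_SIZE)
		check += page[offset++];
	return check;
}
```

For a 5000-byte file, tail_index() selects page 1 and tail_offset() gives 904, so only bytes 904 through 4095 of that page are summed. Like the quoted helper, a caller would skip the check entirely when tail_offset() returns 0.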

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrei Popa
> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.
> > 
I have file corruption.



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

> > > If all of test_clear_page_dirty() has been commented out then the page
> > > will never become clean hence will never fall out of pagecache, so unless
> > > Andrei is doing a reboot before checking for corruption, perhaps the
> > > underlying data on-disk is incorrect, but we can't see it.
> > 
> > if I do a sync and echo 1 > /proc/sys/vm/drop_caches
> 
> OK, that works.
> 
> > is the reboot still necessary?
> 
> It might be necessary to reboot in this case - if we're leaving the
> pagecache dirty, writing to drop_caches won't remove it.  And you probably
> won't be able to get a clean reboot either.
> 
> > > 
> > > Andrei, how _are_ you running this test?  What's the exact sequence of
> > > steps?
> > > 
> > > In particular, are you doing anything which would cause the corrupted file
> > > to be evicted from memory, thus forcing a read from disk?  Such as
> > > unmounting and then remounting the filesystem?
> > 
> > I boot linux, I start rtorrent and start the download, while it's
> > downloading I start evolution and I check my mail (my mbox is very large,
> > several hundred megabytes), I close evolution (I use evolution just to
> > have another application which uses the filesystem and the memory), I
> > start evolution again. I start firefox. The download is complete.
> > Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
> > test that all 84 downloaded rar files are ok and see the result.
> > 
> > > 
> > > The point of my question is to check that the data is really incorrect
> > > on-disk, or whether it is incorrect in pagecache.

I rebooted and the files are still broken after the reboot (tested twice),
so the data is incorrect on disk.

> > > 
> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.
> > 
> > I will test.

Will test in a couple of hours, I have some work to do...

> 
> ok, thanks.
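
The on-disk versus pagecache question discussed above can also be probed per file, without a reboot. The following is a rough userspace sketch under stated assumptions, not something anyone in the thread ran: file_checksum() and flush_and_evict() are hypothetical helper names, the checksum is plain FNV-1a rather than the torrent hash, and posix_fadvise(POSIX_FADV_DONTNEED) is only advisory, so the kernel is free to keep the pages cached. As noted above for drop_caches, pages that are still dirty will not be evicted either, so a reboot remains the stronger test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* FNV-1a checksum of a file's contents. Returns 0 on success, -1 on error. */
static int file_checksum(const char *path, uint64_t *out)
{
	unsigned char buf[4096];
	uint64_t h = 1469598103934665603ULL;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++) {
			h ^= buf[i];
			h *= 1099511628211ULL;
		}
	}
	close(fd);
	if (n < 0)
		return -1;
	*out = h;
	return 0;
}

/*
 * Write the file's dirty pages back, then ask the kernel to drop its
 * cached pages so that the next read has a chance of hitting the disk.
 * Returns 0 on success, -1 on error.
 */
static int flush_and_evict(const char *path)
{
	int rc;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	/* posix_fadvise() returns an error number directly, not -1/errno. */
	if (rc == 0 && posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
		rc = -1;
	close(fd);
	return rc;
}
```

The intended use is to take file_checksum() right after the download, call flush_and_evict(), and checksum again. If the two values differ, the cached copy and the on-disk copy disagreed; if they match, nothing is proven, because the pages may simply not have been evicted.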



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote:
> 
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > > 
> > > > nope, no file corruption at all.
> > > 
> > > Ok. That's interesting, but I think you actually #ifdef'ed out too 
> > > much:
> > > 
> > > It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> > > statement that I meant you should remove.
> > > 
> > > Can you try that too?
> > 
> > I have file corruption: "Hash check on download completion found bad
> > chunks, consider using "safe_sync"."
> 
> Ok, that's interesting.
> 
> So it doesn't seem to be the call to page_mkclean() itself that causes 
> corruption. It looks like Peter's hunch that maybe there's some bug in 
> PG_dirty handling _itself_ might be an idea..
> 
> And the reason it only started happening now is that it may just have been 
> _hidden_ by the fact that while we kept the dirty bits in the page tables, 
> we'd end up writing the dirty page _despite_ having lost the PG_dirty bit. 
> So if it's some bad interaction between writable mappings and some other 
> part of the system, we just didn't see it earlier, exactly because we had 
> _lots_ of dirty bits, and it was enough that _one_ of them was right.
> 
> If you didn't see corruption when you #ifdef'ed out too much of the 
> "test_clear_page_dirty()" function (the _whole_ TestClearPageDirty() 
> if-statement), but you get it when you just comment out the stuff that 
> does the page_mkclean(), that's interesting.
> 
> I'm left looking at the "radix_tree_tag_clear()" in 
> test_clear_page_dirty().
> 
> What happens if you only ifdef out that single thing? 

I have file corruption.

> 
> The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> bit _after_ the page has been marked for writeback. Is there some ordering 
> constraint there, perhaps?
> 
> I'm really reaching here. I'm trying to see the pattern, and I'm not 
> seeing it. I'm asking you to test things just to get more of a feel for 
> what triggers the failure, than because I actually have any kind of idea 
> of what the heck is going on.
> 
> Andrew, Nick, Hugh - any ideas?
> 
>   Linus
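
The "lots of dirty bits, and it was enough that one of them was right" hypothesis quoted above can be restated as a toy model. This is only an illustration of the argument, with made-up names (struct toy_page, written_if_any_bit_dirty(), written_if_pg_dirty_only()); it does not reproduce the kernel's actual writeback paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 4

/* Toy stand-in for a pagecache page that is mapped into some processes. */
struct toy_page {
	bool pg_dirty;				/* models PG_dirty */
	bool pte_dirty[TOY_MAX_MAPPINGS];	/* models per-mapping PTE dirty bits */
	size_t nr_mappings;
};

/*
 * Older behaviour as described in the quoted mail: dirty bits also
 * survive in the page tables, so the page still gets written if any
 * one of them is set, even when PG_dirty has been lost.
 */
static bool written_if_any_bit_dirty(const struct toy_page *p)
{
	assert(p->nr_mappings <= TOY_MAX_MAPPINGS);
	if (p->pg_dirty)
		return true;
	for (size_t i = 0; i < p->nr_mappings; i++)
		if (p->pte_dirty[i])
			return true;
	return false;
}

/*
 * Behaviour under suspicion: once PTE dirty bits are cleaned along with
 * the page, PG_dirty is the only record left, so losing it loses data.
 */
static bool written_if_pg_dirty_only(const struct toy_page *p)
{
	return p->pg_dirty;
}
```

With pg_dirty lost but one pte_dirty bit still set, the first predicate still says the page gets written and the second does not. That gap is the sense in which an older PG_dirty bug could have stayed hidden until 2.6.19.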


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > What happens if you only ifdef out that single thing? 
> > 
> > The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
> > bit _after_ the page has been marked for writeback. Is there some ordering 
> > constraint there, perhaps?
> > 
> > I'm really reaching here. I'm trying to see the pattern, and I'm not 
> > seeing it. I'm asking you to test things just to get more of a feel for 
> > what triggers the failure, than because I actually have any kind of idea 
> > of what the heck is going on.
> > 
> > Andrew, Nick, Hugh - any ideas?
> 
> If all of test_clear_page_dirty() has been commented out then the page will
> never become clean hence will never fall out of pagecache, so unless Andrei
> is doing a reboot before checking for corruption, perhaps the underlying
> data on-disk is incorrect, but we can't see it.

If I do a sync and echo 1 > /proc/sys/vm/drop_caches, is the reboot
still necessary?

> 
> Andrei, how _are_ you running this test?  What's the exact sequence of
> steps?
> 
> In particular, are you doing anything which would cause the corrupted file
> to be evicted from memory, thus forcing a read from disk?  Such as
> unmounting and then remounting the filesystem?

I boot linux, I start rtorrent and start the download, while it's
downloading I start evolution and I check my mail (my mbox is very large,
several hundred megabytes), I close evolution (I use evolution just to
have another application which uses the filesystem and the memory), I
start evolution again. I start firefox. The download is complete.
Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
test that all 84 downloaded rar files are ok and see the result.

> 
> The point of my question is to check that the data is really incorrect
> on-disk, or whether it is incorrect in pagecache.
> 
> Also, it'd be useful if you could determine whether the bug appears with
> the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> rootfstype=ext2 if it's the root filesystem.

I will test.

> 
> Thanks.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote:
> 
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > 
> > > There's exactly two call sites that call "page_mkclean()" (and that is 
> > > the 
> > > only thing in turn that calls "page_mkclean_one()", which we already 
> > > determined will cause the corruption). 
> > >
> > > Can you just TOTALLY DISABLE that case for the test_clear_page_dirty() 
> > > case? Just do an "#if 0 .. #endif" around that whole if-statement, 
> > > leaving 
> > > the _only_ thing that actually calls "page_mkclean()" to be the 
> > > "clear_page_dirty_for_io()" call.
> > > 
> > > Do you still see corruption?
> > 
> > nope, no file corruption at all.
> 
> Ok. That's interesting, but I think you actually #ifdef'ed out too 
> much:
> 
> > +
> > +#if 0
> > if (TestClearPageDirty(page)) {
> > radix_tree_tag_clear(&mapping->page_tree,
> > page_index(page), PAGECACHE_TAG_DIRTY);
> > @@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
> >  * page is locked, which pins the address_space
> >  */
> > if (mapping_cap_account_dirty(mapping)) {
> > -   page_mkclean(page);
> > +   int cleaned = page_mkclean(page);
> > +   if (!must_clean_ptes && cleaned){
> > +   WARN_ON(1);
> > +   set_page_dirty(page);
> > +   }
> > +
> > dec_zone_page_state(page, NR_FILE_DIRTY);
> > }
> > return 1;
> > }
> > +
> > +#endif
> > +
> 
> It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> statement that I meant you should remove.
> 
> Can you try that too?

I have file corruption: "Hash check on download completion found bad
chunks, consider using "safe_sync"."


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
- 

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Alessandro Suardi wrote:
> > 
> > No idea whether this can be a data point or not, but
> > here it goes... my P2P box is about to turn 5 days old
> > while running nonstop one or both of aMule 2.1.3 and
> > BitTorrent 4.4.0 on ext3 mounted w/default options
> > on both IDE and USB disks. Zero corruption.
> > 
> > AMD K7-800, 512MB RAM, PREEMPT/UP kernel,
> > 2.6.19-git20 on top of up-to-date FC6.
> 
> It _looks_ like PREEMPT/SMP is one common configuration.
> 
> It might also be that the blocksize of the filesystem matters. 4kB 
> filesystems are fundamentally simpler than 1kB filesystems, for example. 
> You can tell at least with "/sbin/dumpe2fs -h /dev/..." or something.
> 
> Andrei - one thing that might be interesting to see: when corruption 
> occurs, can you get the corrupted file somehow? And compare it with a 
> known-good copy to see what the corruption looks like?

The corrupted file has a chunk full of zeros:

http://193.226.119.62/corruption0.jpg
http://193.226.119.62/corruption1.jpg
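
A comparison like the one requested above (corrupted file versus known-good
copy) can be automated. The following is a minimal sketch, not a tool used
in the thread: the function name count_zeroed_chunks() is hypothetical, and
the 4096-byte chunk size is an assumption (a typical page size; the thread
does not state the size of the zeroed region). It counts chunks that are
entirely zero in the suspect copy but not in the good copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* assumed page size; not stated in the thread */

/* Return 1 if all n bytes at p are zero, 0 otherwise. */
static int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/*
 * Compare two equal-length buffers chunk by chunk. A chunk counts as
 * "zeroed" when the suspect copy is all zeros there and the known-good
 * copy is not. Returns the number of such chunks; if first_idx is not
 * NULL and at least one chunk is found, stores the index of the first.
 */
size_t count_zeroed_chunks(const unsigned char *good,
                           const unsigned char *bad,
                           size_t len, size_t *first_idx)
{
    size_t count = 0;

    assert(good != NULL && bad != NULL);

    for (size_t off = 0; off < len; off += CHUNK_SIZE) {
        /* The last chunk may be shorter than CHUNK_SIZE. */
        size_t n = len - off < CHUNK_SIZE ? len - off : CHUNK_SIZE;

        if (all_zero(bad + off, n) &&
            memcmp(good + off, bad + off, n) != 0) {
            if (count == 0 && first_idx != NULL)
                *first_idx = off / CHUNK_SIZE;
            count++;
        }
    }
    return count;
}
```

A caller would read both files into memory (or mmap them) and pass the two
buffers; chunk-aligned zero runs would be consistent with whole pages being
lost rather than with random bit errors.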





Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> > >
> > > This should be fairly easy to test: just change every single ", 1" case 
> > > in 
> > > the patch to ", 0".
> > >
> > > What happens for you in that case?
> > 
> > I have file corruption.
> 
> Magic. And btw, _thanks_ for being such a great tester.
> 
> So now I have one more thing for you to try, if you can bother:
> 
> There's exactly two call sites that call "page_mkclean()" (and that is the 
> only thing in turn that calls "page_mkclean_one()", which we already 
> determined will cause the corruption). 
> 
> Both of them do 
> 
>   if (mapping_cap_account_dirty(mapping)) {
>   ..
> 
> things, although they do slightly different things inside that if in your 
> patched kernel.
> 
> Can you just TOTALLY DISABLE that case for the test_clear_page_dirty() 
> case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving 
> the _only_ thing that actually calls "page_mkclean()" to be the 
> "clear_page_dirty_for_io()" call.
> 
> Do you still see corruption?

nope, no file corruption at all.



diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
   

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Linus Torvalds wrote:
> > 
> > But at the same time, it's interesting that it still happens when we try 
> > to re-add the dirty bit. That would tell me that it's one of two cases:
> 
> Forget that. There's a third case, which is much more likely:
> 
>  - Andrew's patch had a ", 1" where it _should_ have had a ", 0".
> 
> This should be fairly easy to test: just change every single ", 1" case in 
> the patch to ", 0".
> 
> The only case that _definitely_ would want ",1" is actually the case that 
> already calls page_mkclean() directly: clear_page_dirty_for_io(). So no 
> other ", 1" is valid, and that one that needed it already avoided even 
> calling the "test_clear_page_dirty()" function, because it did it all by 
> hand.
> 
> What happens for you in that case?
> 
>   Linus

I have file corruption.


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
}
}
}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ 

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> > 
> > I applied Linus' patch, Andrew's patch and Peter Zijlstra's patches (the last
> > two). The unified patch is attached. I tested and I have no corruption.
> 
> That wasn't very interesting, because you also had the patch that just 
> disabled "page_mkclean_one()" entirely:
> 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index d8a842a..3f9061e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page 
> > goto unlock;
> >  
> > entry = ptep_get_and_clear(mm, address, pte);
> > -   entry = pte_mkclean(entry);
> > +   /*entry = pte_mkclean(entry);*/
> > entry = pte_wrprotect(entry);
> > ptep_establish(vma, address, pte, entry);
> > lazy_mmu_prot_update(entry);
> 
> The above patch is bad. It's always going to hide the bug, but it hides it 
> by just not doing anything at all. So any patch combination that contains 
> that patch will probably _always_ fix your problem, but it won't be an 
> interesting patch..
> 
> So can you remove that small fragment? Also, it would be nice if you added 
> the WARN_ON() to this sequence in mm/page-writeback.c:
> 
> +   if (!must_clean_ptes && cleaned)
> +   set_page_dirty(page);
> 
> just make it do a WARN_ON() if this ever triggers.
> 
> Then, IF the corruption is gone, we'd love to see the WARN_ON results..
> 
>   Linus

I dropped that patch and added WARN_ON(1), the unified patch is
attached.

I got corruption: "Hash check on download completion found bad chunks,
consider using "safe_sync"."

In dmesg there is no message from WARN_ON(1), my .config is attached.



diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 1)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
---

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

> (On that note: Andrei - if you do test this out, I'd suggest applying my 
> patch too - the one that you already tested. It won't apply cleanly on top 
> of Andrew's patch, but it should be trivial to apply by hand, since you 
> really just want to remove the whole "if (ret) {...}" sequence. I realize 
> that it didn't make any difference for you, but applying that patch is 
> probably a good idea just to remove the noise for a codepath that you 
> already showed to not matter)


I applied Linus' patch, Andrew's patch and Peter Zijlstra's patches (the last
two). The unified patch is attached. I tested and I have no corruption.


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 1)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
}
}
}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
ASSERT(!PageWriteback(page));
set_page_writeback(page);
if (clear_dirty)
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
unlock_page(page);
if (!buffers) {

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
> OK, I'll try this on a ext3 box. BTW, what data mode are you using ext3
> in?
> 

ordered

> 
> Also, for testings sake, could you give this a go:
> It's a total hack but I guess worth testing.
> 
> ---
>  mm/rmap.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6-git/mm/rmap.c
> ===
> --- linux-2.6-git.orig/mm/rmap.c  2006-12-18 11:06:29.0 +0100
> +++ linux-2.6-git/mm/rmap.c   2006-12-18 11:07:16.0 +0100
> @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page 
>   goto unlock;
>  
>   entry = ptep_get_and_clear(mm, address, pte);
> - entry = pte_mkclean(entry);
> + /* entry = pte_mkclean(entry); */
>   entry = pte_wrprotect(entry);
>   ptep_establish(vma, address, pte, entry);
>   lazy_mmu_prot_update(entry);
> 

with latest git and this patch there is no corruption !
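
What the hack above changes can be seen in a toy model of the two pte
operations. This is only an illustrative sketch: the bit values and every
toy_* name are invented here and do not match any real architecture's pte
layout. With the pte_mkclean() step skipped, the entry is still
write-protected, but its dirty bit is never cleared.

```c
#include <assert.h>

typedef unsigned int toy_pte_t;

#define TOY_PTE_DIRTY 0x1u   /* hypothetical "dirty" bit */
#define TOY_PTE_WRITE 0x2u   /* hypothetical "writable" bit */

/* Clear the dirty bit, as pte_mkclean() does conceptually. */
toy_pte_t toy_pte_mkclean(toy_pte_t e)
{
    return e & ~TOY_PTE_DIRTY;
}

/* Clear the writable bit, as pte_wrprotect() does conceptually. */
toy_pte_t toy_pte_wrprotect(toy_pte_t e)
{
    return e & ~TOY_PTE_WRITE;
}

/*
 * Model of the pte update in page_mkclean_one(). With skip_mkclean set,
 * this mirrors the test patch that comments out the pte_mkclean() call.
 */
toy_pte_t toy_mkclean_one(toy_pte_t e, int skip_mkclean)
{
    if (!skip_mkclean)
        e = toy_pte_mkclean(e);
    return toy_pte_wrprotect(e);
}
```

In this model a dirty, writable entry comes out fully clean from the normal
path, while the hacked path leaves TOY_PTE_DIRTY set, so dirtiness recorded
in the pte can no longer be lost at this step.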





Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote: 
> On Mon, 18 Dec 2006 11:19:04 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > 
> > I tried latest git with the patch from this email and I still get file
> > content corruption. If I can help you debug the problem further, tell me
> > what to do.
> 
> Can you please tell us all the steps which we need to take to reproduce this?

I'm using rtorrent-0.7.0 and libtorrent-0.11.0. Just download a torrent
with multiple files (I downloaded 84 rar files). When it finishes,
it does a hash check, and at the end of the check it says "Hash check
on download completion found bad chunks, consider using "safe_sync"."
and stops, and most of the downloaded files are broken. With Peter
Zijlstra's patch this error doesn't show, but there is still file
corruption (although fewer files are corrupted); after the hash check,
rtorrent downloads the bad chunks again, does another hash check, and all
files are OK.



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 18:22:42 +1100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Andrew Morton wrote:
> > > On Mon, 18 Dec 2006 15:51:52 +1100
> > > Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > 
> > > 
> > >>I think the problem Andrew identified is real.
> > > 
> > > 
> > > I don't.  In fact I don't think I described any problem (well, I tried to,
> > > but then I contradicted myself).
> > 
> > By saying that there shouldn't be any dirty ptes if there are no
> > dirty buffers? But in that case the _page_ shouldn't be dirty either,
> > so that clear_page_dirty would be redundant. But presumably it isn't.
> 
> I don't follow that.
> 
> The linkage between pte-dirtiness and buffer_heads is a bit hard to follow
> without also considering page-dirtiness.
> 
> > > Six hours here of fsx-linux plus high memory pressure on SMP on 1k
> > > blocksize ext3, mainline.  Zero failures.  It's unlikely that this testing
> > > would pass, yet people running normal workloads are able to easily trigger
> > > failures.  I suspect we're looking in the wrong place.
> > 
> > Yes, I could believe the corruption is caused by something else
> > completely.
> 
> Think so.  We do have a problem here, but only on threaded apps, I believe.
> rtorrent doesn't appear to be threaded, and the bug is hit on non-preempt
> UP.


ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc1 #2 SMP PREEMPT Mon Dec 18 11:01:52 EET 2006
i686 Genuine Intel(R) CPU   T2050  @ 1.60GHz GenuineIntel
GNU/Linux


and the other person who had corruption with rtorrent also has SMP and
PREEMPT.


> 
> > >>The issue is the disconnect between the pte dirtiness and a filesystem
> > >>bringing buffers clean.
> > > 
> > > 
> > > Really?  The dirtying direction goes pte_dirty->PG_dirty->BH_Dirty and the
> > > cleaning direction goes !BH_Dirty->!PG_dirty->!pte_dirty.  That's pretty
> > > simple, setting aside races.
> > > 
> > > In the try_to_free_buffers case there's a large time interval between
> > > !BH_Dirty and !PG_dirty, but that shouldn't affect anything.
> > 
> > After try_to_free_buffers detaches the buffers from the page, a
> > pagefault can come in, and mark the pte writeable, then set_page_dirty
> > (which finds no buffers, so only sets PG_dirty).
> > 
> > The page can now get dirtied through this mapping.
> > 
> > try_to_free_buffers then goes on to clean the page and ptes.
> 
> try_to_free_buffers() isn't called against a page which doesn't have
> buffers.  It'll oops.
> 
> > Were you testing with preempt?
> 
> nope, just SMP.
> 
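
The sequence Nick Piggin describes in the quoted text (buffers detached, a
fault re-dirties the page, then try_to_free_buffers() cleans the page and
ptes) can be illustrated with a toy user-space model. This is only a sketch
of the hypothesis under discussion, not kernel code: the struct, its fields
and all toy_* names are invented for illustration. Elsewhere in the thread
the corresponding fs/buffer.c change is reported not to have stopped the
corruption, so the model shows the suspected race, not a confirmed cause.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
    bool pte_dirty;     /* dirty bit in the page-table entry */
    bool pg_dirty;      /* PG_dirty on the page */
    bool bh_dirty;      /* some attached buffer_head is dirty */
    bool has_buffers;   /* buffer_heads are attached */
    bool data_pending;  /* memory holds data that is not on disk */
};

/* A store through a shared mapping: pte and page both become dirty. */
void toy_mmap_write(struct toy_page *p)
{
    p->pte_dirty = true;
    p->pg_dirty = true;
    p->data_pending = true;
}

/* Old behaviour: drop the buffers, then clean the page and its ptes. */
bool toy_free_buffers_old(struct toy_page *p, bool racing_write)
{
    assert(p->has_buffers);
    if (p->bh_dirty)
        return false;           /* dirty buffers cannot be dropped */
    p->has_buffers = false;
    if (racing_write)
        toy_mmap_write(p);      /* fault lands between the two steps */
    p->pg_dirty = false;        /* cleaning happens AFTERWARDS */
    p->pte_dirty = false;
    return true;
}

/* Proposed behaviour: refuse dirty pages and never clean anything. */
bool toy_free_buffers_patched(struct toy_page *p, bool racing_write)
{
    assert(p->has_buffers);
    if (p->pg_dirty || p->bh_dirty)
        return false;
    p->has_buffers = false;
    if (racing_write)
        toy_mmap_write(p);
    return true;
}

/* Data is lost when nothing records that memory is newer than disk. */
bool toy_data_lost(const struct toy_page *p)
{
    return p->data_pending && !p->pg_dirty && !p->pte_dirty;
}
```

In the model, a racing write followed by the old cleanup leaves pending data
with no dirty marker at all, while the patched variant keeps both markers
set.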



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
I tried latest git with the patch from this email and I still get file
content corruption. If I can help you debug the problem further, tell me
what to do.

On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote:
> 
> On Mon, 18 Dec 2006, Nick Piggin wrote:
> > 
> > I can't see how that's exactly a problem -- so long as the page does not
> > get reclaimed (it won't, because we have a ref on it) then all that matters
> > is that the page eventually gets marked dirty.
> 
> But the point being that "try_to_free_buffers()" marks it clean 
> AFTERWARDS.
> 
> So yes, the page gets marked dirty in the pte's - the hardware generally 
> does that for us, so we don't have to worry about that part going on.
> 
> But "try_to_free_buffers()" seems to clear those dirty bits without 
> serializing it really any way. It just says "ok, I will now clear them". 
> Without knowing whether the dirty bits got set before the IO that cleared 
> the buffer head dirty bits or not.
> 
> What is _that_ serialization? As far as I can see, the only way to 
> guarantee that to happen (since the dirty bits in the page tables will get 
> set without us ever even being notified) is that the page tables 
> themselves must simply never contain that page in a writable form at all.
> 
> And that seems to be lacking.
> 
> Anyway, I have what I consider a much simpler solution: just don't DO all 
> that crap in try_to_free_buffers() at all. I sent it out to some people 
> already, but not very widely. 
> 
> I reproduce my suggestion here for you (and maybe others too who weren't 
> cc'd in that other discussion group) to comment on..
> 
>   Linus
> 
> ---
> 
> So I think your patch is really broken, how about this one instead?
> 
> It's really my previous patch, BUT it also adds a 
> 
>   if (PageDirty(page) ..
>   return 0;
> 
> case, on the assumption that since PageDirty() means that one of the 
> buffers should be dirty, there's no point in even _trying_ drop_buffers, 
> since that should just fail anyway.
> 
> Now, that assumption is obviously wrong _if_ the buffers have been cleaned 
> by something else. So in that case, we now don't remove the buffer heads, 
> but who really cares? The page will remain on the dirty list, and 
> something should be trying to write it out, but since now all the buffers 
> are clean, once that happens, there is no actual IO to happen.
> 
> Hmm? So this means that we simply don't remove the buffers early from such 
> pages, but there shouldn't be any real downside.
> 
> Now, the only question would be if the page is marked dirty _while_ this 
> is running. We do hold the page lock, but page dirtying doesn't get the 
> lock, does it? But at least we won't mark the page _clean_ when it 
> shouldn't be.. And we still are atomic wrt the actual buffer lists 
> (mapping->private_lock), so I think this should all be ok, and 
> drop_buffers() will do the right thing.
> 
> So no race possible either.
> 
> At least as far as I can see. And the patch certainly is simple.
> 
> Now the question whether this actually _fixes_ any problems does remain, 
> but I think this should be a pretty good solution if the bug really is 
> here. Andrew?
> 
>   Linus
> 
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>   int ret = 0;
>  
>   BUG_ON(!PageLocked(page));
> - if (PageWriteback(page))
> + if (PageDirty(page) || PageWriteback(page))
>   return 0;
>  
>   if (mapping == NULL) {  /* can this still happen? */
> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>   spin_lock(&mapping->private_lock);
>   ret = drop_buffers(page, &buffers_to_free);
>   spin_unlock(&mapping->private_lock);
> - if (ret) {
> - /*
> -  * If the filesystem writes its buffers by hand (eg ext3)
> -  * then we can have clean buffers against a dirty page.  We
> -  * clean the page here; otherwise later reattachment of buffers
> -  * could encounter a non-uptodate page, which is unresolvable.
> -  * This only applies in the rare case where try_to_free_buffers
> -  * succeeds but the page is not freed.
> -  *
> -  * Also, during truncate, discard_buffer will have marked all
> -  * the page's buffers clean.  We discover that here and clean
> -  * the page also.
> -  */
> - if (test_clear_page_dirty(page))
> - task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> - }
>  out:
>   if (buffers_to_free) {
>   struct buffer_head *bh = buffers_to_free;
> 


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
I tried the latest git with the patch from this email and I still get file
content corruption. If I can help you debug the problem further, tell me
what to do.

On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Nick Piggin wrote:
> >
> > I can't see how that's exactly a problem -- so long as the page does not
> > get reclaimed (it won't, because we have a ref on it) then all that matters
> > is that the page eventually gets marked dirty.
>
> But the point being that try_to_free_buffers() marks it clean
> AFTERWARDS.
>
> So yes, the page gets marked dirty in the pte's - the hardware generally
> does that for us, so we don't have to worry about that part going on.
>
> But try_to_free_buffers() seems to clear those dirty bits without
> serializing it really any way. It just says "ok, I will now clear them".
> Without knowing whether the dirty bits got set before the IO that cleared
> the buffer head dirty bits or not.
>
> What is _that_ serialization? As far as I can see, the only way to
> guarantee that to happen (since the dirty bits in the page tables will get
> set without us ever even being notified) is that the page tables
> themselves must simply never contain that page in a writable form at all.
>
> And that seems to be lacking.
>
> Anyway, I have what I consider a much simpler solution: just don't DO all
> that crap in try_to_free_buffers() at all. I sent it out to some people
> already, but not very widely.
>
> I reproduce my suggestion here for you (and maybe others too who weren't
> cc'd in that other discussion group) to comment on..
>
>   Linus
>
> ---
>
> So I think your patch is really broken, how about this one instead?
>
> It's really my previous patch, BUT it also adds a
>
>   if (PageDirty(page) ..
>   return 0;
>
> case, on the assumption that since PageDirty() means that one of the
> buffers should be dirty, there's no point in even _trying_ drop_buffers,
> since that should just fail anyway.
>
> Now, that assumption is obviously wrong _if_ the buffers have been cleaned
> by something else. So in that case, we now don't remove the buffer heads,
> but who really cares? The page will remain on the dirty list, and
> something should be trying to write it out, but since now all the buffers
> are clean, once that happens, there is no actual IO to happen.
>
> Hmm? So this means that we simply don't remove the buffers early from such
> pages, but there shouldn't be any real downside.
>
> Now, the only question would be if the page is marked dirty _while_ this
> is running. We do hold the page lock, but page dirtying doesn't get the
> lock, does it? But at least we won't mark the page _clean_ when it
> shouldn't be.. And we still are atomic wrt the actual buffer lists
> (mapping->private_lock), so I think this should all be ok, and
> drop_buffers() will do the right thing.
>
> So no race possible either.
>
> At least as far as I can see. And the patch certainly is simple.
>
> Now the question whether this actually _fixes_ any problems does remain,
> but I think this should be a pretty good solution if the bug really is
> here. Andrew?
>
>   Linus
>
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index d1f1b54..263f88e 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *page)
>   int ret = 0;
>  
>   BUG_ON(!PageLocked(page));
> - if (PageWriteback(page))
> + if (PageDirty(page) || PageWriteback(page))
>   return 0;
>  
>   if (mapping == NULL) {  /* can this still happen? */
> @@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *page)
>   spin_lock(&mapping->private_lock);
>   ret = drop_buffers(page, &buffers_to_free);
>   spin_unlock(&mapping->private_lock);
> - if (ret) {
> - /*
> -  * If the filesystem writes its buffers by hand (eg ext3)
> -  * then we can have clean buffers against a dirty page.  We
> -  * clean the page here; otherwise later reattachment of buffers
> -  * could encounter a non-uptodate page, which is unresolvable.
> -  * This only applies in the rare case where try_to_free_buffers
> -  * succeeds but the page is not freed.
> -  *
> -  * Also, during truncate, discard_buffer will have marked all
> -  * the page's buffers clean.  We discover that here and clean
> -  * the page also.
> -  */
> - if (test_clear_page_dirty(page))
> - task_io_account_cancelled_write(PAGE_CACHE_SIZE);
> - }
>  out:
>   if (buffers_to_free) {
>   struct buffer_head *bh = buffers_to_free;
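
The behaviour change in the patch above can be sketched in userspace C. This
is a toy model under stated assumptions, not kernel code: struct toy_page and
the toy_* functions are hypothetical stand-ins for struct page,
drop_buffers() and try_to_free_buffers(), and locking and concurrency are
ignored entirely. It only illustrates that, with the PageDirty() check added
and the "if (ret)" block removed, a dirty page whose buffers are clean keeps
its dirty bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the state discussed in this thread. */
struct toy_page {
    bool locked;            /* models PageLocked() */
    bool dirty;             /* models PG_dirty */
    bool writeback;         /* models PG_writeback */
    int  nr_buffers;        /* buffer heads attached to the page */
    int  nr_dirty_buffers;  /* how many of them are dirty */
};

/* Models drop_buffers(): succeeds only if no attached buffer is dirty. */
static bool toy_drop_buffers(struct toy_page *page)
{
    if (page->nr_dirty_buffers > 0)
        return false;
    page->nr_buffers = 0;
    return true;
}

/* Behaviour before the patch: if the buffers could be dropped,
 * the page dirty bit is cleared as well. */
static bool toy_try_to_free_buffers_old(struct toy_page *page)
{
    assert(page->locked);           /* BUG_ON(!PageLocked(page)) */
    if (page->writeback)
        return false;
    if (!toy_drop_buffers(page))
        return false;
    page->dirty = false;            /* test_clear_page_dirty(page) */
    return true;
}

/* Behaviour after the patch: a dirty page is left alone entirely. */
static bool toy_try_to_free_buffers_new(struct toy_page *page)
{
    assert(page->locked);
    if (page->dirty || page->writeback)
        return false;
    return toy_drop_buffers(page);
}
```

With a page that is dirty but whose buffers are all clean, the old variant
drops the buffers and clears the dirty bit, while the patched variant returns
early and leaves both the buffers and the dirty bit untouched.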
 



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 18:22:42 +1100
> Nick Piggin [EMAIL PROTECTED] wrote:
>
> > Andrew Morton wrote:
> > > On Mon, 18 Dec 2006 15:51:52 +1100
> > > Nick Piggin [EMAIL PROTECTED] wrote:
> > >
> > >
> > > > I think the problem Andrew identified is real.
> > >
> > >
> > > I don't.  In fact I don't think I described any problem (well, I tried to,
> > > but then I contradicted myself).
> >
> > By saying that there shouldn't be any dirty ptes if there are no
> > dirty buffers? But in that case the _page_ shouldn't be dirty either,
> > so that clear_page_dirty would be redundant. But presumably it isn't.
>
> I don't follow that.
>
> The linkage between pte-dirtiness and buffer_heads is a bit hard to follow
> without also considering page-dirtiness.
>
> > > Six hours here of fsx-linux plus high memory pressure on SMP on 1k
> > > blocksize ext3, mainline.  Zero failures.  It's unlikely that this testing
> > > would pass, yet people running normal workloads are able to easily trigger
> > > failures.  I suspect we're looking in the wrong place.
> >
> > Yes, I could believe the corruption is caused by something else
> > completely.
>
> Think so.  We do have a problem here, but only on threaded apps, I believe.
> rtorrent doesn't appear to be threaded, and the bug is hit on non-preempt
> UP.


ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc1 #2 SMP PREEMPT Mon Dec 18 11:01:52 EET 2006
i686 Genuine Intel(R) CPU   T2050  @ 1.60GHz GenuineIntel
GNU/Linux


and the other person who had corruption with rtorrent also has SMP and
PREEMPT.


>
> > > > The issue is the disconnect between the pte dirtiness and a filesystem
> > > > bringing buffers clean.
> > >
> > >
> > > Really?  The dirtying direction goes pte_dirty->PG_dirty->BH_Dirty and the
> > > cleaning direction goes !BH_Dirty->!PG_dirty->!pte_dirty.  That's pretty
> > > simple, setting aside races.
> > >
> > > In the try_to_free_buffers case there's a large time interval between
> > > !BH_Dirty and !PG_dirty, but that shouldn't affect anything.
> >
> > After try_to_free_buffers detaches the buffers from the page, a
> > pagefault can come in, and mark the pte writeable, then set_page_dirty
> > (which finds no buffers, so only sets PG_dirty).
> >
> > The page can now get dirtied through this mapping.
> >
> > try_to_free_buffers then goes on to clean the page and ptes.
>
> try_to_free_buffers() isn't called against a page which doesn't have
> buffers.  It'll oops.
>
> > Were you testing with preempt?
>
> nope, just SMP.
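
The dirtying and cleaning directions discussed above can be walked through
with a small userspace sketch in C. It is only an illustration of the ordering
that is being hypothesised in this exchange, not a confirmed diagnosis and not
kernel code: struct toy_state, the version counters and every toy_* function
are assumptions made up for the example, and the steps run sequentially rather
than concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* One page, tracked at the three levels named in the thread. */
struct toy_state {
    bool pte_dirty;     /* set by hardware on a write through the mapping */
    bool pg_dirty;      /* models PG_dirty */
    bool bh_dirty;      /* models BH_Dirty on the page's buffers */
    int  mem_version;   /* version of the data in the page cache */
    int  disk_version;  /* version of the data on disk */
};

/* A store through a writable mmap: only the pte learns about it. */
static void toy_mmap_write(struct toy_state *s)
{
    s->mem_version++;
    s->pte_dirty = true;
}

/* Dirtying direction: pte_dirty -> PG_dirty -> BH_Dirty. */
static void toy_propagate_dirty(struct toy_state *s)
{
    if (s->pte_dirty)
        s->pg_dirty = true;
    if (s->pg_dirty)
        s->bh_dirty = true;
}

/* The filesystem writes its buffers by hand: only BH_Dirty is cleared. */
static void toy_fs_write_buffers(struct toy_state *s)
{
    if (s->bh_dirty) {
        s->disk_version = s->mem_version;
        s->bh_dirty = false;
    }
}

/* Late cleaning step: !BH_Dirty -> !PG_dirty -> !pte_dirty, done without
 * checking whether a new write arrived in the meantime. */
static void toy_late_clean(struct toy_state *s)
{
    if (!s->bh_dirty) {
        s->pg_dirty = false;
        s->pte_dirty = false;
    }
}

/* Data is lost when memory is newer than disk but nothing is marked dirty. */
static bool toy_data_lost(const struct toy_state *s)
{
    assert(s->disk_version <= s->mem_version);
    return s->mem_version != s->disk_version &&
           !s->pte_dirty && !s->pg_dirty && !s->bh_dirty;
}
```

In this model, a write that lands between toy_fs_write_buffers() and
toy_late_clean() ends up in memory only, with every dirty bit cleared; without
that intervening write the same sequence is harmless.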
 



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 11:19:04 +0200
> Andrei Popa [EMAIL PROTECTED] wrote:
>
> >
> > I tried the latest git with the patch from this email and I still get file
> > content corruption. If I can help you debug the problem further, tell me
> > what to do.
>
> Can you please tell us all the steps which we need to take to reproduce this?

I'm using rtorrent-0.7.0 and libtorrent-0.11.0. I just download a torrent
with multiple files (I downloaded 84 rar files). When the download finishes,
rtorrent does a hash check, and at the end of the check it says "Hash check
on download completion found bad chunks, consider using safe_sync." and
stops; most of the downloaded files are broken. With Peter Zijlstra's patch
this error doesn't show, but there is still file corruption (although fewer
files are corrupted); after the hash check, rtorrent downloads the bad
chunks again, does another hash check, and all files are ok.



Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
> OK, I'll try this on an ext3 box. BTW, what data mode are you using ext3
> in?
>

ordered

>
> Also, for testing's sake, could you give this a go:
> It's a total hack but I guess worth testing.
>
> ---
>  mm/rmap.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6-git/mm/rmap.c
> ===
> --- linux-2.6-git.orig/mm/rmap.c  2006-12-18 11:06:29.0 +0100
> +++ linux-2.6-git/mm/rmap.c   2006-12-18 11:07:16.0 +0100
> @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page 
>   goto unlock;
>  
>   entry = ptep_get_and_clear(mm, address, pte);
> - entry = pte_mkclean(entry);
> + /* entry = pte_mkclean(entry); */
>   entry = pte_wrprotect(entry);
>   ptep_establish(vma, address, pte, entry);
>   lazy_mmu_prot_update(entry);
>

With latest git and this patch there is no corruption!
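
What the hack above changes can be shown on a toy page-table entry in
userspace C. This is a minimal sketch, assuming a made-up 32-bit entry:
toy_pte_t, the TOY_PTE_* bit positions and the toy_* helpers are hypothetical
and do not match any real architecture. They only mirror the two operations
named in the hunk, pte_mkclean() and pte_wrprotect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t toy_pte_t;

/* Hypothetical bit positions, chosen only for this example. */
#define TOY_PTE_RW    ((toy_pte_t)1u << 1)  /* mapping is writable */
#define TOY_PTE_DIRTY ((toy_pte_t)1u << 6)  /* page was written through it */

static toy_pte_t toy_pte_mkclean(toy_pte_t pte)
{
    return pte & ~TOY_PTE_DIRTY;
}

static toy_pte_t toy_pte_wrprotect(toy_pte_t pte)
{
    return pte & ~TOY_PTE_RW;
}

/* Models the body of page_mkclean_one(). With apply_hack set, the
 * pte_mkclean() step is skipped, as in the test patch: the entry is
 * still write-protected, but its dirty bit survives. */
static toy_pte_t toy_mkclean_one(toy_pte_t pte, bool apply_hack)
{
    if (!apply_hack)
        pte = toy_pte_mkclean(pte);
    pte = toy_pte_wrprotect(pte);
    assert(!(pte & TOY_PTE_RW));    /* always read-only afterwards */
    return pte;
}
```

For a writable, dirty entry the unmodified path returns an entry with neither
bit set, while the hacked path returns an entry that is read-only but still
dirty, so the information that the page was written is never discarded.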





Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

> (On that note: Andrei - if you do test this out, I'd suggest applying my
> patch too - the one that you already tested. It won't apply cleanly on top
> of Andrew's patch, but it should be trivial to apply by hand, since you
> really just want to remove the whole "if (ret) {...}" sequence. I realize
> that it didn't make any difference for you, but applying that patch is
> probably a good idea just to remove the noise for a codepath that you
> already showed to not matter)


I applied Linus's patch, Andrew's patch, and Peter Zijlstra's patches (the
last two). The combined patch is attached. I tested and I have no corruption.


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 1)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
}
}
}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
ASSERT(!PageWriteback(page));
set_page_writeback(page);
if (clear_dirty)
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
unlock_page(page);
if (!buffers) {

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> >
> > I applied Linus's patch, Andrew's patch, and Peter Zijlstra's patches (the
> > last two). The combined patch is attached. I tested and I have no corruption.
>
> That wasn't very interesting, because you also had the patch that just
> disabled page_mkclean_one() entirely:
>
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index d8a842a..3f9061e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page 
> > goto unlock;
> >  
> > entry = ptep_get_and_clear(mm, address, pte);
> > -   entry = pte_mkclean(entry);
> > +   /*entry = pte_mkclean(entry);*/
> > entry = pte_wrprotect(entry);
> > ptep_establish(vma, address, pte, entry);
> > lazy_mmu_prot_update(entry);
>
> The above patch is bad. It's always going to hide the bug, but it hides it
> by just not doing anything at all. So any patch combination that contains
> that patch will probably _always_ fix your problem, but it won't be an
> interesting patch..
>
> So can you remove that small fragment? Also, it would be nice if you added
> the WARN_ON() to this sequence in mm/page-writeback.c:
>
> +   if (!must_clean_ptes && cleaned)
> +   set_page_dirty(page);
>
> just make it do a WARN_ON() if this ever triggers.
>
> Then, IF the corruption is gone, we'd love to see the WARN_ON results..
>
>   Linus

I dropped that patch and added WARN_ON(1); the unified patch is
attached.

I got corruption: "Hash check on download completion found bad chunks,
consider using safe_sync."

In dmesg there is no message from WARN_ON(1). My .config is attached.
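
The check Linus asks for can be sketched in userspace C as follows. This is a
toy model built only from the two-line fragment quoted above: struct toy_page,
the warn_count field and the toy_* functions are hypothetical, the real
test_clear_page_dirty() does more (radix-tree tags, accounting), and the
second parameter is modelled on the must_clean_ptes name that appears in the
fragment.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
    bool pg_dirty;    /* models PG_dirty */
    bool pte_dirty;   /* models a dirty pte mapping this page */
    int  warn_count;  /* how many times the WARN_ON() would have fired */
};

/* Stands in for page_mkclean(): clears the pte dirty bit and reports
 * whether there was one to clear. */
static bool toy_page_mkclean(struct toy_page *p)
{
    bool was_dirty = p->pte_dirty;

    p->pte_dirty = false;
    return was_dirty;
}

/* Toy version of test_clear_page_dirty(page, must_clean_ptes). */
static bool toy_test_clear_page_dirty(struct toy_page *p, bool must_clean_ptes)
{
    bool cleaned;

    if (!p->pg_dirty)
        return false;
    p->pg_dirty = false;

    cleaned = toy_page_mkclean(p);
    assert(!p->pte_dirty);
    if (!must_clean_ptes && cleaned) {
        p->warn_count++;        /* WARN_ON(1) */
        p->pg_dirty = true;     /* set_page_dirty(page) */
    }
    return true;
}
```

In this model the warning fires only when a caller that did not expect dirty
ptes finds one, and in that case the page is marked dirty again. A run with
corruption but no warning, as reported above, would mean this particular path
was never taken.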



diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 1)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 1);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Linus Torvalds wrote:
> >
> > But at the same time, it's interesting that it still happens when we try
> > to re-add the dirty bit. That would tell me that it's one of two cases:
>
> Forget that. There's a third case, which is much more likely:
>
>  - Andrew's patch had a ", 1" where it _should_ have had a ", 0".
>
> This should be fairly easy to test: just change every single ", 1" case in
> the patch to ", 0".
>
> The only case that _definitely_ would want ",1" is actually the case that
> already calls page_mkclean() directly: clear_page_dirty_for_io(). So no
> other ", 1" is valid, and that one that needed it already avoided even
> calling the test_clear_page_dirty() function, because it did it all by
> hand.
>
> What happens for you in that case?
>
>   Linus

I have file corruption.


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..760442f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..7b87875 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..47a6b62 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
}
}
}
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index b56eb75..d65ba84 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -343,7 +343,7 @@ xfs_start_page_writeback(
  

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Andrei Popa wrote:
> >
> > > This should be fairly easy to test: just change every single ", 1" case
> > > in the patch to ", 0".
> >
> > > What happens for you in that case?
> >
> > I have file corruption.
>
> Magic. And btw, _thanks_ for being such a great tester.
>
> So now I have one more thing for you to try, if you can bother:
>
> There's exactly two call sites that call page_mkclean() (and that is the
> only thing in turn that calls page_mkclean_one(), which we already
> determined will cause the corruption).
>
> Both of them do
>
>   if (mapping_cap_account_dirty(mapping)) {
>   ..
>
> things, although they do slightly different things inside that "if" in your
> patched kernel.
>
> Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
> case? Just do an "#if 0 .. #endif" around that whole if-statement, leaving
> the _only_ thing that actually calls page_mkclean() to be the
> clear_page_dirty_for_io() call.
>
> Do you still see corruption?

nope, no file corruption at all.



diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
}
 #else
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..a97e198 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *p
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote:
>
> On Mon, 18 Dec 2006, Alessandro Suardi wrote:
> >
> > No idea whether this can be a data point or not, but
> > here it goes... my P2P box is about to turn 5 days old
> > while running nonstop one or both of aMule 2.1.3 and
> > BitTorrent 4.4.0 on ext3 mounted w/default options
> > on both IDE and USB disks. Zero corruption.
> >
> > AMD K7-800, 512MB RAM, PREEMPT/UP kernel,
> > 2.6.19-git20 on top of up-to-date FC6.
>
> It _looks_ like PREEMPT/SMP is one common configuration.
>
> It might also be that the blocksize of the filesystem matters. 4kB
> filesystems are fundamentally simpler than 1kB filesystems, for example.
> You can tell at least with "/sbin/dumpe2fs -h /dev/..." or something.
>
> Andrei - one thing that might be interesting to see: when corruption
> occurs, can you get the corrupted file somehow? And compare it with a
> known-good copy to see what the corruption looks like?

The corrupted file has a chunk full of zeros:

http://193.226.119.62/corruption0.jpg
http://193.226.119.62/corruption1.jpg
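
A comparison like the one requested above, looking for zero-filled chunks
relative to a known-good copy, could be sketched in C along these lines. This
is an illustrative helper, not a tool used in the thread: the function name
count_zeroed_blocks and its parameters are made up for the example, it works
on two in-memory buffers of equal length, and block_size is whatever
granularity one wants to test (for instance the filesystem block size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the aligned blocks in which the suspect copy is entirely zero
 * while the known-good copy is not. Only whole blocks are examined;
 * a trailing partial block is ignored. */
static size_t count_zeroed_blocks(const unsigned char *good,
                                  const unsigned char *bad,
                                  size_t len, size_t block_size)
{
    size_t count = 0;

    assert(block_size > 0);
    for (size_t off = 0; off + block_size <= len; off += block_size) {
        bool bad_all_zero = true;
        bool differs = false;

        for (size_t i = 0; i < block_size; i++) {
            if (bad[off + i] != 0)
                bad_all_zero = false;
            if (bad[off + i] != good[off + i])
                differs = true;
        }
        if (bad_all_zero && differs)
            count++;
    }
    return count;
}
```

Running it with different block sizes (for example 1024 and 4096) on the same
pair of buffers gives a rough idea of whether the zeroed regions line up with
block or page boundaries.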





Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote:
>
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > >
> > > There's exactly two call sites that call page_mkclean() (and that is
> > > the only thing in turn that calls page_mkclean_one(), which we already
> > > determined will cause the corruption).
> >
> > > Can you just TOTALLY DISABLE that case for the test_clear_page_dirty()
> > > case? Just do an "#if 0 .. #endif" around that whole if-statement,
> > > leaving the _only_ thing that actually calls page_mkclean() to be the
> > > clear_page_dirty_for_io() call.
> > >
> > > Do you still see corruption?
> >
> > nope, no file corruption at all.
>
> Ok. That's interesting, but I think you actually #ifdef'ed out too
> much:
>
> > +
> > +#if 0
> > if (TestClearPageDirty(page)) {
> > radix_tree_tag_clear(&mapping->page_tree,
> > page_index(page), PAGECACHE_TAG_DIRTY);
> > @@ -866,11 +868,19 @@ int test_clear_page_dirty(struct page *p
> >  * page is locked, which pins the address_space
> >  */
> > if (mapping_cap_account_dirty(mapping)) {
> > -   page_mkclean(page);
> > +   int cleaned = page_mkclean(page);
> > +   if (!must_clean_ptes && cleaned){
> > +   WARN_ON(1);
> > +   set_page_dirty(page);
> > +   }
> > +
> > dec_zone_page_state(page, NR_FILE_DIRTY);
> > }
> > return 1;
> > }
> > +
> > +#endif
> > +
>
> It was really just the _inner_ "if (mapping_cap_account_dirty(.."
> statement that I meant you should remove.
>
> Can you try that too?

I have file corruption: "Hash check on download completion found bad
chunks, consider using safe_sync."


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page);
put_page(page);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index b1a1c72..5e29b37 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -773,7 +773,7 @@ #if MPS_PER_PAGE == 1
 
/* Retest mp->count since we may have released page lock */
if (test_bit(META_discard, &mp->flag) && !mp->count

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > What happens if you only ifdef out that single thing?
> > 
> > The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> > bit _after_ the page has been marked for writeback. Is there some ordering
> > constraint there, perhaps?
> > 
> > I'm really reaching here. I'm trying to see the pattern, and I'm not
> > seeing it. I'm asking you to test things just to get more of a feel for
> > what triggers the failure, than because I actually have any kind of idea
> > of what the heck is going on.
> > 
> > Andrew, Nick, Hugh - any ideas?
> 
> If all of test_clear_page_dirty() has been commented out then the page will
> never become clean hence will never fall out of pagecache, so unless Andrei
> is doing a reboot before checking for corruption, perhaps the underlying
> data on-disk is incorrect, but we can't see it.

If I do a sync and echo 1 > /proc/sys/vm/drop_caches, is the reboot
still necessary?

> 
> Andrei, how _are_ you running this test?  What's the exact sequence of
> steps?
> 
> In particular, are you doing anything which would cause the corrupted file
> to be evicted from memory, thus forcing a read from disk?  Such as
> unmounting and then remounting the filesystem?

I boot Linux, start rtorrent and start the download. While it's
downloading I start evolution and check my mail (my mbox is very large,
several hundred megabytes). I close evolution (I use evolution just to
have another application which uses the filesystem and the memory), then
start evolution again. I start firefox. When the download is complete,
rtorrent says whether the hash is good or not. I run "unrar t qwe.rar" to
test that all 84 downloaded rar files are OK and look at the result.

> 
> The point of my question is to check that the data is really incorrect
> on-disk, or whether it is incorrect in pagecache.
> 
> Also, it'd be useful if you could determine whether the bug appears with
> the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> rootfstype=ext2 if it's the root filesystem.

I will test.

> 
> Thanks.
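
The s/ext3/ext2/ edit suggested above could be tried out on a copy of
fstab first. The following is only a rough sketch: the function name
fstab_ext3_to_ext2 is made up for illustration, and it assumes a
conventional whitespace-separated fstab where the third field is the
filesystem type. It prints the result instead of editing the file in
place, and it does not touch mount options.

```shell
#!/bin/bash
# Hypothetical helper, not something used in this thread.
# fstab_ext3_to_ext2 FILE: print FILE with the filesystem-type field
# (third column) changed from ext3 to ext2. Comment lines are left alone.
# Note: awk rebuilds a modified line with single spaces between fields.
fstab_ext3_to_ext2() {
    awk '$1 !~ /^#/ && $3 == "ext3" { $3 = "ext2" } { print }' "$1"
}
```

Assumed usage: run "fstab_ext3_to_ext2 /etc/fstab > /tmp/fstab.ext2",
review the output, and only then install it by hand.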

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote:
> 
> On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > > 
> > > > nope, no file corruption at all.
> > > 
> > > Ok. That's interesting, but I think you actually #ifdef'ed out too
> > > much:
> > > 
> > > It was really just the _inner_ "if (mapping_cap_account_dirty(.."
> > > statement that I meant you should remove.
> > > 
> > > Can you try that too?
> > 
> > I have file corruption: "Hash check on download completion found bad
> > chunks, consider using "safe_sync"."
> 
> Ok, that's interesting.
> 
> So it doesn't seem to be the call to page_mkclean() itself that causes
> corruption. It looks like Peter's hunch that maybe there's some bug in
> PG_dirty handling _itself_ might be an idea..
> 
> And the reason it only started happening now is that it may just have been
> _hidden_ by the fact that while we kept the dirty bits in the page tables,
> we'd end up writing the dirty page _despite_ having lost the PG_dirty bit.
> So if it's some bad interaction between writable mappings and some other
> part of the system, we just didn't see it earlier, exactly because we had
> _lots_ of dirty bits, and it was enough that _one_ of them was right.
> 
> If you didn't see corruption when you #ifdef'ed out too much of the
> test_clear_page_dirty() function (the _whole_ TestClearPageDirty()
> if-statement), but you get it when you just comment out the stuff that
> does the page_mkclean(), that's interesting.
> 
> I'm left looking at the radix_tree_tag_clear() in
> test_clear_page_dirty().
> 
> What happens if you only ifdef out that single thing?

I have file corruption.

> 
> The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> bit _after_ the page has been marked for writeback. Is there some ordering
> constraint there, perhaps?
> 
> I'm really reaching here. I'm trying to see the pattern, and I'm not
> seeing it. I'm asking you to test things just to get more of a feel for
> what triggers the failure, than because I actually have any kind of idea
> of what the heck is going on.
> 
> Andrew, Nick, Hugh - any ideas?
> 
>             Linus


diff --git a/fs/buffer.c b/fs/buffer.c
index d1f1b54..263f88e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag
int ret = 0;
 
BUG_ON(!PageLocked(page));
-   if (PageWriteback(page))
+   if (PageDirty(page) || PageWriteback(page))
return 0;
 
if (mapping == NULL) {  /* can this still happen? */
@@ -2845,22 +2845,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
spin_unlock(&mapping->private_lock);
-   if (ret) {
-   /*
-* If the filesystem writes its buffers by hand (eg ext3)
-* then we can have clean buffers against a dirty page.  We
-* clean the page here; otherwise later reattachment of buffers
-* could encounter a non-uptodate page, which is unresolvable.
-* This only applies in the rare case where try_to_free_buffers
-* succeeds but the page is not freed.
-*
-* Also, during truncate, discard_buffer will have marked all
-* the page's buffers clean.  We discover that here and clean
-* the page also.
-*/
-   if (test_clear_page_dirty(page))
-   task_io_account_cancelled_write(PAGE_CACHE_SIZE);
-   }
 out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..2d8 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);
 
if (PageWriteback(page) ||
-   !test_clear_page_dirty(page)) {
+   !test_clear_page_dirty(page, 0)) {
unlock_page(page);
break;
}
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1387749..da2bdb1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -484,7 +484,7 @@ static int fuse_commit_write(struct file
spin_unlock(&fc->lock);
 
if (offset == 0 && to == PAGE_CACHE_SIZE) {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
SetPageUptodate(page);
}
}
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ed2c223..9f82cd0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,7 +176,7 @@ static int hugetlbfs_commit_write(struct
 
 static void truncate_huge_page(struct page *page)
 {
-   clear_page_dirty(page);
+   clear_page_dirty(page, 0);
ClearPageUptodate(page);
remove_from_page_cache(page

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa

> > > If all of test_clear_page_dirty() has been commented out then the page will
> > > never become clean hence will never fall out of pagecache, so unless Andrei
> > > is doing a reboot before checking for corruption, perhaps the underlying
> > > data on-disk is incorrect, but we can't see it.
> > 
> > If I do a sync and echo 1 > /proc/sys/vm/drop_caches
> 
> OK, that works.
> 
> > is the reboot
> > still necessary?
> 
> It might be necessary to reboot in this case - if we're leaving the
> pagecache dirty, writing to drop_caches won't remove it.  And you probably
> won't be able to get a clean reboot either.
> 
> > > 
> > > Andrei, how _are_ you running this test?  What's the exact sequence of
> > > steps?
> > > 
> > > In particular, are you doing anything which would cause the corrupted file
> > > to be evicted from memory, thus forcing a read from disk?  Such as
> > > unmounting and then remounting the filesystem?
> > 
> > I boot Linux, start rtorrent and start the download. While it's
> > downloading I start evolution and check my mail (my mbox is very large,
> > several hundred megabytes). I close evolution (I use evolution just to
> > have another application which uses the filesystem and the memory), then
> > start evolution again. I start firefox. When the download is complete,
> > rtorrent says whether the hash is good or not. I run "unrar t qwe.rar" to
> > test that all 84 downloaded rar files are OK and look at the result.
> > 
> > > 
> > > The point of my question is to check that the data is really incorrect
> > > on-disk, or whether it is incorrect in pagecache.

I rebooted and the files are still broken after the reboot (tested twice),
so the data is incorrect on disk.

> > > 
> > > Also, it'd be useful if you could determine whether the bug appears with
> > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > > rootfstype=ext2 if it's the root filesystem.
> > 
> > I will test.

I will test in a couple of hours, I have some work to do...

> 
> ok, thanks.
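
The on-disk versus pagecache check discussed above could be scripted
roughly as follows. This is a minimal sketch and not the procedure that
was actually run: the function names are made up for illustration, the
expected checksum is assumed to come from a known-good copy of the file,
and the drop_caches write needs root. As noted above, drop_caches only
discards clean pagecache, so if pages are being left dirty a reboot may
still be needed.

```shell
#!/bin/bash
# Hypothetical helpers; the names below are illustrative only.

# checksum_of FILE: print the SHA-1 of FILE (sha1sum is part of coreutils).
checksum_of() {
    sha1sum "$1" | cut -d' ' -f1
}

# drop_clean_caches: write dirty data back, then ask the kernel to drop
# clean pagecache so that the next read has to come from disk.
# Returns non-zero if drop_caches is not writable (i.e. not root).
drop_clean_caches() {
    sync
    [ -w /proc/sys/vm/drop_caches ] || return 1
    echo 1 > /proc/sys/vm/drop_caches
}

# check_file FILE EXPECTED_SHA1: print "ok" if FILE matches, else "corrupt".
check_file() {
    if [ "$(checksum_of "$1")" = "$2" ]; then
        echo "ok"
    else
        echo "corrupt"
    fi
}
```

Assumed usage: run "drop_clean_caches && check_file qwe.rar <known-good sha1>".
A "corrupt" result after the caches were dropped points at the data on
disk rather than at a stale page in memory.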



Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa
I was mistaken, I'm still having file corruption with rtorrent.

On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote:
> On Sun, 17 Dec 2006 02:13:18 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > I had filesystem data corruption with rtorrent with 2.6.19.
> > I tried recent git with Peter Zijlstra patch
> > http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
> > fixed.
> > 
> 
> oh crap, I'd forgotten that test_clear_page_dirty() now fiddles with the
> ptes.
> 
> I'd be really surprised if this was all due to a race though.  Is everyone
> who has observed this problem running SMP and/or preemptible kernels?
> 
> Peter, why isn't that proposed patch's cleaning of the pte racy against
> do_wp_page()?



Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa

ierdnac ~ # uname -a
Linux ierdnac 2.6.20-rc1 #1 SMP PREEMPT Sun Dec 17 01:52:28 EET 2006
i686 Genuine Intel(R) CPU   T2050  @ 1.60GHz GenuineIntel
GNU/Linux


On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote:
> On Sun, 17 Dec 2006 02:13:18 +0200
> Andrei Popa <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > I had filesystem data corruption with rtorrent with 2.6.19.
> > I tried recent git with Peter Zijlstra patch
> > http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
> > fixed.
> > 
> 
> oh crap, I'd forgotten that test_clear_page_dirty() now fiddles with the
> ptes.
> 
> I'd be really surprised if this was all due to a race though.  Is everyone
> who has observed this problem running SMP and/or preemptible kernels?
> 
> Peter, why isn't that proposed patch's cleaning of the pte racy against
> do_wp_page()?



Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Andrei Popa
Hello,
I had filesystem data corruption with rtorrent on 2.6.19.
I tried a recent git tree with Peter Zijlstra's patch
http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is
fixed.

Please CC me as I am not subscribed to lkml.

Andrei


