Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-07-05 Thread Liu Bo
On 07/04/2012 11:15 PM, Marc MERLIN wrote:

 On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote:
  The dmesg log, sysrq log and stack dump info can usually be very helpful.
  
  From your report, we can see the csum error and hang on log,
  'no csum' is not that bad while hanging-on is serious and dangerous.
  
  so can you please get any 'sysrq + w' log in the hanging-on case and paste 
  them here,
  and the log may tell us who blocks other threads.
 
 Hi, thanks for the answer.
  
 I dumped all sysrq data, that was in my original Email. Here are two
 different sysrq+w runs, as well as aborted transaction messages from that
 Email. 
 Sorry that the original was a bit long and contained a bunch of sysrq output.
 
From doing further testing since then, it does seem that the code just start
 doing bad things, including the file corruption I saw, when I'm running low
 on free space.
 
 Anything else that would help?
 
 Thanks,
 Marc
 


I'd expect to get some info from the following one, but I fails.

Is it reproducible on your box?

thanks,
liubo

 
 Back to 3.2.16, I'm now seeing this:
 [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
 [  840.516735] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [  840.516736] VirtualBox  D 8801fd134080 0  6818   6758 
 0x0080
 [  840.516740]  8801fd134080 0086 0050 
 880202e7f100
 [  840.516744]  00013580 8801c6f0dfd8 8801c6f0dfd8 
 8801fd134080
 [  840.516748]  8801c6f0da68 8801c6f0da68 88020a4e22f0 
 88023bc13e08
 [  840.516752] Call Trace:
 [  840.516755]  [810b5c67] ? __lock_page+0x66/0x66
 [  840.516758]  [8134aea4] ? io_schedule+0x58/0x6f
 [  840.516761]  [810b5c6d] ? sleep_on_page+0x6/0xa
 [  840.516764]  [8134b1e5] ? __wait_on_bit_lock+0x3c/0x85
 [  840.516767]  [810b5c62] ? __lock_page+0x61/0x66
 [  840.516770]  [81060051] ? autoremove_wake_function+0x2a/0x2a
 [  840.516785]  [a01838d7] ? 
 extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
 [  840.516789]  [810ec9cb] ? __cache_free.isra.40+0x19/0x1a7
 [  840.516792]  [8134ed52] ? sub_preempt_count+0x83/0x94
 [  840.516795]  [8134c2dd] ? _raw_spin_unlock+0x24/0x30
 [  840.516811]  [a0183c4b] ? extent_writepages+0x40/0x57 [btrfs]
 [  840.516826]  [a0177f5f] ? __btrfs_buffered_write+0x2bb/0x2dc 
 [btrfs]
 [  840.516841]  [a016e88a] ? uncompress_inline.isra.44+0x116/0x116 
 [btrfs]
 [  840.516844]  [810b6aaf] ? __filemap_fdatawrite_range+0x4b/0x50
 [  840.516847]  [810b6ad9] ? filemap_write_and_wait_range+0x25/0x4d
 [  840.516863]  [a01782ce] ? btrfs_file_aio_write+0x34e/0x490 
 [btrfs]
 [  840.516866]  [8103e092] ? get_parent_ip+0x9/0x1b
 [  840.516882]  [a0177f80] ? __btrfs_buffered_write+0x2dc/0x2dc 
 [btrfs]
 [  840.516886]  [8112f19c] ? aio_rw_vect_retry+0x70/0x18e
 [  840.516888]  [8112f12c] ? aio_fsync+0x22/0x22
 [  840.516891]  [8112fbc7] ? aio_run_iocb+0x72/0x11c
 [  840.516894]  [81130d9a] ? do_io_submit+0x6a4/0x7f9
 [  840.516898]  [813508d2] ? system_call_fastpath+0x16/0x1b
 [ 1187.553635] btrfs: unlinked 8 orphans
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-07-04 Thread Marc MERLIN
On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote:
 The dmesg log, sysrq log and stack dump info can usually be very helpful.
 
 From your report, we can see the csum error and hang on log,
 'no csum' is not that bad while hanging-on is serious and dangerous.
 
 so can you please get any 'sysrq + w' log in the hanging-on case and paste 
 them here,
 and the log may tell us who blocks other threads.

Hi, thanks for the answer.
 
I dumped all sysrq data, that was in my original Email. Here are two
different sysrq+w runs, as well as aborted transaction messages from that
Email. 
Sorry that the original was a bit long and contained a bunch of sysrq output.

From doing further testing since then, it does seem that the code just start
doing bad things, including the file corruption I saw, when I'm running low
on free space.

Anything else that would help?

Thanks,
Marc

 [84951.680847] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 
 #1
 [84951.680850] ktime   : 84951680.847904
 [84951.680851] sched_clk   : 55820767.832790
 [84951.680853] cpu_clk : 84951680.846676
 [84951.680855] jiffies : 4316130216
 [84951.680857] sched_clock_stable  : 0
 [84951.680858] 
 [84951.680859] sysctl_sched
 [84951.680860]   .sysctl_sched_latency: 12.00
 [84951.680862]   .sysctl_sched_min_granularity: 1.50
 [84951.680864]   .sysctl_sched_wakeup_granularity : 2.00
 [84951.680865]   .sysctl_sched_child_runs_first   : 0
 [84951.680867]   .sysctl_sched_features   : 24119
 [84951.680869]   .sysctl_sched_tunable_scaling: 1 (logaritmic)
 [84951.680871] 
 [84951.680872] cpu#0, 2527.110 MHz
 [84951.680873]   .nr_running: 4
 [84951.680875]   .load  : 1542
 [84951.680876]   .nr_switches   : 298015357
 [84951.680878]   .nr_load_updates   : 16858906
 [84951.680879]   .nr_uninterruptible: -7812
 [84951.680881]   .next_balance  : 4316.130229
 [84951.680883]   .curr-pid : 6859
 [84951.680884]   .clock : 84951679.104864
 [84951.680886]   .cpu_load[0]   : 1277
 [84951.680887]   .cpu_load[1]   : 885
 [84951.680889]   .cpu_load[2]   : 728
 [84951.680890]   .cpu_load[3]   : 693
 [84951.680892]   .cpu_load[4]   : 731
 [84951.680894] 
 [84951.680894] cfs_rq[0]:/autogroup-69
 [84951.680896]   .exec_clock: 0.00
 [84951.680898]   .MIN_vruntime  : 0.01
 [84951.680899]   .min_vruntime  : 194572.976125
 [84951.680901]   .max_vruntime  : 0.01
 [84951.680903]   .spread: 0.00
 [84951.680904]   .spread0   : -52160826.593685
 [84951.680906]   .nr_spread_over: 0
 [84951.680907]   .nr_running: 0
 [84951.680909]   .load  : 0
 [84951.680910]   .load_avg  : 5179.920896
 [84951.680912]   .load_period   : 8.310703
 [84951.680913]   .load_contrib  : 623
 [84951.680915]   .load_tg   : 623
 [84951.680917]   .se-exec_start: 84951675.234336
 [84951.680918]   .se-vruntime  : 52355393.270692
 [84951.680920]   .se-sum_exec_runtime  : 241715.485721
 [84951.680922]   .se-load.weight   : 2
 [84951.680923] 
 [84951.680924] cfs_rq[0]:/autogroup-20
 [84951.680925]   .exec_clock: 0.00
 [84951.680927]   .MIN_vruntime  : 0.01
 [84951.680929]   .min_vruntime  : 13027.823103
 [84951.680930]   .max_vruntime  : 0.01
 [84951.680932]   .spread: 0.00
 [84951.680934]   .spread0   : -52342371.746707
 [84951.680935]   .nr_spread_over: 0
 [84951.680937]   .nr_running: 0
 [84951.680938]   .load  : 0
 [84951.680940]   .load_avg  : 1279.999872
 [84951.680941]   .load_period   : 6.444962
 [84951.680943]   .load_contrib  : 198
 [84951.680944]   .load_tg   : 257
 [84951.680946]   .se-exec_start: 84951664.094034
 [84951.680948]   .se-vruntime  : 52355389.621584
 [84951.680950]   .se-sum_exec_runtime  : 13082.655453
 [84951.680951]   .se-load.weight   : 2
 [84951.680953] 
 [84951.680954] cfs_rq[0]:/autogroup-74
 [84951.680955]   .exec_clock: 0.00
 [84951.680957]   .MIN_vruntime  : 1323610.934982
 [84951.680959]   .min_vruntime  : 1323616.934982
 [84951.680960]   

Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-07-03 Thread Liu Bo
On 07/03/2012 03:58 AM, Marc MERLIN wrote:

 On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
 On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
 On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
 Now, I'm also seeing these below and I have this again (86% CPU):
  6076 root  20   0 000 R   86  0.0  29:40.11 
 btrfs-delalloc-

 How bad is it, doctor?  I think I'll be going back to 3.2.16 for now 
 though.
  
 I reverted to 3.2.16 and haven't had further problems after dropping the
 current snapshot that was corrupted in various ways.

 Now, I'm not sure when I should upgrade anymore since I haven't heard of
 any fixes for what I saw.
 Assuming I go forward again, is there something else I could have
 provided to help debug?
 
 Mmmh, ok. I understand that this code comes with no guarantees, and I have
 backups, but I'm reporting a problem that lead to corruption (I had multiple
 files that were corrupted in my latest snapshot and I had to drop it and
 revert to an older snapshot and then out of fear for 3.4.4, went back to
 3.2.16).
 


Hi Marc,

Sorry for not replying this earlier.

The dmesg log, sysrq log and stack dump info can usually be very helpful.

From your report, we can see the csum error and hang on log,
'no csum' is not that bad while hanging-on is serious and dangerous.

so can you please get any 'sysrq + w' log in the hanging-on case and paste them 
here,
and the log may tell us who blocks other threads.

 I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
 that I didn't see any).


Feel free to use the latest btrfs upstream, it always contains some fixes.

thanks,
liubo

 Since my filesystem was a bit full, and that triggers problems with btrfs, I
 freed up 70GB
 gandalfthegreat:~# btrfs fi show
 Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
   Total devices 1 FS bytes used 163.01GB
   devid1 size 231.02GB used 231.02GB path /dev/dm-0
 
 I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
 kernel: [  832.108558] btrfs no csum found for inode 3896855 start 0
 kernel: [  832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 
 private 0
 
 How bad is this?
 
 More generally, what was missing from my previous report (I gave all the
 sysrq I could output) that no one seemed to be able to use it?
 

 Thanks,
 Marc
 
 Back to 3.2.16, I'm now seeing this:
 [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
 [  840.516735] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
 this message.
 [  840.516736] VirtualBox  D 8801fd134080 0  6818   6758 
 0x0080
 [  840.516740]  8801fd134080 0086 0050 
 880202e7f100
 [  840.516744]  00013580 8801c6f0dfd8 8801c6f0dfd8 
 8801fd134080
 [  840.516748]  8801c6f0da68 8801c6f0da68 88020a4e22f0 
 88023bc13e08
 [  840.516752] Call Trace:
 [  840.516755]  [810b5c67] ? __lock_page+0x66/0x66
 [  840.516758]  [8134aea4] ? io_schedule+0x58/0x6f
 [  840.516761]  [810b5c6d] ? sleep_on_page+0x6/0xa
 [  840.516764]  [8134b1e5] ? __wait_on_bit_lock+0x3c/0x85
 [  840.516767]  [810b5c62] ? __lock_page+0x61/0x66
 [  840.516770]  [81060051] ? autoremove_wake_function+0x2a/0x2a
 [  840.516785]  [a01838d7] ? 
 extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
 [  840.516789]  [810ec9cb] ? __cache_free.isra.40+0x19/0x1a7
 [  840.516792]  [8134ed52] ? sub_preempt_count+0x83/0x94
 [  840.516795]  [8134c2dd] ? _raw_spin_unlock+0x24/0x30
 [  840.516811]  [a0183c4b] ? extent_writepages+0x40/0x57 [btrfs]
 [  840.516826]  [a0177f5f] ? __btrfs_buffered_write+0x2bb/0x2dc 
 [btrfs]
 [  840.516841]  [a016e88a] ? 
 uncompress_inline.isra.44+0x116/0x116 [btrfs]
 [  840.516844]  [810b6aaf] ? __filemap_fdatawrite_range+0x4b/0x50
 [  840.516847]  [810b6ad9] ? 
 filemap_write_and_wait_range+0x25/0x4d
 [  840.516863]  [a01782ce] ? btrfs_file_aio_write+0x34e/0x490 
 [btrfs]
 [  840.516866]  [8103e092] ? get_parent_ip+0x9/0x1b
 [  840.516882]  [a0177f80] ? __btrfs_buffered_write+0x2dc/0x2dc 
 [btrfs]
 [  840.516886]  [8112f19c] ? aio_rw_vect_retry+0x70/0x18e
 [  840.516888]  [8112f12c] ? aio_fsync+0x22/0x22
 [  840.516891]  [8112fbc7] ? aio_run_iocb+0x72/0x11c
 [  840.516894]  [81130d9a] ? do_io_submit+0x6a4/0x7f9
 [  840.516898]  [813508d2] ? system_call_fastpath+0x16/0x1b
 [ 1187.553635] btrfs: unlinked 8 orphans
 [ 3810.200064] e1000e :00:19.0: BAR 0: set to [mem 
 0xfc00-0xfc01] (PCI address [0xfc00-0xfc01])
 [ 3810.200071] e1000e :00:19.0: BAR 1: set to [mem 
 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
 [ 3810.200076] e1000e :00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI 
 address [0x1840-0x185f])
 

Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-07-02 Thread Marc MERLIN
On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
 On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
  On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
   Now, I'm also seeing these below and I have this again (86% CPU):
6076 root  20   0 000 R   86  0.0  29:40.11 
   btrfs-delalloc-
   
   How bad is it, doctor?  I think I'll be going back to 3.2.16 for now 
   though.
  
 I reverted to 3.2.16 and haven't had further problems after dropping the
 current snapshot that was corrupted in various ways.
 
 Now, I'm not sure when I should upgrade anymore since I haven't heard of
 any fixes for what I saw.
 Assuming I go forward again, is there something else I could have
 provided to help debug?

Mmmh, ok. I understand that this code comes with no guarantees, and I have
backups, but I'm reporting a problem that lead to corruption (I had multiple
files that were corrupted in my latest snapshot and I had to drop it and
revert to an older snapshot and then out of fear for 3.4.4, went back to
3.2.16).

I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
that I didn't see any).
Since my filesystem was a bit full, and that triggers problems with btrfs, I
freed up 70GB
gandalfthegreat:~# btrfs fi show
Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
Total devices 1 FS bytes used 163.01GB
devid1 size 231.02GB used 231.02GB path /dev/dm-0

I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
kernel: [  832.108558] btrfs no csum found for inode 3896855 start 0
kernel: [  832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 
private 0

How bad is this?

More generally, what was missing from my previous report (I gave all the
sysrq I could output) that no one seemed to be able to use it?

Thanks,
Marc

  Back to 3.2.16, I'm now seeing this:
  [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
  [  840.516735] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables 
  this message.
  [  840.516736] VirtualBox  D 8801fd134080 0  6818   6758 
  0x0080
  [  840.516740]  8801fd134080 0086 0050 
  880202e7f100
  [  840.516744]  00013580 8801c6f0dfd8 8801c6f0dfd8 
  8801fd134080
  [  840.516748]  8801c6f0da68 8801c6f0da68 88020a4e22f0 
  88023bc13e08
  [  840.516752] Call Trace:
  [  840.516755]  [810b5c67] ? __lock_page+0x66/0x66
  [  840.516758]  [8134aea4] ? io_schedule+0x58/0x6f
  [  840.516761]  [810b5c6d] ? sleep_on_page+0x6/0xa
  [  840.516764]  [8134b1e5] ? __wait_on_bit_lock+0x3c/0x85
  [  840.516767]  [810b5c62] ? __lock_page+0x61/0x66
  [  840.516770]  [81060051] ? autoremove_wake_function+0x2a/0x2a
  [  840.516785]  [a01838d7] ? 
  extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
  [  840.516789]  [810ec9cb] ? __cache_free.isra.40+0x19/0x1a7
  [  840.516792]  [8134ed52] ? sub_preempt_count+0x83/0x94
  [  840.516795]  [8134c2dd] ? _raw_spin_unlock+0x24/0x30
  [  840.516811]  [a0183c4b] ? extent_writepages+0x40/0x57 [btrfs]
  [  840.516826]  [a0177f5f] ? __btrfs_buffered_write+0x2bb/0x2dc 
  [btrfs]
  [  840.516841]  [a016e88a] ? 
  uncompress_inline.isra.44+0x116/0x116 [btrfs]
  [  840.516844]  [810b6aaf] ? __filemap_fdatawrite_range+0x4b/0x50
  [  840.516847]  [810b6ad9] ? 
  filemap_write_and_wait_range+0x25/0x4d
  [  840.516863]  [a01782ce] ? btrfs_file_aio_write+0x34e/0x490 
  [btrfs]
  [  840.516866]  [8103e092] ? get_parent_ip+0x9/0x1b
  [  840.516882]  [a0177f80] ? __btrfs_buffered_write+0x2dc/0x2dc 
  [btrfs]
  [  840.516886]  [8112f19c] ? aio_rw_vect_retry+0x70/0x18e
  [  840.516888]  [8112f12c] ? aio_fsync+0x22/0x22
  [  840.516891]  [8112fbc7] ? aio_run_iocb+0x72/0x11c
  [  840.516894]  [81130d9a] ? do_io_submit+0x6a4/0x7f9
  [  840.516898]  [813508d2] ? system_call_fastpath+0x16/0x1b
  [ 1187.553635] btrfs: unlinked 8 orphans
  [ 3810.200064] e1000e :00:19.0: BAR 0: set to [mem 
  0xfc00-0xfc01] (PCI address [0xfc00-0xfc01])
  [ 3810.200071] e1000e :00:19.0: BAR 1: set to [mem 
  0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
  [ 3810.200076] e1000e :00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI 
  address [0x1840-0x185f])
  [ 3810.200093] e1000e :00:19.0: restoring config space at offset 0xf 
  (was 0x100, writing 0x10b)
  [ 3810.200115] e1000e :00:19.0: restoring config space at offset 0x1 
  (was 0x10, writing 0x100107)
  [ 3810.200147] e1000e :00:19.0: PME# disabled
  [ 3810.200224] e1000e :00:19.0: irq 45 for MSI/MSI-X
  [ 4671.144685] iwlwifi :03:00.0: Tx aggregation enabled on ra = 
  2c:b0:5d:3c:7d:f1 tid = 1
  [ 4799.384107] btrfs: unlinked 8 orphans
  [ 8436.512513] btrfs: 

Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-06-26 Thread Marc MERLIN
On Tue, Jun 26, 2012 at 12:36:37PM -0700, Marc MERLIN wrote:
 I was fine with btrfs until 3.2.16, but when going to 3.4 or 3.4.4, I'm
 having my system randomly not wanting to suspend to RAM.
 
 My suspend light started flashing but the system wasn't suspending.
 I was able to unlock X, run top, saw that btrfs-delalloc-??? was taking
 98% CPU, and any command that touched disk hung 
 (so I could log in, but would not get a shell).
 
 I did the sysrq commands, which took a long time to work (they stacked
 up with nothing happening and maybe 10mn later, the system unhung itself
 and they ran, maybe too late, not sure).
 
 Interestingly the system did go to sleep and resumed ok after that, it
 just took 20mn to get there.
 
 It's all pasted below.
 
 gandalfthegreat:~# btrfs fi show
 Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
   Total devices 1 FS bytes used 214.44GB
   devid1 size 231.02GB used 231.02GB path /dev/dm-0
 
 Btrfs Btrfs v0.19
 gandalfthegreat:~# 

Now, I'm also seeing these below and I have this again (86% CPU):
 6076 root  20   0 000 R   86  0.0  29:40.11 btrfs-delalloc-

How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.

[100415.369301] BTRFS warning (device dm-0): Aborting unused transaction.
[100415.739337] BTRFS warning (device dm-0): Aborting unused transaction.
[100416.093839] BTRFS warning (device dm-0): Aborting unused transaction.
[100416.433245] BTRFS warning (device dm-0): Aborting unused transaction.
[100709.352445] usb 1-5: USB disconnect, device number 8
[100710.197602] thinkpad_acpi: EC reports that Thermal Table has changed
[100710.199954] ACPI: \_SB_.GDCK - undocking
[100711.358679] wlan0: deauthenticating from 00:24:6c:67:03:d1 by local choice 
(reason=3)
[100713.044214] e1000e :00:19.0: irq 46 for MSI/MSI-X
[101447.801190] btrfs no csum found for inode 3910588 start 5144576
[101447.802277] btrfs csum failed ino 3910588 off 5144576 csum 2096800889 
private 0
[101447.957161] btrfs no csum found for inode 3910588 start 5996544
[101447.957990] btrfs csum failed ino 3910588 off 5996544 csum 2096800889 
private 0
[101447.960459] btrfs no csum found for inode 3910588 start 6062080
[101447.961351] btrfs csum failed ino 3910588 off 6062080 csum 2096800889 
private 0
[101447.962459] btrfs no csum found for inode 3910588 start 6225920
[101447.963543] btrfs csum failed ino 3910588 off 6225920 csum 2096800889 
private 0
[101447.964369] btrfs no csum found for inode 3910588 start 6291456
[101447.965045] btrfs csum failed ino 3910588 off 6291456 csum 2096800889 
private 0
[101447.966406] btrfs no csum found for inode 3910588 start 6356992
[101447.967392] btrfs csum failed ino 3910588 off 6356992 csum 2123070294 
private 0
[101448.088912] btrfs no csum found for inode 3910588 start 6553600
[101448.088971] btrfs no csum found for inode 3910588 start 6586368
[101448.090058] btrfs csum failed ino 3910588 off 6586368 csum 2096800889 
private 0
[101448.090305] btrfs csum failed ino 3910588 off 6553600 csum 4253301504 
private 0
[101448.093231] btrfs no csum found for inode 3910588 start 6717440
[101448.094269] btrfs csum failed ino 3910588 off 6717440 csum 2096800889 
private 0
[101448.096104] btrfs no csum found for inode 3910588 start 6848512
[101448.097002] btrfs csum failed ino 3910588 off 6848512 csum 3282939717 
private 0
[101448.098249] btrfs no csum found for inode 3910588 start 6946816
[101448.101561] btrfs no csum found for inode 3910588 start 7045120
[101448.106352] btrfs no csum found for inode 3910588 start 7176192
[101448.108224] btrfs no csum found for inode 3910588 start 7241728
[101448.33] btrfs no csum found for inode 3910588 start 7307264
[101448.127081] btrfs no csum found for inode 3910588 start 7569408
[101448.144351] btrfs no csum found for inode 3910588 start 7864320
[101448.155857] btrfs no csum found for inode 3910588 start 8060928
[101448.165868] btrfs no csum found for inode 3910588 start 8257536
[101448.175275] btrfs no csum found for inode 3910588 start 8454144
[101448.176087] btrfs no csum found for inode 3910588 start 8552448
[101448.180927] btrfs no csum found for inode 3910588 start 8847360
[101448.183489] btrfs no csum found for inode 3910588 start 8978432
[101448.192909] btrfs no csum found for inode 3910588 start 9371648
[101448.198959] btrfs no csum found for inode 3910588 start 9568256
[101448.413716] btrfs no csum found for inode 3910588 start 5439488
[101448.413996] btrfs no csum found for inode 3910588 start 5537792
[101448.414379] btrfs no csum found for inode 3910588 start 5603328
[101448.416386] btrfs no csum found for inode 3910588 start 5734400
[101448.424108] btrfs no csum found for inode 3910588 start 5799936
[101448.721649] btrfs no csum found for inode 3910588 start 9699328
[101448.737377] btrfs no csum found for inode 3910588 start 9863168
[101448.743929] btrfs no csum found for inode 3910588 start 10027008
[101448.753186] btrfs no csum found for inode 

Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction

2012-06-26 Thread Marc MERLIN
On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
 Now, I'm also seeing these below and I have this again (86% CPU):
  6076 root  20   0 000 R   86  0.0  29:40.11 btrfs-delalloc-  
   
 
 How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.

Back to 3.2.16, I'm now seeing this:
[  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
[  840.516735] echo 0  /proc/sys/kernel/hung_task_timeout_secs disables this 
message.
[  840.516736] VirtualBox  D 8801fd134080 0  6818   6758 0x0080
[  840.516740]  8801fd134080 0086 0050 
880202e7f100
[  840.516744]  00013580 8801c6f0dfd8 8801c6f0dfd8 
8801fd134080
[  840.516748]  8801c6f0da68 8801c6f0da68 88020a4e22f0 
88023bc13e08
[  840.516752] Call Trace:
[  840.516755]  [810b5c67] ? __lock_page+0x66/0x66
[  840.516758]  [8134aea4] ? io_schedule+0x58/0x6f
[  840.516761]  [810b5c6d] ? sleep_on_page+0x6/0xa
[  840.516764]  [8134b1e5] ? __wait_on_bit_lock+0x3c/0x85
[  840.516767]  [810b5c62] ? __lock_page+0x61/0x66
[  840.516770]  [81060051] ? autoremove_wake_function+0x2a/0x2a
[  840.516785]  [a01838d7] ? 
extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
[  840.516789]  [810ec9cb] ? __cache_free.isra.40+0x19/0x1a7
[  840.516792]  [8134ed52] ? sub_preempt_count+0x83/0x94
[  840.516795]  [8134c2dd] ? _raw_spin_unlock+0x24/0x30
[  840.516811]  [a0183c4b] ? extent_writepages+0x40/0x57 [btrfs]
[  840.516826]  [a0177f5f] ? __btrfs_buffered_write+0x2bb/0x2dc 
[btrfs]
[  840.516841]  [a016e88a] ? uncompress_inline.isra.44+0x116/0x116 
[btrfs]
[  840.516844]  [810b6aaf] ? __filemap_fdatawrite_range+0x4b/0x50
[  840.516847]  [810b6ad9] ? filemap_write_and_wait_range+0x25/0x4d
[  840.516863]  [a01782ce] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
[  840.516866]  [8103e092] ? get_parent_ip+0x9/0x1b
[  840.516882]  [a0177f80] ? __btrfs_buffered_write+0x2dc/0x2dc 
[btrfs]
[  840.516886]  [8112f19c] ? aio_rw_vect_retry+0x70/0x18e
[  840.516888]  [8112f12c] ? aio_fsync+0x22/0x22
[  840.516891]  [8112fbc7] ? aio_run_iocb+0x72/0x11c
[  840.516894]  [81130d9a] ? do_io_submit+0x6a4/0x7f9
[  840.516898]  [813508d2] ? system_call_fastpath+0x16/0x1b
[ 1187.553635] btrfs: unlinked 8 orphans
[ 3810.200064] e1000e :00:19.0: BAR 0: set to [mem 0xfc00-0xfc01] 
(PCI address [0xfc00-0xfc01])
[ 3810.200071] e1000e :00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] 
(PCI address [0xfc025000-0xfc025fff])
[ 3810.200076] e1000e :00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI 
address [0x1840-0x185f])
[ 3810.200093] e1000e :00:19.0: restoring config space at offset 0xf (was 
0x100, writing 0x10b)
[ 3810.200115] e1000e :00:19.0: restoring config space at offset 0x1 (was 
0x10, writing 0x100107)
[ 3810.200147] e1000e :00:19.0: PME# disabled
[ 3810.200224] e1000e :00:19.0: irq 45 for MSI/MSI-X
[ 4671.144685] iwlwifi :03:00.0: Tx aggregation enabled on ra = 
2c:b0:5d:3c:7d:f1 tid = 1
[ 4799.384107] btrfs: unlinked 8 orphans
[ 8436.512513] btrfs: unlinked 7 orphans
[11350.749850] btrfs no csum found for inode 3909426 start 0
[11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
[11652.088805] btrfs no csum found for inode 3910848 start 0
[11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0

My firefox and chrome profiles were corrupted, so I had to restore them from an 
old snapshot.

I can't prove it, but it looks like my corruption happened right at the same
time than I rebooted to 3.4.4.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html