On 07/03/2012 03:58 AM, Marc MERLIN wrote:

> On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
>> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
>>> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
>>>> Now, I'm also seeing these below and I have this again (86% CPU):
>>>>  6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-
>>>>
>>>> How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.
>>  
>> I reverted to 3.2.16 and haven't had further problems after dropping the
>> current snapshot that was corrupted in various ways.
>>
>> Now, I'm not sure when I should upgrade anymore since I haven't heard of
>> any fixes for what I saw.
>> Assuming I go forward again, is there something else I could have
>> provided to help debug?
> 
> Mmmh, ok. I understand that this code comes with no guarantees, and I have
> backups, but I'm reporting a problem that led to corruption (I had multiple
> files that were corrupted in my latest snapshot and I had to drop it and
> revert to an older snapshot and then out of fear for 3.4.4, went back to
> 3.2.16).
> 


Hi Marc,

Sorry for not replying to this earlier.

The dmesg log, sysrq log and stack dump info can usually be very helpful.

From your report, we can see both the csum errors and the hang in the log.
'no csum' is not that bad, while the hang is serious and dangerous.

So can you please capture a 'sysrq + w' log while the hang is happening and
paste it here? That log may tell us which task is blocking the other threads.
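
For reference, here is a minimal sketch of one way to capture that log from a
root shell while the hang is in progress. Writing 'w' to /proc/sysrq-trigger
asks the kernel to print all blocked (uninterruptible) tasks to the kernel log.
The function name, the SYSRQ_TRIGGER variable and the output file name are
only illustrative assumptions, not part of any existing tool.

```shell
#!/bin/sh
# Sketch: dump blocked (D-state) tasks via SysRq 'w' and save the kernel log.
# SYSRQ_TRIGGER and capture_blocked_tasks are illustrative names (assumptions).

SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

capture_blocked_tasks() {
    out="${1:-sysrq-w.log}"
    # 'w' asks the kernel to print every task in uninterruptible (blocked) state.
    echo w > "$SYSRQ_TRIGGER" || return 1
    # Give the kernel a moment to finish printing before reading the log.
    sleep 1
    # Save the whole kernel ring buffer, including the blocked-task traces.
    dmesg > "$out" || return 1
    echo "saved kernel log to $out"
}

# Usage (as root, while the hang is in progress):
#   capture_blocked_tasks /tmp/sysrq-w.log
```

If the machine is too stuck to run a shell, the same dump can usually be
requested from the keyboard with Alt+SysRq+W, provided sysrq is enabled.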

> I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
> that I didn't see any).


Feel free to use the latest upstream btrfs; it always contains some fixes.

thanks,
liubo

> Since my filesystem was a bit full, and that triggers problems with btrfs, I
> freed up 70GB
> gandalfthegreat:~# btrfs fi show
> Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
>       Total devices 1 FS bytes used 163.01GB
>       devid    1 size 231.02GB used 231.02GB path /dev/dm-0
> 
> I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
> kernel: [  832.108558] btrfs no csum found for inode 3896855 start 0
> kernel: [  832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0
> 
> How bad is this?
> 
> More generally, what was missing from my previous report (I gave all the
> sysrq I could output) that no one seemed to be able to use it?
> 

> Thanks,
> Marc
> 
>>> Back to 3.2.16, I'm now seeing this:
>>> [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
>>> [  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
>>> [  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
>>> [  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
>>> [  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
>>> [  840.516752] Call Trace:
>>> [  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
>>> [  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
>>> [  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
>>> [  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
>>> [  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
>>> [  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
>>> [  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
>>> [  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
>>> [  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
>>> [  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
>>> [  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
>>> [  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
>>> [  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
>>> [  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
>>> [  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
>>> [  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
>>> [  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
>>> [  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
>>> [  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
>>> [  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
>>> [  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
>>> [  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
>>> [  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
>>> [ 1187.553635] btrfs: unlinked 8 orphans
>>> [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
>>> [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
>>> [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI address [0x1840-0x185f])
>>> [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
>>> [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
>>> [ 3810.200147] e1000e 0000:00:19.0: PME# disabled
>>> [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
>>> [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
>>> [ 4799.384107] btrfs: unlinked 8 orphans
>>> [ 8436.512513] btrfs: unlinked 7 orphans
>>> [11350.749850] btrfs no csum found for inode 3909426 start 0
>>> [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
>>> [11652.088805] btrfs no csum found for inode 3910848 start 0
>>> [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0
>>>
>>> My firefox and chrome profiles were corrupted, so I had to restore them 
>>> from an old snapshot.
>>>
>>> I can't prove it, but it looks like my corruption happened right at the same
>>> time as I rebooted to 3.4.4.
>>>
>>> Marc
>>> -- 
>>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>>> Microsoft is to operating systems ....
>>>                                       .... what McDonalds is to gourmet cooking
>>> Home page: http://marc.merlins.org/  
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


