Am Samstag, 25. Februar 2012 schrieb Arne Jansen: > On 02/24/12 16:51, Martin Steigerwald wrote: > > Am Samstag, 21. Januar 2012 schrieb Martin Steigerwald: > >> Am Samstag, 21. Januar 2012 schrieb Martin Steigerwald: > >>> I still have this with 3.2.0-1-pae - which is a debian kernel based > >>> on 3.2.1. > >>> > >>> When I do btrfs scrub start / the machine locks immediately up > >>> hard. > >>> > >>> Then usually on next boot it stops on space_cache enabled message, > >>> but not the one for /, but the one for /home which is mounted > >>> later. > >>> > >>> When I then boot with 3.1 it works. BTRFS redos the space_cache > >>> then while the machine takes ages to boot - I mean ages - 10 > >>> minutes till KDM prompt is no problem there. > >> > >> I now tested scrubbing /home which is a different BTRFS filesystem > >> on the same machine. > >> > >> Then the scrub is started, scrub status tells me so, but nothing > >> happens, no block in/out activity in vmstat, no CPU related activity > >> in top. > >> > >> btrfs scrub cancel then hangs, but not the complete machine, only > >> the process. > >> > >> I had this once on my T520 with the internal Intel SSD 320 as well. > >> The other time it worked. > >> > >> Well maybe that is due to BTRFS doing something else on my T23 now: > >> > >> deepdance:~> ps aux | grep ino-cache | grep -v grep > >> root 1992 5.5 0.0 0 0 ? D 12:15 0:09 > >> [btrfs- ino-cache] > >> > >> Hmmm, so I just let it sit for a while, maybe eventually it will > >> scrub /home. > >> > >> At least it doesn´t lock up hard, so there might really be something > >> strange with /. > > > > FWIW a btrfs filesystem balance / does work. After this a btrfs scrub > > start / still locks the kernel. > > Hi Martin, > > I just sent 2 patches to the list. Could you please test if these > fix your problem with scrub?
I didn´t yet test it but I tried the first balance then scrub stuff again: deepdance:~> btrfs filesystem balance / This time scrub didn´t lock up hard: deepdance:~> btrfs scrub start / scrub started on /, fsid 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 (pid=3347) deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 10 seconds total bytes scrubbed: 272.59MB with 0 errors deepdance:~> cat /proc/version Linux version 3.2.0-1-686-pae (Debian 3.2.6-1) (b...@decadent.org.uk) (gcc version 4.6.2 (Debian 4.6.2-14) ) #1 SMP Fri Feb 17 06:27:21 UTC 2012 deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 55 seconds total bytes scrubbed: 1.29GB with 0 errors deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 175 seconds total bytes scrubbed: 2.58GB with 0 errors But it seems to be stuck now: deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 295 seconds total bytes scrubbed: 2.58GB with 0 errors deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 395 seconds total bytes scrubbed: 2.58GB with 0 errors deepdance:~> btrfs scrub status / scrub status for 2bf5b1dc-1d89-4f0d-a561-1a5551a27275 scrub started at Thu Mar 15 18:10:52 2012, running for 636 seconds total bytes scrubbed: 2.58GB with 0 errors There is almost no activity in vmstat 5: 2 0 108812 63812 36 412716 0 22 0 437 515 1723 47 13 37 3 2 0 108812 63320 36 413168 0 0 81 0 453 1600 44 12 43 1 2 0 108812 68056 36 413512 0 0 64 0 480 1589 51 11 38 0 2 0 108812 68112 36 413516 0 0 0 0 460 1454 46 11 43 0 1 0 108812 68112 36 413536 0 0 0 1 454 1456 45 11 44 0 1 1 108812 67924 36 413692 0 0 0 330 491 1578 49 13 37 1 2 0 108812 67808 36 413792 0 0 0 6 473 1556 48 11 41 0 2 0 108812 67436 36 414124 0 0 0 424 523 1922 47 13 39 2 1 0 108812 67460 36 414136 0 0 0 0 453 1578 46 9 45 0 2 0 108812 67476 36 414136 0 0 0 0 465 1493 46 12 41 0 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 3 0 108812 67492 36 414140 0 0 0 0 467 1609 48 10 42 0 2 0 108812 67500 36 414148 0 0 0 0 470 1686 46 12 41 0 1 0 108812 66872 36 414152 0 0 0 1 463 1585 46 11 43 0 1 0 108812 67492 36 414156 0 0 0 1 483 1612 48 13 39 0 2 0 108812 67428 36 414156 0 0 0 325 497 1654 48 9 41 1 0 0 108812 67452 36 414160 0 0 0 3 478 1588 46 15 39 0 2 0 108812 65840 36 415228 0 0 214 0 465 1408 54 10 36 0 0 0 108812 66124 36 415352 0 0 24 0 510 1476 62 14 22 2 Given the general slowness of the machine, I waited about 15 seconds on a tail -f /var/log/syslog while just a Konsole and Amarok was running and wait about that time or even longer when logging via SSH at times, I think next I try newest Debian kernel which should be up to 3.2.10. If that doesn´t work, I consider trying your patches. At least its a step forward: The machine doesn´t lock up hard anymore. Its still playing music. FWIW a cancel hangs: deepdance:~> btrfs scrub cancel / martin@deepdance:~> ps aux | grep btrfs | grep cancel root 3429 0.0 0.0 2128 304 pts/5 D+ 18:24 0:00 btrfs scrub cancel / Ah, now there we go, from dmesg: 16453.391004] btrfs: relocating block group 41393586176 flags 1 [16568.504524] btrfs: found 9891 extents [16660.810000] btrfs: found 9891 extents [16664.818271] btrfs: relocating block group 40319844352 flags 1 [16729.029078] btrfs: found 3593 extents [16755.771257] btrfs: found 3593 extents [16757.816472] btrfs: relocating block group 39246102528 flags 1 [16843.042862] btrfs: found 13147 extents [16898.078594] btrfs: found 13147 extents [16905.444650] btrfs: relocating block group 39237713920 flags 34 [16910.579963] btrfs: found 1 extents [16915.946553] btrfs: relocating block group 39103496192 flags 36 [16997.308374] btrfs: found 14552 extents [17002.239868] btrfs: relocating block group 38969278464 flags 36 [17165.253567] btrfs: found 22792 extents [17166.522513] btrfs: relocating block group 38835060736 flags 36 [17376.519168] btrfs: found 22857 extents [17377.526508] btrfs: relocating block group 38700843008 flags 36 [17559.413243] btrfs: found 24810 extents [17563.767590] btrfs: relocating block group 37627101184 flags 1 [17646.272226] btrfs: found 10900 extents [17694.323897] btrfs: found 10900 extents [17697.171187] btrfs: relocating block group 36553359360 flags 1 [17779.364231] btrfs: found 12149 extents [17828.395574] btrfs: found 12149 extents [17831.593055] btrfs: relocating block group 35479617536 flags 1 [17923.352624] btrfs: found 22174 extents [18014.147595] btrfs: found 22174 extents [18015.821756] btrfs: relocating block group 35345399808 flags 36 [18055.201043] btrfs: found 9166 extents [18061.692337] btrfs: relocating block group 34271657984 flags 1 [18173.986306] btrfs: found 29124 extents [18328.733328] btrfs: found 29124 extents [18333.141061] btrfs: relocating block group 33197916160 flags 1 [18427.523898] btrfs: found 26881 extents [18587.015489] btrfs: found 26880 extents [18590.183797] btrfs: relocating block group 33063698432 flags 36 [18590.183797] btrfs: relocating block group 33063698432 flags 36 [18624.883576] btrfs: found 6522 extents [18628.618415] btrfs: relocating block group 32929480704 flags 36 [18670.147756] btrfs: found 9876 extents [18671.426925] btrfs: relocating block group 32795262976 flags 36 [18863.313381] btrfs: found 27324 extents [18864.392712] btrfs: relocating block group 31721521152 flags 1 [18945.941315] btrfs: found 21748 extents [19076.164135] btrfs: found 21721 extents [20880.564211] INFO: task btrfs:3348 blocked for more than 120 seconds. [20880.564225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [20880.564234] btrfs D 000012d2 0 3348 1 0x00000000 [20880.564251] c0b72080 00000086 2cae1a17 000012d2 ec3ae5a0 00000000 000012d2 c14769c0 [20880.564275] c0b7222c c14769c0 00008050 e0b96ce0 ee000120 d1d5dae0 d1d5db20 00000028 [20880.564298] c105d451 00000092 c12b949b 00000092 f0b7037c 00000246 eee6073c c105d451 [20880.564321] Call Trace: [20880.564348] [<c105d451>] ? arch_local_irq_save+0xf/0x14 [20880.564368] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21 [20880.564482] [<f0b7037c>] ? btrfs_queue_worker+0x1cd/0x1fc [btrfs] [20880.564497] [<c105d451>] ? arch_local_irq_save+0xf/0x14 [20880.564511] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21 [20880.564526] [<c104d402>] ? prepare_to_wait+0x12/0x50 [20880.564601] [<f0b8a8c8>] ? btrfs_reada_wait+0x5b/0x84 [btrfs] [20880.564615] [<c104d4d0>] ? add_wait_queue+0x30/0x30 [20880.564687] [<f0b87e33>] ? scrub_stripe+0x34c/0xbda [btrfs] [20880.564729] [<f0a4da1f>] ? scsi_alloc_sgtable+0x1a/0x38 [scsi_mod] [20880.564786] [<f0b37a7f>] ? comp_keys+0x2f/0x34 [btrfs] [20880.564842] [<f0b37b4d>] ? generic_bin_search.constprop.31+0xc9/0x104 [btrfs] [20880.564856] [<c104d4d0>] ? add_wait_queue+0x30/0x30 [20880.564912] [<f0b37bbf>] ? bin_search+0x37/0x3d [btrfs] [20880.564987] [<f0b5e3d4>] ? __lookup_extent_mapping+0xe5/0x101 [btrfs] [20880.565061] [<f0b88781>] ? scrub_chunk.isra.13+0xc0/0xef [btrfs] [20880.565134] [<f0b889a3>] ? scrub_enumerate_chunks+0x1f3/0x23a [btrfs] [20880.565209] [<f0b89486>] ? btrfs_scrub_dev+0x215/0x34e [btrfs] [20880.565230] [<c10c061a>] ? __kmalloc_track_caller+0x9b/0xa7 [20880.565248] [<c102a280>] ? should_resched+0x5/0x1e [20880.565320] [<f0b74b14>] ? btrfs_ioctl+0xc55/0xda1 [btrfs] [20880.565335] [<c1029188>] ? kmap_atomic_prot+0x2f/0xe0 [20880.565353] [<c10ad3d5>] ? handle_mm_fault+0x1ee/0x1fd [20880.565426] [<f0b73ebf>] ? btrfs_ioctl_trans_end+0x49/0x49 [btrfs] [20880.565442] [<c10d7023>] ? do_vfs_ioctl+0x459/0x48f [20880.565457] [<c12bc3d7>] ? do_page_fault+0x2e0/0x2fc [20880.565469] [<c12bc3c4>] ? do_page_fault+0x2cd/0x2fc [20880.565484] [<c10d709d>] ? sys_ioctl+0x44/0x67 [20880.565499] [<c12bdd1f>] ? sysenter_do_call+0x12/0x28 [21000.564277] INFO: task btrfs:3348 blocked for more than 120 seconds. [21000.564292] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [21000.564302] btrfs D 000012d2 0 3348 1 0x00000000 [21000.564319] c0b72080 00000086 2cae1a17 000012d2 ec3ae5a0 00000000 000012d2 c14769c0 [21000.564344] c0b7222c c14769c0 00008050 e0b96ce0 ee000120 d1d5dae0 d1d5db20 00000028 [21000.564366] c105d451 00000092 c12b949b 00000092 f0b7037c 00000246 eee6073c c105d451 [21000.564390] Call Trace: [21000.564422] [<c105d451>] ? arch_local_irq_save+0xf/0x14 [21000.564442] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21 [21000.564582] [<f0b7037c>] ? btrfs_queue_worker+0x1cd/0x1fc [btrfs] [21000.564600] [<c105d451>] ? arch_local_irq_save+0xf/0x14 [21000.564614] [<c12b949b>] ? _raw_spin_lock_irqsave+0x8/0x21 [21000.564630] [<c104d402>] ? prepare_to_wait+0x12/0x50 [21000.564707] [<f0b8a8c8>] ? btrfs_reada_wait+0x5b/0x84 [btrfs] [21000.564721] [<c104d4d0>] ? add_wait_queue+0x30/0x30 [21000.564793] [<f0b87e33>] ? scrub_stripe+0x34c/0xbda [btrfs] [21000.564840] [<f0a4da1f>] ? scsi_alloc_sgtable+0x1a/0x38 [scsi_mod] [21000.564899] [<f0b37a7f>] ? comp_keys+0x2f/0x34 [btrfs] [21000.564954] [<f0b37b4d>] ? generic_bin_search.constprop.31+0xc9/0x104 [btrfs] [21000.564968] [<c104d4d0>] ? add_wait_queue+0x30/0x30 [21000.565024] [<f0b37bbf>] ? bin_search+0x37/0x3d [btrfs] [21000.565099] [<f0b5e3d4>] ? __lookup_extent_mapping+0xe5/0x101 [btrfs] [21000.565173] [<f0b88781>] ? scrub_chunk.isra.13+0xc0/0xef [btrfs] [21000.565246] [<f0b889a3>] ? scrub_enumerate_chunks+0x1f3/0x23a [btrfs] [21000.565322] [<f0b89486>] ? btrfs_scrub_dev+0x215/0x34e [btrfs] [21000.565345] [<c10c061a>] ? __kmalloc_track_caller+0x9b/0xa7 [21000.565363] [<c102a280>] ? should_resched+0x5/0x1e [21000.565436] [<f0b74b14>] ? btrfs_ioctl+0xc55/0xda1 [btrfs] [21000.565451] [<c1029188>] ? kmap_atomic_prot+0x2f/0xe0 [21000.565471] [<c10ad3d5>] ? handle_mm_fault+0x1ee/0x1fd [21000.565544] [<f0b73ebf>] ? btrfs_ioctl_trans_end+0x49/0x49 [btrfs] [21000.565561] [<c10d7023>] ? do_vfs_ioctl+0x459/0x48f [21000.565577] [<c12bc3d7>] ? do_page_fault+0x2e0/0x2fc [21000.565589] [<c12bc3c4>] ? do_page_fault+0x2cd/0x2fc [21000.565603] [<c10d709d>] ? sys_ioctl+0x44/0x67 [21000.565619] [<c12bdd1f>] ? sysenter_do_call+0x12/0x28 [… it seems to go on this way …] Maybe something of that is helpful? I leave the machine running for a while, but I think it won´t hibernate with a process stuck in D state so eventually at some time I will shut it down and restart. OTOH I could leave it running till tomorrow or what. I will put log files aside for further diagnostics in any case. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html