Re: Changing label few times killed filesystem?
Since nobody had any other suggestions, I decided to run a modified btrfsck with the --repair option (with the BUG_ON(rec->is_root) assertion removed). Surprisingly, the modified btrfsck --repair fixed all errors but one (according to btrfsck), and btrfsck asked me to run btrfsck --repair one more time to fix the remaining error. Mounting still did not work at this point, so I did what btrfsck suggested. At first it said it had fixed the remaining error, but then it found many more errors (I am not sure whether btrfsck caused them, or whether they were already present and fixing the remaining error just uncovered them).

btrfs restore (with or without the -t option) returns with a zero exit code without even attempting to do anything (as it did before I tried --repair). Mounting with or without the recovery option produces the same errors (they were exactly the same before --repair, so I already mentioned them in my previous message, but for convenience I repeat them in the log below). btrfs rescue chunk-recover and btrfs rescue super-recover say that everything is OK.

Does anybody have any ideas or suggestions? Please do not be afraid to suggest something risky - at this point I have nothing to lose, because if I cannot restore the files or provide further debug information for developers, I have to reformat this partition anyway. Ideas about what could have caused this corruption are also welcome, because currently I find it hard to believe that relabeling or mounting/unmounting were the only reasons.

Below I show exactly what I did, along with parts of the terminal output (for readability I removed repeated similar messages; please download the full log if you are interested).

# btrfsck --repair /dev/sdb1
# Full log can be downloaded here: http://pastebin.com/MdyjxY4w
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
ref mismatch on [20971520 16384] extent item 0, found 1
adding new tree backref on start 20971520 len 16384 parent 3 root 3
Backref 20971520 parent 3 root 3 not found in extent tree
backpointer mismatch on [20971520 16384]
...
owner ref check failed [47529984 16384]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0 errors 3, no dir item, no dir index
Failed to find [30769152, 168, 16384]
btrfs unable to find ref byte nr 30769152 parent 0 root 5 owner 0 offset 1
reset isize for dir 6 root 5
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 errors 3, no dir item, no dir index
root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
 referenced 0
Btrfs v3.17

To my surprise, btrfsck showed great improvements (after btrfsck --repair) and asked me to run btrfsck --repair one more time to fix the remaining error:

# btrfsck /dev/sdb1
root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

Before running btrfsck --repair again, I tried to mount, but it did not work:

# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
# dmesg | tail
...
[268827.386951] BTRFS info (device sdb1): disk space caching is enabled
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.405135] BTRFS: open_ctree failed

Since btrfsck told me to run it with the --repair option again, I did:

# btrfsck --repair /dev/sdb1
# Full log is available here: http://pastebin.com/pcWte3Ru
enabling repair mode
fixing root item for root 18446744073709551607, current bytenr 29540352, current gen 2758, current level 0, new bytenr 29540352, new gen 4294967296, new level 1
Fixed 1 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
parent transid verify failed on 29425664 wanted 1087 found 2763
...
Ignoring transid failure
leaf parent key incorrect 29425664
bad block 29425664
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) is not found in block group
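Some of the large numbers in the output above are easier to read once decoded. The following is a minimal sketch, not part of btrfs-progs: the helper names are hypothetical, and the small lookup tables are filled in from my reading of the btrfs on-disk format headers (key type 168 as EXTENT_ITEM, 228 as CHUNK_ITEM, objectid -9 as the data relocation tree), so treat them as assumptions to verify against your own source tree. It shows that root 18446744073709551607 is simply -9 printed as an unsigned 64-bit value, and that the "new gen" 4294967296 is exactly 2**32.

```python
# Illustrative helpers only; names are hypothetical, not btrfs-progs API.

U64 = 1 << 64

# Assumed values, taken from the btrfs on-disk format definitions.
KEY_TYPES = {168: "EXTENT_ITEM", 228: "CHUNK_ITEM"}
SPECIAL_OBJECTIDS = {-9: "DATA_RELOC_TREE"}


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit number as a signed one."""
    return value - U64 if value >= (1 << 63) else value


def describe_objectid(value: int) -> str:
    """Render an objectid, naming it if it is a known special tree."""
    signed = as_signed64(value)
    name = SPECIAL_OBJECTIDS.get(signed)
    return f"{signed} ({name})" if name else str(signed)


def describe_key(objectid: int, key_type: int, offset: int) -> str:
    """Render an (objectid, type, offset) key triple with a readable type."""
    kind = KEY_TYPES.get(key_type, f"type {key_type}")
    return f"({objectid} {kind} {offset})"


if __name__ == "__main__":
    print(describe_objectid(18446744073709551607))  # root from the log
    print(describe_key(30769152, 168, 16384))       # "Failed to find [...]"
    print(describe_key(256, 228, 0))                # "Chunk[256, 228, 0]"
    print(4294967296 == 2**32)                      # the suspicious "new gen"
```

A generation of exactly 2**32 next to a current generation of 2758 looks more like a bogus value than a real transaction count, which may be worth mentioning to the developers.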
Re: Changing label few times killed filesystem?
On 2014-11-24 02:46, Duncan wrote:
> if you were using gmane's web service, that explains things as weaverd, the process that does the threading on the web side, was down for some days

Yes, I used the gmane blog. Good to know it is not down anymore.

Back on topic. Even after updating to the latest version, btrfsck still does not work with any of its options, including --repair. Does anyone know what Assertion `rec->is_root` failed means? Is it worth trying to compile my own version of btrfsck without this assertion? With or without the --repair option, it looks like this assertion stops btrfsck very early, preventing it from checking the filesystem or attempting to repair it.

# btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb275f24b45]
btrfs check[0x40b497]

# btrfsck --repair /dev/sdb1
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc5b8dab45]
btrfs check[0x40b497]

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Changing label few times killed filesystem?
In an attempt to get more information, I commented out BUG_ON(rec->is_root) in cmds-check.c to let btrfsck check my file system without failing on this assertion. Below you can see the output. I would appreciate any help or ideas...

# btrfsck /dev/sdb1
# Full log can be downloaded here: http://pastebin.com/D68vr69J
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...messages like these repeat many times, download full log to see them all...
ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 not found in extent tree
backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
...
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 not found
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
 referenced 0
Btrfs v3.17
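Because the same three messages repeat many times in the full log, a short summary can be easier to share than the raw output. The sketch below is only an illustration: the function name and the sample text are assumptions, and the patterns match just the three message shapes quoted above, not everything btrfsck can print. It counts each message kind and collects the distinct extent start addresses.

```python
import re
from collections import Counter

# Patterns for the three repeated message kinds seen in the output above.
PATTERNS = {
    "ref mismatch": re.compile(r"^ref mismatch on \[(\d+) (\d+)\]"),
    "backpointer mismatch": re.compile(r"^backpointer mismatch on \[(\d+) (\d+)\]"),
    "owner ref check failed": re.compile(r"^owner ref check failed \[(\d+) (\d+)\]"),
}

# Sample copied from the excerpt above (not the full pastebin log).
SAMPLE = """\
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 not found in extent tree
backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
"""


def summarize(log_text):
    """Return (counts per message kind, sorted list of affected extent starts)."""
    counts = Counter()
    extents = set()
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                counts[name] += 1
                extents.add(int(match.group(1)))
                break
    return counts, sorted(extents)


if __name__ == "__main__":
    counts, extents = summarize(SAMPLE)
    for name, count in counts.items():
        print(f"{name}: {count}")
    print("affected extents:", extents)
```

Run against the full log saved to a file, the same function would show whether every affected extent produces all three messages or only some of them.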
Re: Changing label few times killed filesystem?
> I suggest upgrading and just posting the results from 'btrfs check device' without any options and see what you get.

OK, I have upgraded to the 3.17.0 kernel and have also upgraded btrfs-tools:

# btrfs --version
Btrfs v3.17
# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x41a081]
btrfs[0x41a0a5]
btrfs[0x409783]
btrfs[0x40a45e]
btrfs[0x41bfa9]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7feaf251cb45]
btrfs[0x40b497]

btrfsck /dev/sdb1 gives exactly the same output. It seems it does not even try to check anything but just fails on the assertion. I also tried btrfs restore:

# btrfs restore /dev/sdb1 /media/backup/sdb1
# Does nothing and exits almost immediately
# echo $?
0

Now that I have upgraded to the new kernel, this is what I get when I try to mount the partition:

# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
# dmesg | tail
...
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed

> However, if you are not now and never did use compression on that filesystem, that bug shouldn't affect you, but others might.

I did not use compression on this partition, but I have used it on another btrfs disk (which seems to work fine, at least for now). I think I did not use any special features on the partition I have trouble with (I was planning to, but it died before I got a chance).

> it's quite possible you're seeing the one bug, and the relabeling is simply coincidence.

I suppose it is possible that something else was the cause, but the only other thing I did with the file system at the time was mounting/unmounting it. Also, I did not use it much, just for a few weeks; before that the disk was unplugged for a few months (with no files on it). The only things I did with it (before it stopped working) were creating, moving, copying and deleting files.

Before upgrading btrfs-tools and the kernel, I tried to reproduce the issue by creating a big file containing a btrfs file system, but I was unable to reproduce the problem. However, I did not put as many files on it as on the real partition, and it was smaller. In other words, the issue I have encountered seems to be hard to reproduce, so I cannot tell with 100% certainty what exactly caused the corruption.

Is there anything else I can try? If not to restore it, then to provide more useful debug information (if that is possible in this case). I could try compiling the latest development versions of the kernel and/or btrfs-tools if there is a chance that might help.

P.S. I received by mail only the shortest reply, the one about the mount command, so I was able to read the other replies only a few days later, when they appeared on gmane (I wasn't subscribed at the time because I did not expect gmane to be so slow). This time I have subscribed to the list, so hopefully I will be able to read all replies without delay.
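As far as I understand the message, "parent transid verify failed on X wanted Y found Z" means that the parent node expects the tree block at byte number X to carry generation Y, while the block actually read from disk carries generation Z. The sketch below is only an illustration (the class and function names are hypothetical, and the regular expression covers just the message form shown in the dmesg output above); it pulls those three numbers out of a log and drops exact repeats.

```python
import re
from typing import List, NamedTuple


class TransidFailure(NamedTuple):
    bytenr: int   # byte number of the tree block that failed verification
    wanted: int   # generation the parent pointer expected
    found: int    # generation stored in the block on disk


LINE_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

# Sample copied from the dmesg output above.
SAMPLE = """\
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed
"""


def parse_transid_failures(dmesg_text: str) -> List[TransidFailure]:
    """Return the distinct transid failures in order of first appearance."""
    failures: List[TransidFailure] = []
    for match in LINE_RE.finditer(dmesg_text):
        failure = TransidFailure(*(int(group) for group in match.groups()))
        if failure not in failures:
            failures.append(failure)
    return failures


if __name__ == "__main__":
    for failure in parse_transid_failures(SAMPLE):
        gap = failure.found - failure.wanted
        print(f"block {failure.bytenr}: wanted gen {failure.wanted}, "
              f"found gen {failure.found} (difference {gap})")
```

Here the block is thousands of generations newer than its parent expects, which suggests the parent pointer is stale rather than the block being slightly out of date.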
Re: Changing label few times killed filesystem?
On 2014-11-21 04:35, Roman Mamedov wrote:
> On Fri, 21 Nov 2014 01:27:17 + Boris Chernov aqs1...@hotmail.com wrote:
>> I have changed file system label few times in total. When I tried to mount it after that, it became not mountable:
>>
>> # mount /dev/sdb1 /mnt
>> mount: Not a directory
>
> I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1. Before mounting try things like ls -la /mnt/, umount /mnt, etc. Or simply mounting somewhere else other than /mnt/

Before I attempted mounting to /mnt, I tried to mount with KDE Device Notifier to /media/username/label, then I tried to create a directory manually in /media/ and mount from the command line, then I tried /mnt, and the error was the same. So I'm sure there is nothing wrong with my mount points.

Now I have rebooted and tried to mount with KDE Device Notifier to /media/username/label. It failed again, so I tried from the command line as root:

# mkdir /media/sdb1 && ls -la /media/sdb1 && mount /dev/sdb1 /media/sdb1
total 8
drwxr-sr-x 2 root disk 4096 Nov 21 08:12 .
drwsrwsrwT 7 root disk 4096 Nov 21 08:12 ..

...and that's it, no output from the mount command (it just hung and became an unkillable process).

Please let me know if there is anything else I could try, either to restore it or to debug it (to at least understand why exactly it broke, so it will not happen again to me or anyone else). If it matters, the disk has a single partition (BTRFS only) and was plugged in all the time, and I use a Xeon-based workstation with ECC memory.

In dmesg I see the following. It seems that after encountering btrfs bugs in its recovery tools (mentioned in my previous mail), I have also encountered a btrfs bug in the kernel:

[ 339.349260] BTRFS info (device sdb1): disk space caching is enabled
[ 339.397438] parent transid verify failed on 29458432 wanted 5 found 2759
[ 339.397505] [ cut here ]
[ 339.397510] kernel BUG at fs/btrfs/locking.c:269!
[ 339.397513] invalid opcode: [#1] SMP
[ 339.397517] Modules linked in: ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc snd_aloop snd_hrtimer xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 snd_i2c snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore ac97_bus vmnet(O) parport_pc parport vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) cpufreq_conservative cpufreq_powersave cpufreq_userspace cpufreq_stats zram nvidia(PO) cfg80211 rfkill binfmt_misc uinput zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc iTCO_wdt iTCO_vendor_support usblp kvm_intel kvm ses enclosure cdc_ether psmouse option i2c_i801 pcspkr usbnet mii usb_wwan usbserial serio_raw i7core_edac edac_core uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media evdev joydev jc42 w83627ehf lm90 coretemp adt7475 hwmon_vid adm1021 ttm drm_kms_helper drm i2c_algo_bit i2c_core msr loop fuse tpm_infineon tpm_tis lpc_ich mfd_core tpm button acpi_cpufreq processor thermal_sys autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq usb_storage sg sd_mod sr_mod cdrom crc_t10dif crct10dif_common hid_generic usbhid hid ahci libahci libata crc32c_intel scsi_mod e1000e ptp pps_core xhci_hcd ehci_pci ehci_hcd usbcore usb_common [last unloaded: vmnet]
[ 339.397584] CPU: 0 PID: 25752 Comm: mount Tainted: P O 3.15.0-pf2 #1
[ 339.397585] Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2 08/19/11
[ 339.397586] task: 880036c93f80 ti: 8805702b4000 task.ti: 8805702b4000
[ 339.397587] RIP: 0010:[a0245050] [a0245050] btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]
[ 339.397604] RSP: 0018:8805702b7bf0 EFLAGS: 00010246
[ 339.397605] RAX: RBX: 8804db6da800 RCX: 0581
[ 339.397606] RDX: RSI: 8804db58d0e0 RDI: 8804db6da800
[ 339.397607] RBP: 0001 R08: 0001b830 R09: 88063fc1b830
[ 339.397608] R10: 88061afec700 R11: ea00136d6300 R12: 0005
[ 339.397609] R13: 88008c978820 R14: 88061af51000 R15: 8804db6da800
[ 339.397610] FS: 7f55bf45b840() GS:88063fc0() knlGS:
[ 339.397612] CS: 0010 DS: ES: CR0: 8005003b
[ 339.397613] CR2: 7f6b280af000 CR3: 0004da047000 CR4: 07f0
[ 339.397614] Stack:
[ 339.397614] a024557d 8804db6da800 a0208838
[ 339.397616] 88008c978820
[ 339.397617] a02093a0 1c18 0005 8804db6da800
[ 339.397619] Call Trace:
[ 339.397629
Changing label few times killed filesystem?
I have changed the file system label a few times in total. When I tried to mount it after that, it was no longer mountable:

# mount /dev/sdb1 /mnt
mount: Not a directory

In dmesg I see the following after the above command:

[ 5198.413202] BTRFS info (device sdb1): disk space caching is enabled
[ 5198.629958] BTRFS: checking UUID tree

I have lots of manually sorted downloaded files on this partition (in other words, nothing very important, but downloading and sorting all the files again would take a lot of time), so I would appreciate any help. This is what I have tried so far to restore it:

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' failed.
zsh: abort btrfs check /dev/sdb1

Since it failed after checking extents, I decided to try --init-extent-tree:

# btrfs check --init-extent-tree /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Creating a new extent tree
Failed to find [29376512, 168, 16384]
btrfs unable to find ref byte nr 29376512 parent 0 root 1 owner 1 offset 0
Failed to find [30818304, 168, 16384]
btrfs unable to find ref byte nr 30818304 parent 0 root 1 owner 0 offset 1
Failed to find [47546368, 168, 16384]
btrfs unable to find ref byte nr 47546368 parent 0 root 1 owner 0 offset 1
parent transid verify failed on 29442048 wanted 4 found 2758
Ignoring transid failure
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' failed.
zsh: abort btrfs check --init-extent-tree /dev/sdb1

# btrfs restore /dev/sdb1 /media/backup/sdb1
# this command exits after a second with 0 return code
# echo $?
0

I also tried btrfs restore with --path-regex and got the same result.

# btrfs-find-root /dev/sdb1
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=2759 level 0
Found tree root at 29360128 gen 2759 level 1

https://btrfs.wiki.kernel.org/index.php/Restore talks about picking the root with the largest transid, but I do not see a transid in my output, so I am not sure what to do. I also tried btrfsck:

# btrfsck /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 0x01074020 ***
zsh: abort btrfsck /dev/sdb1
# btrfsck -b /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 0x024e8020 ***
zsh: abort btrfsck -b /dev/sdb1
# btrfsck --repair /dev/sdb1
enabling repair mode
*** Error in `btrfs check': double free or corruption (fasttop): 0x00e26020 ***
zsh: abort btrfsck --repair /dev/sdb1

# uname -a
Linux debian 3.15.0-pf2 #1 SMP Sat Jun 28 15:09:48 EEST 2014 x86_64 GNU/Linux
# btrfs --version
Btrfs v3.14.1
# btrfs fi show
Label: 'label' uuid: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Total devices 1 FS bytes used 411.76GiB
devid 1 size 465.76GiB used 465.76GiB path /dev/sdb1
Btrfs v3.14.1
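On the transid question: in btrfs the generation number is the transaction id, so the "gen" and "have=" values in the btrfs-find-root output are the numbers to compare. The sketch below is only an illustration under that assumption (the function name is hypothetical, and the patterns cover just the two line shapes shown above); it lists the candidate roots with the highest generation first.

```python
import re

# Line shapes as they appear in the btrfs-find-root output above.
CANDIDATE_RE = re.compile(
    r"Well block (\d+) seems great, but generation doesn't match, "
    r"have=(\d+), want=(\d+) level (\d+)"
)
FOUND_RE = re.compile(r"Found tree root at (\d+) gen (\d+) level (\d+)")

# Sample copied from the output above.
SAMPLE = """\
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
"""


def parse_find_root(text):
    """Return candidate roots as dicts, sorted by generation, newest first."""
    roots = []
    for match in CANDIDATE_RE.finditer(text):
        bytenr, have, _want, level = (int(g) for g in match.groups())
        roots.append({"bytenr": bytenr, "generation": have, "level": level})
    for match in FOUND_RE.finditer(text):
        bytenr, gen, level = (int(g) for g in match.groups())
        roots.append({"bytenr": bytenr, "generation": gen, "level": level})
    return sorted(roots, key=lambda root: root["generation"], reverse=True)


if __name__ == "__main__":
    for root in parse_find_root(SAMPLE):
        print(f"bytenr {root['bytenr']}: generation {root['generation']}, "
              f"level {root['level']}")
```

With this output the newest candidate is the root the superblock already points at (29360128, generation 2759); the other two are from generations 2 and 3, so passing them to btrfs restore -t would at best show the file system as it was almost empty.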