Re: Changing label few times killed filesystem?

2014-11-27 Thread Boris Chernov


Since nobody had any other suggestions, I decided to run a modified 
btrfsck with the --repair option (built without the BUG_ON(rec->is_root) 
assertion).


Surprisingly, the modified btrfsck --repair fixed all errors but one 
(according to btrfsck), and btrfsck asked me to run btrfsck --repair one 
more time to fix the remaining error. Mounting still did not work at 
this point, so I did what btrfsck suggested. At first it said it had 
fixed the remaining error, but then it found many more errors (I am not 
sure whether btrfsck caused them or whether they were already present 
and fixing the remaining error just uncovered them).


btrfs restore (with or without the -t option) returns with a zero exit 
code without even attempting to do anything (just as it did before I 
tried --repair). Mounting with or without the recovery option produces 
the same errors (they were exactly the same before --repair, so I 
already mentioned them in my previous message, but for convenience I 
include them again in the log below). btrfs rescue chunk-recover and 
btrfs rescue super-recover say that everything is OK.


Does anybody have any ideas or suggestions?

Please do not be afraid to suggest something risky. At this point 
I have nothing to lose, because if I cannot restore the files or provide 
further debug information for developers, I will have to reformat this 
partition anyway. Ideas about what could have caused this corruption are 
also welcome, because I currently find it hard to believe that 
relabeling or mounting/unmounting were the only causes.


Below I show exactly what I did, along with parts of the terminal 
output (for readability I removed repeated similar messages; please 
download the full log if you are interested).


# btrfsck --repair /dev/sdb1  # Full log can be downloaded here: 
http://pastebin.com/MdyjxY4w

enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
ref mismatch on [20971520 16384] extent item 0, found 1
adding new tree backref on start 20971520 len 16384 parent 3 root 3
Backref 20971520 parent 3 root 3 not found in extent tree
backpointer mismatch on [20971520 16384]
...
owner ref check failed [47529984 16384]
repaired damaged extent references
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 error
...
root 5 inode 5 errors 1, no inode item
unresolved ref dir 6 index 0 namelen 7 name default filetype 0 
errors 3, no dir item, no dir index

Failed to find [30769152, 168, 16384]
btrfs unable to find ref byte nr 30769152 parent 0 root 5  owner 0 offset 1
reset isize for dir 6 root 5
root 5 inode 6 errors 2000, link count wrong
unresolved ref dir 6 index 0 namelen 2 name .. filetype 0 
errors 3, no dir item, no dir index

root 5 inode 7 errors 1, no inode item
root 5 inode 9 errors 1, no inode item
root 5 inode 257 errors 2400, nbytes wrong, link count wrong
...
root 5 inode 18446744073709551607 errors 1, no inode item
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
 referenced 0
Btrfs v3.17
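
The numbers printed after "errors" on the inode lines above appear to be hexadecimal bit masks: 2400 comes with two messages, and 2000 with one of those two. A minimal Python sketch that decodes them follows. The bit table is an assumption inferred only from this log, not copied from the btrfs-progs source, and the names INODE_ERROR_BITS and decode_inode_errors are made up for illustration. The "unresolved ref" lines (errors 3, no dir item, no dir index) seem to use a separate set of bits that this sketch does not cover.

```python
# Bit -> message pairs inferred from the btrfsck output above (assumed, incomplete).
INODE_ERROR_BITS = {
    0x0001: "no inode item",
    0x0400: "nbytes wrong",
    0x2000: "link count wrong",
}


def decode_inode_errors(field):
    """Split the hex value printed after 'errors' into known messages."""
    mask = int(field, 16)
    messages = []
    for bit, text in sorted(INODE_ERROR_BITS.items()):
        if mask & bit:
            messages.append(text)
            mask &= ~bit
    if mask:
        # Bits this sketch has no name for are reported as raw hex.
        messages.append("unknown bits 0x%x" % mask)
    return messages


for field in ("1", "2000", "2400"):
    print(field, "->", ", ".join(decode_inode_errors(field)))
```

Any bit that is not in the table is reported as raw hex rather than guessed at, so the sketch stays honest about what the log alone can tell us.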


To my surprise, btrfsck showed great improvement (after btrfsck 
--repair) and asked me to run btrfsck --repair one more time to fix 
the remaining error:



# btrfsck /dev/sdb1
root item for root 18446744073709551607, current bytenr 29540352, 
current gen 2758, current level 0, new bytenr 29540352, new gen 
4294967296, new level 1

Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.
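
Two numbers in this output look less arbitrary when written as powers of two. The root number 18446744073709551607 is what -9 becomes when stored as an unsigned 64-bit value, and the "new gen" 4294967296 is exactly 2^32, far above the current generation 2758. A small Python check of that arithmetic is sketched below. Reading -9 as one of the negative object IDs that btrfs reserves for internal trees (believed to be the data relocation tree) is an assumption, and the helper name as_signed64 is made up for illustration.

```python
U64 = 1 << 64


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - U64 if value >= (1 << 63) else value


root_id = 18446744073709551607   # root number from the btrfsck output
new_gen = 4294967296             # "new gen" from the btrfsck output
current_gen = 2758               # "current gen" from the btrfsck output

print(as_signed64(root_id))      # the root ID read as a signed number
print(new_gen == 1 << 32)        # whether the new generation is exactly 2**32
print(new_gen - current_gen)     # how far the new generation is from the current one
```

A generation that is exactly 2^32 is unlikely to be a real transaction count on a disk that was at generation 2759 elsewhere, which may suggest that the "new" root item values were read from a damaged or misinterpreted block.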


Before trying to run btrfsck --repair again, I tried to mount, but 
it did not work:



# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so
# dmesg | tail
...
[268827.386951] BTRFS info (device sdb1): disk space caching is enabled
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.405135] BTRFS: open_ctree failed
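
In the "parent transid verify failed" lines, "wanted" is the generation the parent pointer expects for the block and "found" is the generation actually stored in the block on disk. A short Python sketch that pulls these values out of a dmesg excerpt and removes duplicates follows. The function name parse_transid_failures is hypothetical, and the regular expression assumes the message format shown in this thread.

```python
import re

TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_failures(log_text):
    """Return unique (bytenr, wanted, found) tuples in order of appearance."""
    seen = []
    for match in TRANSID_RE.finditer(log_text):
        entry = tuple(int(group) for group in match.groups())
        if entry not in seen:
            seen.append(entry)
    return seen


sample = """\
[268827.389932] parent transid verify failed on 29458432 wanted 5 found 2759
[268827.390161] parent transid verify failed on 29458432 wanted 5 found 2759
"""

for bytenr, wanted, found in parse_transid_failures(sample):
    print("block %d: expected generation %d, on disk %d" % (bytenr, wanted, found))
```

Here both lines refer to the same block, so the summary has a single entry: the parent expects a very old generation (5) while the block carries the latest one (2759).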


Since btrfsck told me to run it with --repair option again, I did:


# btrfsck --repair /dev/sdb1  # Full log is available here: 
http://pastebin.com/pcWte3Ru

enabling repair mode
fixing root item for root 18446744073709551607, current bytenr 29540352, 
current gen 2758, current level 0, new bytenr 29540352, new gen 
4294967296, new level 1

Fixed 1 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
parent transid verify failed on 29425664 wanted 1087 found 2763
...
Ignoring transid failure
leaf parent key incorrect 29425664
bad block 29425664
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) is not found in 
block group


Re: Changing label few times killed filesystem?

2014-11-25 Thread Boris Chernov

On 2014-11-24 02:46, Duncan wrote:
> if you were using gmane's web service, that explains things as 
> weaverd, the process that does the threading on the web side, was 
> down for some days

Yes, I used the gmane blog. Good to know it is no longer down.

Back on topic. Even after updating to the latest version, btrfsck 
still does not work, with or without options such as --repair. Does 
anyone know what Assertion `rec->is_root` failed means? Is it worth 
trying to compile my own version of btrfsck without this assertion?
With or without the --repair option, it looks like this assertion stops 
btrfsck very early, preventing it from checking the filesystem or 
attempting to repair it.


# btrfsck /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb275f24b45]
btrfs check[0x40b497]

# btrfsck --repair /dev/sdb1
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs check[0x41a081]
btrfs check[0x41a0a5]
btrfs check[0x409783]
btrfs check[0x40a45e]
btrfs check[0x41bfa9]
btrfs check[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc5b8dab45]
btrfs check[0x40b497]
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Changing label few times killed filesystem?

2014-11-25 Thread Boris Chernov


In an attempt to get more information, I have commented out 
BUG_ON(rec->is_root) in cmds-check.c to let btrfsck check my file system 
without failing on this assertion. Below you can see the output. I would 
appreciate any help or ideas...


# btrfsck /dev/sdb1  # Full log can be downloaded here: 
http://pastebin.com/D68vr69J

Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
...
ref mismatch on [20987904 16384] extent item 0, found 1
Backref 20987904 parent 3 root 3 not found in extent tree
backpointer mismatch on [20987904 16384]
owner ref check failed [20987904 16384]
...messages like these repeat many times, download full log to see them 
all...

ref mismatch on [29540352 16384] extent item 0, found 1
Backref 29540352 parent 18446744073709551607 root 18446744073709551607 
not found in extent tree

backpointer mismatch on [29540352 16384]
owner ref check failed [29540352 16384]
...
Errors found in extent allocation tree or chunk allocation
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 root dir 256 not found
found 409600 bytes used err is 1
total csum bytes: 0
total tree bytes: 49152
total fs tree bytes: 0
total extent tree bytes: 16384
btree space waste bytes: 48246
file data blocks allocated: 0
 referenced 0
Btrfs v3.17


Re: Changing label few times killed filesystem?

2014-11-23 Thread Boris Chernov


> I suggest upgrading and just posting the results from 'btrfs check 
> device' without any options and see what you get.

OK, I have upgraded to the 3.17.0 kernel and I have also upgraded 
btrfs-tools:

# btrfs --version
Btrfs v3.17

# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
cmds-check.c:2645: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x41a081]
btrfs[0x41a0a5]
btrfs[0x409783]
btrfs[0x40a45e]
btrfs[0x41bfa9]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7feaf251cb45]
btrfs[0x40b497]

btrfsck /dev/sdb1 gives exactly the same output. It seems it does 
not even try to check anything and just fails on the assertion. I also 
tried btrfs restore:


# btrfs restore /dev/sdb1 /media/backup/sdb1 # Does nothing and exits 
almost immediately

# echo $?
0
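
An exit code of 0 on its own does not show that anything was restored, so it can help to check the destination directory as well. A minimal Python sketch of such a check follows. The helper name restore_looks_empty is made up for illustration, and a temporary directory stands in for /media/backup/sdb1.

```python
import os
import tempfile


def restore_looks_empty(returncode, dest_dir):
    """True when the command reported success but wrote nothing to dest_dir."""
    return returncode == 0 and not os.listdir(dest_dir)


# Demonstration with a temporary directory standing in for the real destination.
with tempfile.TemporaryDirectory() as dest:
    print(restore_looks_empty(0, dest))   # empty destination, exit code 0
    open(os.path.join(dest, "file"), "w").close()
    print(restore_looks_empty(0, dest))   # destination now has one entry
```

In the situation described here the first case applies: the command exits with 0, yet the destination stays empty.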

After upgrading to the new kernel, I get this when I try to mount 
the partition:


# mount /dev/sdb1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

# dmesg | tail
...
[ 2505.921545] BTRFS info (device sdb1): disk space caching is enabled
[ 2505.925079] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.944413] parent transid verify failed on 29458432 wanted 5 found 2759
[ 2505.958450] BTRFS: open_ctree failed

> However, if you are not now and never did use compression on that 
> filesystem, that bug shouldn't affect you, but others might.

I did not use compression on this partition, but I have used it on 
another btrfs disk (which seems to work fine, at least for now). I do 
not think I used any special features on the partition I have trouble 
with (I was planning to, but it died before I got a chance).


> it's quite possible you're seeing the one bug, and the relabeling is 
> simply coincidence.

I suppose it is possible that something else was the cause, but the 
only other thing I did with the file system at the time was 
mounting/unmounting it. Also, I had not used it much, just for a few 
weeks; before that the disk was unplugged for a few months (with no 
files on it). The only things I did with it (before it stopped working) 
were creating, moving, copying and deleting files.

Before upgrading btrfs-tools and the kernel, I tried to reproduce 
the issue by creating a big file with a btrfs file system in it, but I 
was unable to reproduce the problem. However, I did not put as many 
files on it as on the real partition, and it was smaller. In other 
words, the issue I have encountered seems to be hard to reproduce, so I 
cannot tell with 100% certainty what exactly caused the corruption.



Is there anything else I can try? If not to restore it, then to 
provide more useful debug information (if that is possible in this 
case). I could try compiling the latest development versions of the 
kernel and/or btrfs-tools if there is a chance that might help.



P.S. I received by mail only the shortest reply, about the mount 
command, so I was able to read the other replies only a few days later 
when they appeared on gmane (I wasn't subscribed at the time because I 
did not expect gmane to be so slow). This time I have subscribed to the 
list, so hopefully I will be able to read all replies without delay.



Re: Changing label few times killed filesystem?

2014-11-21 Thread Boris Chernov

On 2014-11-21 04:35, Roman Mamedov wrote:
> On Fri, 21 Nov 2014 01:27:17 +
> Boris Chernov aqs1...@hotmail.com wrote:
>
>> I have changed file system label few times in total. When I tried
>> to mount it after that, it became not mountable:
>>
>> # mount /dev/sdb1 /mnt
>> mount: Not a directory
>
> I'd say that implies something is wrong with your /mnt, rather than /dev/sdb1.
> Before mounting try things like ls -la /mnt/, umount /mnt, etc.
> Or simply mounting somewhere else other than /mnt/

Before I attempted mounting to /mnt, I tried to mount with KDE 
Device Notifier to /media/username/label, then I created a directory 
manually in /media/ and tried to mount from the command line, then I 
tried /mnt, and the error was the same. So I'm sure there is nothing 
wrong with my mount points.
Now I have rebooted and tried to mount with KDE Device Notifier to 
/media/username/label. It failed again, so I tried from the command 
line as root:


# mkdir /media/sdb1 && ls -la /media/sdb1 && mount /dev/sdb1 /media/sdb1
total 8
drwxr-sr-x 2 root disk 4096 Nov 21 08:12 .
drwsrwsrwT 7 root disk 4096 Nov 21 08:12 ..

...and that's it, no output from the mount command (it just hung and 
became an unkillable process). Please let me know if there is anything 
else I could try to either restore it or debug it (to at least 
understand why exactly it broke, so that it does not happen again to me 
or anyone else). If it matters, the disk has a single partition 
(BTRFS only), was plugged in all the time, and I use a Xeon-based 
workstation with ECC memory. In dmesg I see the following; it seems 
that after encountering btrfs bugs in its recovery tools (mentioned in 
my previous mail) I have also encountered a btrfs bug in the kernel:


[  339.349260] BTRFS info (device sdb1): disk space caching is enabled
[  339.397438] parent transid verify failed on 29458432 wanted 5 found 2759
[  339.397505] [ cut here ]
[  339.397510] kernel BUG at fs/btrfs/locking.c:269!
[  339.397513] invalid opcode:  [#1] SMP
[  339.397517] Modules linked in: ppp_deflate bsd_comp ppp_async 
crc_ccitt ppp_generic slhc snd_aloop snd_hrtimer xt_conntrack 
iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 
snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 snd_i2c snd_ice17xx_ak4xxx 
snd_ak4xxx_adda snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm 
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device 
snd_timer snd soundcore ac97_bus vmnet(O) parport_pc parport 
vmw_vsock_vmci_transport vsock vmw_vmci vmmon(O) cpufreq_conservative 
cpufreq_powersave cpufreq_userspace cpufreq_stats zram nvidia(PO) 
cfg80211 rfkill binfmt_misc uinput zfs(PO) zunicode(PO) zavl(PO) 
zcommon(PO) znvpair(PO) spl(O) nfsd auth_rpcgss oid_registry nfs_acl nfs 
lockd fscache sunrpc iTCO_wdt iTCO_vendor_support usblp kvm_intel kvm 
ses enclosure cdc_ether psmouse option i2c_i801 pcspkr usbnet mii 
usb_wwan usbserial serio_raw i7core_edac edac_core uvcvideo 
videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media evdev 
joydev jc42 w83627ehf lm90 coretemp adt7475 hwmon_vid adm1021 ttm 
drm_kms_helper drm i2c_algo_bit i2c_core msr loop fuse tpm_infineon 
tpm_tis lpc_ich mfd_core tpm button acpi_cpufreq processor thermal_sys 
autofs4 ext4 crc16 mbcache jbd2 btrfs xor raid6_pq usb_storage sg sd_mod 
sr_mod cdrom crc_t10dif crct10dif_common hid_generic usbhid hid ahci 
libahci libata crc32c_intel scsi_mod e1000e ptp pps_core xhci_hcd 
ehci_pci ehci_hcd usbcore usb_common [last unloaded: vmnet]
[  339.397584] CPU: 0 PID: 25752 Comm: mount Tainted: P   O 
3.15.0-pf2 #1
[  339.397585] Hardware name: Supermicro X8SIE/X8SIE, BIOS 1.2
08/19/11
[  339.397586] task: 880036c93f80 ti: 8805702b4000 task.ti: 
8805702b4000
[  339.397587] RIP: 0010:[a0245050] [a0245050] 
btrfs_assert_tree_read_locked.part.0+0x0/0x10 [btrfs]

[  339.397604] RSP: 0018:8805702b7bf0  EFLAGS: 00010246
[  339.397605] RAX:  RBX: 8804db6da800 RCX: 
0581
[  339.397606] RDX:  RSI: 8804db58d0e0 RDI: 
8804db6da800
[  339.397607] RBP: 0001 R08: 0001b830 R09: 
88063fc1b830
[  339.397608] R10: 88061afec700 R11: ea00136d6300 R12: 
0005
[  339.397609] R13: 88008c978820 R14: 88061af51000 R15: 
8804db6da800
[  339.397610] FS:  7f55bf45b840() GS:88063fc0() 
knlGS:

[  339.397612] CS:  0010 DS:  ES:  CR0: 8005003b
[  339.397613] CR2: 7f6b280af000 CR3: 0004da047000 CR4: 
07f0

[  339.397614] Stack:
[  339.397614]  a024557d 8804db6da800 a0208838 

[  339.397616]     
88008c978820
[  339.397617]  a02093a0 1c18 0005 
8804db6da800

[  339.397619] Call Trace:
[  339.397629

Changing label few times killed filesystem?

2014-11-20 Thread Boris Chernov


I have changed the file system label a few times in total. When I 
tried to mount it after that, it had become unmountable:


# mount /dev/sdb1 /mnt
mount: Not a directory

In dmesg I see the following after the above command:

[ 5198.413202] BTRFS info (device sdb1): disk space caching is enabled
[ 5198.629958] BTRFS: checking UUID tree

I have lots of manually sorted downloaded files on this partition 
(in other words, nothing very important, but downloading and sorting all 
the files again would take a lot of time), so I would appreciate any 
help. This is what I have tried so far to restore it:


# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' 
failed.

zsh: abort  btrfs check /dev/sdb1

Since it failed after checking extents I decided to try 
--init-extent-tree:


# btrfs check --init-extent-tree /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
Creating a new extent tree
Failed to find [29376512, 168, 16384]
btrfs unable to find ref byte nr 29376512 parent 0 root 1  owner 1 offset 0
Failed to find [30818304, 168, 16384]
btrfs unable to find ref byte nr 30818304 parent 0 root 1  owner 0 offset 1
Failed to find [47546368, 168, 16384]
btrfs unable to find ref byte nr 47546368 parent 0 root 1  owner 0 offset 1
parent transid verify failed on 29442048 wanted 4 found 2758
Ignoring transid failure
checking extents
btrfs: cmds-check.c:2266: check_owner_ref: Assertion `!(rec->is_root)' 
failed.

zsh: abort  btrfs check --init-extent-tree /dev/sdb1

# btrfs restore /dev/sdb1 /media/backup/sdb1  # this command exits 
after a second with return code 0

# echo $?
0

I also tried btrfs restore with --path-regex and got the same result.

# btrfs-find-root /dev/sdb1
Super think's the tree root is at 29360128, chunk root 20971520
Well block 4194304 seems great, but generation doesn't match, have=2, 
want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3, 
want=2759 level 0

Found tree root at 29360128 gen 2759 level 1

https://btrfs.wiki.kernel.org/index.php/Restore talks about picking the 
root with the largest transid, but I do not see a transid in my output, 
so I am not sure what to do.
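
The "transid" on the wiki page is the same quantity that btrfs-find-root prints as a generation ("have=", "want=" and "gen" in the output above). A minimal Python sketch that lists the candidate roots with the newest generation first follows. The function name roots_by_generation is hypothetical, and the regular expressions assume the v3.14.1 output format shown in this message.

```python
import re

CANDIDATE_RE = re.compile(
    r"Well block (\d+) seems great, but generation doesn't match,\s*"
    r"have=(\d+),\s*want=(\d+) level (\d+)"
)
FOUND_RE = re.compile(r"Found tree root at (\d+) gen (\d+) level (\d+)")


def roots_by_generation(output):
    """Return (generation, bytenr) pairs, newest generation first."""
    roots = [(int(m.group(2)), int(m.group(1))) for m in CANDIDATE_RE.finditer(output)]
    roots += [(int(m.group(2)), int(m.group(1))) for m in FOUND_RE.finditer(output)]
    return sorted(roots, reverse=True)


sample = """\
Well block 4194304 seems great, but generation doesn't match, have=2,
want=2759 level 0
Well block 4243456 seems great, but generation doesn't match, have=3,
want=2759 level 0
Found tree root at 29360128 gen 2759 level 1
"""

for generation, bytenr in roots_by_generation(sample):
    print("gen %d at bytenr %d" % (generation, bytenr))
```

A bytenr from this list is what one would pass to btrfs restore with the -t option. In this output the only candidates besides the current root (generation 2759) are at generations 2 and 3, so they probably predate nearly all of the data on the disk.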


I also tried btrfsck:

# btrfsck /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 
0x01074020 ***

zsh: abort  btrfsck /dev/sdb1

# btrfsck -b /dev/sdb1
*** Error in `btrfs check': double free or corruption (fasttop): 
0x024e8020 ***

zsh: abort  btrfsck -b /dev/sdb1

# btrfsck --repair /dev/sdb1
enabling repair mode
*** Error in `btrfs check': double free or corruption (fasttop): 
0x00e26020 ***

zsh: abort  btrfsck --repair /dev/sdb1

# uname -a
Linux debian 3.15.0-pf2 #1 SMP Sat Jun 28 15:09:48 EEST 2014 x86_64 
GNU/Linux

# btrfs --version
Btrfs v3.14.1
# btrfs fi show
Label: 'label'  uuid: 787e3bc1-7583-4bd8-a52e-e57fd7fc9243
	Total devices 1 FS bytes used 411.76GiB
	devid    1 size 465.76GiB used 465.76GiB path /dev/sdb1

Btrfs v3.14.1
