Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-05 Thread Robert Krig


On 04.04.2017 18:55, Chris Murphy wrote:
> On Tue, Apr 4, 2017 at 10:52 AM, Chris Murphy  wrote:
>
>
>> Mounting -o ro,degraded is probably permitted by the file system, but
>> chunks of the file system and certainly your data, will be missing. So
>> it's just a matter of time before copying data off will fail.
> ** Context here is, more than 1 device missing.
>

Thank you guys for all your help and input.

I've ordered two new drives to backup all my data. I have a cloud backup
in place, but 13TB takes a while to upload :-)
I think I'm gonna abandon btrfs as the main fs for my home server. I'm
just gonna set up a separate LVM volume for storing snapshots and
backups, since I use btrfs on all my single disk machines.
Thanks again everyone.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-04 Thread Chris Murphy
On Mon, Apr 3, 2017 at 10:02 PM, Robert Krig
 wrote:
>
>
> On 03.04.2017 16:25, Robert Krig wrote:
>>
>> I'm gonna run an extensive memory check once I get home, since you
>> mentioned corrupt memory might be an issue here.
>
>
> I ran a memtest over a couple of hours with no errors. RAM seems to be
> fine so far.

Inconclusive. A memtest can take days to expose a problem, and even
that's not conclusive. The list archive has some examples of where
memory testers gave RAM a pass, but doing things like compiling the
kernel would fail.


>
> I've looked at the link you provided. Frankly it looks very scary. (At
> least to me it does)
> But I've just thought of something else.
>
> My storage array is BTRFS Raid1 with 4x8TB Drives.
> Wouldn't it be possible to simply disconnect two of those drives, mount
> with -o degraded and still have access (even if read-only) to all my data?

man mkfs.btrfs

Btrfs raid1 supports only one device missing, no matter how many drives.

Mounting -o ro,degraded is probably permitted by the file system, but
chunks of the file system and certainly your data, will be missing. So
it's just a matter of time before copying data off will fail.

I suggest trying -o ro with all drives, not a degraded mount, and
copying data off. Any failures should be logged. Metadata errors are
logged without paths, whereas data corruption errors include the path to
the affected file. This is easier than scraping the file system with
btrfs restore.

If you can't mount ro with all drives, or ro,degraded with just one
device missing, you'll need to use btrfs restore which is more
tolerant of missing metadata.


-- 
Chris Murphy


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-04 Thread Austin S. Hemmelgarn

On 2017-04-04 09:29, Brian B wrote:
> On 04/04/2017 12:02 AM, Robert Krig wrote:
>> My storage array is BTRFS Raid1 with 4x8TB Drives.
>> Wouldn't it be possible to simply disconnect two of those drives, mount
>> with -o degraded and still have access (even if read-only) to all my data?
>
> Just jumping on this point: my understanding of BTRFS "RAID1" is that
> each file (block?) is randomly assigned to two disks of the array (no
> matter how many disks are in the array).  So if you remove two disks,
> you will probably have files that were "assigned" to both of those
> disks, and will be missing.
>
> In short, you can't remove more than one disk of a BTRFS RAID1 and still
> have all of your data.

That understanding is correct.  From a functional perspective, BTRFS 
raid1 is currently a RAID10 implementation with striping happening at a 
very large granularity.
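To make the "RAID10 with very large granularity" point concrete, here is a minimal sketch in C that counts how many raid1 chunks lose both copies when a given set of devices is pulled. The placement used is a hypothetical even spread of one chunk over each of the six possible pairs of four devices; the real allocator decides placement on its own terms, and all names here are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each btrfs raid1 chunk has exactly two copies,
 * each on a different device (identified here by a small integer). */
struct raid1_chunk {
    int dev_a;
    int dev_b;
};

/* True if device `dev` is in the removed[] list. */
static bool is_removed(int dev, const int *removed, size_t n_removed)
{
    for (size_t i = 0; i < n_removed; i++)
        if (removed[i] == dev)
            return true;
    return false;
}

/* Count chunks whose BOTH copies sit on removed devices: that data is gone. */
size_t count_lost_chunks(const struct raid1_chunk *chunks, size_t n_chunks,
                         const int *removed, size_t n_removed)
{
    size_t lost = 0;
    for (size_t i = 0; i < n_chunks; i++)
        if (is_removed(chunks[i].dev_a, removed, n_removed) &&
            is_removed(chunks[i].dev_b, removed, n_removed))
            lost++;
    return lost;
}

/* Hypothetical even spread: one chunk on each of the 6 pairs of 4 devices. */
const struct raid1_chunk even_spread[6] = {
    {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3},
};
```

With that hypothetical spread, pulling any single drive loses nothing, while pulling drives 2 and 3 leaves the chunk that lived only on that pair with no surviving copy, roughly one chunk in six.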



Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-04 Thread Hugo Mills
On Tue, Apr 04, 2017 at 09:29:11AM -0400, Brian B wrote:
> On 04/04/2017 12:02 AM, Robert Krig wrote:
> > My storage array is BTRFS Raid1 with 4x8TB Drives.
> > Wouldn't it be possible to simply disconnect two of those drives, mount
> > with -o degraded and still have access (even if read-only) to all my data?
> Just jumping on this point: my understanding of BTRFS "RAID1" is that
> each file (block?) is randomly assigned to two disks of the array (no

   Arbitrarily assigned, rather than randomly assigned (there is a
deterministic algorithm for it, but it's wise not to rely on the exact
behaviour of that algorithm, because there are a number of factors
that can alter its behaviour).
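A toy model of "deterministic, but don't rely on it". It assumes, as a simplification and not as a description of the kernel code, that each new raid1 chunk goes to the two devices with the most unallocated space, with ties broken by lower device index. The function name and the tie-break rule are hypothetical.

```c
#define NDEV 4

/* Toy model, NOT the kernel allocator: place one raid1 chunk on the two
 * devices with the most unallocated space (ties go to the lower index),
 * deduct the chunk from both, and report the pair in ascending order. */
void toy_alloc_chunk(long long free_space[NDEV], long long chunk_size,
                     int *out_a, int *out_b)
{
    int best = -1, second = -1;
    for (int d = 0; d < NDEV; d++) {
        if (best < 0 || free_space[d] > free_space[best]) {
            second = best;   /* previous best becomes runner-up */
            best = d;
        } else if (second < 0 || free_space[d] > free_space[second]) {
            second = d;
        }
    }
    free_space[best] -= chunk_size;
    free_space[second] -= chunk_size;
    *out_a = best < second ? best : second;
    *out_b = best < second ? second : best;
}
```

Starting from four equal, empty drives, this toy allocator alternates between the pairs {0,1} and {2,3}, so in the model pulling drives 0 and 2 would lose nothing while pulling 0 and 1 would lose half the chunks. Give one drive less free space and the pairing changes, which is why the exact placement is not something to plan a recovery around.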

> matter how many disks are in the array).  So if you remove two disks,
> you will probably have files that were "assigned" to both of those
> disks, and will be missing.
> 
> In short, you can't remove more than one disk of a BTRFS RAID1 and still
> have all of your data.

   Indeed.

   Hugo.

-- 
Hugo Mills | Some days, it's just not worth gnawing through the
hugo@... carfax.org.uk | straps
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-04 Thread Brian B
On 04/04/2017 12:02 AM, Robert Krig wrote:
> My storage array is BTRFS Raid1 with 4x8TB Drives.
> Wouldn't it be possible to simply disconnect two of those drives, mount
> with -o degraded and still have access (even if read-only) to all my data?
Just jumping on this point: my understanding of BTRFS "RAID1" is that
each file (block?) is randomly assigned to two disks of the array (no
matter how many disks are in the array).  So if you remove two disks,
you will probably have files that were "assigned" to both of those
disks, and will be missing.

In short, you can't remove more than one disk of a BTRFS RAID1 and still
have all of your data.





Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Robert Krig


On 03.04.2017 16:25, Robert Krig wrote:
>
> I'm gonna run an extensive memory check once I get home, since you
> mentioned corrupt memory might be an issue here.


I ran a memtest over a couple of hours with no errors. RAM seems to be
fine so far.

I've looked at the link you provided. Frankly it looks very scary. (At
least to me it does)
But I've just thought of something else.

My storage array is BTRFS Raid1 with 4x8TB Drives.
Wouldn't it be possible to simply disconnect two of those drives, mount
with -o degraded and still have access (even if read-only) to all my data?
E.g. I could use the two removed drives as a backup and rebuild my array
from there, since I'm kind of playing with the idea of turning it into an
MD RAID5 and only using btrfs on specific LVM volumes which need it.

The one thing that slightly worries me with this idea is, I don't know
if there is a way to tell which datablocks are on which drives. If I've
understood btrfs raid1 correctly it simply ensures that there is at
least a copy of each block on a different device.

Would my idea work? Or could it be that I can only safely remove one
drive, since the other drives might contain blocks from any of the other
drives?


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Hans van Kranenburg
On 04/03/2017 04:20 PM, Robert Krig wrote:
> 
> 
> On 03.04.2017 16:08, Hans van Kranenburg wrote:
>> On 04/03/2017 12:11 PM, Robert Krig wrote:
>> The corruption is at item 157. Can you attach all of the output, or
>> pastebin it?
>>
> 
> I've attached the entire log of btrfs-debug-tree. This was generated
> with btrfs-progs 4.7.3

Meuh,

item 156 key (23416298414080 EXTENT_ITEM 4096) itemoff 8643 itemsize 53
item 157 key (23416298418176 EXTENT_ITEM 4096) itemoff 8590 itemsize 53

8590 + 53 = 8643.

I don't get what's invalid about that.

"incorrect offsets 8590 1258314415"

if (btrfs_item_offset_nr(buf, i) !=
		btrfs_item_end_nr(buf, i + 1)) {
	ret = BTRFS_TREE_BLOCK_INVALID_OFFSETS;
	fprintf(stderr, "incorrect offsets %u %u\n",
		btrfs_item_offset_nr(buf, i),
		btrfs_item_end_nr(buf, i + 1));
	goto fail;
}

Ah, ok, so the corruption is in item 158, but it's reported as
corruption in item 157.
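The check above compares each item's data offset with the end (offset + size) of the next item's data. A minimal sketch of the same logic, using a simplified struct that keeps only itemoff and itemsize (the names are illustrative, not btrfs-progs'), shows why a garbage header in slot 158 gets reported at slot 157. The values for slots 156 and 157 come from the dump; the offset/size split for slot 158 in the test is hypothetical, since the error message only shows their sum, 1258314415.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified leaf item header: only the two fields the check uses.
 * (The on-disk item header also carries a 17-byte key.) */
struct item_hdr {
    uint32_t offset; /* "itemoff" in btrfs-debug-tree output */
    uint32_t size;   /* "itemsize" */
};

static uint64_t item_end(const struct item_hdr *it)
{
    return (uint64_t)it->offset + it->size;
}

/* Item data is packed from the end of the leaf towards the front, so the
 * data of item i must start exactly where the data of item i+1 ends.
 * Returns the first slot i for which that fails, or -1 if all match.
 * The failure is reported at slot i even when item i+1 holds the garbage. */
long first_bad_slot(const struct item_hdr *items, size_t nritems)
{
    for (size_t i = 0; i + 1 < nritems; i++)
        if (items[i].offset != item_end(&items[i + 1]))
            return (long)i;
    return -1;
}
```

Fed slots 156, 157 and 158, the first mismatch is at index 1 (slot 157), although the bad values live in slot 158.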

There's no really simple tool right now to fix this. We can try to dd
the 16KiB of metadata from disk, fix it manually, and write it back.
We've done that before; it's a bit of work, but it can succeed.
Here are more instructions:

https://www.spinics.net/lists/linux-btrfs/msg62459.html

So, if you're the adventurous type...
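If you do go the dd route, a first sanity check is that the 16KiB you read back really is the block you want. A minimal sketch, assuming the 101-byte tree block header layout (csum[32], fsid[16], then little-endian bytenr, flags, chunk_tree_uuid[16], generation, owner, nritems, level), that pulls those fields out of a buffer. Finding the physical offset on each device to read from (for example with btrfs-map-logical) is not part of the sketch, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian integer of `n` bytes starting at p. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Header fields worth comparing against the btrfs-debug-tree output. */
struct hdr_fields {
    uint64_t bytenr;     /* logical address the block claims to live at */
    uint64_t generation;
    uint64_t owner;      /* tree id, e.g. 1 = root tree, 2 = extent tree */
    uint32_t nritems;
    uint8_t  level;      /* 0 = leaf */
};

/* Byte offsets follow the assumed layout: bytenr at 48, generation at 80,
 * owner at 88, nritems at 96, level at 100. */
struct hdr_fields parse_header(const uint8_t *block)
{
    struct hdr_fields h;
    h.bytenr     = get_le(block + 48, 8);
    h.generation = get_le(block + 80, 8);
    h.owner      = get_le(block + 88, 8);
    h.nritems    = (uint32_t)get_le(block + 96, 4);
    h.level      = block[100];
    return h;
}
```

For the block in this thread you would expect bytenr 38666170826752, generation 1248226, owner 2, 199 items and level 0.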

But then again, if this is really memory failure, there might be other
errors all around the fs, which you didn't hit while reading back the
data yet.

Also note that btrfs does not protect you against this, not even for
file data that gets corrupted in memory before it's written out (the
checksum is computed during writeout, so the already-corrupted data gets
checksummed as if it were good).
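The checksum point can be illustrated with a toy write path, assuming (as a simplification) that the checksum is computed once over the in-memory buffer at writeout and verified on read. CRC-32C is the default btrfs checksum; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Toy write path: flip one bit either before the checksum is computed
 * (bad RAM before writeout) or after it (corruption on the way to or on
 * disk), then verify on read. Returns true if verification catches it. */
bool corruption_detected(uint8_t *page, size_t len, size_t byte, bool before_csum)
{
    if (before_csum)
        page[byte] ^= 0x01;              /* bit flip in RAM before writeout */
    uint32_t stored = crc32c(page, len); /* checksum computed at writeout */
    if (!before_csum)
        page[byte] ^= 0x01;              /* bit flip after checksumming */
    return crc32c(page, len) != stored;  /* verification on read */
}
```

A flip after checksumming is caught, while the same flip before checksumming passes verification, because the checksum faithfully describes the already-wrong data.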

> If it makes a difference, I can try it again with the newest version of
> btrfs-progs?

No, that code hasn't been touched in over 5 years.

-- 
Hans van Kranenburg


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Robert Krig


On 03.04.2017 16:20, Robert Krig wrote:
>
> On 03.04.2017 16:08, Hans van Kranenburg wrote:
>> On 04/03/2017 12:11 PM, Robert Krig wrote:
>> The corruption is at item 157. Can you attach all of the output, or
>> pastebin it?
>>
>
> I've attached the entire log of btrfs-debug-tree. This was generated
> with btrfs-progs 4.7.3
>
> If it makes a difference, I can try it again with the newest version of
> btrfs-progs?


I forgot to mention that btrfs-debug-tree also segfaults with a "memory
access error"

I'm gonna run an extensive memory check once I get home, since you
mentioned corrupt memory might be an issue here.


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Robert Krig


On 03.04.2017 16:08, Hans van Kranenburg wrote:
> On 04/03/2017 12:11 PM, Robert Krig wrote:
> The corruption is at item 157. Can you attach all of the output, or
> pastebin it?
>


I've attached the entire log of btrfs-debug-tree. This was generated
with btrfs-progs 4.7.3

If it makes a difference, I can try it again with the newest version of
btrfs-progs?
btrfs-progs v4.7.3
leaf 38666170826752 items 199 free space 1506 generation 1248226 owner 2
fs uuid 8c4f8e26-3442-463f-ad8a-668dfef02593
chunk uuid 1f04f64e-0ec8-4b39-83d9-a2df75179d3e
    item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 itemsize 53
        extent refs 1 gen 671397 flags DATA
        extent data backref root 5 objectid 4959957 offset 0 count 1
    item 1 key (23416295485440 EXTENT_ITEM 8192) itemoff 16177 itemsize 53
        extent refs 1 gen 972749 flags DATA
        extent data backref root 5 objectid 7328099 offset 0 count 1
    item 2 key (23416295493632 EXTENT_ITEM 12288) itemoff 16124 itemsize 53
        extent refs 1 gen 797708 flags DATA
        extent data backref root 5 objectid 5842103 offset 1966080 count 1
    item 3 key (23416295505920 EXTENT_ITEM 8192) itemoff 16071 itemsize 53
        extent refs 1 gen 1244513 flags DATA
        extent data backref root 44107 objectid 28528 offset 974848 count 1
    item 4 key (23416295514112 EXTENT_ITEM 8192) itemoff 16034 itemsize 37
        extent refs 1 gen 625327 flags DATA
        shared data backref parent 38666872045568 count 1
    item 5 key (23416295522304 EXTENT_ITEM 16384) itemoff 15997 itemsize 37
        extent refs 1 gen 625327 flags DATA
        shared data backref parent 38666872045568 count 1
    item 6 key (23416295538688 EXTENT_ITEM 49152) itemoff 15944 itemsize 53
        extent refs 1 gen 585321 flags DATA
        extent data backref root 5 objectid 4742401 offset 393216 count 1
    item 7 key (23416295587840 EXTENT_ITEM 8192) itemoff 15907 itemsize 37
        extent refs 1 gen 625327 flags DATA
        shared data backref parent 38666872045568 count 1
    item 8 key (23416295596032 EXTENT_ITEM 4096) itemoff 15854 itemsize 53
        extent refs 1 gen 625327 flags DATA
        extent data backref root 5 objectid 1123021 offset 6029312 count 1
    item 9 key (23416295600128 EXTENT_ITEM 4096) itemoff 15801 itemsize 53
        extent refs 1 gen 975337 flags DATA
        extent data backref root 5 objectid 7334929 offset 0 count 1
    item 10 key (23416295604224 EXTENT_ITEM 57344) itemoff 15748 itemsize 53
        extent refs 1 gen 572974 flags DATA
        extent data backref root 5 objectid 4430156 offset 0 count 1
    item 11 key (23416295661568 EXTENT_ITEM 106496) itemoff 15695 itemsize 53
        extent refs 1 gen 585319 flags DATA
        extent data backref root 5 objectid 4742398 offset 2490368 count 1
    item 12 key (23416295768064 EXTENT_ITEM 4096) itemoff 15642 itemsize 53
        extent refs 1 gen 795227 flags DATA
        extent data backref root 5 objectid 5769382 offset 12288 count 1
    item 13 key (23416295772160 EXTENT_ITEM 4096) itemoff 15589 itemsize 53
        extent refs 1 gen 795227 flags DATA
        extent data backref root 5 objectid 5769383 offset 4096 count 1
    item 14 key (23416295776256 EXTENT_ITEM 4096) itemoff 15536 itemsize 53
        extent refs 1 gen 585370 flags DATA
        extent data backref root 5 objectid 4742594 offset 1310720 count 1
    item 15 key (23416295780352 EXTENT_ITEM 8192) itemoff 15499 itemsize 37
        extent refs 1 gen 625327 flags DATA
        shared data backref parent 32477101621248 count 1
    item 16 key (23416295788544 EXTENT_ITEM 151552) itemoff 15446 itemsize 53
        extent refs 1 gen 992062 flags DATA
        extent data backref root 5 objectid 7458028 offset 0 count 1
    item 17 key (23416295940096 EXTENT_ITEM 4096) itemoff 15393 itemsize 53
        extent refs 1 gen 1027477 flags DATA
        extent data backref root 5 objectid 7508879 offset 4096 count 1
    item 18 key (23416295944192 EXTENT_ITEM 4096) itemoff 15340 itemsize 53
        extent refs 1 gen 1023977 flags DATA
        extent data backref root 5 objectid 7496365 offset 20480 count 1
    item 19 key (23416295948288 EXTENT_ITEM 36864) itemoff 15287 itemsize 53
        extent refs 1 gen 516177 flags DATA
        extent data backref root 5 objectid 3897818 offset 12976128 count 1
    item 20 key (23416295985152 EXTENT_ITEM 45056) itemoff 15234 itemsize 53
        extent refs 1 gen 444976 flags DATA
        extent data backref root 5 objectid 3591929 offset 12320768 count 1
    item 21 key 

Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Hans van Kranenburg
On 04/03/2017 03:50 PM, Robert Krig wrote:
> 
> 
> On 03.04.2017 12:11, Robert Krig wrote:
>> Hi guys, I seem to have run into a spot of trouble with my btrfs partition.
>>
>> I've got 4 x 8TB in a RAID1 BTRFS configuration.
>>
>> I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs
>> progs version v4.7.3
>>
>> Server has 8GB of Ram.
>>
>>
>> I was running duperemove using a hashfile, which seemed to have run out
>> of space and aborted. Then I tried a balance operation, with -dusage
>> progressively set to 0 1 5 15 30 50, which then aborted, I presume that
>> this caused the fs to mount readonly. I only noticed it somewhat later.
>>
>> I've since rebooted, and I can mount the filesystem OK, but after some
>> time (I presume caused by reads or writes) it once again switches to
>> readonly.
>>
>> I tried unmounting/remounting again and running a scrub, but the scrub
>> aborts after some time.
>>
>>
> 
> 
> I've compiled the newest btrfs-tools version 4.10.2
> 
> This is what I get when running a btrfsck -p /dev/sda
> 
> Checking filesystem on /dev/sda
> UUID: 8c4f8e26-3442-463f-ad8a-668dfef02593
> incorrect offsets 8590 1258314415
> bad block 38666170826752
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Speicherzugriffsfehler
> 
> For non-German speakers: Speicherzugriffsfehler = Memory Access Error (i.e. a segmentation fault)
> 
> Dmesg shows this:
> 
> Apr 03 15:47:05 atlas kernel: btrfs[9140]: segfault at 9476b99e ip 0044c459 sp 7fff556b4b10 error 4 in btrfs[40+9d000]

That's probably because the tool does not verify if the numbers in the
fields make sense before using them.
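A minimal sketch of the kind of sanity check meant here, not btrfs-progs code: before using an item, make sure its data range lies inside the leaf's data area and does not overlap the item header array. The constants assume the 101-byte tree block header and 25-byte item headers; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEADER_SIZE   101u /* bytes of tree block header */
#define ITEM_HDR_SIZE 25u  /* bytes per item header (17-byte key + 2 x u32) */

/* Item headers and item data are both addressed relative to the end of the
 * block header: headers occupy [0, nritems * 25), data must fit inside
 * [nritems * 25, nodesize - 101). Assumes nodesize > HEADER_SIZE. */
bool item_in_bounds(uint32_t nodesize, uint32_t nritems,
                    uint32_t itemoff, uint32_t itemsize)
{
    uint64_t data_size   = (uint64_t)nodesize - HEADER_SIZE;
    uint64_t headers_end = (uint64_t)nritems * ITEM_HDR_SIZE;
    uint64_t end         = (uint64_t)itemoff + itemsize;
    return itemoff >= headers_end && end <= data_size;
}
```

For this leaf (16 KiB nodes, 199 items), item 0 and item 157 pass, while anything ending near 1258314415 is rejected long before it could be used as a pointer offset.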


-- 
Hans van Kranenburg


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Hans van Kranenburg
On 04/03/2017 12:11 PM, Robert Krig wrote:
> Hi guys, I seem to have run into a spot of trouble with my btrfs partition.
> 
> I've got 4 x 8TB in a RAID1 BTRFS configuration.
> 
> I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs
> progs version v4.7.3
> 
> Server has 8GB of Ram.
> 
> 
> I was running duperemove using a hashfile, which seemed to have run out
> of space and aborted. Then I tried a balance operation, with -dusage
> progressively set to 0 1 5 15 30 50, which then aborted, I presume that
> this caused the fs to mount readonly. I only noticed it somewhat later.

The balance probably did not cause the issue, but it ran across the
invalid metadata page while digging around in the filesystem, and then
choked on it.

> I've since rebooted, and I can mount the filesystem OK, but after some
> time (I presume caused by reads or writes) it once again switches to
> readonly.
> 
> I tried unmounting/remounting again and running a scrub, but the scrub
> aborts after some time.
> 
> 
> Here is the output from the kernel when the partition crashes:
> 
> Apr 03 11:32:57 atlas kernel: BTRFS info (device sda): The free space
> cache file (37732863967232) is invalid. skip it
> Apr 03 11:33:46 atlas kernel: BTRFS critical (device sda): corrupt leaf,
> slot offset bad: block=38666170826752, root=1, slot=157
> [...]

Note: The root=1 is a lie? Looking at the output of btrfs-debug-tree
below, this is definitely a tree block of tree 2, not 1. I have seen
this more often, but not looked at the code yet. Maybe some bug in
assembling the error message?

> I tried running a btrfs-debug-tree -b 38666170826752 /dev/sda
> 
> btrfs-progs v4.7.3
> leaf 38666170826752 items 199 free space 1506 generation 1248226 owner 2
> fs uuid 8c4f8e26-3442-463f-ad8a-668dfef02593
> chunk uuid 1f04f64e-0ec8-4b39-83d9-a2df75179d3e
>     item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 itemsize 53
>         extent refs 1 gen 671397 flags DATA
>         extent data backref root 5 objectid 4959957 offset 0 count 1
> [...]

The corruption is at item 157. Can you attach all of the output, or
pastebin it?

> this goes on and on.  I can provide the entire output if that's helpful.

Yes. The corruption shows up at item 157, in the itemoff value, which is
the offset of the item data in the metadata page.
See https://btrfs.wiki.kernel.org/index.php/On-disk_Format#Leaf_Node
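A hedged sketch of where those bytes sit inside the 16KiB block, assuming the layout from that wiki page: a 101-byte header, then an array of 25-byte item headers, with itemoff counted from the end of the header. The struct and function names are illustrative, and `__attribute__((packed))` is a GCC/Clang extension. As a sanity check, item 0 from the dump (itemoff 16230, itemsize 53) ends exactly at 16384 - 101 = 16283.

```c
#include <stdint.h>

/* Packed on-disk sizes (fields are little-endian on disk). */
struct disk_key  { uint64_t objectid; uint8_t type; uint64_t offset; } __attribute__((packed));
struct leaf_item { struct disk_key key; uint32_t offset; uint32_t size; } __attribute__((packed));

#define HEADER_SIZE 101u /* tree block header: csum .. level */

_Static_assert(sizeof(struct disk_key) == 17, "disk key is 17 bytes");
_Static_assert(sizeof(struct leaf_item) == 25, "item header is 25 bytes");

/* Byte position, inside the block, of slot `slot`'s item header. */
uint32_t item_header_pos(uint32_t slot)
{
    return HEADER_SIZE + slot * (uint32_t)sizeof(struct leaf_item);
}

/* Byte position of an item's data: itemoff is relative to the header end. */
uint32_t item_data_pos(uint32_t itemoff)
{
    return HEADER_SIZE + itemoff;
}

/* Space available for item headers and data in a leaf of this nodesize. */
uint32_t leaf_data_size(uint32_t nodesize)
{
    return nodesize - HEADER_SIZE;
}
```

Under these assumptions, the header of slot 158 starts at byte 4051 of the block and the data of item 157 at byte 8691, which is where a hexedit session would start looking.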

> Any ideas on what I could do to fix the partition? Is it fixable, or is
> it a lost cause?

Memory corruption, not on-disk corruption.

So it's either a bitflip, garbage that ended up in this memory location
for whatever reason, or a bug in some part of the kernel (a pointer in
another module gone wonky, etc.), which we might learn more about after
seeing more of the output.


-- 
Hans van Kranenburg


Re: Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Robert Krig


On 03.04.2017 12:11, Robert Krig wrote:
> Hi guys, I seem to have run into a spot of trouble with my btrfs partition.
>
> I've got 4 x 8TB in a RAID1 BTRFS configuration.
>
> I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs
> progs version v4.7.3
>
> Server has 8GB of Ram.
>
>
> I was running duperemove using a hashfile, which seemed to have run out
> of space and aborted. Then I tried a balance operation, with -dusage
> progressively set to 0 1 5 15 30 50, which then aborted, I presume that
> this caused the fs to mount readonly. I only noticed it somewhat later.
>
> I've since rebooted, and I can mount the filesystem OK, but after some
> time (I presume caused by reads or writes) it once again switches to
> readonly.
>
> I tried unmounting/remounting again and running a scrub, but the scrub
> aborts after some time.
>
>


I've compiled the newest btrfs-tools version 4.10.2

This is what I get when running a btrfsck -p /dev/sda

Checking filesystem on /dev/sda
UUID: 8c4f8e26-3442-463f-ad8a-668dfef02593
incorrect offsets 8590 1258314415
bad block 38666170826752

ERROR: errors found in extent allocation tree or chunk allocation
Speicherzugriffsfehler

For non-German speakers: Speicherzugriffsfehler = Memory Access Error (i.e. a segmentation fault)

Dmesg shows this:

Apr 03 15:47:05 atlas kernel: btrfs[9140]: segfault at 9476b99e ip 0044c459 sp 7fff556b4b10 error 4 in btrfs[40+9d000]





Need some help: "BTRFS critical (device sda): corrupt leaf, slot offset bad: block"

2017-04-03 Thread Robert Krig
Hi guys, I seem to have run into a spot of trouble with my btrfs partition.

I've got 4 x 8TB in a RAID1 BTRFS configuration.

I'm running Debian Jessie 64 Bit, 4.9.0-0.bpo.2-amd64 kernel. Btrfs
progs version v4.7.3

Server has 8GB of Ram.


I was running duperemove using a hashfile, which seemed to have run out
of space and aborted. Then I tried a balance operation, with -dusage
progressively set to 0 1 5 15 30 50, which then aborted, I presume that
this caused the fs to mount readonly. I only noticed it somewhat later.

I've since rebooted, and I can mount the filesystem OK, but after some
time (I presume caused by reads or writes) it once again switches to
readonly.

I tried unmounting/remounting again and running a scrub, but the scrub
aborts after some time.


Here is the output from the kernel when the partition crashes:

Apr 03 11:32:57 atlas kernel: BTRFS info (device sda): The free space cache file (37732863967232) is invalid. skip it
Apr 03 11:33:46 atlas kernel: BTRFS critical (device sda): corrupt leaf, slot offset bad: block=38666170826752, root=1, slot=157
Apr 03 11:33:46 atlas kernel: [ cut here ]
Apr 03 11:33:46 atlas kernel: WARNING: CPU: 0 PID: 17810 at /home/zumbi/linux-4.9.13/fs/btrfs/extent-tree.c:6961 __btrfs_free_extent.isra.69+0x152/0xd60 [b
Apr 03 11:33:46 atlas kernel: BTRFS: Transaction aborted (error -5)
Apr 03 11:33:46 atlas kernel: Modules linked in: xt_multiport iptable_filter ip_tables x_tables binfmt_misc cpufreq_userspace cpufreq_conservative cpufreq_
Apr 03 11:33:46 atlas kernel:  ppdev lp parport autofs4 btrfs xor raid6_pq dm_mod md_mod fuse sg sd_mod ahci libahci libata crc32c_intel scsi_mod fan therm
Apr 03 11:33:46 atlas kernel: CPU: 0 PID: 17810 Comm: mc Not tainted 4.9.0-0.bpo.2-amd64 #1 Debian 4.9.13-1~bpo8+1
Apr 03 11:33:46 atlas kernel: Hardware name: ASUS All Series/H87M-E, BIOS 0703 10/30/2013
Apr 03 11:33:46 atlas kernel:   97d29cd5 b8ab4bb53a50
Apr 03 11:33:46 atlas kernel:  97a778a4 154c080b2000 b8ab4bb53aa8 8908ad438b40
Apr 03 11:33:46 atlas kernel:  890951b96000  89086c3d4000 97a7791f
Apr 03 11:33:46 atlas kernel: Call Trace:
Apr 03 11:33:46 atlas kernel:  [] ? dump_stack+0x5c/0x77
Apr 03 11:33:46 atlas kernel:  [] ? __warn+0xc4/0xe0
Apr 03 11:33:46 atlas kernel:  [] ? warn_slowpath_fmt+0x5f/0x80
Apr 03 11:33:46 atlas kernel:  [] ? __btrfs_free_extent.isra.69+0x152/0xd60 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? __btrfs_run_delayed_refs+0x466/0x1360 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? set_extent_buffer_dirty+0x64/0xb0 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? btrfs_run_delayed_refs+0x8f/0x2b0 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? btrfs_should_end_transaction+0x3f/0x60 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? btrfs_truncate_inode_items+0x63a/0xde0 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? btrfs_evict_inode+0x4a2/0x5f0 [btrfs]
Apr 03 11:33:46 atlas kernel:  [] ? evict+0xb6/0x180
Apr 03 11:33:46 atlas kernel:  [] ? do_unlinkat+0x148/0x300
Apr 03 11:33:46 atlas kernel:  [] ? system_call_fast_compare_end+0xc/0x9b
Apr 03 11:33:46 atlas kernel: ---[ end trace 2a45c2819ff7b785 ]---
Apr 03 11:33:46 atlas kernel: BTRFS: error (device sda) in __btrfs_free_extent:6961: errno=-5 IO failure
Apr 03 11:33:46 atlas kernel: BTRFS info (device sda): forced readonly
Apr 03 11:33:46 atlas kernel: BTRFS: error (device sda) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
Apr 03 11:33:50 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30
Apr 03 11:33:50 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30
Apr 03 11:33:52 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30
Apr 03 11:33:53 atlas kernel: BTRFS warning (device sda): Skipping commit of aborted transaction.
Apr 03 11:33:53 atlas kernel: BTRFS: error (device sda) in cleanup_transaction:1850: errno=-5 IO failure
Apr 03 11:33:53 atlas kernel: BTRFS info (device sda): delayed_refs has NO entry
Apr 03 11:33:54 atlas kernel: BTRFS warning (device sda): failed setting block group ro, ret=-30



I tried running a btrfs-debug-tree -b 38666170826752 /dev/sda

btrfs-progs v4.7.3
leaf 38666170826752 items 199 free space 1506 generation 1248226 owner 2
fs uuid 8c4f8e26-3442-463f-ad8a-668dfef02593
chunk uuid 1f04f64e-0ec8-4b39-83d9-a2df75179d3e
    item 0 key (23416295448576 EXTENT_ITEM 36864) itemoff 16230 itemsize 53

Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-03-07 Thread Lukas Tribus


On 07.03.2017 at 15:12, Hans van Kranenburg wrote:

> On 03/05/2017 11:50 PM, Lukas Tribus wrote:
>> I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs
>> inspect-internal dump-tree.
>> But I cannot find anything about 5242107641856 in the dump-tree output.
>>
>> What does that mean?
>
> I have no idea. It probably means it's gone. Did you use the filesystem
> read/write? Are the symptoms also gone?



Well, I read basically everything and copied it to other drives. Nothing
appears corrupted from what I can tell. I didn't write to the pool
consciously, although I did not mount it read-only either, now that I
think about it ...


btrfs check --readonly reports block corruption (and a number of "no 
inode ref" in files/folders):


Checking filesystem on /dev/mapper/sda3_crypt
UUID: f50f980e-7640-49c7-bf8d-20d55cfe6005
The following tree block(s) is corrupted in tree 261:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

The following tree block(s) is corrupted in tree 263:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

The following tree block(s) is corrupted in tree 6685:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

The following tree block(s) is corrupted in tree 6879:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

The following tree block(s) is corrupted in tree 6893:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

The following tree block(s) is corrupted in tree 6896:
tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)

found 4080263675904 bytes used err is 1
total csum bytes: 0
total tree bytes: 181780480
total fs tree bytes: 0
total extent tree bytes: 178765824
btree space waste bytes: 49102341
file data blocks allocated: 1545338880
 referenced 1545338880


Not sure how btrfs check finds a corrupted block that doesn't appear in 
the dump-tree output.



And I had an additional stack trace on the new btrfs pool I was copying 
the data to:


[873067.780479] BTRFS error (device sdf3): bdev /dev/sdf3 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[873067.790639] BTRFS error (device sdf3): bdev /dev/sdf3 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
[873067.800708] [ cut here ]
[873067.800727] WARNING: CPU: 3 PID: 12942 at /build/linux-hwe-6_oOe5/linux-hwe-4.8.0/fs/btrfs/extent-tree.c:6954 __btrfs_free_extent.isra.71+0x2cb/0xcc0 [btrfs]
[873067.800730] BTRFS: Transaction aborted (error -5)
[873067.800731] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp nls_iso8859_1 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bridge stp llc intel_rapl_perf serio_raw lpc_ich joydev shpchp nuvoton_cir input_leds mei_me mei rc_core mac_hid ie31200_edac edac_core ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 uas usb_storage btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid mxm_wmi aesni_intel aes_x86_64 i915 glue_helper lrw i2c_algo_bit ablk_helper tg3 cryptd drm_kms_helper syscopyarea sysfillrect
[873067.800782]  firewire_ohci ptp sysimgblt psmouse firewire_core fb_sys_fops crc_itu_t pps_core ahci drm libahci wmi fjes video
[873067.800791] CPU: 3 PID: 12942 Comm: screen Tainted: G W 4.8.0-39-generic #42~16.04.1-Ubuntu
[873067.800791] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[873067.800793]  0200 f56bf709 880259f1f908 8142e043
[873067.800795]  880259f1f958  880259f1f948 8108313b
[873067.800797]  1b2a59f1faa0 fffb 01cda76bc000 8802a9fe0d20
[873067.800798] Call Trace:
[873067.800803]  [] dump_stack+0x63/0x90
[873067.800805]  [] __warn+0xcb/0xf0
[873067.800807]  [] warn_slowpath_fmt+0x5f/0x80
[873067.800821]  [] __btrfs_free_extent.isra.71+0x2cb/0xcc0 [btrfs]
[873067.800836]  [] ? btrfs_merge_delayed_refs+0x8f/0x6a0 [btrfs]
[873067.800846]  [] __btrfs_run_delayed_refs+0xb10/0x12c0 [btrfs]
[873067.800857]  [] ? set_page_dirty+0x58/0xb0
[873067.800869]  [] ? set_extent_buffer_dirty+0x78/0xd0 [btrfs]
[873067.800879]  [] btrfs_run_delayed_refs+0x8e/0x2b0 [btrfs]
[873067.800890]  [] commit_cowonly_roots+0xae/0x300 [btrfs]
[873067.800901]  [] ? btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
[873067.800911]  [] btrfs_commit_transaction+0x573/0xb00 [btrfs]
[873067.800920]  [] ? start_transaction+0x9e/0x4c0 [btrfs]
[873067.800930]  [] btrfs_commit_super+0x8f/0xa0 [btrfs]
[873067.800939]  [] close_ctree+0x2b7/0x360 [btrfs]
[873067.800947]  [] btrfs_put_super+0x19/0x20 [btrfs]
[873067.800949]  [] 

Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-03-07 Thread Hans van Kranenburg
On 03/05/2017 11:50 PM, Lukas Tribus wrote:
> 
> On 24.02.2017 at 01:26, Hans van Kranenburg wrote:
>>
>>> Once that is done, I would like to go over the "btrfs recovery" thread
>>> and see if it can be applied to my case as well. I will certainly need
>>> your help when that time comes...
>> We can take a stab at it.
> 
> I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs
> inspect-internal dump-tree.
> But I cannot find anything about 5242107641856 in the dump-tree output.
> 
> What does that mean?

I have no idea. It probably means it's gone. Did you use the filesystem
read/write? Are the symptoms also gone?

-- 
Hans van Kranenburg


Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-03-05 Thread Lukas Tribus

Hello Hans,

On 24.02.2017 at 01:26, Hans van Kranenburg wrote:
>
>> Once that is done, I would like to go over the "btrfs recovery" thread
>> and see if it can be applied to my case as well. I will certainly need
>> your help when that time comes...
> We can take a stab at it.

I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs
inspect-internal dump-tree.

But I cannot find anything about 5242107641856 in the dump-tree output.

What does that mean?

Thanks,
Lukas



Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-23 Thread Hans van Kranenburg
On 02/24/2017 12:47 AM, Lukas Tribus wrote:
> Hello Hans,
> 
> 
> On 22.02.2017 at 20:40, Hans van Kranenburg wrote:
>>
>> Question here is... is it easier for you to nuke the filesystem and
>> restore the files from somewhere else, or do you want to figure out
>> manually if it's recoverable, and spend some time with dd, hexedit,
>> reading struct definitions in btrfs kernel C code etc...
>>
>> If the regular --repair can't fix it (and it can't do magic if you shoot
>> a hole in it with a shotgun), then there's no automated other tool that
>> can do it now.
>>
>> Since it's block 5242107641856 all the time, it might be worthwhile to
>> have a look at it. Either it's that block, or there's a bigger mess
>> hidden behind it.
>>
> 
> Thanks for all the inputs here and on IRC. I now have a good
> understanding of what can and what cannot be done realistically.
> 
> The files are still fully readable and I'm going to back up as much data
> as I can over the next few days.
> 
> Once that is done, I would like to go over the "btrfs recovery" thread
> and see if it can be applied to my case as well. I will certainly need
> your help when that time comes...

We can take a stab at it.

-- 
Hans van Kranenburg


Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-23 Thread Lukas Tribus

Hello Hans,

On 22.02.2017 at 20:40, Hans van Kranenburg wrote:
>
> Question here is... is it easier for you to nuke the filesystem and
> restore the files from somewhere else, or do you want to figure out
> manually if it's recoverable, and spend some time with dd, hexedit,
> reading struct definitions in btrfs kernel C code etc...
>
> If the regular --repair can't fix it (and it can't do magic if you shoot
> a hole in it with a shotgun), then there's no automated other tool that
> can do it now.
>
> Since it's block 5242107641856 all the time, it might be worthwhile to
> have a look at it. Either it's that block, or there's a bigger mess
> hidden behind it.
>

Thanks for all the inputs here and on IRC. I now have a good
understanding of what can and what cannot be done realistically.

The files are still fully readable and I'm going to back up as much data
as I can over the next few days.

Once that is done, I would like to go over the "btrfs recovery" thread
and see if it can be applied to my case as well. I will certainly need
your help when that time comes...

Thanks for all your help,
Lukas



Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-22 Thread Hans van Kranenburg
On 02/22/2017 08:44 AM, Lukas Tribus wrote:
> Upgrading to 4.8, the FS no longer causes a kernel calltrace and does
> not go read-only. It only shows the "corrupt leaf, slot offset bad"
> message.
> 
> A scrub completed without errors on 3 devices, while it was aborted on 2
> devices. Not sure why it was aborted, since there is no error message in
> dmesg?
> 
> Any suggestions why the scrub was aborted?

Maybe because of the "corrupt leaf" error.

> # uname -a
> Linux srv1-dom0 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5
> 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> # btrfs scrub status /storage/users/
> scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
> scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 06:35:42
> total bytes scrubbed: 10.60TiB with 0 errors
> /# btrfs scrub status /storage/users/ -d
> scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
> scrub device /dev/dm-5 (id 1) history
> scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:36
> total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-6 (id 2) history
> scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:30
> total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-7 (id 3) history
> scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:42
> total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-8 (id 4) history
> scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 05:01:37
> total bytes scrubbed: 1.85TiB with 0 errors
> scrub device /dev/mapper/sde3_crypt (id 5) history
> scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 05:01:37
>     total bytes scrubbed: 1.85TiB with 0 errors
> #dmesg | grep BTRFS
> [  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19777.127704] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19777.552191] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39

Ok, this is not a csum failure, so it's probably not the disk returning
different data than what was sent to it during the writes, or a disk
controller that corrupted the data while writing.

And it's a metadata page in which some of the entries no longer make
sense to btrfs. Specifically, it's in root 1, which is the tree that
contains information about all other subtrees containing metadata,
so it's quite an important one.
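Numeric IDs such as `root=1` in the kernel messages, `owner 5` in the dump-tree output, or key type 169 in the `btrfs check` node keys are easier to follow with names attached. A partial lookup sketch, assuming the tree object IDs and key types as defined in the btrfs on-disk format headers; only a subset is listed, and the helper names `tree_name` and `key_type_name` are hypothetical:

```python
# Partial lookup tables for reading btrfs log and dump output.
# Values follow the btrfs on-disk format headers; both tables are incomplete,
# and the special negative IDs (log/relocation trees) are not handled.

TREE_OBJECTIDS = {
    1: "ROOT_TREE",        # tree of tree roots ("root=1" in the kernel messages)
    2: "EXTENT_TREE",
    3: "CHUNK_TREE",
    4: "DEV_TREE",
    5: "FS_TREE",          # top-level subvolume ("owner 5")
    6: "ROOT_TREE_DIR",
    7: "CSUM_TREE",
    8: "QUOTA_TREE",
    9: "UUID_TREE",
    10: "FREE_SPACE_TREE",
}

KEY_TYPES = {
    1: "INODE_ITEM",
    12: "INODE_REF",
    84: "DIR_ITEM",
    96: "DIR_INDEX",
    108: "EXTENT_DATA",
    132: "ROOT_ITEM",
    168: "EXTENT_ITEM",
    169: "METADATA_ITEM",
}

FIRST_FREE_OBJECTID = 256  # subvolume and snapshot trees are numbered from here


def tree_name(objectid: int) -> str:
    """Map a tree/owner ID to a readable name."""
    if objectid in TREE_OBJECTIDS:
        return TREE_OBJECTIDS[objectid]
    if objectid >= FIRST_FREE_OBJECTID:
        return f"subvolume {objectid}"
    return f"unknown tree {objectid}"


def key_type_name(key_type: int) -> str:
    """Map a key type to its name, mimicking the UNKNOWN.N labels in dump output."""
    return KEY_TYPES.get(key_type, f"UNKNOWN.{key_type}")


if __name__ == "__main__":
    print(tree_name(1))        # ROOT_TREE
    print(tree_name(261))      # subvolume 261
    print(key_type_name(169))  # METADATA_ITEM
```

Roots such as 261, 263 or 6685 in the `btrfs check` output fall in the subvolume range under this mapping.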

So, the corruption which is now present in there likely happened in
memory before writing it out. This is also a scenario in which DUP or
RAIDx on disk doesn't help you, because in memory it's stored just once.
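The argument about where the damage happened can be illustrated with a toy model of checksumming. This is a sketch, not btrfs code: it assumes CRC-32C (btrfs's default checksum) over a made-up buffer rather than the real on-disk block layout, and the helpers `write_block` and `read_ok` are hypothetical. A bit flipped before the checksum is computed yields a block that verifies cleanly, and every RAID1 or DUP copy written from that buffer carries the same damage:

```python
from typing import Tuple

CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    """Plain bitwise CRC-32C (slow, but easy to check by eye)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def flip_bit(buf: bytearray, bit: int) -> None:
    """Simulate a single-bit error at the given bit position."""
    buf[bit // 8] ^= 1 << (bit % 8)


def write_block(in_memory: bytearray) -> Tuple[bytes, int]:
    """Checksum whatever is in memory right now, then 'write' it out."""
    return bytes(in_memory), crc32c(in_memory)


def read_ok(block: bytes, stored_csum: int) -> bool:
    """All that a read-time checksum verification (or a scrub) can tell us."""
    return crc32c(block) == stored_csum


if __name__ == "__main__":
    metadata = bytearray(b"leaf item data " * 64)

    # Case 1: the bit flips in RAM *before* the checksum is computed.
    # The written block and its checksum agree, so only a structural
    # check of the contents can notice the damage.
    buf = bytearray(metadata)
    flip_bit(buf, 1000)
    block, csum = write_block(buf)
    print("flipped before checksum, verifies:", read_ok(block, csum))  # True

    # Case 2: the bit flips *after* checksumming (e.g. on the disk).
    # The checksum catches it, and a good mirror copy could be used instead.
    block, csum = write_block(bytearray(metadata))
    damaged = bytearray(block)
    flip_bit(damaged, 1000)
    print("flipped after checksum, verifies:", read_ok(bytes(damaged), csum))  # False
```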

If this is a bitflip-like thing in memory, it would probably be possible
to spot it and manually correct it (using a patched btrfsck with the
bitflip patch, or manually by hexediting++).
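One quick way to test the bitflip theory is to XOR the value found on disk with the value it should have had and count the differing bits. A sketch with hypothetical helper names (this is not the btrfs-progs bitflip patch mentioned above): applied to the `incorrect offsets 14927 14415` pair from the `btrfs check` output in this thread, the two numbers differ in exactly one bit, which is consistent with, but does not prove, a single flipped bit.

```python
from typing import List


def flipped_bits(observed: int, expected: int) -> List[int]:
    """Positions (0 = least significant) of the bits that differ."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bitflip(observed: int, expected: int) -> bool:
    return len(flipped_bits(observed, expected)) == 1


if __name__ == "__main__":
    # The pair reported by btrfs check as "incorrect offsets 14927 14415".
    print(hex(14927), hex(14415))           # 0x3a4f 0x384f
    print(flipped_bits(14927, 14415))       # [9]
    print(is_single_bitflip(14927, 14415))  # True
```

The same reasoning underlies David Sterba's reading of `itemsize 131133` elsewhere in these threads: `131133 ^ 61 == 0x20000`, a single bit.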

Another option is memory corruption or a bug somewhere else in the
kernel, which changed the address held in a pointer, so that a write to
memory ended up in the middle of some btrfs metadata waiting to be
checksummed and written to disk.

Question here is... is it easier for you to nuke the filesystem and
restore the files from somewhere else, or do you want to figure out
manually if it's recoverable, and spend some time with dd, hexedit,
reading struct definitions in btrfs kernel C code etc...

If the regular --repair can't fix it (and it can't do magic if you shoot
a hole in it with a shotgun), then there's no automated other tool that
can do it now.

Since it's block 5242107641856 all the time, it might be worthwhile to
have a look at it. Either it's that block, or there's a bigger mess
hidden behind it.

-- 
Hans van Kranenburg


Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-22 Thread Lukas Tribus

I did a "btrfs check" (--readonly):

Summary:
589x filetype 1 errors 4, no inode ref (--> Files)
597x filetype 2 errors 4, no inode ref (--> Directories)
1183x root xxx inode YY errors 2001, no inode item, link count wrong
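A tally like the summary above can be produced mechanically from the `btrfs check` output. A sketch, assuming the message formats shown in this output (they may differ between btrfs-progs versions) and taking filetype 1 as a regular file and 2 as a directory, as in the summary; `summarize` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# "root 261 inode 127095 errors 2001, no inode item, link count wrong"
INODE_ERR = re.compile(r"^root (\d+) inode (\d+) errors (\w+), (.+)$")
# "... name Whateverdir123456 filetype 2 errors 4, no inode ref"
UNRESOLVED = re.compile(r"filetype (\d+) errors (\w+), (.+)$")

FILETYPES = {1: "file", 2: "directory"}  # the two types seen in this output


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Tally inode errors by (code, text) and unresolved refs by file type."""
    inode_errors: Counter = Counter()
    unresolved: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        m = INODE_ERR.match(line)
        if m:
            inode_errors[(m.group(3), m.group(4))] += 1
            continue
        m = UNRESOLVED.search(line)
        if m:
            kind = FILETYPES.get(int(m.group(1)), f"type {m.group(1)}")
            unresolved[(kind, m.group(2), m.group(3))] += 1
    return inode_errors, unresolved


if __name__ == "__main__":
    sample = """\
root 261 inode 127095 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 13 namelen 17 name Whateverdir123456 filetype 2 errors 4, no inode ref
root 261 inode 127097 errors 2001, no inode item, link count wrong
unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 51.0.1.exe filetype 1 errors 4, no inode ref
"""
    inode_errors, unresolved = summarize(sample.splitlines())
    for key, count in inode_errors.items():
        print(count, key)
    for key, count in unresolved.items():
        print(count, key)
```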

I looked at a handful of reported files which are verifiable via public
MD5/SHA1 checksums, and they are not corrupted: the checksums match.


Any hints or suggestions would be much appreciated, please see below for 
the btrfs check output (repeating lines omitted and some filenames 
redacted):


Checking filesystem on /dev/dm-9
UUID: f50f980e-7640-49c7-bf8d-20d55cfe6005
checking extents [.]
[...]
incorrect offsets 14927 14415
bad block 5242107641856

Errors found in extent allocation tree or chunk allocation
checking free space cache [.]
[...]
checking fs roots [.]
[...]
incorrect offsets 14927 14415
incorrect offsets 14927 14415
root 261 inode 127094 errors 500, file extent discount, nbytes wrong
Found file extent holes:
start: 0, len: 499712
unresolved ref dir 127093 index 2 namelen 24 name ABC DE Fghij 
Klmnopr.tuv filetype 1 errors 4, no inode ref

root 261 inode 127095 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 13 namelen 17 name 
Whateverdir123456 filetype 2 errors 4, no inode ref

root 261 inode 127097 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 14 namelen 12 name 
WhateverDirectory2 filetype 2 errors 4, no inode ref

root 261 inode 127099 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 15 namelen 11 name AnyDir filetype 
2 errors 4, no inode ref

root 261 inode 127105 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 16 namelen 10 name AnotherDir 
filetype 2 errors 4, no inode ref

root 261 inode 127107 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 17 namelen 11 name Folder11 
filetype 2 errors 4, no inode ref

root 261 inode 127112 errors 2001, no inode item, link count wrong
unresolved ref dir 126959 index 51 namelen 11 name Folder120 
filetype 2 errors 4, no inode ref

root 261 inode 127114 errors 2001, no inode item, link count wrong
unresolved ref dir 126146 index 40 namelen 13 name GVC-dir filetype 
2 errors 4, no inode ref

root 261 inode 127396 errors 2001, no inode item, link count wrong
unresolved ref dir 126146 index 41 namelen 4 name G3-dir filetype 2 
errors 4, no inode ref

root 261 inode 127527 errors 2001, no inode item, link count wrong
unresolved ref dir 126146 index 42 namelen 11 name Hello Dir 2 
filetype 2 errors 4, no inode ref

root 261 inode 127535 errors 2001, no inode item, link count wrong
unresolved ref dir 126146 index 43 namelen 4 name Hellodir filetype 
2 errors 4, no inode ref

root 261 inode 127573 errors 2001, no inode item, link count wrong
unresolved ref dir 126146 index 44 namelen 6 name Hello 2 filetype 
2 errors 4, no inode ref

root 261 inode 127620 errors 2001, no inode item, link count wrong
[...]
root 261 inode 177273 errors 2001, no inode item, link count wrong
unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 
51.0.1.exe filetype 1 errors 4, no inode ref

root 261 inode 177275 errors 2001, no inode item, link count wrong
unresolved ref dir 23439 index 26 namelen 27 name Firefox Setup 
45.7.0esr.exe filetype 1 errors 4, no inode ref

root 261 inode 180457 errors 2001, no inode item, link count wrong
[...]
checking fs roots [o]
incorrect offsets 14927 14415
checking fs roots [.]
[...]
checking fs roots [o]
The following tree block(s) is corrupted in tree 263:
tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)

checking fs roots [o]
incorrect offsets 14927 14415
checking fs roots [O]
The following tree block(s) is corrupted in tree 6685:
tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)

checking fs roots [o]
checking fs roots [.]
incorrect offsets 14927 14415
The following tree block(s) is corrupted in tree 6879:
tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)

checking fs roots [o]
incorrect offsets 14927 14415
incorrect offsets 14927 14415
root 6893 inode 127094 errors 500, file extent discount, nbytes wrong
Found file extent holes:
start: 0, len: 499712
unresolved ref dir 127093 index 2 namelen 24 name ABC DE Fghij 
Klmnopr.tuv filetype 1 errors 4, no inode ref

root 6893 inode 127095 errors 2001, no inode item, link count wrong
unresolved ref dir 127080 index 13 namelen 17 name 
Whateverdir123456 filetype 2 errors 4, no inode ref

root 6893 inode 127097 errors 2001, no inode item, link count wrong
[...]
root 6893 inode 177273 errors 2001, no inode item, link count wrong
unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 
51.0.1.exe filetype 1 errors 4, no inode ref

root 6893 inode 177275 errors 2001, no inode item, link count wrong
unresolved ref dir 23439 index 26 

Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-21 Thread Lukas Tribus
After upgrading to 4.8, the FS no longer causes a kernel calltrace and does
not go read-only. It only shows the "corrupt leaf, slot offset bad" message.

A scrub completed without errors on 3 devices, while it was aborted on 2
devices. Not sure why it was aborted, since there is no error message in
dmesg?

Any suggestions why the scrub was aborted?

# uname -a
Linux srv1-dom0 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# btrfs scrub status /storage/users/
scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 06:35:42
total bytes scrubbed: 10.60TiB with 0 errors
/# btrfs scrub status /storage/users/ -d
scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
scrub device /dev/dm-5 (id 1) history
scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:36
total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-6 (id 2) history
scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:30
total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-7 (id 3) history
scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:42
total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-8 (id 4) history
scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 05:01:37
total bytes scrubbed: 1.85TiB with 0 errors
scrub device /dev/mapper/sde3_crypt (id 5) history
scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 05:01:37
total bytes scrubbed: 1.85TiB with 0 errors
#dmesg | grep BTRFS
[  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39
[19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39
[19777.127704] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39
[19777.552191] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39

#
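Whether the errors always point at the same tree block (here 5242107641856) is easy to see by eye in four lines, but a longer log is easier to group mechanically. A sketch, assuming the message wording shown in this thread (it varies between kernel versions); `corrupt_leaves` is a hypothetical helper:

```python
import re
from collections import Counter

# Matches both "(device dm-9)" and "(device: sdb2)" style messages.
CORRUPT_LEAF = re.compile(
    r"BTRFS critical \(device:? ([^)]+)\): corrupt leaf, (.+?): "
    r"block=(\d+),\s*root=(\d+),\s*slot=(\d+)"
)


def corrupt_leaves(dmesg_text: str) -> Counter:
    """Count 'corrupt leaf' messages per (device, reason, block, root, slot)."""
    # Collapse all whitespace first so messages wrapped across lines
    # (as in mail archives) still match.
    text = " ".join(dmesg_text.split())
    hits: Counter = Counter()
    for dev, reason, block, root, slot in CORRUPT_LEAF.findall(text):
        hits[(dev, reason, int(block), int(root), int(slot))] += 1
    return hits


if __name__ == "__main__":
    log = """\
[  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset
bad: block=5242107641856,root=1, slot=39
[19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset
bad: block=5242107641856,root=1, slot=39
"""
    for key, count in corrupt_leaves(log).items():
        print(count, key)
```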



BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-02-21 Thread Lukas Tribus

Hi list!

I have a btrfs pool consisting of 5x 2.72 TiB LUKS (dm-crypt) partitions
in RAID1, mounted on Linux 4.4 with btrfs-progs 4.4. I never had any
crashes or power loss here, but recently, about every 60 - 120 minutes
(while in use), btrfs detects corruption, aborts the transaction and
drops to read-only mode.
btrfs still mounts normally without any special options (it does take
about 60 seconds, which I guess is normal for this kind of size). All
LUKS partitions have at least 400GiB of free space.

I don't see any HW problems here; I doubt the corruption is coming
from the LUKS partitions. I did test the RAM, and it seems fine in
multiple memtest86+ and memtest86 runs.

Are there any known bugs in 4.4? Any suggestions would be greatly
appreciated!

I have to admit I did not scrub regularly.

Thanks,
Lukas


---
~# uname -a
Linux srv1-dom0 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux

~# btrfs --version
btrfs-progs v4.4
~# btrfs fi show
Label: 'dom0-os'  uuid: e475636c-21e0-4563-87d6-91f03c519a62
Total devices 5 FS bytes used 3.52GiB
devid1 size 10.00GiB used 3.53GiB path /dev/sda2
devid2 size 10.00GiB used 4.25GiB path /dev/sdb2
devid3 size 10.00GiB used 3.28GiB path /dev/sdc2
devid4 size 10.00GiB used 4.00GiB path /dev/sdd2
devid5 size 10.00GiB used 4.00GiB path /dev/sde2

Label: 'storage_pool'  uuid: f50f980e-7640-49c7-bf8d-20d55cfe6005
Total devices 5 FS bytes used 5.77TiB
devid1 size 2.72TiB used 2.31TiB path /dev/mapper/sda3_crypt
devid2 size 2.72TiB used 2.31TiB path /dev/mapper/sdb3_crypt
devid3 size 2.72TiB used 2.31TiB path /dev/mapper/sdc3_crypt
devid4 size 2.72TiB used 2.31TiB path /dev/mapper/sdd3_crypt
devid5 size 2.72TiB used 2.31TiB path /dev/mapper/sde3_crypt
~# btrfs fi df /storage/users/
Data, RAID1: total=5.77TiB, used=5.76TiB
System, RAID1: total=32.00MiB, used=832.00KiB
Metadata, RAID1: total=8.00GiB, used=6.96GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
~#

~#

partial dmesg:
[ 1509.033492] BTRFS: device label storage_pool devid 1 transid 238135 
/dev/dm-5
[ 1510.498804] BTRFS: device label storage_pool devid 2 transid 238135 
/dev/dm-6
[ 1511.980968] BTRFS: device label storage_pool devid 3 transid 238135 
/dev/dm-7
[ 1513.461799] BTRFS: device label storage_pool devid 4 transid 238135 
/dev/dm-8
[ 1514.838757] BTRFS: device label storage_pool devid 5 transid 238135 
/dev/dm-9

[ 1517.726471] BTRFS info (device dm-9): btrfs: use no compression
[ 1517.726477] BTRFS info (device dm-9): disk space caching is enabled
[ 1517.726479] BTRFS: has skinny extents
[ 1569.598633] BTRFS: checking UUID tree
[ 3540.825747] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39
[ 3540.836168] BTRFS critical (device dm-9): corrupt leaf, slot offset 
bad: block=5242107641856,root=1, slot=39

[ 3540.846413] [ cut here ]
[ 3540.846432] WARNING: CPU: 2 PID: 2757 at 
/build/linux-mPTI9s/linux-4.4.0/fs/btrfs/extent-tree.c:2930 
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()

[ 3540.846433] BTRFS: Transaction aborted (error -5)
[ 3540.846434] Modules linked in: algif_skcipher af_alg xen_gntdev 
xen_evtchn xenfs xen_privcmd drbg ansi_cprng dm_crypt nls_iso8859_1 
bridge stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw joydev 
input_leds nuvoton_cir 8250_fintek ie31200_edac mac_hid rc_core lpc_ich 
edac_core shpchp mei_me mei ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad 
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid mxm_wmi i915 i2c_algo_bit drm_kms_helper 
aesni_intel aes_x86_64 glue_helper syscopyarea sysfillrect firewire_ohci 
sysimgblt firewire_core fb_sys_fops lrw psmouse
[ 3540.846466]  tg3 gf128mul ablk_helper cryptd crc_itu_t ptp ahci drm 
pps_core libahci fjes wmi video
[ 3540.846473] CPU: 2 PID: 2757 Comm: btrfs-transacti Not tainted 
4.4.0-63-generic #84-Ubuntu
[ 3540.846475] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[ 3540.846476]  0200 02709bc3 88007615fc90 
813f8083
[ 3540.846478]  88007615fcd8 c048d498 88007615fcc8 
810812d2
[ 3540.846479]  8802adf562f8 8802a9c71800 8800056caef0 


[ 3540.846481] Call Trace:
[ 3540.846486]  [] dump_stack+0x63/0x90
[ 3540.846489]  [] warn_slowpath_common+0x82/0xc0
[ 3540.846491]  [] warn_slowpath_fmt+0x5c/0x80
[ 3540.846500]  [] ? 
__btrfs_run_delayed_refs+0xcdd/0x1220 [btrfs]
[ 3540.846509]  [] btrfs_run_delayed_refs+0x26b/0x2a0 
[btrfs]
[ 3540.846520]  [] commit_cowonly_roots+0x22b/0x2c2

Re: corrupt leaf, slot offset bad

2016-11-14 Thread Kai Krakow
On Tue, 11 Oct 2016 07:09:49 -0700,
Liu Bo wrote:

> On Tue, Oct 11, 2016 at 02:48:09PM +0200, David Sterba wrote:
> > Hi,
> > 
> > looks like a lot of random bitflips.
> > 
> > On Mon, Oct 10, 2016 at 11:50:14PM +0200, a...@aron.ws wrote:  
> > > item 109 has a few strange chars in its name (and it's
> > > truncated): 1-x86_64.pkg.tar.xz 0x62 0x14 0x0a 0x0a
> > > 
> > >   item 105 key (261 DIR_ITEM 54556048) itemoff 11723
> > > itemsize 72 location key (606286 INODE_ITEM 0) type FILE
> > >   namelen 42 datalen 0 name:
> > > python2-gobject-3.20.1-1-x86_64.pkg.tar.xz item 106 key (261
> > > DIR_ITEM 56363628) itemoff 11660 itemsize 63 location key (894298
> > > INODE_ITEM 0) type FILE namelen 33 datalen 0 name:
> > > unrar-1:5.4.5-1-x86_64.pkg.tar.xz item 107 key (261 DIR_ITEM
> > > 66963651) itemoff 11600 itemsize 60 location key (1178 INODE_ITEM
> > > 0) type FILE namelen 30 datalen 0 name:
> > > glibc-2.23-5-x86_64.pkg.tar.xz item 108 key (261 DIR_ITEM
> > > 68561395) itemoff 11532 itemsize 68 location key (660578
> > > INODE_ITEM 0) type FILE namelen 38 datalen 0 name:
> > > squashfs-tools-4.3-4-x86_64.pkg.tar.xz item 109 key (261 DIR_ITEM
> > > 76859450) itemoff 11483 itemsize 65 location key (2397184
> > > UNKNOWN.0 7091317839824617472) type 45 namelen 13102 datalen
> > > 13358 name: 1-x86_64.pkg.tar.xzb  
> > 
> > namelen must be smaller than 255, but the number itself does not
> > look like a bitflip (0x332e), the name looks like a fragment of.
> > 
> > The location key is random garbage, likely overwritten memory;
> > 7091317839824617472 == 0x62696c0100230000 contains ascii 'bil', the
> > key type is unknown but should be INODE_ITEM.
> >   
> > >   data
> > >   item 110 key (261 DIR_ITEM 9799832789237604651) itemoff
> > > 11405 itemsize 62
> > >   location key (388547 INODE_ITEM 0) type FILE
> > >   namelen 32 datalen 0 name:
> > > intltool-0.51.0-1-any.pkg.tar.xz item 111 key (261 DIR_ITEM
> > > 81211850) itemoff 11344 itemsize 131133  
> > 
> > itemsize 131133 == 0x2003d is a clear bitflip, 0x3d == 61,
> > corresponds to the expected item size.
> > 
> > There are possibly other random bitflips in the keys or other
> > structures. It's hard to estimate the damage and thus the scope of
> > restorable data.  
> 
> It makes sense: since this is an SSD, we may have only one copy of
> metadata.
> 
> Thanks,
> 
> -liubo

From this point of view, it doesn't make sense to store only one copy of
metadata on SSD... Judging by the other garbage, the bitflip probably
happened in RAM, so DUP metadata could have helped here.

If the SSD firmware collapses duplicate metadata into single blobs,
that's perfectly fine. If the DUP metadata arrives with bits flipped, it
won't be deduplicated. So this is fine, too.

BTW: I cannot believe that SSD firmware really does the rather expensive
job of deduplication, other than maybe internal compression. Maybe some
drives out there do, but most won't deduplicate. It's just too little
gain for too much complexity. So I personally would always switch on
duplicate metadata, even for SSD. It shouldn't add much wear if you do
the usual SSD optimizations anyway (like noatime).

PS: I suggest doing an extensive memtest86 run before trying any repairs
on this system... Are you perhaps mixing different DIMM models in
dual-channel slots? In most of the cases where I've seen bitflips, this
was the culprit...

-- 
Regards,
Kai

Replies to list-only preferred.




Re: corrupt leaf, slot offset bad

2016-10-11 Thread Liu Bo
On Tue, Oct 11, 2016 at 02:48:09PM +0200, David Sterba wrote:
> Hi,
> 
> looks like a lot of random bitflips.
> 
> On Mon, Oct 10, 2016 at 11:50:14PM +0200, a...@aron.ws wrote:
> > item 109 has a few strange chars in its name (and it's truncated): 
> > 1-x86_64.pkg.tar.xz 0x62 0x14 0x0a 0x0a
> > 
> > item 105 key (261 DIR_ITEM 54556048) itemoff 11723 itemsize 72
> > location key (606286 INODE_ITEM 0) type FILE
> > namelen 42 datalen 0 name: 
> > python2-gobject-3.20.1-1-x86_64.pkg.tar.xz
> > item 106 key (261 DIR_ITEM 56363628) itemoff 11660 itemsize 63
> > location key (894298 INODE_ITEM 0) type FILE
> > namelen 33 datalen 0 name: unrar-1:5.4.5-1-x86_64.pkg.tar.xz
> > item 107 key (261 DIR_ITEM 66963651) itemoff 11600 itemsize 60
> > location key (1178 INODE_ITEM 0) type FILE
> > namelen 30 datalen 0 name: glibc-2.23-5-x86_64.pkg.tar.xz
> > item 108 key (261 DIR_ITEM 68561395) itemoff 11532 itemsize 68
> > location key (660578 INODE_ITEM 0) type FILE
> > namelen 38 datalen 0 name: 
> > squashfs-tools-4.3-4-x86_64.pkg.tar.xz
> > item 109 key (261 DIR_ITEM 76859450) itemoff 11483 itemsize 65
> > location key (2397184 UNKNOWN.0 7091317839824617472) type 45
> > namelen 13102 datalen 13358 name: 1-x86_64.pkg.tar.xzb
> 
> namelen must be smaller than 255, but the number itself does not look
> like a bitflip (0x332e), the name looks like a fragment of.
> 
> The location key is random garbage, likely overwritten memory;
> 7091317839824617472 == 0x62696c0100230000 contains ascii 'bil', the key
> type is unknown but should be INODE_ITEM.
> 
> > data
> > item 110 key (261 DIR_ITEM 9799832789237604651) itemoff 11405 itemsize 
> > 62
> > location key (388547 INODE_ITEM 0) type FILE
> > namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz
> > item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133
> 
> itemsize 131133 == 0x2003d is a clear bitflip, 0x3d == 61, corresponds
> to the expected item size.
> 
> There are possibly other random bitflips in the keys or other structures.
> It's hard to estimate the damage and thus the scope of restorable data.

It makes sense: since this is an SSD, we may have only one copy of metadata.

Thanks,

-liubo


Re: corrupt leaf, slot offset bad

2016-10-11 Thread David Sterba
Hi,

looks like a lot of random bitflips.

On Mon, Oct 10, 2016 at 11:50:14PM +0200, a...@aron.ws wrote:
> item 109 has a few strange chars in its name (and it's truncated): 
> 1-x86_64.pkg.tar.xz 0x62 0x14 0x0a 0x0a
> 
>   item 105 key (261 DIR_ITEM 54556048) itemoff 11723 itemsize 72
>   location key (606286 INODE_ITEM 0) type FILE
>   namelen 42 datalen 0 name: 
> python2-gobject-3.20.1-1-x86_64.pkg.tar.xz
>   item 106 key (261 DIR_ITEM 56363628) itemoff 11660 itemsize 63
>   location key (894298 INODE_ITEM 0) type FILE
>   namelen 33 datalen 0 name: unrar-1:5.4.5-1-x86_64.pkg.tar.xz
>   item 107 key (261 DIR_ITEM 66963651) itemoff 11600 itemsize 60
>   location key (1178 INODE_ITEM 0) type FILE
>   namelen 30 datalen 0 name: glibc-2.23-5-x86_64.pkg.tar.xz
>   item 108 key (261 DIR_ITEM 68561395) itemoff 11532 itemsize 68
>   location key (660578 INODE_ITEM 0) type FILE
>   namelen 38 datalen 0 name: 
> squashfs-tools-4.3-4-x86_64.pkg.tar.xz
>   item 109 key (261 DIR_ITEM 76859450) itemoff 11483 itemsize 65
>   location key (2397184 UNKNOWN.0 7091317839824617472) type 45
>   namelen 13102 datalen 13358 name: 1-x86_64.pkg.tar.xzb

namelen must be smaller than 255, but the number itself does not look
like a bitflip (0x332e), the name looks like a fragment of.

The location key is random garbage, likely overwritten memory;
7091317839824617472 == 0x62696c0100230000 contains ascii 'bil', the key
type is unknown but should be INODE_ITEM.

>   data
>   item 110 key (261 DIR_ITEM 9799832789237604651) itemoff 11405 itemsize 
> 62
>   location key (388547 INODE_ITEM 0) type FILE
>   namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz
>   item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133

itemsize 131133 == 0x2003d is a clear bitflip, 0x3d == 61, corresponds
to the expected item size.

There are possibly other random bitflips in the keys or other structures.
It's hard to estimate the damage and thus the scope of restorable data.
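The items quoted above can be checked mechanically. A minimal sketch, not the kernel's checker, assuming (a) item data is packed from the end of the leaf toward its start, so each item's `itemoff` should equal the next item's `itemoff + itemsize`, which is the relationship behind "slot offset bad" as Liu Bo describes it in this thread, and (b) a DIR_ITEM is a 30-byte header followed by the name and data, which is what the intact items show (e.g. 72 = 30 + 42). The item values are copied from the dump-tree output (item 111's `namelen 31` comes from Aron's fuller dump); the names `Item`, `bad_offsets` and `bad_dir_items` are hypothetical.

```python
from typing import List, NamedTuple

DIR_ITEM_HEADER = 30  # assumed header size; items 105-108 and 110 fit it exactly
NAME_LEN_MAX = 255


class Item(NamedTuple):
    slot: int
    offset: int    # "itemoff" in dump-tree output
    size: int      # "itemsize"
    name_len: int  # "namelen"
    data_len: int  # "datalen"


# Copied from the dump-tree output in this thread (leaf 610107392).
LEAF_ITEMS = [
    Item(105, 11723, 72, 42, 0),
    Item(106, 11660, 63, 33, 0),
    Item(107, 11600, 60, 30, 0),
    Item(108, 11532, 68, 38, 0),
    Item(109, 11483, 65, 13102, 13358),
    Item(110, 11405, 62, 32, 0),
    Item(111, 11344, 131133, 31, 0),
]


def bad_offsets(items: List[Item]) -> List[int]:
    """Slots whose offset is not the end (offset + size) of the next item."""
    return [a.slot for a, b in zip(items, items[1:]) if a.offset != b.offset + b.size]


def bad_dir_items(items: List[Item]) -> List[int]:
    """Slots whose size is not header + name + data, or whose name is too long."""
    return [
        it.slot
        for it in items
        if it.name_len > NAME_LEN_MAX
        or it.size != DIR_ITEM_HEADER + it.name_len + it.data_len
    ]


if __name__ == "__main__":
    print(bad_offsets(LEAF_ITEMS))    # [108, 109, 110]
    print(bad_dir_items(LEAF_ITEMS))  # [109, 111]

    # Hypothetical repair: undo one flipped bit in each of two fields.
    fixed = [
        it._replace(offset=11467) if it.slot == 109
        else it._replace(size=61) if it.slot == 111
        else it
        for it in LEAF_ITEMS
    ]
    print(bad_offsets(fixed))  # []
```

The first offset mismatch is at slot 108, the slot in the kernel message. The "repair" is only a guess: if item 109's offset were 11467 (0x2CCB) instead of 11483 (0x2CDB), a single-bit difference, and item 111's size were 61, every offset in this range would line up. Item 109's name fields would still be garbage.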


Re: corrupt leaf, slot offset bad

2016-10-10 Thread aron

Hi liubo,

item 109 has a few strange chars in its name (and it's truncated): 
1-x86_64.pkg.tar.xz 0x62 0x14 0x0a 0x0a


item 105 key (261 DIR_ITEM 54556048) itemoff 11723 itemsize 72
location key (606286 INODE_ITEM 0) type FILE
namelen 42 datalen 0 name: 
python2-gobject-3.20.1-1-x86_64.pkg.tar.xz
item 106 key (261 DIR_ITEM 56363628) itemoff 11660 itemsize 63
location key (894298 INODE_ITEM 0) type FILE
namelen 33 datalen 0 name: unrar-1:5.4.5-1-x86_64.pkg.tar.xz
item 107 key (261 DIR_ITEM 66963651) itemoff 11600 itemsize 60
location key (1178 INODE_ITEM 0) type FILE
namelen 30 datalen 0 name: glibc-2.23-5-x86_64.pkg.tar.xz
item 108 key (261 DIR_ITEM 68561395) itemoff 11532 itemsize 68
location key (660578 INODE_ITEM 0) type FILE
namelen 38 datalen 0 name: 
squashfs-tools-4.3-4-x86_64.pkg.tar.xz
item 109 key (261 DIR_ITEM 76859450) itemoff 11483 itemsize 65
location key (2397184 UNKNOWN.0 7091317839824617472) type 45
namelen 13102 datalen 13358 name: 1-x86_64.pkg.tar.xzb

data
	item 110 key (261 DIR_ITEM 9799832789237604651) itemoff 11405 itemsize 
62

location key (388547 INODE_ITEM 0) type FILE
namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz
item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133
location key (893669 INODE_ITEM 0) type FILE
namelen 31 datalen 0 name: babl-0.1.16-1-x86_64.pkg.tar.xz
location key (388547 INODE_ITEM 0) type FILE

Thanks,
Aron

On 2016-10-10 23:03, Liu Bo wrote:


On Mon, Oct 10, 2016 at 08:57:19PM +0200, aron@aron.ws wrote:


Hi all, I've been using btrfs for a few months now without any
problems. During work, I noticed segfaults when accessing my root
directory. As my home directory contents were readable, I decided to
reboot. That was the worst decision, as now I can't copy my data off
the SSD. It seems like a memory issue. I have backups, but they're ~2
weeks old. What I did was a dd dump immediately. I have the latest kernel
and latest progs built from source now, but :S ... This is what I've got:
When mounting: BTRFS critical (device: sdb2): corrupt leaf, slot offset bad:
block=610107392,root=1, slot=108


This indicates that leaf 610107392 is corrupted somehow, because its slot
108's 'start offset in leaf' and slot 109's 'end offset in leaf' don't
match each other; the cause is not shown, though.


find-root prints nothing to stdout after 2 hours. Running btrfs
inspect-internal dump-tre> 92 /dev/sdb2 leaf 610107392 items 188 
free

spac
tion 90792 owner 5

owner 5 means that it's not a tree root leaf, > ee leaf. fs uuid
2cc75a87-b22b-448e-80d4-383a9f42deed chunk uuid

a5b09a2a-da3d-4049-91ba-4fe66932907b item 0 key (256 INODE_ITEM 0)
itemoff 16123 itemsize 160 inode generation 3 transid 90769 size 144
nbytes 16384 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 
flags

0x0(none) item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
inode ref index 0 namelen 2 name: .. item 2 key (256 DIR_ITEM
145260132) itemoff 16078 itemsize 33 location key (265 INODE_ITEM 0)
type DIR namelen 3 datalen 0 name: dev item 3 key (256 DIR_ITEM
217684952) itemoff 16045 itemsize 33 location key (266 INODE_ITEM 0)
type DIR namelen 3 datalen 0 name: run item 4 key (256 DIR_ITEM
308198373) itemoff 16011 itemsize 34 location key (257

) type DIR ...

Maybe we can check the content of item 108 and item 109 in this 
output

from
'dump-tree'?

Thanks,

-liubo


item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133
location key (893669 INODE_ITEM 0) type FILE namelen 31 datalen 0 
name:
babl-0.1.16-1-x86_64.pkg.tar.xz location key (388547 INODE_ITEM 0) 
type

FILE namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz ...
namelen 30 datalen 0 name: glibc-2.24-2-x86_64.pkg.tar.xz location 
key

(893658 INODE_ITEM 0) type FILE namelen 36 datalen 0 name:
procps-ng-3.3.12-1-x86_64.pkg.tar.xz location key (EXTENT_TREE
UNKNOWN.3 36094832640) type 12 namelen 0 datalen 0 name: location 
key

(291 UNKNOWN.0 0) type 0 namelen 0 datalen 0 name: location key
(18556457741975552 UNKNOWN.0 0) type 0 namelen 0 datalen 7134 name:
data location key (0 UNKNOWN.0 0) type 0 namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0 namelen 0 datalen 0 name: 
location

key (0 UNKNOWN.0 0) type 0 namelen 0 datalen 0 name: location key (0
UNKNOWN.0 0) type 0 namelen 0 datalen 0 name: location key (0 
UNKNOWN.0
0) type 0 namelen 0 datalen 0 name: location key (0 UNKNOWN.0 0) 
type 0
namelen 0 datalen 0 name: location key (0 UNKNOWN.0 0) type 0 
namelen 0
datalen 0 name: location key (0 UNKNOWN.0 0) type 0 namelen 0 
datalen 0

name: location key (0 UNKNOWN.0 0) type 0 namelen 0 datalen 0 name:
 segfault running restore: incorrect offsets 11532

Re: corrupt leaf, slot offset bad

2016-10-10 Thread Liu Bo
On Mon, Oct 10, 2016 at 08:57:19PM +0200, a...@aron.ws wrote:
> Hi all,
> 
> I've been using btrfs for a few months now, without any problems. During
> work, I noticed segfaults when accessing my root directory. As my home
> directory contents were readable, I decided to reboot. That was the worst
> decision, as now I can't copy my data off the SSD. It seems like a memory
> issue. I have backups, but they're ~2 weeks old. What I did was a dd dump
> immediately. I have the latest kernel and latest progs built from source now, but
> :S ...
> 
> This is what I've got:
> 
> When mounting:
> 
> BTRFS critical (device: sdb2): corrupt leaf, slot offset bad:
> block=610107392,root=1, slot=108

This indicates that leaf 610107392 is corrupted somehow, because its slot
108's 'start offset in leaf' and slot 109's 'end offset in leaf' don't
match each other; the cause is not shown, though.
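The rule described above can be checked mechanically. The Python sketch below is only an illustration, not the kernel's or btrfs-progs' actual code. It assumes the invariant as stated: item data is packed from the end of the leaf towards the front, so each slot's itemoff must equal the next slot's itemoff + itemsize. The input values are the itemoff/itemsize pairs Aron posted for slots 108 to 111 of leaf 610107392; `Item` and `slot_offset_errors` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Item(NamedTuple):
    """One item header as printed by dump-tree: slot, itemoff, itemsize."""
    slot: int
    offset: int  # itemoff: where the item's data starts (relative to the header end)
    size: int    # itemsize: length of the item's data in bytes


def slot_offset_errors(items: List[Item]) -> List[Tuple[int, int, int]]:
    """Return (slot, offset, expected) for each slot that breaks the rule
    offset[slot] == offset[slot + 1] + size[slot + 1].

    Item data grows from the end of the leaf towards the front, so each
    item's data must start exactly where the next item's data ends.
    """
    errors = []
    for cur, nxt in zip(items, items[1:]):
        expected = nxt.offset + nxt.size  # end of the next item's data
        if cur.offset != expected:
            errors.append((cur.slot, cur.offset, expected))
    return errors


# itemoff/itemsize copied from Aron's dump-tree output for leaf 610107392.
items = [
    Item(108, 11532, 68),
    Item(109, 11483, 65),
    Item(110, 11405, 62),
    Item(111, 11344, 131133),
]

for slot, offset, expected in slot_offset_errors(items):
    differing_bits = bin(offset ^ expected).count("1")
    print(f"slot {slot}: offset {offset}, expected {expected}, "
          f"{differing_bits} bit(s) differ")
```

As in the kernel message (slot=108), the first mismatch is at slot 108, and its pair (11532 vs 11548) appears to be the same one btrfs restore reports as "incorrect offsets 11532 11548". Each mismatch also happens to differ from the expected value in exactly one bit (0x10, 0x10 and 0x20000). That is consistent with, but does not prove, bit flips such as those caused by bad RAM.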

> 
> find-root prints nothing to stdout after 2 hours.
> 
> running btrfs inspect-internal dump-tree -b 610107392 /dev/sdb2

> 
> leaf 610107392 items 188 free space 1690 generation 90792 owner 5

owner 5 means that it's not a tree root leaf; it's an fs tree leaf.
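The owner field is a tree objectid. The Python lookup below translates the common values. The numbers are the well-known tree objectids from the btrfs on-disk format headers (BTRFS_ROOT_TREE_OBJECTID = 1, and so on), while `WELL_KNOWN_TREES` and `describe_owner()` are hypothetical helpers for illustration. Besides owner 5 here, it also covers owner 2 and backref roots 7 and 260 in Chris Bainbridge's dump elsewhere on this page.

```python
from typing import Dict

# Well-known tree objectids from the btrfs on-disk format.
WELL_KNOWN_TREES: Dict[int, str] = {
    1: "root tree",
    2: "extent tree",
    3: "chunk tree",
    4: "device tree",
    5: "fs tree (top-level subvolume)",
    7: "checksum tree",
    8: "quota tree",
    9: "uuid tree",
    10: "free space tree",
}

FIRST_FREE_OBJECTID = 256         # first subvolume/snapshot tree id
LAST_FREE_OBJECTID = 2**64 - 256  # -256ULL in the C headers


def describe_owner(owner: int) -> str:
    """Translate a dump-tree 'owner' or 'backref root' value into a tree name."""
    if owner in WELL_KNOWN_TREES:
        return WELL_KNOWN_TREES[owner]
    if FIRST_FREE_OBJECTID <= owner <= LAST_FREE_OBJECTID:
        return f"subvolume tree {owner}"
    return f"other/special tree {owner}"


for owner in (5, 2, 7, 260):
    print(owner, describe_owner(owner))
```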

> fs uuid 2cc75a87-b22b-448e-80d4-383a9f42deed
> chunk uuid a5b09a2a-da3d-4049-91ba-4fe66932907b
>   item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>   inode generation 3 transid 90769 size 144 nbytes 16384
>   block group 0 mode 40755 links 1 uid 0 gid 0
>   rdev 0 flags 0x0(none)
>   item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
>   inode ref index 0 namelen 2 name: ..
>   item 2 key (256 DIR_ITEM 145260132) itemoff 16078 itemsize 33
>   location key (265 INODE_ITEM 0) type DIR
>   namelen 3 datalen 0 name: dev
>   item 3 key (256 DIR_ITEM 217684952) itemoff 16045 itemsize 33
>   location key (266 INODE_ITEM 0) type DIR
>   namelen 3 datalen 0 name: run
>   item 4 key (256 DIR_ITEM 308198373) itemoff 16011 itemsize 34
>   location key (257 INODE_ITEM 0) type DIR
> 
> ...

Maybe we can check the contents of items 108 and 109 in this output from
'dump-tree'?

Thanks,

-liubo

>   item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133
>   location key (893669 INODE_ITEM 0) type FILE
>   namelen 31 datalen 0 name: babl-0.1.16-1-x86_64.pkg.tar.xz
>   location key (388547 INODE_ITEM 0) type FILE
>   namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz
> ...
>   namelen 30 datalen 0 name: glibc-2.24-2-x86_64.pkg.tar.xz
>   location key (893658 INODE_ITEM 0) type FILE
>   namelen 36 datalen 0 name: procps-ng-3.3.12-1-x86_64.pkg.tar.xz
>   location key (EXTENT_TREE UNKNOWN.3 36094832640) type 12
>   namelen 0 datalen 0 name:
>   location key (291 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (18556457741975552 UNKNOWN.0 0) type 0
>   namelen 0 datalen 7134 name:
>   data
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
>   location key (0 UNKNOWN.0 0) type 0
>   namelen 0 datalen 0 name:
> 
> 
> 
> segfault
> 
> 
> running restore:
> 
> incorrect offsets 11532 11548
> Error searching -1
> 
> I tried every rescue and check command, in different variations ... nothing. It
> seems that the root leaf (?) has some garbage. I tried using the corrupt-block
> utility to mark the item dirty and got the same error: incorrect offsets.
> 
> The only thing I've managed is to restore a part of the /etc directory,
> with: btrfs restore -i -f 610123776 - -d /dev/sdb2 /mnt/restore
> 
> I'm still trying to learn how the data is structured now, but my problem is
> that I can't figure out how to calculate the leaf positions, using the
> dump-tree output ...
> 
> I need some kind of tool/script that can recursively rescue the structure from
>

corrupt leaf, slot offset bad

2016-10-10 Thread aron

Hi all,

I've been using btrfs for a few months now, without any problems.
During work, I noticed segfaults when accessing my root directory.
As my home directory contents were readable, I decided to reboot. That
was the worst decision, as now I can't copy my data off the SSD. It
seems like a memory issue. I have backups, but they're ~2 weeks old. What I
did was a dd dump immediately. I have the latest kernel and latest progs built
from source now, but :S ...


This is what I've got:

When mounting:

BTRFS critical (device: sdb2): corrupt leaf, slot offset bad: 
block=610107392,root=1, slot=108


find-root prints nothing to stdout after 2 hours.

running btrfs inspect-internal dump-tree -b 610107392 /dev/sdb2

leaf 610107392 items 188 free space 1690 generation 90792 owner 5
fs uuid 2cc75a87-b22b-448e-80d4-383a9f42deed
chunk uuid a5b09a2a-da3d-4049-91ba-4fe66932907b
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 3 transid 90769 size 144 nbytes 16384
block group 0 mode 40755 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
inode ref index 0 namelen 2 name: ..
item 2 key (256 DIR_ITEM 145260132) itemoff 16078 itemsize 33
location key (265 INODE_ITEM 0) type DIR
namelen 3 datalen 0 name: dev
item 3 key (256 DIR_ITEM 217684952) itemoff 16045 itemsize 33
location key (266 INODE_ITEM 0) type DIR
namelen 3 datalen 0 name: run
item 4 key (256 DIR_ITEM 308198373) itemoff 16011 itemsize 34
location key (257 INODE_ITEM 0) type DIR

...
item 111 key (261 DIR_ITEM 81211850) itemoff 11344 itemsize 131133
location key (893669 INODE_ITEM 0) type FILE
namelen 31 datalen 0 name: babl-0.1.16-1-x86_64.pkg.tar.xz
location key (388547 INODE_ITEM 0) type FILE
namelen 32 datalen 0 name: intltool-0.51.0-1-any.pkg.tar.xz
...
namelen 30 datalen 0 name: glibc-2.24-2-x86_64.pkg.tar.xz
location key (893658 INODE_ITEM 0) type FILE
namelen 36 datalen 0 name: procps-ng-3.3.12-1-x86_64.pkg.tar.xz
location key (EXTENT_TREE UNKNOWN.3 36094832640) type 12
namelen 0 datalen 0 name:
location key (291 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (18556457741975552 UNKNOWN.0 0) type 0
namelen 0 datalen 7134 name:
data
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:
location key (0 UNKNOWN.0 0) type 0
namelen 0 datalen 0 name:



segfault


running restore:

incorrect offsets 11532 11548
Error searching -1

I tried every rescue and check command, in different variations ...
nothing. It seems that the root leaf (?) has some garbage. I tried using
the corrupt-block utility to mark the item dirty and got the same error:
incorrect offsets.


The only thing I've managed is to restore a part of the /etc directory, 
with: btrfs restore -i -f 610123776 - -d /dev/sdb2 /mnt/restore


I'm still trying to learn how the data is structured, but my
problem is that I can't figure out how to calculate the leaf positions
using the dump-tree output ...
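For positions *within* a leaf, the dump-tree numbers can be turned into byte positions with simple arithmetic. This Python sketch assumes the standard layout: a 101-byte header (struct btrfs_header), then an array of 25-byte item headers (struct btrfs_item), with itemoff counted from the end of the header. The 16 KiB node size is an assumption, though it is consistent with item 0 ending exactly at byte 16384 in both dumps on this page. The helper names are hypothetical, and the sketch does not map the leaf's logical address (e.g. 610107392) to a device offset, which goes through the chunk tree.

```python
from typing import Tuple

# struct btrfs_header: csum(32) + fsid(16) + bytenr(8) + flags(8)
# + chunk_tree_uuid(16) + generation(8) + owner(8) + nritems(4) + level(1)
HEADER_SIZE = 32 + 16 + 8 + 8 + 16 + 8 + 8 + 4 + 1  # = 101
# struct btrfs_item: disk key (8 + 1 + 8) + data offset (4) + data size (4)
ITEM_SIZE = 8 + 1 + 8 + 4 + 4  # = 25
NODESIZE = 16384  # assumption: the default node size


def item_header_pos(slot: int) -> int:
    """Byte position of a slot's item header, from the start of the leaf."""
    return HEADER_SIZE + slot * ITEM_SIZE


def item_data_range(itemoff: int, itemsize: int) -> Tuple[int, int]:
    """Byte range of an item's data within the leaf (itemoff as printed
    by dump-tree, i.e. relative to the end of the header)."""
    start = HEADER_SIZE + itemoff
    return start, start + itemsize


# In a valid leaf, item 0's data ends exactly at the end of the node.
print(item_data_range(16123, 160))  # Aron's item 0
print(item_data_range(16232, 51))   # Chris Bainbridge's item 0
print(item_header_pos(108), item_data_range(11532, 68))  # Aron's slot 108
```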


I need some kind of tool/script that can recursively rescue the structure
from a defined leaf. (Can this be done?)


Any help would be appreciated! :)

Thanks!

Yours,
Aron



Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-02-04 Thread Chris Bainbridge
I had another btrfs crash with identical symptoms. This time I managed
to reproduce the crash and identify the root cause. It turns out that
the Apple boot firmware is buggy and leaves wifi DMA turned on, which
randomly corrupts memory after Linux is running.
https://bugzilla.kernel.org/show_bug.cgi?id=111781

So btrfs is not to blame after all.


Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Chris Bainbridge
On 5 January 2016 at 01:57, Qu Wenruo  wrote:
>>
>> Data, single: total=106.79GiB, used=82.01GiB
>> System, single: total=4.00MiB, used=16.00KiB
>> Metadata, single: total=2.01GiB, used=1.51GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> That's the btrfs fi df misleading output confusing you.
>
> In fact, your metadata is already used up without available space.
> GlobalReserve should also be counted as Metadata *used* space.

Thanks for the explanation - the FAQ[1] misleads when it describes
GlobalReserve as "The block reserve is only virtual and is not stored
on the devices." - which sounds like the reserve is literally not
stored on the drive.

The FAQ[2] also suggests that the free space in metadata can be less
than the block reserve total:

"If the free space in metadata is less than or equal to the block
reserve value (typically 512 MiB, but might be something else on a
particularly small or large filesystem), then it's close to full."

But what you are saying is that this is wrong and the free space in
metadata can never be less than the block reserve, because the block
reserve includes the metadata free space?

[1] https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_GlobalReserve_and_why_does_.27btrfs_fi_df.27_show_it_as_single_even_on_RAID_filesystems.3F
[2] https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29

> Good, 5GiB of freed space; it can be allocated for metadata to slightly reduce
> the metadata pressure.
>
> But not for long.
> The fundamental solution is to add more space to this btrfs.

Yes but this is a 128GB SSD and metadata could have been reallocated
from some of the 25GB of free space allocated to data. Even with a
bigger drive, it is possible that chunks could be allocated to data,
and then later operations requiring more metadata will still run out
(running out of metadata space seems to be a reasonably common
occurrence judging by the number of "why is btrfs reporting no space
when I have space free" questions). The file system shouldn't be
corrupted when that happens.


Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Qu Wenruo



Chris Bainbridge wrote on 2016/01/05 13:41 +:

> On 5 January 2016 at 01:57, Qu Wenruo  wrote:
>>>
>>> Data, single: total=106.79GiB, used=82.01GiB
>>> System, single: total=4.00MiB, used=16.00KiB
>>> Metadata, single: total=2.01GiB, used=1.51GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>
>> That's the btrfs fi df misleading output confusing you.
>>
>> In fact, your metadata is already used up without available space.
>> GlobalReserve should also be counted as Metadata *used* space.
>
> Thanks for the explanation - the FAQ[1] misleads when it describes
> GlobalReserve as "The block reserve is only virtual and is not stored
> on the devices." - which sounds like the reserve is literally not
> stored on the drive.


In fact, the FAQ description is not wrong either.

GlobalReserve is not stored anywhere, that's true.
Since it doesn't take any space (unless its used value is not 0), it is
stored nowhere, and the FAQ is right.


The metadata allocation algorithm will try its best to keep enough free
space for the GlobalReserve.
So for the end user, space you can't directly use is no different from
used space.




> The FAQ[2] also suggests that the free space in metadata can be less
> than the block reserve total:
>
> "If the free space in metadata is less than or equal to the block
> reserve value (typically 512 MiB, but might be something else on a
> particularly small or large filesystem), then it's close to full."
>
> But what you are saying is that this is wrong and the free space in
> metadata can never be less than the block reserve, because the block
> reserve includes the metadata free space?


Sorry for the confusion.
Yes, it's possible for the available metadata space to be less than the
global reserve space.


But when that happens, the used value of GlobalReserve is not 0, and
unfortunately you are already extremely short of space,
meaning you can't even touch an empty file.

And in that case, if your kernel is not new enough, you can't even
delete a file, because of metadata COW.


So in the common case, one can just treat the global reserve as used
metadata, unless the used global reserve is not 0.
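This rule can be applied to the `btrfs fi df` figures quoted at the top of this message. The Python sketch below is plain arithmetic, assuming the displayed values (which the tool rounds to 0.01 GiB) are close enough; `decimal` is used only to avoid floating-point noise. Both the FAQ criterion and the "treat the reserve as used" reading flag the metadata as effectively exhausted.

```python
from decimal import Decimal

# Figures from the `btrfs fi df` output quoted above, in GiB (rounded).
metadata_total = Decimal("2.01")
metadata_used = Decimal("1.51")
global_reserve = Decimal("0.50")  # 512.00MiB

free_metadata = metadata_total - metadata_used

# FAQ rule of thumb: free metadata <= reserve means "close to full".
faq_close_to_full = free_metadata <= global_reserve

# Qu's reading: count the reserve as used metadata.
usable_metadata = metadata_total - (metadata_used + global_reserve)

print("free metadata:", free_metadata, "GiB")
print("FAQ close to full:", faq_close_to_full)
print("usable after reserve:", usable_metadata, "GiB")
```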




> [1] https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_GlobalReserve_and_why_does_.27btrfs_fi_df.27_show_it_as_single_even_on_RAID_filesystems.3F
> [2] https://btrfs.wiki.kernel.org/index.php/FAQ#if_your_device_is_large_.28.3E16GiB.29


>> Good, 5GiB of freed space; it can be allocated for metadata to slightly reduce
>> the metadata pressure.
>>
>> But not for long.
>> The fundamental solution is to add more space to this btrfs.


> Yes but this is a 128GB SSD and metadata could have been reallocated
> from some of the 25GB of free space allocated to data.


This can only happen when:
1) All data chunks are balanced into a super compact state, to free all the 25G.
   Since btrfs stores data and metadata in different chunks, one needs
   to use balance to free space from allocated data/metadata chunks.

   And in your case, you only tried dlimit=1, 2 and 5, which will free
   at most 8 chunks (and at most 8G of space).

   If you want to free all 25G of free space from data chunks, use no
   dlimit at all.

2) Mixed block groups.
   This is the most straightforward case.
   Data and metadata can be stored in the same chunk, so there is no
   such problem at all.

   But developers tend to avoid this configuration.
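The chunk arithmetic in point 1) can be spelled out with the `btrfs fi df` data figures quoted earlier in this message. The Python sketch below assumes 1 GiB data chunks (the basis of "8 chunks, at most 8G" above) and perfect compaction. Since relocating one chunk can free at most one chunk, reclaiming most of the slack needs roughly three times as many relocations as the dlimit runs performed.

```python
import math
from decimal import Decimal

# From the `btrfs fi df` output earlier in the thread (GiB, rounded).
data_total = Decimal("106.79")
data_used = Decimal("82.01")

DATA_CHUNK_GIB = 1   # assumption: 1 GiB data chunks
dlimits = [1, 2, 5]  # the -dlimit values that were tried

slack = data_total - data_used                     # allocated to data but unused
whole_chunks = math.floor(slack / DATA_CHUNK_GIB)  # best case: chunks that could be freed
relocated = sum(dlimits)                           # each run relocates at most N chunks

print(f"slack: {slack} GiB")
print(f"at best {whole_chunks} chunks could be returned to unallocated space")
print(f"the dlimit runs relocated at most {relocated} chunks "
      f"({relocated * DATA_CHUNK_GIB} GiB)")
```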

> Even with a bigger drive, it is possible that chunks could be allocated
> to data, and then later operations requiring more metadata will still
> run out (running out of metadata space seems to be a reasonably common
> occurrence judging by the number of "why is btrfs reporting no space
> when I have space free" questions).


This is true, and it's a long-standing btrfs problem.

Apart from balancing and adding more devices, there are no really good
ideas so far. Maybe one day we can improve the allocation algorithm.


> The file system shouldn't be corrupted when that happens.



I'm sorry for going off topic about the GlobalReserve and unbalanced
data/metadata chunks.


But I don't think the corruption is caused by unbalanced
data/metadata chunks.


So let's go back to the corruption case.

Since you took the image of the corrupted fs, would you please try the 
following commands on the corrupted fs?


$ btrfs-debug-tree -b 67239936 

And what were the kernel mount options for the fs before the crash?

The kernel messages show that your tree root is corrupted.
This is common after a power loss.

But the problem is, btrfs uses barriers to ensure the superblock is
written to disk *after* all other metadata is committed.
If the crash happens before that point, the superblock is simply not
updated and still points to the old metadata, so everything stays fine.


So either barriers are broken, or you specified nobarrier, or the power
loss directly corrupted the new tree root while, by some fluke, the csum
still matches.
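The write-ordering argument above can be illustrated with a toy model in Python. It is a deliberate simplification, not how btrfs actually issues its flushes: the device may complete outstanding writes in any order, power can be lost after any number of them, and the fs counts as consistent if either the old superblock is still in place or everything the new one points to is on disk. The block names are hypothetical. With a barrier forcing the superblock to complete last, no crash point is bad; without it, some are.

```python
from itertools import permutations
from typing import Iterator, Set, Tuple

# Hypothetical names for the blocks written by one transaction commit.
TREE_BLOCKS = ("leaf_a", "leaf_b", "new_tree_root")
SUPER = "superblock"


def completion_orders(use_barrier: bool) -> Iterator[Tuple[str, ...]]:
    """Yield every order in which the device might complete the writes."""
    for order in permutations(TREE_BLOCKS + (SUPER,)):
        # With a barrier, the superblock cannot complete before the tree blocks.
        if use_barrier and order[-1] != SUPER:
            continue
        yield order


def consistent(durable: Set[str]) -> bool:
    """Old superblock still on disk (pointing at the old, untouched tree),
    or the new superblock plus every block it references are durable."""
    return SUPER not in durable or all(b in durable for b in TREE_BLOCKS)


def bad_crash_points(use_barrier: bool) -> int:
    """Count (order, crash point) pairs that leave the fs inconsistent."""
    bad = 0
    for order in completion_orders(use_barrier):
        for n in range(len(order) + 1):  # power lost after n writes completed
            if not consistent(set(order[:n])):
                bad += 1
    return bad


print("with barrier:", bad_crash_points(True))
print("without barrier:", bad_crash_points(False))
```

The model does not cover the third case, where the new tree root itself is damaged yet its checksum still matches.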


Thanks,
Qu




Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-05 Thread Chris Bainbridge
On Wed, Jan 06, 2016 at 08:57:28AM +0800, Qu Wenruo wrote:
> 
> Since you took the image of the corrupted fs, would you please try the
> following commands on the corrupted fs?
> 
> $ btrfs-debug-tree -b 67239936 

Command runs then segfaults:

leaf 67239936 items 92 free space 9138 generation 276688 owner 2
fs uuid b1103526-98a3-4b40-a782-cf66721ed600
chunk uuid 16e767e3-a321-4d0f-9c72-6ebac9d305c4
item 0 key (61513990144 EXTENT_ITEM 16384) itemoff 16232 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61617864704 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 1 key (61514006528 EXTENT_ITEM 16384) itemoff 16181 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (27627 DIR_INDEX 3576) level 0
tree block backref root 260
item 2 key (61514022912 EXTENT_ITEM 16384) itemoff 16130 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99424354304) level 0
tree block backref root 7
item 3 key (61514039296 EXTENT_ITEM 16384) itemoff 16079 itemsize 51
extent refs 1 gen 275904 flags TREE_BLOCK
tree block key (118 INODE_ITEM 0) level 0
tree block backref root 260
item 4 key (61514055680 EXTENT_ITEM 16384) itemoff 16028 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61702111232 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 5 key (61514088448 EXTENT_ITEM 16384) itemoff 15977 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (1906872 INODE_REF 34185) level 0
tree block backref root 260
item 6 key (61514104832 EXTENT_ITEM 16384) itemoff 15926 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61741957120 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 7 key (61514121216 EXTENT_ITEM 16384) itemoff 15875 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99654656000) level 0
tree block backref root 7
item 8 key (61514137600 EXTENT_ITEM 16384) itemoff 15824 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99620417536) level 0
tree block backref root 7
item 9 key (61514153984 EXTENT_ITEM 16384) itemoff 15773 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99669962752) level 0
tree block backref root 7
item 10 key (61514170368 EXTENT_ITEM 16384) itemoff 15722 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99639615488) level 0
tree block backref root 7
item 11 key (61514186752 EXTENT_ITEM 16384) itemoff 15671 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99681320960) level 0
tree block backref root 7
item 12 key (61514203136 EXTENT_ITEM 16384) itemoff 15620 itemsize 51
extent refs 1 gen 276285 flags TREE_BLOCK
tree block key (882130 INODE_ITEM 0) level 0
tree block backref root 260
item 13 key (61514219520 EXTENT_ITEM 16384) itemoff 15569 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (61831168000 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 14 key (61514268672 EXTENT_ITEM 16384) itemoff 15518 itemsize 51
extent refs 1 gen 275904 flags TREE_BLOCK
tree block key (1553336 INODE_ITEM 0) level 0
tree block backref root 260
item 15 key (61514285056 EXTENT_ITEM 16384) itemoff 15467 itemsize 51
extent refs 1 gen 276685 flags TREE_BLOCK
tree block key (62053400576 EXTENT_ITEM 16384) level 0
tree block backref root 2
item 16 key (61514334208 EXTENT_ITEM 16384) itemoff 15416 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99928444928) level 0
tree block backref root 7
item 17 key (61514350592 EXTENT_ITEM 16384) itemoff 15365 itemsize 51
extent refs 1 gen 266026 flags TREE_BLOCK
tree block key (EXTENT_CSUM EXTENT_CSUM 99940794368) level 0
tree block backref root 7
item 18 key (61514366976 EXTENT_ITEM 16384) itemoff 15314 itemsize 51
extent refs 1 

Re: Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-04 Thread Qu Wenruo



Chris Bainbridge wrote on 2016/01/04 17:05 +:


Crash, boot mount failure: "corrupt leaf, slot offset bad"

2016-01-04 Thread Chris Bainbridge
Kernel 4.4.0-rc7
System is Macbook with 109GB btrfs partition on SSD

System crashed (nothing in syslog, could have been btrfs or possibly GPU fault
shortly after running xrandr). After hard reset the btrfs partition was corrupt
and would not mount. I took an image of the partition (some of the output below
refers to sda5, some to loop0, it is the same image)

# dmesg of mount failure (using kernel 4.2 from an Ubuntu 15.10 recovery drive 
but same on 4.4.0-rc7): 

[ 1969.425321] BTRFS critical (device sda5): corrupt leaf, slot offset bad: block=67239936,root=1, slot=82
[ 1969.428809] BTRFS critical (device sda5): corrupt leaf, slot offset bad: block=67239936,root=1, slot=82
[ 1969.431981] [ cut here ]
[ 1969.432018] WARNING: CPU: 2 PID: 11162 at /build/linux-cRemOf/linux-4.2.0/fs/btrfs/extent-tree.c:6264 __btrfs_free_extent.isra.69+0x2ef/0xd70 [btrfs]()
[ 1969.432028] BTRFS: Transaction aborted (error -5)
[ 1969.432030] Modules linked in: drbg ansi_cprng ctr ccm intel_rapl iosf_mbi 
x86_pkg_temp_thermal intel_powerclamp btrfs arc4 xor coretemp rt2800usb 
rt2x00usb raid6_pq b43 rt2800lib rt2x00lib kvm_intel mac80211 cfg80211 btusb 
btrtl btbcm btintel bluetooth uvcvideo kvm ssb crc_ccitt crct10dif_pclmul 
crc32_pclmul snd_hda_codec_hdmi snd_hda_codec_cirrus snd_hda_codec_generic 
videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev 
snd_hda_intel snd_hda_codec snd_hda_core aesni_intel aes_x86_64 lrw gf128mul 
glue_helper applesmc snd_hwdep snd_pcm snd_timer snd soundcore joydev 
input_polldev media ablk_helper cryptd bcm5974 input_leds lpc_ich bcma mei_me 
mei thunderbolt sbs sbshc acpi_als kfifo_buf apple_gmux industrialio mac_hid 
shpchp apple_bl autofs4 hid_generic i915 hid_apple sdhci_pci
[ 1969.432115]  i2c_algo_bit ahci uas drm_kms_helper libahci sdhci usb_storage 
usbhid drm video hid
[ 1969.432132] CPU: 2 PID: 11162 Comm: mount Tainted: GW   
4.2.0-22-generic #27-Ubuntu
[ 1969.432136] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B07.1501071215 01/07/2015
[ 1969.432139]   037e909d 88026267b5d8 
817e94c9
[ 1969.432145]   88026267b630 88026267b618 
8107b3d6
[ 1969.432152]  88026267b608 000e5297c000 fffb 

[ 1969.432157] Call Trace:
[ 1969.432167]  [] dump_stack+0x45/0x57
[ 1969.432175]  [] warn_slowpath_common+0x86/0xc0
[ 1969.432181]  [] warn_slowpath_fmt+0x55/0x70
[ 1969.432199]  [] __btrfs_free_extent.isra.69+0x2ef/0xd70 [btrfs]
[ 1969.432229]  [] ? find_ref_head+0x5a/0x80 [btrfs]
[ 1969.432248]  [] __btrfs_run_delayed_refs+0x988/0x1080 [btrfs]
[ 1969.432268]  [] btrfs_run_delayed_refs.part.73+0x6e/0x270 [btrfs]
[ 1969.432284]  [] ? btrfs_set_path_blocking+0x43/0x80 [btrfs]
[ 1969.432306]  [] btrfs_run_delayed_refs+0x15/0x20 [btrfs]
[ 1969.432326]  [] btrfs_commit_transaction+0x56/0xb20 [btrfs]
[ 1969.432332]  [] ? kmem_cache_free+0x1cf/0x1e0
[ 1969.432356]  [] btrfs_recover_log_trees+0x3ed/0x490 [btrfs]
[ 1969.432378]  [] ? replay_one_extent+0x6a0/0x6a0 [btrfs]
[ 1969.432397]  [] open_ctree+0x19b1/0x23e0 [btrfs]
[ 1969.432412]  [] btrfs_mount+0x94e/0xa70 [btrfs]
[ 1969.432420]  [] ? find_next_bit+0x15/0x20
[ 1969.432427]  [] ? pcpu_alloc+0x385/0x670
[ 1969.432434]  [] mount_fs+0x38/0x160
[ 1969.432439]  [] ? __alloc_percpu+0x15/0x20
[ 1969.432446]  [] vfs_kern_mount+0x6b/0x120
[ 1969.432463]  [] btrfs_mount+0x1e8/0xa70 [btrfs]
[ 1969.432469]  [] ? pcpu_alloc+0x385/0x670
[ 1969.432475]  [] mount_fs+0x38/0x160
[ 1969.432481]  [] ? __alloc_percpu+0x15/0x20
[ 1969.432486]  [] vfs_kern_mount+0x6b/0x120
[ 1969.432493]  [] do_mount+0x246/0xd10
[ 1969.432498]  [] ? strndup_user+0x4e/0xb0
[ 1969.432503]  [] ? memdup_user+0x46/0x80
[ 1969.432510]  [] SyS_mount+0x9f/0x100
[ 1969.432519]  [] entry_SYSCALL_64_fastpath+0x16/0x75
[ 1969.432523] ---[ end trace 7a560cc73341e0d1 ]---
[ 1969.432528] BTRFS: error (device sda5) in __btrfs_free_extent:6264: errno=-5 IO failure
[ 1969.436576] BTRFS: error (device sda5) in btrfs_run_delayed_refs:2781: errno=-5 IO failure
[ 1969.442036] BTRFS: error (device sda5) in btrfs_replay_log:2375: errno=-5 IO failure (Failed to recover log tree)
[ 1969.446087] BTRFS error (device sda5): cleaner transaction attach returned -30
[ 1969.486477] BTRFS: open_ctree failed


# btrfsck:

checking filesystem on sda5
UUID: b1103526-98a3-4b40-a782-cf66721ed600
checking extents
incorrect offsets 11897 5713478
bad block 67239936
Errors found in extent allocation tree or chunk allocation
checking free space cache
There is no free space entry for 61515563008-62297997312
cache appears valid but isnt 61224255488
found 7839154285 bytes used err is -22
total csum bytes: 0
total tree bytes: 7454720
total fs tree bytes: 0
total extent tree bytes: 7356416
btree space waste bytes: 2454021
file data blocks allocated: 28508160
 referenced 28508160


# btrfs-image (compiled v4.3.1 from git:.../kdave

Re: Can't mount btrfs: corrupt leaf, slot offset bad

2015-10-14 Thread Hugo Mills
On Tue, Oct 13, 2015 at 06:25:54PM -0500, EJ Parker wrote:
> I rebooted my server last night and discovered that my btrfs
> filesystem (3 disk raid1) would not mount anymore. After doing some
> research and getting nowhere I went to IRC and user darkling asked me
> a few questions and asked for output of btrfs-debug-tree and
> ultimately sent me here saying I should include a handful of things:
> 
> Before I go further, let's get required info out of the way:
> 
> uname -a:
> Linux archhost1 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50
> CEST 2015 x86_64 GNU/Linux
> btrfs --version:
> btrfs-progs v4.2.1
> output from "btrfs fi show":
> Label: none  uuid: 5470630f-39f4-4d39-90a2-277d7991722a
> Total devices 3 FS bytes used 3.10TiB
> devid 1 size 3.64TiB used 2.12TiB path /dev/sdd
> devid 2 size 3.64TiB used 2.12TiB path /dev/sde
> devid 3 size 3.64TiB used 2.12TiB path /dev/sdc
> 
> First, I am able to mount with -o ro,recovery, but not with just -o
> recovery. When I attempt to mount w/o ro, I get this in dmesg:
[snip]
> darkling also mentioned that btrfs-zero-log might help too, but that I
> should get confirmation from one of the devs on that.

   I suggested the zero-log might work because the FS is mountable
with -o ro, but not without, which suggests a corrupt log. However,
it's not obvious to me what tree the corruption is in, and whether
zeroing the log might actually hurt the recovery process.

   Hugo.

-- 
Hugo Mills | There's an infinite number of monkeys outside who
hugo@... carfax.org.uk | want to talk to us about this new script for Hamlet
http://carfax.org.uk/  | they've worked out!
PGP: E2AB1DE4  |   Arthur Dent




Can't mount btrfs: corrupt leaf, slot offset bad

2015-10-13 Thread EJ Parker
I rebooted my server last night and discovered that my btrfs
filesystem (3 disk raid1) would not mount anymore. After doing some
research and getting nowhere I went to IRC and user darkling asked me
a few questions and asked for output of btrfs-debug-tree and
ultimately sent me here saying I should include a handful of things:

Before I go further, let's get required info out of the way:

uname -a:
Linux archhost1 4.2.3-1-ARCH #1 SMP PREEMPT Sat Oct 3 18:52:50
CEST 2015 x86_64 GNU/Linux
btrfs --version:
btrfs-progs v4.2.1
output from "btrfs fi show":
Label: none  uuid: 5470630f-39f4-4d39-90a2-277d7991722a
Total devices 3 FS bytes used 3.10TiB
devid 1 size 3.64TiB used 2.12TiB path /dev/sdd
devid 2 size 3.64TiB used 2.12TiB path /dev/sde
devid 3 size 3.64TiB used 2.12TiB path /dev/sdc

First, I am able to mount with -o ro,recovery, but not with just -o
recovery. When I attempt to mount w/o ro, I get this in dmesg:
[44478.800613] BTRFS critical (device sde): corrupt leaf, slot offset
bad: block=5674754899968,root=1, slot=147
[44478.802489] BTRFS critical (device sde): corrupt leaf, slot offset
bad: block=5674754899968,root=1, slot=147
[44478.804072] BTRFS error (device sde): Error removing orphan entry,
stopping orphan cleanup
[44478.805856] BTRFS error (device sde): could not do orphan cleanup -22
[44482.635498] BTRFS: open_ctree failed

Running "btrfs-debug-tree -b 5674754899968 /dev/sde" gave me this:
leaf 5674754899968 items 207 free space 30 generation 884595 owner 5
fs uuid 5470630f-39f4-4d39-90a2-277d7991722a
chunk uuid c269615e-7397-41bc-95d0-dfdb2a696b23
[...]
item 145 key (273094 EXTENT_DATA 364924928) itemoff 8545 itemsize 53
extent data disk byte 8658465382400 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0
item 146 key (273094 EXTENT_DATA 364929024) itemoff 8492 itemsize 53
extent data disk byte 8658465378304 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0
item 147 key (273094 EXTENT_DATA 364933120) itemoff 8439 itemsize 53
extent data disk byte 8677950173184 nr 24576
extent data offset 0 nr 20480 ram 24576
extent compression 0
item 148 key (273094 EXTENT_DATA 364953600) itemoff 8333 itemsize 53
extent data disk byte 8677990363136 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 149 key (273094 EXTENT_DATA 364957696) itemoff 8386 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 18446744073709514752 ram 18446744073709514752
extent compression 0
item 150 key (273094 EXTENT_DATA 364969984) itemoff 8280 itemsize 53
extent data disk byte 8678063341568 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 151 key (273094 EXTENT_DATA 365002752) itemoff 8227 itemsize 53
extent data disk byte 8678025232384 nr 36864
extent data offset 0 nr 32768 ram 36864
extent compression 0
item 152 key (273094 EXTENT_DATA 365019136) itemoff 8174 itemsize 53
extent data disk byte 8678112104448 nr 36864
extent data offset 0 nr 32768 ram 36864
extent compression 0
item 153 key (273094 EXTENT_DATA 365051904) itemoff 8121 itemsize 53
extent data disk byte 8678052835328 nr 53248
extent data offset 0 nr 49152 ram 53248
extent compression 0
item 154 key (273094 EXTENT_DATA 365101056) itemoff 8068 itemsize 53
extent data disk byte 8678090510336 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
item 155 key (273094 EXTENT_DATA 365117440) itemoff 8015 itemsize 53
extent data disk byte 8678117130240 nr 20480
extent data offset 0 nr 16384 ram 20480
extent compression 0
[...]
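
Item 149 in this dump stands out. It points at disk byte 0 and claims a length of 18446744073709514752 bytes. That number is just below 2^64, so reinterpreting it as a signed 64-bit value shows what was actually stored. The following is a short sketch of that arithmetic. It is a plain two's-complement reinterpretation, and nothing btrfs-specific is assumed:

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 1 << 64:
        raise ValueError("not a u64 value")
    return value - (1 << 64) if value >= 1 << 63 else value

nr = 18446744073709514752   # "nr" and "ram" of item 149 above
signed = as_signed64(nr)
print(signed)               # -36864
print(-signed // 4096)      # 9, i.e. nine 4 KiB blocks
```

A negative extent length has no sensible meaning, which supports reading item 149 as damaged rather than as a real extent.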


Output from "btrfs check --readonly /dev/sde":
Checking filesystem on /dev/sde
UUID: 5470630f-39f4-4d39-90a2-277d7991722a
checking extents
incorrect offsets 8439 8386
bad block 5674754899968
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
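
The "incorrect offsets 8439 8386" line can be reproduced by hand from the (partial) btrfs-debug-tree dump above. In a btrfs leaf, item data is packed from the end of the block towards the front. Each item's itemoff should therefore equal the end (itemoff + itemsize) of the item in the next slot. The sketch below applies that rule to items 145-155. It assumes this is the invariant behind "slot offset bad" and is not the kernel's or btrfs-progs' actual code:

```python
from typing import List, Tuple

# (slot, itemoff, itemsize) for items 145-155, copied from the dump above
ITEMS: List[Tuple[int, int, int]] = [
    (145, 8545, 53), (146, 8492, 53), (147, 8439, 53), (148, 8333, 53),
    (149, 8386, 53), (150, 8280, 53), (151, 8227, 53), (152, 8174, 53),
    (153, 8121, 53), (154, 8068, 53), (155, 8015, 53),
]

def bad_offsets(items: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Return (slot, itemoff, end_of_next_item) for each slot whose data
    does not begin exactly where the next slot's data ends."""
    bad = []
    for (slot, off, _size), (_next_slot, next_off, next_size) in zip(items, items[1:]):
        next_end = next_off + next_size
        if off != next_end:
            bad.append((slot, off, next_end))
    return bad

for slot, off, next_end in bad_offsets(ITEMS):
    print(f"slot={slot}: incorrect offsets {off} {next_end}")
```

It flags slots 147, 148 and 149. Slot 147 is the first failure, which matches slot=147 in the dmesg output, and its pair of numbers matches the btrfs check message. Items 148 and 149 look as if their offsets were swapped. Shifting item 148 into place would therefore land it on top of item 149, which is consistent with --repair giving up with "items overlap, can't fix".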

Output from (failed) "btrfs check --repair /dev/sdc" (which I tried
prior to seeking help):
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 5470630f-39f4-4d39-90a2-277d7991722a
checking extents
incorrect offsets 8439 8386
shifting item nr 148 by bytes in block 5674754899968
items overlap, can't fix
cmds-check.c:4059: fix_item_offset: Assertion `ret` failed.


darkling also mentioned that btrfs-zero-log might help too, but that I
shou

Re: Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-23 Thread Andreas Reis
Ah. Thank you for the replies. I didn't get them as mails and spinics 
didn't update the thread until yesterday.


So I take it that the recommended course of action is not to wait for 
any more or less unlikely btrfs-progs fix, but to try --repair and be 
ready to restore from backup, too. Darn, and that over what probably 
doesn't amount to more than a few dozen KB. Wish I could simply replace 
the single subvolume instead, but I suppose that's one of btrfs's drawbacks.


I did a full partition backup some three weeks ago, so I'll have to 
spend some hours to figure out what has changed since then, and how to 
do incremental backups of it to different devices for the next time…


I don't have the time atm though; it'll probably take at least a week 
(unless the partition decides to die) to report back.


As a side note, there was an ostensibly similar issue fixed in 2012: 
https://bugzilla.novell.com/show_bug.cgi?id=760279 Guess that was a 
different underlying issue, though.


Duncan posted on Wed, 23 Apr 2014 02:55:36 +:

> Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:
>
>> Same failure with btrfs-progs from integration-20140421 (apart from
>> the line number 1156).
>>
>> Can I get a bit of input on this? Is it safe to just ignore the
>> error for now (as I'm doing atm), ie. remount as rw to skip the
>> orphan cleanup?
>
> I explained orphans in my other reply.  Since they're simply not yet
> completed file deletions, it should be /relatively/ safe to continue
> ignoring and doing the manual remount rw, since that continues to
> work.


Re: Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-22 Thread Andreas Reis
Same failure with btrfs-progs from integration-20140421 (apart from the 
line number 1156).


Can I get a bit of input on this? Is it safe to just ignore the error 
for now (as I'm doing atm), ie. remount as rw to skip the orphan cleanup?


Might it even be safe to call btrfs check --repair on the partition? I'm 
not keen on that failing mid-process at the same assertion and thus 
breaking it over a bunch of minor files, just like it happened with my 
previous btrfs partitions.


On 21.04.2014 21:13, Andreas Reis wrote:

> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns exit
> status 32 and / thus ends as ro, yet can be manually remounted as rw.
>
> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5
>
> I've now tried to mount with -o recovery and clear_cache, no effect.



Re: Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-22 Thread Duncan
Andreas Reis posted on Tue, 22 Apr 2014 20:16:13 +0200 as excerpted:

> Same failure with btrfs-progs from integration-20140421 (apart from the
> line number 1156).
>
> Can I get a bit of input on this? Is it safe to just ignore the error
> for now (as I'm doing atm), ie. remount as rw to skip the orphan
> cleanup?

I explained orphans in my other reply.  Since they're simply not yet 
completed file deletions, it should be /relatively/ safe to continue 
ignoring and doing the manual remount rw, since that continues to work.

Relatively, as in that's what I'd do in the shorter term here were I
seeing the problem, tho I'd ensure my backups were current and tested, as
should be the case on btrfs anyway since it's not entirely stable yet.
And just because I don't like nagging half-dealt-with problems left
lying around, and the error would eat at me until I'd cleared it, at some
point, likely sooner rather than later, I'd very likely mkfs and restore
from those backups.  But I'd certainly be willing to continue running
from the partition short term, for a week or so until I had a chance to
do the mkfs.btrfs and restore from backup, as long as that remained the
only issue I was seeing.

> Might it even be safe to call btrfs check --repair on the partition? I'm
> not keen on that failing mid-process at the same assertion and thus
> breaking it over a bunch of minor files, just like it happened with my
> previous btrfs partitions.

That I can't say.  Based on reports and the common knowledge of the list, 
I've become rather leery of btrfs check --repair myself, and tend to rely 
on scrub and balance to fix issues if they can, and beyond that, 
mkfs.btrfs and restore from backup.  In fact, while btrfs check without 
the --repair is safe as it's read-only, I don't run it regularly either, 
because I know should it report problems I'd then be worried about things 
I might have no reasonable way to fix, that obviously aren't causing me 
problems anyway.  Basically, if mounting and regular use of the 
filesystem isn't giving me anything unusual in dmesg, I consider it good, 
and for the most part I tend to route around btrfs check entirely, as
if it weren't even there, tho I've run it in default read-only mode a few 
times, to compare my output with a post from the list or something, 
always with a clean bill of health from btrfs check when I have run it.

That said, if you have backups tested and ready anyway, and would 
otherwise be doing a mkfs.btrfs in short order in order to get rid of
those bad orphan warnings anyway, I don't see the harm in running it, 
since at that point it's zero risk anyway.  If you lose the filesystem as 
a result, big deal, as you were going to mkfs.btrfs and restore from 
backup anyway, and if it fixes the problem, well, you saved yourself the 
hassle.

Plus, either way you can report back the results and then we'll know 
whether it's safe to recommend btrfs check for the next report, or not. 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-21 Thread Andreas Reis

Kernel 3.15.0-rc2, btrfs-progs 3.14.1

While doing some minor package updates my btrfs root partition [*] 
decided to corrupt itself. There was no system crash, although I had 
plenty of these (due to a USB-related regression) in recent weeks that
resulted in no trouble.


First only one of a package's folders was corrupted, any access to files 
within (incl. attempts to delete) printed


btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88

to dmesg (I'm actually not sure about the numbers, but that was indeed 
the error message). After moving the folder out of the way the partition 
continued to appear working as normal, one reboot also worked fine.


Now I can't boot at all (beyond loading the kernel image located on 
another partition), neither with 3.15-rc2 nor 3.14.1. Attempting to
mount the __current/ROOT subvolume on ArchLinux's current Live-CD 
(kernel 3.13.7) prints


btrfs: device label Linux devid 1 transid 55586 /dev/sdc5
btrfs: use ssd allocation scheme
btrfs: disk space caching is enabled
btrfs: checking UUID tree
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
btrfs: corrupt leaf, slot offset bad: block=842924032,root=1, slot=88
BTRFS error (device sdc5): Error removing orphan entry, stopping orphan 
cleanup

BTRFS critical (device sdc5): could not do orphan cleanup -22

Doing btrfs check /dev/sdc5 merely first prints ten

free space inode generation (0) did not match free space cache 
generation ([different transids between 40010 and 55578])


to then abort with

checking fs roots
btrfs: cmds-check.c:1151: procecss_file_extent: Assertion `!(rec->ino != key->objectid || rec->refs > 1)' failed.


I'm reluctant to try any of btrfs check options (or mount with -o 
recovery) since the last three times I did this (with other partitions) 
it resulted in the partition becoming entirely trashed, while before at 
least btrfs restore still managed to extract some data each time.


The affected folder was one within /usr/include/qt4 (which I then moved 
to /usr/BROKEN, to successfully reinstall the package), ie. on the 
__current/ROOT subvolume.


Which seems the only subvolume affected (yet). Mounting and accessing the
other three (__current/{var,home,opt}) still works.


[*] Organised following 
http://blog.fabio.mancinelli.me/2012/12/28/Arch_Linux_on_BTRFS.html


(Also posted on https://bugzilla.kernel.org/show_bug.cgi?id=74611 )


Re: Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-21 Thread Andreas Reis
Alright, turns out the partition does actually mount on 3.15-rc2 (error 
messages remain, of course).


But systemd will fail to continue booting as /bin/mount returns exit 
status 32 and / thus ends as ro, yet can be manually remounted as rw.


Another error message I've spotted with 3.15 is

BTRFS error (device sdc5): error loading props for ino 1810424 (root 
257): -5


I've now tried to mount with -o recovery and clear_cache, no effect.





Re: Bug: corrupt leaf. slot offset bad: root subvolume unmountable, btrfs check crashes

2014-04-21 Thread Duncan
Andreas Reis posted on Mon, 21 Apr 2014 21:13:16 +0200 as excerpted:

> Alright, turns out the partition does actually mount on 3.15-rc2 (error
> messages remain, of course).
>
> But systemd will fail to continue booting as /bin/mount returns exit
> status 32 and / thus ends as ro, yet can be manually remounted as rw.

The mount manpage says status 32 is mount failure.  Dmesg should contain 
more, but that's probably the errors you already mentioned.
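
mount(8) documents its exit status as bit flags that can be ORed together, with 32 meaning "mount failure". The following is a small decoder. The table was transcribed from the mount(8) manual page's return-code list, and it is worth checking against your local copy, since wording varies between util-linux versions:

```python
# Return-code bits listed in mount(8); several may be ORed together.
MOUNT_STATUS_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}

def decode_mount_status(status: int) -> list:
    """List the meaning of every bit set in a mount(8) exit status."""
    if status == 0:
        return ["success"]
    return [text for bit, text in MOUNT_STATUS_BITS.items() if status & bit]

print(decode_mount_status(32))  # ['mount failure']
```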

So you're getting the read-only mount, but can't remount rw.

(This doesn't apply in your case, but FWIW, I now have my root filesystem 
setup to be ro mounted by default, and have been running that way for 
some months, now.  Seems safer that way.  The only time I remount / rw is 
when I'm updating the system or changing something in the config, then I 
normally remount ro again, altho after updating the system I normally 
have to exit and restart X and kde as well as various system services 
before I can remount ro, depending on what libraries got changed out from 
under my running processes.  Of course in order to make this work a
few /var/ subdirs that need to be writable are actually symlinks to
/home/var/ subdirs, /var/log is a dedicated writable logging partition of 
its own, etc.  So a read-only rootfs is the /normal/ case for me, and 
wouldn't interfere with normal operations at all. =:^)

> Another error message I've spotted with 3.15 is
>
> BTRFS error (device sdc5): error loading props for ino 1810424 (root
> 257): -5

That would be one of the new btrfs properties introduced in kernel 3.14.  
See btrfs property list/get/set...  Unless you've set individual file 
properties (such as compress), that's probably a property (such as ro/rw) 
on a subvolume, or possibly on the main filesystem (label, etc).

Meanwhile, orphans normally refer to files that are deleted while 
they're still in use.  Normally, these will be libraries, etc, replaced 
during a system upgrade, but still in use by running programs.  Once all 
such running programs have been restarted (loading the new version of the 
library) or terminated, the filesystem can be unmounted or remounted read-
only.  In the event they're not fully cleaned up at umount time, they are 
normally cleaned up after reboot, when a filesystem is first mounted 
writable once again.
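
The "deleted while still in use" case is ordinary POSIX behaviour and is easy to reproduce. The sketch below uses only the Python standard library and should work on any Linux filesystem. It unlinks a file while a descriptor is still open and shows that the data remains readable, which is why a filesystem has to keep track of such inodes until the last user closes them. It illustrates the general behaviour only and does not touch btrfs orphan items directly:

```python
import os
import tempfile

def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it, then read it back via the open fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # the directory entry is gone...
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))  # ...but the inode is still readable
    finally:
        os.close(fd)                      # last reference dropped: space can be freed

print(read_after_unlink(b"old library contents"))
```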

Obviously there's a problem with one of these orphans, and attempts to 
clean it up are failing, causing the remount rw to fail.

While that doesn't help with fixing the problem, it should at least give 
you some idea of what's going on, and how to interpret the messages and 
errors you see.

