Re: btrfs-show vs. btrfs different output

2013-03-22 Thread Eric Sandeen
On 3/22/13 8:59 AM, Jon Nelson wrote:
> On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen  wrote:
>> On 3/21/13 10:29 AM, Jon Nelson wrote:
>>> On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
 On 3/21/13 10:04 AM, Jon Nelson wrote:
>>> ...
> 2. the current git btrfs-show and btrfs fi show both output
> *different* devices for device with UUID
> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

 does blkid output find that uuid anywhere?

 Since you're working in git, can you maybe do a little bisecting
 to find out when it changed?  Should be a fairly quick test?
>>>
>>> blkid does /not/ report that uuid anywhere.
>>>
>>> git bisect, if I did it correctly, says:
>>>
>>>
>>> 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
>>> commit 6eba9002956ac40db87d42fb653a0524dc568810
>>> Author: Goffredo Baroncelli 
>>> Date:   Tue Sep 4 19:59:26 2012 +0200
>>>
>>> Correct un-initialized fsid variable
>>>
>>> :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
>>> 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c
>>
>> Ok, I think this is another case of greedily scanning stale
>> backup superblocks (did you ever have btrfs on the whole sda
>> or sdb?)
>>
>> btrfs_read_dev_super() currently tries to scan all 3 superblocks
>> (primary & 2 backups).  I'm guessing that you have some stale
>> backup superblocks on sda and/or sdb.
>>
>> Before the above commit, if the first sb didn't look valid,
>> it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
>> it'd compare its fsid to an uniniitialized variable (fsid)
>> which would fail (since the "fsid" contents were random.)
>> Same for the 3rd backup if found, and eventually it'd return
>> -1 as failure and not report the device.
>>
>> After the commit, it'd skip the first invalid sb as well.
>> But this time, it takes the fsid from the 2nd superblock as
>> "good" and makes it through the loop thinking that it's found
>> something valid.  Hence the report of a device which you didn't
>> expect even though the first superblock is indeed wiped out.
>>
>> There are some patches floating around to stop this
>> backup superblock scanning altogether.
>>
>> This might fix it for you; it basically returns failure
>> if any superblock on the device is found to be bad.
>>
>> What we really need is the right bits in the right places
>> to let the administrator know if a device looks like it might
>> be corrupt & in need of fixing, vs. ignoring it altogether.
>>
>> Not sure if this is something we want upstream but you could
>> test if if you like.
> 
> I did test and it appears to resolve the issue for me.
> Thank you!

Thanks.  I need to get back to finding the right overall solution
here, but have been busy elsewhere.  It's on the list ;)

Anand is looking at it too and has some patches on the list.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-22 Thread Jon Nelson
On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen  wrote:
> On 3/21/13 10:29 AM, Jon Nelson wrote:
>> On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
>>> On 3/21/13 10:04 AM, Jon Nelson wrote:
>> ...
 2. the current git btrfs-show and btrfs fi show both output
 *different* devices for device with UUID
 b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
>>>
>>> does blkid output find that uuid anywhere?
>>>
>>> Since you're working in git, can you maybe do a little bisecting
>>> to find out when it changed?  Should be a fairly quick test?
>>
>> blkid does /not/ report that uuid anywhere.
>>
>> git bisect, if I did it correctly, says:
>>
>>
>> 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
>> commit 6eba9002956ac40db87d42fb653a0524dc568810
>> Author: Goffredo Baroncelli 
>> Date:   Tue Sep 4 19:59:26 2012 +0200
>>
>> Correct un-initialized fsid variable
>>
>> :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
>> 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c
>
> Ok, I think this is another case of greedily scanning stale
> backup superblocks (did you ever have btrfs on the whole sda
> or sdb?)
>
> btrfs_read_dev_super() currently tries to scan all 3 superblocks
> (primary & 2 backups).  I'm guessing that you have some stale
> backup superblocks on sda and/or sdb.
>
> Before the above commit, if the first sb didn't look valid,
> it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
> it'd compare its fsid to an uniniitialized variable (fsid)
> which would fail (since the "fsid" contents were random.)
> Same for the 3rd backup if found, and eventually it'd return
> -1 as failure and not report the device.
>
> After the commit, it'd skip the first invalid sb as well.
> But this time, it takes the fsid from the 2nd superblock as
> "good" and makes it through the loop thinking that it's found
> something valid.  Hence the report of a device which you didn't
> expect even though the first superblock is indeed wiped out.
>
> There are some patches floating around to stop this
> backup superblock scanning altogether.
>
> This might fix it for you; it basically returns failure
> if any superblock on the device is found to be bad.
>
> What we really need is the right bits in the right places
> to let the administrator know if a device looks like it might
> be corrupt & in need of fixing, vs. ignoring it altogether.
>
> Not sure if this is something we want upstream but you could
> test if if you like.

I did test and it appears to resolve the issue for me.
Thank you!

-- 
Jon
Software Blacksmith
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Anand Jain



What we really need is the right bits in the right places
to let the administrator know if a device looks like it might
be corrupt & in need of fixing, vs. ignoring it altogether.



--
> 2. the current git btrfs-show and btrfs fi show both output
> *different* devices for device with UUID
> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
--

 here per Jon experience showing stale btrfs on /dev/sdb (note
 that good btrfs is on /dev/sdb3) is wrong.

 More over when we show something like /dev/sdb and /dev/sdb3
 both have two different btrfs, its a bit scary.

 I would vote for ignoring it altogether since we have
 btrfs-find-root and btrfs-select-super to help find the
 backup SB for recovery if needed.

Thanks.

-Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Eric Sandeen
On 3/21/13 10:29 AM, Jon Nelson wrote:
> On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
>> On 3/21/13 10:04 AM, Jon Nelson wrote:
> ...
>>> 2. the current git btrfs-show and btrfs fi show both output
>>> *different* devices for device with UUID
>>> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
>>
>> does blkid output find that uuid anywhere?
>>
>> Since you're working in git, can you maybe do a little bisecting
>> to find out when it changed?  Should be a fairly quick test?
> 
> blkid does /not/ report that uuid anywhere.
> 
> git bisect, if I did it correctly, says:
> 
> 
> 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
> commit 6eba9002956ac40db87d42fb653a0524dc568810
> Author: Goffredo Baroncelli 
> Date:   Tue Sep 4 19:59:26 2012 +0200
> 
> Correct un-initialized fsid variable
> 
> :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
> 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c

Ok, I think this is another case of greedily scanning stale
backup superblocks (did you ever have btrfs on the whole sda
or sdb?)

btrfs_read_dev_super() currently tries to scan all 3 superblocks
(primary & 2 backups).  I'm guessing that you have some stale
backup superblocks on sda and/or sdb.

Before the above commit, if the first sb didn't look valid,
it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
it'd compare its fsid to an uniniitialized variable (fsid)
which would fail (since the "fsid" contents were random.)
Same for the 3rd backup if found, and eventually it'd return
-1 as failure and not report the device.

After the commit, it'd skip the first invalid sb as well.
But this time, it takes the fsid from the 2nd superblock as
"good" and makes it through the loop thinking that it's found
something valid.  Hence the report of a device which you didn't
expect even though the first superblock is indeed wiped out.

There are some patches floating around to stop this
backup superblock scanning altogether.

This might fix it for you; it basically returns failure
if any superblock on the device is found to be bad.

What we really need is the right bits in the right places
to let the administrator know if a device looks like it might
be corrupt & in need of fixing, vs. ignoring it altogether.

Not sure if this is something we want upstream but you could
test if if you like.

-Eric

[PATCH] Fail btrfs_read_dev_super if any super looks invalid.

Signed-off-by: Eric Sandeen 
---

diff --git a/disk-io.c b/disk-io.c
index a9fd374..3aca16f 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1104,7 +1104,6 @@ struct btrfs_root *open_ctree_fd(int fp, const char 
*path, u64 sb_bytenr,
 int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
 {
u8 fsid[BTRFS_FSID_SIZE];
-   int fsid_is_initialized = 0;
struct btrfs_super_block buf;
int i;
int ret;
@@ -1130,24 +1129,22 @@ int btrfs_read_dev_super(int fd, struct 
btrfs_super_block *sb, u64 sb_bytenr)
if (ret < sizeof(buf))
break;
 
-   if (btrfs_super_bytenr(&buf) != bytenr )
-   continue;
-   /* if magic is NULL, the device was removed */
-   if (buf.magic == 0 && i == 0) 
+   /* Bail out on any invalid superblock */
+   if (btrfs_super_bytenr(&buf) != bytenr ||
+   buf.magic != cpu_to_le64(BTRFS_MAGIC))
return -1;
-   if (buf.magic != cpu_to_le64(BTRFS_MAGIC))
-   continue;
 
-   if (!fsid_is_initialized) {
+   /* if sb 0 looks valid use its fsid as trusted */
+   if (i == 0)
memcpy(fsid, buf.fsid, sizeof(fsid));
-   fsid_is_initialized = 1;
-   } else if (memcmp(fsid, buf.fsid, sizeof(fsid))) {
+   else if (memcmp(fsid, buf.fsid, sizeof(fsid))) {
/*
 * the superblocks (the original one and
 * its backups) contain data of different
-* filesystems -> the super cannot be trusted
+* filesystems -> the super cannot be trusted,
+* hence the fs cannot be trusted.
 */
-   continue;
+   return -1;
}
 
if (btrfs_super_generation(&buf) > transid) {



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Helmut Hullen
Hallo, Jon,

Du meintest am 21.03.13:

> First btrfs-show (git):

> **
> ** WARNING: this program is considered deprecated
> ** Please consider to switch to the btrfs utility

"btrfs-show" makes nasty errors, especially together with "blkid".  
Delete "btrfs-show". That's the safe way.

Viele Gruesse!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Helmut Hullen
Hallo, Jon,

Du meintest am 21.03.13:

>>> 2. the current git btrfs-show and btrfs fi show both output
>>> *different* devices for device with UUID
>>> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

>> does blkid output find that uuid anywhere?

>> Since you're working in git, can you maybe do a little bisecting
>> to find out when it changed?  Should be a fairly quick test?

> blkid does /not/ report that uuid anywhere.

"blkid" seems to be a bit clueless.
Try "file -s" instead.

Viele Gruesse!
Helmut
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Jon Nelson
On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
> On 3/21/13 10:04 AM, Jon Nelson wrote:
...
>> 2. the current git btrfs-show and btrfs fi show both output
>> *different* devices for device with UUID
>> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
>
> does blkid output find that uuid anywhere?
>
> Since you're working in git, can you maybe do a little bisecting
> to find out when it changed?  Should be a fairly quick test?

blkid does /not/ report that uuid anywhere.

git bisect, if I did it correctly, says:


6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
commit 6eba9002956ac40db87d42fb653a0524dc568810
Author: Goffredo Baroncelli 
Date:   Tue Sep 4 19:59:26 2012 +0200

Correct un-initialized fsid variable

:100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c





-- 
Jon
Software Blacksmith
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-show vs. btrfs different output

2013-03-21 Thread Eric Sandeen
On 3/21/13 10:04 AM, Jon Nelson wrote:
> I'm running openSUSE 12.3 x86_64 which has an unknown git version, but
> reports v0.19.
> I'm also supplying the output from git which reports itself as:
> v0.20-rc1-253-g7854c8b
> 
> The problem is that btrfs-show (git) and btrfs fi show (git) give
> /different/ output from each other which is also different from the
> (older) btrfs.
> 
> First btrfs-show (git):
> 
> **
> ** WARNING: this program is considered deprecated
> ** Please consider to switch to the btrfs utility
> **
> Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
> Total devices 2 FS bytes used 230.34GB
> devid1 size 298.09GB used 298.09GB path /dev/sda
> *** Some devices missing
> 
> Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
> Total devices 4 FS bytes used 348.21GB
> devid4 size 460.76GB used 444.23GB path /dev/sdb3
> devid3 size 460.76GB used 444.23GB path /dev/sda3
> devid6 size 298.09GB used 282.53GB path /dev/sdc
> devid5 size 298.09GB used 282.53GB path /dev/sdd
> 
> Btrfs v0.20-rc1-253-g7854c8b
> 
> 
> Now btrfs fi show (git):
> 
> failed to open /dev/sr0: No medium found
> Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
> Total devices 4 FS bytes used 348.21GB
> devid5 size 298.09GB used 282.53GB path /dev/sdd
> devid6 size 298.09GB used 282.53GB path /dev/sdc
> devid4 size 460.76GB used 444.23GB path /dev/sdb3
> devid3 size 460.76GB used 444.23GB path /dev/sda3
> 
> Label: none  uuid: b5dc52bd-21bf-4173-8049-d54d88c82240
> Total devices 2 FS bytes used 230.34GB
> devid1 size 298.09GB used 298.09GB path /dev/sdb
> *** Some devices missing
> 
> Btrfs v0.20-rc1-253-g7854c8b
> 
> 
> And now the (older) btrfs fi show:
> 
> failed to read /dev/sr0
> Label: none  uuid: 7feedf1e-9711-4900-af9c-92738ea8aace
> Total devices 4 FS bytes used 348.21GB
> devid5 size 298.09GB used 282.53GB path /dev/sdd
> devid6 size 298.09GB used 282.53GB path /dev/sdc
> devid4 size 460.76GB used 444.23GB path /dev/sdb3
> devid3 size 460.76GB used 444.23GB path /dev/sda3
> 
> Btrfs v0.19+
> 
> which has similar output to the (older) btrfs-show.
> 
> 
> The differences are:
> 
> 1. the order of devices varies (not a big deal)

Heh, that is annoying though.

> 2. the current git btrfs-show and btrfs fi show both output
> *different* devices for device with UUID
> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.

does blkid output find that uuid anywhere?

Since you're working in git, can you maybe do a little bisecting
to find out when it changed?  Should be a fairly quick test?

-Eric

> Somewhat confusingly, /dev/disk/by-uuid/ only shows three devices:
> 
> bd40cb4e-93bb-4600-8455-ca1185aa8abe -> ../../md3
> 7feedf1e-9711-4900-af9c-92738ea8aace -> ../../sda3
> 4f0b27a5-5f5b-413c-a71b-b5a3bec5482c -> ../../md2
> 
> What's going on with the varied output from (git) btrfs-show vs.
> 'btrfs fi show' (vs. the older, as-shipped btrfs)?
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html