mount problem

2011-01-11 Thread Leonidas Spyropoulos
Hey all,

I have a weird error with my RAID 0 btrfs partition.
Information for the partitions follow:

# btrfs filesystem show
failed to read /dev/sr0
Label: none  uuid: 1882b025-58e4-4287-98a3-9b772af0ad76
Total devices 2 FS bytes used 108.16GB
devid    2 size 74.53GB used 55.26GB path /dev/sdd2
devid    3 size 74.53GB used 55.26GB path /dev/sde2

Btrfs v0.19-35-g1b444cd

# btrfs device scan
Scanning for Btrfs filesystems
failed to read /dev/sr0

# cat /etc/fstab | grep btrfs
/dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76  /media/data
btrfs   rw,user 0 0

# blkid
/dev/sdd2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
UUID_SUB="468b49fa-a0b6-4e11-a312-ef0cafd2890a" TYPE="btrfs"
/dev/sde2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
UUID_SUB="cf534558-a317-4259-808b-d950a155fb5d" TYPE="btrfs"

Although I have the entry in fstab, it doesn't mount at boot; the error is:
mount: wrong fs type, bad option, bad superblock on /dev/sde2,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

Kernel messages:
device fsid 8742e45825b08218-76adf02a779ba398 devid 3 transid 4778 /dev/sde2
btrfs: failed to read the system array on sde2

No matter how many times I try sudo mount -a or sudo mount
/media/data, I get the same error.

Now the weird part:
if I run blkid and fdisk -l first, I can then mount the partition normally.
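
In other words, roughly this sequence works every time (same mountpoint
as in my fstab above):

# blkid && fdisk -l > /dev/null
# mount /media/data
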
Any ideas?

Kernel info: Linux woofy 2.6.36-ARCH #1 SMP PREEMPT Sat Jan 8 14:15:27
CET 2011 x86_64 Dual Core AMD Opteron(tm) Processor 165 AuthenticAMD
GNU/Linux
btrfs-progs: Latest from git
I don't know which btrfs revision/patches the Arch stock kernel has.


Thanks,
Leonidas



-- 
Caution: breathing may be hazardous to your health.


mount problem

2014-09-23 Thread Simone Ferretti
Hi all,

we're testing BTRFS on our Debian server.  After a lot of operations
simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume
the kernel logs these messages:

[73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, 
corrupt 2806, gen 0
[73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, 
corrupt 1, gen 4

Everything seems to work nicely, but I'm curious to know what these
messages mean (in particular, what do "gen" and "corrupt" mean?).

# uname -a
Linux dub 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux

# btrfs --version
Btrfs v3.16

# btrfs fi show
Label: 'btrfs_multiappliance'  uuid: 3452ffdd-c09b-43dd-9adb-cffde8518a72
Total devices 2 FS bytes used 20.03GiB
devid    1 size 1.82TiB used 24.03GiB path /dev/etherd/e30.20
devid    2 size 3.64TiB used 24.03GiB path /dev/etherd/e60.28

# btrfs fi df /media/multiapp
Data, RAID1: total=22.00GiB, used=20.01GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=21.23MiB
unknown, single: total=16.00MiB, used=0.00

# dmesg
[82932.655078] BTRFS info (device etherd/e30.20): disk space caching is enabled
[82932.678380] BTRFS: bdev /dev/etherd/e30.20 errs: wr 33036, rd 0, flush 0, 
corrupt 2806, gen 0
[82932.678388] BTRFS: bdev /dev/etherd/e60.28 errs: wr 244165, rd 0, flush 0, 
corrupt 1, gen 4


-- 
Thanks in advance,
Simone Ferretti


Re: mount problem

2011-01-11 Thread Tsutomu Itoh
(2011/01/12 9:25), Leonidas Spyropoulos wrote:
> Hey all,
> 
> I have a weird error with my RAID 0 btrfs partition.
> Information for the partitions follow:
> 
> # btrfs filesystem show
> failed to read /dev/sr0
> Label: none  uuid: 1882b025-58e4-4287-98a3-9b772af0ad76
>   Total devices 2 FS bytes used 108.16GB
>   devid2 size 74.53GB used 55.26GB path /dev/sdd2
>   devid3 size 74.53GB used 55.26GB path /dev/sde2
> 
> Btrfs v0.19-35-g1b444cd
> 
> # btrfs device scan
> Scanning for Btrfs filesystems
> failed to read /dev/sr0
> 
> # cat /etc/fstab | grep btrfs
> /dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76  /media/data
>   btrfs   rw,user 0 0
> 
> # blkid
> /dev/sdd2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
> UUID_SUB="468b49fa-a0b6-4e11-a312-ef0cafd2890a" TYPE="btrfs"
> /dev/sde2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
> UUID_SUB="cf534558-a317-4259-808b-d950a155fb5d" TYPE="btrfs"
> 
> Although I have the config within fstab it doesn't mount on bootup with error:
> mount: wrong fs type, bad option, bad superblock on /dev/sde2,
>missing codepage or helper program, or other error
>In some cases useful info is found in syslog - try
>dmesg | tail  or so
> 
> Kernel messages:
> device fsid 8742e45825b08218-76adf02a779ba398 devid 3 transid 4778 /dev/sde2
> btrfs: failed to read the system array on sde2

Please see Problem_FAQ in btrfs wiki.
(https://btrfs.wiki.kernel.org/index.php/Problem_FAQ)
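
In short, for a multi-device filesystem that fails to mount, the FAQ's
usual advice is to make the kernel aware of all member devices first,
roughly:

# btrfs device scan
# mount /media/data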

Thanks,
Itoh

> 
> No matter how many times I try with sudo mount -a or sudo mount
> /media/data I got the same error.
> 
> Now the weird stuff:
> If I do a blkid and fdisk -l I can then mount normally the partition.
> Any ideas?
> 
> Kernel info: Linux woofy 2.6.36-ARCH #1 SMP PREEMPT Sat Jan 8 14:15:27
> CET 2011 x86_64 Dual Core AMD Opteron(tm) Processor 165 AuthenticAMD
> GNU/Linux
> btrfs-progs: Latest from git
> I don't know which revision/patch Arch stock kernel has of btrfs.
> 
> 
> Thanks,
> Leonidas
> 



Re: mount problem

2011-01-11 Thread Leonidas Spyropoulos
2011/1/12 Tsutomu Itoh :
> (2011/01/12 9:25), Leonidas Spyropoulos wrote:
>> Hey all,
>>
>> I have a weird error with my RAID 0 btrfs partition.
>> Information for the partitions follow:
>>
>> # btrfs filesystem show
>> failed to read /dev/sr0
>> Label: none  uuid: 1882b025-58e4-4287-98a3-9b772af0ad76
>>       Total devices 2 FS bytes used 108.16GB
>>       devid    2 size 74.53GB used 55.26GB path /dev/sdd2
>>       devid    3 size 74.53GB used 55.26GB path /dev/sde2
>>
>> Btrfs v0.19-35-g1b444cd
>>
>> # btrfs device scan
>> Scanning for Btrfs filesystems
>> failed to read /dev/sr0
>>
>> # cat /etc/fstab | grep btrfs
>> /dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76        /media/data
>>       btrfs   rw,user 0 0
>>
>> # blkid
>> /dev/sdd2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
>> UUID_SUB="468b49fa-a0b6-4e11-a312-ef0cafd2890a" TYPE="btrfs"
>> /dev/sde2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
>> UUID_SUB="cf534558-a317-4259-808b-d950a155fb5d" TYPE="btrfs"
>>
>> Although I have the config within fstab it doesn't mount on bootup with 
>> error:
>> mount: wrong fs type, bad option, bad superblock on /dev/sde2,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail  or so
>>
>> Kernel messages:
>> device fsid 8742e45825b08218-76adf02a779ba398 devid 3 transid 4778 /dev/sde2
>> btrfs: failed to read the system array on sde2
>
> Please see Problem_FAQ in btrfs wiki.
> (https://btrfs.wiki.kernel.org/index.php/Problem_FAQ)
>
> Thanks,
> Itoh
>
>>
>> No matter how many times I try with sudo mount -a or sudo mount
>> /media/data I got the same error.
>>
>> Now the weird stuff:
>> If I do a blkid and fdisk -l I can then mount normally the partition.
>> Any ideas?
>>
>> Kernel info: Linux woofy 2.6.36-ARCH #1 SMP PREEMPT Sat Jan 8 14:15:27
>> CET 2011 x86_64 Dual Core AMD Opteron(tm) Processor 165 AuthenticAMD
>> GNU/Linux
>> btrfs-progs: Latest from git
>> I don't know which revision/patch Arch stock kernel has of btrfs.
>>
>>
>> Thanks,
>> Leonidas
>>
>
>

Hey cwillu and Itoh,

Thanks both for the answers. As far as I can see, I have two options:
either find the startup scripts of Arch Linux and run the scan command
before fstab is parsed (this should be in /etc/rc.conf?)
OR
edit fstab to specify the correct devices explicitly, like:
 /dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76  /media/data
btrfs   device=/dev/sde2,device=/dev/sdd2,rw,user 0 0
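
(Presumably this could be tested by hand before rebooting, with something
like: mount -t btrfs -o device=/dev/sdd2,device=/dev/sde2 /dev/sde2 /media/data)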

I think, though, that the device names are assigned at startup by udev
and can change between boots, right?
(This is the reason I use UUIDs.)
So effectively I have just one option.

Thanks,
Leonidas

-- 
Caution: breathing may be hazardous to your health.


Re: mount problem

2011-01-11 Thread Leonidas Spyropoulos
On 12 January 2011 00:58, Leonidas Spyropoulos  wrote:
> 2011/1/12 Tsutomu Itoh :
>> (2011/01/12 9:25), Leonidas Spyropoulos wrote:
>>> Hey all,
>>>
>>> I have a weird error with my RAID 0 btrfs partition.
>>> Information for the partitions follow:
>>>
>>> # btrfs filesystem show
>>> failed to read /dev/sr0
>>> Label: none  uuid: 1882b025-58e4-4287-98a3-9b772af0ad76
>>>       Total devices 2 FS bytes used 108.16GB
>>>       devid    2 size 74.53GB used 55.26GB path /dev/sdd2
>>>       devid    3 size 74.53GB used 55.26GB path /dev/sde2
>>>
>>> Btrfs v0.19-35-g1b444cd
>>>
>>> # btrfs device scan
>>> Scanning for Btrfs filesystems
>>> failed to read /dev/sr0
>>>
>>> # cat /etc/fstab | grep btrfs
>>> /dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76        /media/data
>>>       btrfs   rw,user 0 0
>>>
>>> # blkid
>>> /dev/sdd2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
>>> UUID_SUB="468b49fa-a0b6-4e11-a312-ef0cafd2890a" TYPE="btrfs"
>>> /dev/sde2: UUID="1882b025-58e4-4287-98a3-9b772af0ad76"
>>> UUID_SUB="cf534558-a317-4259-808b-d950a155fb5d" TYPE="btrfs"
>>>
>>> Although I have the config within fstab it doesn't mount on bootup with 
>>> error:
>>> mount: wrong fs type, bad option, bad superblock on /dev/sde2,
>>>        missing codepage or helper program, or other error
>>>        In some cases useful info is found in syslog - try
>>>        dmesg | tail  or so
>>>
>>> Kernel messages:
>>> device fsid 8742e45825b08218-76adf02a779ba398 devid 3 transid 4778 /dev/sde2
>>> btrfs: failed to read the system array on sde2
>>
>> Please see Problem_FAQ in btrfs wiki.
>> (https://btrfs.wiki.kernel.org/index.php/Problem_FAQ)
>>
>> Thanks,
>> Itoh
>>
>>>
>>> No matter how many times I try with sudo mount -a or sudo mount
>>> /media/data I got the same error.
>>>
>>> Now the weird stuff:
>>> If I do a blkid and fdisk -l I can then mount normally the partition.
>>> Any ideas?
>>>
>>> Kernel info: Linux woofy 2.6.36-ARCH #1 SMP PREEMPT Sat Jan 8 14:15:27
>>> CET 2011 x86_64 Dual Core AMD Opteron(tm) Processor 165 AuthenticAMD
>>> GNU/Linux
>>> btrfs-progs: Latest from git
>>> I don't know which revision/patch Arch stock kernel has of btrfs.
>>>
>>>
>>> Thanks,
>>> Leonidas
>>>
>>
>>
>
> Hey cwillu and Itoh,
>
> Thanks both for the answers, so as I can see I have 2 options:
> Either find the startup scripts of Arch Linux and run the scan command
> before parsing fstab (this should be in /etc/rc.conf ?)
> OR
> edit fstab with parameters the correct devices like:
>  /dev/disk/by-uuid/1882b025-58e4-4287-98a3-9b772af0ad76  /media/data
> btrfs   device=/dev/sde2,device=/dev/sdd2,rw,user 0 0
>
> I think though that the devices names are randomly chosen in startup
> from udev, right?
> (this is the reason I use uuids)
> so effectivly i have just an option.
>
> Thanks,
> Leonidas
>
> --
> Caution: breathing may be hazardous to your health.
>

I searched around and, after consulting the #archlinux IRC channel, the
best way to do this is to add a hook in mkinitcpio.

The current version of this tool already ships a btrfs hook, so simply add
it before the filesystems hook and recreate the initramfs.
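
For reference, the HOOKS line in /etc/mkinitcpio.conf then looks something
like this (the exact hook list here is just an example; the point is that
btrfs comes before filesystems):

HOOKS="base udev autodetect sata btrfs filesystems"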

For my stock kernel: mkinitcpio -p kernel26

Worked fine.

Thanks
Leonidas
-- 
Caution: breathing may be hazardous to your health.


Re: mount problem

2014-09-24 Thread Duncan
Simone Ferretti posted on Tue, 23 Sep 2014 14:06:41 +0200 as excerpted:

> we're testing BTRFS on our Debian server.  After a lot of operations
> simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume the
> kernel logs these messages:
> 
> [73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs:
> wr 33036, rd 0, flush 0, corrupt 2806, gen 0
> [73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs:
> wr 244165, rd 0, flush 0, corrupt 1, gen 4
> 
> Everything seems to work nice but I'm courious to know what these
> messages mean (in particular what do "gen" and "corrupt" mean?).

Gen=generation.  The generation or transaction-ID (different names for 
the exact same thing) is a monotonically increasing integer that gets 
updated every time a tree update reaches all the way to the superblock.  
In the error context, it means the superblock had one generation number 
but N other blocks had a different (presumably older) generation number.

Corrupt is simply the number of blocks where the calculated checksum 
didn't match the recorded checksum, thus indicating an error.

Of course rd=read, wr=write...

In raid1 mode, scrub can typically find and fix many of these errors.  My 
btrfs are mostly raid1 mode, and when I crash and reboot, scrub nearly 
always finds and fixes errors on the two btrfs (independent full btrfs 
filesystems, not subvolumes: /var/log and /home) I normally have mounted 
rw.
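
For example, a typical run looks roughly like this (mountpoint as in the
btrfs fi df output above):

# btrfs scrub start /media/multiapp
# btrfs scrub status /media/multiapp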

But do note that this is the HISTORIC count, counting all errors since 
the counts were reset.  Thus, they'll still be reported after scrub or 
whatever has fixed them.  As long as the numbers don't increase, you're 
good.  Any increase indicates additional problems.

See btrfs device stats -z to reset the numbers to zero (after printing 
them one last time).
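
For instance (path is just an example):

# btrfs device stats /media/multiapp     <-- print the per-device counters
# btrfs device stats -z /media/multiapp  <-- print them once more, then zero them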

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: mount problem

2014-09-24 Thread Simone Ferretti
Wed, Sep 24, 2014 at 01:23:32PM +0000, Duncan wrote:
> Simone Ferretti posted on Tue, 23 Sep 2014 14:06:41 +0200 as excerpted:
> 
> > we're testing BTRFS on our Debian server.  After a lot of operations
> > simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume the
> > kernel logs these messages:
> > 
> > [73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs:
> > wr 33036, rd 0, flush 0, corrupt 2806, gen 0
> > [73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs:
> > wr 244165, rd 0, flush 0, corrupt 1, gen 4
> > 
> > Everything seems to work nice but I'm courious to know what these
> > messages mean (in particular what do "gen" and "corrupt" mean?).
> 
> Gen=generation.  The generation or transaction-ID (different names for 
> the exact same thing) is a monotonically increasing integer that gets 
> updated every time a tree update reaches all the way to the superblock.  
> In the error context, it means the superblock had one generation number 
> but N other blocks had a different (presumably older) generation number.
> 
> Corrupt is simply the number of blocks where the calculated checksum 
> didn't match the recorded checksum, thus indicating an error.
>
> See btrfs device stats -z to reset the numbers to zero (after printing 
> them one last time).


Thank you very much for your quick and illuminating answer.

I'm wondering whether you (or anyone else, of course) know if there is
any btrfs documentation/papers/anything (besides the wiki I did not find
anything) from which it's possible to learn this kind of information?

-- 
Bye,
Simone


Re: mount problem

2014-09-24 Thread Duncan
Simone Ferretti posted on Wed, 24 Sep 2014 16:28:35 +0200 as excerpted:

> Wed, Sep 24, 2014 at 01:23:32PM +, Duncan wrote:
>> Simone Ferretti posted on Tue, 23 Sep 2014 14:06:41 +0200 as excerpted:
>> 
>>> we're testing BTRFS on our Debian server.  After a lot of operations
>>> simulating a RAID1 failure, every time I mount my BTRFS RAID1 volume
>>> the kernel logs these messages:
>>> 
>>> [73894.436173] BTRFS: bdev /dev/etherd/e30.20 errs:
>>> wr 33036, rd 0, flush 0, corrupt 2806, gen 0
>>> [73894.436181] BTRFS: bdev /dev/etherd/e60.28 errs:
>>> wr 244165, rd 0, flush 0, corrupt 1, gen 4
>>> 
>>> Everything seems to work nice but I'm courious to know what these
>>> messages mean (in particular what do "gen" and "corrupt" mean?).
>> 
>> Gen=generation.  The generation or transaction-ID (different names for
>> the exact same thing) is a monotonically increasing integer that gets
>> updated every time a tree update reaches all the way to the superblock.
>> In the error context, it means the superblock had one generation number
>> but N other blocks had a different (presumably older) generation
>> number.
>> 
>> Corrupt is simply the number of blocks where the calculated checksum
>> didn't match the recorded checksum, thus indicating an error.
>>
>> See btrfs device stats -z to reset the numbers to zero (after printing
>> them one last time).
> 
> 
> Thank you much for your quick and illuminating answer.
> 
> I'm wondering if you (or anyone else of course) know if there is btrfs
> documentation/papers/anything (besides wiki I did not find anything), in
> which it's possible to learn this kind of informations?

I've learned it from the list and wiki, and from general background 
experience and by reading between the lines at times.

For the monotonically increasing counts and the zero-out option, the 
manpage and help text for btrfs device stats -z, which indicate that -z 
resets the counts to zero, imply that they continue to count up otherwise.  
At one point I think a dev did confirm that on-list, but it's easy enough 
to read the implication without such confirmation, particularly when it 
matches observed behavior, as it does.


The gen/trans-id thing is in fact covered in the wiki, but at least on 
the user-wiki side, I believe only in passing as it is mentioned on the 
btrfs restore page, here:

https://btrfs.wiki.kernel.org/index.php/Restore

(That is in turn linked from the problem-FAQ's "filesystem won't mount 
and none of the above helped, is there any hope?" entry, as well as from 
the built-in-tools section of the main page.)

Of course, people who only search for specific things, instead of doing 
general research before diving head-first into a new filesystem and thus 
reading most of at least the user section of the wiki (as I did), might 
miss it.

But while it's there, it took an actual problem, and trying to actually 
use restore on my own system, before the equivalence of trans-id and 
generation actually sank in.

The corrupt thing probably came from my previous experience, working with 
mdraid and its scrub, and with ECC RAM and the related BIOS scrub 
features.  In general, any admin who has worked with (and understood) any 
sort of checksumming and error detection and correction should have a 
general idea what's going on there, at least after reading the
btrfs-scrub manpage and running it to correct errors a few times, thus 
seeing how its output matches that of the corresponding stats.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



rw-mount-problem after raid1-failure

2015-06-08 Thread Martin
Hello!

I have a raid1 btrfs system (kernel 3.19.0-18-generic, Ubuntu Vivid Vervet, 
btrfs-tools 3.17-1.1). One disk failed some days ago. I could remount the 
remaining one with "-o degraded". After one day and some write operations 
(with no errors) I had to reboot the system. Now I cannot mount "rw" 
anymore; only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices, writeable mount is 
not allowed. 

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I did not 
convert to a single drive.

How can I mount the disk "rw" so that I can remove the "missing" drive and add 
a new one? Because there are many snapshots on the filesystem, copying the 
whole system would only be a last resort ;-)

Thanks

Martin


Re: rw-mount-problem after raid1-failure

2015-06-09 Thread Anand Jain



On 06/09/2015 01:10 AM, Martin wrote:

Hello!

I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid Vervet,
btrfs-tools 3.17-1.1). One disk failed some days ago. I could remount the
remaining one with "-o degraded". After one day and some write-operations
(with no errrors) I had to reboot the system. And now I can not mount "rw"
anymore, only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices, writeable mount is
not allowed.

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I did no
conversion to a single drive.

How can I mount the disk "rw" to remove the "missing" drive and add a new one?
Because there are many snapshots of the filesystem, copying the system would
be only the last alternative ;-)


 How many disks did you have in the RAID1, and how many failed?

Thanks Anand



Thanks

Martin


Re: rw-mount-problem after raid1-failure

2015-06-09 Thread Duncan
Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:

> On 06/09/2015 01:10 AM, Martin wrote:
>> Hello!
>>
>> I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
>> Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
>> remount the remaining one with "-o degraded". After one day and some
>> write-operations (with no errrors) I had to reboot the system. And now
>> I can not mount "rw" anymore, only "-o degraded,ro" is possible.
>>
>> In the kernel log I found BTRFS: too many missing devices, writeable
>> mount is not allowed.
>>
>> I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
>> did no conversion to a single drive.
>>
>> How can I mount the disk "rw" to remove the "missing" drive and add a
>> new one?
>> Because there are many snapshots of the filesystem, copying the system
>> would be only the last alternative ;-)
> 
> How many disks you had in the RAID1. How many are failed ?

The answer is (a bit indirectly) in what you quoted.  Repeating:

>> One disk failed[.] I could remount the remaining one[.]

So it was a two-device raid1, one failed device, one remaining, unfailed.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: rw-mount-problem after raid1-failure

2015-06-09 Thread Anand Jain



Ah, thanks David. So it's a 2-disk RAID1.

Martin,

 Disk pool error handling is primitive as of now: read-only is the only
 action it takes, and the rest of the recovery is manual. That's
 unacceptable in data center solutions. I don't recommend btrfs VM for
 production yet, but we are working to get that to a complete VM.

 For now, for your pool recovery, please try this:

- After reboot.
- modunload and modload (so that the kernel devlist is empty).
- mount -o degraded  <-- this should work.
- btrfs fi show -m <-- should show the missing device; if it doesn't, let me know.
- Do a replace of the missing disk without reading the source disk (a sketch follows).
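
A sketch of that last step (devid and new device name are placeholders;
the devid of the missing disk shows up in btrfs fi show):

# btrfs replace start <devid-of-missing> /dev/<new-disk> <mountpoint>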

Good luck.

Thanks, Anand


On 06/10/2015 11:58 AM, Duncan wrote:

Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:


On 06/09/2015 01:10 AM, Martin wrote:

Hello!

I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
remount the remaining one with "-o degraded". After one day and some
write-operations (with no errrors) I had to reboot the system. And now
I can not mount "rw" anymore, only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices, writeable
mount is not allowed.

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
did no conversion to a single drive.

How can I mount the disk "rw" to remove the "missing" drive and add a
new one?
Because there are many snapshots of the filesystem, copying the system
would be only the last alternative ;-)


How many disks you had in the RAID1. How many are failed ?


The answer is (a bit indirectly) in what you quoted.  Repeating:


One disk failed[.] I could remount the remaining one[.]


So it was a two-device raid1, one failed device, one remaining, unfailed.



Re: rw-mount-problem after raid1-failure

2015-06-09 Thread Martin
Hello Anand,

the

> mount -o degraded  <-- this should work

is my problem. The first few times it worked, but suddenly, after a reboot, it fails 
with the message "BTRFS: too many missing devices, writeable mount is not allowed" 
in the kernel log.

"btrfs fi show /backup2" shows:
Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
Total devices 2 FS bytes used 3.50TiB
devid    4 size 7.19TiB used 4.02TiB path /dev/sdb2
*** Some devices missing

I suppose there is a "marker" telling the system to mount only in ro mode?

Due to the ro mount I can't replace the missing device, because all the btrfs 
commands need rw access ...

Martin

Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:
> Ah thanks David. So its 2 disks RAID1.
> 
> Martin,
> 
>   disk pool error handle is primitive as of now. readonly is the only
>   action it would take. rest of recovery action is manual. thats
>   unacceptable in a data center solutions. I don't recommend btrfs VM
>   productions yet. But we are working to get that to a complete VM.
> 
>   For now, for your pool recovery: pls try this.
> 
>  - After reboot.
>  - modunload and modload (so that kernel devlist is empty)
>  - mount -o degraded  <-- this should work.
>  - btrfs fi show -m <-- Should show missing if you don't let me know.
>  - Do a replace of the missing disk without reading the source disk.
> 
> Good luck.
> 
> Thanks, Anand
> 
> On 06/10/2015 11:58 AM, Duncan wrote:
> > Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:
> >> On 06/09/2015 01:10 AM, Martin wrote:
> >>> Hello!
> >>> 
> >>> I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
> >>> Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
> >>> remount the remaining one with "-o degraded". After one day and some
> >>> write-operations (with no errrors) I had to reboot the system. And now
> >>> I can not mount "rw" anymore, only "-o degraded,ro" is possible.
> >>> 
> >>> In the kernel log I found BTRFS: too many missing devices, writeable
> >>> mount is not allowed.
> >>> 
> >>> I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
> >>> did no conversion to a single drive.
> >>> 
> >>> How can I mount the disk "rw" to remove the "missing" drive and add a
> >>> new one?
> >>> Because there are many snapshots of the filesystem, copying the system
> >>> would be only the last alternative ;-)
> >> 
> >> How many disks you had in the RAID1. How many are failed ?
> > 
> > The answer is (a bit indirectly) in what you quoted.  Repeating:
> >>> One disk failed[.] I could remount the remaining one[.]
> > 
> > So it was a two-device raid1, one failed device, one remaining, unfailed.
> 


Re: rw-mount-problem after raid1-failure

2015-06-10 Thread Anand Jain



On 06/10/2015 02:58 PM, Martin wrote:

Hello Anand,

the


mount -o degraded  <-- this should work


is my problem. The fist times it works but suddently, after a reboot, it fails
with message "BTRFS: too many missing devices, writeable mount is not allowed"
in kernel log.


 Is the failed (or failing) disk still physically in the system?
 When btrfs finds EIO on an intermittently failing disk, ro mode
 kicks in (there are some opportunities for fixes here, which I am
 trying). To recover, the approach is to turn the failing disk into
 a missing disk instead, by pulling the failing disk out of the
 system and rebooting. When the system finds the disk missing
 (rather than getting EIO), it should mount rw,degraded (from the
 VM part at least), and then a replace (with a new disk) should work.

Thanks, Anand



"btrfs fi show /backup2" shows:
Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
Total devices 2 FS bytes used 3.50TiB
devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
*** Some devices missing

I suppose there is a "marker", telling the system only to mount in ro-mode?

Due to the ro-mount I can't replace the missing one because all the btrfs-
commands need rw-access ...

Martin

Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:

Ah thanks David. So its 2 disks RAID1.

Martin,

   disk pool error handle is primitive as of now. readonly is the only
   action it would take. rest of recovery action is manual. thats
   unacceptable in a data center solutions. I don't recommend btrfs VM
   productions yet. But we are working to get that to a complete VM.

   For now, for your pool recovery: pls try this.

  - After reboot.
  - modunload and modload (so that kernel devlist is empty)
  - mount -o degraded  <-- this should work.
  - btrfs fi show -m <-- Should show missing if you don't let me know.
  - Do a replace of the missing disk without reading the source disk.

Good luck.

Thanks, Anand

On 06/10/2015 11:58 AM, Duncan wrote:

Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:

On 06/09/2015 01:10 AM, Martin wrote:

Hello!

I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
remount the remaining one with "-o degraded". After one day and some
write-operations (with no errrors) I had to reboot the system. And now
I can not mount "rw" anymore, only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices, writeable
mount is not allowed.

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
did no conversion to a single drive.

How can I mount the disk "rw" to remove the "missing" drive and add a
new one?
Because there are many snapshots of the filesystem, copying the system
would be only the last alternative ;-)


How many disks you had in the RAID1. How many are failed ?


The answer is (a bit indirectly) in what you quoted.  Repeating:

One disk failed[.] I could remount the remaining one[.]


So it was a two-device raid1, one failed device, one remaining, unfailed.




Re: rw-mount-problem after raid1-failure

2015-06-10 Thread Martin
Hello Anand,

the failed disk was removed. My procedure was the following:

 - I found some write errors in the kernel log, so
 - I shut down the system,
 - I removed the failed disk,
 - I powered the system back on,
 - I mounted the remaining disk degraded,rw (this worked OK),
 - the system worked and was rebooted a few times; mounting degraded,rw kept working,
 - suddenly, mounting degraded,rw stopped working and only degraded,ro works.

Thanks, Martin


Am Mittwoch, 10. Juni 2015, 15:46:52 schrieb Anand Jain:
> On 06/10/2015 02:58 PM, Martin wrote:
> > Hello Anand,
> > 
> > the
> > 
> >> mount -o degraded  <-- this should work
> > 
> > is my problem. The fist times it works but suddently, after a reboot, it
> > fails with message "BTRFS: too many missing devices, writeable mount is
> > not allowed" in kernel log.
> 
>   the failed(ing) disk is it still physically in the system ?
>   when btrfs finds EIO on the intermittently failing disk,
>   ro-mode kicks in, (there are some opportunity for fixes which
>   I am trying). To recover, the approach is to make the failing
>   disk a missing disk instead, by pulling out the failing disk
>   from the system and boot. When system finds disk missing
>   (not EIO rather) it should mount rw,degraded (from the VM part
>   at least) and then replace (with a new disk) should work.
> 
> Thanks, Anand
> 
> > "btrfs fi show /backup2" shows:
> > Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
> > 
> > Total devices 2 FS bytes used 3.50TiB
> > devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
> > *** Some devices missing
> > 
> > I suppose there is a "marker", telling the system only to mount in
> > ro-mode?
> > 
> > Due to the ro-mount I can't replace the missing one because all the btrfs-
> > commands need rw-access ...
> > 
> > Martin
> > 
> > Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:
> >> Ah thanks David. So its 2 disks RAID1.
> >> 
> >> Martin,
> >> 
> >>disk pool error handle is primitive as of now. readonly is the only
> >>action it would take. rest of recovery action is manual. thats
> >>unacceptable in a data center solutions. I don't recommend btrfs VM
> >>productions yet. But we are working to get that to a complete VM.
> >>
> >>For now, for your pool recovery: pls try this.
> >>
> >>   - After reboot.
> >>   - modunload and modload (so that kernel devlist is empty)
> >>   - mount -o degraded  <-- this should work.
> >>   - btrfs fi show -m <-- Should show missing if you don't let me
> >>   know.
> >>   - Do a replace of the missing disk without reading the source disk.
> >> 
> >> Good luck.
> >> 
> >> Thanks, Anand
> >> 
> >> On 06/10/2015 11:58 AM, Duncan wrote:
> >>> Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:
>  On 06/09/2015 01:10 AM, Martin wrote:
> > Hello!
> > 
> > I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
> > Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
> > remount the remaining one with "-o degraded". After one day and some
> > write-operations (with no errrors) I had to reboot the system. And now
> > I can not mount "rw" anymore, only "-o degraded,ro" is possible.
> > 
> > In the kernel log I found BTRFS: too many missing devices, writeable
> > mount is not allowed.
> > 
> > I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
> > did no conversion to a single drive.
> > 
> > How can I mount the disk "rw" to remove the "missing" drive and add a
> > new one?
> > Because there are many snapshots of the filesystem, copying the system
> > would be only the last alternative ;-)
>  
>  How many disks you had in the RAID1. How many are failed ?
> >>> 
> >>> The answer is (a bit indirectly) in what you quoted.  Repeating:
> > One disk failed[.] I could remount the remaining one[.]
> >>> 
> >>> So it was a two-device raid1, one failed device, one remaining,
> >>> unfailed.
> >> 


Re: rw-mount-problem after raid1-failure

2015-06-10 Thread Anand Jain


> On 10 Jun 2015, at 5:35 pm, Martin  wrote:
> 
> Hello Anand,
> 
> the failed disk was removed. My procedure was the following:
> 
> - I found some write errors in the kernel log, so
> - I shutdown the system
> - I removed the failed disk
> - I powered on the system
> - I mounted the remaining disk degraded,rw (works OK)
> - the system works an and was rebooted some times, mounting degraded,rw works
> - suddentlym mounting degraded,rw stops working and only degraded,ro works.

Any logs to say why?
Or, if these (above) stages are reproducible, could you fetch them afresh?

Thanks Anand

> Thanks, Martin
> 
> 
> Am Mittwoch, 10. Juni 2015, 15:46:52 schrieb Anand Jain:
>> On 06/10/2015 02:58 PM, Martin wrote:
>>> Hello Anand,
>>> 
>>> the
>>> 
 mount -o degraded  <-- this should work
>>> 
>>> is my problem. The fist times it works but suddently, after a reboot, it
>>> fails with message "BTRFS: too many missing devices, writeable mount is
>>> not allowed" in kernel log.
>> 
>>  the failed(ing) disk is it still physically in the system ?
>>  when btrfs finds EIO on the intermittently failing disk,
>>  ro-mode kicks in, (there are some opportunity for fixes which
>>  I am trying). To recover, the approach is to make the failing
>>  disk a missing disk instead, by pulling out the failing disk
>>  from the system and boot. When system finds disk missing
>>  (not EIO rather) it should mount rw,degraded (from the VM part
>>  at least) and then replace (with a new disk) should work.
>> 
>> Thanks, Anand
>> 
>>> "btrfs fi show /backup2" shows:
>>> Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
>>> 
>>>Total devices 2 FS bytes used 3.50TiB
>>>devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
>>>*** Some devices missing
>>> 
>>> I suppose there is a "marker", telling the system only to mount in
>>> ro-mode?
>>> 
>>> Due to the ro-mount I can't replace the missing one because all the btrfs-
>>> commands need rw-access ...
>>> 
>>> Martin
>>> 
>>> Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:
 Ah thanks David. So its 2 disks RAID1.
 
 Martin,
 
   disk pool error handle is primitive as of now. readonly is the only
   action it would take. rest of recovery action is manual. thats
   unacceptable in a data center solutions. I don't recommend btrfs VM
   productions yet. But we are working to get that to a complete VM.
 
   For now, for your pool recovery: pls try this.
 
  - After reboot.
  - modunload and modload (so that kernel devlist is empty)
  - mount -o degraded  <-- this should work.
  - btrfs fi show -m <-- Should show missing if you don't let me
  know.
  - Do a replace of the missing disk without reading the source disk.
 
 Good luck.
 
 Thanks, Anand
 
> On 06/10/2015 11:58 AM, Duncan wrote:
> Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:
>>> On 06/09/2015 01:10 AM, Martin wrote:
>>> Hello!
>>> 
>>> I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
>>> Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I could
>>> remount the remaining one with "-o degraded". After one day and some
>>> write-operations (with no errrors) I had to reboot the system. And now
>>> I can not mount "rw" anymore, only "-o degraded,ro" is possible.
>>> 
>>> In the kernel log I found BTRFS: too many missing devices, writeable
>>> mount is not allowed.
>>> 
>>> I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
>>> did no conversion to a single drive.
>>> 
>>> How can I mount the disk "rw" to remove the "missing" drive and add a
>>> new one?
>>> Because there are many snapshots of the filesystem, copying the system
>>> would be only the last alternative ;-)
>> 
>> How many disks you had in the RAID1. How many are failed ?
> 
> The answer is (a bit indirectly) in what you quoted.  Repeating:
>>> One disk failed[.] I could remount the remaining one[.]
> 
> So it was a two-device raid1, one failed device, one remaining,
> unfailed.
 

Re: rw-mount-problem after raid1-failure

2015-06-11 Thread Martin
It is reproducible, but the logs don't say much:

dmesg:
[151183.214355] BTRFS info (device sdb2): allowing degraded mounts
[151183.214361] BTRFS info (device sdb2): disk space caching is enabled
[151183.317719] BTRFS: bdev (null) errs: wr 7988389, rd 7707002, flush 150, 
corrupt 0, gen 0
[151214.513046] BTRFS: too many missing devices, writeable mount is not 
allowed
[151214.548566] BTRFS: open_ctree failed

Can I get more info out of the kernel module?

Thanks, Martin

Am Donnerstag, 11. Juni 2015, 08:04:04 schrieb Anand Jain:
> > On 10 Jun 2015, at 5:35 pm, Martin  wrote:
> > 
> > Hello Anand,
> > 
> > the failed disk was removed. My procedure was the following:
> > 
> > - I found some write errors in the kernel log, so
> > - I shutdown the system
> > - I removed the failed disk
> > - I powered on the system
> > - I mounted the remaining disk degraded,rw (works OK)
> > - the system works an and was rebooted some times, mounting degraded,rw
> > works - suddentlym mounting degraded,rw stops working and only
> > degraded,ro works.
> any logs to say why. ?
> Or
> If these (above) stages are reproducible, could you fetch them afresh?
> 
> Thanks Anand
> 
> > Thanks, Martin
> > 
> > Am Mittwoch, 10. Juni 2015, 15:46:52 schrieb Anand Jain:
> >> On 06/10/2015 02:58 PM, Martin wrote:
> >>> Hello Anand,
> >>> 
> >>> the
> >>> 
>  mount -o degraded  <-- this should work
> >>> 
> >>> is my problem. The fist times it works but suddently, after a reboot, it
> >>> fails with message "BTRFS: too many missing devices, writeable mount is
> >>> not allowed" in kernel log.
> >>> 
> >>  the failed(ing) disk is it still physically in the system ?
> >>  when btrfs finds EIO on the intermittently failing disk,
> >>  ro-mode kicks in, (there are some opportunity for fixes which
> >>  I am trying). To recover, the approach is to make the failing
> >>  disk a missing disk instead, by pulling out the failing disk
> >>  from the system and boot. When system finds disk missing
> >>  (not EIO rather) it should mount rw,degraded (from the VM part
> >>  at least) and then replace (with a new disk) should work.
> >> 
> >> Thanks, Anand
> >> 
> >>> "btrfs fi show /backup2" shows:
> >>> Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
> >>> 
> >>>Total devices 2 FS bytes used 3.50TiB
> >>>devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
> >>>*** Some devices missing
> >>> 
> >>> I suppose there is a "marker", telling the system only to mount in
> >>> ro-mode?
> >>> 
> >>> Due to the ro-mount I can't replace the missing one because all the
> >>> btrfs-
> >>> commands need rw-access ...
> >>> 
> >>> Martin
> >>> 
> >>> Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:
>  Ah thanks David. So its 2 disks RAID1.
>  
>  Martin,
>  
>    disk pool error handle is primitive as of now. readonly is the only
>    action it would take. rest of recovery action is manual. thats
>    unacceptable in a data center solutions. I don't recommend btrfs VM
>    productions yet. But we are working to get that to a complete VM.
>    
>    For now, for your pool recovery: pls try this.
>    
>   - After reboot.
>   - modunload and modload (so that kernel devlist is empty)
>   - mount -o degraded  <-- this should work.
>   - btrfs fi show -m <-- Should show missing if you don't let me
>   know.
>   - Do a replace of the missing disk without reading the source
>   disk.
>  
>  Good luck.
>  
>  Thanks, Anand
>  
> > On 06/10/2015 11:58 AM, Duncan wrote:
> > 
> > Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:
> >>> On 06/09/2015 01:10 AM, Martin wrote:
> >>> Hello!
> >>> 
> >>> I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
> >>> Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I
> >>> could
> >>> remount the remaining one with "-o degraded". After one day and some
> >>> write-operations (with no errrors) I had to reboot the system. And
> >>> now
> >>> I can not mount "rw" anymore, only "-o degraded,ro" is possible.
> >>> 
> >>> In the kernel log I found BTRFS: too many missing devices, writeable
> >>> mount is not allowed.
> >>> 
> >>> I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
> >>> did no conversion to a single drive.
> >>> 
> >>> How can I mount the disk "rw" to remove the "missing" drive and add
> >>> a
> >>> new one?
> >>> Because there are many snapshots of the filesystem, copying the
> >>> system
> >>> would be only the last alternative ;-)
> >> 
> >> How many disks you had in the RAID1. How many are failed ?
> > 
> > The answer is (a bit indirectly) in what you quoted.  Repeating:
> >>> One disk failed[.] I could remount the remaining one[.]
> > 
> > So it was a two-device raid1, one failed device, one remaining, unfailed.

Re: rw-mount-problem after raid1-failure

2015-06-12 Thread Anand Jain



On 06/11/2015 09:03 PM, Martin wrote:

It is reproduceable but the logs doesn't say much:

dmesg:
[151183.214355] BTRFS info (device sdb2): allowing degraded mounts
[151183.214361] BTRFS info (device sdb2): disk space caching is enabled
[151183.317719] BTRFS: bdev (null) errs: wr 7988389, rd 7707002, flush 150,
corrupt 0, gen 0
[151214.513046] BTRFS: too many missing devices, writeable mount is not
allowed


Presumably (we did not confirm that only one disk is missing from the
kernel's point of view?) with one disk missing, if you are still getting
this, it means there is a group profile in your disk pool that does
not tolerate even a single disk failure.

So now, how would we check all the group profiles in an unmountable
state?

There is a patch to show the devlist using /proc/fs/btrfs/devlist;
that would have helped to debug here. I am fine if you can confirm
it using any other method as well.
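
One other way that should work here (a sketch, untested on this box):
since degraded,ro still mounts, the block group profiles can be read from
the mounted filesystem, e.g.:

# mount -o degraded,ro /dev/sdb2 /backup2
# btrfs fi df /backup2

Anything listed there as "single" (or another profile that cannot tolerate
a missing device) would explain the refusal to mount writeable.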

Thanks, Anand



[151214.548566] BTRFS: open_ctree failed

Can I get more info out of the kernel-module?

Thanks, Martin

Am Donnerstag, 11. Juni 2015, 08:04:04 schrieb Anand Jain:

On 10 Jun 2015, at 5:35 pm, Martin  wrote:

Hello Anand,

the failed disk was removed. My procedure was the following:

- I found some write errors in the kernel log, so
- I shutdown the system
- I removed the failed disk
- I powered on the system
- I mounted the remaining disk degraded,rw (works OK)
- the system works an and was rebooted some times, mounting degraded,rw
works - suddentlym mounting degraded,rw stops working and only
degraded,ro works.

any logs to say why. ?
Or
If these (above) stages are reproducible, could you fetch them afresh?

Thanks Anand


Thanks, Martin

Am Mittwoch, 10. Juni 2015, 15:46:52 schrieb Anand Jain:

On 06/10/2015 02:58 PM, Martin wrote:

Hello Anand,

the


mount -o degraded  <-- this should work


is my problem. The fist times it works but suddently, after a reboot, it
fails with message "BTRFS: too many missing devices, writeable mount is
not allowed" in kernel log.


  the failed(ing) disk is it still physically in the system ?
  when btrfs finds EIO on the intermittently failing disk,
  ro-mode kicks in, (there are some opportunity for fixes which
  I am trying). To recover, the approach is to make the failing
  disk a missing disk instead, by pulling out the failing disk
  from the system and boot. When system finds disk missing
  (not EIO rather) it should mount rw,degraded (from the VM part
  at least) and then replace (with a new disk) should work.

Thanks, Anand


"btrfs fi show /backup2" shows:
Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512

Total devices 2 FS bytes used 3.50TiB
devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
*** Some devices missing

I suppose there is a "marker", telling the system only to mount in
ro-mode?

Due to the ro-mount I can't replace the missing one because all the
btrfs-
commands need rw-access ...

Martin

Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:

Ah thanks David. So its 2 disks RAID1.

Martin,

   disk pool error handle is primitive as of now. readonly is the only
   action it would take. rest of recovery action is manual. thats
   unacceptable in a data center solutions. I don't recommend btrfs VM
   productions yet. But we are working to get that to a complete VM.

   For now, for your pool recovery: pls try this.

  - After reboot.
  - modunload and modload (so that kernel devlist is empty)
  - mount -o degraded  <-- this should work.
  - btrfs fi show -m <-- Should show missing if you don't let me
  know.
  - Do a replace of the missing disk without reading the source
  disk.

Good luck.

Thanks, Anand


On 06/10/2015 11:58 AM, Duncan wrote:

Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:

On 06/09/2015 01:10 AM, Martin wrote:
Hello!

I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu Vivid
Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I
could
remount the remaining one with "-o degraded". After one day and some
write-operations (with no errrors) I had to reboot the system. And
now
I can not mount "rw" anymore, only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices, writeable
mount is not allowed.

I read about https://bugzilla.kernel.org/show_bug.cgi?id=60594 but I
did no conversion to a single drive.

How can I mount the disk "rw" to remove the "missing" drive and add
a
new one?
Because there are many snapshots of the filesystem, copying the
system
would be only the last alternative ;-)


How many disks you had in the RAID1. How many are failed ?


The answer is (a bit indirectly) in what you quoted.  Repeating:

One disk failed[.] I could remount the remaining one[.]


So it was a two-device raid1, one failed device, one remaining,
unfailed.



Re: rw-mount-problem after raid1-failure

2015-06-14 Thread Martin
Do you know where I can find this kernel patch? I didn't find it. Then 
I will build the patched kernel and send the devlist output.

Thanks, Martin

Am Freitag, 12. Juni 2015, 18:38:18 schrieb Anand Jain:
> On 06/11/2015 09:03 PM, Martin wrote:
> > It is reproduceable but the logs doesn't say much:
> > 
> > dmesg:
> > [151183.214355] BTRFS info (device sdb2): allowing degraded mounts
> > [151183.214361] BTRFS info (device sdb2): disk space caching is enabled
> > [151183.317719] BTRFS: bdev (null) errs: wr 7988389, rd 7707002, flush
> > 150,
> > corrupt 0, gen 0
> > [151214.513046] BTRFS: too many missing devices, writeable mount is not
> > allowed
> 
> presumably (we did not confirm that only one disk is missing from
> kernel point of view?) with One disk missing if you are still getting
> this that means, there is a group profile in your disk pool that does
> not tolerate single disk failure either.
> 
> So now how would we check all the group profiles in an unmount(able)
> state ?
> 
> There is a patch to show devlist using /proc/fs/btrfs/devlist.
> That would have helped here to debug. I am ok if you could confirm
> that using any other method as well.
> 
> Thanks, Anand
> 
> > [151214.548566] BTRFS: open_ctree failed
> > 
> > Can I get more info out of the kernel-module?
> > 
> > Thanks, Martin
> > 
> > Am Donnerstag, 11. Juni 2015, 08:04:04 schrieb Anand Jain:
> >>> On 10 Jun 2015, at 5:35 pm, Martin  wrote:
> >>> 
> >>> Hello Anand,
> >>> 
> >>> the failed disk was removed. My procedure was the following:
> >>> 
> >>> - I found some write errors in the kernel log, so
> >>> - I shutdown the system
> >>> - I removed the failed disk
> >>> - I powered on the system
> >>> - I mounted the remaining disk degraded,rw (works OK)
> >>> - the system works an and was rebooted some times, mounting degraded,rw
> >>> works - suddentlym mounting degraded,rw stops working and only
> >>> degraded,ro works.
> >> 
> >> any logs to say why. ?
> >> Or
> >> If these (above) stages are reproducible, could you fetch them afresh?
> >> 
> >> Thanks Anand
> >> 
> >>> Thanks, Martin
> >>> 
> >>> Am Mittwoch, 10. Juni 2015, 15:46:52 schrieb Anand Jain:
>  On 06/10/2015 02:58 PM, Martin wrote:
> > Hello Anand,
> > 
> > the
> > 
> >> mount -o degraded  <-- this should work
> > 
> > is my problem. The fist times it works but suddently, after a reboot,
> > it
> > fails with message "BTRFS: too many missing devices, writeable mount
> > is
> > not allowed" in kernel log.
> > 
>    the failed(ing) disk is it still physically in the system ?
>    when btrfs finds EIO on the intermittently failing disk,
>    ro-mode kicks in, (there are some opportunity for fixes which
>    I am trying). To recover, the approach is to make the failing
>    disk a missing disk instead, by pulling out the failing disk
>    from the system and boot. When system finds disk missing
>    (not EIO rather) it should mount rw,degraded (from the VM part
>    at least) and then replace (with a new disk) should work.
>  
>  Thanks, Anand
>  
> > "btrfs fi show /backup2" shows:
> > Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512
> > 
> > Total devices 2 FS bytes used 3.50TiB
> > devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
> > *** Some devices missing
> > 
> > I suppose there is a "marker", telling the system only to mount in
> > ro-mode?
> > 
> > Due to the ro-mount I can't replace the missing one because all the
> > btrfs-
> > commands need rw-access ...
> > 
> > Martin
> > 
> > Am Mittwoch, 10. Juni 2015, 14:38:38 schrieb Anand Jain:
> >> Ah thanks David. So its 2 disks RAID1.
> >> 
> >> Martin,
> >> 
> >>disk pool error handle is primitive as of now. readonly is the
> >>only
> >>action it would take. rest of recovery action is manual. thats
> >>unacceptable in a data center solutions. I don't recommend btrfs
> >>VM
> >>productions yet. But we are working to get that to a complete VM.
> >>
> >>For now, for your pool recovery: pls try this.
> >>
> >>   - After reboot.
> >>   - modunload and modload (so that kernel devlist is empty)
> >>   - mount -o degraded  <-- this should work.
> >>   - btrfs fi show -m <-- Should show missing if you don't let me
> >>   know.
> >>   - Do a replace of the missing disk without reading the source
> >>   disk.
> >> 
> >> Good luck.
> >> 
> >> Thanks, Anand
> >> 
> >>> On 06/10/2015 11:58 AM, Duncan wrote:
> >>> 
> >>> Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:
> > On 06/09/2015 01:10 AM, Martin wrote:
> > Hello!
> > 
> > I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu
> > Vivid
> 

Re: rw-mount-problem after raid1-failure

2015-06-14 Thread Anand Jain

Martin,

below patch will help to obtain the device list from the kernel.

https://patchwork.kernel.org/patch/4996111/

(Just FYI, the above patch is v1, which should apply;
its v2 would not apply without the sysfs patches.)

However, the other data we need is the group profiles;
collect them when and if the device can be mounted.

(Was there any patch which can obtain the group profiles
without mounting? I vaguely remember some effort/comments
on that before.)

Thanks, Anand
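
For reference, a minimal sketch of what to collect once a kernel with the
above patch is booted (the surviving device /dev/sdb2 and the mount point
/backup2 are taken from earlier in this thread; whether the degraded mount
succeeds at all is still the open question):

# cat /proc/fs/btrfs/devlist     <-- device list as the kernel sees it (needs the patch above)
# mount -o degraded /dev/sdb2 /backup2
# btrfs fi df /backup2           <-- Data/Metadata/System group profiles, if the mount works
# btrfs fi show -m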



On 06/15/2015 02:24 AM, Martin wrote:

Do you know where I can find this kernel patch? I didn't find it. Then
I will build the patched kernel and send the devlist output.

Thanks, Martin

On Friday, 12 June 2015, 18:38:18, Anand Jain wrote:

On 06/11/2015 09:03 PM, Martin wrote:

It is reproducible but the logs don't say much:

dmesg:
[151183.214355] BTRFS info (device sdb2): allowing degraded mounts
[151183.214361] BTRFS info (device sdb2): disk space caching is enabled
[151183.317719] BTRFS: bdev (null) errs: wr 7988389, rd 7707002, flush 150, corrupt 0, gen 0
[151214.513046] BTRFS: too many missing devices, writeable mount is not allowed


Presumably (we did not confirm that only one disk is missing from the
kernel's point of view?) with one disk missing, if you are still getting
this it means there is a group profile in your disk pool that does
not tolerate a single disk failure either.

So how would we check all the group profiles while the filesystem is
unmountable?

There is a patch to show devlist using /proc/fs/btrfs/devlist.
That would have helped here to debug. I am ok if you could confirm
that using any other method as well.

Thanks, Anand


[151214.548566] BTRFS: open_ctree failed

Can I get more info out of the kernel-module?

Thanks, Martin

On Thursday, 11 June 2015, 08:04:04, Anand Jain wrote:

On 10 Jun 2015, at 5:35 pm, Martin  wrote:

Hello Anand,

the failed disk was removed. My procedure was the following:

- I found some write errors in the kernel log, so
- I shutdown the system
- I removed the failed disk
- I powered on the system
- I mounted the remaining disk degraded,rw (works OK)
- the system worked and was rebooted a few times, mounting degraded,rw
worked - suddenly mounting degraded,rw stopped working and only
degraded,ro works.


Any logs to say why?
Or
If these (above) stages are reproducible, could you fetch them afresh?

Thanks Anand


Thanks, Martin

On Wednesday, 10 June 2015, 15:46:52, Anand Jain wrote:

On 06/10/2015 02:58 PM, Martin wrote:

Hello Anand,

the


mount -o degraded  <-- this should work


is my problem. The first few times it works but suddenly, after a reboot,
it fails with the message "BTRFS: too many missing devices, writeable
mount is not allowed" in the kernel log.


   is the failed (or failing) disk still physically in the system?
   When btrfs finds EIO on the intermittently failing disk,
   ro-mode kicks in (there are some opportunities for fixes which
   I am trying). To recover, the approach is to turn the failing
   disk into a missing disk instead, by pulling the failing disk
   out of the system and booting. When the system finds the disk
   missing (rather than returning EIO) it should mount rw,degraded
   (from the VM part at least) and then a replace (with a new disk)
   should work.

Thanks, Anand
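
For concreteness, a minimal sketch of that recovery path once the failing
disk has been physically removed (the surviving device /dev/sdb2 and the
mount point /backup2 are from this thread; the missing devid and the
replacement disk /dev/sdc2 are placeholders):

# modprobe -r btrfs && modprobe btrfs    <-- start with an empty kernel device list
# mount -o degraded /dev/sdb2 /backup2
# btrfs fi show -m                       <-- note the devid reported as missing
# btrfs replace start -B <missing-devid> /dev/sdc2 /backup2
# btrfs replace status /backup2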


"btrfs fi show /backup2" shows:
Label: none  uuid: 6d755db5-f8bb-494e-9bdc-cf524ff99512

 Total devices 2 FS bytes used 3.50TiB
 devid4 size 7.19TiB used 4.02TiB path /dev/sdb2
 *** Some devices missing

I suppose there is a "marker", telling the system only to mount in
ro-mode?

Due to the ro-mount I can't replace the missing one because all the
btrfs commands need rw access ...

Martin

On Wednesday, 10 June 2015, 14:38:38, Anand Jain wrote:

Ah thanks David. So its 2 disks RAID1.

Martin,

Disk pool error handling is primitive as of now. Read-only is the
only action it will take; the rest of the recovery is manual. That is
unacceptable in data center solutions. I don't recommend btrfs VM in
production yet. But we are working to get that to a complete VM.

For now, for your pool recovery: pls try this.

   - After reboot.
   - modunload and modload (so that kernel devlist is empty)
   - mount -o degraded  <-- this should work.
   - btrfs fi show -m <-- should show the missing device; if it
   doesn't, let me know.
   - Do a replace of the missing disk without reading the source
   disk.

Good luck.

Thanks, Anand


On 06/10/2015 11:58 AM, Duncan wrote:

Anand Jain posted on Wed, 10 Jun 2015 09:19:37 +0800 as excerpted:

On 06/09/2015 01:10 AM, Martin wrote:
Hello!

I have a raid1-btrfs-system (Kernel 3.19.0-18-generic, Ubuntu
Vivid
Vervet, btrfs-tools 3.17-1.1). One disk failed some days ago. I
could
remount the remaining one with "-o degraded". After one day and
some write operations (with no errors) I had to reboot the system.
And now I cannot mount "rw" anymore, only "-o degraded,ro" is possible.

In the kernel log I found BTRFS: too many missing devices

[PATCH] Btrfs: fix subvolume fake mount problem when default subvolume is set

2011-04-06 Thread Zhong, Xin
We create two subvolumes (meego_root and meego_home) in the
btrfs root directory and set meego_root as the default mount
subvolume. After we remount btrfs, meego_root is mounted
to the top directory by default. Then, if we create a directory with
the same name as meego_home, we can mount the meego_home subvolume
successfully (subvol=meego_home). But this is incorrect: what we
do in this mount point will not change anything in the meego_home
subvolume. The problem is that when the default mount subvolume is
set to meego_root, we search for meego_home in meego_root, and if we
find a directory with the same name, we treat it as a subvolume. So
the solution is to check whether what we find is really a subvolume.

Signed-off-by: Zhong, Xin 
---
 fs/btrfs/super.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b85fe78..66a76b7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -823,7 +823,9 @@ static int btrfs_get_sb(struct file_system_type *fs_type, 
int flags,
error = PTR_ERR(new_root);
goto error_free_subvol_name;
}
-   if (!new_root->d_inode) {
+   if (!new_root->d_inode ||
+   /* new_root is a directory, not subvolume */
+   new_root->d_inode->i_ino != BTRFS_FIRST_FREE_OBJECTID) {
dput(root);
dput(new_root);
deactivate_locked_super(s);
-- 
1.7.0.4
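
For anyone who wants to reproduce the problem this patch addresses, a
rough sketch of the scenario from the commit message (the device name is
a placeholder; the subvolume names are the ones used above):

# mkfs.btrfs /dev/sdX && mount /dev/sdX /mnt
# btrfs subvolume create /mnt/meego_root
# btrfs subvolume create /mnt/meego_home
# btrfs subvolume list /mnt                   <-- note the subvolume id of meego_root
# btrfs subvolume set-default <id-of-meego_root> /mnt
# umount /mnt && mount /dev/sdX /mnt          <-- meego_root is now the top directory
# mkdir /mnt/meego_home                       <-- plain directory shadowing the subvolume name
# mount -o subvol=meego_home /dev/sdX /mnt2   <-- before the fix: succeeds, but changes end up
                                                  inside meego_root, not in the real meego_home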



Re: [PATCH] Btrfs: fix subvolume fake mount problem when default subvolume is set

2011-04-06 Thread Goffredo Baroncelli
Hi
On 04/06/2011 11:34 AM, Zhong, Xin wrote:
> We create two subvolumes (meego_root and meego_home) in the
> btrfs root directory and set meego_root as the default mount
> subvolume. After we remount btrfs, meego_root is mounted
> to the top directory by default. Then, if we create a directory with
> the same name as meego_home, we can mount the meego_home subvolume
> successfully (subvol=meego_home). But this is incorrect: what we
> do in this mount point will not change anything in the meego_home
> subvolume. The problem is that when the default mount subvolume is
> set to meego_root, we search for meego_home in meego_root, and if we
> find a directory with the same name, we treat it as a subvolume. So
> the solution is to check whether what we find is really a subvolume.

I think that this is a bug, so a warning should be raised. We had a lot of
problems because "btrfsctl -s " did the same thing: if we referred to a
directory, it snapshotted the directory's subvolume and the user didn't
understand what happened.

Personally, in the case of a wrong option, I prefer that the kernel raise a
warning and stop, rather than make a choice about which default is
more reasonable.

Regards
G.Baroncelli


> 
> Signed-off-by: Zhong, Xin 
> ---
>  fs/btrfs/super.c |4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b85fe78..66a76b7 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -823,7 +823,9 @@ static int btrfs_get_sb(struct file_system_type *fs_type, 
> int flags,
>   error = PTR_ERR(new_root);
>   goto error_free_subvol_name;
>   }
> - if (!new_root->d_inode) {
> + if (!new_root->d_inode ||
> + /* new_root is a directory, not subvolume */
> + new_root->d_inode->i_ino != BTRFS_FIRST_FREE_OBJECTID) {
>   dput(root);
>   dput(new_root);
>   deactivate_locked_super(s);



transid failed / mount Problem on Linux pc6 3.17.2-1-ARCH #1 SMP PREEMPT Thu Oct 30 20:49:39 CET 2014 x86_64 GNU/Linux

2014-11-11 Thread Juergen Sauer

Hi!
This event occurred this morning.
Accidentally, the archive machine was kicked into hibernation.

After resuming, the archive btrfs filesystem was "readonly"; after
rebooting the system, the "archive" btrfs filesystem was not mountable
anymore.

I tried every recovery possibility I know. Nothing worked.

Here I list the problem on the machine; it would be very ugly to lose
this data.

Do you have any further ideas about what I might try to recover my archive
filesystem?

The archive filesystem is a raid5 multi-device btrfs.

System:
root@pc6:/usr/src/build/btrfs-progs# uname -a
Linux pc6 3.17.2-1-ARCH #1 SMP PREEMPT Thu Oct 30 20:49:39 CET 2014
x86_64 GNU/Linux

These btrfs tools were in use:
git clone
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

root@pc6:/usr/src/build/btrfs-progs# ./btrfs fi show
parent transid verify failed on 209362944 wanted 293924 found 293922
parent transid verify failed on 209362944 wanted 293924 found 293922
Check tree block failed, want=209362944, have=209559552
parent transid verify failed on 209362944 wanted 293924 found 293922
Ignoring transid failure
Label: 'archiv'  uuid: 48f71e09-6898-4665-bc61-bd7ca4ba4a24
Total devices 4 FS bytes used 3.35TiB
devid1 size 1.70TiB used 726.69GiB path /dev/sdh3
devid2 size 1.82TiB used 1.35TiB path /dev/sda1
devid3 size 1.82TiB used 1.35TiB path /dev/sdj1
devid4 size 1.82TiB used 1.35TiB path /dev/sdi1

Btrfs v3.17.1

mount -o ro,recovery -t btrfs /dev/sdh3 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdh3,
missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail  or so
root@pc6:/usr/src/build/btrfs-progs# dmesg ...

[ 7116.746815] BTRFS info (device sdi1): enabling auto recovery
[ 7116.746820] BTRFS info (device sdi1): disk space caching is enabled
[ 7117.028008] verify_parent_transid: 6 callbacks suppressed
[ 7117.028013] parent transid verify failed on 209362944 wanted 293924
found 293922
[ 7117.028324] parent transid verify failed on 209362944 wanted 293924
found 293922
[ 7117.033188] parent transid verify failed on 244719616 wanted 293924
found 293922
[ 7117.033516] parent transid verify failed on 244719616 wanted 293924
found 293922
[ 7117.034114] BTRFS: bdev /dev/sda1 errs: wr 3, rd 0, flush 1, corrupt
0, gen 0
[ 7117.034557] parent transid verify failed on 209375232 wanted 293924
found 293914
[ 7117.034873] parent transid verify failed on 209375232 wanted 293924
found 293914
[ 7117.037358] parent transid verify failed on 245538816 wanted 293924
found 293922
[ 7117.037702] parent transid verify failed on 245538816 wanted 293924
found 293922
[ 7117.108132] parent transid verify failed on 253378560 wanted 293924
found 293914
[ 7117.108509] parent transid verify failed on 253378560 wanted 293924
found 293914
[ 7117.231038] BTRFS: bad tree block start 0 253911040
[ 7117.231052] BTRFS: Failed to read block groups: -5
[ 7117.290534] BTRFS: open_ctree failed


root@pc6:/usr/src/build/btrfs-progs# btrfs check  --repair /dev/sdh3
enabling repair mode
parent transid verify failed on 209362944 wanted 293924 found 293922
parent transid verify failed on 209362944 wanted 293924 found 293922
Check tree block failed, want=209362944, have=209559552
parent transid verify failed on 209362944 wanted 293924 found 293922
Ignoring transid failure
parent transid verify failed on 247873536 wanted 293924 found 293922
parent transid verify failed on 247873536 wanted 293924 found 293922
Check tree block failed, want=247873536, have=248070144
parent transid verify failed on 247873536 wanted 293924 found 293922
Ignoring transid failure
leaf parent key incorrect 247873536


root@pc6:/usr/src/build/btrfs-progs# btrfs-zero-log  /dev/sdh3
parent transid verify failed on 209362944 wanted 293924 found 293922
parent transid verify failed on 209362944 wanted 293924 found 293922
Check tree block failed, want=209362944, have=209559552
parent transid verify failed on 209362944 wanted 293924 found 293922
Ignoring transid failure
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs-zero-log[0x40c0ca]
btrfs-zero-log[0x410eb3]
btrfs-zero-log[0x410f6f]
btrfs-zero-log[0x403361]
btrfs-zero-log[0x403975]
btrfs-zero-log[0x408606]
btrfs-zero-log[0x409d8e]
btrfs-zero-log[0x402542]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f810af82040]
btrfs-zero-log[0x402653]

root@pc6:/usr/src/build/btrfs-progs# btrfs rescue  chunk-recover /dev/sdi1
Segmentation fault (core dumped)


Any Ideas, I may check for ?

TIA ...

Kind regards,
Jürgen Sauer
-- 
Jürgen Sauer - automatiX GmbH,
http://www.automatix.de/juergen_sauer_publickey.gpg

Re: transid failed / mount Problem on Linux pc6 3.17.2-1-ARCH #1 SMP PREEMPT Thu Oct 30 20:49:39 CET 2014 x86_64 GNU/Linux

2014-11-11 Thread Duncan
Juergen Sauer posted on Tue, 11 Nov 2014 12:13:41 +0100 as excerpted:

> This event occurred this morning.
> Accidentally, the archive machine was kicked into hibernation.
> 
> After resuming, the archive btrfs filesystem was "readonly"; after
> rebooting the system, the "archive" btrfs filesystem was not mountable
> anymore.

FWIW, I've had similar issues with both mdraid in the past, and with 
btrfs now, with both hibernation and suspend-to-ram.

Tho after early experiences I switched to mdraid-1 some time in the past, 
and now btrfs raid1 mode, which (even with the more mature mdraid) tends 
to be more resilient than raid5 and faster than raid6.  At least with 
raid1, there's multiple copies of the data, and at least in my 
experience, that dramatically increases the reliability of recovery from 
temporary or permanent dropout of one device.
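
(As an aside, converting an existing multi-device btrfs from the raid5/6
profiles to raid1 is a single, long-running balance; a rough sketch,
assuming the filesystem is mounted at /mnt and has room for the second
copy of everything:

# btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
# btrfs fi df /mnt     <-- Data and Metadata should now report RAID1

Not something that helps an unmountable filesystem, of course.)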

The general problem seems to be that in the resume process, some devices 
wake up faster than others, and even "awake" devices don't necessarily 
fully stabilize for a minute or two.  Back on mdraid, I noticed some 
devices coming up with model number strings and UIDs that would have 
incorrect characters in some position, tho they'd stabilize over time.  
Obviously, this plays havoc with kernel efforts to ensure the devices it 
woke up to are the same devices it had when it went to sleep (either 
suspend to ram or hibernate to disk).

And the same general problems continue to occur with the pair of SSDs I 
have now, with suspend-to-ram instead of hibernate, while the original 
devices I noticed the problem on were spinning rust of an entirely 
different brand.

So it's not a btrfs-specific issue, or a device specific issue, or a 
motherboard specific issue since I've upgraded since I first saw it too, 
or a suspend/hibernate type specific issue.  It's a general issue.  Tho I 
/have/ noticed on the current equipment, that if I suspend for a 
relatively short period, an hour or two, it seems to come back with fewer 
problems than if I suspend for 6 hours or more... say if I suspend while 
I'm at work or overnite.  (FWIW, the old machine seemed to hibernate and 
resume reasonably well other than this but couldn't reliably resume from 
suspend, while the new machine is the opposite, I never got it to resume 
from hibernation, but other than this, it reliably resumes from suspend.)

Unfortunately, the only reliable solution seems to be to fully shut down 
instead of suspending or hibernating, and obviously, after running into 
issues a few times, I eventually quit experimenting further.  But the 
fact that I'm running systemd on fast ssds now, does ameliorate the 
problem quite a bit, both due to faster booting, and by making the lost 
cache of a reboot far less of an issue because reading the data back in 
is so much faster on ssd.

So it seems both suspend and hibernate seem to work better with single 
devices where one device being slower to stabilize won't be the issue it 
is with raid (either mdraid or btrfs raid), and raid doesn't combine well 
with suspend/hibernate. =:^(

Too bad, as being able to suspend and wake up right away was saving on 
the electric bill. =:^(

So if it's really critical, as it arguably might be on an archive 
machine, I'd consider pointing whatever suspend/hibernate triggers at 
shutdown or reboot instead.  If it's not possible to accidentally 
hibernate the thing because it triggers shutdown/reboot instead, it 
won't/can't be accidentally hibernated. =:^)
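
For example, on a systemd box one blunt but effective way to make
accidental suspend/hibernate impossible (not something the original
poster mentioned, just an illustration) is to mask the sleep targets:

# systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

After that, sleep requests are refused and the only options left are a
clean shutdown or reboot.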

> I tried every recovery possibility I know. Nothing worked.
> 
> Here I list the problem on the machine; it would be very ugly to lose
> this data.
> 
> Do you have any further ideas about what I might try to recover my archive
> filesystem?
> 
> The archive filesystem is a raid5 multi-device btrfs.

Btrfs raid5, or mdraid-5 with btrfs on top?  Because it's common 
knowledge that btrfs raid56 modes aren't yet fully implemented, and while 
they work in normal operation, recovery from a lost device is iffy at 
best because the code simply isn't complete for that yet.  As such a 
raid5/6 mode btrfs is best effectively considered a raid0 in terms of 
reliability, don't count on recovering anything if a single device is 
lost, even temporarily.  Depending on the circumstances, it's not always 
/quite/ that bad, but raid0 reliability, or more accurately the lack 
thereof, is what you plan for when you setup a btrfs raid5 or raid6, 
because that's effectively what it is until the recovery code is complete 
and tested, and that way you won't be caught with critical data on it if 
it does go south, any more than you would put critical data on a raid0.
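
A quick way to tell which case applies, when the filesystem can still be
mounted (the mount point below is a placeholder): native btrfs raid5 shows
up in the filesystem's own block group profiles, while mdraid shows up in
/proc/mdstat with btrfs sitting on top of a single md device.

# cat /proc/mdstat            <-- lists md arrays if mdraid is in use
# btrfs fi df /mnt/archive    <-- "Data, RAID5:" here means btrfs-native raid5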

So I /hope/ you meant mdraid-5, on top of which you had btrfs.  With 
that, once the mdraid level is recovered, you are basically looking at a 
standard btrfs recovery as if it were a single device.  That's still not 
a great position to be in as you are after all looking at a recovery with 
a non-zero chance of failure, but let's c