Re: Unmountable Array After Drive Failure During Device Deletion

2013-12-21 Thread ronnie sahlberg
Something similar happened to me. (See my unanswered posts from around 1 Sep;
I don't think this fs is really ready for production.)

When you get wrong-transid errors and reports of checksums being
repaired, that is all bad news, and no one can help you.

Unfortunately there are, I think, no real tools to fix basic fs errors.


I never managed to get my filesystem into a state where it could be mounted at all,
but I did manage to recover most of my data using
btrfs restore from
https://github.com/FauxFaux/btrfs-progs

This is the commit for the option of that command that I used to recover data;
I got most data back with it, but YMMV.

  commit 2a2a1fb21d375a46f9073e44a7b9d9bb7bfaa1e2
  Author: Peter Stuge 
  Date:   Fri Nov 25 01:03:58 2011 +0100

restore: Add regex matching of paths and files to be restored

The option -m is used to specify the regex string. -c is used to
specify case insensitive matching. -i was already taken.

In order to restore only a single folder somewhere in the btrfs
tree, it is unfortunately necessary to construct a slightly
nontrivial regex, e.g.:

restore -m '^/(|home(|/username(|/Desktop(|/.*$' /dev/sdb2 /output

This is needed in order to match each directory along the way to the
Desktop directory, as well as all contents below the Desktop directory.

Signed-off-by: Peter Stuge 
Signed-off-by: Josef Bacik 


I won't give advice for your data.
For my data, I copied as much as I could recover from the damaged
filesystem over to a different filesystem
using the tools in the repo above.
After that, I destroyed the damaged filesystem and rebuilt from scratch.


Then, depending on how important your data is, you either start making
backups regularly, or switch to a fs that is less fragile and easier to repair.


On Thu, Dec 19, 2013 at 1:26 AM, Chris Kastorff  wrote:
> I'm using btrfs in data and metadata RAID10 on drives (not on md or any
> other fanciness.)
>
> I was removing a drive (btrfs dev del) and during that operation, a
> different drive in the array failed. Having not had this happen before,
> I shut down the machine immediately due to the extremely loud piezo
> buzzer on the drive controller card. I attempted to do so cleanly, but
> the buzzer cut through my patience and after 4 minutes I cut the power.
>
> Afterwards, I located and removed the failed drive from the system, and
> then got back to linux. The array no longer mounts ("failed to read the
> system array on sdc"), with nearly identical messages when attempted
> with -o recovery and -o recovery,ro.
>
> btrfsck asserts and coredumps, as usual.
>
> The drive that was being removed is devid 9 in the array, and is
> /dev/sdm1 in the btrfs fi show seen below.
>
> Kernel 3.12.4-1-ARCH, btrfs-progs v0.20-rc1-358-g194aa4a-dirty
> (archlinux build.)
>
> Can I recover the array?
>
> == dmesg during failure ==
>
> ...
> sd 0:2:3:0: [sdd] Unhandled error code
> sd 0:2:3:0: [sdd]
> Result: hostbyte=0x04 driverbyte=0x00
> sd 0:2:3:0: [sdd] CDB:
> cdb[0]=0x2a: 2a 00 26 89 5b 00 00 00 80 00
> end_request: I/O error, dev sdd, sector 646535936
> btrfs_dev_stat_print_on_error: 7791 callbacks suppressed
> btrfs: bdev /dev/sdd errs: wr 315858, rd 230194, flush 0, corrupt 0, gen 0
> sd 0:2:3:0: [sdd] Unhandled error code
> sd 0:2:3:0: [sdd]
> Result: hostbyte=0x04 driverbyte=0x00
> sd 0:2:3:0: [sdd] CDB:
> cdb[0]=0x2a: 2a 00 26 89 5b 80 00 00 80 00
> end_request: I/O error, dev sdd, sector 646536064
> ...
>
> == dmesg after new boot, mounting attempt ==
>
> btrfs: device label lake devid 11 transid 4893967 /dev/sda
> btrfs: disk space caching is enabled
> btrfs: failed to read the system array on sdc
> btrfs: open_ctree failed
>
> == dmesg after new boot, mounting attempt with -o recovery,ro ==
>
> btrfs: device label lake devid 11 transid 4893967 /dev/sda
> btrfs: enabling auto recovery
> btrfs: disk space caching is enabled
> btrfs: failed to read the system array on sdc
> btrfs: open_ctree failed
>
> == btrfsck ==
>
> deep# btrfsck /dev/sda
> warning, device 14 is missing
> warning devid 14 not found already
> parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
> parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
> parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
> parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
> parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
> parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> Ignoring transid failure
> Checking filesystem on /dev/sda
> UUID: d5e17c49-d980-4bde-bd96-3c8bc95ea077
> checking extents
> parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
> parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
> parent transid verify failed on 876011

Re: Btrfs RAID1 File System Grew Something Extra

2013-12-21 Thread Kai Krakow
Duncan <1i5t5.dun...@cox.net> schrieb:

> But the above documentation should also suggest trying this to see if it
> addresses that remaining single-mode system chunk stub:
> 
> btrfs balance start -fsconvert=raid1 /home

Cool man, that fixed it for me. :-)

Regards,
Kai

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unmountable Array After Drive Failure During Device Deletion

2013-12-21 Thread Chris Murphy

On Dec 21, 2013, at 4:16 PM, Chris Kastorff  wrote:
>> 
>> 1. btrfs-image -c 9 -t <#cores> (see man page)
>> This is optional, but one of the devs might want to see this, because it 
>> should be a rarer case when neither the normal mount fix-ups nor the 
>> additional recovery fix-ups can deal with the problem.
> 
> This fails:
> 
> deep# ./btrfs-image -c 9 -t 4 /dev/sda btrfsimg
> warning, device 14 is missing
> warning devid 14 not found already
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> Ignoring transid failure
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> Ignoring transid failure
> Error going to next leaf -5
> create failed (Bad file descriptor)

Well, that's unfortunate. Someone else is going to have to comment on the 
confusion of the tools trying to fix the file system while a device is missing; 
the device cannot be removed because the file system can't be mounted, and it 
can't be mounted because it needs to be fixed first. A circular problem.

> 
>> 3. Try to mount again with -o degraded,recovery and report back.
> 
> Since btrfs-zero-log (probably) didn't modify anything, the error
> message is the same:
> 
> btrfs: allowing degraded mounts
> btrfs: enabling auto recovery
> btrfs: disk space caching is enabled
> btrfs: bdev (null) errs: wr 344288, rd 230234, flush 0, corrupt 0, gen 0
> btrfs: bdev /dev/sdm1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> btrfs: bdev /dev/sdg errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
> Failed to read block groups: -5
> btrfs: open_ctree failed

How about:

-o skip_balance,degraded,recovery

If that fails, try:


-o skip_balance,degraded,recovery,ro

The ro file system probably doesn't let you delete missing, but it's worth a 
shot, because the missing device seems to be limiting repairs.

If it still fails, it's worth repeating with the absolute latest kernel 
you can find or even build.

After that the options go from really aggressive to dangerous, and I'm not 
sure what to recommend next, except to leave btrfs check --repair for dead 
last. I'd sooner go for the ro mount and use btrfs send/receive to get the 
current data you want off the file system, and create a new one from scratch. 

> 
> btrfs-zero-log's "Unable to find block group for 0" combined with the
> earlier kernel message on mount attempts "btrfs: failed to read the
> system array on sdc" and btrfsck's "Couldn't map the block %ld" tells me
> the (first) underlying problem is that the block group tree(?) in the
> system allocated data is screwed up.
> 
> I have no idea where to go from here, aside from grabbing a compiler and
> having at the disk structures myself.

There are some other options but they get progressively and quickly into 
possibly making things a lot worse. At a certain point it's an extraction 
operation rather than repair and continue.

Chris Murphy



Re: Unmountable Array After Drive Failure During Device Deletion

2013-12-21 Thread Chris Kastorff
>> - Array is good. All drives are accounted for, btrfs scrub runs cleanly.
>> btrfs fi show shows no missing drives and reasonable allocations.
>> - I start btrfs dev del to remove devid 9. It chugs along with no
>> errors, until:
>> - Another drive in the array (NOT THE ONE I RAN DEV DEL ON) fails, and
>> all reads and writes to it fail, causing the SCSI errors above.
>> - I attempt a clean shutdown. It takes too long because my drive
>> controller card is buzzing loudly and the neighbors are sensitive to
>> noise, so:
>> - I power down the machine uncleanly.
>> - I remove the failed drive, NOT the one I ran dev del on.
>> - I reboot, attempt to mount with various options, all of which cause
>> the kernel to yell at me and the mount command returns failure.
> 
> devid 9 is "device delete" in-progress, and while that's occurring devid 15 
> fails completely. Is that correct?

Either devid 14 or devid 10 (from memory) dropped out, devid 15 is still 
working.

> Because previously you reported, in part this:
>devid   15 size 1.82TB used 1.47TB path /dev/sdd
>*** Some devices missing
> 
> And this:
> 
> sd 0:2:3:0: [sdd] Unhandled error code

Yeah, those two are from different boots. sdd is the one that dropped out, and 
after a reboot another (working) drive was renumbered to sdd. Sorry for the 
confusion.

(Also note that if devid 15 was missing, it would not be reported in btrfs fi 
show.)

> That's why I was confused. It looks like the dead/missing device is one devid, 
> and then devid 15 /dev/sdd is also having hardware problems - because all of 
> this was posted at the same time. But I take it they're different boots and the 
> /dev/sdd's are actually two different devids.
> 
> So devid 9 was "deleted" and then devid 14 failed. Right? Lovely when 
> /dev/sdX changes between boots.

It never finished the deletion (was probably about halfway through,
based on previous dev dels), but otherwise yes.

>> From what I understand, at all points there should be at least two
>> copies of every extent during a dev del when all chunks are allocated
>> RAID10 (and they are, according to btrfs fi df ran before on the mounted
>> fs).
>>
>> Because of this, I expect to be able to use the chunks from the (not
>> successfully removed) devid=9, as I have done many many times before due
>> to other btrfs bugs that needed unclean shutdowns during dev del.
> 
> I haven't looked at the code or read anything this specific on the state of 
> the file system during a device delete. But my expectation is that there are 
> 1-2 chunks available for writes. And 2-3 chunks available for reads. Some 
> writes must be only one copy because a chunk hasn't yet been replicated 
> elsewhere, and presumably the device being "deleted" is not subject to writes 
> as the transid also implies. Whereas devid 9 is one set of chunks for 
> reading, those chunks have pre-existing copies elsewhere in the file system 
> so that's two copies. And there's a replication in progress of the soon to be 
> removed chunks. So that's up to three copies.
> 
> Problem is that for sure you've lost some chunks due to the failed/missing 
> device. Normal raid10, it's unambiguous whether we've lost two mirrored sets. 
> With Btrfs that's not clear as chunks are distributed. So it's possible that 
> there are some chunks that don't exist at all for writes, and only 1 for 
> reads. It may be no chunks are in common between devid 9 and the dead one. It 
> may be only a couple of data or metadata chunks are in common.
> 
> 
> 
>>
>> Under the assumption devid=9 is good, if slightly out of date on
>> transid (which ALL the data says is true), I should be able to completely
>> recover all data, because data that was not modified during the deletion
>> resides on devid=9, and data that was modified should be redundantly
>> (RAID10) stored on the remaining drives, and thus should work given this
>> case of a single drive failure.
>>
>> Is this not the case? Does btrfs not maintain redundancy during device
>> removal?
> 
> Good questions. I'm not certain. But the speculation seems reasonable, not 
> accounting for the missing device. That's what makes this different.
> 
> 
> 
 btrfs read error corrected: ino 1 off 87601116364800 (dev /dev/sdf
 sector 62986400)

 btrfs read error corrected: ino 1 off 87601116798976 (dev /dev/sdg
 sector 113318256)
>>>
>>> I'm not sure what constitutes a btrfs read error; maybe the device it
>>> originally requested data from didn't have it where it was expected,
>>> but it was able to find it on these devices. If the drive itself has a
>>> problem reading a sector and ECC can't correct it, it reports the
>>> read error to libata. So kernel messages report this with a line that
>>> starts with the word "exception" and then a line with "cmd" that
>>> shows what command and LBAs were issued to the drive, and then a
>>> "res" line that should contain an error mask with the actual error -
>>> bus error, media error. 

Re: [PATCH 21/21] hfsplus: remove can_set_xattr

2013-12-21 Thread Vyacheslav Dubeyko

On Dec 20, 2013, at 4:16 PM, Christoph Hellwig wrote:

> When using the per-superblock xattr handlers permission checking is
> done by the generic code.  hfsplus just needs to check for the magic
> osx attribute not to leak into protected namespaces.
> 
> Also given that the code was obviously copied from JFS the proper
> attribution was missing.
> 

I don't think this code change is correct. The current modification
breaks the logic. Please see my comments below.

> Signed-off-by: Christoph Hellwig 
> ---
> fs/hfsplus/xattr.c |   87 ++--
> 1 file changed, 3 insertions(+), 84 deletions(-)
> 
> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
> index bf88baa..0b4a5c9 100644
> --- a/fs/hfsplus/xattr.c
> +++ b/fs/hfsplus/xattr.c
> @@ -52,82 +52,6 @@ static inline int is_known_namespace(const char *name)
>   return true;
> }
> 
> -static int can_set_system_xattr(struct inode *inode, const char *name,
> - const void *value, size_t size)

I agree that it makes sense to remove this code if permission checking
is done by generic code.

> -{
> -#ifdef CONFIG_HFSPLUS_FS_POSIX_ACL
> - struct posix_acl *acl;
> - int err;
> -
> - if (!inode_owner_or_capable(inode))
> - return -EPERM;
> -
> - /*
> -  * POSIX_ACL_XATTR_ACCESS is tied to i_mode
> -  */
> - if (strcmp(name, POSIX_ACL_XATTR_ACCESS) == 0) {
> - acl = posix_acl_from_xattr(&init_user_ns, value, size);
> - if (IS_ERR(acl))
> - return PTR_ERR(acl);
> - if (acl) {
> - err = posix_acl_equiv_mode(acl, &inode->i_mode);
> - posix_acl_release(acl);
> - if (err < 0)
> - return err;
> - mark_inode_dirty(inode);
> - }
> - /*
> -  * We're changing the ACL.  Get rid of the cached one
> -  */
> - forget_cached_acl(inode, ACL_TYPE_ACCESS);
> -
> - return 0;
> - } else if (strcmp(name, POSIX_ACL_XATTR_DEFAULT) == 0) {
> - acl = posix_acl_from_xattr(&init_user_ns, value, size);
> - if (IS_ERR(acl))
> - return PTR_ERR(acl);
> - posix_acl_release(acl);
> -
> - /*
> -  * We're changing the default ACL.  Get rid of the cached one
> -  */
> - forget_cached_acl(inode, ACL_TYPE_DEFAULT);
> -
> - return 0;
> - }
> -#endif /* CONFIG_HFSPLUS_FS_POSIX_ACL */
> - return -EOPNOTSUPP;
> -}
> -
> -static int can_set_xattr(struct inode *inode, const char *name,
> - const void *value, size_t value_len)

This function is used by all handlers, so I don't think it makes sense
to delete it.

> -{
> - if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> - return can_set_system_xattr(inode, name, value, value_len);
> -

I agree that this check for the XATTR_SYSTEM_PREFIX case needs to be removed.

> - if (!strncmp(name, XATTR_MAC_OSX_PREFIX, XATTR_MAC_OSX_PREFIX_LEN)) {
> - /*
> -  * This makes sure that we aren't trying to set an
> -  * attribute in a different namespace by prefixing it
> -  * with "osx."
> -  */
> - if (is_known_namespace(name + XATTR_MAC_OSX_PREFIX_LEN))
> - return -EOPNOTSUPP;

I think this check is important. It forbids combinations such as
"osx.system.*" or "osx.trusted.*". The "osx.*" prefix is a virtual
namespace for xattrs as they exist under Mac OS X. If you want to set an
xattr from the "system.*" namespace, for example, then you need to use
another handler, and such a namespace should be used without the "osx."
prefix.

> -
> - return 0;
> - }
> -
> - /*
> -  * Don't allow setting an attribute in an unknown namespace.
> -  */
> - if (strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) &&
> - strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN) &&
> - strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN))
> - return -EOPNOTSUPP;
> -
> - return 0;
> -}
> -
> static void hfsplus_init_header_node(struct inode *attr_file,
>   u32 clump_size,
>   char *buf, u16 node_size)
> @@ -350,10 +274,6 @@ int __hfsplus_setxattr(struct inode *inode, const char 
> *name,
>   HFSPLUS_IS_RSRC(inode))
>   return -EOPNOTSUPP;
> 
> - err = can_set_xattr(inode, name, value, size);

__hfsplus_setxattr() is the common method for all handlers, so removing
this call means that we don't check the validity of the namespace. I don't
think such a modification is the right way.

> - if (err)
> - return err;
> -
>   if (strncmp(name, XATTR_MAC_OSX_PR

Re: [PATCH 2/3] Btrfs: rework qgroup accounting

2013-12-21 Thread Josef Bacik


On 12/21/2013 03:56 AM, Wang Shilong wrote:

Hello Josef,

Though I know there are still problems related to qgroups (for example,
removing a snapshot will break qgroup accounting), I ran a simple test
with your patch:


# btrfs quota enable /mnt
# dd if=/dev/zero of=/mnt/data bs=4k count=102400 oflag=direct
# btrfs sub snapshot /mnt/ /mnt/snap1
# btrfs sub snapshot /mnt /mnt/snap2
# btrfs sub delete /mnt/snap1 /mnt/snap2
# sync
# rm -rf /mnt/data
# sync
# dmesg
# btrfs qgroup show /mnt


Firstly, the qgroup accounting is wrong; this is perhaps expected because
of the efficient fs tree removal.
However, from dmesg I get the WARNING:

WARNING: CPU: 1 PID: 2650 at fs/btrfs/qgroup.c:1486 
btrfs_delayed_qgroup_accounting

I did not take a deep look at the code, but I think you will be interested
in taking a look at this. ^_^


Yup we shouldn't be warning, but I didn't change anything wrt quotas and 
snapshot deletion, that's a whole other issue I don't care about right 
now ;).  Thanks,


Josef


Re: [PATCH 2/3] Btrfs: rework qgroup accounting

2013-12-21 Thread Josef Bacik

On 12/21/2013 03:01 AM, Wang Shilong wrote:
> Hi Josef,
>
> I compiled btrfs-next on my 32-bit machine, and I get the following warnings:
>
> fs/btrfs/qgroup.c: In function ‘qgroup_excl_accounting’:
> fs/btrfs/qgroup.c:1503:12: warning: cast to pointer from integer of different 
> size [-Wint-to-pointer-cast]
>qgroup = (struct btrfs_qgroup *)unode->aux;
> ^
> fs/btrfs/qgroup.c: In function ‘qgroup_calc_old_refcnt’:
> fs/btrfs/qgroup.c:1571:9: warning: cast to pointer from integer of different 
> size [-Wint-to-pointer-cast]
> qg = (struct btrfs_qgroup *)tmp_unode->aux;
>  ^
> fs/btrfs/qgroup.c: In function ‘qgroup_account_deleted_refs’:
> fs/btrfs/qgroup.c:1665:8: warning: cast to pointer from integer of different 
> size [-Wint-to-pointer-cast]
>qg = (struct btrfs_qgroup *)unode->aux;
> ^
> fs/btrfs/qgroup.c: In function ‘qgroup_calc_new_refcnt’:
> fs/btrfs/qgroup.c:1705:8: warning: cast to pointer from integer of different 
> size [-Wint-to-pointer-cast]
>qg = (struct btrfs_qgroup *)unode->aux;
> ^
> fs/btrfs/qgroup.c: In function ‘qgroup_adjust_counters’:
> fs/btrfs/qgroup.c:1767:8: warning: cast to pointer from integer of different 
> size [-Wint-to-pointer-cast]
>qg = (struct btrfs_qgroup *)unode->aux;
>
> this patch was newly added to btrfs-next, so I think it is better that you 
> fix these warnings locally. ^_^

Crap I fixed part of these but not the other part, I'll fix it up and
push it out Monday. Thanks,

Josef


Re: [PATCH 2/3] Btrfs: rework qgroup accounting

2013-12-21 Thread Wang Shilong
Hello Josef,

Though I know there are still problems related to qgroups (for example,
removing a snapshot will break qgroup accounting), I ran a simple test
with your patch:


# btrfs quota enable /mnt
# dd if=/dev/zero of=/mnt/data bs=4k count=102400 oflag=direct
# btrfs sub snapshot /mnt/ /mnt/snap1
# btrfs sub snapshot /mnt /mnt/snap2
# btrfs sub delete /mnt/snap1 /mnt/snap2
# sync
# rm -rf /mnt/data
# sync
# dmesg
# btrfs qgroup show /mnt


Firstly, the qgroup accounting is wrong; this is perhaps expected because
of the efficient fs tree removal.
However, from dmesg I get the WARNING:

WARNING: CPU: 1 PID: 2650 at fs/btrfs/qgroup.c:1486 
btrfs_delayed_qgroup_accounting

I did not take a deep look at the code, but I think you will be interested
in taking a look at this. ^_^


Thanks,
Wang