Re: df missing filesystem when run on subvolume

2016-11-09 Thread Kyle Gates
Resending, hopefully correct formatting.

As the title suggests, running the df command on a btrfs subvolume doesn't
report a filesystem (the Filesystem column shows '-'). I'm not sure where the
problem lies or if anyone else has noticed this. Some programs fail to detect
free space as a result.

Example for clarification:
kyle@home:~$ sudo mount -o subvol=@data /mnt/btrfs/
kyle@home:~$ mkdir /mnt/btrfs/directory
kyle@home:~$ btrfs subvolume create /mnt/btrfs/subvolume
Create subvolume '/mnt/btrfs/subvolume'
kyle@home:~$ sudo btrfs subvolume list /mnt/btrfs/
ID 258 gen 2757271 top level 5 path @data
ID 5684 gen 2718215 top level 258 path subvolume
kyle@home:~$ df /mnt/btrfs/
Filesystem  1K-blocks   Used Available Use% Mounted on
/dev/sdc2  1412456448 1170400072 240688008  83% /mnt/btrfs
kyle@home:~$ df /mnt/btrfs/directory
Filesystem  1K-blocks   Used Available Use% Mounted on
/dev/sdc2  1412456448 1170400072 240688008  83% /mnt/btrfs
kyle@home:~$ df /mnt/btrfs/subvolume
Filesystem  1K-blocks   Used Available Use% Mounted on
-  1412456448 1170400072 240688008  83% /mnt/btrfs/subvolume
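For anyone hitting the same thing, the '-' appears to come from how df resolves paths: it matches a path's device ID (st_dev) against the entries in /proc/mounts, and each btrfs subvolume reports its own anonymous st_dev that matches no mount table entry. A minimal sketch of the st_dev effect, using /proc as a stand-in for "a path on a different device ID" (no btrfs required):

```shell
# Sketch: each btrfs subvolume exposes its own anonymous device ID, which
# matches no /proc/mounts entry, so df prints "-" in the Filesystem column.
# Any filesystem boundary demonstrates the st_dev change (here / vs /proc):
root_dev=$(stat -c '%d' /)      # device ID of the root filesystem
proc_dev=$(stat -c '%d' /proc)  # procfs has a different anonymous device ID
echo "root st_dev=$root_dev, proc st_dev=$proc_dev"
[ "$root_dev" != "$proc_dev" ] && echo "device IDs differ"
```

On a subvolume, `stat -c '%d'` similarly reports a different value than the mount point above it, which is what trips up df.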

Thanks,
Kyle
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: your mail

2016-09-01 Thread Kyle Gates
> -Original Message-
> From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-
> ow...@vger.kernel.org] On Behalf Of Austin S. Hemmelgarn
> Sent: Thursday, September 01, 2016 6:18 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Re: your mail
> 
> On 2016-09-01 03:44, M G Berberich wrote:
> > Am Mittwoch, den 31. August schrieb Fennec Fox:
> >> Linux Titanium 4.7.2-1-MANJARO #1 SMP PREEMPT Sun Aug 21 15:04:37
> UTC
> >> 2016 x86_64 GNU/Linux
> >> btrfs-progs v4.7
> >>
> >> Data, single: total=30.01GiB, used=18.95GiB System, single:
> >> total=4.00MiB, used=16.00KiB Metadata, single: total=1.01GiB,
> >> used=422.17MiB GlobalReserve, single: total=144.00MiB, used=0.00B
> >>
> >> {02:50} Wed Aug 31
> >> [fennectech@Titanium ~]$  sudo fstrim -v / [sudo] password for
> >> fennectech:
> >> Sorry, try again.
> >> [sudo] password for fennectech:
> >> /: 99.8 GiB (107167244288 bytes) trimmed
> >>
> >> {03:08} Wed Aug 31
> >> [fennectech@Titanium ~]$  sudo fstrim -v / [sudo] password for
> >> fennectech:
> >> /: 99.9 GiB (107262181376 bytes) trimmed
> >>
> >>   I ran these commands minutes after each other and each time it is
> >> trimming the entire free space
> >>
> >> Anyone else seen this? The filesystem is the root FS and is compressed.
> >
> > You should be very happy that it is trimming at all. Typical situation
> > on a used btrfs is
> >
> >   # fstrim -v /
> >   /: 0 B (0 bytes) trimmed
> >
> > even if there is 33G unused space on the fs:
> >
> >   # df -h /
> >   Filesystem  Size  Used Avail Use% Mounted on
> >   /dev/sda2    96G   61G   33G  66% /
> >
> I think you're using an old kernel; this has been working since at least
> 4.5, but was broken in some older releases.

M G is running 4.7.2.
The problem is that all space has been allocated to block groups, and fstrim
will only trim unallocated space.

On my system all space has been allocated on my root filesystem so 0 B are 
trimmed:
kyle@home:~$  uname -a
Linux home 4.7.2-040702-generic #201608201334 SMP Sat Aug 20 17:37:03 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux
kyle@home:~$  sudo btrfs fi show /
Label: 'root'  uuid: 6af4ebde-81ef-428a-a45f-0e8480ad969a
Total devices 2 FS bytes used 13.44GiB
devid   14 size 20.00GiB used 20.00GiB path /dev/sde2
devid   15 size 20.00GiB used 20.00GiB path /dev/sdb2
kyle@home:~$  btrfs fi df /
Data, RAID1: total=18.97GiB, used=12.98GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=473.83MiB
GlobalReserve, single: total=160.00MiB, used=0.00B
kyle@home:~$  sudo fstrim -v /
[sudo] password for kyle:
/: 0 B (0 bytes) trimmed
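Not from the thread, but a sketch of the usual workaround: a filtered balance repacks mostly-empty block groups and returns their chunks to the unallocated pool, after which fstrim has something to discard. The usage thresholds and mount point here are illustrative:

```shell
# Sketch (illustrative thresholds): repack block groups that are <=10% full
# so their chunks return to unallocated space, then trim again.
sudo btrfs balance start -dusage=10 /   # data block groups
sudo btrfs balance start -musage=10 /   # metadata block groups
sudo btrfs fi show /                    # per-device "used" should shrink
sudo fstrim -v /                        # now has unallocated space to trim
```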

But I do have space trimmed on my home filesystem:
kyle@home:~$  sudo btrfs fi show /home/
Label: 'home'  uuid: b75fb450-4a28-434a-a483-e784940d463a
Total devices 2 FS bytes used 18.63GiB
devid   11 size 64.00GiB used 29.03GiB path /dev/sde3
devid   12 size 64.00GiB used 29.03GiB path /dev/sdb3
kyle@home:~$  btrfs fi df /home/
Data, RAID1: total=27.00GiB, used=18.46GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=168.62MiB
GlobalReserve, single: total=64.00MiB, used=0.00B
kyle@home:~$  sudo fstrim -v /home
/home: 70 GiB (75092721664 bytes) trimmed


possible enhancement: failing device converted to a seed device

2015-07-02 Thread Kyle Gates
I'll preface this with the fact that I'm just a user and am only posing a 
question for a possible enhancement to btrfs.
 
I'm quite sure it isn't currently allowed, but would it be possible to set a 
failing device as a seed instead of kicking it out of a multi-device 
filesystem? This would make the failing device RO while keeping the filesystem 
as a whole RW, thereby giving the user additional protection when 
recovering/balancing. Is this a feasible/realistic request?
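For comparison, a sketch of the seed-device workflow that exists today, which only works on a healthy, unmounted filesystem (device names illustrative); the request above would apply the same read-only seed state to a failing member in place:

```shell
# Sketch of the current seed workflow (device names illustrative):
sudo btrfstune -S 1 /dev/sdx             # flag the unmounted fs as a seed (RO)
sudo mount /dev/sdx /mnt                 # seed filesystems mount read-only
sudo btrfs device add /dev/sdy /mnt      # sprout a writable fs on a new device
sudo mount -o remount,rw /mnt
sudo btrfs device delete /dev/sdx /mnt   # detach the seed once data migrates
```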
 
Thanks,
Kyle


ssd mode on rotational media

2015-01-07 Thread Kyle Gates
What issues would arise if ssd mode is activated because a block layer below 
btrfs sets the rotational flag to zero? This happens for me running btrfs on 
bcache. Would it be beneficial to pass the nossd mount option?
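A sketch of how to check and override the detection (the bcache device path is illustrative; the nossd option is the one documented in btrfs(5)):

```shell
# Sketch: bcache advertises itself as non-rotational, so btrfs enables ssd
# mode automatically; nossd overrides the autodetection.
cat /sys/block/bcache0/queue/rotational   # 0 => treated as an SSD
sudo mount -o nossd /dev/bcache0 /mnt     # force rotational-style allocation
```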
Thanks,
Kyle


RE: [PATCH] Btrfs-progs: rebuild the crc tree with --init-csum-tree

2014-10-01 Thread Kyle Gates
Others might be thinking this too, so I'd better ask:
Does this just read the first copy in the case of dup, raid1, etc. and plow on?
I'm not sure how you would handle a mismatch due to a hardware error.
Perhaps read all the copies and create another subvolume containing the
mismatched copies?
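For reference, a sketch of how the rebuild would be invoked once the patch below is applied (run against an unmounted filesystem; the device name is illustrative):

```shell
# Sketch: rebuild the checksum tree from the data (unmounted fs only).
sudo btrfs check --init-csum-tree /dev/sdx
```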

Thanks,
Kyle

 From: jba...@fb.com
 To: linux-btrfs@vger.kernel.org
 Subject: [PATCH] Btrfs-progs: rebuild the crc tree with --init-csum-tree
 Date: Wed, 1 Oct 2014 10:34:51 -0400
 
 We have --init-csum-tree, which just empties the csum tree.  I'm not sure why
 we would ever need this, but we definitely need to be able to rebuild the
 csum tree in some cases.  This patch adds the ability to completely rebuild
 the crc tree by reading all of the data and adding csum entries for them.
 This patch doesn't pay attention to NODATASUM inodes, it'll happily add
 csums for everything.
 Thanks,
 
 Signed-off-by: Josef Bacik jba...@fb.com
 ---
  cmds-check.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 98 insertions(+)
 
 diff --git a/cmds-check.c b/cmds-check.c
 index 03b0fbd..3141aa4 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -6625,6 +6625,98 @@ out:
   return ret;
  }
  
 +static int populate_csum(struct btrfs_trans_handle *trans,
 +			 struct btrfs_root *csum_root, char *buf, u64 start,
 +			 u64 len)
 +{
 +	u64 offset = 0;
 +	u64 sectorsize;
 +	int ret = 0;
 +
 +	while (offset < len) {
 +		sectorsize = csum_root->sectorsize;
 +		ret = read_extent_data(csum_root, buf, start + offset,
 +				       &sectorsize, 0);
 +		if (ret)
 +			break;
 +		ret = btrfs_csum_file_block(trans, csum_root, start + len,
 +					    start + offset, buf, sectorsize);
 +		if (ret)
 +			break;
 +		offset += sectorsize;
 +	}
 +	return ret;
 +}
 +
 +static int fill_csum_tree(struct btrfs_trans_handle *trans,
 +			  struct btrfs_root *csum_root)
 +{
 +	struct btrfs_root *extent_root = csum_root->fs_info->extent_root;
 +	struct btrfs_path *path;
 +	struct btrfs_extent_item *ei;
 +	struct extent_buffer *leaf;
 +	char *buf;
 +	struct btrfs_key key;
 +	int ret;
 +
 +	path = btrfs_alloc_path();
 +	if (!path)
 +		return -ENOMEM;
 +
 +	key.objectid = 0;
 +	key.type = BTRFS_EXTENT_ITEM_KEY;
 +	key.offset = 0;
 +
 +	ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
 +	if (ret < 0) {
 +		btrfs_free_path(path);
 +		return ret;
 +	}
 +
 +	buf = malloc(csum_root->sectorsize);
 +	if (!buf) {
 +		btrfs_free_path(path);
 +		return -ENOMEM;
 +	}
 +
 +	while (1) {
 +		if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
 +			ret = btrfs_next_leaf(extent_root, path);
 +			if (ret < 0)
 +				break;
 +			if (ret) {
 +				ret = 0;
 +				break;
 +			}
 +		}
 +		leaf = path->nodes[0];
 +
 +		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
 +		if (key.type != BTRFS_EXTENT_ITEM_KEY) {
 +			path->slots[0]++;
 +			continue;
 +		}
 +
 +		ei = btrfs_item_ptr(leaf, path->slots[0],
 +				    struct btrfs_extent_item);
 +		if (!(btrfs_extent_flags(leaf, ei) &
 +		      BTRFS_EXTENT_FLAG_DATA)) {
 +			path->slots[0]++;
 +			continue;
 +		}
 +
 +		ret = populate_csum(trans, csum_root, buf, key.objectid,
 +				    key.offset);
 +		if (ret)
 +			break;
 +		path->slots[0]++;
 +	}
 +
 +	btrfs_free_path(path);
 +	free(buf);
 +	return ret;
 +}
 +
  static struct option long_options[] = {
  	{ "super", 1, NULL, 's' },
  	{ "repair", 0, NULL, 0 },
 @@ -6794,6 +6886,12 @@ int cmd_check(int argc, char **argv)
   ret = -EIO;
   goto close_out;
   }
 +
 +	ret = fill_csum_tree(trans, info->csum_root);
 +	if (ret) {
 +		fprintf(stderr, "crc refilling failed\n");
 +		return -EIO;
 +	}
   }
   /*
* Ok now we commit and run the normal fsck, which will add
 -- 
 1.8.3.1
 

RE: btrfs balance enospc

2014-09-16 Thread Kyle Gates
 From: li...@colorremedies.com
 Date: Tue, 16 Sep 2014 11:26:16 -0600


 On Sep 16, 2014, at 10:51 AM, Mark Murawski markm-li...@intellasoft.net 
 wrote:


 Playing around with this filesystem I hot-removed a device from the
 array and put in a replacement.

 Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
 Total devices 2 FS bytes used 7.43GiB
 devid 1 size 9.31GiB used 8.90GiB path /dev/sdc6
 devid 3 size 9.31GiB used 8.90GiB path
 /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa

 removed /dev/sdc

 Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
 Total devices 2 FS bytes used 7.43GiB
 devid 3 size 9.31GiB used 8.90GiB path
 /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
 *** Some devices missing

 cartman {~} root# btrfs device add /dev/sdi6 /
 cartman {~} root# btrfs fi show
 Label: 'Root' uuid: d71404d4-468e-47d5-8f06-3b65fa7776aa
 Total devices 3 FS bytes used 7.43GiB
 devid 3 size 9.31GiB used 8.90GiB path
 /dev/disk/by-uuid/d71404d4-468e-47d5-8f06-3b65fa7776aa
 devid 4 size 10.00GiB used 0.00 path /dev/sdi6
 *** Some devices missing

 cartman {~} root# btrfs filesystem balance start /

 Better to use btrfs replace. But sequence wise you should do btrfs device 
 delete missing, which should then effectively do a balance to the newly added 
 device. So while the sequence isn't really correct, that's probably not why 
 you're getting this failure.

Does/should a balance imply removal of missing devices (as long as the minimum 
number of devices are still available)?
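Condensing the advice above into a sketch (device names and devid illustrative): either replace the failed member in one step, or, after adding a new device, explicitly delete the missing one, which migrates its chunks onto the remaining devices:

```shell
# Sketch of the two recommended recovery paths (devid/names illustrative).
# One-step replacement of the failed member (devid 1 here):
sudo btrfs replace start 1 /dev/sdi6 /
# Or, after "btrfs device add", drop the missing member explicitly:
sudo btrfs device delete missing /
```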




 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2411,
 rd 0, flush 38, corrupt 137167, gen 25

 Please post results of
 smartctl -x /dev/sdc



 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2412,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2413,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2414,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2415,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2416,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2417,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2418,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2419,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:12 localhost kernel: BTRFS: bdev /dev/sdc6 errs: wr 2420,
 rd 0, flush 38, corrupt 137167, gen 25
 Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O
 error on /dev/sdc6
 Sep 16 12:47:14 localhost kernel: BTRFS: lost page write due to I/O
 error on /dev/sdc6

 I'd expect with Btrfs having problems writing to a device, that there'd be 
 libata messages related to this also. Do you have earlier kernel messages 
 indicating the drive or controller are reporting errors?




RE: [PATCH] btrfs-progs: mkfs: remove experimental tag

2014-08-01 Thread Kyle Gates



 From: dste...@suse.cz
 To: linux-btrfs@vger.kernel.org
 CC: dste...@suse.cz
 Subject: [PATCH] btrfs-progs: mkfs: remove experimental tag
 Date: Thu, 31 Jul 2014 14:21:34 +0200

 Make it consistent with kernel status and documentation.

 Signed-off-by: David Sterba dste...@suse.cz
 ---
 mkfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

 diff --git a/mkfs.c b/mkfs.c
 index 16e92221a547..538b6e6837b2 100644
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -1439,8 +1439,8 @@ int main(int ac, char **av)
 }

 /* if we are here that means all devs are good to btrfsify */
 - printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
 - printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
 + printf("%s\n", BTRFS_BUILD_VERSION);
 + printf("See http://btrfs.wiki.kernel.org for more\n\n");

The sentence/thought isn't complete. I was left thinking "more what?"
Perhaps add "information" or "documentation".

Thanks.

 dev_cnt--;

 @@ -1597,7 +1597,6 @@ raid_groups:
 label, first_file, nodesize, leafsize, sectorsize,
 pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy)));

 - printf("%s\n", BTRFS_BUILD_VERSION);
 btrfs_commit_transaction(trans, root);

 if (source_dir_set) {
 --
 1.9.0



RE: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

2014-07-29 Thread Kyle Gates

 Date: Tue, 29 Jul 2014 11:18:17 +0900
 From: takeuchi_sat...@jp.fujitsu.com
 To: kylega...@hotmail.com; linux-btrfs@vger.kernel.org
 Subject: Re: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

 Hi Kyle,

 (2014/07/28 22:24), Kyle Gates wrote:

 small wording error inline below

 
 Date: Fri, 25 Jul 2014 15:17:05 +0900
 From: takeuchi_sat...@jp.fujitsu.com
 To: linux-btrfs@vger.kernel.org
 Subject: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 - There are many formats to show a snapshot name in error messages:
 "'%s'", "'%s", "%s", "('%s')", and "('%s)". Since it's messy,
 unify these to the "'%s'" format.
 - Fix a typo: s/uncorrect/incorrect/

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
 cmds-subvolume.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index b7bfb3e..ce38503 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 dstdir = dirname(dupdir);

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
 + fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
 newname);
 goto out;
 }
 @@ -314,7 +314,7 @@ again:
 free(cpath);

 if (!test_issubvolname(vname)) {
 - fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 vname);
 ret = 1;
 goto out;
 @@ -322,7 +322,7 @@ again:

 len = strlen(vname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
 + fprintf(stderr, "ERROR: too long snapshot name '%s'\n",

 + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",

 Thank you for your comment. Fixed. How about this?

Yes, that looks good. Thanks.

 ===
 From 73f9847c603fbe863f072d029b1a4948a1032d6e Mon Sep 17 00:00:00 2001
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Date: Fri, 25 Jul 2014 12:46:27 +0900
 Subject: [PATCH] btrfs-progs: unify the format of error messages.

 ---
 cmds-subvolume.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index b7bfb3e..5a99c94 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 dstdir = dirname(dupdir);

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
 + fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
 newname);
 goto out;
 }
 @@ -314,7 +314,7 @@ again:
 free(cpath);

 if (!test_issubvolname(vname)) {
 - fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 vname);
 ret = 1;
 goto out;
 @@ -322,7 +322,7 @@ again:

 len = strlen(vname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
 + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
 vname);
 ret = 1;
 goto out;
 @@ -722,14 +722,14 @@ static int cmd_snapshot(int argc, char **argv)
 }

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, "ERROR: incorrect snapshot name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect snapshot name '%s'\n",
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
 + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
 newname);
 goto out;
 }
 @@ -778,7 +778,7 @@ static int cmd_snapshot(int argc, char **argv)
 res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);

 if (res < 0) {
 - fprintf( stderr, "ERROR: cannot snapshot %s - %s\n",
 + fprintf( stderr, "ERROR: cannot snapshot '%s' - %s\n",
 subvol_descr, strerror(errno));
 goto out;
 }
 @@ -991,7 +991,7 @@ static int cmd_subvol_show(int argc, char **argv)

 ret = find_mount_root(fullpath, &mnt);
 if (ret < 0) {
 - fprintf(stderr, "ERROR: find_mount_root failed on %s: "
 + fprintf(stderr, "ERROR: find_mount_root failed on '%s': "
 "%s\n", fullpath, strerror(-ret));
 goto out;
 }
 --
 1.9.3



RE: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

2014-07-28 Thread Kyle Gates

small wording error inline below


 Date: Fri, 25 Jul 2014 15:17:05 +0900
 From: takeuchi_sat...@jp.fujitsu.com
 To: linux-btrfs@vger.kernel.org
 Subject: [PATCH 2/2] btrfs-progs: Unify the messy error message formats

 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 - There are many formats to show a snapshot name in error messages:
 "'%s'", "'%s", "%s", "('%s')", and "('%s)". Since it's messy,
 unify these to the "'%s'" format.
 - Fix a typo: s/uncorrect/incorrect/

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
 cmds-subvolume.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/cmds-subvolume.c b/cmds-subvolume.c
 index b7bfb3e..ce38503 100644
 --- a/cmds-subvolume.c
 +++ b/cmds-subvolume.c
 @@ -140,14 +140,14 @@ static int cmd_subvol_create(int argc, char **argv)
 dstdir = dirname(dupdir);

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, "ERROR: uncorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: subvolume name too long ('%s)\n",
 + fprintf(stderr, "ERROR: subvolume name too long '%s'\n",
 newname);
 goto out;
 }
 @@ -314,7 +314,7 @@ again:
 free(cpath);

 if (!test_issubvolname(vname)) {
 - fprintf(stderr, "ERROR: incorrect subvolume name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect subvolume name '%s'\n",
 vname);
 ret = 1;
 goto out;
 @@ -322,7 +322,7 @@ again:

 len = strlen(vname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
 + fprintf(stderr, "ERROR: too long snapshot name '%s'\n",

+ fprintf(stderr, "ERROR: snapshot name too long '%s'\n",

 vname);
 ret = 1;
 goto out;
 @@ -722,14 +722,14 @@ static int cmd_snapshot(int argc, char **argv)
 }

 if (!test_issubvolname(newname)) {
 - fprintf(stderr, "ERROR: incorrect snapshot name ('%s')\n",
 + fprintf(stderr, "ERROR: incorrect snapshot name '%s'\n",
 newname);
 goto out;
 }

 len = strlen(newname);
 if (len == 0 || len >= BTRFS_VOL_NAME_MAX) {
 - fprintf(stderr, "ERROR: snapshot name too long ('%s)\n",
 + fprintf(stderr, "ERROR: snapshot name too long '%s'\n",
 newname);
 goto out;
 }
 @@ -778,7 +778,7 @@ static int cmd_snapshot(int argc, char **argv)
 res = ioctl(fddst, BTRFS_IOC_SNAP_CREATE_V2, &args);

 if (res < 0) {
 - fprintf( stderr, "ERROR: cannot snapshot %s - %s\n",
 + fprintf( stderr, "ERROR: cannot snapshot '%s' - %s\n",
 subvol_descr, strerror(errno));
 goto out;
 }
 @@ -991,7 +991,7 @@ static int cmd_subvol_show(int argc, char **argv)

 ret = find_mount_root(fullpath, &mnt);
 if (ret < 0) {
 - fprintf(stderr, "ERROR: find_mount_root failed on %s: "
 + fprintf(stderr, "ERROR: find_mount_root failed on '%s': "
 "%s\n", fullpath, strerror(-ret));
 goto out;
 }
 --
 1.9.3



RE: File server structure suggestion

2014-07-16 Thread Kyle Gates


 Then there's raid10, which takes more drives and is faster, but is still
 limited to two mirrors. But while I haven't actually used raid10 myself,
 I do /not/ believe it's limited to pair-at-a-time additions. I believe
 it'll take, for instance five devices, just fine, staggering chunk
 allocation as necessary to fill all at about the same rate.

I am running just that: 3 separate raid10 btrfs filesystems (root, home, 
media/backups) on 5 drives, and they are unequal sizes too!
My newer drives are bigger and have higher transfer rates, which means they 
get more chunks, but overall performance doesn't suffer.


RE: How does btrfs handle bad blocks in raid1?

2014-01-09 Thread Kyle Gates
On Thu, 9 Jan 2014 11:40:20 -0700 Chris Murphy wrote:

 On Jan 9, 2014, at 3:42 AM, Hugo Mills wrote:

 On Thu, Jan 09, 2014 at 11:26:26AM +0100, Clemens Eisserer wrote:
 Hi,

 I am running write-intensive (well sort of, one write every 10s)
 workloads on cheap flash media which proved to be horribly unreliable.
 A 32GB microSDHC card reported bad blocks after 4 days, while a usb
 pen drive returns bogus data without any warning at all.

 So I wonder, how would btrfs behave in raid1 on two such devices?
 Would it simply mark bad blocks as bad and continue to be
 operational, or will it bail out when some block can not be
 read/written anymore on one of the two devices?

 If a block is read and fails its checksum, then the other copy (in
 RAID-1) is checked and used if it's good. The bad copy is rewritten to
 use the good data.

 If the block is bad such that writing to it won't fix it, then
 there's probably two cases: the device returns an IO error, in which
 case I suspect (but can't be sure) that the FS will go read-only. Or
 the device silently fails the write and claims success, in which case
 you're back to the situation above of the block failing its checksum.

 In a normally operating drive, when the drive firmware locates a physical 
 sector with persistent write failures, it's dereferenced. So the LBA points 
 to a reserve physical sector, the originally can't be accessed by LBA. If all 
 of the reserve sectors get used up, the next persistent write failure will 
 result in a write error reported to libata and this will appear in dmesg, and 
 should be treated as the drive being no longer in normal operation. It's a 
 drive useful for storage developers, but not for production usage.

 There's no marking of bad blocks right now, and I don't know of
 anyone working on the feature, so the FS will probably keep going back
 to the bad blocks as it makes CoW copies for modification.

 This is maybe relevant:
 https://www.kernel.org/doc/htmldocs/libata/ataExceptions.html

 READ and WRITE commands report CHS or LBA of the first failed sector but 
 ATA/ATAPI standard specifies that the amount of transferred data on error 
 completion is indeterminate, so we cannot assume that sectors preceding the 
 failed sector have been transferred and thus cannot complete those sectors 
 successfully as SCSI does.

 If I understand that correctly, Btrfs really ought to either punt the device, 
 or make the whole volume read-only. For production use, going read-only very 
 well could mean data loss, even while preserving the state of the file 
 system. Eventually I'd rather see the offending device ejected from the 
 volume, and for the volume to remain rw,degraded.

I would like to see btrfs hold onto the device in a read-only state like is 
done during a device replace operation. New writes would maintain the raid 
level but go out to the remaining devices and only go full filesystem read-only 
if the minimum number of writable devices is not met. Once a new device is 
added in, the replace operation could commence and drop the bad device when 
complete.


RE: Does btrfs raid1 actually provide any resilience?

2013-11-14 Thread Kyle Gates
On 11/14/2013 11:35 AM, Lutz Vieweg wrote:
 
 On 11/14/2013 06:18 PM, George Mitchell wrote:
 The read only mount issue is by design.  It is intended to make sure you 
 know exactly what is going
 on before you proceed.
 
 Hmmm... but will a server be able to continue its operation (including 
 writes) on
 an already mounted btrfs when a storage device in a btrfs-raid1 fails?
 (If not, that would contradict the idea of achieving a higher reliability.)
 
 The read only function is designed to make certain you know that you are
 simplex before you proceed further.
 
 Ok, but once I know - e.g. by verifying that indeed, one storage device is 
 broken -
 is there any option to proceed (without redundancy) until I can replace the 
 broken
 device?

Bonus points if the raid mode is maintained during degraded operation via 
either dup (2 disk array) or allocating additional chunks (3+ disk array).
 
 I certainly wouldn't trust it just yet as it is not fully production ready.
 
 Sure, the server we intend to try btrfs on is one that we can restore when 
 required,
 and there is a redundant server (without btrfs) that can stand in. I was just
 hoping for some good experiences to justify a larger field-trial.
 
 That said, I have been using it for over six
 months now, coming off of 3ware RAID, and I have no regrets.
 
 I guess every Linux software RAID option is an improvement when
 you come from those awful hardware RAID controllers, which caused
 us additional downtime more often than they prevented downtime.
 
 Regards,
 
 Lutz Vieweg
 
 
 On 11/14/2013 03:02 AM, Lutz Vieweg wrote:
 Hi,

 on a server that so far uses an MD RAID1 with XFS on it we wanted
 to try btrfs, instead.

 But even the most basic check for btrfs actually providing
 resilience against one of the physical storage devices failing
 yields a "does not work" result - so I wonder whether I misunderstood
 that btrfs is meant to not require block-device level RAID
 functionality underneath.

 Here is the test procedure:

 Testing was done using vanilla linux-3.12 (x86_64) plus btrfs-progs at
 commit c652e4efb8e2dd76ef1627d8cd649c6af5905902.

 Preparing two 100 MB image files:
 # dd if=/dev/zero of=/tmp/img1 bs=1024k count=100
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 0.201003 s, 522 MB/s

 # dd if=/dev/zero of=/tmp/img2 bs=1024k count=100
 100+0 records in
 100+0 records out
 104857600 bytes (105 MB) copied, 0.185486 s, 565 MB/s

 Preparing two loop devices on those images to act as the underlying
 block devices for btrfs:
 # losetup /dev/loop1 /tmp/img1
 # losetup /dev/loop2 /tmp/img2

 Preparing the btrfs filesystem on the loop devices:
 # mkfs.btrfs --data raid1 --metadata raid1 --label test /dev/loop1 
 /dev/loop2
 SMALL VOLUME: forcing mixed metadata/data groups

 WARNING! - Btrfs v0.20-rc1-591-gc652e4e IS EXPERIMENTAL
 WARNING! - see http://btrfs.wiki.kernel.org before using

 Performing full device TRIM (100.00MiB) ...
 Turning ON incompat feature 'mixed-bg': mixed data and metadata block 
 groups
 Created a data/metadata chunk of size 8388608
 Performing full device TRIM (100.00MiB) ...
 adding device /dev/loop2 id 2
 fs created label test on /dev/loop1
 nodesize 4096 leafsize 4096 sectorsize 4096 size 200.00MiB
 Btrfs v0.20-rc1-591-gc652e4e

 Mounting the btfs filesystem:
 # mount -t btrfs /dev/loop1 /mnt/tmp

 Copying just 70MB of zeroes into a test file:
 # dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=70
 70+0 records in
 70+0 records out
 73400320 bytes (73 MB) copied, 0.0657669 s, 1.1 GB/s

 Checking that the testfile can be read:
 # md5sum /mnt/tmp/testfile
 b89fdccdd61d57b371f9611eec7d3cef  /mnt/tmp/testfile

 Unmounting before further testing:
 # umount /mnt/tmp


 Now we assume that one of the two storage devices is broken,
 so we remove one of the two loop devices:
 # losetup -d /dev/loop1

 Trying to mount the btrfs filesystem from the one storage device that is 
 left:
 # mount -t btrfs -o device=/dev/loop2,degraded /dev/loop2 /mnt/tmp
 mount: wrong fs type, bad option, bad superblock on /dev/loop2,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 ... does not work.

 In /var/log/messages we find:
 kernel: btrfs: failed to read chunk root on loop2
 kernel: btrfs: open_ctree failed

 (The same happens when adding ",ro" to the mount options.)

 Ok, so if the first of two disks was broken, so is our filesystem.
 Isn't that what RAID1 should prevent?

 We tried a different scenario, now the first disk remains
 but the second is broken:

 # losetup -d /dev/loop2
 # losetup /dev/loop1 /tmp/img1

 # mount -t btrfs -o degraded /dev/loop1 /mnt/tmp
 mount: wrong fs type, bad option, bad superblock on /dev/loop1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 In /var/log/messages:
 kernel: Btrfs: too 

Re: [PATCH] Btrfs: fix broken nocow after balance

2013-06-12 Thread Kyle Gates

On Wednesday, June 05, 2013 Miao Xie wrote:

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

Since the extents are not shared by the relocation tree after the balance,
we can recover the old last_snapshot safely if no one snapshotted the
source tree. We fix the above problem this way.
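As a reading aid only (plain shell with made-up transids, not kernel code), the recovery rule described in the patch text can be sketched as:

```shell
# recover_last_snapshot ROOT_LAST_SNAPSHOT OTRANSID SAVED_LAST_SNAP
# The pre-balance last_snapshot (stashed in rtransid) is restored only if
# the fs root was not snapshotted while the relocation tree existed,
# i.e. its last_snapshot still equals otransid - 1.
recover_last_snapshot() {
    if [ "$1" -eq "$(($2 - 1))" ]; then
        echo "$3"    # safe: restore the pre-balance value, nocow keeps working
    else
        echo "$1"    # a snapshot intervened: keep the current value
    fi
}
recover_last_snapshot 99 100 42    # prints 42 (restored)
recover_last_snapshot 150 100 42   # prints 150 (a snapshot was taken meanwhile)
```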


This patch also fixed my problem. I tend to like this patch better, as the 
fix lands on disk, allowing nocow to function with an older kernel after a 
balance.

Thanks,
Kyle

Tested-by: Kyle Gates kylega...@hotmail.com


Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
fs/btrfs/relocation.c | 44 
1 file changed, 44 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 395b820..934ffe6 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1305,6 +1305,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,

 struct extent_buffer *eb;
 struct btrfs_root_item *root_item;
 struct btrfs_key root_key;
+ u64 last_snap = 0;
 int ret;

 root_item = kmalloc(sizeof(*root_item), GFP_NOFS);
@@ -1320,6 +1321,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,

   BTRFS_TREE_RELOC_OBJECTID);
 BUG_ON(ret);

+ last_snap = btrfs_root_last_snapshot(&root->root_item);
  btrfs_set_root_last_snapshot(&root->root_item,
   trans->transid - 1);
 } else {
@@ -1345,6 +1347,12 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,

  memset(&root_item->drop_progress, 0,
 sizeof(struct btrfs_disk_key));
  root_item->drop_level = 0;
+ /*
+ * abuse rtransid, it is safe because it is impossible to
+ * receive data into a relocation tree.
+ */
+ btrfs_set_root_rtransid(root_item, last_snap);
+ btrfs_set_root_otransid(root_item, trans->transid);
 }

 btrfs_tree_unlock(eb);
@@ -2273,8 +2281,12 @@ void free_reloc_roots(struct list_head *list)
static noinline_for_stack
int merge_reloc_roots(struct reloc_control *rc)
{
+ struct btrfs_trans_handle *trans;
 struct btrfs_root *root;
 struct btrfs_root *reloc_root;
+ u64 last_snap;
+ u64 otransid;
+ u64 objectid;
 LIST_HEAD(reloc_roots);
 int found = 0;
 int ret = 0;
@@ -2308,12 +2320,44 @@ again:
 } else {
 list_del_init(&reloc_root->root_list);
 }
+
+ /*
+ * we kept the old last snapshot transid in rtransid when we
+ * created the relocation tree.
+ */
+ last_snap = btrfs_root_rtransid(&reloc_root->root_item);
+ otransid = btrfs_root_otransid(&reloc_root->root_item);
+ objectid = reloc_root->root_key.offset;
+
 ret = btrfs_drop_snapshot(reloc_root, rc->block_rsv, 0, 1);
 if (ret < 0) {
 if (list_empty(&reloc_root->root_list))
 list_add_tail(&reloc_root->root_list,
   &reloc_roots);
 goto out;
+ } else if (!ret) {
+ /*
+ * recover the last snapshot transid to avoid
+ * the space balance breaking NOCOW.
+ */
+ root = read_fs_root(rc->extent_root->fs_info,
+ objectid);
+ if (IS_ERR(root))
+ continue;
+
+ if (btrfs_root_refs(&root->root_item) == 0)
+ continue;
+
+ trans = btrfs_join_transaction(root);
+ BUG_ON(IS_ERR(trans));
+
+ /* Check if the fs/file tree was snapshotted or not. */
+ if (btrfs_root_last_snapshot(&root->root_item) ==
+ otransid - 1)
+ btrfs_set_root_last_snapshot(&root->root_item,
+  last_snap);
+
+ btrfs_end_transaction(trans, root);
 }
 }

--
1.8.1.4





Re: [PATCH] Btrfs: fix broken nocow after balance

2013-06-06 Thread Kyle Gates

On Monday, June 03, 2013, Liu Bo wrote:

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So it turns out that checking last_snapshot does not always ensure that
a node/leaf/file_extent is shared.

That's why a shared node/leaf needs to search the extent tree for the
number of references even after having checked last_snapshot, and updating
the fs/file tree works top-down, so the children will always know how many
references parents put on them at the moment of checking shared status.

However, our nocow path does something different: it first checks if the
file extent is shared, then updates the fs/file tree by updating the inode.
This can end up with the related extent record for the file extent not yet
having its actual multiple references when we check its shared status.


fs_root    snap
     \     /
      leaf            <== refs=2
        |
   file_extent        <== refs=1 (but actually refs is 2)

After updating the fs tree (or the snapshot, if the snapshot is not RO), it'll be:

fs root    snap
   |         |
  cow      leaf
    \       /
   file_extent        <== refs=2 (we do have two parents)


So it'll be confused by last_snapshot from balance into thinking that the
file extent is now shared.

There are actually a couple of ways to address it, but updating the fs/file
tree first might be the easiest and cleanest one.  With this, updating the
fs/file tree will at least make a delayed ref if the file extent is really
shared by several parents, so we can make nocow happy again without having
to check the confusing last_snapshot.


Works here. Extents are stable after a balance.
Thanks,
Kyle

Tested-by: Kyle Gates kylega...@hotmail.com



Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/extent-tree.c |4 
fs/btrfs/inode.c   |2 +-
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df472ab..d24c26c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2856,10 +2856,6 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,

 btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 goto out;

- if (btrfs_extent_generation(leaf, ei) <=
- btrfs_root_last_snapshot(&root->root_item))
- goto out;
-
 iref = (struct btrfs_extent_inline_ref *)(ei + 1);
 if (btrfs_extent_inline_ref_type(leaf, iref) !=
 BTRFS_EXTENT_DATA_REF_KEY)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 23c596c..0dc5c7d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1253,7 +1253,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,

 cur_offset = start;
 while (1) {
 ret = btrfs_lookup_file_extent(trans, root, path, ino,
-cur_offset, 0);
+cur_offset, 1);
 if (ret < 0) {
 btrfs_abort_transaction(trans, root, ret);
 goto error;
--
1.7.7





Re: nocow 'C' flag ignored after balance

2013-05-30 Thread Kyle Gates

On Wed, May 29, 2013 Miao Xie wrote:

On Wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:

On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:

From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance


[...]

Sorry for the long wait in replying.
This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
Raring kernel). I can probably try again on a newer version if you
think it will help.
This was my first kernel compile so I patched by hand and waited (10
hours on my old 32 bit single core machine).

I did move some of the files off and back on to the filesystem to
start fresh and compare but all seem to exhibit the same behavior
after a balance.



Thanks for testing the patch although it didn't help you.
Actually I tested it to be sure that it fixed the problems in my 
reproducer.


So anyway can you please apply this debug patch in order to nail it down?


Your patch cannot fix the above problem because we may update
->last_snapshot after we relocate the file data extent.

For example, suppose there are two block groups to be relocated: one is a
data block group, the other is a metadata block group. We relocate the data
block group first, set the new generation for the file data extent item/the
relative extent item, and set (new_generation - 1) for ->last_snapshot.
After the relocation of this block group, we end the transaction and drop
the relocation tree. If we ended the space balance now, we wouldn't break
the nocow rule, because ->last_snapshot is less than the generation of the
file data extent item/the relative extent item. But there is still one
block group to be relocated; when relocating that second block group, we
also start a new transaction and update ->last_snapshot if needed. So
->last_snapshot becomes greater than the generation of the file data extent
item we set before, and the nocow rule is broken.

Back to the above problem: I don't think it is a serious problem. We only
do COW once after the relocation, then we still honour the nocow rule. The
behaviour is similar to a snapshot.

So maybe it needn't be fixed.
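To make the two-block-group sequence above concrete, here is a toy shell walk-through (all transid numbers invented):

```shell
gen=10             # generation stamped on the relocated file data extent item
last_snapshot=9    # set to (gen - 1) while relocating the data block group
# ...a second (metadata) block group is relocated in a new transaction...
last_snapshot=14   # ->last_snapshot updated again, now past the extent's generation
if [ "$gen" -le "$last_snapshot" ]; then
    echo "COW forced"      # the nocow rule is broken
else
    echo "nocow ok"
fi
```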


I would argue that for large vm workloads, running a balance or adding disks 
is a common practice that will result in a drastic drop in performance as 
well as massive increases in metadata writes and fragmentation.
In my case my disks were thrashing severely, performance was poor and ntp 
couldn't even hold my clock stable.

If the fix is nontrivial please add this to the todo list.
Thanks,
Kyle

If we must fix this problem, I think the only way is to get the generation
at the beginning of the space balance and then set it to ->last_snapshot if
->last_snapshot is less than it; don't use (current_generation - 1) to
update ->last_snapshot. Besides that, don't forget to store the generation
into btrfs_balance_item, or the problem will happen again after we resume
the balance.

Thanks
Miao



thanks,
liubo

[...]








Re: nocow 'C' flag ignored after balance

2013-05-28 Thread Kyle Gates

From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/ctree.h   |2 +-
fs/btrfs/extent-tree.c |   18 +-
fs/btrfs/inode.c   |   10 --
fs/btrfs/relocation.c  |1 +
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,

 u64 bytenr, u64 num_bytes);
int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr);
+   u64 objectid, u64 offset, u64 bytenr, u64 gen);
struct btrfs_block_group_cache *btrfs_lookup_block_group(
 struct btrfs_fs_info *info,
 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path,
- u64 objectid, u64 offset, u64 bytenr)
+ u64 objectid, u64 offset, u64 bytenr,
+ u64 fi_gen)
{
 struct btrfs_root *extent_root = root->fs_info->extent_root;
 struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 goto out;

- if (btrfs_extent_generation(leaf, ei) <=
- btrfs_root_last_snapshot(&root->root_item))
+ /*
+ * Usually the generation in the extent item is larger than that in the file
+ * extent item because of delayed refs.  But we don't want balance to break
+ * a file's nocow behaviour, so use the file extent's generation, which has
+ * been updated when we update the fs root to point to relocated file
+ * extents in the data reloc root.
+ */
+ fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+ if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
 goto out;

 iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:

int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr)
+   u64 objectid, u64 offset, u64 bytenr, u64 gen)
{
 struct btrfs_path *path;
 int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,


 do {
 ret = check_committed_ref(trans, root, path, objectid,
-   offset, bytenr);
+   offset, bytenr, gen);
 if (ret && ret != -ENOENT)
 goto out;

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 if (extent_type == BTRFS_FILE_EXTENT_REG ||
 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+ u64 gen;
+ gen = btrfs_file_extent_generation(leaf, fi);
 disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 extent_offset = btrfs_file_extent_offset(leaf, fi);
 extent_end = found_key.offset +
@@ -1749,7 +1751,8 @@ next_slot:
 goto out_check;
 if (btrfs_cross_ref_exist(trans, root, ino,
   found_key.offset -
-   extent_offset, disk_bytenr))
+   extent_offset, disk_bytenr,
+   gen))
 goto out_check;
 disk_bytenr += extent_offset;
 disk_bytenr += cur_offset - found_key.offset;
@@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 struct btrfs_key key;
 u64 disk_bytenr;
 u64 backref_offset;
+ u64 fi_gen;
 u64 extent_end;
 u64 num_bytes;
 int slot;
@@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 }
 disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 backref_offset = btrfs_file_extent_offset(leaf, fi);
+ fi_gen = btrfs_file_extent_generation(leaf, fi);

 *orig_start = key.offset - backref_offset;
 *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 * find any we must cow
 */
 if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
-   key.offset - backref_offset, disk_bytenr))
+   key.offset - backref_offset, disk_bytenr,
+   fi_gen))
 goto out;

 /*
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 704a1b8..07faabf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1637,6 +1637,7 @@ int replace_file_extents

Re: nocow 'C' flag ignored after balance

2013-05-17 Thread Kyle Gates

On Fri, 17 May 2013 15:04:45 +0800, Liu Bo wrote:

On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:

and mounted with autodefrag
Am I actually just seeing large ranges getting split while remaining
contiguous on disk? This would imply crc calculation on the two
outside ranges. Or perhaps there is some data being inlined for each
write. I believe writes on this file are 32KiB each.
Does the balance produce persistent crc values in the metadata even
though the files are nocow which implies nocrc?
...
I ran this test again and here's filefrag -v after about a day of use:

[...]
As you can see the 32KiB writes fit in the extents of size 9 and 55.
Are those 9 block extents inlined?
If I understand correctly, new extents are created for these nocow
writes, then the old extents are basically hole punched producing
three (four? because of inlining) separate extents.
Something here begs for optimization. Perhaps balance should treat
nocow files a little differently. That would be the time to remove
the extra bits that prevent inplace overwrites. After the fact it
becomes much more difficult, although removing a crc for the extent
being written seems a little easier than iterating over the entire
file.

Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C'
flag working on empty files and directories.


Hi Kyle,

Can you please apply this patch and see if it helps?

thanks,
liubo


From: Liu Bo bo.li@oracle.com

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates kylega...@hotmail.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/ctree.h   |2 +-
fs/btrfs/extent-tree.c |   18 +-
fs/btrfs/inode.c   |   10 --
fs/btrfs/relocation.c  |1 +
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,

 u64 bytenr, u64 num_bytes);
int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr);
+   u64 objectid, u64 offset, u64 bytenr, u64 gen);
struct btrfs_block_group_cache *btrfs_lookup_block_group(
 struct btrfs_fs_info *info,
 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct btrfs_path *path,
- u64 objectid, u64 offset, u64 bytenr)
+ u64 objectid, u64 offset, u64 bytenr,
+ u64 fi_gen)
{
 struct btrfs_root *extent_root = root->fs_info->extent_root;
 struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 goto out;

- if (btrfs_extent_generation(leaf, ei) <=
- btrfs_root_last_snapshot(&root->root_item))
+ /*
+ * Usually the generation in the extent item is larger than that in the file
+ * extent item because of delayed refs.  But we don't want balance to break
+ * a file's nocow behaviour, so use the file extent's generation, which has
+ * been updated when we update the fs root to point to relocated file
+ * extents in the data reloc root.
+ */
+ fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+ if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
 goto out;

 iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:

int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
-   u64 objectid, u64 offset, u64 bytenr)
+   u64 objectid, u64 offset, u64 bytenr, u64 gen)
{
 struct btrfs_path *path;
 int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,


 do {
 ret = check_committed_ref(trans, root, path, objectid,
-   offset, bytenr);
+   offset, bytenr, gen);
 if (ret && ret != -ENOENT)
 goto out;

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 if (extent_type == BTRFS_FILE_EXTENT_REG ||
 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+ u64 gen;
+ gen = btrfs_file_extent_generation(leaf, fi);
 disk_bytenr = btrfs_file_extent_disk_bytenr

Re: nocow 'C' flag ignored after balance

2013-05-16 Thread Kyle Gates

On Fri, May 10, 2013 Liu Bo wrote:

On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
series kernel so please disregard if this has been fixed in higher
versions. This is on a btrfs RAID1 with 3 then 4 disks.

My use case is to set the nocow 'C' flag on a directory and copy in
some files, then make lots of writes (same file sizes) and note that
the number of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again,
now the number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files.
Are they also being checksummed thereby breaking the nocow flag?

I have made no snapshots and made no writes to said files while the
balance was running.


Hi Kyle,

It's hard to say if it's standard; it is a side effect caused by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo
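The check Liu Bo describes boils down to a single comparison; here is a sketch with invented numbers, mirroring the generation-vs-last_snapshot test in check_committed_ref() quoted elsewhere in this thread:

```shell
# must_cow EXTENT_GENERATION LAST_SNAPSHOT
# An extent may be overwritten in place only if it was created after the
# root's last_snapshot; balance bumping last_snapshot past old extents
# therefore forces one round of COW.
must_cow() { [ "$1" -le "$2" ]; }
must_cow 10 5  && echo "cow" || echo "nocow"   # before balance: nocow
must_cow 10 20 && echo "cow" || echo "nocow"   # after balance bumps last_snapshot: cow
```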


Thank you for the explanation.
On my machine this didn't happen however. IIRC one ~10GiB file had 24 
extents before balance, 26 extents after balance, and 1000+ and growing 
when I checked the following day.
I'll add that I am running a relatively recent version of btrfs-tools from 
a ppa.

and mounted with autodefrag
Am I actually just seeing large ranges getting split while remaining 
contiguous on disk? This would imply crc calculation on the two outside 
ranges. Or perhaps there is some data being inlined for each write. I 
believe writes on this file are 32KiB each.
Does the balance produce persistent crc values in the metadata even though 
the files are nocow which implies nocrc?

...
I ran this test again and here's filefrag -v after about a day of use:

Filesystem type is: 9123683e
File size of /blah/blah/file is 10213265920 (2493474 blocks, blocksize 4096)
ext   logical  physical  expected  length flags
  0         0 675625629                9
  1         9 675621279 675625638    55
  2        64 674410131 675621334   886
  3       950 675558303 674411017     9
  4       959 675583473 675558312    55
  5      1014 674411081 675583528   708
  6      1722 675456318 674411789     9
  7      1731 675710934 675456327    55
  8      1786 674411853 675710989   521
  9      2307 675424433 674412374     9
 10      2316 675471062 675424442    55
 11      2371 674412438 675471117   984
 12      3355 676012018 674413422     9
 13      3364 676024295 676012027    55
 14      3419 674413486 676024350   871
 15      4290 675681138 674414357     9
 16      4299 675618500 675681147    55
...
13986 2486955 671627059 675876382   627
13987 2487582 675677542 671627686     9
13988 2487591 675700351 675677551    55
13989 2487646 671627750 675700406  1212
13990 2488858 675932037 671628962     9
13991 2488867 675990025 675932046    55
13992 2488922 671629026 675990080   220
13993 2489142 675674447 671629246     9
13994 2489151 675687864 675674456    55
13995 2489206 671629310 675687919  1821
13996 2491027 676209288 671631131     9
13997 2491036 676260767 676209297    55
13998 2491091 671631195 676260822   285
13999 2491376 675650278 671631480     9
14000 2491385 675678822 675650287    55
14001 2491440 671631544 675678877  1464
14002 2492904 675534255 671633008     9
14003 2492913 675503514 675534264    55
14004 2492968 671633072 675503569   506 eof
/blah/blah/file: 14005 extents found

As you can see the 32KiB writes fit in the extents of size 9 and 55. Are 
those 9 block extents inlined?
If I understand correctly, new extents are created for these nocow writes, 
then the old extents are basically hole punched producing three (four? 
because of inlining) separate extents.
Something here begs for optimization. Perhaps balance should treat nocow 
files a little differently. That would be the time to remove the extra bits 
that prevent inplace overwrites. After the fact it becomes much more 
difficult, although removing a crc for the extent being written seems a 
little easier than iterating over the entire file.


Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C' flag 
working on empty files and directories. 




Re: nocow 'C' flag ignored after balance

2013-05-10 Thread Kyle Gates

On Fri, May 10, 2013 Liu Bo wrote:

On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
series kernel so please disregard if this has been fixed in higher
versions. This is on a btrfs RAID1 with 3 then 4 disks.

My use case is to set the nocow 'C' flag on a directory and copy in
some files, then make lots of writes (same file sizes) and note that
the number of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again,
now the number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files.
Are they also being checksummed thereby breaking the nocow flag?

I have made no snapshots and made no writes to said files while the
balance was running.


Hi Kyle,

It's hard to say if it's standard; it is a side effect caused by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo


Thank you for the explanation.
On my machine this didn't happen however. IIRC one 10GiB file had 24 extents 
before balance, 26 extents after balance, and 1000+ and growing when I 
checked the following day.
I'll add that I am running a relatively recent version of btrfs-tools from a 
ppa. 




nocow 'C' flag ignored after balance

2013-05-09 Thread Kyle Gates
I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series 
kernel so please disregard if this has been fixed in higher versions. This 
is on a btrfs RAID1 with 3 then 4 disks.


My use case is to set the nocow 'C' flag on a directory and copy in some 
files, then make lots of writes (same file sizes) and note that the number 
of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again, now the 
number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files. Are they 
also being checksummed thereby breaking the nocow flag?


I have made no snapshots and made no writes to said files while the balance 
was running.
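For anyone wanting to reproduce this, a hypothetical sketch (the scratch mount path and file sizes are made up; run as root against a disposable btrfs filesystem):

```shell
# Defines a reproduction routine but does not run it; uncomment the last
# line only on a scratch btrfs mount.
nocow_balance_repro() {
    mkdir -p /mnt/scratch/nocow
    chattr +C /mnt/scratch/nocow                 # new files inherit the C flag
    dd if=/dev/zero of=/mnt/scratch/nocow/vm.img bs=1M count=1024
    filefrag /mnt/scratch/nocow/vm.img           # baseline extent count
    dd if=/dev/urandom of=/mnt/scratch/nocow/vm.img bs=32K count=4 conv=notrunc
    filefrag /mnt/scratch/nocow/vm.img           # count should be unchanged
    btrfs balance start /mnt/scratch
    dd if=/dev/urandom of=/mnt/scratch/nocow/vm.img bs=32K count=4 conv=notrunc
    filefrag /mnt/scratch/nocow/vm.img           # count climbs after the balance
}
# nocow_balance_repro
```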


Thanks,
Kyle 




RE: no space left on device.

2012-11-02 Thread Kyle Gates
 So I have ended up in a state where I can't delete files with rm.

 the error I get is no space on device. however I'm not even close to empty.
 /dev/sdb1 38G 27G 9.5G 75%
 there is about 800k files/dirs in this filesystem

 Extra strange is that I can create and delete files in another directory.

 So I tried pretty much all I could google my way to but problem
 persisted. So I decided to do a backup and a format. But when the backup
 was done I tried one more time and now it was possible to delete the
 directory and all content?

 using the 3.5 kernel in ubuntu 12.10. Is this a known issue ? is it
 fixed in later kernels?

 fsck / btrfs scrub / kernel log: nothing indicates any problem of any kind.


First let's see the output of:
btrfs fi df /mountpoint

You're probably way over allocated in metadata so a balance should help:
btrfs bal start -m /mountpoint
or omit the -m option to run a full balance.
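For reference, a sketch of what that looks like; the df output in the comments is invented and only illustrates the over-allocated-metadata pattern:

```shell
# Inspect allocation, then rebalance only the metadata block groups.
# /mountpoint is a placeholder for the affected mount.
check_and_rebalance() {
    btrfs filesystem df /mountpoint
    #   Data: total=30.00GB, used=26.50GB
    #   Metadata: total=7.50GB, used=1.20GB   <- allocated far beyond use
    btrfs balance start -m /mountpoint   # return unused metadata chunks
}
```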


RE: btrfs seems to do COW while inode has NODATACOW set

2012-10-26 Thread Kyle Gates
  Wade, thanks.
 
  Yes, with the preallocated extent I saw the behavior you describe, and
  it makes perfect sense to alloc a new EXTENT_DATA in this case.
  In my case, I did another simple test:
 
  Before:
  item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
  inode generation 5 transid 5 size 5368709120 nbytes 5368709120
  owner[0:0] mode 100644
  inode blockgroup 0 nlink 1 flags 0x3 seq 0
  item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
  inode ref index 2 namelen 5 name: vol-1
  item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
  extent data disk byte 5368709120 nr 131072
  extent data offset 0 nr 131072 ram 131072
  extent compression 0
  item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
  extent data disk byte 5905842176 nr 33423360
  extent data offset 0 nr 33423360 ram 33423360
  extent compression 0
  ...
 
  I am going to do a single write of a 4Kib block into (257 EXTENT_DATA
  131072) extent:
 
  dd if=/dev/urandom of=/mnt/src/subvol-1/vol-1 bs=4096 seek=32 count=1
  conv=notrunc
 
  After:
  item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
  inode generation 5 transid 21 size 5368709120 nbytes 5368709120
  owner[0:0] mode 100644
  inode blockgroup 0 nlink 1 flags 0x3 seq 1
  item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
  inode ref index 2 namelen 5 name: vol-1
  item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
  extent data disk byte 5368709120 nr 131072
  extent data offset 0 nr 131072 ram 131072
  extent compression 0
  item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
  extent data disk byte 5368840192 nr 4096
  extent data offset 0 nr 4096 ram 4096
  extent compression 0
  item 8 key (257 EXTENT_DATA 135168) itemoff 3419 itemsize 53
  extent data disk byte 5905842176 nr 33423360
  extent data offset 4096 nr 33419264 ram 33423360
  extent compression 0
 
  We clearly see that a new extent has been allocated for some reason
  (bytenr=5368840192), and previous extent (bytenr=5905842176) is still
  there, but used at offset of 4096. This is exactly cow, I believe.
 Hmm, I'm pretty sure that using 'dd' in this fashion skips the first 32
 4096-sized blocks and thus writes -past- the length of this extent
 (i.e. bytes 131072 to 135167). This causes a new extent to be allocated
 after the previous extent.

 But even if using 'dd' with a 'skip' value of '31' created a new
 EXTENT_DATA, it would not necessarily be data CoW, since data CoW refers
 only to the location of the -data- (i.e., not metadata and thus not
 EXTENT_DATA) on disk. The key thing is to look at where the EXTENT_DATAs
 are pointing to, not how many EXTENT_DATAs there are.
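
The offsets can be double-checked with a throwaway file (the path below is made up; this only demonstrates where dd puts the write, not btrfs's extent behavior):

```shell
# Throwaway sketch: bs=4096 seek=32 positions the write at byte
# 32*4096 = 131072, i.e. just past the first 131072-byte extent and at
# the exact start of the second one (257 EXTENT_DATA 131072).
f=/tmp/seek-demo.bin
rm -f "$f"
dd if=/dev/zero of="$f" bs=4096 seek=32 count=1 conv=notrunc status=none
stat -c '%s' "$f"    # prints 135168 (131072 + 4096)
```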

  However, your hint about not being able to read into memory may be
  useful; it would be good if we can find the place in the code that
  does that decision to cow.
 Try looking at the callers of btrfs_cow_block(), but you'll be on your
 own from there :)

  I guess I am looking for a way to never ever allocate new EXTENT_DATAs
  on a fully-mapped file. Is there one?
 Hmm, I don't think that this exists right now. You could try a '-o 
 autodefrag' to
 minimize the number of EXTENT_DATAs, though.

This seems to be a start at what you're looking for:
Commit: 7e97b8daf63487c20f78487bd4045f39b0d97cf4
btrfs: allow setting NOCOW for a zero sized file via ioctl

In short, the NOCOW flag won't be honored if any checksums have already 
been assigned to any extents of the file.
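
For what it's worth, the resulting workflow can be sketched like this (the path is made up; the chattr call only succeeds on a filesystem that supports the NOCOW flag, hence the fallback message):

```shell
# Hypothetical sketch: the C flag has to go on while the file is still
# zero-sized, before any extent has had a checksum assigned.
f=/tmp/nocow-demo.db
rm -f "$f"
touch "$f"                                   # zero-sized file
chattr +C "$f" 2>/dev/null \
    || echo "NOCOW not supported on this filesystem"
# Only now populate it; writes go in-place if the flag stuck.
dd if=/dev/zero of="$f" bs=4096 count=1 status=none
stat -c '%s' "$f"
```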


 Regards,
 Wade

 
  Thanks!
  Alex.


RE: problem replacing failing drive

2012-10-25 Thread Kyle Gates

 To: linux-btrfs@vger.kernel.org
 From: samtyg...@yahoo.co.uk
 Subject: Re: problem replacing failing drive
 Date: Thu, 25 Oct 2012 22:02:23 +0100

 On 22/10/12 10:07, sam tygier wrote:
  hi,
 
  I have a 2 drive btrfs raid set up. It was created first with a single 
  drive, and then adding a second and doing
  btrfs fi balance start -dconvert=raid1 /data
 
  the original drive is showing smart errors so i want to replace it. i dont 
  easily have space in my desktop for an extra disk, so i decided to proceed 
  by shutting down. taking out the old failing drive and putting in the new 
  drive. this is similar to the description at
  https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices#Replacing_Failed_Devices
  (the other reason to try this is to simulate what would happen if a drive 
  did completely fail).
 
  so after swapping the drives and rebooting, i try to mount as degraded. i 
  instantly get a kernel panic, 
  http://www.hep.man.ac.uk/u/sam/pub/IMG_5397_crop.png
 
  so far all this has been with 3.5 kernel. so i upgraded to 3.6.2 and tried 
  to mount degraded again.
 
  first with just sudo mount /dev/sdd2 /mnt, then with sudo mount -o degraded 
  /dev/sdd2 /mnt
 
  [ 582.535689] device label bdata devid 1 transid 25342 /dev/sdd2
  [ 582.536196] btrfs: disk space caching is enabled
  [ 582.536602] btrfs: failed to read the system array on sdd2
  [ 582.536860] btrfs: open_ctree failed
  [ 606.784176] device label bdata devid 1 transid 25342 /dev/sdd2
  [ 606.784647] btrfs: allowing degraded mounts
  [ 606.784650] btrfs: disk space caching is enabled
  [ 606.785131] btrfs: failed to read chunk root on sdd2
  [ 606.785331] btrfs warning page private not zero on page 392922368
  [ 606.785408] btrfs: open_ctree failed
  [ 782.422959] device label bdata devid 1 transid 25342 /dev/sdd2
 
  no panic is good progress, but something is still not right.
 
  my options would seem to be
  1) reconnect old drive (probably in a USB caddy), see if it mounts as if 
  nothing ever happened. or possibly try and recover it back to a working 
  raid1. then try again with adding the new drive first, then removing the 
  old one.
  2) give up experimenting and create a new btrfs raid1, and restore from 
  backup
 
  both leave me with a worry about what would happen if a disk in a raid 1 
  did die. (unless is was the panic that did some damage that borked the 
  filesystem.)

 Some more details.

 If i reconnect the failing drive then I can mount the filesystem with no 
 errors, a quick glance suggests that the data is all there.

 Label: 'bdata' uuid: 1f07081c-316b-48be-af73-49e6f76535cc
 Total devices 2 FS bytes used 2.50TB
 devid 2 size 2.73TB used 2.73TB path /dev/sde1 -- this is the drive that i 
 wish to remove
 devid 1 size 2.73TB used 2.73TB path /dev/sdd2

 sudo btrfs filesystem df /mnt
 Data, RAID1: total=2.62TB, used=2.50TB
 System, DUP: total=40.00MB, used=396.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=112.00GB, used=3.84GB
 Metadata: total=8.00MB, used=0.00

 is the failure to mount when i remove sde due to it being dup, rather than 
 raid1?

Yes, I would say so. DUP keeps both copies of a chunk on the same device, 
so your system and metadata chunks can vanish along with the removed drive.
Try a
btrfs balance start -mconvert=raid1 /mnt
so all metadata is on each drive.


 is adding a second drive to a btrfs filesystem and running
 btrfs fi balance start -dconvert=raid1 /mnt
 not sufficient to create an array that can survive the loss of a disk? do i 
 need -mconvert as well? is there an -sconvert for system?

 thanks

 Sam




Re: raw partition or LV for btrfs?

2012-08-12 Thread Kyle Gates

I'm currently running a 1GB raid1 btrfs /boot with no problems.
Also, I think the current grub2 has lzo support.

-Original Message- 
From: Fajar A. Nugraha

Sent: Sunday, August 12, 2012 5:48 PM
To: Daniel Pocock
Cc: linux-btrfs@vger.kernel.org
Subject: Re: raw partition or LV for btrfs?

On Sun, Aug 12, 2012 at 11:46 PM, Daniel Pocock dan...@pocock.com.au 
wrote:



I notice this question on the wiki/faq:


https://btrfs.wiki.kernel.org/index.php/UseCases#What_is_best_practice_when_partitioning_a_device_that_holds_one_or_more_btr-filesystems

and as it hasn't been answered, can anyone make any comments on the 
subject


Various things come to mind:

a) partition the disk, create an LVM partition, and create lots of small
LVs, format each as btrfs

b) partition the disk, create an LVM partition, and create one big LV,
format as btrfs, make subvolumes

c) what about using btrfs RAID1?  Does either approach (a) or (b) seem
better for someone who wants the RAID1 feature?


IMHO when the qgroup feature is stable (i.e. adopted by distros, or
at least in stable kernel) then simply creating one big partition (and
letting btrfs handle RAID1, if you use it) is better. When 3.6 is out,
perhaps?

Until then I'd use LVM.



d) what about booting from a btrfs system?  Is it recommended to follow
the ages-old practice of keeping a real partition of 128-500MB,
formatting it as btrfs, even if all other data is in subvolumes as per 
(b)?


You can have one single partition only and boot directly from that.
However btrfs has the same problems as zfs in this regard:
- grub can read both, but can't write to either. In other words, no
support for grubenv
- the best compression method (gzip for zfs, lzo for btrfs) is not
supported by grub

For the first problem, an easy workaround is just to disable the grub
configuration that uses grubenv. Easy enough, and no major
functionality loss.

The second one is harder for btrfs. zfs allows you to have a separate
dataset (i.e. subvolume, in btrfs terms) with different compression, so
you can have a dedicated dataset for /boot with a different compression
setting from the rest of the pool. With btrfs you're currently
stuck with using the same compression setting for everything, so if
you love lzo this might be a major setback.

There's also a btrfs-specific problem: it's hard to have a system
which has /boot on a separate subvol while managing it with current
automatic tools (e.g. update-grub).

Due to the second and third problems, I'd recommend you just use a
separate partition with ext2/4 for now.

--
Fajar


nocow flags

2012-03-02 Thread Kyle Gates

I set the C (NOCOW) and z (Not_Compressed) flags on a folder but the extent 
counts of files contained there keep increasing.
Said files are large and frequently modified but not changing in size. This 
does not happen when the filesystem is mounted with nodatacow.

I'm using this as a workaround since subvolumes can't be mounted with
different options simultaneously, i.e. one with CoW and one with nodatacow.

Any ideas why the flags are being ignored?

I'm running 32bit 3.3rc4 with 
noatime,nodatasum,space_cache,autodefrag,inode_cache on a 3 disk RAID0 data 
RAID1 metadata filesystem.

Thanks,
Kyle


RE: Set nodatacow per file?

2012-02-29 Thread Kyle Gates


  Actually it is possible. Check out David's response to my question from
  some time ago:
  http://permalink.gmane.org/gmane.comp.file-systems.btrfs/14227

 this was a quick aid, please see attached file for an updated tool to set
 the file flags, now added 'z' for NOCOMPRESS flag, and supports chattr
 syntax plus all of the standard file flags.

 Setting and unsetting nocow is done like 'fileflags +C file' or -C for
 unsetting. Without any + or - options it prints the current state.


I get the following errors when running fileflags on large (2GB) database 
files:

open(): No such file or directory

open(): Value too large for defined data type




RE: btrfs-raid questions I couldn't find an answer to on the wiki

2012-01-31 Thread Kyle Gates

I've been having good luck with my /boot on a separate 1GB RAID1 btrfs
filesystem using grub2 (2 disks only! I wouldn't try it with 3). I
should note, however, that I'm NOT using compression on this volume
because if I remember correctly it may not play well with grub (maybe
that was just lzo though) and I'm also not using subvolumes either for
the same reason.


Thanks! I'm on grub2 as well. It's still masked on Gentoo, but I
recently unmasked and upgraded to it, taking advantage of the fact that I
have two two-spindle md/raid-1s for /boot and its backup to test and
upgrade one of them first, then the other only when I was satisfied with
the results on the first set. I'll be using a similar strategy for the
btrfs upgrades, only most of my md/raid-1s are 4-spindle, with two sets,
working and backup, and I'll upgrade one set first.

I'm going to keep /boot a pair of two-spindle raid-1s, but intend to make
them btrfs-raid1s instead of md/raid-1s, and will upgrade one two-spindle
set at a time.

More on the status of grub2 btrfs-compression support based on my
research. There is support for btrfs/gzip-compression in at least grub
trunk. AFAIK, it's gzip-compression in grub-1.99-release and
lzo-compression in trunk only, but I may be misremembering and it's gzip
in trunk only and only uncompressed in grub-1.99-release.


I believe you are correct that btrfs zlib support is included in grub2 
version 1.99 and lzo is in trunk.
I'll try compressing the files on /boot for one installed kernel with the 
defrag -czlib option and see how it goes.

Result: Seemed to work just fine.


In any event, since I'm running 128 MB /boot md/raid-1s without
compression now, and intend to increase the size to at least a quarter
gig to better align the following partitions, /boot is the one set of
btrfs partitions I do NOT intend to enable compression on, so that won't
be an issue for me here. And since for /boot I'm running a pair of
two-spindle raid1s instead of my usual quad-spindle raid1s, you've
confirmed that works as well. =:^)

As a side note, since I only recently did the grub2 upgrade, I've been
enjoying its ability to load and read md/raid and my current reiserfs
directly, thus giving me the ability to look up info in at least text-
based main system config and notes files directly from grub2, without
booting into Linux, if for some reason the above-grub boot is hosed or
inconvenient at that moment. I just realized that if I want to maintain
that direct-from-grub access, I'll need to ensure that the grub2 I'm
running groks the btrfs compression scheme I'm using on any filesystem I
want grub2 to be able to read.

Hmm... that brings up another question: You mention a 1-gig btrfs-raid1 /
boot, but do NOT mention whether you installed it before or after mixed-
chunk (data/metadata) support made it into btrfs and became the default
for <= 1 gig filesystems.


I don't think I specifically enabled mixed chunk support when I created this 
filesystem. It was done on a 2.6 kernel sometime in the middle of 2011 iirc.



Can you confirm one way or the other whether you're running mixed-chunk
on that 1-gig? I'm not sure whether grub2's btrfs module groks mixed-
chunk or not, or whether that even matters to it.

Also, could you confirm mbr-bios vs gpt-bios vs uefi-gpt partitions? I'm
using gpt-bios partitioning here, with the special gpt-bios-reserved
partition, so grub2-install can build the modules necessary for /boot
access directly into its core-image and install that in the gpt-bios-
reserved partition. It occurs to me that either uefi-gpt or gpt-bios
with the appropriate reserved partition won't have quite the same issues
with grub2 reading a btrfs /boot that either mbr-bios or gpt-bios without
a reserved bios partition would. If you're running gpt-bios with a
reserved bios partition, that confirms yet another aspect of your setup,
compared to mine. If you're running uefi-gpt, not so much as at least in
theory, that's best-case. If you're running either mbr-bios or gpt-bios
without a reserved bios partition, that's a worst-case, so if it works,
then the others should definitely work.


Same here, gpt-bios, 1MB partition with bios_grub flag set (gdisk code EF02) 
for grub to reside on.



Meanwhile, you're right about subvolumes. I'd not try them on a btrfs
/boot, either. (I don't really see the use case for it, for a separate
/boot, tho there's certainly a case for a /boot subvolume on a btrfs
root, for people doing that.)




RE: fstab mount options ignored on subsequent subvolume mounts

2012-01-18 Thread Kyle Gates

  I have multiple subvolumes on the same filesystem that are mounted with 
  different options in fstab.
  The problem is the mount options for subsequent subvolume mounts seem to be 
  ignored as reflected in /proc/mounts.

 The output of 'mount' and /proc/mounts is different. mount takes it from
 /etc/mtab while /proc/mounts gets the information from the kernel (calls
 into super.c:btrfs_show_options()).

 'mtab':
 - contains the options in the order in which they were given to mount or
 in /etc/fstab

 /proc/mounts:
 - order of options is fixed (as defined in the function)
 - if an option has a default value which was not given to mount, it is
 listed here (and is not in mtab)
 - implied options appear here as well (like nodatacow implies
 nodatasum)


 Now, you're giving a different set of options to each subvolume, but they
 belong to one filesystem, so the set of options given to the first
 mounted subvolume ends up applying to every other mounted subvolume.

 The first subvol calls 'btrfs_fill_super' and 'btrfs_parse_options'; the
 others do not. A remount will call 'btrfs_parse_options' again and will
 change the options set.

  $ cat /etc/fstab | grep mnt
  UUID=REMOVED /mnt/a btrfs 
  subvol=a,defaults,nodatacow,autodefrag,noatime,space_cache,inode_cache 0 0
  UUID=REMOVED /mnt/b btrfs 
  subvol=b,defaults,autodefrag,noatime,space_cache,inode_cache 0 0
  UUID=REMOVED /mnt/c btrfs 
  subvol=c,defaults,compress=zlib,autodefrag,noatime,space_cache,inode_cache 
  0 0
 
  $ mount | grep mnt
  /dev/sdb2 on /mnt/a type btrfs 
  (rw,noatime,subvol=a,nodatacow,autodefrag,space_cache,inode_cache)
  /dev/sdb2 on /mnt/b type btrfs 
  (rw,noatime,subvol=b,autodefrag,space_cache,inode_cache)
  /dev/sdb2 on /mnt/c type btrfs 
  (rw,noatime,subvol=c,compress=zlib,autodefrag,space_cache,inode_cache)
  $ cat /proc/mounts | grep mnt
  /dev/sdb2 /mnt/a btrfs 
  rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
  /dev/sdb2 /mnt/b btrfs 
  rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
  /dev/sdb2 /mnt/c btrfs 
  rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
 
  continuing the example which should only change the mount options for one 
  of the subvolumes:
  $ sudo mount -o remount,compress=zlib /mnt/oldhome
  $ cat /proc/mounts | grep mnt
  /dev/sdb2 /mnt/a btrfs 
  rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache
   0 0
  /dev/sdb2 /mnt/b btrfs 
  rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache
   0 0
  /dev/sdb2 /mnt/c btrfs 
  rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache
   0 0

 I think the above explains things in general in your listings, the last
 one missing is subvol= in /proc/mounts. This is not implemented, but is
 possible (save non-default subvol name with the subvol root and print in
 show_options).


 david
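
David's point that mount reads /etc/mtab while /proc/mounts comes from the kernel can be checked on any Linux box, btrfs or not. A small sketch (on modern systems /etc/mtab is often just a symlink to /proc/self/mounts, in which case the two lines match):

```shell
# Compare what the two mount tables report for the root filesystem:
# /etc/mtab is userspace's record; /proc/mounts is the kernel's view.
awk '$2 == "/" { print "mtab: " $4 }' /etc/mtab 2>/dev/null || true
awk '$2 == "/" { print "proc: " $4 }' /proc/mounts
```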

Thanks for the clarification. I was under the impression that mounting 
multiple subvolumes with different options had been implemented. Perhaps
someday it will be, although for now there are more pressing issues.


I appreciate everyone's hard work and look forward to the continued development 
of btrfs.

many thanks,
Kyle


fstab mount options ignored on subsequent subvolume mounts

2012-01-17 Thread Kyle Gates

Greeting all,

I have multiple subvolumes on the same filesystem that are mounted with 
different options in fstab.
The problem is the mount options for subsequent subvolume mounts seem to be 
ignored as reflected in /proc/mounts.

$ cat /etc/fstab | grep mnt
UUID=REMOVED /mnt/a btrfs 
subvol=a,defaults,nodatacow,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/b btrfs 
subvol=b,defaults,autodefrag,noatime,space_cache,inode_cache 0 0
UUID=REMOVED /mnt/c btrfs 
subvol=c,defaults,compress=zlib,autodefrag,noatime,space_cache,inode_cache 0 0
$ mount | grep mnt
/dev/sdb2 on /mnt/a type btrfs 
(rw,noatime,subvol=a,nodatacow,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/b type btrfs 
(rw,noatime,subvol=b,autodefrag,space_cache,inode_cache)
/dev/sdb2 on /mnt/c type btrfs 
(rw,noatime,subvol=c,compress=zlib,autodefrag,space_cache,inode_cache)
$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs 
rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/b btrfs 
rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0
/dev/sdb2 /mnt/c btrfs 
rw,noatime,nodatasum,nodatacow,space_cache,autodefrag,inode_cache 0 0

continuing the example which should only change the mount options for one of 
the subvolumes:
$ sudo mount -o remount,compress=zlib /mnt/oldhome
$ cat /proc/mounts | grep mnt
/dev/sdb2 /mnt/a btrfs 
rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 
0 0
/dev/sdb2 /mnt/b btrfs 
rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 
0 0
/dev/sdb2 /mnt/c btrfs 
rw,noatime,nodatasum,nodatacow,compress=zlib,space_cache,autodefrag,inode_cache 
0 0


Running Ubuntu mainline kernel 3.2.1 (3.2.1-030201-generic #201201121644 SMP 
Thu Jan 12 21:53:24 UTC 2012 i686 athlon i386 GNU/Linux) with most recent 
btrfs-progs (2011-12-01) from linux/kernel/git/mason/btrfs-progs.git

Thanks,
Kyle


btrfs-progs compile warnings on x86

2012-01-17 Thread Kyle Gates

When compiling btrfs-progs (2011-12-01) from 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git on 
3.2.1-030201-generic #201201121644 SMP Thu Jan 12 21:53:24 UTC 2012 i686 athlon 
i386 GNU/Linux
I get the following warnings:

ls btrfs_cmds.c
btrfs_cmds.c
gcc -Wp,-MMD,./.btrfs_cmds.o.d,-MT,btrfs_cmds.o -Wall -D_FILE_OFFSET_BITS=64 
-D_FORTIFY_SOURCE=2 -g -O0 -c btrfs_cmds.c
btrfs_cmds.c: In function '__ino_to_path_fd':
btrfs_cmds.c:1138:15: warning: cast from pointer to integer of different size 
[-Wpointer-to-int-cast]
btrfs_cmds.c: In function 'do_logical_to_ino':
btrfs_cmds.c:1242:15: warning: cast from pointer to integer of different size 
[-Wpointer-to-int-cast]

Thanks,
Kyle