Re: price to pay for nocow file bit?

2015-01-07 Thread Duncan
Josef Bacik posted on Wed, 07 Jan 2015 15:10:06 -0500 as excerpted:

>> Does this have any effect on functionality? As I understood snapshots
>> still work fine for files marked like that, and so do reflinks. Any
>> drawback functionality-wise? Apparently file compression support is
>> lost if the bit is set? (which I can live with too, journal files are
>> internally compressed anyway)
>>
>>
> Yeah no compression, no checksums.  If you do reflink then you'll COW
> once and then the new COW will be nocow so it'll be fine.  Same goes for
> snapshots.  So you'll likely incur some fragmentation but less than
> before, but I'd measure to just make sure if it's that big of a deal.
> 
>> What about performance? Do any operations get substantially slower by
>> setting this bit? For example, what happens if I take a snapshot of
>> files with this bit set and then modify the file, does this result in a
>> full (and hence slow) copy of the file on that occasion?
>>
>>
> Performance is the same.

The on-snapshot "cow1" that an otherwise nocow file gets is per-block 
(4096-byte AFAIK), so there's still some fragmentation, but it builds up 
much more slowly than with full cow.

The "perfect storm" situation is people doing automated per-minute 
snapshots or similar (some people go to extremes with snapper or the 
like...), in which case setting nocow often doesn't help a whole lot, 
depending on how active the file-writing is, of course.

But for something like append-plus-pointer-update-pattern log files with 
something like per-day snapshotting, nocow should at least in theory help 
quite a bit, since the write frequency, and thus the number of prevented 
cows, should be MUCH higher than the daily snapshot frequency and thus the 
number of forced block-cow1s.
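
For reference, here's a minimal sketch of setting that attribute from a 
program rather than via chattr, using FS_IOC_GETFLAGS/FS_IOC_SETFLAGS and 
FS_NOCOW_FL from <linux/fs.h>.  This is an illustration only, not journald's 
actual code; note the flag only sticks on an empty file, or on a directory 
so that newly created files inherit it.

/* nocow-flag.c: illustration only (not journald's code) of setting the
 * btrfs nocow attribute on a freshly created, still-empty file.  The
 * flag must be set before any data is written, or it has no effect. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
	unsigned int flags = 0;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <new-file>\n", argv[0]);
		return 1;
	}

	/* O_EXCL makes sure we really are creating a new, empty file */
	fd = open(argv[1], O_RDWR | O_CREAT | O_EXCL, 0640);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		perror("FS_IOC_GETFLAGS");
		return 1;
	}

	flags |= FS_NOCOW_FL;	/* the NOCOW bit that chattr's 'C' attribute controls */
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) {
		perror("FS_IOC_SETFLAGS");
		return 1;
	}

	return 0;
}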

-

FWIW, I run systemd on btrfs here, but I use syslog-ng for my non-volatile 
logs and have Storage=volatile in journald.conf, using journald only for the 
current session, where unit status including the last 10 messages makes 
troubleshooting /so/ much easier. =:^)  Once past the current session, text 
logs are more useful to me, which is where syslog-ng comes in.  Each to 
its strength, and keeping the journals from wearing the SSDs[1] is a very 
nice bonus. =:^)
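
For anyone wanting the same setup, the relevant journald.conf stanza should 
just be the following (assuming a stock systemd install, everything else 
left at defaults):

[Journal]
Storage=volatile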

---
[1] I can and do filter what syslog-ng writes, but couldn't find a way to 
filter journald's writes, only its queries/reads.  That alone saves writes: 
repeated noise that I filter out with syslog before it's ever written is 
noise journald would still be writing if I let it log non-volatile.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: BTRFS free space handling still needs more work: Hangs again (no complete lockups, "just" tasks stuck for some time)

2015-01-07 Thread Duncan
Martin Steigerwald posted on Wed, 07 Jan 2015 20:08:50 +0100 as excerpted:

> No BTRFS developers commented yet on this, neither in this thread nor in
> the bug report at kernel.org I made.

Just a quick general note on this point...

There has in the past (and I believe it's referenced on the wiki) been dev 
comment to the effect that on the list they tend to find particular 
reports/threads and work on them until they either find and fix the issue 
or (when not urgent) decide it must wait for something else first.  
During the time they're busy pursuing such a report, they don't read 
others on the list very closely, and such list-only bug reports may thus 
get dropped on the floor and never worked on.

The recommendation, then, is to report it to the list, and if not picked 
up right away and you plan on being around in a few weeks/months when 
they potentially get to it, file a bug on it, so it doesn't get dropped 
on the floor.

With the bugzilla.kernel.org report you've followed the recommendation, 
but the implication is that you won't necessarily get any comment right 
away, only later, when they're not immediately busy looking at some other 
bug.  So lack of b.k.o comment in the immediate term doesn't mean they're 
ignoring the bug or don't value it; it just means they're hot on the 
trail of something else ATM and it might take some time to get that 
"first comment" engagement.

But the recommendation is to file the bugzilla report precisely so it 
does /not/ get lost, and you've done that, so... you've done your part 
there and now comes the enforced patience bit of waiting for that 
engagement.

But if it takes a bit, I would keep the bug updated every kernel release 
or so, with a comment updating status.

(Meanwhile, I've seen no indication of such issues here.  Most of my 
btrfs are 8-24 GiB each, all SSD, mostly dual-device btrfs raid1 both 
data/metadata.  Maybe I don't run those full enough, however.  I do have 
three mixed-bg-mode sub-GiB btrfs, with one of them, a 256 MiB 
single-device dup-mode btrfs used as /boot, that tends to run reasonably 
full, but I've not seen a problem like that there, either.  My use-case 
probably simply doesn't hit the problem.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH v3 0/3] Btrfs: Enhancment for qgroup.

2015-01-07 Thread Dongsheng Yang
On 01/07/2015 08:49 AM, Satoru Takeuchi wrote:
> Hi Yang,
>
> On 2015/01/05 15:16, Dongsheng Yang wrote:
>> Hi Josef and others,
>>
>> This patch set is about enhancing qgroup.
>>
>> [1/3]: fix a bug about a qgroup leak when we exceed the quota limit.
>>  It is reviewed by Josef.
>> [2/3]: introduce a new accounter in qgroup to close a window where the
>>  user can exceed the limit set by qgroup. It "looks good" to Josef.
>> [3/3]: a new patch to fix a bug reported by Satoru.
> I tested your patchset v3. Although it's far better
> than the patchset v2, there is still one problem in this patchset.
> When I wrote 1.5GiB to a subvolume with a 1.0 GiB limit,
> 1.0GiB - 139 blocks (in this case, 1KiB/block) were written.
>
> I consider that the user should be able to write exactly 1.0GiB in this case.

Hi Satoru,

Yes, currently the user cannot write 1.0GiB in this case, because qgroup
is accounting data and metadata together. I have posted an idea
in this thread to split it into three modes: data, metadata and both.

TODO issues:
c). limit and account size in 3 modes: data, metadata and both.
qgroup is accounting the size of both data and metadata
together, but to users, the data size is the most useful.


But you mentioned that the result is different each time.
Hmmm, there must be something wrong there. I need some more
investigation to answer this question.

Thanx a lot for your test!
Yang
>
> * Test result
>
> ===
> + mkfs.btrfs -f /dev/vdb
> Btrfs v3.17
> See http://btrfs.wiki.kernel.org for more information.
>
> Turning ON incompat feature 'extref': increased hardlink limit per file to 
> 65536
> fs created label (null) on /dev/vdb
> nodesize 16384 leafsize 16384 sectorsize 4096 size 30.00GiB
> + mount /dev/vdb /root/btrfs-auto-test/
> + ret=0
> + btrfs quota enable /root/btrfs-auto-test/
> + btrfs subvolume create /root/btrfs-auto-test//sub
> Create subvolume '/root/btrfs-auto-test/sub'
> + btrfs qgroup limit 1G /root/btrfs-auto-test//sub
> + dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=150
> dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
> 1048438+0 records in# Tried to write 1GiB - 138 KiB
> 1048437+0 records out   # Succeeded to write 1GiB - 139 KiB
> 1073599488 bytes (1.1 GB) copied, 19.0247 s, 56.4 MB/s
> ===
>
> * note
>
> I tried to run the reproducer five times and the result is
> a bit different for each time.
>
> ==================
> #   Written
> ------------------
> 1   1GiB - 139 KiB
> 2   1GiB - 139 KiB
> 3   1GiB - 145 KiB
> 4   1GiB - 135 KiB
> 5   1GiB - 135 KiB
> ==================
>
> So I consider it's a problem comes from timing.
>
> If I changed the block size from 1KiB to 1 MiB,
> the difference in bytes got larger.
>
> =================
> #   Written
> -----------------
> 1   1GiB - 1 MiB
> 2   1GiB - 1 MiB
> 3   1GiB - 1 MiB
> 4   1GiB - 1 MiB
> 5   1GiB - 1 MiB
> =================
>
> Thanks,
> Satoru
>
>> BTW, I have some other plan about qgroup in my TODO list:
>>
>> Kernel:
>> a). adjust the accounters in parent qgroup when we move
>> the child qgroup.
>>  Currently, when we move a qgroup, the parent qgroup
>> is not updated at the same time. This will cause some wrong
>> numbers in qgroup.
>>
>> b). add an ioctl to show the qgroup info.
>>  Command "btrfs qgroup show" is showing the qgroup info
>> read from the qgroup tree. But there is some information in memory
>> which is not yet synced to the device, so it will show some outdated
>> numbers.
>>
>> c). limit and account size in 3 modes: data, metadata and both.
>>  qgroup is accounting the size of both data and metadata
>> together, but to a user, the data size is the most useful.
>>
>> d). remove a subvolume related qgroup when subvolume is deleted and
>> there is no other reference to it.
>>
>> user-tool:
>> a). Add the units B/K/M/G to btrfs qgroup show.
>> b). get the information via ioctl rather than reading it from the
>> btree. Will keep the old way as a fallback for compatibility.
>>
>> Any comment or suggestion is welcome. :)
>>
>> Yang
>>
>> Dongsheng Yang (3):
>>Btrfs: qgroup: free reserved in exceeding quota.
>>Btrfs: qgroup: Introduce a may_use to account
>>  space_info->bytes_may_use.
>>Btrfs: qgroup, Account data space in more proper timings.
>>
>>   fs/btrfs/extent-tree.c | 41 +++---
>>   fs/btrfs/file.c|  9 ---
>>   fs/btrfs/inode.c   | 18 -
>>   fs/btrfs/qgroup.c  | 68 
>> +++---
>>   fs/btrfs/qgroup.h  |  4 +++
>>   5 files changed, 117 insertions(+), 23 deletions(-)
>>
> .
>


Re: [PATCH] btrfs: introduce shrinker for rb_tree that keeps valid btrfs_devices

2015-01-07 Thread Gui Hecheng
[ping]

On Wed, 2014-12-10 at 15:39 +0800, Gui Hecheng wrote:
> The following patch:
>   btrfs: remove empty fs_devices to prevent memory runout
> 
> introduces @valid_dev_root, aiming to record the @btrfs_device objects
> whose corresponding block devices contain btrfs.
> But if a block device is broken or unplugged, nothing tells
> @valid_dev_root to clean up the "dead" objects.
> 
> To recycle the memory occupied by those "dead" objects, we could rely on
> the shrinker. The shrinker's scan function traverses
> @valid_dev_root and tries to open the devices one by one; if opening fails
> or it encounters a non-btrfs device, it removes the "dead" @btrfs_device.
> 
> A special case to deal with is a block device that is unplugged and
> replugged; it then appears with a new @bdev->bd_dev as its devnum.
> In this case, we should remove the older entry, since we already have a new
> one for that block device.
> 
> Signed-off-by: Gui Hecheng 
> ---
>  fs/btrfs/super.c   | 10 
>  fs/btrfs/volumes.c | 74 
> +-
>  fs/btrfs/volumes.h |  4 +++
>  3 files changed, 87 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index ee09a56..29069af 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1987,6 +1987,12 @@ static struct miscdevice btrfs_misc = {
>   .fops   = &btrfs_ctl_fops
>  };
>  
> +static struct shrinker btrfs_valid_dev_shrinker = {
> + .scan_objects = btrfs_valid_dev_scan,
> + .count_objects = btrfs_valid_dev_count,
> + .seeks = DEFAULT_SEEKS,
> +};
> +
>  MODULE_ALIAS_MISCDEV(BTRFS_MINOR);
>  MODULE_ALIAS("devname:btrfs-control");
>  
> @@ -2100,6 +2106,8 @@ static int __init init_btrfs_fs(void)
>  
>   btrfs_init_lockdep();
>  
> + register_shrinker(&btrfs_valid_dev_shrinker);
> +
>   btrfs_print_info();
>  
>   err = btrfs_run_sanity_tests();
> @@ -2113,6 +2121,7 @@ static int __init init_btrfs_fs(void)
>   return 0;
>  
>  unregister_ioctl:
> + unregister_shrinker(&btrfs_valid_dev_shrinker);
>   btrfs_interface_exit();
>  free_end_io_wq:
>   btrfs_end_io_wq_exit();
> @@ -2153,6 +2162,7 @@ static void __exit exit_btrfs_fs(void)
>   btrfs_interface_exit();
>   btrfs_end_io_wq_exit();
>   unregister_filesystem(&btrfs_fs_type);
> + unregister_shrinker(&btrfs_valid_dev_shrinker);
>   btrfs_exit_sysfs();
>   btrfs_cleanup_valid_dev_root();
>   btrfs_cleanup_fs_uuids();
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7093cce..62f37b1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -54,6 +54,7 @@ static void btrfs_dev_stat_print_on_load(struct 
> btrfs_device *device);
>  DEFINE_MUTEX(uuid_mutex);
>  static LIST_HEAD(fs_uuids);
>  static struct rb_root valid_dev_root = RB_ROOT;
> +static atomic_long_t unopened_dev_count = ATOMIC_LONG_INIT(0);
>  
>  static struct btrfs_device *insert_valid_device(struct btrfs_device *new_dev)
>  {
> @@ -130,6 +131,8 @@ static void free_invalid_device(struct btrfs_device 
> *invalid_dev)
>  {
>   struct btrfs_fs_devices *old_fs;
>  
> + atomic_long_dec(&unopened_dev_count);
> +
>   old_fs = invalid_dev->fs_devices;
>   mutex_lock(&old_fs->device_list_mutex);
>   list_del(&invalid_dev->dev_list);
> @@ -615,6 +618,7 @@ static noinline int device_list_add(const char *path,
>   list_add_rcu(&device->dev_list, &fs_devices->devices);
>   fs_devices->num_devices++;
>   mutex_unlock(&fs_devices->device_list_mutex);
> + atomic_long_inc(&unopened_dev_count);
>  
>   ret = 1;
>   device->fs_devices = fs_devices;
> @@ -788,6 +792,7 @@ again:
>   blkdev_put(device->bdev, device->mode);
>   device->bdev = NULL;
>   fs_devices->open_devices--;
> + atomic_long_inc(&unopened_dev_count);
>   }
>   if (device->writeable) {
>   list_del_init(&device->dev_alloc_list);
> @@ -850,8 +855,10 @@ static int __btrfs_close_devices(struct btrfs_fs_devices 
> *fs_devices)
>   struct btrfs_device *new_device;
>   struct rcu_string *name;
>  
> - if (device->bdev)
> + if (device->bdev) {
>   fs_devices->open_devices--;
> + atomic_long_inc(&unopened_dev_count);
> + }
>  
>   if (device->writeable &&
>   device->devid != BTRFS_DEV_REPLACE_DEVID) {
> @@ -981,6 +988,7 @@ static int __btrfs_open_devices(struct btrfs_fs_devices 
> *fs_devices,
>   fs_devices->rotating = 1;
>  
>   fs_devices->open_devices++;
> + atomic_long_dec(&unopened_dev_count);
>   if (device->writeable &&
>   device->devid != BTRFS_DEV_REPLACE_DEVID) {
>   fs_devices->rw_devices++;
> @@ -6828,3 +6836,67 @@ vo
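
The final hunk above is truncated in the archive.  For readers unfamiliar 
with the count/scan shrinker interface this patch registers in super.c, the 
following is a minimal self-contained demo module (assumed names, not the 
patch's actual callbacks) showing the same 
register_shrinker()/count_objects/scan_objects pattern against a toy atomic 
counter:

/* shrinker_demo.c -- illustration of the shrinker API used above; build
 * as an out-of-tree module against a ~3.18 kernel tree. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/shrinker.h>
#include <linux/atomic.h>

static atomic_long_t demo_cached_objects = ATOMIC_LONG_INIT(100);

/* Tell the VM how many objects we could free if asked. */
static unsigned long demo_count(struct shrinker *shrink,
				struct shrink_control *sc)
{
	return atomic_long_read(&demo_cached_objects);
}

/* "Free" up to sc->nr_to_scan objects and report how many actually went. */
static unsigned long demo_scan(struct shrinker *shrink,
			       struct shrink_control *sc)
{
	unsigned long freed = min_t(unsigned long, sc->nr_to_scan,
				    atomic_long_read(&demo_cached_objects));

	atomic_long_sub(freed, &demo_cached_objects);
	return freed;
}

static struct shrinker demo_shrinker = {
	.count_objects = demo_count,
	.scan_objects  = demo_scan,
	.seeks         = DEFAULT_SEEKS,
};

static int __init demo_init(void)
{
	return register_shrinker(&demo_shrinker);
}

static void __exit demo_exit(void)
{
	unregister_shrinker(&demo_shrinker);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");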

[PATCH] btrfs-progs: doc: fix format of btrfs-replace

2015-01-07 Thread Tsutomu Itoh
Current 'man btrfs-replace' is as follows:


...
...
   -f
   force using and overwriting  even if it looks like
   containing a valid btrfs filesystem.

   A valid filesystem is assumed if a btrfs superblock is found
   which contains a correct checksum. Devices which are currently
   mounted are never allowed to be used as the . -B
   no background replace.
...
...


The formatting of the '-B' option is wrong, so fix it.

Signed-off-by: Tsutomu Itoh  
---
NOTE: This patch based on v3.18.x branch.

 Documentation/btrfs-replace.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/btrfs-replace.txt b/Documentation/btrfs-replace.txt
index 7402484..e8eac2c 100644
--- a/Documentation/btrfs-replace.txt
+++ b/Documentation/btrfs-replace.txt
@@ -52,6 +52,7 @@ containing a valid btrfs filesystem.
 A valid filesystem is assumed if a btrfs superblock is found which contains a
 correct checksum. Devices which are currently mounted are
 never allowed to be used as the .
++
 -B
 no background replace.
 
-- 
2.2.1



Re: kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123!

2015-01-07 Thread Tomasz Chmielewski

On 2015-01-07 15:58, Satoru Takeuchi wrote:


Create subvolume './subvolume'
# dd if=/dev/urandom of=bigfile.img bs=64k


Is it really this command? I think it would fill up the
whole /dev/vdb.


It normally would fill the fs if left for long, but I've pressed ctrl+c 
after about 6 GB.




And is it not subvolume/bigfile.img
but bigfile.img?


If I recall correctly, it was not inside the subvolume.


(...)


3127377920 bytes (3.1 GB) copied, 194.641 s, 16.1 MB/s


If bigfile.img is just under /mnt/test, I can't understand
why this command succeeded in writing a further 3 GiB.


The previous command wrote 6 GB, this one wrote 3.1 GB - there was still 
plenty of free space.



(...)


# dd if=/dev/urandom of=bigfile3.img bs=64k
^C3617580+0 records in
3617579+0 records out
237081657344 bytes (237 GB) copied, 14796 s, 16.0 MB/s


Same for this one, too.


This one was also left running for long, followed by ctrl+c (note the ^C 
in my pasted output).

We didn't fill the fs 100% in any of these cases.



# df -h
Filesystem  Size  Used Avail Use% Mounted on
(...)
/dev/vdb256G  230G   25G  91% /mnt/test


# btrfs qgroup show /mnt/test
qgroupid rfer          excl
-------- ----          ----
0/5      16384         16384
0/257    245960245248  245960245248

# ls -l
total 240451584
-rw-r--r-- 1 root root   3127377920 Dec 19 20:06 bigfile2.img
-rw-r--r-- 1 root root 237081657344 Dec 20 00:15 bigfile3.img
-rw-r--r-- 1 root root   6013386752 Dec 19 20:02 bigfile.img

# rm bigfile3.img

# sync

# dmesg
(...)
[   95.055420] BTRFS: device fsid 97f98279-21e7-4822-89be-3aed9dc05f2c 
devid 1 transid 3 /dev/vdb

[  118.446509] BTRFS info (device vdb): disk space caching is enabled
[  118.446518] BTRFS: flagging fs with big metadata feature
[  118.452176] BTRFS: creating UUID tree
[  575.189412] BTRFS info (device vdb): qgroup scan completed
[15948.234826] [ cut here ]
[15948.234883] kernel BUG at 
/home/apw/COD/linux/fs/btrfs/inode.c:3123!

[15948.234906] invalid opcode:  [#1] SMP
[15948.234925] Modules linked in: nf_log_ipv6 ip6t_REJECT 
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
ip6_tables nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 
xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables dm_crypt btrfs xor crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel ppdev aes_x86_64 lrw 
raid6_pq gf128mul glue_helper ablk_helper cryptd serio_raw mac_hid 
pvpanic 8250_fintek parport_pc i2c_piix4 lp parport psmouse qxl ttm 
floppy drm_kms_helper drm
[15948.235172] CPU: 0 PID: 3274 Comm: btrfs-cleaner Not tainted 
3.18.1-031801-generic #201412170637
[15948.235193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 
04/01/2014
[15948.235222] task: 880036708a00 ti: 88007b97c000 task.ti: 
88007b97c000
[15948.235240] RIP: 0010:[]  [] 
btrfs_orphan_add+0x1a9/0x1c0 [btrfs]

[15948.235305] RSP: 0018:88007b97fc98  EFLAGS: 00010286
[15948.235318] RAX: ffe4 RBX: 88007b80a800 RCX: 

[15948.235333] RDX: 219e RSI: 0004 RDI: 
880079418138
[15948.235349] RBP: 88007b97fcd8 R08: 88007fc1cae0 R09: 
88007ad272d0
[15948.235366] R10:  R11: 0010 R12: 
88007a2d9500
[15948.235381] R13: 8800027d60e0 R14: 88007b80ac58 R15: 
0001
[15948.235401] FS:  () GS:88007fc0() 
knlGS:

[15948.235418] CS:  0010 DS:  ES:  CR0: 80050033
[15948.235432] CR2: 7f0489ff CR3: 7a5e CR4: 
001407f0

[15948.235464] Stack:
[15948.235473]  88007b97fcd8 c0497acf 88007b809800 
88003c207400
[15948.235498]  88007b809800 88007ad272d0 88007a2d9500 
0001
[15948.235521]  88007b97fd58 c04412e0 880079418000 
0004c0427fea

[15948.235551] Call Trace:
[15948.235601]  [] ? 
lookup_free_space_inode+0x4f/0x100 [btrfs]
[15948.235642]  [] 
btrfs_remove_block_group+0x140/0x490 [btrfs]
[15948.235693]  [] btrfs_remove_chunk+0x245/0x380 
[btrfs]
[15948.235731]  [] 
btrfs_delete_unused_bgs+0x236/0x270 [btrfs]
[15948.235771]  [] cleaner_kthread+0x12c/0x190 
[btrfs]
[15948.235806]  [] ? 
btrfs_destroy_all_delalloc_inodes+0x120/0x120 [btrfs]

[15948.235844]  [] kthread+0xc9/0xe0
[15948.235872]  [] ? flush_kthread_worker+0x90/0x90
[15948.235900]  [] ret_from_fork+0x7c/0xb0
[15948.235919]  [] ? flush_kthread_worker+0x90/0x90
[15948.235933] Code: e8 7d a1 fc ff 8b 45 c8 e9 6d ff ff ff 0f 1f 44 
00 00 f0 41 80 65 80 fd 4c 89 ef 89 45 c8 e8 cf 20 fe ff 8b 45 c8 e9 
48 ff ff ff <0f> 0b 4c 89 f7 45 31 f6 e8 8a a2 35 c1 e9 f9 fe ff ff 0f 
1f 44
[15948.236017] RIP  [] btrfs_orphan_add+0x1a9/0x1c0 
[btrfs]

[15948.236017]  RSP 
[15948.761942] ---[ end trace 0ccd21c265dce56b ]---

# ls
bigfile2.img  bigfile.img

# touch 1
(...never returned...)



Tomasz 

Re: [PATCH] btrfs-progs: Fix a copy-n-paste bug in btrfs_read_fs_root().

2015-01-07 Thread Satoru Takeuchi
On 2015/01/07 18:23, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 

Reviewed-by: Satoru Takeuchi 

> ---
>   disk-io.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/disk-io.c b/disk-io.c
> index 2bf8586..b853f66 100644
> --- a/disk-io.c
> +++ b/disk-io.c
> @@ -693,7 +693,7 @@ struct btrfs_root *btrfs_read_fs_root(struct 
> btrfs_fs_info *fs_info,
>   if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
>   return fs_info->csum_root;
>   if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
> - return fs_info->csum_root;
> + return fs_info->quota_root;
>   
>   BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
>  location->offset != (u64)-1);
> 



Re: price to pay for nocow file bit?

2015-01-07 Thread Josef Bacik

On 01/07/2015 04:05 PM, Goffredo Baroncelli wrote:



I am trying to understand the pros and cons of turning this bit
on, before I can make this change. So far I see one big pro, but I
wonder if there's any major con I should think about?



Nope there's no real con other than you don't get csums, but that
doesn't really matter for you.  Thanks,


In a btrfs-raid setup, in case of a corrupted sector, is BTRFS able to
rebuild the sector?
I suppose not; if so, this has to be added to the cons, I think.



It won't know it's corrupted, but it can rebuild if, say, you yank a drive 
and add a new one.  RAID5/RAID6 would catch corruption of course.  Thanks,


Josef


Re: BTRFS free space handling still needs more work: Hangs again (no complete lockups, "just" tasks stuck for some time)

2015-01-07 Thread Zygo Blaxell
On Wed, Jan 07, 2015 at 08:08:50PM +0100, Martin Steigerwald wrote:
> Am Dienstag, 6. Januar 2015, 15:03:23 schrieb Zygo Blaxell:
> > ext3 has a related problem when it's nearly full:  it will try to search
> > gigabytes of block allocation bitmaps searching for a free block, which
> > can result in a single 'mkdir' call spending 45 minutes reading a large
> > slow 99.5% full filesystem.
> 
> Ok, that's for bitmap access. Ext4 uses extents.

...and the problem doesn't happen to the same degree on ext4 as it did
on ext3.

> > So far I've found that problems start when space drops below 1GB free
> > (although it can go as low as 400MB) and problems stop when space gets
> > above 1GB free, even without resizing or balancing the filesystem.
> > I've adjusted free space monitoring thresholds accordingly for now,
> > and it seems to be keeping things working so far.
> 
> Just to see whether we are on the same terms: You talk about space that BTRFS 
> has not yet reserved for chunks, i.e. the difference between size and used in 
> btrfs fi sh, right?

The number I look at for this issue is statvfs() f_bavail (i.e. the
"Available" column of /bin/df).

Before the empty-chunk-deallocation code, most of my filesystems would
quickly reach a steady state where all space is allocated to chunks,
and they stay that way unless I have to downsize them.

Now there is free (non-chunk) space on most of my filesystems.  I'll try
monitoring btrfs fi df and btrfs fi show under the failing conditions
and see if there are interesting correlations.





Re: price to pay for nocow file bit?

2015-01-07 Thread Goffredo Baroncelli
> 
>> I am trying to understand the pros and cons of turning this bit
>> on, before I can make this change. So far I see one big pro, but I
>> wonder if there's any major con I should think about?
>> 
> 
> Nope there's no real con other than you don't get csums, but that
> doesn't really matter for you.  Thanks,

In a btrfs-raid setup, in case of a corrupted sector, is BTRFS able to 
rebuild the sector?
I suppose not; if so, this has to be added to the cons, I think.

From my tests [1][2] I was unable to see a big difference between doing a defrag 
and setting chattr -C on the log directory. Did you get other results? If so, I am 
interested to know more.

BR
G.Baroncelli



[1] http://kreijack.blogspot.it/2014/06/btrfs-and-systemd-journal.html
[2] http://lists.freedesktop.org/archives/systemd-devel/2014-June/020141.html


Re: price to pay for nocow file bit?

2015-01-07 Thread Josef Bacik

On 01/07/2015 12:43 PM, Lennart Poettering wrote:

Heya!

Currently, systemd-journald's disk access patterns (appending to the
end of files, then updating a few pointers in the front) result in
awfully fragmented journal files on btrfs, which has a pretty
negative effect on performance when accessing them.



I've been wondering if mount -o autodefrag would deal with this problem 
but I haven't had the chance to look into it.



Now, to improve things a bit, I yesterday made a change to journald,
to issue the btrfs defrag ioctl when a journal file is rotated,
i.e. when we know that no further writes will be ever done on the
file.

However, I wonder now if I should go one step further even, and use
the equivalent of "chattr -C" (i.e. nocow) on all journal files. I am
wondering what price I would precisely have to pay for
that. Judging by this earlier thread:

 
https://urldefense.proofpoint.com/v1/url?u=http://www.spinics.net/lists/linux-btrfs/msg33134.html&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=ODekp6cRJncqEDXqNoiRQ1kLtNawlAzzBmNPpCF7hIw%3D%0A&s=3868518396650e6542b0189719e11f9c490e400c5205c29a20db0b699969c414

it's mostly about data integrity, which is something I can live with,
given the conservative write patterns of journald, and the fact that
we do our own checksumming and careful data validation. I mean, if
btrfs in this mode provides no worse data integrity semantics than
ext4 I am fully fine with losing this feature for these files.



Yup its no worse than ext4.


Hence I am mostly interested in what else is lost if this flag is
turned on by default for all journal files journald creates:

Does this have any effect on functionality? As I understood snapshots
still work fine for files marked like that, and so do
reflinks. Any drawback functionality-wise? Apparently file compression
support is lost if the bit is set? (which I can live with too, journal
files are internally compressed anyway)



Yeah no compression, no checksums.  If you do reflink then you'll COW 
once and then the new COW will be nocow so it'll be fine.  Same goes for 
snapshots.  So you'll likely incur some fragmentation but less than 
before, but I'd measure to just make sure if it's that big of a deal.



What about performance? Do any operations get substantially slower by
setting this bit? For example, what happens if I take a snapshot of
files with this bit set and then modify the file, does this result in
a full (and hence slow) copy of the file on that occasion?



Performance is the same.


I am trying to understand the pros and cons of turning this bit on,
before I can make this change. So far I see one big pro, but I wonder
if there's any major con I should think about?



Nope there's no real con other than you don't get csums, but that 
doesn't really matter for you.  Thanks,


Josef


Re: BTRFS free space handling still needs more work: Hangs again (no complete lockups, "just" tasks stuck for some time)

2015-01-07 Thread Martin Steigerwald
Am Dienstag, 6. Januar 2015, 15:03:23 schrieb Zygo Blaxell:
> On Mon, Dec 29, 2014 at 10:32:00AM +0100, Martin Steigerwald wrote:
> > Am Sonntag, 28. Dezember 2014, 21:07:05 schrieb Zygo Blaxell:
> > > On Sat, Dec 27, 2014 at 08:23:59PM +0100, Martin Steigerwald wrote:
[…]
> > > > Zygo, what are the characteristics of your filesystem? Do you use
> > > > compress=lzo and skinny metadata as well? How are the chunks
> > > > allocated?
> > > > What kind of data you have on it?
> > > 
> > > compress-force (default zlib), no skinny-metadata.  Chunks are d=single,
> > > m=dup.  Data is a mix of various desktop applications, most active
> > > file sizes from a few hundred K to a few MB, maybe 300k-400k files.
> > > No database or VM workloads.  Filesystem is 100GB and is usually between
> > > 98 and 99% full (about 1-2GB free).
> > > 
> > > I have another filesystem which has similar problems when it's 99.99%
> > > full (it's 13TB, so 0.01% is 1.3GB).  That filesystem is RAID1 with
> > > skinny-metadata and no-holes.
> > > 
> > > On various filesystems I have the above CPU-burning problem, a bunch of
> > > irreproducible random crashes, and a hang with a kernel stack that goes
> > > through SyS_unlinkat and btrfs_evict_inode.
> > 
> > Zygo, thanks. That desktop filesystem sounds a bit similar to my usecase,
> > with the interesting difference that you have no databases or VMs on it.
> > 
> > That said, I use the Windows XP rarely, but using it was what made the
> > issue so visible for me. Is your desktop filesystem on SSD?
> 
> No, but I recently stumbled across the same symptoms on an 8GB SD card
> on kernel 3.12.24 (raspberry pi).  When the filesystem hit over ~97%
> full, all accesses were blocked for several minutes.  I was able to
> work around it by adjusting the threshold on a garbage collector daemon
> (i.e. deleting a lot of expendable files) to keep usage below 90%.
> I didn't try to balance the filesystem, and didn't seem to need to.

Interesting.

> ext3 has a related problem when it's nearly full:  it will try to search
> gigabytes of block allocation bitmaps searching for a free block, which
> can result in a single 'mkdir' call spending 45 minutes reading a large
> slow 99.5% full filesystem.

Ok, that's for bitmap access. Ext4 uses extents. BTRFS can use bitmaps as well, 
but also supports extents, and I think uses them for most use cases.

> I'd expect a btrfs filesystem that was nearly full to have a small tree
> of cached free space extents and be able to search it quickly even if
> the result is negative (i.e. there's no free space).  It seems to be
> doing something else... :-P

Yeah :)


> > Do you have the chance to extend one of the affected filesystems to check
> > my theory that this does not happen as long as BTRFS can still allocate
> > new
> > data chunks? If its right, your FS should be fluent again as long as you
> > see more than 1 GiB free
> > 
> > Label: none  uuid: 53bdf47c-4298-45bc-a30f-8a310c274069
> > 
> > Total devices 2 FS bytes used 512.00KiB
> > devid1 size 10.00GiB used 6.53GiB path
> > /dev/mapper/sata-btrfsraid1
> > devid2 size 10.00GiB used 6.53GiB path
> > /dev/mapper/msata-btrfsraid1
> > 
> > between "size" and "used" in btrfs fi sh. I suggest going with at least
> > 2-3
> > GiB, as BTRFS may allocate just one chunk so quickly that you do not have
> > the chance to recognize the difference.
> 
> So far I've found that problems start when space drops below 1GB free
> (although it can go as low as 400MB) and problems stop when space gets
> above 1GB free, even without resizing or balancing the filesystem.
> I've adjusted free space monitoring thresholds accordingly for now,
> and it seems to be keeping things working so far.

Just to see whether we are on the same terms: You talk about space that BTRFS 
has not yet reserved for chunks, i.e. the difference between size and used in 
btrfs fi sh, right?

No BTRFS developers commented yet on this, neither in this thread nor in the 
bug report at kernel.org I made.

> > Well, and if thats works for you, we are back to my recommendation:
> > 
> > More so than with other filesystems, give BTRFS plenty of free space to
> > operate with. Ideally enough that you always have a minimum of 2-3 GiB of
> > unused device space left for chunk reservation. One could even write a
> > Nagios/Icinga monitoring plugin for that :)

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7



ssd mode on rotational media

2015-01-07 Thread Kyle Gates
What issues would arise if ssd mode is activated because the block layer 
sets the rotational flag to zero? This happens for me running btrfs on 
bcache. Would it be beneficial to pass the nossd mount option?
Thanks,
Kyle


price to pay for nocow file bit?

2015-01-07 Thread Lennart Poettering
Heya!

Currently, systemd-journald's disk access patterns (appending to the
end of files, then updating a few pointers in the front) result in
awfully fragmented journal files on btrfs, which has a pretty
negative effect on performance when accessing them.

Now, to improve things a bit, I yesterday made a change to journald,
to issue the btrfs defrag ioctl when a journal file is rotated,
i.e. when we know that no further writes will be ever done on the
file. 
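
For reference, a minimal sketch of what such a post-rotation defrag call 
looks like, using BTRFS_IOC_DEFRAG from the uapi <linux/btrfs.h> header (an 
illustration, not the actual journald change):

/* defrag-one.c: ask btrfs to defragment a single (rotated) file. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <rotated-journal-file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR | O_CLOEXEC);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* NULL argument means "whole file, default options"; on a
	 * non-btrfs filesystem this simply fails with ENOTTY. */
	if (ioctl(fd, BTRFS_IOC_DEFRAG, NULL) < 0) {
		perror("BTRFS_IOC_DEFRAG");
		return 1;
	}

	return 0;
}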

However, I wonder now if I should go one step further even, and use
the equivalent of "chattr -C" (i.e. nocow) on all journal files. I am
wondering what price I would precisely have to pay for
that. Judging by this earlier thread:

http://www.spinics.net/lists/linux-btrfs/msg33134.html

it's mostly about data integrity, which is something I can live with,
given the conservative write patterns of journald, and the fact that
we do our own checksumming and careful data validation. I mean, if
btrfs in this mode provides no worse data integrity semantics than
ext4 I am fully fine with losing this feature for these files.

Hence I am mostly interested in what else is lost if this flag is
turned on by default for all journal files journald creates: 

Does this have any effect on functionality? As I understood snapshots
still work fine for files marked like that, and so do
reflinks. Any drawback functionality-wise? Apparently file compression
support is lost if the bit is set? (which I can live with too, journal
files are internally compressed anyway)

What about performance? Do any operations get substantially slower by
setting this bit? For example, what happens if I take a snapshot of
files with this bit set and then modify the file, does this result in
a full (and hence slow) copy of the file on that occasion? 

I am trying to understand the pros and cons of turning this bit on,
before I can make this change. So far I see one big pro, but I wonder
if there's any major con I should think about?

Thanks,

Lennart


Re: btrfs_inode_item's otime?

2015-01-07 Thread Christoph Hellwig
On Wed, Jan 07, 2015 at 02:57:35PM +0100, Lennart Poettering wrote:
> Exposing this as an xattr sounds great to me too.

NAK - exposing random stat data as xattrs only creates problems.

Given that we don't seem to be able to get a new stat format anytime
soon, we should add a generic ioctl to expose it, reading it from struct
kstat, which all filesystems that support this attribute should fill out.
And there are quite a lot of them.


Re: btrfs_inode_item's otime?

2015-01-07 Thread Lennart Poettering
On Tue, 06.01.15 19:26, David Sterba (dste...@suse.cz) wrote:

> > (Of course, even without xstat(), I think it would be good to have an
> > unprivileged ioctl to query the otime in btrfs... the TREE_SEARCH
> > ioctl after all requires privileges...)
> 
> Adding this interface is a different question. I do not like to add
> ioctls that do too specialized things that normally fit into a generic
> interface like the xstat example. We could use the object properties
> instead (ie. export the otime as an extended attribute), but the work on
> that has stalled and it's not ready to just simply add the otime in
> advance.

Exposing this as an xattr sounds great to me too.

Lennart

-- 
Lennart Poettering, Red Hat


Re: Data recovery after RBD I/O error

2015-01-07 Thread Austin S Hemmelgarn

On 2015-01-06 23:11, Jérôme Poulin wrote:

On Mon, Jan 5, 2015 at 6:59 AM, Austin S Hemmelgarn
 wrote:

Secondly, I would highly recommend not using ANY non-cluster-aware FS on top
of a clustered block device like RBD



For my use-case, this is just a single server using the RBD device. No
clustering involved on the BTRFS side of thing.
My only point is that there isn't anything in BTRFS to handle it 
accidentally being multiply mounted.  Ext* for example aren't clustered, 
but do have an optional feature to prevent multiple mounting.

However, it was really useful to take snapshots (just like LVM) before 
modifying the
filesystem in any way.

Have you tried Ceph's built in snapshot support?  I don't remember how 
to use it, but I do know it is there (at least, it is in the most recent 
versions), and it is a bit more like LVM's snapshots than BTRFS is.






Re: BTRFS_IOC_TREE_SEARCH ioctl

2015-01-07 Thread Lennart Poettering
On Mon, 05.01.15 19:14, Nehemiah Dacres (vivacar...@gmail.com) wrote:

> Is libbtrfs documented or even stable yet? What stage of development is it
> in anyway? is there a design spec yet?

Note that the code we use in systemd is not based on libbtrfs, we just
call the ioctls directly.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [PATCH] btrfs: get the accurate value of used_bytes in btrfs_get_block_group_info().

2015-01-07 Thread Dongsheng Yang

On 01/07/2015 05:22 PM, Qu Wenruo wrote:

Hi Satoru-san

Hi Dongsheng,

On 2015/01/05 20:19, Dongsheng Yang wrote:


Ping.

The BTRFS_IOC_SPACE_INFO ioctl currently does not report
to the user the data that is used but not yet synced.  So btrfs fi df will
give the user wrong numbers before a sync. This patch solves
this problem.

On 10/27/2014 08:38 PM, Dongsheng Yang wrote:

Reproducer:
# mkfs.btrfs -f -b 20G /dev/sdb
# mount /dev/sdb /mnt/test
# fallocate  -l 17G /mnt/test/largefile
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=6.00GiB <- only 6G, but 
actually it should be 17G.

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B


I tried to reproduce your problem with 3.19-rc1.
However, this problem doesn't happen. Could you
also try to reproduce with the upstream kernel?
I can still reproduce it in 3.18, but it seems to be fixed in 3.19-rc1 
already by another patch,

so this patch is no longer needed.


Oops, my fault. I forgot to test it with upstream. :(

Satoru and Qu, thanx a lot.

Yang


Thanks,
Qu


* Detail

test script (named "yang-test.sh" here):
=== 


#!/bin/bash -x

PART1=/dev/vdb
MNT_PNT=./mnt
mkfs.btrfs -f -b 20G ${PART1}
mount ${PART1} ${MNT_PNT}
fallocate -l 17G ${MNT_PNT}/largefile
btrfs fi df ${MNT_PNT}
sync
btrfs fi df ${MNT_PNT}
umount ${MNT_PNT}
=== 



Result:
=== 


# ./yang-test.sh
+ PART1=/dev/vdb
+ MNT_PNT=./mnt
+ mkfs.btrfs -f -b 20G /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per 
file to 65536

fs created label (null) on /dev/vdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 20.00GiB
+ mount /dev/vdb ./mnt
+ fallocate -l 17G ./mnt/largefile
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB   # Used 17GiB properly
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ sync
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB# (of course) used 
17GiB too

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ umount ./mnt
=== 



Although I ran this test five times, the results are the same.

Thanks,
Satoru


# sync
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=17.00GiB <- After sync, it 
is expected.

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

The value of 6.00GiB is actually calculated in 
btrfs_get_block_group_info()
by adding the @block_group->item->used for each group together. In 
this way,

it did not consider the bytes in cache.

This patch adds the value of @pinned, @reserved and @bytes_super in
struct btrfs_block_group_cache to make sure we can get the accurate 
@used_bytes.


Reported-by: Qu Wenruo 
Signed-off-by: Dongsheng Yang 
---
  fs/btrfs/ioctl.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 33c80f5..bc2aaeb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3892,6 +3892,10 @@ void btrfs_get_block_group_info(struct 
list_head *groups_list,

  space->total_bytes += block_group->key.offset;
  space->used_bytes +=
btrfs_block_group_used(&block_group->item);
+/* Add bytes-info in cache */
+space->used_bytes += block_group->pinned;
+space->used_bytes += block_group->reserved;
+space->used_bytes += block_group->bytes_super;
  }
  }




.





Re: [PATCH] btrfs: get the accurate value of used_bytes in btrfs_get_block_group_info().

2015-01-07 Thread Qu Wenruo

Hi Satoru-san

Hi Dongsheng,

On 2015/01/05 20:19, Dongsheng Yang wrote:


Ping.

The BTRFS_IOC_SPACE_INFO ioctl currently does not report
to the user the data that is used but not yet synced.  So btrfs fi df will
give the user wrong numbers before a sync. This patch solves
this problem.

On 10/27/2014 08:38 PM, Dongsheng Yang wrote:

Reproducer:
# mkfs.btrfs -f -b 20G /dev/sdb
# mount /dev/sdb /mnt/test
# fallocate  -l 17G /mnt/test/largefile
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=6.00GiB <- only 6G, but 
actually it should be 17G.

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B


I tried to reproduce your problem with 3.19-rc1.
However, this problem doesn't happen. Could you
also try to reproduce with the upstream kernel?
I can still reproduce it in 3.18, but it seems to be fixed in 3.19-rc1 
already by another patch,

so this patch is no longer needed.

Thanks,
Qu


* Detail

test script (named "yang-test.sh" here):
=== 


#!/bin/bash -x

PART1=/dev/vdb
MNT_PNT=./mnt
mkfs.btrfs -f -b 20G ${PART1}
mount ${PART1} ${MNT_PNT}
fallocate -l 17G ${MNT_PNT}/largefile
btrfs fi df ${MNT_PNT}
sync
btrfs fi df ${MNT_PNT}
umount ${MNT_PNT}
=== 



Result:
=== 


# ./yang-test.sh
+ PART1=/dev/vdb
+ MNT_PNT=./mnt
+ mkfs.btrfs -f -b 20G /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per 
file to 65536

fs created label (null) on /dev/vdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 20.00GiB
+ mount /dev/vdb ./mnt
+ fallocate -l 17G ./mnt/largefile
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB   # Used 17GiB properly
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ sync
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB# (of course) used 
17GiB too

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ umount ./mnt
=== 



Although I ran this test five times, the results are the same.

Thanks,
Satoru


# sync
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=17.00GiB <- After sync, it is 
expected.

System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

The value of 6.00GiB is actually calculated in 
btrfs_get_block_group_info()
by adding the @block_group->item->used for each group together. In 
this way,

it did not consider the bytes in cache.

This patch adds the value of @pinned, @reserved and @bytes_super in
struct btrfs_block_group_cache to make sure we can get the accurate 
@used_bytes.


Reported-by: Qu Wenruo 
Signed-off-by: Dongsheng Yang 
---
  fs/btrfs/ioctl.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 33c80f5..bc2aaeb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3892,6 +3892,10 @@ void btrfs_get_block_group_info(struct 
list_head *groups_list,

  space->total_bytes += block_group->key.offset;
  space->used_bytes +=
btrfs_block_group_used(&block_group->item);
+/* Add bytes-info in cache */
+space->used_bytes += block_group->pinned;
+space->used_bytes += block_group->reserved;
+space->used_bytes += block_group->bytes_super;
  }
  }




[PATCH] btrfs-progs: Fix a copy-n-paste bug in btrfs_read_fs_root().

2015-01-07 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disk-io.c b/disk-io.c
index 2bf8586..b853f66 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -693,7 +693,7 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info 
*fs_info,
if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
return fs_info->csum_root;
if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
-   return fs_info->csum_root;
+   return fs_info->quota_root;
 
BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
   location->offset != (u64)-1);
-- 
2.2.1



[RFC PATCH] Btrfs: use asynchronous submit for large DIO io in single profile

2015-01-07 Thread Liu Bo
Commit 1ae399382512 ("Btrfs: do not use async submit for small DIO io's")
benefits small DIO io's.

However, if we're using the SINGLE profile, this also affects large DIO io's,
since in that case map_length is (chunk_length - bio's offset_in_chunk),
which is fairly large and thus very likely to be larger than even a large bio's
size, so asynchronous submit is avoided.
For instance, if we have a 512k bio, the effort of calculating (512k/4k=128)
checksums will be borne by the DIO task.

This adds a limit, 'BTRFS_STRIPE_LEN', to decide whether the bio is small enough
to avoid asynchronous submit.

Still, in this case we don't need to split the bio and can submit it directly.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e687bb0..c640d7e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7792,6 +7792,7 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
int nr_pages = 0;
int ret;
int async_submit = 0;
+   u64 alloc_profile;
 
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -7799,15 +7800,26 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
if (ret)
return -EIO;
 
+   alloc_profile = btrfs_get_alloc_profile(root, 1);
+
if (map_length >= orig_bio->bi_iter.bi_size) {
bio = orig_bio;
dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED;
+
+   /*
+* In the case of 'single' profile, the above check is very
+* likely to be true as map_length is (chunk_length - offset),
+* so checking BTRFS_STRIPE_LEN here.
+*/
+   if ((alloc_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 &&
+   orig_bio->bi_iter.bi_size >= BTRFS_STRIPE_LEN)
+   async_submit = 1;
+
goto submit;
}
 
/* async crcs make it difficult to collect full stripe writes. */
-   if (btrfs_get_alloc_profile(root, 1) &
-   (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6))
+   if (alloc_profile & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6))
async_submit = 0;
else
async_submit = 1;
-- 
1.8.1.4



Re: [PATCH] btrfs: get the accurate value of used_bytes in btrfs_get_block_group_info().

2015-01-07 Thread Satoru Takeuchi

Hi Dongsheng,

On 2015/01/05 20:19, Dongsheng Yang wrote:


Ping.

The BTRFS_IOC_SPACE_INFO ioctl currently does not report
to the user the data that is used but not yet synced.  So btrfs fi df will
give the user wrong numbers before a sync. This patch solves
this problem.

On 10/27/2014 08:38 PM, Dongsheng Yang wrote:

Reproducer:
# mkfs.btrfs -f -b 20G /dev/sdb
# mount /dev/sdb /mnt/test
# fallocate  -l 17G /mnt/test/largefile
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=6.00GiB <- only 6G, but actually it 
should be 17G.
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B


I tried to reproduce your problem with 3.19-rc1.
However, this problem doesn't happen. Could you
also try to reproduce with the upstream kernel?

* Detail

test script (named "yang-test.sh" here):
===
#!/bin/bash -x

PART1=/dev/vdb
MNT_PNT=./mnt
mkfs.btrfs -f -b 20G ${PART1}
mount ${PART1} ${MNT_PNT}
fallocate -l 17G ${MNT_PNT}/largefile
btrfs fi df ${MNT_PNT}
sync
btrfs fi df ${MNT_PNT}
umount ${MNT_PNT}
===

Result:
===
# ./yang-test.sh
+ PART1=/dev/vdb
+ MNT_PNT=./mnt
+ mkfs.btrfs -f -b 20G /dev/vdb
Btrfs v3.17
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
fs created label (null) on /dev/vdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 20.00GiB
+ mount /dev/vdb ./mnt
+ fallocate -l 17G ./mnt/largefile
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB   # Used 17GiB properly
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ sync
+ btrfs fi df ./mnt
Data, single: total=17.01GiB, used=17.00GiB# (of course) used 17GiB too
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
+ umount ./mnt
===

Although I ran this test five times, the results are the same.

Thanks,
Satoru


# sync
# btrfs fi df /mnt/test
Data, single: total=17.49GiB, used=17.00GiB <- After sync, it is expected.
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

The value of 6.00GiB is actually calculated in btrfs_get_block_group_info()
by adding the @block_group->item->used for each group together. In this way,
it did not consider the bytes in cache.

This patch adds the value of @pinned, @reserved and @bytes_super in
struct btrfs_block_group_cache to make sure we can get the accurate @used_bytes.

Reported-by: Qu Wenruo 
Signed-off-by: Dongsheng Yang 
---
  fs/btrfs/ioctl.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 33c80f5..bc2aaeb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3892,6 +3892,10 @@ void btrfs_get_block_group_info(struct list_head 
*groups_list,
  space->total_bytes += block_group->key.offset;
  space->used_bytes +=
  btrfs_block_group_used(&block_group->item);
+/* Add bytes-info in cache */
+space->used_bytes += block_group->pinned;
+space->used_bytes += block_group->reserved;
+space->used_bytes += block_group->bytes_super;
  }
  }

