date:20180302

Re: btrfs space used issue

2018-03-02 Thread Duncan

vinayak hegde posted on Thu, 01 Mar 2018 14:56:46 +0530 as excerpted:

> This will happen over and over again until we have completely
> overwritten the original extent, at which point your space usage will go
> back down to ~302g. We split big extents with cow, so unless you've got
> lots of space to spare or are going to use nodatacow you should probably
> not pre-allocate virt images

Indeed.  Preallocation with COW doesn't make the sense it does on an 
overwrite-in-place filesystem.  Either nocow it and take the penalties 
that brings[1], or configure your app not to preallocate in the first 
place[2].

---
[1] On btrfs, nocow implies no checksumming or transparent compression, 
either.  Also, the nocow attribute needs to be set on the empty file, 
with the easiest way to do that being to set it on the parent directory 
before file creation, so it's inherited by any newly created files/
subdirs within it.

[2] Many apps that preallocate by default have an option to turn 
preallocation off.
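
As a rough illustration of footnote [1], here is a minimal C sketch of
setting the nocow attribute on a parent directory so that files created
in it afterwards inherit it.  The helper name set_nocow_on_dir() is made
up for this example; the ioctls and the FS_NOCOW_FL flag are the generic
Linux inode-flag interface from <linux/fs.h>.  It assumes the directory
lives on a filesystem that supports the flag (such as btrfs) and simply
fails elsewhere.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the nocow attribute on a directory so that
 * files created in it afterwards inherit it.
 * Returns 0 on success, -1 on failure.
 */
int set_nocow_on_dir(const char *dir_path)
{
	int flags = 0;
	int ret = -1;
	int fd;

	assert(dir_path != NULL);

	fd = open(dir_path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read the current inode flags, add nocow, write them back. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
		flags |= FS_NOCOW_FL;
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
			ret = 0;
	}

	close(fd);
	return ret;
}
```

From a shell, "chattr +C /path/to/dir" does the same thing.  Either way
it only helps files created after the attribute is set, which is why the
footnote says to set it on the directory before file creation.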

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/8] btrfs-progs: quota: Add -W option to rescan to wait without starting rescan

2018-03-02 Thread Jeff Mahoney

On 3/2/18 1:59 PM, Nikolay Borisov wrote:
> 
> 
> On  2.03.2018 20:46, je...@suse.com wrote:
>> From: Jeff Mahoney 
>> @@ -135,8 +141,9 @@ static int cmd_quota_rescan(int argc, char **argv)
>>  }
>>  }
>>  
>> -if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
>> -error("switch -w cannot be used with -s");
>> +if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS && wait_for_completion) {
>> +error("switch -%c cannot be used with -s",
>> +  ioctlnum ? 'w' : 'W');
> 
> You can't really distinguish between w/W in this context, since ioctlnum
> will be RESCAN_STATUS. So just hardcode the w/W in the text message itself?

Yep.  Derp.
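
For illustration only, one alternative to hardcoding "w/W" in the
message would be to remember which wait switch was actually given and
print that.  The sketch below is not the btrfs-progs code: the struct
and function names are hypothetical, and it only models the conflict
check discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option state; not the actual cmds-quota.c variables. */
struct rescan_opts {
	int show_status;	/* -s was given */
	char wait_flag;		/* 'w', 'W', or 0 if neither was given */
};

/*
 * Returns 0 if the options are compatible.  Returns -1 and fills msg
 * with an error text if a wait switch was combined with -s.
 */
int check_rescan_opts(const struct rescan_opts *opts, char *msg, size_t msg_len)
{
	assert(opts != NULL && msg != NULL && msg_len > 0);

	msg[0] = '\0';
	if (opts->show_status && opts->wait_flag) {
		/* The flag was recorded at parse time, so it is exact here. */
		snprintf(msg, msg_len, "switch -%c cannot be used with -s",
			 opts->wait_flag);
		return -1;
	}
	return 0;
}
```

Because the flag character is stored when the option is parsed, the
message does not depend on ioctlnum at all.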

Thanks,

-Jeff

-- 
Jeff Mahoney
SUSE Labs




[PATCH] Btrfs: send: fix typo in TLV_PUT

2018-03-02 Thread Liu Bo

According to tlv_put()'s prototype, data and attrlen need to be
swapped in the macro, but it seems all callers are already aware of
this misordering and are therefore not affected.

Signed-off-by: Liu Bo 
---
 fs/btrfs/send.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f306c60..ee9ce67 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -611,9 +611,9 @@ static int tlv_put_btrfs_timespec(struct send_ctx *sctx, 
u16 attr,
 }
 
 
-#define TLV_PUT(sctx, attrtype, attrlen, data) \
+#define TLV_PUT(sctx, attrtype, data, attrlen) \
do { \
-   ret = tlv_put(sctx, attrtype, attrlen, data); \
+   ret = tlv_put(sctx, attrtype, data, attrlen); \
if (ret < 0) \
goto tlv_put_failure; \
} while (0)
-- 
2.9.4
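
To see why swapping the parameter names in the macro above does not
change behavior, consider the following toy sketch.  Everything in it
(demo_ctx, demo_tlv_put and the two DEMO_ macros) is made up for
illustration and is not the kernel code; the buffer layout is a
simplified assumption.  The point is only that macro arguments are
passed through by position, so a misnamed parameter list is harmless as
long as callers pass data first and length second.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for the send context: a small output buffer. */
struct demo_ctx {
	unsigned char buf[64];
	int used;
};

/* Same argument order as the tlv_put() prototype: data, then length. */
int demo_tlv_put(struct demo_ctx *ctx, unsigned short attr,
		 const void *data, int len)
{
	if (len < 0 || ctx->used + 4 + len > (int)sizeof(ctx->buf))
		return -1;

	ctx->buf[ctx->used++] = attr & 0xff;
	ctx->buf[ctx->used++] = attr >> 8;
	ctx->buf[ctx->used++] = len & 0xff;
	ctx->buf[ctx->used++] = len >> 8;
	memcpy(ctx->buf + ctx->used, data, len);
	ctx->used += len;
	return 0;
}

/* Old spelling: the names attrlen and data are swapped, the positions are not. */
#define DEMO_TLV_PUT_OLD(ctx, attrtype, attrlen, data) \
	demo_tlv_put(ctx, attrtype, attrlen, data)

/* New spelling: the names now match the prototype. */
#define DEMO_TLV_PUT_NEW(ctx, attrtype, data, attrlen) \
	demo_tlv_put(ctx, attrtype, data, attrlen)
```

A caller that writes DEMO_TLV_PUT_OLD(&ctx, 7, "abc", 3) gets exactly
the same bytes as with DEMO_TLV_PUT_NEW, which matches the claim that
existing callers are not affected.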


[PATCH] Btrfs: replace: cache rbio when rebuild data on missing device

2018-03-02 Thread Liu Bo

Rebuilding on a missing device is the same as recovery: after it's done,
the rbio has data that is consistent with the on-disk data, so it can be
cached to avoid further reads.

Signed-off-by: Liu Bo 
---
 fs/btrfs/raid56.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index dec0907..bb8a3c5 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1986,7 +1986,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
kfree(pointers);
 
 cleanup_io:
-   if (rbio->operation == BTRFS_RBIO_READ_REBUILD) {
+   if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+   rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
/*
 * - In case of two failures, where rbio->failb != -1:
 *
@@ -2008,8 +2009,6 @@ static void __raid_recover_end_io(struct btrfs_raid_bio 
*rbio)
clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
 
rbio_orig_end_io(rbio, err);
-   } else if (rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
-   rbio_orig_end_io(rbio, err);
} else if (err == BLK_STS_OK) {
rbio->faila = -1;
rbio->failb = -1;
-- 
2.9.4


[PATCH] Btrfs: scrub: remove unnecessary variable set

2018-03-02 Thread Liu Bo

Variable "success" is only checked when !sctx->is_dev_replace.

Signed-off-by: Liu Bo 
---
 fs/btrfs/scrub.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index e3203a1..1b5ce2f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1444,7 +1444,6 @@ static int scrub_handle_errored_block(struct scrub_block 
*sblock_to_check)
page_num) != 0) {
btrfs_dev_replace_stats_inc(
&fs_info->dev_replace.num_write_errors);
-   success = 0;
}
} else if (sblock_other) {
ret = scrub_repair_page_from_good_copy(sblock_bad,
-- 
2.9.4


[PATCH] Btrfs: dev-replace: make sure target is identical to source when raid56 rebuild fails

2018-03-02 Thread Liu Bo

In the last step of scrub_handle_errored_block, we try to combine good
copies from all possible mirrors. This works fine for raid1 and raid10,
but not for raid56, as it does a parity rebuild.

If the parity rebuild doesn't come back with correct data that matches
its checksum, then in the replace case we'd rather write what is stored
in the source device than the data calculated from parity.

Signed-off-by: Liu Bo 
---
 fs/btrfs/scrub.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 1b5ce2f..f449dc6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1412,8 +1412,17 @@ static int scrub_handle_errored_block(struct scrub_block 
*sblock_to_check)
if (!page_bad->io_error && !sctx->is_dev_replace)
continue;
 
-   /* try to find no-io-error page in mirrors */
-   if (page_bad->io_error) {
+   if (scrub_is_page_on_raid56(sblock_bad->pagev[0])) {
+   /*
+* In case of dev replace, if raid56 rebuild process
+* didn't work out correct data, then copy the content
+* in sblock_bad to make sure target device is identical
+* to source device, instead of writing garbage data in
+* sblock_for_recheck array to target device.
+*/
+   sblock_other = NULL;
+   } else if (page_bad->io_error) {
+   /* try to find no-io-error page in mirrors */
for (mirror_index = 0;
 mirror_index < BTRFS_MAX_MIRRORS &&
 sblocks_for_recheck[mirror_index].page_count > 0;
-- 
2.9.4


[PATCH] Btrfs: raid56: remove redundant async_missing_raid56

2018-03-02 Thread Liu Bo

async_missing_raid56() is identical to async_read_rebuild().

Signed-off-by: Liu Bo 
---
 fs/btrfs/raid56.c | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index bb8a3c5..efb42dc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2766,24 +2766,8 @@ raid56_alloc_missing_rbio(struct btrfs_fs_info *fs_info, 
struct bio *bio,
return rbio;
 }
 
-static void missing_raid56_work(struct btrfs_work *work)
-{
-   struct btrfs_raid_bio *rbio;
-
-   rbio = container_of(work, struct btrfs_raid_bio, work);
-   __raid56_parity_recover(rbio);
-}
-
-static void async_missing_raid56(struct btrfs_raid_bio *rbio)
-{
-   btrfs_init_work(&rbio->work, btrfs_rmw_helper,
-   missing_raid56_work, NULL, NULL);
-
-   btrfs_queue_work(rbio->fs_info->rmw_workers, &rbio->work);
-}
-
 void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio)
 {
if (!lock_stripe_add(rbio))
-   async_missing_raid56(rbio);
+   async_read_rebuild(rbio);
 }
-- 
2.9.4


[PATCH] Btrfs: scrub: batch rebuild for raid56

2018-03-02 Thread Liu Bo

In the raid56 case, writes and rebuilds always operate in units of
BTRFS_STRIPE_LEN (64K). However, scrub_extent() uses blocksize as its
unit, so the rebuild process may be triggered for every block in the
same stripe.

A typical example is replacing a disappeared disk: all reads on the
disk get -EIO, and every block (4K if blocksize is 4K) goes through
these steps:

scrub_handle_errored_block
  scrub_recheck_block # re-read pages one by one
  scrub_recheck_block # rebuild by calling raid56_parity_recover()
                      # page by page

Although the raid56 stripe cache avoids most reads during rebuild, the
parity recovery calculation (xor or raid6 algorithms) still needs to be
done $(BTRFS_STRIPE_LEN / blocksize) times.

This patch makes it less wasteful by doing raid56 scrub/replace in
units of the stripe length.
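
As a worked example of the cost described above, the following sketch
counts parity recoveries per stripe with and without batching, and shows
a plain xor rebuild of one missing block (the raid5 case only).  The
names DEMO_STRIPE_LEN, parity_recoveries_per_stripe() and xor_rebuild()
are invented for this illustration and are not btrfs functions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STRIPE_LEN (64 * 1024)	/* the 64K stripe length stated above */

/*
 * Number of parity recovery passes needed for one stripe: one per block
 * when rechecking block by block, a single pass when batched.
 */
size_t parity_recoveries_per_stripe(size_t stripe_len, size_t blocksize,
				    int batched)
{
	assert(blocksize > 0 && stripe_len % blocksize == 0);
	return batched ? 1 : stripe_len / blocksize;
}

/*
 * Rebuild a missing block by xor-ing the surviving blocks (the remaining
 * data blocks plus parity).  Single-failure, raid5-style recovery only.
 */
void xor_rebuild(unsigned char *out, const unsigned char *const *survivors,
		 size_t nr_survivors, size_t len)
{
	size_t i, j;

	for (i = 0; i < len; i++) {
		unsigned char v = 0;

		for (j = 0; j < nr_survivors; j++)
			v ^= survivors[j][i];
		out[i] = v;
	}
}
```

With a 4K blocksize the unbatched count is 64K / 4K = 16 recovery passes
per stripe, against a single pass when the whole stripe is handled at
once.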
---
 fs/btrfs/scrub.c | 78 +++-
 1 file changed, 60 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 9882513..e3203a1 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1718,6 +1718,44 @@ static int scrub_submit_raid56_bio_wait(struct 
btrfs_fs_info *fs_info,
return blk_status_to_errno(bio->bi_status);
 }
 
+static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
+ struct scrub_block *sblock)
+{
+   struct scrub_page *first_page = sblock->pagev[0];
+   struct bio *bio = btrfs_io_bio_alloc(BIO_MAX_PAGES);
+   int page_num;
+
+   /* All pages in sblock belongs to the same stripe on the same device. */
+   ASSERT(first_page->dev);
+   if (first_page->dev->bdev == NULL)
+   goto out;
+
+   bio_set_dev(bio, first_page->dev->bdev);
+
+   for (page_num = 0; page_num < sblock->page_count; page_num++) {
+   struct scrub_page *page = sblock->pagev[page_num];
+
+   WARN_ON(!page->page);
+   bio_add_page(bio, page->page, PAGE_SIZE, 0);
+   }
+
+   if (scrub_submit_raid56_bio_wait(fs_info, bio, first_page)) {
+   bio_put(bio);
+   goto out;
+   }
+
+   bio_put(bio);
+
+   scrub_recheck_block_checksum(sblock);
+
+   return;
+out:
+   for (page_num = 0; page_num < sblock->page_count; page_num++)
+   sblock->pagev[page_num]->io_error = 1;
+
+   sblock->no_io_error_seen = 0;
+}
+
 /*
  * this function will check the on disk data for checksum errors, header
  * errors and read I/O errors. If any I/O errors happen, the exact pages
@@ -1733,6 +1771,10 @@ static void scrub_recheck_block(struct btrfs_fs_info 
*fs_info,
 
sblock->no_io_error_seen = 1;
 
+   /* short cut for raid56 */
+   if (!retry_failed_mirror && scrub_is_page_on_raid56(sblock->pagev[0]))
+   return scrub_recheck_block_on_raid56(fs_info, sblock);
+
for (page_num = 0; page_num < sblock->page_count; page_num++) {
struct bio *bio;
struct scrub_page *page = sblock->pagev[page_num];
@@ -1748,19 +1790,12 @@ static void scrub_recheck_block(struct btrfs_fs_info 
*fs_info,
bio_set_dev(bio, page->dev->bdev);
 
bio_add_page(bio, page->page, PAGE_SIZE, 0);
-   if (!retry_failed_mirror && scrub_is_page_on_raid56(page)) {
-   if (scrub_submit_raid56_bio_wait(fs_info, bio, page)) {
-   page->io_error = 1;
-   sblock->no_io_error_seen = 0;
-   }
-   } else {
-   bio->bi_iter.bi_sector = page->physical >> 9;
-   bio_set_op_attrs(bio, REQ_OP_READ, 0);
+   bio->bi_iter.bi_sector = page->physical >> 9;
+   bio_set_op_attrs(bio, REQ_OP_READ, 0);
 
-   if (btrfsic_submit_bio_wait(bio)) {
-   page->io_error = 1;
-   sblock->no_io_error_seen = 0;
-   }
+   if (btrfsic_submit_bio_wait(bio)) {
+   page->io_error = 1;
+   sblock->no_io_error_seen = 0;
}
 
bio_put(bio);
@@ -2728,7 +2763,8 @@ static int scrub_find_csum(struct scrub_ctx *sctx, u64 
logical, u8 *csum)
 }
 
 /* scrub extent tries to collect up to 64 kB for each bio */
-static int scrub_extent(struct scrub_ctx *sctx, u64 logical, u64 len,
+static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map,
+   u64 logical, u64 len,
u64 physical, struct btrfs_device *dev, u64 flags,
u64 gen, int mirror_num, u64 physical_for_dev_replace)
 {
@@ -2737,13 +2773,19 @@ static int scrub_extent(struct scrub_ctx *sctx, u64 
logical, u64 len,
u32 blocksize;
 
if (flags & BTRFS_EXTENT_FLAG_DATA) {
-   blocksize = sctx->fs_info->sectorsize;
+   if (map->type & BTRFS_B

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Filipe Manana

On Fri, Mar 2, 2018 at 7:00 PM, Liu Bo  wrote:
> On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote:
>> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
>> > From: Filipe Manana 
>> >
>> > If we have a file with 2 (or more) hard links in the same directory,
>> > remove one of the hard links, create a new file (or link an existing file)
>> > in the same directory with the name of the removed hard link, and then
>> > finally fsync the new file, we end up with a log that fails to replay,
>> > causing a mount failure.
>> >
>> > Example:
>> >
>> >   $ mkfs.btrfs -f /dev/sdb
>> >   $ mount /dev/sdb /mnt
>> >
>> >   $ mkdir /mnt/testdir
>> >   $ touch /mnt/testdir/foo
>> >   $ ln /mnt/testdir/foo /mnt/testdir/bar
>> >
>> >   $ sync
>> >
>> >   $ unlink /mnt/testdir/bar
>> >   $ touch /mnt/testdir/bar
>
> So here, "unlink & touch" is similar to rename, so we could also
> be conservative and set /mnt/testdir's last_unlink_trans to force a
> transaction commit.

That would cause any fsync on any file inside the directory to result
in a full transaction commit.
In the past we had more "exotic" scenarios, with renames and creating
new files with the same old name, where the initial solution was just
that, forcing a full commit [1], only to find out later that it caused
a big performance drop on real workloads [2]. That's why I'm avoiding
it, especially since fsync'ing any file in the directory (existing or
new) isn't uncommon at all.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56f23fdbb600e6087db7b009775b95ce07cc3195
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44f714dae50a2e795d3268a6831762aa6fa54f55

thanks

>
> Thanks,
>
> -liubo
>

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Filipe Manana

On Fri, Mar 2, 2018 at 6:29 PM, Liu Bo  wrote:
> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> If we have a file with 2 (or more) hard links in the same directory,
>> remove one of the hard links, create a new file (or link an existing file)
>> in the same directory with the name of the removed hard link, and then
>> finally fsync the new file, we end up with a log that fails to replay,
>> causing a mount failure.
>>
>> Example:
>>
>>   $ mkfs.btrfs -f /dev/sdb
>>   $ mount /dev/sdb /mnt
>>
>>   $ mkdir /mnt/testdir
>>   $ touch /mnt/testdir/foo
>>   $ ln /mnt/testdir/foo /mnt/testdir/bar
>>
>>   $ sync
>>
>>   $ unlink /mnt/testdir/bar
>>   $ touch /mnt/testdir/bar
>>   $ xfs_io -c "fsync" /mnt/testdir/bar
>>
>>   
>>
>>   $ mount /dev/sdb /mnt
>>   mount: mount(2) failed: /mnt: No such file or directory
>>
>> When replaying the log, for that example, we also see the following in
>> dmesg/syslog:
>>
>>   [71813.671307] BTRFS info (device dm-0): failed to delete reference to 
>> bar, inode 258 parent 257
>>   [71813.674204] [ cut here ]
>>   [71813.675694] BTRFS: Transaction aborted (error -2)
>>   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
>> __btrfs_unlink_inode+0x17b/0x355 [btrfs]
>>   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
>> ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
>> glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
>> parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
>> zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
>> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
>> multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
>> virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
>> unloaded: btrfs]
>>   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
>> 4.15.0-rc9-btrfs-next-56+ #1
>>   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>>   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
>>   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
>>   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
>> 0001
>>   [71813.679669] RDX:  RSI: 81c14bae RDI: 
>> 
>>   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
>> 0001
>>   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
>> 880217d474c8
>>   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
>> 0101
>>   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
>> knlGS:
>>   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
>>   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
>> 001606e0
>>   [71813.679669] Call Trace:
>>   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
>>   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
>>   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
>>   [71813.679669]  ? __lock_is_held+0x39/0x71
>>   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
>>   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
>>   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
>>   [71813.679669]  ? __lock_is_held+0x39/0x71
>>   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
>>   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
>>   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
>>   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
>>   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
>>   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
>>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>>   [71813.679669]  ? mount_fs+0x64/0x10b
>>   [71813.679669]  mount_fs+0x64/0x10b
>>   [71813.679669]  vfs_kern_mount+0x68/0xce
>>   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
>>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>>   [71813.679669]  ? mount_fs+0x64/0x10b
>>   [71813.679669]  mount_fs+0x64/0x10b
>>   [71813.679669]  vfs_kern_mount+0x68/0xce
>>   [71813.679669]  do_mount+0x6e5/0x973
>>   [71813.679669]  ? memdup_user+0x3e/0x5c
>>   [71813.679669]  SyS_mount+0x72/0x98
>>   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
>>   [71813.679669] RIP: 0033:0x7f7cedf150ba
>>   [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206
>>   [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 
>> 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 
>> <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
>>   [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
>>   [71813.854764] BTR

Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Liu Bo

On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote:
> On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> > 
> > If we have a file with 2 (or more) hard links in the same directory,
> > remove one of the hard links, create a new file (or link an existing file)
> > in the same directory with the name of the removed hard link, and then
> > finally fsync the new file, we end up with a log that fails to replay,
> > causing a mount failure.
> > 
> > Example:
> > 
> >   $ mkfs.btrfs -f /dev/sdb
> >   $ mount /dev/sdb /mnt
> > 
> >   $ mkdir /mnt/testdir
> >   $ touch /mnt/testdir/foo
> >   $ ln /mnt/testdir/foo /mnt/testdir/bar
> > 
> >   $ sync
> > 
> >   $ unlink /mnt/testdir/bar
> >   $ touch /mnt/testdir/bar

So here, "unlink & touch" is similar to rename, so we could also
be conservative and set /mnt/testdir's last_unlink_trans to force a
transaction commit.
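
For reference, the quoted shell steps correspond roughly to the system
call sequence sketched below.  The function names are invented for this
example, and the sketch only performs the operations: the log replay
failure itself additionally needs btrfs and a power failure right after
the final fsync, which this code does not simulate.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an empty file like touch(1); optionally fsync it before closing. */
int touch_file(const char *path, int do_fsync)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);
	int ret = 0;

	if (fd < 0)
		return -1;
	if (do_fsync && fsync(fd) < 0)
		ret = -1;
	if (close(fd) < 0)
		ret = -1;
	return ret;
}

/*
 * Hypothetical helper replaying the steps from the quoted example in a
 * directory that does not exist yet.  Returns 0 if every step succeeded.
 */
int unlink_then_recreate(const char *dir)
{
	char foo[4096], bar[4096];

	assert(dir != NULL);
	if (snprintf(foo, sizeof(foo), "%s/foo", dir) >= (int)sizeof(foo) ||
	    snprintf(bar, sizeof(bar), "%s/bar", dir) >= (int)sizeof(bar))
		return -1;

	if (mkdir(dir, 0755) < 0)		/* mkdir testdir */
		return -1;
	if (touch_file(foo, 0) < 0)		/* touch testdir/foo */
		return -1;
	if (link(foo, bar) < 0)			/* ln testdir/foo testdir/bar */
		return -1;
	sync();					/* sync */
	if (unlink(bar) < 0)			/* unlink testdir/bar */
		return -1;
	return touch_file(bar, 1);		/* touch bar, then fsync it */
}
```

The unlink() followed by creating a new file under the same name is the
"unlink & touch" pair referred to above.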

Thanks,

-liubo


Re: [PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-03-02 Thread Liu Bo

On Wed, Feb 28, 2018 at 03:56:10PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we have a file with 2 (or more) hard links in the same directory,
> remove one of the hard links, create a new file (or link an existing file)
> in the same directory with the name of the removed hard link, and then
> finally fsync the new file, we end up with a log that fails to replay,
> causing a mount failure.
> 
> Example:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ mkdir /mnt/testdir
>   $ touch /mnt/testdir/foo
>   $ ln /mnt/testdir/foo /mnt/testdir/bar
> 
>   $ sync
> 
>   $ unlink /mnt/testdir/bar
>   $ touch /mnt/testdir/bar
>   $ xfs_io -c "fsync" /mnt/testdir/bar
> 
>   
> 
>   $ mount /dev/sdb /mnt
>   mount: mount(2) failed: /mnt: No such file or directory
> 
> When replaying the log, for that example, we also see the following in
> dmesg/syslog:
> 
>   [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, 
> inode 258 parent 257
>   [71813.674204] [ cut here ]
>   [71813.675694] BTRFS: Transaction aborted (error -2)
>   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
> __btrfs_unlink_inode+0x17b/0x355 [btrfs]
>   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
> ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
> glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
> parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
> zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
> multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
> virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
> unloaded: btrfs]
>   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW
> 4.15.0-rc9-btrfs-next-56+ #1
>   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
>   [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286
>   [71813.679669] RAX: 0025 RBX: 880217ce4708 RCX: 
> 0001
>   [71813.679669] RDX:  RSI: 81c14bae RDI: 
> 
>   [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 
> 0001
>   [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 
> 880217d474c8
>   [71813.679669] R13: fffe R14: 88021ccf1548 R15: 
> 0101
>   [71813.679669] FS:  7f7cee84c480() GS:88023fc8() 
> knlGS:
>   [71813.679669] CS:  0010 DS:  ES:  CR0: 80050033
>   [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 
> 001606e0
>   [71813.679669] Call Trace:
>   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
>   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
>   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
>   [71813.679669]  ? __lock_is_held+0x39/0x71
>   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
>   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
>   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
>   [71813.679669]  ? __lock_is_held+0x39/0x71
>   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
>   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
>   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
>   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
>   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
>   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>   [71813.679669]  ? mount_fs+0x64/0x10b
>   [71813.679669]  mount_fs+0x64/0x10b
>   [71813.679669]  vfs_kern_mount+0x68/0xce
>   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
>   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
>   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
>   [71813.679669]  ? mount_fs+0x64/0x10b
>   [71813.679669]  mount_fs+0x64/0x10b
>   [71813.679669]  vfs_kern_mount+0x68/0xce
>   [71813.679669]  do_mount+0x6e5/0x973
>   [71813.679669]  ? memdup_user+0x3e/0x5c
>   [71813.679669]  SyS_mount+0x72/0x98
>   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
>   [71813.679669] RIP: 0033:0x7f7cedf150ba
>   [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206
>   [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 
> 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 
> <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
>   [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
>   [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: 
> errno=-2 No such entry
>   [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_l

Re: [PATCH 1/8] btrfs-progs: quota: Add -W option to rescan to wait without starting rescan

2018-03-02 Thread Nikolay Borisov



On  2.03.2018 20:46, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> This patch adds a new -W option to wait for a rescan without starting a
> new operation.  This is useful for things like xfstests where we want
> to do a "btrfs quota enable" and not continue until the subsequent
> rescan has finished.
> 
> In addition to documenting the new option in the man page, I've cleaned
> up the rescan entry to document the -w option a bit better.
> 
> Signed-off-by: Jeff Mahoney 
> ---
>  Documentation/btrfs-quota.asciidoc | 10 +++---
>  cmds-quota.c   | 21 +++--
>  2 files changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/btrfs-quota.asciidoc 
> b/Documentation/btrfs-quota.asciidoc
> index 85ebf729..0b64a69b 100644
> --- a/Documentation/btrfs-quota.asciidoc
> +++ b/Documentation/btrfs-quota.asciidoc
> @@ -238,15 +238,19 @@ Disable subvolume quota support for a filesystem.
>  *enable* ::
>  Enable subvolume quota support for a filesystem.
>  
> -*rescan* [-s] ::
> +*rescan* [-s|-w|-W] ::
>  Trash all qgroup numbers and scan the metadata again with the current config.
>  +
>  `Options`
>  +
>  -s
> -show status of a running rescan operation.
> +Show status of a running rescan operation.
> +
>  -w
> -wait for rescan operation to finish(can be already in progress).
> +Start rescan operation and wait until it has finished before exiting.  If a 
> rescan is already running, wait until it finishes and then exit without 
> starting a new one.
> +
> +-W
> +Wait for rescan operation to finish and then exit.  If a rescan is not 
> already running, exit silently.
>  
>  EXIT STATUS
>  ---
> diff --git a/cmds-quota.c b/cmds-quota.c
> index 745889d1..fe6376ac 100644
> --- a/cmds-quota.c
> +++ b/cmds-quota.c
> @@ -120,14 +120,20 @@ static int cmd_quota_rescan(int argc, char **argv)
>   int wait_for_completion = 0;
>  
>   while (1) {
> - int c = getopt(argc, argv, "sw");
> + int c = getopt(argc, argv, "swW");
>   if (c < 0)
>   break;
>   switch (c) {
>   case 's':
>   ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
>   break;
> + case 'W':
> + ioctlnum = 0;
> + wait_for_completion = 1;
> + break;
>   case 'w':
> + /* Reset it in case the user did both -W and -w */
> + ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
>   wait_for_completion = 1;
>   break;
>   default:
> @@ -135,8 +141,9 @@ static int cmd_quota_rescan(int argc, char **argv)
>   }
>   }
>  
> - if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
> - error("switch -w cannot be used with -s");
> + if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS && wait_for_completion) {
> + error("switch -%c cannot be used with -s",
> +   ioctlnum ? 'w' : 'W');

You can't really distinguish between w/W in this context, since ioctlnum
will be RESCAN_STATUS. So just hardcode the w/W in the text message itself?
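
To illustrate the point: when the check fires, ioctlnum is RESCAN_STATUS
(non-zero), so the ternary always yields 'w'.  Besides hardcoding the
text, another option is to remember which wait switch was parsed.  A
minimal sketch under assumptions: check_rescan_opts(), show_status and
wait_opt are hypothetical names, and the option string is walked by hand
instead of calling getopt so the example stays self-contained.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: walk a string of option
 * letters and remember which wait switch (if any) was seen.
 * Returns 0 when the combination is acceptable, or the conflicting
 * wait switch ('w' or 'W') when it was combined with -s.
 */
static int check_rescan_opts(const char *opts)
{
	int show_status = 0;	/* set by -s */
	int wait_opt = 0;	/* 0 = no wait switch, else 'w' or 'W' */

	for (; *opts; opts++) {
		switch (*opts) {
		case 's':
			show_status = 1;
			break;
		case 'w':
		case 'W':
			/* the last wait switch given wins */
			wait_opt = *opts;
			break;
		}
	}

	if (show_status && wait_opt) {
		fprintf(stderr, "switch -%c cannot be used with -s\n",
			wait_opt);
		return wait_opt;
	}
	return 0;
}
```

Tracking -s separately also makes the check independent of the order in
which the switches appear on the command line.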

>   return 1;
>   }
>  
> @@ -150,8 +157,10 @@ static int cmd_quota_rescan(int argc, char **argv)
>   if (fd < 0)
>   return 1;
>  
> - ret = ioctl(fd, ioctlnum, &args);
> - e = errno;
> + if (ioctlnum) {
> + ret = ioctl(fd, ioctlnum, &args);
> + e = errno;
> + }
>  
>   if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS) {
>   close_file_or_dir(fd, dirstream);
> @@ -167,7 +176,7 @@ static int cmd_quota_rescan(int argc, char **argv)
>   return 0;
>   }
>  
> - if (ret == 0) {
> + if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN && ret == 0) {
>   printf("quota rescan started\n");
>   fflush(stdout);
>   } else if (ret < 0 && (!wait_for_completion || e != EINPROGRESS)) {
> 

[PATCH 3/8] btrfs-progs: constify pathnames passed as arguments

2018-03-02 Thread jeffm

From: Jeff Mahoney 

It's unlikely we're going to modify a pathname argument, so codify that
and use const.
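
As a small illustration of what the const qualifier buys (path_depth()
is a hypothetical function, not one touched by this patch): a
const char * parameter accepts both string literals and mutable buffers
without a cast, and the compiler rejects accidental writes through it.

```c
#include <stddef.h>

/*
 * Hypothetical example: the function only reads the pathname, so the
 * parameter is declared const.  Counts the '/' separators in path.
 */
static size_t path_depth(const char *path)
{
	size_t depth = 0;

	for (; *path; path++)
		if (*path == '/')
			depth++;
	return depth;
}
```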

Signed-off-by: Jeff Mahoney 
---
 chunk-recover.c | 4 ++--
 cmds-device.c   | 2 +-
 cmds-fi-usage.c | 6 +++---
 cmds-rescue.c   | 4 ++--
 send-utils.c| 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/chunk-recover.c b/chunk-recover.c
index 705bcf52..1d30db51 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1492,7 +1492,7 @@ out:
return ERR_PTR(ret);
 }
 
-static int recover_prepare(struct recover_control *rc, char *path)
+static int recover_prepare(struct recover_control *rc, const char *path)
 {
int ret;
int fd;
@@ -2296,7 +2296,7 @@ static void validate_rebuild_chunks(struct 
recover_control *rc)
 /*
  * Return 0 when successful, < 0 on error and > 0 if aborted by user
  */
-int btrfs_recover_chunk_tree(char *path, int verbose, int yes)
+int btrfs_recover_chunk_tree(const char *path, int verbose, int yes)
 {
int ret = 0;
struct btrfs_root *root = NULL;
diff --git a/cmds-device.c b/cmds-device.c
index 86459d1b..a49c9d9d 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -526,7 +526,7 @@ static const char * const cmd_device_usage_usage[] = {
NULL
 };
 
-static int _cmd_device_usage(int fd, char *path, unsigned unit_mode)
+static int _cmd_device_usage(int fd, const char *path, unsigned unit_mode)
 {
int i;
int ret = 0;
diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index de7ad668..9a1c76ab 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -227,7 +227,7 @@ static int cmp_btrfs_ioctl_space_info(const void *a, const 
void *b)
 /*
  * This function load all the information about the space usage
  */
-static struct btrfs_ioctl_space_args *load_space_info(int fd, char *path)
+static struct btrfs_ioctl_space_args *load_space_info(int fd, const char *path)
 {
struct btrfs_ioctl_space_args *sargs = NULL, *sargs_orig = NULL;
int ret, count;
@@ -305,7 +305,7 @@ static void get_raid56_used(struct chunk_info *chunks, int 
chunkcount,
 #defineMIN_UNALOCATED_THRESH   SZ_16M
 static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
int chunkcount, struct device_info *devinfo, int devcount,
-   char *path, unsigned unit_mode)
+   const char *path, unsigned unit_mode)
 {
struct btrfs_ioctl_space_args *sargs = NULL;
int i;
@@ -931,7 +931,7 @@ static void _cmd_filesystem_usage_linear(unsigned unit_mode,
 static int print_filesystem_usage_by_chunk(int fd,
struct chunk_info *chunkinfo, int chunkcount,
struct device_info *devinfo, int devcount,
-   char *path, unsigned unit_mode, int tabular)
+   const char *path, unsigned unit_mode, int tabular)
 {
struct btrfs_ioctl_space_args *sargs;
int ret = 0;
diff --git a/cmds-rescue.c b/cmds-rescue.c
index c40088ad..c61145bc 100644
--- a/cmds-rescue.c
+++ b/cmds-rescue.c
@@ -32,8 +32,8 @@ static const char * const rescue_cmd_group_usage[] = {
NULL
 };
 
-int btrfs_recover_chunk_tree(char *path, int verbose, int yes);
-int btrfs_recover_superblocks(char *path, int verbose, int yes);
+int btrfs_recover_chunk_tree(const char *path, int verbose, int yes);
+int btrfs_recover_superblocks(const char *path, int verbose, int yes);
 
 static const char * const cmd_rescue_chunk_recover_usage[] = {
"btrfs rescue chunk-recover [options] ",
diff --git a/send-utils.c b/send-utils.c
index b5289e76..8ce94de1 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -28,8 +28,8 @@
 #include "ioctl.h"
 #include "btrfs-list.h"
 
-static int btrfs_subvolid_resolve_sub(int fd, char *path, size_t *path_len,
- u64 subvol_id);
+static int btrfs_subvolid_resolve_sub(int fd, char *path,
+ size_t *path_len, u64 subvol_id);
 
 static int btrfs_get_root_id_by_sub_path(int mnt_fd, const char *sub_path,
 u64 *root_id)
-- 
2.15.1


[PATCH 6/8] btrfs-progs: qgroups: introduce btrfs_qgroup_query

2018-03-02 Thread jeffm

From: Jeff Mahoney 

The only mechanism we have in the progs for searching qgroups is to load
all of them and filter the results.  This works for qgroup show but
to add quota information to 'btrfs subvolume show' it's pretty wasteful.

This patch splits out setting up the search and performing the search so
we can search for a single qgroupid more easily.
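
The shape of that split can be sketched in isolation.  Everything below
is a simplified stand-in, not the btrfs-progs code: struct search_key,
search(), search_all() and search_one() are hypothetical names, and an
in-memory array of IDs takes the place of the tree search ioctl.  The
core routine takes a caller-prepared key, and thin wrappers decide how
wide the key is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a search key: an inclusive ID range. */
struct search_key {
	uint64_t min_id;
	uint64_t max_id;
};

/*
 * Core search: the caller supplies the key.  Copies matching IDs to
 * out (at most out_len of them) and returns how many were copied.
 */
static size_t search(const uint64_t *ids, size_t n,
		     const struct search_key *key,
		     uint64_t *out, size_t out_len)
{
	size_t i, found = 0;

	for (i = 0; i < n && found < out_len; i++) {
		if (ids[i] >= key->min_id && ids[i] <= key->max_id)
			out[found++] = ids[i];
	}
	return found;
}

/* Wrapper: set up a key covering everything, then search. */
static size_t search_all(const uint64_t *ids, size_t n,
			 uint64_t *out, size_t out_len)
{
	struct search_key key = { .min_id = 0, .max_id = UINT64_MAX };

	return search(ids, n, &key, out, out_len);
}

/* Wrapper: set up a key matching a single ID, then search. */
static size_t search_one(const uint64_t *ids, size_t n, uint64_t id,
			 uint64_t *out, size_t out_len)
{
	struct search_key key = { .min_id = id, .max_id = id };

	return search(ids, n, &key, out, out_len);
}
```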

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 98 +---
 qgroup.h |  7 +
 2 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index b1be3311..2d0a6947 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -1146,11 +1146,11 @@ static inline void print_status_flag_warning(u64 flags)
warning("qgroup data inconsistent, rescan recommended");
 }
 
-static int __qgroups_search(int fd, struct qgroup_lookup *qgroup_lookup)
+static int __qgroups_search(int fd, struct btrfs_ioctl_search_args *args,
+   struct qgroup_lookup *qgroup_lookup)
 {
int ret;
-   struct btrfs_ioctl_search_args args;
-   struct btrfs_ioctl_search_key *sk = &args.key;
+   struct btrfs_ioctl_search_key *sk = &args->key;
struct btrfs_ioctl_search_header *sh;
unsigned long off = 0;
unsigned int i;
@@ -1161,30 +1161,12 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
u64 qgroupid;
u64 qgroupid1;
 
-   memset(&args, 0, sizeof(args));
-
-   sk->tree_id = BTRFS_QUOTA_TREE_OBJECTID;
-   sk->max_type = BTRFS_QGROUP_RELATION_KEY;
-   sk->min_type = BTRFS_QGROUP_STATUS_KEY;
-   sk->max_objectid = (u64)-1;
-   sk->max_offset = (u64)-1;
-   sk->max_transid = (u64)-1;
-   sk->nr_items = 4096;
-
qgroup_lookup_init(qgroup_lookup);
 
while (1) {
-   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, args);
if (ret < 0) {
-   if (errno == ENOENT) {
-   error("can't list qgroups: quotas not enabled");
-   ret = -ENOTTY;
-   } else {
-   error("can't list qgroups: %s",
-  strerror(errno));
-   ret = -errno;
-   }
-
+   ret = -errno;
break;
}
 
@@ -1198,14 +1180,14 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
 * read the root_ref item it contains
 */
for (i = 0; i < sk->nr_items; i++) {
-   sh = (struct btrfs_ioctl_search_header *)(args.buf +
+   sh = (struct btrfs_ioctl_search_header *)(args->buf +
  off);
off += sizeof(*sh);
 
switch (btrfs_search_header_type(sh)) {
case BTRFS_QGROUP_STATUS_KEY:
si = (struct btrfs_qgroup_status_item *)
-(args.buf + off);
+(args->buf + off);
flags = btrfs_stack_qgroup_status_flags(si);
 
print_status_flag_warning(flags);
@@ -1213,7 +1195,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_INFO_KEY:
qgroupid = btrfs_search_header_offset(sh);
info = (struct btrfs_qgroup_info_item *)
-  (args.buf + off);
+  (args->buf + off);
 
ret = update_qgroup_info(fd, qgroup_lookup,
 qgroupid, info);
@@ -1221,7 +1203,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_LIMIT_KEY:
qgroupid = btrfs_search_header_offset(sh);
limit = (struct btrfs_qgroup_limit_item *)
-   (args.buf + off);
+   (args->buf + off);
 
ret = update_qgroup_limit(fd, qgroup_lookup,
  qgroupid, limit);
@@ -1267,6 +1249,66 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
return ret;
 }
 
+static int qgroups_search_all(int fd, struct qgroup_lookup *qgroup_lookup)
+{
+   struct btrfs_ioctl_search_args args = {
+   .key = {
+   .tree_id = BTRFS_QUOTA_TREE_OBJECTID,
+   .max_type = BTRFS_QGROUP_RELATION_KEY,
+   .min_type = BTRFS_QGROUP

[PATCH 8/8] btrfs-progs: qgroups: export qgroups usage information as JSON

2018-03-02 Thread jeffm

From: Jeff Mahoney 

One of the common requests I receive is for 'df' like facilities
for subvolume usage.  Really, the request is for monitoring tools to be
able to understand when subvolumes may be approaching quota in the same
manner traditional file systems approach ENOSPC.

This patch allows us to export the qgroups data in a machine-readable
format so that monitoring tools can parse it easily.

There are two modes since JSON can technically handle 64-bit numbers
but JavaScript proper cannot.  show -j enables JSON mode using 64-bit
integers directly.  --json-compat presents 64-bit numbers as an array
of two 32-bit numbers (high followed by low).
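
For reference, JavaScript numbers are IEEE 754 doubles and represent
integers exactly only up to 2^53, which is why a consumer may need the
split form.  A minimal sketch of the [high, low] encoding and how a
consumer would reassemble it; u64_to_hi_lo() and hi_lo_to_u64() are
hypothetical helper names, not functions from this patch.

```c
#include <stdint.h>

/* Split a 64-bit value into { high 32 bits, low 32 bits }. */
static void u64_to_hi_lo(uint64_t value, uint32_t out[2])
{
	out[0] = (uint32_t)(value >> 32);
	out[1] = (uint32_t)(value & 0xffffffffu);
}

/* Reassemble the original 64-bit value from { high, low }. */
static uint64_t hi_lo_to_u64(const uint32_t in[2])
{
	return ((uint64_t)in[0] << 32) | in[1];
}
```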

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-qgroup.asciidoc |   4 +
 Makefile.inc.in |   4 +-
 cmds-qgroup.c   |  36 +-
 configure.ac|   6 +
 qgroup.c| 211 
 qgroup.h|   3 +
 6 files changed, 258 insertions(+), 6 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc 
b/Documentation/btrfs-qgroup.asciidoc
index 360b3269..22a9c2a7 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -105,6 +105,10 @@ list all qgroups which impact the given path(include 
ancestral qgroups)
 list all qgroups which impact the given path(exclude ancestral qgroups)
 -v
 Be more verbose.  Print pathnames of member qgroups when nested.
+-j
+If enabled, export qgroup usage information in JSON format.  This implies 
--raw.
+--json-compat
+By default, JSON output contains full 64-bit integers, which may be 
incompatible with some JSON parsers.  This option exports those values as an 
array of 32-bit numbers in [high, low] format.
 --raw
 raw numbers in bytes, without the 'B' suffix.
 --human-readable
diff --git a/Makefile.inc.in b/Makefile.inc.in
index 56271903..68bddbed 100644
--- a/Makefile.inc.in
+++ b/Makefile.inc.in
@@ -18,9 +18,9 @@ BTRFSRESTORE_ZSTD = @BTRFSRESTORE_ZSTD@
 SUBST_CFLAGS = @CFLAGS@
 SUBST_LDFLAGS = @LDFLAGS@
 
-LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
+LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @JSON_LIBS@ -L. -pthread
 LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
-STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ -L. -pthread
+STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ @JSON_LIBS_STATIC@ 
-L. -pthread
 STATIC_LIBS_COMP = @ZLIB_LIBS_STATIC@ @LZO2_LIBS_STATIC@ @ZSTD_LIBS_STATIC@
 
 prefix ?= @prefix@
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 94cd0fd3..eee15ef1 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -282,6 +282,10 @@ static const char * const cmd_qgroup_show_usage[] = {
"   (excluding ancestral qgroups)",
"-P print first-level qgroups using pathname",
"-v verbose, prints all nested subvolumes",
+#ifdef HAVE_JSON
+   "-j export in JSON format",
+   "--json-compat  export in JSON compatibility mode",
+#endif
HELPINFO_UNITS_LONG,
"--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
"   list qgroups sorted by specified items",
@@ -302,6 +306,8 @@ static int cmd_qgroup_show(int argc, char **argv)
unsigned unit_mode;
int sync = 0;
bool verbose = false;
+   bool export_json = false;
+   bool compat_json = false;
 
struct btrfs_qgroup_comparer_set *comparer_set;
struct btrfs_qgroup_filter_set *filter_set;
@@ -314,16 +320,26 @@ static int cmd_qgroup_show(int argc, char **argv)
int c;
enum {
GETOPT_VAL_SORT = 256,
-   GETOPT_VAL_SYNC
+   GETOPT_VAL_SYNC,
+   GETOPT_VAL_JSCOMPAT,
};
static const struct option long_options[] = {
{"sort", required_argument, NULL, GETOPT_VAL_SORT},
{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
{"verbose", no_argument, NULL, 'v'},
+#ifdef HAVE_JSON
+   {"json-compat", no_argument, NULL, GETOPT_VAL_JSCOMPAT},
+#endif
{ NULL, 0, NULL, 0 }
};
-
-   c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
+   const char getopt_chars[] = {
+   'p', 'P', 'c', 'r', 'e', 'F', 'f', 'v',
+#ifdef HAVE_JSON
+   'j',
+#endif
+   '\0' };
+
+   c = getopt_long(argc, argv, getopt_chars, long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -353,6 +369,14 @@ static int cmd_qgroup_show(int argc, char **argv)
case 'f':
filter_flag |= 0x2;
break;
+#ifdef HAVE_JSON
+   case GETOPT_VAL_JSCOMPAT:
+   comp

[PATCH 5/8] btrfs-progs: qgroups: introduce and use info and limit structures

2018-03-02 Thread jeffm

From: Jeff Mahoney 

We use structures to pass the info and limit from the kernel as items
but store the individual values separately in btrfs_qgroup.  We already
have a btrfs_qgroup_limit structure that's used for setting the limit.

This patch introduces a btrfs_qgroup_info structure and uses that and
btrfs_qgroup_limit in btrfs_qgroup.

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 73 +++-
 qgroup.h |  8 +++
 2 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index 83918134..b1be3311 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -46,20 +46,12 @@ struct btrfs_qgroup {
/*
 * info_item
 */
-   u64 generation;
-   u64 rfer;   /*referenced*/
-   u64 rfer_cmpr;  /*referenced compressed*/
-   u64 excl;   /*exclusive*/
-   u64 excl_cmpr;  /*exclusive compressed*/
+   struct btrfs_qgroup_info info;
 
/*
 *limit_item
 */
-   u64 flags;  /*which limits are set*/
-   u64 max_rfer;
-   u64 max_excl;
-   u64 rsv_rfer;
-   u64 rsv_excl;
+   struct btrfs_qgroup_limit limit;
 
/*qgroups this group is member of*/
struct list_head qgroups;
@@ -272,24 +264,24 @@ static void print_qgroup_column(struct btrfs_qgroup 
*qgroup,
print_qgroup_column_add_blank(BTRFS_QGROUP_QGROUPID, len);
break;
case BTRFS_QGROUP_RFER:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->rfer, 
unit_mode));
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->info.referenced, unit_mode));
break;
case BTRFS_QGROUP_EXCL:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->excl, 
unit_mode));
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->info.exclusive, unit_mode));
break;
case BTRFS_QGROUP_PARENT:
len = print_parent_column(qgroup);
print_qgroup_column_add_blank(BTRFS_QGROUP_PARENT, len);
break;
case BTRFS_QGROUP_MAX_RFER:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
-   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->max_rfer, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->limit.max_referenced, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
case BTRFS_QGROUP_MAX_EXCL:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
-   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->max_excl, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->limit.max_exclusive, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
@@ -432,9 +424,9 @@ static int comp_entry_with_rfer(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->rfer > entry2->rfer)
+   if (entry1->info.referenced > entry2->info.referenced)
ret = 1;
-   else if (entry1->rfer < entry2->rfer)
+   else if (entry1->info.referenced < entry2->info.referenced)
ret = -1;
else
ret = 0;
@@ -448,9 +440,9 @@ static int comp_entry_with_excl(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->excl > entry2->excl)
+   if (entry1->info.exclusive > entry2->info.exclusive)
ret = 1;
-   else if (entry1->excl < entry2->excl)
+   else if (entry1->info.exclusive < entry2->info.exclusive)
ret = -1;
else
ret = 0;
@@ -464,9 +456,9 @@ static int comp_entry_with_max_rfer(struct btrfs_qgroup 
*entry1,
 {
int ret;
 
-   if (entry1->max_rfer > entry2->max_rfer)
+   if (entry1->limit.max_referenced > entry2->limit.max_referenced)
ret = 1;
-   else if (entry1->max_rfer < entry2->max_rfer)
+   else if (entry1->limit.max_referenced < entry2->limit.max_referenced)
ret = -1;
else
ret = 0;
@@ -480,9 +472,9 @@ static int comp_entry_with_max_excl(struct btrfs_qgroup 
*entry1,
 {
int ret;
 
-   if (entry1->max_excl > entry2->max_excl)
+   if (entry1->limit.max_exclusive > entry2->limit.max_exclusive)
ret = 1;
-   else if (entry1->max_excl < entry2->max_excl)
+   else if (entry1->limit.max_exclusive < entry2->limit.max_exclusive)
ret = -1;
else
ret = 0;
@@ -739,11 +731,13 @@ static int update_qgroup_info(int fd, struct 
qgroup_lookup *qgroup_lookup,
if (IS_ERR_OR_NULL(bq))
return PTR_ERR(bq);
 
-   bq->generatio

[PATCH 7/8] btrfs-progs: subvolume: add quota info to btrfs sub show

2018-03-02 Thread jeffm

From: Jeff Mahoney 

This patch reports on the first-level qgroup, if any, associated with
a particular subvolume.  It displays the usage and limit, subject
to the usual unit parameters.

Signed-off-by: Jeff Mahoney 
---
 cmds-subvolume.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 8a473f7a..29d0e0e5 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -972,6 +972,7 @@ static const char * const cmd_subvol_show_usage[] = {
"Show more information about the subvolume",
"-r|--rootid   rootid of the subvolume",
"-u|--uuid uuid of the subvolume",
+   HELPINFO_UNITS_SHORT_LONG,
"",
"If no option is specified,  will be shown, otherwise",
"the rootid or uuid are resolved relative to the  path.",
@@ -993,6 +994,13 @@ static int cmd_subvol_show(int argc, char **argv)
int by_uuid = 0;
u64 rootid_arg;
u8 uuid_arg[BTRFS_UUID_SIZE];
+   struct btrfs_qgroup_stats stats;
+   unsigned int unit_mode;
+   const char *referenced_size;
+   const char *referenced_limit_size = "-";
+   unsigned field_width = 0;
+
+   unit_mode = get_unit_mode_from_arg(&argc, argv, 1);
 
while (1) {
int c;
@@ -1112,6 +1120,44 @@ static int cmd_subvol_show(int argc, char **argv)
btrfs_list_subvols_print(fd, filter_set, NULL, BTRFS_LIST_LAYOUT_RAW,
1, raw_prefix);
 
+   ret = btrfs_qgroup_query(fd, get_ri.root_id, &stats);
+   if (ret < 0) {
+   if (ret == -ENODATA)
+   printf("Quotas must be enabled for per-subvolume 
usage\n");
+   else if (ret != -ENOTTY)
+   fprintf(stderr,
+   "\nERROR: BTRFS_IOC_QUOTA_QUERY failed: %s\n",
+   strerror(errno));
+   goto out;
+   }
+
+   printf("\tQuota Usage:\t\t");
+   fflush(stdout);
+
+   referenced_size = pretty_size_mode(stats.info.referenced, unit_mode);
+   if (stats.limit.max_referenced)
+  referenced_limit_size = pretty_size_mode(
+   stats.limit.max_referenced,
+   unit_mode);
+   field_width = max(strlen(referenced_size),
+ strlen(referenced_limit_size));
+
+   printf("%-*s referenced, %s exclusive\n ", field_width,
+  referenced_size,
+  pretty_size_mode(stats.info.exclusive, unit_mode));
+
+   printf("\tQuota Limits:\t\t");
+   if (stats.limit.max_referenced || stats.limit.max_exclusive) {
+   const char *excl = "-";
+
+   if (stats.limit.max_exclusive)
+  excl = pretty_size_mode(stats.limit.max_exclusive,
+  unit_mode);
+   printf("%-*s referenced, %s exclusive\n", field_width,
+  referenced_limit_size, excl);
+   } else
+   printf("None\n");
+
 out:
/* clean up */
free(get_ri.path);
-- 
2.15.1


[PATCH 4/8] btrfs-progs: qgroups: add pathname to show output

2018-03-02 Thread jeffm

From: Jeff Mahoney 

The btrfs qgroup show command currently only exports qgroup IDs,
forcing the user to resolve which subvolume each corresponds to.

This patch adds pathname resolution to qgroup show so that when
the -P option is used, the last column contains the pathname of
the root of the subvolume it describes.  In the case of nested
qgroups, it will show the number of member qgroups or the paths
of the members if the -v option is used.

Pathname can also be used as a sort parameter.

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-qgroup.asciidoc |   4 +
 cmds-qgroup.c   |  17 -
 kerncompat.h|   1 +
 qgroup.c| 142 
 qgroup.h|   4 +-
 utils.c |  22 --
 utils.h |   2 +
 7 files changed, 166 insertions(+), 26 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc 
b/Documentation/btrfs-qgroup.asciidoc
index 3108457c..360b3269 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -97,10 +97,14 @@ print child qgroup id.
 print limit of referenced size of qgroup.
 -e
 print limit of exclusive size of qgroup.
+-P
+print pathname to the root of the subvolume managed by qgroup.  For nested 
qgroups, the number of members will be printed unless -v is specified.
 -F
 list all qgroups which impact the given path(include ancestral qgroups)
 -f
 list all qgroups which impact the given path(exclude ancestral qgroups)
+-v
+Be more verbose.  Print pathnames of member qgroups when nested.
 --raw
 raw numbers in bytes, without the 'B' suffix.
 --human-readable
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 48686436..94cd0fd3 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -280,8 +280,10 @@ static const char * const cmd_qgroup_show_usage[] = {
"   (including ancestral qgroups)",
"-f list all qgroups which impact the given path",
"   (excluding ancestral qgroups)",
+   "-P print first-level qgroups using pathname",
+   "-v verbose, prints all nested subvolumes",
HELPINFO_UNITS_LONG,
-   "--sort=qgroupid,rfer,excl,max_rfer,max_excl",
+   "--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
"   list qgroups sorted by specified items",
"   you can use '+' or '-' in front of each item.",
"   (+:ascending, -:descending, ascending default)",
@@ -299,6 +301,7 @@ static int cmd_qgroup_show(int argc, char **argv)
int filter_flag = 0;
unsigned unit_mode;
int sync = 0;
+   bool verbose = false;
 
struct btrfs_qgroup_comparer_set *comparer_set;
struct btrfs_qgroup_filter_set *filter_set;
@@ -316,10 +319,11 @@ static int cmd_qgroup_show(int argc, char **argv)
static const struct option long_options[] = {
{"sort", required_argument, NULL, GETOPT_VAL_SORT},
{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
+   {"verbose", no_argument, NULL, 'v'},
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "pcreFf", long_options, NULL);
+   c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -327,6 +331,10 @@ static int cmd_qgroup_show(int argc, char **argv)
btrfs_qgroup_setup_print_column(
BTRFS_QGROUP_PARENT);
break;
+   case 'P':
+   btrfs_qgroup_setup_print_column(
+   BTRFS_QGROUP_PATHNAME);
+   break;
case 'c':
btrfs_qgroup_setup_print_column(
BTRFS_QGROUP_CHILD);
@@ -354,6 +362,9 @@ static int cmd_qgroup_show(int argc, char **argv)
case GETOPT_VAL_SYNC:
sync = 1;
break;
+   case 'v':
+   verbose = true;
+   break;
default:
usage(cmd_qgroup_show_usage);
}
@@ -394,7 +405,7 @@ static int cmd_qgroup_show(int argc, char **argv)
BTRFS_QGROUP_FILTER_PARENT,
qgroupid);
}
-   ret = btrfs_show_qgroups(fd, filter_set, comparer_set);
+   ret = btrfs_show_qgroups(fd, filter_set, comparer_set, verbose);
close_file_or_dir(fd, dirstream);
free(filter_set);
free(comparer_set);
diff --git a/kerncompat.h b/kerncompat.h
index fa96715f..f97495ee 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -29,6 +29,7 @@

[PATCH 2/8] btrfs-progs: qgroups: fix misleading index check

2018-03-02 Thread jeffm

From: Jeff Mahoney 

In print_single_qgroup_table we check the loop index against
BTRFS_QGROUP_CHILD, but what we really mean is "last column."  Since
we have an enum value to indicate the last value, use that instead
of assuming that BTRFS_QGROUP_CHILD is always last.
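
The idiom in isolation, as a minimal sketch: the enum, the name table
and format_row() below are hypothetical and only mirror the pattern.
Because the sentinel always stays last, "sentinel - 1" keeps meaning
"last column" when a new column is added before it.

```c
#include <string.h>

/* Hypothetical column enum; the sentinel COL_ALL always stays last. */
enum column {
	COL_QGROUPID,
	COL_RFER,
	COL_EXCL,
	COL_PATHNAME,
	COL_ALL
};

static const char *const column_names[COL_ALL] = {
	"qgroupid", "rfer", "excl", "pathname"
};

/*
 * Join all column names with single spaces and no trailing space.
 * buf must have room for at least one byte (len > 0).
 */
static void format_row(char *buf, size_t len)
{
	int i;

	buf[0] = '\0';
	for (i = 0; i < COL_ALL; i++) {
		strncat(buf, column_names[i], len - strlen(buf) - 1);
		/* compare against the sentinel, not a named column */
		if (i != COL_ALL - 1)
			strncat(buf, " ", len - strlen(buf) - 1);
	}
}
```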

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qgroup.c b/qgroup.c
index 11659e83..67bc0738 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -267,7 +267,7 @@ static void print_single_qgroup_table(struct btrfs_qgroup 
*qgroup)
continue;
print_qgroup_column(qgroup, i);
 
-   if (i != BTRFS_QGROUP_CHILD)
+   if (i != BTRFS_QGROUP_ALL - 1)
printf(" ");
}
printf("\n");
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/8] btrfs-progs: quota: Add -W option to rescan to wait without starting rescan

2018-03-02 Thread jeffm

From: Jeff Mahoney 

This patch adds a new -W option to wait for a rescan without starting a
new operation.  This is useful for things like xfstests where we want
to do a "btrfs quota enable" and not continue until the subsequent
rescan has finished.

In addition to documenting the new option in the man page, I've cleaned
up the rescan entry to document the -w option a bit better.

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-quota.asciidoc | 10 +++---
 cmds-quota.c   | 21 +++--
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/btrfs-quota.asciidoc 
b/Documentation/btrfs-quota.asciidoc
index 85ebf729..0b64a69b 100644
--- a/Documentation/btrfs-quota.asciidoc
+++ b/Documentation/btrfs-quota.asciidoc
@@ -238,15 +238,19 @@ Disable subvolume quota support for a filesystem.
 *enable* ::
 Enable subvolume quota support for a filesystem.
 
-*rescan* [-s] ::
+*rescan* [-s|-w|-W] ::
 Trash all qgroup numbers and scan the metadata again with the current config.
 +
 `Options`
 +
 -s
-show status of a running rescan operation.
+Show status of a running rescan operation.
+
 -w
-wait for rescan operation to finish(can be already in progress).
+Start rescan operation and wait until it has finished before exiting.  If a 
rescan is already running, wait until it finishes and then exit without 
starting a new one.
+
+-W
+Wait for rescan operation to finish and then exit.  If a rescan is not already 
running, exit silently.
 
 EXIT STATUS
 ---
diff --git a/cmds-quota.c b/cmds-quota.c
index 745889d1..fe6376ac 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -120,14 +120,20 @@ static int cmd_quota_rescan(int argc, char **argv)
int wait_for_completion = 0;
 
while (1) {
-   int c = getopt(argc, argv, "sw");
+   int c = getopt(argc, argv, "swW");
if (c < 0)
break;
switch (c) {
case 's':
ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
break;
+   case 'W':
+   ioctlnum = 0;
+   wait_for_completion = 1;
+   break;
case 'w':
+   /* Reset it in case the user did both -W and -w */
+   ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
wait_for_completion = 1;
break;
default:
@@ -135,8 +141,9 @@ static int cmd_quota_rescan(int argc, char **argv)
}
}
 
-   if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
-   error("switch -w cannot be used with -s");
+   if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS && wait_for_completion) {
+   error("switch -%c cannot be used with -s",
+ ioctlnum ? 'w' : 'W');
return 1;
}
 
@@ -150,8 +157,10 @@ static int cmd_quota_rescan(int argc, char **argv)
if (fd < 0)
return 1;
 
-   ret = ioctl(fd, ioctlnum, &args);
-   e = errno;
+   if (ioctlnum) {
+   ret = ioctl(fd, ioctlnum, &args);
+   e = errno;
+   }
 
if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS) {
close_file_or_dir(fd, dirstream);
@@ -167,7 +176,7 @@ static int cmd_quota_rescan(int argc, char **argv)
return 0;
}
 
-   if (ret == 0) {
+   if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN && ret == 0) {
printf("quota rescan started\n");
fflush(stdout);
} else if (ret < 0 && (!wait_for_completion || e != EINPROGRESS)) {
-- 
2.15.1


[PATCH 0/8] btrfs-progs: qgroups usability [corrected]

2018-03-02 Thread jeffm

From: Jeff Mahoney 

Hi all -

The following series addresses some usability issues with the qgroups UI.

1) Adds -W option so we can wait on a rescan completing without starting one.
2) Adds qgroup information to 'btrfs subvolume show'
3) Adds a -P option to show pathnames for first-level qgroups (or member
   of nested qgroups with -v)
4) Allows exporting the qgroup table in JSON format for use by external
   programs/scripts.

-Jeff

Jeff Mahoney (8):
  btrfs-progs: quota: Add -W option to rescan to wait without starting
rescan
  btrfs-progs: qgroups: fix misleading index check
  btrfs-progs: constify pathnames passed as arguments
  btrfs-progs: qgroups: add pathname to show output
  btrfs-progs: qgroups: introduce and use info and limit structures
  btrfs-progs: qgroups: introduce btrfs_qgroup_query
  btrfs-progs: subvolume: add quota info to btrfs sub show
  btrfs-progs: qgroups: export qgroups usage information as JSON

 Documentation/btrfs-qgroup.asciidoc |   8 +
 Documentation/btrfs-quota.asciidoc  |  10 +-
 Makefile.inc.in                     |   4 +-
 chunk-recover.c                     |   4 +-
 cmds-device.c                       |   2 +-
 cmds-fi-usage.c                     |   6 +-
 cmds-qgroup.c                       |  49 +++-
 cmds-quota.c                        |  21 +-
 cmds-rescue.c                       |   4 +-
 cmds-subvolume.c                    |  46 
 configure.ac                        |   6 +
 kerncompat.h                        |   1 +
 qgroup.c                            | 526 ++--
 qgroup.h                            |  22 +-
 send-utils.c                        |   4 +-
 utils.c                             |  22 +-
 utils.h                             |   2 +
 17 files changed, 621 insertions(+), 116 deletions(-)

-- 
2.15.1


Re: [PATCH 0/8] btrfs-progs: qgroups usability

2018-03-02 Thread Jeff Mahoney

On 3/2/18 1:39 PM, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> Hi all -
> 
> The following series addresses some usability issues with the qgroups UI.
> 
> 1) Adds -W option so we can wait on a rescan completing without starting one.
> 2) Adds qgroup information to 'btrfs subvolume show'
> 3) Adds a -P option to show pathnames for first-level qgroups (or member
>    of nested qgroups with -v)
> 4) Allows exporting the qgroup table in JSON format for use by external
>    programs/scripts.


Grumble.  Ignore this thread.  I had reordered the patches and didn't
clean up an older git format-patch.

-Jeff

> -Jeff
> 
> Jeff Mahoney (8):
>   btrfs-progs: quota: Add -W option to rescan to wait without starting
> rescan
>   btrfs-progs: qgroups: fix misleading index check
>   btrfs-progs: constify pathnames passed as arguments
>   btrfs-progs: qgroups: add pathname to show output
>   btrfs-progs: qgroups: introduce and use info and limit structures
>   btrfs-progs: qgroups: introduce btrfs_qgroup_query
>   btrfs-progs: subvolume: add quota info to btrfs sub show
>   btrfs-progs: qgroups: export qgroups usage information as JSON
> 
>  Documentation/btrfs-qgroup.asciidoc |   8 +
>  Documentation/btrfs-quota.asciidoc  |  10 +-
>  Makefile.inc.in                     |   4 +-
>  chunk-recover.c                     |   4 +-
>  cmds-device.c                       |   2 +-
>  cmds-fi-usage.c                     |   6 +-
>  cmds-qgroup.c                       |  49 +++-
>  cmds-quota.c                        |  21 +-
>  cmds-rescue.c                       |   4 +-
>  cmds-subvolume.c                    |  46 
>  configure.ac                        |   6 +
>  kerncompat.h                        |   1 +
>  qgroup.c                            | 526 ++--
>  qgroup.h                            |  22 +-
>  send-utils.c                        |   4 +-
>  utils.c                             |  22 +-
>  utils.h                             |   2 +
>  17 files changed, 621 insertions(+), 116 deletions(-)
> 


-- 
Jeff Mahoney
SUSE Labs

Re: [PATCH 1/2] Btrfs: fix log replay failure after linking special file and fsync

2018-03-02 Thread Liu Bo

On Wed, Feb 28, 2018 at 03:55:40PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If in the same transaction we rename a special file (fifo, character/block
> device or symbolic link), create a hard link for it using its old name,
> and then sync the log, we end up with a log that cannot be replayed: when
> attempting to replay it, an EEXIST error is returned and mounting the
> filesystem fails. Example scenario:
> 
>   $ mkfs.btrfs -f /dev/sdc
>   $ mount /dev/sdc /mnt
>   $ mkdir /mnt/testdir
>   $ mkfifo /mnt/testdir/foo
>   # Make sure everything done so far is durably persisted.
>   $ sync
> 
>   # Create some unrelated file and fsync it, this is just to create a log
>   # tree. The file must be in the same directory as our special file.
>   $ touch /mnt/testdir/f1
>   $ xfs_io -c "fsync" /mnt/testdir/f1
> 
>   # Rename our special file and then create a hard link with its old name.
>   $ mv /mnt/testdir/foo /mnt/testdir/bar
>   $ ln /mnt/testdir/bar /mnt/testdir/foo
> 
>   # Create some other unrelated file and fsync it, this is just to persist
>   # the log tree which was modified by the previous rename and link
>   # operations. Alternatively we could have modified file f1 and fsync it.
>   $ touch /mnt/f2
>   $ xfs_io -c "fsync" /mnt/f2
> 
>   <power fail>
> 
>   $ mount /dev/sdc /mnt
>   mount: mount /dev/sdc on /mnt failed: File exists
> 
> This happens because both the log tree and the subvolume's tree have an
> entry in the directory "testdir" with the same name, that is, there is
> one key (258 INODE_REF 257) in the subvolume tree and another one in the
> log tree (where 258 is the inode number of our special file and 257 is
> the inode for directory "testdir"). Only the data of those two keys
> differs: in the subvolume tree the index field of the inode reference has
> a value of 3, while in the log tree it has a value of 5. Because the same
> key exists in both trees but with a different index, the log replay fails
> with an -EEXIST error when attempting to replay the inode reference from
> the log tree.
> 
> Fix this by setting the last_unlink_trans field of the inode (our special
> file) to the current transaction id when a hard link is created, as this
> forces logging the parent directory inode, solving the conflict at log
> replay time.
> 

Reviewed-by: Liu Bo 

Thanks,

-liubo
> A new generic test case for fstests was also submitted.
> 
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/tree-log.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 28d0de199b05..411a022489e4 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -5841,7 +5841,7 @@ int btrfs_log_new_name(struct btrfs_trans_handle *trans,
>* this will force the logging code to walk the dentry chain
>* up for the file
>*/
> - if (S_ISREG(inode->vfs_inode.i_mode))
> + if (!S_ISDIR(inode->vfs_inode.i_mode))
>   inode->last_unlink_trans = trans->transid;
>  
>   /*
> -- 
> 2.11.0
> 

[PATCH 4/8] btrfs-progs: qgroups: add pathname to show output

2018-03-02 Thread jeffm

From: Jeff Mahoney 

The btrfs qgroup show command currently only exports qgroup IDs,
forcing the user to resolve which subvolume each corresponds to.

This patch adds pathname resolution to qgroup show so that when
the -P option is used, the last column contains the pathname of
the root of the subvolume it describes.  In the case of nested
qgroups, it will show the number of member qgroups or the paths
of the members if the -v option is used.

Pathname can also be used as a sort parameter.

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-qgroup.asciidoc |   4 +
 cmds-qgroup.c                       |  17 -
 kerncompat.h                        |   1 +
 qgroup.c                            | 142 
 qgroup.h                            |   4 +-
 utils.c                             |  22 --
 utils.h                             |   2 +
 7 files changed, 166 insertions(+), 26 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc b/Documentation/btrfs-qgroup.asciidoc
index 3108457c..360b3269 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -97,10 +97,14 @@ print child qgroup id.
 print limit of referenced size of qgroup.
 -e
 print limit of exclusive size of qgroup.
+-P
+print pathname to the root of the subvolume managed by qgroup.  For nested qgroups, the number of members will be printed unless -v is specified.
 -F
 list all qgroups which impact the given path(include ancestral qgroups)
 -f
 list all qgroups which impact the given path(exclude ancestral qgroups)
+-v
+Be more verbose.  Print pathnames of member qgroups when nested.
 --raw
 raw numbers in bytes, without the 'B' suffix.
 --human-readable
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 48686436..94cd0fd3 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -280,8 +280,10 @@ static const char * const cmd_qgroup_show_usage[] = {
"   (including ancestral qgroups)",
"-f list all qgroups which impact the given path",
"   (excluding ancestral qgroups)",
+   "-P print first-level qgroups using pathname",
+   "-v verbose, prints all nested subvolumes",
HELPINFO_UNITS_LONG,
-   "--sort=qgroupid,rfer,excl,max_rfer,max_excl",
+   "--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
"   list qgroups sorted by specified items",
"   you can use '+' or '-' in front of each item.",
"   (+:ascending, -:descending, ascending default)",
@@ -299,6 +301,7 @@ static int cmd_qgroup_show(int argc, char **argv)
int filter_flag = 0;
unsigned unit_mode;
int sync = 0;
+   bool verbose = false;
 
struct btrfs_qgroup_comparer_set *comparer_set;
struct btrfs_qgroup_filter_set *filter_set;
@@ -316,10 +319,11 @@ static int cmd_qgroup_show(int argc, char **argv)
static const struct option long_options[] = {
{"sort", required_argument, NULL, GETOPT_VAL_SORT},
{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
+   {"verbose", no_argument, NULL, 'v'},
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "pcreFf", long_options, NULL);
+   c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -327,6 +331,10 @@ static int cmd_qgroup_show(int argc, char **argv)
btrfs_qgroup_setup_print_column(
BTRFS_QGROUP_PARENT);
break;
+   case 'P':
+   btrfs_qgroup_setup_print_column(
+   BTRFS_QGROUP_PATHNAME);
+   break;
case 'c':
btrfs_qgroup_setup_print_column(
BTRFS_QGROUP_CHILD);
@@ -354,6 +362,9 @@ static int cmd_qgroup_show(int argc, char **argv)
case GETOPT_VAL_SYNC:
sync = 1;
break;
+   case 'v':
+   verbose = true;
+   break;
default:
usage(cmd_qgroup_show_usage);
}
@@ -394,7 +405,7 @@ static int cmd_qgroup_show(int argc, char **argv)
BTRFS_QGROUP_FILTER_PARENT,
qgroupid);
}
-   ret = btrfs_show_qgroups(fd, filter_set, comparer_set);
+   ret = btrfs_show_qgroups(fd, filter_set, comparer_set, verbose);
close_file_or_dir(fd, dirstream);
free(filter_set);
free(comparer_set);
diff --git a/kerncompat.h b/kerncompat.h
index fa96715f..f97495ee 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -29,6 +29,7 @@

[PATCH 8/8] btrfs-progs: qgroups: export qgroups usage information as JSON

2018-03-02 Thread jeffm

From: Jeff Mahoney 

One of the common requests I receive is for 'df' like facilities
for subvolume usage.  Really, the request is for monitoring tools to be
able to understand when subvolumes may be approaching quota in the same
manner traditional file systems approach ENOSPC.

This patch allows us to export the qgroups data in a machine-readable
format so that monitoring tools can parse it easily.

There are two modes since JSON can technically handle 64-bit numbers
but JavaScript proper cannot.  show -j enables JSON mode using 64-bit
integers directly.  --json-compat presents 64-bit numbers as an array
of two 32-bit numbers (high followed by low).
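
As an illustration of the [high, low] representation described above, here is a minimal sketch of the split and of how a consumer would rebuild the value. The helper names are hypothetical and are not taken from the patch; the motivation is that JavaScript numbers are doubles and cannot represent every integer above 2^53 exactly.

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit value into [high, low] halves. */
static void u64_to_hi_lo(uint64_t v, uint32_t out[2])
{
	out[0] = (uint32_t)(v >> 32);		/* high 32 bits */
	out[1] = (uint32_t)(v & 0xffffffffu);	/* low 32 bits */
}

/* What a consumer would do to recover the original value. */
static uint64_t hi_lo_to_u64(const uint32_t in[2])
{
	return ((uint64_t)in[0] << 32) | in[1];
}
```

For example, 0x0000000100000002 splits into the pair [1, 2], and recombining the pair yields the original value.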

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-qgroup.asciidoc |   4 +
 Makefile.inc.in                     |   4 +-
 cmds-qgroup.c                       |  36 +-
 configure.ac                        |   6 +
 qgroup.c                            | 211 
 qgroup.h                            |   3 +
 6 files changed, 258 insertions(+), 6 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc b/Documentation/btrfs-qgroup.asciidoc
index 360b3269..22a9c2a7 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -105,6 +105,10 @@ list all qgroups which impact the given path(include ancestral qgroups)
 list all qgroups which impact the given path(exclude ancestral qgroups)
 -v
 Be more verbose.  Print pathnames of member qgroups when nested.
+-j
+If enabled, export qgroup usage information in JSON format.  This implies --raw.
+--json-compat
+By default, JSON output contains full 64-bit integers, which may be incompatible with some JSON parsers.  This option exports those values as an array of 32-bit numbers in [high, low] format.
 --raw
 raw numbers in bytes, without the 'B' suffix.
 --human-readable
diff --git a/Makefile.inc.in b/Makefile.inc.in
index 56271903..68bddbed 100644
--- a/Makefile.inc.in
+++ b/Makefile.inc.in
@@ -18,9 +18,9 @@ BTRFSRESTORE_ZSTD = @BTRFSRESTORE_ZSTD@
 SUBST_CFLAGS = @CFLAGS@
 SUBST_LDFLAGS = @LDFLAGS@
 
-LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
+LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @JSON_LIBS@ -L. -pthread
 LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
-STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ -L. -pthread
+STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ @JSON_LIBS_STATIC@ -L. -pthread
 STATIC_LIBS_COMP = @ZLIB_LIBS_STATIC@ @LZO2_LIBS_STATIC@ @ZSTD_LIBS_STATIC@
 
 prefix ?= @prefix@
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 94cd0fd3..eee15ef1 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -282,6 +282,10 @@ static const char * const cmd_qgroup_show_usage[] = {
"   (excluding ancestral qgroups)",
"-P print first-level qgroups using pathname",
"-v verbose, prints all nested subvolumes",
+#ifdef HAVE_JSON
+   "-j export in JSON format",
+   "--json-compat  export in JSON compatibility mode",
+#endif
HELPINFO_UNITS_LONG,
"--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
"   list qgroups sorted by specified items",
@@ -302,6 +306,8 @@ static int cmd_qgroup_show(int argc, char **argv)
unsigned unit_mode;
int sync = 0;
bool verbose = false;
+   bool export_json = false;
+   bool compat_json = false;
 
struct btrfs_qgroup_comparer_set *comparer_set;
struct btrfs_qgroup_filter_set *filter_set;
@@ -314,16 +320,26 @@ static int cmd_qgroup_show(int argc, char **argv)
int c;
enum {
GETOPT_VAL_SORT = 256,
-   GETOPT_VAL_SYNC
+   GETOPT_VAL_SYNC,
+   GETOPT_VAL_JSCOMPAT,
};
static const struct option long_options[] = {
{"sort", required_argument, NULL, GETOPT_VAL_SORT},
{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
{"verbose", no_argument, NULL, 'v'},
+#ifdef HAVE_JSON
+   {"json-compat", no_argument, NULL, GETOPT_VAL_JSCOMPAT},
+#endif
{ NULL, 0, NULL, 0 }
};
-
-   c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
+   const char getopt_chars[] = {
+   'p', 'P', 'c', 'r', 'e', 'F', 'f', 'v',
+#ifdef HAVE_JSON
+   'j',
+#endif
+   '\0' };
+
+   c = getopt_long(argc, argv, getopt_chars, long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -353,6 +369,14 @@ static int cmd_qgroup_show(int argc, char **argv)
case 'f':
filter_flag |= 0x2;
break;
+#ifdef HAVE_JSON
+   case GETOPT_VAL_JSCOMPAT:
+   comp

[PATCH 1/8] btrfs-progs: quota: Add -W option to rescan to wait without starting rescan

2018-03-02 Thread jeffm

From: Jeff Mahoney 

This patch adds a new -W option to wait for a rescan without starting a
new operation.  This is useful for things like xfstests where we want
to do a "btrfs quota enable" and not continue until the subsequent
rescan has finished.
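
The way the three switches interact can be sketched as follows. This mirrors the option handling in the diff below but is a simplified model: the enum values are placeholders standing in for the ioctl request numbers, and the struct and function names are invented for the sketch.

```c
#include <stdbool.h>

/* Placeholder action codes; the real tool stores ioctl numbers. */
enum rescan_action {
	ACTION_NONE = 0,	/* -W: issue no ioctl, only wait */
	ACTION_START,		/* default and -w: start a rescan */
	ACTION_STATUS,		/* -s: query rescan status */
};

struct rescan_opts {
	enum rescan_action action;
	bool wait;
};

/* Apply option letters in order; returns false on an unknown letter. */
static bool apply_flags(const char *flags, struct rescan_opts *o)
{
	o->action = ACTION_START;
	o->wait = false;

	for (; *flags; flags++) {
		switch (*flags) {
		case 's':
			o->action = ACTION_STATUS;
			break;
		case 'W':
			o->action = ACTION_NONE;
			o->wait = true;
			break;
		case 'w':
			/* Later -w overrides an earlier -W. */
			o->action = ACTION_START;
			o->wait = true;
			break;
		default:
			return false;
		}
	}
	return true;
}

/* Status queries cannot be combined with waiting. */
static bool opts_conflict(const struct rescan_opts *o)
{
	return o->action == ACTION_STATUS && o->wait;
}
```

In this model "W" alone waits without starting anything, "Ww" starts a rescan and waits, and "ws" is rejected as a conflict.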

In addition to documenting the new option in the man page, I've cleaned
up the rescan entry to document the -w option a bit better.

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-quota.asciidoc | 10 +++---
 cmds-quota.c                       | 21 +++--
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/btrfs-quota.asciidoc b/Documentation/btrfs-quota.asciidoc
index 85ebf729..0b64a69b 100644
--- a/Documentation/btrfs-quota.asciidoc
+++ b/Documentation/btrfs-quota.asciidoc
@@ -238,15 +238,19 @@ Disable subvolume quota support for a filesystem.
 *enable* ::
 Enable subvolume quota support for a filesystem.
 
-*rescan* [-s] ::
+*rescan* [-s|-w|-W] ::
 Trash all qgroup numbers and scan the metadata again with the current config.
 +
 `Options`
 +
 -s
-show status of a running rescan operation.
+Show status of a running rescan operation.
+
 -w
-wait for rescan operation to finish(can be already in progress).
+Start rescan operation and wait until it has finished before exiting.  If a rescan is already running, wait until it finishes and then exit without starting a new one.
+
+-W
+Wait for rescan operation to finish and then exit.  If a rescan is not already running, exit silently.
 
 EXIT STATUS
 ---
diff --git a/cmds-quota.c b/cmds-quota.c
index 745889d1..fe6376ac 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -120,14 +120,20 @@ static int cmd_quota_rescan(int argc, char **argv)
int wait_for_completion = 0;
 
while (1) {
-   int c = getopt(argc, argv, "sw");
+   int c = getopt(argc, argv, "swW");
if (c < 0)
break;
switch (c) {
case 's':
ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
break;
+   case 'W':
+   ioctlnum = 0;
+   wait_for_completion = 1;
+   break;
case 'w':
+   /* Reset it in case the user did both -W and -w */
+   ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
wait_for_completion = 1;
break;
default:
@@ -135,8 +141,9 @@ static int cmd_quota_rescan(int argc, char **argv)
}
}
 
-   if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
-   error("switch -w cannot be used with -s");
+   if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS && wait_for_completion) {
+   error("switch -%c cannot be used with -s",
+ ioctlnum ? 'w' : 'W');
return 1;
}
 
@@ -150,8 +157,10 @@ static int cmd_quota_rescan(int argc, char **argv)
if (fd < 0)
return 1;
 
-   ret = ioctl(fd, ioctlnum, &args);
-   e = errno;
+   if (ioctlnum) {
+   ret = ioctl(fd, ioctlnum, &args);
+   e = errno;
+   }
 
if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN_STATUS) {
close_file_or_dir(fd, dirstream);
@@ -167,7 +176,7 @@ static int cmd_quota_rescan(int argc, char **argv)
return 0;
}
 
-   if (ret == 0) {
+   if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN && ret == 0) {
printf("quota rescan started\n");
fflush(stdout);
} else if (ret < 0 && (!wait_for_completion || e != EINPROGRESS)) {
-- 
2.15.1


[PATCH 8/8] btrfs-progs: add quota info to btrfs sub show

2018-03-02 Thread jeffm

From: Jeff Mahoney 

This patch reports on the first-level qgroup, if any, associated with
a particular subvolume.  It displays the usage and limit, subject
to the usual unit parameters.

Signed-off-by: Jeff Mahoney 
---
 cmds-subvolume.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 8a473f7a..29d0e0e5 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -972,6 +972,7 @@ static const char * const cmd_subvol_show_usage[] = {
"Show more information about the subvolume",
"-r|--rootid   rootid of the subvolume",
"-u|--uuid uuid of the subvolume",
+   HELPINFO_UNITS_SHORT_LONG,
"",
"If no option is specified, <subvol-path> will be shown, otherwise",
"the rootid or uuid are resolved relative to the <mnt> path.",
@@ -993,6 +994,13 @@ static int cmd_subvol_show(int argc, char **argv)
int by_uuid = 0;
u64 rootid_arg;
u8 uuid_arg[BTRFS_UUID_SIZE];
+   struct btrfs_qgroup_stats stats;
+   unsigned int unit_mode;
+   const char *referenced_size;
+   const char *referenced_limit_size = "-";
+   unsigned field_width = 0;
+
+   unit_mode = get_unit_mode_from_arg(&argc, argv, 1);
 
while (1) {
int c;
@@ -1112,6 +1120,44 @@ static int cmd_subvol_show(int argc, char **argv)
btrfs_list_subvols_print(fd, filter_set, NULL, BTRFS_LIST_LAYOUT_RAW,
1, raw_prefix);
 
+   ret = btrfs_qgroup_query(fd, get_ri.root_id, &stats);
+   if (ret < 0) {
+   if (ret == -ENODATA)
+   printf("Quotas must be enabled for per-subvolume usage\n");
+   else if (ret != -ENOTTY)
+   fprintf(stderr,
+   "\nERROR: BTRFS_IOC_QUOTA_QUERY failed: %s\n",
+   strerror(errno));
+   goto out;
+   }
+
+   printf("\tQuota Usage:\t\t");
+   fflush(stdout);
+
+   referenced_size = pretty_size_mode(stats.info.referenced, unit_mode);
+   if (stats.limit.max_referenced)
+  referenced_limit_size = pretty_size_mode(
+   stats.limit.max_referenced,
+   unit_mode);
+   field_width = max(strlen(referenced_size),
+ strlen(referenced_limit_size));
+
+   printf("%-*s referenced, %s exclusive\n ", field_width,
+  referenced_size,
+  pretty_size_mode(stats.info.exclusive, unit_mode));
+
+   printf("\tQuota Limits:\t\t");
+   if (stats.limit.max_referenced || stats.limit.max_exclusive) {
+   const char *excl = "-";
+
+   if (stats.limit.max_exclusive)
+  excl = pretty_size_mode(stats.limit.max_exclusive,
+  unit_mode);
+   printf("%-*s referenced, %s exclusive\n", field_width,
+  referenced_limit_size, excl);
+   } else
+   printf("None\n");
+
 out:
/* clean up */
free(get_ri.path);
-- 
2.15.1


[PATCH 5/8] btrfs-progs: qgroups: export qgroups usage information as JSON

2018-03-02 Thread jeffm

From: Jeff Mahoney 

One of the common requests I receive is for 'df' like facilities
for subvolume usage.  Really, the request is for monitoring tools to be
able to understand when subvolumes may be approaching quota in the same
manner traditional file systems approach ENOSPC.

This patch allows us to export the qgroups data in a machine-readable
format so that monitoring tools can parse it easily.

There are two modes since JSON can technically handle 64-bit numbers
but JavaScript proper cannot.  show -j enables JSON mode using 64-bit
integers directly.  --json-compat presents 64-bit numbers as an array
of two 32-bit numbers (high followed by low).

Signed-off-by: Jeff Mahoney 
---
 Documentation/btrfs-qgroup.asciidoc |   4 +
 Makefile.inc.in                     |   4 +-
 cmds-qgroup.c                       |  36 +-
 configure.ac                        |   6 +
 qgroup.c                            | 211 
 qgroup.h                            |   3 +
 6 files changed, 258 insertions(+), 6 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc b/Documentation/btrfs-qgroup.asciidoc
index 360b3269..22a9c2a7 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -105,6 +105,10 @@ list all qgroups which impact the given path(include ancestral qgroups)
 list all qgroups which impact the given path(exclude ancestral qgroups)
 -v
 Be more verbose.  Print pathnames of member qgroups when nested.
+-j
+If enabled, export qgroup usage information in JSON format.  This implies --raw.
+--json-compat
+By default, JSON output contains full 64-bit integers, which may be incompatible with some JSON parsers.  This option exports those values as an array of 32-bit numbers in [high, low] format.
 --raw
 raw numbers in bytes, without the 'B' suffix.
 --human-readable
diff --git a/Makefile.inc.in b/Makefile.inc.in
index 56271903..68bddbed 100644
--- a/Makefile.inc.in
+++ b/Makefile.inc.in
@@ -18,9 +18,9 @@ BTRFSRESTORE_ZSTD = @BTRFSRESTORE_ZSTD@
 SUBST_CFLAGS = @CFLAGS@
 SUBST_LDFLAGS = @LDFLAGS@
 
-LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ -L. -pthread
+LIBS_BASE = @UUID_LIBS@ @BLKID_LIBS@ @JSON_LIBS@ -L. -pthread
 LIBS_COMP = @ZLIB_LIBS@ @LZO2_LIBS@ @ZSTD_LIBS@
-STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ -L. -pthread
+STATIC_LIBS_BASE = @UUID_LIBS_STATIC@ @BLKID_LIBS_STATIC@ @JSON_LIBS_STATIC@ -L. -pthread
 STATIC_LIBS_COMP = @ZLIB_LIBS_STATIC@ @LZO2_LIBS_STATIC@ @ZSTD_LIBS_STATIC@
 
 prefix ?= @prefix@
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 94cd0fd3..eee15ef1 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -282,6 +282,10 @@ static const char * const cmd_qgroup_show_usage[] = {
"   (excluding ancestral qgroups)",
"-P print first-level qgroups using pathname",
"-v verbose, prints all nested subvolumes",
+#ifdef HAVE_JSON
+   "-j export in JSON format",
+   "--json-compat  export in JSON compatibility mode",
+#endif
HELPINFO_UNITS_LONG,
"--sort=qgroupid,rfer,excl,max_rfer,max_excl,pathname",
"   list qgroups sorted by specified items",
@@ -302,6 +306,8 @@ static int cmd_qgroup_show(int argc, char **argv)
unsigned unit_mode;
int sync = 0;
bool verbose = false;
+   bool export_json = false;
+   bool compat_json = false;
 
struct btrfs_qgroup_comparer_set *comparer_set;
struct btrfs_qgroup_filter_set *filter_set;
@@ -314,16 +320,26 @@ static int cmd_qgroup_show(int argc, char **argv)
int c;
enum {
GETOPT_VAL_SORT = 256,
-   GETOPT_VAL_SYNC
+   GETOPT_VAL_SYNC,
+   GETOPT_VAL_JSCOMPAT,
};
static const struct option long_options[] = {
{"sort", required_argument, NULL, GETOPT_VAL_SORT},
{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
{"verbose", no_argument, NULL, 'v'},
+#ifdef HAVE_JSON
+   {"json-compat", no_argument, NULL, GETOPT_VAL_JSCOMPAT},
+#endif
{ NULL, 0, NULL, 0 }
};
-
-   c = getopt_long(argc, argv, "pPcreFfv", long_options, NULL);
+   const char getopt_chars[] = {
+   'p', 'P', 'c', 'r', 'e', 'F', 'f', 'v',
+#ifdef HAVE_JSON
+   'j',
+#endif
+   '\0' };
+
+   c = getopt_long(argc, argv, getopt_chars, long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -353,6 +369,14 @@ static int cmd_qgroup_show(int argc, char **argv)
case 'f':
filter_flag |= 0x2;
break;
+#ifdef HAVE_JSON
+   case GETOPT_VAL_JSCOMPAT:
+   comp

[PATCH 7/8] btrfs-progs: subvolume: add quota info to btrfs sub show

2018-03-02 Thread jeffm

From: Jeff Mahoney 

This patch reports on the first-level qgroup, if any, associated with
a particular subvolume.  It displays the usage and limit, subject
to the usual unit parameters.

Signed-off-by: Jeff Mahoney 
---
 cmds-subvolume.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 8a473f7a..29d0e0e5 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -972,6 +972,7 @@ static const char * const cmd_subvol_show_usage[] = {
"Show more information about the subvolume",
"-r|--rootid   rootid of the subvolume",
"-u|--uuid uuid of the subvolume",
+   HELPINFO_UNITS_SHORT_LONG,
"",
"If no option is specified, <subvol-path> will be shown, otherwise",
"the rootid or uuid are resolved relative to the <mnt> path.",
@@ -993,6 +994,13 @@ static int cmd_subvol_show(int argc, char **argv)
int by_uuid = 0;
u64 rootid_arg;
u8 uuid_arg[BTRFS_UUID_SIZE];
+   struct btrfs_qgroup_stats stats;
+   unsigned int unit_mode;
+   const char *referenced_size;
+   const char *referenced_limit_size = "-";
+   unsigned field_width = 0;
+
+   unit_mode = get_unit_mode_from_arg(&argc, argv, 1);
 
while (1) {
int c;
@@ -1112,6 +1120,44 @@ static int cmd_subvol_show(int argc, char **argv)
btrfs_list_subvols_print(fd, filter_set, NULL, BTRFS_LIST_LAYOUT_RAW,
1, raw_prefix);
 
+   ret = btrfs_qgroup_query(fd, get_ri.root_id, &stats);
+   if (ret < 0) {
+   if (ret == -ENODATA)
+   printf("Quotas must be enabled for per-subvolume usage\n");
+   else if (ret != -ENOTTY)
+   fprintf(stderr,
+   "\nERROR: BTRFS_IOC_QUOTA_QUERY failed: %s\n",
+   strerror(errno));
+   goto out;
+   }
+
+   printf("\tQuota Usage:\t\t");
+   fflush(stdout);
+
+   referenced_size = pretty_size_mode(stats.info.referenced, unit_mode);
+   if (stats.limit.max_referenced)
+  referenced_limit_size = pretty_size_mode(
+   stats.limit.max_referenced,
+   unit_mode);
+   field_width = max(strlen(referenced_size),
+ strlen(referenced_limit_size));
+
+   printf("%-*s referenced, %s exclusive\n ", field_width,
+  referenced_size,
+  pretty_size_mode(stats.info.exclusive, unit_mode));
+
+   printf("\tQuota Limits:\t\t");
+   if (stats.limit.max_referenced || stats.limit.max_exclusive) {
+   const char *excl = "-";
+
+   if (stats.limit.max_exclusive)
+  excl = pretty_size_mode(stats.limit.max_exclusive,
+  unit_mode);
+   printf("%-*s referenced, %s exclusive\n", field_width,
+  referenced_limit_size, excl);
+   } else
+   printf("None\n");
+
 out:
/* clean up */
free(get_ri.path);
-- 
2.15.1


[PATCH 6/8] btrfs-progs: qgroups: introduce and use info and limit structures

2018-03-02 Thread jeffm

From: Jeff Mahoney 

We use structures to pass the info and limit from the kernel as items
but store the individual values separately in btrfs_qgroup.  We already
have a btrfs_qgroup_limit structure that's used for setting the limit.

This patch introduces a btrfs_qgroup_info structure and uses that and
btrfs_qgroup_limit in btrfs_qgroup.

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 83 +++-
 qgroup.h |  8 +++
 2 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index 5a7a8530..7ec12ec1 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -50,20 +50,12 @@ struct btrfs_qgroup {
/*
 * info_item
 */
-   u64 generation;
-   u64 rfer;   /*referenced*/
-   u64 rfer_cmpr;  /*referenced compressed*/
-   u64 excl;   /*exclusive*/
-   u64 excl_cmpr;  /*exclusive compressed*/
+   struct btrfs_qgroup_info info;
 
/*
 *limit_item
 */
-   u64 flags;  /*which limits are set*/
-   u64 max_rfer;
-   u64 max_excl;
-   u64 rsv_rfer;
-   u64 rsv_excl;
+   struct btrfs_qgroup_limit limit;
 
/*qgroups this group is member of*/
struct list_head qgroups;
@@ -276,24 +268,24 @@ static void print_qgroup_column(struct btrfs_qgroup *qgroup,
print_qgroup_column_add_blank(BTRFS_QGROUP_QGROUPID, len);
break;
case BTRFS_QGROUP_RFER:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->rfer, unit_mode));
+   len = printf("%*s", max_len, pretty_size_mode(qgroup->info.referenced, unit_mode));
break;
case BTRFS_QGROUP_EXCL:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->excl, unit_mode));
+   len = printf("%*s", max_len, pretty_size_mode(qgroup->info.exclusive, unit_mode));
break;
case BTRFS_QGROUP_PARENT:
len = print_parent_column(qgroup);
print_qgroup_column_add_blank(BTRFS_QGROUP_PARENT, len);
break;
case BTRFS_QGROUP_MAX_RFER:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->max_rfer, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
+   len = printf("%*s", max_len, pretty_size_mode(qgroup->limit.max_referenced, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
case BTRFS_QGROUP_MAX_EXCL:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->max_excl, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
+   len = printf("%*s", max_len, pretty_size_mode(qgroup->limit.max_exclusive, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
@@ -436,9 +428,9 @@ static int comp_entry_with_rfer(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->rfer > entry2->rfer)
+   if (entry1->info.referenced > entry2->info.referenced)
ret = 1;
-   else if (entry1->rfer < entry2->rfer)
+   else if (entry1->info.referenced < entry2->info.referenced)
ret = -1;
else
ret = 0;
@@ -452,9 +444,9 @@ static int comp_entry_with_excl(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->excl > entry2->excl)
+   if (entry1->info.exclusive > entry2->info.exclusive)
ret = 1;
-   else if (entry1->excl < entry2->excl)
+   else if (entry1->info.exclusive < entry2->info.exclusive)
ret = -1;
else
ret = 0;
@@ -468,9 +460,9 @@ static int comp_entry_with_max_rfer(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->max_rfer > entry2->max_rfer)
+   if (entry1->limit.max_referenced > entry2->limit.max_referenced)
ret = 1;
-   else if (entry1->max_rfer < entry2->max_rfer)
+   else if (entry1->limit.max_referenced < entry2->limit.max_referenced)
ret = -1;
else
ret = 0;
@@ -484,9 +476,9 @@ static int comp_entry_with_max_excl(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->max_excl > entry2->max_excl)
+   if (entry1->limit.max_exclusive > entry2->limit.max_exclusive)
ret = 1;
-   else if (entry1->max_excl < entry2->max_excl)
+   else if (entry1->limit.max_exclusive < entry2->limit.max_exclusive)
ret = -1;
else
ret = 0;
@@ -743,11 +735,13 @@ static int update_qgroup_info(int fd, struct qgroup_lookup *qgroup_lookup,
if (IS_ERR_OR_NULL(bq))
return PTR_ERR(bq);
 
-   bq->generatio

[PATCH 2/8] btrfs-progs: qgroups: fix misleading index check

2018-03-02 Thread jeffm

From: Jeff Mahoney 

In print_single_qgroup_table we check the loop index against
BTRFS_QGROUP_CHILD, but what we really mean is "last column."  Since
we have an enum value to indicate the last value, use that instead
of assuming that BTRFS_QGROUP_CHILD is always last.
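
The idea of testing against a sentinel rather than a specific column can be shown with a small standalone sketch. The enum and function here are hypothetical and are not the btrfs-progs definitions; the point is only that "COL_ALL - 1" stays correct when a new column is appended before the sentinel.

```c
#include <string.h>

/* Hypothetical column enum; the final entry is a count, not a column. */
enum column {
	COL_ID,
	COL_RFER,
	COL_EXCL,
	COL_ALL,	/* number of columns; must stay last */
};

/* Join the column values with single spaces and no trailing space. */
static void format_row(const char *const vals[COL_ALL], char *buf, size_t len)
{
	int i;

	buf[0] = '\0';
	for (i = 0; i < COL_ALL; i++) {
		strncat(buf, vals[i], len - strlen(buf) - 1);
		/* Separator after every column except the last one. */
		if (i != COL_ALL - 1)
			strncat(buf, " ", len - strlen(buf) - 1);
	}
}
```

Had the check been written as "i != COL_EXCL", adding a fourth column would silently glue the last two values together.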

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qgroup.c b/qgroup.c
index 11659e83..67bc0738 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -267,7 +267,7 @@ static void print_single_qgroup_table(struct btrfs_qgroup 
*qgroup)
continue;
print_qgroup_column(qgroup, i);
 
-   if (i != BTRFS_QGROUP_CHILD)
+   if (i != BTRFS_QGROUP_ALL - 1)
printf(" ");
}
printf("\n");
-- 
2.15.1
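
Editorial sketch of the idiom this patch relies on, not code from btrfs-progs: an enum whose final entry is a sentinel counting the real columns, so "last column" can be written as sentinel - 1 rather than naming whichever column happens to be last today. The names COL_* and format_header() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical column ids; COL_ALL is a sentinel that always stays last. */
enum column {
	COL_QGROUPID,
	COL_RFER,
	COL_EXCL,
	COL_CHILD,
	COL_ALL,	/* number of real columns, not a column itself */
};

static const char * const column_names[COL_ALL] = {
	"qgroupid", "rfer", "excl", "child",
};

/*
 * Write all column names into buf, separated by single spaces.  The
 * separator is skipped after the last column, which is identified via
 * the sentinel instead of a specific column id.  size must be > 0.
 */
static void format_header(char *buf, size_t size)
{
	int i;

	buf[0] = '\0';
	for (i = 0; i < COL_ALL; i++) {
		strncat(buf, column_names[i], size - strlen(buf) - 1);
		if (i != COL_ALL - 1)
			strncat(buf, " ", size - strlen(buf) - 1);
	}
}
```

If a new column is later added before COL_ALL, the separator logic keeps working without any change to the loop.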

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/8] btrfs-progs: qgroups: introduce btrfs_qgroup_query

2018-03-02 Thread jeffm

From: Jeff Mahoney 

The only mechanism we have in the progs for searching qgroups is to load
all of them and filter the results.  This works for qgroup show but
to add quota information to 'btrfs subvolume show' it's pretty wasteful.

This patch splits out setting up the search and performing the search so
we can search for a single qgroupid more easily.

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 98 +---
 qgroup.h |  7 +
 2 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index b1be3311..2d0a6947 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -1146,11 +1146,11 @@ static inline void print_status_flag_warning(u64 flags)
warning("qgroup data inconsistent, rescan recommended");
 }
 
-static int __qgroups_search(int fd, struct qgroup_lookup *qgroup_lookup)
+static int __qgroups_search(int fd, struct btrfs_ioctl_search_args *args,
+   struct qgroup_lookup *qgroup_lookup)
 {
int ret;
-   struct btrfs_ioctl_search_args args;
-   struct btrfs_ioctl_search_key *sk = &args.key;
+   struct btrfs_ioctl_search_key *sk = &args->key;
struct btrfs_ioctl_search_header *sh;
unsigned long off = 0;
unsigned int i;
@@ -1161,30 +1161,12 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
u64 qgroupid;
u64 qgroupid1;
 
-   memset(&args, 0, sizeof(args));
-
-   sk->tree_id = BTRFS_QUOTA_TREE_OBJECTID;
-   sk->max_type = BTRFS_QGROUP_RELATION_KEY;
-   sk->min_type = BTRFS_QGROUP_STATUS_KEY;
-   sk->max_objectid = (u64)-1;
-   sk->max_offset = (u64)-1;
-   sk->max_transid = (u64)-1;
-   sk->nr_items = 4096;
-
qgroup_lookup_init(qgroup_lookup);
 
while (1) {
-   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, args);
if (ret < 0) {
-   if (errno == ENOENT) {
-   error("can't list qgroups: quotas not enabled");
-   ret = -ENOTTY;
-   } else {
-   error("can't list qgroups: %s",
-  strerror(errno));
-   ret = -errno;
-   }
-
+   ret = -errno;
break;
}
 
@@ -1198,14 +1180,14 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
 * read the root_ref item it contains
 */
for (i = 0; i < sk->nr_items; i++) {
-   sh = (struct btrfs_ioctl_search_header *)(args.buf +
+   sh = (struct btrfs_ioctl_search_header *)(args->buf +
  off);
off += sizeof(*sh);
 
switch (btrfs_search_header_type(sh)) {
case BTRFS_QGROUP_STATUS_KEY:
si = (struct btrfs_qgroup_status_item *)
-(args.buf + off);
+(args->buf + off);
flags = btrfs_stack_qgroup_status_flags(si);
 
print_status_flag_warning(flags);
@@ -1213,7 +1195,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_INFO_KEY:
qgroupid = btrfs_search_header_offset(sh);
info = (struct btrfs_qgroup_info_item *)
-  (args.buf + off);
+  (args->buf + off);
 
ret = update_qgroup_info(fd, qgroup_lookup,
 qgroupid, info);
@@ -1221,7 +1203,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_LIMIT_KEY:
qgroupid = btrfs_search_header_offset(sh);
limit = (struct btrfs_qgroup_limit_item *)
-   (args.buf + off);
+   (args->buf + off);
 
ret = update_qgroup_limit(fd, qgroup_lookup,
  qgroupid, limit);
@@ -1267,6 +1249,66 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
return ret;
 }
 
+static int qgroups_search_all(int fd, struct qgroup_lookup *qgroup_lookup)
+{
+   struct btrfs_ioctl_search_args args = {
+   .key = {
+   .tree_id = BTRFS_QUOTA_TREE_OBJECTID,
+   .max_type = BTRFS_QGROUP_RELATION_KEY,
+   .min_type = BTRFS_QGROUP
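
Editorial sketch of the refactoring described in the commit message, under stated assumptions: it does not use the btrfs ioctl interface at all. The types search_key and item and the functions do_search(), search_all() and search_one() are hypothetical stand-ins. The point is only the shape: the caller prepares the key, and one shared loop performs the search.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a search key and a stored item. */
struct search_key {
	uint64_t min_id;
	uint64_t max_id;
};

struct item {
	uint64_t id;
	uint64_t bytes;
};

/* Core loop: walks the items, but does not decide which range is wanted. */
static size_t do_search(const struct item *items, size_t nr,
			const struct search_key *key,
			const struct item **out, size_t out_max)
{
	size_t i, found = 0;

	for (i = 0; i < nr && found < out_max; i++) {
		if (items[i].id >= key->min_id && items[i].id <= key->max_id)
			out[found++] = &items[i];
	}
	return found;
}

/* Wrapper 1: the full id range, as a "list everything" caller would want. */
static size_t search_all(const struct item *items, size_t nr,
			 const struct item **out, size_t out_max)
{
	struct search_key key = { .min_id = 0, .max_id = UINT64_MAX };

	return do_search(items, nr, &key, out, out_max);
}

/* Wrapper 2: exactly one id, without loading and filtering everything. */
static const struct item *search_one(const struct item *items, size_t nr,
				     uint64_t id)
{
	struct search_key key = { .min_id = id, .max_id = id };
	const struct item *hit = NULL;

	return do_search(items, nr, &key, &hit, 1) ? hit : NULL;
}
```

Keeping the key setup in the wrappers means a single-id lookup reuses the same loop as the full listing.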

[PATCH 5/8] btrfs-progs: qgroups: introduce and use info and limit structures

2018-03-02 Thread jeffm

From: Jeff Mahoney 

We use structures to pass the info and limit from the kernel as items
but store the individual values separately in btrfs_qgroup.  We already
have a btrfs_qgroup_limit structure that's used for setting the limit.

This patch introduces a btrfs_qgroup_info structure and uses that and
btrfs_qgroup_limit in btrfs_qgroup.

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 73 +++-
 qgroup.h |  8 +++
 2 files changed, 43 insertions(+), 38 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index 83918134..b1be3311 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -46,20 +46,12 @@ struct btrfs_qgroup {
/*
 * info_item
 */
-   u64 generation;
-   u64 rfer;   /*referenced*/
-   u64 rfer_cmpr;  /*referenced compressed*/
-   u64 excl;   /*exclusive*/
-   u64 excl_cmpr;  /*exclusive compressed*/
+   struct btrfs_qgroup_info info;
 
/*
 *limit_item
 */
-   u64 flags;  /*which limits are set*/
-   u64 max_rfer;
-   u64 max_excl;
-   u64 rsv_rfer;
-   u64 rsv_excl;
+   struct btrfs_qgroup_limit limit;
 
/*qgroups this group is member of*/
struct list_head qgroups;
@@ -272,24 +264,24 @@ static void print_qgroup_column(struct btrfs_qgroup 
*qgroup,
print_qgroup_column_add_blank(BTRFS_QGROUP_QGROUPID, len);
break;
case BTRFS_QGROUP_RFER:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->rfer, 
unit_mode));
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->info.referenced, unit_mode));
break;
case BTRFS_QGROUP_EXCL:
-   len = printf("%*s", max_len, pretty_size_mode(qgroup->excl, 
unit_mode));
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->info.exclusive, unit_mode));
break;
case BTRFS_QGROUP_PARENT:
len = print_parent_column(qgroup);
print_qgroup_column_add_blank(BTRFS_QGROUP_PARENT, len);
break;
case BTRFS_QGROUP_MAX_RFER:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
-   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->max_rfer, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_RFER)
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->limit.max_referenced, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
case BTRFS_QGROUP_MAX_EXCL:
-   if (qgroup->flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
-   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->max_excl, unit_mode));
+   if (qgroup->limit.flags & BTRFS_QGROUP_LIMIT_MAX_EXCL)
+   len = printf("%*s", max_len, 
pretty_size_mode(qgroup->limit.max_exclusive, unit_mode));
else
len = printf("%*s", max_len, "none");
break;
@@ -432,9 +424,9 @@ static int comp_entry_with_rfer(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->rfer > entry2->rfer)
+   if (entry1->info.referenced > entry2->info.referenced)
ret = 1;
-   else if (entry1->rfer < entry2->rfer)
+   else if (entry1->info.referenced < entry2->info.referenced)
ret = -1;
else
ret = 0;
@@ -448,9 +440,9 @@ static int comp_entry_with_excl(struct btrfs_qgroup *entry1,
 {
int ret;
 
-   if (entry1->excl > entry2->excl)
+   if (entry1->info.exclusive > entry2->info.exclusive)
ret = 1;
-   else if (entry1->excl < entry2->excl)
+   else if (entry1->info.exclusive < entry2->info.exclusive)
ret = -1;
else
ret = 0;
@@ -464,9 +456,9 @@ static int comp_entry_with_max_rfer(struct btrfs_qgroup 
*entry1,
 {
int ret;
 
-   if (entry1->max_rfer > entry2->max_rfer)
+   if (entry1->limit.max_referenced > entry2->limit.max_referenced)
ret = 1;
-   else if (entry1->max_rfer < entry2->max_rfer)
+   else if (entry1->limit.max_referenced < entry2->limit.max_referenced)
ret = -1;
else
ret = 0;
@@ -480,9 +472,9 @@ static int comp_entry_with_max_excl(struct btrfs_qgroup 
*entry1,
 {
int ret;
 
-   if (entry1->max_excl > entry2->max_excl)
+   if (entry1->limit.max_exclusive > entry2->limit.max_exclusive)
ret = 1;
-   else if (entry1->max_excl < entry2->max_excl)
+   else if (entry1->limit.max_exclusive < entry2->limit.max_exclusive)
ret = -1;
else
ret = 0;
@@ -739,11 +731,13 @@ static int update_qgroup_info(int fd, struct 
qgroup_lookup *qgroup_lookup,
if (IS_ERR_OR_NULL(bq))
return PTR_ERR(bq);
 
-   bq->generatio
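
Editorial sketch, not code from the patch: the comparators in the hunks above all use the same three-way pattern on a field that now lives in a nested structure. The version below shows that pattern with qsort(). The types demo_info and demo_qgroup and the helper names are hypothetical; only the field name "referenced" mirrors the diff.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature of "qgroup with an embedded info structure". */
struct demo_info {
	uint64_t referenced;
	uint64_t exclusive;
};

struct demo_qgroup {
	uint64_t id;
	struct demo_info info;
};

/* Three-way compare of two u64 values: returns -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* qsort() callback ordering entries by info.referenced, ascending. */
static int cmp_by_referenced(const void *p1, const void *p2)
{
	const struct demo_qgroup *q1 = p1;
	const struct demo_qgroup *q2 = p2;

	return cmp_u64(q1->info.referenced, q2->info.referenced);
}

static void sort_by_referenced(struct demo_qgroup *groups, size_t nr)
{
	qsort(groups, nr, sizeof(*groups), cmp_by_referenced);
}
```

An explicit comparison is used rather than returning a - b, because a 64-bit difference truncated to int can report the wrong sign.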

[PATCH 3/8] btrfs-progs: constify pathnames passed as arguments

2018-03-02 Thread jeffm

From: Jeff Mahoney 

It's unlikely we're going to modify a pathname argument, so codify that
and use const.

Signed-off-by: Jeff Mahoney 
---
 chunk-recover.c | 4 ++--
 cmds-device.c   | 2 +-
 cmds-fi-usage.c | 6 +++---
 cmds-rescue.c   | 4 ++--
 send-utils.c| 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/chunk-recover.c b/chunk-recover.c
index 705bcf52..1d30db51 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1492,7 +1492,7 @@ out:
return ERR_PTR(ret);
 }
 
-static int recover_prepare(struct recover_control *rc, char *path)
+static int recover_prepare(struct recover_control *rc, const char *path)
 {
int ret;
int fd;
@@ -2296,7 +2296,7 @@ static void validate_rebuild_chunks(struct 
recover_control *rc)
 /*
  * Return 0 when successful, < 0 on error and > 0 if aborted by user
  */
-int btrfs_recover_chunk_tree(char *path, int verbose, int yes)
+int btrfs_recover_chunk_tree(const char *path, int verbose, int yes)
 {
int ret = 0;
struct btrfs_root *root = NULL;
diff --git a/cmds-device.c b/cmds-device.c
index 86459d1b..a49c9d9d 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -526,7 +526,7 @@ static const char * const cmd_device_usage_usage[] = {
NULL
 };
 
-static int _cmd_device_usage(int fd, char *path, unsigned unit_mode)
+static int _cmd_device_usage(int fd, const char *path, unsigned unit_mode)
 {
int i;
int ret = 0;
diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index de7ad668..9a1c76ab 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -227,7 +227,7 @@ static int cmp_btrfs_ioctl_space_info(const void *a, const 
void *b)
 /*
  * This function load all the information about the space usage
  */
-static struct btrfs_ioctl_space_args *load_space_info(int fd, char *path)
+static struct btrfs_ioctl_space_args *load_space_info(int fd, const char *path)
 {
struct btrfs_ioctl_space_args *sargs = NULL, *sargs_orig = NULL;
int ret, count;
@@ -305,7 +305,7 @@ static void get_raid56_used(struct chunk_info *chunks, int 
chunkcount,
 #define MIN_UNALOCATED_THRESH   SZ_16M
 static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
int chunkcount, struct device_info *devinfo, int devcount,
-   char *path, unsigned unit_mode)
+   const char *path, unsigned unit_mode)
 {
struct btrfs_ioctl_space_args *sargs = NULL;
int i;
@@ -931,7 +931,7 @@ static void _cmd_filesystem_usage_linear(unsigned unit_mode,
 static int print_filesystem_usage_by_chunk(int fd,
struct chunk_info *chunkinfo, int chunkcount,
struct device_info *devinfo, int devcount,
-   char *path, unsigned unit_mode, int tabular)
+   const char *path, unsigned unit_mode, int tabular)
 {
struct btrfs_ioctl_space_args *sargs;
int ret = 0;
diff --git a/cmds-rescue.c b/cmds-rescue.c
index c40088ad..c61145bc 100644
--- a/cmds-rescue.c
+++ b/cmds-rescue.c
@@ -32,8 +32,8 @@ static const char * const rescue_cmd_group_usage[] = {
NULL
 };
 
-int btrfs_recover_chunk_tree(char *path, int verbose, int yes);
-int btrfs_recover_superblocks(char *path, int verbose, int yes);
+int btrfs_recover_chunk_tree(const char *path, int verbose, int yes);
+int btrfs_recover_superblocks(const char *path, int verbose, int yes);
 
 static const char * const cmd_rescue_chunk_recover_usage[] = {
"btrfs rescue chunk-recover [options] ",
diff --git a/send-utils.c b/send-utils.c
index b5289e76..8ce94de1 100644
--- a/send-utils.c
+++ b/send-utils.c
@@ -28,8 +28,8 @@
 #include "ioctl.h"
 #include "btrfs-list.h"
 
-static int btrfs_subvolid_resolve_sub(int fd, char *path, size_t *path_len,
- u64 subvol_id);
+static int btrfs_subvolid_resolve_sub(int fd, char *path,
+ size_t *path_len, u64 subvol_id);
 
 static int btrfs_get_root_id_by_sub_path(int mnt_fd, const char *sub_path,
 u64 *root_id)
-- 
2.15.1
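
Editorial sketch of what the const qualifier buys here, using a hypothetical helper (last_component_len() is not a btrfs-progs function): a function that only reads a pathname can take const char *, so callers may pass string literals or const buffers without casts, and the compiler rejects accidental writes through the pointer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical read-only path helper: returns the length of the final
 * path component.  The path is never modified, so it is taken as const.
 */
static size_t last_component_len(const char *path)
{
	const char *slash = strrchr(path, '/');

	return strlen(slash ? slash + 1 : path);
}
```

A statement such as path[0] = '\0' inside this function would fail to compile, which is the guarantee the patch encodes in the prototypes.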


[PATCH 7/8] btrfs-progs: qgroups: introduce btrfs_qgroup_query

2018-03-02 Thread jeffm

From: Jeff Mahoney 

The only mechanism we have in the progs for searching qgroups is to load
all of them and filter the results.  This works for qgroup show but
to add quota information to 'btrfs subvolume show' it's pretty wasteful.

This patch splits out setting up the search and performing the search so
we can search for a single qgroupid more easily.

Signed-off-by: Jeff Mahoney 
---
 qgroup.c | 100 +--
 qgroup.h |   7 +
 2 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/qgroup.c b/qgroup.c
index 7ec12ec1..f632a45c 100644
--- a/qgroup.c
+++ b/qgroup.c
@@ -1150,11 +1150,11 @@ static inline void print_status_flag_warning(u64 flags)
warning("qgroup data inconsistent, rescan recommended");
 }
 
-static int __qgroups_search(int fd, struct qgroup_lookup *qgroup_lookup)
+static int __qgroups_search(int fd, struct btrfs_ioctl_search_args *args,
+   struct qgroup_lookup *qgroup_lookup)
 {
int ret;
-   struct btrfs_ioctl_search_args args;
-   struct btrfs_ioctl_search_key *sk = &args.key;
+   struct btrfs_ioctl_search_key *sk = &args->key;
struct btrfs_ioctl_search_header *sh;
unsigned long off = 0;
unsigned int i;
@@ -1165,30 +1165,12 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
u64 qgroupid;
u64 qgroupid1;
 
-   memset(&args, 0, sizeof(args));
-
-   sk->tree_id = BTRFS_QUOTA_TREE_OBJECTID;
-   sk->max_type = BTRFS_QGROUP_RELATION_KEY;
-   sk->min_type = BTRFS_QGROUP_STATUS_KEY;
-   sk->max_objectid = (u64)-1;
-   sk->max_offset = (u64)-1;
-   sk->max_transid = (u64)-1;
-   sk->nr_items = 4096;
-
qgroup_lookup_init(qgroup_lookup);
 
while (1) {
-   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+   ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, args);
if (ret < 0) {
-   if (errno == ENOENT) {
-   error("can't list qgroups: quotas not enabled");
-   ret = -ENOTTY;
-   } else {
-   error("can't list qgroups: %s",
-  strerror(errno));
-   ret = -errno;
-   }
-
+   ret = -errno;
break;
}
 
@@ -1202,14 +1184,14 @@ static int __qgroups_search(int fd, struct 
qgroup_lookup *qgroup_lookup)
 * read the root_ref item it contains
 */
for (i = 0; i < sk->nr_items; i++) {
-   sh = (struct btrfs_ioctl_search_header *)(args.buf +
+   sh = (struct btrfs_ioctl_search_header *)(args->buf +
  off);
off += sizeof(*sh);
 
switch (btrfs_search_header_type(sh)) {
case BTRFS_QGROUP_STATUS_KEY:
si = (struct btrfs_qgroup_status_item *)
-(args.buf + off);
+(args->buf + off);
flags = btrfs_stack_qgroup_status_flags(si);
 
print_status_flag_warning(flags);
@@ -1217,7 +1199,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_INFO_KEY:
qgroupid = btrfs_search_header_offset(sh);
info = (struct btrfs_qgroup_info_item *)
-  (args.buf + off);
+  (args->buf + off);
 
ret = update_qgroup_info(fd, qgroup_lookup,
 qgroupid, info);
@@ -1225,7 +1207,7 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
case BTRFS_QGROUP_LIMIT_KEY:
qgroupid = btrfs_search_header_offset(sh);
limit = (struct btrfs_qgroup_limit_item *)
-   (args.buf + off);
+   (args->buf + off);
 
ret = update_qgroup_limit(fd, qgroup_lookup,
  qgroupid, limit);
@@ -1271,6 +1253,66 @@ static int __qgroups_search(int fd, struct qgroup_lookup 
*qgroup_lookup)
return ret;
 }
 
+static int qgroups_search_all(int fd, struct qgroup_lookup *qgroup_lookup)
+{
+   struct btrfs_ioctl_search_args args = {
+   .key = {
+   .tree_id = BTRFS_QUOTA_TREE_OBJECTID,
+   .max_type = BTRFS_QGROUP_RELATION_KEY,
+   .min_type = BTRFS_QGROU

[PATCH 0/8] btrfs-progs: qgroups usability

2018-03-02 Thread jeffm

From: Jeff Mahoney 

Hi all -

The following series addresses some usability issues with the qgroups UI.

1) Adds -W option so we can wait on a rescan completing without starting one.
2) Adds qgroup information to 'btrfs subvolume show'
3) Adds a -P option to show pathnames for first-level qgroups (or member
   of nested qgroups with -v)
4) Allows exporting the qgroup table in JSON format for use by external
   programs/scripts.

-Jeff

Jeff Mahoney (8):
  btrfs-progs: quota: Add -W option to rescan to wait without starting
rescan
  btrfs-progs: qgroups: fix misleading index check
  btrfs-progs: constify pathnames passed as arguments
  btrfs-progs: qgroups: add pathname to show output
  btrfs-progs: qgroups: introduce and use info and limit structures
  btrfs-progs: qgroups: introduce btrfs_qgroup_query
  btrfs-progs: subvolume: add quota info to btrfs sub show
  btrfs-progs: qgroups: export qgroups usage information as JSON

 Documentation/btrfs-qgroup.asciidoc |   8 +
 Documentation/btrfs-quota.asciidoc  |  10 +-
 Makefile.inc.in |   4 +-
 chunk-recover.c |   4 +-
 cmds-device.c   |   2 +-
 cmds-fi-usage.c |   6 +-
 cmds-qgroup.c   |  49 +++-
 cmds-quota.c|  21 +-
 cmds-rescue.c   |   4 +-
 cmds-subvolume.c|  46 
 configure.ac|   6 +
 kerncompat.h|   1 +
 qgroup.c| 526 ++--
 qgroup.h|  22 +-
 send-utils.c|   4 +-
 utils.c |  22 +-
 utils.h |   2 +
 17 files changed, 621 insertions(+), 116 deletions(-)

-- 
2.15.1


Re: [PATCH] btrfs: Add nossd_spread mount option

2018-03-02 Thread Josef Bacik

On Wed, Feb 21, 2018 at 03:31:40PM -0800, Howard McLauchlan wrote:
> Btrfs has two mount options for SSD optimizations: ssd and ssd_spread.
> Presently there is an option to disable all SSD optimizations, but there
> isn't an option to disable just ssd_spread.
> 
> This patch adds a mount option nossd_spread that disables ssd_spread
> only.
> 
> Signed-off-by: Howard McLauchlan 

Reviewed-by: Josef Bacik 

Thanks,

Josef
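
Editorial sketch of the option semantics described in the quoted commit message, as a user-space toy rather than the kernel's option parser. The flag names OPT_* and apply_option() are hypothetical, and treating ssd_spread as implying ssd is an assumption of this sketch.

```c
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define OPT_SSD		(1u << 0)
#define OPT_SSD_SPREAD	(1u << 1)
#define OPT_NOSSD	(1u << 2)

/* Apply one option name to the flag word; returns -1 for unknown names. */
static int apply_option(unsigned int *opts, const char *name)
{
	if (strcmp(name, "ssd") == 0) {
		*opts |= OPT_SSD;
		*opts &= ~OPT_NOSSD;
	} else if (strcmp(name, "ssd_spread") == 0) {
		/* Assumption: spread mode also turns on the base ssd mode. */
		*opts |= OPT_SSD | OPT_SSD_SPREAD;
		*opts &= ~OPT_NOSSD;
	} else if (strcmp(name, "nossd") == 0) {
		/* Disables all SSD optimizations at once. */
		*opts |= OPT_NOSSD;
		*opts &= ~(OPT_SSD | OPT_SSD_SPREAD);
	} else if (strcmp(name, "nossd_spread") == 0) {
		/* The new option: clears spread only, leaves ssd alone. */
		*opts &= ~OPT_SSD_SPREAD;
	} else {
		return -1;
	}
	return 0;
}
```

With these rules, applying "ssd_spread" and then "nossd_spread" leaves only the base ssd flag set, which is the case the patch makes reachable.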

Re: Ongoing Btrfs stability issues

2018-03-02 Thread Liu Bo

On Thu, Mar 01, 2018 at 09:40:41PM +0200, Nikolay Borisov wrote:
> 
> 
> On  1.03.2018 21:04, Alex Adriaanse wrote:
> > On Feb 16, 2018, at 1:44 PM, Austin S. Hemmelgarn  
> > wrote:
...
> 
> > [496003.641729] BTRFS: error (device xvdc) in __btrfs_free_extent:7076: 
> > errno=-28 No space left
> > [496003.641994] BTRFS: error (device xvdc) in btrfs_drop_snapshot:9332: 
> > errno=-28 No space left
> > [496003.641996] BTRFS info (device xvdc): forced readonly
> > [496003.641998] BTRFS: error (device xvdc) in merge_reloc_roots:2470: 
> > errno=-28 No space left
> > [496003.642060] BUG: unable to handle kernel NULL pointer dereference at
> >(null)
> > [496003.642086] IP: __del_reloc_root+0x3c/0x100 [btrfs]
> > [496003.642087] PGD 8005fe08c067 P4D 8005fe08c067 PUD 3bd2f4067 PMD > > 0
> > [496003.642091] Oops:  [#1] SMP PTI
> > [496003.642093] Modules linked in: xt_nat xt_tcpudp veth ipt_MASQUERADE 
> > nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
> > iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
> > iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c crc32c_generic 
> > br_netfilter bridge stp llc intel_rapl sb_edac crct10dif_pclmul 
> > crc32_pclmul ghash_clmulni_intel ppdev intel_rapl_perf serio_raw parport_pc 
> > parport evdev ip_tables x_tables autofs4 btrfs xor zstd_decompress 
> > zstd_compress xxhash raid6_pq ata_generic crc32c_intel ata_piix libata 
> > xen_blkfront cirrus ttm aesni_intel aes_x86_64 crypto_simd drm_kms_helper 
> > cryptd glue_helper ena psmouse drm scsi_mod i2c_piix4 button
> > [496003.642128] CPU: 1 PID: 25327 Comm: btrfs Tainted: GW   
> > 4.14.0-0.bpo.3-amd64 #1 Debian 4.14.13-1~bpo9+1
> > [496003.642129] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
> > [496003.642130] task: 8fbffb8dd080 task.stack: 9e81c7b8c000
> > [496003.642149] RIP: 0010:__del_reloc_root+0x3c/0x100 [btrfs]
> 
> 
> if you happen to have the vmlinux of that kernel can you run the
> following from the kernel source directory:
> 
> ./scripts/faddr2line  __del_reloc_root+0x3c/0x100 vmlinux
>

I thought this was fixed by commit bb166d7 ("btrfs: fix NULL pointer
dereference from free_reloc_roots()").
Alex, do you mind checking whether it's included in your kernel?

You can also check whether the following change is present in the kernel-src deb.

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 3a49a3c..9841fae 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2400,11 +2400,11 @@ void free_reloc_roots(struct list_head *list)
while (!list_empty(list)) {
reloc_root = list_entry(list->next, struct btrfs_root,
root_list);
+   __del_reloc_root(reloc_root);
free_extent_buffer(reloc_root->node);
free_extent_buffer(reloc_root->commit_root);
reloc_root->node = NULL;
reloc_root->commit_root = NULL;
-   __del_reloc_root(reloc_root);
}
 }


Thanks,

-liubo
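
Editorial sketch of the ordering issue the diff above addresses, as a user-space toy and not the kernel code. The assumption, taken from the diff and the oops in __del_reloc_root, is that the removal helper reads through reloc_root->node, so it has to run before that pointer is cleared. All names here (demo_node, demo_root, index_add(), del_from_index(), teardown_root()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_SLOTS 8

struct demo_node { uint64_t start; };
struct demo_root { struct demo_node *node; };

/* Tiny fixed-size index keyed by node start address. */
static uint64_t index_keys[INDEX_SLOTS];
static int index_used[INDEX_SLOTS];

static int index_add(uint64_t key)
{
	size_t i;

	for (i = 0; i < INDEX_SLOTS; i++) {
		if (!index_used[i]) {
			index_keys[i] = key;
			index_used[i] = 1;
			return 0;
		}
	}
	return -1;
}

static int index_count(void)
{
	size_t i;
	int n = 0;

	for (i = 0; i < INDEX_SLOTS; i++)
		n += index_used[i];
	return n;
}

/*
 * Remove the root's entry from the index.  This dereferences
 * root->node, so it must run while root->node is still valid.
 */
static int del_from_index(const struct demo_root *root)
{
	uint64_t key = root->node->start;
	size_t i;

	for (i = 0; i < INDEX_SLOTS; i++) {
		if (index_used[i] && index_keys[i] == key) {
			index_used[i] = 0;
			return 0;
		}
	}
	return -1;
}

/* Correct order: unlink from the index first, then drop the node. */
static int teardown_root(struct demo_root *root)
{
	int ret = del_from_index(root);

	root->node = NULL;
	return ret;
}
```

Swapping the two statements in teardown_root() would make del_from_index() dereference a NULL pointer, which is the failure mode in the quoted trace.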

> 
> > [496003.642151] RSP: 0018:9e81c7b8fab0 EFLAGS: 00010286
> > [496003.642153] RAX:  RBX: 8fb90a10a3c0 RCX: 
> > ca5d1fda5a5f
> > [496003.642154] RDX: 0001 RSI: 8fc05eae62c0 RDI: 
> > 8fbc4fd87d70
> > [496003.642154] RBP: 8fbbb5139000 R08:  R09: 
> > 
> > [496003.642155] R10: 8fc05eae62c0 R11: 01bc R12: 
> > 8fc0fbeac000
> > [496003.642156] R13: 8fbc4fd87d70 R14: 8fbc4fd87800 R15: 
> > ffe4
> > [496003.642157] FS:  7f64196708c0() GS:8fc100a4() 
> > knlGS:
> > [496003.642159] CS:  0010 DS:  ES:  CR0: 80050033
> > [496003.642160] CR2:  CR3: 00069b972004 CR4: 
> > 001606e0
> > [496003.642162] DR0:  DR1:  DR2: 
> > 
> > [496003.642163] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [496003.642164] Call Trace:
> > [496003.642185]  free_reloc_roots+0x22/0x60 [btrfs]
> > [496003.642202]  merge_reloc_roots+0x184/0x260 [btrfs]
> > [496003.642217]  relocate_block_group+0x29a/0x610 [btrfs]
> > [496003.642232]  btrfs_relocate_block_group+0x17b/0x230 [btrfs]
> > [496003.642254]  btrfs_relocate_chunk+0x38/0xb0 [btrfs]
> > [496003.642272]  btrfs_balance+0xa15/0x1250 [btrfs]
> > [496003.642292]  btrfs_ioctl_balance+0x368/0x380 [btrfs]
> > [496003.642309]  btrfs_ioctl+0x1170/0x24e0 [btrfs]
> > [496003.642312]  ? mem_cgroup_try_charge+0x86/0x1a0
> > [496003.642315]  ? __handle_mm_fault+0x640/0x10e0
> > [496003.642318]  ? do_vfs_ioctl+0x9f/0x600
> > [496003.642319]  do_vfs_ioctl+0x9f/0x600
> > [496003.642321]  ? handle_mm_fault+0xc6/0x1b0
> > [496003.642325]  ? __do_page_fault+0x289/0x500
> > [496003.642327]  SyS_ioctl+0x74/0x80
> > [496003.642330]  system_call_fast_compare_end+0xc/0x6f
> > [496003.642332] RIP: 0033:0x7f64186f8e07
> > [4960

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)" with 8TB HDDs

2018-03-02 Thread Menion

Thanks
My point was to understand whether this action was taken by BTRFS or
autonomously by SCSI.
From your reply it seems clear to me that this message should be logged at
KERN_DEBUG level instead of KERN_NOTICE.
Bye

2018-03-02 16:18 GMT+01:00 David Sterba :
> On Fri, Mar 02, 2018 at 12:37:49PM +0100, Menion wrote:
>> Is it really a no problem? I mean, for some reason BTRFS is
>> continuously read the HDD capacity in an array, that does not seem to
>> be really correct
>
> The message comes from SCSI:
> https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508
>
> Reading drive capacity could be totally opaque for the filesystem, eg.
> when the scsi layer compares the requested block address with the device
> size.
>
> The sizes of blockdevices is obtained from the i_size member of the
> inode representing the block device, so there's no direct read by btrfs.
> You'd have better luck reporting that to scsi or block layer
> mailinglists.

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)" with 8TB HDDs

2018-03-02 Thread David Sterba

On Fri, Mar 02, 2018 at 12:37:49PM +0100, Menion wrote:
> Is it really a no problem? I mean, for some reason BTRFS is
> continuously read the HDD capacity in an array, that does not seem to
> be really correct

The message comes from SCSI:
https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508

Reading drive capacity could be totally opaque for the filesystem, e.g.
when the SCSI layer compares the requested block address with the device
size.

The size of a block device is obtained from the i_size member of the
inode representing the block device, so there's no direct read by btrfs.
You'd have better luck reporting that to the SCSI or block layer
mailing lists.
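
Editorial sketch of why an 8TB drive ends up on the READ CAPACITY(16) path at all. READ CAPACITY(10) returns the last logical block address in a 32-bit field, and the value 0xFFFFFFFF signals that the capacity does not fit. The helper name below is hypothetical and the code only does the arithmetic; it does not talk to any device.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a device of the given size cannot report its last
 * LBA through the 32-bit field of READ CAPACITY(10).  The last LBA is
 * blocks - 1, and 0xFFFFFFFF is reserved as the "too big" marker, so
 * anything above 0xFFFFFFFF blocks needs the 16-byte command.
 */
static bool needs_read_capacity_16(uint64_t capacity_bytes,
				   uint32_t logical_block_size)
{
	uint64_t blocks;

	if (logical_block_size == 0)
		return false;	/* invalid input, nothing to decide */

	blocks = capacity_bytes / logical_block_size;
	return blocks > UINT64_C(0xFFFFFFFF);
}
```

With 512-byte logical blocks the limit is 2 TiB, so every member of a 5x8TB array crosses it, while smaller disks such as the 232.89GiB ones in the other array do not.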

Re: [PATCH 2/5] btrfs: Always limit inline extent size by uncompressed size

2018-03-02 Thread Qu Wenruo



On 2018-03-02 19:00, Filipe Manana wrote:
> On Fri, Mar 2, 2018 at 10:54 AM, Qu Wenruo  wrote:
>>
>>
>> On 2018年03月02日 18:46, Filipe Manana wrote:
>>> On Fri, Mar 2, 2018 at 5:22 AM, Qu Wenruo  wrote:
>>>> Normally when specifying max_inline, we should normally limit it by
>>>> uncompressed extent size, as it's the only thing user can control.
>>>
>>> Why does it matter that users can control it? Will they write less (or
>>> more) data to files because stuff won't get inlined?
>>> Why do they care about stuff getting inlined or not? That's an
>>> implementation detail of btrfs to speed up access to file data and
>>> save some space.
>>
>> Then why we still have max_inline mount option?
> 
> My comment was about deciding based on which size to make the decision
> (compressed vs uncompressed).

It's the same thing: we have given users an option to trigger the behavior,
so we should give them a *predictable* option to modify it.

Not something confusing like the current max_inline.

Either we give users both max_inline and max_inline_compressed, or both
cases follow max_inline.

Thanks,
Qu
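
Editorial sketch of the two rules being debated, reduced to the last condition of the quoted hunk. It is not the kernel function: the other checks in cow_file_range_inline() are left out, and the relation "data_len is the compressed size when compression happened, otherwise the raw length" is an assumption based on the variable names in the diff. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum limit_basis {
	LIMIT_BY_STORED_SIZE,		/* compare data_len (current code) */
	LIMIT_BY_UNCOMPRESSED_SIZE,	/* compare inline_len (the patch) */
};

/* Returns true if the extent passes the max_inline size check. */
static bool fits_max_inline(uint64_t inline_len, uint64_t compressed_size,
			    uint64_t max_inline, enum limit_basis basis)
{
	/* Assumed: stored size is the compressed size if there is one. */
	uint64_t data_len = compressed_size ? compressed_size : inline_len;
	uint64_t checked = (basis == LIMIT_BY_UNCOMPRESSED_SIZE) ?
			   inline_len : data_len;

	return checked <= max_inline;
}
```

For example, with max_inline=2048 a 4000-byte extent that compresses to 1000 bytes passes under the stored-size rule and fails under the uncompressed-size rule. Uncompressed data gives the same answer under both rules.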

> 
>> Just do everything we *think* is best is good enough in that case.
>>
>> If we provide that mount option to allow *user* to specify the behavior,
>> then allow then to do the same control.
>>
>> Thanks,
>> Qu
>>
>>>
>>>> (Control the algorithm and compressed data is almost impossible)
>>>>
>>>> Since btrfs is providing *TRANSPARENT* compression, max_inline should
>>>> behave the same for both plain and compress data.
>>>
>>> Taking away the benefits of compression for. So now some cases that
>>> ended up getting the benefits of inlining won't get them anymore.
>>>
>>> I don't agree with this change.
>>
>>
>>>

>>>> So this patch will use @inline_len instead of @data_len in
>>>> cow_file_range_inline() so user will know their max_inline mount option
>>>> works exactly the same for both plain and compressed data extent.
>>>>
>>>> Signed-off-by: Qu Wenruo 
>>>> ---
>>>>  fs/btrfs/inode.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>>> index e1a7f3cb5be9..48472509239b 100644
>>>> --- a/fs/btrfs/inode.c
>>>> +++ b/fs/btrfs/inode.c
>>>> @@ -303,7 +303,7 @@ static noinline int cow_file_range_inline(struct 
>>>> btrfs_root *root,
>>>> (!compressed_size &&
>>>> (actual_end & (fs_info->sectorsize - 1)) == 0) ||
>>>> end + 1 < isize ||
>>>> -   data_len > fs_info->max_inline) {
>>>> +   inline_len > fs_info->max_inline) {
>>>> return 1;
>>>> }
>>>>
>>>> --
>>>> 2.16.2
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>
> 
> 
> 




Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)" with 8TB HDDs

2018-03-02 Thread Menion

Is it really not a problem? I mean, for some reason BTRFS is
continuously reading the HDD capacity in the array, which does not seem
to be really correct
Bye

2018-02-26 11:07 GMT+01:00 Menion :
> Hi all
> I have recently started to operate an array of 5x8TB HDD (WD RED) in RAID5 
> mode
> The array seems to work ok, but with the time the dmesg is flooded by this 
> log:
>
> [ 338.674673] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [ 338.767184] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [  338.989477] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [  339.301194] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
> [  339.506579] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
> [  649.393340] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [  650.129849] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [  650.379622] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [  650.524828] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
> [  650.721615] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
> [  959.544384] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [  959.627015] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [  959.790280] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [  959.901179] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
> [  960.048734] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
>
> sda,sdb,sdc,sdd,sde as you can imagine are the HDDs in the array
>
> Other info (note: there is also another single BTRFS array of 3 small
> device that never print this log and my root filesystem is BTRFS as
> well)
>
> menion@Menionubuntu:/etc$ uname -a
> Linux Menionubuntu 4.15.5-041505-generic #201802221031 SMP Thu Feb 22
> 15:32:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> menion@Menionubuntu:/etc$   btrfs --version
> btrfs-progs v4.15.1
> menion@Menionubuntu:/etc$ sudo btrfs fi show
> [sudo] password for menion:
> Label: none  uuid: 6db4baf7-fda8-41ac-a6ad-1ca7b083430f
> Total devices 1 FS bytes used 9.02GiB
> devid1 size 27.07GiB used 11.02GiB path /dev/mmcblk0p3
>
> Label: none  uuid: 931d40c6-7cd7-46f3-a4bf-61f3a53844bc
> Total devices 5 FS bytes used 5.47TiB
> devid1 size 7.28TiB used 1.37TiB path /dev/sda
> devid2 size 7.28TiB used 1.37TiB path /dev/sdb
> devid3 size 7.28TiB used 1.37TiB path /dev/sdc
> devid4 size 7.28TiB used 1.37TiB path /dev/sdd
> devid5 size 7.28TiB used 1.37TiB path /dev/sde
>
> Label: none  uuid: ba1e0d88-2e26-499d-8fe3-458b9c53349a
> Total devices 3 FS bytes used 534.50GiB
> devid1 size 232.89GiB used 102.03GiB path /dev/sdh
> devid2 size 232.89GiB used 102.00GiB path /dev/sdi
> devid3 size 465.76GiB used 335.03GiB path /dev/sdj
>
> menion@Menionubuntu:/etc$ sudo btrfs fi df /media/storage/das1
> Data, RAID5: total=5.49TiB, used=5.46TiB
> System, RAID5: total=12.75MiB, used=352.00KiB
> Metadata, RAID5: total=7.00GiB, used=6.11GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> menion@Menionubuntu:/etc$

Re: [PATCH 2/5] btrfs: Always limit inline extent size by uncompressed size

2018-03-02 Thread Filipe Manana

On Fri, Mar 2, 2018 at 10:54 AM, Qu Wenruo  wrote:
>
>
> On 2018年03月02日 18:46, Filipe Manana wrote:
>> On Fri, Mar 2, 2018 at 5:22 AM, Qu Wenruo  wrote:
>>> Normally when specifying max_inline, we should normally limit it by
>>> uncompressed extent size, as it's the only thing user can control.
>>
>> Why does it matter that users can control it? Will they write less (or
>> more) data to files because stuff won't get inlined?
>> Why do they care about stuff getting inlined or not? That's an
>> implementation detail of btrfs to speed up access to file data and
>> save some space.
>
> Then why do we still have the max_inline mount option?

My comment was about which size the decision should be based on
(compressed vs. uncompressed).

> Just doing whatever we *think* is best would be good enough in that case.
>
> If we provide that mount option to let the *user* specify the behavior,
> then allow them the same control.
>
> Thanks,
> Qu
>
>>
>>> (Controlling the algorithm and the compressed data is almost impossible)
>>>
>>> Since btrfs is providing *TRANSPARENT* compression, max_inline should
>>> behave the same for both plain and compressed data.
>>
>> Taking away the benefits of compression for. So now some cases that
>> ended up getting the benefits of inlining won't get them anymore.
>>
>> I don't agree with this change.
>
>
>>
>>>
>>> So this patch will use @inline_len instead of @data_len in
>>> cow_file_range_inline() so user will know their max_inline mount option
>>> works exactly the same for both plain and compressed data extent.
>>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>  fs/btrfs/inode.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index e1a7f3cb5be9..48472509239b 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -303,7 +303,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
>>>             (!compressed_size &&
>>>             (actual_end & (fs_info->sectorsize - 1)) == 0) ||
>>>             end + 1 < isize ||
>>> -           data_len > fs_info->max_inline) {
>>> +           inline_len > fs_info->max_inline) {
>>>                 return 1;
>>>         }
>>>
>>> --
>>> 2.16.2
>>>
>>
>>
>>
>



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Re: [PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-02 Thread Qu Wenruo



On 2018-03-02 16:37, Nikolay Borisov wrote:
> 
> 
> On  2.03.2018 10:34, Qu Wenruo wrote:
>>
>>
>> On 2018-03-02 16:21, Misono, Tomohiro wrote:
>>> On 2018/03/02 14:22, Qu Wenruo wrote:
>>>> Btrfs shows max_inline option into kernel message, but for
>>>> max_inline=4096, btrfs won't really inline 4096 bytes inline data if
>>>> it's not compressed.
>>>
>>> Hello,
>>> I have a question.
>>>
>>> man mount(8) says: 
>>>max_inline=bytes
>>>   Specify  the  maximum  amount  of space, in bytes, that can be
>>>   inlined in a metadata B-tree leaf.  The value is specified  in
>>>   bytes,  optionally with a K, M, or G suffix, case insensitive.
>>>   In practice, this value is limited by the  root  sector  size,
>>>   with  some  space  unavailable  due to leaf headers.  For a 4k
>>>   sectorsize, max inline data is ~3900 bytes.
>>>
>>> So, is the size of 4k-(size of leaf header) actually the maximum value
>>> of max_inline instead of 4095 for 4k sectorsize?
>>
>> Not exactly.
>>
>> For a 4K nodesize, max_inline would be 3960 bytes,
>> as the leaf header and EXTENT_ITEM header take extra bytes.
>>
>> For a 16K nodesize (the default), we can go up to 4095 bytes.
>>
>> And that man page needs updating, as it should say 4K *nodesize*.
> 
> Actually, Qu, what is preventing btrfs_drop_extents from dropping inline
> extents larger than pagesize? This is why you are doing this patchset, right?

In fact, the current kernel code won't create an inline extent that
crosses a page boundary for a regular file.

So the kernel itself won't try to call __btrfs_drop_extents() inside an
inline extent.

(For a symbolic link we can still create such a large inline extent,
but we won't use __btrfs_drop_extents() in that case.)

What I'm doing in this patchset is fixing the confusing behavior related
to inline extents, from the different behavior between plain and
compressed extents to the larger-than-one-page inline extents used for
symbolic links.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks,
>>>
>>>
>>
> 




Re: [PATCH 2/5] btrfs: Always limit inline extent size by uncompressed size

2018-03-02 Thread Qu Wenruo



On 2018-03-02 18:46, Filipe Manana wrote:
> On Fri, Mar 2, 2018 at 5:22 AM, Qu Wenruo  wrote:
>> When specifying max_inline, we should normally limit it by the
>> uncompressed extent size, as it's the only thing the user can control.
> 
> Why does it matter that users can control it? Will they write less (or
> more) data to files because stuff won't get inlined?
> Why do they care about stuff getting inlined or not? That's an
> implementation detail of btrfs to speed up access to file data and
> save some space.

Then why do we still have the max_inline mount option?
Just doing whatever we *think* is best would be good enough in that case.

If we provide that mount option to let the *user* specify the behavior,
then allow them the same control.

Thanks,
Qu

> 
>> (Controlling the algorithm and the compressed data is almost impossible)
>>
>> Since btrfs is providing *TRANSPARENT* compression, max_inline should
>> behave the same for both plain and compressed data.
> 
> Taking away the benefits of compression for. So now some cases that
> ended up getting the benefits of inlining won't get them anymore.
> 
> I don't agree with this change.


> 
>>
>> So this patch will use @inline_len instead of @data_len in
>> cow_file_range_inline() so user will know their max_inline mount option
>> works exactly the same for both plain and compressed data extent.
>>
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/inode.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index e1a7f3cb5be9..48472509239b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -303,7 +303,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
>>             (!compressed_size &&
>>             (actual_end & (fs_info->sectorsize - 1)) == 0) ||
>>             end + 1 < isize ||
>> -           data_len > fs_info->max_inline) {
>> +           inline_len > fs_info->max_inline) {
>>                 return 1;
>>         }
>>
>> --
>> 2.16.2
>>
> 
> 
> 




Re: [PATCH 2/5] btrfs: Always limit inline extent size by uncompressed size

2018-03-02 Thread Filipe Manana

On Fri, Mar 2, 2018 at 5:22 AM, Qu Wenruo  wrote:
> When specifying max_inline, we should normally limit it by the
> uncompressed extent size, as it's the only thing the user can control.

Why does it matter that users can control it? Will they write less (or
more) data to files because stuff won't get inlined?
Why do they care about stuff getting inlined or not? That's an
implementation detail of btrfs to speed up access to file data and
save some space.

> (Controlling the algorithm and the compressed data is almost impossible)
>
> Since btrfs is providing *TRANSPARENT* compression, max_inline should
> behave the same for both plain and compressed data.

Taking away the benefits of compression for. So now some cases that
ended up getting the benefits of inlining won't get them anymore.

I don't agree with this change.
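
For readers following the disagreement, here is a minimal Python sketch of the single comparison the patch changes. It is an illustration, not kernel code. It assumes that data_len is the compressed size when the extent is compressed and the uncompressed length otherwise, while inline_len is always the uncompressed length. The helper name exceeds_max_inline and the sample sizes are hypothetical, and the other conditions in the hunk quoted below are left out.

```python
def exceeds_max_inline(inline_len, compressed_size, max_inline, patched):
    """Return True if the max_inline comparison alone would reject inlining.

    inline_len:      uncompressed length of the data, in bytes
    compressed_size: compressed length in bytes, or 0 if not compressed
    max_inline:      value of the max_inline mount option, in bytes
    patched:         True compares the uncompressed length (the patch),
                     False compares data_len (the behavior it replaces)
    """
    # Assumption: data_len is the compressed size when compression was used,
    # otherwise it equals the uncompressed length.
    data_len = compressed_size if compressed_size else inline_len
    checked_len = inline_len if patched else data_len
    return checked_len > max_inline


# Hypothetical case: 3000 bytes of data that compress to 1000 bytes,
# with max_inline=2048.
before = exceeds_max_inline(3000, 1000, 2048, patched=False)
after = exceeds_max_inline(3000, 1000, 2048, patched=True)
print(before, after)  # False True
```

In this made-up case the extent passes the comparison before the patch and fails it afterwards, which is the loss of inlining being objected to here. For uncompressed data the two variants give the same answer.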

>
> So this patch will use @inline_len instead of @data_len in
> cow_file_range_inline() so user will know their max_inline mount option
> works exactly the same for both plain and compressed data extent.
>
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e1a7f3cb5be9..48472509239b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -303,7 +303,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
>             (!compressed_size &&
>             (actual_end & (fs_info->sectorsize - 1)) == 0) ||
>             end + 1 < isize ||
> -           data_len > fs_info->max_inline) {
> +           inline_len > fs_info->max_inline) {
>                 return 1;
>         }
>
> --
> 2.16.2
>



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Re: [PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-02 Thread Nikolay Borisov



On  2.03.2018 10:34, Qu Wenruo wrote:
> 
> 
> On 2018-03-02 16:21, Misono, Tomohiro wrote:
>> On 2018/03/02 14:22, Qu Wenruo wrote:
>>> Btrfs shows max_inline option into kernel message, but for
>>> max_inline=4096, btrfs won't really inline 4096 bytes inline data if
>>> it's not compressed.
>>
>> Hello,
>> I have a question.
>>
>> man mount(8) says: 
>>max_inline=bytes
>>   Specify  the  maximum  amount  of space, in bytes, that can be
>>   inlined in a metadata B-tree leaf.  The value is specified  in
>>   bytes,  optionally with a K, M, or G suffix, case insensitive.
>>   In practice, this value is limited by the  root  sector  size,
>>   with  some  space  unavailable  due to leaf headers.  For a 4k
>>   sectorsize, max inline data is ~3900 bytes.
>>
>> So, is the size of 4k-(size of leaf header) actually the maximum value
>> of max_inline instead of 4095 for 4k sectorsize?
> 
> Not exactly.
> 
> For a 4K nodesize, max_inline would be 3960 bytes,
> as the leaf header and EXTENT_ITEM header take extra bytes.
> 
> For a 16K nodesize (the default), we can go up to 4095 bytes.
> 
> And that man page needs updating, as it should say 4K *nodesize*.

Actually, Qu, what is preventing btrfs_drop_extents from dropping inline
extents larger than pagesize? This is why you are doing this patchset, right?

> 
> Thanks,
> Qu
> 
>>
>> Thanks,
>>
>>
> 

Re: [PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-02 Thread Qu Wenruo



On 2018-03-02 16:21, Misono, Tomohiro wrote:
> On 2018/03/02 14:22, Qu Wenruo wrote:
>> Btrfs shows max_inline option into kernel message, but for
>> max_inline=4096, btrfs won't really inline 4096 bytes inline data if
>> it's not compressed.
> 
> Hello,
> I have a question.
> 
> man mount(8) says: 
>max_inline=bytes
>   Specify  the  maximum  amount  of space, in bytes, that can be
>   inlined in a metadata B-tree leaf.  The value is specified  in
>   bytes,  optionally with a K, M, or G suffix, case insensitive.
>   In practice, this value is limited by the  root  sector  size,
>   with  some  space  unavailable  due to leaf headers.  For a 4k
>   sectorsize, max inline data is ~3900 bytes.
> 
> So, is the size of 4k-(size of leaf header) actually the maximum value
> of max_inline instead of 4095 for 4k sectorsize?

Not exactly.

For a 4K nodesize, max_inline would be 3960 bytes,
as the leaf header and EXTENT_ITEM header take extra bytes.

For a 16K nodesize (the default), we can go up to 4095 bytes.

And that man page needs updating, as it should say 4K *nodesize*.

Thanks,
Qu

> 
> Thanks,
> 
> 




Re: [PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-02 Thread Nikolay Borisov



On  2.03.2018 10:21, Misono, Tomohiro wrote:
> On 2018/03/02 14:22, Qu Wenruo wrote:
>> Btrfs shows max_inline option into kernel message, but for
>> max_inline=4096, btrfs won't really inline 4096 bytes inline data if
>> it's not compressed.
> 
> Hello,
> I have a question.
> 
> man mount(8) says: 
>max_inline=bytes
>   Specify  the  maximum  amount  of space, in bytes, that can be
>   inlined in a metadata B-tree leaf.  The value is specified  in
>   bytes,  optionally with a K, M, or G suffix, case insensitive.
>   In practice, this value is limited by the  root  sector  size,
>   with  some  space  unavailable  due to leaf headers.  For a 4k
>   sectorsize, max inline data is ~3900 bytes.
> 
> So, is the size of 4k-(size of leaf header) actually the maximum value
> of max_inline instead of 4095 for 4k sectorsize?

I think the documentation is wrong. Without patch 3/5 we have the max
inline data size as:

BTRFS_MAX_ITEM_SIZE(info) - BTRFS_FILE_EXTENT_INLINE_DATA_START

so MAX_ITEM_SIZE  = nodesize - sizeof(btrfs_header) -
sizeof(btrfs_item).  So this gives us the data portion in the leaf.

So if we substitute the raw number we get:

16k - 101 - 25 = 16258 bytes.

From this number we also subtract the offset of disk_bytenr in
btrfs_file_extent_item (which is 21). So we end up with
MAX_INLINE_DATA_SIZE of 16258 - 21 = 16237

With Qu's patch the min_t will always be taking 4095 as the
MAX_INLINE_DATA_SIZE.
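
The arithmetic above can be reproduced with a short Python sketch. The three constants are the byte counts given in this message; the function names are made up for illustration, and the `sectorsize - 1` cap is only an assumption chosen to match the 4095 figure mentioned for Qu's patch with a 4K sector size.

```python
BTRFS_HEADER_SIZE = 101   # sizeof(btrfs_header), as quoted above
BTRFS_ITEM_SIZE = 25      # sizeof(btrfs_item), as quoted above
INLINE_DATA_START = 21    # offset of disk_bytenr in btrfs_file_extent_item


def max_item_size(nodesize):
    """Data portion of a leaf that holds a single item."""
    return nodesize - BTRFS_HEADER_SIZE - BTRFS_ITEM_SIZE


def max_inline_data_size(nodesize):
    """Largest inline payload that fits in one leaf item."""
    return max_item_size(nodesize) - INLINE_DATA_START


def capped_inline_data_size(nodesize, sectorsize):
    """Assumed effect of the min_t: the limit never reaches a full sector."""
    return min(max_inline_data_size(nodesize), sectorsize - 1)


nodesize = 16 * 1024
print(max_item_size(nodesize))                  # 16258
print(max_inline_data_size(nodesize))           # 16237
print(capped_inline_data_size(nodesize, 4096))  # 4095
```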

> 
> Thanks,
> 
> 

Re: [PATCH 5/5] btrfs: Show more accurate max_inline

2018-03-02 Thread Misono, Tomohiro

On 2018/03/02 14:22, Qu Wenruo wrote:
> Btrfs shows max_inline option into kernel message, but for
> max_inline=4096, btrfs won't really inline 4096 bytes inline data if
> it's not compressed.

Hello,
I have a question.

man mount(8) says: 
   max_inline=bytes
  Specify  the  maximum  amount  of space, in bytes, that can be
  inlined in a metadata B-tree leaf.  The value is specified  in
  bytes,  optionally with a K, M, or G suffix, case insensitive.
  In practice, this value is limited by the  root  sector  size,
  with  some  space  unavailable  due to leaf headers.  For a 4k
  sectorsize, max inline data is ~3900 bytes.

So, is the size of 4k-(size of leaf header) actually the maximum value
of max_inline instead of 4095 for 4k sectorsize?

Thanks,


Re: [PATCH 0/5] max_inline related enhancement

2018-03-02 Thread Nikolay Borisov



On  2.03.2018 07:22, Qu Wenruo wrote:
> This patchset intends to reduce confusion about "max_inline" mount
> option.
> 
> The max_inline mount option has the following problems:
> 
> 1) Different behavior for plain and compressed data extents
>    For a plain data extent, it limits the extent data size, which will
>    never reach the sector size.
>    For a compressed data extent, it limits the compressed data size,
>    which can reach the sector size.
> 
>    The compressed behavior is very confusing for normal users, as it's
>    almost impossible for an end user to know whether their operation will
>    end up inlined or not.
> 
> 2) Inaccurate max inline output
>    When max_inline=4096 is passed, the kernel reports max_inline as 4096,
>    but we still don't allow an inline plain data extent to reach 4096.
> 
> 3) Symbolic links can exceed the sector size for their inlined data
>    This is because btrfs_symlink() calls BTRFS_MAX_INLINE_DATA_SIZE()
>    directly without extra truncation.
> 
> This patchset fixes these problems by:
> 
> 1) Limiting both plain and compressed inline extent size by the
>    uncompressed data size
>    So users know exactly what will end up on disk, just by checking the
>    data size.
> 
> 2) Limiting the max inline size output to BTRFS_MAX_INLINE_DATA_SIZE()
>    rather than the sector size.
> 
> 3) Embedding the sector size check into BTRFS_MAX_INLINE_DATA_SIZE()
>    So now btrfs_symlink() won't create any inline extent larger than
>    the page size.
>    (This only affects later operations; such existing symbolic links
>    can still be read.)
> 
> Qu Wenruo (5):
>   btrfs: Parse options after node/sector size initialized
>   btrfs: Always limit inline extent size by uncompressed size
>   btrfs: Embed sector size check into BTRFS_MAX_INLINE_DATA_SIZE()
>   btrfs: Unify inline extent creation condition for plain and compressed
> data
>   btrfs: Show more accurate max_inline
> 
>  fs/btrfs/ctree.h   |  5 +++--
>  fs/btrfs/disk-io.c | 13 +++--
>  fs/btrfs/inode.c   |  5 +
>  fs/btrfs/super.c   |  4 ++--
>  4 files changed, 13 insertions(+), 14 deletions(-)

The series looks good, so:

Reviewed-by: Nikolay Borisov 



> 
