[PATCH] Btrfs: save us a mutex_lock usage when doing quota rescan

2013-05-06 Thread Wang Shilong
If the qgroup_rescan worker is in progress, we should ignore any extent
that the qgroup_rescan worker has not dealt with yet and let it be handled
later; otherwise we may get wrong qgroup accounting.

However, we have already checked this before find_all_roots(), without the
spin_lock. When doing qgroup accounting, we don't have to check it again,
because during this period the qgroup_rescan worker can only have dealt with
more extents and qgroup_rescan_progress.objectid can only grow, so the check
here is unnecessary.

Just remove this check, so that we don't need to hold qgroup_rescan_lock
when doing qgroup accounting.

Signed-off-by: Wang Shilong 
---
 fs/btrfs/qgroup.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d059d86..2710784 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1445,15 +1445,7 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
if (ret < 0)
return ret;
 
-   mutex_lock(&fs_info->qgroup_rescan_lock);
spin_lock(&fs_info->qgroup_lock);
-   if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
-   if (fs_info->qgroup_rescan_progress.objectid <= node->bytenr) {
-   ret = 0;
-   goto unlock;
-   }
-   }
-
quota_root = fs_info->quota_root;
if (!quota_root)
goto unlock;
@@ -1492,7 +1484,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
 
 unlock:
spin_unlock(&fs_info->qgroup_lock);
-   mutex_unlock(&fs_info->qgroup_rescan_lock);
ulist_free(roots);
 
return ret;
-- 
1.7.7.6




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
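
The argument in the commit message above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `rescan_model`, `must_defer()` and `advance()` are hypothetical names, and the model assumes, as the commit message does, that the rescan progress only moves forward between the two checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rescan state kept in fs_info. */
struct rescan_model {
	bool running;       /* models BTRFS_QGROUP_STATUS_FLAG_RESCAN */
	uint64_t progress;  /* models qgroup_rescan_progress.objectid */
};

/* The check the patch removes: defer extents the worker has not reached. */
static bool must_defer(const struct rescan_model *r, uint64_t bytenr)
{
	return r->running && r->progress <= bytenr;
}

/* The worker only ever moves forward. */
static void advance(struct rescan_model *r, uint64_t new_progress)
{
	assert(new_progress >= r->progress);
	r->progress = new_progress;
}
```

Once `must_defer()` has returned false for a given `bytenr`, no later call to `advance()` can make it return true again, which is why a second check would be redundant under these assumptions.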


[PATCH] Btrfs: fix passing wrong arg gfp_t to decide the correct allocation mode

2013-05-06 Thread Wang Shilong
If you look at the code carefully, you will see that all tree_mod_alloc()
callers have to use GFP_ATOMIC. However, the original code passes the wrong
gfp_t argument in some places. This doesn't cause any problems, because
tree_mod_alloc() ignores the gfp_t argument and just uses GFP_ATOMIC directly,
but it is not good.

However, I think we should try our best not to allocate with GFP_ATOMIC, so
I keep the gfp_t argument there in the hope that we can change the allocation
mode in the future.

Signed-off-by: Wang Shilong 
---
 fs/btrfs/ctree.c |   37 ++---
 1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index de6de8e..33c9061 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 * once we switch from spin locks to something different, we should
 * honor the flags parameter here.
 */
-   tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
+   tm = *tm_ret = kzalloc(sizeof(*tm), flags);
if (!tm)
return -ENOMEM;
 
@@ -591,14 +591,14 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
 static noinline int
 tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int slot,
-enum mod_log_op op, gfp_t flags)
+enum mod_log_op op)
 {
int ret;
 
if (tree_mod_dont_log(fs_info, eb))
return 0;
 
-   ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
+   ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
 
tree_mod_log_write_unlock(fs_info);
return ret;
@@ -608,7 +608,7 @@ static noinline int
 tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
int slot, enum mod_log_op op)
 {
-   return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS);
+   return tree_mod_log_insert_key_mask(fs_info, eb, slot, op);
 }
 
 static noinline int
@@ -616,13 +616,13 @@ tree_mod_log_insert_key_locked(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int slot,
 enum mod_log_op op)
 {
-   return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_NOFS);
+   return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
 }
 
 static noinline int
 tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int dst_slot, int src_slot,
-int nr_items, gfp_t flags)
+int nr_items)
 {
struct tree_mod_elem *tm;
int ret;
@@ -642,7 +642,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
BUG_ON(ret < 0);
}
 
-   ret = tree_mod_alloc(fs_info, flags, &tm);
+   ret = tree_mod_alloc(fs_info, GFP_ATOMIC, &tm);
if (ret < 0)
goto out;
 
@@ -679,7 +679,7 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 static noinline int
 tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 struct extent_buffer *old_root,
-struct extent_buffer *new_root, gfp_t flags,
+struct extent_buffer *new_root,
 int log_removal)
 {
struct tree_mod_elem *tm;
@@ -691,7 +691,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
if (log_removal)
__tree_mod_log_free_eb(fs_info, old_root);
 
-   ret = tree_mod_alloc(fs_info, flags, &tm);
+   ret = tree_mod_alloc(fs_info, GFP_ATOMIC, &tm);
if (ret < 0)
goto out;
 
@@ -809,19 +809,18 @@ tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
 {
int ret;
ret = tree_mod_log_insert_move(fs_info, dst, dst_offset, src_offset,
-  nr_items, GFP_NOFS);
+  nr_items);
BUG_ON(ret < 0);
 }
 
 static noinline void
 tree_mod_log_set_node_key(struct btrfs_fs_info *fs_info,
- struct extent_buffer *eb, int slot, int atomic)
+ struct extent_buffer *eb, int slot)
 {
int ret;
 
ret = tree_mod_log_insert_key_mask(fs_info, eb, slot,
-  MOD_LOG_KEY_REPLACE,
-  atomic ? GFP_ATOMIC : GFP_NOFS);
+  MOD_LOG_KEY_REPLACE);
BUG_ON(ret < 0);
 }
 
@@ -843,7 +842,7 @@ tree_mod_log_set_root_pointer(struct btrfs_root *root,
 {
int ret;
ret = tree_mod_log_insert_root(root->fs_info, root->node,
-  new_root_node, GFP_NOFS, log_removal);
+  new_root_node, log_removal);
   

Re: [PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread Jan Schmidt
On Mon, May 06, 2013 at 23:20 (+0200), David Sterba wrote:
> On Mon, May 06, 2013 at 09:14:17PM +0200, Jan Schmidt wrote:
>> --- a/include/uapi/linux/btrfs.h
>> +++ b/include/uapi/linux/btrfs.h
>> @@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
>> struct btrfs_ioctl_quota_rescan_args)
>>  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>> struct btrfs_ioctl_quota_rescan_args)
>> +#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
> 
> Why do you need an ioctl when the same can be achieved by polling the
> RESCAN_STATUS value? The code does not do anything special that has to be
> done within the kernel.

It's because I don't like polling :-) A rescan can take hours to complete, and
you wouldn't like to see one ioctl per second for such a period either, I guess.
(Plus: Everybody would lose like .9 seconds for each run of the xfstest I'm
writing - accumulates to ages at least!)

If you're worried about ioctl numbers, we could turn it into flags for
BTRFS_IOC_QUOTA_RESCAN, but I don't see that we're short on ioctl numbers yet.
The reason why I chose a separate ioctl is that it is more like an attach
operation, which supports both specifying it when starting a fresh scan and
waiting for a scan that's already running. I find it more intuitive to have it
separate.

-Jan
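
A userspace caller of the proposed blocking ioctl might look roughly like the following. This is only a sketch: `wait_for_quota_rescan()` is a hypothetical helper, the request number is copied from the quoted hunk, and 0x94 is assumed to be the value of BTRFS_IOCTL_MAGIC in the btrfs uapi header.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Request number as in the quoted hunk; the magic value is an assumption. */
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

/*
 * Hypothetical helper: block until a running rescan finishes.
 * Returns 0 on success or a negative errno value on failure.
 */
static int wait_for_quota_rescan(int fd)
{
	if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT) < 0)
		return -errno;
	return 0;
}
```

The caller makes a single system call and sleeps in the kernel until the rescan is done, instead of issuing one status ioctl per polling interval.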


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Jan Schmidt
On Mon, May 06, 2013 at 22:29 (+0200), Kai Krakow wrote:
> Jan Schmidt wrote:
> 
>> That one should be fixed in btrfs-next. If you can reliably reproduce the
>> bug I'd be glad to get a confirmation - you can probably even save putting
>> it on bugzilla then ;-)
> 
> I can reliably reproduce it from two different approaches. I'd like to only 
> apply the commits fixing it. Can you name them here?

In git log order:

6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
ef9120b1 Btrfs: fix tree mod log regression on root split operations
2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
50723551 Btrfs: fix unlock after free on rewinded tree blocks

The commit ids are from josef's master branch
(git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is
known not to be very stable regarding commit ids.

Thanks,
-Jan

>>> [snip]


Re: [RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread Tomasz Torcz
On Mon, May 06, 2013 at 10:36:03PM +0200, Kai Krakow wrote:
> zwu.ker...@gmail.com wrote:
> 
> >   The patchset is trying to introduce hot relocation support
> > for BTRFS. In hybrid storage environment, when the data in
> > HDD disk get hot, it can be relocated to SSD disk by BTRFS
> > hot relocation support automatically; also, if SSD disk ratio
> > exceed its upper threshold, the data which get cold can be
> > looked up and relocated to HDD disk to make more space in SSD
> > disk at first, and then the data which get hot will be relocated
> > to SSD disk automatically.
> 
> How will it compare to bcache? I'm currently thinking about buying an SSD,
> but bcache requires some effort to migrate the storage. And after
> all that hassle I am not even sure it would work easily with a
> dracut-generated initramfs.

  On a side note: dm-cache, which is already in-kernel, does not need the
backing storage to be reformatted.

-- 
Tomasz Torcz                Only gods can safely risk perfection,
xmpp: zdzich...@chrome.pl   it's a dangerous thing for a man.  -- Alia



Re: [PATCH v3] btrfs: clean snapshots one by one

2013-05-06 Thread Chris Mason
Quoting David Sterba (2013-03-12 11:13:28)
> Each time pick one dead root from the list and let the caller know if
> it's needed to continue. This should improve responsiveness during
> umount and balance which at some point waits for cleaning all currently
> queued dead roots.
> 
> A new dead root is added to the end of the list, so the snapshots
> disappear in the order of deletion.
> 
> The snapshot cleaning work is now done only from the cleaner thread and the
> others wake it if needed.


> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 988b860..4de2351 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1690,15 +1690,19 @@ static int cleaner_kthread(void *arg)
> struct btrfs_root *root = arg;
>  
> do {
> +   int again = 0;
> +
> if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> +   down_read_trylock(&root->fs_info->sb->s_umount) &&
> mutex_trylock(&root->fs_info->cleaner_mutex)) {
> btrfs_run_delayed_iputs(root);
> -   btrfs_clean_old_snapshots(root);
> +   again = btrfs_clean_one_deleted_snapshot(root);
> mutex_unlock(&root->fs_info->cleaner_mutex);
> btrfs_run_defrag_inodes(root->fs_info);
> +   up_read(&root->fs_info->sb->s_umount);

Can we use just the cleaner mutex for this?  We're deadlocking during
068 with autodefrag on because the cleaner is holding s_umount while
autodefrag is trying to bump the writer count.

If unmount takes the cleaner mutex once it should wait long enough for
the cleaner to stop.

-chris



[PATCH] btrfs-progs: fix typecast when printing csum value

2013-05-06 Thread David Sterba
Only the first byte of the wanted csum is printed:

checksum verify failed on 65536 found DA97CF61 wanted 6B
checksum verify failed on 65536 found DA97CF61 wanted 6BC3870D

Also add leading zeros to the format.

Signed-off-by: David Sterba 
---
 disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index b001e35..21b410d 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -89,9 +89,9 @@ int csum_tree_block_size(struct extent_buffer *buf, u16 csum_size,
 
if (verify) {
if (memcmp_extent_buffer(buf, result, 0, csum_size)) {
-   printk("checksum verify failed on %llu found %X "
-  "wanted %X\n", (unsigned long long)buf->start,
-  *((int *)result), *((char *)buf->data));
+   printk("checksum verify failed on %llu found %08X "
+  "wanted %08X\n", (unsigned long long)buf->start,
+  *((u32 *)result), *((u32*)(char *)buf->data));
free(result);
return 1;
}
-- 
1.8.2
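
The effect of the cast can be reproduced outside of btrfs-progs. The sketch below is not the btrfs-progs code: `format_first_byte()` and `format_u32()` are hypothetical helpers that mimic the old and the new printk arguments on a plain byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Old behaviour: the (char *) cast reads one byte of the stored checksum. */
static void format_first_byte(const unsigned char *data, char *out, size_t len)
{
	assert(len > 0);
	snprintf(out, len, "%X", (unsigned int)data[0]);
}

/* Fixed behaviour: read all four bytes and pad to eight hex digits. */
static void format_u32(const unsigned char *data, char *out, size_t len)
{
	uint32_t csum;

	assert(len > 0);
	memcpy(&csum, data, sizeof(csum)); /* avoids an unaligned load */
	snprintf(out, len, "%08X", (unsigned int)csum);
}
```

With a buffer holding the 32-bit value 0x6BC3870D, `format_u32()` yields the full "6BC3870D", while `format_first_byte()` yields at most two hex digits; which byte it shows depends on the byte order of the machine.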



[PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Gabriel de Perthuis
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occurring before any search result is
found would trigger an overflow and stop the search entirely.

Cc: sta...@vger.kernel.org
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641

Signed-off-by: Gabriel de Perthuis 
---
(resent, with the correct header to have stable copied)

 fs/btrfs/ioctl.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2c02310..f49b62f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1794,23 +1794,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
for (i = slot; i < nritems; i++) {
item_off = btrfs_item_ptr_offset(leaf, i);
item_len = btrfs_item_size_nr(leaf, i);
 
-   if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+   btrfs_item_key_to_cpu(leaf, key, i);
+   if (!key_in_sk(key, sk))
+   continue;
+
+   if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
item_len = 0;
 
if (sizeof(sh) + item_len + *sk_offset >
BTRFS_SEARCH_ARGS_BUFSIZE) {
ret = 1;
goto overflow;
}
 
-   btrfs_item_key_to_cpu(leaf, key, i);
-   if (!key_in_sk(key, sk))
-   continue;
-
sh.objectid = key->objectid;
sh.offset = key->offset;
sh.type = key->type;
sh.len = item_len;
sh.transid = found_transid;
-- 
1.8.2.1.419.ga0b97c6
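
The reordered loop can be modelled in a few lines of standalone C. This is a simplified sketch, not the kernel function: `model_item`, `copy_results()`, `BUF_SIZE` and `HDR_SIZE` are hypothetical stand-ins for the leaf items, copy_to_sk(), BTRFS_SEARCH_ARGS_BUFSIZE and sizeof(sh), and the sizes are chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SIZE 4096u /* hypothetical stand-in for BTRFS_SEARCH_ARGS_BUFSIZE */
#define HDR_SIZE 32u   /* hypothetical stand-in for sizeof(sh) */

struct model_item {
	bool wanted; /* result of key_in_sk() for this item */
	size_t len;  /* item payload length */
};

/*
 * Count how many results are copied with the patched ordering:
 * filter by key first, then drop oversized payloads, then check space.
 */
static size_t copy_results(const struct model_item *items, size_t n)
{
	size_t used = 0, found = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = items[i].len;

		if (!items[i].wanted)
			continue; /* a skipped item can never end the search */
		if (HDR_SIZE + len > BUF_SIZE)
			len = 0;  /* header only, the payload cannot fit at all */
		if (HDR_SIZE + len + used > BUF_SIZE)
			break;    /* buffer full, the caller resumes from here */
		used += HDR_SIZE + len;
		found++;
	}
	return found;
}
```

In this model a large item that does not match the search key is skipped before any size check, and a matching item whose header plus payload exceeds the buffer is returned with a zero-length payload instead of ending the search.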



Re: [PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread David Sterba
On Mon, May 06, 2013 at 09:14:17PM +0200, Jan Schmidt wrote:
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
>  struct btrfs_ioctl_quota_rescan_args)
>  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
>  struct btrfs_ioctl_quota_rescan_args)
> +#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

Why do you need an ioctl when the same can be achieved by polling the
RESCAN_STATUS value? The code does not do anything special that has to be
done within the kernel.

david


[PATCH] btrfs-progs: image: handle superblocks correctly on fs with big blocks

2013-05-06 Thread David Sterba
The superblock is always 4k, but metadata blocks may be larger. We have to
use the appropriate block size when computing checksums, otherwise they're
wrong.

Signed-off-by: David Sterba 
---
 btrfs-image.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/btrfs-image.c b/btrfs-image.c
index 188291c..dca7a28 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -469,6 +469,16 @@ static int read_data_extent(struct metadump_struct *md,
return 0;
 }
 
+static int is_sb_offset(u64 offset) {
+   switch (offset) {
+   case 65536:
+   case 67108864:
+   case 274877906944:
+   return 1;
+   }
+   return 0;
+}
+
 static int flush_pending(struct metadump_struct *md, int done)
 {
struct async_work *async = NULL;
@@ -506,7 +516,16 @@ static int flush_pending(struct metadump_struct *md, int done)
}
 
while (!md->data && size > 0) {
-   eb = read_tree_block(md->root, start, blocksize, 0);
+   /*
+* We must differentiate between superblock and
+* metadata on filesystems with blocksize > 4k,
+* otherwise the checksum fails for superblock
+*/
+   int bs = blocksize;
+
+   if (is_sb_offset(start))
+   bs = BTRFS_SUPER_INFO_SIZE;
+   eb = read_tree_block(md->root, start, bs, 0);
if (!eb) {
free(async->buffer);
free(async);
@@ -516,9 +535,9 @@ static int flush_pending(struct metadump_struct *md, int done)
}
copy_buffer(async->buffer + offset, eb);
free_extent_buffer(eb);
-   start += blocksize;
-   offset += blocksize;
-   size -= blocksize;
+   start += bs;
+   offset += bs;
+   size -= bs;
}
 
md->pending_start = (u64)-1;
-- 
1.8.2
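
The three constants in is_sb_offset() are the byte offsets of the primary superblock and its two mirrors. They can also be derived rather than listed, as in the sketch below. The names `sb_offset()`, `SUPER_INFO_OFFSET`, `SUPER_MIRROR_SHIFT` and `SUPER_MIRROR_MAX` are local to this example, and the formula is an assumption modelled on how the btrfs headers compute mirror offsets; it does reproduce the three values used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  (64ULL * 1024) /* primary superblock at 64 KiB */
#define SUPER_MIRROR_SHIFT 12
#define SUPER_MIRROR_MAX   3

/* Byte offset of superblock copy `mirror` (0 is the primary copy). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	assert(mirror >= 0 && mirror < SUPER_MIRROR_MAX);
	if (mirror)
		return start << (SUPER_MIRROR_SHIFT * mirror);
	return SUPER_INFO_OFFSET;
}

/* Same result as the switch statement in the patch, derived instead of listed. */
static int is_sb_offset(uint64_t offset)
{
	for (int i = 0; i < SUPER_MIRROR_MAX; i++)
		if (sb_offset(i) == offset)
			return 1;
	return 0;
}
```

Here `sb_offset(0)`, `sb_offset(1)` and `sb_offset(2)` evaluate to 65536, 67108864 and 274877906944, the same values as the case labels above.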



Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread Kai Krakow
james northrup wrote:

> tried a git based backup? sounds spot-on as a compromise prior to
> applying btrfs tweaks.  snapshotting the git binaries would have the
> dedupe characteristics.

Git is efficient with space, yes. But if you have a lot of binary files, and 
a lot of them are big, git becomes really slow really fast. Checking out and 
in can be very slow and resource intensive then. And I don't think it would 
track ownership and permissions correctly.

Git is great, it's an everyday tool for me, but it is just not made for 
binary files.

Regards,
Kai



Re: [RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread Kai Krakow
zwu.ker...@gmail.com wrote:

>   The patchset is trying to introduce hot relocation support
> for BTRFS. In hybrid storage environment, when the data in
> HDD disk get hot, it can be relocated to SSD disk by BTRFS
> hot relocation support automatically; also, if SSD disk ratio
> exceed its upper threshold, the data which get cold can be
> looked up and relocated to HDD disk to make more space in SSD
> disk at first, and then the data which get hot will be relocated
> to SSD disk automatically.

How will it compare to bcache? I'm currently thinking about buying an SSD,
but bcache requires some effort to migrate the storage. And after
all that hassle I am not even sure it would work easily with a
dracut-generated initramfs.

Bcache seems to be quite clever with its approach. This one looks completely
different and more targeted at relocating data which is used often instead of
trying to reduce head movement. I'm quite happy with the throughput of my 3x
HDD btrfs pool (according to bootchart up to 600 MB/s during boot). A single
SSD would be slower since head movement seems not to be the issue during
boot. Will this patch relocate such data? Or does it try to relocate only
data which requires random head movement?

Thanks,
Kai



Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Kai Krakow
Jan Schmidt wrote:

> That one should be fixed in btrfs-next. If you can reliably reproduce the
> bug I'd be glad to get a confirmation - you can probably even save putting
> it on bugzilla then ;-)

I can reliably reproduce it from two different approaches. I'd like to only 
apply the commits fixing it. Can you name them here?

>> 4,1072,17508258745,-;[ cut here ]
>> 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
>> 4,1074,17508258791,-;invalid opcode:  [#1] SMP
>> 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O)
>> vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib
>> snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev
>> coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core
>> lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
>> 4,1076,17508258966,-;CPU 0
>> 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G
>> C O 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68
>> Pro3
>> 4,1078,17508259023,-;RIP: 0010:[]  []
>> __tree_mod_log_rewind+0x4c/0x121
>> 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
>> 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX:
>> 880196671888
>> 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI:
>> 8804087be700
>> 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09:
>> 880196671898
>> 4,1083,17508259165,-;R10:  R11:  R12:
>> 880406c2e000
>> 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15:
>> 0001
>> 4,1085,17508259218,-;FS:  ()
>> GS:88041f20() knlGS:
>> 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
>> 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4:
>> 000407f0
>> 4,1088,17508259297,-;DR0:  DR1:  DR2:
>> 
>> 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7:
>> 0400
>> 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo
>> 88019667, task 8801b82e5400)
>> 4,1091,17508259383,-;Stack:
>> 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000
>> 8a11
>> 4,1093,17508259423,-; 8802d0a14000 81167606 0246
>> 8801ee8d33b0
>> 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360
>> 
>> 4,1095,17508259488,-;Call Trace:
>> 4,1096,17508259500,-; [] ?
>> btrfs_search_old_slot+0x543/0x61e
>> 4,1097,17508259526,-; [] ?
>> btrfs_next_old_leaf+0x8a/0x332 4,1098,17508259552,-; []
>> ? __resolve_indirect_refs+0x2d8/0x408
>> 4,1099,17508259578,-; [] ?
>> find_parent_nodes+0x9c1/0xcec 4,1100,17508259602,-; []
>> ? iterate_extent_inodes+0xf1/0x23c
>> 4,1101,17508259628,-; [] ?
>> btrfs_real_readdir+0x482/0x482 4,1102,17508259652,-; []
>> ? release_extent_buffer.isra.19+0x27/0x88
>> 4,1103,17508259679,-; [] ?
>> btrfs_real_readdir+0x482/0x482 4,1104,17508259703,-; []
>> ? iterate_inodes_from_logical+0x89/0x96
>> 4,1105,17508259729,-; [] ?
>> record_extent_backrefs+0x4d/0x8e
>> 4,1106,17508259755,-; [] ?
>> btrfs_finish_ordered_io+0x671/0x798
>> 4,1107,17508259781,-; [] ? worker_loop+0x176/0x493
>> 4,1108,17508259803,-; [] ?
>> btrfs_queue_worker+0x272/0x272 4,1109,17508259827,-; []
>> ? btrfs_queue_worker+0x272/0x272 4,1110,17508259852,-;
>> [] ? kthread+0x81/0x89 4,,17508259873,-;
>> [] ? free_sched_groups+0x32/0x50 4,1112,17508259896,-;
>> [] ? kthread_freezable_should_stop+0x36/0x36
>> 4,1113,17508259924,-; [] ? ret_from_fork+0x7c/0xb0
>> 4,1114,17508259947,-; [] ?
>> kthread_freezable_should_stop+0x36/0x36
>> 4,1115,17508259974,-;Code: 85 e4 89 c5 0f 85 d6 00 00 00 e9 db 00 00 00
>> 41 83 7e 28 05 0f 87 ab 00 00 00 41 8b 46 28 ff 24 c5 20 78 62 81 41 39
>> 6e 2c 73 02 <0f> 0b 41 8b 56 2c 49 8d 76 38 48 89 df ff c5 e8 7c fb ff ff
>> 49
>> 1,1116,17508260117,-;RIP  []
>> __tree_mod_log_rewind+0x4c/0x121
>> 4,1117,17508260144,-; RSP 
>> 4,1118,17508446926,-;---[ end trace e7a8cddfc052e9e9 ]---




Re: [PATCH] Btrfs: fix off-by-one in fiemap

2013-05-06 Thread Josef Bacik
On Wed, May 01, 2013 at 10:23:41AM -0600, Liu Bo wrote:
> lock_extent/unlock_extent expect an exclusive end.
> 

Can you make an xfstest for this so we can make sure we don't screw this up in
the future?  Thanks,

Josef


Re: [PATCH v2] btrfs: device delete to get errors from the kernel

2013-05-06 Thread Josef Bacik
On Tue, Apr 30, 2013 at 07:19:40AM -0600, Anand Jain wrote:
> v1->v2:
> introduce error codes for the device mgmt usage
> 
> v1:
> adds a parameter in the ioctl arg struct to carry the error string
> 
> Signed-off-by: Anand Jain 
> ---

I need a proper log for this patch.  Thanks,

Josef


[PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread Jan Schmidt
btrfs_qgroup_wait_for_completion waits until the currently running qgroup
operation completes. It returns immediately when no rescan process is in
progress. This is useful to automate things around the rescan process (e.g.
testing).

Signed-off-by: Jan Schmidt 
---
 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8624f49..39ca0d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1607,6 +1607,7 @@ struct btrfs_fs_info {
struct mutex qgroup_rescan_lock; /* protects the progress item */
struct btrfs_key qgroup_rescan_progress;
struct btrfs_workers qgroup_rescan_workers;
+   struct completion qgroup_rescan_completion;
 
/* filesystem state */
unsigned long fs_state;
@@ -3836,6 +3837,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5e93bb8..9161660 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3937,6 +3937,16 @@ static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user *arg)
return ret;
 }
 
+static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
+{
+   struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   return btrfs_qgroup_wait_for_completion(root->fs_info);
+}
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
void __user *arg)
 {
@@ -4179,6 +4189,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_quota_rescan(file, argp);
case BTRFS_IOC_QUOTA_RESCAN_STATUS:
return btrfs_ioctl_quota_rescan_status(file, argp);
+   case BTRFS_IOC_QUOTA_RESCAN_WAIT:
+   return btrfs_ioctl_quota_rescan_wait(file, argp);
case BTRFS_IOC_DEV_REPLACE:
return btrfs_ioctl_dev_replace(root, argp);
case BTRFS_IOC_GET_FSLABEL:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..ebca17a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2068,6 +2068,8 @@ out:
} else {
pr_err("btrfs: qgroup scan failed with %d\n", err);
}
+
+   complete_all(&fs_info->qgroup_rescan_completion);
 }
 
 static void
@@ -2108,6 +2110,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
memset(&fs_info->qgroup_rescan_progress, 0,
sizeof(fs_info->qgroup_rescan_progress));
+   init_completion(&fs_info->qgroup_rescan_completion);
 
/* clear all current qgroup tracking information */
for (n = rb_first(&fs_info->qgroup_tree); n; n = rb_next(n)) {
@@ -2124,3 +2127,21 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
return 0;
 }
+
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info)
+{
+   int running;
+   int ret = 0;
+
+   mutex_lock(&fs_info->qgroup_rescan_lock);
+   spin_lock(&fs_info->qgroup_lock);
+   running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+   spin_unlock(&fs_info->qgroup_lock);
+   mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+   if (running)
+   ret = wait_for_completion_interruptible(
+   &fs_info->qgroup_rescan_completion);
+
+   return ret;
+}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ef0df5..5b683b5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1



[PATCH 2/2] Btrfs-progs: added "btrfs quota rescan" -w switch (wait)

2013-05-06 Thread Jan Schmidt
With -w one can wait for a rescan operation to finish. It can be used when
starting a rescan operation or later to wait for the currently running
rescan operation to finish. Waiting is interruptible.

Signed-off-by: Jan Schmidt 
---
 cmds-quota.c |   19 +--
 ioctl.h  |1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/cmds-quota.c b/cmds-quota.c
index 1169772..6557e83 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -90,10 +90,11 @@ static int cmd_quota_disable(int argc, char **argv)
 }
 
 static const char * const cmd_quota_rescan_usage[] = {
-   "btrfs quota rescan [-s] ",
+   "btrfs quota rescan [-sw] ",
"Trash all qgroup numbers and scan the metadata again with the current 
config.",
"",
"-s   show status of a running rescan operation",
+   "-w   wait for rescan operation to finish (can be already in progress)",
NULL
 };
 
@@ -105,21 +106,30 @@ static int cmd_quota_rescan(int argc, char **argv)
char *path = NULL;
struct btrfs_ioctl_quota_rescan_args args;
int ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
+   int wait_for_completion = 0;
 
optind = 1;
while (1) {
-   int c = getopt(argc, argv, "s");
+   int c = getopt(argc, argv, "sw");
if (c < 0)
break;
switch (c) {
case 's':
ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
break;
+   case 'w':
+   wait_for_completion = 1;
+   break;
default:
usage(cmd_quota_rescan_usage);
}
}
 
+   if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
+   fprintf(stderr, "ERROR: -w cannot be used with -s\n");
+   return 12;
+   }
+
if (check_argc_exact(argc - optind, 1))
usage(cmd_quota_rescan_usage);
 
@@ -134,6 +144,11 @@ static int cmd_quota_rescan(int argc, char **argv)
 
ret = ioctl(fd, ioctlnum, &args);
e = errno;
+
+   if (wait_for_completion && (ret == 0 || e == EINPROGRESS)) {
+   ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT, &args);
+   e = errno;
+   }
close(fd);
 
if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN) {
diff --git a/ioctl.h b/ioctl.h
index abe6dd4..c260bbf 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -529,6 +529,7 @@ struct btrfs_ioctl_clone_range_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1



[PATCH 1/2] Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args

2013-05-06 Thread Jan Schmidt
The patch set previously sent was sent together with the kernel part, but
was not updated as I added some reserved bytes to the ioctl struct for
future compatibility. This fixes struct btrfs_ioctl_quota_rescan_args.

Signed-off-by: Jan Schmidt 
---
 ioctl.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 1ee631a..abe6dd4 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -429,6 +429,7 @@ struct btrfs_ioctl_quota_ctl_args {
 struct btrfs_ioctl_quota_rescan_args {
__u64   flags;
__u64   progress;
+   __u64   reserved[6];
 };
 
 struct btrfs_ioctl_qgroup_assign_args {
-- 
1.7.1
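
Since the _IOR/_IOW macros encode the argument size into the ioctl number, user space and kernel have to agree on the struct layout. A minimal C11 sketch of the patched layout follows; struct quota_rescan_args_sketch and reserved_is_zero() are hypothetical names, uint64_t stands in for __u64, and the zero check illustrates a common convention rather than anything this patch contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the patched layout; uint64_t stands in for __u64. */
struct quota_rescan_args_sketch {
    uint64_t flags;
    uint64_t progress;
    uint64_t reserved[6];
};

/* 2 live fields + 6 reserved slots = 8 * 8 bytes. */
static_assert(sizeof(struct quota_rescan_args_sketch) == 64,
              "ioctl argument size is part of the ABI");

/* Common convention (not shown in the patch): callers zero reserved slots. */
static int reserved_is_zero(const struct quota_rescan_args_sketch *a)
{
    size_t i;

    for (i = 0; i < sizeof(a->reserved) / sizeof(a->reserved[0]); i++)
        if (a->reserved[i] != 0)
            return 0;
    return 1;
}
```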



Btrfs: wait for quota rescan to complete

2013-05-06 Thread Jan Schmidt
Two small patches, one for the kernel and one for user mode. Both are
required to support waiting for quota rescan to complete.

Jan Schmidt (1):
  Btrfs: add ioctl to wait for qgroup rescan completion

 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)


Jan Schmidt (2):
  Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args
  Btrfs-progs: added "btrfs quota rescan" -w switch (wait)

 cmds-quota.c |   19 +--
 ioctl.h  |2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)


Re: [PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Greg KH
On Mon, May 06, 2013 at 07:40:18PM +0200, Gabriel de Perthuis wrote:
> The search ioctl skips items that are too large for a result buffer, but
> inline items of a certain size occurring before any search result is
> found would trigger an overflow and stop the search entirely.
> 
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641
> 
> Signed-off-by: Gabriel de Perthuis 
> ---
>  fs/btrfs/ioctl.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.




[PATCH] btrfs: don't stop searching after encountering the wrong item

2013-05-06 Thread Gabriel de Perthuis
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occurring before any search result is
found would trigger an overflow and stop the search entirely.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641

Signed-off-by: Gabriel de Perthuis 
---
 fs/btrfs/ioctl.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 95d46cc..b3f0276 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1797,23 +1797,23 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
for (i = slot; i < nritems; i++) {
item_off = btrfs_item_ptr_offset(leaf, i);
item_len = btrfs_item_size_nr(leaf, i);
 
-   if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
+   btrfs_item_key_to_cpu(leaf, key, i);
+   if (!key_in_sk(key, sk))
+   continue;
+
+   if (sizeof(sh) + item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
item_len = 0;
 
if (sizeof(sh) + item_len + *sk_offset >
BTRFS_SEARCH_ARGS_BUFSIZE) {
ret = 1;
goto overflow;
}
 
-   btrfs_item_key_to_cpu(leaf, key, i);
-   if (!key_in_sk(key, sk))
-   continue;
-
sh.objectid = key->objectid;
sh.offset = key->offset;
sh.type = key->type;
sh.len = item_len;
sh.transid = found_transid;
-- 
1.8.2.1.419.ga0b97c6
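
A toy model of the reordering in copy_to_sk() may make the failure mode easier to see. This is a sketch under stated assumptions, not the kernel code: BUFSIZE, HDRSIZE, struct toy_item and toy_copy_to_sk() are made-up names, and the sizes are shrunk so the arithmetic is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

#define BUFSIZE 64  /* toy stand-in for BTRFS_SEARCH_ARGS_BUFSIZE */
#define HDRSIZE 8   /* toy stand-in for sizeof(sh), the search header */

struct toy_item {
    size_t len;   /* item payload size */
    int wanted;   /* nonzero if the key matches the search */
};

/*
 * Returns the number of results emitted; *overflow is set when the scan
 * stopped because the buffer could not take the next result.
 * fixed == 0 models the old ordering, fixed != 0 the patched one.
 */
static size_t toy_copy_to_sk(const struct toy_item *items, size_t n,
                             int fixed, int *overflow)
{
    size_t used = 0, found = 0, i;

    assert(overflow != NULL);
    *overflow = 0;
    for (i = 0; i < n; i++) {
        size_t len = items[i].len;

        if (fixed) {
            /* Patched: filter by key first, then size-check header + item. */
            if (!items[i].wanted)
                continue;
            if (HDRSIZE + len > BUFSIZE)
                len = 0;
        } else if (len > BUFSIZE) {
            /* Old: the size check ignored the header. */
            len = 0;
        }

        if (HDRSIZE + len + used > BUFSIZE) {
            *overflow = 1;
            break;
        }

        /* Old: the key filter ran only after the overflow check. */
        if (!fixed && !items[i].wanted)
            continue;

        used += HDRSIZE + len;
        found++;
    }
    return found;
}
```

With a 60-byte item that does not match followed by a 16-byte item that does, the old ordering reports an overflow before emitting anything, while the patched ordering skips the first item and returns the second.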



Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread james northrup
tried a git based backup? sounds spot-on as a compromise prior to
applying btrfs tweaks.  snapshotting the git binaries would have the
dedupe characteristics.

On Mon, May 6, 2013 at 12:44 AM, Kai Krakow  wrote:
> Jan Schmidt  schrieb:
>
>>> I'm using an bash/rsync script[1] to backup my whole system on a nightly
>>> basis to an attached USB3 drive into a scratch area, then take a snapshot
>>> of this area. I'd like to have these snapshots immutable, so they should
>>> be read-only.
>>
>> Have you considered using btrfs send / receive for that purpose? You would
>> just save the dedup step.
>
> This is planned for later. In the first step I want to stay as file system
> agnostic for the source as possible. But I've put it on my todo list in the
> gist.
>
> Regards,
> Kai
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] btrfs: Introduce extent_read_full_page_nolock()

2013-05-06 Thread David Sterba
On Tue, Apr 16, 2013 at 03:15:34PM -0700, Mark Fasheh wrote:
> @@ -2625,7 +2625,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
>   }
>  
>   end = page_end;
> - while (1) {
> + while (1 && !parent_locked) {

the patch is ok, just this caught my eye :)


Re: [PATCH 4/4] btrfs: offline dedupe

2013-05-06 Thread David Sterba
On Tue, Apr 16, 2013 at 03:15:35PM -0700, Mark Fasheh wrote:
> +static void btrfs_double_lock(struct inode *inode1, u64 loff1,
> +   struct inode *inode2, u64 loff2, u64 len)
> +{
> + if (inode1 < inode2) {
> + mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> + mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> + lock_extent_range(inode1, loff1, len);
> + lock_extent_range(inode2, loff2, len);
> + } else {
> + mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> + mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> + lock_extent_range(inode2, loff2, len);
> + lock_extent_range(inode1, loff1, len);
> + }

You can decrease the code size by swapping just the pointers.
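
One way the pointer-swap suggestion could look, as a user-space sketch only: struct toy_inode, toy_lock_range() and the lock log are hypothetical stand-ins for the inode mutexes and lock_extent_range(), and the offsets are swapped together with the inodes so each range stays paired with its file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inode { int id; };

/* Records the order in which locks are taken, for demonstration only. */
struct lock_event { int id; uint64_t off; };
static struct lock_event lock_log[2];
static size_t lock_log_len;

static void toy_lock_range(struct toy_inode *inode, uint64_t off)
{
    assert(lock_log_len < 2);
    lock_log[lock_log_len].id = inode->id;
    lock_log[lock_log_len].off = off;
    lock_log_len++;
}

/*
 * Order the pair by address once, then run a single lock sequence
 * instead of duplicating it in both branches.
 */
static void toy_double_lock(struct toy_inode *inode1, uint64_t loff1,
                            struct toy_inode *inode2, uint64_t loff2)
{
    if ((uintptr_t)inode1 > (uintptr_t)inode2) {
        struct toy_inode *tmp_inode = inode1;
        uint64_t tmp_off = loff1;

        inode1 = inode2;
        inode2 = tmp_inode;
        loff1 = loff2;
        loff2 = tmp_off;
    }
    /* In the patch: i_mutex with I_MUTEX_PARENT, then I_MUTEX_CHILD. */
    toy_lock_range(inode1, loff1);
    toy_lock_range(inode2, loff2);
}
```

Whichever order the caller passes the two inodes in, the lower address is locked first, which is the property the original two-branch version provides.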

> +}
> +
> +static long btrfs_ioctl_file_extent_same(struct file *file,
> +  void __user *argp)
> +{
> + struct btrfs_ioctl_same_args *args;
> + struct btrfs_ioctl_same_args tmp;
> + struct btrfs_ioctl_same_extent_info *info;
> + struct inode *src = file->f_dentry->d_inode;
> + struct file *dst_file = NULL;
> + struct inode *dst;
> + u64 off;
> + u64 len;
> + int args_size;
> + int i;
> + int ret;
> + u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
> +
> + if (copy_from_user(&tmp,
> +(struct btrfs_ioctl_same_args __user *)argp,
> +sizeof(tmp)))
> + return -EFAULT;
> +
> + args_size = sizeof(tmp) + (tmp.total_files *
> + sizeof(struct btrfs_ioctl_same_extent_info));
> +
> + /* Keep size of ioctl argument sane */
> + if (args_size > PAGE_CACHE_SIZE)
> + return -ENOMEM;

Using E2BIG (7, /* Argument list too long */)
makes more sense to me, it's not really an ENOMEM condition.

> +
> + args = kmalloc(args_size, GFP_NOFS);
> + if (!args)
> + return -ENOMEM;

(like here)

> +
> + ret = -EFAULT;
> + if (BTRFS_I(dst)->root != BTRFS_I(src)->root) {
> + printk(KERN_ERR "btrfs: cannot dedup across subvolumes"
> +" %lld\n", info->fd);
> + goto next;
> + }
...
> + info->status = btrfs_extent_same(src, off, len, dst,
> +  info->logical_offset);
> + if (info->status == 0) {
> + info->bytes_deduped = len;
> + args->files_deduped++;
> + } else {
> + printk(KERN_ERR "error %d from btrfs_extent_same\n",

missing "btrfs:" prefix

> + info->status);
> + }
> +next:

> --- a/fs/btrfs/ioctl.h
> +++ b/fs/btrfs/ioctl.h
> +/* For extent-same ioctl */
> +struct btrfs_ioctl_same_extent_info {
> + __s64 fd;   /* in - destination file */
> + __u64 logical_offset;   /* in - start of extent in destination */
> + __u64 bytes_deduped;/* out - total # of bytes we were able
> +  * to dedupe from this file */
> + /* status of this dedupe operation:
> +  * 0 if dedup succeeds
> +  * < 0 for error
> +  * == BTRFS_SAME_DATA_DIFFERS if data differs
> +  */
> + __s32 status;   /* out - see above description */
> + __u32 reserved;
> +};
> +
> +struct btrfs_ioctl_same_args {
> + __u64 logical_offset;   /* in - start of extent in source */
> + __u64 length;   /* in - length of extent */
> + __u16 total_files;  /* in - total elements in info array */
> + __u16 files_deduped;/* out - number of files that got deduped */
> + __u32 reserved;

Please add a few more reserved bytes here; we may want to enhance the
call with some fine tunables or extended status. This is an external
interface, so we don't need to count every byte here, and the extra room
makes minor future enhancements easier.

> + struct btrfs_ioctl_same_extent_info info[0];
> +};
> +
>  struct btrfs_ioctl_space_info {
>   __u64 flags;
>   __u64 total_bytes;
> @@ -498,5 +523,6 @@ struct btrfs_ioctl_send_args {
> struct btrfs_ioctl_get_dev_stats)
>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
>   struct btrfs_ioctl_dev_replace_args)
> -
> +#define BTRFS_IOC_FILE_EXTENT_SAME _IOWR(BTRFS_IOCTL_MAGIC, 54, \
> +  struct btrfs_ioctl_same_args)

Feel free to claim the ioctl number at

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Development_notes.2C_please_read
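
The argument-size check with the E2BIG suggestion from the review above applied could be sketched as follows (C11). The struct mirrors, TOY_PAGE_SIZE and same_args_size() are assumptions made for illustration; the kernel code uses PAGE_CACHE_SIZE and the __u64/__s64 types quoted in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* User-space mirrors of the quoted structs (fixed-width stand-ins). */
struct same_extent_info_sketch {
    int64_t  fd;
    uint64_t logical_offset;
    uint64_t bytes_deduped;
    int32_t  status;
    uint32_t reserved;
};

struct same_args_sketch {
    uint64_t logical_offset;
    uint64_t length;
    uint16_t total_files;
    uint16_t files_deduped;
    uint32_t reserved;
    struct same_extent_info_sketch info[];
};

static_assert(sizeof(struct same_extent_info_sketch) == 32,
              "per-file record is expected to be 32 bytes");

/* Returns the total argument size, or -E2BIG when it exceeds one page. */
static long same_args_size(uint16_t total_files)
{
    size_t size = sizeof(struct same_args_sketch) +
                  (size_t)total_files * sizeof(struct same_extent_info_sketch);

    if (size > TOY_PAGE_SIZE)
        return -E2BIG;
    return (long)size;
}
```

Under these assumptions 127 destination files fit into one page (24 + 127 * 32 = 4088 bytes) and 128 do not, so the caller gets a size-related error rather than one that suggests memory pressure.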


[PATCH] Btrfs: introduce qgroup_ulist to avoid frequently allocating/freeing ulist

2013-05-06 Thread Wang Shilong
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time
we want to walk the qgroup tree.

By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free()
once. This reduces the sys time spent allocating memory; see the measurements
below.

fsstress -p 4 -n 1 -d $dir

With this patch:

real    0m50.153s
user    0m0.081s
sys     0m6.294s

real    0m51.113s
user    0m0.092s
sys     0m6.220s

real    0m52.610s
user    0m0.096s
sys     0m6.125s        avg 6.213

Without the patch:

real    0m54.825s
user    0m0.061s
sys     0m10.665s

real    1m6.401s
user    0m0.089s
sys     0m11.218s

real    1m13.768s
user    0m0.087s
sys     0m10.665s       avg 10.849

We can see that the sys time is reduced by ~43%.
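
The allocate-once, reinitialise-per-use pattern can be illustrated in user space. This is a simplified sketch, not the kernel ulist: struct toy_ulist and the toy_* helpers are hypothetical, and toy_ulist_reinit() merely resets the element count.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct ulist: a growable array of values. */
struct toy_ulist {
    unsigned long long *vals;
    size_t nnodes;
    size_t cap;
};

static unsigned long toy_alloc_calls;  /* counts list allocations for the demo */

static struct toy_ulist *toy_ulist_alloc(void)
{
    struct toy_ulist *ul = calloc(1, sizeof(*ul));

    if (ul)
        toy_alloc_calls++;
    return ul;
}

/* Forget the contents but keep the object for the next pass. */
static void toy_ulist_reinit(struct toy_ulist *ul)
{
    assert(ul != NULL);
    ul->nnodes = 0;
}

static int toy_ulist_add(struct toy_ulist *ul, unsigned long long val)
{
    if (ul->nnodes == ul->cap) {
        size_t cap = ul->cap ? ul->cap * 2 : 8;
        unsigned long long *vals = realloc(ul->vals, cap * sizeof(*vals));

        if (!vals)
            return -1;
        ul->vals = vals;
        ul->cap = cap;
    }
    ul->vals[ul->nnodes++] = val;
    return 0;
}

static void toy_ulist_free(struct toy_ulist *ul)
{
    if (!ul)
        return;
    free(ul->vals);
    free(ul);
}

/* One accounting pass: reuse the long-lived list instead of allocating. */
static int toy_account(struct toy_ulist *shared, unsigned long long root)
{
    toy_ulist_reinit(shared);
    return toy_ulist_add(shared, root);
}
```

However many accounting passes run, the list object is allocated a single time; as in the patch, the shared list would need to be protected by a lock if passes can run concurrently.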

Signed-off-by: Wang Shilong 
---
 fs/btrfs/ctree.h   |6 
 fs/btrfs/disk-io.c |1 +
 fs/btrfs/qgroup.c  |   70 ++-
 3 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 63c328a..3ccb829 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1594,6 +1594,12 @@ struct btrfs_fs_info {
struct rb_root qgroup_tree;
spinlock_t qgroup_lock;
 
+   /*
+* used to avoid frequently calling ulist_alloc()/ulist_free()
+* when doing qgroup accounting, it must be protected by qgroup_lock.
+*/
+   struct ulist *qgroup_ulist;
+
/* protect user change for quota operations */
struct mutex qgroup_ioctl_lock;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2223494..ee8ce33 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2267,6 +2267,7 @@ int open_ctree(struct super_block *sb,
fs_info->qgroup_seq = 1;
fs_info->quota_enabled = 0;
fs_info->pending_quota_state = 0;
+   fs_info->qgroup_ulist = NULL;
mutex_init(&fs_info->qgroup_rescan_lock);
 
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..7f38cce 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -259,6 +259,12 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
if (!fs_info->quota_enabled)
return 0;
 
+   fs_info->qgroup_ulist = ulist_alloc(GFP_NOFS);
+   if (!fs_info->qgroup_ulist) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
path = btrfs_alloc_path();
if (!path) {
ret = -ENOMEM;
@@ -424,6 +430,9 @@ out:
}
btrfs_free_path(path);
 
+   if (ret)
+   ulist_free(fs_info->qgroup_ulist);
+
return ret < 0 ? ret : 0;
 }
 
@@ -460,6 +469,7 @@ void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info)
}
kfree(qgroup);
}
+   ulist_free(fs_info->qgroup_ulist);
 }
 
 static int add_qgroup_relation_item(struct btrfs_trans_handle *trans,
@@ -819,6 +829,12 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
goto out;
}
 
+   fs_info->qgroup_ulist = ulist_alloc(GFP_NOFS);
+   if (!fs_info->qgroup_ulist) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
/*
 * initially create the quota tree
 */
@@ -916,6 +932,8 @@ out_free_root:
kfree(quota_root);
}
 out:
+   if (ret)
+   ulist_free(fs_info->qgroup_ulist);
mutex_unlock(&fs_info->qgroup_ioctl_lock);
return ret;
 }
@@ -1355,7 +1373,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
u64 ref_root;
struct btrfs_qgroup *qgroup;
struct ulist *roots = NULL;
-   struct ulist *tmp = NULL;
u64 seq;
int ret = 0;
int sgn;
@@ -1448,31 +1465,28 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
/*
 * step 1: for each old ref, visit all nodes once and inc refcnt
 */
-   tmp = ulist_alloc(GFP_ATOMIC);
-   if (!tmp) {
-   ret = -ENOMEM;
-   goto unlock;
-   }
+   ulist_reinit(fs_info->qgroup_ulist);
seq = fs_info->qgroup_seq;
fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
 
-   ret = qgroup_account_ref_step1(fs_info, roots, tmp, seq);
+   ret = qgroup_account_ref_step1(fs_info, roots, fs_info->qgroup_ulist,
+  seq);
if (ret)
goto unlock;
 
/*
 * step 2: walk from the new root
 */
-   ret = qgroup_account_ref_step2(fs_info, roots, tmp, seq, sgn,
-  node->num_bytes, qgroup);
+   ret = qgroup_account_ref_step2(fs_info, roots, fs_info->qgroup_ulist,
+  seq, sgn, node->num_bytes, qgroup);
if (ret)
goto unlock;
 
/*
 * step 3: walk again from old refs
 */
-   ret =

[PATCH V3] Btrfs: remove btrfs_sector_sum structure

2013-05-06 Thread Miao Xie
Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary: the extents that btrfs_sector_sum points to are contiguous,
so we can find the expected checksums from btrfs_ordered_sum's bytenr and
the offset, and btrfs_sector_sum's bytenr can be removed. After removing
bytenr, only one member is left in the structure, so it makes no sense to
keep it; just remove it and use a u32 array to store the checksum values.

With this change, we no longer use a while loop to get the checksums one
by one. We can now get several checksum values at a time, which improved
the performance by ~74% on my SSD (31MB/s -> 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
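
The index arithmetic that replaces the per-sector bytenr can be sketched in user space. The names below (struct toy_ordered_sum, toy_csum_at(), toy_max_ordered_sum_bytes()) are hypothetical, the checksum array is a plain pointer rather than a trailing array, and division is used where the patch shifts by s_blocksize_bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy mirror of the reworked layout: one start address, flat checksums. */
struct toy_ordered_sum {
    uint64_t bytenr;       /* disk address of the first sector covered */
    uint32_t len;          /* bytes covered */
    const uint32_t *sums;  /* one checksum per sector, in order */
};

/*
 * Because the covered range is contiguous, the checksum for any sector is
 * found by index arithmetic instead of a per-sector bytenr field.
 * Returns 0 and stores the checksum, or -1 if bytenr is outside the range.
 */
static int toy_csum_at(const struct toy_ordered_sum *os, uint32_t sectorsize,
                       uint64_t bytenr, uint32_t *csum)
{
    assert(sectorsize != 0);
    if (bytenr < os->bytenr || bytenr >= os->bytenr + os->len)
        return -1;
    *csum = os->sums[(bytenr - os->bytenr) / sectorsize];
    return 0;
}

/* Bytes of data one page-sized allocation can describe, per the new macro. */
static size_t toy_max_ordered_sum_bytes(size_t page_size, size_t hdr_size,
                                        size_t sectorsize)
{
    return (page_size - hdr_size) / sizeof(uint32_t) * sectorsize;
}
```

For example, with a 4096-byte page and an assumed 96-byte header, 1000 checksums fit, which covers 4096000 bytes of data at a 4096-byte sector size.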

Signed-off-by: Miao Xie 
---
Changelog v2 -> v3:
- fix the csums being inserted into the wrong range; this bug was
  reported by Josef.

Changelog v1 -> v2:
- reword the changelog and the title, which did not explain this patch clearly
- fix the 64-bit division problem on 32-bit machines
---
 fs/btrfs/file-item.c| 144 ++--
 fs/btrfs/ordered-data.c |  19 +++
 fs/btrfs/ordered-data.h |  25 ++---
 fs/btrfs/relocation.c   |  10 
 fs/btrfs/scrub.c|  16 ++
 5 files changed, 73 insertions(+), 141 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b193bf3..a7bfc95 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -34,8 +34,7 @@
 
 #define MAX_ORDERED_SUM_BYTES(r) ((PAGE_SIZE - \
   sizeof(struct btrfs_ordered_sum)) / \
-  sizeof(struct btrfs_sector_sum) * \
-  (r)->sectorsize - (r)->sectorsize)
+  sizeof(u32) * (r)->sectorsize)
 
 int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
@@ -297,7 +296,6 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
struct btrfs_path *path;
struct extent_buffer *leaf;
struct btrfs_ordered_sum *sums;
-   struct btrfs_sector_sum *sector_sum;
struct btrfs_csum_item *item;
LIST_HEAD(tmplist);
unsigned long offset;
@@ -368,34 +366,28 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
  struct btrfs_csum_item);
while (start < csum_end) {
size = min_t(size_t, csum_end - start,
-   MAX_ORDERED_SUM_BYTES(root));
+MAX_ORDERED_SUM_BYTES(root));
sums = kzalloc(btrfs_ordered_sum_size(root, size),
-   GFP_NOFS);
+  GFP_NOFS);
if (!sums) {
ret = -ENOMEM;
goto fail;
}
 
-   sector_sum = sums->sums;
sums->bytenr = start;
-   sums->len = size;
+   sums->len = (int)size;
 
offset = (start - key.offset) >>
root->fs_info->sb->s_blocksize_bits;
offset *= csum_size;
+   size >>= root->fs_info->sb->s_blocksize_bits;
 
-   while (size > 0) {
-   read_extent_buffer(path->nodes[0],
-   &sector_sum->sum,
-   ((unsigned long)item) +
-   offset, csum_size);
-   sector_sum->bytenr = start;
-
-   size -= root->sectorsize;
-   start += root->sectorsize;
-   offset += csum_size;
-   sector_sum++;
-   }
+   read_extent_buffer(path->nodes[0],
+  sums->sums,
+  ((unsigned long)item) + offset,
+  csum_size * size);
+
+   start += root->sectorsize * size;
list_add_tail(&sums->list, &tmplist);
}
path->slots[0]++;
@@ -417,23 +409,20 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
   struct bio *bio, u64 file_start, int contig)
 {
struct btrfs_ordered_sum *sums;
-   struct btrfs_sector_sum *sector_sum;
struct btrfs_ordered_extent *ordered;
char *data;
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
+   int index;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
-   

Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Harald Glatt
I have this problem too, and I cannot reproduce it properly... When is
that patch in btrfs-next going to be in the mainline kernel?

On Mon, May 6, 2013 at 10:55 AM, Jan Schmidt  wrote:
> On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:
>> Hello list,
>>
>> Kai Krakow  schrieb:
>>
>>> I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
>>> I'm running bedup[1] on a regular basis and it is now the third time that
>>> I got back to my PC just to find it hard-frozen and I needed to use the
>>> reset button.
>>>
>>> It looks like this happens only while running bedup on my two btrfs
>>> filesystems but I'm not sure if it happens for any of the filesystems or
>>> only one. This is my setup:
>>>
>>> # cat /etc/fstab (shortened)
>>> UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs
>>> compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
>>> LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-
>>> force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external
>>> usb3 disk
>>>
>>> # btrfs filesystem show
>>> Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
>>> Total devices 1 FS bytes used 1.13TB
>>> devid1 size 1.82TB used 1.75TB path /dev/sdd1
>>>
>>> Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
>>> Total devices 3 FS bytes used 914.43GB
>>> devid3 size 927.26GB used 426.03GB path /dev/sdc3
>>> devid2 size 927.26GB used 426.03GB path /dev/sdb3
>>> devid1 size 927.26GB used 427.07GB path /dev/sda3
>>>
>>> Btrfs v0.20-rc1
>>>
>>> Since the system hard-freezes I have no messages from dmesg. But I suspect
>>> it to be related to the defragmentation option in bedup (I've switched to
>>> bedub with --defrag since 3.9.0, and autodefrag for the backup drive).
>>> Just in case, I'm going to try without this option now and see if it won't
>>> freeze.
>>>
>>> I was able to take a "physical" screenshot with a real camera of a kernel
>>> backtrace one time when the freeze happened. I wonder if it is useful to
>>> you and where to send it. I just don't want to upload jpegs right here to
>>> the list without asking first.
>>>
>>> The big plus is: Altough I had to hard-reset the frozen system several
>>> times now, btrfs survived the procedure without any impact (just boot
>>> times increases noticeably, probably due to log-replays or something). So
>>> thumbs up for the developers on that point.
>>
>> Thanks to the great cwillu netcat service here's my backtrace:
>
> That one should be fixed in btrfs-next. If you can reliably reproduce the bug
> I'd be glad to get a confirmation - you can probably even save putting it on
> bugzilla then ;-)
>
> -Jan
>
>> 4,1072,17508258745,-;[ cut here ]
>> 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
>> 4,1074,17508258791,-;invalid opcode:  [#1] SMP
>> 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O)
>> vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib
>> snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev
>> coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core
>> lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
>> 4,1076,17508258966,-;CPU 0
>> 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O
>> 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
>> 4,1078,17508259023,-;RIP: 0010:[]  []
>> __tree_mod_log_rewind+0x4c/0x121
>> 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
>> 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX:
>> 880196671888
>> 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI:
>> 8804087be700
>> 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09:
>> 880196671898
>> 4,1083,17508259165,-;R10:  R11:  R12:
>> 880406c2e000
>> 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15:
>> 0001
>> 4,1085,17508259218,-;FS:  () GS:88041f20()
>> knlGS:
>> 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
>> 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4:
>> 000407f0
>> 4,1088,17508259297,-;DR0:  DR1:  DR2:
>> 
>> 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7:
>> 0400
>> 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo
>> 88019667, task 8801b82e5400)
>> 4,1091,17508259383,-;Stack:
>> 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000
>> 8a11
>> 4,1093,17508259423,-; 8802d0a14000 81167606 0246
>> 8801ee8d33b0
>> 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360
>> 
>> 4,1095,17508259488,-;Call Trace:
>> 4,

Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:
> Hello list,
> 
> Kai Krakow  schrieb:
> 
>> I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
>> I'm running bedup[1] on a regular basis and it is now the third time that
>> I got back to my PC just to find it hard-frozen and I needed to use the
>> reset button.
>>
>> It looks like this happens only while running bedup on my two btrfs
>> filesystems but I'm not sure if it happens for any of the filesystems or
>> only one. This is my setup:
>>
>> # cat /etc/fstab (shortened)
>> UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs
>> compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
>> LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-
>> force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external
>> usb3 disk
>>
>> # btrfs filesystem show
>> Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
>> Total devices 1 FS bytes used 1.13TB
>> devid1 size 1.82TB used 1.75TB path /dev/sdd1
>>
>> Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
>> Total devices 3 FS bytes used 914.43GB
>> devid3 size 927.26GB used 426.03GB path /dev/sdc3
>> devid2 size 927.26GB used 426.03GB path /dev/sdb3
>> devid1 size 927.26GB used 427.07GB path /dev/sda3
>>
>> Btrfs v0.20-rc1
>>
>> Since the system hard-freezes I have no messages from dmesg. But I suspect
>> it to be related to the defragmentation option in bedup (I've switched to
>> bedub with --defrag since 3.9.0, and autodefrag for the backup drive).
>> Just in case, I'm going to try without this option now and see if it won't
>> freeze.
>>
>> I was able to take a "physical" screenshot with a real camera of a kernel
>> backtrace one time when the freeze happened. I wonder if it is useful to
>> you and where to send it. I just don't want to upload jpegs right here to
>> the list without asking first.
>>
>> The big plus is: Altough I had to hard-reset the frozen system several
>> times now, btrfs survived the procedure without any impact (just boot
>> times increases noticeably, probably due to log-replays or something). So
>> thumbs up for the developers on that point.
> 
> Thanks to the great cwillu netcat service here's my backtrace:

That one should be fixed in btrfs-next. If you can reliably reproduce the bug
I'd be glad to get a confirmation - you can probably even save putting it on
bugzilla then ;-)

-Jan

> 4,1072,17508258745,-;[ cut here ]
> 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
> 4,1074,17508258791,-;invalid opcode:  [#1] SMP 
> 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O) 
> vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib 
> snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev 
> coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core 
> lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
> 4,1076,17508258966,-;CPU 0 
> 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O 
> 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
> 4,1078,17508259023,-;RIP: 0010:[]  [] 
> __tree_mod_log_rewind+0x4c/0x121
> 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
> 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX: 
> 880196671888
> 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI: 
> 8804087be700
> 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09: 
> 880196671898
> 4,1083,17508259165,-;R10:  R11:  R12: 
> 880406c2e000
> 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15: 
> 0001
> 4,1085,17508259218,-;FS:  () GS:88041f20() 
> knlGS:
> 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
> 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4: 
> 000407f0
> 4,1088,17508259297,-;DR0:  DR1:  DR2: 
> 
> 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7: 
> 0400
> 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo 
> 88019667, task 8801b82e5400)
> 4,1091,17508259383,-;Stack:
> 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000 
> 8a11
> 4,1093,17508259423,-; 8802d0a14000 81167606 0246 
> 8801ee8d33b0
> 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360 
> 
> 4,1095,17508259488,-;Call Trace:
> 4,1096,17508259500,-; [] ? 
> btrfs_search_old_slot+0x543/0x61e
> 4,1097,17508259526,-; [] ? btrfs_next_old_leaf+0x8a/0x332
> 4,1098,17508259552,-; [] ? 
> __resolve_indirect_refs+0x2d8/0x408
> 4,1099,17508259578,-; [] ? find_parent_nodes+0x9c1/0xcec
> 4,1100,17508259602

[RFC 5/5] btrfs: add hot relocation support

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

  Add one new mount option '-o hot_move' for hot
relocation support. When hot relocation is enabled,
hot tracking will be enabled automatically.
  Its usage looks like:
mount -o hot_move
mount -o nouser,hot_move
mount -o nouser,hot_move,loop
mount -o hot_move,nouser

Signed-off-by: Zhi Yong Wu 
---
 fs/btrfs/super.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4cbd0de..b342f6f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -311,8 +311,13 @@ static void btrfs_put_super(struct super_block *sb)
 * process...  Whom would you report that to?
 */
 
+   /* Hot data relocation */
+   if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_MOVE))
+   hot_relocate_exit(btrfs_sb(sb));
+
/* Hot data tracking */
-   if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
+   if (btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_MOVE)
+   || btrfs_test_opt(btrfs_sb(sb)->tree_root, HOT_TRACK))
hot_track_exit(sb);
 }
 
@@ -327,7 +332,7 @@ enum {
Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
Opt_check_integrity, Opt_check_integrity_including_extent_data,
Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_hot_track,
-   Opt_err,
+   Opt_hot_move, Opt_err,
 };
 
 static match_table_t tokens = {
@@ -368,6 +373,7 @@ static match_table_t tokens = {
{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
{Opt_fatal_errors, "fatal_errors=%s"},
{Opt_hot_track, "hot_track"},
+   {Opt_hot_move, "hot_move"},
{Opt_err, NULL},
 };
 
@@ -636,6 +642,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
case Opt_hot_track:
btrfs_set_opt(info->mount_opt, HOT_TRACK);
break;
+   case Opt_hot_move:
+   btrfs_set_opt(info->mount_opt, HOT_MOVE);
+   break;
case Opt_err:
printk(KERN_INFO "btrfs: unrecognized mount option "
   "'%s'\n", p);
@@ -863,17 +872,26 @@ static int btrfs_fill_super(struct super_block *sb,
goto fail_close;
}
 
-   if (btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
+   if (btrfs_test_opt(fs_info->tree_root, HOT_MOVE)
+   || btrfs_test_opt(fs_info->tree_root, HOT_TRACK)) {
err = hot_track_init(sb);
if (err)
goto fail_hot;
}
 
+   if (btrfs_test_opt(fs_info->tree_root, HOT_MOVE)) {
+   err = hot_relocate_init(fs_info);
+   if (err)
+   goto fail_reloc;
+   }
+
save_mount_options(sb, data);
cleancache_init_fs(sb);
sb->s_flags |= MS_ACTIVE;
return 0;
 
+fail_reloc:
+   hot_track_exit(sb);
 fail_hot:
dput(sb->s_root);
sb->s_root = NULL;
@@ -974,6 +992,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",fatal_errors=panic");
if (btrfs_test_opt(root, HOT_TRACK))
seq_puts(seq, ",hot_track");
+   if (btrfs_test_opt(root, HOT_MOVE))
+   seq_puts(seq, ",hot_move");
return 0;
 }
 
-- 
1.7.11.7
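
The init ordering and error unwinding that btrfs_fill_super() gains in this patch can be modelled with stubs. This is a sketch only: the toy_* functions, the OPT_* bit values and the fail_reloc_init switch are hypothetical and exist just to show that hot_move implies tracking and that a failed relocation init tears tracking down again.

```c
#include <assert.h>

#define OPT_HOT_TRACK 0x1u
#define OPT_HOT_MOVE  0x2u

/* Demo state: which subsystems are up, and a switch to force a failure. */
static int track_up, reloc_up, fail_reloc_init;

static int toy_hot_track_init(void) { track_up = 1; return 0; }

static void toy_hot_track_exit(void)
{
    assert(track_up);  /* exit only ever runs after a successful init */
    track_up = 0;
}

static int toy_hot_relocate_init(void)
{
    if (fail_reloc_init)
        return -1;
    reloc_up = 1;
    return 0;
}

/* hot_move implies tracking; a failed relocation init undoes tracking. */
static int toy_fill_super(unsigned int opts)
{
    int err;

    if (opts & (OPT_HOT_MOVE | OPT_HOT_TRACK)) {
        err = toy_hot_track_init();
        if (err)
            goto fail_hot;
    }
    if (opts & OPT_HOT_MOVE) {
        err = toy_hot_relocate_init();
        if (err)
            goto fail_reloc;
    }
    return 0;

fail_reloc:
    toy_hot_track_exit();
fail_hot:
    return err;
}
```

Mounting with only the hot_move bit brings up both subsystems, and forcing the relocation init to fail leaves neither of them running.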



[RFC 4/5] procfs: add three proc interfaces

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

  Add three proc interfaces hot-reloc-interval, hot-reloc-threshold,
and hot-reloc-max-items under the dir /proc/sys/fs/ in order to
make HOT_RELOC_INTERVAL, HOT_RELOC_THRESHOLD, and HOT_RELOC_MAX_ITEMS
tunable.

Signed-off-by: Zhi Yong Wu 
---
 fs/btrfs/hot_relocate.c | 26 +-
 fs/btrfs/hot_relocate.h |  4 
 include/linux/btrfs.h   |  4 
 kernel/sysctl.c | 22 ++
 4 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/hot_relocate.c b/fs/btrfs/hot_relocate.c
index 683e154..aa8c9f0 100644
--- a/fs/btrfs/hot_relocate.c
+++ b/fs/btrfs/hot_relocate.c
@@ -25,7 +25,7 @@
  * The relocation code below operates on the heat map lists to identify
  * hot or cold data logical file ranges that are candidates for relocation.
  * The triggering mechanism for relocation is controlled by a global heat
- * threshold integer value (HOT_RELOC_THRESHOLD). Ranges are
+ * threshold integer value (sysctl_hot_reloc_threshold). Ranges are
  * queued for relocation by the periodically executing relocate kthread,
  * which updates the global heat threshold and responds to space pressure
  * on the SSDs.
@@ -52,6 +52,15 @@
  * (assuming, critically, the HOT_MOVE option is set at mount time).
  */
 
+int sysctl_hot_reloc_threshold = 150;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_threshold);
+
+int sysctl_hot_reloc_interval __read_mostly = 120;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_interval);
+
+int sysctl_hot_reloc_max_items __read_mostly = 250;
+EXPORT_SYMBOL_GPL(sysctl_hot_reloc_max_items);
+
 static void hot_set_extent_bits(struct extent_io_tree *tree, u64 start,
u64 end, struct extent_state **cached_state,
gfp_t mask, int storage_type, int flag)
@@ -165,7 +174,7 @@ static int hot_calc_ssd_ratio(struct hot_reloc *hot_reloc)
 static int hot_update_threshold(struct hot_reloc *hot_reloc,
int update)
 {
-   int thresh = hot_reloc->thresh;
+   int thresh = sysctl_hot_reloc_threshold;
int ratio = hot_calc_ssd_ratio(hot_reloc);
 
/* Sometimes update global threshold, others not */
@@ -189,7 +198,7 @@ static int hot_update_threshold(struct hot_reloc *hot_reloc,
thresh = 0;
}
 
-   hot_reloc->thresh = thresh;
+   sysctl_hot_reloc_threshold = thresh;
return ratio;
 }
 
@@ -280,7 +289,7 @@ static int hot_queue_extent(struct hot_reloc *hot_reloc,
hot_comm_item_put(ci);
spin_unlock(&he->i_lock);
 
-   if (*counter >= HOT_RELOC_MAX_ITEMS)
+   if (*counter >= sysctl_hot_reloc_max_items)
break;
 
if (kthread_should_stop()) {
@@ -361,7 +370,7 @@ again:
while (1) {
lock_extent(tree, page_start, page_end);
ordered = btrfs_lookup_ordered_extent(inode,
-   page_start);
+ page_start);
unlock_extent(tree, page_start, page_end);
if (!ordered)
break;
@@ -642,7 +651,7 @@ void hot_do_relocate(struct hot_reloc *hot_reloc)
 
run++;
ratio = hot_update_threshold(hot_reloc, !(run % 15));
-   thresh = hot_reloc->thresh;
+   thresh = sysctl_hot_reloc_threshold;
 
INIT_LIST_HEAD(&hot_reloc->hot_relocq[TYPE_NONROT]);
 
@@ -652,7 +661,7 @@ void hot_do_relocate(struct hot_reloc *hot_reloc)
if (count_to_hot == 0)
return;
 
-   count_to_cold = HOT_RELOC_MAX_ITEMS;
+   count_to_cold = sysctl_hot_reloc_max_items;
 
/* Don't move cold data to HDD unless there's space pressure */
if (ratio < HIGH_WATER_LEVEL)
@@ -734,7 +743,7 @@ static int hot_relocate_kthread(void *arg)
unsigned long delay;
 
do {
-   delay = HZ * HOT_RELOC_INTERVAL;
+   delay = HZ * sysctl_hot_reloc_interval;
if (mutex_trylock(&hot_reloc->hot_reloc_mutex)) {
hot_do_relocate(hot_reloc);
mutex_unlock(&hot_reloc->hot_reloc_mutex);
@@ -766,7 +775,6 @@ int hot_relocate_init(struct btrfs_fs_info *fs_info)
 
fs_info->hot_reloc = hot_reloc;
hot_reloc->fs_info = fs_info;
-   hot_reloc->thresh = HOT_RELOC_THRESHOLD;
for (i = 0; i < MAX_RELOC_TYPES; i++)
INIT_LIST_HEAD(&hot_reloc->hot_relocq[i]);
mutex_init(&hot_reloc->hot_reloc_mutex);
diff --git a/fs/btrfs/hot_relocate.h b/fs/btrfs/hot_relocate.h
index 077d9b3..ca30944 100644
--- a/fs/btrfs/hot_relocate.h
+++ b/fs/btrfs/hot_relocate.h
@@ -24,9 +24,6 @@ enum {
MAX_RELOC_TYPES
 };
 
-#define HOT_RELOC_INTERVAL  120
-#define HOT_RELOC_THRESHOLD 150
-#define HOT_RELOC_MAX_ITEMS 250
 
 #define HEAT_MAX_VALUE (MAP_SIZE - 1)
 #define HIGH_WATER_LEVEL  75 /

[RFC 3/5] btrfs: add one hot relocation kthread

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

   Add one private kthread for hot relocation. It first checks
whether any extents are hotter than the threshold and queues
them. If there are none, it returns and waits for its next turn.
Otherwise, it checks whether the SSD usage ratio is beyond its
threshold. If it is not, the kthread directly relocates the hot
extents from HDD to SSD. If it is, the kthread first finds the
extents with low temperature, queues them, and relocates them,
and finally relocates the hot extents from HDD to SSD.

Signed-off-by: Zhi Yong Wu 
---
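The decision flow described above can be sketched as a small userspace
function. This is only an illustration under stated assumptions: the
enum and plan_relocation() are hypothetical names, while the value 75
for HIGH_WATER_LEVEL and the "ratio < HIGH_WATER_LEVEL" test come from
the patch itself.

```c
#include <assert.h>

#define HIGH_WATER_LEVEL 75 /* percent of SSD space in use, as in hot_relocate.h */

/* Hypothetical names for the three outcomes of one kthread pass. */
enum reloc_plan {
	PLAN_NOTHING,       /* no extent is hotter than the threshold */
	PLAN_HOT_ONLY,      /* SSD has room: move hot extents HDD -> SSD */
	PLAN_COLD_THEN_HOT  /* SSD under pressure: move cold extents out first */
};

/*
 * hot_queued: number of extents queued as hotter than the threshold.
 * ssd_ratio:  percentage of SSD space currently in use.
 */
static enum reloc_plan plan_relocation(int hot_queued, int ssd_ratio)
{
	assert(hot_queued >= 0);

	if (hot_queued == 0)
		return PLAN_NOTHING;
	/* Don't move cold data to HDD unless there is space pressure. */
	if (ssd_ratio < HIGH_WATER_LEVEL)
		return PLAN_HOT_ONLY;
	return PLAN_COLD_THEN_HOT;
}
```

The sketch leaves out the case where no SSD is present, which the patch
signals by returning THRESH_MAX_VALUE + 1 from hot_calc_ssd_ratio().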
 fs/btrfs/ctree.h        |   2 +
 fs/btrfs/hot_relocate.c | 720 +++-
 fs/btrfs/hot_relocate.h |  21 ++
 fs/btrfs/super.c        |   1 +
 4 files changed, 742 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f4c4419..77d9b1c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1619,6 +1619,8 @@ struct btrfs_fs_info {
struct btrfs_dev_replace dev_replace;
 
atomic_t mutually_exclusive_operation_running;
+
+   void *hot_reloc;
 };
 
 /*
diff --git a/fs/btrfs/hot_relocate.c b/fs/btrfs/hot_relocate.c
index 1effd14..683e154 100644
--- a/fs/btrfs/hot_relocate.c
+++ b/fs/btrfs/hot_relocate.c
@@ -12,8 +12,46 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include "hot_relocate.h"
 
+/*
+ * Hot relocation strategy:
+ *
+ * The relocation code below operates on the heat map lists to identify
+ * hot or cold data logical file ranges that are candidates for relocation.
+ * The triggering mechanism for relocation is controlled by a global heat
+ * threshold integer value (HOT_RELOC_THRESHOLD). Ranges are
+ * queued for relocation by the periodically executing relocate kthread,
+ * which updates the global heat threshold and responds to space pressure
+ * on the SSDs.
+ *
+ * The heat map lists index logical ranges by heat and provide a constant-time
+ * access path to hot or cold range items. The relocation kthread uses this
+ * path to find hot or cold items to move to/from SSD. To ensure that the
+ * relocation kthread has a chance to sleep, and to prevent thrashing between
+ * SSD and HDD, there is a configurable limit to how many ranges are moved per
+ * iteration of the kthread. This limit may be overrun in the case where space
+ * pressure requires that items be aggressively moved from SSD back to HDD.
+ *
+ * This needs still more resistance to thrashing and stronger (read: actual)
+ * guarantees that relocation operations won't -ENOSPC.
+ *
+ * The relocation code has introduced one new btrfs block group type:
+ * BTRFS_BLOCK_GROUP_DATA_SSD.
+ *
+ * When mkfs'ing a volume with the hot data relocation option, initial block
+ * groups are allocated to the proper disks. Runtime block group allocation
+ * only allocates BTRFS_BLOCK_GROUP_DATA BTRFS_BLOCK_GROUP_METADATA and
+ * BTRFS_BLOCK_GROUP_SYSTEM to HDD, and likewise only allocates
+ * BTRFS_BLOCK_GROUP_DATA_SSD to SSD.
+ * (assuming, critically, the HOT_MOVE option is set at mount time).
+ */
+
 static void hot_set_extent_bits(struct extent_io_tree *tree, u64 start,
u64 end, struct extent_state **cached_state,
gfp_t mask, int storage_type, int flag)
@@ -26,10 +64,10 @@ static void hot_set_extent_bits(struct extent_io_tree *tree, u64 start,
EXTENT_DO_ACCOUNTING;
}
 
-   if (storage_type == ON_ROT_DISK) {
+   if (storage_type == TYPE_ROT) {
set_bits |= EXTENT_COLD;
clear_bits |= EXTENT_HOT;
-   } else if (storage_type == ON_NONROT_DISK) {
+   } else if (storage_type == TYPE_NONROT) {
set_bits |= EXTENT_HOT;
clear_bits |= EXTENT_COLD;
}
@@ -76,3 +114,681 @@ int hot_get_chunk_type(struct inode *inode, u64 start, u64 end)
 
return ret;
 }
+
+/*
+ * Returns the ratio of SSD space that is in use.
+ * If no SSD is found, returns THRESH_MAX_VALUE + 1.
+ */
+static int hot_calc_ssd_ratio(struct hot_reloc *hot_reloc)
+{
+   struct btrfs_space_info *info;
+   struct btrfs_device *device, *next;
+   struct btrfs_fs_info *fs_info = hot_reloc->fs_info;
+   u64 total_bytes = 0, bytes_used = 0;
+
+   /*
+* Iterate through devices, if they're nonrot,
+* add their bytes to the total_bytes.
+*/
+   mutex_lock(&fs_info->fs_devices->device_list_mutex);
+   list_for_each_entry_safe(device, next,
+   &fs_info->fs_devices->devices, dev_list) {
+   if (blk_queue_nonrot(bdev_get_queue(device->bdev)))
+   total_bytes += device->total_bytes;
+   }
+   mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+
+   if (total_bytes == 0)
+   return THRESH_MAX_VALUE + 1;
+
+   /*
+* Iterate through space_info. if the SSD data block group
+* is found, add the bytes used 

[RFC 2/5] btrfs: add one new block group

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

  Introduce one new block group type, BTRFS_BLOCK_GROUP_DATA_SSD,
which is used to distinguish whether block space is reserved
and allocated from an HDD or from an SSD.

Signed-off-by: Zhi Yong Wu 
---
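A small userspace sketch of how the new type bit could be tested,
assuming hot relocation is enabled. The DATA, SYSTEM, and METADATA bit
values are the ones defined in fs/btrfs/ctree.h; device_eligible() is a
hypothetical helper that encodes the rule from the comment in this
patch (SSD block groups on nonrotating drives, everything else on
rotating drives), not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Existing type bits, as defined in fs/btrfs/ctree.h. */
#define BTRFS_BLOCK_GROUP_DATA      (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM    (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA  (1ULL << 2)
/* One profile bit, only to show that profile bits do not affect the type. */
#define BTRFS_BLOCK_GROUP_RAID10    (1ULL << 6)
/* New type bit added by this patch. */
#define BTRFS_BLOCK_GROUP_DATA_SSD  (1ULL << 9)

#define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA |     \
				     BTRFS_BLOCK_GROUP_SYSTEM |   \
				     BTRFS_BLOCK_GROUP_METADATA | \
				     BTRFS_BLOCK_GROUP_DATA_SSD)

/*
 * Hypothetical helper: may a block group with these flags be placed on
 * the given device? Returns 1 when the device kind matches the type.
 */
static int device_eligible(uint64_t flags, int dev_is_nonrot)
{
	int wants_ssd;

	/* Every block group carries at least one type bit. */
	assert((flags & BTRFS_BLOCK_GROUP_TYPE_MASK) != 0);

	wants_ssd = (flags & BTRFS_BLOCK_GROUP_DATA_SSD) != 0;
	return wants_ssd == (dev_is_nonrot != 0);
}
```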
 fs/btrfs/Makefile       |   3 +-
 fs/btrfs/ctree.h        |  24 ++-
 fs/btrfs/extent-tree.c  | 107 +++-
 fs/btrfs/extent_io.c    |  31 --
 fs/btrfs/extent_io.h    |   4 ++
 fs/btrfs/file.c         |  36 +---
 fs/btrfs/hot_relocate.c |  78 +++
 fs/btrfs/hot_relocate.h |  31 ++
 fs/btrfs/inode-map.c    |  13 +-
 fs/btrfs/inode.c        |  92 +
 fs/btrfs/ioctl.c        |  23 +--
 fs/btrfs/relocation.c   |  14 ++-
 fs/btrfs/super.c        |   3 +-
 fs/btrfs/volumes.c      |  28 -
 14 files changed, 439 insertions(+), 48 deletions(-)
 create mode 100644 fs/btrfs/hot_relocate.c
 create mode 100644 fs/btrfs/hot_relocate.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 3932224..94f1ea5 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o
+  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
+  hot_relocate.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 701dec5..f4c4419 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -961,6 +961,16 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID10   (1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5    (1 << 7)
 #define BTRFS_BLOCK_GROUP_RAID6    (1 << 8)
+/*
+ * New block groups for use with hot data relocation feature. When hot data
+ * relocation is on, *_SSD block groups are forced to nonrotating drives and
+ * the plain DATA and METADATA block groups are forced to rotating drives.
+ *
+ * This should be further optimized, i.e. force metadata to SSD or relocate
+ * inode metadata to SSD when any of its subfile ranges are relocated to SSD
+ * so that reads and writes aren't delayed by HDD seeks.
+ */
+#define BTRFS_BLOCK_GROUP_DATA_SSD (1ULL << 9)
 #define BTRFS_BLOCK_GROUP_RESERVED BTRFS_AVAIL_ALLOC_BIT_SINGLE
 
 enum btrfs_raid_types {
@@ -976,7 +986,8 @@ enum btrfs_raid_types {
 
 #define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA |\
 BTRFS_BLOCK_GROUP_SYSTEM |  \
-BTRFS_BLOCK_GROUP_METADATA)
+BTRFS_BLOCK_GROUP_METADATA | \
+BTRFS_BLOCK_GROUP_DATA_SSD)
 
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 |   \
 BTRFS_BLOCK_GROUP_RAID1 |   \
@@ -1508,6 +1519,7 @@ struct btrfs_fs_info {
struct list_head space_info;
 
struct btrfs_space_info *data_sinfo;
+   struct btrfs_space_info *hot_data_sinfo;
 
struct reloc_control *reloc_ctl;
 
@@ -1532,6 +1544,7 @@ struct btrfs_fs_info {
u64 avail_data_alloc_bits;
u64 avail_metadata_alloc_bits;
u64 avail_system_alloc_bits;
+   u64 avail_data_ssd_alloc_bits;
 
/* restriper state */
spinlock_t balance_lock;
@@ -1544,6 +1557,7 @@ struct btrfs_fs_info {
 
unsigned data_chunk_allocations;
unsigned metadata_ratio;
+   unsigned data_ssd_chunk_allocations;
 
void *bdev_holder;
 
@@ -1901,6 +1915,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1 << 22)
 #define BTRFS_MOUNT_HOT_TRACK  (1 << 23)
+#define BTRFS_MOUNT_HOT_MOVE   (1 << 24)
 
 #define btrfs_clear_opt(o, opt)  ((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
@@ -1922,6 +1937,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_INODE_NOATIME (1 << 9)
 #define BTRFS_INODE_DIRSYNC(1 << 10)
 #define BTRFS_INODE_COMPRESS   (1 << 11)
+#define BTRFS_INODE_HOT (1 << 12)
 
 #define BTRFS_INODE_ROOT_ITEM_INIT (1 << 31)
 
@@ -3014,6 +3030,8 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 objectid, u64 offset, u64 bytenr);
+struct btrfs_block_group_cache *btrfs_lookup_first_block_group(
+  

[RFC 1/5] vfs: add one list_head field

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

  Add one list_head field 'reloc_list' to accommodate
hot relocation support.

Signed-off-by: Zhi Yong Wu 
---
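A userspace sketch of why a second list_head is useful: with two link
fields, one item can sit on the heat map list and on a relocation queue
at the same time. The list helpers below are minimal stand-ins for the
kernel's list API, and the trimmed-down struct, its 'temp' field, and
item_from_reloc() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	assert(n != NULL && head != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static int list_count(const struct list_head *head)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Trimmed-down item: only the two link fields from the patch are kept. */
struct hot_comm_item {
	int temp;                    /* hypothetical temperature value */
	struct list_head track_list; /* link to *_map[] */
	struct list_head reloc_list; /* used in hot relocation */
};

/* Recover the item from its reloc_list link, in the style of container_of(). */
#define item_from_reloc(ptr) \
	((struct hot_comm_item *)((char *)(ptr) - \
				  offsetof(struct hot_comm_item, reloc_list)))
```

Initializing reloc_list in hot_comm_item_init(), as the patch does, means
list_empty() can later tell whether an item is already queued for
relocation.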
 fs/hot_tracking.c            | 1 +
 include/linux/hot_tracking.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
index 3b0002c..7071ac8 100644
--- a/fs/hot_tracking.c
+++ b/fs/hot_tracking.c
@@ -41,6 +41,7 @@ static void hot_comm_item_init(struct hot_comm_item *ci, int type)
clear_bit(HOT_IN_LIST, &ci->delete_flag);
clear_bit(HOT_DELETING, &ci->delete_flag);
INIT_LIST_HEAD(&ci->track_list);
+   INIT_LIST_HEAD(&ci->reloc_list);
memset(&ci->hot_freq_data, 0, sizeof(struct hot_freq_data));
ci->hot_freq_data.avg_delta_reads = (u64) -1;
ci->hot_freq_data.avg_delta_writes = (u64) -1;
diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
index 2272975..49f901c 100644
--- a/include/linux/hot_tracking.h
+++ b/include/linux/hot_tracking.h
@@ -68,6 +68,7 @@ struct hot_comm_item {
struct rb_node rb_node; /* rbtree index */
unsigned long delete_flag;
struct list_head track_list; /* link to *_map[] */
+   struct list_head reloc_list; /* used in hot relocation */
 };
 
 /* An item representing an inode and its access frequency */
-- 
1.7.11.7



[RFC 0/5] BTRFS hot relocation support

2013-05-06 Thread zwu . kernel
From: Zhi Yong Wu 

  This patchset is sent as an RFC mainly to see whether it is heading
in the right development direction.

  The patchset introduces hot relocation support for BTRFS. In a
hybrid storage environment, when data on an HDD becomes hot, BTRFS
hot relocation moves it to an SSD automatically. If the SSD usage
ratio exceeds its upper threshold, data that has become cold is
first looked up and relocated to the HDD to free space on the SSD,
and then the hot data is relocated to the SSD automatically.

  To relocate hot data, BTRFS first reserves block space on the SSD,
loads the hot data from the HDD into the page cache, allocates
block space on the SSD, and finally writes the data to the SSD.

  If you'd like to play with it, please pull the patchset from
my git tree on GitHub:
  https://github.com/wuzhy/kernel.git hot_reloc

For usage, please refer to the example below:

root@debian-i386:~# echo 0 > /sys/block/vdc/queue/rotational
^^^ The command above makes /dev/vdc appear as an SSD
root@debian-i386:~# echo 99 > /proc/sys/fs/hot-age-interval
root@debian-i386:~# echo 10 > /proc/sys/fs/hot-update-interval
root@debian-i386:~# echo 10 > /proc/sys/fs/hot-reloc-interval
root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f
 
WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
 
[ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 transid 16 /dev/vdb
[ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
[ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
[ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
[ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
adding device /dev/vdc id 2
[ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 transid 3 /dev/vdc
fs created label (null) on /dev/vdb
nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
Btrfs v0.20-rc1-254-gb0136aa-dirty
root@debian-i386:~# mount -o hot_move /dev/vdb /data2
[ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 6 /dev/vdb
[ 144.870444] btrfs: disk space caching is enabled
[ 144.904214] VFS: Turning on hot data tracking
root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
root@debian-i386:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 16G 13G 2.2G 86% /
tmpfs 4.8G 0 4.8G 0% /lib/init/rw
udev 10M 176K 9.9M 2% /dev
tmpfs 4.8G 0 4.8G 0% /dev/shm
/dev/vdb 15G 2.0G 13G 14% /data2
root@debian-i386:~# btrfs fi df /data2
Data: total=3.01GB, used=2.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=8.00MB, used=2.19MB
Data_SSD: total=8.00MB, used=0.00
root@debian-i386:~# echo 108 > /proc/sys/fs/hot-reloc-threshold
^^^ The command above starts hot relocation, because the data temperature is currently 109
root@debian-i386:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 16G 13G 2.2G 86% /
tmpfs 4.8G 0 4.8G 0% /lib/init/rw
udev 10M 176K 9.9M 2% /dev
tmpfs 4.8G 0 4.8G 0% /dev/shm
/dev/vdb 15G 2.1G 13G 14% /data2
root@debian-i386:~# btrfs fi df /data2
Data: total=3.01GB, used=6.25MB
System: total=4.00MB, used=4.00KB
Metadata: total=8.00MB, used=2.26MB
Data_SSD: total=2.01GB, used=2.00GB
root@debian-i386:~# 
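
The echo commands in the session above can also be issued from a
program. The following is a minimal sketch; write_tunable() and
read_tunable() are hypothetical helper names, and the /proc/sys/fs/
files exist only on a kernel that carries this patchset, so the path is
passed in as a parameter.

```c
#include <assert.h>
#include <stdio.h>

/* Write an integer tunable, like "echo 108 > path". Returns 0 on success. */
static int write_tunable(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* fclose() flushes; a failed flush means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read an integer tunable back, like "cat path". Returns 0 on success. */
static int read_tunable(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

On a patched kernel, and run as root, calling
write_tunable("/proc/sys/fs/hot-reloc-threshold", 108) would correspond
to the echo command shown in the session.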

Zhi Yong Wu (5):
  vfs: add one list_head field
  btrfs: add one new block group
  btrfs: add one hot relocation kthread
  procfs: add three proc interfaces
  btrfs: add hot relocation support

 fs/btrfs/Makefile            |   3 +-
 fs/btrfs/ctree.h             |  26 +-
 fs/btrfs/extent-tree.c       | 107 +-
 fs/btrfs/extent_io.c         |  31 +-
 fs/btrfs/extent_io.h         |   4 +
 fs/btrfs/file.c              |  36 +-
 fs/btrfs/hot_relocate.c      | 802 +++
 fs/btrfs/hot_relocate.h      |  48 +++
 fs/btrfs/inode-map.c         |  13 +-
 fs/btrfs/inode.c             |  92 -
 fs/btrfs/ioctl.c             |  23 +-
 fs/btrfs/relocation.c        |  14 +-
 fs/btrfs/super.c             |  30 +-
 fs/btrfs/volumes.c           |  28 +-
 fs/hot_tracking.c            |   1 +
 include/linux/btrfs.h        |   4 +
 include/linux/hot_tracking.h |   1 +
 kernel/sysctl.c              |  22 ++
 18 files changed, 1234 insertions(+), 51 deletions(-)
 create mode 100644 fs/btrfs/hot_relocate.c
 create mode 100644 fs/btrfs/hot_relocate.h

-- 
1.7.11.7



Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Kai Krakow
Josef Bacik wrote:

>> I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
>> I'm running bedup[1] on a regular basis and it is now the third time that
>> I got back to my PC just to find it hard-frozen and I needed to use the
>> reset button.
>> 
>> It looks like this happens only while running bedup on my two btrfs
>> filesystems but I'm not sure if it happens for any of the filesystems or
>> only one. This is my setup:

[snip]

> Can you please file a bug for this issue on bugzilla.kernel.org so I can
> make sure we don't lose track of it?  Make sure the component is set to Btrfs.

Meanwhile I found out that it happens not only during dedup with bedup but 
also when creating my rsync backup. I will file all the details in Bugzilla 
this evening.

Thanks,
Kai



Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread Kai Krakow
Jan Schmidt wrote:

>> I'm using a bash/rsync script[1] to backup my whole system on a nightly
>> basis to an attached USB3 drive into a scratch area, then take a snapshot
>> of this area. I'd like to have these snapshots immutable, so they should
>> be read-only.
> 
> Have you considered using btrfs send / receive for that purpose? You would
> just save the dedup step.

This is planned for later. As a first step, I want to stay as file system 
agnostic as possible for the source. But I've put it on my todo list in the 
gist.

Regards,
Kai
