[PATCH] Btrfs: check next slot type in log_one_extent()

2012-11-17 Thread Itaru Kitayama
Btrfs-next fails xfstest 127 when checking length of an extent in 
log_one_extent().  Fix this by doing the item type check.

Signed-off-by: Itaru Kitayama 

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 40b9efd..0c20bb4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3207,6 +3207,11 @@ again:
args->src = path->nodes[0];
 next_slot:
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   if (key.objectid != btrfs_ino(inode) ||
+   key.type != BTRFS_EXTENT_DATA_KEY) {
+   btrfs_release_path(path);
+   return -ENOENT;
+   }
num_bytes = btrfs_file_extent_length(path);
if (args->nr &&
args->start_slot + args->nr == path->slots[0]) {
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-11-01 Thread Itaru Kitayama
Hi Liubo:

The V4 leaves only warnings from btrfs_destroy_inode(). So, you think
it's normal
an "old" extent recorded can be removed from the extent tree by the time
relink_file_extents() invoked?

Itaru

On Thu, Nov 1, 2012 at 8:21 PM, Liu Bo  wrote:

> The current btrfs-next HEAD actually have included this v4 patch, so
> just pull btrfs-next and give it a shot :)
>
> thanks,
> liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-11-01 Thread Itaru Kitayama
Hi Liubo,

I couldn't apply your V4 patch against the btrfs-next HEAD. Do you have
a github branch which I can checkout?

Thanks,

Itaru

On Wed, Oct 31, 2012 at 9:55 PM, Liu Bo  wrote:
> On 10/31/2012 08:13 PM, Itaru Kitayama wrote:
>> Hi LiuBo:
>>
>> I am seeing another warning with your patch applied btrfs-next.
>>
>
> Hi Itaru,
>
> Thanks for testing, you seems to be using an old version, since in the new 
> version
> record_extent_backrefs() does not own a WARN_ON().
>
> Could you please test it again with the new patches applied?
>
> thanks,
> liubo
>
>
>> [ 5224.531560] [ cut here ]
>> [ 5224.531565] WARNING: at fs/btrfs/inode.c:2054
>> record_extent_backrefs+0x87/0xe0()
>> [ 5224.531567] Hardware name: Bochs
>> [ 5224.531568] Modules linked in: microcode ppdev psmouse nfsd nfs_acl
>> auth_rpcgss serio_raw nfs fscache lockd binfmt_misc sunrpc cirrus
>> parport_pc ttm drm_kms_helper drm sysimgblt i2c_piix4 sysfillrect
>> syscopyarea i2c_core lp parport floppy
>> [ 5224.531591] Pid: 2485, comm: btrfs-endio-wri Tainted: GW
>> 3.7.0-rc1-v11+ #53
>> [ 5224.531592] Call Trace:
>> [ 5224.531598]  [] warn_slowpath_common+0x93/0xc0
>> [ 5224.531600]  [] warn_slowpath_null+0x1a/0x20
>> [ 5224.531603]  [] record_extent_backrefs+0x87/0xe0
>> [ 5224.531606]  [] btrfs_finish_ordered_io+0x8bb/0xa80
>> [ 5224.531611]  [] ? trace_hardirqs_off_caller+0xb0/0x140
>> [ 5224.531614]  [] finish_ordered_fn+0x15/0x20
>> [ 5224.531617]  [] worker_loop+0x157/0x580
>> [ 5224.531620]  [] ? btrfs_queue_worker+0x2f0/0x2f0
>> [ 5224.531624]  [] kthread+0xe8/0xf0
>> [ 5224.531627]  [] ? get_lock_stats+0x22/0x70
>> [ 5224.531630]  [] ? kthread_create_on_node+0x160/0x160
>> [ 5224.531634]  [] ret_from_fork+0x7c/0xb0
>> [ 5224.531636]  [] ? kthread_create_on_node+0x160/0x160
>> [ 5224.531638] ---[ end trace 0256d2b5a195208c ]---
>>
>> I've compared some of the old extents logical addresses with the 
>> corresponding
>> object ids and offsets from the extent tree; some are just 8k off from
>> the found extents
>> and some keys are totally off.
>>
>> Itaru
>>
>> On Sat, Oct 27, 2012 at 7:28 PM, Liu Bo  wrote:
>>> This comes from one of btrfs's project ideas,
>>> As we defragment files, we break any sharing from other snapshots.
>>> The balancing code will preserve the sharing, and defrag needs to grow this
>>> as well.
>>>
>>> Now we're able to fill the blank with this patch, in which we make full use 
>>> of
>>> backref walking stuff.
>>>
>>> Here is the basic idea,
>>> o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
>>> o  at endio, after we finish updating fs tree, we use backref walking to 
>>> find
>>>all parents of the ranges and re-link them with the new COWed file 
>>> layout by
>>>adding corresponding backrefs.
>>>
>>> Originally patch by Li Zefan 
>>> Signed-off-by: Liu Bo 
>>> ---
>>> v3->v4:
>>>   - fix duplicated refs bugs detected by mounting with autodefrag, 
>>> thanks
>>> for the bug report from Mitch and Chris.
>>>
>>>  fs/btrfs/inode.c |  609 
>>> ++
>>>  1 files changed, 609 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 85a1e50..35e6993 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -54,6 +54,7 @@
>>>  #include "locking.h"
>>>  #include "free-space-cache.h"
>>>  #include "inode-map.h"
>>> +#include "backref.h"
>>>
>>>  struct btrfs_iget_args {
>>> u64 ino;
>>> @@ -1839,6 +1840,600 @@ out:
>>> return ret;
>>>  }
>>>
>>> +/* snapshot-aware defrag */
>>> +struct sa_defrag_extent_backref {
>>> +   struct rb_node node;
>>> +   struct old_sa_defrag_extent *old;
>>> +   u64 root_id;
>>> +   u64 inum;
>>> +   u64 file_pos;
>>> +   u64 extent_offset;
>>> +   u64 num_bytes;
>>> +   u64 generation;
>>> +};
>>> +
>>> +struct old_sa_defrag_extent {
>>> +   struct list_head list;
>>> +   struct new_sa_defrag_extent *new;
>>> +
>>> +   u64 extent_offset;
>>> +   u64 bytenr;
>&g

Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-10-31 Thread Itaru Kitayama
Hi LiuBo:

I am seeing another warning with your patch applied btrfs-next.

[ 5224.531560] [ cut here ]
[ 5224.531565] WARNING: at fs/btrfs/inode.c:2054
record_extent_backrefs+0x87/0xe0()
[ 5224.531567] Hardware name: Bochs
[ 5224.531568] Modules linked in: microcode ppdev psmouse nfsd nfs_acl
auth_rpcgss serio_raw nfs fscache lockd binfmt_misc sunrpc cirrus
parport_pc ttm drm_kms_helper drm sysimgblt i2c_piix4 sysfillrect
syscopyarea i2c_core lp parport floppy
[ 5224.531591] Pid: 2485, comm: btrfs-endio-wri Tainted: GW
3.7.0-rc1-v11+ #53
[ 5224.531592] Call Trace:
[ 5224.531598]  [] warn_slowpath_common+0x93/0xc0
[ 5224.531600]  [] warn_slowpath_null+0x1a/0x20
[ 5224.531603]  [] record_extent_backrefs+0x87/0xe0
[ 5224.531606]  [] btrfs_finish_ordered_io+0x8bb/0xa80
[ 5224.531611]  [] ? trace_hardirqs_off_caller+0xb0/0x140
[ 5224.531614]  [] finish_ordered_fn+0x15/0x20
[ 5224.531617]  [] worker_loop+0x157/0x580
[ 5224.531620]  [] ? btrfs_queue_worker+0x2f0/0x2f0
[ 5224.531624]  [] kthread+0xe8/0xf0
[ 5224.531627]  [] ? get_lock_stats+0x22/0x70
[ 5224.531630]  [] ? kthread_create_on_node+0x160/0x160
[ 5224.531634]  [] ret_from_fork+0x7c/0xb0
[ 5224.531636]  [] ? kthread_create_on_node+0x160/0x160
[ 5224.531638] ---[ end trace 0256d2b5a195208c ]---

I've compared some of the old extents logical addresses with the corresponding
object ids and offsets from the extent tree; some are just 8k off from
the found extents
and some keys are totally off.

Itaru

On Sat, Oct 27, 2012 at 7:28 PM, Liu Bo  wrote:
> This comes from one of btrfs's project ideas,
> As we defragment files, we break any sharing from other snapshots.
> The balancing code will preserve the sharing, and defrag needs to grow this
> as well.
>
> Now we're able to fill the blank with this patch, in which we make full use of
> backref walking stuff.
>
> Here is the basic idea,
> o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
> o  at endio, after we finish updating fs tree, we use backref walking to find
>all parents of the ranges and re-link them with the new COWed file layout 
> by
>adding corresponding backrefs.
>
> Originally patch by Li Zefan 
> Signed-off-by: Liu Bo 
> ---
> v3->v4:
>   - fix duplicated refs bugs detected by mounting with autodefrag, thanks
> for the bug report from Mitch and Chris.
>
>  fs/btrfs/inode.c |  609 
> ++
>  1 files changed, 609 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 85a1e50..35e6993 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -54,6 +54,7 @@
>  #include "locking.h"
>  #include "free-space-cache.h"
>  #include "inode-map.h"
> +#include "backref.h"
>
>  struct btrfs_iget_args {
> u64 ino;
> @@ -1839,6 +1840,600 @@ out:
> return ret;
>  }
>
> +/* snapshot-aware defrag */
> +struct sa_defrag_extent_backref {
> +   struct rb_node node;
> +   struct old_sa_defrag_extent *old;
> +   u64 root_id;
> +   u64 inum;
> +   u64 file_pos;
> +   u64 extent_offset;
> +   u64 num_bytes;
> +   u64 generation;
> +};
> +
> +struct old_sa_defrag_extent {
> +   struct list_head list;
> +   struct new_sa_defrag_extent *new;
> +
> +   u64 extent_offset;
> +   u64 bytenr;
> +   u64 offset;
> +   u64 len;
> +   int count;
> +};
> +
> +struct new_sa_defrag_extent {
> +   struct rb_root root;
> +   struct list_head head;
> +   struct btrfs_path *path;
> +   struct inode *inode;
> +   u64 file_pos;
> +   u64 len;
> +   u64 bytenr;
> +   u64 disk_len;
> +   u8 compress_type;
> +};
> +
> +static int backref_comp(struct sa_defrag_extent_backref *b1,
> +   struct sa_defrag_extent_backref *b2)
> +{
> +   if (b1->root_id < b2->root_id)
> +   return -1;
> +   else if (b1->root_id > b2->root_id)
> +   return 1;
> +
> +   if (b1->inum < b2->inum)
> +   return -1;
> +   else if (b1->inum > b2->inum)
> +   return 1;
> +
> +   if (b1->file_pos < b2->file_pos)
> +   return -1;
> +   else if (b1->file_pos > b2->file_pos)
> +   return 1;
> +
> +   return 0;
> +}
> +
> +static void backref_insert(struct rb_root *root,
> +  struct sa_defrag_extent_backref *backref)
> +{
> +   struct rb_node **p = &root->rb_node;
> +   struct rb_node *parent = NULL;
> +   struct sa_defrag_extent_backref *entry;
> +   int ret;
> +
> +   while (*p) {
> +   parent = *p;
> +   entry = rb_entry(parent, struct sa_defrag_extent_backref, 
> node);
> +
> +   ret = backref_comp(backref, entry);
> +   if (ret < 0)
> +   p = &(*p)->rb_left;
> +   else
> +   /*
> +* Since space can be shared, so there ca

Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-30 Thread Itaru Kitayama
Hi David,
Thank you for your comments. I wanted to fix a lockdep warning on a
possible deadlock
case encountered during the xfstests with a scratch space almost full.

You are right I encountered the worst scenario you described below, I
drop this patch and
I'll look at btrfs_congested_fn more to examine the mechanisms
implemented there are
working as expected.

Thanks,

Itaru

On Mon, Oct 1, 2012 at 7:29 AM, David Sterba  wrote:
> Hi again,
>
> On Sun, Sep 30, 2012 at 11:11:10AM +0900, Itaru Kitayama wrote:
>> This is the correct one.
>>
>> Signed-off-by: Itaru Kitayama 
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index efb044e..c032dbe 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
>> *root, u64 to_reclaim, u64 orig,
>> while (delalloc_bytes && loops < 3) {
>> max_reclaim = min(delalloc_bytes, to_reclaim);
>> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
>> -   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
>> -  WB_REASON_FS_FREE_SPACE);
>> +   if (!bdi_write_congested(root->fs_info->sb->s_bdi))
>> +   writeback_inodes_sb_nr_if_idle(root->fs_info, 
>> nr_pages,
>> +  
>> WB_REASON_FS_FREE_SPACE);
>
> You don't pass the same argument to writeback_inodes_sb_nr_if_idle in
> the changed code, this would not compile (root->fs_info->sb), but it's
> a minor thing.
>
> I'm more interested in the background motivation of the change, it's
> clear that it tries to avoid writing data if the devices are congested,
> have you measured an improvement against original behaviour?
>
> writeback_inodes_sb_nr_if_idle checks if writeback is in progress and
> does not start if this is true. That way this will not hammer the device
> unnecessarily.
>
> Your code tries to avoid even more hammering of the device when the
> writes do not come from writeback. This may or may not be a good thing
> (and hard to guess without a few tests, or at least I don't see that).
>
> Shrink delalloc starts the writeback for a given number of pages and
> then hopes they'll be flushed so the reserved space can be reclaimed
> back. If the device is congested, this will not start the writeback and
> it would be very unlikely that total delalloc bytes shrinks. The rest of
> the function relies on asynchronous behaviour, it's even less clear what
> it would do without the writeback call. In the worst case it could block
> on 'wait_event' or at 'btrfs_wait_ordered_extents' in some case, though
> this is just more of a speculation.
>
>
> david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-29 Thread Itaru Kitayama
This is the correct one.

Signed-off-by: Itaru Kitayama 

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index efb044e..c032dbe 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
*root, u64 to_reclaim, u64 orig,
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
-  WB_REASON_FS_FREE_SPACE);
+   if (!bdi_write_congested(root->fs_info->sb->s_bdi))
+   writeback_inodes_sb_nr_if_idle(root->fs_info, nr_pages,
+  WB_REASON_FS_FREE_SPACE);

/*
 * We need to wait for the async pages to actually start before


On Sun, Sep 30, 2012 at 10:28 AM, Itaru Kitayama  wrote:
> Resubmit this after the checkpatch test.
>
> In srhink_delalloc(), writeback starts if idle, also check the bdi is
> not write congested.
> The patch is against the head of the btrfs-next.
>
> Signed-off-by: Itaru Kitayama 
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index efb044e..1aae046 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
> *root, u64 to_reclaim, u64 orig,
> while (delalloc_bytes && loops < 3) {
> max_reclaim = min(delalloc_bytes, to_reclaim);
> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
> -   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
> -  WB_REASON_FS_FREE_SPACE);
> +   if (!bdi_write_congested(root->fs_info->sb->s_bdi))
> +   writeback_inodes_sb_nr_if_idle(root->fs_info, nr_page,
> +  
> WB_REASON_FS_FREE_SPACE);
>
> /*
>  * We need to wait for the async pages to actually start 
> before
>
>
> On Sun, Sep 30, 2012 at 1:21 AM, David Sterba  wrote:
>> On Sat, Sep 29, 2012 at 10:20:09PM +0900, Itaru Kitayama wrote:
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, 
>>> u64 t
>>> while (delalloc_bytes && loops < 3) {
>>> max_reclaim = min(delalloc_bytes, to_reclaim);
>>> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
>>> -   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
>>> +   if (!bdi_write_congested(root->fs_info->sb->s_bdi)) 
>>> writeback_in
>>>WB_REASON_FS_FREE_SPACE);
>>
>> malformed patch
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-29 Thread Itaru Kitayama
Resubmit this after the checkpatch test.

In srhink_delalloc(), writeback starts if idle, also check the bdi is
not write congested.
The patch is against the head of the btrfs-next.

Signed-off-by: Itaru Kitayama 

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index efb044e..1aae046 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3712,8 +3712,9 @@ static void shrink_delalloc(struct btrfs_root
*root, u64 to_reclaim, u64 orig,
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
-  WB_REASON_FS_FREE_SPACE);
+   if (!bdi_write_congested(root->fs_info->sb->s_bdi))
+   writeback_inodes_sb_nr_if_idle(root->fs_info, nr_page,
+  WB_REASON_FS_FREE_SPACE);

/*
 * We need to wait for the async pages to actually start before


On Sun, Sep 30, 2012 at 1:21 AM, David Sterba  wrote:
> On Sat, Sep 29, 2012 at 10:20:09PM +0900, Itaru Kitayama wrote:
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, 
>> u64 t
>> while (delalloc_bytes && loops < 3) {
>> max_reclaim = min(delalloc_bytes, to_reclaim);
>> nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
>> -   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
>> +   if (!bdi_write_congested(root->fs_info->sb->s_bdi)) 
>> writeback_in
>>WB_REASON_FS_FREE_SPACE);
>
> malformed patch
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: shrink_delalloc check bdi write congested

2012-09-29 Thread Itaru Kitayama
In srhink_delalloc(), writeback starts if idle, also check the bdi is
not write congested.
The patch is against the head of the btrfs-next.

Signed-off-by: Itaru Kitayama 

 fs/btrfs/extent-tree.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index efb044e..caa74d3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3712,7 +3712,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 t
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
-   writeback_inodes_sb_nr_if_idle(root->fs_info->sb, nr_pages,
+   if (!bdi_write_congested(root->fs_info->sb->s_bdi)) writeback_in
   WB_REASON_FS_FREE_SPACE);

/*
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] btrfs: fix possible deadlock by clearing __GFP_FS flag

2011-03-28 Thread Itaru Kitayama
Hi Miao,

On Sun, 27 Mar 2011 20:27:30 +0800
Miao Xie  wrote:

> Changelog V1 -> V2:
> - modify the explanation of the deadlock.
> - clear __GFP_FS flag in the free space's page cache.

I think this is also needed on top of your V5 patch to avoid a recursion. Could 
you
review it and give your Signed-off-by?

Signed-off-by: Itaru Kitayama 

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8862dda..03e5ab3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2641,7 +2641,7 @@ int extent_readpages(struct extent_io_tree *tree,
prefetchw(&page->flags);
list_del(&page->lru);
if (!add_to_page_cache_lru(page, mapping,
-   page->index, GFP_KERNEL)) {
+   page->index, GFP_NOFS)) {
__extent_read_full_page(tree, page, get_extent,
&bio, 0, &bio_flags);
}

After applying the patch above, I don't see the warning below during Chris' 
stress test. 

=
[ INFO: possible irq lock inversion dependency detected ]
2.6.36-v5+ #10
-
kswapd0/49 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [] 
btrfs_remove_delayed_node+0x3e/0xd2
but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
 (&found->groups_sem){.+}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
2 locks held by kswapd0/49:
 #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x3d/0x164
 #1:  (iprune_sem){.-}, at: [] 
shrink_icache_memory+0x4d/0x213

the shortest dependencies between 2nd lock and 1st lock:
 -> (&found->groups_sem){.+} ops: 3649 {
HARDIRQ-ON-W at:
  [] __lock_acquire+0x346/0xda6
  [] lock_acquire+0x11d/0x143
  [] down_write+0x55/0x9b
  [] __link_block_group+0x5a/0x83
  [] 
btrfs_read_block_groups+0x2fb/0x56c
  [] open_ctree+0xf8f/0x14c3
  [] btrfs_get_sb+0x236/0x467
  [] vfs_kern_mount+0xbd/0x1a7
  [] do_kern_mount+0x4d/0xed
  [] do_mount+0x74e/0x7c5
  [] sys_mount+0x88/0xc2
  [] system_call_fastpath+0x16/0x1b
HARDIRQ-ON-R at:
  [] __lock_acquire+0x31e/0xda6
  [] lock_acquire+0x11d/0x143
  [] down_read+0x4c/0x91
  [] find_free_extent+0x3ec/0xa86
  [] btrfs_reserve_extent+0xb4/0x142
  [] 
btrfs_alloc_free_block+0x167/0x2b2
  [] __btrfs_cow_block+0x103/0x346
  [] btrfs_cow_block+0x101/0x110
  [] btrfs_search_slot+0x143/0x513
  [] 
btrfs_truncate_inode_items+0x12a/0x61a
  [] btrfs_evict_inode+0x154/0x1be
  [] evict+0x27/0x97
  [] iput+0x1d0/0x23e
  [] btrfs_orphan_cleanup+0x1c8/0x269
  [] btrfs_cleanup_fs_roots+0x6d/0x8c
  [] btrfs_remount+0x9e/0xe9
  [] do_remount_sb+0xbb/0x106
  [] do_mount+0x255/0x7c5
  [] sys_mount+0x88/0xc2
  [] system_call_fastpath+0x16/0x1b
SOFTIRQ-ON-W at:
  [] __lock_acquire+0x367/0xda6
  [] lock_acquire+0x11d/0x143
  [] down_write+0x55/0x9b
  [] __link_block_group+0x5a/0x83
  [] 
btrfs_read_block_groups+0x2fb/0x56c
  [] open_ctree+0xf8f/0x14c3
  [] btrfs_get_sb+0x236/0x467
  [] vfs_kern_mount+0xbd/0x1a7
  [] do_kern_mount+0x4d/0xed
  [] do_mount+0x74e/0x7c5
  [] sys_mount+0x88/0xc2
  [] system_call_fastpath+0x16/0x1b
SOFTIRQ-ON-R at:
  [] __lock_acquire+0x367/0xda6
  [] lock_acquire+0x11d/0x143
  [] down_read+0x4c/0x91
  [] find_free_extent+0x3ec/0xa86
  [] btrfs_reserve_extent+0xb4/0x142
  [] 
btrfs_alloc_free_block+0x167/0x2b2
  [] __btrfs_cow_block+0x103/0x346
  [] btrfs_cow_block+0x101/0x110
  [] btrfs_search_slot+0

Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-27 Thread Itaru Kitayama
Hi Miao,

On Sun, 27 Mar 2011 15:00:00 +0800
Miao Xie  wrote:

> I got it. It is because the allocation flag of the metadata's page cache, 
> which is stored in
> the btree inode's i_mapping, was set to be GFP_HIGHUSER_MOVABLE. So if we 
> allocate pages for
> btree's page cache, this lockdep warning will be triggered.
> 
> I think even without my patch, this lockdep warning can also be triggered, 
> btrfs_evict_inode()
> do the similar operations like what I do in the btrfs_destroy_inode(). 
>   Task1   Kswap0 task
>   open()
> ...
> btrfs_search_slot()
>   ...
>   btrfs_cow_block()
>   ...
>   alloc_page()
> wait for reclaiming
>   shrink_slab()
> ...
> shrink_icache_memory()
>   ...
>   btrfs_evict_inode()
> ...
> btrfs_search_slot()
> 
> If the path is locked by task1, the deadlock happens.

Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL so it's still possible 
for the kswapd0 
to call prune_icache(), no? I still see the lockdep warning even with your 
patch that clears
__GFP_FS in open_ctree().

itaru

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-26 Thread Itaru Kitayama
Hi Miao,

Chris' stress test, stress.sh -n 50 -c /mnt/linux-2.6 /mnt gave me another 
lockdep splat
(see below). I applied your V5 patches on top of the next-rc branch.

I haven't triggered it in my actual testing, but do you think we can iterate a 
list of block 
groups in an lockless manner using rcu?

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2164296..f40ff4e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -740,6 +740,7 @@ struct btrfs_space_info {
struct list_head block_groups[BTRFS_NR_RAID_TYPES];
spinlock_t lock;
struct rw_semaphore groups_sem;
+   struct srcu_struct groups_srcu;
atomic_t caching_threads;
 };
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9e4c9f4..22d6dbb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3003,6 +3003,7 @@ static int update_space_info(struct btrfs_fs_info *info, 
u64 flags,
for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
INIT_LIST_HEAD(&found->block_groups[i]);
init_rwsem(&found->groups_sem);
+   init_srcu_struct(&found->groups_srcu);
spin_lock_init(&found->lock);
found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
BTRFS_BLOCK_GROUP_SYSTEM |
@@ -4853,6 +4854,7 @@ static noinline int find_free_extent(struct 
btrfs_trans_handle *trans,
 int data)
 {
int ret = 0;
+   int idx;
struct btrfs_root *root = orig_root->fs_info->extent_root;
struct btrfs_free_cluster *last_ptr = NULL;
struct btrfs_block_group_cache *block_group = NULL;
@@ -4929,7 +4931,7 @@ ideal_cache:
if (block_group && block_group_bits(block_group, data) &&
(block_group->cached != BTRFS_CACHE_NO ||
 search_start == ideal_cache_offset)) {
-   down_read(&space_info->groups_sem);
+   idx = srcu_read_lock(&space_info->groups_srcu);
if (list_empty(&block_group->list) ||
block_group->ro) {
/*
@@ -4939,7 +4941,7 @@ ideal_cache:
 * valid
 */
btrfs_put_block_group(block_group);
-   up_read(&space_info->groups_sem);
+   srcu_read_unlock(&space_info->groups_srcu, idx);
} else {
index = get_block_group_index(block_group);
goto have_block_group;
@@ -4949,8 +4951,8 @@ ideal_cache:
}
}
 search:
-   down_read(&space_info->groups_sem);
-   list_for_each_entry(block_group, &space_info->block_groups[index],
+   idx = srcu_read_lock(&space_info->groups_srcu);
+   list_for_each_entry_rcu(block_group, &space_info->block_groups[index],
list) {
u64 offset;
int cached;
@@ -5197,8 +5199,8 @@ loop:
BUG_ON(index != get_block_group_index(block_group));
btrfs_put_block_group(block_group);
}
-   up_read(&space_info->groups_sem);
-
+   srcu_read_unlock(&space_info->groups_srcu, idx);
+   
if (!ins->objectid && ++index < BTRFS_NR_RAID_TYPES)
goto search;
 


=
[ INFO: possible irq lock inversion dependency detected ]
2.6.36-v5+ #2
-
kswapd0/49 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [] 
btrfs_remove_delayed_node+0x3e/0xd2
but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
 (&found->groups_sem){.+}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
2 locks held by kswapd0/49:
 #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x3d/0x164
 #1:  (iprune_sem){.-}, at: [] 
shrink_icache_memory+0x4d/0x213

the shortest dependencies between 2nd lock and 1st lock:
 -> (&found->groups_sem){.+} ops: 1334 {
HARDIRQ-ON-W at:
  [] __lock_acquire+0x346/0xda6
  [] lock_acquire+0x11d/0x143
  [] down_write+0x55/0x9b
  [] __link_block_group+0x5a/0x83
  [] 
btrfs_read_block_groups+0x2fb/0x56c
  [] open_ctree+0xf78/0x14ab
  [] btrfs_get_sb+0x236/0x467
  [] vfs_kern_mount+0xbd/0x1a7
  [] do_kern_mount+0x4d/0xed
  [] do_mount+0x74e/0x7c5
  [] sys_mount+0x88/0xc2
  [] system_call_fastpath+0x16/0x1b
HARDIRQ-ON-R at:
  [] __lock_acquire+0x31e/0xda6
  [] lock_acquire+0x11d/0x143
  

Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-23 Thread Itaru Kitayama
On Wed, 23 Mar 2011 17:47:01 +0800
Miao Xie  wrote:

> we found GFP_KERNEL was passed into kzalloc(), I think this flag trigger the 
> above lockdep
> warning. the attached patch, which against the delayed items operation patch, 
> may fix this
> problem, Could you test it for me?

The possible irq lock inversion dependency warning seems to go away.   

itaru   
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-22 Thread Itaru Kitayama
On Wed, 23 Mar 2011 12:00:38 +0800
Miao Xie  wrote:

> I is testing the new version, in which I fixed the slab shrinker problem 
> reported by
> Chris. In the new version, the delayed node is removed before the relative 
> inode is
> moved into the unused_inode list(the slab shrinker will reclaim inodes in 
> this list).
> Maybe this method can also fix this bug. So could you tell me the reproduce 
> step
> or the options of mkfs and mount? I will check if the new patch can fix this 
> bug or not.

I used the default mkfs options for $TEST_DEV and enabled the space_cache and 
the
compress=lzo options upon mounting the partition.

itaru
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-22 Thread Itaru Kitayama
Hi Miao,

The possible circular locking dependency message doesn't show up in the updated 
V5. However,
I see a new possible irq lock inversion dependency message while running 
xfstests.

=
[ INFO: possible irq lock inversion dependency detected ]
2.6.36-xie+ #122
-
kswapd0/49 just changed the state of lock:
 (iprune_sem){.-}, at: [] shrink_icache_memory+0x4d/0x213
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (&delayed_node->mutex){+.+.+.}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
1 lock held by kswapd0/49:
 #0:  (shrinker_rwsem){..}, at: [] shrink_slab+0x3d/0x164

the shortest dependencies between 2nd lock and 1st lock:
 -> (&delayed_node->mutex){+.+.+.} ops: 1703807 {
HARDIRQ-ON-W at:
  [] __lock_acquire+0x346/0xda6
  [] lock_acquire+0x11d/0x143
  [] __mutex_lock_common+0x5a/0x444
  [] mutex_lock_nested+0x39/0x3e
  [] 
btrfs_delayed_update_inode+0x45/0x101
  [] btrfs_update_inode+0x2e/0x129
  [] btrfs_truncate+0x43d/0x477
  [] vmtruncate+0x44/0x52
  [] btrfs_setattr+0x202/0x253
  [] notify_change+0x1a2/0x29d
  [] do_truncate+0x6c/0x89
  [] do_last+0x579/0x57e
  [] do_filp_open+0x215/0x5ae
  [] do_sys_open+0x60/0xfc
  [] sys_open+0x20/0x22
  [] system_call_fastpath+0x16/0x1b
SOFTIRQ-ON-W at:
  [] __lock_acquire+0x367/0xda6
  [] lock_acquire+0x11d/0x143
  [] __mutex_lock_common+0x5a/0x444
  [] mutex_lock_nested+0x39/0x3e
  [] 
btrfs_delayed_update_inode+0x45/0x101
  [] btrfs_update_inode+0x2e/0x129
  [] btrfs_truncate+0x43d/0x477
  [] vmtruncate+0x44/0x52
  [] btrfs_setattr+0x202/0x253
  [] notify_change+0x1a2/0x29d
  [] do_truncate+0x6c/0x89
  [] do_last+0x579/0x57e
  [] do_filp_open+0x215/0x5ae
  [] do_sys_open+0x60/0xfc
  [] sys_open+0x20/0x22
  [] system_call_fastpath+0x16/0x1b
RECLAIM_FS-ON-W at:
 [] mark_held_locks+0x52/0x70
 [] lockdep_trace_alloc+0xa4/0xc2
 [] __kmalloc+0x7f/0x154
 [] kzalloc+0x14/0x16
 [] cache_block_group+0x106/0x238
 [] find_free_extent+0x4e2/0xa86
 [] 
btrfs_reserve_extent+0xb4/0x142
 [] 
btrfs_alloc_free_block+0x167/0x2af
 [] __btrfs_cow_block+0x103/0x346
 [] btrfs_cow_block+0x101/0x110
 [] btrfs_search_slot+0x143/0x513
 [] btrfs_lookup_inode+0x2f/0x8f
 [] 
btrfs_update_delayed_inode+0x75/0x135
 [] 
btrfs_run_delayed_items+0xd6/0x131
 [] 
btrfs_commit_transaction+0x28b/0x668
 [] btrfs_sync_fs+0x6b/0x70
 [] __sync_filesystem+0x6b/0x83
 [] sync_filesystem+0x4c/0x50
 [] 
generic_shutdown_super+0x27/0xd7
 [] kill_anon_super+0x16/0x54
 [] 
deactivate_locked_super+0x26/0x46
 [] deactivate_super+0x45/0x4a
 [] mntput_no_expire+0xd6/0x104
 [] sys_umount+0x2c1/0x2ec
 [] system_call_fastpath+0x16/0x1b
INITIAL USE at:
 [] __lock_acquire+0x3bd/0xda6
 [] lock_acquire+0x11d/0x143
 [] __mutex_lock_common+0x5a/0x444
 [] mutex_lock_nested+0x39/0x3e
 [] 
btrfs_delayed_update_inode+0x45/0x101
 [] btrfs_update_inode+0x2e/0x129
 [] btrfs_truncate+0x43d/0x477
 [] vmtruncate+0x44/0x52
 [] btrfs_setattr+0x202/0x253
 [] notify_change+0x1a2/0x29d
 [] do_truncate+0x6c/0x89
 [] do_last+0x579/0x57e
 [] do_filp_open+0x215/0x5ae
  

Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-22 Thread Itaru Kitayama
Hi Miao,

On Tue, 22 Mar 2011 18:03:24 +0800
Miao Xie  wrote:

> The V5 patch is attached, could you test it for me? I have run Chris stress 
> test, dbench benchmark
> on my machine, it work well. I want to check if the above bug still exists or 
> not.
> 
> Thanks
> Miao

Here's the boot log (The V5 patch is on top of the btrfs-unstable next-rc 
branch):

BUG: unable to handle kernel NULL pointer dereference at 00c8
IP: [] __lock_acquire+0x98/0xda6
PGD 7074c067 PUD 75b6d067 PMD 0 
Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
CPU 0 
Modules linked in: mptspi mptscsih mptbase scsi_transport_spi

Pid: 1, comm: init Not tainted 2.6.36-xie+ #120 440BX Desktop Reference 
Platform/VMware Virtual Platform
RIP: 0010:[]  [] __lock_acquire+0x98/0xda6
RSP: 0018:88007ceafb58  EFLAGS: 00010097
RAX: 0046 RBX: 88007ceb0050 RCX: 
RDX:  RSI:  RDI: 00c8
RBP: 88007ceafc18 R08: 0002 R09: 
R10:  R11: 88007ceafd48 R12: 88007ceb0050
R13: 00c8 R14: 0002 R15: 
FS:  7f1fe74e7720() GS:880004e0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00c8 CR3: 70486000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process init (pid: 1, threadinfo 88007ceae000, task 88007ceb0050)
Stack:
 0002 0001 88007ceafb78 810bdd5d
<0> 8800  88007ceafbc8 0046
<0> 88007ceafba8 88007ceb0050 81067dcd 0001
Call Trace:
 [] ? time_hardirqs_off+0x2e/0x36
 [] ? local_clock+0x2a/0x3b
 [] ? btrfs_get_delayed_items+0x37/0xb6
 [] lock_acquire+0x11d/0x143
 [] ? btrfs_get_delayed_items+0x37/0xb6
 [] ? btrfs_get_delayed_items+0x37/0xb6
 [] __mutex_lock_common+0x5a/0x444
 [] ? btrfs_get_delayed_items+0x37/0xb6
 [] ? virt_to_head_page+0xe/0x31
 [] ? cache_alloc_debugcheck_after+0x176/0x1cd
 [] mutex_lock_nested+0x39/0x3e
 [] btrfs_get_delayed_items+0x37/0xb6
 [] btrfs_real_readdir+0x170/0x52c
 [] ? vfs_readdir+0x56/0xb6
 [] ? filldir+0x0/0xd0
 [] ? trace_preempt_on+0x24/0x26
 [] ? sub_preempt_count+0xa3/0xb6
 [] ? __mutex_lock_common+0x3ee/0x444
 [] ? vfs_readdir+0x56/0xb6
 [] ? rcu_read_unlock+0x0/0x4b
 [] ? filldir+0x0/0xd0
 [] ? filldir+0x0/0xd0
 [] vfs_readdir+0x79/0xb6
 [] sys_getdents+0x85/0xd8
 [] system_call_fastpath+0x16/0x1b
Code: 58 e2 8b 01 00 be b3 0a 00 00 0f 85 c4 0c 00 00 e9 4d 0c 00 00 83 fe 07 
76 11 e8 7e 55 1f 00 48 c7 c7 55 85 78 81 e9 78 0c 00 00 <49> 8 
RIP  [] __lock_acquire+0x98/0xda6
 RSP 
CR2: 00c8
---[ end trace 87f215b3568d553d ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-21 Thread Itaru Kitayama
On Tue, 22 Mar 2011 11:12:37 +0800
Miao Xie  wrote:

> We can't fix it by this way, because the work threads may do insertion or 
> deletion at the same time,
> and we may lose some directory items.

Ok.
 
> Maybe we can fix it by adding a reference for the delayed directory items, we 
> can do read dir just
> like this:
> 1. hold the node lock
> 2. increase the directory items' reference and put all the directory items 
> into a list
> 3. release the node lock
> 4. read dir
> 5. decrease the directory items' reference and free them if the reference is 
> zero.
> What do you think about?

Sounds doable to me.

itaru
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V4] btrfs: implement delayed inode items operation

2011-03-21 Thread Itaru Kitayama
Hi Miao,

Here is an excerpt of the V4 patch applied kernel boot log:

===
[ INFO: possible circular locking dependency detected ]
2.6.36-xie+ #117
---
vgs/1210 is trying to acquire lock:
 (&delayed_node->mutex){+.+...}, at: [] 
btrfs_delayed_update_inode+0x45/0x101

but task is already holding lock:
 (&mm->mmap_sem){++}, at: [] sys_mmap_pgoff+0xd6/0x12e

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++}:
   [] lock_acquire+0x11d/0x143
   [] might_fault+0x95/0xb8
   [] filldir+0x75/0xd0
   [] btrfs_real_readdir+0x3d7/0x528
   [] vfs_readdir+0x79/0xb6
   [] sys_getdents+0x85/0xd8
   [] system_call_fastpath+0x16/0x1b

-> #0 (&delayed_node->mutex){+.+...}:
   [] __lock_acquire+0xa98/0xda6
   [] lock_acquire+0x11d/0x143
   [] __mutex_lock_common+0x5a/0x444
   [] mutex_lock_nested+0x39/0x3e
   [] btrfs_delayed_update_inode+0x45/0x101
   [] btrfs_update_inode+0x2e/0x129
   [] btrfs_dirty_inode+0x57/0x113
   [] __mark_inode_dirty+0x33/0x1aa
   [] touch_atime+0x107/0x12a
   [] btrfs_file_mmap+0x3e/0x57
   [] mmap_region+0x2bb/0x4c4
   [] do_mmap_pgoff+0x290/0x2f3
   [] sys_mmap_pgoff+0xf6/0x12e
   [] sys_mmap+0x22/0x24
   [] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by vgs/1210:
 #0:  (&mm->mmap_sem){++}, at: [] 
sys_mmap_pgoff+0xd6/0x12e

stack backtrace:
Pid: 1210, comm: vgs Not tainted 2.6.36-xie+ #117
Call Trace:
 [] print_circular_bug+0xaf/0xbd
 [] __lock_acquire+0xa98/0xda6
 [] ? btrfs_delayed_update_inode+0x45/0x101
 [] lock_acquire+0x11d/0x143
 [] ? btrfs_delayed_update_inode+0x45/0x101
 [] ? btrfs_delayed_update_inode+0x45/0x101
 [] __mutex_lock_common+0x5a/0x444
 [] ? btrfs_delayed_update_inode+0x45/0x101
 [] ? debug_mutex_init+0x31/0x3c
 [] mutex_lock_nested+0x39/0x3e
 [] btrfs_delayed_update_inode+0x45/0x101
 [] ? __mutex_unlock_slowpath+0x129/0x13a
 [] btrfs_update_inode+0x2e/0x129
 [] btrfs_dirty_inode+0x57/0x113
 [] __mark_inode_dirty+0x33/0x1aa
 [] touch_atime+0x107/0x12a
 [] btrfs_file_mmap+0x3e/0x57
 [] mmap_region+0x2bb/0x4c4
 [] ? file_map_prot_check+0x9a/0xa3
 [] do_mmap_pgoff+0x290/0x2f3
 [] ? sys_mmap_pgoff+0xd6/0x12e
 [] sys_mmap_pgoff+0xf6/0x12e
 [] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [] sys_mmap+0x22/0x24
 [] system_call_fastpath+0x16/0x1b

As the corresponding delayed node mutex lock is taken in btrfs_real_readdir, 
that seems deadlockable.
vfs_readdir holds i_mutex, I wonder if we can execute 
btrfs_readdir_delayed_dir_index without
taking the node lock. 

 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-18 Thread Itaru Kitayama
Hi Miao,

Your updated V4 patch no longer gives me a hang. Thank you.

In the latest V4, btrfs_remove_delayed_node() does just release
a delayed node and returns for most of the time, but for space
cache inodes, it takes trans_mutex and writes the delayed nodes 
out to the tree. Is my understanding correct?

On Fri, 18 Mar 2011 11:37:03 +0800
Miao Xie  wrote:

> I find the inodes of free space information are updated when committing 
> transaction,
> and it will lead btrfs to hang.
> 
> I have fixed this bug in the attached patch, could you test for me? I have 
> tested it
> on my machine with space_cache mount option, it worked well.
> 
> Thanks
> Miao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-16 Thread Itaru Kitayama
Hi Chris,
I have been using that option.

Itaru

On Wed, 16 Mar 2011 13:49:36 -0400
Chris Mason  wrote:

> Maybe this is related to the free space cache?  Is one of you using
> mount -o space_cache?
> 
> -chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-16 Thread Itaru Kitayama
Hi Miao,

The V4 still hangs. Does the new function, btrfs_update_inode_nodelayed() you 
introduced to the V4
need take care of a delayed node (commit delayed items of the inode and set the 
delayed note to NULL)?

Below is my experimental patch on top of V4 trying to avoid taking trans_mutex 
in btrfs_remove_delayed_node(), 
but still hangs. Could you take a look at it? Thanks.

Itaru

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 495a6af..b6ea082 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1006,9 +1006,9 @@ int btrfs_run_delayed_items(struct btrfs_trans_handle 
*trans,
return ret;
 }
 
-int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
-struct btrfs_root *root,
-struct inode *inode)
+int __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct inode *inode, int delayed_update)
 {
struct btrfs_delayed_node *delayed_node = btrfs_get_delayed_node(inode);
struct btrfs_path *path;
@@ -1036,7 +1036,7 @@ int btrfs_commit_inode_delayed_items(struct 
btrfs_trans_handle *trans,
if (!ret)
ret = btrfs_delete_delayed_items(trans, path, root,
 delayed_node);
-   if (!ret)
+   if (!ret && delayed_update)
ret = btrfs_update_delayed_inode(trans, root, path,
 delayed_node);
btrfs_free_path(path);
@@ -1044,13 +1044,27 @@ int btrfs_commit_inode_delayed_items(struct 
btrfs_trans_handle *trans,
return ret;
 }
 
+int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
+struct btrfs_root *root,
+struct inode *inode)
+{
+   return __btrfs_commit_inode_delayed_items(trans, root, inode, 1);
+}
+
+int btrfs_commit_inode_delayed_items_no_delayed_update(struct 
btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct inode *inode)
+{
+   return __btrfs_commit_inode_delayed_items(trans, root, inode, 0);
+}
+
 int btrfs_remove_delayed_node(struct inode *inode)
 {
struct btrfs_trans_handle *trans;
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node;
int ret;
-
+   
if (!delayed_node)
return 0;
 
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index d4134bc..b6986df 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -114,6 +114,11 @@ void btrfs_balance_delayed_items(struct btrfs_root *root);
 int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 struct inode *inode);
+int btrfs_commit_inode_delayed_items_no_delayed_update(
+   struct btrfs_trans_handle *trans,
+   struct btrfs_root *root,
+   struct inode *inode);
+
 /* Used for evicting the inode. */
 int btrfs_remove_delayed_node(struct inode *inode);
 int btrfs_kill_delayed_inode_items(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5299d9c..e198051 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2611,8 +2611,12 @@ noinline int btrfs_update_inode_nodelayed(struct 
btrfs_trans_handle *trans,
struct btrfs_inode_item *inode_item;
struct btrfs_path *path;
struct extent_buffer *leaf;
+   struct btrfs_delayed_node *delayed_node;
int ret;
 
+   btrfs_commit_inode_delayed_items_no_delayed_update(trans, root, inode);
+   delayed_node = NULL;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-09 Thread Itaru Kitayama

Thank you both for your help. I agree that we need to reserve 5 items for that 
operation. 

> > Miao's patch (http://marc.info/?l=linux-btrfs&m=129802068318176&w=2)
> > fixed this problem, maybe.
> 
> Yes, the above patch fixed this problem.
> I think this problem is a bug of btrfs_link(), and hope the fix patch can
> be received as early as possible. So I sent the above patch separately,
> not with the delayed inode items operation patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-09 Thread Itaru Kitayama
Hi Miao,

My virtual machine hangs upon reboot.

INFO: task umount:1952 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umountD fffd44dc  3024  1952   1714 0x
 880051ea99e8 0046 8800 880051ea9fd8
 001d4240 88005cd958d0 88005cd95e60 88005cd95e58
 880051ea8000 880051ea9fd8 001d4240 001d4240
Call Trace:
 [] ? start_transaction+0x7b/0x1f8
 [] __mutex_lock_common+0x28c/0x444
 [] ? start_transaction+0x7b/0x1f8
 [] mutex_lock_nested+0x39/0x3e
 [] start_transaction+0x7b/0x1f8
 [] btrfs_join_transaction+0x15/0x17
 [] btrfs_remove_delayed_node+0x36/0xa1
 [] btrfs_destroy_inode+0x2ae/0x2cc
 [] destroy_inode+0x2f/0x45
 [] iput+0x239/0x23e
 [] btrfs_write_dirty_block_groups+0x22a/0x47c
 [] commit_cowonly_roots+0xd9/0x198
 [] btrfs_commit_transaction+0x3a0/0x658
 [] ? __mutex_unlock_slowpath+0x129/0x13a
 [] ? autoremove_wake_function+0x0/0x39
 [] btrfs_commit_super+0x90/0xd9
 [] close_ctree+0x4a/0x35c
 [] ? up_write+0x23/0x3b
 [] ? invalidate_inodes+0x134/0x146
 [] btrfs_put_super+0x1d/0x2c
 [] generic_shutdown_super+0x56/0xd7
 [] kill_anon_super+0x16/0x54
 [] ? deactivate_super+0x3d/0x4a
 [] deactivate_locked_super+0x26/0x46
 [] deactivate_super+0x45/0x4a
 [] mntput_no_expire+0xd6/0x104
 [] sys_umount+0x2c1/0x2ec
 [] system_call_fastpath+0x16/0x1b
INFO: lockdep is turned off.

btrfs_remove_delayed_node() is called while trans_mutex is held.

Itaru

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3] btrfs: implement delayed inode items operation

2011-03-04 Thread Itaru Kitayama
Hi Miao,

The V3 patch on top of the next-rc fails to pass an xfstests test 13.
In the btrfs link path, we need to reserve one more metadata in the
trans_block_rsv for the delayed inode update (if needed) to complete.

Here's the log from the test:

[ cut here ]
kernel BUG at fs/btrfs/delayed-inode.c:1565!
invalid opcode:  [#2] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/module/floppy/initstate
CPU 1 
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc 
ipv6 uinput snd_ens1371 gameport snd_rawmidi ]

Pid: 6948, comm: fsstress Tainted: G  D W   2.6.36-preempt+ #72 440BX 
Desktop Reference Platform/VMware Virtual Platform
RIP: 0010:[]  [] 
btrfs_delayed_update_inode+0xfd/0x101
RSP: 0018:88003b073d68  EFLAGS: 00010286
RAX: ffe4 RBX: 88002a876e48 RCX: 88004f3702a0
RDX: 03b9 RSI: 0001 RDI: 88003b072000
RBP: 88003b073da8 R08: 00018000 R09: 
R10: 88004f3703f0 R11: 09f911029d74e35b R12: 88006861cb90
R13: 88002ec91c18 R14: 00018000 R15: 88002a876ea0
FS:  7fcad4905720() GS:88000500() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f46eb38 CR3: 708b4000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process fsstress (pid: 6948, threadinfo 88003b072000, task 88004f324710)
Stack:
 88003b073d88 88004f3702a0  88002ec91c18
<0> 88006861cb90 880036783c18  88006fcaf000
<0> 88003b073dc8 811d8489 88002ec91c18 88006861cb90
Call Trace:
 [] btrfs_update_inode+0x17/0x44
 [] btrfs_link+0xff/0x17a
 [] vfs_link+0xda/0x144
 [] sys_linkat+0xc4/0x121
 [] ? fsnotify_modify+0x66/0x6e
 [] ? vfs_write+0xd2/0x10a
 [] sys_link+0x1e/0x20
 [] system_call_fastpath+0x16/0x1b
Code: f0 ff 40 60 eb 02 eb fe 4c 89 ff e8 7d 29 2b 00 31 f6 48 89 df e8 31 fb 
ff ff 31 c0 48 83 c4 18 5b 41 5c 41 5d 41 5e  
RIP  [] btrfs_delayed_update_inode+0xfd/0x101
 RSP 
---[ end trace 38596ade53c80d11 ]---

 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] btrfs: implement delayed inode items operation

2011-03-04 Thread Itaru Kitayama
Hi Miao,

With the V3 patch applied, I was able to run dbench 50 without a problem.
Thanks,

On Thu, 03 Mar 2011 14:15:44 +0800
Miao Xie  wrote:

> On Thu, 24 Feb 2011 23:02:55 +0900, Itaru Kitayama wrote:
> > I have applied the V2 patch on top of the next-rc branch of btrfs-unstable
> > and ran dbench 50. The run never finished and lots of stall messages 
> > recorded
> > in the log.
> > Looking at the stack trace, something wrong with the delayed item balancing?
> > Are you working on the similar problem for V3?
> 
> I have made the third version of this patch, could you test it for me to check
> whether this problem still happens or not?
> 
> Thanks
> Miao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] btrfs: implement delayed inode items operation

2011-02-24 Thread Itaru Kitayama
Hi Miao,

I have applied the V2 patch on top of the next-rc branch of btrfs-unstable
and ran dbench 50. The run never finished and lots of stall messages recorded
in the log.
Looking at the stack trace, something wrong with the delayed item balancing?
Are you working on the similar problem for V3? 

INFO: rcu_sched_state detected stall on CPU 1 (t=15000 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0 
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc 
ipv6 uinput snd_ens1371 gameport snd_rawmidi snd_ac97_codec ppdev ac97_bus 
snd_seq snd_seq_device snd_pcm vmw_balloon microcode snd_timer pcspkr 
parport_pc snd parport e1000 i2c_piix4 soundcore shpchp i2c_core snd_page_alloc 
mptspi mptscsih mptbase scsi_transport_spi floppy [last unloaded: speedstep_lib]

Pid: 3297, comm: dbench Not tainted 2.6.36-desktop+ #51 440BX Desktop Reference 
Platform/VMware Virtual Platform
RIP: 0010:[]  [] 
native_apic_mem_write+0x11/0x13
RSP: 0018:880004e03ea8  EFLAGS: 0006
RAX: 81a147c0 RBX: 880004e0d480 RCX: 0020
RDX: 00059a96 RSI: 05eb RDI: 0380
RBP: 880004e03ea8 R08: 880004e0d480 R09: 0001
R10: 880004e03f18 R11: 8800345cd208 R12: 
R13: 817a4bbe R14:  R15: 880004e03f48
FS:  7fd473373720() GS:880004e0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7fcd41ce3000 CR3: 3715f000 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process dbench (pid: 3297, threadinfo 880039314000, task 8800345cd1d0)
Stack:
 880004e03eb8 8101a8c8 880004e03ec8 8106a60d
<0> 880004e03f08 8106bb62 0055a1ca9b4c 0055a1d040c0
<0> 880004e103c0 0055a1ca9b4c  880004e10400
Call Trace:
  
 [] lapic_next_event+0x1d/0x21
 [] clockevents_program_event+0x7a/0x83
 [] tick_dev_program_event+0x3c/0xfc
 [] tick_program_event+0x2a/0x2c
 [] hrtimer_interrupt+0x11a/0x1c3
 [] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [] smp_apic_timer_interrupt+0x88/0x9b
 [] apic_timer_interrupt+0x13/0x20
  
 [] ? _raw_spin_lock+0x62/0x69
 [] ? lock_acquired+0x259/0x268
 [] ? btrfs_wq_run_delayed_node+0x3f/0x149
 [] _raw_spin_lock+0x62/0x69
 [] ? btrfs_wq_run_delayed_node+0x3f/0x149
 [] btrfs_wq_run_delayed_node+0x3f/0x149
 [] btrfs_balance_delayed_items+0xbe/0xc9
 [] ? mutex_unlock+0xe/0x10
 [] ? btrfs_release_delayed_node+0xdd/0xfb
 [] btrfs_delayed_update_inode+0xbd/0xcb
 [] btrfs_update_inode+0x17/0x44
 [] btrfs_dirty_inode+0x57/0x103
 [] __mark_inode_dirty+0x33/0x1aa
 [] touch_atime+0x107/0x12a
 [] generic_file_aio_read+0x567/0x5bc
 [] do_sync_read+0xcb/0x108
 [] ? fsnotify_perm+0x4a/0x50
 [] ? security_file_permission+0x2e/0x33
 [] vfs_read+0xab/0x107
 [] sys_pread64+0x5a/0x76
 [] system_call_fastpath+0x16/0x1b
Code: 0f b6 34 dd 01 f0 f9 81 89 c2 48 c7 c7 99 70 77 81 31 c0 e8 88 b4 47 00 
eb d5 55 48 89 e5 0f 1f 44 00 00 89 ff 89 b7 00 b0 5f ff  c3 55 48 89 e5 0f 
1f 44 00 00 89 ff 8b 87 00 b0 5f ff c9 c3 
Call Trace:
   [] lapic_next_event+0x1d/0x21
 [] clockevents_program_event+0x7a/0x83
 [] tick_dev_program_event+0x3c/0xfc
 [] tick_program_event+0x2a/0x2c
 [] hrtimer_interrupt+0x11a/0x1c3
 [] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [] smp_apic_timer_interrupt+0x88/0x9b
 [] apic_timer_interrupt+0x13/0x20
   [] ? _raw_spin_lock+0x62/0x69
 [] ? lock_acquired+0x259/0x268
 [] ? btrfs_wq_run_delayed_node+0x3f/0x149
 [] _raw_spin_lock+0x62/0x69
 [] ? btrfs_wq_run_delayed_node+0x3f/0x149
 [] btrfs_wq_run_delayed_node+0x3f/0x149
 [] btrfs_balance_delayed_items+0xbe/0xc9
 [] ? mutex_unlock+0xe/0x10
 [] ? btrfs_release_delayed_node+0xdd/0xfb
 [] btrfs_delayed_update_inode+0xbd/0xcb
 [] btrfs_update_inode+0x17/0x44
 [] btrfs_dirty_inode+0x57/0x103
 [] __mark_inode_dirty+0x33/0x1aa
 [] touch_atime+0x107/0x12a
 [] generic_file_aio_read+0x567/0x5bc
 [] do_sync_read+0xcb/0x108
 [] ? fsnotify_perm+0x4a/0x50
 [] ? security_file_permission+0x2e/0x33
 [] vfs_read+0xab/0x107
 [] sys_pread64+0x5a/0x76
 [] system_call_fastpath+0x16/0x1b
Pid: 3297, comm: dbench Not tainted 2.6.36-desktop+ #51
Call Trace:
   [] ? show_regs+0x2b/0x30
 [] arch_trigger_all_cpu_backtrace_handler+0x76/0x9a
 [] notifier_call_chain+0x68/0x9c
 [] __atomic_notifier_call_chain+0x5e/0x8b
 [] ? __atomic_notifier_call_chain+0x0/0x8b
 [] atomic_notifier_call_chain+0x14/0x16
 [] notify_die+0x2e/0x30
 [] do_nmi+0xaa/0x293
 [] nmi+0x1a/0x2c
 [] ? native_apic_mem_write+0x11/0x13
 <>[] lapic_next_event+0x1d/0x21
 [] clockevents_program_event+0x7a/0x83
 [] tick_dev_program_event+0x3c/0xfc
 [] tick_program_event+0x2a/0x2c
 [] hrtimer_interrupt+0x11a/0x1c3
 [] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [] smp_apic_timer_interrupt+0x88/0x9b
 [] apic_timer_interrupt+0x13/0x20
   [] ? _raw_spin_lock+0x62/0x69
 [] ? lock_acquired+0x25

Re: [GIT PULL] [RFC PATCH 0/4] btrfs: Implement delayed directory name index insertion and deletion

2011-01-09 Thread Itaru Kitayama
Hi Miao,

As you suggested, in btrfs_recover_log_trees(), the items to modify in the 
transaction are 
not known before entering a tree, we can use the global block reservation for 
it.

Signed-off-by: Itaru Kitayama 
---
 fs/btrfs/tree-log.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 054744a..7df8c7b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3081,6 +3081,8 @@ int btrfs_recover_log_trees(struct btrfs_root 
*log_root_tree)
 
trans = btrfs_start_transaction(fs_info->tree_root, 0);
 
+   trans->block_rsv = &fs_info->global_block_rsv;
+
wc.trans = trans;
wc.pin = 1;
 
-- 
1.7.3.4

On Thu, 06 Jan 2011 14:47:41 +0800
Miao Xie  wrote:

> Hi, Kitayama-san
> 
> Firstly, thanks for your test.
> 
> On Sat, 1 Jan 2011 00:43:41 +0900, Itaru Kitayama wrote:
> > Hi Miao,
> >
> > The HEAD of the perf-improve fails to boot on my virtual machine.
> >
> > The system calls btrfs_delete_delayed_dir_index() with trans block_rsv set 
> > to NULL,
> > thus selects, in get_block_rsv(), empty_block_rsv whose reserve is 0 (and 
> > size is also 0),
> > which leads to ENOSPC. I wonder below patch is enough reserve metadata to 
> > finish
> > btrfs_recover_log_trees() without going to ENOSPC. I appreciate your review.
> >
> > Singed-Off-by: Itaru Kitayama
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 054744a..f26326b 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -3079,7 +3079,7 @@ int btrfs_recover_log_trees(struct btrfs_root 
> > *log_root_tree)
> >  path = btrfs_alloc_path();
> >  BUG_ON(!path);
> >
> > -   trans = btrfs_start_transaction(fs_info->tree_root, 0);
> > +   trans = btrfs_start_transaction(fs_info->tree_root, 4);
> 
> I don't think this change is right, because we don't know how many leaves we 
> may change
> when doing log tree replay, so we can't set the secondargument to 4.
> 
> And I think the original code is right, because the space reservation is used 
> to avoid filesystem
> operations being broken by that other users hogging all of the free space. 
> but this function is
> invoked when we mount a filesystem, at this time, no other user can access 
> the filesystem,
> so we can use all of the free space, thus we needn't reserve any free space 
> for log tree replay.
> 
> I don't understand the log tree very well, maybe there is something wrong 
> with what I said.
> If what I said above is right, we should look for another way to fix this 
> problem.
> 
> (I'm making the second version of this patchset now, I'll fix it in it. So if 
> your patch is right,
> I'll want to add it into my patchset.)
> 
> Thanks again for your test.
> Miao
> 
> >
> >  wc.trans = trans;
> >  wc.pin = 1;
> >
> > Here's the log:
> >
> > kernel BUG at fs/btrfs/tree-log.c:678!
> > invalid opcode:  [#1] SMP
> > last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
> > CPU 1
> > Modules linked in: floppy mptspi mptscsih mptbase scsi_transport_spi [last 
> > unloaded: scsi_wait_scan]
> >
> > Pid: 308, comm: mount Not tainted 2.6.36-perf-improve+ #1 440BX Desktop 
> > Reference Platform/VMware Virtual Platform
> > RIP: 0010:[]  [] 
> > drop_one_dir_item+0xd6/0xfb
> > RSP: 0018:88007a5a5858  EFLAGS: 00010286
> > RAX: ffe4 RBX: 88007d2b7800 RCX: 880037e8b240
> > RDX:  RSI: eac3ae68 RDI: 0206
> > RBP: 88007a5a58c8 R08: 005e6760 R09: 88007a5a55e8
> > R10: 88007a5a55e0 R11: 88007a5a5648 R12: 880037e8b120
> > R13: 880037e98cc0 R14: 88007b371c90 R15: 0005
> > FS:  7f37b63c4800() GS:88000204() knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 7f37b55f0190 CR3: 7a4d9000 CR4: 06e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: 0ff0 DR7: 0400
> > Process mount (pid: 308, threadinfo 88007a5a4000, task 88007a5c9720)
> > Stack:
> >   0005 0d75 880037e98550 880037dcf000
> > <0>  0016e730 0001 0100 005e7b60
> > <0>  88007a46d000 880037e8b120 88007d2b7800 880037e98550
> > Call Trace:
> >   [] add_inode_ref+0x32a/0x403
> >   [] replay_one_buffer+0x188

Re: [GIT PULL] [RFC PATCH 0/4] btrfs: Implement delayed directory name index insertion and deletion

2010-12-31 Thread Itaru Kitayama
Hi Miao,

The HEAD of the perf-improve fails to boot on my virtual machine.

The system calls btrfs_delete_delayed_dir_index() with trans block_rsv set to 
NULL,
thus selects, in get_block_rsv(), empty_block_rsv whose reserve is 0 (and size 
is also 0), 
which leads to ENOSPC. I wonder below patch is enough reserve metadata to 
finish 
btrfs_recover_log_trees() without going to ENOSPC. I appreciate your review. 

Singed-Off-by: Itaru Kitayama 

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 054744a..f26326b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3079,7 +3079,7 @@ int btrfs_recover_log_trees(struct btrfs_root 
*log_root_tree)
path = btrfs_alloc_path();
BUG_ON(!path);
 
-   trans = btrfs_start_transaction(fs_info->tree_root, 0);
+   trans = btrfs_start_transaction(fs_info->tree_root, 4);
 
wc.trans = trans;
wc.pin = 1;

Here's the log:

kernel BUG at fs/btrfs/tree-log.c:678!
invalid opcode:  [#1] SMP 
last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
CPU 1 
Modules linked in: floppy mptspi mptscsih mptbase scsi_transport_spi [last 
unloaded: scsi_wait_scan]

Pid: 308, comm: mount Not tainted 2.6.36-perf-improve+ #1 440BX Desktop 
Reference Platform/VMware Virtual Platform
RIP: 0010:[]  [] drop_one_dir_item+0xd6/0xfb
RSP: 0018:88007a5a5858  EFLAGS: 00010286
RAX: ffe4 RBX: 88007d2b7800 RCX: 880037e8b240
RDX:  RSI: eac3ae68 RDI: 0206
RBP: 88007a5a58c8 R08: 005e6760 R09: 88007a5a55e8
R10: 88007a5a55e0 R11: 88007a5a5648 R12: 880037e8b120
R13: 880037e98cc0 R14: 88007b371c90 R15: 0005
FS:  7f37b63c4800() GS:88000204() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f37b55f0190 CR3: 7a4d9000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mount (pid: 308, threadinfo 88007a5a4000, task 88007a5c9720)
Stack:
 0005 0d75 880037e98550 880037dcf000
<0> 0016e730 0001 0100 005e7b60
<0> 88007a46d000 880037e8b120 88007d2b7800 880037e98550
Call Trace:
 [] add_inode_ref+0x32a/0x403
 [] replay_one_buffer+0x188/0x209
 [] ? verify_parent_transid+0x36/0xf9
 [] walk_up_log_tree+0x109/0x1d1
 [] ? replay_one_buffer+0x0/0x209
 [] walk_log_tree+0x9b/0x187
 [] btrfs_recover_log_trees+0x18a/0x2a2
 [] ? replay_one_buffer+0x0/0x209
 [] ? btree_read_extent_buffer_pages+0x71/0xaf
 [] open_ctree+0xf8f/0x12c6
 [] btrfs_get_sb+0x225/0x459
 [] ? __kmalloc_track_caller+0x13a/0x14c
 [] vfs_kern_mount+0xbd/0x1a7
 [] do_kern_mount+0x4d/0xed
 [] do_mount+0x754/0x7cb
 [] ? memdup_user+0x60/0x80
 [] ? strndup_user+0x3e/0x54
 [] sys_mount+0x88/0xc2
 [] system_call_fastpath+0x16/0x1b
Code: de e8 8f e5 ff ff 85 c0 74 04 0f 0b eb fe 48 8b 55 a0 48 8b 7d a8 45 89 
f9 4d 89 f0 4c 89 e9 48 89 de e8 65 bf fd ff 85 c0 74 04 <0f> 0b eb fe 4 
RIP  [] drop_one_dir_item+0xd6/0xfb
 RSP 
---[ end trace 2ec638d9e30d6102 ]---


On Wed, 01 Dec 2010 16:09:17 +0800
Miao Xie  wrote:

> Compare with Ext3/4, the performance of file creation and deletion on btrfs is
> very poor. the reason is that btrfs must do a lot of b+ tree insertions, such 
> as
> inode item, directory name item, directory name index and so on.
> 
> If we can do some delayed b+ tree insertion or deletion, we can improve the
> performance, so we made this patch which implemented delayed directory name
> index insertion and deletion.
> 
> Beside that, we found we must map the page every time we want to set a member
> variable of the inode item, it is inefficient. We just do it at first to 
> reduce
> the times of mmap(). By this way, we can also improve the performance of file
> creation and deletion.
> 
> I did a quick test by the benchmark tool[1] and found we can improve the
> performance of file creation by ~11%, and file deletion by ~14%.
> 
> Before applying this patch:
> Create files:
>   Total files: 5
>   Total time: 1.188547
>   Average time: 0.24
> Delete files:
>   Total files: 5
>   Total time: 1.662012
>   Average time: 0.33
> 
> After applying this patch:
> Create files:
>   Total files: 5
>   Total time: 1.057432
>   Average time: 0.21
> Delete files:
>   Total files: 5
>   Total time: 1.422851
>   Average time: 0.28
> 
> You can also try out the patchset by pulling:
>   git://repo.or.cz/linux-btrfs-devel.git perf-improve
> 
> [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
> 
> ---
>  fs/btrfs/Makefile|2 +-
>  fs/btrfs/btrfs_inode.h   |2 +

[PATCH] Btrfs: pick the correct metadata allocation size on small devices

2010-12-12 Thread Itaru Kitayama

Josef's fs_mark test

fs_mark -d /mnt/btrfs-test -D 512 -t 16 -n 4096 -F -S0

on a 2GB single metadata fs leaves about 400Mb of metadata almost unused.

This patch reduces metadata chunk allocations by considering the proper
metadata chunk size of 200MB in should_alloc_chunk(), not the default 256MB 
which is set in __btrfs_alloc_chunk().

Signed-off-by: Itaru Kitayama 

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8c26441..54ab490 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3241,6 +3241,8 @@ static int should_alloc_chunk(struct btrfs_root *root,
  struct btrfs_space_info *sinfo, u64 alloc_bytes)
 {
u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly;
+   u64 total_rw_bytes = root->fs_info->fs_devices->total_rw_bytes; 
+   u64 max_chunk_size;
u64 thresh;
 
if (sinfo->bytes_used + sinfo->bytes_reserved +
@@ -3251,8 +3253,9 @@ static int should_alloc_chunk(struct btrfs_root *root,
alloc_bytes < div_factor(num_bytes, 8))
return 0;
 
+   max_chunk_size = min(256 * 1024 * 1024, div_factor(total_rw_bytes, 1));
thresh = btrfs_super_total_bytes(&root->fs_info->super_copy);
-   thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 5));
+   thresh = max_t(u64, max_chunk_size, div_factor_fine(thresh, 5));
 
if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 3))
return 0;


-- 
Itaru Kitayama 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Log parent inode if it is newer than the last commit

2010-11-08 Thread Itaru Kitayama

I've been puzzled why I see a flood of ENOENT returned upon rename during 
Fedora 13 and 14's GDM session. 

The case is of course handled by your recent patch which is in upstream and 
btrfs-unstable, but I thought 
doing a log look-up always fail is inefficient so I checked the code path that 
set inode's logged_trans field.   

Itaru

On Mon, 08 Nov 2010 08:50:21 -0500
Chris Mason  wrote:

> Excerpts from Itaru Kitayama's message of 2010-11-06 07:03:05 -0400:
> > 
> > In the file sync path, file's parent inode is logged if it is newer than 
> > the last
> > commit. This patch checks also the last_trans field as well as generation.
> > 
> > As btrfs_log_inode updates the logged_trans field of parent dir's inode, 
> > tree-log
> > lookup operations are suppressed upon unlink.
> 
> Is there a specific test case you're working on for this?  The idea
> behind the logging code is that we use the backrefs in the inode to
> recreate the inode in the directory if we crash.
> 
> This doesn't work if the directory doesn't exist when we come back, so
> the code logs the directory if it is newer than the last commit.
> 
> As long as the directory exists in the tree, we should be able to safely
> continue replay without logging it again.
> 
> -chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Log parent inode if it is newer than the last commit

2010-11-06 Thread Itaru Kitayama

In the file sync path, file's parent inode is logged if it is newer than the 
last
commit. This patch checks also the last_trans field as well as generation.

As btrfs_log_inode updates the logged_trans field of parent dir's inode, 
tree-log
lookup operations are suppressed upon unlink.

The patch is for the latest btrfs-unstable tree.

Signed-off-by: Itaru Kitayama 

---
 fs/btrfs/tree-log.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index a29f193..db63ae4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3007,7 +3007,7 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle 
*trans,
if (root != BTRFS_I(inode)->root)
break;
 
-   if (BTRFS_I(inode)->generation >
+   if (max(BTRFS_I(inode)->generation, BTRFS_I(inode)->last_trans) 
>
root->fs_info->last_trans_committed) {
ret = btrfs_log_inode(trans, root, inode, inode_only);
if (ret)
-- 
1.7.2.3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/inode.c:2642!

2010-09-10 Thread Itaru Kitayama

Thank you, Zhu. Would you send the patch to Andrew or should I pass it on to 
him?

Itaru

On Thu, 9 Sep 2010 23:30:52 +0800
Zhu Yanhai  wrote:

> Hi,
> That's because btrfs_del_dir_entries_in_log() will return the real
> errno after Andi's
> commit, so btrfs_unlink_inode() has to check the return value more seriously.
> A patch for this has been sent out.
> 
> Regards,
> Zhu Yanhai
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


kernel BUG at fs/btrfs/inode.c:2642!

2010-09-09 Thread Itaru Kitayama
+++ b/fs/btrfs/extent-tree.c
@@ -3337,8 +3337,7 @@ struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct 
btrfs_root *root)
btrfs_init_block_rsv(block_rsv);
 
alloc_target = btrfs_get_alloc_profile(root, 0);
-   block_rsv->space_info = __find_space_info(fs_info,
- BTRFS_BLOCK_GROUP_METADATA);
+   block_rsv->space_info = __find_space_info(fs_info, alloc_target);
 
return block_rsv;
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d74e6af..609d67a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2823,6 +2823,8 @@ int extent_prepare_write(struct extent_io_tree *tree,
 NULL, 1,
 end_bio_extent_preparewrite, 0,
 0, 0);
+   if (ret && !err)
+   err = ret;
iocount++;
block_start = block_start + iosize;
} else {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c038644..1dc5b96 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1372,7 +1372,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned long 
offset,
 
if (map_length < length + size)
return 1;
-   return 0;
+   return ret;
 }
 
 /*
@@ -2672,8 +2672,8 @@ static int check_path_shared(struct btrfs_root *root,
 {
struct extent_buffer *eb;
int level;
-   int ret;
u64 refs = 1;
+   int uninitialized_var(ret);
 
for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
if (!path->nodes[level])
@@ -2686,7 +2686,7 @@ static int check_path_shared(struct btrfs_root *root,
if (refs > 1)
return 1;
}
-   return 0;
+   return ret; /* XXX callers? */
 }
 
 /*
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index b37d723..5f225fe 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3099,6 +3099,8 @@ static int add_tree_block(struct reloc_control *rc,
BUG_ON(item_size != sizeof(struct btrfs_extent_item_v0));
ret = get_ref_objectid_v0(rc, path, extent_key,
  &ref_owner, NULL);
+   if (ret < 0)
+   return ret;
BUG_ON(ref_owner >= BTRFS_MAX_LEVEL);
level = (int)ref_owner;
/* FIXME: get real generation */
@@ -4143,7 +4145,7 @@ int btrfs_reloc_clone_csums(struct inode *inode, u64 
file_pos, u64 len)
btrfs_add_ordered_sum(inode, ordered, sums);
}
btrfs_put_ordered_extent(ordered);
-   return 0;
+   return ret;
 }
 
 void btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index fb102a9..224fb5b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2273,7 +2273,7 @@ fail:
}
btrfs_end_log_trans(root);
 
-   return 0;
+   return err;
 }
 
 /* see comments for btrfs_del_dir_entries_in_log */

   

-- 
Itaru Kitayama 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html