Re: [PATCH] Btrfs: add initial tracepoint support for btrfs

2011-03-26 Thread Chris Mason
Excerpts from liubo's message of 2011-03-24 07:18:59 -0400:
 
> Tracepoints can provide insight into why btrfs hits bugs and be greatly
> helpful for debugging, e.g

This is really neat, I've queued it up.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Compress=lzo a good idea for Swapfiles on SSD?

2011-03-26 Thread Chris Mason
Excerpts from John McCabe-Dansted's message of 2011-03-25 23:47:02 -0400:
> I understand that modern SSDs can withstand a significant amount of
> writes, and so using an SSD for swap seems like a reasonable
> proposition. However, minimising writes still seems like a good idea.
> My experience with compcache/ramzswap suggests that swap compresses
> quite well; I tend to get a 4:1 compression ratio. Furthermore, I
> understand that we can work around the data corruption that usually
> occurs when using a swapfile on a btrfs partition, by using a loopback
> device. Given this, my question is:
>
> Does it sound like a good idea to use compress=lzo for swapfiles to
> reduce the amount of data written to the SSD, when using SSD drives
> that do not use compression internally?
 

I would tend to say no, only because compression requires extra memory
allocations before the blocks can actually be written.  So you're swapping
because you need to free RAM, but you have to allocate RAM in order to
swap.

There are projects for in-kernel swapfile compression that have good
results though, so I'd have to study it in more detail.
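
For scale, the write saving John is asking about can be estimated with a
short user-space sketch. It assumes 4 KiB pages (not stated in the thread)
and takes the 4:1 ratio from his compcache experience. The helper
swap_bytes_written() is hypothetical, not a kernel or btrfs function, and it
says nothing about the extra RAM the compression step itself needs, which is
the concern raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KiB is typical on x86 but is not stated in the thread. */
#define PAGE_BYTES 4096u

/*
 * Estimate the bytes written to the device when `pages` pages are swapped
 * out and compress at `ratio`:1 (ratio 1 means no compression).
 * Rounds up to whole bytes; ignores metadata and block-size rounding.
 */
uint64_t swap_bytes_written(uint64_t pages, uint32_t ratio)
{
	assert(ratio > 0);
	uint64_t raw = pages * PAGE_BYTES;
	return (raw + ratio - 1) / ratio;
}
```

Under these assumptions, swapping out 262144 pages (1 GiB) at a ratio of 4
writes 256 MiB instead of 1 GiB.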

-chris


Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-26 Thread Chris Mason
Excerpts from Miao Xie's message of 2011-03-24 07:41:31 -0400:
> Changelog V4 -> V5:
> - Fix the race on adding the delayed node to the inode, which was spotted by
>   Chris Mason.
> - Merge Chris Mason's incremental patch into this patch.
> - Fix the deadlock between readdir() and memory fault, which was reported by
>   Itaru Kitayama.

This does do much better than v4, but I'm still hitting OOM with
stress.sh -n 50.  I tried a bunch of variations but haven't been able to
get it quite right.  I think I need to hold off on this one and get the
2.6.39 pull request out.

I'll set up a .40 tree that has this in it and we can fix the OOMs.  This
is a great base for the work, and I'd like to add more items to the
delay, especially the initial inode insertions.

-chris


Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-26 Thread Itaru Kitayama
Hi Miao,

Chris' stress test, stress.sh -n 50 -c /mnt/linux-2.6 /mnt, gave me another
lockdep splat (see below). I applied your V5 patches on top of the next-rc
branch.

I haven't triggered it in my actual testing, but do you think we can iterate
over the list of block groups in a lockless manner using RCU?

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2164296..f40ff4e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -740,6 +740,7 @@ struct btrfs_space_info {
 	struct list_head block_groups[BTRFS_NR_RAID_TYPES];
 	spinlock_t lock;
 	struct rw_semaphore groups_sem;
+	struct srcu_struct groups_srcu;
 	atomic_t caching_threads;
 };
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9e4c9f4..22d6dbb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3003,6 +3003,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
 		INIT_LIST_HEAD(&found->block_groups[i]);
 	init_rwsem(&found->groups_sem);
+	init_srcu_struct(&found->groups_srcu);
 	spin_lock_init(&found->lock);
 	found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
 				BTRFS_BLOCK_GROUP_SYSTEM |
@@ -4853,6 +4854,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
 				     int data)
 {
 	int ret = 0;
+	int idx;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
 	struct btrfs_free_cluster *last_ptr = NULL;
 	struct btrfs_block_group_cache *block_group = NULL;
@@ -4929,7 +4931,7 @@ ideal_cache:
 		if (block_group && block_group_bits(block_group, data) &&
 		    (block_group->cached != BTRFS_CACHE_NO ||
 		     search_start == ideal_cache_offset)) {
-			down_read(&space_info->groups_sem);
+			idx = srcu_read_lock(&space_info->groups_srcu);
 			if (list_empty(&block_group->list) ||
 			    block_group->ro) {
 				/*
@@ -4939,7 +4941,7 @@ ideal_cache:
 				 * valid
 				 */
 				btrfs_put_block_group(block_group);
-				up_read(&space_info->groups_sem);
+				srcu_read_unlock(&space_info->groups_srcu, idx);
 			} else {
 				index = get_block_group_index(block_group);
 				goto have_block_group;
@@ -4949,8 +4951,8 @@ ideal_cache:
 			}
 		}
 search:
-	down_read(&space_info->groups_sem);
-	list_for_each_entry(block_group, &space_info->block_groups[index],
+	idx = srcu_read_lock(&space_info->groups_srcu);
+	list_for_each_entry_rcu(block_group, &space_info->block_groups[index],
 			    list) {
 		u64 offset;
 		int cached;
@@ -5197,8 +5199,8 @@ loop:
 		BUG_ON(index != get_block_group_index(block_group));
 		btrfs_put_block_group(block_group);
 	}
-	up_read(&space_info->groups_sem);
-
+	srcu_read_unlock(&space_info->groups_srcu, idx);
+	
 	if (!ins->objectid && ++index < BTRFS_NR_RAID_TYPES)
 		goto search;
 


=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.36-v5+ #2
---------------------------------------------------------
kswapd0/49 just changed the state of lock:
 (&delayed_node->mutex){+.+.-.}, at: [812131f7] btrfs_remove_delayed_node+0x3e/0xd2
but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
 (&found->groups_sem){.+}

and interrupts could create inverse lock ordering between them.


other info that might help us debug this:
2 locks held by kswapd0/49:
 #0:  (shrinker_rwsem){..}, at: [810e242a] shrink_slab+0x3d/0x164
 #1:  (iprune_sem){.-}, at: [811316d0] shrink_icache_memory+0x4d/0x213

the shortest dependencies between 2nd lock and 1st lock:
 -> (&found->groups_sem){.+} ops: 1334 {
    HARDIRQ-ON-W at:
      [81075ec0] __lock_acquire+0x346/0xda6
      [81076a3d] lock_acquire+0x11d/0x143
      [814c6a2a] down_write+0x55/0x9b
      [811c352a] __link_block_group+0x5a/0x83
      [811ca562] btrfs_read_block_groups+0x2fb/0x56c
      [811d4921] open_ctree+0xf78/0x14ab
      [811bafdf] btrfs_get_sb+0x236/0x467
      [8111f25e] vfs_kern_mount+0xbd/0x1a7
      [8111f3b0] do_kern_mount+0x4d/0xed
      [8113668d] do_mount+0x74e/0x7c5
      [8113678c] sys_mount+0x88/0xc2