Re: Abysmal Performance

2011-06-21 Thread Henning Rohlfs

On Mon, 20 Jun 2011 20:12:16 -0400, Josef Bacik wrote:

On 06/20/2011 05:51 PM, Henning Rohlfs wrote:

Hello,

I've migrated my system to btrfs (raid1) a few months ago. Since 
then

the performance has been pretty bad, but recently it's gotten
unbearable: a simple sync called while the system is idle can take 
20 up

to 60 seconds. Creating or deleting files often has several seconds
latency, too.

One curious - but maybe unrelated - observation is that even though 
I'm

using a raid1 btrfs setup, the hdds are often being written to
sequentially. One hard-drive sees some write activity and after it
subsides, the other drive sees some activity. (See attached
sequential-writes.txt.)



Can you do sysrq+w while this is happening so we can see who is doing
the writing?  Thanks,

Josef


When I call sync, it starts with several seconds of 100% (one core) cpu 
usage by sync itself. Afterwards btrfs-submit-0 and sync are blocked. 
sysrq+w output is attached.SysRq : Show Blocked State
  taskPC stack   pid father
btrfs-submit-0  D 88021c8bbf00 0   851  2 0x
 880245b07b20 0046 0048 880245b06010
 880246ce2b80 00011340 880245b07fd8 4000
 880245b07fd8 00011340 88021f2d95c0 880246ce2b80
Call Trace:
 [81333f7a] ? put_device+0x12/0x14
 [81343af6] ? scsi_request_fn+0x341/0x40d
 [812a4643] ? __blk_run_queue+0x16/0x18
 [814fdc2f] io_schedule+0x51/0x66
 [812a6bd9] get_request_wait+0xa1/0x12f
 [81050c85] ? wake_up_bit+0x25/0x25
 [812a3585] ? elv_merge+0x99/0xa9
 [812a6dee] __make_request+0x187/0x27f
 [812a540f] generic_make_request+0x229/0x2a4
 [812a5553] submit_bio+0xc9/0xe8
 [81267b24] run_scheduled_bios+0x296/0x415
 [81044692] ? del_timer+0x83/0x83
 [81267cb3] pending_bios_fn+0x10/0x12
 [8126d591] worker_loop+0x189/0x4b7
 [8126d408] ? btrfs_queue_worker+0x263/0x263
 [8126d408] ? btrfs_queue_worker+0x263/0x263
 [810508bc] kthread+0x7d/0x85
 [81500c94] kernel_thread_helper+0x4/0x10
 [8105083f] ? kthread_worker_fn+0x13a/0x13a
 [81500c90] ? gs_change+0xb/0xb
syncD 000102884001 0 22091  19401 0x
 88019f0edc98 0086 8801 88019f0ec010
 880227808000 00011340 88019f0edfd8 4000
 88019f0edfd8 00011340 8181f020 880227808000
Call Trace:
 [810b189b] ? add_partial+0x1b/0x64
 [810b36d6] ? kmem_cache_free+0x8e/0x93
 [812623ad] ? free_extent_state+0x43/0x47
 [81086c2b] ? __lock_page+0x68/0x68
 [814fdc2f] io_schedule+0x51/0x66
 [81086c34] sleep_on_page+0x9/0xd
 [814fe32c] __wait_on_bit+0x43/0x76
 [81086df3] wait_on_page_bit+0x6d/0x74
 [81050cb9] ? autoremove_wake_function+0x34/0x34
 [81086b0e] ? find_get_page+0x19/0x66
 [81247d60] btrfs_wait_marked_extents+0xeb/0x124
 [81247ee6] btrfs_write_and_wait_marked_extents+0x2a/0x3c
 [81247f3a] btrfs_write_and_wait_transaction+0x42/0x44
 [81248676] btrfs_commit_transaction+0x53c/0x650
 [81050c85] ? wake_up_bit+0x25/0x25
 [810db775] ? __sync_filesystem+0x75/0x75
 [8122a33a] btrfs_sync_fs+0x66/0x6b
 [810db761] __sync_filesystem+0x61/0x75
 [810db786] sync_one_sb+0x11/0x13
 [810bc89c] iterate_supers+0x67/0xbd
 [810db7c8] sys_sync+0x40/0x57
 [814fff7b] system_call_fastpath+0x16/0x1b
Sched Debug Version: v0.10, 2.6.39 #3
ktime   : 141912370.788052
sched_clk   : 141831001.710073
cpu_clk : 141912370.788875
jiffies : 4337451011
sched_clock_stable  : 0

sysctl_sched
  .sysctl_sched_latency: 12.00
  .sysctl_sched_min_granularity: 1.50
  .sysctl_sched_wakeup_granularity : 2.00
  .sysctl_sched_child_runs_first   : 0
  .sysctl_sched_features   : 7279
  .sysctl_sched_tunable_scaling: 1 (logaritmic)

cpu#0, 2009.138 MHz
  .nr_running: 0
  .load  : 0
  .nr_switches   : 145331671
  .nr_load_updates   : 14210439
  .nr_uninterruptible: 0
  .next_balance  : 4337.451012
  .curr-pid : 0
  .clock : 141912370.762325
  .cpu_load[0]   : 0
  .cpu_load[1]   : 0
  .cpu_load[2]   : 14
  .cpu_load[3]   : 87
  .cpu_load[4]   : 139
  .yld_count : 20406719
  .sched_switch  : 0
  .sched_count   : 167876926
  .sched_goidle  : 51210495
  .avg_idle

Re: Abysmal Performance

2011-06-21 Thread Sander
Henning Rohlfs wrote (ao):
 - space_cache was enabled, but it seemed to make the problem worse.
 It's no longer in the mount options.

space_cache is a one time mount option which enabled space_cache. Not
supplying it anymore as a mount option has no effect (dmesg | grep btrfs).

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL v3] Btrfs: improve write ahead log with sub transaction

2011-06-21 Thread Liu Bo
I've been working to try to improve the write-ahead log's performance,
and I found that the bottleneck addresses in the checksum items,
especially when we want to make a random write on a large file, e.g a 4G file.

Then a idea for this suggested by Chris is to use sub transaction ids and just
to log the part of inode that had changed since either the last log commit or
the last transaction commit.  And as we also push the sub transid into the btree
blocks, we'll get much faster tree walks.  As a result, we abandon the original
brute force approach, which is to delete all items of the inode in log,
to making sure we get the most uptodate copies of everything, and instead
we manage to find and merge, i.e. finding extents in the log tree and merging
in the new extents from the file.

This patchset puts the above idea into code, and although the code is now more
complex, it brings us a great deal of performance improvement:

in my sysbench write + fsync test:

451.01Kb/sec - 4.3621Mb/sec

In v2, thanks to Chris, we worked together to solve 2 bugs, and after that it
works as expected.

Since there are some vital changes in recent rc, like kill trans_mutex and
use cur_trans, as David asked, I rebase the patchset to the latest for-linus
branch.

More tests are welcome!

You can also get this patchset from:

git://repo.or.cz/linux-btrfs-devel.git sub-trans

Liu Bo (12):
  Btrfs: introduce sub transaction stuff
  Btrfs: update block generation if should_cow_block fails
  Btrfs: modify btrfs_drop_extents API
  Btrfs: introduce first sub trans
  Btrfs: still update inode trans stuff when size remains unchanged
  Btrfs: improve log with sub transaction
  Btrfs: add checksum check for log
  Btrfs: fix a bug of log check
  Btrfs: kick off useless code
  Btrfs: deal with EEXIST after iput
  Btrfs: use the right generation number to read log_root_tree
  Revert Btrfs: do not flush csum items of unchanged file data during
treelog

 fs/btrfs/btrfs_inode.h |   12 ++-
 fs/btrfs/ctree.c   |   69 +---
 fs/btrfs/ctree.h   |5 +-
 fs/btrfs/disk-io.c |   12 +-
 fs/btrfs/extent-tree.c |   10 +-
 fs/btrfs/file.c|   22 ++---
 fs/btrfs/inode.c   |   33 ---
 fs/btrfs/ioctl.c   |6 +-
 fs/btrfs/relocation.c  |6 +-
 fs/btrfs/transaction.c |   14 ++-
 fs/btrfs/transaction.h |   19 +++-
 fs/btrfs/tree-defrag.c |2 +-
 fs/btrfs/tree-log.c|  272 ++-
 13 files changed, 331 insertions(+), 151 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/12 v3] Btrfs: update block generation if should_cow_block fails

2011-06-21 Thread Liu Bo
Cause we've added sub transaction, if it do not want to cow a block, we also
need to get new sub transid recorded.  This is used for log code to find the
most uptodate file extents.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.c |   34 +-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index f35b517..cfddb05 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -477,6 +477,33 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
return 0;
 }
 
+static inline void update_block_generation(struct btrfs_trans_handle *trans,
+  struct btrfs_root *root,
+  struct extent_buffer *buf,
+  struct extent_buffer *parent,
+  int slot)
+{
+   /*
+* If it does not need to cow this block, we still need to
+* update the block's generation, for transid may have been
+* changed during fsync.
+   */
+   if (btrfs_header_generation(buf) == trans-transid)
+   return;
+
+   if (buf == root-node) {
+   btrfs_set_header_generation(buf, trans-transid);
+   btrfs_mark_buffer_dirty(buf);
+   add_root_to_dirty_list(root);
+   } else {
+   btrfs_set_node_ptr_generation(parent, slot,
+ trans-transid);
+   btrfs_set_header_generation(buf, trans-transid);
+   btrfs_mark_buffer_dirty(parent);
+   btrfs_mark_buffer_dirty(buf);
+   }
+}
+
 static inline int should_cow_block(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   struct extent_buffer *buf)
@@ -517,6 +544,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
*trans,
}
 
if (!should_cow_block(trans, root, buf)) {
+   update_block_generation(trans, root, buf, parent, parent_slot);
*cow_ret = buf;
return 0;
}
@@ -1655,8 +1683,12 @@ again:
 * then we don't want to set the path blocking,
 * so we test it here
 */
-   if (!should_cow_block(trans, root, b))
+   if (!should_cow_block(trans, root, b)) {
+   update_block_generation(trans, root, b,
+   p-nodes[level + 1],
+   p-slots[level + 1]);
goto cow_done;
+   }
 
btrfs_set_path_blocking(p);
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/12 v3] Btrfs: introduce first sub trans

2011-06-21 Thread Liu Bo
In multi-thread situations, writeback of a file may span across several
sub transactions, and we need to introduce first_sub_trans to get sub_transid of
teh first sub transaction recorded, so that log code can skip file extents which
have been logged or committed onto disk.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/btrfs_inode.h |9 +
 fs/btrfs/inode.c   |   13 -
 fs/btrfs/transaction.h |   17 -
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 52d7eca..8eca5de 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -80,6 +80,15 @@ struct btrfs_inode {
/* sequence number for NFS changes */
u64 sequence;
 
+   /* used to avoid race of first_sub_trans */
+   spinlock_t sub_trans_lock;
+
+   /*
+* sub transid of the trans that first modified this inode before
+* a trans commit or a log sync
+*/
+   u64 first_sub_trans;
+
/*
 * transid of the trans_handle that last modified this inode
 */
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ff9d4d1..93c9d22 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6468,7 +6468,16 @@ again:
set_page_dirty(page);
SetPageUptodate(page);
 
-   BTRFS_I(inode)-last_trans = root-fs_info-generation;
+   spin_lock(BTRFS_I(inode)-sub_trans_lock);
+
+   if (BTRFS_I(inode)-first_sub_trans  root-fs_info-sub_generation ||
+   BTRFS_I(inode)-last_trans = BTRFS_I(inode)-logged_trans ||
+   BTRFS_I(inode)-last_trans = root-fs_info-last_trans_committed)
+   BTRFS_I(inode)-first_sub_trans = root-fs_info-sub_generation;
+
+   spin_unlock(BTRFS_I(inode)-sub_trans_lock);
+
+   BTRFS_I(inode)-last_trans = root-fs_info-sub_generation;
BTRFS_I(inode)-last_sub_trans = BTRFS_I(inode)-root-log_transid;
 
unlock_extent_cached(io_tree, page_start, page_end, cached_state, 
GFP_NOFS);
@@ -6714,6 +6723,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei-space_info = NULL;
ei-generation = 0;
ei-sequence = 0;
+   ei-first_sub_trans = 0;
ei-last_trans = 0;
ei-last_sub_trans = 0;
ei-logged_trans = 0;
@@ -6740,6 +6750,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
extent_io_tree_init(ei-io_tree, inode-i_data);
extent_io_tree_init(ei-io_failure_tree, inode-i_data);
mutex_init(ei-log_mutex);
+   spin_lock_init(ei-sub_trans_lock);
btrfs_ordered_inode_tree_init(ei-ordered_tree);
INIT_LIST_HEAD(ei-i_orphan);
INIT_LIST_HEAD(ei-delalloc_inodes);
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 45876b0..f5ca0fd 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -73,7 +73,22 @@ struct btrfs_pending_snapshot {
 static inline void btrfs_set_inode_last_trans(struct btrfs_trans_handle *trans,
  struct inode *inode)
 {
-   BTRFS_I(inode)-last_trans = trans-transaction-transid;
+   spin_lock(BTRFS_I(inode)-sub_trans_lock);
+
+   /*
+* We have joined in a transaction, so btrfs_commit_transaction will
+* definitely wait for us and it does not need to add a extra
+* trans_mutex lock here.
+*/
+   if (BTRFS_I(inode)-first_sub_trans  trans-transid ||
+   BTRFS_I(inode)-last_trans = BTRFS_I(inode)-logged_trans ||
+   BTRFS_I(inode)-last_trans =
+BTRFS_I(inode)-root-fs_info-last_trans_committed)
+   BTRFS_I(inode)-first_sub_trans = trans-transid;
+
+   spin_unlock(BTRFS_I(inode)-sub_trans_lock);
+
+   BTRFS_I(inode)-last_trans = trans-transid;
BTRFS_I(inode)-last_sub_trans = BTRFS_I(inode)-root-log_transid;
 }
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/12 v3] Btrfs: modify btrfs_drop_extents API

2011-06-21 Thread Liu Bo
We want to use btrfs_drop_extent() in log code.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h|3 ++-
 fs/btrfs/file.c |9 +++--
 fs/btrfs/inode.c|6 +++---
 fs/btrfs/ioctl.c|4 ++--
 fs/btrfs/tree-log.c |2 +-
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 54fcc62..073fb37 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2604,7 +2604,8 @@ int btrfs_drop_extent_cache(struct inode *inode, u64 
start, u64 end,
int skip_pinned);
 extern const struct file_operations btrfs_file_operations;
 int btrfs_drop_extents(struct btrfs_trans_handle *trans, struct inode *inode,
-  u64 start, u64 end, u64 *hint_byte, int drop_cache);
+  u64 start, u64 end, u64 *hint_byte, int drop_cache,
+  int log);
 int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
  struct inode *inode, u64 start, u64 end);
 int btrfs_release_file(struct inode *inode, struct file *file);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index fa4ef18..3c434bb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -547,7 +547,8 @@ int btrfs_drop_extent_cache(struct inode *inode, u64 start, 
u64 end,
  * is deleted from the tree.
  */
 int btrfs_drop_extents(struct btrfs_trans_handle *trans, struct inode *inode,
-  u64 start, u64 end, u64 *hint_byte, int drop_cache)
+  u64 start, u64 end, u64 *hint_byte, int drop_cache,
+  int log)
 {
struct btrfs_root *root = BTRFS_I(inode)-root;
struct extent_buffer *leaf;
@@ -567,6 +568,10 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, 
struct inode *inode,
int recow;
int ret;
 
+   /* drop the existed extents in log tree */
+   if (log)
+   root = root-log_root;
+
if (drop_cache)
btrfs_drop_extent_cache(inode, start, end - 1, 0);
 
@@ -747,7 +752,7 @@ next_slot:
extent_end - key.offset);
extent_end = ALIGN(extent_end,
   root-sectorsize);
-   } else if (disk_bytenr  0) {
+   } else if (disk_bytenr  0  !log) {
ret = btrfs_free_extent(trans, root,
disk_bytenr, num_bytes, 0,
root-root_key.objectid,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6291445..ff9d4d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -246,7 +246,7 @@ static noinline int cow_file_range_inline(struct 
btrfs_trans_handle *trans,
}
 
ret = btrfs_drop_extents(trans, inode, start, aligned_end,
-hint_byte, 1);
+hint_byte, 1, 0);
BUG_ON(ret);
 
if (isize  actual_end)
@@ -1658,7 +1658,7 @@ static int insert_reserved_file_extent(struct 
btrfs_trans_handle *trans,
 * with the others.
 */
ret = btrfs_drop_extents(trans, inode, file_pos, file_pos + num_bytes,
-hint, 0);
+hint, 0, 0);
BUG_ON(ret);
 
ins.objectid = btrfs_ino(inode);
@@ -3517,7 +3517,7 @@ int btrfs_cont_expand(struct inode *inode, loff_t 
oldsize, loff_t size)
 
err = btrfs_drop_extents(trans, inode, cur_offset,
 cur_offset + hole_size,
-hint_byte, 1);
+hint_byte, 1, 0);
if (err)
break;
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 29026a7..72e1a98 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2339,7 +2339,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, 
unsigned long srcfd,
ret = btrfs_drop_extents(trans, inode,
 new_key.offset,
 new_key.offset + datal,
-hint_byte, 1);
+hint_byte, 1, 0);
BUG_ON(ret);
 
ret = btrfs_insert_empty_item(trans, root, path,
@@ -2394,7 +2394,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, 
unsigned long srcfd,
ret = btrfs_drop_extents(trans, inode,
 new_key.offset,
 new_key.offset + datal,
-

[PATCH 07/12 v3] Btrfs: add checksum check for log

2011-06-21 Thread Liu Bo
If a inode is a BTRFS_INODE_NODATASUM one, it need not to look for csum items
any more.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/tree-log.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6193e27..1eca1d4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2672,7 +2672,8 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
   struct inode *inode,
   struct btrfs_path *dst_path,
   struct extent_buffer *src,
-  int start_slot, int nr, int inode_only)
+  int start_slot, int nr, int inode_only,
+  int csum)
 {
unsigned long src_offset;
unsigned long dst_offset;
@@ -2739,7 +2740,8 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
 * or deletes of this inode don't have to relog the inode
 * again
 */
-   if (btrfs_key_type(ins_keys + i) == BTRFS_EXTENT_DATA_KEY) {
+   if (btrfs_key_type(ins_keys + i) ==
+   BTRFS_EXTENT_DATA_KEY  csum) {
int found_type;
extent = btrfs_item_ptr(src, start_slot + i,
struct btrfs_file_extent_item);
@@ -2857,6 +2859,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle 
*trans,
int ins_nr;
u64 ino = btrfs_ino(inode);
u64 transid;
+   int csum = (BTRFS_I(inode)-flags  BTRFS_INODE_NODATASUM) ? 0 : 1;
 
/*
* We use transid in btrfs_search_forward() as a filter, in order to
@@ -2934,7 +2937,7 @@ filter:
if (ins_nr) {
ret = copy_items(trans, inode, dst_path, src,
 ins_start_slot,
-ins_nr, inode_only);
+ins_nr, inode_only, csum);
if (ret) {
err = ret;
goto out_unlock;
@@ -2953,7 +2956,7 @@ next_slot:
if (ins_nr) {
ret = copy_items(trans, inode, dst_path, src,
 ins_start_slot,
-ins_nr, inode_only);
+ins_nr, inode_only, csum);
if (ret) {
err = ret;
goto out_unlock;
@@ -2974,7 +2977,7 @@ next_slot:
if (ins_nr) {
ret = copy_items(trans, inode, dst_path, src,
 ins_start_slot,
-ins_nr, inode_only);
+ins_nr, inode_only, csum);
if (ret) {
err = ret;
goto out_unlock;
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/12 v3] Btrfs: introduce sub transaction stuff

2011-06-21 Thread Liu Bo
Introduce a new concept sub transaction,
the relation between transaction and sub transaction is

transaction A   --- transid = x
   sub trans a(1)   --- sub_transid = x+1
   sub trans a(2)   --- sub_transid = x+2
 ... ...
   sub trans a(n-1) --- sub_transid = x+n-1
   sub trans a(n)   --- sub_transid = x+n
transaction B   --- transid = x+n+1
 ... ...

And the most important is
a) a trans handler's transid now gets value from sub transid instead of transid.
b) when a transaction commits, transid may not added by 1, but depend on the
   biggest sub_transaction of the last neighbour transaction,
   i.e.
B-transid = a(n)-transid + 1,
(B-transid - A-transid) = 1
c) we start a new sub transaction after a fsync.

We also ship some 'trans-transid' to 'trans-transaction-transid' to
ensure btrfs works well and to get rid of WARNings.

These are used for the new log code.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.c   |   35 ++-
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |7 ---
 fs/btrfs/extent-tree.c |   10 ++
 fs/btrfs/inode.c   |4 ++--
 fs/btrfs/ioctl.c   |2 +-
 fs/btrfs/relocation.c  |6 +++---
 fs/btrfs/transaction.c |   14 +-
 fs/btrfs/transaction.h |1 +
 fs/btrfs/tree-defrag.c |2 +-
 fs/btrfs/tree-log.c|   16 ++--
 11 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2e66786..f35b517 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -194,9 +194,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
int level;
struct btrfs_disk_key disk_key;
 
-   WARN_ON(root-ref_cows  trans-transid !=
+   WARN_ON(root-ref_cows  trans-transaction-transid !=
root-fs_info-running_transaction-transid);
-   WARN_ON(root-ref_cows  trans-transid != root-last_trans);
+   WARN_ON(root-ref_cows  trans-transid  root-last_trans);
 
level = btrfs_header_level(buf);
if (level == 0)
@@ -391,9 +391,9 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
 
btrfs_assert_tree_locked(buf);
 
-   WARN_ON(root-ref_cows  trans-transid !=
+   WARN_ON(root-ref_cows  trans-transaction-transid !=
root-fs_info-running_transaction-transid);
-   WARN_ON(root-ref_cows  trans-transid != root-last_trans);
+   WARN_ON(root-ref_cows  trans-transid  root-last_trans);
 
level = btrfs_header_level(buf);
 
@@ -459,7 +459,8 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
else
parent_start = 0;
 
-   WARN_ON(trans-transid != btrfs_header_generation(parent));
+   WARN_ON(btrfs_header_generation(parent) 
+   trans-transaction-transid);
btrfs_set_node_blockptr(parent, parent_slot,
cow-start);
btrfs_set_node_ptr_generation(parent, parent_slot,
@@ -480,7 +481,7 @@ static inline int should_cow_block(struct 
btrfs_trans_handle *trans,
   struct btrfs_root *root,
   struct extent_buffer *buf)
 {
-   if (btrfs_header_generation(buf) == trans-transid 
+   if (btrfs_header_generation(buf) = trans-transaction-transid 
!btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN) 
!(root-root_key.objectid != BTRFS_TREE_RELOC_OBJECTID 
  btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
@@ -508,7 +509,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
*trans,
   root-fs_info-running_transaction-transid);
WARN_ON(1);
}
-   if (trans-transid != root-fs_info-generation) {
+   if (trans-transaction-transid != root-fs_info-generation) {
printk(KERN_CRIT trans %llu running %llu\n,
   (unsigned long long)trans-transid,
   (unsigned long long)root-fs_info-generation);
@@ -611,7 +612,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 
if (trans-transaction != root-fs_info-running_transaction)
WARN_ON(1);
-   if (trans-transid != root-fs_info-generation)
+   if (trans-transaction-transid != root-fs_info-generation)
WARN_ON(1);
 
parent_nritems = btrfs_header_nritems(parent);
@@ -891,7 +892,7 @@ static noinline int balance_level(struct btrfs_trans_handle 
*trans,
mid = path-nodes[level];
 
WARN_ON(!path-locks[level]);
-   WARN_ON(btrfs_header_generation(mid) != trans-transid);
+   WARN_ON(btrfs_header_generation(mid)  trans-transaction-transid);
 
orig_ptr = btrfs_node_blockptr(mid, orig_slot);
 
@@ -1098,7 +1099,7 @@ static noinline int push_nodes_for_insert(struct 
btrfs_trans_handle *trans,
 

[PATCH 05/12 v3] Btrfs: still update inode trans stuff when size remains unchanged

2011-06-21 Thread Liu Bo
Due to DIO stuff, commit 1ef30be142d2cc60e2687ef267de864cf31be995 makes btrfs
not call btrfs_update_inode when it does not update i_disk_size, but in buffer
write case, we need to update btrfs internal inode's trans stuff, so that the
log code can find the inode's changes.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/inode.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 93c9d22..47dcbc0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1790,7 +1790,8 @@ static int btrfs_finish_ordered_io(struct inode *inode, 
u64 start, u64 end)
if (!ret) {
ret = btrfs_update_inode(trans, root, inode);
BUG_ON(ret);
-   }
+   } else
+   btrfs_set_inode_last_trans(trans, inode);
ret = 0;
 out:
if (nolock) {
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/12 v3] Btrfs: deal with EEXIST after iput

2011-06-21 Thread Liu Bo
There are two cases when BTRFS_I(inode)-logged_trans is zero:
a) an inode is just allocated;
b) iput an inode and reread it.

However, in b) if btrfs is not committed yet, and this inode _may_ still remain
in log tree.

So we need to check the log tree to get logged_trans a right value
in case it hits a EEXIST while logging.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/inode.c|9 +++--
 fs/btrfs/tree-log.c |   43 +++
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0007ae3..6eff806 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1786,12 +1786,9 @@ static int btrfs_finish_ordered_io(struct inode *inode, 
u64 start, u64 end)
add_pending_csums(trans, inode, ordered_extent-file_offset,
  ordered_extent-list);
 
-   ret = btrfs_ordered_update_i_size(inode, 0, ordered_extent);
-   if (!ret) {
-   ret = btrfs_update_inode(trans, root, inode);
-   BUG_ON(ret);
-   } else
-   btrfs_set_inode_last_trans(trans, inode);
+   btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+   ret = btrfs_update_inode(trans, root, inode);
+   BUG_ON(ret);
ret = 0;
 out:
if (nolock) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f1e95bf..3cf7ed2 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3063,6 +3063,37 @@ out:
return ret;
 }
 
+static int check_logged_trans(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root, struct inode *inode)
+{
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_path *path;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   ret = btrfs_search_slot(trans, root,
+   BTRFS_I(inode)-location, path, 0, 0);
+   if (ret) {
+   if (ret  0)
+   ret = 0;
+   goto out;
+   }
+
+   btrfs_unlock_up_safe(path, 1);
+   inode_item = btrfs_item_ptr(path-nodes[0], path-slots[0],
+   struct btrfs_inode_item);
+
+   BTRFS_I(inode)-logged_trans = btrfs_inode_transid(path-nodes[0],
+  inode_item);
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
+
 static int inode_in_log(struct btrfs_trans_handle *trans,
 struct inode *inode)
 {
@@ -3115,6 +3146,18 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle 
*trans,
if (ret)
goto end_no_trans;
 
+   /*
+* After we iput a inode and reread it from disk, logged_trans is 0.
+* However, this inode _may_ still remain in log tree and not be
+* committed yet.
+* So we need to check the log tree to get logged_trans a right value.
+*/
+   if (!BTRFS_I(inode)-logged_trans  root-log_root) {
+   ret = check_logged_trans(trans, root-log_root, inode);
+   if (ret)
+   goto end_no_trans;
+   }
+
if (inode_in_log(trans, inode)) {
ret = BTRFS_NO_LOG_SYNC;
goto end_no_trans;
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/12 v3] Btrfs: fix a bug of log check

2011-06-21 Thread Liu Bo
The current code uses struct root's last_log_commit to check if an inode
has been logged, but the problem is that this root-last_log_commit is
shared among files.  Say we have N inodes to be logged, after the first
inode, root-last_log_commit is updated and the N-1 remains will not be
logged.

As we've introduce sub transaction and filled inode's last_trans and
logged_trans with sub_transid instead of transaction id, we can just
compare last_trans with logged_trans to determine if the processing inode
is logged.  And the more important thing is these two values are
inode-individual, so it will not interfere with others.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/btrfs_inode.h |5 -
 fs/btrfs/ctree.h   |1 -
 fs/btrfs/disk-io.c |2 --
 fs/btrfs/inode.c   |2 --
 fs/btrfs/transaction.h |1 -
 fs/btrfs/tree-log.c|   16 +++-
 6 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 8eca5de..a85161e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -95,11 +95,6 @@ struct btrfs_inode {
u64 last_trans;
 
/*
-* log transid when this inode was last modified
-*/
-   u64 last_sub_trans;
-
-   /*
 * transid that last logged this inode
 */
u64 logged_trans;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 073fb37..da3c769 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1157,7 +1157,6 @@ struct btrfs_root {
atomic_t log_writers;
atomic_t log_commit[2];
unsigned long log_transid;
-   unsigned long last_log_commit;
unsigned long log_batch;
pid_t log_start_pid;
bool log_multiple_pids;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b7e80c3..d6d71f2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1065,7 +1065,6 @@ static int __setup_root(u32 nodesize, u32 leafsize, u32 
sectorsize,
atomic_set(root-log_writers, 0);
root-log_batch = 0;
root-log_transid = 0;
-   root-last_log_commit = 0;
extent_io_tree_init(root-dirty_log_pages,
 fs_info-btree_inode-i_mapping);
 
@@ -1202,7 +1201,6 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
WARN_ON(root-log_root);
root-log_root = log_root;
root-log_transid = 0;
-   root-last_log_commit = 0;
return 0;
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 47dcbc0..0007ae3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6479,7 +6479,6 @@ again:
spin_unlock(BTRFS_I(inode)-sub_trans_lock);
 
BTRFS_I(inode)-last_trans = root-fs_info-sub_generation;
-   BTRFS_I(inode)-last_sub_trans = BTRFS_I(inode)-root-log_transid;
 
unlock_extent_cached(io_tree, page_start, page_end, cached_state, 
GFP_NOFS);
 
@@ -6726,7 +6725,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei-sequence = 0;
ei-first_sub_trans = 0;
ei-last_trans = 0;
-   ei-last_sub_trans = 0;
ei-logged_trans = 0;
ei-delalloc_bytes = 0;
ei-reserved_bytes = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index f5ca0fd..fd2474a 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -89,7 +89,6 @@ static inline void btrfs_set_inode_last_trans(struct 
btrfs_trans_handle *trans,
spin_unlock(BTRFS_I(inode)-sub_trans_lock);
 
BTRFS_I(inode)-last_trans = trans-transid;
-   BTRFS_I(inode)-last_sub_trans = BTRFS_I(inode)-root-log_transid;
 }
 
 int btrfs_end_transaction(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1eca1d4..f1e95bf 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1983,7 +1983,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
int ret;
struct btrfs_root *log = root-log_root;
struct btrfs_root *log_root_tree = root-fs_info-log_root_tree;
-   unsigned long log_transid = 0;
 
mutex_lock(root-log_mutex);
index1 = root-log_transid % 2;
@@ -2018,8 +2017,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
goto out;
}
 
-   log_transid = root-log_transid;
-   if (log_transid % 2 == 0)
+   if (root-log_transid % 2 == 0)
mark = EXTENT_DIRTY;
else
mark = EXTENT_NEW;
@@ -2126,11 +2124,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
btrfs_scrub_continue_super(root);
ret = 0;
 
-   mutex_lock(root-log_mutex);
-   if (root-last_log_commit  log_transid)
-   root-last_log_commit = log_transid;
-   mutex_unlock(root-log_mutex);
-
 out_wake_log_root:
atomic_set(log_root_tree-log_commit[index2], 0);
smp_mb();
@@ -3073,14 +3066,11 @@ out:
 static int inode_in_log(struct btrfs_trans_handle *trans,
 struct inode *inode)
 {
-

[PATCH 09/12 v3] Btrfs: kick off useless code

2011-06-21 Thread Liu Bo
fsync will wait for writeback till it finishes, and last_trans will get the real
transid recorded in writeback, so it does not need an extra +1 to ensure fsync's
process on the file.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/file.c |   13 -
 1 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3c434bb..8f27007 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1404,19 +1404,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 
mutex_unlock(inode-i_mutex);
 
-   /*
-* we want to make sure fsync finds this change
-* but we haven't joined a transaction running right now.
-*
-* Later on, someone is sure to update the inode and get the
-* real transid recorded.
-*
-* We set last_trans now to the fs_info generation + 1,
-* this will either be one more than the running transaction
-* or the generation used for the next transaction if there isn't
-* one running right now.
-*/
-   BTRFS_I(inode)-last_trans = root-fs_info-generation + 1;
if (num_written  0 || num_written == -EIOCBQUEUED) {
err = generic_write_sync(file, pos, num_written);
if (err  0  num_written  0)
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/12 v3] Revert Btrfs: do not flush csum items of unchanged file data during treelog

2011-06-21 Thread Liu Bo
This reverts commit 8e531cdfeb75269c6c5aae33651cca39707848da.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/tree-log.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 166fda5..d4ccf91 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2741,9 +2741,6 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
extent = btrfs_item_ptr(src, start_slot + i,
struct btrfs_file_extent_item);
 
-   if (btrfs_file_extent_generation(src, extent)  
trans-transid)
-   continue;
-
found_type = btrfs_file_extent_type(src, extent);
if (found_type == BTRFS_FILE_EXTENT_REG ||
found_type == BTRFS_FILE_EXTENT_PREALLOC) {
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/12 v3] Btrfs: improve log with sub transaction

2011-06-21 Thread Liu Bo
When logging an inode _A_, current btrfs will
a) clear all items belonged to _A_ in log,
b) copy all items belonged to _A_ from fs/file tree to log tree,
and this just wastes a lot of time, especially when logging big files.

So we want to use a smarter approach, i.e. find and merge.
The amount of file extent items is the largest, so we focus on it.
Thanks to sub transaction, now we can find those file extent items which
are changed after last _transaction commit_ or last _log commit_, and
then merge them with the existed ones in log tree.

It will be great helpful on fsync performance, cause the common case is
make changes on a _part_ of inode.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/tree-log.c |  179 ---
 1 files changed, 127 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index b91fe8b..6193e27 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2581,61 +2581,106 @@ again:
 }
 
 /*
- * a helper function to drop items from the log before we relog an
- * inode.  max_key_type indicates the highest item type to remove.
- * This cannot be run for file data extents because it does not
- * free the extents they point to.
+ * a helper function to drop items from the log before we merge
+ * the uptodate items into the log tree.
  */
-static int drop_objectid_items(struct btrfs_trans_handle *trans,
- struct btrfs_root *log,
- struct btrfs_path *path,
- u64 objectid, int max_key_type)
+static  int prepare_for_merge_items(struct btrfs_trans_handle *trans,
+   struct inode *inode,
+   struct extent_buffer *eb,
+   int slot, int nr)
 {
-   int ret;
-   struct btrfs_key key;
+   struct btrfs_root *log = BTRFS_I(inode)-root-log_root;
+   struct btrfs_path *path;
struct btrfs_key found_key;
+   struct btrfs_key key;
+   int i;
+   int ret;
 
-   key.objectid = objectid;
-   key.type = max_key_type;
-   key.offset = (u64)-1;
+   /* There are no relative items of the inode in log. */
+   if (BTRFS_I(inode)-logged_trans  trans-transaction-transid)
+   return 0;
 
-   while (1) {
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   for (i = 0; i  nr; i++) {
+   btrfs_item_key_to_cpu(eb, key, i + slot);
+
+   if (btrfs_key_type(key) == BTRFS_EXTENT_DATA_KEY) {
+   struct btrfs_file_extent_item *fi;
+   int found_type;
+   u64 mask = BTRFS_I(inode)-root-sectorsize - 1;
+   u64 start = key.offset;
+   u64 extent_end;
+   u64 hint;
+   unsigned long size;
+
+   fi = btrfs_item_ptr(eb, slot + i,
+   struct btrfs_file_extent_item);
+   found_type = btrfs_file_extent_type(eb, fi);
+
+   if (found_type == BTRFS_FILE_EXTENT_REG ||
+   found_type == BTRFS_FILE_EXTENT_PREALLOC)
+   extent_end = start +
+   btrfs_file_extent_num_bytes(eb, fi);
+   else if (found_type == BTRFS_FILE_EXTENT_INLINE) {
+   size = btrfs_file_extent_inline_len(eb, fi);
+   extent_end = (start + size + mask)  ~mask;
+   } else
+   BUG_ON(1);
+
+   /* drop any overlapping extents */
+   ret = btrfs_drop_extents(trans, inode, start,
+extent_end, hint, 0, 1);
+   BUG_ON(ret);
+
+   continue;
+   }
+
+   /* non file extent */
ret = btrfs_search_slot(trans, log, key, path, -1, 1);
-   BUG_ON(ret == 0);
if (ret  0)
break;
 
-   if (path-slots[0] == 0)
+   /* empty log! */
+   if (ret  0  path-slots[0] == 0)
break;
 
-   path-slots[0]--;
+   if (ret  0) {
+   btrfs_release_path(path);
+   continue;
+   }
+
btrfs_item_key_to_cpu(path-nodes[0], found_key,
  path-slots[0]);
 
-   if (found_key.objectid != objectid)
-   break;
+   if (btrfs_comp_cpu_keys(found_key, key))
+   BUG_ON(1);
 
ret = btrfs_del_item(trans, log, path);
-   if (ret)
-   break;
+   

[PATCH 11/12 v3] Btrfs: use the right generation number to read log_root_tree

2011-06-21 Thread Liu Bo
Currently we use the generation number of the super to read in the log
tree root after a crash.  This doesn't always match the sub trans id and
so it doesn't always match the transid stored in the btree blocks.

We can use log_root_transid to record the log_root_tree's generation
so that when we recover from crash, we can match log_root_tree's btree blocks.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c  |3 ++-
 fs/btrfs/tree-log.c |2 ++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d6d71f2..4568317 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2002,6 +2002,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
if (btrfs_super_log_root(disk_super) != 0 
!(fs_info-fs_state  BTRFS_SUPER_FLAG_ERROR)) {
u64 bytenr = btrfs_super_log_root(disk_super);
+   u64 log_root_transid = btrfs_super_log_root_transid(disk_super);
 
if (fs_devices-rw_devices == 0) {
printk(KERN_WARNING Btrfs log replay required 
@@ -2024,7 +2025,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
log_tree_root-node = read_tree_block(tree_root, bytenr,
  blocksize,
- generation + 1);
+ log_root_transid);
ret = btrfs_recover_log_trees(log_tree_root);
BUG_ON(ret);
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3cf7ed2..166fda5 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2105,6 +2105,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
log_root_tree-node-start);
btrfs_set_super_log_root_level(root-fs_info-super_for_commit,
btrfs_header_level(log_root_tree-node));
+   btrfs_set_super_log_root_transid(root-fs_info-super_for_commit,
+trans-transid);
 
log_root_tree-log_batch = 0;
log_root_tree-log_transid++;
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Abysmal Performance

2011-06-21 Thread Henning Rohlfs

On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:

Henning Rohlfs wrote (ao):

- space_cache was enabled, but it seemed to make the problem worse.
It's no longer in the mount options.


space_cache is a one time mount option which enabled space_cache. Not
supplying it anymore as a mount option has no effect (dmesg | grep 
btrfs).


I'm sure that after the first reboot after removing the flag from the 
mount options, the system was faster for a while. That must have been a 
coincidence (or just an error on my part).


Anyway, I rebooted with clear_cache as mount option and there was no 
improvement either.



Thanks for pointing this out,
Henning
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] fs: kill i_alloc_sem

2011-06-21 Thread Christoph Hellwig
On Tue, Jun 21, 2011 at 03:40:56PM +1000, Dave Chinner wrote:
 Modification of inode-i_state is not safe outside the
 inode-i_lock.

We never actually set the new bit in i_state, we just use it as a key
for the hashed lookups.  Or rather we try to, as I misunderstood how
wait_on_bit works, so currently we busywait for i_dio_count to reach
zero.  I'll respin a version that actually works as expected.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Btrfs: fix space leak when trimming free extents

2011-06-21 Thread Li Dongyang
On Tuesday, June 21, 2011 01:50:10 PM Li Zefan wrote:
 When the end of an extent exceeds the end of the specified range,
 the extent will be accidentally truncated.
 
 Signed-off-by: Li Zefan l...@cn.fujitsu.com
 ---
  fs/btrfs/free-space-cache.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
 index 292c0d9..185cf8e 100644
 --- a/fs/btrfs/free-space-cache.c
 +++ b/fs/btrfs/free-space-cache.c
 @@ -2509,8 +2509,15 @@ int btrfs_trim_block_group(struct 
 btrfs_block_group_cache *block_group,
   bytes = min(entry-bytes, end - start);
   if (bytes  minlen)
   goto next;
 +
   unlink_free_space(ctl, entry);
 - kmem_cache_free(btrfs_free_space_cachep, entry);
 + if (bytes  entry-bytes) {
 + entry-offset = entry-offset + bytes;
 + entry-bytes = entry-bytes - bytes;
 + link_free_space(ctl, entry);
yes, I forgot to link the rest extent to the free space cache.
Thanks for the fix!
 + } else {
 + kmem_cache_free(btrfs_free_space_cachep, entry);
 + }
   }
  
   spin_unlock(ctl-tree_lock);
 -- 1.7.3.1 
 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Applications using fsync cause hangs for several seconds every few minutes

2011-06-21 Thread Jan Stilow
Hello,

Nirbheek Chauhan nirbheek at gentoo.org writes:
 However, recently, perhaps with 2.6.39, or after I quickly started
 filling up my disk again, it has become impossible for me to work for
 long periods on my machine.

 Every few minutes, (I guess) when applications do fsync (firefox,
 xchat, vim, etc), all applications that use fsync() hang for several
 seconds, and applications that use general IO suffer extreme
 slowdowns. iotop shows various combinations of the processes listed
 below doing writes, and the total write as 2-3MB/s.

 [btrfs-dealloc-]
 [btrfs-submit-0]
 [btrfs-transacti]
 [btrfs-endio-wri]
 [flush-btrfs-1]

I'm using btrfs under a 2.6.39-ARCH kernel and run into the same issue.

In my case the [btrfs-submit-0] and [btrfs-transacti] shows up in iotop
and produce 99% of IO at the time a application is frozen. For something
like 10 to 30 seconds.

After a fresh boot everything runs quiet acceptable and goes worse over
the day. If I reboot the unmount process takes a lot of time something
up to 8 minutes. After the reboot everything is back to normal. At least
for a while.

I use snapshots on a daily base and have 20 active snapshots all the
time. Maybe this is the reason for the performance impact?

 What can I do to debug this issue? What other information should I
 supply? Could someone guide me on how to figure out why my machine is
 unusable now?

I have no solution for this problem. In fact a reboot helps me even if
only temporary. I also changed the IO scheduler, switched to laptop mode
and remounted the device. Neither of these helped.

Maybe someone else has any suggestion?

Thanks.

Jan Stilow
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Francesco R
[HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

Hi list, I've a broken btrfs filesystem to deal with can someone
please help me in recover the data?

The filesystem has been created a pair of years ago with 4 devices
with the command at #create and is mounted with #fstab options.
Recently I've added a pair of devices and made a `btrfs filesystem
balance`, after it succeded I was doing a `btrfs device delete` on
space02 (the currently broken one) in the middle of this the power
cable has been axed.
After replacing the cable cord 'space01' is mountable, 'space02' is not.

tryed to use a backup copy of super with `btrfs-select-super` but it
fail as reported in #btrfs-select-super

please, pretty please have you suggestion on what try next?


#current kernel (vanilla + linux-vserver)
uname -a
Linux dobbia 2.6.38.8-vs2.3.0.37-rc17 #5 SMP Mon Jun 20 15:04:39 CEST
2011 x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel
GNU/Linux

#create
modprobe btrfs
mkfs.btrfs -L space01 -m raid10 -d raid10 $DEVICES1
mkfs.btrfs -L space02 -m raid10 -d raid10 $DEVICES2

# fstab
/dev/sda6 /mnt/space01 btrfs
defaults,device=/dev/sda6,device=/dev/sdc6,device=/dev/sdd1,device=/dev/sde1,device=/dev/sdf6,device=/dev/sdg6
0 0
/dev/sda7 /mnt/space02 btrfs
defaults,device=/dev/sda7,device=/dev/sdb7,device=/dev/sdc7,device=/dev/sdd2,device=/dev/sde2,device=/dev/sdf7,device=/dev/sdg7
0 0

# current layout
btrfs filesystem show
failed to read /dev/sr0
Label: 'space01'  uuid: c77c6e87-fccd-4204-bd2c-d924fe06be31
Total devices 6 FS bytes used 164.81GB
devid7 size 244.14GB used 56.59GB path /dev/sdf6
devid5 size 244.93GB used 56.59GB path /dev/sdd1
devid8 size 244.14GB used 56.59GB path /dev/sdg6
devid6 size 244.93GB used 56.59GB path /dev/sde1
devid4 size 244.14GB used 56.59GB path /dev/sda6
devid3 size 244.14GB used 56.59GB path /dev/sdc6

Label: 'space02'  uuid: f752def1-1abc-48c7-8ebb-47ba37b8ffa6
Total devices 7 FS bytes used 172.94GB
devid7 size 487.65GB used 0.00 path /dev/sdf7
devid6 size 488.94GB used 60.25GB path /dev/sde2
devid5 size 488.94GB used 58.75GB path /dev/sdd2
devid4 size 487.65GB used 60.26GB path /dev/sda7
devid7 size 487.65GB used 1.50GB path /dev/sdg7
devid2 size 487.65GB used 58.76GB path /dev/sdb7
devid3 size 487.65GB used 60.26GB path /dev/sdc7

Btrfs v0.19-35-g1b444cd-dirty

# first error messages
Jun 20 14:04:35 dobbia kernel: [  806.587580] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:35 dobbia kernel: [  806.629781] device label space02
devid 2 transid 756848 /dev/sdb7
Jun 20 14:04:35 dobbia kernel: [  806.630107] device label space02
devid 3 transid 757294 /dev/sdc7
Jun 20 14:04:35 dobbia kernel: [  806.652126] device label space02
devid 5 transid 756846 /dev/sdd2
Jun 20 14:04:37 dobbia kernel: [  808.201719] device label space02
devid 6 transid 757294 /dev/sde2
Jun 20 14:04:37 dobbia kernel: [  808.218108] device label space02
devid 7 transid 756846 /dev/sdf7
Jun 20 14:04:37 dobbia kernel: [  808.218433] device label space02
devid 7 transid 757294 /dev/sdg7
Jun 20 14:04:37 dobbia kernel: [  808.218715] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:37 dobbia kernel: [  808.271797] btrfs: failed to read
the system array on sdg7
Jun 20 14:04:37 dobbia kernel: [  808.293776] btrfs: open_ctree failed
Jun 20 14:04:56 dobbia kernel: [  827.190208] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:56 dobbia kernel: [  827.254517] btrfs: failed to read
the system array on sdg7
Jun 20 14:04:56 dobbia kernel: [  827.280152] btrfs: open_ctree failed
Jun 20 14:05:01 dobbia kernel: [  832.442454] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:05:01 dobbia kernel: [  832.502017] btrfs: failed to read
the system array on sdg7
Jun 20 14:05:01 dobbia kernel: [  832.521492] btrfs: open_ctree failed
Jun 20 14:05:20 dobbia kernel: [  851.113237] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:05:20 dobbia kernel: [  851.199478] btrfs: allowing degraded mounts
Jun 20 14:05:20 dobbia kernel: [  851.563583] parent transid verify
failed on 600755752960 wanted 757102 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.564146] parent transid verify
failed on 600755752960 wanted 757102 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.651006] btrfs bad tree block
start 0 600859951104
Jun 20 14:05:20 dobbia kernel: [  851.671362] parent transid verify
failed on 600859955200 wanted 756926 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.671636] parent transid verify
failed on 600859955200 wanted 756926 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.693515] btrfs bad tree block
start 0 601053986816
Jun 20 14:05:20 dobbia kernel: [  851.693559] btrfs bad tree block
start 0 601054003200
Jun 20 14:05:20 dobbia kernel: [  851.693566] 

Re: [PATCH 06/12 v3] Btrfs: improve log with sub transaction

2011-06-21 Thread Josef Bacik

On 06/21/2011 04:49 AM, Liu Bo wrote:

When logging an inode _A_, current btrfs will
a) clear all items belonged to _A_ in log,
b) copy all items belonged to _A_ from fs/file tree to log tree,
and this just wastes a lot of time, especially when logging big files.

So we want to use a smarter approach, i.e. find and merge.
The amount of file extent items is the largest, so we focus on it.
Thanks to sub transaction, now we can find those file extent items which
are changed after last _transaction commit_ or last _log commit_, and
then merge them with the existed ones in log tree.

It will be great helpful on fsync performance, cause the common case is
make changes on a _part_ of inode.

Signed-off-by: Liu Boliubo2...@cn.fujitsu.com
---
  fs/btrfs/tree-log.c |  179 ---
  1 files changed, 127 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index b91fe8b..6193e27 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2581,61 +2581,106 @@ again:
  }

  /*
- * a helper function to drop items from the log before we relog an
- * inode.  max_key_type indicates the highest item type to remove.
- * This cannot be run for file data extents because it does not
- * free the extents they point to.
+ * a helper function to drop items from the log before we merge
+ * the uptodate items into the log tree.
   */
-static int drop_objectid_items(struct btrfs_trans_handle *trans,
- struct btrfs_root *log,
- struct btrfs_path *path,
- u64 objectid, int max_key_type)
+static  int prepare_for_merge_items(struct btrfs_trans_handle *trans,
+   struct inode *inode,
+   struct extent_buffer *eb,
+   int slot, int nr)
  {
-   int ret;
-   struct btrfs_key key;
+   struct btrfs_root *log = BTRFS_I(inode)-root-log_root;
+   struct btrfs_path *path;
struct btrfs_key found_key;
+   struct btrfs_key key;
+   int i;
+   int ret;

-   key.objectid = objectid;
-   key.type = max_key_type;
-   key.offset = (u64)-1;
+   /* There are no relative items of the inode in log. */
+   if (BTRFS_I(inode)-logged_trans  trans-transaction-transid)
+   return 0;

-   while (1) {
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   for (i = 0; i  nr; i++) {
+   btrfs_item_key_to_cpu(eb,key, i + slot);
+


Just a nit, but it would be less of a pain to do

for (i = slot; i  slot+nr; i++)

that way you don't have a bunch of getter(eb, slot + i) things scattered 
around.



+   if (btrfs_key_type(key) == BTRFS_EXTENT_DATA_KEY) {
+   struct btrfs_file_extent_item *fi;
+   int found_type;
+   u64 mask = BTRFS_I(inode)-root-sectorsize - 1;
+   u64 start = key.offset;
+   u64 extent_end;
+   u64 hint;
+   unsigned long size;
+
+   fi = btrfs_item_ptr(eb, slot + i,
+   struct btrfs_file_extent_item);
+   found_type = btrfs_file_extent_type(eb, fi);
+
+   if (found_type == BTRFS_FILE_EXTENT_REG ||
+   found_type == BTRFS_FILE_EXTENT_PREALLOC)
+   extent_end = start +
+   btrfs_file_extent_num_bytes(eb, fi);
+   else if (found_type == BTRFS_FILE_EXTENT_INLINE) {
+   size = btrfs_file_extent_inline_len(eb, fi);
+   extent_end = (start + size + mask)  ~mask;
+   } else
+   BUG_ON(1);
+


Style screw up, if one chunk of an if/else block has braces, they all 
need it.



+   /* drop any overlapping extents */
+   ret = btrfs_drop_extents(trans, inode, start,
+extent_end,hint, 0, 1);


There are lot's of places where you've missed a space, like here where 
you have extent_end,hint



+   BUG_ON(ret);
+
+   continue;
+   }
+
+   /* non file extent */
ret = btrfs_search_slot(trans, log,key, path, -1, 1);
-   BUG_ON(ret == 0);
if (ret  0)
break;

-   if (path-slots[0] == 0)
+   /* empty log! */
+   if (ret  0  path-slots[0] == 0)


Here too.


break;

-   path-slots[0]--;
+   if (ret  0) {


Here.


+   btrfs_release_path(path);
+   continue;
+   }
+

Re: [PATCH 10/12 v3] Btrfs: deal with EEXIST after iput

2011-06-21 Thread Josef Bacik

On 06/21/2011 04:49 AM, Liu Bo wrote:

There are two cases when BTRFS_I(inode)-logged_trans is zero:
a) an inode is just allocated;
b) iput an inode and reread it.

However, in b) if btrfs is not committed yet, and this inode _may_ still remain
in log tree.

So we need to check the log tree to get logged_trans a right value
in case it hits a EEXIST while logging.


Instead of doing this why not just check and see if the inode has been 
logged but the transaction has not yet been committed in 
btrfs_drop_inode?  That way the inode doesn't get evicted from cache 
until after we know it's ok and that way we don't have to waste a tree 
lookup.  Thanks,


Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Daniel Witzel
Welcome to the club, I have a similar issue. We pretty much have to wait for the
fsck tool to finish being developed. If possible unhook the drives and leave
them be until the tool is done. I don't know when it will be done as I am not a
developer, mearly a follower.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/7] added helper functions to iterate backrefs

2011-06-21 Thread David Sterba
On Sat, Jun 18, 2011 at 01:45:58PM +0200, Jan Schmidt wrote:
 +/*
 + * returns 0 if the path could be dumped (probably truncated)
 + * returns 0 in case of an error
 + */
 +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
 + struct extent_buffer *eb_ir, int slot,
 + void *ctx)
 +{
 + struct inode_fs_paths *ipath = ctx;
 + struct extent_buffer *eb_i = ipath-eb;
 + u32 path_len;
 + char *fs_path;
 +
 + if (ipath-bytes_left  2)
 + return -EOVERFLOW;
 +
 + *ipath-dest++ = ' ';
 + --ipath-bytes_left;

this actually prepends a space before the first entry.

I've played a bit with this interface, it's the missing piece to
implement the 'inode nubmer - directory names' search :)
All the names are placed into one buffer, 4k sized, separated by a
space. Not all files fit, so I suggest to print one filename at a time.
Me-user wants to see all the filenames, even if the list is potentially
long.

I've also noticed that in iref_to_path()

130 len = btrfs_inode_ref_name_len(eb, iref);

returns very large values, which do not fit into the 4k buffer and the
function returns. When trying to print the data by read_extent_buffer to
a temporary array, a pagefault occurs (and seems to come from reading
the extent buffer). The numbers are like 19000 or higher, which are way
above path or filename maximum size. I don't think this is the
intentional behaviour, it relies on some side effect rather than clear
logic.

Example:
ipath buffer and scratch are 32K, each, ie. the overly sized
ref_name_len will fit there:

[ 8766.928232] btrfs: ino2name: 266  p1/
[ 8767.440964] i2p: [4] namelen 10, left 32766
[ 8767.446417] i2p: [7] path:
[ 8767.450445] i2p: [4] namelen 2, left 32755
[ 8767.455733] i2p: [7] path: /
[ 8767.459834] i2p: [2] p1/
[ 8767.463593] i2p: [4] namelen 0, left 32752
[ 8767.468903] i2p: [7] path:
[ 8767.472908] i2p: [4] namelen 2, left 32751
[ 8767.478188] i2p: [7] path: /
[ 8767.482280] i2p: [2] p1/
[ 8767.486024] i2p: [4] namelen 0, left 32748
[ 8767.491293] i2p: [7] path:
[ 8767.495283] i2p: [4] namelen 2, left 32747
[ 8767.500587] i2p: [7] path: /
[ 8767.504680] i2p: [2] p1/
[ 8767.508430] i2p: [4] namelen 0, left 32744
[ 8767.513708] i2p: [7] path:
[ 8767.517702] i2p: [4] namelen 2, left 32743
[ 8767.522969] i2p: [7] path: /
[ 8767.527042] i2p: [2] p1/
[ 8767.530787] i2p: [4] namelen 0, left 32740
[ 8767.536049] i2p: [7] path:
[ 8767.539991] i2p: [4] namelen 2, left 32739
[ 8767.545282] i2p: [7] path: /
[ 8767.549374] i2p: [2] p1/
[ 8767.553109] i2p: [4] namelen 0, left 32736
[ 8767.558377] i2p: [7] path:
[ 8767.562354] i2p: [4] namelen 2, left 32735
[ 8767.567615] i2p: [7] path: /
[ 8767.571713] i2p: [2] p1/
[ 8767.575443] i2p: [4] namelen 0, left 32732
[ 8767.580701] i2p: [7] path:
[ 8767.584678] i2p: [4] namelen 2, left 32731
[ 8767.589933] i2p: [7] path: /
[ 8767.594007] i2p: [2] p1/
[ 8767.597713] i2p: [4] namelen 0, left 32728
[ 8767.602908] i2p: [7] path:
[ 8767.606832] i2p: [4] namelen 2, left 32727
[ 8767.612030] i2p: [7] path: /
[ 8767.616050] i2p: [2] p1/
[ 8767.619676] i2p: [4] namelen 0, left 32724
[ 8767.624873] i2p: [7] path:
[ 8767.628782] i2p: [4] namelen 2, left 32723
[ 8767.633970] i2p: [7] path: /
[ 8767.637962] i2p: [2] p1/
[ 8767.641639] i2p: [4] namelen 0, left 32720
[ 8767.646838] i2p: [7] path:
[ 8767.650757] i2p: [4] namelen 2, left 32719
[ 8767.655952] i2p: [7] path: /
[ 8767.659940] i2p: [2] p1/
[ 8767.663604] i2p: [4] namelen 0, left 32716
[ 8767.668790] i2p: [7] path:
[ 8767.672696] i2p: [4] namelen 2, left 32715
[ 8767.677881] i2p: [7] path: /
[ 8767.681878] i2p: [2] p1/
[ 8767.685549] i2p: [4] namelen 0, left 32712
[ 8767.690742] i2p: [7] path:
[ 8767.694655] i2p: [4] namelen 2, left 32711
[ 8767.699843] i2p: [7] path: /
[ 8767.703840] i2p: [2] p1/
[ 8767.707520] i2p: [4] namelen 19967, left 32708
[ 8767.713057] i2p: [7] path:
[ 8767.716955] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[ 8767.720932] IP: [a005f1d2] read_extent_buffer+0xa2/0x220 [btrfs]
[ 8767.720932] PGD 77bd0067 PUD 79d35067 PMD 0
[ 8767.720932] Oops:  [#1] SMP
[ 8767.720932] CPU 1
[ 8767.720932] Modules linked in: btrfs loop
[ 8767.720932]
[ 8767.720932] Pid: 10859, comm: btrfs-ino2path Not tainted 3.0.0-rc3-default+ 
#75 Intel Corporation Santa Rosa platform/Matanzas
[ 8767.720932] RIP: 0010:[a005f1d2]  [a005f1d2] 
read_extent_buffer+0xa2/0x220 [btrfs]
[ 8767.720932] RSP: 0018:880074acbc58  EFLAGS: 00010202
[ 8767.720932] RAX:  RBX: 3deb RCX: 1000
[ 8767.720932] RDX: 01e0 RSI:  RDI: 0246
[ 8767.720932] RBP: 880074acbcb8 R08:  R09: 0001
[ 8767.720932] R10:  R11: 1c04 R12: 0002
[ 8767.720932] R13:  R14: 880079bac1d9 R15: 880074acbfd8
[ 8767.720932] FS:  

Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Francesco R
2011/6/21 Daniel Witzel dannyboy48...@gmail.com:
 Welcome to the club, I have a similar issue. We pretty much have to wait for 
 the
 fsck tool to finish being developed. If possible unhook the drives and leave
 them be until the tool is done. I don't know when it will be done as I am not 
 a
 developer, mearly a follower.


There are tools to view the metadata stored as raid10? possibly in
high level language?

I see Chris Mason stopped git commits to btrfs-progs-unstable in 2010,
there is someone working on it?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/7] added helper functions to iterate backrefs

2011-06-21 Thread Jan Schmidt
On 21.06.2011 16:37, David Sterba wrote:
 On Sat, Jun 18, 2011 at 01:45:58PM +0200, Jan Schmidt wrote:
 +/*
 + * returns 0 if the path could be dumped (probably truncated)
 + * returns 0 in case of an error
 + */
 +static int inode_to_path(u64 inum, struct btrfs_inode_ref *iref,
 +struct extent_buffer *eb_ir, int slot,
 +void *ctx)
 +{
 +struct inode_fs_paths *ipath = ctx;
 +struct extent_buffer *eb_i = ipath-eb;
 +u32 path_len;
 +char *fs_path;
 +
 +if (ipath-bytes_left  2)
 +return -EOVERFLOW;
 +
 +*ipath-dest++ = ' ';
 +--ipath-bytes_left;
 
 this actually prepends a space before the first entry.

Which is fine because we print the buffer with (path:%s) in
scrub.c:257, and a space is well appropriate there. That way we don't
need to handle the first path differently. (Makes only sense as long as
we stick to multiple files per line).

 I've played a bit with this interface, it's the missing piece to
 implement the 'inode nubmer - directory names' search :)
 All the names are placed into one buffer, 4k sized, separated by a
 space. Not all files fit, so I suggest to print one filename at a time.

It does print all references to one *inode* to the same buffer, right.
If there are more inodes refering to that extent, those paths are
printed to a separate buffer. You can make some reflinks to try that out.

 Me-user wants to see all the filenames, even if the list is potentially
 long.

Well, maybe. However, for me, the number of hardlinks created is either
very low, or I know why a file has a whole bunch of hardlinks myself
(e.g. some kind of backup strategy). In both cases, this approach would
fit. I don't want to see some thousands of printk messages for that one
file I know to have a lot of hardlinks - maybe even missing some error
of a non-hardlinked file due to rate limiting.

On the other hand, you could argue, when I deliberately create a bunch
of reflinks, I will get the same problem. Another approach would be to
put it all into one buffer, no matter which inode it comes from. Either
way shouldn't be hard to accomplish.

 I've also noticed that in iref_to_path()
 
 130 len = btrfs_inode_ref_name_len(eb, iref);
 
 returns very large values, which do not fit into the 4k buffer and the
 function returns. When trying to print the data by read_extent_buffer to
 a temporary array, a pagefault occurs (and seems to come from reading
 the extent buffer). The numbers are like 19000 or higher, which are way
 above path or filename maximum size. I don't think this is the
 intentional behaviour, it relies on some side effect rather than clear
 logic.

Something is going wrong here:

 Example:
 ipath buffer and scratch are 32K, each, ie. the overly sized
 ref_name_len will fit there:
 
 [ 8766.928232] btrfs: ino2name: 266  p1/
 [ 8767.440964] i2p: [4] namelen 10, left 32766
 [ 8767.446417] i2p: [7] path:
 [ 8767.450445] i2p: [4] namelen 2, left 32755
 [ 8767.455733] i2p: [7] path: /
 [ 8767.459834] i2p: [2] p1/

Do I get that right? This is inode 266, which should refer to
./p1/d6/d7/d1f/c41 (as you state below). Already on second iteration,
we're at something named pl, which shouldn't happen as we construct
the path bottom-up. And those namelen 0 in the following lines shouldn't
happen, either. Furthermore, the path should change on every iteration
(I guess, that might depend on the printk behind, of course).

 [ 8767.463593] i2p: [4] namelen 0, left 32752
 [ 8767.468903] i2p: [7] path:
 [ 8767.472908] i2p: [4] namelen 2, left 32751
 [ 8767.478188] i2p: [7] path: /
 [ 8767.482280] i2p: [2] p1/
 [ 8767.486024] i2p: [4] namelen 0, left 32748
 [ 8767.491293] i2p: [7] path:
 [ 8767.495283] i2p: [4] namelen 2, left 32747
 [ 8767.500587] i2p: [7] path: /
 [ 8767.504680] i2p: [2] p1/
 [ 8767.508430] i2p: [4] namelen 0, left 32744
 [ 8767.513708] i2p: [7] path:
 [ 8767.517702] i2p: [4] namelen 2, left 32743
 [ 8767.522969] i2p: [7] path: /
 [ 8767.527042] i2p: [2] p1/
 [ 8767.530787] i2p: [4] namelen 0, left 32740
 [ 8767.536049] i2p: [7] path:
 [ 8767.539991] i2p: [4] namelen 2, left 32739
 [ 8767.545282] i2p: [7] path: /
 [ 8767.549374] i2p: [2] p1/
 [ 8767.553109] i2p: [4] namelen 0, left 32736
 [ 8767.558377] i2p: [7] path:
 [ 8767.562354] i2p: [4] namelen 2, left 32735
 [ 8767.567615] i2p: [7] path: /
 [ 8767.571713] i2p: [2] p1/
 [ 8767.575443] i2p: [4] namelen 0, left 32732
 [ 8767.580701] i2p: [7] path:
 [ 8767.584678] i2p: [4] namelen 2, left 32731
 [ 8767.589933] i2p: [7] path: /
 [ 8767.594007] i2p: [2] p1/
 [ 8767.597713] i2p: [4] namelen 0, left 32728
 [ 8767.602908] i2p: [7] path:
 [ 8767.606832] i2p: [4] namelen 2, left 32727
 [ 8767.612030] i2p: [7] path: /
 [ 8767.616050] i2p: [2] p1/
 [ 8767.619676] i2p: [4] namelen 0, left 32724
 [ 8767.624873] i2p: [7] path:
 [ 8767.628782] i2p: [4] namelen 2, left 32723
 [ 8767.633970] i2p: [7] path: /
 [ 8767.637962] i2p: [2] p1/
 [ 8767.641639] i2p: [4] 

Re: Abysmal Performance

2011-06-21 Thread Josef Bacik
On 06/21/2011 05:26 AM, Henning Rohlfs wrote:
 On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:
 Henning Rohlfs wrote (ao):
 - space_cache was enabled, but it seemed to make the problem worse.
 It's no longer in the mount options.

 space_cache is a one time mount option which enabled space_cache. Not
 supplying it anymore as a mount option has no effect (dmesg | grep
 btrfs).
 
 I'm sure that after the first reboot after removing the flag from the
 mount options, the system was faster for a while. That must have been a
 coincidence (or just an error on my part).
 

No, the space cache will make your system faster _after_ having been
enabled once.  The reason for this is because we have to build the cache
the slow way at first, and then after that we can do it the fast way.
What is probably happening is your box is slowing down trying to build
this cache.  Don't mount with clear_cache unless there is a bug in your
cache.  Let it do it's thing and stuff will get faster.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Abysmal Performance

2011-06-21 Thread Calvin Walton
On Mon, 2011-06-20 at 23:51 +0200, Henning Rohlfs wrote:
 Hello,
 
  I've migrated my system to btrfs (raid1) a few months ago. Since then 
  the performance has been pretty bad, but recently it's gotten 
  unbearable: a simple sync called while the system is idle can take 20 up 
  to 60 seconds. Creating or deleting files often has several seconds 
  latency, too.

I think I’ve been seeing a fairly similar, or possibly the same? issue
as well. It looks like it’s actually a regression introduced in 2.6.39 -
if I switch back to a 2.6.38 kernel, my latency issues magically go
away! (I'm curious: does using the older 2.6.38.x kernel help with
anyone else that's seeing the issue?)

Some hardware/configuration details:
btrfs on a single disc (Seagate Momentus XT hybrid), lzo compression and
space cache enabled. Some snapshots in use.

I notice that in latencytop I'm seeing a lot of lines with (cropped)
traces like

sleep_on_page wait_on_page_bit read_extent_buffer_ 13.3 msec  0.5 %

showing up that I didn't see with the 2.6.38 kernel. I occasionally see
latencies as bad as 20-30 seconds on operations like fsync or
synchronous writes.

I think I can reproduce the issue well enough to bisect it, so I might
give that a try. It'll be slow going, though.

-- 
Calvin Walton calvin.wal...@kepstin.ca

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: save preloaded extent_state's in a percpu cache V2

2011-06-21 Thread Josef Bacik
When doing DIO tracing I noticed we were doing a ton of allocations, a lot of
the time for extent_states.  Some of the time we don't even use the prealloc'ed
extent_state, it just get's free'd up.  So instead create a per-cpu cache like
the radix tree stuff.  So we will check to see if our per-cpu cache has a
prealloc'ed extent_state in it and if so we just continue, else we alloc a new
one and fill the cache.  Then if we need to use a prealloc'ed extent_state we
can just take it out of our per-cpu cache.  We will also refill the cache on
free to try and limit the number of times we have to ask the allocator for
caches.  With this patch dbench 50 goes from ~210 mb/s to ~260 mb/s.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
V1-V2: Fix a uneven preempt_disable() when we use a prealloc but still have to
search again.
 fs/btrfs/extent_io.c |  167 +
 1 files changed, 126 insertions(+), 41 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7055d11..9adc614 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -12,6 +12,7 @@
 #include linux/pagevec.h
 #include linux/prefetch.h
 #include linux/cleancache.h
+#include linux/cpu.h
 #include extent_io.h
 #include extent_map.h
 #include compat.h
@@ -31,6 +32,8 @@ static DEFINE_SPINLOCK(leak_lock);
 
 #define BUFFER_LRU_MAX 64
 
+static DEFINE_PER_CPU(struct extent_state *, extent_state_preloads) = NULL;
+
 struct tree_entry {
u64 start;
u64 end;
@@ -71,10 +74,36 @@ free_state_cache:
return -ENOMEM;
 }
 
+static void __free_extent_state(struct extent_state *state)
+{
+   if (!state)
+   return;
+   if (atomic_dec_and_test(state-refs)) {
+#if LEAK_DEBUG
+   unsigned long flags;
+#endif
+   WARN_ON(state-tree);
+#if LEAK_DEBUG
+   spin_lock_irqsave(leak_lock, flags);
+   list_del(state-leak_list);
+   spin_unlock_irqrestore(leak_lock, flags);
+#endif
+   kmem_cache_free(extent_state_cache, state);
+   }
+}
 void extent_io_exit(void)
 {
struct extent_state *state;
struct extent_buffer *eb;
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   state = per_cpu(extent_state_preloads, cpu);
+   if (!state)
+   continue;
+   per_cpu(extent_state_preloads, cpu) = NULL;
+   __free_extent_state(state);
+   }
 
while (!list_empty(states)) {
state = list_entry(states.next, struct extent_state, leak_list);
@@ -114,16 +143,11 @@ void extent_io_tree_init(struct extent_io_tree *tree,
tree-mapping = mapping;
 }
 
-static struct extent_state *alloc_extent_state(gfp_t mask)
+static void init_extent_state(struct extent_state *state)
 {
-   struct extent_state *state;
 #if LEAK_DEBUG
unsigned long flags;
 #endif
-
-   state = kmem_cache_alloc(extent_state_cache, mask);
-   if (!state)
-   return state;
state-state = 0;
state-private = 0;
state-tree = NULL;
@@ -134,6 +158,16 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 #endif
atomic_set(state-refs, 1);
init_waitqueue_head(state-wq);
+}
+
+static struct extent_state *alloc_extent_state(gfp_t mask)
+{
+   struct extent_state *state;
+
+   state = kmem_cache_alloc(extent_state_cache, mask);
+   if (!state)
+   return state;
+   init_extent_state(state);
return state;
 }
 
@@ -142,6 +176,7 @@ void free_extent_state(struct extent_state *state)
if (!state)
return;
if (atomic_dec_and_test(state-refs)) {
+   struct extent_state *tmp;
 #if LEAK_DEBUG
unsigned long flags;
 #endif
@@ -151,10 +186,53 @@ void free_extent_state(struct extent_state *state)
list_del(state-leak_list);
spin_unlock_irqrestore(leak_lock, flags);
 #endif
-   kmem_cache_free(extent_state_cache, state);
+   preempt_disable();
+   tmp = __get_cpu_var(extent_state_preloads);
+   if (!tmp) {
+   init_extent_state(state);
+   __get_cpu_var(extent_state_preloads) = state;
+   } else {
+   kmem_cache_free(extent_state_cache, state);
+   }
+   preempt_enable();
}
 }
 
+/*
+ * Pre-load an extent state.
+ */
+static int extent_state_preload(gfp_t mask)
+{
+   struct extent_state *state;
+
+   preempt_disable();
+   state = __get_cpu_var(extent_state_preloads);
+   if (!state) {
+   struct extent_state *tmp;
+   preempt_enable();
+   tmp = alloc_extent_state(mask);
+   if (!tmp)
+   return -ENOMEM;
+   preempt_disable();
+   state = __get_cpu_var(extent_state_preloads);
+  

[PATCH] Btrfs: fix how we merge extent states and deal with cached states V2

2011-06-21 Thread Josef Bacik
First, we can sometimes free the state we're merging, which means anybody who
calls merge_state() may have the state it passed in free'ed.  This is
problematic because we could end up caching the state, which makes caching
useless as the state will no longer be part of the tree.  So instead of free'ing
the state we passed into merge_state(), set it's end to the other-end and free
the other state.  This way we are sure to cache the correct state.  Also because
we can merge states together, instead of only using the cache'd state if it's
start == the start we are looking for, go ahead and use it if the start we are
looking for is within the range of the cached state.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
V1-V2:
-Make this patch apply properly ontop of the new percpu cache
-Accidently removed a cache_state(), add that back
 fs/btrfs/extent_io.c |   23 +++
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9adc614..9a43a96 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -359,11 +359,10 @@ static int merge_state(struct extent_io_tree *tree,
if (other-start == state-end + 1 
other-state == state-state) {
merge_cb(tree, state, other);
-   other-start = state-start;
-   state-tree = NULL;
-   rb_erase(state-rb_node, tree-state);
-   free_extent_state(state);
-   state = NULL;
+   state-end = other-end;
+   other-tree = NULL;
+   rb_erase(other-rb_node, tree-state);
+   free_extent_state(other);
}
}
 
@@ -429,7 +428,6 @@ static int insert_state(struct extent_io_tree *tree,
   %llu %llu\n, (unsigned long long)found-start,
   (unsigned long long)found-end,
   (unsigned long long)start, (unsigned long long)end);
-   free_extent_state(state);
return -EEXIST;
}
state-tree = tree;
@@ -589,7 +587,8 @@ again:
cached_state = NULL;
}
 
-   if (cached  cached-tree  cached-start == start) {
+   if (cached  cached-tree  cached-start = start 
+   cached-end  start) {
if (clear)
atomic_dec(cached-refs);
state = cached;
@@ -832,7 +831,8 @@ again:
spin_lock(tree-lock);
if (cached_state  *cached_state) {
state = *cached_state;
-   if (state-start == start  state-tree) {
+   if (state-start = start  state-end  start 
+   state-tree) {
node = state-rb_node;
goto hit_next;
}
@@ -872,13 +872,13 @@ hit_next:
if (err)
goto out;
 
-   next_node = rb_next(node);
cache_state(state, cached_state);
merge_state(tree, state);
if (last_end == (u64)-1)
goto out;
 
start = last_end + 1;
+   next_node = rb_next(state-rb_node);
if (next_node  start  end  !need_resched()) {
state = rb_entry(next_node, struct extent_state,
 rb_node);
@@ -950,7 +950,6 @@ hit_next:
 * Avoid to free 'prealloc' if it can be merged with
 * the later extent.
 */
-   atomic_inc(prealloc-refs);
err = insert_state(tree, prealloc, start, this_end,
   bits);
BUG_ON(err == -EEXIST);
@@ -959,7 +958,6 @@ hit_next:
goto out;
}
cache_state(prealloc, cached_state);
-   free_extent_state(prealloc);
start = this_end + 1;
goto search_again;
}
@@ -1649,7 +1647,8 @@ int test_range_bit(struct extent_io_tree *tree, u64 
start, u64 end,
int bitset = 0;
 
spin_lock(tree-lock);
-   if (cached  cached-tree  cached-start == start)
+   if (cached  cached-tree  cached-start = start 
+   cached-end  start)
node = cached-rb_node;
else
node = tree_search(tree, start);
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] far: remove i_alloc_sem abuse

2011-06-21 Thread OGAWA Hirofumi
Christoph Hellwig h...@infradead.org writes:

 Add a new rw_semaphore to protect bmap against truncate.  Previous
 i_alloc_sem was abused for this, but it's going away in this series.

In FAT case, -i_mutex was better. But, last time I saw, shmfs was using
-i_mutex to call -bmap. So, this was chosen instead.

I'm not checking current version yet though, if shmfs had change, we can
use -i_mutex.

BTW, this patch looks good to me.

Thanks.

 Signed-off-by: Christoph Hellwig h...@lst.de

 Index: linux-2.6/fs/fat/inode.c
 ===
 --- linux-2.6.orig/fs/fat/inode.c 2011-06-20 21:28:19.707963855 +0200
 +++ linux-2.6/fs/fat/inode.c  2011-06-20 21:29:25.031293882 +0200
 @@ -224,9 +224,9 @@ static sector_t _fat_bmap(struct address
   sector_t blocknr;
  
   /* fat_get_cluster() assumes the requested blocknr isn't truncated. */
 - down_read(mapping-host-i_alloc_sem);
 + down_read(MSDOS_I(mapping-host)-truncate_lock);
   blocknr = generic_block_bmap(mapping, block, fat_get_block);
 - up_read(mapping-host-i_alloc_sem);
 + up_read(MSDOS_I(mapping-host)-truncate_lock);
  
   return blocknr;
  }
 @@ -510,6 +510,8 @@ static struct inode *fat_alloc_inode(str
   ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
   if (!ei)
   return NULL;
 +
 + init_rwsem(ei-truncate_lock);
   return ei-vfs_inode;
  }
  
 Index: linux-2.6/fs/fat/fat.h
 ===
 --- linux-2.6.orig/fs/fat/fat.h   2011-06-20 21:28:19.724630522 +0200
 +++ linux-2.6/fs/fat/fat.h2011-06-20 21:29:25.034627215 +0200
 @@ -109,6 +109,7 @@ struct msdos_inode_info {
   int i_attrs;/* unused attribute bits */
   loff_t i_pos;   /* on-disk position of directory entry or 0 */
   struct hlist_node i_fat_hash;   /* hash by i_location */
 + struct rw_semaphore truncate_lock; /* protect bmap against truncate */
   struct inode vfs_inode;
  };
  
 Index: linux-2.6/fs/fat/file.c
 ===
 --- linux-2.6.orig/fs/fat/file.c  2011-06-20 21:28:19.744630521 +0200
 +++ linux-2.6/fs/fat/file.c   2011-06-20 21:29:54.501292390 +0200
 @@ -429,8 +429,10 @@ int fat_setattr(struct dentry *dentry, s
   }
  
   if (attr-ia_valid  ATTR_SIZE) {
 + down_write(MSDOS_I(inode)-truncate_lock);
   truncate_setsize(inode, attr-ia_size);
   fat_truncate_blocks(inode, attr-ia_size);
 + up_write(MSDOS_I(inode)-truncate_lock);
   }
  
   setattr_copy(inode, attr);


-- 
OGAWA Hirofumi hirof...@mail.parknet.co.jp
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] far: remove i_alloc_sem abuse

2011-06-21 Thread OGAWA Hirofumi
OGAWA Hirofumi hirof...@mail.parknet.co.jp writes:

 Christoph Hellwig h...@infradead.org writes:

 Add a new rw_semaphore to protect bmap against truncate.  Previous
 i_alloc_sem was abused for this, but it's going away in this series.

 In FAT case, -i_mutex was better. But, last time I saw, shmfs was using
 -i_mutex to call -bmap. So, this was chosen instead.

 I'm not checking current version yet though, if shmfs had change, we can
 use -i_mutex.

It was swapfile. And unfortunately it doesn't have change.

 BTW, this patch looks good to me.
-- 
OGAWA Hirofumi hirof...@mail.parknet.co.jp
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] far: remove i_alloc_sem abuse

2011-06-21 Thread Christoph Hellwig
On Wed, Jun 22, 2011 at 12:57:43AM +0900, OGAWA Hirofumi wrote:
 Christoph Hellwig h...@infradead.org writes:
 
  Add a new rw_semaphore to protect bmap against truncate.  Previous
  i_alloc_sem was abused for this, but it's going away in this series.
 
 In FAT case, -i_mutex was better. But, last time I saw, shmfs was using
 -i_mutex to call -bmap. So, this was chosen instead.
 
 I'm not checking current version yet though, if shmfs had change, we can
 use -i_mutex.

I tried that first, but it gives an instant deadlock when using a
swapfile on fat.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Abysmal Performance

2011-06-21 Thread Henning Rohlfs

On Tue, 21 Jun 2011 11:18:30 -0400, Josef Bacik wrote:

On 06/21/2011 05:26 AM, Henning Rohlfs wrote:

On Tue, 21 Jun 2011 10:00:59 +0200, Sander wrote:

Henning Rohlfs wrote (ao):
- space_cache was enabled, but it seemed to make the problem 
worse.

It's no longer in the mount options.


space_cache is a one time mount option which enabled space_cache. 
Not

supplying it anymore as a mount option has no effect (dmesg | grep
btrfs).


I'm sure that after the first reboot after removing the flag from 
the
mount options, the system was faster for a while. That must have 
been a

coincidence (or just an error on my part).



No, the space cache will make your system faster _after_ having been
enabled once. The reason for this is because we have to build the 
cache

the slow way at first, and then after that we can do it the fast way.
What is probably happening is your box is slowing down trying to 
build
this cache.  Don't mount with clear_cache unless there is a bug in 
your

cache.  Let it do it's thing and stuff will get faster.


I'm just reporting what I experienced. I had space_cache in the mount 
options while the problem developed and removed it when the system got 
too slow. After the next reboot the system was responsive for a short 
time (an hour maybe - which seems to have been unrelated to the mount 
option though from what you described). Now there's no difference 
whatsoever between no options, space_cache and clear_cache.


To sum it up: I only played with the clear_cache option because the 
system got too slow in the first place. I don't see how the problem can 
be related to this option if changing it it makes no difference.


Thanks,
Henning
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] ext4: remove i_alloc_sem abuse

2011-06-21 Thread Lukas Czerner
On Mon, 20 Jun 2011, Christoph Hellwig wrote:

 Add a new rw_semaphore to protect page_mkwrite against truncate.  Previous
 i_alloc_sem was abused for this, but it's going away in this series.
 
 Signed-off-by: Christoph Hellwig h...@lst.de
 
 Index: linux-2.6/fs/ext4/inode.c
 ===
 --- linux-2.6.orig/fs/ext4/inode.c2011-06-20 14:25:33.779248128 +0200
 +++ linux-2.6/fs/ext4/inode.c 2011-06-20 14:27:53.025907745 +0200
 @@ -5357,6 +5357,7 @@ int ext4_setattr(struct dentry *dentry,
   if (attr-ia_size  sbi-s_bitmap_maxbytes)
   return -EFBIG;
   }
 + down_write((EXT4_I(inode)-truncate_lock));
   }
  
   if (S_ISREG(inode-i_mode) 
 @@ -5405,6 +5406,9 @@ int ext4_setattr(struct dentry *dentry,
   ext4_truncate(inode);
   }
  
 + if (attr-ia_valid  ATTR_SIZE)
 + up_write((EXT4_I(inode)-truncate_lock));
 +

Hi Christoph,

this looks like a bug fix, so we should rather do it in a separate
commit. Could you send it separately, or at least mention that in commit
description ?

Otherwise the patch looks good.

Thanks!
-Lukas

   if (!rc) {
   setattr_copy(inode, attr);
   mark_inode_dirty(inode);
 @@ -5850,10 +5854,10 @@ int ext4_page_mkwrite(struct vm_area_str
   struct address_space *mapping = inode-i_mapping;
  
   /*
 -  * Get i_alloc_sem to stop truncates messing with the inode. We cannot
 -  * get i_mutex because we are already holding mmap_sem.
 +  * Get truncate_lock to stop truncates messing with the inode. We
 +  * cannot get i_mutex because we are already holding mmap_sem.
*/
 - down_read(inode-i_alloc_sem);
 + down_read((EXT4_I(inode)-truncate_lock));
   size = i_size_read(inode);
   if (page-mapping != mapping || size = page_offset(page)
   || !PageUptodate(page)) {
 @@ -5865,7 +5869,7 @@ int ext4_page_mkwrite(struct vm_area_str
   lock_page(page);
   wait_on_page_writeback(page);
   if (PageMappedToDisk(page)) {
 - up_read(inode-i_alloc_sem);
 + up_read((EXT4_I(inode)-truncate_lock));
   return VM_FAULT_LOCKED;
   }
  
 @@ -5883,7 +5887,7 @@ int ext4_page_mkwrite(struct vm_area_str
   if (page_has_buffers(page)) {
   if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
   ext4_bh_unmapped)) {
 - up_read(inode-i_alloc_sem);
 + up_read((EXT4_I(inode)-truncate_lock));
   return VM_FAULT_LOCKED;
   }
   }
 @@ -5912,11 +5916,11 @@ int ext4_page_mkwrite(struct vm_area_str
*/
   lock_page(page);
   wait_on_page_writeback(page);
 - up_read(inode-i_alloc_sem);
 + up_read((EXT4_I(inode)-truncate_lock));
   return VM_FAULT_LOCKED;
  out_unlock:
   if (ret)
   ret = VM_FAULT_SIGBUS;
 - up_read(inode-i_alloc_sem);
 + up_read((EXT4_I(inode)-truncate_lock));
   return ret;
  }
 Index: linux-2.6/fs/ext4/super.c
 ===
 --- linux-2.6.orig/fs/ext4/super.c2011-06-20 14:25:33.795914793 +0200
 +++ linux-2.6/fs/ext4/super.c 2011-06-20 14:27:06.269243443 +0200
 @@ -871,6 +871,7 @@ static struct inode *ext4_alloc_inode(st
   ei-i_datasync_tid = 0;
   atomic_set(ei-i_ioend_count, 0);
   atomic_set(ei-i_aiodio_unwritten, 0);
 + init_rwsem(ei-truncate_lock);
  
   return ei-vfs_inode;
  }
 Index: linux-2.6/fs/ext4/ext4.h
 ===
 --- linux-2.6.orig/fs/ext4/ext4.h 2011-06-20 14:25:33.752581461 +0200
 +++ linux-2.6/fs/ext4/ext4.h  2011-06-20 14:27:06.272576777 +0200
 @@ -844,6 +844,9 @@ struct ext4_inode_info {
   /* on-disk additional length */
   __u16 i_extra_isize;
  
 + /* protect page_mkwrite against truncates */
 + struct rw_semaphore truncate_lock;
 +
  #ifdef CONFIG_QUOTA
   /* quota space reservation, managed internally by quota code */
   qsize_t i_reserved_quota;
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

-- 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] ext4: remove i_alloc_sem abuse

2011-06-21 Thread Christoph Hellwig
On Tue, Jun 21, 2011 at 06:34:50PM +0200, Lukas Czerner wrote:
 this looks like a bug fix, so we should rather do it in a separate
 commit. Could you send it separately, or at least mention that in commit
 description ?

What looks like a bugfix?

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] ext4: remove i_alloc_sem abuse

2011-06-21 Thread Lukas Czerner
On Tue, 21 Jun 2011, Lukas Czerner wrote:

 On Mon, 20 Jun 2011, Christoph Hellwig wrote:
 
  Add a new rw_semaphore to protect page_mkwrite against truncate.  Previous
  i_alloc_sem was abused for this, but it's going away in this series.
  
  Signed-off-by: Christoph Hellwig h...@lst.de
  
  Index: linux-2.6/fs/ext4/inode.c
  ===
  --- linux-2.6.orig/fs/ext4/inode.c  2011-06-20 14:25:33.779248128 +0200
  +++ linux-2.6/fs/ext4/inode.c   2011-06-20 14:27:53.025907745 +0200
  @@ -5357,6 +5357,7 @@ int ext4_setattr(struct dentry *dentry,
  if (attr-ia_size  sbi-s_bitmap_maxbytes)
  return -EFBIG;
  }
  +   down_write((EXT4_I(inode)-truncate_lock));
  }
   
  if (S_ISREG(inode-i_mode) 
  @@ -5405,6 +5406,9 @@ int ext4_setattr(struct dentry *dentry,
  ext4_truncate(inode);
  }
   
  +   if (attr-ia_valid  ATTR_SIZE)
  +   up_write((EXT4_I(inode)-truncate_lock));
  +
 
 Hi Christoph,
 
 this looks like a bug fix, so we should rather do it in a separate
 commit. Could you send it separately, or at least mention that in commit
 description ?

That's stupid note, it is not a bugfix of course. Ignore it, sorry for
the noise.

 
 Otherwise the patch looks good.
 
 Thanks!
 -Lukas
 
  if (!rc) {
  setattr_copy(inode, attr);
  mark_inode_dirty(inode);
  @@ -5850,10 +5854,10 @@ int ext4_page_mkwrite(struct vm_area_str
  struct address_space *mapping = inode-i_mapping;
   
  /*
  -* Get i_alloc_sem to stop truncates messing with the inode. We cannot
  -* get i_mutex because we are already holding mmap_sem.
  +* Get truncate_lock to stop truncates messing with the inode. We
  +* cannot get i_mutex because we are already holding mmap_sem.
   */
  -   down_read(inode-i_alloc_sem);
  +   down_read((EXT4_I(inode)-truncate_lock));
  size = i_size_read(inode);
  if (page-mapping != mapping || size = page_offset(page)
  || !PageUptodate(page)) {
  @@ -5865,7 +5869,7 @@ int ext4_page_mkwrite(struct vm_area_str
  lock_page(page);
  wait_on_page_writeback(page);
  if (PageMappedToDisk(page)) {
  -   up_read(inode-i_alloc_sem);
  +   up_read((EXT4_I(inode)-truncate_lock));
  return VM_FAULT_LOCKED;
  }
   
  @@ -5883,7 +5887,7 @@ int ext4_page_mkwrite(struct vm_area_str
  if (page_has_buffers(page)) {
  if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
  ext4_bh_unmapped)) {
  -   up_read(inode-i_alloc_sem);
  +   up_read((EXT4_I(inode)-truncate_lock));
  return VM_FAULT_LOCKED;
  }
  }
  @@ -5912,11 +5916,11 @@ int ext4_page_mkwrite(struct vm_area_str
   */
  lock_page(page);
  wait_on_page_writeback(page);
  -   up_read(inode-i_alloc_sem);
  +   up_read((EXT4_I(inode)-truncate_lock));
  return VM_FAULT_LOCKED;
   out_unlock:
  if (ret)
  ret = VM_FAULT_SIGBUS;
  -   up_read(inode-i_alloc_sem);
  +   up_read((EXT4_I(inode)-truncate_lock));
  return ret;
   }
  Index: linux-2.6/fs/ext4/super.c
  ===
  --- linux-2.6.orig/fs/ext4/super.c  2011-06-20 14:25:33.795914793 +0200
  +++ linux-2.6/fs/ext4/super.c   2011-06-20 14:27:06.269243443 +0200
  @@ -871,6 +871,7 @@ static struct inode *ext4_alloc_inode(st
  ei-i_datasync_tid = 0;
  atomic_set(ei-i_ioend_count, 0);
  atomic_set(ei-i_aiodio_unwritten, 0);
  +   init_rwsem(ei-truncate_lock);
   
  return ei-vfs_inode;
   }
  Index: linux-2.6/fs/ext4/ext4.h
  ===
  --- linux-2.6.orig/fs/ext4/ext4.h   2011-06-20 14:25:33.752581461 +0200
  +++ linux-2.6/fs/ext4/ext4.h2011-06-20 14:27:06.272576777 +0200
  @@ -844,6 +844,9 @@ struct ext4_inode_info {
  /* on-disk additional length */
  __u16 i_extra_isize;
   
  +   /* protect page_mkwrite against truncates */
  +   struct rw_semaphore truncate_lock;
  +
   #ifdef CONFIG_QUOTA
  /* quota space reservation, managed internally by quota code */
  qsize_t i_reserved_quota;
  
  --
  To unsubscribe from this list: send the line unsubscribe linux-ext4 in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
 
 

-- 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: save preloaded extent_state's in a percpu cache V2

2011-06-21 Thread Andi Kleen
Josef Bacik jo...@redhat.com writes:

 When doing DIO tracing I noticed we were doing a ton of allocations, a lot of
 the time for extent_states.  Some of the time we don't even use the 
 prealloc'ed
 extent_state, it just get's free'd up.  So instead create a per-cpu cache like
 the radix tree stuff.  So we will check to see if our per-cpu cache has a
 prealloc'ed extent_state in it and if so we just continue, else we alloc a new
 one and fill the cache.  Then if we need to use a prealloc'ed extent_state we
 can just take it out of our per-cpu cache.  We will also refill the cache on
 free to try and limit the number of times we have to ask the allocator for
 caches.  With this patch dbench 50 goes from ~210 mb/s to ~260 mb/s.  Thanks,

You're just reimplementing a poor man's custom slab cache -- all of this is 
already
done in slab.

If the difference is really that big better fix slab and have everyone 
benefit?

Did you use slub or slab?
Did you analyze where the cycles are spent?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: save preloaded extent_state's in a percpu cache V2

2011-06-21 Thread Josef Bacik
On 06/21/2011 04:20 PM, Andi Kleen wrote:
 Josef Bacik jo...@redhat.com writes:
 
 When doing DIO tracing I noticed we were doing a ton of allocations, a lot of
 the time for extent_states.  Some of the time we don't even use the 
 prealloc'ed
 extent_state, it just get's free'd up.  So instead create a per-cpu cache 
 like
 the radix tree stuff.  So we will check to see if our per-cpu cache has a
 prealloc'ed extent_state in it and if so we just continue, else we alloc a 
 new
 one and fill the cache.  Then if we need to use a prealloc'ed extent_state we
 can just take it out of our per-cpu cache.  We will also refill the cache on
 free to try and limit the number of times we have to ask the allocator for
 caches.  With this patch dbench 50 goes from ~210 mb/s to ~260 mb/s.  Thanks,
 
 You're just reimplementing a poor man's custom slab cache -- all of this is 
 already
 done in slab.
 
 If the difference is really that big better fix slab and have everyone 
 benefit?
 
 Did you use slub or slab?
 Did you analyze where the cycles are spent?
 

Ugh slub debugging bites me again.  Thanks,

Josef

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] remove i_alloc_sem

2011-06-21 Thread Jan Kara
On Mon 20-06-11 16:15:33, Christoph Hellwig wrote:
 i_alloc_sem has always been a bit of an odd lock.  It's the only remaining
 rw_semaphore that can be released by a different thread than the one that
 locked it, and it's use case in the core direct I/O code is more like a
 counter given that the writers already have external serialization.
 
 This series removes it in favour of a simpler counter scheme, thus getting
 rid of the rw_semaphore non-owner APIs as requests by Thomas, while at the
 same time shrinking the size of struct inode by 160 bytes on 64-bit systems.
 
 The only nasty bit is that two filesystems (fat and ext4) have started
 abusing the lock for their own purposes.  I've added a new rw_semaphore
  ext4 abuse should be gone when Ted merges my rewrite of
ext4_page_mkwrite()... Ted, what happened to that patch. Should I resend
it?

Honza

-- 
Jan Kara j...@suse.cz
SUSE Labs, CR
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/12 v3] Btrfs: deal with EEXIST after iput

2011-06-21 Thread liubo
On 06/21/2011 10:00 PM, Josef Bacik wrote:
 On 06/21/2011 04:49 AM, Liu Bo wrote:
 There are two cases when BTRFS_I(inode)-logged_trans is zero:
 a) an inode is just allocated;
 b) iput an inode and reread it.

 However, in b) if btrfs is not committed yet, and this inode _may_
 still remain
 in log tree.

 So we need to check the log tree to get logged_trans a right value
 in case it hits a EEXIST while logging.
 
 Instead of doing this why not just check and see if the inode has been
 logged but the transaction has not yet been committed in
 btrfs_drop_inode?  That way the inode doesn't get evicted from cache
 until after we know it's ok and that way we don't have to waste a tree
 lookup.  Thanks,
 

Good idea, I'll follow it.

thanks,
liubo

 Josef
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Christopher Whelan wants to chat

2011-06-21 Thread Christopher Whelan
---

Christopher Whelan wants to stay in better touch using some of
Google's coolest new
products.

If you already have Gmail or Google Talk, visit:
http://mail.google.com/mail/b-ee9a324c7f-06c7f01d3e-rl8EXZ4nBadGESgcw_qYvUYj1Po
You'll need to click this link to be able to chat with Christopher Whelan.

To get Gmail - a free email account from Google with over 2,800 megabytes of
storage - and chat with Christopher Whelan, visit:
http://mail.google.com/mail/a-ee9a324c7f-06c7f01d3e-rl8EXZ4nBadGESgcw_qYvUYj1Po

Gmail offers:
- Instant messaging right inside Gmail
- Powerful spam protection
- Built-in search for finding your messages and a helpful way of organizing
  emails into conversations
- No pop-up ads or untargeted banners - just text ads and related information
  that are relevant to the content of your messages

All this, and its yours for free. But wait, there's more! By opening a Gmail
account, you also get access to Google Talk, Google's instant messaging
service:

http://www.google.com/talk/

Google Talk offers:
- Web-based chat that you can use anywhere, without a download
- A contact list that's synchronized with your Gmail account
- Free, high quality PC-to-PC voice calls when you download the Google Talk
  client

We're working hard to add new features and make improvements, so we might also
ask for your comments and suggestions periodically. We appreciate your help in
making our products even better!

Thanks,
The Google Team

To learn more about Gmail and Google Talk, visit:
http://mail.google.com/mail/help/about.html
http://www.google.com/talk/about.html

(If clicking the URLs in this message does not work, copy and paste them into
the address bar of your browser).
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at fs/btrfs/delayed-inode.c:1301!

2011-06-21 Thread Miao Xie
On Tue, 21 Jun 2011 02:08:54 +0200, David Sterba wrote:
 On Mon, Jun 20, 2011 at 06:12:10PM +0800, Miao Xie wrote:
 From 457f39393b2e3d475fbba029b90b6a4e17b94d43 Mon Sep 17 00:00:00 2001
 From: Miao Xie mi...@cn.fujitsu.com
 Date: Mon, 20 Jun 2011 17:21:51 +0800
 Subject: [PATCH] btrfs: fix inconsonant inode information

 When iputting the inode, We may leave the delayed nodes if they have some
 delayed items that have not been dealt with. So when the inode is read again,
 we must look up the relative delayed node, and use the information in it to
 initialize the inode. Or we will get inconsonant inode information.

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/delayed-inode.c |  104 
 +++---
  fs/btrfs/delayed-inode.h |1 +
  fs/btrfs/inode.c |   12 -
  3 files changed, 91 insertions(+), 26 deletions(-)

 diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
 index f1cbd02..280755e 100644
 --- a/fs/btrfs/delayed-inode.c
 +++ b/fs/btrfs/delayed-inode.c
 @@ -82,19 +82,16 @@ static inline struct btrfs_delayed_root 
 *btrfs_get_delayed_root(
  return root-fs_info-delayed_root;
  }
  
 -static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
 -struct inode *inode)
 +static struct btrfs_delayed_node *btrfs_get_delayed_node(struct inode 
 *inode)
  {
 -struct btrfs_delayed_node *node;
  struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
  struct btrfs_root *root = btrfs_inode-root;
  u64 ino = btrfs_ino(inode);
 -int ret;
 +struct btrfs_delayed_node *node;
  
 -again:
  node = ACCESS_ONCE(btrfs_inode-delayed_node);
 
 do you still need the volatile access here, after the again: label has
 been removed? it does not break things if it's there, but it raises
 questions ...

The rule of ACCESS_ONCE said by Linus is:

  if you access unlocked values, you use ACCESS_ONCE().

(See: http://yarchive.net/comp/linux/ACCESS_ONCE.html)
So I think it is still needed.
 
  if (node) {
 -atomic_inc(node-refs);/* can be accessed */
 +atomic_inc(node-refs);
  return node;
  }
  
 @@ -103,7 +100,9 @@ again:
  if (node) {
  if (btrfs_inode-delayed_node) {
  spin_unlock(root-inode_lock);
 -goto again;
 +BUG_ON(btrfs_inode-delayed_node != node);
 +atomic_inc(node-refs);/* can be accessed */
 +return node;
  }
  btrfs_inode-delayed_node = node;
  atomic_inc(node-refs);/* can be accessed */
 @@ -113,6 +112,23 @@ again:
  }
  spin_unlock(root-inode_lock);
  
 +return NULL;
 +}
 +
 +static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
 +struct inode *inode)
 +{
 +struct btrfs_delayed_node *node;
 +struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
 +struct btrfs_root *root = btrfs_inode-root;
 +u64 ino = btrfs_ino(inode);
 +int ret;
 +
 +again:
 +node = btrfs_get_delayed_node(inode);
 
 ... aha, it's been somehow moved here, which copies the original logic.
 Now reading inode-delayed_node is inside a function and I do not think
 that compiler could optimize reading value of btrfs_inode-delayed_node
 that it would require ACCESS_ONCE.

The reason that I rewrote btrfs_get_delayed_node() is:
The old btrfs_get_delayed_node() may ignore the old delayed node which holds the
up-to-date information (such as: new directory indexes, up-to-date inode 
information),
considers there is no relative delayed node. And then btrfs will use the max 
index number
in the fs tree to initialize -index_cnt, but in fact, this number is not right,
perhaps there are some directory indexes in the delayed node, the index number 
of them
is greater than the one in the fs tree. In this way, the same index number may 
be allocated
twice, and hit EEXIST error.

 
 And there is another ACCESS_ONCE in btrfs_remove_delayed_node. I wonder
 what's the reason for that. Sorry to abuse this thread, but I'd like to
 be sure about protection of the -delayed_node members inside
 btrfs_inode. Can you please comment on that?

OK, I'll write some comment.

 If you recall one message from the original report:
 
 [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0)
 into the insertion tree of the delayed node(root id:
 262, inode id: 258, errno: -17)
 
 (-17 == -EEXIST)
 
 a printk after return from __btrfs_add_delayed_item (which is
 able to return -EEXIST) in btrfs_insert_delayed_dir_index. I haven't
 looked farther, but it seems that the item is being inserted (at least)
 twice and I suspect missing locking or other type of protection.

I don't think.
The reason is above.

Thanks
Miao

 
 
 thanks,
 david
 
 +if (node)
 +return node;
 +
  node = 

[PATCH] Btrfs: remove unneeded BUG_ON()

2011-06-21 Thread Tsutomu Itoh
The following functions always return 0.
  - add_delayed_ref_head()
  - add_delayed_tree_ref()
  - add_delayed_data_ref()
  - add_excluded_extent()

Therefore, check by BUG_ON() is unnecessary at the caller of these
functions. 

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 fs/btrfs/delayed-ref.c |   32 +---
 fs/btrfs/extent-tree.c |   14 --
 2 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 125cf76..ce0c728 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -588,7 +588,6 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle 
*trans,
struct btrfs_delayed_tree_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
-   int ret;
 
BUG_ON(extent_op  extent_op-is_data);
ref = kmalloc(sizeof(*ref), GFP_NOFS);
@@ -610,13 +609,12 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle 
*trans,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   ret = add_delayed_ref_head(trans, head_ref-node, bytenr, num_bytes,
-  action, 0);
-   BUG_ON(ret);
+   add_delayed_ref_head(trans, head_ref-node, bytenr, num_bytes,
+action, 0);
+
+   add_delayed_tree_ref(trans, ref-node, bytenr, num_bytes,
+parent, ref_root, level, action);
 
-   ret = add_delayed_tree_ref(trans, ref-node, bytenr, num_bytes,
-  parent, ref_root, level, action);
-   BUG_ON(ret);
spin_unlock(delayed_refs-lock);
return 0;
 }
@@ -633,7 +631,6 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle 
*trans,
struct btrfs_delayed_data_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
-   int ret;
 
BUG_ON(extent_op  !extent_op-is_data);
ref = kmalloc(sizeof(*ref), GFP_NOFS);
@@ -655,13 +652,12 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle 
*trans,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   ret = add_delayed_ref_head(trans, head_ref-node, bytenr, num_bytes,
-  action, 1);
-   BUG_ON(ret);
+   add_delayed_ref_head(trans, head_ref-node, bytenr, num_bytes,
+action, 1);
+
+   add_delayed_data_ref(trans, ref-node, bytenr, num_bytes,
+parent, ref_root, owner, offset, action);
 
-   ret = add_delayed_data_ref(trans, ref-node, bytenr, num_bytes,
-  parent, ref_root, owner, offset, action);
-   BUG_ON(ret);
spin_unlock(delayed_refs-lock);
return 0;
 }
@@ -672,7 +668,6 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle 
*trans,
 {
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
-   int ret;
 
head_ref = kmalloc(sizeof(*head_ref), GFP_NOFS);
if (!head_ref)
@@ -683,10 +678,9 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle 
*trans,
delayed_refs = trans-transaction-delayed_refs;
spin_lock(delayed_refs-lock);
 
-   ret = add_delayed_ref_head(trans, head_ref-node, bytenr,
-  num_bytes, BTRFS_UPDATE_DELAYED_HEAD,
-  extent_op-is_data);
-   BUG_ON(ret);
+   add_delayed_ref_head(trans, head_ref-node, bytenr,
+num_bytes, BTRFS_UPDATE_DELAYED_HEAD,
+extent_op-is_data);
 
spin_unlock(delayed_refs-lock);
return 0;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1f61bf5..15286d1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -225,9 +225,7 @@ static int exclude_super_stripes(struct btrfs_root *root,
if (cache-key.objectid  BTRFS_SUPER_INFO_OFFSET) {
stripe_len = BTRFS_SUPER_INFO_OFFSET - cache-key.objectid;
cache-bytes_super += stripe_len;
-   ret = add_excluded_extent(root, cache-key.objectid,
- stripe_len);
-   BUG_ON(ret);
+   add_excluded_extent(root, cache-key.objectid, stripe_len);
}
 
for (i = 0; i  BTRFS_SUPER_MIRROR_MAX; i++) {
@@ -239,9 +237,7 @@ static int exclude_super_stripes(struct btrfs_root *root,
 
while (nr--) {
cache-bytes_super += stripe_len;
-   ret = add_excluded_extent(root, logical[nr],
- stripe_len);
-   BUG_ON(ret);
+   add_excluded_extent(root, logical[nr], stripe_len);
}
 
kfree(logical);