block rsv returned -28 during balance
Hello,

On a 3.6.0-rc7 kernel, I launched:

  # btrfs fi balance start -f -mconvert=single /mnt/tmp/

Current situation:

  # df -h /mnt/tmp/
  Filesystem             Size  Used Avail Use% Mounted on
  /dev/mapper/alpha-lv1  3.6T  2.7T  801G  78% /mnt/tmp

  # btrfs fi df /mnt/tmp/
  Data: total=3.00TB, used=2.66TB
  System: total=4.00MB, used=364.00KB
  Metadata, DUP: total=11.00GB, used=5.72GB
  Metadata: total=63.00GB, used=0.00

There seems to be plenty of free space, but the balance seems to have stalled and the dmesg is being filled with messages like this:

  [ 2926.465406] btrfs: block rsv returned -28
  [ 2926.465411] [ cut here ]
  [ 2926.465446] WARNING: at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6323 use_block_rsv+0x19f/0x1b0 [btrfs]()
  [ 2926.465450] Hardware name: VirtualBox
  [ 2926.465452] Modules linked in: joydev microcode parport_pc hid_generic parport psmouse serio_raw pcspkr i2c_piix4 mac_hid xfs btrfs libcrc32c zlib_deflate raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq usbhid hid e1000
  [ 2926.465517] Pid: 4682, comm: btrfs Tainted: GW 3.6.0-030600rc7-generic #201209232235
  [ 2926.465520] Call Trace:
  [ 2926.465532] [81056fff] warn_slowpath_common+0x7f/0xc0
  [ 2926.465539] [8105705a] warn_slowpath_null+0x1a/0x20
  [ 2926.465569] [a00c4edf] use_block_rsv+0x19f/0x1b0 [btrfs]
  [ 2926.465599] [a00c860d] btrfs_alloc_free_block+0x3d/0x220 [btrfs]
  [ 2926.465625] [a00b45b4] ? __btrfs_cow_block+0x324/0x4f0 [btrfs]
  [ 2926.465663] [a00f516c] ? read_extent_buffer+0xbc/0x120 [btrfs]
  [ 2926.465689] [a00b5dfc] ? comp_keys+0x2c/0x30 [btrfs]
  [ 2926.465715] [a00b43b2] __btrfs_cow_block+0x122/0x4f0 [btrfs]
  [ 2926.465745] [a00cfff0] ? verify_parent_transid+0x170/0x170 [btrfs]
  [ 2926.465771] [a00b487c] btrfs_cow_block+0xfc/0x220 [btrfs]
  [ 2926.465808] [a0117c6f] do_relocation+0x46f/0x560 [btrfs]
  [ 2926.465815] [8169e85e] ? _raw_spin_lock+0xe/0x20
  [ 2926.465842] [a00bc90b] ? block_rsv_add_bytes+0x5b/0x80 [btrfs]
  [ 2926.465878] [a0117fa4] relocate_tree_block+0x244/0x280 [btrfs]
  [ 2926.465914] [a011be73] relocate_tree_blocks+0x123/0x1a0 [btrfs]
  [ 2926.465950] [a011ccba] relocate_block_group+0x1fa/0x560 [btrfs]
  [ 2926.466009] [a011d1df] btrfs_relocate_block_group+0x1bf/0x2f0 [btrfs]
  [ 2926.466049] [a00f83a6] btrfs_relocate_chunk.isra.53+0x56/0x3b0 [btrfs]
  [ 2926.466086] [a00eeda9] ? release_extent_buffer.isra.37+0x39/0x60 [btrfs]
  [ 2926.466092] [8169e85e] ? _raw_spin_lock+0xe/0x20
  [ 2926.466128] [a00f43b7] ? free_extent_buffer+0x37/0x90 [btrfs]
  [ 2926.466164] [a00fc602] __btrfs_balance+0x302/0x3e0 [btrfs]
  [ 2926.466201] [a00fc9d5] btrfs_balance+0x2f5/0x4d0 [btrfs]
  [ 2926.466238] [a0104e74] btrfs_ioctl_balance+0x114/0x440 [btrfs]
  [ 2926.466275] [a0106938] btrfs_ioctl+0x428/0x950 [btrfs]
  [ 2926.466282] [8115a7b6] ? do_brk+0x226/0x300
  [ 2926.466290] [8119878a] do_vfs_ioctl+0x8a/0x340
  [ 2926.466297] [81198ad1] sys_ioctl+0x91/0xa0
  [ 2926.466304] [816a70ad] system_call_fastpath+0x1a/0x1f
  [ 2926.466308] ---[ end trace 37b1b50f9306e0b3 ]---

-- 
With respect,
Roman

~~~
Stallman had a printer, with code he could not see.
So he began to tinker, and set the software free.
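A note on the number in the report above: the -28 in "block rsv returned -28" is a negated errno value, and on Linux errno 28 is ENOSPC ("No space left on device"). The small helper below is only an illustration of how such a return code can be decoded in user space; the name describe_kernel_err() is made up for this sketch and is not part of btrfs.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not a btrfs function): turn a kernel-style
 * negative return code such as -28 into a human-readable message.
 */
const char *describe_kernel_err(int ret)
{
	/* Kernel code returns -errno, so negate before looking it up. */
	return strerror(-ret);
}
```

Calling describe_kernel_err(-28) yields the C library's text for ENOSPC, which matches the symptom in the report: the block reservation could not find space even though df shows free room.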
[PATCH] Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
Because the extent_map pointer here is either a valid pointer or NULL, the IS_ERR check is unnecessary.

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 979fa0d..576ed9f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2071,7 +2071,7 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
 	}
 	read_unlock(&em_tree->lock);
 
-	if (!em || IS_ERR(em)) {
+	if (!em) {
 		kfree(failrec);
 		return -EIO;
 	}
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
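To illustrate why the second half of the old condition is dead code when a lookup can only return a valid pointer or NULL, here is a user-space sketch of the error-pointer convention. The sketch_* names and lookup_sketch() are hypothetical stand-ins written for this note; they mimic the idea behind the kernel's ERR_PTR()/IS_ERR() (an errno encoded in the top 4095 values of the address space) but are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value (stand-in for ERR_PTR). */
void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO values (stand-in for IS_ERR). */
int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Hypothetical lookup that, like the one discussed in the patch, reports
 * "not found" with NULL and never returns an encoded error.
 */
int *lookup_sketch(int found)
{
	static int item = 42;

	return found ? &item : NULL;
}
```

NULL is not an error pointer under this scheme, so a caller of such a lookup needs only the NULL test; the error-pointer test can never be true for it.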
[PATCH] Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
We should check extent_map before calling trace_btrfs_get_extent(), because extent_map may be NULL.

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cad0c57..b8f53e8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6123,7 +6123,8 @@ insert:
 	write_unlock(&em_tree->lock);
 out:
 
-	trace_btrfs_get_extent(root, em);
+	if (em)
+		trace_btrfs_get_extent(root, em);
 
 	if (path)
 		btrfs_free_path(path);
-- 
1.7.11.4
Tunning - cache write (database)
Hi,

First, sorry if this isn't the place to get this kind of help. If not, I would appreciate a link or forum where I can try to get some answers.

My problem:
* Using btrfs + compression, a flush of 60 MB/s takes 4 minutes (during these 4 minutes there is constant I/O of about 4 MB/s on the disks). The flush comes from an Informix database.

The environment:
* Virtualized environment
* OpenSuse 12.1 64bits, running over VmWare ESXi 5
* Btrfs version: btrfsprogs-0.19-43.1.2.x86_64
* Kernel: Linux jdivm06 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27 05:21:40 UTC 2012 (d016078) x86_64 x86_64 x86_64 GNU/Linux
* The file system is used to keep the Informix database data (chunks).
* 8 cores (Intel i7), so all btrfs threads are able to work in parallel.

The file system mount:

  root@jdivm06:/proc/sys/fs# mount |grep ifx
  /dev/sdb1 on /ifxdados type btrfs (rw,nosuid,nodev,noexec,noatime,compress=zlib,space_cache)

My question, about what I believe would help to avoid this long flush:
* Is there some way to force this flush to go entirely to the in-memory cache, and then let the btrfs background process flush it to disk?

Security and recovery aren't a priority for now, because this is part of a database bulk load. After it finishes, integrity will be desirable (not an obligation, since this is a test environment). For now, performance is the main requirement.

Additional information:

  root@jdivm06:/proc/sys/fs# cat /proc/sys/vm/dirty_ratio
  50
  root@jdivm06:/proc/sys/fs# cat /proc/sys/vm/dirty_background_ratio
  10

Thanks
Cesar
Re: Tunning - cache write (database)
On Mon, Oct 1, 2012 at 8:27 PM, Cesar Inacio Martins cesar_inacio_mart...@yahoo.com.br wrote:
> My problem:
> * Using btrfs + compression , flush of 60 MB/s take 4 minutes
> (on this 4 minutes they keep constantly I/O of +- 4MB/s on the disks)
> (flush from Informix database)

> * OpenSuse 12.1 64bits, running over VmWare ESXi 5
> * Btrfs version : btrfsprogs-0.19-43.1.2.x86_64
> * Kernel : Linux jdivm06 3.1.10-1.16-desktop #1 SMP PREEMPT Wed Jun 27

> My question, what I believed will help to avoid this long flush :
> * Have some way to force this flush all in memory cache and then use the
> btrfs background process to flush to disk ...
> Security and recover aren't a priority for now, because this is part of a
> database bulkload ...after finish , integrity will be desirable (not a
> obligation, since this is a test environment)
> For now, performance is the mainly requirement...

I suggest you start by reading http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg18827.html

After that, PROBABLY start your database by preloading libeatmydata to disable fsync completely.

On a side note, zfs has a sync property which, when set to disabled, has pretty much the same effect as libeatmydata.

-- 
Fajar
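The libeatmydata suggestion relies on symbol interposition: a preloaded library defines its own fsync-family functions, and the dynamic linker resolves the application's calls to those instead of the C library's. The fragment below is a minimal sketch of that idea only, not libeatmydata's actual source; the real library covers more entry points. It assumes a glibc/Linux environment, and the library name used in the usage note is made up.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Sketch of an fsync interposer: report success without asking the
 * kernel to write anything out. Data written this way can be lost on
 * a crash or power failure, so this is only for throwaway workloads
 * such as the test bulk load described in the thread.
 */
int fsync(int fd)
{
	(void)fd;	/* deliberately ignored */
	return 0;
}

/* Same treatment for fdatasync, which databases also commonly call. */
int fdatasync(int fd)
{
	(void)fd;
	return 0;
}
```

Built as a shared object (for example "gcc -shared -fPIC -o libnosync.so nosync.c", where both file names are hypothetical) and started with LD_PRELOAD=./libnosync.so in front of the database command, the process would see every fsync() return immediately. Dirty pages are then written back only by the kernel's normal background writeback.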
Re: [PATCH] Btrfs: try to avoid doing a search in btrfs_next_leaf
On Sun, Sep 30, 2012 at 05:28:23AM -0600, Alex Lyakas wrote:
> Hi Josef,
> I guess I am missing something, but I thought btrfs_next_leaf() should
> just jump to the next leaf (or item, if it was added meanwhile)
> irrespective of the key that is in the last slot of the current leaf.
> The change you added is effective when there is a next leaf, but you
> refuse to go there unless its first key has the same objectid. (I
> think you use the ctree property that the key in the first node/leaf
> of a tree block is equal to its parent's key).
> Can you pls explain why you insist on the same objectid?

It's to avoid another search forward. Unless I missed something, everybody who calls btrfs_next_leaf() only wants to walk forward based on a particular item. If I've missed something and that's not the case then this needs to be dropped, or we can set some flag in path to ignore this logic, either way. Thanks,

Josef
Re: [PATCH] Btrfs: be smarter about dropping things from the tree log
On Fri, Sep 28, 2012 at 11:45:21AM -0600, Zach Brown wrote:
> > When we truncate existing items in the tree log we've been searching for
> > each individual item and removing them. This is unnecessary churn and
> > searching, just keep track of the slot we are on and how many items we need
> > to delete and delete them all at once. Thanks,
>
> (speaking of unnecessary churn :))
>
> > +next_slot:
> >  		path->slots[0]--;
> > +		btrfs_item_key_to_cpu(path->nodes[0], &found_key, path->slots[0]);
> >  		if (found_key.objectid != objectid)
> >  			break;
> > -		ret = btrfs_del_item(trans, log, path);
> > +		start_slot = path->slots[0];
> > +		del_nr++;
> > +		if (start_slot)
> > +			goto next_slot;
>
> A linear backwards scan? Of potentially 64k leaves?
>
> Can you use bin_search() to look for the first key >= [objectid,0,0] in
> the leaf? And probably a single comparison of slot 0 to fast path the
> case where the whole leaf contains the object id?

Yeah I can do that.

> > +		ret = btrfs_del_items(trans, log, path, start_slot, del_nr);
> >  		if (ret)
> >  			break;
> >  		btrfs_release_path(path);
> >  	}
> > +	if (!ret && del_nr)
> > +		ret = btrfs_del_items(trans, log, path, start_slot, del_nr);
> >  	btrfs_release_path(path);
>
> You shouldn't have to duplicate deletion and releasing the path if you
> wrap the calculation of start_slot and nr in a helper. Something like:
>
> 	nr = find_nr_and_slot_doo_de_doo(, start_slot);
> 	if (nr > 0)
> 		btrfs_del_items(, start_slot, nr);

K, I'll fix that up, thanks.

Josef
Re: [RFC][PATCH V2 0/4] Btrfs: introduce extent buffer cache to btrfs
Miao Xie miaox at cn.fujitsu.com writes:

> This patchset introduces an extent buffer cache to btrfs. The basic idea is
> to reduce the search time and the contention on the extent buffer lock by
> re-using the last search result.
>
> I ran stress.sh, xfstests and some other tools to test it, all of them
> worked well.
>
> As a performance improvement patchset, of course we did performance tests.
> Because this patchset is to improve the b+ tree search, in other words, it
> improves the performance of the metadata operations, we use a file creation
> test to measure it. So we ran 10 tasks, and all of them created 10 files in
> their own directories at the same time. As the result, we found this
> patchset makes btrfs ~20% faster (98s -> 77s).
>
> You can pull this patchset from the URL
>     git://github.com/miaoxie/linux-btrfs.git extent-buffer-cache
>
> Thanks
> Miao
> ---

I tested the patchset with aim7's fileserver test with 16 processes on a RAM-emulated SCSI btrfs file system. I got about 18% speedup in throughput. The workload is a mix of file copy, read, write and file sync.

The contention on the btrfs_tree_lock operation is reduced with the patchset by about 8.4% of cpu cycles (from 48.3% to 39.9%).

Miao, I can send you the detailed profile if you are interested.

Thanks.

Tim Chen
[PATCH] Btrfs: be smarter about dropping things from the tree log V2
When we truncate existing items in the tree log we've been searching for each individual item and removing them. This is unnecessary churn and searching, just keep track of the slot we are on and how many items we need to delete and delete them all at once. Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
V1->V2: do as zach suggested and bin search down to the lowest key in this leaf and delete from there. In practice this won't be more than 2 or 3 items, but if you have lots of xattrs it could be a lot more.

 fs/btrfs/tree-log.c | 15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 4e468a0..a5dcf71 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2872,6 +2872,7 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 	int ret;
 	struct btrfs_key key;
 	struct btrfs_key found_key;
+	int start_slot;
 
 	key.objectid = objectid;
 	key.type = max_key_type;
@@ -2893,8 +2894,18 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 		if (found_key.objectid != objectid)
 			break;
 
-		ret = btrfs_del_item(trans, log, path);
-		if (ret)
+		found_key.offset = 0;
+		found_key.type = 0;
+		ret = btrfs_bin_search(path->nodes[0], &found_key, 0,
+				       &start_slot);
+
+		ret = btrfs_del_items(trans, log, path, start_slot,
+				      path->slots[0] - start_slot + 1);
+		/*
+		 * If start slot isn't 0 then we don't need to re-search, we've
+		 * found the last guy with the objectid in this tree.
+		 */
+		if (ret || start_slot != 0)
 			break;
 		btrfs_release_path(path);
 	}
-- 
1.7.7.6
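The V2 approach (binary-search to the lowest key for the objectid in the leaf, then delete the whole range in one call) can be sketched in user space on a plain sorted array. Everything below is an illustration under assumptions: struct sketch_key, first_slot_for_objectid() and demo_leaf are made-up names for this note and stand in for a btrfs leaf and btrfs_bin_search(); they are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a btrfs key: sorted by (objectid, type, offset). */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/*
 * Lower-bound binary search: index of the first key that is
 * >= (objectid, 0, 0). Because (objectid, 0, 0) is the smallest possible
 * key for that objectid, comparing objectids alone is sufficient.
 * Returns nr when every key in the leaf is smaller.
 */
size_t first_slot_for_objectid(const struct sketch_key *keys, size_t nr,
			       uint64_t objectid)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid].objectid < objectid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Example leaf contents: slots 2..4 belong to objectid 257. */
const struct sketch_key demo_leaf[6] = {
	{ 256, 1, 0 }, { 256, 12, 0 },
	{ 257, 1, 0 }, { 257, 12, 256 }, { 257, 24, 100 },
	{ 258, 1, 0 },
};
```

With the search positioned on slot 4 (the last item of objectid 257), the start slot comes back as 2, so the range to delete is 4 - 2 + 1 = 3 items, mirroring the "path->slots[0] - start_slot + 1" expression in the patch. A start slot of 0 means earlier leaves may still hold items for the same objectid, which is why the patch only stops early when start_slot != 0.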
[RFC] [PATCH] Btrfs: rework can_nocow_odirect
I need everybody to go over this with a fine toothed comb since it is a pretty big change. I think it is right and it seems to come out right, but if it's not it will mean we screw up O_DIRECT on snapshotted files with preallocated extents, so please, make sure it is correct :).

---
Subject: [PATCH] Btrfs: rework can_nocow_odirect

We are always doing the file extent lookup in here even though we've already done the btrfs_get_extent which does the exact same thing. So re-work can_nocow_odirect to get the same information out of the extent_map we already have and then do the cross ref check and csum checks as appropriate. This reduces the number of allocations and searches we do for every O_DIRECT write and man it helps a lot. Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/inode.c | 93 ++---
 1 files changed, 25 insertions(+), 68 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c785c0..1cd7a6b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6352,79 +6352,42 @@ out:
  * block must be cow'd
  */
 static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
-				      struct inode *inode, u64 offset, u64 len)
+				      struct inode *inode,
+				      struct extent_map *em, u64 offset,
+				      u64 len)
 {
-	struct btrfs_path *path;
-	int ret;
-	struct extent_buffer *leaf;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_file_extent_item *fi;
-	struct btrfs_key key;
 	u64 disk_bytenr;
 	u64 backref_offset;
 	u64 extent_end;
 	u64 num_bytes;
-	int slot;
-	int found_type;
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	ret = btrfs_lookup_file_extent(trans, root, path, btrfs_ino(inode),
-				       offset, 0);
-	if (ret < 0)
-		goto out;
 
-	slot = path->slots[0];
-	if (ret == 1) {
-		if (slot == 0) {
-			/* can't find the item, must cow */
-			ret = 0;
-			goto out;
-		}
-		slot--;
-	}
-	ret = 0;
-	leaf = path->nodes[0];
-	btrfs_item_key_to_cpu(leaf, &key, slot);
-	if (key.objectid != btrfs_ino(inode) ||
-	    key.type != BTRFS_EXTENT_DATA_KEY) {
-		/* not our file or wrong item type, must cow */
-		goto out;
-	}
-
-	if (key.offset > offset) {
-		/* Wrong offset, must cow */
-		goto out;
-	}
-
-	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
-	found_type = btrfs_file_extent_type(leaf, fi);
-	if (found_type != BTRFS_FILE_EXTENT_REG &&
-	    found_type != BTRFS_FILE_EXTENT_PREALLOC) {
-		/* not a regular extent, must cow */
-		goto out;
-	}
-	disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
-	backref_offset = btrfs_file_extent_offset(leaf, fi);
+	if (em->block_start == EXTENT_MAP_INLINE ||
+	    em->block_start == EXTENT_MAP_HOLE)
+		return 0;
 
-	extent_end = key.offset + btrfs_file_extent_num_bytes(leaf, fi);
-	if (extent_end < offset + len) {
-		/* extent doesn't include our full range, must cow */
-		goto out;
-	}
+	/*
+	 * The em's disk_bytenr is already adjusted for its offset so we need to
+	 * adjust it accordingly.
+	 */
+	backref_offset = em->start - em->orig_start;
+	disk_bytenr = em->block_start - backref_offset;
+	extent_end = em->start + em->len;
 
 	if (btrfs_extent_readonly(root, disk_bytenr))
-		goto out;
+		return 0;
 
 	/*
 	 * look for other files referencing this extent, if we
 	 * find any we must cow
 	 */
 	if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
-				  key.offset - backref_offset, disk_bytenr))
-		goto out;
+				  em->orig_start, disk_bytenr))
+		return 0;
+
+	/* No prealloc, we won't have csums */
+	if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+		return 1;
 
 	/*
 	 * adjust disk_bytenr and num_bytes to cover just the bytes
@@ -6433,18 +6396,12 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans,
 	 * to keep the csums correct
 	 */
 	disk_bytenr += backref_offset;
-	disk_bytenr += offset - key.offset;
+	disk_bytenr += offset - em->start;
 	num_bytes = min(offset + len, extent_end) - offset;
 	if (csum_exist_in_range(root, disk_bytenr, num_bytes))
-		goto out;
-	/*
-	 * all of the above have
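Since the patch asks for careful review of the offset arithmetic, a worked example may help. The sketch below replays the three computations from the new code (backref_offset, the on-disk extent start, and the range handed to the csum check) on a small struct. The struct sketch_em type, the sketch_* helpers and the numbers in demo_em are all hypothetical and chosen only to make the arithmetic easy to follow; the field names mirror the ones the patch reads from the extent_map.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the extent_map fields the patch uses. */
struct sketch_em {
	uint64_t start;		/* file offset this mapping begins at */
	uint64_t orig_start;	/* file offset where the on-disk extent begins */
	uint64_t len;		/* length of this mapping */
	uint64_t block_start;	/* disk byte matching 'start' */
};

/* backref_offset = em->start - em->orig_start */
uint64_t sketch_backref_offset(const struct sketch_em *em)
{
	return em->start - em->orig_start;
}

/* disk_bytenr = em->block_start - backref_offset (start of the extent on disk) */
uint64_t sketch_extent_disk_start(const struct sketch_em *em)
{
	return em->block_start - sketch_backref_offset(em);
}

/* First disk byte covered by a write at file offset 'offset'. */
uint64_t sketch_csum_range_start(const struct sketch_em *em, uint64_t offset)
{
	return sketch_extent_disk_start(em) + sketch_backref_offset(em) +
	       (offset - em->start);
}

/* num_bytes = min(offset + len, extent_end) - offset */
uint64_t sketch_csum_range_len(const struct sketch_em *em, uint64_t offset,
			       uint64_t len)
{
	uint64_t extent_end = em->start + em->len;
	uint64_t end = offset + len < extent_end ? offset + len : extent_end;

	return end - offset;
}

/* Mapping that starts 4096 bytes into an extent located at 1 MiB on disk. */
const struct sketch_em demo_em = {
	.start = 8192,
	.orig_start = 4096,
	.len = 16384,
	.block_start = 1048576 + 4096,
};
```

For demo_em, backref_offset is 4096 and the extent starts at disk byte 1048576, which is the value used for the readonly and cross-reference checks. A 4096-byte write at file offset 12288 then maps to disk byte 1056768, and a write that runs past the end of the mapping is clamped to extent_end.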
[BTRFS-PROGS][PATCH] pretty_sizes() returns incorrect values
From: Goffredo Baroncelli kreij...@inwind.it

pretty_sizes() returns incorrect values if the argument is < 1024.

pretty_sizes(0)    -> 0.00    OK
pretty_sizes(102)  -> 0.10    WRONG
pretty_sizes(1023) -> 1.00    WRONG
pretty_sizes(1024) -> 1.00KB  OK

Signed-off-by: Goffredo Baroncelli kreij...@inwind.it
---
 utils.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/utils.c b/utils.c
index aade9e2..04c3e82 100644
--- a/utils.c
+++ b/utils.c
@@ -1097,25 +1097,27 @@ char *pretty_sizes(u64 size)
 {
 	int num_divs = 0;
 	int pretty_len = 16;
-	u64 last_size = size;
-	u64 fract_size = size;
 	float fraction;
 	char *pretty;
 
-	while(size > 0) {
-		fract_size = last_size;
-		last_size = size;
-		size /= 1024;
-		num_divs++;
-	}
-	if (num_divs == 0)
-		num_divs = 1;
-	if (num_divs > ARRAY_SIZE(size_strs))
-		return NULL;
+	if( size < 1024 ){
+		fraction = size;
+		num_divs = 0;
+	} else {
+		u64 last_size = size;
+		num_divs = 0;
+		while(size >= 1024){
+			last_size = size;
+			size /= 1024;
+			num_divs ++;
+		}
 
-	fraction = (float)fract_size / 1024;
+		if (num_divs > ARRAY_SIZE(size_strs))
+			return NULL;
+		fraction = (float)last_size / 1024;
+	}
 	pretty = malloc(pretty_len);
-	snprintf(pretty, pretty_len, "%.2f%s", fraction, size_strs[num_divs-1]);
+	snprintf(pretty, pretty_len, "%.2f%s", fraction, size_strs[num_divs]);
 	return pretty;
 }
-- 
1.7.10.4
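The patched logic can be tried outside btrfs-progs with a standalone version. This is a sketch under assumptions rather than the utils.c code itself: the function is renamed pretty_sizes_sketch(), u64 is replaced by uint64_t, the contents of size_strs are assumed (an empty first entry for plain bytes, then KB, MB and so on), and the bounds check uses >= and a malloc failure check is added, both of which are stricter than the posted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed unit table: index 0 is "no suffix" for sizes below 1024. */
static const char *const size_strs[] = { "", "KB", "MB", "GB", "TB", "PB", "EB" };

/* Returns a malloc'ed string the caller must free, or NULL on failure. */
char *pretty_sizes_sketch(uint64_t size)
{
	size_t num_divs = 0;
	const size_t pretty_len = 16;
	double fraction;
	char *pretty;

	if (size < 1024) {
		/* Small values are printed as-is, with no unit suffix. */
		fraction = (double)size;
	} else {
		uint64_t last_size = size;

		/* Remember the value before the final division by 1024. */
		while (size >= 1024) {
			last_size = size;
			size /= 1024;
			num_divs++;
		}
		/* Stricter than the patch: never index past the table. */
		if (num_divs >= ARRAY_SIZE(size_strs))
			return NULL;
		fraction = (double)last_size / 1024;
	}

	pretty = malloc(pretty_len);
	if (!pretty)
		return NULL;
	snprintf(pretty, pretty_len, "%.2f%s", fraction, size_strs[num_divs]);
	return pretty;
}
```

With this version the two cases marked WRONG in the patch description come out as the plain byte counts: 102 is printed as 102.00 and 1023 as 1023.00, while 1024 still prints as 1.00KB.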
Re: Tunning - cache write (database)
On Tue, Oct 2, 2012 at 3:16 AM, Clemens Eisserer linuxhi...@gmail.com wrote:
>> I suggest you start by reading
>> http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg18827.html
>>
>> After that, PROBABLY start your database by preloading libeatmydata to
>> disable fsync completely.
>
> Which will cure the symptoms, not the issue itself - I remember the
> same advice was given for Reiser4 back then ;)
> Usually for non-toy use-cases data is too valuable to just disable fsync.

The OP DID say he doesn't really care about security, recovery, nor integrity (or at least, it's not an obligation) :D

Other than trying the latest -rc and using libeatmydata, I can't see what else can be done to improve current db performance on btrfs. As the list archive shows, zfs is currently MUCH more suitable for that.

-- 
Fajar