Re: User feedback: raise the default leaf size to 16k
On 2013/02/13 12:33 PM, Holger Hoffstaette wrote:
> - raise the leaf size to 16k
> - use single metadata profile
> ...
> the difference in behaviour on a single disk is *very* noticeable.

Did you try an isolated change of leaf size? I think the devs would be willing to look into the default size if it makes a dramatic difference on its own. Personally I think you are seeing an improvement more as a result of the metadata profile rather than the leafsize.

I don't think changing the default profile for metadata will be easily entertained as this is very important for protecting against corruption due to bitrot.

--
__ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs/raid56: Add missing #include <linux/vmalloc.h>
tilegx_defconfig:

fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
---
http://kisskb.ellerman.id.au/kisskb/buildresult/8311887/

 fs/btrfs/raid56.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0722205..9a79fb7 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -31,6 +31,7 @@
 #include <linux/hash.h>
 #include <linux/list_sort.h>
 #include <linux/raid/xor.h>
+#include <linux/vmalloc.h>
 #include <asm/div64.h>
 #include "compat.h"
 #include "ctree.h"
--
1.7.0.4
Re: [PATCH] btrfs/raid56: Add missing #include <linux/vmalloc.h>
On Sun, Mar 03, 2013 at 04:44:41AM -0700, Geert Uytterhoeven wrote:
> tilegx_defconfig:
>
> fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
> fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
> fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default]
> fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

Thanks, I've got this one in my for-linus now. It'll go with the next pull.

-chris
[PATCH] btrfs: fix compile failure on parisc
x86 seems to include vmalloc.h by default along some of its arch paths, but most other architectures don't, leading to this compile failure:

fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table':
fs/btrfs/raid56.c:206: error: implicit declaration of function 'vzalloc'
fs/btrfs/raid56.c:206: warning: assignment makes pointer from integer without a cast
fs/btrfs/raid56.c:226: error: implicit declaration of function 'vfree'
make[2]: *** [fs/btrfs/raid56.o] Error 1

Fix this by adding vmalloc.h explicitly to the includes list.

Signed-off-by: James Bottomley jbottom...@parallels.com
---
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0722205..1f0f57e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -30,6 +30,7 @@
 #include <linux/raid/pq.h>
 #include <linux/hash.h>
 #include <linux/list_sort.h>
+#include <linux/vmalloc.h>
 #include <linux/raid/xor.h>
 #include <asm/div64.h>
 #include "compat.h"
[GIT PULL] Btrfs fixup
Hi Linus,

Geert and James both sent this one in, sorry guys.

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

Geert Uytterhoeven (1) commits (+1/-0):
    btrfs/raid56: Add missing #include <linux/vmalloc.h>

Total: (1) commits (+1/-0)

 fs/btrfs/raid56.c | 1 +
 1 file changed, 1 insertion(+)
same EXTENT_ITEM appears twice in the extent tree
Greetings all,

I have an extent tree that looks as follows:

item 22 key (27059916800 EXTENT_ITEM 16384) itemoff 2656 itemsize 24
    extent refs 1 gen 164 flags 1
item 23 key (27059916800 EXTENT_ITEM 98304) itemoff 2603 itemsize 53
    extent refs 1 gen 165 flags 1
    extent data backref root 257 objectid 257 offset 17446191104 count 1
item 24 key (27059916800 SHARED_DATA_REF 47169536) itemoff 2599 itemsize 4
    shared data backref count 1

As can be seen, the same EXTENT_ITEM start (27059916800) appears twice, with two different lengths. This went undetected until __btrfs_free_extent was called, after the cleaner deleted one of the snapshots. Then it led to an assert:

if (found_extent) {
        BUG_ON(is_data && refs_to_drop !=
               extent_data_ref_count(root, path, iref));
        if (iref) {
                BUG_ON(path->slots[0] != extent_slot);
        } else {
                BUG_ON(path->slots[0] != extent_slot + 1); /* CRASH */
                path->slots[0] = extent_slot;
                num_to_del = 2;
        }

As for the usage of this bad extent, there are multiple snapshots sharing the 98304-length extent, but only one that uses the 16384 extent:

file tree key (257 ROOT_ITEM 0)
item 19 key (257 EXTENT_DATA 17446191104) itemoff 2935 itemsize 53
    extent data disk byte 27059916800 nr 98304
    extent data offset 0 nr 98304 ram 98304
    extent compression 0
...
file tree key (350 ROOT_ITEM 164)
item 21 key (257 EXTENT_DATA 17446191104) itemoff 2829 itemsize 53
    extent data disk byte 27059916800 nr 16384
    extent data offset 0 nr 16384 ram 16384
    extent compression 0
...
file tree key (352 ROOT_ITEM 167)
item 19 key (257 EXTENT_DATA 17446191104) itemoff 2935 itemsize 53
    extent data disk byte 27059916800 nr 98304
    extent data offset 0 nr 98304 ram 98304
    extent compression 0

Kernel is for-linus, top commit:

commit 1eafa6c73791e4f312324ddad9cbcaf6a1b6052b
Author: Miao Xie mi...@cn.fujitsu.com
Date: Tue Jan 22 10:49:00 2013 +

    Btrfs: fix repeated delalloc work allocation

I believe I might have more extents like this, because btrfs-debug-tree warns:

warning, bad space info total_bytes 26851934208 used 26852773888
warning, bad space info total_bytes 27925676032 used 27926892544

Mount options: nodatasum,nodatacow,noatime,nospace_cache. Metadata profile is DUP, data profile is single.

Can anybody advise on how this could have happened? I can provide the whole debug-tree, btrfs-image or any additional info.

Thanks,
Alex.
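To make the inconsistency in the dump above concrete, here is a minimal userspace sketch (not btrfs code) that scans keys in the order the debug-tree output lists them and counts EXTENT_ITEM keys that share a start byte with the previous EXTENT_ITEM. The names `demo_key`, `DEMO_EXTENT_ITEM` and `count_duplicate_extent_starts` are hypothetical and exist only for this illustration; the sort order is an assumption taken from the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for btrfs key fields. */
enum demo_key_type { DEMO_EXTENT_ITEM, DEMO_SHARED_DATA_REF };

struct demo_key {
    uint64_t objectid;        /* start byte of the extent */
    enum demo_key_type type;
    uint64_t offset;          /* extent length for EXTENT_ITEM keys */
};

/*
 * Count EXTENT_ITEM keys whose start byte equals that of the previous
 * EXTENT_ITEM key. The input is assumed to be sorted by
 * (objectid, type, offset), as in the debug-tree dump.
 */
size_t count_duplicate_extent_starts(const struct demo_key *keys, size_t n)
{
    size_t dups = 0;
    int have_prev = 0;
    uint64_t prev_start = 0;

    for (size_t i = 0; i < n; i++) {
        if (keys[i].type != DEMO_EXTENT_ITEM)
            continue;               /* backref items are not extents */
        if (have_prev && keys[i].objectid == prev_start)
            dups++;                 /* same start byte seen twice */
        prev_start = keys[i].objectid;
        have_prev = 1;
    }
    return dups;
}
```

Fed the three keys from items 22 to 24, the function reports one duplicate, which is the pair that later trips the BUG_ON.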
weird kernel-oopses while deleting files on btrfs
Hi list,

some rather unexpected btrfs-oopses for my taste. I use btrfs for some time now (mostly on external harddisks) and these oopses happened during some simple file and folder deletion operation on that device. It is a luks-encrypted 80GB drive. Anything like that known? And the fs was created just yesterday, how come there is a message like...

[91491.919358] btrfs: mismatching generation and generation_v2 found in root item. This root was probably mounted with an older kernel. Resetting all new fields.

... but the kernel used (3.7.3 from Debian experimental on Debian sid) was installed several days ago. What kind of oopses are these? As of now there is no real data on that device. But if there were, would I need to be concerned about the integrity of those files?

ii btrfs-tools 0.19+20130131-2 (if that matters)

The whole log is at http://paste.debian.net/hidden/6ee00823/ (if a mua fails to display the text unwrapped) and a copy right here:

[91491.900736] device label samsung_S0DWJ30L373663 devid 1 transid 10 /dev/mapper/udisks-luks-uuid-64e6f540-8df0-49b2-af3d-ea18e07355d2-uid1000
[91491.904416] btrfs: disk space caching is enabled
[91491.919358] btrfs: mismatching generation and generation_v2 found in root item. This root was probably mounted with an older kernel. Resetting all new fields.
[91978.944644] device label seagate_W1E2Z3TA devid 1 transid 439 /dev/dm-8
[91979.320743] device label seagate_W1E2Z3TA devid 1 transid 439 /dev/mapper/udisks-luks-uuid-24593edd-349c-451f-9b6d-eab1120471f6-uid1000
[91979.31] btrfs: disk space caching is enabled
[93283.761960] btrfs: block rsv returned -28
[93283.761965] ------------[ cut here ]------------
[93283.762006] WARNING: at /build/buildd-linux_3.7.3-1~experimental.1-i386-eX5kUQ/linux-3.7.3/fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0xcd/0x2a4 [btrfs]()
[93283.762010] Hardware name: System Product Name
[93283.762012] Modules linked in: hid_logitech usbhid ff_memless ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables cpufreq_powersave cpufreq_conservative cpufreq_stats cpufreq_userspace ppdev lp bnep rfcomm bluetooth rfkill binfmt_misc uinput nfsd auth_rpcgss nfs_acl nfs lockd dns_resolver fscache sunrpc bridge stp llc ext4 crc16 jbd2 hwmon_vid loop fuse snd_hda_codec_analog snd_wavefront snd_cs4236 snd_hda_intel btrfs sg snd_opl3_lib sr_mod snd_hda_codec nouveau snd_hwdep cdrom snd_pcm_oss crc32c libcrc32c zlib_deflate snd_wss_lib joydev usb_storage hid_generic sata_sil snd_mpu401 snd_mixer_oss snd_mpu401_uart coretemp kvm_intel usbled snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq mxm_wmi wmi video snd_seq_device snd_timer ttm i2c_i801 iTCO_wdt snd ns558 drm_kms_helper drm i2c_algo_bit soundcore gameport iTCO_vendor_support kvm lpc_ich mfd_core rng_core pcspkr i2c_core psmouse evdev acpi_cpufreq mperf parport_pc parport processor r8169 mii serio_raw ehci_hcd asus_atk0110 thermal_sys button ext3 mbcache jbd dm_crypt dm_mod raid1 md_mod sha256_generic aes_i586 cbc hid sd_mod crc_t10dif ata_generic microcode ata_piix uhci_hcd libata scsi_mod usbcore usb_common [last unloaded: usbhid]
[93283.762149] Pid: 10948, comm: pool Not tainted 3.7-trunk-686-pae #1 Debian 3.7.3-1~experimental.1
[93283.762151] Call Trace:
[93283.762160] [<c10310a1>] ? warn_slowpath_common+0x68/0x79
[93283.762187] [<fbe07206>] ? btrfs_alloc_free_block+0xcd/0x2a4 [btrfs]
[93283.762193] [<c10310bf>] ? warn_slowpath_null+0xd/0x10
[93283.762219] [<fbe07206>] ? btrfs_alloc_free_block+0xcd/0x2a4 [btrfs]
[93283.762226] [<c10b9541>] ? page_address+0x1b/0x85
[93283.762254] [<fbe0ed11>] ? btrfs_header_generation.isra.75+0xb/0x14 [btrfs]
[93283.762277] [<fbdf929c>] ? __btrfs_cow_block+0xfb/0x3b4 [btrfs]
[93283.762301] [<fbdfa8b9>] ? read_block_for_search.isra.42+0x91/0x31e [btrfs]
[93283.762325] [<fbdf966d>] ? btrfs_cow_block+0xe2/0x11f [btrfs]
[93283.762349] [<fbdfbea3>] ? btrfs_search_slot+0x1e6/0x5ab [btrfs]
[93283.762377] [<fbe0c613>] ? btrfs_del_csums+0xd7/0x30a [btrfs]
[93283.762402] [<fbe029d7>] ? __btrfs_free_extent+0x5f8/0x67f [btrfs]
[93283.762428] [<fbe0654e>] ? run_clustered_refs+0x7a7/0x803 [btrfs]
[93283.762435] [<c10a89ec>] ? __set_page_dirty_nobuffers+0x11/0xb7
[93283.762462] [<fbe088f2>] ? btrfs_run_delayed_refs+0xe7/0x220 [btrfs]
[93283.762491] [<fbe151c4>] ? __btrfs_end_transaction+0xfb/0x275 [btrfs]
[93283.762521] [<fbe1df70>] ? btrfs_evict_inode+0x277/0x2a1 [btrfs]
[93283.762528] [<c10eae11>] ? evict+0x89/0x122
[93283.762533] [<c10e3721>] ? do_unlinkat+0xcc/0x108
[93283.762538] [<c10dbc57>] ? fput+0xc/0x8a
[93283.762543] [<c10e6b16>] ? sys_getdents64+0xaa/0xc4
[93283.762549] [<c12ecd4d>] ? sysenter_do_call+0x12/0x28
[93283.762555] [<c12e007b>] ? set_cpu_sibling_map+0x2cf/0x2e5
[93283.762560]
Re: same EXTENT_ITEM appears twice in the extent tree
On Sun, Mar 03, 2013 at 06:40:50AM -0700, Alex Lyakas wrote:
> Greetings all,
> I have an extent tree that looks like follows:
>
> item 22 key (27059916800 EXTENT_ITEM 16384) itemoff 2656 itemsize 24
>     extent refs 1 gen 164 flags 1
> item 23 key (27059916800 EXTENT_ITEM 98304) itemoff 2603 itemsize 53
>     extent refs 1 gen 165 flags 1
>     extent data backref root 257 objectid 257 offset 17446191104 count 1
> item 24 key (27059916800 SHARED_DATA_REF 47169536) itemoff 2599 itemsize 4
>     shared data backref count 1

Have you been experimenting on this FS with snapshot deletion patches?

-chris
Re: basic questions regarding COW in Btrfs
Hi Josef,

I have some more questions following up on my previous e-mails. I now do somewhat understand the place where extent entries get cow'ed. But I am unclear about the order of operations.

Is it correct that the data extent is written first, then the pointer in the indirect block needs to be updated, so then it is cowed and written to disk, and so on recursively up the tree? Or is the entire path from leaf to node that is going to be affected by the write cowed first, and then all the cowed extents are written to the disk and then the rest of the metadata pointers (for example, in checksum tree, extent tree, etc., I am not sure about this)?

Also, I need to understand specifically how the data (leaf nodes) of a file is written to disk vs. the metadata including the indirect nodes of the file. In extent_writepage I only know the pages of a file that are to be written. I guess I can identify metadata pages based on the inode of the page's owner. But is it possible to distinguish the pages available in the extent_writepage path as belonging to the leaf node or internal node for a file? If it cannot be identified at this point, where earlier in the path can this be decided?

Many thanks,
Aastha.

On 25 February 2013 20:00, Aastha Mehta aasth...@gmail.com wrote:
> Ah okay, I now see how it works. Thanks a lot for your response.
>
> Regards,
> Aastha.
>
> On 25 February 2013 18:27, Josef Bacik jba...@fusionio.com wrote:
>> On Mon, Feb 25, 2013 at 08:15:40AM -0700, Aastha Mehta wrote:
>>> Thanks again Josef. I understood that cow_file_range is called for a regular file. Just to clarify, in cow_file_range is cow done at the time of reserving extents in the extent btree for the io to be done in this delalloc? I see the following comment above find_free_extent() which is called while trying to reserve extents:
>>>
>>> /*
>>>  * walks the btree of allocated extents and find a hole of a given size.
>>>  * The key ins is changed to record the hole:
>>>  * ins->objectid == block start
>>>  * ins->flags = BTRFS_EXTENT_ITEM_KEY
>>>  * ins->offset == number of blocks
>>>  * Any available blocks before search_start are skipped.
>>>  */
>>>
>>> This seems to be the only place where a cow might be done, because a key is being inserted into an extent which modifies it.
>>
>> The key isn't inserted at this time, it's just returned with those values for us to do as we please. There is no update of the btree until insert_reserved_extent/btrfs_mark_extent_written in btrfs_finish_ordered_io. Thanks,
>>
>> Josef
Re: same EXTENT_ITEM appears twice in the extent tree
Hi Chris,

On Sun, Mar 3, 2013 at 5:28 PM, Chris Mason chris.ma...@fusionio.com wrote:
> On Sun, Mar 03, 2013 at 06:40:50AM -0700, Alex Lyakas wrote:
>> Greetings all,
>> I have an extent tree that looks like follows:
>>
>> item 22 key (27059916800 EXTENT_ITEM 16384) itemoff 2656 itemsize 24
>>     extent refs 1 gen 164 flags 1
>> item 23 key (27059916800 EXTENT_ITEM 98304) itemoff 2603 itemsize 53
>>     extent refs 1 gen 165 flags 1
>>     extent data backref root 257 objectid 257 offset 17446191104 count 1
>> item 24 key (27059916800 SHARED_DATA_REF 47169536) itemoff 2599 itemsize 4
>>     shared data backref count 1
>
> Have you been experimenting on this FS with snapshot deletion patches?

No, I haven't applied any patches on top of the commit I mentioned. (I presume you mean David's patch for one-by-one deletion.) Since it was created, this FS has only seen straight IO with parallel snapshot creation and deletion. However, the kernel was crashing pretty frequently during this test, so I presume log replay was taking place.

Any particular thing I can look for in the debug-tree output, except searching for more double-allocations?

Thanks,
Alex.

> -chris
Re: [btrfs] Periodic write spikes while idling, on btrfs root
On 2013/02/14 12:15 PM, Vedant Kumar wrote:
> Hello,
>
> I'm experiencing periodic write spikes while my system is idle.
> ...
> turned out to be some systemd log in /var/log/journal. I turned off journald and rebooted, but the write spike behavior remained.
> ...
> best, -vk

I believe btrfs syncs every 30 seconds (if anything's changed). This sounds like systemd's journal is not actually disabled and that it is simply logging new information every few seconds and forcing it to be synced to disk. Have you tried following the journal as root to see what is being logged?

    journalctl -f

Alternatively, as another measure to troubleshoot, in /etc/systemd/journald.conf, change the Storage= option either to "none" (which turns off journal storage) or to "volatile" (which keeps the journal only in memory under /run/log/journal), thereby eliminating btrfs' involvement.

--
__ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97
Re: [PATCH] btrfs-progs: usage should match what is coded
On Fri, Mar 01, 2013 at 06:05:21PM +, Hugo Mills wrote:
> On Fri, Mar 01, 2013 at 11:47:50AM -0600, Eric Sandeen wrote:
>> On 3/1/13 4:10 AM, Anand Jain wrote:
>>> Signed-off-by: Anand Jain anand.j...@oracle.com
>>
>> Reviewed-by: Eric Sandeen sand...@redhat.com
>>
>> But the curious side of me wonders how it got this way.
>>
>> commit e43cc461550130494194201037590a2b1f0f6880
>> Author: Ian Kumlien po...@demius.net
>> Date: Fri Feb 8 01:37:02 2013 +0100
>>
>>     Btrfs-progs: add restore command to btrfs
>>
>> added the usage text below, but didn't change the getopt or add code to handle them. No idea where it came from, it wasn't in the standalone restore either. *shrug* I guess nothing got lost.
>
> -m was definitely a thing at some point, as I recall using it. I think the code was in josef's progs tree. I suspect the other options were part of that, too. (And -m was definitely really useful for me).

My fault here, I cherry-picked Ian's commit from a branch with Josef's updates to restore (adding all the commandline options). I'll pick Anand's fix to keep help and functionality matching.

The updates to restore are wanted, but as they are based on an old progs version it's not all trivial to merge them.

david
Re: User feedback: raise the default leaf size to 16k
On Sun, Mar 03, 2013 at 03:33:30AM -0700, Brendan Hide wrote:
> On 2013/02/13 12:33 PM, Holger Hoffstaette wrote:
>> - raise the leaf size to 16k
>> - use single metadata profile
>> ...
>> the difference in behaviour on a single disk is *very* noticeable.
>
> Did you try an isolated change of leaf size? I think the devs would be willing to look into the default size if it makes a dramatic difference on its own. Personally I think you are seeing an improvement more as a result of the metadata profile rather than the leafsize.
>
> I don't think changing the default profile for metadata will be easily entertained as this is very important for protecting against corruption due to bitrot.

The long term plan is to set the default size to 16K, since this does cut down on metadata fragmentation. But in some benchmarks, it adds lock contention because we have fewer blocks to spread the locks over. The 3.9 merge window has fixes for lock contention, so I need to benchmark things again.

-chris
Re: [PATCH v3] Btrfs-progs: check out if the swap device
On 2013/02/14 09:53 AM, Tsutomu Itoh wrote:
> +	if (ret < 0) {
> +		fprintf(stderr, "error checking %s status: %s\n", file,
> +			strerror(-ret));
> +		exit(1);
> +	}
> ...
> +	/* check if the device is busy */
> +	fd = open(file, O_RDWR|O_EXCL);
> +	if (fd < 0) {
> +		fprintf(stderr, "unable to open %s: %s\n", file,
> +			strerror(errno));
> +		exit(1);
> +	}

This is fine and works (as tested by David) - but I'm not sure if the below suggestions from Zach were taken into account.

1. If the check with open(file, O_RDWR|O_EXCL) shows that the device is available, there's no point in checking if it is mounted as a swap device. A preliminary check using this could precede all other checks, which should be skipped if it shows success.
2. If there's an error checking the status (for example, let's say /proc/swaps is deprecated), we should print the informational message but not error out.

On 2013/02/13 11:58 AM, Zach Brown wrote:
> - First always open with O_EXCL. If it succeeds then there's no reason to check /proc/swaps at all. (Maybe it doesn't need to try check_mounted() there either? Not sure if it's protecting against accidentally mounting mounted shared storage or not.)
> ...
> - At no point is failure of any of the /proc/swaps parsing fatal. It'd carry on ignoring errors until it doesn't have work to do. It'd only ever print the nice message when it finds a match.

--
__ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97
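The ordering Zach suggests can be sketched roughly as below. This is a userspace illustration under stated assumptions, not the btrfs-progs patch: the function names `device_opens_exclusively` and `swaps_text_lists_device` are made up for this example, and the parser assumes the usual /proc/swaps layout (one header line, then one entry per line with the path as the first field). A caller would try the exclusive open first and consult the swaps text only if that open fails, treating any trouble reading /proc/swaps as non-fatal.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Return 1 if `path` can be opened with O_RDWR|O_EXCL, 0 otherwise.
 * On Linux, O_EXCL on a block device fails with EBUSY while the device
 * is in use (for example mounted), so success means no further
 * "is it busy" checks are needed.
 */
int device_opens_exclusively(const char *path)
{
    int fd = open(path, O_RDWR | O_EXCL);

    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

/*
 * Return 1 if `dev` is the first field of any entry in `swaps_text`
 * (text laid out like /proc/swaps), 0 otherwise. Malformed or empty
 * input simply yields 0, so a parsing problem is never fatal.
 */
int swaps_text_lists_device(const char *swaps_text, const char *dev)
{
    const char *line = strchr(swaps_text, '\n');   /* skip the header */
    size_t dev_len = strlen(dev);

    while (line && *++line) {
        /* Length of the first whitespace-delimited field on this line. */
        size_t field_len = strcspn(line, " \t\n");

        if (field_len == dev_len && strncmp(line, dev, dev_len) == 0)
            return 1;
        line = strchr(line, '\n');
    }
    return 0;
}
```

Keeping the parser separate from the file read makes the "never fatal" rule easy to honour: if /proc/swaps cannot be read at all, the caller just skips the informational message.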
Re: [PATCH] Btrfs: Include the device in most error printk()s
On Fri, Feb 15, 2013 at 05:12:37PM -0600, Simon Kirby wrote:
> [...]
> Signed-off-by: Simon Kirby s...@hostway.ca

Thanks! 2 comments below.

Reviewed-by: David Sterba dste...@suse.cz

> @@ -2919,8 +2923,9 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>  	if (ret) {
>  		btrfs_delalloc_release_metadata(inode, inode->i_size);
>  #ifdef DEBUG
> -		printk(KERN_ERR "btrfs: failed to write free ino cache "
> -		       "for root %llu\n", root->root_key.objectid);
> +		btrfs_err(root->fs_info,
> +			"btrfs %s: failed to write free ino cache for root %llu",
> +			root->root_key.objectid);

			"failed to write free ino cache for root %llu",

>  #endif
>  	}
> @@ -2454,8 +2456,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
>  		ret = PTR_ERR(trans);
>  		goto out;
>  	}
> -	printk(KERN_ERR "auto deleting %Lu\n",
> -	       found_key.objectid);
> +	btrfs_err(root->fs_info, "auto deleting %Lu",
> +		found_key.objectid);

That's probably only a debugging message, so btrfs_debug would be more appropriate here.

>  	ret = btrfs_del_orphan_item(trans, root,
>  				    found_key.objectid);
>  	BUG_ON(ret); /* -ENOMEM or corruption (JDM: Recheck) */
Re: btrfs send receive produces Too many open files in system
On 2013/02/18 12:37 PM, Adam Ryczkowski wrote:
> ... to migrate btrfs from one partition layout to another.
> ... source sits on top of lvm2 logical volume, which sits on top of cryptsetup Luks device which subsequently sits on top of mdadm RAID-6 spanning a partition on each of 4 hard drives ... is a read-only snapshot which I estimate contains ca. 100GB data.
> ... destination is btrfs multidevice raid10 filesystem, which is based on 4 cryptsetup Luks devices, each of which lives as a separate partition on the same 4 physical hard drives ...
> ... about 8MB/sec read (and the same speed of write) from each of all 4 hard drives).

I hope you've solved this already - but if not:

The unnecessarily complex setup aside, a 4-disk RAID6 is going to be slow - most would have gone for a RAID10 configuration, albeit that it has less redundancy.

Another real problem here is that you are copying data from these disks to themselves. This means that for every read and write, all four of the disks have to do two seeks. This is time-consuming, on the order of 7ms per seek depending on the disks you have. The way to avoid these unnecessary seeks is to first copy the data to a separate unrelated device and then to copy from that device to your final destination device.

To increase RAID6 write performance (perhaps irrelevant here) you can try optimising the stripe_cache_size value. It can use a ton of memory depending on how large a stripe cache setting you end up with. Search online for "mdraid stripe_cache_size".

To increase the read performance you can try optimising the md arrays' readahead. As above, search online for "blockdev setra". This should hopefully make a noticeable difference.

Good luck.

--
__ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97
Re: basic questions regarding COW in Btrfs
On Sat, Mar 2, 2013 at 4:07 PM, Alex Lyakas alex.bt...@zadarastorage.com wrote:
> Hi Josef,
> I hope it's ok to piggy back on this thread for the following question:
>
> I see that in the btrfs_cross_ref_exist()->check_committed_ref() path, there is the following check:
>
> if (btrfs_extent_generation(leaf, ei) <=
>     btrfs_root_last_snapshot(&root->root_item))
>         goto out;
>
> So this basically means that after we have taken a snap of a subvol, then all subvol's extents must be COW'ed, even if we delete the snap a minute later.
>
> I wonder, why is that so? Is this because file extents can be shared indirectly, like when we create a snap, we only COW the root and only mark all root's *immediate* children shared in the extent tree?

Yes that's exactly it. We have no way of knowing that there are no snapshots left for this particular root so if there ever was a snapshot we have to err on the side of caution.

> Can the new backref walking code be used here to check more accurately, if the extent is shared by anybody else?

Probably, if we could figure out if there is a way for more than one root to point to this extent then yes this would be ideal so we don't have to force COW in cases we would rather not. Thanks,

Josef
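The comparison Alex quotes boils down to a single generation test. The helper below is a hypothetical userspace restatement of it (the name `extent_may_be_shared` and the plain integer parameters are assumptions for illustration, not kernel API): an extent created at or before the root's last snapshot generation has to be treated as possibly shared, whether or not that snapshot still exists.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted comparison: an extent whose
 * generation is not newer than the root's last snapshot may be shared
 * with that snapshot, so it must be COW'ed rather than overwritten.
 */
bool extent_may_be_shared(uint64_t extent_generation,
                          uint64_t last_snapshot_generation)
{
    return extent_generation <= last_snapshot_generation;
}
```

Note that the test only looks at when the last snapshot was taken, which is why deleting the snapshot a minute later does not lift the COW requirement for older extents.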
Re: basic questions regarding COW in Btrfs
On Sun, Mar 3, 2013 at 10:41 AM, Aastha Mehta aasth...@gmail.com wrote:
> Hi Josef,
>
> I have some more questions following up on my previous e-mails. I now do somewhat understand the place where extent entries get cow'ed. But I am unclear about the order of operations.
>
> Is it correct that the data extent is written first, then the pointer in the indirect block needs to be updated, so then it is cowed and written to disk, and so on recursively up the tree? Or is the entire path from leaf to node that is going to be affected by the write cowed first, and then all the cowed extents are written to the disk and then the rest of the metadata pointers (for example, in checksum tree, extent tree, etc., I am not sure about this)?

The second one. We COW the entire path from root to leaf as things need COW'ing. We start a transaction, we insert the file extent entries, we add the checksums, and we add the delayed ref updates to the extent tree. The delayed things are guaranteed to happen in that transaction so we have consistency there. The COW'ing from top to bottom works like that for all trees.

> Also, I need to understand specifically how the data (leaf nodes) of a file is written to disk vs. the metadata including the indirect nodes of the file. In extent_writepage I only know the pages of a file that are to be written. I guess I can identify metadata pages based on the inode of the page's owner. But is it possible to distinguish the pages available in the extent_writepage path as belonging to the leaf node or internal node for a file? If it cannot be identified at this point, where earlier in the path can this be decided?

So they are different things, and they could change from the time we write to the time that the write completes because of COW. Also keep in mind that the metadata (the file extent items and such) for the inodes are not stored specifically within the inode, they're stored inside the same tree that the inode resides in. So you can have a leaf node with multiple inodes and extents for those different inodes. And so any sort of random things can happen, other inodes can be deleted and this inode's metadata will be shifted into a new leaf, or another inode could be added and this inode's data could be pushed off into an adjacent leaf. The only way to know which leaf/page the inode is associated with is to search for whatever you are looking for in the tree, and then while you are holding all of the locks and reference counting you can be sure that those pages contain the metadata you are looking for, but once you let that go there are no guarantees.

So as far as how it is written to disk, that is where transactions come in. We track all the dirty metadata pages we have per transaction, and then at transaction commit time we make sure that all of those pages are written to disk and then we commit our super to point to the new root of the tree root, which in turn points at all of our new roots because of COW. These pages can be written before the commit though because of memory pressure, and if they are written and then modified again within the same transaction we will re-cow them to make sure we don't have any partial-page updates.

Keeping track of where a specific inode's metadata is contained is a tricky business. Let me know if that helped. Thanks,

Josef
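The "COW the entire path from root to leaf" idea can be shown with a toy path-copying tree. This is only a sketch under loud assumptions: it uses a plain binary search tree with hypothetical names (`toy_node`, `toy_new`, `toy_cow_set`), whereas btrfs uses multi-way B-tree blocks, transactions and delayed refs. What it does share with the description above is that every node on the path to the change is copied, untouched subtrees are shared, and the old root keeps describing the old state until something (the super block, in btrfs) is pointed at the new root.

```c
#include <stdlib.h>

/* Toy tree node; real btrfs nodes are multi-way B-tree blocks. */
struct toy_node {
    int key;
    int value;
    struct toy_node *left, *right;
};

/* Allocate a node; aborts on allocation failure to keep the sketch short. */
struct toy_node *toy_new(int key, int value,
                         struct toy_node *left, struct toy_node *right)
{
    struct toy_node *n = malloc(sizeof(*n));

    if (!n)
        abort();
    n->key = key;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/*
 * Return the root of a new tree version in which `key` maps to `value`.
 * Every node on the path from the root to `key` is copied; subtrees off
 * that path are shared with the old version, which stays untouched.
 */
struct toy_node *toy_cow_set(const struct toy_node *root, int key, int value)
{
    if (!root)
        return toy_new(key, value, NULL, NULL);
    if (key < root->key)
        return toy_new(root->key, root->value,
                       toy_cow_set(root->left, key, value), root->right);
    if (key > root->key)
        return toy_new(root->key, root->value,
                       root->left, toy_cow_set(root->right, key, value));
    return toy_new(key, value, root->left, root->right);
}
```

In this toy, "committing" would simply mean replacing the saved root pointer with the one `toy_cow_set` returned; until then a reader holding the old root still sees a consistent old tree.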
Re: weird kernel-oopses while deleting files on btrfs
On Sun, Mar 03, 2013 at 06:57:41AM -0700, Michael Schmitt wrote:
> Hi list,
>
> some rather unexpected btrfs-oopses for my taste. I use btrfs for some time now (mostly on external harddisks) and these oopses happened during some simple file and folder deletion operation on that device. It is a luks-encrypted 80GB drive. Anything like that known? And the fs was created just yesterday, how come there is a message like...
>
> [91491.919358] btrfs: mismatching generation and generation_v2 found in root item. This root was probably mounted with an older kernel. Resetting all new fields.

This may be from first mount after mkfs. It depends on your tools.

> ... but the kernel used (3.7.3 from Debian experimental on Debian sid) was installed several days ago. What kind of oopses are these? As of now there is no real data on that device. But if there were, would I need to be concerned about the integrity of those files?
>
> [93283.762006] WARNING: at /build/buildd-linux_3.7.3-1~experimental.1-i386-eX5kUQ/linux-3.7.3/fs/btrfs/extent-tree.c:6297 btrfs_alloc_free_block+0xcd/0x2a4 [btrfs]()

These are not oopsen but warnings. It's an ENOSPC warning as we try to delete the extents. It did happen sometimes in this kernel, but it is only a warning.

-chris
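For readers wondering how the log line "block rsv returned -28" relates to ENOSPC: kernel functions return negative errno values, and on Linux errno 28 is ENOSPC ("No space left on device"). A small userspace sketch, with the hypothetical helper name `describe_kernel_ret`, shows the decoding:

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel-style return values are negative errno codes, so negate the
 * value before asking the C library for the matching description.
 */
const char *describe_kernel_ret(int ret)
{
    return strerror(-ret);
}
```

Calling `describe_kernel_ret(-28)` on Linux gives the same text as `strerror(ENOSPC)`.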
Re: collapse concurrent forced allocations
On Feb 23, 2013, Alexandre Oliva ol...@gnu.org wrote:

On Feb 22, 2013, Josef Bacik jba...@fusionio.com wrote:

So I understand what you are getting at, but I think you are doing it wrong. If we're calling with CHUNK_ALLOC_FORCE, but somebody has already started to allocate with CHUNK_ALLOC_NO_FORCE, we'll reset the space_info->force_alloc to our original caller's CHUNK_ALLOC_FORCE.

But that's ok, do_chunk_alloc will set space_info->force_alloc to CHUNK_ALLOC_NO_FORCE at the end, when it succeeds allocating, and then anyone else waiting on the mutex to try to allocate will load the NO_FORCE from space_info.

So we only really care about making sure a chunk is actually allocated, instead of doing this flag shuffling we should just do

	if (space_info->chunk_alloc) {
		spin_unlock(&space_info->lock);
		wait_event(!space_info->chunk_alloc);
		return 0;

I looked a bit further into it. I think this would work if we had a wait_queue for space_info->chunk_alloc. We don't, so the mutex interface is probably the best we can do.

OTOH, I found out we seem to get into an allocation spree when a large file is being quickly created, such as when creating a ceph journal or making a copy of a multi-GB file. I suppose btrfs is just trying to allocate contiguous space for the file, but unfortunately there doesn't seem to be a fallback for allocation failure: as soon as data allocation fails and space_info is set as full, the large write fails and the filesystem becomes full, without even trying to use non-contiguous storage. Isn't that a bug?

I've also been trying to track down why, on a single-data filesystem, (compressed?) data reads that fail because of bad blocks also spike the CPU load and lock the file that failed to map in and the entire filesystem, so that the only way to recover is to force a reboot. Does this sound familiar to anyone?

-- 
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist Red Hat Brazil Compiler Engineer
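For readers following the wait-queue idea discussed in this message, here is a rough user-space sketch of collapsing concurrent allocation requests: one caller allocates while the others wait and then return without allocating again. It is not kernel code. `struct space_info_model`, `space_info_init` and `alloc_chunk_collapsed` are hypothetical names, and a pthread condition variable stands in for the wait queue that, per the message, space_info does not have.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's space_info. */
struct space_info_model {
	pthread_mutex_t lock;
	pthread_cond_t alloc_done; /* plays the role of the missing wait queue */
	bool chunk_alloc;          /* true while some caller is allocating */
	int chunks;                /* chunks allocated so far */
};

static void space_info_init(struct space_info_model *si)
{
	assert(si != NULL);
	pthread_mutex_init(&si->lock, NULL);
	pthread_cond_init(&si->alloc_done, NULL);
	si->chunk_alloc = false;
	si->chunks = 0;
}

/*
 * Returns 1 if this caller allocated a chunk, 0 if it only waited for an
 * allocation that another caller had already started.
 */
static int alloc_chunk_collapsed(struct space_info_model *si)
{
	pthread_mutex_lock(&si->lock);
	if (si->chunk_alloc) {
		/* Someone else is allocating: wait for them, then give up. */
		while (si->chunk_alloc)
			pthread_cond_wait(&si->alloc_done, &si->lock);
		pthread_mutex_unlock(&si->lock);
		return 0;
	}
	si->chunk_alloc = true;
	pthread_mutex_unlock(&si->lock);

	/* The expensive allocation would run here, outside the lock. */

	pthread_mutex_lock(&si->lock);
	si->chunks++;
	si->chunk_alloc = false;
	pthread_cond_broadcast(&si->alloc_done); /* wake every waiter */
	pthread_mutex_unlock(&si->lock);
	return 1;
}
```

The waiters return 0 without retrying, which matches the "we only really care that a chunk is actually allocated" argument; whether a forced request should instead retry after waiting is exactly the question the thread leaves open.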
[PATCH] btrfs-progs: update mkfs.btrfs help info for raid5/6
From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Since raid5/6 support was introduced, we should update mkfs.btrfs help info.

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 mkfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mkfs.c b/mkfs.c
index 5ece186..f9f26a5 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -326,7 +326,7 @@ static void print_usage(void)
 	fprintf(stderr, "options:\n");
 	fprintf(stderr, "\t -A --alloc-start the offset to start the FS\n");
 	fprintf(stderr, "\t -b --byte-count total number of bytes in the FS\n");
-	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid10, dup or single\n");
+	fprintf(stderr, "\t -d --data data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
 	fprintf(stderr, "\t -l --leafsize size of btree leaves\n");
 	fprintf(stderr, "\t -L --label set a label\n");
 	fprintf(stderr, "\t -m --metadata metadata profile, values like data profile\n");
-- 
1.7.11.7
Re: [PATCH] btrfs-progs: traverse to backup super-block only when indicated
	flags = BTRFS_SCAN_REGISTER | BTRFS_SCAN_PRIMARY_SB;
	btrfs_scan_one_dir("/dev/", flags)

I just got too used to the current way of coding in btrfs-progs :-) But let me get at least this part of the code done the right way.

Thanks, Eric, for pointing that out.

-Anand
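A rough sketch of how a flags argument like the one proposed above could control whether backup superblocks are examined. Everything here is an assumption for illustration: the flag values, the `SCAN_*` names and `superblock_copies_to_check` are hypothetical and are not taken from btrfs-progs. The only outside fact used is that btrfs keeps up to three superblock copies per device.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; real definitions may differ. */
#define SCAN_REGISTER    (1u << 0) /* register found devices with the kernel */
#define SCAN_PRIMARY_SB  (1u << 1) /* read the primary superblock only */

/* btrfs keeps up to three superblock copies per device. */
#define SUPER_MIRROR_MAX 3

/* How many superblock copies a scan with these flags would look at. */
static int superblock_copies_to_check(unsigned int flags)
{
	/* Reject flag bits this sketch does not know about. */
	assert((flags & ~(SCAN_REGISTER | SCAN_PRIMARY_SB)) == 0);
	return (flags & SCAN_PRIMARY_SB) ? 1 : SUPER_MIRROR_MAX;
}
```

Passing one bitmask keeps the call site short and lets a caller opt in to walking the backup copies by leaving the primary-only bit clear.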