Re: [PATCH v3] lib: add size unit t/p/e to memparse
On 12/06/14 23:15, Andrew Morton wrote: On Wed, 2 Apr 2014 16:54:37 +0800 Gui Hecheng guihc.f...@cn.fujitsu.com wrote: For modern filesystems such as btrfs, t/p/e size level operations are common. add size unit t/p/e parsing to memparse Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com --- changelog v1-v2: replace kilobyte with kibibyte, and others v2-v3: add missing unit bytes in comment --- lib/cmdline.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/lib/cmdline.c b/lib/cmdline.c index eb67911..511b9be 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -119,11 +119,17 @@ char *get_options(const char *str, int nints, int *ints) *@retptr: (output) Optional pointer to next char after parse completes * *Parses a string into a number. The number stored at @ptr is - * potentially suffixed with %K (for kilobytes, or 1024 bytes), - * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or - * 1073741824). If the number is suffixed with K, M, or G, then - * the return value is the number multiplied by one kilobyte, one - * megabyte, or one gigabyte, respectively. + * potentially suffixed with + * %K (for kibibytes, or 1024 bytes), + * %M (for mebibytes, or 1048576 bytes), + * %G (for gibibytes, or 1073741824 bytes), + * %T (for tebibytes, or 1099511627776 bytes), + * %P (for pebibytes, or 1125899906842624 bytes), + * %E (for exbibytes, or 1152921504606846976 bytes). I'm afraid I find these names quite idiotic - we all know what the traditional terms mean so why go and muck with it. Also, kibibytes sounds like cat food. Hi, Andrew While I agree it sounds like cat food, it seemed like a good opportunity to fix a minor issue that is otherwise unlikely to be fixed for a very long time. Should we feel uncomfortable with the patch, as is, because of language/correctness friction? Pedantry included, the patch is correct. ;) Thanks -- __ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] lib: add size unit t/p/e to memparse
On 13/06/14 03:42, Gui Hecheng wrote: For modern filesystems such as btrfs, t/p/e size level operations are common. add size unit t/p/e parsing to memparse Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com --- changelog v1-v2: replace kilobyte with kibibyte, and others v2-v3: add missing unit bytes in comment v3-v4: remove idiotic name for K,M,G,P,T,E --- lib/cmdline.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/lib/cmdline.c b/lib/cmdline.c index d4932f7..76a712e 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -121,11 +121,7 @@ EXPORT_SYMBOL(get_options); *@retptr: (output) Optional pointer to next char after parse completes * *Parses a string into a number. The number stored at @ptr is - * potentially suffixed with %K (for kilobytes, or 1024 bytes), - * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or - * 1073741824). If the number is suffixed with K, M, or G, then - * the return value is the number multiplied by one kilobyte, one - * megabyte, or one gigabyte, respectively. + * potentially suffixed with K, M, G, T, P, E. */ unsigned long long memparse(const char *ptr, char **retptr) @@ -135,6 +131,15 @@ unsigned long long memparse(const char *ptr, char **retptr) unsigned long long ret = simple_strtoull(ptr, endptr, 0); switch (*endptr) { + case 'E': + case 'e': + ret = 10; + case 'P': + case 'p': + ret = 10; + case 'T': + case 't': + ret = 10; case 'G': case 'g': ret = 10; Ah, I see - you've removed all reference to their names. That's good too. :) -- __ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] lib: add size unit t/p/e to memparse
On Fri, 2014-06-13 at 07:55 +0200, Brendan Hide wrote: On 13/06/14 03:42, Gui Hecheng wrote: For modern filesystems such as btrfs, t/p/e size level operations are common. add size unit t/p/e parsing to memparse Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com --- changelog v1-v2: replace kilobyte with kibibyte, and others v2-v3: add missing unit bytes in comment v3-v4: remove idiotic name for K,M,G,P,T,E --- lib/cmdline.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/lib/cmdline.c b/lib/cmdline.c index d4932f7..76a712e 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -121,11 +121,7 @@ EXPORT_SYMBOL(get_options); *@retptr: (output) Optional pointer to next char after parse completes * *Parses a string into a number. The number stored at @ptr is - * potentially suffixed with %K (for kilobytes, or 1024 bytes), - * %M (for megabytes, or 1048576 bytes), or %G (for gigabytes, or - * 1073741824). If the number is suffixed with K, M, or G, then - * the return value is the number multiplied by one kilobyte, one - * megabyte, or one gigabyte, respectively. + * potentially suffixed with K, M, G, T, P, E. */ unsigned long long memparse(const char *ptr, char **retptr) @@ -135,6 +131,15 @@ unsigned long long memparse(const char *ptr, char **retptr) unsigned long long ret = simple_strtoull(ptr, endptr, 0); switch (*endptr) { + case 'E': + case 'e': + ret = 10; + case 'P': + case 'p': + ret = 10; + case 'T': + case 't': + ret = 10; case 'G': case 'g': ret = 10; Ah, I see - you've removed all reference to their names. That's good too. :) Thank you for your review! I think maybe more people would like cleaner things in the kernel. -Gui -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Deadlock/high load
On 13/06/14 04:37, Liu Bo wrote: The output of 'btrfs filesystem df' is appreciate, it can help determine if the FS has entered into 'almost full' situation, and that may cause a bug that pages are not marked with writeback tag and lead to processes's endless waiting. I'll send it as soon as it happens again, thanks! Cheers, Alin. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Deadlock/high load
On 13/06/14 04:37, Liu Bo wrote: The output of 'btrfs filesystem df' is appreciate, it can help determine if the FS has entered into 'almost full' situation, and that may cause a bug that pages are not marked with writeback tag and lead to processes's endless waiting. The output now (when the host is stable) is: Data, single: total=54.01GiB, used=24.48GiB System, single: total=4.00MiB, used=16.00KiB Metadata, single: total=3.01GiB, used=1.31GiB unknown, single: total=464.00MiB, used=0.00 Cheers, Alin. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] lib: add size unit t/p/e to memparse
On Fri, Jun 13, 2014 at 07:54:44AM +0200, Brendan Hide wrote: On 12/06/14 23:15, Andrew Morton wrote: On Wed, 2 Apr 2014 16:54:37 +0800 Gui Hecheng guihc.f...@cn.fujitsu.com wrote: + * %K (for kibibytes, or 1024 bytes), + * %M (for mebibytes, or 1048576 bytes), + * %G (for gibibytes, or 1073741824 bytes), + * %T (for tebibytes, or 1099511627776 bytes), + * %P (for pebibytes, or 1125899906842624 bytes), + * %E (for exbibytes, or 1152921504606846976 bytes). I'm afraid I find these names quite idiotic - we all know what the traditional terms mean so why go and muck with it. Also, kibibytes sounds like cat food. Hi, Andrew While I agree it sounds like cat food, it seemed like a good opportunity to fix a minor issue that is otherwise unlikely to be fixed for a very long time. Should we feel uncomfortable with the patch, as is, because of language/correctness friction? Pedantry included, the patch is correct. ;) Last night, I wrote a very grumpy reply to Andrew. I'm glad I didn't send it, because Brendan has managed to cover at least one of my points much more politely than I did. My other comment is that TB vs TiB is a 10% difference in the magnitude of the number, and so the accumulated error is now no longer small enough to be brushed under the carpet as we all did in days past. By Andrew's thinking, a 4 TB disk is 3.638 TB in size. I'd say a 4 TB disk is 3.638 TiB in size, and I can be precise (±1GB in the latter case) with both values. Hugo. PS. Let's just not talk about 1.44 MB floppy disks. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- 2 + 2 = 5, for sufficiently large values of 2. --- signature.asc Description: Digital signature
Re: [PATCH v2] Btrfs: fix RCU correctness warning when running sanity tests
On Tue, Jun 10, 2014 at 7:17 PM, Filipe David Borba Manana fdman...@gmail.com wrote: When CONFIG_PROVE_RCU=y and CONFIG_PROVE_RCU_REPEATEDLY=y, the following was dumped in dmesg: [ 3197.218064] === [ 3197.218064] [ INFO: suspicious RCU usage. ] [ 3197.218066] 3.15.0-rc8-fdm-btrfs-next-33+ #4 Not tainted [ 3197.218067] --- [ 3197.218068] include/linux/radix-tree.h:196 suspicious rcu_dereference_check() usage! [ 3197.218068] [ 3197.218068] other info that might help us debug this: [ 3197.218068] [ 3197.218070] [ 3197.218070] rcu_scheduler_active = 1, debug_locks = 1 [ 3197.218071] 1 lock held by modprobe/12024: [ 3197.218072] #0: ((fs_info-buffer_lock)-rlock){+.+...}, at: [a025c5fa] btrfs_free_dummy_root+0x5a/0x1d0 [btrfs] [ 3197.218093] [ 3197.218093] stack backtrace: [ 3197.218095] CPU: 3 PID: 12024 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4 [ 3197.218096] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 3197.218097] 0001 8800af18fc18 81685c5a feb0 [ 3197.218099] 8800cf6ccb40 8800af18fc48 810a6316 8801d955f640 [ 3197.218101] 8800d719e328 8800d719e370 8800d719c000 8800af18fcb8 [ 3197.218102] Call Trace: [ 3197.218105] [81685c5a] dump_stack+0x4e/0x68 [ 3197.218108] [810a6316] lockdep_rcu_suspicious+0xe6/0x130 [ 3197.218119] [a025c728] btrfs_free_dummy_root+0x188/0x1d0 [btrfs] [ 3197.218129] [a025f56a] btrfs_test_qgroups+0xea/0x1bb [btrfs] [ 3197.218137] [a03a19d2] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs] [ 3197.218144] [a03a19d2] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs] [ 3197.218151] [a03a1ab7] init_btrfs_fs+0xe5/0x184 [btrfs] [ 3197.218154] [81000352] do_one_initcall+0x102/0x150 [ 3197.218157] [8103d223] ? set_memory_nx+0x43/0x50 [ 3197.218160] [81682668] ? set_section_ro_nx+0x6d/0x74 [ 3197.218162] [810d91cc] load_module+0x1cdc/0x2630 [ 3197.218164] [810d4e90] ? show_initstate+0x60/0x60 [ 3197.218166] [810d9c9e] SyS_finit_module+0x8e/0x90 [ 3197.218168] [81698212] system_call_fastpath+0x16/0x1b Signed-off-by: Filipe David Borba Manana fdman...@gmail.com Chris, Please drop this patch from your integration branch. Sasha fixed this too but in a simpler way: https://patchwork.kernel.org/patch/4337091/ (In fact both patches applied probably trigger another warning the rcu usage correctness checker) Thanks --- V2: Added missing rcu read unlock if a retry is needed. fs/btrfs/tests/btrfs-tests.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c index a5dcacb..bdb1f05 100644 --- a/fs/btrfs/tests/btrfs-tests.c +++ b/fs/btrfs/tests/btrfs-tests.c @@ -130,8 +130,8 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info) struct radix_tree_iter iter; void **slot; - spin_lock(fs_info-buffer_lock); restart: + rcu_read_lock(); radix_tree_for_each_slot(slot, fs_info-buffer_radix, iter, 0) { struct extent_buffer *eb; @@ -140,15 +140,17 @@ restart: continue; /* Shouldn't happen but that kind of thinking creates CVE's */ if (radix_tree_exception(eb)) { - if (radix_tree_deref_retry(eb)) + if (radix_tree_deref_retry(eb)) { + rcu_read_unlock(); goto restart; + } continue; } - spin_unlock(fs_info-buffer_lock); + rcu_read_unlock(); free_extent_buffer_stale(eb); - spin_lock(fs_info-buffer_lock); + goto restart; } - spin_unlock(fs_info-buffer_lock); + rcu_read_unlock(); btrfs_free_qgroup_config(fs_info); btrfs_free_fs_roots(fs_info); -- 1.9.1 -- Filipe David Manana, Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Btrfs filesystem read only on error
Hi, Today my btrfs disk suddently turned to read only after than an error occured, I guess it's a security measure to avoid any corrupted data after a filesystem error. Here is the syslog: Jun 13 04:59:39 nas1 kernel: [110663.709572] btrfs: page allocation failure: order:1, mode:0x200020 Jun 13 04:59:39 nas1 kernel: [110663.709579] CPU: 3 PID: 21177 Comm: btrfs Not tainted 3.12-0.bpo.1-amd64 #1 Debian 3.12.9-1~bpo70+1 Jun 13 04:59:39 nas1 kernel: [110663.709581] Hardware name: Supermicro A1SAi/A1SAi, BIOS 1.0b 11/06/2013 Jun 13 04:59:39 nas1 kernel: [110663.709584] 0001 814be0b3 00200020 Jun 13 04:59:39 nas1 kernel: [110663.709590] 8112535c 88047ffefb00 00020002 Jun 13 04:59:39 nas1 kernel: [110663.709594] 88047ffefb08 0002 Jun 13 04:59:39 nas1 kernel: [110663.709599] Call Trace: Jun 13 04:59:39 nas1 kernel: [110663.709608] [814be0b3] ? dump_stack+0x41/0x51 Jun 13 04:59:39 nas1 kernel: [110663.709614] [8112535c] ? warn_alloc_failed+0x10c/0x160 Jun 13 04:59:39 nas1 kernel: [110663.709618] [811292b2] ? __alloc_pages_nodemask+0x9c2/0xaa0 Jun 13 04:59:39 nas1 kernel: [110663.709623] [8116d5c5] ? kmem_getpages+0x65/0x1a0 Jun 13 04:59:39 nas1 kernel: [110663.709627] [8116efd2] ? fallback_alloc+0x172/0x260 Jun 13 04:59:39 nas1 kernel: [110663.709631] [8116f904] ? kmem_cache_alloc+0x144/0x1f0 Jun 13 04:59:39 nas1 kernel: [110663.709635] [81281f98] ? __idr_pre_get+0x68/0x90 Jun 13 04:59:39 nas1 kernel: [110663.709639] [81282128] ? ida_pre_get+0x18/0x90 Jun 13 04:59:39 nas1 kernel: [110663.709644] [81187561] ? get_anon_bdev+0x21/0xe0 Jun 13 04:59:39 nas1 kernel: [110663.709665] [a048a768] ? btrfs_init_fs_root+0xa8/0xf0 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709677] [a048b32e] ? btrfs_get_fs_root+0xce/0x220 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709689] [a048e5fe] ? create_pending_snapshot+0x6ee/0x980 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709701] [a048e919] ? create_pending_snapshots+0x89/0xa0 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709713] [a048fd1a] ? btrfs_commit_transaction+0x46a/0x9f0 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709718] [81082d20] ? add_wait_queue+0x60/0x60 Jun 13 04:59:39 nas1 kernel: [110663.709731] [a04bf39d] ? btrfs_mksubvol.isra.59+0x3cd/0x3f0 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709743] [a04bf4d0] ? btrfs_ioctl_snap_create_transid+0x110/0x1a0 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709755] [a04bf621] ? btrfs_ioctl_snap_create_v2+0x41/0x140 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709767] [a04bf6e9] ? btrfs_ioctl_snap_create_v2+0x109/0x140 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709780] [a04c2bd2] ? btrfs_ioctl+0xc32/0x1d30 [btrfs] Jun 13 04:59:39 nas1 kernel: [110663.709785] [81151544] ? mmap_region+0x274/0x600 Jun 13 04:59:39 nas1 kernel: [110663.709790] [814c75b8] ? __do_page_fault+0x2b8/0x540 Jun 13 04:59:39 nas1 kernel: [110663.709795] [811971ca] ? do_vfs_ioctl+0x8a/0x4f0 Jun 13 04:59:39 nas1 kernel: [110663.709799] [811976d0] ? SyS_ioctl+0xa0/0xc0 Jun 13 04:59:39 nas1 kernel: [110663.709803] [814cb7b9] ? system_call_fastpath+0x16/0x1b Jun 13 04:59:39 nas1 kernel: [110663.709805] Mem-Info: Jun 13 04:59:39 nas1 kernel: [110663.709807] Node 0 DMA per-cpu: Jun 13 04:59:39 nas1 kernel: [110663.709810] CPU0: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709812] CPU1: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709814] CPU2: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709816] CPU3: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709818] CPU4: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709820] CPU5: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709822] CPU6: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709824] CPU7: hi:0, btch: 1 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709825] Node 0 DMA32 per-cpu: Jun 13 04:59:39 nas1 kernel: [110663.709828] CPU0: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709830] CPU1: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709832] CPU2: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709834] CPU3: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709836] CPU4: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709838] CPU5: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709840] CPU6: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709842] CPU7: hi: 186, btch: 31 usd: 0 Jun 13 04:59:39 nas1 kernel: [110663.709844]
[GIT PULL] Btrfs #2 for 3.16
Hi Linus, Please pull my for-linus branch: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus This has a few fixes since our last pull and a new ioctl for doing btree searches from userland. It's very similar to the existing ioctl, but lets us return larger items back down to the app. Gerhard Heift (7) commits (+183/-38): btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer (+26/-2) btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW (+15/-9) btrfs: tree_search, search_ioctl: direct copy to userspace (+33/-15) btrfs: tree_search, search_ioctl: accept varying buffer (+11/-7) btrfs: tree_search: eliminate redundant nr_items check (+7/-5) btrfs: new function read_extent_buffer_to_user (+40/-0) btrfs: new ioctl TREE_SEARCH_V2 (+51/-0) Eric Sandeen (3) commits (+11/-7): btrfs: free ulist in qgroup_shared_accounting() error path (+3/-1) btrfs: fix use of uninit ret in end_extent_writepage() (+1/-1) btrfs: fix error handling in create_pending_snapshot (+7/-5) Wang Shilong (1) commits (+7/-2): Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting Filipe Manana (1) commits (+2/-0): Btrfs: fix qgroups sanity test crash or hang Sasha Levin (1) commits (+1/-1): btrfs: prevent RCU warning when dereferencing radix tree slot Total: (13) commits fs/btrfs/extent_io.c | 39 ++- fs/btrfs/extent_io.h | 3 + fs/btrfs/ioctl.c | 147 ++ fs/btrfs/qgroup.c | 4 +- fs/btrfs/reada.c | 9 ++- fs/btrfs/tests/btrfs-tests.c | 2 +- fs/btrfs/tests/qgroup-tests.c | 2 + fs/btrfs/transaction.c| 12 ++-- include/uapi/linux/btrfs.h| 10 +++ 9 files changed, 192 insertions(+), 36 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Remove a leading '+'.
On 6/13/14, 4:34 PM, Adam Buchbinder wrote: It was added in commit aab7f3ae3eb02fb7e8a6a31d7e2ada704a3dcd5e, most likely as a copy-pasteo. Signed-off-by: Adam Buchbinder abuchbin...@google.com --- btrfs-show-super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/btrfs-show-super.c b/btrfs-show-super.c index f42cd15..ed0311f 100644 --- a/btrfs-show-super.c +++ b/btrfs-show-super.c @@ -50,7 +50,7 @@ static void print_usage(void) fprintf(stderr, \t-f : print full superblock information\n); fprintf(stderr, \t-a : print information of all superblocks\n); fprintf(stderr, \t-i super_mirror : specify which mirror to print out\n); -+fprintf(stderr, \t-F : attempt to dump superblocks with bad magic\n); + fprintf(stderr, \t-F : attempt to dump superblocks with bad magic\n); fprintf(stderr, %s\n, BTRFS_BUILD_VERSION); } Whoa, git blame blames me, but that's not the patch I sent. ;) Reviewed-by: Eric Sandeen sand...@redhat.com thanks, -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: R: Re: Slow startup of systemd-journal on BTRFS
Hi Dave On 06/13/2014 01:24 AM, Dave Chinner wrote: On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote: Goffredo Baroncelli kreij...@libero.it posted on Thu, 12 Jun 2014 13:13:26 +0200 as excerpted: systemd has a very stupid journal write pattern. It checks if there is space in the file for the write, and if not it fallocates the small amount of space it needs (it does *4 byte* fallocate calls!) and then does the write to it. All this does is fragment the crap out of the log files because the filesystems cannot optimise the allocation patterns. I checked the code, and to me it seems that the fallocate() are done in FILE_SIZE_INCREASE unit (actually 8MB). FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think actually pretty much equally bad without NOCOW set on the file. So maybe it's been fixed in systemd since the last time I looked. Yup: http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal-file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58 The reason it was changed? To save a syscall per append, not to prevent fragmentation of the file, which was the problem everyone was complaining about... thanks for pointing that. However I am performing my tests on a fedora 20 with systemd-208, which seems have this change Why? Because btrfs data blocks are 4 KiB. With COW, the effect for either 4 byte or 8 MiB file allocations is going to end up being the same, forcing (repeated until full) rewrite of each 4 KiB block into its own extent. I am reaching the conclusion that fallocate is not the problem. The fallocate increase the filesize of about 8MB, which is enough for some logging. So it is not called very often. I have to investigate more what happens when the log are copied from /run to /var/log/journal: this is when journald seems to slow all. I am prepared a PC which reboot continuously; I am collecting the time required to finish the boot vs the fragmentation of the system.journal file vs the number of boot. The results are dramatic: after 20 reboot, the boot time increase of 20-30 seconds. Doing a defrag of system.journal reduces the boot time to the original one, but after another 20 reboot, the boot time still requires 20-30 seconds more It is a slow PC, but I saw the same behavior also on a more modern pc (i5 with 8GB). For both PC the HD is a mechanical one... And that's now a btrfs problem :/ Are you sure ? ghigo@venice:/var/log$ sudo filefrag messages messages: 29 extents found ghigo@venice:/var/log$ sudo filefrag journal/*/system.journal journal/41d686199835445395ac629d576dfcb9/system.journal: 1378 extents found So the old rsyslog creates files with fewer fragments. BTRFS (but it seems also xfs) for sure highlights more this problem than other filesystem. But also systemd seems to create a lot of extens. BR G.Baroncelli Cheers, Dave. -- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix a use-after-free in the volumes code.
When a struct btrfs_fs_devices was being torn down by btrfs_close_devices(), there was an invalidated pointer in the global list fs_uuids which still pointed to it; if a device was closed and then reopened (which btrfs-convert does), freed memory would be accessed. This was found using ThreadSanitizer (pretty much doing what AddressSanitizer would, but not exiting after the first failure). To reproduce, build with -fsanitize=thread and run 'make test'. Representative output is below. This change makes the current tests TSan-clean. WARNING: ThreadSanitizer: heap-use-after-free (pid=29161) Read of size 8 at 0x7d18eee0 by main thread: #0 memcmp ??:0 #1 find_fsid .../volumes.c:81 #2 device_list_add .../volumes.c:95 #3 btrfs_scan_one_device .../volumes.c:259 #4 btrfs_scan_fs_devices .../disk-io.c:1002 #5 __open_ctree_fd .../disk-io.c:1090 #6 open_ctree_fd .../disk-io.c:1191 #7 do_convert .../btrfs-convert.c:2317 #8 main .../btrfs-convert.c:2745 Previous write of size 8 at 0x7d18eee0 by main thread: #0 free ??:0 #1 btrfs_close_devices .../volumes.c:191 #2 close_ctree .../disk-io.c:1401 #3 do_convert .../btrfs-convert.c:2300 #4 main .../btrfs-convert.c:2745 Location is heap block of size 96 at 0x7d18eee0 allocated by main thread: #0 calloc ??:0 (exe+0x0002acc6) #1 device_list_add .../volumes.c:97 #2 btrfs_scan_one_device .../volumes.c:259 #3 btrfs_scan_fs_devices .../disk-io.c:1002 #4 __open_ctree_fd .../disk-io.c:1090 #5 open_ctree_fd .../disk-io.c:1191 #6 do_convert .../btrfs-convert.c:2256 #7 main .../btrfs-convert.c:2745 Signed-off-by: Adam Buchbinder abuchbin...@google.com --- volumes.c | 1 + volumes.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/volumes.c b/volumes.c index a61928c..36f6050 100644 --- a/volumes.c +++ b/volumes.c @@ -188,6 +188,7 @@ again: goto again; } + list_del(fs_devices-list); free(fs_devices); return 0; } diff --git a/volumes.h b/volumes.h index b1ff3d0..2e960b5 100644 --- a/volumes.h +++ b/volumes.h @@ -67,7 +67,7 @@ struct btrfs_device { struct btrfs_fs_devices { u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */ - /* the device with this id has the most recent coyp of the super */ + /* the device with this id has the most recent copy of the super */ u64 latest_devid; u64 latest_trans; u64 lowest_devid; -- 2.0.0.526.g5318336 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Properly size the leafsize field in the mdrestore_struct struct.
I'd like to follow up on this a bit, because the way I found it was *weird*. MSan found an uninitialized write. Reproducing the issue through GDB showed that there's a struct mdrestore_struct type with a member of type u64 called 'leafsize' which was... half-initialized? Four bytes were uninitialized, four were initialized. Looking further, it was being set by a call to a getter function defined in ctree.h, which pulls a particular member from a superblock struct... which is of type u32. Aha! So the struct member was missized. Easy enough to fix. Except that the mdrestore_struct is initialized using calloc, so the part that wasn't getting written to was actually initialized, and the part that *was* was showing up as uninitialized. And actually contained a sane value, reliably. What was going on? Turns out that btrfs's use of zlib, which was the uninstrumented system version, caused anything allocated by uncompress() to show up as uninitialized. (Trying to use __msan_unpoison did not work, as I have an iffy install of the sanitizer in the first place (it's whatever happened to come with clang-3.3), and MSanDR (the dynamic-instrumentation tool which marks all uninstrumented libraries as good) is apparently not trivial to install. I may just build zlib.) But the point is, it would have been annoying and sad if this had been a false positive through and through, but I *did* find an actual issue here using MSan, even though MSan was pointing in the exact wrong place. Adam Buchbinder -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: some project ideas: NFS4 ACLs, resilience on the same device, allowing to specify which devices are distinct in a RAID
On Sat, 2014-06-07 at 10:25 +0200, Goffredo Baroncelli wrote: You can add new ideas to the wiki pages, supporting by link and other info were available. This is the real nature of the wiki pages. I've added some stuff now: https://btrfs.wiki.kernel.org/index.php?title=Project_ideasaction=historysubmitdiff=27001oldid=26611 @upstream-devs: Just drop the ideas wich don't fit :) Cheers, Chris. smime.p7s Description: S/MIME cryptographic signature
Re: Slow startup of systemd-journal on BTRFS
Goffredo Baroncelli posted on Sat, 14 Jun 2014 00:19:31 +0200 as excerpted: On 06/13/2014 01:24 AM, Dave Chinner wrote: On Thu, Jun 12, 2014 at 12:37:13PM +, Duncan wrote: FWIW, either 4 byte or 8 MiB fallocate calls would be bad, I think actually pretty much equally bad without NOCOW set on the file. So maybe it's been fixed in systemd since the last time I looked. Yup: http://cgit.freedesktop.org/systemd/systemd/commit/src/journal/journal- file.c?id=eda4b58b50509dc8ad0428a46e20f6c5cf516d58 The reason it was changed? To save a syscall per append, not to prevent fragmentation of the file, which was the problem everyone was complaining about... thanks for pointing that. However I am performing my tests on a fedora 20 with systemd-208, which seems have this change Why? Because btrfs data blocks are 4 KiB. With COW, the effect for either 4 byte or 8 MiB file allocations is going to end up being the same, forcing (repeated until full) rewrite of each 4 KiB block into its own extent. I am reaching the conclusion that fallocate is not the problem. The fallocate increase the filesize of about 8MB, which is enough for some logging. So it is not called very often. But... If a file isn't (properly[1]) set NOCOW (and the btrfs isn't mounted with nodatacow), then an fallocate of 8 MiB will increase the file size by 8 MiB and write that out. So far so good as at that point the 8 MiB should be a single extent. But then, data gets written into 4 KiB blocks of that 8 MiB one at a time, and because btrfs is COW, the new data in the block must be written to a new location. Which effectively means that by the time the 8 MiB is filled, each 4 KiB block has been rewritten to a new location and is now an extent unto itself. So now that 8 MiB is composed of 2048 new extents, each one a single 4 KiB block in size. =:^( Tho as I already stated, for file sizes upto 128 MiB or so anyway[2], the btrfs autodefrag mount option should at least catch that and rewrite (again), this time sequentially. I have to investigate more what happens when the log are copied from /run to /var/log/journal: this is when journald seems to slow all. That's an interesting point. At least in theory, during normal operation journald will write to /var/log/journal, but there's a point during boot at which it flushes the information accumulated during boot from the volatile /run location to the non-volatile /var/log location. /That/ write, at least, should be sequential, since there will be 4 KiB of journal accumulated that needs to be transferred at once. However, if it's being handled by the forced pre-write fallocate described above, then that's not going to be the case, as it'll then be a rewrite of already fallocated file blocks and thus will get COWed exactly as I described above. =:^( I am prepared a PC which reboot continuously; I am collecting the time required to finish the boot vs the fragmentation of the system.journal file vs the number of boot. The results are dramatic: after 20 reboot, the boot time increase of 20-30 seconds. Doing a defrag of system.journal reduces the boot time to the original one, but after another 20 reboot, the boot time still requires 20-30 seconds more It is a slow PC, but I saw the same behavior also on a more modern pc (i5 with 8GB). For both PC the HD is a mechanical one... The problem's duplicable. That's the first step toward a fix. =:^) And that's now a btrfs problem :/ Are you sure ? As they say, Whoosh! At least here, I interpreted that remark as primarily sarcastic commentary on the systemd devs' apparent attitude, which can be (controversially) summarized as: Systemd doesn't have problems because it's perfect. Therefore, any problems you have with systemd must instead be with other components which systemd depends on. IOW, it's a btrfs problem now in practice, not because it is so in a technical sense, but because systemd defines it as such and is unlikely to budge, so the only way to achieve progress is for btrfs to deal with it. An arguably fairer and more impartial assessment of this particular situations suggests that neither btrfs, which as a COW-based filesystem, like all COW-based filesystems has the existing-file-rewrite as a major technical challenge that it must deal with /somehow/, nor systemd, which in choosing to use fallocate is specifically putting itself in that existing-file-rewrite class, are entirely at fault. But that doesn't matter if one side refuses to budge, because then the other side must do so regardless of where the fault was, if there is to be any progress at all. Meanwhile, I've predicted before and do so here again, that as btrfs moves toward mainstream and starts supplanting ext* as the assumed Linux default filesystem, some of these problems will simply go away, because at that point, various apps are no longer optimized for the assumed default
Re: Btrfs filesystem read only on error
David Bloquel posted on Fri, 13 Jun 2014 12:51:32 +0200 as excerpted: Informations about my system: Linux nas1 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1 (2014-02-07) x86_64 GNU/Linux brtfs version: Btrfs v0.19 Yikes, that's old! Particularly the 0.19 btrfs-progs, tho there have been a lot of kernel btrfs fixes since the 3.12 series as well, not all of which will have been backported. Due to the rate of fixes going into btrfs, users are strongly recommended to keep up with current stable kernels, now 3.15, and not get /too/ far behind on btrfs-progs either. I'm actually a bit behind myself ATM as I've been focused on other things, kernel from git but not rebuild since 3.14, and btrfs-progs from git at v3.14.2. I'd recommend kernel 3.14 if not 3.15, and a btrfs-progs no older than 3.12. I think there are some snapshotting fixes in kernel 3.15 so you'll probably want that given the snapshotting you report doing. (I do little snapshotting and 3.14 has been pretty stable for me, the reason I've let myself get a bit behind, what with other things I have been doing.) -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html