Re: [GIT PULL] [PATCH v4 00/26] Delete CURRENT_TIME and CURRENT_TIME_SEC macros
On Monday, August 15, 2016 6:23:12 PM CEST Greg KH wrote: > On Sat, Aug 13, 2016 at 03:48:12PM -0700, Deepa Dinamani wrote: > > The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC > > macros. > > The macros are not y2038 safe. There is no plan to transition them into > > being > > y2038 safe. > > ktime_get_* api's can be used in their place. And, these are y2038 safe. > > Who are you execting to pull this huge patch series? Dave Chinner suggested to have Al Viro pick up the whole series. > Why not just introduce the new api call, wait for that to be merged, and > then push the individual patches through the different subsystems? > After half of those get ignored, then provide a single set of patches > that can go through Andrew or my trees. That was the original approach for v4.7, but (along with requesting a number of reworks that Deepa incorporated), Linus preferred doing the API change done in one chunk, see https://patchwork.kernel.org/patch/9134249/ Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix btrfs_no_printk stub helper
The addition of btrfs_no_printk() caused a build failure when CONFIG_PRINTK is disabled: fs/btrfs/send.c: In function 'send_rename': fs/btrfs/ctree.h:3367:2: error: implicit declaration of function 'btrfs_no_printk' [-Werror=implicit-function-declaration] This moves the helper outside of that #ifdef so it is always defined, and changes the existing #ifdef to refer to that helper as well for consistency. Fixes: 47c57058ff2c ("btrfs: btrfs_debug should consume fs_info when DEBUG is not defined") Signed-off-by: Arnd Bergmann --- fs/btrfs/ctree.h | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e5621268c9fe..63bf4cc34626 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3241,20 +3241,17 @@ int btrfs_parse_options(struct btrfs_root *root, char *options, unsigned long new_flags); int btrfs_sync_fs(struct super_block *sb, int wait); +static inline __printf(2, 3) +void btrfs_no_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) +{ +} + #ifdef CONFIG_PRINTK __printf(2, 3) void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...); -__printf(2, 3) -static inline int btrfs_no_printk(const struct btrfs_fs_info *fs_info, - const char *fmt, ...) -{ - return 0; -} #else -static inline __printf(2, 3) -void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) -{ -} +#define btrfs_printk(fs_info, fmt, args...) \ + btrfs_no_printk(fs_info, fmt, ##args) #endif #define btrfs_emerg(fs_info, fmt, args...) \ -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Tue, Sep 18, 2018 at 7:59 PM Jason Gunthorpe wrote: > > On Tue, Sep 18, 2018 at 10:51:08AM -0700, Darren Hart wrote: > > On Fri, Sep 14, 2018 at 09:57:48PM +0100, Al Viro wrote: > > > On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote: > > > > > > > Acked-by: Darren Hart (VMware) > > > > > > > > As for a longer term solution, would it be possible to init fops in such > > > > a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg > > > > so we don't have to duplicate this boilerplate for every ioctl fops > > > > structure? > > > > > > Bad idea, that... Because several years down the road somebody will > > > add > > > an ioctl that takes an unsigned int for argument. Without so much as > > > looking > > > at your magical mystery macro being used to initialize file_operations. > > > > Fair, being explicit in the declaration as it is currently may be > > preferable then. > > It would be much cleaner and safer if you could arrange things to add > something like this to struct file_operations: > > long (*ptr_ioctl) (struct file *, unsigned int, void __user *); > > Where the core code automatically converts the unsigned long to the > void __user * as appropriate. > > Then it just works right always and the compiler will help address > Al's concern down the road. I think if we wanted to do this with a new file operation, the best way would be to do the copy_from_user()/copy_to_user() in the caller as well. We already do this inside of some subsystems, notably drivers/media/, and it simplifies the implementation of the ioctl handler function significantly. We obviously cannot do this in general, both because of traditional drivers that have 16-bit command codes (drivers/tty and others) and also because of drivers that by accident defined the commands incorrectly and use the wrong type or the wrong direction in the definition. Arnd
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Mon, Sep 24, 2018 at 10:35 PM Jason Gunthorpe wrote: > On Mon, Sep 24, 2018 at 10:18:52PM +0200, Arnd Bergmann wrote: > > On Tue, Sep 18, 2018 at 7:59 PM Jason Gunthorpe wrote: > > > On Tue, Sep 18, 2018 at 10:51:08AM -0700, Darren Hart wrote: > > > > On Fri, Sep 14, 2018 at 09:57:48PM +0100, Al Viro wrote: > > > > > On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote: > > We already do this inside of some subsystems, notably drivers/media/, > > and it simplifies the implementation of the ioctl handler function > > significantly. We obviously cannot do this in general, both because of > > traditional drivers that have 16-bit command codes (drivers/tty and others) > > and also because of drivers that by accident defined the commands > > incorrectly and use the wrong type or the wrong direction in the > > definition. > > That could work well, but the first idea could be done globally and > mechanically, while this would require very careful per-driver > investigation. > > Particularly if the core code has worse performance.. ie due to > kmalloc calls or something. > > I think it would make more sense to start by having the core do the > case to __user and then add another entry point to have the core do > the copy_from_user, and so on. Having six separate callback pointers to implement a single system call seems a bit excessive though. Arnd
[PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
On 32-bit ARM with gcc-8, I see a link error with the addition of the CONFIG_NO_AUTO_INLINE option: fs/btrfs/super.o: In function `btrfs_statfs': super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod' super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod' fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to `__aeabi_uldivmod' follow So far this is the only file that shows the behavior, so I'd propose to just work around it by marking the functions as 'static inline' that normally get inlined here. The reference to __aeabi_uldivmod comes from a div_u64() which has an optimization for a constant division that uses a straight '/' operator when the result should be known to the compiler. My interpretation is that as we turn off inlining, gcc still expects the result to be constant but fails to use that constant value. Cc: Changbin Du Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable compiler auto-inlining") Signed-off-by: Arnd Bergmann --- fs/btrfs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index c3c1e7bee49d..b7af0b8936ad 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1922,7 +1922,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, } /* Used to sort the devices by max_avail(descending sort) */ -static int btrfs_cmp_device_free_bytes(const void *dev_info1, +static inline int btrfs_cmp_device_free_bytes(const void *dev_info1, const void *dev_info2) { if (((struct btrfs_device_info *)dev_info1)->max_avail > @@ -1951,8 +1951,8 @@ static inline void btrfs_descending_sort_devices( * The helper to calc the free space on the devices that can be used to store * file data. */ -static int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info, - u64 *free_bytes) +static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info, + u64 *free_bytes) { struct btrfs_device_info *devices_info; struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; -- 2.18.0
Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
On 11/3/18, Nikolay Borisov wrote: > On 3.11.18 г. 17:39 ч., Arnd Bergmann wrote: >> On 32-bit ARM with gcc-8, I see a link error with the addition of the >> CONFIG_NO_AUTO_INLINE option: >> >> fs/btrfs/super.o: In function `btrfs_statfs': >> super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod' >> fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to >> `__aeabi_uldivmod' follow >> >> So far this is the only file that shows the behavior, so I'd propose >> to just work around it by marking the functions as 'static inline' >> that normally get inlined here. >> >> The reference to __aeabi_uldivmod comes from a div_u64() which has an >> optimization for a constant division that uses a straight '/' operator >> when the result should be known to the compiler. My interpretation is >> that as we turn off inlining, gcc still expects the result to be constant >> but fails to use that constant value. > > I read this as "this is a compiler bug", no ? So you are providing a > hack around a compiler bug? Mostly, yes. The do_div() macro is really pushing the boundaries of what we can expect the compiler to do in terms of optimizations, and we've had problems with it in the past. IIRC the gcc developers would not classify this as a bug because the result of __builtin_constant_p() is not guaranteed to work the way we expect, it just does so most of the time. Arnd
Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
On 11/4/18, Qu Wenruo wrote: > > > On 2018/11/3 下午11:39, Arnd Bergmann wrote: >> On 32-bit ARM with gcc-8, I see a link error with the addition of the >> CONFIG_NO_AUTO_INLINE option: >> >> fs/btrfs/super.o: In function `btrfs_statfs': >> super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod' >> super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod' >> fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to >> `__aeabi_uldivmod' follow >> >> So far this is the only file that shows the behavior, so I'd propose >> to just work around it by marking the functions as 'static inline' >> that normally get inlined here. > > As a workaround it looks OK, but it's definitely not the root cause. > >> >> The reference to __aeabi_uldivmod comes from a div_u64() which has an >> optimization for a constant division that uses a straight '/' operator >> when the result should be known to the compiler. My interpretation is >> that as we turn off inlining, gcc still expects the result to be constant >> but fails to use that constant value. > > It looks more like a bug in div_u64() optimization. > > Despite this file in btrfs, did you hit the same problem for any other > file? Not this time. I've done a creduce on the file and got to this code struct kstatfs { u64 f_bfree; }; btrfs_calc_avail_data_space(p1) {} btrfs_statfs(struct kstatfs *p1) { u64 d = 0; unsigned e = 1; for (; a;) e = btrfs_bg_type_to_factor(); p1->f_bfree = div_u64(0, e) >> c; __asm__(""); div_u64(d, e); b = btrfs_calc_avail_data_space(&d); } Looking at the assembler code produced by this, it seems to be the same thing that we dealt with in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785 >> Cc: Changbin Du >> Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable >> compiler auto-inlining") > > I can't find it in the mainline kernel, is the commit hash correct? > If not merged, we should still has a chance to further polish that patch. It's in linux-next. Arnd
Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
On 11/5/18, David Sterba wrote: > On Sun, Nov 04, 2018 at 11:32:03PM +0100, Arnd Bergmann wrote: >> >> Cc: Changbin Du >> >> Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable >> >> compiler auto-inlining") >> > >> > I can't find it in the mainline kernel, is the commit hash correct? >> > If not merged, we should still has a chance to further polish that >> > patch. >> >> It's in linux-next. > > I can't find it in current linux-next either, the final reference in > Fixes: must refer to a commit in Linus' tree. Ah, right, it got rebased. The commit ID in today's linux-next is 917fad29febd. Most trees in linux-next don't rebase so this would not be an issue, but you are right that this one clearly did, so the line is wrong here. The commit is now delayed until 4.21 I assume. > You can take this fix with the patch that introduces the config option > (ack for that) in case merging through the btrfs tree would be too late > for it (ie. no common base for the git trees containg the new code and > fix). > > Or I can take it through btrfs tree in 4.20-rc cycle. In both cases the > Fixes: does not need to be there. Please take it through the btrfs tree. Let me know if you need me to extend the changelog to explain that situation better. Arnd
[PATCH] btrfs: reduce stack usage for btrfsic_process_written_block
btrfsic_process_written_block() cals btrfsic_process_metablock(), which has a fairly large stack usage due to the btrfsic_stack_frame variable. It also calls btrfsic_test_for_metadata(), which now needs several hundreds of bytes for its SHASH_DESC_ON_STACK(). In some configurations, we end up with both functions on the same stack, and gcc warns about the excessive stack usage that might cause the available stack space to run out: fs/btrfs/check-integrity.c:1743:13: error: stack frame size of 1152 bytes in function 'btrfsic_process_written_block' [-Werror,-Wframe-larger-than=] Marking both child functions as noinline_for_stack helps because this guarantees that the large variables are not on the same stack frame. Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming") Signed-off-by: Arnd Bergmann --- fs/btrfs/check-integrity.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 81a9731959a9..0b52ab4cb964 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -940,7 +940,7 @@ static void btrfsic_stack_frame_free(struct btrfsic_stack_frame *sf) kfree(sf); } -static int btrfsic_process_metablock( +static noinline_for_stack int btrfsic_process_metablock( struct btrfsic_state *state, struct btrfsic_block *const first_block, struct btrfsic_block_data_ctx *const first_block_ctx, @@ -1706,8 +1706,9 @@ static void btrfsic_dump_database(struct btrfsic_state *state) * Test whether the disk block contains a tree block (leaf or node) * (note that this test fails for the super block) */ -static int btrfsic_test_for_metadata(struct btrfsic_state *state, -char **datav, unsigned int num_pages) +static noinline_for_stack int btrfsic_test_for_metadata( + struct btrfsic_state *state, + char **datav, unsigned int num_pages) { struct btrfs_fs_info *fs_info = state->fs_info; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); -- 2.20.0
[PATCH] btrfs: add back libcrc32c Kconfig dependency
While part of btrfs now uses the crypto shash interfaces for crc32c, we still get a build time dependency in other places: fs/btrfs/super.o: In function `btrfs_mount_root': super.c:(.text+0xc0d4): undefined reference to `crc32c_impl' fs/btrfs/super.o: In function `btrfs_print_mod_info': super.c:(.init.text+0x3e28): undefined reference to `crc32c_impl' fs/btrfs/extent-tree.o: In function `lookup_inline_extent_backref': extent-tree.c:(.text+0x17750): undefined reference to `crc32c' fs/btrfs/extent-tree.o:extent-tree.c:(.text+0x177f4): more undefined references to `crc32c' follow Change Kconfig to depend on both. Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for checksumming") Signed-off-by: Arnd Bergmann --- fs/btrfs/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index 2521a24f74be..aa7453d44e59 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -3,6 +3,7 @@ config BTRFS_FS tristate "Btrfs filesystem support" select CRYPTO + select LIBCRC32C select CRYPTO_CRC32C select CRYPTO_SHA256 select ZLIB_INFLATE -- 2.20.0
Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
On Tue, Mar 20, 2018 at 7:29 AM, Linus Torvalds wrote: > On Mon, Mar 19, 2018 at 2:43 AM, David Laight wrote: >> >> Is it necessary to have the full checks for old versions of gcc? >> >> Even -Wvla could be predicated on very recent gcc - since we aren't >> worried about whether gcc decides to generate a vla, but whether >> the source requests one. > > You are correct. We could just ignore the issue with old gcc versions, > and disable -Wvla rather than worry about it. This version might also be an option: diff --git a/Makefile b/Makefile index 37fc475a2b92..49dd9f0fb76c 100644 --- a/Makefile +++ b/Makefile @@ -687,7 +687,8 @@ KBUILD_CFLAGS += $(call cc-option,-fno-reorder-blocks,) \ endif ifneq ($(CONFIG_FRAME_WARN),0) -KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN}) +KBUILD_CFLAGS += $(call cc-option,-Wstack-usage=${CONFIG_FRAME_WARN}, \ + -$(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})) endif # This selects the stack protector compiler flag. Testing it is delayed Wiht -Wstack-usage=, we should get a similar warning to -Wvla for frames that contain real VLAs, but not when there is a VLA that ends up being a compile-time constant size in the end. Wstack-usage was introduced in gcc-4.7, so on older versions it turns back into Wframe-larger-than=. An example output would be security/integrity/ima/ima_crypto.c: In function 'ima_calc_buffer_hash': security/integrity/ima/ima_crypto.c:616:5: error: stack usage might be unbounded [-Werror=stack-usage=] Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: tree-checker: use %zu format string for size_t
The return value of sizeof() is of type size_t, so we must print it using the %z format modifier rather than %l to avoid this warning on some architectures: fs/btrfs/tree-checker.c: In function 'check_dir_item': fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} [-Werror=format=] Fixes: 005887f2e3e0 ("btrfs: tree-checker: Add checker for dir item") Signed-off-by: Arnd Bergmann --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 66dac0a4b01f..7c55e3ba5a6c 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -270,7 +270,7 @@ static int check_dir_item(struct btrfs_root *root, /* header itself should not cross item boundary */ if (cur + sizeof(*di) > item_size) { dir_item_err(root, leaf, slot, - "dir item header crosses item boundary, have %lu boundary %u", + "dir item header crosses item boundary, have %zu boundary %u", cur + sizeof(*di), item_size); return -EUCLEAN; } -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: tree-checker: use %zu format string for size_t
On Thu, Dec 7, 2017 at 1:32 AM, Qu Wenruo wrote: > > > On 2017年12月06日 22:18, Arnd Bergmann wrote: >> The return value of sizeof() is of type size_t, so we must print it >> using the %z format modifier rather than %l to avoid this warning >> on some architectures: >> >> fs/btrfs/tree-checker.c: In function 'check_dir_item': >> fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type >> 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} >> [-Werror=format=] > > Any idea about which architecture will cause such warning? > On x86_64 I always fail to get such warning. I think all 32-bit architectures: #ifndef __kernel_size_t #if __BITS_PER_LONG != 64 typedef unsigned int__kernel_size_t; typedef int __kernel_ssize_t; typedef int __kernel_ptrdiff_t; #else typedef __kernel_ulong_t __kernel_size_t; typedef __kernel_long_t __kernel_ssize_t; typedef __kernel_long_t __kernel_ptrdiff_t; #endif #endif Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: qgroups: remove unused label
The recent revert left one unused label behind: fs/btrfs/qgroup.c: In function 'qgroup_reserve': fs/btrfs/qgroup.c:2432:1: error: label 'retry' defined but not used [-Werror=unused-label] Let's remove it, too. Fixes: b283738ab0ad ("Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"") Signed-off-by: Arnd Bergmann --- fs/btrfs/qgroup.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index de3b96f1005b..2b89848e767d 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2429,7 +2429,6 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes, bool enforce, capable(CAP_SYS_RESOURCE)) enforce = false; -retry: spin_lock(&fs_info->qgroup_lock); quota_root = fs_info->quota_root; if (!quota_root) -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: zoned: fix uninitialized max_chunk_size
From: Arnd Bergmann The ctl->max_chunk_size member might be used uninitialized when none of the three conditions for initializing it in init_alloc_chunk_ctl_policy_zoned() are true: In function ‘init_alloc_chunk_ctl_policy_zoned’, inlined from ‘init_alloc_chunk_ctl’ at fs/btrfs/volumes.c:5023:3, inlined from ‘btrfs_alloc_chunk’ at fs/btrfs/volumes.c:5340:2: include/linux/compiler-gcc.h:48:45: error: ‘ctl.max_chunk_size’ may be used uninitialized [-Werror=maybe-uninitialized] 4998 | ctl->max_chunk_size = min(limit, ctl->max_chunk_size); | ^~~ fs/btrfs/volumes.c: In function ‘btrfs_alloc_chunk’: fs/btrfs/volumes.c:5316:32: note: ‘ctl’ declared here 5316 | struct alloc_chunk_ctl ctl; |^~~ Initialize it to UINT_MAX and rely on the min() expression to limit it. Fixes: 1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator") Signed-off-by: Arnd Bergmann --- Note that the -Wmaybe-unintialized warning is globally disabled by default. For some reason I got this warning anyway when building this specific file with gcc-11. --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bc3b33efddc5..b42b423b6a10 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4980,6 +4980,7 @@ static void init_alloc_chunk_ctl_policy_zoned( u64 type = ctl->type; ctl->max_stripe_size = zone_size; + ctl->max_chunk_size = UINT_MAX; if (type & BTRFS_BLOCK_GROUP_DATA) { ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE, zone_size); -- 2.29.2
[PATCH] [v2] btrfs: zoned: bail out in btrfs_alloc_chunk for bad input
From: Arnd Bergmann gcc complains that the ctl->max_chunk_size member might be used uninitialized when none of the three conditions for initializing it in init_alloc_chunk_ctl_policy_zoned() are true: In function ‘init_alloc_chunk_ctl_policy_zoned’, inlined from ‘init_alloc_chunk_ctl’ at fs/btrfs/volumes.c:5023:3, inlined from ‘btrfs_alloc_chunk’ at fs/btrfs/volumes.c:5340:2: include/linux/compiler-gcc.h:48:45: error: ‘ctl.max_chunk_size’ may be used uninitialized [-Werror=maybe-uninitialized] 4998 | ctl->max_chunk_size = min(limit, ctl->max_chunk_size); | ^~~ fs/btrfs/volumes.c: In function ‘btrfs_alloc_chunk’: fs/btrfs/volumes.c:5316:32: note: ‘ctl’ declared here 5316 | struct alloc_chunk_ctl ctl; |^~~ If we ever get into this condition, something is seriously wrong, so the same logic as in init_alloc_chunk_ctl_policy_regular() and a few other places should be applied. This avoids both further data corruption, and the compile-time warning. Fixes: 1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator") Link: https://lore.kernel.org/lkml/20210323132343.gf7...@twin.jikos.cz/ Suggested-by: David Sterba Signed-off-by: Arnd Bergmann --- Note that the -Wmaybe-unintialized warning is globally disabled by default. For some reason I got this warning anyway when building this specific file with gcc-11. --- fs/btrfs/volumes.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bc3b33efddc5..d2994305ed77 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4989,6 +4989,8 @@ static void init_alloc_chunk_ctl_policy_zoned( ctl->max_chunk_size = 2 * ctl->max_stripe_size; ctl->devs_max = min_t(int, ctl->devs_max, BTRFS_MAX_DEVS_SYS_CHUNK); + } else { + BUG(); } /* We don't want a chunk larger than 10% of writable space */ -- 2.29.2
Re: [PATCH 1/4] module: Add module_subinit{_noexit} and module_subeixt helper macros
On Thu, Jul 25, 2024, at 16:39, Christoph Hellwig wrote: > On Thu, Jul 25, 2024 at 11:01:33AM +0800, Youling Tang wrote: >> - It doesn't feel good to have only one subinit/exit in a file. >> Assuming that there is only one file in each file, how do we >> ensure that the files are linked in order?(Is it sorted by *.o >> in the Makefile?) > > Yes, link order already matterns for initialization order for built-in > code, so this is a well known concept. Note: I removed the old way of entering a module a few years ago, which allowed simply defining a function called init_module(). The last one of these was a07d8ecf6b39 ("ethernet: isa: convert to module_init/module_exit"). Now I think we could just make the module_init() macro do the same thing as a built-in initcall() and put an entry in a special section, to let you have multiple entry points in a loadable module. There are still at least two problems though: - while link order is defined between files in a module, I don't think there is any guarantee for the order between two initcalls of the same level within a single file. - For built-in code we don't have to worry about matching the order of the exit calls since they don't exist there. As I understand, the interesting part of this patch series is about making sure the order matches between init and exit, so there still needs to be a way to express a pair of such calls. Arnd
[PATCH] btrfs: shut up bogus -Wmaybe-uninitialized warning
gcc sometimes can't determine whether a variable has been initialized when both the initialization and the use are conditional: fs/btrfs/props.c: In function 'inherit_props': fs/btrfs/props.c:389:4: error: 'num_bytes' may be used uninitialized in this function [-Werror=maybe-uninitialized] btrfs_block_rsv_release(fs_info, trans->block_rsv, This code is fine. Unfortunately, I cannot think of a good way to rephrase it in a way that makes gcc understand this, so I add a bogus initialization the way one should not. Fixes: d7400ee1b476 ("btrfs: use the existing reserved items for our first prop for inheritance") Signed-off-by: Arnd Bergmann --- fs/btrfs/props.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c index a9e2e66152ee..9d47ae1cf5b2 100644 --- a/fs/btrfs/props.c +++ b/fs/btrfs/props.c @@ -341,7 +341,7 @@ static int inherit_props(struct btrfs_trans_handle *trans, for (i = 0; i < ARRAY_SIZE(prop_handlers); i++) { const struct prop_handler *h = &prop_handlers[i]; const char *value; - u64 num_bytes; + u64 num_bytes = 0; if (!h->inheritable) continue; -- 2.20.0
[PATCH] btrfs: fix uninitialized variable access after ASSERT
In btrfs, ASSERT() has no effect if CONFIG_BTRFS_ASSERT is disabled, and gcc notices that this can lead to using an uninitialized variable: fs/btrfs/inode.c: In function 'run_delalloc_range': fs/btrfs/inode.c:1190:18: error: 'cur_end' may be used uninitialized in this function [-Werror=maybe-uninitialized] I assume the condition that the ASSERT checks for is actually correct and we won't get there in practice, but it's easy to modify the function to make it simpler and avoid the condition that the warning is for. Fixes: e5249f75cfd0 ("btrfs: Introduce COMPRESS reserve type to fix false enospc for compression") Signed-off-by: Arnd Bergmann --- fs/btrfs/inode.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e6f35d923d67..b1d2b38d29aa 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1175,18 +1175,14 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED, 1, 0, NULL, GFP_NOFS); while (start < end) { + ASSERT(reserve_type == BTRFS_RESERVE_COMPRESS); async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS); BUG_ON(!async_cow); /* -ENOMEM */ async_cow->inode = igrab(inode); async_cow->root = root; async_cow->locked_page = locked_page; async_cow->start = start; - - if (reserve_type == BTRFS_RESERVE_COMPRESS) - cur_end = min(end, start + SZ_512K - 1); - else - ASSERT(0); - + cur_end = min(end, start + SZ_512K - 1); async_cow->end = cur_end; INIT_LIST_HEAD(&async_cow->extents); -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 06/12] audit: Use timespec64 to represent audit timestamps
On Sat, Apr 8, 2017 at 5:58 PM, Deepa Dinamani wrote: >> I have no problem merging this patch into audit/next for v4.12, would >> you prefer me to do that so at least this patch is merged? > > This would be fine. > But, I think whoever takes the last 2 deletion patches should also take them. > I'm not sure how that part works out. > >> It would probably make life a small bit easier for us in the audit >> world too as it would reduce the potential merge conflict. However, >> that's a relatively small thing to worry about. As Andrew has picked the remaining patches up into -mm, this will work out fine: any patches picked up by the respective maintainers for v4.12 should arrive as git pull requests before the -mm patches get applied at a later stage of the merge window. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: work around maybe-uninitialized warning
A rewrite of btrfs_submit_direct_hook appears to have introduced a warning: fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook': fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this function [-Werror=maybe-uninitialized] Where the 'bio' variable was previously initialized unconditionally, it is now set in the "while (submit_len > 0)" loop that would never execute if submit_len is zero. Assuming this cannot happen in practice, we can avoid the warning by simply replacing the while{} loop with a do{}while() loop so the compiler knows that it will always be entered at least once. Fixes: 0fd27e06c61b ("Btrfs: use bio_clone_bioset_partial to simplify DIO submit") Signed-off-by: Arnd Bergmann --- --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8c37b4fa4cbb..c62cf9593cb3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8497,7 +8497,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip, /* bio split */ ASSERT(map_length <= INT_MAX); atomic_inc(&dip->pending_bios); - while (submit_len > 0) { + do { clone_len = min_t(int, submit_len, map_length); /* @@ -8540,7 +8540,7 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip, start_sector << 9, &map_length, NULL, 0); if (ret) goto out_err; - } + } while (submit_len > 0); submit: ret = __btrfs_submit_dio_bio(bio, inode, file_offset, skip_sum, -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: work around maybe-uninitialized warning
On Fri, May 19, 2017 at 8:10 PM, Liu Bo wrote: > On Thu, May 18, 2017 at 03:33:29PM +0200, Arnd Bergmann wrote: >> A rewrite of btrfs_submit_direct_hook appears to have introduced a warning: >> >> fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook': >> fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this >> function [-Werror=maybe-uninitialized] >> >> Where the 'bio' variable was previously initialized unconditionally, it >> is now set in the "while (submit_len > 0)" loop that would never execute >> if submit_len is zero. >> >> Assuming this cannot happen in practice, we can avoid the warning >> by simply replacing the while{} loop with a do{}while() loop so >> the compiler knows that it will always be entered at least once. >> > > Thanks for the fix. I think it's a false positve one and I've updated it in > v2 > with a 'struct bio *bio = NULL' to make compiler happy, could you please help > reveiw it? Right, it is a false positive and adding the =NULL initialization shuts up the warning. The reason my patch used a different approach is to make the code more robust, see https://rusty.ozlabs.org/?p=232 Generally speaking initializing a local variable to an illegal value, and later using the variable without a check for that original value is error-prone. Even though the code is correct at the moment, someone else might modify it later. My first (broken) solution avoided this by checking for the condition that led to the warning, my newer solution is nicer as it makes it much clearer to the reader what is going on, compared to the NULL initialization that does not help readability but makes it slightly harder to understand why you wrote the code specifically that way. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: avoid uninitialized variable warning
With CONFIG_SMP and CONFIG_PREEMPT both disabled, gcc decides to partially inline the get_state_failrec() function but cannot figure out that means the failrec pointer is always valid if the function returns success, which causes a harmless warning: fs/btrfs/extent_io.c: In function 'clean_io_failure': fs/btrfs/extent_io.c:2131:4: error: 'failrec' may be used uninitialized in this function [-Werror=maybe-uninitialized] This marks get_state_failrec() and set_state_failrec() both as 'noinline', which avoids the warning in all cases for me, and seems less ugly than adding a fake initialization. Signed-off-by: Arnd Bergmann Fixes: 47dc196ae719 ("btrfs: use proper type for failrec in extent_state") --- fs/btrfs/extent_io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 27577f1b10dc..76a0c8597d98 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1842,7 +1842,7 @@ out: * set the private field for a given byte offset in the tree. If there isn't * an extent_state there already, this does nothing. */ -static int set_state_failrec(struct extent_io_tree *tree, u64 start, +static noinline int set_state_failrec(struct extent_io_tree *tree, u64 start, struct io_failure_record *failrec) { struct rb_node *node; @@ -1870,7 +1870,7 @@ out: return ret; } -static int get_state_failrec(struct extent_io_tree *tree, u64 start, +static noinline int get_state_failrec(struct extent_io_tree *tree, u64 start, struct io_failure_record **failrec) { struct rb_node *node; -- 2.7.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng wrote: > On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani wrote: >> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c >> index 517838b..77204da 100644 >> --- a/drivers/block/rbd.c >> +++ b/drivers/block/rbd.c >> @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct >> rbd_obj_request *obj_request) >> { >> struct ceph_osd_request *osd_req = obj_request->osd_req; >> >> - osd_req->r_mtime = CURRENT_TIME; >> + ktime_get_real_ts(&osd_req->r_mtime); >> osd_req->r_data_offset = obj_request->offset; >> } >> >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c >> index c681762..1d3fa90 100644 >> --- a/fs/ceph/mds_client.c >> +++ b/fs/ceph/mds_client.c >> @@ -1666,6 +1666,7 @@ struct ceph_mds_request * >> ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode) >> { >> struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS); >> + struct timespec ts; >> >> if (!req) >> return ERR_PTR(-ENOMEM); >> @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, >> int op, int mode) >> init_completion(&req->r_safe_completion); >> INIT_LIST_HEAD(&req->r_unsafe_item); >> >> - req->r_stamp = current_fs_time(mdsc->fsc->sb); >> + ktime_get_real_ts(&ts); >> + req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran); > > This change causes our kernel_untar_tar test case to fail (inode's > ctime goes back). The reason is that there is time drift between the > time stamps got by ktime_get_real_ts() and current_time(). We need to > revert this change until current_time() uses ktime_get_real_ts() > internally. Hmm, the change was not supposed to have a user-visible effect, so something has gone wrong, but I don't immediately see how it relates to what you observe. ktime_get_real_ts() and current_time() use the same time base, there is no drift, but there is a difference in resolution, as the latter uses the time stamp of the last jiffies update, which may be up to one jiffy (10ms) behind the exact time we put in the request stamps here. Do you still see problems if you use current_kernel_time() instead of ktime_get_real_ts()? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng wrote: > On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani wrote: >> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz wrote: >>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng wrote: >>>> On Thu, Jun 1, 2017 at 6:22 PM, Arnd Bergmann wrote: >>>>> On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng wrote: >>>>>> On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani >>>>>> wrote: >>>>> >>>>>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c >>>>>>> index 517838b..77204da 100644 >>>>>>> --- a/drivers/block/rbd.c >>>>>>> +++ b/drivers/block/rbd.c >>>>>>> @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct >>>>>>> rbd_obj_request *obj_request) >>>>>>> { >>>>>>> struct ceph_osd_request *osd_req = obj_request->osd_req; >>>>>>> >>>>>>> - osd_req->r_mtime = CURRENT_TIME; >>>>>>> + ktime_get_real_ts(&osd_req->r_mtime); >>>>>>> osd_req->r_data_offset = obj_request->offset; >>>>>>> } >>>>>>> >>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c >>>>>>> index c681762..1d3fa90 100644 >>>>>>> --- a/fs/ceph/mds_client.c >>>>>>> +++ b/fs/ceph/mds_client.c >>>>>>> @@ -1666,6 +1666,7 @@ struct ceph_mds_request * >>>>>>> ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int >>>>>>> mode) >>>>>>> { >>>>>>> struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS); >>>>>>> + struct timespec ts; >>>>>>> >>>>>>> if (!req) >>>>>>> return ERR_PTR(-ENOMEM); >>>>>>> @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client >>>>>>> *mdsc, int op, int mode) >>>>>>> init_completion(&req->r_safe_completion); >>>>>>> INIT_LIST_HEAD(&req->r_unsafe_item); >>>>>>> >>>>>>> - req->r_stamp = current_fs_time(mdsc->fsc->sb); >>>>>>> + ktime_get_real_ts(&ts); >>>>>>> + req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran); >>>>>> >>>>>> This change causes our kernel_untar_tar test case to fail (inode's >>>>>> ctime goes back). The reason is that there is time drift between the >>>>>> time stamps got by ktime_get_real_ts() and current_time(). We need to >>>>>> revert this change until current_time() uses ktime_get_real_ts() >>>>>> internally. >>>>> >>>>> Hmm, the change was not supposed to have a user-visible effect, so >>>>> something has gone wrong, but I don't immediately see how it >>>>> relates to what you observe. >>>>> >>>>> ktime_get_real_ts() and current_time() use the same time base, there >>>>> is no drift, but there is a difference in resolution, as the latter uses >>>>> the time stamp of the last jiffies update, which may be up to one jiffy >>>>> (10ms) behind the exact time we put in the request stamps here. >>>>> >>>>> Do you still see problems if you use current_kernel_time() instead of >>>>> ktime_get_real_ts()? >>>> >>>> The problem disappears after using current_kernel_time(). >>>> >>>> https://github.com/ceph/ceph-client/commit/2e0f648da23167034a3cf1500bc90ec60aef2417 >>> >>> From the commit above: >>> "It seems there is time drift between ktime_get_real_ts() and >>> current_kernel_time()" >>> >>> Its more of a granularity difference. current_kernel_time() returns >>> the cached time at the last tick, where as ktime_get_real_ts() reads >>> the clocksource hardware and returns the immediate time. >>> >>> Filesystems usually use the cached time (similar to >>> CLOCK_REALTIME_COARSE), for performance reasons, as touching the >>> clocksource takes time. >> >> Alternatively, it would be best for this code also to use current_time(). >> I had suggested this in one of the previous versions
Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
On Fri, Jun 2, 2017 at 12:10 PM, Yan, Zheng wrote: > On Fri, Jun 2, 2017 at 5:45 PM, Arnd Bergmann wrote: >> On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng wrote: >>> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani >>> wrote: >>>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz wrote: >>>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng wrote: >> >> I believe the bug you see is the result of the two timestamps >> currently being almost guaranteed to be different in the latest >> kernels. >> Changing r_stamp to use current_kernel_time() will make it the >> same value most of the time (as it was before Deepa's patch), >> but when the timer interrupt happens between the timestamps, >> the two are still different, it's just much harder to hit. >> >> I think the proper solution should be to change __ceph_setattr() >> in a way that has req->r_stamp always synchronized with i_ctime. >> If we copy i_ctime to r_stamp, that will also take care of the >> future issues with the planned changes to current_time(). >> > I already have a patch > https://github.com/ceph/ceph-client/commit/24f54cd18e195a002ee3d2ab50dbc952fd9f82af Looks good to me. In case anyone cares: Acked-by: Arnd Bergmann >> The part I don't understand is what else r_stamp (i.e. the time >> stamp in ceph_msg_data with type== >> CEPH_MSG_CLIENT_REQUEST) is used for, other than setting >> ctime in CEPH_MDS_OP_SETATTR. >> >> Will this be used to update the stored i_ctime for other operations >> too? If so, we would need to synchronize it with the in-memory >> i_ctime for all operations that do this. >> > > yes, mds uses it to update ctime of modified inodes. For example, > when handling mkdir, mds set ctime of both parent inode and new inode > to r_stamp. I see, so we may have a variation of that problem there as well: From my reading of the code, the child inode is not in memory yet, so that seems fine, but I could not find where the parent in-memory inode i_ctime is updated in ceph, but it is most likely not the same as req->r_stamp (assuming it gets updated at all). Would it make sense require all callers of ceph_mdsc_do_request() to update r_stamp at the same time as i_ctime to keep them in sync? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
On Fri, Jun 2, 2017 at 1:18 PM, Yan, Zheng wrote: > On Fri, Jun 2, 2017 at 6:51 PM, Arnd Bergmann wrote: >> On Fri, Jun 2, 2017 at 12:10 PM, Yan, Zheng wrote: >>> On Fri, Jun 2, 2017 at 5:45 PM, Arnd Bergmann wrote: >>>> On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng wrote: >>>>> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani >>>>> wrote: >>>>>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz >>>>>> wrote: >>>>>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng wrote: >>>> >>>> I believe the bug you see is the result of the two timestamps >>>> currently being almost guaranteed to be different in the latest >>>> kernels. >>>> Changing r_stamp to use current_kernel_time() will make it the >>>> same value most of the time (as it was before Deepa's patch), >>>> but when the timer interrupt happens between the timestamps, >>>> the two are still different, it's just much harder to hit. >>>> >>>> I think the proper solution should be to change __ceph_setattr() >>>> in a way that has req->r_stamp always synchronized with i_ctime. >>>> If we copy i_ctime to r_stamp, that will also take care of the >>>> future issues with the planned changes to current_time(). >>>> >>> I already have a patch >>> https://github.com/ceph/ceph-client/commit/24f54cd18e195a002ee3d2ab50dbc952fd9f82af >> >> Looks good to me. In case anyone cares: >> Acked-by: Arnd Bergmann >> >>>> The part I don't understand is what else r_stamp (i.e. the time >>>> stamp in ceph_msg_data with type== >>>> CEPH_MSG_CLIENT_REQUEST) is used for, other than setting >>>> ctime in CEPH_MDS_OP_SETATTR. >>>> >>>> Will this be used to update the stored i_ctime for other operations >>>> too? If so, we would need to synchronize it with the in-memory >>>> i_ctime for all operations that do this. >>>> >>> >>> yes, mds uses it to update ctime of modified inodes. For example, >>> when handling mkdir, mds set ctime of both parent inode and new inode >>> to r_stamp. >> >> I see, so we may have a variation of that problem there as well: From >> my reading of the code, the child inode is not in memory yet, so >> that seems fine, but I could not find where the parent in-memory inode >> i_ctime is updated in ceph, but it is most likely not the same as >> req->r_stamp (assuming it gets updated at all). > > i_ctime is updated when handling request reply, by ceph_fill_file_time(). > __ceph_setattr() can update the in-memory inode's ctime after request > reply is received. The difference between ktime_get_real_ts() and > current_time() can be larger than round-trip time of request. So it's > still possible that __ceph_setattr() make ctime go back. But the __ceph_setattr() problem should be fixed by your patch, right? What I meant is another related problem in ceph_mkdir() where the i_ctime field of the parent inode is different between the persistent representation in the mds and the in-memory representation. Arnd >> Would it make sense require all callers of ceph_mdsc_do_request() >> to update r_stamp at the same time as i_ctime to keep them in sync? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
On Fri, Jun 2, 2017 at 2:18 PM, Yan, Zheng wrote: > On Fri, Jun 2, 2017 at 7:33 PM, Arnd Bergmann wrote: >> On Fri, Jun 2, 2017 at 1:18 PM, Yan, Zheng wrote: >> What I meant is another related problem in ceph_mkdir() where the >> i_ctime field of the parent inode is different between the persistent >> representation in the mds and the in-memory representation. >> > > I don't see any problem in mkdir case. Parent inode's i_ctime in mds is set to > r_stamp. When client receives request reply, it set its in-memory inode's > ctime > to the same time stamp. Ok, I see it now, thanks for the clarification. Most other file systems do this the other way round and update all fields in the in-memory inode structure first and then write that to persistent storage, so I was getting confused about the order of events here. If I understand it all right, we have three different behaviors in ceph now, though the differences are very minor and probably don't ever matter: - in setattr(), we update ctime in the in-memory inode first and then send the same time to the mds, and expect to set it again when the reply comes. - in ceph_write_iter write() and mmap/page_mkwrite(), we call file_update_time() to set i_mtime and i_ctime to the same timestamp first once a write is observed by the fs and then take two other timestamps that we send to the mds, and update the in-memory inode a second time when the reply comes. ctime is never older than mtime here, as far as I can tell, but it may be newer when the timer interrupt happens between taking the two stamps. - in all other calls, we only update the inode (and/or parent inode) after the reply arrives. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: tree-checker: use %zu format string for size_t
We now get a harmless compile-time on 32-bit architectures: fs/btrfs/tree-checker.c: In function 'check_extent_data_item': fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=] This changes the format string to use %zu instead of %lu for size_t. Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for check_extent_data_item") Signed-off-by: Arnd Bergmann --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index c7a09cc2a17c..388fb6553aa5 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -186,7 +186,7 @@ static int check_extent_data_item(struct btrfs_root *root, /* Regular or preallocated extent has fixed item size */ if (item_size != sizeof(*fi)) { file_extent_err(root, leaf, slot, - "invalid item size for reg/prealloc file extent, have %u expect %lu", + "invalid item size for reg/prealloc file extent, have %u expect %zu", item_size, sizeof(*fi)); return -EUCLEAN; } -- 2.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: next/master build: 214 builds: 29 failed, 185 passed, 29 errors, 68 warnings (next-20171016)
On Tue, Oct 17, 2017 at 1:02 AM, kernelci.org bot wrote: > > next/master build: 214 builds: 29 failed, 185 passed, 29 errors, 68 warnings > (next-20171016) > Full Build Summary: > https://kernelci.org/build/next/branch/master/kernel/next-20171016/ > Tree: next > Branch: master > Git Describe: next-20171016 > Git Commit: babb43f85f5fc03482aafa461bdc2d38b9345361 > Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > Built: 4 unique architectures > > Build Failures Detected: > > arm: gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05) > allmodconfig FAIL > mips: gcc version 6.3.0 (GCC) > bigsur_defconfig FAIL > bmips_be_defconfig FAIL > bmips_stb_defconfig FAIL > cavium_octeon_defconfig FAIL > ip27_defconfig FAIL > loongson3_defconfig FAIL > malta_defconfig FAIL > malta_kvm_defconfig FAIL > maltaaprp_defconfig FAIL > maltasmvp_defconfig FAIL > maltasmvp_eva_defconfig FAIL > maltaup_defconfig FAIL > maltaup_xpa_defconfig FAIL > mips_paravirt_defconfig FAIL > msp71xx_defconfig FAIL > nlm_xlp_defconfig FAIL > nlm_xlr_defconfig FAIL > pistachio_defconfig FAIL > sb1250_swarm_defconfig FAIL > xway_defconfig FAIL > x86: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) > allmodconfig FAIL > allmodconfig+CONFIG_OF=n FAIL > defconfig+CONFIG_KASAN=y FAIL > defconfig+CONFIG_LKDTM=y FAIL > defconfig+CONFIG_OF_UNITTEST=y FAIL > defconfig+kselftest FAIL > i386_defconfig FAIL > x86_64_defconfig FAIL > > Errors summary: > 20 arch/mips/include/asm/smp.h:32:29: error: 'CONFIG_MIPS_NR_CPU_NR_MAP' > undeclared here (not in a function) These are all caused by commit e79824d71155 ("MIPS: Allow __cpu_number_map to be larger than NR_CPUS"), which was apparently incomplete. I'm not sending a patch since I don't know what the intention was here. Reverting that commit avoids the problem. > 7 drivers/gpu/drm/i915/i915_gem.c:3092:54: error: 'flags' undeclared (first > use in this function) A mismerge in today's linux-next, I assume it will be resolved in the following linux-next > 1 ERROR: "__aeabi_uldivmod" [drivers/net/ethernet/intel/i40e/i40e.ko] > undefined! I haven't reproduced this one yet, will have a look. > 1 /bin/sh: 1: > /home/buildslave/workspace/kernel-single-defconfig-builder/defconfig/defconfig+CONFIG_LKDTM=y/label/builder/build-x86/tools/objtool//fixdep: > Permission denied This is an old build problem with objtool. It only shows up sometimes when we hit a certain race. Josh Poimboef suggested a workaround but can't reproduce the problem here locally, so I haven't some a patch. > Warnings summary: > 16 fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not > used [-Wunused-function] > 16 fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' > defined but not used [-Wunused-function] I sent a patch a few days ago, should get merged soon. This also happens in mainline. > 7 include/linux/typecheck.h:11:18: warning: comparison of distinct pointer > types lacks a cast > 5 fs/f2fs/node.c:1654:5: warning: suggest parentheses around assignment used > as truth value [-Wparentheses] > 5 fs/f2fs/node.c:1556:5: warning: suggest parentheses around assignment used > as truth value [-Wparentheses] > 5 fs/f2fs/node.c:1443:5: warning: suggest parentheses around assignment used > as truth value [-Wparentheses] > 5 fs/f2fs/node.c:1289:5: warning: suggest parentheses around assignment used > as truth value [-Wparentheses] > 5 fs/f2fs/checkpoint.c:322:5: warning: suggest parentheses around assignment > used as truth value [-Wparentheses] This came up today, I just sent a patch. > 2 fs/btrfs/tree-checker.c:186:4: warning: format '%lu' expects argument of > type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=] > 1 fs/btrfs/tree-checker.c:186:70: warning: format '%lu' expects argument of > type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=] I sent a patch a few days ago, Dave Sterba reviewed it, but it hasn't gone in yet. > 1 fs/binfmt_elf_fdpic.c:1501:17: warning: unused variable 'addr' > [-Wunused-variable] Al Viro applied my patch, but it hasn't made it into linux-next yet. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] Update LZO compression
On Tuesday 09 October 2012, Markus F.X.J. Oberhumer wrote: > > > > : This commit updates the kernel LZO code to the current upsteam version > > : which features a significant speed improvement - benchmarking the Calgary > > : and Silesia test corpora typically shows a doubled performance in > > : both compression and decompression on modern i386/x86_64/powerpc machines. > > > > There are significant clients of the LZO library - crypto, btrfs, > > jffs2, ubifs, squashfs and zcache. So let's give all those people a cc > > and ask that they test the LZO changes once they land in linux-next. > > For correctness and performance, please. > > The core compression and decompression code has been thoroughly tested, so I > do not expect major problems. > > Good testing after the merge and feedback about build or performance issues > (and improvements!) is highly appreciated. The addition of the lzo tree to linux-next caused this problem for ARM imx_v6_v7_defconfig: In file included from /home/arnd/linux-arm/arch/arm/boot/compressed/decompress.c:40:0: /home/arnd/linux-arm/arch/arm/boot/compressed/../../../../lib/decompress_unlzo.c:34:34: fatal error: lzo/lzo1x_decompress.c: No such file or directory Since the file was renamed, anything including it needs to be updated to the new file name. Signed-off-by: Arnd Bergmann diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c index 4531294..960183d 100644 --- a/lib/decompress_unlzo.c +++ b/lib/decompress_unlzo.c @@ -31,7 +31,7 @@ */ #ifdef STATIC -#include "lzo/lzo1x_decompress.c" +#include "lzo/lzo1x_decompress_safe.c" #else #include #endif -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Y2038] [PATCH v2 00/24] Delete CURRENT_TIME and CURRENT_TIME_SEC macros
On Monday, June 20, 2016 11:03:01 AM CEST you wrote: > On Sun, Jun 19, 2016 at 5:26 PM, Deepa Dinamani > wrote: > > The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC > > macros. > Gcc handles 8-byte structure returns (on most architectures) by > returning them as two 32-bit registers (%edx:%eax on x86). But once it > is timespec64, that will no longer be the case, and the calling > convention will end up using a pointer to the local stack instead. I guess we already have that today, as the implementation of current_fs_time() is static inline struct timespec64 tk_xtime(struct timekeeper *tk) { struct timespec64 ts; ts.tv_sec = tk->xtime_sec; ts.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift); return ts; } extern struct timespec64 current_kernel_time64(void); struct timespec64 current_kernel_time64(void) { struct timekeeper *tk = &tk_core.timekeeper; struct timespec64 now; unsigned long seq; do { seq = read_seqcount_begin(&tk_core.seq); now = tk_xtime(tk); } while (read_seqcount_retry(&tk_core.seq, seq)); return now; } static inline struct timespec current_kernel_time(void) { struct timespec64 now = current_kernel_time64(); return timespec64_to_timespec(now); } extern struct timespec current_fs_time(struct super_block *sb); struct timespec current_fs_time(struct super_block *sb) { struct timespec now = current_kernel_time(); return timespec_trunc(now, sb->s_time_gran); } We can surely do a little better than this, independent of the conversion in Deepa's patch set. > So for 32-bit code generation, we *may* want to introduce a new model of doing > > set_inode_time(inode, ATTR_ATIME | ATTR_MTIME); > > which basically just does > > inode->i_atime = inode->i_mtime = current_time(inode); > > but with a much easier calling convention on 32-bit architectures. > > But that is entirely orthogonal to this patch-set, and should be seen > as a separate issue. I've played around with that, but found it hard to avoid going through memory other than going all the way to the tk_xtime() access to copy both tk->xtime_sec and the nanoseconds into the inode fields. Without that, the set_inode_time() implementation ends up being more expensive than inode->i_atime = inode->i_ctime = inode->i_mtime = current_time(inode); because we still copy through the stack but also have a couple of conditional branches that we don't have at the moment. At the moment, the triple assignment becomes (here on ARM) c: 4668mov r0, sp 12: f7ff fffe bl 0 3e: f107 0520 add.w r5, r7, #32 12: R_ARM_THM_CALL current_kernel_time64 16: f106 0410 add.w r4, r6, #16 1a: e89d 000f ldmia.w sp, {r0, r1, r2, r3} # load from stack 1e: e885 000f stmia.w r5, {r0, r1, r2, r3} # store into i_atime 22: e884 000f stmia.w r4, {r0, r1, r2, r3} #i_ctime 26: e886 000f stmia.w r6, {r0, r1, r2, r3} #i_mtime and a slightly more verbose version of the same thing on x86 (storing only 12 bytes instead of 16 is cheaper there, while ARM does a store-multiple to copy the entire structure). Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Y2038] [PATCH v2 00/24] Delete CURRENT_TIME and CURRENT_TIME_SEC macros
On Sunday, June 19, 2016 5:26:59 PM CEST Deepa Dinamani wrote: > The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC > macros. > The macros are not y2038 safe. There is no plan to transition them into being > y2038 safe. > ktime_get_* api's can be used in their place. And, these are y2038 safe. > > Thanks to Arnd Bergmann for all the guidance and discussions. > > Patches 2-4 were mostly generated using coccinelle scripts. > > All filesystem timestamps use current_fs_time() for right granularity as > mentioned in the respective commit texts of patches. This has a changed > signature, renamed to current_time() and moved to the fs/inode.c. > > This series also serves as a preparatory series to transition vfs to 64 bit > timestamps as outlined here: https://lkml.org/lkml/2016/2/12/104 . > > As per Linus's suggestion in https://lkml.org/lkml/2016/5/24/663 , all the > inode timestamp changes have been squashed into a single patch. Also, > current_time() now is used as a single generic vfs filesystem timestamp api. > It also takes struct inode* as argument instead of struct super_block*. > Posting all patches together in a bigger series so that the big picture is > clear. > > As per the suggestion in https://lwn.net/Articles/672598/, CURRENT_TIME macro > bug fixes are being handled in a series separate from transitioning vfs to > use. I've looked in detail at all the patches in this version now, and while overall everything is fine, I found that two patches cannot be part of the series because of the dependency on the patch that John already took (adding time64_to_tm), but I think that's ok because we just need to change over all the users of CURRENT_TIME and CURRENT_TIME_SEC that assign to inode timestamps in order to prepare for the type change, the other ones can be changed later. I also found a few things that could be done differently to make the later conversion slightly easier, but it's also possible that I missed part of your bigger plan for those files, and none of them seem important. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Y2038] [PATCH v3 00/24] Delete CURRENT_TIME_SEC and replace current_fs_time()
On Saturday, June 25, 2016 2:37:24 PM CEST Deepa Dinamani wrote: > The series is aimed at getting rid of CURRENT_TIME, CURRENT_TIME_SEC macros > and replacing current_fs_time() with current_time(). > The macros are not y2038 safe. There is no plan to transition them into being > y2038 safe. > ktime_get_* api's can be used in their place. And, these are y2038 safe. > > CURRENT_TIME will be deleted after 4.8 rc1 as there is a dependency function > time64_to_tm() for one of the CURRENT_TIME occurance. > > Thanks to Arnd Bergmann for all the guidance and discussions. > > Patches 3-5 were mostly generated using coccinelle. > > All filesystem timestamps use current_fs_time() for right granularity as > mentioned in the respective commit texts of patches. This has a changed > signature, renamed to current_time() and moved to the fs/inode.c. > > This series also serves as a preparatory series to transition vfs to 64 bit > timestamps as outlined here: https://lkml.org/lkml/2016/2/12/104 . > > As per Linus's suggestion in https://lkml.org/lkml/2016/5/24/663 , all the > inode timestamp changes have been squashed into a single patch. Also, > current_time() now is used as a single generic vfs filesystem timestamp api. > It also takes struct inode* as argument instead of struct super_block*. > Posting all patches together in a bigger series so that the big picture is > clear. > > As per the suggestion in https://lwn.net/Articles/672598/, CURRENT_TIME macro > bug fixes are being handled in a series separate from transitioning vfs to > use. > Everything in this version looks good to me. Please add Reviewed-by: Arnd Bergmann and send a pull request to Al Viro, based on the latest linux-4.7-rc release. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs
On Tuesday 04 January 2011 06:40:32 Shaohua Li wrote: > +static int ioctl_metadata_incore(struct file *filp, void __user *argp) > +{ > + struct super_block *sb = filp->f_path.dentry->d_inode->i_sb; > + struct metadata_incore_args args; > + struct metadata_incore_ent ent; > + loff_t offset, last_offset = 0; > + ssize_t size, last_size = 0; > + __u64 __user vec_addr; __user only makes sense on pointers. Just make this a "struct metadata_incore_ent __user *", which will also take care of the "sparse" warnings you get from the copy_to_user lines below. > > +struct metadata_incore_ent { > + __u64 offset; > + __u32 size; > + __u32 unused; > +}; > + > +struct metadata_incore_args { > + __u64 offset; /* offset in meta address */ > + __u64 __user vec_addr; /* vector's address */ > + __u32 vec_size; /* vector's size */ > + __u32 unused; > +}; We usually try hard to avoid ioctls with indirect pointers in them. The implementation is correct (most people get this wrong), besides the extraneous __user keyword in there. Have you tried passing just a single metadata_incore_ent at the ioctl and looping in user space? I would guess the extra overhead of that would be small enough, but that might need to be measured. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs
On Tuesday 04 January 2011 06:40:37 Shaohua Li wrote: > /* > * When you add any new common ioctls to the switches above and below > * please update compat_sys_ioctl() too. > @@ -664,6 +682,9 @@ int do_vfs_ioctl(struct file *filp, unsi > case FIMETADATA_INCORE: > return ioctl_metadata_incore(filp, argp); > > + case FIMETADATA_READAHEAD: > + return ioctl_metadata_readahead(filp, argp); > + > default: > if (S_ISREG(filp->f_path.dentry->d_inode->i_mode)) > error = file_ioctl(filp, cmd, arg); Did you notice the comment above the function? ;-) You should really add the new ioctls to compat_sys_ioctl, not to the COMPATIBLE_IOCTL() list, in order to make the behavior consistent between 32 and 64 bit user space. The main difference is that all ioctl commands that are hardcoded in the functions get called before trying to call the file system specific .ioctl method. > +struct metadata_readahead_args { > + __u64 offset; > + __u64 size; > +}; > + > #define NR_FILE 8192 /* this can well be larger on a larger system */ > > #define MAY_EXEC 1 > @@ -338,6 +343,7 @@ struct metadata_incore_args { > #define FITHAW _IOWR('X', 120, int)/* Thaw */ > #define FITRIM _IOWR('X', 121, struct fstrim_range)/* Trim */ > #define FIMETADATA_INCORE _IOWR('X', 122, struct metadata_incore_args) > +#define FIMETADATA_READAHEAD _IOR('X', 123, struct metadata_readahead_args) This should be _IOW, not _IOR. Otherwise looks good. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs
On Wednesday 05 January 2011 03:11:36 Shaohua Li wrote: > > Did you notice the comment above the function? ;-) > > > > You should really add the new ioctls to compat_sys_ioctl, not > > to the COMPATIBLE_IOCTL() list, in order to make the behavior > > consistent between 32 and 64 bit user space. The main difference > > is that all ioctl commands that are hardcoded in the functions > > get called before trying to call the file system specific > > .ioctl method. > Thanks, fixed them. The patch you posted still uses COMPATIBLE_IOCTL. Wrong patch? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs
On Wednesday 05 January 2011 10:09:20 Arnd Bergmann wrote: > > Thanks, fixed them. > > The patch you posted still uses COMPATIBLE_IOCTL. Wrong patch? On a second look, I noticed that you now have both the COMPATIBLE_IOCTL and the case statement in compat_sys_ioctl. The former can be dropped. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs
On Wednesday 05 January 2011 03:17:16 Shaohua Li wrote: > On Tue, 2011-01-04 at 17:40 +0800, Arnd Bergmann wrote: > > > Have you tried passing just a single metadata_incore_ent > > at the ioctl and looping in user space? I would guess the > > extra overhead of that would be small enough, but that might > > need to be measured. > metadata usually isn't continuous, so this means we have a lot of > metadata_incore_ent entries. And this is called at boot time and I want > to make the overhead as low as possible to not impact boot. Unless there > are certain reasons we can't use indirect pointers, I'd like to make > kernel return a vector of entries. It's not a strict rule, but the indirect data passing is rather ugly and I'd only do that if the difference can be /measured/. If the purpose is to speed up boot time by preloading metadata, the FIMETADATA_INCORE operations should of course not take a significant amount of time compared to the actual preloading, but as long as it's less than one percent of the time you need for the preload, I would just use the simpler interface. > @@ -882,6 +882,7 @@ COMPATIBLE_IOCTL(FIGETBSZ) > /* 'X' - originally XFS but some now in the VFS */ > COMPATIBLE_IOCTL(FIFREEZE) > COMPATIBLE_IOCTL(FITHAW) > +COMPATIBLE_IOCTL(FIMETADATA_INCORE) > COMPATIBLE_IOCTL(KDGETKEYCODE) > COMPATIBLE_IOCTL(KDSETKEYCODE) > COMPATIBLE_IOCTL(KDGKBTYPE) This change can go away as well. Two more general comments: - You probably want to add the ioctls to file_ioctl instead of do_vfs_ioctl, so you don't add another case statement to the common path. - I don't know if there are any rules for what should be an ioctl or an fcntl, we're rather inconsistent about this. If you have found a good reason for making it an ioctl, just put that into the changelog so we can refer to it next time. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs
On Thursday 06 January 2011, Shaohua Li wrote: > I don't understand. adding a case statement in compat_sys_ioctl, so we will do > compat_ioctl_check_table(). If I add COMPATIBLE_IOCTL(), then the check > will success, we will go to the found_handler code path and execute > do_vfs_ioctl, which is what we want. if not adding COMPATIBLE_IOCTL(), > the check will fail, and in any case, we will go to the out_fput code > path, so our ioctl does nothing. You are correct, I misremembered the code and did not check properly. > > Two more general comments: > > > > - You probably want to add the ioctls to file_ioctl instead of do_vfs_ioctl, > > so you don't add another case statement to the common path. > > > > - I don't know if there are any rules for what should be an ioctl or an > > fcntl, we're rather inconsistent about this. If you have found a good > > reason for making it an ioctl, just put that into the changelog so we > > can refer to it next time. > it can be applied to a directory too. I thought file_ioctl or fcntl is > for file. Right again, good point! Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs
On Thursday 06 January 2011, Shaohua Li wrote: > Subject: add metadata_incore ioctl in vfs > > Add an ioctl to dump filesystem's metadata in memory in vfs. Userspace > collects > such info and uses it to do metadata readahead. > Filesystem can hook to super_operations.metadata_incore to get metadata in > specific approach. Next patch will give an example how to implement > .metadata_incore in btrfs. > > Signed-off-by: Shaohua Li Looks great! Reviewed-by: Arnd Bergmann -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs
On Thursday 06 January 2011, Shaohua Li wrote: > Subject: add metadata_readahead ioctl in vfs > > Add metadata readahead ioctl in vfs. Filesystem can hook to > super_operations.metadata_readahead to handle filesystem specific task. > Next patch will give an example how btrfs implements it. > > Signed-off-by: Shaohua Li Reviewed-by: Arnd Bergmann -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Oops when mounting btrfs partition
As mentioned on Google+, I have a partition that I can no longer mount normally, containing a lot of my personal data and all backups from my laptop. I found now that I am still able to mount it using the 'nospace_cache' option, but it takes a couple of minutes and I get "INFO: task btrfs-transacti:1698 blocked for more than 120 seconds." messages reporting the thread to be stuck in [] wait_for_completion+0x1d/0x20 [] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs] [] btrfs_run_ordered_operations+0x290/0x2f0 [btrfs] [] btrfs_commit_transaction+0x5f/0xab0 [btrfs] [] transaction_kthread+0x1bd/0x240 [btrfs] [ 2040.620221] [] kthread+0xc0/0xd0 but that seems harmless otherwise. The system seems slow but usable after this, but I have seen a corrupt "akonadi" database after that, so I'm still suspicious and will try to rescue the data and reformat the partition before I do any real work on this again. Looking back at the log I actually found multiple errors. There was a btrfs checksum error a couple of weeks ago, but fsck did not find anything, so I blamed that on a spurious RAM error. This is the earliest WARNING I found in the logs I still have: Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 15619835 off 454656 csum 2755731641 private 864823192 Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0 ... Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify failed on 17006399488 wanted 54700 found 54764 Jan 21 16:35:40 localhost kernel: [1655047.752517] parent transid verify failed on 17006399488 wanted 54700 found 54764 Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error corrected: ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288) Jan 21 16:35:40 localhost kernel: [1655047.752704] btrfs: run_one_delayed_ref returned -5 Jan 21 16:35:40 localhost kernel: [1655047.752706] [ cut here ] Jan 21 16:35:40 localhost kernel: [1655047.752746] WARNING: at /build/buildd/linux-3.5.0/fs/btrfs/super.c:221 __btrfs_abort_transaction+0xad/0xc0 [btrfs]() Jan 21 16:35:40 localhost kernel: [1655047.752747] Hardware name: P5Q-EM Jan 21 16:35:40 localhost kernel: [1655047.752748] btrfs: Transaction aborted Jan 21 16:35:40 localhost kernel: [1655047.752750] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 bnep rfcomm parport_pc ppdev bluetooth lp parport binfmt_misc dm_crypt coretemp kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek microcode snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc asus_atk0110 mac_hid btrfs zlib_deflate libcrc32c hid_generic i915 firewire_ohci drm_kms_helper firewire_core crc_itu_t r8169 drm i2c_algo_bit video pata_marvell usbhid hid usb_storage Jan 21 16:35:40 localhost kernel: [1655047.752789] Pid: 7652, comm: btrfs-endio-wri Tainted: GW3.5.0-7-generic #7-Ubuntu Jan 21 16:35:40 localhost kernel: [1655047.752791] Call Trace: Jan 21 16:35:40 localhost kernel: [1655047.752800] [] warn_slowpath_common+0x7f/0xc0 Jan 21 16:35:40 localhost kernel: [1655047.752803] [] warn_slowpath_fmt+0x46/0x50 Jan 21 16:35:40 localhost kernel: [1655047.752814] [] __btrfs_abort_transaction+0xad/0xc0 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752826] [] btrfs_run_delayed_refs+0x3f4/0x440 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752840] [] ? merge_state+0xd9/0x150 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752845] [] ? _raw_spin_lock+0xe/0x20 Jan 21 16:35:40 localhost kernel: [1655047.752856] [] ? block_rsv_release_bytes+0x131/0x190 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752869] [] __btrfs_end_transaction+0x9f/0x350 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752882] [] btrfs_end_transaction+0x15/0x20 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752895] [] btrfs_finish_ordered_io+0x17d/0x410 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752907] [] finish_ordered_fn+0x15/0x20 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752921] [] worker_loop+0x15f/0x5a0 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752923] [] ? __schedule+0x3cf/0x7c0 Jan 21 16:35:40 localhost kernel: [1655047.752937] [] ? btrfs_queue_worker+0x330/0x330 [btrfs] Jan 21 16:35:40 localhost kernel: [1655047.752941] [] kthread+0x93/0xa0 Jan 21 16:35:40 localhost kernel: [1655047.752943] [] kernel_thread_helper+0x4/0x10 Jan 21 16:35:40 localhost kernel: [1655047.752946] [] ? flush_kthread_worker+0x80/0x80 Jan 21 16:35:40 localhost kernel: [1655047.752948] [] ? gs_change+0x13/0x13 Jan 21 16:35:40 localhost kernel: [1655047.752950] ---[ end trace 5ec1dfe1fd2d ]--- Jan 21 16:35:40 localhost kernel: [1655047.752952] BTRFS error (device sdb1) in btrfs_run_delayed_refs:2455: IO failure Jan 21 16:35:40 localhost kernel: [1655047.75
Re: Oops when mounting btrfs partition
On Saturday 02 February 2013 10:20:35 Chris Mason wrote: > Hi Arnd, > > First things first, nospace_cache is a safe thing to use. It is slow > because it's finding free extents, but it's just a cache and always safe > to discard. With your other errors, I'd just mount it readonly > and then you won't waste time on atime updates. Ok, I see. Thanks for taking a look so quickly. > I'll take a look at the BUG you got during log recovery. We've fixed a > few of those during the 3.8 rc cycle. Well, it happened on 3.8-rc4 and on 3.5 here, so I'd guess it's a different one. > > Feb 1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at > > a01fdcf7 [verbose debug info unavailable] > > > Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino > > 15619835 off 454656 csum 2755731641 private 864823192 > > Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 > > errs: wr 0, rd 0, flush 0, corrupt 17, gen 0 > > ... > > Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify > > failed on 17006399488 wanted 54700 found 54764 > > These aren't good. With a few exceptions for really tight races in fsx > use cases, csum errors are bad data from the disk. The transid verify > failed shows we wanted to find a metadata block from generation 54700 > but found 54764 instead: > > 54700 = 0xD5AC > 54764 = 0xD5EC > > This same bad block comes up a few different times. The machine has had problems with data consistency in the past, so I'm not too surprised with getting a single-bit error, although this is the first time in a year that I've seen problems, and I replaced the faulty memory modules some time ago. Anyway, I already ordered a replacement box a few weeks ago, and that one will have ECC memory besides being a modern Opteron system to replace the aging Core 2. > > Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error > > corrected: ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288) > > This shows we pulled from the second copy of this block and got the > right answer, and then wrote the right answer to the duplicate. > Inode 1 means it was metadata. > > But for some reason still aborted the transaction. It could have been > an EIO on the correction, but the auto correction code in 3.5 did work > well. > > I think your plan to pull the data off and reformat is a good one. I'd > also look hard at your ram since drives don't usually send back single bit > errors. Ok. I'll wait before reformmatting though, in case you need to take a look at the data later to find out why it crashed without fsck finding a problem. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops when mounting btrfs partition
On Saturday 02 February 2013, Chris Mason wrote: > > Feb 1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at > > a01fdcf7 [verbose debug info unavailable] > > > Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino > > 15619835 off 454656 csum 2755731641 private 864823192 > > Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 > > errs: wr 0, rd 0, flush 0, corrupt 17, gen 0 > > ... > > Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify > > failed on 17006399488 wanted 54700 found 54764 > > These aren't good. With a few exceptions for really tight races in fsx > use cases, csum errors are bad data from the disk. The transid verify > failed shows we wanted to find a metadata block from generation 54700 > but found 54764 instead: > I've done a full backup of all data now, without any further Ooops messages, but I did get these: [66155.429029] btrfs no csum found for inode 1212139 start 23707648 [66155.429035] btrfs no csum found for inode 1212139 start 23711744 [66155.429039] btrfs no csum found for inode 1212139 start 23715840 [66155.429042] btrfs no csum found for inode 1212139 start 23719936 [66155.452298] btrfs csum failed ino 1212139 off 23707648 csum 4112094897 private 0 [66155.452310] btrfs csum failed ino 1212139 off 23711744 csum 3308812742 private 0 [66155.452316] btrfs csum failed ino 1212139 off 23715840 csum 2566472073 private 0 [66155.452322] btrfs csum failed ino 1212139 off 23719936 csum 2290008602 private 0 [66159.876785] btrfs no csum found for inode 1212139 start 69992448 [66159.876792] btrfs no csum found for inode 1212139 start 69996544 [66159.876797] btrfs no csum found for inode 1212139 start 7640 [66159.876801] btrfs no csum found for inode 1212139 start 70004736 [66159.921506] btrfs csum failed ino 1212139 off 69992448 csum 2290360822 private 0 [66159.921517] btrfs csum failed ino 1212139 off 69996544 csum 954182507 private 0 [66159.921524] btrfs csum failed ino 1212139 off 7640 csum 2594579850 private 0 [66159.921532] btrfs csum failed ino 1212139 off 70004736 csum 25334750 private 0 [66932.289905] btrfs csum failed ino 2461761 off 94208 csum 3824674580 private 1950015541 [92042.101540] btrfs csum failed ino 687755 off 7048040448 csum 2502110259 private 2186199747 [110952.542245] btrfs csum failed ino 5423479 off 475136 csum 490948044 private 3797189576 [122692.216371] btrfs csum failed ino 7959218 off 2818048 csum 1904746846 private 2392844122 [138205.726897] btrfs: sdb1 checksum verify failed on 20495056896 wanted 8C9759CB found 9BFAB73B level 0 Inode 1212139 is the akonadi database that was used by kmail, so it constantly got written to during the crashes. The file was completely corrupt. The other inodes are mostly files that were backed up from the other machine and have been on the drive I started using it, without ever being accessed. I've probably had a few bit flips the entire time I was using the machine, but never noticed before I started using a checksumming file system. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Oops when mounting btrfs partition
On Friday 08 February 2013, David Sterba wrote: > On Mon, Feb 04, 2013 at 09:55:50PM +0000, Arnd Bergmann wrote: > > On Saturday 02 February 2013, Chris Mason wrote: > > > > I've done a full backup of all data now, without any further Ooops > > messages, but > > I did get these: > > > > [66155.429029] btrfs no csum found for inode 1212139 start 23707648 > > The missing csums were caused by a bug introcuded in 3.8-rc1 and fixed > in rc5. > Ok, thanks for the information. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC 00/32] making inode time stamps y2038 ready
Based on the recent discussion about 64-bit time_t for new architectures, and for solving the year 2038 problem in general, I decided to try out what it would take to solve part of the kernel side of things. This is a proof-of-concept work to get us to the point where two system calls (utimes and stat) provide a working interface to user space to pass 64-bit inode time stamps in and out of the kernel all the way to the file systems. I picked this because it is a fairly isolated problem, as the inode time stamps are rarely assigned to any other time values. As a byproduct of this work, I documented for each of the file systems we support how long the on-disk format can work[1]. Obviously we also need to convert all the other syscalls and have a proper libc implementation using those for this to be really useful, but it's a start and it can be tested independently (I didn't so far, want to wait for initial feedback). All the interesting stuff is in the first five patches here, the rest is the straightforward conversion of all file systems that use 'timespec' values internally. There are of course a number of open questions: a) is this the right approach in general? The previous discussion pointed this way, but there may be other opinions. b) what type should we use internally to represent inode time stamps? The code contains three different versions that would all work, we just have to pick a good tradeoff between efficiency and the range of times we want to cover. c) Should we continue this way for all 32-bit platforms for consistency, including future ones, or should we go to different 64-bit types right away? My feeling is that the second approach would complicate this work. Arnd [1] http://kernelnewbies.org/y2038 Arnd Bergmann (32): fs: introduce new 'struct inode_time' uapi: add struct __kernel_timespec{32,64} fs: introduce sys_utimens64at fs: introduce sys_newfstat64/sys_newfstatat64 arch: hook up new stat and utimes syscalls isofs: fix timestamps beyond 2027 fs/nfs: convert to struct inode_time fs/ceph: convert to 'struct inode_time' fs/pstore: convert to struct inode_time fs/coda: convert to struct inode_time xfs: convert to struct inode_time btrfs: convert to struct inode_time ext3: convert to struct inode_time ext4: convert to struct inode_time cifs: convert to struct inode_time ntfs: convert to struct inode_time ubifs: convert to struct inode_time ocfs2: convert to struct inode_time fs/fat: convert to struct inode_time afs: convert to struct inode_time udf: convert to struct inode_time fs: convert simple fs to inode_time logfs: convert to struct inode_time hfs, hfsplus: convert to struct inode_time gfs2: convert to struct inode_time reiserfs: convert to struct inode_time jffs2: convert to struct inode_time adfs: convert to struct inode_time f2fs: convert to struct inode_time fuse: convert to struct inode_time scsi: fnic: use current_kernel_time() for timestamp fs: use new inode_time definition unconditionally arch/alpha/kernel/osf_sys.c| 2 +- arch/arm/include/asm/unistd.h | 2 +- arch/arm/include/uapi/asm/stat.h | 25 + arch/arm/include/uapi/asm/unistd.h | 3 +++ arch/arm/kernel/calls.S| 3 +++ arch/arm64/include/asm/unistd32.h | 5 +++- arch/x86/include/uapi/asm/stat.h | 28 +++ arch/x86/syscalls/syscall_32.tbl | 3 +++ drivers/block/rbd.c| 2 +- drivers/firmware/efi/efi-pstore.c | 28 +-- drivers/scsi/fnic/fnic_trace.c | 2 +- drivers/tty/tty_io.c | 2 +- drivers/usb/gadget/f_fs.c | 2 +- fs/adfs/inode.c| 4 +-- fs/afs/afs.h | 6 ++--- fs/afs/fsclient.c | 2 +- fs/attr.c | 8 +++--- fs/btrfs/file.c| 6 ++--- fs/btrfs/inode.c | 4 +-- fs/btrfs/ioctl.c | 4 +-- fs/btrfs/root-tree.c | 2 +- fs/btrfs/transaction.c | 2 +- fs/ceph/cache.c| 2 +- fs/ceph/caps.c | 6 ++--- fs/ceph/file.c | 4 +-- fs/ceph/inode.c| 20 +++--- fs/ceph/super.h| 8 +++--- fs/cifs/cache.c| 6 ++--- fs/cifs/cifsglob.h | 6 ++--- fs/cifs/cifsproto.h| 6 ++--- fs/cifs/cifssmb.c | 5 ++-- fs/cifs/inode.c| 2 +- fs/cifs/netmisc.c | 15 ++- fs/coda/coda_linux.c | 18 - fs/compat.c| 19 ++--- fs/configfs/inode.c| 6 ++--- fs/cramfs/inode.c | 2 +- fs/ext3/inode.c| 4 +-- fs/ext4/ext4.h | 10 +++ fs/ext4/extents.c
[RFC 12/32] btrfs: convert to struct inode_time
btrfs uses unsigned 64-bit seconds for inode timestamps, which will work basically forever, but the VFS uses struct timespec for timestamps, which is only good until 2038 on 32-bit CPUs. This gets us one small step closer to lifting the VFS limit by using struct inode_time in btrfs. Signed-off-by: Arnd Bergmann Cc: Chris Mason Cc: Josef Bacik Cc: linux-btrfs@vger.kernel.org --- fs/btrfs/file.c| 6 +++--- fs/btrfs/inode.c | 4 ++-- fs/btrfs/ioctl.c | 4 ++-- fs/btrfs/root-tree.c | 2 +- fs/btrfs/transaction.c | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index a58df83..3e16a4e 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1693,16 +1693,16 @@ out: static void update_time_for_write(struct inode *inode) { - struct timespec now; + struct inode_time now; if (IS_NOCMTIME(inode)) return; now = current_fs_time(inode->i_sb); - if (!timespec_equal(&inode->i_mtime, &now)) + if (!inode_time_equal(&inode->i_mtime, &now)) inode->i_mtime = now; - if (!timespec_equal(&inode->i_ctime, &now)) + if (!inode_time_equal(&inode->i_ctime, &now)) inode->i_ctime = now; if (IS_I_VERSION(inode)) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2ac3036..d825387 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5440,7 +5440,7 @@ static int btrfs_dirty_inode(struct inode *inode) * This is a copy of file_update_time. We need this so we can return error on * ENOSPC for updating the inode in the case of file write and mmap writes. */ -static int btrfs_update_time(struct inode *inode, struct timespec *now, +static int btrfs_update_time(struct inode *inode, struct inode_time *now, int flags) { struct btrfs_root *root = BTRFS_I(inode)->root; @@ -8223,7 +8223,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct btrfs_root *dest = BTRFS_I(new_dir)->root; struct inode *new_inode = new_dentry->d_inode; struct inode *old_inode = old_dentry->d_inode; - struct timespec ctime = CURRENT_TIME; + struct inode_time ctime = CURRENT_TIME; u64 index = 0; u64 root_objectid; int ret; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index a313ab0..2de5f86 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -435,7 +435,7 @@ static noinline int create_subvol(struct inode *dir, struct btrfs_root *root = BTRFS_I(dir)->root; struct btrfs_root *new_root; struct btrfs_block_rsv block_rsv; - struct timespec cur_time = CURRENT_TIME; + struct inode_time cur_time = CURRENT_TIME; struct inode *inode; int ret; int err; @@ -4456,7 +4456,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file, struct btrfs_root *root = BTRFS_I(inode)->root; struct btrfs_root_item *root_item = &root->root_item; struct btrfs_trans_handle *trans; - struct timespec ct = CURRENT_TIME; + struct inode_time ct = CURRENT_TIME; int ret = 0; int received_uuid_changed; diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c index 38bb47e..344e89f 100644 --- a/fs/btrfs/root-tree.c +++ b/fs/btrfs/root-tree.c @@ -487,7 +487,7 @@ void btrfs_update_root_times(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_root_item *item = &root->root_item; - struct timespec ct = CURRENT_TIME; + struct inode_time ct = CURRENT_TIME; spin_lock(&root->root_item_lock); btrfs_set_root_ctransid(item, trans->transid); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 7579f6d..09dcc8a 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1133,7 +1133,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, struct dentry *dentry; struct extent_buffer *tmp; struct extent_buffer *old; - struct timespec cur_time = CURRENT_TIME; + struct inode_time cur_time = CURRENT_TIME; int ret = 0; u64 to_reserve = 0; u64 index = 0; -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Saturday 31 May 2014 16:51:15 Richard Cochran wrote: > On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote: > > > > I picked this because it is a fairly isolated problem, as the > > inode time stamps are rarely assigned to any other time values. > > As a byproduct of this work, I documented for each of the file > > systems we support how long the on-disk format can work[1]. > > Why are some of the time stamp expiration dates marked as "never"? It's an approximation: with 64-bit timestamps, you can represent close to 300 billion years, which is way past the time that our planet can sustain life of any form[1]. Arnd [1] http://en.wikipedia.org/wiki/Timeline_of_the_far_future -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote: > On Fri, 30 May 2014, Arnd Bergmann wrote: > > > a) is this the right approach in general? The previous discussion > >pointed this way, but there may be other opinions. > > The syscall changes seem like the sort of thing I'd expect, although > patches adding new syscalls or otherwise affecting the kernel/userspace > interface (as opposed to those relating to an individual filesystem) > should go to linux-api as well as other relevant lists. Ok. Sorry about missing linux-api, I confused it with linux-arch, which may not be as relevant here, except for the one question whether we actually want to have the new ABI on all 32-bit architectures or only as an opt-in for those that expect to stay around for another 24 years. Two more questions for you: - are you (and others) happy with adding this type of stat syscall (fstatat64/fstat64) as opposed to the more generic xstat that has been discussed in the past and that never made it through the bike- shedding discussion? - once we have enough buy-in from reviewers to merge this initial series, should we proceed to define rest of the syscall ABI (minus driver ioctls) so glibc and kernel can do the conversion on top of that, or should we better try to do things one syscall family at a time and actually get the kernel to handle them correctly internally? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote: > On 06/02/2014 12:19 PM, Arnd Bergmann wrote: > > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote: > >> On Fri, 30 May 2014, Arnd Bergmann wrote: > >> > >>> a) is this the right approach in general? The previous discussion > >>>pointed this way, but there may be other opinions. > >> > >> The syscall changes seem like the sort of thing I'd expect, although > >> patches adding new syscalls or otherwise affecting the kernel/userspace > >> interface (as opposed to those relating to an individual filesystem) > >> should go to linux-api as well as other relevant lists. > > > > Ok. Sorry about missing linux-api, I confused it with linux-arch, which > > may not be as relevant here, except for the one question whether we > > actually want to have the new ABI on all 32-bit architectures or only > > as an opt-in for those that expect to stay around for another 24 years. > > > > Two more questions for you: > > > > - are you (and others) happy with adding this type of stat syscall > > (fstatat64/fstat64) as opposed to the more generic xstat that has > > been discussed in the past and that never made it through the bike- > > shedding discussion? > > > > - once we have enough buy-in from reviewers to merge this initial > > series, should we proceed to define rest of the syscall ABI > > (minus driver ioctls) so glibc and kernel can do the conversion > > on top of that, or should we better try to do things one syscall > > family at a time and actually get the kernel to handle them > > correctly internally? > > > > The bit that is really going to hurt is every single ioctl that uses a > timespec. > > Honestly, though, I really don't understand the point with "struct > inode_time". It seems like the zeroeth-order thing is to change the > kernel internal version of struct timespec to have a 64-bit time... it > isn't just about inodes. We then should be explicit about the external > uses of time, and use accessors. I picked these because they are fairly isolated from all other uses, in particular since inode times are the only things where we really care about times in the distant past or future (decades away as opposed to things that happened between boot and shutdown). For other kernel-internal uses, we may be better off migrating to a completely different representation, such as nanoseconds since boot or the architecture specific ktime_t, but this is really something to decide for each subsystem. I just tried building an arm32 kernel with a s64 time_t, and that failed horribly, I get linker errors for missing 64-bit divides and lots of warnings for code that expects time_t pointers to functions taking a 'long' or vice versa. I also think the only way to maintain ABI compatibility is to separate the internal uses from the interface, which means auditing all code in the end. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote: > By the way, what about NILFS2? Is NILFS2 ready for suggested approach > without any changes? nilfs2 and a lot of other file systems don't need any changes for this, because they don't assign the inode time stamp fields to a 'struct timespec'. FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and can represent the full range of user space timespec on all machines. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote: > On Tue, 3 Jun 2014, Arnd Bergmann wrote: > > > I think John Stultz and Thomas Gleixner have already started looking > > at how the timekeeping code can be updated. Once that is done, we should > > be able to add a functional 64-bit gettimeofday/settimeofday syscall > > pair. While I definitely agree this is one of the most basic things to > > have, it's also not an area of the kernel that is easy to change. > > 64-bit clock_gettime / clock_settime instead of gettimeofday / > settimeofday should avoid the need for the kernel to have a 64-bit version > of struct timeval. (Userspace 64-bit gettimeofday / settimeofday would > need to use a combination of the syscalls if the tz pointer is non-NULL.) Yes, that's what I meant. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote: > On 06/02/2014 12:55 PM, Arnd Bergmann wrote: > >> > >> The bit that is really going to hurt is every single ioctl that uses a > >> timespec. > >> > >> Honestly, though, I really don't understand the point with "struct > >> inode_time". It seems like the zeroeth-order thing is to change the > >> kernel internal version of struct timespec to have a 64-bit time... it > >> isn't just about inodes. We then should be explicit about the external > >> uses of time, and use accessors. > > > > I picked these because they are fairly isolated from all other uses, > > in particular since inode times are the only things where we really > > care about times in the distant past or future (decades away as opposed > > to things that happened between boot and shutdown). > > > > If nothing else, I would expect to be able to set the system time to > weird values for testing. So I'm not so sure I agree with that... I think John Stultz and Thomas Gleixner have already started looking at how the timekeeping code can be updated. Once that is done, we should be able to add a functional 64-bit gettimeofday/settimeofday syscall pair. While I definitely agree this is one of the most basic things to have, it's also not an area of the kernel that is easy to change. > > For other kernel-internal uses, we may be better off migrating to > > a completely different representation, such as nanoseconds since > > boot or the architecture specific ktime_t, but this is really something > > to decide for each subsystem. > > Having a bunch of different time representations in the kernel seems > like a real headache... We already have time_t, ktime_t, timeval, timespec, compat_timespec, clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64, and lots of driver or file system specific representations. I'm all for removing a bunch of these from the kernel, but my feeling is that this is one of the cases where we first have to add new ones in order to remove those that are already there. To complicate things further, we also have various times bases (realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...), and at least for the timespec values we pass around, it's not always obvious which one is used, of if that's the right one. We probably don't want to add a lot of new representations, and it's possible that we can change most of the internal code we have to ktime_t and then convert that to whatever user space wants at the interfaces. The possible uses I can see for non-ktime_t types in the kernel are: * inodes need 96 bit timestamps to represent the full range of values that can be stored in a file system, you made a convincing argument for that. Almost everything else can fit into 64 bit on a 32-bit kernel, in theory also on a 64-bit kernel if we want that. * A number of interfaces pass relative timespecs: nanosleep(), poll(), select(), sigtimedwait(), alarm(), futex() and probably more. There is nothing wrong with the use of timespec here, and it may be good to annotate that by using a new type (e.g. struct timeout) that is defined as compatible with the current timespec. * For new user interfaces, we need a new type such as the __kernel_timespec64 I introduced, so it doesn't clash with the normal user timespec that may be smaller, depending on the libc. * A lot of drivers will need new ioctl commands, and for drivers that just need time stamps (audio, v4l, sockets, ...) it may be more efficient and more correct to use a new timestamp_t (e.g. boot time 64-bit nanoseconds) than __kernel_timespec64, which is not normally monotonic and requires a normalization step. If we end up introducing such a type in the user interface, we can also start using it in the kernel. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Tuesday 03 June 2014, Dave Chinner wrote: > On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote: > > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote: > > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote: > > The possible uses I can see for non-ktime_t types in the kernel are: > > * inodes need 96 bit timestamps to represent the full range of values > > that can be stored in a file system, you made a convincing argument > > for that. Almost everything else can fit into 64 bit on a 32-bit > > kernel, in theory also on a 64-bit kernel if we want that. > > Just ot be pedantic, inodes don't need 96 bit timestamps - some > filesystems can *support up to* 96 bit timestamps. If the kernel > only supports 64 bit timestamps and that's all the kernel can > represent, then the upper bits of the 96 bit on-disk inode > timestamps simply remain zero. I meant the reverse: since we have file systems that can store 96-bit timestamps when using 64-bit kernels, we need to extend 32-bit kernels to have the same internal representation so we can actually read those file systems correctly. > If you move the filesystem between kernels with different time > ranges, then the filesystem needs to be able to tell the kernel what > it's supported range is. This is where having the VFS limit the > range of supported timestamps is important: the limit is the > min(kernel range, filesystem range). This allows the filesystems > to be indepenent of the kernel time representation, and the kernel > to be independent of the physical filesystem time encoding I agree it makes sense to let the kernel know about the limits of the file system it accesses, but for the reverse, we're probably better off just making the kernel representation large enough (i.e. 96 bits) so it can work with any known file system. We need another check at the user space boundary to turn that into a value that the user can understand, but that's another problem. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Monday 02 June 2014, Joseph S. Myers wrote: > On Mon, 2 Jun 2014, Arnd Bergmann wrote: > > > Ok. Sorry about missing linux-api, I confused it with linux-arch, which > > may not be as relevant here, except for the one question whether we > > actually want to have the new ABI on all 32-bit architectures or only > > as an opt-in for those that expect to stay around for another 24 years. > > For glibc I think it will make the most sense to add the support for > 64-bit time_t across all architectures that currently have 32-bit time_t > (with the new interfaces having fallback support to implementation in > terms of the 32-bit kernel interfaces, if the 64-bit syscalls are > unavailable either at runtime or in the kernel headers against which glibc > is compiled - this fallback code will of course need to check for overflow > when passing a time value to the kernel, hopefully with error handling > consistent with whatever the kernel ends up doing when a filesystem can't > support a timestamp). If some architectures don't provide the new > interfaces in the kernel then that will mean the fallback code in glibc > can't be removed until glibc support for those architectures is removed > (as opposed to removing it when glibc no longer supports kernels predating > the kernel support). Ok, that's a good reason to just provide the new interfaces on all architectures right away. Thanks for the insight! Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote: > On Wed, 4 Jun 2014, Arnd Bergmann wrote: > > > On Tuesday 03 June 2014, Dave Chinner wrote: > > > Just ot be pedantic, inodes don't need 96 bit timestamps - some > > > filesystems can *support up to* 96 bit timestamps. If the kernel > > > only supports 64 bit timestamps and that's all the kernel can > > > represent, then the upper bits of the 96 bit on-disk inode > > > timestamps simply remain zero. > > > > I meant the reverse: since we have file systems that can store > > 96-bit timestamps when using 64-bit kernels, we need to extend > > 32-bit kernels to have the same internal representation so we > > can actually read those file systems correctly. > > > > > If you move the filesystem between kernels with different time > > > ranges, then the filesystem needs to be able to tell the kernel what > > > it's supported range is. This is where having the VFS limit the > > > range of supported timestamps is important: the limit is the > > > min(kernel range, filesystem range). This allows the filesystems > > > to be indepenent of the kernel time representation, and the kernel > > > to be independent of the physical filesystem time encoding > > > > I agree it makes sense to let the kernel know about the limits > > of the file system it accesses, but for the reverse, we're probably > > better off just making the kernel representation large enough (i.e. > > 96 bits) so it can work with any known file system. > > Depends... 96 bit handling may get prohibitive on 32-bit archs. > > The important point here is for the kernel to be able to represent the > time _range_ used by any known filesystem, not necessarily the time > _precision_. > > For example, a 64 bit representation can be made of 40 bits for seconds > spanning 34865 years, and 24 bits for fractional seconds providing > precision down to 60 nanosecs. That ought to be plenty good on 32 bit > systems while still being cheap to handle. I have checked earlier that we don't do any computation on inode time stamps in common code, we just pass them around, so there is very little runtime overhead. There is a small bit of space overhead (12 byte) per inode, but that structure is already on the order of 500 bytes. For other timekeeping stuff in the kernel, I agree that using some 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds, ...) has advantages, that's exactly the point I was making earlier against simply extending the internal time_t/timespec to 64-bit seconds for everything. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 00/32] making inode time stamps y2038 ready
On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote: > On 06/04/2014 12:24 PM, Arnd Bergmann wrote: > > > > For other timekeeping stuff in the kernel, I agree that using some > > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds, > > ...) has advantages, that's exactly the point I was making earlier > > against simply extending the internal time_t/timespec to 64-bit > > seconds for everything. > > > > How much of a performance issue is it to make time_t 64 bits, and for > the bits there are, how hard are they to fix? Probably very little overhead for most uses, it's more the regression potential in the less common parts of the kernel I'm worried about. There is a significant but not overwhelming number of uses of the main problematic types in the kernel: arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc 188 1885566 arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc 320 320 10353 arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc 406 406 10886 I believe we have to audit all of them anyway if we want to change the kernel to less problematic types and introduce new user interfaces. IMHO this work is helped if we change the uses to a new type as we find the problems. This lets us do the work one subsystem at a time and avoid accidental ABI changes. I don't care much what type that will be, and having a 96-bit type will certainly work well in a lot of cases, but I don't see a strong reason to use that over other types, especially when they can be more efficient. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: avoid build warning on 32-bit
A recent change introduced a type cast from a private 64-bit value to a pointer, which works fine on 64-bit architectures, but not on 32-bit ones, where it produces a harmless compiler warning: fs/btrfs/extent_io.c: In function 'btrfs_free_io_failure_record': fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] This adds an intermediate cast to 'unsigned long', which tells the compiler to ignore the type mismatch. Signed-off-by: Arnd Bergmann Fixes: f612496bca664 ("Btrfs: cleanup the read failure record after write or when the inode is freeing") diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4ebabd237153..790dbae3343c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2190,7 +2190,7 @@ void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end) next = next_state(state); - failrec = (struct io_failure_record *)state->private; + failrec = (struct io_failure_record *)(unsigned long)state->private; free_extent_state(state); kfree(failrec); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix size_t format string
This resolves a harmless gcc warning in btrfs_check_super_valid that results from a size_t value being printed as %lu: fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=] On all Linux systems, size_t is the same length as unsigned long, but the compiler does not know this, and warns about potentially unportable code here. The correct printf string for size_t is %z. Signed-off-by: Arnd Bergmann Fixes: ce7fca5f57ed0f "btrfs: add checks for sys_chunk_array sizes" Cc: David Sterba Cc: Chris Mason --- This warning has been rather annoying because it shows up in every 'allmodconfig' build. I assume others have reported it before, but please apply some fix for it, ideally before 4.0. diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f79f38542a73..639f2663ed3f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3921,7 +3921,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, } if (btrfs_super_sys_array_size(sb) < sizeof(struct btrfs_disk_key) + sizeof(struct btrfs_chunk)) { - printk(KERN_ERR "BTRFS: system chunk array too small %u < %lu\n", + printk(KERN_ERR "BTRFS: system chunk array too small %u < %zu\n", btrfs_super_sys_array_size(sb), sizeof(struct btrfs_disk_key) + sizeof(struct btrfs_chunk)); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs for mainline
On Saturday 03 January 2009, Chris Mason wrote: > > > Actually a lot of the ioctl API don't just need documentation but > > a complete redo. That's true at least for the physical device > > management and subvolume / snaphot ones. > > > > The ioctl interface is definitely not finalized. Adding more vs > replacing the existing ones is an open question. As long as that's an open question, the ioctl interface shouldn't get merged into the kernel, or should get in as btrfsdev, otherwise you get stuck with the current ABI forever. Is it possible to separate out the nonstandard ioctls into a patch that can get merged when the interface is final, or will that make btrfs unusable? Arnd <>< -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ENOSPC on mostly empty file system
Hi Chris, As I mentioned at the kernel summit, I have a file system that I use mostly for storing my one kernel git tree and occasionally some build trees (those are normally on a tmpfs), and I have again run into the problem where the file system is only partially full (I think 18% in this case) but I am unable to create new files. As you recommended, I have created an image using btrfs-image and uploaded it to google drive for inspection. I also played around with it some more. After removing a few small files, I could create new files with up to 20-60MB again before hitting ENOSPC. I then did a 'make clean' in all the object directories I had and after that could fill the file system up to 100% by creating a 62GB file using the same command (dd if=/dev/zero of=x). While probably unrelated, I also saw a few older 'hung task' messages with btrfs call chains in dmesg and I'm adding them here in this mail as well. The kernel messages are ten days old, but I did not reboot in the meantime and I never saw them again before or after that. The kernel is the stock Ubuntu 3.16.0 image from Ubuntu, I upgrade to that after running an old 3.11 kernel that I never rebooted. The 82 MB image file is at https://drive.google.com/file/d/0B_XQwQ5KlfJAWDJfS1E0TG1CLTA/edit?usp=sharing Arnd [407459.475639] INFO: task BrowserBlocking:3282 blocked for more than 120 seconds. [407459.475649] Not tainted 3.16.0-10-generic #15-Ubuntu [407459.475652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [407459.475656] BrowserBlocking D 88042fc94800 0 3282 2580 0x [407459.475664] 88040d9f7db0 0086 88041215b2a0 00014800 [407459.475671] 88040d9f7fd8 00014800 88041215b2a0 8800bbb12af8 [407459.475677] 8800bbb12afc 88041215b2a0 8800bbb12b00 [407459.475683] Call Trace: [407459.475699] [] schedule_preempt_disabled+0x29/0x70 [407459.475707] [] __mutex_lock_slowpath+0xd5/0x1f0 [407459.475714] [] mutex_lock+0x1f/0x30 [407459.475771] [] btrfs_log_inode_parent+0xd5/0x550 [btrfs] [407459.475804] [] btrfs_log_dentry_safe+0x42/0x60 [btrfs] [407459.475834] [] btrfs_sync_file+0x184/0x300 [btrfs] [407459.475844] [] do_fsync+0x51/0x80 [407459.475849] [] SyS_fsync+0x10/0x20 [407459.475856] [] system_call_fastpath+0x1a/0x1f [407459.475861] INFO: task BrowserBlocking:3283 blocked for more than 120 seconds. [407459.475864] Not tainted 3.16.0-10-generic #15-Ubuntu [407459.475866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [407459.475869] BrowserBlocking D 88042fd54800 0 3283 2580 0x [407459.475875] 88040bb13db0 0086 88041215ef60 00014800 [407459.475881] 88040bb13fd8 00014800 88041215ef60 880429b43590 [407459.475886] 880429b43594 88041215ef60 880429b43598 [407459.475892] Call Trace: [407459.475899] [] schedule_preempt_disabled+0x29/0x70 [407459.475906] [] __mutex_lock_slowpath+0xd5/0x1f0 [407459.475912] [] mutex_lock+0x1f/0x30 [407459.475943] [] btrfs_log_inode_parent+0x435/0x550 [btrfs] [407459.475976] [] btrfs_log_dentry_safe+0x42/0x60 [btrfs] [407459.476005] [] btrfs_sync_file+0x184/0x300 [btrfs] [407459.476012] [] do_fsync+0x51/0x80 [407459.476018] [] SyS_fdatasync+0x13/0x20 [407459.476023] [] system_call_fastpath+0x1a/0x1f [407459.476099] INFO: task kworker/u65:6:2090685 blocked for more than 120 seconds. [407459.476102] Not tainted 3.16.0-10-generic #15-Ubuntu [407459.476104] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [407459.476107] kworker/u65:6 D 88042fd14800 0 2090685 2 0x [407459.476131] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] [407459.476134] 8806189ffba0 0046 880428135100 00014800 [407459.476140] 8806189fffd8 00014800 880428135100 8806189ffcc8 [407459.476145] 8806189ffcc0 7fff 880428135100 0001 [407459.476151] Call Trace: [407459.476158] [] schedule+0x29/0x70 [407459.476164] [] schedule_timeout+0x229/0x2a0 [407459.476171] [] ? __queue_work+0x136/0x340 [407459.476177] [] ? __queue_delayed_work+0xaa/0x1c0 [407459.476184] [] wait_for_completion+0xb4/0x1e0 [407459.476191] [] ? wake_up_state+0x20/0x20 [407459.476198] [] writeback_inodes_sb_nr+0x81/0xb0 [407459.476221] [] flush_space+0x458/0x4f0 [btrfs] [407459.476243] [] ? can_overcommit+0x67/0x100 [btrfs] [407459.476266] [] btrfs_async_reclaim_metadata_space+0x16c/0x1e0 [btrfs] [407459.476273] [] process_one_work+0x182/0x4e0 [407459.476278] [] worker_thread+0x6b/0x6a0 [407459.476284] [] ? process_one_work+0x4e0/0x4e0 [407459.476290] [] kthread+0xdb/0x100 [407459.476297] [] ? kthread_create_on_node+0x1c0/0x1c0 [407459.476302] [] ret_from_fork+0x7c/0xb0 [407459.476308] [] ? kthread_create_on_node+0x1c0/0x1c0 [
Re: ENOSPC on mostly empty file system
On Tuesday 09 September 2014 16:29:05 Arnd Bergmann wrote: > > I also played around with it some more. After removing a few small > files, I could create new files with up to 20-60MB again before hitting > ENOSPC. I then did a 'make clean' in all the object directories I had > and after that could fill the file system up to 100% by creating a > 62GB file using the same command (dd if=/dev/zero of=x). Ok, one more data point: I was accidentally building randconfig kernels in the same partition all the time, meaning I had lots of traffic with small files using the same names all over all the time. After I got the file system back to a usable state today, building some more randconfig kernels broke it in the same way. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ENOSPC on mostly empty file system
On Tuesday 09 September 2014 21:49:12 Clemens Eisserer wrote: > Hi Arnd, > > > Ok, one more data point: > > Why don't you provide the data point you were specifically asked for, > "btrfs fi df" ;) I've cleaned it up again already. At the moment, it's working fine, with this data: Data: total=65.11GB, used=7.41GB System, DUP: total=8.00MB, used=12.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=799.54MB Metadata: total=8.00MB, used=0.00 : total=184.00MB, used=0.00 I also realized that the file I uploaded was the uncompressed 800MB version, not the compressed 82MB file, which is now at https://drive.google.com/file/d/0B_XQwQ5KlfJAYzY5c1lJRFVQNEE/edit?usp=sharing Sorry about that. Ok, now I'm in the bad state again (after running a 'make allmodconfig' kernel build: Label: none uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47 Total devices 1 FS bytes used 8.79GB devid1 size 67.14GB used 67.14GB path /dev/sdc6 Data: total=65.11GB, used=7.99GB System, DUP: total=8.00MB, used=12.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.00GB, used=821.48MB Metadata: total=8.00MB, used=0.00 : total=200.00MB, used=0.00 Not much extra space used up. Also, a bit from my last shell session after getting into this state: arnd@wuerfel:~/arm-soc$ make -skj40 ../arch/arm/mach-mmp/devices.c:81:21: warning: 'u2o_get' defined but not used [-Wunused-function] static unsigned int u2o_get(void __iomem *base, unsigned int offset) ^ ../arch/arm/mach-mmp/devices.c:86:13: warning: 'u2o_set' defined but not used [-Wunused-function] static void u2o_set(void __iomem *base, unsigned int offset, ^ ../arch/arm/mach-mmp/devices.c:97:13: warning: 'u2o_clear' defined but not used [-Wunused-function] static void u2o_clear(void __iomem *base, unsigned int offset, ^ ../arch/arm/mach-mmp/devices.c:108:13: warning: 'u2o_write' defined but not used [-Wunused-function] static void u2o_write(void __iomem *base, unsigned int offset, ^ mv: cannot move 'security/selinux/ss/.tmp_conditional.o' to 'security/selinux/ss/conditional.o': No space left on device make[4]: *** [security/selinux/ss/conditional.o] Error 1 mv: cannot move 'kernel/power/.snapshot.o.tmp' to 'kernel/power/.snapshot.o.cmd': No space left on device make[4]: *** [kernel/power/snapshot.o] Error 1 mv: cannot move 'drivers/ata/.pata_hpt366.o.tmp' to 'drivers/ata/.pata_hpt366.o.cmd': No space left on device make[4]: *** [drivers/ata/pata_hpt366.o] Error 1 mv: cannot move 'kernel/irq/.proc.o.tmp' to 'kernel/irq/.proc.o.cmd': No space left on device make[4]: *** [kernel/irq/proc.o] Error 1 ... arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: writing to 'x': No space left on device 416642+0 records in 416641+0 records out 213320192 bytes (213 MB) copied, 2.97814 s, 71.6 MB/s arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: opening 'x': No space left on device arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: opening 'x': No space left on device arnd@wuerfel:~/arm-soc$ rm x arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: writing to 'x': No space left on device 366053+0 records in 366052+0 records out 187418624 bytes (187 MB) copied, 3.04426 s, 61.6 MB/s arnd@wuerfel:~/arm-soc$ rm x arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: writing to 'x': No space left on device 363482+0 records in 363481+0 records out 186102272 bytes (186 MB) copied, 2.80644 s, 66.3 MB/s arnd@wuerfel:~/arm-soc$ rm x arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: writing to 'x': No space left on device 366677+0 records in 366676+0 records out 187738112 bytes (188 MB) copied, 2.86939 s, 65.4 MB/s arnd@wuerfel:~/arm-soc$ sync arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: opening 'x': No space left on device arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: opening 'x': No space left on device arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x dd: opening 'x': No space left on device I believe that there were no other processes writing to this partition at the time. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ENOSPC on mostly empty file system
On Tuesday 09 September 2014 22:57:25 Hugo Mills wrote: > On Tue, Sep 09, 2014 at 11:49:10PM +0200, Arnd Bergmann wrote: > > Ok, now I'm in the bad state again (after running a 'make allmodconfig' > > kernel build: > > > > Label: none uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47 > > Total devices 1 FS bytes used 8.79GB > > devid1 size 67.14GB used 67.14GB path /dev/sdc6 > >All the space on the FS has been allocated to some purpose or other. > > > Data: total=65.11GB, used=7.99GB > >Here, you have 65 GiB allocated to data, but only 8 GiB of that > used. The FS won't automatically free up any of that (yet -- it's one > of the project ideas). Ok, I see. >So... you need to free up some data chunks. You can do this with: > > # btrfs balance start -dusage=5 /mountpoint > >Take a look at the output of btrfs fi df and btrfs fi show > afterwards, and see how much the Data allocation has reduced by, and > how much unallocated space you have left afterwards. You may want to > increase the number in the above balance command to some higher value, > to free up even more chunks (it limits the balance to chunks less than > n% full -- so the command above will only touch chunks with 5% actual > data or less). This is in the FAQ. This is the output afterwards: $ time sudo btrfs fi balance start -dusage=5 /git/ Done, had to relocate 44 out of 70 chunks real0m22.026s user0m0.014s sys 0m8.302s $ sudo btrfs fi show /dev/sdc6 Label: none uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47 Total devices 1 FS bytes used 8.79GB devid1 size 67.14GB used 23.14GB path /dev/sdc6 $ sudo btrfs fi df /git Data: total=21.01GB, used=7.99GB System, DUP: total=8.00MB, used=12.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=1.05GB, used=821.32MB Metadata: total=8.00MB, used=0.00 : total=200.00MB, used=0.00 I'm pretty sure I read the FAQ one of the last times this happened, but I thought it would only apply to mostly full file systems. I can usually recover by moving all the data to tmpfs and doing a new mkfs before moving the data back, and that normally lasts for a few months before I run out of space again. I'll try to see how long it takes until the problem comes back now. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html