Re: [GIT PULL] [PATCH v4 00/26] Delete CURRENT_TIME and CURRENT_TIME_SEC macros

2016-08-23 Thread Arnd Bergmann
On Monday, August 15, 2016 6:23:12 PM CEST Greg KH wrote:
> On Sat, Aug 13, 2016 at 03:48:12PM -0700, Deepa Dinamani wrote:
> > The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC 
> > macros.
> > The macros are not y2038 safe. There is no plan to transition them into 
> > being
> > y2038 safe.
> > ktime_get_* api's can be used in their place. And, these are y2038 safe.
> 
> Who are you execting to pull this huge patch series?

Dave Chinner suggested to have Al Viro pick up the whole series.

> Why not just introduce the new api call, wait for that to be merged, and
> then push the individual patches through the different subsystems?
> After half of those get ignored, then provide a single set of patches
> that can go through Andrew or my trees.

That was the original approach for v4.7, but (along with requesting
a number of reworks that Deepa incorporated), Linus preferred doing
the API change done in one chunk, see
https://patchwork.kernel.org/patch/9134249/

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix btrfs_no_printk stub helper

2016-09-23 Thread Arnd Bergmann
The addition of btrfs_no_printk() caused a build failure when
CONFIG_PRINTK is disabled:

fs/btrfs/send.c: In function 'send_rename':
fs/btrfs/ctree.h:3367:2: error: implicit declaration of function 
'btrfs_no_printk' [-Werror=implicit-function-declaration]

This moves the helper outside of that #ifdef so it is always
defined, and changes the existing #ifdef to refer to that
helper as well for consistency.

Fixes: 47c57058ff2c ("btrfs: btrfs_debug should consume fs_info when DEBUG is 
not defined")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/ctree.h | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e5621268c9fe..63bf4cc34626 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3241,20 +3241,17 @@ int btrfs_parse_options(struct btrfs_root *root, char 
*options,
unsigned long new_flags);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 
+static inline __printf(2, 3)
+void btrfs_no_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...)
+{
+}
+
 #ifdef CONFIG_PRINTK
 __printf(2, 3)
 void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...);
-__printf(2, 3)
-static inline int btrfs_no_printk(const struct btrfs_fs_info *fs_info,
-  const char *fmt, ...)
-{
-   return 0;
-}
 #else
-static inline __printf(2, 3)
-void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...)
-{
-}
+#define btrfs_printk(fs_info, fmt, args...) \
+   btrfs_no_printk(fs_info, fmt, ##args)
 #endif
 
 #define btrfs_emerg(fs_info, fmt, args...) \
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-24 Thread Arnd Bergmann
On Tue, Sep 18, 2018 at 7:59 PM Jason Gunthorpe  wrote:
>
> On Tue, Sep 18, 2018 at 10:51:08AM -0700, Darren Hart wrote:
> > On Fri, Sep 14, 2018 at 09:57:48PM +0100, Al Viro wrote:
> > > On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote:
> > >
> > > > Acked-by: Darren Hart (VMware) 
> > > >
> > > > As for a longer term solution, would it be possible to init fops in such
> > > > a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
> > > > so we don't have to duplicate this boilerplate for every ioctl fops
> > > > structure?
> > >
> > > Bad idea, that...  Because several years down the road somebody will 
> > > add
> > > an ioctl that takes an unsigned int for argument.  Without so much as 
> > > looking
> > > at your magical mystery macro being used to initialize file_operations.
> >
> > Fair, being explicit in the declaration as it is currently may be
> > preferable then.
>
> It would be much cleaner and safer if you could arrange things to add
> something like this to struct file_operations:
>
>   long (*ptr_ioctl) (struct file *, unsigned int, void __user *);
>
> Where the core code automatically converts the unsigned long to the
> void __user * as appropriate.
>
> Then it just works right always and the compiler will help address
> Al's concern down the road.

I think if we wanted to do this with a new file operation, the best
way would be to do the copy_from_user()/copy_to_user() in the caller
as well.

We already do this inside of some subsystems, notably drivers/media/,
and it simplifies the implementation of the ioctl handler function
significantly. We obviously cannot do this in general, both because of
traditional drivers that have 16-bit command codes (drivers/tty and others)
and also because of drivers that by accident defined the commands
incorrectly and use the wrong type or the wrong direction in the
definition.

   Arnd


Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-24 Thread Arnd Bergmann
On Mon, Sep 24, 2018 at 10:35 PM Jason Gunthorpe  wrote:
> On Mon, Sep 24, 2018 at 10:18:52PM +0200, Arnd Bergmann wrote:
> > On Tue, Sep 18, 2018 at 7:59 PM Jason Gunthorpe  wrote:
> > > On Tue, Sep 18, 2018 at 10:51:08AM -0700, Darren Hart wrote:
> > > > On Fri, Sep 14, 2018 at 09:57:48PM +0100, Al Viro wrote:
> > > > > On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote:
> > We already do this inside of some subsystems, notably drivers/media/,
> > and it simplifies the implementation of the ioctl handler function
> > significantly. We obviously cannot do this in general, both because of
> > traditional drivers that have 16-bit command codes (drivers/tty and others)
> > and also because of drivers that by accident defined the commands
> > incorrectly and use the wrong type or the wrong direction in the
> > definition.
>
> That could work well, but the first idea could be done globally and
> mechanically, while this would require very careful per-driver
> investigation.
>
> Particularly if the core code has worse performance.. ie due to
> kmalloc calls or something.
>
> I think it would make more sense to start by having the core do the
> case to __user and then add another entry point to have the core do
> the copy_from_user, and so on.

Having six separate callback pointers to implement a single
system call seems a bit excessive though.

Arnd


[PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE

2018-11-03 Thread Arnd Bergmann
On 32-bit ARM with gcc-8, I see a link error with the addition of the
CONFIG_NO_AUTO_INLINE option:

fs/btrfs/super.o: In function `btrfs_statfs':
super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod'
super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod'
super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod'
super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod'
super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod'
fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to 
`__aeabi_uldivmod' follow

So far this is the only file that shows the behavior, so I'd propose
to just work around it by marking the functions as 'static inline'
that normally get inlined here.

The reference to __aeabi_uldivmod comes from a div_u64() which has an
optimization for a constant division that uses a straight '/' operator
when the result should be known to the compiler. My interpretation is
that as we turn off inlining, gcc still expects the result to be constant
but fails to use that constant value.

Cc: Changbin Du 
Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable compiler 
auto-inlining")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/super.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c3c1e7bee49d..b7af0b8936ad 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,7 +1922,7 @@ static int btrfs_remount(struct super_block *sb, int 
*flags,
 }
 
 /* Used to sort the devices by max_avail(descending sort) */
-static int btrfs_cmp_device_free_bytes(const void *dev_info1,
+static inline int btrfs_cmp_device_free_bytes(const void *dev_info1,
   const void *dev_info2)
 {
if (((struct btrfs_device_info *)dev_info1)->max_avail >
@@ -1951,8 +1951,8 @@ static inline void btrfs_descending_sort_devices(
  * The helper to calc the free space on the devices that can be used to store
  * file data.
  */
-static int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
-  u64 *free_bytes)
+static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
+ u64 *free_bytes)
 {
struct btrfs_device_info *devices_info;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-- 
2.18.0



Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE

2018-11-03 Thread Arnd Bergmann
On 11/3/18, Nikolay Borisov  wrote:
> On 3.11.18 г. 17:39 ч., Arnd Bergmann wrote:
>> On 32-bit ARM with gcc-8, I see a link error with the addition of the
>> CONFIG_NO_AUTO_INLINE option:
>>
>> fs/btrfs/super.o: In function `btrfs_statfs':
>> super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod'
>> fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to
>> `__aeabi_uldivmod' follow
>>
>> So far this is the only file that shows the behavior, so I'd propose
>> to just work around it by marking the functions as 'static inline'
>> that normally get inlined here.
>>
>> The reference to __aeabi_uldivmod comes from a div_u64() which has an
>> optimization for a constant division that uses a straight '/' operator
>> when the result should be known to the compiler. My interpretation is
>> that as we turn off inlining, gcc still expects the result to be constant
>> but fails to use that constant value.
>
> I read this as "this is a compiler bug", no ? So you are providing a
> hack around a compiler bug?

Mostly, yes. The do_div() macro is really pushing the boundaries
of what we can expect the compiler to do in terms of optimizations,
and we've had problems with it in the past.

IIRC the gcc developers would not classify this as a bug because
the result of __builtin_constant_p() is not guaranteed to work the
way we expect, it just does so most of the time.

Arnd


Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE

2018-11-04 Thread Arnd Bergmann
On 11/4/18, Qu Wenruo  wrote:
>
>
> On 2018/11/3 下午11:39, Arnd Bergmann wrote:
>> On 32-bit ARM with gcc-8, I see a link error with the addition of the
>> CONFIG_NO_AUTO_INLINE option:
>>
>> fs/btrfs/super.o: In function `btrfs_statfs':
>> super.c:(.text+0x67b8): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x67fc): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x6858): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x6920): undefined reference to `__aeabi_uldivmod'
>> super.c:(.text+0x693c): undefined reference to `__aeabi_uldivmod'
>> fs/btrfs/super.o:super.c:(.text+0x6958): more undefined references to
>> `__aeabi_uldivmod' follow
>>
>> So far this is the only file that shows the behavior, so I'd propose
>> to just work around it by marking the functions as 'static inline'
>> that normally get inlined here.
>
> As a workaround it looks OK, but it's definitely not the root cause.
>
>>
>> The reference to __aeabi_uldivmod comes from a div_u64() which has an
>> optimization for a constant division that uses a straight '/' operator
>> when the result should be known to the compiler. My interpretation is
>> that as we turn off inlining, gcc still expects the result to be constant
>> but fails to use that constant value.
>
> It looks more like a bug in div_u64() optimization.
>
> Despite this file in btrfs, did you hit the same problem for any other
> file?

Not this time. I've done a creduce on the file and got to this code

struct kstatfs {
  u64 f_bfree;
};
btrfs_calc_avail_data_space(p1) {}
btrfs_statfs(struct kstatfs *p1) {
  u64 d = 0;
  unsigned e = 1;
  for (; a;)
e = btrfs_bg_type_to_factor();
  p1->f_bfree = div_u64(0, e) >> c;
  __asm__("");
  div_u64(d, e);
  b = btrfs_calc_avail_data_space(&d);
}

Looking at the assembler code produced by this, it seems to be
the same thing that we dealt with in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

>> Cc: Changbin Du 
>> Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable
>> compiler auto-inlining")
>
> I can't find it in the mainline kernel, is the commit hash correct?
> If not merged, we should still has a chance to further polish that patch.

It's in linux-next.

   Arnd


Re: [PATCH] btrfs: avoid link error with CONFIG_NO_AUTO_INLINE

2018-11-05 Thread Arnd Bergmann
On 11/5/18, David Sterba  wrote:
> On Sun, Nov 04, 2018 at 11:32:03PM +0100, Arnd Bergmann wrote:
>> >> Cc: Changbin Du 
>> >> Fixes: 943b8435c3bd ("kernel hacking: add a config option to disable
>> >> compiler auto-inlining")
>> >
>> > I can't find it in the mainline kernel, is the commit hash correct?
>> > If not merged, we should still has a chance to further polish that
>> > patch.
>>
>> It's in linux-next.
>
> I can't find it in current linux-next either, the final reference in
> Fixes: must refer to a commit in Linus' tree.

Ah, right, it got rebased. The commit ID in today's linux-next
is 917fad29febd. Most trees in linux-next don't rebase so this
would not be an issue, but you are right that this one clearly
did, so the line is wrong here.

The commit is now delayed until 4.21 I assume.

> You can take this fix with the patch that introduces the config option
> (ack for that) in case merging through the btrfs tree would be too late
> for it (ie. no common base for the git trees containg the new code and
> fix).
>
> Or I can take it through btrfs tree in 4.20-rc cycle. In both cases the
> Fixes: does not need to be there.

Please take it through the btrfs tree. Let me know if you need me
to extend the changelog to explain that situation better.

  Arnd


[PATCH] btrfs: reduce stack usage for btrfsic_process_written_block

2019-07-08 Thread Arnd Bergmann
btrfsic_process_written_block() cals btrfsic_process_metablock(),
which has a fairly large stack usage due to the btrfsic_stack_frame
variable. It also calls btrfsic_test_for_metadata(), which now
needs several hundreds of bytes for its SHASH_DESC_ON_STACK().

In some configurations, we end up with both functions on the
same stack, and gcc warns about the excessive stack usage that
might cause the available stack space to run out:

fs/btrfs/check-integrity.c:1743:13: error: stack frame size of 1152 bytes in 
function 'btrfsic_process_written_block' [-Werror,-Wframe-larger-than=]

Marking both child functions as noinline_for_stack helps because
this guarantees that the large variables are not on the same
stack frame.

Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for 
checksumming")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/check-integrity.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 81a9731959a9..0b52ab4cb964 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -940,7 +940,7 @@ static void btrfsic_stack_frame_free(struct 
btrfsic_stack_frame *sf)
kfree(sf);
 }
 
-static int btrfsic_process_metablock(
+static noinline_for_stack int btrfsic_process_metablock(
struct btrfsic_state *state,
struct btrfsic_block *const first_block,
struct btrfsic_block_data_ctx *const first_block_ctx,
@@ -1706,8 +1706,9 @@ static void btrfsic_dump_database(struct btrfsic_state 
*state)
  * Test whether the disk block contains a tree block (leaf or node)
  * (note that this test fails for the super block)
  */
-static int btrfsic_test_for_metadata(struct btrfsic_state *state,
-char **datav, unsigned int num_pages)
+static noinline_for_stack int btrfsic_test_for_metadata(
+   struct btrfsic_state *state,
+   char **datav, unsigned int num_pages)
 {
struct btrfs_fs_info *fs_info = state->fs_info;
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
-- 
2.20.0



[PATCH] btrfs: add back libcrc32c Kconfig dependency

2019-07-08 Thread Arnd Bergmann
While part of btrfs now uses the crypto shash interfaces
for crc32c, we still get a build time dependency in other
places:

fs/btrfs/super.o: In function `btrfs_mount_root':
super.c:(.text+0xc0d4): undefined reference to `crc32c_impl'
fs/btrfs/super.o: In function `btrfs_print_mod_info':
super.c:(.init.text+0x3e28): undefined reference to `crc32c_impl'
fs/btrfs/extent-tree.o: In function `lookup_inline_extent_backref':
extent-tree.c:(.text+0x17750): undefined reference to `crc32c'
fs/btrfs/extent-tree.o:extent-tree.c:(.text+0x177f4): more undefined references 
to `crc32c' follow

Change Kconfig to depend on both.

Fixes: d5178578bcd4 ("btrfs: directly call into crypto framework for 
checksumming")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 2521a24f74be..aa7453d44e59 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -3,6 +3,7 @@
 config BTRFS_FS
tristate "Btrfs filesystem support"
select CRYPTO
+   select LIBCRC32C
select CRYPTO_CRC32C
select CRYPTO_SHA256
select ZLIB_INFLATE
-- 
2.20.0



Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-19 Thread Arnd Bergmann
On Tue, Mar 20, 2018 at 7:29 AM, Linus Torvalds
 wrote:
> On Mon, Mar 19, 2018 at 2:43 AM, David Laight  wrote:
>>
>> Is it necessary to have the full checks for old versions of gcc?
>>
>> Even -Wvla could be predicated on very recent gcc - since we aren't
>> worried about whether gcc decides to generate a vla, but whether
>> the source requests one.
>
> You are correct. We could just ignore the issue with old gcc versions,
> and disable -Wvla rather than worry about it.

This version might also be an option:

diff --git a/Makefile b/Makefile
index 37fc475a2b92..49dd9f0fb76c 100644
--- a/Makefile
+++ b/Makefile
@@ -687,7 +687,8 @@ KBUILD_CFLAGS += $(call cc-option,-fno-reorder-blocks,) \
 endif

 ifneq ($(CONFIG_FRAME_WARN),0)
-KBUILD_CFLAGS += $(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN})
+KBUILD_CFLAGS += $(call cc-option,-Wstack-usage=${CONFIG_FRAME_WARN}, \
+   -$(call cc-option,-Wframe-larger-than=${CONFIG_FRAME_WARN}))
 endif

 # This selects the stack protector compiler flag. Testing it is delayed

Wiht -Wstack-usage=, we should get a similar warning to -Wvla for frames that
contain real VLAs, but not when there is a VLA that ends up being a compile-time
constant size in the end. Wstack-usage was introduced in gcc-4.7, so
on older versions
it turns back into Wframe-larger-than=.

An example output would be

security/integrity/ima/ima_crypto.c: In function 'ima_calc_buffer_hash':
security/integrity/ima/ima_crypto.c:616:5: error: stack usage might be
unbounded [-Werror=stack-usage=]

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-12-06 Thread Arnd Bergmann
The return value of sizeof() is of type size_t, so we must print it
using the %z format modifier rather than %l to avoid this warning
on some architectures:

fs/btrfs/tree-checker.c: In function 'check_dir_item':
fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 
'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} 
[-Werror=format=]

Fixes: 005887f2e3e0 ("btrfs: tree-checker: Add checker for dir item")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/tree-checker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 66dac0a4b01f..7c55e3ba5a6c 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -270,7 +270,7 @@ static int check_dir_item(struct btrfs_root *root,
/* header itself should not cross item boundary */
if (cur + sizeof(*di) > item_size) {
dir_item_err(root, leaf, slot,
-   "dir item header crosses item boundary, have %lu boundary %u",
+   "dir item header crosses item boundary, have %zu boundary %u",
cur + sizeof(*di), item_size);
return -EUCLEAN;
}
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-12-07 Thread Arnd Bergmann
On Thu, Dec 7, 2017 at 1:32 AM, Qu Wenruo  wrote:
>
>
> On 2017年12月06日 22:18, Arnd Bergmann wrote:
>> The return value of sizeof() is of type size_t, so we must print it
>> using the %z format modifier rather than %l to avoid this warning
>> on some architectures:
>>
>> fs/btrfs/tree-checker.c: In function 'check_dir_item':
>> fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 
>> 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} 
>> [-Werror=format=]
>
> Any idea about which architecture will cause such warning?
> On x86_64 I always fail to get such warning.

I think all 32-bit architectures:

#ifndef __kernel_size_t
#if __BITS_PER_LONG != 64
typedef unsigned int__kernel_size_t;
typedef int __kernel_ssize_t;
typedef int __kernel_ptrdiff_t;
#else
typedef __kernel_ulong_t __kernel_size_t;
typedef __kernel_long_t __kernel_ssize_t;
typedef __kernel_long_t __kernel_ptrdiff_t;
#endif
#endif

  Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: qgroups: remove unused label

2017-12-20 Thread Arnd Bergmann
The recent revert left one unused label behind:

fs/btrfs/qgroup.c: In function 'qgroup_reserve':
fs/btrfs/qgroup.c:2432:1: error: label 'retry' defined but not used 
[-Werror=unused-label]

Let's remove it, too.

Fixes: b283738ab0ad ("Revert "btrfs: qgroups: Retry after commit on getting 
EDQUOT"")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/qgroup.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index de3b96f1005b..2b89848e767d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2429,7 +2429,6 @@ static int qgroup_reserve(struct btrfs_root *root, u64 
num_bytes, bool enforce,
capable(CAP_SYS_RESOURCE))
enforce = false;
 
-retry:
spin_lock(&fs_info->qgroup_lock);
quota_root = fs_info->quota_root;
if (!quota_root)
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: zoned: fix uninitialized max_chunk_size

2021-03-23 Thread Arnd Bergmann
From: Arnd Bergmann 

The ctl->max_chunk_size member might be used uninitialized
when none of the three conditions for initializing it in
init_alloc_chunk_ctl_policy_zoned() are true:

In function ‘init_alloc_chunk_ctl_policy_zoned’,
inlined from ‘init_alloc_chunk_ctl’ at fs/btrfs/volumes.c:5023:3,
inlined from ‘btrfs_alloc_chunk’ at fs/btrfs/volumes.c:5340:2:
include/linux/compiler-gcc.h:48:45: error: ‘ctl.max_chunk_size’ may be used 
uninitialized [-Werror=maybe-uninitialized]
 4998 | ctl->max_chunk_size = min(limit, ctl->max_chunk_size);
  |   ^~~
fs/btrfs/volumes.c: In function ‘btrfs_alloc_chunk’:
fs/btrfs/volumes.c:5316:32: note: ‘ctl’ declared here
 5316 | struct alloc_chunk_ctl ctl;
  |^~~

Initialize it to UINT_MAX and rely on the min() expression to limit
it.

Fixes: 1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
Signed-off-by: Arnd Bergmann 
---
Note that the -Wmaybe-unintialized warning is globally disabled
by default. For some reason I got this warning anyway when building
this specific file with gcc-11.
---
 fs/btrfs/volumes.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bc3b33efddc5..b42b423b6a10 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4980,6 +4980,7 @@ static void init_alloc_chunk_ctl_policy_zoned(
u64 type = ctl->type;
 
ctl->max_stripe_size = zone_size;
+   ctl->max_chunk_size = UINT_MAX;
if (type & BTRFS_BLOCK_GROUP_DATA) {
ctl->max_chunk_size = round_down(BTRFS_MAX_DATA_CHUNK_SIZE,
 zone_size);
-- 
2.29.2



[PATCH] [v2] btrfs: zoned: bail out in btrfs_alloc_chunk for bad input

2021-03-23 Thread Arnd Bergmann
From: Arnd Bergmann 

gcc complains that the ctl->max_chunk_size member might be used
uninitialized when none of the three conditions for initializing it in
init_alloc_chunk_ctl_policy_zoned() are true:

In function ‘init_alloc_chunk_ctl_policy_zoned’,
inlined from ‘init_alloc_chunk_ctl’ at fs/btrfs/volumes.c:5023:3,
inlined from ‘btrfs_alloc_chunk’ at fs/btrfs/volumes.c:5340:2:
include/linux/compiler-gcc.h:48:45: error: ‘ctl.max_chunk_size’ may be used 
uninitialized [-Werror=maybe-uninitialized]
 4998 | ctl->max_chunk_size = min(limit, ctl->max_chunk_size);
  |   ^~~
fs/btrfs/volumes.c: In function ‘btrfs_alloc_chunk’:
fs/btrfs/volumes.c:5316:32: note: ‘ctl’ declared here
 5316 | struct alloc_chunk_ctl ctl;
  |^~~

If we ever get into this condition, something is seriously
wrong, so the same logic as in init_alloc_chunk_ctl_policy_regular()
and a few other places should be applied. This avoids both further
data corruption, and the compile-time warning.

Fixes: 1cd6121f2a38 ("btrfs: zoned: implement zoned chunk allocator")
Link: https://lore.kernel.org/lkml/20210323132343.gf7...@twin.jikos.cz/
Suggested-by: David Sterba 
Signed-off-by: Arnd Bergmann 
---
Note that the -Wmaybe-unintialized warning is globally disabled
by default. For some reason I got this warning anyway when building
this specific file with gcc-11.
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bc3b33efddc5..d2994305ed77 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4989,6 +4989,8 @@ static void init_alloc_chunk_ctl_policy_zoned(
ctl->max_chunk_size = 2 * ctl->max_stripe_size;
ctl->devs_max = min_t(int, ctl->devs_max,
  BTRFS_MAX_DEVS_SYS_CHUNK);
+   } else {
+   BUG();
}
 
/* We don't want a chunk larger than 10% of writable space */
-- 
2.29.2



Re: [PATCH 1/4] module: Add module_subinit{_noexit} and module_subeixt helper macros

2024-07-25 Thread Arnd Bergmann
On Thu, Jul 25, 2024, at 16:39, Christoph Hellwig wrote:
> On Thu, Jul 25, 2024 at 11:01:33AM +0800, Youling Tang wrote:
>> - It doesn't feel good to have only one subinit/exit in a file.
>>   Assuming that there is only one file in each file, how do we
>>   ensure that the files are linked in order?(Is it sorted by *.o
>>   in the Makefile?)
>
> Yes, link order already matterns for initialization order for built-in
> code, so this is a well known concept.

Note: I removed the old way of entering a module a few
years ago, which allowed simply defining a function called
init_module(). The last one of these was a07d8ecf6b39
("ethernet: isa: convert to module_init/module_exit").

Now I think we could just make the module_init() macro
do the same thing as a built-in initcall() and put
an entry in a special section, to let you have multiple
entry points in a loadable module.

There are still at least two problems though:

- while link order is defined between files in a module,
  I don't think there is any guarantee for the order between
  two initcalls of the same level within a single file.

- For built-in code we don't have to worry about matching
  the order of the exit calls since they don't exist there.
  As I understand, the interesting part of this patch
  series is about making sure the order matches between
  init and exit, so there still needs to be a way to
  express a pair of such calls.

 Arnd



[PATCH] btrfs: shut up bogus -Wmaybe-uninitialized warning

2019-06-17 Thread Arnd Bergmann
gcc sometimes can't determine whether a variable has been initialized
when both the initialization and the use are conditional:

fs/btrfs/props.c: In function 'inherit_props':
fs/btrfs/props.c:389:4: error: 'num_bytes' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
btrfs_block_rsv_release(fs_info, trans->block_rsv,

This code is fine. Unfortunately, I cannot think of a good way to
rephrase it in a way that makes gcc understand this, so I add
a bogus initialization the way one should not.

Fixes: d7400ee1b476 ("btrfs: use the existing reserved items for our first prop 
for inheritance")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/props.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index a9e2e66152ee..9d47ae1cf5b2 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -341,7 +341,7 @@ static int inherit_props(struct btrfs_trans_handle *trans,
for (i = 0; i < ARRAY_SIZE(prop_handlers); i++) {
const struct prop_handler *h = &prop_handlers[i];
const char *value;
-   u64 num_bytes;
+   u64 num_bytes = 0;
 
if (!h->inheritable)
continue;
-- 
2.20.0



[PATCH] btrfs: fix uninitialized variable access after ASSERT

2016-11-28 Thread Arnd Bergmann
In btrfs, ASSERT() has no effect if CONFIG_BTRFS_ASSERT is disabled,
and gcc notices that this can lead to using an uninitialized variable:

fs/btrfs/inode.c: In function 'run_delalloc_range':
fs/btrfs/inode.c:1190:18: error: 'cur_end' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]

I assume the condition that the ASSERT checks for is actually
correct and we won't get there in practice, but it's easy to
modify the function to make it simpler and avoid the condition
that the warning is for.

Fixes: e5249f75cfd0 ("btrfs: Introduce COMPRESS reserve type to fix false 
enospc for compression")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/inode.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6f35d923d67..b1d2b38d29aa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1175,18 +1175,14 @@ static int cow_file_range_async(struct inode *inode, 
struct page *locked_page,
clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,
 1, 0, NULL, GFP_NOFS);
while (start < end) {
+   ASSERT(reserve_type == BTRFS_RESERVE_COMPRESS);
async_cow = kmalloc(sizeof(*async_cow), GFP_NOFS);
BUG_ON(!async_cow); /* -ENOMEM */
async_cow->inode = igrab(inode);
async_cow->root = root;
async_cow->locked_page = locked_page;
async_cow->start = start;
-
-   if (reserve_type == BTRFS_RESERVE_COMPRESS)
-   cur_end = min(end, start + SZ_512K - 1);
-   else
-   ASSERT(0);
-
+   cur_end = min(end, start + SZ_512K - 1);
async_cow->end = cur_end;
INIT_LIST_HEAD(&async_cow->extents);
 
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/12] audit: Use timespec64 to represent audit timestamps

2017-04-15 Thread Arnd Bergmann
On Sat, Apr 8, 2017 at 5:58 PM, Deepa Dinamani  wrote:
>> I have no problem merging this patch into audit/next for v4.12, would
>> you prefer me to do that so at least this patch is merged?
>
> This would be fine.
> But, I think whoever takes the last 2 deletion patches should also take them.
> I'm not sure how that part works out.
>
>> It would probably make life a small bit easier for us in the audit
>> world too as it would reduce the potential merge conflict.  However,
>> that's a relatively small thing to worry about.

As Andrew has picked the remaining patches up into -mm, this will work
out fine: any patches picked up by the respective maintainers for v4.12
should arrive as git pull requests before the -mm patches get applied
at a later stage of the merge window.

 Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: work around maybe-uninitialized warning

2017-05-18 Thread Arnd Bergmann
A rewrite of btrfs_submit_direct_hook appears to have introduced a warning:

fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook':
fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]

Where the 'bio' variable was previously initialized unconditionally, it
is now set in the "while (submit_len > 0)" loop that would never execute
if submit_len is zero.

Assuming this cannot happen in practice, we can avoid the warning
by simply replacing the while{} loop with a do{}while() loop so
the compiler knows that it will always be entered at least once.

Fixes: 0fd27e06c61b ("Btrfs: use bio_clone_bioset_partial to simplify DIO 
submit")
Signed-off-by: Arnd Bergmann 
---
---
 fs/btrfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8c37b4fa4cbb..c62cf9593cb3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8497,7 +8497,7 @@ static int btrfs_submit_direct_hook(struct 
btrfs_dio_private *dip,
/* bio split */
ASSERT(map_length <= INT_MAX);
atomic_inc(&dip->pending_bios);
-   while (submit_len > 0) {
+   do {
clone_len = min_t(int, submit_len, map_length);
 
/*
@@ -8540,7 +8540,7 @@ static int btrfs_submit_direct_hook(struct 
btrfs_dio_private *dip,
  start_sector << 9, &map_length, NULL, 0);
if (ret)
goto out_err;
-   }
+   } while (submit_len > 0);
 
 submit:
ret = __btrfs_submit_dio_bio(bio, inode, file_offset, skip_sum,
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: work around maybe-uninitialized warning

2017-05-19 Thread Arnd Bergmann
On Fri, May 19, 2017 at 8:10 PM, Liu Bo  wrote:
> On Thu, May 18, 2017 at 03:33:29PM +0200, Arnd Bergmann wrote:
>> A rewrite of btrfs_submit_direct_hook appears to have introduced a warning:
>>
>> fs/btrfs/inode.c: In function 'btrfs_submit_direct_hook':
>> fs/btrfs/inode.c:8467:14: error: 'bio' may be used uninitialized in this 
>> function [-Werror=maybe-uninitialized]
>>
>> Where the 'bio' variable was previously initialized unconditionally, it
>> is now set in the "while (submit_len > 0)" loop that would never execute
>> if submit_len is zero.
>>
>> Assuming this cannot happen in practice, we can avoid the warning
>> by simply replacing the while{} loop with a do{}while() loop so
>> the compiler knows that it will always be entered at least once.
>>
>
> Thanks for the fix.  I think it's a false positve one and I've updated it in 
> v2
> with a 'struct bio *bio = NULL' to make compiler happy, could you please help
> reveiw it?

Right, it is a false positive and adding the =NULL initialization shuts up the
warning. The reason my patch used a different approach is to make the
code more robust, see https://rusty.ozlabs.org/?p=232

Generally speaking initializing a local variable to an illegal value, and later
using the variable without a check for that original value is error-prone.
Even though the code is correct at the moment, someone else might
modify it later. My first (broken) solution avoided this by checking for
the condition that led to the warning, my newer solution is nicer as it
makes it much clearer to the reader what is going on, compared to
the NULL initialization that does not help readability but makes
it slightly harder to understand why you wrote the code specifically that
way.

   Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: avoid uninitialized variable warning

2016-02-22 Thread Arnd Bergmann
With CONFIG_SMP and CONFIG_PREEMPT both disabled, gcc decides
to partially inline the get_state_failrec() function but cannot
figure out that means the failrec pointer is always valid
if the function returns success, which causes a harmless
warning:

fs/btrfs/extent_io.c: In function 'clean_io_failure':
fs/btrfs/extent_io.c:2131:4: error: 'failrec' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]

This marks get_state_failrec() and set_state_failrec() both
as 'noinline', which avoids the warning in all cases for me,
and seems less ugly than adding a fake initialization.

Signed-off-by: Arnd Bergmann 
Fixes: 47dc196ae719 ("btrfs: use proper type for failrec in extent_state")
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 27577f1b10dc..76a0c8597d98 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1842,7 +1842,7 @@ out:
  * set the private field for a given byte offset in the tree.  If there isn't
  * an extent_state there already, this does nothing.
  */
-static int set_state_failrec(struct extent_io_tree *tree, u64 start,
+static noinline int set_state_failrec(struct extent_io_tree *tree, u64 start,
struct io_failure_record *failrec)
 {
struct rb_node *node;
@@ -1870,7 +1870,7 @@ out:
return ret;
 }
 
-static int get_state_failrec(struct extent_io_tree *tree, u64 start,
+static noinline int get_state_failrec(struct extent_io_tree *tree, u64 start,
struct io_failure_record **failrec)
 {
struct rb_node *node;
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-01 Thread Arnd Bergmann
On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng  wrote:
> On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani  wrote:

>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 517838b..77204da 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct 
>> rbd_obj_request *obj_request)
>>  {
>> struct ceph_osd_request *osd_req = obj_request->osd_req;
>>
>> -   osd_req->r_mtime = CURRENT_TIME;
>> +   ktime_get_real_ts(&osd_req->r_mtime);
>> osd_req->r_data_offset = obj_request->offset;
>>  }
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index c681762..1d3fa90 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -1666,6 +1666,7 @@ struct ceph_mds_request *
>>  ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>>  {
>> struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
>> +   struct timespec ts;
>>
>> if (!req)
>> return ERR_PTR(-ENOMEM);
>> @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, 
>> int op, int mode)
>> init_completion(&req->r_safe_completion);
>> INIT_LIST_HEAD(&req->r_unsafe_item);
>>
>> -   req->r_stamp = current_fs_time(mdsc->fsc->sb);
>> +   ktime_get_real_ts(&ts);
>> +   req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);
>
> This change causes our kernel_untar_tar test case to fail (inode's
> ctime goes back). The reason is that there is time drift between the
> time stamps got by  ktime_get_real_ts() and current_time(). We need to
> revert this change until current_time() uses ktime_get_real_ts()
> internally.

Hmm, the change was not supposed to have a user-visible effect, so
something has gone wrong, but I don't immediately see how it
relates to what you observe.

ktime_get_real_ts() and current_time() use the same time base, there
is no drift, but there is a difference in resolution, as the latter uses
the time stamp of the last jiffies update, which may be up to one jiffy
(10ms) behind the exact time we put in the request stamps here.

Do you still see problems if you use current_kernel_time() instead of
ktime_get_real_ts()?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-02 Thread Arnd Bergmann
On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng  wrote:
> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani  wrote:
>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz  wrote:
>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng  wrote:
>>>> On Thu, Jun 1, 2017 at 6:22 PM, Arnd Bergmann  wrote:
>>>>> On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng  wrote:
>>>>>> On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani  
>>>>>> wrote:
>>>>>
>>>>>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>>>>>> index 517838b..77204da 100644
>>>>>>> --- a/drivers/block/rbd.c
>>>>>>> +++ b/drivers/block/rbd.c
>>>>>>> @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct 
>>>>>>> rbd_obj_request *obj_request)
>>>>>>>  {
>>>>>>> struct ceph_osd_request *osd_req = obj_request->osd_req;
>>>>>>>
>>>>>>> -   osd_req->r_mtime = CURRENT_TIME;
>>>>>>> +   ktime_get_real_ts(&osd_req->r_mtime);
>>>>>>> osd_req->r_data_offset = obj_request->offset;
>>>>>>>  }
>>>>>>>
>>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>>> index c681762..1d3fa90 100644
>>>>>>> --- a/fs/ceph/mds_client.c
>>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>>> @@ -1666,6 +1666,7 @@ struct ceph_mds_request *
>>>>>>>  ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int 
>>>>>>> mode)
>>>>>>>  {
>>>>>>> struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
>>>>>>> +   struct timespec ts;
>>>>>>>
>>>>>>> if (!req)
>>>>>>> return ERR_PTR(-ENOMEM);
>>>>>>> @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client 
>>>>>>> *mdsc, int op, int mode)
>>>>>>> init_completion(&req->r_safe_completion);
>>>>>>> INIT_LIST_HEAD(&req->r_unsafe_item);
>>>>>>>
>>>>>>> -   req->r_stamp = current_fs_time(mdsc->fsc->sb);
>>>>>>> +   ktime_get_real_ts(&ts);
>>>>>>> +   req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);
>>>>>>
>>>>>> This change causes our kernel_untar_tar test case to fail (inode's
>>>>>> ctime goes back). The reason is that there is time drift between the
>>>>>> time stamps got by  ktime_get_real_ts() and current_time(). We need to
>>>>>> revert this change until current_time() uses ktime_get_real_ts()
>>>>>> internally.
>>>>>
>>>>> Hmm, the change was not supposed to have a user-visible effect, so
>>>>> something has gone wrong, but I don't immediately see how it
>>>>> relates to what you observe.
>>>>>
>>>>> ktime_get_real_ts() and current_time() use the same time base, there
>>>>> is no drift, but there is a difference in resolution, as the latter uses
>>>>> the time stamp of the last jiffies update, which may be up to one jiffy
>>>>> (10ms) behind the exact time we put in the request stamps here.
>>>>>
>>>>> Do you still see problems if you use current_kernel_time() instead of
>>>>> ktime_get_real_ts()?
>>>>
>>>> The problem disappears after using current_kernel_time().
>>>>
>>>> https://github.com/ceph/ceph-client/commit/2e0f648da23167034a3cf1500bc90ec60aef2417
>>>
>>> From the commit above:
>>> "It seems there is time drift between ktime_get_real_ts() and
>>> current_kernel_time()"
>>>
>>> Its more of a granularity difference. current_kernel_time() returns
>>> the cached time at the last tick, where as ktime_get_real_ts() reads
>>> the clocksource hardware and returns the immediate time.
>>>
>>> Filesystems usually use the cached time (similar to
>>> CLOCK_REALTIME_COARSE), for performance reasons, as touching the
>>> clocksource takes time.
>>
>> Alternatively, it would be best for this code also to use current_time().
>> I had suggested this in one of the previous versions

Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-02 Thread Arnd Bergmann
On Fri, Jun 2, 2017 at 12:10 PM, Yan, Zheng  wrote:
> On Fri, Jun 2, 2017 at 5:45 PM, Arnd Bergmann  wrote:
>> On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng  wrote:
>>> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani  
>>> wrote:
>>>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz  wrote:
>>>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng  wrote:
>>
>> I believe the bug you see is the result of the two timestamps
>> currently being almost guaranteed to be different in the latest
>> kernels.
>> Changing r_stamp to use current_kernel_time() will make it the
>> same value most of the time (as it was before Deepa's patch),
>> but when the timer interrupt happens between the timestamps,
>> the two are still different, it's just much harder to hit.
>>
>> I think the proper solution should be to change __ceph_setattr()
>> in a way that has req->r_stamp always synchronized with i_ctime.
>> If we copy i_ctime to r_stamp, that will also take care of the
>> future issues with the planned changes to current_time().
>>
> I already have a patch
> https://github.com/ceph/ceph-client/commit/24f54cd18e195a002ee3d2ab50dbc952fd9f82af

Looks good to me. In case anyone cares:
Acked-by: Arnd Bergmann 

>> The part I don't understand is what else r_stamp (i.e. the time
>> stamp in ceph_msg_data with type==
>> CEPH_MSG_CLIENT_REQUEST) is used for, other than setting
>> ctime in CEPH_MDS_OP_SETATTR.
>>
>> Will this be used to update the stored i_ctime for other operations
>> too? If so, we would need to synchronize it with the in-memory
>> i_ctime for all operations that do this.
>>
>
> yes,  mds uses it to update ctime of modified inodes. For example,
> when handling mkdir, mds set ctime of both parent inode and new inode
> to r_stamp.

I see, so we may have a variation of that problem there as well: From
my reading of the code, the child inode is not in memory yet, so
that seems fine, but I could not find where the parent in-memory inode
i_ctime is updated in ceph, but it is most likely not the same as
req->r_stamp (assuming it gets updated at all).

Would it make sense require all callers of ceph_mdsc_do_request()
to update r_stamp at the same time as i_ctime to keep them in sync?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-02 Thread Arnd Bergmann
On Fri, Jun 2, 2017 at 1:18 PM, Yan, Zheng  wrote:
> On Fri, Jun 2, 2017 at 6:51 PM, Arnd Bergmann  wrote:
>> On Fri, Jun 2, 2017 at 12:10 PM, Yan, Zheng  wrote:
>>> On Fri, Jun 2, 2017 at 5:45 PM, Arnd Bergmann  wrote:
>>>> On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng  wrote:
>>>>> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani  
>>>>> wrote:
>>>>>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz  
>>>>>> wrote:
>>>>>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng  wrote:
>>>>
>>>> I believe the bug you see is the result of the two timestamps
>>>> currently being almost guaranteed to be different in the latest
>>>> kernels.
>>>> Changing r_stamp to use current_kernel_time() will make it the
>>>> same value most of the time (as it was before Deepa's patch),
>>>> but when the timer interrupt happens between the timestamps,
>>>> the two are still different, it's just much harder to hit.
>>>>
>>>> I think the proper solution should be to change __ceph_setattr()
>>>> in a way that has req->r_stamp always synchronized with i_ctime.
>>>> If we copy i_ctime to r_stamp, that will also take care of the
>>>> future issues with the planned changes to current_time().
>>>>
>>> I already have a patch
>>> https://github.com/ceph/ceph-client/commit/24f54cd18e195a002ee3d2ab50dbc952fd9f82af
>>
>> Looks good to me. In case anyone cares:
>> Acked-by: Arnd Bergmann 
>>
>>>> The part I don't understand is what else r_stamp (i.e. the time
>>>> stamp in ceph_msg_data with type==
>>>> CEPH_MSG_CLIENT_REQUEST) is used for, other than setting
>>>> ctime in CEPH_MDS_OP_SETATTR.
>>>>
>>>> Will this be used to update the stored i_ctime for other operations
>>>> too? If so, we would need to synchronize it with the in-memory
>>>> i_ctime for all operations that do this.
>>>>
>>>
>>> yes,  mds uses it to update ctime of modified inodes. For example,
>>> when handling mkdir, mds set ctime of both parent inode and new inode
>>> to r_stamp.
>>
>> I see, so we may have a variation of that problem there as well: From
>> my reading of the code, the child inode is not in memory yet, so
>> that seems fine, but I could not find where the parent in-memory inode
>> i_ctime is updated in ceph, but it is most likely not the same as
>> req->r_stamp (assuming it gets updated at all).
>
> i_ctime is updated when handling request reply, by ceph_fill_file_time().
> __ceph_setattr() can update the in-memory inode's ctime after request
> reply is received. The difference between ktime_get_real_ts() and
> current_time() can be larger than round-trip time of request. So it's
> still possible that __ceph_setattr() make ctime go back.

But the __ceph_setattr() problem should be fixed by your patch, right?

What I meant is another related problem in ceph_mkdir() where the
i_ctime field of the parent inode is different between the persistent
representation in the mds and the in-memory representation.

Arnd

>> Would it make sense require all callers of ceph_mdsc_do_request()
>> to update r_stamp at the same time as i_ctime to keep them in sync?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-02 Thread Arnd Bergmann
On Fri, Jun 2, 2017 at 2:18 PM, Yan, Zheng  wrote:
> On Fri, Jun 2, 2017 at 7:33 PM, Arnd Bergmann  wrote:
>> On Fri, Jun 2, 2017 at 1:18 PM, Yan, Zheng  wrote:
>> What I meant is another related problem in ceph_mkdir() where the
>> i_ctime field of the parent inode is different between the persistent
>> representation in the mds and the in-memory representation.
>>
>
> I don't see any problem in mkdir case. Parent inode's i_ctime in mds is set to
> r_stamp. When client receives request reply, it set its in-memory inode's 
> ctime
> to the same time stamp.

Ok, I see it now, thanks for the clarification. Most other file systems do this
the other way round and update all fields in the in-memory inode structure
first and then write that to persistent storage, so I was getting confused about
the order of events here.

If I understand it all right, we have three different behaviors in ceph now,
though the differences are very minor and probably don't ever matter:

- in setattr(), we update ctime in the in-memory inode first and then send
  the same time to the mds, and expect to set it again when the reply comes.

- in ceph_write_iter write() and mmap/page_mkwrite(), we call
  file_update_time() to set i_mtime and i_ctime to the same
  timestamp first once a write is observed by the fs and then take
  two other timestamps that we send to the mds, and update the
  in-memory inode a second time when the reply comes. ctime
  is never older than mtime here, as far as I can tell, but it may
  be newer when the timer interrupt happens between taking the
  two stamps.

- in all other calls, we only update the inode (and/or parent inode)
  after the reply arrives.

   Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-10-13 Thread Arnd Bergmann
We now get a harmless compile-time on 32-bit architectures:

fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 
'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=]

This changes the format string to use %zu instead of %lu for size_t.

Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for 
check_extent_data_item")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/tree-checker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c7a09cc2a17c..388fb6553aa5 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -186,7 +186,7 @@ static int check_extent_data_item(struct btrfs_root *root,
/* Regular or preallocated extent has fixed item size */
if (item_size != sizeof(*fi)) {
file_extent_err(root, leaf, slot,
-   "invalid item size for reg/prealloc file extent, have 
%u expect %lu",
+   "invalid item size for reg/prealloc file extent, have 
%u expect %zu",
item_size, sizeof(*fi));
return -EUCLEAN;
}
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: next/master build: 214 builds: 29 failed, 185 passed, 29 errors, 68 warnings (next-20171016)

2017-10-17 Thread Arnd Bergmann
On Tue, Oct 17, 2017 at 1:02 AM, kernelci.org bot  wrote:
>
> next/master build: 214 builds: 29 failed, 185 passed, 29 errors, 68 warnings 
> (next-20171016)
> Full Build Summary: 
> https://kernelci.org/build/next/branch/master/kernel/next-20171016/
> Tree: next
> Branch: master
> Git Describe: next-20171016
> Git Commit: babb43f85f5fc03482aafa461bdc2d38b9345361
> Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> Built: 4 unique architectures
>
> Build Failures Detected:
>
> arm: gcc version 5.3.1 20160412 (Linaro GCC 5.3-2016.05)
> allmodconfig FAIL
> mips: gcc version 6.3.0 (GCC)
> bigsur_defconfig FAIL
> bmips_be_defconfig FAIL
> bmips_stb_defconfig FAIL
> cavium_octeon_defconfig FAIL
> ip27_defconfig FAIL
> loongson3_defconfig FAIL
> malta_defconfig FAIL
> malta_kvm_defconfig FAIL
> maltaaprp_defconfig FAIL
> maltasmvp_defconfig FAIL
> maltasmvp_eva_defconfig FAIL
> maltaup_defconfig FAIL
> maltaup_xpa_defconfig FAIL
> mips_paravirt_defconfig FAIL
> msp71xx_defconfig FAIL
> nlm_xlp_defconfig FAIL
> nlm_xlr_defconfig FAIL
> pistachio_defconfig FAIL
> sb1250_swarm_defconfig FAIL
> xway_defconfig FAIL
> x86: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
> allmodconfig FAIL
> allmodconfig+CONFIG_OF=n FAIL
> defconfig+CONFIG_KASAN=y FAIL
> defconfig+CONFIG_LKDTM=y FAIL
> defconfig+CONFIG_OF_UNITTEST=y FAIL
> defconfig+kselftest FAIL
> i386_defconfig FAIL
> x86_64_defconfig FAIL
>
> Errors summary:
> 20 arch/mips/include/asm/smp.h:32:29: error: 'CONFIG_MIPS_NR_CPU_NR_MAP' 
> undeclared here (not in a function)

These are all caused by commit e79824d71155 ("MIPS: Allow
__cpu_number_map to be larger than NR_CPUS"), which was
apparently incomplete. I'm not sending a patch since I don't
know what the intention was here. Reverting that commit
avoids the problem.

> 7 drivers/gpu/drm/i915/i915_gem.c:3092:54: error: 'flags' undeclared (first 
> use in this function)

A mismerge in today's linux-next, I assume it will be resolved in the
following linux-next

> 1 ERROR: "__aeabi_uldivmod" [drivers/net/ethernet/intel/i40e/i40e.ko] 
> undefined!

I haven't reproduced this one yet, will have a look.

> 1 /bin/sh: 1: 
> /home/buildslave/workspace/kernel-single-defconfig-builder/defconfig/defconfig+CONFIG_LKDTM=y/label/builder/build-x86/tools/objtool//fixdep:
>  Permission denied

This is an old build problem with objtool. It only shows up sometimes
when we hit a certain race. Josh Poimboef suggested a workaround but
can't reproduce the problem here locally, so I haven't some a patch.

> Warnings summary:
> 16 fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not 
> used [-Wunused-function]
> 16 fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' 
> defined but not used [-Wunused-function]

I sent a patch a few days ago, should get merged soon. This also happens
in mainline.

> 7 include/linux/typecheck.h:11:18: warning: comparison of distinct pointer 
> types lacks a cast
> 5 fs/f2fs/node.c:1654:5: warning: suggest parentheses around assignment used 
> as truth value [-Wparentheses]
> 5 fs/f2fs/node.c:1556:5: warning: suggest parentheses around assignment used 
> as truth value [-Wparentheses]
> 5 fs/f2fs/node.c:1443:5: warning: suggest parentheses around assignment used 
> as truth value [-Wparentheses]
> 5 fs/f2fs/node.c:1289:5: warning: suggest parentheses around assignment used 
> as truth value [-Wparentheses]
> 5 fs/f2fs/checkpoint.c:322:5: warning: suggest parentheses around assignment 
> used as truth value [-Wparentheses]

This came up today, I just sent a patch.

> 2 fs/btrfs/tree-checker.c:186:4: warning: format '%lu' expects argument of 
> type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=]
> 1 fs/btrfs/tree-checker.c:186:70: warning: format '%lu' expects argument of 
> type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=]

I sent a patch a few days ago, Dave Sterba reviewed it, but it hasn't
gone in yet.

> 1 fs/binfmt_elf_fdpic.c:1501:17: warning: unused variable 'addr' 
> [-Wunused-variable]

Al Viro applied my patch, but it hasn't made it into linux-next yet.

 Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Update LZO compression

2012-10-11 Thread Arnd Bergmann
On Tuesday 09 October 2012, Markus F.X.J. Oberhumer wrote:
> > 
> > : This commit updates the kernel LZO code to the current upsteam version
> > : which features a significant speed improvement - benchmarking the Calgary
> > : and Silesia test corpora typically shows a doubled performance in
> > : both compression and decompression on modern i386/x86_64/powerpc machines.
> > 
> > There are significant clients of the LZO library - crypto, btrfs,
> > jffs2, ubifs, squashfs and zcache.  So let's give all those people a cc
> > and ask that they test the LZO changes once they land in linux-next. 
> > For correctness and performance, please.
> 
> The core compression and decompression code has been thoroughly tested, so I
> do not expect major problems.
> 
> Good testing after the merge and feedback about build or performance issues
> (and improvements!) is highly appreciated.

The addition of the lzo tree to linux-next caused this problem for ARM
imx_v6_v7_defconfig:

In file included from 
/home/arnd/linux-arm/arch/arm/boot/compressed/decompress.c:40:0:
/home/arnd/linux-arm/arch/arm/boot/compressed/../../../../lib/decompress_unlzo.c:34:34:
 fatal error: lzo/lzo1x_decompress.c: No such file or directory

Since the file was renamed, anything including it needs to be updated to the
new file name.

Signed-off-by: Arnd Bergmann 

diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c
index 4531294..960183d 100644
--- a/lib/decompress_unlzo.c
+++ b/lib/decompress_unlzo.c
@@ -31,7 +31,7 @@
  */
 
 #ifdef STATIC
-#include "lzo/lzo1x_decompress.c"
+#include "lzo/lzo1x_decompress_safe.c"
 #else
 #include 
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Y2038] [PATCH v2 00/24] Delete CURRENT_TIME and CURRENT_TIME_SEC macros

2016-06-21 Thread Arnd Bergmann
On Monday, June 20, 2016 11:03:01 AM CEST you wrote:
> On Sun, Jun 19, 2016 at 5:26 PM, Deepa Dinamani  
> wrote:
> > The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC 
> > macros.

> Gcc handles 8-byte structure returns (on most architectures) by
> returning them as two 32-bit registers (%edx:%eax on x86). But once it
> is timespec64, that will no longer be the case, and the calling
> convention will end up using a pointer to the local stack instead.

I guess we already have that today, as the implementation of
current_fs_time() is

static inline struct timespec64 tk_xtime(struct timekeeper *tk)
{
struct timespec64 ts;

ts.tv_sec = tk->xtime_sec;
ts.tv_nsec = (long)(tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift);
return ts;
}
extern struct timespec64 current_kernel_time64(void);
struct timespec64 current_kernel_time64(void)
{
struct timekeeper *tk = &tk_core.timekeeper;
struct timespec64 now;
unsigned long seq;

do {
seq = read_seqcount_begin(&tk_core.seq);

now = tk_xtime(tk);
} while (read_seqcount_retry(&tk_core.seq, seq));

return now;
}
static inline struct timespec current_kernel_time(void)
{
struct timespec64 now = current_kernel_time64();

return timespec64_to_timespec(now);
}
extern struct timespec current_fs_time(struct super_block *sb);
struct timespec current_fs_time(struct super_block *sb)
{   
struct timespec now = current_kernel_time();
return timespec_trunc(now, sb->s_time_gran);
}   

We can surely do a little better than this, independent of the
conversion in Deepa's patch set.

> So for 32-bit code generation, we *may* want to introduce a new model of doing
> 
> set_inode_time(inode, ATTR_ATIME | ATTR_MTIME);
> 
> which basically just does
> 
> inode->i_atime = inode->i_mtime = current_time(inode);
> 
> but with a much easier calling convention on 32-bit architectures.
> 
> But that is entirely orthogonal to this patch-set, and should be seen
> as a separate issue.

I've played around with that, but found it hard to avoid going
through memory other than going all the way to the tk_xtime()
access to copy both tk->xtime_sec and the nanoseconds into
the inode fields.

Without that, the set_inode_time() implementation ends up
being more expensive than
inode->i_atime = inode->i_ctime = inode->i_mtime = current_time(inode);
because we still copy through the stack but also have
a couple of conditional branches that we don't have at the
moment.

At the moment, the triple assignment becomes (here on ARM)

   c:   4668mov r0, sp
  12:   f7ff fffe   bl  0 
  3e:   f107 0520   add.w   r5, r7, #32
12: R_ARM_THM_CALL  current_kernel_time64
  16:   f106 0410   add.w   r4, r6, #16
  1a:   e89d 000f   ldmia.w sp, {r0, r1, r2, r3} # load from stack
  1e:   e885 000f   stmia.w r5, {r0, r1, r2, r3} # store into i_atime
  22:   e884 000f   stmia.w r4, {r0, r1, r2, r3} #i_ctime
  26:   e886 000f   stmia.w r6, {r0, r1, r2, r3} #i_mtime

and a slightly more verbose version of the same thing on x86
(storing only 12 bytes instead of 16 is cheaper there, while
ARM does a store-multiple to copy the entire structure).

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Y2038] [PATCH v2 00/24] Delete CURRENT_TIME and CURRENT_TIME_SEC macros

2016-06-22 Thread Arnd Bergmann
On Sunday, June 19, 2016 5:26:59 PM CEST Deepa Dinamani wrote:
> The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC 
> macros.
> The macros are not y2038 safe. There is no plan to transition them into being
> y2038 safe.
> ktime_get_* api's can be used in their place. And, these are y2038 safe.
> 
> Thanks to Arnd Bergmann for all the guidance and discussions.
> 
> Patches 2-4 were mostly generated using coccinelle scripts.
> 
> All filesystem timestamps use current_fs_time() for right granularity as
> mentioned in the respective commit texts of patches. This has a changed
> signature, renamed to current_time() and moved to the fs/inode.c.
> 
> This series also serves as a preparatory series to transition vfs to 64 bit
> timestamps as outlined here: https://lkml.org/lkml/2016/2/12/104 .
> 
> As per Linus's suggestion in https://lkml.org/lkml/2016/5/24/663 , all the
> inode timestamp changes have been squashed into a single patch. Also,
> current_time() now is used as a single generic vfs filesystem timestamp api.
> It also takes struct inode* as argument instead of struct super_block*.
> Posting all patches together in a bigger series so that the big picture is
> clear.
> 
> As per the suggestion in https://lwn.net/Articles/672598/, CURRENT_TIME macro
> bug fixes are being handled in a series separate from transitioning vfs to 
> use.

I've looked in detail at all the patches in this version now, and while
overall everything is fine, I found that two patches cannot be part of the
series because of the dependency on the patch that John already took (adding
time64_to_tm), but I think that's ok because we just need to change over
all the users of CURRENT_TIME and CURRENT_TIME_SEC that assign to inode
timestamps in order to prepare for the type change, the other ones
can be changed later.

I also found a few things that could be done differently to make the
later conversion slightly easier, but it's also possible that I missed
part of your bigger plan for those files, and none of them seem important.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Y2038] [PATCH v3 00/24] Delete CURRENT_TIME_SEC and replace current_fs_time()

2016-06-29 Thread Arnd Bergmann
On Saturday, June 25, 2016 2:37:24 PM CEST Deepa Dinamani wrote:
> The series is aimed at getting rid of CURRENT_TIME, CURRENT_TIME_SEC macros
> and replacing current_fs_time() with current_time().
> The macros are not y2038 safe. There is no plan to transition them into being
> y2038 safe.
> ktime_get_* api's can be used in their place. And, these are y2038 safe.
> 
> CURRENT_TIME will be deleted after 4.8 rc1 as there is a dependency function
> time64_to_tm() for one of the CURRENT_TIME occurance.
> 
> Thanks to Arnd Bergmann for all the guidance and discussions.
> 
> Patches 3-5 were mostly generated using coccinelle.
> 
> All filesystem timestamps use current_fs_time() for right granularity as
> mentioned in the respective commit texts of patches. This has a changed
> signature, renamed to current_time() and moved to the fs/inode.c.
> 
> This series also serves as a preparatory series to transition vfs to 64 bit
> timestamps as outlined here: https://lkml.org/lkml/2016/2/12/104 .
> 
> As per Linus's suggestion in https://lkml.org/lkml/2016/5/24/663 , all the
> inode timestamp changes have been squashed into a single patch. Also,
> current_time() now is used as a single generic vfs filesystem timestamp api.
> It also takes struct inode* as argument instead of struct super_block*.
> Posting all patches together in a bigger series so that the big picture is
> clear.
> 
> As per the suggestion in https://lwn.net/Articles/672598/, CURRENT_TIME macro
> bug fixes are being handled in a series separate from transitioning vfs to 
> use.
> 

Everything in this version looks good to me. Please add

Reviewed-by: Arnd Bergmann 

and send a pull request to Al Viro, based on the latest linux-4.7-rc release.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs

2011-01-04 Thread Arnd Bergmann
On Tuesday 04 January 2011 06:40:32 Shaohua Li wrote:

> +static int ioctl_metadata_incore(struct file *filp, void __user *argp)
> +{
> +   struct super_block *sb = filp->f_path.dentry->d_inode->i_sb;
> +   struct metadata_incore_args args;
> +   struct metadata_incore_ent ent;
> +   loff_t offset, last_offset = 0;
> +   ssize_t size, last_size = 0;
> +   __u64 __user vec_addr;

__user only makes sense on pointers. Just make this a "struct
metadata_incore_ent __user *", which will also take care of the
"sparse" warnings you get from the copy_to_user lines below.

>  
> +struct metadata_incore_ent {
> +   __u64 offset;
> +   __u32 size;
> +   __u32 unused;
> +};
> +
> +struct metadata_incore_args {
> +   __u64 offset; /* offset in meta address */
> +   __u64 __user vec_addr; /* vector's address */
> +   __u32 vec_size; /* vector's size */
> +   __u32 unused;
> +};

We usually try hard to avoid ioctls with indirect pointers
in them. The implementation is correct (most people
get this wrong), besides the extraneous __user keyword in
there.

Have you tried passing just a single metadata_incore_ent
at the ioctl and looping in user space? I would guess the
extra overhead of that would be small enough, but that might
need to be measured.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs

2011-01-04 Thread Arnd Bergmann
On Tuesday 04 January 2011 06:40:37 Shaohua Li wrote:
>  /*
>   * When you add any new common ioctls to the switches above and below
>   * please update compat_sys_ioctl() too.
> @@ -664,6 +682,9 @@ int do_vfs_ioctl(struct file *filp, unsi
> case FIMETADATA_INCORE:
> return ioctl_metadata_incore(filp, argp);
>  
> +   case FIMETADATA_READAHEAD:
> +   return ioctl_metadata_readahead(filp, argp);
> +
> default:
> if (S_ISREG(filp->f_path.dentry->d_inode->i_mode))
> error = file_ioctl(filp, cmd, arg);

Did you notice the comment above the function? ;-)

You should really add the new ioctls to compat_sys_ioctl, not
to the COMPATIBLE_IOCTL() list, in order to make the behavior
consistent between 32 and 64 bit user space. The main difference
is that all ioctl commands that are hardcoded in the functions
get called before trying to call the file system specific
.ioctl method.

> +struct metadata_readahead_args {
> +   __u64 offset;
> +   __u64 size;
> +};
> +
>  #define NR_FILE  8192  /* this can well be larger on a larger system */
>  
>  #define MAY_EXEC 1
> @@ -338,6 +343,7 @@ struct metadata_incore_args {
>  #define FITHAW _IOWR('X', 120, int)/* Thaw */
>  #define FITRIM _IOWR('X', 121, struct fstrim_range)/* Trim */
>  #define FIMETADATA_INCORE _IOWR('X', 122, struct metadata_incore_args)
> +#define FIMETADATA_READAHEAD _IOR('X', 123, struct metadata_readahead_args)

This should be _IOW, not _IOR.

Otherwise looks good.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs

2011-01-05 Thread Arnd Bergmann
On Wednesday 05 January 2011 03:11:36 Shaohua Li wrote:
> > Did you notice the comment above the function? ;-)
> > 
> > You should really add the new ioctls to compat_sys_ioctl, not
> > to the COMPATIBLE_IOCTL() list, in order to make the behavior
> > consistent between 32 and 64 bit user space. The main difference
> > is that all ioctl commands that are hardcoded in the functions
> > get called before trying to call the file system specific
> > .ioctl method.

> Thanks, fixed them.

The patch you posted still uses COMPATIBLE_IOCTL. Wrong patch?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs

2011-01-05 Thread Arnd Bergmann
On Wednesday 05 January 2011 10:09:20 Arnd Bergmann wrote:
> > Thanks, fixed them.
> 
> The patch you posted still uses COMPATIBLE_IOCTL. Wrong patch?

On a second look, I noticed that you now have both the COMPATIBLE_IOCTL
and the case statement in compat_sys_ioctl. The former can be
dropped.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs

2011-01-05 Thread Arnd Bergmann
On Wednesday 05 January 2011 03:17:16 Shaohua Li wrote:
> On Tue, 2011-01-04 at 17:40 +0800, Arnd Bergmann wrote:
> 
> > Have you tried passing just a single metadata_incore_ent
> > at the ioctl and looping in user space? I would guess the
> > extra overhead of that would be small enough, but that might
> > need to be measured.
> metadata usually isn't continuous, so this means we have a lot of
> metadata_incore_ent entries. And this is called at boot time and I want
> to make the overhead as low as possible to not impact boot. Unless there
> are certain reasons we can't use indirect pointers, I'd like to make
> kernel return a vector of entries.

It's not a strict rule, but the indirect data passing is rather
ugly and I'd only do that if the difference can be /measured/.

If the purpose is to speed up boot time by preloading metadata,
the FIMETADATA_INCORE operations should of course not take a
significant amount of time compared to the actual preloading,
but as long as it's less than one percent of the time you need
for the preload, I would just use the simpler interface.

> @@ -882,6 +882,7 @@ COMPATIBLE_IOCTL(FIGETBSZ)
>  /* 'X' - originally XFS but some now in the VFS */
>  COMPATIBLE_IOCTL(FIFREEZE)
>  COMPATIBLE_IOCTL(FITHAW)
> +COMPATIBLE_IOCTL(FIMETADATA_INCORE)
>  COMPATIBLE_IOCTL(KDGETKEYCODE)
>  COMPATIBLE_IOCTL(KDSETKEYCODE)
>  COMPATIBLE_IOCTL(KDGKBTYPE)

This change can go away as well.

Two more general comments:

- You probably want to add the ioctls to file_ioctl instead of do_vfs_ioctl,
  so you don't add another case statement to the common path.

- I don't know if there are any rules for what should be an ioctl or an
  fcntl, we're rather inconsistent about this. If you have found a good
  reason for making it an ioctl, just put that into the changelog so we
  can refer to it next time.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs

2011-01-05 Thread Arnd Bergmann
On Thursday 06 January 2011, Shaohua Li wrote:
> I don't understand. adding a case statement in compat_sys_ioctl, so we will do
> compat_ioctl_check_table(). If I add COMPATIBLE_IOCTL(), then the check
> will success, we will go to the found_handler code path and execute
> do_vfs_ioctl, which is what we want. if not adding COMPATIBLE_IOCTL(),
> the check will fail, and in any case, we will go to the out_fput code
> path, so our ioctl does nothing.

You are correct, I misremembered the code and did not check properly.

> > Two more general comments:
> > 
> > - You probably want to add the ioctls to file_ioctl instead of do_vfs_ioctl,
> >   so you don't add another case statement to the common path.
> > 
> > - I don't know if there are any rules for what should be an ioctl or an
> >   fcntl, we're rather inconsistent about this. If you have found a good
> >   reason for making it an ioctl, just put that into the changelog so we
> >   can refer to it next time.
> it can be applied to a directory too. I thought file_ioctl or fcntl is
> for file.

Right again, good point!

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/5] add metadata_incore ioctl in vfs

2011-01-07 Thread Arnd Bergmann
On Thursday 06 January 2011, Shaohua Li wrote:
> Subject: add metadata_incore ioctl in vfs
> 
> Add an ioctl to dump filesystem's metadata in memory in vfs. Userspace 
> collects
> such info and uses it to do metadata readahead.
> Filesystem can hook to super_operations.metadata_incore to get metadata in
> specific approach. Next patch will give an example how to implement
> .metadata_incore in btrfs.
> 
> Signed-off-by: Shaohua Li 

Looks great!

Reviewed-by: Arnd Bergmann 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/5]add metadata_readahead ioctl in vfs

2011-01-07 Thread Arnd Bergmann
On Thursday 06 January 2011, Shaohua Li wrote:
> Subject: add metadata_readahead ioctl in vfs
> 
> Add metadata readahead ioctl in vfs. Filesystem can hook to
> super_operations.metadata_readahead to handle filesystem specific task.
> Next patch will give an example how btrfs implements it.
> 
> Signed-off-by: Shaohua Li 

Reviewed-by: Arnd Bergmann 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Oops when mounting btrfs partition

2013-02-02 Thread Arnd Bergmann
As mentioned on Google+, I have a partition that I can no longer mount
normally, containing a lot of my personal data and all backups from
my laptop.

I found now that I am still able to mount it using the 'nospace_cache'
option, but it takes a couple of minutes and I get "INFO: task
btrfs-transacti:1698 blocked for more than 120 seconds." messages
reporting the thread to be stuck in

[] wait_for_completion+0x1d/0x20
[] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[] btrfs_run_ordered_operations+0x290/0x2f0 [btrfs]
[] btrfs_commit_transaction+0x5f/0xab0 [btrfs]
[] transaction_kthread+0x1bd/0x240 [btrfs]
[ 2040.620221]  [] kthread+0xc0/0xd0

but that seems harmless otherwise. The system seems slow but usable after
this, but I have seen a corrupt "akonadi" database after that, so I'm
still suspicious and will try to rescue the data and reformat the
partition before I do any real work on this again.

Looking back at the log I actually found multiple errors. There was
a btrfs checksum error a couple of weeks ago, but fsck did not find
anything, so I blamed that on a spurious RAM error.
This is the earliest WARNING I found in the logs I still have:

Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 
15619835 off 454656 csum 2755731641 private 864823192
Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 errs: 
wr 0, rd 0, flush 0, corrupt 17, gen 0
...
Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify failed 
on 17006399488 wanted 54700 found 54764
Jan 21 16:35:40 localhost kernel: [1655047.752517] parent transid verify failed 
on 17006399488 wanted 54700 found 54764
Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error corrected: 
ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288)
Jan 21 16:35:40 localhost kernel: [1655047.752704] btrfs: run_one_delayed_ref 
returned -5
Jan 21 16:35:40 localhost kernel: [1655047.752706] [ cut here 
]
Jan 21 16:35:40 localhost kernel: [1655047.752746] WARNING: at 
/build/buildd/linux-3.5.0/fs/btrfs/super.c:221 
__btrfs_abort_transaction+0xad/0xc0 [btrfs]()
Jan 21 16:35:40 localhost kernel: [1655047.752747] Hardware name: P5Q-EM
Jan 21 16:35:40 localhost kernel: [1655047.752748] btrfs: Transaction aborted
Jan 21 16:35:40 localhost kernel: [1655047.752750] Modules linked in: ufs qnx4 
hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 bnep rfcomm parport_pc ppdev 
bluetooth lp parport binfmt_misc dm_crypt coretemp kvm_intel kvm 
snd_hda_codec_hdmi snd_hda_codec_realtek microcode snd_hda_intel snd_hda_codec 
snd_hwdep snd_pcm snd_seq_midi snd_rawmidi serio_raw snd_seq_midi_event lpc_ich 
snd_seq snd_timer snd_seq_device snd soundcore snd_page_alloc asus_atk0110 
mac_hid btrfs zlib_deflate libcrc32c hid_generic i915 firewire_ohci 
drm_kms_helper firewire_core crc_itu_t r8169 drm i2c_algo_bit video 
pata_marvell usbhid hid usb_storage
Jan 21 16:35:40 localhost kernel: [1655047.752789] Pid: 7652, comm: 
btrfs-endio-wri Tainted: GW3.5.0-7-generic #7-Ubuntu
Jan 21 16:35:40 localhost kernel: [1655047.752791] Call Trace:
Jan 21 16:35:40 localhost kernel: [1655047.752800]  [] 
warn_slowpath_common+0x7f/0xc0
Jan 21 16:35:40 localhost kernel: [1655047.752803]  [] 
warn_slowpath_fmt+0x46/0x50
Jan 21 16:35:40 localhost kernel: [1655047.752814]  [] 
__btrfs_abort_transaction+0xad/0xc0 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752826]  [] 
btrfs_run_delayed_refs+0x3f4/0x440 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752840]  [] ? 
merge_state+0xd9/0x150 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752845]  [] ? 
_raw_spin_lock+0xe/0x20
Jan 21 16:35:40 localhost kernel: [1655047.752856]  [] ? 
block_rsv_release_bytes+0x131/0x190 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752869]  [] 
__btrfs_end_transaction+0x9f/0x350 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752882]  [] 
btrfs_end_transaction+0x15/0x20 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752895]  [] 
btrfs_finish_ordered_io+0x17d/0x410 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752907]  [] 
finish_ordered_fn+0x15/0x20 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752921]  [] 
worker_loop+0x15f/0x5a0 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752923]  [] ? 
__schedule+0x3cf/0x7c0
Jan 21 16:35:40 localhost kernel: [1655047.752937]  [] ? 
btrfs_queue_worker+0x330/0x330 [btrfs]
Jan 21 16:35:40 localhost kernel: [1655047.752941]  [] 
kthread+0x93/0xa0
Jan 21 16:35:40 localhost kernel: [1655047.752943]  [] 
kernel_thread_helper+0x4/0x10
Jan 21 16:35:40 localhost kernel: [1655047.752946]  [] ? 
flush_kthread_worker+0x80/0x80
Jan 21 16:35:40 localhost kernel: [1655047.752948]  [] ? 
gs_change+0x13/0x13
Jan 21 16:35:40 localhost kernel: [1655047.752950] ---[ end trace 
5ec1dfe1fd2d ]---
Jan 21 16:35:40 localhost kernel: [1655047.752952] BTRFS error (device sdb1) in 
btrfs_run_delayed_refs:2455: IO failure
Jan 21 16:35:40 localhost kernel: [1655047.75

Re: Oops when mounting btrfs partition

2013-02-02 Thread Arnd Bergmann
On Saturday 02 February 2013 10:20:35 Chris Mason wrote:
> Hi Arnd,
> 
> First things first, nospace_cache is a safe thing to use.  It is slow
> because it's finding free extents, but it's just a cache and always safe
> to discard.  With your other errors, I'd just mount it readonly
> and then you won't waste time on atime updates.

Ok, I see. Thanks for taking a look so quickly.

> I'll take a look at the BUG you got during log recovery.  We've fixed a
> few of those during the 3.8 rc cycle.

Well, it happened on 3.8-rc4 and on 3.5 here, so I'd guess it's a
different one.

> > Feb  1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at 
> > a01fdcf7 [verbose debug info unavailable]
> 
> > Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 
> > 15619835 off 454656 csum 2755731641 private 864823192
> > Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 
> > errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
> > ...
> > Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify 
> > failed on 17006399488 wanted 54700 found 54764
> 
> These aren't good.  With a few exceptions for really tight races in fsx
> use cases, csum errors are bad data from the disk.  The transid verify
> failed shows we wanted to find a metadata block from generation 54700
> but found 54764 instead:
> 
> 54700 = 0xD5AC
> 54764 = 0xD5EC
> 
> This same bad block comes up a few different times.

The machine has had problems with data consistency in the past, so
I'm not too surprised with getting a single-bit error, although this
is the first time in a year that I've seen problems, and I replaced
the faulty memory modules some time ago.

Anyway, I already ordered a replacement box a few weeks ago, and that
one will have ECC memory besides being a modern Opteron system to replace
the aging Core 2.

> > Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error 
> > corrected: ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288)
> 
> This shows we pulled from the second copy of this block and got the
> right answer, and then wrote the right answer to the duplicate.
> Inode 1 means it was metadata.
> 
> But for some reason still aborted the transaction.  It could have been
> an EIO on the correction, but the auto correction code in 3.5 did work
> well.
> 
> I think your plan to pull the data off and reformat is a good one.  I'd
> also look hard at your ram since drives don't usually send back single bit
> errors.

Ok. I'll wait before reformmatting though, in case you need to take
a look at the data later to find out why it crashed without fsck finding
a problem.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops when mounting btrfs partition

2013-02-04 Thread Arnd Bergmann
On Saturday 02 February 2013, Chris Mason wrote:

> > Feb  1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at 
> > a01fdcf7 [verbose debug info unavailable]
> 
> > Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 
> > 15619835 off 454656 csum 2755731641 private 864823192
> > Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 
> > errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
> > ...
> > Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify 
> > failed on 17006399488 wanted 54700 found 54764
> 
> These aren't good.  With a few exceptions for really tight races in fsx
> use cases, csum errors are bad data from the disk.  The transid verify
> failed shows we wanted to find a metadata block from generation 54700
> but found 54764 instead:
> 

I've done a full backup of all data now, without any further Ooops messages, but
I did get these:

[66155.429029] btrfs no csum found for inode 1212139 start 23707648
[66155.429035] btrfs no csum found for inode 1212139 start 23711744
[66155.429039] btrfs no csum found for inode 1212139 start 23715840
[66155.429042] btrfs no csum found for inode 1212139 start 23719936
[66155.452298] btrfs csum failed ino 1212139 off 23707648 csum 4112094897 
private 0
[66155.452310] btrfs csum failed ino 1212139 off 23711744 csum 3308812742 
private 0
[66155.452316] btrfs csum failed ino 1212139 off 23715840 csum 2566472073 
private 0
[66155.452322] btrfs csum failed ino 1212139 off 23719936 csum 2290008602 
private 0
[66159.876785] btrfs no csum found for inode 1212139 start 69992448
[66159.876792] btrfs no csum found for inode 1212139 start 69996544
[66159.876797] btrfs no csum found for inode 1212139 start 7640
[66159.876801] btrfs no csum found for inode 1212139 start 70004736
[66159.921506] btrfs csum failed ino 1212139 off 69992448 csum 2290360822 
private 0
[66159.921517] btrfs csum failed ino 1212139 off 69996544 csum 954182507 
private 0
[66159.921524] btrfs csum failed ino 1212139 off 7640 csum 2594579850 
private 0
[66159.921532] btrfs csum failed ino 1212139 off 70004736 csum 25334750 private 0
[66932.289905] btrfs csum failed ino 2461761 off 94208 csum 3824674580 private 
1950015541
[92042.101540] btrfs csum failed ino 687755 off 7048040448 csum 2502110259 
private 2186199747
[110952.542245] btrfs csum failed ino 5423479 off 475136 csum 490948044 private 
3797189576
[122692.216371] btrfs csum failed ino 7959218 off 2818048 csum 1904746846 
private 2392844122
[138205.726897] btrfs: sdb1 checksum verify failed on 20495056896 wanted 
8C9759CB found 9BFAB73B level 0

Inode 1212139 is the akonadi database that was used by kmail, so it constantly
got written to during the crashes. The file was completely corrupt. The
other inodes are mostly files that were backed up from the other machine
and have been on the drive I started using it, without ever being accessed.
I've probably had a few bit flips the entire time I was using the machine,
but never noticed before I started using a checksumming file system.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops when mounting btrfs partition

2013-02-11 Thread Arnd Bergmann
On Friday 08 February 2013, David Sterba wrote:
> On Mon, Feb 04, 2013 at 09:55:50PM +0000, Arnd Bergmann wrote:
> > On Saturday 02 February 2013, Chris Mason wrote:
> > 
> > I've done a full backup of all data now, without any further Ooops 
> > messages, but
> > I did get these:
> > 
> > [66155.429029] btrfs no csum found for inode 1212139 start 23707648
> 
> The missing csums were caused by a bug introcuded in 3.8-rc1 and fixed
> in rc5.
> 

Ok, thanks for the information.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 00/32] making inode time stamps y2038 ready

2014-05-30 Thread Arnd Bergmann
Based on the recent discussion about 64-bit time_t for new
architectures, and for solving the year 2038 problem in general,
I decided to try out what it would take to solve part of the
kernel side of things.

This is a proof-of-concept work to get us to the point where
two system calls (utimes and stat) provide a working interface
to user space to pass 64-bit inode time stamps in and out of
the kernel all the way to the file systems.

I picked this because it is a fairly isolated problem, as the
inode time stamps are rarely assigned to any other time values.
As a byproduct of this work, I documented for each of the file
systems we support how long the on-disk format can work[1].

Obviously we also need to convert all the other syscalls and
have a proper libc implementation using those for this to
be really useful, but it's a start and it can be tested
independently (I didn't so far, want to wait for initial
feedback).

All the interesting stuff is in the first five patches here,
the rest is the straightforward conversion of all file systems
that use 'timespec' values internally.

There are of course a number of open questions:

a) is this the right approach in general? The previous discussion
   pointed this way, but there may be other opinions.
b) what type should we use internally to represent inode time
   stamps? The code contains three different versions that would
   all work, we just have to pick a good tradeoff between
   efficiency and the range of times we want to cover.
c) Should we continue this way for all 32-bit platforms for
   consistency, including future ones, or should we go to
   different 64-bit types right away? My feeling is that the
   second approach would complicate this work.

Arnd

[1] http://kernelnewbies.org/y2038

Arnd Bergmann (32):
  fs: introduce new 'struct inode_time'
  uapi: add struct __kernel_timespec{32,64}
  fs: introduce sys_utimens64at
  fs: introduce sys_newfstat64/sys_newfstatat64
  arch: hook up new stat and utimes syscalls
  isofs: fix timestamps beyond 2027
  fs/nfs: convert to struct inode_time
  fs/ceph: convert to 'struct inode_time'
  fs/pstore: convert to struct inode_time
  fs/coda: convert to struct inode_time
  xfs: convert to struct inode_time
  btrfs: convert to struct inode_time
  ext3: convert to struct inode_time
  ext4: convert to struct inode_time
  cifs: convert to struct inode_time
  ntfs: convert to struct inode_time
  ubifs: convert to struct inode_time
  ocfs2: convert to struct inode_time
  fs/fat: convert to struct inode_time
  afs: convert to struct inode_time
  udf: convert to struct inode_time
  fs: convert simple fs to inode_time
  logfs: convert to struct inode_time
  hfs, hfsplus: convert to struct inode_time
  gfs2: convert to struct inode_time
  reiserfs: convert to struct inode_time
  jffs2: convert to struct inode_time
  adfs: convert to struct inode_time
  f2fs: convert to struct inode_time
  fuse: convert to struct inode_time
  scsi: fnic: use current_kernel_time() for timestamp
  fs: use new inode_time definition unconditionally

 arch/alpha/kernel/osf_sys.c|  2 +-
 arch/arm/include/asm/unistd.h  |  2 +-
 arch/arm/include/uapi/asm/stat.h   | 25 +
 arch/arm/include/uapi/asm/unistd.h |  3 +++
 arch/arm/kernel/calls.S|  3 +++
 arch/arm64/include/asm/unistd32.h  |  5 +++-
 arch/x86/include/uapi/asm/stat.h   | 28 +++
 arch/x86/syscalls/syscall_32.tbl   |  3 +++
 drivers/block/rbd.c|  2 +-
 drivers/firmware/efi/efi-pstore.c  | 28 +--
 drivers/scsi/fnic/fnic_trace.c |  2 +-
 drivers/tty/tty_io.c   |  2 +-
 drivers/usb/gadget/f_fs.c  |  2 +-
 fs/adfs/inode.c|  4 +--
 fs/afs/afs.h   |  6 ++---
 fs/afs/fsclient.c  |  2 +-
 fs/attr.c  |  8 +++---
 fs/btrfs/file.c|  6 ++---
 fs/btrfs/inode.c   |  4 +--
 fs/btrfs/ioctl.c   |  4 +--
 fs/btrfs/root-tree.c   |  2 +-
 fs/btrfs/transaction.c |  2 +-
 fs/ceph/cache.c|  2 +-
 fs/ceph/caps.c |  6 ++---
 fs/ceph/file.c |  4 +--
 fs/ceph/inode.c| 20 +++---
 fs/ceph/super.h|  8 +++---
 fs/cifs/cache.c|  6 ++---
 fs/cifs/cifsglob.h |  6 ++---
 fs/cifs/cifsproto.h|  6 ++---
 fs/cifs/cifssmb.c  |  5 ++--
 fs/cifs/inode.c|  2 +-
 fs/cifs/netmisc.c  | 15 ++-
 fs/coda/coda_linux.c   | 18 -
 fs/compat.c| 19 ++---
 fs/configfs/inode.c|  6 ++---
 fs/cramfs/inode.c  |  2 +-
 fs/ext3/inode.c|  4 +--
 fs/ext4/ext4.h | 10 +++
 fs/ext4/extents.c 

[RFC 12/32] btrfs: convert to struct inode_time

2014-05-30 Thread Arnd Bergmann
btrfs uses unsigned 64-bit seconds for inode timestamps, which will work
basically forever, but the VFS uses struct timespec for timestamps,
which is only good until 2038 on 32-bit CPUs.

This gets us one small step closer to lifting the VFS limit by using
struct inode_time in btrfs.

Signed-off-by: Arnd Bergmann 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/file.c| 6 +++---
 fs/btrfs/inode.c   | 4 ++--
 fs/btrfs/ioctl.c   | 4 ++--
 fs/btrfs/root-tree.c   | 2 +-
 fs/btrfs/transaction.c | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a58df83..3e16a4e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1693,16 +1693,16 @@ out:
 
 static void update_time_for_write(struct inode *inode)
 {
-   struct timespec now;
+   struct inode_time now;
 
if (IS_NOCMTIME(inode))
return;
 
now = current_fs_time(inode->i_sb);
-   if (!timespec_equal(&inode->i_mtime, &now))
+   if (!inode_time_equal(&inode->i_mtime, &now))
inode->i_mtime = now;
 
-   if (!timespec_equal(&inode->i_ctime, &now))
+   if (!inode_time_equal(&inode->i_ctime, &now))
inode->i_ctime = now;
 
if (IS_I_VERSION(inode))
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2ac3036..d825387 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5440,7 +5440,7 @@ static int btrfs_dirty_inode(struct inode *inode)
  * This is a copy of file_update_time.  We need this so we can return error on
  * ENOSPC for updating the inode in the case of file write and mmap writes.
  */
-static int btrfs_update_time(struct inode *inode, struct timespec *now,
+static int btrfs_update_time(struct inode *inode, struct inode_time *now,
 int flags)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -8223,7 +8223,7 @@ static int btrfs_rename(struct inode *old_dir, struct 
dentry *old_dentry,
struct btrfs_root *dest = BTRFS_I(new_dir)->root;
struct inode *new_inode = new_dentry->d_inode;
struct inode *old_inode = old_dentry->d_inode;
-   struct timespec ctime = CURRENT_TIME;
+   struct inode_time ctime = CURRENT_TIME;
u64 index = 0;
u64 root_objectid;
int ret;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a313ab0..2de5f86 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -435,7 +435,7 @@ static noinline int create_subvol(struct inode *dir,
struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_root *new_root;
struct btrfs_block_rsv block_rsv;
-   struct timespec cur_time = CURRENT_TIME;
+   struct inode_time cur_time = CURRENT_TIME;
struct inode *inode;
int ret;
int err;
@@ -4456,7 +4456,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file 
*file,
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_root_item *root_item = &root->root_item;
struct btrfs_trans_handle *trans;
-   struct timespec ct = CURRENT_TIME;
+   struct inode_time ct = CURRENT_TIME;
int ret = 0;
int received_uuid_changed;
 
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 38bb47e..344e89f 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -487,7 +487,7 @@ void btrfs_update_root_times(struct btrfs_trans_handle 
*trans,
 struct btrfs_root *root)
 {
struct btrfs_root_item *item = &root->root_item;
-   struct timespec ct = CURRENT_TIME;
+   struct inode_time ct = CURRENT_TIME;
 
spin_lock(&root->root_item_lock);
btrfs_set_root_ctransid(item, trans->transid);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 7579f6d..09dcc8a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1133,7 +1133,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
struct dentry *dentry;
struct extent_buffer *tmp;
struct extent_buffer *old;
-   struct timespec cur_time = CURRENT_TIME;
+   struct inode_time cur_time = CURRENT_TIME;
int ret = 0;
u64 to_reserve = 0;
u64 index = 0;
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-05-31 Thread Arnd Bergmann
On Saturday 31 May 2014 16:51:15 Richard Cochran wrote:
> On Fri, May 30, 2014 at 10:01:24PM +0200, Arnd Bergmann wrote:
> > 
> > I picked this because it is a fairly isolated problem, as the
> > inode time stamps are rarely assigned to any other time values.
> > As a byproduct of this work, I documented for each of the file
> > systems we support how long the on-disk format can work[1].
> 
> Why are some of the time stamp expiration dates marked as "never"?

It's an approximation:
with 64-bit timestamps, you can represent close to 300 billion
years, which is way past the time that our planet can sustain
life of any form[1].

Arnd

[1] http://en.wikipedia.org/wiki/Timeline_of_the_far_future
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> On Fri, 30 May 2014, Arnd Bergmann wrote:
> 
> > a) is this the right approach in general? The previous discussion
> >pointed this way, but there may be other opinions.
> 
> The syscall changes seem like the sort of thing I'd expect, although 
> patches adding new syscalls or otherwise affecting the kernel/userspace 
> interface (as opposed to those relating to an individual filesystem) 
> should go to linux-api as well as other relevant lists.

Ok. Sorry about missing linux-api, I confused it with linux-arch, which
may not be as relevant here, except for the one question whether we
actually want to have the new ABI on all 32-bit architectures or only
as an opt-in for those that expect to stay around for another 24 years.

Two more questions for you:

- are you (and others) happy with adding this type of stat syscall
  (fstatat64/fstat64) as opposed to the more generic xstat that has
  been discussed in the past and that never made it through the bike-
  shedding discussion?

- once we have enough buy-in from reviewers to merge this initial
  series, should we proceed to define rest of the syscall ABI
  (minus driver ioctls) so glibc and kernel can do the conversion
  on top of that, or should we better try to do things one syscall
  family at a time and actually get the kernel to handle them
  correctly internally?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 12:26:22 H. Peter Anvin wrote:
> On 06/02/2014 12:19 PM, Arnd Bergmann wrote:
> > On Monday 02 June 2014 13:52:19 Joseph S. Myers wrote:
> >> On Fri, 30 May 2014, Arnd Bergmann wrote:
> >>
> >>> a) is this the right approach in general? The previous discussion
> >>>pointed this way, but there may be other opinions.
> >>
> >> The syscall changes seem like the sort of thing I'd expect, although 
> >> patches adding new syscalls or otherwise affecting the kernel/userspace 
> >> interface (as opposed to those relating to an individual filesystem) 
> >> should go to linux-api as well as other relevant lists.
> > 
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> > 
> > Two more questions for you:
> > 
> > - are you (and others) happy with adding this type of stat syscall
> >   (fstatat64/fstat64) as opposed to the more generic xstat that has
> >   been discussed in the past and that never made it through the bike-
> >   shedding discussion?
> > 
> > - once we have enough buy-in from reviewers to merge this initial
> >   series, should we proceed to define rest of the syscall ABI
> >   (minus driver ioctls) so glibc and kernel can do the conversion
> >   on top of that, or should we better try to do things one syscall
> >   family at a time and actually get the kernel to handle them
> >   correctly internally?
> > 
> 
> The bit that is really going to hurt is every single ioctl that uses a
> timespec.
> 
> Honestly, though, I really don't understand the point with "struct
> inode_time".  It seems like the zeroeth-order thing is to change the
> kernel internal version of struct timespec to have a 64-bit time... it
> isn't just about inodes.  We then should be explicit about the external
> uses of time, and use accessors.

I picked these because they are fairly isolated from all other uses,
in particular since inode times are the only things where we really
care about times in the distant past or future (decades away as opposed
to things that happened between boot and shutdown).

For other kernel-internal uses, we may be better off migrating to
a completely different representation, such as nanoseconds since
boot or the architecture specific ktime_t, but this is really something
to decide for each subsystem.

I just tried building an arm32 kernel with a s64 time_t, and that
failed horribly, I get linker errors for missing 64-bit divides
and lots of warnings for code that expects time_t pointers to
functions taking a 'long' or vice versa. I also think the only
way to maintain ABI compatibility is to separate the internal uses
from the interface, which means auditing all code in the end.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-03 Thread Arnd Bergmann
On Saturday 31 May 2014 18:30:49 Vyacheslav Dubeyko wrote:
> By the way, what about NILFS2? Is NILFS2 ready for suggested approach
> without any changes?

nilfs2 and a lot of other file systems don't need any changes for
this, because they don't assign the inode time stamp fields to
a 'struct timespec'.

FWIW, nilfs2 uses a 64-bit seconds value, which is always safe and
can represent the full range of user space timespec on all machines.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-03 Thread Arnd Bergmann
On Tuesday 03 June 2014 14:33:10 Joseph S. Myers wrote:
> On Tue, 3 Jun 2014, Arnd Bergmann wrote:
> 
> > I think John Stultz and Thomas Gleixner have already started looking
> > at how the timekeeping code can be updated. Once that is done, we should
> > be able to add a functional 64-bit gettimeofday/settimeofday syscall
> > pair. While I definitely agree this is one of the most basic things to
> > have, it's also not an area of the kernel that is easy to change.
> 
> 64-bit clock_gettime / clock_settime instead of gettimeofday / 
> settimeofday should avoid the need for the kernel to have a 64-bit version 
> of struct timeval.  (Userspace 64-bit gettimeofday / settimeofday would 
> need to use a combination of the syscalls if the tz pointer is non-NULL.)

Yes, that's what I meant.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-03 Thread Arnd Bergmann
On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> >>
> >> The bit that is really going to hurt is every single ioctl that uses a
> >> timespec.
> >>
> >> Honestly, though, I really don't understand the point with "struct
> >> inode_time".  It seems like the zeroeth-order thing is to change the
> >> kernel internal version of struct timespec to have a 64-bit time... it
> >> isn't just about inodes.  We then should be explicit about the external
> >> uses of time, and use accessors.
> > 
> > I picked these because they are fairly isolated from all other uses,
> > in particular since inode times are the only things where we really
> > care about times in the distant past or future (decades away as opposed
> > to things that happened between boot and shutdown).
> > 
> 
> If nothing else, I would expect to be able to set the system time to
> weird values for testing.  So I'm not so sure I agree with that...

I think John Stultz and Thomas Gleixner have already started looking
at how the timekeeping code can be updated. Once that is done, we should
be able to add a functional 64-bit gettimeofday/settimeofday syscall
pair. While I definitely agree this is one of the most basic things to
have, it's also not an area of the kernel that is easy to change.

> > For other kernel-internal uses, we may be better off migrating to
> > a completely different representation, such as nanoseconds since
> > boot or the architecture specific ktime_t, but this is really something
> > to decide for each subsystem.
> 
> Having a bunch of different time representations in the kernel seems
> like a real headache...

We already have time_t, ktime_t, timeval, timespec, compat_timespec,
clock_t, cputime_t, cputime64_t, tm, nanoseconds, jiffies, jiffies64,
and lots of driver or file system specific representations. I'm all for
removing a bunch of these from the kernel, but my feeling is that this is
one of the cases where we first have to add new ones in order to remove
those that are already there.
To complicate things further, we also have various times bases
(realtime/utc, realtime/tai, monotonic, monotonic_raw, boottime, ...),
and at least for the timespec values we pass around, it's not always
obvious which one is used, of if that's the right one.

We probably don't want to add a lot of new representations, and it's
possible that we can change most of the internal code we have to
ktime_t and then convert that to whatever user space wants at the
interfaces.

The possible uses I can see for non-ktime_t types in the kernel are:
* inodes need 96 bit timestamps to represent the full range of values
  that can be stored in a file system, you made a convincing argument
  for that. Almost everything else can fit into 64 bit on a 32-bit
  kernel, in theory also on a 64-bit kernel if we want that.
* A number of interfaces pass relative timespecs: nanosleep(), poll(),
  select(), sigtimedwait(), alarm(), futex() and probably more. There is
  nothing wrong with the use of timespec here, and it may be good to
  annotate that by using a new type (e.g. struct timeout) that is defined
  as compatible with the current timespec.
* For new user interfaces, we need a new type such as the
  __kernel_timespec64 I introduced, so it doesn't clash with the normal
  user timespec that may be smaller, depending on the libc.
* A lot of drivers will need new ioctl commands, and for drivers that
  just need time stamps (audio, v4l, sockets, ...) it may be more
  efficient and more correct to use a new timestamp_t (e.g. boot time
  64-bit nanoseconds) than __kernel_timespec64, which is not normally
  monotonic and requires a normalization step. If we end up introducing
  such a type in the user interface, we can also start using it in the
  kernel.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-04 Thread Arnd Bergmann
On Tuesday 03 June 2014, Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 04:22:19PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 14:57:26 H. Peter Anvin wrote:
> > > On 06/02/2014 12:55 PM, Arnd Bergmann wrote:
> > The possible uses I can see for non-ktime_t types in the kernel are:
> > * inodes need 96 bit timestamps to represent the full range of values
> >   that can be stored in a file system, you made a convincing argument
> >   for that. Almost everything else can fit into 64 bit on a 32-bit
> >   kernel, in theory also on a 64-bit kernel if we want that.
> 
> Just ot be pedantic, inodes don't need 96 bit timestamps - some
> filesystems can *support up to* 96 bit timestamps. If the kernel
> only supports 64 bit timestamps and that's all the kernel can
> represent, then the upper bits of the 96 bit on-disk inode
> timestamps simply remain zero.

I meant the reverse: since we have file systems that can store
96-bit timestamps when using 64-bit kernels, we need to extend
32-bit kernels to have the same internal representation so we
can actually read those file systems correctly.

> If you move the filesystem between kernels with different time
> ranges, then the filesystem needs to be able to tell the kernel what
> it's supported range is.  This is where having the VFS limit the
> range of supported timestamps is important: the limit is the
> min(kernel range, filesystem range). This allows the filesystems
> to be indepenent of the kernel time representation, and the kernel
> to be independent of the physical filesystem time encoding

I agree it makes sense to let the kernel know about the limits
of the file system it accesses, but for the reverse, we're probably
better off just making the kernel representation large enough (i.e.
96 bits) so it can work with any known file system. We need another
check at the user space boundary to turn that into a value that the
user can understand, but that's another problem.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-04 Thread Arnd Bergmann
On Monday 02 June 2014, Joseph S. Myers wrote:
> On Mon, 2 Jun 2014, Arnd Bergmann wrote:
> 
> > Ok. Sorry about missing linux-api, I confused it with linux-arch, which
> > may not be as relevant here, except for the one question whether we
> > actually want to have the new ABI on all 32-bit architectures or only
> > as an opt-in for those that expect to stay around for another 24 years.
> 
> For glibc I think it will make the most sense to add the support for 
> 64-bit time_t across all architectures that currently have 32-bit time_t 
> (with the new interfaces having fallback support to implementation in 
> terms of the 32-bit kernel interfaces, if the 64-bit syscalls are 
> unavailable either at runtime or in the kernel headers against which glibc 
> is compiled - this fallback code will of course need to check for overflow 
> when passing a time value to the kernel, hopefully with error handling 
> consistent with whatever the kernel ends up doing when a filesystem can't 
> support a timestamp).  If some architectures don't provide the new 
> interfaces in the kernel then that will mean the fallback code in glibc 
> can't be removed until glibc support for those architectures is removed 
> (as opposed to removing it when glibc no longer supports kernels predating 
> the kernel support).

Ok, that's a good reason to just provide the new interfaces on all
architectures right away. Thanks for the insight!

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-04 Thread Arnd Bergmann
On Wednesday 04 June 2014 13:30:32 Nicolas Pitre wrote:
> On Wed, 4 Jun 2014, Arnd Bergmann wrote:
> 
> > On Tuesday 03 June 2014, Dave Chinner wrote:
> > > Just ot be pedantic, inodes don't need 96 bit timestamps - some
> > > filesystems can *support up to* 96 bit timestamps. If the kernel
> > > only supports 64 bit timestamps and that's all the kernel can
> > > represent, then the upper bits of the 96 bit on-disk inode
> > > timestamps simply remain zero.
> > 
> > I meant the reverse: since we have file systems that can store
> > 96-bit timestamps when using 64-bit kernels, we need to extend
> > 32-bit kernels to have the same internal representation so we
> > can actually read those file systems correctly.
> > 
> > > If you move the filesystem between kernels with different time
> > > ranges, then the filesystem needs to be able to tell the kernel what
> > > it's supported range is.  This is where having the VFS limit the
> > > range of supported timestamps is important: the limit is the
> > > min(kernel range, filesystem range). This allows the filesystems
> > > to be indepenent of the kernel time representation, and the kernel
> > > to be independent of the physical filesystem time encoding
> > 
> > I agree it makes sense to let the kernel know about the limits
> > of the file system it accesses, but for the reverse, we're probably
> > better off just making the kernel representation large enough (i.e.
> > 96 bits) so it can work with any known file system.
> 
> Depends...  96 bit handling may get prohibitive on 32-bit archs.
> 
> The important point here is for the kernel to be able to represent the 
> time _range_ used by any known filesystem, not necessarily the time 
> _precision_.
> 
> For example, a 64 bit representation can be made of 40 bits for seconds 
> spanning 34865 years, and 24 bits for fractional seconds providing 
> precision down to 60 nanosecs.  That ought to be plenty good on 32 bit 
> systems while still being cheap to handle.

I have checked earlier that we don't do any computation on inode
time stamps in common code, we just pass them around, so there is
very little runtime overhead. There is a small bit of space overhead
(12 byte) per inode, but that structure is already on the order of
500 bytes.

For other timekeeping stuff in the kernel, I agree that using some
64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
...) has advantages, that's exactly the point I was making earlier
against simply extending the internal time_t/timespec to 64-bit
seconds for everything.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 00/32] making inode time stamps y2038 ready

2014-06-10 Thread Arnd Bergmann
On Wednesday 04 June 2014 17:10:24 H. Peter Anvin wrote:
> On 06/04/2014 12:24 PM, Arnd Bergmann wrote:
> > 
> > For other timekeeping stuff in the kernel, I agree that using some
> > 64-bit representation (nanoseconds, 32/32 unsigned seconds/nanoseconds,
> > ...) has advantages, that's exactly the point I was making earlier
> > against simply extending the internal time_t/timespec to 64-bit
> > seconds for everything.
> > 
> 
> How much of a performance issue is it to make time_t 64 bits, and for
> the bits there are, how hard are they to fix?

Probably very little overhead for most uses, it's more the regression
potential in the less common parts of the kernel I'm worried about.

There is a significant but not overwhelming number of uses of the
main problematic types in the kernel:

arnd@wuerfel:~/arm-soc$ git grep -wl time_t | wc
188 1885566
arnd@wuerfel:~/arm-soc$ git grep -wl timeval | wc
320 320   10353
arnd@wuerfel:~/arm-soc$ git grep -wl timespec | wc
406 406   10886

I believe we have to audit all of them anyway if we want to change
the kernel to less problematic types and introduce new user
interfaces.

IMHO this work is helped if we change the uses to a new type
as we find the problems. This lets us do the work one subsystem
at a time and avoid accidental ABI changes. I don't care much what
type that will be, and having a 96-bit type will certainly work
well in a lot of cases, but I don't see a strong reason to use
that over other types, especially when they can be more efficient.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: avoid build warning on 32-bit

2015-01-13 Thread Arnd Bergmann
A recent change introduced a type cast from a private 64-bit
value to a pointer, which works fine on 64-bit architectures,
but not on 32-bit ones, where it produces a harmless compiler
warning:

fs/btrfs/extent_io.c: In function 'btrfs_free_io_failure_record':
fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]

This adds an intermediate cast to 'unsigned long', which tells
the compiler to ignore the type mismatch.

Signed-off-by: Arnd Bergmann 
Fixes: f612496bca664 ("Btrfs: cleanup the read failure record after write
or when the inode is freeing")

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ebabd237153..790dbae3343c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2190,7 +2190,7 @@ void btrfs_free_io_failure_record(struct inode *inode, 
u64 start, u64 end)
 
next = next_state(state);
 
-   failrec = (struct io_failure_record *)state->private;
+   failrec = (struct io_failure_record *)(unsigned 
long)state->private;
free_extent_state(state);
kfree(failrec);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix size_t format string

2015-03-09 Thread Arnd Bergmann
This resolves a harmless gcc warning in btrfs_check_super_valid that
results from a size_t value being printed as %lu:

fs/btrfs/disk-io.c:3927:21: warning: format '%lu' expects argument of type 
'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=]

On all Linux systems, size_t is the same length as unsigned long,
but the compiler does not know this, and warns about potentially
unportable code here. The correct printf string for size_t is %z.

Signed-off-by: Arnd Bergmann 
Fixes: ce7fca5f57ed0f "btrfs: add checks for sys_chunk_array sizes"
Cc: David Sterba 
Cc: Chris Mason 
---
This warning has been rather annoying because it shows up in every
'allmodconfig' build. I assume others have reported it before, but
please apply some fix for it, ideally before 4.0.

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f79f38542a73..639f2663ed3f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3921,7 +3921,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info 
*fs_info,
}
if (btrfs_super_sys_array_size(sb) < sizeof(struct btrfs_disk_key)
+ sizeof(struct btrfs_chunk)) {
-   printk(KERN_ERR "BTRFS: system chunk array too small %u < 
%lu\n",
+   printk(KERN_ERR "BTRFS: system chunk array too small %u < 
%zu\n",
btrfs_super_sys_array_size(sb),
sizeof(struct btrfs_disk_key)
+ sizeof(struct btrfs_chunk));

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs for mainline

2009-01-04 Thread Arnd Bergmann
On Saturday 03 January 2009, Chris Mason wrote:
> 
> > Actually a lot of the ioctl API don't just need documentation but
> > a complete redo.  That's true at least for the physical device
> > management and subvolume / snaphot ones.
> > 
> 
> The ioctl interface is definitely not finalized.  Adding more vs
> replacing the existing ones is an open question.

As long as that's an open question, the ioctl interface shouldn't get
merged into the kernel, or should get in as btrfsdev, otherwise you
get stuck with the current ABI forever.

Is it possible to separate out the nonstandard ioctls into a patch
that can get merged when the interface is final, or will that make
btrfs unusable?

Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ENOSPC on mostly empty file system

2014-09-09 Thread Arnd Bergmann
Hi Chris,

As I mentioned at the kernel summit, I have a file system that I use mostly
for storing my one kernel git tree and occasionally some build trees
(those are normally on a tmpfs), and I have again run into the problem
where the file system is only partially full (I think 18% in this case)
but I am unable to create new files.

As you recommended, I have created an image using btrfs-image and
uploaded it to google drive for inspection.

I also played around with it some more. After removing a few small
files, I could create new files with up to 20-60MB again before hitting
ENOSPC. I then did a 'make clean' in all the object directories I had
and after that could fill the file system up to 100% by creating a
62GB file using the same command (dd if=/dev/zero of=x).

While probably unrelated, I also saw a few older 'hung task' messages
with btrfs call chains in dmesg and I'm adding them here in this mail
as well. The kernel messages are ten days old, but I did not reboot
in the meantime and I never saw them again before or after that.
The kernel is the stock Ubuntu 3.16.0 image from Ubuntu, I upgrade
to that after running an old 3.11 kernel that I never rebooted.

The 82 MB image file is at
https://drive.google.com/file/d/0B_XQwQ5KlfJAWDJfS1E0TG1CLTA/edit?usp=sharing

Arnd

[407459.475639] INFO: task BrowserBlocking:3282 blocked for more than 120 
seconds.
[407459.475649]   Not tainted 3.16.0-10-generic #15-Ubuntu
[407459.475652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[407459.475656] BrowserBlocking D 88042fc94800 0  3282   2580 0x
[407459.475664]  88040d9f7db0 0086 88041215b2a0 
00014800
[407459.475671]  88040d9f7fd8 00014800 88041215b2a0 
8800bbb12af8
[407459.475677]  8800bbb12afc 88041215b2a0  
8800bbb12b00
[407459.475683] Call Trace:
[407459.475699]  [] schedule_preempt_disabled+0x29/0x70
[407459.475707]  [] __mutex_lock_slowpath+0xd5/0x1f0
[407459.475714]  [] mutex_lock+0x1f/0x30
[407459.475771]  [] btrfs_log_inode_parent+0xd5/0x550 [btrfs]
[407459.475804]  [] btrfs_log_dentry_safe+0x42/0x60 [btrfs]
[407459.475834]  [] btrfs_sync_file+0x184/0x300 [btrfs]
[407459.475844]  [] do_fsync+0x51/0x80
[407459.475849]  [] SyS_fsync+0x10/0x20
[407459.475856]  [] system_call_fastpath+0x1a/0x1f
[407459.475861] INFO: task BrowserBlocking:3283 blocked for more than 120 
seconds.
[407459.475864]   Not tainted 3.16.0-10-generic #15-Ubuntu
[407459.475866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[407459.475869] BrowserBlocking D 88042fd54800 0  3283   2580 0x
[407459.475875]  88040bb13db0 0086 88041215ef60 
00014800
[407459.475881]  88040bb13fd8 00014800 88041215ef60 
880429b43590
[407459.475886]  880429b43594 88041215ef60  
880429b43598
[407459.475892] Call Trace:
[407459.475899]  [] schedule_preempt_disabled+0x29/0x70
[407459.475906]  [] __mutex_lock_slowpath+0xd5/0x1f0
[407459.475912]  [] mutex_lock+0x1f/0x30
[407459.475943]  [] btrfs_log_inode_parent+0x435/0x550 [btrfs]
[407459.475976]  [] btrfs_log_dentry_safe+0x42/0x60 [btrfs]
[407459.476005]  [] btrfs_sync_file+0x184/0x300 [btrfs]
[407459.476012]  [] do_fsync+0x51/0x80
[407459.476018]  [] SyS_fdatasync+0x13/0x20
[407459.476023]  [] system_call_fastpath+0x1a/0x1f
[407459.476099] INFO: task kworker/u65:6:2090685 blocked for more than 120 
seconds.
[407459.476102]   Not tainted 3.16.0-10-generic #15-Ubuntu
[407459.476104] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[407459.476107] kworker/u65:6   D 88042fd14800 0 2090685  2 
0x
[407459.476131] Workqueue: events_unbound btrfs_async_reclaim_metadata_space 
[btrfs]
[407459.476134]  8806189ffba0 0046 880428135100 
00014800
[407459.476140]  8806189fffd8 00014800 880428135100 
8806189ffcc8
[407459.476145]  8806189ffcc0 7fff 880428135100 
0001
[407459.476151] Call Trace:
[407459.476158]  [] schedule+0x29/0x70
[407459.476164]  [] schedule_timeout+0x229/0x2a0
[407459.476171]  [] ? __queue_work+0x136/0x340
[407459.476177]  [] ? __queue_delayed_work+0xaa/0x1c0
[407459.476184]  [] wait_for_completion+0xb4/0x1e0
[407459.476191]  [] ? wake_up_state+0x20/0x20
[407459.476198]  [] writeback_inodes_sb_nr+0x81/0xb0
[407459.476221]  [] flush_space+0x458/0x4f0 [btrfs]
[407459.476243]  [] ? can_overcommit+0x67/0x100 [btrfs]
[407459.476266]  [] 
btrfs_async_reclaim_metadata_space+0x16c/0x1e0 [btrfs]
[407459.476273]  [] process_one_work+0x182/0x4e0
[407459.476278]  [] worker_thread+0x6b/0x6a0
[407459.476284]  [] ? process_one_work+0x4e0/0x4e0
[407459.476290]  [] kthread+0xdb/0x100
[407459.476297]  [] ? kthread_create_on_node+0x1c0/0x1c0
[407459.476302]  [] ret_from_fork+0x7c/0xb0
[407459.476308]  [] ? kthread_create_on_node+0x1c0/0x1c0
[

Re: ENOSPC on mostly empty file system

2014-09-09 Thread Arnd Bergmann
On Tuesday 09 September 2014 16:29:05 Arnd Bergmann wrote:
> 
> I also played around with it some more. After removing a few small
> files, I could create new files with up to 20-60MB again before hitting
> ENOSPC. I then did a 'make clean' in all the object directories I had
> and after that could fill the file system up to 100% by creating a
> 62GB file using the same command (dd if=/dev/zero of=x).

Ok, one more data point: I was accidentally building randconfig kernels
in the same partition all the time, meaning I had lots of traffic with
small files using the same names all over all the time. After I got
the file system back to a usable state today, building some more randconfig
kernels broke it in the same way.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ENOSPC on mostly empty file system

2014-09-09 Thread Arnd Bergmann
On Tuesday 09 September 2014 21:49:12 Clemens Eisserer wrote:
> Hi Arnd,
> 
> > Ok, one more data point:
> 
> Why don't you provide the data point you were specifically asked for,
> "btrfs fi df" ;)

I've cleaned it up again already. At the moment, it's working fine,
with this data:

Data: total=65.11GB, used=7.41GB
System, DUP: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=799.54MB
Metadata: total=8.00MB, used=0.00
: total=184.00MB, used=0.00

I also realized that the file I uploaded was the uncompressed 800MB
version, not the compressed 82MB file, which is now at
https://drive.google.com/file/d/0B_XQwQ5KlfJAYzY5c1lJRFVQNEE/edit?usp=sharing
Sorry about that.

Ok, now I'm in the bad state again (after running a 'make allmodconfig'
kernel build:

Label: none  uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47
Total devices 1 FS bytes used 8.79GB
devid1 size 67.14GB used 67.14GB path /dev/sdc6

Data: total=65.11GB, used=7.99GB
System, DUP: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=821.48MB
Metadata: total=8.00MB, used=0.00
: total=200.00MB, used=0.00

Not much extra space used up. Also, a bit from my last shell session
after getting into this state:

arnd@wuerfel:~/arm-soc$ make -skj40
../arch/arm/mach-mmp/devices.c:81:21: warning: 'u2o_get' defined but not used 
[-Wunused-function]
 static unsigned int u2o_get(void __iomem *base, unsigned int offset)
 ^
../arch/arm/mach-mmp/devices.c:86:13: warning: 'u2o_set' defined but not used 
[-Wunused-function]
 static void u2o_set(void __iomem *base, unsigned int offset,
 ^
../arch/arm/mach-mmp/devices.c:97:13: warning: 'u2o_clear' defined but not used 
[-Wunused-function]
 static void u2o_clear(void __iomem *base, unsigned int offset,
 ^
../arch/arm/mach-mmp/devices.c:108:13: warning: 'u2o_write' defined but not 
used [-Wunused-function]
 static void u2o_write(void __iomem *base, unsigned int offset,
 ^
mv: cannot move 'security/selinux/ss/.tmp_conditional.o' to 
'security/selinux/ss/conditional.o': No space left on device
make[4]: *** [security/selinux/ss/conditional.o] Error 1
mv: cannot move 'kernel/power/.snapshot.o.tmp' to 
'kernel/power/.snapshot.o.cmd': No space left on device
make[4]: *** [kernel/power/snapshot.o] Error 1
mv: cannot move 'drivers/ata/.pata_hpt366.o.tmp' to 
'drivers/ata/.pata_hpt366.o.cmd': No space left on device
make[4]: *** [drivers/ata/pata_hpt366.o] Error 1
mv: cannot move 'kernel/irq/.proc.o.tmp' to 'kernel/irq/.proc.o.cmd': No space 
left on device
make[4]: *** [kernel/irq/proc.o] Error 1
...
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: writing to 'x': No space left on device
416642+0 records in
416641+0 records out
213320192 bytes (213 MB) copied, 2.97814 s, 71.6 MB/s
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: opening 'x': No space left on device
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: opening 'x': No space left on device
arnd@wuerfel:~/arm-soc$ rm x
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: writing to 'x': No space left on device
366053+0 records in
366052+0 records out
187418624 bytes (187 MB) copied, 3.04426 s, 61.6 MB/s
arnd@wuerfel:~/arm-soc$ rm x
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: writing to 'x': No space left on device
363482+0 records in
363481+0 records out
186102272 bytes (186 MB) copied, 2.80644 s, 66.3 MB/s
arnd@wuerfel:~/arm-soc$ rm x
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: writing to 'x': No space left on device
366677+0 records in
366676+0 records out
187738112 bytes (188 MB) copied, 2.86939 s, 65.4 MB/s
arnd@wuerfel:~/arm-soc$ sync
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: opening 'x': No space left on device
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: opening 'x': No space left on device
arnd@wuerfel:~/arm-soc$ dd if=/dev/zero of=x
dd: opening 'x': No space left on device

I believe that there were no other processes writing to this partition
at the time.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ENOSPC on mostly empty file system

2014-09-09 Thread Arnd Bergmann
On Tuesday 09 September 2014 22:57:25 Hugo Mills wrote:
> On Tue, Sep 09, 2014 at 11:49:10PM +0200, Arnd Bergmann wrote:
> > Ok, now I'm in the bad state again (after running a 'make allmodconfig'
> > kernel build:
> > 
> > Label: none  uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47
> > Total devices 1 FS bytes used 8.79GB
> > devid1 size 67.14GB used 67.14GB path /dev/sdc6
> 
>All the space on the FS has been allocated to some purpose or other.
> 
> > Data: total=65.11GB, used=7.99GB
> 
>Here, you have 65 GiB allocated to data, but only 8 GiB of that
> used. The FS won't automatically free up any of that (yet -- it's one
> of the project ideas).

Ok, I see.
 
>So... you need to free up some data chunks. You can do this with:
> 
> # btrfs balance start -dusage=5 /mountpoint
> 
>Take a look at the output of btrfs fi df and btrfs fi show
> afterwards, and see how much the Data allocation has reduced by, and
> how much unallocated space you have left afterwards. You may want to
> increase the number in the above balance command to some higher value,
> to free up even more chunks (it limits the balance to chunks less than
> n% full -- so the command above will only touch chunks with 5% actual
> data or less). This is in the FAQ.

This is the output afterwards:
$ time sudo btrfs fi balance start  -dusage=5 /git/
Done, had to relocate 44 out of 70 chunks

real0m22.026s
user0m0.014s
sys 0m8.302s

$ sudo btrfs fi show /dev/sdc6
Label: none  uuid: 1d88cccb-3d0e-42d9-8252-a226dc5c2e47
Total devices 1 FS bytes used 8.79GB
devid1 size 67.14GB used 23.14GB path /dev/sdc6

$ sudo btrfs fi df /git
Data: total=21.01GB, used=7.99GB
System, DUP: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.05GB, used=821.32MB
Metadata: total=8.00MB, used=0.00
: total=200.00MB, used=0.00

I'm pretty sure I read the FAQ one of the last times this happened,
but I thought it would only apply to mostly full file systems.
I can usually recover by moving all the data to tmpfs and doing
a new mkfs before moving the data back, and that normally lasts
for a few months before I run out of space again.

I'll try to see how long it takes until the problem comes back
now.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html