Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Jeff Layton
On Tue, 2023-09-19 at 13:10 -0700, Paul Eggert wrote:
> On 2023-09-19 09:31, Jeff Layton wrote:
> > The typical case for make
> > timestamp comparisons is comparing source files vs. a build target. If
> > those are being written nearly simultaneously, then that could be an
> > issue, but is that a typical behavior?
> 
> I vaguely remember running into problems with 'make' a while ago 
> (perhaps with a BSDish system) when filesystem timestamps were 
> arbitrarily truncated in some cases but not others. These files would 
> look older than they really were, so 'make' would think they were 
> up-to-date when they weren't, and 'make' would omit actions that it 
> should have done, thus screwing up the build.
> 
> File timestamps can be close together with 'make -j' on fast hosts. 
> Sometimes a shell script (or 'make' itself) will run 'make', then modify 
> a file F, then immediately run 'make' again; the latter 'make' won't 
> work if F's timestamp is mistakenly older than targets that depend on it.
> 
> Although 'make'-like apps are the biggest canaries in this coal mine, 
> the issue also affects 'find -newer' (as Bruno mentioned), 'rsync -u', 
> 'mv -u', 'tar -u', Emacs file-newer-than-file-p, and surely many other 
> places. For example, any app that creates a timestamp file, then backs 
> up all files newer than that file, would be at risk.
> 
> 
> > I wonder if it would be feasible to just advance the coarse-grained
> > current_time whenever we end up updating a ctime with a fine-grained
> > timestamp?
> 
> Wouldn't this need to be done globally, that is, not just on a per-file 
> or per-filesystem basis? If so, I don't see how we'd avoid locking 
> performance issues.
> 

Maybe. Another idea might be to introduce a new timekeeper for
multigrain filesystems, but all of those would likely have to share the
same coarse-grained clock source.

So yeah, if you stat an inode and then update it, any inode written on a
multigrain filesystem within the same jiffy-sized window would have to
log an extra transaction to write out the inode. That's what I meant
when I was talking about write amplification.

> 
> PS. Although I'm no expert in the Linux inode code I hope you don't mind 
> my asking a question about this part of inode_set_ctime_current:
> 
>   /*
>* If we've recently updated with a fine-grained timestamp,
>* then the coarse-grained one may still be earlier than the
>* existing ctime. Just keep the existing value if so.
>*/
>   ctime.tv_sec = inode->__i_ctime.tv_sec;
>   if (timespec64_compare(, ) > 0)
>   return ctime;
> 
> Suppose root used clock_settime to set the clock backwards. Won't this 
> code incorrectly refuse to update the file's timestamp afterwards? That 
> is, shouldn't the last line be "goto fine_grained;" rather than "return 
> ctime;", with the comment changed from "keep the existing value" to "use 
> a fine-grained value"?

It is a problem, and Linus pointed that out yesterday, which is why I
sent this earlier today:

https://lore.kernel.org/linux-fsdevel/20230919-ctime-v1-1-97b3da92f...@kernel.org/T/#u

Bear in mind that we're not dealing with a situation where the value has
not been queried since its last update, so we don't need to use a fine
grained timestamp there (and really, it's preferable not to do so). A
coarse one should be fine in this case.
-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Jeff Layton
On Tue, 2023-09-19 at 16:52 +0200, Bruno Haible wrote:
> Jeff Layton wrote:
> > I'm not sure what we can do for this test. The nap() function is making
> > an assumption that the timestamp granularity will be constant, and that
> > isn't necessarily the case now.
> 
> This is only of secondary importance, because the scenario by Jan Kara
> shows a much more fundamental breakage:
> 
> > > The ultimate problem is that a sequence like:
> > > 
> > > write(f1)
> > > stat(f2)
> > > write(f2)
> > > stat(f2)
> > > write(f1)
> > > stat(f1)
> > > 
> > > can result in f1 timestamp to be (slightly) lower than the final f2
> > > timestamp because the second write to f1 didn't bother updating the
> > > timestamp. That can indeed be a bit confusing to programs if they compare
> > > timestamps between two files. Jeff?
> > > 
> > 
> > Basically yes.
> 
> f1 was last written to *after* f2 was last written to. If the timestamp of f1
> is then lower than the timestamp of f2, timestamps are fundamentally broken.
> 
> Many things in user-space depend on timestamps, such as build system
> centered around 'make', but also 'find ... -newer ...'.
> 


What does breakage with make look like in this situation? The "fuzz"
here is going to be on the order of a jiffy. The typical case for make
timestamp comparisons is comparing source files vs. a build target. If
those are being written nearly simultaneously, then that could be an
issue, but is that a typical behavior? It seems like it would be hard to
rely on that anyway, esp. given filesystems like NFS that can do lazy
writeback.

One of the operating principles with this series is that timestamps can
be of varying granularity between different files. Note that Linux
already violates this assumption when you're working across filesystems
of different types.

As to potential fixes if this is a real problem:

I don't really want to put this behind a mount or mkfs option (a'la
relatime, etc.), but that is one possibility.

I wonder if it would be feasible to just advance the coarse-grained
current_time whenever we end up updating a ctime with a fine-grained
timestamp? It might produce some inode write amplification. Files that
were written within the same jiffy could see more inode transactions
logged, but that still might not be _too_ awful.

I'll keep thinking about it for now.
-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Bruno Haible
Jeff Layton wrote:
> I'm not sure what we can do for this test. The nap() function is making
> an assumption that the timestamp granularity will be constant, and that
> isn't necessarily the case now.

This is only of secondary importance, because the scenario by Jan Kara
shows a much more fundamental breakage:

> > The ultimate problem is that a sequence like:
> > 
> > write(f1)
> > stat(f2)
> > write(f2)
> > stat(f2)
> > write(f1)
> > stat(f1)
> >
> > can result in f1 timestamp to be (slightly) lower than the final f2
> > timestamp because the second write to f1 didn't bother updating the
> > timestamp. That can indeed be a bit confusing to programs if they compare
> > timestamps between two files. Jeff?
> > 
> 
> Basically yes.

f1 was last written to *after* f2 was last written to. If the timestamp of f1
is then lower than the timestamp of f2, timestamps are fundamentally broken.

Many things in user-space depend on timestamps, such as build system
centered around 'make', but also 'find ... -newer ...'.

Bruno





[Cluster-devel] Reminder: New mailing list for gfs2 and dlm

2023-09-19 Thread Andrew Price

Hi all,

For those of you still reading cluster-devel, please be aware that gfs2 
and dlm development have now moved to g...@lists.linux.dev


Please subscribe to the new list to avoid missing new developments, by 
sending a message to:


  gfs2+subscr...@lists.linux.dev

The new list is hosted at kernel.org. There is more information about it 
here:


   https://subspace.kernel.org/lists.linux.dev.html

Thanks,
Andy



Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Jeff Layton
On Tue, 2023-09-19 at 13:04 +0200, Jan Kara wrote:
> On Tue 19-09-23 15:05:24, Xi Ruoyao wrote:
> > On Mon, 2023-08-07 at 15:38 -0400, Jeff Layton wrote:
> > > Enable multigrain timestamps, which should ensure that there is an
> > > apparent change to the timestamp whenever it has been written after
> > > being actively observed via getattr.
> > > 
> > > For ext4, we only need to enable the FS_MGTIME flag.
> > 
> > Hi Jeff,
> > 
> > This patch causes a gnulib test failure:
> > 
> > $ ~/sources/lfs/grep-3.11/gnulib-tests/test-stat-time
> > test-stat-time.c:141: assertion 'statinfo[0].st_mtime < 
> > statinfo[2].st_mtime || (statinfo[0].st_mtime == statinfo[2].st_mtime && 
> > (get_stat_mtime_ns ([0]) < get_stat_mtime_ns ([2])))' 
> > failed
> > Aborted (core dumped)
> > 
> > The source code of the test:
> > https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-stat-time.c
> > 
> > Is this an expected change?
> 
> Kind of yes. The test first tries to estimate filesystem timestamp
> granularity in nap() function - due to this patch, the detected granularity
> will likely be 1 ns so effectively all the test calls will happen
> immediately one after another. But we don't bother setting the timestamps
> with more than 1 jiffy (usually 4 ms) precision unless we think someone is
> watching. So as a result timestamps of all stamp1 and stamp2 files are
> going to be equal which makes the test fail.
> 

That was my take too. The multigrain ctime changes are probably causing
nap() to settle on too small a time delta.

> The ultimate problem is that a sequence like:
> 
> write(f1)
> stat(f2)
> write(f2)
> stat(f2)
> write(f1)
> stat(f1)
>
> can result in f1 timestamp to be (slightly) lower than the final f2
> timestamp because the second write to f1 didn't bother updating the
> timestamp. That can indeed be a bit confusing to programs if they compare
> timestamps between two files. Jeff?
> 

Basically yes. When there is no stat() call issued on the file in
between writes, the kernel will use coarse-grained timestamps when
updating it (since no one is watching).


I'm not sure what we can do for this test. The nap() function is making
an assumption that the timestamp granularity will be constant, and that
isn't necessarily the case now.

-- 
Jeff Layton 



Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Jan Kara
On Tue 19-09-23 15:05:24, Xi Ruoyao wrote:
> On Mon, 2023-08-07 at 15:38 -0400, Jeff Layton wrote:
> > Enable multigrain timestamps, which should ensure that there is an
> > apparent change to the timestamp whenever it has been written after
> > being actively observed via getattr.
> > 
> > For ext4, we only need to enable the FS_MGTIME flag.
> 
> Hi Jeff,
> 
> This patch causes a gnulib test failure:
> 
> $ ~/sources/lfs/grep-3.11/gnulib-tests/test-stat-time
> test-stat-time.c:141: assertion 'statinfo[0].st_mtime < statinfo[2].st_mtime 
> || (statinfo[0].st_mtime == statinfo[2].st_mtime && (get_stat_mtime_ns 
> ([0]) < get_stat_mtime_ns ([2])))' failed
> Aborted (core dumped)
> 
> The source code of the test:
> https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-stat-time.c
> 
> Is this an expected change?

Kind of yes. The test first tries to estimate filesystem timestamp
granularity in nap() function - due to this patch, the detected granularity
will likely be 1 ns so effectively all the test calls will happen
immediately one after another. But we don't bother setting the timestamps
with more than 1 jiffy (usually 4 ms) precision unless we think someone is
watching. So as a result timestamps of all stamp1 and stamp2 files are
going to be equal which makes the test fail.

The ultimate problem is that a sequence like:

write(f1)
stat(f2)
write(f2)
stat(f2)
write(f1)
stat(f1)

can result in f1 timestamp to be (slightly) lower than the final f2
timestamp because the second write to f1 didn't bother updating the
timestamp. That can indeed be a bit confusing to programs if they compare
timestamps between two files. Jeff?

Honza


> > Acked-by: Theodore Ts'o 
> > Reviewed-by: Jan Kara 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/ext4/super.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index b54c70e1a74e..cb1ff47af156 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -7279,7 +7279,7 @@ static struct file_system_type ext4_fs_type = {
> >     .init_fs_context= ext4_init_fs_context,
> >     .parameters = ext4_param_specs,
> >     .kill_sb= kill_block_super,
> > -   .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> > +   .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP |
> > FS_MGTIME,
> >  };
> >  MODULE_ALIAS_FS("ext4");
> >  
> > 
> 
-- 
Jan Kara 
SUSE Labs, CR



Re: [Cluster-devel] [PATCH v7 12/13] ext4: switch to multigrain timestamps

2023-09-19 Thread Xi Ruoyao
On Mon, 2023-08-07 at 15:38 -0400, Jeff Layton wrote:
> Enable multigrain timestamps, which should ensure that there is an
> apparent change to the timestamp whenever it has been written after
> being actively observed via getattr.
> 
> For ext4, we only need to enable the FS_MGTIME flag.

Hi Jeff,

This patch causes a gnulib test failure:

$ ~/sources/lfs/grep-3.11/gnulib-tests/test-stat-time
test-stat-time.c:141: assertion 'statinfo[0].st_mtime < statinfo[2].st_mtime || 
(statinfo[0].st_mtime == statinfo[2].st_mtime && (get_stat_mtime_ns 
([0]) < get_stat_mtime_ns ([2])))' failed
Aborted (core dumped)

The source code of the test:
https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-stat-time.c

Is this an expected change?

> Acked-by: Theodore Ts'o 
> Reviewed-by: Jan Kara 
> Signed-off-by: Jeff Layton 
> ---
>  fs/ext4/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b54c70e1a74e..cb1ff47af156 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -7279,7 +7279,7 @@ static struct file_system_type ext4_fs_type = {
>   .init_fs_context= ext4_init_fs_context,
>   .parameters = ext4_param_specs,
>   .kill_sb= kill_block_super,
> - .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> + .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP |
> FS_MGTIME,
>  };
>  MODULE_ALIAS_FS("ext4");
>  
>