Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-03 Thread Roger Willcocks

On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:

> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
> 
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
>   time_t i = get_seconds();
> 
>   if (tloc) {
>   if (put_user(i,tloc))
>   return -EFAULT;
>   }
>   force_successful_syscall_return();
>   return i;
> }

get_seconds() returns an unsigned long so there's potential for overflow
here.
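
To make the concern concrete, here is a minimal sketch of the narrowing. It assumes a 32-bit system, modelled with uint32_t for the unsigned seconds counter and int32_t for time_t; the helper names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/*
 * Illustrative stand-ins, not kernel types: uint32_t models a 32-bit
 * "unsigned long" seconds counter, int32_t models a 32-bit time_t.
 */
static int32_t seconds_to_time32(uint32_t secs)
{
	/*
	 * Values above INT32_MAX come out negative with the usual
	 * two's-complement conversion (implementation-defined in ISO C).
	 */
	return (int32_t)secs;
}

/* Returns 1 if the converted value is indistinguishable from (time_t)-1. */
static int looks_like_error(uint32_t secs)
{
	return seconds_to_time32(secs) == -1;
}
```

With these stand-ins, the second after 0x7fffffff converts to the most negative value, and a counter value of 0xffffffff converts to exactly the error return.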

--
Roger



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-03 Thread Arnd Bergmann
On Tuesday 03 June 2014 18:41:30 Dave Chinner wrote:
> On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote:
> > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > My patch set
> > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > > > more like 64-bit kernels regarding inode time stamps, which does
> > > > impact all the file systems that use a 64-bit time or the NFS
> > > > unsigned epoch (1970-2106), while your patch extends the file
> > > > system internal epoch (1901-2038 for XFS) so it can be used by
> > > > anything that knows how to handle larger than 32-bit second values
> > > > (either 64-bit kernel or 32-bit with inode_time patch).
> > > 
> > > Right, but the issue is that 64 bit second counters are broken right
> > > now because most filesystems can't support more than 32 bit values.
> > > So it doesn't matter whether it's 32 bit or 64 bit machines, just
> > > adding explicit support for >32 bit second counters without doing
> > > anything else just extends that brokenness into the indefinite
> > > future.
> > 
> > Of course, "most filesystems" are obsolete, and most of the modern
> > file systems already support >32 bit timestamps: ext4, btrfs, cifs,
> > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
> > except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> > 64-bit systems, which interprets time stamps with the high bit
> > set as years 2038-2106 rather than 1901-1969.
> 
> I'm not sure that's an entirely correct representation - the
> remainder of the 32 bit-only timestamp filesystems don't actively
> interpret the time stamp at all - it's just an opaque 32 bit value.
> Hence the interpretation of the value is dependent on whether the
> kernel treats it as signed or unsigned.

As I mentioned elsewhere in the thread, I don't think the way it's handled
is intentional, but it's definitely the file system code that does
the assignment to the timeval and decides on the interpretation, doing
either

inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode.mtime);

or

inode->i_mtime.tv_sec = le32_to_cpu(raw_inode.mtime);
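
The difference between the two assignments can be sketched in plain C. This assumes a 64-bit tv_sec and takes the raw value already in CPU byte order (what le32_to_cpu() would return); the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extending decode: the on-disk value covers 1901..2038. */
static int64_t decode_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Zero-extending decode: the on-disk value covers 1970..2106. */
static int64_t decode_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

The two agree for every value with the high bit clear and differ by exactly 2^32 seconds once it is set.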


Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-03 Thread Dave Chinner
On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote:
> On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > My patch set
> > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > > more like 64-bit kernels regarding inode time stamps, which does
> > > impact all the file systems that use a 64-bit time or the NFS
> > > unsigned epoch (1970-2106), while your patch extends the file
> > > system internal epoch (1901-2038 for XFS) so it can be used by
> > > anything that knows how to handle larger than 32-bit second values
> > > (either 64-bit kernel or 32-bit with inode_time patch).
> > 
> > Right, but the issue is that 64 bit second counters are broken right
> > now because most filesystems can't support more than 32 bit values.
> > So it doesn't matter whether it's 32 bit or 64 bit machines, just
> > adding explicit support for >32 bit second counters without doing
> > anything else just extends that brokenness into the indefinite
> > future.
> 
> Of course, "most filesystems" are obsolete, and most of the modern
> file systems already support >32 bit timestamps: ext4, btrfs, cifs,
> f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
> except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
> 64-bit systems, which interprets time stamps with the high bit
> set as years 2038-2106 rather than 1901-1969.

I'm not sure that's an entirely correct representation - the
remainder of the 32 bit-only timestamp filesystems don't actively
interpret the time stamp at all - it's just an opaque 32 bit value.
Hence the interpretation of the value is dependent on whether the
kernel treats it as signed or unsigned.

> > If we don't fix it now (i.e in the new user API and supporting
> > infrastructure), then we'll *never be able to fix it* and we'll be
> > stuck with timestamps that do really weird things when you pass
> > arbitrary future dates to the kernel.
> 
> We already have that. I agree it's fixable and we should fix it,
> but I don't see how this is different from what we had 20 years
> ago when Linux on Alpha first introduced a 64-bit time_t. It's
> been this way on every 64-bit Linux system since.

I see it differently: we've got 20 years more experience than when
the 64 bit time_t was introduced. That experience tells us that best
practices for API design are to range check every input to prevent
unintended side effects from occurring due to out-of-range data.
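
As a rough sketch of what such a range check could look like, assuming each filesystem publishes the bounds it can store. The structure and function names here are hypothetical and do not exist in the VFS.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since the epoch. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Return 0 if sec can be stored by the filesystem, -ERANGE otherwise. */
static int check_timestamp(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

A filesystem with signed 32-bit on-disk seconds would set the bounds to INT32_MIN and INT32_MAX, so an attempt to store a date past 2038 is rejected rather than silently wrapped.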

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-03 Thread Arnd Bergmann
On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote:
> On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> > On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > > all file systems at least support times until 2106, because they treat
> > > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > > a completely different representation. My guess is that somebody
> > > > > earlier spent a lot of work on making that happen.
> > > > > 
> > > > > The exceptions are:
> > > > > 
> > > > > * exofs uses signed values, which can probably be changed to be
> > > > >   consistent with the others.
> > > > > * isofs has a bug that limits it until 2027 on architectures with
> > > > >   a signed 'char' type (otherwise it's 2155).
> > > > > * udf can represent times for many thousands of years through a
> > > > >   16-bit year representation, but the code to convert to epoch
> > > > >   uses a const array that ends at 2038.
> > > > > * afs uses signed seconds and can probably be fixed
> > > > > * coda relies on user space time representation getting passed
> > > > >   through an ioctl.
> > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > > >   where they really use signed.
> > > > > 
> > > > > > I was confused about XFS since I didn't notice that there are
> > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > > 
> > > > You've missed an awful lot more than just the implications for the
> > > > core kernel code.
> > > > 
> > > > There's a good chance such changes propagate to APIs elsewhere in
> > > > the filesystems, because something you haven't realised is that XFS
> > > > effectively exposes the on-disk timestamp format directly to
> > > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > > by the online defragmenter.
> > 
> > I really didn't look at them at all, as ioctl is very late on my
> > mental list of things to change. I do realize that a lot of drivers
> > and file systems do have ioctls that pass time values and we need to
> > address them one by one.
> > 
> > I just looked at the ioctls you mentioned but don't see how open-by-handle
> > is affected by this. Can you point me to what you mean?
> 
> Sorry, I misremembered how some of the XFS open-by-handle code works
> in userspace (XFS has a pretty rich open-by-handle ioctl() interface
> that predates the kernel syscalls by at least 10 years).  Basically
> there is code in userspace that uses the information returned from
> bulkstat to construct file handles to pass to the open-by-handle
> ioctls. xfs_fsr then uses the combination of open-by-handle from the
> bulkstat output and the bulkstat output to feed into the swap extent
> ioctls.
> 
> i.e. the filesystem's idea of what time is is passed to userspace as
> an opaque cookie in this case, but it is not used directly by the
> open-by-handle interfaces like I implied it was.

Ok, I see.

> > My patch set
> > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> > more like 64-bit kernels regarding inode time stamps, which does
> > impact all the file systems that use a 64-bit time or the NFS
> > unsigned epoch (1970-2106), while your patch extends the file
> > system internal epoch (1901-2038 for XFS) so it can be used by
> > anything that knows how to handle larger than 32-bit second values
> > (either 64-bit kernel or 32-bit with inode_time patch).
> 
> Right, but the issue is that 64 bit second counters are broken right
> now because most filesystems can't support more than 32 bit values.
> So it doesn't matter whether it's 32 bit or 64 bit machines, just
> adding explicit support for >32 bit second counters without doing
> anything else just extends that brokenness into the indefinite
> future.

Of course, "most filesystems" are obsolete, and most of the modern
file systems already support >32 bit timestamps: ext4, btrfs, cifs,
f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else
except xfs, ext2/3 and exofs uses the nfsv3 interpretation on
64-bit systems, which interprets time stamps with the high bit
set as years 2038-2106 rather than 1901-1969.

> If we don't fix it now (i.e in the new user API and supporting
> infrastructure), then we'll *never be able to fix it* and we'll be
> stuck with timestamps that do really weird things when you pass
> arbitrary future dates to the kernel.

We already have that. I agree it's fixable and we should fix it,
but I don't see how this is different from what we had 20 years
ago when Linux on Alpha first introduced a 64-bit time_t. It's
been this way on every 64-bit Linux system since.

> > This is how ext4 does it (I mean
> > the sizeof() trick, not the bit stuffing they do):
>
> > I guess if there is general agreement on introducing 'struct inode_time',
> > we can skip that intermediate step.


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Dave Chinner
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > > all file systems at least support times until 2106, because they treat
> > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > a completely different representation. My guess is that somebody
> > > > earlier spent a lot of work on making that happen.
> > > > 
> > > > The exceptions are:
> > > > 
> > > > * exofs uses signed values, which can probably be changed to be
> > > >   consistent with the others.
> > > > * isofs has a bug that limits it until 2027 on architectures with
> > > >   a signed 'char' type (otherwise it's 2155).
> > > > * udf can represent times for many thousands of years through a
> > > >   16-bit year representation, but the code to convert to epoch
> > > >   uses a const array that ends at 2038.
> > > > * afs uses signed seconds and can probably be fixed
> > > > * coda relies on user space time representation getting passed
> > > >   through an ioctl.
> > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > >   where they really use signed.
> > > > 
> > > > > I was confused about XFS since I didn't notice that there are
> > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > > 
> > > You've missed an awful lot more than just the implications for the
> > > core kernel code.
> > > 
> > > There's a good chance such changes propagate to APIs elsewhere in
> > > the filesystems, because something you haven't realised is that XFS
> > > effectively exposes the on-disk timestamp format directly to
> > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > by the online defragmenter.
> 
> I really didn't look at them at all, as ioctl is very late on my
> mental list of things to change. I do realize that a lot of drivers
> and file systems do have ioctls that pass time values and we need to
> address them one by one.
> 
> I just looked at the ioctls you mentioned but don't see how open-by-handle
> is affected by this. Can you point me to what you mean?

Sorry, I misremembered how some of the XFS open-by-handle code works
in userspace (XFS has a pretty rich open-by-handle ioctl() interface
that predates the kernel syscalls by at least 10 years).  Basically
there is code in userspace that uses the information returned from
bulkstat to construct file handles to pass to the open-by-handle
ioctls. xfs_fsr then uses the combination of open-by-handle from the
bulkstat output and the bulkstat output to feed into the swap extent
ioctls.

i.e. the filesystem's idea of what time is is passed to userspace as
an opaque cookie in this case, but it is not used directly by the
open-by-handle interfaces like I implied it was.

> > Just to put that in context, here's the kernel patch to add extended
> > epoch support to XFS. It's completely untested as I haven't done any
> > userspace code changes to enable the feature. However, it should
> > give you an indication of how far the simple act of changing the
> > kernel time representation spread through the filesystem. This does
> > not include any of the VFS infrastructure for specifying the range of
> > supported timestamps.  It survives some smoke testing, but dies when
> > the online defragmenter starts using the bulkstat and swap extent
> > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> > probably don't have that all sorted correctly yet...
> > 
> > To test extended epoch support, however, I need some fstests that
> > define and validate the behaviour of the new syscalls - until we get
> > those we can't validate that the filesystem follows the spec
> > properly. I also suspect we are going to need an interface to query
> > the supported range of timestamps from a filesystem so that we can
> > test boundary conditions in an automated fashion.
> 
> Thanks a lot for having an initial look at this yourself!
> 
> I'd still consider the two problems largely orthogonal.

Depends how you look at it. You can't extend the kernel's idea of
time without permanent storage being able to specify the supported
bounds - that's a non-negotiable aspect of introducing extended
epoch timestamp support.

The actual addition of extended timestamp support to each individual
filesystem is orthogonal to the introduction of the struct
inode_time, but doing this addition properly is dependent on the VFS
infrastructure being there in the first place.

> My patch set
> (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> more like 64-bit kernels regarding inode time stamps, which does
> 

Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we?  I am not aware of *Linux* actually using that.
> 
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
> 
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
>   time_t i = get_seconds();
> 
>   if (tloc) {
>   if (put_user(i,tloc))
>   return -EFAULT;
>   }
>   force_successful_syscall_return();
>   return i;
> }
> 

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than NULL.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> > 
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> > 
> 
> Are we?  I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the POSIX specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
	time_t i = get_seconds();

	if (tloc) {
		if (put_user(i,tloc))
			return -EFAULT;
	}
	force_successful_syscall_return();
	return i;
}
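
From user space, the same contract looks roughly like the sketch below. It only uses the standard time() call; the wrapper name is made up. Note that -1 is also a valid timestamp (one second before the epoch), which is part of why the error return is ambiguous.

```c
#include <time.h>

/*
 * Store the current time in *out and return 0, or return -1 if time()
 * reported failure by returning (time_t)-1.
 */
static int read_clock(time_t *out)
{
	time_t now = time(NULL);

	if (now == (time_t)-1)
		return -1;
	*out = now;
	return 0;
}
```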

Cheers,

- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> 
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
> 

Are we?  I am not aware of *Linux* actually using that.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
> 
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type.  So in that case, the fallback would be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite flag" would be set.

Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.
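
(A rough sketch of what such a structure could look like, purely for
illustration.  The flag names and the make_timestamp() helper are made
up for this example; nothing like gettimestampofday(2) exists today.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; none of these names exist in the kernel. */
#define TS_TIME_INVALID    (1u << 0)  /* no usable clock, counting from boot */
#define TS_NO_LEGACY_TIME  (1u << 1)  /* does not fit a 32-bit time_t */
#define TS_TIME_UNCERTAIN  (1u << 2)  /* RTC may be wrong (e.g. low battery) */

struct timestamp {
    int64_t  ts_sec;    /* 64-bit seconds */
    uint32_t ts_nsec;   /* nanoseconds, as in struct timespec */
    uint32_t ts_flags;  /* TS_* bits above */
};

/* Fill in a timestamp and derive the flags from what is known. */
static struct timestamp make_timestamp(int64_t sec, uint32_t nsec,
                                       bool have_rtc, bool rtc_trusted)
{
    struct timestamp ts = { .ts_sec = sec, .ts_nsec = nsec, .ts_flags = 0 };

    assert(nsec < 1000000000u);
    if (!have_rtc)
        ts.ts_flags |= TS_TIME_INVALID;
    else if (!rtc_trusted)
        ts.ts_flags |= TS_TIME_UNCERTAIN;
    if (sec < INT32_MIN || sec > INT32_MAX)
        ts.ts_flags |= TS_NO_LEGACY_TIME;
    return ts;
}
```

The point of a separate flags word is that no value of ts_sec has to be
reserved as a sentinel.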

Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.)  the RTC
time may just be plain wrong.  No system is going to be perfect, but
it should be possible to make things better, at least for certain
classes of hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

- Ted



Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks  wrote:
> 
> > 
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> > 
> >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> >> (See the definition of nfstime3 in RFC 1813).
> >> 
> > 
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> > 
> > This could represent 1970 +/- 272 years.
> > 
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> > 
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
> 
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
> 
> But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with NFS standardization, I'd assume this is
a workable answer. The NFSv2 and NFSv3 definitions clearly specify a valid
range of times until 2106 using unsigned seconds, and that should really
give enough time to migrate to something better (not necessarily NFSv4).

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Chuck Lever

On Jun 2, 2014, at 2:58 PM, Roger Willcocks  wrote:

> 
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> 
>> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
>> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
>> (See the definition of nfstime3 in RFC 1813).
>> 
> 
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
> 
> This could represent 1970 +/- 272 years.
> 
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> 
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.

You would have to get the IETF’s NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.

But I suspect the answer you’d get is “Use NFSv4.”

> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
> 
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)

There’s no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.

Practically speaking, you should assume that the NFSv3 protocol is never
going to change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Roger Willcocks

On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:

> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
> 

nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.

This could represent 1970 +/- 272 years.

Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.

Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.

Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)
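
(To make the bit layout concrete, here is a minimal sketch of the
packing, assuming two's complement and the two 32-bit words of
nfstime3.  The struct and function names are invented for this
example.  2^33 seconds is about 272 years, which is where the range
above comes from.)

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit words of nfstime3 as they appear on the wire. */
struct nfstime3_wire {
    uint32_t seconds;
    uint32_t nseconds;
};

#define NSEC_MASK 0x3fffffffu   /* low 30 bits: nanoseconds < 2^30 */

/* Pack a signed 34-bit seconds value and a 30-bit nanoseconds value. */
static struct nfstime3_wire nfstime3_ex_encode(int64_t sec, uint32_t nsec)
{
    assert(sec >= -(INT64_C(1) << 33) && sec < (INT64_C(1) << 33));
    assert(nsec < 1000000000u);

    uint64_t u = (uint64_t)sec;          /* two's complement bit pattern */
    struct nfstime3_wire w;

    w.seconds  = (uint32_t)u;                                 /* bits 31..0  */
    w.nseconds = nsec | ((uint32_t)((u >> 32) & 0x3u) << 30); /* bits 33..32 */
    return w;
}

/* Recover the seconds value, sign-extending from bit 33. */
static int64_t nfstime3_ex_decode_sec(struct nfstime3_wire w)
{
    uint64_t u = ((uint64_t)(w.nseconds >> 30) << 32) | w.seconds;

    if (u & (UINT64_C(1) << 33))
        u |= ~((UINT64_C(1) << 34) - 1);
    return (int64_t)u;
}

static uint32_t nfstime3_ex_decode_nsec(struct nfstime3_wire w)
{
    return w.nseconds & NSEC_MASK;
}
```

For any time between 1970 and 2106 the two extra bits are zero, so the
words are identical to what an unmodified peer would send.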

--
Roger




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
> 
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
> 
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
> 
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
> 
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at least in the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> > 
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> > 
> 
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb  7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date.

If we had the choice, I'd go for something like 1, i.e.
"Thu Jan  1 01:00:01 CET 1970".

> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime
> > option), make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> > 
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
> 
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be at a number of very different points
in time.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> 
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
> 

(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.

> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime
> option), make, tmpwatch, et al., that they can't make any presumption
> about the comparability of any timestamp which has a value of
> TIME_UNDEFINED.
> 
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.

This assumes that we actually know that that is the case, which may be
an aggressive assumption.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
> 
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.


I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.

   - Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Chuck Lever

On Jun 2, 2014, at 6:56 AM, Arnd Bergmann  wrote:

> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
>> 
>>> For actually running kernels beyond 2038, the best idea I've seen so
>>> far is to disallow all broken code at compile time. I don't see
>>> a choice but to audit the entire kernel for invalid uses on both
>>> 32 and 64 bit in the next few years. A lot of code will get changed
>>> in the process so we can actually keep running 32-bit kernels and
>>> file systems, but other code will likely go away:
>>> 
>>> * any system calls that pass a time_t, timeval or timespec on
>>>  32-bit systems return -ENOSYS, to ensure all user land uses
>>>  the replacements we will put into place
>>> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>>>  from the kernel, and all code using it left out.
>>> * ext2 and ext3 file system code will have to be disabled, but that's
>>>  fine since ext4 can mount old file systems.
>> 
>> Syscalls and libs can be "fixed".  Existing filesystem content might 
>> not.  So if you need to mount some old media in read-write mode after 
>> 2038 and that happens to contain an ext2 or similarly limited filesystem
>> then it'd better just "work".  Having the kernel refuse to modify the 
>> filesystem would be unacceptable.
> 
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
> 
> At some point before the mid 2030ies, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.

I’m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).

If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?

RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.

An alternative would be to “cap” the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
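
(A minimal sketch of that capping, for illustration only.  The function
name is made up and this is not taken from the NFS client code.)

```c
#include <stdint.h>

/*
 * Clamp a 64-bit seconds value into the unsigned 32-bit range that
 * nfstime3 can carry: pre-epoch times become 0, times past 2106
 * become UINT32_MAX.
 */
static uint32_t nfs3_clamp_seconds(int64_t sec)
{
    if (sec < 0)
        return 0;
    if (sec > (int64_t)UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)sec;
}
```

The obvious cost is that a clamped value is indistinguishable from a
file whose timestamp really is 0 or UINT32_MAX.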

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet.  The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
> 
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970.  (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)

FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with

inode->i_mtime = le32_to_cpu(raw_inode->mtime);

which results in interpreting the time as 'signed' on 32-bit
kernels, but as 'unsigned' on 64-bit kernels. This could have been
done intentionally to extend the valid time range to 2106 on 64-bit
kernels, but it seems more likely that the code was written with
no thought given to 64-bit time_t at all. I see this pattern on
p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2,
jfs, minix, nfsv2/v3 (this was clearly intentional and is
spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv,
and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value
as signed on both 32-bit and 64-bit kernels:

inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);

this seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's
relatively rare: exofs, ext2/3/4 (good old inodes) and xfs
are the only ones doing this.

In case of ext2/3/4, the sign handling was introduced here:
http://www.spinics.net/lists/linux-ext4/msg01758.html

exofs and xfs seem to have done it like this for all of git
history.
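
(To illustrate the difference between the two patterns with a 64-bit
seconds field, here is a small userspace sketch.  The function names
are invented, and the cast to int32_t relies on the usual two's
complement conversion.)

```c
#include <stdint.h>

/* Pattern 1 on a 64-bit kernel: the u32 is zero-extended (1970..2106). */
static int64_t decode_unsigned(uint32_t raw)
{
    return raw;
}

/* Pattern 2: the u32 is reinterpreted as signed first (1901..2038). */
static int64_t decode_signed(uint32_t raw)
{
    return (int32_t)raw;
}
```

Both agree for raw values up to 0x7fffffff.  For 0xffffffff the first
gives 4294967295 (February 2106) and the second gives -1 (one second
before the epoch).  On a 32-bit kernel the first pattern stores into a
32-bit time_t and so behaves like the second.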

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin

> On Jun 2, 2014, at 4:57, "Theodore Ts'o"  wrote:
> 
>> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
>> 
>> I think you misunderstood what I suggested: the intent is to avoid
>> seeing things break in 2038 by making them break much earlier. We have
>> a solution for ext2 file systems, it's called ext4, and we just need
>> to ensure that everybody knows they have to migrate eventually.
>> 
>> At some point before the mid 2030ies, you should no longer be able to
>> build a kernel that has support for ext2 or any other module that will
>> run into bugs later
> 
> Even for ext4, it's not quite so simple as that.  You only have
> support for times post 2038 if you are using an inode size > 128
> bytes.  There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> 
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
> 
> And even if we're talking about flash and embedded devices, the good
> news is if you assume that 10 years is enough time for people to
> update their embedded OS builds, and that the vast majority of
> deployed devices will probably only be in service for 10-15 years, we
> do have enough time to make file system format changes, although
> admittedly we can't afford to dilly-dally.

I have a number of file systems older than any device they are sitting on.  
RAID allows individual disks to be swapped out, and when all disks have been 
swapped out, extend the file system online.  The system doesn't even have to be 
taken offline in the process if it is possible to physically get to the drives 
with the system powered (e.g. hot plug bays), which is really damned nice.


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Joseph S. Myers
On Sat, 31 May 2014, Dave Chinner wrote:

> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp
> 
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly

I don't see anything new about this issue.  All problems that could arise 
from the kernel being able to represent a timestamp some filesystems can't 
are problems that already apply with 64-bit kernels using 64-bit time_t 
internally.  So while as part of Y2038-preparedness we do need a clear 
understanding of which filesystems have what timestamp limits and what 
happens with timestamps beyond those limits, I think this is a separate 
strand of the problem - one that applies to both 32-bit and 64-bit systems 
- from the more general issue for 32-bit systems.
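
(For reference, the range check described in the quoted text could be
sketched roughly as below.  The struct and function names are
hypothetical; no such interface exists in the VFS as of this thread.)

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: the seconds range a filesystem can store on disk. */
struct fs_time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/* Return 0 if the timestamp is representable, -ERANGE otherwise. */
static int fs_check_timestamp(const struct fs_time_range *r, int64_t sec)
{
    assert(r->min_sec <= r->max_sec);
    if (sec < r->min_sec || sec > r->max_sec)
        return -ERANGE;
    return 0;
}
```

A filesystem with signed 32-bit on-disk seconds would report
INT32_MIN..INT32_MAX, and the same check would apply on 32-bit and
64-bit kernels alike.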

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
> 
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
> 
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps.  There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that, in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added.  That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.

Cheers,
- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet.  The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970.  (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)

Cheers,

- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> > 
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> > 
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
> 
> Even for ext4, it's not quite so simple as that.  You only have
> support for times post 2038 if you are using an inode size > 128
> bytes.  There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> 
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
        if (sizeof(time->tv_sec) > 4)
                time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
                                << 32;
        time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode)                 \
do {                                                                    \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime))               \
                (einode)->xtime.tv_sec =                                \
                        (signed)le32_to_cpu((raw_inode)->xtime);        \
        else                                                            \
                (einode)->xtime.tv_sec = 0;                             \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra))     \
                ext4_decode_extra_time(&(einode)->xtime,                \
                                       raw_inode->xtime ## _extra);     \
        else                                                            \
                (einode)->xtime.tv_nsec = 0;                            \
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?
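
(A small userspace sketch of the arithmetic behind the question,
assuming a two-bit epoch field as in EXT4_EPOCH_BITS.  The helper
names are invented, and this says nothing about which on-disk encoding
is actually intended.)

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* What the current code does: sign-extend, then OR in the epoch bits. */
static int64_t decode_with_or(uint32_t raw_sec, uint32_t extra)
{
    int64_t sec = (int32_t)raw_sec;   /* negative if bit 31 is set */
    uint64_t bits = (uint64_t)sec | ((uint64_t)(extra & EPOCH_MASK) << 32);

    return (int64_t)bits;
}

/* The alternative: sign-extend, then add epoch * 2^32. */
static int64_t decode_with_add(uint32_t raw_sec, uint32_t extra)
{
    int64_t sec = (int32_t)raw_sec;

    return sec + ((int64_t)(extra & EPOCH_MASK) << 32);
}
```

With raw_sec = 0x80000000 and an epoch value of 1, the OR variant
leaves tv_sec at -2147483648 because bits 32 and up are already all
ones after sign extension, while the add variant gives 2147483648
(January 2038).  For a non-negative raw_sec the two agree.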

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> > 
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> > 
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
> 
> Even for ext4, it's not quite so simple as that.  You only have
> support for times post 2038 if you are using an inode size > 128
> bytes.  There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> 
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

Ok, I see.

I also now noticed this comment above EXT4_FITS_IN_INODE():

"For new inodes we always reserve enough space for the kernel's known
extended fields, but for inodes created with an old kernel this might
not have been the case. None of the extended inode fields is critical
for correct filesystem operation."

Do we have to worry about this for inodes that contain extended
attributes and that get updated after 2038?

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> 
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
> 
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later

Even for ext4, it's not quite so simple as that.  You only have
support for times post 2038 if you are using an inode size > 128
bytes.  There are a very, very large number of machines which even
today, are using 128 byte inodes with ext4 for performance reasons.

The vast majority of those machines which I know of can probably move
to 256 byte inodes relatively easily, since hard drive replacement
cycles are order 5-6 years tops, so I'm not that concerned, but it
just goes to show this is a very complicated problem.

And even if we're talking about flash and embedded devices, the good
news is if you assume that 10 years is enough time for people to
update their embedded OS builds, and that the vast majority of
deployed devices will probably only be in service for 10-15 years, we
do have enough time to make file system format changes, although
admittedly we can't afford to dilly-dally.

Regards,

- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > all file systems support times at least until 2106, because they treat
> > > the on-disk value as unsigned on 64-bit systems, or they use
> > > a completely different representation. My guess is that somebody
> > > earlier spent a lot of work on making that happen.
> > > 
> > > The exceptions are:
> > > 
> > > * exofs uses signed values, which can probably be changed to be
> > >   consistent with the others.
> > > * isofs has a bug that limits it until 2027 on architectures with
> > >   a signed 'char' type (otherwise it's 2155).
> > > * udf can represent times for many thousands of years through a
> > >   16-bit year representation, but the code to convert to epoch
> > >   uses a const array that ends at 2038.
> > > * afs uses signed seconds and can probably be fixed
> > > * coda relies on user space time representation getting passed
> > >   through an ioctl.
> > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > >   where they really use signed.
> > > 
> > > > I was confused about XFS since I didn't notice that there are
> > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > 
> > You've missed an awful lot more than just the implications for the
> > core kernel code.
> > 
> > There's a good chance such changes propagate to APIs elsewhere in
> > the filesystems, because something you haven't realised is that XFS
> > effectively exposes the on-disk timestamp format directly to
> > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > by the online defragmenter.

I really didn't look at them at all, as ioctl is very late on my
mental list of things to change. I do realize that a lot of drivers
and file systems do have ioctls that pass time values and we need to
address them one by one.

I just looked at the ioctls you mentioned but don't see how open-by-handle
is affected by this. Can you point me to what you mean?

> Just to put that in context, here's the kernel patch to add extended
> epoch support to XFS. It's completely untested as I haven't done any
> userspace code changes to enable the feature. However, it should
> give you an indication of how far the simple act of changing the
> kernel time representation spread through the filesystem. This does
> not include any of the VFS infrastructure for specifying the range of
> supported timestamps.  It survives some smoke testing, but dies when
> the online defragmenter starts using the bulkstat and swap extent
> ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> probably don't have that all sorted correctly yet...
> 
> To test extended epoch support, however, I need some fstests that
> define and validate the behaviour of the new syscalls - until we get
> those we can't validate that the filesystem follows the spec
> properly. I also suspect we are going to need an interface to query
> the supported range of timestamps from a filesystem so that we can
> test boundary conditions in an automated fashion

Thanks a lot for having an initial look at this yourself!

I'd still consider the two problems largely orthogonal. My patch set
(at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
more like 64-bit kernels regarding inode time stamps, which does
impact all the file systems that use a 64-bit time or the NFS
unsigned epoch (1970-2106), while your patch extends the file
system internal epoch (1901-2038 for XFS) so it can be used by
anything that knows how to handle larger than 32-bit second values
(either 64-bit kernel or 32-bit with inode_time patch).
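
The two ranges mentioned here come from reading the same 32-bit on-disk
field in two ways. A minimal sketch, with hypothetical helper names
(decode_signed32 and decode_unsigned32 are not kernel functions):

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit on-disk seconds field as signed: 1901 to 2038.
 * The cast relies on the usual two's-complement conversion. */
static int64_t decode_signed32(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Read the same field as unsigned: 1970 to 2106. */
static int64_t decode_unsigned32(uint32_t raw)
{
	return (int64_t)raw;
}

static void self_check(void)
{
	/* Below 2^31 both readings agree. */
	assert(decode_signed32(0x7fffffffu) == decode_unsigned32(0x7fffffffu));
	/* From 2^31 on, the same bits mean either 1901 or 2038. */
	assert(decode_signed32(0x80000000u) == -2147483648LL);
	assert(decode_unsigned32(0x80000000u) == 2147483648LL);
}
```

Either reading only makes use of a 64-bit tv_sec once the top bit is
set, which is why the in-kernel type and the on-disk range can be
treated as separate problems.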

> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
>  #define XFS_DINODE_MAGIC            0x494e  /* 'IN' */
>  #define XFS_DINODE_GOOD_VERSION(v)  ((v) >= 1 && (v) <= 3)
>  
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields have meaning and can be non-zero.

Nice trick!

> +static inline __uint8_t
> +xfs_timestamp_epoch(
> +   struct timespec *time)
> +{


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Roger Willcocks

On Mon, 2014-06-02 at 10:28 +1000, Dave Chinner wrote:

> 
> The 32 bit second counters in timestamps are too small to represent
> time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
> format for a timestamp to include an 8-bit epoch counter so that we
> can extend time for up to 255 Unix epochs. This should be good for
> representing timestamps from 1970 to somewhere around 19,000 A.D
> 

I assume you're using an 'epoch' variable and not simply using the
padding byte as an eight-bit prefix to the existing 32-bit counter
because the existing counter is signed ?

For long term sanity it might make more sense for the eight-bit value to
be a simple (sign-extended) prefix from 1970.

So if the feature bit is set it's a 40-bit signed time, which is good
for 1970 +/- 17400 years or so.
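
As a rough check of that figure, a minimal sketch under stated
assumptions: an average year of 365.25 days, and a hypothetical
decode_time40() helper that treats the extra byte as the sign-extended
top 8 bits above the existing 32-bit counter. Neither name comes from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Average year of 365.25 days, used only for the estimate. */
#define SECONDS_PER_YEAR 31557600LL

/* Combine an 8-bit prefix and the 32-bit counter into one
 * sign-extended 40-bit seconds value. */
static int64_t decode_time40(uint8_t prefix, uint32_t low)
{
	int64_t high = (int8_t)prefix;   /* sign-extend the prefix */
	return high * 4294967296LL + (int64_t)low;
}

/* Whole years on each side of 1970 covered by a signed 40-bit count. */
static int64_t time40_span_years(void)
{
	return (1LL << 39) / SECONDS_PER_YEAR;
}

static void self_check(void)
{
	assert(time40_span_years() == 17420);
	assert(decode_time40(0x00, 0) == 0);                /* 1970 */
	assert(decode_time40(0xff, 0xffffffffu) == -1);     /* 1s before */
	assert(decode_time40(0x7f, 0xffffffffu) == (1LL << 39) - 1);
}
```

2^39 seconds divided by 31,557,600 seconds per year is about 17,420
years, consistent with the "17400 years or so" estimate.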

--
Roger







Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote:
> Perhaps we should make this a kernel command line option instead, with the
> settings: error out on outside the standard window, or a date indicating the
> earliest date that should be recognized and do windowing (0 for no windowing,
> 1970 for retconning the Unix epoch as unsigned...)

What's wrong with compile-time errors? We have a pretty good understanding
of how time values are passed in the kernel, and we know they will all break
in 2038 for 32-bit kernels unless we do something about it.
 
> But again, the kernel is probably the least problem here...
 
I agree the glibc side is harder than this, but we have to get the kernel
into shape first (at the minimum we have to do the APIs), and there is enough
work to do here.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
> 
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> > 
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
> 
> Syscalls and libs can be "fixed".  Existing filesystem content might 
> not.  So if you need to mount some old media in read-write mode after 
> 2038 and that happens to contain an ext2 or similarly limited filesystem 
> then it'd better just "work".  Having the kernel refuse to modify the 
> filesystem would be unacceptable.

I think you misunderstood what I suggested: the intent is to avoid
seeing things break in 2038 by making them break much earlier. We have
a solution for ext2 file systems, it's called ext4, and we just need
to ensure that everybody knows they have to migrate eventually.

At some point before the mid-2030s, you should no longer be able to
build a kernel that has support for ext2 or any other module that will
run into bugs later. Until then (rather sooner than later), I'd like
to get to the point where you can choose whether to include those
modules at build time or not, and then get everybody to turn off that
option and fix the bugs they run into. You wouldn't need that for a
2014-generation long-term support distro (rhel 7, sles 12, debian 7,
ubuntu 14.04, ...), but perhaps for the next generation, or the
one after that.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Geert Uytterhoeven
On Mon, Jun 2, 2014 at 4:22 AM, Dave Chinner  wrote:
> Filesystems place all sorts of userspace visible limits on storage -
> ever tried to create a file >16TB on ext4? The on-disk format
> doesn't support it, so it returns an out of range error (E2BIG, I
> think) if you try. XFS, OTOH, handles this just fine and so it
> continues to work. It's exactly the same with timestamps - there's a
> physical limit to what can sanely be stored in any given filesystem
> and it's an *error condition* to go beyond that limit

This comparison doesn't fly.
File sizes do not depend on the current time (except for the increase of
megapixels in your new camera ;-).
Writing a 15 GiB file to ext4 is not something that magically stops working
tomorrow.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> > 
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> > 
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
> 
> Even for ext4, it's not quite so simple as that.  You only have
> support for times post 2038 if you are using an inode size > 128
> bytes.  There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> 
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
       if (sizeof(time->tv_sec) > 4)
               time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
                               << 32;
       time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode)                \
do {                                                                   \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime))              \
                (einode)->xtime.tv_sec =                               \
                        (signed)le32_to_cpu((raw_inode)->xtime);       \
        else                                                           \
                (einode)->xtime.tv_sec = 0;                            \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra))    \
                ext4_decode_extra_time(&(einode)->xtime,               \
                                       raw_inode->xtime ## _extra);    \
        else                                                           \
                (einode)->xtime.tv_nsec = 0;                           \
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?
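
A minimal standalone sketch of that observation, outside the kernel. The
helper names and the EPOCH_BITS/EPOCH_MASK macros are stand-ins for this
example, and it assumes the epoch bits for such a time were stored as 1,
which is an assumption about the encoder that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Step 1 of the macro: the 32-bit on-disk seconds are read as signed,
 * so a value with the top bit set becomes a negative 64-bit number
 * whose upper 32 bits are all ones. */
static int64_t load_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/* Step 2 as quoted: OR the epoch bits into bits 32 and 33. */
static int64_t decode_with_or(uint32_t raw, uint32_t extra)
{
	uint64_t sec = (uint64_t)load_signed(raw);

	sec |= (uint64_t)(extra & EPOCH_MASK) << 32;
	return (int64_t)sec;
}

/* Step 2 with '+=' instead. */
static int64_t decode_with_add(uint32_t raw, uint32_t extra)
{
	return load_signed(raw) + ((int64_t)(extra & EPOCH_MASK) << 32);
}

static void self_check(void)
{
	/* Bits 32 and 33 are already set, so OR changes nothing. */
	assert(decode_with_or(0x80000000u, 1) == -2147483648LL);
	/* Adding 2^32 moves the value just past the 2038 boundary. */
	assert(decode_with_add(0x80000000u, 1) == 2147483648LL);
}
```

For non-negative values the two variants agree, so the difference only
shows up once the on-disk seconds field has its top bit set.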

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet.  The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970.  (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)
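
A minimal sketch of how such a decode could look, purely as an
illustration. The struct sb_time_cfg, its fields and both helpers are
hypothetical; nothing like them exists in ext4 as discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical superblock settings; neither field is an existing ext4
 * feature. */
struct sb_time_cfg {
	int     unsigned_secs;  /* read 32-bit seconds as unsigned */
	int64_t epoch_offset;   /* seconds added to every decoded time */
};

static int64_t decode_secs(const struct sb_time_cfg *cfg, uint32_t raw)
{
	int64_t secs = cfg->unsigned_secs ? (int64_t)raw
	                                  : (int64_t)(int32_t)raw;
	return secs + cfg->epoch_offset;
}

/* A stored value blocks setting the flag if the signed reading puts it
 * before January 1, 1970. */
static int is_pre_1970(uint32_t raw)
{
	return (int32_t)raw < 0;
}

static const struct sb_time_cfg legacy_cfg   = { 0, 0 };
static const struct sb_time_cfg unsigned_cfg = { 1, 0 };

static void self_check(void)
{
	assert(decode_secs(&legacy_cfg, 0xffffffffu) == -1);
	assert(decode_secs(&unsigned_cfg, 0xffffffffu) == 4294967295LL);
	assert(is_pre_1970(0xffffffffu));
	assert(!is_pre_1970(0x7fffffffu));
}
```

The scan before setting the flag would apply is_pre_1970() to every
stored timestamp and refuse the change if any of them matches.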

Cheers,

- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
> 
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
> 
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps.  There are some
legacy ext3 file systems with 256-byte inodes (enabled for fast
storage of SELinux xattrs) that in theory, could have been converted
to ext4 and had enough xattrs so that the extended timestamps couldn't
be added.  That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I
don't work for IBM any more, and hopefully even IBM would just tell
such customers that they need to suck it up, and do a
backup/reformat/restore pass.

Cheers,
- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Joseph S. Myers
On Sat, 31 May 2014, Dave Chinner wrote:

> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly

I don't see anything new about this issue.  All problems that could arise 
from the kernel being able to represent a timestamp some filesystems can't 
are problems that already apply with 64-bit kernels using 64-bit time_t 
internally.  So while as part of Y2038-preparedness we do need a clear 
understanding of which filesystems have what timestamp limits and what 
happens with timestamps beyond those limits, I think this is a separate 
strand of the problem - one that applies to both 32-bit and 64-bit systems 
- from the more general issue for 32-bit systems.
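
The out-of-range handling discussed in the quoted text could look roughly like the sketch below. It assumes a hypothetical per-filesystem range descriptor; neither the struct nor the helper exists in the kernel under these names:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical description of what a file system can store on disk. */
struct fs_time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/* Returns 0 if the timestamp can be stored, -ERANGE otherwise, instead
 * of letting the value wrap silently when it is written out. */
static int check_timestamp(const struct fs_time_range *r, int64_t sec)
{
    if (sec < r->min_sec || sec > r->max_sec)
        return -ERANGE;
    return 0;
}
```

The same check applies unchanged on 32-bit and 64-bit kernels once the in-kernel seconds value is 64 bits wide, which is the point made above.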

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin

> On Jun 2, 2014, at 4:57, Theodore Ts'o ty...@mit.edu wrote:
>
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid-2030s, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
>
> Even for ext4, it's not quite so simple as that.  You only have
> support for times post 2038 if you are using an inode size > 128
> bytes.  There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
>
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.
>
> And even if we're talking about flash and embedded devices, the good
> news is if you assume that 10 years is enough time for people to
> update their embedded OS builds, and that the vast majority of
> deployed devices will probably only be in service for 10-15 years, we
> do have enough time to make file system format changes, although
> admittedly we can't afford to dilly-dally.

I have a number of file systems older than any device they are sitting on.  
RAID allows individual disks to be swapped out, and when all disks have been 
swapped out, extend the file system online.  The system doesn't even have to be 
taken offline in the process if it is possible to physically get to the drives 
with the system powered (e.g. hot plug bays), which is really damned nice.


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet.  The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970.  (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)

FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with

	inode->i_mtime = le32_to_cpu(raw_inode->mtime);

which results in interpreting the time as 'signed' on 32-bit
kernels, but as 'unsigned' on 64-bit kernels. This could have been
done intentionally to extend the valid time range to 2106 on 64-bit
kernels, but it seems more likely that the code was written with
no thought given to 64-bit time_t at all. I see this pattern on
p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2,
jfs, minix, nfsv2/v3 (this was clearly intentional and is
spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv,
and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value
as signed on both 32-bit and 64-bit kernels:

	inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);

this seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's
relatively rare: exofs, ext2/3/4 (good old inodes) and xfs
are the only ones doing this.
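
The difference between the two patterns can be shown in isolation. This is a userspace sketch, not kernel code: int64_t stands in for a 64-bit time_t, and the function names are made up for the example.

```c
#include <stdint.h>

/* First pattern on a 64-bit kernel: the u32 is zero-extended into a
 * 64-bit time_t, so the on-disk range is 1970..2106. */
static int64_t secs_as_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* Second pattern (explicit signed cast), which is also what the first
 * pattern yields on a 32-bit kernel where time_t is a 32-bit signed
 * long: the range is 1901..2038. The u32 -> int32_t conversion is
 * implementation-defined in C but wraps on the usual compilers. */
static int64_t secs_as_signed(uint32_t raw)
{
    return (int64_t)(int32_t)raw;
}
```

For any raw value below 0x80000000 the two functions agree; they only diverge for on-disk values with the top bit set, which is why the discrepancy has gone unnoticed so far.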

In case of ext2/3/4, the sign handling was introduced here:
http://www.spinics.net/lists/linux-ext4/msg01758.html

exofs and xfs seem to have done it like this for all of git
history.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Chuck Lever

On Jun 2, 2014, at 6:56 AM, Arnd Bergmann a...@arndb.de wrote:

> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
> > >
> > > For actually running kernels beyond 2038, the best idea I've seen so
> > > far is to disallow all broken code at compile time. I don't see
> > > a choice but to audit the entire kernel for invalid uses on both
> > > 32 and 64 bit in the next few years. A lot of code will get changed
> > > in the process so we can actually keep running 32-bit kernels and
> > > file systems, but other code will likely go away:
> > >
> > > * any system calls that pass a time_t, timeval or timespec on
> > >   32-bit systems return -ENOSYS, to ensure all user land uses
> > >   the replacements we will put into place
> > > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> > >   from the kernel, and all code using it left out.
> > > * ext2 and ext3 file system code will have to be disabled, but that's
> > >   fine since ext4 can mount old file systems.
> >
> > Syscalls and libs can be fixed.  Existing filesystem content might
> > not.  So if you need to mount some old media in read-write mode after
> > 2038 and that happens to contain an ext2 or similarly limited filesystem
> > then it'd better just work.  Having the kernel refuse to modify the
> > filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid
> seeing things break in 2038 by making them break much earlier. We have
> a solution for ext2 file systems, it's called ext4, and we just need
> to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to
> build a kernel that has support for ext2 or any other module that will
> run into bugs later. Until then (rather sooner than later), I'd like
> to get to the point where you can choose whether to include those
> modules at build time or not, and then get everybody to turn off that
> option and fix the bugs they run into. You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.

I’m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like EINVAL).

If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?

RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.

An alternative would be to “cap” the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
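
A minimal sketch of the capping alternative, assuming the in-kernel seconds value is a signed 64-bit integer. The helper name is hypothetical and is not part of the Linux NFS code:

```c
#include <stdint.h>

/* Saturate a 64-bit seconds value to what the unsigned 32-bit seconds
 * field of nfstime3 can carry: pre-epoch times become 0, times past
 * 2106 become UINT32_MAX. */
static uint32_t nfs3_cap_seconds(int64_t sec)
{
    if (sec < 0)
        return 0;
    if (sec > (int64_t)UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)sec;
}
```

Unlike returning EINVAL or NFS3ERR_INVAL, capping never fails the operation, at the cost of making a capped value indistinguishable from a file that really carries the boundary timestamp.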

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.


I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.

   - Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
 
> I wonder if it would make sense to try to promulgate via the Austin
> group, and possibly the C standards committee the concept of a bit
> pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> unknown", or "time indefinite" or "we couldn't encode the time".
 

(time_t)-1 already has this meaning for some calls (e.g. time(2)).
However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
something similar applies to all possible bit patterns, certainly within
the range of an int.

> We would then teach gmtime(3) and asctime(3) to print some appropriate
> message, and we could teach programs like find (with the -mtime
> option), make, tmpwatch, et al., that they can't make any presumption
> about the comparability of any timestamp which has a value of
> TIME_UNDEFINED.
>
> It would be problematic for time(2) or gettimeofday(2) to return
> TIME_UNDEFINED, since there are programs that care about time ticking
> forward, but I could imagine a new interface which would be permitted
> to return a flag indicating that we don't know the current time
> (because the CMOS battery had run down, etc.) so instead we're going
> to be counting the number of seconds since the system was booted.

This assumes that we actually know that that is the case, which may be
an aggressive assumption.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the
same fix everywhere, at least in the cases where we know what interpretation
we actually want. NFS has the luxury of having an actual specification
saying that the value is unsigned. For most of the legacy file systems,
we can only make a guess at how other OSs would interpret the same
numbers.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
Sun Feb  7 07:28:15 CET 2106, and that is much harder to distinguish
from a real future date.
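
The two readings of the all-ones bit pattern can be checked with plain integer arithmetic. The sketch below avoids gmtime(3) so that it does not depend on the width of the platform's time_t; the struct and function names are made up for the example.

```c
#include <stdint.h>

/* Seconds since the epoch split into whole days and time of day (UTC). */
struct split_time {
    int64_t days;   /* days relative to 1970-01-01, may be negative */
    int hh, mm, ss;
};

static struct split_time split_seconds(int64_t sec)
{
    struct split_time s;
    int64_t rem = sec % 86400;

    s.days = sec / 86400;
    if (rem < 0) {          /* C division truncates toward zero */
        rem += 86400;
        s.days -= 1;
    }
    s.hh = (int)(rem / 3600);
    s.mm = (int)(rem % 3600 / 60);
    s.ss = (int)(rem % 60);
    return s;
}
```

Read as signed, 0xFFFFFFFF is -1, i.e. day -1 at 23:59:59 UTC (Dec 31, 1969). Read as unsigned it is 4294967295, i.e. day 49710 at 06:28:15 UTC, which is Feb 7, 2106 (07:28:15 CET).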

If we had the choice, I'd go for something like 1, i.e.
Thu Jan  1 01:00:01 CET 1970.

> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime
> > option), make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows
or underflows, which may be at a number of very different points
in time.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Roger Willcocks

On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:

> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
 

nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.

This could represent 1970 +/- 272 years.
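
The bit layout described above can be sketched as follows. This is an illustration of the proposal only: the struct and function names are invented here, and nothing like it is defined in RFC 1813. Nanoseconds stay below 10^9 < 2^30, so the top two bits of that word are free to carry seconds bits 33 and 32.

```c
#include <stdint.h>

/* The two 32-bit words of an nfstime3 as they appear on the wire. */
struct nfstime3_wire {
    uint32_t seconds;
    uint32_t nseconds;
};

/* Pack a signed 34-bit seconds value and a 30-bit nanoseconds value. */
static struct nfstime3_wire pack_ex(int64_t sec, uint32_t nsec)
{
    uint64_t u = (uint64_t)sec;     /* two's complement bit pattern */
    struct nfstime3_wire w;

    w.seconds = (uint32_t)u;                                /* bits 31..0 */
    w.nseconds = (nsec & 0x3FFFFFFFu) |
                 ((uint32_t)((u >> 32) & 0x3u) << 30);      /* bits 33..32 */
    return w;
}

/* Recover the seconds value, sign-extending from bit 33. The final
 * conversion to int64_t wraps on the usual compilers. */
static int64_t unpack_ex_seconds(struct nfstime3_wire w)
{
    uint64_t u = ((uint64_t)(w.nseconds >> 30) << 32) | w.seconds;

    if (u & (1ULL << 33))
        u |= ~((1ULL << 34) - 1);
    return (int64_t)u;
}

static uint32_t unpack_ex_nsec(struct nfstime3_wire w)
{
    return w.nseconds & 0x3FFFFFFFu;
}
```

The quoted range follows from 2^33 seconds divided by about 31.56 million seconds per year, which is roughly 272 years on either side of 1970. Times whose extra bits are zero, i.e. 0 up to 2^32 - 1 seconds, produce exactly the legacy wire encoding.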

Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.

Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.

Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)

--
Roger




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Chuck Lever

On Jun 2, 2014, at 2:58 PM, Roger Willcocks ro...@filmlight.ltd.uk wrote:

 
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
>
> > NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> > seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> > (See the definition of nfstime3 in RFC 1813).
> >
>
> nfstime3 could be extended by redefining the otherwise unused
> nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> seconds field and an unsigned 30-bit nanoseconds field.
>
> This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by
> adding a new FSINFO capability - FSF3_CANSETTIME_EX.
>
> Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> timestamps so old servers would be protected from new clients.

You would have to get the IETF’s NFSv4 working group to sign off on
this change. Otherwise, Linux would be the only NFSv3 implementation
that supports the extension.

But I suspect the answer you’d get is “Use NFSv4.”

> Old clients don't need to be protected from new servers because the
> on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
> so they're no worse off than they were before.
>
> Arguably the new server ought to clamp out-of-range timestamps before
> sending them to old clients but that would need per-client state (and
> nfs3 is stateless.)

There’s no reliable way in NFSv3 for clients and servers to identify
the software running on the peer.

Practically speaking, you should assume that the NFSv3 protocol is never
going to change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Arnd Bergmann
On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks ro...@filmlight.ltd.uk wrote:
>
> >
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> >
> > > NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> > > seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> > > (See the definition of nfstime3 in RFC 1813).
> > >
> >
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> >
> > This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> >
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
>
> But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with an NFS standardization, I'd assume this is
a workable answer. The NFSv2 and NFSv3 definition clearly defines a valid
range of times until 2106 using unsigned seconds, and that should really
give enough time to migrate to something better (not necessarily NFSv4).

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have an RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type.  So in that case, the fallback would be for the
kernel to tick starting with time_t == 0 when the system is initially
booted, and the "time indefinite" flag would be set.

Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working.
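
One possible shape for such a structure, as a sketch only. The field names ts_sec, ts_nsec and ts_flags come from the paragraph above; the flag names, their values and the helper are assumptions made for illustration, and no gettimestampofday(2) call exists:

```c
#include <stdint.h>

/* Hypothetical flag bits for ts_flags. */
#define TS_FLAG_INVALID    (1u << 0)  /* time invalid */
#define TS_FLAG_NO_LEGACY  (1u << 1)  /* not encodable in a 32-bit time_t */
#define TS_FLAG_UNCERTAIN  (1u << 2)  /* e.g. RTC battery not working */

struct timestamp_sketch {
    int64_t  ts_sec;
    uint32_t ts_nsec;
    uint32_t ts_flags;
};

/* Build a timestamp and set the flags a kernel could derive on its own. */
static struct timestamp_sketch make_timestamp(int64_t sec, uint32_t nsec,
                                              int rtc_trusted)
{
    struct timestamp_sketch ts = { sec, nsec, 0 };

    if (sec < INT32_MIN || sec > INT32_MAX)
        ts.ts_flags |= TS_FLAG_NO_LEGACY;
    if (!rtc_trusted)
        ts.ts_flags |= TS_FLAG_UNCERTAIN;
    return ts;
}
```

Carrying the condition in a separate flags word avoids reserving any value of the seconds field itself, which sidesteps the objection that every bit pattern of time_t is also a valid time.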

Not all hardware might be able to support the last, of course, but if
the battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.)  the RTC
time may just be plain wrong.  No system is going to be perfect, but
it should be possible to make things better, at least for certain classes of
hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

- Ted



Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
 
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
 

Are we?  I am not aware of *Linux* actually using that.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Theodore Ts'o
On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we?  I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the Posix specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
	time_t i = get_seconds();

	if (tloc) {
		if (put_user(i,tloc))
			return -EFAULT;
	}
	force_successful_syscall_return();
	return i;
}

Cheers,

- Ted


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread H. Peter Anvin
On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> > On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> > >
> > > And since we are already returning (time_t) -1 in some cases, we might
> > > as well try to make things a bit more formal.
> > >
> >
> > Are we?  I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }
 

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than NULL.

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-02 Thread Dave Chinner
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > In my list at http://kernelnewbies.org/y2038, I found that almost
> > > > all file systems at least handle times until 2106, because they treat
> > > > the on-disk value as unsigned on 64-bit systems, or they use
> > > > a completely different representation. My guess is that somebody
> > > > earlier spent a lot of work on making that happen.
> > > >
> > > > The exceptions are:
> > > >
> > > > * exofs uses signed values, which can probably be changed to be
> > > >   consistent with the others.
> > > > * isofs has a bug that limits it until 2027 on architectures with
> > > >   a signed 'char' type (otherwise it's 2155).
> > > > * udf can represent times for many thousands of years through a
> > > >   16-bit year representation, but the code to convert to epoch
> > > >   uses a const array that ends at 2038.
> > > > * afs uses signed seconds and can probably be fixed
> > > > * coda relies on user space time representation getting passed
> > > >   through an ioctl.
> > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> > > >   where they really use signed.
> > > >
> > > > I was confused about XFS since I hadn't noticed that there are
> > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > > > XFS to also use the 1970-2106 time range on 64-bit systems today.
> > >
> > > You've missed an awful lot more than just the implications for the
> > > core kernel code.
> > >
> > > There's a good chance such changes propagate to APIs elsewhere in
> > > the filesystems, because something you haven't realised is that XFS
> > > effectively exposes the on-disk timestamp format directly to
> > > userspace via the bulkstat interface (see struct xfs_bstat). It also
> > > affects the XFS open-by-handle ioctl and the swap extent ioctl used
> > > by the online defragmenter.
>
> I really didn't look at them at all, as ioctl is very late on my
> mental list of things to change. I do realize that a lot of drivers
> and file systems do have ioctls that pass time values and we need to
> address them one by one.
>
> I just looked at the ioctls you mentioned but don't see how open-by-handle
> is affected by this. Can you point me to what you mean?

Sorry, I misremembered how some of the XFS open-by-handle code works
in userspace (XFS has a pretty rich open-by-handle ioctl() interface
that predates the kernel syscalls by at least 10 years).  Basically
there is code in userspace that uses the information returned from
bulkstat to construct file handles to pass to the open-by-handle
ioctls. xfs_fsr then uses the combination of open-by-handle from the
bulkstat output and the bulkstat output to feed into the swap extent
ioctls.

i.e. the filesystem's idea of what time is is passed to userspace as
an opaque cookie in this case, but it is not used directly by the
open-by-handle interfaces like I implied it was.

> > Just to put that in context, here's the kernel patch to add extended
> > epoch support to XFS. It's completely untested as I haven't done any
> > userspace code changes to enable the feature. However, it should
> > give you an indication of how far the simple act of changing the
> > kernel time representation spread through the filesystem. This does
> > not include any of the VFS infrastructure for specifying the range of
> > supported timestamps.  It survives some smoke testing, but dies when
> > the online defragmenter starts using the bulkstat and swap extent
> > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
> > probably don't have that all sorted correctly yet...
> >
> > To test extended epoch support, however, I need some fstests that
> > define and validate the behaviour of the new syscalls - until we get
> > those we can't validate that the filesystem follows the spec
> > properly. I also suspect we are going to need an interface to query
> > the supported range of timestamps from a filesystem so that we can
> > test boundary conditions in an automated fashion
>
> Thanks a lot for having an initial look at this yourself!
>
> I'd still consider the two problems largely orthogonal.

Depends how you look at it. You can't extend the kernel's idea of
time without permanent storage being able to specify the supported
bounds - that's a non-negotiable aspect of introducing extended
epoch timestamp support.

The actual addition of extended timestamp support to each individual
filesystem is orthogonal to the introduction of the struct
inode_time, but doing this addition properly is dependent on the VFS
infrastructure being there in the first place.

> My patch set
> (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave
> more like 64-bit kernels regarding inode time stamps, which does
> impact all the file systems that use a 64-bit time or the NFS
> unsigned epoch (1970-2106), while your patch extends the file
> system internal epoch (1901-2038 for XFS) so it can be used 

Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread Dave Chinner
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> > 
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
> 
> Syscalls and libs can be "fixed".  Existing filesystem content might 
> not.  So if you need to mount some old media in read-write mode after 
> 2038 and that happens to contain an ext2 or similarly limited filesystem 
> then it'd better just "work".  Having the kernel refuse to modify the 
> filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the "legacy filesystem timestamp" problem is
"solved".

New interfaces need to return errors when an out-of-range parameter
is set. And right now, >epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.

Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file >16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (E2BIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit.
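
As a rough illustration of the "out of range is an error" policy described above, here is a minimal userspace sketch. It is not code from the patch set: struct fs_time_range, legacy_s32_range and fs_validate_time() are hypothetical names, and the choice of -ERANGE as the error code is an assumption made only for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem timestamp limits (not an existing VFS type). */
struct fs_time_range {
	int64_t min_sec;	/* earliest representable second */
	int64_t max_sec;	/* latest representable second */
};

/* A filesystem that stores signed 32-bit seconds (roughly 1901 to 2038). */
static const struct fs_time_range legacy_s32_range = {
	.min_sec = INT32_MIN,
	.max_sec = INT32_MAX,
};

/*
 * Reject a timestamp the on-disk format cannot hold instead of letting
 * it wrap silently. The error code is an assumption for illustration.
 */
static int fs_validate_time(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	return 0;
}
```

A caller that sets timestamps would run this check first and hand the error back to userspace rather than storing a truncated value.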

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread Nicolas Pitre
On Sun, 1 Jun 2014, Arnd Bergmann wrote:

> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality than in practice.
> > 
> > For those (legacy) filesystems with signed 32-bit timestamps, any 
> > attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be 
> > (silently) clamped to 0x7fffffff and that value (the last representable 
> > time) used as an overflow indicator.  The filesystem driver should 
> > convert that value into a corresponding overflow value for whatever 
> > kernel internal time representation being used when read back, and this 
> > should be propagated up to user space.  It should not be a hard error 
> > otherwise, as you rightfully stated, everything non read-only would come 
> > to a halt on that day.
> 
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.

Well, in one case you have a crash certitude. In the other case you have 
some probability that your system might still be usable.

> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.

Resilience is better than outright failure.

> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
> 
> * any system calls that pass a time_t, timeval or timespec on
>   32-bit systems return -ENOSYS, to ensure all user land uses
>   the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>   from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
>   fine since ext4 can mount old file systems.

Syscalls and libs can be "fixed".  Existing filesystem content might 
not.  So if you need to mount some old media in read-write mode after 
2038 and that happens to contain an ext2 or similarly limited filesystem 
then it'd better just "work".  Having the kernel refuse to modify the 
filesystem would be unacceptable.


Nicolas


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread Dave Chinner
On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > In my list at http://kernelnewbies.org/y2038, I found that almost
> > all file systems at least support times until 2106, because they treat
> > the on-disk value as unsigned on 64-bit systems, or they use
> > a completely different representation. My guess is that somebody
> > earlier spent a lot of work on making that happen.
> > 
> > The exceptions are:
> > 
> > * exofs uses signed values, which can probably be changed to be
> >   consistent with the others.
> > * isofs has a bug that limits it until 2027 on architectures with
> >   a signed 'char' type (otherwise it's 2155).
> > * udf can represent times for many thousands of years through a
> >   16-bit year representation, but the code to convert to epoch
> >   uses a const array that ends at 2038.
> > * afs uses signed seconds and can probably be fixed
> > * coda relies on user space time representation getting passed
> >   through an ioctl.
> > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> >   where they really use signed.
> > 
> > I was confused about XFS since I didn't notice that there are
> > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > XFS to also use the 1970-2106 time range on 64-bit systems today.
> 
> You've missed an awful lot more than just the implications for the
> core kernel code.
> 
> There's a good chance such changes propagate to APIs elsewhere in
> the filesystems, because something you haven't realised is that XFS
> effectively exposes the on-disk timestamp format directly to
> userspace via the bulkstat interface (see struct xfs_bstat). It also
> affects the XFS open-by-handle ioctl and the swap extent ioctl used
> by the online defragmenter.
> 
> IOWs, if we are changing the on-disk timestamp format then this
> affects several ioctl()s and hence quite a few of the XFS userspace
> utilities. The hardest to fix will be xfsdump which would need a new
> dump format to store the extended timestamp ranges, and then
> xfs_restore will need to be able to handle restoring such timestamps
> on filesystems that don't have extended timestamp support...
> 
> Put simply, changing the structure of system time isn't as
> straightforward as changing the kernel structures. System time gets
> stored permanently, and that has a cascade effect through the kernel all
> the way to all of the filesystem utilities that know about that permanent
> storage in some way.
> 
> So yes, you can change the kernel definition, but until the
> permanent storage of system time can be extended to support the same
> range as the kernel the *system* will still have nasty, silent epoch
> overflow, truncation or corruption issues.

Just to put that in context, here's the kernel patch to add extended
epoch support to XFS. It's completely untested as I haven't done any
userspace code changes to enable the feature. However, it should
give you an indication of how far the simple act of changing the
kernel time representation spreads through the filesystem. This does
not include any of the VFS infrastructure for specifying the range of
supported timestamps.  It survives some smoke testing, but dies when
the online defragmenter starts using the bulkstat and swap extent
ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
probably don't have that all sorted correctly yet...

To test extended epoch support, however, I need some fstests that
define and validate the behaviour of the new syscalls - until we get
those we can't validate that the filesystem follows the spec
properly. I also suspect we are going to need an interface to query
the supported range of timestamps from a filesystem so that we can
test boundary conditions in an automated fashion.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

xfs: support timestamps beyond Unix epochs

From: Dave Chinner 

The 32 bit second counters in timestamps are too small to represent
time beyond the unix epoch (jan 2038) correctly. Extend the on-disk
format for a timestamp to include an 8-bit epoch counter so that we
can extend time for up to 255 Unix epochs. This should be good for
representing timestamps from 1970 to somewhere around 19,000 A.D

Signed-off-by: Dave Chinner 
---
 fs/xfs/time.h            |  7 --
 fs/xfs/xfs_bmap_util.c   | 35 +---
 fs/xfs/xfs_dinode.h      | 48 ++-
 fs/xfs/xfs_fs.h          |  9 +++-
 fs/xfs/xfs_fsops.c       |  5 +++-
 fs/xfs/xfs_inode.c       | 16 ++---
 fs/xfs/xfs_inode_buf.c   |  8 +++
 fs/xfs/xfs_ioctl32.c     |  3 +++
 fs/xfs/xfs_ioctl32.h     |  5 +++-
 fs/xfs/xfs_iops.c        | 59 +++-
 fs/xfs/xfs_itable.c      | 12 ++
 fs/xfs/xfs_log_format.h  |  4 
 fs/xfs/xfs_sb.h          | 12 +-
 fs/xfs/xfs_trans_inode.c |  2 +-
 14 files changed, 175 insertions(+), 50 deletions(-)
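
The arithmetic behind the "8-bit epoch counter" in the commit message can be sketched as follows. The patch body is not shown here, so this is only an illustration under stated assumptions: each epoch is taken to span 2^31 seconds (1970 to 2038), the seconds field is taken to be non-negative, and ext_time_decode()/ext_time_encode() are hypothetical helper names, not functions from the patch. With 256 counter values this covers 256 * 2^31 seconds, roughly 17,400 years, which matches the "around 19,000 A.D." estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed span of one "Unix epoch": 2^31 seconds, i.e. 1970 to 2038. */
#define EPOCH_SPAN_SECONDS (INT64_C(1) << 31)

/* Combine an 8-bit epoch counter and a 31-bit seconds field into 64 bits. */
static int64_t ext_time_decode(uint8_t epoch, uint32_t sec)
{
	assert(sec < (UINT32_C(1) << 31));	/* seconds within one epoch */
	return (int64_t)epoch * EPOCH_SPAN_SECONDS + (int64_t)sec;
}

/*
 * Split a 64-bit second count into (epoch, sec).
 * Returns 0 on success, -1 if the value does not fit in 256 epochs.
 */
static int ext_time_encode(int64_t t, uint8_t *epoch, uint32_t *sec)
{
	if (t < 0 || t >= 256 * EPOCH_SPAN_SECONDS)
		return -1;
	*epoch = (uint8_t)(t / EPOCH_SPAN_SECONDS);
	*sec = (uint32_t)(t % EPOCH_SPAN_SECONDS);
	return 0;
}
```

The first second after the 2038 rollover encodes as epoch 1, sec 0 under these assumptions; the real on-disk layout may differ.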

Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread H. Peter Anvin
Perhaps we should make this a kernel command line option instead, with the 
settings: error out on outside the standard window, or a date indicating the 
earliest date that should be recognized and do windowing (0 for no windowing, 
1970 for retconning the Unix epoch as unsigned...)
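
One way to read the windowing idea is sketched below. It is an illustration only: window_decode() is a hypothetical helper, and expressing the window start in seconds relative to 1970 (rather than as a year on the command line) is an assumption. The raw 32-bit on-disk value is mapped to the unique time in [window_start, window_start + 2^32) that matches it modulo 2^32.

```c
#include <stdint.h>

/*
 * Interpret a raw 32-bit on-disk second count inside a sliding window.
 * window_start is in seconds relative to 1970-01-01 (may be negative).
 * Returns the unique t with window_start <= t < window_start + 2^32
 * and t congruent to raw modulo 2^32.
 */
static int64_t window_decode(uint32_t raw, int64_t window_start)
{
	/* Distance from the window start, computed modulo 2^32. */
	uint32_t delta = raw - (uint32_t)window_start;

	return window_start + (int64_t)delta;
}
```

With window_start = 0 the value 0x80000000 decodes to a time in 2038 (the unsigned 1970-2106 reading); with window_start = -2^31 the same bits decode to 1901 (the traditional signed reading).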

But again, the kernel is probably the least problem here...

On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann  wrote:
>On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
>> > readonly if not in reality than in practice.
>> 
>> For those (legacy) filesystems with signed 32-bit timestamps, any 
>> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
>> (silently) clamped to 0x7fffffff and that value (the last representable
>> time) used as an overflow indicator.  The filesystem driver should 
>> convert that value into a corresponding overflow value for whatever 
>> kernel internal time representation being used when read back, and this
>> should be propagated up to user space.  It should not be a hard error
>> otherwise, as you rightfully stated, everything non read-only would come
>> to a halt on that day.
>
>I don't think there is much of a difference between not being able to
>write at all and all newly written files having the same timestamp,
>causing random things to break differently.
>
>The clamp to the maximum supported time stamp sounds like a reasonable
>choice for 'utimens' and related syscalls for the case of someone
>setting an arbitrary future date beyond what the file system can
>represent. Then again, I don't see a reason why that shouldn't just
>cause an error to be returned.
>
>For actually running kernels beyond 2038, the best idea I've seen so
>far is to disallow all broken code at compile time. I don't see
>a choice but to audit the entire kernel for invalid uses on both
>32 and 64 bit in the next few years. A lot of code will get changed
>in the process so we can actually keep running 32-bit kernels and
>file systems, but other code will likely go away:
>
>* any system calls that pass a time_t, timeval or timespec on
>  32-bit systems return -ENOSYS, to ensure all user land uses
>  the replacements we will put into place
>* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>  from the kernel, and all code using it left out.
>* ext2 and ext3 file system code will have to be disabled, but that's
>  fine since ext4 can mount old file systems.
>* until xfs gets extended, we can also disable it at build time.
>
>For most users, we probably want to leave all that enabled by
>default until we get much closer to 2038, but a compile time
>option should allow us to test what works or doesn't, and it
>can be set by embedded developers that want to ensure their
>code keeps running for the next few decades.
>
>   Arnd

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread Arnd Bergmann
On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality than in practice.
> 
> For those (legacy) filesystems with signed 32-bit timestamps, any 
> attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be 
> (silently) clamped to 0x7fffffff and that value (the last representable 
> time) used as an overflow indicator.  The filesystem driver should 
> convert that value into a corresponding overflow value for whatever 
> kernel internal time representation being used when read back, and this 
> should be propagated up to user space.  It should not be a hard error 
> otherwise, as you rightfully stated, everything non read-only would come 
> to a halt on that day.

I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.

The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.
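
For comparison with returning an error, the clamping behaviour proposed in the quoted text could look roughly like the sketch below. The names disk_time_store() and disk_time_load() are hypothetical, clamping at the lower bound is an addition not discussed in the thread, and using INT64_MAX as the in-kernel "overflow value" is an assumption made only for the example.

```c
#include <stdint.h>

/* Last representable second in a signed 32-bit on-disk field. */
#define DISK_TIME_MAX INT32_MAX	/* 0x7fffffff */

/* Clamp a 64-bit second count into the signed 32-bit on-disk field. */
static int32_t disk_time_store(int64_t sec)
{
	if (sec > DISK_TIME_MAX)
		return DISK_TIME_MAX;	/* doubles as the overflow marker */
	if (sec < INT32_MIN)
		return INT32_MIN;	/* lower clamp: assumption, not from the thread */
	return (int32_t)sec;
}

/*
 * Read the field back. The maximum value is treated as "overflowed"
 * and widened to an assumed overflow value (INT64_MAX here).
 */
static int64_t disk_time_load(int32_t raw)
{
	if (raw == DISK_TIME_MAX)
		return INT64_MAX;
	return raw;
}
```

The cost of this scheme shows up directly in the code: every timestamp past the limit reads back as the same value, and a file legitimately stamped at the last representable second cannot be told apart from an overflowed one.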

For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:

* any system calls that pass a time_t, timeval or timespec on
  32-bit systems return -ENOSYS, to ensure all user land uses
  the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
  from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
  fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.

For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.

Arnd




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-06-01 Thread Dave Chinner
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
>
> Syscalls and libs can be fixed.  Existing filesystem content might
> not.  So if you need to mount some old media in read-write mode after
> 2038 and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just work.  Having the kernel refuse to modify the
> filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

	inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the legacy filesystem timestamp problem is
solved.

New interfaces need to return errors when an out-of-range parameter
is set. And right now, epoch dates are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what
the userspace API says.

Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file > 16TB on ext4? The on-disk format
doesn't support it, so it returns an out of range error (E2BIG, I
think) if you try. XFS, OTOH, handles this just fine and so it
continues to work. It's exactly the same with timestamps - there's a
physical limit to what can sanely be stored in any given filesystem
and it's an *error condition* to go beyond that limit.
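
To make the "filesystem defines the range" idea concrete, here is a minimal userspace sketch of such a range check. It is not kernel code: `struct fs_time_range`, `fs_s32_range` and `fs_check_time()` are hypothetical names invented for this illustration, and ERANGE is only one of the error codes floated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: the range of seconds a filesystem can store on disk. */
struct fs_time_range {
	int64_t min_sec;	/* earliest representable second */
	int64_t max_sec;	/* latest representable second */
};

/* Signed 32-bit seconds, i.e. roughly 1901-2038. */
const struct fs_time_range fs_s32_range = { INT32_MIN, INT32_MAX };

/* Return 0 if sec fits the on-disk format, -ERANGE if it does not. */
int fs_check_time(const struct fs_time_range *r, int64_t sec)
{
	assert(r->min_sec <= r->max_sec);
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;	/* hard fail instead of silently wrapping */
	return 0;
}
```

A caller would run the check before accepting a new timestamp and pass the error back to userspace instead of storing a mangled value.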

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Dave Chinner
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> > 
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp
> > 
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly
> > 
> 
> I'm still puzzled.
> 
> Are you saying that you want a program that does:
> 
>   /* Deliberately simplified */
>   gettimeofdayns( ...);
>   utimensat(... now);
> 
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),

Yes. Hard fail so overflows are in your face and we know exactly
what is going to cause silent timestamp screwups when the epoch

> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?

Filesystems are going to have to change their on-disk formats, so
we'd do that just like we do every other on-disk format change. With
feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted
out across the kernel and userspace code...

Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new filesystems
are guaranteed to be designed to handle 2038 epoch rollover, and so
in a year or two this "hard fail" is effectively a non-problem. If
someone breaks something in future, then we'll know about it pretty
quickly.

> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.

Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is
"shut down the filesystem" error handling. Kind of like using BUG()
rather than returning an error. That's why we need to be able to
hard fail and return an error.

However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the meantime,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...

> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.

Yup, that's a great way of flushing out problems 20 years before
they really matter.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Dave Chinner
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > > 
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > > 
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap.  Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> > 
> > I think having the new systems calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring
> 
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file systems
> epoch rather than wrap around.
> 
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > > 
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038.  However, I maintain the above still holds.
> > 
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
> 
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems at least times until 2106, because they treat
> the on-disk value as unsigned on 64-bit systems, or they use
> a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
> 
> The exceptions are:
> 
> * exofs uses signed values, which can probably be changed to be
>   consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
>   a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
>   16-bit year representation, but the code to convert to epoch
>   uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
>   through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
>   where they really use signed.
> 
> I was confused about XFS since I didn't noticed that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.

You've missed an awful lot more than just the implications for the
core kernel code.

There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.

IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...

Put simply, changing the structure of system time isn't as
straightforward as changing the kernel structures. System time gets stored
permanently, and that has a cascade effect through the kernel all
the way to all of the filesystem utilities that know about that permanent
storage in some way.

So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.

> If we are using the variant of my patch that extends
> indode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that we if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels. 

"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.

To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being 

Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Nicolas Pitre
On Sat, 31 May 2014, H. Peter Anvin wrote:

> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> > 
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp
> > 
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly
> > 
> 
> I'm still puzzled.
> 
> Are you saying that you want a program that does:
> 
>   /* Deliberately simplified */
>   gettimeofdayns( ...);
>   utimensat(... now);
> 
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps), or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?
> 
> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.

For those (legacy) filesystems with signed 32-bit timestamps, any
attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be
(silently) clamped to 0x7fffffff and that value (the last representable
time) used as an overflow indicator.  The filesystem driver should
convert that value into a corresponding overflow value for whatever
kernel internal time representation is being used when read back, and this
should be propagated up to user space.  It should not be a hard error;
otherwise, as you rightfully stated, everything non read-only would come
to a halt on that day.

Inside the kernel, the overflow indicator could be as simple as
dedicating one of the top bits in a 64-bit time_t value in order to still
transmit the overflow limit.  For example, in the above case, we could
use 0x40000000-7fffffff to indicate the actual time is unavailable due
to the filesystem's time representation being overflowed from
0x7fffffff.

If for example a filesystem cannot represent timestamps from Jan  1
00:00:00 2100 UTC then the overflow representation for this particular
filesystem would be 0x40000000-f48656ff.
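
The clamp-and-mark scheme described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `TIME_OVERFLOW_BIT`, `fs_clamp_sec()`, `fs_read_sec()` and `time_overflowed()` are hypothetical names, bit 62 is assumed as the marker (matching the 0x40000000 upper word in the examples), and underflow is not handled because the proposal does not discuss it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed marker: bit 62, i.e. 0x40000000 in the upper 32 bits. */
#define TIME_OVERFLOW_BIT (INT64_C(1) << 62)

/* Write path: clamp to the last second the filesystem can represent. */
int64_t fs_clamp_sec(int64_t sec, int64_t fs_max)
{
	assert(fs_max >= 0 && fs_max < TIME_OVERFLOW_BIT);
	return sec > fs_max ? fs_max : sec;
}

/*
 * Read path: the limit value doubles as the overflow indicator, so it
 * is reported with the marker bit set and the limit in the low bits.
 */
int64_t fs_read_sec(int64_t disk_sec, int64_t fs_max)
{
	assert(disk_sec <= fs_max);
	if (disk_sec == fs_max)
		return TIME_OVERFLOW_BIT | fs_max;
	return disk_sec;
}

/* Nonzero if a value read back carries the overflow marker. */
int time_overflowed(int64_t sec)
{
	return (sec & TIME_OVERFLOW_BIT) != 0;
}
```

With `fs_max` set to 0x7fffffff a clamped value reads back as 0x400000007fffffff, and with a 2100 limit of 0xf48656ff it reads back as 0x40000000f48656ff, which are the two values given in the text.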

Those syscalls with a 32-bit time_t would be returned 0x7fffffff
whenever there is an overflow being signaled.  Whether 64-bit
overflow-marked time_t values, when passed to user space, should clear
the overflow bit, or use a unique time_t overflow value, could be
decided and even changed later after discussion with glibc people for
example.

Hard errors should be signaled to user space, and the actual operation
aborted, only with the presence of a new flag passed to the kernel.
However, by default, things should "just work" albeit with the "wrong"
i.e. clamped time being saved on disk as much as possible otherwise.


Nicolas


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Arnd Bergmann
On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > 
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > happen,
> > 
> > Actually it is questionable if it is worse to reject a timestamp or just
> > let it wrap.  Rejecting a valid timestamp is a bit like "You don't
> > exist, go away."
> 
> I think having the new systems calls being able to
> return EINVAL if the value cannot be stored permanently on disk
> correctly is the right thing to do. Having it silently mangled
> by the filesystem and returning "everything is just fine, trust me"
> is close to the worst solution I can think of. That's exactly what
> leads to overflow bugs occurring

While going through the file systems, I was wondering whether
we should have the times stop at the end of each file system's
epoch rather than wrap around.

> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution will
> > > be different for every filesystem that needs to support time beyond
> > > 2038.
> > 
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038.  However, I maintain the above still holds.
> 
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.

In my list at http://kernelnewbies.org/y2038, I found that almost
all file systems at least support times until 2106, because they treat
the on-disk value as unsigned on 64-bit systems, or they use
a completely different representation. My guess is that somebody
earlier spent a lot of work on making that happen.
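
The difference between the two readings of a 32-bit on-disk seconds field can be sketched in a few lines of C. The function names are made up for this illustration and are not taken from any filesystem; the signed variant is written out arithmetically so it does not depend on implementation-defined casts.

```c
#include <assert.h>
#include <stdint.h>

/* Signed reading of the raw field: covers 1901-2038. */
int64_t decode_sec_signed(uint32_t raw)
{
	if (raw >= UINT32_C(0x80000000))
		return (int64_t)raw - INT64_C(0x100000000);
	return (int64_t)raw;
}

/* Unsigned reading of the same field: covers 1970-2106. */
int64_t decode_sec_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Self-check: the same bits land in 1901 or in 2038. */
void decode_demo(void)
{
	assert(decode_sec_signed(UINT32_C(0x7fffffff)) == INT64_C(2147483647));
	assert(decode_sec_signed(UINT32_C(0x80000000)) == -INT64_C(2147483648));
	assert(decode_sec_unsigned(UINT32_C(0x80000000)) == INT64_C(2147483648));
	/* Largest unsigned value, in February 2106. */
	assert(decode_sec_unsigned(UINT32_C(0xffffffff)) == INT64_C(4294967295));
}
```

Both readings agree up to 0x7fffffff, so switching an existing filesystem from one to the other only changes how values above that are interpreted.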

The exceptions are:

* exofs uses signed values, which can probably be changed to be
  consistent with the others.
* isofs has a bug that limits it until 2027 on architectures with
  a signed 'char' type (otherwise it's 2155).
* udf can represent times for many thousands of years through a
  16-bit year representation, but the code to convert to epoch
  uses a const array that ends at 2038.
* afs uses signed seconds and can probably be fixed
* coda relies on user space time representation getting passed
  through an ioctl.
* I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
  where they really use signed.
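
The isofs numbers above follow from simple arithmetic if the year is stored as a single byte counting years since 1900, which is the layout assumed here. The sketch below is not the isofs code; the function names are hypothetical and the signed case is modelled explicitly rather than with a `char` cast.

```c
#include <assert.h>

/* Year byte read as unsigned: 1900 + 255 = 2155 is the last year. */
int iso_year_unsigned(unsigned char raw)
{
	return 1900 + raw;
}

/* Same byte read through a signed 8-bit type (two's complement model). */
int iso_year_signed(unsigned char raw)
{
	int v = raw >= 128 ? (int)raw - 256 : (int)raw;

	return 1900 + v;
}

/* Self-check of the limits quoted in the list above. */
void iso_year_demo(void)
{
	assert(iso_year_signed(127) == 2027);	/* last year that still works */
	assert(iso_year_unsigned(255) == 2155);	/* limit with unsigned char */
	assert(iso_year_unsigned(128) == 2028);
	assert(iso_year_signed(128) == 1772);	/* 2028 read back wrongly */
}
```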

I was confused about XFS since I didn't notice that there are
separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
XFS to also use the 1970-2106 time range on 64-bit systems today.

If we are using the variant of my patch that extends
inode_time->tv_sec to s64, nothing should change for XFS
at all, the main difference is that if it gets extended
to wider on-disk timestamps, they will work the same way on
32-bit and 64-bit kernels.

Arnd


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread H. Peter Anvin
On 05/30/2014 10:54 PM, Dave Chinner wrote:
> 
> If we are changing the in-kernel timestamp to have a greater dynamic
> range that anything we current support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what they timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp
> 
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly
> 

I'm still puzzled.

Are you saying that you want a program that does:

/* Deliberately simplified */
gettimeofdayns( ...);
utimensat(... now);

... to suddenly start failing on Jan 19, 2038 (for a filesystem with
32-bit timestamps), or would you propose some ways for the filesystems
in question to extend the range of the timestamps?

What you seem to propose also seems to imply that on Jan 19, 2038
anything that writes a timestamp with the current date (which logically
ends up being almost every write operation) would be dead and frozen on
such a filesystem -- pretty much meaning the filesystem would become
readonly if not in reality then in practice.

I strongly suspect that that would be a more catastrophic failure than
incorrect timestamps, as you suddenly have all kinds of machines
embedded in $DEITY knows what places just stop and refuse to run.

If that is not what you mean I would genuinely like to understand the
situation better.

-hpa



Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Dave Chinner
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > >
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that is can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > >
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap.  Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> >
> > I think having the new systems calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring
>
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file system's
> epoch rather than wrap around.
>
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > >
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038.  However, I maintain the above still holds.
> >
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
>
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems can at least represent times until 2106, because
> they treat the on-disk value as unsigned on 64-bit systems, or they
> use a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
>
> The exceptions are:
>
> * exofs uses signed values, which can probably be changed to be
>   consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
>   a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
>   16-bit year representation, but the code to convert to epoch
>   uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
>   through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
>   where they really use signed.
>
> I was confused about XFS since I didn't notice that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.
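
As a rough illustration of where the 1901-2038 and 1970-2106 ranges quoted above come from, the sketch below converts the extreme values of a signed and an unsigned 32-bit seconds counter into calendar years. The helper name year_from_unix() is made up for this example, and the day-count arithmetic is a generic proleptic Gregorian conversion, not code from the kernel or from any filesystem.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the (UTC) calendar year that contains
 * the given number of seconds since 1970-01-01 00:00:00.
 * Uses a standard days-to-civil-date conversion on 400-year eras.
 */
static int64_t year_from_unix(int64_t sec)
{
	int64_t days = sec / 86400;

	if (sec % 86400 < 0)
		days--;				/* floor division for negative times */

	int64_t z = days + 719468;		/* shift epoch to 0000-03-01 */
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	int64_t doe = z - era * 146097;		/* day of era, 0..146096 */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
	int64_t mp = (5 * doy + 2) / 153;	/* month index, March = 0 */
	int64_t month = mp < 10 ? mp + 3 : mp - 9;

	/* January and February belong to the following civil year. */
	return yoe + era * 400 + (month <= 2 ? 1 : 0);
}
```

Feeding it INT32_MIN and INT32_MAX gives the signed range (1901 to 2038), and UINT32_MAX gives the end of the unsigned range (2106).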

You've missed an awful lot more than just the implications for the
core kernel code.

There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.

IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...

Put simply, changing the structure of system time isn't as
straightforward as changing the kernel structures. System time gets
stored permanently, and that has a cascade effect through the kernel
all the way to all of the filesystem utilities that know about that
permanent storage in some way.

So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.

> If we are using the variant of my patch that extends
> inode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.

"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.

To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being passed in. There is no excuse for silently mangling
out-of-range data, especially as we have plenty of time to add
support to 

Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-31 Thread Dave Chinner
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range that anything we current support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what they timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
>   /* Deliberately simplified */
>   gettimeofdayns(now ...);
>   utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),

Yes. Hard fail so overflows are in your face and we know exactly
what is going to cause silent timestamp screwups when the epoch

> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?

Filesystems are going to have to change their on-disk formats, so
we'd do that just like we do every other on-disk format change. With
feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted
out across the kernel and userspace code...

Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new filesystems
are guaranteed to be designed to handle 2038 epoch rollover, and so
in a year or two this hard fail is effectively a non-problem. If
someone breaks something in future, then we'll know about it pretty
quickly.

> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality then in practice.

Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is
"shut down the filesystem" error handling. Kind of like using BUG()
rather than returning an error. That's why we need to be able to
hard fail and return an error.

However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the mean time,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...

> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.

Yup, that's a great way of flushing out problems 20 years before
they really matter.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread Dave Chinner

[ Please don't top post. ]

On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote:
> On May 30, 2014 6:14:50 PM PDT, Dave Chinner <da...@fromorbit.com> wrote:
> >On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> >> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >> > 
> >> > IOWs, the filesystem has to be able to reject any attempt to
> >> > set a timestamp that is can't represent on disk otherwise Bad
> >> > Stuff will happen,
> >> 
> >> Actually it is questionable if it is worse to reject a
> >> timestamp or
> >just
> >> let it wrap.  Rejecting a valid timestamp is a bit like "You
> >> don't exist, go away."
> >
> >I think having the new systems calls being able to return EINVAL
> >if the value cannot be stored permanently on disk correctly is
> >the right thing to do. Having it silently mangled by the
> >filesystem and returning "everything is just fine, trust me" is
> >close to the worst solution I can think of. That's exactly what
> >leads to overflow bugs occurring
> >
> >> > and filesystems have to be able to specify in their on disk
> >> > format what timestamp encoding is being used. The solution
> >will
> >> > be different for every filesystem that needs to support time
> >> > beyond 2038.
> >> 
> >> Actually the cutoff can be really different for each
> >> filesystem, not necessarily 2038.  However, I maintain the
> >> above still holds.
> >
> >Sure, but all filesystems are supposed to handle at least the
> >current unix epoch.
> >
> >> Consider a filesystem that kept timestamps in YYMMDDHHMMSS
> >> format. 
> >What
> >> would you have expected such a filesystem to do on Jan 1, 2000?
> >
> >Strawman.
> >
> >We don't need to cater for fundamentally broken designs that
> >can't even handle the current unix epoch correctly. If such
> >filesystems exist, then they can simple say "original unix epoch
> >support only" and do whatever crap they are doing right now.
>
> No, not a strawman.  Replace with Jan 26, 2038 and you have the
> same situation.

But that's not the problem I'm talking about.  The problem isn't the
roll-over date of the epoch - the problem is that we're changing the
in-memory meaning of time without changing what the filesystems
store on disk or how they translate them.

To use your example, what I'm actually talking about is the kernel
switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on
disk. The filesystem doesn't know the timestamp is now a different
format, so it could mangle it writing it to disk, or it could mangle
existing timestamps in the YY.. format reading them from disk and
putting them into CC.. format structures. IOWs, it will
incorrectly translate YY format dates to CC format, or translate
something in the CC format as though it was in YY format. And it
wouldn't even know what was the correct format because there's
nothing telling it on disk whether the date is in CC or YY format.

Either way, you get mangled timestamps, the filesystem doesn't know
about it because it's just storing what the kernel gives it, the
kernel thinks they are fine because they are just opaque when read
back, but the user says "what the fuck did a reboot do to all these
timestamps?".

Hence your example of roll-over dates is a strawman - you've
constructed a problem that is irrelevant to the issue being pointed
out.

FWIW, we already have code in the superblock and VFS to avoid such
problems on filesystems with limited timestamp resolution (i.e
s_time_gran and current_fs_time()) so that what the VFS hands the
filesystem is exactly what the VFS expects to get back from disk
when comparing timestamps.
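
To make the granularity idea concrete, here is a minimal user-space sketch of rounding a timestamp down to a filesystem's resolution before it is handed over, so that the value kept in memory matches what can be read back from disk. The type and function names (struct ts64, trunc_to_gran) are invented for illustration and only mirror the idea behind s_time_gran; this is not the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical in-memory timestamp: 64-bit seconds plus nanoseconds. */
struct ts64 {
	int64_t sec;
	int32_t nsec;		/* assumed to be in 0..999999999 */
};

/*
 * Round the nanosecond part down to a multiple of gran_ns.
 * gran_ns == 1 keeps full nanosecond resolution, gran_ns == 1000000000
 * means the filesystem only stores whole seconds.
 */
static struct ts64 trunc_to_gran(struct ts64 t, uint32_t gran_ns)
{
	if (gran_ns >= 1000000000u)
		t.nsec = 0;
	else if (gran_ns > 1)
		t.nsec -= t.nsec % (int32_t)gran_ns;
	return t;
}
```

A range limit would be the same kind of per-superblock constraint, applied to the seconds field instead of the nanoseconds field.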

If we are changing the in-kernel timestamp to have a greater dynamic
range than anything we currently support on disk, then we need support
for all filesystems for similar translation and constraint. The
filesystems need to be able to tell the kernel what timestamp
range they support, and then the kernel needs to follow those
guidelines. And if the filesystem is mounted on a kernel that
doesn't support the current filesystem's timestamp format, then at
minimum that filesystem cannot do anything that writes a
timestamp

Put simply: the filesystem defines the timestamp range that can be
used safely, not the userspace API. If the filesystem can't support
the date it is handed then that is an out-of-range error. Since
when have we accepted that it's OK to handle out-of-range data with
silent overflows or corruption of the data that we are attempting to
store? We're defining a new API to support a wider date range -
there is nothing that prevents us from saying ERANGE can be returned
to a timestamp that the file cannot store correctly

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread H. Peter Anvin
No, not a strawman.  Replace with Jan 26, 2038 and you have the same situation.

On May 30, 2014 6:14:50 PM PDT, Dave Chinner <da...@fromorbit.com> wrote:
>On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
>> On 05/30/2014 05:37 PM, Dave Chinner wrote:
>> > 
>> > IOWs, the filesystem has to be able to reject any attempt to set a
>> > timestamp that is can't represent on disk otherwise Bad Stuff will
>> > happen,
>> 
>> Actually it is questionable if it is worse to reject a timestamp or
>just
>> let it wrap.  Rejecting a valid timestamp is a bit like "You don't
>> exist, go away."
>
>I think having the new systems calls being able to
>return EINVAL if the value cannot be stored permanently on disk
>correctly is the right thing to do. Having it silently mangled
>by the filesystem and returning "everything is just fine, trust me"
>is close to the worst solution I can think of. That's exactly what
>leads to overflow bugs occurring
>
>> > and filesystems have to be able to specify in their on
>> > disk format what timestamp encoding is being used. The solution
>will
>> > be different for every filesystem that needs to support time beyond
>> > 2038.
>> 
>> Actually the cutoff can be really different for each filesystem, not
>> necessarily 2038.  However, I maintain the above still holds.
>
>Sure, but all filesystems are supposed to handle at least the
>current unix epoch.
>
>> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. 
>What
>> would you have expected such a filesystem to do on Jan 1, 2000?
>
>Strawman.
>
>We don't need to cater for fundamentally broken designs that can't
>even handle the current unix epoch correctly. If such filesystems
>exist, then they can simple say "original unix epoch support only"
>and do whatever crap they are doing right now.
>
>Cheers,
>
>Dave.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread Dave Chinner
On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > 
> > IOWs, the filesystem has to be able to reject any attempt to set a
> > timestamp that is can't represent on disk otherwise Bad Stuff will
> > happen,
> 
> Actually it is questionable if it is worse to reject a timestamp or just
> let it wrap.  Rejecting a valid timestamp is a bit like "You don't
> exist, go away."

I think having the new system calls being able to
return EINVAL if the value cannot be stored permanently on disk
correctly is the right thing to do. Having it silently mangled
by the filesystem and returning "everything is just fine, trust me"
is close to the worst solution I can think of. That's exactly what
leads to overflow bugs occurring

> > and filesystems have to be able to specify in their on
> > disk format what timestamp encoding is being used. The solution will
> > be different for every filesystem that needs to support time beyond
> > 2038.
> 
> Actually the cutoff can be really different for each filesystem, not
> necessarily 2038.  However, I maintain the above still holds.

Sure, but all filesystems are supposed to handle at least the
current unix epoch.

> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format.  What
> would you have expected such a filesystem to do on Jan 1, 2000?

Strawman.

We don't need to cater for fundamentally broken designs that can't
even handle the current unix epoch correctly. If such filesystems
exist, then they can simply say "original unix epoch support only"
and do whatever crap they are doing right now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread H. Peter Anvin
On 05/30/2014 05:37 PM, Dave Chinner wrote:
> 
> IOWs, the filesystem has to be able to reject any attempt to set a
> timestamp that is can't represent on disk otherwise Bad Stuff will
> happen,

Actually it is questionable if it is worse to reject a timestamp or just
let it wrap.  Rejecting a valid timestamp is a bit like "You don't
exist, go away."

> and filesystems have to be able to specify in their on
> disk format what timestamp encoding is being used. The solution will
> be different for every filesystem that needs to support time beyond
> 2038.

Actually the cutoff can be really different for each filesystem, not
necessarily 2038.  However, I maintain the above still holds.

Consider a filesystem that kept timestamps in YYMMDDHHMMSS format.  What
would you have expected such a filesystem to do on Jan 1, 2000?

-hpa




Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread Dave Chinner
On Fri, May 30, 2014 at 10:01:35PM +0200, Arnd Bergmann wrote:
> xfs uses unsigned 32-bit seconds for inode timestamps, which will work
> for the next 92 years, but the VFS uses struct timespec for timestamps,
> which is only good until 2038 on 32-bit CPUs.
> 
> This gets us one small step closer to lifting the VFS limit by using
> struct inode_time in XFS.
> 
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> Cc: Dave Chinner <da...@fromorbit.com>
> Cc: x...@oss.sgi.com
> ---
>  fs/xfs/time.h            | 4 ++--
>  fs/xfs/xfs_inode.c       | 2 +-
>  fs/xfs/xfs_iops.c        | 2 +-
>  fs/xfs/xfs_trans_inode.c | 6 +++---
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/time.h b/fs/xfs/time.h
> index 387e695..a490f1b 100644
> --- a/fs/xfs/time.h
> +++ b/fs/xfs/time.h
> @@ -21,14 +21,14 @@
>  #include <linux/sched.h>
>  #include <linux/time.h>
>  
> -typedef struct timespec timespec_t;
> +typedef struct inode_time timespec_t;
>  
>  static inline void delay(long ticks)
>  {
>   schedule_timeout_uninterruptible(ticks);
>  }
>  
> -static inline void nanotime(struct timespec *tvp)
> +static inline void nanotime(struct inode_time *tvp)
>  {
>   *tvp = CURRENT_TIME;
>  }
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a6115fe..16d5392 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -654,7 +654,7 @@ xfs_ialloc(
>  	xfs_inode_t	*ip;
>  	uint		flags;
>  	int		error;
> -	timespec_t	tv;
> +	struct inode_time tv;
>  
>   /*
>* Call the space management code to pick
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 205613a..092ee7c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -956,7 +956,7 @@ xfs_vn_setattr(
>  STATIC int
>  xfs_vn_update_time(
>  	struct inode		*inode,
> -	struct timespec		*now,
> +	struct inode_time	*now,
>  	int			flags)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 50c3f56..bae2520 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -70,7 +70,7 @@ xfs_trans_ichgtime(
>   int flags)
>  {
>  	struct inode	*inode = VFS_I(ip);
> -	timespec_t	tv;
> +	struct inode_time	tv;
>  
>   ASSERT(tp);
>   ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> @@ -78,13 +78,13 @@ xfs_trans_ichgtime(
>   tv = current_fs_time(inode->i_sb);
>  
>  	if ((flags & XFS_ICHGTIME_MOD) &&
> -	    !timespec_equal(&inode->i_mtime, &tv)) {
> +	    !inode_time_equal(&inode->i_mtime, &tv)) {
>   inode->i_mtime = tv;
>   ip->i_d.di_mtime.t_sec = tv.tv_sec;
>   ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
>   }

The problem I see here is that the code is now potentially stuffing
a variable that is larger than 32 bits into an on-disk structure
that is only 32 bits in size.  You can't just change the in-memory
representation of inode timestamps and expect the problem to be
fixed - this just pushes the problem down a layer without any
infrastructure allowing filesystems to handle storage of the new
timestamp format sanely.
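
A small user-space sketch of that failure mode, assuming a signed 32-bit on-disk seconds field. The struct and function names below (disk_time32, store_unchecked, load) are hypothetical stand-ins, not the XFS definitions; the point is only that a plain assignment keeps the low 32 bits, so the value read back after the inode leaves the cache differs from the one that was set.

```c
#include <stdint.h>

/* Hypothetical on-disk timestamp with a signed 32-bit seconds field. */
struct disk_time32 {
	int32_t t_sec;
	int32_t t_nsec;
};

/*
 * Store without any range check: only the low 32 bits of the 64-bit
 * seconds value survive. The uint32_t -> int32_t conversion is
 * implementation-defined in C; it wraps on the usual two's-complement
 * compilers such as GCC and Clang.
 */
static void store_unchecked(struct disk_time32 *d, int64_t sec, int32_t nsec)
{
	d->t_sec = (int32_t)(uint32_t)sec;
	d->t_nsec = nsec;
}

/* What a later read from disk would hand back to the VFS. */
static int64_t load(const struct disk_time32 *d)
{
	return d->t_sec;
}
```

Storing 2147483648 (one second past the signed 32-bit limit) and loading it again yields -2147483648 on such compilers, i.e. a date in 1901 rather than 2038, and nothing reports an error.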

IOWs, the filesystem has to be able to reject any attempt to set a
timestamp that it can't represent on disk, otherwise Bad Stuff will
happen, and filesystems have to be able to specify in their on
disk format what timestamp encoding is being used. The solution will
be different for every filesystem that needs to support time beyond
2038.

Hence I think you are going to need superblock flags and/or
variables to indicate the epoch range the filesystem can support.
Then the filesystems need conversion functions from whatever the
internal VFS timestamp representation is to whatever their on-disk
format is, and only then can we switch the VFS to using a new
timestamp format.

At that point, filesystem developers can make the changes they need
to the on-disk format to support timestamps beyond 2038, and all
they need to do at the VFS layer is set the "supported range" fields
appropriately in the VFS superblock...
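
One possible shape for such a "supported range" field and conversion function, sketched in user-space C. Everything here is an assumption made for illustration: struct fs_time_range, signed32_range and encode_sec32() are invented names, and the choice of -ERANGE as the error is just one of the candidates discussed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, e.g. filled in at mount time. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Limits for a filesystem that stores signed 32-bit seconds. */
static const struct fs_time_range signed32_range = { INT32_MIN, INT32_MAX };

/*
 * Convert an in-memory 64-bit seconds value to a signed 32-bit on-disk
 * field, failing instead of wrapping. Only meaningful for ranges that
 * fit inside int32_t. Returns 0 on success or -ERANGE, and leaves *out
 * untouched on failure.
 */
static int encode_sec32(const struct fs_time_range *r, int64_t sec,
			int32_t *out)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;
	*out = (int32_t)sec;
	return 0;
}
```

A filesystem that later widens its on-disk format would only need to publish larger limits and supply a matching encoder; callers that check the return value keep working unchanged.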

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [RFC 11/32] xfs: convert to struct inode_time

2014-05-30 Thread Dave Chinner

[ Please don't top post. ]

On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote:
 On May 30, 2014 6:14:50 PM PDT, Dave Chinner da...@fromorbit.com wrote:
 On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
  On 05/30/2014 05:37 PM, Dave Chinner wrote:
   
   IOWs, the filesystem has to be able to reject any attempt to
   set a timestamp that is can't represent on disk otherwise Bad
   Stuff will happen,
  
  Actually it is questionable if it is worse to reject a
  timestamp or just let it wrap.  Rejecting a valid timestamp is
  a bit like "You don't exist, go away."
 
 I think having the new system calls being able to return EINVAL
 if the value cannot be stored permanently on disk correctly is
 the right thing to do. Having it silently mangled by the
 filesystem and returning "everything is just fine, trust me" is
 close to the worst solution I can think of. That's exactly what
 leads to overflow bugs occurring
 
   and filesystems have to be able to specify in their on disk
   format what timestamp encoding is being used. The solution
   will be different for every filesystem that needs to support
   time beyond 2038.
  
  Actually the cutoff can be really different for each
  filesystem, not necessarily 2038.  However, I maintain the
  above still holds.
 
 Sure, but all filesystems are supposed to handle at least the
 current unix epoch.
 
  Consider a filesystem that kept timestamps in YYMMDDHHMMSS
  format.  What would you have expected such a filesystem to do
  on Jan 1, 2000?
 
 Strawman.
 
 We don't need to cater for fundamentally broken designs that
 can't even handle the current unix epoch correctly. If such
 filesystems exist, then they can simply say "original unix epoch
 support only" and do whatever crap they are doing right now.

 No, not a strawman.  Replace with Jan 26, 2038 and you have the
 same situation.

But that's not the problem I'm talking about.  The problem isn't the
roll-over date of the epoch - the problem is that we're changing the
in-memory meaning of time without changing what the filesystems
store on disk or how they translate them.

To use your example, what I'm actually talking about is the kernel
switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on
disk. The filesystem doesn't know the timestamp is now a different
format, so it could mangle it writing it to disk, or it could mangle
existing timestamps in the YY.. format reading them from disk and
putting them into CC.. format structures. IOWs, it will
incorrectly translate YY format dates to CC format, or translate
something in the CC format as though it was in YY format. And it
wouldn't even know what was the correct format because there's
nothing telling it on disk whether the date is in CC or YY format.
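
A rough sketch of that mismatch, for illustration only. The decimal
packing and both function names are made up for this example and do
not correspond to any real filesystem or kernel interface:

```c
#include <stdint.h>

/* Ten decimal digits hold MMDDHHMMSS; everything above them is the year. */
#define YEAR_DIVISOR 10000000000ULL

/* Decoder that assumes the on-disk value is YYMMDDHHMMSS (1900-based). */
static int decode_year_yy(uint64_t stamp)
{
	return 1900 + (int)(stamp / YEAR_DIVISOR);
}

/* Decoder that assumes the on-disk value is CCYYMMDDHHMMSS. */
static int decode_year_ccyy(uint64_t stamp)
{
	return (int)(stamp / YEAR_DIVISOR);
}
```

Feeding the YY value 991231235959 to decode_year_ccyy() yields year
99, and feeding the CCYY value 20140530181450 to decode_year_yy()
yields year 3914. Neither decoder can detect that it was handed the
other layout, which is why the format has to be recorded on disk.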

Either way, you get mangled timestamps, the filesystem doesn't know
about it because it's just storing what the kernel gives it, the
kernel thinks they are fine because they are just opaque when read
back, but the user says "what the fuck did a reboot do to all these
timestamps?".

Hence your example of roll-over dates is a strawman - you've
constructed a problem that is irrelevant to the issue being pointed
out.

FWIW, we already have code in the superblock and VFS to avoid such
problems on filesystems with limited timestamp resolution (i.e.
s_time_gran and current_fs_time()) so that what the VFS hands the
filesystem is exactly what the VFS expects to get back from disk
when comparing timestamps.
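
The idea behind that granularity handling can be sketched in
userspace roughly as follows. This is not the kernel's code: struct
ts_sketch and trunc_to_gran() are hypothetical names, and the only
assumption is that the granularity is expressed in nanoseconds:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for an in-memory timestamp. */
struct ts_sketch {
	int64_t sec;
	long nsec;
};

/* Round t down to what a filesystem with the given granularity can keep. */
static struct ts_sketch trunc_to_gran(struct ts_sketch t, long gran)
{
	if (gran >= NSEC_PER_SEC)
		t.nsec = 0;			/* whole-second resolution */
	else if (gran > 1)
		t.nsec -= t.nsec % gran;	/* drop the sub-granule part */
	return t;				/* gran <= 1: nothing to drop */
}
```

Truncating before the value reaches the filesystem means that the
same value comes back after a round trip to disk, so later
comparisons against the in-memory copy do not spuriously differ.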

If we are changing the in-kernel timestamp to have a greater dynamic
range than anything we currently support on disk, then we need support
for all filesystems for similar translation and constraint. The
filesystems need to be able to tell the kernel what timestamp
range they support, and then the kernel needs to follow those
guidelines. And if the filesystem is mounted on a kernel that
doesn't support the current filesystem's timestamp format, then at
minimum that filesystem cannot do anything that writes a timestamp.

Put simply: the filesystem defines the timestamp range that can be
used safely, not the userspace API. If the filesystem can't support
the date it is handed then that is an out-of-range error. Since
when have we accepted that it's OK to handle out-of-range data with
silent overflows or corruption of the data that we are attempting to
store? We're defining a new API to support a wider date range -
there is nothing that prevents us from saying ERANGE can be returned
for a timestamp that the filesystem cannot store correctly.
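
A minimal sketch of such a check, assuming each filesystem can
advertise the first and last second it is able to store. The
structure, its fields and the function name are hypothetical and
only illustrate the idea:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since the epoch. */
struct fs_time_range {
	int64_t min_sec;	/* earliest second the format can store */
	int64_t max_sec;	/* latest second the format can store */
};

/* Return 0 if sec fits the on-disk format, -ERANGE if it does not. */
static int check_timestamp(const struct fs_time_range *r, int64_t sec)
{
	if (sec < r->min_sec || sec > r->max_sec)
		return -ERANGE;	/* refuse instead of silently wrapping */
	return 0;
}
```

A filesystem with signed 32 bit seconds on disk would set the limits
to INT32_MIN and INT32_MAX, and a request one second past INT32_MAX
would then fail with -ERANGE rather than wrap.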

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com