Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, 2014-06-02 at 19:32 -0400, Theodore Ts'o wrote:
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }

get_seconds() returns an unsigned long so there's potential for overflow
here.

--
Roger

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Tuesday 03 June 2014 18:41:30 Dave Chinner wrote: > On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote: > > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > My patch set > > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > > > more like 64-bit kernels regarding inode time stamps, which does > > > > impact all the file systems that the a 64-bit time or the NFS > > > > unsigned epoch (1970-2106), while your patch extends the file > > > > system internal epoch (1901-2038 for XFS) so it can be used by > > > > anything that knows how to handle larger than 32-bit second values > > > > (either 64-bit kernel or 32-bit with inode_time patch). > > > > > > Right, but the issue is that 64 bit second counters are broken right > > > now because most filesystems can't support more than 32 bit values. > > > So it doesn't matter whether it's 32 bit or 64 bit machines, just > > > adding explicit support for >32 bit second counters without doing > > > anything else just extends that brokenness into the indefinite > > > future. > > > > Of course, "most filesystems" are obsolete, and most of the modern > > file systems already support >32 bit timestamps: ext4, btrfs, cifs, > > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else > > except xfs, ext2/3 and exofs uses the nfsv3 interpretation on > > 64-bit systems, which interprets time stamps with the high bit > > set as years 2038-2106 rather than 1903-1969. > > I'm not sure that's an entirely correct representation - the > remainder of the 32 bit-only timestamp filesystems don't actively > interpret the time stamp at all - it's just an opaque 32 bit value. 
> hence the interpretation of the value is dependent on whether the
> kernel treats it as signed or unsigned

As I mentioned elsewhere in the thread, I don't think the way it's handled
is intentional, but it's definitely the file system code that does the
assignment to the timeval and decides on the interpretation, doing
either

	inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode.mtime);

or

	inode->i_mtime.tv_sec = le32_to_cpu(raw_inode.mtime);

	Arnd
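To make the difference between those two assignments concrete, here is a minimal userspace sketch (not kernel code). The helper names are hypothetical, the raw value is assumed to be already converted to CPU byte order, and tv_sec is assumed to be 64 bits wide.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: widen a raw 32-bit on-disk seconds value to a
 * 64-bit tv_sec. Both are illustrations, not code from any filesystem.
 */

/*
 * Signed interpretation: a value with the high bit set becomes negative,
 * i.e. a date before 1970. The uint32_t -> int32_t conversion is
 * implementation-defined in C, but wraps on the usual two's complement
 * targets.
 */
static int64_t secs_from_disk_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

/*
 * Unsigned interpretation: a value with the high bit set stays positive,
 * i.e. a date between 2038 and 2106.
 */
static int64_t secs_from_disk_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

Both helpers agree for raw values up to 0x7fffffff. For 0x80000000 the signed variant yields -2147483648 and the unsigned one yields 2147483648, which is the ambiguity being discussed.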
Re: [RFC 11/32] xfs: convert to struct inode_time
On Tue, Jun 03, 2014 at 09:33:36AM +0200, Arnd Bergmann wrote: > On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > My patch set > > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > > more like 64-bit kernels regarding inode time stamps, which does > > > impact all the file systems that the a 64-bit time or the NFS > > > unsigned epoch (1970-2106), while your patch extends the file > > > system internal epoch (1901-2038 for XFS) so it can be used by > > > anything that knows how to handle larger than 32-bit second values > > > (either 64-bit kernel or 32-bit with inode_time patch). > > > > Right, but the issue is that 64 bit second counters are broken right > > now because most filesystems can't support more than 32 bit values. > > So it doesn't matter whether it's 32 bit or 64 bit machines, just > > adding explicit support for >32 bit second counters without doing > > anything else just extends that brokenness into the indefinite > > future. > > Of course, "most filesystems" are obsolete, and most of the modern > file systems already support >32 bit timestamps: ext4, btrfs, cifs, > f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else > except xfs, ext2/3 and exofs uses the nfsv3 interpretation on > 64-bit systems, which interprets time stamps with the high bit > set as years 2038-2106 rather than 1903-1969. I'm not sure that's an entirely correct representation - the remainder of the 32 bit-only timestamp filesystems don't actively interpret the time stamp at all - it's just an opaque 32 bit value. 
Hence the interpretation of the value is dependent on whether the
kernel treats it as signed or unsigned.

> > If we don't fix it now (i.e in the new user API and supporting
> > infrastructure), then we'll *never be able to fix it* and we'll be
> > stuck with timestamps that do really weird things when you pass
> > arbitrary future dates to the kernel.
>
> We already have that. I agree it's fixable and we should fix it,
> but I don't see how this is different from what we had 20 years
> ago when Linux on Alpha first introduced a 64-bit time_t. It's
> been this way on every 64-bit Linux system since.

I see it differently: we've got 20 years more experience than when
the 64 bit time_t was introduced. That experience tells us that best
practices for API design are to range check every input to prevent
unintended side effects from occurring due to out-of-range data.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
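As a rough illustration of the range checking argued for here, the sketch below validates a 64-bit seconds value against the range a filesystem can store. The struct and function names are hypothetical and do not come from the patch set; the two example ranges are the signed and unsigned 32-bit epochs mentioned in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the range of seconds a filesystem can store on disk. */
struct fs_time_range {
	int64_t min_sec;
	int64_t max_sec;
};

/* Signed 32-bit seconds, roughly 1901-2038. */
static const struct fs_time_range signed32_range = { INT32_MIN, INT32_MAX };

/* Unsigned 32-bit seconds, 1970-2106 (the NFSv3 interpretation). */
static const struct fs_time_range unsigned32_range = { 0, UINT32_MAX };

/* Return true if sec can be stored without wrapping or truncation. */
static bool fs_time_in_range(const struct fs_time_range *r, int64_t sec)
{
	return sec >= r->min_sec && sec <= r->max_sec;
}
```

A caller would check the incoming timestamp before storing it and reject or clamp anything outside the range; which of the two is the right policy is exactly what the thread leaves open.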
Re: [RFC 11/32] xfs: convert to struct inode_time
On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: > On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > > > all file systems at least times until 2106, because they treat > > > > > the on-disk value as unsigned on 64-bit systems, or they use > > > > > a completely different representation. My guess is that somebody > > > > > earlier spent a lot of work on making that happen. > > > > > > > > > > The exceptions are: > > > > > > > > > > * exofs uses signed values, which can probably be changed to be > > > > > consistent with the others. > > > > > * isofs has a bug that limits it until 2027 on architectures with > > > > > a signed 'char' type (otherwise it's 2155). > > > > > * udf can represent times for many thousands of years through a > > > > > 16-bit year representation, but the code to convert to epoch > > > > > uses a const array that ends at 2038. > > > > > * afs uses signed seconds and can probably be fixed > > > > > * coda relies on user space time representation getting passed > > > > > through an ioctl. > > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > > > where they really use signed. > > > > > > > > > > I was confused about XFS since I didn't noticed that there are > > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > > > XFS to also use the 1970-2106 time range on 64-bit systems today. > > > > > > > > You've missed an awful lot more than just the implications for the > > > > core kernel code. 
> > > > > > > > There's a good chance such changes propagate to APIs elsewhere in > > > > the filesystems, because something you haven't realised is that XFS > > > > effectively exposes the on-disk timestamp format directly to > > > > userspace via the bulkstat interface (see struct xfs_bstat). It also > > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > > > by the online defragmenter. > > > > I really didn't look at them at all, as ioctl is very late on my > > mental list of things to change. I do realize that a lot of drivers > > and file systems do have ioctls that pass time values and we need to > > address them one by one. > > > > I just looked at the ioctls you mentioned but don't see how open-by-handle > > is affected by this. Can you point me to what you mean? > > Sorry, I misremembered how some of the XFS open-by-handle code works > in userspace (XFS has a pretty rich open-by-handle ioctl() interface > that predates the kernel syscalls by at least 10 years). Basically > there is code in userspace that uses the information returned from > bulkstat to construct file handles to pass to the open-by-handle > ioctls. xfs_fsr then uses the combination of open-by-handle from the > bulkstat output and the bulkstat output to feed into the swap extent > ioctls > > i.e. the filesystem's idea of what time is is passed to userspace as > an opaque cookie in this case, but it is not used directly by the > open-by-handle interfaces like I implied it was. Ok, I see. > > My patch set > > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > > more like 64-bit kernels regarding inode time stamps, which does > > impact all the file systems that the a 64-bit time or the NFS > > unsigned epoch (1970-2106), while your patch extends the file > > system internal epoch (1901-2038 for XFS) so it can be used by > > anything that knows how to handle larger than 32-bit second values > > (either 64-bit kernel or 32-bit with inode_time patch). 
> > Right, but the issue is that 64 bit second counters are broken right > now because most filesystems can't support more than 32 bit values. > So it doesn't matter whether it's 32 bit or 64 bit machines, just > adding explicit support for >32 bit second counters without doing > anything else just extends that brokenness into the indefinite > future. Of course, "most filesystems" are obsolete, and most of the modern file systems already support >32 bit timestamps: ext4, btrfs, cifs, f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. Everything else except xfs, ext2/3 and exofs uses the nfsv3 interpretation on 64-bit systems, which interprets time stamps with the high bit set as years 2038-2106 rather than 1903-1969. > If we don't fix it now (i.e in the new user API and supporting > infrastructure), then we'll *never be able to fix it* and we'll be > stuck with timestamps that do really weird things when you pass > arbitrary future dates to the kernel. We already have that. I agree it's fixable and we should fix it, but I don't see how this is different from what we had 20 years ago when Linux on Alpha first introduced a 64-bit time_t. It's been this way on every 64-bit
Re: [RFC 11/32] xfs: convert to struct inode_time
On Tuesday 03 June 2014 10:32:27 Dave Chinner wrote: On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: On Monday 02 June 2014 10:28:22 Dave Chinner wrote: On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: In my list at http://kernelnewbies.org/y2038, I found that almost all file systems at least times until 2106, because they treat the on-disk value as unsigned on 64-bit systems, or they use a completely different representation. My guess is that somebody earlier spent a lot of work on making that happen. The exceptions are: * exofs uses signed values, which can probably be changed to be consistent with the others. * isofs has a bug that limits it until 2027 on architectures with a signed 'char' type (otherwise it's 2155). * udf can represent times for many thousands of years through a 16-bit year representation, but the code to convert to epoch uses a const array that ends at 2038. * afs uses signed seconds and can probably be fixed * coda relies on user space time representation getting passed through an ioctl. * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, where they really use signed. I was confused about XFS since I didn't noticed that there are separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected XFS to also use the 1970-2106 time range on 64-bit systems today. You've missed an awful lot more than just the implications for the core kernel code. There's a good chance such changes propagate to APIs elsewhere in the filesystems, because something you haven't realised is that XFS effectively exposes the on-disk timestamp format directly to userspace via the bulkstat interface (see struct xfs_bstat). It also affects the XFS open-by-handle ioctl and the swap extent ioctl used by the online defragmenter. I really didn't look at them at all, as ioctl is very late on my mental list of things to change. 
I do realize that a lot of drivers and file systems do have ioctls that pass time values and we need to address them one by one. I just looked at the ioctls you mentioned but don't see how open-by-handle is affected by this. Can you point me to what you mean? Sorry, I misremembered how some of the XFS open-by-handle code works in userspace (XFS has a pretty rich open-by-handle ioctl() interface that predates the kernel syscalls by at least 10 years). Basically there is code in userspace that uses the information returned from bulkstat to construct file handles to pass to the open-by-handle ioctls. xfs_fsr then uses the combination of open-by-handle from the bulkstat output and the bulkstat output to feed into the swap extent ioctls i.e. the filesystem's idea of what time is is passed to userspace as an opaque cookie in this case, but it is not used directly by the open-by-handle interfaces like I implied it was. Ok, I see. My patch set (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave more like 64-bit kernels regarding inode time stamps, which does impact all the file systems that the a 64-bit time or the NFS unsigned epoch (1970-2106), while your patch extends the file system internal epoch (1901-2038 for XFS) so it can be used by anything that knows how to handle larger than 32-bit second values (either 64-bit kernel or 32-bit with inode_time patch). Right, but the issue is that 64 bit second counters are broken right now because most filesystems can't support more than 32 bit values. So it doesn't matter whether it's 32 bit or 64 bit machines, just adding explicit support for >32 bit second counters without doing anything else just extends that brokenness into the indefinite future. Of course, "most filesystems" are obsolete, and most of the modern file systems already support >32 bit timestamps: ext4, btrfs, cifs, f2fs, 9p, nfsv4, ntfs, gfs2, ocfs2, fuse, ufs2. 
Everything else except xfs, ext2/3 and exofs uses the nfsv3 interpretation on 64-bit systems, which interprets time stamps with the high bit set as years 2038-2106 rather than 1903-1969. If we don't fix it now (i.e in the new user API and supporting infrastructure), then we'll *never be able to fix it* and we'll be stuck with timestamps that do really weird things when you pass arbitrary future dates to the kernel. We already have that. I agree it's fixable and we should fix it, but I don't see how this is different from what we had 20 years ago when Linux on Alpha first introduced a 64-bit time_t. It's been this way on every 64-bit Linux system since. This is how ext4 does it (I mean the sizeof() trick, not the bit stuffing they do): I guess if there is general agreement on introducing 'struct inode_time', we can skip that intermediate step.
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote: > On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > > all file systems at least times until 2106, because they treat > > > > the on-disk value as unsigned on 64-bit systems, or they use > > > > a completely different representation. My guess is that somebody > > > > earlier spent a lot of work on making that happen. > > > > > > > > The exceptions are: > > > > > > > > * exofs uses signed values, which can probably be changed to be > > > > consistent with the others. > > > > * isofs has a bug that limits it until 2027 on architectures with > > > > a signed 'char' type (otherwise it's 2155). > > > > * udf can represent times for many thousands of years through a > > > > 16-bit year representation, but the code to convert to epoch > > > > uses a const array that ends at 2038. > > > > * afs uses signed seconds and can probably be fixed > > > > * coda relies on user space time representation getting passed > > > > through an ioctl. > > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > > where they really use signed. > > > > > > > > I was confused about XFS since I didn't noticed that there are > > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > > XFS to also use the 1970-2106 time range on 64-bit systems today. > > > > > > You've missed an awful lot more than just the implications for the > > > core kernel code. > > > > > > There's a good chance such changes propagate to APIs elsewhere in > > > the filesystems, because something you haven't realised is that XFS > > > effectively exposes the on-disk timestamp format directly to > > > userspace via the bulkstat interface (see struct xfs_bstat). 
It also > > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > > by the online defragmenter. > > I really didn't look at them at all, as ioctl is very late on my > mental list of things to change. I do realize that a lot of drivers > and file systems do have ioctls that pass time values and we need to > address them one by one. > > I just looked at the ioctls you mentioned but don't see how open-by-handle > is affected by this. Can you point me to what you mean? Sorry, I misremembered how some of the XFS open-by-handle code works in userspace (XFS has a pretty rich open-by-handle ioctl() interface that predates the kernel syscalls by at least 10 years). Basically there is code in userspace that uses the information returned from bulkstat to construct file handles to pass to the open-by-handle ioctls. xfs_fsr then uses the combination of open-by-handle from the bulkstat output and the bulkstat output to feed into the swap extent ioctls i.e. the filesystem's idea of what time is is passed to userspace as an opaque cookie in this case, but it is not used directly by the open-by-handle interfaces like I implied it was. > > Just to put that in context, here's the kernel patch to add extended > > epoch support to XFS. It's completely untested as I haven't done any > > userspace code changes to enable the feature. However, it should > > give you an indication of how far the simple act of changing the > > kernel time representation spread through the filesystem. This does > > not include any of the VFS infrastructure to specifying the range of > > supported timestamps. It survives some smoke testing, but dies when > > the online defragmenter starts using the bulkstat and swap extent > > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I > > probably don't have that all sorted correctly yet... 
> > > > To test extended epoch support, however, I need to some fstests that > > define and validate the behaviour of the new syscalls - until we get > > those we can't validate that the filesystem follows the spec > > properly. I also suspect we are going to need an interface to query > > the supported range of timestamps from a filesystem so that we can > > test boundary conditions in an automated fashion > > Thanks a lot for having an initial look at this yourself! > > I'd still consider the two problems largely orthogonal. Depends how you look at it. You can't extend the kernel's idea of time without permanent storage being able to specify the supported bounds - that's a non-negotiable aspect of introducing extended epoch timestamp support. The actual addition of extended timestamp support to each individual filesystem is orthogonal to the introduction of the struct inode_time, but doing this addition properly is dependent on the VFS infrastructure being there in the first place. > My patch set > (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave > more like 64-bit kernels regarding inode time stamps, which does >
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
>> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>>>
>>> And since we are already returning (time_t) -1 in some cases, we might
>>> as well try to make things a bit more formal.
>>>
>>
>> Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
> the Posix specification:
>
> SYSCALL_DEFINE1(time, time_t __user *, tloc)
> {
> 	time_t i = get_seconds();
>
> 	if (tloc) {
> 		if (put_user(i,tloc))
> 			return -EFAULT;
> 	}
> 	force_successful_syscall_return();
> 	return i;
> }
>

OK, I guess I should have said... other than for -EFAULT.

I just don't know of anyone using time(2) with an argument other than NULL.

	-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> >
> > And since we are already returning (time_t) -1 in some cases, we might
> > as well try to make things a bit more formal.
> >
>
> Are we? I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per
the Posix specification:

SYSCALL_DEFINE1(time, time_t __user *, tloc)
{
	time_t i = get_seconds();

	if (tloc) {
		if (put_user(i,tloc))
			return -EFAULT;
	}
	force_successful_syscall_return();
	return i;
}

Cheers,

						- Ted
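From the caller's side, a (time_t) -1 return is awkward because -1 is also, in principle, a representable time (one second before the epoch). A minimal userspace sketch of one way to tell the two apart follows; the wrapper name is hypothetical, and it assumes the C library sets errno when time() fails.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical wrapper: returns 0 and stores the current time in *out,
 * or returns -1 with errno set if time() reported an error.
 */
static int current_time_checked(time_t *out)
{
	time_t now;

	errno = 0;		/* so a stale errno is not mistaken for a failure */
	now = time(NULL);
	if (now == (time_t)-1 && errno != 0)
		return -1;

	*out = now;
	return 0;
}
```

Because the wrapper passes NULL, the EFAULT path in the kernel code above cannot be reached from it; the errno check only guards against treating a genuine error return as a timestamp.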
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
>
> And since we are already returning (time_t) -1 in some cases, we might
> as well try to make things a bit more formal.
>

Are we? I am not aware of *Linux* actually using that.

	-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will
know if (a) the hardware doesn't have an RTC clock at all, or if (b) the
RTC clock is ticking some time that can't be encoded using the current
time_t type.

So in that case, the fallback would be for the kernel to tick
starting with time_t == 0 when the system is initially booted, and the
"time indefinite flag" would be set.

Now assume that we have a new system call, gettimestampofday(2), which
returns a new timestamp structure which has a 64-bit ts_sec field, the
ts_nsec field (ala struct timespec), and a ts_flags field, where the
kernel could signal things like "time invalid", or "time can't be
encoded in the legacy time_t type", or "I'm not sure if the time is
correct" --- i.e., because the RTC battery isn't working. Not all
hardware might be able to support the last, of course, but if the
battery is low, or the system has been exposed to very low
temperatures (or large amounts of cosmic radiation, etc.) the RTC time
may just be plain wrong.

No system is going to be perfect, but it should be possible to make
things better, at least for certain classes of hardware.

And since we are already returning (time_t) -1 in some cases, we might
as well try to make things a bit more formal.

						- Ted
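Purely as a sketch of what such a structure might look like: the field names ts_sec, ts_nsec and ts_flags come from the message above, but the struct tag, field widths other than the 64-bit ts_sec, the flag names and their bit values are all assumptions made here for illustration. No such interface exists in the kernel.

```c
#include <stdint.h>

/* Hypothetical flag bits; the message names the conditions, not the values. */
#define TS_TIME_INVALID    (1u << 0)	/* kernel does not know the time */
#define TS_NO_LEGACY_TIME  (1u << 1)	/* cannot be encoded in legacy time_t */
#define TS_TIME_UNCERTAIN  (1u << 2)	/* e.g. the RTC battery is suspect */

/* Hypothetical layout for what gettimestampofday(2) might return. */
struct timestamp {
	int64_t  ts_sec;	/* 64-bit seconds, as proposed */
	int32_t  ts_nsec;	/* nanoseconds; width is an assumption */
	uint32_t ts_flags;	/* TS_* bits above */
};

/* Set or clear TS_NO_LEGACY_TIME, assuming a signed 32-bit legacy time_t. */
static void timestamp_update_legacy_flag(struct timestamp *ts)
{
	if (ts->ts_sec < INT32_MIN || ts->ts_sec > INT32_MAX)
		ts->ts_flags |= TS_NO_LEGACY_TIME;
	else
		ts->ts_flags &= ~TS_NO_LEGACY_TIME;
}
```

The helper only covers the one condition that can be computed from the value itself; the "invalid" and "uncertain" bits would have to come from whatever the kernel knows about the RTC.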
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks wrote:
>
> >
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> >
> >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> >> (See the definition of nfstime3 in RFC 1813).
> >>
> >
> > nfstime3 could be extended by redefining the otherwise unused
> > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
> > seconds field and an unsigned 30-bit nanoseconds field.
> >
> > This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by
> > adding a new FSINFO capability - FSF3_CANSETTIME_EX.
> >
> > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
> > timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on
> this change. Otherwise, Linux would be the only NFSv3 implementation
> that supports the extension.
>
> But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with an NFS standardization, I'd assume this
is a workable answer. The NFSv2 and NFSv3 definition clearly defines
a valid range of times until 2106 using unsigned seconds, and that
should really give enough time to migrate to something better (not
necessarily NFSv4).

	Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Jun 2, 2014, at 2:58 PM, Roger Willcocks wrote: > > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote: > >> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for >> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. >> (See the definition of nfstime3 in RFC 1813). >> > > nfstime3 could be extended by redefining the otherwise unused > nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit > seconds field and an unsigned 30-bit nanoseconds field. > > This could represent 1970 +/- 272 years. > > Servers could indicate they can understand the extended time format by > adding a new FSINFO capability - FSF3_CANSETTIME_EX. > > Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending > timestamps so old servers would be protected from new clients. You would have to get the IETF’s NFSv4 working group to sign off on this change. Otherwise, Linux would be the only NFSv3 implementation that supports the extension. But I suspect the answer you’d get is “Use NFSv4.” > Old clients don't need to be protected from new servers because the > on-the-wire bit pattern for dates between 1970 and 2106 stays the same, > so they're no worse off than they were before. > > Arguably the new server ought to clamp out-of-range timestamps before > sending them to old clients but that would need per-client state (and > nfs3 is stateless.) There’s no reliable way in NFSv3 for clients and servers to identify the software running on the peer. Practically speaking, you should assume that the NFSv3 protocol is never going to change. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>

nfstime3 could be extended by redefining the otherwise unused
nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit
seconds field and an unsigned 30-bit nanoseconds field.

This could represent 1970 +/- 272 years.

Servers could indicate they can understand the extended time format by
adding a new FSINFO capability - FSF3_CANSETTIME_EX.

Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending
timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the
on-the-wire bit pattern for dates between 1970 and 2106 stays the same,
so they're no worse off than they were before.

Arguably the new server ought to clamp out-of-range timestamps before
sending them to old clients but that would need per-client state (and
nfs3 is stateless.)

--
Roger
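The bit layout proposed above can be sketched in plain C as follows.
This is not part of any NFS implementation or RFC; the struct and
function names are hypothetical, and the code only illustrates moving
seconds bits 33 and 32 into the two spare top bits of the nanoseconds
word.

```c
#include <stdint.h>

/* Two 32-bit words, as in nfstime3 (RFC 1813). Name is hypothetical. */
struct nfstime3_wire {
	uint32_t seconds;
	uint32_t nseconds;
};

#define NSEC_MASK 0x3fffffffu /* low 30 bits; 999999999 < 2^30 */

/* sec must be in [-2^33, 2^33 - 1], nsec in [0, 999999999]. */
static struct nfstime3_wire nfstime3_ex_pack(int64_t sec, uint32_t nsec)
{
	uint64_t bits = (uint64_t)sec; /* two's complement bit pattern */
	struct nfstime3_wire w;

	w.seconds  = (uint32_t)bits;                       /* seconds{31..0} */
	w.nseconds = (nsec & NSEC_MASK) |
		     ((uint32_t)((bits >> 32) & 0x3) << 30); /* seconds{33,32} */
	return w;
}

static int64_t nfstime3_ex_unpack_sec(struct nfstime3_wire w)
{
	/* Reassemble the 34-bit field; the value is below 2^34. */
	int64_t v = (int64_t)(((uint64_t)(w.nseconds >> 30) << 32) | w.seconds);

	if (v & (1LL << 33))  /* sign bit of the 34-bit field */
		v -= (1LL << 34);
	return v;
}

static uint32_t nfstime3_ex_unpack_nsec(struct nfstime3_wire w)
{
	return w.nseconds & NSEC_MASK;
}
```

For any time between 1970 and 2106 the two extra bits are zero, so the
words on the wire are identical to today's encoding, which is the
property the proposal relies on for old clients.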
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
> seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
> (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC
> on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
> the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out
> of the picture by 2038. But if changes are planned for dealing _now_
> with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits
> timestamps earlier than 1970 as large positive numbers. Try this with
> xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as
a large unsigned integer, but received as a post-2038 timestamp on
64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have
in lots of other file systems, and it makes sense to have the same
fix everywhere, at least in the cases where we know what
interpretation we actually want.

NFS has the luxury of having an actual specification saying that the
value is unsigned. For most of the legacy file systems, we can only
make a guess at how other OSs would interpret the same numbers.

	Arnd
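The round trip described above can be reproduced in a few lines of
user-space C. This is only a sketch of the arithmetic, not the actual
NFS client or server code, and the function names are made up.

```c
#include <stdint.h>

/* Sender: the seconds value is truncated into the unsigned 32-bit field. */
static uint32_t nfs3_encode_seconds(int64_t tv_sec)
{
	return (uint32_t)tv_sec;
}

/* Receiver with a 64-bit tv_sec: the wire value is zero-extended. */
static int64_t nfs3_decode_seconds_64(uint32_t wire)
{
	return (int64_t)wire;
}

/*
 * Receiver with a 32-bit signed tv_sec: the same bits are read as signed.
 * The conversion is implementation-defined in ISO C; gcc and clang wrap
 * modulo 2^32.
 */
static int32_t nfs3_decode_seconds_32(uint32_t wire)
{
	return (int32_t)wire;
}
```

Encoding -1 (one second before the epoch) gives 4294967295 on the
wire. A 32-bit receiver reads that back as -1, while a 64-bit receiver
reads it as 4294967295, which is a date in February 2106.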
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> >
> > I wonder if it would make sense to try to promulgate via the Austin
> > group, and possibly the C standards committee the concept of a bit
> > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
> > unknown", or "time indefinite" or "we couldn't encode the time".
> >
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)).
> However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately
> something similar applies to all possible bit patterns, certainly within
> the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means
"Sun Feb 7 07:28:15 CET 2106", and that is much harder to distinguish
from a real future date. If we had the choice, I'd go for something
like 1, i.e. "Thu Jan 1 01:00:01 CET 1970".

> > We would then teach gmtime(3) and asctime(3) to print some appropriate
> > message, and we could teach programs like find (with the -mtime)
> > option, make, tmpwatch, et al., that they can't make any presumption
> > about the comparability of any timestamp which has a value of
> > TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return
> > TIME_UNDEFINED, since there are programs that care about time ticking
> > forward, but I could imagine a new interface which would be permitted
> > to return a flag indicating that we don't know the current time
> > (because the CMOS battery had run down, etc.) so instead we're going
> > to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be
> an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely
detect when the file system specific representation overflows or
underflows, which may be at a number of very different points in time.
	Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 08:31 AM, Theodore Ts'o wrote: > > I wonder if it would make sense to try to promulgate via the Austin > group, and possibly the C standards committee the concept of a bit > pattern (that might commonly be INT_MAX or UINT_MAX) that means "time > unknown", or "time indefinite" or "we couldn't encode the time". > (time_t)-1 already has this meaning for some calls (e.g. time(2)). However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately something similar applies to all possible bit patterns, certainly within the range of an int. > We would then teach gmtime(3) and asctime(3) to print some appropriate > message, and we could teach programs like find (with the -mtime) > option, make, tmpwatch, et. al., that they can't make any presumption > about the comparibility of any timestamp which has a value of > TIME_UNDEFINIED. > > It would be problematic for time(2) or gettimeofday(2) to return > TIME_UNDEFINED, since there are programs that care about time ticking > forward, but I could imagine a new interface which would be permitted > to return a flag indicating that we don't know the current time > (because the CMOS battery had run down, etc.) so instead we're going > to be counting the number of seconds since the system was booted. This assumes that we actually know that that is the case, which may be an aggressive assumption. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should
> match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by
> Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
> timestamp is transmitted as UINT_MAX.

I wonder if it would make sense to try to promulgate via the Austin
group, and possibly the C standards committee the concept of a bit
pattern (that might commonly be INT_MAX or UINT_MAX) that means "time
unknown", or "time indefinite" or "we couldn't encode the time".

We would then teach gmtime(3) and asctime(3) to print some appropriate
message, and we could teach programs like find (with the -mtime
option), make, tmpwatch, et al., that they can't make any presumption
about the comparability of any timestamp which has a value of
TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return
TIME_UNDEFINED, since there are programs that care about time ticking
forward, but I could imagine a new interface which would be permitted
to return a flag indicating that we don't know the current time
(because the CMOS battery had run down, etc.) so instead we're going
to be counting the number of seconds since the system was booted.

- Ted
Re: [RFC 11/32] xfs: convert to struct inode_time
On Jun 2, 2014, at 6:56 AM, Arnd Bergmann wrote: > On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote: >> >>> For actually running kernels beyond 2038, the best idea I've seen so >>> far is to disallow all broken code at compile time. I don't see >>> a choice but to audit the entire kernel for invalid uses on both >>> 32 and 64 bit in the next few years. A lot of code will get changed >>> in the process so we can actually keep running 32-bit kernels and >>> file systems, but other code will likely go away: >>> >>> * any system calls that pass a time_t, timeval or timespec on >>> 32-bit systems return -ENOSYS, to ensure all user land uses >>> the replacements we will put into place >>> * The definition of 'time_t', 'timval' and 'timespec' can be hidden >>> from the kernel, and all code using it left out. >>> * ext2 and ext3 file system code will have to be disabled, but that's >>> file since ext4 can mount old file systems. >> >> Syscalls and libs can be "fixed". Existing filesystem content might >> not. So if you need to mount some old media in read-write mode after >> 2038 and that happens to content an ext2 or similarly limited filesystem >> then it'd better just "work". Having the kernel refuse to modify the >> filesystem would be unacceptable. > > I think you misunderstood what I suggested: the intent is to avoid > seeing things break in 2038 by making them break much earlier. We have > a solution for ext2 file systems, it's called ext4, and we just need > to ensure that everybody knows they have to migrate eventually. > > At some point before the mid 2030ies, you should no longer be able to > build a kernel that has support for ext2 or any other module that will > run into bugs later. Until then (rather sooner than later), I'd like > to get to the point where you can choose whether to include those > modules at build time or not, and then get everybody to turn off that > option and fix the bugs they run into. 
> You wouldn't need that for a
> 2014-generation long-term support distro (rhel 7, sles 12, debian 7,
> ubuntu 14.04, ...), but perhaps for the next generation, or the
> one after that.

I’m wondering what should be done about NFS. A solution for NFS should
match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for
seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds.
(See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC
on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See
the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out
of the picture by 2038. But if changes are planned for dealing _now_
with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits
timestamps earlier than 1970 as large positive numbers. Try this with
xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and
timestamps larger than can be represented in an unsigned 32-bit field
and return an immediate error to the requesting application (like
EINVAL).

If the Linux NFS server encounters a local file with a timestamp that
cannot be represented via a u32, should it also return NFS3ERR_INVAL?
RFC 1813 does not provide guidance on the behavior nor does it suggest
a particular error status code. The Solaris 11 server appears to return
NFS3ERR_INVAL in this case.

An alternative would be to “cap” the timestamps transmitted via NFSv3 by
Linux, so that a pre-epoch timestamp is transmitted as zero, and a large
timestamp is transmitted as UINT_MAX.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
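The two alternatives raised above (reject with an error, or cap to the
representable range) differ only in how they treat values outside
0..UINT32_MAX. A minimal sketch, with hypothetical function names that
do not correspond to anything in the Linux NFS code:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Option 1: refuse timestamps that an unsigned 32-bit field cannot hold.
 * Returns 0 and stores the wire value, or -EINVAL without touching *wire.
 */
static int nfs3_seconds_or_error(int64_t tv_sec, uint32_t *wire)
{
	if (tv_sec < 0 || tv_sec > UINT32_MAX)
		return -EINVAL;
	*wire = (uint32_t)tv_sec;
	return 0;
}

/*
 * Option 2: cap. Pre-epoch times become 0, times past the 32-bit
 * range become UINT32_MAX.
 */
static uint32_t nfs3_seconds_capped(int64_t tv_sec)
{
	if (tv_sec < 0)
		return 0;
	if (tv_sec > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)tv_sec;
}
```

Capping keeps operations succeeding but silently changes the stored
time; rejecting makes the loss visible to the application.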
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038
> encoding of the timestamp in ext4, which is why this hasn't been fixed
> yet. The main thing that's been missing is time for me to review the
> patches, and a good way of writing regression tests that will work (or
> at least not fail) on build environments with a 32-bit time_t and
> 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of
> superblock flag to allow the use of a 32-bit unsigned encoding for
> file systems using a 128-byte inode, with a way of setting that flag
> after scanning the file system to make sure there are no times that
> are previous to January 1, 1970. (Or more generally, allow any epoch
> to be defined using a 64-bit time_t offset stored in the superblock...)

FWIW, I've gone through the other file system implementations once
more. The most common pattern I've encountered is to have a read_inode
function with

	inode->i_mtime = le32_to_cpu(raw_inode->mtime);

which results in interpreting the time as 'signed' on 32-bit kernels,
but as 'unsigned' on 64-bit kernels. This could have been done
intentionally to extend the valid time range to 2106 on 64-bit kernels,
but it seems more likely that the code was written with no thought
given to 64-bit time_t at all.

I see this pattern on p9fs (old protocol only), afs, bfs, ceph, efs,
freevxfs, hpfs, jffs2, jfs, minix, nfsv2/v3 (this was clearly
intentional and is spelled out in the RFC), qnx4, qnx6, reiserfs,
squashfs, sysv, and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value as
signed on both 32-bit and 64-bit kernels:

	inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);

This seems to be done intentionally in all cases, to maintain
compatibility between 32-bit and 64-bit kernels, but it's relatively
rare: exofs, ext2/3/4 (good old inodes) and xfs are the only ones
doing this. In the case of ext2/3/4, the sign handling was introduced
here:

http://www.spinics.net/lists/linux-ext4/msg01758.html

exofs and xfs seem to have done it like this for all of git history.

	Arnd
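The difference between the two patterns above comes down to integer
conversion rules, which can be shown outside the kernel. In this
sketch the width of the kernel's tv_sec is modelled by the type the
on-disk value is converted to; the function names are made up and the
byte-order conversion is left out.

```c
#include <stdint.h>

/* Pattern 1 on a 64-bit kernel: unsigned value assigned to 64-bit tv_sec. */
static int64_t decode_plain_64bit(uint32_t disk)
{
	return disk; /* zero-extended: 1970..2106 */
}

/*
 * Pattern 1 on a 32-bit kernel: same assignment into a 32-bit signed
 * tv_sec. Implementation-defined in ISO C; gcc and clang wrap.
 */
static int64_t decode_plain_32bit(uint32_t disk)
{
	int32_t tv_sec = (int32_t)disk;
	return tv_sec; /* 1901..2038 */
}

/* Pattern 2: explicit signed cast first, same result for either width. */
static int64_t decode_signed(uint32_t disk)
{
	return (int32_t)disk; /* sign-extended: 1901..2038 */
}
```

With the on-disk value 0x80000000, the first function returns
2147483648 (January 2038) and the other two return -2147483648
(December 1901).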
Re: [RFC 11/32] xfs: convert to struct inode_time
> On Jun 2, 2014, at 4:57, "Theodore Ts'o" wrote: > >> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: >> >> I think you misunderstood what I suggested: the intent is to avoid >> seeing things break in 2038 by making them break much earlier. We have >> a solution for ext2 file systems, it's called ext4, and we just need >> to ensure that everybody knows they have to migrate eventually. >> >> At some point before the mid 2030ies, you should no longer be able to >> build a kernel that has support for ext2 or any other module that will >> run into bugs later > > Even for ext4, it's not quite so simple as that. You only have > support for times post 2038 if you are using an inode size > 128 > bytes. There are a very, very large number of machines which even > today, are using 128 byte inodes with ext4 for performance reasons. > > The vast majority of those machines which I know of can probably move > to 256 byte inodes relatively easily, since hard drive replacement > cycles are order 5-6 years tops, so I'm not that concerned, but it > just goes to show this is a very complicated problem. > > And even if we're talking about flash and embedded devices, the good > news is if you assume that 10 years is enough time for people to > update their embedded OS builds, and that the vast majority of > deployed devices will probably only be in service for 10-15 years, we > do have enough time to make file system format changes, although > admittedly we can't afford to dilly-dally. I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, extend the file system online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. 
hot plug bays), which is really damned nice.
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, 31 May 2014, Dave Chinner wrote: > If we are changing the in-kernel timestamp to have a greater dynamic > range that anything we current support on disk, then we need support > for all filesystems for similar translation and constraint. The > filesystems need to be able to tell the kernel what they timestamp > range they support, and then the kernel needs to follow those > guidelines. And if the filesystem is mounted on a kernel that > doesn't support the current filesystem's timestamp format, then at > minimum that filesystem cannot do anything that writes a > timestamp > > Put simply: the filesystem defines the timestamp range that can be > used safely, not the userspace API. If the filesystem can't support > the date it is handed then that is an out-of-range error. Since > when have we accepted that it's OK to handle out-of-range data with > silent overflows or corruption of the data that we are attempting to > store? We're defining a new API to support a wider date range - > there is nothing that prevents us from saying ERANGE can be returned > to a timestamp that the file cannot store correctly I don't see anything new about this issue. All problems that could arise from the kernel being able to represent a timestamp some filesystems can't are problems that already apply with 64-bit kernels using 64-bit time_t internally. So while as part of Y2038-preparedness we do need a clear understanding of which filesystems have what timestamp limits and what happens with timestamps beyond those limits, I think this is a separate strand of the problem - one that applies to both 32-bit and 64-bit systems - from the more general issue for 32-bit systems. -- Joseph S. Myers jos...@codesourcery.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>
> "For new inodes we always reserve enough space for the kernel's known
> extended fields, but for inodes created with an old kernel this might
> not have been the case. None of the extended inode fields is critical
> for correct filesystem operation."
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added
to ext4, so the vast majority of ext4 file systems with inode sizes >
128 bytes will have room for the extended timestamps.

There are some legacy ext3 file systems with 256-byte inodes (enabled
for fast storage of SELinux xattrs) that, in theory, could have been
converted to ext4 and had enough xattrs so that the extended timestamps
couldn't be added. That would be a vanishingly small use case, and in
practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally
formatted using RHEL 4 post-2038 (perhaps running in a VM), but I don't
work for IBM any more, and hopefully even IBM would just tell such
customers that they need to suck it up, and do a backup/reformat/restore
pass.

Cheers,

- Ted
Re: [RFC 11/32] xfs: convert to struct inode_time
Yes, there are some ongoing discussions about changing the post-2038
encoding of the timestamp in ext4, which is why this hasn't been fixed
yet. The main thing that's been missing is time for me to review the
patches, and a good way of writing regression tests that will work (or
at least not fail) on build environments with a 32-bit time_t and
32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of
superblock flag to allow the use of a 32-bit unsigned encoding for
file systems using a 128-byte inode, with a way of setting that flag
after scanning the file system to make sure there are no times that
are previous to January 1, 1970. (Or more generally, allow any epoch
to be defined using a 64-bit time_t offset stored in the superblock...)

Cheers,

- Ted
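To make the "epoch offset in the superblock" idea concrete, here is a
rough sketch of what decoding and encoding could look like. This is
not ext4 code and no such superblock field exists; the function and
parameter names are invented, and the encoder assumes that
tv_sec - sb_epoch_offset does not overflow int64_t.

```c
#include <stdint.h>

/* On-disk seconds are unsigned 32-bit, relative to a per-fs epoch. */
static int64_t decode_with_epoch(uint32_t disk_seconds, int64_t sb_epoch_offset)
{
	return sb_epoch_offset + (int64_t)disk_seconds;
}

/*
 * Returns 0 and stores the on-disk value, or -1 if tv_sec lies outside
 * the 2^32-second window that starts at sb_epoch_offset.
 */
static int encode_with_epoch(int64_t tv_sec, int64_t sb_epoch_offset,
			     uint32_t *disk_seconds)
{
	int64_t rel = tv_sec - sb_epoch_offset;

	if (rel < 0 || rel > UINT32_MAX)
		return -1;
	*disk_seconds = (uint32_t)rel;
	return 0;
}
```

With an offset of 0 this is the plain unsigned encoding (1970 to
2106), which is why an existing file system would first have to be
scanned for pre-1970 times before the flag could be set.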
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote: > On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: > > > > I think you misunderstood what I suggested: the intent is to avoid > > seeing things break in 2038 by making them break much earlier. We have > > a solution for ext2 file systems, it's called ext4, and we just need > > to ensure that everybody knows they have to migrate eventually. > > > > At some point before the mid 2030ies, you should no longer be able to > > build a kernel that has support for ext2 or any other module that will > > run into bugs later > > Even for ext4, it's not quite so simple as that. You only have > support for times post 2038 if you are using an inode size > 128 > bytes. There are a very, very large number of machines which even > today, are using 128 byte inodes with ext4 for performance reasons. > > The vast majority of those machines which I know of can probably move > to 256 byte inodes relatively easily, since hard drive replacement > cycles are order 5-6 years tops, so I'm not that concerned, but it > just goes to show this is a very complicated problem. 
One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
	if (sizeof(time->tv_sec) > 4)
		time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32;
	time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode) \
do { \
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime)) \
		(einode)->xtime.tv_sec = \
			(signed)le32_to_cpu((raw_inode)->xtime); \
	else \
		(einode)->xtime.tv_sec = 0; \
	if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra)) \
		ext4_decode_extra_time(&(einode)->xtime, \
				       raw_inode->xtime ## _extra); \
	else \
		(einode)->xtime.tv_nsec = 0; \
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is
negative when ext4_decode_extra_time gets called, so the '|=' operator
doesn't actually do anything. Shouldn't that be '+='?

	Arnd
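The question above can be checked with ordinary 64-bit arithmetic.
The following is a stand-alone model, not the ext4 code: the helper
names are made up, EPOCH_BITS is assumed to be 2 as the mask names in
the quoted code suggest, and an "extra" value of 1 is used purely as
an example input.

```c
#include <stdint.h>

#define EPOCH_BITS 2                           /* assumed value */
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* First step of the macro: (signed) cast, stored in a 64-bit tv_sec. */
static int64_t base_seconds(uint32_t raw)
{
	return (int32_t)raw; /* sign-extends for raw >= 0x80000000 */
}

/* As in the quoted code: OR the epoch bits into bits 32 and 33. */
static int64_t decode_with_or(uint32_t raw, uint32_t extra)
{
	int64_t sec = base_seconds(raw);

	sec |= (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}

/* The '+=' variant asked about in the message. */
static int64_t decode_with_add(uint32_t raw, uint32_t extra)
{
	int64_t sec = base_seconds(raw);

	sec += (int64_t)(extra & EPOCH_MASK) << 32;
	return sec;
}
```

For raw = 0x80000000 the sign-extended base is -2147483648, whose
upper 32 bits are already all ones. ORing in bit 32 therefore leaves
it unchanged, while adding 2^32 yields 2147483648. For non-negative
base values the two variants give the same result.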
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote: > On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: > > > > I think you misunderstood what I suggested: the intent is to avoid > > seeing things break in 2038 by making them break much earlier. We have > > a solution for ext2 file systems, it's called ext4, and we just need > > to ensure that everybody knows they have to migrate eventually. > > > > At some point before the mid 2030ies, you should no longer be able to > > build a kernel that has support for ext2 or any other module that will > > run into bugs later > > Even for ext4, it's not quite so simple as that. You only have > support for times post 2038 if you are using an inode size > 128 > bytes. There are a very, very large number of machines which even > today, are using 128 byte inodes with ext4 for performance reasons. > > The vast majority of those machines which I know of can probably move > to 256 byte inodes relatively easily, since hard drive replacement > cycles are order 5-6 years tops, so I'm not that concerned, but it > just goes to show this is a very complicated problem. Ok, I see. I also now noticed this comment above EXT4_FITS_IN_INODE(): "For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. None of the extended inode fields is critical for correct filesystem operation." Do we have to worry about this for inodes that contain extended attributes and that get updated after 2038? Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: > > I think you misunderstood what I suggested: the intent is to avoid > seeing things break in 2038 by making them break much earlier. We have > a solution for ext2 file systems, it's called ext4, and we just need > to ensure that everybody knows they have to migrate eventually. > > At some point before the mid 2030ies, you should no longer be able to > build a kernel that has support for ext2 or any other module that will > run into bugs later Even for ext4, it's not quite so simple as that. You only have support for times post 2038 if you are using an inode size > 128 bytes. There are a very, very large number of machines which even today, are using 128 byte inodes with ext4 for performance reasons. The vast majority of those machines which I know of can probably move to 256 byte inodes relatively easily, since hard drive replacement cycles are order 5-6 years tops, so I'm not that concerned, but it just goes to show this is a very complicated problem. And even if we're talking about flash and embedded devices, the good news is if you assume that 10 years is enough time for people to update their embedded OS builds, and that the vast majority of deployed devices will probably only be in service for 10-15 years, we do have enough time to make file system format changes, although admittedly we can't afford to dilly-dally. Regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 10:28:22 Dave Chinner wrote: > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: > > > In my list at http://kernelnewbies.org/y2038, I found that almost > > > all file systems at least times until 2106, because they treat > > > the on-disk value as unsigned on 64-bit systems, or they use > > > a completely different representation. My guess is that somebody > > > earlier spent a lot of work on making that happen. > > > > > > The exceptions are: > > > > > > * exofs uses signed values, which can probably be changed to be > > > consistent with the others. > > > * isofs has a bug that limits it until 2027 on architectures with > > > a signed 'char' type (otherwise it's 2155). > > > * udf can represent times for many thousands of years through a > > > 16-bit year representation, but the code to convert to epoch > > > uses a const array that ends at 2038. > > > * afs uses signed seconds and can probably be fixed > > > * coda relies on user space time representation getting passed > > > through an ioctl. > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, > > > where they really use signed. > > > > > > I was confused about XFS since I didn't noticed that there are > > > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected > > > XFS to also use the 1970-2106 time range on 64-bit systems today. > > > > You've missed an awful lot more than just the implications for the > > core kernel code. > > > > There's a good chance such changes propagate to APIs elsewhere in > > the filesystems, because something you haven't realised is that XFS > > effectively exposes the on-disk timestamp format directly to > > userspace via the bulkstat interface (see struct xfs_bstat). It also > > affects the XFS open-by-handle ioctl and the swap extent ioctl used > > by the online defragmenter. 
I really didn't look at them at all, as ioctl is very late on my mental list of things to change. I do realize that a lot of drivers and file systems do have ioctls that pass time values and we need to address them one by one. I just looked at the ioctls you mentioned but don't see how open-by-handle is affected by this. Can you point me to what you mean? > Just to put that in context, here's the kernel patch to add extended > epoch support to XFS. It's completely untested as I haven't done any > userspace code changes to enable the feature. However, it should > give you an indication of how far the simple act of changing the > kernel time representation spread through the filesystem. This does > not include any of the VFS infrastructure to specifying the range of > supported timestamps. It survives some smoke testing, but dies when > the online defragmenter starts using the bulkstat and swap extent > ioctls (the assert in xfs_inode_time_from_epoch() fires), so I > probably don't have that all sorted correctly yet... > > To test extended epoch support, however, I need to some fstests that > define and validate the behaviour of the new syscalls - until we get > those we can't validate that the filesystem follows the spec > properly. I also suspect we are going to need an interface to query > the supported range of timestamps from a filesystem so that we can > test boundary conditions in an automated fashion Thanks a lot for having an initial look at this yourself! I'd still consider the two problems largely orthogonal. 
My patch set (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave more like 64-bit kernels regarding inode time stamps, which does impact all the file systems that use a 64-bit time or the NFS unsigned epoch (1970-2106), while your patch extends the file system internal epoch (1901-2038 for XFS) so it can be used by anything that knows how to handle larger than 32-bit second values (either 64-bit kernel or 32-bit with inode_time patch).

> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
>  #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
>  #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
>
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, 2014-06-02 at 10:28 +1000, Dave Chinner wrote: > > The 32 bit second counters in timestamps are too small to represent > time beyond the unix epoch (jan 2038) correctly. Extend the on-disk > format for a timestamp to include an 8-bit epoch counter so that we > can extend time for up to 255 Unix epochs. This should be good for > representing timestamps from 1970 to somewhere around 19,000 A.D > I assume you're using an 'epoch' variable and not simply using the padding byte as an eight-bit prefix to the existing 32-bit counter because the existing counter is signed ? For long term sanity it might make more sense for the eight-bit value to be a simple (sign-extended) prefix from 1970. So if the feature bit is set it's a 40-bit signed time, which is good for 1970 +/- 17400 years or so. -- Roger -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
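The "40-bit signed time" arithmetic can be checked with a small sketch. This is only an illustration of the sign-extended-prefix idea, not code from the XFS patch: the names time40_decode() and time40_range_years() are made up here, and the layout (the padding byte as the top 8 bits, the existing 32-bit field as the unsigned low word) is an assumption about what a "simple prefix" would mean.

```c
#include <stdint.h>

/*
 * Hypothetical decode of a 40-bit signed timestamp: 'prefix' is the
 * extra on-disk byte, treated as the signed top 8 bits, and 'low' is
 * the existing 32-bit field, treated as the unsigned low word.
 * The result is seconds relative to 1970.
 */
static int64_t time40_decode(uint8_t prefix, uint32_t low)
{
    return (int64_t)(int8_t)prefix * 4294967296LL + (int64_t)low;
}

/*
 * Half of the 40-bit range (2^39 seconds) expressed in whole years,
 * using an average Gregorian year of 365.2425 days.
 */
static int64_t time40_range_years(void)
{
    const int64_t half_range_sec = 1LL << 39;
    const int64_t sec_per_year = 31556952;

    return half_range_sec / sec_per_year;
}
```

Under these assumptions a prefix of 0xff with a low word of 0xffffffff decodes to -1 (one second before 1970), and the range works out to a little over 17400 years either side of 1970, which matches the estimate above. Note that a pre-1970 time stored today with a signed 32-bit field would need its prefix byte set to 0xff once the feature bit is enabled.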
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sunday 01 June 2014 13:26:03 H. Peter Anvin wrote: > Perhaps we should make this a kernel command line option instead, with the > settings: error out on outside the standard window, or a date indicating the > earliest date that should be recognized and do windowing (0 for no windowing, > 1970 for retconning the Unix epoch as unsigned...) What's wrong with compile-time errors? We have a pretty good understanding of how time values are passed in the kernel, and we know they will all break in 2038 for 32-bit kernels unless we do something about it. > But again, the kernel is probably the least problem here... I agree the glibc side is harder than this, but we have to get the kernel into shape first (at the minimum we have to do the APIs), and there is enough work to do here. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote: > > > For actually running kernels beyond 2038, the best idea I've seen so > > far is to disallow all broken code at compile time. I don't see > > a choice but to audit the entire kernel for invalid uses on both > > 32 and 64 bit in the next few years. A lot of code will get changed > > in the process so we can actually keep running 32-bit kernels and > > file systems, but other code will likely go away: > > > > * any system calls that pass a time_t, timeval or timespec on > > 32-bit systems return -ENOSYS, to ensure all user land uses > > the replacements we will put into place > > * The definition of 'time_t', 'timval' and 'timespec' can be hidden > > from the kernel, and all code using it left out. > > * ext2 and ext3 file system code will have to be disabled, but that's > > file since ext4 can mount old file systems. > > Syscalls and libs can be "fixed". Existing filesystem content might > not. So if you need to mount some old media in read-write mode after > 2038 and that happens to content an ext2 or similarly limited filesystem > then it'd better just "work". Having the kernel refuse to modify the > filesystem would be unacceptable. I think you misunderstood what I suggested: the intent is to avoid seeing things break in 2038 by making them break much earlier. We have a solution for ext2 file systems, it's called ext4, and we just need to ensure that everybody knows they have to migrate eventually. At some point before the mid 2030ies, you should no longer be able to build a kernel that has support for ext2 or any other module that will run into bugs later. Until then (rather sooner than later), I'd like to get to the point where you can choose whether to include those modules at build time or not, and then get everybody to turn off that option and fix the bugs they run into. 
You wouldn't need that for a 2014-generation long-term support distro (rhel 7, sles 12, debian 7, ubuntu 14.04, ...), but perhaps for the next generation, or the one after that.

Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 2, 2014 at 4:22 AM, Dave Chinner wrote: > Filesystems place all sorts of userspace visible limits on storage - > ever tried to create a file >16TB on ext4? The on-disk format > doesn't support it, so it returns an out of range error (E2BIG, I > think) if you try. XFS, OTOH, handles this just fine and so it > continues to work. It's exactly the same with timestamps - there's a > physical limit to what can sanely be stored in any given filesystem > and it's an *error condition* to go beyond that limit This comparison doesn't fly. File sizes do not depend on the current time (except for the increase of megapixels in your new camera ;-). Writing a 15 GiB file to ext4 is not something that magically stops working tomorrow. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 10:28:22 Dave Chinner wrote: On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote: On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: In my list at http://kernelnewbies.org/y2038, I found that almost all file systems at least times until 2106, because they treat the on-disk value as unsigned on 64-bit systems, or they use a completely different representation. My guess is that somebody earlier spent a lot of work on making that happen. The exceptions are: * exofs uses signed values, which can probably be changed to be consistent with the others. * isofs has a bug that limits it until 2027 on architectures with a signed 'char' type (otherwise it's 2155). * udf can represent times for many thousands of years through a 16-bit year representation, but the code to convert to epoch uses a const array that ends at 2038. * afs uses signed seconds and can probably be fixed * coda relies on user space time representation getting passed through an ioctl. * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, where they really use signed. I was confused about XFS since I didn't noticed that there are separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected XFS to also use the 1970-2106 time range on 64-bit systems today. You've missed an awful lot more than just the implications for the core kernel code. There's a good chance such changes propagate to APIs elsewhere in the filesystems, because something you haven't realised is that XFS effectively exposes the on-disk timestamp format directly to userspace via the bulkstat interface (see struct xfs_bstat). It also affects the XFS open-by-handle ioctl and the swap extent ioctl used by the online defragmenter. I really didn't look at them at all, as ioctl is very late on my mental list of things to change. I do realize that a lot of drivers and file systems do have ioctls that pass time values and we need to address them one by one. 
I just looked at the ioctls you mentioned but don't see how open-by-handle is affected by this. Can you point me to what you mean? Just to put that in context, here's the kernel patch to add extended epoch support to XFS. It's completely untested as I haven't done any userspace code changes to enable the feature. However, it should give you an indication of how far the simple act of changing the kernel time representation spread through the filesystem. This does not include any of the VFS infrastructure to specifying the range of supported timestamps. It survives some smoke testing, but dies when the online defragmenter starts using the bulkstat and swap extent ioctls (the assert in xfs_inode_time_from_epoch() fires), so I probably don't have that all sorted correctly yet... To test extended epoch support, however, I need to some fstests that define and validate the behaviour of the new syscalls - until we get those we can't validate that the filesystem follows the spec properly. I also suspect we are going to need an interface to query the supported range of timestamps from a filesystem so that we can test boundary conditions in an automated fashion Thanks a lot for having an initial look at this yourself! I'd still consider the two problems largely orthogonal. My patch set (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave more like 64-bit kernels regarding inode time stamps, which does impact all the file systems that the a 64-bit time or the NFS unsigned epoch (1970-2106), while your patch extends the file system internal epoch (1901-2038 for XFS) so it can be used by anything that knows how to handle larger than 32-bit second values (either 64-bit kernel or 32-bit with inode_time patch). 
> diff --git a/fs/xfs/xfs_dinode.h b/fs/xfs/xfs_dinode.h
> index 623bbe8..79f94722 100644
> --- a/fs/xfs/xfs_dinode.h
> +++ b/fs/xfs/xfs_dinode.h
> @@ -21,11 +21,53 @@
>  #define XFS_DINODE_MAGIC 0x494e /* 'IN' */
>  #define XFS_DINODE_GOOD_VERSION(v) ((v) >= 1 && (v) <= 3)
>
> +/*
> + * Inode timestamps get more complex when we consider supporting times beyond
> + * the standard unix epoch of Jan 2038. The struct xfs_timestamp cannot support
> + * more than a single extension by playing sign games, and that is still not
> + * reliable. We also can't extend the timestamp structure because there is no
> + * free space around them in the on-disk inode.
> + *
> + * Hence the simplest thing to do is to add an epoch counter for each timestamp
> + * in the inode. This can be a single byte for each timestamp and make use of
> + * a hole we currently pad. This gives us another 255 epochs range for the
> + * timestamps, but requires a superblock feature bit to indicate that these
> + * fields have meaning and can be non-zero.

Nice trick!

> +static inline __uint8_t
> +xfs_timestamp_epoch(
> +     struct timespec *time)
> +{
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

Ok, I see. I also now noticed this comment above EXT4_FITS_IN_INODE():

  For new inodes we always reserve enough space for the kernel's known
  extended fields, but for inodes created with an old kernel this might
  not have been the case. None of the extended inode fields is critical
  for correct filesystem operation.

Do we have to worry about this for inodes that contain extended attributes and that get updated after 2038?

Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 07:57:37 Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote:
> >
> > I think you misunderstood what I suggested: the intent is to avoid
> > seeing things break in 2038 by making them break much earlier. We have
> > a solution for ext2 file systems, it's called ext4, and we just need
> > to ensure that everybody knows they have to migrate eventually.
> >
> > At some point before the mid 2030ies, you should no longer be able to
> > build a kernel that has support for ext2 or any other module that will
> > run into bugs later
>
> Even for ext4, it's not quite so simple as that. You only have
> support for times post 2038 if you are using an inode size > 128
> bytes. There are a very, very large number of machines which even
> today, are using 128 byte inodes with ext4 for performance reasons.
> The vast majority of those machines which I know of can probably move
> to 256 byte inodes relatively easily, since hard drive replacement
> cycles are order 5-6 years tops, so I'm not that concerned, but it
> just goes to show this is a very complicated problem.

One stupid question about the current code:

static inline void ext4_decode_extra_time(struct inode_time *time, __le32 extra)
{
       if (sizeof(time->tv_sec) > 4)
               time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
                               << 32;
       time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}

#define EXT4_EINODE_GET_XTIME(xtime, einode, raw_inode)                        \
do {                                                                           \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime))                      \
                (einode)->xtime.tv_sec =                                       \
                        (signed)le32_to_cpu((raw_inode)->xtime);               \
        else                                                                   \
                (einode)->xtime.tv_sec = 0;                                    \
        if (EXT4_FITS_IN_INODE(raw_inode, einode, xtime ## _extra))            \
                ext4_decode_extra_time(&(einode)->xtime,                       \
                                       raw_inode->xtime ## _extra);            \
        else                                                                   \
                (einode)->xtime.tv_nsec = 0;                                   \
} while (0)

For a time between 2038 and 2106, this looks like xtime.tv_sec is negative when ext4_decode_extra_time gets called, so the '|=' operator doesn't actually do anything. Shouldn't that be '+='?

Arnd
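The '|=' versus '+=' question can be reproduced outside the kernel. The following is a minimal userspace sketch, not the ext4 code itself: decode_with_or() and decode_with_add() are illustrative names, the two-bit epoch width is an assumption standing in for EXT4_EPOCH_BITS, and the raw value is taken to be in CPU byte order already.

```c
#include <stdint.h>

#define EPOCH_BITS 2                          /* assumed width of the epoch field */
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/*
 * Mirrors the quoted decode: sign-extend the 32-bit seconds field,
 * then OR the epoch bits into bits 32 and up.
 */
static int64_t decode_with_or(uint32_t raw, uint32_t extra)
{
    int64_t sec = (int32_t)raw;               /* negative if bit 31 is set */

    sec |= (int64_t)((uint64_t)(extra & EPOCH_MASK) << 32);
    return sec;
}

/* Same as above, but adds the epoch bits instead of OR-ing them. */
static int64_t decode_with_add(uint32_t raw, uint32_t extra)
{
    int64_t sec = (int32_t)raw;

    sec += (int64_t)((uint64_t)(extra & EPOCH_MASK) << 32);
    return sec;
}
```

With raw = 0x80000000 and an epoch value of 1, the sign extension has already set every bit above bit 31, so the OR leaves the value at -2^31, while the addition yields +2^31. For a raw value with bit 31 clear, the two variants agree. Whether '+=' alone is enough also depends on what the encoding side stores in the epoch bits, which this sketch does not model.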
Re: [RFC 11/32] xfs: convert to struct inode_time
Yes, there are some ongoing discussions about changing the post-2038 encoding of the timestamp in ext4, which is why this hasn't been fixed yet. The main thing that's been missing is time for me to review the patches, and a good way of writing regression tests that will work (or at least not fail) on build environments with a 32-bit time_t and 32-bit-only capable versions of functions such as gmtime(3).

And given current discussions, I may want to think about some kind of superblock flag to allow the use of a 32-bit unsigned encoding for file systems using a 128-byte inode, with a way of setting that flag after scanning the file system to make sure there are no times that are previous to January 1, 1970. (Or more generally, allow any epoch to be defined using a 64-bit time_t offset stored in the superblock...)

Cheers,

- Ted
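To make the superblock-offset idea concrete, here is a rough sketch of how a per-filesystem epoch could work. Nothing here is existing ext4 code: struct sb_time_cfg, its epoch_offset field, and both helpers are hypothetical names invented for illustration, and the 32-bit on-disk field is assumed to be unsigned relative to that offset.

```c
#include <stdint.h>

/* Hypothetical superblock state: seconds since 1970 at which on-disk time 0 sits. */
struct sb_time_cfg {
    int64_t epoch_offset;
};

/* On-disk unsigned 32-bit value to 64-bit seconds since 1970.
 * (A sketch: overflow for offsets near INT64_MAX is not handled.) */
static int64_t decode_with_epoch(const struct sb_time_cfg *sb, uint32_t raw)
{
    return sb->epoch_offset + (int64_t)raw;
}

/*
 * 64-bit seconds since 1970 to the on-disk value. Returns 0 on success
 * and -1 if the time falls outside the 2^32-second window that starts
 * at epoch_offset.
 */
static int encode_with_epoch(const struct sb_time_cfg *sb, int64_t t, uint32_t *raw)
{
    uint64_t rel;

    if (t < sb->epoch_offset)
        return -1;
    rel = (uint64_t)t - (uint64_t)sb->epoch_offset;
    if (rel > UINT32_MAX)
        return -1;
    *raw = (uint32_t)rel;
    return 0;
}
```

With epoch_offset set to 0 this is the plain unsigned encoding (1970 to 2106), which is only safe after a scan has confirmed that no inode carries a time before January 1, 1970.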
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 02:38:09PM +0200, Arnd Bergmann wrote:
>   For new inodes we always reserve enough space for the kernel's known
>   extended fields, but for inodes created with an old kernel this might
>   not have been the case. None of the extended inode fields is critical
>   for correct filesystem operation.
>
> Do we have to worry about this for inodes that contain extended
> attributes and that get updated after 2038?

In practice, the extended timestamps were one of the first things added to ext4, so the vast majority of ext4 file systems with inode sizes > 128 bytes will have room for the extended timestamps. There are some legacy ext3 file systems with 256-byte inodes (enabled for fast storage of SELinux xattrs) that, in theory, could have been converted to ext4 and had enough xattrs so that the extended timestamps couldn't be added. That would be a vanishingly small use case, and in practice, it's not likely to be the case for the embedded market.

I could imagine someone worrying about file systems originally formatted using RHEL 4 post-2038 (perhaps running in a VM), but I don't work for IBM any more, and hopefully even IBM would just tell such customers that they need to suck it up, and do a backup/reformat/restore pass.

Cheers,

- Ted
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, 31 May 2014, Dave Chinner wrote: If we are changing the in-kernel timestamp to have a greater dynamic range that anything we current support on disk, then we need support for all filesystems for similar translation and constraint. The filesystems need to be able to tell the kernel what they timestamp range they support, and then the kernel needs to follow those guidelines. And if the filesystem is mounted on a kernel that doesn't support the current filesystem's timestamp format, then at minimum that filesystem cannot do anything that writes a timestamp Put simply: the filesystem defines the timestamp range that can be used safely, not the userspace API. If the filesystem can't support the date it is handed then that is an out-of-range error. Since when have we accepted that it's OK to handle out-of-range data with silent overflows or corruption of the data that we are attempting to store? We're defining a new API to support a wider date range - there is nothing that prevents us from saying ERANGE can be returned to a timestamp that the file cannot store correctly I don't see anything new about this issue. All problems that could arise from the kernel being able to represent a timestamp some filesystems can't are problems that already apply with 64-bit kernels using 64-bit time_t internally. So while as part of Y2038-preparedness we do need a clear understanding of which filesystems have what timestamp limits and what happens with timestamps beyond those limits, I think this is a separate strand of the problem - one that applies to both 32-bit and 64-bit systems - from the more general issue for 32-bit systems. -- Joseph S. Myers jos...@codesourcery.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
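As a sketch of the "filesystem defines the range, out-of-range is an error" position quoted above: a filesystem could advertise the limits it can store and the caller could reject anything outside them. No such interface existed in the code under discussion; struct fs_time_range and check_timestamp() are hypothetical names used only to show the shape of the check.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-filesystem limits, in seconds since 1970. */
struct fs_time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/*
 * Returns 0 if the filesystem can store 'sec' exactly, or -ERANGE
 * otherwise, instead of silently truncating or wrapping the value.
 */
static int check_timestamp(const struct fs_time_range *range, int64_t sec)
{
    if (sec < range->min_sec || sec > range->max_sec)
        return -ERANGE;
    return 0;
}
```

For a filesystem with signed 32-bit seconds the range would be INT32_MIN to INT32_MAX, so the first second past the January 2038 limit is rejected on both 32-bit and 64-bit kernels alike, which is the point made in the reply: the limit belongs to the on-disk format, not to the width of the kernel's time type.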
Re: [RFC 11/32] xfs: convert to struct inode_time
On Jun 2, 2014, at 4:57, Theodore Ts'o ty...@mit.edu wrote: On Mon, Jun 02, 2014 at 12:56:42PM +0200, Arnd Bergmann wrote: I think you misunderstood what I suggested: the intent is to avoid seeing things break in 2038 by making them break much earlier. We have a solution for ext2 file systems, it's called ext4, and we just need to ensure that everybody knows they have to migrate eventually. At some point before the mid 2030ies, you should no longer be able to build a kernel that has support for ext2 or any other module that will run into bugs later Even for ext4, it's not quite so simple as that. You only have support for times post 2038 if you are using an inode size 128 bytes. There are a very, very large number of machines which even today, are using 128 byte inodes with ext4 for performance reasons. The vast majority of those machines which I know of can probably move to 256 byte inodes relatively easily, since hard drive replacement cycles are order 5-6 years tops, so I'm not that concerned, but it just goes to show this is a very complicated problem. And even if we're talking about flash and embedded devices, the good news is if you assume that 10 years is enough time for people to update their embedded OS builds, and that the vast majority of deployed devices will probably only be in service for 10-15 years, we do have enough time to make file system format changes, although admittedly we can't afford to dilly-dally. I have a number of file systems older than any device they are sitting on. RAID allows individual disks to be swapped out, and when all disks have been swapped out, extend the file system online. The system doesn't even have to be taken offline in the process if it is possible to physically get to the drives with the system powered (e.g. 
hot plug bays), which is really damned nice.
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 09:07:00 Theodore Ts'o wrote:
> Yes, there are some ongoing discussions about changing the post-2038 encoding of the timestamp in ext4, which is why this hasn't been fixed yet. The main thing that's been missing is time for me to review the patches, and a good way of writing regression tests that will work (or at least not fail) on build environments with a 32-bit time_t and 32-bit-only capable versions of functions such as gmtime(3).
>
> And given current discussions, I may want to think about some kind of superblock flag to allow the use of a 32-bit unsigned encoding for file systems using a 128-byte inode, with a way of setting that flag after scanning the file system to make sure there are no times that are previous to January 1, 1970. (Or more generally, allow any epoch to be defined using a 64-bit time_t offset stored in the superblock...)

FWIW, I've gone through the other file system implementations once more. The most common pattern I've encountered is to have a read_inode function with

    inode->i_mtime = le32_to_cpu(raw_inode->mtime);

which results in interpreting the time as 'signed' on 32-bit kernels, but as 'unsigned' on 64-bit kernels. This could have been done intentionally to extend the valid time range to 2106 on 64-bit kernels, but it seems more likely that the code was written with no thought given to 64-bit time_t at all.

I see this pattern on p9fs (old protocol only), afs, bfs, ceph, efs, freevxfs, hpfs, jffs2, jfs, minix, nfsv2/v3 (this was clearly intentional and is spelled out in the RFC), qnx4, qnx6, reiserfs, squashfs, sysv, and ufs (protocol version 1 only).

The other behavior I see is to treat the on-disk 32-bit value as signed on both 32-bit and 64-bit kernels:

    inode->i_mtime = (signed)le32_to_cpu(raw_inode->mtime);

This seems to be done intentionally in all cases, to maintain compatibility between 32-bit and 64-bit kernels, but it's relatively rare: exofs, ext2/3/4 (good old inodes) and xfs are the only ones doing this. In case of ext2/3/4, the sign handling was introduced here: http://www.spinics.net/lists/linux-ext4/msg01758.html. exofs and xfs seem to have done it like this for all of git history.

Arnd
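As a rough illustration of the two read_inode patterns described above, the following standalone sketch shows how the same 32-bit on-disk value decodes under each interpretation. The helper names are hypothetical, and `int64_t` merely stands in for a 64-bit time_t; this is not kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel code): decode a 32-bit on-disk
 * seconds field into a 64-bit seconds value, which stands in for a
 * 64-bit time_t.
 */

/* Plain assignment on a 64-bit kernel: the value is zero-extended,
 * so the representable range is 1970..2106. */
int64_t decode_seconds_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}

/* Explicit signed cast (the exofs, ext2/3/4 and xfs pattern): the
 * value is sign-extended, so the range is 1901..2038 on any kernel.
 * The conversion to int32_t assumes the usual two's-complement
 * behaviour of common compilers. */
int64_t decode_seconds_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}
```

For a raw value of 0xffffffff the first helper yields 4294967295 (February 2106) while the second yields -1 (one second before the epoch); raw values up to 0x7fffffff decode identically under both.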
Re: [RFC 11/32] xfs: convert to struct inode_time
On Jun 2, 2014, at 6:56 AM, Arnd Bergmann a...@arndb.de wrote:
> On Sunday 01 June 2014 21:36:26 Nicolas Pitre wrote:
> > > For actually running kernels beyond 2038, the best idea I've seen so far is to disallow all broken code at compile time. I don't see a choice but to audit the entire kernel for invalid uses on both 32 and 64 bit in the next few years. A lot of code will get changed in the process so we can actually keep running 32-bit kernels and file systems, but other code will likely go away:
> > >
> > > * any system calls that pass a time_t, timeval or timespec on 32-bit systems return -ENOSYS, to ensure all user land uses the replacements we will put into place
> > > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden from the kernel, and all code using it left out.
> > > * ext2 and ext3 file system code will have to be disabled, but that's fine since ext4 can mount old file systems.
> >
> > Syscalls and libs can be "fixed". Existing filesystem content might not. So if you need to mount some old media in read-write mode after 2038 and that happens to contain an ext2 or similarly limited filesystem then it'd better just "work". Having the kernel refuse to modify the filesystem would be unacceptable.
>
> I think you misunderstood what I suggested: the intent is to avoid seeing things break in 2038 by making them break much earlier. We have a solution for ext2 file systems, it's called ext4, and we just need to ensure that everybody knows they have to migrate eventually.
>
> At some point before the mid-2030s, you should no longer be able to build a kernel that has support for ext2 or any other module that will run into bugs later. Until then (rather sooner than later), I'd like to get to the point where you can choose whether to include those modules at build time or not, and then get everybody to turn off that option and fix the bugs they run into. You wouldn't need that for a 2014-generation long-term support distro (rhel 7, sles 12, debian 7, ubuntu 14.04, ...), but perhaps for the next generation, or the one after that.

I’m wondering what should be done about NFS. A solution for NFS should match any scheme that is considered for local file systems, IMO.

NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813).

NFSv4 uses a signed 64-bit value where zero represents midnight UTC on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See the definition of nfstime4 in RFC 5661).

The NFSv4 protocol is probably not problematic, and NFSv3 should be out of the picture by 2038. But if changes are planned for dealing _now_ with timestamp issues, compatibility with NFSv3 is a consideration.

It is already the case that, via NFSv3, the Linux NFS client transmits timestamps earlier than 1970 as large positive numbers. Try this with xfstests generic/258.

Maybe nfs3_proc_setattr() should recognize pre-epoch timestamps and timestamps larger than can be represented in an unsigned 32-bit field and return an immediate error to the requesting application (like EINVAL).

If the Linux NFS server encounters a local file with a timestamp that cannot be represented via a u32, should it also return NFS3ERR_INVAL? RFC 1813 does not provide guidance on the behavior nor does it suggest a particular error status code. The Solaris 11 server appears to return NFS3ERR_INVAL in this case.

An alternative would be to “cap” the timestamps transmitted via NFSv3 by Linux, so that a pre-epoch timestamp is transmitted as zero, and a large timestamp is transmitted as UINT_MAX.

-- Chuck Lever chuck[dot]lever[at]oracle[dot]com
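The two NFSv3 options raised in the message above (reject out-of-range timestamps, or cap them) could be sketched roughly as follows. Both function names are hypothetical and do not correspond to existing Linux NFS code; the seconds value is assumed to arrive as a signed 64-bit integer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: "cap" a 64-bit seconds value into the unsigned
 * 32-bit seconds field of nfstime3. Pre-epoch times become 0 and
 * times past 2106 become UINT32_MAX.
 */
uint32_t nfs3_cap_seconds(int64_t sec)
{
	if (sec < 0)
		return 0;
	if (sec > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)sec;
}

/*
 * Hypothetical strict variant: refuse values that nfstime3 cannot
 * represent. Returns 0 on success and -1 when the caller should
 * report an error (for example EINVAL or NFS3ERR_INVAL).
 */
int nfs3_encode_seconds_strict(int64_t sec, uint32_t *out)
{
	if (sec < 0 || sec > (int64_t)UINT32_MAX)
		return -1;
	*out = (uint32_t)sec;
	return 0;
}
```

Capping keeps every operation succeeding at the cost of silently altering the stored time, whereas the strict variant surfaces the problem to the application immediately.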
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 11:04:23AM -0400, Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should match any scheme that is considered for local file systems, IMO.
>
> An alternative would be to “cap” the timestamps transmitted via NFSv3 by Linux, so that a pre-epoch timestamp is transmitted as zero, and a large timestamp is transmitted as UINT_MAX.

I wonder if it would make sense to try to promulgate via the Austin group, and possibly the C standards committee, the concept of a bit pattern (that might commonly be INT_MAX or UINT_MAX) that means "time unknown", or "time indefinite" or "we couldn't encode the time". We would then teach gmtime(3) and asctime(3) to print some appropriate message, and we could teach programs like find (with the -mtime option), make, tmpwatch, et al., that they can't make any presumption about the comparability of any timestamp which has a value of TIME_UNDEFINED.

It would be problematic for time(2) or gettimeofday(2) to return TIME_UNDEFINED, since there are programs that care about time ticking forward, but I could imagine a new interface which would be permitted to return a flag indicating that we don't know the current time (because the CMOS battery had run down, etc.) so instead we're going to be counting the number of seconds since the system was booted.

- Ted
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> I wonder if it would make sense to try to promulgate via the Austin group, and possibly the C standards committee, the concept of a bit pattern (that might commonly be INT_MAX or UINT_MAX) that means "time unknown", or "time indefinite" or "we couldn't encode the time".

(time_t)-1 already has this meaning for some calls (e.g. time(2)). However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately something similar applies to all possible bit patterns, certainly within the range of an int.

> We would then teach gmtime(3) and asctime(3) to print some appropriate message, and we could teach programs like find (with the -mtime option), make, tmpwatch, et al., that they can't make any presumption about the comparability of any timestamp which has a value of TIME_UNDEFINED.
>
> It would be problematic for time(2) or gettimeofday(2) to return TIME_UNDEFINED, since there are programs that care about time ticking forward, but I could imagine a new interface which would be permitted to return a flag indicating that we don't know the current time (because the CMOS battery had run down, etc.) so instead we're going to be counting the number of seconds since the system was booted.

This assumes that we actually know that that is the case, which may be an aggressive assumption.

-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 11:04:23 Chuck Lever wrote:
> I’m wondering what should be done about NFS. A solution for NFS should match any scheme that is considered for local file systems, IMO.
>
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813).
>
> NFSv4 uses a signed 64-bit value where zero represents midnight UTC on January 1, 1970, and an unsigned 32-bit value for nanoseconds. (See the definition of nfstime4 in RFC 5661).
>
> The NFSv4 protocol is probably not problematic, and NFSv3 should be out of the picture by 2038. But if changes are planned for dealing _now_ with timestamp issues, compatibility with NFSv3 is a consideration.
>
> It is already the case that, via NFSv3, the Linux NFS client transmits timestamps earlier than 1970 as large positive numbers. Try this with xfstests generic/258.

If I read the code correctly, a pre-1970 timestamp will be sent as a large unsigned integer, but received as a post-2038 timestamp on 64-bit kernels, both in the nfs client and server code.

This behavior is clearly wrong, but it's the same bug that we have in lots of other file systems, and it makes sense to have the same fix everywhere, at least in the cases where we know what interpretation we actually want. NFS has the luxury of having an actual specification saying that the value is unsigned. For most of the legacy file systems, we can only make a guess at how other OSs would interpret the same numbers.

Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 10:12:37 H. Peter Anvin wrote:
> On 06/02/2014 08:31 AM, Theodore Ts'o wrote:
> > I wonder if it would make sense to try to promulgate via the Austin group, and possibly the C standards committee, the concept of a bit pattern (that might commonly be INT_MAX or UINT_MAX) that means "time unknown", or "time indefinite" or "we couldn't encode the time".
>
> (time_t)-1 already has this meaning for some calls (e.g. time(2)). However, this also means Wed Dec 31 23:59:59 UTC 1969, and unfortunately something similar applies to all possible bit patterns, certainly within the range of an int.

Worse than Wed Dec 31 23:59:59 UTC 1969, on NFSv3 it also means Sun Feb 7 07:28:15 CET 2106, and that is much harder to distinguish from a real future date. If we had the choice, I'd go for something like 1, i.e. Thu Jan 1 01:00:01 CET 1970.

> > We would then teach gmtime(3) and asctime(3) to print some appropriate message, and we could teach programs like find (with the -mtime option), make, tmpwatch, et al., that they can't make any presumption about the comparability of any timestamp which has a value of TIME_UNDEFINED.
> >
> > It would be problematic for time(2) or gettimeofday(2) to return TIME_UNDEFINED, since there are programs that care about time ticking forward, but I could imagine a new interface which would be permitted to return a flag indicating that we don't know the current time (because the CMOS battery had run down, etc.) so instead we're going to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be an aggressive assumption.

It's harder for time(2), but for the inode case, we can definitely detect when the file system specific representation overflows or underflows, which may be at a number of very different points in time.

Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813).

nfstime3 could be extended by redefining the otherwise unused nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit seconds field and an unsigned 30-bit nanoseconds field. This could represent 1970 +/- 272 years.

Servers could indicate they can understand the extended time format by adding a new FSINFO capability - FSF3_CANSETTIME_EX. Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending timestamps so old servers would be protected from new clients.

Old clients don't need to be protected from new servers because the on-the-wire bit pattern for dates between 1970 and 2106 stays the same, so they're no worse off than they were before. Arguably the new server ought to clamp out-of-range timestamps before sending them to old clients but that would need per-client state (and nfs3 is stateless.)

-- Roger
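One way to read the bit layout proposed above is sketched below: the top two bits of the 32-bit nanoseconds word carry bits 33 and 32 of a signed 34-bit seconds value, and the low 30 bits still carry nanoseconds. The struct and function names are hypothetical, and this is only an interpretation of the proposal, not an existing NFS encoding.

```c
#include <stdint.h>

#define NSEC_MASK 0x3fffffffu                 /* low 30 bits: nanoseconds */
#define SEC34_MIN (-(INT64_C(1) << 33))       /* smallest 34-bit signed value */
#define SEC34_MAX ((INT64_C(1) << 33) - 1)    /* largest 34-bit signed value */

/* Hypothetical on-the-wire pair, mirroring the two words of nfstime3. */
struct wire_time {
	uint32_t seconds;
	uint32_t nseconds;
};

/* Pack seconds/nanoseconds; returns -1 if either is out of range. */
int wire_time_pack(int64_t sec, uint32_t nsec, struct wire_time *out)
{
	uint64_t u;

	if (sec < SEC34_MIN || sec > SEC34_MAX || nsec > 999999999u)
		return -1;
	u = (uint64_t)sec & UINT64_C(0x3ffffffff);  /* 34-bit two's complement */
	out->seconds = (uint32_t)u;                 /* seconds bits 31..0 */
	out->nseconds = nsec | ((uint32_t)(u >> 32) << 30); /* bits 33,32 on top */
	return 0;
}

/* Unpack, sign-extending the 34-bit seconds value. */
void wire_time_unpack(const struct wire_time *in, int64_t *sec, uint32_t *nsec)
{
	uint64_t u = (uint64_t)in->seconds |
		     ((uint64_t)(in->nseconds >> 30) << 32);

	*sec = (int64_t)u;
	if (u & (UINT64_C(1) << 33))        /* sign bit of the 34-bit field */
		*sec -= INT64_C(1) << 34;
	*nsec = in->nseconds & NSEC_MASK;
}
```

For any seconds value between 0 and 2^32 - 1 the two extra bits are zero, so both words are identical to what an unmodified nfstime3 encoder would send, which matches the compatibility argument in the message.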
Re: [RFC 11/32] xfs: convert to struct inode_time
On Jun 2, 2014, at 2:58 PM, Roger Willcocks ro...@filmlight.ltd.uk wrote:
> On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> > NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813).
>
> nfstime3 could be extended by redefining the otherwise unused nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit seconds field and an unsigned 30-bit nanoseconds field. This could represent 1970 +/- 272 years.
>
> Servers could indicate they can understand the extended time format by adding a new FSINFO capability - FSF3_CANSETTIME_EX. Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending timestamps so old servers would be protected from new clients.

You would have to get the IETF’s NFSv4 working group to sign off on this change. Otherwise, Linux would be the only NFSv3 implementation that supports the extension. But I suspect the answer you’d get is “Use NFSv4.”

> Old clients don't need to be protected from new servers because the on-the-wire bit pattern for dates between 1970 and 2106 stays the same, so they're no worse off than they were before. Arguably the new server ought to clamp out-of-range timestamps before sending them to old clients but that would need per-client state (and nfs3 is stateless.)

There’s no reliable way in NFSv3 for clients and servers to identify the software running on the peer. Practically speaking, you should assume that the NFSv3 protocol is never going to change.

-- Chuck Lever chuck[dot]lever[at]oracle[dot]com
Re: [RFC 11/32] xfs: convert to struct inode_time
On Monday 02 June 2014 15:04:27 Chuck Lever wrote:
> On Jun 2, 2014, at 2:58 PM, Roger Willcocks ro...@filmlight.ltd.uk wrote:
> > On Mon, 2014-06-02 at 11:04 -0400, Chuck Lever wrote:
> > > NFSv2/3 timestamps are a pair of unsigned 32-bit values: one value for seconds since midnight GMT Jan 1, 1970, and one value for nanoseconds. (See the definition of nfstime3 in RFC 1813).
> >
> > nfstime3 could be extended by redefining the otherwise unused nanoseconds bits{31,30} as seconds{33,32}, to give a (signed) 34-bit seconds field and an unsigned 30-bit nanoseconds field. This could represent 1970 +/- 272 years.
> >
> > Servers could indicate they can understand the extended time format by adding a new FSINFO capability - FSF3_CANSETTIME_EX. Clients would use a new SET_TO_CLIENT_TIME_EX time_how enum when sending timestamps so old servers would be protected from new clients.
>
> You would have to get the IETF’s NFSv4 working group to sign off on this change. Otherwise, Linux would be the only NFSv3 implementation that supports the extension. But I suspect the answer you’d get is “Use NFSv4.”

While I've never dealt with NFS standardization, I'd assume this is a workable answer. The NFSv2 and NFSv3 specifications clearly define a valid range of times until 2106 using unsigned seconds, and that should really give enough time to migrate to something better (not necessarily NFSv4).

Arnd
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 10:12:37AM -0700, H. Peter Anvin wrote:
> > It would be problematic for time(2) or gettimeofday(2) to return TIME_UNDEFINED, since there are programs that care about time ticking forward, but I could imagine a new interface which would be permitted to return a flag indicating that we don't know the current time (because the CMOS battery had run down, etc.) so instead we're going to be counting the number of seconds since the system was booted.
>
> This assumes that we actually know that that is the case, which may be an aggressive assumption.

We won't know if the RTC clock is wrong, true --- but the kernel will know if (a) the hardware doesn't have an RTC clock at all, or if (b) the RTC clock is ticking some time that can't be encoded using the current time_t type. So in that case, the fallback would be for the kernel to tick starting with time_t == 0 when the system is initially booted, and the "time indefinite" flag would be set.

Now assume that we have a new system call, gettimestampofday(2), which returns a new timestamp structure which has a 64-bit ts_sec field, the ts_nsec field (ala struct timespec), and a ts_flags field, where the kernel could signal things like "time invalid", or "time can't be encoded in the legacy time_t type", or "I'm not sure if the time is correct" --- i.e., because the RTC battery isn't working. Not all hardware might be able to support the last, of course, but if the battery is low, or the system has been exposed to very low temperatures (or large amounts of cosmic radiation, etc.) the RTC time may just be plain wrong.

No system is going to be perfect, but it should be possible to make things better, at least for certain classes of hardware. And since we are already returning (time_t) -1 in some cases, we might as well try to make things a bit more formal.

- Ted
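A minimal sketch of the timestamp structure imagined above might look like the following. Only the field names ts_sec, ts_nsec and ts_flags come from the message; the struct name, the flag names and values, and the conversion helper are all hypothetical, and no such system call exists.

```c
#include <stdint.h>

/* Hypothetical flag bits for ts_flags; names and values are invented. */
#define TS_FLAG_INVALID   0x1u  /* the time is not known at all */
#define TS_FLAG_NO_LEGACY 0x2u  /* cannot be encoded in the legacy time_t */
#define TS_FLAG_UNCERTAIN 0x4u  /* e.g. the RTC battery may have failed */

/* Hypothetical result of the imagined gettimestampofday(2). */
struct timestamp {
	int64_t  ts_sec;    /* 64-bit seconds since the epoch */
	int32_t  ts_nsec;   /* nanoseconds, as in struct timespec */
	uint32_t ts_flags;  /* TS_FLAG_* bits */
};

/*
 * Hypothetical helper: convert to a legacy signed 32-bit seconds value.
 * Returns 0 on success, or -1 when the timestamp is flagged as invalid
 * or unencodable, or simply does not fit in 32 bits.
 */
int timestamp_to_legacy32(const struct timestamp *ts, int32_t *out)
{
	if (ts->ts_flags & (TS_FLAG_INVALID | TS_FLAG_NO_LEGACY))
		return -1;
	if (ts->ts_sec < INT32_MIN || ts->ts_sec > INT32_MAX)
		return -1;
	*out = (int32_t)ts->ts_sec;
	return 0;
}
```

Keeping the uncertainty in a separate flags word, rather than in a reserved seconds value, avoids overloading any bit pattern that is also a legitimate time.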
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> And since we are already returning (time_t) -1 in some cases, we might as well try to make things a bit more formal.

Are we? I am not aware of *Linux* actually using that.

-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> > And since we are already returning (time_t) -1 in some cases, we might as well try to make things a bit more formal.
>
> Are we? I am not aware of *Linux* actually using that.

Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per the Posix specification:

    SYSCALL_DEFINE1(time, time_t __user *, tloc)
    {
            time_t i = get_seconds();

            if (tloc) {
                    if (put_user(i,tloc))
                            return -EFAULT;
            }
            force_successful_syscall_return();
            return i;
    }

Cheers,

- Ted
Re: [RFC 11/32] xfs: convert to struct inode_time
On 06/02/2014 04:32 PM, Theodore Ts'o wrote:
> On Mon, Jun 02, 2014 at 03:32:35PM -0700, H. Peter Anvin wrote:
> > On 06/02/2014 03:29 PM, Theodore Ts'o wrote:
> > > And since we are already returning (time_t) -1 in some cases, we might as well try to make things a bit more formal.
> >
> > Are we? I am not aware of *Linux* actually using that.
>
> Linux's time(2) can return (time_t) -1 and set errno to EFAULT, per the Posix specification:
>
>     SYSCALL_DEFINE1(time, time_t __user *, tloc)
>     {
>             time_t i = get_seconds();
>
>             if (tloc) {
>                     if (put_user(i,tloc))
>                             return -EFAULT;
>             }
>             force_successful_syscall_return();
>             return i;
>     }

OK, I guess I should have said... other than for -EFAULT. I just don't know of anyone using time(2) with an argument other than NULL.

-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Mon, Jun 02, 2014 at 01:43:44PM +0200, Arnd Bergmann wrote:
> On Monday 02 June 2014 10:28:22 Dave Chinner wrote:
> > On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> > > On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > > > In my list at http://kernelnewbies.org/y2038, I found that almost all file systems at least support times until 2106, because they treat the on-disk value as unsigned on 64-bit systems, or they use a completely different representation. My guess is that somebody earlier spent a lot of work on making that happen.
> > > >
> > > > The exceptions are:
> > > >
> > > > * exofs uses signed values, which can probably be changed to be consistent with the others.
> > > > * isofs has a bug that limits it until 2027 on architectures with a signed 'char' type (otherwise it's 2155).
> > > > * udf can represent times for many thousands of years through a 16-bit year representation, but the code to convert to epoch uses a const array that ends at 2038.
> > > > * afs uses signed seconds and can probably be fixed
> > > > * coda relies on user space time representation getting passed through an ioctl.
> > > > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, where they really use signed.
> > > >
> > > > I was confused about XFS since I didn't notice that there are separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected XFS to also use the 1970-2106 time range on 64-bit systems today.
> > >
> > > You've missed an awful lot more than just the implications for the core kernel code.
> > >
> > > There's a good chance such changes propagate to APIs elsewhere in the filesystems, because something you haven't realised is that XFS effectively exposes the on-disk timestamp format directly to userspace via the bulkstat interface (see struct xfs_bstat). It also affects the XFS open-by-handle ioctl and the swap extent ioctl used by the online defragmenter.
>
> I really didn't look at them at all, as ioctl is very late on my mental list of things to change. I do realize that a lot of drivers and file systems do have ioctls that pass time values and we need to address them one by one.
>
> I just looked at the ioctls you mentioned but don't see how open-by-handle is affected by this. Can you point me to what you mean?

Sorry, I misremembered how some of the XFS open-by-handle code works in userspace (XFS has a pretty rich open-by-handle ioctl() interface that predates the kernel syscalls by at least 10 years). Basically there is code in userspace that uses the information returned from bulkstat to construct file handles to pass to the open-by-handle ioctls. xfs_fsr then uses the combination of open-by-handle from the bulkstat output and the bulkstat output to feed into the swap extent ioctls.

i.e. the filesystem's idea of what time is is passed to userspace as an opaque cookie in this case, but it is not used directly by the open-by-handle interfaces like I implied it was.

> > Just to put that in context, here's the kernel patch to add extended epoch support to XFS. It's completely untested as I haven't done any userspace code changes to enable the feature. However, it should give you an indication of how far the simple act of changing the kernel time representation spread through the filesystem. This does not include any of the VFS infrastructure for specifying the range of supported timestamps. It survives some smoke testing, but dies when the online defragmenter starts using the bulkstat and swap extent ioctls (the assert in xfs_inode_time_from_epoch() fires), so I probably don't have that all sorted correctly yet...
> >
> > To test extended epoch support, however, I need some fstests that define and validate the behaviour of the new syscalls - until we get those we can't validate that the filesystem follows the spec properly. I also suspect we are going to need an interface to query the supported range of timestamps from a filesystem so that we can test boundary conditions in an automated fashion
>
> Thanks a lot for having an initial look at this yourself!
>
> I'd still consider the two problems largely orthogonal.

Depends how you look at it. You can't extend the kernel's idea of time without permanent storage being able to specify the supported bounds - that's a non-negotiable aspect of introducing extended epoch timestamp support. The actual addition of extended timestamp support to each individual filesystem is orthogonal to the introduction of the struct inode_time, but doing this addition properly is dependent on the VFS infrastructure being there in the first place.

> My patch set (at least with the 64-bit tv_sec) just gets 32-bit kernels to behave more like 64-bit kernels regarding inode time stamps, which does impact all the file systems that use a 64-bit time or the NFS unsigned epoch (1970-2106), while your patch extends the file system internal epoch (1901-2038 for XFS) so it can be used
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so far is to disallow all broken code at compile time. I don't see a choice but to audit the entire kernel for invalid uses on both 32 and 64 bit in the next few years. A lot of code will get changed in the process so we can actually keep running 32-bit kernels and file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on 32-bit systems return -ENOSYS, to ensure all user land uses the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's fine since ext4 can mount old file systems.
>
> Syscalls and libs can be "fixed". Existing filesystem content might not. So if you need to mount some old media in read-write mode after 2038 and that happens to contain an ext2 or similarly limited filesystem then it'd better just "work". Having the kernel refuse to modify the filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

    inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just on the IO path and the "legacy filesystem timestamp" problem is "solved".

New interfaces need to return errors when an out-of-range parameter is set. And right now, >epoch dates are out of range for most filesystems, and so we need to handle that condition appropriately. Silent date overflow == filesystem corruption, and as such I'm going to error out such conditions in the filesystem regardless of what the userspace API says.

Filesystems place all sorts of userspace visible limits on storage - ever tried to create a file >16TB on ext4? The on-disk format doesn't support it, so it returns an out of range error (E2BIG, I think) if you try. XFS, OTOH, handles this just fine and so it continues to work. It's exactly the same with timestamps - there's a physical limit to what can sanely be stored in any given filesystem and it's an *error condition* to go beyond that limit.

Cheers,

Dave.
-- Dave Chinner da...@fromorbit.com
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with a signed 32-bit timestamps, any attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be (silently) clamped to 0x7fffffff and that value (the last representable time) used as an overflow indicator. The filesystem driver should convert that value into a corresponding overflow value for whatever kernel internal time representation being used when read back, and this should be propagated up to user space. It should not be a hard error otherwise, as you rightfully stated, everything non read-only would come to a halt on that day.
>
> I don't think there is much of a difference between not being able to write at all and all newly written files having the same timestamp, causing random things to break differently.

Well, in one case you have a crash certitude. In the other case you have some probability that your system might still be usable.

> The clamp to the maximum supported time stamp sounds like a reasonable choice for 'utimens' and related syscalls for the case of someone setting an arbitrary future date beyond what the file system can represent. Then again, I don't see a reason why that shouldn't just cause an error to be returned.

Resilience is better than outright failure.

> For actually running kernels beyond 2038, the best idea I've seen so far is to disallow all broken code at compile time. I don't see a choice but to audit the entire kernel for invalid uses on both 32 and 64 bit in the next few years. A lot of code will get changed in the process so we can actually keep running 32-bit kernels and file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on 32-bit systems return -ENOSYS, to ensure all user land uses the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's fine since ext4 can mount old file systems.

Syscalls and libs can be "fixed". Existing filesystem content might not. So if you need to mount some old media in read-write mode after 2038 and that happens to contain an ext2 or similarly limited filesystem then it'd better just "work". Having the kernel refuse to modify the filesystem would be unacceptable.

Nicolas
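The clamp-with-overflow-indicator idea from the quoted text in this message could be sketched as below for a signed 32-bit on-disk field. The function names are hypothetical, and clamping at the lower bound as well is an added assumption that the message does not discuss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last representable time in a signed 32-bit field (0x7fffffff). */
#define LEGACY_TIME_MAX INT32_MAX

/*
 * Hypothetical write path: silently clamp instead of failing.
 * Clamping at INT32_MIN too is an assumption made for symmetry.
 */
int32_t legacy_store_seconds(int64_t sec)
{
	if (sec > LEGACY_TIME_MAX)
		return LEGACY_TIME_MAX;
	if (sec < INT32_MIN)
		return INT32_MIN;
	return (int32_t)sec;
}

/*
 * Hypothetical read path: widen the stored value and report whether it
 * equals the sentinel, so a caller can propagate "overflowed" upwards.
 */
bool legacy_load_seconds(int32_t stored, int64_t *sec)
{
	*sec = stored;
	return stored == LEGACY_TIME_MAX;
}
```

The cost of this scheme is that a file genuinely stamped at the last representable second becomes indistinguishable from an overflowed one.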
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sun, Jun 01, 2014 at 10:24:37AM +1000, Dave Chinner wrote:
> On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> > In my list at http://kernelnewbies.org/y2038, I found that almost
> > all file systems at least handle times until 2106, because they treat
> > the on-disk value as unsigned on 64-bit systems, or they use
> > a completely different representation. My guess is that somebody
> > earlier spent a lot of work on making that happen.
> >
> > The exceptions are:
> >
> > * exofs uses signed values, which can probably be changed to be
> >   consistent with the others.
> > * isofs has a bug that limits it until 2027 on architectures with
> >   a signed 'char' type (otherwise it's 2155).
> > * udf can represent times for many thousands of years through a
> >   16-bit year representation, but the code to convert to epoch
> >   uses a const array that ends at 2038.
> > * afs uses signed seconds and can probably be fixed
> > * coda relies on user space time representation getting passed
> >   through an ioctl.
> > * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
> >   where they really use signed.
> >
> > I was confused about XFS since I didn't notice that there are
> > separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> > XFS to also use the 1970-2106 time range on 64-bit systems today.
>
> You've missed an awful lot more than just the implications for the
> core kernel code.
>
> There's a good chance such changes propagate to APIs elsewhere in
> the filesystems, because something you haven't realised is that XFS
> effectively exposes the on-disk timestamp format directly to
> userspace via the bulkstat interface (see struct xfs_bstat). It also
> affects the XFS open-by-handle ioctl and the swap extent ioctl used
> by the online defragmenter.
>
> IOWs, if we are changing the on-disk timestamp format then this
> affects several ioctl()s and hence quite a few of the XFS userspace
> utilities. The hardest to fix will be xfsdump which would need a new
> dump format to store the extended timestamp ranges, and then
> xfs_restore will need to be able to handle restoring such timestamps
> on filesystems that don't have extended timestamp support...
>
> Put simply, changing the structure of system time isn't as
> straightforward as changing the kernel structures. System time gets
> stored permanently, and that has a cascade effect through the kernel
> all the way to all of the filesystem utilities that know about that
> permanent storage in some way.
>
> So yes, you can change the kernel definition, but until the
> permanent storage of system time can be extended to support the same
> range as the kernel the *system* will still have nasty, silent epoch
> overflow, truncation or corruption issues.

Just to put that in context, here's the kernel patch to add extended
epoch support to XFS. It's completely untested as I haven't done any
userspace code changes to enable the feature. However, it should give
you an indication of how far the simple act of changing the kernel
time representation spreads through the filesystem.

This does not include any of the VFS infrastructure for specifying the
range of supported timestamps. It survives some smoke testing, but
dies when the online defragmenter starts using the bulkstat and swap
extent ioctls (the assert in xfs_inode_time_from_epoch() fires), so I
probably don't have that all sorted correctly yet...

To test extended epoch support, however, I need some fstests that
define and validate the behaviour of the new syscalls - until we get
those we can't validate that the filesystem follows the spec properly.
I also suspect we are going to need an interface to query the
supported range of timestamps from a filesystem so that we can test
boundary conditions in an automated fashion.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

xfs: support timestamps beyond Unix epochs

From: Dave Chinner

The 32-bit second counters in timestamps are too small to represent
time beyond the end of the current Unix epoch (Jan 2038) correctly.
Extend the on-disk format for a timestamp to include an 8-bit epoch
counter so that we can extend time for up to 255 Unix epochs. This
should be good for representing timestamps from 1970 to somewhere
around 19,000 A.D.

Signed-off-by: Dave Chinner
---
 fs/xfs/time.h            |  7 --
 fs/xfs/xfs_bmap_util.c   | 35 +---
 fs/xfs/xfs_dinode.h      | 48 ++-
 fs/xfs/xfs_fs.h          |  9 +++-
 fs/xfs/xfs_fsops.c       |  5 +++-
 fs/xfs/xfs_inode.c       | 16 ++---
 fs/xfs/xfs_inode_buf.c   |  8 +++
 fs/xfs/xfs_ioctl32.c     |  3 +++
 fs/xfs/xfs_ioctl32.h     |  5 +++-
 fs/xfs/xfs_iops.c        | 59 +++-
 fs/xfs/xfs_itable.c      | 12 ++
 fs/xfs/xfs_log_format.h  |  4
 fs/xfs/xfs_sb.h          | 12 +-
 fs/xfs/xfs_trans_inode.c |  2 +-
 14 files changed, 175 insertions(+), 50 deletions(-)
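To make the epoch-counter arithmetic in the commit message concrete, here is a minimal sketch in C. It is not taken from the patch: the struct disk_time layout, the function names, and the choice of 2^31 seconds per epoch are all assumptions made for illustration. Under those assumptions an 8-bit counter covers 2^39 seconds, which ends a little after the year 19,000, in line with the estimate in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: one "epoch" spans 2^31 seconds, the non-negative range of
 * a signed 32-bit second counter. The real patch may use another layout. */
#define EPOCH_SHIFT 31
#define EPOCH_SECS  (INT64_C(1) << EPOCH_SHIFT)
#define EPOCH_COUNT 256                 /* values of an 8-bit counter */

struct disk_time {                      /* hypothetical on-disk timestamp */
    int32_t sec;                        /* seconds within the current epoch */
    uint8_t epoch;                      /* completed epochs since 1970 */
};

/* Split a 64-bit second count into (epoch, sec).
 * Returns 0 on success, -1 if the value is not representable. */
static int disk_time_encode(int64_t secs, struct disk_time *out)
{
    if (secs < 0 || secs >= EPOCH_COUNT * EPOCH_SECS)
        return -1;
    out->epoch = (uint8_t)(secs >> EPOCH_SHIFT);
    out->sec = (int32_t)(secs & (EPOCH_SECS - 1));
    return 0;
}

/* Rebuild the 64-bit second count from the on-disk fields. */
static int64_t disk_time_decode(const struct disk_time *t)
{
    return ((int64_t)t->epoch << EPOCH_SHIFT) + t->sec;
}

/* Approximate last representable year, using a mean Gregorian year
 * of 31556952 seconds. */
static int64_t disk_time_last_year(void)
{
    return 1970 + (EPOCH_COUNT * EPOCH_SECS - 1) / INT64_C(31556952);
}
```

Values before 1970 are rejected in this sketch, which matches the "from 1970" wording but says nothing about how the real patch treats the existing 1901-1970 range.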
Re: [RFC 11/32] xfs: convert to struct inode_time
Perhaps we should make this a kernel command line option instead, with
the settings: error out on dates outside the standard window, or a date
indicating the earliest date that should be recognized and do windowing
(0 for no windowing, 1970 for retconning the Unix epoch as
unsigned...)

But again, the kernel is probably the least of the problems here...

On June 1, 2014 12:56:52 PM PDT, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:07 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last
> > representable time) used as an overflow indicator. The filesystem
> > driver should convert that value into a corresponding overflow value
> > for whatever kernel internal time representation is being used when
> > read back, and this should be propagated up to user space. It should
> > not be a hard error; otherwise, as you rightfully stated, everything
> > non read-only would come to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.
>
> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.
>
> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
>   32-bit systems return -ENOSYS, to ensure all user land uses
>   the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>   from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
>   fine since ext4 can mount old file systems.
> * until xfs gets extended, we can also disable it at build time.
>
> For most users, we probably want to leave all that enabled by
> default until we get much closer to 2038, but a compile time
> option should allow us to test what works or doesn't, and it
> can be set by embedded developers that want to ensure their
> code keeps running for the next few decades.
>
> 	Arnd

-- 
Sent from my mobile phone. Please pardon brevity and lack of formatting.
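The windowing setting suggested at the top of this message could work roughly as sketched below. This is only an illustration: window_time() is a hypothetical helper, and the window start is passed in seconds since 1970 rather than as the year a command line option would presumably take. The idea is that a 32-bit on-disk value is mapped to the one 64-bit time inside a 2^32-second window that has the same low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit on-disk second counter to the unique 64-bit time t with
 *   window_start <= t < window_start + 2^32   and   t == raw (mod 2^32).
 * window_time() is a hypothetical name used only for this sketch. */
static int64_t window_time(uint32_t raw, int64_t window_start)
{
    /* Distance from the window start to raw, computed modulo 2^32. */
    uint32_t offset = (uint32_t)(raw - (uint32_t)window_start);

    return window_start + (int64_t)offset;
}
```

With window_start = 0 the on-disk value is read as unsigned (1970-2106); with window_start = -2^31 it is read as signed (1901-2038), which is what a signed 32-bit second counter gives today.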
Re: [RFC 11/32] xfs: convert to struct inode_time
On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > readonly if not in reality then in practice.
>
> For those (legacy) filesystems with signed 32-bit timestamps, any
> attempt to create a timestamp past Jan 19 03:14:07 2038 UTC should be
> (silently) clamped to 0x7fffffff and that value (the last representable
> time) used as an overflow indicator. The filesystem driver should
> convert that value into a corresponding overflow value for whatever
> kernel internal time representation is being used when read back, and
> this should be propagated up to user space. It should not be a hard
> error; otherwise, as you rightfully stated, everything non read-only
> would come to a halt on that day.

I don't think there is much of a difference between not being able to
write at all and all newly written files having the same timestamp,
causing random things to break differently.

The clamp to the maximum supported time stamp sounds like a reasonable
choice for 'utimens' and related syscalls for the case of someone
setting an arbitrary future date beyond what the file system can
represent. Then again, I don't see a reason why that shouldn't just
cause an error to be returned.

For actually running kernels beyond 2038, the best idea I've seen so
far is to disallow all broken code at compile time. I don't see
a choice but to audit the entire kernel for invalid uses on both
32 and 64 bit in the next few years. A lot of code will get changed
in the process so we can actually keep running 32-bit kernels and
file systems, but other code will likely go away:

* any system calls that pass a time_t, timeval or timespec on
  32-bit systems return -ENOSYS, to ensure all user land uses
  the replacements we will put into place
* The definition of 'time_t', 'timeval' and 'timespec' can be hidden
  from the kernel, and all code using it left out.
* ext2 and ext3 file system code will have to be disabled, but that's
  fine since ext4 can mount old file systems.
* until xfs gets extended, we can also disable it at build time.

For most users, we probably want to leave all that enabled by
default until we get much closer to 2038, but a compile time
option should allow us to test what works or doesn't, and it
can be set by embedded developers that want to ensure their
code keeps running for the next few decades.

	Arnd
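The two choices weighed above for 'utimens' on a filesystem with signed 32-bit seconds, clamping or returning an error, can be put side by side in a short sketch. Everything named here (enum range_policy, store_time32()) is hypothetical and exists only for illustration; it is not kernel code, and -ERANGE is just one of the error codes floated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical policy switch, for illustration only. */
enum range_policy { POLICY_CLAMP, POLICY_ERROR };

/* Store a 64-bit second count into a signed 32-bit on-disk field.
 * Returns 0 on success, or -ERANGE when the value does not fit and the
 * policy asks for a hard error. *disk is left untouched on error. */
static int store_time32(int64_t secs, enum range_policy policy, int32_t *disk)
{
    if (secs > INT32_MAX || secs < INT32_MIN) {
        if (policy == POLICY_ERROR)
            return -ERANGE;
        /* Clamp to the nearest representable value. */
        secs = secs > INT32_MAX ? INT32_MAX : INT32_MIN;
    }
    *disk = (int32_t)secs;
    return 0;
}
```

With POLICY_CLAMP every write after the rollover stores the same value, which is the "all newly written files having the same timestamp" outcome; with POLICY_ERROR the caller finds out immediately.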
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sun, 1 Jun 2014, Arnd Bergmann wrote:

> On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > > readonly if not in reality then in practice.
> >
> > For those (legacy) filesystems with signed 32-bit timestamps, any
> > attempt to create a timestamp past Jan 19 03:14:07 2038 UTC should be
> > (silently) clamped to 0x7fffffff and that value (the last
> > representable time) used as an overflow indicator. The filesystem
> > driver should convert that value into a corresponding overflow value
> > for whatever kernel internal time representation is being used when
> > read back, and this should be propagated up to user space. It should
> > not be a hard error; otherwise, as you rightfully stated, everything
> > non read-only would come to a halt on that day.
>
> I don't think there is much of a difference between not being able to
> write at all and all newly written files having the same timestamp,
> causing random things to break differently.

Well, in one case you have the certainty of a crash. In the other case
you have some probability that your system might still be usable.

> The clamp to the maximum supported time stamp sounds like a reasonable
> choice for 'utimens' and related syscalls for the case of someone
> setting an arbitrary future date beyond what the file system can
> represent. Then again, I don't see a reason why that shouldn't just
> cause an error to be returned.

Resilience is better than outright failure.

> For actually running kernels beyond 2038, the best idea I've seen so
> far is to disallow all broken code at compile time. I don't see
> a choice but to audit the entire kernel for invalid uses on both
> 32 and 64 bit in the next few years. A lot of code will get changed
> in the process so we can actually keep running 32-bit kernels and
> file systems, but other code will likely go away:
>
> * any system calls that pass a time_t, timeval or timespec on
>   32-bit systems return -ENOSYS, to ensure all user land uses
>   the replacements we will put into place
> * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
>   from the kernel, and all code using it left out.
> * ext2 and ext3 file system code will have to be disabled, but that's
>   fine since ext4 can mount old file systems.

Syscalls and libs can be fixed. Existing filesystem content might not.
So if you need to mount some old media in read-write mode after 2038
and that happens to contain an ext2 or similarly limited filesystem
then it'd better just work. Having the kernel refuse to modify the
filesystem would be unacceptable.

Nicolas
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sun, Jun 01, 2014 at 09:36:26PM -0400, Nicolas Pitre wrote:
> On Sun, 1 Jun 2014, Arnd Bergmann wrote:
> > On Saturday 31 May 2014 11:46:16 Nicolas Pitre wrote:
> > For actually running kernels beyond 2038, the best idea I've seen so
> > far is to disallow all broken code at compile time. I don't see
> > a choice but to audit the entire kernel for invalid uses on both
> > 32 and 64 bit in the next few years. A lot of code will get changed
> > in the process so we can actually keep running 32-bit kernels and
> > file systems, but other code will likely go away:
> >
> > * any system calls that pass a time_t, timeval or timespec on
> >   32-bit systems return -ENOSYS, to ensure all user land uses
> >   the replacements we will put into place
> > * The definition of 'time_t', 'timeval' and 'timespec' can be hidden
> >   from the kernel, and all code using it left out.
> > * ext2 and ext3 file system code will have to be disabled, but that's
> >   fine since ext4 can mount old file systems.
>
> Syscalls and libs can be fixed. Existing filesystem content might not.
> So if you need to mount some old media in read-write mode after 2038
> and that happens to contain an ext2 or similarly limited filesystem
> then it'd better just work. Having the kernel refuse to modify the
> filesystem would be unacceptable.

We can already tell the VFS/filesystems not to update timestamps:

	inode->i_flags |= S_NOATIME | S_NOCMTIME;

Just enforce that everywhere (i.e. notify_change()) rather than just
on the IO path and the legacy filesystem timestamp problem is solved.

New interfaces need to return errors when an out-of-range parameter is
set. And right now, dates beyond the epoch are out of range for most
filesystems, and so we need to handle that condition appropriately.
Silent date overflow == filesystem corruption, and as such I'm going
to error out such conditions in the filesystem regardless of what the
userspace API says.

Filesystems place all sorts of userspace visible limits on storage -
ever tried to create a file > 16TB on ext4? The on-disk format doesn't
support it, so it returns an out of range error (EFBIG, I think) if
you try. XFS, OTOH, handles this just fine and so it continues to
work. It's exactly the same with timestamps - there's a physical limit
to what can sanely be stored in any given filesystem and it's an
*error condition* to go beyond that limit.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
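As a rough illustration of the point about physical limits, the sketch below lets a filesystem describe its timestamp range and has the caller check against it before storing. It also shows what a silent truncation to 32 signed bits would read back as. The struct fs_time_range type and both helpers are hypothetical names invented here, not an existing VFS interface, and -ERANGE is only one of the candidate error codes mentioned in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical description of what a filesystem can store, in seconds
 * relative to 1970. Not an existing kernel structure. */
struct fs_time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/* Signed 32-bit seconds (1901-2038) and unsigned 32-bit (1970-2106). */
static const struct fs_time_range signed32_range = { INT32_MIN, INT32_MAX };
static const struct fs_time_range unsigned32_range = { 0, UINT32_MAX };

/* Hard-fail check: 0 if secs can be stored, -ERANGE otherwise. */
static int check_time(const struct fs_time_range *r, int64_t secs)
{
    return (secs < r->min_sec || secs > r->max_sec) ? -ERANGE : 0;
}

/* What gets read back if secs is silently truncated to its low 32 bits
 * and then interpreted as a signed value. */
static int64_t wrap_signed32(int64_t secs)
{
    uint32_t low = (uint32_t)secs;

    if (low >= UINT32_C(0x80000000))
        return (int64_t)low - (INT64_C(1) << 32);
    return (int64_t)low;
}
```

One second past the signed 32-bit limit is rejected by check_time() for signed32_range, whereas wrap_signed32() shows the silent alternative: the same value comes back as a date in December 1901.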
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range than anything we currently support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > for a timestamp that the file cannot store correctly
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> /* Deliberately simplified */
> gettimeofdayns( ...);
> utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),

Yes. Hard fail so overflows are in your face and we know exactly what
is going to cause silent timestamp screwups when the epoch

> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?

Filesystems are going to have to change their on-disk formats, so we'd
do that just like we do every other on-disk format change. With
feature bits and translation layers, new ioctl structures, etc.
Depending on the amount of work necessary, some filesystems could do
this in 3.16, others it might be 3.20 before everything is sorted out
across the kernel and userspace code...

Either way, the hard fail problem goes away as each filesystem is
converted. Further, if we have regression tests then new filesystems
are guaranteed to be designed to handle 2038 epoch rollover, and so in
a year or two this "hard fail" is effectively a non-problem. If
someone breaks something in future, then we'll know about it pretty
quickly.

> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality then in practice.

Yup. If we can't do what the user wants without the user thinking
corruption has occurred, then the only thing we are left with is "shut
down the filesystem" error handling. Kind of like using BUG() rather
than returning an error. That's why we need to be able to hard fail
and return an error.

However, we've got 20+ years to fix our current filesystems and all
their support code to ensure this doesn't happen. In the meantime,
having stuff hard fail is a great way to ensure that filesystems get
fixed sooner rather than later...

> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.

Yup, that's a great way of flushing out problems 20 years before they
really matter.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote:
> On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> > On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > > >
> > > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > > timestamp that it can't represent on disk otherwise Bad Stuff will
> > > > happen,
> > >
> > > Actually it is questionable if it is worse to reject a timestamp or just
> > > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > > exist, go away."
> >
> > I think having the new system calls being able to
> > return EINVAL if the value cannot be stored permanently on disk
> > correctly is the right thing to do. Having it silently mangled
> > by the filesystem and returning "everything is just fine, trust me"
> > is close to the worst solution I can think of. That's exactly what
> > leads to overflow bugs occurring
>
> While going through the file systems, I was wondering whether
> we should have the times stop at the end of each file system's
> epoch rather than wrap around.
>
> > > > and filesystems have to be able to specify in their on
> > > > disk format what timestamp encoding is being used. The solution will
> > > > be different for every filesystem that needs to support time beyond
> > > > 2038.
> > >
> > > Actually the cutoff can be really different for each filesystem, not
> > > necessarily 2038. However, I maintain the above still holds.
> >
> > Sure, but all filesystems are supposed to handle at least the
> > current unix epoch.
>
> In my list at http://kernelnewbies.org/y2038, I found that almost
> all file systems at least handle times until 2106, because they treat
> the on-disk value as unsigned on 64-bit systems, or they use
> a completely different representation. My guess is that somebody
> earlier spent a lot of work on making that happen.
>
> The exceptions are:
>
> * exofs uses signed values, which can probably be changed to be
>   consistent with the others.
> * isofs has a bug that limits it until 2027 on architectures with
>   a signed 'char' type (otherwise it's 2155).
> * udf can represent times for many thousands of years through a
>   16-bit year representation, but the code to convert to epoch
>   uses a const array that ends at 2038.
> * afs uses signed seconds and can probably be fixed
> * coda relies on user space time representation getting passed
>   through an ioctl.
> * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds,
>   where they really use signed.
>
> I was confused about XFS since I didn't notice that there are
> separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected
> XFS to also use the 1970-2106 time range on 64-bit systems today.

You've missed an awful lot more than just the implications for the
core kernel code.

There's a good chance such changes propagate to APIs elsewhere in
the filesystems, because something you haven't realised is that XFS
effectively exposes the on-disk timestamp format directly to
userspace via the bulkstat interface (see struct xfs_bstat). It also
affects the XFS open-by-handle ioctl and the swap extent ioctl used
by the online defragmenter.

IOWs, if we are changing the on-disk timestamp format then this
affects several ioctl()s and hence quite a few of the XFS userspace
utilities. The hardest to fix will be xfsdump which would need a new
dump format to store the extended timestamp ranges, and then
xfs_restore will need to be able to handle restoring such timestamps
on filesystems that don't have extended timestamp support...

Put simply, changing the structure of system time isn't as
straightforward as changing the kernel structures. System time gets
stored permanently, and that has a cascade effect through the kernel
all the way to all of the filesystem utilities that know about that
permanent storage in some way.

So yes, you can change the kernel definition, but until the
permanent storage of system time can be extended to support the same
range as the kernel the *system* will still have nasty, silent epoch
overflow, truncation or corruption issues.

> If we are using the variant of my patch that extends
> inode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.

"nothing should change" except for the fact that a 64 bit timestamp
gets silently truncated to 32 bits and the timestamp is not what the
user expects it to be. The user does not find out until the inode
passes out of cache and is re-read from disk, and then it's wrong.

To put it politely: that is broken, obnoxious behaviour and we don't
design new interfaces with such ugly warts anymore. Define an
EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this
case and *hard fail* if the storage cannot support the extended
timestamp being
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, 31 May 2014, H. Peter Anvin wrote: > On 05/30/2014 10:54 PM, Dave Chinner wrote: > > > > If we are changing the in-kernel timestamp to have a greater dynamic > > range that anything we current support on disk, then we need support > > for all filesystems for similar translation and constraint. The > > filesystems need to be able to tell the kernel what they timestamp > > range they support, and then the kernel needs to follow those > > guidelines. And if the filesystem is mounted on a kernel that > > doesn't support the current filesystem's timestamp format, then at > > minimum that filesystem cannot do anything that writes a > > timestamp > > > > Put simply: the filesystem defines the timestamp range that can be > > used safely, not the userspace API. If the filesystem can't support > > the date it is handed then that is an out-of-range error. Since > > when have we accepted that it's OK to handle out-of-range data with > > silent overflows or corruption of the data that we are attempting to > > store? We're defining a new API to support a wider date range - > > there is nothing that prevents us from saying ERANGE can be returned > > to a timestamp that the file cannot store correctly > > > > I'm still puzzled. > > Are you saying that you want a program that does: > > /* Deliberately simplified */ > gettimeofdayns( ...); > utimensat(... now); > > ... to suddenly start failing on Jan 19, 2038 (for a filesystem with > 32-bit timestamps), or would you propose some ways for the filesystems > in question to extend the range of the timestamps? > > What you seem to propose also seems to imply that on Jan 19, 2038 > anything that writes a timestamp with the current date (which logically > ends up being almost every write operation) would be dead and frozen on > such a filesystem -- pretty much meaning the filesystem would become > readonly if not in reality than in practice. 
For those (legacy) filesystems with signed 32-bit timestamps, any attempt to create a timestamp past Jan 19 03:14:06 2038 UTC should be (silently) clamped to 0x7fffffff and that value (the last representable time) used as an overflow indicator. The filesystem driver should convert that value into a corresponding overflow value for whatever kernel internal time representation is being used when read back, and this should be propagated up to user space. It should not be a hard error; otherwise, as you rightfully stated, everything non read-only would come to a halt on that day.

Inside the kernel, the overflow indicator could be as simple as dedicating one of the top bits in a 64-bit time_t value in order to still transmit the overflow limit. For example, in the above case, we could use 0x40000000-7fffffff to indicate the actual time is unavailable due to the filesystem's time representation being overflowed from 0x7fffffff. If for example a filesystem cannot represent timestamps from Jan 1 00:00:00 2100 UTC then the overflow representation for this particular filesystem would be 0x40000000-f48656ff.

Those syscalls with a 32-bit time_t would be returned 0x7fffffff whenever there is an overflow being signaled. Whether 64-bit overflow-marked time_t values, when passed to user space, should clear the overflow bit, or use a unique time_t overflow value, could be decided and even changed later after discussion with glibc people for example.

Hard errors should be signaled to user space, and the actual operation aborted, only with the presence of a new flag passed to the kernel. However, by default, things should "just work" albeit with the "wrong", i.e. clamped, time being saved on disk as much as possible otherwise.

Nicolas
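The clamp-and-flag scheme proposed above could look roughly like the following userspace C sketch. All names here (`fs32_encode_time`, `fs32_decode_time`, `TIME_OVERFLOW`, and so on) are hypothetical and exist in no kernel; the choice of bit 62 as the marker is one reading of "one of the top bits", and the lower-bound clamp is an added assumption, since the message only discusses the upper limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define FS32_TIME_MAX  INT64_C(0x7fffffff)  /* largest signed 32-bit value */
#define TIME_OVERFLOW  (INT64_C(1) << 62)   /* assumed in-kernel overflow marker bit */

/* Write path: clamp silently instead of wrapping or failing. */
static int32_t fs32_encode_time(int64_t t)
{
    if (t > FS32_TIME_MAX)
        t = FS32_TIME_MAX;   /* doubles as the overflow indicator on disk */
    if (t < INT32_MIN)
        t = INT32_MIN;       /* assumption: clamp the lower bound as well */
    assert(t >= INT32_MIN && t <= FS32_TIME_MAX);
    return (int32_t)t;
}

/* Read path: turn the on-disk indicator into a marked 64-bit value. */
static int64_t fs32_decode_time(int32_t disk)
{
    int64_t t = disk;

    if (t == FS32_TIME_MAX)
        t |= TIME_OVERFLOW;  /* low bits still carry the filesystem's limit */
    return t;
}

/* Negative times also have bit 62 set in two's complement, so require t >= 0. */
static bool time_overflowed(int64_t t)
{
    return t >= 0 && (t & TIME_OVERFLOW) != 0;
}

/* Recover the limit that the marked value carries. */
static int64_t time_limit(int64_t t)
{
    return time_overflowed(t) ? (t & ~TIME_OVERFLOW) : t;
}
```

With this encoding, a timestamp in 2100 written to such a filesystem reads back as a marked value whose low bits are the filesystem limit, so a caller can tell "time unknown, but later than the limit" apart from a genuine timestamp.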
Re: [RFC 11/32] xfs: convert to struct inode_time
On Saturday 31 May 2014 11:14:50 Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > >
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that it can't represent on disk otherwise Bad Stuff will
> > > happen,
> >
> > Actually it is questionable if it is worse to reject a timestamp or just
> > let it wrap. Rejecting a valid timestamp is a bit like "You don't
> > exist, go away."
>
> I think having the new system calls being able to
> return EINVAL if the value cannot be stored permanently on disk
> correctly is the right thing to do. Having it silently mangled
> by the filesystem and returning "everything is just fine, trust me"
> is close to the worst solution I can think of. That's exactly what
> leads to overflow bugs occurring

While going through the file systems, I was wondering whether we should have the times stop at the end of each file system's epoch rather than wrap around.

> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution will
> > > be different for every filesystem that needs to support time beyond
> > > 2038.
> >
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038. However, I maintain the above still holds.
>
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.

In my list at http://kernelnewbies.org/y2038, I found that almost all file systems at least handle times until 2106, because they treat the on-disk value as unsigned on 64-bit systems, or they use a completely different representation. My guess is that somebody earlier spent a lot of work on making that happen.

The exceptions are:

* exofs uses signed values, which can probably be changed to be consistent with the others.
* isofs has a bug that limits it until 2027 on architectures with a signed 'char' type (otherwise it's 2155).
* udf can represent times for many thousands of years through a 16-bit year representation, but the code to convert to epoch uses a const array that ends at 2038.
* afs uses signed seconds and can probably be fixed.
* coda relies on user space time representation getting passed through an ioctl.
* I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, where they really use signed.

I was confused about XFS since I didn't notice that there are separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected XFS to also use the 1970-2106 time range on 64-bit systems today.

If we are using the variant of my patch that extends inode_time->tv_sec to s64, nothing should change for XFS at all, the main difference is that if it gets extended to wider on-disk timestamps, they will work the same way on 32-bit and 64-bit kernels.

Arnd
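The difference between the two epochs mentioned above (1901-2038 for a signed 32-bit seconds field, 1970-2106 for an unsigned one) comes down to how the same 32 on-disk bits are widened to 64 bits. A minimal sketch in userspace C, with hypothetical helper names that are not taken from any filesystem driver:

```c
#include <assert.h>
#include <stdint.h>

/* Signed interpretation: values with the high bit set are before 1970. */
static int64_t disk_seconds_signed(uint32_t raw)
{
    /* Handle the high bit by hand to avoid an implementation-defined cast. */
    int64_t v = raw <= UINT32_C(0x7fffffff)
                    ? (int64_t)raw
                    : (int64_t)raw - (INT64_C(1) << 32);

    assert(v >= INT32_MIN && v <= INT32_MAX);
    return v;
}

/* Unsigned interpretation: values with the high bit set are after 2038. */
static int64_t disk_seconds_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}
```

For the raw value 0x80000000, the signed reading gives -2147483648 (December 1901) while the unsigned reading gives 2147483648 (January 2038); for values below 0x80000000 the two readings agree.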
Re: [RFC 11/32] xfs: convert to struct inode_time
On 05/30/2014 10:54 PM, Dave Chinner wrote:
>
> If we are changing the in-kernel timestamp to have a greater dynamic
> range than anything we currently support on disk, then we need support
> for all filesystems for similar translation and constraint. The
> filesystems need to be able to tell the kernel what timestamp
> range they support, and then the kernel needs to follow those
> guidelines. And if the filesystem is mounted on a kernel that
> doesn't support the current filesystem's timestamp format, then at
> minimum that filesystem cannot do anything that writes a
> timestamp
>
> Put simply: the filesystem defines the timestamp range that can be
> used safely, not the userspace API. If the filesystem can't support
> the date it is handed then that is an out-of-range error. Since
> when have we accepted that it's OK to handle out-of-range data with
> silent overflows or corruption of the data that we are attempting to
> store? We're defining a new API to support a wider date range -
> there is nothing that prevents us from saying ERANGE can be returned
> to a timestamp that the file cannot store correctly
>

I'm still puzzled.

Are you saying that you want a program that does:

	/* Deliberately simplified */
	gettimeofdayns( ...);
	utimensat(... now);

... to suddenly start failing on Jan 19, 2038 (for a filesystem with 32-bit timestamps), or would you propose some ways for the filesystems in question to extend the range of the timestamps?

What you seem to propose also seems to imply that on Jan 19, 2038 anything that writes a timestamp with the current date (which logically ends up being almost every write operation) would be dead and frozen on such a filesystem -- pretty much meaning the filesystem would become readonly, if not in reality then in practice.

I strongly suspect that that would be a more catastrophic failure than incorrect timestamps, as you suddenly have all kinds of machines embedded in $DEITY knows what places just stop and refuse to run.

If that is not what you mean I would genuinely like to understand the situation better.

	-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, May 31, 2014 at 05:37:52PM +0200, Arnd Bergmann wrote: On Saturday 31 May 2014 11:14:50 Dave Chinner wrote: On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote: On 05/30/2014 05:37 PM, Dave Chinner wrote: IOWs, the filesystem has to be able to reject any attempt to set a timestamp that is can't represent on disk otherwise Bad Stuff will happen, Actually it is questionable if it is worse to reject a timestamp or just let it wrap. Rejecting a valid timestamp is a bit like You don't exist, go away. I think having the new systems calls being able to return EINVAL if the value cannot be stored permanently on disk correctly is the right thing to do. Having it silently mangled by the filesystem and returning everything is just fine, trust me is close to the worst solution I can think of. That's exactly what leads to overflow bugs occurring While going through the file systems, I was wondering whether we should have the times stop at the end of each file systems epoch rather than wrap around. and filesystems have to be able to specify in their on disk format what timestamp encoding is being used. The solution will be different for every filesystem that needs to support time beyond 2038. Actually the cutoff can be really different for each filesystem, not necessarily 2038. However, I maintain the above still holds. Sure, but all filesystems are supposed to handle at least the current unix epoch. In my list at http://kernelnewbies.org/y2038, I found that almost all file systems at least times until 2106, because they treat the on-disk value as unsigned on 64-bit systems, or they use a completely different representation. My guess is that somebody earlier spent a lot of work on making that happen. The exceptions are: * exofs uses signed values, which can probably be changed to be consistent with the others. * isofs has a bug that limits it until 2027 on architectures with a signed 'char' type (otherwise it's 2155). 
* udf can represent times for many thousands of years through a 16-bit year representation, but the code to convert to epoch uses a const array that ends at 2038. * afs uses signed seconds and can probably be fixed * coda relies on user space time representation getting passed through an ioctl. * I miscategorized xfs/ext2/ext3 as having unsigned 32-bit seconds, where they really use signed. I was confused about XFS since I didn't noticed that there are separate xfs_ictimestamp_t and xfs_timestamp_t types, so I expected XFS to also use the 1970-2106 time range on 64-bit systems today. You've missed an awful lot more than just the implications for the core kernel code. There's a good chance such changes propagate to APIs elsewhere in the filesystems, because something you haven't realised is that XFS effectively exposes the on-disk timestamp format directly to userspace via the bulkstat interface (see struct xfs_bstat). It also affects the XFS open-by-handle ioctl and the swap extent ioctl used by the online defragmenter. IOWs, if we are changing the on-disk timestamp format then this affects several ioctl()s and hence quite a few of the XFS userspace utilities. The hardest to fix will be xfsdump which would need a new dump format to store the extended timestamp ranges, and then xfs_restore will need to be able to handle restoring such timestamps on filesystems that don't have extended timestamp support... Put simply, changing the structure of system time isn't as straight forward as changing the kernel structures. System time gets stored permanently, and that has a cascade effect through the kernel all to all of the filesystem utilities that know about that permanent storage in some way So yes, you can change the kernel definition, but until the permanent storage of system time can be extended to support the same range as the kernel the *system* will still have nasty, silent epoch overflow, truncation or corruption issues. 
> If we are using the variant of my patch that extends
> inode_time->tv_sec to s64, nothing should change for XFS
> at all, the main difference is that if it gets extended
> to wider on-disk timestamps, they will work the same way on
> 32-bit and 64-bit kernels.

"nothing should change" except for the fact that a 64 bit timestamp gets silently truncated to 32 bits and the timestamp is not what the user expects it to be. The user does not find out until the inode passes out of cache and is re-read from disk, and then it's wrong.

To put it politely: that is broken, obnoxious behaviour and we don't design new interfaces with such ugly warts anymore. Define an EOVERFLOW, EINVAL or ERANGE error in the new syscalls to handle this case and *hard fail* if the storage cannot support the extended timestamp being passed in. There is no excuse for silently mangling out-of-range data, especially as we have plenty of time to add support to
Re: [RFC 11/32] xfs: convert to struct inode_time
On Sat, May 31, 2014 at 01:41:56AM -0700, H. Peter Anvin wrote:
> On 05/30/2014 10:54 PM, Dave Chinner wrote:
> >
> > If we are changing the in-kernel timestamp to have a greater dynamic
> > range than anything we currently support on disk, then we need support
> > for all filesystems for similar translation and constraint. The
> > filesystems need to be able to tell the kernel what timestamp
> > range they support, and then the kernel needs to follow those
> > guidelines. And if the filesystem is mounted on a kernel that
> > doesn't support the current filesystem's timestamp format, then at
> > minimum that filesystem cannot do anything that writes a
> > timestamp
> >
> > Put simply: the filesystem defines the timestamp range that can be
> > used safely, not the userspace API. If the filesystem can't support
> > the date it is handed then that is an out-of-range error. Since
> > when have we accepted that it's OK to handle out-of-range data with
> > silent overflows or corruption of the data that we are attempting to
> > store? We're defining a new API to support a wider date range -
> > there is nothing that prevents us from saying ERANGE can be returned
> > to a timestamp that the file cannot store correctly
> >
>
> I'm still puzzled.
>
> Are you saying that you want a program that does:
>
> 	/* Deliberately simplified */
> 	gettimeofdayns(now ...);
> 	utimensat(... now);
>
> ... to suddenly start failing on Jan 19, 2038 (for a filesystem with
> 32-bit timestamps),

Yes. Hard fail so overflows are in your face and we know exactly what is going to cause silent timestamp screwups when the epoch

> or would you propose some ways for the filesystems
> in question to extend the range of the timestamps?

Filesystems are going to have to change their on-disk formats, so we'd do that just like we do every other on-disk format change. With feature bits and translation layers, new ioctl structures, etc. Depending on the amount of work necessary, some filesystems could do this in 3.16, others it might be 3.20 before everything is sorted out across the kernel and userspace code...

Either way, the hard fail problem goes away as each filesystem is converted. Further, if we have regression tests then new filesystems are guaranteed to be designed to handle 2038 epoch rollover, and so in a year or two this hard fail is effectively a non-problem. If someone breaks something in future, then we'll know about it pretty quickly.

> What you seem to propose also seems to imply that on Jan 19, 2038
> anything that writes a timestamp with the current date (which logically
> ends up being almost every write operation) would be dead and frozen on
> such a filesystem -- pretty much meaning the filesystem would become
> readonly if not in reality than in practice.

Yup. If we can't do what the user wants without the user thinking corruption has occurred, then the only thing we are left with is shut down the filesystem error handling. Kind of like using BUG() rather than returning an error. That's why we need to be able to hard fail and return an error.

However, we've got 20+ years to fix our current filesystems and all their support code to ensure this doesn't happen. In the mean time, having stuff hard fail is a great way to ensure that filesystems get fixed sooner rather than later...

> I strongly suspect that that would be a more catastrophic failure than
> incorrect timestamps, as you suddenly have all kinds of machines
> embedded in $DEITY knows what places just stop and refuse to run.

Yup, that's a great way of flushing out problems 20 years before they really matter.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
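The hard-fail behaviour argued for above can be sketched in userspace C as a range check that refuses to store what the filesystem cannot represent. This is only an illustration under stated assumptions: `struct fs_time_range`, `struct fake_inode` and `fs_set_mtime` are hypothetical names, and the thread leaves open whether the error would be EOVERFLOW, EINVAL or ERANGE; ERANGE is used here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: the range of seconds a filesystem can store on disk. */
struct fs_time_range {
    int64_t min_sec;
    int64_t max_sec;
};

/* Hypothetical stand-in for an in-memory inode. */
struct fake_inode {
    int64_t mtime_sec;
};

/*
 * Store the timestamp only if the filesystem can represent it.
 * On failure the inode is left untouched and -ERANGE is returned,
 * so the caller never sees a silently mangled value.
 */
static int fs_set_mtime(const struct fs_time_range *r,
                        struct fake_inode *inode, int64_t sec)
{
    assert(r->min_sec <= r->max_sec);

    if (sec < r->min_sec || sec > r->max_sec)
        return -ERANGE;

    inode->mtime_sec = sec;
    return 0;
}
```

For a filesystem with signed 32-bit seconds, the range would be INT32_MIN to INT32_MAX, and the first second after the 2038 rollover is rejected rather than wrapped.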
Re: [RFC 11/32] xfs: convert to struct inode_time
[ Please don't top post. ] On Fri, May 30, 2014 at 06:22:55PM -0700, H. Peter Anvin wrote: > On May 30, 2014 6:14:50 PM PDT, Dave Chinner wrote: > >On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote: > >> On 05/30/2014 05:37 PM, Dave Chinner wrote: > >> > > >> > IOWs, the filesystem has to be able to reject any attempt to > >> > set a timestamp that is can't represent on disk otherwise Bad > >> > Stuff will happen, > >> > >> Actually it is questionable if it is worse to reject a > >> timestamp or > >just > >> let it wrap. Rejecting a valid timestamp is a bit like "You > >> don't exist, go away." > > > >I think having the new systems calls being able to return EINVAL > >if the value cannot be stored permanently on disk correctly is > >the right thing to do. Having it silently mangled by the > >filesystem and returning "everything is just fine, trust me" is > >close to the worst solution I can think of. That's exactly what > >leads to overflow bugs occurring > > > >> > and filesystems have to be able to specify in their on disk > >> > format what timestamp encoding is being used. The solution > >will > >> > be different for every filesystem that needs to support time > >> > beyond 2038. > >> > >> Actually the cutoff can be really different for each > >> filesystem, not necessarily 2038. However, I maintain the > >> above still holds. > > > >Sure, but all filesystems are supposed to handle at least the > >current unix epoch. > > > >> Consider a filesystem that kept timestamps in YYMMDDHHMMSS > >> format. > >What > >> would you have expected such a filesystem to do on Jan 1, 2000? > > > >Strawman. > > > >We don't need to cater for fundamentally broken designs that > >can't even handle the current unix epoch correctly. If such > >filesystems exist, then they can simple say "original unix epoch > >support only" and do whatever crap they are doing right now. > > No, not a strawman. Replace with Jan 26, 2038 and you have the > same situation. 
But that's not the problem I'm talking about. The problem isn't the roll-over date of the epoch - the problem is that we're changing the in-memory meaning of time without changing what the filesystems store on disk or how they translate them.

To use your example, what I'm actually talking about is the kernel switching to CCYYMMDDHHMMSS while the filesystem has YYMMDDHHMMSS on disk. The filesystem doesn't know the timestamp is now a different format, so it could mangle it writing it to disk, or it could mangle existing timestamps in the YY.. format reading them from disk and putting them into CC.. format structures.

IOWs, it will incorrectly translate YY format dates to CC format, or translate something in the CC format as though it was in YY format. And it wouldn't even know what was the correct format because there's nothing telling it on disk whether the date is in CC or YY format.

Either way, you get mangled timestamps, the filesystem doesn't know about it because it's just storing what the kernel gives it, the kernel thinks they are fine because they are just opaque when read back, but the user says "what the fuck did a reboot do to all these timestamps?".

Hence your example of roll-over dates is a strawman - you've constructed a problem that is irrelevant to the issue being pointed out.

FWIW, we already have code in the superblock and VFS to avoid such problems on filesystems with limited timestamp resolution (i.e. s_time_gran and current_fs_time()) so that what the VFS hands the filesystem is exactly what the VFS expects to get back from disk when comparing timestamps.

If we are changing the in-kernel timestamp to have a greater dynamic range than anything we currently support on disk, then we need support for all filesystems for similar translation and constraint. The filesystems need to be able to tell the kernel what timestamp range they support, and then the kernel needs to follow those guidelines. And if the filesystem is mounted on a kernel that doesn't support the current filesystem's timestamp format, then at minimum that filesystem cannot do anything that writes a timestamp

Put simply: the filesystem defines the timestamp range that can be used safely, not the userspace API. If the filesystem can't support the date it is handed then that is an out-of-range error. Since when have we accepted that it's OK to handle out-of-range data with silent overflows or corruption of the data that we are attempting to store? We're defining a new API to support a wider date range - there is nothing that prevents us from saying ERANGE can be returned to a timestamp that the file cannot store correctly

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
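The point above, that the disk must say which timestamp encoding it uses and that a kernel which does not understand that encoding must at minimum not write timestamps, can be sketched with a feature bit. Everything here is hypothetical: `FEAT_WIDE_TIMESTAMPS`, `struct disk_super` and `pick_mount_mode` are made-up names for illustration, not any filesystem's real on-disk format or mount logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bit: on-disk timestamps use a wider encoding. */
#define FEAT_WIDE_TIMESTAMPS  UINT32_C(0x1)

/* Hypothetical on-disk superblock field recording the encodings in use. */
struct disk_super {
    uint32_t features;
};

enum mount_mode { MOUNT_RW, MOUNT_RO };

/*
 * If the disk advertises a feature this kernel does not know, refuse to
 * write: the kernel cannot produce timestamps in the right encoding.
 */
static enum mount_mode pick_mount_mode(const struct disk_super *sb,
                                       uint32_t known_features)
{
    uint32_t unknown;

    assert(sb);
    unknown = sb->features & ~known_features;
    return unknown ? MOUNT_RO : MOUNT_RW;
}
```

An old kernel (no known features) mounting a disk with the bit set gets MOUNT_RO, while a kernel that knows the bit gets MOUNT_RW. Whether such a kernel could even read the timestamps correctly is a separate question this sketch does not address.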
Re: [RFC 11/32] xfs: convert to struct inode_time
No, not a strawman. Replace with Jan 26, 2038 and you have the same
situation.

On May 30, 2014 6:14:50 PM PDT, Dave Chinner wrote:
> On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> > On 05/30/2014 05:37 PM, Dave Chinner wrote:
> > >
> > > IOWs, the filesystem has to be able to reject any attempt to set a
> > > timestamp that it can't represent on disk otherwise Bad Stuff will
> > > happen,
> >
> > Actually it is questionable if it is worse to reject a timestamp or
> > just let it wrap. Rejecting a valid timestamp is a bit like "You
> > don't exist, go away."
>
> I think having the new system calls being able to return EINVAL if
> the value cannot be stored permanently on disk correctly is the right
> thing to do. Having it silently mangled by the filesystem and
> returning "everything is just fine, trust me" is close to the worst
> solution I can think of. That's exactly what leads to overflow bugs
> occurring.
>
> > > and filesystems have to be able to specify in their on
> > > disk format what timestamp encoding is being used. The solution
> > > will be different for every filesystem that needs to support time
> > > beyond 2038.
> >
> > Actually the cutoff can be really different for each filesystem, not
> > necessarily 2038. However, I maintain the above still holds.
>
> Sure, but all filesystems are supposed to handle at least the
> current unix epoch.
>
> > Consider a filesystem that kept timestamps in YYMMDDHHMMSS format.
> > What would you have expected such a filesystem to do on Jan 1, 2000?
>
> Strawman.
>
> We don't need to cater for fundamentally broken designs that can't
> even handle the current unix epoch correctly. If such filesystems
> exist, then they can simply say "original unix epoch support only"
> and do whatever crap they are doing right now.
>
> Cheers,
>
> Dave.

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
Re: [RFC 11/32] xfs: convert to struct inode_time
On Fri, May 30, 2014 at 05:41:14PM -0700, H. Peter Anvin wrote:
> On 05/30/2014 05:37 PM, Dave Chinner wrote:
> >
> > IOWs, the filesystem has to be able to reject any attempt to set a
> > timestamp that it can't represent on disk otherwise Bad Stuff will
> > happen,
>
> Actually it is questionable if it is worse to reject a timestamp or just
> let it wrap. Rejecting a valid timestamp is a bit like "You don't
> exist, go away."

I think having the new system calls being able to return EINVAL if
the value cannot be stored permanently on disk correctly is the right
thing to do. Having it silently mangled by the filesystem and
returning "everything is just fine, trust me" is close to the worst
solution I can think of. That's exactly what leads to overflow bugs
occurring.

> > and filesystems have to be able to specify in their on
> > disk format what timestamp encoding is being used. The solution will
> > be different for every filesystem that needs to support time beyond
> > 2038.
>
> Actually the cutoff can be really different for each filesystem, not
> necessarily 2038. However, I maintain the above still holds.

Sure, but all filesystems are supposed to handle at least the
current unix epoch.

> Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
> would you have expected such a filesystem to do on Jan 1, 2000?

Strawman.

We don't need to cater for fundamentally broken designs that can't
even handle the current unix epoch correctly. If such filesystems
exist, then they can simply say "original unix epoch support only"
and do whatever crap they are doing right now.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
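To make the "silently mangled" case concrete, the following C sketch shows what an unchecked store of a 64-bit seconds value into a signed 32-bit field amounts to. store_unchecked() is a hypothetical helper written for this illustration, not a kernel function; it spells out the modulo-2^32 reduction explicitly so the result does not depend on implementation-defined narrowing conversions.

```c
#include <stdint.h>

/*
 * Keep only the low 32 bits of a 64-bit seconds value and reinterpret
 * them as a signed 32-bit number, which is what an unchecked
 * assignment to a 32-bit on-disk field effectively does.
 */
static int32_t store_unchecked(int64_t sec)
{
    uint32_t low = (uint32_t)sec;  /* reduction modulo 2^32 is well defined */

    if (low > (uint32_t)INT32_MAX)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}
```

Under this model, 2147483647 is stored unchanged, while 2147483648 (one second later) comes back as INT32_MIN, and the caller is never told that anything went wrong.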
Re: [RFC 11/32] xfs: convert to struct inode_time
On 05/30/2014 05:37 PM, Dave Chinner wrote:
>
> IOWs, the filesystem has to be able to reject any attempt to set a
> timestamp that is can't represent on disk otherwise Bad Stuff will
> happen,

Actually it is questionable if it is worse to reject a timestamp or just
let it wrap. Rejecting a valid timestamp is a bit like "You don't
exist, go away."

> and filesystems have to be able to specify in their on
> disk format what timestamp encoding is being used. The solution will
> be different for every filesystem that needs to support time beyond
> 2038.

Actually the cutoff can be really different for each filesystem, not
necessarily 2038. However, I maintain the above still holds.

Consider a filesystem that kept timestamps in YYMMDDHHMMSS format. What
would you have expected such a filesystem to do on Jan 1, 2000?

	-hpa
Re: [RFC 11/32] xfs: convert to struct inode_time
On Fri, May 30, 2014 at 10:01:35PM +0200, Arnd Bergmann wrote:
> xfs uses unsigned 32-bit seconds for inode timestamps, which will work
> for the next 92 years, but the VFS uses struct timespec for timestamps,
> which is only good until 2038 on 32-bit CPUs.
>
> This gets us one small step closer to lifting the VFS limit by using
> struct inode_time in XFS.
>
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> Cc: Dave Chinner <da...@fromorbit.com>
> Cc: x...@oss.sgi.com
> ---
>  fs/xfs/time.h            | 4 ++--
>  fs/xfs/xfs_inode.c       | 2 +-
>  fs/xfs/xfs_iops.c        | 2 +-
>  fs/xfs/xfs_trans_inode.c | 6 +++---
>  4 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/fs/xfs/time.h b/fs/xfs/time.h
> index 387e695..a490f1b 100644
> --- a/fs/xfs/time.h
> +++ b/fs/xfs/time.h
> @@ -21,14 +21,14 @@
>  #include <linux/sched.h>
>  #include <linux/time.h>
>
> -typedef struct timespec timespec_t;
> +typedef struct inode_time timespec_t;
>
>  static inline void delay(long ticks)
>  {
>      schedule_timeout_uninterruptible(ticks);
>  }
>
> -static inline void nanotime(struct timespec *tvp)
> +static inline void nanotime(struct inode_time *tvp)
>  {
>      *tvp = CURRENT_TIME;
>  }
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a6115fe..16d5392 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -654,7 +654,7 @@ xfs_ialloc(
>      xfs_inode_t     *ip;
>      uint            flags;
>      int             error;
> -    timespec_t      tv;
> +    struct inode_time tv;
>
>      /*
>       * Call the space management code to pick
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 205613a..092ee7c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -956,7 +956,7 @@ xfs_vn_setattr(
>  STATIC int
>  xfs_vn_update_time(
>      struct inode        *inode,
> -    struct timespec     *now,
> +    struct inode_time   *now,
>      int                 flags)
>  {
>      struct xfs_inode    *ip = XFS_I(inode);
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 50c3f56..bae2520 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -70,7 +70,7 @@ xfs_trans_ichgtime(
>      int                 flags)
>  {
>      struct inode        *inode = VFS_I(ip);
> -    timespec_t          tv;
> +    struct inode_time   tv;
>
>      ASSERT(tp);
>      ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> @@ -78,13 +78,13 @@ xfs_trans_ichgtime(
>      tv = current_fs_time(inode->i_sb);
>
>      if ((flags & XFS_ICHGTIME_MOD) &&
> -        !timespec_equal(&inode->i_mtime, &tv)) {
> +        !inode_time_equal(&inode->i_mtime, &tv)) {
>          inode->i_mtime = tv;
>          ip->i_d.di_mtime.t_sec = tv.tv_sec;
>          ip->i_d.di_mtime.t_nsec = tv.tv_nsec;
>      }

The problem I see here is that the code is now potentially stuffing
a variable that is larger than 32 bits into an on-disk structure
that is only 32 bits in size. You can't just change the in-memory
representation of inode timestamps and expect the problem to be
fixed - this just pushes the problem down a layer without any
infrastructure allowing filesystems to handle storage of the new
timestamp format sanely.

IOWs, the filesystem has to be able to reject any attempt to set a
timestamp that it can't represent on disk otherwise Bad Stuff will
happen, and filesystems have to be able to specify in their on
disk format what timestamp encoding is being used. The solution will
be different for every filesystem that needs to support time beyond
2038.

Hence I think you are going to need superblock flags and/or
variables to indicate the epoch range the filesystem can support.
Then the filesystems need conversion functions from whatever the
internal VFS timestamp representation is to whatever their on-disk
format is, and only then can we switch the VFS to using a new
timestamp format.

At that point, filesystem developers can make the changes they need
to the on-disk format to support timestamps beyond 2038, and all
they need to do at the VFS layer is set the "supported range" fields
appropriately in the VFS superblock...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
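The combination of an on-disk encoding indicator and per-filesystem conversion functions described above can be sketched in userspace C as follows. Everything here is an assumption made for illustration: enum disk_time_encoding, disk_time_encode() and disk_time_decode() are hypothetical names, and the two encodings shown (signed and unsigned interpretation of the same 32-bit field) are only one example of how the same raw bits can mean different dates.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag a filesystem could record in its superblock. */
enum disk_time_encoding {
    DISK_TIME_SIGNED32,    /* raw field covers 1901..2038 */
    DISK_TIME_UNSIGNED32,  /* raw field covers 1970..2106 */
};

/*
 * Convert in-memory 64-bit seconds to the raw 32-bit on-disk field.
 * Returns 0 and fills *raw, or -ERANGE if the value does not fit the
 * recorded encoding.
 */
static int disk_time_encode(enum disk_time_encoding enc, int64_t sec,
                            uint32_t *raw)
{
    int64_t min = (enc == DISK_TIME_SIGNED32) ? (int64_t)INT32_MIN : 0;
    int64_t max = (enc == DISK_TIME_SIGNED32) ? (int64_t)INT32_MAX
                                              : (int64_t)UINT32_MAX;

    if (sec < min || sec > max)
        return -ERANGE;

    *raw = (uint32_t)sec;  /* in range, so the low 32 bits are lossless */
    return 0;
}

/* Interpret the raw on-disk field according to the recorded encoding. */
static int64_t disk_time_decode(enum disk_time_encoding enc, uint32_t raw)
{
    if (enc == DISK_TIME_SIGNED32 && raw > (uint32_t)INT32_MAX)
        return (int64_t)raw - 4294967296LL;
    return (int64_t)raw;
}
```

The raw value 0x80000000 decodes to 2147483648 under the unsigned encoding and to -2147483648 under the signed one, which is why the encoding has to be recorded somewhere the filesystem can read it back rather than inferred from the kernel's in-memory type.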