On Mar 15, 2018, at 11:51 AM, Andiry Xu <jix...@eng.ucsd.edu> wrote:
>
> On Thu, Mar 15, 2018 at 2:05 AM, Arnd Bergmann <a...@arndb.de> wrote:
>> On Thu, Mar 15, 2018 at 7:11 AM, Andiry Xu <jix...@eng.ucsd.edu> wrote:
>>> On Wed, Mar 14, 2018 at 9:54 PM, Darrick J. Wong
>>> <darrick.w...@oracle.com> wrote:
>>>> On Sat, Mar 10, 2018 at 10:17:44AM -0800, Andiry Xu wrote:
>>
>>>>> +	/* s_mtime and s_wtime should be together and their order should not be
>>>>> +	 * changed. we use an 8 byte write to update both of them atomically
>>>>> +	 */
>>>>> +	__le32		s_mtime;	/* mount time */
>>>>> +	__le32		s_wtime;	/* write time */
>>>>
>>>> Hmmm, 32-bit timestamps?  2038 isn't that far away...
>>>>
>>>
>>> I will try fixing this in the next version.
>>
>> I would also recommend adding nanosecond-resolution timestamps.
>> In theory, a signed 64-bit nanosecond field is sufficient for each timestamp
>> (it's good for several hundred years), but the more common format uses
>> 64-bit seconds and 32-bit nanoseconds in other file systems.
>>
>> Unfortunately it looks, you will have to come up with a more sophisticated
>> update method above, even if you leave out the nanoseconds, you can't
>> easily rely on a 16-byte atomic update across architectures to deal with
>> the two 64-bit timestamps. For the superblock fields, you might be able
>> to get away with using second resolution, and then encoding the
>> timestamps as a signed 64-bit 'mkfs time' along with two unsigned
>> 32-bit times added on top, which gives you a range of 136 years mount
>> a file system after its creation.
>>
>
> I will take a look at other file systems.
>
> Superblock mtime is not a big problem as it is updated rarely. 64-bit
> seconds and 32-bit nanoseconds make the inode and log entry bigger,
> and updating file->atime cannot be done with a single 64bit update.
> That may be annoying and needs to use journaling.

If the 64-bit atomicity was really a performance issue, you could do
something like:

	__u32 time_high = seconds >> 32;
	__u64 time_low = seconds << 32 | nanoseconds;

and then you only need to update time_high with a journal operation if
it has changed from the current time_high value (about once every 140
years), and the time_low can be set atomically.

It needs a few extra cycles each time (hidden with an unlikely()) vs.
just setting both, but that is a win if it avoids other CPU or IO
overhead.

Cheers, Andreas
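
For illustration, the time_high/time_low split suggested above could be sketched in userspace C roughly as follows. This is only a sketch under stated assumptions: the struct and the split_time_* helpers are hypothetical names, not part of the patch; seconds is treated as an unsigned 64-bit count; and the plain stores stand in for the journaled update of time_high and the atomic 8-byte write of time_low that a real file system would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timestamp layout; names are illustrative, not from the patch. */
struct split_time {
	uint32_t time_high;	/* upper 32 bits of the seconds count */
	uint64_t time_low;	/* (low 32 bits of seconds << 32) | nanoseconds */
};

static uint32_t split_time_high(uint64_t seconds)
{
	return (uint32_t)(seconds >> 32);
}

static uint64_t split_time_low(uint64_t seconds, uint32_t nanoseconds)
{
	/* nanoseconds must fit below one second, so it never touches the top half */
	assert(nanoseconds < 1000000000u);
	return (seconds << 32) | nanoseconds;
}

/*
 * Update the timestamp. Returns 1 if the rare slow path was taken
 * (time_high changed), 0 otherwise. In a real file system the slow path
 * would be a journaled update and the time_low store a single atomic
 * 64-bit write; both are plain stores here.
 */
static int split_time_update(struct split_time *t, uint64_t seconds,
			     uint32_t nanoseconds)
{
	uint32_t high = split_time_high(seconds);
	int slow = 0;

	if (high != t->time_high) {	/* roughly once per 2^32 seconds */
		t->time_high = high;
		slow = 1;
	}
	t->time_low = split_time_low(seconds, nanoseconds);
	return slow;
}

/* Reassemble the full 64-bit seconds value from both fields. */
static uint64_t split_time_seconds(const struct split_time *t)
{
	return ((uint64_t)t->time_high << 32) | (t->time_low >> 32);
}

static uint32_t split_time_nanoseconds(const struct split_time *t)
{
	return (uint32_t)(t->time_low & 0xffffffffu);
}
```

With this layout every ordinary update touches only the 8-byte time_low field, and a reader recovers the full value by combining the two fields as in split_time_seconds().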