Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-16 Thread Thomas Gleixner
Bruce,

On Wed, Oct 14 2020 at 18:05, J. Bruce Fields wrote:
> On Mon, Oct 12, 2020 at 05:13:08PM -0400, J. Bruce Fields wrote:
>> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
>> I'll give it a shot.
>
> I took your code above verbatim, but should I really be following the
> example of ktime_mono_to_any()?  (With the seqlock, and maybe also the
> more general prototype in case someone needs the other conversions.)--b.

Having ktime_boot/real_to_any() makes sense.

Vs. the sequence count: In ktime_mono_to_any() it's required for 32bit,
but now that I look at it again it's pointless for 64bit.

For ktime_real_to_any() and ktime_boot_to_any() the sequence count is
obviously mandatory for 32bit, but it's also needed for 64bit, because it
needs to apply two offsets.

So
  read();
  sub(offs_real)
  set_timeofday()
  add(offs_boot)
  consume_result()

is not any different from:

  read();
  sub(offs_real)
  add(offs_boot)
  set_timeofday()
  consume_result()

Same thing for suspend_resume() instead of set_timeofday()

The only case which needs the sequence count even on 64bit is:

  read();
  sub(offs_real)
  set_timeofday();
  suspend_resume();
  add(offs_boot)
  consume_result();

No matter what, the conversion is a best effort approach and only
accurate when the offsets did not change between the point where the
input time was acquired and the point of conversion. But that's not any
different to the approach you have so far in sunrpc.

Something like the untested below. Needs to be split in several patches
and gain the wrapper inlines.

Thanks,

tglx
---
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -58,6 +58,7 @@ extern time64_t ktime_get_real_seconds(v
  */
 
 enum tk_offsets {
+   TK_OFFS_MONO,
TK_OFFS_REAL,
TK_OFFS_BOOT,
TK_OFFS_TAI,
@@ -68,6 +69,8 @@ extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
+extern ktime_t ktime_boot_to_any(ktime_t tboot, enum tk_offsets offs);
+extern ktime_t ktime_real_to_any(ktime_t treal, enum tk_offsets offs);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
 
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -855,7 +855,10 @@ u32 ktime_get_resolution_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_resolution_ns);
 
-static ktime_t *offsets[TK_OFFS_MAX] = {
+static const ktime_t zero_offset;
+
+static const ktime_t * const offsets[TK_OFFS_MAX] = {
+   [TK_OFFS_MONO]  = _offset,
[TK_OFFS_REAL]  = _core.timekeeper.offs_real,
[TK_OFFS_BOOT]  = _core.timekeeper.offs_boot,
[TK_OFFS_TAI]   = _core.timekeeper.offs_tai,
@@ -864,8 +867,9 @@ static ktime_t *offsets[TK_OFFS_MAX] = {
 ktime_t ktime_get_with_offset(enum tk_offsets offs)
 {
struct timekeeper *tk = _core.timekeeper;
+   const ktime_t *offset = offsets[offs];
unsigned int seq;
-   ktime_t base, *offset = offsets[offs];
+   ktime_t base;
u64 nsecs;
 
WARN_ON(timekeeping_suspended);
@@ -885,8 +889,9 @@ EXPORT_SYMBOL_GPL(ktime_get_with_offset)
 ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs)
 {
struct timekeeper *tk = _core.timekeeper;
+   const ktime_t *offset = offsets[offs];
unsigned int seq;
-   ktime_t base, *offset = offsets[offs];
+   ktime_t base;
u64 nsecs;
 
WARN_ON(timekeeping_suspended);
@@ -906,13 +911,26 @@ EXPORT_SYMBOL_GPL(ktime_get_coarse_with_
  * ktime_mono_to_any() - convert mononotic time to any other time
  * @tmono: time to convert.
  * @offs:  which offset to use
+ *
+ * The conversion is a best effort approach and only correct when the
+ * offset of the selected target time has not changed since @tmono was
+ * read.
  */
 ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs)
 {
-   ktime_t *offset = offsets[offs];
+   const ktime_t *offset = offsets[offs];
unsigned int seq;
ktime_t tconv;
 
+   /*
+* 64bit can spare the sequence count. The read is safe
+* and checking the sequence count does not make it
+* more consistent.
+*/
+   if (IS_ENABLED(CONFIG_64BIT))
+   return ktime_add(tmono, *offset);
+
+   /* 32bit requires it to access the 64bit offset safely */
do {
seq = read_seqcount_begin(_core.seq);
tconv = ktime_add(tmono, *offset);
@@ -922,6 +940,56 @@ ktime_t ktime_mono_to_any(ktime_t tmono,
 }
 EXPORT_SYMBOL_GPL(ktime_mono_to_any);
 
+static ktime_t ktime_to_any(ktime_t tfrom, ktime_t *from_offset,
+   enum tk_offsets to_offs)
+{
+   const ktime_t *to_offset = offsets[to_offs];
+   unsigned int seq;
+   ktime_t tconv;
+
+   /*
+  

Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-14 Thread J. Bruce Fields
On Mon, Oct 12, 2020 at 05:13:08PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> > In fact the whole thing can be simplified. You can just use time in
> > nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> > read the clocksource and advances once per tick and does not contain a
> > divison and is definitely faster than seconds_since_boot()
> > 
> > The expiry time is initialized via get_expiry() which does:
> > 
> >getboottime64();
> >return rv - boot.tv_sec; 
> > 
> > The expiry value 'rv' which is read out of the buffer is wall time in
> > seconds. So all you need is are function which convert real to boottime
> > and vice versa. That's trivial to implement and again faster than
> > getboottime64(). Something like this:
> > 
> > ktime_t ktime_real_to_boot(ktime_t real)
> > {
> > struct timekeeper *tk = _core.timekeeper;
> > ktime_t mono = ktime_sub(real, tk->offs_real);
> >   
> > return ktime_add(mono, tk->offs_boot);
> > }
> > 
> > so the above becomes:
> > 
> >return ktime_real_to_boot(rv * NSEC_PER_SEC);
> > 
> > which is still faster than a division.
> > 
> > The nanoseconds value after converting back to wall clock will need a
> > division to get seconds since the epoch, but that's not an issue because
> > that backward conversion already has one today.
> > 
> > You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> > add's '1' to the expiry value and some janitoring of function names and
> > variable types, but no real big surgery AFAICT.
> 
> I'll give it a shot.

I took your code above verbatim, but should I really be following the
example of ktime_mono_to_any()?  (With the seqlock, and maybe also the
more general prototype in case someone needs the other conversions.)--b.

commit eae2005cb432
Author: J. Bruce Fields 
Date:   Wed Oct 14 10:19:59 2020 -0400

sunrpc: simplify cache expiry times

We've been rolling our own conversion between wallclock and boot time in
get_expiry() and convert_to_wallclock().  Thomas Gleixner suggests it
would be simpler to use nanoseconds since boot stored in time_t
internally, and create common helpers for the conversion in
kernel/time/timekeeping.c.

Signed-off-by: J. Bruce Fields 

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 10891b70fc7b..10868e74d6e2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -45,8 +45,8 @@
  */
 struct cache_head {
struct hlist_node   cache_list;
-   time64_texpiry_time;/* After time time, don't use the data 
*/
-   time64_tlast_refresh;   /* If CACHE_PENDING, this is when 
upcall was
+   ktime_t expiry_time;/* After time time, don't use the data 
*/
+   ktime_t last_refresh;   /* If CACHE_PENDING, this is when 
upcall was
 * sent, else this is when update was
 * received, though it is alway set to
 * be *after* ->flush_time.
@@ -59,7 +59,8 @@ struct cache_head {
 #defineCACHE_PENDING   2   /* An upcall has been sent but no reply 
received yet*/
 #defineCACHE_CLEANED   3   /* Entry has been cleaned from cache */
 
-#defineCACHE_NEW_EXPIRY 120/* keep new things pending confirmation 
for 120 seconds */
+#defineCACHE_NEW_EXPIRY (120*NSEC_PER_SEC)
+   /* keep new things pending confirmation for 120 seconds */
 
 struct cache_detail {
struct module * owner;
@@ -102,7 +103,7 @@ struct cache_detail {
 * than this.
 */
struct list_headothers;
-   time64_tnextcheck;
+   ktime_t nextcheck;
int entries;
 
/* fields for communication over channel */
@@ -159,11 +160,9 @@ static inline time64_t seconds_since_boot(void)
return ktime_get_real_seconds() - boot.tv_sec;
 }
 
-static inline time64_t convert_to_wallclock(time64_t sinceboot)
+static inline time64_t convert_to_wallclock(ktime_t sinceboot)
 {
-   struct timespec64 boot;
-   getboottime64();
-   return boot.tv_sec + sinceboot;
+   return ktime_boot_to_real(sinceboot) / NSEC_PER_SEC;
 }
 
 extern const struct file_operations cache_file_operations_pipefs;
@@ -298,17 +297,15 @@ static inline int get_time(char **bpp, time64_t *time)
return 0;
 }
 
-static inline time64_t get_expiry(char **bpp)
+static inline ktime_t get_expiry(char **bpp)
 {
time64_t rv;
-   struct timespec64 boot;
 
if (get_time(bpp, ))
return 0;
if (rv < 0)
return 0;
-   getboottime64();
-   return rv - boot.tv_sec;
+   return 

Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-13 Thread Christian Brauner
On Sat, Oct 10, 2020 at 12:19:14AM -0700, Andrei Vagin wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> > On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > > getboottime64() provides the time stamp of system boot. In case of
> > > time namespaces, the offset to the boot time stamp was not applied
> > > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > > the system boot time to userspace. In container runtimes which utilize
> > > time namespaces to virtualize boottime of a container, this leaks
> > > information about the host system boot time.
> > > 
> > > Therefore, we make getboottime64() to respect the time namespace offset
> > > for boottime by subtracting the boottime offset.
> > > 
> > > Signed-off-by: Michael Weiß 
> > > ---
> > >  kernel/time/timekeeping.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > > index 4c47f388a83f..67530cdb389e 100644
> > > --- a/kernel/time/timekeeping.c
> > > +++ b/kernel/time/timekeeping.c
> > > @@ -17,6 +17,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> > >  {
> > >   struct timekeeper *tk = _core.timekeeper;
> > >   ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > > + /* shift boot time stamp according to the timens offset */
> > > + t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> > 
> > Note that getbootime64() is mostly used in net/sunrpc and I don't know
> > if this change has any security implications for them.
> 
> I would prefer to not patch kernel internal functions if they are used
> not only to expose time to the userspace.
> 
> I think when kernel developers sees the getboottime64 function, they
> will expect that it returns the real time of kernel boot. They will
> not expect that it is aware of time namespaces and a returned time will
> depend on a task in which context it will be called.
> 
> IMHO, as a minimum, we need to update the documentation for this function or
> even adjust a function name.
> 
> And I think we need to consider an option to not change getbootime64 and
> apply a timens offset right in the show_stat(fs/proc/stat.c) function.

This is why I asked about this since I assumed this would break
someone's use-case. :)

In any case, if I understand correctly then we want the same thing: just
change fs/proc/stat.c i.e. have a a specific helper that applies the
correct offset.

Christian


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-12 Thread Thomas Gleixner
On Mon, Oct 12 2020 at 17:13, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
>> You wish. That's clearly wrong because that code is not guaranteed to
>> always run in a context which belongs to the root time namespace.
>
> Argh, right, thanks.
>
>> AFAICT, this stuff can run in softirq context which is context stealing
>> and the interrupted task can belong to a different time name space.
>
> Some of it runs in the context of a process doing IO to proc, some from
> kthreads.  So, anyway, yes, it's not consistent in the way we'd need.

Yes, that'll do it. If the process is in a time namespace then it's
definitely incorrect vs. the kthread which is in the root namespace.

>> You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
>> add's '1' to the expiry value and some janitoring of function names and
>> variable types, but no real big surgery AFAICT.
>
> I'll give it a shot.
>
> Thanks so much for taking a careful look at this.

Welcome. I was just looking at the use case. The code and especially
Arnds comment were odd enogh to make me look deeper. Such constructs are
usually showing shortcomings of the core interfaces. Seventh sense which
I gained over the past decades. :)

Thanks,

tglx


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-12 Thread J. Bruce Fields
On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 09:55, J. Bruce Fields wrote:
> > Looking at how it's used in net/sunrpc/cache.c  All it's doing is
> > comparing times which have all been calculated relative to the time
> > returned by getboottime64().  So it doesn't really matter what
> > getboottime64() is, as long as it's always the same.
> >
> > So, I don't think this should change behavior of the sunrpc code at all.
> 
> You wish. That's clearly wrong because that code is not guaranteed to
> always run in a context which belongs to the root time namespace.

Argh, right, thanks.

> AFAICT, this stuff can run in softirq context which is context stealing
> and the interrupted task can belong to a different time name space.

Some of it runs in the context of a process doing IO to proc, some from
kthreads.  So, anyway, yes, it's not consistent in the way we'd need.

> In fact the whole thing can be simplified. You can just use time in
> nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> read the clocksource and advances once per tick and does not contain a
> divison and is definitely faster than seconds_since_boot()
> 
> The expiry time is initialized via get_expiry() which does:
> 
>getboottime64();
>return rv - boot.tv_sec; 
> 
> The expiry value 'rv' which is read out of the buffer is wall time in
> seconds. So all you need is are function which convert real to boottime
> and vice versa. That's trivial to implement and again faster than
> getboottime64(). Something like this:
> 
> ktime_t ktime_real_to_boot(ktime_t real)
> {
> struct timekeeper *tk = _core.timekeeper;
> ktime_t mono = ktime_sub(real, tk->offs_real);
>   
> return ktime_add(mono, tk->offs_boot);
> }
> 
> so the above becomes:
> 
>return ktime_real_to_boot(rv * NSEC_PER_SEC);
> 
> which is still faster than a division.
> 
> The nanoseconds value after converting back to wall clock will need a
> division to get seconds since the epoch, but that's not an issue because
> that backward conversion already has one today.
> 
> You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> add's '1' to the expiry value and some janitoring of function names and
> variable types, but no real big surgery AFAICT.

I'll give it a shot.

Thanks so much for taking a careful look at this.

--b.


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-10 Thread Thomas Gleixner
Michael,

On Sat, Oct 10 2020 at 13:50, Michael Weiß wrote:
> On 10.10.20 09:19, Andrei Vagin wrote:
>> And I think we need to consider an option to not change getbootime64 and
>> apply a timens offset right in the show_stat(fs/proc/stat.c)
>> function.

That's what I meant and failed to express correctly.

> Since the problems in softirq context mentioned from Thomas,
> I would agree to Andrei's option to just patch proc/stat.c and leave
> getboottime64 unchanged.
>
> Digging around in the kernel tree, I just found /proc/stat as the only
> place where boottime is exposed to userspace, thus it seems a valid
> option.

Yes, I thought about a wrapper function which adds the namespace offset,
but as it is the only place, open coding is fine.

Thanks,

tglx


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-10 Thread Michael Weiß


On 10.10.20 09:19, Andrei Vagin wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
>> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
>>> getboottime64() provides the time stamp of system boot. In case of
>>> time namespaces, the offset to the boot time stamp was not applied
>>> earlier. However, getboottime64 is used e.g., in /proc/stat to print
>>> the system boot time to userspace. In container runtimes which utilize
>>> time namespaces to virtualize boottime of a container, this leaks
>>> information about the host system boot time.
>>>
>>> Therefore, we make getboottime64() to respect the time namespace offset
>>> for boottime by subtracting the boottime offset.
>>>
>>> Signed-off-by: Michael Weiß 
>>> ---
>>>  kernel/time/timekeeping.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>> index 4c47f388a83f..67530cdb389e 100644
>>> --- a/kernel/time/timekeeping.c
>>> +++ b/kernel/time/timekeeping.c
>>> @@ -17,6 +17,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
>>>  {
>>> struct timekeeper *tk = _core.timekeeper;
>>> ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
>>> +   /* shift boot time stamp according to the timens offset */
>>> +   t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
>> Note that getbootime64() is mostly used in net/sunrpc and I don't know
>> if this change has any security implications for them.
> I would prefer to not patch kernel internal functions if they are used
> not only to expose time to the userspace.
>
> I think when kernel developers sees the getboottime64 function, they
> will expect that it returns the real time of kernel boot. They will
> not expect that it is aware of time namespaces and a returned time will
> depend on a task in which context it will be called.
>
> IMHO, as a minimum, we need to update the documentation for this function or
> even adjust a function name.
>
> And I think we need to consider an option to not change getbootime64 and
> apply a timens offset right in the show_stat(fs/proc/stat.c) function.
>
> Thanks,
> Andrei

Since the problems in softirq context mentioned from Thomas,
I would agree to Andrei's option to just patch proc/stat.c and leave
getboottime64 unchanged.

Digging around in the kernel tree, I just found /proc/stat as the only
place where boottime is exposed to userspace, thus it seems a valid
option.

What do you think? If you agree I'll come up with an updated patch-set.

Cheers,
Michael





Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-10 Thread Andrei Vagin
On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > getboottime64() provides the time stamp of system boot. In case of
> > time namespaces, the offset to the boot time stamp was not applied
> > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > the system boot time to userspace. In container runtimes which utilize
> > time namespaces to virtualize boottime of a container, this leaks
> > information about the host system boot time.
> > 
> > Therefore, we make getboottime64() to respect the time namespace offset
> > for boottime by subtracting the boottime offset.
> > 
> > Signed-off-by: Michael Weiß 
> > ---
> >  kernel/time/timekeeping.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4c47f388a83f..67530cdb389e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> >  {
> > struct timekeeper *tk = _core.timekeeper;
> > ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > +   /* shift boot time stamp according to the timens offset */
> > +   t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> 
> Note that getbootime64() is mostly used in net/sunrpc and I don't know
> if this change has any security implications for them.

I would prefer to not patch kernel internal functions if they are used
not only to expose time to the userspace.

I think when kernel developers sees the getboottime64 function, they
will expect that it returns the real time of kernel boot. They will
not expect that it is aware of time namespaces and a returned time will
depend on a task in which context it will be called.

IMHO, as a minimum, we need to update the documentation for this function or
even adjust a function name.

And I think we need to consider an option to not change getbootime64 and
apply a timens offset right in the show_stat(fs/proc/stat.c) function.

Thanks,
Andrei


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-09 Thread Thomas Gleixner
On Fri, Oct 09 2020 at 09:55, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
>> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
>> > getboottime64() provides the time stamp of system boot. In case of
>> > time namespaces,
>
> Huh, I didn't know there were time namespaces.

There are enough people who live in their own universe, so there is a
clear need for them to have their own time zones :)

>> >struct timekeeper *tk = _core.timekeeper;
>> >ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
>> > +  /* shift boot time stamp according to the timens offset */
>> > +  t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
>> 
>> Note that getbootime64() is mostly used in net/sunrpc and I don't know
>> if this change has any security implications for them.
>> 
>> Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
>> to the time namespace of the caller, i.e. a container can e.g. reset
>> it's boottime when started. This is already possible. The series here
>> fixes a bug where /proc/stat's btime field is not virtualized but since
>> this changes getboottime64() this would also apply to sunrpc's
>> timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
>> i.e. the time in the initial time namespace?
>
> Looking at how it's used in net/sunrpc/cache.c  All it's doing is
> comparing times which have all been calculated relative to the time
> returned by getboottime64().  So it doesn't really matter what
> getboottime64() is, as long as it's always the same.
>
> So, I don't think this should change behavior of the sunrpc code at all.

You wish. That's clearly wrong because that code is not guaranteed to
always run in a context which belongs to the root time namespace.

AFAICT, this stuff can run in softirq context which is context stealing
and the interrupted task can belong to a different time name space.

So, no. All time namespace functions are seperate and the conversion to
and from the root name space happens at the system call boundaries.
Michael, this needs an explicit wrapper which can be used in those
places.

But that made me look at the usage of that in sunrpc.

static inline time64_t seconds_since_boot(void)
{
struct timespec64 boot;
getboottime64();
return ktime_get_real_seconds() - boot.tv_sec;
}

Yikes.

static inline time64_t convert_to_wallclock(time64_t sinceboot)
{
struct timespec64 boot;
getboottime64();
return boot.tv_sec + sinceboot;
}

The comment above seconds_since_boot() is just hillarious. How is the
above so much faster than ktime_get_boottime_seconds()? Arnd?

seconds_since_boot()
  getboottime64() {
 ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);

 *ts = ktime_to_timespec64(t) {
 
 ts.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, );
 }
   }
   ktime_get_real_seconds()
 return tk->xtime_sec;

ktime_get_bootime_seconds() contains a division as well to convert
nanoseconds to seconds and I seriously doubt that the difference is
observable at all. So that comment was pulled out of thin air.

The real reason to have this clumsiness is that the expiry time is read
out of a buffer. That time is seconds since the epoch, aka. wall
time. There is also the conversion the other way round which is used for
procfs. All that can't be changed of course.

Up to 2010 this used get_seconds() and that was changed in commit
c5b29f885afe ("sunrpc: use seconds since boot in expiry cache"):

This protects us from confusion when the wallclock time changes.

We convert to and from wallclock when  setting or reading expiry
times.

So this stuff has been hacked up as is because there are no functions to
convert wall time to boot time and vice versa. I know it's always better
to hack something up than to talk to people.

In fact the whole thing can be simplified. You can just use time in
nanoseconds retrieved via ktime_get_coarse_boottime() which does not
read the clocksource and advances once per tick and does not contain a
divison and is definitely faster than seconds_since_boot()

The expiry time is initialized via get_expiry() which does:

   getboottime64();
   return rv - boot.tv_sec; 

The expiry value 'rv' which is read out of the buffer is wall time in
seconds. So all you need is are function which convert real to boottime
and vice versa. That's trivial to implement and again faster than
getboottime64(). Something like this:

ktime_t ktime_real_to_boot(ktime_t real)
{
struct timekeeper *tk = _core.timekeeper;
ktime_t mono = ktime_sub(real, tk->offs_real);
  
return ktime_add(mono, tk->offs_boot);
}

so the above becomes:

   return ktime_real_to_boot(rv * NSEC_PER_SEC);

which is still faster than a division.

The nanoseconds value after converting back to wall clock will need a
division to get seconds since the epoch, but that's not an issue because
that backward conversion 

Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-09 Thread J. Bruce Fields
On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > getboottime64() provides the time stamp of system boot. In case of
> > time namespaces,

Huh, I didn't know there were time namespaces.

> > the offset to the boot time stamp was not applied
> > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > the system boot time to userspace. In container runtimes which utilize
> > time namespaces to virtualize boottime of a container, this leaks
> > information about the host system boot time.
> > 
> > Therefore, we make getboottime64() to respect the time namespace offset
> > for boottime by subtracting the boottime offset.
> > 
> > Signed-off-by: Michael Weiß 
> > ---
> >  kernel/time/timekeeping.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4c47f388a83f..67530cdb389e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> >  {
> > struct timekeeper *tk = _core.timekeeper;
> > ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > +   /* shift boot time stamp according to the timens offset */
> > +   t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> 
> Note that getbootime64() is mostly used in net/sunrpc and I don't know
> if this change has any security implications for them.
> 
> Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
> to the time namespace of the caller, i.e. a container can e.g. reset
> it's boottime when started. This is already possible. The series here
> fixes a bug where /proc/stat's btime field is not virtualized but since
> this changes getboottime64() this would also apply to sunrpc's
> timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
> i.e. the time in the initial time namespace?

Looking at how it's used in net/sunrpc/cache.c  All it's doing is
comparing times which have all been calculated relative to the time
returned by getboottime64().  So it doesn't really matter what
getboottime64() is, as long as it's always the same.

So, I don't think this should change behavior of the sunrpc code at all.

--b.


Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-09 Thread Christian Brauner
On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> getboottime64() provides the time stamp of system boot. In case of
> time namespaces, the offset to the boot time stamp was not applied
> earlier. However, getboottime64 is used e.g., in /proc/stat to print
> the system boot time to userspace. In container runtimes which utilize
> time namespaces to virtualize boottime of a container, this leaks
> information about the host system boot time.
> 
> Therefore, we make getboottime64() to respect the time namespace offset
> for boottime by subtracting the boottime offset.
> 
> Signed-off-by: Michael Weiß 
> ---
>  kernel/time/timekeeping.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 4c47f388a83f..67530cdb389e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
>  {
>   struct timekeeper *tk = _core.timekeeper;
>   ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> + /* shift boot time stamp according to the timens offset */
> + t = timens_ktime_to_host(CLOCK_BOOTTIME, t);

Note that getbootime64() is mostly used in net/sunrpc and I don't know
if this change has any security implications for them.

Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
to the time namespace of the caller, i.e. a container can e.g. reset
it's boottime when started. This is already possible. The series here
fixes a bug where /proc/stat's btime field is not virtualized but since
this changes getboottime64() this would also apply to sunrpc's
timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
i.e. the time in the initial time namespace?

Christian


[PATCH v2 2/4] time: make getboottime64 aware of time namespace

2020-10-07 Thread Michael Weiß
getboottime64() provides the time stamp of system boot. In case of
time namespaces, the offset to the boot time stamp was not applied
earlier. However, getboottime64 is used e.g., in /proc/stat to print
the system boot time to userspace. In container runtimes which utilize
time namespaces to virtualize boottime of a container, this leaks
information about the host system boot time.

Therefore, we make getboottime64() to respect the time namespace offset
for boottime by subtracting the boottime offset.

Signed-off-by: Michael Weiß 
---
 kernel/time/timekeeping.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f388a83f..67530cdb389e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
 {
struct timekeeper *tk = _core.timekeeper;
ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
+   /* shift boot time stamp according to the timens offset */
+   t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
 
*ts = ktime_to_timespec64(t);
 }
-- 
2.20.1