Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-04-14 Thread Thomas Gleixner
Fenghua,

On Tue, Apr 13 2021 at 23:40, Fenghua Yu wrote:
> On Mon, Apr 12, 2021 at 09:15:08AM +0200, Thomas Gleixner wrote:
>> Aside of that why are you trying to make this throttling in any way
>> accurate? It does not matter at all, really. Limit reached, put it to
>> sleep for some time and be done with it. No point in trying to be clever
>> for no value.
>
> Is it OK to limit bld_ratelimit to between 1 and 1,000 bus locks/sec?
>
> Can I do the throttling like this?
>
>	/* Enforce no more than bld_ratelimit bus locks/sec. */
>	while (!__ratelimit(&bld_ratelimit_state))
>		msleep(10);
>
> On one machine, with bld_ratelimit=1,000 a busy bus-lock loop hits the limit
> after about 5 msec, i.e. the bus is locked for roughly 5 msec and then the
> process sleeps for 10 msec and thus generates no bus locks during that time.
> A "dd" command running on other cores doesn't show noticeable degradation
> with bld_ratelimit=1,000.

Something like this makes sense. Add some rationale for the upper limit
you finally end up with so sysadmins can make informed decisions.

Thanks,

tglx


Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-04-13 Thread Fenghua Yu
Hi, Thomas,

On Mon, Apr 12, 2021 at 09:15:08AM +0200, Thomas Gleixner wrote:
> On Sat, Apr 03 2021 at 01:04, Fenghua Yu wrote:
> > On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
> >> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> >> And even with throttling the injection rate further down to 25k per
> >> second the impact on the workload is still significant in the 10% range.
> > So I can change the ratelimit to system wide and call usleep_range()
> > to sleep: 
> >	while (!__ratelimit(&bld_ratelimit_state))
> >		usleep_range(1000000 / bld_ratelimit,
> >			     1000000 / bld_ratelimit);
> >
> > The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1
> > usec.
> 
> Maximum sleep time is 1usec?
> 
> > The min bld_ratelimit is 1/s.
> 
> Again. This does not make sense at all. 1 million bus lock events per second
> are way beyond the point where the machine does anything other than being
> stuck in bus locks.
> 
> Aside of that why are you trying to make this throttling in any way
> accurate? It does not matter at all, really. Limit reached, put it to
> sleep for some time and be done with it. No point in trying to be clever
> for no value.

Is it OK to limit bld_ratelimit to between 1 and 1,000 bus locks/sec?

Can I do the throttling like this?

	/* Enforce no more than bld_ratelimit bus locks/sec. */
	while (!__ratelimit(&bld_ratelimit_state))
		msleep(10);

On one machine, with bld_ratelimit=1,000 a busy bus-lock loop hits the limit
after about 5 msec, i.e. the bus is locked for roughly 5 msec and then the
process sleeps for 10 msec and thus generates no bus locks during that time.
A "dd" command running on other cores doesn't show noticeable degradation
with bld_ratelimit=1,000.
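
For reference, a minimal sketch of how such a system-wide limit could be
wired up (bld_ratelimit_state and the helper names are placeholders for
illustration, not taken from the posted patches):

    /* Sketch only: one system-wide rate-limit state for bus locks. */
    #include <linux/init.h>
    #include <linux/ratelimit.h>
    #include <linux/delay.h>

    static int bld_ratelimit;       /* N from split_lock_detect=ratelimit:N */
    static DEFINE_RATELIMIT_STATE(bld_ratelimit_state, HZ, 1);

    static void __init bld_ratelimit_setup(int n)
    {
            bld_ratelimit = n;
            /* Allow a burst of n bus locks per HZ interval (one second). */
            ratelimit_state_init(&bld_ratelimit_state, HZ, n);
    }

    static void bld_throttle(void)
    {
            /* Over the limit: park the offending task for a while. */
            while (!__ratelimit(&bld_ratelimit_state))
                    msleep(10);
    }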

Thanks.

-Fenghua


Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-04-12 Thread Thomas Gleixner
On Sat, Apr 03 2021 at 01:04, Fenghua Yu wrote:
> On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
>> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
>> And even with throttling the injection rate further down to 25k per
>> second the impact on the workload is still significant in the 10% range.
>
> Thank you for your insight!
>
> So I can change the ratelimit to system wide and call usleep_range()
> to sleep: 
>	while (!__ratelimit(&bld_ratelimit_state))
>		usleep_range(1000000 / bld_ratelimit,
>			     1000000 / bld_ratelimit);
>
> The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1
> usec.

Maximum sleep time is 1usec?

> The min bld_ratelimit is 1/s.

Again. This does not make sense at all. 1 million bus lock events per second
are way beyond the point where the machine does anything other than being
stuck in bus locks.

Aside of that why are you trying to make this throttling in any way
accurate? It does not matter at all, really. Limit reached, put it to
sleep for some time and be done with it. No point in trying to be clever
for no value.

Thanks,

tglx


Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-04-02 Thread Fenghua Yu
Hi, Thomas,

On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> > On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
> >> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> >> > +		bld_ratelimit = ratelimit;
> >> 
> >> So any rate up to INTMAX/s is valid here, right?
> >
> > Yes. I don't see any limit smaller than INTMAX/s. Is that right?
> 
> That's a given, but what's the point of limits in that range?
> 
> A buslock access locks up the system for X cycles. So the total amount
> of allowable damage in cycles per second is:
> 
>limit * stall_cycles_per_bus_lock
> 
> ergo the time (in seconds) which the system is locked up is:
> 
>limit * stall_cycles_per_bus_lock / cpufreq
> 
> Which means for ~INTMAX/2 on a 2 GHz CPU:
> 
>2 * 10^9 * $CYCLES  / 2 * 10^9  = $CYCLES seconds
> 
> Assumed the inflicted damage is only 1 cycle then #LOCK is pretty much
> permanently on if there are enough threads. Sure #DB will slow them
> down, but it still does not make any sense at all especially as the
> damage is certainly greater than a single cycle.
> 
> And because the changelogs and the docs are void of numbers I just got
> real numbers myself.
> 
> With a single thread doing a 'lock inc *mem' across a cache line
> boundary, the workload which I measured with perf stat goes from:
> 
>  5,940,985,091  instructions  #0.88  insn per cycle   
>   
>2.780950806 seconds time elapsed
>0.99848 seconds user
>4.202137000 seconds sys
> to
> 
>  7,467,979,504  instructions  #0.10  insn per cycle   
>   
>5.110795917 seconds time elapsed
>7.123499000 seconds user
>   37.266852000 seconds sys
> 
> The buslock injection rate is ~250k per second.
> 
> Even if I rate-limit the locked inc with a delay loop of ~5000 cycles,
> which is probably more than what the #DB will cost, this single task
> still impacts the workload significantly:
> 
>  6,496,994,537  instructions  #0.39  insn per cycle   
>   
>3.043275473 seconds time elapsed
>1.899852000 seconds user
>8.957088000 seconds sys
> 
> The buslock injection rate is down to ~150k per second in this case.
> 
> And even with throttling the injection rate further down to 25k per
> second the impact on the workload is still significant in the 10% range.

Thank you for your insight!

So I can change the ratelimit to system wide and call usleep_range()
to sleep: 
	while (!__ratelimit(&bld_ratelimit_state))
		usleep_range(1000000 / bld_ratelimit,
			     1000000 / bld_ratelimit);

The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1 usec.
The min bld_ratelimit is 1/s.
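
For concreteness, the sleep length implied by that expression is just
1,000,000 usec divided by the rate; a quick user-space check of the stated
bounds (illustration only, not kernel code):

    #include <stdio.h>

    int main(void)
    {
            int rates[] = { 1, 1000, 1000000 };     /* bus locks per second */

            for (int i = 0; i < 3; i++)
                    printf("bld_ratelimit=%7d -> usleep_range of %7d usec\n",
                           rates[i], 1000000 / rates[i]);
            return 0;
    }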

> 
> And of course the documentation of the ratelimit parameter explains all
> of this in great detail so the administrator has a trivial job to tune
> that, right?

I will explain how to tune the parameter in the buslock.rst doc.

> 
> >> > +	case sld_ratelimit:
> >> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> >> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> >> > +			msleep(1000 / bld_ratelimit);
> 
> For any ratelimit > 1000 this will loop up to 1000 times with
> CONFIG_HZ=1000.
> 
> Assume that the buslock producer has tons of threads which all end up
> here pretty soon then you launch a mass wakeup in the worst case every
> jiffy. Are you sure that the cure is better than the disease?

If usleep_range() is used to sleep, the threads will not do the mass
sleep/wakeup, right? Maybe I can use msleep() for msec sleeps (bigger
bld_ratelimit) and usleep_range() for usec sleeps (smaller bld_ratelimit)?

Even if there is a mass wakeup, throttling the threads can avoid the
system-wide performance degradation (e.g. a 7x slower dd command for another
user). Is that a good justification for throttling the threads?

> 
> > If I split this whole patch set into two patch sets:
> > 1. Three patches in the first patch set: the enumeration patch, the warn
> >and fatal patch, and the documentation patch.
> > 2. Two patches in the second patch set: the ratelimit patch and the
> >documentation patch.
> >
> > Then I will send the two patch sets separately so that you can accept them
> > one by one. Is that OK?
> 
> That's obviously the right thing to do because #1 should be ready and we
> can sort out #2 separately. See the conversation with Tony.

Thank you for picking up the first patch set!

-Fenghua


Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-04-02 Thread Fenghua Yu
Hi, Thomas,

On Sat, Mar 20, 2021 at 02:57:52PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 20 2021 at 02:01, Thomas Gleixner wrote:
> 
> > On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
> >>>  What is the justification for making this rate limit per UID and not
> >>>  per task, per process or systemwide?
> >>
> >> The concern is that a malicious user is running a workload that loops
> >> obtaining the buslock. This brings the whole system to its knees.
> >>
> >> Limiting per task doesn't help. The user can just fork(2) a whole bunch
> >> of tasks for a distributed buslock attack..
> >
> > Fair enough.
> >
> >> Systemwide might be an interesting alternative. Downside would be 
> >> accidental
> >> rate limit of non-malicious tasks that happen to grab a bus lock 
> >> periodically
> >> but in the same window with other buslocks from other users.
> >>
> >> Do you think that a risk worth taking to make the code simpler?
> >
> > I'd consider it low risk, but I just looked for the usage of the
> > existing ratelimit in struct user and the related commit. Now it dawns
> > on me where you are coming from.
> 
> So after getting real numbers myself, I have more thoughts on
> this. Setting a reasonable per user limit might be hard when you want to
> protect e.g. against an orchestrated effort by several users
> (containers...). If each of them stays under the limit which is easy
> enough to figure out then you still end up with significant accumulated
> damage.
> 
> So systemwide might not be the worst option after all.

Indeed.

> 
> The question is how wide spread are bus locks in existing applications?
> I haven't found any on a dozen machines with random variants of
> workloads so far according to perf ... -e sq_misc.split_lock.

We have been running various tests widely inside Intel (and also outside)
after enabling split lock and captured a few split lock issues in firmware,
kernel, drivers, and apps. As you know, we have submitted a few patches to
fix the split lock issues in the kernel and drivers (e.g. split lock
in bit ops) and fixed a few split lock issues in firmware.

But so far I'm not aware of any split lock issues in user space yet.
I guess compilers do a good job of cache line alignment to avoid this
issue. But inline asm in user apps can easily hit this issue (on purpose).

> 
> What's the actual scenario in the real world where a buslock access
> might be legitimate?

I did a simple experiment: looping on a split-locked instruction on
one core in one user can slow down a "dd" command running on another core
in another user by 7 times. A malicious user can do similar things to
slow down whole-system performance, right?
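
For reproducibility, a user-space split-lock generator along the lines of
that experiment can be as small as the sketch below (illustration only,
x86; the offset of 62 is just one way to make a 4-byte operand straddle a
64-byte cache line):

    /* Sketch: hammer a locked increment that straddles a cache line. */
    #include <stdint.h>
    #include <stdlib.h>

    int main(void)
    {
            void *buf;

            if (posix_memalign(&buf, 64, 128))
                    return 1;

            /* Bytes 62..65 span the boundary between the two cache lines. */
            volatile uint32_t *p = (uint32_t *)((char *)buf + 62);

            for (;;)
                    __asm__ __volatile__("lock incl %0" : "+m" (*p));
    }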

> 
> And what's the advice, recommendation for a system administrator how to
> analyze the situation and what kind of parameter to set?
> 
> I tried to get answers from Documentation/x86/buslock.rst, but 

Can I change the sleep code in the handle_bus_lock() to the following?

	while (!__ratelimit(&bld_ratelimit_state))
		usleep_range(1000000 / bld_ratelimit,
			     1000000 / bld_ratelimit);

Maybe the system-wide bus lock ratelimit can be set to a default value of
1,000,000/s, which is also the max ratelimit value.

The max sleep in the kernel is 1 us, which means the max bld_ratelimit can
be up to 1,000,000.

If the system administrator is less tolerant of bus locks and wants to
throttle them further, bld_ratelimit can be set to a smaller number.
The smallest bld_ratelimit is 1.

When I gradually decrease the bld_ratelimit value, I can see that fewer bus
locks are issued per second systemwide and the "dd" command or other memory
benchmarks are less impacted by the bus locks.

If this works, I will update the buslock.rst doc to explain the situation
and how to set the parameter.

Thanks.

-Fenghua


RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-20 Thread Thomas Gleixner
On Sat, Mar 20 2021 at 02:01, Thomas Gleixner wrote:

> On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
>>>  What is the justification for making this rate limit per UID and not
>>>  per task, per process or systemwide?
>>
>> The concern is that a malicious user is running a workload that loops
>> obtaining the buslock. This brings the whole system to its knees.
>>
>> Limiting per task doesn't help. The user can just fork(2) a whole bunch
>> of tasks for a distributed buslock attack..
>
> Fair enough.
>
>> Systemwide might be an interesting alternative. Downside would be accidental
>> rate limit of non-malicious tasks that happen to grab a bus lock periodically
>> but in the same window with other buslocks from other users.
>>
>> Do you think that a risk worth taking to make the code simpler?
>
> I'd consider it low risk, but I just looked for the usage of the
> existing ratelimit in struct user and the related commit. Now it dawns
> on me where you are coming from.

So after getting real numbers myself, I have more thoughts on
this. Setting a reasonable per user limit might be hard when you want to
protect e.g. against an orchestrated effort by several users
(containers...). If each of them stays under the limit which is easy
enough to figure out then you still end up with significant accumulated
damage.

So systemwide might not be the worst option after all.

The question is how wide spread are bus locks in existing applications?
I haven't found any on a dozen machines with random variants of
workloads so far according to perf ... -e sq_misc.split_lock.

What's the actual scenario in the real world where a buslock access
might be legitimate?

And what's the advice, recommendation for a system administrator how to
analyze the situation and what kind of parameter to set?

I tried to get answers from Documentation/x86/buslock.rst, but 

Thanks,

tglx




Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-20 Thread Thomas Gleixner
On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
>> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
>> > +		bld_ratelimit = ratelimit;
>> 
>> So any rate up to INTMAX/s is valid here, right?
>
> Yes. I don't see any limit smaller than INTMAX/s. Is that right?

That's a given, but what's the point of limits in that range?

A buslock access locks up the system for X cycles. So the total amount
of allowable damage in cycles per second is:

   limit * stall_cycles_per_bus_lock

ergo the time (in seconds) which the system is locked up is:

   limit * stall_cycles_per_bus_lock / cpufreq

Which means for ~INTMAX/2 on a 2 GHz CPU:

   2 * 10^9 * $CYCLES  / 2 * 10^9  = $CYCLES seconds

Assumed the inflicted damage is only 1 cycle then #LOCK is pretty much
permanently on if there are enough threads. Sure #DB will slow them
down, but it still does not make any sense at all especially as the
damage is certainly greater than a single cycle.
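
Plugging the measured injection rate below into that formula gives a feel
for the scale (the per-lock stall-cycle count here is an assumed number,
purely for illustration):

    #include <stdio.h>

    int main(void)
    {
            double rate = 250000;           /* bus locks/sec, as measured below */
            double stall_cycles = 1000;     /* assumed stall per bus lock */
            double cpu_hz = 2e9;            /* 2 GHz */

            /* Fraction of every second the bus is unusable for everyone else. */
            printf("stalled fraction: %.1f%%\n",
                   100.0 * rate * stall_cycles / cpu_hz);
            return 0;
    }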

And because the changelogs and the docs are void of numbers I just got
real numbers myself.

With a single thread doing a 'lock inc *mem' across a cache line
boundary, the workload which I measured with perf stat goes from:

 5,940,985,091  instructions  #0.88  insn per cycle 

   2.780950806 seconds time elapsed
   0.99848 seconds user
   4.202137000 seconds sys
to

 7,467,979,504  instructions  #0.10  insn per cycle 

   5.110795917 seconds time elapsed
   7.123499000 seconds user
  37.266852000 seconds sys

The buslock injection rate is ~250k per second.

Even if I rate-limit the locked inc with a delay loop of ~5000 cycles,
which is probably more than what the #DB will cost, this single task
still impacts the workload significantly:

 6,496,994,537  instructions  #0.39  insn per cycle 

   3.043275473 seconds time elapsed
   1.899852000 seconds user
   8.957088000 seconds sys

The buslock injection rate is down to ~150k per second in this case.

And even with throttling the injection rate further down to 25k per
second the impact on the workload is still significant in the 10% range.

And of course the documentation of the ratelimit parameter explains all
of this in great detail so the administrator has a trivial job to tune
that, right?

>> > +	case sld_ratelimit:
>> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
>> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
>> > +			msleep(1000 / bld_ratelimit);

For any ratelimit > 1000 this will loop up to 1000 times with
CONFIG_HZ=1000.

Assume that the buslock producer has tons of threads which all end up
here pretty soon then you launch a mass wakeup in the worst case every
jiffy. Are you sure that the cure is better than the disease?
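
The arithmetic behind that claim, as a quick user-space check (assumes
CONFIG_HZ=1000, so msleep() rounds any request up to at least one 1 ms
jiffy; illustration only):

    #include <stdio.h>

    int main(void)
    {
            int bld_ratelimit = 2000;               /* any value > 1000 */
            int arg_ms = 1000 / bld_ratelimit;      /* integer division -> 0 */
            int actual_ms = arg_ms ? arg_ms : 1;    /* msleep(0) still waits ~1 jiffy */

            printf("msleep(%d) -> up to ~%d wakeups per 1 s ratelimit interval\n",
                   arg_ms, 1000 / actual_ms);
            return 0;
    }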

> If I split this whole patch set into two patch sets:
> 1. Three patches in the first patch set: the enumeration patch, the warn
>and fatal patch, and the documentation patch.
> 2. Two patches in the second patch set: the ratelimit patch and the
>documentation patch.
>
> Then I will send the two patch sets separately so that you can accept them
> one by one. Is that OK?

That's obviously the right thing to do because #1 should be ready and we
can sort out #2 separately. See the conversation with Tony.

Thanks,

tglx


RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-19 Thread Thomas Gleixner
On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
>>  What is the justification for making this rate limit per UID and not
>>  per task, per process or systemwide?
>
> The concern is that a malicious user is running a workload that loops
> obtaining the buslock. This brings the whole system to its knees.
>
> Limiting per task doesn't help. The user can just fork(2) a whole bunch
> of tasks for a distributed buslock attack..

Fair enough.

> Systemwide might be an interesting alternative. Downside would be accidental
> rate limit of non-malicious tasks that happen to grab a bus lock periodically
> but in the same window with other buslocks from other users.
>
> Do you think that a risk worth taking to make the code simpler?

I'd consider it low risk, but I just looked for the usage of the
existing ratelimit in struct user and the related commit. Now it dawns
on me where you are coming from.

So that seems to become a pattern ... so the uncompiled thing below
might solve that.

Yes, it makes the efivars thingy slower, but do we care?

We neither care about efivars performance nor about the buslock
performance.

But I pretty much care about not having to stare at code which gets the
fundamental refcounting wrong.

Thanks,

tglx
---
 fs/efivarfs/Kconfig   |1 +
 fs/efivarfs/file.c|   11 +--
 include/linux/ratelimit.h |1 +
 include/linux/ratelimit_uid.h |   26 ++
 include/linux/sched/user.h|4 ++--
 kernel/user.c |7 ---
 lib/Kconfig   |3 +++
 lib/ratelimit.c   |   41 +
 8 files changed, 87 insertions(+), 7 deletions(-)

--- a/fs/efivarfs/Kconfig
+++ b/fs/efivarfs/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config EFIVAR_FS
tristate "EFI Variable filesystem"
+   select UID_RATELIMIT
depends on EFI
default m
help
--- a/fs/efivarfs/file.c
+++ b/fs/efivarfs/file.c
@@ -63,6 +63,14 @@ static ssize_t efivarfs_file_write(struc
return bytes;
 }
 
+static const struct uid_ratelimit_cfg efivars_rl = {
+   .which  = UID_RATELIMIT_EFIVARS,
+   .interval   = HZ,
+   .burst  = 100,
+   .flags  = RATELIMIT_MSG_ON_RELEASE,
+   .delay  = 50,
+};
+
 static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
size_t count, loff_t *ppos)
 {
@@ -73,8 +81,7 @@ static ssize_t efivarfs_file_read(struct
ssize_t size = 0;
int err;
 
-	while (!__ratelimit(&file->f_cred->user->ratelimit))
-		msleep(50);
+	uid_ratelimit(&efivars_rl);
 
	err = efivar_entry_size(var, &datasize);
 
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -3,6 +3,7 @@
 #define _LINUX_RATELIMIT_H
 
 #include 
+#include 
 #include 
 #include 
 
--- /dev/null
+++ b/include/linux/ratelimit_uid.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RATELIMIT_UID_H
+#define _LINUX_RATELIMIT_UID_H
+
+/* Per UID ratelimits */
+enum uid_ratelimits {
+#ifdef CONFIG_EFIVAR_FS
+   UID_RATELIMIT_EFIVARS,
+#endif
+   UID_RATELIMIT_MAX,
+};
+
+#define UID_RATELIMIT_NODELAY  ULONG_MAX
+
+struct uid_ratelimit_cfg {
+   enum uid_ratelimits which;
+   int interval;
+   int burst;
+   unsigned long   flags;
+   unsigned long   delay;
+};
+
+extern int __uid_ratelimit(const struct uid_ratelimit_cfg *cfg, const void *func);
+#define uid_ratelimit(cfg) __uid_ratelimit(cfg, __func__)
+
+#endif
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -40,8 +40,8 @@ struct user_struct {
	atomic_t nr_watches;	/* The number of watches this user currently has */
 #endif
 
-   /* Miscellaneous per-user rate limit */
-   struct ratelimit_state ratelimit;
+   /* Miscellaneous per-user rate limits storage */
+   struct ratelimit_state  *ratelimits[UID_RATELIMIT_MAX];
 };
 
 extern int uids_sysfs_init(void);
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -102,7 +102,6 @@ struct user_struct root_user = {
.sigpending = ATOMIC_INIT(0),
.locked_shm = 0,
.uid= GLOBAL_ROOT_UID,
-   .ratelimit  = RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
 };
 
 /*
@@ -139,8 +138,12 @@ static struct user_struct *uid_hash_find
 static void free_user(struct user_struct *up, unsigned long flags)
__releases(_lock)
 {
+   unsigned int i;
+
uid_hash_remove(up);
	spin_unlock_irqrestore(&uidhash_lock, flags);
+   for (i = 0; i < UID_RATELIMIT_MAX; i++)
+   kfree(up->ratelimits[i]);
kmem_cache_free(uid_cachep, up);
 }
 
@@ -188,8 +191,6 @@ struct user_struct *alloc_uid(kuid_t uid
 
new->uid = uid;
	refcount_set(&new->__count, 1);
	ratelimit_state_init(&new->ratelimit, HZ, 100);
-   

Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-19 Thread Fenghua Yu
Hi, Thomas,

On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> > Change Log:
> > v5:
> > Address all comments from Thomas:
> > - Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
> >   options are processed in one patch.
> 
> What? I certainly did not request that. I said:
> 
>  "Why is this separate and an all in one thing? patch 2/4 changes the
>   parameter first and 3/4 adds a new option"
> 
> which means we want documentation for patch 2 and documentation for
> patch 3? 
> 
> The ratelimit thing is clearly an extra functionality on top of that
> buslock muck.
> 
> Next time I write it out..

Sorry for misunderstanding your comments. I will split the document patch
into two: one for patch 2 (warn and fatal) and one for patch 3 (ratelimit).

> 
> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> > +		bld_ratelimit = ratelimit;
> 
> So any rate up to INTMAX/s is valid here, right?

Yes. I don't see any limit smaller than INTMAX/s. Is that right?

> 
> > +	case sld_ratelimit:
> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> > +			msleep(1000 / bld_ratelimit);
> 
> which is cute because msleep() will always sleep until the next jiffie
> increment happens.
> 
> What's not so cute here is the fact that get_current_user() takes a
> reference on current's UID on every invocation, but nothing ever calls
> free_uid(). I missed that last time over the way more obvious HZ division.

I will call free_uid().

> 
> > +++ b/kernel/user.c
> > @@ -103,6 +103,9 @@ struct user_struct root_user = {
> > .locked_shm = 0,
> > .uid= GLOBAL_ROOT_UID,
> > .ratelimit  = RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +   .bld_ratelimit  = RATELIMIT_STATE_INIT(root_user.bld_ratelimit, 0, 0),
> > +#endif
> >  };
> >  
> >  /*
> > @@ -172,6 +175,11 @@ void free_uid(struct user_struct *up)
> > free_user(up, flags);
> >  }
> >  
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +/* Some Intel CPUs may set this for rate-limited bus locks. */
> > +int bld_ratelimit;
> > +#endif
> 
> Of course this variable is still required to be in the core kernel code
> because?
> 
> While you decided to munge this all together, you obviously ignored the
> following review comment:
> 
>   "It also lacks the information that the ratelimiting is per UID
>and not per task and why this was chosen to be per UID..."
> 
> There is still no reasoning neither in the changelog nor in the cover
> letter nor in a reply to my review.
> 
> So let me repeat my question and make it more explicit:
> 
>   What is the justification for making this rate limit per UID and not
>   per task, per process or systemwide?

Tony just now answered the justification. If that's OK, I will add the
answer in the commit message.

> 
> >  struct user_struct *alloc_uid(kuid_t uid)
> >  {
> > struct hlist_head *hashent = uidhashentry(uid);
> > @@ -190,6 +198,11 @@ struct user_struct *alloc_uid(kuid_t uid)
> > 	refcount_set(&new->__count, 1);
> > 	ratelimit_state_init(&new->ratelimit, HZ, 100);
> > 	ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +	ratelimit_state_init(&new->bld_ratelimit, HZ, bld_ratelimit);
> > +	ratelimit_set_flags(&new->bld_ratelimit,
> > +			    RATELIMIT_MSG_ON_RELEASE);
> > +#endif
> 
> If this has a proper justification for being per user and having to add
> 40 bytes per UID for something which is mostly unused then there are
> definitely better ways to do that than slapping #ifdefs into
> architecture agnostic core code.
> 
> So if you instead of munging the code patches had split the
> documentation, then I could apply the first 3 patches and we would only
> have to sort out the ratelimiting muck.

If I split this whole patch set into two patch sets:
1. Three patches in the first patch set: the enumeration patch, the warn
   and fatal patch, and the documentation patch.
2. Two patches in the second patch set: the ratelimit patch and the
   documentation patch.

Then I will send the two patch sets separately so that you can accept them
one by one. Is that OK?

Or should I still send the 5 patches in one patch set so you will pick up
the first 3 patches and then the next 2 patches separately?

Thanks.

-Fenghua



RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-19 Thread Luck, Tony
>  What is the justification for making this rate limit per UID and not
>  per task, per process or systemwide?

The concern is that a malicious user is running a workload that loops
obtaining the buslock. This brings the whole system to its knees.

Limiting per task doesn't help. The user can just fork(2) a whole bunch
of tasks for a distributed buslock attack..

Systemwide might be an interesting alternative. Downside would be accidental
rate limit of non-malicious tasks that happen to grab a bus lock periodically
but in the same window with other buslocks from other users.

Do you think that a risk worth taking to make the code simpler?

-Tony


Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-19 Thread Thomas Gleixner
On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> Change Log:
> v5:
> Address all comments from Thomas:
> - Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
>   options are processed in one patch.

What? I certainly did not request that. I said:

 "Why is this separate and an all in one thing? patch 2/4 changes the
  parameter first and 3/4 adds a new option"

which means we want documentation for patch 2 and documentation for
patch 3? 

The ratelimit thing is clearly an extra functionality on top of that
buslock muck.

Next time I write it out..

> +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> +		bld_ratelimit = ratelimit;

So any rate up to INTMAX/s is valid here, right?

> +	case sld_ratelimit:
> +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> +			msleep(1000 / bld_ratelimit);

which is cute because msleep() will always sleep until the next jiffie
increment happens.

What's not so cute here is the fact that get_current_user() takes a
reference on current's UID on every invocation, but nothing ever calls
free_uid(). I missed that last time over the way more obvious HZ division.

> +++ b/kernel/user.c
> @@ -103,6 +103,9 @@ struct user_struct root_user = {
>   .locked_shm = 0,
>   .uid= GLOBAL_ROOT_UID,
>   .ratelimit  = RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> +#ifdef CONFIG_CPU_SUP_INTEL
> + .bld_ratelimit  = RATELIMIT_STATE_INIT(root_user.bld_ratelimit, 0, 0),
> +#endif
>  };
>  
>  /*
> @@ -172,6 +175,11 @@ void free_uid(struct user_struct *up)
>   free_user(up, flags);
>  }
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
> +/* Some Intel CPUs may set this for rate-limited bus locks. */
> +int bld_ratelimit;
> +#endif

Of course this variable is still required to be in the core kernel code
because?

While you decided to munge this all together, you obviously ignored the
following review comment:

  "It also lacks the information that the ratelimiting is per UID
   and not per task and why this was chosen to be per UID..."

There is still no reasoning neither in the changelog nor in the cover
letter nor in a reply to my review.

So let me repeat my question and make it more explicit:

  What is the justification for making this rate limit per UID and not
  per task, per process or systemwide?

>  struct user_struct *alloc_uid(kuid_t uid)
>  {
>   struct hlist_head *hashent = uidhashentry(uid);
> @@ -190,6 +198,11 @@ struct user_struct *alloc_uid(kuid_t uid)
> 	refcount_set(&new->__count, 1);
> 	ratelimit_state_init(&new->ratelimit, HZ, 100);
> 	ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	ratelimit_state_init(&new->bld_ratelimit, HZ, bld_ratelimit);
> +	ratelimit_set_flags(&new->bld_ratelimit,
> +			    RATELIMIT_MSG_ON_RELEASE);
> +#endif

If this has a proper justification for being per user and having to add
40 bytes per UID for something which is mostly unused then there are
definitely better ways to do that than slapping #ifdefs into
architecture agnostic core code.

So if you instead of munging the code patches had split the
documentation, then I could apply the first 3 patches and we would only
have to sort out the ratelimiting muck.

Thanks,

tglx


[PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock

2021-03-12 Thread Fenghua Yu
Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.

#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.

split_lock_detect=
                #AC for split lock              #DB for bus lock

off             Do nothing                      Do nothing

warn            Kernel OOPs                     Warn once per task
                Warn once per task and          and continues to run.
                disable future checking
                When both features are
                supported, warn in #AC

fatal           Kernel OOPs                     Send SIGBUS to user.
                Send SIGBUS to user
                When both features are
                supported, fatal in #AC

ratelimit:N     Do nothing                      Limit bus lock rate to
                                                N per second in the
                                                current non-root user.

Default option is "warn".

Hardware only generates #DB for bus lock detection when CPL > 0, to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So there is no need to handle #DB for bus locks detected in the kernel.

#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.

A breakpoint and a bus lock on the same instruction can trigger a single #DB.
The bus lock is handled before the breakpoint in the #DB handler.

Delivery of #DB for a bus lock in user space clears DR6[11]. To avoid
confusion in identifying the #DB, the #DB handler sets the bit back to 1
before returning to the interrupted task.
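
As a rough sketch of where that fits in the user-mode #DB path (simplified,
not the exact hunk from this patch; it assumes the DR6 value has already
been flipped to positive polarity and uses the DR_BUS_LOCK bit this series
adds):

    /* Simplified shape of the user-mode #DB path. */
    static void exc_debug_user_sketch(struct pt_regs *regs, unsigned long dr6)
    {
            /* Handle the bus lock first, then fall through to breakpoints. */
            if (dr6 & DR_BUS_LOCK)
                    handle_bus_lock(regs);

            /* ... existing breakpoint/watchpoint handling continues here ... */
    }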

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
Change Log:
v5:
Address all comments from Thomas:
- Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
  options are processed in one patch.
- Change warn to #AC if both #AC and #DB are supported.
- Remove sld and bld variables and use boot_cpu_has() to check bus lock
  split lock support.
- Remove bus lock checking in handle_bus_lock().
- Remove bld_ratelimit < HZ/2 check.
- Add rate limit handling comment in bus lock #DB.
- Define bld_ratelimit only for Intel CPUs.

v3:
- Enable Bus Lock Detection when fatal to handle bus lock from non-WB
  (PeterZ).

v2:
- Send SIGBUS in fatal case for bus lock #DB (PeterZ).

v1:
- Check bus lock bit by its positive polarity (Xiaoyao).

RFC v3:
- Remove DR6_RESERVED change (PeterZ).

 arch/x86/include/asm/cpu.h   |  10 +-
 arch/x86/include/asm/msr-index.h |   1 +
 arch/x86/include/uapi/asm/debugreg.h |   1 +
 arch/x86/kernel/cpu/common.c |   2 +-
 arch/x86/kernel/cpu/intel.c  | 148 +++
 arch/x86/kernel/traps.c  |   7 ++
 include/linux/sched/user.h   |   5 +-
 kernel/user.c|  13 +++
 8 files changed, 162 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccbd493b..991de5f2a09c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -41,12 +41,14 @@ unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
-extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void __init sld_setup(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+extern void handle_bus_lock(struct pt_regs *regs);
+extern int bld_ratelimit;
 #else
-static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void __init sld_setup(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
@@ -57,6 +59,10 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
return false;
 }
+
+static inline void handle_bus_lock(struct pt_regs *regs)
+{
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..558485965f21 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -265,6 +265,7 @@
 #define DEBUGCTLMSR_LBR			(1UL <<  0) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT		1
 #define DEBUGCTLMSR_BTF			(1UL <<  1) /* single-step on branches */
+#define DEBUGCTLMSR_BUS_LOCK_DETECT	(1UL <<  2) /* Bus lock detect */
 #define DEBUGCTLMSR_TR