Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-26 Thread Fenghua Yu
On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:
> >
> > From: Fenghua Yu 
> 
> >
> > From patchwork Wed Jan 16 21:18:41 2019
> > Content-Type: text/plain; charset="utf-8"
> 
> [snipped more stuff like this]
> 
> What happened here?
> 
> > +/* Return value that will be used to set umwait control MSR */
> > +static inline u32 umwait_control_val(void)
> > +{
> > +   /*
> > +* Enable or disable C0.2 (bit 0) based on global setting on all 
> > CPUs.
> > +* When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
> > +* So value in bit 0 is opposite of umwait_enable_c0_2.
> > +*/
> > +   return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> > +}
> 
> This function is horribly named.  How about something like
> umwait_compute_msr_value() or something liek that?  Also, what
> happened to the maximum wait time?
> 
> > +
> > +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> > +  struct device_attribute *attr,
> > +  char *buf)
> > +{
> > +   return sprintf(buf, "%d\n", umwait_enable_c0_2);
> 
> I realize that it's traditional to totally ignore races in sysfs and
> such, but it's a bad tradition.  Please either READ_ONCE it with a
> comment or take the mutex.
> 
> > +static ssize_t umwait_enable_c0_2_store(struct device *dev,
> > +   struct device_attribute *attr,
> > +   const char *buf, size_t count)
> > +{
> > +   int enable_c0_2, cpu, ret;
> > +   u32 msr_val;
> > +
> > +   ret = kstrtou32(buf, 10, _c0_2);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (enable_c0_2 != 1 && enable_c0_2 != 0)
> > +   return -EINVAL;
> 
> How about if (enable_c0_2 > 1)?
> 
> > +
> > +   mutex_lock(_lock);
> > +
> > +   umwait_enable_c0_2 = enable_c0_2;
> > +   msr_val = umwait_control_val();
> > +   get_online_cpus();
> > +   /* All CPUs have same umwait control setting */
> > +   for_each_online_cpu(cpu)
> > +   wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> > +   put_online_cpus();
> > +
> > +   mutex_unlock(_lock);
> 
> Please factor this thing out into a helper like
> umwait_update_all_cpus().  That helper can assert that the lock is
> held.
> 
> > +/* Set up umwait control MSR on this CPU using the current global setting. 
> > */
> > +static int umwait_cpu_online(unsigned int cpu)
> > +{
> > +   u32 msr_val;
> > +
> > +   mutex_lock(_lock);
> > +
> > +   msr_val = umwait_control_val();
> > +   wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> > +
> > +   mutex_unlock(_lock);
> > +
> > +   return 0;
> > +}
> > +
> > +static int __init umwait_init(void)
> > +{
> > +   struct device *dev;
> > +   int ret;
> > +
> > +   if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> > +   return -ENODEV;
> > +
> > +   /* Add CPU global user wait interface to control umwait. */
> > +   dev = cpu_subsys.dev_root;
> > +   ret = sysfs_create_group(>kobj, _attr_group);
> > +   if (ret)
> > +   return ret;
> > +
> > +   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
> > +   umwait_cpu_online, NULL);
> 
> This hotplug notifier thing is awful.  Thomas, do we have a function
> that gets called every time a CPU is brought up (via BSP boot, AP
> boot, hotplug, hibernation resume, etc) where we can just put all
> these things?  cpu_init() is almost appropriate, except that it's
> called at somewhat erratic times (quite different for BSP and AP IIRC)
> and it's not called AFAICT during hibernation restore.  I suppose we
> could add a new thing that is called by cpu_init() and
> restore_processor_state().
> 
> Also, surely you should actually write the MSR in this function, too.

Seems the current patch set misses pm_notifier for hibernation on BSP.
All APs are all updated by the online funciton in the current patch set.
If adding hiberation pm_notifier to update MSR 0xe1 on BSP, is that good
enough?

Thanks.

-Fenghua


Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-26 Thread Fenghua Yu
On Sun, Feb 24, 2019 at 11:45:35AM -0800, Andy Lutomirski wrote:
> On Thu, Feb 21, 2019 at 2:57 PM Yu, Fenghua  wrote:
> >
> > > From: Fenghua Yu [mailto:fenghua...@intel.com]
> > > On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> > > Or to simplify the situation, how about we still use zero as global max 
> > > wait
> > > time (i.e. no limitation for global wait time) as implemented in this 
> > > patch
> > > set? User can still change the value to any value based on their usage. 
> > > Isn't
> > > this a right way to take care of any usage?
> >
> > Plus, if scheduler timers works, umwait will be forced time out by timer in 
> > HZ anyway.
> 
> Indeed.  But, on some configs, the scheduler timer intentionally will
> *not* fire.
> 
> >
> > Only for case without scheduler timer, sysadmin may need to set a small max 
> > umwait time to force timout. But in this case (real time), I doubt user app 
> > really wants to call umwait to save power with long latency of waking up 
> > from umwait. So likely in this case, user app won't call umwait and there 
> > is no usage to set a small value for max umwait time.
> 
> I don't really know why an application would use umwait at all.  About
> the only legitimate use I can think of is to treat it as a somewhat
> power-saving variant of REP NOP.  What I want to avoid is the case
> where it works dramatically differently on NO_HZ_FULL systems as
> compared to everything else.  Also, UMWAIT may behave a bit
> differently if the max timeout is hit, and I'd like that path to get
> exercised widely by making it happen even on default configs.
> 
> So I propose setting the timeout to either 100 microseconds or 100k
> "cycles" by default.  In the event someone determines that they save
> materially more power or gets materially better performance with a
> longer timeout, we can revisit the value.

OK. I will set the default umwait max time value to 100K cycles.

Thanks.

-Fenghua


Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-24 Thread Andy Lutomirski
On Thu, Feb 21, 2019 at 2:57 PM Yu, Fenghua  wrote:
>
> > From: Fenghua Yu [mailto:fenghua...@intel.com]
> > On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> > Or to simplify the situation, how about we still use zero as global max wait
> > time (i.e. no limitation for global wait time) as implemented in this patch
> > set? User can still change the value to any value based on their usage. 
> > Isn't
> > this a right way to take care of any usage?
>
> Plus, if scheduler timers works, umwait will be forced time out by timer in 
> HZ anyway.

Indeed.  But, on some configs, the scheduler timer intentionally will
*not* fire.

>
> Only for case without scheduler timer, sysadmin may need to set a small max 
> umwait time to force timout. But in this case (real time), I doubt user app 
> really wants to call umwait to save power with long latency of waking up from 
> umwait. So likely in this case, user app won't call umwait and there is no 
> usage to set a small value for max umwait time.

I don't really know why an application would use umwait at all.  About
the only legitimate use I can think of is to treat it as a somewhat
power-saving variant of REP NOP.  What I want to avoid is the case
where it works dramatically differently on NO_HZ_FULL systems as
compared to everything else.  Also, UMWAIT may behave a bit
differently if the max timeout is hit, and I'd like that path to get
exercised widely by making it happen even on default configs.

So I propose setting the timeout to either 100 microseconds or 100k
"cycles" by default.  In the event someone determines that they save
materially more power or gets materially better performance with a
longer timeout, we can revisit the value.

--Andy


Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-21 Thread Tao Xu

On 2/22/2019 3:24 AM, Yu, Fenghua wrote:

On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:

On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:


From: Fenghua Yu 




 From patchwork Wed Jan 16 21:18:41 2019
Content-Type: text/plain; charset="utf-8"


[snipped more stuff like this]

What happened here?


I don't know either:(

Tao never talked to me before he sent out this patch set. And the email
format is totally wrong and he didn't change any code in this patch set.

I have no idea why he sent out this patch set. As of now I haven't got
any response from him yet.

I believe he made a mistake here.

I am really sorry for sending the wrong email. Yesterday I make a 
mistake to cc Fenghua's former patch file when I sending other patch.


I'm sorry for this trouble and I will be more careful in my work.


RE: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-21 Thread Yu, Fenghua
> From: Fenghua Yu [mailto:fenghua...@intel.com]
> On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> Or to simplify the situation, how about we still use zero as global max wait
> time (i.e. no limitation for global wait time) as implemented in this patch
> set? User can still change the value to any value based on their usage. Isn't
> this a right way to take care of any usage?

Plus, if scheduler timers works, umwait will be forced time out by timer in HZ 
anyway.

Only for case without scheduler timer, sysadmin may need to set a small max 
umwait time to force timout. But in this case (real time), I doubt user app 
really wants to call umwait to save power with long latency of waking up from 
umwait. So likely in this case, user app won't call umwait and there is no 
usage to set a small value for max umwait time. 

Thanks.

-Fenghua


Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-21 Thread Fenghua Yu
On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:
> >
> > From: Fenghua Yu 
> 
> >
> > From patchwork Wed Jan 16 21:18:41 2019
> > Content-Type: text/plain; charset="utf-8"
> 
> [snipped more stuff like this]
> 
> What happened here?

I don't know either:(

Tao never talked to me before he sent out this patch set. And the email
format is totally wrong and he didn't change any code in this patch set.

I have no idea why he sent out this patch set. As of now I haven't got
any response from him yet.

I believe he made a mistake here.

> 
> > +/* Return value that will be used to set umwait control MSR */
> > +static inline u32 umwait_control_val(void)
> > +{
> > +   /*
> > +* Enable or disable C0.2 (bit 0) based on global setting on all 
> > CPUs.
> > +* When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
> > +* So value in bit 0 is opposite of umwait_enable_c0_2.
> > +*/
> > +   return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> > +}
> 
> This function is horribly named.  How about something like
> umwait_compute_msr_value() or something liek that?
OK. Will change the name.

> Also, what
> happened to the maximum wait time?
> 
> > +
> > +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> > +  struct device_attribute *attr,
> > +  char *buf)
> > +{
> > +   return sprintf(buf, "%d\n", umwait_enable_c0_2);
> 
> I realize that it's traditional to totally ignore races in sysfs and
> such, but it's a bad tradition.  Please either READ_ONCE it with a
> comment or take the mutex.

Will change to READ_ONCE.

> 
> > +static ssize_t umwait_enable_c0_2_store(struct device *dev,
> > +   struct device_attribute *attr,
> > +   const char *buf, size_t count)
> > +{
> > +   int enable_c0_2, cpu, ret;
> > +   u32 msr_val;
> > +
> > +   ret = kstrtou32(buf, 10, _c0_2);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (enable_c0_2 != 1 && enable_c0_2 != 0)
> > +   return -EINVAL;
> 
> How about if (enable_c0_2 > 1)?
Ok. Will change to this.

> 
> > +
> > +   mutex_lock(_lock);
> > +
> > +   umwait_enable_c0_2 = enable_c0_2;
> > +   msr_val = umwait_control_val();
> > +   get_online_cpus();
> > +   /* All CPUs have same umwait control setting */
> > +   for_each_online_cpu(cpu)
> > +   wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> > +   put_online_cpus();
> > +
> > +   mutex_unlock(_lock);
> 
> Please factor this thing out into a helper like
> umwait_update_all_cpus().  That helper can assert that the lock is
> held.
Ok.

> 
> > +/* Set up umwait control MSR on this CPU using the current global setting. 
> > */
> > +static int umwait_cpu_online(unsigned int cpu)
> > +{
> > +   u32 msr_val;
> > +
> > +   mutex_lock(_lock);
> > +
> > +   msr_val = umwait_control_val();
> > +   wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> > +
> > +   mutex_unlock(_lock);
> > +
> > +   return 0;
> > +}
> > +
> > +static int __init umwait_init(void)
> > +{
> > +   struct device *dev;
> > +   int ret;
> > +
> > +   if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> > +   return -ENODEV;
> > +
> > +   /* Add CPU global user wait interface to control umwait. */
> > +   dev = cpu_subsys.dev_root;
> > +   ret = sysfs_create_group(>kobj, _attr_group);
> > +   if (ret)
> > +   return ret;
> > +
> > +   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
> > +   umwait_cpu_online, NULL);
> 
> This hotplug notifier thing is awful.  Thomas, do we have a function
> that gets called every time a CPU is brought up (via BSP boot, AP
> boot, hotplug, hibernation resume, etc) where we can just put all
> these things?  cpu_init() is almost appropriate, except that it's
> called at somewhat erratic times (quite different for BSP and AP IIRC)
> and it's not called AFAICT during hibernation restore.  I suppose we
> could add a new thing that is called by cpu_init() and
> restore_processor_state().
> 
> Also, surely you should actually write the MSR in this function, too.
> 
> >
> >  static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */
> > +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */
> 
> I still think the default should be some reasonable nonzero value.  It
> should be long enough that we get decent C0.2 residency and short
> enough that UMWAIT never gives the impression that it is anything
> other than a fancy way to save a bit of power and SMT resources when
> spinning.  I don't want to see a situation where some library uses
> UMWAIT under the expectation that it genuinely waits for an event,
> appears to work well on bare 

Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-21 Thread Peter Zijlstra
On Wed, Feb 20, 2019 at 10:37:27PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:

> > +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> > +  struct device_attribute *attr,
> > +  char *buf)
> > +{
> > +   return sprintf(buf, "%d\n", umwait_enable_c0_2);
> 
> I realize that it's traditional to totally ignore races in sysfs and
> such, but it's a bad tradition.  Please either READ_ONCE it with a
> comment or take the mutex.

Note that when using READ_ONCE(), the other side must use WRITE_ONCE().
Mixed usage is not valid.


Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-20 Thread Andy Lutomirski
On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:
>
> From: Fenghua Yu 

>
> From patchwork Wed Jan 16 21:18:41 2019
> Content-Type: text/plain; charset="utf-8"

[snipped more stuff like this]

What happened here?

> +/* Return value that will be used to set umwait control MSR */
> +static inline u32 umwait_control_val(void)
> +{
> +   /*
> +* Enable or disable C0.2 (bit 0) based on global setting on all CPUs.
> +* When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
> +* So value in bit 0 is opposite of umwait_enable_c0_2.
> +*/
> +   return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> +}

This function is horribly named.  How about something like
umwait_compute_msr_value() or something liek that?  Also, what
happened to the maximum wait time?

> +
> +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> +  struct device_attribute *attr,
> +  char *buf)
> +{
> +   return sprintf(buf, "%d\n", umwait_enable_c0_2);

I realize that it's traditional to totally ignore races in sysfs and
such, but it's a bad tradition.  Please either READ_ONCE it with a
comment or take the mutex.

> +static ssize_t umwait_enable_c0_2_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> +   int enable_c0_2, cpu, ret;
> +   u32 msr_val;
> +
> +   ret = kstrtou32(buf, 10, _c0_2);
> +   if (ret)
> +   return ret;
> +
> +   if (enable_c0_2 != 1 && enable_c0_2 != 0)
> +   return -EINVAL;

How about if (enable_c0_2 > 1)?

> +
> +   mutex_lock(_lock);
> +
> +   umwait_enable_c0_2 = enable_c0_2;
> +   msr_val = umwait_control_val();
> +   get_online_cpus();
> +   /* All CPUs have same umwait control setting */
> +   for_each_online_cpu(cpu)
> +   wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> +   put_online_cpus();
> +
> +   mutex_unlock(_lock);

Please factor this thing out into a helper like
umwait_update_all_cpus().  That helper can assert that the lock is
held.

> +/* Set up umwait control MSR on this CPU using the current global setting. */
> +static int umwait_cpu_online(unsigned int cpu)
> +{
> +   u32 msr_val;
> +
> +   mutex_lock(_lock);
> +
> +   msr_val = umwait_control_val();
> +   wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> +
> +   mutex_unlock(_lock);
> +
> +   return 0;
> +}
> +
> +static int __init umwait_init(void)
> +{
> +   struct device *dev;
> +   int ret;
> +
> +   if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> +   return -ENODEV;
> +
> +   /* Add CPU global user wait interface to control umwait. */
> +   dev = cpu_subsys.dev_root;
> +   ret = sysfs_create_group(>kobj, _attr_group);
> +   if (ret)
> +   return ret;
> +
> +   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
> +   umwait_cpu_online, NULL);

This hotplug notifier thing is awful.  Thomas, do we have a function
that gets called every time a CPU is brought up (via BSP boot, AP
boot, hotplug, hibernation resume, etc) where we can just put all
these things?  cpu_init() is almost appropriate, except that it's
called at somewhat erratic times (quite different for BSP and AP IIRC)
and it's not called AFAICT during hibernation restore.  I suppose we
could add a new thing that is called by cpu_init() and
restore_processor_state().

Also, surely you should actually write the MSR in this function, too.

>
>  static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */
> +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */

I still think the default should be some reasonable nonzero value.  It
should be long enough that we get decent C0.2 residency and short
enough that UMWAIT never gives the impression that it is anything
other than a fancy way to save a bit of power and SMT resources when
spinning.  I don't want to see a situation where some library uses
UMWAIT under the expectation that it genuinely waits for an event,
appears to work well on bare metal on an otherwise idle system, and
falls apart when it's run in a VM guest or with other software
running.  IOW, programs more or less must be written to expect many
spurious wakeups, so I think we should pick a default value so that
there are essentially always many spurious wakeups.

As a guess, I think that the default wait time should be well under 1
ms but at least 20x the C0.2 entry+exit latency.

--Andy
>  static ssize_t umwait_enable_c0_2_show(struct device *dev,
> @@ -61,8 +63,46 @@ static ssize_t umwait_enable_c0_2_store(struct device *dev,
>
>  static DEVICE_ATTR_RW(umwait_enable_c0_2);
>
> +static ssize_t umwait_max_time_show(struct device *kobj,
> +   struct