On Sun, 2015-22-03 at 04:42:59 UTC, "Shreyas B. Prabhu" wrote:
> Fastsleep is one of the idle state which cpuidle subsystem currently
> uses on power8 machines. In this state L2 cache is brought down to a
> threshold voltage. Therefore when the core is in fastsleep, the
> communication between L2 and L3 needs to be fenced. But there is a bug
> in the current power8 chips surrounding this fencing.
> 
> OPAL provides a workaround which precludes the possibility of hitting
> this bug. But running with this workaround applied causes checkstop
> if any correctable error in L2 cache directory is detected. Hence OPAL
> also provides a way to undo the workaround.
> 
> In the existing implementation, workaround is applied by the last thread
> of the core entering fastsleep and undone by the first thread waking up.
> But this has a performance cost. These OPAL calls account for roughly
> 4000 cycles everytime the core has to enter or wakeup from fastsleep.
> 
> This patch introduces a sysfs attribute (fastsleep_workaround_state)
> to choose the behavior of this workaround.
> 
> By default, fastsleep_workaround_state = 0. In this case, workaround
> is applied/undone everytime the core enters/exits fastsleep.
> 
> fastsleep_workaround_state = 1. In this case the workaround is applied
> once on all the cores and never undone. This can be triggered by
> echo 1 > /sys/devices/system/cpu/fastsleep_workaround_state
> 
> For simplicity this attribute can be modified only once. Implying, once
> fastsleep_workaround_state is changed to 1, it cannot be reverted to
> the default state.

This sounds good, although the name is a bit vague.

Just calling it "state" doesn't make it clear what 0 and 1 mean.
I think better would be "fastsleep_workaround_active" ?

Though even that is a bit wrong, because 0 doesn't really mean it's not active,
it means it's not *permanently* active.

So another option would be to make it a string attribute, with the initial
state being eg. "dynamic" and then maybe "applied" for the applied state?

I won't say you have to do that, but think about it, seeing as I'm going to ask
for a v4 anyway (see comments below).

> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 9ee0a30..8bea8fc 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -180,6 +180,13 @@ struct opal_sg_list {
>  #define OPAL_PM_WINKLE_ENABLED       0x00040000
>  #define OPAL_PM_SLEEP_ENABLED_ER1    0x00080000
>  
> +/*
> + * OPAL_CONFIG_CPU_IDLE_STATE parameters
> + */
> +#define OPAL_CONFIG_IDLE_FASTSLEEP   1
> +#define OPAL_CONFIG_IDLE_UNDO                0
> +#define OPAL_CONFIG_IDLE_APPLY               1

The OPAL defines have moved to opal-api.h in Linux.

They should also be made #defines in skiboot.

> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 77992f6..79157b9 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -13,6 +13,8 @@
>  #include <linux/mm.h>
>  #include <linux/slab.h>
>  #include <linux/of.h>
> +#include <linux/device.h>
> +#include <linux/cpu.h>
>  
>  #include <asm/firmware.h>
>  #include <asm/opal.h>
> @@ -136,6 +138,77 @@ u32 pnv_get_supported_cpuidle_states(void)
>  }
>  EXPORT_SYMBOL_GPL(pnv_get_supported_cpuidle_states);
>  
> +static void pnv_fastsleep_workaround_apply(void *info)
> +{
> +     opal_config_cpu_idle_state(OPAL_CONFIG_IDLE_FASTSLEEP,
> +                                     OPAL_CONFIG_IDLE_APPLY);

This can fail, please check the return. You'll need to report the result via
*info.

> +}
> +
> +/*
> + * Used to store fastsleep workaround state
> + * 0 - Workaround applied/undone at fastsleep entry/exit path (Default)
> + * 1 - Workaround applied once, never undone.
> + */
> +static u8 fastsleep_workaround_state;
> +
> +static ssize_t show_fastsleep_workaround_state(struct device *dev,
> +             struct device_attribute *attr, char *buf)
> +{
> +     return sprintf(buf, "%u\n", fastsleep_workaround_state);
> +}
> +
> +static ssize_t store_fastsleep_workaround_state(struct device *dev,
> +             struct device_attribute *attr, const char *buf,
> +             size_t count)
> +{
> +     u32 val;
> +     cpumask_t primary_thread_mask;
> +
> +     /*
> +      * fastsleep_workaround_state is write-once parameter.
> +      * Once it has been set to 1, it cannot be undone.
> +      */
> +     if (fastsleep_workaround_state == 1)
> +             return -EINVAL;

Better behaviour here is to delay this check until after you've done the
kstrtou32(), and if they are asking to set it to 1 (again) then you just return
count (OK).

That way scripts can just "echo 1 > .." and if the workaround is already
applied then there is no error.

> +     if (kstrtou32(buf, 0, &val))
> +             return -EINVAL;

You use a u8 above, so why not a u8 here?

> +     if (val > 1)
> +             return -EINVAL;
> +
> +     fastsleep_workaround_state = 1;

You should delay setting this until below when you know it's succeeded.

> +     /*
> +      * fastsleep_workaround_state = 1 implies fastsleep workaround needs to
> +      * be left in 'applied' state on all the cores. Do this by-
> +      * 1. Patching out the call to 'undo' workaround in fastsleep exit path
> +      * 2. Sending ipi to all the cores which have atleast one online thread
> +      * 3. Patching out the call to 'apply' workaround in fastsleep entry
> +      * path
> +      * There is no need to send ipi to cores which have all threads
> +      * offlined, as last thread of the core entering fastsleep or deeper
> +      * state would have applied workaround.
> +      */
> +     patch_instruction(
> +             (unsigned int *)pnv_fastsleep_workaround_at_exit,
> +             PPC_INST_NOP);

This can fail.

> +     primary_thread_mask = cpu_online_cores_map();
> +     on_each_cpu_mask(&primary_thread_mask,
> +                             pnv_fastsleep_workaround_apply,
> +                             NULL, 1);
> +
> +     patch_instruction(
> +             (unsigned int *)pnv_fastsleep_workaround_at_entry,
> +             PPC_INST_NOP);

And so can this.

> +     return count;
> +}
> +
> +static DEVICE_ATTR(fastsleep_workaround_state, 0600,
> +                     show_fastsleep_workaround_state,
> +                     store_fastsleep_workaround_state);
> +
>  static int __init pnv_init_idle_states(void)
>  {
>       struct device_node *power_mgt;
> @@ -178,7 +251,15 @@ static int __init pnv_init_idle_states(void)
>               patch_instruction(
>                       (unsigned int *)pnv_fastsleep_workaround_at_exit,
>                       PPC_INST_NOP);
> -     }
> +     } else

I know it's coding style to not bracket a single statement else block, but I
disagree in cases like this. Because the comment is so big it's preferable to
bracket it IMHO.

> +     /*
> +      * OPAL_PM_SLEEP_ENABLED_ER1 is set. It indicates that workaround is
> +      * needed to use fastsleep. Provide sysfs control to choose how this
> +      * workaround has to be applied.
> +      */

And the comment should indentend to match the code.

> +             device_create_file(cpu_subsys.dev_root,
> +                                     &dev_attr_fastsleep_workaround_state);
> +
>       pnv_alloc_idle_core_states();
>       return 0;
>  }

cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to