Re: [PATCH v2 5/6] PCI / PM: Take SMART_SUSPEND driver flag into account

2017-10-31 Thread Bjorn Helgaas
On Sat, Oct 28, 2017 at 12:27:45AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Make the PCI bus type take DPM_FLAG_SMART_SUSPEND into account in its
> system-wide PM callbacks and make sure that all code that should not
> run in parallel with pci_pm_runtime_resume() is executed in the "late"
> phases of system suspend, freeze and poweroff transitions.
> 
> [Note that the pm_runtime_suspended() check in pci_dev_keep_suspended()
> is an optimization, because if is not passed, all of the subsequent
> checks may be skipped and some of them are much more overhead in
> general.]
> 
> Also use the observation that if the device is in runtime suspend
> at the beginning of the "late" phase of a system-wide suspend-like
> transition, its state cannot change going forward (runtime PM is
> disabled for it at that time) until the transition is over and the
> subsequent system-wide PM callbacks should be skipped for it (as
> they generally assume the device to not be suspended), so add checks
> for that in pci_pm_suspend_late/noirq(), pci_pm_freeze_late/noirq()
> and pci_pm_poweroff_late/noirq().
> 
> Moreover, if pci_pm_resume_noirq() or pci_pm_restore_noirq() is
> called during the subsequent system-wide resume transition and if
> the device was left in runtime suspend previously, its runtime PM
> status needs to be changed to "active" as it is going to be put
> into the full-power state, so add checks for that too to these
> functions.
> 
> In turn, if pci_pm_thaw_noirq() runs after the device has been
> left in runtime suspend, the subsequent "thaw" callbacks need
> to be skipped for it (as they may not work correctly with a
> suspended device), so set the power.direct_complete flag for the
> device then to make the PM core skip those callbacks.
> 
> In addition to the above add a core helper for checking if
> DPM_FLAG_SMART_SUSPEND is set and the device runtime PM status is
> "suspended" at the same time, which is done quite often in the new
> code (and will be done elsewhere going forward too).
> 
> Signed-off-by: Rafael J. Wysocki 
> Acked-by: Greg Kroah-Hartman 

Acked-by: Bjorn Helgaas 

> ---
> 
> -> v2: Implement the entire handling of DPM_FLAG_SMART_SUSPEND in
>    the PCI bus type (instead of doing that in the core).
> 
> ---
>  Documentation/power/pci.txt |   14 +
>  drivers/base/power/main.c   |    6 ++
>  drivers/pci/pci-driver.c    |  103
>  include/linux/pm.h          |    2
>  4 files changed, 108 insertions(+), 17 deletions(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -734,18 +734,25 @@ static int pci_pm_suspend(struct device
>  
>  	if (!pm) {
>  		pci_pm_default_suspend(pci_dev);
> -		goto Fixup;
> +		return 0;
>  	}
>  
>  	/*
> -	 * PCI devices suspended at run time need to be resumed at this point,
> -	 * because in general it is necessary to reconfigure them for system
> -	 * suspend.  Namely, if the device is supposed to wake up the system
> -	 * from the sleep state, we may need to reconfigure it for this purpose.
> -	 * In turn, if the device is not supposed to wake up the system from the
> -	 * sleep state, we'll have to prevent it from signaling wake-up.
> +	 * PCI devices suspended at run time may need to be resumed at this
> +	 * point, because in general it may be necessary to reconfigure them for
> +	 * system suspend.  Namely, if the device is expected to wake up the
> +	 * system from the sleep state, it may have to be reconfigured for this
> +	 * purpose, or if the device is not expected to wake up the system from
> +	 * the sleep state, it should be prevented from signaling wakeup events
> +	 * going forward.
> +	 *
> +	 * Also if the driver of the device does not indicate that its system
> +	 * suspend callbacks can cope with runtime-suspended devices, it is
> +	 * better to resume the device from runtime suspend here.
>  	 */
> -	pm_runtime_resume(dev);
> +	if (!dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) ||
> +	    !pci_dev_keep_suspended(pci_dev))
> +		pm_runtime_resume(dev);
>  
>  	pci_dev->state_saved = false;
>  	if (pm->suspend) {
> @@ -765,17 +772,27 @@ static int pci_pm_suspend(struct device
>  		}
>  	}
>  
> - Fixup:
> -	pci_fixup_device(pci_fixup_suspend, pci_dev);
> -
>  	return 0;
>  }
>  
> +static int pci_pm_suspend_late(struct device *dev)
> +{
> +	if (dev_pm_smart_suspend_and_suspended(dev))
> +		return 0;
> +
> +	pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
> +
> +	return pm_generic_suspend_late(dev);
> +}
> +
> 
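
The two checks in the quoted hunk can be modelled in plain userspace C
to make the decision table explicit. This is only a sketch: struct
model_dev, the MODEL_ flag value and the keep_suspended_ok field are
invented for the example and stand in for struct device, the real
DPM_FLAG_SMART_SUSPEND value and pci_dev_keep_suspended(); none of them
is the kernel's definition.

```c
#include <stdbool.h>

/* Illustrative stand-in; not the kernel's flag value. */
#define MODEL_FLAG_SMART_SUSPEND (1u << 0)

/* Illustrative stand-in for the bits of struct device used here. */
struct model_dev {
	unsigned int driver_flags;	/* flags set by the driver */
	bool runtime_suspended;		/* runtime PM status is "suspended" */
	bool keep_suspended_ok;		/* stand-in for pci_dev_keep_suspended() */
};

/* Models the new core helper: flag set AND runtime PM status "suspended". */
bool model_smart_suspend_and_suspended(const struct model_dev *d)
{
	return (d->driver_flags & MODEL_FLAG_SMART_SUSPEND) &&
	       d->runtime_suspended;
}

/* Models the check in pci_pm_suspend(): resume unless both conditions hold. */
bool model_must_runtime_resume(const struct model_dev *d)
{
	return !(d->driver_flags & MODEL_FLAG_SMART_SUSPEND) ||
	       !d->keep_suspended_ok;
}

/* Models the "late" phase: the callback body is skipped when the helper is true. */
bool model_runs_late_callback(const struct model_dev *d)
{
	return !model_smart_suspend_and_suspended(d);
}
```

With the flag clear, the device is always resumed in the suspend phase
and the late callback always runs, which matches the behavior before
the patch.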

Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-31 Thread David Rientjes
On Tue, 31 Oct 2017, Michal Hocko wrote:

> > I'm not ignoring them, I have stated that we need the ability to protect 
> > important cgroups on the system without oom disabling all attached 
> > processes.  If that is implemented as a memory.oom_score_adj with the same 
> > semantics as /proc/pid/oom_score_adj, i.e. a proportion of available 
> > memory (the limit), it can also address the issues pointed out with the 
> > hierarchical approach in v8.
> 
> No it cannot and it would be a terrible interface to have as well. You
> do not want to permanently tune oom_score_adj to compensate for
> structural restrictions on the hierarchy.
> 

memory.oom_score_adj would never need to be permanently tuned, just as 
/proc/pid/oom_score_adj need never be permanently tuned.  My response was 
an answer to Roman's concern that "v8 has it's own limitations," but I 
haven't seen a concrete example where the oom killer is forced to kill 
from the non-preferred cgroup while the user has the power of biasing against 
certain cgroups with memory.oom_score_adj.  Do you have such a concrete 
example that we can work with?
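
To make the "proportion of available memory" semantics concrete, here
is a minimal sketch of how such a bias could be applied to a cgroup's
usage. It borrows the arithmetic used for /proc/pid/oom_score_adj (each
unit of adj is worth 1/1000 of the limit, and the minimum value disables
selection). memory.oom_score_adj is only a proposal in this thread, so
the function name, the MODEL_ constant and the choice to apply the bias
to a cgroup's page count are assumptions made for illustration.

```c
/* Assumed lower bound, mirroring the per-process interface. */
#define MODEL_ADJ_MIN (-1000)

/*
 * Hypothetical scoring: bias a cgroup's usage (in pages) by adj, where
 * each unit of adj is worth 1/1000 of limit_pages.
 */
long model_biased_score(long usage_pages, long limit_pages, int adj)
{
	long points;

	if (adj == MODEL_ADJ_MIN)
		return 0;	/* never selected */

	points = usage_pages + (long)adj * (limit_pages / 1000);

	/* An eligible cgroup keeps a minimal positive score. */
	return points > 0 ? points : 1;
}
```

Under these assumptions, a cgroup using 500 pages against a limit of
10000 pages scores 500 with adj 0 and 1500 with adj 100.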

> I believe, and Roman has pointed that out as well already, that further
> improvements can be implemented without changing user visible behavior
> as and add-on. If you disagree then you better come with a solid proof
> that all of us wrong and reasonable semantic cannot be achieved that
> way.

We simply cannot determine if improvements can be implemented in the 
future without user-visible changes if those improvements are unknown or 
undecided at this time.  It may require hierarchical accounting when 
making a choice between siblings, as suggested with oom_score_adj.  The 
only thing that we need to agree on is that userspace needs to have some 
kind of influence over victim selection: the oom killer killing an 
important user process is an extremely sensitive thing.  If the patchset 
lacks the ability to have that influence, and such an ability would impact 
the heuristic overall, it's better to introduce that together as a 
complete patchset rather than merging an incomplete feature when it's 
known the user needs some control, asking the user to workaround it by 
setting all processes to oom disabled in a preferred mem cgroup, and then 
changing the heuristic again.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Michal Hocko
On Tue 31-10-17 20:06:44, Michal Hocko wrote:
> On Tue 31-10-17 16:29:23, Michal Hocko wrote:
> > On Tue 31-10-17 08:04:19, Shakeel Butt wrote:
> > > > +
> > > > > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > > > +{
> > > > +   struct mem_cgroup *iter;
> > > > +
> > > > +   oc->chosen_memcg = NULL;
> > > > +   oc->chosen_points = 0;
> > > > +
> > > > +   /*
> > > > > +* The oom_score is calculated for leaf memory cgroups (including
> > > > +* the root memcg).
> > > > +*/
> > > > +   rcu_read_lock();
> > > > +   for_each_mem_cgroup_tree(iter, root) {
> > > > +   long score;
> > > > +
> > > > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > > > +   continue;
> > > > +
> > > 
> > > Cgroup v2 does not support charge migration between memcgs. So, there
> > > can be intermediate nodes which may contain the major charge of the
> > > processes in their leave descendents. Skipping such intermediate nodes
> > > will kind of protect such processes from oom-killer (lower on the list
> > > to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> > > we document it?
> > 
> > Yes, this is a real problem and the one which is not really solvable
> > without the charge migration. You simply have no clue _who_ owns the
> > memory so I assume that admins will need to setup the hierarchy which
> > allows subgroups to migrate tasks to be oom_group.
> 
> Hmm, scratch that. I have completely missed that the memory controller
> disables tasks migration completely in v2. I thought the standard
> restriction about the write access to the target cgroup and a common
> ancestor holds for all controllers but now I've noticed that we
> simply disallow the migration altogether. This wasn't the case before
> 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from
> subtree_control enabling") which I wasn't aware of.

Blee brain fart, I have misread the code. We return 0 which is a success
so can_attach doesn't fail and so the tasks migration should be allowed
under standard cgroup restrictions, we just do not migrate charges.

Anyway, time to stop writing emails for me today. Sorry about the
confusion.
-- 
Michal Hocko
SUSE Labs


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Michal Hocko
On Tue 31-10-17 16:29:23, Michal Hocko wrote:
> On Tue 31-10-17 08:04:19, Shakeel Butt wrote:
> > > +
> > > > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > > +{
> > > +   struct mem_cgroup *iter;
> > > +
> > > +   oc->chosen_memcg = NULL;
> > > +   oc->chosen_points = 0;
> > > +
> > > +   /*
> > > +* The oom_score is calculated for leaf memory cgroups (including
> > > +* the root memcg).
> > > +*/
> > > +   rcu_read_lock();
> > > +   for_each_mem_cgroup_tree(iter, root) {
> > > +   long score;
> > > +
> > > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > > +   continue;
> > > +
> > 
> > Cgroup v2 does not support charge migration between memcgs. So, there
> > can be intermediate nodes which may contain the major charge of the
> > processes in their leave descendents. Skipping such intermediate nodes
> > will kind of protect such processes from oom-killer (lower on the list
> > to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> > we document it?
> 
> Yes, this is a real problem and the one which is not really solvable
> without the charge migration. You simply have no clue _who_ owns the
> memory so I assume that admins will need to setup the hierarchy which
> allows subgroups to migrate tasks to be oom_group.

Hmm, scratch that. I have completely missed that the memory controller
disables tasks migration completely in v2. I thought the standard
restriction about the write access to the target cgroup and a common
ancestor holds for all controllers but now I've noticed that we
simply disallow the migration altogether. This wasn't the case before
1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from
subtree_control enabling") which I wasn't aware of.
-- 
Michal Hocko
SUSE Labs


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Johannes Weiner
On Tue, Oct 31, 2017 at 10:50:43AM -0700, Shakeel Butt wrote:
> On Tue, Oct 31, 2017 at 9:40 AM, Johannes Weiner  wrote:
> > On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote:
> >> > +
> >> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> >> > +{
> >> > +   struct mem_cgroup *iter;
> >> > +
> >> > +   oc->chosen_memcg = NULL;
> >> > +   oc->chosen_points = 0;
> >> > +
> >> > +   /*
> >> > +* The oom_score is calculated for leaf memory cgroups (including
> >> > +* the root memcg).
> >> > +*/
> >> > +   rcu_read_lock();
> >> > +   for_each_mem_cgroup_tree(iter, root) {
> >> > +   long score;
> >> > +
> >> > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> >> > +   continue;
> >> > +
> >>
> >> Cgroup v2 does not support charge migration between memcgs. So, there
> >> can be intermediate nodes which may contain the major charge of the
> >> processes in their leave descendents. Skipping such intermediate nodes
> >> will kind of protect such processes from oom-killer (lower on the list
> >> to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> >> we document it?
> >
> > Tasks cannot be in intermediate nodes, so the only way you can end up
> > in a situation like this is to start tasks fully, let them fault in
> > their full workingset, then create child groups and move them there.
> >
> > That has attribution problems much wider than the OOM killer: any
> > local limits you would set on a leaf cgroup like this ALSO won't
> > control the memory of its tasks - as it's all sitting in the parent.
> >
> > We created the "no internal competition" rule exactly to prevent this
> > situation.
> 
> Rather than the "no internal competition" restriction I think "charge
> migration" would have resolved that situation? Also "no internal
> competition" restriction (I am assuming 'no internal competition' is
> no tasks in internal nodes, please correct me if I am wrong) has made
> "charge migration" hard to implement and thus not added in cgroup v2.
> 
> I know this is parallel discussion and excuse my ignorance, what are
> other reasons behind "no internal competition" specifically for memory
> controller?

Sorry, but this is completely off-topic.

The rationale for this decision is in Documentation/cgroup-v2.txt.


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Shakeel Butt
On Tue, Oct 31, 2017 at 9:40 AM, Johannes Weiner  wrote:
> On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote:
>> > +
>> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
>> > +{
>> > +   struct mem_cgroup *iter;
>> > +
>> > +   oc->chosen_memcg = NULL;
>> > +   oc->chosen_points = 0;
>> > +
>> > +   /*
>> > +* The oom_score is calculated for leaf memory cgroups (including
>> > +* the root memcg).
>> > +*/
>> > +   rcu_read_lock();
>> > +   for_each_mem_cgroup_tree(iter, root) {
>> > +   long score;
>> > +
>> > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
>> > +   continue;
>> > +
>>
>> Cgroup v2 does not support charge migration between memcgs. So, there
>> can be intermediate nodes which may contain the major charge of the
>> processes in their leave descendents. Skipping such intermediate nodes
>> will kind of protect such processes from oom-killer (lower on the list
>> to be killed). Is it ok to not handle such scenario? If yes, shouldn't
>> we document it?
>
> Tasks cannot be in intermediate nodes, so the only way you can end up
> in a situation like this is to start tasks fully, let them fault in
> their full workingset, then create child groups and move them there.
>
> That has attribution problems much wider than the OOM killer: any
> local limits you would set on a leaf cgroup like this ALSO won't
> control the memory of its tasks - as it's all sitting in the parent.
>
> We created the "no internal competition" rule exactly to prevent this
> situation.

Rather than the "no internal competition" restriction I think "charge
migration" would have resolved that situation? Also "no internal
competition" restriction (I am assuming 'no internal competition' is
no tasks in internal nodes, please correct me if I am wrong) has made
"charge migration" hard to implement and thus not added in cgroup v2.

I know this is parallel discussion and excuse my ignorance, what are
other reasons behind "no internal competition" specifically for memory
controller?

> To be consistent with that rule, we might want to disallow
> the creation of child groups once a cgroup has local memory charges.
>
> It's trivial to change the setup sequence to create the leaf cgroup
> first, then launch the workload from within.
>

Only if the cgroup hierarchy is centrally controlled and each task's whole
hierarchy is known in advance.

> Either way, this is nothing specific about the OOM killer.


Re: [PATCH 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set

2017-10-31 Thread Eduardo Valentin
Hello Radim,

On Tue, Oct 24, 2017 at 01:18:59PM +0200, Radim Krčmář wrote:
> 2017-10-23 17:44-0700, Eduardo Valentin:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> 
> Where have you detected the main source of overhead with pinned VCPUs?
> Makes me wonder if we couldn't improve general PV_UNHALT,

This is essentially for cases of non-overcommitted vCPUs in which we want
the instance vCPUs to run uninterrupted as much as possible. Here, by
disabling PV_UNHALT, we avoid the accounting needed to properly do the
PV_UNHALT hypercall, as the lock holder won't be preempted anyway in the
1:1 pinned case.
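
The selection described in the patch can be sketched as a small
decision function. This is a userspace model, not the patch itself: the
enum and function names are invented, and letting PV_DEDICATED take
precedence when both flags are set is an assumption, since the thread
does not spell out that combination.

```c
#include <stdbool.h>

/* Illustrative names for the three lock flavours discussed in the thread. */
enum model_lock_impl {
	MODEL_TEST_AND_SET,	/* fallback when no PV_UNHALT support */
	MODEL_PV_QSPINLOCK,	/* paravirtualized queued spinlock */
	MODEL_NATIVE_QSPINLOCK,	/* regular fair queued spinlock */
};

/* Pick a lock implementation from the two hypervisor feature flags. */
enum model_lock_impl model_pick_lock(bool pv_unhalt, bool pv_dedicated)
{
	if (pv_dedicated)
		return MODEL_NATIVE_QSPINLOCK;	/* vCPUs are pinned 1:1 */
	if (pv_unhalt)
		return MODEL_PV_QSPINLOCK;
	return MODEL_TEST_AND_SET;
}
```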

> 
> thanks.
> 
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> 
> Some flag makes sense and we do want to make sure that userspaces don't
> enable it in pass-through-cpuid mode.

Did you mean something like:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..8ceb503 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -211,7 +211,8 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 	}
 	for (i = 0; i < cpuid->nent; i++) {
 		vcpu->arch.cpuid_entries[i].function = cpuid_entries[i].function;
-		vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
+		vcpu->arch.cpuid_entries[i].eax = cpuid_entries[i].eax &
+			~KVM_FEATURE_PV_DEDICATED;
 		vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
 		vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
 		vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;


But I do not see any other KVM_FEATURE_* being enforced (e.g. PV_UNHALT).
Do you mind elaborating a bit here?
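
One detail worth keeping in mind for any masking of this kind: the
existing KVM_FEATURE_* constants are bit indices, not masks, and they
only have meaning in EAX of the KVM_CPUID_FEATURES leaf (0x40000001).
A sketch of clearing one such bit in a copied entry array could look
like the following. The struct, the function name and the bit index 8
are assumptions for the example (the index the patch assigns to
PV_DEDICATED is not shown in this thread), and this is not necessarily
what Radim had in mind.

```c
#include <stdint.h>

#define MODEL_KVM_CPUID_FEATURES	0x40000001u
#define MODEL_FEATURE_PV_DEDICATED	8	/* hypothetical bit index */

/* Illustrative stand-in for one CPUID entry. */
struct model_cpuid_entry {
	uint32_t function;
	uint32_t eax, ebx, ecx, edx;
};

/* Clear the feature bit, but only in the KVM feature leaf. */
void model_clear_pv_dedicated(struct model_cpuid_entry *e, int nent)
{
	int i;

	for (i = 0; i < nent; i++) {
		if (e[i].function != MODEL_KVM_CPUID_FEATURES)
			continue;	/* other leaves are left untouched */
		e[i].eax &= ~(UINT32_C(1) << MODEL_FEATURE_PV_DEDICATED);
	}
}
```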

> 

-- 
All the best,
Eduardo Valentin


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Johannes Weiner
On Tue, Oct 31, 2017 at 08:04:19AM -0700, Shakeel Butt wrote:
> > +
> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > +{
> > +   struct mem_cgroup *iter;
> > +
> > +   oc->chosen_memcg = NULL;
> > +   oc->chosen_points = 0;
> > +
> > +   /*
> > +* The oom_score is calculated for leaf memory cgroups (including
> > +* the root memcg).
> > +*/
> > +   rcu_read_lock();
> > +   for_each_mem_cgroup_tree(iter, root) {
> > +   long score;
> > +
> > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > +   continue;
> > +
> 
> Cgroup v2 does not support charge migration between memcgs. So, there
> can be intermediate nodes which may contain the major charge of the
> processes in their leave descendents. Skipping such intermediate nodes
> will kind of protect such processes from oom-killer (lower on the list
> to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> we document it?

Tasks cannot be in intermediate nodes, so the only way you can end up
in a situation like this is to start tasks fully, let them fault in
their full workingset, then create child groups and move them there.

That has attribution problems much wider than the OOM killer: any
local limits you would set on a leaf cgroup like this ALSO won't
control the memory of its tasks - as it's all sitting in the parent.

We created the "no internal competition" rule exactly to prevent this
situation. To be consistent with that rule, we might want to disallow
the creation of child groups once a cgroup has local memory charges.

It's trivial to change the setup sequence to create the leaf cgroup
first, then launch the workload from within.

Either way, this is nothing specific about the OOM killer.


Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND

2017-10-31 Thread Rafael J. Wysocki
On Tue, Oct 31, 2017 at 4:09 PM, Lee Jones  wrote:
> On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:
>
>> From: Rafael J. Wysocki 
>>
>> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
>> devices which will allow them to stay in runtime suspend during
>> system suspend unless they need to be reconfigured for some reason.
>>
>> Also make it avoid resuming its child devices if they have
>> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
>> suspend during system suspend.
>>
>> Signed-off-by: Rafael J. Wysocki 
>> ---
>>  drivers/mfd/intel-lpss.c |6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Is this patch independent?

It depends on the flag definition at least, but functionally it also
depends on the PCI support for the flag.

> For my own reference:
>   Acked-for-MFD-by: Lee Jones 

Thanks,
Rafael


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Michal Hocko
On Tue 31-10-17 08:04:19, Shakeel Butt wrote:
> > +
> > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> > +{
> > +   struct mem_cgroup *iter;
> > +
> > +   oc->chosen_memcg = NULL;
> > +   oc->chosen_points = 0;
> > +
> > +   /*
> > +* The oom_score is calculated for leaf memory cgroups (including
> > +* the root memcg).
> > +*/
> > +   rcu_read_lock();
> > +   for_each_mem_cgroup_tree(iter, root) {
> > +   long score;
> > +
> > +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> > +   continue;
> > +
> 
> Cgroup v2 does not support charge migration between memcgs. So, there
> can be intermediate nodes which may contain the major charge of the
> processes in their leave descendents. Skipping such intermediate nodes
> will kind of protect such processes from oom-killer (lower on the list
> to be killed). Is it ok to not handle such scenario? If yes, shouldn't
> we document it?

Yes, this is a real problem and the one which is not really solvable
without the charge migration. You simply have no clue _who_ owns the
memory so I assume that admins will need to setup the hierarchy which
allows subgroups to migrate tasks to be oom_group.

Or we might want to allow opt-in for charge migration in v2. To be
honest I wasn't completely happy about removing this functionality
altogether in v2 but there was a strong pushback back then that relying
on the charge migration doesn't have any sound usecase.

Anyway, I agree that documentation should be explicit about that.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 09/12] PM / mfd: intel-lpss: Use DPM_FLAG_SMART_SUSPEND

2017-10-31 Thread Lee Jones
On Mon, 16 Oct 2017, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki 
> 
> Make the intel-lpss driver set DPM_FLAG_SMART_SUSPEND for its
> devices which will allow them to stay in runtime suspend during
> system suspend unless they need to be reconfigured for some reason.
> 
> Also make it avoid resuming its child devices if they have
> DPM_FLAG_SMART_SUSPEND set to allow them to remain in runtime
> suspend during system suspend.
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/mfd/intel-lpss.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Is this patch independent?

For my own reference:
  Acked-for-MFD-by: Lee Jones 

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs


Re: [RESEND v12 3/6] mm, oom: cgroup-aware OOM killer

2017-10-31 Thread Shakeel Butt
> +
> +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc)
> +{
> +   struct mem_cgroup *iter;
> +
> +   oc->chosen_memcg = NULL;
> +   oc->chosen_points = 0;
> +
> +   /*
> +* The oom_score is calculated for leaf memory cgroups (including
> +* the root memcg).
> +*/
> +   rcu_read_lock();
> +   for_each_mem_cgroup_tree(iter, root) {
> +   long score;
> +
> +   if (memcg_has_children(iter) && iter != root_mem_cgroup)
> +   continue;
> +

Cgroup v2 does not support charge migration between memcgs. So, there
can be intermediate nodes which may contain the major charge of the
processes in their leave descendents. Skipping such intermediate nodes
will kind of protect such processes from oom-killer (lower on the list
to be killed). Is it ok to not handle such scenario? If yes, shouldn't
we document it?

> +   score = oom_evaluate_memcg(iter, oc->nodemask, oc->totalpages);
> +
> +   /*
> +* Ignore empty and non-eligible memory cgroups.
> +*/
> +   if (score == 0)
> +   continue;
> +
> +   /*
> +* If there are inflight OOM victims, we don't need
> +* to look further for new victims.
> +*/
> +   if (score == -1) {
> +   oc->chosen_memcg = INFLIGHT_VICTIM;
> +   mem_cgroup_iter_break(root, iter);
> +   break;
> +   }
> +
> +   if (score > oc->chosen_points) {
> +   oc->chosen_points = score;
> +   oc->chosen_memcg = iter;
> +   }
> +   }
> +
> +   if (oc->chosen_memcg && oc->chosen_memcg != INFLIGHT_VICTIM)
> +   css_get(&oc->chosen_memcg->css);
> +
> +   rcu_read_unlock();
> +}
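
The concern about intermediate nodes can be reproduced with a small
userspace model of leaf-only selection. It is a simplification made for
illustration: struct model_memcg and model_pick_leaf() are invented
names, the score is just the local charge, and the special handling of
the root memcg and of in-flight victims in the quoted function is left
out.

```c
#include <stddef.h>

/* Illustrative stand-in for a memory cgroup in a hierarchy. */
struct model_memcg {
	const char *name;
	long local_charge;			/* pages charged to this node itself */
	const struct model_memcg **children;
	size_t nr_children;
};

/*
 * Return the leaf with the largest local charge below node, or NULL if
 * every leaf is empty. Charges sitting in intermediate nodes are never
 * looked at, which is the behavior being questioned above.
 */
const struct model_memcg *model_pick_leaf(const struct model_memcg *node)
{
	const struct model_memcg *best = NULL;
	size_t i;

	if (node->nr_children == 0)
		return node->local_charge > 0 ? node : NULL;

	for (i = 0; i < node->nr_children; i++) {
		const struct model_memcg *cand = model_pick_leaf(node->children[i]);

		if (cand && (!best || cand->local_charge > best->local_charge))
			best = cand;
	}
	return best;
}
```

In this model, a parent holding 1000 pages locally with two leaf
children holding 10 and 20 pages loses the comparison to an unrelated
leaf holding 100 pages, even though the parent's subtree accounts for
far more memory.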


Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-31 Thread Michal Hocko
On Tue 31-10-17 15:17:11, peter enderborg wrote:
> On 10/27/2017 10:05 PM, Johannes Weiner wrote:
> > On Thu, Oct 26, 2017 at 02:03:41PM -0700, David Rientjes wrote:
> >> On Thu, 26 Oct 2017, Johannes Weiner wrote:
> >>
> >>>> The nack is for three reasons:
> >>>>
> >>>>  (1) unfair comparison of root mem cgroup usage to bias against that mem
> >>>>      cgroup from oom kill in system oom conditions,
> >>>>
> >>>>  (2) the ability of users to completely evade the oom killer by attaching
> >>>>      all processes to child cgroups either purposefully or unpurposefully,
> >>>>      and
> >>>>
> >>>>  (3) the inability of userspace to effectively control oom victim
> >>>>      selection.
> >>> My apologies if my summary was too reductionist.
> >>>
> >>> That being said, the arguments you repeat here have come up in
> >>> previous threads and been responded to. This doesn't change my
> >>> conclusion that your NAK is bogus.
> >> They actually haven't been responded to, Roman was working through v11 and 
> >> made a change on how the root mem cgroup usage was calculated that was 
> >> better than previous iterations but still not an apples to apples 
> >> comparison with other cgroups.  The problem is that it the calculation for 
> >> leaf cgroups includes additional memory classes, so it biases against 
> >> processes that are moved to non-root mem cgroups.  Simply creating mem 
> >> cgroups and attaching processes should not independently cause them to 
> >> become more preferred: it should be a fair comparison between the root mem 
> >> cgroup and the set of leaf mem cgroups as implemented.  That is very 
> >> trivial to do with hierarchical oom cgroup scoring.
> > There is absolutely no value in your repeating the same stuff over and
> > over again without considering what other people are telling you.
> >
> > Hierarchical oom scoring has other downsides, and most of us agree
> > that they aren't preferable over the differences in scoring the root
> > vs scoring other cgroups - in particular because the root cannot be
> > controlled, doesn't even have local statistics, and so is unlikely to
> > contain important work on a containerized system. Getting the ballpark
> > right for the vast majority of usecases is more than good enough here.
> >
> >> Since the ability of userspace to control oom victim selection is not 
> >> addressed whatsoever by this patchset, and the suggested method cannot be 
> >> implemented on top of this patchset as you have argued because it requires 
> >> a change to the heuristic itself, the patchset needs to become complete 
> >> before being mergeable.
> > It is complete. It just isn't a drop-in replacement for what you've
> > been doing out-of-tree for years. Stop making your problem everybody
> > else's problem.
> >
> > You can change the the heuristics later, as you have done before. Or
> > you can add another configuration flag and we can phase out the old
> > mode, like we do all the time.
> >
> I think this problem is related to the removal of the lowmemorykiller,
> where this is the life-line when the user-space for some reason fails.
> 
> So I guess quite a few will have this problem.

Could you be more specific please? We are _not_ removing the possibility
of user-space-influenced oom victim selection. You can still use the
_current_ oom selection heuristic. The patch adds a new selection method
which is opt-in, so only those who opt in will not be able to influence
the victim selection. And as has been pointed out, this can be
implemented later, so it is not like "this won't be possible anymore in
the future".
-- 
Michal Hocko
SUSE Labs


Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-31 Thread peter enderborg
On 10/27/2017 10:05 PM, Johannes Weiner wrote:
> On Thu, Oct 26, 2017 at 02:03:41PM -0700, David Rientjes wrote:
>> On Thu, 26 Oct 2017, Johannes Weiner wrote:
>>
>>>> The nack is for three reasons:
>>>>
>>>>  (1) unfair comparison of root mem cgroup usage to bias against that mem
>>>>      cgroup from oom kill in system oom conditions,
>>>>
>>>>  (2) the ability of users to completely evade the oom killer by attaching
>>>>      all processes to child cgroups either purposefully or unpurposefully,
>>>>      and
>>>>
>>>>  (3) the inability of userspace to effectively control oom victim
>>>>      selection.
>>> My apologies if my summary was too reductionist.
>>>
>>> That being said, the arguments you repeat here have come up in
>>> previous threads and been responded to. This doesn't change my
>>> conclusion that your NAK is bogus.
>> They actually haven't been responded to, Roman was working through v11 and 
>> made a change on how the root mem cgroup usage was calculated that was 
>> better than previous iterations but still not an apples to apples 
>> comparison with other cgroups.  The problem is that the calculation for 
>> leaf cgroups includes additional memory classes, so it biases against 
>> processes that are moved to non-root mem cgroups.  Simply creating mem 
>> cgroups and attaching processes should not independently cause them to 
>> become more preferred: it should be a fair comparison between the root mem 
>> cgroup and the set of leaf mem cgroups as implemented.  That is very 
>> trivial to do with hierarchical oom cgroup scoring.
> There is absolutely no value in your repeating the same stuff over and
> over again without considering what other people are telling you.
>
> Hierarchical oom scoring has other downsides, and most of us agree
> that they aren't preferable over the differences in scoring the root
> vs scoring other cgroups - in particular because the root cannot be
> controlled, doesn't even have local statistics, and so is unlikely to
> contain important work on a containerized system. Getting the ballpark
> right for the vast majority of usecases is more than good enough here.
>
>> Since the ability of userspace to control oom victim selection is not 
>> addressed whatsoever by this patchset, and the suggested method cannot be 
>> implemented on top of this patchset as you have argued because it requires 
>> a change to the heuristic itself, the patchset needs to become complete 
>> before being mergeable.
> It is complete. It just isn't a drop-in replacement for what you've
> been doing out-of-tree for years. Stop making your problem everybody
> else's problem.
>
> You can change the heuristics later, as you have done before. Or
> you can add another configuration flag and we can phase out the old
> mode, like we do all the time.
>
I think this problem is related to the removal of the lowmemorykiller,
where this is the life-line when the user-space for some reason fails.

So I guess quite a few will have this problem.





Re: [RFC PATCH 3/5] gpio: gpiolib: Add chardev support for maintaining GPIO values on reset

2017-10-31 Thread Linus Walleij
On Thu, Oct 26, 2017 at 2:05 AM, Andrew Jeffery  wrote:

> I feel that taking this argument to its logical conclusion leads to
> never exporting any GPIOs to userspace and doing everything in the
> kernel.

That is very much how I feel about things anyways.
In a recent presentation:
https://dflund.se/~triad/papers/GPIO-for-Engineers-and-Makers.pdf

I had the following text:

The Rules of Linux Userspace GPIO
1. You do not access GPIOs from userspace
2. YOU DO NOT ACCESS GPIOS FROM USERSPACE
3. Read Documentation/gpio/drivers-on-gpio.txt
4. Use the character device

> If userspace has exported the GPIO and is managing its state,
> then it can *already* cause very weird hardware behaviour if set wrong.
> The fact that userspace is controlling the GPIO state and not the
> kernel already says that the kernel doesn't know how to manage it, so
> why not expose the option for userspace to set the persistence, given
> that it should know what it's doing?

People do need to access GPIOs from userspace for things
like one-off makerspace projects, relays on factory lines,
fire alarms, door openers etc etc.

One-off projects are fine; the user likely has a comprehensive idea
of the whole system. They use a random Raspberry Pi (etc.)
development board for this. OK.

When we are talking about adding GPIO in mass-market goods
such as phones and tablets and laptops, userspace GPIO
access becomes more and more dubious.

Yours,
Linus Walleij


Re: [PATCH 0/5] dmaengine: ReSTize documentation

2017-10-31 Thread Vinod Koul
On Tue, Oct 31, 2017 at 02:16:14AM -0600, Jonathan Corbet wrote:
> On Wed, 25 Oct 2017 12:02:51 +0530
> Vinod Koul  wrote:
> 
> > So here is the conversion of the dmaengine documents from txt files to rst
> > format. Not much functional change but somehow git detects only two renames
> > (possibly due to added indent to make new format happier)
> 
> Overall this looks pretty good, thanks.  I could pick out various nits
> (the use of single-space indents seems like an invitation for future
> editing errors, for example) but that's all second-order stuff.

Thanks for pointing that out, I will check. What is the preferred indent here?

> The bigger one is that I would really prefer to see this as part of the
> driver-api manual - that's what it's there for.  I would really rather
> that the top-level index file not replicate the unordered mess that is
> Documentation/ now.  Could we make that move as part of this change?

Sure, looking at it I agree that makes sense. Quick question though: should
we move the location of these files or just add references?

-- 
~Vinod


Re: [PATCH 0/5] dmaengine: ReSTize documentation

2017-10-31 Thread Jonathan Corbet
On Wed, 25 Oct 2017 12:02:51 +0530
Vinod Koul  wrote:

> So here is the conversion of the dmaengine documents from txt files to rst
> format. Not much functional change but somehow git detects only two renames
> (possibly due to added indent to make new format happier)

Overall this looks pretty good, thanks.  I could pick out various nits
(the use of single-space indents seems like an invitation for future
editing errors, for example) but that's all second-order stuff.

The bigger one is that I would really prefer to see this as part of the
driver-api manual - that's what it's there for.  I would really rather
that the top-level index file not replicate the unordered mess that is
Documentation/ now.  Could we make that move as part of this change?

Thanks,

jon


Re: [PATCH] Check all .c files for bad kernel-doc comments

2017-10-31 Thread Jani Nikula
On Mon, 30 Oct 2017, Matthew Wilcox  wrote:
> On Mon, Oct 30, 2017 at 05:19:18PM +0200, Jani Nikula wrote:
>> Related, there was also a script to do reStructuredText lint style
>> checks in addition to the kernel-doc checks using make CHECK and
>> C=1. See http://mid.mail-archive.com/87h98quc1w.fsf@intel.com
>
> I don't really care which patch goes in.  If I understand your python
> script correctly, it relies on having various python packages installed.
> Unless we're going to switch kernel-doc over to being written in python,
> I'd prefer to not require additional dependencies.

I think your patch has a much better chance of getting enabled by
default in the long run, so I'd prefer that. I've also kind of dropped
the ball on my script... but thought it might be interesting.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


Re: [PATCH] bug-hunting.rst: Fix an example and a typo in a Sphinx tag

2017-10-31 Thread Jonathan Corbet
On Fri, 20 Oct 2017 21:49:14 +0200
Christophe JAILLET  wrote:

> - Use the same file name in the explanation and in the example (conex.c vs
> sonixj.c)
> - Add a missing ':' in a :ref: tag which leads to incorrect Sphinx output
> - Add some missing ',' and ';'
> 
> Signed-off-by: Christophe JAILLET 

Applied to the docs tree, thanks.

jon


Re: [RESEND v12 0/6] cgroup-aware OOM killer

2017-10-31 Thread Michal Hocko
On Mon 30-10-17 14:36:39, David Rientjes wrote:
> On Fri, 27 Oct 2017, Roman Gushchin wrote:
> 
> > The thing is that the hierarchical approach (as in v8), which you are
> > pushing, has its own limitations, which we've discussed in detail
> > earlier. There are reasons why v12 is different, and we can't really
> > simply go back. I mean, if there are better ideas how to resolve the
> > concerns raised in discussions around v8, let me know, but ignoring them
> > is not an option.
> > 
> 
> I'm not ignoring them, I have stated that we need the ability to protect 
> important cgroups on the system without oom disabling all attached 
> processes.  If that is implemented as a memory.oom_score_adj with the same 
> semantics as /proc/pid/oom_score_adj, i.e. a proportion of available 
> memory (the limit), it can also address the issues pointed out with the 
> hierarchical approach in v8.

No it cannot and it would be a terrible interface to have as well. You
do not want to permanently tune oom_score_adj to compensate for
structural restrictions on the hierarchy.
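
For readers unfamiliar with the /proc/pid/oom_score_adj semantics referred
to above, here is a rough sketch of how the adjustment shifts a score by a
proportion of available memory. It is a simplified illustration in Python,
not kernel code: the function name and the page counts are hypothetical, and
the real badness calculation takes more into account.

```python
# Simplified illustration of the /proc/<pid>/oom_score_adj semantics.
# The function name and inputs are hypothetical; this is not kernel code.

OOM_SCORE_ADJ_MIN = -1000  # task is exempt from OOM killing
OOM_SCORE_ADJ_MAX = 1000


def adjusted_badness(usage_pages, total_pages, oom_score_adj):
    """Return a score where higher means more likely to be selected."""
    if not OOM_SCORE_ADJ_MIN <= oom_score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be within [-1000, 1000]")
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0  # never selected as a victim
    # Each unit of oom_score_adj is worth 1/1000 of the available memory.
    points = usage_pages + oom_score_adj * total_pages // 1000
    # An eligible task keeps at least a minimal positive score.
    return max(points, 1)


if __name__ == "__main__":
    total = 1000  # hypothetical amount of available memory, in pages
    for adj in (0, 200, -500, -1000):
        print(adj, adjusted_badness(300, total, adj))
```

The tuning concern follows from this shape: because the adjustment is a
fixed proportion of available memory, it has to be revisited whenever the
limit or the layout of the hierarchy changes.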

> If this is not the case, could you elaborate 
> on what your exact concern is and why we do not care that users can 
> completely circumvent victim selection by creating child cgroups for other 
> controllers?
> 
> Since the ability to protect important cgroups on the system may require a 
> heuristic change, I think it should be solved now rather than constantly 
> insisting that we can make this patchset complete later and in the 
> meantime force the user to set all attached processes to be oom disabled.

I believe, and Roman has pointed that out as well already, that further
improvements can be implemented without changing user visible behavior,
as an add-on. If you disagree then you had better come up with solid proof
that all of us are wrong and that reasonable semantics cannot be achieved
that way.
-- 
Michal Hocko
SUSE Labs