Re: [PATCH 3/6] cpufreq: governor: Drop min_sampling_rate

2017-06-29 Thread Viresh Kumar
On 30-06-17, 06:53, Dominik Brodowski wrote:
> On Fri, Jun 30, 2017 at 09:04:25AM +0530, Viresh Kumar wrote:
> > On 29-06-17, 20:01, Dominik Brodowski wrote:
> > > On Thu, Jun 29, 2017 at 04:29:06PM +0530, Viresh Kumar wrote:
> > > > The cpufreq core and governors aren't supposed to set a limit on how
> > > > fast we want to try changing the frequency. This is currently done for
> > > > the legacy governors with help of min_sampling_rate.
> > > > 
> > > > At worst, we may end up setting the sampling rate to a value lower than
> > > > the rate at which frequency can be changed, and then one of the CPUs in
> > > > the policy will only be changing frequency forever.
> > > 
> > > Is it safe to issue requests to change the CPU frequency so frequently,
> > 
> > Well, I assumed so. I am not sure the hardware would break, though.
> > Overheating?
> > 
> > > even on historic hardware such as speedstep-{ich,smi,centrino}? In the past,

speedstep-smi is the only one that sets transition_latency to
CPUFREQ_ETERNAL; the others set meaningful values. So
yes, they should be doing DVFS dynamically.

> > > these checks more or less disallowed the running of dynamic frequency
> > > scaling at least on speedstep-smi[*],
> > 
> > We must be doing dynamic freq scaling even without this patch. I don't
> > see why you say the above, then.
> > 
> > All this patch does is get rid of the limit on how soon we can
> > change the freq again.
> 
> Well, as I understand it, first generation "speedstep" was designed more or
> less to switch frequencies only when AC power was lost or restored.
> 
> The Linux implementation merely said "no on-the-fly changes", but switched
> frequencies whenever a user explicitly requested such a change (presumably
> only every once in an unspecified while).
> 
> This same reasoning may be present in other drivers using CPUFREQ_ETERNAL.

Thanks for the explanation. I am now convinced that this series
has done at least one thing wrong: the removal of
max_transition_latency from the governors, which allows ondemand to run on
such platforms (and may end up breaking them).

So I will modify that patch to set max_transition_latency to
CPUFREQ_ETERNAL for ondemand/conservative instead of 10 ms. We
should do the same for schedutil as well, so that it also uses the
max_transition_latency field.
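A minimal sketch of the gating discussed above, assuming a governor refuses any policy whose driver-reported transition latency is equal to or above the governor's max_transition_latency. The structures and governor_may_run() are hypothetical simplifications rather than the cpufreq core's code, and CPUFREQ_ETERNAL is assumed to be (-1) stored in an unsigned int. Under that assumed comparison, a limit of CPUFREQ_ETERNAL refuses only drivers such as speedstep-smi that report CPUFREQ_ETERNAL.

```c
#include <stdbool.h>

/* Assumption: modeled on the kernel's CPUFREQ_ETERNAL, i.e. (-1) stored in
 * an unsigned int, the largest representable transition latency. */
#define CPUFREQ_ETERNAL ((unsigned int)-1)

/* Hypothetical, simplified stand-ins for the real cpufreq structures. */
struct sketch_governor {
	const char *name;
	/* 0 means "no limit"; otherwise drivers whose latency (ns) is equal
	 * to or above this value are refused. */
	unsigned int max_transition_latency;
};

struct sketch_policy {
	unsigned int transition_latency; /* ns, as reported by the driver */
};

/* Returns true if the governor may drive this policy. Because the
 * comparison treats "equal" as too slow, a limit of CPUFREQ_ETERNAL
 * refuses exactly the drivers that report CPUFREQ_ETERNAL. */
static bool governor_may_run(const struct sketch_governor *gov,
			     const struct sketch_policy *policy)
{
	if (gov->max_transition_latency == 0)
		return true;
	return policy->transition_latency < gov->max_transition_latency;
}
```

With this rule, a governor that never switches on its own (a limit of 0 here) still runs everywhere, while a dynamic governor is kept off platforms that only tolerate occasional, user-requested switches.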

But I hope this patch is still fine. Right?

> I am not *sure* either; I am just worried about the consequences of doing
> things out-of-spec...

Thanks for your inputs Dominik.

-- 
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] cpufreq: governor: Drop min_sampling_rate

2017-06-29 Thread Dominik Brodowski
On Fri, Jun 30, 2017 at 09:04:25AM +0530, Viresh Kumar wrote:
> On 29-06-17, 20:01, Dominik Brodowski wrote:
> > On Thu, Jun 29, 2017 at 04:29:06PM +0530, Viresh Kumar wrote:
> > > The cpufreq core and governors aren't supposed to set a limit on how
> > > fast we want to try changing the frequency. This is currently done for
> > > the legacy governors with help of min_sampling_rate.
> > > 
> > > At worst, we may end up setting the sampling rate to a value lower than
> > > the rate at which frequency can be changed, and then one of the CPUs in
> > > the policy will only be changing frequency forever.
> > 
> > Is it safe to issue requests to change the CPU frequency so frequently,
> 
> Well, I assumed so. I am not sure the hardware would break, though.
> Overheating?
> 
> > even on historic hardware such as speedstep-{ich,smi,centrino}? In the past,
> > these checks more or less disallowed the running of dynamic frequency
> > scaling at least on speedstep-smi[*],
> 
> We must be doing dynamic freq scaling even without this patch. I don't
> see why you say the above, then.
> 
> All this patch does is get rid of the limit on how soon we can
> change the freq again.

Well, as I understand it, first generation "speedstep" was designed more or
less to switch frequencies only when AC power was lost or restored.

The Linux implementation merely said "no on-the-fly changes", but switched
frequencies whenever a user explicitly requested such a change (presumably
only every once in an unspecified while).

This same reasoning may be present in other drivers using CPUFREQ_ETERNAL.

> > but maybe on a few other platforms as
> > well. That's why I am curious whether this may potentially break systems
> > on a hardware level if the hardware was not designed to do dynamic frequency
> > scaling (and not just frequency switches on battery/AC).
> 
> Honestly, I am not sure whether any hardware can break just because
> of this commit.

I am not *sure* either; I am just worried about the consequences of doing
things out-of-spec...

Best
Dominik




Re: [PATCH 3/6] cpufreq: governor: Drop min_sampling_rate

2017-06-29 Thread Viresh Kumar
On 29-06-17, 20:01, Dominik Brodowski wrote:
> On Thu, Jun 29, 2017 at 04:29:06PM +0530, Viresh Kumar wrote:
> > The cpufreq core and governors aren't supposed to set a limit on how
> > fast we want to try changing the frequency. This is currently done for
> > the legacy governors with help of min_sampling_rate.
> > 
> > At worst, we may end up setting the sampling rate to a value lower than
> > the rate at which frequency can be changed, and then one of the CPUs in
> > the policy will only be changing frequency forever.
> 
> Is it safe to issue requests to change the CPU frequency so frequently,

Well, I assumed so. I am not sure the hardware would break, though.
Overheating?

> even on historic hardware such as speedstep-{ich,smi,centrino}? In the past,
> these checks more or less disallowed the running of dynamic frequency
> scaling at least on speedstep-smi[*],

We must be doing dynamic freq scaling even without this patch. I don't
see why you say the above, then.

All this patch does is get rid of the limit on how soon we can
change the freq again.
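To illustrate the limit being removed, here is a sketch that models the old and new handling of sampling_rate in plain C. The multiplier values (1000 and 20) are assumed to match LATENCY_MULTIPLIER and MIN_LATENCY_MULTIPLIER of the kernel at the time, kernel_min_us stands in for the tick-based floor (10 ms or 20 jiffies) described in the removed documentation, and all helper names are hypothetical. It mirrors the max() clamping in the diff in spirit, not line for line.

```c
#include <stdbool.h>

/* Assumed values, modeled on LATENCY_MULTIPLIER and MIN_LATENCY_MULTIPLIER
 * in the cpufreq headers of that time. */
#define SKETCH_LATENCY_MULTIPLIER     1000u
#define SKETCH_MIN_LATENCY_MULTIPLIER 20u

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Driver transition latency is reported in ns; convert to us and never
 * return 0, as the governor init code does before using it. */
static unsigned int latency_us(unsigned int transition_latency_ns)
{
	unsigned int us = transition_latency_ns / 1000;

	return us ? us : 1;
}

/* Old floor: the larger of a kernel-derived (tick-based) minimum and a
 * small multiple of the hardware latency. */
static unsigned int old_min_sampling_rate(unsigned int kernel_min_us,
					  unsigned int lat_us)
{
	return max_u(kernel_min_us, SKETCH_MIN_LATENCY_MULTIPLIER * lat_us);
}

/* Before the patch: a value written to sampling_rate was clamped up. */
static unsigned int store_rate_old(unsigned int requested_us,
				   unsigned int min_rate_us)
{
	return max_u(requested_us, min_rate_us);
}

/* After the patch: the written value is used unchanged. */
static unsigned int store_rate_new(unsigned int requested_us)
{
	return requested_us;
}

/* After the patch: the default is simply the multiplier times the latency. */
static unsigned int default_rate_new(unsigned int lat_us)
{
	return SKETCH_LATENCY_MULTIPLIER * lat_us;
}

/* The "at worst" case from the changelog: sampling more often than the
 * hardware can finish a transition. */
static bool samples_faster_than_hw(unsigned int rate_us, unsigned int lat_us)
{
	return rate_us < lat_us;
}
```

For a 10 us transition latency and a 10 ms kernel floor, writing 500 used to yield 10000; after the patch it stays 500, and only a value below 10 would sample faster than the hardware can switch.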

> but maybe on a few other platforms as
> well. That's why I am curious whether this may potentially break systems
> on a hardware level if the hardware was not designed to do dynamic frequency
> scaling (and not just frequency switches on battery/AC).

Honestly, I am not sure whether any hardware can break just because
of this commit.

-- 
viresh


Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-29 Thread Tetsuo Handa
Roman Gushchin wrote:
> On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > Oops, I misinterpreted. This is where a multithreaded OOM victim with or
> > without the OOM reaper can get stuck forever. Think about a case where a
> > process with two threads is selected by the OOM killer and only one of
> > these two threads can get TIF_MEMDIE.
> > 
> >   Thread-2:   Calls down_write(&current->mm->mmap_sem).
> >   Thread-1:   Enters __alloc_pages_slowpath().
> >   Thread-2:   Enters __alloc_pages_slowpath().
> >   Thread-1:   Takes oom_lock.
> >   Thread-1:   Calls out_of_memory().
> >   OOM killer: Selects Thread-1 as an OOM victim.
> >   Thread-1:   Gets SIGKILL.
> >   Thread-2:   Gets SIGKILL.
> >   Thread-1:   Gets TIF_MEMDIE.
> >   Thread-1:   Releases oom_lock.
> >   Thread-1:   Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
> >   OOM reaper: Takes oom_lock.
> >   OOM reaper: Will do nothing because down_read_trylock() fails.
> >   OOM reaper: Releases oom_lock.
> >   OOM reaper: Gives up and sets MMF_OOM_SKIP after one second.
> >   Thread-2:   Takes oom_lock.
> >   Thread-2:   Calls out_of_memory().
> >   Thread-2:   Will not check MMF_OOM_SKIP because Thread-1 still has
> >               TIF_MEMDIE. // <= gets stuck waiting for Thread-1.
> >   Thread-2:   Releases oom_lock.
> >   Thread-2:   Will not leave __alloc_pages_slowpath() because Thread-2
> >               does not have TIF_MEMDIE.
> >   Thread-2:   Will not call up_write(&current->mm->mmap_sem).
> >   Thread-1:   Reaches do_exit().
> >   Thread-1:   Calls down_read(&current->mm->mmap_sem) in exit_mm() in
> >               do_exit(). // <= gets stuck waiting for Thread-2.
> >   Thread-1:   Will not call up_read(&current->mm->mmap_sem) in exit_mm()
> >               in do_exit().
> >   Thread-1:   Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm()
> >               in do_exit().
> 
> That's interesting... Does it mean that we have to give access to the
> reserves to all threads to guarantee forward progress?

Yes, because we don't have a __GFP_KILLABLE flag.

> 
> What do you think about Michal's approach? He posted a link in the thread.

Please read that thread.


Re: [PATCH] Documentation: fix wrong example command

2017-06-29 Thread Jonathan Corbet
On Thu, 29 Jun 2017 18:36:35 +0200
Matteo Croce  wrote:

> Signed-off-by: Matteo Croce 
> ---
>  Documentation/networking/ipvlan.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
> index 24196ce..1fe42a8 100644
> --- a/Documentation/networking/ipvlan.txt
> +++ b/Documentation/networking/ipvlan.txt
> @@ -22,9 +22,9 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
>   There are no module parameters for this driver and it can be configured
>  using IProute2/ip utility.
>  
> - ip link add link   type ipvlan mode { l2 | l3 | l3s }
> + ip link add link  name  type ipvlan mode { l2 | l3 | l3s }
>  
> - e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
> + e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2
>  

Patches to the networking documentation go through the networking tree, so
this one should be resent with a copy to the netdev list.  I'd also
recommend putting in a real changelog.

Thanks,

jon


Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-29 Thread Roman Gushchin
On Fri, Jun 23, 2017 at 06:52:20AM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Roman Gushchin wrote:
> > > On Thu, Jun 22, 2017 at 09:40:28AM +0900, Tetsuo Handa wrote:
> > > > Roman Gushchin wrote:
> > > > > --- a/mm/oom_kill.c
> > > > > +++ b/mm/oom_kill.c
> > > > > @@ -992,6 +992,13 @@ bool out_of_memory(struct oom_control *oc)
> > > > >   if (oom_killer_disabled)
> > > > >   return false;
> > > > >  
> > > > > + /*
> > > > > +  * If there are oom victims in flight, we don't need to select
> > > > > +  * a new victim.
> > > > > +  */
> > > > > + if (atomic_read(&oom_victims) > 0)
> > > > > + return true;
> > > > > +
> > > > >   if (!is_memcg_oom(oc)) {
> > > > >   blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > > >   if (freed > 0)
> > > > 
> > > > The OOM reaper is not available for CONFIG_MMU=n kernels, and
> > > > timeout-based giveup is not permitted, but a multithreaded process
> > > > might be selected as an OOM victim. Not setting TIF_MEMDIE on all
> > > > threads sharing an OOM victim's mm increases the possibility of
> > > > preventing some OOM victim thread from terminating (e.g. one of them
> > > > cannot leave __alloc_pages_slowpath() with mmap_sem held for write,
> > > > because it is waiting for the TIF_MEMDIE thread to call
> > > > exit_oom_victim(), while the TIF_MEMDIE thread is waiting for the
> > > > thread with mmap_sem held for write).
> > > 
> > > I agree that CONFIG_MMU=n is a special case, and the proposed approach
> > > can't be used directly. But can you please explain why you find the
> > > first chunk wrong?
> > 
> > Since you are checking oom_victims before checking
> > task_will_free_mem(current), only one thread can get TIF_MEMDIE. This is
> > where a multithreaded OOM victim without the OOM reaper can get stuck
> > forever.
> 
> Oops, I misinterpreted. This is where a multithreaded OOM victim with or
> without the OOM reaper can get stuck forever. Think about a case where a
> process with two threads is selected by the OOM killer and only one of
> these two threads can get TIF_MEMDIE.
> 
>   Thread-2:   Calls down_write(&current->mm->mmap_sem).
>   Thread-1:   Enters __alloc_pages_slowpath().
>   Thread-2:   Enters __alloc_pages_slowpath().
>   Thread-1:   Takes oom_lock.
>   Thread-1:   Calls out_of_memory().
>   OOM killer: Selects Thread-1 as an OOM victim.
>   Thread-1:   Gets SIGKILL.
>   Thread-2:   Gets SIGKILL.
>   Thread-1:   Gets TIF_MEMDIE.
>   Thread-1:   Releases oom_lock.
>   Thread-1:   Leaves __alloc_pages_slowpath() because Thread-1 has TIF_MEMDIE.
>   OOM reaper: Takes oom_lock.
>   OOM reaper: Will do nothing because down_read_trylock() fails.
>   OOM reaper: Releases oom_lock.
>   OOM reaper: Gives up and sets MMF_OOM_SKIP after one second.
>   Thread-2:   Takes oom_lock.
>   Thread-2:   Calls out_of_memory().
>   Thread-2:   Will not check MMF_OOM_SKIP because Thread-1 still has
>               TIF_MEMDIE. // <= gets stuck waiting for Thread-1.
>   Thread-2:   Releases oom_lock.
>   Thread-2:   Will not leave __alloc_pages_slowpath() because Thread-2
>               does not have TIF_MEMDIE.
>   Thread-2:   Will not call up_write(&current->mm->mmap_sem).
>   Thread-1:   Reaches do_exit().
>   Thread-1:   Calls down_read(&current->mm->mmap_sem) in exit_mm() in
>               do_exit(). // <= gets stuck waiting for Thread-2.
>   Thread-1:   Will not call up_read(&current->mm->mmap_sem) in exit_mm()
>               in do_exit().
>   Thread-1:   Will not clear TIF_MEMDIE in exit_oom_victim() in exit_mm()
>               in do_exit().
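The scenario above is a circular wait: Thread-1 waits for the mmap_sem held for write by Thread-2, while Thread-2 waits for Thread-1 to clear TIF_MEMDIE. A small, hypothetical wait-for graph makes the cycle explicit. Each task is assumed to block on at most one other task, so Floyd's tortoise-and-hare cycle detection along the wait chain is enough; the names are illustrative, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical task indices for the two threads in the scenario. */
enum { THREAD_1 = 0, THREAD_2 = 1 };

/* Marks a task that is not blocked on any other task. */
#define NO_WAIT (-1)

/* waits_for[i] is the task that task i is blocked on, or NO_WAIT.
 * Returns true if the wait chain starting at 'start' runs into a cycle,
 * i.e. 'start' can never make progress. Floyd's tortoise-and-hare is
 * valid here because each task waits on at most one other task. */
static bool reaches_wait_cycle(const int *waits_for, int start)
{
	int slow = start, fast = start;

	for (;;) {
		if (waits_for[fast] == NO_WAIT)
			return false;	/* chain ends at a runnable task */
		fast = waits_for[waits_for[fast]];
		if (fast == NO_WAIT)
			return false;
		slow = waits_for[slow];
		if (fast == slow)
			return true;	/* the chain loops back on itself */
	}
}
```

Encoding the timeline as waits_for = { THREAD_2, THREAD_1 } reports a cycle from either thread. If Thread-2 also had TIF_MEMDIE and could leave __alloc_pages_slowpath(), its entry would become NO_WAIT and the cycle would disappear, which is the forward-progress question raised below.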

That's interesting... Does it mean that we have to give access to the
reserves to all threads to guarantee forward progress?

What do you think about Michal's approach? He posted a link in the thread.

Thank you!

Roman


Re: [v3 5/6] mm, oom: don't mark all oom victims tasks with TIF_MEMDIE

2017-06-29 Thread Roman Gushchin
On Thu, Jun 29, 2017 at 10:53:57AM +0200, Michal Hocko wrote:
> On Wed 21-06-17 22:19:15, Roman Gushchin wrote:
> > We want to limit the number of tasks which have access to the
> > memory reserves. To ensure progress, it's enough to have one such
> > process at a time.
> > 
> > If we need to kill the whole cgroup, let's give access to the
> > memory reserves only to the first process in the list, which is
> > (usually) the biggest process.
> > This will give us a good chance that all other processes will be able
> > to quit without access to the memory reserves.
> 
> I don't like this, to be honest. Is there any reason not to go with the
> reduced memory reserves access for oom victims I was suggesting earlier [1]?
> 
> [1] 
> http://lkml.kernel.org/r/http://lkml.kernel.org/r/1472723464-22866-2-git-send-email-mho...@kernel.org

I've nothing against your approach. What's the state of this patchset?
Do you plan to bring it upstream?

Roman


Re: [PATCH 3/6] cpufreq: governor: Drop min_sampling_rate

2017-06-29 Thread Dominik Brodowski
On Thu, Jun 29, 2017 at 04:29:06PM +0530, Viresh Kumar wrote:
> The cpufreq core and governors aren't supposed to set a limit on how
> fast we want to try changing the frequency. This is currently done for
> the legacy governors with help of min_sampling_rate.
> 
> At worst, we may end up setting the sampling rate to a value lower than
> the rate at which frequency can be changed, and then one of the CPUs in
> the policy will only be changing frequency forever.

Is it safe to issue requests to change the CPU frequency so frequently, even
on historic hardware such as speedstep-{ich,smi,centrino}? In the past,
these checks more or less disallowed the running of dynamic frequency
scaling at least on speedstep-smi[*], but maybe on a few other platforms as
well. That's why I am curious whether this may potentially break systems
on a hardware level if the hardware was not designed to do dynamic frequency
scaling (and not just frequency switches on battery/AC).

Best,
Dominik


[PATCH] Documentation: fix wrong example command

2017-06-29 Thread Matteo Croce
Signed-off-by: Matteo Croce 
---
 Documentation/networking/ipvlan.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
index 24196ce..1fe42a8 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.txt
@@ -22,9 +22,9 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
There are no module parameters for this driver and it can be configured
 using IProute2/ip utility.
 
-   ip link add link   type ipvlan mode { l2 | l3 | l3s }
+   ip link add link  name  type ipvlan mode { l2 | l3 | l3s }
 
-   e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
+   e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2
 
 
 4. Operating modes:
-- 
2.9.4



Re: [PATCH v8 00/20] ILP32 for ARM64

2017-06-29 Thread Catalin Marinas
Hi Yury,

On Mon, Jun 19, 2017 at 06:49:43PM +0300, Yury Norov wrote:
> This series enables aarch64 with ilp32 mode.

Thanks for putting this series together, I do appreciate the effort.
There are still some review comments coming in but I'm happy with how
the ABI looks now. I did some LTP testing (AArch64/LP64, AArch64/ILP32,
AArch32) and benchmarking and didn't see any regressions (apart from an
LTP bug with sync_file_range2). James Morse is working on reproducing
similar testing in ARM. Szabolcs reported some glibc test-suite
regressions on the libc-alpha list which I assume will be followed up.
VDSO in C is another issue I'd like sorted but this is not strictly
specific to ILP32 and can be done as a follow up. Note that I didn't run
any big-endian tests, though this is something that needs doing.

Now, agreeing on the ABI, with the implementation very close to being
ready, doesn't necessarily make the code suitable for upstream. With my
maintainer hat on, I'm trying to see where ILP32 will be in 2, 5 or 10
years, and whether anyone will still care about it in that time frame. The
difference from driver or SoC support is that ABIs are very hard to
revert, yet they are as likely (or even more likely) to bit-rot when not
in use or not regularly tested (we have the big-endian experience here).

There are two main aspects to make the code upstream-worthy:

1. Actual/real users (current, future). I don't mean just a few distros
   showing that it can be done but actual/planned real deployments

2. Long term testing/maintenance plan. This is not about kernel code
   maintenance but a healthy ILP32 ecosystem:
   a) readily available toolchains (x86-hosted and AArch64-hosted)
   b) filesystems (can be large distros like openSUSE or more
  embedded-oriented like Yocto or OpenEmbedded)
   c) suitable continuous regression testing (kernel + userland)
   d) commitment from all parties involved (including ARM Ltd) to treat
  the ILP32 ABI as a (nearly) first class citizen

It is pretty clear from private discussions that there are potential
users but at the moment I can't tell if those would turn into real
deployments of production systems. As for (2), the long term plans are
not convincing (or I haven't spotted them yet), so I'd like to see the
interested parties putting a plan together (something along the lines of
kernelci.org + LTP, glibc buildbot).

What I'd like to propose is that Will and I (as arm64 maintainers, maybe
with the help of others including this series' authors) take over
the series and push it to a staging branch under the arm64 kernel on
git.kernel.org. This is intended as a commitment to keep the ABI *stable*;
the branch will be rebased with every kernel release (starting with 4.13).
The decision to merge upstream will be revisited every 6 months, assessing
the progress on the points I mentioned above, with a time limit of 2
years, after which, if the series is still not upstream, we will stop
maintaining the branch.

I am aware that the above proposal has an impact on the glibc patches
since they will not merge a new ABI upstream until officially supported
by the kernel. I cc'ed some of the glibc developers and they will follow
up on the libc-alpha list.

> As supporting work, it introduces the ARCH_32BIT_OFF_T configuration
> option, which is enabled for existing 32-bit architectures but disabled
> for new arches (so the 64-bit off_t userspace type is used by new userspace).
> It also deprecates the getrlimit and setrlimit syscalls in favour of prlimit64.
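From userspace, the two points above can be illustrated with a short sketch, assuming a Linux system with glibc: resource limits are read through the prlimit() wrapper (which glibc backs with the prlimit64 syscall) instead of getrlimit(), and the width of off_t is checked. The helper names nofile_limits() and off_t_bytes() are hypothetical; on an ABI built without ARCH_32BIT_OFF_T one would expect off_t to be 8 bytes.

```c
#define _GNU_SOURCE /* needed for the glibc prlimit() wrapper */
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Hypothetical helper: read the calling process's RLIMIT_NOFILE through
 * prlimit() (pid 0 means "this process") rather than getrlimit().
 * Returns 0 on success, -1 with errno set on failure. */
static int nofile_limits(rlim_t *soft, rlim_t *hard)
{
	struct rlimit old;

	assert(soft != NULL && hard != NULL);
	if (prlimit(0, RLIMIT_NOFILE, NULL, &old) != 0)
		return -1;
	*soft = old.rlim_cur;
	*hard = old.rlim_max;
	return 0;
}

/* Hypothetical helper: the width of off_t in this build, in bytes.
 * A 64-bit off_t (8 bytes) is what new ABIs are meant to expose. */
static size_t off_t_bytes(void)
{
	return sizeof(off_t);
}
```

Passing NULL as the new limit makes prlimit() a pure query, so the sketch has no side effects on the running process.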
[...]
> Patches 1, 2, 3 and 8 are general, and may be applied separately.

These 4 patches should be merged independently; I don't see a point in
carrying them with the ILP32 series. Arnd, are you ok to push them
upstream?

BTW, patch 3 never seems to have made it to the linux-arm-kernel list; I
guess there were too many people on cc.

--
Catalin



[PATCH 3/6] cpufreq: governor: Drop min_sampling_rate

2017-06-29 Thread Viresh Kumar
The cpufreq core and governors aren't supposed to set a limit on how
fast we want to try changing the frequency. This is currently done for
the legacy governors with help of min_sampling_rate.

At worst, we may end up setting the sampling rate to a value lower than
the rate at which frequency can be changed, and then one of the CPUs in
the policy will only be changing frequency forever.

But that is something for the user to decide and there is no need to
have special handling for such cases in the core. Leave it for the user
to figure out.

Signed-off-by: Viresh Kumar 
---
 Documentation/admin-guide/pm/cpufreq.rst |  8 
 drivers/cpufreq/cpufreq_conservative.c   |  6 --
 drivers/cpufreq/cpufreq_governor.c   | 10 ++
 drivers/cpufreq/cpufreq_governor.h   |  1 -
 drivers/cpufreq/cpufreq_ondemand.c   | 12 
 include/linux/cpufreq.h  |  2 --
 6 files changed, 2 insertions(+), 37 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 09aa2e949787..6adbe1ed58b9 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -471,14 +471,6 @@ it is allowed to use (the ``scaling_max_freq`` policy limit).
 
# echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
 
-
-``min_sampling_rate``
-   The minimum value of ``sampling_rate``.
-
-   Equal to 1 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and
-   :c:data:`tick_nohz_active` are both set or to 20 times the value of
-   :c:data:`jiffies` in microseconds otherwise.
-
 ``up_threshold``
If the estimated CPU load is above this value (in percent), the governor
will set the frequency to the maximum value allowed for the policy.
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 88220ff3e1c2..f20f20a77d4d 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -246,7 +246,6 @@ gov_show_one_common(sampling_rate);
 gov_show_one_common(sampling_down_factor);
 gov_show_one_common(up_threshold);
 gov_show_one_common(ignore_nice_load);
-gov_show_one_common(min_sampling_rate);
 gov_show_one(cs, down_threshold);
 gov_show_one(cs, freq_step);
 
@@ -254,12 +253,10 @@ gov_attr_rw(sampling_rate);
 gov_attr_rw(sampling_down_factor);
 gov_attr_rw(up_threshold);
 gov_attr_rw(ignore_nice_load);
-gov_attr_ro(min_sampling_rate);
 gov_attr_rw(down_threshold);
 gov_attr_rw(freq_step);
 
 static struct attribute *cs_attributes[] = {
-   &min_sampling_rate.attr,
&sampling_rate.attr,
&sampling_down_factor.attr,
&up_threshold.attr,
@@ -297,10 +294,7 @@ static int cs_init(struct dbs_data *dbs_data)
dbs_data->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
dbs_data->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
dbs_data->ignore_nice_load = 0;
-
dbs_data->tuners = tuners;
-   dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
-   jiffies_to_usecs(10);
 
return 0;
 }
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 47e24b5384b3..858081f9c3d7 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -47,14 +47,11 @@ ssize_t store_sampling_rate(struct gov_attr_set *attr_set, const char *buf,
 {
struct dbs_data *dbs_data = to_dbs_data(attr_set);
struct policy_dbs_info *policy_dbs;
-   unsigned int rate;
int ret;
-   ret = sscanf(buf, "%u", &rate);
+   ret = sscanf(buf, "%u", &dbs_data->sampling_rate);
if (ret != 1)
return -EINVAL;
 
-   dbs_data->sampling_rate = max(rate, dbs_data->min_sampling_rate);
-
/*
 * We are operating under dbs_data->mutex and so the list and its
 * entries can't be freed concurrently.
@@ -437,10 +434,7 @@ int cpufreq_dbs_governor_init(struct cpufreq_policy *policy)
latency = 1;
 
/* Bring kernel and HW constraints together */
-   dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
- MIN_LATENCY_MULTIPLIER * latency);
-   dbs_data->sampling_rate = max(dbs_data->min_sampling_rate,
- LATENCY_MULTIPLIER * latency);
+   dbs_data->sampling_rate = LATENCY_MULTIPLIER * latency;
 
if (!have_governor_per_policy())
gov->gdbs_data = dbs_data;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 7cbb07512e4c..06d9f90ede93 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -41,7 +41,6 @@ enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
 struct dbs_data {
struct gov_attr_set attr_set;
void *tuners;
-   unsigned int min_sampling_rate;
unsigned int ignore_nice_load;
unsigned

Re: [v3 1/6] mm, oom: use oom_victims counter to synchronize oom victim selection

2017-06-29 Thread Michal Hocko
On Wed 21-06-17 22:19:11, Roman Gushchin wrote:
> The OOM killer should avoid unnecessary kills. To prevent them, during
> the task list traversal we check for tasks which were previously
> selected as oom victims. If there is such a task, a new victim
> is not selected.
> 
> This approach is sub-optimal (we're doing costly iteration over the task
> list every time) and will not work for the cgroup-aware oom killer.
> 
> We already have the oom_victims counter, which can be effectively used
> for this task.

A global counter will not work properly, I am afraid: a) you should
consider the oom domain and not block the oom killer in unrelated domains,
and b) you have no guarantee that the oom victim will terminate in a
reasonable time. That is why we have the MMF_OOM_SKIP check in
oom_evaluate_task.

I think you should have something similar for your memcg victim selection.
If you see a memcg in the oom hierarchy with oom victims which are alive
and not MMF_OOM_SKIP, you should abort the scanning.
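A hypothetical sketch of the per-domain check suggested above, contrasted with the global counter from the patch. The task structure and its fields are illustrative stand-ins (mm_oom_skip represents MMF_OOM_SKIP on the victim's mm), not kernel structures, and the real selection code walks memcgs and tasks differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task model; none of these names are kernel API. */
struct sketch_task {
	int oom_domain;   /* id of the memcg/OOM domain the task belongs to */
	bool oom_victim;  /* previously selected as an OOM victim */
	bool exited;      /* victim has already gone away */
	bool mm_oom_skip; /* stands in for MMF_OOM_SKIP on the victim's mm */
};

/* Returns true if victim selection for 'domain' should be aborted because
 * a victim in the same domain is still alive and not marked to be skipped.
 * Victims in unrelated domains, and victims marked skip (e.g. after the
 * OOM reaper gave up), do not block selection. */
static bool should_abort_scan(const struct sketch_task *tasks, size_t n,
			      int domain)
{
	for (size_t i = 0; i < n; i++) {
		const struct sketch_task *t = &tasks[i];

		if (t->oom_domain != domain || !t->oom_victim)
			continue;
		if (!t->exited && !t->mm_oom_skip)
			return true;
	}
	return false;
}

/* The global-counter shortcut, for contrast: any victim anywhere blocks
 * selection in every domain, and a stuck victim blocks it indefinitely. */
static bool should_abort_scan_global(size_t nr_victims_in_flight)
{
	return nr_victims_in_flight > 0;
}
```

The per-domain version addresses both objections a) and b): a victim in domain 1 no longer blocks an OOM in domain 2, and once the victim is marked skip, a new victim can be chosen.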
-- 
Michal Hocko
SUSE Labs


Re: [v3 5/6] mm, oom: don't mark all oom victims tasks with TIF_MEMDIE

2017-06-29 Thread Michal Hocko
On Wed 21-06-17 22:19:15, Roman Gushchin wrote:
> We want to limit the number of tasks which have access to the
> memory reserves. To ensure progress, it's enough to have one such
> process at a time.
> 
> If we need to kill the whole cgroup, let's give access to the
> memory reserves only to the first process in the list, which is
> (usually) the biggest process.
> This will give us a good chance that all other processes will be able
> to quit without access to the memory reserves.
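A sketch of the policy described in the patch, assuming the intent is to grant reserves access to the largest process in the group while every process is killed. Unlike the patch, which relies on the first process in the list usually being the biggest, this sketch selects the largest explicitly. struct sketch_proc, rss_pages (a stand-in for whatever badness metric is used) and reserves_access (a stand-in for TIF_MEMDIE) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process record, not a kernel structure. */
struct sketch_proc {
	int pid;
	unsigned long rss_pages; /* stand-in for the OOM badness metric */
	bool reserves_access;    /* stand-in for TIF_MEMDIE */
};

/* Every process in the group is killed, but only the largest one is
 * granted access to the memory reserves. Returns the index of that
 * process, or -1 for an empty group. Ties go to the earliest entry. */
static long grant_reserves_to_biggest(struct sketch_proc *procs, size_t n)
{
	long best = -1;

	for (size_t i = 0; i < n; i++) {
		procs[i].reserves_access = false; /* clear any earlier grant */
		if (best < 0 || procs[i].rss_pages > procs[best].rss_pages)
			best = (long)i;
	}
	if (best >= 0)
		procs[best].reserves_access = true;
	return best;
}
```

The objection raised below applies to this design as well: the processes without reserves access can still block on the one that has it, as in the two-thread scenario elsewhere in this thread.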

I don't like this, to be honest. Is there any reason not to go with the
reduced memory reserves access for oom victims I was suggesting earlier [1]?

[1] 
http://lkml.kernel.org/r/http://lkml.kernel.org/r/1472723464-22866-2-git-send-email-mho...@kernel.org
-- 
Michal Hocko
SUSE Labs