Re: [v8 0/4] cgroup-aware OOM killer

2017-09-13 Thread Roman Gushchin
On Wed, Sep 13, 2017 at 02:29:14PM +0200, Michal Hocko wrote:
> On Mon 11-09-17 13:44:39, David Rientjes wrote:
> > On Mon, 11 Sep 2017, Roman Gushchin wrote:
> > 
> > > This patchset makes the OOM killer cgroup-aware.
> > > 
> > > v8:
> > >   - Do not kill tasks with OOM_SCORE_ADJ -1000
> > >   - Make the whole thing opt-in with cgroup mount option control
> > >   - Drop oom_priority for further discussions
> > 
> > Nack, we specifically require oom_priority for this to function correctly, 
> > otherwise we cannot prefer to kill from low priority leaf memcgs as 
> > required.
> 
> While I understand that your usecase might require priorities I do not
> think this part missing is a reason to nack the cgroup based selection
> and kill-all parts. This can be done on top. The only important part
> right now is the current selection semantic (only leaf memcgs vs. size
> of the hierarchy).

I agree.

> I strongly believe that comparing only leaf memcgs
> is more straightforward and it doesn't lead to unexpected results as
> mentioned before (kill a small memcg which is a part of the larger
> sub-hierarchy).

One of the two main goals of this patchset is to introduce cgroup-level
fairness: bigger cgroups should be affected more than smaller ones,
regardless of the size of the tasks inside. I believe the same principle
should be used for cgroups.

Also, the opposite will make oom_semantics more weird: it will mean
kill all tasks, but also treat memcg as a leaf cgroup.

> 
> I didn't get to read the new version of this series yet and hope to get
> to it soon.

Thanks!
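
To make the two selection semantics under discussion concrete, here is a
minimal user-space sketch. It is a toy model only: struct toy_memcg and
the pick_* helpers are hypothetical names, not kernel code, and the page
counts are made up for illustration.

```c
#include <stddef.h>

/* Toy model of a memory cgroup; this is NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	const char *name;
	long own_pages;              /* pages charged directly to this group */
	struct toy_memcg *children;  /* array of child groups, NULL for a leaf */
	size_t nr_children;
};

/* Cumulative usage: the group's own pages plus those of all descendants. */
static long toy_usage(const struct toy_memcg *cg)
{
	long sum = cg->own_pages;

	for (size_t i = 0; i < cg->nr_children; i++)
		sum += toy_usage(&cg->children[i]);
	return sum;
}

/* Semantic A: compare leaf groups only and pick the biggest leaf. */
static const struct toy_memcg *pick_biggest_leaf(const struct toy_memcg *cg)
{
	const struct toy_memcg *best = NULL;

	if (cg->nr_children == 0)
		return cg;
	for (size_t i = 0; i < cg->nr_children; i++) {
		const struct toy_memcg *cand = pick_biggest_leaf(&cg->children[i]);

		if (!best || cand->own_pages > best->own_pages)
			best = cand;
	}
	return best;
}

/*
 * Semantic B: at each level descend into the child with the largest
 * cumulative usage until a leaf is reached.
 */
static const struct toy_memcg *pick_by_hierarchy(const struct toy_memcg *cg)
{
	while (cg->nr_children) {
		const struct toy_memcg *best = &cg->children[0];

		for (size_t i = 1; i < cg->nr_children; i++)
			if (toy_usage(&cg->children[i]) > toy_usage(best))
				best = &cg->children[i];
		cg = best;
	}
	return cg;
}
```

With a root that has a 60-page leaf A and an inner group B holding three
30-page leaves, semantic A picks A, while semantic B descends into B
(90 pages in total) and ends up at one of its 30-page leaves. That is the
"small memcg in a larger sub-hierarchy" case mentioned above.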
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] led: ledtrig-transient: add support for hrtimer

2017-09-13 Thread Pavel Machek
On Wed 2017-09-13 14:20:58, David Lin wrote:
> On Wed, Sep 13, 2017 at 1:20 PM, Pavel Machek  wrote:
> >
> > Hi!
> >
> > > This patch series adds LED_BRIGHTNESS_FAST flag support so that
> > > ledtrig-transient can use an hrtimer, giving platforms with
> > > high-resolution timer support better accuracy in the trigger duration
> > > timing. The need for this support is driven by the fact that Android
> > > has removed timed_output [1] and now uses led-trigger for vibrator
> > > control, which requires the timer to be accurate to within a
> > > millisecond. This flag also allows the hrtimer to co-exist with the
> > > ktimer without causing warnings in existing drivers [2].
> >
> > NAK.
> >
> > LEDs do not need extra overhead, and vibrator control should not go
> > through LED subsystem.
> >
> > Input subsystem includes support for vibrations and force
> > feedback. Please use that instead.
> 
> I thought we are already over this discussion. As of now, the support
> of vibrator through ledtrig is documented
> (Documentation/leds/ledtrig-transient.txt) and there are users using
> it.

I also thought we are over with that discussion.

Yes, I'm working on fixing the docs.

What mainline users are doing that?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH v2 0/3] led: ledtrig-transient: add support for hrtimer

2017-09-13 Thread David Lin
On Wed, Sep 13, 2017 at 1:20 PM, Pavel Machek  wrote:
>
> Hi!
>
> > This patch series adds LED_BRIGHTNESS_FAST flag support so that
> > ledtrig-transient can use an hrtimer, giving platforms with
> > high-resolution timer support better accuracy in the trigger duration
> > timing. The need for this support is driven by the fact that Android
> > has removed timed_output [1] and now uses led-trigger for vibrator
> > control, which requires the timer to be accurate to within a
> > millisecond. This flag also allows the hrtimer to co-exist with the
> > ktimer without causing warnings in existing drivers [2].
>
> NAK.
>
> LEDs do not need extra overhead, and vibrator control should not go
> through LED subsystem.
>
> Input subsystem includes support for vibrations and force
> feedback. Please use that instead.

I thought we are already over this discussion. As of now, the support
of vibrator through ledtrig is documented
(Documentation/leds/ledtrig-transient.txt) and there are users using
it.


Re: [v8 2/4] mm, oom: cgroup-aware OOM killer

2017-09-13 Thread David Rientjes
On Mon, 11 Sep 2017, Roman Gushchin wrote:

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 15af3da5af02..da2b12ea4667 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2661,6 +2661,231 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
>   return ret;
>  }
>  
> +static long memcg_oom_badness(struct mem_cgroup *memcg,
> +   const nodemask_t *nodemask,
> +   unsigned long totalpages)
> +{
> + long points = 0;
> + int nid;
> + pg_data_t *pgdat;
> +
> + /*
> +  * We don't have necessary stats for the root memcg,
> +  * so we define it's oom_score as the maximum oom_score
> +  * of the belonging tasks.
> +  */
> + if (memcg == root_mem_cgroup) {
> + struct css_task_iter it;
> + struct task_struct *task;
> + long score, max_score = 0;
> +
> + css_task_iter_start(&memcg->css, 0, &it);
> + while ((task = css_task_iter_next(&it))) {
> + score = oom_badness(task, memcg, nodemask,
> + totalpages);
> + if (max_score > score)

score > max_score

> + max_score = score;
> + }
> + css_task_iter_end(&it);
> +
> + return max_score;
> + }
> +
> + for_each_node_state(nid, N_MEMORY) {
> + if (nodemask && !node_isset(nid, *nodemask))
> + continue;
> +
> + points += mem_cgroup_node_nr_lru_pages(memcg, nid,
> + LRU_ALL_ANON | BIT(LRU_UNEVICTABLE));
> +
> + pgdat = NODE_DATA(nid);
> + points += lruvec_page_state(mem_cgroup_lruvec(pgdat, memcg),
> + NR_SLAB_UNRECLAIMABLE);
> + }
> +
> + points += memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) /
> + (PAGE_SIZE / 1024);
> + points += memcg_page_state(memcg, MEMCG_SOCK);
> + points += memcg_page_state(memcg, MEMCG_SWAP);
> +
> + return points;
> +}
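
For reference, the comparison fix suggested above and the KB-to-pages
conversion used for the kernel stack counter can be sketched in user space
as follows. The function names are hypothetical and the code is only an
illustration of the arithmetic, not part of the patch.

```c
#include <stddef.h>

/*
 * Return the largest score in scores[0..n-1], starting from 0.
 * With the operands swapped (max_score > score), max_score would never
 * rise above its initial value of 0 for positive scores.
 */
static long max_task_score(const long *scores, size_t n)
{
	long max_score = 0;

	for (size_t i = 0; i < n; i++) {
		long score = scores[i];

		if (score > max_score)
			max_score = score;
	}
	return max_score;
}

/*
 * Convert a counter kept in KB into pages, mirroring the
 * "value / (PAGE_SIZE / 1024)" expression in the quoted patch.
 * page_size is passed in because this sketch runs outside the kernel.
 */
static long kb_to_pages(long kb, long page_size)
{
	return kb / (page_size / 1024);
}
```

For example, scores of 3, 9 and 4 yield 9, and a 16 KB stack counter
corresponds to 4 pages when the page size is 4096 bytes.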


Re: [v8 0/4] cgroup-aware OOM killer

2017-09-13 Thread David Rientjes
On Wed, 13 Sep 2017, Michal Hocko wrote:

> > > This patchset makes the OOM killer cgroup-aware.
> > > 
> > > v8:
> > >   - Do not kill tasks with OOM_SCORE_ADJ -1000
> > >   - Make the whole thing opt-in with cgroup mount option control
> > >   - Drop oom_priority for further discussions
> > 
> > Nack, we specifically require oom_priority for this to function correctly, 
> > otherwise we cannot prefer to kill from low priority leaf memcgs as 
> > required.
> 
> While I understand that your usecase might require priorities I do not
> think this part missing is a reason to nack the cgroup based selection
> and kill-all parts. This can be done on top. The only important part
> right now is the current selection semantic (only leaf memcgs vs. size
> of the hierarchy). I strongly believe that comparing only leaf memcgs
> is more straightforward and it doesn't lead to unexpected results as
> mentioned before (kill a small memcg which is a part of the larger
> sub-hierarchy).
> 

The problem is that we cannot enable the cgroup-aware oom killer and 
oom_group behavior because, without oom priorities, we have no ability to 
influence the cgroup that it chooses.  It is doing two things: providing 
more fairness amongst cgroups by selecting based on cumulative usage 
rather than single large process (good!), and effectively is removing all 
userspace control of oom selection (bad).  We want the former, but it 
needs to be coupled with support so that we can protect vital cgroups, 
regardless of their usage.

It is certainly possible to add oom priorities on top before it is merged, 
but I don't see why it isn't part of the patchset.  We need it before its 
merged to avoid users playing with /proc/pid/oom_score_adj to prevent any 
killing in the most preferable memcg when they could have simply changed 
the oom priority.


Re: [PATCH v2 0/3] led: ledtrig-transient: add support for hrtimer

2017-09-13 Thread Pavel Machek
Hi!

> This patch series adds LED_BRIGHTNESS_FAST flag support so that
> ledtrig-transient can use an hrtimer, giving platforms with
> high-resolution timer support better accuracy in the trigger duration
> timing. The need for this support is driven by the fact that Android
> has removed timed_output [1] and now uses led-trigger for vibrator
> control, which requires the timer to be accurate to within a
> millisecond. This flag also allows the hrtimer to co-exist with the
> ktimer without causing warnings in existing drivers [2].

NAK.

LEDs do not need extra overhead, and vibrator control should not go
through LED subsystem.

Input subsystem includes support for vibrations and force
feedback. Please use that instead.

Pavel

> David Lin (3):
>   leds: Replace flags bit shift with BIT() macros
>   leds: Add the LED_BRIGHTNESS_FAST flag
>   led: ledtrig-transient: add support for hrtimer
> 
>  Documentation/leds/leds-class.txt|  5 +++
>  drivers/leds/trigger/ledtrig-transient.c | 59 +---
>  include/linux/leds.h | 19 +-
>  3 files changed, 69 insertions(+), 14 deletions(-)
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH 3/3 v11] printk: Add monotonic, boottime, and realtime timestamps

2017-09-13 Thread Mark Salyzyn

On 09/05/2017 05:06 AM, Prarit Bhargava wrote:

printk.time=1/CONFIG_PRINTK_TIME=1 adds an unmodified local hardware clock
timestamp to printk messages.  The local hardware clock loses time each
day making it difficult to determine exactly when an issue has occurred in
the kernel log, and making it difficult to determine how kernel and
hardware issues relate to each other in real time.

Make printk output different timestamps by adding options for no
timestamp, the local hardware clock, the monotonic clock, the boottime
clock, and the real clock.  Allow a user to pick one of the clocks by
using the printk.time kernel parameter.  Output the type of clock in
/sys/module/printk/parameters/time so userspace programs can interpret the
timestamp.

v2: Use peterz's suggested Kconfig options.  Merge patchset together.
Fix i386 !CONFIG_PRINTK builds.

v3: Fixed x86_64_defconfig. Added printk_time_type enum and
printk_time_str for better output. Added BOOTTIME clock functionality.

v4: Fix messages, add additional printk.time options, and fix configs.

v5: Renaming of structures, and allow printk_time_set() to
evaluate substrings of entries (eg: allow 'r', 'real', 'realtime').  From
peterz, make fast functions return 0 until timekeeping is initialized
(removes timekeeping_active & ktime_get_boot|real_log_ts() suggested by
  tglx and adds ktime_get_real_offset()).  Switch to a function pointer
for printk_get_ts() and reference fast functions.  Make timestamp_sources enum
match choice options for CONFIG_PRINTK_TIME (adds PRINTK_TIME_UNDEFINED).

v6: Define PRINTK_TIME_UNDEFINED for !CONFIG_PRINTK builds.  Separate
timekeeping changes into separate patch.  Minor include file cleanup.

v7: Add default case to printk_set_timestamp() and add PRINTK_TIME_DEBUG
for users that want to set timestamp to different values during runtime.
Add jstultz' Kconfig to avoid defconfig churn.

v8: Add CONFIG_PRINTK_TIME_DEBUG to allow timestamp runtime switching.
Rename PRINTK_TIME_DISABLE to PRINTK_TIME_DISABLED.  Rename
printk_set_timestamp() to printk_set_ts_func().  Separate
printk_set_ts_func() and printk_get_first_ts() portions.  Rename param
functions.  Adjust configs, enum, and timestamp_sources_str to be 0-4.
Add mention realtime clock is UTC in Documentation.

v9: Fix typo.  Add __ktime_get_real_fast_ns_unsafe().

v10: Remove time parameter restrictions.


Ack, and unit tested on a backport to 4.9.

(you are missing the v11 respin comment, NBD)

-- Mark
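
The substring handling described in the v5 notes ('r', 'real' and
'realtime' all selecting the same clock) could look roughly like the
prefix match below. This is a sketch under assumptions: the name table,
its order and toy_match_ts() are hypothetical and are not taken from the
patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical clock names; the strings and order in the patch may differ. */
static const char *const toy_ts_names[] = {
	"disabled", "local", "boottime", "monotonic", "realtime",
};

/*
 * Return the index of the first name that starts with the given input,
 * or -1 if the input is empty or matches nothing.
 */
static int toy_match_ts(const char *input)
{
	size_t len = strlen(input);
	size_t count = sizeof(toy_ts_names) / sizeof(toy_ts_names[0]);

	if (len == 0)
		return -1;
	for (size_t i = 0; i < count; i++)
		if (strncmp(toy_ts_names[i], input, len) == 0)
			return (int)i;
	return -1;
}
```

Because the comparison covers only the length of the input, any
non-empty prefix of a name selects that entry, while an input longer than
the name does not match.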


[PATCH] arm64: fix documentation on kernel pages mappings to HYP VA

2017-09-13 Thread Yury Norov
The Documentation/arm64/memory.txt says:
When using KVM, the hypervisor maps kernel pages in EL2, at a fixed
offset from the kernel VA (top 24bits of the kernel VA set to zero):

In fact, kernel addresses are translated to HYP with the kern_hyp_va macro,
which has more options, and none of them clears the top 24 bits
of the kernel VA.

Signed-off-by: Yury Norov 
---
 Documentation/arm64/memory.txt | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
index d7273a5f6456..c39895d7e3a2 100644
--- a/Documentation/arm64/memory.txt
+++ b/Documentation/arm64/memory.txt
@@ -86,9 +86,12 @@ Translation table lookup with 64KB pages:
  +-> [63] TTBR0/1
 
 
-When using KVM, the hypervisor maps kernel pages in EL2, at a fixed
-offset from the kernel VA (top 24bits of the kernel VA set to zero):
-
-Start  End Size    Use

-0040   007f 256GB  kernel objects mapped in HYP
+When using KVM without Virtualization Host Extensions, the hypervisor maps
+kernel pages in EL2, at a fixed offset from the kernel VA. Namely, top 16
+or 25 bits of the kernel VA set to zero depending on ARM64_VA_BITS_48 or
+ARM64_VA_BITS_39 config option selected; or top 17 or 26 bits of the kernel
+VA set to zero if CPU has Reduced HYP mapping offset capability. See
+kern_hyp_va macro.
+
+When using KVM with Virtualization Host Extensions, no additional mappings
+created as host kernel already operates in EL2.
-- 
2.11.0
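
The bit counts in the new text follow from masking a 64-bit kernel VA
down to the configured VA width. A minimal user-space sketch of that
arithmetic is shown below; toy_hyp_va() is a hypothetical helper and only
approximates what the kern_hyp_va macro does.

```c
#include <stdint.h>

/*
 * Clear the top (64 - va_bits) bits of a kernel VA. When 'reduced' is
 * non-zero, one additional bit is cleared, as for the reduced HYP
 * mapping offset case described in the patch.
 */
static uint64_t toy_hyp_va(uint64_t kva, unsigned int va_bits, int reduced)
{
	unsigned int keep = reduced ? va_bits - 1 : va_bits;
	uint64_t mask = (UINT64_C(1) << keep) - 1;

	return kva & mask;
}
```

With va_bits = 48 the mask clears the top 16 bits (17 in the reduced
case); with va_bits = 39 it clears the top 25 bits (26 in the reduced
case), matching the numbers in the documentation text.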



[PATCH v2 0/3] led: ledtrig-transient: add support for hrtimer

2017-09-13 Thread David Lin
Hi,

This patch series adds LED_BRIGHTNESS_FAST flag support so that
ledtrig-transient can use an hrtimer, giving platforms with high-resolution
timer support better accuracy in the trigger duration timing. The need for
this support is driven by the fact that Android has removed timed_output [1]
and now uses led-trigger for vibrator control, which requires the timer to
be accurate to within a millisecond. This flag also allows the hrtimer to
co-exist with the ktimer without causing warnings in existing drivers [2].

David

[1] https://patchwork.kernel.org/patch/8664831/
[2] https://lkml.org/lkml/2015/4/28/260

Changes from v1 to v2:
- Convert all the bit shifting flag in leds.h to use the BIT macro.
- Removed inline modifiers for the timer helper function.

David Lin (3):
  leds: Replace flags bit shift with BIT() macros
  leds: Add the LED_BRIGHTNESS_FAST flag
  led: ledtrig-transient: add support for hrtimer

 Documentation/leds/leds-class.txt|  5 +++
 drivers/leds/trigger/ledtrig-transient.c | 59 +---
 include/linux/leds.h | 19 +-
 3 files changed, 69 insertions(+), 14 deletions(-)

-- 
2.14.1.581.gf28d330327-goog
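
As a rough illustration of why a jiffies-based timer cannot give
millisecond accuracy on low-HZ configurations, the sketch below models
rounding a millisecond duration up to whole ticks. It is a simplified
model with hypothetical helper names; the kernel's msecs_to_jiffies() and
the timer wheel differ in detail.

```c
/* Round a duration in milliseconds up to whole ticks at 'hz' ticks/second. */
static unsigned long toy_ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* Shortest delay, in microseconds, that the tick-based timer can express. */
static unsigned long toy_tick_delay_us(unsigned long ms, unsigned long hz)
{
	return toy_ms_to_ticks(ms, hz) * (1000000UL / hz);
}
```

Under this model a 1 ms request at HZ=100 becomes one 10 ms tick, and a
15 ms request at HZ=250 becomes four ticks, or 16 ms. Only at HZ=1000
does the tick granularity reach 1 ms.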



[PATCH v2 1/3] leds: Replace flags bit shift with BIT() macros

2017-09-13 Thread David Lin
This is for readability as well as to avoid checkpatch warnings when
adding new bit flag information in the future.

Signed-off-by: David Lin 
---
 include/linux/leds.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index bf6db4fe895b..5579c64c8fd6 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -40,16 +40,16 @@ struct led_classdev {
int  flags;
 
/* Lower 16 bits reflect status */
-#define LED_SUSPENDED  (1 << 0)
-#define LED_UNREGISTERING  (1 << 1)
+#define LED_SUSPENDED  BIT(0)
+#define LED_UNREGISTERING  BIT(1)
/* Upper 16 bits reflect control information */
-#define LED_CORE_SUSPENDRESUME (1 << 16)
-#define LED_SYSFS_DISABLE  (1 << 17)
-#define LED_DEV_CAP_FLASH  (1 << 18)
-#define LED_HW_PLUGGABLE   (1 << 19)
-#define LED_PANIC_INDICATOR    (1 << 20)
-#define LED_BRIGHT_HW_CHANGED  (1 << 21)
-#define LED_RETAIN_AT_SHUTDOWN (1 << 22)
+#define LED_CORE_SUSPENDRESUME BIT(16)
+#define LED_SYSFS_DISABLE  BIT(17)
+#define LED_DEV_CAP_FLASH  BIT(18)
+#define LED_HW_PLUGGABLE   BIT(19)
+#define LED_PANIC_INDICATOR    BIT(20)
+#define LED_BRIGHT_HW_CHANGED  BIT(21)
+#define LED_RETAIN_AT_SHUTDOWN BIT(22)
 
/* set_brightness_work / blink_timer flags, atomic, private. */
unsigned long   work_flags;
-- 
2.14.1.581.gf28d330327-goog
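
The conversion does not change the flag values. A small user-space check
is sketched below; TOY_BIT() is a local stand-in that assumes the
kernel's BIT() expands to an unsigned long shift, and the TOY_ names are
hypothetical.

```c
/* Local stand-in for the kernel's BIT() helper (assumed: 1UL << nr). */
#define TOY_BIT(nr)		(1UL << (nr))

#define TOY_LED_SUSPENDED	TOY_BIT(0)
#define TOY_LED_SYSFS_DISABLE	TOY_BIT(17)

/* Return 1 if 'flag' is set in 'flags', 0 otherwise. */
static int toy_flag_is_set(unsigned long flags, unsigned long flag)
{
	return (flags & flag) != 0;
}
```

TOY_BIT(17) has the same value as (1 << 17), so existing tests of the
form "flags & LED_SYSFS_DISABLE" behave as before.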



[PATCH v2 2/3] leds: Add the LED_BRIGHTNESS_FAST flag

2017-09-13 Thread David Lin
This patch adds the LED_BRIGHTNESS_FAST flag to allow the driver to
indicate that the brightness_set() callback is implemented on a fastpath
so that the LED core may choose to for example use a hrtimer to
implement the duration of a trigger for better timing accuracy.

Suggested-by: Jacek Anaszewski 
Signed-off-by: David Lin 
---
 Documentation/leds/leds-class.txt | 5 +
 include/linux/leds.h  | 1 +
 2 files changed, 6 insertions(+)

diff --git a/Documentation/leds/leds-class.txt b/Documentation/leds/leds-class.txt
index 836cb16d6f09..70d7a3dca621 100644
--- a/Documentation/leds/leds-class.txt
+++ b/Documentation/leds/leds-class.txt
@@ -80,6 +80,11 @@ flag must be set in flags before registering. Calling
 led_classdev_notify_brightness_hw_changed on a classdev not registered with
 the LED_BRIGHT_HW_CHANGED flag is a bug and will trigger a WARN_ON.
 
+Optionally, the driver may choose to register with the LED_BRIGHTNESS_FAST flag.
+This flag indicates that the driver implements the brightness_set() callback
+function using a fastpath so the LED core can use hrtimer if the driver requires
+high precision for the trigger timing.
+
 Hardware accelerated blink of LEDs
 ==
 
diff --git a/include/linux/leds.h b/include/linux/leds.h
index 5579c64c8fd6..ccfa0a1799fe 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -50,6 +50,7 @@ struct led_classdev {
 #define LED_PANIC_INDICATOR    BIT(20)
 #define LED_BRIGHT_HW_CHANGED  BIT(21)
 #define LED_RETAIN_AT_SHUTDOWN BIT(22)
+#define LED_BRIGHTNESS_FAST    BIT(23)
 
/* set_brightness_work / blink_timer flags, atomic, private. */
unsigned long   work_flags;
-- 
2.14.1.581.gf28d330327-goog



[PATCH v2 3/3] led: ledtrig-transient: add support for hrtimer

2017-09-13 Thread David Lin
This patch adds a hrtimer to ledtrig-transient so that when driver is
registered with LED_BRIGHTNESS_FAST, the hrtimer is used for the better
time accuracy in handling the duration.

Signed-off-by: David Lin 
---
 drivers/leds/trigger/ledtrig-transient.c | 59 +---
 1 file changed, 54 insertions(+), 5 deletions(-)

diff --git a/drivers/leds/trigger/ledtrig-transient.c b/drivers/leds/trigger/ledtrig-transient.c
index 7e6011bd3646..7d2ce757b39d 100644
--- a/drivers/leds/trigger/ledtrig-transient.c
+++ b/drivers/leds/trigger/ledtrig-transient.c
@@ -24,15 +24,18 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "../leds.h"
 
 struct transient_trig_data {
+   struct led_classdev *led_cdev;
int activate;
int state;
int restore_state;
unsigned long duration;
struct timer_list timer;
+   struct hrtimer hrtimer;
 };
 
 static void transient_timer_function(unsigned long data)
@@ -44,6 +47,54 @@ static void transient_timer_function(unsigned long data)
led_set_brightness_nosleep(led_cdev, transient_data->restore_state);
 }
 
+static enum hrtimer_restart transient_hrtimer_function(struct hrtimer *timer)
+{
+   struct transient_trig_data *transient_data =
+   container_of(timer, struct transient_trig_data, hrtimer);
+
+   transient_timer_function((unsigned long)transient_data->led_cdev);
+
+   return HRTIMER_NORESTART;
+}
+
+static void transient_timer_setup(struct led_classdev *led_cdev)
+{
+   struct transient_trig_data *tdata = led_cdev->trigger_data;
+
+   if (led_cdev->flags & LED_BRIGHTNESS_FAST) {
+   tdata->led_cdev = led_cdev;
+   hrtimer_init(&tdata->hrtimer, CLOCK_MONOTONIC,
+HRTIMER_MODE_REL);
+   tdata->hrtimer.function = transient_hrtimer_function;
+   } else {
+   setup_timer(&tdata->timer, transient_timer_function,
+   (unsigned long)led_cdev);
+   }
+}
+
+static void transient_timer_start(struct led_classdev *led_cdev)
+{
+   struct transient_trig_data *tdata = led_cdev->trigger_data;
+
+   if (led_cdev->flags & LED_BRIGHTNESS_FAST) {
+   hrtimer_start(&tdata->hrtimer, ms_to_ktime(tdata->duration),
+ HRTIMER_MODE_REL);
+   } else {
+   mod_timer(&tdata->timer,
+ jiffies + msecs_to_jiffies(tdata->duration));
+   }
+}
+
+static void transient_timer_cancel(struct led_classdev *led_cdev)
+{
+   struct transient_trig_data *tdata = led_cdev->trigger_data;
+
+   if (led_cdev->flags & LED_BRIGHTNESS_FAST)
+   hrtimer_cancel(&tdata->hrtimer);
+   else
+   del_timer_sync(&tdata->timer);
+}
+
 static ssize_t transient_activate_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -70,7 +121,7 @@ static ssize_t transient_activate_store(struct device *dev,
 
/* cancel the running timer */
if (state == 0 && transient_data->activate == 1) {
-   del_timer(&transient_data->timer);
+   transient_timer_cancel(led_cdev);
transient_data->activate = state;
led_set_brightness_nosleep(led_cdev,
transient_data->restore_state);
@@ -84,8 +135,7 @@ static ssize_t transient_activate_store(struct device *dev,
led_set_brightness_nosleep(led_cdev, transient_data->state);
transient_data->restore_state =
(transient_data->state == LED_FULL) ? LED_OFF : LED_FULL;
-   mod_timer(&transient_data->timer,
- jiffies + msecs_to_jiffies(transient_data->duration));
+   transient_timer_start(led_cdev);
}
 
/* state == 0 && transient_data->activate == 0
@@ -182,8 +232,7 @@ static void transient_trig_activate(struct led_classdev *led_cdev)
if (rc)
goto err_out_state;
 
-   setup_timer(&tdata->timer, transient_timer_function,
-   (unsigned long) led_cdev);
+   transient_timer_setup(led_cdev);
led_cdev->activated = true;
 
return;
-- 
2.14.1.581.gf28d330327-goog
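
The hrtimer callback in this patch receives only a pointer to the
embedded timer and recovers the enclosing trigger data from it. That
pattern can be sketched in user space as below; the toy_ types and the
toy_container_of() macro are hypothetical stand-ins for the kernel's
struct hrtimer, struct transient_trig_data and container_of().

```c
#include <stddef.h>

/* Stand-in for a timer object embedded in a larger structure. */
struct toy_timer {
	int expired;
};

/* Stand-in for the per-trigger data that embeds the timer. */
struct toy_trig_data {
	int restore_state;
	struct toy_timer hrtimer;
};

/* Recover a pointer to the enclosing structure from a member pointer. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Timer callback: it is handed the embedded timer, marks it expired and
 * reads state from the structure that contains it.
 */
static int toy_timer_callback(struct toy_timer *timer)
{
	struct toy_trig_data *data =
		toy_container_of(timer, struct toy_trig_data, hrtimer);

	timer->expired = 1;
	return data->restore_state;
}
```

Calling toy_timer_callback() with the address of the embedded member
returns the restore_state of the containing object, which is why no
separate context pointer has to be passed to the callback.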



Re: [v8 0/4] cgroup-aware OOM killer

2017-09-13 Thread Michal Hocko
On Mon 11-09-17 13:44:39, David Rientjes wrote:
> On Mon, 11 Sep 2017, Roman Gushchin wrote:
> 
> > This patchset makes the OOM killer cgroup-aware.
> > 
> > v8:
> >   - Do not kill tasks with OOM_SCORE_ADJ -1000
> >   - Make the whole thing opt-in with cgroup mount option control
> >   - Drop oom_priority for further discussions
> 
> Nack, we specifically require oom_priority for this to function correctly, 
> otherwise we cannot prefer to kill from low priority leaf memcgs as 
> required.

While I understand that your usecase might require priorities I do not
think this part missing is a reason to nack the cgroup based selection
and kill-all parts. This can be done on top. The only important part
right now is the current selection semantic (only leaf memcgs vs. size
of the hierarchy). I strongly believe that comparing only leaf memcgs
is more straightforward and it doesn't lead to unexpected results as
mentioned before (kill a small memcg which is a part of the larger
sub-hierarchy).

I didn't get to read the new version of this series yet and hope to get
to it soon.
-- 
Michal Hocko
SUSE Labs


Re: [v8 3/4] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

2017-09-13 Thread Michal Hocko
On Tue 12-09-17 21:01:15, Roman Gushchin wrote:
> On Mon, Sep 11, 2017 at 01:48:39PM -0700, David Rientjes wrote:
> > On Mon, 11 Sep 2017, Roman Gushchin wrote:
> > 
> > > Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware
> > > OOM killer. If not set, the OOM selection is performed in
> > > a "traditional" per-process way.
> > > 
> > > The behavior can be changed dynamically by remounting the cgroupfs.
> > 
> > I can't imagine that Tejun would be happy with a new mount option, 
> > especially when it's not required.
> > 
> > OOM behavior does not need to be defined at mount time and for the entire 
> > hierarchy.  It's possible to very easily implement a tunable as part of 
> > mem cgroup that is propagated to descendants and controls the oom scoring 
> > behavior for that hierarchy.  It does not need to be system wide and 
> > affect scoring of all processes based on which mem cgroup they are 
> > attached to at any given time.
> 
> No, I don't think that mixing per-cgroup and per-process OOM selection
> algorithms is a good idea.
> 
> So, there are 3 reasonable options:
> 1) boot option
> 2) sysctl
> 3) cgroup mount option
> 
> I believe, 3) is better, because it allows changing the behavior dynamically,
> and explicitly depends on v2 (what sysctl lacks).

I see your argument here. I would just be worried that we end up really
needing more oom strategies in future and those wouldn't fit into memcg
mount option scope. So 1/2 sounds more extensible to me long term. Boot
time would be easier because we do not have to bother with dynamic selection
in that case.

> So, the only question is should it be opt-in or opt-out option.
> Personally, I would prefer opt-out, but Michal has a very strong opinion here.

Yes I still strongly believe this has to be opt-in.
-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH v2 0/7] x86/idle: add halt poll support

2017-09-13 Thread Yang Zhang

On 2017/8/29 22:56, Michael S. Tsirkin wrote:
> On Tue, Aug 29, 2017 at 11:46:34AM +, Yang Zhang wrote:
> > Some latency-intensive workload will see obviously performance
> > drop when running inside VM.
>
> But are we trading a lot of CPU for a bit of lower latency?
>
> > The main reason is that the overhead
> > is amplified when running inside VM. The most cost i have seen is
> > inside idle path.
> >
> > This patch introduces a new mechanism to poll for a while before
> > entering idle state. If schedule is needed during poll, then we
> > don't need to goes through the heavy overhead path.
>
> Isn't it the job of an idle driver to find the best way to
> halt the CPU?
>
> It looks like just by adding a cstate we can make it
> halt at higher latencies only. And at lower latencies,
> if it's doing a good job we can hopefully use mwait to
> stop the CPU.
>
> In fact I have been experimenting with exactly that.
> Some initial results are encouraging but I could use help
> with testing and especially tuning. If you can help
> pls let me know!

Quan, Can you help to test it and give result? Thanks.

--
Yang
Alibaba Cloud Computing