Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Aaron Tomlin
On Fri 2021-03-26 16:36 +0100, Michal Hocko wrote:
> We should be focusing on the compaction retry logic and see whether we
> can have some "run away" scenarios there.

Fair enough.

> Seeing so many retries without compaction bailing out sounds like a bug
> in that retry logic.

Agreed - I will continue to look into this.



Kind regards,

-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-26 Thread Aaron Tomlin
Hi Michal,

On Fri 2021-03-26 09:16 +0100, Michal Hocko wrote:
> The oom killer is never triggered for costly allocation request.

Yes - I agree. Looking at __alloc_pages_may_oom() I can see for a costly
order allocation request the OOM killer is explicitly not used.
If I understand correctly, the patch I proposed was for the following
scenarios:

  1.The costly order allocation request to fail when
"some" progress is made (i.e. order-0) and the last
compaction request was "skipped"

  2.A non-costly order allocation request that is
unable to make any progress and failed over the
maximum reclaim retry count in a row and the last
known compaction result was skipped to consider
using the OOM killer for assistance

> Both reclaim and compaction maintain their own retries counters as they
> are targeting a different operation. Although the compaction really
> depends on the reclaim to do some progress.

Yes. Looking at should_compact_retry() if the last known compaction result
was skipped i.e. suggesting there was not enough order-0 pages to support
compaction, so assistance is needed from reclaim
(see __compaction_suitable()).

I noticed that the value of compaction_retries, compact_result and
compact_priority was 0, COMPACT_SKIPPED and 1 i.e. COMPACT_PRIO_SYNC_LIGHT,
respectively.

> OK, this sound unexpected as it indicates that the reclaim is able to
> make a forward progress but compaction doesn't want to give up and keeps
> retrying. Are you able to reproduce this or could you find out which
> specific condition keeps compaction retrying? I would expect that it is
> one of the 3 conditions before the max_retries is checked.

Unfortunately, I have been told it is not entirely reproducible.
I suspect it is the following in should_compact_retry() - as I indicated
above the last known value stored in compaction_retries was 0:


if (order > PAGE_ALLOC_COSTLY_ORDER)
max_retries /= 4;
if (*compaction_retries <= max_retries) {
ret = true;
    goto out;
}




Kind regards,

-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-25 Thread Aaron Tomlin
On Mon 2021-03-22 11:47 +0100, Michal Hocko wrote:
> Costly orders already do have heuristics for the retry in place. Could
> you be more specific what kind of problem you see with those?

If I understand correctly, when the gfp_mask consists of
GFP_KERNEL | __GFP_RETRY_MAYFAIL in particular, an allocation request will
fail, if and only if reclaim is unable to make progress.

The costly order allocation retry logic is handled primarily in
should_reclaim_retry(). Looking at should_reclaim_retry() we see that the
no progress counter value is always incremented in the costly order case
even when "some" progress has been made which is represented by the value
stored in did_some_progress.

if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
goto nopage;

if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 did_some_progress > 0, _progress_loops))
goto retry;

I think after we have tried MAX_RECLAIM_RETRIES in a row without success
and the last known compaction attempt was "skipped", perhaps it would be
better to try and use the OOM killer or fail the allocation attempt?

I encountered a situation when the value of no_progress_loops was found to
be 31,611,688 i.e. significantly over MAX_RECLAIM_RETRIES; the allocation
order was 5. The gfp_mask contained the following:

 #define ___GFP_HIGHMEM  0x02
 #define ___GFP_IO   0x40
 #define ___GFP_FS   0x80
 #define ___GFP_NOWARN   0x200
 #define ___GFP_RETRY_MAYFAIL0x400
 #define ___GFP_COMP 0x4000
 #define ___GFP_HARDWALL 0x2
 #define ___GFP_DIRECT_RECLAIM   0x20
 #define ___GFP_KSWAPD_RECLAIM   0x40



-- 
Aaron Tomlin



Re: [PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-19 Thread Aaron Tomlin
Hi Michal,

On Thu 2021-03-18 17:16 +0100, Michal Hocko wrote:
> On Mon 15-03-21 16:58:37, Aaron Tomlin wrote:
> > In the situation where direct reclaim is required to make progress for
> > compaction but no_progress_loops is already over the limit of
> > MAX_RECLAIM_RETRIES consider invoking the oom killer.

Firstly, thank you for your response.

> What is the problem you are trying to fix?

If I understand correctly, in the case of a "costly" order allocation
request that is permitted to repeatedly retry, it is possible to exceed the
maximum reclaim retry threshold as long as "some" progress is being made
even at the highest compaction priority. Furthermore, if the allocator has
a fatal signal pending, this is not considered.

In my opinion, it might be better to just give up straight away or try and
use the OOM killer only in the non-costly order allocation scenario to
assit reclaim. Looking at __alloc_pages_may_oom() the current logic is to
entirely skip the OOM killer for a costly order request, which makes sense.


Regards,

-- 
Aaron Tomlin



[PATCH] mm/page_alloc: try oom if reclaim is unable to make forward progress

2021-03-15 Thread Aaron Tomlin
In the situation where direct reclaim is required to make progress for
compaction but no_progress_loops is already over the limit of
MAX_RECLAIM_RETRIES consider invoking the oom killer.

Signed-off-by: Aaron Tomlin 
---
 mm/page_alloc.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a2c89b21115..8d748b1b8d1e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4181,6 +4181,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned 
int order,
return NULL;
 }
 
+static inline bool
+should_try_oom(int no_progress_loops,
+   enum compact_result last_compact_result)
+{
+   if (no_progress_loops > MAX_RECLAIM_RETRIES && last_compact_result
+   == COMPACT_SKIPPED)
+   return true;
+   return false;
+}
+
 static inline bool
 should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 enum compact_result compact_result,
@@ -4547,10 +4557,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 * Make sure we converge to OOM if we cannot make any progress
 * several times in the row.
 */
-   if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-   /* Before OOM, exhaust highatomic_reserve */
-   return unreserve_highatomic_pageblock(ac, true);
-   }
+   if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+   result false;
+   /* Last chance before OOM, try draining highatomic_reserve once */
+   else if (*no_progress_loops == MAX_RECLAIM_RETRIES)
+   return unreserve_highatomic_pageblock(ac, true)
 
/*
 * Keep reclaiming pages while there is a chance this will lead
@@ -4822,6 +4833,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 did_some_progress > 0, _progress_loops))
goto retry;
 
+   if (should_try_oom(no_progress_loops, compact_result))
+   goto oom:
/*
 * It doesn't make any sense to retry for the compaction if the order-0
 * reclaim is not able to make any progress because the current
@@ -4839,6 +4852,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (check_retry_cpuset(cpuset_mems_cookie, ac))
goto retry_cpuset;
 
+oom:
/* Reclaim has failed us, start killing things */
page = __alloc_pages_may_oom(gfp_mask, order, ac, _some_progress);
if (page)
-- 
2.26.2



Re: [PATCH] memcg: add support to generate the total count of children from root

2020-11-26 Thread Aaron Tomlin
On Tue 2020-11-24 13:47 +, Aaron Tomlin wrote:
> On Tue, 24 Nov 2020 at 13:36, Michal Hocko  wrote:
> > This like any other user visible interface would be a much easier sell
> > if there was a clear usecase to justify it. I do not see anything
> > controversial about exporting such a value but my general take is that
> > we are only adding new interface when existing ones are insufficient. A
> > performance might be a very good reason but that would really require to
> > come with some real life numbers.

Michal,

> Fair enough and understood.
> 
> At this stage, I unfortunately do not have such supporting data. This was only
> useful in an isolated situation. Having said this, I thought that the
> aforementioned interface would be helpful to others, in particular, given the
> known limitation.

Furthermore, I can see that this is already provided via /proc/cgroups
(see proc_cgroupstats_show()). As such, another possibility:
the proposed interface could reliably produce the difference between the
maximum permitted memory-controlled cgroup count and the used count
(i.e. the remaining memory cgroup count, from root); albeit, I doubt that
this would be particularly useful to others i.e., the use-case would be
rather limited.


Kind regards,

-- 
Aaron Tomlin



Re: [PATCH] memcg: add support to generate the total count of children from root

2020-11-24 Thread Aaron Tomlin
On Tue, 24 Nov 2020 at 13:36, Michal Hocko  wrote:

Michal,

> This like any other user visible interface would be a much easier sell
> if there was a clear usecase to justify it. I do not see anything
> controversial about exporting such a value but my general take is that
> we are only adding new interface when existing ones are insufficient. A
> performance might be a very good reason but that would really require to
> come with some real life numbers.

Fair enough and understood.

At this stage, I unfortunately do not have such supporting data. This was only
useful in an isolated situation. Having said this, I thought that the
aforementioned interface would be helpful to others, in particular, given the
known limitation.

Kind regards,
-- 
Aaron Tomlin



Re: [PATCH] memcg: add support to generate the total count of children from root

2020-11-24 Thread Aaron Tomlin
On Tue, 24 Nov 2020 at 11:26, Michal Hocko  wrote:
>
> On Tue 24-11-20 10:58:36, Aaron Tomlin wrote:
> > Each memory-controlled cgroup is assigned a unique ID and the total
> > number of memory cgroups is limited to MEM_CGROUP_ID_MAX.
> >
> > This patch provides the ability to determine the number of
> > memory cgroups from the root memory cgroup, only.
> > A value of 1 (i.e. self count) is returned if there are no children.
> > For example, the number of memory cgroups can be established by
> > reading the /sys/fs/cgroup/memory/memory.total_cnt file.
>

Hi Michal,

> Could you add some explanation why is this information useful for
> userspace? Who is going to use it and why a simple scripting on top of
> cgroupfs is insufficient.

Thank you for your feedback.

Indeed, one can use a command/script to manually calculate this.
Having said that, one that creates a significant number of
memory-controlled cgroups may prefer a quick, simple and reliable method
to generate the aforementioned data, for management purposes only.
As such, I thought this patch might be particularly useful.

Kind regards,
-- 
Aaron Tomlin



[PATCH] memcg: add support to generate the total count of children from root

2020-11-24 Thread Aaron Tomlin
Each memory-controlled cgroup is assigned a unique ID and the total
number of memory cgroups is limited to MEM_CGROUP_ID_MAX.

This patch provides the ability to determine the number of
memory cgroups from the root memory cgroup, only.
A value of 1 (i.e. self count) is returned if there are no children.
For example, the number of memory cgroups can be established by
reading the /sys/fs/cgroup/memory/memory.total_cnt file.

Signed-off-by: Aaron Tomlin 
---
 mm/memcontrol.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 29459a6ce1c7..a4f7cb40e233 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4535,6 +4535,19 @@ static int mem_cgroup_oom_control_write(struct 
cgroup_subsys_state *css,
return 0;
 }
 
+static int mem_cgroup_total_count_read(struct cgroup_subsys_state *css,
+ struct cftype *cft)
+{
+   struct mem_cgroup *iter, *memcg = mem_cgroup_from_css(css);
+   int num = 0;
+
+   for_each_mem_cgroup_tree(iter, memcg)
+   num++;
+
+   /* Returns 1 (i.e. self count) if no children. */
+   return num;
+}
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 
 #include 
@@ -5050,6 +5063,11 @@ static struct cftype mem_cgroup_legacy_files[] = {
.write_u64 = mem_cgroup_oom_control_write,
.private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL),
},
+   {
+   .name = "total_cnt",
+   .flags = CFTYPE_ONLY_ON_ROOT,
+   .read_u64 = mem_cgroup_total_count_read,
+   },
{
.name = "pressure_level",
},
-- 
2.26.2



Re: [PATCH v5] proc/sysctl: add shared variables for range check

2019-05-23 Thread Aaron Tomlin
On Tue 2019-04-30 20:01 +0200, Matteo Croce wrote:
> In the sysctl code the proc_dointvec_minmax() function is often used to
> validate the user supplied value between an allowed range. This function
> uses the extra1 and extra2 members from struct ctl_table as minimum and
> maximum allowed value.
> 
> On sysctl handler declaration, in every source file there are some readonly
> variables containing just an integer which address is assigned to the
> extra1 and extra2 members, so the sysctl range is enforced.
> 
> The special values 0, 1 and INT_MAX are very often used as range boundary,
> leading duplication of variables like zero=0, one=1, int_max=INT_MAX in
> different source files:
> 
> $ git grep -E '\.extra[12].*&(zero|one|int_max)\b' |wc -l
> 248
> 
> Add a const int array containing the most commonly used values,
> some macros to refer more easily to the correct array member,
> and use them instead of creating a local one for every object file.
> 
> This is the bloat-o-meter output comparing the old and new binary
> compiled with the default Fedora config:
> 
> # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
> add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
> Data old new   delta
> sysctl_vals-  12 +12
> __kstrtab_sysctl_vals  -  12 +12
> max   14  10  -4
> int_max   16   - -16
> one   68   - -68
> zero 128  28-100
> Total: Before=20583249, After=20583085, chg -0.00%
> 
> Signed-off-by: Matteo Croce 
> ---

Nice idea.

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


Re: [PATCH RFC 4/5] mm/ksm, proc: introduce remote merge

2019-05-16 Thread Aaron Tomlin
On Thu 2019-05-16 12:00 +0200, Jann Horn wrote:
> On Thu, May 16, 2019 at 11:43 AM Oleksandr Natalenko
>  wrote:
[ ... ]
> > +   }
> > +
> > +   down_write(>mmap_sem);
> 
> Should a check for mmget_still_valid(mm) be inserted here? See commit
> 04f5866e41fb70690e28397487d8bd8eea7d712a.

Yes - I'd say this is required here.

Thanks,

-- 
Aaron Tomlin


Re: [PATCH RFC v2 3/4] mm/ksm: introduce force_madvise knob

2019-05-14 Thread Aaron Tomlin
On Tue 2019-05-14 15:16 +0200, Oleksandr Natalenko wrote:
> Present a new sysfs knob to mark task's anonymous memory as mergeable.
> 
> To force merging task's VMAs, its PID is echoed in a write-only file:
> 
># echo PID > /sys/kernel/mm/ksm/force_madvise
> 
> Force unmerging is done similarly, but with "minus" sign:
> 
># echo -PID > /sys/kernel/mm/ksm/force_madvise
> 
> "0" or "-0" can be used to control the current task.
> 
> To achieve this, previously introduced ksm_enter()/ksm_leave() helpers
> are used in the "store" handler.
> 
> Signed-off-by: Oleksandr Natalenko 
> ---
>  mm/ksm.c | 68 
>  1 file changed, 68 insertions(+)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index e9f3901168bb..22c59fb03d3a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2879,10 +2879,77 @@ static void wait_while_offlining(void)
>  
>  #define KSM_ATTR_RO(_name) \
>   static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> +#define KSM_ATTR_WO(_name) \
> + static struct kobj_attribute _name##_attr = __ATTR_WO(_name)
>  #define KSM_ATTR(_name) \
>   static struct kobj_attribute _name##_attr = \
>   __ATTR(_name, 0644, _name##_show, _name##_store)
>  
> +static ssize_t force_madvise_store(struct kobject *kobj,
> +  struct kobj_attribute *attr,
> +  const char *buf, size_t count)
> +{
> + int err;
> + pid_t pid;
> + bool merge = true;
> + struct task_struct *tsk;
> + struct mm_struct *mm;
> + struct vm_area_struct *vma;
> +
> + err = kstrtoint(buf, 10, );
> + if (err)
> + return -EINVAL;
> +
> + if (pid < 0) {
> + pid = abs(pid);
> + merge = false;
> + }
> +
> + if (!pid && *buf == '-')
> + merge = false;
> +
> + rcu_read_lock();
> + if (pid) {
> + tsk = find_task_by_vpid(pid);
> + if (!tsk) {
> + err = -ESRCH;
> + rcu_read_unlock();
> + goto out;
> + }
> + } else {
> + tsk = current;
> + }
> +
> + tsk = tsk->group_leader;
> +
> + get_task_struct(tsk);
> + rcu_read_unlock();
> +
> + mm = get_task_mm(tsk);
> + if (!mm) {
> + err = -EINVAL;
> + goto out_put_task_struct;
> + }
> + down_write(>mmap_sem);
> + vma = mm->mmap;
> + while (vma) {
> + if (merge)
> + ksm_enter(vma->vm_mm, vma, >vm_flags);
> + else
> + ksm_leave(vma, vma->vm_start, vma->vm_end, 
> >vm_flags);
> + vma = vma->vm_next;
> + }
> + up_write(>mmap_sem);
> + mmput(mm);
> +
> +out_put_task_struct:
> + put_task_struct(tsk);
> +
> +out:
> + return err ? err : count;
> +}
> +KSM_ATTR_WO(force_madvise);
> +
>  static ssize_t sleep_millisecs_show(struct kobject *kobj,
>       struct kobj_attribute *attr, char *buf)
>  {
> @@ -3185,6 +3252,7 @@ static ssize_t full_scans_show(struct kobject *kobj,
>  KSM_ATTR_RO(full_scans);
>  
>  static struct attribute *ksm_attrs[] = {
> + _madvise_attr.attr,
>   _millisecs_attr.attr,
>   _to_scan_attr.attr,
>   _attr.attr,

Looks fine to me.

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


Re: [PATCH] mm/slub: avoid double string traverse in kmem_cache_flags()

2019-05-14 Thread Aaron Tomlin
On Tue 2019-04-30 22:31 -0700, Yury Norov wrote:
> If ',' is not found, kmem_cache_flags() calls strlen() to find the end
> of line. We can do it in a single pass using strchrnul().
> 
> Signed-off-by: Yury Norov 
> ---
>  mm/slub.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 4922a0394757..85f90370a293 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1317,9 +1317,7 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
>   char *end, *glob;
>   size_t cmplen;
>  
> - end = strchr(iter, ',');
> - if (!end)
> - end = iter + strlen(iter);
> + end = strchrnul(iter, ',');
>  
>   glob = strnchr(iter, end - iter, '*');
>       if (glob)

Fair enough.

Acked-by: Aaron Tomlin 

-- 
Aaron Tomlin


[PATCH v3] slub: extend slub debug to handle multiple slabs

2018-09-28 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2

Signed-off-by: Aaron Tomlin 
---
Changes from v2 [2]:

 - Add a function and kernel-doc comment
 - Refactor code
 - Use the appropriate helper function max_t()

Changes from v1 [1]:

 - Add appropriate cast to address compiler warning

[1]: https://marc.info/?l=linux-kernel=153657804506284=2
[2]: https://marc.info/?l=linux-kernel=153747362316965=2
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 50 +--
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..7927f971df0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1276,16 +1276,54 @@ static int __init setup_slub_debug(char *str)
 
 __setup("slub_debug", setup_slub_debug);
 
+/*
+ * kmem_cache_flags - apply debugging options to the cache
+ * @object_size:   the size of an object without meta data
+ * @flags: flags to set
+ * @name:  name of the cache
+ * @ctor:  constructor function
+ *
+ * Debug option(s) are applied to @flags. In addition to the debug
+ * option(s), if a slab name (or multiple) is specified i.e.
+ * slub_debug=,, ...
+ * then only the select slabs will receive the debug option(s).
+ */
 slab_flags_t kmem_cache_flags(unsigned int object_size,
slab_flags_t flags, const char *name,
void (*ctor)(void *))
 {
-   /*
-* Enable debugging if selected on the kernel commandline.
-*/
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+   char *iter;
+   size_t len;
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   len = strlen(name);
+   iter = slub_debug_slabs;
+   while (*iter) {
+   char *end, *glob;
+   size_t cmplen;
+
+   end = strchr(iter, ',');
+   if (!end)
+   end = iter + strlen(iter);
+
+   glob = strnchr(iter, end - iter, '*');
+   if (glob)
+   cmplen = glob - iter;
+   else
+   cmplen = max_t(size_t, len, (end - iter));
+
+   if (!strncmp(name, iter, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   iter = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



[PATCH v3] slub: extend slub debug to handle multiple slabs

2018-09-28 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2

Signed-off-by: Aaron Tomlin 
---
Changes from v2 [2]:

 - Add a function and kernel-doc comment
 - Refactor code
 - Use the appropriate helper function max_t()

Changes from v1 [1]:

 - Add appropriate cast to address compiler warning

[1]: https://marc.info/?l=linux-kernel=153657804506284=2
[2]: https://marc.info/?l=linux-kernel=153747362316965=2
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 50 +--
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..7927f971df0d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1276,16 +1276,54 @@ static int __init setup_slub_debug(char *str)
 
 __setup("slub_debug", setup_slub_debug);
 
+/*
+ * kmem_cache_flags - apply debugging options to the cache
+ * @object_size:   the size of an object without meta data
+ * @flags: flags to set
+ * @name:  name of the cache
+ * @ctor:  constructor function
+ *
+ * Debug option(s) are applied to @flags. In addition to the debug
+ * option(s), if a slab name (or multiple) is specified i.e.
+ * slub_debug=,, ...
+ * then only the select slabs will receive the debug option(s).
+ */
 slab_flags_t kmem_cache_flags(unsigned int object_size,
slab_flags_t flags, const char *name,
void (*ctor)(void *))
 {
-   /*
-* Enable debugging if selected on the kernel commandline.
-*/
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+   char *iter;
+   size_t len;
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   len = strlen(name);
+   iter = slub_debug_slabs;
+   while (*iter) {
+   char *end, *glob;
+   size_t cmplen;
+
+   end = strchr(iter, ',');
+   if (!end)
+   end = iter + strlen(iter);
+
+   glob = strnchr(iter, end - iter, '*');
+   if (glob)
+   cmplen = glob - iter;
+   else
+   cmplen = max_t(size_t, len, (end - iter));
+
+   if (!strncmp(name, iter, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   iter = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



Re: [PATCH v2] slub: extend slub debug to handle multiple slabs

2018-09-28 Thread Aaron Tomlin
On Fri 2018-09-21 16:34 -0700, Andrew Morton wrote:
> On Thu, 20 Sep 2018 21:00:16 +0100 Aaron Tomlin  wrote:
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int 
> > object_size,
> > /*
> >  * Enable debugging if selected on the kernel commandline.
> >  */
> 
> The above comment is in a strange place.  Can we please move it to
> above the function definition in the usual fashion?  And make it
> better, if anything seems to be missing.

OK.

> > -   if (slub_debug && (!slub_debug_slabs || (name &&
> > -   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
> > -   flags |= slub_debug;
> > +
> > +   char *end, *n, *glob;
> 
> `end' and `glob' could be local to the loop which uses them, which I
> find a bit nicer.

OK.

> `n' is a rotten identifier.  Can't we think of something which
> communicates meaning?

OK.

> > +   int len = strlen(name);
> > +
> > +   /* If slub_debug = 0, it folds into the if conditional. */
> > +   if (!slub_debug_slabs)
> > +   return flags | slub_debug;
> 
> If we take the above return, the call to strlen() was wasted cycles. 
> Presumably gcc is smart enough to prevent that, but why risk it.

OK.

> > +   n = slub_debug_slabs;
> > +   while (*n) {
> > +   int cmplen;
> > +
> > +   end = strchr(n, ',');
> > +   if (!end)
> > +   end = n + strlen(n);
> > +
> > +   glob = strnchr(n, end - n, '*');
> > +   if (glob)
> > +   cmplen = glob - n;
> > +   else
> > +   cmplen = max(len, (int)(end - n));
> 
> max_t() exists for this.  Or maybe make `len' size_t, but I expect that
> will still warn - that subtraction returns a ptrdiff_t, yes?

I think max_t(size_t, ...) should be appropriate?

I'll address the above and in the next version.


> > +
> > +   if (!strncmp(name, n, cmplen)) {
> > +   flags |= slub_debug;
> > +   break;
> > +   }
> > +
> > +   if (!*end)
> > +   break;
> > +   n = end + 1;
> > +   }
> The code in this loop hurts my brain a bit. I hope it's correct ;)

It works :)



-- 
Aaron Tomlin


Re: [PATCH v2] slub: extend slub debug to handle multiple slabs

2018-09-28 Thread Aaron Tomlin
On Fri 2018-09-21 16:34 -0700, Andrew Morton wrote:
> On Thu, 20 Sep 2018 21:00:16 +0100 Aaron Tomlin  wrote:
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int 
> > object_size,
> > /*
> >  * Enable debugging if selected on the kernel commandline.
> >  */
> 
> The above comment is in a strange place.  Can we please move it to
> above the function definition in the usual fashion?  And make it
> better, if anything seems to be missing.

OK.

> > -   if (slub_debug && (!slub_debug_slabs || (name &&
> > -   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
> > -   flags |= slub_debug;
> > +
> > +   char *end, *n, *glob;
> 
> `end' and `glob' could be local to the loop which uses them, which I
> find a bit nicer.

OK.

> `n' is a rotten identifier.  Can't we think of something which
> communicates meaning?

OK.

> > +   int len = strlen(name);
> > +
> > +   /* If slub_debug = 0, it folds into the if conditional. */
> > +   if (!slub_debug_slabs)
> > +   return flags | slub_debug;
> 
> If we take the above return, the call to strlen() was wasted cycles. 
> Presumably gcc is smart enough to prevent that, but why risk it.

OK.

> > +   n = slub_debug_slabs;
> > +   while (*n) {
> > +   int cmplen;
> > +
> > +   end = strchr(n, ',');
> > +   if (!end)
> > +   end = n + strlen(n);
> > +
> > +   glob = strnchr(n, end - n, '*');
> > +   if (glob)
> > +   cmplen = glob - n;
> > +   else
> > +   cmplen = max(len, (int)(end - n));
> 
> max_t() exists for this.  Or maybe make `len' size_t, but I expect that
> will still warn - that subtraction returns a ptrdiff_t, yes?

I think max_t(size_t, ...) should be appropriate?

I'll address the above and in the next version.


> > +
> > +   if (!strncmp(name, n, cmplen)) {
> > +   flags |= slub_debug;
> > +   break;
> > +   }
> > +
> > +   if (!*end)
> > +   break;
> > +   n = end + 1;
> > +   }
> The code in this loop hurts my brain a bit. I hope it's correct ;)

It works :)



-- 
Aaron Tomlin


[PATCH v2] slub: extend slub debug to handle multiple slabs

2018-09-20 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2

Signed-off-by: Aaron Tomlin 
---
Changes from v1 [1]:

 - Add appropriate cast to address compiler warning

[1]: https://lore.kernel.org/lkml/20180910111358.10539-1-atom...@redhat.com/
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 34 +++---
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..d20901514075 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
/*
 * Enable debugging if selected on the kernel commandline.
 */
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+
+   char *end, *n, *glob;
+   int len = strlen(name);
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   n = slub_debug_slabs;
+   while (*n) {
+   int cmplen;
+
+   end = strchr(n, ',');
+   if (!end)
+   end = n + strlen(n);
+
+   glob = strnchr(n, end - n, '*');
+   if (glob)
+   cmplen = glob - n;
+   else
+   cmplen = max(len, (int)(end - n));
+
+   if (!strncmp(name, n, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   n = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



[PATCH v2] slub: extend slub debug to handle multiple slabs

2018-09-20 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2

Signed-off-by: Aaron Tomlin 
---
Changes from v1 [1]:

 - Add appropriate cast to address compiler warning

[1]: https://lore.kernel.org/lkml/20180910111358.10539-1-atom...@redhat.com/
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 34 +++---
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..d20901514075 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
/*
 * Enable debugging if selected on the kernel commandline.
 */
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+
+   char *end, *n, *glob;
+   int len = strlen(name);
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   n = slub_debug_slabs;
+   while (*n) {
+   int cmplen;
+
+   end = strchr(n, ',');
+   if (!end)
+   end = n + strlen(n);
+
+   glob = strnchr(n, end - n, '*');
+   if (glob)
+   cmplen = glob - n;
+   else
+   cmplen = max(len, (int)(end - n));
+
+   if (!strncmp(name, n, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   n = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



[PATCH] slub: extend slub debug to handle multiple slabs

2018-09-10 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2


Signed-off-by: Aaron Tomlin 
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 34 +++---
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..27281867cbe0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
/*
 * Enable debugging if selected on the kernel commandline.
 */
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+
+   char *end, *n, *glob;
+   int len = strlen(name);
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   n = slub_debug_slabs;
+   while (*n) {
+   int cmplen;
+
+   end = strchr(n, ',');
+   if (!end)
+   end = n + strlen(n);
+
+   glob = strnchr(n, end - n, '*');
+   if (glob)
+   cmplen = glob - n;
+   else
+   cmplen = max(len, end - n);
+
+   if (!strncmp(name, n, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   n = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



[PATCH] slub: extend slub debug to handle multiple slabs

2018-09-10 Thread Aaron Tomlin
Extend the slub_debug syntax to "slub_debug=[,]*", where 
may contain an asterisk at the end.  For example, the following would poison
all kmalloc slabs:

slub_debug=P,kmalloc*

and the following would apply the default flags to all kmalloc and all block IO
slabs:

slub_debug=,bio*,kmalloc*

Please note that a similar patch was posted by Iliyan Malchev some time ago but
was never merged:

https://marc.info/?l=linux-mm=131283905330474=2


Signed-off-by: Aaron Tomlin 
---
 Documentation/vm/slub.rst | 12 +---
 mm/slub.c | 34 +++---
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 3a775fd64e2d..195928808bac 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -36,9 +36,10 @@ debugging is enabled. Format:
 
 slub_debug=
Enable options for all slabs
-slub_debug=,
-   Enable options only for select slabs
 
+slub_debug=,,,...
+   Enable options only for select slabs (no spaces
+   after a comma)
 
 Possible debug options are::
 
@@ -62,7 +63,12 @@ Trying to find an issue in the dentry cache? Try::
 
slub_debug=,dentry
 
-to only enable debugging on the dentry cache.
+to only enable debugging on the dentry cache.  You may use an asterisk at the
+end of the slab name, in order to cover all slabs with the same prefix.  For
+example, here's how you can poison the dentry cache as well as all kmalloc
+slabs:
+
+   slub_debug=P,kmalloc-*,dentry
 
 Red zoning and tracking may realign the slab.  We can just apply sanity checks
 to the dentry cache with::
diff --git a/mm/slub.c b/mm/slub.c
index 8da34a8af53d..27281867cbe0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
/*
 * Enable debugging if selected on the kernel commandline.
 */
-   if (slub_debug && (!slub_debug_slabs || (name &&
-   !strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)
-   flags |= slub_debug;
+
+   char *end, *n, *glob;
+   int len = strlen(name);
+
+   /* If slub_debug = 0, it folds into the if conditional. */
+   if (!slub_debug_slabs)
+   return flags | slub_debug;
+
+   n = slub_debug_slabs;
+   while (*n) {
+   int cmplen;
+
+   end = strchr(n, ',');
+   if (!end)
+   end = n + strlen(n);
+
+   glob = strnchr(n, end - n, '*');
+   if (glob)
+   cmplen = glob - n;
+   else
+   cmplen = max(len, end - n);
+
+   if (!strncmp(name, n, cmplen)) {
+   flags |= slub_debug;
+   break;
+   }
+
+   if (!*end)
+   break;
+   n = end + 1;
+   }
 
return flags;
 }
-- 
2.14.4



Re: [PATCH/RFC] perf report: Add option to change offset format when address is specified as a sort_key

2017-12-24 Thread Aaron Tomlin
On Sat 2017-12-23 00:59 +0100, Jiri Olsa wrote:
[ ... ]
> not sure we've already discussed that, but this could be default?

Probably it is best to keep the default behaviour.

I'd prefer a hexadecimal address offset, as the default, however perhaps
someone is happy with the current default (decimal).

> if not, I think the hex option should be part of -g option arg, maybe like:

Does it have to be?

>   -g graph,callee,xaddress

Not sure - adding another sort_key seems odd to me.

But if you insist, perhaps address[=hex] would be cleaner?

Otherwise I would prefer --hex.


Regards,

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH/RFC] perf report: Add option to change offset format when address is specified as a sort_key

2017-12-24 Thread Aaron Tomlin
On Sat 2017-12-23 00:59 +0100, Jiri Olsa wrote:
[ ... ]
> not sure we've already discussed that, but this could be default?

Probably it is best to keep the default behaviour.

I'd prefer a hexadecimal address offset, as the default, however perhaps
someone is happy with the current default (decimal).

> if not, I think the hex option should be part of -g option arg, maybe like:

Does it have to be?

>   -g graph,callee,xaddress

Not sure - adding another sort_key seems odd to me.

But if you insist, perhaps address[=hex] would be cleaner?

Otherwise I would prefer --hex.


Regards,

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


[PATCH/RFC] perf report: Add option to change offset format when address is specified as a sort_key

2017-12-15 Thread Aaron Tomlin
With --call-graph option, a user can choose to display call chains with a
specific sort_key. The sort_key can be either: function, address or srcline.
By default, when the address is specified as the sort_key, the offset (i.e.
symbol + offset) is provided in decimal (base 10) form.
This is because in __get_srcline() we use PRIu64 to display the offset.
This may or may not be desirable.

This patch adds a new option --hex, to change the offset format to
hexidecimal. Note, this can only be used with the -g, --call-graph option
when address is specified as a sort_key.  When this option is not
specified, the default behavior is used, which is to display the offset
in decimal form.

Before:

$ ./perf report -n --fields=overhead,sample,period,pid,symbol --stdio \
> -g graph,callee,address --percent-limit 10 --no-children
[ ... ]
10.87% 1 506940477 15:migration/3   [k] _spin_trylock
|
---_spin_trylock +13
   double_lock_balance +48
   schedule +2007
   migration_thread +613
   kthread +158
   child_rip +10

After:

$ ./perf report -n --fields=overhead,sample,period,pid,symbol --stdio \
> -g graph,callee,address --percent-limit 10 --hex --no-children
[ ... ]
10.87% 1 506940477 15:migration/3   [k] _spin_trylock
|
---_spin_trylock +0xd
   double_lock_balance +0x30
   schedule +0x7d7
   migration_thread +0x265
   kthread +0x9e
   child_rip +0xa

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 4 
 tools/perf/builtin-report.c  | 2 ++
 tools/perf/util/srcline.c| 6 --
 tools/perf/util/srcline.h| 1 +
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index ddde2b5..589435a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -386,6 +386,10 @@ OPTIONS
sum of shown entries will be always 100%.  "absolute" means it retains
the original value before and after the filter is applied.
 
+--hex::
+   When address is specified as a sort_key, show the offset in
+   hexidecimal form. (Default: decimal)
+
 --header::
Show header information in the perf.data file.  This includes
various information like hostname, OS and perf version, cpu/mem
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index af5dd03..7820e55 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -866,6 +866,8 @@ int cmd_report(int argc, const char **argv)
itrace_parse_synth_opts),
OPT_BOOLEAN(0, "full-source-path", _full_filename,
"Show full source file name path for source lines"),
+   OPT_BOOLEAN(0, "hex", _hex_offset,
+   "When address is specified as a sort_key, show the 
offset in hexidecimal form"),
OPT_BOOLEAN(0, "show-ref-call-graph", _conf.show_ref_callgraph,
"Show callgraph from reference event"),
OPT_INTEGER(0, "socket-filter", _filter,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index d19f05c..98fc911 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -15,6 +15,7 @@
 #include "symbol.h"
 
 bool srcline_full_filename;
+bool show_hex_offset;
 
 static const char *dso__name(struct dso *dso)
 {
@@ -535,8 +536,9 @@ char *__get_srcline(struct dso *dso, u64 addr, struct 
symbol *sym,
strndup(sym->name, sym->namelen) : NULL;
 
if (sym) {
-   if (asprintf(, "%s+%" PRIu64, show_sym ? sym->name : "",
-   addr - sym->start) < 0)
+   if (asprintf(,
+   show_hex_offset ? "%s+%#" PRIx64 : "%s+%" PRIu64,
+   show_sym ? sym->name : "", addr - sym->start) < 0)
return SRCLINE_UNKNOWN;
} else if (asprintf(, "%s[%" PRIx64 "]", dso->short_name, addr) 
< 0)
return SRCLINE_UNKNOWN;
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 847b708..1bca068 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -10,6 +10,7 @@ struct dso;
 struct symbol;
 
 extern bool srcline_full_filename;
+extern bool show_hex_offset;
 char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
  bool show_sym, bool show_addr);
 char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
-- 
2.9.5



[PATCH/RFC] perf report: Add option to change offset format when address is specified as a sort_key

2017-12-15 Thread Aaron Tomlin
With --call-graph option, a user can choose to display call chains with a
specific sort_key. The sort_key can be either: function, address or srcline.
By default, when the address is specified as the sort_key, the offset (i.e.
symbol + offset) is provided in decimal (base 10) form.
This is because in __get_srcline() we use PRIu64 to display the offset.
This may or may not be desirable.

This patch adds a new option --hex, to change the offset format to
hexidecimal. Note, this can only be used with the -g, --call-graph option
when address is specified as a sort_key.  When this option is not
specified, the default behavior is used, which is to display the offset
in decimal form.

Before:

$ ./perf report -n --fields=overhead,sample,period,pid,symbol --stdio \
> -g graph,callee,address --percent-limit 10 --no-children
[ ... ]
10.87% 1 506940477 15:migration/3   [k] _spin_trylock
|
---_spin_trylock +13
   double_lock_balance +48
   schedule +2007
   migration_thread +613
   kthread +158
   child_rip +10

After:

$ ./perf report -n --fields=overhead,sample,period,pid,symbol --stdio \
> -g graph,callee,address --percent-limit 10 --hex --no-children
[ ... ]
10.87% 1 506940477 15:migration/3   [k] _spin_trylock
|
---_spin_trylock +0xd
   double_lock_balance +0x30
   schedule +0x7d7
   migration_thread +0x265
   kthread +0x9e
   child_rip +0xa

Signed-off-by: Aaron Tomlin 
---
 tools/perf/Documentation/perf-report.txt | 4 
 tools/perf/builtin-report.c  | 2 ++
 tools/perf/util/srcline.c| 6 --
 tools/perf/util/srcline.h| 1 +
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index ddde2b5..589435a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -386,6 +386,10 @@ OPTIONS
sum of shown entries will be always 100%.  "absolute" means it retains
the original value before and after the filter is applied.
 
+--hex::
+   When address is specified as a sort_key, show the offset in
+   hexidecimal form. (Default: decimal)
+
 --header::
Show header information in the perf.data file.  This includes
various information like hostname, OS and perf version, cpu/mem
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index af5dd03..7820e55 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -866,6 +866,8 @@ int cmd_report(int argc, const char **argv)
itrace_parse_synth_opts),
OPT_BOOLEAN(0, "full-source-path", _full_filename,
"Show full source file name path for source lines"),
+   OPT_BOOLEAN(0, "hex", _hex_offset,
+   "When address is specified as a sort_key, show the 
offset in hexidecimal form"),
OPT_BOOLEAN(0, "show-ref-call-graph", _conf.show_ref_callgraph,
"Show callgraph from reference event"),
OPT_INTEGER(0, "socket-filter", _filter,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index d19f05c..98fc911 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -15,6 +15,7 @@
 #include "symbol.h"
 
 bool srcline_full_filename;
+bool show_hex_offset;
 
 static const char *dso__name(struct dso *dso)
 {
@@ -535,8 +536,9 @@ char *__get_srcline(struct dso *dso, u64 addr, struct 
symbol *sym,
strndup(sym->name, sym->namelen) : NULL;
 
if (sym) {
-   if (asprintf(, "%s+%" PRIu64, show_sym ? sym->name : "",
-   addr - sym->start) < 0)
+   if (asprintf(,
+   show_hex_offset ? "%s+%#" PRIx64 : "%s+%" PRIu64,
+   show_sym ? sym->name : "", addr - sym->start) < 0)
return SRCLINE_UNKNOWN;
} else if (asprintf(, "%s[%" PRIx64 "]", dso->short_name, addr) 
< 0)
return SRCLINE_UNKNOWN;
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 847b708..1bca068 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -10,6 +10,7 @@ struct dso;
 struct symbol;
 
 extern bool srcline_full_filename;
+extern bool show_hex_offset;
 char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
  bool show_sym, bool show_addr);
 char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
-- 
2.9.5



Re: [RFC 02/10] module: fix memory leak on early load_module() failures

2016-12-15 Thread Aaron Tomlin
On Thu 2016-12-08 11:48 -0800, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
> 
>   o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
>   o mod_sysfs_setup() fails
>   o we're a live patch module and copy_module_elf() fails
> 
> Chances of running into this issue is really low.
> 
> kmemleak splat:
> 
> unreferenced object 0x9f2c4ada1b00 (size 32):
>   comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>   hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00  memstick0...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] __kmalloc_track_caller+0x126/0x230
> [] kstrdup+0x31/0x60
> [] kstrdup_const+0x24/0x30
> [] kvasprintf_const+0x7a/0x90
> [] kobject_set_name_vargs+0x21/0x90
> [] dev_set_name+0x47/0x50
> [] memstick_check+0x95/0x33c [memstick]
> [] process_one_work+0x1f3/0x4b0
> [] worker_thread+0x48/0x4e0
> [] kthread+0xc9/0xe0
> [] ret_from_fork+0x1f/0x40
> [] 0x
> 
> Signed-off-by: Luis R. Rodriguez <mcg...@kernel.org>
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Aaron Tomlin <atom...@redhat.com>

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [RFC 02/10] module: fix memory leak on early load_module() failures

2016-12-15 Thread Aaron Tomlin
On Thu 2016-12-08 11:48 -0800, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
> 
>   o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
>   o mod_sysfs_setup() fails
>   o we're a live patch module and copy_module_elf() fails
> 
> Chances of running into this issue is really low.
> 
> kmemleak splat:
> 
> unreferenced object 0x9f2c4ada1b00 (size 32):
>   comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>   hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00  memstick0...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] __kmalloc_track_caller+0x126/0x230
> [] kstrdup+0x31/0x60
> [] kstrdup_const+0x24/0x30
> [] kvasprintf_const+0x7a/0x90
> [] kobject_set_name_vargs+0x21/0x90
> [] dev_set_name+0x47/0x50
> [] memstick_check+0x95/0x33c [memstick]
> [] process_one_work+0x1f3/0x4b0
> [] worker_thread+0x48/0x4e0
> [] kthread+0xc9/0xe0
> [] ret_from_fork+0x1f/0x40
> [] 0x
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH] kernel/watchdog: Prevent false hardlockup on overloaded system

2016-12-15 Thread Aaron Tomlin
On Tue 2016-12-06 11:17 -0500, Don Zickus wrote:
> On an overloaded system, it is possible that a change in the watchdog 
> threshold
> can be delayed long enough to trigger a false positive.
> 
> This can easily be achieved by having a cpu spinning indefinitely on a task,
> while another cpu updates watchdog threshold.
> 
> What happens is while trying to park the watchdog threads, the hrtimers on the
> other cpus trigger and reprogram themselves with the new slower watchdog
> threshold.  Meanwhile, the nmi watchdog is still programmed with the old 
> faster
> threshold.
> 
> Because the one cpu is blocked, it prevents the thread parking on the other
> cpus from completing, which is needed to shutdown the nmi watchdog and
> reprogram it correctly.  As a result, a false positive from the nmi watchdog 
> is
> reported.
> 
> Fix this by setting a park_in_progress flag to block all lockups
> until the parking is complete.
> 
> Fix provided by Ulrich Obergfell.
> 
> Cc: Ulrich Obergfell <uober...@redhat.com>
> Signed-off-by: Don Zickus <dzic...@redhat.com>
> ---
>  include/linux/nmi.h   | 1 +
>  kernel/watchdog.c | 9 +
>  kernel/watchdog_hld.c | 3 +++
>  3 files changed, 13 insertions(+)

Looks fine to me.

Reviewed-by: Aaron Tomlin <atom...@redhat.com>

-- 
Aaron Tomlin


Re: [PATCH] kernel/watchdog: Prevent false hardlockup on overloaded system

2016-12-15 Thread Aaron Tomlin
On Tue 2016-12-06 11:17 -0500, Don Zickus wrote:
> On an overloaded system, it is possible that a change in the watchdog 
> threshold
> can be delayed long enough to trigger a false positive.
> 
> This can easily be achieved by having a cpu spinning indefinitely on a task,
> while another cpu updates watchdog threshold.
> 
> What happens is while trying to park the watchdog threads, the hrtimers on the
> other cpus trigger and reprogram themselves with the new slower watchdog
> threshold.  Meanwhile, the nmi watchdog is still programmed with the old 
> faster
> threshold.
> 
> Because the one cpu is blocked, it prevents the thread parking on the other
> cpus from completing, which is needed to shutdown the nmi watchdog and
> reprogram it correctly.  As a result, a false positive from the nmi watchdog 
> is
> reported.
> 
> Fix this by setting a park_in_progress flag to block all lockups
> until the parking is complete.
> 
> Fix provided by Ulrich Obergfell.
> 
> Cc: Ulrich Obergfell 
> Signed-off-by: Don Zickus 
> ---
>  include/linux/nmi.h   | 1 +
>  kernel/watchdog.c | 9 +
>  kernel/watchdog_hld.c | 3 +++
>  3 files changed, 13 insertions(+)

Looks fine to me.

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


Re: [RFC PATCH v2 2/2] module: When modifying a module's text ignore modules which are going away too

2016-11-07 Thread Aaron Tomlin
On Thu 2016-10-27 09:49 -0400, Steven Rostedt wrote:
[ ... ]
> I also added Jessica to the Cc as I notice she will be the new module
> maintainer: http://lwn.net/Articles/704653/

Hi Jessica,

Any thoughts?

Thanks,

-- 
Aaron Tomlin


Re: [RFC PATCH v2 2/2] module: When modifying a module's text ignore modules which are going away too

2016-11-07 Thread Aaron Tomlin
On Thu 2016-10-27 09:49 -0400, Steven Rostedt wrote:
[ ... ]
> I also added Jessica to the Cc as I notice she will be the new module
> maintainer: http://lwn.net/Articles/704653/

Hi Jessica,

Any thoughts?

Thanks,

-- 
Aaron Tomlin


[RFC PATCH v2 2/2] module: When modifying a module's text ignore modules which are going away too

2016-10-27 Thread Aaron Tomlin
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. Albeit for a
module which is going away, it does not make sense to change its text to
RO since the module should be RW, before deallocation.

This patch makes set_all_modules_text_ro() skip modules which are going
away too.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..2a383df 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1969,7 +1969,8 @@ void set_all_modules_text_ro(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_ro);
-- 
2.5.5



[RFC PATCH v2 2/2] module: When modifying a module's text ignore modules which are going away too

2016-10-27 Thread Aaron Tomlin
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. Albeit for a
module which is going away, it does not make sense to change its text to
RO since the module should be RW, before deallocation.

This patch makes set_all_modules_text_ro() skip modules which are going
away too.

Signed-off-by: Aaron Tomlin 
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..2a383df 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1969,7 +1969,8 @@ void set_all_modules_text_ro(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_ro);
-- 
2.5.5



[RFC PATCH 0/2] Possible race between load_module() error handling and kprobe registration ?

2016-10-20 Thread Aaron Tomlin

I think there is a race (albeit a hard-to-hit one) between load_module()
error handling and kprobe registration which could cause a kernel page to
become read-only, panic due to protection fault.

In short, the protection that gets applied [at the bug_cleanup label] can be
overridden by another CPU when executing set_all_modules_text_ro().
Therefore creating the possibility for the kprobe registration code path to
touch a [formed] module that is being deallocated. Consequently we could
free a mapped page, that is not 'writable'. The same page, when later
accessed, will result in a page fault which cannot be handled. Below is an
attempt to illustrate the race. Please note we assume that:

  - kprobe uses ftrace

  - parse_args() or mod_sysfs_setup() would have to fail

  - CPU Y and X do not attempt to load the same module

  - CPU Y would have to sneak in *after* CPU X called the two 'unset'
functions but before CPU X removes the module from the list of all
modules

   CPU X
   ...
 load_module
   // Unknown/invalid module
   // parameter specified ...
   after_dashes = parse_args(...)
   if (IS_ERR(after_dashes))
 err = PTR_ERR(after_dashes)
 goto coming_cleanup:
   ...
  bug_cleanup:

   module_disable_ro(mod)
   module_disable_nx(mod)
   ...
   // set_all_modules_text_ro() on CPU Y sneaks in here <-.
   // and overrides the effects of the previous 'unset'   |
   ...|
   list_del_rcu(>list)   |
  |
  |
   CPU Y  |
   ...|
 sys_finit_module |
   load_module|
 do_init_module   |
   do_one_initcall|
 // mod->init |
 kprobe_example_init  |
   register_kprobe|
 arm_kprobe   |
   // kprobe uses ftrace  |
   arm_kprobe_ftrace  |
 register_ftrace_function |
   ftrace_startup |
 ftrace_startup_enable|
   ftrace_run_update_code |
 ftrace_arch_code_modify_post_process |
 {|
   // |
   // Set all [formed] module's   |
   // core and init pages as  |
   // read-only under |
   // module_mutex ...|
   // |
   set_all_modules_text_ro() -'
 }



The following patches (I hope) is an attempt to address this theoretical
race.  Please let me know your thoughts.


Aaron Tomlin (2):
  module: Ensure a module's state is set accordingly during module
coming cleanup code
  module: When modifying a module's text ignore modules which are going
away too

 kernel/module.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.5.5



[RFC PATCH 0/2] Possible race between load_module() error handling and kprobe registration ?

2016-10-20 Thread Aaron Tomlin

I think there is a race (albeit a hard-to-hit one) between load_module()
error handling and kprobe registration which could cause a kernel page to
become read-only, panic due to protection fault.

In short, the protection that gets applied [at the bug_cleanup label] can be
overridden by another CPU when executing set_all_modules_text_ro().
Therefore creating the possibility for the kprobe registration code path to
touch a [formed] module that is being deallocated. Consequently we could
free a mapped page, that is not 'writable'. The same page, when later
accessed, will result in a page fault which cannot be handled. Below is an
attempt to illustrate the race. Please note we assume that:

  - kprobe uses ftrace

  - parse_args() or mod_sysfs_setup() would have to fail

  - CPU Y and X do not attempt to load the same module

  - CPU Y would have to sneak in *after* CPU X called the two 'unset'
functions but before CPU X removes the module from the list of all
modules

   CPU X
   ...
 load_module
   // Unknown/invalid module
   // parameter specified ...
   after_dashes = parse_args(...)
   if (IS_ERR(after_dashes))
 err = PTR_ERR(after_dashes)
 goto coming_cleanup:
   ...
  bug_cleanup:

   module_disable_ro(mod)
   module_disable_nx(mod)
   ...
   // set_all_modules_text_ro() on CPU Y sneaks in here <-.
   // and overrides the effects of the previous 'unset'   |
   ...|
   list_del_rcu(>list)   |
  |
  |
   CPU Y  |
   ...|
 sys_finit_module |
   load_module|
 do_init_module   |
   do_one_initcall|
 // mod->init |
 kprobe_example_init  |
   register_kprobe|
 arm_kprobe   |
   // kprobe uses ftrace  |
   arm_kprobe_ftrace  |
 register_ftrace_function |
   ftrace_startup |
 ftrace_startup_enable|
   ftrace_run_update_code |
 ftrace_arch_code_modify_post_process |
 {|
   // |
   // Set all [formed] module's   |
   // core and init pages as  |
   // read-only under |
   // module_mutex ...|
   // |
   set_all_modules_text_ro() -'
 }



The following patches (I hope) is an attempt to address this theoretical
race.  Please let me know your thoughts.


Aaron Tomlin (2):
  module: Ensure a module's state is set accordingly during module
coming cleanup code
  module: When modifying a module's text ignore modules which are going
away too

 kernel/module.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.5.5



[RFC PATCH 1/2] module: Ensure a module's state is set accordingly during module coming cleanup code

2016-10-20 Thread Aaron Tomlin
In load_module() in the event of an error, for e.g. unknown module
parameter(s) specified we go to perform some module coming clean up
operations. At this point the module is still in a "formed" state
when it is actually going away.

This patch updates the module's state accordingly to ensure anyone on the
module_notify_list waiting for a module going away notification will be
notified accordingly.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index f57dd63..ff93ab8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3708,6 +3708,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
  sysfs_cleanup:
mod_sysfs_teardown(mod);
  coming_cleanup:
+   mod->state = MODULE_STATE_GOING;
blocking_notifier_call_chain(_notify_list,
 MODULE_STATE_GOING, mod);
klp_module_going(mod);
-- 
2.5.5



[RFC PATCH 2/2] module: When modifying a module's text ignore modules which are going away too

2016-10-20 Thread Aaron Tomlin
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. There is no
reason not to extend this to modules which are going away too.

This patch makes both set_all_modules_text_rw() and
set_all_modules_text_ro() skip modules which are going away too.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 kernel/module.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..09c386b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_rw);
@@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_ro);
-- 
2.5.5



[RFC PATCH 1/2] module: Ensure a module's state is set accordingly during module coming cleanup code

2016-10-20 Thread Aaron Tomlin
In load_module() in the event of an error, for e.g. unknown module
parameter(s) specified we go to perform some module coming clean up
operations. At this point the module is still in a "formed" state
when it is actually going away.

This patch updates the module's state accordingly to ensure anyone on the
module_notify_list waiting for a module going away notification will be
notified accordingly.

Signed-off-by: Aaron Tomlin 
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index f57dd63..ff93ab8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3708,6 +3708,7 @@ static int load_module(struct load_info *info, const char 
__user *uargs,
  sysfs_cleanup:
mod_sysfs_teardown(mod);
  coming_cleanup:
+   mod->state = MODULE_STATE_GOING;
blocking_notifier_call_chain(_notify_list,
 MODULE_STATE_GOING, mod);
klp_module_going(mod);
-- 
2.5.5



[RFC PATCH 2/2] module: When modifying a module's text ignore modules which are going away too

2016-10-20 Thread Aaron Tomlin
By default, during the access permission modification of a module's core
and init pages, we only ignore modules that are malformed. There is no
reason not to extend this to modules which are going away too.

This patch makes both set_all_modules_text_rw() and
set_all_modules_text_ro() skip modules which are going away too.

Signed-off-by: Aaron Tomlin 
---
 kernel/module.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index ff93ab8..09c386b 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1953,7 +1953,8 @@ void set_all_modules_text_rw(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_rw);
@@ -1969,7 +1970,8 @@ void set_all_modules_text_ro(void)
 
mutex_lock(_mutex);
list_for_each_entry_rcu(mod, , list) {
-   if (mod->state == MODULE_STATE_UNFORMED)
+   if (mod->state == MODULE_STATE_UNFORMED ||
+   mod->state == MODULE_STATE_GOING)
continue;
 
frob_text(>core_layout, set_memory_ro);
-- 
2.5.5



Re: [PATCH v5 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

2016-04-14 Thread Aaron Tomlin
On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself.  It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
> 
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.
> 
> The existing clients of nmi_backtrace (arm and x86) are converted
> to using the new cpumask approach in this change.
> 
> Signed-off-by: Chris Metcalf <cmetc...@mellanox.com>
> ---
>  arch/arm/include/asm/irq.h|  4 +--
>  arch/arm/kernel/smp.c |  4 +--
>  arch/x86/include/asm/irq.h|  4 +--
>  arch/x86/kernel/apic/hw_nmi.c |  6 ++---
>  include/linux/nmi.h   | 63 
> ++-
>  lib/nmi_backtrace.c   | 15 +--
>  6 files changed, 65 insertions(+), 31 deletions(-)

Looks good to me.

Reviewed-by: Aaron Tomlin <atom...@redhat.com>

-- 
Aaron Tomlin


Re: [PATCH v5 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

2016-04-14 Thread Aaron Tomlin
On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself.  It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
> 
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.
> 
> The existing clients of nmi_backtrace (arm and x86) are converted
> to using the new cpumask approach in this change.
> 
> Signed-off-by: Chris Metcalf 
> ---
>  arch/arm/include/asm/irq.h|  4 +--
>  arch/arm/kernel/smp.c |  4 +--
>  arch/x86/include/asm/irq.h|  4 +--
>  arch/x86/kernel/apic/hw_nmi.c |  6 ++---
>  include/linux/nmi.h   | 63 
> ++-
>  lib/nmi_backtrace.c   | 15 +--
>  6 files changed, 65 insertions(+), 31 deletions(-)

Looks good to me.

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


Re: [PATCH v5 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

2016-04-14 Thread Aaron Tomlin
On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently on arm there is code that checks whether it should call
> dump_stack() explicitly, to avoid trying to raise an NMI when the
> current context is not preemptible by the backtrace IPI.  Similarly,
> the forthcoming arch/tile support uses an IPI mechanism that does
> not support generating an NMI to self.
> 
> Accordingly, move the code that guards this case into the generic
> mechanism, and invoke it unconditionally whenever we want a
> backtrace of the current cpu.  It seems plausible that in all cases,
> dump_stack() will generate better information than generating a
> stack from the NMI handler.  The register state will be missing,
> but that state is likely not particularly helpful in any case.
> 
> Or, if we think it is helpful, we should be capturing and emitting
> the current register state in all cases when regs == NULL is passed
> to nmi_cpu_backtrace().
> 
> Signed-off-by: Chris Metcalf <cmetc...@mellanox.com>
> ---
>  arch/arm/kernel/smp.c | 9 -
>  lib/nmi_backtrace.c   | 9 +
>  2 files changed, 9 insertions(+), 9 deletions(-)

Thanks Chris.

Acked-by: Aaron Tomlin <atom...@redhat.com>

-- 
Aaron Tomlin


Re: [PATCH v5 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

2016-04-14 Thread Aaron Tomlin
On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently on arm there is code that checks whether it should call
> dump_stack() explicitly, to avoid trying to raise an NMI when the
> current context is not preemptible by the backtrace IPI.  Similarly,
> the forthcoming arch/tile support uses an IPI mechanism that does
> not support generating an NMI to self.
> 
> Accordingly, move the code that guards this case into the generic
> mechanism, and invoke it unconditionally whenever we want a
> backtrace of the current cpu.  It seems plausible that in all cases,
> dump_stack() will generate better information than generating a
> stack from the NMI handler.  The register state will be missing,
> but that state is likely not particularly helpful in any case.
> 
> Or, if we think it is helpful, we should be capturing and emitting
> the current register state in all cases when regs == NULL is passed
> to nmi_cpu_backtrace().
> 
> Signed-off-by: Chris Metcalf 
> ---
>  arch/arm/kernel/smp.c | 9 -
>  lib/nmi_backtrace.c   | 9 +
>  2 files changed, 9 insertions(+), 9 deletions(-)

Thanks Chris.

Acked-by: Aaron Tomlin 

-- 
Aaron Tomlin


Re: [PATCH v2] watchdog: don't run proc_watchdog_update if new value is same as old

2016-03-16 Thread Aaron Tomlin
On Tue 2016-03-15 00:19 -0400, Joshua Hunt wrote:
> 
> 
> While working on a script to restore all sysctl params before a series of
> tests I found that writing any value into the
> /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
> causes them to call proc_watchdog_update().
> 
> [  955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> 
> There doesn't appear to be a reason for doing this work every time a write
> occurs, so only do it when the values change.
> 
> Signed-off-by: Josh Hunt <joh...@akamai.com>
> Cc: <sta...@vger.kernel.org> # 4.1.x-
> ---
>  kernel/watchdog.c |9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> v2: Update changelog to no longer include mention of soft lockup, and add 
> stable.
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b3ace6e..9acb29f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -923,6 +923,9 @@ static int proc_watchdog_common(int which, struct 
> ctl_table *table, int write,
>* both lockup detectors are disabled if proc_watchdog_update()
>* returns an error.
>*/
> + if (old == new)
> + goto out;
> +
>   err = proc_watchdog_update();
>   }
>  out:
> @@ -967,7 +970,7 @@ int proc_soft_watchdog(struct ctl_table *table, int write,
>  int proc_watchdog_thresh(struct ctl_table *table, int write,
>void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> - int err, old;
> + int err, old, new;
>  
>   get_online_cpus();
>   mutex_lock(_proc_mutex);
> @@ -987,6 +990,10 @@ int proc_watchdog_thresh(struct ctl_table *table, int 
> write,
>   /*
>* Update the sample period. Restore on failure.
>*/
> + new = ACCESS_ONCE(watchdog_thresh);
> + if (old == new)
> + goto out;
> +
>   set_sample_period();
>   err = proc_watchdog_update();
>   if (err) {

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


Re: [PATCH v2] watchdog: don't run proc_watchdog_update if new value is same as old

2016-03-16 Thread Aaron Tomlin
On Tue 2016-03-15 00:19 -0400, Joshua Hunt wrote:
> 
> 
> While working on a script to restore all sysctl params before a series of
> tests I found that writing any value into the
> /proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
> causes them to call proc_watchdog_update().
> 
> [  955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> [  955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one 
> hw-PMU counter.
> 
> There doesn't appear to be a reason for doing this work every time a write
> occurs, so only do it when the values change.
> 
> Signed-off-by: Josh Hunt 
> Cc:  # 4.1.x-
> ---
>  kernel/watchdog.c |9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> v2: Update changelog to no longer include mention of soft lockup, and add 
> stable.
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b3ace6e..9acb29f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -923,6 +923,9 @@ static int proc_watchdog_common(int which, struct 
> ctl_table *table, int write,
>* both lockup detectors are disabled if proc_watchdog_update()
>* returns an error.
>*/
> + if (old == new)
> + goto out;
> +
>   err = proc_watchdog_update();
>   }
>  out:
> @@ -967,7 +970,7 @@ int proc_soft_watchdog(struct ctl_table *table, int write,
>  int proc_watchdog_thresh(struct ctl_table *table, int write,
>void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> - int err, old;
> + int err, old, new;
>  
>   get_online_cpus();
>   mutex_lock(_proc_mutex);
> @@ -987,6 +990,10 @@ int proc_watchdog_thresh(struct ctl_table *table, int 
> write,
>   /*
>* Update the sample period. Restore on failure.
>*/
> + new = ACCESS_ONCE(watchdog_thresh);
> + if (old == new)
> + goto out;
> +
>   set_sample_period();
>   err = proc_watchdog_update();
>   if (err) {

Reviewed-by: Aaron Tomlin 


Re: [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry)

2015-11-05 Thread Aaron Tomlin
On Tue 2015-11-03 16:20 +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.

This patch series does adequately address the race conditions mentioned
above. Thanks.

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry)

2015-11-05 Thread Aaron Tomlin
On Tue 2015-11-03 16:20 +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.

This patch series does adequately address the race conditions mentioned
above. Thanks.

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


[RESEND PATCH v2] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-11-04 Thread Aaron Tomlin
Resending due to incomplete subject.

Changes since v2:

 - Fixed incorrect commit message

The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("mm, THP: don't hold mmap_sem in
khugepaged when allocating THP") wouldn't be safe. So let's remove it.

Signed-off-by: Aaron Tomlin 
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RESEND PATCH v2] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-11-04 Thread Aaron Tomlin
Resending due to incomplete subject.

Changes since v2:

 - Fixed incorrect commit message

The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("mm, THP: don't hold mmap_sem in
khugepaged when allocating THP") wouldn't be safe. So let's remove it.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("thp: introduce khugepaged_prealloc_page
and khugepaged_alloc_page") wouldn't be possible. So let's remove it.

Signed-off-by: Aaron Tomlin 
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
On Wed 2015-10-28 17:05 +, Aaron Tomlin wrote:
> The "vma" parameter to khugepaged_alloc_page() is unused.
> It has to remain unused or the drop read lock 'map_sem' optimisation
> introduce by commit 8b1645685acf ("thp: introduce khugepaged_prealloc_page
> and khugepaged_alloc_page") wouldn't be possible. So let's remove it.

Self nack due to incorrect commit message.

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("mm, THP: don't hold mmap_sem in
khugepaged when allocating THP") wouldn't be possible. So let's remove it.

Signed-off-by: Aaron Tomlin 
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("thp: introduce khugepaged_prealloc_page
and khugepaged_alloc_page") wouldn't be possible. So let's remove it.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
On Wed 2015-10-28 17:05 +, Aaron Tomlin wrote:
> The "vma" parameter to khugepaged_alloc_page() is unused.
> It has to remain unused or the drop read lock 'map_sem' optimisation
> introduce by commit 8b1645685acf ("thp: introduce khugepaged_prealloc_page
> and khugepaged_alloc_page") wouldn't be possible. So let's remove it.

Self nack due to incorrect commit message.

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] thp: Remove unused vma parameter from khugepaged_alloc_page

2015-10-28 Thread Aaron Tomlin
The "vma" parameter to khugepaged_alloc_page() is unused.
It has to remain unused or the drop read lock 'map_sem' optimisation
introduce by commit 8b1645685acf ("mm, THP: don't hold mmap_sem in
khugepaged when allocating THP") wouldn't be possible. So let's remove it.

Signed-off-by: Aaron Tomlin <atom...@redhat.com>
---
 mm/huge_memory.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbac913..490fa81 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2413,8 +2413,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
VM_BUG_ON_PAGE(*hpage, *hpage);
 
@@ -2481,8 +2480,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, 
bool *wait)
 
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, struct mm_struct *mm,
-  struct vm_area_struct *vma, unsigned long address,
-  int node)
+  unsigned long address, int node)
 {
up_read(>mmap_sem);
VM_BUG_ON(!*hpage);
@@ -2530,7 +2528,7 @@ static void collapse_huge_page(struct mm_struct *mm,
__GFP_THISNODE;
 
/* release the mmap_sem read lock. */
-   new_page = khugepaged_alloc_page(hpage, gfp, mm, vma, address, node);
+   new_page = khugepaged_alloc_page(hpage, gfp, mm, address, node);
if (!new_page)
return;
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] watchdog: Add sysctl knob hardlockup_panic

2015-10-18 Thread Aaron Tomlin
On Thu 2015-10-15 11:06 -0400, Don Zickus wrote:
> The only way to enable a hardlockup to panic the machine is to set
> 'nmi_watchdog=panic' on the kernel command line.
> 
> This makes it awkward for end users and folks who want to run automate tests
> (like myself).
> 
> Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
> knob.
> 
> Signed-off-by: Don Zickus 
> 
> --
> V2 - wrap the sysctl variable with CONFIG_HARDLOCKUP_DETECTOR (kbuid bot)
> ---
>  Documentation/lockup-watchdogs.txt |  5 +++--
>  include/linux/sched.h  |  1 +
>  kernel/sysctl.c| 11 +++
>  kernel/watchdog.c  |  2 +-
>  4 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/lockup-watchdogs.txt 
> b/Documentation/lockup-watchdogs.txt
> index 22dd6af..4a6e33e 100644
> --- a/Documentation/lockup-watchdogs.txt
> +++ b/Documentation/lockup-watchdogs.txt
> @@ -20,8 +20,9 @@ kernel mode for more than 10 seconds (see "Implementation" 
> below for
>  details), without letting other interrupts have a chance to run.
>  Similarly to the softlockup case, the current stack trace is displayed
>  upon detection and the system will stay locked up unless the default
> -behavior is changed, which can be done through a compile time knob,
> -"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog"
> +behavior is changed, which can be done through a sysctl,
> +'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
> +and a kernel parameter, "nmi_watchdog"
>  (see "Documentation/kernel-parameters.txt" for details).
>  
>  The panic option can be used in combination with panic_timeout (this
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b7b9501..5e65b14 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -384,6 +384,7 @@ extern int proc_dowatchdog_thresh(struct ctl_table 
> *table, int write,
> void __user *buffer,
> size_t *lenp, loff_t *ppos);
>  extern unsigned int  softlockup_panic;
> +extern unsigned int  hardlockup_panic;
>  void lockup_detector_init(void);
>  #else
>  static inline void touch_softlockup_watchdog(void)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index efb0370..a341117 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -887,6 +887,17 @@ static struct ctl_table kern_table[] = {
>   .extra1 = ,
>   .extra2 = ,
>   },
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> + {
> + .procname   = "hardlockup_panic",
> + .data   = _panic,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = ,
> + .extra2 = ,
> + },
> +#endif
>  #ifdef CONFIG_SMP
>   {
>   .procname   = "softlockup_all_cpu_backtrace",
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index f6b32b8..0a23125 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -112,7 +112,7 @@ static unsigned long soft_lockup_nmi_warn;
>   * Should we panic when a soft-lockup or hard-lockup occurs:
>   */
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> -static int hardlockup_panic =
> +unsigned int __read_mostly hardlockup_panic =
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>  static unsigned long hardlockup_allcpu_dumped;
>  /*

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH v2] watchdog: Add sysctl knob hardlockup_panic

2015-10-18 Thread Aaron Tomlin
On Thu 2015-10-15 11:06 -0400, Don Zickus wrote:
> The only way to enable a hardlockup to panic the machine is to set
> 'nmi_watchdog=panic' on the kernel command line.
> 
> This makes it awkward for end users and folks who want to run automate tests
> (like myself).
> 
> Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
> knob.
> 
> Signed-off-by: Don Zickus <dzic...@redhat.com>
> 
> --
> V2 - wrap the sysctl variable with CONFIG_HARDLOCKUP_DETECTOR (kbuid bot)
> ---
>  Documentation/lockup-watchdogs.txt |  5 +++--
>  include/linux/sched.h  |  1 +
>  kernel/sysctl.c| 11 +++
>  kernel/watchdog.c  |  2 +-
>  4 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/lockup-watchdogs.txt 
> b/Documentation/lockup-watchdogs.txt
> index 22dd6af..4a6e33e 100644
> --- a/Documentation/lockup-watchdogs.txt
> +++ b/Documentation/lockup-watchdogs.txt
> @@ -20,8 +20,9 @@ kernel mode for more than 10 seconds (see "Implementation" 
> below for
>  details), without letting other interrupts have a chance to run.
>  Similarly to the softlockup case, the current stack trace is displayed
>  upon detection and the system will stay locked up unless the default
> -behavior is changed, which can be done through a compile time knob,
> -"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog"
> +behavior is changed, which can be done through a sysctl,
> +'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
> +and a kernel parameter, "nmi_watchdog"
>  (see "Documentation/kernel-parameters.txt" for details).
>  
>  The panic option can be used in combination with panic_timeout (this
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b7b9501..5e65b14 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -384,6 +384,7 @@ extern int proc_dowatchdog_thresh(struct ctl_table 
> *table, int write,
> void __user *buffer,
> size_t *lenp, loff_t *ppos);
>  extern unsigned int  softlockup_panic;
> +extern unsigned int  hardlockup_panic;
>  void lockup_detector_init(void);
>  #else
>  static inline void touch_softlockup_watchdog(void)
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index efb0370..a341117 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -887,6 +887,17 @@ static struct ctl_table kern_table[] = {
>   .extra1 = ,
>   .extra2 = ,
>   },
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> + {
> + .procname   = "hardlockup_panic",
> + .data   = _panic,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = ,
> + .extra2 = ,
> + },
> +#endif
>  #ifdef CONFIG_SMP
>   {
>   .procname   = "softlockup_all_cpu_backtrace",
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index f6b32b8..0a23125 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -112,7 +112,7 @@ static unsigned long soft_lockup_nmi_warn;
>   * Should we panic when a soft-lockup or hard-lockup occurs:
>   */
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> -static int hardlockup_panic =
> +unsigned int __read_mostly hardlockup_panic =
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>  static unsigned long hardlockup_allcpu_dumped;
>  /*

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH] watchdog: perform all-CPU backtrace in case of hard lockup

2015-10-01 Thread Aaron Tomlin
On Thu 2015-10-01 09:45 +0200, Jiri Kosina wrote:
> That should be fine. Worst case scenario is hardlockup and softlockup 
> trigerring 'in parallel' on different CPUs, and all-cpu backtrace being 
> triggered twice.
> 
> Frankly, I've never seen this happen in practice (hardlockup and 
> softlockup triggering at different CPUs at the very same time).
> 
> We could possibly make a global 'all_cpus_dump_in_progress' flag so that 
> we guarantee only one stream of dumps, but we'd need to convert everybody 
> using this facility (e.g. RCU stall detector, and whoever else) to be 
> aware of it as well, otherwise it wouldn't make too much sense.
> 
> Something to add to TODO I guess.

This could indeed be worth further investigation.


-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] watchdog: perform all-CPU backtrace in case of hard lockup

2015-10-01 Thread Aaron Tomlin
2,6 +114,7 @@ static unsigned long soft_lockup_nmi_warn;
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static int hardlockup_panic =
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> +static unsigned long hardlockup_allcpu_dumped;
>  /*
>   * We may not want to enable hard lockup detection by default in all cases,
>   * for example when running the kernel as a guest on a hypervisor. In these
> @@ -173,6 +176,13 @@ static int __init 
> softlockup_all_cpu_backtrace_setup(char *str)
>   return 1;
>  }
>  __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
> +static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> +{
> + sysctl_hardlockup_all_cpu_backtrace =
> + !!simple_strtol(str, NULL, 0);
> + return 1;
> +}
> +__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  #endif
>  
>  /*
> @@ -318,17 +328,30 @@ static void watchdog_overflow_callback(struct 
> perf_event *event,
>*/
>   if (is_hardlockup()) {
>   int this_cpu = smp_processor_id();
> + struct pt_regs *regs = get_irq_regs();
>  
>   /* only print hardlockups once */
>   if (__this_cpu_read(hard_watchdog_warn) == true)
>   return;
>  
> - if (hardlockup_panic)
> - panic("Watchdog detected hard LOCKUP on cpu %d",
> -   this_cpu);
> + pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> + print_modules();
> + print_irqtrace_events(current);
> + if (regs)
> + show_regs(regs);
>   else
> - WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
> -  this_cpu);
> + dump_stack();
> +
> + /*
> +  * Perform all-CPU dump only once to avoid multiple hardlockups
> +  * generating interleaving traces
> +  */
> + if (sysctl_hardlockup_all_cpu_backtrace &&
> + !test_and_set_bit(0, _allcpu_dumped))
> + trigger_allbutself_cpu_backtrace();

How does this play when 'softlockup_all_cpu_backtrace' is enabled too?

> +
> + if (hardlockup_panic)
> + panic("Hard LOCKUP");
>  
>   __this_cpu_write(hard_watchdog_warn, true);
>   return;

This does indeed appear similar to Linus commit ed235875
("kernel/watchdog.c: print traces for all cpus on lockup detection");
albeit for the hardlockup detector.

Looks fine to me. Thanks!

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH] watchdog: perform all-CPU backtrace in case of hard lockup

2015-10-01 Thread Aaron Tomlin
bits = cpumask_bits(_cpumask);
> @@ -112,6 +114,7 @@ static unsigned long soft_lockup_nmi_warn;
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static int hardlockup_panic =
>   CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> +static unsigned long hardlockup_allcpu_dumped;
>  /*
>   * We may not want to enable hard lockup detection by default in all cases,
>   * for example when running the kernel as a guest on a hypervisor. In these
> @@ -173,6 +176,13 @@ static int __init 
> softlockup_all_cpu_backtrace_setup(char *str)
>   return 1;
>  }
>  __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
> +static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> +{
> + sysctl_hardlockup_all_cpu_backtrace =
> + !!simple_strtol(str, NULL, 0);
> + return 1;
> +}
> +__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  #endif
>  
>  /*
> @@ -318,17 +328,30 @@ static void watchdog_overflow_callback(struct 
> perf_event *event,
>*/
>   if (is_hardlockup()) {
>   int this_cpu = smp_processor_id();
> + struct pt_regs *regs = get_irq_regs();
>  
>   /* only print hardlockups once */
>   if (__this_cpu_read(hard_watchdog_warn) == true)
>   return;
>  
> - if (hardlockup_panic)
> - panic("Watchdog detected hard LOCKUP on cpu %d",
> -   this_cpu);
> + pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> + print_modules();
> + print_irqtrace_events(current);
> + if (regs)
> + show_regs(regs);
>   else
> - WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
> -  this_cpu);
> + dump_stack();
> +
> + /*
> +  * Perform all-CPU dump only once to avoid multiple hardlockups
> +  * generating interleaving traces
> +  */
> + if (sysctl_hardlockup_all_cpu_backtrace &&
> + !test_and_set_bit(0, _allcpu_dumped))
> + trigger_allbutself_cpu_backtrace();

How does this play when 'softlockup_all_cpu_backtrace' is enabled too?

> +
> + if (hardlockup_panic)
> + panic("Hard LOCKUP");
>  
>   __this_cpu_write(hard_watchdog_warn, true);
>   return;

This does indeed appear similar to Linus commit ed235875
("kernel/watchdog.c: print traces for all cpus on lockup detection");
albeit for the hardlockup detector.

Looks fine to me. Thanks!

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH] watchdog: perform all-CPU backtrace in case of hard lockup

2015-10-01 Thread Aaron Tomlin
On Thu 2015-10-01 09:45 +0200, Jiri Kosina wrote:
> That should be fine. Worst case scenario is hardlockup and softlockup 
> trigerring 'in parallel' on different CPUs, and all-cpu backtrace being 
> triggered twice.
> 
> Frankly, I've never seen this happen in practice (hardlockup and 
> softlockup triggering at different CPUs at the very same time).
> 
> We could possibly make a global 'all_cpus_dump_in_progress' flag so that 
> we guarantee only one stream of dumps, but we'd need to convert everybody 
> using this facility (e.g. RCU stall detector, and whoever else) to be 
> aware of it as well, otherwise it wouldn't make too much sense.
> 
> Something to add to TODO I guess.

This could indeed be worth further investigation.


-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] watchdog: do not unpark threads in watchdog_park_threads() on error

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> If kthread_park() returns an error, watchdog_park_threads() should not
> blindly 'roll back' the already parked threads to the unparked state.
> Instead leave it up to the callers to handle such errors appropriately
> in their context. For example, it is redundant to unpark the threads
> if the lockup detectors will soon be disabled by the callers anyway.
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  kernel/watchdog.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 3bc22a9..af70bf2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -654,6 +654,12 @@ static struct smp_hotplug_thread watchdog_threads = {
>  
>  /*
>   * park all watchdog threads that are specified in 'watchdog_cpumask'
> + *
> + * This function returns an error if kthread_park() of a watchdog thread
> + * fails. In this situation, the watchdog threads of some CPUs can already
> + * be parked and the watchdog threads of other CPUs can still be runnable.
> + * Callers are expected to handle this special condition as appropriate in
> + * their context.
>   */
>  static int watchdog_park_threads(void)
>  {
> @@ -665,10 +671,6 @@ static int watchdog_park_threads(void)
>   if (ret)
>   break;
>   }
> - if (ret) {
> - for_each_watchdog_cpu(cpu)
> - kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> - }
>   put_online_cpus();
>  
>   return ret;

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 4/5] watchdog: implement error handling in lockup_detector_suspend()

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> lockup_detector_suspend() now handles errors from watchdog_park_threads().
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  kernel/watchdog.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 457113c..3bc22a9 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -707,6 +707,11 @@ int lockup_detector_suspend(void)
>  
>   if (ret == 0)
>   watchdog_suspended++;
> + else {
> + watchdog_disable_all_cpus();
> + pr_err("Failed to suspend lockup detectors, disabled\n");
> + watchdog_enabled = 0;
> + }
>  
>   mutex_unlock(_proc_mutex);
>  

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 3/5] watchdog: implement error handling in update_watchdog_all_cpus() and callers

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> update_watchdog_all_cpus() now passes errors from watchdog_park_threads()
> up to functions in the call chain. This allows watchdog_enable_all_cpus()
> and proc_watchdog_update() to handle such errors too.
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  kernel/watchdog.c | 30 +++---
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index eb9527c..457113c 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -731,10 +731,17 @@ void lockup_detector_resume(void)
>   mutex_unlock(_proc_mutex);
>  }
>  
> -static void update_watchdog_all_cpus(void)
> +static int update_watchdog_all_cpus(void)
>  {
> - watchdog_park_threads();
> + int ret;
> +
> + ret = watchdog_park_threads();
> + if (ret)
> + return ret;
> +
>   watchdog_unpark_threads();
> +
> + return 0;
>  }
>  
>  static int watchdog_enable_all_cpus(void)
> @@ -753,9 +760,17 @@ static int watchdog_enable_all_cpus(void)
>* Enable/disable the lockup detectors or
>* change the sample period 'on the fly'.
>*/
> - update_watchdog_all_cpus();
> + err = update_watchdog_all_cpus();
> +
> + if (err) {
> + watchdog_disable_all_cpus();
> + pr_err("Failed to update lockup detectors, disabled\n");
> + }
>   }
>  
> + if (err)
> + watchdog_enabled = 0;
> +
>   return err;
>  }
>  
> @@ -851,12 +866,13 @@ static int proc_watchdog_common(int which, struct 
> ctl_table *table, int write,
>   } while (cmpxchg(_enabled, old, new) != old);
>  
>   /*
> -  * Update the run state of the lockup detectors.
> -  * Restore 'watchdog_enabled' on failure.
> +  * Update the run state of the lockup detectors. There is _no_
> +  * need to check the value returned by proc_watchdog_update()
> +  * and to restore the previous value of 'watchdog_enabled' as
> +  * both lockup detectors are disabled if proc_watchdog_update()
> +  * returns an error.
>    */
>   err = proc_watchdog_update();
> - if (err)
> - watchdog_enabled = old;
>   }
>  out:
>   mutex_unlock(_proc_mutex);

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 2/5] watchdog: move watchdog_disable_all_cpus() outside of ifdef

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> It makes sense to place watchdog_{dis|enable}_all_cpus() outside of
> the ifdef so that _both_ are available even if CONFIG_SYSCTL is not
> defined.
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  kernel/watchdog.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index cd9a504..eb9527c 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -347,6 +347,9 @@ static void watchdog_interrupt_count(void)
>  static int watchdog_nmi_enable(unsigned int cpu);
>  static void watchdog_nmi_disable(unsigned int cpu);
>  
> +static int watchdog_enable_all_cpus(void);
> +static void watchdog_disable_all_cpus(void);
> +
>  /* watchdog kicker functions */
>  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  {
> @@ -756,9 +759,6 @@ static int watchdog_enable_all_cpus(void)
>   return err;
>  }
>  
> -/* prepare/enable/disable routines */
> -/* sysctl functions */
> -#ifdef CONFIG_SYSCTL
>  static void watchdog_disable_all_cpus(void)
>  {
>   if (watchdog_running) {
> @@ -767,6 +767,8 @@ static void watchdog_disable_all_cpus(void)
>   }
>  }
>  
> +#ifdef CONFIG_SYSCTL
> +
>  /*
>   * Update the run state of the lockup detectors.
>   */

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 1/5] watchdog: fix error handling in proc_watchdog_thresh()

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> Restore the previous value of watchdog_thresh _and_ sample_period
> if proc_watchdog_update() returns an error. The variables must be
> consistent to avoid false positives of the lockup detectors.
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  kernel/watchdog.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..cd9a504 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -914,13 +914,14 @@ int proc_watchdog_thresh(struct ctl_table *table, int 
> write,
>   goto out;
>  
>   /*
> -  * Update the sample period.
> -  * Restore 'watchdog_thresh' on failure.
> +  * Update the sample period. Restore on failure.
>*/
>   set_sample_period();
>   err = proc_watchdog_update();
> - if (err)
> + if (err) {
>   watchdog_thresh = old;
> + set_sample_period();
> +     }
>  out:
>   mutex_unlock(_proc_mutex);
>   return err;

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 2/5] watchdog: move watchdog_disable_all_cpus() outside of ifdef

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> It makes sense to place watchdog_{dis|enable}_all_cpus() outside of
> the ifdef so that _both_ are available even if CONFIG_SYSCTL is not
> defined.
> 
> Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
> ---
>  kernel/watchdog.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index cd9a504..eb9527c 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -347,6 +347,9 @@ static void watchdog_interrupt_count(void)
>  static int watchdog_nmi_enable(unsigned int cpu);
>  static void watchdog_nmi_disable(unsigned int cpu);
>  
> +static int watchdog_enable_all_cpus(void);
> +static void watchdog_disable_all_cpus(void);
> +
>  /* watchdog kicker functions */
>  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  {
> @@ -756,9 +759,6 @@ static int watchdog_enable_all_cpus(void)
>   return err;
>  }
>  
> -/* prepare/enable/disable routines */
> -/* sysctl functions */
> -#ifdef CONFIG_SYSCTL
>  static void watchdog_disable_all_cpus(void)
>  {
>   if (watchdog_running) {
> @@ -767,6 +767,8 @@ static void watchdog_disable_all_cpus(void)
>   }
>  }
>  
> +#ifdef CONFIG_SYSCTL
> +
>  /*
>   * Update the run state of the lockup detectors.
>   */

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH 1/5] watchdog: fix error handling in proc_watchdog_thresh()

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> Restore the previous value of watchdog_thresh _and_ sample_period
> if proc_watchdog_update() returns an error. The variables must be
> consistent to avoid false positives of the lockup detectors.
> 
> Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
> ---
>  kernel/watchdog.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..cd9a504 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -914,13 +914,14 @@ int proc_watchdog_thresh(struct ctl_table *table, int 
> write,
>   goto out;
>  
>   /*
> -  * Update the sample period.
> -  * Restore 'watchdog_thresh' on failure.
> +  * Update the sample period. Restore on failure.
>*/
>   set_sample_period();
>   err = proc_watchdog_update();
> - if (err)
> + if (err) {
>   watchdog_thresh = old;
> + set_sample_period();
> +     }
>  out:
>   mutex_unlock(_proc_mutex);
>   return err;

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH 3/5] watchdog: implement error handling in update_watchdog_all_cpus() and callers

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> update_watchdog_all_cpus() now passes errors from watchdog_park_threads()
> up to functions in the call chain. This allows watchdog_enable_all_cpus()
> and proc_watchdog_update() to handle such errors too.
> 
> Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
> ---
>  kernel/watchdog.c | 30 +++---
>  1 file changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index eb9527c..457113c 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -731,10 +731,17 @@ void lockup_detector_resume(void)
>   mutex_unlock(_proc_mutex);
>  }
>  
> -static void update_watchdog_all_cpus(void)
> +static int update_watchdog_all_cpus(void)
>  {
> - watchdog_park_threads();
> + int ret;
> +
> + ret = watchdog_park_threads();
> + if (ret)
> + return ret;
> +
>   watchdog_unpark_threads();
> +
> + return 0;
>  }
>  
>  static int watchdog_enable_all_cpus(void)
> @@ -753,9 +760,17 @@ static int watchdog_enable_all_cpus(void)
>* Enable/disable the lockup detectors or
>* change the sample period 'on the fly'.
>*/
> - update_watchdog_all_cpus();
> + err = update_watchdog_all_cpus();
> +
> + if (err) {
> + watchdog_disable_all_cpus();
> + pr_err("Failed to update lockup detectors, disabled\n");
> + }
>   }
>  
> + if (err)
> + watchdog_enabled = 0;
> +
>   return err;
>  }
>  
> @@ -851,12 +866,13 @@ static int proc_watchdog_common(int which, struct 
> ctl_table *table, int write,
>   } while (cmpxchg(_enabled, old, new) != old);
>  
>   /*
> -  * Update the run state of the lockup detectors.
> -  * Restore 'watchdog_enabled' on failure.
> +  * Update the run state of the lockup detectors. There is _no_
> +  * need to check the value returned by proc_watchdog_update()
> +  * and to restore the previous value of 'watchdog_enabled' as
> +  * both lockup detectors are disabled if proc_watchdog_update()
> +  * returns an error.
>    */
>   err = proc_watchdog_update();
> - if (err)
> - watchdog_enabled = old;
>   }
>  out:
>   mutex_unlock(_proc_mutex);

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH 4/5] watchdog: implement error handling in lockup_detector_suspend()

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> lockup_detector_suspend() now handles errors from watchdog_park_threads().
> 
> Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
> ---
>  kernel/watchdog.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 457113c..3bc22a9 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -707,6 +707,11 @@ int lockup_detector_suspend(void)
>  
>   if (ret == 0)
>   watchdog_suspended++;
> + else {
> + watchdog_disable_all_cpus();
> + pr_err("Failed to suspend lockup detectors, disabled\n");
> + watchdog_enabled = 0;
> + }
>  
>   mutex_unlock(_proc_mutex);
>  

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH 5/5] watchdog: do not unpark threads in watchdog_park_threads() on error

2015-09-29 Thread Aaron Tomlin
On Mon 2015-09-28 22:44 +0200, Ulrich Obergfell wrote:
> If kthread_park() returns an error, watchdog_park_threads() should not
> blindly 'roll back' the already parked threads to the unparked state.
> Instead leave it up to the callers to handle such errors appropriately
> in their context. For example, it is redundant to unpark the threads
> if the lockup detectors will soon be disabled by the callers anyway.
> 
> Signed-off-by: Ulrich Obergfell <uober...@redhat.com>
> ---
>  kernel/watchdog.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 3bc22a9..af70bf2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -654,6 +654,12 @@ static struct smp_hotplug_thread watchdog_threads = {
>  
>  /*
>   * park all watchdog threads that are specified in 'watchdog_cpumask'
> + *
> + * This function returns an error if kthread_park() of a watchdog thread
> + * fails. In this situation, the watchdog threads of some CPUs can already
> + * be parked and the watchdog threads of other CPUs can still be runnable.
> + * Callers are expected to handle this special condition as appropriate in
> + * their context.
>   */
>  static int watchdog_park_threads(void)
>  {
> @@ -665,10 +671,6 @@ static int watchdog_park_threads(void)
>   if (ret)
>   break;
>   }
> - if (ret) {
> - for_each_watchdog_cpu(cpu)
> - kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> - }
>   put_online_cpus();
>  
>   return ret;

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH] kernel/watchdog: is_hardlockup can be boolean

2015-09-22 Thread Aaron Tomlin
On Tue 2015-09-22 21:32 +0800, Yaowei Bai wrote:
> This patch makes is_hardlockup return bool to improve readability
> due to this particular function only using either one or zero as its
> return value.
> 
> No functional change.
> 
> Signed-off-by: Yaowei Bai 
> ---
>  kernel/watchdog.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..568ba64 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -263,15 +263,15 @@ void touch_softlockup_watchdog_sync(void)
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  /* watchdog detector functions */
> -static int is_hardlockup(void)
> +static bool is_hardlockup(void)
>  {
>   unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>  
>   if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> - return 1;
> + return true;
>  
>   __this_cpu_write(hrtimer_interrupts_saved, hrint);
> - return 0;
> + return false;
>  }
>  #endif
>  

Fair enough with regards to readability.

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH] kernel/watchdog: is_hardlockup can be boolean

2015-09-22 Thread Aaron Tomlin
On Tue 2015-09-22 21:32 +0800, Yaowei Bai wrote:
> This patch makes is_hardlockup return bool to improve readability
> due to this particular function only using either one or zero as its
> return value.
> 
> No functional change.
> 
> Signed-off-by: Yaowei Bai <bywxiao...@163.com>
> ---
>  kernel/watchdog.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..568ba64 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -263,15 +263,15 @@ void touch_softlockup_watchdog_sync(void)
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  /* watchdog detector functions */
> -static int is_hardlockup(void)
> +static bool is_hardlockup(void)
>  {
>   unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>  
>   if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> - return 1;
> + return true;
>  
>   __this_cpu_write(hrtimer_interrupts_saved, hrint);
> - return 0;
> + return false;
>  }
>  #endif
>  

Fair enough with regards to readability.

Reviewed-by: Aaron Tomlin <atom...@redhat.com>


signature.asc
Description: PGP signature


Re: [PATCH 2/2] watchdog: use pr_debug() in fixup_ht_bug() failure path

2015-08-07 Thread Aaron Tomlin
On Fri 2015-08-07 11:58 +0200, Ulrich Obergfell wrote:
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
> b/arch/x86/kernel/cpu/perf_event_intel.c
> index 0357bf7..abb25c3 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -3369,7 +3369,7 @@ static __init int fixup_ht_bug(void)
>   }
>  
>   if (lockup_detector_suspend() != 0) {
> - pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 
> workaround\n");
> + pr_debug("failed to disable PMU erratum BJ122, BV98, HSD29 
> workaround\n");
>   return 0;
>   }
>  

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH 1/2] watchdog: rename watchdog_suspend() and watchdog_resume()

2015-08-07 Thread Aaron Tomlin
On Fri 2015-08-07 11:58 +0200, Ulrich Obergfell wrote:
> Rename watchdog_suspend() to lockup_detector_suspend() and
> watchdog_resume() to lockup_detector_resume() to avoid
> confusion with the watchdog subsystem and to be consistent
> with the existing name lockup_detector_init().
> 
> Also provide comment blocks to explain the watchdog_running
> and watchdog_suspended variables and their relationship.
> 
> Signed-off-by: Ulrich Obergfell 
> ---
>  arch/x86/kernel/cpu/perf_event_intel.c |  4 ++--
>  include/linux/nmi.h|  8 
>  kernel/watchdog.c  | 26 ++
>  3 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
> b/arch/x86/kernel/cpu/perf_event_intel.c
> index d4e1b0c..0357bf7 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -3368,7 +3368,7 @@ static __init int fixup_ht_bug(void)
>   return 0;
>   }
>  
> - if (watchdog_suspend() != 0) {
> + if (lockup_detector_suspend() != 0) {
>   pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 
> workaround\n");
>   return 0;
>   }
> @@ -3379,7 +3379,7 @@ static __init int fixup_ht_bug(void)
>   x86_pmu.commit_scheduling = NULL;
>   x86_pmu.stop_scheduling = NULL;
>  
> - watchdog_resume();
> + lockup_detector_resume();
>  
>   get_online_cpus();
>  
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 46e28e9e..78488e0 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -84,15 +84,15 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>   void __user *, size_t *, loff_t *);
>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
>void __user *, size_t *, loff_t *);
> -extern int watchdog_suspend(void);
> -extern void watchdog_resume(void);
> +extern int lockup_detector_suspend(void);
> +extern void lockup_detector_resume(void);
>  #else
> -static inline int watchdog_suspend(void)
> +static inline int lockup_detector_suspend(void)
>  {
>   return 0;
>  }
>  
> -static inline void watchdog_resume(void)
> +static inline void lockup_detector_resume(void)
>  {
>  }
>  #endif
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 69666f4..64ed1c3 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -67,8 +67,26 @@ unsigned long *watchdog_cpumask_bits = 
> cpumask_bits(_cpumask);
>  #define for_each_watchdog_cpu(cpu) \
>   for_each_cpu_and((cpu), cpu_online_mask, _cpumask)
>  
> -static int __read_mostly watchdog_suspended;
> +/*
> + * The 'watchdog_running' variable is set to 1 when the watchdog threads
> + * are registered/started and is set to 0 when the watchdog threads are
> + * unregistered/stopped, so it is an indicator whether the threads exist.
> + */
>  static int __read_mostly watchdog_running;
> +/*
> + * If a subsystem has a need to deactivate the watchdog temporarily, it
> + * can use the suspend/resume interface to achieve this. The content of
> + * the 'watchdog_suspended' variable reflects this state. Existing threads
> + * are parked/unparked by the lockup_detector_{suspend|resume} functions
> + * (see comment blocks pertaining to those functions for further details).
> + *
> + * 'watchdog_suspended' also prevents threads from being registered/started
> + * or unregistered/stopped via parameters in /proc/sys/kernel, so the state
> + * of 'watchdog_running' cannot change while the watchdog is deactivated
> + * temporarily (see related code in 'proc' handlers).
> + */
> +static int __read_mostly watchdog_suspended;
> +
>  static u64 __read_mostly sample_period;
>  
>  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
> @@ -669,7 +687,7 @@ static void watchdog_unpark_threads(void)
>  /*
>   * Suspend the hard and soft lockup detector by parking the watchdog threads.
>   */
> -int watchdog_suspend(void)
> +int lockup_detector_suspend(void)
>  {
>   int ret = 0;
>  
> @@ -679,7 +697,7 @@ int watchdog_suspend(void)
>* the 'watchdog_suspended' variable). If the watchdog threads are
>* running, the first caller takes care that they will be parked.
>* The state of 'watchdog_running' cannot change while a suspend
> -  * request is active (see related changes in 'proc' handlers).
> +  * request is active (see related code in 'proc' handlers).
>*/
>   if (watchdog_running && !watchdog_suspended)
>   ret = watchdog_park_threads();
> @@ -695,7 +713,7 @@ int watchdog_suspend(void)
>  /*
>   * Resume the hard and soft lockup detector by unparking the watchdog 
> threads.
>   */
> -void watchdog_resume(void)
> +void lockup_detector_resume(void)
>  {
>   mutex_lock(_proc_mutex);
>  

Reviewed-by: Aaron Tomlin 

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH 1/2] watchdog: rename watchdog_suspend() and watchdog_resume()

2015-08-07 Thread Aaron Tomlin
On Fri 2015-08-07 11:58 +0200, Ulrich Obergfell wrote:
 Rename watchdog_suspend() to lockup_detector_suspend() and
 watchdog_resume() to lockup_detector_resume() to avoid
 confusion with the watchdog subsystem and to be consistent
 with the existing name lockup_detector_init().
 
 Also provide comment blocks to explain the watchdog_running
 and watchdog_suspended variables and their relationship.
 
 Signed-off-by: Ulrich Obergfell uober...@redhat.com
 ---
  arch/x86/kernel/cpu/perf_event_intel.c |  4 ++--
  include/linux/nmi.h|  8 
  kernel/watchdog.c  | 26 ++
  3 files changed, 28 insertions(+), 10 deletions(-)
 
 diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
 b/arch/x86/kernel/cpu/perf_event_intel.c
 index d4e1b0c..0357bf7 100644
 --- a/arch/x86/kernel/cpu/perf_event_intel.c
 +++ b/arch/x86/kernel/cpu/perf_event_intel.c
 @@ -3368,7 +3368,7 @@ static __init int fixup_ht_bug(void)
   return 0;
   }
  
 - if (watchdog_suspend() != 0) {
 + if (lockup_detector_suspend() != 0) {
   pr_info(failed to disable PMU erratum BJ122, BV98, HSD29 
 workaround\n);
   return 0;
   }
 @@ -3379,7 +3379,7 @@ static __init int fixup_ht_bug(void)
   x86_pmu.commit_scheduling = NULL;
   x86_pmu.stop_scheduling = NULL;
  
 - watchdog_resume();
 + lockup_detector_resume();
  
   get_online_cpus();
  
 diff --git a/include/linux/nmi.h b/include/linux/nmi.h
 index 46e28e9e..78488e0 100644
 --- a/include/linux/nmi.h
 +++ b/include/linux/nmi.h
 @@ -84,15 +84,15 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
   void __user *, size_t *, loff_t *);
  extern int proc_watchdog_cpumask(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 -extern int watchdog_suspend(void);
 -extern void watchdog_resume(void);
 +extern int lockup_detector_suspend(void);
 +extern void lockup_detector_resume(void);
  #else
 -static inline int watchdog_suspend(void)
 +static inline int lockup_detector_suspend(void)
  {
   return 0;
  }
  
 -static inline void watchdog_resume(void)
 +static inline void lockup_detector_resume(void)
  {
  }
  #endif
 diff --git a/kernel/watchdog.c b/kernel/watchdog.c
 index 69666f4..64ed1c3 100644
 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -67,8 +67,26 @@ unsigned long *watchdog_cpumask_bits = 
 cpumask_bits(watchdog_cpumask);
  #define for_each_watchdog_cpu(cpu) \
   for_each_cpu_and((cpu), cpu_online_mask, watchdog_cpumask)
  
 -static int __read_mostly watchdog_suspended;
 +/*
 + * The 'watchdog_running' variable is set to 1 when the watchdog threads
 + * are registered/started and is set to 0 when the watchdog threads are
 + * unregistered/stopped, so it is an indicator whether the threads exist.
 + */
  static int __read_mostly watchdog_running;
 +/*
 + * If a subsystem has a need to deactivate the watchdog temporarily, it
 + * can use the suspend/resume interface to achieve this. The content of
 + * the 'watchdog_suspended' variable reflects this state. Existing threads
 + * are parked/unparked by the lockup_detector_{suspend|resume} functions
 + * (see comment blocks pertaining to those functions for further details).
 + *
 + * 'watchdog_suspended' also prevents threads from being registered/started
 + * or unregistered/stopped via parameters in /proc/sys/kernel, so the state
 + * of 'watchdog_running' cannot change while the watchdog is deactivated
 + * temporarily (see related code in 'proc' handlers).
 + */
 +static int __read_mostly watchdog_suspended;
 +
  static u64 __read_mostly sample_period;
  
  static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
 @@ -669,7 +687,7 @@ static void watchdog_unpark_threads(void)
  /*
   * Suspend the hard and soft lockup detector by parking the watchdog threads.
   */
 -int watchdog_suspend(void)
 +int lockup_detector_suspend(void)
  {
   int ret = 0;
  
 @@ -679,7 +697,7 @@ int watchdog_suspend(void)
* the 'watchdog_suspended' variable). If the watchdog threads are
* running, the first caller takes care that they will be parked.
* The state of 'watchdog_running' cannot change while a suspend
 -  * request is active (see related changes in 'proc' handlers).
 +  * request is active (see related code in 'proc' handlers).
*/
   if (watchdog_running  !watchdog_suspended)
   ret = watchdog_park_threads();
 @@ -695,7 +713,7 @@ int watchdog_suspend(void)
  /*
   * Resume the hard and soft lockup detector by unparking the watchdog 
 threads.
   */
 -void watchdog_resume(void)
 +void lockup_detector_resume(void)
  {
   mutex_lock(watchdog_proc_mutex);
  

Reviewed-by: Aaron Tomlin atom...@redhat.com

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH 2/2] watchdog: use pr_debug() in fixup_ht_bug() failure path

2015-08-07 Thread Aaron Tomlin
On Fri 2015-08-07 11:58 +0200, Ulrich Obergfell wrote:
 
 Signed-off-by: Ulrich Obergfell uober...@redhat.com
 ---
  arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kernel/cpu/perf_event_intel.c 
 b/arch/x86/kernel/cpu/perf_event_intel.c
 index 0357bf7..abb25c3 100644
 --- a/arch/x86/kernel/cpu/perf_event_intel.c
 +++ b/arch/x86/kernel/cpu/perf_event_intel.c
 @@ -3369,7 +3369,7 @@ static __init int fixup_ht_bug(void)
   }
  
   if (lockup_detector_suspend() != 0) {
 - pr_info(failed to disable PMU erratum BJ122, BV98, HSD29 
 workaround\n);
 + pr_debug(failed to disable PMU erratum BJ122, BV98, HSD29 
 workaround\n);
   return 0;
   }
  

Reviewed-by: Aaron Tomlin atom...@redhat.com

-- 
Aaron Tomlin


signature.asc
Description: PGP signature


Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions

2015-08-02 Thread Aaron Tomlin
On Sat 2015-08-01 14:49 +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
> 
>   commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
>   Author: Michal Hocko 
>   Date:   Tue Sep 24 15:27:30 2013 -0700
> 
>   watchdog: update watchdog_thresh properly
> 
>   commit b3738d29323344da3017a91010530cf3a58590fc
>   Author: Stephane Eranian 
>   Date:   Mon Nov 17 20:07:03 2014 +0100
> 
>   watchdog: Add watchdog enable/disable all functions
> 
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
> 
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
> 
>   static struct smp_hotplug_thread watchdog_threads = {
>...
>   .park   = watchdog_disable, // calls watchdog_nmi_disable()
>   .unpark = watchdog_enable,  // calls watchdog_nmi_enable()
>   };
> 
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
> 
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
>   park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
>   are intended to be called inside of kernel/watchdog.c only.
> 
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
>   be utilized by external callers to deactivate the hard and soft lockup
>   detector temporarily.
> 
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
>   that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
> 
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
>   was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
> 
> A few corner cases should be mentioned here for completeness.
> 
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
>   However, if watchdog is enabled the lockup will be detected anyway.
> 
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
>   kthread_park(). The occurrence of this scenario should be _very_ rare
>   in practice, in particular because it is not expected that temporary
>   deactivation will happen frequently, and if it happens at all it is
>   expected that the duration of deactivation will be short.
> 
> Ulrich Obergfell (4):
>   watchdog: introduce watchdog_park_threads() and
> watchdog_unpark_threads()
>   watchdog: introduce watchdog_suspend() and watchdog_resume()
>   watchdog: use park/unpark functions in update_watchdog_all_cpus()
>   watchdog: use suspend/resume interface in fixup_ht_bug()
> 
>  arch/x86/kernel/cpu/perf_event_intel.c |   9 +-
>  include/linux/nmi.h|   2 +
>  include/linux/watchdog.h   |   8 --
>  kernel/watchdog.c  | 157 
> +++--
>  4 files changed, 100 insertions(+), 76 deletions(-)
> 

The whole patch series looks good to me.

Reviewed-by: Aaron Tomlin 


signature.asc
Description: PGP signature


Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions

2015-08-02 Thread Aaron Tomlin
On Sat 2015-08-01 14:49 +0200, Ulrich Obergfell wrote:
 Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
 only called in watchdog thread context. However, the following commits
 utilize these functions outside of watchdog thread context too.
 
   commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
   Author: Michal Hocko mho...@suse.cz
   Date:   Tue Sep 24 15:27:30 2013 -0700
 
   watchdog: update watchdog_thresh properly
 
   commit b3738d29323344da3017a91010530cf3a58590fc
   Author: Stephane Eranian eran...@google.com
   Date:   Mon Nov 17 20:07:03 2014 +0100
 
   watchdog: Add watchdog enable/disable all functions
 
 Hence, it is now possible that these functions execute concurrently with
 the same 'cpu' argument. This concurrency is problematic because per-cpu
 'watchdog_ev' can be accessed/modified without adequate synchronization.
 
 The patch series aims to address the above problem. However, instead of
 introducing locks to protect per-cpu 'watchdog_ev' a different approach
 is taken: Invoke these functions by parking and unparking the watchdog
 threads (to ensure they are always called in watchdog thread context).
 
   static struct smp_hotplug_thread watchdog_threads = {
...
   .park   = watchdog_disable, // calls watchdog_nmi_disable()
   .unpark = watchdog_enable,  // calls watchdog_nmi_enable()
   };
 
 Both previously mentioned commits call these functions in a similar way
 and thus in principle contain some duplicate code. The patch series also
 avoids this duplication by providing a commonly usable mechanism.
 
 - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
   park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
   are intended to be called inside of kernel/watchdog.c only.
 
 - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
   be utilized by external callers to deactivate the hard and soft lockup
   detector temporarily.
 
 - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
   that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
 
 - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
   was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
 
 A few corner cases should be mentioned here for completeness.
 
 - kthread_park() of watchdog/N could hang if cpu N is already locked up.
   However, if watchdog is enabled the lockup will be detected anyway.
 
 - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
   kthread_park(). The occurrence of this scenario should be _very_ rare
   in practice, in particular because it is not expected that temporary
   deactivation will happen frequently, and if it happens at all it is
   expected that the duration of deactivation will be short.
 
 Ulrich Obergfell (4):
   watchdog: introduce watchdog_park_threads() and
 watchdog_unpark_threads()
   watchdog: introduce watchdog_suspend() and watchdog_resume()
   watchdog: use park/unpark functions in update_watchdog_all_cpus()
   watchdog: use suspend/resume interface in fixup_ht_bug()
 
  arch/x86/kernel/cpu/perf_event_intel.c |   9 +-
  include/linux/nmi.h|   2 +
  include/linux/watchdog.h   |   8 --
  kernel/watchdog.c  | 157 
 +++--
  4 files changed, 100 insertions(+), 76 deletions(-)
 

The whole patch series looks good to me.

Reviewed-by: Aaron Tomlin atom...@redhat.com


signature.asc
Description: PGP signature


Re: [PATCH 2/2] hung_task: improve the rcu_lock_break() logic

2015-03-20 Thread Aaron Tomlin
On Tue 2015-03-17 20:25 +0100, Oleg Nesterov wrote:
> check_hung_uninterruptible_tasks() stops after rcu_lock_break() if either
> "t" or "g" exits, this is suboptimal.
> 
> If "t" is alive, we can always continue, t->group_leader can be used as the
> new "g". We do not even bother to check g != NULL in this case.
> 
> If "g" is alive, we can at least continue the outer for_each_process() loop.
> 
> Signed-off-by: Oleg Nesterov 
> ---
>  kernel/hung_task.c |   29 -
>  1 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 4735b99..f488059 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -134,20 +134,26 @@ static void check_hung_task(struct task_struct *t, 
> unsigned long timeout)
>   * For preemptible RCU it is sufficient to call rcu_read_unlock in order
>   * to exit the grace period. For classic RCU, a reschedule is required.
>   */
> -static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
> +static void rcu_lock_break(struct task_struct **g, struct task_struct **t)
>  {
> - bool can_cont;
> + bool alive;
> +
> + get_task_struct(*g);
> + get_task_struct(*t);
>  
> - get_task_struct(g);
> - get_task_struct(t);
>   rcu_read_unlock();
>   cond_resched();
>   rcu_read_lock();
> - can_cont = pid_alive(g) && pid_alive(t);
> - put_task_struct(t);
> - put_task_struct(g);
>  
> - return can_cont;
> + alive = pid_alive(*g);
> + put_task_struct(*g);
> + if (!alive)
> + *g = NULL;
> +
> + alive = pid_alive(*t);
> + put_task_struct(*t);
> + if (!alive)
> + *t = NULL;
>  }
>  
>  /*
> @@ -178,7 +184,12 @@ static void check_hung_uninterruptible_tasks(unsigned 
> long timeout)
>  
>   if (!--batch_count) {
>   batch_count = HUNG_TASK_BATCHING;
> - if (!rcu_lock_break(g, t))
> + rcu_lock_break(, );
> + if (t)  /* in case g == NULL */
> + g = t->group_leader;
> + else if (g) /* continue the outer loop */
> + break;
> +     else/* both dead */
>   goto unlock;
>   }
>   /* use "==" to skip the TASK_KILLABLE tasks */

Looks good to me. Thanks.

Acked-by: Aaron Tomlin 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] hung_task: split for_each_process_thread() into for_each_process() + __for_each_thread()

2015-03-20 Thread Aaron Tomlin
On Tue 2015-03-17 20:25 +0100, Oleg Nesterov wrote:
> Preparation. Change the main loop in check_hung_uninterruptible_tasks()
> to use the nested for_each_process() + __for_each_thread() loops explicitly.
> Note that we use __for_each_thread(), not for_each_thread(). This way it
> is clear that the inner loop doesn't depend on 'g' after we read ->signal.
> 
> Signed-off-by: Oleg Nesterov 
> ---
>  kernel/hung_task.c |   23 ++-
>  1 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index e0f90c2..4735b99 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -169,17 +169,22 @@ static void check_hung_uninterruptible_tasks(unsigned 
> long timeout)
>   return;
>  
>   rcu_read_lock();
> - for_each_process_thread(g, t) {
> - if (!max_count--)
> - goto unlock;
> - if (!--batch_count) {
> - batch_count = HUNG_TASK_BATCHING;
> - if (!rcu_lock_break(g, t))
> + for_each_process(g) {
> + struct signal_struct *sig = g->signal;
> +
> + __for_each_thread(sig, t) {
> + if (!max_count--)
>   goto unlock;
> +
> + if (!--batch_count) {
> + batch_count = HUNG_TASK_BATCHING;
> + if (!rcu_lock_break(g, t))
> + goto unlock;
> + }
> + /* use "==" to skip the TASK_KILLABLE tasks */
> + if (t->state == TASK_UNINTERRUPTIBLE)
> + check_hung_task(t, timeout);
>   }
> - /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
> - if (t->state == TASK_UNINTERRUPTIBLE)
> - check_hung_task(t, timeout);
>   }
>   unlock:
>   rcu_read_unlock();
> 

Acked-by: Aaron Tomlin 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] hung_task: improve the rcu_lock_break() logic

2015-03-20 Thread Aaron Tomlin
On Tue 2015-03-17 20:25 +0100, Oleg Nesterov wrote:
 check_hung_uninterruptible_tasks() stops after rcu_lock_break() if either
 t or g exits, this is suboptimal.
 
 If t is alive, we can always continue, t-group_leader can be used as the
 new g. We do not even bother to check g != NULL in this case.
 
 If g is alive, we can at least continue the outer for_each_process() loop.
 
 Signed-off-by: Oleg Nesterov o...@redhat.com
 ---
  kernel/hung_task.c |   29 -
  1 files changed, 20 insertions(+), 9 deletions(-)
 
 diff --git a/kernel/hung_task.c b/kernel/hung_task.c
 index 4735b99..f488059 100644
 --- a/kernel/hung_task.c
 +++ b/kernel/hung_task.c
 @@ -134,20 +134,26 @@ static void check_hung_task(struct task_struct *t, 
 unsigned long timeout)
   * For preemptible RCU it is sufficient to call rcu_read_unlock in order
   * to exit the grace period. For classic RCU, a reschedule is required.
   */
 -static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
 +static void rcu_lock_break(struct task_struct **g, struct task_struct **t)
  {
 - bool can_cont;
 + bool alive;
 +
 + get_task_struct(*g);
 + get_task_struct(*t);
  
 - get_task_struct(g);
 - get_task_struct(t);
   rcu_read_unlock();
   cond_resched();
   rcu_read_lock();
 - can_cont = pid_alive(g)  pid_alive(t);
 - put_task_struct(t);
 - put_task_struct(g);
  
 - return can_cont;
 + alive = pid_alive(*g);
 + put_task_struct(*g);
 + if (!alive)
 + *g = NULL;
 +
 + alive = pid_alive(*t);
 + put_task_struct(*t);
 + if (!alive)
 + *t = NULL;
  }
  
  /*
 @@ -178,7 +184,12 @@ static void check_hung_uninterruptible_tasks(unsigned 
 long timeout)
  
   if (!--batch_count) {
   batch_count = HUNG_TASK_BATCHING;
 - if (!rcu_lock_break(g, t))
 + rcu_lock_break(g, t);
 + if (t)  /* in case g == NULL */
 + g = t-group_leader;
 + else if (g) /* continue the outer loop */
 + break;
 + else/* both dead */
   goto unlock;
   }
   /* use == to skip the TASK_KILLABLE tasks */

Looks good to me. Thanks.

Acked-by: Aaron Tomlin atom...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] hung_task: split for_each_process_thread() into for_each_process() + __for_each_thread()

2015-03-20 Thread Aaron Tomlin
On Tue 2015-03-17 20:25 +0100, Oleg Nesterov wrote:
 Preparation. Change the main loop in check_hung_uninterruptible_tasks()
 to use the nested for_each_process() + __for_each_thread() loops explicitly.
 Note that we use __for_each_thread(), not for_each_thread(). This way it
 is clear that the inner loop doesn't depend on 'g' after we read -signal.
 
 Signed-off-by: Oleg Nesterov o...@redhat.com
 ---
  kernel/hung_task.c |   23 ++-
  1 files changed, 14 insertions(+), 9 deletions(-)
 
 diff --git a/kernel/hung_task.c b/kernel/hung_task.c
 index e0f90c2..4735b99 100644
 --- a/kernel/hung_task.c
 +++ b/kernel/hung_task.c
 @@ -169,17 +169,22 @@ static void check_hung_uninterruptible_tasks(unsigned 
 long timeout)
   return;
  
   rcu_read_lock();
 - for_each_process_thread(g, t) {
 - if (!max_count--)
 - goto unlock;
 - if (!--batch_count) {
 - batch_count = HUNG_TASK_BATCHING;
 - if (!rcu_lock_break(g, t))
 + for_each_process(g) {
 + struct signal_struct *sig = g-signal;
 +
 + __for_each_thread(sig, t) {
 + if (!max_count--)
   goto unlock;
 +
 + if (!--batch_count) {
 + batch_count = HUNG_TASK_BATCHING;
 + if (!rcu_lock_break(g, t))
 + goto unlock;
 + }
 + /* use == to skip the TASK_KILLABLE tasks */
 + if (t-state == TASK_UNINTERRUPTIBLE)
 + check_hung_task(t, timeout);
   }
 - /* use == to skip the TASK_KILLABLE tasks waiting on NFS */
 - if (t-state == TASK_UNINTERRUPTIBLE)
 - check_hung_task(t, timeout);
   }
   unlock:
   rcu_read_unlock();
 

Acked-by: Aaron Tomlin atom...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
On Tue 2015-03-17 18:09 +0100, Oleg Nesterov wrote:
> On 03/17, Aaron Tomlin wrote:
> >
> > --- a/kernel/hung_task.c
> > +++ b/kernel/hung_task.c
> > @@ -169,7 +169,7 @@ static void check_hung_uninterruptible_tasks(unsigned 
> > long timeout)
> > return;
> >  
> > rcu_read_lock();
> > -   do_each_thread(g, t) {
> > +   for_each_process_thread(g, t) {
> > if (!max_count--)
> > goto unlock;
> > if (!--batch_count) {
> > @@ -180,7 +180,7 @@ static void check_hung_uninterruptible_tasks(unsigned 
> > long timeout)
> > /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
> > if (t->state == TASK_UNINTERRUPTIBLE)
> > check_hung_task(t, timeout);
> > -   } while_each_thread(g, t);
> > +   }
> 
> 
> Acked-by: Oleg Nesterov 
> 
> 
> 
> Perhaps it also makes sense to improve this rcu_lock_break a bit...
> For example, if 't' is dead but 'g' is alive we can continue the
> "for_each_process" part of this double loop. And if 't' is still
> alive then we can find the new leader and continue...

OK - I'll incorporate that in a separate patch.

Thanks,

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
In check_hung_uninterruptible_tasks() avoid the use of deprecated
while_each_thread().

The "max_count" logic will prevents a livelock - see commit 0c740d0a
("introduce for_each_thread() to replace the buggy while_each_thread()").
Having said this let's use for_each_process_thread().

Signed-off-by: Aaron Tomlin 
---
 kernel/hung_task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db124..e0f90c2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -169,7 +169,7 @@ static void check_hung_uninterruptible_tasks(unsigned long 
timeout)
return;
 
rcu_read_lock();
-   do_each_thread(g, t) {
+   for_each_process_thread(g, t) {
if (!max_count--)
goto unlock;
if (!--batch_count) {
@@ -180,7 +180,7 @@ static void check_hung_uninterruptible_tasks(unsigned long 
timeout)
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, timeout);
-   } while_each_thread(g, t);
+   }
  unlock:
rcu_read_unlock();
 }
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
Hi Andrew,


Further work is required to improve khungtaskd. I'll do this later
but for now let's start with this trivial clean up.


Aaron Tomlin (1):
  hung_task: Change hung_task.c to use for_each_process_thread()

 kernel/hung_task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
On Tue 2015-03-17 18:09 +0100, Oleg Nesterov wrote:
 On 03/17, Aaron Tomlin wrote:
 
  --- a/kernel/hung_task.c
  +++ b/kernel/hung_task.c
  @@ -169,7 +169,7 @@ static void check_hung_uninterruptible_tasks(unsigned 
  long timeout)
  return;
   
  rcu_read_lock();
  -   do_each_thread(g, t) {
  +   for_each_process_thread(g, t) {
  if (!max_count--)
  goto unlock;
  if (!--batch_count) {
  @@ -180,7 +180,7 @@ static void check_hung_uninterruptible_tasks(unsigned 
  long timeout)
  /* use == to skip the TASK_KILLABLE tasks waiting on NFS */
  if (t-state == TASK_UNINTERRUPTIBLE)
  check_hung_task(t, timeout);
  -   } while_each_thread(g, t);
  +   }
 
 
 Acked-by: Oleg Nesterov o...@redhat.com
 
 
 
 Perhaps it also makes sense to improve this rcu_lock_break a bit...
 For example, if 't' is dead but 'g' is alive we can continue the
 for_each_process part of this double loop. And if 't' is still
 alive then we can find the new leader and continue...

OK - I'll incorporate that in a separate patch.

Thanks,

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
In check_hung_uninterruptible_tasks() avoid the use of deprecated
while_each_thread().

The max_count logic will prevents a livelock - see commit 0c740d0a
(introduce for_each_thread() to replace the buggy while_each_thread()).
Having said this let's use for_each_process_thread().

Signed-off-by: Aaron Tomlin atom...@redhat.com
---
 kernel/hung_task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06db124..e0f90c2 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -169,7 +169,7 @@ static void check_hung_uninterruptible_tasks(unsigned long 
timeout)
return;
 
rcu_read_lock();
-   do_each_thread(g, t) {
+   for_each_process_thread(g, t) {
if (!max_count--)
goto unlock;
if (!--batch_count) {
@@ -180,7 +180,7 @@ static void check_hung_uninterruptible_tasks(unsigned long 
timeout)
/* use == to skip the TASK_KILLABLE tasks waiting on NFS */
if (t-state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, timeout);
-   } while_each_thread(g, t);
+   }
  unlock:
rcu_read_unlock();
 }
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/1] hung_task: Change hung_task.c to use for_each_process_thread()

2015-03-17 Thread Aaron Tomlin
Hi Andrew,


Further work is required to improve khungtaskd. I'll do this later
but for now let's start with this trivial clean up.


Aaron Tomlin (1):
  hung_task: Change hung_task.c to use for_each_process_thread()

 kernel/hung_task.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 01/32] do_fork(): Rename 'stack_size' argument to reflect actual use

2015-03-13 Thread Aaron Tomlin
On Fri 2015-03-13 20:04 +0200, Alex Dowad wrote:
> The 'stack_size' argument is never used to pass a stack size. It's only used 
> when
> forking a kernel thread, in which case it is an argument which should be 
> passed
> to the 'main' function which the kernel thread executes. Hence, rename it to
> 'kthread_arg'.
> 
> Signed-off-by: Alex Dowad 
> ---

AFAICT this clean up looks OK and should improve readability. Thanks.

-- 
Aaron Tomlin


pgp4xNeaAV5NO.pgp
Description: PGP signature


Re: [PATCH 01/32] do_fork(): Rename 'stack_size' argument to reflect actual use

2015-03-13 Thread Aaron Tomlin
On Fri 2015-03-13 20:04 +0200, Alex Dowad wrote:
 The 'stack_size' argument is never used to pass a stack size. It's only used 
 when
 forking a kernel thread, in which case it is an argument which should be 
 passed
 to the 'main' function which the kernel thread executes. Hence, rename it to
 'kthread_arg'.
 
 Signed-off-by: Alex Dowad alexinbeij...@gmail.com
 ---

AFAICT this clean up looks OK and should improve readability. Thanks.

-- 
Aaron Tomlin


pgp4xNeaAV5NO.pgp
Description: PGP signature


Re: [tracer branch] kernel BUG at kernel/sched/core.c:2697!

2014-11-06 Thread Aaron Tomlin
On Mon 2014-11-03 15:06 +0800, Fengguang Wu wrote:
> Hi Aaron,
> 
> FYI your patch triggered a BUG on an existing old bug.

Oh right. Good to know :)

> Let's hope it provides more info to debug the problem.

Hopefully :)

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [tracer branch] kernel BUG at kernel/sched/core.c:2697!

2014-11-06 Thread Aaron Tomlin
On Mon 2014-11-03 15:06 +0800, Fengguang Wu wrote:
 Hi Aaron,
 
 FYI your patch triggered a BUG on an existing old bug.

Oh right. Good to know :)

 Let's hope it provides more info to debug the problem.

Hopefully :)

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysctl: terminate strings also on \r

2014-10-22 Thread Aaron Tomlin
On Tue, Oct 21, 2014 at 01:21:37PM -0700, Kees Cook wrote:
> From: Paul Wise 
> 
> This partially mitigates a common strategy used by attackers for hiding
> the full contents of strings in procfs from naive sysadmins who use cat,
> more or sysctl to inspect the contents of strings in procfs.
> 
> References: 
> http://www.jakoblell.com/blog/2014/05/07/hacking-contest-hiding-stuff-from-the-terminal/
> Signed-off-by: Paul Wise 
> Signed-off-by: Kees Cook 
> ---
>  kernel/sysctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4aada6d9fe74..c34c9414caac 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1739,7 +1739,7 @@ static int _proc_do_string(char *data, int maxlen, int 
> write,
>   while ((p - buffer) < *lenp && len < maxlen - 1) {
>   if (get_user(c, p++))
>   return -EFAULT;
> - if (c == 0 || c == '\n')
> + if (c == 0 || c == '\n' || c == '\r')
>       break;
>   data[len++] = c;
>   }
> -- 

Acked-by: Aaron Tomlin 

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sysctl: terminate strings also on \r

2014-10-22 Thread Aaron Tomlin
On Tue, Oct 21, 2014 at 01:21:37PM -0700, Kees Cook wrote:
 From: Paul Wise pa...@bonedaddy.net
 
 This partially mitigates a common strategy used by attackers for hiding
 the full contents of strings in procfs from naive sysadmins who use cat,
 more or sysctl to inspect the contents of strings in procfs.
 
 References: 
 http://www.jakoblell.com/blog/2014/05/07/hacking-contest-hiding-stuff-from-the-terminal/
 Signed-off-by: Paul Wise pa...@bonedaddy.net
 Signed-off-by: Kees Cook keesc...@chromium.org
 ---
  kernel/sysctl.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/kernel/sysctl.c b/kernel/sysctl.c
 index 4aada6d9fe74..c34c9414caac 100644
 --- a/kernel/sysctl.c
 +++ b/kernel/sysctl.c
 @@ -1739,7 +1739,7 @@ static int _proc_do_string(char *data, int maxlen, int 
 write,
   while ((p - buffer)  *lenp  len  maxlen - 1) {
   if (get_user(c, p++))
   return -EFAULT;
 - if (c == 0 || c == '\n')
 + if (c == 0 || c == '\n' || c == '\r')
   break;
   data[len++] = c;
   }
 -- 

Acked-by: Aaron Tomlin atom...@redhat.com

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] fs: Use a seperate wq for do_sync_work() to avoid a potential deadlock

2014-09-19 Thread Aaron Tomlin
On Thu, Sep 18, 2014 at 07:16:13AM +1000, Dave Chinner wrote:
> >   - Both lru_add_drain and do_sync_work work items are added to
> > the same global system_wq
> > 
> >   - The current work fn on the system_wq is do_sync_work and is
> > blocked waiting to aquire an sb's s_umount for reading
> > 
> >   - The umount task is the current owner of the s_umount in
> > question but is waiting for do_sync_work to continue.
> > Thus we hit a deadlock situation.
> 
> What kernel did you see this deadlock on?

Sorry for the noise. This deadlock was produced under a kernel whereby
the workqueue implementation is significantly less sophisticated.

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix end_of_stack() fn and location of stack canary for archs using STACK_GROWSUP

2014-09-19 Thread Aaron Tomlin
On Fri, Sep 19, 2014 at 12:21:37PM +0100, James Hogan wrote:
> Hi Chuck,
> 
> On 16/09/14 08:37, Chuck Ebbert wrote:
> >  static inline unsigned long *end_of_stack(struct task_struct *p)
> >  {
> > +#ifdef CONFIG_STACK_GROWSUP
> > +   return (unsigned long *)((unsigned long)task_thread_info(p) + 
> > THREAD_SIZE) - 1;
> 
> Nit: this line should probably be wrapped to 80 columns.
> 
> Other than that, I've tested this on metag and can confirm that it fixes
> the following BUG which you would otherwise get during boot with Aaron's
> patches:
> 
> BUG: failure at kernel/sched/core.c:2664/schedule_debug()!
> Kernel panic - not syncing: BUG!
> 
> Tested-by: James Hogan  [metag]
> Acked-by: James Hogan 

OK.

Acked-by: Aaron Tomlin 

> Aaron: please can you try to get this patch applied before your patch
> series.

Ingo,


I hope it's not too late to get this patch in (once the nit has been
addressed) for CONFIG_STACK_GROWSUP?

Regards,

-- 
Aaron Tomlin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:sched/core] init/main.c: Give init_task a canary

2014-09-19 Thread tip-bot for Aaron Tomlin
Commit-ID:  d4311ff1a8da48d609db9500f121c15580dfeeb7
Gitweb: http://git.kernel.org/tip/d4311ff1a8da48d609db9500f121c15580dfeeb7
Author: Aaron Tomlin 
AuthorDate: Fri, 12 Sep 2014 14:16:17 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Sep 2014 12:35:22 +0200

init/main.c: Give init_task a canary

Tasks get their end of stack set to STACK_END_MAGIC with the
aim to catch stack overruns. Currently this feature does not
apply to init_task. This patch removes this restriction.

Note that a similar patch was posted by Prarit Bhargava
some time ago but was never merged:

  http://marc.info/?l=linux-kernel=127144305403241=2

Signed-off-by: Aaron Tomlin 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Oleg Nesterov 
Acked-by: Michael Ellerman 
Cc: aneesh.ku...@linux.vnet.ibm.com
Cc: dzic...@redhat.com
Cc: b...@redhat.com
Cc: jcasti...@redhat.com
Cc: j...@redhat.com
Cc: minc...@kernel.org
Cc: t...@linutronix.de
Cc: han...@cmpxchg.org
Cc: Alex Thorlton 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Daeseok Youn 
Cc: David Rientjes 
Cc: Fabian Frederick 
Cc: Geert Uytterhoeven 
Cc: Jiri Olsa 
Cc: Kees Cook 
Cc: Kirill A. Shutemov 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Michael Opdenacker 
Cc: Paul Mackerras 
Cc: Prarit Bhargava 
Cc: Rik van Riel 
Cc: Rusty Russell 
Cc: Seiji Aguchi 
Cc: Steven Rostedt 
Cc: Vladimir Davydov 
Cc: Yasuaki Ishimatsu 
Cc: linuxppc-...@lists.ozlabs.org
Link: 
http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atom...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/powerpc/mm/fault.c|  3 +--
 arch/x86/mm/fault.c|  3 +--
 include/linux/sched.h  |  2 ++
 init/main.c|  1 +
 kernel/fork.c  | 12 +---
 kernel/trace/trace_stack.c |  4 +---
 6 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 51ab9e7..35d0760c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -538,7 +537,7 @@ void bad_page_fault(struct pt_regs *regs, unsigned long 
address, int sig)
regs->nip);
 
stackend = end_of_stack(current);
-   if (current != _task && *stackend != STACK_END_MAGIC)
+   if (*stackend != STACK_END_MAGIC)
printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
 
die("Kernel access of bad area", regs, sig);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a241946..bc23a70 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -3,7 +3,6 @@
  *  Copyright (C) 2001, 2002 Andi Kleen, SuSE Labs.
  *  Copyright (C) 2008-2009, Red Hat Inc., Ingo Molnar
  */
-#include/* STACK_END_MAGIC  */
 #include/* test_thread_flag(), ...  */
 #include   /* oops_begin/end, ...  */
 #include   /* search_exception_table   */
@@ -710,7 +709,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
show_fault_oops(regs, error_code, address);
 
stackend = end_of_stack(tsk);
-   if (tsk != _task && *stackend != STACK_END_MAGIC)
+   if (*stackend != STACK_END_MAGIC)
printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
 
tsk->thread.cr2 = address;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 82ff3d6..118dca7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -57,6 +57,7 @@ struct sched_param {
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -2638,6 +2639,7 @@ static inline unsigned long stack_not_used(struct 
task_struct *p)
return (unsigned long)n - (unsigned long)end_of_stack(p);
 }
 #endif
+extern void set_task_stack_end_magic(struct task_struct *tsk);
 
 /* set thread flags in other task's structures
  * - see asm/thread_info.h for TIF_ flags available
diff --git a/init/main.c b/init/main.c
index bb1aed9..5fc3fc7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -508,6 +508,7 @@ asmlinkage __visible void __init start_kernel(void)
 * lockdep hash:
 */
lockdep_init();
+   set_task_stack_end_magic(_task);
smp_setup_processor_id();
debug_objects_early_init();
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 9387ae8..ad64248 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -294,11 +294,18 @@ int __weak arch_dup_task_struct(struct task_struct *dst,
return 0;
 }
 
+void set_task_stack_end_magic(struct task_struct *tsk)
+{
+   unsigned long *stackend;
+
+   stackend = end_of_stack(tsk);
+   *stackend = STACK_END_MAGIC;/* for overflow detection */
+}
+
 static struct task_struct *dup_task_struct(struct task_struct *orig)
 {
struct task_struct *tsk;
struct thread_info *ti;
-   unsigned long *stackend;
int node = tsk_fork_get_node(orig);
int 

[tip:sched/core] sched: Add default-disabled option to BUG() when stack end location is overwritten

2014-09-19 Thread tip-bot for Aaron Tomlin
Commit-ID:  0d9e26329b0c9263d4d9e0422d80a0e73268c52f
Gitweb: http://git.kernel.org/tip/0d9e26329b0c9263d4d9e0422d80a0e73268c52f
Author: Aaron Tomlin 
AuthorDate: Fri, 12 Sep 2014 14:16:19 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Sep 2014 12:35:24 +0200

sched: Add default-disabled option to BUG() when stack end location is 
overwritten

Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

This patch checks for a stack overrun and takes appropriate
action since the damage is already done, there is no point
in continuing.

Signed-off-by: Aaron Tomlin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: aneesh.ku...@linux.vnet.ibm.com
Cc: dzic...@redhat.com
Cc: b...@redhat.com
Cc: jcasti...@redhat.com
Cc: o...@redhat.com
Cc: r...@redhat.com
Cc: pra...@redhat.com
Cc: j...@redhat.com
Cc: minc...@kernel.org
Cc: m...@ellerman.id.au
Cc: t...@linutronix.de
Cc: rost...@goodmis.org
Cc: han...@cmpxchg.org
Cc: Alexei Starovoitov 
Cc: Al Viro 
Cc: Andi Kleen 
Cc: Andrew Morton 
Cc: Dan Streetman 
Cc: Davidlohr Bueso 
Cc: David S. Miller 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Lubomir Rintel 
Cc: Paul E. McKenney 
Link: 
http://lkml.kernel.org/r/1410527779-8133-4-git-send-email-atom...@redhat.com
Signed-off-by: Ingo Molnar 
---
 kernel/sched/core.c |  3 +++
 lib/Kconfig.debug   | 12 
 2 files changed, 15 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b1ddeb..61ee2b3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2693,6 +2693,9 @@ static noinline void __schedule_bug(struct task_struct 
*prev)
  */
 static inline void schedule_debug(struct task_struct *prev)
 {
+#ifdef CONFIG_SCHED_STACK_END_CHECK
+   BUG_ON(unlikely(task_stack_end_corrupted(prev)));
+#endif
/*
 * Test if we are atomic. Since do_exit() needs to call into
 * schedule() atomically, we ignore that path. Otherwise whine
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a285900..e58163d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -824,6 +824,18 @@ config SCHEDSTATS
  application, you can say N to avoid the very slight overhead
  this adds.
 
+config SCHED_STACK_END_CHECK
+   bool "Detect stack corruption on calls to schedule()"
+   depends on DEBUG_KERNEL
+   default n
+   help
+ This option checks for a stack overrun on calls to schedule().
+ If the stack end location is found to be over written always panic as
+ the content of the corrupted region can no longer be trusted.
+ This is to ensure no erroneous behaviour occurs which could result in
+ data corruption or a sporadic crash at a later stage once the region
+ is examined. The runtime overhead introduced is minimal.
+
 config TIMER_STATS
bool "Collect kernel timers statistics"
depends on DEBUG_KERNEL && PROC_FS
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:sched/core] sched: Add helper for task stack page overrun checking

2014-09-19 Thread tip-bot for Aaron Tomlin
Commit-ID:  a70857e46dd13e87ae06bf0e64cb6a2d4f436265
Gitweb: http://git.kernel.org/tip/a70857e46dd13e87ae06bf0e64cb6a2d4f436265
Author: Aaron Tomlin 
AuthorDate: Fri, 12 Sep 2014 14:16:18 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Sep 2014 12:35:23 +0200

sched: Add helper for task stack page overrun checking

This facility is used in a few places so let's introduce
a helper function to improve code readability.

Signed-off-by: Aaron Tomlin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: aneesh.ku...@linux.vnet.ibm.com
Cc: dzic...@redhat.com
Cc: b...@redhat.com
Cc: jcasti...@redhat.com
Cc: o...@redhat.com
Cc: r...@redhat.com
Cc: pra...@redhat.com
Cc: j...@redhat.com
Cc: minc...@kernel.org
Cc: m...@ellerman.id.au
Cc: t...@linutronix.de
Cc: han...@cmpxchg.org
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Seiji Aguchi 
Cc: Steven Rostedt 
Cc: Yasuaki Ishimatsu 
Cc: linuxppc-...@lists.ozlabs.org
Link: 
http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atom...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/powerpc/mm/fault.c| 4 +---
 arch/x86/mm/fault.c| 4 +---
 include/linux/sched.h  | 2 ++
 kernel/trace/trace_stack.c | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 35d0760c..99b2f27 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -507,7 +507,6 @@ bail:
 void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 {
const struct exception_table_entry *entry;
-   unsigned long *stackend;
 
/* Are we prepared to handle this fault?  */
if ((entry = search_exception_tables(regs->nip)) != NULL) {
@@ -536,8 +535,7 @@ void bad_page_fault(struct pt_regs *regs, unsigned long 
address, int sig)
printk(KERN_ALERT "Faulting instruction address: 0x%08lx\n",
regs->nip);
 
-   stackend = end_of_stack(current);
-   if (*stackend != STACK_END_MAGIC)
+   if (task_stack_end_corrupted(current))
printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
 
die("Kernel access of bad area", regs, sig);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bc23a70..6240bc7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -648,7 +648,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
   unsigned long address, int signal, int si_code)
 {
struct task_struct *tsk = current;
-   unsigned long *stackend;
unsigned long flags;
int sig;
 
@@ -708,8 +707,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
show_fault_oops(regs, error_code, address);
 
-   stackend = end_of_stack(tsk);
-   if (*stackend != STACK_END_MAGIC)
+   if (task_stack_end_corrupted(tsk))
printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
 
tsk->thread.cr2 = address;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 118dca7..18f5262 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2617,6 +2617,8 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 }
 
 #endif
+#define task_stack_end_corrupted(task) \
+   (*(end_of_stack(task)) != STACK_END_MAGIC)
 
 static inline int object_is_on_stack(void *obj)
 {
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 1636e41..16eddb3 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -170,7 +170,7 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}
 
-   if (*end_of_stack(current) != STACK_END_MAGIC) {
+   if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >