Re: [PATCH] Add slowpath enter/exit trace events

2017-11-24 Thread peter enderborg
On 11/23/2017 03:01 PM, Michal Hocko wrote:
> I am not sure adding a probe on a production system will fly in many
> cases. A static tracepoint would be much easier in that case. But I
> agree there are other means to accomplish the same thing. My main point
> was to have an easy out-of-the-box way to check latencies. But that is
> not something I would really insist on.
>
In Android, tracefs (or part of it) is the way the system controls what 
developers can access in the Linux system on commercial devices. So it is very 
much used on production systems; it is even a requirement from Google for a 
device to be certified as Android. Things like dmesg are not. However, this 
probe is at the moment not in that scope. 

My point is that you need to condense the information as much as possible, 
while still keeping it useful, before making the effort to copy it to 
userspace. And for this, trace events are very useful on small systems, since 
the cost is very low for events where no one is listening.
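
For instance (a sketch; the event names come from the patch below, and the
tracefs mount point varies between devices):

  mount -t tracefs nodev /sys/kernel/tracing 2>/dev/null
  echo 1 > /sys/kernel/tracing/events/kmem/mm_page_alloc_slowpath_enter/enable
  echo 1 > /sys/kernel/tracing/events/kmem/mm_page_alloc_slowpath_exit/enable
  cat /sys/kernel/tracing/trace_pipe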



Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread peter enderborg
On 11/23/2017 02:43 PM, Tetsuo Handa wrote:
> Please see my attempt at
> http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp .
> Printing just the current thread is not sufficient for me.
>
>
That seems to me to be a lot more overhead, with timers and such.
My probe is about the health of the system, trying to capture who gets the 
penalty and how large it is. A slowpath alloc in an audio stream can cause 
drop-outs, and those are very hard to debug in userspace.



Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Mel Gorman
On Thu, Nov 23, 2017 at 03:01:27PM +0100, Michal Hocko wrote:
> On Thu 23-11-17 13:36:29, Mel Gorman wrote:
> > On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
> > > On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
> > > > From: Peter Enderborg 
> > > > 
> > > > The warning about slow allocations has been removed; this is
> > > > another way to fetch that information, but you need to
> > > > enable the trace. The exit event also reports
> > > > the number of retries, how long the allocation
> > > > stalled, and the failure reason if it failed.
> > > 
> > > I think this is just too excessive. We already have a tracepoint for the
> > > allocation exit. All we need is an entry event to have a baseline to
> > > compare with. Another use case would be to measure allocation latency. The
> > > information you are adding can be (partially) covered by existing
> > > tracepoints.
> > > 
> > 
> > You can gather that by simply adding a probe to __alloc_pages_slowpath
> > (like what perf probe does) and matching the trigger with the existing
> > mm_page_alloc points.
> 
> I am not sure adding a probe on a production system will fly in many
> cases.

Not sure why not, considering that the final mechanism is very similar.

> A static tracepoint would be much easier in that case.

Sure, but if it's really only latencies that are a concern, then a probe
would do the job without backports.

> But I
> agree there are other means to accomplish the same thing. My main point
> was to have an easy out-of-the-box way to check latencies. But that is
> not something I would really insist on.
> 

An entry tracepoint is enough for just latencies, punting the analysis
to userspace and the state tracking to either systemtap or userspace. If
it's userspace, then the tool would need to never malloc once analysis
starts, and it would need to be mlocked.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Michal Hocko
On Thu 23-11-17 13:36:29, Mel Gorman wrote:
> On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
> > On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
> > > From: Peter Enderborg 
> > > 
> > > The warning about slow allocations has been removed; this is
> > > another way to fetch that information, but you need to
> > > enable the trace. The exit event also reports
> > > the number of retries, how long the allocation
> > > stalled, and the failure reason if it failed.
> > 
> > I think this is just too excessive. We already have a tracepoint for the
> > allocation exit. All we need is an entry event to have a baseline to
> > compare with. Another use case would be to measure allocation latency. The
> > information you are adding can be (partially) covered by existing
> > tracepoints.
> > 
> 
> You can gather that by simply adding a probe to __alloc_pages_slowpath
> (like what perf probe does) and matching the trigger with the existing
> mm_page_alloc points.

I am not sure adding a probe on a production system will fly in many
cases. A static tracepoint would be much easier in that case. But I
agree there are other means to accomplish the same thing. My main point
was to have an easy out-of-the-box way to check latencies. But that is
not something I would really insist on.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Michal Hocko
On Thu 23-11-17 14:03:04, peter enderborg wrote:
> On 11/23/2017 01:47 PM, Michal Hocko wrote:
> >
> > This might be true, but the other POV is that the tracepoint with the
> > additional information is just too disruptive to the rest of the code,
> > and it exposes too many implementation details.
> 
> From whom do you want to hide details? Is this a security thing? I
> don't understand this argument. Tracefs is not part of uapi,
> right?

Linus would disagree: https://lwn.net/Articles/737530/

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Mel Gorman
On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
> On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
> > From: Peter Enderborg 
> > 
> > The warning about slow allocations has been removed; this is
> > another way to fetch that information, but you need to
> > enable the trace. The exit event also reports
> > the number of retries, how long the allocation
> > stalled, and the failure reason if it failed.
> 
> I think this is just too excessive. We already have a tracepoint for the
> allocation exit. All we need is an entry event to have a baseline to
> compare with. Another use case would be to measure allocation latency. The
> information you are adding can be (partially) covered by existing
> tracepoints.
> 

You can gather that by simply adding a probe to __alloc_pages_slowpath
(like what perf probe does) and matching the trigger with the existing
mm_page_alloc points. This is a bit approximate because you would need
to filter mm_page_alloc hits that do not have a corresponding hit with
__alloc_pages_slowpath, but that is easy.
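
For example, roughly (a sketch, untested; the probe creation and recording
are standard perf usage, the pairing of events is your own post-processing):

  # dynamic probe standing in for the missing entry tracepoint
  perf probe --add __alloc_pages_slowpath

  # record it system-wide alongside the existing tracepoint
  perf record -e probe:__alloc_pages_slowpath -e kmem:mm_page_alloc -a sleep 10

  # dump interleaved events; pair each mm_page_alloc with the preceding
  # probe hit on the same task to approximate the slowpath latency
  perf script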

With that probe, it's trivial to use systemtap to track the latencies between
those points on a per-process basis and then only do a dump_stack from
systemtap for the ones that are above a particular threshold. This can all
be done without introducing state-tracking code into the page allocator
that is active regardless of whether the tracepoint is in use. It also
has the benefit of working with many older kernels.
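
Something along these lines (a minimal, untested sketch; it takes the latency
from a function-return probe rather than from mm_page_alloc, and the 100ms
threshold is arbitrary):

  stap -e '
  global entry
  probe kernel.function("__alloc_pages_slowpath") {
          entry[tid()] = gettimeofday_us()
  }
  probe kernel.function("__alloc_pages_slowpath").return {
          if (tid() in entry) {
                  delay = gettimeofday_us() - entry[tid()]
                  delete entry[tid()]
                  # only stacks for allocations stalled beyond the threshold
                  if (delay > 100000) {
                          printf("%s[%d] stalled %d us\n", execname(), tid(), delay)
                          print_backtrace()
                  }
          }
  }'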

If systemtap is not an option then use ftrace directly to gather the
information from userspace. It can be done via trace_pipe with some overhead
or on a per-cpu basis like what trace-cmd does. It's important to note
that even *if* the tracepoints were introduced, it would be necessary
to have something gather the information and report it in a sensible fashion.
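
Roughly (a sketch; assumes debugfs is mounted at /sys/kernel/debug and the
kprobe name slowpath_enter is arbitrary):

  cd /sys/kernel/debug/tracing
  echo 'p:slowpath_enter __alloc_pages_slowpath' >> kprobe_events
  echo 1 > events/kprobes/slowpath_enter/enable
  echo 1 > events/kmem/mm_page_alloc/enable
  cat trace_pipe      # simple, but copies every event to userspace

  # or gather per-cpu buffers with less overhead:
  trace-cmd record -e kprobes:slowpath_enter -e kmem:mm_page_alloc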

That probe+mm_page_alloc can tell you the frequency of allocation
attempts that take a long time, but not why. Compaction and direct
reclaim latencies can be checked via existing tracepoints and in the case
of compaction, detailed information can also be gathered from existing
tracepoints. Detailed information on why direct reclaim stalled can be
harder but the biggest one is checking if reclaim stalls due to congestion
and again, tracepoints already exist for that.

I'm not convinced that a new tracepoint is needed.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Tetsuo Handa
On 2017/11/23 22:36, Mel Gorman wrote:
> On Thu, Nov 23, 2017 at 01:25:30PM +0100, Michal Hocko wrote:
>> On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
>>> From: Peter Enderborg 
>>>
>>> The warning about slow allocations has been removed; this is
>>> another way to fetch that information, but you need to
>>> enable the trace. The exit event also reports
>>> the number of retries, how long the allocation
>>> stalled, and the failure reason if it failed.
>>
>> I think this is just too excessive. We already have a tracepoint for the
>> allocation exit. All we need is an entry event to have a baseline to
>> compare with. Another use case would be to measure allocation latency. The
>> information you are adding can be (partially) covered by existing
>> tracepoints.
>>
> 
> You can gather that by simply adding a probe to __alloc_pages_slowpath
> (like what perf probe does) and matching the trigger with the existing
> mm_page_alloc points. This is a bit approximate because you would need
> to filter mm_page_alloc hits that do not have a corresponding hit with
> __alloc_pages_slowpath, but that is easy.
> 
> With that probe, it's trivial to use systemtap to track the latencies between
> those points on a per-process basis and then only do a dump_stack from
> systemtap for the ones that are above a particular threshold. This can all
> be done without introducing state-tracking code into the page allocator
> that is active regardless of whether the tracepoint is in use. It also
> has the benefit of working with many older kernels.

Please see my attempt at
http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp .
Printing just the current thread is not sufficient for me.



Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread peter enderborg
On 11/23/2017 01:47 PM, Michal Hocko wrote:
>
> This might be true, but the other POV is that the tracepoint with the
> additional information is just too disruptive to the rest of the code,
> and it exposes too many implementation details.

From whom do you want to hide details? Is this a security thing? I don't 
understand this argument. Tracefs is not part of uapi, right?

Hopefully there are not that many failures, and they can be very hard to 
reproduce if you don't know what you are looking for.



Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Tetsuo Handa
On 2017/11/23 19:43, peter.enderb...@sony.com wrote:
> The warning about slow allocations has been removed; this is
> another way to fetch that information, but you need to
> enable the trace. The exit event also reports
> the number of retries, how long the allocation
> stalled, and the failure reason if it failed.

However, the fast path (I mean, get_page_from_freelist() at the
"/* First allocation attempt */" label) might be slow as well, for it is
allowed to call node_reclaim(), which can take an uncontrollable amount of
time. I think that you need to add hooks like
http://lkml.kernel.org/r/1510833448-19918-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp does. ;-)


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Michal Hocko
On Thu 23-11-17 13:35:15, peter enderborg wrote:
> Monitoring both enter/exit for all allocations and tracking down the ones
> that are slow would be a very big load on mobile or embedded devices,
> consuming a lot of battery and CPU. With this we can do useful monitoring
> on devices in our field tests with real usage.

This might be true, but the other POV is that the tracepoint with the
additional information is just too disruptive to the rest of the code,
and it exposes too many implementation details.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread peter enderborg
Monitoring both enter/exit for all allocations and tracking down the ones that 
are slow would be a very big load on mobile or embedded devices, consuming a 
lot of battery and CPU. With this we can do useful monitoring on devices in 
our field tests with real usage.

On 11/23/2017 01:25 PM, Michal Hocko wrote:
> On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
>> From: Peter Enderborg 
>>
>> The warning about slow allocations has been removed; this is
>> another way to fetch that information, but you need to
>> enable the trace. The exit event also reports
>> the number of retries, how long the allocation
>> stalled, and the failure reason if it failed.
> I think this is just too excessive. We already have a tracepoint for the
> allocation exit. All we need is an entry event to have a baseline to
> compare with. Another use case would be to measure allocation latency. The
> information you are adding can be (partially) covered by existing
> tracepoints.
>
>> Signed-off-by: Peter Enderborg 
>> ---
>>  include/trace/events/kmem.h | 68 ++++++++++++++++++++++++++++++++++++++++
>>  mm/page_alloc.c             | 62 +++++++++++++++++++++++++++++----------
>>  2 files changed, 116 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
>> index eb57e30..bb882ca 100644
>> --- a/include/trace/events/kmem.h
>> +++ b/include/trace/events/kmem.h
>> @@ -315,6 +315,74 @@ TRACE_EVENT(mm_page_alloc_extfrag,
>>  __entry->change_ownership)
>>  );
>>  
>> +TRACE_EVENT(mm_page_alloc_slowpath_enter,
>> +
>> +TP_PROTO(int alloc_order,
>> +nodemask_t *nodemask,
>> +gfp_t gfp_flags),
>> +
>> +TP_ARGS(alloc_order, nodemask, gfp_flags),
>> +
>> +TP_STRUCT__entry(
>> +__field(int, alloc_order)
>> +__field(nodemask_t *, nodemask)
>> +__field(gfp_t, gfp_flags)
>> + ),
>> +
>> + TP_fast_assign(
>> +__entry->alloc_order= alloc_order;
>> +__entry->nodemask   = nodemask;
>> +__entry->gfp_flags  = gfp_flags;
>> + ),
>> +
>> + TP_printk("alloc_order=%d nodemask=%*pbl gfp_flags=%s",
>> +__entry->alloc_order,
>> +nodemask_pr_args(__entry->nodemask),
>> +show_gfp_flags(__entry->gfp_flags))
>> +);
>> +
>> +TRACE_EVENT(mm_page_alloc_slowpath_exit,
>> +
>> +TP_PROTO(struct page *page,
>> +int alloc_order,
>> +nodemask_t *nodemask,
>> +u64 alloc_start,
>> +gfp_t gfp_flags,
>> +int retrys,
>> +int exit),
>> +
>> +TP_ARGS(page, alloc_order, nodemask, alloc_start, gfp_flags,
>> +retrys, exit),
>> +
>> +TP_STRUCT__entry(__field(struct page *, page)
>> +__field(int, alloc_order)
>> +__field(nodemask_t *, nodemask)
>> +__field(u64, msdelay)
>> +__field(gfp_t, gfp_flags)
>> +__field(int, retrys)
>> +__field(int, exit)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->page= page;
>> +__entry->alloc_order = alloc_order;
>> +__entry->nodemask= nodemask;
>> +__entry->msdelay = jiffies_to_msecs(jiffies-alloc_start);
>> +__entry->gfp_flags   = gfp_flags;
>> +__entry->retrys  = retrys;
>> +__entry->exit= exit;
>> +),
>> +
>> +TP_printk("page=%p alloc_order=%d nodemask=%*pbl msdelay=%llu 
>> gfp_flags=%s retrys=%d exit=%d",
>> +__entry->page,
>> +__entry->alloc_order,
>> +nodemask_pr_args(__entry->nodemask),
>> +__entry->msdelay,
>> +show_gfp_flags(__entry->gfp_flags),
>> +__entry->retrys,
>> +__entry->exit)
>> +);
>> +
>>  #endif /* _TRACE_KMEM_H */
>>  
>>  /* This part must be outside protection */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 48b5b01..bae9cb9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -104,6 +104,17 @@ DEFINE_PER_CPU(struct work_struct, pcpu_drain);
>>  volatile unsigned long latent_entropy __latent_entropy;
>>  EXPORT_SYMBOL(latent_entropy);
>>  #endif
>> +enum slowpath_exit {
>> +SLOWPATH_NOZONE = -16,
>> +SLOWPATH_COMPACT_DEFERRED,
>> +SLOWPATH_CAN_NOT_DIRECT_RECLAIM,
>> +SLOWPATH_RECURSION,
>> +SLOWPATH_NO_RETRY,
>> +SLOWPATH_COSTLY_ORDER,
>> +SLOWPATH_OOM_VICTIM,
>> +SLOWPATH_NO_DIRECT_RECLAIM,
>> +SLOWPATH_ORDER
>> +};
>>  
>>  /*
>>   * Array of node states.
>> @@ -3908,8 +3919,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  enum compact_result compact_result;
>>  int compaction_retries;
>>  int no_progress_loops;
>> +unsigned long alloc_start = jiffies;
>>  unsigned int cpuset_mems_cookie;
>>  int reserve_flags;
>> +enum slowpath_exit slowpath_exit;
>> +int retry_count = 0;
>>

Re: [PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread Michal Hocko
On Thu 23-11-17 11:43:36, peter.enderb...@sony.com wrote:
> From: Peter Enderborg 
> 
> The warning about slow allocations has been removed; this is
> another way to fetch that information, but you need to
> enable the trace. The exit event also reports
> the number of retries, how long the allocation
> stalled, and the failure reason if it failed.

I think this is just too excessive. We already have a tracepoint for the
allocation exit. All we need is an entry event to have a baseline to
compare with. Another use case would be to measure allocation latency. The
information you are adding can be (partially) covered by existing
tracepoints.

> Signed-off-by: Peter Enderborg 
> ---
>  include/trace/events/kmem.h | 68 ++++++++++++++++++++++++++++++++++++++++
>  mm/page_alloc.c             | 62 +++++++++++++++++++++++++++++----------
>  2 files changed, 116 insertions(+), 14 deletions(-)
> 
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index eb57e30..bb882ca 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -315,6 +315,74 @@ TRACE_EVENT(mm_page_alloc_extfrag,
>   __entry->change_ownership)
>  );
>  
> +TRACE_EVENT(mm_page_alloc_slowpath_enter,
> +
> + TP_PROTO(int alloc_order,
> + nodemask_t *nodemask,
> + gfp_t gfp_flags),
> +
> + TP_ARGS(alloc_order, nodemask, gfp_flags),
> +
> + TP_STRUCT__entry(
> + __field(int, alloc_order)
> + __field(nodemask_t *, nodemask)
> + __field(gfp_t, gfp_flags)
> +  ),
> +
> +  TP_fast_assign(
> + __entry->alloc_order= alloc_order;
> + __entry->nodemask   = nodemask;
> + __entry->gfp_flags  = gfp_flags;
> +  ),
> +
> +  TP_printk("alloc_order=%d nodemask=%*pbl gfp_flags=%s",
> + __entry->alloc_order,
> + nodemask_pr_args(__entry->nodemask),
> + show_gfp_flags(__entry->gfp_flags))
> +);
> +
> +TRACE_EVENT(mm_page_alloc_slowpath_exit,
> +
> + TP_PROTO(struct page *page,
> + int alloc_order,
> + nodemask_t *nodemask,
> + u64 alloc_start,
> + gfp_t gfp_flags,
> + int retrys,
> + int exit),
> +
> + TP_ARGS(page, alloc_order, nodemask, alloc_start, gfp_flags,
> + retrys, exit),
> +
> + TP_STRUCT__entry(__field(struct page *, page)
> + __field(int, alloc_order)
> + __field(nodemask_t *, nodemask)
> + __field(u64, msdelay)
> + __field(gfp_t, gfp_flags)
> + __field(int, retrys)
> + __field(int, exit)
> + ),
> +
> + TP_fast_assign(
> + __entry->page= page;
> + __entry->alloc_order = alloc_order;
> + __entry->nodemask= nodemask;
> + __entry->msdelay = jiffies_to_msecs(jiffies-alloc_start);
> + __entry->gfp_flags   = gfp_flags;
> + __entry->retrys  = retrys;
> + __entry->exit= exit;
> + ),
> +
> + TP_printk("page=%p alloc_order=%d nodemask=%*pbl msdelay=%llu 
> gfp_flags=%s retrys=%d exit=%d",
> + __entry->page,
> + __entry->alloc_order,
> + nodemask_pr_args(__entry->nodemask),
> + __entry->msdelay,
> + show_gfp_flags(__entry->gfp_flags),
> + __entry->retrys,
> + __entry->exit)
> +);
> +
>  #endif /* _TRACE_KMEM_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48b5b01..bae9cb9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -104,6 +104,17 @@ DEFINE_PER_CPU(struct work_struct, pcpu_drain);
>  volatile unsigned long latent_entropy __latent_entropy;
>  EXPORT_SYMBOL(latent_entropy);
>  #endif
> +enum slowpath_exit {
> + SLOWPATH_NOZONE = -16,
> + SLOWPATH_COMPACT_DEFERRED,
> + SLOWPATH_CAN_NOT_DIRECT_RECLAIM,
> + SLOWPATH_RECURSION,
> + SLOWPATH_NO_RETRY,
> + SLOWPATH_COSTLY_ORDER,
> + SLOWPATH_OOM_VICTIM,
> + SLOWPATH_NO_DIRECT_RECLAIM,
> + SLOWPATH_ORDER
> +};
>  
>  /*
>   * Array of node states.
> @@ -3908,8 +3919,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>   enum compact_result compact_result;
>   int compaction_retries;
>   int no_progress_loops;
> + unsigned long alloc_start = jiffies;
>   unsigned int cpuset_mems_cookie;
>   int reserve_flags;
> + enum slowpath_exit slowpath_exit;
> + int retry_count = 0;
> +
> + trace_mm_page_alloc_slowpath_enter(order,
> + ac->nodemask,
> + gfp_mask);
>  
>   /*
>* In the slowpath, we sanity check order to avoid ever trying to
> @@ -3919,7 +3937,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>*/
>   if (order >= MAX_ORDER) {
> 	WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

[PATCH] Add slowpath enter/exit trace events

2017-11-23 Thread peter.enderborg
From: Peter Enderborg 

The warning about slow allocations has been removed; this is
another way to fetch that information, but you need to
enable the trace. The exit event also reports
the number of retries, how long the allocation
stalled, and the failure reason if it failed.

Signed-off-by: Peter Enderborg 
---
 include/trace/events/kmem.h | 68 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c             | 62 +++++++++++++++++++++++++++++----------
 2 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index eb57e30..bb882ca 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -315,6 +315,74 @@ TRACE_EVENT(mm_page_alloc_extfrag,
__entry->change_ownership)
 );
 
+TRACE_EVENT(mm_page_alloc_slowpath_enter,
+
+	TP_PROTO(int alloc_order,
+		nodemask_t *nodemask,
+		gfp_t gfp_flags),
+
+	TP_ARGS(alloc_order, nodemask, gfp_flags),
+
+	TP_STRUCT__entry(
+		__field(int, alloc_order)
+		__field(nodemask_t *, nodemask)
+		__field(gfp_t, gfp_flags)
+	),
+
+	TP_fast_assign(
+		__entry->alloc_order	= alloc_order;
+		__entry->nodemask	= nodemask;
+		__entry->gfp_flags	= gfp_flags;
+	),
+
+	TP_printk("alloc_order=%d nodemask=%*pbl gfp_flags=%s",
+		__entry->alloc_order,
+		nodemask_pr_args(__entry->nodemask),
+		show_gfp_flags(__entry->gfp_flags))
+);
+
+TRACE_EVENT(mm_page_alloc_slowpath_exit,
+
+	TP_PROTO(struct page *page,
+		int alloc_order,
+		nodemask_t *nodemask,
+		u64 alloc_start,
+		gfp_t gfp_flags,
+		int retrys,
+		int exit),
+
+	TP_ARGS(page, alloc_order, nodemask, alloc_start, gfp_flags,
+		retrys, exit),
+
+	TP_STRUCT__entry(
+		__field(struct page *, page)
+		__field(int, alloc_order)
+		__field(nodemask_t *, nodemask)
+		__field(u64, msdelay)
+		__field(gfp_t, gfp_flags)
+		__field(int, retrys)
+		__field(int, exit)
+	),
+
+	TP_fast_assign(
+		__entry->page		= page;
+		__entry->alloc_order	= alloc_order;
+		__entry->nodemask	= nodemask;
+		__entry->msdelay	= jiffies_to_msecs(jiffies - alloc_start);
+		__entry->gfp_flags	= gfp_flags;
+		__entry->retrys		= retrys;
+		__entry->exit		= exit;
+	),
+
+	TP_printk("page=%p alloc_order=%d nodemask=%*pbl msdelay=%llu gfp_flags=%s retrys=%d exit=%d",
+		__entry->page,
+		__entry->alloc_order,
+		nodemask_pr_args(__entry->nodemask),
+		__entry->msdelay,
+		show_gfp_flags(__entry->gfp_flags),
+		__entry->retrys,
+		__entry->exit)
+);
+
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48b5b01..bae9cb9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -104,6 +104,17 @@ DEFINE_PER_CPU(struct work_struct, pcpu_drain);
 volatile unsigned long latent_entropy __latent_entropy;
 EXPORT_SYMBOL(latent_entropy);
 #endif
+enum slowpath_exit {
+	SLOWPATH_NOZONE = -16,
+	SLOWPATH_COMPACT_DEFERRED,
+	SLOWPATH_CAN_NOT_DIRECT_RECLAIM,
+	SLOWPATH_RECURSION,
+	SLOWPATH_NO_RETRY,
+	SLOWPATH_COSTLY_ORDER,
+	SLOWPATH_OOM_VICTIM,
+	SLOWPATH_NO_DIRECT_RECLAIM,
+	SLOWPATH_ORDER
+};
 
 /*
  * Array of node states.
@@ -3908,8 +3919,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries;
 	int no_progress_loops;
+	unsigned long alloc_start = jiffies;
 	unsigned int cpuset_mems_cookie;
 	int reserve_flags;
+	enum slowpath_exit slowpath_exit;
+	int retry_count = 0;
+
+	trace_mm_page_alloc_slowpath_enter(order,
+			ac->nodemask,
+			gfp_mask);
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3919,7 +3937,8 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (order >= MAX_ORDER) {
 		WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
-		return NULL;
+		slowpath_exit = SLOWPATH_ORDER;
+		goto fail;
 	}
 
 	/*
@@ -3951,8 +3970,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
-	if (!ac->preferred_zoneref->zone)
+	if (!ac->preferred_zoneref->zone) {
+