Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-04-10 Thread Thomas Ilsche

However my fundamental concerns about the policy whether to disable the sched
tick remain:

Mixing the precise timer and vague heuristic for the decision is
dangerous. The timer should not be wrong, heuristic may be.


Well, I wouldn't say "dangerous".  It may be suboptimal, but even that is not
a given.  Besides ->


Decisions should use actual time points rather than the generic tick
duration and residency time. e.g.
expected_interval < delta_next_us
vs
expected_interval < TICK_USEC


-> the role of this check is to justify taking the overhead of stopping the
tick and it certainly is justifiable if the governor doesn't anticipate any
wakeups (timer or not) in the TICK_USEC range.  It may be justifiable in
other cases too, but that's a matter of some more complex checks and may not
be worth the extra complexity at all.


I tried that change. It's just just a bit bulky because I
cache the result of ktime_to_us(delta_next) early.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index a6eca02..cad87bf 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -296,6 +296,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
unsigned long nr_iowaiters, cpu_load;
int resume_latency = dev_pm_qos_raw_read_value(device);
ktime_t delta_next;
+   unsigned long delta_next_us;

if (data->needs_update) {
menu_update(drv, dev);
@@ -314,6 +315,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
  
/* determine the expected residency time, round up */

data->next_timer_us = 
ktime_to_us(tick_nohz_get_sleep_length(_next));
+   delta_next_us = ktime_to_us(delta_next);
  
get_iowait_load(_iowaiters, _load);

data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
@@ -364,7 +366,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
 */
if (data->predicted_us < TICK_USEC)
data->predicted_us = min_t(unsigned int, TICK_USEC,
-  ktime_to_us(delta_next));
+  delta_next_us);
} else {
/*
 * Use the performance multiplier and the user-configurable
@@ -412,9 +414,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
 * expected idle duration is shorter than the tick period length.
 */
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-   expected_interval < TICK_USEC) {
-   unsigned int delta_next_us = ktime_to_us(delta_next);
-
+   expected_interval < delta_next_us) {
*stop_tick = false;
  
if (!tick_nohz_tick_stopped() && idx > 0 &&


This works as a I expect in the sense of stopping the tick more often
and allowing deeper sleep states in some cases. However, it
drastically *increases* the power consumption for some affected
workloads test system (SKL-SP).

So while I believe this generally improves the behavior - I can't
recommend it based on the practical implications. Below are some
details for the curious:

power consumption for various workload configurations with 250 Hz
kernels 4.16.0, v9, v9+delta_next patch:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_250_Hz_power.png

Practically v9 has the best power consumption in most cases.

The following detailed looks are with 1000 Hz kernels.
v9 with a synchronized 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_sync.png
v9 with a staggered 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_stagger.png

Both show that the sched tick is kept on and this causes more requests
to C1E than C6

v9+delta_next with a synchronized 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_sync.png
v9+delta_next with a staggered 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_stagger.png

No more sched tick, no more C1E requests, but increased power.

Besides:
- the hardware reports 0 residency in C6 (both core and PKG) for
  both v9 and v9+delta_next_us.
- the increased power consumption comes after a ramp-up of ~200 ms
  for the staggered and ~2 s for the synchronized workload.

For reference traces from an unmodified 4.16.0:

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_sync.png
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_stagger.png

It behaves similar to the delta_next patch but does not show the
increased power consumption in this exact workload configuration.

I couldn't help to dig into the effect a bit more and am able to
reproduce it even under unmodified kernels with staggered sleep 

Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-04-10 Thread Thomas Ilsche

However my fundamental concerns about the policy whether to disable the sched
tick remain:

Mixing the precise timer and vague heuristic for the decision is
dangerous. The timer should not be wrong, heuristic may be.


Well, I wouldn't say "dangerous".  It may be suboptimal, but even that is not
a given.  Besides ->


Decisions should use actual time points rather than the generic tick
duration and residency time. e.g.
expected_interval < delta_next_us
vs
expected_interval < TICK_USEC


-> the role of this check is to justify taking the overhead of stopping the
tick and it certainly is justifiable if the governor doesn't anticipate any
wakeups (timer or not) in the TICK_USEC range.  It may be justifiable in
other cases too, but that's a matter of some more complex checks and may not
be worth the extra complexity at all.


I tried that change. It's just just a bit bulky because I
cache the result of ktime_to_us(delta_next) early.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index a6eca02..cad87bf 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -296,6 +296,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
unsigned long nr_iowaiters, cpu_load;
int resume_latency = dev_pm_qos_raw_read_value(device);
ktime_t delta_next;
+   unsigned long delta_next_us;

if (data->needs_update) {
menu_update(drv, dev);
@@ -314,6 +315,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
  
/* determine the expected residency time, round up */

data->next_timer_us = 
ktime_to_us(tick_nohz_get_sleep_length(_next));
+   delta_next_us = ktime_to_us(delta_next);
  
get_iowait_load(_iowaiters, _load);

data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
@@ -364,7 +366,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
 */
if (data->predicted_us < TICK_USEC)
data->predicted_us = min_t(unsigned int, TICK_USEC,
-  ktime_to_us(delta_next));
+  delta_next_us);
} else {
/*
 * Use the performance multiplier and the user-configurable
@@ -412,9 +414,7 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev,
 * expected idle duration is shorter than the tick period length.
 */
if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-   expected_interval < TICK_USEC) {
-   unsigned int delta_next_us = ktime_to_us(delta_next);
-
+   expected_interval < delta_next_us) {
*stop_tick = false;
  
if (!tick_nohz_tick_stopped() && idx > 0 &&


This works as a I expect in the sense of stopping the tick more often
and allowing deeper sleep states in some cases. However, it
drastically *increases* the power consumption for some affected
workloads test system (SKL-SP).

So while I believe this generally improves the behavior - I can't
recommend it based on the practical implications. Below are some
details for the curious:

power consumption for various workload configurations with 250 Hz
kernels 4.16.0, v9, v9+delta_next patch:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_250_Hz_power.png

Practically v9 has the best power consumption in most cases.

The following detailed looks are with 1000 Hz kernels.
v9 with a synchronized 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_sync.png
v9 with a staggered 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_poll_stagger.png

Both show that the sched tick is kept on and this causes more requests
to C1E than C6

v9+delta_next with a synchronized 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_sync.png
v9+delta_next with a staggered 950 us sleep workload:
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v9_delta_poll_stagger.png

No more sched tick, no more C1E requests, but increased power.

Besides:
- the hardware reports 0 residency in C6 (both core and PKG) for
  both v9 and v9+delta_next_us.
- the increased power consumption comes after a ramp-up of ~200 ms
  for the staggered and ~2 s for the synchronized workload.

For reference traces from an unmodified 4.16.0:

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_sync.png
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v4.16.0_poll_stagger.png

It behaves similar to the delta_next patch but does not show the
increased power consumption in this exact workload configuration.

I couldn't help to dig into the effect a bit more and am able to
reproduce it even under unmodified kernels with staggered sleep 

Re: [PATCH v9 00/10] sched/cpuidle: Idle loop rework

2018-04-09 Thread Thomas Ilsche

On 2018-04-08 18:32, Rafael J. Wysocki wrote:

The v9 along with some cleanups suggested by Frederic on top of it and with
ACKs from Peter (obtained on IRC) is now available from the pm-cpuidle branch
in the linux-pm.git tree.

It has been added to my linux-next branch, so it probably will be picked up by
linux-next tomorrow and I have a plan to push it for v4.17 in the second half
of the next week unless a major issue with it is found in the meantime.


Great to hear that. Thanks for all your work.

I'm finishing up some analysis of corner cases, but nothing major.
So I'm glad to see this is moving along. I've been nitpicking a lot,
but this is clearly a huge improvement and there are practical
limitations against a theoretically perfect solution. In any case the
changes will also make future policy adaptions much easier.

Thanks,
Thomas



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH v9 00/10] sched/cpuidle: Idle loop rework

2018-04-09 Thread Thomas Ilsche

On 2018-04-08 18:32, Rafael J. Wysocki wrote:

The v9 along with some cleanups suggested by Frederic on top of it and with
ACKs from Peter (obtained on IRC) is now available from the pm-cpuidle branch
in the linux-pm.git tree.

It has been added to my linux-next branch, so it probably will be picked up by
linux-next tomorrow and I have a plan to push it for v4.17 in the second half
of the next week unless a major issue with it is found in the meantime.


Great to hear that. Thanks for all your work.

I'm finishing up some analysis of corner cases, but nothing major.
So I'm glad to see this is moving along. I've been nitpicking a lot,
but this is clearly a huge improvement and there are practical
limitations against a theoretically perfect solution. In any case the
changes will also make future policy adaptions much easier.

Thanks,
Thomas



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-28 Thread Thomas Ilsche

On 2018-03-28 12:56, Rafael J. Wysocki wrote:

On Wed, Mar 28, 2018 at 12:37 PM, Rafael J. Wysocki <raf...@kernel.org> wrote:

On Wed, Mar 28, 2018 at 10:38 AM, Thomas Ilsche
<thomas.ils...@tu-dresden.de> wrote:

On 2018-03-28 10:13, Rafael J. Wysocki wrote:




[cut]



So I do

$ for cpu in 0 1 2 3; do taskset -c $cpu sh -c 'while true; do usleep
500; done' & done

which is a shell kind of imitation of the above and I cannot see this
issue at all.

I count the number of times data->next_timer_us in menu_select() is
greater than TICK_USEC and while this "workload" is running, that
number is exactly 0.

I'll try with a C program still.


And with a C program I see data->next_timer_us greater than TICK_USEC
while it is running, so let me dig deeper.



I can confirm that a shell-loop behaves differently like you describe.
Even if it's just a shell-loop calling "main{usleep(500);}" binary.



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-28 Thread Thomas Ilsche

On 2018-03-28 12:56, Rafael J. Wysocki wrote:

On Wed, Mar 28, 2018 at 12:37 PM, Rafael J. Wysocki  wrote:

On Wed, Mar 28, 2018 at 10:38 AM, Thomas Ilsche
 wrote:

On 2018-03-28 10:13, Rafael J. Wysocki wrote:




[cut]



So I do

$ for cpu in 0 1 2 3; do taskset -c $cpu sh -c 'while true; do usleep
500; done' & done

which is a shell kind of imitation of the above and I cannot see this
issue at all.

I count the number of times data->next_timer_us in menu_select() is
greater than TICK_USEC and while this "workload" is running, that
number is exactly 0.

I'll try with a C program still.


And with a C program I see data->next_timer_us greater than TICK_USEC
while it is running, so let me dig deeper.



I can confirm that a shell-loop behaves differently like you describe.
Even if it's just a shell-loop calling "main{usleep(500);}" binary.



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-28 Thread Thomas Ilsche

On 2018-03-22 18:40, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
  (1) the idle exit latency is constrained at 0, or
  (2) the selected state is a polling one, or
  (3) the expected idle period duration is within the tick period
  range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Signed-off-by: Rafael J. Wysocki 
---

One more revision here.

 From the Thomas Ilsche's testing on the Skylake server it looks like
data->intervals[] need to be updated along with the correction factor
on tick wakeups that occur when next_timer_us is above the tick boundary.

The difference between this and the original v7 (of patch [5/8]) is
what happens in menu_update().  This time next_timer_us is checked
properly and if that is above the tick boundary and a tick wakeup occurs,
the function simply sets mesured_us to a large constant and uses that to
update both the correction factor and data->intervals[] (the particular
value used in this patch was found through a bit of experimentation).

Let's see how this works for Thomas and Doug.

For easier testing there is a git branch containing this patch (and the
rest of the series) at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
  idle-loop-v7.3

Thanks!


Besides the other issue with tick_nohz_get_sleep_length, v7.3
generally works well in idle. So far I don't see anything
statistically noticeable, but I saw one peculiar anomaly. After all
cores woke up simultaneously to schedule some kworker task, some of
them kept the sched tick up, even stayed in shallow sleep state for a
while, without having any tasks scheduled. Gradually they chose deeper
sleep states and stopped their sched ticks. After 23 ms (1000 Hz
kernel), they all went back to deep sleep.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp_anomaly.png

I have only seen this once so far and can't reproduce it yet, so this
particular instance may not be an issue in practice. However my
fundamental concerns about the policy whether to disable the sched
tick remain:

Mixing the precise timer and vague heuristic for the decision is
dangerous. The timer should not be wrong, heuristic may be.

Decisions should use actual time points rather than the generic tick
duration and residency time. e.g.
  expected_interval < delta_next_us
vs
  expected_interval < TICK_USEC

For some cases the unmodified sched tick is not efficient as fallback.
Is it feasible to
1) enable the sched tick when it's currently disabled instead of
blindly choosing a different C state?
2) modify the next upcoming sched tick to be better suitable as
fallback timer?

I think with the infrastructure changes it should be possible to
implement the policy I envisioned previously
(https://marc.info/?l=linux-pm=151384941425947=2), which is based
on the ordering of timers and the heuristically predicted idle time.
If the sleep_length issue is fixed and I have some mechanism for a
modifiable fallback timer, I'll try to demonstrate that on top of your
changes.



---
  drivers/cpuidle/cpuidle.c  |   10 +-
  drivers/cpuidle/governors/ladder.c |3 +
  drivers/cpuidle/governors/menu.c   |   59 
+
  include/linux/cpuidle.h|8 +++--
  include/linux/tick.h   |2 +
  kernel/sched/idle.c|4 +-
  kernel/time/tick-sched.c   |   20 
  7 files changed, 87 insertions(+), 19 deletions(-)

Index: linux-pm/include/linux/cpuidle.h
===
--- linux-pm.orig/include/linux/cpuidle.h
+++ linux-pm/include/linux/cpuidle.h
@@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct
  struct 

Re: [RFT][PATCH v7.3 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-28 Thread Thomas Ilsche

On 2018-03-22 18:40, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
  (1) the idle exit latency is constrained at 0, or
  (2) the selected state is a polling one, or
  (3) the expected idle period duration is within the tick period
  range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Signed-off-by: Rafael J. Wysocki 
---

One more revision here.

 From the Thomas Ilsche's testing on the Skylake server it looks like
data->intervals[] need to be updated along with the correction factor
on tick wakeups that occur when next_timer_us is above the tick boundary.

The difference between this and the original v7 (of patch [5/8]) is
what happens in menu_update().  This time next_timer_us is checked
properly and if that is above the tick boundary and a tick wakeup occurs,
the function simply sets mesured_us to a large constant and uses that to
update both the correction factor and data->intervals[] (the particular
value used in this patch was found through a bit of experimentation).

Let's see how this works for Thomas and Doug.

For easier testing there is a git branch containing this patch (and the
rest of the series) at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
  idle-loop-v7.3

Thanks!


Besides the other issue with tick_nohz_get_sleep_length, v7.3
generally works well in idle. So far I don't see anything
statistically noticeable, but I saw one peculiar anomaly. After all
cores woke up simultaneously to schedule some kworker task, some of
them kept the sched tick up, even stayed in shallow sleep state for a
while, without having any tasks scheduled. Gradually they chose deeper
sleep states and stopped their sched ticks. After 23 ms (1000 Hz
kernel), they all went back to deep sleep.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp_anomaly.png

I have only seen this once so far and can't reproduce it yet, so this
particular instance may not be an issue in practice. However my
fundamental concerns about the policy whether to disable the sched
tick remain:

Mixing the precise timer and vague heuristic for the decision is
dangerous. The timer should not be wrong, heuristic may be.

Decisions should use actual time points rather than the generic tick
duration and residency time. e.g.
  expected_interval < delta_next_us
vs
  expected_interval < TICK_USEC

For some cases the unmodified sched tick is not efficient as fallback.
Is it feasible to
1) enable the sched tick when it's currently disabled instead of
blindly choosing a different C state?
2) modify the next upcoming sched tick to be better suitable as
fallback timer?

I think with the infrastructure changes it should be possible to
implement the policy I envisioned previously
(https://marc.info/?l=linux-pm=151384941425947=2), which is based
on the ordering of timers and the heuristically predicted idle time.
If the sleep_length issue is fixed and I have some mechanism for a
modifiable fallback timer, I'll try to demonstrate that on top of your
changes.



---
  drivers/cpuidle/cpuidle.c  |   10 +-
  drivers/cpuidle/governors/ladder.c |3 +
  drivers/cpuidle/governors/menu.c   |   59 
+
  include/linux/cpuidle.h|8 +++--
  include/linux/tick.h   |2 +
  kernel/sched/idle.c|4 +-
  kernel/time/tick-sched.c   |   20 
  7 files changed, 87 insertions(+), 19 deletions(-)

Index: linux-pm/include/linux/cpuidle.h
===
--- linux-pm.orig/include/linux/cpuidle.h
+++ linux-pm/include/linux/cpuidle.h
@@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct
  struct cpuidle_device *dev);
  
  extern int cpuidle_select(struct 

Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-28 Thread Thomas Ilsche

On 2018-03-28 10:13, Rafael J. Wysocki wrote:

On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki <r...@rjwysocki.net> wrote:

On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote:

On 2018-03-20 16:45, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki <rafael.j.wyso...@intel.com>

In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
by cpuidle_select() to decide whether or not to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_enter().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.


I think something is wrong with the new tick_nohz_get_sleep_length.
It seems to return a value that is too large, ignoring immanent
non-sched timer.


That's a very useful hint, let me have a look.


I tested idle-loop-v7.3. It looks very similar to my previous results
on the first idle-loop-git-version [1]. Idle and traditional synthetic
powernightmares are mostly good.


OK


But it selects too deep C-states for short idle periods, which is bad
for power consumption [2].


That still needs to be improved, then.


I tracked this down with additional tests using
__attribute__((optimize("O0"))) menu_select
and perf probe. With this the behavior seems slightly different, but it
shows that data->next_timer_us is:
v4.16-rc6: the expected ~500 us [3]
idle-loop-v7.3: many milliseconds to minutes [4].
This leads to the governor to wrongly selecting C6.

Checking with 372be9e and 6ea0577, I can confirm that the change is
introduced by this patch.


Yes, that's where the most intrusive reordering happens.


Overall, this is an interesting conundrum, because the case in
question is when the tick should never be stopped at all during the
workload and the code's behavior in that case should not change, so
the change was not intentional.

Now, from walking through the code, as long as can_stop_idle_tick()
returns 'true' all should be fine or at least I don't see why there is
any difference in behavior in that case.

However, if can_stop_idle_tick() returns 'false' (for example, because
need_resched() returns 'true' when it is evaluated), the behavior *is*
different in a couple of ways.  I sort of know how that can be
addressed, but I'd like to reproduce your results here.

Are you still using the same workload as before to trigger this behavior?



Yes, the exact code I use is as follows

$ gcc poller.c -O3 -fopenmp -o poller_omp
$ GOMP_CPU_AFFINITY=0-35 ./poller_omp 500

#include 
#include 
#include 

int main(int argc, char *argv[])
{
int sleep_us = 1;
if (argc == 2) {
sleep_us = atoi(argv[1]);
}

#pragma omp parallel
{
while (1) {
usleep(sleep_us);
}
}
}


Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-28 Thread Thomas Ilsche

On 2018-03-28 10:13, Rafael J. Wysocki wrote:

On Wed, Mar 28, 2018 at 12:10 AM, Rafael J. Wysocki  wrote:

On Tuesday, March 27, 2018 11:50:02 PM CEST Thomas Ilsche wrote:

On 2018-03-20 16:45, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
by cpuidle_select() to decide whether or not to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_enter().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.


I think something is wrong with the new tick_nohz_get_sleep_length.
It seems to return a value that is too large, ignoring immanent
non-sched timer.


That's a very useful hint, let me have a look.


I tested idle-loop-v7.3. It looks very similar to my previous results
on the first idle-loop-git-version [1]. Idle and traditional synthetic
powernightmares are mostly good.


OK


But it selects too deep C-states for short idle periods, which is bad
for power consumption [2].


That still needs to be improved, then.


I tracked this down with additional tests using
__attribute__((optimize("O0"))) menu_select
and perf probe. With this the behavior seems slightly different, but it
shows that data->next_timer_us is:
v4.16-rc6: the expected ~500 us [3]
idle-loop-v7.3: many milliseconds to minutes [4].
This leads to the governor to wrongly selecting C6.

Checking with 372be9e and 6ea0577, I can confirm that the change is
introduced by this patch.


Yes, that's where the most intrusive reordering happens.


Overall, this is an interesting conundrum, because the case in
question is when the tick should never be stopped at all during the
workload and the code's behavior in that case should not change, so
the change was not intentional.

Now, from walking through the code, as long as can_stop_idle_tick()
returns 'true' all should be fine or at least I don't see why there is
any difference in behavior in that case.

However, if can_stop_idle_tick() returns 'false' (for example, because
need_resched() returns 'true' when it is evaluated), the behavior *is*
different in a couple of ways.  I sort of know how that can be
addressed, but I'd like to reproduce your results here.

Are you still using the same workload as before to trigger this behavior?



Yes, the exact code I use is as follows

$ gcc poller.c -O3 -fopenmp -o poller_omp
$ GOMP_CPU_AFFINITY=0-35 ./poller_omp 500

#include 
#include 
#include 

int main(int argc, char *argv[])
{
int sleep_us = 1;
if (argc == 2) {
sleep_us = atoi(argv[1]);
}

#pragma omp parallel
{
while (1) {
usleep(sleep_us);
}
}
}


Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-27 Thread Thomas Ilsche

On 2018-03-20 16:45, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
by cpuidle_select() to decide whether or not to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_enter().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.


I think something is wrong with the new tick_nohz_get_sleep_length.
It seems to return a value that is too large, ignoring immanent
non-sched timer.

I tested idle-loop-v7.3. It looks very similar to my previous results
on the first idle-loop-git-version [1]. Idle and traditional synthetic
powernightmares are mostly good. But it selects too deep C-states
for short idle periods, which is bad for power consumption [2].

I tracked this down with additional tests using
__attribute__((optimize("O0"))) menu_select
and perf probe. With this the behavior seems slightly different, but it
shows that data->next_timer_us is:
v4.16-rc6: the expected ~500 us [3]
idle-loop-v7.3: many milliseconds to minutes [4].
This leads to the governor to wrongly selecting C6.

Checking with 372be9e and 6ea0577, I can confirm that the change is
introduced by this patch.

[1] https://lkml.org/lkml/2018/3/20/238
[2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp.png
[3] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-v4.16-rc6.png
[4] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-idle-loop-v7.3.png
 

Namely, notice that tick_nohz_stop_sched_tick() already computes the
next timer event time to reprogram the scheduler tick hrtimer and
that time can be used as a proxy for the actual next timer event
time in the idle duration predicition.  Moreover, it is possible
to split tick_nohz_stop_sched_tick() into two separate routines,
one computing the time to the next timer event and the other
simply stopping the tick when the time to the next timer event
is known.

Accordingly, split tick_nohz_stop_sched_tick() into
tick_nohz_next_event() and tick_nohz_stop_tick() and use the
former in tick_nohz_get_sleep_length().  Add two new extra fields,
timer_expires and timer_expires_base, to struct tick_sched for
passing data between these two new functions and to indicate that
tick_nohz_next_event() has run and tick_nohz_stop_tick() can be
called now.  Also drop the now redundant sleep_length field from
there.

Signed-off-by: Rafael J. Wysocki 
---

v5 -> v7:
   * Rebase on top of the new [5/8].

---
  include/linux/tick.h |2
  kernel/sched/idle.c  |   11 ++-
  kernel/time/tick-sched.c |  156 
+++
  kernel/time/tick-sched.h |6 +
  4 files changed, 120 insertions(+), 55 deletions(-)

Index: linux-pm/kernel/time/tick-sched.h
===
--- linux-pm.orig/kernel/time/tick-sched.h
+++ linux-pm/kernel/time/tick-sched.h
@@ -38,7 +38,8 @@ enum tick_nohz_mode {
   * @idle_exittime:Time when the idle state was left
   * @idle_sleeptime:   Sum of the time slept in idle with sched tick stopped
   * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, 
with IO outstanding
- * @sleep_length:  Duration of the current idle sleep
+ * @timer_expires: Anticipated timer expiration time (in case sched tick 
is stopped)
+ * @timer_expires_base:Base time clock monotonic for @timer_expires
   * @do_timer_lst: CPU was the last one doing do_timer before going idle
   */
  struct tick_sched {
@@ -58,8 +59,9 @@ struct tick_sched {
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
-   ktime_t sleep_length;
unsigned long   last_jiffies;
+   u64 timer_expires;
+   u64 timer_expires_base;
u64 next_timer;
ktime_t idle_expires;
int do_timer_last;
Index: linux-pm/kernel/sched/idle.c
===
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -190,13 +190,18 @@ static void cpuidle_idle_call(void)
} else {
bool stop_tick = true;
  
-		tick_nohz_idle_stop_tick();

-   rcu_idle_enter();
-
/*

Re: [RFT][PATCH v7 6/8] sched: idle: Select idle state before stopping the tick

2018-03-27 Thread Thomas Ilsche

On 2018-03-20 16:45, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned
by cpuidle_select() to decide whether or not to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_enter().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.


I think something is wrong with the new tick_nohz_get_sleep_length.
It seems to return a value that is too large, ignoring immanent
non-sched timer.

I tested idle-loop-v7.3. It looks very similar to my previous results
on the first idle-loop-git-version [1]. Idle and traditional synthetic
powernightmares are mostly good. But it selects too deep C-states
for short idle periods, which is bad for power consumption [2].

I tracked this down with additional tests using
__attribute__((optimize("O0"))) menu_select
and perf probe. With this the behavior seems slightly different, but it
shows that data->next_timer_us is:
v4.16-rc6: the expected ~500 us [3]
idle-loop-v7.3: many milliseconds to minutes [4].
This leads to the governor to wrongly selecting C6.

Checking with 372be9e and 6ea0577, I can confirm that the change is
introduced by this patch.

[1] https://lkml.org/lkml/2018/3/20/238
[2] https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_3_skl_sp.png
[3] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-v4.16-rc6.png
[4] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/next_timer_us-idle-loop-v7.3.png
 

Namely, notice that tick_nohz_stop_sched_tick() already computes the
next timer event time to reprogram the scheduler tick hrtimer and
that time can be used as a proxy for the actual next timer event
time in the idle duration predicition.  Moreover, it is possible
to split tick_nohz_stop_sched_tick() into two separate routines,
one computing the time to the next timer event and the other
simply stopping the tick when the time to the next timer event
is known.

Accordingly, split tick_nohz_stop_sched_tick() into
tick_nohz_next_event() and tick_nohz_stop_tick() and use the
former in tick_nohz_get_sleep_length().  Add two new extra fields,
timer_expires and timer_expires_base, to struct tick_sched for
passing data between these two new functions and to indicate that
tick_nohz_next_event() has run and tick_nohz_stop_tick() can be
called now.  Also drop the now redundant sleep_length field from
there.

Signed-off-by: Rafael J. Wysocki 
---

v5 -> v7:
   * Rebase on top of the new [5/8].

---
  include/linux/tick.h |2
  kernel/sched/idle.c  |   11 ++-
  kernel/time/tick-sched.c |  156 
+++
  kernel/time/tick-sched.h |6 +
  4 files changed, 120 insertions(+), 55 deletions(-)

Index: linux-pm/kernel/time/tick-sched.h
===
--- linux-pm.orig/kernel/time/tick-sched.h
+++ linux-pm/kernel/time/tick-sched.h
@@ -38,7 +38,8 @@ enum tick_nohz_mode {
   * @idle_exittime:Time when the idle state was left
   * @idle_sleeptime:   Sum of the time slept in idle with sched tick stopped
   * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, 
with IO outstanding
- * @sleep_length:  Duration of the current idle sleep
+ * @timer_expires: Anticipated timer expiration time (in case sched tick 
is stopped)
+ * @timer_expires_base:Base time clock monotonic for @timer_expires
   * @do_timer_lst: CPU was the last one doing do_timer before going idle
   */
  struct tick_sched {
@@ -58,8 +59,9 @@ struct tick_sched {
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
-   ktime_t sleep_length;
unsigned long   last_jiffies;
+   u64 timer_expires;
+   u64 timer_expires_base;
u64 next_timer;
ktime_t idle_expires;
int do_timer_last;
Index: linux-pm/kernel/sched/idle.c
===
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -190,13 +190,18 @@ static void cpuidle_idle_call(void)
} else {
bool stop_tick = true;
  
-		tick_nohz_idle_stop_tick();

-   rcu_idle_enter();
-
/*
 * Ask the cpuidle framework to choose a convenient 

Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-22 Thread Thomas Ilsche

On 2018-03-21 23:15, Rafael J. Wysocki wrote:

On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche
<thomas.ils...@tu-dresden.de> wrote:

On 2018-03-21 15:36, Rafael J. Wysocki wrote:



So please disregard this one entirely and take the v7.2 replacement
instead of it:https://patchwork.kernel.org/patch/10299429/

The current versions (including the above) is in the git branch at

   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
   idle-loop-v7.2



With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
as with v5: several cores which just keep the sched tick enabled.
Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
full sched tick.
The resulting power consumption is ~105 W instead of ~ 70 W.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png

I have briefly ran v7 and I believe it was also affected.


Then it looks like menu_select() stubbornly thinks that the idle
duration will be within the tick boundary on those cores.

That may be because the bumping up of the correction factor in
menu_reflect() is too conservative or it may be necessary to do
something radical to measured_us in menu_update() in case of a tick
wakeup combined with a large next_timer_us value.

For starters, please see if the attached patch (on top of the
idle-loop-v7.2 git branch) changes this behavior in any way.



The patch on top of idle-loop-v7.2 doesn't improve idle behavior on
SKL-SP. Overall it is pretty erratic, I have not seen any regular
patterns. Sometimes only few cpus are affected, here's a screenshot of
almost all cpus being affected after a short burst workload.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png


Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-22 Thread Thomas Ilsche

On 2018-03-21 23:15, Rafael J. Wysocki wrote:

On Wed, Mar 21, 2018 at 6:59 PM, Thomas Ilsche
 wrote:

On 2018-03-21 15:36, Rafael J. Wysocki wrote:



So please disregard this one entirely and take the v7.2 replacement
instead of it:https://patchwork.kernel.org/patch/10299429/

The current versions (including the above) is in the git branch at

   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
   idle-loop-v7.2



With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
as with v5: several cores which just keep the sched tick enabled.
Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
full sched tick.
The resulting power consumption is ~105 W instead of ~ 70 W.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png

I have briefly ran v7 and I believe it was also affected.


Then it looks like menu_select() stubbornly thinks that the idle
duration will be within the tick boundary on those cores.

That may be because the bumping up of the correction factor in
menu_reflect() is too conservative or it may be necessary to do
something radical to measured_us in menu_update() in case of a tick
wakeup combined with a large next_timer_us value.

For starters, please see if the attached patch (on top of the
idle-loop-v7.2 git branch) changes this behavior in any way.



The patch on top of idle-loop-v7.2 doesn't improve idle behavior on
SKL-SP. Overall it is pretty erratic, I have not seen any regular
patterns. Sometimes only few cpus are affected, here's a screenshot of
almost all cpus being affected after a short burst workload.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_reflect_skl_sp_idle.png


Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-21 Thread Thomas Ilsche

On 2018-03-21 15:36, Rafael J. Wysocki wrote:


So please disregard this one entirely and take the v7.2 replacement
instead of it:https://patchwork.kernel.org/patch/10299429/

The current versions (including the above) is in the git branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
  idle-loop-v7.2


With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
as with v5: several cores which just keep the sched tick enabled.
Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
full sched tick.
The resulting power consumption is ~105 W instead of ~ 70 W.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png

I have briefly ran v7 and I believe it was also affected.


Re: [RFT][PATCH v7 5/8] cpuidle: Return nohz hint from cpuidle_select()

2018-03-21 Thread Thomas Ilsche

On 2018-03-21 15:36, Rafael J. Wysocki wrote:


So please disregard this one entirely and take the v7.2 replacement
instead of it:https://patchwork.kernel.org/patch/10299429/

The current versions (including the above) is in the git branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
  idle-loop-v7.2


With v7.2 (tested on SKL-SP from git) I see similar behavior in idle
as with v5: several cores which just keep the sched tick enabled.
Worse yet, some go only in C1 (not even C1E!?) despite sleeping the
full sched tick.
The resulting power consumption is ~105 W instead of ~ 70 W.

https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/v7_2_skl_sp_idle.png

I have briefly ran v7 and I believe it was also affected.


Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework

2018-03-20 Thread Thomas Ilsche
han the deepest available one with target residency
+* within the tick period range.
+*
+* This allows the tick to be stopped even if the
+* predicted idle duration is within the tick period
+* range to counter the effect by which the prediction
+* may be skewed towards lower values due to the tick
+* bias.
+*/
+   expected_interval = s->target_residency;
 break;


BTW, I guess I need to explain the motivation here more thoroughly, so
here it goes.

The governor predicts idle duration under the assumption that the
tick will be stopped, so if the result of the prediction is within the tick
period range and it is not accurate, that needs to be taken into
account in the governor's statistics.  However, if the tick is allowed
to run every time the governor predicts idle duration within the tick
period range, the governor will always see that it was "almost
right" and the correction factor applied by it to improve the
prediction next time will not be sufficient.  For this reason, it
is better to stop the tick at least sometimes when the governor
predicts idle duration within the tick period range and the idea
here is to do that when the selected state is the deepest available
one with the target residency within the tick period range.  This
allows the opportunity to save more energy to be seized which
balances the extra overhead of stopping the tick.

HTH



--
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-37773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework

2018-03-20 Thread Thomas Ilsche
 * within the tick period range.
+*
+* This allows the tick to be stopped even if the
+* predicted idle duration is within the tick period
+* range to counter the effect by which the prediction
+* may be skewed towards lower values due to the tick
+* bias.
+*/
+   expected_interval = s->target_residency;
 break;


BTW, I guess I need to explain the motivation here more thoroughly, so
here it goes.

The governor predicts idle duration under the assumption that the
tick will be stopped, so if the result of the prediction is within the tick
period range and it is not accurate, that needs to be taken into
account in the governor's statistics.  However, if the tick is allowed
to run every time the governor predicts idle duration within the tick
period range, the governor will always see that it was "almost
right" and the correction factor applied by it to improve the
prediction next time will not be sufficient.  For this reason, it
is better to stop the tick at least sometimes when the governor
predicts idle duration within the tick period range and the idea
here is to do that when the selected state is the deepest available
one with the target residency within the tick period range.  This
allows the opportunity to save more energy to be seized which
balances the extra overhead of stopping the tick.

HTH



--
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-37773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states with stopped tick

2018-03-19 Thread Thomas Ilsche

On 2018-03-15 23:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

If the scheduler tick has been stopped already and the governor
selects a shallow idle state, the CPU can spend a long time in that
state if the selection is based on an inaccurate prediction of idle
time.  That effect turns out to be noticeable, so it needs to be
mitigated.


What are some common causes for that situation?
How could I trigger this for testing?


To that end, modify the menu governor to discard the result of the
idle time prediction if the tick is stopped and the predicted idle
time is less than the tick period length, unless the tick timer is
going to expire soon.


This seems dangerous. Using a C-state that is too deep could be
problematic for soft latency, caches and overall energy.

Would it be viable to re-enable the sched tick to act as a fallback?
Generally, would it be feasible to modify the upcoming sched tick
timer to be a better time for a fallback wakeup in certain situations?



Signed-off-by: Rafael J. Wysocki 
---

v4 -> v5:
   * Rebase on top of the new [1-6/7].
   * Never use the interactivity factor when the tick is stopped.

---
  drivers/cpuidle/governors/menu.c |   29 ++---
  1 file changed, 22 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -353,13 +353,28 @@ static int menu_select(struct cpuidle_dr
 */
data->predicted_us = min(data->predicted_us, expected_interval);
  
-	/*

-* Use the performance multiplier and the user-configurable
-* latency_req to determine the maximum exit latency.
-*/
-   interactivity_req = data->predicted_us / 
performance_multiplier(nr_iowaiters, cpu_load);
-   if (latency_req > interactivity_req)
-   latency_req = interactivity_req;
+   if (tick_nohz_tick_stopped()) {
+   /*
+* If the tick is already stopped, the cost of possible short
+* idle duration misprediction is much higher, because the CPU
+* may be stuck in a shallow idle state for a long time as a
+* result of it.  In that case say we might mispredict and try
+* to force the CPU into a state for which we would have stopped
+* the tick, unless the tick timer is going to expire really
+* soon anyway.
+*/
+   if (data->predicted_us < TICK_USEC_HZ)
+   data->predicted_us = min_t(unsigned int, TICK_USEC_HZ,
+  ktime_to_us(tick_time));


This applies to the heuristic (expected_interval) and the (heuristically
corrected) next timer. Should this modification be applied only to the
expected_interval under the assumption that the next_timer_us * correction
is never totally wrong.


+   } else {
+   /*
+* Use the performance multiplier and the user-configurable
+* latency_req to determine the maximum exit latency.
+*/
+   interactivity_req = data->predicted_us / 
performance_multiplier(nr_iowaiters, cpu_load);
+   if (latency_req > interactivity_req)
+   latency_req = interactivity_req;
+   }
  
  	expected_interval = data->predicted_us;

/*



Re: [RFT][PATCH v5 7/7] cpuidle: menu: Avoid selecting shallow states with stopped tick

2018-03-19 Thread Thomas Ilsche

On 2018-03-15 23:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

If the scheduler tick has been stopped already and the governor
selects a shallow idle state, the CPU can spend a long time in that
state if the selection is based on an inaccurate prediction of idle
time.  That effect turns out to be noticeable, so it needs to be
mitigated.


What are some common causes for that situation?
How could I trigger this for testing?


To that end, modify the menu governor to discard the result of the
idle time prediction if the tick is stopped and the predicted idle
time is less than the tick period length, unless the tick timer is
going to expire soon.


This seems dangerous. Using a C-state that is too deep could be
problematic for soft latency, caches and overall energy.

Would it be viable to re-enable the sched tick to act as a fallback?
Generally, would it be feasible to modify the upcoming sched tick
timer to be a better time for a fallback wakeup in certain situations?



Signed-off-by: Rafael J. Wysocki 
---

v4 -> v5:
   * Rebase on top of the new [1-6/7].
   * Never use the interactivity factor when the tick is stopped.

---
  drivers/cpuidle/governors/menu.c |   29 ++---
  1 file changed, 22 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -353,13 +353,28 @@ static int menu_select(struct cpuidle_dr
 */
data->predicted_us = min(data->predicted_us, expected_interval);
  
-	/*

-* Use the performance multiplier and the user-configurable
-* latency_req to determine the maximum exit latency.
-*/
-   interactivity_req = data->predicted_us / 
performance_multiplier(nr_iowaiters, cpu_load);
-   if (latency_req > interactivity_req)
-   latency_req = interactivity_req;
+   if (tick_nohz_tick_stopped()) {
+   /*
+* If the tick is already stopped, the cost of possible short
+* idle duration misprediction is much higher, because the CPU
+* may be stuck in a shallow idle state for a long time as a
+* result of it.  In that case say we might mispredict and try
+* to force the CPU into a state for which we would have stopped
+* the tick, unless the tick timer is going to expire really
+* soon anyway.
+*/
+   if (data->predicted_us < TICK_USEC_HZ)
+   data->predicted_us = min_t(unsigned int, TICK_USEC_HZ,
+  ktime_to_us(tick_time));


This applies to the heuristic (expected_interval) and the (heuristically
corrected) next timer. Should this modification be applied only to the
expected_interval under the assumption that the next_timer_us * correction
is never totally wrong.


+   } else {
+   /*
+* Use the performance multiplier and the user-configurable
+* latency_req to determine the maximum exit latency.
+*/
+   interactivity_req = data->predicted_us / 
performance_multiplier(nr_iowaiters, cpu_load);
+   if (latency_req > interactivity_req)
+   latency_req = interactivity_req;
+   }
  
  	expected_interval = data->predicted_us;

/*



Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework

2018-03-17 Thread Thomas Ilsche

Over the last week I tested v4+pollv2 and now v5+pollv3. With v5, I
observe a particular idle behavior, that I have not seen before with
v4. On a dual-socket Skylake system the idle power increases from
74.1 W (system total) to 85.5 W with a 300 HZ build and even to
138.3 W with a 1000 HZ build. A similar Haswell-EP system is also
affected.

There are phases during which one core will keep switching to the
highest C-state, but not disable the sched tick. Every 4th sched tick,
a kworker on that core is scheduled shortly. Every wakeup from C6 of a
single core will more than double the package power consumption of
*both8 sockets for ~500 us resulting in the significantly increased
sustained power consumption.

This is illustrated in [1]. For a comparison of a "normal" phase
(samekernel), see [2]. For a global view of the effect on a 1000 Hz
build, see [3].

I have not yet found any particular triggers or the specific
interaction between the sched tick and the kworker. I'm not sure how
this was introduced in v5. I would guess it could be a feedback loop
that I was concerned about initially.

I have more findings from v4, but this seems much more impactful.

[1] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz.png
[2] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz_ok.png
[3] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_1000Hz.png

On 2018-03-15 22:59, Rafael J. Wysocki wrote:

Hi All,

Thanks a lot for the feedback so far!

One more respin after the last batch of comments from Peter and Frederic.

The previous summary that still applies:

On Sunday, March 4, 2018 11:21:30 PM CET Rafael J. Wysocki wrote:


The problem is that if we stop the sched tick in
tick_nohz_idle_enter() and then the idle governor predicts short idle
duration, we lose regardless of whether or not it is right.

If it is right, we've lost already, because we stopped the tick
unnecessarily.  If it is not right, we'll lose going forward, because
the idle state selected by the governor is going to be too shallow and
we'll draw too much power (that has been reported recently to actually
happen often enough for people to care).

This patch series is an attempt to improve the situation and the idea
here is to make the decision whether or not to stop the tick deeper in
the idle loop and in particular after running the idle state selection
in the path where the idle governor is invoked.  This way the problem
can be avoided, because the idle duration predicted by the idle governor
can be used to decide whether or not to stop the tick so that the tick
is only stopped if that value is large enough (and, consequently, the
idle state selected by the governor is deep enough).

The series tires to avoid adding too much new code, rather reorder the
existing code and make it more fine-grained.

Patch 1 prepares the tick-sched code for the subsequent modifications and it
doesn't change the code's functionality (at least not intentionally).

Patch 2 starts pushing the tick stopping decision deeper into the idle
loop, but that is limited to do_idle() and tick_nohz_irq_exit().

Patch 3 makes cpuidle_idle_call() decide whether or not to stop the tick
and sets the stage for the subsequent changes.

Patch 4 adds a bool pointer argument to cpuidle_select() and the ->select
governor callback allowing them to return a "nohz" hint on whether or not to
stop the tick to the caller.  It also adds code to decide what value to
return as "nohz" to the menu governor.

Patch 5 reorders the idle state selection with respect to the stopping of
the tick and causes the additional "nohz" hint from cpuidle_select() to be
used for deciding whether or not to stop the tick.

Patch 6 causes the menu governor to refine the state selection in case the
tick is not going to be stopped and the already selected state may not fit
before the next tick time.

Patch 7 Deals with the situation in which the tick was stopped previously,
but the idle governor still predicts short idle.


This series is complementary to the poll_idle() patch at

https://patchwork.kernel.org/patch/10282237/

Thanks,
Rafael



--
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-37773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFT][PATCH v5 0/7] sched/cpuidle: Idle loop rework

2018-03-17 Thread Thomas Ilsche

Over the last week I tested v4+pollv2 and now v5+pollv3. With v5, I
observe a particular idle behavior, that I have not seen before with
v4. On a dual-socket Skylake system the idle power increases from
74.1 W (system total) to 85.5 W with a 300 HZ build and even to
138.3 W with a 1000 HZ build. A similar Haswell-EP system is also
affected.

There are phases during which one core will keep switching to the
highest C-state, but not disable the sched tick. Every 4th sched tick,
a kworker on that core is scheduled shortly. Every wakeup from C6 of a
single core will more than double the package power consumption of
*both8 sockets for ~500 us resulting in the significantly increased
sustained power consumption.

This is illustrated in [1]. For a comparison of a "normal" phase
(samekernel), see [2]. For a global view of the effect on a 1000 Hz
build, see [3].

I have not yet found any particular triggers or the specific
interaction between the sched tick and the kworker. I'm not sure how
this was introduced in v5. I would guess it could be a feedback loop
that I was concerned about initially.

I have more findings from v4, but this seems much more impactful.

[1] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz.png
[2] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_300Hz_ok.png
[3] 
https://wwwpub.zih.tu-dresden.de/~tilsche/powernightmares/rjwv5_idle_1000Hz.png

On 2018-03-15 22:59, Rafael J. Wysocki wrote:

Hi All,

Thanks a lot for the feedback so far!

One more respin after the last batch of comments from Peter and Frederic.

The previous summary that still applies:

On Sunday, March 4, 2018 11:21:30 PM CET Rafael J. Wysocki wrote:


The problem is that if we stop the sched tick in
tick_nohz_idle_enter() and then the idle governor predicts short idle
duration, we lose regardless of whether or not it is right.

If it is right, we've lost already, because we stopped the tick
unnecessarily.  If it is not right, we'll lose going forward, because
the idle state selected by the governor is going to be too shallow and
we'll draw too much power (that has been reported recently to actually
happen often enough for people to care).

This patch series is an attempt to improve the situation and the idea
here is to make the decision whether or not to stop the tick deeper in
the idle loop and in particular after running the idle state selection
in the path where the idle governor is invoked.  This way the problem
can be avoided, because the idle duration predicted by the idle governor
can be used to decide whether or not to stop the tick so that the tick
is only stopped if that value is large enough (and, consequently, the
idle state selected by the governor is deep enough).

The series tires to avoid adding too much new code, rather reorder the
existing code and make it more fine-grained.

Patch 1 prepares the tick-sched code for the subsequent modifications and it
doesn't change the code's functionality (at least not intentionally).

Patch 2 starts pushing the tick stopping decision deeper into the idle
loop, but that is limited to do_idle() and tick_nohz_irq_exit().

Patch 3 makes cpuidle_idle_call() decide whether or not to stop the tick
and sets the stage for the subsequent changes.

Patch 4 adds a bool pointer argument to cpuidle_select() and the ->select
governor callback allowing them to return a "nohz" hint on whether or not to
stop the tick to the caller.  It also adds code to decide what value to
return as "nohz" to the menu governor.

Patch 5 reorders the idle state selection with respect to the stopping of
the tick and causes the additional "nohz" hint from cpuidle_select() to be
used for deciding whether or not to stop the tick.

Patch 6 causes the menu governor to refine the state selection in case the
tick is not going to be stopped and the already selected state may not fit
before the next tick time.

Patch 7 Deals with the situation in which the tick was stopped previously,
but the idle governor still predicts short idle.


This series is complementary to the poll_idle() patch at

https://patchwork.kernel.org/patch/10282237/

Thanks,
Rafael



--
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-37773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before stopping the tick

2018-03-05 Thread Thomas Ilsche

On 2018-03-04 23:28, Rafael J. Wysocki wrote:

use the expected idle period
duration returned by cpuidle_select() to tell tick_nohz_idle_go_idle()
whether or not to stop the tick.


I assume that at the point of going idle, the actual next scheduling
tick may happen anywhere between now and 1/HZ. If there is a mechanism
that somehow ensures that the next scheduling tick always happens 1/HZ
after going idle, then some of my arguments are invalid.

Ideally, the decision whether to disable the sched tick should
primarily depend on the order of tree upcoming events: the the sched
tick, the next non-sched timer, and the heuristic prediction:

  https://marc.info/?l=linux-pm=151384941425947=2

If I read the code correctly, there is already logic deep within
__tick_nohz_idle_enter that prevents disabling the sched tick when
it is scheduled to happen after another timer, which is a good primary
condition for not stopping the sched tick. However the newly added
condition prevents stopping the sched tick in more cases where it is
undesirable.
Assume duration_us is slightly less than USEC_PER_SEC / HZ.
and next sched tick will happen in 0.1 * USEC_PER_SEC / HZ
If the prediction was accurate, the cpu will be woken up way too soon
by the not-disabled sched tick.

I fear that might even create positive feedback loops on the
heuristic, which will take into account the sleep durations for
sched tick wakeups in sort of a self fulfilling prophecy:
1) The heuristic predicts to wake up in less than a full sched period,
2) The sched tick is kept enabled
3) The sched tick wakes up the system in less than a full sched period
4) Repeat

Even when sleeping for longer than target_residency of the deepest
sleep state, you can still improve energy consumption by sleeping
longer whenever possible.

On the opposite side - undesirable shallow sleeps - the proposed patch
will basically always keep the tick enabled if there is a higher sleep
state with a target_residency <= 1/HZ. On systems with relatively low
target_residencies, such as the ones that I am primarily
investigating, this should effectively prevent long shallow sleeps.
However, on mobile systems with C10 states > 5 ms the sched tick is
not a suitable fallback timer for preventing these issues. Well, maybe
the timer itself could be used, but with a larger expiry.

So IMHO
- the precise timer and vague heuristic should not be mixed
- decisions should preferably use actual time points rather than the
  generic tick duration and residency time.
- for some cases the sched tick as is may not be sufficient as fallback

Question: Does disabling a timer on a cpu guarantee that this cpu will
wake-up or is there a scenario where a timer is deleted or moved
externally without the cpu having a chance to change it's idle state?


Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before stopping the tick

2018-03-05 Thread Thomas Ilsche

On 2018-03-04 23:28, Rafael J. Wysocki wrote:

use the expected idle period
duration returned by cpuidle_select() to tell tick_nohz_idle_go_idle()
whether or not to stop the tick.


I assume that at the point of going idle, the actual next scheduling
tick may happen anywhere between now and 1/HZ. If there is a mechanism
that somehow ensures that the next scheduling tick always happens 1/HZ
after going idle, then some of my arguments are invalid.

Ideally, the decision whether to disable the sched tick should
primarily depend on the order of tree upcoming events: the the sched
tick, the next non-sched timer, and the heuristic prediction:

  https://marc.info/?l=linux-pm=151384941425947=2

If I read the code correctly, there is already logic deep within
__tick_nohz_idle_enter that prevents disabling the sched tick when
it is scheduled to happen after another timer, which is a good primary
condition for not stopping the sched tick. However the newly added
condition prevents stopping the sched tick in more cases where it is
undesirable.
Assume duration_us is slightly less than USEC_PER_SEC / HZ.
and next sched tick will happen in 0.1 * USEC_PER_SEC / HZ
If the prediction was accurate, the cpu will be woken up way too soon
by the not-disabled sched tick.

I fear that might even create positive feedback loops on the
heuristic, which will take into account the sleep durations for
sched tick wakeups in sort of a self fulfilling prophecy:
1) The heuristic predicts to wake up in less than a full sched period,
2) The sched tick is kept enabled
3) The sched tick wakes up the system in less than a full sched period
4) Repeat

Even when sleeping for longer than target_residency of the deepest
sleep state, you can still improve energy consumption by sleeping
longer whenever possible.

On the opposite side - undesirable shallow sleeps - the proposed patch
will basically always keep the tick enabled if there is a higher sleep
state with a target_residency <= 1/HZ. On systems with relatively low
target_residencies, such as the ones that I am primarily
investigating, this should effectively prevent long shallow sleeps.
However, on mobile systems with C10 states > 5 ms the sched tick is
not a suitable fallback timer for preventing these issues. Well, maybe
the timer itself could be used, but with a larger expiry.

So IMHO
- the precise timer and vague heuristic should not be mixed
- decisions should preferably use actual time points rather than the
  generic tick duration and residency time.
- for some cases the sched tick as is may not be sufficient as fallback

Question: Does disabling a timer on a cpu guarantee that this cpu will
wake-up or is there a scenario where a timer is deleted or moved
externally without the cpu having a chance to change it's idle state?


Re: [PATCH] cpuidle: Add "cpuidle.use_deepest" to bypass governor and allow HW to go deep

2017-11-16 Thread Thomas Ilsche

On 2017-11-09 08:38, Len Brown wrote:

From: Len Brown 

While there are several mechanisms (cmdline, sysfs, PM_QOS) to limit
cpuidle to shallow idle states, there is no simple mechanism
to give the hardware permission to enter the deeptest state permitted by PM_QOS.

Here we create the "cpuidle.use_deepest" modparam to provide this capability.

"cpuidle.use_deepest=Y" can be set at boot-time, and
/sys/module/cpuidle/use_deepest can be modified (with Y/N) at run-time.


This is a good option to have and can conveniently prevent idle power 
consumption
issues. But that wouldn't be a reasonable default, would it?
I still think there is an inherent need for a heuristic and a corresponding
mitigation to avoid staying in a sleep state too long.

Best,
Thomas



Re: [PATCH] cpuidle: Add "cpuidle.use_deepest" to bypass governor and allow HW to go deep

2017-11-16 Thread Thomas Ilsche

On 2017-11-09 08:38, Len Brown wrote:

From: Len Brown 

While there are several mechanisms (cmdline, sysfs, PM_QOS) to limit
cpuidle to shallow idle states, there is no simple mechanism
to give the hardware permission to enter the deeptest state permitted by PM_QOS.

Here we create the "cpuidle.use_deepest" modparam to provide this capability.

"cpuidle.use_deepest=Y" can be set at boot-time, and
/sys/module/cpuidle/use_deepest can be modified (with Y/N) at run-time.


This is a good option to have and can conveniently prevent idle power 
consumption
issues. But that wouldn't be a reasonable default, would it?
I still think there is an inherent need for a heuristic and a corresponding
mitigation to avoid staying in a sleep state too long.

Best,
Thomas



[tip:perf/core] perf tools: Default to python version 2

2014-08-12 Thread tip-bot for Thomas Ilsche
Commit-ID:  d6a947fb6cdff3a19db93895c746f70b5903a965
Gitweb: http://git.kernel.org/tip/d6a947fb6cdff3a19db93895c746f70b5903a965
Author: Thomas Ilsche 
AuthorDate: Mon, 4 Aug 2014 15:03:15 +0200
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Tue, 12 Aug 2014 12:03:08 -0300

perf tools: Default to python version 2

According to PEP 394 recommendation [1], it's more portable to use
python2 rather than plain python to refer python binary version 2.

Since there're distros using python3 by default like Arch, and we don't
support python3 (yet), it'd be better using python2 explicitly.

But older versions (prior to 2.7) seem not to provide python2 but just
python.  Given that it's only old version, try python2 first and then
fallback to python.  It'll ensure that it always points to python 2.x.

I tested (compiles and perf script runs) with the combinations:

1) python  -> python2.x, python-config  -> python2.x-config
   python2 N/A,  python2-config N/A

2) python  -> python3.x, python-config  -> python3.x-config
   python2 -> python2.x, python2-config -> python2.x-config

3) python  -> python2.x, python-config  -> python2.x-config
   python2 -> python2.x, python2-config -> python2.x-config

4) python  -> python2.x, python-config  -> python2.x-config
   python2 -> python2.x, python2-config N/A

Based on / replaces the patch 2/2 by Namhyung Kim.

[1] https://www.python.org/dev/peps/pep-0394

Based-on-patch-by: Namhyung Kim 
Signed-off-by: Thomas Ilsche 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/53df8493.6070...@tu-dresden.de
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/config/Makefile | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index e05d8f9..75d4c23 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -121,10 +121,16 @@ ifdef PARSER_DEBUG
 endif
 
 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  # Try different combinations to accommodate systems that only have
+  # python[2][-config] in weird combinations but always preferring
+  # python2 and python2-config as per pep-0394. If we catch a
+  # python[-config] in version 3, the version check will kill it.
+  PYTHON2 := $(if $(call get-executable,python2),python2,python)
+  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
+  PYTHON2_CONFIG := \
+$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
   override PYTHON_CONFIG := \
-$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
+$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
 
   PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:perf/core] perf tools: Default to python version 2

2014-08-12 Thread tip-bot for Thomas Ilsche
Commit-ID:  d6a947fb6cdff3a19db93895c746f70b5903a965
Gitweb: http://git.kernel.org/tip/d6a947fb6cdff3a19db93895c746f70b5903a965
Author: Thomas Ilsche thomas.ils...@tu-dresden.de
AuthorDate: Mon, 4 Aug 2014 15:03:15 +0200
Committer:  Arnaldo Carvalho de Melo a...@redhat.com
CommitDate: Tue, 12 Aug 2014 12:03:08 -0300

perf tools: Default to python version 2

According to PEP 394 recommendation [1], it's more portable to use
python2 rather than plain python to refer python binary version 2.

Since there're distros using python3 by default like Arch, and we don't
support python3 (yet), it'd be better using python2 explicitly.

But older versions (prior to 2.7) seem not to provide python2 but just
python.  Given that it's only old version, try python2 first and then
fallback to python.  It'll ensure that it always points to python 2.x.

I tested (compiles and perf script runs) with the combinations:

1) python  - python2.x, python-config  - python2.x-config
   python2 N/A,  python2-config N/A

2) python  - python3.x, python-config  - python3.x-config
   python2 - python2.x, python2-config - python2.x-config

3) python  - python2.x, python-config  - python2.x-config
   python2 - python2.x, python2-config - python2.x-config

4) python  - python2.x, python-config  - python2.x-config
   python2 - python2.x, python2-config N/A

Based on / replaces the patch 2/2 by Namhyung Kim.

[1] https://www.python.org/dev/peps/pep-0394

Based-on-patch-by: Namhyung Kim namhy...@kernel.org
Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de
Cc: Ingo Molnar mi...@kernel.org
Cc: Jiri Olsa jo...@redhat.com
Cc: Namhyung Kim namhy...@kernel.org
Cc: Paul Mackerras pau...@samba.org
Cc: Peter Zijlstra a.p.zijls...@chello.nl
Link: http://lkml.kernel.org/r/53df8493.6070...@tu-dresden.de
Signed-off-by: Arnaldo Carvalho de Melo a...@redhat.com
---
 tools/perf/config/Makefile | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index e05d8f9..75d4c23 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -121,10 +121,16 @@ ifdef PARSER_DEBUG
 endif
 
 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  # Try different combinations to accommodate systems that only have
+  # python[2][-config] in weird combinations but always preferring
+  # python2 and python2-config as per pep-0394. If we catch a
+  # python[-config] in version 3, the version check will kill it.
+  PYTHON2 := $(if $(call get-executable,python2),python2,python)
+  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
+  PYTHON2_CONFIG := \
+$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
   override PYTHON_CONFIG := \
-$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
+$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
 
   PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 2/2] perf tools: Default to python version 2

2014-08-04 Thread Thomas Ilsche
On 2014-08-02 15:20, Arnaldo Carvalho de Melo wrote:
> Maybe the way python was packaged on f14 is problematic, as you
> describe, but the fact is that python support is not working on that
> platform after this patch.
> 
> Could you please try to prepare a patch that does as you suggest so that
> I can try?

Please try the patch below.
I tested (compiles and perf script runs) with the combinations:

1) python  -> python2.x, python-config  -> python2.x-config
   python2 N/A,  python2-config N/A

2) python  -> python3.x, python-config  -> python3.x-config
   python2 -> python2.x, python2-config -> python2.x-config

3) python  -> python2.x, python-config  -> python2.x-config
   python2 -> python2.x, python2-config -> python2.x-config

4) python  -> python2.x, python-config  -> python2.x-config
   python2 -> python2.x, python2-config N/A

Based on / replaces the patch 2/2 by Namhyung Kim.

Signed-off-by: Thomas Ilsche 
---
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 9ee2008..0528cd37 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -104,10 +104,16 @@ ifdef PARSER_DEBUG
 endif
 
 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  # Try different combinations to accommodate systems that only have
+  # python[2][-config] in weird combinations but always preferring
+  # python2 and python2-config as per pep-0394. If we catch a
+  # python[-config] in version 3, the version check will kill it.
+  PYTHON2 := $(if $(call get-executable,python2),python2,python)
+  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
+  PYTHON2_CONFIG := \
+$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
   override PYTHON_CONFIG := \
-$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
+$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
 
   PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 2/2] perf tools: Default to python version 2

2014-08-04 Thread Thomas Ilsche
On 2014-08-02 15:20, Arnaldo Carvalho de Melo wrote:
 Maybe the way python was packaged on f14 is problematic, as you
 describe, but the fact is that python support is not working on that
 platform after this patch.
 
 Could you please try to prepare a patch that does as you suggest so that
 I can try?

Please try the patch below.
I tested (compiles and perf script runs) with the combinations:

1) python  - python2.x, python-config  - python2.x-config
   python2 N/A,  python2-config N/A

2) python  - python3.x, python-config  - python3.x-config
   python2 - python2.x, python2-config - python2.x-config

3) python  - python2.x, python-config  - python2.x-config
   python2 - python2.x, python2-config - python2.x-config

4) python  - python2.x, python-config  - python2.x-config
   python2 - python2.x, python2-config N/A

Based on / replaces the patch 2/2 by Namhyung Kim.

Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de
---
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 9ee2008..0528cd37 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -104,10 +104,16 @@ ifdef PARSER_DEBUG
 endif
 
 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  # Try different combinations to accommodate systems that only have
+  # python[2][-config] in weird combinations but always preferring
+  # python2 and python2-config as per pep-0394. If we catch a
+  # python[-config] in version 3, the version check will kill it.
+  PYTHON2 := $(if $(call get-executable,python2),python2,python)
+  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
+  PYTHON2_CONFIG := \
+$(if $(call 
get-executable,$(PYTHON)-config),$(PYTHON)-config,python-config)
   override PYTHON_CONFIG := \
-$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
+$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON2_CONFIG))
 
   PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] perf tools: Default to python version 2

2014-08-02 Thread Thomas Ilsche

Zitat von Arnaldo Carvalho de Melo :

> Em Tue, Jul 29, 2014 at 03:57:20PM +0900, Namhyung Kim escreveu:
>> According to PEP 394 recommendation [1], it's more portable to use
>> python2 rather than plain python to refer python binary version 2.
>>
>> Since there're distros using python3 by default like Arch, and we
>> don't support python3 (yet), it'd be better using python2 explicitly.
>>
>> But older versions (prior to 2.7) seem not to provide python2 but just
>> python.  Given that it's only old version, try python2 first and then
>> fallback to python.  It'll ensure that it always points to python 2.x.
>
> It should fallback, right?

Yes.

> [acme@fedora14 linux]$ ls -la /usr/bin/python2-config
> ls: cannot access /usr/bin/python2-config: No such file or directory
> [acme@fedora14 linux]$ ls -la /usr/bin/python-config
> lrwxrwxrwx. 1 root root 16 Mar 25 09:43 /usr/bin/python-config ->  
> python2.7-config

Appearently this fedora package symlinks python -> python2 but does not so for
python2-config. The Makefile looks for either python2 or python as fallback and
then assumes that a respective "-config" exists. I think this is a sane 
assumption.

I guess if this needs to be supported, there needs to be another fallback if
$(PYTHON)-config is not found to try python-config again. However this may
behave weirdly if someone has python2, no python2-devel but python3-devel
installed, sets PYTHON=python2 and then still gets a version error. Or we
somehow only fallback if no PYTHON is specified (not sure how to implemented
that elegantly).

Best,
Thomas


> [acme@fedora14 linux]$ rpm -qf /usr/bin/python-config
> python-devel-2.7-8.fc14.1.x86_64
> [acme@fedora14 linux]$ cat /etc/fedora-release
> Fedora release 14 (Laughlin)
> [acme@fedora14 linux]$
>
> [acme@fedora14 linux]$ time make O=/tmp/build/perf -C tools/perf install
> make: Entering directory `/home/acme/git/linux/tools/perf'
>   BUILD:   Doing 'make -j4' parallel build
> config/Makefile:126: The path '/usr/bin/python2-config' is not executable.
> config/Makefile:339: No libdw DWARF unwind found, Please install
> elfutils-devel/libdw-dev >= 0.158 and/or set LIBDW_DIR
> config/Makefile:481: Missing perl devel files. Disabling perl scripting
> support, consider installing perl-ExtUtils-Embed
> config/Makefile:512: No python-config tool was found
> config/Makefile:512: Python support will not be built
>
>> [1] https://www.python.org/dev/peps/pep-0394
>>
>> Suggested-by: Thomas Ilsche 
>> Tested-by: Thomas Ilsche 
>> Signed-off-by: Namhyung Kim 
>> ---
>>  tools/perf/config/Makefile | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
>> index e05d8f99424d..60177278a357 100644
>> --- a/tools/perf/config/Makefile
>> +++ b/tools/perf/config/Makefile
>> @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG
>>  endif
>>
>>  ifndef NO_LIBPYTHON
>> -  override PYTHON := \
>> -$(call get-executable-or-default,PYTHON,python)
>> +  PYTHON2 := $(if $(call get-executable,python2),python2,python)
>> +  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
>>override PYTHON_CONFIG := \
>>  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
>>
>> --
>> 2.0.0






smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH v2 2/2] perf tools: Default to python version 2

2014-08-02 Thread Thomas Ilsche

Zitat von Arnaldo Carvalho de Melo a...@kernel.org:

 Em Tue, Jul 29, 2014 at 03:57:20PM +0900, Namhyung Kim escreveu:
 According to PEP 394 recommendation [1], it's more portable to use
 python2 rather than plain python to refer python binary version 2.

 Since there're distros using python3 by default like Arch, and we
 don't support python3 (yet), it'd be better using python2 explicitly.

 But older versions (prior to 2.7) seem not to provide python2 but just
 python.  Given that it's only old version, try python2 first and then
 fallback to python.  It'll ensure that it always points to python 2.x.

 It should fallback, right?

Yes.

 [acme@fedora14 linux]$ ls -la /usr/bin/python2-config
 ls: cannot access /usr/bin/python2-config: No such file or directory
 [acme@fedora14 linux]$ ls -la /usr/bin/python-config
 lrwxrwxrwx. 1 root root 16 Mar 25 09:43 /usr/bin/python-config -  
 python2.7-config

Appearently this fedora package symlinks python - python2 but does not so for
python2-config. The Makefile looks for either python2 or python as fallback and
then assumes that a respective -config exists. I think this is a sane 
assumption.

I guess if this needs to be supported, there needs to be another fallback if
$(PYTHON)-config is not found to try python-config again. However this may
behave weirdly if someone has python2, no python2-devel but python3-devel
installed, sets PYTHON=python2 and then still gets a version error. Or we
somehow only fallback if no PYTHON is specified (not sure how to implemented
that elegantly).

Best,
Thomas


 [acme@fedora14 linux]$ rpm -qf /usr/bin/python-config
 python-devel-2.7-8.fc14.1.x86_64
 [acme@fedora14 linux]$ cat /etc/fedora-release
 Fedora release 14 (Laughlin)
 [acme@fedora14 linux]$

 [acme@fedora14 linux]$ time make O=/tmp/build/perf -C tools/perf install
 make: Entering directory `/home/acme/git/linux/tools/perf'
   BUILD:   Doing 'make -j4' parallel build
 config/Makefile:126: The path '/usr/bin/python2-config' is not executable.
 config/Makefile:339: No libdw DWARF unwind found, Please install
 elfutils-devel/libdw-dev = 0.158 and/or set LIBDW_DIR
 config/Makefile:481: Missing perl devel files. Disabling perl scripting
 support, consider installing perl-ExtUtils-Embed
 config/Makefile:512: No python-config tool was found
 config/Makefile:512: Python support will not be built

 [1] https://www.python.org/dev/peps/pep-0394

 Suggested-by: Thomas Ilsche thomas.ils...@tu-dresden.de
 Tested-by: Thomas Ilsche thomas.ils...@tu-dresden.de
 Signed-off-by: Namhyung Kim namhy...@kernel.org
 ---
  tools/perf/config/Makefile | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
 index e05d8f99424d..60177278a357 100644
 --- a/tools/perf/config/Makefile
 +++ b/tools/perf/config/Makefile
 @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG
  endif

  ifndef NO_LIBPYTHON
 -  override PYTHON := \
 -$(call get-executable-or-default,PYTHON,python)
 +  PYTHON2 := $(if $(call get-executable,python2),python2,python)
 +  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
override PYTHON_CONFIG := \
  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)

 --
 2.0.0






smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-28 Thread Thomas Ilsche
Hi Namhyung,

>>> I'm thinking about something like below.. but sadly it doesn't work for
>>> me.. hmm.
>>
>> Actually this appears to work for me (with 2.6.9 & 2.7.6) and I  
>> find this solution more
>> elegant.
>
> Thanks for testing.  It was my fault on setting symlink to a
> non-existing file for testing.  It now works well for me too.
>
> Can I add your Tested-by then?

Yes
Tested-by: Thomas Ilsche 

Plase note that your patch requires the previous patch that just set the 
default to python2.

I tested in an Arch Linux VM with:
A) no system install of python, source installation 2.6.9 in PATH
B) no system install of python, source installation 2.7.6 in PATH
C) python3 (as python) and python2 system installation

In all cases it reports to build with libpython and perf script with a dummy 
Python script
works.

If no python whatsoever is in PATH it reports:

config/Makefile:121: The path 'python' is not executable.
sh: line 0: command: -c: invalid option
command: usage: command [-pVv] command [arg ...]
config/Makefile:122: The path '-config' is not executable.

And the created perf executable does not work with the dummy script.

If only pthon3 is in PATH I get the expected version error message (build 
stops.)

Note that for the manual old installations of python I had an issue with 
linking order and
a -Werror issue that I resolved manually. But this is fixed with the latest 
system install
of python.

Best,
Thomas

>
> Thanks,
> Namhyung





smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-28 Thread Thomas Ilsche
Hi Namhyung,

 I'm thinking about something like below.. but sadly it doesn't work for
 me.. hmm.

 Actually this appears to work for me (with 2.6.9  2.7.6) and I  
 find this solution more
 elegant.

 Thanks for testing.  It was my fault on setting symlink to a
 non-existing file for testing.  It now works well for me too.

 Can I add your Tested-by then?

Yes
Tested-by: Thomas Ilsche thomas.ils...@tu-dresden.de

Plase note that your patch requires the previous patch that just set the 
default to python2.

I tested in an Arch Linux VM with:
A) no system install of python, source installation 2.6.9 in PATH
B) no system install of python, source installation 2.7.6 in PATH
C) python3 (as python) and python2 system installation

In all cases it reports to build with libpython and perf script with a dummy 
Python script
works.

If no python whatsoever is in PATH it reports:

config/Makefile:121: The path 'python' is not executable.
sh: line 0: command: -c: invalid option
command: usage: command [-pVv] command [arg ...]
config/Makefile:122: The path '-config' is not executable.

And the created perf executable does not work with the dummy script.

If only pthon3 is in PATH I get the expected version error message (build 
stops.)

Note that for the manual old installations of python I had an issue with 
linking order and
a -Werror issue that I resolved manually. But this is fixed with the latest 
system install
of python.

Best,
Thomas


 Thanks,
 Namhyung





smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-25 Thread Thomas Ilsche
Hi Namhyung,

apologies for mixing up your first name earlier.

On 2014-07-25 12:24, Namhyung Kim wrote:
> Hi Thomas,
> 
> On Fri, 25 Jul 2014 11:28:55 +0200, Thomas Ilsche wrote:
>> Hi Kim,
>>
>> On 2014-07-25 05:14, Namhyung Kim wrote:
>>> According to PEP 394 recommendation [1], it's more portable to use
>>> python2 rather than plain python to refer python binary version 2.
>>
>> I tried to find out how backwards-compatible this is. python2(-config)
>> was first available in Python 2.7.3 (April 2012), but it is still not
>> available in 2.6.9 (Oct. 2013). So it might be better to use python2
>> as default but fall back to python if python2 is not available.
>>
>> Best,
>> Thomas
>>
>> Signed-off-by: Thomas Ilsche 
>> ---
>> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
>> index f4f7f58..1b1bc9c 100644
>> --- a/tools/perf/config/Makefile
>> +++ b/tools/perf/config/Makefile
>> @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG
>>  endif
>>
>>  ifndef NO_LIBPYTHON
>> -  override PYTHON := \
>> -$(call get-executable-or-default,PYTHON,python)
>> +  PYTHON2 := $(call get-executable-or-default,PYTHON,python2)
> 
> But wouldn't it set PYTHON2 to python2 even if the system doesn't have
> python2 actually?
> 
> 
>> +  override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call 
>> get-executable,python))
> 
> And then it'll set PYTHON to python2, no?
> 

No, get-executable-or-default only returns anything if it is an actual 
executable
(asserted by a 'command -v' and 'test -f $ -a -x $'). At least this is how I 
understand
the Makefile code. I also tested it with a Python 2.6.9 (no python2 in path) 
and it worked
(after fixing some totally unrelated issues in python headers). It will however 
complain
that "The path 'python2' is not executable." before using python.

> 
>>override PYTHON_CONFIG := \
>>  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
> 
> 
> I'm thinking about something like below.. but sadly it doesn't work for
> me.. hmm.

Actually this appears to work for me (with 2.6.9 & 2.7.6) and I find this 
solution more
elegant.

Best,
Thomas

> 
> Thanks,
> Namhyung
> 
> 
> 
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 0f4b7fbc4852..60177278a357 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG
>  endif
>  
>  ifndef NO_LIBPYTHON
> -  override PYTHON := \
> -$(call get-executable-or-default,PYTHON,python2)
> +  PYTHON2 := $(if $(call get-executable,python2),python2,python)
> +  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
>override PYTHON_CONFIG := \
>  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
>  
> 

-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-25 Thread Thomas Ilsche
Hi Kim,

On 2014-07-25 05:14, Namhyung Kim wrote:
> According to PEP 394 recommendation [1], it's more portable to use
> python2 rather than plain python to refer python binary version 2.

I tried to find out how backwards-compatible this is. python2(-config)
was first available in Python 2.7.3 (April 2012), but it is still not
available in 2.6.9 (Oct. 2013). So it might be better to use python2
as default but fall back to python if python2 is not available.

Best,
Thomas

Signed-off-by: Thomas Ilsche 
---
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index f4f7f58..1b1bc9c 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -117,8 +117,8 @@ ifdef PARSER_DEBUG
 endif

 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  PYTHON2 := $(call get-executable-or-default,PYTHON,python2)
+  override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call get-executable,python))
   override PYTHON_CONFIG := \
 $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] perf tools: Fix make PYTHON override

2014-07-25 Thread Thomas Ilsche
Hi Namhyung,

On 2014-07-25 05:14, Namhyung Kim wrote:
> Thomas reported that make PYTHON=python2 is not work on some systems.
> I can reproduce it on my ArchLinux box too.
> 
> This is because it's overridden by config/feature-checks/Makefile
> regardless of PYTHON setting.  I guess it's a bug slipped into during
> the feature checking change.
> 
> Actually, we don't need to check python-config in the feature-checks.
> We can just pass appropriate FEATURE_CHECK_*FLAGS.

The patch works fine for me.

Thanks,
Thomas

> 
> Reported-by: Thomas Ilsche 
> Signed-off-by: Namhyung Kim 
> ---
>  tools/perf/config/Makefile| 34 
> ---
>  tools/perf/config/feature-checks/Makefile | 18 ++--
>  2 files changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index 1f67aa02d240..e05d8f99424d 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -120,6 +120,23 @@ ifdef PARSER_DEBUG
>CFLAGS += -DPARSER_DEBUG
>  endif
>  
> +ifndef NO_LIBPYTHON
> +  override PYTHON := \
> +$(call get-executable-or-default,PYTHON,python)
> +  override PYTHON_CONFIG := \
> +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
> +
> +  PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
> +
> +  PYTHON_EMBED_LDOPTS := $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null)
> +  PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null)
> +
> +  FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS)
> +  FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS)
> +  FEATURE_CHECK_CFLAGS-libpython-version := $(PYTHON_EMBED_CCOPTS)
> +  FEATURE_CHECK_LDFLAGS-libpython-version := $(PYTHON_EMBED_LDOPTS)
> +endif
> +
>  CFLAGS += -fno-omit-frame-pointer
>  CFLAGS += -ggdb3
>  CFLAGS += -funwind-tables
> @@ -482,21 +499,14 @@ define disable-python_code
>NO_LIBPYTHON := 1
>  endef
>  
> -override PYTHON := \
> -  $(call get-executable-or-default,PYTHON,python)
> -
> -ifndef PYTHON
> -  $(call disable-python,python interpreter)
> +ifdef NO_LIBPYTHON
> +  $(call disable-python)
>  else
>  
> -  PYTHON_WORD := $(call shell-wordify,$(PYTHON))
> -
> -  ifdef NO_LIBPYTHON
> -$(call disable-python)
> +  ifndef PYTHON
> +$(call disable-python,python interpreter)
>else
> -
> -override PYTHON_CONFIG := \
> -  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
> +PYTHON_WORD := $(call shell-wordify,$(PYTHON))
>  
>  ifndef PYTHON_CONFIG
>$(call disable-python,python-config tool)
> diff --git a/tools/perf/config/feature-checks/Makefile 
> b/tools/perf/config/feature-checks/Makefile
> index 6088f8d8a434..72ab2984718e 100644
> --- a/tools/perf/config/feature-checks/Makefile
> +++ b/tools/perf/config/feature-checks/Makefile
> @@ -101,25 +101,11 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) 
> $(PERL_EMBED_LDOPTS)
>  test-libperl.bin:
>   $(BUILD) $(FLAGS_PERL_EMBED)
>  
> -override PYTHON := python
> -override PYTHON_CONFIG := python-config
> -
> -escape-for-shell-sq =  $(subst ','\'',$(1))
> -shell-sq = '$(escape-for-shell-sq)'
> -
> -PYTHON_CONFIG_SQ = $(call shell-sq,$(PYTHON_CONFIG))
> -
> -PYTHON_EMBED_LDOPTS = $(shell $(PYTHON_CONFIG_SQ) --ldflags 2>/dev/null)
> -PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
> -PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS))
> -PYTHON_EMBED_CCOPTS = $(shell $(PYTHON_CONFIG_SQ) --cflags 2>/dev/null)
> -FLAGS_PYTHON_EMBED = $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
> -
>  test-libpython.bin:
> - $(BUILD) $(FLAGS_PYTHON_EMBED)
> + $(BUILD)
>  
>  test-libpython-version.bin:
> - $(BUILD) $(FLAGS_PYTHON_EMBED)
> + $(BUILD)
>  
>  test-libbfd.bin:
>   $(BUILD) -DPACKAGE='"perf"' -lbfd -lz -liberty -ldl
> 

-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 1/2] perf tools: Fix make PYTHON override

2014-07-25 Thread Thomas Ilsche
Hi Namhyung,

On 2014-07-25 05:14, Namhyung Kim wrote:
 Thomas reported that make PYTHON=python2 is not work on some systems.
 I can reproduce it on my ArchLinux box too.
 
 This is because it's overridden by config/feature-checks/Makefile
 regardless of PYTHON setting.  I guess it's a bug slipped into during
 the feature checking change.
 
 Actually, we don't need to check python-config in the feature-checks.
 We can just pass appropriate FEATURE_CHECK_*FLAGS.

The patch works fine for me.

Thanks,
Thomas

 
 Reported-by: Thomas Ilsche thomas.ils...@tu-dresden.de
 Signed-off-by: Namhyung Kim namhy...@kernel.org
 ---
  tools/perf/config/Makefile| 34 
 ---
  tools/perf/config/feature-checks/Makefile | 18 ++--
  2 files changed, 24 insertions(+), 28 deletions(-)
 
 diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
 index 1f67aa02d240..e05d8f99424d 100644
 --- a/tools/perf/config/Makefile
 +++ b/tools/perf/config/Makefile
 @@ -120,6 +120,23 @@ ifdef PARSER_DEBUG
CFLAGS += -DPARSER_DEBUG
  endif
  
 +ifndef NO_LIBPYTHON
 +  override PYTHON := \
 +$(call get-executable-or-default,PYTHON,python)
 +  override PYTHON_CONFIG := \
 +$(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
 +
 +  PYTHON_CONFIG_SQ := $(call shell-sq,$(PYTHON_CONFIG))
 +
 +  PYTHON_EMBED_LDOPTS := $(shell $(PYTHON_CONFIG_SQ) --ldflags 2/dev/null)
 +  PYTHON_EMBED_CCOPTS := $(shell $(PYTHON_CONFIG_SQ) --cflags 2/dev/null)
 +
 +  FEATURE_CHECK_CFLAGS-libpython := $(PYTHON_EMBED_CCOPTS)
 +  FEATURE_CHECK_LDFLAGS-libpython := $(PYTHON_EMBED_LDOPTS)
 +  FEATURE_CHECK_CFLAGS-libpython-version := $(PYTHON_EMBED_CCOPTS)
 +  FEATURE_CHECK_LDFLAGS-libpython-version := $(PYTHON_EMBED_LDOPTS)
 +endif
 +
  CFLAGS += -fno-omit-frame-pointer
  CFLAGS += -ggdb3
  CFLAGS += -funwind-tables
 @@ -482,21 +499,14 @@ define disable-python_code
NO_LIBPYTHON := 1
  endef
  
 -override PYTHON := \
 -  $(call get-executable-or-default,PYTHON,python)
 -
 -ifndef PYTHON
 -  $(call disable-python,python interpreter)
 +ifdef NO_LIBPYTHON
 +  $(call disable-python)
  else
  
 -  PYTHON_WORD := $(call shell-wordify,$(PYTHON))
 -
 -  ifdef NO_LIBPYTHON
 -$(call disable-python)
 +  ifndef PYTHON
 +$(call disable-python,python interpreter)
else
 -
 -override PYTHON_CONFIG := \
 -  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
 +PYTHON_WORD := $(call shell-wordify,$(PYTHON))
  
  ifndef PYTHON_CONFIG
$(call disable-python,python-config tool)
 diff --git a/tools/perf/config/feature-checks/Makefile 
 b/tools/perf/config/feature-checks/Makefile
 index 6088f8d8a434..72ab2984718e 100644
 --- a/tools/perf/config/feature-checks/Makefile
 +++ b/tools/perf/config/feature-checks/Makefile
 @@ -101,25 +101,11 @@ FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) 
 $(PERL_EMBED_LDOPTS)
  test-libperl.bin:
   $(BUILD) $(FLAGS_PERL_EMBED)
  
 -override PYTHON := python
 -override PYTHON_CONFIG := python-config
 -
 -escape-for-shell-sq =  $(subst ','\'',$(1))
 -shell-sq = '$(escape-for-shell-sq)'
 -
 -PYTHON_CONFIG_SQ = $(call shell-sq,$(PYTHON_CONFIG))
 -
 -PYTHON_EMBED_LDOPTS = $(shell $(PYTHON_CONFIG_SQ) --ldflags 2/dev/null)
 -PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
 -PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS))
 -PYTHON_EMBED_CCOPTS = $(shell $(PYTHON_CONFIG_SQ) --cflags 2/dev/null)
 -FLAGS_PYTHON_EMBED = $(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
 -
  test-libpython.bin:
 - $(BUILD) $(FLAGS_PYTHON_EMBED)
 + $(BUILD)
  
  test-libpython-version.bin:
 - $(BUILD) $(FLAGS_PYTHON_EMBED)
 + $(BUILD)
  
  test-libbfd.bin:
   $(BUILD) -DPACKAGE='perf' -lbfd -lz -liberty -ldl
 

-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-25 Thread Thomas Ilsche
Hi Kim,

On 2014-07-25 05:14, Namhyung Kim wrote:
 According to PEP 394 recommendation [1], it's more portable to use
 python2 rather than plain python to refer python binary version 2.

I tried to find out how backwards-compatible this is. python2(-config)
was first available in Python 2.7.3 (April 2012), but it is still not
available in 2.6.9 (Oct. 2013). So it might be better to use python2
as default but fall back to python if python2 is not available.

Best,
Thomas

Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de
---
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index f4f7f58..1b1bc9c 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -117,8 +117,8 @@ ifdef PARSER_DEBUG
 endif

 ifndef NO_LIBPYTHON
-  override PYTHON := \
-$(call get-executable-or-default,PYTHON,python)
+  PYTHON2 := $(call get-executable-or-default,PYTHON,python2)
+  override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call get-executable,python))
   override PYTHON_CONFIG := \
 $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] perf tools: Default to python version 2

2014-07-25 Thread Thomas Ilsche
Hi Namhyung,

apologies for mixing up your first name earlier.

On 2014-07-25 12:24, Namhyung Kim wrote:
 Hi Thomas,
 
 On Fri, 25 Jul 2014 11:28:55 +0200, Thomas Ilsche wrote:
 Hi Kim,

 On 2014-07-25 05:14, Namhyung Kim wrote:
 According to PEP 394 recommendation [1], it's more portable to use
 python2 rather than plain python to refer python binary version 2.

 I tried to find out how backwards-compatible this is. python2(-config)
 was first available in Python 2.7.3 (April 2012), but it is still not
 available in 2.6.9 (Oct. 2013). So it might be better to use python2
 as default but fall back to python if python2 is not available.

 Best,
 Thomas

 Signed-off-by: Thomas Ilsche thomas.ils...@tu-dresden.de
 ---
 diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
 index f4f7f58..1b1bc9c 100644
 --- a/tools/perf/config/Makefile
 +++ b/tools/perf/config/Makefile
 @@ -117,8 +117,8 @@ ifdef PARSER_DEBUG
  endif

  ifndef NO_LIBPYTHON
 -  override PYTHON := \
 -$(call get-executable-or-default,PYTHON,python)
 +  PYTHON2 := $(call get-executable-or-default,PYTHON,python2)
 
 But wouldn't it set PYTHON2 to python2 even if the system doesn't have
 python2 actually?
 
 
 +  override PYTHON := $(if $(PYTHON2),$(PYTHON2),$(call 
 get-executable,python))
 
 And then it'll set PYTHON to python2, no?
 

No, get-executable-or-default only returns anything if it is an actual 
executable
(asserted by a 'command -v' and 'test -f $ -a -x $'). At least this is how I 
understand
the Makefile code. I also tested it with a Python 2.6.9 (no python2 in path) 
and it worked
(after fixing some totally unrelated issues in python headers). It will however 
complain
that The path 'python2' is not executable. before using python.

 
override PYTHON_CONFIG := \
  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
 
 
 I'm thinking about something like below.. but sadly it doesn't work for
 me.. hmm.

Actually this appears to work for me (with 2.6.9  2.7.6) and I find this 
solution more
elegant.

Best,
Thomas

 
 Thanks,
 Namhyung
 
 
 
 diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
 index 0f4b7fbc4852..60177278a357 100644
 --- a/tools/perf/config/Makefile
 +++ b/tools/perf/config/Makefile
 @@ -121,8 +121,8 @@ ifdef PARSER_DEBUG
  endif
  
  ifndef NO_LIBPYTHON
 -  override PYTHON := \
 -$(call get-executable-or-default,PYTHON,python2)
 +  PYTHON2 := $(if $(call get-executable,python2),python2,python)
 +  override PYTHON := $(call get-executable-or-default,PYTHON,$(PYTHON2))
override PYTHON_CONFIG := \
  $(call get-executable-or-default,PYTHON_CONFIG,$(PYTHON)-config)
  
 

-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de



smime.p7s
Description: S/MIME Cryptographic Signature


Perf: cannot overide PYTHON(_CONFIG)

2014-07-24 Thread Thomas Ilsche
Dear perf maintainers,

It is documented in Makefile.perf and the Python 3 error message to use
make PYTHON=python2. However this does not work anymore since
9734163b6ee1425c6fa4b65d7e6ce34c9079420d moved the libpython feature check.

Further it would be more portable anyway to use "python2" as default
instead of "python" as long as python3 is not supported. This follows
the PEP 394 recommendation: https://www.python.org/dev/peps/pep-0394


I am not sure how to correctly fix the failure to respect
PYTHON(_CONFIG). I would assume that early in perf/config/Makefile
FEATURE_CHECK_[CFLAGS|LDFLAGS]-libpython(-version) needs to be set
(before the feature checks are executed) and that the reconstruction of
FLAGS_PYTHON_EMBED based on hardcoded python-config needs to be removed.

Best Regards,
Thomas
-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Perf: cannot overide PYTHON(_CONFIG)

2014-07-24 Thread Thomas Ilsche
Dear perf maintainers,

It is documented in Makefile.perf and the Python 3 error message to use
make PYTHON=python2. However this does not work anymore since
9734163b6ee1425c6fa4b65d7e6ce34c9079420d moved the libpython feature check.

Further it would be more portable anyway to use python2 as default
instead of python as long as python3 is not supported. This follows
the PEP 394 recommendation: https://www.python.org/dev/peps/pep-0394


I am not sure how to correctly fix the failure to respect
PYTHON(_CONFIG). I would assume that early in perf/config/Makefile
FEATURE_CHECK_[CFLAGS|LDFLAGS]-libpython(-version) needs to be set
(before the feature checks are executed) and that the reconstruction of
FLAGS_PYTHON_EMBED based on hardcoded python-config needs to be removed.

Best Regards,
Thomas
-- 
Dipl. Inf. Thomas Ilsche
Computer Scientist
Highly Adaptive Energy-Efficient Computing
CRC 912 HAEC: http://tu-dresden.de/sfb912
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: +49 351 463-42168
Fax: +49 351 463-3773
E-Mail: thomas.ils...@tu-dresden.de
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Provide additional sample information to Python scripts

2014-03-12 Thread Thomas Ilsche
On 2014-03-07 15:18, Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 18, 2014 at 09:43:24AM +0100, Joseph Schuchart escreveu:
>> Good morning,
>>
>> We have developed a patch for the perf Python scripting interface to
>> provide additional information about the pid, tid, and cpu of generic
>> events as well as information about the call-stack and resolved symbol
>> names. This provides scripts with a greater level of detail. The
>> mentioned information is already available to the scripting engine and
>> just has to be handed down. This is done by the attached patch. The
>> patch is based on Linux-3.13.3.
>>
>> Please let me know if you have any questions on this.
> 
> Can you please resend, against the perf/core branch in
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git, and as an
> attachement or making sure that the patch is not mangled?

Joseph and I are working on this but due to vacation we expect to send
you the patch with some improvements in the beginning of April. Please
let us know in case there is some deadline.

Thanks,
Thomas

> 
> - Arnaldo
>  
>> Thanks
>> Joseph
>> -- 
>> Dipl. Inf. Joseph Schuchart
>> Computer Scientist
>>
>> Technische Universität Dresden
>> Center for Information Services and High Performance Computing (ZIH)
>> 01062 Dresden, Germany
>>
>> Phone: +49 351 463-36494
>> Fax: +49 351 463-3773
>> E-Mail: joseph.schuch...@tu-dresden.de
> 
>> Perf: Provide sample information and call-chain to Python script
>>
>> Provide additional sample information on generic events to Python 
>> scripts, including pid, tid, and cpu for which the event was recorded.
>> Additionally, provide the call-stack recorded at each event with 
>> resolved symbols. At the moment, the pointer to the sample struct
>> is passed to scripts, which seems to be of little use. The patch
>> puts this information in dictionaries for easy access by Python
>> scripts. 
>>
>> Signed-off-by: Joseph Schuchart 
>> Acked-by: Thomas Ilsche 
>>
>> @@ -359,7 +359,7 @@ static void python_process_general_event
>>   struct thread *thread,
>>   struct addr_location *al)
>>  {
>> -PyObject *handler, *retval, *t, *dict;
>> +PyObject *handler, *retval, *t, *dict, *dict_sample;
>>  static char handler_name[64];
>>  unsigned n = 0;
>>  
>> @@ -375,6 +375,10 @@ static void python_process_general_event
>>  if (!dict)
>>  Py_FatalError("couldn't create Python dictionary");
>>  
>> +dict_sample = PyDict_New();
>> +if (!dict_sample)
>> +Py_FatalError("couldn't create Python dictionary");
>> +
>>  snprintf(handler_name, sizeof(handler_name), "%s", "process_event");
>>  
>>  handler = PyDict_GetItemString(main_dict, handler_name);
>> @@ -384,8 +388,76 @@ static void python_process_general_event
>>  pydict_set_item_string_decref(dict, "ev_name", 
>> PyString_FromString(perf_evsel__name(evsel)));
>>  pydict_set_item_string_decref(dict, "attr", PyString_FromStringAndSize(
>>  (const char *)>attr, sizeof(evsel->attr)));
>> -pydict_set_item_string_decref(dict, "sample", 
>> PyString_FromStringAndSize(
>> -(const char *)sample, sizeof(*sample)));
>> +
>> +/* PID/TIDs are limited to 2^29, so we can safely use PyInt */
>> +pydict_set_item_string_decref(dict_sample, "pid", 
>> PyInt_FromLong(sample->pid));
>> +pydict_set_item_string_decref(dict_sample, "tid", 
>> PyInt_FromLong(sample->tid));
>> +pydict_set_item_string_decref(dict_sample, "cpu", 
>> PyInt_FromLong(sample->cpu));
>> +pydict_set_item_string_decref(dict_sample, "time", 
>> PyLong_FromUnsignedLongLong(sample->time));
>> +pydict_set_item_string_decref(dict, "sample", dict_sample);
>> +
>> +/* ip unwinding */
>> +
>> +if (symbol_conf.use_callchain && sample->callchain) {
>> +PyObject *pylist;
>> +
>> +if (machine__resolve_callchain(machine, evsel, al->thread,
>> +   sample, NULL, NULL, 
>> PERF_MAX_STACK_DEPTH) != 0) {
>> +pr_err("Failed to resolve callchain. Skipping\n");
>> +goto exit;
>> +}
>> +callchain_cu

Re: [PATCH] Provide additional sample information to Python scripts

2014-03-12 Thread Thomas Ilsche
On 2014-03-07 15:18, Arnaldo Carvalho de Melo wrote:
 Em Tue, Feb 18, 2014 at 09:43:24AM +0100, Joseph Schuchart escreveu:
 Good morning,

 We have developed a patch for the perf Python scripting interface to
 provide additional information about the pid, tid, and cpu of generic
 events as well as information about the call-stack and resolved symbol
 names. This provides scripts with a greater level of detail. The
 mentioned information is already available to the scripting engine and
 just has to be handed down. This is done by the attached patch. The
 patch is based on Linux-3.13.3.

 Please let me know if you have any questions on this.
 
 Can you please resend, against the perf/core branch in
 git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git, and as an
 attachement or making sure that the patch is not mangled?

Joseph and I are working on this but due to vacation we expect to send
you the patch with some improvements in the beginning of April. Please
let us know in case there is some deadline.

Thanks,
Thomas

 
 - Arnaldo
  
 Thanks
 Joseph
 -- 
 Dipl. Inf. Joseph Schuchart
 Computer Scientist

 Technische Universität Dresden
 Center for Information Services and High Performance Computing (ZIH)
 01062 Dresden, Germany

 Phone: +49 351 463-36494
 Fax: +49 351 463-3773
 E-Mail: joseph.schuch...@tu-dresden.de
 
 Perf: Provide sample information and call-chain to Python script

 Provide additional sample information on generic events to Python 
 scripts, including pid, tid, and cpu for which the event was recorded.
 Additionally, provide the call-stack recorded at each event with 
 resolved symbols. At the moment, the pointer to the sample struct
 is passed to scripts, which seems to be of little use. The patch
 puts this information in dictionaries for easy access by Python
 scripts. 

 Signed-off-by: Joseph Schuchart joseph.schuch...@tu-dresden.de
 Acked-by: Thomas Ilsche thomas.ils...@tu-dresden.de

 @@ -359,7 +359,7 @@ static void python_process_general_event
   struct thread *thread,
   struct addr_location *al)
  {
 -PyObject *handler, *retval, *t, *dict;
 +PyObject *handler, *retval, *t, *dict, *dict_sample;
  static char handler_name[64];
  unsigned n = 0;
  
 @@ -375,6 +375,10 @@ static void python_process_general_event
  if (!dict)
  Py_FatalError(couldn't create Python dictionary);
  
 +dict_sample = PyDict_New();
 +if (!dict_sample)
 +Py_FatalError(couldn't create Python dictionary);
 +
  snprintf(handler_name, sizeof(handler_name), %s, process_event);
  
  handler = PyDict_GetItemString(main_dict, handler_name);
 @@ -384,8 +388,76 @@ static void python_process_general_event
  pydict_set_item_string_decref(dict, ev_name, 
 PyString_FromString(perf_evsel__name(evsel)));
  pydict_set_item_string_decref(dict, attr, PyString_FromStringAndSize(
  (const char *)evsel-attr, sizeof(evsel-attr)));
 -pydict_set_item_string_decref(dict, sample, 
 PyString_FromStringAndSize(
 -(const char *)sample, sizeof(*sample)));
 +
 +/* PID/TIDs are limited to 2^29, so we can safely use PyInt */
 +pydict_set_item_string_decref(dict_sample, pid, 
 PyInt_FromLong(sample-pid));
 +pydict_set_item_string_decref(dict_sample, tid, 
 PyInt_FromLong(sample-tid));
 +pydict_set_item_string_decref(dict_sample, cpu, 
 PyInt_FromLong(sample-cpu));
 +pydict_set_item_string_decref(dict_sample, time, 
 PyLong_FromUnsignedLongLong(sample-time));
 +pydict_set_item_string_decref(dict, sample, dict_sample);
 +
 +/* ip unwinding */
 +
 +if (symbol_conf.use_callchain  sample-callchain) {
 +PyObject *pylist;
 +
 +if (machine__resolve_callchain(machine, evsel, al-thread,
 +   sample, NULL, NULL, 
 PERF_MAX_STACK_DEPTH) != 0) {
 +pr_err(Failed to resolve callchain. Skipping\n);
 +goto exit;
 +}
 +callchain_cursor_commit(callchain_cursor);
 +
 +pylist = PyList_New(0);
 +if (!pylist)
 +Py_FatalError(couldn't create Python list);
 +
 +while (1) {
 +PyObject *pyelem;
 +struct callchain_cursor_node *node;
 +node = callchain_cursor_current(callchain_cursor);
 +if (!node)
 +break;
 +
 +pyelem = PyDict_New();
 +if (!pyelem)
 +Py_FatalError(couldn't create Python 
 dictionary);
 +
 +
 +pydict_set_item_string_decref(pyelem, ip, 
 PyInt_FromLong(node-ip));
 +
 +if (node-sym) {
 +PyObject *pysym  = PyDict_New();
 +if (!pysym)
 +Py_FatalError